This is the home page for the Data Format Description Language (DFDL) standard of the Open Grid Forum.
Note: Several DFDL Documents are open for public comment!
Data Format Description Language (DFDL) is a language for describing text and binary data formats. A DFDL description allows any text or binary data to be read from its native format and presented as an instance of an information set. DFDL also allows data to be taken from an instance of an information set and written out in its native format. DFDL achieves this by leveraging W3C XML Schema Definition Language (XSDL) 1.0, which makes it straightforward to use DFDL to convert text and binary data to a corresponding XML document.
In DFDL, an XML schema is written for the logical model of the data, and the schema is then augmented with special DFDL annotations that describe the native (non-XML) format of the data.
This approach is already established and used today in commercial systems. DFDL evolves it into an open standard capable of describing almost any text or binary data format.
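To make the split between logical model and format annotations concrete, here is a minimal Python sketch. It is not DFDL and not any DFDL processor's API: a real DFDL description is an annotated XML schema. The names `LOGICAL_MODEL`, `ANNOTATIONS`, `parse` and `serialize`, and the two-field record, are hypothetical; only the key `byteOrder` echoes a DFDL property name.

```python
import struct

# Hypothetical logical model: element names and XSD types, in order.
# This stands in for the plain XML schema (no format information at all).
LOGICAL_MODEL = [("id", "xs:int"), ("reading", "xs:double")]

# Hypothetical format "annotations", kept separate from the logical model.
# The key echoes a DFDL property name, but this dict is not a DFDL API.
ANNOTATIONS = {"byteOrder": "bigEndian"}

# struct codes for the few logical types this sketch supports.
TYPE_CODES = {"xs:short": "h", "xs:int": "i", "xs:float": "f", "xs:double": "d"}


def layout(model, annotations):
    """Combine logical types with the byte-order annotation into a struct format."""
    prefix = ">" if annotations["byteOrder"] == "bigEndian" else "<"
    return prefix + "".join(TYPE_CODES[xsd_type] for _, xsd_type in model)


def parse(data, model=LOGICAL_MODEL, annotations=ANNOTATIONS):
    """Read native bytes into an 'infoset' (here just a dict keyed by name)."""
    values = struct.unpack(layout(model, annotations), data)
    return {name: value for (name, _), value in zip(model, values)}


def serialize(infoset, model=LOGICAL_MODEL, annotations=ANNOTATIONS):
    """Write an infoset back out in the native format."""
    values = [infoset[name] for name, _ in model]
    return struct.pack(layout(model, annotations), *values)


if __name__ == "__main__":
    native = serialize({"id": 42, "reading": 21.5})
    print(native.hex())   # 0000002a4035800000000000
    print(parse(native))  # {'id': 42, 'reading': 21.5}
```

Changing only `ANNOTATIONS` (for example to `littleEndian`) changes the bytes on disk while the logical model, and therefore the infoset, stays the same.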
Background
Data interchange is critically important for most computing. Grid computing and all forms of distributed computing require distributed software and hardware resources to work together, and these resources inevitably read and write data in a variety of formats. General tools for data interchange are essential to solving such problems. Scalable and High Performance Computing (HPC) applications require high-performance data handling, so data interchange standards must enable efficient representation of data. DFDL enables powerful data interchange and very high-performance data handling. The DFDL Working Group envisages three dominant kinds of data in the future, as follows:
- Textual data defined by a format-specific schema, such as XML or JSON.
- Binary data in standard formats.
- Data with DFDL descriptors.
Both XML and standardized binary formats are prescriptive: they specify, or prescribe, a representation of the data. To use them, applications must be written to conform to their encodings and mechanisms of expression.
DFDL suggests an entirely different scheme. The approach is descriptive: one chooses a data representation suited to an application's needs and then describes that format using DFDL, so that multiple programs can interchange the described data directly. DFDL descriptions can be provided by the creator of the format or developed as needed by third parties intending to use the format. That is, DFDL is not a format for data; it is a way of describing any data format. DFDL is intended for data commonly found in scientific and numeric computations, as well as for record-oriented representations found in commercial data processing.
DFDL can be used to describe legacy data files, to simplify transfer of data across domains without requiring global standard formats, or to allow third-party tools to easily access multiple formats. DFDL can also be a powerful tool for supporting backward compatibility as formats evolve.
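The interchange idea can be sketched as two programs that keep their own native formats and meet at a shared logical record. In DFDL each format would be a separate annotated schema; here, purely for illustration, each is a hand-written Python codec. The record fields, the comma-separated text layout and the 18-byte binary layout are all hypothetical.

```python
import struct

# Hypothetical logical record shared by both formats: (name, Python type).
FIELDS = [("station", str), ("year", int), ("rainfall_mm", float)]


# Format 1 (hypothetical): comma-separated text, e.g. "KEW,2012,612.5".
def parse_text(line):
    parts = line.strip().split(",")
    return {name: typ(raw) for (name, typ), raw in zip(FIELDS, parts)}


def serialize_text(infoset):
    return ",".join(str(infoset[name]) for name, _ in FIELDS)


# Format 2 (hypothetical): 8-byte ASCII station code padded with spaces,
# big-endian 16-bit year, big-endian IEEE double rainfall (18 bytes, no padding).
BIN = struct.Struct(">8shd")


def parse_binary(data):
    station, year, rain = BIN.unpack(data)
    return {"station": station.decode("ascii").rstrip(" "),
            "year": year, "rainfall_mm": rain}


def serialize_binary(infoset):
    station = infoset["station"].encode("ascii").ljust(8, b" ")
    return BIN.pack(station, infoset["year"], infoset["rainfall_mm"])


if __name__ == "__main__":
    # Convert text to binary and back through the shared logical record.
    record = parse_text("KEW,2012,612.5")
    native = serialize_binary(record)
    print(len(native), serialize_text(parse_binary(native)))  # 18 KEW,2012,612.5
```

Neither format had to change to make the exchange possible; only a description of each was needed, which is the point of the descriptive approach.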
DFDL is designed to provide flexibility and also permit implementations that achieve very high levels of performance. DFDL descriptions are separable and native applications do not need to use DFDL libraries to parse their data formats. DFDL parsers can also be highly efficient. The DFDL language is designed to permit implementations that use lazy evaluation of formats and to support seekable, random access to data. The following goals can be achieved by DFDL implementations:
- Density. Fewest bytes to represent information content (without resorting to compression). Fastest possible I/O.
- Optimized I/O. Applications can write data aligned to byte, word, or even page boundaries and use memory-mapped I/O to ensure access to data content with the smallest number of machine cycles for common use cases, without sacrificing general access.
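Lazy evaluation, random access and aligned layouts fit together: if a description fixes each field's size and alignment, field offsets can be computed once from the description, and a reader can seek straight to one field without decoding the others. The sketch below assumes a hypothetical three-field, big-endian layout with per-field alignment; `LAYOUT`, `offsets` and `LazyRecord` are illustrative names, not part of DFDL.

```python
import io
import struct


def align(offset, boundary):
    """Round offset up to the next multiple of boundary (e.g. 4, 8 or 4096)."""
    return (offset + boundary - 1) // boundary * boundary


# Hypothetical fixed layout: (name, struct code, alignment in bytes).
LAYOUT = [("flag", "B", 1), ("count", "i", 4), ("total", "d", 8)]


def offsets(layout):
    """Compute each field's byte offset from the description alone."""
    table, pos = {}, 0
    for name, code, boundary in layout:
        pos = align(pos, boundary)
        table[name] = (pos, code)
        pos += struct.calcsize(">" + code)
    return table, pos  # pos is now the total record size


class LazyRecord:
    """Decode a field only when it is requested, seeking directly to it."""

    def __init__(self, stream, layout=LAYOUT):
        self._stream = stream
        self._offsets, self.size = offsets(layout)
        self._cache = {}

    def __getitem__(self, name):
        if name not in self._cache:
            pos, code = self._offsets[name]
            fmt = ">" + code
            self._stream.seek(pos)
            (self._cache[name],) = struct.unpack(fmt, self._stream.read(struct.calcsize(fmt)))
        return self._cache[name]


if __name__ == "__main__":
    table, size = offsets(LAYOUT)  # flag at 0, count at 4, total at 8; size 16
    buf = bytearray(size)
    for name, value in (("flag", 1), ("count", -3), ("total", 2.5)):
        pos, code = table[name]
        struct.pack_into(">" + code, buf, pos, value)
    record = LazyRecord(io.BytesIO(bytes(buf)))
    print(record["total"])  # 2.5; flag and count are never decoded
```

For a file on disk, an `mmap.mmap` object also supports `seek` and `read`, so it could be passed in place of `io.BytesIO` to get memory-mapped access with the same reader.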
DFDL 1.0
DFDL 1.0 is the initial release of DFDL. The specification (PDF) may be found on the OGF documents page. Release 1.0 includes the following language features:
- Subset of XML Schema 1.0
- Rich text content including bi-di support
- Rich binary content including bit support
- Text and binary delimiters
- Ordered and unordered sequences
- Scoping rules to allow modular construction and re-use
- Validation
- Defaults for missing values
- Nil capability for out-of-band values
- Expression language including variables to model dynamic data
- Stratagems to resolve choices, optionality and other points of uncertainty
- One dimensional arrays
- Hidden elements and calculated values
- Very general parsing and serializing capability
- Direct access by offset
- Multi-dimensional arrays
- Multi-layered models
- Custom language extensions
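Several of the features above interact within a single text record: delimiters, nil values, defaults, an array whose length comes from an earlier field, and a hidden count that is recalculated on output. In a DFDL schema these would be expressed declaratively with properties such as `dfdl:separator`, `dfdl:nilValue`, `dfdl:occursCount` and `dfdl:outputValueCalc`. The Python below only imitates that behaviour by hand, for a hypothetical record layout `name|count|v1;v2;...|unit`; the `NIL` literal and the `mm` default are assumptions.

```python
# Hypothetical text record: "name|count|v1;v2;...|unit"
#  - '|' separates fields and ';' separates array items (text delimiters)
#  - the array length is taken from the earlier 'count' field
#  - the literal "NIL" marks an out-of-band (nil) array item
#  - an empty unit falls back to a default value
NIL_LITERAL = "NIL"
DEFAULT_UNIT = "mm"


def parse_record(text):
    name, count_text, items_text, unit = text.split("|")
    count = int(count_text)
    items = items_text.split(";") if items_text else []
    if len(items) != count:
        raise ValueError(f"expected {count} items, found {len(items)}")
    values = [None if item == NIL_LITERAL else float(item) for item in items]
    # 'count' is hidden: it is used while parsing but not kept in the infoset.
    return {"name": name, "values": values, "unit": unit or DEFAULT_UNIT}


def serialize_record(infoset):
    values = infoset["values"]
    items = ";".join(NIL_LITERAL if v is None else repr(v) for v in values)
    # The hidden count is a calculated value, recomputed from the array length.
    return "|".join([infoset["name"], str(len(values)), items, infoset["unit"]])


if __name__ == "__main__":
    rec = parse_record("gauge1|3|1.5;NIL;2.0|")
    print(rec)                    # {'name': 'gauge1', 'values': [1.5, None, 2.0], 'unit': 'mm'}
    print(serialize_record(rec))  # gauge1|3|1.5;NIL;2.0|mm
```

Note that the output is not byte-identical to the input: the defaulted unit is written out explicitly, which is one reasonable serializer choice among several.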
Example
Consider the following XML data:
The logical model for this data can be described by the following fragment of an XML schema document that simply provides a description of the name and type of each element:
Now, suppose we have the same data but represented in a non-XML format. A binary representation of the data could be visualized like this (shown as hexadecimal):
To describe this using DFDL, we take our original XML schema document that described the data model and we annotate the element declarations as follows:
These simple DFDL annotations express that the data are represented in a binary format in which integers are two's complement and floating-point numbers are IEEE, and that the byte order is big-endian.
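The three properties named above map directly onto Python's `struct` format codes: `>` selects big-endian with no padding, `i` is a two's-complement 32-bit integer, and `d`/`f` are IEEE 754 binary64/binary32. The sketch assumes a hypothetical record of two 32-bit integers, a double and a float, and prints the resulting bytes as hexadecimal in 2-byte groups. `RECORD`, `VALUES` and `hex_words` are illustrative names.

```python
import struct

# Hypothetical record: two 32-bit ints, one double, one float.
# ">" = big-endian, no padding; "i" = two's-complement int32;
# "d"/"f" = IEEE 754 binary64/binary32.
RECORD = struct.Struct(">iidf")
LITTLE_ENDIAN = struct.Struct("<iidf")  # same fields, opposite byte order

VALUES = (5, 7839372, 8.6e-200, -7.1e8)


def hex_words(data, width=2):
    """Render bytes as space-separated hex groups of `width` bytes."""
    hexed = data.hex()
    step = 2 * width
    return " ".join(hexed[i:i + step] for i in range(0, len(hexed), step))


if __name__ == "__main__":
    native = RECORD.pack(*VALUES)
    print(RECORD.size)        # 20
    print(hex_words(native))  # starts "0000 0005 0077 9e8c", ends "ce29 46f6"
    print(hex_words(LITTLE_ENDIAN.pack(*VALUES)))  # starts "0500 0000"
    print(hex_words(struct.pack(">i", -1)))        # ffff ffff (two's complement)
    assert RECORD.unpack(native) == VALUES          # round trip is exact here
```

The round trip is exact for these particular values because -7.1E8 happens to be exactly representable as an IEEE single-precision float; other values stored as `f` would lose precision.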
Implementations
Implementations of DFDL processors that can parse and serialize data using DFDL schemas are emerging:
- The IBM WebSphere Message Broker product now includes a DFDL 1.0 streaming parser, modeler and graphical mapper. A trial edition is available.
- An open-source DFDL processor known as Daffodil is also under active development, with an initial release in spring 2013.
For flexibility and ease of implementation, the DFDL language is divided into core features and optional features. A DFDL processor can choose to be a minimal conforming processor (all core features), an extended conforming processor (all core and some optional features) or a fully conforming processor (all core and optional features). Additionally, a DFDL processor can choose to provide a parser, a serializer, or both.
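The conformance levels follow a simple rule that can be written down directly. In this sketch the optional-feature names (`opt-A` and so on) are placeholders, not the specification's actual list, and the `Processor` type is hypothetical; it only restates the three levels and the parser/serializer choice described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    implements_all_core: bool
    optional_features: frozenset
    has_parser: bool
    has_serializer: bool


def conformance_level(proc, all_optional):
    """Classify a processor as minimal, extended or fully conforming."""
    if not proc.implements_all_core:
        return "not conforming"
    supported = proc.optional_features & all_optional
    if not supported:
        return "minimal"   # all core features, no optional ones
    if supported == all_optional:
        return "full"      # all core and all optional features
    return "extended"      # all core and some optional features


def roles(proc):
    """Describe which directions (parse, serialize) the processor provides."""
    provided = [name for name, present in (("parser", proc.has_parser),
                                           ("serializer", proc.has_serializer)) if present]
    return " and ".join(provided) or "none"


if __name__ == "__main__":
    ALL_OPTIONAL = frozenset({"opt-A", "opt-B", "opt-C"})  # placeholder names
    p = Processor(True, frozenset({"opt-A"}), True, False)
    print(conformance_level(p, ALL_OPTIONAL), roles(p))  # extended parser
```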