    1 /** \page h5mmain H5M File Format API
    2  *
    3  *\section Intro   Introduction
    4  *
    5  * MOAB's native file format is based on the HDF5 file format.  
    6  * The most common file extension used for such files is .h5m.
    7  * A .h5m file can be identified by the top-level \c tstt group
    8  * in the HDF5 file.
    9  *
   10  * The API implemented by this library is a wrapper on top of the
   11  * underlying HDF5 library.  It provides the following features:
   12  * - Enforces and hides MOAB's expected file layout 
   13  * - Provides a slightly higher-level API
   14  * - Provides some backwards compatibility for file layout changes
   15  *
   16  *
   17  *\section Overview   H5M File Layout
   18  *
   19  * The H5M file format relies on the use of a unique entity ID space for
   20  * all vertices, elements, and entity sets stored in the file.  This
   21  * ID space is defined by the application.  IDs must be unique over all
   22  * entity types (a vertex and an entity set may not have the same ID.)
   23  * The IDs must be positive (non-zero) integer values.  
   24  * There are no other requirements imposed by the format on the ID space.
   25  *
   26  * Elements, with the exception of polyhedra, are defined by a list of 
   27  * vertex IDs.  Polyhedra are defined by a list of face IDs.  Entity sets 
   28  * have a list of contained entity IDs, and lists of parent and child 
   29  * entity set IDs.  The set contents may include any valid entity ID,
   30  * including other sets.  The parent and child lists are expected to
   31  * contain only entity IDs corresponding to other entity sets.  A zero
   32  * entity ID may be used in some contexts (tag data with the mhdf_ENTITY_ID
   33  * property) to indicate a 'null' value.
   34  *
   35  * Element types are defined by the combination of a topology identifier (e.g. 
   36  * hexahedral topology) and the number of nodes in the element.  
   37  *
   38  *
   39  *\section Root   The tstt Group
   40  *
   41  * All file data is stored in the \c tstt group in the HDF5 root group.
   42  * The \c tstt group may have an optional scalar integer attribute 
   43  * named \c max_id .  This attribute, if present, should contain the
   44  * value of the largest entity ID used internally to the file.  It can
   45  * be used to verify that the code reading the file is using an integer
   46  * type of sufficient size to accommodate the entity IDs.
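 *
 * The check described above can be sketched in a few lines of C.  This is
 * only an illustration: the function name \c id_type_is_wide_enough and its
 * parameters are hypothetical and not part of this library, and the value of
 * \c max_id is assumed to have been read from the attribute already.
 *
```c
#include <limits.h>
#include <stdint.h>

// Returns 1 if an unsigned integer type that is id_bytes bytes wide can
// represent max_id, and 0 otherwise.  max_id is assumed to be the value
// read from the optional max_id attribute of the tstt group.
int id_type_is_wide_enough(uint64_t max_id, unsigned id_bytes)
{
    if (id_bytes >= sizeof(uint64_t))
        return 1;
    // Largest value representable in id_bytes bytes.
    uint64_t limit = (UINT64_C(1) << (id_bytes * CHAR_BIT)) - 1;
    return max_id <= limit;
}
```
 *
 * A reader that stores IDs in a 4-byte type could call this with
 * \c id_bytes equal to 4 and refuse to load the file when it returns 0.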
   47  *
   48  * The \c tstt group contains four sub-groups, a datatype object, and a 
   49  * dataset object.  The four sub-groups are: \c nodes, \c elements,
   50  * \c sets, and \c tags.  The dataset is named \c history .
   51  *
   52  * The \c elemtypes datatype is an enumeration of the element topologies
   53  * used in the file.  The element topologies understood by MOAB are:
   54  * - \c Edge
   55  * - \c Tri
   56  * - \c Quad
   57  * - \c Polygon
   58  * - \c Tet
   59  * - \c Pyramid
   60  * - \c Prism
   61  * - \c Knife
   62  * - \c Hex
   63  * - \c Polyhedron
   64  * 
   65  *
   66  *\section History   The history DataSet
   67  *
   68  * The \c history DataSet is a list of variable-length strings with
   69  * application-defined meaning.  
   70  *
   71  *\section Nodes   The nodes Group
   72  *
   73  *
   74  * The \c nodes group contains a single DataSet and an optional
   75  * subgroup.  The \c tags subgroup is described in the 
   76  * \ref Dense "section on dense tag storage".  
   77  *
   78  * The \c coordinates
   79  * DataSet contains the coordinates of all vertices in the mesh.
   80  * The DataSet should contain floating point values and have dimensions
   81  * \f$ n \times d \f$, where \c n is the number of vertices and \c d
   82  * is the number of coordinate values for each vertex.
   83  *
   84  * The \c coordinates DataSet must have an integer attribute named \c start_id .
   85  * The vertices are then defined to have IDs beginning with this value
   86  * and increasing sequentially in the order that they are defined in the
   87  * \c coordinates table.
   88  *
   89  *
   90  *\section Elements   The elements Group
   91  *
   92  * The \c elements group contains an application-defined number of 
   93  * subgroups.  Each subgroup defines one or more mesh elements that
   94  * have the same topology and length of connectivity (number of nodes
   95  * for any topology other than \c Polyhedron.)  The names of the subgroups
   96  * are application defined.  MOAB uses a combination of the element
   97  * topology name and connectivity length (e.g. "Hex8").
   98  *
   99  * Each subgroup must have an attribute named \c element_type that 
  100  * contains one of the enumerated element topology values defined 
  101  * in the \c elemtypes datatype described in a \ref Root "previous section".
  102  *
  103  * Each subgroup contains a single DataSet named \c connectivity and an 
  104  * optional subgroup named \c tags.  The \c tags subgroup is described in the 
  105  * \ref Dense "section on dense tag storage". 
  106  *
  107  * The \c connectivity DataSet is an \f$ n \times m \f$ array of integer
  108  * values.  The DataSet contains one row for each of the \c n contained
  109  * elements, where the connectivity of each element contains \c m IDs.  For
  110  * all element types supported by MOAB, with the exception of polyhedra,
  111  * the element connectivity list is expected to contain only IDs 
  112  * corresponding to nodes.  
  113  *
  114  * Each element \c connectivity DataSet must have an integer attribute 
  115  * named \c start_id .  The elements defined in the connectivity table
  116  * are defined to have IDs beginning with this value and increasing
  117  * sequentially in the order that they are defined in the table.
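 *
 * Because IDs are assigned sequentially from \c start_id, for the
 * \c coordinates table as well as for each \c connectivity table, converting
 * between an entity ID and its row is simple arithmetic.  The following is a
 * minimal sketch; the function names are hypothetical and IDs are assumed to
 * have been read into values of type \c long.
 *
```c
// Returns the zero-based row of entity `id` in a table whose first row has
// the ID `start_id` and which holds `num_rows` rows, or -1 if the ID does
// not belong to this table.
long row_for_id(long id, long start_id, long num_rows)
{
    if (id < start_id || id >= start_id + num_rows)
        return -1;
    return id - start_id;
}

// Returns the ID of the entity stored in zero-based row `row`.
long id_for_row(long row, long start_id)
{
    return start_id + row;
}
```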
  118  *
  119  *
  120  *\section Sets   The sets Group
  121  *
  122  * The \c sets group contains the definitions of any entity sets stored
  123  * in the file.  It contains 1 to 4 DataSets and the optional \c tags 
  124  * subgroup.  The \c contents, \c parents, and \c children data sets
  125  * are one dimensional arrays containing the concatenation of the
  126  * corresponding lists for all of the sets represented in the file.
  127  *
  128  * The \c lists DataSet is a \f$ n \times 4 \f$ table, having one
  129  * row of four integer values for each set.  The first three values
  130  * for each set are the indices into the \c contents, \c children, 
  131  * and \c parents DataSets, respectively, at which the \em last value
  132  * for the set is stored.  The contents, child, and parent lists for
  133  * sets are stored in the corresponding datasets in the same order as
  134  * the sets are listed in the \c lists DataSet, such that the index of
  135  * the first value in one of those tables is one greater than the 
  136  * corresponding end index in the \em previous row of the table.  The
  137  * number of content entries, parents, or children for a given set can
  138  * be calculated as the difference between the corresponding end index
  139  * entry for the current set and the same entry in the previous row 
  140  * of the table.  If the first set in the \c lists DataSet had no parent
  141  * sets, then the corresponding index in the third column of the table
  142  * would be \c -1.  If it had one parent, the index would be \c 0.  If it
  143  * had two parents, the index would be \c 1, as the first parent would be
  144  * stored at position 0 of the \c parents DataSet and the second at position
  145  * 1.
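 *
 * The end-index arithmetic described above can be sketched as follows.  The
 * names \c list_span and \c set_list_span are hypothetical, and the table is
 * assumed to have been read into a row-major array of \c long values.
 *
```c
#include <stddef.h>

// Position of one list inside the contents, children, or parents data:
// index of the first entry and number of entries.
struct list_span {
    long first;
    long count;
};

// `table` is the n x 4 set table stored row-major.  `column` is 0 for
// contents, 1 for children, and 2 for parents.  The end index of the
// row before the first one is taken to be -1.
struct list_span set_list_span(const long *table, size_t row, int column)
{
    long prev_end = (row == 0) ? -1 : table[(row - 1) * 4 + column];
    long end = table[row * 4 + column];
    struct list_span span = { prev_end + 1, end - prev_end };
    return span;
}
```
 *
 * A set with an empty list repeats the end index of the previous row, so
 * the computed count is zero and the first index is not meaningful.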
  146  *
  147  * The fourth column of the \c lists DataSet is a series of bit flags
  148  * defining some properties of the sets.  The four bit values currently
  149  * defined are:
  150  *  - 0x1 owner
  151  *  - 0x2 unique
  152  *  - 0x4 ordered
  153  *  - 0x8 range compressed
  154  *
  155  * The fourth (most significant) bit indicates that the contents list
  156  * for the corresponding set is stored in the \c contents data set
  157  * in range-compressed form.  Rather than storing the IDs of the
  158  * contained entities individually, each ID \c i is followed by a count 
  159  * \c n indicating that the set contains the contiguous range of IDs
  160  * \f$ [i, i+n-1] \f$.
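 *
 * As an illustration of the range-compressed form, the sketch below expands
 * a list of (start ID, count) pairs into individual IDs.  The function name
 * \c expand_ranges is hypothetical, and the values are assumed to have been
 * read into an array of \c long.
 *
```c
#include <stddef.h>

// Expands `num_values` range-compressed values, interpreted as pairs of
// (start ID, count), into `out`.  At most `out_cap` IDs are written.
// Returns the total number of IDs the ranges describe, which may exceed
// out_cap, or (size_t)-1 if num_values is odd and so cannot be pairs.
size_t expand_ranges(const long *values, size_t num_values,
                     long *out, size_t out_cap)
{
    if (num_values % 2 != 0)
        return (size_t)-1;
    size_t total = 0;
    for (size_t k = 0; k < num_values; k += 2) {
        long start = values[k];
        long count = values[k + 1];
        for (long j = 0; j < count; ++j) {
            if (total < out_cap)
                out[total] = start + j;
            ++total;
        }
    }
    return total;
}
```
 *
 * For example, the stored values 10, 3, 20, 2 describe the five IDs
 * 10, 11, 12, 20, and 21.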
  161  *
  162  * The three least significant bits specify intended properties of the
  163  * set and are unrelated to how the set data is stored in the file.  These
  164  * properties, described briefly from least significant bit to most 
  165  * significant are: contained entities should track set membership;
  166  * the set should contain each entity only once (strict set); and
  167  * that the order of the entries in the set should be preserved.  
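 *
 * A reader might separate the storage bit from the three property bits as
 * sketched below.  The enumerator and function names are placeholders chosen
 * for this example; only the bit values come from the list above.
 *
```c
// Bit values from the flag list above; the names are placeholders.
enum {
    SET_FLAG_OWNER   = 0x1,
    SET_FLAG_UNIQUE  = 0x2,
    SET_FLAG_ORDERED = 0x4,
    SET_FLAG_RANGE   = 0x8
};

// Returns 1 if the contents list of the set is range compressed.
int set_is_range_compressed(long flags)
{
    return (flags & SET_FLAG_RANGE) != 0;
}

// Returns only the property bits, dropping the storage-related range bit.
long set_properties(long flags)
{
    return flags & (SET_FLAG_OWNER | SET_FLAG_UNIQUE | SET_FLAG_ORDERED);
}
```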
  168  *
  169  * Similar to the \c nodes/coordinates and \c elements/.../connectivity
  170  * DataSets, the \c lists DataSet must have an integer attribute 
  171  * named \c start_id .  IDs are assigned to sets in the order that
  172  * they occur in the \c lists table, beginning with the attribute value.
  173  *
  174  * The \c sets group may contain a subgroup named \c tags.  The \c tags 
  175  * subgroup is described in the \ref Dense "section on dense tag storage". 
  176  *
  177  * 
  178  * \section Tags   The tags Group
  179  *
  180  * The \c tags group contains a sub-group for each tag defined
  181  * in the file.  These sub-groups contain the definition of the
  182  * tag and may contain some or all of the tag values associated with
  183  * entities in the file.  However, it should be noted that tag values
  184  * may also be stored in the "dense" format as described in the 
  185  * \ref Dense "section on dense tag storage".
  186  *
  187  * Each sub-group of the \c tags group contains the definition for
  188  * a single tag.  The name of each sub-group is the name of the 
  189  * corresponding tag.  Non-printable characters, characters
  190  * prohibited in group names in the HDF5 file format, and the
  191  * backslash ('\') character are encoded
  192  * in the name string by a backslash ('\') character followed by
  193  * the ASCII value of the character expressed as a pair of hexadecimal
  194  * digits.  Thus the backslash character would be represented as \c \5C .
  195  * Each tag group should also contain a comment which contains the
  196  * unencoded tag name.
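 *
 * The encoding rule can be sketched as follows.  This is not the library's
 * own routine: the function name \c encode_tag_name is hypothetical, and the
 * sketch assumes that the forward slash is the only printable character,
 * besides the backslash, that must be escaped in a group name.
 *
```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

// Encodes `name` into `out`, which has room for `cap` bytes including the
// terminating zero.  Returns the encoded length, or -1 if `out` is too small.
// Assumption: '/' is the only printable character other than the backslash
// that needs escaping.
int encode_tag_name(const char *name, char *out, size_t cap)
{
    size_t len = 0;
    for (const unsigned char *p = (const unsigned char *)name; *p; ++p) {
        int escape = !isprint(*p) || *p == '/' || *p == '\\';
        size_t need = escape ? 3 : 1;
        if (len + need + 1 > cap)
            return -1;
        if (escape) {
            // Backslash followed by two hexadecimal digits.
            snprintf(out + len, 4, "\\%02X", (unsigned)*p);
        } else {
            out[len] = (char)*p;
        }
        len += need;
    }
    if (len + 1 > cap)
        return -1;
    out[len] = '\0';
    return (int)len;
}
```
 *
 * With this sketch, a tag name consisting of a, a backslash, and b is
 * encoded as the five characters a, backslash, 5, C, b.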
  197  *
  198  * The tag sub-group may have any or all of the following four attributes:
  199  * \c default, \c global, \c is_handle, and \c variable_length.  
  200  * The \c default attribute, if present,
  201  * must contain a single tag value that is to be considered the 'default'
  202  * value of the tag.  The \c global attribute, if present, must contain a
  203  * single tag value that is the value of the tag as set on the mesh instance
  204  * (MOAB terminology) or root set (ITAPS terminology.)  The presence of the
  205  * \c is_handle attribute (the value, if any, is meaningless) indicates
  206  * that the tag values are to be considered entity IDs.  After reading the
  207  * file, the reader should map any such tag values to whatever mechanism
  208  * it uses to reference the corresponding entities read from the file.
  209  * The presence of the \c variable_length attribute indicates that each 
  210  * tag value is a variable-length array.  The reader should rely on the
  211  * presence of this attribute rather than the presence of the \c var_indices
  212  * DataSet discussed below because the file may contain the definition of
  213  * a variable length tag without containing any values for that tag.  In such
  214  * a case, the \c var_indices DataSet will not be present.
  215  *
  216  * Each tag sub-group will contain a committed type object named \c type .
  217  * This type must be the type instance used by the \c global and \c default
  218  * attributes discussed above and any tag value data sets.  For fixed-length
  219  * tag data, the tag types understood by MOAB are:
  220  *  - opaque data
  221  *  - a single floating point value
  222  *  - a single integer value
  223  *  - a bit field
  224  *  - an array of floating point values
  225  *  - an array of integer values
  226  * Any other data types will be treated as opaque data.
  227  * For variable-length tag data, MOAB expects the \c type object to be
  228  * one of:
  229  *  - opaque data
  230  *  - a single floating point value
  231  *  - a single integer value
  232  *
  233  * For fixed-length tags, the tag sub-group may contain 'sparse' formatted
  234  * tag data, which consists of two data sets: \c id_list and \c values.
  235  * Both data sets must be 1-dimensional arrays of the same length.  The 
  236  * \c id_list data set contains a list of entity IDs and the \c values 
  237  * data set contains a list of corresponding tag values.  The data stored in
  238  * the \c values table must be of type \c type.  Fixed-length tag values
  239  * may also be stored in the "dense" format as described in the 
  240  * \ref Dense "section on dense tag storage".  A mixture of both sparse-
  241  * and dense-formatted tag values may be present for a single tag.
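 *
 * Looking up a sparse tag value amounts to finding the entity ID in
 * \c id_list and using the same position in \c values.  The sketch below
 * uses a linear search because no ordering of \c id_list is stated here;
 * the function name is hypothetical.
 *
```c
#include <stddef.h>

// Returns the position of `id` in `id_list` (which has `n` entries), or -1
// if the entity has no sparse value.  The tag value, if any, is stored at
// the same position in the values data set.
long sparse_tag_position(const long *id_list, size_t n, long id)
{
    for (size_t k = 0; k < n; ++k) {
        if (id_list[k] == id)
            return (long)k;
    }
    return -1;
}
```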
  242  *
  243  * For variable-length tags the tag values, if any, are always stored
  244  * in the tag sub-group of the \c tags group and are represented by
  245  * three one-dimensional data sets: \c id_list, \c var_indices, and \c values.  
  246  * Similar to the fixed-length sparse-formatted tag data, the \c id_list
  247  * contains the IDs of the entities for which tag values are defined.
  248  * The \c values dataset contains the concatenation of the tag values
  249  * for each of the entities referenced by ID in the \c id_list table, 
  250  * in the order that the entities are referenced in the \c id_list table.
  251  * The \c var_indices table contains an index into the \c values data set
  252  * for each entity in \c id_list.  The index indicates the position of
  253  * the \em last tag value for the entity in \c values.  The index of
  254  * the first value is one greater than the 
  255  * corresponding end index for the \em previous entry in \c var_indices.  The
  256  * number of tag values for a given entity can
  257  * be calculated as the difference between the corresponding end index
  258  * entry for the current entity and the previous value in the \c var_indices
  259  * dataset.  
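 *
 * The same arithmetic can be sketched for variable-length tag values.  The
 * names below are hypothetical, and the sketch assumes, by analogy with the
 * set table, that the end index before the first entry is taken to be -1.
 *
```c
#include <stddef.h>

// Position of one entity's tag data inside the values data set.
struct value_span {
    long first;
    long count;
};

// `var_indices` holds one end index per entry of id_list.  `entry` is the
// zero-based position of the entity in id_list.
// Assumption: the end index preceding the first entry is -1.
struct value_span var_tag_span(const long *var_indices, size_t entry)
{
    long prev_end = (entry == 0) ? -1 : var_indices[entry - 1];
    struct value_span span = { prev_end + 1, var_indices[entry] - prev_end };
    return span;
}
```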
  260  *
  261  *
  262  * \section Dense   The tags Sub-Groups
  263  *
  264  * Data for fixed-length tags may also be stored in the \c tags sub-group
  265  * of the \c nodes, \c sets, and subgroups of the \c elements group.  
  266  * Values for a given tag are stored in a dataset within the \c tags sub-group
  267  * that has the following properties:
  268  *  - The name must be the same as that of the tag definition in the main 
  269  *      \c tags group
  270  *  - The type of the data set must be the committed type object stored
  271  *      as \c /tstt/tags/<tagname>/type .
  272  *  - The data set must have the same length as the data set in the
  273  *    parent group with the \c start_id attribute.  
  274  *
  275  * If dense-formatted data is specified for any entity in the group, then
  276  * it must be specified for every entity in the group.  The table is 
  277  * expected to contain one value for each entity in the corresponding 
  278  * primary definition table (\c /tstt/nodes/coordinates , 
  279  * \c /tstt/elements/<name>/connectivity , or \c /tstt/sets/list), in the
  280  * same order as the entities in that primary definition table.
  281  *
  282  *
  283  *\section mhdf_set mhdf Meshset data
  284  *
  285  * Meshset data is divided into three groups of data: the set-list/meta-information table,
  286  * the set contents table, and the set children table.  Each is written and read independently.
  287  *
  288  * The set list table contains one row for each set.  Each row contains four values:
  289  * {content list end index, child list end index, parent list end index, and flags}.  The flags 
  290  * value is a collection of bits with
  291  * values defined in \ref mhdf_set_flag .  All the flags except \ref mhdf_SET_RANGE_BIT are
  292  * saved properties of the mesh data and are not relevant to the actual file in any way.  The
  293  * \ref mhdf_SET_RANGE_BIT flag is a toggle for how the meshset contents (not children) are saved.
  294  * It is an internal property of the file format and should not be passed on to the mesh database.
  295  * The content list end index and child list end index are the indices of the last entry for the
  296  * set in the contents and children tables respectively.  In the case where a set has either no
  297  * children or no contents, the last index should be the same as the last index of the previous
  298  * set in the table, or -1 for the first set in the table.  Thus the first index is always one
  299  * greater than the last index of the previous set.  If the first index, calculated as one greater
  300  * than the last index of the previous set, is greater than the last index of the current set, then
  301  * there are no values in the corresponding contents or children table for that set.
  302  *
  303  * The set contents table is a vector of integer global IDs that is the concatenation of the contents
  304  * data for all of the mesh sets.  The values are stored corresponding to the order of the sets
  305  * in the set list table.  Depending on the value of \ref mhdf_SET_RANGE_BIT in the flags field of
  306  * the set list table, the contents for a specific set may be stored in one of two formats.  If the
  307  * flag is set, the contents list is a list of pairs where each pair is a starting global Id and a 
  308  * count.  For each pair, the set contains the range of global Ids beginning at the start value. 
  309  * If the \ref mhdf_SET_RANGE_BIT flag is not set, the meshset contents are a simple list of global Ids.
  310  *
  311  * The meshset child table is a vector of integer global IDs.  It is a concatenation of the child
  312  * lists for all the mesh sets, in the order the sets occur in the meshset list table.  The values
  313  * are always simple lists.  The child table may never contain ranges of IDs.
  314  *
  315  *
  316  *\section mhdf_tag mhdf Tag data
  317  *
  318  * The data for each tag can be stored in two places/formats:  sparse and/or
  319  * dense.  The data may be stored in both, but there should not be redundant 
  320  * values for the same entity.  
  321  *
  322  * Dense tag data is stored as multiple tables of tag values, one for each
  323  * element group.  (Note:  special \ref mhdf_ElemHandle values are available
  324  * for accessing dense tag data on nodes or meshsets via the \ref mhdf_node_type_handle
  325  * and \ref mhdf_set_type_handle functions.)  Each dense tag table should contain
  326  * the same number of entries as the element connectivity table.  The tag values 
  327  * are associated with the corresponding element in the connectivity table.
  328  *
  329  * Sparse tag data is stored as a global table pair for each tag type.  The first
  330  * of the pair of tables is a list of Global IDs.  The second is the corresponding
  331  * tag value for each entity in the ID list.
  332  */
  333