/** \page h5mmain H5M File Format API
 *
 *\section Intro Introduction
 *
 * MOAB's native file format is based on the HDF5 file format.
 * The most common file extension used for such files is .h5m.
 * A .h5m file can be identified by the top-level \c tstt group
 * in the HDF5 file.
 *
 * The API implemented by this library is a wrapper on top of the
 * underlying HDF5 library. It provides the following features:
 * - Enforces and hides MOAB's expected file layout
 * - Provides a slightly higher-level API
 * - Provides some backwards compatibility for file layout changes
 *
 *
 *\section Overview H5M File Layout
 *
 * The H5M file format relies on the use of a unique entity ID space for
 * all vertices, elements, and entity sets stored in the file. This
 * ID space is defined by the application. IDs must be unique over all
 * entity types (e.g., a vertex and an entity set may not have the same ID).
 * The IDs must be positive (non-zero) integer values.
 * There are no other requirements imposed by the format on the ID space.
 *
 * Elements, with the exception of polyhedra, are defined by a list of
 * vertex IDs. Polyhedra are defined by a list of face IDs. Entity sets
 * have a list of contained entity IDs, and lists of parent and child
 * entity set IDs. The set contents may include any valid entity ID,
 * including other sets. The parent and child lists are expected to
 * contain only entity IDs corresponding to other entity sets. A zero
 * entity ID may be used in some contexts (tag data with the mhdf_ENTITY_ID
 * property) to indicate a 'null' value.
 *
 * Element types are defined by the combination of a topology identifier (e.g.
 * hexahedral topology) and the number of nodes in the element.
 *
 *
 *\section Root The tstt Group
 *
 * All file data is stored in the \c tstt group in the HDF5 root group.
 * The \c tstt group may have an optional scalar integer attribute
 * named \c max_id . This attribute, if present, should contain the
 * value of the largest entity ID used in the file. It can
 * be used to verify that the code reading the file is using an integer
 * type of sufficient size to accommodate the entity IDs.
 *
 * The \c tstt group contains four sub-groups, a datatype object, and a
 * dataset object. The four sub-groups are: \c nodes, \c elements,
 * \c sets, and \c tags. The datatype is named \c elemtypes and the
 * dataset is named \c history .
 *
 * The \c elemtypes datatype is an enumeration of the element topologies
 * used in the file. The element topologies understood by MOAB are:
 * - \c Edge
 * - \c Tri
 * - \c Quad
 * - \c Polygon
 * - \c Tet
 * - \c Pyramid
 * - \c Prism
 * - \c Knife
 * - \c Hex
 * - \c Polyhedron
 *
 *
 *\section History The history DataSet
 *
 * The \c history DataSet is a list of variable-length strings with
 * application-defined meaning.
 *
 *\section Nodes The nodes Group
 *
 *
 * The \c nodes group contains a single DataSet and an optional
 * subgroup. The \c tags subgroup is described in the
 * \ref Dense "section on dense tag storage".
 *
 * The \c coordinates DataSet contains the coordinates of all vertices
 * in the mesh. The DataSet should contain floating point values and have
 * dimensions \f$ n \times d \f$, where \c n is the number of vertices and
 * \c d is the number of coordinate values for each vertex.
 *
 * The \c coordinates DataSet must have an integer attribute named \c start_id .
 * The vertices are then defined to have IDs beginning with this value
 * and increasing sequentially in the order that they are defined in the
 * \c coordinates table.
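 *
 * Because vertex IDs are implicit, converting between a row of the
 * \c coordinates table and a vertex ID is plain arithmetic. The sketch
 * below assumes the \c start_id attribute and the number of rows have
 * already been read from the file; both function names are hypothetical
 * and not part of the mhdf API.
 *
```c
#include <stdint.h>

// Hypothetical helper: ID of the vertex stored in 0-based row `row`
// of a coordinates table whose start_id attribute is `start_id`.
int64_t vertex_id_from_row(int64_t start_id, int64_t row)
{
    return start_id + row;
}

// Hypothetical helper: 0-based row of vertex `id`, or -1 if `id` is not
// defined by a coordinates table of `count` rows beginning at `start_id`.
int64_t row_from_vertex_id(int64_t start_id, int64_t count, int64_t id)
{
    if (id < start_id || id - start_id >= count)
        return -1;
    return id - start_id;
}
```
 *
 * For example, with \c start_id equal to 1, the tenth row (row index 9)
 * holds vertex 10.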
 *
 *
 *\section Elements The elements Group
 *
 * The \c elements group contains an application-defined number of
 * subgroups. Each subgroup defines one or more mesh elements that
 * have the same topology and length of connectivity (number of nodes
 * for any topology other than \c Polyhedron). The names of the subgroups
 * are application defined; MOAB combines the element topology name
 * and the connectivity length (e.g. "Hex8").
 *
 * Each subgroup must have an attribute named \c element_type that
 * contains one of the enumerated element topology values defined
 * in the \c elemtypes datatype described in a \ref Root "previous section".
 *
 * Each subgroup contains a single DataSet named \c connectivity and an
 * optional subgroup named \c tags. The \c tags subgroup is described in the
 * \ref Dense "section on dense tag storage".
 *
 * The \c connectivity DataSet is an \f$ n \times m \f$ array of integer
 * values. The DataSet contains one row for each of the \c n contained
 * elements, where the connectivity of each element contains \c m IDs. For
 * all element types supported by MOAB, with the exception of polyhedra,
 * the element connectivity list is expected to contain only IDs
 * corresponding to nodes; polyhedron connectivity lists contain face IDs.
 *
 * Each element \c connectivity DataSet must have an integer attribute
 * named \c start_id . The elements defined in the connectivity table
 * are defined to have IDs beginning with this value and increasing
 * sequentially in the order that they are defined in the table.
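 *
 * Since each element group assigns a contiguous block of IDs, a reader
 * can find the group and row for an arbitrary element ID by comparing it
 * against each group's \c start_id and row count. In this sketch the
 * \c ElemGroup record and \c find_element function are hypothetical
 * stand-ins for whatever a reader caches from each \c connectivity
 * DataSet.
 *
```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical cache of one elements/<name>/connectivity DataSet.
typedef struct {
    const char *name;   // subgroup name, e.g. "Hex8"
    int64_t start_id;   // value of the start_id attribute
    int64_t count;      // number of rows (elements) in the table
} ElemGroup;

// Hypothetical lookup: finds the group that defines element `id`.
// On success stores the 0-based row in row_out[0] and returns the
// group's index; returns -1 if no group defines `id`.
long find_element(const ElemGroup *groups, size_t n, int64_t id,
                  int64_t row_out[1])
{
    for (size_t i = 0; i < n; ++i) {
        int64_t offset = id - groups[i].start_id;
        if (offset >= 0 && offset < groups[i].count) {
            row_out[0] = offset;
            return (long)i;
        }
    }
    return -1;
}
```
 *
 * The same row index also locates the element's value in any dense tag
 * table of that group (see the \ref Dense "section on dense tag storage").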
 *
 *
 *\section Sets The sets Group
 *
 * The \c sets group contains the definitions of any entity sets stored
 * in the file. It contains 1 to 4 DataSets and the optional \c tags
 * subgroup. The \c contents, \c parents, and \c children data sets
 * are one-dimensional arrays containing the concatenation of the
 * corresponding lists for all of the sets represented in the file.
 *
 * The \c lists DataSet is an \f$ n \times 4 \f$ table, having one
 * row of four integer values for each set. The first three values
 * for each set are the indices into the \c contents, \c children,
 * and \c parents DataSets, respectively, at which the \em last value
 * for the set is stored. The contents, child, and parent lists for
 * sets are stored in the corresponding datasets in the same order as
 * the sets are listed in the \c lists DataSet, such that the index of
 * the first value in one of those tables is one greater than the
 * corresponding end index in the \em previous row of the table. The
 * number of content entries, parents, or children for a given set can
 * be calculated as the difference between the corresponding end index
 * entry for the current set and the same entry in the previous row
 * of the table (for the first set, the previous end index is taken
 * to be \c -1). If the first set in the \c lists DataSet had no parent
 * sets, then the corresponding index in the third column of the table
 * would be \c -1. If it had one parent, the index would be \c 0. If it
 * had two parents, the index would be \c 1, as the first parent would be
 * stored at position 0 of the \c parents DataSet and the second at position
 * 1.
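 *
 * The end-index convention translates into a start position and a length
 * as sketched below. The sketch assumes one column of the \c lists table
 * (contents, children, or parents) has been read into an array of 64-bit
 * integers; \c list_range is a hypothetical helper name.
 *
```c
#include <stdint.h>

// Hypothetical helper. `ends` is one column of end indices (one entry
// per set) and `k` a 0-based set position. Writes the index of the set's
// first value to out[0] and its number of values to out[1]. The end
// index before the first set is taken to be -1, so set 0 starts at 0.
void list_range(const int64_t *ends, int64_t k, int64_t out[2])
{
    int64_t prev_end = (k == 0) ? -1 : ends[k - 1];
    out[0] = prev_end + 1;          // first index for set k
    out[1] = ends[k] - prev_end;    // number of values (may be 0)
}
```
 *
 * With end indices {1, 1, 2}, for example, the first set owns entries 0
 * and 1, the second owns none, and the third owns entry 2.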
 *
 * The fourth column of the \c lists DataSet is a series of bit flags
 * defining some properties of the sets. The four bit values currently
 * defined are:
 * - 0x1 owner
 * - 0x2 unique
 * - 0x4 ordered
 * - 0x8 range compressed
 *
 * The fourth (most significant) bit indicates that the contents list
 * for the corresponding set is stored in the \c contents DataSet using
 * range compression. Rather than storing the IDs of the
 * contained entities individually, each ID \c i is followed by a count
 * \c n indicating that the set contains the contiguous range of IDs
 * \f$ [i, i+n-1] \f$.
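 *
 * A reader might test the flag bits and expand a range-compressed contents
 * list roughly as follows. The \c SET_FLAG_ macro names and the
 * \c expand_set_contents function are hypothetical (only the bit values
 * come from the list above), and the slice of \c contents belonging to the
 * set is assumed to have been located with the end indices of \c lists.
 *
```c
#include <stddef.h>
#include <stdint.h>

// Flag bits from the fourth column of the lists table. The macro names
// are hypothetical; the values are the documented ones.
#define SET_FLAG_OWNER  0x1
#define SET_FLAG_UNIQUE 0x2
#define SET_FLAG_ORDER  0x4
#define SET_FLAG_RANGE  0x8

// Hypothetical helper: expands one set's contents into individual IDs.
// `data` holds the set's `len` values from the contents DataSet. With
// SET_FLAG_RANGE set they are (start, count) pairs; otherwise plain IDs.
// Returns the number of IDs written to `out`, or -1 if the list is
// malformed or `cap` is too small.
long expand_set_contents(const int64_t *data, size_t len, unsigned flags,
                         int64_t *out, size_t cap)
{
    size_t n = 0;
    if (!(flags & SET_FLAG_RANGE)) {
        if (len > cap)
            return -1;
        for (size_t i = 0; i < len; ++i)
            out[i] = data[i];
        return (long)len;
    }
    if (len % 2 != 0)
        return -1; // ranges must come in (start, count) pairs
    for (size_t i = 0; i < len; i += 2) {
        int64_t start = data[i], count = data[i + 1];
        if (count < 0 || (uint64_t)count > cap - n)
            return -1;
        for (int64_t j = 0; j < count; ++j)
            out[n++] = start + j; // covers [start, start + count - 1]
    }
    return (long)n;
}
```
 *
 * A production reader would typically sum the counts first to size the
 * output buffer rather than rely on a fixed capacity.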
 *
 * The three least significant bits specify intended properties of the
 * set and are unrelated to how the set data is stored in the file. These
 * properties, described briefly from least significant bit to most
 * significant, are: contained entities should track set membership;
 * the set should contain each entity only once (strict set); and
 * the order of the entries in the set should be preserved.
 *
 * Similar to the \c nodes/coordinates and \c elements/.../connectivity
 * DataSets, the \c lists DataSet must have an integer attribute
 * named \c start_id . IDs are assigned to sets in the order that
 * they occur in the \c lists table, beginning with the attribute value.
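 *
 * Because the vertices, each element group, and the sets each receive a
 * contiguous block of IDs starting at their \c start_id, the uniqueness
 * requirement from the \ref Overview "file layout overview" amounts to
 * these blocks not overlapping. A minimal sketch of that check follows;
 * the \c IdBlock record and \c id_blocks_valid function are hypothetical,
 * and each table's \c start_id and row count are assumed already read.
 *
```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical summary of one ID-assigning table: its start_id
// attribute and the number of rows (entities) it defines.
typedef struct {
    int64_t start_id;
    int64_t count;
} IdBlock;

// Hypothetical check: returns 1 if every ID is positive and no two
// blocks share an ID, 0 otherwise.
int id_blocks_valid(const IdBlock *blocks, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (blocks[i].count < 0)
            return 0;
        if (blocks[i].count > 0 && blocks[i].start_id < 1)
            return 0; // IDs must be positive (non-zero)
        for (size_t j = i + 1; j < n; ++j) {
            if (blocks[i].count == 0 || blocks[j].count == 0)
                continue; // empty tables assign no IDs
            int64_t a_end = blocks[i].start_id + blocks[i].count; // exclusive
            int64_t b_end = blocks[j].start_id + blocks[j].count;
            // Half-open intervals [start, end) overlap iff each one
            // starts before the other ends.
            if (blocks[i].start_id < b_end && blocks[j].start_id < a_end)
                return 0;
        }
    }
    return 1;
}
```
 *
 * A pairwise comparison is used because a file usually has only a handful
 * of such tables; sorting the blocks by \c start_id would scale better.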
 *
 * The \c sets group may contain a subgroup named \c tags. The \c tags
 * subgroup is described in the \ref Dense "section on dense tag storage".
 *
 *
 * \section Tags The tags Group
 *
 * The \c tags group contains a sub-group for each tag defined
 * in the file. These sub-groups contain the definition of the
 * tag and may contain some or all of the tag values associated with
 * entities in the file. Tag values may also be stored in the "dense"
 * format as described in the \ref Dense "section on dense tag storage".
 *
 * Each sub-group of the \c tags group contains the definition for
 * a single tag. The name of each sub-group is the name of the
 * corresponding tag. Non-printable characters, characters
 * prohibited in group names in the HDF5 file format, and the
 * backslash ('\') character are encoded
 * in the name string by a backslash ('\') character followed by
 * the ASCII value of the character expressed as a pair of hexadecimal
 * digits. Thus the backslash character would be represented as \c \5C .
 * Each tag group should also contain a comment which contains the
 * unencoded tag name.
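 *
 * The name-escaping rule can be sketched as follows. This section does
 * not enumerate the characters HDF5 prohibits in group names, so the
 * sketch assumes that '/' (the HDF5 path separator) is the only printable
 * character that needs escaping; \c encode_tag_name is a hypothetical
 * function, not part of the mhdf API.
 *
```c
#include <ctype.h>
#include <stddef.h>

// Hypothetical encoder: writes the group name for tag `name` into `out`
// (capacity `cap`), escaping non-printable characters, '/' and the
// backslash as a backslash followed by two uppercase hex digits.
// ASSUMPTION: '/' is the only printable character HDF5 rejects here.
// Returns the encoded length, or -1 if `out` is too small.
int encode_tag_name(const char *name, char *out, size_t cap)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)name; p[0]; ++p) {
        unsigned c = p[0];
        if (!isprint(c) || c == '/' || c == '\\') {
            if (n + 3 >= cap)
                return -1; // need room for 3 chars plus terminator
            out[n++] = '\\';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0xF];
        } else {
            if (n + 1 >= cap)
                return -1;
            out[n++] = (char)c;
        }
    }
    if (n >= cap)
        return -1;
    out[n] = '\0';
    return (int)n;
}
```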
 *
 * The tag sub-group may have any or all of the following four attributes:
 * \c default, \c global, \c is_handle, and \c variable_length.
 * The \c default attribute, if present,
 * must contain a single tag value that is to be considered the 'default'
 * value of the tag. The \c global attribute, if present, must contain a
 * single tag value that is the value of the tag as set on the mesh instance
 * (MOAB terminology) or root set (ITAPS terminology). The presence of the
 * \c is_handle attribute (the value, if any, is meaningless) indicates
 * that the tag values are to be considered entity IDs. After reading the
 * file, the reader should map any such tag values to whatever mechanism
 * it uses to reference the corresponding entities read from the file.
 * The presence of the \c variable_length attribute indicates that each
 * tag value is a variable-length array. The reader should rely on the
 * presence of this attribute rather than the presence of the \c var_indices
 * DataSet discussed below because the file may contain the definition of
 * a variable-length tag without containing any values for that tag. In such
 * a case, the \c var_indices DataSet will not be present.
 *
 * Each tag sub-group will contain a committed type object named \c type .
 * This type must be the type used by the \c global and \c default
 * attributes discussed above and by any tag value data sets. For fixed-length
 * tag data, the tag types understood by MOAB are:
 * - opaque data
 * - a single floating point value
 * - a single integer value
 * - a bit field
 * - an array of floating point values
 * - an array of integer values
 *
 * Any other data types will be treated as opaque data.
 * For variable-length tag data, MOAB expects the \c type object to be
 * one of:
 * - opaque data
 * - a single floating point value
 * - a single integer value
 *
 * For fixed-length tags, the tag sub-group may contain 'sparse' formatted
 * tag data, which is composed of two data sets: \c id_list and \c values.
 * Both data sets must be 1-dimensional arrays of the same length. The
 * \c id_list data set contains a list of entity IDs and the \c values
 * data set contains a list of corresponding tag values. The data stored in
 * the \c values table must be of type \c type. Fixed-length tag values
 * may also be stored in the "dense" format as described in the
 * \ref Dense "section on dense tag storage". A mixture of both sparse-
 * and dense-formatted tag values may be present for a single tag.
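 *
 * Looking up a sparse-format value amounts to finding the entity's position
 * in \c id_list and reading the same position in \c values. The sketch
 * assumes a single-integer tag whose two DataSets have already been read
 * into memory; \c sparse_lookup is a hypothetical name, and a linear scan
 * is used because nothing above requires \c id_list to be sorted.
 *
```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical lookup in a sparse (id_list, values) pair of equal
// length `n`. Returns 1 and stores the value in value_out[0] if `id`
// is listed, 0 otherwise.
int sparse_lookup(const int64_t *id_list, const int32_t *values, size_t n,
                  int64_t id, int32_t value_out[1])
{
    for (size_t i = 0; i < n; ++i) {
        if (id_list[i] == id) {
            value_out[0] = values[i];
            return 1;
        }
    }
    return 0; // not sparse; the value may still be stored densely
}
```
 *
 * A miss here does not mean the entity has no value, since the same tag
 * may also have dense-formatted data.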
 *
 * For variable-length tags the tag values, if any, are always stored
 * in the tag sub-group of the \c tags group and are represented by
 * three one-dimensional data sets: \c id_list, \c var_indices, and \c values.
 * Similar to the fixed-length sparse-formatted tag data, the \c id_list
 * contains the IDs of the entities for which tag values are defined.
 * The \c values dataset contains the concatenation of the tag values
 * for each of the entities referenced by ID in the \c id_list table,
 * in the order that the entities are referenced in the \c id_list table.
 * The \c var_indices table contains an index into the \c values data set
 * for each entity in \c id_list. The index indicates the position of
 * the \em last tag value for the entity in \c values. The index of
 * the first value is one greater than the end index of the
 * \em previous entry in \c var_indices (or zero for the first entity).
 * The number of tag values for a given entity can
 * be calculated as the difference between the corresponding end index
 * entry for the current entity and the previous value in the \c var_indices
 * dataset.
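 *
 * The \c var_indices scheme is the same end-index convention used by the
 * \c lists table of the \c sets group. As a sketch, assuming an
 * integer-valued tag whose DataSets have already been read into arrays, the
 * values for the k-th entity in \c id_list can be extracted as follows
 * (\c var_tag_values is a hypothetical name).
 *
```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: copies the variable-length values of the k-th
// entity in id_list from `values` into `out` (capacity `cap`).
// var_indices[k] is the index of the entity's last value; its first
// value follows the previous entry's last value (index 0 when k == 0).
// Returns the number of values copied, or -1 if `cap` is too small.
long var_tag_values(const int64_t *var_indices, const int32_t *values,
                    size_t k, int32_t *out, size_t cap)
{
    int64_t first = (k == 0) ? 0 : var_indices[k - 1] + 1;
    int64_t count = var_indices[k] - first + 1;
    if (count < 0 || (uint64_t)count > cap)
        return -1;
    for (int64_t i = 0; i < count; ++i)
        out[i] = values[first + i];
    return (long)count;
}
```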
 *
 *
 * \section Dense The tags Sub-Groups
 *
 * Data for fixed-length tags may also be stored in the \c tags sub-group
 * of the \c nodes, \c sets, and subgroups of the \c elements group.
 * Values for a given tag are stored in a dataset within the \c tags sub-group
 * that has the following properties:
 * - The name must be the same as that of the tag definition in the main
 *   \c tags group
 * - The type of the data set must be the committed type object stored
 *   as \c /tstt/tags/<tagname>/type .
 * - The data set must have the same length as the data set in the
 *   parent group with the \c start_id attribute.
 *
 * If dense-formatted data is specified for any entity in the group, then
 * it must be specified for every entity in the group. The table is
 * expected to contain one value for each entity in the corresponding
 * primary definition table (\c /tstt/nodes/coordinates ,
 * \c /tstt/elements/<name>/connectivity , or \c /tstt/sets/list), in the
 * same order as the entities in that primary definition table.
 *
 *
 *\section mhdf_set mhdf Meshset data
 *
 * Meshset data is divided into separate tables: the set-list/meta-information table,
 * the set contents table, the set children table, and the set parents table. Each is
 * written and read independently.
 *
 * The set list table contains one row for each set. Each row contains four values:
 * {content list end index, child list end index, parent list end index, and flags}. The flags
 * value is a collection of bits with
 * values defined in \ref mhdf_set_flag . All the flags except \ref mhdf_SET_RANGE_BIT are
 * saved properties of the mesh data and do not affect how the data is stored in the file. The
 * \ref mhdf_SET_RANGE_BIT flag is a toggle for how the meshset contents (not children or
 * parents) are saved. It is an internal property of the file format and should not be passed
 * on to the mesh database.
 * The content, child, and parent list end indices are the indices of the last entry for the
 * set in the contents, children, and parents tables respectively. In the case where a set has
 * no entries in one of these tables, the last index should be the same as the last index of the
 * previous set in the table, or -1 for the first set in the table. Thus the first index is always
 * one greater than the last index of the previous set. If the first index, calculated as one
 * greater than the last index of the previous set, is greater than the last index of the current
 * set, then there are no values in the corresponding table for that set.
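 *
 * These rules imply that each end-index column of the set list table starts
 * at or above -1, never decreases from one row to the next, and never points
 * past the end of the table it indexes. A reader might validate one column
 * as sketched below; \c end_indices_valid is a hypothetical name, \c ends
 * holds one column of the set list table, and \c table_len is the length of
 * the table that column indexes.
 *
```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical check of one end-index column of the set list table:
// values must be >= -1, non-decreasing from row to row (an empty list
// repeats the previous end index), and at most table_len - 1.
// Returns 1 if the column is consistent, 0 otherwise.
int end_indices_valid(const int64_t *ends, size_t nsets, int64_t table_len)
{
    int64_t prev = -1; // virtual end index before the first set
    for (size_t k = 0; k < nsets; ++k) {
        if (ends[k] < prev || ends[k] >= table_len)
            return 0;
        prev = ends[k];
    }
    return 1;
}
```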
 *
 * The set contents table is a vector of integer global IDs that is the concatenation of the contents
 * data for all of the mesh sets. The values are stored corresponding to the order of the sets
 * in the set list table. Depending on the value of \ref mhdf_SET_RANGE_BIT in the flags field of
 * the set list table, the contents for a specific set may be stored in one of two formats. If the
 * flag is set, the contents list is a list of pairs where each pair is a starting global ID and a
 * count. For each pair, the set contains \em count consecutive global IDs beginning at the start value.
 * If the \ref mhdf_SET_RANGE_BIT flag is not set, the meshset contents are a simple list of global IDs.
 *
 * The meshset child table is a vector of integer global IDs. It is a concatenation of the child
 * lists for all the mesh sets, in the order the sets occur in the meshset list table. The values
 * are always simple lists; the child table never contains ranges of IDs.
 *
 *
 *\section mhdf_tag mhdf Tag data
 *
 * The data for each tag can be stored in two places/formats: sparse and/or
 * dense. The data may be stored in both, but there should not be redundant
 * values for the same entity.
 *
 * Dense tag data is stored as multiple tables of tag values, one for each
 * element group. (Note: special \ref mhdf_ElemHandle values are available
 * for accessing dense tag data on nodes or meshsets via the \ref mhdf_node_type_handle
 * and \ref mhdf_set_type_handle functions.) Each dense tag table should contain
 * the same number of entries as the corresponding element connectivity table. The tag
 * values are associated with the corresponding element in the connectivity table.
 *
 * Sparse tag data is stored as a global table pair for each tag. The first
 * of the pair of tables is a list of global IDs. The second is the corresponding
 * tag value for each entity in the ID list.
 */