Mesh Oriented datABase  (version 5.5.1)
An array-based unstructured mesh library
d_devguide.h
Go to the documentation of this file.
1 /*! \page developerguide Developer's Guide
2 
3  \tableofcontents
4 
5  \section sequence 1.EntitySequence & SequenceData
6 
7  \subsection figure1 Figure 1: EntitySequences For One SequenceData
8  \image html figures/figure1.jpg
9 
10  \ref dg-figures "List of Figures"
11 
12 The <I>SequenceData</I> class manages as set of arrays of per-entity values. Each
13 <I>SequenceData</I> has a start and end handle denoting the block of entities for which
14 the arrays contain data. The arrays managed by a <I>SequenceData</I> instance are
15 divided into three groups:
16 
17 - Type-specific data (connectivity, coordinates, etc.): zero or more arrays.
18 - Adjacency data: zero or one array.
19 - Dense tag data: zero or more arrays.
20 .
21 
22 The abstract <I>EntitySequence</I> class is a non-strict subset of a <I>SequenceData</I>.
23 It contains a pointer to a <I>SequenceData</I> and the start and end handles to indicate
24 the subset of the referenced <I>SequenceData</I>. The <I>EntitySequence</I> class is
25 used to represent the regions of valid (or allocated) handles in a <I>SequenceData</I>.
26 A <I>SequenceData</I> is expected to be referenced by one or more <I>EntitySequence</I>
27 instances.
28 
29 Initial <I>EntitySequence</I> and <I>SequenceData</I> pairs are typically created in one
30 of two configurations. When reading from a file, a <I>SequenceData</I> will be created
31 to represent all of a single type of entity contained in a file. As all entries in the <I>SequenceData</I> correspond to valid handles (entities read from the file) a single
32 <I>EntitySequence</I> instance corresponding to the entire <I>SequenceData</I> is initially
33 created. The second configuration arises when allocating a single entity. If no
34 entities have been allocated yet, a new <I>SequenceData</I> must be created to store
35 the entity data. It is created with a constant size (e.g. 4k entities). The new
36 <I>EntitySequence</I> corresponds to only the first entity in the <I>SequenceData</I>: the
37 one allocated entity. As subsequent entities are allocated, the <I>EntitySequence</I>
38 is extended to cover more of the corresponding <I>SequenceData</I>.
39 
40 Concrete subclasses of the <I>EntitySequence</I> class are responsible for representing
41 specific types of entities using the array storage provided by the
42 <I>SequenceData</I> class. They also handle allocating <I>SequenceData</I> instances with
43 appropriate arrays for storing a particular type of entity. Each concrete subclass
44 typically provides two constructors corresponding to the two initial allocation
45 configurations described in the previous paragraph. <I>EntitySequence</I> implementations
46 also provide a split method, which is a type of factory method. It
47 modifies the called sequence and creates a new sequence such that the range of
48 entities represented by the original sequence is split.
49 
50 The <I>VertexSequence</I> class provides an <I>EntitySequence</I> for storing vertex
51 data. It references a SequenceData containing three arrays of doubles
52 for storing the blocked vertex coordinate data. The <I>ElementSequence</I> class
53 extends the <I>EntitySequence</I> interface with element-specific functionality. The
54 <I>UnstructuredElemSeq</I> class is the concrete implementation of <I>ElementSequence</I>
55 used to represent unstructured elements, polygons, and polyhedra. <I>MeshSetSequence</I>
56 is the <I>EntitySequence</I> used for storing entity sets.
57 
58 Each <I>EntitySequence</I> implementation also provides an implementation of
59 the values per entity method. This value is used to determine if an existing
60 <I>SequenceData</I> that has available entities is suitable for storing a particular
61 entity. For example, <I>UnstructuredElemSeq</I> returns the number of nodes per element
62 from values per entity. When allocating a new element with a specific
63 number of nodes, this value is used to determine if that element may be stored
64 in a specific <I>SequenceData</I>. For vertices, this value is always zero. This could
65 be changed to the number of coordinates per vertex, allowing representation of
66 mixed-dimension data. However, API changes would be required to utilize such
67 a feature. Sequences for which the corresponding data cannot be used to store
68 new entities (e.g. structured mesh discussed in a later section) will return -1 or
69 some other invalid value.
70 
71 
72  \section manager 2.TypeSequenceManager & SequenceManager
73 
74 The <I>TypeSequenceManager</I> class maintains an organized set of <I>EntitySequence</I>
75 instances and corresponding <I>SequenceData</I> instances. It is used to manage
76 all such instances for entities of a single <I>EntityType</I>. <I>TypeSequenceManager</I>
77 enforces the following four rules on its contained data:
78 
79 -# No two <I>SequenceData</I> instances may overlap.
80 -# No two <I>EntitySequence</I> instances may overlap.
81 -# Every <I>EntitySequence</I> must be a subset of a <I>SequenceData</I>.
82 -# Any pair of <I>EntitySequence</I> instances referencing the same <I>SequenceData</I> must be separated by at least one unallocated handle.
83 .
84 
85  \subsection figure2 Figure 2: SequenceManager and Related Classes
86  \image html figures/figure2.jpg
87 
88  \ref dg-figures "List of Figures"
89 
90 The first three rules are required for the validity of the data model. The
91 fourth rule avoids unnecessary inefficiency. It is implemented by merging such
92 adjacent sequences. In some cases, other classes (e.g. <I>SequenceManager</I>) can
93 modify an <I>EntitySequence</I> such that the fourth rule is violated. In such cases,
94 the <I>TypeSequenceManager::notify</I> prepended or <I>TypeSequenceManager::notify</I> appended
95 method must be called to maintain the integrity of the data<sup>1</sup>. The above rules
96 (including the fourth) are assumed in many other methods of the <I>TypeSequenceManager</I>
97 class, such that those methods will fail or behave unexpectedly if the managed
98 data does not conform to the rules.
99 
100 <I>TypeSequenceManager</I> contains three principal data structures. The first is
101 a <I>std::set</I> of <I>EntitySequence</I> pointers sorted using a custom comparison
102 operator that queries the start and end handles of the referenced sequences. The
103 comparison operation is defined as: <I>a->end_handle() < b->start_handle()</I>.
104 This method of comparison has the advantage that a sequence corresponding to
105 a specific handle can be located by searching the set for a “sequence” beginning
106 and ending with the search value. The lower bound and find methods provided
107 by the library are guaranteed to return the sequence, if it exists. Using
108 such a comparison operator will result in undefined behavior if the set contains
109 overlapping sequences. This is acceptable, as rule two above prohibits such
110 a configuration. However, some care must be taken in writing and modifying
111 methods in <I>TypeSequenceManager</I> so as to avoid having overlapping sequences
112 as a transitory state of some operation.
113 
114 The second important data member of <I>TypeSequenceManager</I> is a pointer
115 to the last referenced <I>EntitySequence</I>. This “cached” value is used to speed up
116 searches by entity handle. This pointer is never null unless the sequence is empty.
117 This rule is maintained to avoid unnecessary branches in fast query paths. In
118 cases where the last referenced sequence is deleted, <I>TypeSequenceManager</I> will
119 typically assign an arbitrary sequence (e.g. the first one) to the last referenced
120 pointer. (Note: this cached value might give problems for threading models; it
121 should probably be different for each thread)
122 
123 The third data member of <I>TypeSequenceManager</I> is a <I>std::set</I> of <I>SequenceData</I>
124 instances that are not completely covered by a <I>EntitySequence</I> instance<sup>2</sup>.
125 This list is searched when allocating new handles. <I>TypeSequenceManager</I> also
126 embeds in each <I>SequenceData</I> instance a reference to the first corresponding
127 <I>EntitySequence</I> so that it may be located quickly from only the <I>SequenceData</I>
128 pointer.
129 
130 The <I>SequenceManager</I> class contains an array of <I>TypeSequenceManager</I> in-
131 stances, one for each <I>EntityType</I>. It also provides all type-specific operations
132 such as allocating the correct <I>EntitySequence</I> subtype for a given <I>EntityType</I>.
133 
134 <sup>1</sup>This source of potential error can be eliminated with changes to the entity set representation.
135 This is discussed in a later section.
136 
137 <sup>2</sup>Given rule four for the data managed by a <I>TypeSequenceManager</I>, any
138 <I>SequenceData</I> for which all handles are allocated will be referenced by exactly one <I>EntitySequence</I>.
139 
140 
141  \section s-mesh 3.Structured Mesh
142 
143 Structured mesh storage is implemented using subclasses of <I>SequenceData</I>:
144 <I>ScdElementData</I> and <I>ScdVertexData</I>. The <I>StructuredElementSeq</I> class is
145 used to access the structured element connectivity. A standard <I>VertexSequence</I>
146 instance is used to access the ScdVertexData because the vertex data storage
147 is the same as for unstructured mesh.
148 
149 
150  \section sets 4.Entity Sets
151 
152 - MeshSetSequence
153 
154 The <I>MeshSetSequence</I> class is the same as most other subclasses of <I>EntitySequence</I>
155 in that it utilizes SequenceData to store its data. A single array in the <I>SequenceData</I>
156 is used to store instances of the MeshSet class, one per allocated <I>EntityHandle</I>.
157 <I>SequenceData</I> allocates all of its managed arrays using malloc and free as
158 simple arrays of bytes. <I>MeshSetSequence</I> does in-place construction and
159 destruction of <I>MeshSet</I> instances within that array. This is similar to what is
160 done by <I>std::vector</I> and other container classes that may own more storage
161 than is required at a given time for contained objects.
162 
163 - MeshSet
164 
165  \subsection figure3 Figure 3: SequenceManager and Related Classes
166  \image html figures/figure3.jpg
167 
168  \ref dg-figures "List of Figures"
169 
170 The <I>MeshSet</I> class is used to represent a single entity set instance in MOAB.
171 The class is optimized to minimize storage (further possible improvements in
172 storage size are discussed later.)
173 
174 Figure 3 shows the memory layout of an instance of the <I>MeshSet</I> class.
175 The flags member holds the set creation bit flags: <I>MESHSET_TRACK_OWNER</I>,
176 <I>MESHSET_SET</I>, and <I>MESHSET_ORDERED</I>. The presence of the <I>MESHSET_TRACK_OWNER</I>
177 indicates that reverse links from the contained entities back to the owning set
178 should be maintained in the adjacency list of each entity. The <I>MESHSET_SET</I>
179 and <I>MESHSET_ORDERED</I> bits are mutually exclusive, and as such most code only
180 tests for the <I>MESHSET_ORDERED</I>, meaning that in practice the <I>MESHSET_SET</I> bit is
181 ignored. <I>MESHSET_ORDERED</I> indicates that the set may contain duplicate handles
182 and that the order that the handles are added to the set should be preserved.
183 In practice, such sets are stored as a simple list of handles. <I>MESHSET_SET</I> (or in
184 practice, the lack of <I>MESHSET_ORDERED</I>) indicates that the order of the handles
185 need not be preserved and that the set may not contain duplicate handles. Such
186 sets are stored in a sorted range-compacted format similar to that of the Range
187 class.
188 
189 The memory for storing contents, parents, and children are each handled in
190 the same way. The data in the class is composed of a 2-bit ‘size’ field and two
191 values, where the two values may either be two handles or two pointers. The size
192 bit-fields are grouped together to reduce the required amount of memory. If the
193 numerical value of the 2-bit size field is 0 then the corresponding list is empty.
194 If the 2-bit size field is either 1 or 2, then the contents of the corresponding list
195 are stored directly in the corresponding two data fields of the MeshSet object.
196 If the 2-bit size field has a value of 3 (11 binary), then the corresponding two
197 data fields store the begin and end pointers of an external array of handles.
198 The number of handles in the external array can be obtained by taking the
199 difference of the start and end pointers. Note that unlike <I>std::vector</I>, we
200 do not store both an allocated and used size. We store only the ‘used’ size
201 and call std::realloc whenever the used size is modified, thus we rely on the
202 std::malloc implementation in the standard C library to track ‘allocated’ size
203 for us. In practice this performs well but does not return memory to the ‘system’
204 when lists shrink (unless they shrink to zero). This overall scheme could exhibit
205 poor performance if the size of one of the data lists in the set frequently changes
206 between less than two and more than two handles, as this will result in frequent
207 releasing and re-allocating of the memory for the corresponding array.
208 
209 If the <I>MESHSET_ORDERED</I> flag is not present, then the set contents list (parent
210 and child lists are unaffected) is stored in a range-compacted format. In this
211 format the number of handles stored in the array is always a multiple of two.
212 Each consecutive pair of handles indicate the start and end, inclusive, of a range
213 of handles contained in the set. All such handle range pairs are stored in sorted
214 order and do not overlap. Nor is the end handle of one range ever one less than
215 the start handle of the next. All such ‘adjacent’ range pairs are merged into a
216 single pair. The code for insertion and removal of handles from range-formatted
217 set content lists is fairly complex. The implementation will guarantee that a
218 given call to insert entities into a range or remove entities from a range is never
219 worse than O(ln n) + O(m + n), where ‘n’ is the number of handles to insert
220 and ‘m’ is the number of handles already contained in the set. So it is generally
221 much more efficient to build Ranges of handles to insert (and remove) and call
222 MOAB to insert (or remove) the entire list at once rather than making many
223 calls to insert (or remove) one or a few handles from the contents of a set.
224 The set storage could probably be further minimized by allowing up to six
225 handles in one of the lists to be elided. That is, as there are six potential ‘slots’
226 in the MeshSet object then if two of the lists are empty it should be possible to
227 store up to six values of the remaining list directly in the MeshSet object.
228 However, the additional runtime cost of such complexity could easily outweigh
229 any storage advantage. Further investigation into this has not been done because
230 the primary motivation for the storage optimization was to support binary trees.
231 
232 Another possible optimization of storage would be to remove the <I>MeshSet</I>
233 object entirely and instead store the data in a ‘blocked’ format. The corresponding
234 <I>SequenceData</I> would contain four arrays: flags, parents, children, and
235 contents instead of a single array of <I>MeshSet</I> objects. If this were done then
236 no storage need ever be allocated for parent or child links if none of the sets
237 in a <I>SequenceData</I> has parent or child links. The effectiveness of the storage
238 reduction would depend greatly on how sets get grouped into <I>SequenceDatas</I>.
239 This alternate storage scheme might also allow for better cache utilization as it
240 would group like data together. It is often the case that application code that
241 is querying the contents of one set will query the contents of many but never
242 query the parents or children of any set. Or that an application will query only
243 parent or child links of a set without every querying other set properties. The
244 downside of this solution is that it makes the implementation a little less mod-
245 ular and maintainable because the existing logic contained in the <I>MeshSet</I> class
246 would need to be spread throughout the <I>MeshSetSequence</I> class.
247 
248  \section impl-error-handling 5.Implementation of Error Handling
249 
250 When a certain error occurs, a MOAB routine can return an enum type ErrorCode (defined in src/moab/Types.hpp)
251 to its callers. Since MOAB 4.8, the existing error handling model has been completely redesigned to better set
252 and check errors.
253 
254  \subsection dgfiveone 5.1. Existing Error Handling Model
255 
256 To keep track of detail information about errors, a class Error (defined in src/moab/Error.hpp) is used to
257 store corresponding error messages. Some locally defined macros call Error::set_last_error() to report a new
258 error message, before a non-success error code is returned. At any time, user may call Core::get_last_error()
259 to retrieve the latest error message from the Error class instance held by MOAB.
260 
261 Limitations:
262 - The Error class can only store the last error message that is being set. When an error originates from a lower
263 level in a call hierarchy, upper level callers might continue to report more error messages. As a result, previously
264 reported error messages get overwritten and they will no longer be available to the user.
265 - There is no useful stack trace for the user to find out where an error first occurs, making MOAB difficult to debug.
266 
267  \subsection dgfivetwo 5.2. Enhanced Error Handling Model
268 
269 The error handling model of PETSc (http://www.mcs.anl.gov/petsc/) has nice support for stack trace, and our design has
270 borrowed many ideas from that. The new features of the enhanced model include:
271 - Lightweight and easy to use with a macro-based interface
272 - Generate a stack trace starting from the first non-success error
273 - Support C++ style streams to build up error messages rather than C style sprintf:
274 \code
275 MB_SET_ERR(MB_FAILURE, "Failed to iterate over tag on " << NUM_VTX << " vertices");
276 \endcode
277 - Have preprocessor-like functionality such that we can do something like:
278 \code
279 result = MOABRoutine(...);MB_CHK_SET_ERR(result, "Error message to set if result is not MB_SUCCESS");
280 \endcode
281 - Define and handle globally fatal errors or per-processor specific errors.
282 
283 The public include file for error handling is src/moab/ErrorHandler.hpp, the source code for the error
284 handling is in src/ErrorHandler.cpp.
285 
286 \subsection dgfivethree 5.3. Error Handler
287 
288 The error handling function MBError() only calls one default error handler, MBTraceBackErrorHandler(), which tries to print
289 out a stack trace. In the future, we need to provide a callback function to user routine before a complete abort. Something
290 like a UserTeardown that is a function pointer with a context so that the user can destroy and free essential handles before
291 an MPI abort.
292 
293 The arguments to MBTraceBackErrorHandler() are the line number where the error occurred, the function where error was detected,
294 the file in which the error was detected, the source directory, the error message, and the error type indicating whether the
295 error message should be printed.
296 \code
297 ErrorCode MBTraceBackErrorHandler(int line, const char* func, const char* file, const char* dir, const char* err_msg, ErrorType err_type);
298 \endcode
299 This handler will print out a line of stack trace, indicating line number, function name, directory and file name. If MB_ERROR_TYPE_EXISTING
300 is passed as the error type, the error message will not be printed.
301 
302 \subsection dgfivefour 5.4. Simplified Interface
303 
304 The simplified C/C++ macro-based interface consists of the following three basic macros:
305 \code
306 MB_SET_ERR(Error code, "Error message");
307 MB_CHK_ERR(Error code);
308 MB_CHK_SET_ERR(Error code, "Error message");
309 \endcode
310 
311 The macro MB_SET_ERR(err_code, err_msg) is given by
312 \code
313 std::ostringstream err_ostr;
314 err_ostr << err_msg;
315 return MBError(__LINE__, __func__, __FILENAME__, __SDIR__, err_code, err_ostr.str().c_str(), MB_ERROR_TYPE_NEW_LOCAL);
316 \endcode
317 It calls the error handler with the current function name and location: line number, file and directory, plus an error code,
318 an error message and an error type. With an embedded std::ostringstream object, it supports C++ style streams to build up error
319 messages. The error type is MB_ERROR_TYPE_NEW_LOCAL/MB_ERROR_TYPE_NEW_GLOBAL on detection of the initial error and MB_ERROR_TYPE_EXISTING
320 for any additional calls. This is so that the detailed error information is only printed once instead of for all levels of returned errors.
321 
322 The macro MB_CHK_ERR(err_code) is defined by
323 \code
324 if (MB_SUCCESS != err_code)
325  return MBError(__LINE__, __func__, __FILENAME__, __SDIR__, err_code, "", MB_ERROR_TYPE_EXISTING);
326 \endcode
327 It checks the error code, if not MB_SUCCESS, calls the error handler with error type MB_ERROR_TYPE_EXISTING and return.
328 
329 The MB_CHK_SET_ERR(err_code, err_msg) is defined by
330 \code
331 if (MB_SUCCESS != err_code)
332  MB_SET_ERR(err_code, err_msg);
333 \endcode
334 It checks the error code, if not MB_SUCCESS, calls MB_SET_ERR() to set a new error.
335 
336 To obtain correct line numbers in the stack trace, we recommend putting MB_CHK_ERR() and MB_CHK_SET_ERR() at the same line as a MOAB routine.
337 
338 In addition to the basic macros mentioned above, there are some variations, such as (for more information, refer to src/moab/ErrorHandler.hpp):
339 - MB_SET_GLB_ERR() to set a globally fatal error (for all processors)
340 - MB_SET_ERR_RET() for functions that return void type
341 - MB_SET_ERR_RET_VAL() for functions that return any data type
342 - MB_SET_ERR_CONT() to continue execution instead of returning from current function
343 These macros should be used appropriately in MOAB source code depending on the context.
344 
345 \subsection dgfivefive 5.5. Embedded Parallel Functionality
346 
347 We define a global MPI rank with which to prefix the output, as most systems have mechanisms for separating output by rank anyway.
348 For the error handler, we can pass error type MB_ERROR_TYPE_NEW_GLOBAL for globally fatal errors and MB_ERROR_TYPE_NEW_LOCAL for
349 per-processor relevant errors.
350 
351 Note, if the error handler uses std::cout to print error messages and stack traces in each processor, it can result in a messy output.
352 This is a known MPI issue with std::cout, and existing DebugOutput class has solved this issue with buffered lines. A new class
353 ErrorOutput (implemented similar to DebugOutput) is used by the error handler to print each line prefixed with the MPI rank.
354 
355 \subsection dgfivesix 5.6. Handle Non-error Conditions
356 
357 We should notice that sometimes ErrorCode is used to return a non-error condition (some internal error code that can be handled, or even expected,
358 e.g. MB_TAG_NOT_FOUND). Therefore, MB_SET_ERR() should be appropriately placed to report an error to the the caller. Before it is used, we need to
359 carefully decide whether that error is intentional. For example, a lower level MOAB routine that could return MB_TAG_NOT_FOUND should probably not
360 set an error on it, since the caller might expect to get that error code. In this case, the lower level routine just return MB_TAG_NOT_FOUND as a
361 condition, and no error is being set. It is then up to the upper level callers to decide whether it should be a true error or not.
362 
363 */
364