Go to the documentation of this file.
    1 /*! \page developerguide Developer's Guide
    2  
    3   \tableofcontents
    4  
    5   \section sequence  1.EntitySequence & SequenceData
    6  
    7   \subsection figure1 Figure 1: EntitySequences For One SequenceData
    8   \image html figures/figure1.jpg
    9  
   10   \ref dg-figures "List of Figures"
   11  
   12 The <I>SequenceData</I> class manages as set of arrays of per-entity values. Each
   13 <I>SequenceData</I> has a start and end handle denoting the block of entities for which
   14 the arrays contain data. The arrays managed by a <I>SequenceData</I> instance are
   15 divided into three groups:
   16  
   17 - Type-specific data (connectivity, coordinates, etc.): zero or more arrays.
   18 - Adjacency data: zero or one array.
   19 - Dense tag data: zero or more arrays.
   20 .
   21  
   22 The abstract <I>EntitySequence</I> class is a non-strict subset of a <I>SequenceData</I>.
   23 It contains a pointer to a <I>SequenceData</I> and the start and end handles to indicate
   24 the subset of the referenced <I>SequenceData</I>. The <I>EntitySequence</I> class is
   25 used to represent the regions of valid (or allocated) handles in a <I>SequenceData</I>.
   26 A <I>SequenceData</I> is expected to be referenced by one or more <I>EntitySequence</I>
   27 instances.
   28  
   29 Initial <I>EntitySequence</I> and <I>SequenceData</I> pairs are typically created in one
   30 of two configurations. When reading from a file, a <I>SequenceData</I> will be created
   31 to represent all of a single type of entity contained in a file. As all entries in the <I>SequenceData</I> correspond to valid handles (entities read from the file) a single
   32 <I>EntitySequence</I> instance corresponding to the entire <I>SequenceData</I> is initially
   33 created. The second configuration arises when allocating a single entity. If no
   34 entities have been allocated yet, a new <I>SequenceData</I> must be created to store
   35 the entity data. It is created with a constant size (e.g. 4k entities). The new
   36 <I>EntitySequence</I> corresponds to only the first entity in the <I>SequenceData</I>: the
   37 one allocated entity. As subsequent entities are allocated, the <I>EntitySequence</I>
   38 is extended to cover more of the corresponding <I>SequenceData</I>.
   39  
   40 Concrete subclasses of the <I>EntitySequence</I> class are responsible for representing
   41 specific types of entities using the array storage provided by the
   42 <I>SequenceData</I> class. They also handle allocating <I>SequenceData</I> instances with
   43 appropriate arrays for storing a particular type of entity. Each concrete subclass
   44 typically provides two constructors corresponding to the two initial allocation
   45 configurations described in the previous paragraph. <I>EntitySequence</I> implementations
   46 also provide a split method, which is a type of factory method. It
   47 modifies the called sequence and creates a new sequence such that the range of
   48 entities represented by the original sequence is split.
   49  
   50 The <I>VertexSequence</I> class provides an <I>EntitySequence</I> for storing vertex
   51 data. It references a SequenceData containing three arrays of doubles
   52 for storing the blocked vertex coordinate data. The <I>ElementSequence</I> class
   53 extends the <I>EntitySequence</I> interface with element-specific functionality. The
   54 <I>UnstructuredElemSeq</I> class is the concrete implementation of <I>ElementSequence</I>
   55 used to represent unstructured elements, polygons, and polyhedra. <I>MeshSetSequence</I>
   56 is the <I>EntitySequence</I> used for storing entity sets.
   57  
   58 Each <I>EntitySequence</I> implementation also provides an implementation of
   59 the values per entity method. This value is used to determine if an existing
   60 <I>SequenceData</I> that has available entities is suitable for storing a particular
   61 entity. For example, <I>UnstructuredElemSeq</I> returns the number of nodes per element
   62 from values per entity. When allocating a new element with a specific
   63 number of nodes, this value is used to determine if that element may be stored
   64 in a specific <I>SequenceData</I>. For vertices, this value is always zero. This could
   65 be changed to the number of coordinates per vertex, allowing representation of
   66 mixed-dimension data. However, API changes would be required to utilize such
   67 a feature. Sequences for which the corresponding data cannot be used to store
   68 new entities (e.g. structured mesh discussed in a later section) will return -1 or
   69 some other invalid value.
   70  
   71  
   72   \section manager 2.TypeSequenceManager & SequenceManager
   73  
   74 The <I>TypeSequenceManager</I> class maintains an organized set of <I>EntitySequence</I>
   75 instances and corresponding <I>SequenceData</I> instances. It is used to manage
   76 all such instances for entities of a single <I>EntityType</I>. <I>TypeSequenceManager</I>
   77 enforces the following four rules on its contained data:
   78  
   79 -# No two <I>SequenceData</I> instances may overlap.  
   80 -# No two <I>EntitySequence</I> instances may overlap.
   81 -# Every <I>EntitySequence</I> must be a subset of a <I>SequenceData</I>.
   82 -# Any pair of <I>EntitySequence</I> instances referencing the same <I>SequenceData</I> must be separated by at least one unallocated handle.
   83 .
   84  
   85   \subsection figure2 Figure 2: SequenceManager and Related Classes
   86   \image html figures/figure2.jpg
   87  
   88   \ref dg-figures "List of Figures"
   89  
   90 The first three rules are required for the validity of the data model. The
   91 fourth rule avoids unnecessary inefficiency. It is implemented by merging such
   92 adjacent sequences. In some cases, other classes (e.g. <I>SequenceManager</I>) can
   93 modify an <I>EntitySequence</I> such that the fourth rule is violated. In such cases,
   94 the <I>TypeSequenceManager::notify</I> prepended or <I>TypeSequenceManager::notify</I> appended
   95 method must be called to maintain the integrity of the data<sup>1</sup>. The above rules
   96 (including the fourth) are assumed in many other methods of the <I>TypeSequenceManager</I>
   97 class, such that those methods will fail or behave unexpectedly if the managed
   98 data does not conform to the rules.
   99  
  100 <I>TypeSequenceManager</I> contains three principal data structures. The first is
  101 a <I>std::set</I> of <I>EntitySequence</I> pointers sorted using a custom comparison
  102 operator that queries the start and end handles of the referenced sequences. The
  103 comparison operation is defined as: <I>a->end_handle() < b->start_handle()</I>.
  104 This method of comparison has the advantage that a sequence corresponding to
  105 a specific handle can be located by searching the set for a “sequence” beginning
  106 and ending with the search value. The lower bound and find methods provided
  107 by the library are guaranteed to return the sequence, if it exists. Using
  108 such a comparison operator will result in undefined behavior if the set contains
  109 overlapping sequences. This is acceptable, as rule two above prohibits such
  110 a configuration. However, some care must be taken in writing and modifying
  111 methods in <I>TypeSequenceManager</I> so as to avoid having overlapping sequences
  112 as a transitory state of some operation.
  113  
  114 The second important data member of <I>TypeSequenceManager</I> is a pointer
  115 to the last referenced <I>EntitySequence</I>. This “cached” value is used to speed up
  116 searches by entity handle. This pointer is never null unless the sequence is empty.
  117 This rule is maintained to avoid unnecessary branches in fast query paths. In
  118 cases where the last referenced sequence is deleted, <I>TypeSequenceManager</I> will
  119 typically assign an arbitrary sequence (e.g. the first one) to the last referenced
  120 pointer. (Note: this cached value might give problems for threading models; it
  121 should probably be different for each thread)
  122  
  123 The third data member of <I>TypeSequenceManager</I> is a <I>std::set</I> of <I>SequenceData</I>
  124 instances that are not completely covered by a <I>EntitySequence</I> instance<sup>2</sup>.
  125 This list is searched when allocating new handles. <I>TypeSequenceManager</I> also
  126 embeds in each <I>SequenceData</I> instance a reference to the first corresponding
  127 <I>EntitySequence</I> so that it may be located quickly from only the <I>SequenceData</I>
  128 pointer.
  129  
  130 The <I>SequenceManager</I> class contains an array of <I>TypeSequenceManager</I> in-
  131 stances, one for each <I>EntityType</I>. It also provides all type-specific operations
  132 such as allocating the correct <I>EntitySequence</I> subtype for a given <I>EntityType</I>.
  133  
  134 <sup>1</sup>This source of potential error can be eliminated with changes to the entity set representation.
  135 This is discussed in a later section.
  136  
  137 <sup>2</sup>Given rule four for the data managed by a <I>TypeSequenceManager</I>, any
  138 <I>SequenceData</I> for which all handles are allocated will be referenced by exactly one <I>EntitySequence</I>.
  139  
  140  
  141  \section s-mesh 3.Structured Mesh
  142  
  143 Structured mesh storage is implemented using subclasses of <I>SequenceData</I>:
  144 <I>ScdElementData</I> and <I>ScdVertexData</I>. The <I>StructuredElementSeq</I> class is
  145 used to access the structured element connectivity. A standard <I>VertexSequence</I>
  146 instance is used to access the ScdVertexData because the vertex data storage
  147 is the same as for unstructured mesh.
  148  
  149  
  150   \section sets 4.Entity Sets
  151  
  152 - MeshSetSequence
  153  
  154 The <I>MeshSetSequence</I> class is the same as most other subclasses of <I>EntitySequence</I>
  155 in that it utilizes SequenceData to store its data. A single array in the <I>SequenceData</I>
  156 is used to store instances of the MeshSet class, one per allocated <I>EntityHandle</I>.
  157 <I>SequenceData</I> allocates all of its managed arrays using malloc and free as
  158 simple arrays of bytes. <I>MeshSetSequence</I> does in-place construction and
  159 destruction of <I>MeshSet</I> instances within that array. This is similar to what is
  160 done by <I>std::vector</I> and other container classes that may own more storage
  161 than is required at a given time for contained objects.
  162  
  163 - MeshSet
  164  
  165   \subsection figure3 Figure 3: SequenceManager and Related Classes
  166   \image html figures/figure3.jpg
  167  
  168   \ref dg-figures "List of Figures"
  169  
  170 The <I>MeshSet</I> class is used to represent a single entity set instance in MOAB.
  171 The class is optimized to minimize storage (further possible improvements in
  172 storage size are discussed later.)
  173  
  174 Figure 3 shows the memory layout of an instance of the <I>MeshSet</I> class.
  175 The flags member holds the set creation bit flags: <I>MESHSET_TRACK_OWNER</I>,
  176 <I>MESHSET_SET</I>, and <I>MESHSET_ORDERED</I>. The presence of the <I>MESHSET_TRACK_OWNER</I>
  177 indicates that reverse links from the contained entities back to the owning set
  178 should be maintained in the adjacency list of each entity. The <I>MESHSET_SET</I>
  179 and <I>MESHSET_ORDERED</I> bits are mutually exclusive, and as such most code only
  180 tests for the <I>MESHSET_ORDERED</I>, meaning that in practice the <I>MESHSET_SET</I> bit is
  181 ignored. <I>MESHSET_ORDERED</I> indicates that the set may contain duplicate handles
  182 and that the order that the handles are added to the set should be preserved.
  183 In practice, such sets are stored as a simple list of handles. <I>MESHSET_SET</I> (or in
  184 practice, the lack of <I>MESHSET_ORDERED</I>) indicates that the order of the handles
  185 need not be preserved and that the set may not contain duplicate handles. Such
  186 sets are stored in a sorted range-compacted format similar to that of the Range
  187 class.
  188  
  189 The memory for storing contents, parents, and children are each handled in
  190 the same way. The data in the class is composed of a 2-bit ‘size’ field and two
  191 values, where the two values may either be two handles or two pointers. The size
  192 bit-fields are grouped together to reduce the required amount of memory. If the
  193 numerical value of the 2-bit size field is 0 then the corresponding list is empty.
  194 If the 2-bit size field is either 1 or 2, then the contents of the corresponding list
  195 are stored directly in the corresponding two data fields of the MeshSet object.
  196 If the 2-bit size field has a value of 3 (11 binary), then the corresponding two
  197 data fields store the begin and end pointers of an external array of handles.
  198 The number of handles in the external array can be obtained by taking the
  199 difference of the start and end pointers. Note that unlike <I>std::vector</I>, we
  200 do not store both an allocated and used size. We store only the ‘used’ size
  201 and call std::realloc whenever the used size is modified, thus we rely on the
  202 std::malloc implementation in the standard C library to track ‘allocated’ size
  203 for us. In practice this performs well but does not return memory to the ‘system’
  204 when lists shrink (unless they shrink to zero). This overall scheme could exhibit
  205 poor performance if the size of one of the data lists in the set frequently changes
  206 between less than two and more than two handles, as this will result in frequent
  207 releasing and re-allocating of the memory for the corresponding array.
  208  
  209 If the <I>MESHSET_ORDERED</I> flag is not present, then the set contents list (parent
  210 and child lists are unaffected) is stored in a range-compacted format. In this
  211 format the number of handles stored in the array is always a multiple of two.
  212 Each consecutive pair of handles indicate the start and end, inclusive, of a range
  213 of handles contained in the set. All such handle range pairs are stored in sorted
  214 order and do not overlap. Nor is the end handle of one range ever one less than
  215 the start handle of the next. All such ‘adjacent’ range pairs are merged into a
  216 single pair. The code for insertion and removal of handles from range-formatted
  217 set content lists is fairly complex. The implementation will guarantee that a
  218 given call to insert entities into a range or remove entities from a range is never
  219 worse than O(ln n) + O(m + n), where ‘n’ is the number of handles to insert
  220 and ‘m’ is the number of handles already contained in the set. So it is generally
  221 much more efficient to build Ranges of handles to insert (and remove) and call
  222 MOAB to insert (or remove) the entire list at once rather than making many
  223 calls to insert (or remove) one or a few handles from the contents of a set.
  224 The set storage could probably be further minimized by allowing up to six
  225 handles in one of the lists to be elided. That is, as there are six potential ‘slots’
  226 in the MeshSet object then if two of the lists are empty it should be possible to
  227 store up to six values of the remaining list directly in the MeshSet object.
  228 However, the additional runtime cost of such complexity could easily outweigh
  229 any storage advantage. Further investigation into this has not been done because
  230 the primary motivation for the storage optimization was to support binary trees.
  231  
  232 Another possible optimization of storage would be to remove the <I>MeshSet</I>
  233 object entirely and instead store the data in a ‘blocked’ format. The corresponding
  234 <I>SequenceData</I> would contain four arrays: flags, parents, children, and
  235 contents instead of a single array of <I>MeshSet</I> objects. If this were done then
  236 no storage need ever be allocated for parent or child links if none of the sets
  237 in a <I>SequenceData</I> has parent or child links. The effectiveness of the storage
  238 reduction would depend greatly on how sets get grouped into <I>SequenceDatas</I>.
  239 This alternate storage scheme might also allow for better cache utilization as it
  240 would group like data together. It is often the case that application code that
  241 is querying the contents of one set will query the contents of many but never
  242 query the parents or children of any set. Or that an application will query only
  243 parent or child links of a set without every querying other set properties. The
  244 downside of this solution is that it makes the implementation a little less mod-
  245 ular and maintainable because the existing logic contained in the <I>MeshSet</I> class
  246 would need to be spread throughout the <I>MeshSetSequence</I> class.
  247  
  248  \section impl-error-handling 5.Implementation of Error Handling
  249  
  250 When a certain error occurs, a MOAB routine can return an enum type ErrorCode (defined in src/moab/Types.hpp)
  251 to its callers. Since MOAB 4.8, the existing error handling model has been completely redesigned to better set
  252 and check errors.
  253  
  254  \subsection dgfiveone 5.1. Existing Error Handling Model
  255  
  256 To keep track of detail information about errors, a class Error (defined in src/moab/Error.hpp) is used to
  257 store corresponding error messages. Some locally defined macros call Error::set_last_error() to report a new
  258 error message, before a non-success error code is returned. At any time, user may call Core::get_last_error()
  259 to retrieve the latest error message from the Error class instance held by MOAB.
  260  
  261 Limitations:
  262 - The Error class can only store the last error message that is being set. When an error originates from a lower
  263 level in a call hierarchy, upper level callers might continue to report more error messages. As a result, previously
  264 reported error messages get overwritten and they will no longer be available to the user.
  265 - There is no useful stack trace for the user to find out where an error first occurs, making MOAB difficult to debug.
  266  
  267  \subsection dgfivetwo 5.2. Enhanced Error Handling Model
  268  
  269 The error handling model of PETSc (http://www.mcs.anl.gov/petsc/) has nice support for stack trace, and our design has
  270 borrowed many ideas from that. The new features of the enhanced model include:
  271 - Lightweight and easy to use with a macro-based interface
  272 - Generate a stack trace starting from the first non-success error
  273 - Support C++ style streams to build up error messages rather than C style sprintf:
  274 \code
  275 MB_SET_ERR(MB_FAILURE, "Failed to iterate over tag on " << NUM_VTX << " vertices");
  276 \endcode
  277 - Have preprocessor-like functionality such that we can do something like:
  278 \code
  279 result = MOABRoutine(...);MB_CHK_SET_ERR(result, "Error message to set if result is not MB_SUCCESS");
  280 \endcode
  281 - Define and handle globally fatal errors or per-processor specific errors.
  282  
  283 The public include file for error handling is src/moab/ErrorHandler.hpp, the source code for the error
  284 handling is in src/ErrorHandler.cpp.
  285  
  286 \subsection dgfivethree 5.3. Error Handler
  287  
  288 The error handling function MBError() only calls one default error handler, MBTraceBackErrorHandler(), which tries to print
  289 out a stack trace. In the future, we need to provide a callback function to user routine before a complete abort. Something
  290 like a UserTeardown that is a function pointer with a context so that the user can destroy and free essential handles before
  291 an MPI abort.
  292  
  293 The arguments to MBTraceBackErrorHandler() are the line number where the error occurred, the function where error was detected,
  294 the file in which the error was detected, the source directory, the error message, and the error type indicating whether the
  295 error message should be printed.
  296 \code
  297 ErrorCode MBTraceBackErrorHandler(int line, const char* func, const char* file, const char* dir, const char* err_msg, ErrorType err_type);
  298 \endcode
  299 This handler will print out a line of stack trace, indicating line number, function name, directory and file name. If MB_ERROR_TYPE_EXISTING
  300 is passed as the error type, the error message will not be printed.
  301  
  302 \subsection dgfivefour 5.4. Simplified Interface
  303  
  304 The simplified C/C++ macro-based interface consists of the following three basic macros:
  305 \code
  306 MB_SET_ERR(Error code, "Error message");
  307 MB_CHK_ERR(Error code);
  308 MB_CHK_SET_ERR(Error code, "Error message");
  309 \endcode
  310  
  311 The macro MB_SET_ERR(err_code, err_msg) is given by
  312 \code
  313 std::ostringstream err_ostr;
  314 err_ostr << err_msg;
  315 return MBError(__LINE__, __func__, __FILENAME__, __SDIR__, err_code, err_ostr.str().c_str(), MB_ERROR_TYPE_NEW_LOCAL);
  316 \endcode
  317 It calls the error handler with the current function name and location: line number, file and directory, plus an error code,
  318 an error message and an error type. With an embedded std::ostringstream object, it supports C++ style streams to build up error
  319 messages. The error type is MB_ERROR_TYPE_NEW_LOCAL/MB_ERROR_TYPE_NEW_GLOBAL on detection of the initial error and MB_ERROR_TYPE_EXISTING
  320 for any additional calls. This is so that the detailed error information is only printed once instead of for all levels of returned errors.
  321  
  322 The macro MB_CHK_ERR(err_code) is defined by
  323 \code
  324 if (MB_SUCCESS != err_code)
  325   return MBError(__LINE__, __func__, __FILENAME__, __SDIR__, err_code, "", MB_ERROR_TYPE_EXISTING);
  326 \endcode
  327 It checks the error code, if not MB_SUCCESS, calls the error handler with error type MB_ERROR_TYPE_EXISTING and return.
  328  
  329 The MB_CHK_SET_ERR(err_code, err_msg) is defined by
  330 \code
  331 if (MB_SUCCESS != err_code)
  332   MB_SET_ERR(err_code, err_msg);
  333 \endcode
  334 It checks the error code, if not MB_SUCCESS, calls MB_SET_ERR() to set a new error.
  335  
  336 To obtain correct line numbers in the stack trace, we recommend putting MB_CHK_ERR() and MB_CHK_SET_ERR() at the same line as a MOAB routine.
  337  
  338 In addition to the basic macros mentioned above, there are some variations, such as (for more information, refer to src/moab/ErrorHandler.hpp):
  339 - MB_SET_GLB_ERR() to set a globally fatal error (for all processors)
  340 - MB_SET_ERR_RET() for functions that return void type
  341 - MB_SET_ERR_RET_VAL() for functions that return any data type
  342 - MB_SET_ERR_CONT() to continue execution instead of returning from current function
  343 These macros should be used appropriately in MOAB source code depending on the context.
  344  
  345 \subsection dgfivefive 5.5. Embedded Parallel Functionality
  346  
  347 We define a global MPI rank with which to prefix the output, as most systems have mechanisms for separating output by rank anyway.
  348 For the error handler, we can pass error type MB_ERROR_TYPE_NEW_GLOBAL for globally fatal errors and MB_ERROR_TYPE_NEW_LOCAL for
  349 per-processor relevant errors.
  350  
  351 Note, if the error handler uses std::cout to print error messages and stack traces in each processor, it can result in a messy output.
  352 This is a known MPI issue with std::cout, and existing DebugOutput class has solved this issue with buffered lines. A new class
  353 ErrorOutput (implemented similar to DebugOutput) is used by the error handler to print each line prefixed with the MPI rank.
  354  
  355 \subsection dgfivesix 5.6. Handle Non-error Conditions
  356  
  357 We should notice that sometimes ErrorCode is used to return a non-error condition (some internal error code that can be handled, or even expected,
  358 e.g. MB_TAG_NOT_FOUND). Therefore, MB_SET_ERR() should be appropriately placed to report an error to the the caller. Before it is used, we need to
  359 carefully decide whether that error is intentional. For example, a lower level MOAB routine that could return MB_TAG_NOT_FOUND should probably not
  360 set an error on it, since the caller might expect to get that error code. In this case, the lower level routine just return MB_TAG_NOT_FOUND as a
  361 condition, and no error is being set. It is then up to the upper level callers to decide whether it should be a true error or not.
  362  
  363 */
  364