Review: stefanozampini/fix-mumps-schur-reset vs origin/release

DEST: origin/release (merge-base of origin/main…HEAD is an ancestor of origin/release).

Two files changed: src/mat/impls/aij/mpi/mumps/impl/imumps.c (Schur/redrhs memory ownership rework) and src/snes/tutorials/makefile (print the failing .tmp on diff mismatch). I read the changed hunks plus the surrounding ownership model (MatMumpsMakeMumpsScalarArray, FreeInternalIDFields, the outer/inner alias macros, the public MatFactorSetSchurIS wrapper, and all MatMumpsResetSchur_Private/MatMumpsEnsureSchurArray_Private call sites).

The overall direction is sound and several latent issues are genuinely improved (e.g. size*size now computed in PetscCount instead of MUMPS_INT, removing an overflow risk; redrhs/schur freeing moved out of FreeInternalIDFields and centralized; Mat-based reset signature). Findings below.


MEDIUMid.schur cleanup in MatMumpsResetSchur_Private() is dead on the re-MatFactorSetSchurIS() path, leaving id.schur stale (UAF / overflow)

src/mat/impls/aij/mpi/mumps/impl/imumps.c:681-689

The new design makes MatMumpsEnsureSchurArray_Private() (imumps.c:653) allocate/alias id.schur only when it is NULL:

if (F->schur && !mumps->id.schur) { ... allocate/alias ... }

So correctness now depends on MatMumpsResetSchur_Private() nulling id.schur whenever the Schur matrix is replaced. But the schur-cleanup block is guarded by if (F->schur):

if (F->schur) {
  PetscCall(MatDenseGetArrayRead(F->schur, &array));
  if (array != mumps->id.schur) PetscCall(PetscFree(mumps->id.schur));
  else mumps->id.schur = NULL;
  PetscCall(MatDenseRestoreArrayRead(F->schur, &array));
}
PetscCall(MatDestroy(&F->schur));

The public wrapper MatFactorSetSchurIS() (src/mat/interface/matrix.c:9837) destroys F->schur before calling the implementation:

PetscCall(MatDestroy(&mat->schur));     // F->schur is now NULL
PetscCall((*f)(mat, is));               // MatFactorSetSchurIS_MUMPS -> reset(F)

So when MatFactorSetSchurIS_MUMPS() calls MatMumpsResetSchur_Private(F) at imumps.c:3296, F->schur is already NULL, the if (F->schur) block is skipped, and id.schur is left untouched (only id.schur_len is zeroed at line 675).

Consequence, on a second MatFactorSetSchurIS() against an already-factored F (the wrapper explicitly supports re-setting — that's why it destroys the old schur first):

This is the exact memory correctness the MR set out to fix. The old code sidestepped it by re-binding id.schur unconditionally inside MatFactorNumeric_MUMPS() on every factor (the block this MR deleted at the @@ -2516 @@ hunk), so the alias was always refreshed.

Reachability caveat: no in-tree caller calls MatFactorSetSchurIS() twice on the same F (bddc/hpddm/ex192/ex154 all create a fresh factor and set the IS once), so this is latent — but it is reachable through the documented public API.

Suggested fix — make schur cleanup independent of F->schur, using the ownership signal that already exists (schur_len > 0 ⇒ separately owned; == 0 ⇒ alias), mirroring the unconditional redrhs handling just above:

if (mumps->id.schur_len) PetscCall(PetscFree(mumps->id.schur)); /* owned (mixed precision) */
mumps->id.schur     = NULL;                                     /* alias case: just drop it */
mumps->id.schur_len = 0;
...
PetscCall(MatDestroy(&F->schur));

This nulls id.schur even when F->schur was pre-destroyed by the wrapper, so EnsureSchurArray re-aliases correctly on the next factorization.


STYLEInconsistent error-message punctuation introduced by this MR

src/mat/impls/aij/mpi/mumps/impl/imumps.c:1816

This MR adds INFO(2) to the termination check but omits the comma the same MR added to every other touched message:

"MUMPS error in termination: INFOG(1)=%d INFO(2)=%d " MUMPS_MANUALS   // here (no comma)
"MUMPS error in solve: INFOG(1)=%d, INFO(2)=%d " MUMPS_MANUALS        // everywhere else this MR touched

Minor; align to INFOG(1)=%d, INFO(2)=%d for consistency with the lines added at imumps.c:168/200/228 etc.


OKMakefile change — looks correct

src/snes/tutorials/makefile: dropping "diffs above" and cat-ing ex19.tmp/ex3k.tmp on failure is sensible given the diff output is no longer echoed. All ten recipes updated consistently; the space-indented inner block in runex3k_kokkos is pre-existing and harmless (it's inside a backslash-joined shell command, not a recipe-leading tab).


No issues found in the EnsureSchurArray placement (precision is known by the time symbolic runs MatSetFromOptions_MUMPS() before EnsureSchurArray), the destroy ordering (reset(F) runs while F->schur is still alive in MatDestroy_MUMPS, before the generic MatDestroy(&(*A)->schur)), the redrhs cleanup (unconditional, alias-safe), or the typo/grammar fixes.

(2 LOW findings suppressed; ask to show them.)