stefanozampini/fix-mumps-schur-reset vs origin/releaseTwo files changed: src/mat/impls/aij/mpi/mumps/impl/imumps.c (Schur/redrhs memory ownership rework) and src/snes/tutorials/makefile (print the failing .tmp on diff mismatch). I read the changed hunks plus the surrounding ownership model (MatMumpsMakeMumpsScalarArray, FreeInternalIDFields, the outer/inner alias macros, the public MatFactorSetSchurIS wrapper, and all MatMumpsResetSchur_Private/MatMumpsEnsureSchurArray_Private call sites).
The overall direction is sound and several latent issues are genuinely improved (e.g. size*size now computed in PetscCount instead of MUMPS_INT, removing an overflow risk; redrhs/schur freeing moved out of FreeInternalIDFields and centralized; Mat-based reset signature). Findings below.
id.schur cleanup in MatMumpsResetSchur_Private() is dead on the re-MatFactorSetSchurIS() path, leaving id.schur stale (UAF / overflow)src/mat/impls/aij/mpi/mumps/impl/imumps.c:681-689
The new design makes MatMumpsEnsureSchurArray_Private() (imumps.c:653) allocate/alias id.schur only when it is NULL:
if (F->schur && !mumps->id.schur) { ... allocate/alias ... }
So correctness now depends on MatMumpsResetSchur_Private() nulling id.schur whenever the Schur matrix is replaced. But the schur-cleanup block is guarded by if (F->schur):
if (F->schur) {
PetscCall(MatDenseGetArrayRead(F->schur, &array));
if (array != mumps->id.schur) PetscCall(PetscFree(mumps->id.schur));
else mumps->id.schur = NULL;
PetscCall(MatDenseRestoreArrayRead(F->schur, &array));
}
PetscCall(MatDestroy(&F->schur));
The public wrapper MatFactorSetSchurIS() (src/mat/interface/matrix.c:9837) destroys F->schur before calling the implementation:
PetscCall(MatDestroy(&mat->schur)); // F->schur is now NULL
PetscCall((*f)(mat, is)); // MatFactorSetSchurIS_MUMPS -> reset(F)
So when MatFactorSetSchurIS_MUMPS() calls MatMumpsResetSchur_Private(F) at imumps.c:3296, F->schur is already NULL, the if (F->schur) block is skipped, and id.schur is left untouched (only id.schur_len is zeroed at line 675).
Consequence, on a second MatFactorSetSchurIS() against an already-factored F (the wrapper explicitly supports re-setting — that's why it destroys the old schur first):
id.schur aliased the now-destroyed dense array → dangling. EnsureSchurArray sees id.schur != NULL → skips → MUMPS writes the Schur block into freed memory during the next factorization.id.schur buffer leaks, and since id.schur_len was zeroed but id.schur kept, EnsureSchurArray skips re-allocation → the old (possibly smaller) buffer is reused against the new size_schur → heap overflow.This is the exact memory correctness the MR set out to fix. The old code sidestepped it by re-binding id.schur unconditionally inside MatFactorNumeric_MUMPS() on every factor (the block this MR deleted at the @@ -2516 @@ hunk), so the alias was always refreshed.
Reachability caveat: no in-tree caller calls MatFactorSetSchurIS() twice on the same F (bddc/hpddm/ex192/ex154 all create a fresh factor and set the IS once), so this is latent — but it is reachable through the documented public API.
Suggested fix — make schur cleanup independent of F->schur, using the ownership signal that already exists (schur_len > 0 ⇒ separately owned; == 0 ⇒ alias), mirroring the unconditional redrhs handling just above:
if (mumps->id.schur_len) PetscCall(PetscFree(mumps->id.schur)); /* owned (mixed precision) */
mumps->id.schur = NULL; /* alias case: just drop it */
mumps->id.schur_len = 0;
...
PetscCall(MatDestroy(&F->schur));
This nulls id.schur even when F->schur was pre-destroyed by the wrapper, so EnsureSchurArray re-aliases correctly on the next factorization.
src/mat/impls/aij/mpi/mumps/impl/imumps.c:1816
This MR adds INFO(2) to the termination check but omits the comma the same MR added to every other touched message:
"MUMPS error in termination: INFOG(1)=%d INFO(2)=%d " MUMPS_MANUALS // here (no comma)
"MUMPS error in solve: INFOG(1)=%d, INFO(2)=%d " MUMPS_MANUALS // everywhere else this MR touched
Minor; align to INFOG(1)=%d, INFO(2)=%d for consistency with the lines added at imumps.c:168/200/228 etc.
src/snes/tutorials/makefile: dropping "diffs above" and cat-ing ex19.tmp/ex3k.tmp on failure is sensible given the diff output is no longer echoed. All ten recipes updated consistently; the space-indented inner block in runex3k_kokkos is pre-existing and harmless (it's inside a backslash-joined shell command, not a recipe-leading tab).
No issues found in the EnsureSchurArray placement (precision is known by the time symbolic runs MatSetFromOptions_MUMPS() before EnsureSchurArray), the destroy ordering (reset(F) runs while F->schur is still alive in MatDestroy_MUMPS, before the generic MatDestroy(&(*A)->schur)), the redrhs cleanup (unconditional, alias-safe), or the typo/grammar fixes.
(2 LOW findings suppressed; ask to show them.)