

From leslie.baksmaty@physics.gatech.edu Sun Sep 24 13:58:25 2006
Date: Wed, 20 Sep 2006 23:37:42 -0400
From: leslie O. Baksmaty <leslie.baksmaty@physics.gatech.edu>
To: petsc-maint <petsc-maint@mcs.anl.gov>
Cc: petsc-maint@mcs.anl.gov
Subject: Re: [PETSC #15344] Problems with LU factorization

Hi,
    Before dragging you into the mud, I did some further research into the
problem, and this is what I found.

a) For a given matrix-vector pair (H,b), KSPSolve does not fail consistently as
the number of processors involved in the computation varies. For one such
(H,b), KSPSolve simply hung when I used 32 processors, yet it returned a result
when I resubmitted the same solve on 4 processors.

b) This is the case for SuperLU_DIST as well as for MUMPS.

I suspect there may be a problem with our MPI installation, but I will wait
patiently for your verdict.


To perform the tests, I first dumped the pair (H,b) into PETSc binary format, as
you suggested, and then read them back in with a simple program that does just
one solve. I have attached this program for your convenience. However, H and b
are too large to send by mail. What is the best way to send them to you?
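For anyone who needs to produce such a file outside PETSc, here is a minimal sketch in plain C of what we understand the PETSc binary layout to be: big-endian 32-bit integers, a class-id header (1211216 for a matrix, 1211214 for a vector), then the row lengths, the column indices, and finally the values. It assumes a complex, double-precision build with 32-bit indices, where each scalar is stored as a real/imaginary pair of doubles. The helper names write_petsc_mat and write_petsc_vec are hypothetical; the supported route remains MatView/VecView with a binary viewer, as in the ex31.c example Hong points to.

```c
#include <complex.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Class ids assumed to match PETSc's matrix and vector file ids. */
enum { MAT_CLASSID = 1211216, VEC_CLASSID = 1211214 };

/* Write a 32-bit integer in big-endian byte order. */
static void put_i32(FILE *f, int32_t v) {
    uint32_t u = (uint32_t)v;
    unsigned char b[4] = {(unsigned char)(u >> 24), (unsigned char)(u >> 16),
                          (unsigned char)(u >> 8), (unsigned char)u};
    fwrite(b, 1, 4, f);
}

/* Write an IEEE-754 double in big-endian byte order. */
static void put_f64(FILE *f, double d) {
    uint64_t u;
    unsigned char b[8];
    memcpy(&u, &d, sizeof u);
    for (int i = 0; i < 8; i++) b[i] = (unsigned char)(u >> (56 - 8 * i));
    fwrite(b, 1, 8, f);
}

/* Assumed complex layout: real part followed by imaginary part. */
static void put_scalar(FILE *f, double complex z) {
    put_f64(f, creal(z));
    put_f64(f, cimag(z));
}

/* Hypothetical helper: m x n CSR matrix (row_ptr has m+1 entries). */
void write_petsc_mat(FILE *f, int32_t m, int32_t n, const int32_t *row_ptr,
                     const int32_t *col, const double complex *val) {
    int32_t nz = row_ptr[m];
    put_i32(f, MAT_CLASSID);
    put_i32(f, m);
    put_i32(f, n);
    put_i32(f, nz);                                   /* total nonzeros */
    for (int32_t i = 0; i < m; i++) put_i32(f, row_ptr[i + 1] - row_ptr[i]);
    for (int32_t k = 0; k < nz; k++) put_i32(f, col[k]);
    for (int32_t k = 0; k < nz; k++) put_scalar(f, val[k]);
}

/* Hypothetical helper: dense vector of length n. */
void write_petsc_vec(FILE *f, int32_t n, const double complex *x) {
    put_i32(f, VEC_CLASSID);
    put_i32(f, n);
    for (int32_t i = 0; i < n; i++) put_scalar(f, x[i]);
}
```

Writing through a FILE* keeps the sketch independent of MPI; for a matrix distributed over many processes, letting PETSc gather and write it is far simpler.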



--
Leslie O. Baksmaty, Ph.D
Center for computational materials science,
School of Physics,
Georgia Institute of Technology

404-385-7189



Quoting Hong Zhang <petsc-maint@mcs.anl.gov>:

>
> leslie,
>
> Is it possible to write your matrix (H-sigma*M) and vector b into
> a file in PETSc binary format
> (see ~petsc/src/mat/examples/tests/ex31.c for an example)
> and send it to us?
>
> Then I can experiment with it to see what is going on.
> Does the hang occur with one process, or with np>1?
>
> Hong
>
> On Mon, 18 Sep 2006, leslie O. Baksmaty wrote:
>
> > > Do you solve the eigenvalue problem with a shift-and-invert scheme?
> > > How does this LU factorization relate to your eigenvalue problem?
> > > How large is the matrix when it starts to choke
> > > ("choke"? do you mean the factored matrix becomes too large to store,
> > > or something else)?
> > > How do you know the factorization step did not complete?
> > > I don't see any error message below about the factorization
> > > crashing.
> >
> > Yes, I solve with a shift-and-invert scheme. The eigensolve relies on being
> > able to do two operations:
> >    a) matrix-vector multiplications, and
> >    b) solving (H-sigma*M)x=b for x, where sigma is the shift parameter
> >       and H and M have the structure I described previously.
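Why one linear solve per step is enough: this is the standard shift-and-invert spectral transformation. A short derivation in the thread's notation (e the eigenvalue, sigma the shift):

```latex
Hx = eMx
\;\Longrightarrow\; (H - \sigma M)\,x = (e - \sigma)\,Mx
\;\Longrightarrow\; (H - \sigma M)^{-1} M x = \frac{1}{e - \sigma}\,x .
```

With M = I, as in this problem, the transformed operator is (H - sigma*I)^{-1}. Its largest-magnitude eigenvalues 1/(e - sigma) belong to the eigenvalues e closest to sigma. Applying it to a vector y means solving (H - sigma*M)x = My. That is why H - sigma*M is factored once and the factors are reused for every subsequent solve.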
> >
> > PETSc provides these two operations, natively for a) and through MUMPS or
> > SuperLU_DIST for b).
> > In other words, I am using the PETSc MUMPS/SuperLU_DIST interface. When
> > using MUMPS, one can set "-mat_mumps_icntl_4 <0,1,2,3,4> - print level",
> > and I am using level 2.
> >
> > On the question of whether the program crashes or not: in the case of
> > MUMPS, the program does not crash; it just sits there for as long as my
> > patience has lasted (1 day!). I think that is long enough to conclude that
> > something is wrong, because for comparable matrices and numbers of
> > processors a successful factorization step takes about 2 seconds. Since I
> > always use a triangular mesh, the sparsity patterns are the same.
> >
> > So there is no error message. However, if you set the print level to 2, one
> > can observe, by comparison with a successful run, that the program stalls
> > indefinitely at the factorization stage. For comparison with the output in
> > my first message, here is the printout from a successful calculation:
> >
> > - ****** ANALYSIS STEP ********
> >
> >  ** Max-trans not allowed because matrix is distributed
> >  ** Scaling not allowed because matrix is distributed
> >  ... Structural symmetry (in percent)=  100
> >  Density: NBdense, Average, Median   =    0   13   13
> >  Ordering based on QAMD
> >  ** Peak of sequential stack size (number of real entries)   :  3050072.
> >  A root of estimated size         1086  has been selected for Scalapack.
> >
> > Leaving analysis phase with  ...
> > INFOG(1)                                       =           0
> > INFOG(2)                                       =           0
> >  -- (20) Number of entries in factors (estim.) =    24247620
> >  --  (3) Storage of factors  (REAL, estimated) =    28673347
> >  --  (4) Storage of factors  (INT , estimated) =     1876810
> >  --  (5) Maximum frontal size      (estimated) =        1268
> >  --  (6) Number of nodes in the tree           =       29420
> >  --  (7) Ordering option effectively used      =           6
> > ICNTL(6) Maximum transversal option            =           0
> > ICNTL(7) Pivot order option                    =           7
> > Percentage of memory relaxation (effective)    =          40
> > Number of level 2 nodes                        =          11
> > Number of nodes cut for better parallelism     =           4
> > RINFO(1) Operations during elimination (estim) =   9.172D+09
> >  ** Rank of processor needing largest memory in facto        :         4
> >  ** Estimated corresponding space in MBYTES for facto        :       241
> >  ** Estimated avg. space in MBYTES per working proc at facto :       219
> >  ** TOTAL     space in MBYTES for factorization              :      3504
> >
> >  ****** FACTORIZATION STEP ********
> >
> >
> >  GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
> >  NUMBER OF WORKING PROCESSES          =          16
> >  REAL SPACE FOR FACTORS               =    28673347
> >  INTEGER SPACE FOR FACTORS            =     1876810
> >  MAXIMUM FRONTAL SIZE (ESTIMATED)     =        1268
> >  NUMBER OF NODES IN THE TREE          =       29420
> >  MAXIMUM RELAXED VALUE OF MAXS        =     6619724
> >  AVERAGE RELAXED VALUE OF MAXS        =     5363007
> >
> >  REDISTRIB: TOTAL DATA LOCAL/SENT     =       53344     2080391
> >  GLOBAL TIME FOR MATRIX DISTRIBUTION  =      0.0698
> >  ** Memory relaxation parameter ( ICNTL(14)  )            :        40
> >  ** Rank of processor needing largest memory in facto     :         4
> >  ** Space in MBYTES used by this processor for facto      :       241
> >  ** Avg. Space in MBYTES per working proc during facto    :       219
> >
> >  ELAPSED TIME FOR FACTORIZATION       =      1.9750
> >  MAXIMUM EFF. SPACE USED IN S(MAXS)   =     2980636
> >  AVERAGE EFF. SPACE USED IN S(MAXS)   =     1985275
> >  ** EFF Min: Rank of processor needing largest memory :         4
> >  ** EFF Min: Space in MBYTES used by this processor   :       190
> >  ** EFF Min: Avg. Space in MBYTES per working proc    :       172
> >
> >  GLOBAL STATISTICS
> >  RINFOG(2)  OPERATIONS DURING NODE ASSEMBLY     = 4.945D+07
> >  ------(3)  OPERATIONS DURING NODE ELIMINATION  = 9.172D+09
> >  INFOG (9)  REAL SPACE FOR FACTORS              =    24247620
> >  INFOG(10)  INTEGER SPACE FOR FACTORS           =     1846776
> >  INFOG(11)  MAXIMUM FRONT SIZE                  =        1268
> >  INFOG(13)  NUMBER OF OFF DIAGONAL PIVOTS =           0
> >  INFOG(12)  NUMBER OF DELAYED PIVOTS      =           0
> >  INFOG(25)  NUMBER OF TINY PIVOTS         =           0
> >  INFOG(14)  NUMBER OF MEMORY COMPRESS           =           0
> >
> >  ****** SOLVE & CHECK STEP ********
> >
> >
> >  STATISTICS PRIOR SOLVE PHASE     ...........
> >  NUMBER OF RIGHT-HAND-SIDES                    =           1
> >  BLOCKING FACTOR FOR MULTIPLE RHS              =           1
> >
> >
> > etc.....
> >
> > >
> > > PETSc and MUMPS do not support ignoring small pivots.
> > >
> > > >
> > > > MUMPs printout:
> > > >
> > > >  ZMUMPS Version 4.5.5 -- October 2005
> > > > L U Solver for unsymmetric matrices
> > > > Type of parallelism: Working host
> > >
> > > The latest MUMPS version is 4.6.3 (June 2006), which is
> > > compatible with PETSc 2.3.2, released September 1, 2006.
> > >
> > > Since the problem also crashes/chokes with SuperLU_DIST,
> > > you need to investigate your matrix properties before
> > > calling a solver.
> >
> >
> > The PETSc interface to MUMPS does seem to give access to pivot control.
> > These lines are taken from the PETSc man page for mataijmumps:
> >
> >     -mat_mumps_cntl_1 <delta>   - relative pivoting threshold
> >     -mat_mumps_cntl_2 <tol>     - stopping criterion for refinement
> >     -mat_mumps_cntl_3 <adelta>  - absolute pivoting threshold
> >
> > However, the first lines of the MUMPS printout of the ANALYSIS STEP above
> > read, and I quote:
> > "
> >  ****** ANALYSIS STEP ********
> >
> >  ** Max-trans not allowed because matrix is distributed
> >  ** Scaling not allowed because matrix is distributed "
> >
> > It seems that this is not enabled within MUMPS itself, so whatever settings
> > I make with the flags above are bound to be ignored. I guess this goes to
> > the heart of my concern.
> >
> > Thanks a lot for your attention.
> > -
> > Leslie O. Baksmaty, Ph.D
> > Center for computational materials science,
> > School of Physics,
> > Georgia Institute of Technology
> >
> > 404-385-7189
> >
> >
> >
> > Quoting Hong Zhang <petsc-maint@mcs.anl.gov>:
> >
> > >
> > > leslie,
> > >
> > > > I am solving a complex-valued generalized eigenvalue problem, Hx=eMx,
> > > > with SLEPc, where x and e are the eigenvector and eigenvalue,
> > > > respectively.
> > > >
> > > > H is non-symmetric and non-positive and has the form:   A  -B
> > > >                                                         B* -A
> > > >
> > > > and M has the form:                                     I    0
> > > >                                                         0    I
> > > >
> > > >
> > > > This is a quantum mechanics problem, and the boundary conditions are set
> > > > at infinity, i.e. the eigenvectors go to zero on the boundaries of the
> > > > domain provided it is large enough, and HEREIN lies the problem: as I
> > > > increase the size of the domain, the LU factorization begins to choke.
> > > > Please see below for a printout from MUMPS. SuperLU_DIST also fails.
> > >
> > > Do you solve the eigenvalue problem with a shift-and-invert scheme?
> > > How does this LU factorization relate to your eigenvalue problem?
> > > How large is the matrix when it starts to choke
> > > ("choke"? do you mean the factored matrix becomes too large to store,
> > > or something else)?
> > > How do you know the factorization step did not complete?
> > > I don't see any error message below about the factorization
> > > crashing.
> > >
> > > > My suspicion is that INCREASING THE DOMAIN adds a large number of very
> > > > small pivots to the LU factorization. In some instances this problem also
> > > > occurs when the domain size remains the same but the mesh is made finer.
> > > >     I was particularly alarmed to read the following statements in the
> > > > MUMPS printout (below):
> > > > "
> > > >  ** Max-trans not allowed because matrix is distributed
> > > >  ** Scaling not allowed because matrix is distributed "
> > >
> > > Are you using the PETSc-MUMPS interface for the computation?
> > > Where do these statements come from?
> > > >
> > > > because I suppose proper scaling of the input matrices could fix this
> > > > problem. I suspect there is a way to ignore pivots that are too small in
> > > > the LU factorization?
> > >
> > > PETSc and MUMPS do not support ignoring small pivots.
> > >
> > > >
> > > > MUMPs printout:
> > > >
> > > >  ZMUMPS Version 4.5.5 -- October 2005
> > > > L U Solver for unsymmetric matrices
> > > > Type of parallelism: Working host
> > >
> > > The latest MUMPS version is 4.6.3 (June 2006), which is
> > > compatible with PETSc 2.3.2, released September 1, 2006.
> > >
> > > Since the problem also crashes/chokes with SuperLU_DIST,
> > > you need to investigate your matrix properties before
> > > calling a solver.
> > >
> > > Hong
> > >
> > >
> > > >
> > > >  ****** ANALYSIS STEP ********
> > > >
> > > >  ** Max-trans not allowed because matrix is distributed
> > > >  ** Scaling not allowed because matrix is distributed
> > > >  ... Structural symmetry (in percent)=  100
> > > >  Density: NBdense, Average, Median   =    0   13   13
> > > >  Ordering based on QAMD
> > > >  ** Peak of sequential stack size (number of real entries)   :  4603848.
> > > >  A root of estimated size         1372  has been selected for Scalapack.
> > > >
> > > > Leaving analysis phase with  ...
> > > > INFOG(1)                                       =           0
> > > > INFOG(2)                                       =           0
> > > >  -- (20) Number of entries in factors (estim.) =    40473004
> > > >  --  (3) Storage of factors  (REAL, estimated) =    44840522
> > > >  --  (4) Storage of factors  (INT , estimated) =     2921040
> > > >  --  (5) Maximum frontal size      (estimated) =        1574
> > > >  --  (6) Number of nodes in the tree           =       45752
> > > >  --  (7) Ordering option effectively used      =           6
> > > > ICNTL(6) Maximum transversal option            =           0
> > > > ICNTL(7) Pivot order option                    =           7
> > > > Percentage of memory relaxation (effective)    =          40
> > > > Number of level 2 nodes                        =          17
> > > > Number of nodes cut for better parallelism     =           3
> > > > RINFO(1) Operations during elimination (estim) =   1.830D+10
> > > >  ** Rank of processor needing largest memory in facto        :         6
> > > >  ** Estimated corresponding space in MBYTES for facto        :       281
> > > >  ** Estimated avg. space in MBYTES per working proc at facto :       247
> > > >  ** TOTAL     space in MBYTES for factorization              :      3964
> > > >
> > > >  ****** FACTORIZATION STEP ********
> > > >
> > > >
> > > >  GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
> > > >  NUMBER OF WORKING PROCESSES          =          16
> > > >  REAL SPACE FOR FACTORS               =    44840522
> > > >  INTEGER SPACE FOR FACTORS            =     2921040
> > > >  MAXIMUM FRONTAL SIZE (ESTIMATED)     =        1574
> > > >  NUMBER OF NODES IN THE TREE          =       45752
> > > >  MAXIMUM RELAXED VALUE OF MAXS        =     8805205
> > > >  AVERAGE RELAXED VALUE OF MAXS        =     6878605
> > > >
> > > >  REDISTRIB: TOTAL DATA LOCAL/SENT     =      367119     2986670
> > > >  GLOBAL TIME FOR MATRIX DISTRIBUTION  =      0.1078
> > > >  ** Memory relaxation parameter ( ICNTL(14)  )            :        40
> > > >  ** Rank of processor needing largest memory in facto     :         6
> > > >  ** Space in MBYTES used by this processor for facto      :       281
> > > >  ** Avg. Space in MBYTES per working proc during facto    :       247
> > > >
> > > >
> > > > Obviously, the factorization step did not complete.
> > > >
> > > >
> > > > Miscellaneous:
> > > > The matrix is generated using finite elements on a mesh from TRIANGLE
> > > > and ordered using METIS.
> > > >
> > > > --
> > > > Leslie O. Baksmaty, Ph.D
> > > > Center for computational materials science,
> > > > School of Physics,
> > > > Georgia Institute of Technology
> > > >
> > > > 404-385-7189
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
>

    [ Part 2, Text/X-FORTRAN (Name: "ksp_read_and_solve.F")  41 lines. ]

