mechanical analysis on large domains (>200k nodes)

Numerical methods and mathematical models of Elmer

mechanical analysis on large domains (>200k nodes)

Post by msui »

We are a small technology start-up company and use ElmerFEM with great satisfaction for our mechanical analyses. We now need to perform modal analysis on a large (>200k nodes) domain.
From our experience with Elmer on smaller domains (~10k - 100k nodes), the direct solver Umfpack handles this type of simulation perfectly.

However, on the large domain with the same settings, Umfpack terminates with an error message during solving. Alternatively, we have tried iterative solvers with ILU(n) preconditioning, but the memory consumption was very high and the computation time very long.

1. For the iterative solver settings: which preconditioning settings are suitable for getting a result?

2. The preconditioning is also a black box to us: where can we find an explanation of its working principles, or whom can we ask?

3. Preferably we would like to stick with direct solvers for reliable simulation performance, but Umfpack fails on the larger domains. Some comments on this user forum suggest using MUMPS, SuperLU or Pardiso instead. Could anybody give us clues on how to compile these alternative solvers?

Many thanks in advance and regards,

Matthijs Suijlen

Re: mechanical analysis on large domains (>200k nodes)

Post by Juha »

Hi,

Direct solvers use a lot of memory, and potentially a lot of computation time. That said, you
have a few options:

o Compile & install SuiteSparse's CHOLMOD package, and compile &
install ElmerSolver with the CHOLMOD library available. The Cholesky factorization
implemented by CHOLMOD is quite a bit faster and less memory-hungry
than the LU factorization of Umfpack (also part of SuiteSparse) for positive definite
systems. If you'd like to go this route, I can provide some installation instructions.

o Go parallel and use MUMPS. Unless you are running the Ubuntu distribution of Elmer,
you'd need to compile & install ElmerSolver from sources and have the MUMPS library
available at compile time.
http://www.elmerfem.org/wiki/index.php/ ... on_(Linux)
has some installation instructions; not sure how up to date these are.

o Go parallel and try out the (still somewhat experimental) FETI solver. CHOLMOD
is usable here as well and will speed things up considerably. Probably the fastest solver
available for positive definite systems, but only usable in parallel (at least for the time being).

For all (structurally) symmetric systems, remember to set "Linear System Symmetric = True"
in the Solver section of the .sif file.
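
For concreteness, a minimal Solver-section sketch along these lines could look as follows. The keywords are the standard Elmer ones discussed in this thread; the particular combination, the number of eigenvalues and the solver index are only an illustration, not a recipe for your specific case:

Code: Select all

Solver 1
  Equation = Linear elasticity
  Procedure = "StressSolve" "StressSolver"

  ! eigenmode (modal) analysis
  Eigen Analysis = True
  Eigen System Values = 10

  ! direct solution of the (structurally) symmetric system
  Linear System Solver = Direct
  Linear System Direct Method = Mumps   ! or Umfpack, or cholmod when that library is linked in
  Linear System Symmetric = True
End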

Regards, Juha

Re: mechanical analysis on large domains (>200k nodes)

Post by msui »

Hello Juha,

Thanks a lot for your extensive reply! We already tried to get the MUMPS solver up and running last week, but haven't looked at the other options yet. Compiling ElmerSolver is not a problem; we have been running custom builds from your repository for about half a year now. Getting it to compile with MUMPS and HYPRE linked in on our openSUSE systems, however, turned out to be quite a nightmare, so after a day or two of struggling to get all the dependencies compiled we continued our experiments on a virtualized Debian system (v6.0.1a). That made a huge difference, as we didn't need to compile anything ourselves (thanks to hazelsct's efforts, which are invaluable!).

We got quite far already and succeeded in partitioning the mesh for the attached test case (following the hints in these threads: viewtopic.php?f=4&t=46, viewtopic.php?f=3&t=452, viewtopic.php?f=3&t=102). This test case runs fine with Umfpack, but breaks with MUMPS:

Code: Select all

ELMER SOLVER (v 6.1) STARTED AT: 2011/06/29 13:03:01
ELMER SOLVER (v 6.1) STARTED AT: 2011/06/29 13:03:01
ParCommInit:  Initialize #PEs:            2
MAIN:
MAIN: ==========================================
MAIN:  E L M E R  S O L V E R  S T A R T I N G
MAIN:  Library version: 6.1 (Rev: 5210)
MAIN:  Running in parallel using 2 tasks.
MAIN:  HYPRE library linked in.
MAIN:  MUMPS library linked in.
MAIN: ==========================================
MAIN:
MAIN:
MAIN: -----------------------
MAIN: Reading Model ...
Loading user function library: [StressSolve]...[StressSolver_Init0]
Loading user function library: [ResultOutputSolve]...[ResultOutputSolver_Init0]
LoadMesh: Scaling coordinates: 1.000E-03 1.000E-03 1.000E-03
MAIN: Done
MAIN: -----------------------
Loading user function library: [StressSolve]...[StressSolver_Init]
Loading user function library: [StressSolve]...[StressSolver]
OptimizeBandwidth: ---------------------------------------------------------
OptimizeBandwidth: Computing matrix structure for: linear elasticity...done.
OptimizeBandwidth:  Half bandwidth without optimization:        12254
OptimizeBandwidth:
OptimizeBandwidth: Bandwidth Optimization ...done.
OptimizeBandwidth:  Half bandwidth after optimization:         1272
OptimizeBandwidth: ---------------------------------------------------------
Loading user function library: [ResultOutputSolve]...[ResultOutputSolver_Init]
Loading user function library: [ResultOutputSolve]...[ResultOutputSolver]
MAIN:
MAIN: -------------------------------------
MAIN:  Steady state iteration:            1
MAIN: -------------------------------------
MAIN:
StressSolve:
StressSolve:
StressSolve: -------------------------------------
StressSolve:  DISPLACEMENT SOLVER ITERATION           1
StressSolve: -------------------------------------
StressSolve:
StressSolve: Starting assembly...
StressSolve: Assembly:
: .................... 60%
: .............Assembly done
DefUtils::DefaultDirichletBCs: Setting Dirichlet boundary conditions
DefUtils::DefaultDirichletBCs: Dirichlet boundary conditions set
StressSolve: Set boundaries done
WARNING:: ListFind:
WARNING:: ListFind:  Requested property: [Linear System Convergence Tolerance], not found
WARNING:: ListFind:
WARNING:: ListFind:
WARNING:: ListFind:  Requested property: [Linear System Convergence Tolerance], not found
WARNING:: ListFind:
[debianmatthijs:4371] *** An error occurred in MPI_Allreduce
[debianmatthijs:4371] *** on communicator MPI_COMM_WORLD
[debianmatthijs:4371] *** MPI_ERR_OP: invalid reduce operation
[debianmatthijs:4371] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 4371 on
node debianmatthijs exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[debianmatthijs:04370] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[debianmatthijs:04370] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Do you have any idea what could be going on here? Is there anything we're missing?

We're certainly interested in looking into SuiteSparse's CHOLMOD package, so if you could share the installation instructions we would be very grateful!

regards,

Matthijs Suijlen

P.S. All libraries used were precompiled and downloaded from the Debian repositories.
Attachments
testcase.tar.gz
solver input file and meshed geometry of testcase

Re: mechanical analysis on large domains (>200k nodes)

Post by Juha »

Hi,
I tried your test case with MVAPICH2 and it seemed to work OK. Any possibility
of a mixup between the MPI that ElmerSolver was compiled with and your installed MPI?
Your MPI is OpenMPI, I guess?
Juha

Re: mechanical analysis on large domains (>200k nodes)

Post by Juha »

Hi, here are some instructions for CHOLMOD:

o Get SuiteSparse
http://www.cise.ufl.edu/research/sparse/SuiteSparse/
o Get Metis
http://glaros.dtc.umn.edu/gkhome/metis/metis/download
Easiest to put the Metis source under SuiteSparse/.
o Edit SuiteSparse/UFconfig/UFconfig.mk
- change the installation directory
- add -fPIC to CFLAGS on x86_64 platforms
o Edit SuiteSparse/metis-4.x.x/Lib/Makefile
- add -fPIC to CFLAGS on x86_64 platforms
o make && make install
o replace $ELMER_HOME/lib/libamd.a with SuiteSparseInstall/libamd.a
o Compile elmer/fem with

export CHOLMOD_DIR=SuiteSparseInstall/
export CFLAGS="$CFLAGS -DHAVE_CHOLMOD -I$CHOLMOD_DIR/"
export FCPPFLAGS="$FCPPFLAGS -DHAVE_CHOLMOD"

and, for example,

configure .. --with-blas="-L$CHOLMOD_DIR -lcholmod -lcamd -lccolamd -lcolamd -lmetis -lblas"

or some such thing.
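
Pulling those steps together, roughly as a shell sketch (the install directory, source-tree paths and the configure prefix here are placeholders you'll need to adapt to your own setup):

Code: Select all

# assumes SuiteSparse (with the Metis sources unpacked under SuiteSparse/) and an Elmer source tree,
# and that UFconfig.mk and metis-4.x.x/Lib/Makefile have already been edited as described above
export CHOLMOD_DIR=$HOME/SuiteSparseInstall   # the installation directory set in UFconfig.mk
cd SuiteSparse && make && make install && cd ..

# replace Elmer's bundled AMD library with the one from SuiteSparse
cp $CHOLMOD_DIR/libamd.a $ELMER_HOME/lib/libamd.a

# build elmer/fem against CHOLMOD
cd elmerfem/fem
export CFLAGS="$CFLAGS -DHAVE_CHOLMOD -I$CHOLMOD_DIR/"
export FCPPFLAGS="$FCPPFLAGS -DHAVE_CHOLMOD"
./configure --prefix=$ELMER_HOME \
  --with-blas="-L$CHOLMOD_DIR -lcholmod -lcamd -lccolamd -lcolamd -lmetis -lblas"
make && make install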

Then, when running, use the flags "Linear System Symmetric=True; Linear System Direct Method=cholmod"
in the Solver section of the .sif file.

Regards, Juha

Re: mechanical analysis on large domains (>200k nodes)

Post by msui »

Hello Juha,
Juha wrote: Any possibility of a mixup between the MPI that ElmerSolver was compiled with and your installed MPI? Your MPI is OpenMPI, I guess?
Juha
Yes, we have only tried OpenMPI so far. Whatever we tried on Debian 6.0.1a or Ubuntu 10.04, we couldn't get our test case to complete at all when using MUMPS / OpenMPI. We tried the binaries from the repositories, both revision #4499 and revision #5210, of which only the latter appeared to be compiled with MPI enabled. We tested #5210 with v1.4.1-2, v1.4.2-4 and v1.4.3-2.1 of the OpenMPI libraries, without success. Each time we hit exactly the same error as mentioned in my first message.

We did manage to get our simple test case running with a freshly compiled version from trunk (revision #5261, on Ubuntu 10.04 64-bit, using OpenMPI 1.4.1-2). So apparently something is mixed up in the repositories.

However, for our current eigenmode simulation case we still end up with an abrupt abort:

Code: Select all

Starting program Elmergrid
Elmergrid reading in-line arguments
The mesh will be partitioned with simple division to 4 partitions.
Nodes that do not appear in any element will be removed
Output will be saved to file /home/innoluce/Documents/eigenmodes_grootmodel_LPv8.

Elmergrid loading data:
-----------------------
Loading mesh in ElmerSolver format from directory /home/innoluce/Documents/eigenmodes_grootmodel_LPv8.
Loading header from mesh.header
Allocating for 155679 knots and 71352 elements.
Loading 155679 Elmer nodes from mesh.nodes
Loading 71352 bulk elements from mesh.elements
Loading 54596 boundary elements from mesh.boundary
All done

Elmergrid creating and manipulating meshes:
-------------------------------------------
All 155679 nodes were used by the mesh elements

Elmergrid partitioning meshes:
------------------------------
Making a simple partitioning for 71352 elements in 3-dimensions.
Ordering in the 2nd direction.
Ordering in the 3rd direction.
Creating an inverse topology of the finite element mesh
There are from 1 to 42 connections in the inverse topology.
Set the node partitions by the dominating element partition.
There are from 38069 to 39743 nodes in the 4 partitions.
Succesfully made a partitioning with 17838 to 17838 elements.
Optimizing the partitioning at boundaries.
Ownership of 0 parents was changed at BCs
Creating a table showing all parenting partitions of nodes.
Nodes belong to 4 partitions in maximum
There are 31394 shared nodes which is 20.17 % of all nodes.
The initial owner was not any of the elements for 0 nodes
Checking for partitioning
Distribution of elements, nodes and shared nodes
     partition  elements   nodes      shared    
     1          17838      39160      8284      
     2          17838      38707      7133      
     3          17838      39743      8648      
     4          17838      38069      7702      
Maximum deviation in ownership 1674
Average deviation in ownership 613.29
Checking for problematic sharings
There shouldn't be any problematic sharings, knock, knock...
A posteriori checking
Checking for partitioning
Distribution of elements, nodes and shared nodes
     partition  elements   nodes      shared    
     1          17838      39160      8284      
     2          17838      38707      7133      
     3          17838      39743      8648      
     4          17838      38069      7702      
The partitioning was optimized.

Elmergrid saving data:
----------------------
Saving mesh in parallel ElmerSolver format to directory /home/innoluce/Documents/eigenmodes_grootmodel_LPv8/partitioning.4.
Nodes belong to 4 partitions in maximum
Saving mesh for 4 partitions
   part  elements   nodes      shared   bc elems orphan  
   1     17838      39160      8284     16251    1852    
   2     17838      38707      7133     15967    3067    
   3     17838      39743      8648     16908    1844    
   4     17838      38069      7702     15883    3650    
Writing of partitioned mesh finished

Thank you for using Elmergrid!
Send bug reports and feature wishes to peter.raback@csc.fi
EigenSolve: ...................ELMER SOLVER (v 6.2) STARTED AT: 2011/07/05 14:45:17
ELMER SOLVER (v 6.2) STARTED AT: 2011/07/05 14:45:17
ELMER SOLVER (v 6.2) STARTED AT: 2011/07/05 14:45:17
ELMER SOLVER (v 6.2) STARTED AT: 2011/07/05 14:45:17
ParCommInit:  Initialize #PEs:            4
MAIN: 
MAIN: ==========================================
MAIN:  E L M E R  S O L V E R  S T A R T I N G
MAIN:  Library version: 6.2 (Rev: 5261)
MAIN:  Running in parallel using 4 tasks.
MAIN:  HYPRE library linked in.
MAIN:  MUMPS library linked in.
MAIN: ==========================================
MAIN: 
MAIN: 
MAIN: -----------------------
MAIN: Reading Model ...
Loading user function library: [StressSolve]...[StressSolver_Init0]
Loading user function library: [ResultOutputSolve]...[ResultOutputSolver_Init0]
LoadMesh: Scaling coordinates: 1.000E-03 1.000E-03 1.000E-03
MAIN: Done
MAIN: -----------------------
Loading user function library: [StressSolve]...[StressSolver_Init]
Loading user function library: [StressSolve]...[StressSolver]
OptimizeBandwidth: ---------------------------------------------------------
OptimizeBandwidth: Computing matrix structure for: linear elasticity...done.
OptimizeBandwidth:  Half bandwidth without optimization:        47215
OptimizeBandwidth: 
OptimizeBandwidth: Bandwidth Optimization ...done.
OptimizeBandwidth:  Half bandwidth after optimization:          621
OptimizeBandwidth: ---------------------------------------------------------
Loading user function library: [ResultOutputSolve]...[ResultOutputSolver_Init]
Loading user function library: [ResultOutputSolve]...[ResultOutputSolver]
MAIN: 
MAIN: -------------------------------------
MAIN:  Steady state iteration:            1
MAIN: -------------------------------------
MAIN: 
StressSolve: 
StressSolve: 
StressSolve: -------------------------------------
StressSolve:  DISPLACEMENT SOLVER ITERATION           1
StressSolve: -------------------------------------
StressSolve: 
StressSolve: Starting assembly...
StressSolve: Assembly:
: .............  6%
: ................... 13%
: .................... 23%
: .................... 52%
: .................... 87%
: ........Assembly done
DefUtils::DefaultDirichletBCs: Setting Dirichlet boundary conditions
DefUtils::DefaultDirichletBCs: Dirichlet boundary conditions set
StressSolve: Set boundaries done
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 4240 on node ubuntu exited on signal 9 (Killed).
--------------------------------------------------------------------------
Please advise on a possible cause. We are working on a test geometry we will share for reproducing this failure.
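
For reference, the partitioning and the parallel run were launched in the usual ElmerGrid/mpirun way, roughly as sketched below (the partition counts and the mesh/case names here are placeholders rather than the exact command lines we used):

Code: Select all

# split the serial Elmer mesh into 4 partitions by simple geometric division
ElmerGrid 2 2 eigenmodes_grootmodel_LPv8 -partition 1 2 2

# launch the parallel solver on the 4 partitions; the case name is read from ELMERSOLVER_STARTINFO
mpirun -np 4 ElmerSolver_mpi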

regards, Matthijs

Re: mechanical analysis on large domains (>200k nodes)

Post by hazelsct »

Hello,

You can build the Debian package from source using the instructions on the Wiki. To change the MPI implementation, just install MPICH2 or LAM and then run (as root):

Code: Select all

update-alternatives --config mpi
which will let you choose between the installed MPI implementations. You need to do this at compile time, as the MPI standard is API-compatible but not ABI-compatible. Also make sure your runtime is consistent with the build environment, using:

Code: Select all

update-alternatives --config mpirun
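
As a quick sanity check (not strictly necessary), you can also list what the alternatives currently point to and confirm that the launcher reports the implementation you compiled against:

Code: Select all

# show which MPI compiler wrappers and launcher are currently selected
update-alternatives --display mpi
update-alternatives --display mpirun
mpirun --version   # should report the implementation Elmer was compiled against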
Glad to hear you like the Debian packaging!

-Adam

Re: mechanical analysis on large domains (>200k nodes)

Post by dvlierop »

Thanks Adam!

We compiled Elmer with MPICH2 for a change, but for this we also had to change the make script (obtained here: http://www.elmerfem.org/wiki/index.php/ ... ation_page), as it is hardcoded to use OpenMPI:

Code: Select all

#the compilers
export CC=mpicc.mpich2
export CXX=mpic++.mpich2
export FC=mpif90.mpich2
export F77=mpif90.mpich2
After using update-alternatives and applying the change to the script, compilation went fine, but we're now running into another MPI error:

Code: Select all

Starting program Elmergrid
Elmergrid reading in-line arguments
The mesh will be partitioned with simple division to 4 partitions.
Nodes that do not appear in any element will be removed
Output will be saved to file /home/innoluce/Documents/schijf_MUMPStest.

Elmergrid loading data:
-----------------------
Loading mesh in ElmerSolver format from directory /home/innoluce/Documents/schijf_MUMPStest.
Loading header from mesh.header
Allocating for 45163 knots and 227768 elements.
Loading 45163 Elmer nodes from mesh.nodes
Loading 227768 bulk elements from mesh.elements
Loading 25032 boundary elements from mesh.boundary
All done

Elmergrid creating and manipulating meshes:
-------------------------------------------
All 45163 nodes were used by the mesh elements

Elmergrid partitioning meshes:
------------------------------
Making a simple partitioning for 227768 elements in 3-dimensions.
Ordering in the 2nd direction.
Ordering in the 3rd direction.
Creating an inverse topology of the finite element mesh
There are from 2 to 72 connections in the inverse topology.
Set the node partitions by the dominating element partition.
There are from 11225 to 11341 nodes in the 4 partitions.
Succesfully made a partitioning with 56942 to 56942 elements.
Optimizing the partitioning at boundaries.
Ownership of 0 parents was changed at BCs
Creating a table showing all parenting partitions of nodes.
Nodes belong to 4 partitions in maximum
There are 9422 shared nodes which is 20.86 % of all nodes.
The initial owner was not any of the elements for 0 nodes
Checking for partitioning
Distribution of elements, nodes and shared nodes
     partition  elements   nodes      shared    
     1          56942      11286      2574      
     2          56942      11225      2529      
     3          56942      11341      2391      
     4          56942      11311      2278      
Maximum deviation in ownership 116
Average deviation in ownership 42.66
Checking for problematic sharings
Changed the ownership of 8 nodes
Partitions 2 and 3 in element 8443 (3 owners) oddly related 1 times
Partitions 4 and 1 in element 10755 (3 owners) oddly related 1 times
Partitions 3 and 2 in element 92964 (3 owners) oddly related 1 times
Partitions 4 and 1 in element 192196 (3 owners) oddly related 1 times
Changed the ownership of 4 nodes
Partitions 2 and 3 in element 8443 (3 owners) oddly related 1 times
Partitions 4 and 1 in element 10755 (3 owners) oddly related 1 times
Partitions 3 and 2 in element 92964 (3 owners) oddly related 1 times
Partitions 4 and 1 in element 192196 (3 owners) oddly related 1 times
Changed the ownership of 4 nodes
4 problematic sharings may still exist
A posteriori checking
Checking for partitioning
Distribution of elements, nodes and shared nodes
     partition  elements   nodes      shared    
     1          56942      11283      2577      
     2          56942      11226      2528      
     3          56942      11344      2388      
     4          56942      11310      2279      
The partitioning was optimized.

Elmergrid saving data:
----------------------
Saving mesh in parallel ElmerSolver format to directory /home/innoluce/Documents/schijf_MUMPStest/partitioning.4.
Nodes belong to 4 partitions in maximum
Saving mesh for 4 partitions
   part  elements   nodes      shared   bc elems orphan  
   1     56942      11283      2577     6516     8       
   2     56942      11226      2528     6395     10      
   3     56942      11344      2388     6292     136     
   4     56942      11310      2279     6130     147     
Writing of partitioned mesh finished

Thank you for using Elmergrid!
Send bug reports and feature wishes to peter.raback@csc.fi
ELMER SOLVER (v 6.2) STARTED AT: 2011/07/08 10:54:21
ELMER SOLVER (v 6.2) STARTED AT: 2011/07/08 10:54:21
ELMER SOLVER (v 6.2) STARTED AT: 2011/07/08 10:54:21
ELMER SOLVER (v 6.2) STARTED AT: 2011/07/08 10:54:21
ParCommInit:  Initialize #PEs:            4
MAIN: 
MAIN: ==========================================
MAIN:  E L M E R  S O L V E R  S T A R T I N G
MAIN:  Library version: 6.2 (Rev: 5267)
MAIN:  Running in parallel using 4 tasks.
MAIN:  HYPRE library linked in.
MAIN:  MUMPS library linked in.
MAIN: ==========================================
MAIN: 
MAIN: 
MAIN: -----------------------
MAIN: Reading Model ...
Loading user function library: [StressSolve]...[StressSolver_Init0]
Loading user function library: [ResultOutputSolve]...[ResultOutputSolver_Init0]
LoadMesh: Scaling coordinates: 1.000E-03 1.000E-03 1.000E-03
MAIN: Done
MAIN: -----------------------
Loading user function library: [StressSolve]...[StressSolver_Init]
Loading user function library: [StressSolve]...[StressSolver]
OptimizeBandwidth: ---------------------------------------------------------
OptimizeBandwidth: Computing matrix structure for: linear elasticity...done.
OptimizeBandwidth:  Half bandwidth without optimization:        13626
OptimizeBandwidth: 
OptimizeBandwidth: Bandwidth Optimization ...done.
OptimizeBandwidth:  Half bandwidth after optimization:          521
OptimizeBandwidth: ---------------------------------------------------------
Loading user function library: [ResultOutputSolve]...[ResultOutputSolver_Init]
Loading user function library: [ResultOutputSolve]...[ResultOutputSolver]
MAIN: 
MAIN: -------------------------------------
MAIN:  Steady state iteration:            1
MAIN: -------------------------------------
MAIN: 
StressSolve: 
StressSolve: 
StressSolve: -------------------------------------
StressSolve:  DISPLACEMENT SOLVER ITERATION           1
StressSolve: -------------------------------------
StressSolve: 
StressSolve: Starting assembly...
StressSolve: Assembly:
: .....Assembly done
DefUtils::DefaultDirichletBCs: Setting Dirichlet boundary conditions
DefUtils::DefaultDirichletBCs: Dirichlet boundary conditions set
StressSolve: Set boundaries done
WARNING:: ListFind: 
WARNING:: ListFind:  Requested property: [Linear System Convergence Tolerance], not found
WARNING:: ListFind: 
WARNING:: ListFind: 
WARNING:: ListFind:  Requested property: [Linear System Convergence Tolerance], not found
WARNING:: ListFind: 
WARNING:: ListFind: 
WARNING:: ListFind:  Requested property: [Linear System Convergence Tolerance], not found
WARNING:: ListFind: 
WARNING:: ListFind: 
WARNING:: ListFind:  Requested property: [Linear System Convergence Tolerance], not found
WARNING:: ListFind: 
Fatal error in MPI_Allreduce: Invalid MPI_Op, error stack:
MPI_Allreduce(773): MPI_Allreduce(sbuf=0x7fffd6d37e08, rbuf=0x7fffd6d37e00, count=1, INVALID DATATYPE, op=0x1, comm=0x84000002) failed
MPI_Allreduce(637): Invalid MPI_Op
MPI_Allreduce(636): Invalid datatype
Fatal error in MPI_Allreduce: Invalid MPI_Op, error stack:
MPI_Allreduce(773): MPI_Allreduce(sbuf=0x7fff74c2a3e8, rbuf=0x7fff74c2a3e0, count=1, INVALID DATATYPE, op=0x1, comm=0x84000002) failed
MPI_Allreduce(637): Invalid MPI_Op
MPI_Allreduce(636): Invalid datatype
Fatal error in MPI_Allreduce: Invalid MPI_Op, error stack:
MPI_Allreduce(773): MPI_Allreduce(sbuf=0x7fff7aba7498, rbuf=0x7fff7aba7490, count=1, INVALID DATATYPE, op=0x1, comm=0x84000004) failed
MPI_Allreduce(637): Invalid MPI_Op
MPI_Allreduce(636): Invalid datatype
Fatal error in MPI_Allreduce: Invalid MPI_Op, error stack:
MPI_Allreduce(773): MPI_Allreduce(sbuf=0x7fff103d4908, rbuf=0x7fff103d4900, count=1, INVALID DATATYPE, op=0x1, comm=0x84000002) failed
MPI_Allreduce(637): Invalid MPI_Op
MPI_Allreduce(636): Invalid datatype
EigenSolve: .rank 3 in job 1  ubuntu_42672   caused collective abort of all ranks
  exit status of rank 3: killed by signal 9 
We will now try LAM, and after that MVAPICH2, as Juha suggested.
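
As an aside, a quick way to check which MPI libraries the solver binary is actually linked against (assuming the standard ElmerSolver_mpi binary name) is:

Code: Select all

# list the MPI shared libraries the parallel solver picks up at runtime
ldd $(which ElmerSolver_mpi) | grep -i mpi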

Diederik

Re: mechanical analysis on large domains (>200k nodes)

Post by dvlierop »

Hi Juha,

LAM installs just fine, but apparently it does not provide an F90 compiler (as confirmed here: http://www.lam-mpi.org/MailArchives/lam ... 1/9186.php). We have not looked into ways around this, if any, but continued our trials with MVAPICH2 (v1.6) as suggested. Compiling took some time to figure out but was not too difficult in the end; however, we again ran into errors similar to before when running our test case:

Code: Select all

Starting program Elmergrid
Elmergrid reading in-line arguments
The mesh will be partitioned with simple division to 4 partitions.
Nodes that do not appear in any element will be removed
Output will be saved to file /home/innoluce/Documents/schijf_MUMPStest.

Elmergrid loading data:
-----------------------
Loading mesh in ElmerSolver format from directory /home/innoluce/Documents/schijf_MUMPStest.
Loading header from mesh.header
Allocating for 45163 knots and 227768 elements.
Loading 45163 Elmer nodes from mesh.nodes
Loading 227768 bulk elements from mesh.elements
Loading 25032 boundary elements from mesh.boundary
All done

Elmergrid creating and manipulating meshes:
-------------------------------------------
All 45163 nodes were used by the mesh elements

Elmergrid partitioning meshes:
------------------------------
Making a simple partitioning for 227768 elements in 3-dimensions.
Ordering in the 2nd direction.
Ordering in the 3rd direction.
Creating an inverse topology of the finite element mesh
There are from 2 to 72 connections in the inverse topology.
Set the node partitions by the dominating element partition.
There are from 11225 to 11341 nodes in the 4 partitions.
Succesfully made a partitioning with 56942 to 56942 elements.
Optimizing the partitioning at boundaries.
Ownership of 0 parents was changed at BCs
Creating a table showing all parenting partitions of nodes.
Nodes belong to 4 partitions in maximum
There are 9422 shared nodes which is 20.86 % of all nodes.
The initial owner was not any of the elements for 0 nodes
Checking for partitioning
Distribution of elements, nodes and shared nodes
     partition  elements   nodes      shared    
     1          56942      11286      2574      
     2          56942      11225      2529      
     3          56942      11341      2391      
     4          56942      11311      2278      
Maximum deviation in ownership 116
Average deviation in ownership 42.66
Checking for problematic sharings
Changed the ownership of 8 nodes
Partitions 2 and 3 in element 8443 (3 owners) oddly related 1 times
Partitions 4 and 1 in element 10755 (3 owners) oddly related 1 times
Partitions 3 and 2 in element 92964 (3 owners) oddly related 1 times
Partitions 4 and 1 in element 192196 (3 owners) oddly related 1 times
Changed the ownership of 4 nodes
Partitions 2 and 3 in element 8443 (3 owners) oddly related 1 times
Partitions 4 and 1 in element 10755 (3 owners) oddly related 1 times
Partitions 3 and 2 in element 92964 (3 owners) oddly related 1 times
Partitions 4 and 1 in element 192196 (3 owners) oddly related 1 times
Changed the ownership of 4 nodes
4 problematic sharings may still exist
A posteriori checking
Checking for partitioning
Distribution of elements, nodes and shared nodes
     partition  elements   nodes      shared    
     1          56942      11283      2577      
     2          56942      11226      2528      
     3          56942      11344      2388      
     4          56942      11310      2279      
The partitioning was optimized.

Elmergrid saving data:
----------------------
Saving mesh in parallel ElmerSolver format to directory /home/innoluce/Documents/schijf_MUMPStest/partitioning.4.
Nodes belong to 4 partitions in maximum
Saving mesh for 4 partitions
   part  elements   nodes      shared   bc elems orphan  
   1     56942      11283      2577     6516     8       
   2     56942      11226      2528     6395     10      
   3     56942      11344      2388     6292     136     
   4     56942      11310      2279     6130     147     
Writing of partitioned mesh finished

Thank you for using Elmergrid!
Send bug reports and feature wishes to peter.raback@csc.fi
ELMER SOLVER (v 6.2) STARTED AT: 2011/07/08 14:00:09
ELMER SOLVER (v 6.2) STARTED AT: 2011/07/08 14:00:09
ELMER SOLVER (v 6.2) STARTED AT: 2011/07/08 14:00:09
ELMER SOLVER (v 6.2) STARTED AT: 2011/07/08 14:00:09
ParCommInit:  Initialize #PEs:            4
MAIN: 
MAIN: ==========================================
MAIN:  E L M E R  S O L V E R  S T A R T I N G
MAIN:  Library version: 6.2 (Rev: 5267)
MAIN:  Running in parallel using 4 tasks.
MAIN:  HYPRE library linked in.
MAIN:  MUMPS library linked in.
MAIN: ==========================================
MAIN: 
MAIN: 
MAIN: -----------------------
MAIN: Reading Model ...
Loading user function library: [StressSolve]...[StressSolver_Init0]
Loading user function library: [ResultOutputSolve]...[ResultOutputSolver_Init0]
LoadMesh: Scaling coordinates: 1.000E-03 1.000E-03 1.000E-03
MAIN: Done
MAIN: -----------------------
Loading user function library: [StressSolve]...[StressSolver_Init]
Loading user function library: [StressSolve]...[StressSolver]
OptimizeBandwidth: ---------------------------------------------------------
OptimizeBandwidth: Computing matrix structure for: linear elasticity...done.
OptimizeBandwidth:  Half bandwidth without optimization:        13626
OptimizeBandwidth: 
OptimizeBandwidth: Bandwidth Optimization ...done.
OptimizeBandwidth:  Half bandwidth after optimization:          521
OptimizeBandwidth: ---------------------------------------------------------
Loading user function library: [ResultOutputSolve]...[ResultOutputSolver_Init]
Loading user function library: [ResultOutputSolve]...[ResultOutputSolver]
MAIN: 
MAIN: -------------------------------------
MAIN:  Steady state iteration:            1
MAIN: -------------------------------------
MAIN: 
StressSolve: 
StressSolve: 
StressSolve: -------------------------------------
StressSolve:  DISPLACEMENT SOLVER ITERATION           1
StressSolve: -------------------------------------
StressSolve: 
StressSolve: Starting assembly...
StressSolve: Assembly:
: ........Assembly done
DefUtils::DefaultDirichletBCs: Setting Dirichlet boundary conditions
DefUtils::DefaultDirichletBCs: Dirichlet boundary conditions set
StressSolve: Set boundaries done
WARNING:: ListFind: 
WARNING:: ListFind:  Requested property: [Linear System Convergence Tolerance], not found
WARNING:: ListFind: 
WARNING:: ListFind: 
WARNING:: ListFind:  Requested property: [Linear System Convergence Tolerance], not found
WARNING:: ListFind: 
WARNING:: ListFind: 
WARNING:: ListFind:  Requested property: [Linear System Convergence Tolerance], not found
WARNING:: ListFind: 
WARNING:: ListFind: 
WARNING:: ListFind:  Requested property: [Linear System Convergence Tolerance], not found
WARNING:: ListFind: 
Fatal error in MPI_Allreduce: Invalid MPI_Op, error stack:
MPI_Allreduce(773): MPI_Allreduce(sbuf=0x7fff0d210e28, rbuf=0x7fff0d210e20, count=1, INVALID DATATYPE, op=0x1, comm=0x84000004) failed
MPI_Allreduce(637): Invalid MPI_Op
MPI_Allreduce(636): Invalid datatype
Fatal error in MPI_Allreduce: Invalid MPI_Op, error stack:
MPI_Allreduce(773): MPI_Allreduce(sbuf=0x7fffe8d11238, rbuf=0x7fffe8d11230, count=1, INVALID DATATYPE, op=0x1, comm=0x84000002) failed
MPI_Allreduce(637): Invalid MPI_Op
MPI_Allreduce(636): Invalid datatype
Fatal error in MPI_Allreduce: Invalid MPI_Op, error stack:
MPI_Allreduce(773): MPI_Allreduce(sbuf=0x7fff675cf428, rbuf=0x7fff675cf420, count=1, INVALID DATATYPE, op=0x1, comm=0x84000002) failed
MPI_Allreduce(637): Invalid MPI_Op
MPI_Allreduce(636): Invalid datatype
Fatal error in MPI_Allreduce: Invalid MPI_Op, error stack:
MPI_Allreduce(773): MPI_Allreduce(sbuf=0x7fffa06000e8, rbuf=0x7fffa06000e0, count=1, INVALID DATATYPE, op=0x1, comm=0x84000002) failed
MPI_Allreduce(637): Invalid MPI_Op
MPI_Allreduce(636): Invalid datatype
EigenSolve: .
=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 256
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
Juha, could you please check whether you can reproduce this on your system using our test case? See http://www.mediafire.com/file/pq24gwqcw ... est.tar.gz. If not, could you report the version you have been using? I wouldn't be surprised if this "Invalid datatype" error is caused by a version conflict or something similar.

Thanks in advance!

Diederik

Re: mechanical analysis on large domains (>200k nodes)

Post by msui »

Hi Juha,

In the meantime we installed the CHOLMOD solver of SuiteSparse on Ubuntu 10.04 following your instructions, and we managed to get it compiled and running (don't ask how; I'm afraid we cannot reproduce the compilation ;-)). With it we ran a successful eigenmode simulation on a model of ~20k nodes. Memory consumption and CPU time (~3 GB and ~16 min) are a bit disappointing compared to the same simulation with Umfpack (~1 GB and ~1.5 min). Unfortunately, the simulation using CHOLMOD on our large domain geometry (~300k nodes) did not finish: the process segfaulted after a few hours without any warning or feedback :( . The memory consumption was huge and it took very long, so we did not continue this effort.

Today, however, we managed to set up the OpenMPI MUMPS solver in Elmer on an Ubuntu 11.04 system, which produced a successful eigenmode simulation for our 300k-node domain. This is in contrast to the OpenMPI version on Ubuntu 10.04, which aborted abruptly (see the July 5th message) and did not return results. Although eigenvalues are returned, we still run into the error message below:

Code: Select all

StressSolve: Set boundaries done
EigenSolve: ................................
EigenSolve: EIGEN SYSTEM SOLUTION COMPLETE:
EigenSolve: 
EigenSolve:  The convergence criterion is   1.00000000000000002E-003
EigenSolve:   The number of converged Ritz values is           10
EigenSolve: 
EigenSolve: Computed Eigen Values:
EigenSolve: --------------------------------
EigenSolve:            1 (  7376266293.2724094     ,  0.0000000000000000     )
EigenSolve:            2 (  14454950628.288219     ,  0.0000000000000000     )
EigenSolve:            3 (  14583173275.291462     ,  0.0000000000000000     )
EigenSolve:            4 (  124316394240.53374     ,  0.0000000000000000     )
EigenSolve:            5 (  312123041250.91504     ,  0.0000000000000000     )
EigenSolve:            6 (  313044861267.48865     ,  0.0000000000000000     )
EigenSolve:            7 (  1279661684170.7180     ,  0.0000000000000000     )
EigenSolve:            8 (  1284478971543.0984     ,  0.0000000000000000     )
EigenSolve:            9 (  1458684570276.8904     ,  0.0000000000000000     )
EigenSolve:           10 (  1645397036445.7429     ,  0.0000000000000000     )
EigenSolve: --------------------------------
ComputeChange: SS (ITER=1) (NRM,RELC): (  0.0000000      0.0000000     ) :: linear elasticity
ElmerSolver: *** Elmer Solver: ALL DONE ***
*** The MPI_Comm_f2c() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[ubuntu:26471] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!
*** The MPI_Comm_f2c() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[ubuntu:26470] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!
Assuming that this error is quite harmless, we can conclude that we have found a suitable direct solver for our large-domain eigenmode (and other mechanical analysis) simulations. The 300k-node problem was solved within half an hour using ~10 GB of memory, which our system can still handle :).

But is this error really harmless? Do you have any idea what causes it or how it can be resolved?

Thanks in advance and
regards (also from Diederik),

Matthijs