High order infinite loop during partition / parallel run fails

Clearly defined bug reports and their fixes
drmike
Posts: 38
Joined: 17 Mar 2024, 18:52

High order infinite loop during partition / parallel run fails

Post by drmike »

I have a hexahedral mesh with 27 nodes per element. The object is a hollow cylinder, 2 units in radius and 40 units tall. Attempting to partition it for a parallel run with the command ElmerGrid 2 2 <directory> -partition 1 1 16 caused a segfault, and a little debugging showed the crash is in egnative.c, in the code that looks for element sides.
Here is the section of code:

Code: Select all

	GetElementSide(elemind,side,normal,data,&sideind2[0],&sideelemtype2);
	/* debug instrumentation: trace the side counter and the two element types */
	printf("after getelements side = %d %d %d\n", side, sideelemtype2, sideelemtype);
	fflush(stdout);
	if(sideelemtype2 == 0) break;                          /* sentinel: no more sides */
	if(sideelemtype2 < 300 && sideelemtype > 300) break;   /* dimension mismatch */
	if(sideelemtype2 < 200 && sideelemtype > 200) break;
	if(sideelemtype != sideelemtype2) continue;

	sidenodes = sideelemtype % 100;   /* node count is encoded in the type code */

	/* count node indices shared by the two candidate sides */
	nohits = 0;
	for(j=0;j<sidenodes;j++)
	  for(i=0;i<sidenodes;i++)
	    if(sideind[i] == sideind2[j]) nohits++;
	printf("nohits= %d\n", nohits);
	fflush(stdout);
Here is the death output:

Code: Select all

after getelements side = 329341 409 409
nohits= 0
after getelements side = 329342 409 409
nohits= 0
after getelements side = 329343 409 409
nohits= 0
after getelements side = 329344 409 409
nohits= 0
after getelements side = 329345 409 409
nohits= 0
The mesh files are too large to post, so I've placed them here: https://www.eskimo.com/~eresrch/Elmer/

The workaround is to use ElmerGrid 14 2 <.msh> -autoclean -partdual -metiskway 16. This creates a partitioning, but the parallel run then fails with an error:

Code: Select all

$ mpirun -np 16 ElmerSolver picneg.sif 
ELMER SOLVER (v 9.0) STARTED AT: 2024/04/25 13:28:47
ELMER SOLVER (v 9.0) STARTED AT: 2024/04/25 13:28:47
ELMER SOLVER (v 9.0) STARTED AT: 2024/04/25 13:28:47
ELMER SOLVER (v 9.0) STARTED AT: 2024/04/25 13:28:47
.
.
.

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f809eb552ed in ???
#1  0x7f809eb54503 in ???
#2  0x7f809e787f0f in ???
#3  0x7f809f2a6170 in permutenodenumbering
	at /home/drmike/Nuclear_reactions/Plasma_physics/Elmer/ElmerFEM/elmerfem/fem/src/MeshUtils.F90:2143
#4  0x7f809f2a6170 in __meshutils_MOD_elmerasciimesh
	at /home/drmike/Nuclear_reactions/Plasma_physics/Elmer/ElmerFEM/elmerfem/fem/src/MeshUtils.F90:1710
#5  0x7f809f2d8674 in __meshutils_MOD_loadmesh2
	at /home/drmike/Nuclear_reactions/Plasma_physics/Elmer/ElmerFEM/elmerfem/fem/src/MeshUtils.F90:2446
#6  0x7f809f071f3e in __modeldescription_MOD_loadmodel
	at /home/drmike/Nuclear_reactions/Plasma_physics/Elmer/ElmerFEM/elmerfem/fem/src/ModelDescription.F90:2876
#7  0x7f809f480e84 in elmersolver_
	at /home/drmike/Nuclear_reactions/Plasma_physics/Elmer/ElmerFEM/elmerfem/fem/src/ElmerSolver.F90:387
#8  0x55e1cbb4723e in solver
	at /home/drmike/Nuclear_reactions/Plasma_physics/Elmer/ElmerFEM/elmerfem/fem/src/Solver.F90:57
#9  0x55e1cbb46f8e in main
	at /home/drmike/Nuclear_reactions/Plasma_physics/Elmer/ElmerFEM/elmerfem/fem/src/Solver.F90:34
WARNING:: AddEquationBasics: > Timestepping method < defaulted to > Implicit Euler <
WARNING:: AddEquationBasics: > Timestepping method < defaulted to > Implicit Euler <
WARNING:: AddEquationBasics: > Timestepping method < defaulted to > Implicit Euler <
WARNING:: AddEquationBasics: > Timestepping method < defaulted to > Implicit Euler <
WARNING:: AddEquationBasics: > Timestepping method < defaulted to > Implicit Euler <
WARNING:: AddEquationBasics: > Timestepping method < defaulted to > Implicit Euler <
WARNING:: AddEquationBasics: > Timestepping method < defaulted to > Implicit Euler <
WARNING:: AddEquationBasics: > Timestepping method < defaulted to > Implicit Euler <
WARNING:: AddEquationBasics: > Timestepping method < defaulted to > Implicit Euler <
WARNING:: AddEquationBasics: > Timestepping method < defaulted to > Implicit Euler <
WARNING:: AddEquationBasics: > Timestepping method < defaulted to > Implicit Euler <
WARNING:: AddEquationBasics: > Timestepping method < defaulted to > Implicit Euler <
WARNING:: AddEquationBasics: > Timestepping method < defaulted to > Implicit Euler <
WARNING:: AddEquationBasics: > Timestepping method < defaulted to > Implicit Euler <
WARNING:: AddEquationBasics: > Timestepping method < defaulted to > Implicit Euler <
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node Relativity exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
There is something wrong with this mesh: ElmerGUI shows a lot of internal surfaces not connected to anything, and I suspect these are causing the problems. Please let me know how I can help with debugging.
kevinarden
Posts: 2328
Joined: 25 Jan 2019, 01:28

Re: High order infinite loop during partition / parallel run fails

Post by kevinarden »

Have you considered p-elements as an alternative to higher-order nodal elements?

Appendix E:
https://www.nic.funet.fi/pub/sci/physic ... Manual.pdf
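For reference, p-elements are requested per solver in the .sif file; a minimal sketch, assuming a generic solver block (the Element keyword is from the ElmerSolver manual, the rest is illustrative):

```
Solver 1
  ! illustrative solver block: p:2 requests second-degree p-basis
  ! functions built on the (linear) background mesh
  Element = p:2
End
```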
drmike
Posts: 38
Joined: 17 Mar 2024, 18:52

Re: High order infinite loop during partition / parallel run fails

Post by drmike »

The p-elements do not work with StatElecSolve.F90:

Code: Select all

    IF( Found ) THEN
      ! the _init routine rejects any p-element definition outright
      IF(str(1:2) == 'p:') CALL Fatal('StatElecSolver_init','No support for p-elements in solver!')
    END IF
I would have fun helping to make that work, but I doubt I'd be very efficient at it.