Parallel grid seg fault


Post by drmike »

I got a long cylinder to run on a single core, and now I'm trying to run it as a parallel job. However, ElmerGrid fails with the following error:

Code:

$ ElmerGrid 2 2 deuteron_chamber -partition 1 1 16

Starting program Elmergrid, compiled on Mar 10 2024
Elmergrid reading in-line arguments
The mesh will be partitioned geometrically to 16 partitions.
Output will be saved to file deuteron_chamber.

Elmergrid loading data:
-----------------------
Loading mesh in ElmerSolver format from directory deuteron_chamber.
Loading header from mesh.header
Maximum elementtype index is: 827
Maximum number of nodes in element is: 27
Allocating for 6757236 knots and 820800 elements.
Loading 6757236 Elmer nodes from mesh.nodes
Loading 820800 bulk elements from mesh.elements
Loading 119880 boundary elements from mesh.boundary
Segmentation fault (core dumped)

The cylinder has radius 2 and length 40, so the aspect ratio is extreme, but the mesh directory loads fine in ElmerGUI and ElmerSolver. Am I doing something wrong?

Post by kevinarden »

Try:
ElmerGrid 2 2 deuteron_chamber -partdual -metiskway 16

Post by drmike »

Same error:

Code:

$ ElmerGrid 2 2 deuteron_chamber -partdual -metiskway 16

Starting program Elmergrid, compiled on Mar 10 2024
Elmergrid reading in-line arguments
Using dual (elemental) graph in partitioning.
The mesh will be partitioned with Metis to 16 partitions.
Output will be saved to file deuteron_chamber.

Elmergrid loading data:
-----------------------
Loading mesh in ElmerSolver format from directory deuteron_chamber.
Loading header from mesh.header
Maximum elementtype index is: 827
Maximum number of nodes in element is: 27
Allocating for 6757236 knots and 820800 elements.
Loading 6757236 Elmer nodes from mesh.nodes
Loading 820800 bulk elements from mesh.elements
Loading 119880 boundary elements from mesh.boundary
Segmentation fault (core dumped)

I'm slowly digging into ElmerGrid to find where it fails. So far everything looks OK: no index has gone out of bounds yet. I'm in the FindParentSide routine, working through it with printfs. It's kind of fun; maybe I need a break from physics :-)

Post by drmike »

It looks like an infinite loop; I assume a pointer is going out of bounds somewhere. I put printfs into egnative.c:

Code:

	GetElementSide(elemind,side,normal,data,&sideind2[0],&sideelemtype2);
	printf("after getelements side = %d %d %d\n", side, sideelemtype2, sideelemtype);
	fflush(stdout);
	if(sideelemtype2 == 0 ) break;
	if(sideelemtype2 < 300 && sideelemtype > 300) break;	
	if(sideelemtype2 < 200 && sideelemtype > 200) break;		
	if(sideelemtype != sideelemtype2) continue;
		
	sidenodes = sideelemtype % 100;

	nohits = 0;
	for(j=0;j<sidenodes;j++) 
	  for(i=0;i<sidenodes;i++) 
	    if(sideind[i] == sideind2[j]) nohits++;
	printf("nohits= %d\n", nohits);
	fflush(stdout);

It dies when "side" gets really big:

Code:

after getelements side = 329341 409 409
nohits= 0
after getelements side = 329342 409 409
nohits= 0
after getelements side = 329343 409 409
nohits= 0
after getelements side = 329344 409 409
nohits= 0
after getelements side = 329345 409 409
nohits= 0

In all the previous long loops, side was 101, which is big but not enormous. Hopefully this is one more clue. I'll keep digging for more.

Post by drmike »

The element that causes the infinite loop (820794) is listed as a parent of 4 faces in the boundary file:

Code:

From mesh.boundary file:

46851 34 14398 820794 409 168389 3583 10 753 189569 3983 770 189567 189570 

116219 4 820794 0 409 417324 3583 10 5842 462484 3983 5879 462482 462485 

116903 2 820794 0 409 10 753 463115 5842 770 465107 465109 5879 465110 

119831 136 820794 0 409 5863453 5961759 5961760 5961783 4816765 4675402 4804781 4816759 6757218 

From mesh.elements file:

820794 1 827 3583 168389 4910832 417324 10 753 463115 5842 189569 6757212 6757213 462484 3983 189567 6757218 462482 770 465107 465109 5879 189570 6757219 6757220 462485 6757216 465110 6757221 

For a hollow cylinder this makes no sense: an element should have at most 2 faces on the boundary. I'm not sure what the code is looking for, but the "floating face" never finds a match and the program goes into the weeds. If there is anything more I can do to help fix this, please let me know.
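
A quick way to see how many boundary faces each bulk element owns is to tally the parent columns of mesh.boundary. The sketch below is not part of ElmerGrid; it assumes the usual Elmer layout for that file (boundary id, boundary tag, parent 1, parent 2, element type, then node indices), and the function name and the threshold of 2 are only illustrative. The sample lines are the four boundary entries quoted above.

```python
from collections import Counter

# The four mesh.boundary lines quoted in this post.
SAMPLE_BOUNDARY = """\
46851 34 14398 820794 409 168389 3583 10 753 189569 3983 770 189567 189570
116219 4 820794 0 409 417324 3583 10 5842 462484 3983 5879 462482 462485
116903 2 820794 0 409 10 753 463115 5842 770 465107 465109 5879 465110
119831 136 820794 0 409 5863453 5961759 5961760 5961783 4816765 4675402 4804781 4816759 6757218
""".splitlines()


def faces_per_parent(boundary_lines):
    """Count how many boundary elements name each bulk element as a parent.

    Assumed layout per line: id, tag, parent1, parent2, type, nodes...
    A parent of 0 means "no parent on that side" and is skipped.
    """
    counts = Counter()
    for line in boundary_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or malformed line
        for parent in (int(fields[2]), int(fields[3])):
            if parent != 0:
                counts[parent] += 1
    return counts


counts = faces_per_parent(SAMPLE_BOUNDARY)
# The limit of 2 is the expectation stated in this post, not an ElmerGrid rule.
for parent, n in sorted(counts.items()):
    if n > 2:
        print(parent, n)
```

For a full mesh, the same function can be fed an open file object for mesh.boundary instead of the sample list.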

Post by raback »

Hi,

Does it work for linear elements? ElmerGrid tries to identify the parent of each boundary element and obviously fails here. The parent is found by matching node indexes.

Which code has written the initial mesh.* files?

-Peter
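
As a rough illustration of the index matching described above, the sketch below checks whether every node of a boundary element also appears in its declared parent. It is not ElmerGrid's actual algorithm, only a consistency check under the assumption that mesh.elements lines read "id body type nodes..." and mesh.boundary lines read "id tag parent1 parent2 type nodes...". The function names are hypothetical, and the sample data are the lines quoted earlier in the thread.

```python
# Bulk element 820794 as quoted from mesh.elements.
SAMPLE_ELEMENTS = [
    "820794 1 827 3583 168389 4910832 417324 10 753 463115 5842 189569 "
    "6757212 6757213 462484 3983 189567 6757218 462482 770 465107 465109 "
    "5879 189570 6757219 6757220 462485 6757216 465110 6757221",
]

# The four boundary elements that list 820794 as a parent.
SAMPLE_FACES = [
    "46851 34 14398 820794 409 168389 3583 10 753 189569 3983 770 189567 189570",
    "116219 4 820794 0 409 417324 3583 10 5842 462484 3983 5879 462482 462485",
    "116903 2 820794 0 409 10 753 463115 5842 770 465107 465109 5879 465110",
    "119831 136 820794 0 409 5863453 5961759 5961760 5961783 4816765 "
    "4675402 4804781 4816759 6757218",
]


def element_nodes(element_lines):
    """Map bulk element id -> set of its node indices."""
    nodes = {}
    for line in element_lines:
        fields = line.split()
        if len(fields) >= 4:
            nodes[int(fields[0])] = {int(x) for x in fields[3:]}
    return nodes


def mismatched_faces(boundary_lines, nodes_by_element):
    """Return (face id, parent id, shared nodes, face nodes) for every
    boundary element whose nodes are not all contained in a parent.

    Parents equal to 0 or absent from nodes_by_element are skipped.
    """
    bad = []
    for line in boundary_lines:
        fields = line.split()
        if len(fields) < 6:
            continue
        face_id = int(fields[0])
        face = {int(x) for x in fields[5:]}
        for parent in (int(fields[2]), int(fields[3])):
            parent_nodes = nodes_by_element.get(parent)
            if parent == 0 or parent_nodes is None:
                continue
            shared = len(face & parent_nodes)
            if shared < len(face):
                bad.append((face_id, parent, shared, len(face)))
    return bad


bad = mismatched_faces(SAMPLE_FACES, element_nodes(SAMPLE_ELEMENTS))
for face_id, parent, shared, total in bad:
    print(f"face {face_id}: {shared} of {total} nodes found in parent {parent}")
```

On the quoted data this flags only boundary element 119831, which shares a single node with element 820794; the other three faces match completely.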

Post by drmike »

Yes, it does work for linear elements.

Code:

Saving mesh for 16 partitions
   part  elements   nodes      shared   bc elems
   1     51300      55233      1083     9102    
   2     51300      54150      2166     7050    
   3     51300      54150      2166     7050    
   4     51300      54150      2166     7050    
   5     51300      54150      2166     7050    
   6     51300      54150      2166     7050    
   7     51300      54150      2166     7050    
   8     51300      54150      2166     7050    
   9     51300      54150      2166     7050    
   10    51300      54150      2166     7050    
   11    51300      54150      2166     7050    
   12    51300      54150      2166     7050    
   13    51300      54150      2166     7050    
   14    51300      54150      2166     7050    
   15    51300      54150      2166     7050    
   16    51300      55233      1083     9102    
----------------------------------------------------------------------------------------------
   ave   51300.0    54285.4    2030.6   7306.5   0.0     
Writing of partitioned mesh finished

Gmsh creates the .msh file, and ElmerGrid 14 2 ... creates the mesh.* files.

Post by raback »

Hi

And what happens if you partition directly from the Gmsh file:

ElmerGrid 14 2 gmsh.msh -autoclean -partdual -metiskway 16

-Peter

Post by drmike »

That works! I will make a note of this command and forgo the fun of debugging!
Thanks, I'm really happy that this gets me back to physics.

Post by raback »

The 27-node quadratic element is used so seldom that there may be some inconsistency in the treatment of the boundary indexes. This should not happen. You might upload the files and the procedure as a bug report, either here in the forum or as a GitHub issue.

-Peter