elmersolver_mpi doesn't exit on completion

Numerical methods and mathematical models of Elmer
raback
Site Admin
Posts: 4871
Joined: 22 Aug 2009, 11:57
Antispam: Yes
Location: Espoo, Finland

Re: elmersolver_mpi doesn't exit on completion

Post by raback »

Ok, it seems everybody finished but some partition was left hanging...
Romuald
Posts: 16
Joined: 28 Sep 2020, 17:13
Antispam: Yes

Re: elmersolver_mpi doesn't exit on completion

Post by Romuald »

Exactly.

Re: elmersolver_mpi doesn't exit on completion

Post by Romuald »

To complement my previous report, here is what happened when I tried with 5 nodes this time.

The attached image DuringTheRun.png shows the memory and CPU usage during the run. The second image, AfterTheRun.png, shows that in this case 3 nodes did not terminate, even though "Partn: The end" was printed for all nodes.

CPU usage for these lingering nodes is zero, and their memory usage has dropped significantly. I hope it helps.
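
The observation above (every partition reports the end, yet some processes linger) can be cross-checked against a saved log. Here is a minimal sketch in Python, assuming the solver output was redirected to a text file and that each partition prints a line such as `Part3: The end`. The exact log format, the numbering base, and the helper names `finished_partitions` and `missing_partitions` are assumptions for illustration, not taken from Elmer itself.

```python
import re

# Assumed log format: one line such as "Part3: The end" per finished partition.
END_MARKER = re.compile(r"Part\s*(\d+)\s*:\s*The end", re.IGNORECASE)


def finished_partitions(log_text):
    """Return the sorted partition numbers that printed the end marker."""
    return sorted({int(m.group(1)) for m in END_MARKER.finditer(log_text)})


def missing_partitions(log_text, n_partitions, first=0):
    """Return partitions in [first, first + n_partitions) with no end marker.

    `first` is the number of the first partition; adjust it if the log
    numbers partitions from 1 instead of 0.
    """
    done = set(finished_partitions(log_text))
    return [p for p in range(first, first + n_partitions) if p not in done]


if __name__ == "__main__":
    sample = "Part0: The end\nPart2: The end\nPart1: The end\n"
    print(finished_partitions(sample))
    print(missing_partitions(sample, 4))
```

If `missing_partitions` returns an empty list while processes are still alive, the hang happens after the solver has reported completion, which matches what the screenshots show.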

Romuald
Attachments
AfterTheRun.PNG
DuringThe Run.PNG

Re: elmersolver_mpi doesn't exit on completion

Post by Romuald »

I am still investigating the case.

Instead of killing the lingering node(s) via the Task Manager, I pressed CTRL-C in the console running the batch. I got the following from mpiexec with 4 nodes:
and with 5 nodes:


mpiexec aborting job...

job aborted:
[ranks] message

[0] job terminated by the user

[1-4] terminated

---- error analysis -----

[0] on DACMPCG158
ctrl-c was hit. job aborted by the user.

It seems that node 0 is systematically the one that does not terminate.
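
The manual CTRL-C step could in principle be automated. Here is a rough sketch in Python of a watchdog that launches the job, counts output lines containing an end marker, and terminates the launcher if it is still alive after a grace period. The function name `run_with_watchdog`, the marker text, and the grace period are all assumptions for illustration. Whether terminating the launcher also ends every rank depends on the MPI implementation and has not been verified here.

```python
import subprocess
import sys
import threading


def run_with_watchdog(cmd, marker, expected, grace_seconds=30.0):
    """Run cmd and echo its output.

    Once `expected` lines containing `marker` have been seen, give the
    process `grace_seconds` to exit on its own, then kill it.
    Returns (exit_code, was_killed).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    all_done = threading.Event()

    def reader():
        seen = 0
        for line in proc.stdout:
            sys.stdout.write(line)
            if marker in line:
                seen += 1
                if seen >= expected:
                    all_done.set()

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    killed = False
    while proc.poll() is None:
        # Poll so that a job exiting without the markers is also noticed.
        if all_done.wait(timeout=0.2):
            try:
                proc.wait(timeout=grace_seconds)
            except subprocess.TimeoutExpired:
                proc.kill()  # hypothetical stand-in for the manual CTRL-C
                killed = True
            break
    proc.wait()
    thread.join(timeout=1.0)
    return proc.returncode, killed
```

A call might look like `run_with_watchdog(["mpiexec", "-n", "5", "ElmerSolver_mpi"], "The end", 5)`, where the command line is only illustrative. This works around the symptom and does not explain why rank 0 stays alive.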

Re: elmersolver_mpi doesn't exit on completion

Post by raback »

Hi

I added FLUSH and STOP to the code. Should not matter but worth testing...

You can find fresh installers from here:
https://www.nic.funet.fi/pub/sci/physic ... n/windows/

-Peter

Re: elmersolver_mpi doesn't exit on completion

Post by Romuald »

Thanks.

I will let you know tomorrow what I get.

Romuald

Re: elmersolver_mpi doesn't exit on completion

Post by Romuald »

Hi

It seems that no new installers have been compiled since 07/10, 3-4 pm.

I will check tomorrow.

Best,

Romuald
sslone
Posts: 5
Joined: 06 Feb 2020, 19:35
Antispam: Yes

Re: elmersolver_mpi doesn't exit on completion

Post by sslone »

I tested the updated version.

FLUSH and STOP did not change anything, as was probably expected. Jobs run and then stop as before. I had 9 subprocesses that hung (I was using 8 processors), rather than 8 closing and one remaining.

-Scott

Re: elmersolver_mpi doesn't exit on completion

Post by Romuald »

For information, my first tests were with MS-MPI 10.0, but I also tried the older versions 7.1 and 8.0. The problem remains.

That is very weird.

R
mark smith
Posts: 215
Joined: 26 Aug 2009, 18:20
Location: Peterborough, England

Re: elmersolver_mpi doesn't exit on completion

Post by mark smith »

Hi All
Since originally posting in 2016, I can confirm that I haven't found a solution other than running the solver under Linux (Ubuntu). If a solution is found for Windows, I would still be interested.
Regards
Mark