elmersolver_mpi doesn't exit on completion
Re: elmersolver_mpi doesn't exit on completion
Ok, it seems everybody finished but some partition was left hanging...
Re: elmersolver_mpi doesn't exit on completion
To complement my previous report, here is what happened when I tried with 5 nodes this time.
During the run, the attached image DuringTheRun.png shows the memory and CPU usage. The second image, AfterTheRun.png, shows that in this case 3 nodes did not terminate, even though "Partn: The end" was printed for all nodes.
The CPU usage of these late nodes is zero, while their memory usage has dropped significantly. I hope it helps.
Romuald
Attachments: AfterTheRun.PNG (65.78 KiB), DuringThe Run.PNG (64.25 KiB)
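Since every partition reports "The end" while some processes stay alive with zero CPU usage, one possible stopgap is a small wrapper that watches the solver output and kills the job if it is still running some time after all partitions have reported. The following is only a minimal sketch in Python, not something from Elmer itself. The function name `run_with_watchdog`, the marker string, and the grace period are assumptions made for illustration, and it assumes each partition prints exactly one line containing the marker. Whether killing the launcher also tears down every rank depends on the MPI implementation and has not been verified for MS-MPI.

```python
import subprocess
import sys
import threading
import time


def run_with_watchdog(cmd, n_parts, marker="The end", grace_s=30.0):
    """Run cmd and echo its output. Once n_parts lines containing marker
    have been seen, allow grace_s seconds for a normal exit, then kill.

    Returns (return_code, was_killed). All names here are hypothetical.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    all_done = threading.Event()

    def pump():
        # Forward the child's output and count completion markers.
        seen = 0
        for line in proc.stdout:
            sys.stdout.write(line)
            if marker in line:
                seen += 1
                if seen >= n_parts:
                    all_done.set()

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    killed = False
    deadline = None
    while proc.poll() is None:
        if all_done.is_set():
            if deadline is None:
                # All partitions reported; start the grace period.
                deadline = time.monotonic() + grace_s
            elif time.monotonic() >= deadline:
                proc.kill()  # the job finished its work but did not exit
                killed = True
                break
        time.sleep(0.2)
    proc.wait()
    reader.join(timeout=5)
    return proc.returncode, killed
```

A hypothetical invocation for the 5-node case would be `run_with_watchdog(["mpiexec", "-n", "5", "elmersolver_mpi"], n_parts=5)`. This only hides the symptom; it does not explain why the ranks fail to exit.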
Re: elmersolver_mpi doesn't exit on completion
I am still investigating the case.
Instead of killing the late node(s) via the task manager, I pressed Ctrl-C in the console running the batch. It seems that node 0 systematically fails to terminate. I got the following from mpiexec with 4 nodes and with 5 nodes:
mpiexec aborting job...
job aborted:
[ranks] message
[0] job terminated by the user
[1-4] terminated
---- error analysis -----
[0] on DACMPCG158
ctrl-c was hit. job aborted by the user.
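To compare several runs and check whether rank 0 is always the one reported as "job terminated by the user", the rank table in this output can be parsed. The sketch below is an illustration only: the function name `parse_rank_messages` is hypothetical, and the parser assumes the layout shown above (a "[ranks] message" header, then lines with a single rank or one range such as "[1-4]"). Other layouts that mpiexec may produce are not handled.

```python
import re

# Matches "[0] text" or "[1-4] text", as seen in the output above.
RANK_LINE = re.compile(r"^\[(\d+)(?:-(\d+))?\]\s+(.*)$")

SAMPLE = """\
mpiexec aborting job...
job aborted:
[ranks] message
[0] job terminated by the user
[1-4] terminated
---- error analysis -----
[0] on DACMPCG158
ctrl-c was hit. job aborted by the user.
"""


def parse_rank_messages(text):
    """Return {rank: message} for the '[ranks] message' table only."""
    result = {}
    in_table = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[ranks] message":
            in_table = True
            continue
        if line.startswith("----"):
            # The error analysis section starts; the rank table is over.
            in_table = False
            continue
        if not in_table:
            continue
        match = RANK_LINE.match(line)
        if match:
            first = int(match.group(1))
            last = int(match.group(2)) if match.group(2) else first
            for rank in range(first, last + 1):
                result[rank] = match.group(3)
    return result


if __name__ == "__main__":
    for rank, message in sorted(parse_rank_messages(SAMPLE).items()):
        print(rank, message)
```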
Re: elmersolver_mpi doesn't exit on completion
Hi
I added FLUSH and STOP to the code. Should not matter but worth testing...
You can find fresh installers from here:
https://www.nic.funet.fi/pub/sci/physic ... n/windows/
-Peter
Re: elmersolver_mpi doesn't exit on completion
Thanks.
I will tell you what I get tomorrow, then.
Romuald
Re: elmersolver_mpi doesn't exit on completion
Hi
It seems that no new installers have been compiled since 07/10, 3-4 pm.
I will check tomorrow.
Best,
Romuald
Re: elmersolver_mpi doesn't exit on completion
I tested the updated version.
FLUSH and STOP did not change anything, as was probably expected. Jobs run and then stop as before. I had 9 subprocesses that hung (I was using 8 processors), rather than 8 closing and one staying.
-Scott
Re: elmersolver_mpi doesn't exit on completion
For information, while my first tests were with MS-MPI 10.0, I also tried the older versions 7.1 and 8.0. The problem remains.
That is very weird.
R
Re: elmersolver_mpi doesn't exit on completion
Hi All
Since originally posting in 2016, I can confirm that I haven't found a solution other than running the solver under Linux (Ubuntu). If a solution is found for Windows, I would still be interested.
Regards
Mark