Re: elmersolver_mpi doesn't exit on completion

Posted: 06 Oct 2020, 18:51
by raback
Ok, it seems everybody finished but some partition was left hanging...

Re: elmersolver_mpi doesn't exit on completion

Posted: 07 Oct 2020, 09:58
by Romuald
Exactly.

Re: elmersolver_mpi doesn't exit on completion

Posted: 07 Oct 2020, 10:10
by Romuald
To complement my previous report, here is what happened when I tried with 5 nodes this time.

The attached image DuringTheRun.png shows the memory and CPU usage during the run. The second image, AfterTheRun.png, shows that in this case 3 nodes did not terminate, even though "Partn: The end" was printed for all nodes.

The CPU usage of these late nodes is zero, and their memory usage is significantly reduced. I hope it helps.

Romuald

Re: elmersolver_mpi doesn't exit on completion

Posted: 07 Oct 2020, 16:38
by Romuald
I am still investigating the case.

Instead of killing the late node(s) via the task manager, I pressed CTRL-C in the console running the batch. I got the following from mpiexec with 4 nodes:
and with 5 nodes:

mpiexec aborting job...

job aborted:
[ranks] message

[0] job terminated by the user

[1-4] terminated

---- error analysis -----

[0] on DACMPCG158
ctrl-c was hit. job aborted by the user.
It seems that node 0 systematically fails to terminate.
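
As a stopgap for the hang described above, the manual CTRL-C could in principle be automated by a small wrapper. The following is only a minimal Python sketch, not a fix for the underlying problem. It assumes that every partition prints a line containing "The end" when it finishes (the exact log format may differ), and that terminating the launcher is enough to clean up the ranks, which has not been verified with MS-MPI. The names `END_MARKER`, `count_end_markers` and `run_with_watchdog` are hypothetical.

```python
import subprocess
import sys
import threading

# Assumption: each partition prints a final line containing this substring.
END_MARKER = "The end"


def count_end_markers(lines, marker=END_MARKER):
    """Return how many output lines contain the completion marker."""
    return sum(1 for line in lines if marker in line)


def run_with_watchdog(cmd, n_partitions, grace_seconds=30.0, marker=END_MARKER):
    """Run cmd and echo its output. Once n_partitions markers have been seen,
    give the process grace_seconds to exit on its own, then terminate it.
    Returns (return_code, was_killed)."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    all_done = threading.Event()

    def pump():
        # Forward the output and count completion markers as they arrive.
        seen = 0
        for line in proc.stdout:
            sys.stdout.write(line)
            seen += count_end_markers([line], marker)
            if seen >= n_partitions:
                all_done.set()

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    killed = False
    while proc.poll() is None:
        # Poll once per second until every partition has reported the end.
        if all_done.wait(timeout=1.0):
            try:
                proc.wait(timeout=grace_seconds)
            except subprocess.TimeoutExpired:
                # Still alive after the grace period: treat it as hung.
                proc.terminate()
                killed = True
                proc.wait()
    reader.join(timeout=5.0)
    return proc.returncode, killed
```

A call such as `run_with_watchdog(["mpiexec", "-n", "4", "ElmerSolver_mpi"], n_partitions=4)` would then stand in for the plain `mpiexec` line in the batch file; the command shown is illustrative only. The grace period is there so that a run which exits normally is never killed.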

Re: elmersolver_mpi doesn't exit on completion

Posted: 07 Oct 2020, 18:06
by raback
Hi

I added FLUSH and STOP to the code. Should not matter but worth testing...

You can find fresh installers from here:
https://www.nic.funet.fi/pub/sci/physic ... n/windows/

-Peter

Re: elmersolver_mpi doesn't exit on completion

Posted: 07 Oct 2020, 18:37
by Romuald
Thanks.

I will tell you what I get tomorrow then.

Romuald

Re: elmersolver_mpi doesn't exit on completion

Posted: 08 Oct 2020, 18:04
by Romuald
Hi

It seems that no new installers have been compiled since 07/10, 3-4 pm.

I will check tomorrow.

Best,

Romuald

Re: elmersolver_mpi doesn't exit on completion

Posted: 13 Oct 2020, 18:49
by sslone
I tested the updated version.

FLUSH and STOP did not change anything, as was probably expected. Jobs run and then stop as before. This time I had 9 subprocesses that hung (I was using 8 processors), rather than 8 closing and one staying.

-Scott

Re: elmersolver_mpi doesn't exit on completion

Posted: 23 Oct 2020, 20:15
by Romuald
For information, while my first tests were with MS-MPI 10.0, I also tried the older versions 7.1 and 8.0. The problem remains.

That is very weird.

R

Re: elmersolver_mpi doesn't exit on completion

Posted: 24 Oct 2020, 16:40
by mark smith
Hi All
Since originally posting in 2016, I can confirm that I haven't found a solution other than to run the solver under Linux (Ubuntu). If a solution is found for Windows, I would still be interested.
Regards
Mark