====== Vectorized incompressible Navier Stokes Solver (IncompressibleNSVec) ======

Since March 2019, this solver has been included in the Elmer distribution. It has more or less the functionality of the legacy Navier-Stokes solver, i.e., it is able to use any linear solution procedure (including the library version of the block preconditioner). The difference, and the improvement in performance, comes from the vectorized assembly routines, which utilize a general framework for SIMD-enabled bi-linear forms. Since this relies on OpenMP SIMD instructions, it is essential to have OpenMP enabled in your compilation.
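As a sketch of how the solver is invoked, a minimal SIF solver section might look like the following. The module and procedure names (`IncompressibleNSVec`, `IncompressibleNSSolver`) are taken from the Elmer source tree, and the remaining keywords are ordinary linear/nonlinear system settings; check them against the Elmer Solver Manual for your installation, since the exact defaults may differ:

```
Solver 1
  Equation = "Navier-Stokes"
  ! vectorized incompressible Navier-Stokes solver module
  Procedure = "IncompressibleNSVec" "IncompressibleNSSolver"

  ! any linear solution procedure can be used; a direct method shown here
  Linear System Solver = Direct
  Linear System Direct Method = MUMPS

  Nonlinear System Max Iterations = 50
  Nonlinear System Convergence Tolerance = 1.0e-6
End
```

For the SIMD assembly to take effect, Elmer itself must be compiled with OpenMP support (in a standard Elmer CMake build this is typically done by enabling the `WITH_OpenMP` option at configure time).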

==== Performance improvements ====

Testing on a Skylake high-end consumer PC with a quad-core CPU, the new solver reduced the computing time by about two thirds for the 10 km ISMIP-HOM C experiments, run on a 30 x 30 x 15 (13500) node mesh partitioned into 4 partitions. Since the same (direct) linear solver was used in both cases, this gain was achieved solely in the assembly part. The vectorization tremendously increases the memory throughput and hence the performance.