Gromacs
From UF HPC Wiki
General Information
For more information on using Gromacs, see the wiki maintained by the developers: http://wiki.gromacs.org/index.php/Main_Page
Version 4.0.4
Intel 10 + MVAPICH 0.9.9
Build Date: 03/24/09
Compiler: Intel (10.1.015)
Libraries: Intel MKL (10.0.2.018)
MVAPICH (0.9.9)
Location: /apps/gromacs/4.0.4/intel10/mvapich/099
Build Notes
Notes:
- You must set your MPI environment via mpi-selector to mvapich_intel10-0.9.9.
- The executable was built using the Intel MKL 10 FFTs through the FFTW wrappers, and the MKL 10 LAPACK and BLAS.
- The executable was tested successfully with the "gmxbench" test cases from the GROMACS web site.
- The executables are installed in /apps/gromacs/4.0.4/intel10/mvapich/099/bin
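The MPI selection step in the first note can be sketched as follows. This is a minimal sketch written for a POSIX shell (an assumption; the submission script on this page uses csh). The `--list`, `--set`, `--user` and `--query` options are the ones commonly documented for mpi-selector; confirm them with `mpi-selector --help` on the cluster. The `MPI_NAME` variable is only an illustrative name.

```shell
#!/bin/sh
# Assumed workflow for selecting the MPI stack named in the build notes.
MPI_NAME="mvapich_intel10-0.9.9"   # name taken from the notes above

if command -v mpi-selector >/dev/null 2>&1; then
    mpi-selector --list                     # show the MPI stacks registered on this host
    mpi-selector --set "$MPI_NAME" --user   # make it the per-user default (may ask for confirmation)
    mpi-selector --query                    # report the current selection
else
    echo "mpi-selector not found; run this on a cluster login node" >&2
fi
```

The selection normally takes effect in new login shells only, so log out and back in before submitting a job.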
The following commands were run to configure and build the executable.
setenv CC mpicc
setenv CFLAGS "-axW"
setenv CPPFLAGS "-I/opt/intel/mkl/10.0.2.018/include/fftw"
setenv LDFLAGS "-L/opt/intel/mkl/10.0.2.018/lib/em64t"
setenv LDFLAGS "$LDFLAGS -lfftw3xf_intel -lmkl_lapack -lmkl_intel_lp64 -lmkl_sequential -lmkl_core"
setenv F77 mpif90
setenv FFLAGS "-axW"
setenv CXX mpiCC
./configure \
--prefix=/apps/gromacs/4.0.4/intel10/mvapich/099 \
--with-fft=fftw3 \
--enable-double \
--disable-software-sqrt \
--enable-prefetch-forces \
--enable-mpi \
--program-suffix=_mpi_d \
--with-external-lapack \
--with-external-blas
Sample Submission Script
This submission script was used for the "d.dppc" test case from gmxbench. Note that the "-shuffle" and "-sort" options to grompp are no longer necessary or supported.
#!/bin/csh -f
#PBS -N gromacs
#PBS -o gromacs.out
#PBS -e gromacs.err
#PBS -j oe
#PBS -m abe
#PBS -r n
#PBS -M taylor@hpc.ufl.edu
#PBS -q submit
#PBS -l nodes=2:ppn=4:infiniband:phase2
#PBS -l walltime=24:00:00
#
set MPI_THREADS = `cat $PBS_NODEFILE | wc -l`
set IbEnabled = `/usr/local/sbin/IbEnabled`
if ( $IbEnabled ) then
echo "Running on IB-enabled node set"
set MPIRUN = "mpiexec"
else
echo "MVAPICH is not supported over GigE"
exit 1
endif
cd $PBS_O_WORKDIR
set GROMPP = /apps/gromacs/4.0.4/intel10/mvapich/099/bin/grompp_mpi_d
echo $GROMPP -v
$GROMPP -v
set EXE = /apps/gromacs/4.0.4/intel10/mvapich/099/bin/mdrun_mpi_d
set T1 = `date +%s`
echo $MPIRUN $EXE
$MPIRUN $EXE -o md.out
set T2 = `date +%s`
set TT = `expr $T2 - $T1`
echo "Total wall-clock time used (s): $TT"
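The script above counts the lines of $PBS_NODEFILE into MPI_THREADS but never uses the value, presumably because the site mpiexec takes the process count from PBS. For launchers that need an explicit count, a sketch along these lines may help. It is written for a POSIX shell rather than csh, and the helper names `count_ranks` and `count_nodes` are hypothetical.

```shell
#!/bin/sh
# count_ranks FILE: number of MPI ranks implied by a PBS node file,
# which lists one host name per allocated core slot.
count_ranks() {
    # grep -c . counts non-empty lines, so a trailing blank line is ignored
    grep -c . "$1"
}

# count_nodes FILE: number of distinct hosts in the node file.
count_nodes() {
    sort -u "$1" | grep -c .
}
```

With `nodes=2:ppn=4` the node file should contain 8 lines naming 2 hosts, so a launcher that accepts the standard `-n` option could be called as `mpiexec -n "$(count_ranks "$PBS_NODEFILE")" ...`.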
Test Runs for d.dppc
Intel 5430 @ 2.66 GHz (4 cores)
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing: Nodes Number G-Cycles Seconds %
-----------------------------------------------------------------------
Domain decomp. 4 501 68.698 25.8 1.0
Comm. coord. 4 5001 40.265 15.1 0.6
Neighbor search 4 501 2358.611 886.7 35.2
Force 4 5001 3702.238 1391.9 55.2
Wait + Comm. F 4 5001 131.010 49.3 2.0
Write traj. 4 1 1.335 0.5 0.0
Update 4 5001 54.149 20.4 0.8
Constraints 4 5001 282.102 106.1 4.2
Comm. energies 4 5001 5.356 2.0 0.1
Rest 4 59.225 22.3 0.9
-----------------------------------------------------------------------
Total 4 6702.988 2520.0 100.0
-----------------------------------------------------------------------
Parallel run - timing based on wallclock.
NODE (s) Real (s) (%)
Time: 630.000 630.000 100.0
10:30
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 184.804 6.495 1.372 17.497
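The figures in a timing block like the one above are related by simple arithmetic, which is useful when checking a run. The "Seconds" total is summed over all cores, so dividing by the core count gives the wallclock time; G-Cycles divided by Seconds gives the clock rate in GHz; and hour/ns is 24 divided by ns/day. A small sketch for a POSIX shell with awk, using hypothetical helper names and the numbers from the 4-core block above:

```shell
#!/bin/sh
# wall_seconds TOTAL_SECONDS CORES: wallclock time from the summed "Seconds" total
wall_seconds() { awk -v s="$1" -v n="$2" 'BEGIN { printf "%.1f\n", s / n }'; }

# clock_ghz GCYCLES SECONDS: implied processor clock rate in GHz
clock_ghz() { awk -v c="$1" -v s="$2" 'BEGIN { printf "%.2f\n", c / s }'; }

# hours_per_ns NS_PER_DAY: hours of wallclock time per simulated nanosecond
hours_per_ns() { awk -v r="$1" 'BEGIN { printf "%.3f\n", 24 / r }'; }

wall_seconds 2520.0 4        # prints 630.0, matching "Time: 630.000"
clock_ghz 6702.988 2520.0    # prints 2.66, matching the 2.66 GHz processor
hours_per_ns 1.372           # prints 17.493; the log shows 17.497 because it uses the unrounded rate
```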
Intel 5430 @ 2.66 GHz (8 cores)
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing: Nodes Number G-Cycles Seconds %
-----------------------------------------------------------------------
Domain decomp. 8 501 119.920 45.1 1.7
Comm. coord. 8 5001 116.370 43.7 1.6
Neighbor search 8 501 2328.787 874.9 32.3
Force 8 5001 3774.434 1418.0 52.3
Wait + Comm. F 8 5001 288.151 108.3 4.0
Write traj. 8 1 1.371 0.5 0.0
Update 8 5001 84.702 31.8 1.2
Constraints 8 5001 413.247 155.3 5.7
Comm. energies 8 5001 12.239 4.6 0.2
Rest 8 79.618 29.9 1.1
-----------------------------------------------------------------------
Total 8 7218.837 2712.0 100.0
-----------------------------------------------------------------------
Parallel run - timing based on wallclock.
NODE (s) Real (s) (%)
Time: 339.000 339.000 100.0
5:39
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 343.666 12.075 2.549 9.415
Intel 5462 @ 2.8 GHz (8 cores)
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing: Nodes Number G-Cycles Seconds %
-----------------------------------------------------------------------
Domain decomp. 8 501 82.594 29.6 1.2
Comm. coord. 8 5001 73.809 26.5 1.1
Neighbor search 8 501 2308.059 827.3 33.9
Force 8 5001 3626.676 1299.9 53.3
Wait + Comm. F 8 5001 281.823 101.0 4.1
Write traj. 8 1 1.546 0.6 0.0
Update 8 5001 47.468 17.0 0.7
Constraints 8 5001 316.583 113.5 4.7
Comm. energies 8 5001 14.276 5.1 0.2
Rest 8 54.705 19.6 0.8
-----------------------------------------------------------------------
Total 8 6807.538 2440.0 100.0
-----------------------------------------------------------------------
Parallel run - timing based on wallclock.
NODE (s) Real (s) (%)
Time: 305.000 305.000 100.0
5:05
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 381.925 13.420 2.833 8.471
Opteron 275 @ 2.2 GHz (4 cores)
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing: Nodes Number G-Cycles Seconds %
-----------------------------------------------------------------------
Domain decomp. 4 501 150.203 68.2 1.7
Comm. coord. 4 5001 50.819 23.1 0.6
Neighbor search 4 501 2858.946 1298.6 31.5
Force 4 5001 4783.492 2172.8 52.7
Wait + Comm. F 4 5001 202.525 92.0 2.2
Write traj. 4 2 2.495 1.1 0.0
Update 4 5001 188.199 85.5 2.1
Constraints 4 5001 626.113 284.4 6.9
Comm. energies 4 5001 10.429 4.7 0.1
Rest 4 197.112 89.5 2.2
-----------------------------------------------------------------------
Total 4 9070.334 4120.0 100.0
-----------------------------------------------------------------------
Parallel run - timing based on wallclock.
NODE (s) Real (s) (%)
Time: 1030.000 1030.000 100.0
17:10
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 113.087 3.974 0.839 28.605
Opteron 275 @ 2.2 GHz (8 cores)
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing: Nodes Number G-Cycles Seconds %
-----------------------------------------------------------------------
Domain decomp. 8 501 152.261 69.2 1.6
Comm. coord. 8 5001 113.786 51.7 1.2
Neighbor search 8 501 2842.908 1292.4 30.6
Force 8 5001 4760.846 2164.3 51.2
Wait + Comm. F 8 5001 488.859 222.2 5.3
Write traj. 8 1 1.756 0.8 0.0
Update 8 5001 124.185 56.5 1.3
Constraints 8 5001 613.724 279.0 6.6
Comm. energies 8 5001 42.557 19.3 0.5
Rest 8 150.557 68.4 1.6
-----------------------------------------------------------------------
Total 8 9291.439 4224.0 100.0
-----------------------------------------------------------------------
Parallel run - timing based on wallclock.
NODE (s) Real (s) (%)
Time: 528.000 528.000 100.0
8:48
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 220.501 7.749 1.637 14.664
Intel 10 + OpenMPI 1.2.7
Build Date: 03/23/09
Compiler: Intel 10
Libraries: Intel MKL
Location: /apps/gromacs/4.0.4/intel10/ompi/127
Test Runs for d.dppc
Intel 5462 @ 2.80 GHz (4 cores)
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing: Nodes Number G-Cycles Seconds %
-----------------------------------------------------------------------
Domain decomp. 4 501 71.797 25.7 1.1
Comm. coord. 4 5001 43.950 15.7 0.6
Neighbor search 4 501 2294.827 822.4 33.9
Force 4 5001 3711.944 1330.2 54.9
Wait + Comm. F 4 5001 172.184 61.7 2.5
Write traj. 4 1 1.208 0.4 0.0
Update 4 5001 57.161 20.5 0.8
Constraints 4 5001 333.691 119.6 4.9
Comm. energies 4 5001 14.659 5.3 0.2
Rest 4 62.876 22.5 0.9
-----------------------------------------------------------------------
Total 4 6764.296 2424.0 100.0
-----------------------------------------------------------------------
Parallel run - timing based on wallclock.
NODE (s) Real (s) (%)
Time: 606.000 606.000 100.0
10:06
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 192.262 6.756 1.426 16.830
Intel 5462 @ 2.80 GHz (8 cores)
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing: Nodes Number G-Cycles Seconds %
-----------------------------------------------------------------------
Domain decomp. 8 501 82.059 29.4 1.2
Comm. coord. 8 5001 77.741 27.8 1.1
Neighbor search 8 501 2298.189 823.0 33.7
Force 8 5001 3612.843 1293.7 53.0
Wait + Comm. F 8 5001 303.867 108.8 4.5
Write traj. 8 1 1.293 0.5 0.0
Update 8 5001 47.378 17.0 0.7
Constraints 8 5001 323.778 115.9 4.8
Comm. energies 8 5001 12.283 4.4 0.2
Rest 8 54.549 19.5 0.8
-----------------------------------------------------------------------
Total 8 6813.980 2440.0 100.0
-----------------------------------------------------------------------
Parallel run - timing based on wallclock.
NODE (s) Real (s) (%)
Time: 305.000 305.000 100.0
5:05
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 381.794 13.416 2.833 8.471
Intel 5462 @ 2.80 GHz (16 cores)
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing: Nodes Number G-Cycles Seconds %
-----------------------------------------------------------------------
Domain decomp. 16 501 150.212 54.0 2.1
Comm. coord. 16 5001 159.797 57.5 2.2
Neighbor search 16 501 2329.117 837.4 31.9
Force 16 5001 3772.718 1356.5 51.7
Wait + Comm. F 16 5001 307.170 110.4 4.2
Write traj. 16 1 1.563 0.6 0.0
Update 16 5001 57.076 20.5 0.8
Constraints 16 5001 424.124 152.5 5.8
Comm. energies 16 5001 35.244 12.7 0.5
Rest 16 60.957 21.9 0.8
-----------------------------------------------------------------------
Total 16 7297.976 2624.0 100.0
-----------------------------------------------------------------------
Parallel run - timing based on wallclock.
NODE (s) Real (s) (%)
Time: 164.000 164.000 100.0
2:44
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 710.116 25.010 5.269 4.555
Intel 5462 @ 2.80 GHz (32 cores)
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing: Nodes Number G-Cycles Seconds %
-----------------------------------------------------------------------
Domain decomp. 32 501 219.475 78.6 2.8
Comm. coord. 32 5001 426.671 152.9 5.4
Neighbor search 32 501 2255.270 808.1 28.4
Force 32 5001 3614.747 1295.3 45.5
Wait + Comm. F 32 5001 638.409 228.8 8.0
Write traj. 32 1 2.085 0.7 0.0
Update 32 5001 43.577 15.6 0.5
Constraints 32 5001 540.924 193.8 6.8
Comm. energies 32 5001 151.298 54.2 1.9
Rest 32 55.597 19.9 0.7
-----------------------------------------------------------------------
Total 32 7948.054 2848.0 100.0
-----------------------------------------------------------------------
Parallel run - timing based on wallclock.
NODE (s) Real (s) (%)
Time: 89.000 89.000 100.0
1:29
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 1309.349 46.306 9.710 2.472
Intel 5462 @ 2.80 GHz (64 cores)
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing: Nodes Number G-Cycles Seconds %
-----------------------------------------------------------------------
Domain decomp. 64 501 400.293 144.6 4.3
Comm. coord. 64 5001 727.051 262.6 7.9
Neighbor search 64 501 2333.636 842.9 25.3
Force 64 5001 3615.026 1305.7 39.2
Wait + Comm. F 64 5001 697.219 251.8 7.6
Write traj. 64 1 3.644 1.3 0.0
Update 64 5001 42.050 15.2 0.5
Constraints 64 5001 909.290 328.4 9.9
Comm. energies 64 5001 431.830 156.0 4.7
Rest 64 54.117 19.5 0.6
-----------------------------------------------------------------------
Total 64 9214.158 3328.0 100.0
-----------------------------------------------------------------------
Parallel run - timing based on wallclock.
NODE (s) Real (s) (%)
Time: 52.000 52.000 100.0
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 2240.762 79.190 16.619 1.444
Intel 5462 @ 2.80 GHz (128 cores)
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing: Nodes Number G-Cycles Seconds %
-----------------------------------------------------------------------
Domain decomp. 128 501 976.685 353.9 6.9
Comm. coord. 128 5001 1431.929 518.8 10.1
Neighbor search 128 501 2410.015 873.2 17.1
Force 128 5001 3543.238 1283.8 25.1
Wait + Comm. F 128 5001 1533.045 555.5 10.8
Write traj. 128 1 13.197 4.8 0.1
Update 128 5001 43.235 15.7 0.3
Constraints 128 5001 2146.484 777.7 15.2
Comm. energies 128 5001 1974.616 715.5 14.0
Rest 128 58.302 21.1 0.4
-----------------------------------------------------------------------
Total 128 14130.746 5120.0 100.0
-----------------------------------------------------------------------
NOTE: 14 % of the run time was spent communicating energies,
you might want to use the -nosum option of mdrun
Parallel run - timing based on wallclock.
NODE (s) Real (s) (%)
Time: 40.000 40.000 100.0
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 2912.069 102.957 21.604 1.111
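The 14 % in the NOTE above is the "Comm. energies" row divided by the "Total" row. The same ratio can be computed for the smaller runs to see how quickly this cost grows with core count. A minimal sketch for a POSIX shell with awk; `share_percent` is a hypothetical helper name and the inputs are the "Seconds" values from the OpenMPI logs above:

```shell
#!/bin/sh
# share_percent PART TOTAL: PART as a percentage of TOTAL, to one decimal place
share_percent() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

share_percent 54.2 2848.0     # 32 cores:  prints 1.9
share_percent 156.0 3328.0    # 64 cores:  prints 4.7
share_percent 715.5 5120.0    # 128 cores: prints 14.0
```

The share roughly triples with each doubling of the core count at this problem size, which is why mdrun suggests the -nosum option only for the largest run.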
Serial (Intel 10)
Test Runs for d.dppc
Opteron 275 @ 2.20 GHz
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing: Nodes Number G-Cycles Seconds %
-----------------------------------------------------------------------
Neighbor search 1 501 5190.689 2359.5 40.9
Force 1 5001 6827.533 3103.5 53.8
Write traj. 1 7 3.103 1.4 0.0
Update 1 5001 122.871 55.9 1.0
Constraints 1 5001 407.663 185.3 3.2
Rest 1 128.625 58.5 1.0
-----------------------------------------------------------------------
Total 1 12680.484 5764.0 100.0
-----------------------------------------------------------------------
NODE (s) Real (s) (%)
Time: 5672.870 5764.000 98.4
1h34:32
(Mnbf/s) (MFlops) (ns/day) (hour/ns)
Performance: 20.532 723.250 0.152 157.548
Intel 5462 @ 2.80 GHz
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing: Nodes Number G-Cycles Seconds %
-----------------------------------------------------------------------
Neighbor search 1 501 3279.484 1174.3 42.0
Force 1 5001 3998.769 1431.9 51.2
Write traj. 1 4 1.634 0.6 0.0
Update 1 5001 120.140 43.0 1.5
Constraints 1 5001 303.224 108.6 3.9
Rest 1 99.592 35.7 1.3
-----------------------------------------------------------------------
Total 1 7802.845 2794.0 100.0
-----------------------------------------------------------------------
NODE (s) Real (s) (%)
Time: 2768.610 2794.000 99.1
46:08
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 42.069 1.482 0.312 76.890
Performance Summary
| Processor | Cores | MPI | Mnbf/s | GFlops | ns/day | hour/ns |
|---|---|---|---|---|---|---|
| 5462 | 1 | Serial | 42.069 | 1.482 | 0.312 | 76.890 |
| 5462 | 4 | ompi 1.2.7 | 192.262 | 6.756 | 1.426 | 16.830 |
| 5462 | 8 | ompi 1.2.7 | 381.794 | 13.416 | 2.833 | 8.471 |
| 5462 | 8 | mvapich 0.9.9 | 381.925 | 13.420 | 2.833 | 8.471 |
| 5462 | 16 | ompi 1.2.7 | 710.116 | 25.010 | 5.269 | 4.555 |
| 5462 | 32 | ompi 1.2.7 | 1309.349 | 46.306 | 9.710 | 2.472 |
| 5462 | 64 | ompi 1.2.7 | 2240.762 | 79.190 | 16.619 | 1.444 |
| 5462 | 128 | ompi 1.2.7 | 2912.069 | 102.957 | 21.604 | 1.111 |
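Speedup and parallel efficiency can be derived from the ns/day column of the table above. The sketch below is for a POSIX shell with awk and uses the 4-core OpenMPI run as the baseline rather than the serial run. That choice is an assumption made here so that all rows come from the same build. The helper name `scaling_table` is hypothetical.

```shell
#!/bin/sh
# scaling_table: reads "cores ns_per_day" pairs on stdin; the first row is the baseline.
# Prints: cores, speedup relative to the baseline, parallel efficiency.
scaling_table() {
    awk 'NR == 1 { base_cores = $1; base_rate = $2 }
         { speedup = $2 / base_rate
           printf "%d %.2f %.2f\n", $1, speedup, speedup * base_cores / $1 }'
}

# ns/day values for the 5462 with ompi 1.2.7, copied from the table above
scaling_table <<'EOF'
4 1.426
8 2.833
16 5.269
32 9.710
64 16.619
128 21.604
EOF
```

For the 128-core row this prints a speedup of 15.15 over 4 cores and an efficiency of 0.47, compared with 0.73 at 64 cores, so for d.dppc the return on additional cores drops sharply beyond 64.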
