Gromacs

From UF HPC Wiki

General Information

For more information on using Gromacs, see the wiki maintained by the developers: http://wiki.gromacs.org/index.php/Main_Page


Version 4.0.4

Intel 10 + MVAPICH 0.9.9

Build Date: 03/24/09
Compiler: Intel (10.1.015)
Libraries: Intel MKL (10.0.2.018)
           MVAPICH (0.9.9)
Location: /apps/gromacs/4.0.4/intel10/mvapich/099

Build Notes

Notes:

  1. You must set your MPI environment to mvapich_intel10-0.9.9 via mpi-selector.
  2. The executable was built using the Intel MKL 10 FFTs (through the FFTW wrappers) and the MKL 10 LAPACK and BLAS.
  3. The executable was tested successfully with the "gmxbench" test cases from the GROMACS web site.
  4. The full path to the executables is /apps/gromacs/4.0.4/intel10/mvapich/099

The following commands were run to configure and build the executable.

setenv  CC          mpicc
setenv  CFLAGS      "-axW"
setenv  CPPFLAGS    "-I/opt/intel/mkl/10.0.2.018/include/fftw"
setenv  LDFLAGS     "-L/opt/intel/mkl/10.0.2.018/lib/em64t"
setenv  LDFLAGS     "$LDFLAGS -lfftw3xf_intel -lmkl_lapack -lmkl_intel_lp64 -lmkl_sequential -lmkl_core"
setenv  F77         mpif90
setenv  FFLAGS      "-axW"
setenv  CXX         mpiCC

  ./configure  \
              --prefix=/apps/gromacs/4.0.4/intel10/mvapich/099 \
              --with-fft=fftw3 \
              --enable-double \
              --disable-software-sqrt \
              --enable-prefetch-forces  \
              --enable-mpi  \
              --program-suffix=_mpi_d \
              --with-external-lapack \
              --with-external-blas 

Sample Submission Script

This submission script was used for the "d.dppc" test case from gmxbench. Note that the "-shuffle" and "-sort" options to grompp are no longer necessary or supported.

#!/bin/csh -f
#PBS -N gromacs
#PBS -o gromacs.out
#PBS -e gromacs.err
#PBS -j oe
#PBS -m abe
#PBS -r n
#PBS -M taylor@hpc.ufl.edu
#PBS -q submit
#PBS -l nodes=2:ppn=4:infiniband:phase2
#PBS -l walltime=24:00:00
#
set MPI_THREADS = `cat $PBS_NODEFILE | wc -l`

set IbEnabled = `/usr/local/sbin/IbEnabled`
if ( $IbEnabled ) then
     echo "Running on IB-enabled node set"
     set MPIRUN = "mpiexec"
else
     echo "MVAPICH is not supported over GigE"
     exit 1
endif

cd $PBS_O_WORKDIR

set GROMPP = /apps/gromacs/4.0.4/intel10/mvapich/099/bin/grompp_mpi_d
echo $GROMPP -v 
$GROMPP -v 

set EXE = /apps/gromacs/4.0.4/intel10/mvapich/099/bin/mdrun_mpi_d
set T1 = `date +%s`
echo $MPIRUN $EXE -o md.out
$MPIRUN $EXE -o md.out
set T2 = `date +%s`

# Elapsed wall-clock time in seconds (not CPU time)
set TT = `expr $T2 - $T1`
echo "Total wall-clock time used (s): $TT"

Test Runs for d.dppc

Intel 5430 @ 2.66 GHz (4 cores)
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.         4        501       68.698       25.8     1.0
 Comm. coord.           4       5001       40.265       15.1     0.6
 Neighbor search        4        501     2358.611      886.7    35.2
 Force                  4       5001     3702.238     1391.9    55.2
 Wait + Comm. F         4       5001      131.010       49.3     2.0
 Write traj.            4          1        1.335        0.5     0.0
 Update                 4       5001       54.149       20.4     0.8
 Constraints            4       5001      282.102      106.1     4.2
 Comm. energies         4       5001        5.356        2.0     0.1
 Rest                   4                  59.225       22.3     0.9
-----------------------------------------------------------------------
 Total                  4                6702.988     2520.0   100.0
-----------------------------------------------------------------------

        Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:    630.000    630.000    100.0
                       10:30
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    184.804      6.495      1.372     17.497

Intel 5430 @ 2.66 GHz (8 cores)
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.         8        501      119.920       45.1     1.7
 Comm. coord.           8       5001      116.370       43.7     1.6
 Neighbor search        8        501     2328.787      874.9    32.3
 Force                  8       5001     3774.434     1418.0    52.3
 Wait + Comm. F         8       5001      288.151      108.3     4.0
 Write traj.            8          1        1.371        0.5     0.0
 Update                 8       5001       84.702       31.8     1.2
 Constraints            8       5001      413.247      155.3     5.7
 Comm. energies         8       5001       12.239        4.6     0.2
 Rest                   8                  79.618       29.9     1.1
-----------------------------------------------------------------------
 Total                  8                7218.837     2712.0   100.0
-----------------------------------------------------------------------

        Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:    339.000    339.000    100.0
                       5:39
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    343.666     12.075      2.549      9.415

Intel 5462 @ 2.8 GHz (8 cores)
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.         8        501       82.594       29.6     1.2
 Comm. coord.           8       5001       73.809       26.5     1.1
 Neighbor search        8        501     2308.059      827.3    33.9
 Force                  8       5001     3626.676     1299.9    53.3
 Wait + Comm. F         8       5001      281.823      101.0     4.1
 Write traj.            8          1        1.546        0.6     0.0
 Update                 8       5001       47.468       17.0     0.7
 Constraints            8       5001      316.583      113.5     4.7
 Comm. energies         8       5001       14.276        5.1     0.2
 Rest                   8                  54.705       19.6     0.8
-----------------------------------------------------------------------
 Total                  8                6807.538     2440.0   100.0
-----------------------------------------------------------------------

	Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:    305.000    305.000    100.0
                       5:05
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    381.925     13.420      2.833      8.471

Opteron 275 @ 2.2 GHz (4 cores)
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.         4        501      150.203       68.2     1.7
 Comm. coord.           4       5001       50.819       23.1     0.6
 Neighbor search        4        501     2858.946     1298.6    31.5
 Force                  4       5001     4783.492     2172.8    52.7
 Wait + Comm. F         4       5001      202.525       92.0     2.2
 Write traj.            4          2        2.495        1.1     0.0
 Update                 4       5001      188.199       85.5     2.1
 Constraints            4       5001      626.113      284.4     6.9
 Comm. energies         4       5001       10.429        4.7     0.1
 Rest                   4                 197.112       89.5     2.2
-----------------------------------------------------------------------
 Total                  4                9070.334     4120.0   100.0
-----------------------------------------------------------------------

	Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:   1030.000   1030.000    100.0
                       17:10
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    113.087      3.974      0.839     28.605

Opteron 275 @ 2.2 GHz (8 cores)
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.         8        501      152.261       69.2     1.6
 Comm. coord.           8       5001      113.786       51.7     1.2
 Neighbor search        8        501     2842.908     1292.4    30.6
 Force                  8       5001     4760.846     2164.3    51.2
 Wait + Comm. F         8       5001      488.859      222.2     5.3
 Write traj.            8          1        1.756        0.8     0.0
 Update                 8       5001      124.185       56.5     1.3
 Constraints            8       5001      613.724      279.0     6.6
 Comm. energies         8       5001       42.557       19.3     0.5
 Rest                   8                 150.557       68.4     1.6
-----------------------------------------------------------------------
 Total                  8                9291.439     4224.0   100.0
-----------------------------------------------------------------------

	Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:    528.000    528.000    100.0
                       8:48
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    220.501      7.749      1.637     14.664

Intel 10 + OpenMPI 1.2.7

Build Date: 03/23/09
Compiler: Intel 10
Libraries: Intel MKL
Location: /apps/gromacs/4.0.4/intel10/ompi/127

Test Runs for d.dppc

Intel 5462 @ 2.80 GHz (4 cores)
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.         4        501       71.797       25.7     1.1
 Comm. coord.           4       5001       43.950       15.7     0.6
 Neighbor search        4        501     2294.827      822.4    33.9
 Force                  4       5001     3711.944     1330.2    54.9
 Wait + Comm. F         4       5001      172.184       61.7     2.5
 Write traj.            4          1        1.208        0.4     0.0
 Update                 4       5001       57.161       20.5     0.8
 Constraints            4       5001      333.691      119.6     4.9
 Comm. energies         4       5001       14.659        5.3     0.2
 Rest                   4                  62.876       22.5     0.9
-----------------------------------------------------------------------
 Total                  4                6764.296     2424.0   100.0
-----------------------------------------------------------------------

	Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:    606.000    606.000    100.0
                       10:06
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    192.262      6.756      1.426     16.830

Intel 5462 @ 2.80 GHz (8 cores)
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.         8        501       82.059       29.4     1.2
 Comm. coord.           8       5001       77.741       27.8     1.1
 Neighbor search        8        501     2298.189      823.0    33.7
 Force                  8       5001     3612.843     1293.7    53.0
 Wait + Comm. F         8       5001      303.867      108.8     4.5
 Write traj.            8          1        1.293        0.5     0.0
 Update                 8       5001       47.378       17.0     0.7
 Constraints            8       5001      323.778      115.9     4.8
 Comm. energies         8       5001       12.283        4.4     0.2
 Rest                   8                  54.549       19.5     0.8
-----------------------------------------------------------------------
 Total                  8                6813.980     2440.0   100.0
-----------------------------------------------------------------------

	Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:    305.000    305.000    100.0
                       5:05
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    381.794     13.416      2.833      8.471

Intel 5462 @ 2.80 GHz (16 cores)
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.        16        501      150.212       54.0     2.1
 Comm. coord.          16       5001      159.797       57.5     2.2
 Neighbor search       16        501     2329.117      837.4    31.9
 Force                 16       5001     3772.718     1356.5    51.7
 Wait + Comm. F        16       5001      307.170      110.4     4.2
 Write traj.           16          1        1.563        0.6     0.0
 Update                16       5001       57.076       20.5     0.8
 Constraints           16       5001      424.124      152.5     5.8
 Comm. energies        16       5001       35.244       12.7     0.5
 Rest                  16                  60.957       21.9     0.8
-----------------------------------------------------------------------
 Total                 16                7297.976     2624.0   100.0
-----------------------------------------------------------------------

	Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:    164.000    164.000    100.0
                       2:44
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    710.116     25.010      5.269      4.555

Intel 5462 @ 2.80 GHz (32 cores)
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.        32        501      219.475       78.6     2.8
 Comm. coord.          32       5001      426.671      152.9     5.4
 Neighbor search       32        501     2255.270      808.1    28.4
 Force                 32       5001     3614.747     1295.3    45.5
 Wait + Comm. F        32       5001      638.409      228.8     8.0
 Write traj.           32          1        2.085        0.7     0.0
 Update                32       5001       43.577       15.6     0.5
 Constraints           32       5001      540.924      193.8     6.8
 Comm. energies        32       5001      151.298       54.2     1.9
 Rest                  32                  55.597       19.9     0.7
-----------------------------------------------------------------------
 Total                 32                7948.054     2848.0   100.0
-----------------------------------------------------------------------

	Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:     89.000     89.000    100.0
                       1:29
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:   1309.349     46.306      9.710      2.472

Intel 5462 @ 2.80 GHz (64 cores)
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.        64        501      400.293      144.6     4.3
 Comm. coord.          64       5001      727.051      262.6     7.9
 Neighbor search       64        501     2333.636      842.9    25.3
 Force                 64       5001     3615.026     1305.7    39.2
 Wait + Comm. F        64       5001      697.219      251.8     7.6
 Write traj.           64          1        3.644        1.3     0.0
 Update                64       5001       42.050       15.2     0.5
 Constraints           64       5001      909.290      328.4     9.9
 Comm. energies        64       5001      431.830      156.0     4.7
 Rest                  64                  54.117       19.5     0.6
-----------------------------------------------------------------------
 Total                 64                9214.158     3328.0   100.0
-----------------------------------------------------------------------

	Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:     52.000     52.000    100.0
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:   2240.762     79.190     16.619      1.444

Intel 5462 @ 2.80 GHz (128 cores)
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.       128        501      976.685      353.9     6.9
 Comm. coord.         128       5001     1431.929      518.8    10.1
 Neighbor search      128        501     2410.015      873.2    17.1
 Force                128       5001     3543.238     1283.8    25.1
 Wait + Comm. F       128       5001     1533.045      555.5    10.8
 Write traj.          128          1       13.197        4.8     0.1
 Update               128       5001       43.235       15.7     0.3
 Constraints          128       5001     2146.484      777.7    15.2
 Comm. energies       128       5001     1974.616      715.5    14.0
 Rest                 128                  58.302       21.1     0.4
-----------------------------------------------------------------------
 Total                128               14130.746     5120.0   100.0
-----------------------------------------------------------------------

NOTE: 14 % of the run time was spent communicating energies,
      you might want to use the -nosum option of mdrun


	Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:     40.000     40.000    100.0
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:   2912.069    102.957     21.604      1.111
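
The NOTE in the 128-core output above points at communication overhead. As a rough illustration, the following Python sketch parses the rows of that accounting table and sums the percentage column for the rows whose label contains "Comm.". The choice of those three rows as "communication" is an assumption made here for illustration (domain decomposition also involves communication), and the helper names parse_rows and comm_percent are hypothetical, not part of GROMACS.

```python
# Rows copied verbatim from the 128-core accounting table above.
ACCOUNTING = """\
 Domain decomp.       128        501      976.685      353.9     6.9
 Comm. coord.         128       5001     1431.929      518.8    10.1
 Neighbor search      128        501     2410.015      873.2    17.1
 Force                128       5001     3543.238     1283.8    25.1
 Wait + Comm. F       128       5001     1533.045      555.5    10.8
 Write traj.          128          1       13.197        4.8     0.1
 Update               128       5001       43.235       15.7     0.3
 Constraints          128       5001     2146.484      777.7    15.2
 Comm. energies       128       5001     1974.616      715.5    14.0
 Rest                 128                  58.302       21.1     0.4
"""


def parse_rows(text):
    """Return {row label: percent of run time} for each accounting row."""
    rows = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # The label is every token before the first all-digit token (the node count).
        first_number = next(i for i, tok in enumerate(tokens) if tok.isdigit())
        label = " ".join(tokens[:first_number])
        rows[label] = float(tokens[-1])  # last column is the percentage
    return rows


def comm_percent(rows):
    """Sum the percentage of rows labelled as communication (assumption: 'Comm.' in label)."""
    return sum(pct for label, pct in rows.items() if "Comm." in label)


rows = parse_rows(ACCOUNTING)
print(f"Communication rows: {comm_percent(rows):.1f} % of run time")
```

The same two functions can be applied to the rows of any of the other accounting tables on this page to see how the communication share changes with the core count.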

Serial (Intel 10)

Test Runs for d.dppc

Opteron 275 @ 2.20 GHz
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Neighbor search        1        501     5190.689     2359.5    40.9
 Force                  1       5001     6827.533     3103.5    53.8
 Write traj.            1          7        3.103        1.4     0.0
 Update                 1       5001      122.871       55.9     1.0
 Constraints            1       5001      407.663      185.3     3.2
 Rest                   1                 128.625       58.5     1.0
-----------------------------------------------------------------------
 Total                  1               12680.484     5764.0   100.0
-----------------------------------------------------------------------

               NODE (s)   Real (s)      (%)
       Time:   5672.870   5764.000     98.4
                       1h34:32
               (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
Performance:     20.532    723.250      0.152    157.548

Intel 5462 @ 2.80 GHz
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Neighbor search        1        501     3279.484     1174.3    42.0
 Force                  1       5001     3998.769     1431.9    51.2
 Write traj.            1          4        1.634        0.6     0.0
 Update                 1       5001      120.140       43.0     1.5
 Constraints            1       5001      303.224      108.6     3.9
 Rest                   1                  99.592       35.7     1.3
-----------------------------------------------------------------------
 Total                  1                7802.845     2794.0   100.0
-----------------------------------------------------------------------

               NODE (s)   Real (s)      (%)
       Time:   2768.610   2794.000     99.1
                       46:08
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:     42.069      1.482      0.312     76.890
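
As a quick consistency check on the serial results, the "Total" row of each accounting table can be used to estimate the clock rate: G-Cycles divided by Seconds appears to give the effective frequency in GHz, which can then be compared with the nominal clock in each run's header. The Python sketch below does this for the two serial runs; the interpretation of the ratio as a clock rate is an assumption, and effective_ghz is a hypothetical helper name.

```python
# (label, total G-Cycles, total Seconds) copied from the "Total" rows
# of the two serial accounting tables above.
SERIAL_TOTALS = [
    ("Opteron 275", 12680.484, 5764.0),
    ("Intel 5462", 7802.845, 2794.0),
]


def effective_ghz(gcycles, seconds):
    """Billions of cycles divided by seconds gives cycles per nanosecond, i.e. GHz."""
    return gcycles / seconds


for label, gcycles, seconds in SERIAL_TOTALS:
    print(f"{label}: {effective_ghz(gcycles, seconds):.2f} GHz")
```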

Performance Summary

Processor  Cores  MPI              Mnbf/s   GFlops  ns/day  hour/ns
-------------------------------------------------------------------
5462           1  Serial           42.069    1.482   0.312   76.890
5462           4  ompi 1.2.7      192.262    6.756   1.426   16.830
5462           8  ompi 1.2.7      381.794   13.416   2.833    8.471
5462           8  mvapich 0.9.9   381.925   13.420   2.833    8.471
5462          16  ompi 1.2.7      710.116   25.010   5.269    4.555
5462          32  ompi 1.2.7     1309.349   46.306   9.710    2.472
5462          64  ompi 1.2.7     2240.762   79.190  16.619    1.444
5462         128  ompi 1.2.7     2912.069  102.957  21.604    1.111
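
To read the summary as a scaling curve, the ns/day column can be turned into a speedup and a parallel efficiency. The Python sketch below uses the serial 5462 run as the baseline, which is an assumption made here for illustration: the serial build is a different executable, so efficiencies above 100 % at low core counts should not be over-interpreted. The function name scaling is hypothetical.

```python
# ns/day values copied from the summary table (Intel 5462, OpenMPI 1.2.7 rows).
SERIAL_NS_PER_DAY = 0.312
RUNS = [(4, 1.426), (8, 2.833), (16, 5.269), (32, 9.710), (64, 16.619), (128, 21.604)]


def scaling(cores, ns_per_day, baseline=SERIAL_NS_PER_DAY):
    """Return (speedup over the baseline, parallel efficiency = speedup / cores)."""
    speedup = ns_per_day / baseline
    return speedup, speedup / cores


for cores, rate in RUNS:
    speedup, efficiency = scaling(cores, rate)
    # hour/ns is simply 24 hours divided by ns/day.
    print(f"{cores:4d} cores: speedup {speedup:6.1f}x, "
          f"efficiency {efficiency:6.1%}, {24.0 / rate:7.3f} hour/ns")
```

Passing a different baseline (for example the 4-core ns/day value, with the core counts divided by 4) gives the efficiency relative to the smallest parallel run instead of the serial one.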

Older Versions