Batch Script Explanation

From UF HPC Wiki

The idea behind a PBS job script is straightforward: at the top of the script you put PBS directives describing your job and its resource request, and you follow them with the same commands you would use if you were running the job interactively.

Let's start with an example. Log into submit and create a file called matlab.job, in the same directory as your twoSat_main executable, with the following contents:

#!/bin/sh
#
#PBS -r n
#PBS -N matlab
#PBS -o matlab.out
#PBS -e matlab.err
#PBS -m abe
#PBS -M nosuchuser@ufl.edu
#PBS -l nodes=1:ppn=1
#PBS -l pmem=600mb
#PBS -l walltime=06:00:00
#

cd "$PBS_O_WORKDIR"
./twoSat_main

Going through the lines one-by-one, let's see what the above says:

  • Line 1: This is a Bourne shell script. If you don't know what the Bourne shell is, don't worry about it for now.
  • Lines 2 and 12: Just comments.
  • Line 3: Tells PBS that the job is not rerunnable. That is, if the job dies, PBS should not automatically requeue it.
  • Line 4: The name of this job is "matlab". This is the name you will see associated with the job in 'qstat' output.
  • Line 5: The STDOUT of the job goes to the file 'matlab.out'.
  • Line 6: The STDERR of the job goes to the file 'matlab.err'.
  • Line 7: Send an email if the job aborts, and when it begins and ends.
  • Line 8: Send those emails to "nosuchuser@ufl.edu". This is a placeholder address; substitute your own email address so that you receive the notifications.
  • Line 9: Requests one node and one "processor per node" (ppn) to run your job.
  • Line 10: Requests 600MB of RAM "per task". We only have one task, so this is also the total amount of memory we'll need to run the job.
  • Line 11: Requests 6 hours of walltime to run the job.
  • Line 14: The first thing we do when the job starts is change directory to the place where the job was submitted from.
  • Line 15: Runs your program.
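
If you want to double-check the directives before submitting, a small helper like the one below can list them. This is only a sketch: show_directives is a made-up name, not a PBS command, and it assumes matlab.job sits in the current directory. Because grep -n prefixes each match with its line number, the output lines up with the line numbers used in the list above.

```shell
#!/bin/sh
# Hypothetical helper (not part of PBS): print every "#PBS" directive
# in a job script, each prefixed with its line number.
show_directives() {
    grep -n '^#PBS' "$1"
}

# Only run the check if the job file from this page exists here.
if [ -f matlab.job ]; then
    show_directives matlab.job
fi
```

A directive that does not show up in this listing (for example, one typed as "# PBS" with a space) would not match the pattern, which makes typos easier to spot.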

Once you've created that file, you can type:

qsub matlab.job

That submits the job to the batch system, which reports back a job ID that looks something like 4037225.torx.ufhpc.
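
Since qsub prints the job ID on standard output, you can capture it in a shell variable for later use. The following is a rough sketch rather than a required step; job_number is a hypothetical helper name, and the snippet only calls qsub if that command is actually available on the machine.

```shell
#!/bin/sh
# Hypothetical helper: keep only the numeric part of a job ID by
# stripping everything from the first dot onward.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Submit only where qsub exists; elsewhere this block does nothing.
if command -v qsub >/dev/null 2>&1; then
    job_id=$(qsub matlab.job)
    echo "Submitted $job_id (job number $(job_number "$job_id"))"
fi
```

Saving the ID this way means you do not have to copy it by hand when you check on the job later.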

You can monitor the status of your job by typing qstat <your_job_id>.
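
If you would rather have a script wait for the job than check by hand, qstat can be polled in a loop. The sketch below assumes that qstat exits with a non-zero status once the server no longer knows about the job; on systems that keep completed jobs listed for a while, the loop ends correspondingly later. wait_for_job and the QSTAT_CMD variable are names invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: block until qstat no longer recognises the job.
#   $1 = job ID as reported by qsub
#   $2 = seconds to sleep between checks (default: 60)
# QSTAT_CMD can be set to substitute another command for qstat.
wait_for_job() {
    job_id=$1
    interval=${2:-60}
    while "${QSTAT_CMD:-qstat}" "$job_id" >/dev/null 2>&1; do
        sleep "$interval"
    done
}

# Example call (commented out so nothing runs by accident):
# wait_for_job 4037225.torx.ufhpc 60
```

Keep the polling interval generous; checking every minute or so is plenty for a job that requests hours of walltime.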

I'd recommend the following resources for additional information:

  • man pbs_resources
  • man qstat
  • man qsub