Job Submission Queues

From UF HPC Wiki

The University of Florida High Performance Computing Center uses the Torque resource manager and the Maui scheduler to handle job submissions to the cluster. This gives us a flexible system of job handling in which small jobs can run alongside large jobs on a fair basis.

The methods the UF HPC Center uses to decide how jobs are scheduled are detailed on the UF HPC Center Policies webpage.

Job Types

In addition to the jobs submitted by local users from different departments across the University of Florida, the cluster is also used by some external groups who have supplied financing for the cluster:

Torque Submission Queue

Torque Documentation

Torque Resource Manager

Routing

How do I specify the queue in which my jobs will run?

The quick answer is, you don't. The default queue is called "submit", and this is a routing queue which the PBS system uses to look at your job and figure out into which execution queue it should actually go. At the time of this writing, your job's walltime request determines this.

Basically, you don't have to worry about the queue, as the system will figure it out for you. All you need to do is make a reasonable estimate for the walltime your job requires, and request it with the following directive in your job script:

#PBS -l walltime=
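As a rough sketch of how a walltime request might be derived, the helper below takes a runtime measured in a trial run (in minutes), adds a safety margin, and prints the matching directive. The function name `walltime_directive` and the 20% default margin are assumptions made for illustration, not HPC Center policy.

```shell
# walltime_directive MINUTES [MARGIN_PERCENT]
# Hypothetical helper: pads a measured runtime and prints a walltime directive.
walltime_directive() {
    local minutes=$1
    local margin=${2:-20}   # assumed default margin of 20%
    # Apply the margin and round up to a whole minute.
    local padded=$(( (minutes * (100 + margin) + 99) / 100 ))
    printf '#PBS -l walltime=%02d:%02d:00\n' $(( padded / 60 )) $(( padded % 60 ))
}

# A trial run that took 90 minutes:
walltime_directive 90    # prints: #PBS -l walltime=01:48:00
```

The printed line can be pasted into a job script in place of a guessed value.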

Test Queue

There is now a test queue that can be used to test a job on a much smaller set of nodes. The "submit" queue does not route jobs to it, so you must submit to it directly.

The queue has the following resources/properties:

  • Two nodes with two processors and 2GB of RAM, as well as a node with 8 processors and 16GB of RAM
  • Infiniband connections for all nodes
  • Walltime limit of 30 minutes

You can use this queue by adding a PBS directive to your submission script:

#PBS -q testq

Note that the machines in this queue are also used for interactive work on the cluster, so your test job may be slowed by other activity on the test nodes.

Checking job status

The qstat command displays job status information. To check the status of jobs in the default queue:

qstat [-u <username>]

Deleting a job

Use the qdel command to delete a job. The basic form is:

qdel <job id>

If you have a large number of jobs that need to be killed, you can use the following command:

qdelmine

Memory Settings

The queuing system now enforces memory limits on jobs. If you do not specify a "pmem" limit for your job, it will receive the default pmem limit of 600mb. This corresponds to:

#PBS -l pmem=600mb

If your job uses more memory per thread than this and you do not explicitly ask for more, your job will be killed by the Maui scheduler. If your job uses less than this and you do not specify a limit, you are telling Maui that your job needs more memory than it actually uses, which makes that memory unavailable to other jobs. It is best if your pmem setting accurately reflects your job's actual memory requirements.

Hardware Awareness

Know the capabilities of the hardware that you are submitting jobs to. For instance, suppose you have a job that requires 60GB of memory in total and you want to run it on 64 processors. Knowing that we have nodes with four processors and 8GB of RAM each tells you that the job could run on 16 of those machines (16 x 4 = 64 processors, 16 x 8GB = 128GB of RAM) instead of being spread across 64 individual nodes.
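The arithmetic in the example above can be sketched in a few lines of shell. The helper names `nodes_request` and `per_cpu_mb` are hypothetical; the figures (64 processors, 60GB in total, four processors per node) come from the example.

```shell
# nodes_request CPUS CPUS_PER_NODE
# Hypothetical helper: prints a nodes/ppn request that covers CPUS processors.
nodes_request() {
    local cpus=$1 ppn=$2
    local nodes=$(( (cpus + ppn - 1) / ppn ))   # round up to whole nodes
    echo "#PBS -l nodes=${nodes}:ppn=${ppn}"
}

# per_cpu_mb TOTAL_GB CPUS
# Hypothetical helper: memory per processor in MB, rounded up.
per_cpu_mb() {
    echo $(( ($1 * 1024 + $2 - 1) / $2 ))
}

nodes_request 64 4    # prints: #PBS -l nodes=16:ppn=4
per_cpu_mb 60 64      # prints: 960
```

At 960MB per processor, four tasks need under 4GB, which fits comfortably on a four-processor node with 8GB of RAM.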

To find out how much memory a job is using, run qstat on the command line.

The memory statistics reported by qstat are for the whole job, so they are not the same as pmem, which is memory per task.

To get a good pmem number, divide the memory figure reported by qstat by the number of processors your job is using.
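The division described above might be scripted as follows. This is a sketch under assumptions: it reads the full job listing from `qstat -f <job id>` on standard input, assumes the total is reported on a `resources_used.mem` line in kb, and adds 50% headroom by default (mirroring the 400mb to 600mb example elsewhere on this page). The function name `pmem_from_qstat` is hypothetical.

```shell
# pmem_from_qstat NPROCS [HEADROOM_PERCENT]
# Hypothetical helper: reads `qstat -f <job id>` output on stdin and prints
# a per-task pmem directive in mb.
pmem_from_qstat() {
    local nprocs=$1 headroom=${2:-50}   # assumed default headroom of 50%
    local kb
    # Assumption: the whole-job figure appears as "resources_used.mem = <n>kb".
    kb=$(sed -n 's/^[[:space:]]*resources_used\.mem = \([0-9][0-9]*\)kb.*/\1/p' | head -n 1)
    [ -n "$kb" ] || { echo "resources_used.mem not found" >&2; return 1; }
    # Add headroom, split across processors, convert kb to mb (rounded up).
    local mb=$(( (kb * (100 + headroom) / 100 / nprocs + 1023) / 1024 ))
    echo "#PBS -l pmem=${mb}mb"
}

# Illustrative input only (not real job data): 3276800kb across 8 processors.
printf '    resources_used.mem = 3276800kb\n' | pmem_from_qstat 8
# prints: #PBS -l pmem=600mb
```

On the cluster, the equivalent call would be along the lines of `qstat -f <job id> | pmem_from_qstat 8` while the job is running.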

Walltime Settings

If you are requesting an arbitrarily long "walltime" for your jobs when they really only need three or four days, you should be aware that you are penalizing yourself. There are several different walltime and CPU limits on the various queues, which we continue to adjust. However, if you ask for 2000 hours of walltime when your job really only needs 72 or 96 hours, your job will go into the "long" queue, where it will wait behind jobs that really do need very long walltimes. If your job really only needs 72 hours and your walltime request reflects that, it will go into one of the other queues and run much sooner (perhaps right away).

The point is simply that the more accurately you characterize your job for PBS, the sooner your job may run and PBS will be able to do a better job of scheduling all the jobs.

Also remember that if you think you have hit a bug, Bugzilla is the best place to report it. We certainly try not to overlook emails but if we can't get to something right away, they can fall off our radar and then you might not get the help you need in a timely fashion.

Also, take a look at the Walltime Experiment for some hard data on the use of walltime in a job script and why it is good to get a reasonable walltime setting in your job scripts.

Memory Settings

When deciding on the amount of memory to use for a job in your job submission script, please try to be as accurate as possible. You want to leave yourself some leeway in how much memory you request for overhead, in case the process needs more than you estimate, but you also do not want to use more than you really need. The reason for this is that if you request significantly more than you actually use, you may prevent other jobs from running on the same node.

For instance, if you had a job that only used 400mb of memory in a single thread, it would be best to request only about 600mb of RAM for that process. You would use something like the following PBS directives, noting that the pmem directive is on a separate line:

#PBS -l nodes=1:ppn=1
#PBS -l pmem=600mb

If you were to ask for more, such as 1400mb or more, this would possibly prevent other jobs from running on the same node, thus making the cluster less efficient for everyone.

Note that roughly half of the nodes in the cluster have 4GB of RAM (~900mb/cpu) and the other half have 8GB of RAM (~1800mb/cpu). So there are nodes on which you can use more than 900mb/cpu, but we insist that you request it explicitly so that Maui will not oversubscribe the node.


CPU Settings

Be sure to set the number of CPUs that your job is going to use to the proper number, as this greatly helps in scheduling the job properly. If you specify more CPUs than you actually use, your job will take longer to start running and will waste resources that could be used by other jobs. If you specify fewer CPUs than your job actually uses, the scheduler will place other jobs on the same CPUs that your job is running on, slowing down both your job and the other jobs.

In that case, your job may also be deleted until you fix the job script to reflect your job's actual usage.

PBS to Torque Conversion

A long, long time ago, the UF HPC Center used PBS Pro for its resource manager and queueing system. Now, it uses Torque. You may find that you need to convert a PBS Pro job script to one that can be used with Torque.

The PBS Pro language for making resource requests is a superset of the language used by Torque. Thus, in many cases, no changes need to be made to a PBS Pro job script in order to get it to submit/run successfully with Torque. In cases where changes are necessary, they are relatively minor. Often, these changes involve the "place" job placement request - this keyword exists in PBS Pro, but not in Torque. Other times, changes may be needed to account for evolution in the HPC Center scheduling policy. For instance, to submit the following PBS Pro script:

#PBS -q testq
#PBS -l nodes=1:ppn=1:mem=1500mb
#PBS -l place=scatter
#PBS -l walltime=0:05:00

to the Torque queue, the following changes would need to be made:

#PBS -l nodes=1:ppn=1:pmem=1500mb
#PBS -l walltime=0:05:00

The place option does not exist in the Torque resource manager, so you leave it out. In addition, you should use the "pmem" attribute to request "memory per task" instead of the total memory used by the job.
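For scripts that need only the two changes described above, the edit could be automated with a small sed filter. This is a rough sketch, not a general converter: the function name `pbspro_to_torque` is hypothetical, and it leaves every other line, including any `-q` request, untouched.

```shell
# pbspro_to_torque: reads a PBS Pro job script on stdin, writes a Torque-style
# script on stdout. Handles only the two changes described in the text.
pbspro_to_torque() {
    # 1. Drop "#PBS" lines containing a place= request.
    # 2. Turn ":mem=" into ":pmem=" on the remaining "#PBS" lines.
    sed -e '/^#PBS[[:space:]].*place=/d' \
        -e '/^#PBS/s/:mem=/:pmem=/'
}

printf '%s\n' \
    '#PBS -l nodes=1:ppn=1:mem=1500mb' \
    '#PBS -l place=scatter' \
    '#PBS -l walltime=0:05:00' | pbspro_to_torque
```

Typical use would be `pbspro_to_torque < old_job.pbs > new_job.pbs`, where both file names are placeholders. Review the result by hand, since the per-task pmem value may need adjusting if the original figure was a whole-job total.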

Submitting multiple jobs with the same executable

If you need to submit many jobs that all use the same executable with different command-line arguments, you may want to take a look at some scripts that have been created to help with this.

Job Submission Scripts

Frequently Asked Questions

How do I submit a job to the Altix queue?

First of all, the Altix queue is only for those that are authorized to use the system. If your job is rejected with a message saying you are not authorized, this is why. Second, in order to actually submit a job to the Altix, you need to do one of the following:

  1. Add a flag to your qsub command such as this: "qsub -q altixq <jobscript>"
  2. Add a flag to your job script, like this: "#PBS -q altixq"

Note that your job must be submitted from submit, not from Altix.

How can I take an executable and run N copies of it on N nodes?

It sounds like you are asking about a feature known to some batch systems as "job arrays" or "array jobs". That is, you want to send N independent (or nearly so) jobs to the batch system with a single "qsub" command. Unfortunately, our batch system (Torque resource manager plus Maui scheduler) does not support "job arrays" yet. I can tell you that there is some active development in this area in Torque. At this time, it is planned to be unveiled in their 2.2 release, but there is no timeline for that.

So what do people do instead? Generally they write shell scripts which generate Torque job scripts and submit them. In your case, if the jobs are independent of each other, you might want to submit N jobs - each of which invokes your executable upon a single processor. Would that work for you? If not, or if you want help with that, please let us know. There are other ways to achieve this.
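A generator script of the kind described above might look like the following sketch. Everything specific in it is a placeholder chosen for illustration: the executable `./my_program`, the `job_N.pbs` file names, the three argument values, and the walltime and pmem requests. It calls qsub only if the command is found, so it can be tried safely off the cluster.

```shell
#!/bin/bash
# Hypothetical generator: writes one single-processor Torque job script per
# argument value and submits each one if qsub is available.

make_job() {
    local index=$1 arg=$2
    local script="job_${index}.pbs"
    {
        printf '#!/bin/bash\n'
        printf '#PBS -N job_%s\n' "$index"
        printf '#PBS -l nodes=1:ppn=1\n'
        printf '#PBS -l walltime=01:00:00\n'
        printf '#PBS -l pmem=600mb\n'
        printf '\n'
        # PBS_O_WORKDIR is set by Torque to the directory qsub was run from.
        printf 'cd "$PBS_O_WORKDIR"\n'
        printf './my_program %s\n' "$arg"
    } > "$script"
    echo "$script"
}

i=0
for arg in 0.1 0.2 0.3; do
    i=$(( i + 1 ))
    script=$(make_job "$i" "$arg")
    if command -v qsub >/dev/null 2>&1; then
        qsub "$script"
    else
        echo "wrote $script (qsub not found, not submitted)"
    fi
done
```

Each generated job is independent, so the scheduler can start them as single processors become free.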

How do I perform a quick test to ensure a newly-compiled executable is working before potentially wasting lots of node time?

You can submit a short job - that is, you can specify a small maximum walltime if you just want to see whether the job runs for a little while. Alternatively, perhaps you have a set of inputs for your job that will naturally run fairly quickly and tell you whether you are getting good answers. In addition, we now have the test queue, which you can use as well. See above for details on that.

Why do we use an Intel compiler on AMD chips?

When the first half of the current incarnation of the UF HPC Center was purchased back in the fall of 2005, we purchased a license for the Pathscale compilers at the same time. For the x86_64 CPU architecture (Opterons), Pathscale was a good choice at the time for a highly optimizing compiler suite. We had access to licenses for the Intel compilers at this time, as well. That said, Pathscale was the HPC Center compiler of choice until the fall of 2006 or so. When it came time to renew the license, we elected not to do so.

Why? Well, there are a few reasons. First, there is money. The Pathscale and Intel compilers are not free, and the HPC Center operations budget is meager. Second, in the experience we gained with the compilers over the first year, we found no compelling reason to favor Pathscale over the Intel compilers. Third, we submitted a bug report to Pathscale during that first year - they never got back to us with a fix or any substantial communication at all. So we dropped Pathscale.

The Intel compilers generate optimized code for the x86_64 arch just fine in our experience.

Of course, GCC is always there if you want to use it!
