Job Submission Queues
From UF HPC Wiki
The University of Florida High Performance Computing Center uses the Torque resource manager and the Maui scheduler to handle job submissions to the cluster. This provides a flexible system of job handling that lets small jobs run alongside large jobs on a fair basis.
Torque Submission Queue
Torque Documentation
Routing
How do I specify my queue?
The quick answer is: you don't. By default, jobs go to a queue called "submit", but this is a routing queue that the PBS system uses to examine your job and decide where it should actually go. If your job requests only a single CPU, it is routed to the serial queue automatically. If it asks for 16 processors, the queuing system looks at the walltime you requested and sends the job to the short, medium, or long queue instead.
In short, you don't have to worry about the queue; the system figures it out for you.
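As a minimal sketch, a serial job script that leaves the choice of queue to the routing queue could look like the following. The job name, program name and resource values are placeholders; the point is only that there is no "#PBS -q" line, since the routing queue uses the processor and walltime requests to pick the destination.

```shell
#!/bin/bash
#PBS -N serial_example
#PBS -l nodes=1:ppn=1
#PBS -l walltime=04:00:00
#PBS -l pmem=900mb
# No "#PBS -q" line: the "submit" routing queue chooses the destination
# queue from the resources requested above (one CPU, so the serial queue).
cd "$PBS_O_WORKDIR"
./my_program input.dat
```

Save it as, say, serial_example.pbs and submit it with qsub serial_example.pbs.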
Test Queue
There is now a test queue for trying out a job on a much smaller set of nodes. This queue has the following attributes:
- Two nodes, each with four processors and 4 GB of RAM
- InfiniBand connection
- Walltime limit of 10 minutes
You can use this queue by adding a PBS directive to your submission script:
#PBS -q testq
Note that the machines in this queue are also used for interactive work on the cluster, so your test job may be slowed by other activity on the test nodes.
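A sketch of a complete test-queue job script, kept within the limits listed above, might look like this. The program name is a placeholder, and the script assumes an MPI launcher called mpiexec is available on the test nodes; substitute whatever launcher your MPI installation provides.

```shell
#!/bin/bash
#PBS -q testq
# Stay within the test queue limits: two nodes with four processors each,
# and at most 10 minutes of walltime.
#PBS -l nodes=2:ppn=4
#PBS -l walltime=00:10:00
cd "$PBS_O_WORKDIR"
mpiexec ./my_program
```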
Checking job status
The qstat command displays job status information. To check the status of all jobs, or only those belonging to one user:
qstat [-u <username>]
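A few common invocations are shown below as a sketch; 12345 stands in for a real job id. The -f option prints the full job record, including the resources a running job has used so far, and -q summarizes the queues.

```shell
# Show only your own jobs
qstat -u "$USER"

# Show the full record for one job, including resources used so far
# (12345 is a placeholder job id)
qstat -f 12345

# Summarize the queues and their limits
qstat -q
```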
Deleting a job
Use the qdel command. The basic form is:
qdel <job id>
If you have a large number of jobs that need to be killed, you can use the following command:
qdelmine
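qdelmine is a local helper. On a system without it, a rough equivalent can be assembled from standard Torque commands; this sketch assumes qselect is installed alongside qstat and qdel, and relies on the -r option of GNU xargs.

```shell
# Review which job ids would be affected
qselect -u "$USER"

# Delete them all; -r keeps xargs from running qdel on an empty list
qselect -u "$USER" | xargs -r qdel
```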
Memory Settings
The queuing system now enforces memory limits on jobs. If you do not specify a "pmem" limit for your job, you will receive the default pmem limit of 900mb. This corresponds to:
#PBS -l pmem=900mb
If your job uses more memory per thread than this and you do not explicitly ask for more, your job will be killed by the Maui scheduler. If your job uses less than this and you do not specify a limit, you are telling Maui that your job needs more memory than it actually uses, which makes that memory unavailable to other jobs. It is best if your pmem setting accurately reflects your job's actual memory requirements.
Hardware Awareness
Be aware of the hardware you are submitting jobs to; that is, know its capabilities. For instance, if your job requires 60 GB of memory and you want to run it on 64 processors, it helps to know that we have nodes with four processors and 8 GB of RAM, because that means you could possibly run the job on 16 of those machines instead of 64 individual nodes.
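The arithmetic behind this example can be sketched as a small shell function. It is an illustration only: plan_job is a hypothetical name, all sizes are in megabytes (60 GB is taken as 61440 MB), the node size argument should be the memory usable by jobs rather than the raw RAM (roughly 900mb per processor on the 4 GB nodes and 1800mb per processor on the 8 GB nodes, per the Memory Settings notes), and the processor count is assumed to be a multiple of the processors per node.

```shell
#!/bin/bash
# Hypothetical helper: given a job's total memory, its processor count and
# the node shape (processors and usable RAM per node, sizes in MB), print a
# nodes/ppn/pmem request that packs the job onto whole nodes.
plan_job() {
  local total_mb=$1 procs=$2 cores_per_node=$3 node_mb=$4
  # Memory per process, rounded up
  local pmem_mb=$(( (total_mb + procs - 1) / procs ))
  # Number of nodes needed, rounded up
  local nodes=$(( (procs + cores_per_node - 1) / cores_per_node ))
  # A full node must be able to hold one process per processor
  if (( pmem_mb * cores_per_node > node_mb )); then
    echo "does not fit: ${pmem_mb}mb x ${cores_per_node} > ${node_mb}mb" >&2
    return 1
  fi
  echo "nodes=${nodes}:ppn=${cores_per_node} pmem=${pmem_mb}mb"
}

# 60 GB across 64 processors on 4-processor 8 GB nodes (~7200 MB usable)
plan_job 61440 64 4 7200   # prints nodes=16:ppn=4 pmem=960mb
```

With the usable memory of a 4 GB node (about 3600 MB, i.e. four processors at ~900mb) the same request does not fit, which is why the 8 GB nodes are the natural target here.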
To find out how much memory a job is using, run qstat on the command line.
The memory figure reported by qstat is for the whole job, so it is not the same as pmem, which is memory per task.
To get a good pmem number, divide the memory reported by qstat by the number of processors your job is using.
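Turning the whole-job figure into a per-process value can be scripted. The sketch below assumes you feed it the output of qstat -f for a running job, that the memory line is named resources_used.mem and reported in kb (as Torque normally does), and that you pass the processor count yourself; pmem_from_qstat is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read "qstat -f" output on stdin and print an estimated
# per-process memory figure, i.e. the whole-job resources_used.mem divided by
# the processor count, rounded up to whole megabytes.
# Assumes the value is reported in kb, as Torque normally does.
pmem_from_qstat() {
  local procs=$1
  awk -v procs="$procs" '
    $1 == "resources_used.mem" {
      val = $3
      if (val !~ /kb$/) {
        print "unexpected unit: " val > "/dev/stderr"
        exit 1
      }
      sub(/kb$/, "", val)
      mb = val / procs / 1024        # kb per process -> mb
      r = int(mb)
      if (r < mb) r++                # round up
      printf "%dmb\n", r
      found = 1
      exit 0
    }
    END { if (!found) exit 1 }
  '
}
```

For example: qstat -f 12345 | pmem_from_qstat 4. The result is a measurement, so leave some headroom before using it as a request.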
Walltime Settings
If you request an arbitrarily long walltime for jobs that really only need three or four days, you are penalizing yourself. The various queues have several different walltime and CPU limits, which we continue to adjust. However, if you ask for 2000 hours of walltime when your job really only needs 72 or 96 hours, your job will go into the "long" queue, where it will wait along with jobs that really do need very long walltimes. If your job needs only 72 hours and your walltime request reflects that, it will go into one of the other queues and run much sooner (perhaps right away).
The point is simply that the more accurately you characterize your job for PBS, the sooner it may run, and the better PBS can schedule all the jobs.
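If you know roughly how long a job takes, a small helper can turn that estimate plus a cushion into a walltime string. This is a sketch: walltime_with_margin is a hypothetical name, and the 25% cushion in the example is an arbitrary choice, not a center policy.

```shell
#!/bin/bash
# Hypothetical helper: convert an estimated run time in whole hours plus a
# safety margin (in percent) into a Torque walltime string, HH:MM:SS.
walltime_with_margin() {
  local hours=$1 margin_pct=$2
  # Total minutes including the margin (integer arithmetic, rounds down)
  local minutes=$(( hours * 60 * (100 + margin_pct) / 100 ))
  printf '%02d:%02d:00\n' $(( minutes / 60 )) $(( minutes % 60 ))
}

# A job measured at about 72 hours, with a 25% cushion:
echo "#PBS -l walltime=$(walltime_with_margin 72 25)"   # prints #PBS -l walltime=90:00:00
```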
Also remember that if you think you have hit a bug, Bugzilla is the best place to report it. We try not to overlook email, but if we cannot get to a message right away it can fall off our radar, and then you might not get the help you need in a timely fashion.
Also, take a look at the Walltime Experiment for some hard data on walltime settings in job scripts and why a reasonable walltime matters.
Memory Settings
When deciding how much memory to request in your job submission script, please be as accurate as possible. Leave yourself some headroom for overhead, in case the process needs more than you estimate, but do not request much more than you really need: if you request significantly more than you actually use, you may prevent other jobs from running on the same node.
For instance, if your job uses only 400mb of memory in a single thread, it would be best to request only about 600mb for that process, with PBS directives like the following (note that the pmem directive is on a separate line):
#PBS -l nodes=1:ppn=1
#PBS -l pmem=600mb
Asking for much more, such as 1400mb, could prevent other jobs from running on the same node, making the cluster less efficient for everyone.
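The 400mb to 600mb example amounts to adding about 50% headroom. A hypothetical helper that applies a configurable headroom and rounds up to the next 100mb is sketched below; the 50% default and the 100mb rounding are assumptions taken from that example, not rules.

```shell
#!/bin/bash
# Hypothetical helper: add headroom to a measured per-process memory figure
# (in mb) and round up to the next 100mb, for use in "#PBS -l pmem=...".
# The default 50% headroom mirrors the 400mb -> 600mb example.
pmem_with_headroom() {
  local used_mb=$1 headroom_pct=${2:-50}
  # Memory including headroom, rounded up
  local want=$(( (used_mb * (100 + headroom_pct) + 99) / 100 ))
  # Round up to a multiple of 100mb
  echo "$(( (want + 99) / 100 * 100 ))mb"
}

pmem_with_headroom 400   # prints 600mb
pmem_with_headroom 250   # 375 rounds up, prints 400mb
```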
Note that roughly half of the nodes in the cluster have 4 GB of RAM (~900mb/cpu) and the other half have 8 GB of RAM (~1800mb/cpu). So there are nodes on which you can use more than 900mb/cpu, but we insist that you request that memory explicitly so that Maui will not oversubscribe the node. Also note that right now, all InfiniBand nodes have only 4 GB of RAM. This will change around the end of June, when we connect some of the newer 8 GB nodes via InfiniBand.
CPU Settings
Be sure to set the number of CPUs your job requests to the number it will actually use, as this greatly helps the scheduler place the job properly. If you specify more CPUs than you actually use, your job will take longer to start running and will waste resources that other jobs could use. If you specify fewer CPUs than your job actually uses, the scheduler will place other jobs on the CPUs your job is running on, slowing down both your job and the others.
In this case, your job may also be deleted until you fix the job script to reflect its actual usage.
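A sketch of keeping the request and the actual usage in step for a threaded job: the script below asks for four processors on one node and then sizes its thread count from $PBS_NODEFILE, which Torque fills with one line per assigned processor. It assumes an OpenMP program (hence OMP_NUM_THREADS), and the program name is a placeholder.

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=4
#PBS -l walltime=02:00:00
#PBS -l pmem=900mb
# Torque lists one line per assigned processor in $PBS_NODEFILE; counting
# the lines gives the number of CPUs actually reserved for this job.
# The $(( )) wrapper strips any padding wc may add.
NP=$(( $(wc -l < "$PBS_NODEFILE") ))
# Start exactly that many threads (assumes an OpenMP program), so the
# job uses what it requested and no more.
export OMP_NUM_THREADS=$NP
cd "$PBS_O_WORKDIR"
./my_threaded_program
```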
PBS to Torque Conversion
The changes needed to submit your PBS scripts to the Torque queueing system are relatively minor. For instance, to submit the following PBS script:
#PBS -q testq
#PBS -l nodes=1:ppn=1:mem=1500mb
#PBS -l place=scatter
#PBS -l walltime=0:05:00
to the Torque queue, the following changes would need to be made:
#PBS -l nodes=1:ppn=1
#PBS -l walltime=0:05:00
The place option does not exist in Torque, so it is not used.
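The converted script above omits the memory request along with the place option, so the job would get the 900mb default pmem. If the 1500mb in the original was meant for its single process, one possible conversion that keeps both the memory request and the test queue is sketched below; this is an interpretation of the original script, not an official translation.

```shell
# Assumes mem=1500mb in the PBS script was meant for its one process
#PBS -q testq
#PBS -l nodes=1:ppn=1
#PBS -l pmem=1500mb
#PBS -l walltime=0:05:00
```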
Submitting multiple jobs with the same executable
If you need to submit many jobs that all use the same executable with different command-line arguments, you may want to look at some scripts that have been created to help with this.
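One way to do this with plain qsub is sketched below. It assumes a single job script, here called run_one.pbs (a hypothetical name), that reads its input file name from an environment variable passed with qsub -v; the inputs directory and program name are placeholders.

```shell
#!/bin/bash
# run_one.pbs (hypothetical) would contain something like:
#   #PBS -l nodes=1:ppn=1
#   #PBS -l walltime=01:00:00
#   cd "$PBS_O_WORKDIR"
#   ./my_program "$INPUT"
#
# Submit one copy of that script per input file, passing the file name in
# the INPUT environment variable.
for f in inputs/*.dat; do
  qsub -v INPUT="$f" run_one.pbs
done
```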
Frequently Asked Questions
How do I submit a job to the Altix queue?
First of all, the Altix queue is only for authorized users; if your job is rejected with a message saying you are not authorized, this is why. Second, to submit a job to the Altix, you need to do one of the following:
- Add a flag to your qsub command such as this: "qsub -q altixq <jobscript>"
- Add a flag to your job script, like this: "#PBS -q altixq"
Note that your job must be submitted from submit, not from Altix.
How can I take an executable and run N copies of it on N nodes?
It sounds like you are asking about a feature known in some batch systems as "job arrays" or "array jobs": sending N independent (or nearly independent) jobs to the batch system with a single qsub command. Unfortunately, our batch system (the Torque resource manager plus the Maui scheduler) does not support job arrays yet. There is active development in this area in Torque; job arrays are currently planned for the 2.2 release, but there is no timeline for that.
So what do people do instead? Generally they write shell scripts that generate Torque job scripts and submit them. In your case, if the jobs are independent of each other, you might submit N jobs, each of which runs your executable on a single processor. Would that work for you? If not, or if you want help with that, please let us know; there are other ways to achieve this.
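A minimal sketch of that approach: make_job_scripts (a hypothetical helper) writes one single-processor job script per argument and prints the paths, leaving submission to a separate step. The walltime and pmem values are placeholders to adjust for your own runs.

```shell
#!/bin/bash
# Hypothetical helper: write one Torque job script per argument into OUTDIR.
# Each script runs EXE on a single processor with that argument and the
# resource requests below (placeholders). Prints the path of each script.
make_job_scripts() {
  local outdir=$1 exe=$2
  shift 2
  mkdir -p "$outdir" || return 1
  local i=0 arg script
  for arg in "$@"; do
    i=$((i + 1))
    script="$outdir/job_$i.pbs"
    {
      echo '#!/bin/bash'
      echo "#PBS -N run_$i"
      echo '#PBS -l nodes=1:ppn=1'
      echo '#PBS -l walltime=02:00:00'
      echo '#PBS -l pmem=900mb'
      echo 'cd "$PBS_O_WORKDIR"'   # single quotes: expanded when the job runs
      echo "$exe $arg"
    } > "$script"
    echo "$script"
  done
}
```

To submit the generated scripts, pipe the printed paths into qsub, for example: make_job_scripts jobs ./my_program in1.dat in2.dat | while read -r s; do qsub "$s"; done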
How do I perform a quick test to ensure a newly-compiled executable is working before potentially wasting lots of node time?
You can submit a short job, that is, one with a small maximum walltime, if you just want to see whether it runs for a little while. Alternatively, perhaps you have a set of inputs that will run fairly quickly and tell you whether you are getting good answers. You can also use the test queue described above.
Why do we use an Intel compiler on AMD chips?
When the first half of the current incarnation of the UF HPC Center was purchased back in the fall of 2005, we purchased a license for the Pathscale compilers at the same time. For the x86_64 CPU architecture (Opterons), Pathscale was a good choice at the time for a highly optimizing compiler suite. We had access to licenses for the Intel compilers at this time, as well. That said, Pathscale was the HPC Center compiler of choice until the fall of 2006 or so. When it came time to renew the license, we elected not to do so.
Why? Well, there are a few reasons. First, there is money. The Pathscale and Intel compilers are not free, and the HPC Center operations budget is meager. Second, in the experience we gained with the compilers over the first year, we found no compelling reason to favor Pathscale over the Intel compilers. Third, we submitted a bug report to Pathscale during that first year; they never got back to us with a fix, or with any substantial communication at all. So we dropped Pathscale.
The Intel compilers generate optimized code for the x86_64 architecture just fine, in our experience.
Of course, GCC is always there if you want to use it!
