Frequently Asked Questions

From UF HPC Wiki

Compilers

Why do we use an Intel compiler on AMD chips?

When the first half of the current incarnation of the UF HPC Center was purchased back in the fall of 2005, we purchased a license for the Pathscale compilers at the same time. For the x86_64 CPU architecture (Opterons), Pathscale was a good choice at the time for a highly optimizing compiler suite. We had access to licenses for the Intel compilers at this time, as well. That said, Pathscale was the HPC Center compiler of choice until the fall of 2006 or so. When it came time to renew the license, we elected not to do so.

Why? Well, there are a few reasons. First, there is money. The Pathscale and Intel compilers are not free, and the HPC Center operations budget is meager. Second, in the experience we gained with the compilers over the first year, we found no compelling reason to favor Pathscale over the Intel compilers. Third, we submitted a bug report to Pathscale during that first year, and they never got back to us with a fix or with any substantial communication at all. So we dropped Pathscale.

In our experience, the Intel compilers generate optimized code for the x86_64 architecture just fine.

Of course, GCC is always there if you want to use it.

Scratch Space

Can I submit jobs from /scratch or do I have to submit them from home? I had always been submitting them from home.

Yes! This is not a problem at all and in reality you should be doing this, as the home area is hugely inefficient for this sort of work.

Can the program my scripts run be located on /scratch rather than home? I had the program in home, but if I can move it to scratch and submit from /scratch, that would be the simplest solution.

Sure thing. This will probably help in the long run, as the scratch filesystem is faster than your home area.

Should I avoid jobs reading data in my home directory, too?

Yes. Your programs should both read their input from and write their output to the scratch area, as it is much more efficient.

If I copy the scripts to my /scratch directory, what modifications to my script will need to be made?

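At a minimum, you will have to add a cd command that changes to the right directory. Otherwise the job will not start in the directory that holds your files, and commands that rely on relative paths or on standard input redirected from files there may not work properly.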
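A minimal sketch of such a script, assuming a PBS/Torque scheduler (as the #PBS directives elsewhere in this FAQ suggest). PBS_O_WORKDIR is the standard PBS variable holding the directory from which qsub was run. The job name and the commented-out program line are hypothetical placeholders.

```shell
#!/bin/sh
#PBS -N scratch_example
#PBS -l nodes=1:ppn=1

# Under PBS, PBS_O_WORKDIR is the directory where qsub was invoked.
# Fall back to the current directory so the script also runs by hand.
WORKDIR="${PBS_O_WORKDIR:-$PWD}"

# Change to the working directory before touching any files.
cd "$WORKDIR" || exit 1
echo "Running in $(pwd)"

# Hypothetical program invocation; relative paths now resolve in WORKDIR.
# ./my_program < input.dat > output.dat
```

If you run qsub from your /scratch directory, the job then reads and writes its files there rather than in your home area.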

Account Functions

Password

How do I reset my password?

There are two ways in which you can reset your password. The first is from within your account while you are logged in, in which case you would use the passwd command like so:

[jka@submit ~]$ passwd
Changing password for user jka.
Enter login(LDAP) password: 
New UNIX password: 
Retype new UNIX password: 
New password: 
Re-enter new password: 
LDAP password information changed for jka
passwd: all authentication tokens updated successfully.

Yes, you have to put in your new password a total of four times. This is caused by a problem we are having with LDAP. Also note that it asks for your current password the first time through.

If you cannot remember what your password is, you can reset it through the web as well. Simply go to the Password Reset Page and authenticate via Gatorlink and you should have no problem with resetting your password.

If these two methods fail, you can still email support@hpc.ufl.edu and ask the administrators to reset your password for you.

Security

How secure is my account on the system? Can anyone access or see my directory?

It is only as secure as the Linux/Unix permissions you use to protect it. If you don't want other users or members of your group to see your data, you need to set your file permissions accordingly.
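As an illustration, the sketch below removes group and other access from a throwaway directory and a file inside it, then prints the resulting modes. The directory and file names are made up for the example. The same chmod commands would apply to your own directories.

```shell
#!/bin/sh
# Work on a throwaway directory so the example is safe to run anywhere.
DEMO_DIR=$(mktemp -d)
touch "$DEMO_DIR/results.dat"

# Remove read, write and execute access for group and others.
# The owner's own permissions are left unchanged.
chmod go-rwx "$DEMO_DIR"
chmod go-rwx "$DEMO_DIR/results.dat"

# Capture the mode strings (first ten characters of the ls -l output).
DIR_MODE=$(ls -ld "$DEMO_DIR" | cut -c1-10)
FILE_MODE=$(ls -l "$DEMO_DIR/results.dat" | cut -c1-10)
echo "directory: $DIR_MODE"
echo "file:      $FILE_MODE"

# Clean up the throwaway directory.
rm -rf "$DEMO_DIR"
```

Running ls -ld on one of your own directories shows its current mode. Any r, w or x in the group or other positions means those users have that access.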

Test Nodes

I see from your Wiki that there are machines that can be used to run short test scripts in order to reduce the chances of me hosing up anything important. How exactly is this done?

  1. To access the "test" nodes you first ssh to submit.hpc.ufl.edu (our primary login host). From there, you can ssh into any or all of them as you wish. Since the "test" nodes are on our private network, they are not accessible directly from outside hosts.
  2. On the test nodes you can do pretty much anything you wish as long as you are considerate of other users and don't monopolize the resources. These nodes are intended for you to develop, test, and debug scripts and programs as needed.
  3. You can do this interactively initially and when you are ready to submit through the queue, you can test your submission script on the test nodes as well since they are dedicated to the "testq" queue. In other words, get your code and submission scripts working properly using the "test" nodes and when you are sure everything works, you can submit your jobs, be they a few or many, with confidence that once they are scheduled they will run correctly.
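The steps above can be sketched as a dry run that only prints the commands involved. The login host and the "testq" queue name come from the text. The test node name and the submission script name are hypothetical, so substitute real ones.

```shell
#!/bin/sh
# Dry run: this only prints the commands; nothing is executed remotely.
LOGIN_HOST="submit.hpc.ufl.edu"   # primary login host named above
TEST_NODE="testnode1"             # hypothetical; use a real test node name
JOB_SCRIPT="myjob.pbs"            # hypothetical submission script

LOGIN_CMD="ssh $LOGIN_HOST"
HOP_CMD="ssh $TEST_NODE"
SUBMIT_CMD="qsub -q testq $JOB_SCRIPT"   # -q selects the destination queue

echo "1. From your own machine: $LOGIN_CMD"
echo "2. From the login host:   $HOP_CMD"
echo "3. To test the submission script: $SUBMIT_CMD"
```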

Infiniband versus Ethernet

I am running a parallel program on HPC using 4 processors. It works, but I got this message:

--------------------------------------------------------------------------
[0,1,0]: OpenIB on host r6b-s35.ufhpc was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,1]: OpenIB on host r6b-s35.ufhpc was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,2]: OpenIB on host r6b-s35.ufhpc was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,3]: OpenIB on host r6b-s35.ufhpc was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
-------------------------------------------------------------------------- 

You are getting that message because your job was scheduled on Ethernet-only nodes (r6). The default and preferred transport for OpenMPI is InfiniBand. It also works with non-IB transports, but then it prints the message above. You can avoid the message in two ways.

  • Request IB Nodes:
#PBS -l nodes=n:ppn=1:infiniband
  • Add the following logic to your submission script:
# IbEnabled prints a number; a value greater than 0 is treated as IB-enabled.
if [ "$(/usr/local/sbin/IbEnabled)" -gt 0 ] ; then
    echo "Running on IB-enabled node set"
    MPIRUN="mpirun --mca btl openib"
else
    echo "Running on GigE-enabled node set"
    MPIRUN="mpirun --mca btl ^udapl,openib --mca btl_tcp_if_include eth0"
fi
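A sketch of how the MPIRUN variable set above might then be used to launch the program. The selection is wrapped in a function so it can be tried without the site helper. The function name choose_mpirun, the program name ./my_mpi_program, and the fallback to 0 when /usr/local/sbin/IbEnabled is absent are assumptions made for illustration.

```shell
#!/bin/sh
# choose_mpirun COUNT: print the mpirun prefix for the assigned node set.
# COUNT is the value reported by the site's IbEnabled helper.
choose_mpirun() {
    if [ "$1" -gt 0 ]; then
        echo "mpirun --mca btl openib"
    else
        echo "mpirun --mca btl ^udapl,openib --mca btl_tcp_if_include eth0"
    fi
}

# Use the site helper when it exists; otherwise assume no InfiniBand.
if [ -x /usr/local/sbin/IbEnabled ]; then
    IB_COUNT=$(/usr/local/sbin/IbEnabled)
else
    IB_COUNT=0
fi

MPIRUN=$(choose_mpirun "$IB_COUNT")

# Print the launch line instead of running it; the program is hypothetical.
echo "Would launch: $MPIRUN -np 4 ./my_mpi_program"
```

In a real submission script, the last line would run $MPIRUN directly instead of echoing it.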