This document describes how to get started using your shiny new computing account. If you don't yet have an account, please see this FAQ.
We try to use SSL/encryption wherever possible. Usually this means you have a private key or certificate that you use to access our computing resources. A private key is exactly that: private. Don't share it with anyone, make it readable by other users, send it over unencrypted email, post it to Reddit, etc. We encourage our users to install their private key only on machines they physically use and trust. Don't use it at an internet cafe or on a compromised machine.
For more information on how to keep an SSH key safe, please see this help document.
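As a minimal sketch of keeping a key readable only by you, the bash function below tightens permissions on an SSH directory and the key files in it. It assumes OpenSSH-style key files named id_* in ~/.ssh, with private keys lacking the .pub suffix; secure_ssh_dir is a hypothetical helper name, and if your key or certificate was issued under a different name or location, adjust the path accordingly.

```bash
#!/bin/bash
# Restrict an SSH directory and the key files in it so other users cannot read them.
# Assumes OpenSSH-style names: private keys are id_* without a .pub suffix.
secure_ssh_dir() {
    local dir="${1:-$HOME/.ssh}"
    local f

    # Nothing to do if the directory does not exist.
    [ -d "$dir" ] || return 0

    # Only the owner may list or enter the directory.
    chmod 700 "$dir"

    for f in "$dir"/id_*; do
        # Skip the literal pattern left behind when nothing matches.
        [ -e "$f" ] || continue
        case "$f" in
            *.pub) chmod 644 "$f" ;;  # public keys may be readable by others
            *)     chmod 600 "$f" ;;  # private keys: owner read/write only
        esac
    done
}

secure_ssh_dir "$@"
```

Run it with no arguments to fix ~/.ssh, or pass another directory as the first argument. OpenSSH itself ignores a private key file that other users can read, so correct permissions also avoid a common login failure.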
The easiest way to get help is to Contact Us; submitting a ticket to our tracker is the fastest way to get a response. If it is an emergency, you can call us. For less urgent or more general help, you can search this wiki, where all of our documentation is stored.
We try to provide a useful research environment with minimal limitations. However:
All of our computing resources use a Batch Queue. There are many benefits to using a batch queue on a compute cluster. We currently use Slurm for batch queue management. We no longer support Sun Grid Engine or Condor.
The general idea with a batch queue is that you don't have to babysit your jobs. You submit a job, and it runs until it finishes or something goes wrong; you can configure the queue to email you when that happens. This allows very efficient use of the cluster. You can still watch or debug your jobs if you wish by using an interactive session (with Slurm, for example, srun --pty bash).
Our main concern is that all jobs go through the batch queuing system. Do not bypass the batch queue. We don't lock anything down, but that doesn't mean we can't or won't. If you need to retrieve files from a compute node, feel free to ssh directly to it and get them, but don't interfere with other jobs that have gone through the queue.
Please read our Slurm page for more information about using the queue system.
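The following is a minimal sketch of a Slurm batch script that asks for an email when the job ends or fails. The job name, output file name, task count, time limit, and email address are placeholders (the address is hypothetical), and this cluster may require further options, such as a partition, that are covered on the Slurm page.

```bash
#!/bin/bash
# Name shown by squeue
#SBATCH --job-name=hello
# Output file; %j is replaced by the job ID
#SBATCH --output=hello-%j.out
# One task, ten minutes of wall-clock time
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
# Email when the job ends or fails (hypothetical address; use your own)
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=you@example.edu

# Everything below runs on the compute node that Slurm assigns.
# SLURM_JOB_ID is set by Slurm; "none" means the script ran outside the queue.
msg="Job ${SLURM_JOB_ID:-none} running on ${HOSTNAME}"
echo "$msg"
```

Save it as, say, hello.sh and submit it with sbatch hello.sh. squeue -u $USER shows the state of your jobs, and scancel followed by a job ID cancels one.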
All of our clusters have a local disk on the compute nodes. If your job is I/O-intensive, please don't hammer your /home directory. Instead, use the local scratch space (either /scratch or /state/partition1).
Make sure each job has its own directory: if you run several jobs at the same time, you don't want them sharing the same scratch space. We recommend something like /scratch/username/jobid.
In your submit scripts (or interactive sessions), please remember to clean up any temporary files in the compute node's scratch space. If users don't clean up after themselves, the scratch space fills up and nobody can use it.
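A minimal sketch of a submit script that follows both recommendations: it creates a per-job directory under node-local scratch and uses a bash EXIT trap to delete that directory when the script finishes, even after an error. The fallback to $TMPDIR or /tmp is an assumption added only so the script can be tried outside the cluster, and the workload itself is left as a hypothetical placeholder.

```bash
#!/bin/bash
#SBATCH --job-name=scratch-demo
#SBATCH --output=scratch-demo-%j.out

set -euo pipefail

# Pick the node-local scratch area; /scratch and /state/partition1 are the
# locations named on this page. The $TMPDIR or /tmp fallback is an assumption
# so the script can be tried outside the cluster.
SCRATCH_BASE=""
for base in /scratch /state/partition1 "${TMPDIR:-/tmp}"; do
    if [ -d "$base" ]; then
        SCRATCH_BASE="$base"
        break
    fi
done

# One directory per job: <scratch>/username/jobid. SLURM_JOB_ID is set by
# Slurm; the shell PID ($$) stands in for it outside the queue.
JOB_USER="${USER:-$(id -un)}"
JOB_ID="${SLURM_JOB_ID:-$$}"
JOB_SCRATCH="${SCRATCH_BASE}/${JOB_USER}/${JOB_ID}"

mkdir -p "$JOB_SCRATCH"
# Delete this job's scratch directory when the script exits, even on error.
trap 'rm -rf -- "$JOB_SCRATCH"' EXIT

cd "$JOB_SCRATCH"
# Hypothetical workload: copy inputs in from $HOME, do the I/O-heavy work here,
# then copy the results you want to keep back to $HOME before the script ends.
echo "Working in $JOB_SCRATCH"
```

Because the trap removes the directory on exit, copy anything you want to keep back to your home directory before the script reaches its end. A shell killed with SIGKILL cannot run its EXIT trap, so it is still worth checking occasionally for leftover directories under your scratch area.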
Feel free to compile code on the frontend; there is no need to log into a compute node interactively to compile. Before configuring and compiling any source, make sure to run module load for any libraries or compilers it needs.
We support many open-source and commercial software packages, so we may already have what you need installed. You can use the module command to load, unload, and browse those packages. Here are a few common commands:
module avail (show available software modules)
module list (show modules currently loaded)
module load foo (load the foo module)
module unload foo (unload the foo module)
module purge (unload all modules)
Check out the module man page (run: man module) for more info. If you want to compile and/or run serial code, module load gcc is a good place to start. If you want to compile and/or run parallel code, module load gcc openmpi is a good place to start.
For application-specific module commands, please see our documentation.
If there are additional software packages that you think would be useful to others on your cluster, please let us know.