Note: this guide is a work in progress.
HPC1 is accessible using SSH in a terminal emulator.
An SSH key is required to log in. SSH keys are generated as a matched pair of a private key and a public key. Keep your private key safe and use a strong, memorable passphrase.
We support one key per user. If you need to access the cluster from multiple computers, such as a desktop and a laptop, copy your private key to each of them. Directions are on the ssh key page.
Note that if you forget your passphrase or lose your private key, we cannot reset it. You'll need to generate a new key pair, following the same directions as when you first created it.
Visit the ssh key page for much more information on generating and using an ssh key on a PC, Mac, or Linux computer.
You will need terminal emulator software to log into HPC1 and run jobs. The software you choose must be able to use ssh keys to connect. This information is typically available in the documentation for the software. Common software choices include:
MobaXterm is a more complex and less-recommended terminal emulator for Windows.
Windows Subsystem for Linux can provide a Linux terminal within Windows 10. Once you have it installed, you can follow the Linux-based directions to generate a key pair and use ssh at the command line to connect.
Once you have an SSH key and your account has been created, you can connect to HPC1. In most text-based terminal emulators (Linux and macOS), you connect with the ssh command.
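As a minimal sketch of that connection: the hostname (hpc1.example.org), username, and key path below are hypothetical placeholders, not values from this guide, so substitute the details from your account creation email. The snippet only assembles and prints the command so you can check it before running it.

```shell
# All three values are placeholders (assumptions), not real HPC1 settings.
hpc1_host="hpc1.example.org"        # hypothetical login hostname
hpc1_user="your-username"           # your cluster username
ssh_key="$HOME/.ssh/id_ed25519"     # path to your private key

# -i selects the private key to offer; the target is user@host.
ssh_cmd="ssh -i $ssh_key $hpc1_user@$hpc1_host"
echo "$ssh_cmd"
```

Once the printed command looks right, run it directly in your terminal; ssh will prompt for your key's passphrase.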
FileZilla is a multi-platform client commonly used to transfer data to and from the cluster.
WinSCP is Windows-only file transfer software.
rsync and scp are command-line tools to transfer data to and from the cluster.
These commands should be run on your computer, not on HPC1.
To transfer something to HPC1 from another computer (replace your-username and hpc1-hostname with your cluster username and the HPC1 login hostname):
scp -r local-directory your-username@hpc1-hostname:~/destination/
To transfer something from HPC1 to another computer:
scp -r your-username@hpc1-hostname:~/mydata local-directory
See man scp and man rsync for more information.
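Since rsync is mentioned above without an example, here is a minimal sketch that rehearses it on two throwaway local directories (hypothetical names), so the flags and the trailing-slash behavior can be checked before pointing the destination at the cluster. It assumes rsync is installed and skips the copy otherwise.

```shell
# Rehearse on temporary local directories; nothing here touches HPC1.
src="$(mktemp -d)"
dest="$(mktemp -d)"
mkdir -p "$src/mydata"
echo "sample" > "$src/mydata/a.txt"

if command -v rsync >/dev/null 2>&1; then
    # -a recurses and preserves permissions and timestamps; -v lists files.
    # No trailing slash on the source, so the mydata directory itself is copied.
    rsync -av "$src/mydata" "$dest/"
    ls "$dest/mydata"
else
    echo "rsync is not installed on this machine" >&2
fi
```

For a real transfer, the destination takes the same user@host:path form used with scp. Re-running the same rsync command transfers only files that changed, which helps when a large copy is interrupted.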
HPC1 has many software packages available for a wide range of needs. Most installed packages are provided as environment modules. Run module avail to list them, module load <module/version> to load a module, and module unload <module/version> when you are done.
Generally, load as few modules as possible at a time. Once you're done using a particular piece of software, unload its module before you load another one to avoid incompatibilities.
Many of the most up-to-date Python-based software packages may be found under the bio3 module. Load the module with module load bio3 and run conda list to see a complete and up-to-date list.
Many additional Python 2 packages may be found under the bio module. Note that the bio and bio3 modules are incompatible with each other, so do not load both at the same time.
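The module workflow above might look like the following in a session on HPC1. This is a sketch: it assumes the module command is available in your shell (the commands are skipped otherwise) and uses bio3 only because it is the module named above.

```shell
module_name="bio3"   # module named in this guide; any entry from `module avail` works

if type module >/dev/null 2>&1; then
    module avail 2>&1 | head -n 20   # some implementations print the list to stderr
    module load "$module_name"
    module list                      # confirm what is currently loaded
    conda list | head -n 20          # first packages provided by the loaded environment
    module unload "$module_name"     # unload before loading an incompatible module
else
    echo "module command not found; run this on HPC1" >&2
fi
```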
Visit the Environments page for much more information on getting started with software and the module command on the cluster.
If you can't find a piece of software on the cluster, you can request an installation for cluster-wide use. Contact the helpdesk with the name of the cluster, your username, the name of the software, and a link to the software's website, documentation, or installation directions, if applicable.
Disk I/O (input/output) happens when reading from or writing to a file on disk. Please avoid heavy I/O in your home directory, as this degrades file server performance for everyone. If you know that your software is I/O intensive (for example, it rapidly reads or writes many files, or performs many small reads and writes), you may want to copy your data out of your home directory and onto the compute node as part of your batch job. Otherwise the network file system (NFS) can become a bottleneck, slowing down both your job and other users' jobs.
To prevent NFS bottlenecking, HPC1 supports the use of the /scratch/ directory on the compute nodes when you have I/O-intensive code that needs temporary file space. Each compute node has its own scratch directory of about 1 TB.
Please create a unique directory for each job when you use scratch space, such as /scratch/your-username/job-id/, to avoid collisions with other users or yourself. For example, in your sbatch script, you can use /scratch/$USER/$SLURM_JOBID/$SLURM_ARRAY_TASK_ID (for array jobs).
When your job is finished, copy any results/output that you wrote to your /scratch subdirectory (if any) to permanent storage such as your home directory, and remove ALL of your files from your /scratch subdirectory.
/scratch/ is a shared space and a limited resource. It is your responsibility to clean up your scratch space when your job is done, or the space will fill up and be unusable by anyone.
The /scratch/ directory is subject to frequent purges, so do not attempt to store anything there longer than it takes your job to run.
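The scratch workflow described above (create a per-job directory, work in it, copy results out, clean up) could be sketched in an sbatch script as follows. The job name, time limit, results location, and the placeholder "work" step are assumptions for illustration; the fallback to a temporary directory exists only so the script can be rehearsed on a machine without /scratch.

```shell
#!/bin/bash
#SBATCH --job-name=scratch-demo   # hypothetical job name
#SBATCH --time=00:10:00           # hypothetical time limit

# Use /scratch on a compute node; fall back to a temp dir when rehearsing elsewhere.
if [ -d /scratch ] && [ -w /scratch ]; then
    scratch_root="/scratch"
else
    scratch_root="${TMPDIR:-/tmp}"
fi

# One directory per job, plus a per-task level for array jobs.
user_name="${USER:-$(id -un)}"
job_id="${SLURM_JOBID:-manual-$$}"
workdir="$scratch_root/$user_name/$job_id${SLURM_ARRAY_TASK_ID:+/$SLURM_ARRAY_TASK_ID}"
results_dir="${SLURM_SUBMIT_DIR:-$PWD}/results-$job_id"
mkdir -p "$workdir" "$results_dir"

# Placeholder for the real I/O-heavy work, writing only inside $workdir.
echo "example output" > "$workdir/output.txt"

# Copy results back to permanent storage, then remove everything from scratch.
cp -r "$workdir/." "$results_dir/"
rm -rf "$workdir"
```

If a step can fail midway, registering the cleanup near the top of the script with trap 'rm -rf "$workdir"' EXIT is one way to make sure scratch is still emptied.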
If you would like to purchase additional scratch space for yourself or your lab group, contact the helpdesk for more information.
The batch queue system allows for provisioning of resources (nodes, CPUs, memory, and time) on a shared system, and is a key feature of computing in a cluster environment. HPC1 uses the SLURM batch manager to schedule jobs and pass them off to compute nodes for execution.
Do not skip the batch queue by running your compute tasks directly on the head/login node. This degrades performance of the entire cluster for all users. Jobs found running outside of the queue will be terminated and your account may be suspended until you contact the helpdesk.
The batch queue is divided into job priority queues called partitions. Access to a particular partition is determined by the nodes that your college, department, lab, or sponsor has purchased for the cluster. Your account creation email will tell you which partitions you can access.
HPC1's primary partitions are low, medium, and high, each described below.
Other partitions may exist on a per-user or lab/group basis.
Low priority - jobs in this queue may be killed at any time when superseded by jobs in the medium or high partitions, with the possibility of being restarted at a later time when resources are available again. The low queue is useful for soaking up unused cycles with short jobs, and is a particularly good fit for large array jobs with short run times. Low priority jobs can use more resources than your group paid for if there are no higher-priority jobs.
Medium priority - jobs in this queue may be temporarily suspended when superseded by jobs in the high partition, but will resume when the higher-priority job finishes. Medium jobs can also use more resources than your group paid for if there are no higher-priority jobs. It is NOT recommended to run MPI jobs in medium.
High priority - jobs in this queue will kill or suspend lower-priority jobs. Jobs in high keep their allocated hardware until they are done (or there is a system or power failure). Jobs in the high partition are limited to using the number of CPUs that your group contributed to the cluster. This partition is recommended for MPI jobs.
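Because jobs in the low partition may be killed and restarted, it helps if each task can be re-run safely. The sketch below shows one way to do that for a small array job. The partition name, array range, time limit, and file names are illustrative assumptions (check sinfo for the partitions you can actually use), and --requeue makes the job eligible to be put back in the queue if it is preempted.

```shell
#!/bin/bash
#SBATCH --partition=low     # assumed partition name; confirm with `sinfo`
#SBATCH --array=1-10        # hypothetical array of 10 short tasks
#SBATCH --time=00:05:00     # hypothetical time limit
#SBATCH --requeue           # eligible for requeue if preempted

# Outside SLURM (for a dry run) the task ID defaults to 1.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
out_file="result_${task_id}.txt"

if [ -s "$out_file" ]; then
    # A previous run already finished this task, so a restart has nothing to redo.
    echo "task $task_id already done"
else
    # Write to a temporary name and rename, so a killed job does not leave
    # a partial result under the final name.
    echo "result for task $task_id" > "$out_file.tmp"
    mv "$out_file.tmp" "$out_file"
fi
```

Submitted with sbatch, each array task writes its own result file, and a restarted task skips work that already completed.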
For more information about submitting your job to the batch queue with the sbatch and srun commands, visit our SLURM page.
For additional help with logging in, software or job-related problems, to request the installation of a software package for cluster-wide use, or other issues not listed here, contact the helpdesk to open a trouble ticket.
When contacting the helpdesk about job-related issues, please ALWAYS include the complete prompt and command that you tried, including the cluster, directory, username, command, arguments, job number, and any output/results that you received, so that we can quickly begin troubleshooting your issue. For example:
user@cluster:~$ sbatch myjob.sh
Submitted batch job 12345678
For software requests or other issues where a command prompt is not applicable, please include your cluster username and the name of the cluster in your message.