
Getting Started With HPC1

Note: this guide is a work in progress.

Logging In

HPC1 is accessible using SSH in a terminal emulator.

SSH Keys

An SSH key is required to log in. SSH keys are generated as a matched pair of a private key and a public key. Keep your private key safe and use a strong, memorable passphrase.
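
For example, on Linux, MacOS, or Windows Subsystem for Linux, a key pair is typically generated with OpenSSH's ssh-keygen command (the key type below is just one common choice; follow the ssh key page for the exact directions we support):

ssh-keygen -t ed25519

When prompted, choose a strong passphrase and note where the private and public key files are saved.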

We support one key per user. If you need to access the cluster from multiple computers, such as a desktop and a laptop, copy your private key. Directions are on the ssh key page.

Note that if you forget your passphrase or lose your private key, we cannot reset it. You'll need to generate a new key pair, following the same directions as when you first created it.

Visit the ssh key page for much more information on generating and using an ssh key on a PC, Mac, or Linux computer.

Terminal Emulator Software

You will need terminal emulator software to log into HPC1 and run jobs. The software you choose must be able to use ssh keys to connect. This information is typically available in the documentation for the software. Common software choices include:

Terminal.app and iTerm 2 are common MacOS terminal emulator options.

PuTTY is the most common free and open-source terminal emulator for Windows. See the ssh key page for directions on configuring PuTTY to use your SSH key to connect to the cluster.

MobaXterm is a more complex and less-recommended terminal emulator for Windows.

Windows Subsystem for Linux can provide a Linux terminal within Windows 10. Once you have it installed, you can follow the Linux-based directions to generate a key pair and use ssh at the command-line to connect.

Windows Terminal for Windows 10 can provide a terminal experience much like Linux or MacOS. Preview code may be available on Microsoft's GitHub.

Connecting

Once you have an SSH key and your account has been created, you can connect to HPC1. In most text-based terminal emulators (Linux and MacOS), this is how you will connect:

ssh yourusername@hpc1.engr.ucdavis.edu
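
If your terminal emulator does not find your key automatically, you can point ssh at your private key file explicitly (the path below is just an example; use wherever you saved your key):

ssh -i ~/.ssh/id_ed25519 yourusername@hpc1.engr.ucdavis.edu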

Transferring Your Data

File Transfer Software

Filezilla is a multi-platform client commonly used to transfer data to and from the cluster.

WinSCP is Windows-only file transfer software.

rsync and scp are command-line tools to transfer data to and from the cluster.

Example Transfer Commands

These commands should be run on your computer, not on HPC1.

To transfer something to HPC1 from another computer:

scp -r local-directory username@hpc1.engr.ucdavis.edu:~/destination/

To transfer something from HPC1 to another computer:

scp -r username@hpc1.engr.ucdavis.edu:~/mydata local-directory
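
rsync is often a better choice for large transfers because it can resume interrupted transfers and skip files that are already up to date. A roughly equivalent rsync command for copying data to HPC1 (run on your computer; paths are placeholders) is:

rsync -avP local-directory username@hpc1.engr.ucdavis.edu:~/destination/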

See man scp and man rsync for more information.

Using Software and Modules

HPC1 has many software packages available for a wide range of needs. Most installed packages are available as environment modules; list them with the module avail command. Use module load <module/version> to load a module, and module unload <module/version> when done.

Generally, use as few modules as possible at a time. Once you're done using a particular piece of software, unload its module before you load another one to avoid incompatibilities. A typical session might look like the example below.
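
The module name and version here are placeholders; run module avail to see what is actually installed:

module avail                    # list available software
module load somesoftware/1.2    # load a specific version (placeholder name)
somesoftware --help             # run the software
module unload somesoftware/1.2  # unload when finished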

Many of the most up-to-date Python-based software packages may be found under the bio3 module. Load the module with module load bio3 and run conda list to see a complete and up-to-date list.

Many additional Python 2 packages may be found under the bio module. Note that the bio and bio3 modules are incompatible with one another, so do not load both at the same time.

Visit the Environments page for much more information on getting started with software and the modules command on the cluster.

If you can't find a piece of software on the cluster, you can request an installation for cluster-wide use. Contact the helpdesk with the name of the cluster, your username, the name of the software, and a link to the software's website, documentation, or installation directions, if applicable.

The /scratch/ Directory and Disk I/O

Disk I/O (input/output) happens whenever a program reads from or writes to a file on disk. Please avoid heavy I/O in your home directory, as this degrades file server performance for everyone. If you know that your software is I/O intensive, for example it rapidly reads and writes many files or performs many small reads and writes, copy your data out of your home directory and onto the compute node as part of your batch job; otherwise the network file system (NFS) can become a bottleneck, slowing down both your job and everyone else's.

To prevent NFS bottlenecking, HPC1 supports the use of the /scratch/ directory on the compute nodes when you have I/O-intensive code that needs temporary file space. Each compute node has its own scratch directory of about 1TB.

Please create a unique directory for each job when you use scratch space, such as /scratch/your-username/job-id/, to avoid collisions with other users or yourself. For example, in your sbatch script, you can use /scratch/$USER/$SLURM_JOBID or /scratch/$USER/$SLURM_JOBID/$SLURM_ARRAY_TASK_ID (for array jobs).

When your job is finished, copy any results or output from your /scratch subdirectory back to your home directory (or other permanent storage) and remove ALL of your files from your /scratch location.
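
Putting this together, a batch script that uses local scratch might look like the following sketch (the partition, paths, and program name are placeholders; adapt them to your own job):

#!/bin/bash
#SBATCH --partition=med          # placeholder; use a partition you have access to
#SBATCH --ntasks=1
#SBATCH --time=01:00:00          # placeholder time limit

# Make a per-job scratch directory on the compute node
SCRATCH_DIR=/scratch/$USER/$SLURM_JOBID
mkdir -p "$SCRATCH_DIR"

# Copy input data from your home directory to local scratch
cp -r ~/mydata "$SCRATCH_DIR/"

# Run your program against the local copy (my-program is a placeholder)
cd "$SCRATCH_DIR"
~/bin/my-program mydata > results.out

# Copy results back home and remove ALL scratch files for this job
cp results.out ~/
rm -rf "$SCRATCH_DIR"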

Note that /scratch/ is a shared space and a limited resource. It is your responsibility to clean up your scratch space when your job is done or the space will fill up and be unusable by anyone.

The /scratch/ directory is subject to frequent purges, so do not attempt to store anything there longer than it takes your job to run.

If you would like to purchase additional scratch space for yourself or your lab group, contact the helpdesk for more information.

Using the Batch Queue

The batch queue system allows for provisioning of resources (nodes, CPUs, memory, and time) on a shared system, and is a key feature of computing in a cluster environment. HPC1 uses the SLURM batch manager: jobs are submitted with the sbatch and srun commands and handed off to compute nodes for execution.
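
As a quick illustration, a minimal job script might look like the following sketch (the partition and resource requests are placeholders; see the SLURM page for complete directions):

#!/bin/bash
#SBATCH --partition=low     # placeholder; use a partition you have access to
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Replace this with your actual compute commands
srun hostname

Submit it from the login node with sbatch myjob.sh (a placeholder file name); SLURM prints a job ID and runs the script on a compute node when resources become available.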

Do not skip the batch queue by running your compute tasks directly on the head/login node. This degrades performance of the entire cluster for all users. Jobs found running outside of the queue will be terminated and your account may be suspended until you contact the helpdesk.

Batch Partitions

The batch queue is divided into job priority queues called partitions. Access to a particular partition is determined by the nodes your college, department, lab, or sponsor has contributed to the cluster. You will be informed which partitions you have access to in your account creation email.

HPC1's primary partitions include:

  • low, med, high - CPU compute nodes
  • gpu - GPU nodes

Other partitions may exist on a per-user or lab/group basis.

Choosing a Partition:

Low priority - jobs in this queue may be killed at any time when superseded by jobs in the medium or high partitions, with the possibility of being restarted later when resources become available again. The low queue is useful for soaking up unused cycles with short jobs, and is a particularly good fit for large array jobs with short run times. Low priority jobs can use more resources than your group paid for, if there are no other higher-priority jobs.

Medium priority - jobs in this queue may be temporarily suspended when superseded by jobs in the high partition, but will resume when the higher-priority job finishes. Medium jobs can also use more resources than your group paid for, if there are no higher-priority jobs. It is NOT recommended to run MPI jobs in medium.

High priority - jobs in this queue will kill or suspend lower-priority jobs. Jobs in high keep their allocated hardware until they finish (or there is a system or power failure). Jobs in the high partition are limited to the number of CPUs that your group contributed to the cluster. This partition is recommended for MPI jobs.
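
Whichever partition you choose, you select it at submission time. For example, to send a job script to the med partition (assuming you have access to it), either pass it on the command line:

sbatch --partition=med myjob.sh

or set it inside the script with an #SBATCH --partition=med line.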

For more information about submitting your job to the batch queue with the sbatch and srun commands, visit our SLURM page.

Additional Help

For additional help with logging in, software or job-related problems, to request the installation of a software package for cluster-wide use, or other issues not listed here, contact the helpdesk to open a trouble ticket.

When contacting help for job-related issues, please ALWAYS include the complete prompt and command that you tried, including the cluster, directory, username, command, arguments, job number, and any output/results that you received, so that we can quickly begin troubleshooting your issue. For example:

user@cluster:~$ sbatch myjob.sh
   Submitted batch job 12345678

For software requests or other issues where a command prompt is not applicable, please include your cluster username and the name of the cluster in your message.
