Getting Started With Farm

Note: this guide is a work in progress.

Logging In

Farm is accessible using SSH in a terminal emulator.

SSH Keys

An SSH key is required to log in. SSH keys are generated as a matched pair of a private key and a public key. Keep your private key safe and use a strong, memorable passphrase.

We support one key per user. If you need to access the cluster from multiple computers, such as a desktop and a laptop, copy your private key. Directions are on the ssh key page.

Note that if you forget your passphrase or lose your private key, we cannot reset it. You'll need to generate a new key pair, following the same directions as when you first created it.

Visit the ssh key page for much more information on generating and using an ssh key on a PC, Mac, or Linux computer.
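
As a quick illustration, on Linux or macOS you can generate a key pair at the command line (the ed25519 key type here is one common choice; see the ssh key page for the authoritative directions):

ssh-keygen -t ed25519

When prompted, accept the default file location and enter a strong passphrase. Share only the resulting .pub (public) file, and keep the private key to yourself.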

Terminal Emulator Software

You will need terminal emulator software to log into Farm and run jobs. The software you choose must be able to use ssh keys to connect. This information is typically available in the documentation for the software. Common software choices include:

Terminal.app and iTerm2 are common macOS terminal emulator options.

MobaXterm is an all-in-one terminal emulator for Windows that provides a very Linux-like terminal environment, along with the ability to edit files in a local editor and have changes automatically uploaded back to Farm.

PuTTY is the most common free and open-source terminal emulator for Windows. (See: how to configure PuTTY to use your SSH keys to connect to a cluster.)

Windows Subsystem for Linux can provide a Linux terminal within Windows 10. Once you have it installed, you can follow the Linux-based directions to generate a key pair and use ssh at the command-line to connect.

Windows Terminal for Windows 10 can provide a terminal experience much like Linux or macOS. Preview code may be available on Microsoft's GitHub.

Connecting

Once you have an SSH key and your account has been created, you can connect to Farm. In most text-based terminal emulators (Linux and MacOS), this is how you will connect:

ssh yourusername@farm.cse.ucdavis.edu
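
If your private key is stored somewhere other than the default location, you can point ssh at it explicitly (the key path below is just an example):

ssh -i ~/.ssh/id_ed25519 yourusername@farm.cse.ucdavis.edu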

Transferring Your Data

Farm has a dedicated node for file transfers. When transferring large amounts of data to or from the cluster, you can specify port 2022 with your transfer software to connect to the transfer node. For all other work, including submitting jobs to the cluster, connect to the login node on port 22.

Note that the Farm transfer node is being phased out; for now, you can also transfer through the login node on port 22.

Farm uses SSH key pairs ONLY, so you will need to point any local scp/sftp client at the same private key that you use to SSH to Farm.
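
For example, to copy a directory to the transfer node with scp (note that scp's port flag is a capital -P; the paths here are illustrative):

scp -P 2022 -r local-directory username@farm.cse.ucdavis.edu:~/destination/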

File Transfer Software

Filezilla is a multi-platform client commonly used to transfer data to and from the cluster.

Cyberduck is another popular file transfer client for Mac or Windows computers that has the ability to edit files in a local editor and have changes automatically uploaded back to Farm.

WinSCP is Windows-only file transfer software.

Globus is another common solution, especially for larger transfers.

rsync and scp are command-line tools to transfer data to and from the cluster.

Example Transfer Commands

These commands should be run on your computer, not on Farm.

To transfer something to Farm from your local computer:

scp -r local-directory username@farm.cse.ucdavis.edu:~/destination/

Note: outbound scp initiated from Farm is disabled. Please initiate an inbound scp using the above method.

To transfer something from Farm to your local computer:

scp -r username@farm.cse.ucdavis.edu:~/farmdata local-directory

To use rsync to transfer a file or directory from Farm to your local computer:

rsync -aP -e ssh username@farm.cse.ucdavis.edu:~/farmdata .

rsync has the advantage that if the connection is interrupted for any reason, you can simply press the up arrow to re-run the exact same command, and it will resume where it stopped.
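
rsync works in the other direction as well; for example, to transfer a local file or directory to Farm (paths are illustrative):

rsync -aP -e ssh local-directory username@farm.cse.ucdavis.edu:~/destination/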

See man scp and man rsync for more information.

Using Software and Modules

Farm has many software packages available for a wide range of needs. Most installed packages are available as environment modules; use the module avail command to list them. Use module load <module/version> to load a module, and module unload <module/version> when done.

Generally, use as few modules as possible at a time: once you're done using a particular piece of software, unload its module before you load another one, to avoid incompatibilities.
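
For example, a typical module session might look like this (somemodule/1.0 is a placeholder; run module avail to see what is actually installed):

module avail                  # list all available modules
module load somemodule/1.0    # load a specific module and version
module list                   # confirm which modules are currently loaded
module unload somemodule/1.0  # unload it before loading an incompatible one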

Many of the most up-to-date Python-based software packages may be found under the bio3 module. Load the module with module load bio3 and run conda list to see a complete and up-to-date list.

Many additional Python 2 packages may be found under the bio module. Note that the bio and bio3 modules are incompatible with one another, so do not load both at the same time.

Visit the Environments page for much more information on getting started with software and the modules command on the cluster.

If you can't find a piece of software on the cluster, you can request an installation for cluster-wide use. Contact the helpdesk with the name of the cluster, your username, the name of the software, and a link to the software's website, documentation, or installation directions, if applicable.

The /scratch/ Directory and Disk I/O

Disk I/O (input/output) happens when reading from or writing to a file on disk. Please avoid heavy I/O in your home directory, as this degrades file server performance for everyone. If you know that your software is I/O-intensive (for example, it rapidly reads and writes many files, or performs many small reads and writes), copy your data out of your home directory and onto the compute node as part of your batch job; otherwise the network file system (NFS) can bottleneck, slowing down both your job and others' jobs as well.

To prevent NFS bottlenecking, Farm supports the use of the /scratch/ directory on the compute nodes when you have I/O-intensive code that needs temporary file space. Each compute node has its own independent scratch directory of about 1TB.

Please create a unique directory for each job when you use scratch space, such as /scratch/your-username/job-id/, to avoid collisions with other users or yourself. For example, in your sbatch script, you can use /scratch/$USER/$SLURM_JOBID or /scratch/$USER/$SLURM_JOBID/$SLURM_ARRAY_TASK_ID (for array jobs).

When your job is finished, copy any results/output that you wrote to your /scratch subdirectory back to your home directory (if any), and remove ALL of your files from your /scratch location.

Note that /scratch/ is a shared space between everyone who runs jobs on a node, and is a limited resource. It is your responsibility to clean up your scratch space when your job is done or the space will fill up and be unusable by anyone.

/scratch/ is local to each node and is not shared between nodes or with the login node, so you will need to perform setup and cleanup tasks at the start and end of every job run. If you do not clean up at the end of every run, you will leave remnants behind that will eventually fill the shared space. A sketch of this pattern follows below.
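
A minimal sbatch sketch of the setup/cleanup pattern, assuming hypothetical input, output, and program names:

#!/bin/bash
#SBATCH --partition=med2      # example partition; use one you have access to
#SBATCH --time=01:00:00
#SBATCH --mem=4G

# Set up a unique scratch directory for this job
SCRATCH_DIR=/scratch/$USER/$SLURM_JOBID
mkdir -p "$SCRATCH_DIR"

# Copy input data from home to node-local scratch
cp ~/project/input.dat "$SCRATCH_DIR"/      # hypothetical input file

# Run the I/O-intensive work against scratch
cd "$SCRATCH_DIR"
~/project/my_program input.dat > output.dat # my_program is a placeholder

# Copy results back to home, then remove ALL files from scratch
cp output.dat ~/project/
rm -rf "$SCRATCH_DIR"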

The /scratch/ directory is subject to frequent purges, so do not attempt to store anything there longer than it takes your job to run.

If you would like to purchase additional scratch space for yourself or your lab group, contact the helpdesk for more information.

Using the Batch Queue

Job scheduling with SLURM is a key feature of computing on the cluster.

A job, in the context of the cluster, is a running piece of software performing some kind of function, such as computation, analysis, simulation, modeling, comparison, sorting, or other research-related tasks.

The job scheduler or batch queue system allows for the fair provisioning of limited resources (nodes, CPUs, memory, and time) on a shared system.

Farm uses the SLURM job scheduler to manage user jobs, passing the work to the compute nodes for execution, primarily through the use of sbatch and srun commands. Jobs are placed in a queue and executed according to a priority system.
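
For example (the script name and resource requests here are illustrative):

sbatch myjob.sh                                   # submit a batch script to the queue
srun --partition=med2 --time=00:10:00 hostname    # run one short command on a compute node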

Do not skip the batch queue by running your compute tasks directly on the head/login node.

Running jobs on the login node degrades performance for all users and can damage the cluster. Jobs found running outside of the job queue will be terminated and your account may be temporarily suspended until you contact the helpdesk, so that the admins can work with you to help you run your job most effectively without damaging the cluster.

Batch Partitions

The batch queue is divided into job priority queues called partitions. Access to a particular partition is determined by your college, department, lab, or sponsor's contribution of nodes to the cluster. You will be informed which partitions you have access to when you receive your account creation email.

Farm's primary partitions include:

  • low, med, high - Farm II CPU compute nodes
  • low2, med2, high2 - Farm III CPU compute nodes
  • bigmeml, bigmemm, bigmemh - Farm II high-memory compute nodes
  • bml, bmm, bmh - Farm III high-memory compute nodes

Nodes that are purchased will typically be added to the pool of nodes in the latest generation of Farm (Farm III, as of 2019) unless special arrangements are made.

Choosing a Partition:

Low priority - jobs in this queue may be killed at any time when superseded by jobs in the medium or high partitions, and may be restarted later when resources become available again. The low queue is useful for soaking up unused cycles with short jobs, and is a particularly good fit for large array jobs with short run times. Low priority jobs can use more resources than your group paid for, if there are no other higher-priority jobs.

Medium priority - jobs in this queue may be temporarily suspended when superseded by jobs in the high partition, but will resume when the higher-priority job finishes. Medium jobs can also use more resources than your group paid for, if there are no higher-priority jobs. It is NOT recommended to run MPI jobs in medium.

High priority - jobs in this queue will kill or suspend lower-priority jobs. Jobs in high keep their allocated hardware until they finish (or there is a system or power failure). Jobs in the high partition are limited to using the number of CPUs that your group contributed to the cluster. This partition is recommended for MPI jobs. See the example below for selecting a partition.
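
To target a specific partition, name it in your job script or on the command line (low2 here is just an example; use a partition you have access to):

#SBATCH --partition=low2

or, equivalently, at submission time:

sbatch --partition=low2 myjob.sh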

For more information about submitting your job to the batch queue with the sbatch and srun commands, visit our SLURM page.

Additional Help

For additional help with logging in, software or job-related problems, to request the installation of a software package for cluster-wide use, or other issues not listed here, contact the helpdesk to open a trouble ticket.

When contacting the helpdesk about job-related issues, please ALWAYS include the complete prompt and command that you tried (showing the cluster, directory, username, command, and arguments), the job number, and any output/results that you received, so that we can begin troubleshooting your issue quickly. For example:

user@cluster:~$ sbatch myjob.sh
Submitted batch job 12345678

For software requests or other issues where a command prompt is not applicable, please include your cluster username and the name of the cluster in your message.
