FARM Documentation

Farm is a research and teaching cluster for the College of Agricultural and Environmental Sciences. This page documents the hardware, software, and policies surrounding this resource. The announcement archives are available online.


Announcement notifications are sent to an internally maintained mailing list. If you are a user of this cluster, you will be added to the mailing list automatically.

Access Policy

All researchers in CA&ES are entitled to free access to the original 8 nodes (24 CPUs and 64 GB of RAM each) in the low, med, and high partitions.

Any new nodes purchased will be placed in the “Farm III” pool, separate from the existing Farm partitions. Currently (July 2021) the “Farm III” pool has 38 “parallel” nodes and 18 “bigmem” nodes.

Costs to add to Farm III:

  • Disk space: $1,000 for 10 TB of storage; this does NOT include backups
  • Parallel (CPU) node: $13,500 (512 GB RAM, 128 cores/256 threads, 2 TB /scratch)
  • Bigmem node: $25,000 (2 TB RAM, 128 cores/256 threads; in the bmh, bmm, and bml partitions)
  • GPU: $14,500 for 1/8th of a GPU node (A100 with 80 GB GPU RAM, 16 CPU cores/32 threads, 128 GB system RAM)

Everyone gets free access to a common storage pool, with 1 TB per user. If you need more, please email us. Unless special arrangements are made, there are no backups; please plan accordingly.

Researchers with larger storage needs can purchase storage at a rate of $1,000 per 10 TB.
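The cost scales linearly with the rate above. A quick sanity check in shell arithmetic (the 50 TB figure is only an example, and this assumes purchases in 10 TB increments):

```shell
# Storage rate: $1,000 per 10 TB.
# Cost of a hypothetical 50 TB purchase:
tb=50
cost=$(( tb / 10 * 1000 ))
echo "$cost"   # prints 5000 (dollars)
```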

For access to the cluster, please fill out the Account Request Form. Choose “Farm” for the cluster, and if your PI already has access to Farm, select their name from the dropdown. Otherwise, select “Getchell” as your sponsor and notify Adam Getchell. Please review the Getting Started section.

Purchases can be split among different groups, but contributions must accumulate until a full node is purchased. Use of resources can be partitioned according to financial contribution.


Operating System

The Farm III cluster runs Ubuntu 18.04 and uses the Slurm batch queue manager. System configuration and management are handled via Cobbler and Puppet.


Requests for any centrally installed software should go to the support address. Any software available in CentOS/Ubuntu is either already installed or available for installation on this cluster. In many cases we compile and install our own software packages; these custom packages include compilers, MPI layers, open-source packages, commercial packages, HDF, NetCDF, WRF, and others. We use Environment Modules to manage the environment. A quick intro:

  • To get a list of available applications and libraries: module avail
  • To set up your command-line or script-based environment: module load <directory/application>
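Put together, a typical Environment Modules session looks like the following sketch. The module name openmpi is only an illustration of the pattern, not a guarantee of what is installed; check module avail for the real names.

```shell
# Show every application and library available as a module
module avail

# Load a module into the current shell environment
# (example name only -- use a name from `module avail`)
module load openmpi

# Show what is currently loaded, then unload when finished
module list
module unload openmpi
```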

Documentation on some of the custom-installed software is at HPC Software Documentation. An (outdated) list is at Custom Software. It is best to use the module avail command for the current list of installed software.



Interconnect

  • Farm III: 2 x 36-port 100 Gbps Infiniband switches
  • Farm II: 3 x 36-port QDR (40 Gbps) Infiniband switches
  • Farm I: Collection of 48-port 1 Gbps switches

Large Memory Nodes on Farm (bigmem partition)

  • Farm III - 13 bigmem nodes with 1 TB RAM and 96 CPUs each
  • Farm II - 9 bigmem nodes with 512 GB RAM and 64 CPUs each, plus one node with 1024 GB RAM and 96 CPUs

Parallel Nodes on Farm

  • Farm III - 24 Parallel nodes with 64 CPUs and 256 GB RAM
  • Farm II - 95 nodes with 32 CPUs and 64 GB RAM

Farm II Interactive head node

  • Agri: 12 cores/24 threads total, Intel Xeon E5-2620, 64 GB RAM (Samsung M393B1K70DH0-CK0, 8x8 GiB), 1x1 TB Seagate ST1000NM0011 drive

File Servers on Farm II

  • NAS-8-0, 9x11TB = 100TB usable
  • NAS-8-1, 3x7TB = 21TB
  • NAS-8-2, 3x11TB and 2x17TB = 67TB
  • NAS-8-3, 4x11TB = 44TB
  • NAS-9-0, 6x8.2TB = 50TB
  • NAS-9-1, 6x8.2TB = 50TB
  • NAS-9-2, collection = 30TB
  • NAS-10-1, 9x22TB = 200TB
  • NAS-10-3, collection = 50TB
  • NAS-10-4, collection = 131TB
  • NAS-11-0, 9x28TB = 252TB
  • NAS-11-1, 9x22TB = 200TB
  • NAS-11-2, 9x22TB = 200TB
  • NAS-12-1, 9x28TB = 252TB
  • NAS-12-2, 4x87TB = 350TB
  • NAS-12-3, 4x87TB = 350TB

Total usable disk space (not including filesystem and RAID overhead) is around 2.3 PB.

The Infiniband interconnect currently on Farm II (32 Gbps) serves the high, med, and low queues. Supporting hardware:

  • 2 48-port HP 1810-48G switches
  • 1 KVM console
  • 2 APC racks
  • 4 managed PDUs

Batch Partitions

Low priority means that your job might be killed at any time. This is great for soaking up unused cycles with short jobs, and a particularly good fit for large array jobs with short run times.

Medium priority means your job might be suspended, but it will resume when a high-priority job finishes. *NOT* recommended for MPI jobs. Up to 100% of idle resources can be used.

High priority means your job will kill or suspend lower-priority jobs, and will keep its allocated hardware until it is done or there is a system or power failure. Limited to the number of CPUs your group contributed. Recommended for MPI jobs.

  • low - Parallel, Infiniband nodes at low priority
  • med - Parallel, Infiniband nodes at medium priority
  • high - Parallel, Infiniband nodes at high priority
  • bigmeml - Large memory nodes at low priority
  • bigmemm - Large memory nodes at medium priority
  • bigmemh - Large memory nodes at high priority

Farm II's bigmem partitions are named bml, bmm, and bmh.
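To target one of these partitions, a batch script names it with a #SBATCH directive. A minimal sketch of a single-task job; every resource value here is illustrative, so adjust them to your group's allocation:

```shell
#!/bin/bash
#SBATCH --partition=med      # or low, high, bigmeml, bigmemm, bigmemh
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --time=01:00:00      # wall-clock limit, HH:MM:SS
#SBATCH --mem=4G             # memory per node

# The actual work goes here; replace with your own commands
hostname
```

Submit with sbatch example.sh; squeue -u $USER then shows the job and the partition it landed in.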

Additional information for prospective investors

You will be notified when your hardware has been installed and your account has been updated. Rather than giving you unlimited access to the specific hardware purchased, the account update will give you high-priority access to a “fair share” of resources equivalent to the purchase. For example, if you have purchased one compute node, you will always have high-priority access to one compute node. There is no need to worry about which node you are using or which machine is storing your results; those details are handled by the system administrators and managed directly by Slurm.

Slurm is configured so that you get 100% of the resources you paid for within one minute in the high partition. Access to unused resources is available through the medium and low partitions, where users get a “fair share” of free resources: if you contribute twice as much as another user, you get twice as large a share of any free resources.
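You can inspect your own fair-share standing with Slurm's stock sshare utility (these commands only produce output on the cluster itself):

```shell
# Show fair-share information for your own associations;
# the FairShare column is the factor Slurm uses to order pending jobs
sshare -U

# List your queued and running jobs along with their partitions
squeue -u "$USER"
```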

support/systems/farm.txt · Last modified: 2022/03/03 08:53 by omen