FARM Documentation

Farm is a research and teaching cluster for the College of Agricultural and Environmental Sciences. This page documents the hardware, software, and policies surrounding this resource. The announcement archives are available online.

Announcements

Announcement notifications are sent to an internally maintained mailing list. If you are a user of this cluster, you will be added to the mailing list automatically.

Access Policy

All researchers in CA&ES are entitled to free access to the original 8 nodes, each with 24 CPUs and 64GB of RAM, in the low, med, and high partitions.

Any new nodes purchased will be placed in the “Farm III” pool, which is separate from the existing Farm partitions. Currently (June 2019) the “Farm III” pool has 24 “parallel” nodes and 13 “bigmem” nodes.

Costs to add to Farm III:

  • $1,000 for 10TB of storage; this does NOT include backups
  • $4,000 per partial GPU node (1 of 8 Titan RTX 24GB GPUs, 3 of 24 Xeon cores, and 96GB of the 768GB RAM)
  • $8,800 for a “parallel” node with 256GB RAM and 64 CPUs (in the low2, med2, and high2 partitions)
  • $15,000 for a “bigmem” node with 1TB of RAM and 96 CPUs (in the bmh, bmm, and bml partitions)

Every user gets 1TB of free space in a common storage pool. If you need more, please email help@cse.ucdavis.edu. Unless special arrangements are made, there are no backups. Please plan accordingly.

Researchers with bigger storage needs can purchase storage at a rate of $1,000 per 10TB.
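
To see how much of your quota you are using, the standard Linux tools are enough; a minimal check, assuming your data lives under your home directory:

  du -sh $HOME     # total size of everything under your home directory
  df -h $HOME      # free space on the file system holding your home directory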

For access to the cluster, please fill out the Account Request Form. Choose “Farm” for the cluster, and if your PI already has access to Farm, select their name from the dropdown. Otherwise, select “Getchell” as your sponsor and notify Adam Getchell. Please review the Getting Started section.

Purchases can be split among different groups, but contributions must accumulate until a full node is purchased. Use of resources can be partitioned according to financial contribution.

Monitoring

Operating System

The Farm II cluster runs Ubuntu 18.04 and uses the Slurm batch queue manager. System configuration and management are handled via Cobbler and Puppet.
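
Because scheduling is handled by Slurm, the standard Slurm client commands are available on the cluster. A quick sketch (output will vary):

  sinfo                    # list partitions, node counts, and node states
  sinfo -p high,med,low    # restrict the listing to specific partitions
  squeue -u $USER          # show your own pending and running jobs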

Software

Requests for any centrally installed software should go to help@cse.ucdavis.edu. Any software that is available in CentOS/Ubuntu is also available for installation, or is already installed, on this cluster. In many cases we compile and install our own software packages. These custom packages include compilers, MPI layers, open-source packages, commercial packages, HDF, NetCDF, WRF, and others. We use Environment Modules to manage the environment. A quick intro:

  • To get a list of available applications and libraries: module avail
  • To set up your command-line or script-based environment: module load <directory/application>

Documentation on some of the custom-installed software is at HPC Software Documentation. An (outdated) list is at Custom Software. It is best to use the “module avail” command for the current list of installed software.
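
A typical Environment Modules session might look like the following; the openmpi module name is only an illustration, so check “module avail” for what is actually installed:

  module avail            # list every application and library available as a module
  module load openmpi     # add a package (example name) to your environment
  module list             # show which modules are currently loaded
  module unload openmpi   # remove the package from your environment again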

Hardware

Interconnect

  • Farm III: Two 36-port 100Gbps Infiniband switches
  • Farm II: Three 36-port QDR (40Gbps) Infiniband switches
  • Farm I: A collection of 48-port 1Gbps switches

Large Memory Nodes on Farm (bigmem partition)

  • Farm III - 13 bigmem nodes with 1TB RAM and 96 CPUs each
  • Farm II - 9 bigmem nodes with 512GB RAM and 64 CPUs each, plus 1 node with 1TB RAM and 96 CPUs

Parallel Nodes on Farm

  • Farm III - 24 parallel nodes with 64 CPUs and 256GB RAM each
  • Farm II - 95 nodes with 32 CPUs and 64GB RAM each

Farm II Interactive head node

  • Agri, 12 cores/24 threads total, Intel Xeon E5-2620, 64 GB RAM (Samsung M393B1K70DH0-CK0 8x8GiB), 1x1TB Seagate ST1000NM0011 drive

File Servers on Farm II

  • NAS-8-0, 9x11TB = 100TB usable
  • NAS-8-1, 3x7TB = 21TB
  • NAS-8-2, 3x11TB + 2x17TB = 67TB
  • NAS-8-3, 4x11TB = 44TB
  • NAS-9-0, 6x8.2TB = 50TB
  • NAS-9-1, 6x8.2TB = 50TB
  • NAS-9-2, collection = 30TB
  • NAS-10-1, 9x22TB = 200TB
  • NAS-10-3, collection = 50TB
  • NAS-10-4, collection = 131TB
  • NAS-11-0, 9x28TB = 252TB
  • NAS-11-1, 9x22TB = 200TB
  • NAS-11-2, 9x22TB = 200TB
  • NAS-12-1, 9x28TB = 252TB
  • NAS-12-2, 4x87TB = 350TB
  • NAS-12-3, 4x87TB = 350TB

Total usable disk space (not including filesystem and RAID overhead) is around 2.3PB.

Infiniband interconnect currently on Farm II (32Gbps) (high, low, and medium queue)

  • 2 48-port HP 1810-48G switches
  • 1 KVM console
  • 2 APC racks
  • 4 managed PDUs

Batch Partitions

Low priority means that your job might be killed at any time. This is great for soaking up unused cycles with short jobs, and it is a particularly good fit for large array jobs with short run times.

Medium priority means your job might be suspended, but it will resume when the interrupting high-priority job finishes. *NOT* recommended for MPI jobs. Up to 100% of idle resources can be used.

High priority means your job will kill or suspend lower-priority jobs and will keep the allocated hardware until it is done or there is a system or power failure. It is limited to the number of CPUs your group contributed. Recommended for MPI jobs.

  • low - Parallel, Infiniband nodes at low priority
  • med - Parallel, Infiniband nodes at medium priority
  • high - Parallel, Infiniband nodes at high priority
  • bigmeml - Large memory nodes at low priority
  • bigmemm - Large memory nodes at medium priority
  • bigmemh - Large memory nodes at high priority

Farm III's bigmem partitions are named bml, bmm, and bmh.
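
As a sketch of how a job is directed to one of these partitions, a minimal batch script might look like the following; the partition, resource numbers, and program name are placeholders to adjust for your own work:

  #!/bin/bash
  #SBATCH --job-name=example     # name shown in squeue
  #SBATCH --partition=med        # low, med, high, bigmeml, bigmemm, or bigmemh
  #SBATCH --ntasks=1             # number of tasks (processes)
  #SBATCH --cpus-per-task=4      # CPUs per task
  #SBATCH --mem=8G               # memory per node
  #SBATCH --time=02:00:00        # wall-clock limit
  ./my_analysis                  # replace with your actual program

Submit the script with sbatch myjob.sh. For the short, independent tasks that suit the low partition, adding #SBATCH --array=1-100 turns the same script into an array job.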

Additional information for prospective investors

You will be notified when your hardware has been installed and your account has been updated. Rather than giving you unlimited access to the specific hardware purchased, the account update will give you high-priority access to a “fair share” of resources equivalent to the purchase. For example, if you have purchased one compute node, you will always have high-priority access to one compute node. There is no need to worry about the details of which node you are using or which machine is storing your results; those details are handled by the system administrators and managed directly by Slurm.

Slurm is configured so that you get 100% of the resources you paid for, within 1 minute, in the high partition. Access to unused resources is available through the medium and low partitions. Users get a “fair share” of free resources: if you contribute twice as much as another user, you get twice as large a share of any free resources.
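
Slurm's accounting tools let you inspect your share and recent usage yourself; the accounts and numbers shown will depend on your group's contribution:

  sshare -U        # fair-share factors for your own associations
  sshare -a        # fair-share listing for all accounts and users
  sprio -u $USER   # priority factors of your pending jobs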
