
Gauss Documentation

This page documents the hardware, software, and policies surrounding this resource.

Operating System

This cluster currently runs Ubuntu 14.04 for x86-64 with custom additions by CSE. The major customizations include (but are not limited to) Ganglia, OpenMPI, Cobbler, Slurm, and Puppet.

Software

Requests for any centrally installed software should go to help@cse.ucdavis.edu. Any software available in Ubuntu is either already installed on this cluster or available for installation. We have set up a page with details on the current software configuration. We use Environment Modules to manage the environment. A quick intro:

  • To get a list of available applications and libraries - module avail
  • To set up your command-line or script-based environment - module load <directory/application>

Documentation on some of the custom installed software is at HPC Software Documentation.

Monitoring

Ganglia is available, but an SSH tunnel is required to view it. For example:

ssh -L 50087:gauss.cse.ucdavis.edu:80 terri@gauss.cse.ucdavis.edu

The URL to get to the ganglia page would then be http://localhost:50087/ganglia

In a Linux environment you might create an alias in your .bashrc file:

alias gaussganglia='ssh -L 50087:gauss.cse.ucdavis.edu:80 terri@gauss.cse.ucdavis.edu'

The command gaussganglia would then create the tunnel.

Hardware

The hardware for Gauss is made up of the following:

Head Node

  • 2 AMD Opteron 6272 (Interlagos) processors, 16 cores per processor, 2.1GHz, 16MB L3 cache
  • Approximately 11TB of usable disk (8× 2TB Western Digital RE4 WD2003FYYS drives)
  • 64GB DDR3 RAM
  • 10G uplink (Intel 82599EB 10-Gigabit SFI/SFP+)

Compute Nodes

  • 12 compute nodes
  • 2 AMD Opteron 6272 processors per node, 16 cores per processor, 2.1GHz, 16MB L3 cache
  • 64GB DDR3 RAM per node
  • 1× 1TB drive per node (Seagate Constellation ES ST1000NM0011, 7200 RPM)
  • IPMI 2.0 for on/off/console.

The rest

  • 1 48-port HP ProCurve 2910AL 10G switch
  • 1 KVM console
  • 1 APC rack
  • 1 managed PDU

Submitting Jobs on Gauss

Standard R jobs can be submitted on Gauss by creating a script of the form sample.sh:

#!/bin/sh

#SBATCH --job-name=test

module load R

srun R --vanilla < script.R

Then just run as:

jim@gauss:~$ sbatch ./sample.sh
Submitted batch job 31231

Any output to STDOUT is written to a file named after the job ID; you can check it using:

$ cat slurm-31231.out

Similarly, supposing your username is 'jim', typing:

jim@gauss:~$ squeue -u jim

would show the status of your jobs.

Array Jobs on Gauss

Gauss is currently set up to accept array jobs. In short, an array job allows users to:

  • Submit multiple jobs from the same script
  • Pass customized arguments to each job

For example, suppose Jim wants to test his new statistical methodology, so he decides to conduct a simulation study. He would like to simulate 100 datasets and fit each simulated dataset with his new method, then combine the results to study the properties of his procedure. If he does this in a single R job, he would write:

for (i in 1:100){
   # generate dataset i...
   # analyze dataset i...
   # store results for dataset i...
}
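
As a purely illustrative stand-in (the page leaves Jim's actual method unspecified), the body of such a loop might look like:

# Toy version of the single-job loop: estimate the mean of each simulated dataset.
results <- numeric(100)
for (i in 1:100){
   set.seed(i)
   y <- rnorm(50)          # generate dataset i
   results[i] <- mean(y)   # analyze dataset i and store the result
}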

If each dataset takes 1 hour to simulate and analyze, the full simulation study would take 100 hours. Using an array job on Gauss, Jim can potentially save a lot of time with minimal changes to his code. First, he will need an sbatch script; let's suppose this is saved as jimsjob.sh:

#!/bin/bash -l

###############################################################################
##
## NOTES:
##
## Submit as:
## 
##    sbatch ./jimsjob.sh
## 
## (1) When specifying --array as a range, it must start from a positive
##     integer, e.g.,
##       sbatch --array=0-9
##     is NOT allowed.
##
## (2) Negative numbers are not allowed in --array,
##     e.g.,
##       sbatch --array=-5,-4,-3,-2,-1,0,1,2,3,4,5
##     is NOT allowed.
##
## (3) Zero can be included if specified separately.
##    e.g., 
##       sbatch --array=0,1-9
##     is allowed.
##
## (4) Ranges can be combined with specified job numbers.
##    e.g., 
##       sbatch --array=0,1-4,6-10,50-100
##     is allowed.
##
###############################################################################

# Name of the job - you'll probably want to customize this.
#SBATCH --job-name=jimsjob

# Tell Gauss how much memory per CPU your job will use:
#SBATCH --mem-per-cpu=1000

# Array job specifications:
#SBATCH --array=1-100

# Email notifications (optional), type=BEGIN, END, FAIL, ALL
# Uncomment, if desired:
# #SBATCH --mail-type=ALL
# #SBATCH --mail-user=jimsemail@ucdavis.edu

# Standard out and Standard Error output files with the job number in the name.
#SBATCH -o conv_%j.out
#SBATCH -e conv_%j.err

# Load R module (note that all #SBATCH directives must appear before the first
# executable command, such as this one, or Slurm will ignore them):
module load R/2.15.1

# Execute each of the jobs with a different index (the R script will then process
# this to do something different for each index):
srun R --vanilla --no-save --args ${SLURM_ARRAY_TASK_ID} < /pathtojimsrscript/jimsrscript.R

The main elements of the script just set up the job on Gauss and can be customized in many different ways. Please see the full SLURM documentation for more details.

  • It is vitally important that users request sufficient resources for their array jobs
  • All Gauss users must use the cluster responsibly and fairly; be mindful of this when submitting large jobs

When submitted, the jimsjob.sh script will (as per its last line) submit 100 R jobs, each receiving a different command-line argument, i.e., it is equivalent to running:

R --vanilla --no-save --args 1 < /pathtojimsrscript/jimsrscript.R
R --vanilla --no-save --args 2 < /pathtojimsrscript/jimsrscript.R
R --vanilla --no-save --args 3 < /pathtojimsrscript/jimsrscript.R
...
R --vanilla --no-save --args 100 < /pathtojimsrscript/jimsrscript.R

If Jim's R script doesn't use the command line argument, then it will just do the same thing 100 times (not useful). However, if Jim modifies his R script to be of the form:

#============================== Setup for running on Gauss... ==============================#

args <- commandArgs(TRUE)

# This will print the command line argument e.g., a number from 1 to 100
cat("Command-line arguments:\n")
print(args)

###################
# sim_start ==> Lowest dataset number to be analyzed by this particular batch job
###################

###################
sim_start <- 0
###################

if (length(args)==0){
  sim_num <- sim_start + 1
  set.seed(121231)
  sinkit <- FALSE
} else {
  sim_num <- sim_start + as.numeric(args[1])
  set.seed(762*sim_num + 121231)
  sinkit <- TRUE   # must be set here too, otherwise 'if (sinkit)' below errors
}

i <- sim_num
sinkfile <- paste("output_progress_",i,".txt",sep="")

cat(paste("\nAnalyzing dataset number ",i,"...\n\n",sep=""))
    
#============================== Run the simulation study ==============================#

gauss <- TRUE

if (gauss){
  setwd("~/my_favourite_output_directory/")
}

if (sinkit){
  cat(paste("Sinking output to: ",sinkfile,"\n",sep=""))
  sink(sinkfile)
}  

# Load dataset i...
load(paste("Data_",i,".RData",sep=""))

# Now, run the inner-part of Jim's old for-loop...

   # generate dataset i...
   # analyze dataset i...
   # store results for dataset i...

# Save dataset i...
save.image(paste("Dataset_",i,"_Results.RData",sep=""))

This allows each R job to analyze a different dataset: the first job analyzes dataset 1, the second analyzes dataset 2, and so forth. Assuming sufficient resources are available on Gauss, the jobs can run concurrently and the full simulation study could potentially be completed in 1 hour rather than 100 hours. Some key things to note about the R wrapper:

  • Each job must use a different random seed, otherwise all 100 simulated datasets will be identical!
  • The set of 100 jobs will produce 100 .RData files; these must then be post-processed by another R script (a sketch follows below).
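
Since the page only notes that post-processing is needed, here is a minimal sketch of such a combining script (not from the original page; it assumes each job stored an object called 'results' before calling save.image() - substitute whatever objects Jim's script actually creates):

# Combine the results from all 100 array jobs:
n_sims <- 100
all_results <- vector("list", n_sims)
for (i in 1:n_sims){
  env <- new.env()
  load(paste("Dataset_",i,"_Results.RData",sep=""), envir=env)  # load job i's saved workspace
  all_results[[i]] <- get("results", envir=env)                 # extract the stored object
}
# ...summarize all_results to study the properties of the procedure...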

There are many ways to customize array jobs; Jim's example is just one of them.
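
For instance (a hypothetical variation, with made-up parameter names), the task ID can index a grid of simulation settings instead of a dataset number; with #SBATCH --array=1-20, each task would pick one (n, sigma) combination:

args <- commandArgs(TRUE)
task_id <- as.numeric(args[1])

# A 4 x 5 grid of sample sizes and noise levels (20 combinations in total):
settings <- expand.grid(n = c(50, 100, 200, 500),
                        sigma = c(0.5, 1, 2, 5, 10))
n     <- settings$n[task_id]
sigma <- settings$sigma[task_id]

# ...simulate and analyze a dataset of size n with noise level sigma...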

So, to run the array job Jim would run:

jim@gauss:~$ sbatch ./jimsjob.sh

To check the status of the jobs he would do:

jim@gauss:~$ squeue

To delete all of his jobs he would do:

jim@gauss:~$ scancel -u jim

where 'jim' is his Gauss username. Again, please consult the full SLURM documentation for more details.

Introduction to Gauss Slides

Slides from the “Introduction to Gauss” session hosted by Paul Baines on November 20th 2012 are available in pdf here intro_to_gauss_slides.pdf or PowerPoint here intro_to_gauss_slides.pptx.

Note that the code for the “Bob” examples is essentially identical to the code given in the explanations above, so you should use those as your starting point. Happy coding!

MDCS on Gauss - Setup

Create MDCS data location

nehad@gauss:~$ mkdir -p ~/MdcsDataLocation/gauss/R2012a

Launch MATLAB

nehad@gauss:~$ module load matlab/7.14
nehad@gauss:~$ matlab

Import cluster settings

  • Select “Import Cluster Profile” from the “Parallel” menu
  • Navigate to /share/apps/anismail/gauss.settings

Validate cluster profile

  • Select “Manage Cluster Profiles” from the “Parallel” menu
  • Highlight the imported profile
  • Click “Set as Default”
  • Click “Validate”

Set your email address

>> ClusterInfo.setEmailAddress('nehad@ucdavis.edu')

MDCS on Gauss - Usage

Assume the following test M file (~/test.m):

% sleeps for 2 seconds 16 times
function test
tic,parfor x=1:16,pause(2),end,toc

Launch MATLAB

nehad@gauss:~$ module load matlab/7.14
nehad@gauss:~$ matlab

Interactive Demo (16 workers)

Before pool

>> test
Elapsed time is 32.138099 seconds.

After pool

>> matlabpool(16)
>> test
Elapsed time is 2.064622 seconds.
>> matlabpool close
Sending a stop signal to all the labs ... stopped.

Non-Interactive Demo (16 workers - requires 17)

>> batch(@test,0,'matlabpool',16,'CaptureDiary',true)

I can now close MATLAB, disconnect from Gauss, and wait for an email from the scheduler.

Once my job has completed, I can reconnect to Gauss and launch MATLAB to get my output.

>> myCluster = parcluster('gauss');
>> job45 = myCluster.findJob('ID',45)
>> diary(job45)
Elapsed time is 2.458482 seconds.

MDCS Licenses

There are 64 MDCS worker licenses available. Your job will not run until the requisite number of workers is available. Non-interactive jobs require n+1 workers.

You can check usage by running “lmat” and looking at the last field:

nehad@gauss:~$ lmat
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)  LICENSES
 120309     debug    Job46    nehad  R       0:03      1           c0-13    mdcs*1

In the above case only 1 MDCS worker is in use, so 63 are available.

anismail@gauss:~$ lmat
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)  LICENSES
 120310     debug    Job47    nehad  R       0:09      2      c0-[18-19]   mdcs*64

In this case all 64 MDCS workers are in use, so my job will be queued until enough workers are available.
