This page documents the hardware, software, and policies surrounding this resource.
This cluster currently runs Ubuntu 14.04 for x86-64 with custom additions by CSE. The major customizations include (but are not limited to) Ganglia, OpenMPI, Cobbler, SLURM, and Puppet.
Requests for any centrally installed software should go to help@cse.ucdavis.edu. Any software available in Ubuntu is either already installed or available for installation on this cluster. We have set up a page with details on the current software configuration. We use Environment Modules to manage the environment. A quick intro:
module avail
module load <directory/application>
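For example, to see what is installed and bring R into your environment (module names vary, so check module avail for the exact names on Gauss):

jim@gauss:~$ module avail       # list all installed modules
jim@gauss:~$ module load R      # add R to your environment
jim@gauss:~$ module list        # confirm which modules are loaded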
Documentation on some of the custom installed software is at HPC Software Documentation.
Ganglia is available, but an SSH tunnel is required to view it. For example:
ssh -L 50087:gauss.cse.ucdavis.edu:80 terri@gauss.cse.ucdavis.edu
The URL to get to the ganglia page would then be http://localhost:50087/ganglia
In a Linux environment you might create an alias in your .bashrc file:
alias gaussganglia='ssh -L 50087:gauss.cse.ucdavis.edu:80 terri@gauss.cse.ucdavis.edu'
Running gaussganglia would then create the tunnel.
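After adding the alias, reload your shell configuration so the new command is available (run these on your local machine, not on Gauss):

$ source ~/.bashrc   # pick up the new alias
$ gaussganglia       # open the tunnel; leave it running while you browse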
The hardware for Gauss is made up of the following:
Head Node
Compute Nodes
The rest
Standard R jobs can be submitted on Gauss by creating a script such as the following, sample.sh:
#!/bin/sh
#SBATCH --job-name=test
module load R
srun R --vanilla < script.R
Then just run as:
jim@gauss:~$ sbatch ./sample.sh
Submitted batch job 31231
You can check the job's output using:
$ cat slurm-31231.out   # any output to STDOUT would be in this file
Similarly, if your username is 'jim', then typing:
jim@gauss:~$ squeue -u jim
would show the status of your jobs.
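For fuller details on a single job (state, allocated nodes, memory), the standard SLURM scontrol command can be used; 31231 is the example job ID from above:

jim@gauss:~$ scontrol show job 31231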
Gauss is currently set up to accept array jobs. In short, an array job allows users to submit many related jobs from a single script, with each job distinguished by an index it can use to vary its work.
For example, suppose Jim wants to test his new statistical methodology, so he decides to conduct a simulation study. He would like to simulate 100 datasets and fit each simulated dataset with his new method. He will then combine the results to study the properties of his procedure. If he does all of this in a single R job, his code would look like:
for (i in 1:100){
  # generate dataset i...
  # analyze dataset i...
  # store results for dataset i...
}
If each dataset takes 1 hour to simulate and analyze, the full simulation study would take 100 hours. Using an array job on Gauss, Jim can potentially save a lot of time with minimal changes to his code. First, he will need an sbatch script; let's suppose this is saved as jimsjob.sh:
#!/bin/bash -l
###############################################################################
##
## NOTES:
##
## Submit as:
##
##   sbatch ./jimsjob.sh
##
## (1) When specifying --array as a range it must start from a positive
##     integer, e.g.,
##       sbatch --array=0-9
##     is not allowed.
##
## (2) Negative numbers are not allowed in --array, e.g.,
##       sbatch --array=-5,-4,-3,-2,-1,0,1,2,3,4,5
##     is NOT allowed.
##
## (3) Zero can be included if specified separately, e.g.,
##       sbatch --array=0,1-9
##     is allowed.
##
## (4) Ranges can be combined with specified job numbers, e.g.,
##       sbatch --array=0,1-4,6-10,50-100
##     is allowed.
##
###############################################################################

# Name of the job - you'll probably want to customize this.
#SBATCH --job-name=jimsjob

# Tell Gauss how much memory per CPU your job will use:
#SBATCH --mem-per-cpu=1000

# Array job specifications:
#SBATCH --array=1-100

# Email notifications (optional), type=BEGIN, END, FAIL, ALL.
# Uncomment, if desired:
# #SBATCH --mail-type=ALL
# #SBATCH --mail-user=jimsemail@ucdavis.edu

# Standard out and Standard Error output files with the job number in the name.
#SBATCH -o conv_%j.out
#SBATCH -e conv_%j.err

# Load R module (this must come after the #SBATCH directives, since SLURM
# stops reading directives at the first executable command):
module load R/2.15.1

# Execute each of the jobs with a different index (the R script will then
# process this to do something different for each index):
srun R --vanilla --no-save --args ${SLURM_ARRAY_TASK_ID} < /pathtojimsrscript/jimsrscript.R
The main elements of the script simply configure the job for Gauss and can be customized in many different ways. Please see the full SLURM documentation for more details.
When submitted, as per its last line, the jimsjob.sh script will run 100 R jobs, each receiving a different command-line argument; i.e., it is equivalent to running:
R --vanilla --no-save --args 1 < /pathtojimsrscript/jimsrscript.R
R --vanilla --no-save --args 2 < /pathtojimsrscript/jimsrscript.R
R --vanilla --no-save --args 3 < /pathtojimsrscript/jimsrscript.R
...
R --vanilla --no-save --args 100 < /pathtojimsrscript/jimsrscript.R
If Jim's R script doesn't use the command line argument, then it will just do the same thing 100 times (not useful). However, if Jim modifies his R script to be of the form:
#============================== Setup for running on Gauss... ==============================#

args <- commandArgs(TRUE)

# This will print the command line argument e.g., a number from 1 to 100
cat("Command-line arguments:\n")
print(args)

####
# sim_start ==> Lowest dataset number to be analyzed by this particular batch job
####

###################
sim_start <- 0
###################

if (length(args)==0){
  # No command-line argument: interactive test run with a fixed seed
  sim_num <- sim_start + 1
  set.seed(121231)
  sinkit <- FALSE
} else {
  # Batch run: pick the dataset via the command-line index, seed accordingly
  sim_num <- sim_start + as.numeric(args[1])
  set.seed(762*sim_num + 121231)
  sinkit <- TRUE   # sink progress output to a per-job file
}

i <- sim_num
sinkfile <- paste("output_progress_",i,".txt",sep="")

cat(paste("\nAnalyzing dataset number ",i,"...\n\n",sep=""))

#============================== Run the simulation study ==============================#

gauss <- TRUE
if (gauss){
  setwd("~/my_favourite_output_directory/")
}

if (sinkit){
  cat(paste("Sinking output to: ",sinkfile,"\n",sep=""))
  sink(sinkfile)
}

# Load dataset i...
load(paste("Data_",i,".RData",sep=""))

# Now, run the inner-part of Jim's old for-loop...
# generate dataset i...
# analyze dataset i...
# store results for dataset i...

# Save dataset i...
save.image(paste("Dataset_",i,"_Results.RData",sep=""))

# Close the sink, if one was opened
if (sinkit){
  sink()
}
This allows each R job to analyze a different dataset: the first job will analyze dataset 1, the second job will analyze dataset 2, and so forth. Assuming sufficient resources are available on Gauss, jobs can run concurrently and the full simulation study could potentially be completed in 1 hour, rather than 100 hours. Some key things to note about the R wrapper:
The random seed is set as a function of the task index, so each job generates a distinct but reproducible stream of random numbers.
When run as a batch job (i.e., with a command-line argument), progress output is sunk to a per-job file, output_progress_<i>.txt.
Datasets and results are loaded and saved using filenames indexed by the task number, so jobs never overwrite each other.
sim_start offsets the dataset numbering, so a later batch (e.g., datasets 101-200) can be run by changing a single variable.
When run without a command-line argument (e.g., interactively for testing), the script defaults to dataset sim_start + 1 with a fixed seed.
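Before submitting all 100 jobs, it can be worth testing the wrapper on a single index from the command line (the index 7 is arbitrary, and the path is Jim's placeholder):

jim@gauss:~$ module load R/2.15.1
jim@gauss:~$ R --vanilla --no-save --args 7 < /pathtojimsrscript/jimsrscript.R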
There are many ways to customize array jobs, Jim's example being just one of them.
So, to run the array job Jim would run:
jim@gauss:~$ sbatch ./jimsjob.sh
To check the status of the jobs he would do:
jim@gauss:~$ squeue
To delete all of his jobs he would do:
jim@gauss:~$ scancel -u jim
where 'jim' is his Gauss username (the -u flag selects all jobs belonging to that user). Again, please consult the full SLURM documentation for more details.
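To cancel a single job instead of all of them, pass the job ID reported by sbatch or squeue (31231 is the example ID from earlier):

jim@gauss:~$ scancel 31231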
Slides from the “Introduction to Gauss” session hosted by Paul Baines on November 20th, 2012 are available in PDF (intro_to_gauss_slides.pdf) or PowerPoint (intro_to_gauss_slides.pptx).
Note that the code for the “Bob” examples is essentially identical to the code given in the explanations above, so you should use those as your starting point. Happy coding!
Create MDCS data location
nehad@gauss:~$ mkdir -p ~/MdcsDataLocation/gauss/R2012a
Launch MATLAB
nehad@gauss:~$ module load matlab/7.14
nehad@gauss:~$ matlab
Import cluster settings
Validate cluster profile
Set your email address
>> ClusterInfo.setEmailAddress('nehad@ucdavis.edu')
Assume the following test M file (~/test.m)
% sleeps for 2 seconds 16 times
function test
tic,parfor x=1:16,pause(2),end,toc
Launch MATLAB
nehad@gauss:~$ module load matlab/7.14
nehad@gauss:~$ matlab
Interactive Demo (16 workers)
Before pool
>> test
Elapsed time is 32.138099 seconds.
After pool
>> matlabpool(16)
>> test
Elapsed time is 2.064622 seconds.
>> matlabpool close
Sending a stop signal to all the labs ... stopped.
Non-Interactive Demo (16 workers - requires 17)
>> batch(@test,0,'matlabpool',16,'CaptureDiary',true)
I can now close MATLAB, disconnect from Gauss, and wait for an email from the scheduler.
Once my job has completed, I can reconnect to Gauss and launch MATLAB to get my output.
>> myCluster = parcluster('gauss');
>> job45 = myCluster.findJob('ID',45)
>> diary(job45)
Elapsed time is 2.458482 seconds.
MDCS Licenses
There are 64 MDCS worker licenses available. Your job will not run until the requisite number of workers is available. Non-interactive jobs require n+1 workers.
You can check usage by running “lmat” and looking at the last field:
nehad@gauss:~$ lmat
 JOBID PARTITION  NAME  USER ST TIME NODES NODELIST(REASON) LICENSES
120309     debug Job46 nehad  R 0:03     1 c0-13            mdcs*1
In the above case only 1 MDCS worker is in use, so 63 are available.
anismail@gauss:~$ lmat
 JOBID PARTITION  NAME  USER ST TIME NODES NODELIST(REASON) LICENSES
120310     debug Job47 nehad  R 0:09     2 c0-[18-19]       mdcs*64
In this case all 64 MDCS workers are in use, so my job will be queued until enough workers are available.