Differences

This shows you the differences between two versions of the page.

support:hpc:software:slurm [2018/04/12 17:42]
bill [Interactive Sessions]
support:hpc:software:slurm [2021/06/21 15:52]
omen [Example Script:]
Line 1: Line 1:
-====== SLURM: A Highly Scalable Resource Manage ======+====== SLURM: A Highly Scalable Resource Manager ======
 [[https://computing.llnl.gov/linux/slurm/|SLURM]] is an open-source resource manager (batch queue) designed for Linux clusters of all sizes.  [[https://computing.llnl.gov/linux/slurm/|SLURM]] is an open-source resource manager (batch queue) designed for Linux clusters of all sizes. 
  
Line 21: Line 21:
 # NOTE the -l flag! # NOTE the -l flag!
  
-# If you need any help, please email help@cse.ucdavis.edu+# If you need any help, please email farm-hpc@ucdavis.edu
  
 # Name of the job - You'll probably want to customize this. # Name of the job - You'll probably want to customize this.
-#SBATCH -J bench+#SBATCH --job-name=benchmark-test 
 + 
 +# Use the med2 partition (or whichever you have access to) 
 +# Run this to see what partitions you have access to: 
 +# sacctmgr -s list user $USER format=partition 
 +#SBATCH --partition=med2
  
 # Standard out and Standard Error output files with the job number in the name. # Standard out and Standard Error output files with the job number in the name.
-#SBATCH -bench-%j.output +#SBATCH --output=bench-%j.output 
-#SBATCH -bench-%j.output+#SBATCH --error=bench-%j.output
  
-no -n here, the user is expected to provide that on the command line.+# Request 4 CPUs and 8 GB of RAM from 1 node: 
 +#SBATCH --nodes=1 
 +#SBATCH --mem=8G 
 +#SBATCH --ntasks=1 
 +#SBATCH --cpus-per-task=4 
  
 # The useful part of your job goes below # The useful part of your job goes below
Line 39: Line 48:
 export OMP_NUM_THREADS=$SLURM_NTASKS export OMP_NUM_THREADS=$SLURM_NTASKS
 module load benchmarks module load benchmarks
-stream+ 
 +# The main job executable to run: note the use of srun before it 
 +srun stream
 </code> </code>
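The example script requests ''--ntasks=1'' together with ''--cpus-per-task=4'', yet sets ''OMP_NUM_THREADS'' from ''$SLURM_NTASKS'', which would give a single thread. A minimal sketch of one alternative, assuming the program is OpenMP-threaded: take the thread count from ''SLURM_CPUS_PER_TASK'' (which Slurm sets only when ''--cpus-per-task'' is given), then fall back to ''SLURM_NTASKS'', then to 1. The helper name ''threads_for_job'' is hypothetical and not part of the Farm documentation.

```shell
# Hypothetical helper: choose an OpenMP thread count from the Slurm environment.
threads_for_job() {
    # Prefer CPUs per task, then the task count, then a single thread.
    echo "${SLURM_CPUS_PER_TASK:-${SLURM_NTASKS:-1}}"
}

export OMP_NUM_THREADS="$(threads_for_job)"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

With the resource request shown in the script, this would export ''OMP_NUM_THREADS=4''; outside of a Slurm job it falls back to 1.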
  
Line 97: Line 108:
 The newest version of slurm supports array jobs.  For example: The newest version of slurm supports array jobs.  For example:
 <code> <code>
-$ cat test.sh+$ cat test-array.sh
 #!/bin/bash #!/bin/bash
 hostname hostname
Line 105: Line 116:
 <code> <code>
 # Submit a job array with index values between 0 and 10,000 on all free CPUs: # Submit a job array with index values between 0 and 10,000 on all free CPUs:
-$ sbatch --array=0-1000 MyScript.sh+$ sbatch --array=0-10000 --partition=low test-array.sh
 </code> </code>
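Inside an array job, Slurm sets ''SLURM_ARRAY_TASK_ID'' to the index of each task, which a script can use to pick its own input. A minimal sketch, assuming inputs are named ''input-00000.dat'', ''input-00001.dat'', and so on; both the naming scheme and the helper ''input_for_task'' are hypothetical.

```shell
# Hypothetical helper: map an array index to a zero-padded input file name.
input_for_task() {
    printf 'input-%05d.dat\n' "$1"
}

# SLURM_ARRAY_TASK_ID is set by Slurm inside array jobs; default to 0 elsewhere.
task_id="${SLURM_ARRAY_TASK_ID:-0}"
echo "Task ${task_id} would process $(input_for_task "$task_id")"
```

Placed in the array script, each task would then work on a different file without any coordination between tasks.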
  
Line 175: Line 186:
 | -t | time limit for job, <minutes>, or <hours>:<minutes> are commonly used| | -t | time limit for job, <minutes>, or <hours>:<minutes> are commonly used|
 | -v -vv -vvv| Increasing levels of verbosity| | -v -vv -vvv| Increasing levels of verbosity|
-| -x node-name | Don't run job on node-name (and please report any problematic nodes to help@cse.ucdavis.edu) |+| -x node-name | Don't run job on node-name (and please report any problematic nodes to farm-hpc@ucdavis.edu) |
  
 ====== Interactive Sessions ====== ====== Interactive Sessions ======
Line 181: Line 192:
 (takes 30 seconds or so) (takes 30 seconds or so)
  
-<code>$ srun -partition-name ---pty bash -il </code>+<code>$ srun --partition=partition-name --time=1:00:00 --unbuffered --pty /bin/bash -il </code>
  
 +When the time limit expires you will be forcibly logged out and anything left running will be killed.
 ======  Monitoring Jobs: ====== ======  Monitoring Jobs: ======
  
Line 216: Line 228:
  
 [[http://slurm.schedmd.com/rosetta.pdf|SLURM Rosetta]] [[http://slurm.schedmd.com/rosetta.pdf|SLURM Rosetta]]
- 
 ===== Cancelling ===== ===== Cancelling =====
  
Line 225: Line 236:
 </code> </code>
 If you forget the JOBID it will cancel all your jobs. If you forget the JOBID it will cancel all your jobs.
 +
 +
 +=====  Advanced (Optional) Squeue Usage =====
 +The squeue command has some additional command flags that can be passed to better monitor your jobs, if necessary.
 +
 +This section involves some Linux shell knowledge and an understanding of environment variables. If you are unsure, you can skip this section, or ask an administrator for help.
 +
 +The default output fields of squeue are defined in the slurm module, but these can be overridden with the ''--format'' flag.
 +Under the current Farm configuration, the standard output of ''squeue -u <username>'' looks like this:
 +<code>
 +
 +JOBID PARTITION     NAME     USER  ST        TIME  NODES CPU MIN_ME NODELIST(REASON)
 +12345       med    myjob  username  R  1-22:20:42      1 22  24000M c10-67
 +</code>
 +These fields are defined by default using the following format codes:
 +<code>
 +%.14i %.9P %.8j %.8u %.2t %.11M %.6D %3C %6m %R
 +</code>
 +A full explanation of what formatting codes may be used can be found in ''man squeue'' under the ''-o <output_format> --format=<output-format>'' section.
 +
 +To see the time and date that your jobs are scheduled to end, and how much time is remaining:
 +<code>
 +squeue --format="%.14i %9P %15j %.8u %.2t %.20e %.12L" -u <username>
 +</code>
 +Sample output:
 +<code>
 +JOBID PARTITION NAME     USER     ST  END_TIME             TIME_LEFT
 +1234  med       myjob    username  R  2019-06-10T01:12:28  5-21:50:53
 +</code>
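The ''TIME_LEFT'' column is printed as ''days-hours:minutes:seconds''. A minimal sketch of converting such a value to seconds, for example to warn when a job is close to its limit. The function name ''time_left_to_seconds'' is hypothetical; shorter forms such as ''hours:minutes:seconds'' and ''minutes:seconds'' are assumed to be possible, and non-numeric values such as ''UNLIMITED'' are rejected rather than converted.

```shell
# Hypothetical helper: convert [days-]hours:minutes:seconds to seconds.
time_left_to_seconds() {
    printf '%s\n' "$1" | awk '
        # Reject anything that is not [days-]N[:N[:N]].
        !/^([0-9]+-)?[0-9]+(:[0-9]+)?(:[0-9]+)?$/ { exit 1 }
        {
            days = 0; rest = $0
            if (index(rest, "-")) { split(rest, d, "-"); days = d[1]; rest = d[2] }
            n = split(rest, f, ":")
            if (n == 3)      secs = f[1] * 3600 + f[2] * 60 + f[3]
            else if (n == 2) secs = f[1] * 60 + f[2]
            else             secs = f[1]
            print days * 86400 + secs
        }'
}

time_left_to_seconds "5-21:50:53"
```

For the sample row above, ''5-21:50:53'' corresponds to 510653 seconds.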
 +
 +For convenience, you can add an alias to your ~/.bash_aliases file with this command and it will be available next time you log in. Here's an example of a helpful alias:
 +<code>
 +alias jobtimes="squeue --format=\"%.14i %9P %15j %.8u %.2t %.20e %.12L\" -u"
 +</code>
 +Next time you log in, the command "jobtimes <yourusername>" will be available and will display the information as above.
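As an alternative to the alias, a shell function can default to the current user when no name is given. A minimal sketch, assuming it is placed in a file your login shell reads (such as ''~/.bashrc''); it reuses the name ''jobtimes'' and runs the same ''squeue'' command as the alias, so define one or the other, not both.

```shell
# Function variant of the jobtimes alias; defaults to the current user.
jobtimes() {
    squeue --format="%.14i %9P %15j %.8u %.2t %.20e %.12L" \
        -u "${1:-${USER:-$(id -un)}}"
}
```

With this in place, ''jobtimes'' alone would list your own jobs, and ''jobtimes <username>'' would list another user's.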
 +
 +See the squeue man page for other fields that squeue can output.
 +
 +The default squeue formatting is stored in the environment variable ''$SQUEUE_FORMAT'', which can be altered using the same flags as the --format option on the command line. PLEASE be cautious when altering environment variables. Use ''module show slurm'' to see the default setting for ''$SQUEUE_FORMAT''.
 +
  
  
 ====== SLURM Partitions ====== ====== SLURM Partitions ======
  
-Generally, there are three SLURM partitions (aka queues) on a cluster.+Generally, there are three SLURM partitions (aka queues) on a cluster. These partitions divide up pools of nodes based on job priority needs.
  
-|low| Low priority means that you might be killed at any time. Great for soaking up unused cycles with short jobs; a particularly good fit for large array jobs when individual jobs have short run times|+|low| Low priority means that you might be killed at any time. Great for soaking up unused cycles with short jobs; a particularly good fit for large array jobs when individual jobs have short run times.|
 |med|Medium priority means you might be suspended, but will resume when a high priority job finishes.  *NOT* recommended for MPI jobs.  Up to 100% of idle resources can be used.| |med|Medium priority means you might be suspended, but will resume when a high priority job finishes.  *NOT* recommended for MPI jobs.  Up to 100% of idle resources can be used.|
 |hi|Your job will kill/suspend lower priority jobs.  High priority means your jobs will keep the allocated hardware until it's done or there's a system or power failure.  Limited to the number of CPUs your group contributed.  Recommended for MPI jobs.| |hi|Your job will kill/suspend lower priority jobs.  High priority means your jobs will keep the allocated hardware until it's done or there's a system or power failure.  Limited to the number of CPUs your group contributed.  Recommended for MPI jobs.|
-|bigmem| Large memory nodes, jobs will keep the allocated hardware until it's done or there's a system or power failure| 
-|serial| Older serial nodes, jobs will keep the allocated hardware until it's done or there's a system or power failure| 
  
 +Other types of partitions may also exist.
 +
 +|bigmem, bm| Large memory nodes. Jobs will keep the allocated hardware until they are done or there's a system or power failure. (bigmem/bm partitions may be further divided into l/m/h partitions, following the same priority rules as low/med/high in the table above.)|
 +|gpu      | GPU nodes. Jobs will keep the allocated hardware until they are done or there's a system or power failure.|
 +|serial   | Older serial nodes. Jobs will keep the allocated hardware until they are done or there's a system or power failure.|
 +
 +Nodes can be in more than one partition, and partitions with similar names generally have identical or near-identical hardware: low/med/high are typically one set of hardware, low2/med2/high2 are another, and so on.
  
 +There may be other partitions based on the hardware available on a particular cluster; not all users have access to all partitions. Consult your account creation email, your PI, or the helpdesk if you are unsure which partitions you have access to or which to use.
 ======  SBATCH job with parallel programs running: ====== ======  SBATCH job with parallel programs running: ======
  
support/hpc/software/slurm.txt · Last modified: 2021/06/21 15:53 by omen