This shows you the differences between two versions of the page.
getting_started_guide [2016/02/16 10:57] Editor |
getting_started_guide [2016/04/20 14:31] Editor |
||
---|---|---|---|
Line 26: | Line 26: | ||
**Submitting a Job** | **Submitting a Job** | ||
- | The cluster uses [[http:// | + | The cluster uses [[http:// |
+ | is an open-source cluster resource management and job scheduling system that strives to be simple, scalable, portable, fault-tolerant, | ||
+ | |||
+ | As a cluster resource manager, Slurm provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform | ||
+ | work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work. | ||
+ | Key commands to view the status of the cluster are: | ||
**sinfo** reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options | **sinfo** reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options | ||
Line 36: | Line 41: | ||
**srun** is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, | **srun** is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, | ||
+ | |||
+ | **sbatch** is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks. | ||
**scancel** is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step. | **scancel** is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step. | ||
+ | |||
+ | More information can be obtained from [[https:// | ||
+ | |||
+ |
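The sbatch entry added in this revision says a job script typically contains one or more srun commands. As a rough illustration of what such a script might look like, here is a minimal Python sketch that builds the script text and hands it to sbatch only if that command is available. The helper names `render_batch_script` and `submit`, and the job name, task count, time limit, and command in the usage note, are all hypothetical placeholders; the partitions, accounts, and limits that this cluster actually requires are not stated in the page and are not assumed here.

```python
import shutil
import subprocess
import tempfile


def render_batch_script(job_name, ntasks, time_limit, command):
    """Return the text of a Slurm batch script that launches `command` via srun.

    All argument values are caller-supplied examples, not cluster defaults.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --time={time_limit}",
        "#SBATCH --output=%x-%j.out",  # %x expands to the job name, %j to the job ID
        "",
        f"srun {command}",  # srun launches the parallel tasks inside the allocation
        "",
    ]
    return "\n".join(lines)


def submit(script_text):
    """Submit the script with sbatch if it is on PATH.

    Returns sbatch's standard output, or None when sbatch is not installed
    (for example, when this is run on a machine outside the cluster).
    """
    if shutil.which("sbatch") is None:
        return None
    # Write the script to a temporary file so sbatch can read it by path.
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as handle:
        handle.write(script_text)
    result = subprocess.run(
        ["sbatch", handle.name], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

For example, `print(render_batch_script("hello", 4, "00:10:00", "hostname"))` shows the generated script without submitting anything, and passing the same text to `submit` would queue it on a login node where sbatch is installed. A queued job can then be removed with scancel, as described above.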