getting_started_guide

Differences

This shows you the differences between two versions of the page.

getting_started_guide [2016/02/16 10:46]
Editor edit
getting_started_guide [2016/04/20 14:31]
Editor
Line 1: Line 1:
-Getting Started
+**Getting Started**
  
-This section show how to login to the the system and submit a basic job on the cluster
+This section shows how to log in to the system and submit a basic job on the cluster. If you do not have an account already, please apply for one by following the link [[applying_for_an_account|]].
  
-Logging In
+**Logging In**
  
-To connect to the cluster, ssh to ranger.zamren.zm using the username and password you registered and account application. Once you login, you will be asked to reset the password.
+To connect to the cluster, ssh to ranger.zamren.zm using the username and password you registered during account application. Once you log in, you will be asked to reset your password.
  
  
-Environmental Variables
+**Environmental Variables**
 To see the variables in your environment, execute the command: env
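
The command above prints every variable at once, which can be a long list. A minimal sketch of a few common ways to narrow it down follows; the helper name `show_vars_with_prefix` is hypothetical and introduced only for this illustration.

```shell
# Hypothetical helper: print only the variables whose names start with
# the given prefix. The prefix is used as a grep pattern, so keep it to
# plain letters, digits and underscores.
show_vars_with_prefix() {
    # grep exits non-zero when nothing matches; "|| true" keeps that
    # from being treated as an error.
    env | grep "^$1" || true
}

# Full listing, sorted by name for easier reading.
env | sort

# Value of a single variable.
printenv PATH

# Variables set by the scheduler (normally only present inside a job).
show_vars_with_prefix SLURM_
```

Outside a running job the last command usually prints nothing, since the `SLURM_` variables are normally set only within a job's environment.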
  
-Modules
+**Modules**
  
 The Environment Modules package provides for the dynamic modification of a user’s environment via modulefiles. To see available modules, type the command
Line 23: Line 23:
 FFTW/2.1.5      gcc/4.4.7       gotoblas2/    gsl/1.9         mpich/3.2       openblas/0.2.15
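
As a rough sketch of how one of the modules listed above might be loaded, the snippet below uses the standard `module avail`, `module load` and `module list` subcommands of Environment Modules. The helper name `load_module_if_available` is hypothetical, and the guard is there only because the `module` command is often undefined outside a login shell on the cluster.

```shell
# Hypothetical helper: load a module only when the "module" command exists.
load_module_if_available() {
    if command -v module >/dev/null 2>&1; then
        module avail          # list every module installed on the system
        module load "$1"      # add the requested module to the environment
        module list           # confirm which modules are now loaded
    else
        echo "module command not found; skipping $1"
    fi
}

# gcc/4.4.7 is taken from the listing above.
load_module_if_available gcc/4.4.7
```

`module unload gcc/4.4.7` reverses the change, and `module purge` removes all loaded modules.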
  
 +
 +**Submitting a Job**
 +
 +The cluster uses [[http://www.schedmd.com|SLURM]] for scheduling and resource management. Slurm is an open-source cluster resource management and job scheduling system that strives to be simple, scalable, portable, fault-tolerant, and interconnect agnostic. Slurm currently has been tested only under Linux.
 +
 +As a cluster resource manager, Slurm provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work.
 +
 +The key commands for viewing the status of the cluster and managing jobs are:
 +
 +**sinfo** reports the state of partitions and nodes managed by Slurm. It has a wide variety of filtering, sorting, and formatting options.
 +
 +
 +**squeue** reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
 +
 +
 +**srun** is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation.
 +
 +
 +**sbatch** is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
 +
 +**scancel** is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
 +
 +More information can be obtained from the [[https://computing.llnl.gov/linux/slurm/quickstart.html|Slurm Quick User Guide]].
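
The commands described above can be combined into a basic submission workflow. The following is a minimal sketch, not this cluster's recommended settings: the file name `hello_job.sh`, the helper name `write_hello_job`, and every `#SBATCH` value (job name, node and task counts, time limit, output file) are placeholders chosen for illustration.

```shell
# Hypothetical helper: write a minimal batch script to the given path.
write_hello_job() {
    cat > "$1" <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=hello_%j.out

# Job step: print the name of the node the task runs on.
srun hostname
EOF
}

write_hello_job hello_job.sh

# Submit only where Slurm is installed, so the sketch is safe to run anywhere.
if command -v sbatch >/dev/null 2>&1; then
    sinfo                     # state of partitions and nodes
    sbatch hello_job.sh       # prints "Submitted batch job <jobid>"
    squeue -u "$(id -un)"     # your own pending and running jobs
    # scancel <jobid>         # cancel the job, using the id printed by sbatch
else
    echo "sbatch not found; wrote hello_job.sh without submitting"
fi
```

In the `--output` file name, `%j` is replaced by the job id, so each submission writes to its own file.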
  
  
getting_started_guide.txt · Last modified: 2017/10/19 10:53 (external edit)