**Getting Started**
  
This section shows how to log in to the system and submit a basic job on the cluster. If you do not have an account already, please apply for one by following the link [[applying_for_an_account|]].
  
**Logging In**
  
To connect to the cluster, ssh to ranger.zamren.zm using the username and password you registered during account application. Once you log in, you will be asked to reset the password.
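For example, from a terminal on your own machine (here "username" is only a placeholder for the account name you registered):

<code bash>
# Connect to the cluster login node; replace "username" with your own account name
ssh username@ranger.zamren.zm
# On first login you will be prompted to set a new password
</code>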
  
  
**Environment Variables**
To see the variables in your environment, execute the command: env
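For example, to list everything in your environment or to inspect a single variable such as PATH:

<code bash>
# Print every variable defined in the current shell environment
env

# Print a single variable, for example your executable search path
echo $PATH
</code>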
  
**Modules**
  
The Environment Modules package provides for the dynamic modification of a user’s environment via modulefiles. To see available modules, type the command:
  FFTW/2.1.5      gcc/4.4.7       gotoblas2/      gsl/1.9         mpich/3.2       openblas/0.2.15
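A short sketch of typical usage, assuming the standard module command of the Environment Modules package is available on the cluster (gcc/4.4.7 is taken from the listing above; substitute whichever module you need):

<code bash>
# List all modules available on the cluster
module avail

# Load a module into your environment, e.g. the GCC compiler from the list above
module load gcc/4.4.7

# Show which modules are currently loaded, and unload one again
module list
module unload gcc/4.4.7
</code>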
  
**Submitting a Job**

The cluster uses [[http://www.schedmd.com|SLURM]] for scheduling and resource management. Slurm is an open-source cluster resource management and job scheduling system that strives to be simple, scalable, portable, fault-tolerant, and interconnect-agnostic. Slurm has currently been tested only under Linux.

As a cluster resource manager, Slurm provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work.

The key commands for viewing the status of the cluster and managing jobs are:

**sinfo** reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options.
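For example (the partition and node names reported will depend on how this cluster is configured):

<code bash>
# Summary of partitions and node states
sinfo

# One line per node, long format, with CPU and memory details
sinfo -N -l
</code>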

**squeue** reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
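For example (again, "username" is only a placeholder for your own account name):

<code bash>
# All running and pending jobs on the cluster
squeue

# Only your own jobs
squeue -u username
</code>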

**srun** is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation.
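A minimal interactive example; the node and task counts are arbitrary, and hostname is used only to show where each task runs:

<code bash>
# Run 4 tasks across 2 nodes and print the name of the node each task runs on
srun --nodes=2 --ntasks=4 hostname
</code>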

**sbatch** is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
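A minimal sketch of such a script, saved here under the example name test_job.sh. The job name, partition name (defq), task count, and time limit below are placeholders; adjust them to match the partitions reported by sinfo:

<code bash>
#!/bin/bash
#SBATCH --job-name=test_job        # name shown by squeue
#SBATCH --partition=defq           # placeholder partition name; check sinfo for real ones
#SBATCH --nodes=1                  # number of nodes requested
#SBATCH --ntasks=4                 # number of tasks to run
#SBATCH --time=00:10:00            # wall-clock limit (hh:mm:ss)
#SBATCH --output=test_job.%j.out   # %j is replaced by the job ID

# Launch the tasks on the allocated nodes
srun hostname
</code>

Submit the script and note the job ID that sbatch prints:

<code bash>
sbatch test_job.sh
</code>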

**scancel** is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
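For example (1234 stands for the job ID printed by sbatch or shown by squeue):

<code bash>
# Cancel one specific job
scancel 1234

# Cancel all of your own jobs; replace "username" with your account name
scancel -u username
</code>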

More information can be obtained from the [[https://computing.llnl.gov/linux/slurm/quickstart.html|Slurm Quick Start User Guide]].
  
  