
Getting Started Guide

This section shows how to log in to the system and submit a basic job on the cluster. If you do not have an account already, please apply for one by following the link applying_for_an_account.

Logging In

To connect to the cluster, ssh to ranger.zamren.zm using the username and password you registered during account application. Once you log in, you will be asked to reset your password.
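For example, from a terminal on your local machine (replace yourusername, a placeholder, with the username you registered):

ssh yourusername@ranger.zamren.zm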

Environment Variables

To see the variables in your environment, execute the command:

env
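For example, to print a single variable such as your home directory or search path (standard shell variables, shown purely as an illustration):

echo $HOME
echo $PATH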

Modules

The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles. To see the available modules, type the command

module avail

 -------------------- /path/toModules --------------------

 cmake/3.5.0    FFTW/3.3.4    gmp/4.3.2      gromacs/5.1.0   mpich/3.1    mvapich/2.1        openmpi/1.10.1
 FFTW/2.1.5     gcc/4.4.7     gotoblas2/2    gsl/1.9         mpich/3.2    openblas/0.2.15
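To use one of these packages, load its module and then confirm what is loaded; for example, using the OpenMPI build shown above (module load and module list are standard Environment Modules commands):

module load openmpi/1.10.1
module list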

Submitting Jobs using TORQUE

TORQUE is an open source batch queuing system that is very similar to PBS. Most PBS commands will work without any change. TORQUE is maintained by Adaptive Computing.

In order to use the HPC compute nodes, you must first log in to the login nodes and submit a PBS job. The qsub command is used to submit a job to the PBS queue and to request additional resources. The qstat command is used to check on the status of a job already in the PBS queue. To simplify submitting a job, you can create a PBS script and use the qsub and qstat commands to interact with the PBS queue.

Creating a PBS Script

To set the parameters for your job, you can create a control file that contains the commands to be executed. Typically, this is in the form of a PBS script. This script is then submitted to PBS using the qsub command.

Here is a sample PBS file, named myjob.pbs, followed by an explanation of each line of the file.

 #!/bin/bash
 #PBS -l nodes=1:ppn=2 
 #PBS -l walltime=00:00:59 
 cd /home/rcf-proj3/pv/test/ 
 source /usr/usc/sas/setup.sh 
 sas my.sas 

The first line of the file identifies which shell will be used for the job; in this example, bash. The second line specifies the number of nodes and processors desired for the job; here, one node with two processors is requested. The third line states how much wall-clock time is requested, in this case 59 seconds. The fourth line tells the cluster to change to the directory where the data for this job is located, /home/rcf-proj3/pv/test/. The fifth line sets up the environment for the program you want to use; in this example, it sources the setup script for SAS. The sixth line runs the program, in this case SAS with my.sas as its argument, from the directory set in the fourth line. To submit your job without requesting additional resources, issue the command

qsub myjob.pbs
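When the job is accepted, qsub prints the identifier assigned to it, typically a number followed by the server name; the exact form depends on the site's configuration, so the value below is only illustrative. That identifier can then be passed to qstat to check on that specific job:

qstat 1234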

If you have myjob.pbs set up as explained in the example above and you want to override the default options in the file, you can use the -l parameter on the qsub command line to override the options specified in the file.

Below are some examples of these overrides.

Requesting Additional Wall Time

If you need to request more or less wall time after you have already created your PBS script, you can do this by using the qsub command.

In the example script above, we have requested 59 seconds of wall time. If you realize later that your job actually requires five minutes to complete, the command

qsub -l walltime=0:05:00 myjob.pbs

will ask PBS for a limit of five minutes of wall time. If your job does not finish within the specified time, it will be terminated.

Requesting Nodes and Processors

You may also alter the number of nodes and processors requested for a job by using the qsub command. In the example script, we have requested one node with two processors, or one dual-processor node.

If you later decide that you need four HPC nodes for your job but you are going to use only one of the two processors on each node, then use the following command: qsub -l walltime=0:05:00,nodes=4 myjob.pbs

If you want to use both processors on each HPC node, you should use the following command: qsub -l walltime=0:05:00,nodes=4:ppn=2 myjob.pbs

Requesting a Specific Network

To run your job on the InfiniBand network, add the IB feature to your PBS script.

#PBS -l nodes=1:ppn=2:IB

MPI jobs using OpenMPI 1.6.4 or later can run on the InfiniBand network.
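As a minimal sketch, a job script for an MPI program on the InfiniBand network might look like the following. It loads the OpenMPI module listed earlier; $PBS_O_WORKDIR is the standard PBS variable for the directory the job was submitted from, and my_mpi_program is a placeholder for your own executable:

 #!/bin/bash
 #PBS -l nodes=4:ppn=2:IB
 #PBS -l walltime=0:05:00
 cd $PBS_O_WORKDIR
 module load openmpi/1.10.1
 mpirun -np 8 ./my_mpi_program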

NOTE: Only one network should be specified for each job. If no network is specified, the job will be scheduled to run on whichever network is available.

Checking Job Status

To check on the status of your job, you will use the qstat command. The command qstat -u [your username] will show you the current status of all your submitted jobs.
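For example, with jdoe as a placeholder username:

qstat -u jdoe

Running qstat with no arguments lists all jobs currently in the queue.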

More information can be obtained from the TORQUE website.
