getting_started_guide

Differences

This shows you the differences between two versions of the page.

getting_started_guide [2016/02/16 11:03]
Editor
getting_started_guide [2017/10/19 10:53] (current)
Line 1: Line 1:
-**Getting Started**+=====Getting Started Guide=====
  
 This section shows how to log in to the system and submit a basic job on the cluster. If you do not have an account already, please apply for one by following the link [[applying_for_an_account|]].
Line 24: Line 24:
  
  
-**Submitting a Job**+**Submitting Jobs using TORQUE**
  
-The cluster uses [[http://www.schedmd.com|slurm]] for scheduling and resource management. Key commands to view the status of the cluster are:
+[[http://en.wikipedia.org/wiki/TORQUE|TORQUE]] is an open source batch queuing system that is very similar to [[http://en.wikipedia.org/wiki/Portable_Batch_System|PBS]]. Most PBS commands will work without any change. TORQUE is maintained by [[http://www.adaptivecomputing.com/products/open-source/torque/|Adaptive Computing]].
  
-**sinfo** reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options.
+To use the HPC compute nodes, you must first log in to a login node and submit a PBS job. The qsub command submits a job to the PBS queue and requests additional resources. The qstat command checks the status of a job already in the PBS queue. To simplify submitting a job, you can create a PBS script and use the qsub and qstat commands to interact with the PBS queue.
  
 +**Creating a PBS Script**
  
-**squeue** reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
+To set the parameters for your job, you can create a control file that contains the commands to be executed. Typically, this is in the form of a PBS script. This script is then submitted to PBS using the qsub command.
  
 +Here is a sample PBS file, named myjob.pbs, followed by an explanation of each line of the file.
  
-**srun** is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation.
+<code> 
 + #!/bin/bash 
 + #PBS -l nodes=1:ppn=2  
 + #PBS -l walltime=00:00:59  
 + cd /home/rcf-proj3/pv/test/  
 + source /usr/usc/sas/setup.sh  
 + sas my.sas  
 +</code>
  
 +The first line in the file identifies which shell will be used for the job.  In this example, bash is used.
 +The second line specifies the number of nodes and processors desired for this job. In this example, one node with two processors is being requested.
 +The third line in the PBS file states how much wall-clock time is being requested. In this example, 59 seconds of wall time have been requested.
 +The fourth line tells the HPC cluster to access the directory where the data is located for this job. In this example, the cluster is instructed to change the directory to the /home/rcf-proj3/pv/test/ directory.
 +The fifth line tells the cluster which program you would like to use to analyze your data. In this example, the cluster sources the environment for SAS.
 +The sixth line tells the cluster to run the program. In this example, it runs SAS with my.sas as the argument. The file is read from the current directory, /home/rcf-proj3/pv/test, which was set by the cd command on the fourth line.
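
The six lines described above can also be generated from parameters. The following is a minimal sketch, not part of the cluster's tooling: the function name make_pbs_script is hypothetical, and the SAS setup path and my.sas are simply copied from the sample script.

```shell
#!/bin/bash
# Hypothetical helper (not a TORQUE command): print a PBS script like the sample.
# Arguments: $1 = nodes, $2 = processors per node,
#            $3 = walltime (HH:MM:SS), $4 = working directory.
make_pbs_script() {
    cat <<EOF
#!/bin/bash
#PBS -l nodes=${1}:ppn=${2}
#PBS -l walltime=${3}
cd ${4}
source /usr/usc/sas/setup.sh
sas my.sas
EOF
}

# Print the sample script; redirect to a file (for example myjob.pbs) to save it.
make_pbs_script 1 2 00:00:59 /home/rcf-proj3/pv/test/
```

Generating the file this way keeps the resource requests in one place if you need several similar jobs.
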
 +To submit your job without requesting additional resources, issue the command
 +**qsub myjob.pbs**
  
-**sbatch** is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
+If you have set up myjob.pbs as in the example above and want to override the options in that file, you can use the -l parameter on the qsub command line to override the options specified in the file. 
 + 
 +Below are some examples of these overrides. 
 + 
 +**Requesting Additional Wall Time** 
 + 
 +If you need to request more or less wall time after you have already created your PBS script, you can do this by using the qsub command. 
 + 
 +In the example script above, we have requested 59 seconds of wall time. If you realize later that your job actually requires five minutes to complete, the command 
 + 
 +**qsub -l walltime=0:05:00 myjob.pbs** 
 +will ask PBS for a limit of five minutes of wall time. If your job does not finish within the specified time, it will be terminated. 
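
If you estimate run time in seconds, the value has to be rewritten in the hours:minutes:seconds form used above. The sketch below shows one way to do that; to_walltime is a hypothetical helper, and the qsub command is only printed, not executed.

```shell
#!/bin/bash
# Hypothetical helper: convert a number of seconds to HH:MM:SS.
to_walltime() {
    secs="$1"
    printf '%02d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# Five minutes is 300 seconds. Print the command instead of submitting it.
echo "qsub -l walltime=$(to_walltime 300) myjob.pbs"
```
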
 + 
 +**Requesting Nodes and Processors** 
 + 
 +You may also alter the number of nodes and processors requested for a job by using the qsub command. In the example script, we have requested one node with two processors, or one dual-processor node. 
 + 
 +If you later decide that you need four HPC nodes for your job but will use only one of the two processors on each node, then use the following command: 
 +**qsub -l walltime=0:05:00,nodes=4 myjob.pbs** 
 + 
 +If you want to use both processors on each HPC node, you should use the following command: 
 +**qsub -l walltime=0:05:00,nodes=4:ppn=2 myjob.pbs** 
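
The two commands above differ only in the ppn part of the resource list. As a rough sketch, the value passed to -l can be assembled from its parts; resource_list and total_procs are hypothetical helper names, and the sketch assumes that one processor per node is requested when ppn is omitted, as the first command implies.

```shell
#!/bin/bash
# Hypothetical helper: build the value for qsub's -l option.
# Arguments: $1 = walltime, $2 = node count, $3 = processors per node (optional).
resource_list() {
    if [ -n "${3:-}" ]; then
        printf 'walltime=%s,nodes=%s:ppn=%s\n' "$1" "$2" "$3"
    else
        printf 'walltime=%s,nodes=%s\n' "$1" "$2"
    fi
}

# Hypothetical helper: total processors requested (ppn assumed to be 1 if omitted).
total_procs() {
    echo $(( $1 * ${2:-1} ))
}

# Print, rather than run, the command for four nodes with both processors each.
echo "qsub -l $(resource_list 0:05:00 4 2) myjob.pbs"
echo "processors requested: $(total_procs 4 2)"
```
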
 + 
 +**Requesting a Specific Network** 
 + 
 +To run your job on the InfiniBand network, add the IB feature to your PBS script. 
 + 
 +**#PBS -l nodes=1:ppn=2:IB** 
 + 
 +MPI jobs using OpenMPI 1.6.4 or later can run on the InfiniBand network. 
 + 
 +NOTE: Only one network should be specified for each job. If no network is specified, the job will be scheduled to run on whichever network is available. 
 + 
 +**Checking Job Status** 
 + 
 +To check on the status of your job, you will use the qstat command. The command 
 +**qstat -u [your username]** 
 +will show you the current status of all your submitted jobs. 
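
The status check can be wrapped so that it fails with a clear message on machines where qstat is not installed. This is a sketch only; show_my_jobs is a hypothetical name, and the username defaults to the current login.

```shell
#!/bin/bash
# Hypothetical wrapper around "qstat -u <username>".
# Argument: $1 = username (optional; defaults to the current user).
show_my_jobs() {
    user="${1:-${USER:-$(id -un)}}"
    if command -v qstat >/dev/null 2>&1; then
        qstat -u "$user"
    else
        echo "qstat not found; run this on a cluster login node" >&2
        return 1
    fi
}

# Show the current user's jobs; ignore the failure when qstat is unavailable.
show_my_jobs || true
```
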
 + 
 +More information can be obtained from the [[http://docs.adaptivecomputing.com/torque/4-2-10/help.htm|Torque website]]
  
-**scancel** is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step. 
  
-More information can be obtained from [[https://computing.llnl.gov/linux/slurm/quickstart.html|Slurm Quick User Guide]] 
getting_started_guide.1455620583.txt.gz · Last modified: 2016/02/16 11:03 by Editor