Passing command-line arguments to Rscript to run job arrays

To pass command-line arguments to an R script, put this line at the beginning of the script:

args <- commandArgs(trailingOnly = TRUE)

This returns a character vector with one element per argument. The arguments always arrive as strings, so convert them before doing arithmetic.

For example, I have the following R script called \verb|script-with-args.R|.

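args <- commandArgs(trailingOnly = TRUE)
# Arguments arrive as strings, so convert them to integers
ind <- as.integer(args)
# Print the sum of the first two arguments
ind[1] + ind[2]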

On the command line, I can now type

> Rscript script-with-args.R 1 2
[1] 3
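
The script above assumes exactly two integer arguments. Here is a minimal sketch of a more defensive version, wrapped in a hypothetical function `add_two()` so it can be tried in an interactive session without `Rscript`. The function name and the error messages are assumptions for illustration.

```r
# Hypothetical helper: validate the raw arguments, then add them
add_two <- function(args) {
  if (length(args) != 2) {
    stop("expected 2 arguments, got ", length(args))
  }
  # as.integer() gives NA (with a warning) for non-numeric strings
  ind <- suppressWarnings(as.integer(args))
  if (anyNA(ind)) {
    stop("arguments must be integers")
  }
  ind[1] + ind[2]
}

add_two(c("1", "2"))  # 3
```

In a script, this could be called as `add_two(commandArgs(trailingOnly = TRUE))`.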

When would you want to pass arguments to an R script from the command line? One case is when the R script is a calculation routine that you want to run with different parameters. This is particularly useful when you run the script on a cluster with job arrays: the argument you pass in is then the index of the job array element. For example, I have a script called \verb|myscript.R|.

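# Read the command line arguments and the list of parameter settings
args <- commandArgs(trailingOnly = TRUE)
parList <- readRDS('parlist.RDS')
# The first argument is the job array index
ind <- as.integer(args[1])
output <- doStuff(parList[[ind]])
# Save the result under a file name that includes the index
saveRDS(output, paste0('output_', ind, '.RDS'))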

This script reads a list of parameter settings called \verb|parList|, takes the index from the command line argument, passes the corresponding element of \verb|parList| to the function \verb|doStuff()|, and saves the result to a file named after the index.
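
The script assumes that `parlist.RDS` already exists and that `doStuff()` is defined. As a minimal sketch of how the pieces fit together, the following builds a parameter list, saves it, and runs one index the way a single sub-job would. The parameter values, the body of `doStuff()`, and the use of a temporary directory are all assumptions for illustration, not the original setup.

```r
# Work in a temporary directory so nothing is written to the project folder
out_dir <- tempdir()
par_file <- file.path(out_dir, "parlist.RDS")

# Hypothetical parameter settings: one list element per job array index
parList <- lapply(1:10, function(i) list(n = 100 * i, seed = i))
saveRDS(parList, par_file)

# Hypothetical stand-in for the real calculation routine
doStuff <- function(par) {
  set.seed(par$seed)
  mean(rnorm(par$n))
}

# What one sub-job does, with the index hard-coded
# instead of being read from commandArgs()
ind <- 3L
pars <- readRDS(par_file)
output <- doStuff(pars[[ind]])
out_file <- file.path(out_dir, paste0("output_", ind, ".RDS"))
saveRDS(output, out_file)
```

Running one index by hand like this is a cheap way to catch mistakes before submitting the whole array.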

Now we can write a shell script \verb|job.sh| with PBS to run multiple copies of \verb|myscript.R| in parallel on a cluster. If \verb|parList| has 10 elements then the shell script may look as follows (I’m only showing the relevant parts):

### Set a job array with 10 sub-jobs indexed from 1 to 10
#PBS -J 1-10

### Run R with the job index
module load R
Rscript myscript.R ${PBS_ARRAY_INDEX}

Note that the environment variable \verb|PBS_ARRAY_INDEX|, which PBS sets to the index of each sub-job, is passed as the command line argument to the R script.

We can now run

qsub job.sh

to submit all 10 jobs at once. Before knowing this, I had to write 10 different copies of \verb|myscript.R| and 10 different copies of \verb|job.sh|, then submit 10 different jobs. This is obviously better.
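
Once all sub-jobs have finished, the per-index files can be read back into one list. Here is a rough sketch, assuming the files follow the `output_<index>.RDS` naming used in `myscript.R`. The dummy files written at the top only stand in for real job output, and the temporary directory and variable names are assumptions.

```r
out_dir <- file.path(tempdir(), "job_outputs")
dir.create(out_dir, showWarnings = FALSE)

# Dummy results standing in for what the 10 sub-jobs would have saved
for (i in 1:10) {
  saveRDS(i^2, file.path(out_dir, paste0("output_", i, ".RDS")))
}

# Build the expected file names in index order
inds <- 1:10
files <- file.path(out_dir, paste0("output_", inds, ".RDS"))

# Indices whose sub-job produced no file (e.g. because it failed)
missing_inds <- inds[!file.exists(files)]

# Read every result that does exist
results <- lapply(files[file.exists(files)], readRDS)
```

Checking `missing_inds` first shows which indices need to be resubmitted.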

---

P.S. To see the status of the individual sub-jobs in a job array, use

qstat -t