parallel computing

If you want to pass command line arguments to an R script, put this line at the beginning of the script:

args <- commandArgs(trailingOnly = TRUE)

This returns a character vector with one element per argument.

For example, I have the following R script called $\verb|script-with-args.R|$ .

args <- commandArgs(trailingOnly = TRUE)
ind <- as.integer(args)
ind[1] + ind[2]

On the command line, I can now type

> Rscript script-with-args.R 1 2
[1] 3
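The $\verb|as.integer()|$ call matters because the arguments always arrive as strings, never as numbers. Here is a minimal sketch of a more defensive version. The helper name $\verb|parse_int_args|$ is my own invention for illustration, and the $\verb|args|$ vector is hard-coded so the snippet can be run interactively.

```r
# Hypothetical helper (not part of base R): convert command line
# arguments to integers and fail loudly on bad input.
parse_int_args <- function(args, n_expected) {
  if (length(args) != n_expected) {
    stop("expected ", n_expected, " argument(s), got ", length(args))
  }
  vals <- suppressWarnings(as.integer(args))
  if (anyNA(vals)) {
    stop("could not convert to integer: ", paste(args, collapse = " "))
  }
  vals
}

# In a real script this would be: args <- commandArgs(trailingOnly = TRUE)
args <- c("1", "2")   # what `Rscript script-with-args.R 1 2` supplies
is.character(args)    # TRUE: the arguments are strings
nums <- parse_int_args(args, n_expected = 2)
nums[1] + nums[2]     # 3
```

With this check, a mistyped call such as $\verb|Rscript script-with-args.R 1 two|$ stops with an error instead of quietly printing $\verb|NA|$.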

When would you want to pass arguments to an R script from the command line? For example, when the script is a calculation routine that you want to run with different parameters. This is particularly useful when you run the script on a cluster with job arrays: the argument you pass in is the index of the job array element. For example, I have a script called $\verb|myscript.R|$ .

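# The single argument is the job array index
args <- commandArgs(trailingOnly = TRUE)
# One list element of parameters per sub-job
parList <- readRDS('parlist.RDS')
ind <- as.integer(args[1])
# doStuff() stands in for your own calculation routine
output <- doStuff(parList[[ind]])
# Each sub-job writes its own output file
saveRDS(output, paste0('output_', ind, '.RDS'))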
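The script expects $\verb|parlist.RDS|$ to exist already, with one list element per sub-job. A minimal sketch of how such a file could be created is shown here. The parameter names $\verb|rate|$ and $\verb|seed|$ and their values are made up for illustration; use whatever your own routine needs.

```r
# Hypothetical parameters: 5 rates x 2 seeds = 10 combinations,
# one combination per sub-job.
rates <- c(0.1, 0.2, 0.3, 0.4, 0.5)
seeds <- 1:2
grid <- expand.grid(rate = rates, seed = seeds)

# Turn each row of the grid into one list element.
parList <- Map(function(rate, seed) list(rate = rate, seed = seed),
               grid$rate, grid$seed)

length(parList)   # 10, matching a job array indexed from 1 to 10
parList[[6]]      # rate 0.1 with seed 2

# Save the list where myscript.R will look for it.
saveRDS(parList, "parlist.RDS")
```

The length of the list should match the range of the job array, otherwise some sub-jobs will ask for an element that does not exist.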

The job script, $\verb|job.sh|$ , sets up the job array and passes each index to R:

### Set a job array with 10 sub-jobs indexed from 1 to 10
#PBS -J 1-10

### Run R with the job index
module load R
Rscript myscript.R ${PBS_ARRAY_INDEX}

Observe that we’ve used the environment variable $\verb|PBS_ARRAY_INDEX|$ as command line input to the R script.
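R can also read the environment variable directly with $\verb|Sys.getenv()|$ , which is handy if you want the same script to work both inside a job array and when testing by hand. This is a sketch of that idea, not what the script above does; the helper name $\verb|get_job_index|$ is hypothetical.

```r
# Hypothetical helper: use the command line argument if one was given,
# otherwise fall back to the PBS_ARRAY_INDEX environment variable.
get_job_index <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) >= 1) {
    return(as.integer(args[1]))
  }
  env <- Sys.getenv("PBS_ARRAY_INDEX", unset = NA)
  if (is.na(env)) {
    stop("no job index: pass one as an argument or run inside a job array")
  }
  as.integer(env)
}

# Testing by hand: pretend the script was called with the argument 3.
get_job_index("3")   # 3
```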

We can now run

qsub job.sh

to submit all 10 jobs at once. Before I knew this, I had to write 10 copies of $\verb|myscript.R|$ and 10 copies of $\verb|job.sh|$ , then submit 10 separate jobs. The job array is obviously better.
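Once the sub-jobs finish, each one has left its own $\verb|output_<index>.RDS|$ file. One way to gather them back into a single list is sketched here. The function name $\verb|collect_outputs|$ is hypothetical, and it assumes the files sit in the directory you pass in.

```r
# Hypothetical helper: read back whichever output files exist so far.
collect_outputs <- function(ids, dir = ".") {
  files <- file.path(dir, paste0("output_", ids, ".RDS"))
  done <- file.exists(files)
  if (!all(done)) {
    message("no output yet for sub-jobs: ",
            paste(ids[!done], collapse = ", "))
  }
  results <- lapply(files[done], readRDS)
  names(results) <- ids[done]   # name each result by its job index
  results
}

results <- collect_outputs(1:10)
```

The message about missing indices is a quick way to spot sub-jobs that failed or are still running.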

—-

P.S. To see the status of the individual sub-jobs in a job array, use

qstat -t


A water guy's blog

Journey of a PhD student

Passing command line argument to Rscript to run job arrays