If you want to pass command-line arguments to an R script, put this line at the beginning of the script:
args <- commandArgs(trailingOnly = TRUE)
This returns a character vector with one element per argument.
For example, I have the following R script called add-two-nums.R.
args <- commandArgs(trailingOnly = TRUE)
ind <- as.integer(args)
ind[1] + ind[2]
On the command line, I can now type
> Rscript add-two-nums.R 1 2
[1] 3
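As written, the script quietly prints NA if an argument is missing or is not a number. A slightly more defensive version might look like the sketch below. The helper name `parse_two_ints` is made up for illustration, and the checks are only one reasonable choice.

```r
# Hypothetical helper: convert exactly two command-line arguments to integers.
parse_two_ints <- function(args) {
  if (length(args) != 2) {
    stop("expected exactly 2 arguments, got ", length(args))
  }
  # as.integer() returns NA (with a warning) for non-numeric strings
  ind <- suppressWarnings(as.integer(args))
  if (anyNA(ind)) {
    stop("could not convert arguments to integers: ", paste(args, collapse = " "))
  }
  ind
}

args <- commandArgs(trailingOnly = TRUE)
# Only do the sum when arguments were actually supplied (e.g. via Rscript)
if (length(args) > 0) {
  ind <- parse_two_ints(args)
  print(ind[1] + ind[2])
}
```

With this version, calling the script with one argument or with a non-numeric argument stops with an error message instead of printing NA.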
When do you want to pass arguments to an R script from the command line? For example, when the R script is a calculation routine and you want to run it with different parameters. This is particularly useful when you run the R script on a cluster with job arrays. In this case the argument you pass in is the index of the job array element. Example: I have a script called myscript.R.
args <- commandArgs(trailingOnly = TRUE)
parList <- readRDS('parlist.RDS')
ind <- as.integer(args)
output <- doStuff(parList[[ind]])
saveRDS(output, paste0('output_', ind, '.RDS'))
This script reads a list of parameter settings called parList, takes the index from the command line argument, then passes the corresponding element of parList to the function doStuff.
Now we can write a shell script with PBS to run multiple copies of myscript.R in parallel on a cluster. If parList has 10 elements then the shell script may look as follows (I’m only showing the relevant parts):
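The file parlist.RDS has to exist before any job runs. One way it could be built is sketched below, assuming each element is a named list of settings. The parameter names `n` and `sigma` and their values are invented for the example, and the file is written to a temporary directory so the sketch runs anywhere.

```r
# Hypothetical parameter grid: every combination of n and sigma (2 x 5 = 10 rows).
grid <- expand.grid(
  n = c(100, 1000),
  sigma = c(0.1, 0.5, 1, 2, 5),
  KEEP.OUT.ATTRS = FALSE
)

# One list element per row, so parList[[i]] holds the settings for sub-job i.
parList <- lapply(seq_len(nrow(grid)), function(i) as.list(grid[i, ]))

# Saved to a temporary directory here; the real script expects 'parlist.RDS'
# in its working directory.
path <- file.path(tempdir(), "parlist.RDS")
saveRDS(parList, path)
```

Each sub-job then only needs its index to look up its own settings, which is what keeps the R script identical across all jobs.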
### Set a job array with 10 sub-jobs indexed from 1 to 10
#PBS -J 1-10
### Run R with the job index
module load R
Rscript myscript.R ${PBS_ARRAY_INDEX}
Observe that we’ve used the environment variable PBS_ARRAY_INDEX as command line input to the R script.
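Since the index lives in an environment variable, it should also be possible to read it from inside R with `Sys.getenv` instead of passing it on the command line. Here is a rough sketch that tries the command line first and falls back to the environment; the helper name `get_job_index` is hypothetical.

```r
# Hypothetical helper: take the job index from the first command-line argument,
# falling back to the PBS_ARRAY_INDEX environment variable if none was given.
get_job_index <- function(args = commandArgs(trailingOnly = TRUE)) {
  raw <- if (length(args) >= 1) {
    args[1]
  } else {
    Sys.getenv("PBS_ARRAY_INDEX", unset = NA)
  }
  ind <- suppressWarnings(as.integer(raw))
  if (is.na(ind)) stop("no valid job index found")
  ind
}
```

Passing the index explicitly has the advantage that the same script can be tested on a laptop without PBS, for example with `Rscript myscript.R 3`.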
We can now run
qsub job.sh
to submit all 10 jobs at once. Before knowing this, I had to write 10 different copies of myscript.R and 10 different copies of job.sh, then submit 10 different jobs. This is obviously better.
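Once the sub-jobs have finished, the separate output files usually need to be gathered in one place. A minimal sketch, assuming the files follow the output_<index>.RDS naming used by the script; `collect_outputs` is a made-up helper, and skipping missing files with a warning is just one possible policy.

```r
# Hypothetical helper: read output_<index>.RDS for each index into one list.
collect_outputs <- function(indices, dir = ".") {
  files <- file.path(dir, paste0("output_", indices, ".RDS"))
  found <- file.exists(files)
  if (any(!found)) {
    # A missing file usually means that sub-job failed or is still running
    warning("missing output files: ",
            paste(basename(files[!found]), collapse = ", "))
  }
  results <- lapply(files[found], readRDS)
  names(results) <- indices[found]
  results
}
```

For the 10-element example, `results <- collect_outputs(1:10)` would give a list named by job index, containing whichever outputs exist.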
---
P.S. To see the status of individual jobs in a job array, use
qstat -t