The caret library is a wonderful R package for tuning a variety of machine learning classification and regression algorithms. But it can take a long time to run, since model tuning usually involves fitting multiple bootstrapped replicates for each point in your tuning grid.
If you have a multi-core desktop machine, you can speed up your calls to the caret function train by using explicit parallelism.
There were just a couple of hitches in getting it flying on my 64-bit quad-core Optiplex 960 running Linux kernel 2.6.28-15 x86_64 and R version 2.9.2 (2009-08-24). I present these hitches here in hopes of saving you time.
First, use apt-get to install some necessary dependencies:
sudo apt-get install lam4-dev lam-runtime libopenmpi1 openmpi-common
Then start R under sudo, and use the install.packages() function to install snow and Rmpi. (Do not install Rmpi using apt or synaptic. In general, it is a better idea to get R packages directly from CRAN using the built-in function.)
> install.packages("Rmpi")
> install.packages("snow")
Anyway, after you have all of the above installed, just follow the framework shown in the manual for train:
library(caret)   ## provides train() and trainControl()
library(snow)    ## provides makeCluster(), parLapply() and stopCluster()

## Wrapper that train() calls to farm its work out to the cluster
mpiClacs <- function(X, FUN, ...) {
  theDots <- list(...)
  parLapply(theDots$cl, X, FUN)
}

cl <- makeCluster(4, "MPI")  ## I am using 4 because I have a quad-core processor

## This is how we inform "train" that we will have multiple processors available
mpiControl <- trainControl(workers = 4,
                           number = 25,
                           computeFunction = mpiClacs,
                           computeArgs = list(cl = cl))

set.seed(1)
## es is assumed to be a Biobase ExpressionSet that is already loaded
tune <- train(method = "rf", x = t(exprs(es)), y = es$dx, ntree = 10000,
              tuneGrid = data.frame(.mtry = c(3:7) * 200),
              trControl = mpiControl)

stopCluster(cl)
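If you want to check the wrapper idea on its own before involving MPI or caret, here is a minimal sketch. It assumes the base parallel package (which provides makeCluster and parLapply for plain socket clusters) instead of snow plus Rmpi, and the names clusterCalcs and squares are made up for the illustration. It is not how train calls the function internally, only the same "pull the cluster out of the dots" pattern.

```r
library(parallel)

## Same shape as the wrapper above: take the cluster from the extra
## arguments and hand the work to parLapply.
## (clusterCalcs is a hypothetical name used only for this sketch.)
clusterCalcs <- function(X, FUN, ...) {
  theDots <- list(...)
  parLapply(theDots$cl, X, FUN)
}

## Socket cluster on the local machine; 2 workers is an arbitrary choice
cl <- makeCluster(2)

## Each element of 1:4 is squared on one of the workers
squares <- clusterCalcs(1:4, function(i) i^2, cl = cl)

stopCluster(cl)
```

If this runs, the cluster plumbing is fine and any remaining trouble is on the MPI side.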
Hope this helps!
Monday, October 5, 2009
Thursday, October 1, 2009
fork processes in R using package multicore
I just found this wicked cool way to fork processes (for multi-core CPUs) in R. Works just like you'd expect!
library(multicore)
## each call forks a child process and returns immediately
job1 <- mcparallel(bigMachineLearningFunction1())
job2 <- mcparallel(bigMachineLearningFunction2())
job3 <- mcparallel(bigMachineLearningFunction3())
job4 <- mcparallel(bigMachineLearningFunction4())
###### time goes by .....
## collect() waits for a job to finish and returns its result
results1 <- collect(job1)
results2 <- collect(job2)
results3 <- collect(job3)
results4 <- collect(job4)
Note to Emacs ESS users: just be sure all your libraries are already loaded, and set silent=TRUE in the mcparallel call. That will keep your ESS buffer from going read-only.
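In more recent versions of R the same fork-based functions live in the base parallel package, as mcparallel() and mccollect(). Here is a small sketch of the pattern with silent = TRUE set as in the ESS note. The function slowSquare is a made-up stand-in for a long-running job, and because this relies on forking it will not work on Windows.

```r
library(parallel)

## Hypothetical stand-in for a long-running model fit
slowSquare <- function(x) {
  Sys.sleep(0.1)
  x^2
}

## Fork two child processes; silent = TRUE suppresses their stdout
job1 <- mcparallel(slowSquare(2), silent = TRUE)
job2 <- mcparallel(slowSquare(3), silent = TRUE)

## Wait for both jobs; results come back in the order the jobs were given
results <- mccollect(list(job1, job2))
values <- unlist(results, use.names = FALSE)
```

Collecting both jobs in one mccollect() call is just a convenience. Collecting them one at a time, as in the code above, works too.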