Library caret is a wonderful R package for tuning a variety of machine learning classification and regression algorithms. But it can take a long time to run, since model tuning usually involves running multiple bootstrapped replicates for each point in your tuning grid.
If you have a multi-core desktop machine, you can speed up your calls to the caret function train by using explicit parallelism.
There were just a couple hitches to get it flying on my 64bit quad core Optiplex 960 running linux kernel 2.6.28-15 x86_64, and R version 2.9.2 (2009-08-24). I present these hitches here in hopes of saving you time.
First, use apt-get to install some necessary dependencies:
sudo apt-get install lam4-dev lam-runtime libopenmpi1 openmpi-common
Then sudo into R, and use the install.packages() function to install snow and Rmpi. (Do not install Rmpi using apt or synaptic. In general, it is always a better idea to get R packages directly from CRAN using the built-in function.)
> install.packages("Rmpi")
> install.packages("snow")
Anyway, after you have all of the above install, just follow the framework shown in the manual for train:
mpiClacs <- function(X, FUN, ...) {
theDots <- list(...)
parLapply(theDots$cl, X, FUN)
}
cl <- makeCluster(4, "MPI") ##### I am using 4 b/c I have a quad core processor
## This is how we inform "train" that we will have multiple processors available
mpiControl <- trainControl(workers = 4,
number = 25,
computeFunction = mpiClacs,
computeArgs = list(cl = cl))
set.seed(1)
tune <- train(method="rf", x = t(exprs(es)), y = es$dx, ntree = 10000,
tuneGrid=data.frame(.mtry=c(3:7)*200),
trControl = mpiControl)
stopCluster(cl)
Hope this helps!
No comments:
Post a Comment