Thursday, March 19, 2009

Pulling data from GEO into R Bioconductor expression sets (eSet)

I just discovered how easy Bioconductor makes it to import data from the Gene Expression Omnibus (GEO).

First, be sure you have the GEOquery package from Bioconductor. You will probably need the libcurl development headers first; install them with apt-get or whatever package manager your distribution uses. On my Ubuntu 8.10 system, the requisite package is called libcurl4-gnutls-dev. Once that is in place, you should be able to install GEOquery like any other Bioconductor package, i.e.:

source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")
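On current Bioconductor releases the biocLite script has been retired in favour of the BiocManager package from CRAN. A minimal sketch of the equivalent install, assuming a reasonably recent R and a working CRAN mirror, might look like this:

```r
# Install BiocManager from CRAN only if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install GEOquery (and its dependencies) from Bioconductor.
BiocManager::install("GEOquery")
```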

Then, pulling a GEO series (GSE) into an eSet object is as simple as:

gseObj <- getGEO("GSE10667")
eSet <- gseObj[[1]]

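Note the [[1]]: it is necessary because getGEO returns the GSE as a list of eSets rather than a single object (as far as I can tell, one per platform in the series), so you have to pick out the element you want.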
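To see what actually came back before picking an element, it can help to summarise the list. The helper below is only a sketch: summarize_esets is a name I made up for illustration, and it assumes Biobase (which GEOquery depends on) is installed so that exprs() is available.

```r
# Hypothetical helper: one row per ExpressionSet in the list returned by getGEO().
summarize_esets <- function(esets) {
  # Fall back to positional labels if the list has no names.
  labels <- if (is.null(names(esets))) {
    as.character(seq_along(esets))
  } else {
    names(esets)
  }
  data.frame(
    name     = labels,
    # exprs() returns the features-by-samples expression matrix.
    features = vapply(esets, function(e) nrow(Biobase::exprs(e)), integer(1)),
    samples  = vapply(esets, function(e) ncol(Biobase::exprs(e)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}
```

With the gseObj from above, summarize_esets(gseObj) should give one row per eSet, and Biobase::pData(gseObj[[1]]) should show the sample annotations for the first one.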

See the GEOquery vignette for more details, and for conversions from GDS format.
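For the GDS case, GEOquery provides GDS2eSet to do the conversion. Here is a rough sketch, assuming GEOquery is installed; gds_to_eset and its log2_transform argument are names of my own, not part of the package.

```r
# Hypothetical wrapper: fetch a GEO DataSet (GDS) and convert it to an ExpressionSet.
gds_to_eset <- function(accession, log2_transform = FALSE) {
  # Fail early, before any download, if this is not a GDS accession.
  stopifnot(is.character(accession), length(accession) == 1,
            grepl("^GDS[0-9]+$", accession))
  gds <- GEOquery::getGEO(accession)  # downloads and parses the GDS record
  # do.log2 = TRUE asks GDS2eSet to log2-transform the expression values.
  GEOquery::GDS2eSet(gds, do.log2 = log2_transform)
}
```

A call such as gds_to_eset("GDS507") (an accession chosen purely as an illustration) should then return a single eSet rather than a list, so no [[1]] is needed.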
