Monday, March 19, 2012

Of bash and streams and pipes - One Tee To Rule Them All

There is a great answer on Stack Overflow showing how to feed one output stream into multiple processes.  It is a bit of a hack of the excellent tool tee, which I've been using for some time.  Normally, tee takes the stdout of the previous command (its own stdin) and writes it to a file, while also passing it along to stdout.  The solution provided on S.O. shows how to hack this with the bash trick >(process), so as far as tee knows, it is writing to 2 or more files, but the "files" are actually background processes.
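
A minimal sketch of the idea, not the Stack Overflow answer itself. The function name fan_out and the two consumers (a line counter and an upper-caser) are placeholders picked for illustration. Because the >(process) consumers run asynchronously, the sketch collects their output through a pipe so that nothing is read before they finish.

```bash
#!/usr/bin/env bash
# Sketch: fan one stream out to two consumers with tee and >(process).
# fan_out is a hypothetical helper name; the consumers are arbitrary examples.
fan_out() {
    # bash replaces each >(...) with a path (such as /dev/fd/63) wired to the
    # stdin of that command, so tee thinks it is writing to two ordinary files.
    # tee's own copy on stdout is discarded.
    tee >(grep -c '' | sed 's/^/lines: /') \
        >(tr 'a-z' 'A-Z' | sed 's/^/upper: /') \
        > /dev/null
}

# The consumers inherit the pipe to sort as their stdout, so sort only
# finishes once both of them have exited.
printf 'alpha\nbeta\n' | fan_out | sort
```

The consumers write in no guaranteed order relative to each other, which is why the output is sorted here; in real use each consumer would more likely redirect to its own file.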

This syntax is bash-dependent, so it won't work in sh, and this will likely come up if you're using system() in, say, Perl, which hands the command to sh.  There is another excellent answer on Stack Overflow on how to deal with this.
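
One common workaround, which may or may not be the one in that answer, is to hand the command string to bash explicitly. In this sketch the variable name cmd and the sample command are made up for illustration.

```bash
# Sketch: run a bash-only command line from a shell that may be plain sh.
# The single-quoted string is parsed by the bash started below, not by the
# calling shell, so the >(...) syntax is accepted.
cmd='printf "alpha\nbeta\n" | tee >(tr a-z A-Z) > /dev/null'

# Piping through cat waits for the background tr to finish writing.
bash -c "$cmd" | cat
```

From Perl, the same string could be passed as system("bash", "-c", $cmd), which bypasses sh entirely.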

Side note:  Using the <(command) trick, my post from March 16 can now be written:

paste <(grep -v '^#' hsa.gff | cut -f 1,4,5,9) <(grep -v '^#' hsa.gff | cut -f 6,7) | sed 's/^/chr/' >mirbase18.bed

 

Friday, March 16, 2012

mirbase gff annotations in bed format

The mirbase.org folks in Manchester are doing a great job, I think, but they only release their annotations in GFF format.  I think the following will convert that into an acceptable BED-format file:
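## get mirbase18 annotation and convert to a bed-like format
wget ftp://mirbase.org/pub/mirbase/CURRENT/genomes/hsa.gff
## drop comment lines; keep seqid, start, end, attributes, then score, strand
grep -v '^#' hsa.gff | cut -f 1,4,5,9 >mirbase18.temp1
grep -v '^#' hsa.gff | cut -f 6,7 >mirbase18.temp2
## join the two halves side by side and prefix each line with "chr"
paste mirbase18.temp1 mirbase18.temp2 | sed 's/^/chr/' >mirbase18.bed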

Extra credit: Is there a clever shell trick for piping two separate streams into paste, thus avoiding the temp files?
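
Another way to avoid the temp files is to reorder the columns in a single awk pass. This is only a sketch: the function name gff_to_bed is made up, and it reproduces the same column order as the commands above. It keeps the GFF start coordinate unchanged, as the original commands do. Strict BED uses 0-based starts while GFF starts are 1-based, so $4 - 1 would be the adjustment if that matters downstream.

```bash
# Sketch: GFF (stdin) to a BED-like table (stdout) in one pass.
# gff_to_bed is a hypothetical helper name.
gff_to_bed() {
    # Skip comment lines, prefix the sequence name with "chr", and print
    # seqid, start, end, attributes, score, strand separated by tabs.
    awk -F'\t' -v OFS='\t' '!/^#/ { print "chr" $1, $4, $5, $9, $6, $7 }'
}

# Usage (assumes hsa.gff has been downloaded as above):
#   gff_to_bed < hsa.gff > mirbase18.bed
```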

Thursday, March 8, 2012

2012 NAR Database Summary (Category List)

Bioinformatic databases grow like kudzu, so I'll no doubt be referring to this directory in the coming year.

http://www.oxfordjournals.org/nar/database/c/

 

How to get the entire recursive directory structure of an FTP site using wget

How can I use wget to get a recursive directory listing of an entire FTP site?

Yes, wget --no-remove-listing ftp://myftpserver/ftpdirectory/ will get and retain a directory listing (the .listing file) for a single directory (which was discussed here).

But wget -r --no-remove-listing ftp://myftpserver/ftpdirectory/ will recurse and download an entire site.

So, if you want to get the recursive directory structure, but not download the entire site, try wget -r --no-remove-listing --spider ftp://myftpserver/ftpdirectory/

 

Thursday, February 16, 2012

rRNA genes in the human genome

Larry Moran has an excellent discussion of the organization of rRNA genes in the human genome.  It's an old post, but a good one.

If you're involved in RNA-Seq, you should read it.  Dr. Moran's discussion makes it clear why, if you want to accurately count or filter out rRNA reads (which can predominate in RNA-Seq datasets), it is necessary to map reads to a synthetic rRNA-ome before doing full-genome alignment.

 

 

Thursday, February 2, 2012

get rid of Dashboard in Mac OS X

Dashboard seems like a good idea, but I can't stand how it always seems to take 5 seconds to "wake up" all those widgets whenever I open it up. So, as a result, I never use it.  It just idles there in the background taking up a sliver of CPU cycles. Oh and occasionally I accidentally key into it. So, I am glad I finally did this:

defaults write com.apple.dashboard mcx-disabled -boolean YES

killall Dock


Tuesday, December 20, 2011

rRNA genes in human genome

GRCh37/hg19

http://sandwalk.blogspot.com/2008/01/human-ribosomal-rna-genes.html