Wednesday, May 1, 2013

Indispensable bookmarks for fixing computing annoyances

This list will no doubt grow...

Understanding GNU Emacs and Tabs
Executive summary: put this in your .emacs.el file:

(setq-default major-mode 'text-mode)
(define-key text-mode-map (kbd "TAB") 'self-insert-command); 


ggplot2: Changing the Default Order of Legend Labels and Stacking of Data
Executive summary:

ggplot(diamonds, aes(clarity, fill=cut, order= -as.numeric(cut))) 

actually, this is better, available in recent versions of ggplot2:
guides(color = guide_legend(reverse = TRUE))



Tuesday, April 9, 2013

Getting yum to work with RHEL / RHN behind a proxy


/etc/sysconfig/rhn/up2date

enableProxy=0
httpProxy=proxyserver.company.com:8080


~/.bash_profile

export http_proxy=http://proxyserver.company.com:8080
export HTTP_PROXY=http://proxyserver.company.com:8080


The thing is, there is no universal agreement among linux command line programs, some seem to use HTTP_PROXY and some use http_proxy.  So, yes, export http_proxy both as lower and UPPER case!

And complicating the picture is that RHN doesn't use either of these -- it pulls the settings out of /etc/sysconfig/rhn/up2date 


Wednesday, March 6, 2013

command line reverse complement sequence in linux shell

Sometimes it seems like BSD / GNU / Linux folks thought of everything.

echo AATGATACGGCGACCACCGG | rev | tr ATGC TACG

I'm a bowtie2 partisan too

I mean, I totally agree with the conclusions, but, um, is there a conflict of interest here?  Just sayin.



"... Our results clearly showed the significant reduction in memory footprint and runtime provided by FM-index based aligners at a precision and recall comparable to the best hash table based aligners. Furthermore, the recently developed Bowtie 2 alignment algorithm shows a remarkable tolerance to both sequencing errors and indels, thus, essentially making hash-based aligners obsolete."

Citation: Lindner R, Friedel CC (2012) A Comprehensive Evaluation of Alignment Algorithms in the Context of RNA-Seq. PLoS ONE 7(12): e52403. doi:10.1371/journal.pone.0052403
Editor: Steven L. Salzberg, Johns Hopkins University, United States of America
Received: August 30, 2012; Accepted: November 16, 2012; Published: December 26, 2012


Wednesday, February 27, 2013

Setting up JBrowse 1.8.1 on a CentOS 5.8

Some important details to successfully set up JBrowse on a Centos 5.8 machine.

Install and compile samtools from source.  I built it with fPIC flags like this:

make CXXFLAGS=-fPIC CFLAGS=-fPIC CPPFLAGS=-fPIC

Be sure that the file libbam.a that you just created with the fPIC flag is (also) present in the lib directory.  I had to rename the old one and copy the new one there.


# ls -al /usr/local/samtools/lib*
-rw-rw-r-- 1 odsolbe odsolbe 309986 Feb 27 09:51 /usr/local/samtools/libbam.a

/usr/local/samtools/lib:
total 1316
drwxr-xr-x 2 odsolbe odsolbe    4096 Feb 27 09:54 .
drwxr-xr-x 7 odsolbe odsolbe    4096 Feb 27 09:51 ..
-rw-rw-r-- 1 odsolbe odsolbe  309986 Feb 27 09:54 libbam.a
-rw-r--r-- 1 odsolbe odsolbe 1000346 Oct 24  2011 libbam.a_orig


Set two env variables: SAMTOOLS and PARL5LIB (see here)

export PERL5LIB=/path/to/...../jbrowse/extlib/lib/perl5 export SAMTOOLS=/usr/local/samtools

This is what worked for me.  Your milage may vary. 

Friday, February 1, 2013

What your father never told you about tophat

Following up on yesterday's posting about the "right" way to run cufflinks, I am here recording my experiences running tophat with and without the --no-novel-juncs setting.

Same dataset as described in the previous posting.  Still using the Illumina iGenomes Ensembl annotation files. The command I used was like this (note: the transcriptome index was pre-built during a previous run.)

tophat [--no-novel-juncs] -G genes.gtf --transcriptome-index=transcriptome -p 8 -o tophat_out file_1.fq.gz file_2.fq.gz

Mean run time for the three replicates (which have very nearly the same number of reads, about 19 million each):
Default (with novel junction finding): 160 minutes
With --no-novel-juncs option:  150 minutes

No big time savings from turning off novel junction finding.  No difference in the mean correlation of genes.fpkm_tracking from the resultant cufflinks run.

Also, I tried --no-novel-juncs with 8, 16, and 24 processors, on a server with 32 cores (not hyperthreaded).

8 procs = 150 minutes
16 procs = 134 minutes
24 procs = 121 minutes

Tophat basically parallelizes the bowtie2 alignment stage, but there is a lot of time spent reading and writing files, which is single-threaded.

I have, in the past, had problems with tophat dying when running with too many processors (24, for instance) but the failure is not predictable:


[2013-02-05 08:08:09] Mapping right_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/4)
[2013-02-05 08:08:34] Mapping right_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/4)
[2013-02-05 08:09:01] Mapping right_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)
[2013-02-05 08:09:25] Joining segment hits
[2013-02-05 08:11:22] Reporting output tracks
        [FAILED]
Error running /usr/local/tophat-2.0.7.Linux_x86_64/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir Sample_04_L005/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p24 --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations /data/genomes/iGenomes/Ensembl/transcriptome.gff --gtf-juncs Sample_04_L005/tmp/transcriptome.juncs --no-closure-search --no-coverage-search --no-microexon-search --sam-header Sample_04_L005/tmp/genome_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/usr/local/samtools/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 /data/genomes/iGenomes/Ensembl/genome.fa Sample_04_L005/junctions.bed Sample_04_L005/insertions.bed Sample_04_L005/deletions.bed Sample_04_L005/fusions.out Sample_04_L005/tmp/accepted_hits Sample_04_L005/tmp/left_kept_reads.m2g.bam,Sample_04_L005/tmp/left_kept_reads.m2g_um.mapped.bam,Sample_04_L005/tmp/left_kept_reads.m2g_um.candidates Sample_04_L005/tmp/left_kept_reads.bam Sample_04_L005/tmp/right_kept_reads.m2g.bam,Sample_04_L005/tmp/right_kept_reads.m2g_um.mapped.bam,Sample_04_L005/tmp/right_kept_reads.m2g_um.candidates Sample_04_L005/tmp/right_kept_reads.bam
Error: bam_merge failed to open BAM file Sample_04_L005/tmp/right_kept_reads.m2g_um.candidates15.bam