BioCozy: February 2013

Some important details to successfully set up JBrowse on a Centos 5.8 machine.

Install and compile samtools from source. I built it with fPIC flags like this:

make CXXFLAGS=-fPIC CFLAGS=-fPIC CPPFLAGS=-fPIC

Be sure that the file libbam.a that you just created with the fPIC flag is (also) present in the lib directory. I had to rename the old one and copy the new one there.

# ls -al /usr/local/samtools/lib*
-rw-rw-r-- 1 odsolbe odsolbe 309986 Feb 27 09:51 /usr/local/samtools/libbam.a

/usr/local/samtools/lib:
total 1316
drwxr-xr-x 2 odsolbe odsolbe 4096 Feb 27 09:54 .
drwxr-xr-x 7 odsolbe odsolbe 4096 Feb 27 09:51 ..
-rw-rw-r-- 1 odsolbe odsolbe 309986 Feb 27 09:54 libbam.a
-rw-r--r-- 1 odsolbe odsolbe 1000346 Oct 24 2011 libbam.a_orig

Set two env variables: SAMTOOLS and PARL5LIB (see here)

export PERL5LIB=/path/to/...../jbrowse/extlib/lib/perl5 export SAMTOOLS=/usr/local/samtools

This is what worked for me. Your milage may vary.

Following up on yesterday's posting about the "right" way to run cufflinks, I am here recording my experiences running tophat with and without the --no-novel-juncs setting.

Same dataset as described in the previous posting. Still using the Illumina iGenomes Ensembl annotation files. The command I used was like this (note: the transcriptome index was pre-built during a previous run.)

tophat [--no-novel-juncs] -G genes.gtf --transcriptome-index=transcriptome -p 8 -o tophat_out file_1.fq.gz file_2.fq.gz

Mean run time for the three replicates (which have very nearly the same number of reads, about 19 million each):
Default (with novel junction finding): 160 minutes
With --no-novel-juncs option: 150 minutes

No big time savings from turning off novel junction finding. No difference in the mean correlation of genes.fpkm_tracking from the resultant cufflinks run.

Also, I tried --no-novel-juncs with 8, 16, and 24 processors, on a server with 32 cores (not hyperthreaded).

8 procs = 150 minutes
16 procs = 134 minutes
24 procs = 121 minutes

Tophat basically parallelizes the bowtie2 alignment stage, but there is a lot of time spent reading and writing files, which is single-threaded.

I have, in the past, had problems with tophat dying when running with too many processors (24, for instance) but the failure is not predictable:

[2013-02-05 08:08:09] Mapping right_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/4)
[2013-02-05 08:08:34] Mapping right_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/4)
[2013-02-05 08:09:01] Mapping right_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)
[2013-02-05 08:09:25] Joining segment hits
[2013-02-05 08:11:22] Reporting output tracks
[FAILED]
Error running /usr/local/tophat-2.0.7.Linux_x86_64/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir Sample_04_L005/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p24 --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations /data/genomes/iGenomes/Ensembl/transcriptome.gff --gtf-juncs Sample_04_L005/tmp/transcriptome.juncs --no-closure-search --no-coverage-search --no-microexon-search --sam-header Sample_04_L005/tmp/genome_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/usr/local/samtools/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 /data/genomes/iGenomes/Ensembl/genome.fa Sample_04_L005/junctions.bed Sample_04_L005/insertions.bed Sample_04_L005/deletions.bed Sample_04_L005/fusions.out Sample_04_L005/tmp/accepted_hits Sample_04_L005/tmp/left_kept_reads.m2g.bam,Sample_04_L005/tmp/left_kept_reads.m2g_um.mapped.bam,Sample_04_L005/tmp/left_kept_reads.m2g_um.candidates Sample_04_L005/tmp/left_kept_reads.bam Sample_04_L005/tmp/right_kept_reads.m2g.bam,Sample_04_L005/tmp/right_kept_reads.m2g_um.mapped.bam,Sample_04_L005/tmp/right_kept_reads.m2g_um.candidates Sample_04_L005/tmp/right_kept_reads.bam
Error: bam_merge failed to open BAM file Sample_04_L005/tmp/right_kept_reads.m2g_um.candidates15.bam

BioCozy

Wednesday, February 27, 2013

Setting up JBrowse 1.8.1 on a CentOS 5.8

Friday, February 1, 2013

What your father never told you about tophat