Category: bioinformatics

JBrowse genome browser

JBrowse is a genome browser written in JavaScript by GMOD. It’s ultra-light, fast and a lot easier to install than the UCSC genome browser.
However, the documentation is still lacking and configuring tracks is tricky.

Here are some notes that may help others configuring the browser, assuming you were able to get it running and added some data.

As JBrowse gets updated, the issues described here that are bugs will probably be fixed. I wrote these notes using JBrowse-1.11.6 on May 12, 2015.

1. Collapsing features in a track so they all appear in the same line:

  1. After generating your track, edit the file data/trackList.json, replacing “FeatureTrack” with “JBrowse/View/Track/CanvasFeatures”. Only CanvasFeatures allows you to collapse tracks.
  2. Add "displayMode" : "collapsed" to the track.
    Your track should look like this:

    {
       "key" : "left ventricle H3K27ac",
       "type" : "JBrowse/View/Track/CanvasFeatures",
       "trackType" : null,
       "compress" : 0,
       "urlTemplate" : "tracks/lvH3K27ac/{refseq}/trackData.json",
       "style" : {
          "color" : "blue",
          "className" : "feature",
          "strandArrow" : false
       },
       "storeClass" : "JBrowse/Store/SeqFeature/NCList",
       "label" : "lvH3K27ac",
       "displayMode" : "collapsed"
    },
    

2. Putting your track under a category:

You can edit data/trackList.json and add the category key by hand, but when creating multiple tracks it is easier to set it on the command line:
bin/flatfile-to-json.pl --tracklabel my_track --key "My track" --bed myfile.bed --config '{"category":"My category"}'

Notice that you must wrap the JSON in single quotes and use double quotes inside the JSON object.
You can use any valid JSON in --config.
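
When many tracks are created from a script, the quoting can be left to a library instead of being typed by hand. The following is only a minimal Python sketch under that assumption: build_command is a made-up helper name, and the options are the same ones used in the command above.

```python
import json
import shlex

def build_command(label, key, bed_file, config):
    """Return a shell-safe flatfile-to-json.pl command line (hypothetical helper)."""
    args = [
        "bin/flatfile-to-json.pl",
        "--tracklabel", label,
        "--key", key,
        "--bed", bed_file,
        # json.dumps always emits double quotes inside the JSON text
        "--config", json.dumps(config),
    ]
    # shlex.quote adds the outer single quotes where the shell needs them
    return " ".join(shlex.quote(arg) for arg in args)

print(build_command("my_track", "My track", "myfile.bed",
                    {"category": "My category"}))
```

The printed line has the same shape as the hand-written command: single quotes outside, double quotes inside the JSON.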

3. Hiding arrow heads:
JBrowse expects JSON in the configuration. JSON boolean values must not be quoted.
Only this will work in a style to hide arrows:
"strandArrow" : false

4. True/false and numbers must not be in quotes:

Only this will work in a style:
"strandArrow" : false

"connectorThickness" : 2

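The difference between the quoted and unquoted spellings in items 3 and 4 shows up as soon as the JSON is parsed: a quoted "false" is a non-empty string, not a boolean. A small Python sketch that flags such values in a style block (quoted_scalars is a name made up for this example, and the check is only a heuristic):

```python
import json

good = json.loads('{"strandArrow": false, "connectorThickness": 2}')
bad = json.loads('{"strandArrow": "false", "connectorThickness": "2"}')

def quoted_scalars(style):
    """Return the keys whose string values look like booleans or numbers."""
    suspicious = []
    for key, value in style.items():
        if isinstance(value, str):
            text = value.strip().lower()
            if text in ("true", "false") or text.replace(".", "", 1).isdigit():
                suspicious.append(key)
    return suspicious

print(quoted_scalars(good))  # []
print(quoted_scalars(bad))   # ['strandArrow', 'connectorThickness']
```
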
5. Loading GFF files:

This error:

Unsuccessful stat on filename containing newline at /mnt/home/varwww/JBrowse-1.11.6/bin/../src/perl5/JsonFileStorage.pm line 64.

is caused by using spaces as the separator in a GFF file. The separator must be TAB!
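
A file can be checked for this before loading it. The sketch below is a rough Python check, assuming GFF feature lines with 9 tab-separated columns; it skips comment and directive lines and does not handle an embedded ##FASTA section. The function name is made up for the example.

```python
def lines_without_tabs(lines):
    """Return 1-based numbers of feature lines that lack 9 tab-separated columns."""
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        if len(line.split("\t")) != 9:
            bad.append(number)
    return bad

example = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=g1\n",
    "chr1 src gene 300 400 . + . ID=g2\n",  # spaces instead of tabs
]
print(lines_without_tabs(example))  # [3]
```

For a real file, pass the open file handle instead of the example list.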


Alternative to Perl fork()

My goal was to split an input list in N parts and run N processes in parallel. When all N processes end, I want to concatenate their outputs and finish the program.
An easier way than using Perl’s fork() is to use a Makefile.

Files a and b are used by make to know that the targets completed. You should use a random file name to avoid collision with other programs.

Save this file as myMakefile or any name you want:

# the main target all will wait for execution of target x
all: x

# target x will wait for execution of targets clean, a and b. x only finishes when files a and b are created
x: clean a b
        cat /tmp/output_a /tmp/output_b > /tmp/output

clean:
        rm -f a b

a:
        sleep 2; echo "a"; touch /tmp/output_a; touch a; 

b:
        sleep 6; echo "b"; touch /tmp/output_b; touch b;

The names of the targets (a and b in the example) have to match the names of the files that each target creates once your program finishes (in the example, the program is the “sleep/echo/touch /tmp” part).

This is how you run your makefile, 2 jobs at a time:

make -f myMakefile -j 2

If you copy/paste the example above, make sure the whitespace in front of each command is one single tab. Make is very picky about this.
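
For more than two jobs, the Makefile can be generated rather than typed. This is only a sketch in Python, keeping the same layout as the example above; the target names (job0, job1, ...), the /tmp/output prefix and the echo placeholder command are assumptions made for illustration. It also writes the leading tabs explicitly, which avoids the copy/paste problem.

```python
def make_makefile(n, prefix="/tmp/output"):
    """Build Makefile text with n parallel targets (job0, job1, ...)."""
    targets = [f"job{i}" for i in range(n)]
    outputs = [f"{prefix}_{t}" for t in targets]
    lines = [
        "all: x",
        "",
        f"x: clean {' '.join(targets)}",
        f"\tcat {' '.join(outputs)} > {prefix}",
        "",
        "clean:",
        f"\trm -f {' '.join(targets)}",
        "",
    ]
    for target, output in zip(targets, outputs):
        # Placeholder command: replace the echo with the real program.
        lines += [f"{target}:", f"\techo {target} > {output}; touch {target}", ""]
    return "\n".join(lines)

print(make_makefile(2))
```

Save the printed text as myMakefile and run it with make -f myMakefile -j N as before.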

Maq short read aligner errors

Error: maq: match.cc:516: int ma_match(int, char**): Assertion `fp_bfq_l' failed
Solution: Wrong file; I gave a .fq instead of a .bfq as input. Give maq a .bfq and the error will be gone.

Error: [ma_longread2read] encoding reads… maq: read.cc:106: match_info_t* ma_longread2read(const longreads_t*): Assertion `matches' failed.
Solution: Too many reads; I gave 17M reads. Split the file into chunks of 3M reads.
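
One way to do the splitting is sketched below in Python. It assumes a plain FASTQ file with exactly 4 lines per read (no wrapped sequence lines); the function name and the numbered output suffixes are made up for the example. Each chunk would still have to be converted to .bfq before being given to maq.

```python
from itertools import islice

def split_fastq(path, reads_per_chunk=3_000_000):
    """Write path.000, path.001, ... with at most reads_per_chunk reads each."""
    chunk_paths = []
    with open(path) as source:
        while True:
            first = source.readline()
            if not first:
                break  # end of input
            chunk_path = f"{path}.{len(chunk_paths):03d}"
            with open(chunk_path, "w") as out:
                out.write(first)
                # stream the rest of the chunk: 4 lines per read, one already written
                out.writelines(islice(source, 4 * reads_per_chunk - 1))
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Calling split_fastq("reads.fq") returns the list of chunk files it wrote.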

Installing a minimal UCSC genome browser mirror in Ubuntu 16.04 64 bits

updated Jun 06, 2017

For many years, a local mirror had to be installed manually. In the last few years, the UCSC genome browser created a simpler, automated way to install:

https://genome.ucsc.edu/goldenpath/help/gbic.html

I haven’t tried their automated installation, so I don’t know how well it works, but you should give it a try before going through my tutorial.

In case you’re still interested in installing the browser manually, the following has worked for me since Ubuntu 8.04 / ~2009.

Notes

  • These instructions should work for Ubuntu 8.04, 9.04, 10.04, 10.10, 12.04, 14.04 and 16.04, Desktop and Server, 64-bit. They will also work on 32-bit systems, but make sure you compile your own binaries, because the ones provided by UCSC are 64-bit.
  • See the Troubleshooting section for issues with MySQL socket and other errors.

Goal and desired features

  1. install a minimal UCSC genome browser for a specific genome
  2. browser should not be at the root of the web dir (I wanted something like: http://www.example.edu/genomebrowser)
  3. restrict access to the browser with .htaccess
  4. load custom tracks to be displayed permanently

Assuming:

  1. MySQL is up and running in the default port
  2. MySQL datadir is at /home/mysql (just replace this for the default /var/lib/mysql if necessary)
  3. Apache’s DocumentRoot is /var/www

Genome browser requirements

  1. there is a /gbdb directory in /
  2. a directory with html files, ex. /var/www/genomebrowser
  3. a directory with cgi-bin files, ex. /var/www/genomebrowser/cgi-bin
  4. a directory for trash that is in the same directory as cgi-bin (ex. /var/www/genomebrowser/trash)
  5. a configuration file at /var/www/genomebrowser/cgi-bin/hg.conf owned by www-data
  6. XBitHack on in /etc/apache2/httpd.conf
  7. Options +Includes in /etc/apache2/httpd.conf for the directory where html files are located
  8. system has libssl.so.6 and libcrypto.so.6

Setting up a dedicated MySQL user

Enter the commands below in your mysql> shell, editing HOSTNAME and PASSWORD.

create user 'hguser'@'HOSTNAME' identified by 'PASSWORD';
flush privileges;

Test if you can connect to the MySQL database using the user above. You may need to provide the fully qualified hostname, depending on how your hostname is defined. I had to comment out bind-address = 127.0.0.1 in /etc/mysql/my.cnf.

Installation: Part 1. Browser engine

Provide required libraries (in 12.04 the path is slightly different, see below):

apt-get install libssl0.9.8
ln -s /usr/lib/libssl.so.0.9.8 /usr/lib/libssl.so.6
ln -s /usr/lib/libcrypto.so.0.9.8 /usr/lib/libcrypto.so.6

Ubuntu server 12.04:

ln -s /usr/lib/x86_64-linux-gnu/libssl.so.0.9.8 /usr/lib/libssl.so.6
ln -s /usr/lib/x86_64-linux-gnu/libcrypto.so.0.9.8 /usr/lib/libcrypto.so.6

Create a base dir for the genome browser and download html files

mkdir /var/www/genomebrowser
rsync -avzP rsync://hgdownload.cse.ucsc.edu/htdocs/ /var/www/genomebrowser/

Create a customized directory for cgi-bin (not the root cgi-bin) and download cgi-bin files:

mkdir -p /var/www/genomebrowser/cgi-bin
rsync -avzP rsync://hgdownload.cse.ucsc.edu/cgi-bin/ /var/www/genomebrowser/cgi-bin/
chown -R www-data.www-data /var/www/genomebrowser/cgi-bin

Add this to /etc/apache2/httpd.conf:

XBitHack on
 <Directory /var/www/genomebrowser>
   AllowOverride AuthConfig
   Options +Includes
 </Directory>

 # the ScriptAlias directive is crucial
 ScriptAlias /genomebrowser/cgi-bin /var/www/genomebrowser/cgi-bin
 <Directory "/var/www/genomebrowser/cgi-bin">
   AllowOverride None
   Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
   Order allow,deny
   Allow from all
   AddHandler cgi-script cgi pl
 </Directory>

Enable mod_include, which provides XBitHack

ln -s /etc/apache2/mods-available/include.load /etc/apache2/mods-enabled/

Restart Apache

/etc/init.d/apache2 restart

Create file /var/www/genomebrowser/cgi-bin/hg.conf

# Configuration file for the UCSC Human Genome server
#
# the format is in the form of name/value pairs, written as 'name=value'
#
# note that there is no space between the name and its value. Also, no blank lines should be in this file.
#
#--------------------------------------------------------------#
#
# db.host is the name of the MySQL host to connect to
db.host=YOURHOST
#
# db.user is the username used when connecting to the host
db.user=hguser
#
# db.password is the password to use with the above username
db.password=PASSWORD
#
db.trackDb=trackDb
#
# central.host is the name of the host of the central MySQL
# database where stuff common to all versions of the genome
# and the user database is stored.
central.db=hgcentral
central.host=YOURHOST
central.user=hguser
central.password=PASSWORD
central.domain=
#
backupcentral.db=hgcentral
backupcentral.host=YOURHOST
backupcentral.user=hguser
backupcentral.password=PASSWORD
backupcentral.domain=

Give ownership to www-data:

sudo chown www-data /var/www/genomebrowser/cgi-bin/hg.conf
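
The two format rules stated in the file header (no space between a name and its value, no blank lines) are easy to break when editing by hand. Here is a small Python sketch that checks only those two rules; it is based on the header comment alone, not on how the browser actually parses the file, and check_hg_conf is a name invented for the example.

```python
def check_hg_conf(text):
    """Return (line_number, problem) tuples for lines that break the format rules."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip() == "":
            problems.append((number, "blank line"))
        elif line.startswith("#"):
            continue  # comment lines are fine
        elif "=" not in line:
            problems.append((number, "not a name=value pair"))
        else:
            name, _, value = line.partition("=")
            if name != name.rstrip() or value != value.lstrip():
                problems.append((number, "space around '='"))
    return problems

sample = "db.host=YOURHOST\n\ndb.user = hguser\n# a comment\nbogus\n"
print(check_hg_conf(sample))
```

To check the real file, read /var/www/genomebrowser/cgi-bin/hg.conf and pass its contents to the function.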

Create other directories required by the browser:

rm /var/www/genomebrowser/trash
mkdir /var/www/genomebrowser/trash
chown www-data.www-data /var/www/genomebrowser/trash

The following links are necessary because the browser assumes it is installed in the root dir of the server. Alternatively, you can set up equivalent aliases in /etc/apache2/httpd.conf.

ln -s /var/www/genomebrowser/images /var/www/images
ln -s /var/www/genomebrowser /var/www/html

Provide the JavaScript files required by the binaries. DO NOT copy js and style to /usr/local/apache/htdocs; link them instead. Otherwise the browser will display tracks out of alignment!

mkdir -p /usr/local/apache/htdocs/
ln -s /var/www/genomebrowser/js/ /usr/local/apache/htdocs/js
ln -s /var/www/genomebrowser/style/ /usr/local/apache/htdocs/style

Test the installation by pointing your browser to http://localhost/genomebrowser

Set up a cron job to clean trash

Create a script at /etc/cron.daily (no . or _ allowed in the file name) with the following contents:

#!/bin/bash

find /var/www/genomebrowser/trash/ \! \( -regex "/var/www/genomebrowser/trash/ct/.*" \
      -or -regex "/var/www/genomebrowser/trash/hgSs/.*" \) -type f -amin +5040 -exec rm -f {} \;
find /var/www/genomebrowser/trash/    \( -regex "/var/www/genomebrowser/trash/ct/.*" \
      -or -regex "/var/www/genomebrowser/trash/hgSs/.*" \) -type f -amin +10080 -exec rm -f {} \;

You can change the clean up schedule by changing +5040 and +10080 to the number of minutes that you want.
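
The two values in the script are plain minute counts: 5040 minutes is 3.5 days and 10080 minutes is 7 days. A tiny Python helper for working out other schedules (the function name is made up for the example):

```python
def days_to_minutes(days):
    """Convert a number of days to the minute count used with find -amin."""
    return int(days * 24 * 60)

print(days_to_minutes(3.5))  # 5040
print(days_to_minutes(7))    # 10080
print(days_to_minutes(2))    # 2880
```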

Installation: Part 2. MySQL tables required for functionality

For a minimal installation, the genome browser requires the following databases:

  • hgcentral
  • hgFixed

Download hgcentral and load it into MySQL

wget http://hgdownload.cse.ucsc.edu/admin/hgcentral.sql
mysql -youraccountoptions -e "create database hgcentral"
mysql -youraccountoptions hgcentral < hgcentral.sql
mysql -youraccountoptions -e "grant all privileges on hgcentral.* to 'hguser'@'HOSTNAME'"

Create a dummy hgFixed

mysql -youraccountoptions -e "create database hgFixed"
mysql -youraccountoptions -e "grant select on hgFixed.* to 'hguser'@'HOSTNAME'"

Your UCSC Genome Browser should be working (no data to display, though)

Installation: Part 3. Adding one genome

Shut down your MySQL database

kill -15 `ps aux | grep mysqld | grep 3306 | awk '{print $2}'`

Let’s use mm9 as an example

mkdir /home/mysql/mm9

The database of interest requires the following tables, at least:

  • chromInfo
  • cytoBandIdeo <- not required, but shows the chromosome at the top
  • extFile <- not strictly required for minimal functionality, but necessary for zooming in
  • grp
  • hgFindSpec
  • trackDb

Download these tables

rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/chromInfo.MYD /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/chromInfo.MYI /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/chromInfo.frm /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/cytoBandIdeo.MYD /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/cytoBandIdeo.MYI /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/cytoBandIdeo.frm /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/grp.MYD /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/grp.MYI /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/grp.frm /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/hgFindSpec.MYD /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/hgFindSpec.MYI /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/hgFindSpec.frm /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/trackDb.MYD /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/trackDb.MYI /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/trackDb.frm /home/mysql/mm9
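
The list above is just every table combined with the three MyISAM file extensions, so it can be generated when more tables are needed. A Python sketch that only prints the commands, assuming the same rsync server and the /home/mysql datadir used in this tutorial (rsync_commands is a name invented for the example):

```python
TABLES = ["chromInfo", "cytoBandIdeo", "grp", "hgFindSpec", "trackDb"]
EXTENSIONS = ["MYD", "MYI", "frm"]

def rsync_commands(db, tables, dest_root="/home/mysql"):
    """Return one rsync command per table file of the given database."""
    base = f"rsync://hgdownload.cse.ucsc.edu/mysql/{db}"
    return [
        f"rsync -avzP {base}/{table}.{ext} {dest_root}/{db}"
        for table in tables
        for ext in EXTENSIONS
    ]

for command in rsync_commands("mm9", TABLES):
    print(command)
```

Adding a table name to TABLES produces the three extra lines for it.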

Set up permissions

chown -R mysql.mysql /home/mysql/mm9

Restart your MySQL db

/usr/sbin/mysqld --basedir=/usr --datadir=/home/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid \
    --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock &

Grant privileges on mm9

mysql -youraccountoptions -e "grant all privileges on mm9.* to 'hguser'@'HOSTNAME'"

Download gbdb data

mkdir -p /home/genomebrowser/gbdb/mm9
rsync -avzP --delete --max-delete=20 rsync://hgdownload.cse.ucsc.edu/gbdb/mm9/ /home/genomebrowser/gbdb/mm9/

Provide directory gbdb to the browser

ln -s /home/genomebrowser/gbdb /gbdb

With this, your browser should be able to display mm9 data. The first time I access the browser, I add ?db=mm9 after hgGateway (http://localhost/cgi-bin/hgGateway?db=mm9); otherwise, the browser tries to show data for hg19. After the first time, the browser somehow remembers that the only database available is mm9.

Adding UCSC tracks (MySQL tables)

Select the data you want from ftp://hgdownload.cse.ucsc.edu/mysql/mm9/ and download to /home/mysql/mm9.

For example, I wanted my browser to display:

  • RefSeq
  • UCSC genes
  • SNPs and repeats
  • Conservation

I downloaded these MySQL tables.

Additional functionality

Some tables can be left out but you will probably want them to obtain extra functionality. For example, when you click on a gene or feature in the genome viewer, you expect to be able to retrieve more info.

For this, you need to install the proteome database. After shutting down your MySQL server:

mkdir /home/mysql/proteins070202

That’s it, a dummy proteins070202 does the trick.

Displaying info about RefSeq genes

rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/description.frm /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/description.MYD /home/mysql/mm9
rsync -avzP  rsync://hgdownload.cse.ucsc.edu/mysql/mm9/description.MYI /home/mysql/mm9

Displaying info about UCSC Genes

Get the UniProt database

wget -nc -c ftp://hgdownload.cse.ucsc.edu/mysql/uniProt/*

Get the proteome database

wget -nc -c ftp://hgdownload.cse.ucsc.edu/mysql/proteome/*.*

Summary of my setup

MySQL databases:

  • hgcentral (380 kb)
  • hgFixed (empty database)
  • mm9 (selected tables: 5.8 G)
  • proteins070202 (empty database)
  • proteome (full database: 8.9 G)
  • uniProt (full database: 8 G)

Other files:

  • gbdb (62 G)

Adding custom tables

How the UCSC Genome Browser displays tracks in the browser:

1. the table mm9.grp describes groups of tracks (Mapping and Sequencing Tracks, Genes and … tracks)

If you want a new group instead of adding your custom track to a pre-existing one, you must create a new group in that table:

mysql -youraccountoptions mm9 -e "insert into grp set name='$name', label='$label', priority=$priority;"

2. the table mm9.trackDb describes how tracks are displayed in their groups

The new table must have an entry in trackDb describing the grp it belongs to, its color, name, etc.

3. use the hgLoadBed utility to upload .bed files. Notice that this script reads MySQL configuration data from ~/.hg.conf

/var/www/genomebrowser/cgi-bin/loader/hgLoadBed mm9 mynewtable myfile.bed

You can write some Perl scripts to automate the process or use UCSC’s utilities.

Configuring local BLAT

1. login to your hgcentral database and update the BLAT server to be your localhost (or another computer):

update blatServers set host='myserver.university.edu', port=8083 where db='mm9';

2. build BLAT

You will need to download the “kent source” from Git and compile BLAT on your machine.

a. set up $MACHTYPE by running “uname -a” to find your architecture. Mine was x86_64

MACHTYPE=x86_64

b. create a directory ~/bin (the README says your binaries will be in ~/bin/$MACHTYPE, but in my case, they were under ~/bin)

c. go to your kent/src directory

  • go to lib and run make (should create jkweb.a). On my system, I had to run make as root and, strangely, jkweb.a was created at /jkweb.a, so I had to move it from / to lib
  • go to jkOwnLib and run make (should create lib/jkOwnLib.a)
  • run make in gfServer, gfClient, blat and faToNib
  • binaries should appear under ~/bin

d. copy or move all the binaries created (blat, faToNib, gfClient and gfServer) to /usr/local/bin

e. start gfServer:

  • go to /gbdb/mm9 (to start a blat server for mm9). This is important. Don’t give your full mm9.2bit path. If you start the server with /gbdb/mm9/mm9.2bit, for example, blat will search for /gbdb/mm9/gbdb/mm9/mm9.2bit
  • start gfServer by doing:
gfServer start myserver.university.edu 8083 -stepSize=5 -log=/tmp/gfserver.log mm9.2bit

This is what you should see:

$ gfServer start myserver.university.edu 8083 -stepSize=5 -log=/tmp/Log.log mm9.2bit
starting untranslated server...
Counting tiles in mm9.2bit
[several minutes and a good deal of RAM later]
Done adding
Server ready for queries!

You will need >2GB of RAM to start the server. Every time you run a BLAT search on your browser, ~2GB will be used for a few seconds in your BLAT server.

The neat thing is that you can run gfServer on any computer and simply tell the MySQL table blatServers where to find the BLAT server.

You can now run BLAT from your browser. You can check the status of the BLAT server by doing:

$ gfServer status myserver.university.edu 8083

Troubleshooting

Can’t connect to local MySQL server through socket ‘/var/lib/mysql/mysql.sock’ (13)

Strangely, I didn’t have this problem with one 10.04 server, but I had with another.

  • create a symbolic link from your actual socket to /var/lib: ln -s /var/run/mysqld/mysqld.sock /var/lib/mysql/mysql.sock
  • set permissions on the socket file: chmod 666 /var/lib/mysql/mysql.sock
  • set permissions on the socket directory: chmod 755 /var/lib/mysql/
  • test your connection from the shell: mysql [your credentials] --socket=/var/lib/mysql/mysql.sock; you should get access to your MySQL db
  • I did not need to enter the socket path in /etc/apparmor.d/usr.sbin.mysqld, but if you still have access problems, you may try this
  • I did not modify /etc/mysql/debian.cnf
  • You do not need to restart your MySQL

(8)Exec format error: exec of ‘/var/www/genomebrowser/cgi-bin/hgGateway’ failed

  • Check that you have hg.conf under cgi-bin (with hgGateway)
  • Check that hg.conf is owned by www-data
  • manually run hgGateway. If it fails with “-bash: ./hgGateway: cannot execute binary file”, chances are you have a 32-bit system and you are trying to run a 64-bit binary. Install 64-bit Ubuntu or compile the browser from source. To check your system, use arch or uname -a; it should report x86_64
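
The binary itself can also be inspected. On Linux, executables are ELF files, and the fifth byte of the ELF header gives the class: 1 for 32-bit, 2 for 64-bit. A short Python sketch that reads it (the function names are made up for the example):

```python
def elf_bits(header):
    """Return 32 or 64 for an ELF header, or None if the bytes are not ELF."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # EI_CLASS byte: 1 = 32-bit, 2 = 64-bit
    return {1: 32, 2: 64}.get(header[4])

def binary_bits(path):
    """Read the first bytes of a file and report its ELF class."""
    with open(path, "rb") as handle:
        return elf_bits(handle.read(5))
```

For example, binary_bits("/var/www/genomebrowser/cgi-bin/hgGateway") should return 64 for the binaries provided by UCSC; compare that with what arch reports for your system.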

Can’t find hg19, but you want to display another organism

  • in the URL of your mirror, pass the database code of your organism as the db= argument. For mouse: hgGateway?db=mm9, for Drosophila: hgGateway?db=dm3, etc

Browser loads, but nothing is displayed

  • make sure you installed the chromosome and ideogram tables
  • make sure there is a trash directory at the same level as the cgi-bin directory, and that trash is writable by www-data

Tracks appear scrambled

  1. One solution that worked for me was symlinking instead of copying directories js and style: ln -s /var/www/genomebrowser/js/ /usr/local/apache/htdocs/js and ln -s /var/www/genomebrowser/style/ /usr/local/apache/htdocs/style
  2. In another server, this solution failed. Creating a link in the DocumentRoot to genomebrowser/js and genomebrowser/style solved the problem.

Error in TCP non-blocking connect() 111 – Connection refused

If you get this error when trying to do a BLAT search on your local mirror, it means you didn’t start the BLAT server (gfServer start …).
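
Whether anything is listening on the BLAT port can be checked independently of the browser. A minimal Python sketch that only tests whether a TCP connection succeeds (port_is_open is a name invented for the example; it does not speak the gfServer protocol):

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers connection refused, timeouts and name resolution failures
        return False
```

With the values used above, port_is_open("myserver.university.edu", 8083) returning False matches the "Connection refused" error; gfServer status gives more detail once the port is open.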

See also

http://genome.ucsc.edu/admin/mirror.html

http://genomewiki.ucsc.edu/index.php/Minimal_Browser_Installation

http://bradbot.genomecenter.ucdavis.edu/wiki/index.php/Genome_Browser

Problems installing package affy in R (Ubuntu Gutsy)?

To solve this problem, first, I had to upgrade to R 2.8.

Then, affy wouldn’t compile. The problem was simple, it needed affyio.

affyio, however, wouldn’t compile either. It needed zlib.h, which can be found in several packages, but I installed zlib1g-dev and then affyio compiled without problems, and then affy compiled too.

Boxplot in R statistical package

Problem

Plotting a boxplot for many groups in desired order

Solution

1. Create a file (example.txt) in the format:

group     value
a           1
a           2
b           5
c           10
a           2
b           6
c           11

2. Start R and do:

file=read.table("example.txt", header=T);

mat=data.frame(file)

boxplot(mat$value~mat$group, as.data.frame(mat))

Look for additional parameters for boxplot in R documentation.

This works, but you may want to order your boxes in a different way. If you look online, many people will say that simply doing

group <- factor(group, levels=c("c", "a", "b"))

will work.

It didn’t work for me.

What works is:

mat$group <- factor(as.character(mat$group), levels=c("c", "a", "b"))

boxplot(mat$value~mat$group, as.data.frame(mat))