Thursday, 27 September 2007

Bio::Graphics and rails

As a follow-up to my post on Bio::Graphics, I tried integrating this library into a rails application. After all, you'd get your data either from a file (like GFF) or from a database. And let me tell you: it took me just 30 minutes or so to get a proof of concept running. That included installing rails itself, creating the rails app, creating the database, loading dummy data and doing the actual coding. Those 30 minutes were interrupted by a break of a couple of hours, because I needed some advice from Kouhei Sutou, the author of rcairo, on how to write PNG images to memory instead of to a file.

So how do you do it? The little proof-of-concept database I created contains three tables:
  • chromosomes (columns: id, name, length)
  • tracks (columns: id, name, glyph, colour)
  • features (columns: id, chromosome_id, track_id, name, location, url)
Create some features for a couple of different tracks for a particular chromosome.
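To get a feel for how the three tables hang together before touching a database, here is a plain-ruby sketch (no rails needed) that uses Structs with made-up sample rows in place of the tables. The helper features_by_track groups a chromosome's features by track name, which is essentially what the draw method in the model does with its track_container hash. The helper parse_colour assumes the colour column holds an "r,g,b" string with components between 0 and 1, like the track colours in the Bio::Graphics post. Both helper names are hypothetical.

```ruby
# Plain-Ruby stand-ins for the three tables, with made-up sample rows.
Chromosome = Struct.new(:id, :name, :length)
Track      = Struct.new(:id, :name, :glyph, :colour)
Feature    = Struct.new(:id, :chromosome_id, :track_id, :name, :location, :url)

# Turns an "r,g,b" colour string (components between 0 and 1) into floats.
def parse_colour(colour)
  colour.split(',').map(&:to_f)
end

# Groups the features of one chromosome by the name of their track.
def features_by_track(chromosome, tracks, features)
  track_by_id = tracks.to_h { |t| [t.id, t] }
  features.select { |f| f.chromosome_id == chromosome.id }
          .group_by { |f| track_by_id[f.track_id].name }
end

chr1 = Chromosome.new(1, 'chr1', 800)
tracks = [Track.new(1, 'genes', 'directed_spliced', '1,0,0'),
          Track.new(2, 'SNPs', 'triangle', '0.5,0.5,0.5')]
features = [Feature.new(1, 1, 1, 'gene1', 'join(34..52,109..183)', nil),
            Feature.new(2, 1, 2, 'rs1234', '107', nil),
            Feature.new(3, 1, 1, 'gene2', '250..375', nil),
            Feature.new(4, 2, 2, 'rs9876', '44', nil)] # on another chromosome

features_by_track(chr1, tracks, features).each do |track_name, fs|
  puts "#{track_name}: #{fs.map(&:name).join(', ')}"
end
# prints:
# genes: gene1, gene2
# SNPs: rs1234
p parse_colour(tracks[1].colour) # prints [0.5, 0.5, 0.5]
```

Note that to_i instead of to_f would turn 0.5 into 0, which is why the colour has to be parsed as floats.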

In views/chromosomes/show.rhtml, add the following line:

<%= @chromosome.to_png %>


My models/chromosome.rb looks like this:

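require 'stringio'
require 'base64'
require_gem 'bio-graphics'

class Chromosome < ActiveRecord::Base
  has_many :features
  has_many :tracks, :through => :features

  # Returns an <img> tag with the PNG embedded as a base64 data: URI,
  # so the view can show the picture without writing a file to disk.
  def to_png(width = 800, start = 1, stop = self.length)
    # encode64 breaks its output into lines; strip them for the data: URI
    encoded = Base64.encode64(self.draw(width, start, stop)).gsub("\n", '')
    return %{<img src="data:image/png;base64,#{encoded}"/>}
  end

  # Draws the chromosome (one panel track per track name) and returns the PNG data.
  def draw(width, start, stop)
    panel = Bio::Graphics::Panel.new(self.length, width, false, start, stop)
    track_container = Hash.new
    # tracks are reached through the features, so the same track can show
    # up more than once: add each one to the panel only once
    self.tracks.each do |track|
      if ! track_container.has_key?(track.name)
        # colour is stored as "r,g,b" with values between 0 and 1, hence to_f
        track_container[track.name] = panel.add_track(track.name, track.colour.split(',').collect{|i| i.to_f}, track.glyph)
      end
    end

    self.features.each do |feature|
      track_container[feature.track.name].add_feature(feature.name, feature.location)
    end

    # write the PNG into memory instead of to a file
    output = StringIO.new
    panel.draw(output)
    return output.string
  end
end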


UPDATE: Apparently, Blogger does not let me paste the correct code above. In the to_png method, replace the following URL-encoded characters:

  • %7B with {

  • %28 with (

  • %29 with )

  • %7D with }
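Judging by the stringio and base64 requires, to_png embeds the picture in the page as a base64 data: URI instead of linking to a file. Here is a stand-alone sketch of just that step, using a hypothetical png_img_tag helper and the 8-byte PNG file signature in place of real image data. The pack('m0') call gives the same single-line output as Base64.strict_encode64, without the line breaks that Base64.encode64 inserts every 60 characters.

```ruby
# Wraps raw PNG bytes in an <img> tag with a base64 data: URI so the picture
# can be embedded in the page without writing a file to disk.
# (Hypothetical helper name.)
def png_img_tag(png_bytes, alt = 'chromosome')
  # pack('m0') is strict base64 (no line breaks), like Base64.strict_encode64.
  encoded = [png_bytes].pack('m0')
  %{<img src="data:image/png;base64,#{encoded}" alt="#{alt}"/>}
end

# The 8-byte PNG file signature stands in for real image data here.
fake_png = "\x89PNG\r\n\x1A\n".b
puts png_img_tag(fake_png)
# prints <img src="data:image/png;base64,iVBORw0KGgo=" alt="chromosome"/>
```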



And that's it. I leave the integration of my ensembl-api, bio-graphics and rails as an exercise for the reader. We could make a ruby version of the Ensembl browser... and then: world domination. Mwahaha.

Thursday, 20 September 2007

Graphics, genomics and ruby


I've known and used the Generic Genome Browser (aka gbrowse, see here) for years now, and it occurred to me a while ago that it should be oh so simple to create the same functionality, with a much easier setup, in ruby instead of perl.

Gbrowse depends on bioperl's Bio::Graphics module. Although gbrowse has been instrumental in many people's research, it does take a bit of work to get it installed. Apart from bioperl, it depends on Apache to show the results in a browser. Compare that to any rails application, where you basically just need ruby and a "gem install rails". I've created rails applications in the past that contain exactly the kind of data that would typically be visualized by something like gbrowse. They take no time at all to set up, and you can get away with writing virtually no code. And there's no Apache to install, and no configuration files that you can't access because you're not root.

Such a rails application makes it possible to browse, edit and delete the data. The problem comes with the visualization bit. There's no bioruby graphics library (yet?) that automatically parses features on a reference and creates a nice picture of where your genes are on that chromosome. Of course, the genes should be clickable so you can link through to NCBI or Ensembl.

I've spent some time in the last year creating such a Bio::Graphics thing for ruby. I wanted it to behave the same as the one from bioperl: there is one panel that has one or more tracks, and each track has features on it. Even though it was quite easy to create a proof-of-concept library, the most difficult part was actually finding the right backend.
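The panel/track/feature model is simple enough to sketch in a few lines of plain ruby. The classes below (MiniPanel, MiniTrack, MiniFeature) are hypothetical and only mirror the shape of add_track and add_feature; they are not the bio-graphics API and draw nothing. The bp_to_x method shows one way to turn a base-pair position into a horizontal position in the picture when only part of the sequence is displayed, assuming a simple linear scale.

```ruby
# Hypothetical classes mirroring the panel -> tracks -> features shape.
# They are not the bio-graphics API; nothing is drawn here.
class MiniFeature
  attr_reader :name, :location, :url

  def initialize(name, location, url = nil)
    @name = name
    @location = location
    @url = url
  end
end

class MiniTrack
  attr_reader :name, :features

  def initialize(name)
    @name = name
    @features = []
  end

  # Adds a feature to this track and returns it.
  def add_feature(name, location, url = nil)
    feature = MiniFeature.new(name, location, url)
    @features << feature
    feature
  end
end

class MiniPanel
  attr_reader :tracks

  # length: sequence length in bp; width: picture width;
  # display_start..display_stop: the region of the sequence to show.
  def initialize(length, width, display_start = 1, display_stop = length)
    @length = length
    @width = width
    @display_start = display_start
    @display_stop = display_stop
    @tracks = []
  end

  # Adds an empty track to the panel and returns it.
  def add_track(name)
    track = MiniTrack.new(name)
    @tracks << track
    track
  end

  # Linear scale from a bp position to an x coordinate in the picture.
  def bp_to_x(position)
    span = @display_stop - @display_start + 1
    ((position - @display_start) * @width.to_f / span).round
  end
end

panel = MiniPanel.new(800, 1200, 1, 610)
genes = panel.add_track('genes')
genes.add_feature('bla1', '250..375', 'http://www.newsforge.com')

puts panel.tracks.map(&:name).inspect # prints ["genes"]
puts panel.bp_to_x(250)               # prints 490
```

With the panel settings of the example script (800 bp, 1200 points wide, region 1..610), position 1 maps to x = 0 and position 250 to x = 490.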

What should I use to create the pictures themselves? As I'd worked with SVG before, that seemed the right way to go. I downloaded a library from http://raa.ruby-lang.org/project/ruby-svg/ and got a prototype running quite easily. Problem: I needed an SVG viewer or firefox to actually view the picture, and zooming in/out screwed up all text. So after weeks of digging around, I found rcairo, a ruby binding to Cairo. Migrating to this backend was easy peasy and the pictures look really nice (see the picture at the top). Unfortunately, it's impossible to create clickable glyphs using Cairo itself, but that can easily be worked around by creating an HTML file with an image map. That's exactly what gbrowse does as well, isn't it?
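Since Cairo only produces the picture, the clickable part has to come from an HTML image map laid over it. A minimal sketch, assuming each glyph's bounding box in pixels is known after drawing; Glyph, image_map_html and the coordinates in the usage example are hypothetical.

```ruby
# A clickable glyph: its bounding box in picture coordinates and an optional URL.
# Hypothetical structure; the box would be recorded while drawing the glyph.
Glyph = Struct.new(:name, :x1, :y1, :x2, :y2, :url)

HTML_ESCAPES = { '&' => '&amp;', '"' => '&quot;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Minimal HTML escaping for attribute values.
def escape_html(text)
  text.to_s.gsub(/[&"<>]/, HTML_ESCAPES)
end

# Returns HTML that shows the PNG with an image map on top of it:
# one rectangular <area> per glyph that has a URL.
def image_map_html(png_file, glyphs, map_name = 'panel_map')
  areas = glyphs.select(&:url).map do |g|
    coords = [g.x1, g.y1, g.x2, g.y2].join(',')
    %{  <area shape="rect" coords="#{coords}" href="#{escape_html(g.url)}" alt="#{escape_html(g.name)}"/>}
  end
  <<~HTML
    <img src="#{escape_html(png_file)}" usemap="##{map_name}"/>
    <map name="#{map_name}">
    #{areas.join("\n")}
    </map>
  HTML
end

glyphs = [
  Glyph.new('bla1', 490, 10, 737, 20, 'http://www.newsforge.com'),
  Glyph.new('piep', 110, 40, 118, 50, nil) # no URL, so not clickable
]
puts image_map_html('my_panel.png', glyphs)
```

Only glyphs with a URL get an area, which matches the idea of making glyphs clickable only when a URL is defined.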

The picture at the top has been created using the following simple script:

g = BioExt::Graphics::Panel.new(800, 1200, true, 1, 610)

track1 = g.add_track('generic')
track2 = g.add_track('directed',[0,1,0],'directed_generic')
track3 = g.add_track('triangle',[0.5, 0.5, 0.5],'triangle')
track4 = g.add_track('spliced',[1,0,0],'spliced')
track5 = g.add_track('directed_spliced',[1,0,1],'directed_spliced')

track1.add_feature('bla1','250..375', 'http://www.newsforge.com')
track1.add_feature('bla2','54..124', 'http://www.thearkdb.org')
track1.add_feature('bla3','100..449', 'http://www.google.com')

track2.add_feature('bla4','50..60', 'http://www.google.com')
track2.add_feature('bla5','complement(80..120)', 'http://www.sourceforge.net')

track3.add_feature('piep','56')
track3.add_feature('bla','103', 'http://digg.com')

track4.add_feature('gene1','join(34..52,109..183)','http://news.bbc.co.uk')
track4.add_feature('gene2','complement(join(170..231,264..299,350..360,409..445))')
track4.add_feature('gene3','join(134..152,209..283)')

track5.add_feature('gene1','join(34..52,109..183)', 'http://www.vrtnieuws.net')
track5.add_feature('gene2','complement(join(170..231,264..299,350..360,409..445))','http://www.roslin.ac.uk')
track5.add_feature('gene3','join(134..152,209..283)')

g.draw('my_panel.png')



What happens here?
Line 1: Create a new panel for a sequence of 800 bp, with the picture being 1200 points wide. Make all glyphs clickable if a URL is defined (that's what the true argument does), and zoom into the region from 1 to 610 bp.
Lines 3-7: Create different tracks, each with a name, a colour (as RGB values between 0 and 1, at the moment) and a glyph type.
Lines 9-25: Add features to those tracks, each with a name, a locus and an optional URL to link out to external websites. Notice how it handles spliced features and features on the reverse strand?
Line 27: Create the PNG (and in this case also the HTML) file.
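The loci use the GenBank-style location syntax. As a rough illustration of how such a string can be broken into exons and a strand, here is a hypothetical parse_location that handles only plain ranges, single positions, join(...) and complement(...). BioRuby's Bio::Locations class covers much more of the syntax and would be the better choice in practice.

```ruby
# Hypothetical, simplified parser for GenBank-style locations such as
# "56", "250..375", "join(34..52,109..183)" or
# "complement(join(170..231,264..299))".
# Returns [strand, ranges] with strand 1 or -1 and ranges sorted by start.
# Anything fancier (fuzzy ends, join(complement(...)), ...) is not handled.
def parse_location(location)
  loc = location.strip
  strand = 1
  if (m = loc.match(/\Acomplement\((.*)\)\z/))
    strand = -1
    loc = m[1]
  end
  if (m = loc.match(/\Ajoin\((.*)\)\z/))
    loc = m[1]
  end
  ranges = loc.split(',').map do |part|
    start, stop = part.split('..').map(&:to_i)
    [start, stop || start] # a single position is a range of length 1
  end
  [strand, ranges.sort_by(&:first)]
end

p parse_location('56')
# prints [1, [[56, 56]]]
p parse_location('complement(join(170..231,264..299,350..360,409..445))')
# prints [-1, [[170, 231], [264, 299], [350, 360], [409, 445]]]
```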

Here's a nicer way to produce the same type of output:

# Initialize a 1200-point-wide graphic for a 1000 bp sequence, showing the region 1..600
my_panel = BioExt::Graphics::Panel.new(1000, 1200, false, 1, 600)

#Create and configure tracks
track_SNP = my_panel.add_track('SNP')
track_gene = my_panel.add_track('gene')
track_transcript = my_panel.add_track('transcript')

track_SNP.feature_colour = [1,0,0]
track_SNP.feature_glyph = 'triangle'
track_gene.feature_glyph = 'directed_spliced'
track_transcript.feature_glyph = 'spliced'
track_transcript.feature_colour = [0,0.5,0]

# Add data to tracks
DATA.each do |line|
  ref, type, name, location, link = line.chomp.split(/\s+/)
  # link is nil when a line has no URL column
  if type == 'SNP'
    track_SNP.add_feature(name, location, link)
  elsif type == 'gene'
    track_gene.add_feature(name, location, link)
  elsif type == 'transcript'
    track_transcript.add_feature(name, location, link)
  end
end

# And draw
my_panel.draw('my_panel.png')

__END__
chr1 gene CYP2D6 complement(80..120)
chr1 gene ALDH 100..449
chr1 SNP rs1234 107
chr1 gene bla complement(400..430)
chr1 SNP rs9876 44
chr1 gene some_gene complement(join(170..231,264..299,350..360,409..445))
chr1 transcript transcript1 join(250..300,390..425)
chr1 transcript transcript2 253..330
chr1 transcript transcript3 266..344
chr1 transcript transcript4 complement(join(410..430,239..286,129..151))
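The if/elsif chain needs another branch for every new feature type. An alternative is to keep the tracks in a hash keyed by feature type. This sketch uses a hypothetical SketchTrack stand-in and a heredoc instead of DATA so that it runs on its own (the "repeat" line is made up); lines whose type has no track are skipped, and link simply stays nil when a line has no URL column.

```ruby
# Hypothetical stand-in for a panel track that just records its features.
SketchTrack = Struct.new(:name, :features) do
  def add_feature(name, location, link = nil)
    features << [name, location, link]
  end
end

# One track per feature type, looked up by the type column.
tracks = %w[SNP gene transcript].to_h { |type| [type, SketchTrack.new(type, [])] }

data = <<~LINES
  chr1 gene CYP2D6 complement(80..120)
  chr1 SNP rs1234 107 http://www.google.com
  chr1 transcript transcript2 253..330
  chr1 repeat alu1 500..520
LINES

data.each_line do |line|
  _ref, type, name, location, link = line.split # link is nil if there is no URL
  track = tracks[type]
  next unless track # no track for this feature type: skip the line
  track.add_feature(name, location, link)
end

tracks.each_value { |t| puts "#{t.name}: #{t.features.map(&:first).join(', ')}" }
# prints:
# SNP: rs1234
# gene: CYP2D6
# transcript: transcript2
```

Adding a feature type then means adding one entry to the hash rather than another branch.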


If anyone is actually interested in getting the library behind this, just let me know. It should be really easy to incorporate it in a rails application where the data are actually stored in a database.

I wonder what role, if any, _why's Shoes thing could play...

UPDATE: This library has now been improved a bit and is hosted on rubyforge. You can find a tutorial and the whole API documentation at http://bio-graphics.rubyforge.org. You can find instructions on how to install and use it over there.

UPDATE TWO: Forget the previous update. I have moved the bio-graphics code to github. See http://github.com/jandot/bio-graphics. That should make it much easier to fork the code and get more input from other developers.

Tuesday, 4 September 2007

ActiveRecord - all vs all relationships

Modeling genetics or genomics data presents its own challenges. One of the issues is that the definitions of things change over time, while a database system can only be based on the scientific knowledge at the time it was designed. The prime example, of course, is the definition of a gene over the years. Before 1997, it was believed that the vast majority of genes encoded proteins. As a result, 'genes' tables in databases typically had columns to store information on the start and stop codons. However, it became clear that many genes do not encode proteins at all, forcing the remodeling of biological databases. But that's not the topic of this post.

The topic here is how relationships can be stored in a database. Suppose I want to store mapping data: markers mapped to linkage groups, clones mapped to physical maps, ... Markers are stored in a markers table, clones in a clones table, linkage groups in a linkage_groups table; you get the point.

The database that I'm working with at the moment (and only have read access to) stores the mappings in a mappings table, which includes the following columns:
  • map_type
  • map_id
  • map_name
  • mapped_object_type
  • mapped_object_id
  • mapped_object_name
So records could look like:
 map_type     | map_id | map_name     | mapped_object_type | mapped_object_id | mapped_object_name
--------------+--------+--------------+--------------------+------------------+--------------------
 chromosome   |      1 | chromosome_1 | marker             |                1 | marker_A
 chromosome   |      1 | chromosome_1 | marker             |                2 | marker_B
 physical_map |      2 | ctg1         | clone              |                1 | clone_A
 physical_map |      3 | ctg2         | clone              |                2 | clone_B


To make things worse, markers can also be mapped to clones. This means that a clone can be a mapped object (on a physical map) and a map (for markers) at the same time.
 map_type     | map_id | map_name     | mapped_object_type | mapped_object_id | mapped_object_name
--------------+--------+--------------+--------------------+------------------+--------------------
 clone        |      1 | clone_A      | marker             |                1 | marker_A


How can I model this in ActiveRecord? ActiveRecord has polymorphic associations, which could solve this relationship nightmare if only one side of each mapping (the map or the mapped object) were polymorphic. But as it happens, both are... Evan Weaver wrote the rails plugin has_many_polymorphs, which should do the trick (see here for a tutorial and background if it's unclear what I'm talking about). Unfortunately, as it is focussed on rails and not on ActiveRecord in general, it doesn't handle namespaces.

So here's what I've come up with:

module MyNameSpace
  class Mapping < ActiveRecord::Base
    # Relationships to feature-like things (the object being mapped)
    belongs_to :marker, :foreign_key => 'mapped_object_id', :conditions => ["mapped_object_type = 'marker'"]
    belongs_to :clone_as_feature, :class_name => 'Clone', :foreign_key => 'mapped_object_id', :conditions => ["mapped_object_type = 'clone'"]

    # Relationships to map-like things
    belongs_to :chromosome, :foreign_key => 'map_id', :conditions => ["map_type = 'chromosome'"]
    belongs_to :physical_map, :foreign_key => 'map_id', :conditions => ["map_type = 'physical_map'"]
    # A clone can also be a map, so it needs a second, differently named association:
    # declaring belongs_to :clone twice would silently replace the first declaration
    # (and a method called "clone" would also shadow Object#clone).
    belongs_to :clone_as_map, :class_name => 'Clone', :foreign_key => 'map_id', :conditions => ["map_type = 'clone'"]
  end

  class Marker < ActiveRecord::Base
    has_many :mappings_as_feature, :class_name => 'Mapping', :foreign_key => 'mapped_object_id', :conditions => "mapped_object_type = 'marker'"
    has_many :chromosomes, :through => :mappings_as_feature
    # the clones a marker is mapped to act as maps
    has_many :clones, :through => :mappings_as_feature, :source => :clone_as_map
  end

  class Chromosome < ActiveRecord::Base
    has_many :mappings_as_map, :class_name => 'Mapping', :foreign_key => 'map_id', :conditions => "map_type = 'chromosome'"
    has_many :markers, :through => :mappings_as_map
  end

  class PhysicalMap < ActiveRecord::Base
    has_many :mappings_as_map, :class_name => 'Mapping', :foreign_key => 'map_id', :conditions => "map_type = 'physical_map'"
    # the clones on a physical map are the mapped objects
    has_many :clones, :through => :mappings_as_map, :source => :clone_as_feature
  end

  class Clone < ActiveRecord::Base
    # Relationships where the clone is the feature
    has_many :mappings_as_feature, :class_name => 'Mapping', :foreign_key => 'mapped_object_id', :conditions => "mapped_object_type = 'clone'"
    has_many :physical_maps, :through => :mappings_as_feature

    # Relationships where the clone is the map
    has_many :mappings_as_map, :class_name => 'Mapping', :foreign_key => 'map_id', :conditions => "map_type = 'clone'"
    has_many :markers, :through => :mappings_as_map
  end
end


The key here is to distinguish between mappings_as_feature and mappings_as_map. A marker object can only have mappings where it acts as a feature, while a clone can have both mappings where it acts as a feature and mappings where it acts as a map.
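Stripped of ActiveRecord, the two kinds of mappings are just two different filters over the same rows. This plain-ruby sketch puts the example rows from the tables above into hashes (the helper names are made up) and selects clone_A's mappings as a map and as a feature. It also shows why the type condition matters: mapped_object_id 1 on its own matches three rows, because marker_A and clone_A both have id 1 in their own tables.

```ruby
# The example rows from the mappings tables above, as plain hashes.
MAPPINGS = [
  { map_type: 'chromosome', map_id: 1, map_name: 'chromosome_1',
    mapped_object_type: 'marker', mapped_object_id: 1, mapped_object_name: 'marker_A' },
  { map_type: 'chromosome', map_id: 1, map_name: 'chromosome_1',
    mapped_object_type: 'marker', mapped_object_id: 2, mapped_object_name: 'marker_B' },
  { map_type: 'physical_map', map_id: 2, map_name: 'ctg1',
    mapped_object_type: 'clone', mapped_object_id: 1, mapped_object_name: 'clone_A' },
  { map_type: 'physical_map', map_id: 3, map_name: 'ctg2',
    mapped_object_type: 'clone', mapped_object_id: 2, mapped_object_name: 'clone_B' },
  { map_type: 'clone', map_id: 1, map_name: 'clone_A',
    mapped_object_type: 'marker', mapped_object_id: 1, mapped_object_name: 'marker_A' }
].freeze

# Rows where the object is the map (what mappings_as_map selects).
def mappings_as_map(type, id)
  MAPPINGS.select { |m| m[:map_type] == type && m[:map_id] == id }
end

# Rows where the object is the mapped feature (what mappings_as_feature selects).
def mappings_as_feature(type, id)
  MAPPINGS.select { |m| m[:mapped_object_type] == type && m[:mapped_object_id] == id }
end

# clone_A (id 1) is a map for marker_A and is itself mapped onto ctg1.
p mappings_as_map('clone', 1).map { |m| m[:mapped_object_name] } # prints ["marker_A"]
p mappings_as_feature('clone', 1).map { |m| m[:map_name] }       # prints ["ctg1"]

# Without the type condition, id 1 matches rows for both marker_A and clone_A.
p MAPPINGS.count { |m| m[:mapped_object_id] == 1 }                # prints 3
```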

Using this code, it's now possible to do:

clone = MyNameSpace::Clone.find_by_name('clone_A')
puts clone.mappings_as_map.to_yaml
puts clone.mappings_as_feature.to_yaml
puts clone.markers.to_yaml
puts clone.physical_maps.to_yaml


Voila (until further notice...).

UPDATE: Pratik blogged about has_many_polymorphs and lists the generated associations here.