Module: Ensembl
Defined in:
lib/bio-ensembl.rb,
lib/bio-ensembl/core/slice.rb,
lib/bio-ensembl/core/project.rb,
lib/bio-ensembl/core/transform.rb,
lib/bio-ensembl/core/collection.rb,
lib/bio-ensembl/core/transcript.rb,
lib/bio-ensembl/core/activerecord.rb,
lib/bio-ensembl/variation/activerecord.rb,
lib/bio-ensembl/variation/variation_feature.rb,
lib/bio-ensembl/variation/variation_feature62.rb,
lib/bio-ensembl/db_connection.rb
Overview
What is it?
The Ensembl module provides an API to the Ensembl databases stored at ensembldb.ensembl.org. This is the same information that is available from www.ensembl.org.
The Ensembl::Core module mainly covers sequences and annotations. The Ensembl::Variation module covers variations (e.g. SNPs). The Ensembl::Compara module covers comparative mappings between species.
ActiveRecord
The Ensembl API provides a Ruby interface to the Ensembl MySQL databases at ensembldb.ensembl.org. Most of the API uses ActiveRecord to retrieve data from those databases. In general, each table is described by a class with the same name: the coord_system table is covered by the CoordSystem class, the seq_region table is covered by the SeqRegion class, etc. As a result, accessors are available for all columns in each table. For example, the seq_region table has the following columns: seq_region_id, name, coord_system_id and length. Through ActiveRecord, these column names become available as attributes of SeqRegion objects:
puts my_seq_region.seq_region_id
puts my_seq_region.name
puts my_seq_region.coord_system_id
puts my_seq_region.length.to_s
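The table-to-class naming convention described above can be sketched in plain Ruby, without a database connection. The helper table_to_class_name is hypothetical and not part of the Ensembl API; it only illustrates how a snake_case table name maps to the CamelCase class name in the general case.

```ruby
# Hypothetical helper (not part of the Ensembl API): derive the class name
# that would conventionally describe a given table.
def table_to_class_name(table_name)
  table_name.split('_').map(&:capitalize).join
end

%w[coord_system seq_region].each do |table|
  puts "#{table} -> #{table_to_class_name(table)}"
end
```

Since the text says the convention holds "in general", treat the result as a first guess and check it against the class list under Ensembl::Core.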
ActiveRecord makes it easy to extract data from those tables using the collection of #find methods. There are three types of #find methods (e.g. for the CoordSystem class):
- find based on primary key in table:
my_coord_system = CoordSystem.find(5)
- find_by_sql (returns an array of matching records):
my_coord_systems = CoordSystem.find_by_sql("SELECT * FROM coord_system WHERE name = 'chromosome'")
- find_by_<insert_your_column_name_here>:
my_coord_system1 = CoordSystem.find_by_name('chromosome')
my_coord_system2 = CoordSystem.find_by_rank(3)
To find out which find_by_<column> methods are available, you can list the column names using the column_names class method:
puts Ensembl::Core::CoordSystem.column_names.join("\t")
For more information on the find methods, see ar.rubyonrails.org/classes/ActiveRecord/Base.html#M000344
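To make the link between column names and find_by_<column> methods concrete, here is a toy model in plain Ruby. It is a sketch only: ToyTable and its rows are invented for illustration, and it does not reproduce ActiveRecord's actual implementation. It shows one way a class can answer find_by_<column> for every column it knows about, using method_missing.

```ruby
# Toy stand-in for a table-backed class. NOT ActiveRecord's implementation;
# ToyTable and the rows below are made up for illustration.
class ToyTable
  attr_reader :column_names

  def initialize(column_names, rows)
    @column_names = column_names
    @rows = rows
  end

  # Answer find_by_<column>(value) for any known column by returning
  # the first row whose column equals the value (or nil if none match).
  def method_missing(name, *args)
    column = finder_column(name)
    return super unless column

    @rows.find { |row| row[column] == args.first }
  end

  def respond_to_missing?(name, include_private = false)
    !finder_column(name).nil? || super
  end

  private

  # Extract <column> from "find_by_<column>" if it is a known column.
  def finder_column(name)
    column = name.to_s[/\Afind_by_(\w+)\z/, 1]
    column if column && @column_names.include?(column)
  end
end

coord_systems = ToyTable.new(
  %w[coord_system_id name rank],
  [
    { 'coord_system_id' => 1, 'name' => 'chromosome', 'rank' => 1 },
    { 'coord_system_id' => 2, 'name' => 'contig', 'rank' => 3 }
  ]
)

puts coord_systems.column_names.join("\t")
puts coord_systems.find_by_name('chromosome')['coord_system_id']
puts coord_systems.find_by_rank(3)['name']
```

In the toy model, a finder for an unknown column (for example find_by_bogus) raises NoMethodError, which is why listing column_names first is a useful habit.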
The relationships between different tables are accessible through the classes as well. For example, to loop over all seq_regions belonging to a coord_system (a coord_system “has many” seq_regions):
chr_coord_system = CoordSystem.find_by_name('chromosome')
chr_coord_system.seq_regions.each do |seq_region|
puts seq_region.name
end
Of course, you can go the other way as well (a seq_region “belongs to” a coord_system):
chr4 = SeqRegion.find_by_name('4')
puts chr4.coord_system.name #--> 'chromosome'
To find out what relationships exist for a given class, you can use the #reflect_on_all_associations class method:
puts SeqRegion.reflect_on_all_associations(:has_many).collect{|a| a.name.to_s}.join("\n")
puts SeqRegion.reflect_on_all_associations(:has_one).collect{|a| a.name.to_s}.join("\n")
puts SeqRegion.reflect_on_all_associations(:belongs_to).collect{|a| a.name.to_s}.join("\n")
Defined Under Namespace
Modules: Core, DBRegistry, Variation
Classes: DummyDBConnection, Session