Class: Ensembl::Core::Transcript
- Inherits: DBConnection
- Inheritance chain: Object, ActiveRecord::Base, DBRegistry::Base, DBConnection, Ensembl::Core::Transcript
- Includes: Sliceable
- Defined in: lib/bio-ensembl/core/transcript.rb
Overview
The Transcript class provides an interface to the transcript table. This table contains mappings of transcripts for a Gene to a SeqRegion.
This class uses ActiveRecord to access data in the Ensembl database. See the general documentation of the Ensembl module for more information on what this means and what methods are available.
This class includes the mixin Sliceable, which means that it is mapped to a SeqRegion object and a Slice can be created for objects of this class. See Sliceable and Slice for more information.
Class Method Summary
- .find_all_by_stable_id(stable_id) ⇒ Object
  The Transcript.find_all_by_stable_id class method returns an array of transcripts with the given stable_id.
- .find_by_stable_id(stable_id) ⇒ Object
  The Transcript.find_by_stable_id class method fetches a Transcript object based on its stable ID (i.e. the “ENST” accession number).
Instance Method Summary
- #cdna2genomic(pos) ⇒ Integer
  The Transcript#cdna2genomic method converts cDNA coordinates to genomic coordinates for this transcript.
- #cds2genomic(pos) ⇒ Integer
  The Transcript#cds2genomic method converts CDS coordinates to genomic coordinates for this transcript.
- #cds_seq ⇒ Object
  The Transcript#cds_seq method returns the coding sequence of the transcript, i.e. the concatenated sequence of all exons minus the UTRs.
- #coding_region_cdna_end ⇒ Object
  The Transcript#coding_region_cdna_end method returns the stop position of the CDS in cDNA coordinates.
- #coding_region_cdna_start ⇒ Object
  The Transcript#coding_region_cdna_start method returns the start position of the CDS in cDNA coordinates.
- #coding_region_genomic_end ⇒ Object
  The Transcript#coding_region_genomic_end method returns the stop position of the CDS in genomic coordinates.
- #coding_region_genomic_start ⇒ Object
  The Transcript#coding_region_genomic_start method returns the start position of the CDS in genomic coordinates.
- #display_label ⇒ Object (also: #display_name, #label, #name)
  The Transcript#display_label method returns the default name of the transcript.
- #exon_for_cdna_position(pos) ⇒ Object
  The Transcript#exon_for_cdna_position method identifies the exon that covers a given position of the cDNA.
- #exon_for_genomic_position(pos) ⇒ Object
  The Transcript#exon_for_genomic_position method identifies the exon that covers a given genomic position.
- #five_prime_utr_seq ⇒ Object
  The Transcript#five_prime_utr_seq method returns the sequence of the 5’UTR of the transcript.
- #genomic2cdna(pos) ⇒ Integer
  The Transcript#genomic2cdna method converts genomic coordinates to cDNA coordinates for this transcript.
- #genomic2cds(pos) ⇒ Integer
  The Transcript#genomic2cds method converts genomic coordinates to CDS coordinates for this transcript.
- #genomic2pep(pos) ⇒ Integer
  The Transcript#genomic2pep method converts genomic coordinates to peptide coordinates for this transcript.
- #introns ⇒ Array<Intron>
  The Transcript#introns method returns the introns for this transcript.
- #pep2genomic(pos) ⇒ Integer
  The Transcript#pep2genomic method converts peptide coordinates to genomic coordinates for this transcript.
- #protein_seq ⇒ Object
  The Transcript#protein_seq method returns the sequence of the protein of the transcript.
- #seq ⇒ Object
  The Transcript#seq method returns the full sequence of all concatenated exons.
- #stable_id ⇒ String
  The Transcript#stable_id method returns the stable ID of the transcript.
- #three_prime_utr_seq ⇒ Object
  The Transcript#three_prime_utr_seq method returns the sequence of the 3’UTR of the transcript.
Methods included from Sliceable
#length, #project, #slice, #start, #stop, #strand, #transform
Methods inherited from DBConnection
connect, ensemblgenomes_connect
Methods inherited from DBRegistry::Base
generic_connect, get_info, get_name_from_db
Class Method Details
.find_all_by_stable_id(stable_id) ⇒ Object
The Transcript.find_all_by_stable_id class method returns an array of transcripts with the given stable_id. If none are found, an empty array is returned.
    # File 'lib/bio-ensembl/core/transcript.rb', line 142
    def self.find_all_by_stable_id(stable_id)
      answer = Array.new
      transcript_stable_id_objects = Ensembl::Core::TranscriptStableId.find_all_by_stable_id(stable_id)
      transcript_stable_id_objects.each do |transcript_stable_id_object|
        answer.push(Ensembl::Core::Transcript.find(transcript_stable_id_object.transcript_id))
      end
      return answer
    end
.find_by_stable_id(stable_id) ⇒ Object
The Transcript.find_by_stable_id class method fetches a Transcript object based on its stable ID (i.e. the “ENST” accession number). If the stable ID is not found, it returns nil.
    # File 'lib/bio-ensembl/core/transcript.rb', line 154
    def self.find_by_stable_id(stable_id)
      all = self.find_all_by_stable_id(stable_id)
      if all.length == 0
        return nil
      else
        return all[0]
      end
    end
Instance Method Details
#cdna2genomic(pos) ⇒ Integer
The Transcript#cdna2genomic method converts cDNA coordinates to genomic coordinates for this transcript.
    # File 'lib/bio-ensembl/core/transcript.rb', line 318
    def cdna2genomic(pos)
      #FIXME: Still have to check for when pos is outside of scope of cDNA.
      # Identify the exon we're looking at.
      exon_with_target = self.exon_for_cdna_position(pos)

      accumulated_position = 0
      ex = self.exons.sort_by {|e| e.seq_region_start}
      ex.reverse! if self.strand == -1
      ex.each do |exon|
        if exon == exon_with_target
          length_to_be_taken_from_exon = pos - (accumulated_position + 1)
          if self.strand == -1
            return exon.seq_region_end - length_to_be_taken_from_exon
          else
            return exon.seq_region_start + length_to_be_taken_from_exon
          end
        else
          accumulated_position += exon.length
        end
      end
    end
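The idea of walking the exons in transcript order and accumulating their lengths can be tried without a database. The following is a minimal standalone sketch, not the library's implementation: ToyExon and toy_cdna2genomic are hypothetical names, and cDNA positions are assumed to be 1-based with inclusive exon coordinates.

```ruby
# ToyExon is a hypothetical stand-in for an exon record, not an Ensembl class.
ToyExon = Struct.new(:seq_region_start, :seq_region_end) do
  # Number of bases covered by the exon (coordinates are inclusive).
  def bases
    seq_region_end - seq_region_start + 1
  end
end

# Map a 1-based cDNA position to a genomic position.
# exons: array of ToyExon; strand: 1 (forward) or -1 (reverse).
def toy_cdna2genomic(exons, strand, pos)
  raise ArgumentError, "Position outside of cDNA scope" if pos < 1

  # Transcript order: left to right on the forward strand,
  # right to left on the reverse strand.
  ordered = exons.sort_by(&:seq_region_start)
  ordered.reverse! if strand == -1

  consumed = 0 # cDNA bases covered by the exons already passed
  ordered.each do |exon|
    if pos <= consumed + exon.bases
      offset = pos - consumed - 1 # 0-based offset into this exon
      return strand == -1 ? exon.seq_region_end - offset : exon.seq_region_start + offset
    end
    consumed += exon.bases
  end
  raise ArgumentError, "Position outside of cDNA scope"
end

toy_exons = [ToyExon.new(100, 109), ToyExon.new(200, 204)]
puts toy_cdna2genomic(toy_exons, 1, 11)  # => 200, first base of the second exon
puts toy_cdna2genomic(toy_exons, -1, 1)  # => 204, reverse strand starts at the rightmost base
```

On the reverse strand the offset is subtracted from the exon end, which mirrors the strand check in the listing above.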
#cds2genomic(pos) ⇒ Integer
The Transcript#cds2genomic method converts CDS coordinates to genomic coordinates for this transcript.
    # File 'lib/bio-ensembl/core/transcript.rb', line 345
    def cds2genomic(pos)
      return self.cdna2genomic(pos + self.coding_region_cdna_start)
    end
#cds_seq ⇒ Object
The Transcript#cds_seq method returns the coding sequence of the transcript, i.e. the concatenated sequence of all exons minus the UTRs.
    # File 'lib/bio-ensembl/core/transcript.rb', line 189
    def cds_seq
      cds_length = self.coding_region_cdna_end - self.coding_region_cdna_start + 1

      return self.seq[(self.coding_region_cdna_start - 1), cds_length]
    end
#coding_region_cdna_end ⇒ Object
The Transcript#coding_region_cdna_end method returns the stop position of the CDS in cDNA coordinates. Note that, in contrast to Transcript#coding_region_genomic_end, the CDS stop position is always at the border of the 3’UTR. So for transcripts on the reverse strand, the CDS stop position lies “left” of the CDS start position on the genome, even though it comes last in cDNA coordinates.
    # File 'lib/bio-ensembl/core/transcript.rb', line 270
    def coding_region_cdna_end
      answer = 0

      self.exons.each do |exon|
        if exon == self.translation.end_exon
          answer += self.translation.seq_end
          return answer
        else
          answer += exon.length
        end
      end
    end
#coding_region_cdna_start ⇒ Object
The Transcript#coding_region_cdna_start method returns the start position of the CDS in cDNA coordinates. Note that, in contrast to Transcript#coding_region_genomic_start, the CDS start position is always at the border of the 5’UTR. So for transcripts on the reverse strand, the CDS start position lies “right” of the CDS stop position on the genome, even though it comes first in cDNA coordinates.
    # File 'lib/bio-ensembl/core/transcript.rb', line 250
    def coding_region_cdna_start
      answer = 0

      self.exons.each do |exon|
        if exon == self.translation.start_exon
          answer += self.translation.seq_start
          return answer
        else
          answer += exon.length
        end
      end
    end
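Both cDNA boundary methods follow the same pattern: add up the lengths of the exons that precede the boundary exon, then add the offset stored on the translation. A minimal sketch of that arithmetic, assuming the exons are already in transcript order and using plain integers in place of database records (toy_cdna_boundary is a hypothetical helper):

```ruby
# exon_lengths:    lengths of the exons in transcript order
# boundary_index:  0-based index of the exon holding the CDS start (or end)
# offset_in_exon:  1-based position within that exon, playing the role
#                  of translation.seq_start (or translation.seq_end)
def toy_cdna_boundary(exon_lengths, boundary_index, offset_in_exon)
  exon_lengths.take(boundary_index).sum + offset_in_exon
end

lengths = [10, 5, 8]
puts toy_cdna_boundary(lengths, 1, 3)  # => 13, CDS starts at base 3 of the second exon
puts toy_cdna_boundary(lengths, 2, 4)  # => 19, CDS ends at base 4 of the third exon
```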
#coding_region_genomic_end ⇒ Object
The Transcript#coding_region_genomic_end method returns the stop position of the CDS in genomic coordinates. Note that, in contrast to Transcript#coding_region_cdna_end, the CDS stop position is always “right” of the start position. So for transcripts on the reverse strand, the CDS stop position is at the border of the 5’UTR instead of the 3’UTR.
    # File 'lib/bio-ensembl/core/transcript.rb', line 235
    def coding_region_genomic_end
      strand = self.translation.start_exon.seq_region_strand
      if strand == 1
        return self.translation.end_exon.seq_region_start + ( self.translation.seq_end - 1 )
      else
        return self.translation.start_exon.seq_region_end - ( self.translation.seq_start - 1 )
      end
    end
#coding_region_genomic_start ⇒ Object
The Transcript#coding_region_genomic_start method returns the start position of the CDS in genomic coordinates. Note that, in contrast to Transcript#coding_region_cdna_start, the CDS start position is always “left” of the end position. So for transcripts on the reverse strand, the CDS start position is at the border of the 3’UTR instead of the 5’UTR.
    # File 'lib/bio-ensembl/core/transcript.rb', line 220
    def coding_region_genomic_start
      strand = self.translation.start_exon.seq_region_strand
      if strand == 1
        return self.translation.start_exon.seq_region_start + ( self.translation.seq_start - 1 )
      else
        return self.translation.end_exon.seq_region_end - ( self.translation.seq_end - 1 )
      end
    end
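The strand swap described for the two genomic boundary methods can be checked with toy numbers. This is a sketch under assumptions: ToyCodingExon and toy_coding_region_genomic are hypothetical, and the arguments stand in for translation.start_exon, translation.seq_start, translation.end_exon and translation.seq_end.

```ruby
# Hypothetical stand-in for an exon with inclusive genomic coordinates.
ToyCodingExon = Struct.new(:seq_region_start, :seq_region_end)

# Returns [genomic_start, genomic_end] of the CDS, with start <= end
# regardless of strand.
def toy_coding_region_genomic(start_exon, seq_start, end_exon, seq_end, strand)
  if strand == 1
    [start_exon.seq_region_start + (seq_start - 1),
     end_exon.seq_region_start + (seq_end - 1)]
  else
    # Reverse strand: the translation end gives the leftmost coding base,
    # the translation start gives the rightmost one.
    [end_exon.seq_region_end - (seq_end - 1),
     start_exon.seq_region_end - (seq_start - 1)]
  end
end

left  = ToyCodingExon.new(100, 109)
right = ToyCodingExon.new(200, 204)

p toy_coding_region_genomic(left, 3, right, 4, 1)   # => [102, 203]
p toy_coding_region_genomic(right, 3, left, 4, -1)  # => [106, 202]
```

In the reverse-strand call the start exon is the rightmost exon, because it comes first in transcript order.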
#display_label ⇒ Object Also known as: display_name, label, name
The Transcript#display_label method returns the default name of the transcript.
    # File 'lib/bio-ensembl/core/transcript.rb', line 132
    def display_label
      return Xref.find(self.display_xref_id).display_label
    end
#exon_for_cdna_position(pos) ⇒ Object
The Transcript#exon_for_cdna_position method identifies the exon that covers a given position of the cDNA.
    # File 'lib/bio-ensembl/core/transcript.rb', line 300
    def exon_for_cdna_position(pos)
      # FIXME: Still have to check for when pos is outside of scope of cDNA.
      accumulated_exon_length = 0

      self.exons.each do |exon|
        accumulated_exon_length += exon.length
        if accumulated_exon_length > pos
          return exon
        end
      end
      raise RuntimeError, "Position outside of cDNA scope"
    end
#exon_for_genomic_position(pos) ⇒ Object
The Transcript#exon_for_genomic_position method identifies the exon that covers a given genomic position. It returns the exon object, or nil if the position falls in an intron.
    # File 'lib/bio-ensembl/core/transcript.rb', line 286
    def exon_for_genomic_position(pos)
      if pos < self.seq_region_start or pos > self.seq_region_end
        raise RuntimeError, "Position has to be within transcript"
      end
      self.exons.each do |exon|
        if exon.start <= pos and exon.stop >= pos
          return exon
        end
      end
      return nil
    end
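The three possible outcomes (an exon, nil for an intron, an error outside the transcript) can be reproduced on toy data. The sketch below is an illustration only: ToySpan and toy_exon_for_genomic_position are hypothetical, and the transcript bounds are assumed to be the outermost exon coordinates.

```ruby
# Hypothetical stand-in for an exon with a name and inclusive coordinates.
ToySpan = Struct.new(:name, :start, :stop)

def toy_exon_for_genomic_position(exons, pos)
  transcript_start = exons.map(&:start).min
  transcript_stop  = exons.map(&:stop).max
  if pos < transcript_start || pos > transcript_stop
    raise ArgumentError, "Position has to be within transcript"
  end
  # Enumerable#find returns nil when no exon covers the position (intron).
  exons.find { |exon| exon.start <= pos && pos <= exon.stop }
end

spans = [ToySpan.new("e1", 100, 109), ToySpan.new("e2", 200, 204)]
puts toy_exon_for_genomic_position(spans, 105).name     # => e1
p toy_exon_for_genomic_position(spans, 150)             # => nil
```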
#five_prime_utr_seq ⇒ Object
The Transcript#five_prime_utr_seq method returns the sequence of the 5’UTR of the transcript.
    # File 'lib/bio-ensembl/core/transcript.rb', line 197
    def five_prime_utr_seq
      return self.seq[0, self.coding_region_cdna_start - 1]
    end
#genomic2cdna(pos) ⇒ Integer
The Transcript#genomic2cdna method converts genomic coordinates to cDNA coordinates for this transcript.
    # File 'lib/bio-ensembl/core/transcript.rb', line 363
    def genomic2cdna(pos)
      #FIXME: Still have to check for when pos is outside of scope of cDNA.
      # Identify the exon we're looking at.
      exon_with_target = self.exon_for_genomic_position(pos)

      accumulated_position = 0
      ex = self.exons.sort_by {|e| e.seq_region_start}
      ex.reverse! if self.strand == -1
      ex.each do |exon|
        if exon.stable_id == exon_with_target.stable_id
          if self.strand == 1
            accumulated_position += ( pos - exon.start) +1
          else
            accumulated_position += ( exon.stop - pos ) +1
          end
          return accumulated_position
        else
          accumulated_position += exon.length
        end
      end
      return RuntimeError, "Position outside of cDNA scope"
    end
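The reverse mapping, from a genomic position back to a cDNA position, can be sketched the same way. This is a standalone illustration rather than the library code: ToyRange and toy_genomic2cdna are hypothetical, cDNA positions are assumed to be 1-based, and the sketch returns nil for positions not covered by any exon, which is a choice made here for simplicity.

```ruby
# Hypothetical stand-in for an exon with inclusive genomic coordinates.
ToyRange = Struct.new(:start, :stop) do
  def bases
    stop - start + 1
  end
end

# Map a genomic position to a 1-based cDNA position, or nil if the
# position is not covered by any exon.
def toy_genomic2cdna(exons, strand, pos)
  ordered = exons.sort_by(&:start)
  ordered.reverse! if strand == -1

  consumed = 0 # cDNA bases covered by the exons already passed
  ordered.each do |exon|
    if exon.start <= pos && pos <= exon.stop
      within = strand == 1 ? pos - exon.start : exon.stop - pos
      return consumed + within + 1
    end
    consumed += exon.bases
  end
  nil
end

toy_ranges = [ToyRange.new(100, 109), ToyRange.new(200, 204)]
puts toy_genomic2cdna(toy_ranges, 1, 200)   # => 11
puts toy_genomic2cdna(toy_ranges, -1, 109)  # => 6
```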
#genomic2cds(pos) ⇒ Integer
The Transcript#genomic2cds method converts genomic coordinates to CDS coordinates for this transcript.
    # File 'lib/bio-ensembl/core/transcript.rb', line 391
    def genomic2cds(pos)
      return self.genomic2cdna(pos) - self.coding_region_cdna_start
    end
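Read side by side, the genomic2cds and cds2genomic listings both shift by coding_region_cdna_start without a further adjustment, which suggests that CDS positions are counted from 0 at the first coding base. The sketch below only illustrates that arithmetic; toy_cdna2cds and toy_cds2cdna are hypothetical helpers, not library methods.

```ruby
# cDNA position -> CDS position, mirroring the subtraction in genomic2cds.
def toy_cdna2cds(cdna_pos, coding_region_cdna_start)
  cdna_pos - coding_region_cdna_start
end

# CDS position -> cDNA position, mirroring the addition in cds2genomic.
def toy_cds2cdna(cds_pos, coding_region_cdna_start)
  cds_pos + coding_region_cdna_start
end

cdna_start = 4 # toy value: the CDS begins at cDNA position 4
puts toy_cdna2cds(4, cdna_start)                            # => 0
puts toy_cds2cdna(toy_cdna2cds(9, cdna_start), cdna_start)  # => 9
```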
#genomic2pep(pos) ⇒ Integer
The Transcript#genomic2pep method converts genomic coordinates to peptide coordinates for this transcript. It is not implemented yet and raises NotImplementedError.
Arguments:
- pos: position on the chromosome (required)

    # File 'lib/bio-ensembl/core/transcript.rb', line 403
    def genomic2pep(pos)
      raise NotImplementedError
    end
#introns ⇒ Array<Intron>
The Transcript#introns method returns the introns for this transcript.
    # File 'lib/bio-ensembl/core/transcript.rb', line 111
    def introns
      if @introns.nil?
        @introns = Array.new
        if self.exons.length > 1
          self.exons.each_with_index do |exon, index|
            next if index == 0
            @introns.push(Intron.new(self.exons[index - 1], exon))
          end
        end
      end
      return @introns
    end
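Each intron is built from a pair of neighbouring exons, so a transcript with n exons yields n - 1 introns. The same pairing can be sketched with Enumerable#each_cons; ToyIntron and toy_introns are hypothetical names, and symbols stand in for exon objects.

```ruby
# Hypothetical stand-in for an intron flanked by two exons.
ToyIntron = Struct.new(:previous_exon, :next_exon)

# Pair every exon with its successor in transcript order.
def toy_introns(exons)
  exons.each_cons(2).map { |upstream, downstream| ToyIntron.new(upstream, downstream) }
end

toy_introns([:exon1, :exon2, :exon3]).each do |intron|
  puts "#{intron.previous_exon} -> #{intron.next_exon}"
end
# prints:
# exon1 -> exon2
# exon2 -> exon3
```

A single-exon transcript produces an empty array, matching the length check in the listing above.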
#pep2genomic(pos) ⇒ Integer
The Transcript#pep2genomic method converts peptide coordinates to genomic coordinates for this transcript. It is not implemented yet and raises NotImplementedError.
    # File 'lib/bio-ensembl/core/transcript.rb', line 354
    def pep2genomic(pos)
      raise NotImplementedError
    end
#protein_seq ⇒ Object
The Transcript#protein_seq method returns the sequence of the protein of the transcript.
    # File 'lib/bio-ensembl/core/transcript.rb', line 209
    def protein_seq
      return Bio::Sequence::NA.new(self.cds_seq).translate.seq
    end
#seq ⇒ Object
The Transcript#seq method returns the full sequence of all concatenated exons.
    # File 'lib/bio-ensembl/core/transcript.rb', line 177
    def seq
      if @seq.nil?
        @seq = ''
        self.exons.each do |exon|
          @seq += exon.seq
        end
      end
      return @seq
    end
#stable_id ⇒ String
The Transcript#stable_id method returns the stable ID of the transcript.
    # File 'lib/bio-ensembl/core/transcript.rb', line 127
    def stable_id
      return self.transcript_stable_id.stable_id
    end
#three_prime_utr_seq ⇒ Object
The Transcript#three_prime_utr_seq method returns the sequence of the 3’UTR of the transcript.
    # File 'lib/bio-ensembl/core/transcript.rb', line 203
    def three_prime_utr_seq
      return self.seq[self.coding_region_cdna_end..-1]
    end
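Taken together, five_prime_utr_seq, cds_seq and three_prime_utr_seq cut the cDNA into three consecutive pieces using the 1-based CDS boundaries. A small sketch of the same string slicing on a toy sequence; toy_split_cdna is a hypothetical helper and the sequence is made up.

```ruby
# Split a cDNA string into [5'UTR, CDS, 3'UTR] given the 1-based,
# inclusive cDNA coordinates of the CDS start and end.
def toy_split_cdna(seq, cdna_start, cdna_end)
  five_prime_utr  = seq[0, cdna_start - 1]
  cds             = seq[cdna_start - 1, cdna_end - cdna_start + 1]
  three_prime_utr = seq[cdna_end..-1]
  [five_prime_utr, cds, three_prime_utr]
end

# Toy cDNA: 5'UTR "GGG", CDS "ATGAAATAG", 3'UTR "CC".
p toy_split_cdna("GGGATGAAATAGCC", 4, 12)  # => ["GGG", "ATGAAATAG", "CC"]
```

Joining the three pieces gives back the full sequence, which is a quick sanity check on the boundaries.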