Class: Ensembl::Core::Transcript

Inherits: DBConnection

Inheritance chain (root first):

Object
ActiveRecord::Base
DBRegistry::Base
DBConnection
Ensembl::Core::Transcript

Includes: Sliceable

Defined in: lib/bio-ensembl/core/transcript.rb

Overview

The Transcript class provides an interface to the transcript table. This table contains mappings of transcripts for a Gene to a SeqRegion.

This class uses ActiveRecord to access data in the Ensembl database. See the general documentation of the Ensembl module for more information on what this means and what methods are available.

This class includes the mixin Sliceable, which means that it is mapped to a SeqRegion object and a Slice can be created for objects of this class. See Sliceable and Slice for more information.

Examples:

#TODO

Class Method Summary

.find_all_by_stable_id(stable_id) ⇒ Object

The Transcript.find_all_by_stable_id class method returns an array of transcripts with the given stable_id.
.find_by_stable_id(stable_id) ⇒ Object

The Transcript.find_by_stable_id class method fetches a Transcript object based on its stable ID (i.e. the “ENST” accession number).

Instance Method Summary

#cdna2genomic(pos) ⇒ Integer

The Transcript#cdna2genomic method converts cDNA coordinates to genomic coordinates for this transcript.
#cds2genomic(pos) ⇒ Integer

The Transcript#cds2genomic method converts CDS coordinates to genomic coordinates for this transcript.
#cds_seq ⇒ Object

The Transcript#cds_seq method returns the coding sequence of the transcript, i.e. the concatenated sequence of all exons minus the UTRs.
#coding_region_cdna_end ⇒ Object

The Transcript#coding_region_cdna_end returns the stop position of the CDS in cDNA coordinates.
#coding_region_cdna_start ⇒ Object

The Transcript#coding_region_cdna_start returns the start position of the CDS in cDNA coordinates.
#coding_region_genomic_end ⇒ Object

The Transcript#coding_region_genomic_end returns the stop position of the CDS in genomic coordinates.
#coding_region_genomic_start ⇒ Object

The Transcript#coding_region_genomic_start returns the start position of the CDS in genomic coordinates.
#display_label ⇒ Object (also: #display_name, #label, #name)

The Transcript#display_label method returns the default name of the transcript.
#exon_for_cdna_position(pos) ⇒ Object

The Transcript#exon_for_cdna_position method identifies the exon that covers a given position of the cDNA.
#exon_for_genomic_position(pos) ⇒ Object

The Transcript#exon_for_genomic_position method identifies the exon that covers a given genomic position.
#five_prime_utr_seq ⇒ Object

The Transcript#five_prime_utr_seq method returns the sequence of the 5’UTR of the transcript.
#genomic2cdna(pos) ⇒ Integer

The Transcript#genomic2cdna method converts genomic coordinates to cDNA coordinates for this transcript.
#genomic2cds(pos) ⇒ Integer

The Transcript#genomic2cds method converts genomic coordinates to CDS coordinates for this transcript.
#genomic2pep(pos) ⇒ Integer

The Transcript#genomic2pep method converts genomic coordinates to peptide coordinates for this transcript.
#introns ⇒ Array<Intron>

The Transcript#introns method returns the introns for this transcript.
#pep2genomic(pos) ⇒ Integer

The Transcript#pep2genomic method converts peptide coordinates to genomic coordinates for this transcript.
#protein_seq ⇒ Object

The Transcript#protein_seq method returns the sequence of the protein of the transcript.
#seq ⇒ Object

The Transcript#seq method returns the full sequence of all concatenated exons.
#stable_id ⇒ String

The Transcript#stable_id method returns the stable ID of the transcript.
#three_prime_utr_seq ⇒ Object

The Transcript#three_prime_utr_seq method returns the sequence of the 3’UTR of the transcript.

Class Method Details

.find_all_by_stable_id(stable_id) ⇒ `Object`

The Transcript.find_all_by_stable_id class method returns an array of transcripts with the given stable_id. If none are found, an empty array is returned.

# File 'lib/bio-ensembl/core/transcript.rb', line 142

def self.find_all_by_stable_id(stable_id)
  answer = Array.new
  transcript_stable_id_objects = Ensembl::Core::TranscriptStableId.find_all_by_stable_id(stable_id)
  transcript_stable_id_objects.each do |transcript_stable_id_object|
    answer.push(Ensembl::Core::Transcript.find(transcript_stable_id_object.transcript_id))
  end

  return answer
end

.find_by_stable_id(stable_id) ⇒ `Object`

The Transcript.find_by_stable_id class method fetches a Transcript object based on its stable ID (i.e. the “ENST” accession number). If no transcript has that stable ID, it returns nil.

# File 'lib/bio-ensembl/core/transcript.rb', line 154

def self.find_by_stable_id(stable_id)
  all = self.find_all_by_stable_id(stable_id)
  if all.length == 0
    return nil
  else
    return all[0]
  end
end
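
The contract of the two finders (an array that may be empty versus a single object or nil) can be sketched without a database. This is a minimal illustration only: `ToyTranscript`, `TOY_TRANSCRIPTS`, the `toy_` methods and the stable IDs are hypothetical stand-ins, not part of the library.

```ruby
# Hypothetical in-memory stand-in for the transcript and stable ID tables.
ToyTranscript = Struct.new(:id, :stable_id)

TOY_TRANSCRIPTS = [
  ToyTranscript.new(1, "ENST_TOY_A"),
  ToyTranscript.new(2, "ENST_TOY_B")
].freeze

# Returns every record with the given stable ID, or an empty array.
def toy_find_all_by_stable_id(stable_id)
  TOY_TRANSCRIPTS.select { |transcript| transcript.stable_id == stable_id }
end

# Returns the first match, or nil when nothing matches (Array#first on []).
def toy_find_by_stable_id(stable_id)
  toy_find_all_by_stable_id(stable_id).first
end

p toy_find_by_stable_id("ENST_TOY_B").id       # 2
p toy_find_by_stable_id("ENST_MISSING")        # nil
p toy_find_all_by_stable_id("ENST_MISSING")    # []
```

Callers that use the single-object finder should therefore check for nil before calling further methods on the result.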

Instance Method Details

#cdna2genomic(pos) ⇒ `Integer`

The Transcript#cdna2genomic method converts cDNA coordinates to genomic coordinates for this transcript.

Parameters:

pos (Integer) —

Position on the cDNA

Returns:

(Integer) —

Position on the genomic DNA

# File 'lib/bio-ensembl/core/transcript.rb', line 318

def cdna2genomic(pos)
  #FIXME: Still have to check for when pos is outside of scope of cDNA.
  # Identify the exon we're looking at.
  exon_with_target = self.exon_for_cdna_position(pos)
  
  accumulated_position = 0
  ex = self.exons.sort_by {|e| e.seq_region_start}
  ex.reverse! if self.strand == -1
  ex.each do |exon|  
    if exon == exon_with_target
      length_to_be_taken_from_exon = pos - (accumulated_position + 1)
      if self.strand == -1
        return exon.seq_region_end - length_to_be_taken_from_exon
      else
        return exon.seq_region_start + length_to_be_taken_from_exon
      end
    else
      accumulated_position += exon.length 
    end
  end
end
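
The exon walk above can be tried out on toy data. The following is a sketch under stated assumptions, not the library code: `ToyExon` and `toy_cdna2genomic` are hypothetical names, cDNA positions are taken to be 1-based, and the coordinates are invented.

```ruby
# Toy exon with inclusive genomic coordinates (hypothetical stand-in for Exon).
ToyExon = Struct.new(:seq_region_start, :seq_region_end) do
  def length
    seq_region_end - seq_region_start + 1
  end
end

# Maps a 1-based cDNA position to a genomic position.
def toy_cdna2genomic(exons, strand, pos)
  raise ArgumentError, "cDNA positions are 1-based" if pos < 1

  # Walk the exons in transcript order: ascending on the forward strand,
  # descending on the reverse strand.
  ordered = exons.sort_by(&:seq_region_start)
  ordered.reverse! if strand == -1

  consumed = 0 # cDNA bases covered by the exons already passed
  ordered.each do |exon|
    if pos <= consumed + exon.length
      offset = pos - consumed - 1
      return strand == -1 ? exon.seq_region_end - offset : exon.seq_region_start + offset
    end
    consumed += exon.length
  end
  raise ArgumentError, "position #{pos} lies outside the cDNA"
end

toy_exons = [ToyExon.new(100, 109), ToyExon.new(200, 204)]
puts toy_cdna2genomic(toy_exons, 1, 10)  # 109, last base of the first exon
puts toy_cdna2genomic(toy_exons, 1, 11)  # 200, first base of the second exon
puts toy_cdna2genomic(toy_exons, -1, 1)  # 204, the transcript starts at the right-hand end
puts toy_cdna2genomic(toy_exons, -1, 6)  # 109
```

On the reverse strand the offset is subtracted from the exon end, because the cDNA runs from high to low genomic coordinates.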

#cds2genomic(pos) ⇒ `Integer`

The Transcript#cds2genomic method converts CDS coordinates to genomic coordinates for this transcript.

Parameters:

pos (Integer) —

Position on the CDS

Returns:

(Integer) —

Position on the genomic DNA



# File 'lib/bio-ensembl/core/transcript.rb', line 345

def cds2genomic(pos)
  return self.cdna2genomic(pos + self.coding_region_cdna_start)
end
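
Read literally, the offset arithmetic here adds `coding_region_cdna_start` to `pos`, and `genomic2cds` subtracts the same value, so CDS position 0 corresponds to the first coding base. A minimal sketch of that arithmetic, with a hypothetical constant and hypothetical helper names:

```ruby
# Hypothetical value: the first coding base sits at cDNA position 4.
TOY_CODING_REGION_CDNA_START = 4

# Same arithmetic as cds2genomic applies before calling cdna2genomic.
def toy_cds_to_cdna(pos)
  pos + TOY_CODING_REGION_CDNA_START
end

# Same arithmetic as genomic2cds applies to the result of genomic2cdna.
def toy_cdna_to_cds(pos)
  pos - TOY_CODING_REGION_CDNA_START
end

puts toy_cds_to_cdna(0)                   # 4, the first coding base
puts toy_cdna_to_cds(toy_cds_to_cdna(7))  # 7, the two offsets cancel
```

The two conversions are consistent with each other, which is what matters when round-tripping between CDS and genomic coordinates.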

#cds_seq ⇒ `Object`

The Transcript#cds_seq method returns the coding sequence of the transcript, i.e. the concatenated sequence of all exons minus the UTRs.

# File 'lib/bio-ensembl/core/transcript.rb', line 189

def cds_seq
  cds_length = self.coding_region_cdna_end - self.coding_region_cdna_start + 1
  
  return self.seq[(self.coding_region_cdna_start - 1), cds_length]
end
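
The slicing used by `cds_seq`, `five_prime_utr_seq` and `three_prime_utr_seq` can be checked on a toy cDNA string. The sequence and the two coordinates below are invented for illustration; only the `String#[]` arithmetic mirrors the methods on this page.

```ruby
# Toy cDNA: 3 bases of 5'UTR, 9 bases of CDS, 2 bases of 3'UTR.
toy_cdna = "GGC" + "ATGAAATAG" + "CC"

# Hypothetical 1-based cDNA coordinates of the coding region.
toy_cdna_start = 4
toy_cdna_end   = 12

cds_length = toy_cdna_end - toy_cdna_start + 1

toy_five_prime_utr  = toy_cdna[0, toy_cdna_start - 1]           # everything before the CDS
toy_cds             = toy_cdna[toy_cdna_start - 1, cds_length]  # 1-based start, 0-based index
toy_three_prime_utr = toy_cdna[toy_cdna_end..-1]                # everything after the CDS

puts toy_five_prime_utr   # GGC
puts toy_cds              # ATGAAATAG
puts toy_three_prime_utr  # CC
```

The three slices are disjoint and together reproduce the full cDNA.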

#coding_region_cdna_end ⇒ `Object`

The Transcript#coding_region_cdna_end method returns the stop position of the CDS in cDNA coordinates. Note that, in contrast to Transcript#coding_region_genomic_end, the CDS stop position is always at the border of the 3’UTR. So for transcripts on the reverse strand, the position returned here marks the 3’UTR border even though that border lies “left” of the CDS start on the genome.

# File 'lib/bio-ensembl/core/transcript.rb', line 270

def coding_region_cdna_end
  answer = 0
  
  self.exons.each do |exon|
    if exon == self.translation.end_exon
      answer += self.translation.seq_end
      return answer
    else
      answer += exon.length
    end
  end
end

#coding_region_cdna_start ⇒ `Object`

The Transcript#coding_region_cdna_start method returns the start position of the CDS in cDNA coordinates. Note that, in contrast to Transcript#coding_region_genomic_start, the CDS start position is always at the border of the 5’UTR. So for transcripts on the reverse strand, the position returned here marks the 5’UTR border even though that border lies “right” of the CDS stop on the genome.

# File 'lib/bio-ensembl/core/transcript.rb', line 250

def coding_region_cdna_start
  answer = 0
  
  self.exons.each do |exon|
    if exon == self.translation.start_exon
      answer += self.translation.seq_start
      return answer
    else
      answer += exon.length
    end
  end
  
end
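
Both cDNA boundary methods follow the same pattern: add up the lengths of the exons that precede the exon of interest, then add the offset within that exon. A minimal sketch, assuming the exons are supplied in transcript (5' to 3') order; `SplicedExon` and the `toy_` methods are hypothetical names and the lengths are invented.

```ruby
# Toy exon that only knows its name and length.
SplicedExon = Struct.new(:name, :length)

# seq_start is the 1-based offset of the first coding base within start_exon.
def toy_coding_region_cdna_start(exons, start_exon, seq_start)
  upstream = exons.take_while { |exon| exon != start_exon }
  upstream.sum(&:length) + seq_start
end

# seq_end is the 1-based offset of the last coding base within end_exon.
def toy_coding_region_cdna_end(exons, end_exon, seq_end)
  upstream = exons.take_while { |exon| exon != end_exon }
  upstream.sum(&:length) + seq_end
end

e1 = SplicedExon.new("e1", 10)
e2 = SplicedExon.new("e2", 20)
e3 = SplicedExon.new("e3", 15)

puts toy_coding_region_cdna_start([e1, e2, e3], e1, 4)  # 4
puts toy_coding_region_cdna_end([e1, e2, e3], e3, 5)    # 35 (10 + 20 + 5)
```

Because only exon lengths enter the sum, the result does not depend on the strand.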

#coding_region_genomic_end ⇒ `Object`

The Transcript#coding_region_genomic_end method returns the stop position of the CDS in genomic coordinates. Note that, in contrast to Transcript#coding_region_cdna_end, the CDS stop position is always “right” of the start position. So for transcripts on the reverse strand, the CDS stop position is at the border of the 5’UTR instead of the 3’UTR.

# File 'lib/bio-ensembl/core/transcript.rb', line 235

def coding_region_genomic_end
  strand = self.translation.start_exon.seq_region_strand
  if strand == 1
    return self.translation.end_exon.seq_region_start + ( self.translation.seq_end - 1 )
  else
    return self.translation.start_exon.seq_region_end - ( self.translation.seq_start - 1 )
  end
end

#coding_region_genomic_start ⇒ `Object`

The Transcript#coding_region_genomic_start method returns the start position of the CDS in genomic coordinates. Note that, in contrast to Transcript#coding_region_cdna_start, the CDS start position is always “left” of the end position. So for transcripts on the reverse strand, the CDS start position is at the border of the 3’UTR instead of the 5’UTR.

# File 'lib/bio-ensembl/core/transcript.rb', line 220

def coding_region_genomic_start
  strand = self.translation.start_exon.seq_region_strand
  if strand == 1
    return self.translation.start_exon.seq_region_start + ( self.translation.seq_start - 1 )
  else
    return self.translation.end_exon.seq_region_end - ( self.translation.seq_end - 1 )
  end
end
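
The strand handling of the two genomic boundary methods can be checked side by side on toy data. This is a sketch only: `GenomicExon` and `toy_coding_region_genomic_bounds` are hypothetical names, and the coordinates and offsets are invented.

```ruby
# Toy exon with inclusive genomic coordinates.
GenomicExon = Struct.new(:seq_region_start, :seq_region_end)

# Returns [genomic_start, genomic_end] of the coding region.
# start_exon / end_exon are the first and last coding exons in transcript
# order; seq_start / seq_end are 1-based offsets within those exons.
def toy_coding_region_genomic_bounds(strand, start_exon, seq_start, end_exon, seq_end)
  if strand == 1
    [start_exon.seq_region_start + (seq_start - 1),
     end_exon.seq_region_start + (seq_end - 1)]
  else
    # On the reverse strand offsets count down from the exon end, and the
    # last coding exon holds the left-most coding base.
    [end_exon.seq_region_end - (seq_end - 1),
     start_exon.seq_region_end - (seq_start - 1)]
  end
end

left  = GenomicExon.new(100, 109)
right = GenomicExon.new(200, 204)

p toy_coding_region_genomic_bounds(1, left, 4, right, 3)   # [103, 202]
p toy_coding_region_genomic_bounds(-1, right, 2, left, 6)  # [104, 203]
```

In both cases the first value is smaller than the second, matching the "start is left of end" convention described above.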

#display_label ⇒ `Object` Also known as: display_name, label, name

The Transcript#display_label method returns the default name of the transcript.



# File 'lib/bio-ensembl/core/transcript.rb', line 132

def display_label
  return Xref.find(self.display_xref_id).display_label
end

#exon_for_cdna_position(pos) ⇒ `Object`

The Transcript#exon_for_cdna_position method identifies the exon that covers a given position of the cDNA.

Raises:

(RuntimeError)

# File 'lib/bio-ensembl/core/transcript.rb', line 300

def exon_for_cdna_position(pos)
  # FIXME: Still have to check for when pos is outside of scope of cDNA.
  accumulated_exon_length = 0
  
  self.exons.each do |exon|
    accumulated_exon_length += exon.length
    # cDNA positions are 1-based, so the last base of an exon still belongs to it.
    if accumulated_exon_length >= pos
      return exon
    end
  end
  raise RuntimeError, "Position outside of cDNA scope"
end

#exon_for_genomic_position(pos) ⇒ `Object`

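The Transcript#exon_for_genomic_position method identifies the exon that covers a given genomic position. It returns the exon object, or nil if the position lies in an intron, and raises a RuntimeError if the position lies outside the transcript.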

# File 'lib/bio-ensembl/core/transcript.rb', line 286

def exon_for_genomic_position(pos)
  if pos < self.seq_region_start or pos > self.seq_region_end
    raise RuntimeError, "Position has to be within transcript"
  end
  self.exons.each do |exon|
    if exon.start <= pos and exon.stop >= pos
      return exon
    end
  end
  return nil
end
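
The three possible outcomes (an exon, nil for an intronic position, an error outside the transcript) can be illustrated with toy data. `SpanExon` and `toy_exon_for_genomic_position` are hypothetical names, and the transcript bounds are passed in explicitly here rather than read from the object.

```ruby
# Toy exon with inclusive genomic coordinates.
SpanExon = Struct.new(:start, :stop)

def toy_exon_for_genomic_position(exons, transcript_start, transcript_stop, pos)
  unless pos.between?(transcript_start, transcript_stop)
    raise "Position has to be within transcript" # a RuntimeError
  end
  # Enumerable#find returns nil when no exon covers the position.
  exons.find { |exon| pos.between?(exon.start, exon.stop) }
end

span_exons = [SpanExon.new(100, 109), SpanExon.new(200, 204)]

p toy_exon_for_genomic_position(span_exons, 100, 204, 105).start  # 100
p toy_exon_for_genomic_position(span_exons, 100, 204, 150)        # nil, intronic
```

Code that chains a method call onto the result needs to handle the nil case for intronic positions.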

#five_prime_utr_seq ⇒ `Object`

The Transcript#five_prime_utr_seq method returns the sequence of the 5’UTR of the transcript.



# File 'lib/bio-ensembl/core/transcript.rb', line 197

def five_prime_utr_seq
  return self.seq[0, self.coding_region_cdna_start - 1]
end

#genomic2cdna(pos) ⇒ `Integer`

The Transcript#genomic2cdna method converts genomic coordinates to cDNA coordinates for this transcript.

Parameters:

pos (Integer) —

Position on the genomic DNA

Returns:

(Integer) —

Position on the cDNA

# File 'lib/bio-ensembl/core/transcript.rb', line 363

def genomic2cdna(pos)
  #FIXME: Still have to check for when pos is outside of scope of cDNA.
  # Identify the exon we're looking at.
  exon_with_target = self.exon_for_genomic_position(pos)
  
  accumulated_position = 0
  ex = self.exons.sort_by {|e| e.seq_region_start}
  ex.reverse! if self.strand == -1
  ex.each do |exon|
    if exon.stable_id == exon_with_target.stable_id
      if self.strand == 1
        accumulated_position += ( pos - exon.start) +1
      else
        accumulated_position += ( exon.stop - pos ) +1
      end  
      return accumulated_position
    else
        accumulated_position += exon.length 
    end
  end
  raise RuntimeError, "Position outside of cDNA scope"
end
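
The reverse mapping can be exercised the same way on toy exons. This is a sketch under stated assumptions rather than the library implementation: `MappedExon` and `toy_genomic2cdna` are hypothetical names, cDNA positions are 1-based, and an intronic position raises an `ArgumentError` here by choice.

```ruby
# Toy exon with inclusive genomic coordinates.
MappedExon = Struct.new(:start, :stop) do
  def length
    stop - start + 1
  end
end

# Maps a genomic position to a 1-based cDNA position.
def toy_genomic2cdna(exons, strand, pos)
  ordered = exons.sort_by(&:start)
  ordered.reverse! if strand == -1 # transcript order on the reverse strand

  consumed = 0 # cDNA bases covered by the exons already passed
  ordered.each do |exon|
    if pos.between?(exon.start, exon.stop)
      within = strand == 1 ? pos - exon.start : exon.stop - pos
      return consumed + within + 1
    end
    consumed += exon.length
  end
  raise ArgumentError, "position #{pos} is not covered by any exon"
end

mapped_exons = [MappedExon.new(100, 109), MappedExon.new(200, 204)]
puts toy_genomic2cdna(mapped_exons, 1, 109)   # 10
puts toy_genomic2cdna(mapped_exons, 1, 200)   # 11
puts toy_genomic2cdna(mapped_exons, -1, 204)  # 1
puts toy_genomic2cdna(mapped_exons, -1, 109)  # 6
```

These values are the inverse of the cDNA-to-genomic direction: genomic 200 on the forward strand is cDNA position 11, and cDNA position 11 maps back to 200.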

#genomic2cds(pos) ⇒ `Integer`

The Transcript#genomic2cds method converts genomic coordinates to CDS coordinates for this transcript.

Parameters:

pos (Integer) —

Position on the genomic DNA

Returns:

(Integer) —

Position on the CDS



# File 'lib/bio-ensembl/core/transcript.rb', line 391

def genomic2cds(pos)
  return self.genomic2cdna(pos) - self.coding_region_cdna_start
end

#genomic2pep(pos) ⇒ `Integer`

The Transcript#genomic2pep method converts genomic coordinates to peptide coordinates for this transcript.

Parameters:

pos (Integer) —

Base position on the genomic DNA

Returns:

(Integer) —

Amino acid position in the protein

Raises:

(NotImplementedError)



# File 'lib/bio-ensembl/core/transcript.rb', line 403

def genomic2pep(pos)
  raise NotImplementedError
end

#introns ⇒ `Array<Intron>`

The Transcript#introns method returns the introns for this transcript.

Returns:

(Array<Intron>) —

Sorted array of Intron objects

# File 'lib/bio-ensembl/core/transcript.rb', line 111

def introns
  if @introns.nil?
    @introns = Array.new
    if self.exons.length > 1
      self.exons.each_with_index do |exon, index|
        next if index == 0
        @introns.push(Intron.new(self.exons[index - 1], exon))
      end
    end
  end
  return @introns
end
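
Each intron is built from a pair of neighbouring exons, so a transcript with n exons has n - 1 introns. The same pairing can be sketched with `Enumerable#each_cons`, which is a swapped-in technique and not what the method above uses; `RankedExon` and `ToyIntron` are hypothetical stand-ins.

```ruby
# Toy exon identified only by its rank within the transcript.
RankedExon = Struct.new(:rank)
# Toy intron that remembers the exons on either side of it.
ToyIntron = Struct.new(:previous_exon, :next_exon)

def toy_introns(exons)
  # each_cons(2) yields every pair of consecutive exons; a single exon yields nothing.
  exons.each_cons(2).map { |previous_exon, next_exon| ToyIntron.new(previous_exon, next_exon) }
end

ranked = [RankedExon.new(1), RankedExon.new(2), RankedExon.new(3)]
introns = toy_introns(ranked)

puts introns.length                  # 2
puts introns.first.next_exon.rank    # 2
puts toy_introns(ranked.first(1)).length  # 0, single-exon transcript
```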

#pep2genomic(pos) ⇒ `Integer`

The Transcript#pep2genomic method converts peptide coordinates to genomic coordinates for this transcript.

Parameters:

pos (Integer) —

Amino acid position on the protein

Returns:

(Integer) —

Position on the genomic DNA

Raises:

(NotImplementedError)



# File 'lib/bio-ensembl/core/transcript.rb', line 354

def pep2genomic(pos)
  raise NotImplementedError
end

#protein_seq ⇒ `Object`

The Transcript#protein_seq method returns the sequence of the protein of the transcript.



# File 'lib/bio-ensembl/core/transcript.rb', line 209

def protein_seq
  return Bio::Sequence::NA.new(self.cds_seq).translate.seq
end
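
The method above delegates translation to BioRuby. To show the idea without that dependency, here is a minimal sketch with a deliberately tiny codon table; `TOY_CODON_TABLE` and `toy_translate` are hypothetical, cover only four codons, and mark anything else as "X".

```ruby
# Deliberately incomplete codon table, for illustration only.
TOY_CODON_TABLE = {
  "ATG" => "M",
  "AAA" => "K",
  "GGC" => "G",
  "TAG" => "*" # stop codon
}.freeze

def toy_translate(cds)
  # Split into complete codons; a trailing partial codon is ignored.
  cds.upcase.scan(/.{3}/).map { |codon| TOY_CODON_TABLE.fetch(codon, "X") }.join
end

puts toy_translate("ATGAAAGGCTAG")  # MKG*
puts toy_translate("atgaaa")        # MK
```

A real translation needs the complete genetic code, which is why the library relies on `Bio::Sequence::NA#translate`.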

#seq ⇒ `Object`

The Transcript#seq method returns the full sequence of all concatenated exons.

# File 'lib/bio-ensembl/core/transcript.rb', line 177

def seq
  if @seq.nil?
    @seq = ''
    self.exons.each do |exon|
      @seq += exon.seq
    end
  end
  return @seq
end

#stable_id ⇒ `String`

The Transcript#stable_id method returns the stable ID of the transcript.

Returns:

(String) —

Ensembl stable ID of the transcript.



# File 'lib/bio-ensembl/core/transcript.rb', line 127

def stable_id
  return self.transcript_stable_id.stable_id
end

#three_prime_utr_seq ⇒ `Object`

The Transcript#three_prime_utr_seq method returns the sequence of the 3’UTR of the transcript.



# File 'lib/bio-ensembl/core/transcript.rb', line 203

def three_prime_utr_seq
  return self.seq[self.coding_region_cdna_end..-1]
end
