Class: Ensembl::Core::Transcript

Inherits:
DBConnection
Includes:
Sliceable
Defined in:
lib/bio-ensembl/core/transcript.rb

Overview

The Transcript class provides an interface to the transcript table. This table contains mappings of transcripts for a Gene to a SeqRegion.

This class uses ActiveRecord to access data in the Ensembl database. See the general documentation of the Ensembl module for more information on what this means and what methods are available.

This class includes the mixin Sliceable, which means that it is mapped to a SeqRegion object and a Slice can be created for objects of this class. See Sliceable and Slice for more information.

Examples:

#TODO

Class Method Summary

Instance Method Summary

Methods included from Sliceable

#length, #project, #slice, #start, #stop, #strand, #transform

Methods inherited from DBConnection

connect, ensemblgenomes_connect

Methods inherited from DBRegistry::Base

generic_connect, get_info, get_name_from_db

Class Method Details

.find_all_by_stable_id(stable_id) ⇒ Object

The Transcript.find_all_by_stable_id class method returns an array of transcripts with the given stable_id. If none are found, an empty array is returned.



# File 'lib/bio-ensembl/core/transcript.rb', line 142

def self.find_all_by_stable_id(stable_id)
  answer = Array.new
  transcript_stable_id_objects = Ensembl::Core::TranscriptStableId.find_all_by_stable_id(stable_id)
  transcript_stable_id_objects.each do |transcript_stable_id_object|
    answer.push(Ensembl::Core::Transcript.find(transcript_stable_id_object.transcript_id))
  end

  return answer
end

.find_by_stable_id(stable_id) ⇒ Object

The Transcript.find_by_stable_id class method fetches a Transcript object based on its stable ID (i.e. the “ENST” accession number). If no transcript with that stable ID is found, it returns nil.



# File 'lib/bio-ensembl/core/transcript.rb', line 154

def self.find_by_stable_id(stable_id)
  all = self.find_all_by_stable_id(stable_id)
  if all.length == 0
    return nil
  else
    return all[0]
  end
end

Instance Method Details

#cdna2genomic(pos) ⇒ Integer

The Transcript#cdna2genomic method converts cDNA coordinates to genomic coordinates for this transcript.

Parameters:

  • pos (Integer)

    Position on the cDNA

Returns:

  • (Integer)

    Position on the genomic DNA



# File 'lib/bio-ensembl/core/transcript.rb', line 318

def cdna2genomic(pos)
  #FIXME: Still have to check for when pos is outside of scope of cDNA.
  # Identify the exon we're looking at.
  exon_with_target = self.exon_for_cdna_position(pos)
  
  accumulated_position = 0
  ex = self.exons.sort_by {|e| e.seq_region_start}
  ex.reverse! if self.strand == -1
  ex.each do |exon|  
    if exon == exon_with_target
      length_to_be_taken_from_exon = pos - (accumulated_position + 1)
      if self.strand == -1
        return exon.seq_region_end - length_to_be_taken_from_exon
      else
        return exon.seq_region_start + length_to_be_taken_from_exon
      end
    else
      accumulated_position += exon.length 
    end
  end
end
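
As a rough illustration of the arithmetic above, the following sketch repeats the walk over exons using plain Structs instead of database-backed objects. SketchExon and cdna_to_genomic are hypothetical names that are not part of the library, and cDNA positions are assumed to be 1-based.

```ruby
# Hypothetical stand-in for an exon record; not the library's Exon class.
SketchExon = Struct.new(:seq_region_start, :seq_region_end) do
  def length
    seq_region_end - seq_region_start + 1
  end
end

# Map a 1-based cDNA position to a genomic position.
# exons: array of SketchExon; strand: 1 or -1.
def cdna_to_genomic(exons, strand, pos)
  ordered = exons.sort_by(&:seq_region_start)
  ordered.reverse! if strand == -1   # put exons in transcription order
  consumed = 0                       # cDNA bases covered by earlier exons
  ordered.each do |exon|
    if pos <= consumed + exon.length
      offset = pos - (consumed + 1)  # 0-based offset inside this exon
      return strand == -1 ? exon.seq_region_end - offset : exon.seq_region_start + offset
    end
    consumed += exon.length
  end
  raise ArgumentError, "Position outside of cDNA scope"
end

exons = [SketchExon.new(101, 110), SketchExon.new(201, 205)]
puts cdna_to_genomic(exons, 1, 11)   # first base of the second exon
puts cdna_to_genomic(exons, -1, 1)   # first cDNA base on the reverse strand
```

On the reverse strand the offset is subtracted from the exon end, because the cDNA runs from high to low genomic coordinates.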

#cds2genomic(pos) ⇒ Integer

The Transcript#cds2genomic method converts CDS coordinates to genomic coordinates for this transcript.

Parameters:

  • pos (Integer)

    Position on the CDS

Returns:

  • (Integer)

    Position on the genomic DNA



# File 'lib/bio-ensembl/core/transcript.rb', line 345

def cds2genomic(pos)
  return self.cdna2genomic(pos + self.coding_region_cdna_start)
end

#cds_seq ⇒ Object

The Transcript#cds_seq method returns the coding sequence of the transcript, i.e. the concatenated sequence of all exons minus the UTRs.



# File 'lib/bio-ensembl/core/transcript.rb', line 189

def cds_seq
  cds_length = self.coding_region_cdna_end - self.coding_region_cdna_start + 1
  
  return self.seq[(self.coding_region_cdna_start - 1), cds_length]
end

#coding_region_cdna_end ⇒ Object

The Transcript#coding_region_cdna_end method returns the stop position of the CDS in cDNA coordinates. In contrast to Transcript#coding_region_genomic_end, the CDS stop position is always at the border of the 3’UTR. So for transcripts on the reverse strand, the CDS stop lies genomically to the “left” of the CDS start.



# File 'lib/bio-ensembl/core/transcript.rb', line 270

def coding_region_cdna_end
  answer = 0
  
  self.exons.each do |exon|
    if exon == self.translation.end_exon
      answer += self.translation.seq_end
      return answer
    else
      answer += exon.length
    end
  end
end

#coding_region_cdna_start ⇒ Object

The Transcript#coding_region_cdna_start method returns the start position of the CDS in cDNA coordinates. In contrast to Transcript#coding_region_genomic_start, the CDS start position is always at the border of the 5’UTR. So for transcripts on the reverse strand, the CDS start lies genomically to the “right” of the CDS stop.



# File 'lib/bio-ensembl/core/transcript.rb', line 250

def coding_region_cdna_start
  answer = 0
  
  self.exons.each do |exon|
    if exon == self.translation.start_exon
      answer += self.translation.seq_start
      return answer
    else
      answer += exon.length
    end
  end
  
end
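
Both cDNA boundary methods appear to follow the same pattern: add up the lengths of the exons that precede the exon in which translation starts (or ends), then add the offset within that exon. A minimal sketch of that pattern, using the hypothetical names CdnaExon and cdna_boundary and in-memory data rather than database records:

```ruby
# Hypothetical stand-in for an exon; the real objects come from the database.
CdnaExon = Struct.new(:name, :length)

# Sum the lengths of all exons before `target`, then add the 1-based
# offset within `target`. Returns nil if `target` is not in the list.
def cdna_boundary(exons, target, offset_in_exon)
  preceding = 0
  exons.each do |exon|
    return preceding + offset_in_exon if exon == target
    preceding += exon.length
  end
  nil
end

e1 = CdnaExon.new("e1", 100)
e2 = CdnaExon.new("e2", 50)
e3 = CdnaExon.new("e3", 80)
exons = [e1, e2, e3]                      # assumed to be in transcription order

cds_start = cdna_boundary(exons, e2, 21)  # translation starts at base 21 of e2
cds_end   = cdna_boundary(exons, e3, 30)  # translation ends at base 30 of e3
puts "CDS spans cDNA #{cds_start}..#{cds_end}"
```

The sketch assumes the exon list is already in transcription order; if it is not, the running total no longer corresponds to a cDNA coordinate.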

#coding_region_genomic_end ⇒ Object

The Transcript#coding_region_genomic_end method returns the stop position of the CDS in genomic coordinates. Note that, in contrast to Transcript#coding_region_cdna_end, the CDS stop position is always “right” of the start position. So for transcripts on the reverse strand, the CDS stop position is at the border of the 5’UTR instead of the 3’UTR.



# File 'lib/bio-ensembl/core/transcript.rb', line 235

def coding_region_genomic_end
  strand = self.translation.start_exon.seq_region_strand
  if strand == 1
    return self.translation.end_exon.seq_region_start + ( self.translation.seq_end - 1 )
  else
    return self.translation.start_exon.seq_region_end - ( self.translation.seq_start - 1 )
  end
end

#coding_region_genomic_start ⇒ Object

The Transcript#coding_region_genomic_start method returns the start position of the CDS in genomic coordinates. Note that, in contrast to Transcript#coding_region_cdna_start, the CDS start position is always “left” of the end position. So for transcripts on the reverse strand, the CDS start position is at the border of the 3’UTR instead of the 5’UTR.



# File 'lib/bio-ensembl/core/transcript.rb', line 220

def coding_region_genomic_start
  strand = self.translation.start_exon.seq_region_strand
  if strand == 1
    return self.translation.start_exon.seq_region_start + ( self.translation.seq_start - 1 )
  else
    return self.translation.end_exon.seq_region_end - ( self.translation.seq_end - 1 )
  end
end
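
The strand handling in the two genomic boundary methods can be sketched as a single function that returns both ends at once. GExon and cds_genomic_range are hypothetical names for illustration, and seq_start / seq_end are assumed to be 1-based offsets counted in the direction of transcription, as the source above suggests.

```ruby
# Hypothetical stand-in for an exon with genomic coordinates.
GExon = Struct.new(:seq_region_start, :seq_region_end)

# Returns [genomic_start, genomic_end] of the CDS, with start <= end.
# start_exon / end_exon are the exons where translation starts / ends.
def cds_genomic_range(start_exon, seq_start, end_exon, seq_end, strand)
  if strand == 1
    [start_exon.seq_region_start + (seq_start - 1),
     end_exon.seq_region_start + (seq_end - 1)]
  else
    # On the reverse strand the roles swap: the translation end gives the
    # genomic start, and offsets are counted back from the exon end.
    [end_exon.seq_region_end - (seq_end - 1),
     start_exon.seq_region_end - (seq_start - 1)]
  end
end

low  = GExon.new(1000, 1099)
high = GExon.new(2000, 2099)
p cds_genomic_range(low, 11, high, 50, 1)    # forward strand
p cds_genomic_range(high, 11, low, 50, -1)   # reverse strand
```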

#display_label ⇒ Object Also known as: display_name, label, name

The Transcript#display_label method returns the default name of the transcript.



# File 'lib/bio-ensembl/core/transcript.rb', line 132

def display_label
  return Xref.find(self.display_xref_id).display_label
end

#exon_for_cdna_position(pos) ⇒ Object

The Transcript#exon_for_cdna_position method identifies the exon that covers a given position of the cDNA. It raises a RuntimeError if the position lies beyond the end of the cDNA.

Raises:

  • (RuntimeError)


# File 'lib/bio-ensembl/core/transcript.rb', line 300

def exon_for_cdna_position(pos)
  # FIXME: Still have to check for when pos is outside of scope of cDNA.
  accumulated_exon_length = 0
  
  self.exons.each do |exon|
    accumulated_exon_length += exon.length
    # pos is 1-based (see #cdna2genomic), so the last base of an exon still belongs to that exon.
    if accumulated_exon_length >= pos
      return exon
    end
  end
  raise RuntimeError, "Position outside of cDNA scope"
end

#exon_for_genomic_position(pos) ⇒ Object

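The Transcript#exon_for_genomic_position method identifies the exon that covers a given genomic position. It returns the exon object, or nil if the position falls in an intron, and raises a RuntimeError if the position lies outside the transcript.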



# File 'lib/bio-ensembl/core/transcript.rb', line 286

def exon_for_genomic_position(pos)
  if pos < self.seq_region_start or pos > self.seq_region_end
    raise RuntimeError, "Position has to be within transcript"
  end
  self.exons.each do |exon|
    if exon.start <= pos and exon.stop >= pos
      return exon
    end
  end
  return nil
end
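
The three possible outcomes (an exon, nil for an intronic position, an error for a position outside the transcript) can be seen in a small in-memory sketch. RangeExon and exon_covering are hypothetical names, and the transcript bounds are assumed here to equal the outermost exon coordinates.

```ruby
# Hypothetical stand-in for an exon with genomic start/stop.
RangeExon = Struct.new(:name, :start, :stop)

# Return the exon covering pos, nil if pos is intronic,
# or raise if pos lies outside the span of all exons.
def exon_covering(exons, pos)
  lowest  = exons.map(&:start).min
  highest = exons.map(&:stop).max
  if pos < lowest || pos > highest
    raise ArgumentError, "Position has to be within transcript"
  end
  exons.find { |exon| pos.between?(exon.start, exon.stop) }
end

exons = [RangeExon.new("a", 100, 200), RangeExon.new("b", 300, 400)]
p exon_covering(exons, 150)   # inside exon "a"
p exon_covering(exons, 250)   # intronic position
```

Callers that chain a method onto the result need to handle the nil case for intronic positions.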

#five_prime_utr_seq ⇒ Object

The Transcript#five_prime_utr_seq method returns the sequence of the 5’UTR of the transcript.



# File 'lib/bio-ensembl/core/transcript.rb', line 197

def five_prime_utr_seq
  return self.seq[0, self.coding_region_cdna_start - 1]
end

#genomic2cdna(pos) ⇒ Integer

The Transcript#genomic2cdna method converts genomic coordinates to cDNA coordinates for this transcript.

Parameters:

  • pos (Integer)

    Position on the genomic DNA

Returns:

  • (Integer)

    Position on the cDNA



# File 'lib/bio-ensembl/core/transcript.rb', line 363

def genomic2cdna(pos)
  #FIXME: Still have to check for when pos is outside of scope of cDNA.
  # Identify the exon we're looking at.
  exon_with_target = self.exon_for_genomic_position(pos)
  
  accumulated_position = 0
  ex = self.exons.sort_by {|e| e.seq_region_start}
  ex.reverse! if self.strand == -1
  ex.each do |exon|
    if exon.stable_id == exon_with_target.stable_id
      if self.strand == 1
        accumulated_position += ( pos - exon.start) +1
      else
        accumulated_position += ( exon.stop - pos ) +1
      end  
      return accumulated_position
    else
        accumulated_position += exon.length 
    end
  end
  raise RuntimeError, "Position outside of cDNA scope"
end
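
A minimal sketch of the same conversion on in-memory data, assuming 1-based cDNA positions. GenomicExon and genomic_to_cdna are hypothetical names; unlike the method above, the sketch returns nil for positions that no exon covers instead of raising.

```ruby
# Hypothetical stand-in for an exon with genomic start/stop.
GenomicExon = Struct.new(:start, :stop) do
  def length
    stop - start + 1
  end
end

# Map a genomic position to a 1-based cDNA position, or nil if the
# position is intronic or outside the transcript.
def genomic_to_cdna(exons, strand, pos)
  ordered = exons.sort_by(&:start)
  ordered.reverse! if strand == -1   # put exons in transcription order
  consumed = 0                       # cDNA bases covered by earlier exons
  ordered.each do |exon|
    if pos.between?(exon.start, exon.stop)
      within = strand == 1 ? pos - exon.start : exon.stop - pos
      return consumed + within + 1
    end
    consumed += exon.length
  end
  nil
end

exons = [GenomicExon.new(101, 110), GenomicExon.new(201, 205)]
puts genomic_to_cdna(exons, 1, 201)    # first base of the second exon
puts genomic_to_cdna(exons, -1, 205)   # first transcribed base on the reverse strand
```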

#genomic2cds(pos) ⇒ Integer

The Transcript#genomic2cds method converts genomic coordinates to CDS coordinates for this transcript.

Parameters:

  • pos (Integer)

    Position on the genomic DNA

Returns:

  • (Integer)

    Position on the CDS



# File 'lib/bio-ensembl/core/transcript.rb', line 391

def genomic2cds(pos)
  return self.genomic2cdna(pos) - self.coding_region_cdna_start
end
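
Read together with #cds2genomic, this one-liner suggests that a CDS coordinate is treated as an offset from coding_region_cdna_start, so the first coding base would have CDS position 0. That is an inference from the code shown here, not a documented guarantee. The sketch below uses the hypothetical helpers cds_to_cdna and cdna_to_cds to show the round trip:

```ruby
# cds_start is the 1-based cDNA position of the first coding base.
# Both helpers are hypothetical and only mirror the arithmetic shown above.
def cds_to_cdna(cds_pos, cds_start)
  cds_pos + cds_start
end

def cdna_to_cds(cdna_pos, cds_start)
  cdna_pos - cds_start
end

cds_start = 121
puts cdna_to_cds(121, cds_start)                          # first coding base
puts cdna_to_cds(cds_to_cdna(42, cds_start), cds_start)   # round trip
```

Code that expects 1-based CDS positions would need to adjust by one on both sides of the conversion.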

#genomic2pep(pos) ⇒ Integer

The Transcript#genomic2pep method converts genomic coordinates to peptide coordinates for this transcript.

Parameters:

  • pos (Integer)

    Base position on the genomic DNA

Returns:

  • (Integer)

    Amino acid position in the protein

Raises:

  • (NotImplementedError)


# File 'lib/bio-ensembl/core/transcript.rb', line 403

def genomic2pep(pos)
  raise NotImplementedError
end

#introns ⇒ Array<Intron>

The Transcript#introns method returns the introns for this transcript.

Returns:

  • (Array<Intron>)

    Sorted array of Intron objects



# File 'lib/bio-ensembl/core/transcript.rb', line 111

def introns
  if @introns.nil?
    @introns = Array.new
    if self.exons.length > 1
      self.exons.each_with_index do |exon, index|
        next if index == 0
        @introns.push(Intron.new(self.exons[index - 1], exon))
      end
    end
  end
  return @introns
end
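
The pairing of each exon with its predecessor can also be expressed with Enumerable#each_cons. This is a sketch on in-memory data, with PlainExon, PlainIntron and introns_between as hypothetical names; it is not how the library itself is written.

```ruby
# Hypothetical stand-ins for the library's Exon and Intron classes.
PlainExon   = Struct.new(:name)
PlainIntron = Struct.new(:previous_exon, :next_exon)

# One intron per pair of neighbouring exons; a transcript with fewer
# than two exons yields an empty array.
def introns_between(exons)
  exons.each_cons(2).map { |left, right| PlainIntron.new(left, right) }
end

exons = %w[e1 e2 e3].map { |name| PlainExon.new(name) }
introns_between(exons).each do |intron|
  puts "#{intron.previous_exon.name} -> #{intron.next_exon.name}"
end
```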

#pep2genomic(pos) ⇒ Integer

The Transcript#pep2genomic method converts peptide coordinates to genomic coordinates for this transcript.

Parameters:

  • pos (Integer)

    Amino acid position on the protein

Returns:

  • (Integer)

    Position on the genomic DNA

Raises:

  • (NotImplementedError)


# File 'lib/bio-ensembl/core/transcript.rb', line 354

def pep2genomic(pos)
  raise NotImplementedError
end

#protein_seq ⇒ Object

The Transcript#protein_seq method returns the sequence of the protein of the transcript.



# File 'lib/bio-ensembl/core/transcript.rb', line 209

def protein_seq
  return Bio::Sequence::NA.new(self.cds_seq).translate.seq
end

#seq ⇒ Object

The Transcript#seq method returns the full sequence of all concatenated exons.



# File 'lib/bio-ensembl/core/transcript.rb', line 177

def seq
  if @seq.nil?
    @seq = ''
    self.exons.each do |exon|
      @seq += exon.seq
    end
  end
  return @seq
end
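
The method builds the sequence once and caches it in an instance variable. A compact sketch of the same memoization idea, using the hypothetical classes SeqExon and SketchTranscript in place of the database-backed ones:

```ruby
# Hypothetical stand-in for an exon that already knows its sequence.
SeqExon = Struct.new(:seq)

class SketchTranscript
  def initialize(exons)
    @exons = exons
  end

  # Concatenate the exon sequences on the first call and reuse the
  # cached string on later calls.
  def seq
    @seq ||= @exons.map(&:seq).join
  end
end

transcript = SketchTranscript.new([SeqExon.new("ATG"), SeqExon.new("CCC"), SeqExon.new("TAA")])
puts transcript.seq
```

Caching matters more in the real class, where each exon sequence presumably requires a database lookup.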

#stable_id ⇒ String

The Transcript#stable_id method returns the stable ID of the transcript.

Returns:

  • (String)

    Ensembl stable ID of the transcript.



# File 'lib/bio-ensembl/core/transcript.rb', line 127

def stable_id
  return self.transcript_stable_id.stable_id
end

#three_prime_utr_seq ⇒ Object

The Transcript#three_prime_utr_seq method returns the sequence of the 3’UTR of the transcript.



# File 'lib/bio-ensembl/core/transcript.rb', line 203

def three_prime_utr_seq
  return self.seq[self.coding_region_cdna_end..-1]
end
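
The three sequence accessors (#five_prime_utr_seq, #cds_seq and #three_prime_utr_seq) slice the same cDNA string using 1-based, inclusive CDS boundaries. The sketch below shows how those boundaries map onto Ruby's 0-based String#[] calls. The helper split_cdna and the toy sequence are made up for illustration.

```ruby
# Split a cDNA string into 5'UTR, CDS and 3'UTR.
# cds_start and cds_end are 1-based, inclusive cDNA coordinates.
def split_cdna(cdna, cds_start, cds_end)
  five_prime  = cdna[0, cds_start - 1]                       # bases before the CDS
  cds         = cdna[cds_start - 1, cds_end - cds_start + 1] # start index, length
  three_prime = cdna[cds_end..-1]                            # bases after the CDS
  [five_prime, cds, three_prime]
end

five, cds, three = split_cdna("ggATGAAATAGcc", 3, 11)
puts "5'UTR: #{five}  CDS: #{cds}  3'UTR: #{three}"
```

Because the three slices do not overlap and leave no gap, joining them reproduces the original cDNA, which is a convenient sanity check.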