Class: Bio::Sequence::NA

Inherits:

Object
String
Bio::Sequence::NA

Includes: Common

Defined in: lib/bio/sequence/na.rb,
lib/bio/sequence/compat.rb,
lib/bio/shell/plugin/midi.rb

Overview

– TODO

- add "Ohno" style
- add an accessor to the drum pattern
- add a new feature to select music style (pop, trans, ryukyu, ...)
- what is the base?

Direct Known Subclasses

RestrictionEnzyme::SingleStrand

Defined Under Namespace

Classes: MidiTrack

Class Method Summary

.randomize(*arg, &block) ⇒ Object

Generate a new random sequence with the given frequency of bases.

Instance Method Summary

#at_content ⇒ Object

Calculate the ratio of AT / ATGC bases.
#at_skew ⇒ Object

Calculate the ratio of (A - T) / (A + T) bases.
#codon_usage ⇒ Object

Returns counts of each codon in the sequence in a hash.
#cut_with_enzyme(*args) ⇒ Object (also: #cut_with_enzymes)

Cut the sequence with the given restriction enzymes; see Bio::RestrictionEnzyme::Analysis.cut.
#dna ⇒ Object

Returns a new sequence object with any ‘u’ bases changed to ‘t’.
#dna! ⇒ Object

Changes any ‘u’ bases in the original sequence to ‘t’.
#forward_complement ⇒ Object

Returns a new complementary sequence object (without reversing).
#forward_complement! ⇒ Object

Converts the current sequence into its complement (without reversing).
#gc_content ⇒ Object

Calculate the ratio of GC / ATGC bases.
#gc_percent ⇒ Object

Calculate the ratio of GC / ATGC bases as a percentage truncated to a whole number.
#gc_skew ⇒ Object

Calculate the ratio of (G - C) / (G + C) bases.
#illegal_bases ⇒ Object

Returns an alphabetically sorted array of any non-standard bases (other than ‘atgcu’).
#initialize(str) ⇒ NA constructor

Generate a nucleic acid sequence object from a string.
#molecular_weight ⇒ Object

Estimate molecular weight (using the values from BioPerl’s SeqStats.pm module).
#names ⇒ Object

Generate the list of the names of each nucleotide along with the sequence (full name).
#pikachu ⇒ Object

:nodoc:.
#reverse_complement ⇒ Object (also: #complement)

Returns a new sequence object with the reverse complement sequence to the original.
#reverse_complement! ⇒ Object (also: #complement!)

Converts the original sequence into its reverse complement.
#rna ⇒ Object

Returns a new sequence object with any ‘t’ bases changed to ‘u’.
#rna! ⇒ Object

Changes any ‘t’ bases in the original sequence to ‘u’.
#splicing(position) ⇒ Object

Alias of Bio::Sequence::Common splice method, documented there.
#to_midi(style = {}, drum = true) ⇒ Object

style: Hash of :tempo, :scale, :tones. scale: pitch classes C C# D D# E F F# G G# A A# B, numbered 0 to 11. tones: Array of Hashes, each with :prog, :base, :range (tone, vol? or len?, octaves). drum: true (with rhythm part), false (without rhythm part).
#to_re ⇒ Object

Create a Ruby regular expression instance (Regexp).
#translate(frame = 1, table = 1, unknown = 'X') ⇒ Object

Translate into an amino acid sequence.

Methods included from Common

#+, #<<, #composition, #concat, #normalize!, #randomize, #seq, #splice, #subseq, #to_fasta, #to_s, #total, #window_search

Methods inherited from String

#fill, #fold, #skip, #step, #to_aaseq, #to_naseq

Constructor Details

#initialize(str) ⇒ `NA`

Generate a nucleic acid sequence object from a string.

s = Bio::Sequence::NA.new("aagcttggaccgttgaagt")

or, if the nucleic acid sequence is stored in a file,

s = Bio::Sequence::NA.new(File.read('dna.txt'))

Nucleic acid sequences are always lowercase in BioRuby

s = Bio::Sequence::NA.new("AAGcTtGG")
puts s                                  #=> "aagcttgg"

Whitespace is stripped from the sequence

s = Bio::Sequence::NA.new("atg\nggg\ttt\r  gc")
puts s                                  #=> "atggggttgc"

Arguments:

(required) str: String

Returns: Bio::Sequence::NA object

# File 'lib/bio/sequence/na.rb', line 75

def initialize(str)
  super
  self.downcase!
  self.tr!(" \t\n\r",'')
end
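
The normalization in the constructor can be tried on an ordinary String without loading BioRuby. The following is a minimal sketch; normalize_na is a hypothetical helper introduced only for illustration, and it uses String#delete where the listing above uses tr! with an empty replacement.

```ruby
# Plain-Ruby sketch of the constructor's normalization.
# normalize_na is a hypothetical helper, not part of BioRuby.
def normalize_na(str)
  # Lowercase first, then drop spaces, tabs, newlines and carriage returns.
  str.downcase.delete(" \t\n\r")
end

puts normalize_na("AAGcTtGG")             # prints aagcttgg
puts normalize_na("atg\nggg\ttt\r  gc")   # prints atggggttgc
```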

Class Method Details

.randomize(*arg, &block) ⇒ `Object`

Generate a new random sequence with the given frequency of bases. The sequence length is determined by their cumulative sum. (See also Bio::Sequence::Common#randomize which creates a new randomized sequence object using the base composition of an existing sequence instance).

counts = {'a'=>1,'c'=>2,'g'=>3,'t'=>4}
puts Bio::Sequence::NA.randomize(counts)  #=> "ggcttgttac" (for example)

You may also feed the output of randomize into a block

actual_counts = {'a'=>0, 'c'=>0, 'g'=>0, 't'=>0}
Bio::Sequence::NA.randomize(counts) {|x| actual_counts[x] += 1}
actual_counts                     #=> {"a"=>1, "c"=>2, "g"=>3, "t"=>4}

Arguments:

(optional) hash: Hash object

Returns: Bio::Sequence::NA object



# File 'lib/bio/sequence/compat.rb', line 82

def self.randomize(*arg, &block)
  self.new('').randomize(*arg, &block)
end
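
To illustrate how a hash of counts determines both the composition and the length of the result, here is a plain-Ruby sketch. random_from_counts is a hypothetical helper; it is not claimed to match how BioRuby implements randomize internally.

```ruby
# Plain-Ruby sketch: build a random sequence from per-base counts.
# random_from_counts is a hypothetical helper, not BioRuby's implementation.
def random_from_counts(counts, rng: Random.new)
  # Expand {'a'=>1, 'c'=>2} into ['a', 'c', 'c'], then shuffle the pool.
  pool = counts.flat_map { |base, n| [base] * n }
  pool.shuffle(random: rng).join
end

counts = { 'a' => 1, 'c' => 2, 'g' => 3, 't' => 4 }
seq = random_from_counts(counts)
puts seq.length    # prints 10, the sum of the counts
```

Because the pool is only shuffled, the base counts of the result always equal the requested counts; only the order varies between runs.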

Instance Method Details

#at_content ⇒ `Object`

Calculate the ratio of AT / ATGC bases. U is regarded as T.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.at_content                       #=> 4/9
puts s.at_content.to_f                  #=> 0.444444444444444

In older Ruby versions, Float is always returned.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.at_content                       #=> 0.444444444444444

Note that “u” is regarded as “t”. If there are no ATGC bases in the sequence, 0.0 is returned.

Returns: Rational or Float

# File 'lib/bio/sequence/na.rb', line 346

def at_content
  count = self.composition
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  total = at + gc
  return 0.0 if total == 0
  return at.quo(total)
end

#at_skew ⇒ `Object`

Calculate the ratio of (A - T) / (A + T) bases. U is regarded as T.

s = Bio::Sequence::NA.new('atgttgttgttc')
puts s.at_skew                          #=> (-3/4)
puts s.at_skew.to_f                     #=> -0.75

In older Ruby versions, Float is always returned.

s = Bio::Sequence::NA.new('atgttgttgttc')
puts s.at_skew                          #=> -0.75

Note that “u” is regarded as “t”. If there are no AT bases in the sequence, 0.0 is returned.

Returns: Rational or Float

# File 'lib/bio/sequence/na.rb', line 395

def at_skew
  count = self.composition
  a = count['a']
  t = count['t'] + count['u']
  at = a + t
  return 0.0 if at == 0
  return (a - t).quo(at)
end
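
The skew formula and the treatment of 'u' as 't' can be checked with plain Ruby. In this sketch, skew is a hypothetical helper and String#count stands in for the composition hash used in the listing above.

```ruby
# Plain-Ruby sketch of the skew calculation; skew is a hypothetical helper.
def skew(x, y)
  total = x + y
  return 0.0 if total == 0
  (x - y).quo(total)   # Integer#quo gives an exact Rational
end

seq = 'auguuguuguuc'   # RNA spelling of the example sequence
a = seq.count('a')
t = seq.count('tu')    # count 't' and 'u' together
puts skew(a, t)        # prints -3/4
puts skew(0, 0)        # prints 0.0
```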

#codon_usage ⇒ `Object`

Returns counts of each codon in the sequence in a hash.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.codon_usage                #=> {"gcg"=>1, "tga"=>1, "atg"=>1}

This method does not validate codons! Any three letter group is a ‘codon’. So,

s = Bio::Sequence::NA.new('atggNNtga')
puts s.codon_usage                #=> {"tga"=>1, "gnn"=>1, "atg"=>1}

s = Bio::Sequence::NA.new('atgg--tga')
puts s.codon_usage                #=> {"tga"=>1, "g--"=>1, "atg"=>1}

Also, there is no option to work in any frame other than the first.

Returns: Hash object

# File 'lib/bio/sequence/na.rb', line 273

def codon_usage
  hash = Hash.new(0)
  self.window_search(3, 3) do |codon|
    hash[codon] += 1
  end
  return hash
end
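
Since codon_usage always starts at the first base, counting codons in another frame means dropping the leading bases first. The sketch below shows the idea on a plain String; codon_counts is a hypothetical helper, and it assumes frame is 1, 2 or 3.

```ruby
# Plain-Ruby sketch: count codons starting from reading frame 1, 2 or 3.
# codon_counts is a hypothetical helper, not part of BioRuby.
def codon_counts(seq, frame = 1)
  counts = Hash.new(0)
  # Drop frame - 1 leading bases, then take complete triplets only.
  shifted = seq[(frame - 1)..-1] || ''
  shifted.scan(/.{3}/) { |codon| counts[codon] += 1 }
  counts
end

p codon_counts('atggcgtga')      # prints {"atg"=>1, "gcg"=>1, "tga"=>1}
p codon_counts('atggcgtga', 2)   # prints {"tgg"=>1, "cgt"=>1}
```

As with codon_usage, nothing is validated: any three-character group is counted, and a trailing partial codon is ignored.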

#cut_with_enzyme(*args) ⇒ `Object` Also known as: cut_with_enzymes

Example:

seq = Bio::Sequence::NA.new('gaattc')
cuts = seq.cut_with_enzyme('EcoRI')

seq = Bio::Sequence::NA.new('gaattc')
cuts = seq.cut_with_enzyme('g^aattc')

See Bio::RestrictionEnzyme::Analysis.cut



# File 'lib/bio/sequence/na.rb', line 530

def cut_with_enzyme(*args)
  Bio::RestrictionEnzyme::Analysis.cut(self, *args)
end

#dna ⇒ `Object`

Returns a new sequence object with any ‘u’ bases changed to ‘t’. The original sequence is not modified.

s = Bio::Sequence::NA.new('augc')
puts s.dna                              #=> 'atgc'
puts s                                  #=> 'augc'

Returns: new Bio::Sequence::NA object



# File 'lib/bio/sequence/na.rb', line 474

def dna
  self.tr('u', 't')
end

#dna! ⇒ `Object`

Changes any ‘u’ bases in the original sequence to ‘t’. The original sequence is modified.

s = Bio::Sequence::NA.new('augc')
puts s.dna!                             #=> 'atgc'
puts s                                  #=> 'atgc'

Returns: current Bio::Sequence::NA object (modified)



# File 'lib/bio/sequence/na.rb', line 486

def dna!
  self.tr!('u', 't')
end

#forward_complement ⇒ `Object`

Returns a new complementary sequence object (without reversing). The original sequence object is not modified.

s = Bio::Sequence::NA.new('atgc')
puts s.forward_complement               #=> 'tacg'
puts s                                  #=> 'atgc'

Returns: new Bio::Sequence::NA object

# File 'lib/bio/sequence/na.rb', line 100

def forward_complement
  s = self.class.new(self)
  s.forward_complement!
  s
end

#forward_complement! ⇒ `Object`

Converts the current sequence into its complement (without reversing). The original sequence object is modified.

s = Bio::Sequence::NA.new('atgc')
puts s.forward_complement!              #=> 'tacg'
puts s                                  #=> 'tacg'

Returns: current Bio::Sequence::NA object (modified)

# File 'lib/bio/sequence/na.rb', line 114

def forward_complement!
  if self.rna?
    self.tr!('augcrymkdhvbswn', 'uacgyrkmhdbvswn')
  else
    self.tr!('atgcrymkdhvbswn', 'tacgyrkmhdbvswn')
  end
  self
end
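
The DNA branch of the tr! call above also maps the IUPAC ambiguity codes onto their complements. The following plain-Ruby sketch reuses the same two character lists on ordinary Strings; the constant and method names are hypothetical and chosen only for this illustration.

```ruby
# Plain-Ruby sketch of the DNA complement mapping shown in the listing.
# DNA_FROM, DNA_TO and both methods are hypothetical names.
DNA_FROM = 'atgcrymkdhvbswn'
DNA_TO   = 'tacgyrkmhdbvswn'

def forward_complement(seq)
  seq.tr(DNA_FROM, DNA_TO)
end

def reverse_complement(seq)
  # Reverse first, then complement each base.
  forward_complement(seq.reverse)
end

puts forward_complement('atgc')   # prints tacg
puts forward_complement('rykm')   # prints yrmk
puts reverse_complement('atgc')   # prints gcat
```

Applying forward_complement twice returns the original string, because every code in the table is mapped to a partner that maps back to it.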

#gc_content ⇒ `Object`

Calculate the ratio of GC / ATGC bases. U is regarded as T.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_content                       #=> (5/9)
puts s.gc_content.to_f                  #=> 0.5555555555555556

In older Ruby versions, Float is always returned.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_content                       #=> 0.555555555555556

Note that “u” is regarded as “t”. If there are no ATGC bases in the sequence, 0.0 is returned.

Returns: Rational or Float

# File 'lib/bio/sequence/na.rb', line 321

def gc_content
  count = self.composition
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  total = at + gc
  return 0.0 if total == 0
  return gc.quo(total)
end
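
Note that the denominator in the listing above is the number of A, T, U, G and C bases, not the sequence length, so ambiguous bases do not dilute the ratio. A plain-Ruby sketch of that behaviour, with gc_ratio as a hypothetical helper:

```ruby
# Plain-Ruby sketch of the GC ratio; gc_ratio is a hypothetical helper.
def gc_ratio(seq)
  gc = seq.count('gc')
  at = seq.count('atu')
  total = gc + at          # ambiguous bases such as 'n' are not counted
  return 0.0 if total == 0
  gc.quo(total)
end

puts gc_ratio('atggcgtga')                            # prints 5/9
puts format('%.2f', (gc_ratio('atggcgtga') * 100).to_f)  # prints 55.56
puts gc_ratio('atgnnn')                               # prints 1/3
```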

#gc_percent ⇒ `Object`

Calculate the ratio of GC / ATGC bases as a percentage truncated to a whole number (the calculation uses integer division, so 55.56 becomes 55, not 56). U is regarded as T.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_percent                       #=> 55

Note that this method returns only an integer value. When more decimal places are needed, combine gc_content with sprintf as shown below:

s = Bio::Sequence::NA.new('atggcgtga')
puts sprintf("%3.2f", s.gc_content * 100)  #=> "55.56"

Returns: Fixnum

# File 'lib/bio/sequence/na.rb', line 296

def gc_percent
  count = self.composition
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  return 0 if at + gc == 0
  gc = 100 * gc / (at + gc)
  return gc
end
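
The expression 100 * gc / (at + gc) in the listing above operates on Integers, so the fractional part is discarded. The sketch below contrasts that with true rounding; both method names are hypothetical.

```ruby
# Plain-Ruby sketch: integer division versus rounding.
# Both helpers are hypothetical and take the GC count and the ATGC total.
def percent_truncated(gc, total)
  100 * gc / total              # Integer division, as in the listing
end

def percent_rounded(gc, total)
  (100 * gc.quo(total)).round   # exact Rational, then round to nearest
end

# 'atggcgtga' has 5 G/C bases out of 9
puts percent_truncated(5, 9)    # prints 55
puts percent_rounded(5, 9)      # prints 56
```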

#gc_skew ⇒ `Object`

Calculate the ratio of (G - C) / (G + C) bases.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_skew                          #=> 3/5
puts s.gc_skew.to_f                     #=> 0.6

In older Ruby versions, Float is always returned.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_skew                          #=> 0.6

If there are no GC bases in the sequence, 0.0 is returned.

Returns: Rational or Float

# File 'lib/bio/sequence/na.rb', line 370

def gc_skew
  count = self.composition
  g = count['g']
  c = count['c']
  gc = g + c
  return 0.0 if gc == 0
  return (g - c).quo(gc)
end

#illegal_bases ⇒ `Object`

Returns an alphabetically sorted array of any non-standard bases (other than ‘atgcu’).

s = Bio::Sequence::NA.new('atgStgQccR')
puts s.illegal_bases                    #=> ["q", "r", "s"]

Returns: Array object



# File 'lib/bio/sequence/na.rb', line 411

def illegal_bases
  self.scan(/[^atgcu]/).sort.uniq
end
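
Because the pattern in the listing above accepts only a, t, g, c and u, IUPAC ambiguity codes and gap characters are reported as well. A plain-Ruby sketch of the same scan, with non_standard_bases as a hypothetical helper that lowercases its input the way the constructor does:

```ruby
# Plain-Ruby sketch; non_standard_bases is a hypothetical helper.
def non_standard_bases(seq)
  seq.downcase.scan(/[^atgcu]/).sort.uniq
end

p non_standard_bases('atgStgQccR')   # prints ["q", "r", "s"]
p non_standard_bases('atgnnn--')     # prints ["-", "n"]
```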

#molecular_weight ⇒ `Object`

Estimate molecular weight (using the values from BioPerl’s SeqStats.pm module).

s = Bio::Sequence::NA.new('atggcgtga')
puts s.molecular_weight                 #=> 2841.00708

RNA and DNA do not have the same molecular weights,

s = Bio::Sequence::NA.new('auggcguga')
puts s.molecular_weight                 #=> 2956.94708

Returns: Float object

# File 'lib/bio/sequence/na.rb', line 427

def molecular_weight
  if self.rna?
    Bio::NucleicAcid.weight(self, true)
  else
    Bio::NucleicAcid.weight(self)
  end
end

#names ⇒ `Object`

Generate the list of the names of each nucleotide along with the sequence (full name). The names are looked up in the hash returned by Bio::NucleicAcid.names.

s = Bio::Sequence::NA.new('atg')
puts s.names                    #=> ["Adenine", "Thymine", "Guanine"]

Returns: Array object

# File 'lib/bio/sequence/na.rb', line 458

def names
  array = []
  self.each_byte do |x|
    array.push(Bio::NucleicAcid.names[x.chr.upcase])
  end
  return array
end

#pikachu ⇒ `Object`

:nodoc:



# File 'lib/bio/sequence/compat.rb', line 86

def pikachu #:nodoc:
  self.dna.tr("atgc", "pika") # joke, of course :-)
end

#reverse_complement ⇒ `Object` Also known as: complement

Returns a new sequence object with the reverse complement sequence to the original. The original sequence is not modified.

s = Bio::Sequence::NA.new('atgc')
puts s.reverse_complement               #=> 'gcat'
puts s                                  #=> 'atgc'

Returns: new Bio::Sequence::NA object

# File 'lib/bio/sequence/na.rb', line 131

def reverse_complement
  s = self.class.new(self)
  s.reverse_complement!
  s
end

#reverse_complement! ⇒ `Object` Also known as: complement!

Converts the original sequence into its reverse complement.

The original sequence is modified.

s = Bio::Sequence::NA.new('atgc')
puts s.reverse_complement!              #=> 'gcat'
puts s                                  #=> 'gcat'

Returns: current Bio::Sequence::NA object (modified)

# File 'lib/bio/sequence/na.rb', line 145

def reverse_complement!
  self.reverse!
  self.forward_complement!
end

#rna ⇒ `Object`

Returns a new sequence object with any ‘t’ bases changed to ‘u’. The original sequence is not modified.

s = Bio::Sequence::NA.new('atgc')
puts s.rna                              #=> 'augc'
puts s                                  #=> 'atgc'

Returns: new Bio::Sequence::NA object



# File 'lib/bio/sequence/na.rb', line 498

def rna
  self.tr('t', 'u')
end

#rna! ⇒ `Object`

Changes any ‘t’ bases in the original sequence to ‘u’. The original sequence is modified.

s = Bio::Sequence::NA.new('atgc')
puts s.rna!                             #=> 'augc'
puts s                                  #=> 'augc'

Returns: current Bio::Sequence::NA object (modified)



# File 'lib/bio/sequence/na.rb', line 510

def rna!
  self.tr!('t', 'u')
end

#splicing(position) ⇒ `Object`

Alias of Bio::Sequence::Common splice method, documented there.

# File 'lib/bio/sequence/na.rb', line 82

def splicing(position) #:nodoc:
  mRNA = super
  if mRNA.rna?
    mRNA.tr!('t', 'u')
  else
    mRNA.tr!('u', 't')
  end
  mRNA
end

#to_midi(style = {}, drum = true) ⇒ `Object`

style:

Hash of :tempo, :scale, :tones

scale:

C  C# D  D# E  F  F# G  G# A  A#  B
0  1  2  3  4  5  6  7  8  9  10  11

tones:

Array of Hashes, each with :prog, :base, :range -- tone, vol? or len?, octaves

drum:

true (with rhythm part), false (without rhythm part)

# File 'lib/bio/shell/plugin/midi.rb', line 351

def to_midi(style = {}, drum = true)
  default = MidiTrack::Styles["Ichinose"]
  if style.is_a?(String)
    style = MidiTrack::Styles[style] || default
  end
  tempo = style[:tempo] || default[:tempo]
  scale = style[:scale] || default[:scale]
  tones = style[:tones] || default[:tones]

  track = []

  tones.each_with_index do |tone, i|
    ch = i
    ch += 1 if i >= 9         # skip rhythm track
    track.push MidiTrack.new(ch, tone[:prog], tone[:base], tone[:range], scale)
  end

  if drum
    rhythm = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    track.push(MidiTrack.new(9, 0, 35, 2, rhythm))
  end

  cur = 0
  window_search(4) do |s|
    track[cur % track.length].push(s)
    cur += 1
  end

  track.each do |t|
    t.push_silent(12)
  end

  ans = track[0].header(track.length, tempo)
  track.each do |t|
    ans += t.encode
  end
  return ans
end
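
Two details of the listing above are easy to miss: melodic tracks skip channel 9, which is used for the rhythm track, and the 4-base windows are dealt out to the tracks in round-robin order. The sketch below reproduces both on plain Ruby values. The helper names are hypothetical, and the sketch assumes that window_search(4) slides one base at a time.

```ruby
# Plain-Ruby sketch of two steps in to_midi; both helpers are hypothetical.

# Melodic track i keeps its index as channel, except that channel 9
# is left free for the rhythm track.
def melodic_channel(i)
  i >= 9 ? i + 1 : i
end

# Deal consecutive windows of `size` bases out to n_tracks buckets.
# Assumes the window advances by one base per step.
def distribute_windows(seq, n_tracks, size = 4)
  tracks = Array.new(n_tracks) { [] }
  seq.chars.each_cons(size).with_index do |window, cur|
    tracks[cur % n_tracks] << window.join
  end
  tracks
end

p (0..10).map { |i| melodic_channel(i) }
# prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11]
p distribute_windows('atggcgtga', 2)
# prints [["atgg", "ggcg", "cgtg"], ["tggc", "gcgt", "gtga"]]
```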

#to_re ⇒ `Object`

Create a ruby regular expression instance (Regexp)

s = Bio::Sequence::NA.new('atggcgtga')
puts s.to_re                            #=> /atggcgtga/

Returns: Regexp object

# File 'lib/bio/sequence/na.rb', line 442

def to_re
  if self.rna?
    Bio::NucleicAcid.to_re(self.dna, true)
  else
    Bio::NucleicAcid.to_re(self)
  end
end

#translate(frame = 1, table = 1, unknown = 'X') ⇒ `Object`

Translate into an amino acid sequence.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.translate                        #=> "MA*"

By default, translate starts in reading frame position 1, but you can start in either 2 or 3 as well,

puts s.translate(2)                     #=> "WR"
puts s.translate(3)                     #=> "GV"

You may also translate the reverse complement in one step by using frame values of -1, -2, and -3 (or 4, 5, and 6)

puts s.translate(-1)                    #=> "SRH"
puts s.translate(4)                     #=> "SRH"
puts s.reverse_complement.translate(1)  #=> "SRH"

The default codon table in the translate function is the Standard Eukaryotic codon table. The translate function takes either a number or a Bio::CodonTable object for its table argument. The available tables are (NCBI):

1. "Standard (Eukaryote)"
2. "Vertebrate Mitochondrial"
3. "Yeast Mitochondorial"
4. "Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma"
5. "Invertebrate Mitochondrial"
6. "Ciliate Macronuclear and Dasycladacean"
9. "Echinoderm Mitochondrial"
10. "Euplotid Nuclear"
11. "Bacteria"
12. "Alternative Yeast Nuclear"
13. "Ascidian Mitochondrial"
14. "Flatworm Mitochondrial"
15. "Blepharisma Macronuclear"
16. "Chlorophycean Mitochondrial"
21. "Trematode Mitochondrial"
22. "Scenedesmus obliquus mitochondrial"
23. "Thraustochytrium Mitochondrial"

If you are using anything other than the default table, you must specify frame in the translate method call,

puts s.translate                #=> "MA*"  (using defaults)
puts s.translate(1,1)           #=> "MA*"  (same as above, but explicit)
puts s.translate(1,2)           #=> "MAW"  (different codon table)

and using a Bio::CodonTable instance in the translate method call,

mt_table = Bio::CodonTable[2]
puts s.translate(1, mt_table)           #=> "MAW"

By default, any invalid or unknown codons (as could happen if the sequence contains ambiguities) will be represented by ‘X’ in the translated sequence. You may change this to any character of your choice.

s = Bio::Sequence::NA.new('atgcNNtga')
puts s.translate                        #=> "MX*"
puts s.translate(1,1,'9')               #=> "M9*"

The translate method considers gaps to be unknown characters and treats them as such (i.e. does not collapse sequences prior to translation), so

s = Bio::Sequence::NA.new('atgc--tga')
puts s.translate                        #=> "MX*"

Arguments:

(optional) frame: one of 1,2,3,4,5,6,-1,-2,-3 (default 1)
(optional) table: Fixnum in range 1,23 or Bio::CodonTable object (default 1)
(optional) unknown: Character (default ‘X’)

Returns: Bio::Sequence::AA object

# File 'lib/bio/sequence/na.rb', line 232

def translate(frame = 1, table = 1, unknown = 'X')
  if table.is_a?(Bio::CodonTable)
    ct = table
  else
    ct = Bio::CodonTable[table]
  end
  naseq = self.dna
  case frame
  when 1, 2, 3
    from = frame - 1
  when 4, 5, 6
    from = frame - 4
    naseq.complement!
  when -1, -2, -3
    from = -1 - frame
    naseq.complement!
  else
    from = 0
  end
  nalen = naseq.length - from
  nalen -= nalen % 3
  aaseq = naseq[from, nalen].gsub(/.{3}/) {|codon| ct[codon] or unknown}
  return Bio::Sequence::AA.new(aaseq)
end
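
The frame handling in the listing above can be followed on a plain String. This is a minimal sketch, not BioRuby's implementation: mini_translate and MINI_TABLE are hypothetical names, the table is only a ten-codon excerpt of the standard genetic code, and the reverse complement handles only a, t, g and c.

```ruby
# Plain-Ruby sketch of the frame arithmetic in translate.
# MINI_TABLE is a small excerpt of the standard genetic code.
MINI_TABLE = {
  'atg' => 'M', 'gcg' => 'A', 'tga' => '*', 'tgg' => 'W', 'cgt' => 'R',
  'ggc' => 'G', 'gtg' => 'V', 'tca' => 'S', 'cgc' => 'R', 'cat' => 'H'
}.freeze

def mini_translate(seq, frame = 1, unknown = 'X')
  naseq = seq.tr('u', 't')
  case frame
  when 1, 2, 3
    from = frame - 1
  when 4, 5, 6
    from = frame - 4
    naseq = naseq.reverse.tr('atgc', 'tacg')   # reverse complement
  when -1, -2, -3
    from = -1 - frame
    naseq = naseq.reverse.tr('atgc', 'tacg')   # reverse complement
  else
    from = 0
  end
  nalen = naseq.length - from
  nalen -= nalen % 3                           # drop a trailing partial codon
  # Codons missing from the table become the unknown character.
  naseq[from, nalen].gsub(/.{3}/) { |codon| MINI_TABLE[codon] || unknown }
end

puts mini_translate('atggcgtga')       # prints MA*
puts mini_translate('atggcgtga', 2)    # prints WR
puts mini_translate('atggcgtga', -1)   # prints SRH
puts mini_translate('atgc--tga')       # prints MX*
```

Frames 4 to 6 and -1 to -3 share the same offsets (0, 1, 2) on the reverse complement, which is why translate(4) and translate(-1) give the same result.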
