Class: Bio::Sequence::NA
- Includes: Common
- Defined in: lib/bio/sequence/na.rb, lib/bio/sequence/compat.rb, lib/bio/shell/plugin/midi.rb
Overview
TODO:
- add "Ohno" style
- add an accessor to the drum pattern
- add a new feature to select the music style (pop, trans, ryukyu, ...)
- what is the base?
Direct Known Subclasses
Defined Under Namespace
Classes: MidiTrack
Class Method Summary
- .randomize(*arg, &block) ⇒ Object
  Generate a new random sequence with the given frequency of bases.
Instance Method Summary
- #at_content ⇒ Object
  Calculate the ratio of AT / ATGC bases.
- #at_skew ⇒ Object
  Calculate the ratio of (A - T) / (A + T) bases.
- #codon_usage ⇒ Object
  Returns counts of each codon in the sequence in a hash.
- #cut_with_enzyme(*args) ⇒ Object (also: #cut_with_enzymes)
  Cuts the sequence with the given restriction enzyme(s) using Bio::RestrictionEnzyme::Analysis.cut.
- #dna ⇒ Object
  Returns a new sequence object with any ‘u’ bases changed to ‘t’.
- #dna! ⇒ Object
  Changes any ‘u’ bases in the original sequence to ‘t’.
- #forward_complement ⇒ Object
  Returns a new complementary sequence object (without reversing).
- #forward_complement! ⇒ Object
  Converts the current sequence into its complement (without reversing).
- #gc_content ⇒ Object
  Calculate the ratio of GC / ATGC bases.
- #gc_percent ⇒ Object
  Calculate the ratio of GC / ATGC bases as a percentage, rounded down to a whole number (integer division).
- #gc_skew ⇒ Object
  Calculate the ratio of (G - C) / (G + C) bases.
- #illegal_bases ⇒ Object
  Returns an alphabetically sorted array of any non-standard bases (other than ‘atgcu’).
- #initialize(str) ⇒ NA (constructor)
  Generate a nucleic acid sequence object from a string.
- #molecular_weight ⇒ Object
  Estimate molecular weight (using the values from BioPerl’s SeqStats.pm module).
- #names ⇒ Object
  Returns an array with the full name of each nucleotide in the sequence.
- #pikachu ⇒ Object
  Undocumented joke method: returns the DNA form of the sequence with a, t, g, c replaced by p, i, k, a.
- #reverse_complement ⇒ Object (also: #complement)
  Returns a new sequence object containing the reverse complement of the original sequence.
- #reverse_complement! ⇒ Object (also: #complement!)
  Converts the original sequence into its reverse complement.
- #rna ⇒ Object
  Returns a new sequence object with any ‘t’ bases changed to ‘u’.
- #rna! ⇒ Object
  Changes any ‘t’ bases in the original sequence to ‘u’.
- #splicing(position) ⇒ Object
  Alias of Bio::Sequence::Common splice method, documented there.
- #to_midi(style = {}, drum = true) ⇒ Object
  Converts the sequence into MIDI data. style: Hash of :tempo, :scale, :tones (or the name of a predefined style); drum: true (with rhythm part) or false (without rhythm part). See the method details for the :scale and :tones formats.
- #to_re ⇒ Object
  Create a Ruby regular expression (Regexp) instance from the sequence.
- #translate(frame = 1, table = 1, unknown = 'X') ⇒ Object
  Translate into an amino acid sequence.
Methods included from Common
#+, #<<, #composition, #concat, #normalize!, #randomize, #seq, #splice, #subseq, #to_fasta, #to_s, #total, #window_search
Methods inherited from String
#fill, #fold, #skip, #step, #to_aaseq, #to_naseq
Constructor Details
#initialize(str) ⇒ NA
Generate a nucleic acid sequence object from a string.
s = Bio::Sequence::NA.new("aagcttggaccgttgaagt")
or, if you have a nucleic acid sequence in a file:
s = Bio::Sequence::NA.new(File.read('dna.txt'))
Nucleic acid sequences are always stored in lowercase in BioRuby:
s = Bio::Sequence::NA.new("AAGcTtGG")
puts s #=> "aagcttgg"
Whitespace (spaces, tabs, newlines and carriage returns) is stripped from the sequence:
s = Bio::Sequence::NA.new("atg\nggg\ttt\r gc")
puts s #=> "atggggttgc"
Arguments:
- (required) str: String
Returns:
- Bio::Sequence::NA object
# File 'lib/bio/sequence/na.rb', line 77
def initialize(str)
  super
  self.downcase!
  self.tr!(" \t\n\r",'')
end
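As a rough standalone illustration of the normalization the constructor performs, the sketch below applies the same two steps to a plain Ruby String (no BioRuby needed). The helper name normalize_na is hypothetical. It uses String#delete to drop the same four characters that the source passes to tr!. Going by that character list, other whitespace such as form feeds or vertical tabs would be kept.

```ruby
# Hypothetical helper (not BioRuby API): lowercase the input, then remove
# spaces, tabs, newlines and carriage returns, the four characters listed
# in the constructor source above.
def normalize_na(str)
  str.downcase.delete(" \t\n\r")
end

p normalize_na("AAGcTtGG")            # => "aagcttgg"
p normalize_na("atg\nggg\ttt\r gc")   # => "atggggttgc"
p normalize_na("at\fg")               # => "at\fg" (form feed is not in the list)
```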
Class Method Details
.randomize(*arg, &block) ⇒ Object
Generate a new random sequence with the given frequency of bases. The sequence length is the sum of the counts. (See also Bio::Sequence::Common#randomize, which creates a new randomized sequence object using the base composition of an existing sequence instance.)
counts = {'a'=>1,'c'=>2,'g'=>3,'t'=>4}
puts Bio::Sequence::NA.randomize(counts) #=> "ggcttgttac" (for example)
You may also pass a block, which receives the generated bases one at a time:
actual_counts = {'a'=>0, 'c'=>0, 'g'=>0, 't'=>0}
Bio::Sequence::NA.randomize(counts) {|x| actual_counts[x] += 1}
actual_counts #=> {"a"=>1, "c"=>2, "g"=>3, "t"=>4}
Arguments:
- (optional) hash: Hash object
Returns:
- Bio::Sequence::NA object
# File 'lib/bio/sequence/compat.rb', line 87
def self.randomize(*arg, &block)
  self.new('').randomize(*arg, &block)
end
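The following standalone plain-Ruby sketch illustrates what randomize does with a counts hash. It assumes only the behavior described above: each base appears exactly as many times as its count, so the length is the sum of the counts, and the order is random. It does not reproduce BioRuby's internal shuffling. The method name random_sequence_from_counts is hypothetical. An explicit Random is passed so that runs can be made reproducible.

```ruby
# Hypothetical helper (not BioRuby API): a random sequence whose base
# composition matches the given counts exactly.
def random_sequence_from_counts(counts, rng = Random.new)
  # Expand {'a'=>1, 'c'=>2} into ["a", "c", "c"].
  pool = counts.flat_map { |base, n| [base] * Integer(n) }
  # Shuffle the multiset; the length is the sum of the counts.
  pool.shuffle(random: rng).join
end

counts = { 'a' => 1, 'c' => 2, 'g' => 3, 't' => 4 }
seq = random_sequence_from_counts(counts, Random.new(42))
p seq.length                  # => 10
p seq.chars.tally.sort.to_h   # => {"a"=>1, "c"=>2, "g"=>3, "t"=>4}
```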
Instance Method Details
#at_content ⇒ Object
# File 'lib/bio/sequence/na.rb', line 319
def at_content
  count = self.composition
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  return 0.0 if at + gc == 0
  return at.quo(at + gc)
end
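The content formulas in at_content and gc_content can be reproduced in plain Ruby as a sketch. It assumes a lowercase sequence, which Bio::Sequence::NA guarantees. Like the library code, it counts ‘u’ together with ‘t’, ignores every other character, and uses Integer#quo, so the result is an exact Rational rather than a Float. The function name base_content is hypothetical.

```ruby
# Hypothetical standalone helper mirroring at_content / gc_content.
# Returns [at_ratio, gc_ratio] as exact Rationals (0.0 when no ATGC bases).
def base_content(seq)
  count = Hash.new(0)
  seq.each_char { |c| count[c] += 1 }

  at = count['a'] + count['t'] + count['u']  # u is counted with t (RNA)
  gc = count['g'] + count['c']
  return [0.0, 0.0] if at + gc == 0

  [at.quo(at + gc), gc.quo(at + gc)]
end

at, gc = base_content('atggcgtga')
p at                # => (4/9)
p gc                # => (5/9)
p gc.to_f.round(3)  # => 0.556
```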
#at_skew ⇒ Object
# File 'lib/bio/sequence/na.rb', line 347
def at_skew
  count = self.composition
  a = count['a']
  t = count['t'] + count['u']
  return 0.0 if a + t == 0
  return (a - t).quo(a + t)
end
#codon_usage ⇒ Object
Returns counts of each codon in the sequence in a hash.
s = Bio::Sequence::NA.new('atggcgtga')
puts s.codon_usage #=> {"gcg"=>1, "tga"=>1, "atg"=>1}
This method does not validate codons: any three-letter group counts as a ‘codon’. So,
s = Bio::Sequence::NA.new('atggNNtga')
puts s.codon_usage #=> {"tga"=>1, "gnn"=>1, "atg"=>1}
s = Bio::Sequence::NA.new('atgg--tga')
puts s.codon_usage #=> {"tga"=>1, "g--"=>1, "atg"=>1}
Also, there is no option to work in any frame other than the first.
Returns:
- Hash object
# File 'lib/bio/sequence/na.rb', line 275
def codon_usage
  hash = Hash.new(0)
  self.window_search(3, 3) do |codon|
    hash[codon] += 1
  end
  return hash
end
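The counting in codon_usage (non-overlapping windows of 3, stepping by 3, in frame 1) can be sketched without BioRuby. In this sketch, String#scan with /.{3}/ stands in for window_search(3, 3). Both skip a trailing group of one or two letters, and neither checks whether the letters are valid bases. The name codon_counts is hypothetical.

```ruby
# Hypothetical standalone equivalent of codon_usage: count non-overlapping
# three-letter groups in frame 1. A trailing 1- or 2-letter remainder is
# ignored, and codons are not validated ("gnn" and "g--" are counted too).
def codon_counts(seq)
  counts = Hash.new(0)
  seq.scan(/.{3}/) { |codon| counts[codon] += 1 }
  counts
end

p codon_counts('atggcgtga').keys   # => ["atg", "gcg", "tga"]
p codon_counts('atggnntgaa').keys  # => ["atg", "gnn", "tga"] (final "a" dropped)
```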
#cut_with_enzyme(*args) ⇒ Object Also known as: cut_with_enzymes
# File 'lib/bio/sequence/na.rb', line 481
def cut_with_enzyme(*args)
  Bio::RestrictionEnzyme::Analysis.cut(self, *args)
end
#dna! ⇒ Object
# File 'lib/bio/sequence/na.rb', line 437
def dna!
  self.tr!('u', 't')
end
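The DNA/RNA conversions come down to single-character String#tr calls. The sketch below shows the copying (non-bang) and in-place forms on plain Strings, with hypothetical helper names. Ruby's String#tr! returns nil when nothing was replaced. Judging from the source above, dna! and rna! therefore appear to return nil rather than the sequence when there is nothing to convert, though the sequence is correct either way.

```ruby
# Hypothetical standalone equivalents of dna / rna; the originals are
# left unchanged, as with the non-bang BioRuby methods.
def to_dna(seq)
  seq.tr('u', 't')
end

def to_rna(seq)
  seq.tr('t', 'u')
end

rna = 'auggcguga'
p to_dna(rna)   # => "atggcgtga"
p rna           # => "auggcguga" (unchanged)

dna = 'atggcgtga'.dup
dna.tr!('t', 'u')                # in place, like rna!
p dna                            # => "auggcguga"
p 'atg'.dup.tr!('u', 't')        # => nil (nothing to replace)
```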
#forward_complement ⇒ Object
# File 'lib/bio/sequence/na.rb', line 102
def forward_complement
  s = self.class.new(self)
  s.forward_complement!
  s
end
#forward_complement! ⇒ Object
# File 'lib/bio/sequence/na.rb', line 116
def forward_complement!
  if self.rna?
    self.tr!('augcrymkdhvbswn', 'uacgyrkmhdbvswn')
  else
    self.tr!('atgcrymkdhvbswn', 'tacgyrkmhdbvswn')
  end
  self
end
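The tr tables in forward_complement! also complement IUPAC ambiguity codes: r and y swap, m and k swap, d and h swap, v and b swap, and s, w and n map to themselves. A minimal standalone sketch using the same tables follows. In this sketch the caller chooses the RNA table with a hypothetical rna: flag, whereas the library decides with its own rna? check.

```ruby
# Same translation tables as the BioRuby source above.
DNA_FROM = 'atgcrymkdhvbswn'
DNA_TO   = 'tacgyrkmhdbvswn'
RNA_FROM = 'augcrymkdhvbswn'
RNA_TO   = 'uacgyrkmhdbvswn'

# Hypothetical standalone forward complement (no reversing).
def forward_complement(seq, rna: false)
  rna ? seq.tr(RNA_FROM, RNA_TO) : seq.tr(DNA_FROM, DNA_TO)
end

p forward_complement('atgcrn')           # => "tacgyn"
p forward_complement('augc', rna: true)  # => "uacg"
```

Because every pair in the table swaps symmetrically, applying the complement twice returns the original sequence.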
#gc_content ⇒ Object
# File 'lib/bio/sequence/na.rb', line 305
def gc_content
  count = self.composition
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  return 0.0 if at + gc == 0
  return gc.quo(at + gc)
end
#gc_percent ⇒ Object
# File 'lib/bio/sequence/na.rb', line 290
def gc_percent
  count = self.composition
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  return 0 if at + gc == 0
  gc = 100 * gc / (at + gc)
  return gc
end
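gc_percent computes 100 * gc / (at + gc) with Integer operands. Ruby's Integer#/ floors, which for these non-negative counts simply drops the fractional part rather than rounding. The sketch below contrasts that with rounding the exact ratio. Both helper names are hypothetical, and the counts are passed in directly.

```ruby
# Integer division, as in the source: 55.55... becomes 55.
def gc_percent_truncated(gc, at)
  return 0 if at + gc == 0
  100 * gc / (at + gc)
end

# Exact Rational first, then round to the nearest integer: 56.
def gc_percent_rounded(gc, at)
  return 0 if at + gc == 0
  (100 * gc.quo(at + gc)).round
end

# 'atggcgtga' has 5 G/C and 4 A/T bases, i.e. 55.55...% GC.
p gc_percent_truncated(5, 4)  # => 55
p gc_percent_rounded(5, 4)    # => 56
```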
#gc_skew ⇒ Object
# File 'lib/bio/sequence/na.rb', line 333
def gc_skew
  count = self.composition
  g = count['g']
  c = count['c']
  return 0.0 if g + c == 0
  return (g - c).quo(g + c)
end
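gc_skew and at_skew share the same formula, (x - y) / (x + y), with a 0.0 fallback when both counts are zero. Below is a plain-Ruby sketch covering both. The helper name skew is hypothetical, and, as in at_skew, ‘u’ is counted together with ‘t’.

```ruby
# Hypothetical standalone helper: (x - y) / (x + y) as an exact Rational,
# or 0.0 when x + y == 0 (matching the early return in the source).
def skew(x, y)
  return 0.0 if x + y == 0
  (x - y).quo(x + y)
end

def gc_at_skew(seq)
  count = seq.each_char.tally
  count.default = 0                                   # absent bases count as 0
  [skew(count['g'], count['c']),                      # like gc_skew
   skew(count['a'], count['t'] + count['u'])]         # like at_skew
end

p gc_at_skew('atggcgtga')  # => [(3/5), (0/1)]
```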
#illegal_bases ⇒ Object
# File 'lib/bio/sequence/na.rb', line 362
def illegal_bases
  self.scan(/[^atgcu]/).sort.uniq
end
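The one-liner in illegal_bases works on any Ruby String. The sketch below wraps the same expression in a standalone function (hypothetical name) to show the result on a sequence with ambiguity codes and a gap. It assumes lowercase input, which the Bio::Sequence::NA constructor guarantees.

```ruby
# Same expression as the source: every character other than a, t, g, c, u,
# sorted and de-duplicated.
def illegal_bases(seq)
  seq.scan(/[^atgcu]/).sort.uniq
end

p illegal_bases('atgnnrgc-a')  # => ["-", "n", "r"]
p illegal_bases('augc')        # => []
```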
#molecular_weight ⇒ Object
Estimate molecular weight (using the values from BioPerl’s SeqStats.pm module).
s = Bio::Sequence::NA.new('atggcgtga')
puts s.molecular_weight #=> 2841.00708
RNA and DNA do not have the same molecular weights:
s = Bio::Sequence::NA.new('auggcguga')
puts s.molecular_weight #=> 2956.94708
Returns:
- Float object
# File 'lib/bio/sequence/na.rb', line 378
def molecular_weight
  if self.rna?
    Bio::NucleicAcid.weight(self, true)
  else
    Bio::NucleicAcid.weight(self)
  end
end
#names ⇒ Object
# File 'lib/bio/sequence/na.rb', line 409
def names
  array = []
  self.each_byte do |x|
    array.push(Bio::NucleicAcid.names[x.chr.upcase])
  end
  return array
end
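names upcases each character and looks it up in Bio::NucleicAcid.names. The sketch below shows the same lookup pattern with a small hypothetical table that covers only the five standard bases. The real BioRuby table and its exact spellings may differ. As with a Hash lookup, characters missing from the table yield nil.

```ruby
# Hypothetical name table: five standard bases only (not BioRuby's data).
NAMES = { 'A' => 'Adenine', 'T' => 'Thymine', 'G' => 'Guanine',
          'C' => 'Cytosine', 'U' => 'Uracil' }.freeze

# Same shape as the source: upcase each character, then look it up.
def base_names(seq)
  seq.each_char.map { |c| NAMES[c.upcase] }
end

p base_names('atgu')  # => ["Adenine", "Thymine", "Guanine", "Uracil"]
p base_names('an')    # => ["Adenine", nil]
```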
#pikachu ⇒ Object
# File 'lib/bio/sequence/compat.rb', line 91
def pikachu #:nodoc:
  self.dna.tr("atgc", "pika") # joke, of course :-)
end
#reverse_complement ⇒ Object Also known as: complement
# File 'lib/bio/sequence/na.rb', line 133
def reverse_complement
  s = self.class.new(self)
  s.reverse_complement!
  s
end
#reverse_complement! ⇒ Object Also known as: complement!
# File 'lib/bio/sequence/na.rb', line 147
def reverse_complement!
  self.reverse!
  self.forward_complement!
end
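reverse_complement! reverses the string and then applies the forward complement. Because tr works character by character, the two steps could be done in either order. A minimal standalone DNA-only sketch follows, using the IUPAC tr table from forward_complement!. The function name is hypothetical.

```ruby
# Hypothetical standalone reverse complement for DNA: reverse, then
# complement with the same tr table the BioRuby source uses.
def reverse_complement(seq)
  seq.reverse.tr('atgcrymkdhvbswn', 'tacgyrkmhdbvswn')
end

p reverse_complement('atggcgtga')  # => "tcacgccat"
p reverse_complement('gaattc')     # => "gaattc" (a palindromic site)
```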
#rna! ⇒ Object
# File 'lib/bio/sequence/na.rb', line 461
def rna!
  self.tr!('t', 'u')
end
#splicing(position) ⇒ Object
Alias of Bio::Sequence::Common splice method, documented there. The spliced result is then normalized to use ‘u’ if it is RNA and ‘t’ otherwise.
# File 'lib/bio/sequence/na.rb', line 84
def splicing(position) #:nodoc:
  mRNA = super
  if mRNA.rna?
    mRNA.tr!('t', 'u')
  else
    mRNA.tr!('u', 't')
  end
  mRNA
end
#to_midi(style = {}, drum = true) ⇒ Object
style:
Hash of :tempo, :scale, :tones
scale:
C C# D D# E F F# G G# A A# B
0 1 2 3 4 5 6 7 8 9 10 11
tones:
Hash of :prog, :base, :range -- tone, vol? or len?, octaves
drum:
true (with rhythm part), false (without rhythm part)
# File 'lib/bio/shell/plugin/midi.rb', line 351
def to_midi(style = {}, drum = true)
  default = MidiTrack::Styles["Ichinose"]
  if style.is_a?(String)
    style = MidiTrack::Styles[style] || default
  end

  tempo = style[:tempo] || default[:tempo]
  scale = style[:scale] || default[:scale]
  tones = style[:tones] || default[:tones]

  track = []
  tones.each_with_index do |tone, i|
    ch = i
    ch += 1 if i >= 9 # skip rhythm track
    track.push MidiTrack.new(ch, tone[:prog], tone[:base], tone[:range], scale)
  end
  if drum
    rhythm = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    track.push(MidiTrack.new(9, 0, 35, 2, rhythm))
  end

  cur = 0
  window_search(4) do |s|
    track[cur % track.length].push(s)
    cur += 1
  end
  track.each do |t|
    t.push_silent(12)
  end

  ans = track[0].header(track.length, tempo)
  track.each do |t|
    ans += t.encode
  end
  return ans
end
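Two steps of to_midi can be sketched without any MIDI encoding. First, tone tracks take channels 0, 1, 2, ... but skip channel 9, which the source reserves for the rhythm track. Second, 4-base windows from window_search(4) are dealt round-robin across all tracks, including the drum track when drum is true. This sketch assumes window_search's default step of 1, which gives overlapping windows. Both helper names are hypothetical.

```ruby
# Channel numbers for n tone tracks, skipping 9 as the source does.
def melody_channels(n_tones)
  (0...n_tones).map { |i| i >= 9 ? i + 1 : i }
end

# Deal overlapping windows (step 1) to n_tracks in round-robin order.
def deal_windows(seq, n_tracks, size = 4)
  tracks = Array.new(n_tracks) { [] }
  seq.chars.each_cons(size).each_with_index do |win, cur|
    tracks[cur % n_tracks] << win.join
  end
  tracks
end

p melody_channels(11)        # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11]
p deal_windows('atggcg', 2)  # => [["atgg", "ggcg"], ["tggc"]]
```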
#to_re ⇒ Object
# File 'lib/bio/sequence/na.rb', line 393
def to_re
  if self.rna?
    Bio::NucleicAcid.to_re(self.dna, true)
  else
    Bio::NucleicAcid.to_re(self)
  end
end
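Bio::NucleicAcid.to_re does the actual conversion, and its exact character classes are not shown here. The following is therefore only a sketch of the general idea: each IUPAC code becomes a character class of the bases it stands for. The table follows the standard IUPAC nucleotide codes, and the names IUPAC and iupac_to_re are hypothetical.

```ruby
# Hypothetical IUPAC-to-Regexp table (not BioRuby's actual data).
IUPAC = {
  'a' => 'a', 'c' => 'c', 'g' => 'g', 't' => 't',
  'r' => '[ag]', 'y' => '[ct]', 'm' => '[ac]', 'k' => '[gt]',
  's' => '[cg]', 'w' => '[at]', 'b' => '[cgt]', 'd' => '[agt]',
  'h' => '[act]', 'v' => '[acg]', 'n' => '[acgt]'
}.freeze

# Build a Regexp; raises KeyError on a character not in the table.
def iupac_to_re(pattern)
  Regexp.new(pattern.downcase.each_char.map { |c| IUPAC.fetch(c) }.join)
end

re = iupac_to_re('gaanttc')  # an example pattern containing an N
p re.source                  # => "gaa[acgt]ttc"
p 'ccgaagttcc' =~ re         # => 2
```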
#translate(frame = 1, table = 1, unknown = 'X') ⇒ Object
Translate into an amino acid sequence.
s = Bio::Sequence::NA.new('atggcgtga')
puts s.translate #=> "MA*"
By default, translate starts in reading frame 1, but you can also start in frame 2 or 3:
puts s.translate(2) #=> "WR"
puts s.translate(3) #=> "GV"
You may also translate the reverse complement in one step by using frame values of -1, -2, and -3 (or 4, 5, and 6):
puts s.translate(-1) #=> "SRH"
puts s.translate(4) #=> "SRH"
puts s.reverse_complement.translate(1) #=> "SRH"
The default codon table is the Standard (Eukaryote) table, NCBI table 1. The table argument accepts either an NCBI table number or a Bio::CodonTable object. The available tables (NCBI numbering) are:
1. "Standard (Eukaryote)"
2. "Vertebrate Mitochondrial"
3. "Yeast Mitochondrial"
4. "Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma"
5. "Invertebrate Mitochondrial"
6. "Ciliate Macronuclear and Dasycladacean"
9. "Echinoderm Mitochondrial"
10. "Euplotid Nuclear"
11. "Bacteria"
12. "Alternative Yeast Nuclear"
13. "Ascidian Mitochondrial"
14. "Flatworm Mitochondrial"
15. "Blepharisma Macronuclear"
16. "Chlorophycean Mitochondrial"
21. "Trematode Mitochondrial"
22. "Scenedesmus obliquus mitochondrial"
23. "Thraustochytrium Mitochondrial"
If you use anything other than the default table, you must also specify the frame, because table is the second positional argument:
puts s.translate #=> "MA*" (using defaults)
puts s.translate(1,1) #=> "MA*" (same as above, but explicit)
puts s.translate(1,2) #=> "MAW" (different codon table)
You can also pass a Bio::CodonTable instance as the table argument:
mt_table = Bio::CodonTable[2]
puts s.translate(1, mt_table) #=> "MAW"
By default, any invalid or unknown codons (as can happen if the sequence contains ambiguities) are represented by ‘X’ in the translated sequence. You may change this to any character of your choice:
s = Bio::Sequence::NA.new('atgcNNtga')
puts s.translate #=> "MX*"
puts s.translate(1,1,'9') #=> "M9*"
The translate method treats gaps as unknown characters (it does not collapse gaps before translation), so:
s = Bio::Sequence::NA.new('atgc--tga')
puts s.translate #=> "MX*"
Arguments:
- (optional) frame: one of 1, 2, 3, 4, 5, 6, -1, -2, -3 (default 1; any other value is treated as 1)
- (optional) table: an NCBI table number from the list above, or a Bio::CodonTable object (default 1)
- (optional) unknown: Character (default ‘X’)
Returns:
- Bio::Sequence::AA object
# File 'lib/bio/sequence/na.rb', line 234
def translate(frame = 1, table = 1, unknown = 'X')
  if table.is_a?(Bio::CodonTable)
    ct = table
  else
    ct = Bio::CodonTable[table]
  end
  naseq = self.dna
  case frame
  when 1, 2, 3
    from = frame - 1
  when 4, 5, 6
    from = frame - 4
    naseq.complement!
  when -1, -2, -3
    from = -1 - frame
    naseq.complement!
  else
    from = 0
  end
  nalen = naseq.length - from
  nalen -= nalen % 3
  aaseq = naseq[from, nalen].gsub(/.{3}/) {|codon| ct[codon] or unknown}
  return Bio::Sequence::AA.new(aaseq)
end
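The standalone plain-Ruby sketch below (no BioRuby) makes the frame arithmetic in translate concrete. It rests on several assumptions:
- It hard-codes only NCBI table 1, built from the standard 64-letter amino-acid string in TCAG codon order.
- The helper names translate and revcomp are hypothetical.
- Reverse frames complement only a, c, g and t.
- Unlike the source, it returns an empty string when the frame offset leaves no complete codon.

```ruby
BASES = 'tcag'
# NCBI table 1 amino acids, codons ordered t, c, a, g at each position.
STANDARD_AA = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

# {"ttt"=>"F", "ttc"=>"F", ...}; product varies the last base fastest.
CODONS = BASES.chars.product(BASES.chars, BASES.chars)
              .map(&:join).zip(STANDARD_AA.chars).to_h.freeze

def revcomp(seq)
  seq.reverse.tr('acgt', 'tgca')
end

def translate(seq, frame = 1, unknown = 'X')
  naseq = seq.downcase.tr('u', 't')   # like self.dna
  case frame
  when 1, 2, 3
    from = frame - 1
  when 4, 5, 6
    from = frame - 4
    naseq = revcomp(naseq)
  when -1, -2, -3
    from = -1 - frame
    naseq = revcomp(naseq)
  else
    from = 0                          # other values behave like frame 1
  end
  nalen = naseq.length - from
  nalen -= nalen % 3                  # drop a trailing partial codon
  return '' if nalen <= 0             # guard for very short sequences
  naseq[from, nalen].gsub(/.{3}/) { |codon| CODONS[codon] || unknown }
end

s = 'atggcgtga'
p translate(s)               # => "MA*"
p translate(s, 2)            # => "WR"
p translate(s, 3)            # => "GV"
p translate(s, -1)           # => "SRH"
p translate('atgcnntga')     # => "MX*"
```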