Class: Bio::Sequence::NA
- Includes:
- Common
- Defined in:
- lib/bio/sequence/na.rb,
lib/bio/sequence/compat.rb,
lib/bio/shell/plugin/midi.rb
Overview
TODO
- add "Ohno" style
- add an accessor to drum pattern
- add a new feature to select music style (pop, trans, ryukyu, ...)
- what is the base?
Direct Known Subclasses
Defined Under Namespace
Classes: MidiTrack
Class Method Summary
-
.randomize(*arg, &block) ⇒ Object
Generate a new random sequence with the given frequency of bases.
Instance Method Summary
-
#at_content ⇒ Object
Calculate the ratio of AT / ATGC bases.
-
#at_skew ⇒ Object
Calculate the ratio of (A - T) / (A + T) bases.
-
#codon_usage ⇒ Object
Returns counts of each codon in the sequence in a hash.
-
#cut_with_enzyme(*args) ⇒ Object
(also: #cut_with_enzymes)
Example:.
-
#dna ⇒ Object
Returns a new sequence object with any ‘u’ bases changed to ‘t’.
-
#dna! ⇒ Object
Changes any ‘u’ bases in the original sequence to ‘t’.
-
#forward_complement ⇒ Object
Returns a new complementary sequence object (without reversing).
-
#forward_complement! ⇒ Object
Converts the current sequence into its complement (without reversing).
-
#gc_content ⇒ Object
Calculate the ratio of GC / ATGC bases.
-
#gc_percent ⇒ Object
Calculate the ratio of GC / ATGC bases as a whole-number percentage (integer division, so the fractional part is truncated rather than rounded).
-
#gc_skew ⇒ Object
Calculate the ratio of (G - C) / (G + C) bases.
-
#illegal_bases ⇒ Object
Returns an alphabetically sorted array of any non-standard bases (other than ‘atgcu’).
-
#initialize(str) ⇒ NA
constructor
Generate a nucleic acid sequence object from a string.
-
#molecular_weight ⇒ Object
Estimate molecular weight (using the values from BioPerl’s SeqStats.pm module).
-
#names ⇒ Object
Generate a list of the full names of the nucleotides, in sequence order.
-
#pikachu ⇒ Object
:nodoc:.
-
#reverse_complement ⇒ Object
(also: #complement)
Returns a new sequence object with the reverse complement sequence to the original.
-
#reverse_complement! ⇒ Object
(also: #complement!)
Converts the original sequence into its reverse complement.
-
#rna ⇒ Object
Returns a new sequence object with any ‘t’ bases changed to ‘u’.
-
#rna! ⇒ Object
Changes any ‘t’ bases in the original sequence to ‘u’.
-
#splicing(position) ⇒ Object
Alias of Bio::Sequence::Common splice method, documented there.
-
#to_midi(style = {}, drum = true) ⇒ Object
style: Hash of :tempo, :scale, :tones. scale: pitch classes C C# D D# E F F# G G# A A# B, numbered 0 to 11. tones: Array of Hashes, each with :prog, :base, :range (tone, vol? or len?, octaves). drum: true (with rhythm part) or false (without rhythm part).
-
#to_re ⇒ Object
Create a Ruby regular expression instance (Regexp).
-
#translate(frame = 1, table = 1, unknown = 'X') ⇒ Object
Translate into an amino acid sequence.
Methods included from Common
#+, #<<, #composition, #concat, #normalize!, #randomize, #seq, #splice, #subseq, #to_fasta, #to_s, #total, #window_search
Methods inherited from String
#fill, #fold, #skip, #step, #to_aaseq, #to_naseq
Constructor Details
#initialize(str) ⇒ NA
Generate a nucleic acid sequence object from a string.
s = Bio::Sequence::NA.new("aagcttggaccgttgaagt")
or (if you have a nucleic acid sequence in a file)
s = Bio::Sequence::NA.new(File.read('dna.txt'))
Nucleic acid sequences are always lowercase in BioRuby
s = Bio::Sequence::NA.new("AAGcTtGG")
puts s #=> "aagcttgg"
Whitespace is stripped from the sequence
s = Bio::Sequence::NA.new("atg\nggg\ttt\r gc")
puts s #=> "atggggttgc"
Arguments:
-
(required) str: String
- Returns
-
Bio::Sequence::NA object
# File 'lib/bio/sequence/na.rb', line 75
def initialize(str)
  super
  self.downcase!
  self.tr!(" \t\n\r",'')
end
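The normalization performed by the constructor can be tried without BioRuby installed. The following is a minimal sketch on a plain String; `normalize_na` is a hypothetical helper name, not part of the library.

```ruby
# Hypothetical helper (not part of BioRuby): applies the same two steps as
# the constructor, lowercasing and removing spaces, tabs and line breaks.
def normalize_na(str)
  str.downcase.delete(" \t\n\r")
end

puts normalize_na("AAG cTt\nGG")   # prints aagcttgg
```

Unlike the constructor, this sketch returns a new String and leaves its argument unchanged.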
Class Method Details
.randomize(*arg, &block) ⇒ Object
Generate a new random sequence with the given frequency of bases. The sequence length is determined by their cumulative sum. (See also Bio::Sequence::Common#randomize which creates a new randomized sequence object using the base composition of an existing sequence instance).
counts = {'a'=>1,'c'=>2,'g'=>3,'t'=>4}
puts Bio::Sequence::NA.randomize(counts) #=> "ggcttgttac" (for example)
You may also feed the output of randomize into a block
actual_counts = {'a'=>0, 'c'=>0, 'g'=>0, 't'=>0}
Bio::Sequence::NA.randomize(counts) {|x| actual_counts[x] += 1}
actual_counts #=> {"a"=>1, "c"=>2, "g"=>3, "t"=>4}
Arguments:
-
(optional) hash: Hash object
- Returns
-
Bio::Sequence::NA object
# File 'lib/bio/sequence/compat.rb', line 82
def self.randomize(*arg, &block)
  self.new('').randomize(*arg, &block)
end
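To illustrate how a hash of base counts determines both the composition and the length of the result, here is a standard-library sketch. `random_sequence` is a hypothetical stand-in and does not reproduce BioRuby's own shuffling code.

```ruby
# Hypothetical stand-in for NA.randomize: expand the counts into a list of
# bases, shuffle the list, and join it back into a string.
def random_sequence(counts)
  bases = counts.flat_map { |base, n| [base] * n }
  bases.shuffle.join
end

counts = { 'a' => 1, 'c' => 2, 'g' => 3, 't' => 4 }
seq = random_sequence(counts)
puts seq          # order varies between runs
puts seq.length   # the sum of the counts
```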
Instance Method Details
#at_content ⇒ Object
# File 'lib/bio/sequence/na.rb', line 317
def at_content
  count = self.composition
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  return 0.0 if at + gc == 0
  return at.quo(at + gc)
end
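The same calculation can be sketched on a plain String, which makes the return types visible: `Integer#quo` yields an exact Rational, while the empty case returns the Float 0.0. `at_fraction` is a hypothetical name, and the counting hash below only approximates `composition`.

```ruby
# Hypothetical helper mirroring at_content on a plain String.
def at_fraction(seq)
  count = Hash.new(0)                      # missing bases count as 0
  seq.each_char { |base| count[base] += 1 }
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  return 0.0 if at + gc == 0               # no a/t/u/g/c at all
  at.quo(at + gc)                          # exact Rational, not a Float
end

p at_fraction('atgc')   # (1/2)
p at_fraction('nnnn')   # 0.0
```

Note that bases other than a, t, u, g and c are ignored in both the numerator and the denominator.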
#at_skew ⇒ Object
# File 'lib/bio/sequence/na.rb', line 345
def at_skew
  count = self.composition
  a = count['a']
  t = count['t'] + count['u']
  return 0.0 if a + t == 0
  return (a - t).quo(a + t)
end
#codon_usage ⇒ Object
Returns counts of each codon in the sequence in a hash.
s = Bio::Sequence::NA.new('atggcgtga')
puts s.codon_usage #=> {"gcg"=>1, "tga"=>1, "atg"=>1}
This method does not validate codons! Any three letter group is a ‘codon’. So,
s = Bio::Sequence::NA.new('atggNNtga')
puts s.codon_usage #=> {"tga"=>1, "gnn"=>1, "atg"=>1}
s = Bio::Sequence::NA.new('atgg--tga')
puts s.codon_usage #=> {"tga"=>1, "g--"=>1, "atg"=>1}
Also, there is no option to work in any frame other than the first.
- Returns
-
Hash object
# File 'lib/bio/sequence/na.rb', line 273
def codon_usage
  hash = Hash.new(0)
  self.window_search(3, 3) do |codon|
    hash[codon] += 1
  end
  return hash
end
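Because codon_usage always counts from the first base, one way to count codons in another frame is to drop the leading bases first. The sketch below does this on a plain String; `count_codons` and its `frame_offset` parameter are hypothetical and not part of BioRuby.

```ruby
# Hypothetical helper: count non-overlapping three-letter groups, optionally
# starting 1 or 2 bases into the sequence.
def count_codons(seq, frame_offset = 0)
  usage = Hash.new(0)
  tail = seq[frame_offset..-1] || ''       # nil when the offset is too large
  tail.scan(/.{3}/) { |codon| usage[codon] += 1 }
  usage                                    # trailing 1 or 2 bases are ignored
end

p count_codons('atggcgtga')      # {"atg"=>1, "gcg"=>1, "tga"=>1}
p count_codons('atggcgtga', 1)   # {"tgg"=>1, "cgt"=>1}
```

With a real sequence object, the same effect should be obtainable by calling codon_usage on a subsequence.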
#cut_with_enzyme(*args) ⇒ Object Also known as: cut_with_enzymes
# File 'lib/bio/sequence/na.rb', line 479
def cut_with_enzyme(*args)
  Bio::RestrictionEnzyme::Analysis.cut(self, *args)
end
#dna! ⇒ Object
# File 'lib/bio/sequence/na.rb', line 435
def dna!
  self.tr!('u', 't')
end
#forward_complement ⇒ Object
# File 'lib/bio/sequence/na.rb', line 100
def forward_complement
  s = self.class.new(self)
  s.forward_complement!
  s
end
#forward_complement! ⇒ Object
# File 'lib/bio/sequence/na.rb', line 114
def forward_complement!
  if self.rna?
    self.tr!('augcrymkdhvbswn', 'uacgyrkmhdbvswn')
  else
    self.tr!('atgcrymkdhvbswn', 'tacgyrkmhdbvswn')
  end
  self
end
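The DNA translation table used above can be exercised directly with `String#tr`. This sketch works on plain Strings and covers only the DNA branch; the function names are reused here for illustration and are standalone functions, not the library methods.

```ruby
# Same character mapping as the DNA branch of forward_complement!,
# including the IUPAC ambiguity codes (r<->y, m<->k, d<->h, v<->b).
DNA_FROM = 'atgcrymkdhvbswn'
DNA_TO   = 'tacgyrkmhdbvswn'

# Complement each base without changing the order.
def forward_complement(seq)
  seq.tr(DNA_FROM, DNA_TO)
end

# Reverse first, then complement, as reverse_complement! does.
def reverse_complement(seq)
  forward_complement(seq.reverse)
end

puts forward_complement('atggcgtga')   # taccgcact
puts reverse_complement('atggcgtga')   # tcacgccat
```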
#gc_content ⇒ Object
# File 'lib/bio/sequence/na.rb', line 303
def gc_content
  count = self.composition
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  return 0.0 if at + gc == 0
  return gc.quo(at + gc)
end
#gc_percent ⇒ Object
# File 'lib/bio/sequence/na.rb', line 288
def gc_percent
  count = self.composition
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  return 0 if at + gc == 0
  gc = 100 * gc / (at + gc)
  return gc
end
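Since `100 * gc / (at + gc)` operates on Integers, the result is truncated rather than rounded. The sketch below contrasts the two behaviors on a plain String; both helper names are hypothetical.

```ruby
# Hypothetical helper: count [gc, total] over the a/t/u/g/c bases only.
def gc_and_total(seq)
  gc = seq.count('gc')
  [gc, gc + seq.count('atu')]
end

# Integer division, as in gc_percent above: the fraction is dropped.
def gc_percent_truncated(seq)
  gc, total = gc_and_total(seq)
  total.zero? ? 0 : 100 * gc / total
end

# Alternative that rounds to the nearest whole number instead.
def gc_percent_rounded(seq)
  gc, total = gc_and_total(seq)
  total.zero? ? 0 : (100.0 * gc / total).round
end

p gc_percent_truncated('gca')   # 66
p gc_percent_rounded('gca')     # 67
```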
#gc_skew ⇒ Object
# File 'lib/bio/sequence/na.rb', line 331
def gc_skew
  count = self.composition
  g = count['g']
  c = count['c']
  return 0.0 if g + c == 0
  return (g - c).quo(g + c)
end
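Both skew methods share the form (x - y) / (x + y), so a single helper can sketch them. `skew` is a hypothetical function on plain Strings; its arguments are character sets as accepted by `String#count`.

```ruby
# Hypothetical helper: (plus - minus) / (plus + minus), where each argument
# names the set of characters to count.
def skew(seq, plus, minus)
  x = seq.count(plus)
  y = seq.count(minus)
  return 0.0 if x + y == 0
  (x - y).quo(x + y)      # exact Rational in the range -1..1
end

p skew('gggc', 'g', 'c')    # GC skew: (1/2)
p skew('attu', 'a', 'tu')   # AT skew, with u counted as t: (-1/2)
```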
#illegal_bases ⇒ Object
# File 'lib/bio/sequence/na.rb', line 360
def illegal_bases
  self.scan(/[^atgcu]/).sort.uniq
end
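As a quick illustration of what counts as non-standard, the same expression can be run on a plain String. Note that IUPAC ambiguity codes and gap characters are reported too. The function below is a standalone copy for demonstration, not the library method.

```ruby
# Standalone copy of the expression above, operating on a plain String.
def illegal_bases(seq)
  seq.scan(/[^atgcu]/).sort.uniq
end

p illegal_bases('atgnnryc-')   # ["-", "n", "r", "y"]
p illegal_bases('augc')        # []
```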
#molecular_weight ⇒ Object
Estimate molecular weight (using the values from BioPerl’s SeqStats.pm module).
s = Bio::Sequence::NA.new('atggcgtga')
puts s.molecular_weight #=> 2841.00708
RNA and DNA do not have the same molecular weights,
s = Bio::Sequence::NA.new('auggcguga')
puts s.molecular_weight #=> 2956.94708
- Returns
-
Float object
# File 'lib/bio/sequence/na.rb', line 376
def molecular_weight
  if self.rna?
    Bio::NucleicAcid.weight(self, true)
  else
    Bio::NucleicAcid.weight(self)
  end
end
#names ⇒ Object
# File 'lib/bio/sequence/na.rb', line 407
def names
  array = []
  self.each_byte do |x|
    array.push(Bio::NucleicAcid.names[x.chr.upcase])
  end
  return array
end
#pikachu ⇒ Object
:nodoc:
# File 'lib/bio/sequence/compat.rb', line 86
def pikachu #:nodoc:
  self.dna.tr("atgc", "pika") # joke, of course :-)
end
#reverse_complement ⇒ Object Also known as: complement
# File 'lib/bio/sequence/na.rb', line 131
def reverse_complement
  s = self.class.new(self)
  s.reverse_complement!
  s
end
#reverse_complement! ⇒ Object Also known as: complement!
# File 'lib/bio/sequence/na.rb', line 145
def reverse_complement!
  self.reverse!
  self.forward_complement!
end
#rna! ⇒ Object
# File 'lib/bio/sequence/na.rb', line 459
def rna!
  self.tr!('t', 'u')
end
#splicing(position) ⇒ Object
Alias of Bio::Sequence::Common splice method, documented there.
# File 'lib/bio/sequence/na.rb', line 82
def splicing(position) #:nodoc:
  mRNA = super
  if mRNA.rna?
    mRNA.tr!('t', 'u')
  else
    mRNA.tr!('u', 't')
  end
  mRNA
end
#to_midi(style = {}, drum = true) ⇒ Object
style:
Hash of :tempo, :scale, :tones
scale:
C  C#  D  D#  E  F  F#  G  G#  A  A#  B
0  1   2  3   4  5  6   7  8   9  10  11
tones:
Array of Hashes, each with :prog, :base, :range -- tone, vol? or len?, octaves
drum:
true (with rhythm part), false (without rhythm part)
# File 'lib/bio/shell/plugin/midi.rb', line 351
def to_midi(style = {}, drum = true)
  default = MidiTrack::Styles["Ichinose"]
  if style.is_a?(String)
    style = MidiTrack::Styles[style] || default
  end
  tempo = style[:tempo] || default[:tempo]
  scale = style[:scale] || default[:scale]
  tones = style[:tones] || default[:tones]

  track = []

  tones.each_with_index do |tone, i|
    ch = i
    ch += 1 if i >= 9 # skip rhythm track
    track.push MidiTrack.new(ch, tone[:prog], tone[:base], tone[:range], scale)
  end

  if drum
    rhythm = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    track.push(MidiTrack.new(9, 0, 35, 2, rhythm))
  end

  cur = 0
  window_search(4) do |s|
    track[cur % track.length].push(s)
    cur += 1
  end

  track.each do |t|
    t.push_silent(12)
  end

  ans = track[0].header(track.length, tempo)
  track.each do |t|
    ans += t.encode
  end
  return ans
end
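Two details of the listing above are easy to miss: tone indices from 9 upward are shifted by one so that channel 9 stays free for the rhythm track, and the four-base windows are dealt to the tracks in round-robin order. The sketch below reproduces just those two steps on plain data. `channel_for` and `deal_windows` are hypothetical names, and the window loop assumes window_search(4) advances one base at a time.

```ruby
# Hypothetical helper: MIDI channel for the i-th tone, skipping channel 9.
def channel_for(index)
  index >= 9 ? index + 1 : index
end

# Hypothetical helper: slide a window over the sequence one base at a time
# and hand each window to the next track in turn.
def deal_windows(seq, track_count, width = 4)
  tracks = Array.new(track_count) { [] }
  (0..seq.length - width).each_with_index do |start, cur|
    tracks[cur % track_count] << seq[start, width]
  end
  tracks
end

p channel_for(8)                # 8
p channel_for(9)                # 10
p deal_windows('atgcat', 2)     # [["atgc", "gcat"], ["tgca"]]
```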
#to_re ⇒ Object
# File 'lib/bio/sequence/na.rb', line 391
def to_re
  if self.rna?
    Bio::NucleicAcid.to_re(self.dna, true)
  else
    Bio::NucleicAcid.to_re(self)
  end
end
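The general idea of turning an ambiguous sequence into a regular expression can be sketched with the standard IUPAC nucleotide codes. This is an illustration only: the table and `ambiguity_regexp` are hypothetical, and the exact pattern produced by Bio::NucleicAcid.to_re may differ.

```ruby
# IUPAC ambiguity codes mapped to character classes (DNA alphabet).
# Assumption: this table is written out here for illustration and is not
# taken from BioRuby.
IUPAC_CLASSES = {
  'r' => '[ag]',  'y' => '[ct]',  'm' => '[ac]',  'k' => '[gt]',
  's' => '[cg]',  'w' => '[at]',  'b' => '[cgt]', 'd' => '[agt]',
  'h' => '[act]', 'v' => '[acg]', 'n' => '[acgt]'
}.freeze

# Hypothetical helper: expand each ambiguity code, keep other characters
# as literal matches.
def ambiguity_regexp(seq)
  pattern = seq.each_char.map { |b| IUPAC_CLASSES.fetch(b, Regexp.escape(b)) }
  Regexp.new(pattern.join)
end

re = ambiguity_regexp('atgnry')
p re.source               # "atg[acgt][ag][ct]"
p re.match?('atgcac')     # true
p re.match?('atgccc')     # false
```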
#translate(frame = 1, table = 1, unknown = 'X') ⇒ Object
Translate into an amino acid sequence.
s = Bio::Sequence::NA.new('atggcgtga')
puts s.translate #=> "MA*"
By default, translate starts in reading frame position 1, but you can start in either 2 or 3 as well,
puts s.translate(2) #=> "WR"
puts s.translate(3) #=> "GV"
You may also translate the reverse complement in one step by using frame values of -1, -2, and -3 (or 4, 5, and 6)
puts s.translate(-1) #=> "SRH"
puts s.translate(4) #=> "SRH"
puts s.reverse_complement.translate(1) #=> "SRH"
The default codon table in the translate function is the Standard Eukaryotic codon table. The translate function takes either a number or a Bio::CodonTable object for its table argument. The available tables are (NCBI):
1. "Standard (Eukaryote)"
2. "Vertebrate Mitochondrial"
3. "Yeast Mitochondorial"
4. "Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma"
5. "Invertebrate Mitochondrial"
6. "Ciliate Macronuclear and Dasycladacean"
9. "Echinoderm Mitochondrial"
10. "Euplotid Nuclear"
11. "Bacteria"
12. "Alternative Yeast Nuclear"
13. "Ascidian Mitochondrial"
14. "Flatworm Mitochondrial"
15. "Blepharisma Macronuclear"
16. "Chlorophycean Mitochondrial"
21. "Trematode Mitochondrial"
22. "Scenedesmus obliquus mitochondrial"
23. "Thraustochytrium Mitochondrial"
If you are using anything other than the default table, you must specify frame in the translate method call,
puts s.translate #=> "MA*" (using defaults)
puts s.translate(1,1) #=> "MA*" (same as above, but explicit)
puts s.translate(1,2) #=> "MAW" (different codon table)
and using a Bio::CodonTable instance in the translate method call,
mt_table = Bio::CodonTable[2]
puts s.translate(1, mt_table) #=> "MAW"
By default, any invalid or unknown codons (as could happen if the sequence contains ambiguities) will be represented by ‘X’ in the translated sequence. You may change this to any character of your choice.
s = Bio::Sequence::NA.new('atgcNNtga')
puts s.translate #=> "MX*"
puts s.translate(1,1,'9') #=> "M9*"
The translate method considers gaps to be unknown characters and treats them as such (i.e. does not collapse sequences prior to translation), so
s = Bio::Sequence::NA.new('atgc--tga')
puts s.translate #=> "MX*"
Arguments:
-
(optional) frame: one of 1,2,3,4,5,6,-1,-2,-3 (default 1)
-
(optional) table: Integer NCBI table number (one of the tables listed above, 1 to 23) or Bio::CodonTable object (default 1)
-
(optional) unknown: Character (default ‘X’)
- Returns
-
Bio::Sequence::AA object
# File 'lib/bio/sequence/na.rb', line 232
def translate(frame = 1, table = 1, unknown = 'X')
  if table.is_a?(Bio::CodonTable)
    ct = table
  else
    ct = Bio::CodonTable[table]
  end
  naseq = self.dna
  case frame
  when 1, 2, 3
    from = frame - 1
  when 4, 5, 6
    from = frame - 4
    naseq.complement!
  when -1, -2, -3
    from = -1 - frame
    naseq.complement!
  else
    from = 0
  end
  nalen = naseq.length - from
  nalen -= nalen % 3
  aaseq = naseq[from, nalen].gsub(/.{3}/) {|codon| ct[codon] or unknown}
  return Bio::Sequence::AA.new(aaseq)
end
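The frame handling in the listing above reduces to a small piece of arithmetic: a start offset, a flag for using the reverse complement, and a length trimmed to a multiple of three. The sketch below isolates that step; `frame_window` is a hypothetical helper and performs no translation.

```ruby
# Hypothetical helper: for a sequence of the given length and a frame
# argument, return [start offset, number of bases translated, whether the
# reverse complement is used].
def frame_window(length, frame)
  case frame
  when 1, 2, 3    then from, reverse = frame - 1, false
  when 4, 5, 6    then from, reverse = frame - 4, true
  when -1, -2, -3 then from, reverse = -1 - frame, true
  else                 from, reverse = 0, false   # unknown frames act like 1
  end
  nalen = length - from
  nalen -= nalen % 3          # drop an incomplete trailing codon
  [from, nalen, reverse]
end

p frame_window(9, 1)    # [0, 9, false]
p frame_window(9, 2)    # [1, 6, false]
p frame_window(9, -2)   # [1, 6, true]
p frame_window(9, 5)    # [1, 6, true]
```

This shows why frames 4, 5 and 6 give the same results as -1, -2 and -3.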