Class: Sequence
- Inherits:
- String (full chain: Object, String, Sequence)
- Defined in:
- lib/parse_fasta/sequence.rb
Overview
Provides methods for common tasks involving nucleotide sequences.
Instance Method Summary
- #base_counts(count_ambiguous_bases = nil) ⇒ Hash
  Returns a map of base counts.
- #base_frequencies(count_ambiguous_bases = nil) ⇒ Hash
  Returns a map of base frequencies.
- #gc ⇒ 0, Float
  Calculates GC content.
Instance Method Details
#base_counts(count_ambiguous_bases = nil) ⇒ Hash
Returns a map of base counts.
This method checks whether the sequence is DNA or RNA and returns a count map appropriate to each: the keys are :a, :c, and :g, plus :t for DNA or :u for RNA. Counting is case-insensitive. If the sequence contains neither T nor U, neither key appears. If a truthy argument is given, the count of ambiguous bases (N only) is included under :n.
If a sequence contains both T and U, the method warns the user and keeps going, returning a map with counts for both.
    # File 'lib/parse_fasta/sequence.rb', line 76

    def base_counts(count_ambiguous_bases=nil)
      s = self.downcase
      t = s.count('t')
      u = s.count('u')
      counts = { a: s.count('a'), c: s.count('c'), g: s.count('g') }
      if t > 0 && u == 0
        counts[:t] = t
      elsif t == 0 && u > 0
        counts[:u] = u
      elsif t > 0 && u > 0
        warn('ERROR: A sequence contains both T and U')
        counts[:t], counts[:u] = t, u
      end
      counts[:n] = s.count('n') if count_ambiguous_bases
      counts
    end
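The behaviour described above can be tried with a small stand-in. This is a sketch only: the Sequence class below is a condensed re-implementation based on the listing, defined here so the example runs without the gem installed, and the sample sequences are made up for illustration.

```ruby
# Stand-in for the documented class (an assumption for this example, not the
# gem itself). The branching is condensed but should behave like the listing.
class Sequence < String
  def base_counts(count_ambiguous_bases = nil)
    s = downcase
    counts = { a: s.count('a'), c: s.count('c'), g: s.count('g') }
    t = s.count('t')
    u = s.count('u')
    warn('ERROR: A sequence contains both T and U') if t > 0 && u > 0
    counts[:t] = t if t > 0
    counts[:u] = u if u > 0
    counts[:n] = s.count('n') if count_ambiguous_bases
    counts
  end
end

dna   = Sequence.new('ACGTTN')
rna   = Sequence.new('acguu')
mixed = Sequence.new('ATU')

dna_counts   = dna.base_counts        # a: 1, c: 1, g: 1, t: 2 (N is skipped)
dna_with_n   = dna.base_counts(true)  # same, plus n: 1
rna_counts   = rna.base_counts        # a: 1, c: 1, g: 1, u: 2
mixed_counts = mixed.base_counts      # warns on stderr; has both :t and :u
```

Because the :t and :u keys appear only when the corresponding base is present, callers that index the result directly may prefer `counts.fetch(:t, 0)` over `counts[:t]`.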
#base_frequencies(count_ambiguous_bases = nil) ⇒ Hash
Returns a map of base frequencies.
Counts bases with the base_counts method, then divides each count by the total number of bases counted to give the frequency of each base. DNA and RNA are distinguished in the same way as in base_counts. If a truthy argument is given, ambiguous bases are included in the total and their frequency is reported.
If the argument is omitted or falsy, ambiguous bases are excluded from the total base count and their frequency is not reported.
    # File 'lib/parse_fasta/sequence.rb', line 116

    def base_frequencies(count_ambiguous_bases=nil)
      base_counts = self.base_counts(count_ambiguous_bases)
      total_bases = base_counts.values.reduce(:+).to_f
      base_freqs = base_counts.map { |base, count| [base, count/total_bases] }.flatten
      Hash[*base_freqs]
    end
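The division step can be illustrated on a plain counts hash. In this sketch, `frequencies_from` is a hypothetical helper (not part of parse_fasta) that mirrors the arithmetic in the listing, and the counts are made-up values chosen so the totals divide evenly.

```ruby
# Hypothetical helper: converts a counts hash, shaped like the one
# base_counts returns, into a hash of frequencies.
def frequencies_from(counts)
  total = counts.values.sum.to_f
  counts.transform_values { |count| count / total }
end

# Ambiguous bases excluded: the total is 4, so each base is 1/4.
without_n = frequencies_from({ a: 1, c: 1, g: 1, t: 1 })
# a: 0.25, c: 0.25, g: 0.25, t: 0.25

# Ambiguous bases included: four Ns raise the total to 8, which halves
# every other frequency.
with_n = frequencies_from({ a: 1, c: 1, g: 1, t: 1, n: 4 })
# a: 0.125, c: 0.125, g: 0.125, t: 0.125, n: 0.5
```

One consequence of dividing by a Float total: when every count is zero (for example, an empty sequence), each frequency comes out as NaN rather than raising an error, so callers may want to check for empty input first.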
#gc ⇒ 0, Float
Calculates GC content.
Calculates GC content by dividing the count of G + C by the count of G + C + T + A + U. Returns 0 if the sequence contains none of these bases. If the Sequence contains both T and U, both are counted in the denominator and the result will be skewed, though that should not happen in practice. Ambiguous bases are ignored, similar to BioRuby.
    # File 'lib/parse_fasta/sequence.rb', line 41

    def gc
      s = self.downcase
      c = s.count('c')
      g = s.count('g')
      t = s.count('t')
      a = s.count('a')
      u = s.count('u')
      return 0 if c + g + t + a + u == 0
      return (c + g) / (c + g + t + a + u).to_f
    end
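As a worked example of the formula, the sketch below computes the same ratio with a standalone function. `gc_content` is a hypothetical name used for illustration; it is a condensed equivalent of the listing (using String#count with a multi-character set), not the library's own code.

```ruby
# Hypothetical standalone equivalent of Sequence#gc.
def gc_content(seq)
  s = seq.downcase
  gc = s.count('gc')        # characters that are g or c
  total = s.count('acgtu')  # only unambiguous bases are counted
  return 0 if total.zero?   # avoids dividing by zero
  gc / total.to_f
end

balanced = gc_content('ACGT')    # 2 / 4 = 0.5
with_n   = gc_content('GGCCNN')  # 4 / 4 = 1.0, the Ns are ignored
rna      = gc_content('aaggcu')  # 3 / 6 = 0.5
empty    = gc_content('')        # 0 (an Integer, matching the "0, Float" return type)
```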