Class: Mspire::Digester
- Inherits: Object
- Defined in: lib/mspire/digester.rb
Overview
A Digester splits a protein sequence into peptides at specified sites.
trypsin = Mspire::Digester[:trypsin]
trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG')
# => ['MIVIGR', 'SIVHPYITNEYEPFAAEK', 'QQILSIMAG']
With 1 missed cleavage:
trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => ['MIVIGR','MIVIGRSIVHPYITNEYEPFAAEK','SIVHPYITNEYEPFAAEK',
# 'SIVHPYITNEYEPFAAEKQQILSIMAG', 'QQILSIMAG']
Return the start and end sites of digestion:
trypsin.site_digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => [[0,6],[0,24],[6,24],[6,33],[24,33]]
Constant Summary
- MULTILINE_WHITESPACE = /\s*/m
Instance Attribute Summary
- #cleave_str ⇒ Object (readonly)
  A string of residues at which cleavage occurs.
- #cterm_cleavage ⇒ Object (readonly)
  True if cleavage occurs at the c-terminus of a cleavage residue, false if cleavage occurs at the n-terminus.
- #cterm_exception ⇒ Object (readonly)
  A c-terminal restriction residue which prevents cleavage at a potential cleavage site (optional).
- #name ⇒ Object (readonly)
  The name of the digester.
Class Method Summary
- .[](enzyme_name) ⇒ Object
  Takes the enzyme name as a symbol or string, in any letter case, and returns the matching enzyme constant (nil if none is found).
- .mascot_parse(str) ⇒ Object
  Utility method to parse a Mascot enzyme configuration string (tab separated) into a Digester.
Instance Method Summary
- #cleavage_sites(seq, offset = 0, length = seq.length-offset) ⇒ Object
  Returns digestion sites in sequence, as determined by the cleave_regexp boundaries.
- #digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ Object
  Returns an array of peptides produced by digesting sequence, allowing for missed cleavage sites.
- #initialize(name, cleave_str, cterm_exception = nil, cterm_cleavage = true) ⇒ Digester (constructor)
  A new instance of Digester.
- #site_digest(seq, max_misses = 0, offset = 0, length = seq.length-offset, &block) ⇒ Object
  Returns digestion sites of sequence as [start_index, end_index] pairs, allowing for missed cleavages.
Constructor Details
#initialize(name, cleave_str, cterm_exception = nil, cterm_cleavage = true) ⇒ Digester
Returns a new instance of Digester.
# File 'lib/mspire/digester.rb', line 41
def initialize(name, cleave_str, cterm_exception=nil, cterm_cleavage=true)
  regexp = []
  0.upto(cleave_str.length - 1) {|i| regexp << cleave_str[i, 1] }

  @name = name
  @cleave_str = cleave_str
  @cleave_regexp = Regexp.new(regexp.join('|'))
  @cterm_exception = case
  when cterm_exception == nil || cterm_exception.empty? then nil
  when cterm_exception.length == 1 then cterm_exception[0]
  else raise ArgumentError, "cterm exceptions must be a single residue: #{cterm_exception}"
  end

  @cterm_cleavage = cterm_cleavage
  @scanner = StringScanner.new('')
end
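The constructor does two things with its arguments: it turns the residue string into an alternation regexp, and it checks that the exception is at most one residue. The standalone sketch below mirrors those two steps without loading mspire. The helper names `build_cleave_regexp` and `normalize_cterm_exception` are hypothetical and exist only for this illustration, and a modern Ruby (1.9 or later, where `String#[]` returns a string) is assumed.

```ruby
# Hypothetical helpers mirroring the constructor logic; not part of mspire.

# Build an alternation such as /K|R/ from a residue string such as 'KR'.
def build_cleave_regexp(cleave_str)
  Regexp.new(cleave_str.chars.join('|'))
end

# Return nil for a missing exception, the residue itself for a
# single-character string, and raise for anything longer.
def normalize_cterm_exception(cterm_exception)
  return nil if cterm_exception.nil? || cterm_exception.empty?
  unless cterm_exception.length == 1
    raise ArgumentError, "cterm exceptions must be a single residue: #{cterm_exception}"
  end
  cterm_exception[0]
end

puts build_cleave_regexp('KR').source   # prints K|R
p normalize_cterm_exception('P')        # prints "P"
p normalize_cterm_exception('')         # prints nil
```

Passing a longer exception such as 'PQ' raises ArgumentError in this sketch, as it does in the constructor shown above.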
Instance Attribute Details
#cleave_str ⇒ Object (readonly)
A string of residues at which cleavage occurs.
# File 'lib/mspire/digester.rb', line 28
def cleave_str
  @cleave_str
end
#cterm_cleavage ⇒ Object (readonly)
True if cleavage occurs at the c-terminus of a cleavage residue, false if cleavage occurs at the n-terminus.
# File 'lib/mspire/digester.rb', line 37
def cterm_cleavage
  @cterm_cleavage
end
#cterm_exception ⇒ Object (readonly)
A c-terminal restriction residue which prevents cleavage at a potential cleavage site (optional).
# File 'lib/mspire/digester.rb', line 32
def cterm_exception
  @cterm_exception
end
#name ⇒ Object (readonly)
The name of the digester.
# File 'lib/mspire/digester.rb', line 25
def name
  @name
end
Class Method Details
.[](enzyme_name) ⇒ Object
Takes the enzyme name as a symbol or string, in any letter case, and returns the matching enzyme constant (nil if none is found).
# File 'lib/mspire/digester.rb', line 185
def [](enzyme_name)
  ENZYMES[ enzyme_name.to_s.downcase.gsub(/\W+/,'_').to_sym ]
end
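To make the name normalization concrete, here is a small standalone sketch of the key transformation. `enzyme_key` is a hypothetical helper, and `ENZYME_KEYS` is a made-up stand-in for the library's ENZYMES table; its entries are illustrative and are not taken from mspire.

```ruby
# Standalone sketch of the lookup-key normalization used by .[].
def enzyme_key(enzyme_name)
  # Lowercase the name and collapse each run of non-word characters to '_'.
  enzyme_name.to_s.downcase.gsub(/\W+/, '_').to_sym
end

# Illustrative entries only; a hypothetical stand-in for ENZYMES.
ENZYME_KEYS = { trypsin: 'KR', arg_c: 'R' }.freeze

p enzyme_key(:Trypsin)                        # prints :trypsin
p enzyme_key('Arg-C')                         # prints :arg_c
p ENZYME_KEYS[enzyme_key('no such enzyme')]   # prints nil
```

Because the table is a plain hash lookup, an unknown name yields nil instead of raising.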
.mascot_parse(str) ⇒ Object
Utility method to parse a Mascot enzyme configuration string (tab separated) into a Digester.
# File 'lib/mspire/digester.rb', line 191
def mascot_parse(str) # :nodoc:
  name, sense, cleave_str, cterm_exception, independent, semi_specific = str.split(/ *\t */)
  cterm_cleavage = case sense
  when 'C-Term' then true
  when 'N-Term' then false
  else raise ArgumentError, "unknown sense: #{sense}"
  end
  new(name, cleave_str, cterm_exception, cterm_cleavage)
end
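The field order that mascot_parse expects can be read from the destructuring assignment: name, sense, cleavage residues, c-terminal exception, then two fields that are read but not used. The sketch below reproduces the parsing step on its own. `parse_mascot_line` is a hypothetical helper that returns a Hash instead of a Digester so that it runs without mspire, and the sample line is invented for illustration.

```ruby
# Hypothetical standalone version of the parsing step in mascot_parse.
def parse_mascot_line(str)
  # Fields are tab separated; spaces around each tab are ignored.
  name, sense, cleave_str, cterm_exception, _independent, _semi_specific =
    str.split(/ *\t */)

  cterm_cleavage =
    case sense
    when 'C-Term' then true
    when 'N-Term' then false
    else raise ArgumentError, "unknown sense: #{sense}"
    end

  { name: name, cleave_str: cleave_str,
    cterm_exception: cterm_exception, cterm_cleavage: cterm_cleavage }
end

# The sample line is made up; real Mascot configuration lines may differ.
fields = parse_mascot_line("Trypsin \t C-Term \t KR \t P")
puts fields[:name]            # prints Trypsin
puts fields[:cterm_cleavage]  # prints true
```

A line whose second field is neither 'C-Term' nor 'N-Term' raises ArgumentError, matching the method above.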
Instance Method Details
#cleavage_sites(seq, offset = 0, length = seq.length-offset) ⇒ Object
Returns digestion sites in sequence, as determined by the cleave_regexp boundaries. The digestion sites are the positions where peptides begin and end, so for the returned array sites, [sites[n], sites[n+1] - sites[n]] is the [index, length] of peptide n.
d = Digester.new('Trypsin', 'KR', 'P')
seq = "AARGGR"
sites = d.cleavage_sites(seq) # => [0, 3, 6]
seq[sites[0], sites[0+1] - sites[0]] # => "AAR"
seq[sites[1], sites[1+1] - sites[1]] # => "GGR"
Trailing whitespace is included in the fragment.
seq = "AAR  \n  GGR"
sites = d.cleavage_sites(seq) # => [0, 8, 11]
seq[sites[0], sites[0+1] - sites[0]] # => "AAR  \n  "
seq[sites[1], sites[1+1] - sites[1]] # => "GGR"
The digested section of sequence may be specified using offset and length.
# File 'lib/mspire/digester.rb', line 81
def cleavage_sites(seq, offset=0, length=seq.length-offset)
  return [0, 1] if seq.size == 1 # adding exceptions is lame--algorithm should just work

  adjustment = cterm_cleavage ? 0 : 1
  limit = offset + length

  positions = [offset]
  pos = scan(seq, offset, limit) do |pos|
    positions << (pos - adjustment)
  end

  # add the final position
  if (pos < limit) || (positions.length == 1)
    positions << limit
  end
  # adding exceptions is lame.. this code probably needs to be
  # refactored (corrected).
  if !cterm_cleavage && pos == limit
    positions << limit
  end
  positions
end
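Since the scan helper used above is not shown on this page, the following is a simplified, hypothetical re-implementation meant only to illustrate what the returned sites look like. It assumes c-terminal cleavage, a single optional exception residue, and no whitespace handling, so it is not equivalent to the library method. `simple_cleavage_sites` is an invented name.

```ruby
# Simplified sketch: c-terminal cleavage only, no whitespace handling.
# Not the library's scan logic.
def simple_cleavage_sites(seq, cleave_str = 'KR', cterm_exception = 'P',
                          offset = 0, length = seq.length - offset)
  limit = offset + length
  sites = [offset]
  (offset...limit).each do |i|
    next unless cleave_str.include?(seq[i])
    boundary = i + 1
    next if boundary >= limit   # the final position is appended below
    next if cterm_exception && seq[boundary] == cterm_exception
    sites << boundary
  end
  sites << limit
end

p simple_cleavage_sites('AARGGR')   # prints [0, 3, 6]
p simple_cleavage_sites('AAKPGGR')  # prints [0, 7]; K before P is not cleaved
```

Restricting the digested section with offset works the same way: `simple_cleavage_sites('AARGGR', 'KR', 'P', 3)` returns `[3, 6]` in this sketch.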
#digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ Object
Returns an array of peptides produced by digesting sequence, allowing for missed cleavage sites. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.
# File 'lib/mspire/digester.rb', line 126
def digest(seq, max_misses=0, offset=0, length=seq.length-offset)
  site_digest(seq, max_misses, offset, length).map do |s, e|
    seq[s, e-s]
  end
end
#site_digest(seq, max_misses = 0, offset = 0, length = seq.length-offset, &block) ⇒ Object
Returns digestion sites of sequence as [start_index, end_index] pairs, allowing for missed cleavages. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.
Each [start_index, end_index] pair is yielded to the block, if given, and the collected results are returned.
# File 'lib/mspire/digester.rb', line 111
def site_digest(seq, max_misses=0, offset=0, length=seq.length-offset, &block) # :yields: start_index, end_index
  frag_sites = cleavage_sites(seq, offset, length)

  (frag_sites.length, max_misses, 1) do |start_index, end_index|
    start_index = frag_sites[start_index]
    end_index = frag_sites[end_index]

    block ? block.call(start_index, end_index) : [start_index, end_index]
  end
end
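One way to picture how missed cleavages turn a list of cleavage sites into [start_index, end_index] pairs is the sketch below. `pair_sites` is a hypothetical helper and not the iteration the library uses internally. It assumes that a peptide with k missed cleavages spans k internal sites, and it orders pairs by end index so that the result matches the Overview example.

```ruby
# Hypothetical helper: pair up cleavage sites so that each peptide spans
# at most max_misses internal cleavage sites.
def pair_sites(sites, max_misses = 0)
  pairs = []
  (1...sites.length).each do |end_i|
    first_start = [end_i - 1 - max_misses, 0].max
    (first_start...end_i).each do |start_i|
      pair = [sites[start_i], sites[end_i]]
      # Yield to the block if one is given, otherwise collect the pair.
      pairs << (block_given? ? yield(*pair) : pair)
    end
  end
  pairs
end

seq   = 'MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG'
sites = [0, 6, 24, 33]   # tryptic cleavage sites for seq

p pair_sites(sites, 1)
# prints [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]

# Mapping each pair to a substring gives the peptides, as #digest does.
p pair_sites(sites, 1) { |s, e| seq[s, e - s] }
```

With a block, each pair is replaced by the block's return value, which is how the second call produces peptide strings instead of index pairs.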