Class: Mspire::Digester

Inherits: Object

Defined in: lib/mspire/digester.rb

Overview

A Digester splits a protein sequence into peptides at specified sites.

trypsin = Mspire::Digester[:trypsin]

trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG')
# => ['MIVIGR', 'SIVHPYITNEYEPFAAEK', 'QQILSIMAG']

With 1 missed cleavage:

trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => ['MIVIGR','MIVIGRSIVHPYITNEYEPFAAEK','SIVHPYITNEYEPFAAEK', 
#     'SIVHPYITNEYEPFAAEKQQILSIMAG', 'QQILSIMAG']

Return the start and end sites of digestion:

trypsin.site_digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => [[0,6],[0,24],[6,24],[6,33],[24,33]]

Constant Summary

MULTILINE_WHITESPACE = /\s*/m

Instance Attribute Summary

#cleave_str ⇒ Object readonly

A string of residues at which cleavage occurs.

#cterm_cleavage ⇒ Object readonly

True if cleavage occurs at the c-terminus of a cleavage residue, false if cleavage occurs at the n-terminus.

#cterm_exception ⇒ Object readonly

A c-terminal restriction residue which prevents cleavage at a potential cleavage site (optional).

#name ⇒ Object readonly

The name of the digester.

Class Method Summary

.[](enzyme_name) ⇒ Object

Takes the name of the enzyme as a symbol or string, in any letter case, and looks up the matching constant (returns nil if none is found).

.mascot_parse(str) ⇒ Object

Utility method to parse a Mascot enzyme configuration string (tab separated) into a Digester.

Instance Method Summary

#cleavage_sites(seq, offset = 0, length = seq.length-offset) ⇒ Object

Returns digestion sites in sequence, as determined by the cleave_regexp boundaries.

#digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ Object

Returns an array of peptides produced by digesting sequence, allowing for missed cleavage sites.

#initialize(name, cleave_str, cterm_exception = nil, cterm_cleavage = true) ⇒ Digester constructor

A new instance of Digester.

#site_digest(seq, max_misses = 0, offset = 0, length = seq.length-offset, &block) ⇒ Object

Returns digestion sites of sequence as [start_index, end_index] pairs, allowing for missed cleavages.

Constructor Details

#initialize(name, cleave_str, cterm_exception = nil, cterm_cleavage = true) ⇒ `Digester`

Returns a new instance of Digester.

# File 'lib/mspire/digester.rb', line 41

def initialize(name, cleave_str, cterm_exception=nil, cterm_cleavage=true)
  regexp = []
  0.upto(cleave_str.length - 1) {|i| regexp << cleave_str[i, 1] }

  @name = name
  @cleave_str = cleave_str
  @cleave_regexp = Regexp.new(regexp.join('|'))
  @cterm_exception = case 
                     when cterm_exception == nil || cterm_exception.empty? then nil
                     when cterm_exception.length == 1 then cterm_exception[0]
                     else
                       raise ArgumentError, "cterm exceptions must be a single residue: #{cterm_exception}"
                     end

  @cterm_cleavage = cterm_cleavage
  @scanner = StringScanner.new('')
end
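
As a rough illustration of what the constructor derives from its arguments, the sketch below rebuilds the two computed values outside the class: the alternation regexp made from cleave_str and the normalized exception residue. The helper names build_cleave_regexp and normalize_exception are hypothetical and are not part of Mspire; the logic is copied from the listing above.

```ruby
# Hypothetical helpers (not part of Mspire) mirroring the constructor logic.

# One regexp branch per residue, e.g. 'KR' becomes /K|R/.
def build_cleave_regexp(cleave_str)
  Regexp.new(cleave_str.chars.join('|'))
end

# nil and '' both mean "no exception"; anything longer than one residue is rejected.
def normalize_exception(cterm_exception)
  return nil if cterm_exception.nil? || cterm_exception.empty?
  unless cterm_exception.length == 1
    raise ArgumentError, "cterm exceptions must be a single residue: #{cterm_exception}"
  end
  cterm_exception[0]
end

puts build_cleave_regexp('KR').source   # prints K|R
p normalize_exception('P')              # prints "P"
p normalize_exception('')               # prints nil
```

Passing a multi-residue exception such as 'PQ' raises ArgumentError, which matches the guard in the constructor.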

Instance Attribute Details

#cleave_str ⇒ `Object` (readonly)

A string of residues at which cleavage occurs.

# File 'lib/mspire/digester.rb', line 28

def cleave_str
  @cleave_str
end

#cterm_cleavage ⇒ `Object` (readonly)

True if cleavage occurs at the c-terminus of a cleavage residue, false if cleavage occurs at the n-terminus.

# File 'lib/mspire/digester.rb', line 37

def cterm_cleavage
  @cterm_cleavage
end

#cterm_exception ⇒ `Object` (readonly)

A c-terminal restriction residue which prevents cleavage at a potential cleavage site (optional).

# File 'lib/mspire/digester.rb', line 32

def cterm_exception
  @cterm_exception
end

#name ⇒ `Object` (readonly)

The name of the digester.

# File 'lib/mspire/digester.rb', line 25

def name
  @name
end

Class Method Details

.[](enzyme_name) ⇒ `Object`

Takes the name of the enzyme as a symbol or string, in any letter case, and looks up the matching constant (returns nil if none is found).

# File 'lib/mspire/digester.rb', line 185

def [](enzyme_name)
  ENZYMES[ enzyme_name.to_s.downcase.gsub(/\W+/,'_').to_sym ]
end
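
The lookup key is produced by a small normalization chain, which the sketch below runs on its own. ENZYME_KEYS_DEMO is a made-up stand-in for the library's ENZYMES table (its entries are placeholders, not a claim about which enzymes Mspire defines), and normalize_enzyme_name is a hypothetical helper name.

```ruby
# Made-up stand-in for ENZYMES; the values are placeholder strings, not Digesters.
ENZYME_KEYS_DEMO = { trypsin: 'placeholder A', lys_c: 'placeholder B' }.freeze

# Same chain as in the listing: stringify, downcase, squash non-word runs to '_', symbolize.
def normalize_enzyme_name(enzyme_name)
  enzyme_name.to_s.downcase.gsub(/\W+/, '_').to_sym
end

p normalize_enzyme_name(:Trypsin)   # prints :trypsin
p normalize_enzyme_name('Lys-C')    # prints :lys_c
p ENZYME_KEYS_DEMO[normalize_enzyme_name('no such enzyme')]  # prints nil
```

Because the result is a plain Hash lookup, an unrecognized name yields nil rather than raising.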

.mascot_parse(str) ⇒ `Object`

Utility method to parse a mascot enzyme configuration string (tab separated) into a Digester.

# File 'lib/mspire/digester.rb', line 191

def mascot_parse(str) # :nodoc:
  name, sense, cleave_str, cterm_exception, independent, semi_specific = str.split(/ *\t */)
  cterm_cleavage = case sense
                   when 'C-Term' then true
                   when 'N-Term' then false
                   else raise ArgumentError, "unknown sense: #{sense}"
                   end

  new(name, cleave_str, cterm_exception, cterm_cleavage)
end
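
To make the field handling concrete, here is a standalone sketch of the same parsing step that returns a Hash instead of constructing a Digester. The function name parse_mascot_fields and the sample line are assumptions made for illustration; the split pattern and the sense mapping are taken from the listing above.

```ruby
# Hypothetical standalone version of the parsing step (not part of Mspire).
def parse_mascot_fields(str)
  # Tabs separate fields; spaces around each tab are swallowed by the pattern.
  name, sense, cleave_str, cterm_exception, _independent, _semi_specific = str.split(/ *\t */)
  cterm_cleavage = case sense
                   when 'C-Term' then true
                   when 'N-Term' then false
                   else raise ArgumentError, "unknown sense: #{sense}"
                   end
  { name: name, cleave_str: cleave_str,
    cterm_exception: cterm_exception, cterm_cleavage: cterm_cleavage }
end

# Made-up input line with stray spaces around the tabs.
fields = parse_mascot_fields("Trypsin \t C-Term \t KR \t P")
puts fields[:name]            # prints Trypsin
puts fields[:cleave_str]      # prints KR
puts fields[:cterm_cleavage]  # prints true
```

The last two fields (independent, semi_specific) are read but unused, as in the original method.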

Instance Method Details

#cleavage_sites(seq, offset = 0, length = seq.length-offset) ⇒ `Object`

Returns digestion sites in sequence, as determined by the cleave_regexp boundaries. The digestion sites correspond to the positions where a peptide begins and ends, such that [sites[n], sites[n+1] - sites[n]] corresponds to the [index, length] for peptide n.

d = Digester.new('Trypsin', 'KR', 'P')
seq = "AARGGR"
sites = d.cleavage_sites(seq)                 # => [0, 3, 6]

seq[sites[0], sites[0+1] - sites[0]]          # => "AAR"
seq[sites[1], sites[1+1] - sites[1]]          # => "GGR"

Trailing whitespace is included in the fragment.

seq = "AAR  \n  GGR"
sites = d.cleavage_sites(seq)                 # => [0, 8, 11]

seq[sites[0], sites[0+1] - sites[0]]          # => "AAR  \n  "
seq[sites[1], sites[1+1] - sites[1]]          # => "GGR"

The digested section of sequence may be specified using offset and length.

# File 'lib/mspire/digester.rb', line 81

def cleavage_sites(seq, offset=0, length=seq.length-offset)
  return [0, 1] if seq.size == 1  # adding exceptions is lame--algorithm should just work

  adjustment = cterm_cleavage ? 0 : 1
  limit = offset + length

  positions = [offset]
  pos = scan(seq, offset, limit) do |pos|
    positions << (pos - adjustment)
  end

  # add the final position
  if (pos < limit) || (positions.length == 1)
    positions << limit
  end
  # adding exceptions is lame.. this code probably needs to be
  # refactored (corrected).
  if !cterm_cleavage && pos == limit
    positions << limit
  end
  positions
end
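
The private scan helper that cleavage_sites relies on is not shown on this page. The following is a deliberately simplified sketch of the general idea, assuming c-terminal cleavage only and ignoring offset, length and whitespace handling; it is not the library's implementation, and simple_cleavage_sites is a hypothetical name.

```ruby
require 'strscan'

# Simplified sketch (not Mspire's code): collect peptide boundaries for
# c-terminal cleavage, skipping sites followed by the exception residue.
def simple_cleavage_sites(seq, cleave_regexp, cterm_exception = nil)
  scanner = StringScanner.new(seq)
  positions = [0]
  while scanner.scan_until(cleave_regexp)
    pos = scanner.pos                     # index just past the cleavage residue
    next if pos == seq.length             # the final boundary is appended below
    next if cterm_exception && seq[pos] == cterm_exception
    positions << pos
  end
  positions << seq.length
  positions
end

p simple_cleavage_sites('AARGGR', /K|R/, 'P')    # prints [0, 3, 6]
p simple_cleavage_sites('AARPGGK', /K|R/, 'P')   # prints [0, 7]
```

In the second call the R at index 2 is followed by P, so no boundary is recorded there and the whole sequence stays one fragment.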

#digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ `Object`

Returns an array of peptides produced by digesting sequence, allowing for missed cleavage sites. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.

# File 'lib/mspire/digester.rb', line 126

def digest(seq, max_misses=0, offset=0, length=seq.length-offset)
  site_digest(seq, max_misses, offset, length).map do |s, e|
    seq[s, e-s]
  end
end
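
The slicing step in digest can be tried without the library by feeding it the site pairs from the Overview example. The helper name slice_peptides is hypothetical; the seq[s, e-s] expression is the one used in the method above.

```ruby
# Hypothetical helper (not part of Mspire): turn [start, end] pairs into substrings.
def slice_peptides(seq, site_pairs)
  site_pairs.map { |s, e| seq[s, e - s] }   # String#[] takes (start, length)
end

seq   = 'MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG'
pairs = [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]  # from the Overview example
puts slice_peptides(seq, pairs)
# prints, one per line:
#   MIVIGR
#   MIVIGRSIVHPYITNEYEPFAAEK
#   SIVHPYITNEYEPFAAEK
#   SIVHPYITNEYEPFAAEKQQILSIMAG
#   QQILSIMAG
```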

#site_digest(seq, max_misses = 0, offset = 0, length = seq.length-offset, &block) ⇒ `Object`

Returns digestion sites of sequence as [start_index, end_index] pairs, allowing for missed cleavages. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.

Each [start_index, end_index] pair is yielded to the block, if given, and the collected results are returned.

# File 'lib/mspire/digester.rb', line 111

def site_digest(seq, max_misses=0, offset=0, length=seq.length-offset, &block) # :yields: start_index, end_index
  frag_sites = cleavage_sites(seq, offset, length)

  overlay(frag_sites.length, max_misses, 1) do |start_index, end_index|
    start_index = frag_sites[start_index]
    end_index = frag_sites[end_index]

    block ? block.call(start_index, end_index) : [start_index, end_index]
  end  
end
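
The overlay helper used above is private and not listed here. One way to picture the pairing it has to produce is sketched below: each boundary is paired with the next boundary and with up to max_misses boundaries beyond it. This is an assumption-based reconstruction that reproduces the Overview output, not the library's code, and missed_cleavage_spans is a hypothetical name.

```ruby
# Hypothetical reconstruction (not Mspire's overlay): pair each boundary with
# the following 1..(max_misses + 1) boundaries, ordered by start then end.
def missed_cleavage_spans(sites, max_misses = 0)
  spans = []
  last = sites.length - 1
  (0...last).each do |start_i|
    furthest = [start_i + 1 + max_misses, last].min
    (start_i + 1).upto(furthest) do |end_i|
      spans << [sites[start_i], sites[end_i]]
    end
  end
  spans
end

p missed_cleavage_spans([0, 6, 24, 33], 0)
# prints [[0, 6], [6, 24], [24, 33]]
p missed_cleavage_spans([0, 6, 24, 33], 1)
# prints [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]
```

With max_misses set to 1 the result matches the site_digest example in the Overview.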
