Class: Molecules::Libraries::Polypeptide

Inherits:
EmpiricalFormula
Defined in:
lib/molecules/libraries/polypeptide.rb

Overview

Represents a polypeptide as a sequence of residues. For convenience, polypeptides may contain whitespace in their sequences (thus allowing direct use with parsed FASTA-formatted peptide sequences).

Currently, Polypeptide handles only sequences made up of common residues; any other non-whitespace character raises UnknownResidueError.

Defined Under Namespace

Classes: UnknownResidueError

Constant Summary

SEQUENCE_TOKENS =

An array of tokens that may occur in a sequence, grouped as patterns (i.e. one token for all whitespace characters, and one token for each residue). Used to count the occurrences of each type of residue in a sequence.

["\s\t\r\n"] + Residue.common.collect {|r| r.letter}
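The counting step can be sketched with String#count, which treats its argument as a set of characters. This is a minimal illustration, assuming Utils.count behaves roughly like the hypothetical count_tokens helper below; the three residue letters are an illustrative stand-in for Residue.common, not the real list.

```ruby
# Hypothetical stand-in for SEQUENCE_TOKENS: one token matching all
# whitespace characters, followed by one token per residue letter.
# The residue letters are illustrative, not the full common set.
TOKENS = [" \t\r\n"] + %w[A G S]

# Hypothetical helper (an assumption about how Utils.count might work):
# returns one count per token, in token order.
def count_tokens(string, tokens)
  tokens.collect { |token| string.count(token) }
end

counts = count_tokens("AG S\nGA", TOKENS)
whitespace = counts.shift # the first entry is the whitespace count

p whitespace # → 2
p counts     # → [2, 2, 1]
```

Because the whitespace token comes first, shifting it off leaves the remaining counts aligned with the residue order.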

Constants inherited from EmpiricalFormula

EmpiricalFormula::ELEMENT_INDEX, EmpiricalFormula::ELEMENT_INDEX_ORDER

Instance Attribute Summary

Attributes inherited from EmpiricalFormula

#formula

Class Method Summary

Instance Method Summary

Methods inherited from EmpiricalFormula

#*, #+, #-, #==, #each, mass, #mass, parse, parse_simple, #to_s

Methods included from Utils

add, count, multiply, round

Constructor Details

#initialize(sequence) ⇒ Polypeptide

Returns a new instance of Polypeptide.



# File 'lib/molecules/libraries/polypeptide.rb', line 36

def initialize(sequence)
  @sequence = sequence

  @length = 0
  @residue_composition = {}
  @formula = Array.new(5, 0)
  
  # count up the number of whitespaces and residues in self
  tokens = Utils.count(sequence, SEQUENCE_TOKENS)
  whitespace = tokens.shift

  if whitespace == sequence.length
    # as per the Base specification, factors
    # should have no trailing zeros
    @formula.clear
    return
  end

  # add the residue masses and factors
  Residue.common.each do |residue|
    # benchmarks indicated that counting for each residue
    # is quicker than trying anything like:
    #
    #   sequence.each_byte {|b| bytes[b] += 1}
    #
    # This is particularly an issue for long sequences.  The
    # count operation could be optimized for isobaric residues
    n = tokens.shift
    next if n == 0

    @length += n
    @residue_composition[residue] = n
    Utils.add(@formula, residue.formula, n)
  end

  if @length + whitespace != sequence.length
    # raise an error if there are unaccounted characters
    raise UnknownResidueError, "unknown characters in sequence: #{sequence}"
  end
end
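The validation logic above can be illustrated in isolation. The following is a simplified sketch, not the library's implementation: the tally helper, the top-level UnknownResidueError class, and the four-letter residue list are all hypothetical stand-ins, and formula accumulation is left out.

```ruby
# Hypothetical error class standing in for Polypeptide::UnknownResidueError.
class UnknownResidueError < StandardError; end

# Illustrative residue letters only; the real class iterates Residue.common.
RESIDUE_LETTERS = %w[A C G S].freeze

# Returns [length, composition] for a sequence. Whitespace is ignored, and
# any character that is neither whitespace nor a listed residue letter
# causes an error, mirroring the final check in the constructor.
def tally(sequence)
  whitespace = sequence.count(" \t\r\n")
  composition = {}
  length = 0

  RESIDUE_LETTERS.each do |letter|
    n = sequence.count(letter)
    next if n == 0 # residues that do not occur are left out of the hash

    length += n
    composition[letter] = n
  end

  if length + whitespace != sequence.length
    raise UnknownResidueError, "unknown characters in sequence: #{sequence}"
  end

  [length, composition]
end

p tally("AC GA\n") # → [4, {"A"=>2, "C"=>1, "G"=>1}]

begin
  tally("ACx")
rescue UnknownResidueError => e
  puts e.message # → unknown characters in sequence: ACx
end
```

The check works by subtraction: every character must be accounted for as either whitespace or a known residue, so no separate scan for invalid characters is needed.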

Instance Attribute Details

#length ⇒ Object (readonly)

The number of residues in self (may differ from sequence.length if sequence contains whitespace).



# File 'lib/molecules/libraries/polypeptide.rb', line 28

def length
  @length
end

#residue_composition ⇒ Object (readonly)

A hash of (Residue, Integer) pairs giving the count of each residue that occurs in self. Residues that do not occur are omitted.



# File 'lib/molecules/libraries/polypeptide.rb', line 24

def residue_composition
  @residue_composition
end

#sequence ⇒ Object (readonly)

The sequence of self, exactly as passed to the constructor (including any whitespace).



# File 'lib/molecules/libraries/polypeptide.rb', line 21

def sequence
  @sequence
end

Class Method Details

.normalize(sequence) ⇒ Object

Normalizes the input sequence by removing all whitespace and converting it to uppercase.



# File 'lib/molecules/libraries/polypeptide.rb', line 15

def normalize(sequence)
  sequence.gsub(/\s/, "").upcase
end
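A quick illustration of the effect, using the same expression as the method body in a standalone helper (the name normalize_sequence is hypothetical, chosen so the snippet runs without the library):

```ruby
# Same expression as the method body above, wrapped in a hypothetical
# standalone helper so it can be run without loading the library.
def normalize_sequence(sequence)
  sequence.gsub(/\s/, "").upcase
end

p normalize_sequence("acde fghik\n lmnp") # → "ACDEFGHIKLMNP"
```

Note that gsub returns a new string, so the input is left unmodified.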

Instance Method Details

#each_residue ⇒ Object

Passes each residue in sequence to the block, in order. Bytes with no entry in the residue index (such as whitespace) are skipped.



# File 'lib/molecules/libraries/polypeptide.rb', line 78

def each_residue
  residues = Residue.residue_index
  sequence.each_byte do |byte|
    residue = residues[byte]
    yield(residue) if residue
  end
end
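The byte-indexed lookup can be sketched on its own. This example is an assumption-laden illustration: RESIDUE_BY_BYTE and each_residue_name are hypothetical, residue names stand in for Residue objects, and the real Residue.residue_index may be built differently.

```ruby
# Hypothetical lookup table keyed by byte value. Residue names are used
# in place of Residue objects, and only three residues are listed.
RESIDUE_BY_BYTE = {
  "A".ord => "Alanine",
  "G".ord => "Glycine",
  "S".ord => "Serine"
}.freeze

# Yields the table entry for each byte that has one; whitespace and any
# other unmapped bytes are skipped silently.
def each_residue_name(sequence)
  sequence.each_byte do |byte|
    name = RESIDUE_BY_BYTE[byte]
    yield(name) if name
  end
end

names = []
each_residue_name("AG S\n") { |name| names << name }
p names # → ["Alanine", "Glycine", "Serine"]
```

Indexing by byte avoids allocating a one-character string per position, which is one plausible reason for iterating with each_byte rather than each_char.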