Class: Mspire::Ident::ProteinGroup
- Defined in:
- lib/mspire/ident/protein_group.rb
Overview
Represents a group of proteins, typically indistinguishable in the experiment.
Constant Summary
- PRIORITIZE_PROTEINS =
lambda do |protein_group_and_peptide_hits|
  peptide_hits = protein_group_and_peptide_hits.last
  num_uniq_aaseqs = peptide_hits.map {|hit| hit.aaseq }.uniq.size
  num_uniq_aaseqs_at_z = peptide_hits.map {|hit| [hit.aaseq, hit.charge] }.uniq.size
  [num_uniq_aaseqs, num_uniq_aaseqs_at_z, peptide_hits.size]
end
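To make the sort key concrete, here is a small sketch of what the lambda returns for one group. PriorityHit is a hypothetical stand-in for a peptide hit (only :aaseq and :charge are needed here), and the lambda is copied locally so the example runs without loading mspire.

```ruby
# Hypothetical stand-in for a peptide hit; not part of mspire.
PriorityHit = Struct.new(:aaseq, :charge)

# Local copy of PRIORITIZE_PROTEINS so the example is self-contained.
prioritize = lambda do |protein_group_and_peptide_hits|
  peptide_hits = protein_group_and_peptide_hits.last
  num_uniq_aaseqs = peptide_hits.map {|hit| hit.aaseq }.uniq.size
  num_uniq_aaseqs_at_z = peptide_hits.map {|hit| [hit.aaseq, hit.charge] }.uniq.size
  [num_uniq_aaseqs, num_uniq_aaseqs_at_z, peptide_hits.size]
end

hits = [
  PriorityHit.new('PEPTIDEK', 2),
  PriorityHit.new('PEPTIDEK', 3),
  PriorityHit.new('PEPTIDEK', 2), # same sequence and charge as the first hit
  PriorityHit.new('SEQK', 2),
]

# The lambda only looks at the last element of the pair, so any
# placeholder works for the protein group in this illustration.
key = prioritize.call([:some_protein_group, hits])
p key # prints [2, 3, 4]
```

Because Ruby compares arrays element by element, groups are ranked first by unique sequences, then by unique sequence and charge pairs, then by the raw number of hits.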
Instance Attribute Summary
- #peptide_hits ⇒ Object
Returns the value of attribute peptide_hits.
Class Method Summary
- .peptide_hits_to_protein_groups(peptide_hits, update_peptide_hits = false, &sort_by) ⇒ Object
Greedy algorithm to map a set of peptide_hits to protein groups.
Methods inherited from Array
Instance Attribute Details
#peptide_hits ⇒ Object
Returns the value of attribute peptide_hits.
# File 'lib/mspire/ident/protein_group.rb', line 8
def peptide_hits
  @peptide_hits
end
Class Method Details
.peptide_hits_to_protein_groups(peptide_hits, update_peptide_hits = false, &sort_by) ⇒ Object
Greedy algorithm to map a set of peptide_hits to protein groups. Each peptide hit should respond to :aaseq, :charge, and :proteins.

If a block is given, it receives a single argument: a doublet of protein_group and peptide set. The block should return a metric or array to sort by for creating greedy protein groups (the greediest proteins should sort to the back of the array). If no block is given, the groups are sorted by [# uniq aaseqs, # uniq aaseq+charge, # peptide_hits] (see PRIORITIZE_PROTEINS).

Sets of peptide_hits and the objects returned by peptide_hit#proteins are used as hash keys. As long as each peptide hit has a unique signature (like an id), any object will work. If they are Struct objects, you might consider redefining the #hash method to be object_id for performance and accuracy.
Returns an array of ProteinGroup objects, each with :peptide_hits set.
If update_peptide_hits is true, each peptide_hit is linked to the protein groups it is associated with through the :protein_groups= setter, which is called on the hit once per retained group with that group as the argument. A symbol can also be passed in, and that method will be called instead.
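The advice about Struct objects in the description can be illustrated with a short sketch. PlainHit and IdentityHit are hypothetical names, not part of mspire. The description only mentions redefining #hash; overriding #eql? to match is an addition made here so that equality and hashing stay consistent.

```ruby
require 'set'

# Hypothetical Struct-based hit with default, value-based hashing.
PlainHit = Struct.new(:aaseq, :charge, :proteins)

# Hypothetical variant that hashes by identity, as the description suggests.
IdentityHit = Struct.new(:aaseq, :charge, :proteins) do
  def hash
    object_id
  end

  # Assumption beyond the source text: keep eql? in agreement with hash.
  def eql?(other)
    equal?(other)
  end
end

# Two separate hits (e.g. two spectra) that happen to share all field values.
plain = [PlainHit.new('AAAK', 2, ['P1']), PlainHit.new('AAAK', 2, ['P1'])]
ident = [IdentityHit.new('AAAK', 2, ['P1']), IdentityHit.new('AAAK', 2, ['P1'])]

plain_count = Set.new(plain).size # value-based hashing merges the two hits
ident_count = Set.new(ident).size # identity-based hashing keeps both

puts "plain: #{plain_count}, identity: #{ident_count}" # prints "plain: 1, identity: 2"
```

With default Struct hashing, distinct hits that carry the same field values collapse into one set member, which is the accuracy problem the description alludes to. Hashing by identity also avoids recomputing a hash over every field.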
# File 'lib/mspire/ident/protein_group.rb', line 35
def self.peptide_hits_to_protein_groups(peptide_hits, update_peptide_hits=false, &sort_by)
  update_peptide_hits = 'protein_groups='.to_sym if (update_peptide_hits==true)
  sort_by ||= PRIORITIZE_PROTEINS
  # note to self: I wrote this in 2011, so I think I know what I'm doing now
  protein_to_peptides = Hash.new {|h,k| h[k] = Set.new }
  peptide_hits.each do |peptide_hit|
    peptide_hit.proteins.each do |protein|
      protein_to_peptides[protein] << peptide_hit
    end
  end
  peptides_to_protein_group = Hash.new {|h,k| h[k] = [] }
  protein_to_peptides.each do |protein, peptide_set|
    peptides_to_protein_group[peptide_set] << protein
  end
  peptides_to_protein_group.each do |pephits,ar_of_prots|
    pg = Mspire::Ident::ProteinGroup.new(ar_of_prots)
    pg.peptide_hits = pephits
    peptides_to_protein_group[pephits] = pg
  end
  protein_group_to_peptides = peptides_to_protein_group.invert
  greedy_first = protein_group_to_peptides.sort_by(&sort_by).reverse
  accounted_for = Set.new
  # we are discarding the subsumed sets, but we could get them with
  # partition
  greedy_first.select! do |group, peptide_set|
    has_an_unaccounted_peptide = false
    peptide_set.each do |peptide_hit|
      unless accounted_for.include?(peptide_hit)
        has_an_unaccounted_peptide = true
        accounted_for.add(peptide_hit)
      end
    end
    group.peptide_hits = peptide_set if has_an_unaccounted_peptide
    has_an_unaccounted_peptide
  end
  if update_peptide_hits
    greedy_first.each {|pg, pephits| pephits.each {|hit| hit.send(update_peptide_hits, pg) } }
  end
  greedy_first.map(&:first)
end