Class: HansenLab::Tasks::AlignMod

Inherits:
Tap::Tasks::TableTask
  • Object
Defined in:
lib/hansen_lab/tasks/align_mod.rb

Overview

align mascot peptide ids at a modification

Aligns Mascot peptide identifications along a modification boundary.

For example:

[input.txt]
AAAQAAA 0.0004000.0 first
QBBBBBQ 0.4000004.0 second

Becomes:

[input.align]
|AAA|Q|AAA|first|
|.|Q|BBBBBQ|second (1)|
|QBBBBB|Q|.|second (2)|

Extra fields can be present in the input file; they are carried forward to the result file. If a sequence has multiple modifications, each is listed on its own row in the result, as above.

Note that while the results don't look well aligned as plain text, they can be turned into a table by RedCloth (for example, by posting the result as a message to Basecamp). The output format can also be changed through the task's configs.
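The example above can be reproduced with a small standalone sketch. This is not the task itself: `align_rows`, `MOD_NUMBERS`, and `EMPTY_CELL` are hypothetical names, and the values (modification number 4, "." for an empty cell, "|" as the delimiter) are assumptions inferred from the example output rather than the task's documented defaults.

```ruby
# Standalone sketch; not part of HansenLab::Tasks::AlignMod.
# MOD_NUMBERS and EMPTY_CELL are assumed values chosen to match the example.
MOD_NUMBERS = [4]
EMPTY_CELL = "."

# Returns one formatted row per modified residue in seq.
def align_rows(seq, locator, label)
  # The locator looks like 0.NNNN.0, with one digit per residue.
  digits = locator[/\A\d\.(\d+)\.\d\z/, 1]
  unless digits && digits.length == seq.length
    raise ArgumentError, "bad locator: #{locator}"
  end

  # Indexes of the residues that carry a modification of interest.
  sites = (0...digits.length).select do |i|
    MOD_NUMBERS.include?(digits[i, 1].to_i)
  end

  sites.each_with_index.map do |site, n|
    # Number the label only when the sequence yields several rows.
    tag = sites.length > 1 ? "#{label} (#{n + 1})" : label
    cells = [seq[0...site], seq[site, 1], seq[(site + 1)..-1], tag]
    cells = cells.map { |cell| cell.empty? ? EMPTY_CELL : cell }
    "|#{cells.join('|')}|"
  end
end

puts align_rows("AAAQAAA", "0.0004000.0", "first")
puts align_rows("QBBBBBQ", "0.4000004.0", "second")
```

Under these assumptions the two calls print the three rows shown in [input.align].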

Instance Method Details

#format_row(data) ⇒ Object



# File 'lib/hansen_lab/tasks/align_mod.rb', line 37

def format_row(data)
  output_line_format % data.join(output_col_sep)
end
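As a rough illustration of what `format_row` does, the sketch below joins the cells with a separator and substitutes the result into a line format. `format_row_sketch` is a hypothetical stand-in, and the values `"|%s|"` and `"|"` are assumptions inferred from the example output in the overview; the task's actual config defaults are not shown on this page.

```ruby
# Hypothetical stand-in for #format_row with the configs passed explicitly.
# The default values are assumptions, not the task's documented defaults.
def format_row_sketch(data, line_format: "|%s|", col_sep: "|")
  # Join the cells first, then drop the joined string into the line format.
  line_format % data.join(col_sep)
end

puts format_row_sketch(["AAA", "Q", "AAA", "first"])
puts format_row_sketch(["AAA", "Q", "AAA", "first"], line_format: "%s", col_sep: "\t")
```

Changing the two values is enough to switch between a pipe-delimited row and, for instance, a tab-separated one.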

#process(target, source) ⇒ Object



# File 'lib/hansen_lab/tasks/align_mod.rb', line 41

def process(target, source)
    
  table = parse_table( File.read(source) )
  
  prepare(target) 
  File.open(target, "wb") do |file|
    
    # handle the header row.  Note that the headers need to be
    # moved around a little to conform to the output format.
    if header_row
      file.puts format_row(["", "", ""] + table.headers[2..-1])
    end
    
    sequence_locations = {}
    table.data.each do |line|
      seq, locator, *identifiers = line

      # the locator must look like 0.NNN.0, with one digit per residue in seq
      unless locator =~ /^\d\.(\d+)\.\d$/ && seq.length == $1.length
        raise "could not split line correctly: #{line}"
      end
      
      locator = $1
      locations = sequence_locations[seq] ||= []
      split_locations = [] 
      0.upto(locator.length-1) do |index|
        if mod_numbers.include?(locator[index, 1].to_i)
          unless locations.include?(index)
            split_locations << index
            locations << index
          end
        end
      end
      
      split_locations.each_with_index do |location, index|
        data = [
          seq[0...location],
          seq[location, 1],
          seq[location+1..-1],
        ] 
        data += identifiers if index == 0
        
        # modify the lead identifier to note duplicates
        data[3] = "#{identifiers[0]} (#{index + 1})" if split_locations.length > 1
        data.collect! {|str| str.empty? ? output_empty_cell : str }
        
        file.puts format_row(data)
      end
      
    end
  end
  
  target
end
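One consequence of the `sequence_locations` bookkeeping in the listing above is that a sequence seen again with an already-recorded modification site produces no further rows. The sketch below isolates that step; `new_split_sites` is a hypothetical helper, and the default modification number 4 is an assumption taken from the overview example.

```ruby
# Hypothetical helper mirroring the per-sequence bookkeeping in #process.
# seen maps each sequence to the site indexes already emitted for it.
def new_split_sites(seen, seq, digits, mod_numbers = [4])
  known = (seen[seq] ||= [])
  fresh = []
  digits.each_char.with_index do |ch, i|
    next unless mod_numbers.include?(ch.to_i)
    next if known.include?(i) # this site was already reported for seq
    fresh << i
    known << i
  end
  fresh
end

seen = {}
p new_split_sites(seen, "QBBBBBQ", "4000004") # first occurrence: [0, 6]
p new_split_sites(seen, "QBBBBBQ", "4000000") # repeat of a known site: []
p new_split_sites(seen, "AAAQAAA", "0004000") # different sequence: [3]
```

Only the sites returned by the helper would be written out, so identical identifications listed twice in the input collapse to a single set of rows.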