Class: HansenLab::Tasks::AlignMod
- Inherits: Tap::Tasks::TableTask
  - Object
  - Tap::Tasks::TableTask
  - HansenLab::Tasks::AlignMod
- Defined in: lib/hansen_lab/tasks/align_mod.rb
Overview
:startdoc::manifest align mascot peptide ids at a modification
Aligns Mascot peptide identifications along a modification boundary.
For example:
[input.txt]
AAAQAAA 0.0004000.0 first
QBBBBBQ 0.4000004.0 second
Becomes:
[input.align]
|AAA|Q|AAA|first|
|.|Q|BBBBBQ|second (1)|
|QBBBBB|Q|.|second (2)|
Extra fields can be present in the input file; they are carried forward to the result file. If a sequence has multiple modifications, each is listed in the result, as above.
The raw output does not look well aligned as plain text, but it is Textile table markup, so RedCloth can render it as a table (for example, by posting the result as a message to Basecamp). You can also change the output format with the task configs.
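The example above can be reproduced with a minimal standalone sketch. It does not use Tap or the task itself: `align_sequence` and the four constants are hypothetical stand-ins, and their values (modification number 4, `|` as separator, `.` for empty cells) are assumptions inferred from the example, not confirmed defaults.

```ruby
# Hypothetical stand-ins for the task configs. The values are assumptions
# inferred from the example output, not the task's documented defaults.
MOD_NUMBERS = [4]
COL_SEP     = "|"
LINE_FORMAT = "|%s|"
EMPTY_CELL  = "."

# Split one sequence at every position whose locator digit is a
# modification number, returning one formatted row per split.
def align_sequence(seq, locator, label)
  digits = locator[/\A\d\.(\d+)\.\d\z/, 1]
  unless digits && digits.length == seq.length
    raise ArgumentError, "bad locator: #{locator}"
  end

  positions = (0...digits.length).select { |i| MOD_NUMBERS.include?(digits[i].to_i) }
  positions.each_with_index.map do |pos, n|
    # number the label only when the sequence is split more than once
    name  = positions.length > 1 ? "#{label} (#{n + 1})" : label
    cells = [seq[0...pos], seq[pos], seq[pos + 1..-1], name]
    cells = cells.map { |c| c.empty? ? EMPTY_CELL : c }
    LINE_FORMAT % cells.join(COL_SEP)
  end
end

puts align_sequence("AAAQAAA", "0.0004000.0", "first")
puts align_sequence("QBBBBBQ", "0.4000004.0", "second")
```

Under these assumptions the two calls print the three rows shown in [input.align].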
Instance Method Summary
- #format_row(data) ⇒ Object
- #process(target, source) ⇒ Object
Instance Method Details
#format_row(data) ⇒ Object
# File 'lib/hansen_lab/tasks/align_mod.rb', line 37

def format_row(data)
  output_line_format % data.join(output_col_sep)
end
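To see how the two values read by `format_row` shape a row, here is a small sketch outside of Tap. `RowFormatter` is a hypothetical holder for `output_line_format` and `output_col_sep`, and the values passed in are illustrative assumptions rather than the task's actual defaults.

```ruby
# Hypothetical holder for the two settings that format_row reads.
RowFormatter = Struct.new(:output_line_format, :output_col_sep) do
  # Same body as the task method: join the cells, then wrap the line.
  def format_row(data)
    output_line_format % data.join(output_col_sep)
  end
end

textile = RowFormatter.new("|%s|", "|")   # assumed Textile-style settings
tsv     = RowFormatter.new("%s", "\t")    # assumed tab-separated alternative

puts textile.format_row(%w[AAA Q AAA first])  # |AAA|Q|AAA|first|
puts tsv.format_row(%w[AAA Q AAA first])
```

Because the whole line goes through a single `%s`, changing the separator and the line format is enough to switch between table markup and plain delimited text.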
#process(target, source) ⇒ Object
# File 'lib/hansen_lab/tasks/align_mod.rb', line 41

def process(target, source)
  table = parse_table( File.read(source) )

  prepare(target)
  File.open(target, "wb") do |file|

    # handle the header row. Note that the headers need to be
    # moved around a little to conform to the output format.
    if header_row
      file.puts format_row(["", "", ""] + table.headers[2..-1])
    end

    sequence_locations = {}
    table.data.each do |line|
      seq, locator, *identifiers = line

      # checks
      unless locator =~ /^\d\.(\d+)\.\d$/ && seq.length == $1.length
        raise "could not split line correctly: #{line}"
      end
      locator = $1

      locations = sequence_locations[seq] ||= []
      split_locations = []
      0.upto(locator.length-1) do |index|
        if mod_numbers.include?(locator[index, 1].to_i)
          unless locations.include?(index)
            split_locations << index
            locations << index
          end
        end
      end

      split_locations.each_with_index do |location, index|
        data = [
          seq[0...location],
          seq[location, 1],
          seq[location+1..-1],
        ]
        data += identifiers if index == 0

        # modify the lead identifier to note duplicates
        data[3] = "#{identifiers[0]} (#{index + 1})" if split_locations.length > 1

        data.collect! {|str| str.empty? ? output_empty_cell : str }

        file.puts format_row(data)
      end
    end
  end

  target
end
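Two details of `process` are easy to miss: the locator must carry exactly one digit per residue, and a split position already reported for a sequence is not reported again on a later row. The sketch below isolates that logic. `new_split_positions` and `LOCATOR` are hypothetical names, and the default modification number 4 is an assumption taken from the overview example.

```ruby
# Pattern copied from process: one digit, a dot, one digit per residue,
# a dot, one digit.
LOCATOR = /^\d\.(\d+)\.\d$/

# Return the split positions that are new for this row. `seen` maps each
# sequence to the positions already reported, like sequence_locations.
# The mod_numbers default is an assumption, not a confirmed config value.
def new_split_positions(seq, locator, seen, mod_numbers = [4])
  match = LOCATOR.match(locator)
  unless match && match[1].length == seq.length
    raise "could not split line correctly: #{[seq, locator].inspect}"
  end

  known = (seen[seq] ||= [])
  fresh = (0...seq.length).select do |i|
    mod_numbers.include?(match[1][i].to_i) && !known.include?(i)
  end
  known.concat(fresh)
  fresh
end

seen = {}
p new_split_positions("QBBBBBQ", "0.4000004.0", seen)  # [0, 6]
p new_split_positions("QBBBBBQ", "0.4000000.0", seen)  # [] (position 0 already reported)
p new_split_positions("QBBBBBQ", "0.0040000.0", seen)  # [2]
```

A row that yields no new positions produces no output lines, which is how repeated identifications of the same modified peptide collapse in the result.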