Class: Ms::Xcalibur::Convert::RawToDta

Inherits:
Tap::Tasks::FileTask
  • Object
Defined in:
lib/ms/xcalibur/convert/raw_to_dta.rb

Overview

:startdoc::task convert RAW files to dta format

Converts a RAW file to dta files using extract_msn.exe. Returns an array of the output dta files. By default, extracted files are put in a directory named after the RAW file, but an alternate extraction directory can be specified with the output-dir option.

RawToDta will skip extraction if an ‘lcq_dta.txt’ file and all the dta files listed therein exist in the output directory. This is good in most cases; if you want to force execution, set force to true for the run:

% tap run --force -- raw_to_dta ...
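
The skip rule can be pictured with a small standalone sketch. This is not the task's own code: `extraction_current?` is a hypothetical helper, and it assumes that "up to date" means every listed dta file exists and is newer than the RAW file, which may not match Tap's uptodate? exactly.

```ruby
require 'fileutils'

# Hypothetical helper, not part of RawToDta. Returns true when extraction
# could be skipped: at least one dta file is listed, and every listed file
# exists and is newer than the RAW file.
def extraction_current?(raw_file, dta_files)
  return false if dta_files.empty?

  # FileUtils.uptodate? returns false when the first argument is missing
  dta_files.all? { |dta| FileUtils.uptodate?(dta, [raw_file]) }
end
```

Passing --force bypasses a check of this kind and always runs the extraction.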

extract_msn

extract_msn.exe is an Xcalibur/BioWorks tool that extracts spectra from RAW files into dta (Sequest) format and must be installed for RawToDta to work. At present this means that RawToDta can only work on Windows.

RawToDta was developed against extract_msn version 4.0. You can check if extract_msn is installed at the default location, as well as determine the version of your executable using:

% tap run -- raw_to_dta --extract_msn_help

Constant Summary

CONFIG_MAP =
[
  [:first_scan, 'F'],
  [:last_scan, 'L'],
  [:lower_MW, 'B'],
  [:upper_MW, 'T'],
  [:precursor_mass_tol, 'M'],
  [:num_allowed_intermediate_scans_for_grouping, 'S'],
  [:charge_state, 'C'],
  [:num_required_group_scans, 'G'],
  [:num_ions_required, 'I'],
  [:output_path, 'D'],
  [:intensity_threshold, 'E'],
  [:use_unified_search_file, 'U'],
  [:subsequence, 'Y'],
  [:write_zta_files, 'Z'],
  [:perform_charge_calculations, 'K'],
  [:template_file, 'O'],
  [:options_string, 'A'],
  [:minimum_signal_to_noise, 'R'],
  [:minimum_number_of_peaks, 'r']
]

Instance Method Details

#cmd(input_file, output_dir = nil) ⇒ Object

Formats the extract_msn.exe command using the specified input_file, and the current configuration. A default output directory can be specified using output_dir; it will not override a configured output directory.

Note that output_dir should be an EXISTING filepath or relative filepath. extract_msn.exe will not generate .dta files if output_dir doesn’t exist.



# File 'lib/ms/xcalibur/convert/raw_to_dta.rb', line 126

def cmd(input_file, output_dir=nil)
  args = []
  args << "\"#{normalize extract_msn}\""
  args << cmd_options(output_dir)
  args << "\"#{normalize input_file}\""

  args.join(' ')
end

#cmd_options(output_dir = nil) ⇒ Object

Formats command options for extract_msn.exe using the current configuration. Configurations are mapped to their single-letter keys using CONFIG_MAP.

A default output_dir can be supplied for use when no output directory is configured.



# File 'lib/ms/xcalibur/convert/raw_to_dta.rb', line 94

def cmd_options(output_dir=nil)
  options = CONFIG_MAP.collect do |key, flag|
    value = (flag == "D" ? output_dir : config[key])
    next unless value

    # formatting consists of stringifying the value argument, or
    # escaping the value if the argument is a path
    formatted_value = case key
    when :use_unified_search_file, :perform_charge_calculations, :write_zta_files
      "" # no argument
    when :output_path, :template_file 
      # path argument, escape
      "\"#{normalize value}\""  
    else 
      # number or string, simply stringify
      value.to_s
    end

    "-#{flag}#{formatted_value}"
  end

  options.compact.join(" ")
end
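
To make the flag formatting concrete, here is a minimal standalone sketch of the same mapping idea. It is an illustration only: `format_options` is a hypothetical stand-in that reads from a plain Hash rather than the task configuration, it uses just four entries copied from CONFIG_MAP, and it quotes the path without normalizing it.

```ruby
# Hypothetical stand-in for cmd_options, for illustration only.
def format_options(settings)
  # Subset of CONFIG_MAP, in the same order as the constant
  flag_map = [
    [:first_scan, 'F'],
    [:last_scan, 'L'],
    [:output_path, 'D'],
    [:write_zta_files, 'Z']
  ]

  options = flag_map.map do |key, flag|
    value = settings[key]
    next unless value # unset options produce no flag

    formatted = case key
                when :write_zta_files then ""              # switch, no argument
                when :output_path     then "\"#{value}\""  # path, quoted
                else value.to_s                            # number or string
                end

    "-#{flag}#{formatted}"
  end

  options.compact.join(" ")
end

puts format_options(first_scan: 100, last_scan: 200,
                    write_zta_files: true, output_path: 'C:\data\sample')
# prints: -F100 -L200 -D"C:\data\sample" -Z
```

Flags appear in map order rather than in the order the settings were given, which mirrors how the method above iterates over CONFIG_MAP.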

#dta_files(output_dir) ⇒ Object

Returns an array of the dta files listed in the lcq_dta.txt file under output_dir, or an empty array if lcq_dta.txt does not exist. Reading the list from lcq_dta.txt is preferred over a simple glob because there is no guarantee that every .dta file in the output directory belongs to a particular RAW file.



# File 'lib/ms/xcalibur/convert/raw_to_dta.rb', line 170

def dta_files(output_dir)
  lcq_dta = File.join(output_dir, 'lcq_dta.txt')
  
  dta_files = []
  File.read(lcq_dta).scan(/Datafile:\s(.*?\.dta)\s/) do |dta_file|
    dta_files << File.join(output_dir, dta_file)
  end if File.exist?(lcq_dta)  # File.exists? was removed in Ruby 3.2
  
  dta_files
end
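
The parsing step can be tried in isolation with the same regular expression. The sketch below is hedged: `listed_dta_files` is a hypothetical helper, and the sample text is invented to satisfy the pattern, so the real layout of lcq_dta.txt may differ.

```ruby
# Hypothetical helper: extracts dta file names from lcq_dta.txt-style text
# and joins them onto output_dir.
def listed_dta_files(text, output_dir)
  # scan with one capture group yields one-element arrays, hence flatten
  names = text.scan(/Datafile:\s(.*?\.dta)\s/).flatten
  names.map { |name| File.join(output_dir, name) }
end

# Invented sample content; only the "Datafile: <name>.dta " shape matters here.
sample = <<~TXT
  Datafile: sample.0001.0001.1.dta trailing text
  Datafile: sample.0002.0002.2.dta trailing text
TXT

puts listed_dta_files(sample, 'sample')
# prints:
# sample/sample.0001.0001.1.dta
# sample/sample.0002.0002.2.dta
```

Note that the pattern requires whitespace after the .dta name, so a name at the very end of the text with no trailing newline would not match.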

#normalize(path) ⇒ Object

Expands the input path and converts all forward slashes (/) to backslashes (\) to make it into a Windows-style path.



# File 'lib/ms/xcalibur/convert/raw_to_dta.rb', line 85

def normalize(path)
  File.expand_path(path).gsub(/\//, "\\")
end
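
The separator conversion can be checked on its own. In this sketch `to_windows_separators` is a hypothetical helper that applies only the gsub step; File.expand_path is left out because its result depends on the platform and the current directory.

```ruby
# Hypothetical helper: only the slash conversion from normalize.
# In the replacement string, "\\" is a single backslash character.
def to_windows_separators(path)
  path.gsub(/\//, "\\")
end

puts to_windows_separators("C:/data/sample.RAW")
# prints: C:\data\sample.RAW
```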

#process(input_file) ⇒ Object



# File 'lib/ms/xcalibur/convert/raw_to_dta.rb', line 135

def process(input_file)
  extname = File.extname(input_file)
  raise "Expected .RAW file: #{input_file}" unless extname =~ /\.RAW$/i

  # Target the output to a directory with the same basename 
  # as the raw file, unless otherwise specified.
  output_dir = self.output_dir || input_file.chomp(File.extname(input_file))
  current_dta_files = dta_files(output_dir)
  if !current_dta_files.empty? && uptodate?(current_dta_files, input_file)
    log_basename :uptodate, input_file
    current_dta_files
  else
    unless File.exist?(extract_msn)  # File.exists? was removed in Ruby 3.2
      raise "extract_msn does not exist at: #{extract_msn}"
    end
    
    mkdir(output_dir)
    command = cmd(input_file, output_dir)

    log :sh, command
    if app.quiet
      capture_sh(command, true)
    else
      sh(command)
      puts ""  # add extra line to make logging nice
    end
    
    dta_files(output_dir)
  end
end