Module: BioInterchange
- Defined in:
lib/biointerchange/core.rb,
lib/biointerchange/so.rb,
lib/biointerchange/sio.rb,
lib/biointerchange/cdao.rb,
lib/biointerchange/gfvo.rb,
lib/biointerchange/sofa.rb,
lib/biointerchange/faldo.rb,
lib/biointerchange/model.rb,
lib/biointerchange/goxref.rb,
lib/biointerchange/reader.rb,
lib/biointerchange/writer.rb,
lib/biointerchange/registry.rb,
lib/biointerchange/life_science_registry.rb
Overview
BioInterchange converts non-RDF data formats into RDF.
TSV, XML, GFF3, GVF and other files can be converted into RDF triples with BioInterchange’s command-line tool or its web services, or by using BioInterchange as a gem in your own Ruby code.
Defined Under Namespace
Modules: Exceptions, Genomics, Phylogenetics, TextMining
Classes: CDAO, FALDO, GFVO, GOXRef, LifeScienceRegistry, Model, Reader, Registry, SIO, SO, SOFA, Writer
Constant Summary
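- @@evaluation = false
If true, RDF::Graph and related classes of the RDF gem are used instead of BioInterchange’s optimized RDF handling. This builds the whole RDF graph in memory, which at least doubles memory requirements and runtime, so it should only be enabled to compare the performance of the “standard” RDF gem implementation with BioInterchange’s optimized RDF handling.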
- @@format = :turtle
BioInterchange can output RDF in two formats, Turtle (the default) and N-Triples, identified by the Ruby symbols `:turtle` and `:ntriples`, respectively. The `-n`/`--ntriples` command-line option switches the output to `:ntriples`.
- @@default_uri_prefix = ''
Default URI prefix used when RDFizing data (an empty string). A different prefix can be supplied with the `-u`/`--uri` command-line option.
- @@default_batch_size = 100
Batch size used when the selected input/rdf combination supports batching but the user has not provided one via `-b`/`--batchsize`.
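The batch size rules above, as enforced in `.cli` below, can be sketched as a standalone function. `resolve_batch_size` and `DEFAULT_BATCH_SIZE` are hypothetical names, not part of BioInterchange; the function mirrors the checks in `.cli`: an explicit value is only accepted for input/rdf pairs that support batching and must be a positive integer, and otherwise the default applies whenever batching is supported.

```ruby
# Hypothetical helper (not part of BioInterchange) mirroring the batch size
# logic of BioInterchange.cli.
DEFAULT_BATCH_SIZE = 100

# given:     the raw --batchsize string, or nil if the user did not pass one
# supported: whether the chosen input/rdf combination supports batching
# Returns an Integer batch size, or nil when batching is not used.
def resolve_batch_size(given, supported, default = DEFAULT_BATCH_SIZE)
  if given
    raise ArgumentError, 'batching is not supported for this input/rdf pair' unless supported
    # \A and \z anchor the whole string; ^ and $ would match at line breaks.
    raise ArgumentError, 'batchsize must be a positive integer' unless given.match?(/\A[1-9][0-9]*\z/)
    given.to_i
  elsif supported
    default
  end
end

p resolve_batch_size(nil, true)   # => 100
p resolve_batch_size('250', true) # => 250
p resolve_batch_size(nil, false)  # => nil
```

The sketch raises `ArgumentError` where `.cli` calls its own error helpers, and it uses `\A`/`\z` instead of the `^`/`$` anchors of the original pattern, because in Ruby `^` and `$` match at every line boundary.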
Class Method Summary
- .cli ⇒ Object
- .evaluation ⇒ Object
- .format ⇒ Object
- .get_parameters(map, parameters) ⇒ Object
Returns the values of several named parameters.
- .make_safe_label(label) ⇒ Object
Returns a “safe” version of a label that can be used as a Ruby method name.
Class Method Details
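.cli ⇒ Object
Runs the command-line tool: parses the options, reads the input with the reader registered for `--input` and serializes the resulting model with the writer registered for `--rdf`, repeating while the reader has postponed input (batch processing). Exits with status 1 on usage or argument errors and with status 2 on input format errors.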
# File 'lib/biointerchange/core.rb', line 140

def self.cli
  begin
    opts = [
      ["--help", "-h", Getopt::BOOLEAN],
      ["--debug", "-d", Getopt::BOOLEAN], # set debug mode => print stack traces
      ["--batchsize", "-b", Getopt::OPTIONAL], # batchsize for readers/writers that support +postpone?+
      ["--ntriples", "-n", Getopt::BOOLEAN], # produce N-Triples instead of Turtle
      ["--evaluation", "-z", Getopt::BOOLEAN], # use RDF gem implementation for mem/speed comparison
      ["--uri", "-u", Getopt::OPTIONAL], # URI prefix to use when serializing RDF
      ["--input", "-i", Getopt::REQUIRED], # input file format
      ["--rdf", "-r", Getopt::REQUIRED], # output file format
      ["--file", "-f", Getopt::OPTIONAL], # file to read, will read from STDIN if not supplied
      ["--out", "-o", Getopt::OPTIONAL], # output file, will out to STDOUT if not supplied
      ["--version", "-v", Getopt::BOOLEAN] # output the version number of the gem and exit
    ]

    reader_writer_pairs = Registry.reader_writer_pairs
    reader_writer_pairs.each_index { |reader_writer_pair_index|
      reader_id, writer_id = reader_writer_pairs[reader_writer_pair_index]
      Registry.(reader_id).each { |option_description|
        option, description = option_description
        opts |= [ [ "--annotate_#{option.sub(/\s.*$/, '')}", Getopt::OPTIONAL ] ]
      }
    }

    opt = Getopt::Long.getopts(*opts)

    if opt['help'] or not (opt['input'] and opt['rdf'] or opt['version']) then
      puts "Usage: ruby #{$0} -i <format> -r <format> [options]"
      puts ''
      puts 'Supported input formats (--input <format>/-i <format>):'
      Registry.reader_descriptions.each_pair { |reader_id, description|
        puts " #{reader_id}#{' ' * (34 - reader_id.length)} : #{description}"
      }
      puts ''
      puts 'Supported output formats (--rdf <format>/-r <format>)'
      Registry.writer_descriptions.each_pair { |writer_id, description|
        puts " #{writer_id}#{' ' * (34 - writer_id.length)} : #{description}"
      }
      puts ''
      puts 'I/O options:'
      puts ' -f <file> / --file <file> : file to read; STDIN used if not supplied'
      puts ' -o <file> / --out <file> : output file; STDOUT used if not supplied'
      puts ' -n / --ntriples : output RDF N-Triples (instead of RDF Turtle)'
      puts ' -u <uri> / --uri <prefix> : URI prefix to use'
      puts " (default: #{@@default_uri_prefix})"
      puts ' -b <size>/--batchsize <size> : process input in batches of the given size'
      puts ' (if supported, see below for valid input/rdf pairs;'
      puts " if supported, but not set, default value is #{@@default_batch_size})"
      puts ''
      puts 'Other options:'
      puts ' -v / --version : print the Gem\'s version number and exit'
      puts ' -d / --debug : turn on debugging output (for stacktraces)'
      puts ' -z / --evaluation : use \'RDF\' gem implementation (slow & memory intensive,'
      puts ' only included for performance evaluation comparisons)'
      puts ' -h --help : this message'
      puts ''
      puts 'Input-/output-format specific options:'
      reader_writer_pairs.each_index { |reader_writer_pair_index|
        reader_id, writer_id = reader_writer_pairs[reader_writer_pair_index]
        puts " Input format : #{reader_id}"
        puts " Output format : #{writer_id}"
        Registry.(reader_id).each { |option_description|
          option, description = option_description
          puts " --annotate_#{option}#{' ' * (21 - option.length)} : #{description}"
        }
        puts '' if reader_writer_pair_index + 1 < reader_writer_pairs.length
      }
      exit 1
    end

    # Print version number and exit:
    if opt['version'] then
      puts "BioInterchange #{Gem.loaded_specs["biointerchange"].version}"
      exit
    end

    # Turn off optimization, if requested. This will generate an RDF graph in memory and
    # at least double memory requirements and runtime.
    @@evaluation = true if opt['evaluation']

    # Switch to N-Triples output:
    @@format = :ntriples if opt['ntriples']

    # Check if the input/rdf options are supported:
    unsupported_combination unless Registry.is_supported?(opt['input'], opt['rdf'])

    # If a batchsize is given, then use it. Otherwise, check if the input/rdf combination
    # supports batching and set a default batch value.
    if opt['batchsize'] then
      batching_not_supported unless Registry.is_supporting_batch_processing?(opt['input'], opt['rdf'])
      wrong_type('batchsize', 'a positive integer') unless opt['batchsize'].match(/^[1-9][0-9]*$/)
    elsif Registry.is_supporting_batch_processing?(opt['input'], opt['rdf']) then
      opt['batchsize'] = @@default_batch_size
    end

    # Create a parameter map that can be passed along to Reader implementations:
    map = {
      'input' => opt['input'],
      'output' => opt['output']
    }
    map['batch_size'] = opt['batchsize'].to_i if opt['batchsize']
    opt.each_key { |key|
      map[key.sub(/^annotate_/, '')] = opt[key] if key.start_with?('annotate_')
    }

    # Generate model from file (deserialization).
    reader_class, *args = Registry.reader(opt['input'])
    reader = reader_class.new(*BioInterchange::get_parameters(map, args))

    input_source = nil
    if opt['file'] then
      input_source = File.new(opt['file'], 'r')
    else
      input_source = STDIN
    end

    output_source = nil
    if opt['out'] then
      output_source = File.new(opt['out'], 'w')
    else
      output_source = STDOUT
    end

    # Generate rdf from model (serialization).
    writer = Registry.writer(opt['rdf']).new(output_source)

    begin
      model = reader.deserialize(input_source)
      writer.serialize(model, opt['uri'])
    end while reader.postponed?
  rescue Interrupt
    # The user hit Ctrl-C, which is okay and does not need error reporting.
    exit 0
  rescue ArgumentError => e
    $stderr.puts e.message
    $stderr.puts e.backtrace if opt['debug']
    exit 1
  rescue Getopt::Long::Error => e
    $stderr.puts e.message
    #$stderr.puts e.backtrace if opt['debug']
    exit 1
  rescue BioInterchange::Exceptions::InputFormatError => e
    $stderr.puts e.message
    $stderr.puts e.backtrace if opt['debug']
    exit 2
  end
end
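The `begin ... end while reader.postponed?` loop at the end of `.cli` is what drives batch processing: a reader may stop after one batch, hand back a partial model, and report through `postponed?` that input remains. The sketch below imitates that protocol with hypothetical `StubReader` and `StubWriter` classes (they are not BioInterchange classes); only the method names `deserialize`, `postponed?` and `serialize` are taken from `.cli`, and the stub "model" is simply an array of lines.

```ruby
require 'stringio'

# Hypothetical reader: returns at most batch_size lines per call and
# remembers whether unread input is left (the +postponed?+ protocol).
class StubReader
  def initialize(batch_size)
    @batch_size = batch_size
    @postponed = false
  end

  def deserialize(io)
    model = []
    @batch_size.times do
      line = io.gets
      break if line.nil?
      model << line.chomp
    end
    @postponed = !io.eof?
    model
  end

  def postponed?
    @postponed
  end
end

# Hypothetical writer: records each batch it is asked to serialize.
class StubWriter
  attr_reader :batches

  def initialize
    @batches = []
  end

  def serialize(model, uri_prefix)
    @batches << model
  end
end

reader = StubReader.new(2)
writer = StubWriter.new
input = StringIO.new("a\nb\nc\nd\ne\n")

# Same loop shape as in BioInterchange.cli: the body runs at least once and
# repeats while the reader reports postponed (unread) input.
begin
  model = reader.deserialize(input)
  writer.serialize(model, nil)
end while reader.postponed?

p writer.batches # => [["a", "b"], ["c", "d"], ["e"]]
```

Because `begin ... end while` evaluates its condition after the body, a reader that never postpones is still called exactly once, so non-batching readers work with the same loop.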
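.evaluation ⇒ Object
Returns the value of `@@evaluation`: `true` if the RDF gem implementation is used instead of BioInterchange’s optimized RDF handling.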
# File 'lib/biointerchange/core.rb', line 14

def self.evaluation
  @@evaluation
end
.format ⇒ Object
Returns the value of `@@format`, the RDF output format (`:turtle` or `:ntriples`).
# File 'lib/biointerchange/core.rb', line 23

def self.format
  @@format
end
.get_parameters(map, parameters) ⇒ Object
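Returns the values of several named parameters, in the order in which they are listed. An entry of `parameters` may also be an Array whose first element is a callable (for example a Proc) and whose remaining elements are parameter names; such an entry is resolved recursively and replaced by the result of calling the callable with those values.
Parameters:
- map: a map of named parameters and their values
- parameters: the names of the parameter values of interest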
# File 'lib/biointerchange/core.rb', line 297

def self.get_parameters(map, parameters)
  parameters.map { |parameter|
    if parameter.instance_of? Array then
      parameter[0].call(*BioInterchange::get_parameters(map, parameter[1..-1]))
    else
      map[parameter]
    end
  }
end
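To illustrate how `get_parameters` resolves plain names and nested `[callable, *names]` entries, the sketch below copies the method body from the listing above into a standalone module. `ParamDemo`, the map keys other than `batch_size`, and the lambda are hypothetical and serve only as an example.

```ruby
# Standalone copy of BioInterchange.get_parameters for experimentation.
module ParamDemo
  def self.get_parameters(map, parameters)
    parameters.map { |parameter|
      if parameter.instance_of? Array then
        # [callable, *names]: resolve the names recursively, then call.
        parameter[0].call(*get_parameters(map, parameter[1..-1]))
      else
        # Plain name: look up its value in the map (nil if absent).
        map[parameter]
      end
    }
  end
end

map = { 'name' => 'demo', 'batch_size' => 100 }
params = ['name', 'batch_size', [->(n, b) { "#{n}-#{b}" }, 'name', 'batch_size'], 'missing']

p ParamDemo.get_parameters(map, params)
# => ["demo", 100, "demo-100", nil]
```

Names that are absent from the map resolve to `nil` rather than raising, so an optional reader argument simply arrives as `nil` when the user did not set it.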
.make_safe_label(label) ⇒ Object
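Returns a “safe” version of a label that can be used as a Ruby method name: spaces and characters such as apostrophes, hyphens, periods, angle brackets and slashes become underscores, a leading number gets an `a_` prefix, leading and trailing underscores are removed, and runs of underscores are collapsed into one.
Parameters:
- label: string that should be converted into a “safe” string that can be used as a Ruby method name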
# File 'lib/biointerchange/core.rb', line 310

def self.make_safe_label(label)
  label.gsub(/[ '-.<>\/]/, '_').gsub(/\([^\)]*?\)/, '').sub(/^(\d+)/){ "a_#{$1}" }.gsub(/^_+|_+$/, '').gsub(/_+/, '_')
  # This additional call pulled together "whatever_ABC" to "whateverABC";
  # not clear why: .gsub(/_([A-Z]+)/x){ "#{$1}" }
end
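A quick way to see what `make_safe_label` produces is to run the same substitution chain on a few sample labels. In the sketch below the chain from the listing above is split into one step per line inside a standalone module; `LabelDemo` and the sample labels are hypothetical. Note that `'-.` inside the first character class is a range (from `'` to `.`), so it also matches `(`, `)`, `*`, `+` and `,`; parentheses are therefore already underscores by the time the second step runs.

```ruby
# Standalone copy of BioInterchange.make_safe_label, one substitution per line.
module LabelDemo
  def self.make_safe_label(label)
    s = label.gsub(/[ '-.<>\/]/, '_')  # space, ' through . (incl. parens, -), < > / -> '_'
    s = s.gsub(/\([^\)]*?\)/, '')      # meant to drop "(...)"; parens are already '_' here
    s = s.sub(/^(\d+)/) { "a_#{$1}" }  # prefix a leading number, e.g. "5" -> "a_5"
    s = s.gsub(/^_+|_+$/, '')          # trim leading/trailing underscores
    s.gsub(/_+/, '_')                  # collapse runs of underscores
  end
end

labels = ["has part", "part-of", "5' UTR", "3/4 length", "_x.y_", "gene (SO)"]
p labels.map { |l| LabelDemo.make_safe_label(l) }
# => ["has_part", "part_of", "a_5_UTR", "a_3_4_length", "x_y", "gene_SO"]
```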