Module: Rsmart::ETL
- Defined in:
- lib/rsmart_toolbox/etl.rb
Overview
rSmart extract, transform and load methods.
Defined Under Namespace
Modules: GRM Classes: TextParseError
Class Method Summary collapse
-
.encode(str, opt = { encoding: "UTF-8" }) ⇒ String
Encodes the input String and replaces invalid or undefined characters.
-
.error(e) ⇒ Exception
Prepares an Exception for consistent error handling.
-
.escape_single_quotes(str) ⇒ String?
Prepares a String for a SQL statement where single quotes need to be escaped.
-
.mutate_sql_stmt!(insert_str, column_name, values_str, value) ⇒ void
Mutates two sides of a SQL insert statement: insert_str and values_str with column_name and value respectively.
-
.parse_boolean(str, opt = {}) ⇒ Boolean
Matches the input against a set of well known boolean patterns.
-
.parse_csv_command_line_options(executable, args, opt = { csv_options: { headers: :first_row, header_converters: :symbol, skip_blanks: true, col_sep: ",", quote_char: '"' } }) ⇒ Hash
Parse common command line options for CSV –> SQL transformations.
-
.parse_date(str, opt = { valid_values: /^$|(\d{4}\-\d{2}\-\d{2}){1}/ }) ⇒ String
Parse a SQL date from a String.
-
.parse_date!(row, insert_str, values_str, opt = {}) ⇒ void
Helper method for #parse_date which finds the value by column :name and mutates the SQL statement accordingly.
-
.parse_datetime(str, opt = { valid_values: /^$|(\d{4}\-\d{2}\-\d{2}){1}\s(\d{2}:\d{2}:\d{2})?/ }) ⇒ String
Parse a SQL datetime from a String.
-
.parse_datetime!(row, insert_str, values_str, opt = {}) ⇒ void
Helper method for #parse_datetime which finds the value by column :name and mutates the SQL statement accordingly.
-
.parse_flag(str, opt = { length: 1, upcase: true }) ⇒ String
Useful for parsing “flag” like values; i.e.
-
.parse_float(str, opt = {}) ⇒ Float?
Parse a Float from a String.
-
.parse_integer(str, opt = {}) ⇒ Integer?
Parse an Integer from a String.
-
.parse_integer!(row, insert_str, values_str, opt = {}) ⇒ void
Helper method which finds the value by column :name and mutates the SQL statement accordingly.
-
.parse_string(str, opt = { strict: true, required: false, escape_single_quotes: true }) ⇒ String
Parses a string using common parsing behavior with options.
-
.parse_string!(row, insert_str, values_str, opt = {}) ⇒ void
Helper method which finds the value by column :name and mutates the SQL statement accordingly.
-
.parse_timestamp(str, opt = { valid_values: /^$|(\d{4}\-\d{2}\-\d{2}){1}\s(\d{2}:\d{2}:\d{2})?/ }) ⇒ String
Parse a SQL timestamp from a String.
-
.parse_timestamp!(row, insert_str, values_str, opt = {}) ⇒ void
Helper method for #parse_timestamp which finds the value by column :name and mutates the SQL statement accordingly.
-
.to_symbol(str) ⇒ Symbol
Matches the MRI CSV specification: The header String is downcased, spaces are replaced with underscores, non-word characters are dropped, and finally to_sym() is called.
-
.valid_value(subject, valid_values, opt = {}) ⇒ Boolean
Tests whether the subject matches one of the valid values.
-
.warning(e) ⇒ Exception
Prepares an Exception for consistent warning handling.
Class Method Details
.encode(str, opt = { encoding: "UTF-8" }) ⇒ String
Encodes the input String and replaces invalid or undefined characters.
108 109 110 111 112 |
# File 'lib/rsmart_toolbox/etl.rb', line 108 def self.encode(str, opt={ encoding: "UTF-8" } ) opt[:encoding] = "UTF-8" if opt[:encoding].nil? str.encode( opt[:encoding], :invalid => :replace, :undef => :replace, :replace => "" ) end |
.error(e) ⇒ Exception
Prepares an Exception for consistent error handling.
30 31 32 33 34 35 36 37 38 39 |
# File 'lib/rsmart_toolbox/etl.rb', line 30 def self.error(e) if e.kind_of? String # default to TextParseError return TextParseError.new "ERROR: Line #{$INPUT_LINE_NUMBER}: #{e}" end if e.kind_of? Exception return e.exception "ERROR: Line #{$INPUT_LINE_NUMBER}: #{e}" end raise ArgumentError, "Unsupported error type: #{e.class}" end |
.escape_single_quotes(str) ⇒ String?
Prepares a String for a SQL statement where single quotes need to be escaped.
148 149 150 151 152 153 |
# File 'lib/rsmart_toolbox/etl.rb', line 148 def self.escape_single_quotes(str) if str.nil? return nil end return str.to_s.gsub("'", "\\\\'") end |
.mutate_sql_stmt!(insert_str, column_name, values_str, value) ⇒ void
This method returns an undefined value.
Mutates two sides of a SQL insert statement: insert_str and values_str with column_name and value respectively. Proper SQL value quoting will be performed based on object type.
133 134 135 136 137 138 139 140 141 142 |
# File 'lib/rsmart_toolbox/etl.rb', line 133 def self.mutate_sql_stmt!(insert_str, column_name, values_str, value) insert_str.concat "#{column_name.upcase}," # TODO what are all of the valid types that should not be quoted? if value.kind_of? Integer values_str.concat "#{value}," else values_str.concat "'#{value}'," end return nil end |
.parse_boolean(str, opt = {}) ⇒ Boolean
Matches the input against a set of well known boolean patterns.
88 89 90 91 92 93 94 95 96 97 98 99 100 101 |
# File 'lib/rsmart_toolbox/etl.rb', line 88 def self.parse_boolean(str, opt={}) return true if str == true return false if str == false b = parse_string str, opt return true if b =~ /^(active|a|true|t|yes|y|1)$/i return false if b =~ /^(inactive|i|false|f|no|n|0)$/i if b.empty? && !opt[:default].nil? return opt[:default] end if b.empty? return nil end raise Rsmart::ETL::error TextParseError.new "invalid value for Boolean: '#{str}'" end |
.parse_csv_command_line_options(executable, args, opt = { csv_options: { headers: :first_row, header_converters: :symbol, skip_blanks: true, col_sep: ",", quote_char: '"' } }) ⇒ Hash
Parse common command line options for CSV –> SQL transformations.
367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 |
# File 'lib/rsmart_toolbox/etl.rb', line 367 def self.( executable, args, opt={ csv_options: { headers: :first_row, header_converters: :symbol, skip_blanks: true, col_sep: ",", quote_char: '"' } } ) optparse = OptionParser.new do |opts| opts. = "Usage: #{executable} [options] csv_file" opts.on( '-o' ,'--output SQL_FILE_OUTPUT', 'The file the SQL data will be writen to... (defaults to <csv_file>.sql)') do |f| opt[:sql_filename] = f end opts.on( '-s' ,'--separator SEPARATOR_CHARACTER', 'The character that separates each column of the CSV file.') do |s| opt[:csv_options][:col_sep] = s end opts.on( '-q' ,'--quote QUOTE_CHARACTER', 'The character used to quote fields.') do |q| opt[:csv_options][:quote_char] = q end opts.on( '-h', '--help', 'Display this screen' ) do puts opts exit 1 end opt[:csv_filename] = args[0] unless opt[:csv_filename] if opt[:csv_filename].nil? || opt[:csv_filename].empty? puts opts exit 1 end end optparse.parse! # construct a sensible default ouptput filename unless opt[:sql_filename] file_extension = File.extname opt[:csv_filename] dir_name = File.dirname opt[:csv_filename] base_name = File.basename opt[:csv_filename], file_extension opt[:sql_filename] = "#{dir_name}/#{base_name}.sql" end return opt end |
.parse_date(str, opt = { valid_values: /^$|(\d{4}\-\d{2}\-\d{2}){1}/ }) ⇒ String
Parse a SQL date from a String.
270 271 272 273 |
# File 'lib/rsmart_toolbox/etl.rb', line 270 def self.parse_date(str, opt={ valid_values: /^$|(\d{4}\-\d{2}\-\d{2}){1}/ }) opt[:valid_values] = /^$|(\d{4}\-\d{2}\-\d{2}){1}/ if opt[:valid_values].nil? return parse_string str, opt end |
.parse_date!(row, insert_str, values_str, opt = {}) ⇒ void
This method returns an undefined value.
Helper method for #parse_date which finds the value by column :name and mutates the SQL statement accordingly.
285 286 287 288 289 |
# File 'lib/rsmart_toolbox/etl.rb', line 285 def self.parse_date!(row, insert_str, values_str, opt={}) raise ArgumentError, "opt[:name] is required!" unless opt[:name] date = parse_date( row[ to_symbol( opt[:name] ) ], opt ) mutate_sql_stmt! insert_str, opt[:name], values_str, date end |
.parse_datetime(str, opt = { valid_values: /^$|(\d{4}\-\d{2}\-\d{2}){1}\s(\d{2}:\d{2}:\d{2})?/ }) ⇒ String
Parse a SQL datetime from a String.
296 297 298 299 |
# File 'lib/rsmart_toolbox/etl.rb', line 296 def self.parse_datetime(str, opt={ valid_values: /^$|(\d{4}\-\d{2}\-\d{2}){1}\s(\d{2}:\d{2}:\d{2})?/ }) opt[:valid_values] = /^$|(\d{4}\-\d{2}\-\d{2}){1}\s(\d{2}:\d{2}:\d{2})?/ if opt[:valid_values].nil? return parse_string str, opt end |
.parse_datetime!(row, insert_str, values_str, opt = {}) ⇒ void
This method returns an undefined value.
Helper method for #parse_datetime which finds the value by column :name and mutates the SQL statement accordingly.
311 312 313 314 315 |
# File 'lib/rsmart_toolbox/etl.rb', line 311 def self.parse_datetime!(row, insert_str, values_str, opt={}) raise ArgumentError, "opt[:name] is required!" unless opt[:name] datetime = parse_datetime( row[ to_symbol( opt[:name] ) ], opt ) mutate_sql_stmt! insert_str, opt[:name], values_str, datetime end |
.parse_flag(str, opt = { length: 1, upcase: true }) ⇒ String
Useful for parsing “flag” like values; i.e. usually single characters.
349 350 351 352 353 354 355 |
# File 'lib/rsmart_toolbox/etl.rb', line 349 def self.parse_flag(str, opt={ length: 1, upcase: true }) opt[:length] = 1 if opt[:length].nil? opt[:upcase] = true if opt[:upcase].nil? retval = parse_string str, opt retval = retval.upcase if opt[:upcase] == true return retval end |
.parse_float(str, opt = {}) ⇒ Float?
Note the behavioral difference versus #to_f.
Parse a Float from a String.
256 257 258 259 260 261 262 263 |
# File 'lib/rsmart_toolbox/etl.rb', line 256 def self.parse_float(str, opt={}) s = parse_string str, opt if s.empty? return nil; else return s.to_f end end |
.parse_integer(str, opt = {}) ⇒ Integer?
Note the behavioral difference versus #to_i.
Parse an Integer from a String.
223 224 225 226 227 228 229 230 |
# File 'lib/rsmart_toolbox/etl.rb', line 223 def self.parse_integer(str, opt={}) s = parse_string str, opt if s.empty? return nil; else return s.to_i end end |
.parse_integer!(row, insert_str, values_str, opt = {}) ⇒ void
This method returns an undefined value.
Helper method which finds the value by column :name and mutates the SQL statement accordingly.
242 243 244 245 246 |
# File 'lib/rsmart_toolbox/etl.rb', line 242 def self.parse_integer!(row, insert_str, values_str, opt={}) raise ArgumentError, "opt[:name] is required!" unless opt[:name] i = parse_integer( row[ to_symbol( opt[:name] ) ], opt ) mutate_sql_stmt! insert_str, opt[:name], values_str, i end |
.parse_string(str, opt = { strict: true, required: false, escape_single_quotes: true }) ⇒ String
Parses a string using common parsing behavior with options. This method forms the foundation of all the other parsing methods.
173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 |
# File 'lib/rsmart_toolbox/etl.rb', line 173 def self.parse_string(str, opt={ strict: true, required: false, escape_single_quotes: true }) opt[:strict] = true if opt[:strict].nil? opt[:escape_single_quotes] = true if opt[:escape_single_quotes].nil? retval = encode str.to_s.strip if opt[:required] && retval.empty? raise Rsmart::ETL::error TextParseError.new "Required data element '#{opt[:name]}' not found: '#{str}'" end if opt[:default] && retval.empty? retval = opt[:default].to_s end if opt[:length] && retval.length > opt[:length].to_i detail = "#{opt[:name]}.length > #{opt[:length]}: '#{str}'-->'#{str[0..(opt[:length] - 1)]}'" if opt[:strict] raise Rsmart::ETL::error TextParseError.new "Data exceeds maximum field length: #{detail}" end Rsmart::ETL::warning "Data will be truncated: #{detail}" end if opt[:valid_values] && ! valid_value(retval, opt[:valid_values], opt) raise Rsmart::ETL::error TextParseError.new "Illegal #{opt[:name]}: value '#{str}' not found in: #{opt[:valid_values]}" end if opt[:escape_single_quotes] retval = escape_single_quotes retval end return retval end |
.parse_string!(row, insert_str, values_str, opt = {}) ⇒ void
This method returns an undefined value.
Helper method which finds the value by column :name and mutates the SQL statement accordingly.
209 210 211 212 213 |
# File 'lib/rsmart_toolbox/etl.rb', line 209 def self.parse_string!(row, insert_str, values_str, opt={}) raise ArgumentError, "opt[:name] is required!" unless opt[:name] str = parse_string( row[ to_symbol( opt[:name] ) ], opt ) mutate_sql_stmt! insert_str, opt[:name], values_str, str end |
.parse_timestamp(str, opt = { valid_values: /^$|(\d{4}\-\d{2}\-\d{2}){1}\s(\d{2}:\d{2}:\d{2})?/ }) ⇒ String
Parse a SQL timestamp from a String.
322 323 324 |
# File 'lib/rsmart_toolbox/etl.rb', line 322 def self.(str, opt={ valid_values: /^$|(\d{4}\-\d{2}\-\d{2}){1}\s(\d{2}:\d{2}:\d{2})?/ }) return parse_datetime str, opt end |
.parse_timestamp!(row, insert_str, values_str, opt = {}) ⇒ void
This method returns an undefined value.
Helper method for #parse_timestamp which finds the value by column :name and mutates the SQL statement accordingly.
336 337 338 339 340 |
# File 'lib/rsmart_toolbox/etl.rb', line 336 def self.(row, insert_str, values_str, opt={}) raise ArgumentError, "opt[:name] is required!" unless opt[:name] = parse_datetime( row[ to_symbol( opt[:name] ) ], opt ) mutate_sql_stmt! insert_str, opt[:name], values_str, end |
.to_symbol(str) ⇒ Symbol
Matches the MRI CSV specification: The header String is downcased, spaces are replaced with underscores, non-word characters are dropped, and finally to_sym() is called.
121 122 123 124 |
# File 'lib/rsmart_toolbox/etl.rb', line 121 def self.to_symbol(str) raise ArgumentError, "Illegal symbol name: '#{str}'" if str.nil? || str.empty? encode( str.downcase.gsub(/\s+/, "_").gsub(/\W+/, "") ).to_sym end |
.valid_value(subject, valid_values, opt = {}) ⇒ Boolean
Tests whether the subject matches one of the valid values.
65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 |
# File 'lib/rsmart_toolbox/etl.rb', line 65 def self.valid_value(subject, valid_values, opt={}) raise ArgumentError, "valid_values must not be nil!" if valid_values.nil? if valid_values.kind_of? Regexp return true if subject =~ valid_values end if valid_values.kind_of? Array raise ArgumentError, "valid_values must have at least one element!" unless valid_values.length > 0 if opt[:case_sensitive] == false # case insensitive comparison requested raise ArgumentError, "Object must respond to #casecmp" unless subject.respond_to? 'casecmp' valid_values.each do |valid_value| return true if valid_value.casecmp(subject) == 0 end end return true if valid_values.include? subject # default to == equality end return false end |
.warning(e) ⇒ Exception
Prepares an Exception for consistent warning handling.
45 46 47 48 49 50 51 52 53 54 |
# File 'lib/rsmart_toolbox/etl.rb', line 45 def self.warning(e) if e.kind_of? String # default to TextParseError return TextParseError.new "WARN: Line #{$INPUT_LINE_NUMBER}: #{e}" end if e.kind_of? Exception return e.exception "WARN: Line #{$INPUT_LINE_NUMBER}: #{e}" end raise ArgumentError, "Unsupported error type: #{e.class}" end |