Class: Fech::Filing

Inherits:
Object
Defined in:
lib/fech/filing.rb

Overview

Fech::Filing downloads an FEC electronic filing given its ID and can search its rows by row type. Using a child Translator object, the data in each row is automatically mapped at runtime into a labeled Hash. Additional Translations may be added to change the way that data is mapped and cleaned.

Direct Known Subclasses

SenateFiling

Constant Summary

FIRST_V3_FILING = 11850

The first filing number that uses the version 3.00+ format. Note that plenty of filings with higher numbers still use pre-3.00 formats, so #readable? still needs to be checked.


Constructor Details

#initialize(filing_id, opts = {}) ⇒ Filing

Creates a new Filing object. By default, the download directory is the system's temp folder (Dir.tmpdir).

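Parameters:

  • filing_id (Integer, String)

    the ID of the electronic filing to download

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :download_dir (String)

    override the directory where files should be downloaded (defaults to Dir.tmpdir)

  • :translate (Symbol, Array)

    a list of built-in translation sets to use

  • :quote_char (String)

    the quote character passed to the CSV parser (defaults to '"')

  • :csv_parser (Object)

    the parser to use; it must respond to parse and parse_row (defaults to Fech::Csv)

  • :encoding (String)

    the file encoding passed to the parser (defaults to 'iso-8859-1:utf-8')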



# File 'lib/fech/filing.rb', line 23

def initialize(filing_id, opts={})
  @filing_id    = filing_id
  @download_dir = opts[:download_dir] || Dir.tmpdir
  @translator   = opts[:translate] ? Fech::Translator.new(:include => opts[:translate]) : nil
  @quote_char   = opts[:quote_char] || '"'
  @csv_parser   = opts[:csv_parser] || Fech::Csv
  @resaved      = false
  @customized   = false
  @encoding     = opts[:encoding] || 'iso-8859-1:utf-8'
end

Instance Attribute Details

#download_dir ⇒ Object

Returns the value of attribute download_dir.



# File 'lib/fech/filing.rb', line 16

def download_dir
  @download_dir
end

#filing_id ⇒ Object

Returns the value of attribute filing_id.



# File 'lib/fech/filing.rb', line 16

def filing_id
  @filing_id
end

Class Method Details

.map_for(row_type, opts = {}) ⇒ Object

Returns the column names for given row type and version in the order they appear in row data.

Parameters:

  • row_type (String, Regexp)

    representation of the row desired

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :version (String, Regexp)

    representation of the version desired



# File 'lib/fech/filing.rb', line 171

def self.map_for(row_type, opts={})
  Fech::Mappings.for_row(row_type, opts)
end

Instance Method Details

#amendment? ⇒ Boolean

Whether this filing amends a previous filing or not.

Returns:

  • (Boolean)


# File 'lib/fech/filing.rb', line 195

def amendment?
  !amends.nil?
end

#amends ⇒ Object

Returns the filing ID of the earlier filing that this one amends, or nil if this is an original filing. The :report_id field in the HDR line references the amended filing.



# File 'lib/fech/filing.rb', line 202

def amends
  header[:report_id]
end

#custom_file_path ⇒ Object

The file path where custom versions of a filing are to be saved.



# File 'lib/fech/filing.rb', line 271

def custom_file_path
  File.join(download_dir, "fech_#{file_name}")
end

#delimiter ⇒ String

Returns the delimiter used in the filing's version: a comma for versions below 6, otherwise the ASCII file separator character ("\034", 0x1C).

Returns:

  • (String)

    the delimiter used in the filing’s version



# File 'lib/fech/filing.rb', line 346

def delimiter
  filing_version.to_f < 6 ? "," : "\034"
end
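As a minimal sketch of the version check above, the hypothetical helpers below (not part of Fech) pick the delimiter from a version string and split one line with it. Unlike Fech, which hands the delimiter to its CSV parser together with a quote character, the plain split here ignores quoting.

```ruby
# Hypothetical helper mirroring Filing#delimiter: versions below 6 are
# comma-separated; later versions use the ASCII file separator ("\034" == "\x1C").
def delimiter_for(version)
  version.to_f < 6 ? "," : "\x1C"
end

# Splits one unquoted line; the -1 limit keeps trailing empty fields.
def split_line(line, version)
  line.chomp.split(delimiter_for(version), -1)
end

p delimiter_for("5.3")                               # => ","
p split_line("SA11A1\x1CC00123\x1C250.00\n", "8.0")  # => ["SA11A1", "C00123", "250.00"]
```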

#download ⇒ Object

Saves the filing data from the FEC website into the filing's download directory (download_dir) and returns the Filing itself.



# File 'lib/fech/filing.rb', line 36

def download
  File.open(file_path, 'w') do |file|
    begin
      file << open(filing_url).read
    rescue
      file << open(filing_url).read.ensure_encoding('UTF-8', :external_encoding => Encoding::UTF_8,
        :invalid_characters => :drop)
    end
  end
  self
end
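The method above relies on open-uri's override of Kernel#open for URLs, which Ruby 2.7 deprecated and Ruby 3.0 removed. A minimal sketch of the same step with URI.open follows; `fetch_filing` is a hypothetical helper, not Fech's API. It saves the raw bytes unchanged and leaves transcoding to parse time (Fech passes its :encoding option to the CSV parser), so the ensure_encoding fallback is not reproduced.

```ruby
require 'open-uri'

# Hypothetical helper (not part of Fech): download a filing and save its raw bytes.
# URI.open handles http(s)/ftp URLs and falls back to Kernel#open for other strings.
def fetch_filing(url, dest_path)
  body = URI.open(url, 'rb', &:read)
  File.binwrite(dest_path, body)
  dest_path
end
```

Because URI.open falls back to Kernel#open for strings that are not URLs, the same helper can also copy a local file, which is convenient for testing without network access.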

#each_row(opts = {}) {|Array| ... } ⇒ Object

Iterates over the Filing's rows, yielding each one as an Array. Raises an error if the file has not been downloaded yet.

Parameters:

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :with_index (Boolean)

    yield both the item and its index

  • :row_type (String, Regexp)

    yield only rows that match this type

Yields:

  • (Array)

    a row of the filing, split by the delimiter from #delimiter



# File 'lib/fech/filing.rb', line 321

def each_row(opts={}, &block)
  unless File.exist?(file_path)
    raise "File #{file_path} does not exist. Try invoking the .download method on this Filing object."
  end

  # If this is an F99, we need to parse it differently.
  resave_f99_contents if ['F99', '"F99"'].include? form_type

  c = 0
  @csv_parser.parse_row(@customized ? custom_file_path : file_path, opts.merge(:col_sep => delimiter, :quote_char => @quote_char, :skip_blanks => true, :encoding => @encoding)) do |row|
    if opts[:with_index]
      yield [row, c]
      c += 1
    else
      yield row
    end
  end
end
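For comparison, here is a sketch of the same row-by-row iteration built on Ruby's standard CSV library. It assumes Fech::Csv behaves like CSV for these options, which this page does not confirm; `each_filing_row` is hypothetical. The options mirror those each_row passes: the version's delimiter as col_sep, a double-quote quote character, skip_blanks, and an 'iso-8859-1:utf-8' encoding that reads Latin-1 bytes and transcodes them to UTF-8.

```ruby
require 'csv'

# Hypothetical stand-in for Filing#each_row built on the stdlib CSV parser.
def each_filing_row(path, delimiter, with_index: false)
  raise "File #{path} does not exist." unless File.exist?(path)

  # Without a block, CSV.foreach returns an Enumerator, so each_with_index
  # supplies the row counter that each_row tracks by hand.
  CSV.foreach(path, col_sep: delimiter, quote_char: '"',
                    skip_blanks: true, encoding: 'iso-8859-1:utf-8')
     .each_with_index do |row, index|
    with_index ? yield(row, index) : yield(row)
  end
end
```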

#each_row_with_index(&block) ⇒ Object

Wrapper around #each_row that also yields each row's index.



# File 'lib/fech/filing.rb', line 341

def each_row_with_index(&block)
  each_row(:with_index => true, &block)
end

#file_contents ⇒ Object

Opens the raw Filing file for reading and returns the File handle (not a String); the caller is responsible for closing it.



# File 'lib/fech/filing.rb', line 248

def file_contents
  File.open(file_path, 'r')
end

#file_name ⇒ Object

The name of the Filing's local file: the filing ID followed by the .fec extension.

# File 'lib/fech/filing.rb', line 309

def file_name
  "#{filing_id}.fec"
end

#file_path ⇒ Object

The location of the Filing on the file system.



# File 'lib/fech/filing.rb', line 243

def file_path
  File.join(download_dir, file_name)
end

#filing_url ⇒ Object

The URL on the FEC website from which the Filing is downloaded.

# File 'lib/fech/filing.rb', line 313

def filing_url
  "http://docquery.fec.gov/dcdev/posted/#{filing_id}.fec"
end
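file_name, file_path, custom_file_path and filing_url all derive from just the filing ID and the download directory. The sketch below gathers those derivations into one standalone struct; `FilingPaths` is hypothetical (not Fech's API) and uses the URL pattern shown above.

```ruby
# Hypothetical value object reproducing Filing's path helpers.
FilingPaths = Struct.new(:filing_id, :download_dir) do
  # "<filing_id>.fec", the same pattern used in the server path below
  def file_name
    "#{filing_id}.fec"
  end

  def file_path
    File.join(download_dir, file_name)
  end

  # Where the rewritten F99 copy is saved, next to the original download.
  def custom_file_path
    File.join(download_dir, "fech_#{file_name}")
  end

  def filing_url
    "http://docquery.fec.gov/dcdev/posted/#{filing_id}.fec"
  end
end
```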

#filing_version ⇒ Object

The version of the FEC filing format used by this Filing (the third field of its header line).



# File 'lib/fech/filing.rb', line 216

def filing_version
  @filing_version ||= parse_filing_version
end

#fix_f99_contents ⇒ Object

Handles the contents of F99s by removing the [BEGINTEXT] and [ENDTEXT] delimiters and putting the text content onto the same line as the summary, doubling any double quotes inside it. Returns the rewritten file contents as a String.



# File 'lib/fech/filing.rb', line 279

def fix_f99_contents
  @customized = true
  content = file_contents.read
  
  if RUBY_VERSION > "1.9.2"
    content.encode!('UTF-16', 'UTF-8', :invalid => :replace, :undef => :replace, :replace => '?')
    content.encode!('UTF-8', 'UTF-16')
  else
    require 'iconv'
    ic = Iconv.new('UTF-8//IGNORE', 'UTF-8') 
    content = ic.iconv(content + ' ')[0..-2] # add valid byte before converting, then remove it
  end
  
  regex = /\n\[BEGINTEXT\]\n(.*?)\[ENDTEXT\]\n/mi # some use eg [EndText]
  match = content.match(regex)
  if match
    repl = match[1].gsub(/"/, '""')
    content.gsub(regex, "#{delimiter}\"#{repl}\"")
  else
    content
  end
end
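The rewrite above can be tried in isolation. This is a minimal sketch, not Fech's method: `fold_f99_text` is hypothetical, it swaps the UTF-16 round trip for String#scrub (available since Ruby 2.1), and it uses the block form of gsub. With the block form, each [BEGINTEXT] block keeps its own text, and backslashes in that text are not read as replacement escapes. The original instead builds one replacement string from the first match.

```ruby
# Case-insensitive because some filers write [EndText].
F99_TEXT = /\n\[BEGINTEXT\]\n(.*?)\[ENDTEXT\]\n/mi

# Hypothetical standalone version of Filing#fix_f99_contents.
def fold_f99_text(content, delimiter)
  # String#scrub replaces invalid byte sequences, much like the UTF-16 round trip.
  clean = content.scrub('?')
  clean.gsub(F99_TEXT) do
    text = Regexp.last_match(1)
    # Double embedded quotes so the text fits in one quoted CSV field.
    "#{delimiter}\"#{text.gsub('"', '""')}\""
  end
end
```

The result parses as a single row whose last field holds the free text, newline included.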

#form_type ⇒ Object

Determines the form type of the filing (the first field of its second line) before the filing has been parsed. This is needed for the F99 special case.



# File 'lib/fech/filing.rb', line 255

def form_type

  if RUBY_VERSION >= "2.0"
    lines = file_contents.each_line
  else
    lines = file_contents.lines
  end

  lines.each_with_index do |row, index|
    next if index == 0
    return row.split(delimiter).first
  end
end
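Below is a standalone sketch of the second-line lookup. It assumes the caller supplies an already-open IO, so the caller can also close it (form_type does not close the handle it gets from file_contents). `second_line_form_type` is hypothetical. It shows why each_row compares against both 'F99' and '"F99"': a plain split keeps any quotes around the field. Unlike form_type, it returns nil explicitly for a one-line file.

```ruby
require 'stringio'

# Hypothetical helper mirroring Filing#form_type: skip the header line and
# return the first field of the second line, or nil if there is none.
def second_line_form_type(io, delimiter)
  io.each_line.with_index do |line, index|
    next if index.zero?
    return line.split(delimiter).first
  end
  nil
end
```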

#hash_zip(keys, values) ⇒ Fech::Mapped, Hash

Combines an array of keys and values into an Fech::Mapped object, a type of Hash.

Parameters:

  • keys (Array)

    the desired keys for the new hash

  • values (Array)

    the desired values for the new hash

Returns:

  • (Fech::Mapped, Hash)

    the keys paired with the values at the same positions



# File 'lib/fech/filing.rb', line 211

def hash_zip(keys, values)
  Fech::Mapped.new(self, values.first).merge(Hash[*keys.zip(values).flatten])
end
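Here is a plain-Hash sketch of what hash_zip does, leaving out Fech::Mapped (which this page does not document). `zip_to_hash` is hypothetical. Array#zip pads missing values with nil, and Array#to_h pairs keys and values directly. The splat-and-flatten form above gives the same result for the string cells of a filing row, but it would raise ArgumentError or shift keys if a value were itself an array.

```ruby
# Hypothetical plain-Hash version of Filing#hash_zip.
def zip_to_hash(keys, values)
  # zip pads short value lists with nil; to_h keeps array-valued cells intact.
  keys.zip(values).to_h
end
```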

#header(opts = {}) ⇒ Hash

Access the header (first) line of the filing, containing information about the filing’s version and metadata about the software used to file it.

Returns:

  • (Hash)

    a hash that assigns labels to the values of the filing’s header row



# File 'lib/fech/filing.rb', line 51

def header(opts={})
  each_row do |row|
    return parse_row?(row)
  end
end

#map(row, opts = {}) ⇒ Object

Maps a raw row to a labeled hash following any rules given in the filing’s Translator based on its version and row type. Finds the correct map for a given row, performs any matching Translations on the individual values, and returns either the entire dataset, or just those fields requested.

Parameters:

  • row (Array)

    a raw row of the filing, whose first element is the row type

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :include (Array)

    list of field names that should be included in the returned hash



# File 'lib/fech/filing.rb', line 119

def map(row, opts={})
  data = Fech::Mapped.new(self, row.first)
  full_row_map = map_for(row.first)
  
  # If specific fields were asked for, return only those
  if opts[:include]
    row_map = full_row_map.select { |k| opts[:include].include?(k) }
  else
    row_map = full_row_map
  end
  
  # Inserts the row into data, performing any specified preprocessing
  # on individual cells along the way
  row_map.each_with_index do |field, index|
    value = row[full_row_map.index(field)]
    if translator
      translator.get_translations(:row => row.first,
          :version => filing_version, :action => :convert,
          :field => field).each do |translation|
        # User's Procs should be given each field's value as context
        value = translation[:proc].call(value)
      end
    end
    data[field] = value
  end
  
  # Performs any specified group preprocessing / combinations
  if translator
    combinations = translator.get_translations(:row => row.first,
          :version => filing_version, :action => :combine)
    row_hash = hash_zip(row_map, row) if combinations
    combinations.each do |translation|
      # User's Procs should be given the entire row as context
      value = translation[:proc].call(row_hash)
      field = translation[:field].source.gsub(/[\^\$]*/, "").to_sym
      data[field] = value
    end
  end
  data
end
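The per-field :convert pass and the whole-row :combine pass above can be sketched without Fech. Everything here is hypothetical. `Translation` stands in for the Translator's rules. Combine targets are plain symbols rather than the Regexps the original strips with source.gsub. The labeled row passed to :combine procs is built from the full column list. In the original, row_hash is zipped against row_map, which is only the :include subset when :include is given, so its keys can drift from the row positions.

```ruby
# Hypothetical stand-in for one Translator rule.
Translation = Struct.new(:field, :action, :fn)

def map_row(row, columns, translations, only: nil)
  selected = only ? columns.select { |c| only.include?(c) } : columns
  data = {}

  # :convert rules receive one cell's value and are applied in order.
  selected.each do |field|
    value = row[columns.index(field)]
    translations.each do |t|
      value = t.fn.call(value) if t.action == :convert && t.field == field
    end
    data[field] = value
  end

  # :combine rules receive the whole raw row as a labeled hash.
  row_hash = columns.zip(row).to_h
  translations.each do |t|
    data[t.field] = t.fn.call(row_hash) if t.action == :combine
  end
  data
end
```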

#map_for(row_type) ⇒ Object

Returns the column names for given row type and the filing’s version in the order they appear in row data.

Parameters:

  • row_type (String, Regexp)

    representation of the row desired



# File 'lib/fech/filing.rb', line 163

def map_for(row_type)
  mappings.for_row(row_type)
end

#mappings ⇒ Object

Gets or creates the Mappings instance for this filing's version.



# File 'lib/fech/filing.rb', line 238

def mappings
  @mapping ||= Fech::Mappings.new(filing_version)
end

#parse_filing_version ⇒ Object

Pulls out the version number from the header line. Must parse this line manually, since we don’t know the version yet, and thus the delimiter type is still a mystery.



# File 'lib/fech/filing.rb', line 223

def parse_filing_version
  first = File.open(file_path).first
  if first.index("\034").nil?
    @csv_parser.parse(first).flatten[2]
  else
    @csv_parser.parse(first, :col_sep => "\034").flatten[2]
  end
end
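Below is a minimal sketch of the same bootstrapping step with Ruby's standard CSV library. The delimiter is only known once the version is known, so the line is first checked for the file separator. `version_from_header` is hypothetical, and CSV.parse_line stands in for @csv_parser.parse(...).flatten. The sample header lines in the test are illustrative only.

```ruby
require 'csv'

# Hypothetical helper mirroring Filing#parse_filing_version: sniff the
# delimiter from the header line itself, then take the third field.
def version_from_header(first_line)
  col_sep = first_line.include?("\x1C") ? "\x1C" : ","
  CSV.parse_line(first_line, col_sep: col_sep)[2]
end
```

Combined with #readable?, `version_from_header(line).to_i >= 3` tells whether the file uses a supported format.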

#parse_row?(row, opts = {}) ⇒ Boolean

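Decides what to do with a given row. If the row's type matches the :parse_if pattern, or if no :parse_if was given, the row is run through #map. If :raw was passed as true, the flat, unmapped data array is returned instead. Empty rows and rows that do not match return false.

Parameters:

  • row (Array)

    a raw row of the filing, whose first element is the row type

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :parse_if (String, Regexp)

    only parse the row if its (downcased) type matches this pattern

  • :raw (Boolean)

    return the unmapped row array instead of a mapped hash

  • :include (Array)

    list of field names that should be included in the returned hash

Returns:

  • (Fech::Mapped, Array, false)

    the mapped hash, the raw row if :raw is true, or false if the row was skipped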


# File 'lib/fech/filing.rb', line 99

def parse_row?(row, opts={})
  return false if row.nil? || row.empty?

  # Always parse, unless :parse_if is given and does not match row
  if opts[:parse_if].nil? || \
      Fech.regexify(opts[:parse_if]).match(row.first.downcase)
    opts[:raw] ? row : map(row, opts)
  else
    false
  end
end

#readable? ⇒ Boolean

Whether this Filing can be read. Only FEC format 3.00 and later is supported.

Returns:

  • (Boolean)


# File 'lib/fech/filing.rb', line 233

def readable?
  filing_version.to_i >= 3
end

#resave_f99_contents ⇒ Object

Saves the “fixed” version of an F99 (see #fix_f99_contents) to custom_file_path. The file is written at most once per Filing object.



# File 'lib/fech/filing.rb', line 303

def resave_f99_contents
  return true if @resaved
  File.open(custom_file_path, 'w') { |f| f.write(fix_f99_contents) }
  @resaved = true
end

#rows_like(row_type, opts = {}) {|Hash| ... } ⇒ Array

Access all lines of the filing that match a given row type. Will return an Array of all available lines if called directly, or will yield the mapped rows one by one if a block is passed.

Parameters:

  • row_type (String, Regexp)

    a partial or complete name of the type of row desired

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :raw (Boolean)

    return each matched row as a raw array that has not been mapped to column names

  • :include (Array)

    list of field names that should be included in the returned hash

Yields:

  • (Hash)

    each matched row’s data, as either a mapped hash or raw array

Returns:

  • (Array)

    the complete set of mapped hashes (or raw arrays) for matched lines, or nil if a block was given



# File 'lib/fech/filing.rb', line 78

def rows_like(row_type, opts={}, &block)
  data = []
  each_row(:row_type => row_type) do |row|
    value = parse_row?(row, opts.merge(:parse_if => row_type))
    next if value == false
    if block_given?
      yield value
    else
      data << value if value
    end
  end
  block_given? ? nil : data
end
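The following is a simplified, in-memory sketch of the filtering shared by rows_like and parse_row?. It collects matching rows when no block is given, or yields them one by one and returns nil. This `rows_like` is a hypothetical standalone function that returns raw rows (as with :raw => true) instead of mapping them. Fech.regexify's exact behavior is not shown on this page, so the String-to-Regexp conversion below, a case-insensitive prefix match, is an assumption.

```ruby
# ASSUMPTION: a String pattern means "row type starts with this", ignoring case.
# Fech.regexify may behave differently.
def row_pattern(pattern)
  pattern.is_a?(Regexp) ? pattern : /\A#{Regexp.escape(pattern.to_s)}/i
end

# Hypothetical in-memory version of Filing#rows_like returning raw rows.
def rows_like(rows, pattern)
  regex = row_pattern(pattern)
  matched = []
  rows.each do |row|
    next if row.nil? || row.empty?                 # parse_row? skips these
    next unless regex.match?(row.first.downcase)   # the :parse_if check
    if block_given?
      yield row
    else
      matched << row
    end
  end
  block_given? ? nil : matched
end
```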

#summary ⇒ Hash

Access the summary (second) line of the filing, containing aggregate and top-level information about the filing.

Returns:

  • (Hash)

    a hash that assigns labels to the values of the filing’s summary row



# File 'lib/fech/filing.rb', line 60

def summary
  each_row_with_index do |row, index|
    next if index == 0
    return parse_row?(row)
  end
end

#translate {|t| ... } ⇒ Object

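Yields the filing's Translator to the block if one is given; otherwise returns it.

Yields:

  • (t)

    a reference to the filing's Translator

Yield Parameters:

  • t (Fech::Translator)

    the filing's Translator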



# File 'lib/fech/filing.rb', line 186

def translate(&block)
  if block_given?
    yield translator
  else
    translator
  end
end

#translator ⇒ Object

Accessor for @translator. Returns the Translator created in the initializer if built-in translations were passed to it (e.g. :translate => [:foo, :bar]). Otherwise, creates and memoizes a new Translator without any default translations.



# File 'lib/fech/filing.rb', line 180

def translator
  @translator ||= Fech::Translator.new
end