Class: Fech::Filing
- Inherits:
- Object
- Defined in:
- lib/fech/filing.rb
Overview
Fech::Filing downloads an Electronic Filing given its ID, and will search rows by row type. Using a child Translator object, the data in each row is automatically mapped at runtime into a labeled Hash. Additional Translations may be added to change the way that data is mapped and cleaned.
Constant Summary
- FIRST_V3_FILING =
First filing number that uses the version >= 3.00 format. Note that plenty of pre-v3 filings come after this one, so readable? still needs to be checked.
11850
Instance Attribute Summary
-
#download_dir ⇒ Object
Returns the value of attribute download_dir.
-
#filing_id ⇒ Object
Returns the value of attribute filing_id.
-
#translator ⇒ Object
Returns the value of attribute translator.
Class Method Summary
-
.download_all(download_dir) ⇒ Object
This downloads ALL the filings.
-
.for_all(options = {}) ⇒ Object
Runs the passed block on every downloaded .fec file.
-
.map_for(row_type, opts = {}) ⇒ Object
Returns the column names for given row type and version in the order they appear in row data.
Instance Method Summary
-
#amendment? ⇒ Boolean
Whether this filing amends a previous filing or not.
-
#amends ⇒ Object
Returns the filing ID of the past filing this one amends, nil if this is a first-draft filing.
-
#custom_file_path ⇒ Object
The file path where custom versions of a filing are to be saved.
-
#delimiter ⇒ String
The delimiter used in the filing’s version.
-
#download ⇒ Object
Saves the filing data from the FEC website into the default download directory.
-
#each_row(opts = {}) {|Array| ... } ⇒ Object
Iterates over and yields the Filing’s lines.
-
#each_row_with_index(&block) ⇒ Object
Wrapper around #each_row that also yields each row's index.
-
#file_contents ⇒ Object
The raw contents of the Filing.
- #file_name ⇒ Object
-
#file_path ⇒ Object
The location of the Filing on the file system.
- #filing_url ⇒ Object
-
#filing_version ⇒ Object
The version of the FEC software used to generate this Filing.
-
#fix_f99_contents ⇒ Object
Handle the contents of F99s by removing the [BEGINTEXT] and [ENDTEXT] delimiters and putting the text content onto the same line as the summary.
-
#form_type ⇒ Object
Determine the form type of the filing before it’s been parsed.
-
#hash_zip(keys, values) ⇒ Fech::Mapped, Hash
Combines an array of keys and values into an Fech::Mapped object, a type of Hash.
-
#header(opts = {}) ⇒ Hash
Access the header (first) line of the filing, containing information about the filing’s version and metadata about the software used to file it.
-
#initialize(filing_id, opts = {}) ⇒ Filing
constructor
Create a new Filing object, assign the download directory to system’s temp folder by default.
-
#map(row, opts = {}) ⇒ Object
Maps a raw row to a labeled hash following any rules given in the filing’s Translator based on its version and row type.
-
#map_for(row_type) ⇒ Object
Returns the column names for given row type and the filing’s version in the order they appear in row data.
-
#mappings ⇒ Object
Gets or creates the Mappings instance for this filing_version.
-
#parse_filing_version ⇒ Object
Pulls out the version number from the header line.
-
#parse_row?(row, opts = {}) ⇒ Boolean
Decides what to do with a given row.
-
#readable? ⇒ Boolean
Only FEC format 3.00+ is supported.
-
#resave_f99_contents ⇒ Object
Resave the “fixed” version of an F99.
-
#rows_like(row_type, opts = {}) {|Hash| ... } ⇒ Array
Access all lines of the filing that match a given row type.
-
#summary ⇒ Hash
Access the summary (second) line of the filing, containing aggregate and top-level information about the filing.
- #translate {|t| ... } ⇒ Object
Constructor Details
#initialize(filing_id, opts = {}) ⇒ Filing
Create a new Filing object, assign the download directory to system’s temp folder by default.
# File 'lib/fech/filing.rb', line 22
def initialize(filing_id, opts={})
  @filing_id = filing_id
  @download_dir = opts[:download_dir] || Dir.tmpdir
  @translator = Fech::Translator.new(:include => opts[:translate])
  @quote_char = opts[:quote_char] || '"'
  @csv_parser = opts[:csv_parser] || Fech::Csv
  @resaved = false
  @customized = false
end
Instance Attribute Details
#download_dir ⇒ Object
Returns the value of attribute download_dir.
# File 'lib/fech/filing.rb', line 15
def download_dir
  @download_dir
end
#filing_id ⇒ Object
Returns the value of attribute filing_id.
# File 'lib/fech/filing.rb', line 15
def filing_id
  @filing_id
end
#translator ⇒ Object
Returns the value of attribute translator.
# File 'lib/fech/filing.rb', line 15
def translator
  @translator
end
Class Method Details
.download_all(download_dir) ⇒ Object
This downloads ALL the filings.
Because the zip files are deleted after extraction (to save space), rerunning this is safe but repeats the entire download. Update operations should instead download single filings, starting from the number after the latest filing already on disk.
This takes a very long time to run - on the order of an hour or two, depending on your bandwidth.
WARNING: As of July 9, 2012, this downloads 536964 files (25.8 GB), into one directory. This means that the download directory will break bash file globbing (so e.g. ls and rm *.fec will not work). If you want to get all of it, make sure to download only to a dedicated FEC filings directory.
# File 'lib/fech/filing.rb', line 51
def self.download_all download_dir
  `cd #{download_dir} && ftp -a ftp.fec.gov:/FEC/electronic/*.zip`
  `cd #{download_dir} && for z in *.zip; do unzip -o $z && rm $z; done`
  Dir[File.join(download_dir, '*.fec')].count
end
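The incremental update suggested above needs a starting filing number. A minimal sketch of one way to find it, assuming filings are stored as <id>.fec in a single directory; next_filing_id and its fallback argument are hypothetical and not part of Fech:

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (not part of Fech): returns the filing number that
# follows the highest-numbered .fec file already present in download_dir,
# or `fallback` when the directory holds no filings yet.
def next_filing_id(download_dir, fallback)
  ids = Dir[File.join(download_dir, "*.fec")].map do |path|
    File.basename(path, ".fec").to_i
  end
  ids.empty? ? fallback : ids.max + 1
end

# Demonstration against a throwaway directory holding three placeholder files.
Dir.mktmpdir do |dir|
  %w[11850 11851 11900].each { |id| FileUtils.touch(File.join(dir, "#{id}.fec")) }
  puts next_filing_id(dir, 11850)
end
```

Each ID from that point on could then be fetched individually with Fech::Filing.new(id, :download_dir => dir).download.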
.for_all(options = {}) ⇒ Object
Runs the passed block on every downloaded .fec file. Pass the same options hash as you would to Fech::Filing.new, e.g. for_all(:download_dir => Rails.root.join('db', 'data', 'fec', 'filings'), :csv_parser => Fech::CsvDoctor, …) {|filing| … }. Calling filing.download inside the block is unnecessary.
Note that if there are a lot of files (e.g. after download_all), just listing them to prepare for this will take several seconds.
Special option: :from => integer or :from => range processes only filings numbered from the integer upward, or within the range.
# File 'lib/fech/filing.rb', line 64
def self.for_all options = {}
  options[:download_dir] ||= Dir.tmpdir
  from = options.delete :from
  raise ArgumentError, ":from must be Integer or Range" if from and !(from.is_a?(Integer) or from.is_a?(Range))
  # .sort{|x| x.scan/\d+/.to_i } # should be no need to spend time on sort, since the file system should already do that
  Dir[File.join(options[:download_dir], '*.fec')].each do |file|
    n = file.scan(/(\d+)\.fec/)[0][0].to_i
    if from.is_a? Integer
      next unless n >= from
    elsif from.is_a? Range
      next unless n.in? from
    end
    yield Fech::Filing.new(n, options)
  end
end
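The :from filter can be tried in isolation. The sketch below mirrors the Integer and Range branches using only core Ruby; filing_ids_from and the sample paths are hypothetical, and Range#include? stands in for n.in? from, which as far as I know comes from ActiveSupport and not from core Ruby:

```ruby
# Hypothetical helper mirroring the :from check in for_all, using only core
# Ruby (Range#include? stands in for the ActiveSupport-style `n.in? from`).
def filing_ids_from(paths, from = nil)
  unless from.nil? || from.is_a?(Integer) || from.is_a?(Range)
    raise ArgumentError, ":from must be Integer or Range"
  end

  # Pull the numeric filing ID out of each "<id>.fec" path
  ids = paths.map { |path| path[/(\d+)\.fec\z/, 1] }.compact.map(&:to_i)
  case from
  when Integer then ids.select { |n| n >= from }
  when Range   then ids.select { |n| from.include?(n) }
  else ids
  end
end

paths = %w[/tmp/11850.fec /tmp/11900.fec /tmp/12000.fec] # made-up paths
p filing_ids_from(paths, 11900)
p filing_ids_from(paths, 11850..11900)
```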
Instance Method Details
#amendment? ⇒ Boolean
Whether this filing amends a previous filing or not.
# File 'lib/fech/filing.rb', line 213
def amendment?
  !amends.nil?
end
#amends ⇒ Object
Returns the filing ID of the past filing this one amends, nil if this is a first-draft filing. :report_id in the HDR line references the amended filing.
# File 'lib/fech/filing.rb', line 220
def amends
  header[:report_id]
end
#custom_file_path ⇒ Object
The file path where custom versions of a filing are to be saved.
# File 'lib/fech/filing.rb', line 282
def custom_file_path
  File.join(download_dir, "fech_#{file_name}")
end
#delimiter ⇒ String
Returns the delimiter used in the filing’s version.
# File 'lib/fech/filing.rb', line 346
def delimiter
  filing_version.to_f < 6 ? "," : "\034"
end
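To make the version rule concrete, here is a minimal standalone sketch. delimiter_for is a hypothetical name used so the rule can run without a Filing object, and the version strings are illustrative. "\034" is the octal escape for ASCII 28, the File Separator control character:

```ruby
# Sketch of the version-to-delimiter rule; delimiter_for is a hypothetical
# standalone name, not a Fech method.
def delimiter_for(filing_version)
  filing_version.to_f < 6 ? "," : "\034"
end

p delimiter_for("5.3")   # a pre-6 version, comma-separated
p delimiter_for("8.0")   # a later version, separated by ASCII 28
p "\034" == "\x1C"       # the octal and hex escapes name the same character
```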
#download ⇒ Object
Saves the filing data from the FEC website into the default download directory.
# File 'lib/fech/filing.rb', line 34
def download
  File.open(file_path, 'w') do |file|
    file << open(filing_url).read
  end
  self
end
#each_row(opts = {}) {|Array| ... } ⇒ Object
Iterates over and yields the Filing’s lines.
# File 'lib/fech/filing.rb', line 321
def each_row(opts={}, &block)
  unless File.exists?(file_path)
    raise "File #{file_path} does not exist. Try invoking the .download method on this Filing object."
  end

  # If this is an F99, we need to parse it differently.
  resave_f99_contents if ['F99', '"F99"'].include? form_type

  c = 0
  @csv_parser.parse_row(@customized ? custom_file_path : file_path, :col_sep => delimiter, :quote_char => @quote_char, :skip_blanks => true) do |row|
    if opts[:with_index]
      yield [row, c]
      c += 1
    else
      yield row
    end
  end
end
#each_row_with_index(&block) ⇒ Object
Wrapper around #each_row that also yields each row's index.
# File 'lib/fech/filing.rb', line 341
def each_row_with_index(&block)
  each_row(:with_index => true, &block)
end
#file_contents ⇒ Object
The raw contents of the Filing, returned as a File opened for reading.
# File 'lib/fech/filing.rb', line 266
def file_contents
  File.open(file_path, 'r')
end
#file_name ⇒ Object
# File 'lib/fech/filing.rb', line 310
def file_name
  "#{filing_id}.fec"
end
#file_path ⇒ Object
The location of the Filing on the file system.
# File 'lib/fech/filing.rb', line 261
def file_path
  File.join(download_dir, file_name)
end
#filing_url ⇒ Object
# File 'lib/fech/filing.rb', line 314
def filing_url
  "http://query.nictusa.com/dcdev/posted/#{filing_id}.fec"
end
#filing_version ⇒ Object
The version of the FEC software used to generate this Filing.
# File 'lib/fech/filing.rb', line 234
def filing_version
  @filing_version ||= parse_filing_version
end
#fix_f99_contents ⇒ Object
Handle the contents of F99s by removing the [BEGINTEXT] and [ENDTEXT] delimiters and putting the text content onto the same line as the summary.
# File 'lib/fech/filing.rb', line 290
def fix_f99_contents
  @customized = true
  content = file_contents.read
  regex = /\n\[BEGINTEXT\]\n(.*?)\[ENDTEXT\]\n/mi # some use eg [EndText]
  match = content.match(regex)
  if match
    repl = match[1].gsub(/"/, '""')
    content.gsub(regex, "#{delimiter}\"#{repl}\"")
  else
    content
  end
end
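The effect of this rewrite is easier to see on a small input. The following is a sketch under stated assumptions: the sample filing text is invented, inline_f99_text is a hypothetical standalone function, and the delimiter is passed in instead of being derived from the filing version. It uses the block form of gsub so that any backslashes in the free text are inserted literally and not read as back-references.

```ruby
# Standalone sketch of the F99 rewrite. The sample filing text is invented,
# and the delimiter is passed in instead of being read from the filing.
F99_TEXT = /\n\[BEGINTEXT\]\n(.*?)\[ENDTEXT\]\n/mi

def inline_f99_text(content, delimiter)
  match = content.match(F99_TEXT)
  return content unless match

  quoted = match[1].gsub('"', '""') # double any quotes, CSV style
  # Replace the marker block with one extra quoted field on the summary line
  content.gsub(F99_TEXT) { "#{delimiter}\"#{quoted}\"" }
end

sample = "HDR,FEC,5.00\nF99,C00000000,Sample Committee\n" \
         "[BeginText]\nSee \"attached\" letter.\n[EndText]\n"
puts inline_f99_text(sample, ",")
```

The free text ends up as a single quoted field appended to the summary row, so a CSV parser that honours quoted newlines can read it as one column.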
#form_type ⇒ Object
Determine the form type of the filing before it’s been parsed. This is needed for the F99 special case.
# File 'lib/fech/filing.rb', line 273
def form_type
  file_contents.lines.each_with_index do |row, index|
    next if index == 0
    return row.split(delimiter).first
  end
end
#hash_zip(keys, values) ⇒ Fech::Mapped, Hash
Combines an array of keys and values into an Fech::Mapped object, a type of Hash.
# File 'lib/fech/filing.rb', line 229
def hash_zip(keys, values)
  Fech::Mapped.new(self, values.first).merge(Hash[*keys.zip(values).flatten])
end
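The zipping step can be tried on its own with plain Hashes. This sketch leaves out Fech::Mapped entirely, and the column names and values are illustrative, not taken from a real row map:

```ruby
keys   = [:form_type, :filer_id, :amount] # illustrative column names
values = ["SA11", "C00000000", "250.00"]  # illustrative row values

# The construction used in hash_zip: pair up, flatten, and splat into Hash[]
flattened = Hash[*keys.zip(values).flatten]

# An equivalent spelling for flat values
direct = keys.zip(values).to_h

p flattened
p flattened == direct
```

Because flatten is recursive, the first form assumes that no value is itself an Array; the to_h form does not have that restriction.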
#header(opts = {}) ⇒ Hash
Access the header (first) line of the filing, containing information about the filing’s version and metadata about the software used to file it.
# File 'lib/fech/filing.rb', line 83
def header(opts={})
  each_row do |row|
    return parse_row?(row)
  end
end
#map(row, opts = {}) ⇒ Object
Maps a raw row to a labeled hash following any rules given in the filing’s Translator based on its version and row type. Finds the correct map for a given row, performs any matching Translations on the individual values, and returns either the entire dataset, or just those fields requested.
# File 'lib/fech/filing.rb', line 149
def map(row, opts={})
  data = Fech::Mapped.new(self, row.first)
  full_row_map = map_for(row.first)

  # If specific fields were asked for, return only those
  if opts[:include]
    row_map = full_row_map.select { |k| opts[:include].include?(k) }
  else
    row_map = full_row_map
  end

  # Inserts the row into data, performing any specified preprocessing
  # on individual cells along the way
  row_map.each_with_index do |field, index|
    value = row[full_row_map.index(field)]
    translator.get_translations(:row => row.first,
        :version => filing_version, :action => :convert,
        :field => field).each do |translation|
      # User's Procs should be given each field's value as context
      value = translation[:proc].call(value)
    end
    data[field] = value
  end

  # Performs any specified group preprocessing / combinations
  combinations = translator.get_translations(:row => row.first,
        :version => filing_version, :action => :combine)
  row_hash = hash_zip(row_map, row) if combinations
  combinations.each do |translation|
    # User's Procs should be given the entire row as context
    value = translation[:proc].call(row_hash)
    field = translation[:field].source.gsub(/[\^\$]*/, "").to_sym
    data[field] = value
  end

  data
end
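The field-selection and per-field conversion steps can be illustrated without the Translator. The sketch below is a simplified stand-in, not the library's implementation: map_row, the only: and converters: keywords, and the sample columns are all hypothetical, and the :combine step is omitted.

```ruby
# Simplified stand-in for #map: plain Hashes and lambdas replace Fech::Mapped
# and the Translator, and every name below is illustrative.
def map_row(row, full_row_map, only: nil, converters: {})
  # Keep only the requested fields, preserving the row map's order
  row_map = only ? full_row_map.select { |k| only.include?(k) } : full_row_map

  row_map.each_with_object({}) do |field, data|
    # Look the value up by its position in the full map, not the filtered one
    value = row[full_row_map.index(field)]
    # Apply every converter registered for this field, in order
    Array(converters[field]).each { |convert| value = convert.call(value) }
    data[field] = value
  end
end

columns = [:form_type, :filer_id, :amount]
row     = ["SA11", "C00000000", "250.00"]

p map_row(row, columns)
p map_row(row, columns, only: [:filer_id, :amount],
                        converters: { amount: ->(v) { v.to_f } })
```

Indexing into the full row map matters once fields are filtered: the position of :amount in the raw row does not change just because :form_type was left out.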
#map_for(row_type) ⇒ Object
Returns the column names for given row type and the filing’s version in the order they appear in row data.
# File 'lib/fech/filing.rb', line 190
def map_for(row_type)
  mappings.for_row(row_type)
end
#mappings ⇒ Object
Gets or creates the Mappings instance for this filing_version.
# File 'lib/fech/filing.rb', line 256
def mappings
  @mapping ||= Fech::Mappings.new(filing_version)
end
#parse_filing_version ⇒ Object
Pulls out the version number from the header line. Must parse this line manually, since we don’t know the version yet, and thus the delimiter type is still a mystery.
# File 'lib/fech/filing.rb', line 241
def parse_filing_version
  first = File.open(file_path).first
  if first.index("\034").nil?
    @csv_parser.parse(first).flatten[2]
  else
    @csv_parser.parse(first, :col_sep => "\034").flatten[2]
  end
end
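The delimiter sniffing described above can be sketched in a few lines. This is an approximation under stated assumptions: version_from_header is a hypothetical name, the header lines are invented, and a plain String#split stands in for the configurable CSV parser, so quoted fields are not handled.

```ruby
# Sketch of delimiter sniffing on the header line. A plain String#split
# stands in for the configurable CSV parser, so quoted fields are not handled.
def version_from_header(first_line)
  col_sep = first_line.include?("\034") ? "\034" : ","
  # The version is read from the third field, as in parse_filing_version
  first_line.chomp.split(col_sep)[2]
end

old_style = "HDR,FEC,5.00,Example Software,1.0\n" # invented header
new_style = ["HDR", "FEC", "8.0", "Example Software"].join("\034") + "\n"

p version_from_header(old_style)
p version_from_header(new_style)
```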
#parse_row?(row, opts = {}) ⇒ Boolean
Decides what to do with a given row. If the row’s type matches the desired type, or if no type was specified, it will run the row through #map. If :raw was passed true, a flat, unmapped data array will be returned.
# File 'lib/fech/filing.rb', line 131
def parse_row?(row, opts={})
  # Always parse, unless :parse_if is given and does not match row
  if opts[:parse_if].nil? || \
      Fech.regexify(opts[:parse_if]).match(row.first.downcase)
    opts[:raw] ? row : map(row, opts)
  else
    false
  end
end
#readable? ⇒ Boolean
Only FEC format 3.00+ is supported.
# File 'lib/fech/filing.rb', line 251
def readable?
  filing_version.to_i >= 3
end
#resave_f99_contents ⇒ Object
Resave the “fixed” version of an F99.
# File 'lib/fech/filing.rb', line 304
def resave_f99_contents
  return true if @resaved
  File.open(custom_file_path, 'w') { |f| f.write(fix_f99_contents) }
  @resaved = true
end
#rows_like(row_type, opts = {}) {|Hash| ... } ⇒ Array
Access all lines of the filing that match a given row type. Will return an Array of all available lines if called directly, or will yield the mapped rows one by one if a block is passed.
# File 'lib/fech/filing.rb', line 110
def rows_like(row_type, opts={}, &block)
  data = []
  each_row do |row|
    value = parse_row?(row, opts.merge(:parse_if => row_type))
    next if value == false
    if block_given?
      yield value
    else
      data << value if value
    end
  end
  block_given? ? nil : data
end
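The two calling styles (collect into an Array, or yield one row at a time) follow a common Ruby pattern that can be shown without a filing on disk. In this sketch select_rows is hypothetical, rows are plain Arrays whose first element is the row type, and a Regexp is passed directly where the real method converts row_type with Fech.regexify:

```ruby
# Hypothetical stand-in for rows_like over in-memory rows. Matching is done
# against the downcased row type, as in parse_row?.
def select_rows(rows, pattern)
  matches = []
  rows.each do |row|
    next unless pattern.match?(row.first.downcase)
    if block_given?
      yield row
    else
      matches << row
    end
  end
  # With a block the rows are streamed and nothing is accumulated
  block_given? ? nil : matches
end

rows = [["HDR", "FEC"], ["SA11", "100.00"], ["SB21", "40.00"], ["SA17", "5.00"]]

p select_rows(rows, /^sa/)                    # collected into an Array
select_rows(rows, /^sa/) { |row| p row.last } # yielded one by one
```

The block form avoids building a large Array, which matters for filings with many itemized rows.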
#summary ⇒ Hash
Access the summary (second) line of the filing, containing aggregate and top-level information about the filing.
# File 'lib/fech/filing.rb', line 92
def summary
  each_row_with_index do |row, index|
    next if index == 0
    return parse_row?(row)
  end
end
#translate {|t| ... } ⇒ Object
# File 'lib/fech/filing.rb', line 204
def translate(&block)
  if block_given?
    yield translator
  else
    translator
  end
end