PodCSV: another lazy solution for CSV
podcsv.gem provides fast access to big CSV files, such as files with several tens of thousands of records. It introduces two classes, a parse-on-demand CSV reader and a parse-on-demand Array (PodCSV, PodArray), and is around 10 times faster than Ruby's standard 'csv' library in the benchmark below. You can also access the elements (records) randomly.
This gem may be useful if you use only a very small part (e.g. 1%) of the records in a big CSV file.
1. Benchmark
$ bundle exec rake bm
# of records
csv: 40000
podcsv: 40000
Benchmark
read:
Rehearsal --------------------------------------------
csv 4.780000 0.090000 4.870000 ( 5.135495)
podcsv 0.270000 0.040000 0.310000 ( 0.319921)
----------------------------------- total: 5.180000sec
user system total real
csv 4.650000 0.070000 4.720000 ( 5.041622)
podcsv 0.240000 0.030000 0.270000 ( 0.272924)
access:
Rehearsal --------------------------------------------
csv 4.620000 0.060000 4.680000 ( 4.919373)
podcsv 0.400000 0.030000 0.430000 ( 0.460540)
----------------------------------- total: 5.110000sec
user system total real
csv 4.660000 0.100000 4.760000 ( 4.921194)
podcsv 0.360000 0.020000 0.380000 ( 0.399690)
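The Rehearsal lines in the output above are what Ruby's Benchmark.bmbm prints. A comparison of this kind could be set up roughly as follows. This is only a sketch, not the gem's actual rake task: the temporary file, the row count, and the use of File.readlines as a stand-in for a reader that defers parsing are all assumptions made for illustration.

```ruby
require "benchmark"
require "csv"
require "tempfile"

# Build a throwaway CSV file; the row count and columns are arbitrary.
file = Tempfile.new(["bm", ".csv"])
5_000.times { |i| file.puts "#{i},name#{i},#{i * 2}" }
file.close

eager = nil
lazy  = nil
Benchmark.bmbm do |x|
  # Eager: every line is parsed into fields while reading.
  x.report("csv")   { eager = CSV.read(file.path) }
  # Deferred (stand-in): only the raw lines are read, nothing is parsed yet.
  x.report("lines") { lazy = File.readlines(file.path) }
end

file.unlink
```

The second report only measures reading raw lines, so it shows the cost that parsing adds on top of I/O rather than reproducing the podcsv figures.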
1.1. Trick
This gem defines two classes: PodCSV and PodArray.
PodCSV does not parse any strings when reading a CSV file; it just returns an
array (PodArray) of the unparsed lines.
When you access elements (records) of that array via [], each, etc.,
the strings are parsed into fields and the result is kept (the PodArray cache mechanism).
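The idea can be sketched in a few lines of plain Ruby. The LazyRows class below is hypothetical and is not PodArray's actual implementation; it assumes one record per line (so quoted fields containing newlines are not handled) and uses CSV.parse_line from the standard library as the parser.

```ruby
require "csv"

# Hypothetical illustration class; NOT the gem's PodArray.
class LazyRows
  include Enumerable

  attr_reader :parse_count

  def initialize(lines)
    @lines = lines       # raw, unparsed line strings
    @cache = {}          # index => parsed fields
    @parse_count = 0     # number of lines actually parsed so far
  end

  def size
    @lines.size
  end

  # Parse the requested line on first access, then serve it from the cache.
  def [](index)
    index += @lines.size if index < 0
    return nil if index < 0 || index >= @lines.size

    @cache.fetch(index) do
      @parse_count += 1
      @cache[index] = CSV.parse_line(@lines[index])
    end
  end

  def each
    return enum_for(:each) unless block_given?
    @lines.each_index { |i| yield self[i] }
  end
end

lines = ["id,name\n", "1,alice\n", "2,\"smith, bob\"\n"]
rows = LazyRows.new(lines)
rows[-1]  # parses only the last line
rows[-1]  # second access is served from the cache
```

After the two accesses, rows.parse_count is 1: the other lines have never been parsed, which is where the time saving comes from when only a small fraction of records is used.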
2. Installation
Add this line to your application's Gemfile:
gem 'podcsv'
And then execute:
$ bundle
Or install it yourself as:
$ gem install podcsv
3. Usage
(1) Load CSV
Same as CSV.read.
ret = PodCSV.read( file [, opt_file] )
(2) Access records
Same as Array: [], each, first, last, etc.
ret = PodCSV.read( file [, opt_file] )
puts ret[-1]
(3) Custom Line Parser
ary = PodCSV.read( file, {},
lambda{|s| s.split(/"/) } )
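To see what such a parser does on its own, the sketch below runs a lambda against a couple of raw lines without the gem. It assumes, based on the call above, that the parser receives one raw line as a string and returns that line's fields; simple_parser and raw_lines are hypothetical names.

```ruby
# Hypothetical parser: split a raw line on commas, with no quote handling.
# The -1 limit keeps trailing empty fields, which a plain split would drop.
simple_parser = lambda { |s| s.chomp.split(",", -1) }

raw_lines = ["1,alice,30\n", "2,bob,\n"]
fields = raw_lines.map { |line| simple_parser.call(line) }
```

A parser like this is only safe when the data contains no quoted commas; otherwise a full CSV parser is the better choice.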
4. Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
5. Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/mephistobooks/podcsv.
6. License
The gem is available as open source under the terms of the MIT License.