csv.rb – CSV Reading and Writing
Created by James Edward Gray II on 2005-10-31.
Copyright 2005 James Edward Gray II. You can redistribute or modify this code
under the terms of Ruby's license.
See CSV for documentation.
Description
Welcome to the new and improved CSV.
This version of the CSV library began its life as FasterCSV. FasterCSV was intended as a replacement for Ruby’s then-standard CSV library. It was designed to address concerns that users of that library had, and it had three primary goals:
-
Be significantly faster than CSV while remaining a pure Ruby library.
-
Use a smaller and easier to maintain code base. (FasterCSV eventually grew larger, but was also considerably richer in features. The parsing core remains quite small.)
-
Improve on the CSV interface.
Obviously, the last one is subjective. I did try to defer to the original interface whenever I didn’t have a compelling reason to change it though, so hopefully this won’t be too radically different.
We must have met our goals because FasterCSV was renamed to CSV and replaced the original library as of Ruby 1.9. If you are migrating code from 1.8 or earlier, you may have to change your code to comply with the new interface.
What’s Different From the Old CSV?
I’m sure I’ll miss something, but I’ll try to mention most of the major differences I am aware of, to help others quickly get up to speed:
CSV Parsing
-
This parser is m17n aware. See CSV for full details.
-
This library has a stricter parser and will raise CSV::MalformedCSVError on problematic data.
-
This library has a less liberal idea of a line ending than CSV. What you set as the :row_sep is law. It can auto-detect your line endings though.
-
The old library returned empty lines as [nil]. This library calls them [].
-
This library has a much faster parser.
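The parsing points above can be sketched roughly as follows. This assumes a Ruby 1.9 or later CSV (on recent Rubies the library may need to be installed as the csv gem), and the sample strings are invented for illustration only.

```ruby
require "csv"

# Empty lines come back as [] rather than [nil].
rows = CSV.parse("a,b\n\nc,d\n")
# rows => [["a", "b"], [], ["c", "d"]]

# With no :row_sep given, the line ending is auto-detected from the data.
crlf_rows = CSV.parse("a,b\r\nc,d\r\n")

# An explicit :row_sep is taken literally.
pipe_rows = CSV.parse("a,b|c,d|", row_sep: "|")

# Problematic data (here, a quoted field that never closes) raises an error
# instead of being silently accepted.
begin
  CSV.parse_line('a,"unclosed')
  malformed = false
rescue CSV::MalformedCSVError
  malformed = true
end
```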
Interface
-
CSV now uses Hash-style parameters to set options.
-
CSV no longer has generate_row() or parse_row().
-
The old CSV’s Reader and Writer classes have been dropped.
-
CSV::open() is now more like Ruby’s open().
-
CSV objects now support most standard IO methods.
-
CSV now has a new() method used to wrap objects like String and IO for reading and writing.
-
CSV::generate() is different from the old method.
-
CSV no longer supports partial reads. It works line-by-line.
-
For performance reasons, CSV no longer allows the instance methods to override the separators. They must be set in the constructor.
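As a rough illustration of the interface points above, the sketch below uses Hash-style options, generate(), new(), and open(). The data and the temporary file are assumptions made for the example, not anything prescribed by the library.

```ruby
require "csv"
require "stringio"
require "tempfile"

# CSV.generate builds a String; rows are appended with <<.
# Options such as :col_sep are passed Hash-style.
text = CSV.generate(col_sep: ";") do |csv|
  csv << ["name", "qty"]
  csv << ["apple", 3]
end
# text => "name;qty\napple;3\n"

# CSV.new wraps a String (or IO) for reading. The separator is set here,
# in the constructor, not on the individual read calls.
reader = CSV.new(text, col_sep: ";")
first  = reader.shift   # reads a single row
rest   = reader.read    # reads the remaining rows

# CSV.new can also wrap an IO object for writing.
out = StringIO.new
CSV.new(out) << ["x", "y"]

# CSV.open takes a path and a mode, much like File.open, and yields a CSV object.
from_disk = nil
Tempfile.create(["example", ".csv"]) do |file|
  CSV.open(file.path, "w") { |csv| csv << ["a", "b"] }
  from_disk = CSV.read(file.path)
end
```

Reading with shift shows the line-by-line model: each call consumes exactly one row from the wrapped object.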
If you use this library and find yourself missing any functionality I have trimmed, please let me know.
Documentation
See CSV for documentation.
What is CSV, really?
CSV maintains a pretty strict definition of CSV taken directly from the RFC. I relax the rules in only one place and that is to make using this library easier. CSV will parse all valid CSV.
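A small sketch of what valid CSV in the RFC sense can contain: quoted fields holding the separator, doubled quotes, and line breaks. The input strings are made up for the example.

```ruby
require "csv"

# A quoted field may contain commas, and a doubled quote stands for one quote.
fields = CSV.parse_line('a,"b,c","say ""hi"""')
# fields => ["a", "b,c", "say \"hi\""]

# A quoted field may also span more than one line.
rows = CSV.parse("id,note\n1,\"two\nlines\"\n")
# rows => [["id", "note"], ["1", "two\nlines"]]
```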
What you don’t want to do is feed CSV invalid data. Because of the way the CSV format works, it’s common for a parser to need to read until the end of the file to be sure a field is invalid. This eats a lot of time and memory.
Luckily, when working with invalid CSV, Ruby’s built-in methods will almost always be superior in every way. For example, parsing non-quoted fields is as easy as:
data.split(",")
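One possible way to combine the two approaches, shown only as a sketch: parse a single line strictly and fall back to split when it is not valid CSV. The helper name lenient_fields is hypothetical and not part of the library. Because it works on one line at a time, the parser never has to read ahead through the rest of a file.

```ruby
require "csv"

# Hypothetical helper (not part of CSV): try the strict parser first, then
# fall back to a plain split when the line is not valid CSV.
def lenient_fields(line)
  CSV.parse_line(line)
rescue CSV::MalformedCSVError
  # The -1 limit keeps trailing empty fields, which split drops by default.
  line.chomp.split(",", -1)
end

valid   = lenient_fields('a,"b,c",d')   # parsed as CSV
invalid = lenient_fields('a,b"c,d')     # stray quote, so split is used
```

Note that the fallback treats every comma as a separator, so it is only appropriate when the fields are known not to be quoted.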
Questions and/or Comments
Feel free to email James Edward Gray II with any questions.