Fight CSV!
It's 2011, and parsing CSV with Ruby still sucks? Enter FightCSV! It will take the cumbersome out of your CSV parsing, while keeping the awesome! Want some taste of that juicy fresh? Check out this example:
Suppose you have a CSV file called log_entries.csv which looks like this:
Date,Person,Client/Project,Minutes,Tags,Billable
2011-08-15,John Doe,handsomelabs,60,blogpost,no
2011-08-15,Max Powers,beerbrewing,60,meeting,yes
2011-08-15,Tyler Durden,babysitting,180,"concepting, research",yes
2011-08-15,Hulk Hero,gardening,60,"meeting, research",no
2011-08-15,John Doe,handsomelabs,60,coding,yes
2011-08-08,John Doe,handsomelabs,60,"blabla, meeting",yes
Schema
Now you can define a class representing a row of the file. You only need
to include FightCSV::Record.
class LogEntry
  include FightCSV::Record
end
But of course you want the values from each row to behave like proper
Ruby objects. This can be easily achieved by defining a schema in the
LogEntry class:
class LogEntry
  include FightCSV::Record
  schema do
    field "Name"
    field "Client/Project", {
      identifier: :project
    }
  end
end
Now the LogEntry objects will have a name method corresponding to the
column called "Name" and a project method corresponding to the column
called "Client/Project".
Sometimes you want to adjust not only the field names but also the values. For this, FightCSV offers converters. The "Billable" column represents boolean values, so let's tackle that:
class LogEntry
  include FightCSV::Record
  schema do
    field "Name"
    field "Client/Project", {
      identifier: :project
    }
    field "Billable", {
      converter: ->(string) { string == "yes" }
    }
  end
end
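To see what such a converter does on its own, here is a minimal plain-Ruby sketch that does not use FightCSV at all. The constant name BILLABLE_CONVERTER is hypothetical and only exists for this illustration; the lambda body mirrors the one in the schema.

```ruby
# A converter is just a callable: it receives the raw cell (a String)
# and returns the value you actually want to work with.
# BILLABLE_CONVERTER is a hypothetical name used only in this sketch.
BILLABLE_CONVERTER = ->(string) { string == "yes" }

# The "Billable" column of the sample file, top to bottom.
raw_cells = %w[no yes yes no yes yes]

billable = raw_cells.map(&BILLABLE_CONVERTER)
p billable
```

Because the comparison already returns true or false, no ternary is needed.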
Often when converting something, we assume that it has a certain format.
The "Date" column, for example, should always match the format
/\d{4}-\d{2}-\d{2}/ (as in 2011-08-15). A validation can easily be added
to a column with FightCSV:
require "date" # needed for Date.parse

class LogEntry
  include FightCSV::Record
  schema do
    field "Name"
    field "Client/Project", {
      identifier: :project
    }
    field "Billable", {
      converter: ->(string) { string == "yes" }
    }
    field "Date", {
      validate: /\d{4}-\d{2}-\d{2}/,
      converter: ->(string) { Date.parse(string) }
    }
  end
end
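The idea of "validate first, then convert" can be sketched in plain Ruby as follows. This is not FightCSV's implementation: the helper validate_and_convert and the constants ISO_DATE and TO_DATE are hypothetical names, the pattern assumes the ISO-style dates from the sample file (2011-08-15), and raising ArgumentError on a mismatch is an assumption, since how FightCSV itself reports a failed validation is not covered here.

```ruby
require "date"

# Hypothetical helper: check the raw string against a pattern first,
# and only run the converter when the pattern matches.
def validate_and_convert(raw, pattern, converter)
  raise ArgumentError, "invalid value: #{raw.inspect}" unless raw.match?(pattern)

  converter.call(raw)
end

# Anchored pattern for dates like 2011-08-15 (an assumption based on the sample file).
ISO_DATE = /\A\d{4}-\d{2}-\d{2}\z/
TO_DATE  = ->(string) { Date.parse(string) }

date = validate_and_convert("2011-08-15", ISO_DATE, TO_DATE)
puts date.year

begin
  validate_and_convert("15.08.2011", ISO_DATE, TO_DATE)
rescue ArgumentError => e
  puts e.message
end
```

Validating before converting means Date.parse never sees a string in an unexpected layout, which it might otherwise interpret in a surprising way.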
The complete schema:
require "date" # needed for Date.parse

class LogEntry
  include FightCSV::Record
  schema do
    field "Name"
    field "Client/Project", {
      identifier: :project
    }
    field "Billable", {
      converter: ->(string) { string == "yes" }
    }
    field "Date", {
      validate: /\d{4}-\d{2}-\d{2}/,
      converter: ->(string) { Date.parse(string) }
    }
    field "Tags", {
      # strip removes the space after each comma, e.g. "concepting, research"
      converter: ->(string) { string.split(",").map(&:strip) }
    }
    field "Minutes", {
      validate: /\d+/,
      converter: ->(string) { string.to_i }
    }
  end
end
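For comparison, here is a rough sketch of the same conversions done by hand with Ruby's standard csv library, which is roughly the boilerplate a schema saves you from writing. It does not use FightCSV; CONVERTERS, convert_row, SAMPLE_CSV and ROWS are hypothetical names, only two rows of the sample file are inlined, and tags are stripped of surrounding whitespace after splitting.

```ruby
require "csv"
require "date"

# Two rows of the sample file, inlined so the sketch is self-contained.
SAMPLE_CSV = <<~DATA
  Date,Person,Client/Project,Minutes,Tags,Billable
  2011-08-15,John Doe,handsomelabs,60,blogpost,no
  2011-08-15,Tyler Durden,babysitting,180,"concepting, research",yes
DATA

# Column name => converter. Columns without an entry stay plain strings.
CONVERTERS = {
  "Date"     => ->(s) { Date.parse(s) },
  "Minutes"  => ->(s) { s.to_i },
  "Tags"     => ->(s) { s.split(",").map(&:strip) },
  "Billable" => ->(s) { s == "yes" }
}.freeze

# Turn one CSV::Row into a Hash of converted values.
def convert_row(row)
  row.to_h.each_with_object({}) do |(column, raw), converted|
    converter = CONVERTERS.fetch(column, ->(s) { s })
    converted[column] = converter.call(raw)
  end
end

ROWS = CSV.parse(SAMPLE_CSV, headers: true).map { |row| convert_row(row) }
p ROWS.last["Tags"]
p ROWS.last["Minutes"]
```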
Parsing CSV
With the schema definition you're finally able to parse some CSV. There are two possible ways of doing this:
- LogEntry.records will return an array with all rows mapped to instances of LogEntry.
- LogEntry.import will return an enumerator which passes the same LogEntry instance, with the row changed, on every iteration:
  LogEntry.import(csv).map(&:minutes).reduce(:+) #=> 480
  This avoids building one LogEntry per row, which keeps memory usage low on big CSV documents.
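The difference between loading everything and iterating row by row can be sketched with Ruby's standard csv library. This is an analogy under stated assumptions, not FightCSV's implementation: total_minutes_eager, total_minutes_streaming and LOG_ENTRIES_CSV are hypothetical names, and the six sample rows from the top of this README are inlined so the sketch runs as-is. For those six rows the minutes add up to 480.

```ruby
require "csv"

# The sample file from the top of this README.
LOG_ENTRIES_CSV = <<~DATA
  Date,Person,Client/Project,Minutes,Tags,Billable
  2011-08-15,John Doe,handsomelabs,60,blogpost,no
  2011-08-15,Max Powers,beerbrewing,60,meeting,yes
  2011-08-15,Tyler Durden,babysitting,180,"concepting, research",yes
  2011-08-15,Hulk Hero,gardening,60,"meeting, research",no
  2011-08-15,John Doe,handsomelabs,60,coding,yes
  2011-08-08,John Doe,handsomelabs,60,"blabla, meeting",yes
DATA

# Eager: CSV.parse builds the whole table before anything is summed.
def total_minutes_eager(csv_text)
  table = CSV.parse(csv_text, headers: true)
  table.sum { |row| row["Minutes"].to_i }
end

# Streaming: CSV#each without a block returns an Enumerator that reads
# one row at a time, so the full table is never materialized.
def total_minutes_streaming(csv_text)
  CSV.new(csv_text, headers: true).each.sum { |row| row["Minutes"].to_i }
end

puts total_minutes_eager(LOG_ENTRIES_CSV)     # prints 480
puts total_minutes_streaming(LOG_ENTRIES_CSV) # prints 480
```

When reading from a file rather than a string, CSV.foreach(path, headers: true) gives the same row-by-row behavior.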
Contributing to fight_csv
- Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet
- Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it
- Fork the project
- Commit and push until you are happy with your contribution
- Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
- Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or it is otherwise necessary, that is fine, but please isolate the change to its own commit so I can cherry-pick around it.
Copyright
Copyright (c) 2011 Manuel Korfmann. See LICENSE.txt for further details.