strong_csv
Type checker for a CSV file inspired by strong_json.
Motivation
Some applications have a feature to receive a CSV file uploaded by a user, and in general, it needs to validate each cell of the CSV file.
How should applications validate them? Of course, it depends, but there would be common validation logic for CSV files. For example, some columns may have to be integers because of database requirements. It would be cumbersome to write such validations always.
strong_csv helps you to mitigate such a drudgery by letting you declare desired types beforehand.
Installation
Add this line to your application's Gemfile:
gem "strong_csv"
And then execute:
bundle
Or install it yourself as:
gem install strong_csv
Usage
The most important APIs of strong_csv are StrongCSV.new
and StrongCSV#parse
.
StrongCSV.new
lets you declare types for each CSV column with Ruby's block syntax.
Inside the block, you will mainly use let
and declare types for a column.
After defining types, you can parse CSV content with StrongCSV#parse
.
StrongCSV#parse
won't raise errors as possible and just store error messages in its rows.
The reason why it won't raise errors is CSV content may contain invalid rows,
but sometimes, it makes sense to ignore them and process something for valid rows.
If you want to stop all processes with invalid rows,
check whether all rows are valid before proceeding with computation.
Here is an example usage of this gem:
require "strong_csv"
strong_csv = StrongCSV.new do
let :stock, integer
let :tax_rate, float
let :name, string(within: 1..255)
let :description, string?(within: 1..1000)
let :active, boolean
let :started_at, time?(format: "%Y-%m-%dT%H:%M:%S")
let :price, integer, error_message: "This should be Integer"
# Literal declaration
let :status, 0..6
let :priority, 10, 20, 30, 40, 50
let :size, "S", "M", "L" do |value|
case value
when "S"
1
when "M"
2
when "L"
3
end
end
# Regular expressions
let :url, %r{\Ahttps://}
# Custom validation
#
# This example sees the database to fetch exactly stored `User` IDs,
# and it checks the `:user_id` cell really exists in the `users` table.
# `pick` would be useful to avoid N+1 problems.
pick :user_id, as: :user_ids do |ids|
User.where(id: ids).ids
end
let :user_id, integer(constraint: ->(i) { user_ids.include?(i) })
end
data = <<~CSV
stock,tax_rate,name,active,status,priority,size,url,price
12,0.8,special item,True,4,20,M,https://example.com,PRICE
CSV
strong_csv.parse(data, field_size_limit: 2048) do |row|
if row.valid?
row[:tax_rate] # => 0.8
row[:active] # => true
# do something with row
else
row.errors # => {:price=>["This should be Integer"], :user_id=>["`nil` can't be casted to Integer"]}
# do something with row.errors
end
end
You can also define types without CSV headers by specifying column numbers.
[!NOTE] The numbers must start from
0
(zero-based index).
StrongCSV.new do
let 0, integer
let 1, string
let 2, 1..10
end
This declaration expects a CSV has the contents like this:
123,abc,3
830,mno,10
Available types
Type | Description |
---|---|
integer and integer? |
The value must be casted to Integer . |
float and float? |
The value must be casted to Float . |
boolean and boolean? |
The value must be casted to Boolean (true or false ). |
string and string? |
The value must be casted to String . |
time and time? |
The value must be casted to Time . |
optional |
The value can be nil . If the value exists, it must satisfy the given type constraint. |
23 (Integer literal) |
The value must be casted to the specific Integer literal. |
15.12 (Float literal) |
The value must be casted to the specific Float literal. |
1..10 (Range literal) |
The value must be casted to the beginning of Range and be covered with it. |
"abc" (String literal) |
The value must be casted to the specific String literal. |
%r{\Ahttps://} (Regexp literal) |
The value must be casted to a String that matches the specified Regexp. |
, (Union type) |
The value must satisfy one of the subtypes. |
integer
and integer?
The value must be casted to Integer. integer?
allows the value to be nil
, so you can declare optional integer type
for columns. It also lets you allow values that satisfy the specified limitation through :constraint
.
Example
strong_csv = StrongCSV.new do
let :stock, integer
let :state, integer?
let :user_id, integer(constraint: ->(v) { user_ids.include?(v)})
pick :user_id, as: :user_ids do |values|
User.where(id: values).ids
end
end
result = strong_csv.parse(<<~CSV)
stock,state,user_id
12,0,1
20,,2
non-integer,1,4
CSV
result.map(&:valid?) # => [true, true, false]
result[0].slice(:stock, :state, :user_id) # => {:stock=>12, :state=>0, :user_id=>1}
result[1].slice(:stock, :state, :user_id) # => {:stock=>20, :state=>nil, :user_id=>2}
result[2].slice(:stock, :state, :user_id) # => {:stock=>"non-integer", :state=>1, :user_id=>"4"}
result[2].errors.slice(:stock, :user_id) # => {:stock=>["`\"non-integer\"` can't be casted to Integer"], :user_id=>["`\"4\"` does not satisfy the specified constraint"]}
float
and float?
The value must be casted to Float. float?
allows the value to be nil
, so you can declare optional float type for
columns. It also lets you allow values that satisfy the specified limitation through :constraint
.
Example
strong_csv = StrongCSV.new do
let :tax_rate, float
let :fail_rate, float?
end
result = strong_csv.parse(<<~CSV)
tax_rate,fail_rate
0.02,0.1
0.05,
,0.8
CSV
result.map(&:valid?) # => [true, true, false]
result[0].slice(:tax_rate, :fail_rate) # => {:tax_rate=>0.02, :fail_rate=>0.1}
result[1].slice(:tax_rate, :fail_rate) # => {:tax_rate=>0.05, :fail_rate=>nil}
result[2].slice(:tax_rate, :fail_rate) # => {:tax_rate=>nil, :fail_rate=>0.8} (`nil` is not allowed for `tax_rate`)
boolean
and boolean?
The value must be casted to Boolean (true
of false
).
"true"
, "True"
, and "TRUE"
are casted to true
,
while "false"
, "False"
, and "FALSE"
are casted to false
.
boolean?
allows the value to be nil
as an optional boolean
value.
Example
strong_csv = StrongCSV.new do
let :enabled, boolean
let :active, boolean?
end
result = strong_csv.parse(<<~CSV)
enabled,active
True,True
False,
,
CSV
result.map(&:valid?) # => [true, true, false]
result[0].slice(:enabled, :active) # => {:enabled=>true, :active=>true}
result[1].slice(:enabled, :active) # => {:enabled=>false, :active=>nil}
result[2].slice(:enabled, :active) # => {:enabled=>nil, :active=>nil} (`nil` is not allowed for `enabled`)
string
and string?
The value must be casted to String. string?
allows the value to be nil
as an optional string value.
They also support :within
in its arguments, and it limits the length of the string value within the specified Range
.
Example
strong_csv = StrongCSV.new do
let :name, string(within: 1..4)
let :description, string?
end
result = strong_csv.parse(<<~CSV)
name,description
JB,Hello
yykamei,
,🤷
CSV
result.map(&:valid?) # => [true, false, false]
result[0].slice(:name, :description) # => {:name=>"JB", :description=>"Hello"}
result[1].slice(:name, :description) # => {:name=>"yykamei", :description=>nil} ("yykamei" exceeds the `Range` specified with `:within`)
result[2].slice(:name, :description) # => {:name=>nil, :description=>"🤷"} (`nil` is not allowed for `name`)
time
and time?
The value must be casted to Time. time?
allows the value to be nil
as an optional time value.
They have the :format
argument, which is used as the format
of Time.strptime
;
it means you can ensure the value must satisfy the time format. The default value of :format
is "%Y-%m-%d"
.
Example
strong_csv = StrongCSV.new do
let :start_on, time
let :updated_at, time?(format: "%FT%T")
end
result = strong_csv.parse(<<~CSV)
start_on,updated_at
2022-04-01,2022-04-30T15:30:59
2022-05-03
05-03,2021-09-03T09:48:23
CSV
result.map(&:valid?) # => [true, true, false]
result[0].slice(:start_on, :updated_at) # => {:start_on=>2022-04-01 00:00:00 +0900, :updated_at=>2022-04-30 15:30:59 +0900}
result[1].slice(:start_on, :updated_at) # => {:start_on=>2022-05-03 00:00:00 +0900, :updated_at=>nil}
result[2].slice(:start_on, :updated_at) # => {:start_on=>"05-03", :updated_at=>2021-09-03 09:48:23 +0900} ("05-03" does not satisfy the default format `"%Y-%m-%d"`)
optional
While each type above has its optional type with ?
, literals cannot be suffixed with ?
.
However, there would be a case to have an optional literal type.
In this case, optional
might be useful and lets you declare such types.
Example
strong_csv = StrongCSV.new do
let :foo, optional(123)
let :bar, optional("test")
end
result = strong_csv.parse(<<~CSV)
foo,bar
123,test
,
124
CSV
result.map(&:valid?) # => [true, true, false]
result[0].slice(:foo, :bar) # => {:foo=>123, :bar=>"test"}
result[1].slice(:foo, :bar) # => {:foo=>nil, :bar=>nil}
result[2].slice(:foo, :bar) # => {:foo=>"124", :bar=>nil} (124 is not equal to 123)
Literal
You can declare literal value as types. The supported literals are Integer
, Float
, String
, and Range
.
Example
strong_csv = StrongCSV.new do
let 0, 123
let 1, "test"
let 2, 2.5
let 3, 1..10
let 4, /[a-z]+/
end
result = strong_csv.parse(<<~CSV)
123,test,2.5,9,abc
123,test,2.5,0,xyz
123,Hey,2.5,10,!
CSV
result.map(&:valid?) # => [true, false, false]
result[0].slice(0, 1, 2, 3, 4) # => {0=>123, 1=>"test", 2=>2.5, 3=>9, 4=>"abc"}
result[1].slice(0, 1, 2, 3, 4) # => {0=>123, 1=>"test", 2=>2.5, 3=>"0", 4=>"xyz"} (0 is out of 1..10)
result[2].slice(0, 1, 2, 3, 4) # => {0=>123, 1=>"Hey", 2=>2.5, 3=>10, 4=>"!"} ("Hey" is not equal to "test", and "!" does not match /[a-z]+/)
Union
There would be a case that it's alright if a value satisfies one of the types. Union types are useful for such a case.
Example
strong_csv = StrongCSV.new do
let :priority, 10, 20, 30
let :size, "S", "M", "L"
end
result = strong_csv.parse(<<~CSV)
priority,size
10,M
30,A
11,S
CSV
result.map(&:valid?) # => [true, false, false]
result[0].slice(:priority, :size) # => {:priority=>10, :size=>"M"}
result[1].slice(:priority, :size) # => {:priority=>30, :size=>"A"} ("A" is not one of "S", "M", and "L")
result[2].slice(:priority, :size) # => {:priority=>"11", :size=>"S"} (11 is not one of 10, 20, and 30)
Contributing
Bug reports and pull requests are welcome on the GitHub repository. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.