csvutils - tools 'n' scripts for working with comma-separated values (csv) datafiles - the world's most popular tabular data interchange format in text
- home :: github.com/csvreader/csvutils
- bugs :: github.com/csvreader/csvutils/issues
- gem :: rubygems.org/gems/csvutils
- rdoc :: rubydoc.info/gems/csvutils
- forum :: wwwmake
Usage
Command Line Tools
csvhead
• csvheader
• csvstat
• csvsplit
• csvcut
Try the help option -h/--help
with the command line tools. Example:
$ csvhead -h # or
$ csvhead --help
resulting in:
Usage: csvhead [OPTS] datafile ...
-n, --num=NUM Number of rows
-h, --help Prints this help
Now try it with csvheader -h
, csvstat -h
, csvsplit -h
,
csvcut -h
and so on.
Working with Comma-Separated Values (CSV) Datafile Examples
Let's use a sample datafile e.g. ENG.csv
from the
football.csv project with
matches from the English Premier League. Try
$ csvhead ENG.csv
to pretty print (pp) the first four rows (use the -n/--num
option for more or less rows).
Resulting in:
== ENG.csv ==
<Date:11/08/17, Team1:Arsenal, Team2:Leicester, FT1:4, FT2:3>
<Date:12/08/17, Team1:Brighton, Team2:Man City, FT1:0, FT2:2>
<Date:12/08/17, Team1:Chelsea, Team2:Burnley, FT1:2, FT2:3>
<Date:12/08/17, Team1:Crystal Palace, Team2:Huddersfield, FT1:0, FT2:3>
4 rows
Next try
$ csvheader ENG.csv
to print all header columns (the first row). Resulting in:
== ENG.csv ==
5 columns:
1: Date
2: Team1
3: Team2
4: FT1
5: FT2
Next try:
$ csvstat -c Team1,Team2 ENG.csv
to show all unique values for the columns Team1
and Team2
.
Resulting in:
== ENG.csv ==
... 380 rows
5 columns:
1: Date
2: Team1
3: Team2
4: FT1
5: FT2
column "Team1" - 20 unique values:
19 x Arsenal
19 x Bournemouth
19 x Brighton
19 x Burnley
19 x Chelsea
19 x Crystal Palace
19 x Everton
19 x Huddersfield
19 x Leicester
19 x Liverpool
19 x Man City
19 x Man United
19 x Newcastle
19 x Southampton
19 x Stoke
19 x Swansea
19 x Tottenham
19 x Watford
19 x West Brom
19 x West Ham
column "Team2" - 20 unique values:
19 x Arsenal
19 x Bournemouth
19 x Brighton
19 x Burnley
19 x Chelsea
19 x Crystal Palace
19 x Everton
19 x Huddersfield
19 x Leicester
19 x Liverpool
19 x Man City
19 x Man United
19 x Newcastle
19 x Southampton
19 x Stoke
19 x Swansea
19 x Tottenham
19 x Watford
19 x West Brom
19 x West Ham
Split & Cut - Split One Datafile into Many or Cut / Reorder Columns
Let's use another sample datafile e.g. AUT.csv
that holds many seasons
from the Austrian (AUT) Bundesliga. First lets see how many seasons:
$ csvstat -c Season AUT.csv
Resulting in:
== AUT.csv ==
... 360 rows
6 columns:
1: Season
2: Date
3: Team1
4: Team2
5: FT1
6: FT2
column "Season" - 2 unique values:
180 x 2016/2017
180 x 2017/2018
Now let's split the AUT.csv
datafile by the Season
column
resulting in two new datafiles named AUT_2016-2017.csv
and ÀUT_2017-2018.csv
. Try:
$ csvsplit -c Season AUT.csv
Resulting in:
new chunk: ["2016/2017"] - saving "AUT_2016-2017.csv"...
new chunk: ["2017/2018"] - saving "AUT_2017-2018.csv"...
Let's cut out (remove) the Season
column from the new AUT_2016-2017.csv
datafile. Try:
$ csvcut -c Date,Team1,Team2,FT1,FT2 AUT_2016-2017.csv
Double check the overwritten in-place cleaned-up datafile:
$ csvhead AUT_2016-2017.csv
resulting in:
== AUT_2016-2017.csv ==
<Date:23/07/16, Team1:Rapid Vienna, Team2:Ried, FT1:5, FT2:0>
<Date:23/07/16, Team1:Altach, Team2:AC Wolfsberger, FT1:1, FT2:0>
<Date:23/07/16, Team1:Sturm Graz, Team2:Salzburg, FT1:3, FT2:1>
<Date:24/07/16, Team1:St. Pölten, Team2:Austria Vienna, FT1:1, FT2:2>
4 rows
And so on and so forth.
Code, Code, Code - Script Your Data Work Flow with Ruby
You can use all tools in your script using the CsvUtils
class methods:
Shell | Ruby |
---|---|
csvhead |
CsvUtils.head( path, n: 4 ) |
csvheader |
CsvUtils.header( path ) |
csvstat |
CsvUtils.stat( path, *columns ) |
csvsplit |
CsvUtils.split( path, *columns ) |
csvcut |
CsvUtils.cut( path, *columns, output: path) |
Let's retry the sample above in a script:
require 'csvutils'
CsvUtils.head( 'ENG.csv' )
# same as:
# $ csvhead ENG.csv
CsvUtils.header( 'ENG.csv' )
# same as:
# $ csvheader ENG.csv
CsvUtils.stat( 'ENG.csv', 'Team1', 'Team2' )
# same as:
# $ csvstat -c Team1,Team2 ENG.csv
CsvUtils.stat( 'AUT.csv', 'Season' )
# same as:
# $ csvstat -c Season AUT.csv
CsvUtils.split( 'AUT.csv', 'Season' )
# same as:
# $ csvsplit -c Season AUT.csv
CsvUtils.cut( 'AUT_2016-2017.csv', 'Date', 'Team1', 'Team2', 'FT1', 'FT2' )
# same as:
# $ csvcut -c Date,Team1,Team2,FT1,FT2 AUT_2016-2017.csv
That's it. See the /getting-started-samples
folder to
run the samples on your own computer.
Install
Just install the gem:
$ gem install csvutils
Alternatives
See the Libraries & Tools section in the Awesome CSV page.
License
The csvutils
scripts are dedicated to the public domain.
Use it as you please with no restrictions whatsoever.
Questions? Comments?
Send them along to the wwwmake forum. Thanks!