tabreader - read in tabular datafiles in text in the tabular (TAB) format
- home :: github.com/csv11/tabreader
- bugs :: github.com/csv11/tabreader/issues
- gem :: rubygems.org/gems/tabreader
- rdoc :: rubydoc.info/gems/tabreader
- forum :: wwwmake
Usage
Yes, it's as simple as:
line = "1\t2\t3"
values = line.split( "\t" )
pp values
# => ["1","2","3"]
or the "magic" packaged up in TabReader
:
line = "1\t2\t3"
values = Tab.parse_line( line ) ## or TAB.parse_line or TabReader.parse_line
pp values
# => ["1","2","3"]
or use the convenience helpers:
txt <<=TXT
1\t2\t3
4\t5\t6
TXT
records = Tab.parse( txt ) ## or TAB.parse or TabReader.parse
pp records
# => [["1","2","3"],
# ["4","5","6"]]
# -or-
records = Tab.read( "values.tab" ) ## or TAB.read or TabReader.read
pp records
# => [["1","2","3"],
# ["4","5","6"]]
# -or-
Tab.foreach( "values.tab" ) do |rec| ## or TAB.foreach or TabReader.foreach
pp rec
end
# => ["1","2","3"]
# => ["4","5","6"]
What about Enumerable?
Yes, every reader includes Enumerable
and runs on each
.
Use new
or open
without a block
to get the enumerator (iterator).
Example:
tab = Tab.new( "a\tb\tc" ) ## or TAB.new or TabReader.new
it = tab.to_enum
pp it.next
# => ["a","b","c"]
# -or-
tab = Tab.open( "values.tab" ) ## or TAB.open or TabReader.open
it = tab.to_enum
pp it.next
# => ["1","2","3"]
pp it.next
# => ["4","5","6"]
What about headers?
Use the TabHash
if the first line is a header (or if missing pass in the headers
as an array) and you want your records as hashes instead of arrays of values.
Example:
txt <<=TXT
A\tB\tC
1\t2\t3
4\t5\t6
TXT
records = TabHash.parse( txt ) ## or TabHashReader
pp records
# -or-
txt2 <<=TXT
1\t2\t3
4\t5\t6
TXT
records = TabHash.parse( txt2, headers: ["A","B","C"] )
pp records
# => [{"A": "1", "B": "2", "C": "3"},
# {"A": "4", "B": "5", "C": "6"}]
# -or-
records = TabHash.read( "hash.tab" )
pp records
# => [{"A": "1", "B": "2", "C": "3"},
# {"A": "4", "B": "5", "C": "6"}]
# -or-
TabHash.foreach( "hash.tab" ) do |rec|
pp rec
end
# => {"A": "1", "B": "2", "C": "3"}
# => {"A": "4", "B": "5", "C": "6"}
Frequently Asked Questions (FAQ) and Answers
Q: Why NOT use Csv.read( sep: "\t" )
?
TAB != CSV
The tabulator (TAB) format is an (even) simpler format than the comma-separated values (CSV) classic format. How?
The tab format has NO escape rules.
A double quote ("
) is a double quote ("
). Example:
"1"→"2"→"3"
4→5→6
vs
"1","2","3"
4,5,6
Turns into "1", "2", "3"
and 4, 5, 6
in tab and 1, 2, 3
and 4, 5, 6
in CSV.
Note: The surrounding double quotes get stripped in CSV.
You have to double up double quotes (e.g. ""
)
for adding "literal" double quotes in CSV:
"""1""","""2""","""3"""
4,5,"Hamlet says, ""Seems,"" madam! Nay it is; I know not ""seems."""
vs
"1"→"2"→"3"
4→5→Hamlet says, "Seems," madam! Nay it is; I know not "seems."
Thus, to avoid any surprises, do NOT use
Csv.read( sep: "\t" )
and friends for tab.
Note: Simpler also equals faster :-).
Q: What's the tabulator (TAB) format?
Let's reprint the (complete) tab spec(ification) right here (in an edited simpler version):
A tab file encodes a number of records that may contain multiple fields. Each record is represented as a single line. Each field value is represented as text. Fields in a record are separated from each other by a tab character.
Note that fields that contain tabs are not allowable in this encoding.
Here is a quick grammar in Backus-Naur Form (BNF):
field ::= [character]+ # multiple characters
record ::= field [TAB field]+ EOL # at least one field, or more
datafile ::= record+
Example:
Name→Age→Address
Paul→23→1115 W Franklin
Bessy the Cow→5→Big Farm Way
Zeke→45→W Main St
(Source: Tab Spec @ IANA Media Types)
Q: Why tab?
The tab format works great with
command line text wrangling / processing tools
because you can split lines / records on tab
(with no exceptions or extra escape rules).
Use classics such as cut
, paste
, sort
, uniq
, grep
, sed
, awk
and many more.
Q: Why NOT tab?
Let's quote from the Awesome CSV page:
In theory the tab (\t
) separator is perfect. Values never use tabs, don't they? So why hasn't the tab separator taken off?
In practice tab separators are invisible or look like spaces and often you cannot tell if a space is a tab or not.
Thus, tab works great only and only (like space) if your values do NOT use spaces and you treat a tab like a space.
(Source: Awesome CSV @ CSV v1.1)
Best of Both Worlds
Use the csv2tab
tool to convert comma-separated values (CSV) datafiles
to tab and use the tab2csv
tool to convert tab datafiles to
comma-separated values (CSV).
License
The tabreader
scripts are dedicated to the public domain.
Use it as you please with no restrictions whatsoever.
Questions? Comments?
Send them along to the wwwmake forum. Thanks!