SPECIAL NOTE
Gem name: fixed_width-multibyte (as opposed to fixed_width)
Forked from https://github.com/timonk/fixed_width to provide multibyte support. Uses ActiveSupport::Multibyte::Chars instead of String#unpack. Tested in Ruby 1.8.7 and 1.9.2.
Per https://github.com/timonk/fixed_width/pull/1, this fork will not be merged back into fixed_width because it adds a dependency on ActiveSupport.
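To illustrate why byte-oriented String#unpack is a problem for multibyte text, here is a minimal sketch in plain Ruby (1.9 or later). It does not use this gem or ActiveSupport; the sample line and the two 5-character columns are made up for the example.

```ruby
# Illustration only: plain Ruby (1.9 or later), no gem required.
# Two 5-character columns; "\u00EB" (e with diaeresis) is 2 bytes in UTF-8.
line = "Zo\u00EB  Kim  "

# String#unpack with 'A5' consumes 5 *bytes* per field, so the 2-byte
# character shifts every later column one byte to the left.
by_bytes = line.unpack('A5A5')

# Slicing by character index keeps the columns where the layout puts them.
by_chars = [line[0, 5], line[5, 5]].map { |field| field.strip }

puts by_bytes[1].inspect  # second column picks up a stray leading space
puts by_chars[1].inspect  # second column is clean
```

The character-based slice is the behavior a multibyte-aware parser needs; how this gem achieves it internally (via ActiveSupport::Multibyte::Chars) is not shown here.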
DESCRIPTION:
A simple, clean DSL for describing, writing, and parsing fixed-width text files.
FEATURES:
- Easy DSL syntax
- Can parse and format fixed width files
- Templated sections for reuse
SYNOPSIS:
Creating a definition (Quick 'n Dirty)
Hopefully this will cover 90% of use cases.
# Create a FixedWidth::Definition to describe a file format
FixedWidth.define :simple do |d|
# This is a template section that can be reused in other sections
d.template :boundary do |t|
t.column :record_type, 4
t.column :company_id, 12
end
# Create a section named :header
d.header(:align => :left) do |header|
# The trap tells FixedWidth which lines should fall into this section
header.trap { |line| line[0,4] == 'HEAD' }
# Use the boundary template for the columns
header.template :boundary
end
d.body do |body|
# Anything that is not a HEAD or FOOT line belongs to the body
body.trap { |line| line[0,4] !~ /\A(HEAD|FOOT)\z/ }
body.column :id, 10, :parser => :to_i
body.column :first, 10, :align => :left, :group => :name
body.column :last, 10, :align => :left, :group => :name
body.spacer 3
body.column :city, 20 , :group => :address
body.column :state, 2 , :group => :address
body.column :country, 3, :group => :address
end
d.footer do |footer|
footer.trap { |line| line[0,4] == 'FOOT' }
footer.template :boundary
footer.column :record_count, 10, :parser => :to_i
end
end
Parsing a file with this definition would produce a hash something like this:
{
:body => [
{ :id => 12,
:name => { :first => "Ryan", :last => "Wood" },
:address => { :city => "Foo", :state => 'SC', :country => "USA" }
},
{ :id => 13,
:name => { :first => "Jo", :last => "Schmo" },
:address => { :city => "Bar", :state => "CA", :country => "USA" }
}
],
:header => [{ :record_type => 'HEAD', :company_id => 'ABC' }],
:footer => [{ :record_type => 'FOOT', :company_id => 'ABC', :record_count => 2 }]
}
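The role of the trap blocks can be sketched in plain Ruby. This is an illustration of the idea only, not the gem's implementation: the route helper and the traps hash are hypothetical names, the sample lines are abbreviated, and each line is simply handed to the first section whose trap returns true, without the section-ordering checks the gem performs.

```ruby
# Illustration only: plain Ruby, not the gem's implementation.
# One trap per section, checked in this order.
traps = {
  :header => lambda { |line| line[0, 4] == 'HEAD' },
  :footer => lambda { |line| line[0, 4] == 'FOOT' },
  :body   => lambda { |line| line[0, 4] !~ /\A(HEAD|FOOT)\z/ }
}

# Hypothetical helper: put each line into the first section whose trap matches.
def route(lines, traps)
  sections = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    name, _trap = traps.find { |_, trap| trap.call(line) }
    sections[name] << line if name
  end
  sections
end

# Abbreviated sample lines (only the leading columns are filled in).
lines = [
  'HEAD' + 'ABC'.ljust(12),
  '12'.rjust(10) + 'Ryan'.ljust(10),
  '13'.rjust(10) + 'Jo'.ljust(10),
  'FOOT' + 'ABC'.rjust(12) + '2'.rjust(10)
]

routed = route(lines, traps)
puts routed.keys.inspect
```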
Sections
Declaring a section
Sections can have any name, but duplicates are not allowed. (A DuplicateSectionNameError will be raised.) Sections are declared through the standard method_missing trick, so if you see any unusual behavior, that's probably the first place to look.
FixedWidth.define :simple do |d|
d.a_section_name do |s|
...
end
d.another_section_name do |s|
...
end
end
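For readers unfamiliar with the method_missing trick, here is a minimal sketch of how such a DSL can work. It is an assumption-laden illustration, not this gem's code: SketchDefinition is a hypothetical class, and DuplicateSectionNameError is redefined locally as a stand-in for the gem's error.

```ruby
# Illustration only: a hypothetical stand-in, not the gem's classes.
class DuplicateSectionNameError < StandardError; end

class SketchDefinition
  attr_reader :sections

  def initialize
    @sections = {}
  end

  # Any call to an undefined method is treated as a section declaration.
  def method_missing(name, *args, &block)
    if @sections.key?(name)
      raise DuplicateSectionNameError, "Duplicate section name: #{name}"
    end
    @sections[name] = { :options => args.first || {}, :block => block }
  end
end

d = SketchDefinition.new
d.header(:align => :left) { |s| }
d.body { |s| }

# A name that is already a real method never reaches method_missing,
# so no section is created for it.
d.hash { |s| }

puts d.sections.keys.inspect
```

This also shows one source of the unusual behavior mentioned above: a section name that collides with an existing method (hash, class, send, and so on) is silently dispatched to that method instead of declaring a section.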
Section options:
- :singular (default false) indicates that the section will only have a single record, and that it should not be returned nested in an array.
- :optional (default false) indicates that the section is optional. (A section not marked :optional will raise a RequiredSectionNotFoundError if the trap block doesn't match the row after the last one of the previous section.)
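The effect of the two section options on the parsed result can be sketched as follows. This is plain Ruby for illustration: finish_section is a hypothetical helper, the error class is a local stand-in, and returning nil for an absent optional section is an assumption of this sketch rather than documented behavior.

```ruby
# Illustration only: a hypothetical helper, not the gem's implementation.
class RequiredSectionNotFoundError < StandardError; end

def finish_section(name, records, options = {})
  if records.empty?
    # Assumption: an absent optional section yields nil in this sketch.
    return nil if options[:optional]
    raise RequiredSectionNotFoundError, "Required section '#{name}' was not found"
  end
  # :singular unwraps the single record instead of returning an array.
  options[:singular] ? records.first : records
end

header = [{ :record_type => 'HEAD', :company_id => 'ABC' }]
puts finish_section(:header, header).inspect
puts finish_section(:header, header, :singular => true).inspect
```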
Columns
Declaring a column
Columns can have any name except :spacer, which is reserved. Duplicate column names within a grouping are not allowed, and a column cannot share its name with a group. (A DuplicateColumnNameError will be raised for a duplicate column name within a grouping. A DuplicateGroupNameError will be raised if you try to declare a column with the same name as an existing group or vice versa.) Again, this is basic method_missing trickery, so be warned. You can declare columns either through method_missing or by calling Section#column.
FixedWidth.define :simple do |d|
d.a_section_name do |s|
s.a_column_name 12
s.column :another_column_name, 14
end
end
Column Options:
- :align can be set to :left or :right to indicate which side the values should be/are justified to. By default, all columns are aligned :right.
- :group can be set to a Symbol naming the nested hash that the value is parsed into when reading, or extracted from when writing.
- :parser and :formatter options are symbols (to be proc-ified) or procs. By default, parsing and formatting assume right-aligned strings, padded with spaces.
- :nil_blank set to true will cause whitespace-only fields to be parsed to nil, regardless of :parser.
- :padding can be set to a single character that will be used to pad formatted values when writing fixed-width files.
- :truncate can be set to true to truncate any value that exceeds the length property of a column. If unset or set to false, a FixedWidth::FormattedStringExceedsLengthError exception will be raised.
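The column options above can be sketched with two small plain-Ruby helpers. These are hypothetical functions written for this example, not the gem's API; the error class is a local stand-in, and keeping the leading characters when truncating is an assumption.

```ruby
# Illustration only: hypothetical helpers, not the gem's implementation.
class FormattedStringExceedsLengthError < StandardError; end

# Writing: apply :truncate, :padding and :align to a single value.
def format_field(value, length, options = {})
  text = value.to_s
  if text.length > length
    unless options[:truncate]
      raise FormattedStringExceedsLengthError, "'#{text}' exceeds #{length} characters"
    end
    text = text[0, length] # assumption: keep the leading characters
  end
  padding = options[:padding] || ' '
  options[:align] == :left ? text.ljust(length, padding) : text.rjust(length, padding)
end

# Reading: apply :nil_blank first, then :parser (a symbol or a proc).
def parse_field(raw, options = {})
  text = raw.strip
  return nil if options[:nil_blank] && text.empty?
  parser = options[:parser]
  parser ? parser.to_proc.call(text) : text
end

puts format_field(12, 5).inspect
puts format_field('Ryan', 6, :align => :left).inspect
puts parse_field('   12', :parser => :to_i).inspect
```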
Writing out fixed-width records
Once a definition exists, either feed it a nested hash of data values to create the file in the defined format:
test_data = {
:body => [
{ :id => 12,
:name => { :first => "Ryan", :last => "Wood" },
:address => { :city => "Foo", :state => 'SC', :country => "USA" }
},
{ :id => 13,
:name => { :first => "Jo", :last => "Schmo" },
:address => { :city => "Bar", :state => "CA", :country => "USA" }
}
],
:header => [{ :record_type => 'HEAD', :company_id => 'ABC' }],
:footer => [{ :record_type => 'FOOT', :company_id => 'ABC', :record_count => 2 }]
}
# Generates the file as a string
puts FixedWidth.generate(:simple, test_data)
# Writes the file
FixedWidth.write(file_instance, :simple, test_data)
Or parse files already in that format into a nested hash:
parsed_data = FixedWidth.parse(file_instance, :simple)
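To make the mapping between the nested hash and a fixed-width line concrete, here is a plain-Ruby sketch of one body record being written and read back. It is not the gem's implementation: format_record, parse_record and body_columns are hypothetical names, and the column layout is copied by hand from the :body section of the :simple definition.

```ruby
# Illustration only: plain Ruby, not the gem's implementation.
body_columns = [
  { :name => :id,      :length => 10, :parser => :to_i },
  { :name => :first,   :length => 10, :align => :left, :group => :name },
  { :name => :last,    :length => 10, :align => :left, :group => :name },
  { :name => :spacer,  :length => 3 },
  { :name => :city,    :length => 20, :group => :address },
  { :name => :state,   :length => 2,  :group => :address },
  { :name => :country, :length => 3,  :group => :address }
]

# Writing: pull each value out of its group (if any) and pad it to width.
def format_record(record, columns)
  columns.map do |col|
    next ' ' * col[:length] if col[:name] == :spacer
    source = col[:group] ? record[col[:group]] : record
    text = source[col[:name]].to_s
    col[:align] == :left ? text.ljust(col[:length]) : text.rjust(col[:length])
  end.join
end

# Reading: slice by character offset and rebuild the nested groups.
def parse_record(line, columns)
  record = {}
  offset = 0
  columns.each do |col|
    raw = line[offset, col[:length]]
    offset += col[:length]
    next if col[:name] == :spacer
    value = raw.strip
    value = col[:parser].to_proc.call(value) if col[:parser]
    target = col[:group] ? (record[col[:group]] ||= {}) : record
    target[col[:name]] = value
  end
  record
end

record = {
  :id => 12,
  :name => { :first => 'Ryan', :last => 'Wood' },
  :address => { :city => 'Foo', :state => 'SC', :country => 'USA' }
}

line = format_record(record, body_columns)
puts line.inspect
puts parse_record(line, body_columns).inspect
```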
INSTALL:
sudo gem install fixed_width-multibyte