# File 'lib/flat_kit/command/stats.rb', line 16
# Builds the Optimist parser that defines the `fk stats` command-line interface.
def self.parser
  ::Optimist::Parser.new do
    banner Stats.description.to_s
    banner ""
    banner <<~BANNER
      Given an input file, collect basic statistics.
      The statistics vary with the datatype of the field.
      Numeric fields report count, min, max, mean, standard deviation and sum.
      Non-numeric fields that are comparable, like dates, report count, min and max.
      Other non-numeric fields report only the count.
      Adding --cardinality also reports the count and frequency of distinct values,
      which allows the median value to be reported.
      The fields on which stats are collected may be selected with the --select parameter.
      By default, statistics are collected on all fields.
      The flatfile type(s) are determined automatically from the file name.
      The output can be written as CSV, JSON or a formatted ASCII table.
    BANNER
    banner <<~USAGE
      Usage:
      fk stats --everything file.json
      fk stats --select surname,given_name file.csv
      fk stats --select surname,given_name --output-format json file.csv > stats.json
      fk stats --select field1,field2 --output-format json input.csv
      fk stats --select field1 file.json.gz -o stats.csv
      gunzip -c file.json.gz | fk stats --input-format json --output-format text
    USAGE
    banner <<~OPTIONS
      Options:
    OPTIONS
    opt :output, "Send the output to the given path instead of standard out.", default: "<stdout>"
    opt :input_format, "Input format, csv or json", default: "auto", short: :none
    opt :output_format, "Output format, csv, json or text", default: "auto", short: :none
    opt :select, "The comma-separated list of field(s) to report stats on", required: false, type: :string
    opt :everything, "Show all statistics that are possible", default: false
    opt :cardinality, "Show the cardinality of the fields, this requires additional memory", default: false
  end
end
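
The banner text describes which statistics are reported for each kind of field. The rules can be illustrated with a minimal sketch that uses only the Ruby standard library. The helper name `field_stats_sketch` is hypothetical and is not part of flat_kit. Treating only `Date` values as the comparable non-numeric case, using the population standard deviation, and taking the lower middle element as the median are all assumptions made for illustration; the source does not say how flat_kit handles these details.

```ruby
require "date"

# Hypothetical helper, NOT part of flat_kit: summarises the values of one
# field following the rules described in the banner text.
def field_stats_sketch(values, cardinality: false)
  stats = { count: values.size }
  return stats if values.empty?

  numeric = values.all? { |v| v.is_a?(Numeric) }
  # Assumption: dates stand in for "comparable" non-numeric fields.
  dates = values.all? { |v| v.is_a?(Date) }

  # Comparable fields (numeric or dates) report min and max.
  if numeric || dates
    stats[:min] = values.min
    stats[:max] = values.max
  end

  # Numeric fields additionally report sum, mean and standard deviation.
  if numeric
    sum = values.sum
    mean = sum.to_f / values.size
    # Assumption: population (not sample) standard deviation.
    variance = values.sum { |v| (v - mean)**2 } / values.size
    stats[:sum] = sum
    stats[:mean] = mean
    stats[:stddev] = Math.sqrt(variance)
  end

  # Keeping every distinct value costs memory, but it makes the
  # frequencies and the median available.
  if cardinality
    stats[:frequencies] = values.tally
    if numeric || dates
      sorted = values.sort
      # Assumption: lower median for an even number of values.
      stats[:median] = sorted[(sorted.size - 1) / 2]
    end
  end

  stats
end
```

For example, `field_stats_sketch([1, 2, 3, 4])` yields count, min, max, sum, mean and stddev, while `field_stats_sketch(%w[a b a])` yields only the count unless `cardinality: true` is passed, in which case the frequencies of the distinct values are included as well.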