RedAmber
A simple dataframe library for Ruby (experimental)
Requirements
gem 'red-arrow', '>= 7.0.0'
gem 'red-parquet', '>= 7.0.0' # if you use IO from/to parquet
gem 'rover-df', '~> 0.3.0' # if you use IO from/to Rover::DataFrame
Installation
Add this line to your Gemfile:
gem 'red_amber'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install red_amber
RedAmber::DataFrame
Constructors and saving
- [x] new from a columnar Hash
  RedAmber::DataFrame.new(x: [1, 2, 3])
- [x] new from a schema (by Hash) and rows (by Array)
  RedAmber::DataFrame.new({:x=>:uint8}, [[1], [2], [3]])
- [x] new from an Arrow::Table
  RedAmber::DataFrame.new(Arrow::Table.new(x: [1, 2, 3]))
- [x] new from a Rover::DataFrame
  RedAmber::DataFrame.new(Rover::DataFrame.new(x: [1, 2, 3]))
- [ ] load (class method)
  - [x] from a [.arrow, .arrows, .csv, .csv.gz, .tsv] file
    RedAmber::DataFrame.load("test/entity/with_header.csv")
  - [x] from a string buffer
  - [x] from a URI
    RedAmber::DataFrame.load(URI("https://github.com/heronshoes/red_amber/blob/master/test/entity/with_header.csv"))
  - [ ] from a parquet file
- [ ] save (instance method)
  - [x] to a [.arrow, .arrows, .csv, .csv.gz, .tsv] file
  - [x] to a string buffer
  - [x] to a URI
  - [ ] to a parquet file
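The columnar-Hash constructor and the schema-plus-rows constructor describe the same data in two layouts. The following is a minimal plain-Ruby sketch of that correspondence. It does not call RedAmber, and `rows_to_columns` is a hypothetical helper introduced only for illustration.

```ruby
# Plain-Ruby illustration only; RedAmber is not required here.
# `rows_to_columns` is a hypothetical helper, not part of the library.
def rows_to_columns(schema, rows)
  # Transposing turns an Array of rows into an Array of columns.
  columns = rows.transpose
  # Pair each schema key with its column, in schema order.
  schema.keys.zip(columns).to_h
end

schema = { x: :uint8, y: :string }
rows = [[1, 'A'], [2, 'B'], [3, 'C']]

columnar = rows_to_columns(schema, rows)
p columnar
```

The resulting Hash has one key per column, which is the layout the columnar-Hash form of new takes.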
Properties
- [x] table
  Reader of the Arrow::Table object inside.
- [x] n_rows, nrow, size, length
  Returns the number of rows (data size).
- [x] n_columns, ncol, width
  Returns the number of columns (number of vectors).
- [x] shape
  Returns the shape in an Array [n_rows, n_cols].
- [x] column_names, keys
  Returns the column names in an Array.
- [x] types
  Returns the types of the columns in an Array of Symbols.
- [x] data_types
  Returns the types of the columns in an Array of Arrow::DataType.
- [x] vectors
  Returns an Array of Vectors.
- [x] to_h
  Returns column-oriented data in a Hash.
- [x] to_a, raw_records
  Returns an Array of row-oriented data without a header. If you need a column-oriented full Array, use .to_h.to_a.
- [x] schema
  Returns column names and data types in a Hash.
- [x] ==
- [x] empty?
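The difference between row-oriented and column-oriented output can be sketched with a plain Hash of Arrays. This example does not call RedAmber. It only shows the two layouts that to_a and .to_h.to_a are described as returning, plus the [n_rows, n_cols] shape convention.

```ruby
# Plain-Ruby illustration only; RedAmber is not required here.
columns = { a: [1, 2, 3], b: %w[A B C] }

# Column-oriented Array of [key, values] pairs, as Hash#to_a produces.
column_oriented = columns.to_a

# Row-oriented records without a header: one Array per row.
row_oriented = columns.values.transpose

# Shape as [number of rows, number of columns].
shape = [row_oriented.size, columns.size]

p column_oriented
p row_oriented
p shape
```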
Output
- [x] to_s
- [ ] summary, describe
- [x] to_rover
  Returns a Rover::DataFrame.
- [x] inspect(tally_level: 5, max_element: 5)
  Shows some information about self.
hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
RedAmber::DataFrame.new(hash)
# =>
RedAmber::DataFrame : 3 observations(rows) of 3 variables(columns)
Variables : 2 numeric, 1 string
# key type level data_preview
1 :a uint8 3 [1, 2, 3]
2 :b string 3 [A, B, C]
3 :c double 3 [1.0, 2.0, 3.0]
- tally_level: maximum level at which tally mode is used
- max_element: maximum number of elements shown in each row
Selecting
- [x] Select columns by [] as [key], [keys], [keys[index]]
  - Key in a Symbol: df[:symbol]
  - Key in a String: df["string"]
  - Keys in an Array: df[:symbol1, "string", :symbol2]
  - Keys by indices: df[df.keys[0]], df[df.keys[1,2]], df[df.keys[1..]]
  - Keys in a Range:
    An endless Range can be used to represent keys.

        hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
        df = RedAmber::DataFrame.new(hash)
        df[:b..:c, "a"]
        # =>
        RedAmber::DataFrame : 3 observations(rows) of 3 variables(columns)
        Variables : 2 numeric, 1 string
        # key type level data_preview
        1 :b string 3 [A, B, C]
        2 :c double 3 [1.0, 2.0, 3.0]
        3 :a uint8 3 [1, 2, 3]
- [x] Select rows by [] as [index], [range], [array]
  - Select a row by index: df[0]
  - Select rows by indices in a Range: df[1..2]
  - Select rows by indices in an Array: df[1, 2]
  - Mixed case: df[2, 0..]
- [x] Select rows from top or bottom
  head(n=5), tail(n=5), first(n=1), last(n=1)
- [ ] slice
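One way to read the selection rules above is that every argument to [] is expanded into a flat, ordered list of column keys or row indices. The sketch below illustrates that reading in plain Ruby. `resolve_keys` and `resolve_rows` are hypothetical helpers, and whether the library resolves selections exactly this way is an assumption.

```ruby
# Plain-Ruby illustration only; RedAmber is not required here.
# `resolve_keys` and `resolve_rows` are hypothetical helpers.

# Expand Symbols, Strings, and Ranges of keys into a flat list of Symbol keys.
def resolve_keys(all_keys, *args)
  args.flat_map do |arg|
    if arg.is_a?(Range)
      # A missing begin or end (beginless or endless Range) means "from the
      # first key" or "to the last key".
      first = arg.begin ? all_keys.index(arg.begin.to_sym) : 0
      last = arg.end ? all_keys.index(arg.end.to_sym) : all_keys.size - 1
      last -= 1 if arg.end && arg.exclude_end?
      all_keys[first..last]
    else
      [arg.to_sym]
    end
  end
end

# Expand Integers and Ranges of Integers into a flat list of row indices.
def resolve_rows(n_rows, *args)
  args.flat_map do |arg|
    arg.is_a?(Range) ? (0...n_rows).to_a[arg] : [arg]
  end
end

keys = %i[a b c]
p resolve_keys(keys, :b..:c, 'a') # [:b, :c, :a]
p resolve_keys(keys, (:b..))      # [:b, :c]
p resolve_rows(3, 2, (0..))       # [2, 0, 1, 2]
```

Under this reading, the order of the arguments decides the order of the result, which agrees with the df[:b..:c, "a"] output shown earlier, where :a comes last.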
Updating
[ ] Add a new column
[ ] Update a single element
[ ] Update multiple elements
[ ] Update all elements
[ ] Update elements matching a condition
[ ] Clamp
[ ] Delete columns
[ ] Rename a column
[ ] Sort rows
[ ] Clear data
Treat na data
[ ] Drop na (NaN, nil)
[ ] Replace na with value
[ ] Interpolate na with convolution array
Combining DataFrames
[ ] Add rows
[ ] Add columns
[ ] Inner join
[ ] Left join
Encoding
- [ ] One-hot encoding
Iteration (not implemented)
Filtering (not implemented)
RedAmber::Vector
Constructor
[x] Create from a column in a DataFrame
[x] New from an Array
Properties
- [x] to_s
- [x] values, to_a, entries
- [x] size, length, n_rows, nrow
- [x] type
- [x] data_type
- [ ] each
- [ ] chunked?
- [ ] n_chunks
- [ ] each_chunk
- [x] tally
- [ ] n_nulls
Functions
Unary aggregations: vector.func => Scalar
Method | Boolean | Numeric | String | Remarks |
---|---|---|---|---|
[x] all | [x] | | | |
[x] any | [x] | | | |
[x] approximate_median | | [x] | | |
[x] count | [x] | [x] | [x] | |
[x] count_distinct | [x] | [x] | [x] | |
[x] count_uniq | [x] | [x] | [x] | an alias of count_distinct |
[ ] index | | | | |
[x] max | [x] | [x] | [x] | |
[x] mean | [x] | [x] | | |
[x] min | [x] | [x] | [x] | |
[ ] min_max | | | | |
[ ] mode | | | | |
[x] product | [x] | [x] | | |
[ ] quantile | | | | |
[x] stddev | | [x] | | |
[x] sum | [x] | [x] | | |
[ ] tdigest | | | | |
[x] variance | | [x] | | |
Unary element-wise: vector.func => Vector
Method | Boolean | Numeric | String | Remarks |
---|---|---|---|---|
[x] `-@` | | [x] | | as `-vector` |
[x] negate | | [x] | | `-@` |
[x] abs | | [x] | | |
[ ] acos | | [ ] | | |
[ ] asin | | [ ] | | |
[x] atan | | [x] | | |
[ ] ceil | | [x] | | |
[x] cos | | [x] | | |
[ ] floor | | [x] | | |
[ ] ln | | [ ] | | |
[ ] log10 | | [ ] | | |
[ ] log1p | | [ ] | | |
[ ] log2 | | [ ] | | |
[x] sign | | [x] | | |
[x] sin | | [x] | | |
[x] tan | | [x] | | |
[ ] trunc | | [x] | | |
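The two tables above distinguish aggregations, which reduce a vector to a scalar, from element-wise functions, which return a new vector. They also note that negate is available as the unary minus operator. The sketch below shows both ideas with a hypothetical `TinyVector` class. It is a stand-in written for this example, not RedAmber::Vector.

```ruby
# Plain-Ruby illustration only; TinyVector is a hypothetical stand-in,
# not RedAmber::Vector.
class TinyVector
  attr_reader :values

  def initialize(values)
    @values = values
  end

  # Unary aggregation: reduces all elements to one scalar.
  def sum
    values.sum
  end

  # Unary element-wise: returns a new vector of the same length.
  def negate
    TinyVector.new(values.map { |v| -v })
  end

  # Defining the -@ method lets callers write -vector.
  alias_method :-@, :negate

  # Another element-wise function, applied to each value independently.
  def abs
    TinyVector.new(values.map(&:abs))
  end
end

vector = TinyVector.new([1, -2, 3])
p vector.sum       # a scalar
p (-vector).values # element-wise negation via -@
p vector.abs.values
```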
Binary element-wise: vector.func(vector) => Vector
Method | Boolean | Numeric | String | Remarks |
---|---|---|---|---|
[x] add | | [x] | | `+` |
[x] atan2 | | [x] | | |
[x] and | [x] | | | |
[x] and_kleene | [x] | | | |
[x] and_not | [x] | | | |
[x] and_not_kleene | [x] | | | |
[x] bit_wise_and | | ([x]) | | `&`, integer only |
[ ] bit_wise_not | | ([x]) | | `!`, integer only |
[x] bit_wise_or | | ([x]) | | `\|`, integer only |
[x] bit_wise_xor | | ([x]) | | `^`, integer only |
[x] divide | | [x] | | `/` |
[x] equal | [x] | [x] | [x] | `==`, alias eq |
[x] greater | [x] | [x] | [x] | `>`, alias gt |
[x] greater_equal | [x] | [x] | [x] | `>=`, alias ge |
[x] less | [x] | [x] | [x] | `<`, alias lt |
[x] less_equal | [x] | [x] | [x] | `<=`, alias le |
[ ] logb | | [ ] | | |
[ ] mod | | [ ] | | |
[x] multiply | | [x] | | `*` |
[x] not_equal | [x] | [x] | [x] | `!=`, alias ne |
[x] or | [x] | | | |
[x] or_kleene | [x] | | | |
[x] power | | [x] | | `**` |
[x] subtract | | [x] | | `-` |
[x] shift_left | | ([x]) | | `<<`, integer only |
[x] shift_right | | ([x]) | | `>>`, integer only |
[x] xor | [x] | | | |
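The table lists both plain and Kleene variants of the boolean functions. The difference shows up only when a value is null. The sketch below illustrates the semantics that Apache Arrow documents for these two families, using nil for null. It is plain Ruby, the helper names are hypothetical, and it assumes RedAmber passes the Arrow behavior through unchanged.

```ruby
# Plain-Ruby illustration only; RedAmber is not required here.
# nil stands in for a null (missing) value.

# Plain `and`: any null input makes the result null.
def and_strict(a, b)
  return nil if a.nil? || b.nil?

  a && b
end

# Kleene `and`: false wins over null, because the result is false
# whatever the missing value would have been.
def and_kleene(a, b)
  return false if a == false || b == false
  return nil if a.nil? || b.nil?

  true
end

# Kleene `or`: true wins over null for the same reason.
def or_kleene(a, b)
  return true if a == true || b == true
  return nil if a.nil? || b.nil?

  false
end

left = [true, false, nil]
right = [nil, nil, nil]

p left.zip(right).map { |a, b| and_strict(a, b) } # [nil, nil, nil]
p left.zip(right).map { |a, b| and_kleene(a, b) } # [nil, false, nil]
p left.zip(right).map { |a, b| or_kleene(a, b) }  # [true, nil, nil]
```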
(Not implemented)
- [ ] invert, round, round_to_multiple
- [ ] sort, sort_index
- [ ] minmax, var, median, quantile
- [ ] argmin, argmax
Coerce (not implemented)
Updating (not implemented)
DSL in a block for faster calculation?
Development
git clone https://github.com/heronshoes/red_amber.git
cd red_amber
bundle install
bundle exec rake test
License
The gem is available as open source under the terms of the MIT License.