Polars Ruby
:fire: Blazingly fast DataFrames for Ruby, powered by Polars
Installation
Add this line to your application’s Gemfile:
gem "polars-df"
Getting Started
This library follows the Polars Python API.
Polars.read_csv("iris.csv")
  .lazy
  .filter(Polars.col("sepal_length") > 5)
  .groupby("species")
  .agg(Polars.all.sum)
  .collect
You can follow Polars tutorials and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.
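To make the Getting Started pipeline concrete, here is a rough plain-Ruby sketch of what it computes: filter rows, group by species, then sum the remaining columns per group. It does not call the gem at all. The sample rows are made up (they stand in for iris.csv), and the real library works on columnar data and evaluates the query lazily, so treat this as an illustration of the semantics only.

```ruby
# Hypothetical sample rows standing in for iris.csv (values are invented).
rows = [
  { species: "setosa",     sepal_length: 5.1, sepal_width: 3.5 },
  { species: "setosa",     sepal_length: 4.9, sepal_width: 3.0 },
  { species: "versicolor", sepal_length: 7.0, sepal_width: 3.2 },
  { species: "versicolor", sepal_length: 6.4, sepal_width: 3.2 }
]

# filter: keep rows where sepal_length > 5
kept = rows.select { |r| r[:sepal_length] > 5 }

# groupby + agg(sum): sum every non-key column within each species
summed = kept.group_by { |r| r[:species] }.map do |species, group|
  totals = { species: species }
  (group.first.keys - [:species]).each do |col|
    totals[col] = group.sum { |r| r[col] }
  end
  totals
end

summed.each { |row| puts row.inspect }
```

Only one of the two setosa rows passes the filter, so the setosa totals come from a single row, while the versicolor totals combine both of its rows.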
Reference
Examples
Creating DataFrames
From a CSV
Polars.read_csv("file.csv")
# or lazily with
Polars.scan_csv("file.csv")
From Parquet
Polars.read_parquet("file.parquet")
From Active Record
Polars.read_sql(User.all)
# or
Polars.read_sql("SELECT * FROM users")
From a hash
Polars::DataFrame.new({
  a: [1, 2, 3],
  b: ["one", "two", "three"]
})
From an array of series
Polars::DataFrame.new([
  Polars::Series.new("a", [1, 2, 3]),
  Polars::Series.new("b", ["one", "two", "three"])
])
Attributes
Get number of rows
df.height
Get column names
df.columns
Check if a column exists
df.include?(name)
Selecting Data
Select a column
df["a"]
Select multiple columns
df[["a", "b"]]
Select first rows
df.head
Select last rows
df.tail
Filtering
Filter on a condition
df[Polars.col("a") == 2]
df[Polars.col("a") != 2]
df[Polars.col("a") > 2]
df[Polars.col("a") >= 2]
df[Polars.col("a") < 2]
df[Polars.col("a") <= 2]
And, or, and exclusive or
df[(Polars.col("a") > 1) & (Polars.col("b") == "two")] # and
df[(Polars.col("a") > 1) | (Polars.col("b") == "two")] # or
df[(Polars.col("a") > 1) ^ (Polars.col("b") == "two")] # xor
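The parentheses around each condition matter: in Ruby, `&`, `|`, and `^` bind more tightly than `>` and `==`, so leaving them out changes how the expression is parsed. As a rough illustration of which rows each combination keeps, here is a plain-Ruby sketch over an array of hashes. It does not use the gem, and the rows are hypothetical sample data mirroring the hash example above.

```ruby
# Hypothetical rows mirroring the columns "a" and "b" used in this README.
rows = [
  { a: 1, b: "one" },
  { a: 2, b: "two" },
  { a: 3, b: "three" }
]

# The two conditions, written as plain Ruby predicates.
gt_one = ->(r) { r[:a] > 1 }
is_two = ->(r) { r[:b] == "two" }

# Ruby booleans support &, | and ^ directly.
and_rows = rows.select { |r| gt_one.(r) & is_two.(r) }
or_rows  = rows.select { |r| gt_one.(r) | is_two.(r) }
xor_rows = rows.select { |r| gt_one.(r) ^ is_two.(r) }

puts and_rows.inspect
puts or_rows.inspect
puts xor_rows.inspect
```

With this data, "and" keeps only the row where a is 2, "or" keeps the rows where a is 2 or 3, and "xor" keeps only the row where a is 3, since that is the only row matching exactly one condition.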
Operations
Basic operations
df["a"] + 5
df["a"] - 5
df["a"] * 5
df["a"] / 5
df["a"] % 5
df["a"] ** 2
df["a"].sqrt
df["a"].abs
Rounding
df["a"].round(2)
df["a"].ceil
df["a"].floor
Logarithm
df["a"].log # natural log
df["a"].log(10)
Exponentiation
df["a"].exp
Trigonometric functions
df["a"].sin
df["a"].cos
df["a"].tan
df["a"].asin
df["a"].acos
df["a"].atan
Hyperbolic functions
df["a"].sinh
df["a"].cosh
df["a"].tanh
df["a"].asinh
df["a"].acosh
df["a"].atanh
Summary statistics
df["a"].sum
df["a"].mean
df["a"].median
df["a"].quantile(0.90)
df["a"].min
df["a"].max
df["a"].std
df["a"].var
Grouping
Group
df.groupby("a").count
Works with all summary statistics
df.groupby("a").max
Multiple groups
df.groupby(["a", "b"]).count
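Grouping on several columns treats each distinct combination of values as its own group. A minimal plain-Ruby sketch of that idea, using invented rows and no gem calls, might look like this. The shape of the result here (a hash keyed by value pairs) is only for illustration and is not what the library returns.

```ruby
# Hypothetical rows; the pair (a, b) forms the group key.
rows = [
  { a: 1, b: "one" },
  { a: 1, b: "one" },
  { a: 1, b: "two" },
  { a: 2, b: "two" }
]

# Count rows per distinct (a, b) combination.
counts = rows.group_by { |r| [r[:a], r[:b]] }.transform_values(&:size)

counts.each { |key, n| puts "#{key.inspect}: #{n}" }
```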
Combining Data Frames
Add rows
df.vstack(other_df)
Add columns
df.hstack(other_df)
Inner join
df.join(other_df, on: "a")
Left join
df.join(other_df, on: "a", how: "left")
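The difference between the two joins is what happens to rows on the left that have no match on the right: an inner join drops them, a left join keeps them and fills the missing columns with nulls. The sketch below shows this in plain Ruby over arrays of hashes. The data and the column `c` are hypothetical, and this is not how the library implements joins.

```ruby
# Hypothetical left and right tables sharing the key column "a".
left = [
  { a: 1, b: "one" },
  { a: 2, b: "two" },
  { a: 3, b: "three" }
]
right = [
  { a: 1, c: 10 },
  { a: 3, c: 30 }
]

# Index the right side by key so lookups are cheap.
index = right.group_by { |r| r[:a] }

# Inner join: only left rows with at least one match survive.
inner = left.flat_map do |l|
  (index[l[:a]] || []).map { |r| l.merge(r) }
end

# Left join: unmatched left rows are kept, with nil for the right-hand column.
left_join = left.flat_map do |l|
  matches = index[l[:a]]
  matches ? matches.map { |r| l.merge(r) } : [l.merge(c: nil)]
end

puts inner.inspect
puts left_join.inspect
```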
Encoding
One-hot encoding
df.to_dummies
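One-hot encoding replaces a column with one indicator column per distinct value. The following plain-Ruby sketch shows the idea for a single column. The column name `a`, the sample values, and the `a_<value>` naming of the output columns are assumptions made for illustration, so check the output of `to_dummies` in your installed version for the exact names and types.

```ruby
# Hypothetical values of a single column named "a".
values = ["x", "y", "x"]

# One indicator column per distinct value: 1 where the row matches, else 0.
dummies = values.uniq.to_h do |category|
  ["a_#{category}", values.map { |v| v == category ? 1 : 0 }]
end

puts dummies.inspect
```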
Conversion
Array of rows
df.rows
Hash of series
df.to_h
CSV
df.to_csv
# or
df.write_csv("file.csv")
Parquet
df.write_parquet("file.parquet")
Types
You can specify column types when creating a data frame:
Polars::DataFrame.new(data, columns: {"a" => Polars::Int32, "b" => Polars::Float32})
Supported types are:
- boolean - Boolean
- float - Float64, Float32
- integer - Int64, Int32, Int16, Int8
- unsigned integer - UInt64, UInt32, UInt16, UInt8
- string - Utf8, Categorical
- temporal - Date, Datetime, Time, Duration
Get column types
df.schema
For a specific column
df["a"].dtype
Cast a column
df["a"].cast(Polars::Int32)
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/polars-ruby.git
cd polars-ruby
bundle install
bundle exec rake compile
bundle exec rake test