Method: Polars::Expr#over

Defined in:
lib/polars/expr.rb

#over(partition_by = nil, *more_exprs, order_by: nil, descending: false, nulls_last: false, mapping_strategy: "group_to_rows") ⇒ Expr

Apply a window function over a subgroup.

This is similar to a group by + aggregation + self join, or to window functions in PostgreSQL.
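
The "group by + aggregation + self join" analogy can be sketched in plain Ruby, without any Polars calls. This is only an illustration of the idea, not how the library implements it; the `rows`, `max_by_group`, and `windowed` names are made up for the sketch, and the data mirrors the first example below.

```ruby
# Plain-Ruby sketch of "group by + aggregation + self join".
# Hypothetical names; no Polars API is used here.
rows = [
  { "groups" => "g1", "values" => 1 },
  { "groups" => "g1", "values" => 2 },
  { "groups" => "g2", "values" => 3 }
]

# Group by the partition key, then aggregate each group to one value.
max_by_group = rows
  .group_by { |row| row["groups"] }
  .transform_values { |group| group.map { |row| row["values"] }.max }

# "Self join": attach the per-group aggregate back to every original row,
# so the number and order of rows stay the same.
windowed = rows.map do |row|
  row.merge("max_by_group" => max_by_group[row["groups"]])
end

p windowed.map { |row| row["max_by_group"] } # => [2, 2, 3]
```

Unlike a plain group by, the result keeps one output row per input row, which is what distinguishes a window function from an aggregation.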

Examples:

df = Polars::DataFrame.new(
  {
    "groups" => ["g1", "g1", "g2"],
    "values" => [1, 2, 3]
  }
)
df.with_columns(
  Polars.col("values").max.over("groups").alias("max_by_group")
)
# =>
# shape: (3, 3)
# ┌────────┬────────┬──────────────┐
# │ groups ┆ values ┆ max_by_group │
# │ ---    ┆ ---    ┆ ---          │
# │ str    ┆ i64    ┆ i64          │
# ╞════════╪════════╪══════════════╡
# │ g1     ┆ 1      ┆ 2            │
# │ g1     ┆ 2      ┆ 2            │
# │ g2     ┆ 3      ┆ 3            │
# └────────┴────────┴──────────────┘

df = Polars::DataFrame.new(
  {
    "groups" => [1, 1, 2, 2, 1, 2, 3, 3, 1],
    "values" => [1, 2, 3, 4, 5, 6, 7, 8, 8]
  }
)
df.lazy
  .select([Polars.col("groups").sum.over("groups")])
  .collect
# =>
# shape: (9, 1)
# ┌────────┐
# │ groups │
# │ ---    │
# │ i64    │
# ╞════════╡
# │ 4      │
# │ 4      │
# │ 6      │
# │ 6      │
# │ 4      │
# │ 6      │
# │ 6      │
# │ 6      │
# │ 4      │
# └────────┘

df = Polars::DataFrame.new(
  {
    "store_id" => ["a", "a", "b", "b"],
    "date" => [Date.new(2024, 9, 18), Date.new(2024, 9, 17), Date.new(2024, 9, 18), Date.new(2024, 9, 16)],
    "sales" => [7, 9, 8, 10]
  }
)
df.with_columns(
  cumulative_sales: Polars.col("sales").cum_sum.over("store_id", order_by: "date")
)
# =>
# shape: (4, 4)
# ┌──────────┬────────────┬───────┬──────────────────┐
# │ store_id ┆ date       ┆ sales ┆ cumulative_sales │
# │ ---      ┆ ---        ┆ ---   ┆ ---              │
# │ str      ┆ date       ┆ i64   ┆ i64              │
# ╞══════════╪════════════╪═══════╪══════════════════╡
# │ a        ┆ 2024-09-18 ┆ 7     ┆ 16               │
# │ a        ┆ 2024-09-17 ┆ 9     ┆ 9                │
# │ b        ┆ 2024-09-18 ┆ 8     ┆ 18               │
# │ b        ┆ 2024-09-16 ┆ 10    ┆ 10               │
# └──────────┴────────────┴───────┴──────────────────┘
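
The effect of order_by in the last example can be reproduced in plain Ruby: each partition is visited in date order, but every result is written back to the row's original position. This is a rough sketch for intuition only; the `rows` and `cumulative_sales` names are illustrative and nothing here reflects the library's internals.

```ruby
require "date"

# Same data as the store_id/date/sales example, as plain hashes.
rows = [
  { store_id: "a", date: Date.new(2024, 9, 18), sales: 7 },
  { store_id: "a", date: Date.new(2024, 9, 17), sales: 9 },
  { store_id: "b", date: Date.new(2024, 9, 18), sales: 8 },
  { store_id: "b", date: Date.new(2024, 9, 16), sales: 10 }
]

cumulative_sales = Array.new(rows.length)

# Partition the row indices by store, keeping original positions.
rows.each_index.group_by { |i| rows[i][:store_id] }.each_value do |indices|
  running_total = 0
  # Walk the partition in date order, but store each running total at the
  # row's original index so the overall row order is unchanged.
  indices.sort_by { |i| rows[i][:date] }.each do |i|
    running_total += rows[i][:sales]
    cumulative_sales[i] = running_total
  end
end

p cumulative_sales # => [16, 9, 18, 10]
```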

Parameters:

  • partition_by (defaults to: nil)

    Column(s) to group by. Accepts expression input. Strings are parsed as column names.

  • more_exprs

    Additional columns to group by, specified as positional arguments.

  • order_by (defaults to: nil)

    Order the window functions/aggregations within the partitioned groups by the result of the expression passed to order_by.

  • descending (defaults to: false)

    If order_by is given, whether to sort in descending rather than ascending order.

  • nulls_last (defaults to: false)

    If order_by is given, whether to place nulls last.

  • mapping_strategy (defaults to: "group_to_rows")
    • group_to_rows If the aggregation results in multiple values per group, map them back to their row position in the DataFrame. This is only possible if each group yields the same number of elements after aggregation as before. If the aggregation results in one scalar value per group, this value is mapped to every row of that group.
    • join If the aggregation may result in multiple values per group, join the values as a 'List' to each row position. Warning: this can be memory intensive. If the aggregation always results in one scalar value per group, join this value, with its scalar data type, to each row position.
    • explode If the aggregation may result in multiple values per group, map each value to a new row, similar to the result of group_by + agg + explode. If the aggregation always results in one scalar value per group, map this value to one row position. If the grouping columns are not part of the window operation's output, the groups must be sorted beforehand; otherwise the result does not line up with the other columns. This operation changes the number of rows.
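
The three mapping strategies can be contrasted with a small plain-Ruby sketch. It only mimics the shape of the results described above and does not call Polars. The variable names are hypothetical, and the sketch assumes groups are emitted in first-appearance order for explode.

```ruby
groups = ["a", "b", "a"]
values = [1, 2, 3]

# Collect the values of each partition (hash keeps first-appearance order).
by_group = values.each_index
                 .group_by { |i| groups[i] }
                 .transform_values { |idx| idx.map { |i| values[i] } }

# group_to_rows with a scalar aggregation (max): the single value per
# group is mapped to every row of that group.
group_to_rows = groups.map { |g| by_group[g].max }

# join: the whole per-group list is attached to every row of the group.
joined = groups.map { |g| by_group[g] }

# explode: per-group results are laid out one group after another.
# Taking only the first value per group shows the row count changing.
exploded = by_group.values.flat_map { |vals| vals.first(1) }

p group_to_rows # => [3, 2, 3]
p joined        # => [[1, 3], [2], [1, 3]]
p exploded      # => [1, 2]
```

Note how only the explode-style result has a different length from the input, which is why it cannot in general be placed next to the original columns.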

Returns:

  • (Expr)

# File 'lib/polars/expr.rb', line 2776

def over(partition_by = nil, *more_exprs, order_by: nil, descending: false, nulls_last: false, mapping_strategy: "group_to_rows")
  partition_by_rbexprs =
    if !partition_by.nil?
      Utils.parse_into_list_of_expressions(partition_by, *more_exprs)
    else
      nil
    end

  order_by_rbexprs = !order_by.nil? ? Utils.parse_into_list_of_expressions(order_by) : nil

  wrap_expr(_rbexpr.over(partition_by_rbexprs, order_by_rbexprs, descending, nulls_last, mapping_strategy))
end