Class: Polars::DynamicGroupBy
- Inherits: Object
  - Object
  - Polars::DynamicGroupBy
- Defined in: lib/polars/dynamic_group_by.rb
Overview
A dynamic grouper.
It provides an #agg method that lets you run any Polars expression in a
group-by context.
Instance Method Summary
- #agg(*aggs, **named_aggs) ⇒ DataFrame
  Compute aggregations for each group of a group by operation.
- #having(*predicates) ⇒ DynamicGroupBy
  Filter groups with a list of predicates after aggregation.
- #map_groups(schema, &function) ⇒ DataFrame
  Apply a custom/user-defined function (UDF) over the groups as a new DataFrame.
Instance Method Details
#agg(*aggs, **named_aggs) ⇒ DataFrame
Compute aggregations for each group of a group by operation.
# File 'lib/polars/dynamic_group_by.rb', line 78

def agg(*aggs, **named_aggs)
  group_by = @df.lazy.group_by_dynamic(
    @time_column,
    every: @every,
    period: @period,
    offset: @offset,
    include_boundaries: @include_boundaries,
    closed: @closed,
    label: @label,
    group_by: @group_by,
    start_by: @start_by
  )

  if @predicates&.any?
    group_by = group_by.having(@predicates)
  end

  group_by.agg(*aggs, **named_aggs).collect(
    optimizations: QueryOptFlags.none
  )
end
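To make the idea of aggregating over dynamic windows concrete, here is a minimal plain-Ruby sketch. It is not the Polars implementation and does not call the library. It assumes an integer index column, a period equal to `every`, no offset, and left-closed windows. The names `Row`, `dynamic_windows`, and `aggregate` are hypothetical and exist only for this illustration.

```ruby
# Simplified, plain-Ruby model of dynamic windowing (not the Polars implementation).
# Assumptions: integer index, period == every, offset == 0, left-closed windows.
Row = Struct.new(:index, :value)

# Group rows into windows of width `every`, keyed by each window's lower bound.
def dynamic_windows(rows, every:)
  rows.sort_by(&:index).group_by { |row| (row.index / every) * every }
end

# Run one named aggregation per window, loosely mirroring agg(**named_aggs).
def aggregate(windows, **named_aggs)
  windows.map do |lower_bound, rows|
    values = rows.map(&:value)
    named_aggs.each_with_object({index: lower_bound}) do |(name, fn), out|
      out[name] = fn.call(values)
    end
  end
end

rows = [Row.new(0, 1.0), Row.new(1, 3.0), Row.new(2, 5.0), Row.new(5, 7.0)]
result = aggregate(
  dynamic_windows(rows, every: 2),
  total: ->(v) { v.sum },
  count: ->(v) { v.size }
)
# result holds one hash per non-empty window, keyed by :index, :total, :count.
```

In this model a window that contains no rows simply never appears in the result, and each named aggregation becomes one output column per window.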
#having(*predicates) ⇒ DynamicGroupBy
Filter groups with a list of predicates after aggregation.
Using this method is equivalent to adding the predicates to the aggregation and filtering afterwards.
This method can be chained and all conditions will be combined using &.
# File 'lib/polars/dynamic_group_by.rb', line 51

def having(*predicates)
  DynamicGroupBy.new(
    @df,
    @time_column,
    @every,
    @period,
    @offset,
    @include_boundaries,
    @closed,
    @label,
    @group_by,
    @start_by,
    Utils._chain_predicates(@predicates, predicates)
  )
end
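The chaining behavior described above (each call returns a new object, and all conditions are combined with a logical AND) can be modeled in plain Ruby. This is only a sketch: the `GroupFilter` class is hypothetical, and predicates are lambdas over already-aggregated rows, whereas Polars works with expressions.

```ruby
# Plain-Ruby model of chained `having` calls (illustrative; not the Polars implementation).
# Predicates here are lambdas over an aggregated row; Polars uses expressions instead.
class GroupFilter
  attr_reader :predicates

  def initialize(predicates = [])
    @predicates = predicates.freeze
  end

  # Return a new object carrying the old and new predicates; the receiver is unchanged.
  def having(*new_predicates)
    GroupFilter.new(@predicates + new_predicates)
  end

  # Keep only the aggregated rows that satisfy every predicate (logical AND).
  def apply(aggregated_rows)
    aggregated_rows.select { |row| @predicates.all? { |pred| pred.call(row) } }
  end
end

aggregated = [
  {key: "a", total: 10, count: 2},
  {key: "b", total: 3, count: 5},
  {key: "c", total: 12, count: 1}
]

base = GroupFilter.new
filtered = base
  .having(->(row) { row[:total] > 5 })
  .having(->(row) { row[:count] >= 2 })

kept = filtered.apply(aggregated)
```

Only the row for key "a" passes both conditions, and `base` still carries no predicates, which mirrors how `having` builds a new grouper instead of mutating the existing one.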
#map_groups(schema, &function) ⇒ DataFrame
Apply a custom/user-defined function (UDF) over the groups as a new DataFrame.
Using this is considered an anti-pattern as it will be very slow because:
- it forces the engine to materialize the whole DataFrames for the groups.
- it is not parallelized.
- it blocks optimizations, as the passed function (a Ruby block) is opaque to the optimizer.
The idiomatic way to apply custom functions over multiple columns is using:
Polars.struct([my_columns]).map_elements { |struct_series| ... }
# File 'lib/polars/dynamic_group_by.rb', line 120

def map_groups(
  schema,
  &function
)
  if @predicates&.any?
    msg = "cannot call `map_groups` when filtering groups with `having`"
    raise TypeError, msg
  end

  @df.lazy
    .group_by_dynamic(
      index_column: @time_column,
      every: @every,
      period: @period,
      offset: @offset,
      include_boundaries: @include_boundaries,
      closed: @closed,
      group_by: @group_by,
      start_by: @start_by
    )
    .map_groups(schema, &function)
    .collect(optimizations: QueryOptFlags.none)
end
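The source above shows two behaviors worth illustrating: the block is called once per group, and the call is rejected with a TypeError when `having` predicates are present. The following plain-Ruby sketch models both. It does not use Polars; `GroupedRows` and the mean-centering block are hypothetical, and groups are plain arrays instead of DataFrames.

```ruby
# Plain-Ruby model of the `map_groups` guard and per-group block call (illustrative only).
class GroupedRows
  def initialize(groups, predicates = [])
    @groups = groups
    @predicates = predicates
  end

  # Calls the block once per group, sequentially, and concatenates the results.
  def map_groups(&function)
    if @predicates.any?
      raise TypeError, "cannot call `map_groups` when filtering groups with `having`"
    end
    @groups.flat_map { |_key, rows| function.call(rows) }
  end
end

groups = {0 => [1.0, 3.0], 2 => [5.0]}

# Hypothetical UDF: center each group's values on the group mean.
centered = GroupedRows.new(groups).map_groups do |rows|
  mean = rows.sum / rows.size
  rows.map { |v| v - mean }
end

# With a predicate attached, the same call is rejected.
guarded = GroupedRows.new(groups, [->(_row) { true }])
error =
  begin
    guarded.map_groups { |rows| rows }
    nil
  rescue TypeError => e
    e
  end
```

Because the block runs one group at a time and its body is ordinary Ruby, nothing in this model could be parallelized or optimized, which is the same reason the real method is discouraged.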