Module: Polars::Selectors
Defined in: lib/polars/selectors.rb
Class Method Summary
- .all ⇒ Selector
  Select all columns.
- .alpha(ascii_only: false, ignore_spaces: false) ⇒ Selector
  Select all columns with alphabetic names (e.g. only letters).
- .alphanumeric(ascii_only: false, ignore_spaces: false) ⇒ Selector
  Select all columns with alphanumeric names (e.g. only letters and the digits 0-9).
- .array(inner = nil, width: nil) ⇒ Selector
  Select all array columns.
- .binary ⇒ Selector
  Select all binary columns.
- .boolean ⇒ Selector
  Select all boolean columns.
- .by_dtype(*dtypes) ⇒ Selector
  Select all columns matching the given dtypes.
- .by_index(*indices, require_all: true) ⇒ Selector
  Select all columns matching the given indices (or range objects).
- .by_name(*names, require_all: true) ⇒ Selector
  Select all columns matching the given names.
- .categorical ⇒ Selector
  Select all categorical columns.
- .contains(*substring) ⇒ Selector
  Select columns whose names contain the given literal substring(s).
- .date ⇒ Selector
  Select all date columns.
- .datetime(time_unit = nil, time_zone: ["*", nil]) ⇒ Selector
  Select all datetime columns, optionally filtering by time unit/zone.
- .decimal ⇒ Selector
  Select all decimal columns.
- .digit(ascii_only: false) ⇒ Selector
  Select all columns having names consisting only of digits.
- .duration(time_unit = nil) ⇒ Selector
  Select all duration columns, optionally filtering by time unit.
- .empty ⇒ Selector
  Select no columns.
- .ends_with(*suffix) ⇒ Selector
  Select columns that end with the given substring(s).
- .enum ⇒ Selector
  Select all enum columns.
- .exclude(columns, *more_columns) ⇒ Selector
  Select all columns except those matching the given columns, datatypes, or selectors.
- .first(strict: true) ⇒ Selector
  Select the first column in the current scope.
- .float ⇒ Selector
  Select all float columns.
- .integer ⇒ Selector
  Select all integer columns.
- .last(strict: true) ⇒ Selector
  Select the last column in the current scope.
- .list(inner = nil) ⇒ Selector
  Select all list columns.
- .matches(pattern) ⇒ Selector
  Select all columns that match the given regex pattern.
- .nested ⇒ Selector
  Select all nested columns.
- .numeric ⇒ Selector
  Select all numeric columns.
- .object ⇒ Selector
  Select all object columns.
- .signed_integer ⇒ Selector
  Select all signed integer columns.
- .starts_with(*prefix) ⇒ Selector
  Select columns that start with the given substring(s).
- .string(include_categorical: false) ⇒ Selector
  Select all String (and, optionally, Categorical) string columns.
- .struct ⇒ Selector
  Select all struct columns.
- .temporal ⇒ Selector
  Select all temporal columns.
- .time ⇒ Selector
  Select all time columns.
- .unsigned_integer ⇒ Selector
  Select all unsigned integer columns.
Class Method Details
.all ⇒ Selector
Select all columns.
# File 'lib/polars/selectors.rb', line 74
def self.all
  Selector._from_rbselector(RbSelector.all)
end
.alpha(ascii_only: false, ignore_spaces: false) ⇒ Selector
Select all columns with alphabetic names (e.g. only letters).
Note: Matching column names cannot contain any non-alphabetic characters. The
definition of "alphabetic" consists of all valid Unicode alphabetic
characters (\p{Alphabetic}) by default; this can be changed by setting
ascii_only: true.
# File 'lib/polars/selectors.rb', line 175
def self.alpha(ascii_only: false, ignore_spaces: false)
  # note that we need to supply a pattern compatible with the *rust* regex crate
  re_alpha = ascii_only ? "a-zA-Z" : "\\p{Alphabetic}"
  re_space = ignore_spaces ? " " : ""
  Selector._from_rbselector(RbSelector.matches("^[#{re_alpha}#{re_space}]+$"))
end
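As a rough illustration of how the two keyword arguments change the generated pattern, the sketch below rebuilds the same pattern string in plain Ruby and tries it against a few sample column names. The helper name `alpha_pattern` and the sample names are hypothetical, and Ruby's own `Regexp` stands in for the Rust regex crate that Polars uses, so edge cases may differ.

```ruby
# Hypothetical helper that mirrors the pattern string assembled in .alpha.
def alpha_pattern(ascii_only: false, ignore_spaces: false)
  re_alpha = ascii_only ? "a-zA-Z" : "\\p{Alphabetic}"
  re_space = ignore_spaces ? " " : ""
  "^[#{re_alpha}#{re_space}]+$"
end

# Sample column names (made up for this sketch).
names = ["name", "naïve", "first name", "col1"]

unicode = Regexp.new(alpha_pattern)
ascii   = Regexp.new(alpha_pattern(ascii_only: true))
spaced  = Regexp.new(alpha_pattern(ignore_spaces: true))

unicode_hits = names.select { |n| unicode.match?(n) } # letters from any script
ascii_hits   = names.select { |n| ascii.match?(n) }   # a-z and A-Z only
spaced_hits  = names.select { |n| spaced.match?(n) }  # letters plus spaces
```

In this sketch "col1" is rejected by every variant because it contains a digit, and "first name" is only accepted once `ignore_spaces: true` adds the space to the character class.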
.alphanumeric(ascii_only: false, ignore_spaces: false) ⇒ Selector
Select all columns with alphanumeric names (e.g. only letters and the digits 0-9).
Note: Matching column names cannot contain any characters other than letters and digits.
The definition of "alphanumeric" consists of all valid Unicode alphabetic
characters (\p{Alphabetic}) and digit characters (\d) by default; this
can be changed by setting ascii_only: true.
# File 'lib/polars/selectors.rb', line 262
def self.alphanumeric(ascii_only: false, ignore_spaces: false)
  # note that we need to supply patterns compatible with the *rust* regex crate
  re_alpha = ascii_only ? "a-zA-Z" : "\\p{Alphabetic}"
  re_digit = ascii_only ? "0-9" : "\\d"
  re_space = ignore_spaces ? " " : ""
  return Selector._from_rbselector(
    RbSelector.matches("^[#{re_alpha}#{re_digit}#{re_space}]+$")
  )
end
.array(inner = nil, width: nil) ⇒ Selector
Select all array columns.
Note: This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
# File 'lib/polars/selectors.rb', line 793
def self.array(inner = nil, width: nil)
  inner_s = !inner.nil? ? inner._rbselector : nil
  Selector._from_rbselector(RbSelector.array(inner_s, width))
end
.binary ⇒ Selector
Select all binary columns.
# File 'lib/polars/selectors.rb', line 295
def self.binary
  by_dtype([Binary])
end
.boolean ⇒ Selector
Select all boolean columns.
# File 'lib/polars/selectors.rb', line 347
def self.boolean
  by_dtype([Boolean])
end
.by_dtype(*dtypes) ⇒ Selector
Select all columns matching the given dtypes.
Group by string columns and sum the numeric columns:
df.group_by(Polars.cs.string).agg(Polars.cs.numeric.sum).sort("other")
# =>
# shape: (2, 2)
# ┌───────┬──────────┐
# │ other ┆ value    │
# │ ---   ┆ ---      │
# │ str   ┆ i64      │
# ╞═══════╪══════════╡
# │ bar   ┆ 5000555  │
# │ foo   ┆ -3265500 │
# └───────┴──────────┘
# File 'lib/polars/selectors.rb', line 402
def self.by_dtype(*dtypes)
  all_dtypes = []
  dtypes.each do |tp|
    if Utils.is_polars_dtype(tp) || tp.is_a?(Class)
      all_dtypes << tp
    elsif tp.is_a?(::Array)
      tp.each do |t|
        if !(Utils.is_polars_dtype(t) || t.is_a?(Class))
          msg = "invalid dtype: #{t.inspect}"
          raise TypeError, msg
        end
        all_dtypes << t
      end
    else
      msg = "invalid dtype: #{tp.inspect}"
      raise TypeError, msg
    end
  end

  Selector._by_dtype(all_dtypes)
end
.by_index(*indices, require_all: true) ⇒ Selector
Select all columns matching the given indices (or range objects).
Note: Matching columns are returned in the order in which their indexes appear in the selector, not the underlying schema order.
# File 'lib/polars/selectors.rb', line 500
def self.by_index(*indices, require_all: true)
  all_indices = []
  indices.each do |idx|
    if idx.is_a?(Enumerable)
      all_indices.concat(idx.to_a)
    elsif idx.is_a?(Integer)
      all_indices << idx
    else
      msg = "invalid index value: #{idx.inspect}"
      raise TypeError, msg
    end
  end

  Selector._from_rbselector(RbSelector.by_index(all_indices, require_all))
end
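To make the argument handling concrete, here is a minimal pure-Ruby sketch of the flattening step shown above. The helper name `flatten_indices` is hypothetical; it reproduces only the loop that collects indices and does not touch Polars itself.

```ruby
# Hypothetical helper mirroring the index-collection loop in .by_index.
def flatten_indices(*indices)
  all_indices = []
  indices.each do |idx|
    if idx.is_a?(Enumerable)
      # Ranges and arrays are expanded in place, preserving argument order.
      all_indices.concat(idx.to_a)
    elsif idx.is_a?(Integer)
      all_indices << idx
    else
      raise TypeError, "invalid index value: #{idx.inspect}"
    end
  end
  all_indices
end

mixed = flatten_indices(0, 2..4, [7])

# Anything that is neither an Integer nor an Enumerable is rejected.
error_message =
  begin
    flatten_indices("a")
    nil
  rescue TypeError => e
    e.message
  end
```

Because integers, ranges, and arrays all end up in one flat list, they can be mixed freely in a single call. Note that, as the code is shown, the elements inside an Enumerable argument are not type-checked at this stage.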
.by_name(*names, require_all: true) ⇒ Selector
Select all columns matching the given names.
Note: Matching columns are returned in the order in which they are declared in the selector, not the underlying schema order.
# File 'lib/polars/selectors.rb', line 577
def self.by_name(*names, require_all: true)
  all_names = []
  names.each do |nm|
    if nm.is_a?(::String)
      all_names << nm
    elsif nm.is_a?(::Array)
      nm.each do |n|
        if !n.is_a?(::String)
          msg = "invalid name: #{n.inspect}"
          raise TypeError, msg
        end
        all_names << n
      end
    else
      msg = "invalid name: #{nm.inspect}"
      raise TypeError, msg
    end
  end

  Selector._by_name(all_names, strict: require_all, expand_patterns: false)
end
.categorical ⇒ Selector
Select all categorical columns.
# File 'lib/polars/selectors.rb', line 928
def self.categorical
  Selector._from_rbselector(RbSelector.categorical)
end
.contains(*substring) ⇒ Selector
Select columns whose names contain the given literal substring(s).
# File 'lib/polars/selectors.rb', line 987
def self.contains(*substring)
  escaped_substring = _re_string(substring)
  raw_params = "^.*#{escaped_substring}.*$"

  Selector._from_rbselector(RbSelector.matches(raw_params))
end
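The private `_re_string` helper is not shown in this section, so the sketch below substitutes a hypothetical stand-in, `re_string_sketch`, to illustrate why the substrings are treated as literals. It assumes the helper escapes each substring and joins the alternatives with `|` inside a group; the real implementation may differ, and Ruby's `Regexp.escape` is only an approximation of escaping for the Rust regex crate.

```ruby
# Hypothetical stand-in for the private _re_string helper (an assumption,
# not the library's actual code): escape each literal and join with "|".
def re_string_sketch(strings)
  "(" + strings.flatten.map { |s| Regexp.escape(s) }.join("|") + ")"
end

# Mirrors the way .contains wraps the escaped alternatives.
def contains_pattern(*substring)
  "^.*#{re_string_sketch(substring)}.*$"
end

single = contains_pattern("a.b")
multi  = contains_pattern("a.b", "x")

# The dot is matched literally, so "aXb" does not qualify.
literal_hit  = Regexp.new(single).match?("col_a.b_1")
literal_miss = Regexp.new(single).match?("col_aXb_1")
```

Under this assumption, regex metacharacters in a substring never act as wildcards, which is what separates `.contains` from `.matches`.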
.date ⇒ Selector
Select all date columns.
# File 'lib/polars/selectors.rb', line 1031
def self.date
  by_dtype([Date])
end
.datetime(time_unit = nil, time_zone: ["*", nil]) ⇒ Selector
Select all datetime columns, optionally filtering by time unit/zone.
# File 'lib/polars/selectors.rb', line 1047
def self.datetime(time_unit = nil, time_zone: ["*", nil])
  if time_unit.nil?
    time_unit_lst = ["ms", "us", "ns"]
  else
    time_unit_lst = time_unit.is_a?(::String) ? [time_unit] : time_unit.to_a
  end

  if time_zone.nil?
    time_zone_lst = [nil]
  elsif time_zone
    # TODO improve
    time_zone_lst = time_zone.to_a
  end

  Selector._from_rbselector(RbSelector.datetime(time_unit_lst, time_zone_lst))
end
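The following pure-Ruby sketch isolates the argument normalization shown above so its effect on different inputs is easy to see. The helper name `normalize_datetime_args` is hypothetical, and the sketch only reproduces the two conditionals; it does not call into Polars.

```ruby
# Hypothetical helper mirroring the argument normalization in .datetime.
def normalize_datetime_args(time_unit = nil, time_zone: ["*", nil])
  time_unit_lst =
    if time_unit.nil?
      ["ms", "us", "ns"] # no unit given: accept every supported unit
    else
      time_unit.is_a?(String) ? [time_unit] : time_unit.to_a
    end

  time_zone_lst = nil
  if time_zone.nil?
    time_zone_lst = [nil]
  elsif time_zone
    time_zone_lst = time_zone.to_a
  end

  [time_unit_lst, time_zone_lst]
end

defaults   = normalize_datetime_args
one_unit   = normalize_datetime_args("ms", time_zone: nil)
explicit   = normalize_datetime_args(["ms", "ns"], time_zone: ["UTC"])

# A bare String has no #to_a, so this branch raises as the code is written.
bare_string_error =
  begin
    normalize_datetime_args(time_zone: "UTC")
    nil
  rescue NoMethodError => e
    e.class
  end
```

One consequence worth noting: a single time unit may be passed as a plain String, but in the code as shown a time zone has to be wrapped in an Array (for example `["UTC"]`), since `String` does not respond to `to_a`.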
.decimal ⇒ Selector
Select all decimal columns.
# File 'lib/polars/selectors.rb', line 1104
def self.decimal
  # TODO: allow explicit selection by scale/precision?
  Selector._from_rbselector(RbSelector.decimal)
end
.digit(ascii_only: false) ⇒ Selector
Select all columns having names consisting only of digits.
Note: Matching column names cannot contain any non-digit characters. The
definition of "digit" consists of all valid Unicode digit characters (\d)
by default; this can be changed by setting ascii_only: true.
# File 'lib/polars/selectors.rb', line 1192
def self.digit(ascii_only: false)
  re_digit = ascii_only ? "[0-9]" : "\\d"
  Selector._from_rbselector(RbSelector.matches("^#{re_digit}+$"))
end
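The difference between the two modes can be illustrated in plain Ruby, with one caveat: Ruby's own `\d` is ASCII-only, whereas `\d` in the Rust regex crate is Unicode-aware and corresponds to `\p{Nd}`. The sketch below therefore uses `\p{Nd}` as a stand-in for the default mode. The sample names are made up, and `\A`/`\z` are used instead of `^`/`$` because Ruby treats the latter as line anchors.

```ruby
# Sample column names: ASCII digits, Arabic-Indic digits, and a mixed name.
arabic_indic = "\u0662\u0660\u0662\u0664"
names = ["2024", arabic_indic, "20x4"]

ascii_digits   = /\A[0-9]+\z/   # approximates ascii_only: true
unicode_digits = /\A\p{Nd}+\z/  # approximates the default (Rust's Unicode \d)

ascii_hits   = names.select { |n| ascii_digits.match?(n) }
unicode_hits = names.select { |n| unicode_digits.match?(n) }
```

Here the Arabic-Indic name is accepted only by the Unicode variant, which is the behaviour `ascii_only: true` is meant to switch off.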
.duration(time_unit = nil) ⇒ Selector
Select all duration columns, optionally filtering by time unit.
# File 'lib/polars/selectors.rb', line 1204
def self.duration(time_unit = nil)
  if time_unit.nil?
    time_unit = ["ms", "us", "ns"]
  else
    time_unit = time_unit.is_a?(::String) ? [time_unit] : time_unit.to_a
  end

  Selector._from_rbselector(RbSelector.duration(time_unit))
end
.empty ⇒ Selector
Select no columns.
This is useful for composition with other selectors.
# File 'lib/polars/selectors.rb', line 34
def self.empty
  Selector._from_rbselector(RbSelector.empty)
end
.ends_with(*suffix) ⇒ Selector
Select columns that end with the given substring(s).
# File 'lib/polars/selectors.rb', line 1269
def self.ends_with(*suffix)
  escaped_suffix = _re_string(suffix)
  raw_params = "^.*#{escaped_suffix}$"

  Selector._from_rbselector(RbSelector.matches(raw_params))
end
.enum ⇒ Selector
Select all enum columns.
Note: This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
# File 'lib/polars/selectors.rb', line 640
def self.enum
  Selector._from_rbselector(RbSelector.enum_)
end
.exclude(columns, *more_columns) ⇒ Selector
Select all columns except those matching the given columns, datatypes, or selectors.
Note: When excluding a single selector, it is simpler to write ~selector instead.
# File 'lib/polars/selectors.rb', line 1324
def self.exclude(columns, *more_columns)
  ~_combine_as_selector(columns, *more_columns)
end
.first(strict: true) ⇒ Selector
Select the first column in the current scope.
# File 'lib/polars/selectors.rb', line 1367
def self.first(strict: true)
  Selector._from_rbselector(RbSelector.first(strict))
end
.float ⇒ Selector
Select all float columns.
# File 'lib/polars/selectors.rb', line 1411
def self.float
  Selector._from_rbselector(RbSelector.float)
end
.integer ⇒ Selector
Select all integer columns.
# File 'lib/polars/selectors.rb', line 1454
def self.integer
  Selector._from_rbselector(RbSelector.integer)
end
.last(strict: true) ⇒ Selector
Select the last column in the current scope.
# File 'lib/polars/selectors.rb', line 1611
def self.last(strict: true)
  Selector._from_rbselector(RbSelector.last(strict))
end
.list(inner = nil) ⇒ Selector
Select all list columns.
Note: This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
# File 'lib/polars/selectors.rb', line 705
def self.list(inner = nil)
  inner_s = !inner.nil? ? inner._rbselector : nil
  Selector._from_rbselector(RbSelector.list(inner_s))
end
.matches(pattern) ⇒ Selector
Select all columns that match the given regex pattern.
# File 'lib/polars/selectors.rb', line 1655
def self.matches(pattern)
  if pattern == ".*"
    all
  else
    if pattern.start_with?(".*")
      pattern = pattern[2..]
    elsif pattern.end_with?(".*")
      pattern = pattern[..-3]
    end

    pfx = !pattern.start_with?("^") ? "^.*" : ""
    sfx = !pattern.end_with?("$") ? ".*$" : ""
    raw_params = "#{pfx}#{pattern}#{sfx}"

    Selector._from_rbselector(RbSelector.matches(raw_params))
  end
end
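To show what the normalization above does to a user-supplied pattern, the sketch below lifts the string handling into a standalone function. The name `normalize_match_pattern` is hypothetical, and the symbol `:all` merely stands in for the `all` selector that the real method returns for the pattern ".*".

```ruby
# Hypothetical helper mirroring the pattern normalization in .matches.
def normalize_match_pattern(pattern)
  return :all if pattern == ".*" # stand-in for the `all` selector

  # Drop a redundant leading or trailing ".*" before re-anchoring.
  if pattern.start_with?(".*")
    pattern = pattern[2..]
  elsif pattern.end_with?(".*")
    pattern = pattern[..-3]
  end

  # Anchor both ends unless the caller already did.
  pfx = pattern.start_with?("^") ? "" : "^.*"
  sfx = pattern.end_with?("$") ? "" : ".*$"
  "#{pfx}#{pattern}#{sfx}"
end

inputs = ["^id", "_total$", "date", ".*date", "date.*", "^x$"]
normalized = inputs.to_h { |p| [p, normalize_match_pattern(p)] }
```

The upshot is that an unanchored pattern such as "date" behaves as a "contains" search over the column name, while an explicit `^` or `$` from the caller is respected on that side.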
.nested ⇒ Selector
Select all nested columns.
A nested column is a list, array or struct.
Note: This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
# File 'lib/polars/selectors.rb', line 885
def self.nested
  Selector._from_rbselector(RbSelector.nested)
end
.numeric ⇒ Selector
Select all numeric columns.
# File 'lib/polars/selectors.rb', line 1713
def self.numeric
  Selector._from_rbselector(RbSelector.numeric)
end
.object ⇒ Selector
Select all object columns.
# File 'lib/polars/selectors.rb', line 1732
def self.object
  Selector._from_rbselector(RbSelector.object)
end
.signed_integer ⇒ Selector
Select all signed integer columns.
# File 'lib/polars/selectors.rb', line 1511
def self.signed_integer
  Selector._from_rbselector(RbSelector.signed_integer)
end
.starts_with(*prefix) ⇒ Selector
Select columns that start with the given substring(s).
# File 'lib/polars/selectors.rb', line 1791
def self.starts_with(*prefix)
  escaped_prefix = _re_string(prefix)
  raw_params = "^#{escaped_prefix}.*$"

  Selector._from_rbselector(RbSelector.matches(raw_params))
end
.string(include_categorical: false) ⇒ Selector
Select all String (and, optionally, Categorical) string columns.
df.group_by(Polars.cs.string).agg(Polars.cs.numeric.sum).sort(Polars.cs.string)
shape: (2, 3)
┌─────┬─────┬─────┐
│ w   ┆ x   ┆ y   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞═════╪═════╪═════╡
│ xx  ┆ 0   ┆ 2.0 │
│ yy  ┆ 6   ┆ 7.0 │
└─────┴─────┴─────┘
# File 'lib/polars/selectors.rb', line 1841
def self.string(include_categorical: false)
  string_dtypes = [String]
  if include_categorical
    string_dtypes << Categorical
  end

  by_dtype(string_dtypes)
end
.struct ⇒ Selector
Select all struct columns.
Note: This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
# File 'lib/polars/selectors.rb', line 838
def self.struct
  Selector._from_rbselector(RbSelector.struct_)
end
.temporal ⇒ Selector
Select all temporal columns.
# File 'lib/polars/selectors.rb', line 1900
def self.temporal
  Selector._from_rbselector(RbSelector.temporal)
end
.time ⇒ Selector
Select all time columns.
# File 'lib/polars/selectors.rb', line 1943
def self.time
  by_dtype([Time])
end
.unsigned_integer ⇒ Selector
Select all unsigned integer columns.
# File 'lib/polars/selectors.rb', line 1568
def self.unsigned_integer
  Selector._from_rbselector(RbSelector.unsigned_integer)
end