Class: Polars::StringExpr
Inherits: Object
- Object
- Polars::StringExpr
Defined in: lib/polars/string_expr.rb
Overview
Namespace for string related expressions.
Instance Method Summary
-
#contains(pattern, literal: false, strict: true) ⇒ Expr
Check if string contains a substring that matches a regex.
-
#contains_any(patterns, ascii_case_insensitive: false) ⇒ Expr
Use the aho-corasick algorithm to find matches.
-
#count_matches(pattern, literal: false) ⇒ Expr
(also: #count_match)
Count all successive non-overlapping regex matches.
-
#decode(encoding, strict: true) ⇒ Expr
Decode a value using the provided encoding.
-
#encode(encoding) ⇒ Expr
Encode a value using the provided encoding.
-
#ends_with(sub) ⇒ Expr
Check if string values end with a substring.
-
#extract(pattern, group_index: 1) ⇒ Expr
Extract the target capture group from provided patterns.
-
#extract_all(pattern) ⇒ Expr
Extracts all matches for the given regex pattern.
-
#extract_groups(pattern) ⇒ Expr
Extract all capture groups for the given regex pattern.
-
#join(delimiter = "-", ignore_nulls: true) ⇒ Expr
(also: #concat)
Vertically concatenate the values in the Series to a single string value.
-
#json_decode(dtype = nil, infer_schema_length: 100) ⇒ Expr
(also: #json_extract)
Parse string values as JSON.
-
#json_path_match(json_path) ⇒ Expr
Extract the first match of json string with provided JSONPath expression.
-
#len_bytes ⇒ Expr
(also: #lengths)
Get length of the strings as :u32 (as number of bytes).
-
#len_chars ⇒ Expr
(also: #n_chars)
Get length of the strings as :u32 (as number of chars).
-
#pad_end(length, fill_char = " ") ⇒ Expr
(also: #ljust)
Pad the end of the string until it reaches the given length.
-
#pad_start(length, fill_char = " ") ⇒ Expr
(also: #rjust)
Pad the start of the string until it reaches the given length.
-
#parse_int(radix = 2, strict: true) ⇒ Expr
Parse integers with base radix from strings.
-
#replace(pattern, value, literal: false, n: 1) ⇒ Expr
Replace first matching regex/literal substring with a new string value.
-
#replace_all(pattern, value, literal: false) ⇒ Expr
Replace all matching regex/literal substrings with a new string value.
-
#replace_many(patterns, replace_with, ascii_case_insensitive: false) ⇒ Expr
Use the aho-corasick algorithm to replace many matches.
-
#reverse ⇒ Expr
Returns string values in reversed order.
-
#slice(offset, length = nil) ⇒ Expr
Create subslices of the string values of a Utf8 Series.
-
#split(by, inclusive: false) ⇒ Expr
Split the string by a substring.
-
#split_exact(by, n, inclusive: false) ⇒ Expr
Split the string by a substring using n splits.
-
#splitn(by, n) ⇒ Expr
Split the string by a substring, restricted to returning at most n items.
-
#starts_with(sub) ⇒ Expr
Check if string values start with a substring.
-
#strip_chars(characters = nil) ⇒ Expr
(also: #strip)
Remove leading and trailing whitespace.
-
#strip_chars_end(characters = nil) ⇒ Expr
(also: #rstrip)
Remove trailing whitespace.
-
#strip_chars_start(characters = nil) ⇒ Expr
(also: #lstrip)
Remove leading whitespace.
-
#strip_prefix(prefix) ⇒ Expr
Remove prefix.
-
#strip_suffix(suffix) ⇒ Expr
Remove suffix.
-
#strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false) ⇒ Expr
Parse a Utf8 expression to a Date/Datetime/Time type.
-
#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Expr
Convert a Utf8 column into a Date column.
-
#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Expr
Convert a Utf8 column into a Datetime column.
-
#to_decimal(inference_length = 100) ⇒ Expr
Convert a String column into a Decimal column.
-
#to_integer(base: 10, strict: true) ⇒ Expr
Convert a Utf8 column into an Int64 column with base radix.
-
#to_lowercase ⇒ Expr
Transform to lowercase variant.
-
#to_time(format = nil, strict: true, cache: true) ⇒ Expr
Convert a Utf8 column into a Time column.
-
#to_titlecase ⇒ Expr
Transform to titlecase variant.
-
#to_uppercase ⇒ Expr
Transform to uppercase variant.
-
#zfill(length) ⇒ Expr
Fills the string with zeroes.
Instance Method Details
#contains(pattern, literal: false, strict: true) ⇒ Expr
Check if string contains a substring that matches a regex.
# File 'lib/polars/string_expr.rb', line 691
def contains(pattern, literal: false, strict: true)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_contains(pattern, literal, strict))
end
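The literal flag decides whether pattern is read as a regex or as a plain substring. The following is a minimal plain-Ruby sketch of that distinction, not the Polars implementation: contains_sketch is a hypothetical helper, and Ruby's Regexp stands in for the regex engine Polars uses, whose syntax may differ in details.

```ruby
# Hypothetical helper (not part of Polars) modelling the `literal:` flag.
def contains_sketch(value, pattern, literal: false)
  if literal
    value.include?(pattern)            # plain substring test
  else
    value.match?(Regexp.new(pattern))  # regex test
  end
end

p contains_sketch("abc", "a.c")                 # => true  ("." matches any character)
p contains_sketch("abc", "a.c", literal: true)  # => false (no literal "a.c" inside "abc")
```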
#contains_any(patterns, ascii_case_insensitive: false) ⇒ Expr
Use the aho-corasick algorithm to find matches.
This version determines if any of the patterns find a match.
# File 'lib/polars/string_expr.rb', line 1405
def contains_any(patterns, ascii_case_insensitive: false)
  patterns = Utils.parse_into_expression(patterns, str_as_lit: false, list_as_series: true)
  Utils.wrap_expr(
    _rbexpr.str_contains_any(patterns, ascii_case_insensitive)
  )
end
#count_matches(pattern, literal: false) ⇒ Expr Also known as: count_match
Count all successive non-overlapping regex matches.
# File 'lib/polars/string_expr.rb', line 1059
def count_matches(pattern, literal: false)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_count_matches(pattern, literal))
end
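"Successive non-overlapping" means that each match starts where the previous one ended. A plain-Ruby sketch of that counting rule, using String#scan as a stand-in (count_matches_sketch is a hypothetical name, and Ruby's regex syntax is assumed here rather than the engine Polars uses):

```ruby
# Hypothetical helper (not part of Polars): count non-overlapping matches.
def count_matches_sketch(value, pattern, literal: false)
  # With literal: true, regex metacharacters in the pattern are escaped first.
  regex = literal ? Regexp.new(Regexp.escape(pattern)) : Regexp.new(pattern)
  value.scan(regex).length
end

p count_matches_sketch("aaaa", "aa")                     # => 2 (not 3: matches do not overlap)
p count_matches_sketch("123 bla 45", "\\d")              # => 5
p count_matches_sketch("1.5 + 2.5", ".", literal: true)  # => 2
```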
#decode(encoding, strict: true) ⇒ Expr
Decode a value using the provided encoding.
# File 'lib/polars/string_expr.rb', line 873
def decode(encoding, strict: true)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_decode(strict))
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_decode(strict))
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end
#encode(encoding) ⇒ Expr
Encode a value using the provided encoding.
# File 'lib/polars/string_expr.rb', line 904
def encode(encoding)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_encode)
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_encode)
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end
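Both decode and encode accept exactly two encodings, "hex" and "base64". To show what those two transformations produce for a single value, here is a plain-Ruby sketch built on String#unpack1 and Array#pack; encode_sketch and decode_sketch are hypothetical helpers, not the Polars implementation.

```ruby
# Hypothetical helpers (not part of Polars) for the two supported encodings.
def encode_sketch(value, encoding)
  case encoding
  when "hex"    then value.unpack1("H*")  # bytes -> lowercase hex digits
  when "base64" then [value].pack("m0")   # bytes -> Base64 without line breaks
  else raise ArgumentError, "encoding must be 'hex' or 'base64', got #{encoding}"
  end
end

def decode_sketch(value, encoding)
  case encoding
  when "hex"    then [value].pack("H*")   # hex digits -> bytes
  when "base64" then value.unpack1("m0")  # strict Base64 -> bytes
  else raise ArgumentError, "encoding must be 'hex' or 'base64', got #{encoding}"
  end
end

p encode_sketch("foo", "hex")      # => "666f6f"
p encode_sketch("foo", "base64")   # => "Zm9v"
p decode_sketch("666f6f", "hex")   # => "foo"
```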
#ends_with(sub) ⇒ Expr
Check if string values end with a substring.
# File 'lib/polars/string_expr.rb', line 731
def ends_with(sub)
  sub = Utils.parse_into_expression(sub, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_ends_with(sub))
end
#extract(pattern, group_index: 1) ⇒ Expr
Extract the target capture group from provided patterns.
# File 'lib/polars/string_expr.rb', line 942
def extract(pattern, group_index: 1)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_extract(pattern, group_index))
end
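The group_index argument selects which capture group is returned. The plain-Ruby sketch below models that with String#[]; extract_sketch is a hypothetical helper, and two points are assumptions about the Polars behavior rather than facts stated on this page: that index 0 means the whole match, and that a non-matching value yields null.

```ruby
# Hypothetical helper (not part of Polars) modelling `group_index:`.
def extract_sketch(value, pattern, group_index: 1)
  # String#[] with a regex and an index returns that capture group, or nil.
  value[Regexp.new(pattern), group_index]
end

url = "http://example.com/?id=42&lang=en"
p extract_sketch(url, "id=(\\d+)")                  # => "42"
p extract_sketch(url, "id=(\\d+)", group_index: 0)  # => "id=42"
p extract_sketch(url, "missing=(\\d+)")             # => nil
```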
#extract_all(pattern) ⇒ Expr
Extracts all matches for the given regex pattern.
Extracts each successive non-overlapping regex match in an individual string as an array.
# File 'lib/polars/string_expr.rb', line 974
def extract_all(pattern)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_extract_all(pattern))
end
#extract_groups(pattern) ⇒ Expr
Extract all capture groups for the given regex pattern.
# File 'lib/polars/string_expr.rb', line 1031
def extract_groups(pattern)
  Utils.wrap_expr(_rbexpr.str_extract_groups(pattern))
end
#join(delimiter = "-", ignore_nulls: true) ⇒ Expr Also known as: concat
Vertically concatenate the values in the Series to a single string value.
# File 'lib/polars/string_expr.rb', line 357
def join(delimiter = "-", ignore_nulls: true)
  Utils.wrap_expr(_rbexpr.str_join(delimiter, ignore_nulls))
end
#json_decode(dtype = nil, infer_schema_length: 100) ⇒ Expr Also known as: json_extract
Parse string values as JSON.
Raises an error if an invalid JSON string is encountered.
# File 'lib/polars/string_expr.rb', line 803
def json_decode(dtype = nil, infer_schema_length: 100)
  if !dtype.nil?
    dtype = Utils.rb_type_to_dtype(dtype)
  end
  Utils.wrap_expr(_rbexpr.str_json_decode(dtype, infer_schema_length))
end
#json_path_match(json_path) ⇒ Expr
Extract the first match of json string with provided JSONPath expression.
Raises an error if an invalid JSON string is encountered. All return values are cast to Utf8 regardless of the original value.
See the JSONPath standard documentation for the path syntax.
# File 'lib/polars/string_expr.rb', line 842
def json_path_match(json_path)
  json_path = Utils.parse_into_expression(json_path, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_json_path_match(json_path))
end
#len_bytes ⇒ Expr Also known as: lengths
Note: The returned lengths are equal to the number of bytes in the UTF-8 string. If you need the length in terms of the number of characters, use n_chars instead.
Get length of the strings as :u32 (as number of bytes).
# File 'lib/polars/string_expr.rb', line 285
def len_bytes
  Utils.wrap_expr(_rbexpr.str_len_bytes)
end
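Byte length and character length only differ for text outside ASCII. A small plain-Ruby illustration of the two measures, using String#bytesize and String#length as stand-ins (the helper names are hypothetical, not Polars methods):

```ruby
# Hypothetical helpers (not part of Polars) contrasting the two length measures.
def len_bytes_sketch(value)
  value.bytesize   # number of bytes in the encoded string
end

def len_chars_sketch(value)
  value.length     # number of characters
end

text = "caf\u00e9"          # "café": the final character takes 2 bytes in UTF-8
p len_bytes_sketch(text)    # => 5
p len_chars_sketch(text)    # => 4
p len_bytes_sketch("cafe")  # => 4 (ASCII only: both measures agree)
```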
#len_chars ⇒ Expr Also known as: n_chars
Note: If you know that you are working with ASCII text, lengths will be equivalent, and faster (returns length in terms of the number of bytes).
Get length of the strings as :u32 (as number of chars).
# File 'lib/polars/string_expr.rb', line 318
def len_chars
  Utils.wrap_expr(_rbexpr.str_len_chars)
end
#pad_end(length, fill_char = " ") ⇒ Expr Also known as: ljust
Pad the end of the string until it reaches the given length.
# File 'lib/polars/string_expr.rb', line 622
def pad_end(length, fill_char = " ")
  Utils.wrap_expr(_rbexpr.str_pad_end(length, fill_char))
end
#pad_start(length, fill_char = " ") ⇒ Expr Also known as: rjust
Pad the start of the string until it reaches the given length.
# File 'lib/polars/string_expr.rb', line 592
def pad_start(length, fill_char = " ")
  Utils.wrap_expr(_rbexpr.str_pad_start(length, fill_char))
end
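The aliases ljust and rjust mirror Ruby's own String#ljust and String#rjust, which makes the padding direction easy to picture. A plain-Ruby sketch follows; the helper names are hypothetical, and leaving longer strings unchanged is an assumption about the Polars behavior, which matches what Ruby does.

```ruby
# Hypothetical helpers (not part of Polars) modelling the two padding directions.
def pad_start_sketch(value, length, fill_char = " ")
  value.rjust(length, fill_char)  # fill on the left, text ends up right-justified
end

def pad_end_sketch(value, length, fill_char = " ")
  value.ljust(length, fill_char)  # fill on the right, text ends up left-justified
end

p pad_start_sketch("cow", 6, "*")           # => "***cow"
p pad_end_sketch("cow", 6, "*")             # => "cow***"
p pad_start_sketch("hippopotamus", 6, "*")  # => "hippopotamus"
```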
#parse_int(radix = 2, strict: true) ⇒ Expr
Parse integers with base radix from strings.
The default base is 2. Parse errors and overflows become nulls. Note that, in the source shown here, the call to to_integer passes base: 2 rather than the radix argument.
# File 'lib/polars/string_expr.rb', line 1364
def parse_int(radix = 2, strict: true)
  to_integer(base: 2, strict: strict).cast(Int32, strict: strict)
end
#replace(pattern, value, literal: false, n: 1) ⇒ Expr
Replace first matching regex/literal substring with a new string value.
# File 'lib/polars/string_expr.rb', line 1198
def replace(pattern, value, literal: false, n: 1)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  value = Utils.parse_into_expression(value, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_n(pattern, value, literal, n))
end
#replace_all(pattern, value, literal: false) ⇒ Expr
Replace all matching regex/literal substrings with a new string value.
# File 'lib/polars/string_expr.rb', line 1228
def replace_all(pattern, value, literal: false)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  value = Utils.parse_into_expression(value, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_all(pattern, value, literal))
end
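The difference between replace (first match, with the default n: 1) and replace_all corresponds to Ruby's String#sub and String#gsub. The plain-Ruby sketch below models only those two cases; replace_sketch and its all: flag are hypothetical, and the syntax for capture-group references in the replacement value differs between Ruby and Polars, so the example sticks to plain replacement text.

```ruby
# Hypothetical helper (not part of Polars): first-match vs. all-matches replacement.
def replace_sketch(value, pattern, replacement, literal: false, all: false)
  # A String pattern is matched literally by sub/gsub; a Regexp is matched as a regex.
  matcher = literal ? pattern : Regexp.new(pattern)
  all ? value.gsub(matcher, replacement) : value.sub(matcher, replacement)
end

p replace_sketch("abcabc", "b", "X")                             # => "aXcabc"
p replace_sketch("abcabc", "b", "X", all: true)                  # => "aXcaXc"
p replace_sketch("a.c abc", ".", "-", literal: true, all: true)  # => "a-c abc"
```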
#replace_many(patterns, replace_with, ascii_case_insensitive: false) ⇒ Expr
Use the aho-corasick algorithm to replace many matches.
# File 'lib/polars/string_expr.rb', line 1476
def replace_many(patterns, replace_with, ascii_case_insensitive: false)
  patterns = Utils.parse_into_expression(patterns, str_as_lit: false, list_as_series: true)
  replace_with = Utils.parse_into_expression(
    replace_with, str_as_lit: true, list_as_series: true
  )
  Utils.wrap_expr(
    _rbexpr.str_replace_many(
      patterns, replace_with, ascii_case_insensitive
    )
  )
end
#reverse ⇒ Expr
Returns string values in reversed order.
# File 'lib/polars/string_expr.rb', line 1252
def reverse
  Utils.wrap_expr(_rbexpr.str_reverse)
end
#slice(offset, length = nil) ⇒ Expr
Create subslices of the string values of a Utf8 Series.
# File 'lib/polars/string_expr.rb', line 1283
def slice(offset, length = nil)
  offset = Utils.parse_into_expression(offset)
  length = Utils.parse_into_expression(length)
  Utils.wrap_expr(_rbexpr.str_slice(offset, length))
end
#split(by, inclusive: false) ⇒ Expr
Split the string by a substring.
# File 'lib/polars/string_expr.rb', line 1088
def split(by, inclusive: false)
  by = Utils.parse_into_expression(by, str_as_lit: true)
  if inclusive
    return Utils.wrap_expr(_rbexpr.str_split_inclusive(by))
  end
  Utils.wrap_expr(_rbexpr.str_split(by))
end
#split_exact(by, n, inclusive: false) ⇒ Expr
Split the string by a substring using n splits.
Results in a struct of n+1 fields.
If it cannot make n splits, the remaining field elements will be null.
# File 'lib/polars/string_expr.rb', line 1130
def split_exact(by, n, inclusive: false)
  by = Utils.parse_into_expression(by, str_as_lit: true)
  if inclusive
    Utils.wrap_expr(_rbexpr.str_split_exact_inclusive(by, n))
  else
    Utils.wrap_expr(_rbexpr.str_split_exact(by, n))
  end
end
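The "n splits, n+1 fields, null padding" rule can be sketched in plain Ruby with an array standing in for the struct. split_exact_sketch is a hypothetical helper; dropping any pieces beyond the first n+1 is an assumption of this sketch, and edge cases such as empty strings are not modelled.

```ruby
# Hypothetical helper (not part of Polars): always return exactly n + 1 fields.
def split_exact_sketch(value, by, n)
  separator = Regexp.new(Regexp.escape(by))         # treat `by` as a literal substring
  fields = value.split(separator, -1).first(n + 1)  # assumption: extra pieces are dropped
  fields + [nil] * (n + 1 - fields.length)          # pad missing fields with nil
end

p split_exact_sketch("a_1", "_", 1)  # => ["a", "1"]
p split_exact_sketch("c", "_", 1)    # => ["c", nil]
```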
#splitn(by, n) ⇒ Expr
Split the string by a substring, restricted to returning at most n items.
If the number of possible splits is less than n-1, the remaining field elements will be null. If the number of possible splits is n-1 or greater, the last (nth) substring will contain the remainder of the string.
# File 'lib/polars/string_expr.rb', line 1167
def splitn(by, n)
  by = Utils.parse_into_expression(by, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_splitn(by, n))
end
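The rule described for splitn matches the limit argument of Ruby's String#split, with null padding added on top. A plain-Ruby sketch under that reading, with an array standing in for the struct (splitn_sketch is a hypothetical helper and assumes n is positive):

```ruby
# Hypothetical helper (not part of Polars): at most n fields, last one keeps the remainder.
def splitn_sketch(value, by, n)
  separator = Regexp.new(Regexp.escape(by))  # treat `by` as a literal substring
  fields = value.split(separator, n)         # a positive limit stops after n - 1 splits
  fields + [nil] * (n - fields.length)       # pad missing fields with nil
end

p splitn_sketch("foo bar baz", " ", 2)  # => ["foo", "bar baz"]
p splitn_sketch("foo", " ", 2)          # => ["foo", nil]
```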
#starts_with(sub) ⇒ Expr
Check if string values start with a substring.
# File 'lib/polars/string_expr.rb', line 771
def starts_with(sub)
  sub = Utils.parse_into_expression(sub, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_starts_with(sub))
end
#strip_chars(characters = nil) ⇒ Expr Also known as: strip
Remove leading and trailing whitespace.
If characters is given, that set of characters is stripped instead of whitespace.
# File 'lib/polars/string_expr.rb', line 448
def strip_chars(characters = nil)
  characters = Utils.parse_into_expression(characters, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_chars(characters))
end
#strip_chars_end(characters = nil) ⇒ Expr Also known as: rstrip
Remove trailing whitespace.
If characters is given, that set of characters is stripped instead of whitespace.
# File 'lib/polars/string_expr.rb', line 502
def strip_chars_end(characters = nil)
  characters = Utils.parse_into_expression(characters, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_chars_end(characters))
end
#strip_chars_start(characters = nil) ⇒ Expr Also known as: lstrip
Remove leading whitespace.
If characters is given, that set of characters is stripped instead of whitespace.
# File 'lib/polars/string_expr.rb', line 475
def strip_chars_start(characters = nil)
  characters = Utils.parse_into_expression(characters, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_chars_start(characters))
end
#strip_prefix(prefix) ⇒ Expr
Remove prefix.
The prefix will be removed from the string exactly once, if found.
# File 'lib/polars/string_expr.rb', line 532
def strip_prefix(prefix)
  prefix = Utils.parse_into_expression(prefix, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_prefix(prefix))
end
#strip_suffix(suffix) ⇒ Expr
Remove suffix.
The suffix will be removed from the string exactly once, if found.
# File 'lib/polars/string_expr.rb', line 562
def strip_suffix(suffix)
  suffix = Utils.parse_into_expression(suffix, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_suffix(suffix))
end
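"Exactly once, if found" is the same contract as Ruby's String#delete_prefix and String#delete_suffix, so those make a convenient plain-Ruby model. The helper names below are hypothetical and are not Polars methods.

```ruby
# Hypothetical helpers (not part of Polars): remove a prefix or suffix at most once.
def strip_prefix_sketch(value, prefix)
  value.delete_prefix(prefix)
end

def strip_suffix_sketch(value, suffix)
  value.delete_suffix(suffix)
end

p strip_prefix_sketch("foofoobar", "foo")  # => "foobar"  (only the first "foo" goes)
p strip_suffix_sketch("foobarbar", "bar")  # => "foobar"
p strip_prefix_sketch("barfoo", "foo")     # => "barfoo"  (not a prefix: unchanged)
```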
#strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false) ⇒ Expr
Note: When parsing a Datetime, the column precision is inferred from the format string, if given, e.g. "%F %T%.3f" => Datetime("ms"). If no fractional second component is found, the default is "us".
Parse a Utf8 expression to a Date/Datetime/Time type.
# File 'lib/polars/string_expr.rb', line 197
def strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false)
  _validate_format_argument(format)

  if dtype == Date
    to_date(format, strict: strict, exact: exact, cache: cache)
  elsif dtype == Datetime || dtype.is_a?(Datetime)
    dtype = Datetime.new if dtype == Datetime
    time_unit = dtype.time_unit
    time_zone = dtype.time_zone
    to_datetime(format, time_unit: time_unit, time_zone: time_zone, strict: strict, exact: exact, cache: cache)
  elsif dtype == Time
    to_time(format, strict: strict, cache: cache)
  else
    raise ArgumentError, "dtype should be of type {Date, Datetime, Time}"
  end
end
#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Expr
Convert a Utf8 column into a Date column.
# File 'lib/polars/string_expr.rb', line 40
def to_date(format = nil, strict: true, exact: true, cache: true)
  _validate_format_argument(format)
  Utils.wrap_expr(_rbexpr.str_to_date(format, strict, exact, cache))
end
#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Expr
Convert a Utf8 column into a Datetime column.
# File 'lib/polars/string_expr.rb', line 79
def to_datetime(
  format = nil,
  time_unit: nil,
  time_zone: nil,
  strict: true,
  exact: true,
  cache: true,
  ambiguous: "raise"
)
  _validate_format_argument(format)
  unless ambiguous.is_a?(Expr)
    ambiguous = Polars.lit(ambiguous)
  end
  Utils.wrap_expr(
    _rbexpr.str_to_datetime(
      format,
      time_unit,
      time_zone,
      strict,
      exact,
      cache,
      ambiguous._rbexpr
    )
  )
end
#to_decimal(inference_length = 100) ⇒ Expr
Convert a String column into a Decimal column.
This method infers the needed parameters precision and scale.
# File 'lib/polars/string_expr.rb', line 253
def to_decimal(inference_length = 100)
  Utils.wrap_expr(_rbexpr.str_to_decimal(inference_length))
end
#to_integer(base: 10, strict: true) ⇒ Expr
Convert a Utf8 column into an Int64 column with base radix.
# File 'lib/polars/string_expr.rb', line 1331
def to_integer(base: 10, strict: true)
  base = Utils.parse_into_expression(base, str_as_lit: false)
  Utils.wrap_expr(_rbexpr.str_to_integer(base, strict))
end
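To picture what parsing "with base radix" means for a single value, here is a plain-Ruby sketch built on Kernel#Integer. to_integer_sketch is a hypothetical helper, and the strict handling shown (raise when strict, nil otherwise) is an assumption about how the flag behaves, not something this page states.

```ruby
# Hypothetical helper (not part of Polars): parse one string in the given base.
def to_integer_sketch(value, base: 10, strict: true)
  Integer(value, base)
rescue ArgumentError
  raise if strict  # assumption: strict mode surfaces the parse failure
  nil              # assumption: non-strict mode turns it into a null
end

p to_integer_sketch("110", base: 2)                 # => 6
p to_integer_sketch("ff", base: 16)                 # => 255
p to_integer_sketch("zz", base: 16, strict: false)  # => nil
```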
#to_lowercase ⇒ Expr
Transform to lowercase variant.
# File 'lib/polars/string_expr.rb', line 400
def to_lowercase
  Utils.wrap_expr(_rbexpr.str_to_lowercase)
end
#to_time(format = nil, strict: true, cache: true) ⇒ Expr
Convert a Utf8 column into a Time column.
# File 'lib/polars/string_expr.rb', line 130
def to_time(format = nil, strict: true, cache: true)
  _validate_format_argument(format)
  Utils.wrap_expr(_rbexpr.str_to_time(format, strict, cache))
end
#to_titlecase ⇒ Expr
Transform to titlecase variant.
# File 'lib/polars/string_expr.rb', line 423
def to_titlecase
  Utils.wrap_expr(_rbexpr.str_to_titlecase)
end
#to_uppercase ⇒ Expr
Transform to uppercase variant.
# File 'lib/polars/string_expr.rb', line 379
def to_uppercase
  Utils.wrap_expr(_rbexpr.str_to_uppercase)
end
#zfill(length) ⇒ Expr
Fills the string with zeroes.
Return a copy of the string left-filled with ASCII '0' digits to make a string of the given length.
A leading sign prefix ('+'/'-') is handled by inserting the padding after the sign character rather than before. The original string is returned if length is less than or equal to the length of the string.
# File 'lib/polars/string_expr.rb', line 656
def zfill(length)
  length = Utils.parse_into_expression(length)
  Utils.wrap_expr(_rbexpr.str_zfill(length))
end
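The sign-handling rule described for zfill is easiest to see on individual values. The following plain-Ruby sketch implements the rule as worded above; zfill_sketch is a hypothetical helper and is not how Polars implements it.

```ruby
# Hypothetical helper (not part of Polars): zero-fill on the left, after any sign.
def zfill_sketch(value, length)
  return value if length <= value.length       # already long enough: unchanged
  sign = value.start_with?("+", "-") ? value[0] : ""
  rest = value[sign.length..]                   # text after the optional sign
  sign + rest.rjust(length - sign.length, "0")  # zeros go between sign and digits
end

p zfill_sketch("7", 3)      # => "007"
p zfill_sketch("-42", 5)    # => "-0042"
p zfill_sketch("12345", 3)  # => "12345"
```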