Class: Polars::StringExpr
- Inherits:
- Object
  - Object
  - Polars::StringExpr
- Defined in:
- lib/polars/string_expr.rb
Overview
Namespace for string related expressions.
Instance Method Summary
-
#contains(pattern, literal: false, strict: true) ⇒ Expr
Check if string contains a substring that matches a regex.
-
#contains_any(patterns, ascii_case_insensitive: false) ⇒ Expr
Use the aho-corasick algorithm to find matches.
-
#count_matches(pattern, literal: false) ⇒ Expr
(also: #count_match)
Count all successive non-overlapping regex matches.
-
#decode(encoding, strict: true) ⇒ Expr
Decode a value using the provided encoding.
-
#encode(encoding) ⇒ Expr
Encode a value using the provided encoding.
-
#ends_with(sub) ⇒ Expr
Check if string values end with a substring.
-
#extract(pattern, group_index: 1) ⇒ Expr
Extract the target capture group from provided patterns.
-
#extract_all(pattern) ⇒ Expr
Extracts all matches for the given regex pattern.
-
#extract_groups(pattern) ⇒ Expr
Extract all capture groups for the given regex pattern.
-
#join(delimiter = "-", ignore_nulls: true) ⇒ Expr
(also: #concat)
Vertically concat the values in the Series to a single string value.
-
#json_decode(dtype = nil, infer_schema_length: 100) ⇒ Expr
(also: #json_extract)
Parse string values as JSON.
-
#json_path_match(json_path) ⇒ Expr
Extract the first match of json string with provided JSONPath expression.
-
#len_bytes ⇒ Expr
(also: #lengths)
Get length of the strings as :u32 (as number of bytes).
-
#len_chars ⇒ Expr
(also: #n_chars)
Get length of the strings as :u32 (as number of chars).
-
#pad_end(length, fill_char = " ") ⇒ Expr
(also: #ljust)
Pad the end of the string until it reaches the given length.
-
#pad_start(length, fill_char = " ") ⇒ Expr
(also: #rjust)
Pad the start of the string until it reaches the given length.
-
#parse_int(radix = 2, strict: true) ⇒ Expr
Parse integers with base radix from strings.
-
#replace(pattern, value, literal: false, n: 1) ⇒ Expr
Replace first matching regex/literal substring with a new string value.
-
#replace_all(pattern, value, literal: false) ⇒ Expr
Replace all matching regex/literal substrings with a new string value.
-
#replace_many(patterns, replace_with, ascii_case_insensitive: false) ⇒ Expr
Use the aho-corasick algorithm to replace many matches.
-
#reverse ⇒ Expr
Returns string values in reversed order.
-
#slice(offset, length = nil) ⇒ Expr
Create subslices of the string values of a Utf8 Series.
-
#split(by, inclusive: false) ⇒ Expr
Split the string by a substring.
-
#split_exact(by, n, inclusive: false) ⇒ Expr
Split the string by a substring using n splits.
-
#splitn(by, n) ⇒ Expr
Split the string by a substring, restricted to returning at most n items.
-
#starts_with(sub) ⇒ Expr
Check if string values start with a substring.
-
#strip_chars(characters = nil) ⇒ Expr
(also: #strip)
Remove leading and trailing whitespace.
-
#strip_chars_end(characters = nil) ⇒ Expr
(also: #rstrip)
Remove trailing whitespace.
-
#strip_chars_start(characters = nil) ⇒ Expr
(also: #lstrip)
Remove leading whitespace.
-
#strip_prefix(prefix) ⇒ Expr
Remove prefix.
-
#strip_suffix(suffix) ⇒ Expr
Remove suffix.
-
#strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false) ⇒ Expr
Parse a Utf8 expression to a Date/Datetime/Time type.
-
#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Expr
Convert a Utf8 column into a Date column.
-
#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Expr
Convert a Utf8 column into a Datetime column.
-
#to_decimal(inference_length = 100) ⇒ Expr
Convert a String column into a Decimal column.
-
#to_integer(base: 10, strict: true) ⇒ Expr
Convert a Utf8 column into an Int64 column with base radix.
-
#to_lowercase ⇒ Expr
Transform to lowercase variant.
-
#to_time(format = nil, strict: true, cache: true) ⇒ Expr
Convert a Utf8 column into a Time column.
-
#to_titlecase ⇒ Expr
Transform to titlecase variant.
-
#to_uppercase ⇒ Expr
Transform to uppercase variant.
-
#zfill(length) ⇒ Expr
Fills the string with zeroes.
Instance Method Details
#contains(pattern, literal: false, strict: true) ⇒ Expr
Check if string contains a substring that matches a regex.
# File 'lib/polars/string_expr.rb', line 692
def contains(pattern, literal: false, strict: true)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_contains(pattern, literal, strict))
end
#contains_any(patterns, ascii_case_insensitive: false) ⇒ Expr
Use the aho-corasick algorithm to find matches.
This version determines if any of the patterns find a match.
# File 'lib/polars/string_expr.rb', line 1406
def contains_any(patterns, ascii_case_insensitive: false)
  patterns = Utils.parse_into_expression(patterns, str_as_lit: false, list_as_series: true)
  Utils.wrap_expr(
    _rbexpr.str_contains_any(patterns, ascii_case_insensitive)
  )
end
#count_matches(pattern, literal: false) ⇒ Expr Also known as: count_match
Count all successive non-overlapping regex matches.
# File 'lib/polars/string_expr.rb', line 1060
def count_matches(pattern, literal: false)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_count_matches(pattern, literal))
end
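To make "successive non-overlapping" concrete, here is a plain-Ruby sketch that does not call Polars. The helper name `count_non_overlapping` is hypothetical, and Ruby's regex engine differs in places from the one Polars uses, so treat it only as an illustration of the counting rule.

```ruby
# Hypothetical helper (not part of Polars): counts non-overlapping regex
# matches in a single string. String#scan resumes searching after the end
# of each match, so matches never overlap.
def count_non_overlapping(text, pattern)
  text.scan(Regexp.new(pattern)).length
end

# "aaaa" holds two non-overlapping "aa" matches (positions 0-1 and 2-3),
# even though "aa" also occurs at position 1.
pairs = count_non_overlapping("aaaa", "aa")

# Every digit is its own match here.
digits = count_non_overlapping("123 bla 45", "\\d")
```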
#decode(encoding, strict: true) ⇒ Expr
Decode a value using the provided encoding.
# File 'lib/polars/string_expr.rb', line 874
def decode(encoding, strict: true)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_decode(strict))
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_decode(strict))
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end
#encode(encoding) ⇒ Expr
Encode a value using the provided encoding.
# File 'lib/polars/string_expr.rb', line 905
def encode(encoding)
  if encoding == "hex"
    Utils.wrap_expr(_rbexpr.str_hex_encode)
  elsif encoding == "base64"
    Utils.wrap_expr(_rbexpr.str_base64_encode)
  else
    raise ArgumentError, "encoding must be one of {{'hex', 'base64'}}, got #{encoding}"
  end
end
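The two supported encodings, "hex" and "base64", can be illustrated on a single value in plain Ruby. This is a sketch of what the encodings produce, not the Polars implementation; `encode_value` and `decode_value` are hypothetical helpers built on `String#unpack1` and `Array#pack`.

```ruby
# Hypothetical helpers (not part of Polars) showing the two encodings
# accepted by #encode and #decode, applied to one Ruby string.
def encode_value(value, encoding)
  case encoding
  when "hex" then value.unpack1("H*")      # bytes -> lowercase hex digits
  when "base64" then [value].pack("m0")    # bytes -> base64, no line breaks
  else raise ArgumentError, "encoding must be 'hex' or 'base64', got #{encoding}"
  end
end

def decode_value(value, encoding)
  case encoding
  when "hex" then [value].pack("H*")       # hex digits -> bytes
  when "base64" then value.unpack1("m0")   # strict base64 -> bytes
  else raise ArgumentError, "encoding must be 'hex' or 'base64', got #{encoding}"
  end
end

hex = encode_value("foo", "hex")           # "666f6f"
b64 = encode_value("foo", "base64")        # "Zm9v"
round_trip = decode_value(hex, "hex")      # "foo" (as a binary string)
```

As in the listing above, any other encoding name raises ArgumentError.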
#ends_with(sub) ⇒ Expr
Check if string values end with a substring.
# File 'lib/polars/string_expr.rb', line 732
def ends_with(sub)
  sub = Utils.parse_into_expression(sub, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_ends_with(sub))
end
#extract(pattern, group_index: 1) ⇒ Expr
Extract the target capture group from provided patterns.
# File 'lib/polars/string_expr.rb', line 943
def extract(pattern, group_index: 1)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_extract(pattern, group_index))
end
#extract_all(pattern) ⇒ Expr
Extracts all matches for the given regex pattern.
Extracts each successive non-overlapping regex match in an individual string as an array.
# File 'lib/polars/string_expr.rb', line 975
def extract_all(pattern)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_extract_all(pattern))
end
#extract_groups(pattern) ⇒ Expr
Extract all capture groups for the given regex pattern.
# File 'lib/polars/string_expr.rb', line 1032
def extract_groups(pattern)
  Utils.wrap_expr(_rbexpr.str_extract_groups(pattern))
end
#join(delimiter = "-", ignore_nulls: true) ⇒ Expr Also known as: concat
Vertically concat the values in the Series to a single string value.
# File 'lib/polars/string_expr.rb', line 357
def join(delimiter = "-", ignore_nulls: true)
  Utils.wrap_expr(_rbexpr.str_join(delimiter, ignore_nulls))
end
#json_decode(dtype = nil, infer_schema_length: 100) ⇒ Expr Also known as: json_extract
Parse string values as JSON.
Raises an error if it encounters an invalid JSON string.
# File 'lib/polars/string_expr.rb', line 804
def json_decode(dtype = nil, infer_schema_length: 100)
  if !dtype.nil?
    dtype = Utils.rb_type_to_dtype(dtype)
  end
  Utils.wrap_expr(_rbexpr.str_json_decode(dtype, infer_schema_length))
end
#json_path_match(json_path) ⇒ Expr
Extract the first match of json string with provided JSONPath expression.
Raises an error if it encounters an invalid JSON string. All return values are cast to Utf8 regardless of the original value.
See the documentation of the JSONPath standard for the expression syntax.
# File 'lib/polars/string_expr.rb', line 843
def json_path_match(json_path)
  json_path = Utils.parse_into_expression(json_path, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_json_path_match(json_path))
end
#len_bytes ⇒ Expr Also known as: lengths
Get length of the strings as :u32 (as number of bytes).
Note: The returned lengths are equal to the number of bytes in the UTF8 string. If you need the length in terms of the number of characters, use n_chars instead.
# File 'lib/polars/string_expr.rb', line 285
def len_bytes
  Utils.wrap_expr(_rbexpr.str_len_bytes)
end
#len_chars ⇒ Expr Also known as: n_chars
Get length of the strings as :u32 (as number of chars).
Note: If you know that you are working with ASCII text, lengths will be equivalent, and faster (returns length in terms of the number of bytes).
# File 'lib/polars/string_expr.rb', line 318
def len_chars
  Utils.wrap_expr(_rbexpr.str_len_chars)
end
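The difference between byte length and character length only shows up with non-ASCII text. The sketch below uses plain Ruby strings rather than Polars: `String#bytesize` plays the role of a byte count and `String#length` the role of a character count. The sample words are made up for illustration.

```ruby
# Plain-Ruby illustration (no Polars calls): byte count vs. character count
# for UTF-8 strings. Escapes are used so the code points are unambiguous.
words = ["Caf\u00E9", "345", "\u6771\u4EAC"]

lengths = words.map do |word|
  { bytes: word.bytesize, chars: word.length }
end

# "Caf\u00E9"     -> 5 bytes, 4 chars (the accented letter takes 2 bytes)
# "345"           -> 3 bytes, 3 chars (ASCII: both counts agree)
# "\u6771\u4EAC"  -> 6 bytes, 2 chars (each character takes 3 bytes)
```

For ASCII-only input the two counts agree, which is why the byte-based variant can stand in for the character-based one in that case.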
#pad_end(length, fill_char = " ") ⇒ Expr Also known as: ljust
Pad the end of the string until it reaches the given length.
# File 'lib/polars/string_expr.rb', line 623
def pad_end(length, fill_char = " ")
  Utils.wrap_expr(_rbexpr.str_pad_end(length, fill_char))
end
#pad_start(length, fill_char = " ") ⇒ Expr Also known as: rjust
Pad the start of the string until it reaches the given length.
# File 'lib/polars/string_expr.rb', line 593
def pad_start(length, fill_char = " ")
  Utils.wrap_expr(_rbexpr.str_pad_start(length, fill_char))
end
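The aliases rjust and ljust match the names of Ruby's own String methods, which makes a plain-Ruby comparison natural. The sketch below uses only `String#rjust` and `String#ljust` on made-up sample strings; it assumes, but does not verify, that the Polars expressions pad in the same way.

```ruby
# Plain-Ruby illustration (no Polars calls) of padding to a target length.
names = ["a", "abc", "abcdef"]

# Pad at the start (compare #pad_start / #rjust).
padded_start = names.map { |s| s.rjust(4, "*") }
# => ["***a", "*abc", "abcdef"]

# Pad at the end (compare #pad_end / #ljust).
padded_end = names.map { |s| s.ljust(4, "*") }
# => ["a***", "abc*", "abcdef"]
```

Strings already at or beyond the target length come back unchanged.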
#parse_int(radix = 2, strict: true) ⇒ Expr
Parse integers with base radix from strings.
By default base 2. ParseError/Overflows become Nulls.
# File 'lib/polars/string_expr.rb', line 1365
def parse_int(radix = 2, strict: true)
  to_integer(base: 2, strict: strict).cast(Int32, strict: strict)
end
#replace(pattern, value, literal: false, n: 1) ⇒ Expr
Replace first matching regex/literal substring with a new string value.
# File 'lib/polars/string_expr.rb', line 1199
def replace(pattern, value, literal: false, n: 1)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  value = Utils.parse_into_expression(value, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_n(pattern, value, literal, n))
end
#replace_all(pattern, value, literal: false) ⇒ Expr
Replace all matching regex/literal substrings with a new string value.
# File 'lib/polars/string_expr.rb', line 1229
def replace_all(pattern, value, literal: false)
  pattern = Utils.parse_into_expression(pattern, str_as_lit: true)
  value = Utils.parse_into_expression(value, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_replace_all(pattern, value, literal))
end
#replace_many(patterns, replace_with, ascii_case_insensitive: false) ⇒ Expr
Use the aho-corasick algorithm to replace many matches.
# File 'lib/polars/string_expr.rb', line 1477
def replace_many(patterns, replace_with, ascii_case_insensitive: false)
  patterns = Utils.parse_into_expression(patterns, str_as_lit: false, list_as_series: true)
  replace_with = Utils.parse_into_expression(
    replace_with, str_as_lit: true, list_as_series: true
  )
  Utils.wrap_expr(
    _rbexpr.str_replace_many(
      patterns, replace_with, ascii_case_insensitive
    )
  )
end
#reverse ⇒ Expr
Returns string values in reversed order.
# File 'lib/polars/string_expr.rb', line 1253
def reverse
  Utils.wrap_expr(_rbexpr.str_reverse)
end
#slice(offset, length = nil) ⇒ Expr
Create subslices of the string values of a Utf8 Series.
# File 'lib/polars/string_expr.rb', line 1284
def slice(offset, length = nil)
  offset = Utils.parse_into_expression(offset)
  length = Utils.parse_into_expression(length)
  Utils.wrap_expr(_rbexpr.str_slice(offset, length))
end
#split(by, inclusive: false) ⇒ Expr
Split the string by a substring.
# File 'lib/polars/string_expr.rb', line 1089
def split(by, inclusive: false)
  by = Utils.parse_into_expression(by, str_as_lit: true)
  if inclusive
    return Utils.wrap_expr(_rbexpr.str_split_inclusive(by))
  end
  Utils.wrap_expr(_rbexpr.str_split(by))
end
#split_exact(by, n, inclusive: false) ⇒ Expr
Split the string by a substring using n splits.
Results in a struct of n+1 fields.
If it cannot make n splits, the remaining field elements will be null.
# File 'lib/polars/string_expr.rb', line 1131
def split_exact(by, n, inclusive: false)
  by = Utils.parse_into_expression(by, str_as_lit: true)
  if inclusive
    Utils.wrap_expr(_rbexpr.str_split_exact_inclusive(by, n))
  else
    Utils.wrap_expr(_rbexpr.str_split_exact(by, n))
  end
end
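The "n splits give n+1 fields, padded with null" rule can be sketched for a single string in plain Ruby. `split_exact_sketch` is a hypothetical helper, not a Polars call, and it returns an Array with nil standing in for null rather than a struct. How pieces beyond the first n+1 are handled is an assumption here: this sketch drops them.

```ruby
# Hypothetical helper (not part of Polars): split `text` on `by` and return
# exactly n + 1 fields, padding with nil when fewer splits are possible.
# Assumption: extra pieces beyond n + 1 are discarded.
def split_exact_sketch(text, by, n)
  parts = text.split(by, -1).first(n + 1)
  parts + [nil] * (n + 1 - parts.length)
end

split_exact_sketch("a_1", "_", 1)    # => ["a", "1"]
split_exact_sketch("c", "_", 1)      # => ["c", nil]
split_exact_sketch("d_4_5", "_", 1)  # => ["d", "4"]
```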
#splitn(by, n) ⇒ Expr
Split the string by a substring, restricted to returning at most n items.
If the number of possible splits is less than n-1, the remaining field elements will be null. If the number of possible splits is n-1 or greater, the last (nth) substring will contain the remainder of the string.
# File 'lib/polars/string_expr.rb', line 1168
def splitn(by, n)
  by = Utils.parse_into_expression(by, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_splitn(by, n))
end
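The rule described above (at most n items, remainder kept in the last one, null padding when there are too few) can be sketched for one string in plain Ruby. `splitn_sketch` is a hypothetical helper that relies on the limit argument of `String#split`; it does not call Polars and uses nil for null.

```ruby
# Hypothetical helper (not part of Polars): return exactly n items.
# String#split with a positive limit stops after n - 1 splits, so the
# last item carries the remainder of the string.
def splitn_sketch(text, by, n)
  parts = text.split(by, n)
  parts + [nil] * (n - parts.length)
end

splitn_sketch("a_b_c", "_", 2)  # => ["a", "b_c"]
splitn_sketch("a_b_c", "_", 3)  # => ["a", "b", "c"]
splitn_sketch("a", "_", 2)      # => ["a", nil]
```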
#starts_with(sub) ⇒ Expr
Check if string values start with a substring.
# File 'lib/polars/string_expr.rb', line 772
def starts_with(sub)
  sub = Utils.parse_into_expression(sub, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_starts_with(sub))
end
#strip_chars(characters = nil) ⇒ Expr Also known as: strip
Remove leading and trailing whitespace.
# File 'lib/polars/string_expr.rb', line 449
def strip_chars(characters = nil)
  characters = Utils.parse_into_expression(characters, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_chars(characters))
end
#strip_chars_end(characters = nil) ⇒ Expr Also known as: rstrip
Remove trailing whitespace.
# File 'lib/polars/string_expr.rb', line 503
def strip_chars_end(characters = nil)
  characters = Utils.parse_into_expression(characters, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_chars_end(characters))
end
#strip_chars_start(characters = nil) ⇒ Expr Also known as: lstrip
Remove leading whitespace.
# File 'lib/polars/string_expr.rb', line 476
def strip_chars_start(characters = nil)
  characters = Utils.parse_into_expression(characters, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_chars_start(characters))
end
#strip_prefix(prefix) ⇒ Expr
Remove prefix.
The prefix will be removed from the string exactly once, if found.
# File 'lib/polars/string_expr.rb', line 533
def strip_prefix(prefix)
  prefix = Utils.parse_into_expression(prefix, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_prefix(prefix))
end
#strip_suffix(suffix) ⇒ Expr
Remove suffix.
The suffix will be removed from the string exactly once, if found.
# File 'lib/polars/string_expr.rb', line 563
def strip_suffix(suffix)
  suffix = Utils.parse_into_expression(suffix, str_as_lit: true)
  Utils.wrap_expr(_rbexpr.str_strip_suffix(suffix))
end
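"Exactly once, if found" has a close analogue in Ruby's own `String#delete_prefix` and `String#delete_suffix`, shown here on made-up strings. This is a plain-Ruby illustration of the rule, not a call into Polars.

```ruby
# Plain-Ruby illustration (no Polars calls): the prefix or suffix is removed
# a single time, even when it repeats, and strings without it are unchanged.
without_prefix = ["foobar", "foofoobar", "barfoo"].map { |s| s.delete_prefix("foo") }
# => ["bar", "foobar", "barfoo"]

without_suffix = ["foobar", "foobarbar", "barfoo"].map { |s| s.delete_suffix("bar") }
# => ["foo", "foobar", "barfoo"]
```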
#strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false) ⇒ Expr
Parse a Utf8 expression to a Date/Datetime/Time type.
Note: When parsing a Datetime, the column precision will be inferred from the format string, if given, e.g. "%F %T%.3f" => Datetime("ms"). If no fractional second component is found, the default is "us".
# File 'lib/polars/string_expr.rb', line 197
def strptime(dtype, format = nil, strict: true, exact: true, cache: true, utc: false)
  _validate_format_argument(format)

  if dtype == Date
    to_date(format, strict: strict, exact: exact, cache: cache)
  elsif dtype == Datetime || dtype.is_a?(Datetime)
    dtype = Datetime.new if dtype == Datetime
    time_unit = dtype.time_unit
    time_zone = dtype.time_zone
    to_datetime(format, time_unit: time_unit, time_zone: time_zone, strict: strict, exact: exact, cache: cache)
  elsif dtype == Time
    to_time(format, strict: strict, cache: cache)
  else
    raise ArgumentError, "dtype should be of type {Date, Datetime, Time}"
  end
end
#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Expr
Convert a Utf8 column into a Date column.
# File 'lib/polars/string_expr.rb', line 40
def to_date(format = nil, strict: true, exact: true, cache: true)
  _validate_format_argument(format)
  Utils.wrap_expr(_rbexpr.str_to_date(format, strict, exact, cache))
end
#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Expr
Convert a Utf8 column into a Datetime column.
# File 'lib/polars/string_expr.rb', line 79
def to_datetime(
  format = nil,
  time_unit: nil,
  time_zone: nil,
  strict: true,
  exact: true,
  cache: true,
  ambiguous: "raise"
)
  _validate_format_argument(format)
  unless ambiguous.is_a?(Expr)
    ambiguous = Polars.lit(ambiguous)
  end
  Utils.wrap_expr(
    _rbexpr.str_to_datetime(
      format,
      time_unit,
      time_zone,
      strict,
      exact,
      cache,
      ambiguous._rbexpr
    )
  )
end
#to_decimal(inference_length = 100) ⇒ Expr
Convert a String column into a Decimal column.
This method infers the needed parameters precision and scale.
# File 'lib/polars/string_expr.rb', line 253
def to_decimal(inference_length = 100)
  Utils.wrap_expr(_rbexpr.str_to_decimal(inference_length))
end
#to_integer(base: 10, strict: true) ⇒ Expr
Convert a Utf8 column into an Int64 column with base radix.
# File 'lib/polars/string_expr.rb', line 1332
def to_integer(base: 10, strict: true)
  base = Utils.parse_into_expression(base, str_as_lit: false)
  Utils.wrap_expr(_rbexpr.str_to_integer(base, strict))
end
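Parsing with a radix can be sketched for a single string with Ruby's `Kernel#Integer`. `to_integer_sketch` is a hypothetical helper, not the Polars implementation. It assumes that strict mode raises on unparsable input while non-strict mode yields nil in place of null, and it ignores Int64 overflow. Ruby's parser also accepts some inputs (such as underscores or a "0b" prefix) that may be treated differently elsewhere.

```ruby
# Hypothetical helper (not part of Polars): parse one string in the given
# base. Assumption: strict mode re-raises the parse error, non-strict mode
# returns nil instead.
def to_integer_sketch(text, base: 10, strict: true)
  Integer(text, base)
rescue ArgumentError
  raise if strict
  nil
end

to_integer_sketch("110", base: 2)                 # => 6
to_integer_sketch("ff", base: 16)                 # => 255
to_integer_sketch("nope", base: 2, strict: false) # => nil
```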
#to_lowercase ⇒ Expr
Transform to lowercase variant.
# File 'lib/polars/string_expr.rb', line 400
def to_lowercase
  Utils.wrap_expr(_rbexpr.str_to_lowercase)
end
#to_time(format = nil, strict: true, cache: true) ⇒ Expr
Convert a Utf8 column into a Time column.
# File 'lib/polars/string_expr.rb', line 130
def to_time(format = nil, strict: true, cache: true)
  _validate_format_argument(format)
  Utils.wrap_expr(_rbexpr.str_to_time(format, strict, cache))
end
#to_titlecase ⇒ Expr
Transform to titlecase variant.
# File 'lib/polars/string_expr.rb', line 423
def to_titlecase
  raise Todo
  Utils.wrap_expr(_rbexpr.str_to_titlecase)
end
#to_uppercase ⇒ Expr
Transform to uppercase variant.
# File 'lib/polars/string_expr.rb', line 379
def to_uppercase
  Utils.wrap_expr(_rbexpr.str_to_uppercase)
end
#zfill(length) ⇒ Expr
Fills the string with zeroes.
Return a copy of the string left filled with ASCII '0' digits to make a string of the given length.
A leading sign prefix ('+'/'-') is handled by inserting the padding after the sign character rather than before. The original string is returned if the given length is less than or equal to the length of the string.
# File 'lib/polars/string_expr.rb', line 657
def zfill(length)
  length = Utils.parse_into_expression(length)
  Utils.wrap_expr(_rbexpr.str_zfill(length))
end
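The sign-handling rule described above can be written out for a single string in plain Ruby. `zfill_sketch` is a hypothetical helper that follows the prose description only; it is not the Polars implementation and has not been checked against it.

```ruby
# Hypothetical helper (not part of Polars): left-fill with "0" up to `width`,
# inserting the zeroes after a leading "+" or "-" sign.
def zfill_sketch(text, width)
  # Strings already long enough are returned unchanged.
  return text if width <= text.length

  sign = text.start_with?("+", "-") ? text[0] : ""
  digits = text[sign.length..]
  sign + digits.rjust(width - sign.length, "0")
end

zfill_sketch("7", 3)      # => "007"
zfill_sketch("-42", 5)    # => "-0042"
zfill_sketch("12345", 3)  # => "12345"
```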