Class: Polars::StringNameSpace
- Inherits:
- Object
- Polars::StringNameSpace
- Defined in:
- lib/polars/string_name_space.rb
Overview
Series.str namespace.
Instance Method Summary
-
#contains(pattern, literal: false, strict: true) ⇒ Series
Check if strings in Series contain a substring that matches a regex.
-
#contains_any(patterns, ascii_case_insensitive: false) ⇒ Series
Use the Aho-Corasick algorithm to find matches.
-
#count_matches(pattern, literal: false) ⇒ Series
Count all successive non-overlapping regex matches.
-
#decode(encoding, strict: true) ⇒ Series
Decode a value using the provided encoding.
-
#encode(encoding) ⇒ Series
Encode a value using the provided encoding.
-
#ends_with(suffix) ⇒ Series
Check if string values end with a substring.
-
#escape_regex ⇒ Series
Returns string values with all regular expression meta characters escaped.
-
#extract(pattern, group_index: 1) ⇒ Series
Extract the target capture group from provided patterns.
-
#extract_all(pattern) ⇒ Series
Extracts all matches for the given regex pattern.
-
#extract_groups(pattern) ⇒ Series
Extract all capture groups for the given regex pattern.
-
#extract_many(patterns, ascii_case_insensitive: false, overlapping: false, leftmost: false) ⇒ Series
Use the Aho-Corasick algorithm to extract many matches.
-
#find(pattern, literal: false, strict: true) ⇒ Series
Return the byte offset of the first substring matching a pattern.
-
#find_many(patterns, ascii_case_insensitive: false, overlapping: false, leftmost: false) ⇒ Series
Use the Aho-Corasick algorithm to find all matches.
-
#head(n) ⇒ Series
Return the first n characters of each string in a String Series.
-
#join(delimiter = nil, ignore_nulls: true) ⇒ Series
Vertically concat the values in the Series to a single string value.
-
#json_decode(dtype = nil, infer_schema_length: N_INFER_DEFAULT) ⇒ Series
Parse string values as JSON.
-
#json_path_match(json_path) ⇒ Series
Extract the first match from a JSON string using the provided JSONPath expression.
-
#len_bytes ⇒ Series
Return the length of each string as the number of bytes.
-
#len_chars ⇒ Series
Return the length of each string as the number of characters.
-
#normalize(form = "NFC") ⇒ Series
Returns the Unicode normal form of the string values.
-
#pad_end(length, fill_char = " ") ⇒ Series
Pad the end of the string until it reaches the given length.
-
#pad_start(length, fill_char = " ") ⇒ Series
Pad the start of the string until it reaches the given length.
-
#replace(pattern, value, literal: false, n: 1) ⇒ Series
Replace first matching regex/literal substring with a new string value.
-
#replace_all(pattern, value, literal: false) ⇒ Series
Replace all matching regex/literal substrings with a new string value.
-
#replace_many(patterns, replace_with = NO_DEFAULT, ascii_case_insensitive: false, leftmost: false) ⇒ Series
Use the Aho-Corasick algorithm to replace many matches.
-
#reverse ⇒ Series
Returns string values in reversed order.
-
#slice(offset, length = nil) ⇒ Series
Create subslices of the string values of a Utf8 Series.
-
#split(by, inclusive: false) ⇒ Series
Split the string by a substring.
-
#split_exact(by, n, inclusive: false) ⇒ Series
Split the string by a substring using n splits.
-
#splitn(by, n) ⇒ Series
Split the string by a substring, restricted to returning at most n items.
-
#starts_with(prefix) ⇒ Series
Check if string values start with a substring.
-
#strip_chars(characters = nil) ⇒ Series
Remove leading and trailing characters (whitespace when characters is nil).
-
#strip_chars_end(characters = nil) ⇒ Series
Remove trailing characters (whitespace when characters is nil).
-
#strip_chars_start(characters = nil) ⇒ Series
Remove leading characters (whitespace when characters is nil).
-
#strip_prefix(prefix) ⇒ Series
Remove prefix.
-
#strip_suffix(suffix) ⇒ Series
Remove suffix.
-
#strptime(dtype, format = nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Series
Parse a Series of dtype Utf8 to a Date/Datetime Series.
-
#tail(n) ⇒ Series
Return the last n characters of each string in a String Series.
-
#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Series
Convert a Utf8 column into a Date column.
-
#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Series
Convert a Utf8 column into a Datetime column.
-
#to_decimal(inference_length = 100, scale: nil) ⇒ Series
Convert a String column into a Decimal column.
-
#to_integer(base: 10, dtype: Int64, strict: true) ⇒ Series
Convert a String column into an integer column of the given dtype, parsing each value in the given base (radix).
-
#to_lowercase ⇒ Series
Modify the strings to their lowercase equivalent.
-
#to_time(format = nil, strict: true, cache: true) ⇒ Series
Convert a Utf8 column into a Time column.
-
#to_uppercase ⇒ Series
Modify the strings to their uppercase equivalent.
-
#zfill(length) ⇒ Series
Fills the string with zeroes.
Dynamic Method Handling
This class handles dynamic methods through the method_missing method in the class Polars::ExprDispatch.
Instance Method Details
#contains(pattern, literal: false, strict: true) ⇒ Series
Check if strings in Series contain a substring that matches a regex.
# File 'lib/polars/string_name_space.rb', line 304
def contains(pattern, literal: false, strict: true)
  super
end
#contains_any(patterns, ascii_case_insensitive: false) ⇒ Series
This method supports matching on string literals only, and does not support regular expression matching.
Use the Aho-Corasick algorithm to find matches.
Determines if any of the patterns are contained in the string.
# File 'lib/polars/string_name_space.rb', line 1254
def contains_any(
  patterns,
  ascii_case_insensitive: false
)
  super
end
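The literal-only "any pattern" check described above can be modelled on a single plain Ruby string. This is a sketch of the documented semantics only: it uses a naive scan with String#include? rather than the Aho-Corasick algorithm, and contains_any_sketch is a hypothetical helper, not part of the library.

```ruby
# Plain-Ruby model of a literal "contains any" check for one string.
# NOTE: naive scan, NOT Aho-Corasick; contains_any_sketch is a hypothetical name.
def contains_any_sketch(value, patterns, ascii_case_insensitive: false)
  return nil if value.nil? # assumption: a null input stays null

  if ascii_case_insensitive
    # Fold only ASCII letters; non-ASCII characters are left untouched.
    value = value.downcase(:ascii)
    patterns = patterns.map { |pattern| pattern.downcase(:ascii) }
  end

  # Patterns are literals: "." matches a dot, not "any character".
  patterns.any? { |pattern| value.include?(pattern) }
end

p contains_any_sketch("Tongue twister", ["TWIST", "xyz"], ascii_case_insensitive: true) # => true
p contains_any_sketch("axb", ["a.b"]) # => false
```

The second call returns false because "a.b" is compared as a literal, which is the behaviour the note about string literals describes.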
#count_matches(pattern, literal: false) ⇒ Series
Count all successive non-overlapping regex matches.
# File 'lib/polars/string_name_space.rb', line 633
def count_matches(pattern, literal: false)
  super
end
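"Successive non-overlapping" means the search resumes after the end of each match. A minimal plain-Ruby sketch of that counting rule follows; count_matches_sketch is a hypothetical helper, and Ruby's own regex engine stands in for the one used by the library, which may accept a different syntax.

```ruby
# Count successive non-overlapping matches in one string.
# count_matches_sketch is a hypothetical stand-in, not the library method.
def count_matches_sketch(value, pattern, literal: false)
  return nil if value.nil? # assumption: a null input stays null

  # With literal: true the pattern is escaped so metacharacters lose their meaning.
  regex = literal ? Regexp.new(Regexp.escape(pattern)) : Regexp.new(pattern)

  # String#scan resumes after the end of each match, so matches never overlap.
  value.scan(regex).length
end

p count_matches_sketch("aaaa", "aa")                 # => 2
p count_matches_sketch("a.b.c", ".", literal: true)  # => 2
p count_matches_sketch("a.b.c", ".")                 # => 5
```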
#decode(encoding, strict: true) ⇒ Series
Decode a value using the provided encoding.
# File 'lib/polars/string_name_space.rb', line 434
def decode(encoding, strict: true)
  super
end
#encode(encoding) ⇒ Series
Encode a value using the provided encoding.
# File 'lib/polars/string_name_space.rb', line 456
def encode(encoding)
  super
end
#ends_with(suffix) ⇒ Series
Check if string values end with a substring.
# File 'lib/polars/string_name_space.rb', line 385
def ends_with(suffix)
  super
end
#escape_regex ⇒ Series
Returns string values with all regular expression meta characters escaped.
# File 'lib/polars/string_name_space.rb', line 1522
def escape_regex
  super
end
#extract(pattern, group_index: 1) ⇒ Series
Extract the target capture group from provided patterns.
# File 'lib/polars/string_name_space.rb', line 553
def extract(pattern, group_index: 1)
  super
end
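The role of group_index can be illustrated with Ruby's own MatchData on a single string. This is a sketch only: extract_sketch is a hypothetical helper, the pattern is compiled by Ruby's regex engine rather than the library's, and treating index 0 as the whole match is Ruby's convention, assumed here to carry over.

```ruby
# Return one capture group of the first match, or nil when nothing matches.
# extract_sketch is a hypothetical stand-in, not the library method.
def extract_sketch(value, pattern, group_index: 1)
  return nil if value.nil? # assumption: a null input stays null

  match = Regexp.new(pattern).match(value)
  return nil unless match

  # Index 0 is the whole match in Ruby; 1 is the first capture group.
  match[group_index]
end

p extract_sketch("id=42;", 'id=(\d+)')                 # => "42"
p extract_sketch("id=42;", 'id=(\d+)', group_index: 0) # => "id=42"
p extract_sketch("no id here", 'id=(\d+)')             # => nil
```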
#extract_all(pattern) ⇒ Series
Extracts all matches for the given regex pattern.
Extract each successive non-overlapping regex match in an individual string as an array.
# File 'lib/polars/string_name_space.rb', line 577
def extract_all(pattern)
  super
end
#extract_groups(pattern) ⇒ Series
All group names are strings.
Extract all capture groups for the given regex pattern.
# File 'lib/polars/string_name_space.rb', line 610
def extract_groups(pattern)
  super
end
#extract_many(patterns, ascii_case_insensitive: false, overlapping: false, leftmost: false) ⇒ Series
This method supports matching on string literals only, and does not support regular expression matching.
Use the Aho-Corasick algorithm to extract many matches.
# File 'lib/polars/string_name_space.rb', line 1384
def extract_many(
  patterns,
  ascii_case_insensitive: false,
  overlapping: false,
  leftmost: false
)
  super
end
#find(pattern, literal: false, strict: true) ⇒ Series
To modify regular expression behaviour (such as case-sensitivity) with
flags, use the inline (?iLmsuxU) syntax.
Return the byte offset of the first substring matching a pattern.
If the pattern is not found, returns nil.
# File 'lib/polars/string_name_space.rb', line 363
def find(pattern, literal: false, strict: true)
  super
end
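Because the result is a byte offset, it can differ from the character index whenever multi-byte characters precede the match. The following plain-Ruby sketch shows the difference on one string; find_sketch is a hypothetical helper and Ruby's regex engine stands in for the library's.

```ruby
# Byte offset of the first match, or nil when the pattern is not found.
# find_sketch is a hypothetical stand-in, not the library method.
def find_sketch(value, pattern, literal: false)
  return nil if value.nil? # assumption: a null input stays null

  regex = literal ? Regexp.new(Regexp.escape(pattern)) : Regexp.new(pattern)
  match = regex.match(value)

  # Everything before the match, measured in bytes rather than characters.
  match ? match.pre_match.bytesize : nil
end

text = "caf\u00e9 au lait" # the accented e occupies two bytes in UTF-8
p text.index("au")         # => 5 (character index)
p find_sketch(text, "au")  # => 6 (byte offset)
p find_sketch(text, "tea") # => nil
```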
#find_many(patterns, ascii_case_insensitive: false, overlapping: false, leftmost: false) ⇒ Series
This method supports matching on string literals only, and does not support regular expression matching.
Use the Aho-Corasick algorithm to find all matches.
The function returns the byte offset of the start of each match.
The return type will be List<UInt32>.
# File 'lib/polars/string_name_space.rb', line 1460
def find_many(
  patterns,
  ascii_case_insensitive: false,
  overlapping: false,
  leftmost: false
)
  super
end
#head(n) ⇒ Series
Return the first n characters of each string in a String Series.
# File 'lib/polars/string_name_space.rb', line 1130
def head(n)
  super
end
#join(delimiter = nil, ignore_nulls: true) ⇒ Series
Vertically concat the values in the Series to a single string value.
# File 'lib/polars/string_name_space.rb', line 1497
def join(delimiter = nil, ignore_nulls: true)
  # TODO update
  if delimiter.nil?
    warn "The default `delimiter` for `join` method will change from `-` to empty string in a future version"
    delimiter = "-"
  end

  super
end
#json_decode(dtype = nil, infer_schema_length: N_INFER_DEFAULT) ⇒ Series
Parse string values as JSON.
Throws an error if invalid JSON strings are encountered.
# File 'lib/polars/string_name_space.rb', line 484
def json_decode(dtype = nil, infer_schema_length: N_INFER_DEFAULT)
  if !dtype.nil?
    s = Utils.wrap_s(_s)
    return (
      s.to_frame
      .select_seq(F.col(s.name).str.json_decode(dtype))
      .to_series
    )
  end

  Utils.wrap_s(_s.str_json_decode(infer_schema_length))
end
#json_path_match(json_path) ⇒ Series
Extract the first match from a JSON string using the provided JSONPath expression.
Raises an error if invalid JSON strings are encountered. All return values are cast to Utf8 regardless of the original value.
Refer to the JSONPath standard for documentation of the expression syntax.
# File 'lib/polars/string_name_space.rb', line 525
def json_path_match(json_path)
  super
end
#len_bytes ⇒ Series
Return the length of each string as the number of bytes.
# File 'lib/polars/string_name_space.rb', line 244
def len_bytes
  super
end
#len_chars ⇒ Series
Return the length of each string as the number of characters.
# File 'lib/polars/string_name_space.rb', line 264
def len_chars
  super
end
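The two length measures agree for ASCII text and diverge for multi-byte characters. The sketch below shows the distinction with plain Ruby strings in an Array, using String#bytesize and String#length as stand-ins; it does not call the library, and keeping nil as nil is an assumption about null handling.

```ruby
# Byte length versus character length, modelled on a plain Ruby Array.
# Escapes are used so the example does not depend on the source file encoding.
values = ["abc", "caf\u00e9", "\u65e5\u672c\u8a9e", nil]

# Stand-in for a byte-length measure: UTF-8 bytes per string.
len_bytes = values.map { |value| value&.bytesize }

# Stand-in for a character-length measure: characters per string.
len_chars = values.map { |value| value&.length }

p len_bytes # => [3, 5, 9, nil]
p len_chars # => [3, 4, 3, nil]
```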
#normalize(form = "NFC") ⇒ Series
Returns the Unicode normal form of the string values.
This uses the forms described in Unicode Standard Annex 15: https://www.unicode.org/reports/tr15/.
# File 'lib/polars/string_name_space.rb', line 1555
def normalize(form = "NFC")
  super
end
#pad_end(length, fill_char = " ") ⇒ Series
Pad the end of the string until it reaches the given length.
# File 'lib/polars/string_name_space.rb', line 969
def pad_end(length, fill_char = " ")
  super
end
#pad_start(length, fill_char = " ") ⇒ Series
Pad the start of the string until it reaches the given length.
# File 'lib/polars/string_name_space.rb', line 943
def pad_start(length, fill_char = " ")
  super
end
#replace(pattern, value, literal: false, n: 1) ⇒ Series
Replace first matching regex/literal substring with a new string value.
# File 'lib/polars/string_name_space.rb', line 773
def replace(pattern, value, literal: false, n: 1)
  super
end
#replace_all(pattern, value, literal: false) ⇒ Series
Replace all matching regex/literal substrings with a new string value.
# File 'lib/polars/string_name_space.rb', line 798
def replace_all(pattern, value, literal: false)
  super
end
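The difference between replacing the first match and replacing all matches, and the effect of literal, can be sketched with Ruby's String#sub and String#gsub. Both helpers below are hypothetical names; the n: parameter of replace is not modelled, and replacement strings with group references are avoided because Ruby's reference syntax need not match the library's.

```ruby
# First-match versus all-match replacement on one string.
# replace_sketch and replace_all_sketch are hypothetical stand-ins.
def replace_sketch(value, pattern, replacement, literal: false)
  # A String target is matched literally by String#sub; a Regexp is not.
  target = literal ? pattern : Regexp.new(pattern)
  value.sub(target, replacement)
end

def replace_all_sketch(value, pattern, replacement, literal: false)
  target = literal ? pattern : Regexp.new(pattern)
  value.gsub(target, replacement)
end

p replace_sketch("a.b.c", ".", "-", literal: true)     # => "a-b.c"
p replace_all_sketch("a.b.c", ".", "-", literal: true) # => "a-b-c"
p replace_sketch("a.b.c", ".", "-")                    # => "-.b.c"
p replace_all_sketch("x1y22", '\d+', "#")              # => "x#y#"
```

In the third call the unescaped "." is a regex that matches any character, so the first character is replaced rather than the first dot.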
#replace_many(patterns, replace_with = NO_DEFAULT, ascii_case_insensitive: false, leftmost: false) ⇒ Series
This method supports matching on string literals only, and does not support regular expression matching.
Use the Aho-Corasick algorithm to replace many matches.
# File 'lib/polars/string_name_space.rb', line 1343
def replace_many(
  patterns,
  replace_with = NO_DEFAULT,
  ascii_case_insensitive: false,
  leftmost: false
)
  super
end
#reverse ⇒ Series
Returns string values in reversed order.
# File 'lib/polars/string_name_space.rb', line 1054
def reverse
  super
end
#slice(offset, length = nil) ⇒ Series
Create subslices of the string values of a Utf8 Series.
# File 'lib/polars/string_name_space.rb', line 1092
def slice(offset, length = nil)
  s = Utils.wrap_s(_s)
  s.to_frame.select(Polars.col(s.name).str.slice(offset, length)).to_series
end
#split(by, inclusive: false) ⇒ Series
Split the string by a substring.
# File 'lib/polars/string_name_space.rb', line 645
def split(by, inclusive: false)
  super
end
#split_exact(by, n, inclusive: false) ⇒ Series
Split the string by a substring using n splits.
Results in a struct of n+1 fields.
If it cannot make n splits, the remaining field elements will be null.
# File 'lib/polars/string_name_space.rb', line 696
def split_exact(by, n, inclusive: false)
  super
end
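The "n splits give n+1 fields, padded with null" rule can be sketched on one plain Ruby string, with an Array standing in for the struct. split_exact_sketch is a hypothetical helper; the text above does not say what happens to pieces beyond the first n+1, so dropping them here is an assumption, as is returning all-null fields for a null input. The inclusive option is not modelled.

```ruby
# Split one string into exactly n + 1 fields, padding with nil.
# split_exact_sketch is a hypothetical stand-in, not the library method.
def split_exact_sketch(value, by, n)
  return Array.new(n + 1) if value.nil? # assumption: null input gives all-null fields

  # A limit of -1 keeps trailing empty pieces; pieces beyond n + 1 are
  # dropped here, which is an assumption rather than documented behaviour.
  parts = value.split(by, -1).first(n + 1)

  # Pad with nil when fewer than n splits were possible.
  parts + Array.new(n + 1 - parts.length)
end

p split_exact_sketch("a_b", "_", 2) # => ["a", "b", nil]
p split_exact_sketch("a", "_", 1)   # => ["a", nil]
```

Note that Ruby treats a single-space separator specially in String#split, so the sketch is only a faithful model for separators such as "_".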
#splitn(by, n) ⇒ Series
Split the string by a substring, restricted to returning at most n items.
If the number of possible splits is less than n-1, the remaining field
elements will be null. If the number of possible splits is n-1 or greater,
the last (nth) substring will contain the remainder of the string.
# File 'lib/polars/string_name_space.rb', line 745
def splitn(by, n)
  s = Utils.wrap_s(_s)
  s.to_frame.select(Polars.col(s.name).str.splitn(by, n)).to_series
end
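The remainder rule described above matches what Ruby's String#split does with a positive limit, so a plain-Ruby sketch is short. splitn_sketch is a hypothetical helper, an Array stands in for the struct, and returning all-null fields for a null input is an assumption.

```ruby
# Split one string into at most n items; the last item keeps the remainder.
# splitn_sketch is a hypothetical stand-in, not the library method.
def splitn_sketch(value, by, n)
  return Array.new(n) if value.nil? # assumption: null input gives all-null fields

  # With a positive limit, String#split returns at most n pieces and
  # leaves the unsplit remainder in the last one.
  parts = value.split(by, n)

  # Pad with nil when fewer than n - 1 splits were possible.
  parts + Array.new(n - parts.length)
end

p splitn_sketch("a_b_c", "_", 2) # => ["a", "b_c"]
p splitn_sketch("a", "_", 2)     # => ["a", nil]
p splitn_sketch("a_b_c", "_", 3) # => ["a", "b", "c"]
```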
#starts_with(prefix) ⇒ Series
Check if string values start with a substring.
# File 'lib/polars/string_name_space.rb', line 407
def starts_with(prefix)
  super
end
#strip_chars(characters = nil) ⇒ Series
Remove leading and trailing characters (whitespace when characters is nil).
# File 'lib/polars/string_name_space.rb', line 821
def strip_chars(characters = nil)
  super
end
#strip_chars_end(characters = nil) ⇒ Series
Remove trailing characters (whitespace when characters is nil).
# File 'lib/polars/string_name_space.rb', line 867
def strip_chars_end(characters = nil)
  super
end
#strip_chars_start(characters = nil) ⇒ Series
Remove leading characters (whitespace when characters is nil).
# File 'lib/polars/string_name_space.rb', line 844
def strip_chars_start(characters = nil)
  super
end
#strip_prefix(prefix) ⇒ Series
Remove prefix.
The prefix will be removed from the string exactly once, if found.
# File 'lib/polars/string_name_space.rb', line 892
def strip_prefix(prefix)
  super
end
#strip_suffix(suffix) ⇒ Series
Remove suffix.
The suffix will be removed from the string exactly once, if found.
# File 'lib/polars/string_name_space.rb', line 917
def strip_suffix(suffix)
  super
end
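"Exactly once, if found" means a repeated prefix or suffix is only peeled off a single time. Ruby's String#delete_prefix and String#delete_suffix behave the same way on plain strings, so they serve as a stand-in below; the sample values are made up for illustration and the library is not called.

```ruby
# Remove a prefix or suffix at most once per string, using plain Ruby strings.
with_prefix = ["foo:bar", "foo:foo:bar", "bar"]
with_suffix = ["bar.txt", "bar.txt.txt", "bar"]

# Only the first "foo:" is removed from "foo:foo:bar".
without_prefix = with_prefix.map { |value| value.delete_prefix("foo:") }

# Only the last ".txt" is removed from "bar.txt.txt".
without_suffix = with_suffix.map { |value| value.delete_suffix(".txt") }

p without_prefix # => ["bar", "foo:bar", "bar"]
p without_suffix # => ["bar", "bar.txt", "bar"]
```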
#strptime(dtype, format = nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Series
Parse a Series of dtype Utf8 to a Date/Datetime Series.
# File 'lib/polars/string_name_space.rb', line 190
def strptime(dtype, format = nil, strict: true, exact: true, cache: true, ambiguous: "raise")
  super
end
#tail(n) ⇒ Series
Return the last n characters of each string in a String Series.
# File 'lib/polars/string_name_space.rb', line 1167
def tail(n)
  super
end
#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Series
Convert a Utf8 column into a Date column.
# File 'lib/polars/string_name_space.rb', line 41
def to_date(format = nil, strict: true, exact: true, cache: true)
  super
end
#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Series
Convert a Utf8 column into a Datetime column.
# File 'lib/polars/string_name_space.rb', line 86
def to_datetime(
  format = nil,
  time_unit: nil,
  time_zone: nil,
  strict: true,
  exact: true,
  cache: true,
  ambiguous: "raise"
)
  super
end
#to_decimal(inference_length = 100, scale: nil) ⇒ Series
Convert a String column into a Decimal column.
This method infers the needed parameters precision and scale.
# File 'lib/polars/string_name_space.rb', line 220
def to_decimal(inference_length = 100, scale: nil)
  if !scale.nil?
    raise Todo
  end

  Utils.wrap_s(_s.str_to_decimal_infer(inference_length))
end
#to_integer(base: 10, dtype: Int64, strict: true) ⇒ Series
Convert a String column into an integer column of the given dtype, parsing each value in the given base (radix).
# File 'lib/polars/string_name_space.rb', line 1211
def to_integer(
  base: 10,
  dtype: Int64,
  strict: true
)
  super
end
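Parsing in a given base can be sketched with Ruby's Kernel#Integer on a single string. to_integer_sketch is a hypothetical helper, and two things here are assumptions rather than documented behaviour: that strict: true raises on an unparsable value while strict: false yields null, and that a null input stays null. The dtype parameter is not modelled because Ruby integers have no fixed width, and Kernel#Integer accepts some inputs (such as underscores or a "0x" prefix) that the library may reject.

```ruby
# Parse one string as an integer in the given base.
# to_integer_sketch is a hypothetical stand-in, not the library method.
def to_integer_sketch(value, base: 10, strict: true)
  return nil if value.nil? # assumption: a null input stays null

  # exception: false makes Integer return nil instead of raising ArgumentError.
  Integer(value, base, exception: strict)
end

p to_integer_sketch("110", base: 2)                # => 6
p to_integer_sketch("ff", base: 16)                # => 255
p to_integer_sketch("zz", base: 16, strict: false) # => nil
```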
#to_lowercase ⇒ Series
Modify the strings to their lowercase equivalent.
# File 'lib/polars/string_name_space.rb', line 1017
def to_lowercase
  super
end
#to_time(format = nil, strict: true, cache: true) ⇒ Series
Convert a Utf8 column into a Time column.
# File 'lib/polars/string_name_space.rb', line 123
def to_time(format = nil, strict: true, cache: true)
  super
end
#to_uppercase ⇒ Series
Modify the strings to their uppercase equivalent.
# File 'lib/polars/string_name_space.rb', line 1035
def to_uppercase
  super
end
#zfill(length) ⇒ Series
Fills the string with zeroes.
Return a copy of the string left-filled with ASCII '0' digits to make a string of the given length.
A leading sign prefix ('+'/'-') is handled by inserting the padding after the
sign character rather than before. The original string is returned if length is
less than or equal to the string's own length.
# File 'lib/polars/string_name_space.rb', line 999
def zfill(length)
  super
end
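The padding rules stated above (zeros go after a leading sign, and strings that are already long enough are returned unchanged) can be written out for a single plain Ruby string. zfill_sketch is a hypothetical helper that follows the description literally; it is not the library's implementation, and keeping nil as nil is an assumption.

```ruby
# Left-fill one string with "0" up to the target length, keeping a leading sign first.
# zfill_sketch is a hypothetical stand-in, not the library method.
def zfill_sketch(value, length)
  return nil if value.nil?                 # assumption: a null input stays null
  return value if length <= value.length  # already long enough: unchanged

  # Split off a leading "+" or "-" so the zeros are inserted after it.
  sign = value.start_with?("+", "-") ? value[0] : ""
  rest = value[sign.length..]

  sign + rest.rjust(length - sign.length, "0")
end

p zfill_sketch("1", 3)      # => "001"
p zfill_sketch("-1", 3)     # => "-01"
p zfill_sketch("123456", 3) # => "123456"
```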