Class: Polars::StringNameSpace

Inherits:
Object
Defined in:
lib/polars/string_name_space.rb

Overview

Series.str namespace.

Instance Method Summary

Dynamic Method Handling

This class handles dynamic methods through the method_missing method in the class Polars::ExprDispatch.

Instance Method Details

#contains(pattern, literal: false, strict: true) ⇒ Series

Check if strings in Series contain a substring that matches a regex.

Examples:

s = Polars::Series.new(["Crab", "cat and dog", "rab$bit", nil])
s.str.contains("cat|bit")
# =>
# shape: (4,)
# Series: '' [bool]
# [
#         false
#         true
#         true
#         null
# ]
s.str.contains("rab$", literal: true)
# =>
# shape: (4,)
# Series: '' [bool]
# [
#         false
#         false
#         true
#         null
# ]

Parameters:

  • pattern (String)

    A valid regex pattern.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

  • strict (Boolean) (defaults to: true)

    If true, raise an error when the underlying pattern is not a valid regex; if false, mask out the result with a null value.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 304

def contains(pattern, literal: false, strict: true)
  super
end
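
The difference between the default regex mode and literal: true can be illustrated in plain Ruby, without Polars. This is only an analogy: Ruby's Regexp engine is not the regex crate that Polars uses, and contains_sketch is a hypothetical helper name.

```ruby
# Plain-Ruby analogy for regex vs. literal matching (not the Polars implementation).
# `contains_sketch` is a hypothetical helper, not part of the library.
def contains_sketch(value, pattern, literal: false)
  return nil if value.nil? # nulls propagate, as in the examples above

  literal ? value.include?(pattern) : value.match?(Regexp.new(pattern))
end

values = ["Crab", "cat and dog", "rab$bit", nil]
p values.map { |v| contains_sketch(v, "cat|bit") }             # => [false, true, true, nil]
p values.map { |v| contains_sketch(v, "rab$", literal: true) } # => [false, false, true, nil]
```

In regex mode the `$` in "rab$" is an end-of-line anchor, so it does not match "rab$bit"; in literal mode it is an ordinary character.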

#contains_any(patterns, ascii_case_insensitive: false) ⇒ Series

Note:

This method supports matching on string literals only, and does not support regular expression matching.

Use the Aho-Corasick algorithm to find matches.

Determines if any of the patterns are contained in the string.

Examples:

s = Polars::Series.new(
  "lyrics",
  [
    "Everybody wants to rule the world",
    "Tell me what you want, what you really really want",
    "Can you feel the love tonight"
  ]
)
s.str.contains_any(["you", "me"])
# =>
# shape: (3,)
# Series: 'lyrics' [bool]
# [
#         false
#         true
#         true
# ]

Parameters:

  • patterns (Object)

    String patterns to search.

  • ascii_case_insensitive (Boolean) (defaults to: false)

    Enable ASCII-aware case-insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1254

def contains_any(
  patterns,
  ascii_case_insensitive: false
)
  super
end

#count_matches(pattern, literal: false) ⇒ Series

Count all successive non-overlapping regex matches.

Examples:

s = Polars::Series.new("foo", ["123 bla 45 asd", "xyz 678 910t"])
s.str.count_matches('\d')
# =>
# shape: (2,)
# Series: 'foo' [u32]
# [
#         5
#         6
# ]

Parameters:

  • pattern (String)

    A valid regex pattern.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string, not as a regular expression.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 633

def count_matches(pattern, literal: false)
  super
end
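
"Successive non-overlapping" means that a match consumes its characters before the next search starts. A minimal plain-Ruby analogy uses String#scan, which behaves the same way; count_matches_sketch is a hypothetical helper name and Ruby's regex syntax is only an approximation of the one Polars uses.

```ruby
# Plain-Ruby analogy: String#scan returns successive non-overlapping matches.
# `count_matches_sketch` is a hypothetical helper, not part of the library.
def count_matches_sketch(value, pattern)
  return nil if value.nil?

  value.scan(Regexp.new(pattern)).size
end

p count_matches_sketch("123 bla 45 asd", '\d') # => 5
p count_matches_sketch("xyz 678 910t", '\d')   # => 6
p count_matches_sketch("aaaa", "aa")           # => 2 (an overlapping count would be 3)
```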

#decode(encoding, strict: true) ⇒ Series

Decode a value using the provided encoding.

Examples:

s = Polars::Series.new(["666f6f", "626172", nil])
s.str.decode("hex")
# =>
# shape: (3,)
# Series: '' [binary]
# [
#         b"foo"
#         b"bar"
#         null
# ]

Parameters:

  • encoding ("hex", "base64")

    The encoding to use.

  • strict (Boolean) (defaults to: true)

    How to handle invalid inputs:

    • true: An error will be thrown if unable to decode a value.
    • false: Unhandled values will be replaced with nil.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 434

def decode(encoding, strict: true)
  super
end

#encode(encoding) ⇒ Series

Encode a value using the provided encoding.

Examples:

s = Polars::Series.new(["foo", "bar", nil])
s.str.encode("hex")
# =>
# shape: (3,)
# Series: '' [str]
# [
#         "666f6f"
#         "626172"
#         null
# ]

Parameters:

  • encoding ("hex", "base64")

    The encoding to use.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 456

def encode(encoding)
  super
end
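
For readers who want to check the hex and base64 values by hand, the same round trip can be sketched with Ruby core methods only. The helper names encode_sketch and decode_sketch are hypothetical, and the sketch does not model the strict option.

```ruby
# Hex and base64 round trips with Array#pack / String#unpack1 (Ruby core).
# Hypothetical helpers for illustration; not the Polars implementation.
def encode_sketch(value, encoding)
  return nil if value.nil?

  case encoding
  when "hex"    then value.unpack1("H*")
  when "base64" then [value].pack("m0") # "m0" = base64 without line breaks
  else raise ArgumentError, "unsupported encoding: #{encoding}"
  end
end

def decode_sketch(value, encoding)
  return nil if value.nil?

  case encoding
  when "hex"    then [value].pack("H*") # returns a binary string
  when "base64" then value.unpack1("m0")
  else raise ArgumentError, "unsupported encoding: #{encoding}"
  end
end

p encode_sketch("foo", "hex")    # => "666f6f"
p decode_sketch("626172", "hex") # => "bar"
p encode_sketch("foo", "base64") # => "Zm9v"
```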

#ends_with(suffix) ⇒ Series

Check if string values end with a substring.

Examples:

s = Polars::Series.new("fruits", ["apple", "mango", nil])
s.str.ends_with("go")
# =>
# shape: (3,)
# Series: 'fruits' [bool]
# [
#         false
#         true
#         null
# ]

Parameters:

  • suffix (String)

    Suffix substring.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 385

def ends_with(suffix)
  super
end

#escape_regex ⇒ Series

Returns string values with all regular expression meta characters escaped.

Examples:

Polars::Series.new(["abc", "def", nil, "abc(\\w+)"]).str.escape_regex
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "abc"
#         "def"
#         null
#         "abc\(\\w\+\)"
# ]

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1522

def escape_regex
  super
end
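
Escaping is mostly useful when a user-supplied string must be matched literally inside a larger regex. Ruby's own Regexp.escape gives a close plain-Ruby analogy; note that it also escapes a few characters (such as spaces and hyphens) that other engines may leave alone, so results are not guaranteed to be identical to Polars. escape_regex_sketch is a hypothetical helper name.

```ruby
# Plain-Ruby analogy using Regexp.escape (Ruby core), not the Polars implementation.
def escape_regex_sketch(value)
  value.nil? ? nil : Regexp.escape(value)
end

raw = "abc(\\w+)"                   # the characters: abc(\w+)
puts escape_regex_sketch(raw)       # prints abc\(\\w\+\)

# The escaped text, used as a pattern, matches the raw text literally.
p Regexp.new(escape_regex_sketch(raw)).match?(raw) # => true
```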

#extract(pattern, group_index: 1) ⇒ Series

Extract the target capture group from provided patterns.

Examples:

df = Polars::DataFrame.new({"foo" => ["123 bla 45 asd", "xyz 678 910t"]})
df.select([Polars.col("foo").str.extract('(\d+)')])
# =>
# shape: (2, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ str │
# ╞═════╡
# │ 123 │
# │ 678 │
# └─────┘

Parameters:

  • pattern (String)

    A valid regex pattern.

  • group_index (Integer) (defaults to: 1)

    Index of the targeted capture group. Group 0 means the whole pattern; the first capture group begins at index 1. Defaults to the first capture group.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 553

def extract(pattern, group_index: 1)
  super
end
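
The group_index numbering follows the usual regex convention, which Ruby's MatchData also uses. The sketch below shows groups 0, 1 and 2 for a two-group pattern in plain Ruby; extract_sketch is a hypothetical helper name, and Ruby's regex engine stands in for the one used by Polars.

```ruby
# Plain-Ruby analogy: MatchData#[] uses the same group numbering.
# `extract_sketch` is a hypothetical helper, not part of the library.
def extract_sketch(value, pattern, group_index: 1)
  return nil if value.nil?

  match = Regexp.new(pattern).match(value)
  match && match[group_index] # nil when the pattern does not match
end

text = "123 bla 45 asd"
p extract_sketch(text, '(\d+) (\w+)')                 # => "123"
p extract_sketch(text, '(\d+) (\w+)', group_index: 0) # => "123 bla"
p extract_sketch(text, '(\d+) (\w+)', group_index: 2) # => "bla"
```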

#extract_all(pattern) ⇒ Series

Extracts all matches for the given regex pattern.

Extract each successive non-overlapping regex match in an individual string as an array.

Examples:

s = Polars::Series.new("foo", ["123 bla 45 asd", "xyz 678 910t"])
s.str.extract_all('(\d+)')
# =>
# shape: (2,)
# Series: 'foo' [list[str]]
# [
#         ["123", "45"]
#         ["678", "910"]
# ]

Parameters:

  • pattern (String)

    A valid regex pattern.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 577

def extract_all(pattern)
  super
end

#extract_groups(pattern) ⇒ Series

Note:

All group names are strings.

Extract all capture groups for the given regex pattern.

Examples:

s = Polars::Series.new(
  "url",
  [
    "http://vote.com/ballon_dor?candidate=messi&ref=python",
    "http://vote.com/ballon_dor?candidate=weghorst&ref=polars",
    "http://vote.com/ballon_dor?error=404&ref=rust"
  ]
)
s.str.extract_groups("candidate=(?<candidate>\\w+)&ref=(?<ref>\\w+)")
# =>
# shape: (3,)
# Series: 'url' [struct[2]]
# [
#         {"messi","python"}
#         {"weghorst","polars"}
#         {null,null}
# ]

Parameters:

  • pattern (String)

    A valid regular expression pattern containing at least one capture group, compatible with the regex crate.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 610

def extract_groups(pattern)
  super
end
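
The struct returned above has one field per capture group, and every field is null when the pattern does not match. A plain-Ruby sketch of that shape, limited to named groups, is given below; extract_groups_sketch is a hypothetical helper name and Ruby's regex engine is only a stand-in.

```ruby
# Plain-Ruby analogy for named capture groups (not the Polars implementation).
# Returns a Hash with one entry per named group; all values are nil on no match.
def extract_groups_sketch(value, pattern)
  regex = Regexp.new(pattern)
  match = value && regex.match(value)
  regex.names.to_h { |name| [name, match && match[name]] }
end

pattern = "candidate=(?<candidate>\\w+)&ref=(?<ref>\\w+)"
hit  = extract_groups_sketch("http://vote.com/ballon_dor?candidate=messi&ref=python", pattern)
miss = extract_groups_sketch("http://vote.com/ballon_dor?error=404&ref=rust", pattern)

p hit.keys    # => ["candidate", "ref"]
p hit.values  # => ["messi", "python"]
p miss.values # => [nil, nil]
```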

#extract_many(patterns, ascii_case_insensitive: false, overlapping: false, leftmost: false) ⇒ Series

Note:

This method supports matching on string literals only, and does not support regular expression matching.

Use the Aho-Corasick algorithm to extract many matches.

Examples:

s = Polars::Series.new("values", ["discontent"])
patterns = ["winter", "disco", "onte", "discontent"]
s.str.extract_many(patterns, overlapping: true)
# =>
# shape: (1,)
# Series: 'values' [list[str]]
# [
#         ["disco", "onte", "discontent"]
# ]

Parameters:

  • patterns (Object)

    String patterns to search.

  • ascii_case_insensitive (Boolean) (defaults to: false)

    Enable ASCII-aware case-insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.

  • overlapping (Boolean) (defaults to: false)

    Whether matches may overlap.

  • leftmost (Boolean) (defaults to: false)

    When matches overlap, guarantee that the leftmost match is used. If several patterns are candidates for the leftmost match, the one that comes first in patterns is used. May not be used together with overlapping: true.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1384

def extract_many(
  patterns,
  ascii_case_insensitive: false,
  overlapping: false,
  leftmost: false
)
  super
end

#find(pattern, literal: false, strict: true) ⇒ Series

Note:

To modify regular expression behaviour (such as case-sensitivity) with flags, use the inline (?iLmsuxU) syntax.

Return the byte offset of the first substring matching a pattern.

If the pattern is not found, returns nil.

Examples:

Find the index of the first substring matching a regex pattern:

s = Polars::Series.new("txt", ["Crab", "Lobster", nil, "Crustacean"])
s.str.find("a|e").rename("idx_rx")
# =>
# shape: (4,)
# Series: 'idx_rx' [u32]
# [
#         2
#         5
#         null
#         5
# ]

Find the index of the first substring matching a literal pattern:

s.str.find("e", literal: true).rename("idx_lit")
# =>
# shape: (4,)
# Series: 'idx_lit' [u32]
# [
#         null
#         5
#         null
#         7
# ]

Match against a pattern found in another column (or expression):

p = Polars::Series.new("pat", ["a[bc]", "b.t", "[aeiuo]", "(?i)A[BC]"])
s.str.find(p).rename("idx")
# =>
# shape: (4,)
# Series: 'idx' [u32]
# [
#         2
#         2
#         null
#         5
# ]

Parameters:

  • pattern

    A valid regular expression pattern, compatible with the regex crate.

  • literal (defaults to: false)

    Treat pattern as a literal string, not as a regular expression.

  • strict (defaults to: true)

    If true, raise an error when the underlying pattern is not a valid regex; if false, mask out the result with a null value.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 363

def find(pattern, literal: false, strict: true)
  super
end

#find_many(patterns, ascii_case_insensitive: false, overlapping: false, leftmost: false) ⇒ Series

Note:

This method supports matching on string literals only, and does not support regular expression matching.

Use the Aho-Corasick algorithm to find all matches.

The function returns the byte offset of the start of each match. The return type will be List<UInt32>.

Examples:

df = Polars::DataFrame.new({"values" => ["discontent"]})
patterns = ["winter", "disco", "onte", "discontent"]
df.with_columns(
  Polars.col("values")
  .str.extract_many(patterns, overlapping: false)
  .alias("matches"),
  Polars.col("values")
  .str.extract_many(patterns, overlapping: true)
  .alias("matches_overlapping")
)
# =>
# shape: (1, 3)
# ┌────────────┬───────────┬─────────────────────────────────┐
# │ values     ┆ matches   ┆ matches_overlapping             │
# │ ---        ┆ ---       ┆ ---                             │
# │ str        ┆ list[str] ┆ list[str]                       │
# ╞════════════╪═══════════╪═════════════════════════════════╡
# │ discontent ┆ ["disco"] ┆ ["disco", "onte", "discontent"… │
# └────────────┴───────────┴─────────────────────────────────┘
df = Polars::DataFrame.new(
  {
    "values" => ["discontent", "rhapsody"],
    "patterns" => [
      ["winter", "disco", "onte", "discontent"],
      ["rhap", "ody", "coalesce"]
    ]
  }
)
df.select(Polars.col("values").str.find_many("patterns"))
# =>
# shape: (2, 1)
# ┌───────────┐
# │ values    │
# │ ---       │
# │ list[u32] │
# ╞═══════════╡
# │ [0]       │
# │ [0, 5]    │
# └───────────┘

Parameters:

  • patterns (Object)

    String patterns to search.

  • ascii_case_insensitive (Boolean) (defaults to: false)

    Enable ASCII-aware case-insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.

  • overlapping (Boolean) (defaults to: false)

    Whether matches may overlap.

  • leftmost (Boolean) (defaults to: false)

    When matches overlap, guarantee that the leftmost match is used. If several patterns are candidates for the leftmost match, the one that comes first in patterns is used. May not be used together with overlapping: true.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1460

def find_many(
  patterns,
  ascii_case_insensitive: false,
  overlapping: false,
  leftmost: false
)
  super
end
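
To make the overlapping option concrete, the following plain-Ruby sketch reproduces the results shown above for "discontent" and "rhapsody" with a naive scan. It is not Aho-Corasick and not the Polars implementation: the helper names are hypothetical, offsets are character indices (equal to byte offsets only for ASCII input), and the ordering rule (by match end, then start) is an assumption chosen to agree with the documented outputs.

```ruby
# Naive literal multi-pattern search, for illustration only.
# Collects [start, stop, pattern] for every occurrence of every pattern.
def literal_hits(text, patterns)
  hits = []
  patterns.each do |pattern|
    from = 0
    while (index = text.index(pattern, from))
      hits << [index, index + pattern.length, pattern]
      from = index + 1
    end
  end
  hits.sort_by { |start, stop, _pattern| [stop, start] }
end

# Keeps every hit when overlapping is true; otherwise skips hits that start
# before the end of the previously accepted hit.
def select_hits(text, patterns, overlapping)
  hits = literal_hits(text, patterns)
  return hits if overlapping

  cursor = 0
  hits.select do |start, stop, _pattern|
    next false if start < cursor

    cursor = stop
    true
  end
end

def extract_many_sketch(text, patterns, overlapping: false)
  select_hits(text, patterns, overlapping).map { |_start, _stop, pattern| pattern }
end

def find_many_sketch(text, patterns, overlapping: false)
  select_hits(text, patterns, overlapping).map(&:first)
end

patterns = ["winter", "disco", "onte", "discontent"]
p extract_many_sketch("discontent", patterns)                    # => ["disco"]
p extract_many_sketch("discontent", patterns, overlapping: true) # => ["disco", "onte", "discontent"]
p find_many_sketch("rhapsody", ["rhap", "ody", "coalesce"])      # => [0, 5]
```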

#head(n) ⇒ Series

Return the first n characters of each string in a String Series.

Examples:

Return up to the first 5 characters.

s = Polars::Series.new(["pear", nil, "papaya", "dragonfruit"])
s.str.head(5)
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "pear"
#         null
#         "papay"
#         "drago"
# ]

Return up to the 3rd character from the end.

s = Polars::Series.new(["pear", nil, "papaya", "dragonfruit"])
s.str.head(-3)
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "p"
#         null
#         "pap"
#         "dragonfr"
# ]

Parameters:

  • n (Object)

    Length of the slice (integer or expression). Negative indexing is supported: with a negative n, everything except the last |n| characters is returned.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1130

def head(n)
  super
end

#join(delimiter = nil, ignore_nulls: true) ⇒ Series

Vertically concatenate the values in the Series to a single string value.

Examples:

Polars::Series.new([1, nil, 2]).str.join("-")
# =>
# shape: (1,)
# Series: '' [str]
# [
#         "1-2"
# ]
Polars::Series.new([1, nil, 2]).str.join("-", ignore_nulls: false)
# =>
# shape: (1,)
# Series: '' [str]
# [
#         null
# ]

Parameters:

  • delimiter (String) (defaults to: nil)

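    The delimiter to insert between consecutive string values. If nil (the current default), "-" is used and a warning announces that the default will change to an empty string in a future version.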

  • ignore_nulls (Boolean) (defaults to: true)

    Ignore null values (default). If set to false, null values will be propagated. This means that if the column contains any null values, the output is null.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1497

def join(delimiter = nil, ignore_nulls: true)
  # TODO update
  if delimiter.nil?
    warn "The default `delimiter` for `join` method will change from `-` to empty string in a future version"
    delimiter = "-"
  end

  super
end

#json_decode(dtype = nil, infer_schema_length: N_INFER_DEFAULT) ⇒ Series

Parse string values as JSON.

Throws an error if invalid JSON strings are encountered.

Examples:

s = Polars::Series.new("json", ['{"a":1, "b": true}', nil, '{"a":2, "b": false}'])
s.str.json_decode
# =>
# shape: (3,)
# Series: 'json' [struct[2]]
# [
#         {1,true}
#         null
#         {2,false}
# ]

Parameters:

  • dtype (Object) (defaults to: nil)

    The dtype to cast the extracted value to. If nil, the dtype will be inferred from the JSON value.

  • infer_schema_length (Integer) (defaults to: N_INFER_DEFAULT)

    The maximum number of rows to scan for schema inference. If set to nil, the full data may be scanned (this is slow).

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 484

def json_decode(dtype = nil, infer_schema_length: N_INFER_DEFAULT)
  if !dtype.nil?
    s = Utils.wrap_s(_s)
    return (
      s.to_frame
      .select_seq(F.col(s.name).str.json_decode(dtype))
      .to_series
    )
  end

  Utils.wrap_s(_s.str_json_decode(infer_schema_length))
end

#json_path_match(json_path) ⇒ Series

Extract the first match of a JSON string with the provided JSONPath expression.

Throws an error if invalid JSON strings are encountered. All return values are cast to Utf8 regardless of the original value.

Documentation on JSONPath standard can be found here.

Examples:

df = Polars::DataFrame.new(
  {"json_val" => ['{"a":"1"}', nil, '{"a":2}', '{"a":2.1}', '{"a":true}']}
)
df.select(Polars.col("json_val").str.json_path_match("$.a"))[0.., 0]
# =>
# shape: (5,)
# Series: 'json_val' [str]
# [
#         "1"
#         null
#         "2"
#         "2.1"
#         "true"
# ]

Parameters:

  • json_path (String)

    A valid JSON path query string.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 525

def json_path_match(json_path)
  super
end

#len_bytes ⇒ Series

Return the length of each string as the number of bytes.

Examples:

s = Polars::Series.new(["Café", "345", "東京", nil])
s.str.len_bytes
# =>
# shape: (4,)
# Series: '' [u32]
# [
#         5
#         3
#         6
#         null
# ]

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 244

def len_bytes
  super
end

#len_chars ⇒ Series

Return the length of each string as the number of characters.

Examples:

s = Polars::Series.new(["Café", "345", "東京", nil])
s.str.len_chars
# =>
# shape: (4,)
# Series: '' [u32]
# [
#         4
#         3
#         2
#         null
# ]

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 264

def len_chars
  super
end
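
The byte and character counts differ only for non-ASCII text. Ruby's own String#bytesize and String#length show the same distinction for UTF-8 strings; the helper names below are hypothetical and the sketch is an analogy, not the Polars implementation.

```ruby
# Plain-Ruby analogy: bytesize counts UTF-8 bytes, length counts characters.
def len_bytes_sketch(value)
  value&.bytesize
end

def len_chars_sketch(value)
  value&.length
end

# Escapes are used so the byte counts do not depend on how this file is saved:
# "Caf\u00e9" is "Café" and "\u6771\u4eac" is "東京".
words = ["Caf\u00e9", "345", "\u6771\u4eac", nil]
p words.map { |w| len_bytes_sketch(w) } # => [5, 3, 6, nil]
p words.map { |w| len_chars_sketch(w) } # => [4, 3, 2, nil]
```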

#normalize(form = "NFC") ⇒ Series

Returns the Unicode normal form of the string values.

This uses the forms described in Unicode Standard Annex 15: https://www.unicode.org/reports/tr15/.

Examples:

s = Polars::Series.new(["01²", "KADOKAWA"])
s.str.normalize("NFC")
# =>
# shape: (2,)
# Series: '' [str]
# [
#         "01²"
#         "KADOKAWA"
# ]
s.str.normalize("NFKC")
# =>
# shape: (2,)
# Series: '' [str]
# [
#         "012"
#         "KADOKAWA"
# ]

Parameters:

  • form ('NFC', 'NFKC', 'NFD', 'NFKD') (defaults to: "NFC")

    Unicode form to use.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1555

def normalize(form = "NFC")
  super
end
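
The four forms differ in whether characters are composed (NFC, NFKC) or decomposed (NFD, NFKD) and whether compatibility characters such as "²" are folded (the K forms). Ruby's String#unicode_normalize implements the same Unicode forms, so it can serve as a plain-Ruby cross-check; normalize_sketch is a hypothetical helper name.

```ruby
# Plain-Ruby analogy using String#unicode_normalize (Ruby standard library).
# `normalize_sketch` is a hypothetical helper, not part of the library.
def normalize_sketch(value, form = "NFC")
  value&.unicode_normalize(form.downcase.to_sym)
end

p normalize_sketch("01\u00b2", "NFC")  # => "01²"  (superscript two is kept)
p normalize_sketch("01\u00b2", "NFKC") # => "012"  (compatibility folding)

# NFC composes "e" + combining acute accent into one code point; NFD splits it again.
p normalize_sketch("e\u0301", "NFC").length      # => 1
p normalize_sketch("\u00e9", "NFD").length       # => 2
```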

#pad_end(length, fill_char = " ") ⇒ Series

Pad the end of the string until it reaches the given length.

Examples:

s = Polars::Series.new(["cow", "monkey", "hippopotamus", nil])
s.str.pad_end(8, "*")
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "cow*****"
#         "monkey**"
#         "hippopotamus"
#         null
# ]

Parameters:

  • length (Integer)

    Pad the string until it reaches this length. Strings with length equal to or greater than this value are returned as-is.

  • fill_char (String) (defaults to: " ")

    The character to pad the string with.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 969

def pad_end(length, fill_char = " ")
  super
end

#pad_start(length, fill_char = " ") ⇒ Series

Pad the start of the string until it reaches the given length.

Examples:

s = Polars::Series.new("a", ["cow", "monkey", "hippopotamus", nil])
s.str.pad_start(8, "*")
# =>
# shape: (4,)
# Series: 'a' [str]
# [
#         "*****cow"
#         "**monkey"
#         "hippopotamus"
#         null
# ]

Parameters:

  • length (Integer)

    Pad the string until it reaches this length. Strings with length equal to or greater than this value are returned as-is.

  • fill_char (String) (defaults to: " ")

    The character to pad the string with.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 943

def pad_start(length, fill_char = " ")
  super
end
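
Both padding methods leave strings that are already long enough untouched. Ruby's String#rjust and String#ljust behave the same way and give a quick plain-Ruby cross-check of the outputs above; the helper names are hypothetical.

```ruby
# Plain-Ruby analogy: rjust pads the start, ljust pads the end.
def pad_start_sketch(value, length, fill_char = " ")
  value&.rjust(length, fill_char)
end

def pad_end_sketch(value, length, fill_char = " ")
  value&.ljust(length, fill_char)
end

animals = ["cow", "monkey", "hippopotamus", nil]
p animals.map { |a| pad_start_sketch(a, 8, "*") } # => ["*****cow", "**monkey", "hippopotamus", nil]
p animals.map { |a| pad_end_sketch(a, 8, "*") }   # => ["cow*****", "monkey**", "hippopotamus", nil]
```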

#replace(pattern, value, literal: false, n: 1) ⇒ Series

Replace first matching regex/literal substring with a new string value.

Examples:

s = Polars::Series.new(["123abc", "abc456"])
s.str.replace('abc\b', "ABC")
# =>
# shape: (2,)
# Series: '' [str]
# [
#         "123ABC"
#         "abc456"
# ]

Parameters:

  • pattern (String)

    A valid regex pattern.

  • value (String)

    Replacement string.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

  • n (Integer) (defaults to: 1)

    Number of matches to replace.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 773

def replace(pattern, value, literal: false, n: 1)
  super
end

#replace_all(pattern, value, literal: false) ⇒ Series

Replace all matching regex/literal substrings with a new string value.

Examples:

s = Polars::Series.new(["abcabc", "123a123"])
s.str.replace_all("a", "-")
# =>
# shape: (2,)
# Series: '' [str]
# [
#         "-bc-bc"
#         "123-123"
# ]

Parameters:

  • pattern (String)

    A valid regex pattern.

  • value (String)

    Replacement string.

  • literal (Boolean) (defaults to: false)

    Treat pattern as a literal string.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 798

def replace_all(pattern, value, literal: false)
  super
end
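
The relationship between replace (first n matches) and replace_all (every match) can be sketched in plain Ruby. replace_sketch is a hypothetical helper; it inserts the replacement verbatim and uses Ruby's regex engine, so it is an analogy rather than the Polars implementation.

```ruby
# Plain-Ruby analogy: replace only the first `n` matches, left to right.
# `replace_sketch` is a hypothetical helper, not part of the library.
def replace_sketch(value, pattern, replacement, literal: false, n: 1)
  return nil if value.nil?

  regex = Regexp.new(literal ? Regexp.escape(pattern) : pattern)
  remaining = n
  value.gsub(regex) do |match|
    next match if remaining.zero? # budget used up: keep the original text

    remaining -= 1
    replacement
  end
end

p replace_sketch("123abc", 'abc\b', "ABC")   # => "123ABC"
p replace_sketch("abc456", 'abc\b', "ABC")   # => "abc456" (no word boundary after "abc")
p replace_sketch("abcabc", "a", "-")         # => "-bcabc"
p replace_sketch("abcabc", "a", "-", n: 2)   # => "-bc-bc"
```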

#replace_many(patterns, replace_with = NO_DEFAULT, ascii_case_insensitive: false, leftmost: false) ⇒ Series

Note:

This method supports matching on string literals only, and does not support regular expression matching.

Use the Aho-Corasick algorithm to replace many matches.

Examples:

Replace many patterns by passing lists of equal length to the patterns and replace_with parameters.

s = Polars::Series.new(
  "lyrics",
  [
    "Everybody wants to rule the world",
    "Tell me what you want, what you really really want",
    "Can you feel the love tonight"
  ]
)
s.str.replace_many(["you", "me"], ["me", "you"])
# =>
# shape: (3,)
# Series: 'lyrics' [str]
# [
#         "Everybody wants to rule the wo…
#         "Tell you what me want, what me…
#         "Can me feel the love tonight"
# ]

Broadcast a replacement for many patterns by passing an array of length 1 to the replace_with parameter.

s = Polars::Series.new(
  "lyrics",
  [
    "Everybody wants to rule the world",
    "Tell me what you want, what you really really want",
    "Can you feel the love tonight",
  ]
)
s.str.replace_many(["me", "you", "they"], [""])
# =>
# shape: (3,)
# Series: 'lyrics' [str]
# [
#         "Everybody wants to rule the wo…
#         "Tell  what  want, what  really…
#         "Can  feel the love tonight"
# ]

Passing a mapping with patterns and replacements is also supported as syntactic sugar.

s = Polars::Series.new(
  "lyrics",
  [
    "Everybody wants to rule the world",
    "Tell me what you want, what you really really want",
    "Can you feel the love tonight"
  ]
)
mapping = {"me" => "you", "you" => "me", "want" => "need"}
s.str.replace_many(mapping)
# =>
# shape: (3,)
# Series: 'lyrics' [str]
# [
#         "Everybody needs to rule the wo…
#         "Tell you what me need, what me…
#         "Can me feel the love tonight"
# ]

Parameters:

  • patterns (Object)

    String patterns to search and replace. Also accepts a mapping of patterns to their replacement as syntactic sugar for replace_many(Polars::Series.new(mapping.keys), Polars::Series.new(mapping.values)).

  • replace_with (Object) (defaults to: NO_DEFAULT)

    Strings to replace where a pattern was a match. Length must match the length of patterns or have length 1. This can be broadcast, so it supports many:one and many:many.

  • ascii_case_insensitive (Boolean) (defaults to: false)

    Enable ASCII-aware case-insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.

  • leftmost (Boolean) (defaults to: false)

    When matches overlap, guarantee that the leftmost match is used. If several patterns are candidates for the leftmost match, the one that comes first in patterns is used.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1343

def replace_many(
  patterns,
  replace_with = NO_DEFAULT,
  ascii_case_insensitive: false,
  leftmost: false
)
  super
end
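
The mapping example above swaps "me" and "you" without one replacement undoing the other, which is the hallmark of a single-pass multi-pattern replace. A plain-Ruby sketch of that behaviour uses gsub with a union pattern and a Hash; replace_many_sketch is a hypothetical helper, and how it resolves overlapping patterns (first listed alternative wins) is an assumption that may differ from Polars.

```ruby
# Plain-Ruby analogy: one pass over the string, each literal match is looked up
# in the mapping. `replace_many_sketch` is a hypothetical helper.
def replace_many_sketch(value, mapping)
  return nil if value.nil?

  value.gsub(Regexp.union(mapping.keys), mapping)
end

mapping = { "me" => "you", "you" => "me", "want" => "need" }
p replace_many_sketch("Tell me what you want, what you really really want", mapping)
# => "Tell you what me need, what me really really need"

# Chaining single replacements instead lets the second step undo the first:
p "Tell me what you".gsub("me", "you").gsub("you", "me") # => "Tell me what me"
```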

#reverse ⇒ Series

Returns string values in reversed order.

Examples:

s = Polars::Series.new("text", ["foo", "bar", "man\u0303ana"])
s.str.reverse
# =>
# shape: (3,)
# Series: 'text' [str]
# [
#         "oof"
#         "rab"
#         "anañam"
# ]

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1054

def reverse
  super
end
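
In the example above the tilde stays on the "n" after reversal, which suggests that the reversal works on grapheme clusters rather than individual code points; this reading is inferred from the example output, not from the implementation. The plain-Ruby sketch below contrasts the two approaches (reverse_sketch is a hypothetical helper name).

```ruby
# Plain-Ruby analogy: reverse by grapheme cluster so combining marks stay
# attached to their base character. Not the Polars implementation.
def reverse_sketch(value)
  return nil if value.nil?

  value.each_grapheme_cluster.to_a.reverse.join
end

word = "man\u0303ana" # "n" followed by a combining tilde

# Grapheme-aware: the tilde is still on the same "n".
p reverse_sketch(word) == "anan\u0303am" # => true

# Code-point reversal (String#reverse) moves the tilde onto the preceding "a".
p word.reverse == "ana\u0303nam"         # => true
```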

#slice(offset, length = nil) ⇒ Series

Create subslices of the string values of a Utf8 Series.

Examples:

s = Polars::Series.new("s", ["pear", nil, "papaya", "dragonfruit"])
s.str.slice(-3)
# =>
# shape: (4,)
# Series: 's' [str]
# [
#         "ear"
#         null
#         "aya"
#         "uit"
# ]

Using the optional length parameter:

s.str.slice(4, 3)
# =>
# shape: (4,)
# Series: 's' [str]
# [
#         ""
#         null
#         "ya"
#         "onf"
# ]

Parameters:

  • offset (Integer)

    Start index. Negative indexing is supported.

  • length (Integer) (defaults to: nil)

    Length of the slice. If set to nil (default), the slice is taken to the end of the string.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1092

def slice(offset, length = nil)
  s = Utils.wrap_s(_s)
  s.to_frame.select(Polars.col(s.name).str.slice(offset, length)).to_series
end
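
The offset and length rules can be checked against the two examples above with a short plain-Ruby sketch. slice_sketch is a hypothetical helper; clamping an out-of-range negative offset to the start of the string is an assumption made here, not something the text above states.

```ruby
# Plain-Ruby analogy for character-based slicing (not the Polars implementation).
def slice_sketch(value, offset, length = nil)
  return nil if value.nil?

  chars = value.chars
  # Negative offsets count from the end; out-of-range offsets are clamped (assumption).
  start = offset.negative? ? [chars.length + offset, 0].max : [offset, chars.length].min
  stop  = length.nil? ? chars.length : [start + length, chars.length].min
  chars[start...stop].join
end

fruits = ["pear", nil, "papaya", "dragonfruit"]
p fruits.map { |f| slice_sketch(f, -3) }   # => ["ear", nil, "aya", "uit"]
p fruits.map { |f| slice_sketch(f, 4, 3) } # => ["", nil, "ya", "onf"]
```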

#split(by, inclusive: false) ⇒ Series

Split the string by a substring.

Parameters:

  • by (String)

    Substring to split by.

  • inclusive (Boolean) (defaults to: false)

    If true, include the split character/string in the results.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 645

def split(by, inclusive: false)
  super
end
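
No example is given for split, so the following plain-Ruby sketch illustrates what the inclusive flag is described as doing. It assumes that, with inclusive: true, the separator stays attached to the end of the piece that precedes it; that placement is an assumption, and split_sketch is a hypothetical helper rather than the Polars implementation.

```ruby
# Plain-Ruby analogy for splitting by a literal substring.
# `split_sketch` is a hypothetical helper; separator placement for
# `inclusive: true` is an assumption.
def split_sketch(value, by, inclusive: false)
  return nil if value.nil?

  escaped = Regexp.escape(by)
  if inclusive
    value.split(/(?<=#{escaped})/)       # split after each separator, keeping it
  else
    value.split(Regexp.new(escaped), -1) # -1 keeps trailing empty pieces
  end
end

p split_sketch("a_b_c", "_")                  # => ["a", "b", "c"]
p split_sketch("a_b_c", "_", inclusive: true) # => ["a_", "b_", "c"]
p split_sketch("no separator", "_")           # => ["no separator"]
```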

#split_exact(by, n, inclusive: false) ⇒ Series

Split the string by a substring using n splits.

Results in a struct of n+1 fields.

If it cannot make n splits, the remaining field elements will be null.

Examples:

df = Polars::DataFrame.new({"x" => ["a_1", nil, "c", "d_4"]})
df["x"].str.split_exact("_", 1).alias("fields")
# =>
# shape: (4,)
# Series: 'fields' [struct[2]]
# [
#         {"a","1"}
#         {null,null}
#         {"c",null}
#         {"d","4"}
# ]

Split string values in column x in exactly 2 parts and assign each part to a new column.

df["x"]
  .str.split_exact("_", 1)
  .struct.rename_fields(["first_part", "second_part"])
  .alias("fields")
  .to_frame
  .unnest("fields")
# =>
# shape: (4, 2)
# ┌────────────┬─────────────┐
# │ first_part ┆ second_part │
# │ ---        ┆ ---         │
# │ str        ┆ str         │
# ╞════════════╪═════════════╡
# │ a          ┆ 1           │
# │ null       ┆ null        │
# │ c          ┆ null        │
# │ d          ┆ 4           │
# └────────────┴─────────────┘

Parameters:

  • by (String)

    Substring to split by.

  • n (Integer)

    Number of splits to make.

  • inclusive (Boolean) (defaults to: false)

    If true, include the split character/string in the results.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 696

def split_exact(by, n, inclusive: false)
  super
end

#splitn(by, n) ⇒ Series

Split the string by a substring, restricted to returning at most n items.

If the number of possible splits is less than n-1, the remaining field elements will be null. If the number of possible splits is n-1 or greater, the last (nth) substring will contain the remainder of the string.

Examples:

df = Polars::DataFrame.new({"s" => ["foo bar", nil, "foo-bar", "foo bar baz"]})
df["s"].str.splitn(" ", 2).alias("fields")
# =>
# shape: (4,)
# Series: 'fields' [struct[2]]
# [
#         {"foo","bar"}
#         {null,null}
#         {"foo-bar",null}
#         {"foo","bar baz"}
# ]

Split string values in column s in exactly 2 parts and assign each part to a new column.

df["s"]
  .str.splitn(" ", 2)
  .struct.rename_fields(["first_part", "second_part"])
  .alias("fields")
  .to_frame
  .unnest("fields")
# =>
# shape: (4, 2)
# ┌────────────┬─────────────┐
# │ first_part ┆ second_part │
# │ ---        ┆ ---         │
# │ str        ┆ str         │
# ╞════════════╪═════════════╡
# │ foo        ┆ bar         │
# │ null       ┆ null        │
# │ foo-bar    ┆ null        │
# │ foo        ┆ bar baz     │
# └────────────┴─────────────┘

Parameters:

  • by (String)

    Substring to split by.

  • n (Integer)

    Max number of items to return.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 745

def splitn(by, n)
  s = Utils.wrap_s(_s)
  s.to_frame.select(Polars.col(s.name).str.splitn(by, n)).to_series
end
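
The rule "at most n items, the last one holds the remainder, missing items are null" matches what Ruby's String#split does with a limit, plus padding. The sketch below reproduces the four rows of the example in plain Ruby; splitn_sketch is a hypothetical helper, not the Polars implementation.

```ruby
# Plain-Ruby analogy: split into at most n pieces and pad with nil.
# `splitn_sketch` is a hypothetical helper, not part of the library.
def splitn_sketch(value, by, n)
  return Array.new(n) if value.nil? # a null input yields n null fields

  parts = value.split(Regexp.new(Regexp.escape(by)), n)
  parts + Array.new(n - parts.length) # pad missing fields with nil
end

p splitn_sketch("foo bar", " ", 2)     # => ["foo", "bar"]
p splitn_sketch(nil, " ", 2)           # => [nil, nil]
p splitn_sketch("foo-bar", " ", 2)     # => ["foo-bar", nil]
p splitn_sketch("foo bar baz", " ", 2) # => ["foo", "bar baz"]
```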

#starts_with(prefix) ⇒ Series

Check if string values start with a substring.

Examples:

s = Polars::Series.new("fruits", ["apple", "mango", nil])
s.str.starts_with("app")
# =>
# shape: (3,)
# Series: 'fruits' [bool]
# [
#         true
#         false
#         null
# ]

Parameters:

  • prefix (String)

    Prefix substring.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 407

def starts_with(prefix)
  super
end

#strip_chars(characters = nil) ⇒ Series

Remove leading and trailing whitespace.

Examples:

s = Polars::Series.new([" hello ", "\tworld"])
s.str.strip_chars
# =>
# shape: (2,)
# Series: '' [str]
# [
#         "hello"
#         "world"
# ]

Parameters:

  • characters (String) (defaults to: nil)

    The set of characters to be removed. All combinations of this set of characters will be stripped from the start and end of the string. If set to nil (default), all leading and trailing whitespace is removed instead.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 821

def strip_chars(characters = nil)
  super
end

#strip_chars_end(characters = nil) ⇒ Series

Remove trailing whitespace.

Examples:

s = Polars::Series.new([" hello ", "world\t"])
s.str.strip_chars_end
# =>
# shape: (2,)
# Series: '' [str]
# [
#         " hello"
#         "world"
# ]

Parameters:

  • characters (String) (defaults to: nil)

    The set of characters to be removed. All combinations of this set of characters will be stripped from the end of the string. If set to nil (default), all trailing whitespace is removed instead.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 867

def strip_chars_end(characters = nil)
  super
end

#strip_chars_start(characters = nil) ⇒ Series

Remove leading whitespace.

Examples:

s = Polars::Series.new([" hello ", "\tworld"])
s.str.strip_chars_start
# =>
# shape: (2,)
# Series: '' [str]
# [
#         "hello "
#         "world"
# ]

Parameters:

  • characters (String) (defaults to: nil)

    The set of characters to be removed. All combinations of this set of characters will be stripped from the start of the string. If set to nil (default), all leading whitespace is removed instead.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 844

def strip_chars_start(characters = nil)
  super
end

#strip_prefix(prefix) ⇒ Series

Remove prefix.

The prefix will be removed from the string exactly once, if found.

Examples:

s = Polars::Series.new(["foobar", "foofoobar", "foo", "bar"])
s.str.strip_prefix("foo")
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "bar"
#         "foobar"
#         ""
#         "bar"
# ]

Parameters:

  • prefix (String)

    The prefix to be removed.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 892

def strip_prefix(prefix)
  super
end

#strip_suffix(suffix) ⇒ Series

Remove suffix.

The suffix will be removed from the string exactly once, if found.

Examples:

s = Polars::Series.new(["foobar", "foobarbar", "foo", "bar"])
s.str.strip_suffix("bar")
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "foo"
#         "foobar"
#         "foo"
#         ""
# ]

Parameters:

  • suffix (String)

    The suffix to be removed.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 917

def strip_suffix(suffix)
  super
end
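
The strip_chars family treats its argument as a set of characters, while strip_prefix and strip_suffix treat it as one exact substring removed at most once. A plain-Ruby sketch of the contrast follows; strip_chars_sketch is a hypothetical helper, and Ruby's delete_prefix and delete_suffix stand in for the prefix and suffix methods.

```ruby
# Plain-Ruby analogy: strip a *set* of characters from both ends.
# `strip_chars_sketch` is a hypothetical helper, not part of the library.
def strip_chars_sketch(value, characters = nil)
  return nil if value.nil?
  return value.strip if characters.nil?
  return value if characters.empty?

  set = "[#{Regexp.escape(characters)}]"
  value.sub(/\A#{set}+/, "").sub(/#{set}+\z/, "")
end

p strip_chars_sketch(" hello ")         # => "hello"
p strip_chars_sketch("foofoobar", "fo") # => "bar"    (every leading f/o is removed)
p "foofoobar".delete_prefix("foo")      # => "foobar" (exact prefix, removed once)
p "foobarbar".delete_suffix("bar")      # => "foobar" (exact suffix, removed once)
```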

#strptime(dtype, format = nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Series

Parse a Series of dtype Utf8 to a Date/Datetime Series.

Examples:

Dealing with a consistent format:

s = Polars::Series.new(["2020-01-01 01:00Z", "2020-01-01 02:00Z"])
s.str.strptime(Polars::Datetime, "%Y-%m-%d %H:%M%#z")
# =>
# shape: (2,)
# Series: '' [datetime[μs, UTC]]
# [
#         2020-01-01 01:00:00 UTC
#         2020-01-01 02:00:00 UTC
# ]

Dealing with different formats.

s = Polars::Series.new(
  "date",
  [
    "2021-04-22",
    "2022-01-04 00:00:00",
    "01/31/22",
    "Sun Jul  8 00:34:60 2001"
  ]
)
s.to_frame.select(
  Polars.coalesce(
    Polars.col("date").str.strptime(Polars::Date, "%F", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%F %T", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%D", strict: false),
    Polars.col("date").str.strptime(Polars::Date, "%c", strict: false)
  )
).to_series
# =>
# shape: (4,)
# Series: 'date' [date]
# [
#         2021-04-22
#         2022-01-04
#         2022-01-31
#         2001-07-08
# ]
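
The coalesce pattern above amounts to "try each format in turn and keep the first result that is not null". A minimal plain-Ruby sketch of that idea for a single string follows. It uses Ruby's standard Date._strptime rather than Polars, covers only the first three formats from the example, and the names FORMATS and parse_first_match are hypothetical.

```ruby
require "date"

# Formats are tried in this order; the first full match wins.
FORMATS = ["%F", "%F %T", "%D"].freeze

# Hypothetical helper: return the first Date that parses the whole string,
# or nil when no format matches (comparable to a null after strict: false).
def parse_first_match(text, formats = FORMATS)
  formats.each do |fmt|
    parts = Date._strptime(text, fmt)
    # Skip on failure, or when part of the string was left unparsed.
    next if parts.nil? || parts.key?(:leftover)
    # Skip calendar-invalid dates such as February 31.
    next unless Date.valid_date?(parts[:year], parts[:mon], parts[:mday])

    return Date.new(parts[:year], parts[:mon], parts[:mday])
  end
  nil
end

["2021-04-22", "2022-01-04 00:00:00", "01/31/22", "not a date"].each do |text|
  p parse_first_match(text)
end
```

Ordering matters: a format listed earlier takes precedence whenever more than one format could match the same string.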

Parameters:

  • dtype (Symbol)

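    The data type to convert to: :date, :datetime, or :time. The examples above pass the classes Polars::Date and Polars::Datetime instead.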

  • format (String) (defaults to: nil)

    Format to use; refer to the chrono strftime documentation for the specification. Example: "%y-%m-%d".

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)

    • If true, require an exact format match.
    • If false, allow the format to match anywhere in the target string.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted dates to apply the datetime conversion.

  • ambiguous ('raise', 'earliest', 'latest', 'null') (defaults to: "raise")

    Determine how to deal with ambiguous datetimes:

    • 'raise' (default): raise an error
    • 'earliest': use the earliest datetime
    • 'latest': use the latest datetime
    • 'null': set to null

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 190

def strptime(dtype, format = nil, strict: true, exact: true, cache: true, ambiguous: "raise")
  super
end

#tail(n) ⇒ Series

Return the last n characters of each string in a String Series.

Examples:

Return up to the last 5 characters:

s = Polars::Series.new(["pear", nil, "papaya", "dragonfruit"])
s.str.tail(5)
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "pear"
#         null
#         "apaya"
#         "fruit"
# ]

Drop the first 3 characters and return the rest (negative n):

s = Polars::Series.new(["pear", nil, "papaya", "dragonfruit"])
s.str.tail(-3)
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "r"
#         null
#         "aya"
#         "gonfruit"
# ]
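
The two examples above suggest the following per-string rule: a non-negative n keeps at most the last n characters, while a negative n drops the first abs(n) characters. A plain-Ruby sketch of that rule (the helper name tail_chars is hypothetical, and this is not the library's implementation):

```ruby
# Hypothetical per-string analogue of the behaviour shown in the examples.
def tail_chars(value, n)
  return nil if value.nil? # null entries stay null, as in the examples

  chars = value.chars
  if n >= 0
    chars.last(n).join   # keep at most the last n characters
  else
    chars.drop(-n).join  # drop the first abs(n) characters
  end
end

values = ["pear", nil, "papaya", "dragonfruit"]
p values.map { |v| tail_chars(v, 5) }
p values.map { |v| tail_chars(v, -3) }
```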

Parameters:

  • n (Object)

    Length of the slice (integer or expression). Negative indexing is supported: a negative n returns everything except the first abs(n) characters, as in the second example above.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1167

def tail(n)
  super
end

#to_date(format = nil, strict: true, exact: true, cache: true) ⇒ Series

Convert a Utf8 column into a Date column.

Examples:

s = Polars::Series.new(["2020/01/01", "2020/02/01", "2020/03/01"])
s.str.to_date
# =>
# shape: (3,)
# Series: '' [date]
# [
#         2020-01-01
#         2020-02-01
#         2020-03-01
# ]

Parameters:

  • format (String) (defaults to: nil)

    Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d". If set to nil (default), the format is inferred from the data.

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)

    Require an exact format match. If false, allow the format to match anywhere in the target string.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted dates to apply the conversion.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 41

def to_date(format = nil, strict: true, exact: true, cache: true)
  super
end

#to_datetime(format = nil, time_unit: nil, time_zone: nil, strict: true, exact: true, cache: true, ambiguous: "raise") ⇒ Series

Convert a Utf8 column into a Datetime column.

Examples:

s = Polars::Series.new(["2020-01-01 01:00Z", "2020-01-01 02:00Z"])
s.str.to_datetime("%Y-%m-%d %H:%M%#z")
# =>
# shape: (2,)
# Series: '' [datetime[μs, UTC]]
# [
#         2020-01-01 01:00:00 UTC
#         2020-01-01 02:00:00 UTC
# ]

Parameters:

  • format (String) (defaults to: nil)

    Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d %H:%M:%S". If set to nil (default), the format is inferred from the data.

  • time_unit ("us", "ns", "ms") (defaults to: nil)

    Unit of time for the resulting Datetime column. If set to nil (default), the time unit is inferred from the format string if given, e.g. "%F %T%.3f" => Datetime("ms"). If no fractional second component is found, the default is "us".

  • time_zone (String) (defaults to: nil)

    Time zone for the resulting Datetime column.

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • exact (Boolean) (defaults to: true)

    Require an exact format match. If false, allow the format to match anywhere in the target string.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted datetimes to apply the conversion.

  • ambiguous ('raise', 'earliest', 'latest', 'null') (defaults to: "raise")

    Determine how to deal with ambiguous datetimes:

    • 'raise' (default): raise an error
    • 'earliest': use the earliest datetime
    • 'latest': use the latest datetime
    • 'null': set to null

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 86

def to_datetime(
  format = nil,
  time_unit: nil,
  time_zone: nil,
  strict: true,
  exact: true,
  cache: true,
  ambiguous: "raise"
)
  super
end

#to_decimal(inference_length = 100, scale: nil) ⇒ Series

Convert a String column into a Decimal column.

This method infers the required precision and scale from the data.

Examples:

s = Polars::Series.new(
  ["40.12", "3420.13", "120134.19", "3212.98", "12.90", "143.09", "143.9"]
)
s.str.to_decimal
# =>
# shape: (7,)
# Series: '' [decimal[8,2]]
# [
#         40.12
#         3420.13
#         120134.19
#         3212.98
#         12.90
#         143.09
#         143.90
# ]
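
One reading that is consistent with the decimal[8,2] result above: the scale is the largest number of fractional digits seen, and the precision is the largest number of integer digits plus that scale. The plain-Ruby sketch below implements only that reading. It is an assumption, not the library's confirmed inference algorithm, and the helper name infer_precision_scale is hypothetical.

```ruby
# Hypothetical helper: infer [precision, scale] from decimal strings,
# looking at no more than the first inference_length values.
def infer_precision_scale(values, inference_length = 100)
  int_digits = 0
  scale = 0
  values.first(inference_length).each do |text|
    next if text.nil?

    # Split "120134.19" into "120134" and "19"; ignore a leading minus sign.
    whole, fraction = text.delete_prefix("-").split(".", 2)
    int_digits = [int_digits, whole.length].max
    scale = [scale, fraction.to_s.length].max
  end
  [int_digits + scale, scale]
end

values = ["40.12", "3420.13", "120134.19", "3212.98", "12.90", "143.09", "143.9"]
p infer_precision_scale(values)
```

Under this reading, "143.9" is rendered as 143.90 because the inferred scale of 2 applies to every value.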

Parameters:

  • inference_length (Integer) (defaults to: 100)

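    Number of elements to parse to determine the precision and scale.

  • scale (Integer, nil) (defaults to: nil)

    Not yet supported: the method raises if a non-nil value is passed.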

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 220

def to_decimal(inference_length = 100, scale: nil)
  if !scale.nil?
    raise Todo
  end

  Utils.wrap_s(_s.str_to_decimal_infer(inference_length))
end

#to_integer(base: 10, dtype: Int64, strict: true) ⇒ Series

Convert a String column into a column of the given integer dtype, parsing each string in the given base (radix).

Examples:

s = Polars::Series.new("bin", ["110", "101", "010", "invalid"])
s.str.to_integer(base: 2, dtype: Polars::Int32, strict: false)
# =>
# shape: (4,)
# Series: 'bin' [i32]
# [
#         6
#         5
#         2
#         null
# ]
s = Polars::Series.new("hex", ["fa1e", "ff00", "cafe", nil])
s.str.to_integer(base: 16)
# =>
# shape: (4,)
# Series: 'hex' [i64]
# [
#         64030
#         65280
#         51966
#         null
# ]
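
The strict flag in the first example can be mimicked per string with Ruby's built-in Kernel#Integer, which takes a base and an exception: keyword. This is only an analogue under stated assumptions: the helper name parse_integer is hypothetical, and Ruby's parser is more lenient than a strict radix parser (for example, it accepts underscores and prefixes such as "0x").

```ruby
# Hypothetical per-string analogue of strict vs. non-strict parsing.
def parse_integer(text, base: 10, strict: true)
  return nil if text.nil? # null entries stay null

  # With exception: false, Integer returns nil instead of raising.
  Integer(text, base, exception: strict)
end

p ["110", "101", "010", "invalid"].map { |v| parse_integer(v, base: 2, strict: false) }
p ["fa1e", "ff00", "cafe", nil].map { |v| parse_integer(v, base: 16) }
```

With strict: true, the invalid entry would raise an ArgumentError in this sketch instead of becoming nil.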

Parameters:

  • base (Integer) (defaults to: 10)

    Positive integer or expression which is the base of the string we are parsing. Default: 10.

  • dtype (Object) (defaults to: Int64)

    Polars integer type to cast to. Default: Int64.

  • strict (Object) (defaults to: true)

    If true (default), raise a ComputeError on any parse error or overflow. If false, silently convert such values to null.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1211

def to_integer(
  base: 10,
  dtype: Int64,
  strict: true
)
  super
end

#to_lowercase ⇒ Series

Modify the strings to their lowercase equivalent.

Examples:

s = Polars::Series.new("foo", ["CAT", "DOG"])
s.str.to_lowercase
# =>
# shape: (2,)
# Series: 'foo' [str]
# [
#         "cat"
#         "dog"
# ]

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1017

def to_lowercase
  super
end

#to_time(format = nil, strict: true, cache: true) ⇒ Series

Convert a Utf8 column into a Time column.

Examples:

s = Polars::Series.new(["01:00", "02:00", "03:00"])
s.str.to_time("%H:%M")
# =>
# shape: (3,)
# Series: '' [time]
# [
#         01:00:00
#         02:00:00
#         03:00:00
# ]

Parameters:

  • format (String) (defaults to: nil)

    Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%H:%M:%S". If set to nil (default), the format is inferred from the data.

  • strict (Boolean) (defaults to: true)

    Raise an error if any conversion fails.

  • cache (Boolean) (defaults to: true)

    Use a cache of unique, converted times to apply the conversion.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 123

def to_time(format = nil, strict: true, cache: true)
  super
end

#to_uppercase ⇒ Series

Modify the strings to their uppercase equivalent.

Examples:

s = Polars::Series.new("foo", ["cat", "dog"])
s.str.to_uppercase
# =>
# shape: (2,)
# Series: 'foo' [str]
# [
#         "CAT"
#         "DOG"
# ]

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 1035

def to_uppercase
  super
end

#zfill(length) ⇒ Series

Fills the string with zeroes.

Return a copy of the string left-filled with ASCII '0' digits to make a string of the given length.

A leading sign prefix ('+'/'-') is handled by inserting the padding after the sign character rather than before. The original string is returned if length is less than or equal to the string's length.

Examples:

s = Polars::Series.new([-1, 123, 999999, nil])
s.cast(Polars::String).str.zfill(4)
# =>
# shape: (4,)
# Series: '' [str]
# [
#         "-001"
#         "0123"
#         "999999"
#         null
# ]
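
The sign handling described above can be sketched per string in plain Ruby. The helper name zfill_sketch is hypothetical, and this only illustrates the documented rule; it is not the library's implementation.

```ruby
# Hypothetical per-string analogue of sign-aware zero filling.
def zfill_sketch(value, length)
  return nil if value.nil?                 # null entries stay null
  return value if value.length >= length   # already long enough

  # Keep a leading '+' or '-' in front and pad the remainder with zeroes.
  sign = value.start_with?("+", "-") ? value[0] : ""
  rest = value[sign.length..]
  sign + rest.rjust(length - sign.length, "0")
end

p ["-1", "123", "999999", nil].map { |v| zfill_sketch(v, 4) }
```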

Parameters:

  • length (Integer)

    Fill the value up to this length.

Returns:

  • (Series)

# File 'lib/polars/string_name_space.rb', line 999

def zfill(length)
  super
end