Method: Polars::IO#scan_ipc

Defined in:
lib/polars/io/ipc.rb

#scan_ipc(source, n_rows: nil, cache: true, rechunk: true, row_count_name: nil, row_count_offset: 0, storage_options: nil, hive_partitioning: nil, hive_schema: nil, try_parse_hive_dates: true, include_file_paths: nil) ⇒ LazyFrame

Lazily read from an Arrow IPC (Feather v2) file or multiple files via glob patterns.

This allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead.

Parameters:

  • source (String)

    Path to an IPC file, or a glob pattern matching multiple files.

  • n_rows (Integer) (defaults to: nil)

    Stop reading from the IPC file after n_rows rows have been read.

  • cache (Boolean) (defaults to: true)

    Cache the result after reading.

  • rechunk (Boolean) (defaults to: true)

    Reallocate to contiguous memory when all chunks/files are parsed.

  • row_count_name (String) (defaults to: nil)

    If not nil, insert a row count column with the given name into the DataFrame.

  • row_count_offset (Integer) (defaults to: 0)

    Offset at which to start the row count column (only used if row_count_name is set).

  • storage_options (Hash) (defaults to: nil)

    Extra options that make sense for a particular storage connection.

  • hive_partitioning (Boolean) (defaults to: nil)

    Infer statistics and schema from a Hive-partitioned URL and use them to prune reads. This is unset by default (i.e. nil), meaning it is automatically enabled when a single directory is passed, and otherwise disabled.

  • hive_schema (Hash) (defaults to: nil)

    The column names and data types of the columns by which the data is partitioned. If set to nil (default), the schema of the Hive partitions is inferred.

  • try_parse_hive_dates (Boolean) (defaults to: true)

    Whether to try parsing Hive partition values as date/datetime types.

  • include_file_paths (String) (defaults to: nil)

    Include the path of the source file(s) as a column with this name.

Returns:

  • (LazyFrame)

# File 'lib/polars/io/ipc.rb', line 206

def scan_ipc(
  source,
  n_rows: nil,
  cache: true,
  rechunk: true,
  row_count_name: nil,
  row_count_offset: 0,
  storage_options: nil,
  hive_partitioning: nil,
  hive_schema: nil,
  try_parse_hive_dates: true,
  include_file_paths: nil
)
  _scan_ipc_impl(
    source,
    n_rows: n_rows,
    cache: cache,
    rechunk: rechunk,
    row_count_name: row_count_name,
    row_count_offset: row_count_offset,
    storage_options: storage_options,
    hive_partitioning: hive_partitioning,
    hive_schema: hive_schema,
    try_parse_hive_dates: try_parse_hive_dates,
    include_file_paths: include_file_paths
  )
end