Class: Parquet::ArrowFileWriter

Inherits: Object

Defined in: lib/parquet/arrow-file-writer.rb

Instance Method Summary

  • #write ⇒ void
    Write data to Apache Parquet.

Instance Method Details

#write(record_batch) ⇒ void
#write(table, chunk_size: nil) ⇒ void
#write(raw_records) ⇒ void

This method returns an undefined value.

Write data to Apache Parquet.

Overloads:

  • #write(record_batch) ⇒ void

    Examples:

    Write a record batch

    record_batch = Arrow::RecordBatch.new(enabled: [true, false])
    schema = record_batch.schema
    Parquet::ArrowFileWriter.open(schema, "data.parquet") do |writer|
      writer.write(record_batch)
    end

    Parameters:

    • record_batch (Arrow::RecordBatch)

      The record batch to be written.

  • #write(table, chunk_size: nil) ⇒ void

    Examples:

Write a table with the default chunk size

    table = Arrow::Table.new(enabled: [true, false])
    schema = table.schema
    Parquet::ArrowFileWriter.open(schema, "data.parquet") do |writer|
      writer.write(table)
    end

Write a table with the specified chunk size

    table = Arrow::Table.new(enabled: [true, false])
    schema = table.schema
    Parquet::ArrowFileWriter.open(schema, "data.parquet") do |writer|
      writer.write(table, chunk_size: 1)
    end

    Parameters:

    • table (Arrow::Table)

      The table to be written.

    • chunk_size (nil, Integer) (defaults to: nil)

      The maximum number of rows to write per row group.

      If this is `nil`, the default value (`1024 * 1024`) is used. See the
      row group sketch after the overload list for a way to check this.

  • #write(raw_records) ⇒ void

    Examples:

Write a record batch with Array<Array>-based data

    schema = Arrow::Schema.new(enabled: :boolean)
    raw_records = [
      [true],
      [false],
    ]
    Parquet::ArrowFileWriter.open(schema, "data.parquet") do |writer|
      writer.write(raw_records)
    end

Write a record batch with Array<Hash>-based data

    schema = Arrow::Schema.new(enabled: :boolean)
    raw_columns = [
      enabled: [true, false],
    ]
    Parquet::ArrowFileWriter.open(schema, "data.parquet") do |writer|
      writer.write(raw_columns)
    end

    Parameters:

    • raw_records (Array<Hash>, Array<Array>)

      The data to be written as primitive Ruby objects.
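
As a cross-check for chunk_size, you can count the row groups in the written
file. This is a minimal sketch, not from this page: it assumes red-parquet is
installed, that Parquet::ArrowFileReader can be constructed from a file path,
and that it exposes an n_row_groups reader (the constructor form and accessor
name are assumptions).

  require "parquet"

  table = Arrow::Table.new(enabled: [true, false, true, false])
  Parquet::ArrowFileWriter.open(table.schema, "data.parquet") do |writer|
    writer.write(table, chunk_size: 1) # one row per row group
  end

  # Assumed reader API: path constructor and n_row_groups accessor.
  reader = Parquet::ArrowFileReader.new("data.parquet")
  p reader.n_row_groups # expected: 4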

Since:

  • 18.0.0



# File 'lib/parquet/arrow-file-writer.rb', line 84

def write(target, chunk_size: nil)
  case target
  when Arrow::RecordBatch
    write_record_batch(target)
  when Arrow::Table
    # Same as parquet::DEFAULT_MAX_ROW_GROUP_LENGTH in C++
    chunk_size ||= 1024 * 1024
    write_table(target, chunk_size)
  else
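    # Assume raw Ruby data (Array<Hash> or Array<Array>) and build a
    # record batch with this writer's schema before writing it.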
    record_batch = Arrow::RecordBatch.new(schema, target)
    write_record_batch(record_batch)
  end
end
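
A written file can be read back as a table with red-arrow's generic loader.
A minimal sketch, assuming red-parquet is installed so that the .parquet
extension is recognized:

  require "parquet"

  # Load the whole file into an Arrow::Table and print it.
  table = Arrow::Table.load("data.parquet")
  p table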