Method: Google::Cloud::Bigquery::Dataset#insert

Defined in:
lib/google/cloud/bigquery/dataset.rb

#insert(table_id, rows, insert_ids: nil, skip_invalid: nil, ignore_unknown: nil, autocreate: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::InsertResponse

Inserts data into the given table for near-immediate querying, without the need to complete a load operation before the data can appear in query results.

Simple Ruby types are generally accepted per JSON rules, along with the following support for BigQuery's more complex types:

BigQuery Ruby Notes
NUMERIC BigDecimal BigDecimal values will be rounded to scale 9.
BIGNUMERIC String Pass as String to avoid rounding to scale 9.
DATETIME DateTime DATETIME does not support time zone.
DATE Date
GEOGRAPHY String
JSON String (Stringified JSON) String, as JSON does not have a schema to verify.
TIMESTAMP Time
TIME Google::Cloud::BigQuery::Time
BYTES File, IO, StringIO, or similar
ARRAY Array Nested arrays, nil values are not supported.
STRUCT Hash Hash keys may be strings or symbols.

Because BigQuery's streaming API is designed for high insertion rates, modifications to the underlying table metadata are eventually consistent when interacting with the streaming system. In most cases metadata changes are propagated within minutes, but during this period API responses may reflect the inconsistent state of the table.

The value :skip can be provided to skip the generation of IDs for all rows, or to skip the generation of an ID for a specific row in the array.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
dataset.insert "my_table", rows

Avoid retrieving the dataset with skip_lookup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
dataset.insert "my_table", rows

Using autocreate to create a new table if none exists.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

rows = [
  { "first_name" => "Alice", "age" => 21 },
  { "first_name" => "Bob", "age" => 22 }
]
dataset.insert "my_table", rows, autocreate: true do |t|
  t.schema.string "first_name", mode: :required
  t.schema.integer "age", mode: :required
end

Pass BIGNUMERIC value as a string to avoid rounding to scale 9 in the conversion from BigDecimal:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

row = {
  "my_numeric" => BigDecimal("123456798.987654321"),
  "my_bignumeric" => "123456798.98765432100001" # BigDecimal would be rounded, use String instead!
}
dataset.insert "my_table", row

Parameters:

  • table_id (String)

    The ID of the destination table.

  • rows (Hash, Array<Hash>)

    A hash object or array of hash objects containing the data. Required. BigDecimal values will be rounded to scale 9 to conform with the BigQuery NUMERIC data type. To avoid rounding BIGNUMERIC type values with scale greater than 9, use String instead of BigDecimal.

  • insert_ids (Array<String|Symbol>, Symbol) (defaults to: nil)

    A unique ID for each row. BigQuery uses this property to detect duplicate insertion requests on a best-effort basis. For more information, see data consistency. Optional. If not provided, the client library will assign a UUID to each row before the request is sent.

  • skip_invalid (Boolean) (defaults to: nil)

    Insert all valid rows of a request, even if invalid rows exist. The default value is false, which causes the entire request to fail if any invalid rows exist.

  • ignore_unknown (Boolean) (defaults to: nil)

    Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is false, which treats unknown values as errors.

  • autocreate (Boolean) (defaults to: nil)

    Specifies whether the method should create a new table with the given table_id, if no table is found for table_id. The default value is false.

Yields:

  • (table)

    a block for setting the table

Yield Parameters:

  • table (Google::Cloud::Bigquery::Table::Updater)

    An updater to set additional properties on the table in the API request to create it. Only used when autocreate is set and the table does not already exist.

Returns:

Raises:

  • (ArgumentError)

See Also:



2934
2935
2936
2937
2938
2939
2940
2941
2942
2943
2944
2945
2946
2947
2948
2949
2950
2951
# File 'lib/google/cloud/bigquery/dataset.rb', line 2934

def insert table_id, rows, insert_ids: nil, skip_invalid: nil, ignore_unknown: nil, autocreate: nil, &block
  rows = [rows] if rows.is_a? Hash
  raise ArgumentError, "No rows provided" if rows.empty?

  insert_ids = Array.new(rows.count) { :skip } if insert_ids == :skip
  insert_ids = Array insert_ids
  if insert_ids.count.positive? && insert_ids.count != rows.count
    raise ArgumentError, "insert_ids must be the same size as rows"
  end

  if autocreate
    insert_data_with_autocreate table_id, rows, skip_invalid: skip_invalid, ignore_unknown: ignore_unknown,
                                                insert_ids: insert_ids, &block
  else
    insert_data table_id, rows, skip_invalid: skip_invalid, ignore_unknown: ignore_unknown,
                                insert_ids: insert_ids
  end
end