Class: Google::Cloud::Bigquery::LoadJob

Inherits:
Job
  • Object
Defined in:
lib/google/cloud/bigquery/load_job.rb

Overview

LoadJob

A Job subclass representing a load operation that may be performed on a Table. A LoadJob instance is created when you call Table#load_job.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

gcs_uri = "gs://my-bucket/file-name.csv"
load_job = dataset.load_job "my_new_table", gcs_uri do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

load_job.wait_until_done!
load_job.done? #=> true

Direct Known Subclasses

Updater

Defined Under Namespace

Classes: Updater

Methods inherited from Job

#cancel, #configuration, #created_at, #delete, #done?, #ended_at, #error, #errors, #failed?, #job_id, #labels, #location, #num_child_jobs, #parent_job_id, #pending?, #project_id, #reload!, #rerun!, #reservation_usage, #running?, #script_statistics, #session_id, #started_at, #state, #statistics, #status, #transaction_id, #user_email, #wait_until_done!

Instance Method Details

#allow_jagged_rows? ⇒ Boolean

Checks if the load operation accepts rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an error is returned. The default value is false. Only applicable to CSV, ignored for other formats.

Returns:

  • (Boolean)

    true when jagged rows are allowed, false otherwise.



# File 'lib/google/cloud/bigquery/load_job.rb', line 258

def allow_jagged_rows?
  val = @gapi.configuration.load.allow_jagged_rows
  val = false if val.nil?
  val
end
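
To make the jagged-row rule concrete, here is a minimal sketch in plain Ruby. The helper `normalize_row` and its parameters are hypothetical and are not part of the library; the sketch only mimics the behavior described above, where missing trailing columns become nil when jagged rows are allowed and the row counts as a bad record otherwise.

```ruby
# Hypothetical helper, not part of google-cloud-bigquery: mimics how a CSV
# row with missing trailing columns is treated under allow_jagged_rows.
def normalize_row(row, column_count, allow_jagged_rows: false)
  return row if row.length == column_count
  # A short row is only acceptable when jagged rows are allowed. Rows with
  # extra columns are also rejected in this simplified sketch.
  return :bad_record unless allow_jagged_rows && row.length < column_count

  # Pad the missing trailing columns with nil (treated as nulls).
  row + Array.new(column_count - row.length)
end

normalize_row(["Ada", "London"], 3, allow_jagged_rows: true) # ["Ada", "London", nil]
normalize_row(["Ada", "London"], 3)                          # :bad_record
```

Whether a bad record fails the whole load depends on #max_bad_records, which this sketch does not model.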

#autodetect? ⇒ Boolean

Checks if BigQuery should automatically infer the options and schema for CSV and JSON sources. The default is false.

Returns:

  • (Boolean)

    true when autodetect is enabled, false otherwise.



# File 'lib/google/cloud/bigquery/load_job.rb', line 189

def autodetect?
  val = @gapi.configuration.load.autodetect
  val = false if val.nil?
  val
end

#backup? ⇒ Boolean

Checks if the source data is a Google Cloud Datastore backup.

Returns:

  • (Boolean)

    true when the source format is DATASTORE_BACKUP, false otherwise.



# File 'lib/google/cloud/bigquery/load_job.rb', line 224

def backup?
  @gapi.configuration.load.source_format == "DATASTORE_BACKUP"
end

#clustering? ⇒ Boolean

Returns:

  • (Boolean)

    true when the table will be clustered, or false otherwise.

# File 'lib/google/cloud/bigquery/load_job.rb', line 625

def clustering?
  !@gapi.configuration.load.clustering.nil?
end

#clustering_fields ⇒ Array<String>?

One or more fields on which the destination table should be clustered. When clustering is combined with time-based partitioning, data in the table is first partitioned and then clustered. The order of the returned fields determines the sort order of the data.

BigQuery supports clustering for both partitioned and non-partitioned tables.

See Google::Cloud::Bigquery::LoadJob::Updater#clustering_fields=, Table#clustering_fields and Table#clustering_fields=.

Returns:

  • (Array<String>, nil)

    The clustering fields, or nil if the destination table will not be clustered.

# File 'lib/google/cloud/bigquery/load_job.rb', line 651

def clustering_fields
  @gapi.configuration.load.clustering.fields if clustering?
end

#csv? ⇒ Boolean

Checks if the format of the source data is CSV. The default is true.

Returns:

  • (Boolean)

    true when the source format is CSV, false otherwise.



# File 'lib/google/cloud/bigquery/load_job.rb', line 212

def csv?
  val = @gapi.configuration.load.source_format
  return true if val.nil?
  val == "CSV"
end

#delimiter ⇒ String

The delimiter used between fields in the source data. The default is a comma (,).

Returns:

  • (String)

    A string containing the character, such as ",".



# File 'lib/google/cloud/bigquery/load_job.rb', line 85

def delimiter
  @gapi.configuration.load.field_delimiter || ","
end

#destination(view: nil) ⇒ Table

The table into which the operation loads data. This is the table on which Table#load_job was invoked.

Parameters:

  • view (String) (defaults to: nil)

    Specifies the view that determines which table information is returned. By default, basic table information and storage statistics (STORAGE_STATS) are returned. Accepted values include :unspecified, :basic, :storage, and :full. For more information, see BigQuery Classes. The default value is the :unspecified view type.

Returns:

  • (Table)

    A table instance.



# File 'lib/google/cloud/bigquery/load_job.rb', line 73

def destination view: nil
  table = @gapi.configuration.load.destination_table
  return nil unless table
  retrieve_table table.project_id, table.dataset_id, table.table_id, metadata_view: view
end

#encryption ⇒ Google::Cloud::Bigquery::EncryptionConfiguration

The encryption configuration of the destination table.

Returns:

  • (Google::Cloud::Bigquery::EncryptionConfiguration)

    Custom encryption configuration (e.g., Cloud KMS keys).



# File 'lib/google/cloud/bigquery/load_job.rb', line 355

def encryption
  EncryptionConfiguration.from_gapi(
    @gapi.configuration.load.destination_encryption_configuration
  )
end

#hive_partitioning? ⇒ Boolean

Checks if hive partitioning options are set.

Returns:

  • (Boolean)

    true when hive partitioning options are set, or false otherwise.

# File 'lib/google/cloud/bigquery/load_job.rb', line 382

def hive_partitioning?
  !@gapi.configuration.load.hive_partitioning_options.nil?
end

#hive_partitioning_mode ⇒ String?

The mode of hive partitioning to use when reading data. The following modes are supported:

  1. AUTO: automatically infer partition key name(s) and type(s).
  2. STRINGS: automatically infer partition key name(s). All types are interpreted as strings.
  3. CUSTOM: partition key schema is encoded in the source URI prefix.

Returns:

  • (String, nil)

    The mode of hive partitioning, or nil if not set.

# File 'lib/google/cloud/bigquery/load_job.rb', line 399

def hive_partitioning_mode
  @gapi.configuration.load.hive_partitioning_options.mode if hive_partitioning?
end

#hive_partitioning_source_uri_prefix ⇒ String?

The common prefix for all source URIs when hive partition detection is requested. The prefix must end immediately before the partition key encoding begins. For example, consider files following this data layout:

gs://bucket/path_to_table/dt=2019-01-01/country=BR/id=7/file.avro
gs://bucket/path_to_table/dt=2018-12-31/country=CA/id=3/file.avro

When hive partitioning is requested with either AUTO or STRINGS mode, the common prefix can be either of gs://bucket/path_to_table or gs://bucket/path_to_table/ (trailing slash does not matter).

Returns:

  • (String, nil)

    The common prefix for all source URIs, or nil if not set.

# File 'lib/google/cloud/bigquery/load_job.rb', line 421

def hive_partitioning_source_uri_prefix
  @gapi.configuration.load.hive_partitioning_options.source_uri_prefix if hive_partitioning?
end
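
The relationship between the prefix and the partition key encoding can be sketched in plain Ruby. The helper `partition_keys` is hypothetical and is not part of the library; it only shows which part of a source URI follows the common prefix. Values are kept as strings, loosely resembling STRINGS mode, and no type inference is attempted.

```ruby
# Hypothetical helper, not part of google-cloud-bigquery: extracts the
# key=value partition segments that follow the common source URI prefix.
def partition_keys(uri, source_uri_prefix)
  # A trailing slash on the prefix does not matter, so normalize it away.
  prefix = source_uri_prefix.chomp("/")
  raise ArgumentError, "URI does not start with prefix" unless uri.start_with?("#{prefix}/")

  relative = uri.delete_prefix("#{prefix}/")
  # Keep only path segments shaped like key=value; the file name is dropped.
  relative.split("/")
          .select { |segment| segment.include?("=") }
          .to_h { |segment| segment.split("=", 2) }
end

uri = "gs://bucket/path_to_table/dt=2019-01-01/country=BR/id=7/file.avro"
partition_keys(uri, "gs://bucket/path_to_table")
# {"dt"=>"2019-01-01", "country"=>"BR", "id"=>"7"}
partition_keys(uri, "gs://bucket/path_to_table/") # same result
```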

#ignore_unknown_values? ⇒ Boolean

Checks if the load operation allows extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned. The default is false.

Returns:

  • (Boolean)

    true when unknown values are ignored, false otherwise.



# File 'lib/google/cloud/bigquery/load_job.rb', line 274

def ignore_unknown_values?
  val = @gapi.configuration.load.ignore_unknown_values
  val = false if val.nil?
  val
end

#input_file_bytes ⇒ Integer

The number of bytes of source data in the load job.

Returns:

  • (Integer)

    The number of bytes.



# File 'lib/google/cloud/bigquery/load_job.rb', line 330

def input_file_bytes
  Integer @gapi.statistics.load.input_file_bytes
rescue StandardError
  nil
end

#input_files ⇒ Integer

The number of source data files in the load job.

Returns:

  • (Integer)

    The number of source files.



# File 'lib/google/cloud/bigquery/load_job.rb', line 319

def input_files
  Integer @gapi.statistics.load.input_files
rescue StandardError
  nil
end

#iso8859_1? ⇒ Boolean

Checks if the character encoding of the data is ISO-8859-1.

Returns:

  • (Boolean)

    true when the character encoding is ISO-8859-1, false otherwise.



# File 'lib/google/cloud/bigquery/load_job.rb', line 120

def iso8859_1?
  @gapi.configuration.load.encoding == "ISO-8859-1"
end

#json? ⇒ Boolean

Checks if the format of the source data is newline-delimited JSON. The default is false.

Returns:

  • (Boolean)

    true when the source format is NEWLINE_DELIMITED_JSON, false otherwise.



# File 'lib/google/cloud/bigquery/load_job.rb', line 202

def json?
  @gapi.configuration.load.source_format == "NEWLINE_DELIMITED_JSON"
end

#max_bad_records ⇒ Integer

The maximum number of bad records that the load operation can ignore. If the number of bad records exceeds this value, an error is returned. The default value is 0, which requires that all records be valid.

Returns:

  • (Integer)

    The maximum number of bad records.



# File 'lib/google/cloud/bigquery/load_job.rb', line 146

def max_bad_records
  val = @gapi.configuration.load.max_bad_records
  val = 0 if val.nil?
  val
end

#null_marker ⇒ String

Specifies a string that represents a null value in a CSV file. For example, if you specify \N, BigQuery interprets \N as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error when an empty string is present in a column of any data type other than STRING or BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.

Returns:

  • (String)

    A string representing null value in a CSV file.



# File 'lib/google/cloud/bigquery/load_job.rb', line 163

def null_marker
  val = @gapi.configuration.load.null_marker
  val = "" if val.nil?
  val
end
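
As a rough illustration of what the null marker means for individual fields, the following plain-Ruby sketch maps raw CSV field strings to values. The helper `apply_null_marker` is hypothetical and is not part of the library; it also does not model the type-specific error for empty strings described above.

```ruby
# Hypothetical helper, not part of google-cloud-bigquery: converts raw CSV
# field strings to Ruby values, mapping the null marker to nil.
def apply_null_marker(fields, null_marker: "")
  fields.map { |field| field == null_marker ? nil : field }
end

# "\\N" is the two-character string \N.
apply_null_marker(["Ada", "\\N", ""], null_marker: "\\N") # ["Ada", nil, ""]
apply_null_marker(["Ada", "\\N", ""])                      # ["Ada", "\\N", nil]
```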

#orc? ⇒ Boolean

Checks if the source format is ORC.

Returns:

  • (Boolean)

    true when the source format is ORC, false otherwise.



# File 'lib/google/cloud/bigquery/load_job.rb', line 234

def orc?
  @gapi.configuration.load.source_format == "ORC"
end

#output_bytes ⇒ Integer

The number of bytes that have been loaded into the table. While an import job is in the running state, this value may change.

Returns:

  • (Integer)

    The number of bytes that have been loaded.



# File 'lib/google/cloud/bigquery/load_job.rb', line 367

def output_bytes
  Integer @gapi.statistics.load.output_bytes
rescue StandardError
  nil
end

#output_rows ⇒ Integer

The number of rows that have been loaded into the table. While an import job is in the running state, this value may change.

Returns:

  • (Integer)

    The number of rows that have been loaded.



# File 'lib/google/cloud/bigquery/load_job.rb', line 342

def output_rows
  Integer @gapi.statistics.load.output_rows
rescue StandardError
  nil
end
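
The statistics readers (#input_files, #input_file_bytes, #output_rows and #output_bytes) can be collected into one summary. The sketch below is only an illustration: `load_summary` and the `FakeJob` stand-in are hypothetical names, used so the example runs without credentials. Per the source listings, each reader rescues errors and returns nil, so the summary may contain nil values while statistics are unavailable.

```ruby
# Builds a small summary hash from the statistics readers documented here.
# `job` is expected to respond like a LoadJob.
def load_summary(job)
  {
    input_files: job.input_files,
    input_file_bytes: job.input_file_bytes,
    output_rows: job.output_rows,
    output_bytes: job.output_bytes
  }
end

# Stand-in object for illustration only. In real use, pass a LoadJob that
# has finished, for example after load_job.wait_until_done!.
FakeJob = Struct.new(:input_files, :input_file_bytes, :output_rows, :output_bytes)

summary = load_summary(FakeJob.new(2, 2048, 100, 4096))
summary[:output_rows] # 100
```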

#parquet? ⇒ Boolean

Checks if the source format is Parquet.

Returns:

  • (Boolean)

    true when the source format is PARQUET, false otherwise.



# File 'lib/google/cloud/bigquery/load_job.rb', line 244

def parquet?
  @gapi.configuration.load.source_format == "PARQUET"
end
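
The format predicates (#csv?, #json?, #backup?, #orc? and #parquet?) can be folded into a single value when that is more convenient. This is a sketch under assumptions: `source_format_of` and the `FakeFormatJob` stand-in are hypothetical names, and the stand-in simply mirrors the predicate logic shown in the source listings, including the CSV default when no source format is set.

```ruby
# Hypothetical helper, not part of google-cloud-bigquery: maps the format
# predicates documented for LoadJob to a single symbol.
def source_format_of(job)
  return :csv if job.csv?
  return :json if job.json?
  return :datastore_backup if job.backup?
  return :orc if job.orc?
  return :parquet if job.parquet?

  :other # any format not covered by the predicates above
end

# Stand-in for illustration only; a real LoadJob reads the format from the
# job configuration returned by the API.
FakeFormatJob = Struct.new(:source_format) do
  def csv?;     source_format.nil? || source_format == "CSV"; end
  def json?;    source_format == "NEWLINE_DELIMITED_JSON"; end
  def backup?;  source_format == "DATASTORE_BACKUP"; end
  def orc?;     source_format == "ORC"; end
  def parquet?; source_format == "PARQUET"; end
end

source_format_of(FakeFormatJob.new(nil))       # :csv
source_format_of(FakeFormatJob.new("PARQUET")) # :parquet
```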

#parquet_enable_list_inference? ⇒ Boolean?

Indicates whether to use schema inference specifically for Parquet LIST logical type.

Returns:

  • (Boolean, nil)

    The enable_list_inference value in Parquet options, or nil if Parquet options are not set.

# File 'lib/google/cloud/bigquery/load_job.rb', line 450

def parquet_enable_list_inference?
  @gapi.configuration.load.parquet_options.enable_list_inference if parquet_options?
end

#parquet_enum_as_string? ⇒ Boolean?

Indicates whether to infer Parquet ENUM logical type as STRING instead of BYTES by default.

Returns:

  • (Boolean, nil)

    The enum_as_string value in Parquet options, or nil if Parquet options are not set.

# File 'lib/google/cloud/bigquery/load_job.rb', line 464

def parquet_enum_as_string?
  @gapi.configuration.load.parquet_options.enum_as_string if parquet_options?
end

#parquet_options? ⇒ Boolean

Checks if Parquet options are set.

Returns:

  • (Boolean)

    true when Parquet options are set, or false otherwise.

# File 'lib/google/cloud/bigquery/load_job.rb', line 435

def parquet_options?
  !@gapi.configuration.load.parquet_options.nil?
end

#quote ⇒ String

The value that is used to quote data sections in a CSV file. The default value is a double-quote ("). If your data does not contain quoted sections, the value should be an empty string. If your data contains quoted newline characters, #quoted_newlines? should return true.

Returns:

  • (String)

    A string containing the character, such as "\"".



# File 'lib/google/cloud/bigquery/load_job.rb', line 133

def quote
  val = @gapi.configuration.load.quote
  val = "\"" if val.nil?
  val
end

#quoted_newlines? ⇒ Boolean

Checks if quoted data sections may contain newline characters in a CSV file. The default is false.

Returns:

  • (Boolean)

    true when quoted newlines are allowed, false otherwise.



# File 'lib/google/cloud/bigquery/load_job.rb', line 176

def quoted_newlines?
  val = @gapi.configuration.load.allow_quoted_newlines
  val = false if val.nil?
  val
end

#range_partitioning? ⇒ Boolean

Checks if the destination table will be range partitioned. See Creating and using integer range partitioned tables.

Returns:

  • (Boolean)

    true when the table is range partitioned, or false otherwise.



# File 'lib/google/cloud/bigquery/load_job.rb', line 476

def range_partitioning?
  !@gapi.configuration.load.range_partitioning.nil?
end

#range_partitioning_end ⇒ Integer?

The end of range partitioning, exclusive. See Creating and using integer range partitioned tables.

Returns:

  • (Integer, nil)

    The end of range partitioning, exclusive, or nil if not range partitioned.



# File 'lib/google/cloud/bigquery/load_job.rb', line 528

def range_partitioning_end
  @gapi.configuration.load.range_partitioning.range.end if range_partitioning?
end

#range_partitioning_field ⇒ String?

The field on which the destination table will be range partitioned, if any. The field must be a top-level NULLABLE/REQUIRED field. The only supported type is INTEGER/INT64. See Creating and using integer range partitioned tables.

Returns:

  • (String, nil)

    The partition field, if a field was configured, or nil if not range partitioned.



# File 'lib/google/cloud/bigquery/load_job.rb', line 490

def range_partitioning_field
  @gapi.configuration.load.range_partitioning.field if range_partitioning?
end

#range_partitioning_interval ⇒ Integer?

The width of each interval. See Creating and using integer range partitioned tables.

Returns:

  • (Integer, nil)

    The width of each interval, for data in range partitions, or nil if not range partitioned.



# File 'lib/google/cloud/bigquery/load_job.rb', line 515

def range_partitioning_interval
  return nil unless range_partitioning?
  @gapi.configuration.load.range_partitioning.range.interval
end

#range_partitioning_start ⇒ Integer?

The start of range partitioning, inclusive. See Creating and using integer range partitioned tables.

Returns:

  • (Integer, nil)

    The start of range partitioning, inclusive, or nil if not range partitioned.



# File 'lib/google/cloud/bigquery/load_job.rb', line 502

def range_partitioning_start
  @gapi.configuration.load.range_partitioning.range.start if range_partitioning?
end
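
The three range values (start inclusive, end exclusive, and interval width) together determine which partition an integer falls into. The arithmetic can be sketched as follows. The helper `range_partition_for` is hypothetical and is not part of the library; the sketch assumes the interval divides the range evenly and returns nil for values outside the range.

```ruby
# Hypothetical helper, not part of google-cloud-bigquery: returns the
# [lower, upper) bounds of the range partition that holds `value`, or nil
# when the value falls outside [range_start, range_end).
def range_partition_for(value, range_start:, range_end:, interval:)
  return nil if value < range_start || value >= range_end

  # Integer division finds how many whole intervals lie below the value.
  lower = range_start + ((value - range_start) / interval) * interval
  [lower, lower + interval]
end

range_partition_for(25, range_start: 0, range_end: 100, interval: 10)  # [20, 30]
range_partition_for(100, range_start: 0, range_end: 100, interval: 10) # nil
```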

#schema ⇒ Schema?

The schema for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from Google Cloud Datastore.

The returned object is frozen and changes are not allowed. Use Table#schema to update the schema.

Returns:

  • (Schema, nil)

    A schema object, or nil.



# File 'lib/google/cloud/bigquery/load_job.rb', line 290

def schema
  Schema.from_gapi(@gapi.configuration.load.schema).freeze
end

#schema_update_options ⇒ Array<String>

Allows the schema of the destination table to be updated as a side effect of the load job if a schema is autodetected or supplied in the job configuration. Schema update options are supported in two cases: when write disposition is WRITE_APPEND; when write disposition is WRITE_TRUNCATE and the destination table is a partition of a table, specified by partition decorators. For normal tables, WRITE_TRUNCATE will always overwrite the schema. One or more of the following values are specified:

  • ALLOW_FIELD_ADDITION: allow adding a nullable field to the schema.
  • ALLOW_FIELD_RELAXATION: allow relaxing a required field in the original schema to nullable.

Returns:

  • (Array<String>)

    An array of strings.



# File 'lib/google/cloud/bigquery/load_job.rb', line 310

def schema_update_options
  Array @gapi.configuration.load.schema_update_options
end
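
The two supported cases described above can be written as a small decision function. This is a sketch only: `schema_update_options_supported?` is a hypothetical helper, not part of the library, and it assumes that a partition decorator appears as a `$` in the destination table ID (for example `my_table$20190101`).

```ruby
# Hypothetical helper, not part of google-cloud-bigquery: applies the two
# cases described above to decide whether schema update options take effect.
def schema_update_options_supported?(write_disposition, destination_table_id)
  # Assumption: partition decorators use the "table$partition" form.
  partition_target = destination_table_id.include?('$')

  case write_disposition
  when "WRITE_APPEND"   then true
  when "WRITE_TRUNCATE" then partition_target
  else false
  end
end

schema_update_options_supported?("WRITE_APPEND", "my_table")            # true
schema_update_options_supported?("WRITE_TRUNCATE", "my_table")          # false
schema_update_options_supported?("WRITE_TRUNCATE", 'my_table$20190101') # true
```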

#skip_leading_rows ⇒ Integer

The number of rows at the top of a CSV file that BigQuery will skip when loading the data. The default value is 0. This property is useful if you have header rows in the file that should be skipped.

Returns:

  • (Integer)

    The number of header rows at the top of a CSV file to skip.



# File 'lib/google/cloud/bigquery/load_job.rb', line 97

def skip_leading_rows
  @gapi.configuration.load.skip_leading_rows || 0
end
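
The effect of #skip_leading_rows, together with #delimiter, can be pictured with a plain-Ruby sketch. The helper `parse_rows` is hypothetical and is not part of the library; it deliberately ignores quoting to stay short, so it is not a faithful CSV parser.

```ruby
# Hypothetical helper, not part of google-cloud-bigquery: splits raw text
# into rows, dropping header rows and splitting fields on the delimiter.
def parse_rows(text, delimiter: ",", skip_leading_rows: 0)
  text.lines(chomp: true)
      .drop(skip_leading_rows)
      .map { |line| line.split(delimiter, -1) } # -1 keeps trailing empty fields
end

data = "first_name|city\nAda|London\nGrace|\n"
parse_rows(data, delimiter: "|", skip_leading_rows: 1)
# [["Ada", "London"], ["Grace", ""]]
```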

#sources ⇒ Object

The URI or URIs representing the Google Cloud Storage files from which the operation loads data.



# File 'lib/google/cloud/bigquery/load_job.rb', line 57

def sources
  Array @gapi.configuration.load.source_uris
end

#time_partitioning? ⇒ Boolean

Checks if the destination table will be time partitioned. See Partitioned Tables.

Returns:

  • (Boolean)

    true when the table will be time-partitioned, or false otherwise.



# File 'lib/google/cloud/bigquery/load_job.rb', line 541

def time_partitioning?
  !@gapi.configuration.load.time_partitioning.nil?
end

#time_partitioning_expiration ⇒ Integer?

The expiration for the destination table time partitions, if any, in seconds. See Partitioned Tables.

Returns:

  • (Integer, nil)

    The expiration time, in seconds, for data in time partitions, or nil if not present.



# File 'lib/google/cloud/bigquery/load_job.rb', line 585

def time_partitioning_expiration
  return nil unless time_partitioning?
  return nil if @gapi.configuration.load.time_partitioning.expiration_ms.nil?

  @gapi.configuration.load.time_partitioning.expiration_ms / 1_000
end
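
The listing above converts the configured value from milliseconds to seconds. A standalone sketch of that conversion, assuming the millisecond value is an Integer, looks like this. The helper name `expiration_seconds` is hypothetical.

```ruby
# Hypothetical helper mirroring the conversion in the listing above: the
# configured value is in milliseconds, while the reader returns seconds.
def expiration_seconds(expiration_ms)
  return nil if expiration_ms.nil?

  expiration_ms / 1_000 # Integer division truncates any partial second
end

expiration_seconds(86_400_000) # 86400 (one day)
expiration_seconds(1_500)      # 1
expiration_seconds(nil)        # nil
```

Because the division is integral, sub-second precision is dropped.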

#time_partitioning_field ⇒ String?

The field on which the destination table will be time partitioned, if any. If not set, the destination table will be time partitioned by pseudo column _PARTITIONTIME; if set, the table will be time partitioned by this field. See Partitioned Tables.

Returns:

  • (String, nil)

    The time partition field, if a field was configured. nil if not time partitioned or not set (partitioned by pseudo column '_PARTITIONTIME').



# File 'lib/google/cloud/bigquery/load_job.rb', line 571

def time_partitioning_field
  @gapi.configuration.load.time_partitioning.field if time_partitioning?
end

#time_partitioning_require_filter? ⇒ Boolean

If set to true, queries over the destination table must specify a time partition filter that can be used for partition elimination. See Partitioned Tables.

Returns:

  • (Boolean)

    true when a time partition filter will be required, or false otherwise.



# File 'lib/google/cloud/bigquery/load_job.rb', line 603

def time_partitioning_require_filter?
  tp = @gapi.configuration.load.time_partitioning
  return false if tp.nil? || tp.require_partition_filter.nil?
  tp.require_partition_filter
end

#time_partitioning_type ⇒ String?

The period for which the destination table will be time partitioned, if any. See Partitioned Tables.

Returns:

  • (String, nil)

    The time partition type. The supported types are DAY, HOUR, MONTH, and YEAR, which will generate one partition per day, hour, month, and year, respectively; or nil if not present.



# File 'lib/google/cloud/bigquery/load_job.rb', line 555

def time_partitioning_type
  @gapi.configuration.load.time_partitioning.type if time_partitioning?
end

#utf8? ⇒ Boolean

Checks if the character encoding of the data is UTF-8. This is the default.

Returns:

  • (Boolean)

    true when the character encoding is UTF-8, false otherwise.



# File 'lib/google/cloud/bigquery/load_job.rb', line 108

def utf8?
  val = @gapi.configuration.load.encoding
  return true if val.nil?
  val == "UTF-8"
end