Class: Google::Cloud::Bigquery::External::DataSource

Inherits:

Object

Object
Google::Cloud::Bigquery::External::DataSource

Defined in: lib/google/cloud/bigquery/external/data_source.rb

Overview

DataSource

External::DataSource and its subclasses represent an external data source that can be queried directly, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.

The AVRO and Datastore Backup formats use DataSource. See CsvSource, JsonSource, SheetsSource, BigtableSource for the other formats.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

avro_url = "gs://bucket/path/to/*.avro"
avro_table = bigquery.external avro_url do |avro|
  avro.autodetect = true
end

data = bigquery.query "SELECT * FROM my_ext_table",
                      external: { my_ext_table: avro_table }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Hive partitioning options:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

gcs_uri = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/*"
source_uri_prefix = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/"
external_data = bigquery.external gcs_uri, format: :parquet do |ext|
  ext.hive_partitioning_mode = :auto
  ext.hive_partitioning_require_partition_filter = true
  ext.hive_partitioning_source_uri_prefix = source_uri_prefix
end

external_data.hive_partitioning? #=> true
external_data.hive_partitioning_mode #=> "AUTO"
external_data.hive_partitioning_require_partition_filter? #=> true
external_data.hive_partitioning_source_uri_prefix #=> source_uri_prefix

Direct Known Subclasses

AvroSource, BigtableSource, CsvSource, JsonSource, ParquetSource, SheetsSource

Instance Method Summary

#autodetect ⇒ Boolean
Indicates if the schema and format options are detected automatically.
#autodetect=(new_autodetect) ⇒ Object
Set whether to detect schema and format options automatically.
#avro? ⇒ Boolean
Whether the data format is "AVRO".
#backup? ⇒ Boolean
Whether the data format is "DATASTORE_BACKUP".
#bigtable? ⇒ Boolean
Whether the data format is "BIGTABLE".
#compression ⇒ String
The compression type of the data source.
#compression=(new_compression) ⇒ Object
Set the compression type of the data source.
#csv? ⇒ Boolean
Whether the data format is "CSV".
#date_format ⇒ String, nil
Format used to parse DATE values.
#date_format=(date_format) ⇒ Object
Sets the format used to parse DATE values.
#datetime_format ⇒ String, nil
Format used to parse DATETIME values.
#datetime_format=(datetime_format) ⇒ Object
Sets the format used to parse DATETIME values.
#format ⇒ String
The data format.
#hive_partitioning? ⇒ Boolean
Checks if hive partitioning options are set.
#hive_partitioning_mode ⇒ String, nil
The mode of hive partitioning to use when reading data.
#hive_partitioning_mode=(mode) ⇒ Object
Sets the mode of hive partitioning to use when reading data.
#hive_partitioning_require_partition_filter=(require_partition_filter) ⇒ Object
Sets whether queries over the table using this external data source require a partition filter that can be used for partition elimination to be specified.
#hive_partitioning_require_partition_filter? ⇒ Boolean
Whether queries over the table using this external data source require a partition filter that can be used for partition elimination to be specified.
#hive_partitioning_source_uri_prefix ⇒ String, nil
The common prefix for all source URIs when hive partition detection is requested.
#hive_partitioning_source_uri_prefix=(source_uri_prefix) ⇒ Object
Sets the common prefix for all source URIs when hive partition detection is requested.
#ignore_unknown ⇒ Boolean
Indicates if BigQuery should allow extra values that are not represented in the table schema.
#ignore_unknown=(new_ignore_unknown) ⇒ Object
Set whether BigQuery should allow extra values that are not represented in the table schema.
#json? ⇒ Boolean
Whether the data format is "NEWLINE_DELIMITED_JSON".
#max_bad_records ⇒ Integer
The maximum number of bad records that BigQuery can ignore when reading data.
#max_bad_records=(new_max_bad_records) ⇒ Object
Set the maximum number of bad records that BigQuery can ignore when reading data.
#orc? ⇒ Boolean
Whether the data format is "ORC".
#parquet? ⇒ Boolean
Whether the data format is "PARQUET".
#reference_file_schema_uri ⇒ String, nil
The URI of a reference file with the table schema.
#reference_file_schema_uri=(uri) ⇒ Object
Sets the URI of a reference file with the table schema.
#sheets? ⇒ Boolean
Whether the data format is "GOOGLE_SHEETS".
#time_format ⇒ String, nil
Format used to parse TIME values.
#time_format=(time_format) ⇒ Object
Sets the format used to parse TIME values.
#time_zone ⇒ String, nil
Time zone used when parsing timestamp values that do not have specific time zone information (e.g. 2024-04-20 12:34:56).
#time_zone=(time_zone) ⇒ Object
Sets the time zone used when parsing timestamp values that do not have specific time zone information (e.g. 2024-04-20 12:34:56).
#timestamp_format ⇒ String, nil
Format used to parse TIMESTAMP values.
#timestamp_format=(timestamp_format) ⇒ Object
Sets the format used to parse TIMESTAMP values.
#urls ⇒ Array<String>
The fully-qualified URIs that point to your data in Google Cloud.

Instance Method Details

#autodetect ⇒ `Boolean`

Indicates if the schema and format options are detected automatically.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.autodetect = true
end

csv_table.autodetect #=> true

Returns:

(Boolean)




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 318

def autodetect
  @gapi.autodetect
end

#autodetect=(new_autodetect) ⇒ `Object`

Set whether to detect schema and format options automatically. Any option specified explicitly will be honored.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.autodetect = true
end

csv_table.autodetect #=> true

Parameters:

new_autodetect (Boolean) —
New autodetect value

# File 'lib/google/cloud/bigquery/external/data_source.rb', line 340

def autodetect= new_autodetect
  frozen_check!
  @gapi.autodetect = new_autodetect
end

#avro? ⇒ `Boolean`

Whether the data format is "AVRO".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

avro_url = "gs://bucket/path/to/*.avro"
avro_table = bigquery.external avro_url

avro_table.format #=> "AVRO"
avro_table.avro? #=> true

Returns:

(Boolean)




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 183

def avro?
  @gapi.source_format == "AVRO"
end

#backup? ⇒ `Boolean`

Whether the data format is "DATASTORE_BACKUP".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

backup_url = "gs://bucket/path/to/data.backup_info"
backup_table = bigquery.external backup_url

backup_table.format #=> "DATASTORE_BACKUP"
backup_table.backup? #=> true

Returns:

(Boolean)




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 203

def backup?
  @gapi.source_format == "DATASTORE_BACKUP"
end

#bigtable? ⇒ `Boolean`

Whether the data format is "BIGTABLE".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

bigtable_url = "https://googleapis.com/bigtable/projects/..."
bigtable_table = bigquery.external bigtable_url

bigtable_table.format #=> "BIGTABLE"
bigtable_table.bigtable? #=> true

Returns:

(Boolean)




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 223

def bigtable?
  @gapi.source_format == "BIGTABLE"
end

#compression ⇒ `String`

The compression type of the data source. Possible values include "GZIP" and nil. The default value is nil. This setting is ignored for Google Cloud Bigtable, Google Cloud Datastore backups and Avro formats. Optional.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.compression = "GZIP"
end

csv_table.compression #=> "GZIP"

Returns:

(String)




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 364

def compression
  @gapi.compression
end

#compression=(new_compression) ⇒ `Object`

Set the compression type of the data source. Possible values include "GZIP" and nil. The default value is nil. This setting is ignored for Google Cloud Bigtable, Google Cloud Datastore backups and Avro formats. Optional.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.compression = "GZIP"
end

csv_table.compression #=> "GZIP"

Parameters:

new_compression (String) —
New compression value

# File 'lib/google/cloud/bigquery/external/data_source.rb', line 388

def compression= new_compression
  frozen_check!
  @gapi.compression = new_compression
end

#csv? ⇒ `Boolean`

Whether the data format is "CSV".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url

csv_table.format #=> "CSV"
csv_table.csv? #=> true

Returns:

(Boolean)




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 123

def csv?
  @gapi.source_format == "CSV"
end

#date_format ⇒ `String`, `nil`

Format used to parse DATE values. Supports SQL-style values. See the date and time formatting guide.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

external_data = bigquery.external "gs://bucket/path/to/data.csv" do |ext|
  ext.date_format = "YYYY-MM-DD"
end

external_data.date_format #=> "YYYY-MM-DD"

Returns:

(String, nil) —
The date format pattern. nil if not set.




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 984

def date_format
  @gapi.date_format
end

#date_format=(date_format) ⇒ `Object`

Sets the format used to parse DATE values. Supports SQL-style values. See the date and time formatting guide.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

external_data = bigquery.external "gs://bucket/path/to/data.csv" do |ext|
  ext.date_format = "YYYY-MM-DD"
end

external_data.date_format #=> "YYYY-MM-DD"

Parameters:

date_format (String, nil) —
The date format pattern. nil to unset.

# File 'lib/google/cloud/bigquery/external/data_source.rb', line 1006

def date_format= date_format
  frozen_check!
  @gapi.date_format = date_format
end

#datetime_format ⇒ `String`, `nil`

Format used to parse DATETIME values. Supports SQL-style values. See the date and time formatting guide.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

external_data = bigquery.external "gs://bucket/path/to/data.csv" do |ext|
  ext.datetime_format = "YYYY-MM-DD HH24:MI:SS"
end

external_data.datetime_format #=> "YYYY-MM-DD HH24:MI:SS"

Returns:

(String, nil) —
The datetime format pattern. nil if not set.




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 940

def datetime_format
  @gapi.datetime_format
end

#datetime_format=(datetime_format) ⇒ `Object`

Sets the format used to parse DATETIME values. Supports SQL-style values. See the date and time formatting guide.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

external_data = bigquery.external "gs://bucket/path/to/data.csv" do |ext|
  ext.datetime_format = "YYYY-MM-DD HH24:MI:SS"
end

external_data.datetime_format #=> "YYYY-MM-DD HH24:MI:SS"

Parameters:

datetime_format (String, nil) —
The datetime format pattern. nil to unset.

# File 'lib/google/cloud/bigquery/external/data_source.rb', line 962

def datetime_format= datetime_format
  frozen_check!
  @gapi.datetime_format = datetime_format
end
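
To make the SQL-style pattern syntax more concrete, the sketch below translates a handful of format elements into Ruby strptime directives and parses a sample value locally. It is only an illustration: `sql_to_strptime` is a hypothetical helper, it covers just the elements `YYYY`, `MM`, `DD`, `HH24`, `MI` and `SS`, and it says nothing about how BigQuery itself parses values.

```ruby
require "date"

# Hypothetical helper (not part of the gem): maps a few SQL-style format
# elements to the equivalent Ruby strptime directives.
def sql_to_strptime pattern
  elements = {
    "YYYY" => "%Y", # four-digit year
    "MM"   => "%m", # month
    "DD"   => "%d", # day of month
    "HH24" => "%H", # hour, 24-hour clock
    "MI"   => "%M", # minute
    "SS"   => "%S"  # second
  }
  pattern.gsub(/YYYY|HH24|MM|DD|MI|SS/) { |element| elements.fetch element }
end

ruby_pattern = sql_to_strptime "YYYY-MM-DD HH24:MI:SS"
parsed = DateTime.strptime "2024-04-20 12:34:56", ruby_pattern

puts ruby_pattern # prints "%Y-%m-%d %H:%M:%S"
puts parsed.hour  # prints 12
```

Elements outside this small table (for example fractional seconds or time zone offsets) pass through unchanged, so the sketch is only suitable for simple patterns.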

#format ⇒ `String`

The data format. For CSV files, specify "CSV". For Google Sheets, specify "GOOGLE_SHEETS". For newline-delimited JSON, specify "NEWLINE_DELIMITED_JSON". For Avro files, specify "AVRO". For Google Cloud Datastore backups, specify "DATASTORE_BACKUP". [Beta] For Google Cloud Bigtable, specify "BIGTABLE".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url

csv_table.format #=> "CSV"

Returns:

(String)




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 103

def format
  @gapi.source_format
end
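
Several examples on this page call `bigquery.external` without a `format:` argument and still report a format, which suggests the format can be inferred from the URL. The sketch below is a local model of that mapping, built only from the URL and format pairs that appear in the examples here. `guess_format` is a hypothetical name and this is not the gem's implementation; the actual inference rules may differ.

```ruby
# Hypothetical model of URL-based format inference, derived only from the
# example URLs on this page. Returns nil when nothing matches, in which
# case a format would have to be given explicitly (as the hive
# partitioning examples do with `format: :parquet`).
def guess_format url
  return "GOOGLE_SHEETS" if url.start_with? "https://docs.google.com/spreadsheets/"
  return "BIGTABLE" if url.start_with? "https://googleapis.com/bigtable/projects/"

  suffixes = {
    ".csv"         => "CSV",
    ".json"        => "NEWLINE_DELIMITED_JSON",
    ".avro"        => "AVRO",
    ".backup_info" => "DATASTORE_BACKUP"
  }
  match = suffixes.find { |suffix, _format| url.end_with? suffix }
  match&.last
end

puts guess_format("gs://bucket/path/to/data.csv") # prints "CSV"
puts guess_format("gs://bucket/path/to/*.avro")   # prints "AVRO"
p guess_format("gs://bucket/path/to/*")           # prints nil
```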

#hive_partitioning? ⇒ `Boolean`

Checks if hive partitioning options are set.

Not all storage formats support hive partitioning. Requesting hive partitioning on an unsupported format will lead to an error. Currently supported types include: avro, csv, json, orc and parquet. If your data is stored in ORC or Parquet on Cloud Storage, see Querying columnar formats on Cloud Storage.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

gcs_uri = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/*"
source_uri_prefix = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/"
external_data = bigquery.external gcs_uri, format: :parquet do |ext|
  ext.hive_partitioning_mode = :auto
  ext.hive_partitioning_require_partition_filter = true
  ext.hive_partitioning_source_uri_prefix = source_uri_prefix
end

external_data.hive_partitioning? #=> true
external_data.hive_partitioning_mode #=> "AUTO"
external_data.hive_partitioning_require_partition_filter? #=> true
external_data.hive_partitioning_source_uri_prefix #=> source_uri_prefix

Returns:

(Boolean) —
true when hive partitioning options are set, or false otherwise.




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 578

def hive_partitioning?
  !@gapi.hive_partitioning_options.nil?
end

#hive_partitioning_mode ⇒ `String`, `nil`

The mode of hive partitioning to use when reading data. The following modes are supported:

AUTO: automatically infer partition key name(s) and type(s).
STRINGS: automatically infer partition key name(s). All types are interpreted as strings.
CUSTOM: partition key schema is encoded in the source URI prefix.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

gcs_uri = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/*"
source_uri_prefix = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/"
external_data = bigquery.external gcs_uri, format: :parquet do |ext|
  ext.hive_partitioning_mode = :auto
  ext.hive_partitioning_require_partition_filter = true
  ext.hive_partitioning_source_uri_prefix = source_uri_prefix
end

external_data.hive_partitioning? #=> true
external_data.hive_partitioning_mode #=> "AUTO"
external_data.hive_partitioning_require_partition_filter? #=> true
external_data.hive_partitioning_source_uri_prefix #=> source_uri_prefix

Returns:

(String, nil) —
The mode of hive partitioning, or nil if not set.




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 609

def hive_partitioning_mode
  @gapi.hive_partitioning_options.mode if hive_partitioning?
end

#hive_partitioning_mode=(mode) ⇒ `Object`

Sets the mode of hive partitioning to use when reading data. The following modes are supported:

auto: automatically infer partition key name(s) and type(s).
strings: automatically infer partition key name(s). All types are interpreted as strings.
custom: partition key schema is encoded in the source URI prefix.

See #format, #hive_partitioning_require_partition_filter= and #hive_partitioning_source_uri_prefix=.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

gcs_uri = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/*"
source_uri_prefix = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/"
external_data = bigquery.external gcs_uri, format: :parquet do |ext|
  ext.hive_partitioning_mode = :auto
  ext.hive_partitioning_require_partition_filter = true
  ext.hive_partitioning_source_uri_prefix = source_uri_prefix
end

external_data.hive_partitioning? #=> true
external_data.hive_partitioning_mode #=> "AUTO"
external_data.hive_partitioning_require_partition_filter? #=> true
external_data.hive_partitioning_source_uri_prefix #=> source_uri_prefix

Parameters:

mode (String, Symbol) —
The mode of hive partitioning to use when reading data.

# File 'lib/google/cloud/bigquery/external/data_source.rb', line 647

def hive_partitioning_mode= mode
  @gapi.hive_partitioning_options ||= Google::Apis::BigqueryV2::HivePartitioningOptions.new
  @gapi.hive_partitioning_options.mode = mode.to_s.upcase
end
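
The reader and writer listings for the hive partitioning mode can be exercised without a BigQuery connection by using stand-in objects. In the sketch below, `HiveOptions` and `SketchSource` are hypothetical stand-ins for the generated API object and the data source; only the method bodies mirror the listings on this page.

```ruby
# Hypothetical stand-in for the generated hive partitioning options object.
HiveOptions = Struct.new :mode, :require_partition_filter, :source_uri_prefix

# Hypothetical stand-in for the data source; method bodies follow the
# listings shown on this page.
class SketchSource
  def initialize
    @hive_options = nil
  end

  # True once any hive partitioning option has been assigned.
  def hive_partitioning?
    !@hive_options.nil?
  end

  # Returns nil until options exist.
  def hive_partitioning_mode
    @hive_options.mode if hive_partitioning?
  end

  # Creates the options lazily and stores the mode as an upper-case string.
  def hive_partitioning_mode= mode
    @hive_options ||= HiveOptions.new
    @hive_options.mode = mode.to_s.upcase
  end
end

source = SketchSource.new
p source.hive_partitioning?     # prints false
p source.hive_partitioning_mode # prints nil

source.hive_partitioning_mode = :strings
p source.hive_partitioning?     # prints true
p source.hive_partitioning_mode # prints "STRINGS"
```

Because the setter normalizes with `to_s.upcase`, the symbol `:auto` and the strings `"auto"` and `"AUTO"` all end up stored as `"AUTO"`.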

#hive_partitioning_require_partition_filter=(require_partition_filter) ⇒ `Object`

Sets whether queries over the table using this external data source require a partition filter that can be used for partition elimination to be specified.

See #format, #hive_partitioning_mode= and #hive_partitioning_source_uri_prefix=.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

gcs_uri = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/*"
source_uri_prefix = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/"
external_data = bigquery.external gcs_uri, format: :parquet do |ext|
  ext.hive_partitioning_mode = :auto
  ext.hive_partitioning_require_partition_filter = true
  ext.hive_partitioning_source_uri_prefix = source_uri_prefix
end

external_data.hive_partitioning? #=> true
external_data.hive_partitioning_mode #=> "AUTO"
external_data.hive_partitioning_require_partition_filter? #=> true
external_data.hive_partitioning_source_uri_prefix #=> source_uri_prefix

Parameters:

require_partition_filter (Boolean) —
true if a partition filter must be specified, false otherwise.

# File 'lib/google/cloud/bigquery/external/data_source.rb', line 708

def hive_partitioning_require_partition_filter= require_partition_filter
  @gapi.hive_partitioning_options ||= Google::Apis::BigqueryV2::HivePartitioningOptions.new
  @gapi.hive_partitioning_options.require_partition_filter = require_partition_filter
end

#hive_partitioning_require_partition_filter? ⇒ `Boolean`

Whether queries over the table using this external data source require a partition filter that can be used for partition elimination to be specified. Note that this field should only be true when creating a permanent external table or querying a temporary external table.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

gcs_uri = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/*"
source_uri_prefix = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/"
external_data = bigquery.external gcs_uri, format: :parquet do |ext|
  ext.hive_partitioning_mode = :auto
  ext.hive_partitioning_require_partition_filter = true
  ext.hive_partitioning_source_uri_prefix = source_uri_prefix
end

external_data.hive_partitioning? #=> true
external_data.hive_partitioning_mode #=> "AUTO"
external_data.hive_partitioning_require_partition_filter? #=> true
external_data.hive_partitioning_source_uri_prefix #=> source_uri_prefix

Returns:

(Boolean) —
true when queries over this table require a partition filter, or false otherwise.

# File 'lib/google/cloud/bigquery/external/data_source.rb', line 677

def hive_partitioning_require_partition_filter?
  return false unless hive_partitioning?
  !@gapi.hive_partitioning_options.require_partition_filter.nil?
end
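
Read literally, the listing above tests whether a value has been assigned rather than the value itself. The stand-in sketch below reproduces that logic so the effect can be observed locally; `FilterOptions` and `FilterSketch` are hypothetical names, and the behavior of the installed gem version should be verified before relying on this.

```ruby
# Hypothetical stand-in for the generated options object.
FilterOptions = Struct.new :require_partition_filter

# Hypothetical stand-in whose predicate copies the logic of the listing
# on this page.
class FilterSketch
  def initialize
    @options = nil
  end

  def require_partition_filter= value
    @options ||= FilterOptions.new
    @options.require_partition_filter = value
  end

  def require_partition_filter?
    return false if @options.nil?
    !@options.require_partition_filter.nil?
  end
end

unset = FilterSketch.new
p unset.require_partition_filter? # prints false

enabled = FilterSketch.new
enabled.require_partition_filter = true
p enabled.require_partition_filter? # prints true

disabled = FilterSketch.new
disabled.require_partition_filter = false
p disabled.require_partition_filter? # prints true, since false is not nil
```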

#hive_partitioning_source_uri_prefix ⇒ `String`, `nil`

The common prefix for all source URIs when hive partition detection is requested. The prefix must end immediately before the partition key encoding begins. For example, consider files following this data layout:

gs://bucket/path_to_table/dt=2019-01-01/country=BR/id=7/file.avro
gs://bucket/path_to_table/dt=2018-12-31/country=CA/id=3/file.avro

When hive partitioning is requested with either AUTO or STRINGS mode, the common prefix can be either of gs://bucket/path_to_table or gs://bucket/path_to_table/ (trailing slash does not matter).

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

gcs_uri = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/*"
source_uri_prefix = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/"
external_data = bigquery.external gcs_uri, format: :parquet do |ext|
  ext.hive_partitioning_mode = :auto
  ext.hive_partitioning_require_partition_filter = true
  ext.hive_partitioning_source_uri_prefix = source_uri_prefix
end

external_data.hive_partitioning? #=> true
external_data.hive_partitioning_mode #=> "AUTO"
external_data.hive_partitioning_require_partition_filter? #=> true
external_data.hive_partitioning_source_uri_prefix #=> source_uri_prefix

Returns:

(String, nil) —
The common prefix for all source URIs, or nil if not set.




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 746

def hive_partitioning_source_uri_prefix
  @gapi.hive_partitioning_options.source_uri_prefix if hive_partitioning?
end
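
The relationship between the source URI prefix and the partition key encoding can be illustrated locally. The sketch below strips the prefix from an object URI and reads the remaining `key=value` path segments; `partition_keys` is a hypothetical helper for illustration and does not reflect how BigQuery performs partition detection.

```ruby
# Hypothetical helper: extracts key=value path segments that follow the
# common prefix. A trailing slash on the prefix is ignored.
def partition_keys uri, prefix
  base = prefix.chomp "/"
  raise ArgumentError, "#{uri} is not under #{base}" unless uri.start_with? "#{base}/"

  segments = uri.delete_prefix("#{base}/").split "/"
  # Keep only the key=value segments; the file name has no "=".
  segments.select { |segment| segment.include? "=" }
          .to_h { |segment| segment.split "=", 2 }
end

uri = "gs://bucket/path_to_table/dt=2019-01-01/country=BR/id=7/file.avro"

without_slash = partition_keys uri, "gs://bucket/path_to_table"
with_slash    = partition_keys uri, "gs://bucket/path_to_table/"

puts without_slash.keys.join(", ") # prints "dt, country, id"
puts without_slash["country"]      # prints "BR"
puts without_slash == with_slash   # prints true
```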

#hive_partitioning_source_uri_prefix=(source_uri_prefix) ⇒ `Object`

Sets the common prefix for all source URIs when hive partition detection is requested. The prefix must end immediately before the partition key encoding begins. For example, consider files following this data layout:

gs://bucket/path_to_table/dt=2019-01-01/country=BR/id=7/file.avro
gs://bucket/path_to_table/dt=2018-12-31/country=CA/id=3/file.avro

See #format, #hive_partitioning_mode= and #hive_partitioning_require_partition_filter=.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

gcs_uri = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/*"
source_uri_prefix = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/"
external_data = bigquery.external gcs_uri, format: :parquet do |ext|
  ext.hive_partitioning_mode = :auto
  ext.hive_partitioning_require_partition_filter = true
  ext.hive_partitioning_source_uri_prefix = source_uri_prefix
end

external_data.hive_partitioning? #=> true
external_data.hive_partitioning_mode #=> "AUTO"
external_data.hive_partitioning_require_partition_filter? #=> true
external_data.hive_partitioning_source_uri_prefix #=> source_uri_prefix

Parameters:

source_uri_prefix (String) —
The common prefix for all source URIs.

# File 'lib/google/cloud/bigquery/external/data_source.rb', line 785

def hive_partitioning_source_uri_prefix= source_uri_prefix
  @gapi.hive_partitioning_options ||= Google::Apis::BigqueryV2::HivePartitioningOptions.new
  @gapi.hive_partitioning_options.source_uri_prefix = source_uri_prefix
end

#ignore_unknown ⇒ `Boolean`

Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.

What BigQuery treats as an extra value depends on the format: in CSV, trailing columns; in JSON, named values that don't match any column names. This setting is ignored for Google Cloud Bigtable, Google Cloud Datastore backups and Avro formats. Optional.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.ignore_unknown = true
end

csv_table.ignore_unknown #=> true

Returns:

(Boolean)




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 419

def ignore_unknown
  @gapi.ignore_unknown_values
end
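
The effect of `ignore_unknown` on a JSON record can be modeled locally. In the sketch below, `check_record` is a hypothetical helper that classifies one newline-delimited JSON record against a list of column names; it is an illustration of the rule described above, not BigQuery's reader.

```ruby
require "json"

# Hypothetical local model: a record with keys outside the schema is
# either trimmed (ignore_unknown: true) or reported as a bad record.
def check_record line, columns, ignore_unknown: false
  record = JSON.parse line
  extras = record.keys - columns
  return [:ok, record] if extras.empty?
  return [:ok, record.slice(*columns)] if ignore_unknown

  [:bad, extras]
end

columns = ["name", "age"]
line = '{"name":"Ada","age":36,"nickname":"Countess"}'

status, detail = check_record line, columns
puts status       # prints "bad"
puts detail.first # prints "nickname"

status, detail = check_record line, columns, ignore_unknown: true
puts status             # prints "ok"
puts detail.keys.length # prints 2
```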

#ignore_unknown=(new_ignore_unknown) ⇒ `Object`

Set whether BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.ignore_unknown = true
end

csv_table.ignore_unknown #=> true

Parameters:

new_ignore_unknown (Boolean) —
New ignore_unknown value

# File 'lib/google/cloud/bigquery/external/data_source.rb', line 449

def ignore_unknown= new_ignore_unknown
  frozen_check!
  @gapi.ignore_unknown_values = new_ignore_unknown
end

#json? ⇒ `Boolean`

Whether the data format is "NEWLINE_DELIMITED_JSON".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

json_url = "gs://bucket/path/to/data.json"
json_table = bigquery.external json_url

json_table.format #=> "NEWLINE_DELIMITED_JSON"
json_table.json? #=> true

Returns:

(Boolean)




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 143

def json?
  @gapi.source_format == "NEWLINE_DELIMITED_JSON"
end

#max_bad_records ⇒ `Integer`

The maximum number of bad records that BigQuery can ignore when reading data. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid. This setting is ignored for Google Cloud Bigtable, Google Cloud Datastore backups and Avro formats.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.max_bad_records = 10
end

csv_table.max_bad_records #=> 10

Returns:

(Integer)




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 476

def max_bad_records
  @gapi.max_bad_records
end
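
The threshold rule for bad records can be sketched as a small local model. `read_records` below is a hypothetical helper: it splits records into valid and invalid using a caller-supplied check and fails once the number of invalid records exceeds the limit. The error class and message are assumptions chosen for the sketch, not what BigQuery returns.

```ruby
# Hypothetical model of the max_bad_records rule. The block decides
# whether a record is valid; bad records are tolerated up to the limit.
def read_records records, max_bad_records: 0
  good, bad = records.partition { |record| yield record }
  if bad.size > max_bad_records
    raise ArgumentError, "invalid: #{bad.size} bad records exceed the limit of #{max_bad_records}"
  end
  good
end

rows = ["1", "2", "oops", "4"]
numeric = ->(row) { row.match?(/\A\d+\z/) }

# One bad record is tolerated when the limit is 1.
puts read_records(rows, max_bad_records: 1, &numeric).length # prints 3

# With the default limit of 0, the same input is rejected.
begin
  read_records(rows, &numeric)
rescue ArgumentError => e
  puts e.message
end
```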

#max_bad_records=(new_max_bad_records) ⇒ `Object`

Set the maximum number of bad records that BigQuery can ignore when reading data. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid. This setting is ignored for Google Cloud Bigtable, Google Cloud Datastore backups and Avro formats.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.max_bad_records = 10
end

csv_table.max_bad_records #=> 10

Parameters:

new_max_bad_records (Integer) —
New max_bad_records value

# File 'lib/google/cloud/bigquery/external/data_source.rb', line 502

def max_bad_records= new_max_bad_records
  frozen_check!
  @gapi.max_bad_records = new_max_bad_records
end

#orc? ⇒ `Boolean`

Whether the data format is "ORC".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

gcs_uri = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/*"
source_uri_prefix = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/"
external_data = bigquery.external gcs_uri, format: :orc do |ext|
  ext.hive_partitioning_mode = :auto
  ext.hive_partitioning_source_uri_prefix = source_uri_prefix
end
external_data.format #=> "ORC"
external_data.orc? #=> true

Returns:

(Boolean)




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 246

def orc?
  @gapi.source_format == "ORC"
end

#parquet? ⇒ `Boolean`

Whether the data format is "PARQUET".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

gcs_uri = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/*"
source_uri_prefix = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/"
external_data = bigquery.external gcs_uri, format: :parquet do |ext|
  ext.hive_partitioning_mode = :auto
  ext.hive_partitioning_source_uri_prefix = source_uri_prefix
end
external_data.format #=> "PARQUET"
external_data.parquet? #=> true

Returns:

(Boolean)




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 269

def parquet?
  @gapi.source_format == "PARQUET"
end

#reference_file_schema_uri ⇒ `String`, `nil`

The URI of a reference file with the table schema. This is enabled for the following formats: AVRO, PARQUET, ORC.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

external_data = bigquery.external "gs://bucket/path/to/data.avro", format: :avro do |ext|
  ext.reference_file_schema_uri = "gs://bucket/path/to/schema.json"
end

external_data.reference_file_schema_uri #=> "gs://bucket/path/to/schema.json"

Returns:

(String, nil) —
The URI. nil if not set.




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 524

def reference_file_schema_uri
  @gapi.reference_file_schema_uri
end

#reference_file_schema_uri=(uri) ⇒ `Object`

Sets the URI of a reference file with the table schema. This is enabled for the following formats: AVRO, PARQUET, ORC.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

external_data = bigquery.external "gs://bucket/path/to/data.avro", format: :avro do |ext|
  ext.reference_file_schema_uri = "gs://bucket/path/to/schema.json"
end

external_data.reference_file_schema_uri #=> "gs://bucket/path/to/schema.json"

Parameters:

uri (String, nil) —
The URI. nil to unset.

# File 'lib/google/cloud/bigquery/external/data_source.rb', line 545

def reference_file_schema_uri= uri
  frozen_check!
  @gapi.reference_file_schema_uri = uri
end

#sheets? ⇒ `Boolean`

Whether the data format is "GOOGLE_SHEETS".

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sheets_url = "https://docs.google.com/spreadsheets/d/1234567980"
sheets_table = bigquery.external sheets_url

sheets_table.format #=> "GOOGLE_SHEETS"
sheets_table.sheets? #=> true

Returns:

(Boolean)




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 163

def sheets?
  @gapi.source_format == "GOOGLE_SHEETS"
end

#time_format ⇒ `String`, `nil`

Format used to parse TIME values. Supports SQL-style values. See the date and time formatting guide.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

external_data = bigquery.external "gs://bucket/path/to/data.csv" do |ext|
  ext.time_format = "HH24:MI:SS"
end

external_data.time_format #=> "HH24:MI:SS"

Returns:

(String, nil) —
The time format pattern. nil if not set.




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 852

def time_format
  @gapi.time_format
end

#time_format=(time_format) ⇒ `Object`

Sets the format used to parse TIME values. Supports SQL-style values. See the date and time formatting guide.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

external_data = bigquery.external "gs://bucket/path/to/data.csv" do |ext|
  ext.time_format = "HH24:MI:SS"
end

external_data.time_format #=> "HH24:MI:SS"

Parameters:

time_format (String, nil) —
The time format pattern. nil to unset.

# File 'lib/google/cloud/bigquery/external/data_source.rb', line 874

def time_format= time_format
  frozen_check!
  @gapi.time_format = time_format
end

#time_zone ⇒ `String`, `nil`

Time zone used when parsing timestamp values that do not have specific time zone information (e.g. 2024-04-20 12:34:56). The expected format is an IANA timezone string (e.g. America/Los_Angeles).

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

external_data = bigquery.external "gs://bucket/path/to/data.csv" do |ext|
  ext.time_zone = "America/Los_Angeles"
end

external_data.time_zone #=> "America/Los_Angeles"

Returns:

(String, nil) —
The IANA time zone name. nil if not set.




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 808

def time_zone
  @gapi.time_zone
end
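
The effect of the time zone on a value such as 2024-04-20 12:34:56 can be illustrated with plain Ruby. This is only a sketch: Ruby's core Time does not resolve IANA names, so the fixed offset -07:00 stands in for America/Los_Angeles on that date (an assumption of this example), and UTC is used as the comparison zone.

```ruby
require "time"

# Wall-clock components of a timestamp with no zone information.
wall_clock = [2024, 4, 20, 12, 34, 56]

# The same wall-clock reading interpreted in two zones. The -07:00 offset
# is an assumed stand-in for America/Los_Angeles on this date.
as_utc = Time.new(*wall_clock, "+00:00")
as_la = Time.new(*wall_clock, "-07:00")

hours_apart = (as_la - as_utc) / 3600

puts as_utc.getutc.iso8601
puts as_la.getutc.iso8601
puts hours_apart
```

The two interpretations name instants seven hours apart, which is why the setting matters whenever the source data omits zone information.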

#time_zone=(time_zone) ⇒ `Object`

Sets the time zone used when parsing timestamp values that do not have specific time zone information (e.g. 2024-04-20 12:34:56). The expected format is an IANA timezone string (e.g. America/Los_Angeles).

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

external_data = bigquery.external "gs://bucket/path/to/data.csv" do |ext|
  ext.time_zone = "America/Los_Angeles"
end

external_data.time_zone #=> "America/Los_Angeles"

Parameters:

time_zone (String, nil) —
The IANA time zone name. nil to unset.

# File 'lib/google/cloud/bigquery/external/data_source.rb', line 830

def time_zone= time_zone
  frozen_check!
  @gapi.time_zone = time_zone
end

#timestamp_format ⇒ `String?`

Format used to parse TIMESTAMP values. Supports SQL-style values. See the date and time formatting guide.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

external_data = bigquery.external "gs://bucket/path/to/data.csv" do |ext|
  ext.timestamp_format = "YYYY-MM-DD HH24:MI:SS.FF3 TZH"
end

external_data.timestamp_format #=> "YYYY-MM-DD HH24:MI:SS.FF3 TZH"

Returns:

(String, nil) —
The timestamp format pattern. nil if not set.




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 896

def timestamp_format
  @gapi.timestamp_format
end

#timestamp_format=(timestamp_format) ⇒ `Object`

Sets the format used to parse TIMESTAMP values. Supports SQL-style values. See the date and time formatting guide.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

external_data = bigquery.external "gs://bucket/path/to/data.csv" do |ext|
  ext.timestamp_format = "YYYY-MM-DD HH24:MI:SS.FF3 TZH"
end

external_data.timestamp_format #=> "YYYY-MM-DD HH24:MI:SS.FF3 TZH"

Parameters:

timestamp_format (String, nil) —
The timestamp format pattern. nil to unset.

# File 'lib/google/cloud/bigquery/external/data_source.rb', line 918

def timestamp_format= timestamp_format
  frozen_check!
  @gapi.timestamp_format = timestamp_format
end

#urls ⇒ `Array<String>`

The fully-qualified URIs that point to your data in Google Cloud. For Google Cloud Storage URIs: each URI can contain one '*' wildcard character, and it must come after the bucket name. Size limits related to load jobs apply to external data sources. For Google Cloud Bigtable URIs: exactly one URI can be specified, and it has to be a fully specified and valid HTTPS URL for a Google Cloud Bigtable table. For Google Cloud Datastore backups: exactly one URI can be specified, and it must end with '.backup_info'. The '*' wildcard character is not allowed in this case.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url

csv_table.urls #=> ["gs://bucket/path/to/data.csv"]

Returns:

(Array<String>)




# File 'lib/google/cloud/bigquery/external/data_source.rb', line 296

def urls
  @gapi.source_uris
end
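
The URI rules described for #urls can be checked locally before a data source is created. The helpers below are a minimal sketch of two of those rules, the Cloud Storage wildcard rule and the Datastore backup rule. Both gcs_wildcard_ok? and backup_uris_ok? are hypothetical names and are not part of google-cloud-bigquery; the service remains the authority on what it accepts.

```ruby
# Hypothetical helper: checks the Cloud Storage wildcard rule, meaning at
# most one '*' and only after the bucket name.
def gcs_wildcard_ok? uri
  match = uri.match(%r{\Ags://([^/]+)/(.*)\z})
  return false unless match
  bucket, object_path = match.captures
  !bucket.include?("*") && object_path.count("*") <= 1
end

# Hypothetical helper: checks the Datastore backup rules, meaning exactly
# one URI that ends in '.backup_info' and contains no wildcard.
def backup_uris_ok? uris
  return false unless uris.length == 1
  uri = uris.first
  uri.end_with?(".backup_info") && !uri.include?("*")
end

puts gcs_wildcard_ok?("gs://bucket/path/to/*.avro")
puts gcs_wildcard_ok?("gs://bucket/*/to/*.avro")
puts gcs_wildcard_ok?("gs://buck*t/path/data.csv")
puts backup_uris_ok?(["gs://bucket/path/to/export.backup_info"])
puts backup_uris_ok?(["gs://bucket/path/to/*.backup_info"])
```

Only the first and fourth calls pass: the second URI has two wildcards, the third places the wildcard in the bucket name, and the last combines a wildcard with a backup URI.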
