Class: Google::Cloud::Bigquery::Project

Inherits:
Object
Defined in:
lib/google/cloud/bigquery/project.rb,
lib/google/cloud/bigquery/project/list.rb

Overview

Project

Projects are top-level containers in Google Cloud Platform. They store information about billing and authorized users, and they contain BigQuery data. Each project has a friendly name and a unique ID.

Google::Cloud::Bigquery::Project is the main object for interacting with Google BigQuery. Dataset objects are created, accessed, and deleted by Google::Cloud::Bigquery::Project.

See Google::Cloud#bigquery.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

Defined Under Namespace

Classes: List

Instance Attribute Summary

Data

Instance Method Summary

Constructor Details

#initialize(service) ⇒ Project

Creates a new Project instance for the given Service.

See Google::Cloud.bigquery



# File 'lib/google/cloud/bigquery/project.rb', line 66

def initialize service
  @service = service
end

Instance Attribute Details

#name ⇒ String? (readonly)

The descriptive name of the project. Can only be present if the project was retrieved with #projects.

Returns:

  • (String, nil)

    the current value of name



# File 'lib/google/cloud/bigquery/project.rb', line 54

def name
  @name
end

#numeric_id ⇒ Integer? (readonly)

The numeric ID of the project. Can only be present if the project was retrieved with #projects.

Returns:

  • (Integer, nil)

    the current value of numeric_id



# File 'lib/google/cloud/bigquery/project.rb', line 54

def numeric_id
  @numeric_id
end
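
For illustration, both attributes are populated when the projects are listed with #projects (a minimal sketch based on that method):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

bigquery.projects.each do |project|
  # name and numeric_id are present because the projects were retrieved with #projects.
  puts "#{project.project_id}: #{project.name} (#{project.numeric_id})"
end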

Instance Method Details

#copy(source_table, destination_table, create: nil, write: nil, reservation: nil) {|job| ... } ⇒ Boolean

Copies the data from the source table to the destination table using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See #copy_job for the asynchronous version. Use this method instead of Table#copy to copy from source tables in other projects.

The geographic location for the job ("US", "EU", etc.) can be set via CopyJob::Updater#location= in a block passed to this method.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
destination_table = dataset.table "my_destination_table"

bigquery.copy "bigquery-public-data.samples.shakespeare",
              destination_table
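
Set the job's geographic location in a block (a sketch; the location value is illustrative):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
destination_table = dataset.table "my_destination_table"

bigquery.copy "bigquery-public-data.samples.shakespeare",
              destination_table do |copy|
  copy.location = "US"
end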

Parameters:

  • source_table (String, Table)

    The source table for the copied data. This can be a table object; or a string ID as specified by the Standard SQL Query Reference (project-name.dataset_id.table_id) or the Legacy SQL Query Reference (project-name:dataset_id.table_id).

  • destination_table (String, Table)

    The destination table for the copied data. This can be a table object; or a string ID as specified by the Standard SQL Query Reference (project-name.dataset_id.table_id) or the Legacy SQL Query Reference (project-name:dataset_id.table_id).

  • create (String) (defaults to: nil)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String) (defaults to: nil)

    Specifies how to handle data already present in the destination table. The default value is empty.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - An error will be returned if the destination table already contains data.
  • reservation (String) (defaults to: nil)

The reservation that the job would use. The user can specify a reservation to execute the job. If a reservation is not set, it is determined based on the rules defined by the reservation assignments. The expected format is projects/{project}/locations/{location}/reservations/{reservation}.

Yields:

  • (job)

    a job configuration object

Yield Parameters:

Returns:

  • (Boolean)

    Returns true if the copy operation succeeded.



# File 'lib/google/cloud/bigquery/project.rb', line 293

def copy source_table, destination_table, create: nil, write: nil, reservation: nil, &block
  job = copy_job source_table, destination_table, create: create, write: write, reservation: reservation, &block
  job.wait_until_done!
  ensure_job_succeeded! job
  true
end

#copy_job(source_table, destination_table, create: nil, write: nil, job_id: nil, prefix: nil, labels: nil, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::CopyJob

Copies the data from the source table to the destination table using an asynchronous method. In this method, a CopyJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See #copy for the synchronous version. Use this method instead of Table#copy_job to copy from source tables in other projects.

The geographic location for the job ("US", "EU", etc.) can be set via CopyJob::Updater#location= in a block passed to this method.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
source_table_id = "bigquery-public-data.samples.shakespeare"
destination_table = dataset.table "my_destination_table"

copy_job = bigquery.copy_job source_table_id, destination_table

copy_job.wait_until_done!
copy_job.done? #=> true
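
Dispositions and labels can also be passed as options (a sketch continuing the example above; the label is illustrative):

copy_job = bigquery.copy_job source_table_id,
                             destination_table,
                             create: "needed",
                             write:  "truncate",
                             labels: { "team" => "reporting" }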

Parameters:

  • source_table (String, Table)

    The source table for the copied data. This can be a table object; or a string ID as specified by the Standard SQL Query Reference (project-name.dataset_id.table_id) or the Legacy SQL Query Reference (project-name:dataset_id.table_id).

  • destination_table (String, Table)

    The destination table for the copied data. This can be a table object; or a string ID as specified by the Standard SQL Query Reference (project-name.dataset_id.table_id) or the Legacy SQL Query Reference (project-name:dataset_id.table_id).

  • create (String) (defaults to: nil)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String) (defaults to: nil)

    Specifies how to handle data already present in the destination table. The default value is empty.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - An error will be returned if the destination table already contains data.
  • job_id (String) (defaults to: nil)

    A user-defined ID for the copy job. The ID must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

    See Generating a job ID.

  • prefix (String) (defaults to: nil)

    A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.

  • labels (Hash) (defaults to: nil)

    A hash of user-provided labels associated with the job. You can use these to organize and group your jobs.

    The labels applied to a resource must meet the following requirements:

    • Each resource can have multiple labels, up to a maximum of 64.
    • Each label must be a key-value pair.
    • Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
    • Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
    • The key portion of a label must be unique. However, you can use the same key with multiple resources.
    • Keys must start with a lowercase letter or international character.
  • reservation (String) (defaults to: nil)

The reservation that the job would use. The user can specify a reservation to execute the job. If a reservation is not set, it is determined based on the rules defined by the reservation assignments. The expected format is projects/{project}/locations/{location}/reservations/{reservation}.

Yields:

  • (job)

    a job configuration object

Yield Parameters:

Returns:



# File 'lib/google/cloud/bigquery/project.rb', line 208

def copy_job source_table, destination_table, create: nil, write: nil, job_id: nil, prefix: nil, labels: nil,
             reservation: nil
  ensure_service!
  options = { create: create, write: write, labels: labels, job_id: job_id, prefix: prefix,
    reservation: reservation }

  updater = CopyJob::Updater.from_options(
    service,
    Service.get_table_ref(source_table, default_ref: project_ref),
    Service.get_table_ref(destination_table, default_ref: project_ref),
    options
  )

  yield updater if block_given?

  job_gapi = updater.to_gapi
  gapi = service.copy_table job_gapi
  Job.from_gapi gapi, service
end

#create_dataset(dataset_id, name: nil, description: nil, expiration: nil, location: nil, access_policy_version: nil, dataset_view: nil) {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset

Creates a new dataset.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset"

A name and description can be provided:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"

Or, configure access with a block: (See Dataset::Access)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset" do |dataset|
  dataset.access.add_writer_user "[email protected]"
end
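
A location and a default table expiration can also be set at creation (a sketch; the values shown are illustrative, with the expiration at the documented one-hour minimum):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.create_dataset "my_dataset",
                                  location: "EU",
                                  expiration: 3_600_000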

Parameters:

  • dataset_id (String)

    A unique ID for this dataset, without the project name. The ID must contain only letters ([A-Za-z]), numbers ([0-9]), or underscores (_). The maximum length is 1,024 characters.

  • name (String) (defaults to: nil)

    A descriptive name for the dataset.

  • description (String) (defaults to: nil)

    A user-friendly description of the dataset.

  • expiration (Integer) (defaults to: nil)

    The default lifetime of all tables in the dataset, in milliseconds. The minimum value is 3_600_000 (one hour).

  • location (String) (defaults to: nil)

    The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.

  • access_policy_version (Integer) (defaults to: nil)

Optional. The version of the provided access policy schema. Valid values are 0, 1, and 3. Requests specifying an invalid value will be rejected. This version refers to the schema version of the access policy, not the version of the access policy itself. This field's value must be equal to or greater than the access policy schema version provided in the request. For example, requests with conditional access policy bindings in datasets must specify version 3. A dataset with no conditional role bindings in its access policy may specify any valid value or leave the field unset. If the field is unset, or if 0 or 1 is used for a dataset with conditional bindings, the request will be rejected. This field is mapped to the IAM Policy version and is used to set the policy in IAM.

  • dataset_view (String) (defaults to: nil)

    The dataset_view parameter is an optional field in the GetDatasetRequest used to specify which information about a BigQuery dataset should be returned in the response. By controlling this parameter, users can request a partial or full response, which helps enforce fine-grained access control based on their permissions. DatasetView provides constants for this parameter.

Yields:

  • (access)

    a block for setting rules

Yield Parameters:

Returns:



# File 'lib/google/cloud/bigquery/project.rb', line 1651

def create_dataset dataset_id, name: nil, description: nil,
                   expiration: nil, location: nil, access_policy_version: nil,
                   dataset_view: nil
  ensure_service!

  new_ds = Google::Apis::BigqueryV2::Dataset.new(
    dataset_reference: Google::Apis::BigqueryV2::DatasetReference.new(
      project_id: project, dataset_id: dataset_id
    )
  )

  # Can set location only on creation, no Dataset#location method
  new_ds.update! location: location unless location.nil?

  updater = Dataset::Updater.new(new_ds).tap do |b|
    b.name = name unless name.nil?
    b.description = description unless description.nil?
    b.default_expiration = expiration unless expiration.nil?
  end

  if block_given?
    yield updater
    updater.check_for_mutated_access!
  end

  gapi = service.insert_dataset new_ds, access_policy_version: access_policy_version
  Dataset.from_gapi gapi, service, access_policy_version: access_policy_version, dataset_view: dataset_view
end

#dataset(dataset_id, skip_lookup: nil, project_id: nil, access_policy_version: nil, dataset_view: nil) ⇒ Google::Cloud::Bigquery::Dataset?

Retrieves an existing dataset by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
puts dataset.name

Retrieve a dataset in another project:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", project_id: "another_project"
puts dataset.name

Avoid retrieving the dataset resource with skip_lookup:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset", skip_lookup: true

Parameters:

  • dataset_id (String)

    The ID of a dataset.

  • skip_lookup (Boolean) (defaults to: nil)

    Optionally create just a local reference object without verifying that the resource exists on the BigQuery service. Calls made on this object will raise errors if the resource does not exist. Default is false. Optional.

  • project_id (String) (defaults to: nil)

    The GCP Project where the dataset lives.

  • access_policy_version (Integer) (defaults to: nil)

Optional. The version of the provided access policy schema. Valid values are 0, 1, and 3. Requests specifying an invalid value will be rejected. This version refers to the schema version of the access policy, not the version of the access policy itself. This field's value must be equal to or greater than the access policy schema version provided in the request. For example, requests with conditional access policy bindings in datasets must specify version 3. A dataset with no conditional role bindings in its access policy may specify any valid value or leave the field unset. If the field is unset, or if 0 or 1 is used for a dataset with conditional bindings, the request will be rejected. This field is mapped to the IAM Policy version and is used to set the policy in IAM.

  • dataset_view (String) (defaults to: nil)

    The dataset_view parameter is an optional field in the GetDatasetRequest used to specify which information about a BigQuery dataset should be returned in the response. By controlling this parameter, users can request a partial or full response, which helps enforce fine-grained access control based on their permissions. DatasetView provides constants for this parameter.

Returns:



# File 'lib/google/cloud/bigquery/project.rb', line 1573

def dataset dataset_id, skip_lookup: nil, project_id: nil, access_policy_version: nil, dataset_view: nil
  ensure_service!
  project_id ||= project
  return Dataset.new_reference project_id, dataset_id, service if skip_lookup
  gapi = service.get_project_dataset project_id, dataset_id, access_policy_version: access_policy_version,
dataset_view: dataset_view
  Dataset.from_gapi gapi, service, access_policy_version: access_policy_version, dataset_view: dataset_view
rescue Google::Cloud::NotFoundError
  nil
end

#datasets(all: nil, filter: nil, token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Dataset>

Retrieves the list of datasets belonging to the project.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

datasets = bigquery.datasets
datasets.each do |dataset|
  puts dataset.name
end

Retrieve hidden datasets with the all optional arg:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

all_datasets = bigquery.datasets all: true

Retrieve all datasets: (See Dataset::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

datasets = bigquery.datasets
datasets.all do |dataset|
  puts dataset.name
end
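
Filter datasets by label with the filter option (a sketch; the label name and value are illustrative):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

datasets = bigquery.datasets filter: "labels.department:receiving"
datasets.each do |dataset|
  puts dataset.dataset_id
end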

Parameters:

  • all (Boolean) (defaults to: nil)

    Whether to list all datasets, including hidden ones. The default is false.

  • filter (String) (defaults to: nil)

    An expression for filtering the results of the request by label. The syntax is labels.<name>[:<value>]. Multiple filters can be ANDed together by connecting with a space. Example: labels.department:receiving labels.active. See Filtering datasets using labels.

  • token (String) (defaults to: nil)

    A previously-returned page token representing part of the larger set of results to view.

  • max (Integer) (defaults to: nil)

    Maximum number of datasets to return.

Returns:



# File 'lib/google/cloud/bigquery/project.rb', line 1724

def datasets all: nil, filter: nil, token: nil, max: nil
  ensure_service!
  gapi = service.list_datasets all: all, filter: filter, token: token, max: max
  Dataset::List.from_gapi gapi, service, all, filter, max
end

#encryption(kms_key: nil) ⇒ Google::Cloud::Bigquery::EncryptionConfiguration

Creates a new Bigquery::EncryptionConfiguration instance.

This method does not execute an API call. Use the encryption configuration to encrypt a table when creating one via Bigquery::Dataset#create_table, Bigquery::Dataset#load, Bigquery::Table#copy, or Bigquery::Project#query.

Examples:

Encrypt a new table

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name

table = dataset.create_table "my_table" do |updater|
  updater.encryption = encrypt_config
end

Encrypt a load destination table

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name
job = dataset.load_job "my_table", "gs://abc/file" do |job|
  job.encryption = encrypt_config
end

Encrypt a copy destination table

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name
job = table.copy_job "my_dataset.new_table" do |job|
  job.encryption = encrypt_config
end

Encrypt a query destination table

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name
job = bigquery.query_job "SELECT 1;" do |query|
  query.table = dataset.table "my_table", skip_lookup: true
  query.encryption = encrypt_config
end

Parameters:

  • kms_key (String) (defaults to: nil)

    Name of the Cloud KMS encryption key that will be used to protect the destination BigQuery table. The BigQuery Service Account associated with your project requires access to this encryption key.

Returns:



# File 'lib/google/cloud/bigquery/project.rb', line 2085

def encryption kms_key: nil
  encrypt_config = Bigquery::EncryptionConfiguration.new
  encrypt_config.kms_key = kms_key unless kms_key.nil?
  encrypt_config
end

#external(url, format: nil) {|ext| ... } ⇒ External::DataSource

Creates a new External::DataSource (or subclass) object that represents the external data source that can be queried from directly, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

data = bigquery.query "SELECT * FROM my_ext_table",
                      external: { my_ext_table: csv_table }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?
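
Specify the format explicitly when it cannot be derived from the URL (a sketch; the bucket path is illustrative):

json_url = "gs://bucket/path/to/data.txt"
json_table = bigquery.external json_url, format: :json

data = bigquery.query "SELECT * FROM my_ext_table",
                      external: { my_ext_table: json_table }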

Parameters:

  • url (String, Array<String>)

    The fully-qualified URL(s) that point to your data in Google Cloud. An attempt will be made to derive the format from the URLs provided.

  • format (String|Symbol) (defaults to: nil)

    The data format. This value will be used even if the provided URLs are recognized as a different format. Optional.

    The following values are supported:

    • csv - CSV
    • json - Newline-delimited JSON
    • avro - Avro
    • sheets - Google Sheets
    • datastore_backup - Cloud Datastore backup
    • bigtable - Bigtable

Yields:

  • (ext)

Returns:

See Also:



# File 'lib/google/cloud/bigquery/project.rb', line 1510

def external url, format: nil
  ext = External.from_urls url, format
  yield ext if block_given?
  ext
end

#extract(source, extract_url, format: nil, compression: nil, delimiter: nil, header: nil, reservation: nil) {|job| ... } ⇒ Boolean

Extracts the data from a table or exports a model to Google Cloud Storage using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See #extract_job for the asynchronous version.

Use this method instead of Table#extract or Model#extract to extract data from source tables or models in other projects.

The geographic location for the job ("US", "EU", etc.) can be set via ExtractJob::Updater#location= in a block passed to this method.

Examples:

Export table data

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

bigquery.extract "bigquery-public-data.samples.shakespeare",
                 "gs://my-bucket/shakespeare.csv"

Export a model

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

bigquery.extract model, "gs://my-bucket/#{model.model_id}"

Parameters:

  • source (Table, Model, String)

    The source table or model for the extract operation. This can be a table or model object; or a table ID string as specified by the Standard SQL Query Reference (project-name.dataset_id.table_id) or the Legacy SQL Query Reference (project-name:dataset_id.table_id).

  • extract_url (Google::Cloud::Storage::File, String, Array<String>)

    The Google Storage file or file URI pattern(s) to which BigQuery should extract. For a model export this value should be a string ending in an object name prefix, since multiple objects will be exported.

  • format (String) (defaults to: nil)

    The exported file format. The default value for tables is csv. Tables with nested or repeated fields cannot be exported as CSV. The default value for models is ml_tf_saved_model.

    Supported values for tables:

    Supported values for models:

    • ml_tf_saved_model - TensorFlow SavedModel
    • ml_xgboost_booster - XGBoost Booster
  • compression (String) (defaults to: nil)

    The compression type to use for exported files. Possible values include GZIP and NONE. The default value is NONE. Not applicable when extracting models.

  • delimiter (String) (defaults to: nil)

    Delimiter to use between fields in the exported table data. Default is ,. Not applicable when extracting models.

  • header (Boolean) (defaults to: nil)

    Whether to print out a header row in table exports. Default is true. Not applicable when extracting models.

  • reservation (String) (defaults to: nil)

The reservation that the job would use. The user can specify a reservation to execute the job. If a reservation is not set, it is determined based on the rules defined by the reservation assignments. The expected format is projects/{project}/locations/{location}/reservations/{reservation}.

Yields:

  • (job)

    a job configuration object

Yield Parameters:

Returns:

  • (Boolean)

    Returns true if the extract operation succeeded.

See Also:



# File 'lib/google/cloud/bigquery/project.rb', line 2310

def extract source, extract_url, format: nil, compression: nil, delimiter: nil, header: nil, reservation: nil,
            &block
  job = extract_job source, extract_url,
                    format:      format,
                    compression: compression,
                    delimiter:   delimiter,
                    header:      header,
                    reservation: reservation,
                    &block
  job.wait_until_done!
  ensure_job_succeeded! job
  true
end

#extract_job(source, extract_url, format: nil, compression: nil, delimiter: nil, header: nil, job_id: nil, prefix: nil, labels: nil, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::ExtractJob

Extracts the data from a table or exports a model to Google Cloud Storage asynchronously, immediately returning an ExtractJob that can be used to track the progress of the export job. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See #extract for the synchronous version.

Use this method instead of Table#extract_job or Model#extract_job to extract data from source tables or models in other projects.

The geographic location for the job ("US", "EU", etc.) can be set via ExtractJob::Updater#location= in a block passed to this method.

Examples:

Export table data

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

table_id = "bigquery-public-data.samples.shakespeare"
extract_job = bigquery.extract_job table_id, "gs://my-bucket/shakespeare.csv"
extract_job.wait_until_done!
extract_job.done? #=> true
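
A job ID prefix and labels can be supplied for easier tracking (a sketch continuing the example above; the prefix and label are illustrative):

extract_job = bigquery.extract_job table_id,
                                   "gs://my-bucket/shakespeare.csv",
                                   prefix: "daily_export_",
                                   labels: { "team" => "reporting" }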

Export a model

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

extract_job = bigquery.extract_job model, "gs://my-bucket/#{model.model_id}"

Parameters:

  • source (Table, Model, String)

    The source table or model for the extract operation. This can be a table or model object; or a table ID string as specified by the Standard SQL Query Reference (project-name.dataset_id.table_id) or the Legacy SQL Query Reference (project-name:dataset_id.table_id).

  • extract_url (Google::Cloud::Storage::File, String, Array<String>)

    The Google Storage file or file URI pattern(s) to which BigQuery should extract. For a model export this value should be a string ending in an object name prefix, since multiple objects will be exported.

  • format (String) (defaults to: nil)

    The exported file format. The default value for tables is csv. Tables with nested or repeated fields cannot be exported as CSV. The default value for models is ml_tf_saved_model.

    Supported values for tables:

    Supported values for models:

    • ml_tf_saved_model - TensorFlow SavedModel
    • ml_xgboost_booster - XGBoost Booster
  • compression (String) (defaults to: nil)

    The compression type to use for exported files. Possible values include GZIP and NONE. The default value is NONE. Not applicable when extracting models.

  • delimiter (String) (defaults to: nil)

    Delimiter to use between fields in the exported table data. Default is ,. Not applicable when extracting models.

  • header (Boolean) (defaults to: nil)

    Whether to print out a header row in table exports. Default is true. Not applicable when extracting models.

  • job_id (String) (defaults to: nil)

    A user-defined ID for the extract job. The ID must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

    See Generating a job ID.

  • prefix (String) (defaults to: nil)

    A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.

  • labels (Hash) (defaults to: nil)

    A hash of user-provided labels associated with the job. You can use these to organize and group your jobs.

    The labels applied to a resource must meet the following requirements:

    • Each resource can have multiple labels, up to a maximum of 64.
    • Each label must be a key-value pair.
    • Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
    • Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
    • The key portion of a label must be unique. However, you can use the same key with multiple resources.
    • Keys must start with a lowercase letter or international character.
  • reservation (String) (defaults to: nil)

The reservation that the job would use. The user can specify a reservation to execute the job. If a reservation is not set, it is determined based on the rules defined by the reservation assignments. The expected format is projects/{project}/locations/{location}/reservations/{reservation}.

Yields:

  • (job)

    a job configuration object

Yield Parameters:

Returns:

See Also:



# File 'lib/google/cloud/bigquery/project.rb', line 2208

def extract_job source, extract_url, format: nil, compression: nil, delimiter: nil, header: nil, job_id: nil,
                prefix: nil, labels: nil, reservation: nil
  ensure_service!
  options = { format: format, compression: compression, delimiter: delimiter, header: header, job_id: job_id,
              prefix: prefix, labels: labels, reservation: reservation }
  source_ref = if source.respond_to? :model_ref
                 source.model_ref
               else
                 Service.get_table_ref source, default_ref: project_ref
               end

  updater = ExtractJob::Updater.from_options service, source_ref, extract_url, options

  yield updater if block_given?

  job_gapi = updater.to_gapi
  gapi = service.extract_table job_gapi
  Job.from_gapi gapi, service
end

#job(job_id, location: nil) ⇒ Google::Cloud::Bigquery::Job?

Retrieves an existing job by ID.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.job "my_job"
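
Pass the job's location when it was created outside the US and EU multi-regions; nil is returned if the job is not found (a sketch; the location is illustrative):

job = bigquery.job "my_job", location: "asia-northeast1"
puts job.state unless job.nil?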

Parameters:

  • job_id (String)

    The ID of a job.

  • location (String) (defaults to: nil)

    The geographic location where the job was created. Required except for US and EU.

Returns:



# File 'lib/google/cloud/bigquery/project.rb', line 1747

def job job_id, location: nil
  ensure_service!
  gapi = service.get_job job_id, location: location
  Job.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end

#jobs(all: nil, token: nil, max: nil, filter: nil, min_created_at: nil, max_created_at: nil, parent_job: nil) ⇒ Array<Google::Cloud::Bigquery::Job>

Retrieves the list of jobs belonging to the project.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

jobs = bigquery.jobs
jobs.each do |job|
  # process job
end

Retrieve only running jobs using the filter optional arg:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

running_jobs = bigquery.jobs filter: "running"
running_jobs.each do |job|
  # process job
end

Retrieve only jobs created within provided times:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

two_days_ago = Time.now - 60*60*24*2
three_days_ago = Time.now - 60*60*24*3

jobs = bigquery.jobs min_created_at: three_days_ago,
                     max_created_at: two_days_ago
jobs.each do |job|
  # process job
end

Retrieve all jobs: (See Job::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

jobs = bigquery.jobs
jobs.all do |job|
  # process job
end

Retrieve child jobs by setting parent_job:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

multi_statement_sql = <<~SQL
  -- Declare a variable to hold names as an array.
  DECLARE top_names ARRAY<STRING>;
  -- Build an array of the top 100 names from the year 2017.
  SET top_names = (
  SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
  FROM `bigquery-public-data.usa_names.usa_1910_current`
  WHERE year = 2017
  );
  -- Which names appear as words in Shakespeare's plays?
  SELECT
  name AS shakespeare_name
  FROM UNNEST(top_names) AS name
  WHERE name IN (
  SELECT word
  FROM `bigquery-public-data.samples.shakespeare`
  );
SQL

job = bigquery.query_job multi_statement_sql

job.wait_until_done!

child_jobs = bigquery.jobs parent_job: job

child_jobs.each do |child_job|
  script_statistics = child_job.script_statistics
  puts script_statistics.evaluation_kind
  script_statistics.stack_frames.each do |stack_frame|
    puts stack_frame.text
  end
end

Parameters:

  • all (Boolean) (defaults to: nil)

    Whether to display jobs owned by all users in the project. The default is false. Optional.

  • token (String) (defaults to: nil)

    A previously-returned page token representing part of the larger set of results to view. Optional.

  • max (Integer) (defaults to: nil)

    Maximum number of jobs to return. Optional.

  • filter (String) (defaults to: nil)

    A filter for job state. Optional.

    Acceptable values are:

    • done - Finished jobs
    • pending - Pending jobs
    • running - Running jobs
  • min_created_at (Time) (defaults to: nil)

    Min value for Job#created_at. When provided, only jobs created after or at this time are returned. Optional.

  • max_created_at (Time) (defaults to: nil)

    Max value for Job#created_at. When provided, only jobs created before or at this time are returned. Optional.

  • parent_job (Google::Cloud::Bigquery::Job, String) (defaults to: nil)

    A job object or a job ID. If set, retrieve only child jobs of the specified parent. Optional. See Job#job_id, Job#num_child_jobs, and Job#parent_job_id.

Returns:



# File 'lib/google/cloud/bigquery/project.rb', line 1866

def jobs all: nil,
         token: nil,
         max: nil,
         filter: nil,
         min_created_at: nil,
         max_created_at: nil,
         parent_job: nil
  ensure_service!
  parent_job = parent_job.job_id if parent_job.is_a? Job
  options = {
    parent_job_id: parent_job,
    all: all,
    token: token,
    max: max, filter: filter,
    min_created_at: min_created_at,
    max_created_at: max_created_at
  }
  gapi = service.list_jobs(**options)
  Job::List.from_gapi gapi, service, **options
end

#load(table_id, files, dataset_id: "_SESSION", format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, autodetect: nil, null_marker: nil, session_id: nil, date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil, null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil, preserve_ascii_control_characters: nil, reservation: nil) {|updater| ... } ⇒ Boolean

Loads data into the provided destination table using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #load_job.

For the source of the data, you can pass a google-cloud storage file path or a google-cloud-storage File instance. Or, you can upload a file directly. See Loading Data with a POST Request.

The geographic location for the job ("US", "EU", etc.) can be set via LoadJob::Updater#location= in a block passed to this method.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

gs_url = "gs://my-bucket/file-name.csv"
bigquery.load "my_new_table", gs_url, dataset_id: "my_dataset" do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end
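
A local file can be uploaded directly instead of referencing a Cloud Storage URI (a sketch; the file path is illustrative):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

file = File.open "my_data.csv"
bigquery.load "my_new_table", file, dataset_id: "my_dataset"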

Parameters:

  • table_id (String)

    The destination table to load the data into.

  • files (File, Google::Cloud::Storage::File, String, URI, Array<Google::Cloud::Storage::File, String, URI>)

    A file or the URI of a Google Cloud Storage file, or an Array of those, containing data to load into the table.

  • format (String) (defaults to: nil)

    The exported file format. The default value is csv.

    The following values are supported:

  • create (String) (defaults to: nil)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • dataset_id (String) (defaults to: "_SESSION")

The ID of the dataset containing the destination table. For a load job with a session, it defaults to "_SESSION".

  • write (String) (defaults to: nil)

    Specifies how to handle data already present in the table. The default value is append.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - An error will be returned if the table already contains data.
  • projection_fields (Array<String>) (defaults to: nil)

    If the format option is set to datastore_backup, indicates which entity properties to load from a Cloud Datastore backup. Property names are case sensitive and must be top-level properties. If not set, BigQuery loads all properties. If any named property isn't found in the Cloud Datastore backup, an invalid error is returned.

  • jagged_rows (Boolean) (defaults to: nil)

    Accept rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. Only applicable to CSV, ignored for other formats.

  • quoted_newlines (Boolean) (defaults to: nil)

    Indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. The default value is false.

  • autodetect (Boolean) (defaults to: nil)

    Indicates if BigQuery should automatically infer the options and schema for CSV and JSON sources. The default value is false.

  • encoding (String) (defaults to: nil)

    The character encoding of the data. The supported values are UTF-8 or ISO-8859-1. The default value is UTF-8.

  • delimiter (String) (defaults to: nil)

Specifies the separator for fields in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. Default is ,.

  • ignore_unknown (Boolean) (defaults to: nil)

    Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.

    The format property determines what BigQuery treats as an extra value:

    • CSV: Trailing columns
    • JSON: Named values that don't match any column names
  • max_bad_records (Integer) (defaults to: nil)

    The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid.

  • null_marker (String) (defaults to: nil)

    Specifies a string that represents a null value in a CSV file. For example, if you specify \N, BigQuery interprets \N as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.

  • quote (String) (defaults to: nil)

    The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double-quote ". If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the allowQuotedNewlines property to true.

  • skip_leading (Integer) (defaults to: nil)

    The number of rows at the top of a CSV file that BigQuery will skip when loading the data. The default value is 0. This property is useful if you have header rows in the file that should be skipped.

  • schema (Google::Cloud::Bigquery::Schema) (defaults to: nil)

    The schema for the destination table. Optional. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.

    See #schema for the creation of the schema for use with this option. Also note that for most use cases, the block yielded by this method is a more convenient way to configure the schema.

  • session_id (string) (defaults to: nil)

    Session ID in which the load job must run.

  • date_format (String) (defaults to: nil)

    Format used to parse DATE values. Supports SQL-style format strings. See date and time formatting guide

  • datetime_format (String) (defaults to: nil)

    Format used to parse DATETIME values. Supports SQL-style format strings. See date and time formatting guide

  • time_format (String) (defaults to: nil)

    Format used to parse TIME values. Supports SQL-style format strings. See date and time formatting guide

  • timestamp_format (String) (defaults to: nil)

    Format used to parse TIMESTAMP values. Supports SQL-style format strings. See date and time formatting guide

  • null_markers (Array<String>) (defaults to: nil)

A list of strings that represent a SQL NULL value in a CSV file. null_marker and null_markers can't be set at the same time: if null_marker is set, null_markers must not be set, and vice versa; if both are set, a user error is thrown. Any string listed in null_markers, including the empty string, is interpreted as SQL NULL. This applies to all column types.

  • source_column_match (String) (defaults to: nil)

    Controls the strategy used to match loaded columns to the schema. If not set, a sensible default is chosen based on how the schema is provided. If autodetect is used, then columns are matched by name. Otherwise, columns are matched by position. This is done to keep the behavior backward-compatible.

    Acceptable values are:

    • POSITION - matches by position. This assumes that the columns are ordered the same way as the schema.
    • NAME - matches by name. This reads the header row as column names and reorders columns to match the field names in the schema.
  • time_zone (String) (defaults to: nil)

    The time zone used when parsing timestamp values.

  • reference_file_schema_uri (String) (defaults to: nil)

    The URI of the reference file with the reader schema. This file is only loaded if it is part of source URIs, but is not loaded otherwise. It is enabled for the following formats: AVRO, PARQUET, ORC.

  • preserve_ascii_control_characters (Boolean) (defaults to: nil)

    When source_format is set to CSV, indicates if the embedded ASCII control characters (the first 32 characters in the ASCII-table, from \x00 to \x1F) are preserved. By default, ASCII control characters are not preserved.

  • reservation (String) (defaults to: nil)

The reservation that the job would use. The user can specify a reservation to execute the job. If a reservation is not set, it is determined based on the rules defined by the reservation assignments. The expected format is projects/{project}/locations/{location}/reservations/{reservation}.

Yields:

  • (updater)

    A block for setting the schema of the destination table and other options for the load job. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.

Yield Parameters:

Returns:

  • (Boolean)

    Returns true if the load job was successful.



# File 'lib/google/cloud/bigquery/project.rb', line 1437

def load table_id, files, dataset_id: "_SESSION", format: nil, create: nil, write: nil,
         projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil,
         delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil,
         skip_leading: nil, schema: nil, autodetect: nil, null_marker: nil, session_id: nil,
         date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil,
         null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil,
         preserve_ascii_control_characters: nil, reservation: nil, &block
  job = load_job table_id, files, dataset_id: dataset_id,
                format: format, create: create, write: write, projection_fields: projection_fields,
                jagged_rows: jagged_rows, quoted_newlines: quoted_newlines, encoding: encoding,
                delimiter: delimiter, ignore_unknown: ignore_unknown, max_bad_records: max_bad_records,
                quote: quote, skip_leading: skip_leading, schema: schema, autodetect: autodetect,
                null_marker: null_marker, session_id: session_id, date_format: date_format,
                datetime_format: datetime_format, time_format: time_format,
                timestamp_format: timestamp_format, null_markers: null_markers,
                source_column_match: source_column_match, time_zone: time_zone,
                reference_file_schema_uri: reference_file_schema_uri,
                preserve_ascii_control_characters: preserve_ascii_control_characters, reservation: reservation,
                 &block

  job.wait_until_done!
  ensure_job_succeeded! job
  true
end

#load_job(table_id, files, dataset_id: nil, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil, null_marker: nil, dryrun: nil, create_session: nil, session_id: nil, project_id: nil, date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil, null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil, preserve_ascii_control_characters: nil, reservation: nil) {|updater| ... } ⇒ Google::Cloud::Bigquery::LoadJob

Loads data into the provided destination table using an asynchronous method. In this method, a LoadJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See also #load.

For the source of the data, you can pass a google-cloud storage file path or a google-cloud-storage File instance. Or, you can upload a file directly. See Loading Data with a POST Request.

The geographic location for the job ("US", "EU", etc.) can be set via LoadJob::Updater#location= in a block passed to this method.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

gs_url = "gs://my-bucket/file-name.csv"
load_job = bigquery.load_job "temp_table", gs_url, autodetect: true, create_session: true
load_job.wait_until_done!
session_id = load_job.statistics["sessionInfo"]["sessionId"]
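
A later load job can reuse that session by passing session_id (a sketch continuing the example above):

next_load_job = bigquery.load_job "temp_table", gs_url,
                                  autodetect: true,
                                  session_id: session_id
next_load_job.wait_until_done!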

Parameters:

  • table_id (String)

    The destination table to load the data into.

  • files (File, Google::Cloud::Storage::File, String, URI, Array<Google::Cloud::Storage::File, String, URI>)

    A file or the URI of a Google Cloud Storage file, or an Array of those, containing data to load into the table.

  • format (String) (defaults to: nil)

    The exported file format. The default value is csv.

    The following values are supported:

  • dataset_id (String) (defaults to: nil)

The ID of the dataset containing the destination table. For a load job with create_session/session_id, it defaults to "_SESSION".

  • create (String) (defaults to: nil)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String) (defaults to: nil)

    Specifies how to handle data already present in the table. The default value is append.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - An error will be returned if the table already contains data.
  • projection_fields (Array<String>) (defaults to: nil)

    If the format option is set to datastore_backup, indicates which entity properties to load from a Cloud Datastore backup. Property names are case sensitive and must be top-level properties. If not set, BigQuery loads all properties. If any named property isn't found in the Cloud Datastore backup, an invalid error is returned.

  • jagged_rows (Boolean) (defaults to: nil)

    Accept rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. Only applicable to CSV, ignored for other formats.

  • quoted_newlines (Boolean) (defaults to: nil)

    Indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. The default value is false.

  • autodetect (Boolean) (defaults to: nil)

    Indicates if BigQuery should automatically infer the options and schema for CSV and JSON sources. The default value is false.

  • encoding (String) (defaults to: nil)

    The character encoding of the data. The supported values are UTF-8 or ISO-8859-1. The default value is UTF-8.

  • delimiter (String) (defaults to: nil)

Specifies the separator for fields in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. Default is ,.

  • ignore_unknown (Boolean) (defaults to: nil)

    Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.

    The format property determines what BigQuery treats as an extra value:

    • CSV: Trailing columns
    • JSON: Named values that don't match any column names
  • max_bad_records (Integer) (defaults to: nil)

    The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid.

  • null_marker (String) (defaults to: nil)

    Specifies a string that represents a null value in a CSV file. For example, if you specify \N, BigQuery interprets \N as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.

  • quote (String) (defaults to: nil)

    The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double-quote ". If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the allowQuotedNewlines property to true.

  • skip_leading (Integer) (defaults to: nil)

    The number of rows at the top of a CSV file that BigQuery will skip when loading the data. The default value is 0. This property is useful if you have header rows in the file that should be skipped.

  • schema (Google::Cloud::Bigquery::Schema) (defaults to: nil)

    The schema for the destination table. Optional. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.

    See #schema for the creation of the schema for use with this option. Also note that for most use cases, the block yielded by this method is a more convenient way to configure the schema.

  • job_id (String) (defaults to: nil)

    A user-defined ID for the load job. The ID must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

    See Generating a job ID.

  • prefix (String) (defaults to: nil)

    A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.

  • labels (Hash) (defaults to: nil)

    A hash of user-provided labels associated with the job. You can use these to organize and group your jobs.

    The labels applied to a resource must meet the following requirements:

    • Each resource can have multiple labels, up to a maximum of 64.
    • Each label must be a key-value pair.
    • Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
    • Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
    • The key portion of a label must be unique. However, you can use the same key with multiple resources.
    • Keys must start with a lowercase letter or international character.
  • create_session (Boolean) (defaults to: nil)

    If set to true, a new session will be created and the load job will run in a table created within that session. Note: This works only for tables in the _SESSION dataset; otherwise the property is ignored by the backend.

  • session_id (String) (defaults to: nil)

    Session ID in which the load job must run.

  • project_id (String) (defaults to: nil)

    Project ID where the destination table exists.

  • dryrun (Boolean) (defaults to: nil)

    If set, the job is not actually run. However, behavior is undefined for non-query jobs and may result in an error. Deprecated.

  • date_format (String) (defaults to: nil)

    Format used to parse DATE values. Supports SQL-style format strings. See the date and time formatting guide.

  • datetime_format (String) (defaults to: nil)

    Format used to parse DATETIME values. Supports SQL-style format strings. See the date and time formatting guide.

  • time_format (String) (defaults to: nil)

    Format used to parse TIME values. Supports SQL-style format strings. See the date and time formatting guide.

  • timestamp_format (String) (defaults to: nil)

    Format used to parse TIMESTAMP values. Supports SQL-style format strings. See the date and time formatting guide.

  • null_markers (Array<String>) (defaults to: nil)

    A list of strings that represent a SQL NULL value in a CSV file. null_marker and null_markers can't be set at the same time; if one is set, the other must not be. If both are set, a user error is thrown. Any string listed in null_markers, including the empty string, is interpreted as SQL NULL. This applies to all column types.

  • source_column_match (String) (defaults to: nil)

    Controls the strategy used to match loaded columns to the schema. If not set, a sensible default is chosen based on how the schema is provided. If autodetect is used, then columns are matched by name. Otherwise, columns are matched by position. This is done to keep the behavior backward-compatible.

    Acceptable values are:

    • POSITION - matches by position. This assumes that the columns are ordered the same way as the schema.
    • NAME - matches by name. This reads the header row as column names and reorders columns to match the field names in the schema.
  • time_zone (String) (defaults to: nil)

    The time zone used when parsing timestamp values.

  • reference_file_schema_uri (String) (defaults to: nil)

    The URI of the reference file with the reader schema. The file is loaded only if it is part of the source URIs; otherwise it is not loaded. It is enabled for the following formats: AVRO, PARQUET, ORC.

  • preserve_ascii_control_characters (Boolean) (defaults to: nil)

    When source_format is set to CSV, indicates if the embedded ASCII control characters (the first 32 characters in the ASCII-table, from \x00 to \x1F) are preserved. By default, ASCII control characters are not preserved.

  • reservation (String) (defaults to: nil)

    The reservation that the job would use. Users can specify a reservation to execute the job. If a reservation is not set, it is determined based on the rules defined by the reservation assignments. The expected format is projects/project/locations/location/reservations/reservation.

Yields:

  • (updater)

    A block for setting the schema and other options for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.

Yield Parameters:

Returns:

  • (Google::Cloud::Bigquery::LoadJob)



# File 'lib/google/cloud/bigquery/project.rb', line 1218

def load_job table_id, files, dataset_id: nil, format: nil, create: nil, write: nil,
             projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil,
             delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil,
             skip_leading: nil, schema: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil,
             null_marker: nil, dryrun: nil, create_session: nil, session_id: nil, project_id: nil,
             date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil,
             null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil,
             preserve_ascii_control_characters: nil, reservation: nil, &block
  ensure_service!
  dataset_id ||= "_SESSION" unless create_session.nil? && session_id.nil?
  session_dataset = dataset dataset_id, skip_lookup: true, project_id: project_id
  table = session_dataset.table table_id, skip_lookup: true
  table.load_job  files,
                  format: format, create: create, write: write, projection_fields: projection_fields,
                  jagged_rows: jagged_rows, quoted_newlines: quoted_newlines, encoding: encoding,
                  delimiter: delimiter, ignore_unknown: ignore_unknown,
                  max_bad_records: max_bad_records, quote: quote, skip_leading: skip_leading,
                  dryrun: dryrun, schema: schema, job_id: job_id, prefix: prefix, labels: labels,
                  autodetect: autodetect, null_marker: null_marker, create_session: create_session,
                  session_id: session_id, date_format: date_format, datetime_format: datetime_format,
                  time_format: time_format, timestamp_format: timestamp_format,
                  null_markers: null_markers, source_column_match: source_column_match,
                  time_zone: time_zone, reference_file_schema_uri: reference_file_schema_uri,
                  preserve_ascii_control_characters: preserve_ascii_control_characters,
                  reservation: reservation, &block
end
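
For illustration, a hedged sketch of a load into an existing dataset that combines several of the CSV options documented above (the bucket, dataset, table, and field names are hypothetical, and the option values are arbitrary):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

# Schema for the destination table (omit if the table already exists).
schema = bigquery.schema do |s|
  s.string "first_name", mode: :required
  s.integer "age"
end

# Hypothetical Cloud Storage URI, dataset, and table.
gs_url = "gs://my-bucket/file-name.csv"
load_job = bigquery.load_job "my_new_table", gs_url,
                             dataset_id: "my_dataset",
                             schema: schema,
                             skip_leading: 1,    # skip the CSV header row
                             null_marker: "\\N", # treat \N as NULL
                             max_bad_records: 10 # tolerate up to 10 bad records

load_job.wait_until_done!
load_job.failed? #=> false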

#project_id ⇒ Object Also known as: project

The BigQuery project connected to.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new(
  project_id: "my-project",
  credentials: "/path/to/keyfile.json"
)

bigquery.project_id #=> "my-project"


# File 'lib/google/cloud/bigquery/project.rb', line 92

def project_id
  service.project
end

#projects(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Project>

Retrieves the list of all projects for which the currently authorized account has been granted any project role. The returned project instances share the same credentials as the project used to retrieve them, but lazily create a new API connection for interactions with the BigQuery service.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

projects = bigquery.projects
projects.each do |project|
  puts project.name
  project.datasets.all.each do |dataset|
    puts dataset.name
  end
end

Retrieve all projects: (See Google::Cloud::Bigquery::Project::List#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

projects = bigquery.projects

projects.all do |project|
  puts project.name
  project.datasets.all.each do |dataset|
    puts dataset.name
  end
end

Parameters:

  • token (String) (defaults to: nil)

    A previously-returned page token representing part of the larger set of results to view.

  • max (Integer) (defaults to: nil)

    Maximum number of projects to return.

Returns:

  • (Array<Google::Cloud::Bigquery::Project>)



# File 'lib/google/cloud/bigquery/project.rb', line 1928

def projects token: nil, max: nil
  ensure_service!
  gapi = service.list_projects token: token, max: max
  Project::List.from_gapi gapi, service, max
end
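
As a sketch of the token and max paging behavior described above (the page size of 10 is arbitrary), pages can also be walked manually:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

# Request up to 10 projects per page and walk the pages manually.
projects = bigquery.projects max: 10
loop do
  projects.each { |project| puts project.project_id }
  break unless projects.next?
  projects = projects.next
end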

#query(query, params: nil, types: nil, external: nil, max: nil, cache: true, dataset: nil, project: nil, standard_sql: nil, legacy_sql: nil, session_id: nil, format_options_use_int64_timestamp: true, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::Data

Queries data and waits for the results. In this method, a QueryJob is created and its results are saved to a temporary table, then read from the table. Timeouts and transient errors are generally handled as needed to complete the query. When used for executing DDL/DML statements, this method does not return row data.

The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT name FROM `my_project.my_dataset.my_table`"
data = bigquery.query sql

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT name FROM [my_project:my_dataset.my_table]"
data = bigquery.query sql, legacy_sql: true

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Retrieve all rows: (See Data#all)

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT name FROM `my_dataset.my_table`"

data.all do |row|
  puts row[:name]
end

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT name FROM `my_dataset.my_table` WHERE id = ?",
                      params: [1]

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT name FROM `my_dataset.my_table` WHERE id = @id",
                      params: { id: 1 }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Query using named query parameters with types:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "SELECT name FROM `my_dataset.my_table` WHERE id IN UNNEST(@ids)",
                      params: { ids: [] },
                      types: { ids: [:INT64] }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Execute a DDL statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "CREATE TABLE `my_dataset.my_table` (x INT64)"

table_ref = data.ddl_target_table # Or ddl_target_routine for CREATE/DROP FUNCTION/PROCEDURE

Execute a DML statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

data = bigquery.query "UPDATE `my_dataset.my_table` SET x = x + 1 WHERE x IS NOT NULL"

puts data.num_dml_affected_rows

Query using external data source, set destination:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

data = bigquery.query "SELECT * FROM my_ext_table" do |query|
  query.external = { my_ext_table: csv_table }
  dataset = bigquery.dataset "my_dataset", skip_lookup: true
  query.table = dataset.table "my_table", skip_lookup: true
end

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Parameters:

  • query (String)

    A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".

  • params (Array, Hash) (defaults to: nil)

    Standard SQL only. Used to pass query arguments when the query string contains either positional (?) or named (@myparam) query parameters. If the value passed is an array, such as ["foo"], the query must use positional query parameters. If the value passed is a hash, such as { myparam: "foo" }, the query must use named query parameters. When set, legacy_sql will automatically be set to false and standard_sql to true.

    BigQuery types are converted from Ruby types as follows:

    • BOOL: true/false
    • INT64: Integer
    • FLOAT64: Float
    • NUMERIC: BigDecimal. BigDecimal values will be rounded to scale 9.
    • BIGNUMERIC: BigDecimal. NOT AUTOMATIC: must be mapped using types, below.
    • STRING: String
    • DATETIME: DateTime. DATETIME does not support time zone.
    • DATE: Date
    • GEOGRAPHY: String (WKT or GeoJSON). NOT AUTOMATIC: must be mapped using types, below.
    • JSON: String (stringified JSON). String, as JSON does not have a schema to verify.
    • TIMESTAMP: Time
    • TIME: Google::Cloud::BigQuery::Time
    • BYTES: File, IO, StringIO, or similar
    • ARRAY: Array. Nested arrays and nil values are not supported.
    • STRUCT: Hash. Hash keys may be strings or symbols.

    See Data Types for an overview of each BigQuery data type, including allowed values. For the GEOGRAPHY type, see Working with BigQuery GIS data.

  • types (Array, Hash) (defaults to: nil)

    Standard SQL only. Types of the SQL parameters in params. It is not always possible to infer the right SQL type from a value in params. In these cases, types must be used to specify the SQL type for these values.

    Arguments must match the value type passed to params. This must be an Array when the query uses positional query parameters. This must be a Hash when the query uses named query parameters. The values should be BigQuery type codes from the following list:

    • :BOOL
    • :INT64
    • :FLOAT64
    • :NUMERIC
    • :BIGNUMERIC
    • :STRING
    • :DATETIME
    • :DATE
    • :GEOGRAPHY
    • :JSON
    • :TIMESTAMP
    • :TIME
    • :BYTES
    • Array - Lists are specified by providing the type code in an array. For example, an array of integers is specified as [:INT64].
    • Hash - Types for STRUCT values (Hash objects) are specified using a Hash object, where the keys match the params hash, and the values are the types value that matches the data.

    Types are optional.

  • external (Hash<String|Symbol, External::DataSource>) (defaults to: nil)

    A Hash that represents the mapping of the external tables to the table names used in the SQL query. The hash keys are the table names, and the hash values are the external table objects. See #query.

  • max (Integer) (defaults to: nil)

    The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.

  • cache (Boolean) (defaults to: true)

    Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.

  • dataset (String) (defaults to: nil)

    Specifies the default datasetId and projectId to assume for any unqualified table names in the query. If not set, all table names in the query string must be qualified in the format 'datasetId.tableId'.

  • project (String) (defaults to: nil)

    Specifies the default projectId to assume for any unqualified table names in the query. Only used if dataset option is set.

  • standard_sql (Boolean) (defaults to: nil)

    Specifies whether to use BigQuery's standard SQL dialect for this query. If set to true, the query will use standard SQL rather than the legacy SQL dialect. When set to true, the values of large_results and flatten are ignored; the query will be run as if large_results is true and flatten is false. Optional. The default value is true.

  • legacy_sql (Boolean) (defaults to: nil)

    Specifies whether to use BigQuery's legacy SQL dialect for this query. If set to false, the query will use BigQuery's standard SQL. When set to false, the values of large_results and flatten are ignored; the query will be run as if large_results is true and flatten is false. Optional. The default value is false.

  • session_id (String) (defaults to: nil)

    The ID of an existing session. See the create_session param in #query_job and Job#session_id.

  • format_options_use_int64_timestamp (Boolean) (defaults to: true)

    Output timestamps as microsecond (usec) int64 values. The default value is true.

  • reservation (String) (defaults to: nil)

    The reservation that the job would use. Users can specify a reservation to execute the job. If a reservation is not set, it is determined based on the rules defined by the reservation assignments. The expected format is projects/project/locations/location/reservations/reservation.

Yields:

  • (job)

    a job configuration object

Yield Parameters:

Returns:

  • (Google::Cloud::Bigquery::Data)

See Also:



# File 'lib/google/cloud/bigquery/project.rb', line 951

def query query,
          params: nil,
          types: nil,
          external: nil,
          max: nil,
          cache: true,
          dataset: nil,
          project: nil,
          standard_sql: nil,
          legacy_sql: nil,
          session_id: nil,
          format_options_use_int64_timestamp: true,
          reservation: nil,
          &block
  job = query_job query,
                  params: params,
                  types: types,
                  external: external,
                  cache: cache,
                  dataset: dataset,
                  project: project,
                  standard_sql: standard_sql,
                  legacy_sql: legacy_sql,
                  session_id: session_id,
                  reservation: reservation,
                  &block
  job.wait_until_done!

  if job.failed?
    begin
      # raise to activate ruby exception cause handling
      raise job.gapi_error
    rescue StandardError => e
      # wrap Google::Apis::Error with Google::Cloud::Error
      raise Google::Cloud::Error.from_error(e)
    end
  end

  job.data max: max, format_options_use_int64_timestamp: format_options_use_int64_timestamp
end
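
As noted in the type-conversion list above, BIGNUMERIC values are not mapped automatically, so their type must be given via the types option. A hedged sketch (the dataset, table, and column names are hypothetical):

require "google/cloud/bigquery"
require "bigdecimal"

bigquery = Google::Cloud::Bigquery.new

# BIGNUMERIC is NOT AUTOMATIC, so the parameter type is given explicitly.
sql = "SELECT name FROM `my_dataset.my_table` WHERE balance > @min_balance"
data = bigquery.query sql,
                      params: { min_balance: BigDecimal("1234.56") },
                      types:  { min_balance: :BIGNUMERIC }

data.each do |row|
  puts row[:name]
end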

#query_job(query, params: nil, types: nil, external: nil, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, dryrun: nil, dataset: nil, project: nil, standard_sql: nil, legacy_sql: nil, large_results: nil, flatten: nil, maximum_billing_tier: nil, maximum_bytes_billed: nil, job_id: nil, prefix: nil, labels: nil, udfs: nil, create_session: nil, session_id: nil, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::QueryJob

Queries data by creating a query job.

The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method.

Examples:

Query using standard SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM `my_project.my_dataset.my_table`"

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using legacy SQL:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM [my_project:my_dataset.my_table]",
                         legacy_sql: true

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using positional query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM `my_dataset.my_table` WHERE id = ?",
                         params: [1]

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using named query parameters:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM `my_dataset.my_table` WHERE id = @id",
                         params: { id: 1 }

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Query using named query parameters with types:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT name FROM `my_dataset.my_table` WHERE id IN UNNEST(@ids)",
                         params: { ids: [] },
                         types: { ids: [:INT64] }

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Execute a DDL statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "CREATE TABLE `my_dataset.my_table` (x INT64)"

job.wait_until_done!
if !job.failed?
  table_ref = job.ddl_target_table # Or ddl_target_routine for CREATE/DROP FUNCTION/PROCEDURE
end

Execute a DML statement:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "UPDATE `my_dataset.my_table` SET x = x + 1 WHERE x IS NOT NULL"

job.wait_until_done!
if !job.failed?
  puts job.num_dml_affected_rows
end

Query using external data source, set destination:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

job = bigquery.query_job "SELECT * FROM my_ext_table" do |query|
  query.external = { my_ext_table: csv_table }
  dataset = bigquery.dataset "my_dataset", skip_lookup: true
  query.table = dataset.table "my_table", skip_lookup: true
end

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end

Parameters:

  • query (String)

    A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".

  • params (Array, Hash) (defaults to: nil)

    Standard SQL only. Used to pass query arguments when the query string contains either positional (?) or named (@myparam) query parameters. If the value passed is an array, such as ["foo"], the query must use positional query parameters. If the value passed is a hash, such as { myparam: "foo" }, the query must use named query parameters. When set, legacy_sql will automatically be set to false and standard_sql to true.

    BigQuery types are converted from Ruby types as follows:

    • BOOL: true/false
    • INT64: Integer
    • FLOAT64: Float
    • NUMERIC: BigDecimal. BigDecimal values will be rounded to scale 9.
    • BIGNUMERIC: BigDecimal. NOT AUTOMATIC: must be mapped using types, below.
    • STRING: String
    • DATETIME: DateTime. DATETIME does not support time zone.
    • DATE: Date
    • GEOGRAPHY: String (WKT or GeoJSON). NOT AUTOMATIC: must be mapped using types, below.
    • JSON: String (stringified JSON). String, as JSON does not have a schema to verify.
    • TIMESTAMP: Time
    • TIME: Google::Cloud::BigQuery::Time
    • BYTES: File, IO, StringIO, or similar
    • ARRAY: Array. Nested arrays and nil values are not supported.
    • STRUCT: Hash. Hash keys may be strings or symbols.

    See Data Types for an overview of each BigQuery data type, including allowed values. For the GEOGRAPHY type, see Working with BigQuery GIS data.

  • types (Array, Hash) (defaults to: nil)

    Standard SQL only. Types of the SQL parameters in params. It is not always possible to infer the right SQL type from a value in params. In these cases, types must be used to specify the SQL type for these values.

    Arguments must match the value type passed to params. This must be an Array when the query uses positional query parameters. This must be a Hash when the query uses named query parameters. The values should be BigQuery type codes from the following list:

    • :BOOL
    • :INT64
    • :FLOAT64
    • :NUMERIC
    • :BIGNUMERIC
    • :STRING
    • :DATETIME
    • :DATE
    • :GEOGRAPHY
    • :JSON
    • :TIMESTAMP
    • :TIME
    • :BYTES
    • Array - Lists are specified by providing the type code in an array. For example, an array of integers is specified as [:INT64].
    • Hash - Types for STRUCT values (Hash objects) are specified using a Hash object, where the keys match the params hash, and the values are the types value that matches the data.

    Types are optional.

  • external (Hash<String|Symbol, External::DataSource>) (defaults to: nil)

    A Hash that represents the mapping of the external tables to the table names used in the SQL query. The hash keys are the table names, and the hash values are the external table objects. See #query.

  • priority (String) (defaults to: "INTERACTIVE")

    Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE.

  • cache (Boolean) (defaults to: true)

    Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.

  • table (Table) (defaults to: nil)

    The destination table where the query results should be stored. If not present, a new table will be created to store the results.

  • create (String) (defaults to: nil)

    Specifies whether the job is allowed to create new tables. The default value is needed.

    The following values are supported:

    • needed - Create the table if it does not exist.
    • never - The table must already exist. A 'notFound' error is raised if the table does not exist.
  • write (String) (defaults to: nil)

    Specifies the action that occurs if the destination table already exists. The default value is empty.

    The following values are supported:

    • truncate - BigQuery overwrites the table data.
    • append - BigQuery appends the data to the table.
    • empty - A 'duplicate' error is returned in the job result if the table exists and contains data.
  • dryrun (Boolean) (defaults to: nil)

    If set to true, BigQuery doesn't run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error returns. The default value is false.

  • dataset (Dataset, String) (defaults to: nil)

    The default dataset to use for unqualified table names in the query. Optional.

  • project (String) (defaults to: nil)

    Specifies the default projectId to assume for any unqualified table names in the query. Only used if dataset option is set.

  • standard_sql (Boolean) (defaults to: nil)

    Specifies whether to use BigQuery's standard SQL dialect for this query. If set to true, the query will use standard SQL rather than the legacy SQL dialect. Optional. The default value is true.

  • legacy_sql (Boolean) (defaults to: nil)

    Specifies whether to use BigQuery's legacy SQL dialect for this query. If set to false, the query will use BigQuery's standard SQL dialect. Optional. The default value is false.

  • large_results (Boolean) (defaults to: nil)

    This option is specific to Legacy SQL. If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires table parameter to be set.

  • flatten (Boolean) (defaults to: nil)

    This option is specific to Legacy SQL. Flattens all nested and repeated fields in the query results. The default value is true. large_results parameter must be true if this is set to false.

  • maximum_billing_tier (Integer) (defaults to: nil)

    Limits the billing tier for this job. Queries that have resource usage beyond this tier will fail (without incurring a charge). WARNING: The billed byte amount can be multiplied by an amount up to this number! Most users should not need to alter this setting, and we recommend that you avoid introducing new uses of it. Deprecated.

  • maximum_bytes_billed (Integer) (defaults to: nil)

    Limits the bytes billed for this job. Queries that will have bytes billed beyond this limit will fail (without incurring a charge). Optional. If unspecified, this will be set to your project default.

  • job_id (String) (defaults to: nil)

    A user-defined ID for the query job. The ID must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

    See Generating a job ID.

  • prefix (String) (defaults to: nil)

    A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.

    See Generating a job ID.

  • labels (Hash) (defaults to: nil)

    A hash of user-provided labels associated with the job. You can use these to organize and group your jobs.

    The labels applied to a resource must meet the following requirements:

    • Each resource can have multiple labels, up to a maximum of 64.
    • Each label must be a key-value pair.
    • Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
    • Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
    • The key portion of a label must be unique. However, you can use the same key with multiple resources.
    • Keys must start with a lowercase letter or international character.
  • udfs (Array<String>, String) (defaults to: nil)

    User-defined function resources used in a legacy SQL query. May be either a code resource to load from a Google Cloud Storage URI (gs://bucket/path), or an inline resource that contains code for a user-defined function (UDF). Providing an inline code resource is equivalent to providing a URI for a file containing the same code.

    This parameter is used for defining User Defined Function (UDF) resources only when using legacy SQL. Users of standard SQL should leverage either DDL (e.g. CREATE [TEMPORARY] FUNCTION ...) or the Routines API to define UDF resources.

    For additional information on migrating, see: Migrating to standard SQL - Differences in user-defined JavaScript functions

  • create_session (Boolean) (defaults to: nil)

    If true, creates a new session, where the session ID will be a server-generated random ID. If false, runs the query with an existing session ID when one is provided in the session_id param; otherwise runs the query in non-session mode. See Job#session_id. The default value is false.

  • session_id (String) (defaults to: nil)

    The ID of an existing session. See also the create_session param and Job#session_id.

  • reservation (String) (defaults to: nil)

    The reservation that the job would use. Users can specify a reservation to execute the job. If a reservation is not set, it is determined based on the rules defined by the reservation assignments. The expected format is projects/project/locations/location/reservations/reservation.

Yields:

  • (job)

    a job configuration object

Yield Parameters:

Returns:

  • (Google::Cloud::Bigquery::QueryJob)



# File 'lib/google/cloud/bigquery/project.rb', line 631

def query_job query,
              params: nil,
              types: nil,
              external: nil,
              priority: "INTERACTIVE",
              cache: true,
              table: nil,
              create: nil,
              write: nil,
              dryrun: nil,
              dataset: nil,
              project: nil,
              standard_sql: nil,
              legacy_sql: nil,
              large_results: nil,
              flatten: nil,
              maximum_billing_tier: nil,
              maximum_bytes_billed: nil,
              job_id: nil,
              prefix: nil,
              labels: nil,
              udfs: nil,
              create_session: nil,
              session_id: nil,
              reservation: nil
  ensure_service!
  project ||= self.project
  options = {
    params: params,
    types: types,
    external: external,
    priority: priority,
    cache: cache,
    table: table,
    create: create,
    write: write,
    dryrun: dryrun,
    dataset: dataset,
    project: project,
    standard_sql: standard_sql,
    legacy_sql: legacy_sql,
    large_results: large_results,
    flatten: flatten,
    maximum_billing_tier: maximum_billing_tier,
    maximum_bytes_billed: maximum_bytes_billed,
    job_id: job_id,
    prefix: prefix,
    labels: labels,
    udfs: udfs,
    create_session: create_session,
    session_id: session_id,
    reservation: reservation
  }

  updater = QueryJob::Updater.from_options service, query, options

  yield updater if block_given?

  gapi = service.query_job updater.to_gapi
  Job.from_gapi gapi, service
end
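
A hedged sketch combining a few of the options documented above, namely priority, maximum_bytes_billed, and labels (the dataset and table names, byte limit, and label values are hypothetical):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

# Run as a BATCH query, cap the bytes billed, and label the job.
job = bigquery.query_job "SELECT name FROM `my_dataset.my_table`",
                         priority: "BATCH",
                         maximum_bytes_billed: 1_000_000_000,
                         labels: { "team" => "analytics" }

job.wait_until_done!
if !job.failed?
  job.data.each do |row|
    puts row[:name]
  end
end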

#schema {|schema| ... } ⇒ Google::Cloud::Bigquery::Schema

Creates a new schema instance. An optional block may be given to configure the schema; otherwise the schema is returned empty and may be configured directly.

The returned schema can be passed to Dataset#load using the schema option. However, for most use cases, the block yielded by Dataset#load is a more convenient way to configure the schema for the destination table.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

schema = bigquery.schema do |s|
  s.string "first_name", mode: :required
  s.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

dataset = bigquery.dataset "my_dataset"

gs_url = "gs://my-bucket/file-name.csv"
load_job = dataset.load_job "my_new_table", gs_url, schema: schema

Yields:

  • (schema)

    a block for setting the schema

Yield Parameters:

  • schema (Schema)

    the object accepting the schema

Returns:

  • (Google::Cloud::Bigquery::Schema)



# File 'lib/google/cloud/bigquery/project.rb', line 2014

def schema
  s = Schema.from_gapi
  yield s if block_given?
  s
end
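
Because the schema is returned empty when no block is given, it can also be configured directly after creation. A minimal sketch (the field names are hypothetical):

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

# Build the schema imperatively instead of in a block.
schema = bigquery.schema
schema.string "first_name", mode: :required
schema.integer "age"

schema.fields.map(&:name) #=> ["first_name", "age"]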

#service_account_email ⇒ String

The email address of the service account for the project used to connect to BigQuery. (See also #project_id.)

Returns:

  • (String)

    The service account email address.



# File 'lib/google/cloud/bigquery/project.rb', line 103

def service_account_email
  @service_account_email ||= service.project_service_account.email
end

#time(hour, minute, second) ⇒ Bigquery::Time

Creates a Bigquery::Time object to represent a time, independent of a specific date.

Examples:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

fourpm = bigquery.time 16, 0, 0
data = bigquery.query "SELECT name FROM `my_dataset.my_table` WHERE time_of_date = @time",
                      params: { time: fourpm }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Create Time with fractional seconds:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

precise_time = bigquery.time 16, 35, 15.376541
data = bigquery.query "SELECT name FROM `my_dataset.my_table` WHERE time_of_date >= @time",
                      params: { time: precise_time }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Parameters:

  • hour (Integer)

    Hour, valid values from 0 to 23.

  • minute (Integer)

    Minute, valid values from 0 to 59.

  • second (Integer, Float)

    Second, valid values from 0 to 59. Can contain microsecond precision.

Returns:

  • (Bigquery::Time)



# File 'lib/google/cloud/bigquery/project.rb', line 1977

def time hour, minute, second
  Bigquery::Time.new "#{hour}:#{minute}:#{second}"
end

#universe_domain ⇒ String

The universe domain the client is connected to.

Returns:

  • (String)


# File 'lib/google/cloud/bigquery/project.rb', line 75

def universe_domain
  service.universe_domain
end