Class: Google::Cloud::Bigquery::Dataset

Inherits: Object

Defined in: lib/google/cloud/bigquery/dataset.rb,
lib/google/cloud/bigquery/dataset/list.rb,
lib/google/cloud/bigquery/dataset/access.rb

Overview

# Dataset

Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest-level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery

dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"

Direct Known Subclasses

Updater

Defined Under Namespace

Classes: Access, List, Updater

Instance Attribute Summary

Attributes

#access {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset::Access

Retrieves the access rules for a Dataset.
#api_url ⇒ Object

A URL that can be used to access the dataset using the REST API.
#created_at ⇒ Object

The time when this dataset was created.
#dataset_id ⇒ Object

A unique ID for this dataset, without the project name.
#dataset_ref ⇒ Object

The gapi fragment containing the Project ID and Dataset ID as a camel-cased hash.
#default_expiration ⇒ Object

The default lifetime of all tables in the dataset, in milliseconds.
#default_expiration=(new_default_expiration) ⇒ Object

Updates the default lifetime of all tables in the dataset, in milliseconds.
#description ⇒ Object

A user-friendly description of the dataset.
#description=(new_description) ⇒ Object

Updates the user-friendly description of the dataset.
#etag ⇒ Object

A string hash of the dataset.
#location ⇒ Object

The geographic location where the dataset should reside.
#modified_at ⇒ Object

The date when this dataset or any of its tables was last modified.
#name ⇒ Object

A descriptive name for the dataset.
#name=(new_name) ⇒ Object

Updates the descriptive name for the dataset.
#project_id ⇒ Object

The ID of the project containing this dataset.

Lifecycle

#delete(force: nil) ⇒ Boolean

Permanently deletes the dataset.

Table

#create_table(table_id, name: nil, description: nil, fields: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::Table

Creates a new table.
#create_view(table_id, query, name: nil, description: nil) ⇒ Google::Cloud::Bigquery::View

Creates a new view table from the given query.
#table(table_id) ⇒ Google::Cloud::Bigquery::Table, ...

Retrieves an existing table by ID.
#tables(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Table>, Array<Google::Cloud::Bigquery::View>

Retrieves the list of tables belonging to the dataset.

Data

.from_gapi(gapi, conn) ⇒ Object
#query(query, max: nil, timeout: 10000, dryrun: nil, cache: true) ⇒ Google::Cloud::Bigquery::QueryData

Queries data using the [synchronous method](cloud.google.com/bigquery/querying-data).
#query_job(query, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, large_results: nil, flatten: nil) ⇒ Google::Cloud::Bigquery::QueryJob

Queries data using the [asynchronous method](cloud.google.com/bigquery/querying-data).

Instance Method Summary

#initialize ⇒ Dataset constructor

A new instance of Dataset.

Constructor Details

#initialize ⇒ `Dataset`

# File 'lib/google/cloud/bigquery/dataset.rb', line 56

def initialize
  @service = nil
  @gapi = {}
end

Instance Attribute Details

#gapi ⇒ `Object`




# File 'lib/google/cloud/bigquery/dataset.rb', line 52

def gapi
  @gapi
end

#service ⇒ `Object`




# File 'lib/google/cloud/bigquery/dataset.rb', line 48

def service
  @service
end

Class Method Details

.from_gapi(gapi, conn) ⇒ `Object`

# File 'lib/google/cloud/bigquery/dataset.rb', line 665

def self.from_gapi gapi, conn
  new.tap do |f|
    f.gapi = gapi
    f.service = conn
  end
end

Instance Method Details

#access {|access| ... } ⇒ `Google::Cloud::Bigquery::Dataset::Access`

Retrieves the access rules for a Dataset. The rules can be updated by passing a block; see Access for all the available methods.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

dataset.access #=> [{"role"=>"OWNER",
               #     "specialGroup"=>"projectOwners"},
               #    {"role"=>"WRITER",
               #     "specialGroup"=>"projectWriters"},
               #    {"role"=>"READER",
               #     "specialGroup"=>"projectReaders"},
               #    {"role"=>"OWNER",
               #     "userByEmail"=>"123456789-...com"}]

Manage the access rules by passing a block:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

dataset.access do |access|
  access.add_owner_group "[email protected]"
  access.add_writer_user "[email protected]"
  access.remove_writer_user "[email protected]"
  access.add_reader_special :all
  access.add_reader_view other_dataset_view_object
end

Yields:

(access) —

a block for setting rules

Yield Parameters:

access (Dataset::Access) —

the object accepting rules

#api_url ⇒ `Object`

A URL that can be used to access the dataset using the REST API.

# File 'lib/google/cloud/bigquery/dataset.rb', line 125

def api_url
  ensure_full_data!
  @gapi.self_link
end

#create_table(table_id, name: nil, description: nil, fields: nil) {|table| ... } ⇒ `Google::Cloud::Bigquery::Table`

Creates a new table. If you are adapting existing code that was written for the [REST API](cloud.google.com/bigquery/docs/reference/v2/tables#resource), you can pass the table’s schema as a hash (see example).

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table"

You can also pass name and description options.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table",
                             name: "My Table",
                             description: "A description of table."

The table’s schema fields can be passed as an argument.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

schema_fields = [
  Google::Cloud::Bigquery::Schema::Field.new(
    "first_name", :string, mode: :required),
  Google::Cloud::Bigquery::Schema::Field.new(
    "cities_lived", :record, mode: :repeated,
    fields: [
      Google::Cloud::Bigquery::Schema::Field.new(
        "place", :string, mode: :required),
      Google::Cloud::Bigquery::Schema::Field.new(
        "number_of_years", :integer, mode: :required),
      ])
]
table = dataset.create_table "my_table", fields: schema_fields

Or the table’s schema can be configured with the block.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.schema.string "first_name", mode: :required
  t.schema.record "cities_lived", mode: :required do |s|
    s.string "place", mode: :required
    s.integer "number_of_years", mode: :required
  end
end

You can define the schema using a nested block.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table" do |t|
  t.name = "My Table"
  t.description = "A description of my table."
  t.schema do |s|
    s.string "first_name", mode: :required
    s.record "cities_lived", mode: :repeated do |r|
      r.string "place", mode: :required
      r.integer "number_of_years", mode: :required
    end
  end
end

Yields:

(table) —

a block for setting the table

Yield Parameters:

table (Table) —

the table object to be updated

# File 'lib/google/cloud/bigquery/dataset.rb', line 391

def create_table table_id, name: nil, description: nil, fields: nil
  ensure_service!
  new_tb = Google::Apis::BigqueryV2::Table.new(
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id,
      table_id: table_id))
  updater = Table::Updater.new(new_tb).tap do |tb|
    tb.name = name unless name.nil?
    tb.description = description unless description.nil?
    tb.schema.fields = fields unless fields.nil?
  end

  yield updater if block_given?

  gapi = service.insert_table dataset_id, updater.to_gapi
  Table.from_gapi gapi, service
end

#create_view(table_id, query, name: nil, description: nil) ⇒ `Google::Cloud::Bigquery::View`

Creates a new view table from the given query.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
          "SELECT name, age FROM [proj:dataset.users]"

A name and description can be provided:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
          "SELECT name, age FROM [proj:dataset.users]",
          name: "My View", description: "This is my view"

# File 'lib/google/cloud/bigquery/dataset.rb', line 443

def create_view table_id, query, name: nil, description: nil
  new_view_opts = {
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id, table_id: table_id
    ),
    friendly_name: name,
    description: description,
    view: Google::Apis::BigqueryV2::ViewDefinition.new(
      query: query
    )
  }.delete_if { |_, v| v.nil? }
  new_view = Google::Apis::BigqueryV2::Table.new new_view_opts

  gapi = service.insert_table dataset_id, new_view
  Table.from_gapi gapi, service
end
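The `delete_if` call in the source above drops any option that was left as `nil`, so unset values are not sent with the request. The following is a rough sketch of that pattern using plain hashes in place of the `Google::Apis` objects. `build_view_opts` is a hypothetical helper introduced only for illustration.

```ruby
# Build an options hash, then drop the entries the caller left unset.
# Plain hashes stand in for the Google::Apis objects used by the library.
def build_view_opts(table_id, query, name: nil, description: nil)
  {
    table_id: table_id,
    friendly_name: name,
    description: description,
    view: { query: query }
  }.delete_if { |_, value| value.nil? }
end

# Only name is given, so :description is removed from the result.
opts = build_view_opts("my_view",
                       "SELECT name, age FROM [proj:dataset.users]",
                       name: "My View")
```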

#created_at ⇒ `Object`

The time when this dataset was created.

# File 'lib/google/cloud/bigquery/dataset.rb', line 180

def created_at
  ensure_full_data!
  begin
    Time.at(Integer(@gapi.creation_time) / 1000.0)
  rescue
    nil
  end
end
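The source above treats the creation time as milliseconds since the Unix epoch and falls back to `nil` when the value cannot be converted. The following is a small standalone sketch of the same conversion. `ms_to_time` is a hypothetical helper name, and it rescues only `ArgumentError` and `TypeError` where the library uses a bare `rescue`.

```ruby
# Convert an epoch-milliseconds value (String or Integer) to a Time.
# Returns nil when the value is missing or not numeric.
def ms_to_time(ms)
  Time.at(Integer(ms) / 1000.0)
rescue ArgumentError, TypeError
  nil
end

created = ms_to_time("1470000000500") # half a second past 1470000000
missing = ms_to_time(nil)             # no timestamp available
```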

#dataset_id ⇒ `Object`

A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.




# File 'lib/google/cloud/bigquery/dataset.rb', line 68

def dataset_id
  @gapi.dataset_reference.dataset_id
end
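The naming rules stated above (letters, digits, and underscores only, at most 1,024 characters) can be checked locally before calling the API. The following is a minimal sketch. `valid_dataset_id?` is a hypothetical helper and not part of the library, and rejecting the empty string is an assumption.

```ruby
# Check a candidate dataset ID against the documented rules:
# letters, digits, and underscores only; at most 1,024 characters.
# Assumption: an empty ID is treated as invalid.
def valid_dataset_id?(id)
  id.is_a?(String) && /\A[A-Za-z0-9_]{1,1024}\z/.match?(id)
end

ok  = valid_dataset_id?("my_dataset") # underscores are allowed
bad = valid_dataset_id?("my-dataset") # hyphens are not
```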

#dataset_ref ⇒ `Object`

The gapi fragment containing the Project ID and Dataset ID as a camel-cased hash.

# File 'lib/google/cloud/bigquery/dataset.rb', line 85

def dataset_ref
  dataset_ref = @gapi.dataset_reference
  dataset_ref = dataset_ref.to_h if dataset_ref.respond_to? :to_h
  dataset_ref
end

#default_expiration ⇒ `Object`

The default lifetime of all tables in the dataset, in milliseconds.

# File 'lib/google/cloud/bigquery/dataset.rb', line 155

def default_expiration
  ensure_full_data!
  begin
    Integer @gapi.default_table_expiration_ms
  rescue
    nil
  end
end

#default_expiration=(new_default_expiration) ⇒ `Object`

Updates the default lifetime of all tables in the dataset, in milliseconds.

# File 'lib/google/cloud/bigquery/dataset.rb', line 170

def default_expiration= new_default_expiration
  @gapi.update! default_table_expiration_ms: new_default_expiration
  patch_gapi! :default_table_expiration_ms
end
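Because the default expiration is expressed in milliseconds, it can help to derive the value from a more readable unit. The following sketch shows the arithmetic. `days_to_ms` is a hypothetical helper, not a library method.

```ruby
# Milliseconds in one day: 24 h * 60 min * 60 s * 1000 ms = 86,400,000.
# Accepts fractional days and returns a whole number of milliseconds.
def days_to_ms(days)
  (days * 24 * 60 * 60 * 1000).to_i
end

one_week_ms = days_to_ms(7)
```

The result could then be passed to `default_expiration=`, for example `dataset.default_expiration = one_week_ms`.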

#delete(force: nil) ⇒ `Boolean`

Permanently deletes the dataset. The dataset must be empty before it can be deleted unless the force option is set to true.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery

dataset = bigquery.dataset "my_dataset"
dataset.delete

# File 'lib/google/cloud/bigquery/dataset.rb', line 292

def delete force: nil
  ensure_service!
  service.delete_dataset dataset_id, force
  true
end

#description ⇒ `Object`

A user-friendly description of the dataset.

# File 'lib/google/cloud/bigquery/dataset.rb', line 135

def description
  ensure_full_data!
  @gapi.description
end

#description=(new_description) ⇒ `Object`

Updates the user-friendly description of the dataset.

# File 'lib/google/cloud/bigquery/dataset.rb', line 145

def description= new_description
  @gapi.update! description: new_description
  patch_gapi! :description
end

#etag ⇒ `Object`

A string hash of the dataset.

# File 'lib/google/cloud/bigquery/dataset.rb', line 115

def etag
  ensure_full_data!
  @gapi.etag
end

#location ⇒ `Object`

The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.

# File 'lib/google/cloud/bigquery/dataset.rb', line 209

def location
  ensure_full_data!
  @gapi.location
end

#modified_at ⇒ `Object`

The date when this dataset or any of its tables was last modified.

# File 'lib/google/cloud/bigquery/dataset.rb', line 194

def modified_at
  ensure_full_data!
  begin
    Time.at(Integer(@gapi.last_modified_time) / 1000.0)
  rescue
    nil
  end
end

#name ⇒ `Object`

A descriptive name for the dataset.




# File 'lib/google/cloud/bigquery/dataset.rb', line 96

def name
  @gapi.friendly_name
end

#name=(new_name) ⇒ `Object`

Updates the descriptive name for the dataset.

# File 'lib/google/cloud/bigquery/dataset.rb', line 105

def name= new_name
  @gapi.update! friendly_name: new_name
  patch_gapi! :friendly_name
end

#project_id ⇒ `Object`

The ID of the project containing this dataset.




# File 'lib/google/cloud/bigquery/dataset.rb', line 77

def project_id
  @gapi.dataset_reference.project_id
end

#query(query, max: nil, timeout: 10000, dryrun: nil, cache: true) ⇒ `Google::Cloud::Bigquery::QueryData`

Queries data using the [synchronous method](cloud.google.com/bigquery/querying-data).

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery

dataset = bigquery.dataset "my_dataset"
data = dataset.query "SELECT name FROM my_table"
data.each do |row|
  puts row["name"]
end

# File 'lib/google/cloud/bigquery/dataset.rb', line 654

def query query, max: nil, timeout: 10000, dryrun: nil, cache: true
  options = { max: max, timeout: timeout, dryrun: dryrun, cache: cache }
  options[:dataset] ||= dataset_id
  options[:project] ||= project_id
  ensure_service!
  gapi = service.query query, options
  QueryData.from_gapi gapi, service
end

#query_job(query, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, large_results: nil, flatten: nil) ⇒ `Google::Cloud::Bigquery::QueryJob`

Queries data using the [asynchronous method](cloud.google.com/bigquery/querying-data).

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery

dataset = bigquery.dataset "my_dataset"
job = dataset.query_job "SELECT name FROM my_table"

job.wait_until_done!
if !job.failed?
  job.query_results.each do |row|
    puts row["name"]
  end
end

# File 'lib/google/cloud/bigquery/dataset.rb', line 595

def query_job query, priority: "INTERACTIVE", cache: true, table: nil,
              create: nil, write: nil, large_results: nil, flatten: nil
  options = { priority: priority, cache: cache, table: table,
              create: create, write: write,
              large_results: large_results, flatten: flatten }
  options[:dataset] ||= self
  ensure_service!
  gapi = service.query_job query, options
  Job.from_gapi gapi, service
end

#table(table_id) ⇒ `Google::Cloud::Bigquery::Table`, ...

Retrieves an existing table by ID.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
puts table.name

# File 'lib/google/cloud/bigquery/dataset.rb', line 480

def table table_id
  ensure_service!
  gapi = service.get_table dataset_id, table_id
  Table.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end
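As the source above shows, a missing table results in `nil` rather than an exception, so callers should check the result before using it. The following is a standalone sketch of that calling pattern. A plain Hash stands in for the service, and `find_table` is a hypothetical name introduced for illustration.

```ruby
# Stand-in lookup: returns the entry, or nil when it is absent,
# mirroring how #table turns a not-found error into nil.
def find_table(store, table_id)
  store.fetch(table_id)
rescue KeyError
  nil
end

store = { "my_table" => "My Table" }

found   = find_table(store, "my_table")
missing = find_table(store, "no_such_table")

# Guard against nil before using the result.
label = missing.nil? ? "(table not found)" : missing
```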

#tables(token: nil, max: nil) ⇒ `Array<Google::Cloud::Bigquery::Table>`, `Array<Google::Cloud::Bigquery::View>`

Retrieves the list of tables belonging to the dataset.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
tables = dataset.tables
tables.each do |table|
  puts table.name
end

Retrieve all tables: (See Table::List#all)

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
tables = dataset.tables
tables.all do |table|
  puts table.name
end

# File 'lib/google/cloud/bigquery/dataset.rb', line 523

def tables token: nil, max: nil
  ensure_service!
  options = { token: token, max: max }
  gapi = service.list_tables dataset_id, options
  Table::List.from_gapi gapi, service, dataset_id, max
end
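The `token` and `max` options suggest token-based paging: each call returns one page, and a token identifies where the next page starts. The following toy sketch illustrates that general idea. It is not the library's implementation: `fetch_page` and `all_items` are hypothetical, and an integer offset stands in for the token.

```ruby
# Toy paged source: returns up to `max` items starting at `token`,
# plus the token for the next page (nil when nothing is left).
def fetch_page(items, token: nil, max: 2)
  start = token || 0
  page = items[start, max] || []
  next_token = start + max < items.length ? start + max : nil
  [page, next_token]
end

# Walk every page by feeding each returned token into the next call.
def all_items(items, max: 2)
  collected = []
  token = nil
  loop do
    page, token = fetch_page(items, token: token, max: max)
    collected.concat(page)
    break if token.nil?
  end
  collected
end

names = all_items(%w[t1 t2 t3 t4 t5], max: 2)
```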
