Class: Google::Cloud::Bigquery::Dataset

Inherits:
Object
  • Object
Defined in:
lib/google/cloud/bigquery/dataset.rb,
lib/google/cloud/bigquery/dataset/list.rb,
lib/google/cloud/bigquery/dataset/access.rb

Overview

# Dataset

Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest-level unit of access control; you cannot control access at the table level. Each dataset is contained within a specific project.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery

dataset = bigquery.create_dataset "my_dataset",
                                  name: "My Dataset",
                                  description: "This is my Dataset"

Direct Known Subclasses

Updater

Defined Under Namespace

Classes: Access, List, Updater

Constructor Details

#initialize ⇒ Dataset

# File 'lib/google/cloud/bigquery/dataset.rb', line 56

def initialize
  @service = nil
  @gapi = {}
end

Instance Attribute Details

#gapi ⇒ Object

# File 'lib/google/cloud/bigquery/dataset.rb', line 52

def gapi
  @gapi
end

#service ⇒ Object

# File 'lib/google/cloud/bigquery/dataset.rb', line 48

def service
  @service
end

Class Method Details

.from_gapi(gapi, conn) ⇒ Object



# File 'lib/google/cloud/bigquery/dataset.rb', line 665

def self.from_gapi gapi, conn
  new.tap do |f|
    f.gapi = gapi
    f.service = conn
  end
end

Instance Method Details

#access {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset::Access

Retrieves the access rules for a Dataset. The rules can be updated by passing a block; see Access for all the methods available.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

dataset.access #=> [{"role"=>"OWNER",
               #     "specialGroup"=>"projectOwners"},
               #    {"role"=>"WRITER",
               #     "specialGroup"=>"projectWriters"},
               #    {"role"=>"READER",
               #     "specialGroup"=>"projectReaders"},
               #    {"role"=>"OWNER",
               #     "userByEmail"=>"123456789-...com"}]

Manage the access rules by passing a block:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

dataset.access do |access|
  access.add_owner_group "owners@example.com"
  access.add_writer_user "writer@example.com"
  access.remove_writer_user "readers@example.com"
  access.add_reader_special :all
  access.add_reader_view other_dataset_view_object
end

Yields:

  • (access)

    a block for setting rules

Yield Parameters:

  • access (Access)

    the access rules object to be updated


# File 'lib/google/cloud/bigquery/dataset.rb', line 258

def access
  ensure_full_data!
  access_builder = Access.from_gapi @gapi
  if block_given?
    yield access_builder
    if access_builder.changed?
      @gapi.update! access: access_builder.to_gapi
      patch_gapi! :access
    end
  end
  access_builder.freeze
end
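
The control flow in the source above (yield a builder, write back only when it changed, return a frozen object) can be sketched without the BigQuery client. RuleSet and FakeDataset below are hypothetical stand-ins written for this illustration; they are not part of the library, and the counter only marks where the real method would call patch_gapi!.

```ruby
# Hypothetical stand-in for Dataset::Access: holds rules and tracks changes.
class RuleSet
  attr_reader :rules

  def initialize rules
    @rules = rules.dup
    @original = rules.dup
  end

  def add_reader_user email
    @rules << { "role" => "READER", "userByEmail" => email }
  end

  def changed?
    @rules != @original
  end
end

# Hypothetical stand-in for Dataset: counts how often a patch would be sent.
class FakeDataset
  attr_reader :patch_count

  def initialize rules
    @rules = rules
    @patch_count = 0
  end

  def access
    builder = RuleSet.new @rules
    if block_given?
      yield builder
      if builder.changed?
        @rules = builder.rules.dup
        @patch_count += 1 # the real method calls patch_gapi! :access here
      end
    end
    builder.freeze
  end
end

dataset = FakeDataset.new [{ "role" => "OWNER", "specialGroup" => "projectOwners" }]
dataset.access                                                # read only, no patch
dataset.access { |a| a.add_reader_user "reader@example.com" } # one patch
puts dataset.patch_count
```

In this sketch a call without a block, or a block that leaves the rules untouched, never triggers a write, which mirrors the changed? guard in the source.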

#api_url ⇒ Object

A URL that can be used to access the dataset using the REST API.

# File 'lib/google/cloud/bigquery/dataset.rb', line 125

def api_url
  ensure_full_data!
  @gapi.self_link
end

#create_table(table_id, name: nil, description: nil, fields: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::Table

Creates a new table. If you are adapting existing code that was written for the [REST API](cloud.google.com/bigquery/docs/reference/v2/tables#resource), you can pass the table’s schema as a hash (see the examples).

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table"

You can also pass name and description options.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table",
                             name: "My Table",
                             description: "A description of table."

The table’s schema fields can be passed as an argument.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

schema_fields = [
  Google::Cloud::Bigquery::Schema::Field.new(
    "first_name", :string, mode: :required),
  Google::Cloud::Bigquery::Schema::Field.new(
    "cities_lived", :record, mode: :repeated,
    fields: [
      Google::Cloud::Bigquery::Schema::Field.new(
        "place", :string, mode: :required),
      Google::Cloud::Bigquery::Schema::Field.new(
        "number_of_years", :integer, mode: :required),
      ])
]
table = dataset.create_table "my_table", fields: schema_fields

Or the table’s schema can be configured with the block.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.schema.string "first_name", mode: :required
  t.schema.record "cities_lived", mode: :required do |s|
    s.string "place", mode: :required
    s.integer "number_of_years", mode: :required
  end
end

You can define the schema using a nested block.

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.create_table "my_table" do |t|
  t.name = "My Table"
  t.description = "A description of my table."
  t.schema do |s|
    s.string "first_name", mode: :required
    s.record "cities_lived", mode: :repeated do |r|
      r.string "place", mode: :required
      r.integer "number_of_years", mode: :required
    end
  end
end

Yields:

  • (table)

    a block for setting the table

Yield Parameters:

  • table (Table)

    the table object to be updated



# File 'lib/google/cloud/bigquery/dataset.rb', line 391

def create_table table_id, name: nil, description: nil, fields: nil
  ensure_service!
  new_tb = Google::Apis::BigqueryV2::Table.new(
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id,
      table_id: table_id))
  updater = Table::Updater.new(new_tb).tap do |tb|
    tb.name = name unless name.nil?
    tb.description = description unless description.nil?
    tb.schema.fields = fields unless fields.nil?
  end

  yield updater if block_given?

  gapi = service.insert_table dataset_id, updater.to_gapi
  Table.from_gapi gapi, service
end

#create_view(table_id, query, name: nil, description: nil) ⇒ Google::Cloud::Bigquery::View

Creates a new view table from the given query.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
          "SELECT name, age FROM [proj:dataset.users]"

A name and description can be provided:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
view = dataset.create_view "my_view",
          "SELECT name, age FROM [proj:dataset.users]",
          name: "My View", description: "This is my view"


# File 'lib/google/cloud/bigquery/dataset.rb', line 443

def create_view table_id, query, name: nil, description: nil
  new_view_opts = {
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id, table_id: table_id
    ),
    friendly_name: name,
    description: description,
    view: Google::Apis::BigqueryV2::ViewDefinition.new(
      query: query
    )
  }.delete_if { |_, v| v.nil? }
  new_view = Google::Apis::BigqueryV2::Table.new new_view_opts

  gapi = service.insert_table dataset_id, new_view
  Table.from_gapi gapi, service
end
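
The source above builds an options hash and then drops every entry whose value is nil, so that an omitted name or description is never sent. A minimal sketch of that idiom in plain Ruby follows; view_options is a hypothetical helper for illustration, not a library method.

```ruby
# Hypothetical helper: mirrors the delete_if step in create_view.
def view_options query, name: nil, description: nil
  {
    friendly_name: name,
    description: description,
    query: query
  }.delete_if { |_, v| v.nil? }
end

p view_options("SELECT 1")                  # only the query key remains
p view_options("SELECT 1", name: "My View") # friendly_name is kept as well
```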

#created_at ⇒ Object

The time when this dataset was created.

# File 'lib/google/cloud/bigquery/dataset.rb', line 180

def created_at
  ensure_full_data!
  begin
    Time.at(Integer(@gapi.creation_time) / 1000.0)
  rescue
    nil
  end
end
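
The conversion in the source above treats the stored value as milliseconds since the Unix epoch and returns nil when it cannot be parsed. A minimal standalone sketch follows, assuming the value arrives as a string; ms_to_time is a hypothetical helper, the timestamp is made up, and the rescue is narrowed to the two errors Integer() raises (the source uses a bare rescue).

```ruby
# Hypothetical helper: converts epoch milliseconds to a Time, or nil.
def ms_to_time value
  Time.at(Integer(value) / 1000.0)
rescue ArgumentError, TypeError
  nil
end

created_at = ms_to_time "1472688000000" # made-up value for illustration
puts created_at.utc.year
p ms_to_time(nil) # a missing value yields nil instead of raising
```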

#dataset_id ⇒ Object

A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.

# File 'lib/google/cloud/bigquery/dataset.rb', line 68

def dataset_id
  @gapi.dataset_reference.dataset_id
end
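
The ID rules stated above can be checked locally before calling the API. The sketch below is one way to do that; valid_dataset_id? is a hypothetical helper, not part of the library, and rejecting the empty string is an assumption the description does not spell out.

```ruby
# Letters, digits, and underscores only; 1 to 1,024 characters.
# The lower bound of 1 is an assumption made for this sketch.
VALID_DATASET_ID = /\A[A-Za-z0-9_]{1,1024}\z/

# Hypothetical helper: true when the ID satisfies the documented rules.
def valid_dataset_id? id
  VALID_DATASET_ID.match? id.to_s
end

puts valid_dataset_id?("my_dataset")
puts valid_dataset_id?("my-dataset")
```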

#dataset_ref ⇒ Object

The gapi fragment containing the Project ID and Dataset ID as a camel-cased hash.

# File 'lib/google/cloud/bigquery/dataset.rb', line 85

def dataset_ref
  dataset_ref = @gapi.dataset_reference
  dataset_ref = dataset_ref.to_h if dataset_ref.respond_to? :to_h
  dataset_ref
end

#default_expiration ⇒ Object

The default lifetime of all tables in the dataset, in milliseconds.

# File 'lib/google/cloud/bigquery/dataset.rb', line 155

def default_expiration
  ensure_full_data!
  begin
    Integer @gapi.default_table_expiration_ms
  rescue
    nil
  end
end

#default_expiration=(new_default_expiration) ⇒ Object

Updates the default lifetime of all tables in the dataset, in milliseconds.



# File 'lib/google/cloud/bigquery/dataset.rb', line 170

def default_expiration= new_default_expiration
  @gapi.update! default_table_expiration_ms: new_default_expiration
  patch_gapi! :default_table_expiration_ms
end
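
Because the setter takes milliseconds, a small conversion helper can make call sites easier to read. The sketch below assumes you think of lifetimes in hours; hours_to_ms is a hypothetical helper, not a library method.

```ruby
MS_PER_HOUR = 60 * 60 * 1000

# Hypothetical helper: converts a lifetime in hours to whole milliseconds.
def hours_to_ms hours
  Integer(hours * MS_PER_HOUR)
end

puts hours_to_ms(24)
# With a real dataset object this might be used as:
#   dataset.default_expiration = hours_to_ms(24)
```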

#delete(force: nil) ⇒ Boolean

Permanently deletes the dataset. The dataset must be empty before it can be deleted unless the force option is set to true.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery

dataset = bigquery.dataset "my_dataset"
dataset.delete


# File 'lib/google/cloud/bigquery/dataset.rb', line 292

def delete force: nil
  ensure_service!
  service.delete_dataset dataset_id, force
  true
end

#description ⇒ Object

A user-friendly description of the dataset.

# File 'lib/google/cloud/bigquery/dataset.rb', line 135

def description
  ensure_full_data!
  @gapi.description
end

#description=(new_description) ⇒ Object

Updates the user-friendly description of the dataset.



# File 'lib/google/cloud/bigquery/dataset.rb', line 145

def description= new_description
  @gapi.update! description: new_description
  patch_gapi! :description
end

#etag ⇒ Object

A string hash of the dataset.

# File 'lib/google/cloud/bigquery/dataset.rb', line 115

def etag
  ensure_full_data!
  @gapi.etag
end

#location ⇒ Object

The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.

# File 'lib/google/cloud/bigquery/dataset.rb', line 209

def location
  ensure_full_data!
  @gapi.location
end

#modified_at ⇒ Object

The date when this dataset or any of its tables was last modified.

# File 'lib/google/cloud/bigquery/dataset.rb', line 194

def modified_at
  ensure_full_data!
  begin
    Time.at(Integer(@gapi.last_modified_time) / 1000.0)
  rescue
    nil
  end
end

#name ⇒ Object

A descriptive name for the dataset.

# File 'lib/google/cloud/bigquery/dataset.rb', line 96

def name
  @gapi.friendly_name
end

#name=(new_name) ⇒ Object

Updates the descriptive name for the dataset.



# File 'lib/google/cloud/bigquery/dataset.rb', line 105

def name= new_name
  @gapi.update! friendly_name: new_name
  patch_gapi! :friendly_name
end

#project_id ⇒ Object

The ID of the project containing this dataset.

# File 'lib/google/cloud/bigquery/dataset.rb', line 77

def project_id
  @gapi.dataset_reference.project_id
end

#query(query, max: nil, timeout: 10000, dryrun: nil, cache: true) ⇒ Google::Cloud::Bigquery::QueryData

Queries data using the [synchronous method](cloud.google.com/bigquery/querying-data).

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

data = dataset.query "SELECT name FROM my_table"
data.each do |row|
  puts row["name"]
end


# File 'lib/google/cloud/bigquery/dataset.rb', line 654

def query query, max: nil, timeout: 10000, dryrun: nil, cache: true
  options = { max: max, timeout: timeout, dryrun: dryrun, cache: cache }
  options[:dataset] ||= dataset_id
  options[:project] ||= project_id
  ensure_service!
  gapi = service.query query, options
  QueryData.from_gapi gapi, service
end
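
The way the source above makes the current dataset the default can be seen by building the options hash on its own. The sketch below copies that step into a standalone method; build_query_options is a hypothetical name, and the IDs are placeholders.

```ruby
# Hypothetical helper: reproduces the options built inside Dataset#query.
def build_query_options dataset_id, project_id,
                        max: nil, timeout: 10000, dryrun: nil, cache: true
  options = { max: max, timeout: timeout, dryrun: dryrun, cache: cache }
  options[:dataset] ||= dataset_id # default dataset for unqualified tables
  options[:project] ||= project_id
  options
end

p build_query_options("my_dataset", "my_project", max: 100)
```

With these defaults in place, a query such as "SELECT name FROM my_table" can refer to my_table without a dataset or project prefix.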

#query_job(query, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, large_results: nil, flatten: nil) ⇒ Google::Cloud::Bigquery::QueryJob

Queries data using the [asynchronous method](cloud.google.com/bigquery/querying-data).

Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT name FROM my_table"

job.wait_until_done!
unless job.failed?
  job.query_results.each do |row|
    puts row["name"]
  end
end


# File 'lib/google/cloud/bigquery/dataset.rb', line 595

def query_job query, priority: "INTERACTIVE", cache: true, table: nil,
              create: nil, write: nil, large_results: nil, flatten: nil
  options = { priority: priority, cache: cache, table: table,
              create: create, write: write,
              large_results: large_results, flatten: flatten }
  options[:dataset] ||= self
  ensure_service!
  gapi = service.query_job query, options
  Job.from_gapi gapi, service
end

#table(table_id) ⇒ Google::Cloud::Bigquery::Table, ...

Retrieves an existing table by ID.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
puts table.name


# File 'lib/google/cloud/bigquery/dataset.rb', line 480

def table table_id
  ensure_service!
  gapi = service.get_table dataset_id, table_id
  Table.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end
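
As the source above shows, a missing table produces nil rather than an exception, so callers should check the result before using it. The sketch below reproduces that behavior with stand-ins: NotFoundError, FakeService, and find_table are hypothetical names for this illustration and only imitate the library's classes.

```ruby
# Stand-in for Google::Cloud::NotFoundError.
class NotFoundError < StandardError; end

# Stand-in for the service object: looks tables up in a hash.
class FakeService
  def initialize tables
    @tables = tables
  end

  def get_table dataset_id, table_id
    @tables.fetch(table_id) { raise NotFoundError, "#{dataset_id}.#{table_id}" }
  end
end

# Mirrors Dataset#table: returns the table, or nil when it does not exist.
def find_table service, dataset_id, table_id
  service.get_table dataset_id, table_id
rescue NotFoundError
  nil
end

service = FakeService.new "my_table" => { name: "My Table" }
table = find_table service, "my_dataset", "missing_table"
puts(table ? table[:name] : "table not found")
```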

#tables(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Table>, Array<Google::Cloud::Bigquery::View>

Retrieves the list of tables belonging to the dataset.

Examples:

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
tables = dataset.tables
tables.each do |table|
  puts table.name
end

Retrieve all tables: (See Table::List#all)

require "google/cloud"

gcloud = Google::Cloud.new
bigquery = gcloud.bigquery
dataset = bigquery.dataset "my_dataset"
tables = dataset.tables
tables.all do |table|
  puts table.name
end


# File 'lib/google/cloud/bigquery/dataset.rb', line 523

def tables token: nil, max: nil
  ensure_service!
  options = { token: token, max: max }
  gapi = service.list_tables dataset_id, options
  Table::List.from_gapi gapi, service, dataset_id, max
end
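
The token option above suggests token-based paging: each response carries a token for the next page, and a nil token marks the last page. The loop below sketches that pattern against a hypothetical in-memory list_tables; the page contents and token values are made up, and the real list object may expose paging differently.

```ruby
# Made-up pages keyed by request token; nil is the first request.
PAGES = {
  nil  => [%w[table_1 table_2], "p2"],
  "p2" => [%w[table_3], nil]
}.freeze

# Hypothetical stand-in for one paged API call.
def list_tables token: nil
  names, next_token = PAGES.fetch token
  { names: names, token: next_token }
end

all_names = []
token = nil
loop do
  page = list_tables token: token
  all_names.concat page[:names]
  token = page[:token]
  break if token.nil? # no further pages
end

puts all_names.join(", ")
```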