Class: Google::Cloud::Bigquery::Dataset
- Inherits: Object
- Defined in:
- lib/google/cloud/bigquery/dataset.rb,
lib/google/cloud/bigquery/dataset/tag.rb,
lib/google/cloud/bigquery/dataset/list.rb,
lib/google/cloud/bigquery/dataset/access.rb
Overview
Dataset
Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.
Direct Known Subclasses: Updater
Defined Under Namespace
Classes: Access, List, Tag, Updater
Attributes
-
#access {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset::Access
Retrieves the access rules for a Dataset.
-
#api_url ⇒ String?
A URL that can be used to access the dataset using the REST API.
-
#created_at ⇒ Time?
The time when this dataset was created.
-
#dataset_id ⇒ String
A unique ID for this dataset, without the project name.
-
#default_collation ⇒ String?
The default collation of the dataset.
-
#default_collation=(new_default_collation) ⇒ Object
Updates the default collation of the dataset.
-
#default_encryption ⇒ EncryptionConfiguration?
The EncryptionConfiguration object that represents the default encryption method for all tables and models in the dataset.
-
#default_encryption=(value) ⇒ Object
Set the EncryptionConfiguration object that represents the default encryption method for all tables and models in the dataset.
-
#default_expiration ⇒ Integer?
The default lifetime of all tables in the dataset, in milliseconds.
-
#default_expiration=(new_default_expiration) ⇒ Object
Updates the default lifetime of all tables in the dataset, in milliseconds.
-
#description ⇒ String?
A user-friendly description of the dataset.
-
#description=(new_description) ⇒ Object
Updates the user-friendly description of the dataset.
-
#etag ⇒ String?
The ETag hash of the dataset.
-
#labels ⇒ Hash<String, String>?
A hash of user-provided labels associated with this dataset.
-
#labels=(labels) ⇒ Object
Updates the hash of user-provided labels associated with this dataset.
-
#location ⇒ String?
The geographic location where the dataset should reside.
-
#modified_at ⇒ Time?
The date when this dataset or any of its tables was last modified.
-
#name ⇒ String?
A descriptive name for the dataset.
-
#name=(new_name) ⇒ Object
Updates the descriptive name for the dataset.
-
#project_id ⇒ String
The ID of the project containing this dataset.
-
#storage_billing_model ⇒ String?
Gets the Storage Billing Model for the dataset.
-
#storage_billing_model=(value) ⇒ Object
Sets the Storage Billing Model for the dataset.
-
#tags ⇒ Google::Cloud::Bigquery::Dataset::Tag
Retrieves the tags associated with this dataset.
Lifecycle
-
#delete(force: nil) ⇒ Boolean
Permanently deletes the dataset.
Table
-
#create_materialized_view(table_id, query, name: nil, description: nil, enable_refresh: nil, refresh_interval_ms: nil) ⇒ Google::Cloud::Bigquery::Table
Creates a new materialized view.
-
#create_table(table_id, name: nil, description: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::Table
Creates a new table.
-
#create_view(table_id, query, name: nil, description: nil, standard_sql: nil, legacy_sql: nil, udfs: nil) ⇒ Google::Cloud::Bigquery::Table
Creates a new view, which is a virtual table defined by the given SQL query.
-
#table(table_id, skip_lookup: nil, view: nil) ⇒ Google::Cloud::Bigquery::Table?
Retrieves an existing table by ID.
-
#tables(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Table>
Retrieves the list of tables belonging to the dataset.
Model
-
#model(model_id, skip_lookup: nil) ⇒ Google::Cloud::Bigquery::Model?
Retrieves an existing model by ID.
-
#models(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Model>
Retrieves the list of models belonging to the dataset.
Routine
-
#create_routine(routine_id) {|routine| ... } ⇒ Google::Cloud::Bigquery::Routine
Creates a new routine.
-
#routine(routine_id, skip_lookup: nil) ⇒ Google::Cloud::Bigquery::Routine?
Retrieves an existing routine by ID.
-
#routines(token: nil, max: nil, filter: nil) ⇒ Array<Google::Cloud::Bigquery::Routine>
Retrieves the list of routines belonging to the dataset.
Data
-
#build_access_entry(target_types: nil) ⇒ Google::Apis::BigqueryV2::DatasetAccessEntry
Builds a Google::Apis::BigqueryV2::DatasetAccessEntry object that references this dataset.
-
#exists?(force: false) ⇒ Boolean
Determines whether the dataset exists in the BigQuery service.
-
#external(url, format: nil) {|ext| ... } ⇒ External::DataSource
Creates a new External::DataSource (or subclass) object that represents the external data source that can be queried from directly, even though the data is not stored in BigQuery.
-
#insert(table_id, rows, insert_ids: nil, skip_invalid: nil, ignore_unknown: nil, autocreate: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::InsertResponse
Inserts data into the given table for near-immediate querying, without the need to complete a load operation before the data can appear in query results.
-
#insert_async(table_id, skip_invalid: nil, ignore_unknown: nil, max_bytes: 10_000_000, max_rows: 500, interval: 10, threads: 4, view: nil) {|response| ... } ⇒ Table::AsyncInserter
Create an asynchronous inserter object used to insert rows in batches.
-
#load(table_id, files, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, autodetect: nil, null_marker: nil, session_id: nil, date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil, null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil, preserve_ascii_control_characters: nil, reservation: nil) {|updater| ... } ⇒ Boolean
Loads data into the provided destination table using a synchronous method that blocks for a response.
-
#load_job(table_id, files, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil, null_marker: nil, dryrun: nil, create_session: nil, session_id: nil, date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil, null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil, preserve_ascii_control_characters: nil, reservation: nil) {|updater| ... } ⇒ Google::Cloud::Bigquery::LoadJob
Loads data into the provided destination table using an asynchronous method.
-
#query(query, params: nil, types: nil, external: nil, max: nil, cache: true, standard_sql: nil, legacy_sql: nil, session_id: nil, format_options_use_int64_timestamp: true, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::Data
Queries data and waits for the results.
-
#query_job(query, params: nil, types: nil, external: nil, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, dryrun: nil, standard_sql: nil, legacy_sql: nil, large_results: nil, flatten: nil, maximum_billing_tier: nil, maximum_bytes_billed: nil, job_id: nil, prefix: nil, labels: nil, udfs: nil, create_session: nil, session_id: nil, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::QueryJob
Queries data by creating a query job.
-
#reference? ⇒ Boolean
Whether the dataset was created without retrieving the resource representation from the BigQuery service.
-
#reload! ⇒ Google::Cloud::Bigquery::Dataset
(also: #refresh!)
Reloads the dataset with current data from the BigQuery service.
-
#resource? ⇒ Boolean
Whether the dataset was created with a resource representation from the BigQuery service.
-
#resource_full? ⇒ Boolean
Whether the dataset was created with a full resource representation from the BigQuery service.
-
#resource_partial? ⇒ Boolean
Whether the dataset was created with a partial resource representation from the BigQuery service by retrieval through Project#datasets.
Instance Method Details
#access {|access| ... } ⇒ Google::Cloud::Bigquery::Dataset::Access
Retrieves the access rules for a Dataset. The rules can be updated when passing a block, see Access for all the methods available.
If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
# File 'lib/google/cloud/bigquery/dataset.rb', line 591
def access
  ensure_full_data!
  reload! unless resource_full?
  access_builder = Access.from_gapi @gapi
  if block_given?
    yield access_builder
    if access_builder.changed?
      @gapi.update! access: access_builder.to_gapi
      patch_gapi! :access
    end
  end
  access_builder.freeze
end
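A usage sketch in the style of the gem's examples; the dataset name and e-mail address are placeholders, and running this requires Google Cloud credentials:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

# Passing a block updates the rules; without a block the frozen rules are returned.
dataset.access do |access|
  access.add_owner_group "owners@example.com"
end
```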
#api_url ⇒ String?
A URL that can be used to access the dataset using the REST API.
# File 'lib/google/cloud/bigquery/dataset.rb', line 213
def api_url
  return nil if reference?
  ensure_full_data!
  @gapi.self_link
end
#build_access_entry(target_types: nil) ⇒ Google::Apis::BigqueryV2::DatasetAccessEntry
Builds a Google::Apis::BigqueryV2::DatasetAccessEntry object that references this dataset.
# File 'lib/google/cloud/bigquery/dataset.rb', line 3056
def build_access_entry target_types: nil
  params = {
    dataset: dataset_ref,
    target_types: target_types
  }.compact
  Google::Apis::BigqueryV2::DatasetAccessEntry.new(**params)
end
#create_materialized_view(table_id, query, name: nil, description: nil, enable_refresh: nil, refresh_interval_ms: nil) ⇒ Google::Cloud::Bigquery::Table
Creates a new materialized view.
Materialized views are precomputed views that periodically cache results of a query for increased performance and efficiency. BigQuery leverages precomputed results from materialized views and whenever possible reads only delta changes from the base table to compute up-to-date results.
Queries that use materialized views are generally faster and consume less resources than queries that retrieve the same data only from the base table. Materialized views are helpful to significantly boost performance of workloads that have the characteristic of common and repeated queries.
For logical views, see #create_view.
# File 'lib/google/cloud/bigquery/dataset.rb', line 913
def create_materialized_view table_id, query, name: nil, description: nil,
                             enable_refresh: nil, refresh_interval_ms: nil
  new_view_opts = {
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id, table_id: table_id
    ),
    friendly_name: name,
    description: description,
    materialized_view: Google::Apis::BigqueryV2::MaterializedViewDefinition.new(
      enable_refresh: enable_refresh,
      query: query,
      refresh_interval_ms: refresh_interval_ms
    )
  }.compact
  new_view = Google::Apis::BigqueryV2::Table.new(**new_view_opts)
  gapi = service.insert_table dataset_id, new_view
  Table.from_gapi gapi, service
end
#create_routine(routine_id) {|routine| ... } ⇒ Google::Cloud::Bigquery::Routine
Creates a new routine. The following attributes may be set in the yielded block: Routine::Updater#routine_type=, Routine::Updater#language=, Routine::Updater#arguments=, Routine::Updater#return_type=, Routine::Updater#imported_libraries=, Routine::Updater#body=, and Routine::Updater#description=.
# File 'lib/google/cloud/bigquery/dataset.rb', line 1185
def create_routine routine_id
  ensure_service!
  new_tb = Google::Apis::BigqueryV2::Routine.new(
    routine_reference: Google::Apis::BigqueryV2::RoutineReference.new(
      project_id: project_id, dataset_id: dataset_id, routine_id: routine_id
    )
  )
  updater = Routine::Updater.new new_tb
  yield updater if block_given?
  gapi = service.insert_routine dataset_id, updater.to_gapi
  Routine.from_gapi gapi, service
end
#create_table(table_id, name: nil, description: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::Table
Creates a new table. If you are adapting existing code that was written for the REST API, you can pass the table's schema as a hash (see example).
# File 'lib/google/cloud/bigquery/dataset.rb', line 752
def create_table table_id, name: nil, description: nil
  ensure_service!
  new_tb = Google::Apis::BigqueryV2::Table.new(
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id, table_id: table_id
    )
  )
  updater = Table::Updater.new(new_tb).tap do |tb|
    tb.name = name unless name.nil?
    tb.description = description unless description.nil?
  end
  yield updater if block_given?
  gapi = service.insert_table dataset_id, updater.to_gapi
  Table.from_gapi gapi, service
end
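A usage sketch in the style of the gem's examples, defining a schema in the yielded block; the dataset and table names are placeholders and running this requires Google Cloud credentials:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

table = dataset.create_table "my_table" do |t|
  t.name = "My Table"
  t.description = "A description of my table."
  t.schema do |s|
    s.string "first_name", mode: :required
    s.record "cities_lived", mode: :repeated do |r|
      r.string "place", mode: :required
      r.integer "number_of_years", mode: :required
    end
  end
end
```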
#create_view(table_id, query, name: nil, description: nil, standard_sql: nil, legacy_sql: nil, udfs: nil) ⇒ Google::Cloud::Bigquery::Table
Creates a new view, which is a virtual table defined by the given SQL query.
With BigQuery's logical views, the query that defines the view is re-executed every time the view is queried. Queries are billed according to the total amount of data in all table fields referenced directly or indirectly by the top-level query. (See Table#view? and Table#query.)
For materialized views, see #create_materialized_view.
# File 'lib/google/cloud/bigquery/dataset.rb', line 837
def create_view table_id, query, name: nil, description: nil,
                standard_sql: nil, legacy_sql: nil, udfs: nil
  use_legacy_sql = Convert.resolve_legacy_sql standard_sql, legacy_sql
  new_view_opts = {
    table_reference: Google::Apis::BigqueryV2::TableReference.new(
      project_id: project_id, dataset_id: dataset_id, table_id: table_id
    ),
    friendly_name: name,
    description: description,
    view: Google::Apis::BigqueryV2::ViewDefinition.new(
      query: query,
      use_legacy_sql: use_legacy_sql,
      user_defined_function_resources: udfs_gapi(udfs)
    )
  }.compact
  new_view = Google::Apis::BigqueryV2::Table.new(**new_view_opts)
  gapi = service.insert_table dataset_id, new_view
  Table.from_gapi gapi, service
end
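The method reconciles the standard_sql and legacy_sql flags into a single use_legacy_sql value via Convert.resolve_legacy_sql. A plausible pure-Ruby sketch of that reconciliation (the gem's actual helper may differ in detail), defaulting to standard SQL when neither flag is given:

```ruby
# Sketch: collapse the two mutually-exclusive flags into one boolean.
# standard_sql wins if given; otherwise legacy_sql; otherwise standard SQL.
def resolve_legacy_sql(standard_sql, legacy_sql)
  return !standard_sql unless standard_sql.nil?
  return legacy_sql unless legacy_sql.nil?
  false
end
```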
#created_at ⇒ Time?
The time when this dataset was created.
# File 'lib/google/cloud/bigquery/dataset.rb', line 327
def created_at
  return nil if reference?
  ensure_full_data!
  Convert.millis_to_time @gapi.creation_time
end
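The service reports creation_time as milliseconds since the Unix epoch; Convert.millis_to_time turns that into a Time. A hypothetical reimplementation of the conversion, using a Rational to avoid losing sub-second precision:

```ruby
# Sketch of the millisecond-epoch conversion behind Convert.millis_to_time.
# Returns nil when the service did not supply a timestamp.
def millis_to_time(millis)
  return nil if millis.nil?
  Time.at Rational(millis.to_i, 1000)
end
```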
#dataset_id ⇒ String
A unique ID for this dataset, without the project name.
# File 'lib/google/cloud/bigquery/dataset.rb', line 133
def dataset_id
  return reference.dataset_id if reference?
  @gapi.dataset_reference.dataset_id
end
#default_collation ⇒ String?
The default collation of the dataset.
# File 'lib/google/cloud/bigquery/dataset.rb', line 258
def default_collation
  return nil if reference?
  ensure_full_data!
  @gapi.default_collation
end
#default_collation=(new_default_collation) ⇒ Object
Updates the default collation of the dataset.
If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
# File 'lib/google/cloud/bigquery/dataset.rb', line 275
def default_collation= new_default_collation
  reload! unless resource_full?
  @gapi.update! default_collation: new_default_collation
  patch_gapi! :default_collation
end
#default_encryption ⇒ EncryptionConfiguration?
The EncryptionConfiguration object that represents the default encryption method for all tables and models in the dataset. Once this property is set, all newly-created partitioned tables and models in the dataset will have their encryption set to this value, unless table creation request (or query) overrides it.
Present only if this dataset is using custom default encryption.
# File 'lib/google/cloud/bigquery/dataset.rb', line 459
def default_encryption
  return nil if reference?
  ensure_full_data!
  return nil if @gapi.default_encryption_configuration.nil?
  EncryptionConfiguration.from_gapi(@gapi.default_encryption_configuration).freeze
end
#default_encryption=(value) ⇒ Object
Set the EncryptionConfiguration object that represents the default encryption method for all tables and models in the dataset. Once this property is set, all newly-created partitioned tables and models in the dataset will have their encryption set to this value, unless table creation request (or query) overrides it.
If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
# File 'lib/google/cloud/bigquery/dataset.rb', line 495
def default_encryption= value
  ensure_full_data!
  @gapi.default_encryption_configuration = value.to_gapi
  patch_gapi! :default_encryption_configuration
end
#default_expiration ⇒ Integer?
The default lifetime of all tables in the dataset, in milliseconds.
# File 'lib/google/cloud/bigquery/dataset.rb', line 290
def default_expiration
  return nil if reference?
  ensure_full_data!
  begin
    Integer @gapi.default_table_expiration_ms
  rescue StandardError
    nil
  end
end
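The getter wraps the raw value in Integer with a rescue because the service delivers the expiration as a string of milliseconds, and it may be absent entirely. The coercion can be isolated as a small sketch:

```ruby
# Sketch of the defensive coercion in #default_expiration: coerce the raw
# service value to Integer, falling back to nil for missing or bad input.
def safe_expiration_ms(raw)
  Integer raw
rescue StandardError
  nil
end
```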
#default_expiration=(new_default_expiration) ⇒ Object
Updates the default lifetime of all tables in the dataset, in milliseconds.
If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
# File 'lib/google/cloud/bigquery/dataset.rb', line 313
def default_expiration= new_default_expiration
  reload! unless resource_full?
  @gapi.update! default_table_expiration_ms: new_default_expiration
  patch_gapi! :default_table_expiration_ms
end
#delete(force: nil) ⇒ Boolean
Permanently deletes the dataset. The dataset must be empty before it can be deleted, unless the force option is set to true.
# File 'lib/google/cloud/bigquery/dataset.rb', line 640
def delete force: nil
  ensure_service!
  service.delete_dataset dataset_id, force
  # Set flag for #exists?
  @exists = false
  true
end
#description ⇒ String?
A user-friendly description of the dataset.
# File 'lib/google/cloud/bigquery/dataset.rb', line 227
def description
  return nil if reference?
  ensure_full_data!
  @gapi.description
end
#description=(new_description) ⇒ Object
Updates the user-friendly description of the dataset.
If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
# File 'lib/google/cloud/bigquery/dataset.rb', line 244
def description= new_description
  reload! unless resource_full?
  @gapi.update! description: new_description
  patch_gapi! :description
end
#etag ⇒ String?
The ETag hash of the dataset.
# File 'lib/google/cloud/bigquery/dataset.rb', line 199
def etag
  return nil if reference?
  ensure_full_data!
  @gapi.etag
end
#exists?(force: false) ⇒ Boolean
Determines whether the dataset exists in the BigQuery service. The result is cached locally. To refresh state, set force to true.
# File 'lib/google/cloud/bigquery/dataset.rb', line 2691
def exists? force: false
  return gapi_exists? if force
  # If we have a memoized value, return it
  return @exists unless @exists.nil?
  # Always true if we have a gapi object
  return true if resource?
  gapi_exists?
end
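The caching behavior of #exists? can be illustrated in isolation: the first answer is memoized, and only force: true triggers another service call. A self-contained sketch with a stand-in for the service lookup:

```ruby
# Sketch of the memoization pattern behind #exists?. The "service" here is
# any callable returning true/false; in the gem it is a BigQuery API lookup.
class ExistsCache
  def initialize(service)
    @service = service
    @exists = nil
  end

  def exists?(force: false)
    return lookup! if force            # bypass the cache on demand
    return @exists unless @exists.nil? # reuse the memoized answer
    lookup!
  end

  private

  def lookup!
    @exists = @service.call
  end
end
```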
#external(url, format: nil) {|ext| ... } ⇒ External::DataSource
Creates a new External::DataSource (or subclass) object that represents the external data source that can be queried from directly, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.
# File 'lib/google/cloud/bigquery/dataset.rb', line 2035
def external url, format: nil
  ext = External.from_urls url, format
  yield ext if block_given?
  ext
end
#insert(table_id, rows, insert_ids: nil, skip_invalid: nil, ignore_unknown: nil, autocreate: nil) {|table| ... } ⇒ Google::Cloud::Bigquery::InsertResponse
Inserts data into the given table for near-immediate querying, without the need to complete a load operation before the data can appear in query results.
Simple Ruby types are generally accepted per JSON rules, along with the following support for BigQuery's more complex types:
| BigQuery | Ruby | Notes |
|---|---|---|
| NUMERIC | BigDecimal | BigDecimal values will be rounded to scale 9. |
| BIGNUMERIC | String | Pass as String to avoid rounding to scale 9. |
| DATETIME | DateTime | DATETIME does not support time zone. |
| DATE | Date | |
| GEOGRAPHY | String | |
| JSON | String (stringified JSON) | String, as JSON does not have a schema to verify. |
| TIMESTAMP | Time | |
| TIME | Google::Cloud::BigQuery::Time | |
| BYTES | File, IO, StringIO, or similar | |
| ARRAY | Array | Nested arrays and nil values are not supported. |
| STRUCT | Hash | Hash keys may be strings or symbols. |
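As the table notes, NUMERIC is limited to scale 9, so a BigDecimal is rounded, while BIGNUMERIC precision survives only if the digits are sent as a String. A quick pure-Ruby illustration of the difference:

```ruby
require "bigdecimal"

value = BigDecimal("123.4567890123456789")

# A NUMERIC column keeps at most 9 fractional digits, so a BigDecimal
# row value is rounded before insertion:
numeric = value.round(9)

# To preserve full precision for a BIGNUMERIC column, pass the digits
# as a String instead:
bignumeric = value.to_s("F")

puts numeric.to_s("F")  # "123.456789012"
puts bignumeric         # "123.4567890123456789"
```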
Because BigQuery's streaming API is designed for high insertion rates, modifications to the underlying table metadata are eventually consistent when interacting with the streaming system. In most cases metadata changes are propagated within minutes, but during this period API responses may reflect the inconsistent state of the table.
For the insert_ids parameter, the value :skip can be provided to skip the generation of IDs for all rows, or to skip the generation of an ID for a specific row in the array.
# File 'lib/google/cloud/bigquery/dataset.rb', line 2934
def insert table_id, rows, insert_ids: nil, skip_invalid: nil,
           ignore_unknown: nil, autocreate: nil, &block
  rows = [rows] if rows.is_a? Hash
  raise ArgumentError, "No rows provided" if rows.empty?
  insert_ids = Array.new(rows.count) { :skip } if insert_ids == :skip
  insert_ids = Array insert_ids
  if insert_ids.count.positive? && insert_ids.count != rows.count
    raise ArgumentError, "insert_ids must be the same size as rows"
  end
  if autocreate
    insert_data_with_autocreate table_id, rows, skip_invalid: skip_invalid,
                                ignore_unknown: ignore_unknown,
                                insert_ids: insert_ids, &block
  else
    insert_data table_id, rows, skip_invalid: skip_invalid,
                ignore_unknown: ignore_unknown, insert_ids: insert_ids
  end
end
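The argument normalization at the top of #insert is worth seeing in isolation: a single Hash row is wrapped in an Array, insert_ids: :skip expands to one :skip per row, and a length mismatch raises. A pure-Ruby sketch of just that logic:

```ruby
# Sketch of #insert's argument handling (the actual method then forwards
# to the streaming insert call, omitted here).
def normalize_insert_args(rows, insert_ids)
  rows = [rows] if rows.is_a? Hash
  raise ArgumentError, "No rows provided" if rows.empty?
  insert_ids = Array.new(rows.count) { :skip } if insert_ids == :skip
  insert_ids = Array(insert_ids)
  if insert_ids.count.positive? && insert_ids.count != rows.count
    raise ArgumentError, "insert_ids must be the same size as rows"
  end
  [rows, insert_ids]
end
```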
#insert_async(table_id, skip_invalid: nil, ignore_unknown: nil, max_bytes: 10_000_000, max_rows: 500, interval: 10, threads: 4, view: nil) {|response| ... } ⇒ Table::AsyncInserter
Create an asynchronous inserter object used to insert rows in batches.
# File 'lib/google/cloud/bigquery/dataset.rb', line 3027
def insert_async table_id, skip_invalid: nil, ignore_unknown: nil,
                 max_bytes: 10_000_000, max_rows: 500, interval: 10,
                 threads: 4, view: nil, &block
  ensure_service!
  # Get table, don't use Dataset#table which handles NotFoundError
  gapi = service.get_project_table project_id, dataset_id, table_id,
                                   metadata_view: view
  table = Table.from_gapi gapi, service, metadata_view: view
  # Get the AsyncInserter from the table
  table.insert_async skip_invalid: skip_invalid,
                     ignore_unknown: ignore_unknown,
                     max_bytes: max_bytes, max_rows: max_rows,
                     interval: interval, threads: threads, &block
end
#labels ⇒ Hash<String, String>?
A hash of user-provided labels associated with this dataset. Labels are used to organize and group datasets. See Using Labels.
The returned hash is frozen and changes are not allowed. Use #labels= to replace the entire hash.
# File 'lib/google/cloud/bigquery/dataset.rb', line 383
def labels
  return nil if reference?
  m = @gapi.labels
  m = m.to_h if m.respond_to? :to_h
  m.dup.freeze
end
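The getter returns a frozen duplicate because mutating the returned hash would never be persisted to the service; only #labels= performs an update. A small sketch of the defensive-copy pattern:

```ruby
# Sketch of #labels' defensive copy: normalize to a Hash, then hand back
# a frozen duplicate so callers cannot silently mutate unsaved state.
def frozen_labels(raw)
  return nil if raw.nil?
  raw = raw.to_h if raw.respond_to? :to_h
  raw.dup.freeze
end
```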
#labels=(labels) ⇒ Object
Updates the hash of user-provided labels associated with this dataset. Labels are used to organize and group datasets. See Using Labels.
If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
# File 'lib/google/cloud/bigquery/dataset.rb', line 426
def labels= labels
  reload! unless resource_full?
  @gapi.labels = labels
  patch_gapi! :labels
end
#load(table_id, files, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, autodetect: nil, null_marker: nil, session_id: nil, date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil, null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil, preserve_ascii_control_characters: nil, reservation: nil) {|updater| ... } ⇒ Boolean
Loads data into the provided destination table using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #load_job.
For the source of the data, you can pass a google-cloud-storage file path or a google-cloud-storage File instance. Or, you can upload a file directly. See Loading Data with a POST Request.
The geographic location for the job ("US", "EU", etc.) can be set via LoadJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.
# File 'lib/google/cloud/bigquery/dataset.rb', line 2623
def load table_id, files, format: nil, create: nil, write: nil,
         projection_fields: nil, jagged_rows: nil, quoted_newlines: nil,
         encoding: nil, delimiter: nil, ignore_unknown: nil,
         max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil,
         autodetect: nil, null_marker: nil, session_id: nil,
         date_format: nil, datetime_format: nil, time_format: nil,
         timestamp_format: nil, null_markers: nil, source_column_match: nil,
         time_zone: nil, reference_file_schema_uri: nil,
         preserve_ascii_control_characters: nil, reservation: nil, &block
  job = load_job table_id, files, format: format, create: create,
                 write: write, projection_fields: projection_fields,
                 jagged_rows: jagged_rows, quoted_newlines: quoted_newlines,
                 encoding: encoding, delimiter: delimiter,
                 ignore_unknown: ignore_unknown,
                 max_bad_records: max_bad_records, quote: quote,
                 skip_leading: skip_leading, schema: schema,
                 autodetect: autodetect, null_marker: null_marker,
                 session_id: session_id, date_format: date_format,
                 datetime_format: datetime_format, time_format: time_format,
                 timestamp_format: timestamp_format,
                 null_markers: null_markers,
                 source_column_match: source_column_match,
                 time_zone: time_zone,
                 reference_file_schema_uri: reference_file_schema_uri,
                 preserve_ascii_control_characters: preserve_ascii_control_characters,
                 reservation: reservation, &block
  job.wait_until_done!
  ensure_job_succeeded! job
  true
end
#load_job(table_id, files, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil, null_marker: nil, dryrun: nil, create_session: nil, session_id: nil, date_format: nil, datetime_format: nil, time_format: nil, timestamp_format: nil, null_markers: nil, source_column_match: nil, time_zone: nil, reference_file_schema_uri: nil, preserve_ascii_control_characters: nil, reservation: nil) {|updater| ... } ⇒ Google::Cloud::Bigquery::LoadJob
Loads data into the provided destination table using an asynchronous method. In this method, a LoadJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See also #load.
For the source of the data, you can pass a google-cloud-storage file path or a google-cloud-storage File instance. Or, you can upload a file directly. See Loading Data with a POST Request.
The geographic location for the job ("US", "EU", etc.) can be set via LoadJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.
# File 'lib/google/cloud/bigquery/dataset.rb', line 2337
def load_job table_id, files, format: nil, create: nil, write: nil,
             projection_fields: nil, jagged_rows: nil, quoted_newlines: nil,
             encoding: nil, delimiter: nil, ignore_unknown: nil,
             max_bad_records: nil, quote: nil, skip_leading: nil,
             schema: nil, job_id: nil, prefix: nil, labels: nil,
             autodetect: nil, null_marker: nil, dryrun: nil,
             create_session: nil, session_id: nil, date_format: nil,
             datetime_format: nil, time_format: nil, timestamp_format: nil,
             null_markers: nil, source_column_match: nil, time_zone: nil,
             reference_file_schema_uri: nil,
             preserve_ascii_control_characters: nil, reservation: nil
  ensure_service!
  updater = load_job_updater table_id, format: format, create: create,
                             write: write,
                             projection_fields: projection_fields,
                             jagged_rows: jagged_rows,
                             quoted_newlines: quoted_newlines,
                             encoding: encoding, delimiter: delimiter,
                             ignore_unknown: ignore_unknown,
                             max_bad_records: max_bad_records, quote: quote,
                             skip_leading: skip_leading, dryrun: dryrun,
                             schema: schema, job_id: job_id, prefix: prefix,
                             labels: labels, autodetect: autodetect,
                             null_marker: null_marker,
                             create_session: create_session,
                             session_id: session_id,
                             date_format: date_format,
                             datetime_format: datetime_format,
                             time_format: time_format,
                             timestamp_format: timestamp_format,
                             null_markers: null_markers,
                             source_column_match: source_column_match,
                             time_zone: time_zone,
                             reference_file_schema_uri: reference_file_schema_uri,
                             preserve_ascii_control_characters: preserve_ascii_control_characters,
                             reservation: reservation
  yield updater if block_given?
  load_local_or_uri files, updater
end
#location ⇒ String?
The geographic location where the dataset should reside. Possible values include EU and US. The default value is US.
# File 'lib/google/cloud/bigquery/dataset.rb', line 356
def location
  return nil if reference?
  @gapi.location
end
#model(model_id, skip_lookup: nil) ⇒ Google::Cloud::Bigquery::Model?
Retrieves an existing model by ID.
# File 'lib/google/cloud/bigquery/dataset.rb', line 1066
def model model_id, skip_lookup: nil
  ensure_service!
  return Model.new_reference project_id, dataset_id, model_id, service if skip_lookup
  gapi = service.get_model dataset_id, model_id
  Model.from_gapi_json gapi, service
rescue Google::Cloud::NotFoundError
  nil
end
#models(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Model>
Retrieves the list of models belonging to the dataset.
# File 'lib/google/cloud/bigquery/dataset.rb', line 1109
def models token: nil, max: nil
  ensure_service!
  gapi = service.list_models dataset_id, token: token, max: max
  Model::List.from_gapi gapi, service, dataset_id, max
end
#modified_at ⇒ Time?
The date when this dataset or any of its tables was last modified.
# File 'lib/google/cloud/bigquery/dataset.rb', line 341
def modified_at
  return nil if reference?
  ensure_full_data!
  Convert.millis_to_time @gapi.last_modified_time
end
#name ⇒ String?
A descriptive name for the dataset.
# File 'lib/google/cloud/bigquery/dataset.rb', line 168

def name
  return nil if reference?
  @gapi.friendly_name
end
#name=(new_name) ⇒ Object
Updates the descriptive name for the dataset.
If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
# File 'lib/google/cloud/bigquery/dataset.rb', line 185

def name= new_name
  reload! unless resource_full?
  @gapi.update! friendly_name: new_name
  patch_gapi! :friendly_name
end
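A minimal sketch of updating the descriptive name; the dataset name is a hypothetical placeholder:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset  = bigquery.dataset "my_dataset" # hypothetical dataset

# Issues a patch request. If the dataset is only a partial resource,
# the full representation is reloaded first so the update can use
# ETag-based optimistic concurrency control.
dataset.name = "My Dataset"
```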
#project_id ⇒ String
The ID of the project containing this dataset.
# File 'lib/google/cloud/bigquery/dataset.rb', line 145

def project_id
  return reference.project_id if reference?
  @gapi.dataset_reference.project_id
end
#query(query, params: nil, types: nil, external: nil, max: nil, cache: true, standard_sql: nil, legacy_sql: nil, session_id: nil, format_options_use_int64_timestamp: true, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::Data
Queries data and waits for the results. In this method, a QueryJob is created and its results are saved to a temporary table, then read from the table. Timeouts and transient errors are generally handled as needed to complete the query. When used for executing DDL/DML statements, this method does not return row data.
Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.
The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.
# File 'lib/google/cloud/bigquery/dataset.rb', line 1960

def query query, params: nil, types: nil, external: nil, max: nil, cache: true,
          standard_sql: nil, legacy_sql: nil, session_id: nil,
          format_options_use_int64_timestamp: true, reservation: nil, &block
  job = query_job query, params: params, types: types, external: external,
                  cache: cache, standard_sql: standard_sql, legacy_sql: legacy_sql,
                  session_id: session_id, reservation: reservation, &block
  job.wait_until_done!
  ensure_job_succeeded! job

  job.data max: max, format_options_use_int64_timestamp: format_options_use_int64_timestamp
end
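A minimal sketch of #query with a positional parameter; the dataset and table names are hypothetical. Because the dataset is set as the default dataset, the table name needs no dataset qualifier:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset  = bigquery.dataset "my_dataset" # hypothetical dataset

# Runs the query, waits for completion, and returns the row data.
data = dataset.query "SELECT name FROM my_table WHERE id = @id",
                     params: { id: 1 }

data.each do |row|
  puts row[:name]
end
```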
#query_job(query, params: nil, types: nil, external: nil, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, dryrun: nil, standard_sql: nil, legacy_sql: nil, large_results: nil, flatten: nil, maximum_billing_tier: nil, maximum_bytes_billed: nil, job_id: nil, prefix: nil, labels: nil, udfs: nil, create_session: nil, session_id: nil, reservation: nil) {|job| ... } ⇒ Google::Cloud::Bigquery::QueryJob
Queries data by creating a query job.
Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.
The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.
# File 'lib/google/cloud/bigquery/dataset.rb', line 1634

def query_job query, params: nil, types: nil, external: nil, priority: "INTERACTIVE",
              cache: true, table: nil, create: nil, write: nil, dryrun: nil,
              standard_sql: nil, legacy_sql: nil, large_results: nil, flatten: nil,
              maximum_billing_tier: nil, maximum_bytes_billed: nil, job_id: nil,
              prefix: nil, labels: nil, udfs: nil, create_session: nil,
              session_id: nil, reservation: nil
  ensure_service!
  options = {
    params: params, types: types, external: external, priority: priority,
    cache: cache, table: table, create: create, write: write, dryrun: dryrun,
    standard_sql: standard_sql, legacy_sql: legacy_sql,
    large_results: large_results, flatten: flatten,
    maximum_billing_tier: maximum_billing_tier,
    maximum_bytes_billed: maximum_bytes_billed,
    job_id: job_id, prefix: prefix, labels: labels, udfs: udfs,
    create_session: create_session, session_id: session_id,
    reservation: reservation
  }

  updater = QueryJob::Updater.from_options service, query, options
  updater.dataset = self
  updater.location = location if location # may be dataset reference

  yield updater if block_given?

  gapi = service.query_job updater.to_gapi
  Job.from_gapi gapi, service
end
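A minimal sketch of #query_job, configuring the job via the block form; the dataset and table names are hypothetical:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset  = bigquery.dataset "my_dataset" # hypothetical dataset

# Create an asynchronous query job, configured via the yielded updater.
job = dataset.query_job "SELECT name FROM my_table" do |query|
  query.priority = "BATCH" # run at batch priority
end

job.wait_until_done!
data = job.data unless job.failed?
```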
#reference? ⇒ Boolean
Whether the dataset was created without retrieving the resource representation from the BigQuery service.
# File 'lib/google/cloud/bigquery/dataset.rb', line 2718

def reference?
  @gapi.nil?
end
#reload! ⇒ Google::Cloud::Bigquery::Dataset Also known as: refresh!
Reloads the dataset with current data from the BigQuery service.
# File 'lib/google/cloud/bigquery/dataset.rb', line 2661

def reload!
  ensure_service!
  @gapi = service.get_project_dataset project_id, dataset_id,
                                      access_policy_version: @access_policy_version,
                                      dataset_view: @dataset_view
  @reference = nil
  @exists = nil
  self
end
#resource? ⇒ Boolean
Whether the dataset was created with a resource representation from the BigQuery service.
# File 'lib/google/cloud/bigquery/dataset.rb', line 2740

def resource?
  !@gapi.nil?
end
#resource_full? ⇒ Boolean
Whether the dataset was created with a full resource representation from the BigQuery service.
# File 'lib/google/cloud/bigquery/dataset.rb', line 2787

def resource_full?
  @gapi.is_a? Google::Apis::BigqueryV2::Dataset
end
#resource_partial? ⇒ Boolean
Whether the dataset was created with a partial resource representation from the BigQuery service by retrieval through Project#datasets. See Datasets: list response for the contents of the partial representation. Accessing any attribute outside of the partial representation will result in loading the full representation.
# File 'lib/google/cloud/bigquery/dataset.rb', line 2767

def resource_partial?
  @gapi.is_a? Google::Apis::BigqueryV2::DatasetList::Dataset
end
#routine(routine_id, skip_lookup: nil) ⇒ Google::Cloud::Bigquery::Routine?
Retrieves an existing routine by ID.
# File 'lib/google/cloud/bigquery/dataset.rb', line 1232

def routine routine_id, skip_lookup: nil
  ensure_service!
  return Routine.new_reference project_id, dataset_id, routine_id, service if skip_lookup
  gapi = service.get_routine dataset_id, routine_id
  Routine.from_gapi gapi, service
rescue Google::Cloud::NotFoundError
  nil
end
#routines(token: nil, max: nil, filter: nil) ⇒ Array<Google::Cloud::Bigquery::Routine>
Retrieves the list of routines belonging to the dataset.
# File 'lib/google/cloud/bigquery/dataset.rb', line 1277

def routines token: nil, max: nil, filter: nil
  ensure_service!
  gapi = service.list_routines dataset_id, token: token, max: max, filter: filter
  Routine::List.from_gapi gapi, service, dataset_id, max, filter: filter
end
#storage_billing_model ⇒ String?
Gets the Storage Billing Model for the dataset.
# File 'lib/google/cloud/bigquery/dataset.rb', line 522

def storage_billing_model
  return nil if reference?
  ensure_full_data!
  @gapi.storage_billing_model
end
#storage_billing_model=(value) ⇒ Object
Sets the Storage Billing Model for the dataset.
# File 'lib/google/cloud/bigquery/dataset.rb', line 545

def storage_billing_model= value
  ensure_full_data!
  @gapi.storage_billing_model = value
  patch_gapi! :storage_billing_model
end
#table(table_id, skip_lookup: nil, view: nil) ⇒ Google::Cloud::Bigquery::Table?
Retrieves an existing table by ID.
# File 'lib/google/cloud/bigquery/dataset.rb', line 985

def table table_id, skip_lookup: nil, view: nil
  ensure_service!
  return Table.new_reference project_id, dataset_id, table_id, service if skip_lookup
  gapi = service.get_project_table project_id, dataset_id, table_id, metadata_view: view
  Table.from_gapi gapi, service, metadata_view: view
rescue Google::Cloud::NotFoundError
  nil
end
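A short sketch of the two lookup modes of #table; the dataset and table names are hypothetical:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset  = bigquery.dataset "my_dataset" # hypothetical dataset

# Full lookup: one API call; returns nil if the table does not exist.
table = dataset.table "my_table"

# Reference only: no API call is made until an attribute is accessed.
table_ref = dataset.table "my_table", skip_lookup: true
```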
#tables(token: nil, max: nil) ⇒ Array<Google::Cloud::Bigquery::Table>
Retrieves the list of tables belonging to the dataset.
# File 'lib/google/cloud/bigquery/dataset.rb', line 1028

def tables token: nil, max: nil
  ensure_service!
  gapi = service.list_tables dataset_id, token: token, max: max
  Table::List.from_gapi gapi, service, dataset_id, max
end
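A minimal sketch of iterating all tables in a dataset; the dataset name is hypothetical. The returned list handles pagination, and its `all` enumerator fetches additional pages as needed:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset  = bigquery.dataset "my_dataset" # hypothetical dataset

# `all` retrieves every page of results transparently.
dataset.tables.all do |table|
  puts table.table_id
end
```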
#tags ⇒ Google::Cloud::Bigquery::Dataset::Tag
Retrieves the tags associated with this dataset. Tag keys are globally unique and are managed via the Resource Manager API. See the Resource Manager documentation for more information.
# File 'lib/google/cloud/bigquery/dataset.rb', line 614

def tags
  ensure_full_data!
  return nil if @gapi.tags.nil?
  @gapi.tags.map { |gapi| Tag.from_gapi gapi }
end