Class: Google::Cloud::Bigquery::External::DataSource
- Inherits: Object
  - Object
  - Google::Cloud::Bigquery::External::DataSource
- Defined in: lib/google/cloud/bigquery/external/data_source.rb
Overview
DataSource
External::DataSource and its subclasses represent an external data source that can be queried directly, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.
The AVRO and Datastore Backup formats use DataSource. See CsvSource, JsonSource, SheetsSource, BigtableSource for the other formats.
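As a rough illustration of how such an object is typically obtained, the sketch below wraps the call in a small helper. The helper name `avro_external_source` is hypothetical; `bigquery` is assumed to be a `Google::Cloud::Bigquery::Project` from the google-cloud-bigquery gem, whose `external` method is assumed to yield the new data source for configuration and then return it.

```ruby
# Hypothetical helper: builds an external data source for Avro files.
# Assumption: `bigquery` responds to `external(url)` and yields the
# data source to the block, as Google::Cloud::Bigquery::Project does.
def avro_external_source bigquery, avro_url
  bigquery.external avro_url do |avro|
    # Let BigQuery detect the schema and format options.
    avro.autodetect = true
  end
end
```

With the gem installed and credentials configured, the returned object could then be passed to a query through the `external:` option, for example `bigquery.query sql, external: { my_ext_table: source }`, where `my_ext_table` is a placeholder table alias.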
Direct Known Subclasses
AvroSource, BigtableSource, CsvSource, JsonSource, ParquetSource, SheetsSource
Instance Method Summary
-
#autodetect ⇒ Boolean
Indicates if the schema and format options are detected automatically.
-
#autodetect=(new_autodetect) ⇒ Object
Set whether to detect schema and format options automatically.
-
#avro? ⇒ Boolean
Whether the data format is "AVRO".
-
#backup? ⇒ Boolean
Whether the data format is "DATASTORE_BACKUP".
-
#bigtable? ⇒ Boolean
Whether the data format is "BIGTABLE".
-
#compression ⇒ String
The compression type of the data source.
-
#compression=(new_compression) ⇒ Object
Set the compression type of the data source.
-
#csv? ⇒ Boolean
Whether the data format is "CSV".
-
#date_format ⇒ String?
Format used to parse DATE values.
-
#date_format=(date_format) ⇒ Object
Sets the format used to parse DATE values.
-
#datetime_format ⇒ String?
Format used to parse DATETIME values.
-
#datetime_format=(datetime_format) ⇒ Object
Sets the format used to parse DATETIME values.
-
#format ⇒ String
The data format.
-
#hive_partitioning? ⇒ Boolean
Checks if hive partitioning options are set.
-
#hive_partitioning_mode ⇒ String?
The mode of hive partitioning to use when reading data.
-
#hive_partitioning_mode=(mode) ⇒ Object
Sets the mode of hive partitioning to use when reading data.
-
#hive_partitioning_require_partition_filter=(require_partition_filter) ⇒ Object
Sets whether queries over the table using this external data source require a partition filter that can be used for partition elimination to be specified.
-
#hive_partitioning_require_partition_filter? ⇒ Boolean
Whether queries over the table using this external data source require a partition filter that can be used for partition elimination to be specified.
-
#hive_partitioning_source_uri_prefix ⇒ String?
The common prefix for all source URIs when hive partition detection is requested.
-
#hive_partitioning_source_uri_prefix=(source_uri_prefix) ⇒ Object
Sets the common prefix for all source URIs when hive partition detection is requested.
-
#ignore_unknown ⇒ Boolean
Indicates if BigQuery should allow extra values that are not represented in the table schema.
-
#ignore_unknown=(new_ignore_unknown) ⇒ Object
Set whether BigQuery should allow extra values that are not represented in the table schema.
-
#json? ⇒ Boolean
Whether the data format is "NEWLINE_DELIMITED_JSON".
-
#max_bad_records ⇒ Integer
The maximum number of bad records that BigQuery can ignore when reading data.
-
#max_bad_records=(new_max_bad_records) ⇒ Object
Set the maximum number of bad records that BigQuery can ignore when reading data.
-
#orc? ⇒ Boolean
Whether the data format is "ORC".
-
#parquet? ⇒ Boolean
Whether the data format is "PARQUET".
-
#reference_file_schema_uri ⇒ String?
The URI of a reference file with the table schema.
-
#reference_file_schema_uri=(uri) ⇒ Object
Sets the URI of a reference file with the table schema.
-
#sheets? ⇒ Boolean
Whether the data format is "GOOGLE_SHEETS".
-
#time_format ⇒ String?
Format used to parse TIME values.
-
#time_format=(time_format) ⇒ Object
Sets the format used to parse TIME values.
-
#time_zone ⇒ String?
Time zone used when parsing timestamp values that do not have specific time zone information (e.g. 2024-04-20 12:34:56).
-
#time_zone=(time_zone) ⇒ Object
Sets the time zone used when parsing timestamp values that do not have specific time zone information (e.g. 2024-04-20 12:34:56).
-
#timestamp_format ⇒ String?
Format used to parse TIMESTAMP values.
-
#timestamp_format=(timestamp_format) ⇒ Object
Sets the format used to parse TIMESTAMP values.
-
#urls ⇒ Array<String>
The fully-qualified URIs that point to your data in Google Cloud.
Instance Method Details
#autodetect ⇒ Boolean
Indicates if the schema and format options are detected automatically.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 318
def autodetect
  @gapi.autodetect
end
#autodetect=(new_autodetect) ⇒ Object
Set whether to detect schema and format options automatically. Any option specified explicitly will be honored.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 340
def autodetect= new_autodetect
  frozen_check!
  @gapi.autodetect = new_autodetect
end
#avro? ⇒ Boolean
Whether the data format is "AVRO".
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 183
def avro?
  @gapi.source_format == "AVRO"
end
#backup? ⇒ Boolean
Whether the data format is "DATASTORE_BACKUP".
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 203
def backup?
  @gapi.source_format == "DATASTORE_BACKUP"
end
#bigtable? ⇒ Boolean
Whether the data format is "BIGTABLE".
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 223
def bigtable?
  @gapi.source_format == "BIGTABLE"
end
#compression ⇒ String
The compression type of the data source. Possible values include
"GZIP" and nil. The default value is nil. This setting is
ignored for Google Cloud Bigtable, Google Cloud Datastore backups
and Avro formats. Optional.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 364
def compression
  @gapi.compression
end
#compression=(new_compression) ⇒ Object
Set the compression type of the data source. Possible values include
"GZIP" and nil. The default value is nil. This setting is
ignored for Google Cloud Bigtable, Google Cloud Datastore backups
and Avro formats. Optional.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 388
def compression= new_compression
  frozen_check!
  @gapi.compression = new_compression
end
#csv? ⇒ Boolean
Whether the data format is "CSV".
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 123
def csv?
  @gapi.source_format == "CSV"
end
#date_format ⇒ String?
Format used to parse DATE values. Supports SQL-style values. See the date and time formatting guide.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 984
def date_format
  @gapi.date_format
end
#date_format=(date_format) ⇒ Object
Sets the format used to parse DATE values. Supports SQL-style values. See the date and time formatting guide.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 1006
def date_format= date_format
  frozen_check!
  @gapi.date_format = date_format
end
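To sanity-check a SQL-style format string against sample values before assigning it, a local translation to a strptime pattern can help. The sketch below is an assumption-laden illustration, not part of the library: the mapping table and both method names are hypothetical, and only the elements YYYY, MM, DD, HH24, MI and SS are covered.

```ruby
require "date"

# Hypothetical mapping from a few SQL-style format elements to strptime
# directives. This is an illustrative subset, not BigQuery's full grammar.
SQL_TO_STRPTIME = {
  "YYYY" => "%Y", "MM" => "%m", "DD" => "%d",
  "HH24" => "%H", "MI" => "%M", "SS" => "%S"
}.freeze

# Translates a SQL-style format string such as "MM/DD/YYYY" into a
# strptime pattern such as "%m/%d/%Y".
def sql_format_to_strptime sql_format
  # Match longer elements first so "YYYY" or "HH24" is never split.
  pattern = Regexp.union SQL_TO_STRPTIME.keys.sort_by { |key| -key.length }
  sql_format.gsub(pattern) { |element| SQL_TO_STRPTIME.fetch element }
end

# Returns true when the sample value parses under the translated pattern.
def parses_as_date? value, sql_format
  Date.strptime value, sql_format_to_strptime(sql_format)
  true
rescue ArgumentError
  false
end
```

A value such as "04/20/2024" would be accepted for "MM/DD/YYYY", while "2024-04-20" would be rejected. BigQuery's own parser remains the authority on what a given format accepts.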
#datetime_format ⇒ String?
Format used to parse DATETIME values. Supports SQL-style values. See the date and time formatting guide.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 940
def datetime_format
  @gapi.datetime_format
end
#datetime_format=(datetime_format) ⇒ Object
Sets the format used to parse DATETIME values. Supports SQL-style values. See the date and time formatting guide.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 962
def datetime_format= datetime_format
  frozen_check!
  @gapi.datetime_format = datetime_format
end
#format ⇒ String
The data format. For CSV files, specify "CSV". For Google Sheets, specify "GOOGLE_SHEETS". For newline-delimited JSON, specify "NEWLINE_DELIMITED_JSON". For Avro files, specify "AVRO". For Google Cloud Datastore backups, specify "DATASTORE_BACKUP". [Beta] For Google Cloud Bigtable, specify "BIGTABLE".
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 103
def format
  @gapi.source_format
end
#hive_partitioning? ⇒ Boolean
Checks if hive partitioning options are set.
Not all storage formats support hive partitioning. Requesting hive partitioning on an unsupported format
will lead to an error. Currently supported types include: avro, csv, json, orc and parquet.
If your data is stored in ORC or Parquet on Cloud Storage, see Querying columnar formats on Cloud
Storage.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 578
def hive_partitioning?
  !@gapi.hive_partitioning_options.nil?
end
#hive_partitioning_mode ⇒ String?
The mode of hive partitioning to use when reading data. The following modes are supported:
- AUTO: automatically infer partition key name(s) and type(s).
- STRINGS: automatically infer partition key name(s). All types are interpreted as strings.
- CUSTOM: partition key schema is encoded in the source URI prefix.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 609
def hive_partitioning_mode
  @gapi.hive_partitioning_options.mode if hive_partitioning?
end
#hive_partitioning_mode=(mode) ⇒ Object
Sets the mode of hive partitioning to use when reading data. The following modes are supported:
- auto: automatically infer partition key name(s) and type(s).
- strings: automatically infer partition key name(s). All types are interpreted as strings.
- custom: partition key schema is encoded in the source URI prefix.
Not all storage formats support hive partitioning. Requesting hive partitioning on an unsupported format
will lead to an error. Currently supported types include: avro, csv, json, orc and parquet.
If your data is stored in ORC or Parquet on Cloud Storage, see Querying columnar formats on Cloud
Storage.
See #format, #hive_partitioning_require_partition_filter= and #hive_partitioning_source_uri_prefix=.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 647
def hive_partitioning_mode= mode
  @gapi.hive_partitioning_options ||= Google::Apis::BigqueryV2::HivePartitioningOptions.new
  @gapi.hive_partitioning_options.mode = mode.to_s.upcase
end
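The setter normalizes its argument with `mode.to_s.upcase`, so symbols and lowercase strings both end up as the uppercase API value. The sketch below reproduces that normalization in isolation and adds a membership check against the three documented modes. The check, the constant and the method name are assumptions for illustration; the listed setter does not validate its input.

```ruby
# The three modes named in the documentation above.
HIVE_PARTITIONING_MODES = %w[AUTO STRINGS CUSTOM].freeze

# Hypothetical helper: normalizes a mode the way the setter does
# (to_s.upcase) and rejects anything outside the documented modes.
def normalize_hive_partitioning_mode mode
  normalized = mode.to_s.upcase
  unless HIVE_PARTITIONING_MODES.include? normalized
    raise ArgumentError, "unsupported hive partitioning mode: #{mode.inspect}"
  end
  normalized
end
```

Validating locally like this would surface a typo such as `:atuo` before the request reaches BigQuery.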
#hive_partitioning_require_partition_filter=(require_partition_filter) ⇒ Object
Sets whether queries over the table using this external data source require a partition filter that can be used for partition elimination to be specified.
See #format, #hive_partitioning_mode= and #hive_partitioning_source_uri_prefix=.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 708
def hive_partitioning_require_partition_filter= require_partition_filter
  @gapi.hive_partitioning_options ||= Google::Apis::BigqueryV2::HivePartitioningOptions.new
  @gapi.hive_partitioning_options.require_partition_filter = require_partition_filter
end
#hive_partitioning_require_partition_filter? ⇒ Boolean
Whether queries over the table using this external data source require a partition filter that can be used for partition elimination to be specified. Note that this field should only be true when creating a permanent external table or querying a temporary external table.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 677
def hive_partitioning_require_partition_filter?
  return false unless hive_partitioning?
  !@gapi.hive_partitioning_options.require_partition_filter.nil?
end
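Read literally, the listed predicate tests whether a value has been set (`nil?`) rather than whether it is truthy. The sketch below mirrors that logic on a stand-in struct to show the consequence. `HiveOptions` and `require_partition_filter_set?` are hypothetical names, and the released gem may behave differently from this excerpt.

```ruby
# Stand-in for the hive partitioning options object (hypothetical).
HiveOptions = Struct.new :require_partition_filter

# Mirrors the listed predicate: false without options, otherwise true
# whenever the field holds any non-nil value, including false.
def require_partition_filter_set? hive_options
  return false if hive_options.nil?
  !hive_options.require_partition_filter.nil?
end
```

Under this reading, an options object whose field was explicitly set to `false` still makes the predicate return `true`, so callers who need the actual flag may want to compare against `true` themselves.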
#hive_partitioning_source_uri_prefix ⇒ String?
The common prefix for all source URIs when hive partition detection is requested. The prefix must end immediately before the partition key encoding begins. For example, consider files following this data layout:
gs://bucket/path_to_table/dt=2019-01-01/country=BR/id=7/file.avro
gs://bucket/path_to_table/dt=2018-12-31/country=CA/id=3/file.avro
When hive partitioning is requested with either AUTO or STRINGS mode, the common prefix can be either of
gs://bucket/path_to_table or gs://bucket/path_to_table/ (trailing slash does not matter).
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 746
def hive_partitioning_source_uri_prefix
  @gapi.hive_partitioning_options.source_uri_prefix if hive_partitioning?
end
#hive_partitioning_source_uri_prefix=(source_uri_prefix) ⇒ Object
Sets the common prefix for all source URIs when hive partition detection is requested. The prefix must end immediately before the partition key encoding begins. For example, consider files following this data layout:
gs://bucket/path_to_table/dt=2019-01-01/country=BR/id=7/file.avro
gs://bucket/path_to_table/dt=2018-12-31/country=CA/id=3/file.avro
When hive partitioning is requested with either AUTO or STRINGS mode, the common prefix can be either of
gs://bucket/path_to_table or gs://bucket/path_to_table/ (trailing slash does not matter).
See #format, #hive_partitioning_mode= and #hive_partitioning_require_partition_filter=.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 785
def hive_partitioning_source_uri_prefix= source_uri_prefix
  @gapi.hive_partitioning_options ||= Google::Apis::BigqueryV2::HivePartitioningOptions.new
  @gapi.hive_partitioning_options.source_uri_prefix = source_uri_prefix
end
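The rule that the prefix ends immediately before the partition key encoding can be checked locally. The following sketch derives the prefix from file URIs by cutting at the first `key=value` path segment. Both method names are hypothetical, and the approach assumes partition keys are the first path segments that contain an equals sign.

```ruby
# Returns everything up to (and including) the slash that precedes the
# first "key=value" path segment, or nil when no such segment exists.
def hive_prefix_for uri
  uri[%r{\A(.*?/)(?=[^/=]+=)}, 1]
end

# Returns the single prefix shared by all URIs, raising when the URIs
# disagree or contain no partition key encoding.
def common_hive_prefix uris
  prefixes = uris.map { |uri| hive_prefix_for uri }.uniq
  unless prefixes.length == 1 && prefixes.first
    raise ArgumentError, "URIs do not share one hive partitioning prefix"
  end
  prefixes.first
end
```

For the two example files above, this yields `gs://bucket/path_to_table/`, which is one of the two accepted forms since the trailing slash does not matter.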
#ignore_unknown ⇒ Boolean
Indicates if BigQuery should allow extra values that are not
represented in the table schema. If true, the extra values are
ignored. If false, records with extra columns are treated as bad
records, and if there are too many bad records, an invalid error is
returned in the job result. The default value is false.
BigQuery treats the following as extra values: trailing columns in
CSV, and named values that don't match any column names in JSON.
This setting is ignored for Google Cloud Bigtable, Google Cloud
Datastore backups and Avro formats. Optional.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 419
def ignore_unknown
  @gapi.ignore_unknown_values
end
#ignore_unknown=(new_ignore_unknown) ⇒ Object
Set whether BigQuery should allow extra values that are not
represented in the table schema. If true, the extra values are
ignored. If false, records with extra columns are treated as bad
records, and if there are too many bad records, an invalid error is
returned in the job result. The default value is false.
BigQuery treats the following as extra values: trailing columns in
CSV, and named values that don't match any column names in JSON.
This setting is ignored for Google Cloud Bigtable, Google Cloud
Datastore backups and Avro formats. Optional.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 449
def ignore_unknown= new_ignore_unknown
  frozen_check!
  @gapi.ignore_unknown_values = new_ignore_unknown
end
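What counts as an "extra value" differs by format, which a small local model may make clearer. The sketch below is only an approximation of the description above: the method names are hypothetical, and the CSV check uses a naive comma split that ignores quoting.

```ruby
require "json"

# CSV: a record has extra values when it carries more fields than the
# schema has columns (trailing columns). Naive split, no quoting support.
def extra_csv_values? line, schema_columns
  line.split(",", -1).length > schema_columns.length
end

# JSON: a record has extra values when it contains keys that match no
# column name in the schema.
def extra_json_values? line, schema_columns
  (JSON.parse(line).keys - schema_columns).any?
end
```

With `ignore_unknown` set to true, records flagged by checks like these would have the extra values dropped. With false, they would count as bad records.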
#json? ⇒ Boolean
Whether the data format is "NEWLINE_DELIMITED_JSON".
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 143
def json?
  @gapi.source_format == "NEWLINE_DELIMITED_JSON"
end
#max_bad_records ⇒ Integer
The maximum number of bad records that BigQuery can ignore when reading data. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid. This setting is ignored for Google Cloud Bigtable, Google Cloud Datastore backups and Avro formats.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 476
def max_bad_records
  @gapi.max_bad_records
end
#max_bad_records=(new_max_bad_records) ⇒ Object
Set the maximum number of bad records that BigQuery can ignore when reading data. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid. This setting is ignored for Google Cloud Bigtable, Google Cloud Datastore backups and Avro formats.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 502
def max_bad_records= new_max_bad_records
  frozen_check!
  @gapi.max_bad_records = new_max_bad_records
end
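The threshold semantics ("exceeds this value") can be modeled in a few lines. This sketch is a local approximation with hypothetical names: the block decides whether a record is valid, and the result is `:invalid` only when the number of bad records is strictly greater than `max_bad_records`.

```ruby
# Counts records the block rejects and compares the count against the
# limit. The default of 0 means a single bad record fails the scan.
def scan_outcome records, max_bad_records: 0
  bad_count = records.count { |record| !yield(record) }
  bad_count > max_bad_records ? :invalid : :ok
end
```

For instance, one malformed record among three fails under the default limit but passes with `max_bad_records: 1`.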
#orc? ⇒ Boolean
Whether the data format is "ORC".
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 246
def orc?
  @gapi.source_format == "ORC"
end
#parquet? ⇒ Boolean
Whether the data format is "PARQUET".
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 269
def parquet?
  @gapi.source_format == "PARQUET"
end
#reference_file_schema_uri ⇒ String?
The URI of a reference file with the table schema. This is enabled
for the following formats: AVRO, PARQUET, ORC.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 524
def reference_file_schema_uri
  @gapi.reference_file_schema_uri
end
#reference_file_schema_uri=(uri) ⇒ Object
Sets the URI of a reference file with the table schema. This is
enabled for the following formats: AVRO, PARQUET, ORC.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 545
def reference_file_schema_uri= uri
  frozen_check!
  @gapi.reference_file_schema_uri = uri
end
#sheets? ⇒ Boolean
Whether the data format is "GOOGLE_SHEETS".
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 163
def sheets?
  @gapi.source_format == "GOOGLE_SHEETS"
end
#time_format ⇒ String?
Format used to parse TIME values. Supports SQL-style values. See the date and time formatting guide.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 852
def time_format
  @gapi.time_format
end
#time_format=(time_format) ⇒ Object
Sets the format used to parse TIME values. Supports SQL-style values. See the date and time formatting guide.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 874
def time_format= time_format
  frozen_check!
  @gapi.time_format = time_format
end
#time_zone ⇒ String?
Time zone used when parsing timestamp values that do not have specific
time zone information (e.g. 2024-04-20 12:34:56). The expected format
is an IANA timezone string (e.g. America/Los_Angeles).
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 808
def time_zone
  @gapi.time_zone
end
#time_zone=(time_zone) ⇒ Object
Sets the time zone used when parsing timestamp values that do not have
specific time zone information (e.g. 2024-04-20 12:34:56). The expected
format is an IANA timezone string (e.g. America/Los_Angeles).
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 830
def time_zone= time_zone
  frozen_check!
  @gapi.time_zone = time_zone
end
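Why the time zone matters for zoneless values can be seen by interpreting the same wall-clock string under two offsets. Ruby's standard library has no IANA database, so this sketch substitutes fixed UTC offsets for zone names. The method name is hypothetical, and `-07:00` is used as the offset of America/Los_Angeles on the example date.

```ruby
# Interprets a zoneless "YYYY-MM-DD HH:MM:SS" string at the given fixed
# UTC offset and returns the corresponding UTC instant as a string.
def utc_instant_for wall_clock, utc_offset
  parts = wall_clock.scan(/\d+/).map(&:to_i)
  Time.new(*parts, utc_offset).utc.strftime "%Y-%m-%d %H:%M:%S"
end
```

The value 2024-04-20 12:34:56 corresponds to 19:34:56 UTC when read at `-07:00`, but to 12:34:56 UTC when read at `+00:00`, so the configured zone changes the stored timestamp by seven hours.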
#timestamp_format ⇒ String?
Format used to parse TIMESTAMP values. Supports SQL-style values. See the date and time formatting guide.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 896
def timestamp_format
  @gapi.timestamp_format
end
#timestamp_format=(timestamp_format) ⇒ Object
Sets the format used to parse TIMESTAMP values. Supports SQL-style values. See the date and time formatting guide.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 918
def timestamp_format= timestamp_format
  frozen_check!
  @gapi.timestamp_format = timestamp_format
end
#urls ⇒ Array<String>
The fully-qualified URIs that point to your data in Google Cloud. For Google Cloud Storage URIs: Each URI can contain one '*' wildcard character and it must come after the 'bucket' name. Size limits related to load jobs apply to external data sources. For Google Cloud Bigtable URIs: Exactly one URI can be specified and it has to be a fully specified and valid HTTPS URL for a Google Cloud Bigtable table. For Google Cloud Datastore backups, exactly one URI can be specified, and it must end with '.backup_info'. Also, the '*' wildcard character is not allowed.
# File 'lib/google/cloud/bigquery/external/data_source.rb', line 296
def urls
  @gapi.source_uris
end
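The URI rules for Cloud Storage and Datastore backups lend themselves to a quick pre-flight check. The sketch below encodes them as two predicates. Both method names are hypothetical, the checks are simplified, and BigQuery performs the authoritative validation.

```ruby
# Cloud Storage: at most one '*' wildcard, and only after the bucket name.
def valid_gcs_uri? uri
  match = uri.match %r{\Ags://([^/]+)/(.+)\z}
  return false unless match
  bucket, object = match.captures
  !bucket.include?("*") && object.count("*") <= 1
end

# Datastore backups: exactly one URI, ending in '.backup_info', with no
# wildcard character.
def valid_backup_uris? uris
  uris.length == 1 &&
    uris.first.end_with?(".backup_info") &&
    !uris.first.include?("*")
end
```

Running such checks before building the data source would catch, for example, a wildcard placed in the bucket name or a second backup URI.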