Method: Aws::S3::Client#select_object_content

Defined in:
lib/aws-sdk-s3/client.rb

#select_object_content(params = {}) ⇒ Types::SelectObjectContentOutput

This operation filters the contents of an Amazon S3 object based on a simple structured query language (SQL) statement. In the request, along with the SQL expression, you must also specify a data serialization format (JSON, CSV, or Apache Parquet) of the object. Amazon S3 uses this format to parse object data into records, and returns only records that match the specified SQL expression. You must also specify the data serialization format for the response.

For more information about Amazon S3 Select, see [Selecting Content from Objects] in the *Amazon Simple Storage Service Developer Guide*.

For more information about using SQL with Amazon S3 Select, see [SQL Reference for Amazon S3 Select and Glacier Select] in the *Amazon Simple Storage Service Developer Guide*.

**Permissions**

You must have s3:GetObject permission for this operation. Amazon S3 Select does not support anonymous access. For more information about permissions, see [Specifying Permissions in a Policy] in the *Amazon Simple Storage Service Developer Guide*.

**Object Data Formats**

You can use Amazon S3 Select to query objects that have the following format properties:

  • *CSV, JSON, and Parquet* - Objects must be in CSV, JSON, or Parquet format.

  • *UTF-8* - UTF-8 is the only encoding type Amazon S3 Select supports.

  • *GZIP or BZIP2* - CSV and JSON files can be compressed using GZIP or BZIP2. GZIP and BZIP2 are the only compression formats that Amazon S3 Select supports for CSV and JSON files. Amazon S3 Select supports columnar compression for Parquet using GZIP or Snappy. Amazon S3 Select does not support whole-object compression for Parquet objects.

  • *Server-side encryption* - Amazon S3 Select supports querying objects that are protected with server-side encryption.

    For objects that are encrypted with customer-provided encryption keys (SSE-C), you must use HTTPS, and you must use the headers that are documented for GetObject. For more information about SSE-C, see [Server-Side Encryption (Using Customer-Provided Encryption Keys)] in the *Amazon Simple Storage Service Developer Guide*.

    For objects that are encrypted with Amazon S3 managed encryption keys (SSE-S3) and customer master keys (CMKs) stored in AWS Key Management Service (SSE-KMS), server-side encryption is handled transparently, so you don’t need to specify anything. For more information about server-side encryption, including SSE-S3 and SSE-KMS, see [Protecting Data Using Server-Side Encryption] in the *Amazon Simple Storage Service Developer Guide*.
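The compression rules above correspond to the `compression_type` values of the input serialization (NONE, GZIP, BZIP2). As a minimal sketch, assuming object keys carry a conventional `.gz` or `.bz2` suffix (a naming convention, not something Amazon S3 enforces), a hypothetical helper could pick the value like this:

```ruby
# Hypothetical lookup table: file suffix => compression_type value.
COMPRESSION_BY_EXTENSION = { ".gz" => "GZIP", ".bz2" => "BZIP2" }.freeze

# Hypothetical helper (not part of the SDK): guesses the compression_type
# for a CSV or JSON object from the suffix of its key. Anything else,
# including Parquet objects, falls back to "NONE".
def compression_type_for(key)
  COMPRESSION_BY_EXTENSION.fetch(File.extname(key).downcase, "NONE")
end

gzip_type  = compression_type_for("logs/2020/requests.csv.gz")
bzip2_type = compression_type_for("exports/users.json.BZ2")
plain_type = compression_type_for("data/people.csv")
```

The result would be passed as `input_serialization[:compression_type]`. If keys do not follow a suffix convention, the value has to come from whatever metadata you keep about the object.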

**Working with the Response Body**

Because the response size is unknown, Amazon S3 Select streams the response as a series of messages and includes a Transfer-Encoding header with chunked as its value in the response. For more information, see RESTSelectObjectAppendix.

**GetObject Support**

The SelectObjectContent operation does not support the following GetObject functionality. For more information, see GetObject.

  • Range: Although you can specify a scan range for an Amazon S3 Select request (see SelectObjectContentRequest$ScanRange in the request parameters below), you cannot specify the range of bytes of an object to return.

  • GLACIER, DEEP_ARCHIVE and REDUCED_REDUNDANCY storage classes: You cannot specify the GLACIER, DEEP_ARCHIVE, or REDUCED_REDUNDANCY storage classes. For more information about storage classes, see [Storage Classes] in the *Amazon Simple Storage Service Developer Guide*.

**Special Errors**

For a list of special errors for this operation and for general information about Amazon S3 errors and a list of error codes, see ErrorResponses.

**Related Resources**

  • GetObject

  • GetBucketLifecycleConfiguration

  • PutBucketLifecycleConfiguration

[1]: docs.aws.amazon.com/AmazonS3/latest/dev/selecting-content-from-objects.html
[2]: docs.aws.amazon.com/AmazonS3/latest/dev/s3-glacier-select-sql-reference.html
[3]: docs.aws.amazon.com/AmazonS3/latest/dev/using-with-s3-actions.html
[4]: docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html
[5]: docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html
[6]: docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#storage-class-intro

Examples:

EventStream Operation Example


You can process each event as soon as it arrives, or wait until
the full response is complete and iterate through the event stream enumerator.

To handle events as they arrive, register callbacks with #select_object_content.
Callbacks can be registered for specific events or for all events, and a
callback for errors in the event stream can also be registered.

Callbacks can be passed in through the `:event_stream_handler` option or within a block
attached directly to the #select_object_content call. A hybrid pattern of both
is also supported.

The `:event_stream_handler` option takes either a Proc object or an
Aws::S3::EventStreams::SelectObjectContentEventStream object.

Usage pattern a): callbacks with a block attached to #select_object_content
  Example of registering callbacks for all event types and for the error event

  client.select_object_content(
    # params input
  ) do |stream|
    stream.on_error_event do |event|
      # catch unmodeled error event in the stream
      raise event
      # => Aws::Errors::EventError
      # event.event_type => :error
      # event.error_code => String
      # event.error_message => String
    end

    stream.on_event do |event|
      # process every event as it arrives
      puts event.event_type
      # ...
    end

  end

Usage pattern b): pass in `:event_stream_handler` for #select_object_content

  1) create an Aws::S3::EventStreams::SelectObjectContentEventStream object
  Example of registering callbacks for specific events

    handler = Aws::S3::EventStreams::SelectObjectContentEventStream.new
    handler.on_records_event do |event|
      event # => Aws::S3::Types::Records
    end
    handler.on_stats_event do |event|
      event # => Aws::S3::Types::Stats
    end
    handler.on_progress_event do |event|
      event # => Aws::S3::Types::Progress
    end
    handler.on_cont_event do |event|
      event # => Aws::S3::Types::Cont
    end
    handler.on_end_event do |event|
      event # => Aws::S3::Types::End
    end

  client.select_object_content(
    # params input,
    event_stream_handler: handler
  )

  2) use a Ruby Proc object
  Example of registering callbacks for specific events

  handler = Proc.new do |stream|
    stream.on_records_event do |event|
      event # => Aws::S3::Types::Records
    end
    stream.on_stats_event do |event|
      event # => Aws::S3::Types::Stats
    end
    stream.on_progress_event do |event|
      event # => Aws::S3::Types::Progress
    end
    stream.on_cont_event do |event|
      event # => Aws::S3::Types::Cont
    end
    stream.on_end_event do |event|
      event # => Aws::S3::Types::End
    end
  end

  client.select_object_content(
    # params input,
    event_stream_handler: handler
  )

Usage pattern c): hybrid pattern of a) and b)

    handler = Aws::S3::EventStreams::SelectObjectContentEventStream.new
    handler.on_records_event do |event|
      event # => Aws::S3::Types::Records
    end
    handler.on_stats_event do |event|
      event # => Aws::S3::Types::Stats
    end
    handler.on_progress_event do |event|
      event # => Aws::S3::Types::Progress
    end
    handler.on_cont_event do |event|
      event # => Aws::S3::Types::Cont
    end
    handler.on_end_event do |event|
      event # => Aws::S3::Types::End
    end

  client.select_object_content(
    # params input,
    event_stream_handler: handler
  ) do |stream|
    stream.on_error_event do |event|
      # catch unmodeled error event in the stream
      raise event
      # => Aws::Errors::EventError
      # event.event_type => :error
      # event.error_code => String
      # event.error_message => String
    end
  end

Besides the usage patterns above, which process events as soon as they arrive, you can also
iterate through the events after the response is complete.

Events are available at resp.payload # => Enumerator
For an example of the parameter input, refer to the following request syntax.
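Iterating after the response is complete can be sketched as follows. This is an illustration rather than SDK code: `collect_records` is a hypothetical helper, and `FakeEvent` is a stand-in for the real event objects so the sketch runs without AWS credentials. It relies only on `event.event_type` and on the IO-like `event.payload` of :records events.

```ruby
require "stringio"

# Hypothetical helper (not part of the SDK): concatenates the payloads of
# all :records events from an enumerable of events, such as resp.payload.
def collect_records(events)
  events.each_with_object(+"") do |event, buffer|
    next unless event.event_type == :records

    buffer << event.payload.read
  end
end

# Stand-in events; with a real response you would pass resp.payload instead.
FakeEvent = Struct.new(:event_type, :payload)
events = [
  FakeEvent.new(:records, StringIO.new("1,alice\n")),
  FakeEvent.new(:cont, nil),
  FakeEvent.new(:records, StringIO.new("2,bob\n")),
  FakeEvent.new(:end, nil)
]

csv = collect_records(events)
```

With a real response the call would be `collect_records(resp.payload)`. A single record may be split across two :records events, so it is safer to parse the concatenated output than each event on its own.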

Request syntax with placeholder values


resp = client.select_object_content({
  bucket: "BucketName", # required
  key: "ObjectKey", # required
  sse_customer_algorithm: "SSECustomerAlgorithm",
  sse_customer_key: "SSECustomerKey",
  sse_customer_key_md5: "SSECustomerKeyMD5",
  expression: "Expression", # required
  expression_type: "SQL", # required, accepts SQL
  request_progress: {
    enabled: false,
  },
  input_serialization: { # required
    csv: {
      file_header_info: "USE", # accepts USE, IGNORE, NONE
      comments: "Comments",
      quote_escape_character: "QuoteEscapeCharacter",
      record_delimiter: "RecordDelimiter",
      field_delimiter: "FieldDelimiter",
      quote_character: "QuoteCharacter",
      allow_quoted_record_delimiter: false,
    },
    compression_type: "NONE", # accepts NONE, GZIP, BZIP2
    json: {
      type: "DOCUMENT", # accepts DOCUMENT, LINES
    },
    parquet: {
    },
  },
  output_serialization: { # required
    csv: {
      quote_fields: "ALWAYS", # accepts ALWAYS, ASNEEDED
      quote_escape_character: "QuoteEscapeCharacter",
      record_delimiter: "RecordDelimiter",
      field_delimiter: "FieldDelimiter",
      quote_character: "QuoteCharacter",
    },
    json: {
      record_delimiter: "RecordDelimiter",
    },
  },
  scan_range: {
    start: 1,
    end: 1,
  },
})
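As a concrete counterpart to the placeholder syntax above, the sketch below builds only the parameters needed to query a CSV object with a header row and receive JSON lines. The helper name, bucket, key, and column names are hypothetical. The SQL uses the `S3Object` table alias that Amazon S3 Select expects.

```ruby
# Hypothetical helper (not part of the SDK): builds a params hash for a
# CSV-in, JSON-out query, leaving all optional settings at their defaults.
def csv_to_json_select_params(bucket:, key:, expression:)
  {
    bucket: bucket,
    key: key,
    expression: expression,
    expression_type: "SQL",
    input_serialization: {
      csv: { file_header_info: "USE" }, # first line holds the column names
      compression_type: "NONE"
    },
    output_serialization: {
      json: { record_delimiter: "\n" }  # one JSON document per line
    }
  }
end

params = csv_to_json_select_params(
  bucket: "example-bucket",             # hypothetical bucket name
  key: "data/people.csv",               # hypothetical object key
  expression: "SELECT s.name FROM S3Object s WHERE s.city = 'Seattle'"
)

# With aws-sdk-s3 loaded and credentials configured, the request would be:
#   resp = client.select_object_content(params)
```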

Response structure


All events are available at resp.payload:
resp.payload #=> Enumerator
resp.payload.event_types #=> [:records, :stats, :progress, :cont, :end]

For :records event available at #on_records_event callback and response eventstream enumerator:
event.payload #=> IO

For :stats event available at #on_stats_event callback and response eventstream enumerator:
event.details.bytes_scanned #=> Integer
event.details.bytes_processed #=> Integer
event.details.bytes_returned #=> Integer

For :progress event available at #on_progress_event callback and response eventstream enumerator:
event.details.bytes_scanned #=> Integer
event.details.bytes_processed #=> Integer
event.details.bytes_returned #=> Integer

For :cont event available at #on_cont_event callback and response eventstream enumerator:
 #=> EmptyStruct

For :end event available at #on_end_event callback and response eventstream enumerator:
 #=> EmptyStruct
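The :stats members listed above can be read from the response enumerator as in this sketch. `final_stats` is a hypothetical helper, and the `StubEvent` and `StubDetails` structs stand in for the real event types so the example runs without calling Amazon S3.

```ruby
# Hypothetical helper (not part of the SDK): returns the byte counters of
# the last :stats event as a Hash, or nil if the stream carried none.
def final_stats(events)
  stats = events.select { |event| event.event_type == :stats }.last
  return nil unless stats

  details = stats.details
  {
    scanned: details.bytes_scanned,
    processed: details.bytes_processed,
    returned: details.bytes_returned
  }
end

# Stand-ins for the real event objects; pass resp.payload in real use.
StubDetails = Struct.new(:bytes_scanned, :bytes_processed, :bytes_returned)
StubEvent = Struct.new(:event_type, :details)
stub_events = [
  StubEvent.new(:progress, StubDetails.new(50, 50, 10)),
  StubEvent.new(:stats, StubDetails.new(100, 100, 20)),
  StubEvent.new(:end, nil)
]

summary = final_stats(stub_events)
```

The same member names appear on :progress events, so the helper could be pointed at those instead by changing the event type it selects.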

Options Hash (params):

  • :bucket (required, String)

    The S3 bucket.

  • :key (required, String)

    The object key.

  • :sse_customer_algorithm (String)

    The SSE Algorithm used to encrypt the object. For more information, see [Server-Side Encryption (Using Customer-Provided Encryption Keys)].

    [1]: docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html

  • :sse_customer_key (String)

    The SSE Customer Key. For more information, see [Server-Side Encryption (Using Customer-Provided Encryption Keys)].

    [1]: docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html

  • :sse_customer_key_md5 (String)

    The SSE Customer Key MD5. For more information, see [Server-Side Encryption (Using Customer-Provided Encryption Keys)].

    [1]: docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html

  • :expression (required, String)

    The expression that is used to query the object.

  • :expression_type (required, String)

    The type of the provided expression (for example, SQL).

  • :request_progress (Types::RequestProgress)

    Specifies if periodic request progress information should be enabled.

  • :input_serialization (required, Types::InputSerialization)

    Describes the format of the data in the object that is being queried.

  • :output_serialization (required, Types::OutputSerialization)

    Describes the format of the data that you want Amazon S3 to return in response.

  • :scan_range (Types::ScanRange)

    Specifies the byte range of the object to get the records from. A record is processed when its first byte is contained by the range. This parameter is optional, but when specified, it must not be empty. See RFC 2616, Section 14.35.1 about how to specify the start and end of the range.

    `ScanRange` may be used in the following ways:

    • `<scanrange><start>50</start><end>100</end></scanrange>` - process only the records starting between the bytes 50 and 100 (inclusive, counting from zero)

    • `<scanrange><start>50</start></scanrange>` - process only the records starting after the byte 50

    • `<scanrange><end>50</end></scanrange>` - process only the records within the last 50 bytes of the file.
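Because a record is processed when its first byte falls inside the range, an object can be covered by a set of non-overlapping ranges without any record being read twice. The sketch below computes such ranges in the `{ start:, end: }` shape of the `:scan_range` parameter. `scan_ranges` is a hypothetical helper, and the object size is assumed to be known already (for example from a prior HeadObject call).

```ruby
# Hypothetical helper (not part of the SDK): splits an object of
# object_size bytes into consecutive, non-overlapping scan ranges of at
# most chunk_size bytes each. Both bounds are inclusive and zero-based.
def scan_ranges(object_size, chunk_size)
  raise ArgumentError, "chunk_size must be positive" unless chunk_size.positive?

  (0...object_size).step(chunk_size).map do |first_byte|
    last_byte = [first_byte + chunk_size, object_size].min - 1
    { start: first_byte, end: last_byte }
  end
end

ranges = scan_ranges(250, 100)
# ranges => [{start: 0, end: 99}, {start: 100, end: 199}, {start: 200, end: 249}]
```

Each hash could then be merged into the request as `scan_range:` for a separate #select_object_content call. Whether splitting the work this way pays off depends on object size and format, which this sketch does not try to assess.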

Yields:

  • (event_stream_handler)



# File 'lib/aws-sdk-s3/client.rb', line 11234

def select_object_content(params = {}, options = {}, &block)
  params = params.dup
  event_stream_handler = case handler = params.delete(:event_stream_handler)
    when EventStreams::SelectObjectContentEventStream then handler
    when Proc then EventStreams::SelectObjectContentEventStream.new.tap(&handler)
    when nil then EventStreams::SelectObjectContentEventStream.new
    else
      msg = "expected :event_stream_handler to be a block or "\
            "instance of Aws::S3::EventStreams::SelectObjectContentEventStream"\
            ", got `#{handler.inspect}` instead"
      raise ArgumentError, msg
    end

  yield(event_stream_handler) if block_given?

  req = build_request(:select_object_content, params)

  req.context[:event_stream_handler] = event_stream_handler
  req.handlers.add(Aws::Binary::DecodeHandler, priority: 95)

  req.send_request(options, &block)
end