Class: Remi::Extractor::S3File

Inherits:
FileSystem
Includes:
DataSubject::S3File
Defined in:
lib/remi/data_subjects/s3_file.rb

Overview

S3 file extractor, used to extract files from Amazon S3.

To use AWS KMS, supply a :ciphertext and an optional :algorithm (default is AES256) in the :kms_opt hash. The encrypted key stored in the ciphertext must be the same as the one used when the file was written.

class MyJob < Remi::Job
  source :some_file do
    extractor Remi::Extractor::S3File.new(
      credentials: {
        aws_access_key_id: ENV['AWS_ACCESS_KEY_ID'],
        aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
        region: 'us-west-2'
      },
      bucket: 'my-awesome-bucket',
      remote_path: 'some_file-',
      most_recent_only: true,
      kms_opt: {
        ciphertext: ''
      }
    )
    parser Remi::Parser::CsvFile.new(
      csv_options: {
        headers: true,
        col_sep: '|'
      }
    )
  end
end
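The KMS options end up affecting each download request: #extract calls init_kms(@kms_opt) and merges encrypt_args into every get call. What encrypt_args returns is not documented on this page. The sketch below assumes one plausible shape, in which the KMS-decrypted data key is sent to S3 as a server-side-encryption customer key (SSE-C). sse_customer_algorithm and sse_customer_key are real S3 GetObject parameters, but build_encrypt_args is a hypothetical helper, not Remi's implementation.

```ruby
# Hypothetical helper: turns an already KMS-decrypted data key into extra
# S3 GetObject arguments. Remi's real #encrypt_args may differ.
def build_encrypt_args(plaintext_key, algorithm: 'AES256')
  # No KMS options supplied: download without any extra arguments.
  return {} if plaintext_key.nil?

  {
    sse_customer_algorithm: algorithm, # defaults to AES256, as documented above
    sse_customer_key: plaintext_key
  }
end

# Merged into the download request, mirroring the pattern used in #extract.
request = { response_target: 'local_copy.csv' }.merge(build_encrypt_args('secret-key'))
p request
```

When no key is present the helper returns an empty hash, so the merge leaves an unencrypted request untouched.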

Examples:

Standard use


class MyJob < Remi::Job
  source :some_file do
    extractor Remi::Extractor::S3File.new(
      credentials: {
        aws_access_key_id: ENV['AWS_ACCESS_KEY_ID'],
        aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
        region: 'us-west-2'
      },
      bucket: 'my-awesome-bucket',
      remote_path: 'some_file-',
      most_recent_only: true
    )
    parser Remi::Parser::CsvFile.new(
      csv_options: {
        headers: true,
        col_sep: '|'
      }
    )
  end
end

job = MyJob.new
job.some_file.df
# =>#<Daru::DataFrame:70153153438500 @name = 4c59cfdd-7de7-4264-8666-83153f46a9e4 @size = 3>
#                    id       name
#          0          1     Albert
#          1          2      Betsy
#          2          3       Camu

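Using AWS KMS

The KMS example appears in the Overview above: it is the standard configuration plus a kms_opt hash containing a :ciphertext.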

Instance Attribute Summary

Attributes included from DataSubject::S3File

#aws_credentials, #region

Attributes inherited from FileSystem

#created_within, #group_by, #local_path, #most_recent_by, #most_recent_only, #pattern, #remote_path

Attributes inherited from Remi::Extractor

#logger

Instance Method Summary

Methods included from DataSubject::S3File

#encrypt_args, #init_aws_credentials, #init_kms, #s3

Methods inherited from FileSystem

#entries, #get_created_within, #matching_entries, #most_recent_matching_entry, #most_recent_matching_entry_in_group
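The remote_path attribute is used as the S3 key prefix (see #all_entries!), and most_recent_only: true narrows the result to a single file via the inherited FileSystem methods listed above, whose code is not shown on this page. The following minimal sketch assumes that "most recent" means the greatest modified_time among entries under the prefix; Entry and most_recent_entry are hypothetical stand-ins, not Remi's classes.

```ruby
require 'time'

# Hypothetical stand-in for Extractor::FileSystemEntry.
Entry = Struct.new(:pathname, :modified_time)

# Picks the newest entry under a prefix; returns nil when nothing matches.
def most_recent_entry(entries, prefix)
  entries
    .select { |e| e.pathname.start_with?(prefix) }
    .max_by(&:modified_time)
end

entries = [
  Entry.new('some_file-20170101.csv', Time.parse('2017-01-01T00:00:00Z')),
  Entry.new('some_file-20170301.csv', Time.parse('2017-03-01T00:00:00Z')),
  Entry.new('other_file-20171231.csv', Time.parse('2017-12-31T00:00:00Z'))
]

puts most_recent_entry(entries, 'some_file-').pathname # => some_file-20170301.csv
```

The newer other_file entry is ignored because it falls outside the prefix.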

Constructor Details

#initialize(*args, **kargs, &block) ⇒ S3File

Returns a new instance of S3File.

Parameters:

  • bucket (String)

    Name of S3 bucket containing the files

  • kms_opt (Hash)

    Hash containing AWS KMS options (:ciphertext and an optional :algorithm, default AES256)

  • credentials (Hash)

    Hash containing AWS credentials (must contain :aws_access_key_id, :aws_secret_access_key, :region)
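Because credentials must contain three specific keys, a missing key is easiest to catch before any S3 call is made. The sketch below is a hypothetical pre-flight check (check_credentials! is not part of Remi, and Remi's initializer may validate differently) that raises an ArgumentError naming any absent key.

```ruby
REQUIRED_CREDENTIAL_KEYS = %i[aws_access_key_id aws_secret_access_key region].freeze

# Hypothetical pre-flight check; not part of Remi's API.
def check_credentials!(credentials)
  missing = REQUIRED_CREDENTIAL_KEYS - credentials.keys
  raise ArgumentError, "missing AWS credential keys: #{missing.join(', ')}" unless missing.empty?

  credentials
end

credentials = {
  aws_access_key_id: 'example-key-id',
  aws_secret_access_key: 'example-secret',
  region: 'us-west-2'
}
check_credentials!(credentials)
```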



# File 'lib/remi/data_subjects/s3_file.rb', line 111

def initialize(*args, **kargs, &block)
  super
  init_s3_file(*args, **kargs, &block)
end
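The constructor first calls a bare super, which forwards every argument to FileSystem#initialize, and then hands the same arguments to init_s3_file from the DataSubject::S3File mixin. A minimal sketch of that layering follows; BaseExtractor, S3Options, init_s3_options, and SketchS3Extractor are hypothetical names standing in for the real classes, and the keyword handling is an assumption.

```ruby
# Hypothetical base class standing in for Remi::Extractor::FileSystem.
class BaseExtractor
  attr_reader :remote_path

  def initialize(*args, remote_path: '', **kargs, &block)
    @remote_path = remote_path
  end
end

# Hypothetical mixin standing in for DataSubject::S3File.
module S3Options
  attr_reader :bucket_name

  def init_s3_options(*args, bucket:, **kargs, &block)
    @bucket_name = bucket
  end
end

class SketchS3Extractor < BaseExtractor
  include S3Options

  def initialize(*args, **kargs, &block)
    super                                   # bare super forwards args, kargs and block unchanged
    init_s3_options(*args, **kargs, &block) # then the mixin picks out its own options
  end
end

ex = SketchS3Extractor.new(bucket: 'my-awesome-bucket', remote_path: 'some_file-')
puts ex.bucket_name # => my-awesome-bucket
puts ex.remote_path # => some_file-
```

Each layer accepts **kargs, so options meant for the other layer pass through without raising an unknown-keyword error.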

Instance Method Details

#all_entries ⇒ Array<Extractor::FileSystemEntry>

Returns a (memoized) list of objects in the bucket/prefix.

Returns:

  • (Array<Extractor::FileSystemEntry>)

    (Memoized) list of objects in the bucket/prefix

# File 'lib/remi/data_subjects/s3_file.rb', line 130

def all_entries
  @all_entries ||= all_entries!
end
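#all_entries caches the result of #all_entries! with ||=, so the bucket is listed at most once per extractor instance, while calling the bang method directly always re-lists. An empty array is truthy in Ruby, so an empty listing is cached as well. A self-contained sketch of the same pattern, with a hypothetical ListingCache class and a call counter standing in for the S3 request:

```ruby
# Hypothetical class demonstrating the memoize-with-||= pattern used by #all_entries.
class ListingCache
  attr_reader :list_calls

  def initialize
    @list_calls = 0
  end

  # Cached: only calls the bang method the first time.
  def all_entries
    @all_entries ||= all_entries!
  end

  # Uncached: performs the (simulated) listing on every call.
  def all_entries!
    @list_calls += 1
    ['some_file-1.csv', 'some_file-2.csv'] # stands in for an S3 object listing
  end
end

cache = ListingCache.new
cache.all_entries
cache.all_entries
puts cache.list_calls # => 1
cache.all_entries!
puts cache.list_calls # => 2
```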

#all_entries! ⇒ Array<Extractor::FileSystemEntry>

Returns a list of objects in the bucket/prefix. Unlike #all_entries, the result is not memoized, so the bucket is listed on every call.

Returns:

  • (Array<Extractor::FileSystemEntry>)

    List of objects in the bucket/prefix

# File 'lib/remi/data_subjects/s3_file.rb', line 135

def all_entries!
  # S3 does not track anything like a create time, so use last modified for both
  s3.bucket(@bucket_name).objects(prefix: @remote_path.to_s).map do |entry|
    Extractor::FileSystemEntry.new(
      pathname: entry.key,
      create_time: entry.last_modified,
      modified_time: entry.last_modified,
      raw: entry
    )
  end
end
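Since S3 has no creation timestamp, #all_entries! copies last_modified into both create_time and modified_time and keeps the SDK object in raw for the later download. The sketch below reproduces that mapping without the AWS SDK. ObjectSummary and FileEntry are hypothetical Structs standing in for the SDK's object summaries and Extractor::FileSystemEntry, and the prefix filter here runs in memory, whereas the real code passes prefix: to the S3 listing call.

```ruby
require 'time'

# Hypothetical stand-ins for an S3 object summary and Extractor::FileSystemEntry.
ObjectSummary = Struct.new(:key, :last_modified)
FileEntry = Struct.new(:pathname, :create_time, :modified_time, :raw, keyword_init: true)

def to_entries(objects, prefix)
  objects.select { |obj| obj.key.start_with?(prefix) }.map do |obj|
    FileEntry.new(
      pathname: obj.key,
      create_time: obj.last_modified,   # no create time in S3, so reuse last_modified
      modified_time: obj.last_modified,
      raw: obj                          # keep the original object for the later download
    )
  end
end

objects = [
  ObjectSummary.new('some_file-a.csv', Time.parse('2017-01-01T00:00:00Z')),
  ObjectSummary.new('unrelated.csv', Time.parse('2017-02-01T00:00:00Z'))
]
entries = to_entries(objects, 'some_file-')
puts entries.map(&:pathname).inspect # => ["some_file-a.csv"]
```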

#extract ⇒ Array<String>

Called to extract files from the source filesystem.

Returns:

  • (Array<String>)

    An array of paths to the local copies of the extracted files



# File 'lib/remi/data_subjects/s3_file.rb', line 118

def extract
  init_kms(@kms_opt)

  entries.map do |entry|
    local_file = File.join(@local_path, entry.name)
    logger.info "Downloading #{entry.pathname} from S3 to #{local_file}"
    File.open(local_file, 'wb') { |file| entry.raw.get({ response_target: file }.merge(encrypt_args)) }
    local_file
  end
end
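Each selected entry is written to File.join(@local_path, entry.name) in binary mode, and the method returns those local paths. The following offline sketch of that loop rests on a few assumptions. FakeEntry is a hypothetical stand-in that merges the entry and its raw SDK object into one Struct. Its name is assumed to be the base name of the S3 key. Its get(response_target:) writes the body into the open file, much as the SDK does when given an IO. extract_to is likewise a hypothetical helper.

```ruby
require 'tmpdir'

# Hypothetical stand-in for a FileSystemEntry plus its raw S3 object.
FakeEntry = Struct.new(:pathname, :body) do
  # Assumed: the local file name is the base name of the S3 key.
  def name
    File.basename(pathname)
  end

  # Mimics writing the object body into the IO passed as response_target.
  def get(response_target:, **_encrypt_args)
    response_target.write(body)
  end
end

# Hypothetical helper mirroring the download loop in #extract.
def extract_to(local_path, entries, encrypt_args = {})
  entries.map do |entry|
    local_file = File.join(local_path, entry.name)
    File.open(local_file, 'wb') { |file| entry.get(response_target: file, **encrypt_args) }
    local_file # collect the local path of each downloaded copy
  end
end

Dir.mktmpdir do |dir|
  paths = extract_to(dir, [FakeEntry.new('exports/some_file-1.csv', "id|name\n1|Albert\n")])
  puts File.read(paths.first)
end
```

Using only the base name means two keys with the same file name in different "folders" would overwrite each other locally, which is worth keeping in mind when choosing remote_path.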