RedshiftExtractor
redshift_extractor moves data from one Amazon Redshift cluster to another. Here is how it works:
- Source database
- UNLOAD - runs a SELECT query and exports the results to CSV files in S3.
- Destination database
Drop - Drops a database table (the table in the destination database where the data will be stored).
Create - Creates a database table.
COPY - Loads data from S3 into a Redshift database.
One database connection is established with the source database to UNLOAD the data to S3. After the data is UNLOADed, a second database connection is establed with the destination database to drop/create the database table that will store the data. The final step is to COPY the data from the S3 files to the destination table.
Running the Code
The RedshiftExtractor::Extractor
class is instantiated with a long hash of arguments (sorry Sandi Metz!).
args = {
database_config_source: "database_config_source",
database_config_destination: "database_config_destination",
unload_s3_destination: "unload_s3_destination",
unload_select_sql: "unload_select_sql",
destination_schema: "destination_schema",
destination_table: "destination_table",
create_sql: "create_sql",
copy_data_source: "copy_data_source",
aws_access_key_id: "aws_access_key_id",
aws_secret_access_key: "aws_secret_access_key"
}
extractor = RedshiftExtractor::Extractor.new(args)
extractor.run
Here is a description of the parameters:
- database_config_source: A hash that's acceptable for the Ruby Postgres gem. Here's an example:
{
dbname: "db_name",
user: "username",
password: "password",
host: "host",
sslmode: 'require',
port: 5439
}
unload_s3_destination: A S3 path, something like
"s3://bucket_name/something_else/"
unload_select_sql: A SQL SELECT query that will be run on the source table
destination_schema, destination_table: The table that will be dropped, recreated, and populated with data from the COPY command
create_sql: The SQL that creates the destination_schema.destination_table table (this SQL is run to recreate the table in the step above)
copy_data_source: This is typically
"#{unload_s3_destination}manifest"
. The UNLOAD command automatically creates a manifest file that can be used by the COPY command to load the data.aws_keys: The keys you get from AWS.
Installation
Add this line to your application's Gemfile:
gem 'redshift_extractor'
And then execute:
$ bundle
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/MrPowers/redshift_extractor.
License
The gem is available as open source under the terms of the MIT License.