Class: Jetpants::Table
- Includes:
- CallbackHandler
- Defined in:
- lib/jetpants/table.rb
Overview
The Table class associates a table name with a column (or list of columns) making up the table’s sharding key or primary key. It is geared towards generating SQL for importing/exporting a table, NOT for representing an application data model.
None of these methods actually execute the SQL they generate, since the Table class isn’t tied to a specific DB. This allows us to represent the set of all sharded tables with a single set of Table objects, without having to duplicate those objects for every Shard or DB instance. If you want to run the generated SQL on a database, use one of the DB#query* methods.
Instance Attribute Summary collapse
-
#chunks ⇒ Object
readonly
Jetpants supports doing import and export operations in parallel “chunks” of the data set.
-
#name ⇒ Object
readonly
Name of the table as it exists in your database.
-
#sharding_keys ⇒ Object
readonly
Your application’s sharding_key is the column used to determine which rows live on which shard.
Class Method Summary collapse
-
.from_config(label) ⇒ Object
Return an array of Table objects based on the contents of Jetpants’ config file entry of the given label.
Instance Method Summary collapse
-
#export_file_path(min_id = false, max_id = false) ⇒ Object
Returns a file path (as a String) for the export dumpfile of the given ID range.
-
#initialize(name, params = {}) ⇒ Table
constructor
Create a Table.
-
#sql_cleanup_delete(sharding_key, min_keep_id, max_keep_id) ⇒ Object
Returns the SQL necessary to clean rows that shouldn’t be on this shard.
-
#sql_cleanup_next_id(sharding_key, id, direction) ⇒ Object
Returns the SQL necessary to iterate over a given sharding key by ID – returns the next ID desired.
-
#sql_count_rows(min_id, max_id) ⇒ Object
Counts number of rows between the given ID ranges.
-
#sql_export_range(min_id = false, max_id = false) ⇒ Object
(also: #sql_export_all)
Returns the SQL for performing a data export of a given ID range.
-
#sql_import_range(min_id = false, max_id = false) ⇒ Object
(also: #sql_import_all)
Returns the SQL necessary to load the table’s data.
- #to_s ⇒ Object
Methods included from CallbackHandler
Constructor Details
#initialize(name, params = {}) ⇒ Table
Create a Table. Params should have string keys, not symbols. Possible keys include ‘sharding_key’ (or equivalently ‘primary_key’), ‘chunks’, and ‘order_by’.
48 49 50 51 52 53 54 |
# File 'lib/jetpants/table.rb', line 48 def initialize(name, params={}) @name = name params['sharding_key'] ||= params['primary_keys'] || params['primary_key'] || 'user_id' @sharding_keys = (params['sharding_key'].is_a?(Array) ? params['sharding_key'] : [params['sharding_key']]) @chunks = params['chunks'] || 1 @order_by = params['order_by'] end |
Instance Attribute Details
#chunks ⇒ Object (readonly)
Jetpants supports doing import and export operations in parallel “chunks” of the data set. For tables with few rows, this is irrelevant and can be left at the default of 1 (meaning no chunking). For tables with hundreds of millions of rows, you may want to do exports/imports in a few hundred chunks to speed things up and keep the transactions smaller.
44 45 46 |
# File 'lib/jetpants/table.rb', line 44 def chunks @chunks end |
#name ⇒ Object (readonly)
Name of the table as it exists in your database.
17 18 19 |
# File 'lib/jetpants/table.rb', line 17 def name @name end |
#sharding_keys ⇒ Object (readonly)
Your application’s sharding_key is the column used to determine which rows live on which shard. Generally this should be the same logical value for your entire application (example: id column of the User table), although the column name need not be identical across tables (one may call it ‘user_id’, another could call it ‘target_user_id’ or ‘from_user_id’.) The sharding_keys attribute stores the name of that column for this particular table.
For a sharded table, sharding_keys should generally be a single column, represented here as a single string.
Jetpants supports mapping-tables with multiple sharding key columns (for instance, if user_id is your app’s sharding key, a “following” table mapping one user_id to another). However this makes exports and cleanup extremely inefficient, so its use is not recommended.
For a non-sharded table, simply set sharding_keys to the first column of the table’s primary key. This is sufficient to make chunked exports/imports work properly.
37 38 39 |
# File 'lib/jetpants/table.rb', line 37 def sharding_keys @sharding_keys end |
Class Method Details
.from_config(label) ⇒ Object
Return an array of Table objects based on the contents of Jetpants’ config file entry of the given label.
58 59 60 61 |
# File 'lib/jetpants/table.rb', line 58 def Table.from_config(label) result = [] Jetpants.send(label).map {|name, attributes| Table.new name, attributes} end |
Instance Method Details
#export_file_path(min_id = false, max_id = false) ⇒ Object
Returns a file path (as a String) for the export dumpfile of the given ID range.
136 137 138 139 140 141 142 143 |
# File 'lib/jetpants/table.rb', line 136 def export_file_path(min_id=false, max_id=false) case when min_id && max_id then "#{Jetpants.export_location}/#{@name}#{min_id}-#{max_id}.out" when min_id then "#{Jetpants.export_location}/#{@name}#{min_id}-and-up.out" when max_id then "#{Jetpants.export_location}/#{@name}start-#{max_id}.out" else "#{Jetpants.export_location}/#{@name}-full.out" end end |
#sql_cleanup_delete(sharding_key, min_keep_id, max_keep_id) ⇒ Object
Returns the SQL necessary to clean rows that shouldn’t be on this shard. Pass in a sharding key and the min/max allowed ID on the shard, and get back a SQL DELETE statement. When running that statement, pass in an ID (obtained from sql_cleanup_next_id) as a bind variable.
113 114 115 116 117 118 119 120 121 122 123 124 |
# File 'lib/jetpants/table.rb', line 113 def sql_cleanup_delete(sharding_key, min_keep_id, max_keep_id) sql = "DELETE FROM #{@name} WHERE #{sharding_key} = ?" # if there are multiple sharding cols, we need to be more careful to keep rows # where the OTHER sharding col(s) do fall within the shard's range @sharding_keys.each do |other_col| next if other_col == sharding_key sql << " AND NOT (#{other_col} >= #{min_keep_id} AND #{other_col} <= #{max_keep_id})" end return sql end |
#sql_cleanup_next_id(sharding_key, id, direction) ⇒ Object
Returns the SQL necessary to iterate over a given sharding key by ID – returns the next ID desired. Useful when performing a cleanup operation over a sparse ID range.
99 100 101 102 103 104 105 106 107 |
# File 'lib/jetpants/table.rb', line 99 def sql_cleanup_next_id(sharding_key, id, direction) if direction == :asc "SELECT MIN(#{sharding_key}) FROM #{@name} WHERE #{sharding_key} > #{id}" elsif direction == :desc "SELECT MAX(#{sharding_key}) FROM #{@name} WHERE #{sharding_key} < #{id}" else raise "Unknown direction parameter #{direction}" end end |
#sql_count_rows(min_id, max_id) ⇒ Object
Counts number of rows between the given ID ranges. Warning: will give potentially misleading counts on multi-sharding-key tables.
128 129 130 131 132 133 |
# File 'lib/jetpants/table.rb', line 128 def sql_count_rows(min_id, max_id) sql = "SELECT COUNT(*) FROM #{@name} WHERE " wheres = [] @sharding_keys.each {|col| wheres << "(#{col} >= #{min_id} AND #{col} <= #{max_id})"} sql << wheres.join(" OR ") end |
#sql_export_range(min_id = false, max_id = false) ⇒ Object Also known as: sql_export_all
Returns the SQL for performing a data export of a given ID range
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 |
# File 'lib/jetpants/table.rb', line 68 def sql_export_range(min_id=false, max_id=false) outfile = export_file_path min_id, max_id sql = "SELECT * FROM #{@name} " if min_id || max_id clauses = case when min_id && max_id then @sharding_keys.collect {|col| "(#{col} >= #{min_id} AND #{col} <= #{max_id}) "} when min_id then @sharding_keys.collect {|col| "#{col} >= #{min_id} "} when max_id then @sharding_keys.collect {|col| "#{col} <= #{max_id} "} end sql << "WHERE " + clauses.join('OR ') end sql << "ORDER BY #{@order_by} " if @order_by sql << "INTO OUTFILE '#{outfile}'" end |
#sql_import_range(min_id = false, max_id = false) ⇒ Object Also known as: sql_import_all
Returns the SQL necessary to load the table’s data. Note that we use an IGNORE on multi-sharding-key tables. This is because we get duplicate rows between export chunk files in this case.
89 90 91 92 93 |
# File 'lib/jetpants/table.rb', line 89 def sql_import_range(min_id=false, max_id=false) outfile = export_file_path min_id, max_id ignore = (@sharding_keys.count > 1 && (min_id || max_id) ? ' IGNORE' : '') sql = "LOAD DATA INFILE '#{outfile}'#{ignore} INTO TABLE #{@name} CHARACTER SET binary" end |
#to_s ⇒ Object
63 64 65 |
# File 'lib/jetpants/table.rb', line 63 def to_s return @name end |