Class: Gitlab::Database::LoadBalancing::Host
- Inherits: Object
- Defined in: lib/gitlab/database/load_balancing/host.rb
Overview
A single database host used for load balancing.
Constant Summary
- CONNECTION_ERRORS =
  [
    ActionView::Template::Error,
    ActiveRecord::StatementInvalid,
    ActiveRecord::ConnectionNotEstablished,
    ActiveRecord::StatementTimeout,
    PG::Error
  ].freeze
- CAN_TRACK_LOGICAL_LSN_QUERY =
  This query checks that the current user has permission before we try to query logical replication status. We also require PG14 or newer, because before PG14 these views are accessible only to the superuser, even if has_table_privilege says otherwise.
  <<~SQL.squish.freeze
    SELECT has_table_privilege('pg_replication_origin_status', 'select')
      AND has_function_privilege('pg_show_replication_origin_status()', 'execute')
      AND current_setting('server_version_num', true)::int >= 140000
      AS allowed
  SQL
- LATEST_LSN_WITH_LOGICAL_QUERY =
  The following is necessary to handle a mix of logical and physical replicas. We assume that if a host has rows in pg_replication_origin_status then it is a logical replica. On a logical replica we need to use remote_lsn rather than pg_last_wal_replay_lsn in order for our LSN to be comparable to the source cluster. This logic would break if we had two logical subscriptions, or a logical subscription in the source primary cluster. Read more at gitlab.com/gitlab-org/gitlab/-/merge_requests/121621
  <<~SQL.squish.freeze
    CASE
    WHEN (SELECT TRUE FROM pg_replication_origin_status)
      THEN (SELECT remote_lsn FROM pg_replication_origin_status)
    WHEN pg_is_in_recovery()
      THEN pg_last_wal_replay_lsn()
    ELSE pg_current_wal_insert_lsn()
    END
  SQL
- LATEST_LSN_WITHOUT_LOGICAL_QUERY =
  <<~SQL.squish.freeze
    CASE
    WHEN pg_is_in_recovery()
      THEN pg_last_wal_replay_lsn()
    ELSE pg_current_wal_insert_lsn()
    END
  SQL
- REPLICATION_LAG_QUERY =
  <<~SQL.squish.freeze
    SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::float AS lag
  SQL
Instance Attribute Summary
- #host ⇒ Object (readonly): Returns the value of attribute host.
- #intervals ⇒ Object (readonly): Returns the value of attribute intervals.
- #last_checked_at ⇒ Object (readonly): Returns the value of attribute last_checked_at.
- #load_balancer ⇒ Object (readonly): Returns the value of attribute load_balancer.
- #pool ⇒ Object (readonly): Returns the value of attribute pool.
- #port ⇒ Object (readonly): Returns the value of attribute port.
Instance Method Summary
- #caught_up?(location) ⇒ Boolean: Returns true if this host has caught up to the given transaction write location.
- #check_replica_status? ⇒ Boolean
- #connection ⇒ Object
- #data_is_recent_enough? ⇒ Boolean: Returns true if the replica has replicated enough data to be useful.
- #database_replica_location ⇒ Object
- #disconnect!(timeout: 120) ⇒ Object: Disconnects the pool once all connections are no longer in use.
- #force_disconnect! ⇒ Object
- #initialize(host, load_balancer, port: nil) ⇒ Host (constructor): host - The address of the database.
- #offline! ⇒ Object
- #online? ⇒ Boolean: Returns true if the host is online.
- #pool_disconnect! ⇒ Object
- #primary_write_location ⇒ Object
- #query_and_release ⇒ Object
- #query_and_release_fast_timeout(sql) ⇒ Object
- #query_and_release_old(sql) ⇒ Object
- #refresh_status ⇒ Object
- #replica_is_up_to_date? ⇒ Boolean
- #replication_lag_below_threshold? ⇒ Boolean
- #replication_lag_size(location = primary_write_location) ⇒ Object: Returns the number of bytes this secondary is lagging behind the primary.
- #replication_lag_time ⇒ Object: Returns the replication lag time of this secondary in seconds as a float.
- #try_disconnect ⇒ Object: Attempt to disconnect the pool if all connections are no longer in use.
Constructor Details
#initialize(host, load_balancer, port: nil) ⇒ Host
host - The address of the database.
load_balancer - The LoadBalancer that manages this Host.
# File 'lib/gitlab/database/load_balancing/host.rb', line 63

def initialize(host, load_balancer, port: nil)
  @host = host
  @port = port
  @load_balancer = load_balancer
  @pool = load_balancer.create_replica_connection_pool(
    load_balancer.configuration.pool_size,
    host,
    port
  )
  @online = true
  @last_checked_at = Time.zone.now

  @lag_time = nil
  @lag_size = nil

  # Randomly somewhere in between interval and 2*interval we'll refresh the
  # status of the host
  interval = load_balancer.configuration.replica_check_interval
  @intervals = (interval..(interval * 2)).step(0.5).to_a
end
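The jitter schedule built by the constructor can be sketched in isolation. In this sketch, 2.5 seconds is a hypothetical value standing in for load_balancer.configuration.replica_check_interval:

```ruby
# Sketch of the jittered refresh schedule built in #initialize, assuming a
# hypothetical replica_check_interval of 2.5 seconds.
interval = 2.5

# Candidate wait times between interval and 2 * interval, in 0.5s steps.
intervals = (interval..(interval * 2)).step(0.5).to_a

# Each status check waits a freshly sampled interval, spreading replicas'
# checks out so hosts don't all query the primary at the same moment.
next_check_in = intervals.sample
```

Sampling a fresh interval on every check is what keeps hosts from synchronizing their status refreshes.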
Instance Attribute Details
#host ⇒ Object (readonly)
Returns the value of attribute host.
# File 'lib/gitlab/database/load_balancing/host.rb', line 8

def host
  @host
end
#intervals ⇒ Object (readonly)
Returns the value of attribute intervals.
# File 'lib/gitlab/database/load_balancing/host.rb', line 8

def intervals
  @intervals
end
#last_checked_at ⇒ Object (readonly)
Returns the value of attribute last_checked_at.
# File 'lib/gitlab/database/load_balancing/host.rb', line 8

def last_checked_at
  @last_checked_at
end
#load_balancer ⇒ Object (readonly)
Returns the value of attribute load_balancer.
# File 'lib/gitlab/database/load_balancing/host.rb', line 8

def load_balancer
  @load_balancer
end
#pool ⇒ Object (readonly)
Returns the value of attribute pool.
# File 'lib/gitlab/database/load_balancing/host.rb', line 8

def pool
  @pool
end
#port ⇒ Object (readonly)
Returns the value of attribute port.
# File 'lib/gitlab/database/load_balancing/host.rb', line 8

def port
  @port
end
Instance Method Details
#caught_up?(location) ⇒ Boolean
Returns true if this host has caught up to the given transaction write location.
location - The transaction write location as reported by a primary.
# File 'lib/gitlab/database/load_balancing/host.rb', line 275

def caught_up?(location)
  lag = replication_lag_size(location)
  lag.present? && lag.to_i <= 0
end
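The catch-up test reduces to a predicate on the lag size. A minimal sketch, with a hypothetical helper standing in for the real method (lag is whatever replication_lag_size returned, nil on error), using a plain nil check in place of ActiveSupport's present?:

```ruby
# Hypothetical stand-in for the caught_up? predicate: a host is caught up
# when a lag size was obtained and it is zero bytes or fewer.
def caught_up_from_lag?(lag)
  !lag.nil? && lag.to_i <= 0
end

caught_up_from_lag?(0)    # caught up: zero bytes behind
caught_up_from_lag?(nil)  # not caught up: lag could not be determined
```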
#check_replica_status? ⇒ Boolean
# File 'lib/gitlab/database/load_balancing/host.rb', line 178

def check_replica_status?
  (Time.zone.now - last_checked_at) >= intervals.sample
end
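The comparison itself can be sketched without Rails. A hypothetical helper, using plain Time in place of Time.zone:

```ruby
# Hypothetical sketch of the check-due decision: a status refresh is due
# once more time than a randomly sampled interval has elapsed since the
# last check.
def check_due?(last_checked_at, intervals, now: Time.now)
  (now - last_checked_at) >= intervals.sample
end
```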
#connection ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 82

def connection
  pool.lease_connection
end
#data_is_recent_enough? ⇒ Boolean
Returns true if the replica has replicated enough data to be useful.
# File 'lib/gitlab/database/load_balancing/host.rb', line 215

def data_is_recent_enough?
  # It's possible for a replica to not replay WAL data for a while,
  # despite being up to date. This can happen when a primary does not
  # receive any writes for a while.
  #
  # To prevent this from happening we check if the lag size (in bytes)
  # of the replica is small enough for the replica to be useful. We
  # only do this if we haven't replicated in a while so we only need
  # to connect to the primary when truly necessary.
  if (@lag_size = replication_lag_size)
    @lag_size <= load_balancer.configuration.max_replication_difference
  else
    false
  end
end
#database_replica_location ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 261

def database_replica_location
  row = query_and_release(<<-SQL.squish)
    SELECT pg_last_wal_replay_lsn()::text AS location
  SQL

  row['location'] if row.any?
rescue *CONNECTION_ERRORS
  nil
end
#disconnect!(timeout: 120) ⇒ Object
Disconnects the pool, once all connections are no longer in use.
timeout - The time after which the pool should be forcefully disconnected.
# File 'lib/gitlab/database/load_balancing/host.rb', line 90

def disconnect!(timeout: 120)
  start_time = ::Gitlab::Metrics::System.monotonic_time

  while (::Gitlab::Metrics::System.monotonic_time - start_time) <= timeout
    return if try_disconnect

    sleep(2)
  end

  force_disconnect!
end
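The bounded retry loop can be sketched in plain Ruby. This is a minimal stand-in, reading the monotonic clock directly rather than through Gitlab::Metrics::System, with the two disconnect steps passed in as hypothetical lambdas:

```ruby
# Minimal sketch of the bounded disconnect loop: retry the graceful path
# until the timeout elapses, then force-disconnect. The lambdas stand in
# for try_disconnect and force_disconnect!.
def disconnect_with_timeout(try_disconnect, force_disconnect, timeout: 120, sleep_interval: 2)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  while (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) <= timeout
    return :graceful if try_disconnect.call

    sleep(sleep_interval)
  end

  force_disconnect.call
  :forced
end
```

Using the monotonic clock rather than wall time keeps the timeout immune to NTP adjustments and daylight-saving jumps.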
#force_disconnect! ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 113

def force_disconnect!
  pool_disconnect!
end
#offline! ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 121

def offline!
  ::Gitlab::Database::LoadBalancing::Logger.warn(
    event: :host_offline,
    message: 'Marking host as offline',
    db_host: @host,
    db_port: @port
  )

  @online = false
  pool_disconnect!
end
#online? ⇒ Boolean
Returns true if the host is online.
# File 'lib/gitlab/database/load_balancing/host.rb', line 134

def online?
  # Avoid using a discarded connection pool because attempting
  # to use it will fail. After the main process forks, all of
  # its connection pools are discarded from Rails' ForkTracker.
  return false if discarded?

  return @online unless check_replica_status?

  was_online = @online
  refresh_status

  # Log that the host came back online if it was previously offline
  if @online && !was_online
    ::Gitlab::Database::LoadBalancing::Logger.info(
      event: :host_online,
      message: 'Host is online after replica status check',
      db_host: @host,
      db_port: @port,
      lag_time: @lag_time,
      lag_size: @lag_size
    )
  # Always log if the host goes offline
  elsif !@online
    ::Gitlab::Database::LoadBalancing::Logger.warn(
      event: :host_offline,
      message: 'Host is offline after replica status check',
      db_host: @host,
      db_port: @port,
      lag_time: @lag_time,
      lag_size: @lag_size
    )
  end

  @online
rescue *CONNECTION_ERRORS
  offline!

  false
end
#pool_disconnect! ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 117

def pool_disconnect!
  pool.disconnect!
end
#primary_write_location ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 257

def primary_write_location
  load_balancer.primary_write_location
end
#query_and_release(...) ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 280

def query_and_release(...)
  pool.disable_query_cache do
    if low_timeout_for_host_queries?
      query_and_release_fast_timeout(...)
    else
      query_and_release_old(...)
    end
  end
end
#query_and_release_fast_timeout(sql) ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 298

def query_and_release_fast_timeout(sql)
  # If we "set local" the timeout in a transaction that was already open we
  # would taint the outer transaction with that timeout.
  # However, we don't ever run transactions on replicas, and we only do
  # these health checks on replicas.
  # Double-check that we're not in a transaction, but this path should never
  # happen.
  if connection.transaction_open?
    Gitlab::Database::LoadBalancing::Logger.warn(
      event: :health_check_in_transaction,
      message: "Attempt to run a health check query inside of a transaction"
    )
    return query_and_release_old(sql)
  end

  begin
    connection.transaction do
      connection.exec_query("SET LOCAL statement_timeout TO '100ms';")
      connection.select_all(sql).first || {}
    end
  rescue StandardError
    {}
  ensure
    release_connection
  end
end
#query_and_release_old(sql) ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 290

def query_and_release_old(sql)
  connection.select_all(sql).first || {}
rescue StandardError
  {}
ensure
  release_connection
end
#refresh_status ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 172

def refresh_status
  # Periodically clear the cached @latest_lsn_query value in case
  # permissions change
  @latest_lsn_query = nil
  @online = replica_is_up_to_date?
  @last_checked_at = Time.zone.now
end
#replica_is_up_to_date? ⇒ Boolean
# File 'lib/gitlab/database/load_balancing/host.rb', line 182

def replica_is_up_to_date?
  replication_lag_below_threshold? || data_is_recent_enough?
end
#replication_lag_below_threshold? ⇒ Boolean
# File 'lib/gitlab/database/load_balancing/host.rb', line 186

def replication_lag_below_threshold?
  @lag_time = replication_lag_time

  return false unless @lag_time
  return true if @lag_time <= load_balancer.configuration.max_replication_lag_time

  if ignore_replication_lag_time?
    ::Gitlab::Database::LoadBalancing::Logger.info(
      event: :replication_lag_ignored,
      lag_time: @lag_time,
      message: 'Replication lag is treated as low because of load_balancer_ignore_replication_lag_time feature flag'
    )

    return true
  end

  if double_replication_lag_time? && @lag_time <= (load_balancer.configuration.max_replication_lag_time * 2)
    ::Gitlab::Database::LoadBalancing::Logger.info(
      event: :replication_lag_below_double,
      lag_time: @lag_time,
      message: 'Replication lag is treated as low because of load_balancer_double_replication_lag_time feature flag'
    )

    return true
  end

  false
end
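The decision tree condenses to a small predicate. A hypothetical sketch with the two feature-flag checks passed in as plain booleans and max_lag standing in for configuration.max_replication_lag_time (logging omitted):

```ruby
# Hypothetical condensed form of the replication-lag threshold decision.
def lag_below_threshold?(lag_time, max_lag, ignore_lag: false, double_lag: false)
  return false if lag_time.nil?                          # no lag time: treat as lagging
  return true if lag_time <= max_lag                     # within the configured limit
  return true if ignore_lag                              # flag: treat any lag as low
  return true if double_lag && lag_time <= (max_lag * 2) # flag: allow up to double

  false
end
```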
#replication_lag_size(location = primary_write_location) ⇒ Object
Returns the number of bytes this secondary is lagging behind the primary.
This method will return nil if no lag size could be calculated.
# File 'lib/gitlab/database/load_balancing/host.rb', line 245

def replication_lag_size(location = primary_write_location)
  location = connection.quote(location)

  row = query_and_release(<<-SQL.squish)
    SELECT pg_wal_lsn_diff(#{location}, (#{latest_lsn_query}))::float AS diff
  SQL

  row['diff'].to_i if row.any?
rescue *CONNECTION_ERRORS
  nil
end
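pg_wal_lsn_diff subtracts two write-ahead-log locations and returns the difference in bytes. An LSN prints as two hexadecimal words separated by a slash; these hypothetical helpers reproduce the arithmetic locally to show what the query computes:

```ruby
# Hypothetical helpers mirroring pg_wal_lsn_diff: an LSN 'HI/LO' encodes a
# 64-bit byte position as two hex words, and lag is the byte difference.
def lsn_to_bytes(lsn)
  hi, lo = lsn.split('/').map { |part| part.to_i(16) }
  (hi << 32) + lo
end

def lsn_diff(primary_lsn, replica_lsn)
  lsn_to_bytes(primary_lsn) - lsn_to_bytes(replica_lsn)
end

lsn_diff('0/2000000', '0/1000000') # 16 MiB: bytes the replica is behind
```

A non-positive difference is what caught_up? treats as fully replicated.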
#replication_lag_time ⇒ Object
Returns the replication lag time of this secondary in seconds as a float.
This method will return nil if no lag time could be calculated.
# File 'lib/gitlab/database/load_balancing/host.rb', line 235

def replication_lag_time
  row = query_and_release(REPLICATION_LAG_QUERY)

  row['lag'].to_f if row.any?
end
#try_disconnect ⇒ Object
Attempt to disconnect the pool if all connections are no longer in use. Returns true if the pool was disconnected, false if not.
# File 'lib/gitlab/database/load_balancing/host.rb', line 104

def try_disconnect
  if pool.connections.none?(&:in_use?)
    pool_disconnect!
    return true
  end

  false
end