Class: RightSupport::Net::RequestBalancer

Inherits:
Object
Includes:
Log::Mixin
Defined in:
lib/right_support/net/request_balancer.rb

Overview

Utility class that allows network requests to be randomly distributed across a set of network endpoints. Generally used for REST requests by passing an Array of HTTP service endpoint URLs.

Note that this class also serves as a namespace for endpoint selection policies, which are classes that actually choose the next endpoint based on some criterion (round-robin, health of endpoint, response time, etc).

The balancer does not actually perform requests by itself, which makes this class usable for various network protocols, and potentially even for non-networking purposes. The block passed to #request does all the work; the balancer merely selects a suitable endpoint to pass to its block.
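
The division of labor described above can be illustrated without the library itself. The following is a minimal sketch, not RightSupport's implementation: `TinyBalancer` and `AllEndpointsFailed` are hypothetical stand-ins, and the retry rule (every StandardError is retryable, each endpoint is tried at most once) is a simplifying assumption.

```ruby
# Hypothetical stand-ins for illustration only; not part of RightSupport.
class AllEndpointsFailed < StandardError; end

class TinyBalancer
  def initialize(endpoints)
    # Shuffle once at creation time; the order is fixed afterwards.
    @endpoints = Array(endpoints).shuffle
    raise ArgumentError, 'Must specify at least one endpoint' if @endpoints.empty?
  end

  # Yields each endpoint in turn until the block completes without raising.
  def request
    raise ArgumentError, 'Must call this method with a block' unless block_given?

    errors = {}
    @endpoints.each do |endpoint|
      begin
        return yield(endpoint)
      rescue StandardError => e
        # Assumption: every StandardError is retryable in this sketch.
        errors[endpoint] = e
      end
    end
    raise AllEndpointsFailed, "all #{errors.size} endpoints failed"
  end
end

balancer = TinyBalancer.new(%w[http://a.example http://b.example])
# The block does the real work; here it simulates one broken endpoint.
result = balancer.request do |url|
  raise IOError, 'connection refused' if url.include?('a.example')

  "response from #{url}"
end
puts result
```

Because the block receives only an endpoint, the same pattern would work for any protocol the block knows how to speak.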

PLEASE NOTE that it is VERY IMPORTANT that the balancer is able to properly distinguish between fatal and non-fatal (retryable) errors. Before you pass a :fatal option to the RequestBalancer constructor, carefully examine its default list of fatal exceptions and default logic for deciding whether a given exception is fatal! There are some subtleties.

Constant Summary

DEFAULT_RETRY_PROC =
Deprecated.

please do not refer to this constant; it will be removed in RightSupport 3.0

lambda do |ep, n|
  n < ep.size
end
FATAL_RUBY_EXCEPTIONS =
Deprecated.

please do not refer to this constant; it will be removed in RightSupport 3.0

Built-in Ruby exceptions that should be considered fatal. Normally one would be inclined to simply say RuntimeError or StandardError, but because gem authors frequently make unwise choices of exception base class, including these top-level base classes could cause us to falsely think that retryable exceptions are fatal.

A good example of this phenomenon is the rest-client gem, whose base exception class is derived from RuntimeError!!

[
  # Exceptions that indicate something is seriously wrong with the Ruby VM.
  NoMemoryError, SystemStackError, SignalException, SystemExit,
  ScriptError,
  # Subclasses of StandardError. We can't include the base class directly as
  # a fatal exception, because there are some retryable exceptions that derive
  # from StandardError.
  ArgumentError, IndexError, LocalJumpError, NameError, RangeError,
  RegexpError, ThreadError, TypeError, ZeroDivisionError
]
FATAL_TEST_EXCEPTIONS =
Deprecated.

please do not refer to this constant; it will be removed in RightSupport 3.0

[]
DEFAULT_FATAL_EXCEPTIONS =
Deprecated.

please do not refer to this constant; it will be removed in RightSupport 3.0

Well-considered exceptions that should count as fatal (non-retryable) by the balancer. Used by default, and if you provide a :fatal option to the balancer, you should probably consult this list in your overridden fatal determination!

FATAL_RUBY_EXCEPTIONS + FATAL_TEST_EXCEPTIONS
DEFAULT_FATAL_PROC =
Deprecated.

please do not refer to this constant; it will be removed in RightSupport 3.0

lambda do |e|
  if DEFAULT_FATAL_EXCEPTIONS.any? { |c| e.is_a?(c) }
    #Some Ruby builtin exceptions indicate program errors
    true
  elsif e.respond_to?(:http_code) && (e.http_code != nil)
    #RestClient and Net::HTTP exceptions all respond to http_code, allowing us
    #to decide based on the HTTP response code.
    #Any HTTP 3xx counts as fatal, in order to force the client to handle it
    #Any HTTP 4xx code EXCEPT 408 (Request Timeout) counts as fatal.
    (e.http_code >= 300 && e.http_code < 500) && (e.http_code != 408)
  else
    #Anything else counts as non-fatal
    false
  end
end
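
To make the decision order of the lambda above concrete, here is a self-contained sketch that applies the same rules to a few sample exceptions. `FakeHttpError`, `FATAL_CLASSES` and `fatal?` are hypothetical names introduced for illustration, and the class list is deliberately abbreviated; consult the constants above for the real defaults.

```ruby
# Hypothetical exception that mimics HTTP client errors exposing http_code.
class FakeHttpError < StandardError
  attr_reader :http_code

  def initialize(http_code)
    @http_code = http_code
    super("HTTP #{http_code}")
  end
end

# Abbreviated stand-in for the fatal exception list (illustration only).
FATAL_CLASSES = [ArgumentError, NameError, TypeError].freeze

# Same decision order as the default logic: class list first, then HTTP code.
def fatal?(error)
  if FATAL_CLASSES.any? { |klass| error.is_a?(klass) }
    true
  elsif error.respond_to?(:http_code) && !error.http_code.nil?
    code = error.http_code
    # 3xx and 4xx are fatal, except 408 (Request Timeout).
    code >= 300 && code < 500 && code != 408
  else
    false
  end
end

samples = [
  FakeHttpError.new(302), FakeHttpError.new(404), FakeHttpError.new(408),
  FakeHttpError.new(503), ArgumentError.new('bad'), RuntimeError.new('oops')
]
samples.each do |error|
  verdict = fatal?(error) ? 'fatal' : 'retryable'
  puts "#{error.class} (#{error.message}): #{verdict}"
end
```

Under these rules a 404 stops the balancer immediately, while a 408 or a 503 lets it move on to another endpoint.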
DEFAULT_HEALTH_CHECK_PROC =

no-op health-check

Proc.new do |endpoint|
  true
end
DEFAULT_DEBUG_MODE =

debug mode

::ENV['DEBUG_MODE'] == 'true'
DEFAULT_OPTIONS =

default options

{
  :policy       => nil,
  :retry        => DEFAULT_RETRY_PROC,
  :fatal        => DEFAULT_FATAL_PROC,
  :on_exception => nil,
  :health_check => DEFAULT_HEALTH_CHECK_PROC,
  :resolve      => nil,    # not resolving DNS to IP(s) by default; rely on consul, etc.
  :thread_safe  => false,  # not thread-safe by default,
  :debug_mode   => nil     # infer from DEBUG_MODE
}

Constants included from Log::Mixin

Log::Mixin::Decorator, Log::Mixin::UNDELEGATED

Methods included from Log::Mixin

default_logger, default_logger=, included

Constructor Details

#initialize(endpoints, options = {}) ⇒ RequestBalancer

Constructor. Accepts a sequence of request endpoints which it shuffles randomly at creation time; however, the ordering of the endpoints does not change thereafter and the sequence is tried from the beginning for every request.

If you pass the :resolve option, then the list of endpoints is treated as a list of hostnames (or URLs containing hostnames) and the list is expanded out into a larger list with each hostname replaced by several entries, one for each of its IP addresses. If a single DNS hostname is associated with multiple A records, the :resolve option allows the balancer to treat each backing server as a distinct endpoint with its own health state, etc.
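
The expansion described above can be sketched as follows. This is an illustration under stated assumptions, not the library's resolver: `FAKE_DNS` is a hypothetical lookup table used in place of real DNS queries, and `expand_endpoints` is a hypothetical helper that only handles full URLs.

```ruby
require 'uri'

# Hypothetical lookup table standing in for real DNS answers.
FAKE_DNS = {
  'api.example.com' => ['10.0.0.1', '10.0.0.2']
}.freeze

# Replaces each URL whose host is in the table with one URL per IP address.
# URLs with unknown hosts are passed through unchanged.
def expand_endpoints(urls, dns)
  urls.flat_map do |url|
    uri = URI.parse(url)
    ips = dns.fetch(uri.host, [])
    next [url] if ips.empty?

    ips.map do |ip|
      copy = uri.dup
      copy.host = ip
      copy.to_s
    end
  end
end

puts expand_endpoints(['http://api.example.com/v1', 'http://other.example/v1'], FAKE_DNS)
```

Each resulting URL can then be tracked as its own endpoint, which is what allows per-server health state.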

Parameters:

  • endpoints (String|Array)

    (e.g. HTTP URLs) for balancing

  • options (Hash) (defaults to: {})

Options Hash (options):

  • :retry (Integer|Proc)

    callback to determine whether to keep retrying; the default is to try each endpoint at most once. Can also be passed as an integer, which sets a fixed number of attempts with no backoff. For retry with backoff, use the backoff_retry_callback method.

  • :resolve (Integer)

    a timeout in seconds after which the DNS hostnames of endpoints are re-resolved to IP addresses; default is nil (never).

  • :thread_safe (TrueClass|FalseClass)

    true to guard the balancer state with a mutex, false to be free-threaded (default). The default assumes that the balancer is not shared between concurrently running threads, for example because the app uses the Rainbows gem to ensure one process per API handler.

  • :debug_mode (TrueClass|FalseClass)

    as true to log additional error information as failures occur, false to only log error summary after all retries fail or nil to infer from DEBUG_MODE (default).

  • :fatal (Proc)

    callback to determine whether an exception is fatal and should not be retried.

  • :on_exception (Proc)

    notification hook that accepts three arguments: whether the exception is fatal, the exception itself, and the endpoint for which the exception happened

  • :health_check (Proc)

    callback that allows the balancer to check an endpoint's health; should raise an exception if the endpoint is not healthy

  • :on_health_change (Proc)

    callback that is made when the overall health of the endpoints transitions to a different level; its single argument contains the new minimum health level
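
The :retry option accepts either an integer or a callback. A minimal sketch of how an integer can be normalized into a two-argument callback, modeled on the constructor source shown on this page; `retry_callback_for` is a hypothetical helper name, not a library method.

```ruby
# Hypothetical helper: normalize a :retry value into a two-argument callback.
def retry_callback_for(value)
  return value if value.is_a?(Proc)

  # Integer() raises ArgumentError or TypeError for unusable input.
  max_attempts = Integer(value)
  lambda { |_endpoints, n| n < max_attempts }
end

three_tries = retry_callback_for(3)
puts three_tries.call([], 2) # prints true: 2 attempts so far, below the limit
puts three_tries.call([], 3) # prints false: limit reached
```

The callback receives the endpoint list and the number of attempts made so far, and returns true to keep going.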



# File 'lib/right_support/net/request_balancer.rb', line 201

def initialize(endpoints, options={})
  @options = DEFAULT_OPTIONS.merge(options)

  # provide thread-safety only when specified.
  if @options[:thread_safe]
    @mutex = ::Mutex.new
    @synchronize = @mutex.method(:synchronize)
  else
    @synchronize = self.method(:free_threaded)
  end

  endpoints = Array(endpoints)
  if endpoints.empty?
    raise ArgumentError, "Must specify at least one endpoint"
  end

  @options[:policy] ||= RightSupport::Net::LB::RoundRobin
  @policy = @options[:policy]
  @policy = @policy.new(options) if @policy.is_a?(Class)

  if (@debug_mode = @options.delete(:debug_mode)).nil?
    @debug_mode = DEFAULT_DEBUG_MODE
  end

  # convert retry counter to a simple retry callback, if necessary.
  @retry = @options.delete(:retry) || DEFAULT_RETRY_PROC
  unless @retry.kind_of?(Proc)
    # ensure that the count is captured by callback for safety.
    @retry = Integer(@retry)
    retry_proc = lambda do |max_attempts|
      lambda do |ep, n|
        n < max_attempts
      end
    end.call(@retry)
    @retry = retry_proc  # and now the type always Proc
  end

  unless test_policy_duck_type(@policy)
    raise ArgumentError, ":policy must be a class/object that responds to :next, :good and :bad"
  end

  # note @retry is now always defined as a callback. the legacy code always
  # had a default retry but it could have been defined later during actual
  # execution instead of being concretely defined on initialization. now it
  # is always defined on initialization.
  unless test_callable_arity(@retry, 2, false)
    raise ArgumentError, ":retry callback must accept two parameters"
  end

  unless test_callable_arity(options[:fatal], 1)
    raise ArgumentError, ":fatal callback must accept one parameter"
  end

  unless test_callable_arity(options[:on_exception], 3, false)
    raise ArgumentError, ":on_exception callback must accept three parameters"
  end

  unless test_callable_arity(options[:health_check], 1, false)
    raise ArgumentError, ":health_check callback must accept one parameter"
  end

  unless test_callable_arity(options[:on_health_change], 1, false)
    raise ArgumentError, ":on_health_change callback must accept one parameter"
  end

  @endpoints = endpoints

  if @options[:resolve]
    # Perform initial DNS resolution
    resolve
  else
    # Use endpoints as-is
    @policy.set_endpoints(@endpoints)
  end
end

Instance Attribute Details

#endpoints ⇒ Object (readonly)

Returns the value of attribute endpoints.



# File 'lib/right_support/net/request_balancer.rb', line 135

def endpoints
  @endpoints
end

Class Method Details

.backoff_retry_callback(max_attempts) ⇒ Object

Encapsulates exponential backoff/retry logic in a callback for use as the :retry option of the request balancer.



# File 'lib/right_support/net/request_balancer.rb', line 153

def self.backoff_retry_callback(max_attempts)
  lambda do |_, n|
    if n < max_attempts
      sleep 2 ** n
      true
    else
      false
    end
  end
end
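
To see the resulting delay schedule without actually waiting, the sketch below uses a variant of the callback with an injectable sleeper. The injection is an assumption made purely for illustration (the method above calls sleep directly), and `backoff_callback` is a hypothetical name.

```ruby
# Variant of the callback above with an injectable sleeper, so the delays
# can be recorded instead of slept (an assumption for illustration).
def backoff_callback(max_attempts, sleeper)
  lambda do |_endpoints, n|
    if n < max_attempts
      sleeper.call(2**n) # wait 2**n seconds before the next attempt
      true
    else
      false
    end
  end
end

delays = []
callback = backoff_callback(4, ->(seconds) { delays << seconds })

# n is the number of attempts made so far; the first retry decision uses n = 1.
decisions = (1..4).map { |n| callback.call(nil, n) }
p decisions # prints [true, true, true, false]
p delays    # prints [2, 4, 8]
```

With max_attempts of 4, a request that keeps failing would wait 2, 4 and 8 seconds between its four attempts.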

.request(endpoints, options = {}, &block) ⇒ Object



# File 'lib/right_support/net/request_balancer.rb', line 147

def self.request(endpoints, options={}, &block)
  new(endpoints, options).request(&block)
end

Instance Method Details

#get_stats ⇒ Object

Provide an interface so one can query the RequestBalancer for statistics on its endpoints. Merely proxies the balancing policy’s get_stats method. If no method exists in the balancing policy, a hash of endpoints with “n/a” is returned.

Examples

A RequestBalancer created with endpoints [1,2,3,4,5] and using a HealthCheck balancing policy may return:

{5 => "yellow-3", 1 => "red", 2 => "yellow-1", 3 => "green", 4 => "yellow-2"}

A RequestBalancer created with endpoints [1,2,3,4,5] and specifying no balancing policy or using the default RoundRobin balancing policy may return:

{… => "n/a", 1 => "n/a", 3 => "n/a"}
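
The proxy-with-fallback behavior described above can be sketched in isolation. `ReportingPolicy`, `SilentPolicy` and `stats_for` are hypothetical names used only for this illustration, and the stats values are made up.

```ruby
# Hypothetical policies: one reports stats, the other does not.
class ReportingPolicy
  def get_stats
    { 'a' => 'green', 'b' => 'red' }
  end
end

class SilentPolicy; end

# Proxies to the policy when it can report stats; otherwise marks every
# endpoint as "n/a".
def stats_for(policy, endpoints)
  if policy.respond_to?(:get_stats)
    policy.get_stats
  else
    endpoints.each_with_object({}) { |endpoint, stats| stats[endpoint] = 'n/a' }
  end
end

p stats_for(ReportingPolicy.new, %w[a b])
p stats_for(SilentPolicy.new, %w[a b])
```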



# File 'lib/right_support/net/request_balancer.rb', line 445

def get_stats
  result = nil
  if @policy.respond_to?(:get_stats)
    @synchronize.call do
      result = @policy.get_stats
    end
  else
    result = @endpoints.inject({}) { |h, endpoint| h[endpoint] = 'n/a'; h }
  end
  result
end

#lookup_hostname(endpoint) ⇒ String

Un-resolve an IP address.

Parameters:

  • endpoint (String)

    (e.g. HTTP URL) to be un-resolved

Returns:

  • (String)

    the first hostname that resolved to the IP (there should be at most one) or nil



# File 'lib/right_support/net/request_balancer.rb', line 282

def lookup_hostname(endpoint)
  result = nil
  @synchronize.call do
    if resolved_hostname = @resolved_hostnames && @resolved_hostnames.select{ |k, v| v.addresses.include?(endpoint) }
      result = resolved_hostname.shift[0]
    end
  end
  result
end

#request ⇒ Object

Perform a request.

Block

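This method requires a block, to which it yields in order to perform the actual network request. If the block raises a non-fatal exception, the balancer proceeds to try the next endpoint selected by its policy. In the source listing for this method, any value the block returns (including nil) ends the loop and counts as success.

Raise

ArgumentError

if a block isn’t supplied

NoResult

if every attempt fails with a non-fatal exception or a failed health check, or if the retry callback declines to continue

Return

Return the value provided by the block on the first successful attempt.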

Raises:

  • (ArgumentError)


# File 'lib/right_support/net/request_balancer.rb', line 305

def request
  raise ArgumentError, "Must call this method with a block" unless block_given?
  @synchronize.call do
    resolve if need_resolve?
  end

  exceptions = {}
  result     = nil
  complete   = false
  n          = 0

  loop do
    if n > 0
      retry_result = nil
      @synchronize.call do
        retry_result = @retry.call((@ips.nil? || @ips.empty?) ? @endpoints : @ips, n)
      end

      # FIX: this integer result logic is odd but is left for legacy support
      # reasons. technically any retry proc could return integer and invoke
      # this odd side-effect, which was only intended to support :retry as
      # a literal integer. retry proc implementations should now only return
      # boolean to avoid this weirdness. this logic should be removed in v3.
      if retry_result.is_a?(Integer) && n >= retry_result
        retry_result = false
      end
      break unless retry_result
    end

    endpoint = nil
    need_health_check = false
    @synchronize.call do
      endpoint, need_health_check = @policy.next
    end
    break unless endpoint

    n += 1
    t0 = Time.now

    # Perform health check if necessary. Note that we guard this with a rescue, because the
    # health check may raise an exception and we want to log the exception info if this happens.
    if need_health_check
      hc_result = false
      hc_exception = nil
      @synchronize.call do
        begin
          # note that health-check can update the policy's good/bad state
          # for endpoints.
          hc_result = @policy.health_check(endpoint)
        rescue Exception => e
          hc_exception = e
        end
      end
      if hc_result
        logger.info "RequestBalancer: health check succeeded to #{endpoint}"
      elsif hc_exception
        logger.error "RequestBalancer: health check failed to #{endpoint} because of #{hc_exception.class.name}: #{hc_exception.message}"
        if fatal_exception?(hc_exception)
          # Fatal exceptions should still raise, even if only during a health check
          raise hc_exception
        else
          # Nonfatal exceptions: keep on truckin'
          exceptions[endpoint] ||= []
          exceptions[endpoint] << hc_exception
          debug_exception(hc_exception) if @debug_mode
          next
        end
      else
        logger.error "RequestBalancer: health check failed to #{endpoint} because of non-true return value"
        next
      end
    end

    begin
      result = yield(endpoint)
      @synchronize.call do
        @policy.good(endpoint, t0, Time.now)
      end
      complete = true
      break
    rescue Exception => e
      if to_raise = handle_exception(endpoint, e, t0)
        raise(to_raise)
      else
        @synchronize.call do
          @policy.bad(endpoint, t0, Time.now)
        end
        exceptions[endpoint] ||= []
        exceptions[endpoint] << e
        debug_exception(e) if @debug_mode
      end
    end
  end # loop

  return result if complete

  # Produce a summary message for the exception that gives a bit of detail
  msg = []
  stats = get_stats
  exceptions.each_pair do |endpoint, list|
    summary = []
    list.each do |e|
      if e.message.to_s.empty?
        summary << e.class.name
      else
        message_top = e.message.to_s.lines.first.chomp
        if message_top.length > 128
          message_top = message_top[0, 124] + ' ...'
        end
        summary << "#{e.class.name}: #{message_top}"
      end
    end
    health = stats[endpoint] if stats[endpoint] != 'n/a'
    if hostname = lookup_hostname(endpoint)
      msg << "'#{hostname}' (#{endpoint}#{", "+health if health}) => [#{summary.uniq.join(', ')}]"
    else
      msg << "'#{endpoint}' #{"("+health+")" if health} => [#{summary.uniq.join(', ')}]"
    end
  end
  message = "Request failed after #{n} tries to #{exceptions.size} endpoints: (#{msg.join(', ')})"
  logger.error "RequestBalancer: #{message}"
  raise NoResult.new(message, exceptions)
end

#resolved_endpoints ⇒ Array

Return the actual, potentially DNS-resolved endpoints that are used for requests. If the balancer was constructed with :resolve=>nil, return self.endpoints.

Returns:

  • (Array)

    collection of endpoints



# File 'lib/right_support/net/request_balancer.rb', line 141

def resolved_endpoints
  @synchronize.call do
    (@ips.nil? || @ips.empty?) ? @endpoints : @ips
  end
end