Class: Karafka::Instrumentation::Vendors::Kubernetes::LivenessListener

Inherits:
BaseListener
  • Object
Defined in:
lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb

Overview

Note:

This listener binds itself only when Karafka actually attempts to start and moves from initializing to running. Before that, the TCP server is NOT active. This is deliberate: it mitigates the case where users subscribe this listener in `karafka.rb` without following the recommendation to assign it conditionally.

Note:

When Karafka is embedded in a process that runs Puma, you need to select a different port than the one used by Puma itself.

Note:

Please use `Kubernetes::SwarmLivenessListener` when operating in swarm mode.

Kubernetes HTTP listener that not only replies as long as the process is not fully hanging, but also allows defining the maximum time of processing and looping.

Processes like the Karafka server can hang while still being reachable. For example, if something hangs inside the user code, Karafka could stop polling and no new data would be processed, yet the process itself would still be active. This listener allows defining a TTL that gets bumped on each poll loop and before and after the processing of a given batch of messages.
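To make the idea of an HTTP liveness endpoint concrete, here is a minimal sketch built only on Ruby's standard `socket` library. It is not Karafka's implementation: the class name `TinyLivenessServer`, the `serve_once` method, and the mapping of a `nil` hostname to `0.0.0.0` are assumptions made for illustration. It answers a single request with 204 when a health check passes and 500 otherwise, which is the status pair this page mentions for the listener.

```ruby
require 'socket'

# Hypothetical stand-in for illustration only; not Karafka's implementation.
class TinyLivenessServer
  attr_reader :port

  # @param hostname [String, nil] nil is mapped to 0.0.0.0 here (an assumption)
  # @param port [Integer] 0 lets the operating system pick a free port
  # @param check [Proc] block returning true when the process counts as healthy
  def initialize(hostname: nil, port: 0, &check)
    @server = TCPServer.new(hostname || '0.0.0.0', port)
    @port = @server.addr[1]
    @check = check
  end

  # Serves exactly one request and returns the status that was sent
  def serve_once
    client = @server.accept

    # Consume the request line and headers up to the blank separator line
    while (line = client.gets)
      break if line == "\r\n"
    end

    status = @check.call ? '204 No Content' : '500 Internal Server Error'
    client.write("HTTP/1.1 #{status}\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")
    status
  ensure
    client&.close
  end

  def stop
    @server.close
  end
end
```

A Kubernetes HTTP liveness probe treats a 2xx response as success and a 5xx response as failure, so a check that starts returning false is enough to get the container restarted.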

Constructor Details

#initialize(hostname: nil, port: 3000, consuming_ttl: 5 * 60 * 1_000, polling_ttl: 5 * 60 * 1_000) ⇒ LivenessListener

Note:

The default TTL matches the default `max.poll.interval.ms`.

Returns a new instance of LivenessListener.

Parameters:

  • hostname (String, nil) (defaults to: nil)

    hostname to bind to, or nil to bind on all interfaces

  • port (Integer) (defaults to: 3000)

    TCP port on which we want to run our HTTP status server

  • consuming_ttl (Integer) (defaults to: 5 * 60 * 1_000)

    time in ms after which we consider consumption to be hanging. It defines the maximum consumption time after which k8s should consider a given process as hanging

  • polling_ttl (Integer) (defaults to: 5 * 60 * 1_000)

    maximum time in ms between polls. If no polling (of any kind) happens within this time, the process should be considered dead.



# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 50

def initialize(
  hostname: nil,
  port: 3000,
  consuming_ttl: 5 * 60 * 1_000,
  polling_ttl: 5 * 60 * 1_000
)
  # If this is set to true, it indicates unrecoverable error like fencing
  # While fencing can be partial (for one of the SGs), we still should consider this
  # as an undesired state for the whole process because it halts processing in a
  # non-recoverable manner forever
  @unrecoverable = false
  @polling_ttl = polling_ttl
  @consuming_ttl = consuming_ttl
  @mutex = Mutex.new
  @pollings = {}
  @consumptions = {}
  super(hostname: hostname, port: port)
end

Instance Method Details

#healthy? ⇒ Boolean

Checks that no TTL was exceeded and that no unrecoverable error occurred

Returns:

  • (Boolean)

    true if ok (the 204 case), false otherwise (the 500 case)



# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 144

def healthy?
  time = monotonic_now

  return false if @unrecoverable
  return false if @pollings.values.any? { |tick| (time - tick) > @polling_ttl }
  return false if @consumptions.values.any? { |tick| (time - tick) > @consuming_ttl }

  true
end
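The TTL comparison above can be tried in isolation with the following sketch. It is a simplified model, not the listener itself: `TickTracker`, its `mark_*` and `clear_consumption` methods, the caller-supplied ids, and the injectable `clock` are hypothetical names introduced here so that time can be controlled by hand. Unlike the listing above, this sketch also takes the mutex while reading, which is a design choice of the sketch only.

```ruby
# Hypothetical model of the TTL checks; all names are illustrative only.
class TickTracker
  # Monotonic clock in milliseconds, so wall-clock adjustments do not matter
  MONOTONIC_MS = lambda do
    Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond)
  end

  def initialize(polling_ttl:, consuming_ttl:, clock: MONOTONIC_MS)
    @polling_ttl = polling_ttl
    @consuming_ttl = consuming_ttl
    @clock = clock
    @mutex = Mutex.new
    @pollings = {}
    @consumptions = {}
  end

  # Records "polling happened now" for the given id (assumed to identify a worker)
  def mark_polling(id)
    @mutex.synchronize { @pollings[id] = @clock.call }
  end

  # Records "consumption started or is still progressing" for the given id
  def mark_consumption(id)
    @mutex.synchronize { @consumptions[id] = @clock.call }
  end

  # Stops tracking consumption for the given id, for example once a batch is done
  def clear_consumption(id)
    @mutex.synchronize { @consumptions.delete(id) }
  end

  # Healthy only when no tracked tick is older than its TTL
  def healthy?
    now = @clock.call

    @mutex.synchronize do
      @pollings.values.none? { |tick| (now - tick) > @polling_ttl } &&
        @consumptions.values.none? { |tick| (now - tick) > @consuming_ttl }
    end
  end
end
```

Because the comparison is a strict `>`, a tick whose age equals the TTL still counts as healthy, and a tracker with no registered ticks is healthy by definition.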

#on_app_running(_event) ⇒ Object

Parameters:

  • _event (Karafka::Core::Monitoring::Event)


# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 70

def on_app_running(_event)
  start
end

#on_app_stopped(_event) ⇒ Object

Stops the HTTP server when the process stops

Parameters:

  • _event (Karafka::Core::Monitoring::Event)


# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 76

def on_app_stopped(_event)
  stop
end

#on_connection_listener_fetch_loop(_event) ⇒ Object

Tick on each fetch

Parameters:

  • _event (Karafka::Core::Monitoring::Event)


# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 82

def on_connection_listener_fetch_loop(_event)
  mark_polling_tick
end

#on_connection_listener_stopped(_event) ⇒ Object

Deregisters the polling tracker for a given listener

Parameters:

  • _event (Karafka::Core::Monitoring::Event)


# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 136

def on_connection_listener_stopped(_event)
  return if Karafka::App.done?

  clear_polling_tick
end

#on_connection_listener_stopping(_event) ⇒ Object

Deregisters the polling tracker for a given listener

Parameters:

  • _event (Karafka::Core::Monitoring::Event)


# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 124

def on_connection_listener_stopping(_event)
  # We are interested in disabling tracking for given listener only if it was requested
  # when karafka was running. If we would always clear, it would not catch the shutdown
  # polling requirements. The "running" listener shutdown operations happen only when
  # the manager requests it for downscaling.
  return if Karafka::App.done?

  clear_polling_tick
end

#on_error_occurred(event) ⇒ Object

Parameters:

  • event (Karafka::Core::Monitoring::Event)


# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 108

def on_error_occurred(event)
  clear_consumption_tick
  clear_polling_tick

  error = event[:error]

  # We are only interested in the rdkafka errors
  return unless error.is_a?(Rdkafka::RdkafkaError)
  # We mark as unrecoverable only on certain errors that will not be fixed by retrying
  return unless UNRECOVERABLE_RDKAFKA_ERRORS.include?(error.code)

  @unrecoverable = true
end