Class: Milemarker

Inherits:
Object
Defined in:
lib/milemarker.rb,
lib/milemarker/version.rb,
lib/milemarker/structured.rb

Overview

Milemarker class: keeps track of progress over time for long-running iterating processes.

Direct Known Subclasses

Structured

Defined Under Namespace

Classes: Structured

Constant Summary

VERSION =
"1.0.0"

Constructor Details

#initialize(batch_size: 1000, name: nil, logger: nil) ⇒ Milemarker

Create a new milemarker tracker, with an optional name and logger

Parameters:

  • batch_size (Integer) (defaults to: 1000)

    Number of increments between calls to the on_batch block

  • name (String) (defaults to: nil)

    Optional “name” for this milemarker, included in the generated log lines

  • logger (Logger, #info, #warn) (defaults to: nil)

    Optional logger that responds to the normal #info, #warn, etc.



# File 'lib/milemarker.rb', line 52

def initialize(batch_size: 1000, name: nil, logger: nil)
  @batch_size = batch_size
  @name       = name
  @logger     = logger

  @batch_number = 0
  @last_batch_size    = 0
  @last_batch_seconds = 0

  @start_time       = Time.now
  @batch_start_time = @start_time
  @batch_end_time   = @start_time

  @count      = 0
  @prev_count = 0
end

Instance Attribute Details

#batch_end_time ⇒ Time (readonly)

Returns Time the last batch ended processing.

Returns:

  • (Time)

    Time the last batch ended processing



# File 'lib/milemarker.rb', line 39

def batch_end_time
  @batch_end_time
end

#batch_number ⇒ Integer (readonly)

Returns which batch number (total increment / batch_size).

Returns:

  • (Integer)

    which batch number (total increment / batch_size)



# File 'lib/milemarker.rb', line 24

def batch_number
  @batch_number
end

#batch_size ⇒ Integer

Returns batch size for computing `on_batch` calls.

Returns:

  • (Integer)

    batch size for computing `on_batch` calls



# File 'lib/milemarker.rb', line 18

def batch_size
  @batch_size
end

#batch_start_time ⇒ Time (readonly)

Returns Time the last batch started processing.

Returns:

  • (Time)

    Time the last batch started processing



# File 'lib/milemarker.rb', line 36

def batch_start_time
  @batch_start_time
end

#count ⇒ Integer (readonly)

Returns Total records (really, increments) for the full run.

Returns:

  • (Integer)

    Total records (really, increments) for the full run



# File 'lib/milemarker.rb', line 42

def count
  @count
end

#last_batch_seconds ⇒ Integer (readonly)

Returns number of seconds to process the last batch.

Returns:

  • (Integer)

    number of seconds to process the last batch



# File 'lib/milemarker.rb', line 27

def last_batch_seconds
  @last_batch_seconds
end

#last_batch_size ⇒ Integer (readonly)

Returns number of records (really, number of increments) in the last batch.

Returns:

  • (Integer)

    number of records (really, number of increments) in the last batch



# File 'lib/milemarker.rb', line 30

def last_batch_size
  @last_batch_size
end

#logger ⇒ Logger, #info

Returns logging object for automatic logging methods.

Returns:

  • (Logger, #info)

    logging object for automatic logging methods



# File 'lib/milemarker.rb', line 21

def logger
  @logger
end

#name ⇒ String

Returns optional “name” of this milemarker, for logging purposes.

Returns:

  • (String)

    optional “name” of this milemarker, for logging purposes



# File 'lib/milemarker.rb', line 15

def name
  @name
end

#prev_count ⇒ Integer (readonly)

Returns Total count at the time of the last on_batch call. Used to figure out how many records were in the final batch.

Returns:

  • (Integer)

    Total count at the time of the last on_batch call. Used to figure out how many records were in the final batch



# File 'lib/milemarker.rb', line 46

def prev_count
  @prev_count
end

#start_time ⇒ Time (readonly)

Returns Time the full process started.

Returns:

  • (Time)

    Time the full process started



# File 'lib/milemarker.rb', line 33

def start_time
  @start_time
end

Instance Method Details

#_increment_and_on_batch(&blk) ⇒ Object Also known as: increment_and_on_batch

Single call to increment and run (if needed) the on_batch block



# File 'lib/milemarker.rb', line 107

def _increment_and_on_batch(&blk)
  incr.on_batch(&blk)
end

#batch_line ⇒ String

A line describing the batch suitable for logging, of the form

load records.ndj   8_000_000. This batch 2_000_000 in 26.2s (76_469 r/s). Overall 72_705 r/s.

Returns:

  • (String)

    The batch log line



# File 'lib/milemarker.rb', line 142

def batch_line
  # rubocop:disable Layout/LineLength
  "#{name} #{ppnum(count, 10)}. This batch #{ppnum(last_batch_size, 5)} in #{ppnum(last_batch_seconds, 4, 1)}s (#{batch_rate_str} r/s). Overall #{total_rate_str} r/s."
  # rubocop:enable Layout/LineLength
end

#batch_rate ⇒ Float

Returns rate of the last batch (in recs/second).

Returns:

  • (Float)

    rate of the last batch (in recs/second)



# File 'lib/milemarker.rb', line 169

def batch_rate
  return 0.0 if count.zero?

  last_batch_size.to_f / last_batch_seconds
end
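
As a worked illustration of the division above, the following standalone sketch reproduces the same arithmetic outside the class. The method name `rate_per_second` and the sample figures are assumptions made for this example; they are not part of Milemarker.

```ruby
# Hypothetical standalone helper mirroring the arithmetic in #batch_rate.
# `total` plays the role of count; `size` and `seconds` play the roles of
# last_batch_size and last_batch_seconds.
def rate_per_second(total, size, seconds)
  return 0.0 if total.zero? # same guard as the source: nothing counted yet

  size.to_f / seconds # to_f forces floating-point division
end

rate_per_second(0, 0, 0)         # 0.0, the guard returns before dividing
rate_per_second(1_000, 1_000, 2) # 500.0 records per second
```

The `to_f` call matters because both operands may be integers, in which case Ruby would otherwise truncate the quotient.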

#batch_rate_str(decimals = 0) ⇒ String

Returns Rate-per-second in form XXX.YY.

Parameters:

  • decimals (Integer) (defaults to: 0)

    Number of decimal places to the right of the decimal point

Returns:

  • (String)

    Rate-per-second in form XXX.YY



# File 'lib/milemarker.rb', line 178

def batch_rate_str(decimals = 0)
  ppnum(batch_rate, 0, decimals)
end

#batch_seconds_so_far ⇒ Float

Total seconds since this batch started

Returns:

  • (Float)

    seconds since the beginning of this batch



# File 'lib/milemarker.rb', line 204

def batch_seconds_so_far
  Time.now - batch_start_time
end

#create_logger!(*args, **kwargs) ⇒ Milemarker

Create a logger for use in logging milemarker information

Examples:

mm.create_logger!(STDOUT)

Returns:

  • (Milemarker)

    self



# File 'lib/milemarker.rb', line 92

def create_logger!(*args, **kwargs)
  @logger = Logger.new(*args, **kwargs)
  self
end
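
Because every argument is forwarded to `Logger.new`, anything the standard library logger accepts should work here. The sketch below shows the forwarding pattern with a stand-in class; `LoggerHolder` and `logged_output` are hypothetical names invented for this example, and the `progname` and `level` values are arbitrary.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in class, NOT Milemarker itself. Only the forwarding
# pattern in create_logger! is taken from the source above.
class LoggerHolder
  attr_reader :logger

  # Pass all positional and keyword arguments straight to Logger.new,
  # then return self so the call can be chained.
  def create_logger!(*args, **kwargs)
    @logger = Logger.new(*args, **kwargs)
    self
  end
end

# Hypothetical helper: log two messages into a StringIO and return the text.
def logged_output
  io = StringIO.new
  holder = LoggerHolder.new.create_logger!(io, progname: 'loader', level: Logger::INFO)
  holder.logger.info('first batch done')
  holder.logger.debug('suppressed at INFO level')
  io.string
end
```

Returning `self` is what makes the chained `LoggerHolder.new.create_logger!(...)` call possible.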

#final_batch_size ⇒ Integer Also known as: batch_count_so_far

Returns how many increments there have been since the last on_batch call. This is most useful for counting the items in the final (usually incomplete) batch. Because Milemarker can’t tell when you’re done processing, you can call this at any time to get the number of items processed since the last on_batch call.

Returns:

  • (Integer)

    Number of items processed in the final batch



# File 'lib/milemarker.rb', line 153

def final_batch_size
  count - prev_count
end
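
To make the subtraction concrete, here is a small arithmetic sketch. It assumes on_batch is checked after every single increment, so that prev_count always sits on an exact multiple of the batch size; `leftover_after_batches` is a hypothetical helper, not part of the library.

```ruby
# Hypothetical helper illustrating count - prev_count.
# Assumption: on_batch ran after each increment of 1, so the last recorded
# prev_count is the largest multiple of batch_size not exceeding count.
def leftover_after_batches(count, batch_size)
  prev_count = (count / batch_size) * batch_size # count at the last batch boundary
  count - prev_count                             # same subtraction as #final_batch_size
end

leftover_after_batches(2_350, 1_000) # 350 items in the incomplete final batch
leftover_after_batches(3_000, 1_000) # 0, the run ended exactly on a boundary
```

If increments are larger than 1, or on_batch is not called after each one, prev_count need not be a multiple of the batch size and this shortcut no longer applies.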

#final_line ⇒ String

A line describing the entire run, suitable for logging, of the form

load records.ndj FINISHED. 27_138_118 total records in 00h 12m 39s. Overall 35_718 r/s.

Returns:

  • (String)

    The full log line



# File 'lib/milemarker.rb', line 162

def final_line
  # rubocop:disable Layout/LineLength
  "#{name} FINISHED. #{ppnum(count, 10)} total records in #{seconds_to_time_string(total_seconds_so_far)}. Overall #{total_rate_str} r/s."
  # rubocop:enable Layout/LineLength
end
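
The source above calls `ppnum` and `seconds_to_time_string`, whose bodies are not shown on this page. The sketch below is one possible way to produce strings in the same style as the sample line; `underscored` and `hms` are hypothetical helpers written for this example and may differ from what the library actually does.

```ruby
# Hypothetical helper: group the digits of a non-negative integer in threes,
# separated by underscores.
def underscored(number)
  number.to_i.to_s.reverse.scan(/\d{1,3}/).join('_').reverse
end

# Hypothetical helper: render a duration in whole seconds as "HHh MMm SSs".
def hms(seconds)
  total = seconds.to_i
  format('%02dh %02dm %02ds', total / 3600, (total % 3600) / 60, total % 60)
end

underscored(27_138_118) # "27_138_118"
hms(759)                # "00h 12m 39s"
```

Reversing the digit string before scanning lets the regular expression take groups of three from the right-hand side, which is where digit grouping starts.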

#incr(increase = 1) ⇒ Milemarker Also known as: increment

Increment the counter, i.e. the number of records processed so far, by the given amount.

Returns:

  • (Milemarker)

    self



# File 'lib/milemarker.rb', line 82

def incr(increase = 1)
  @count += increase
  self
end

#increment_and_log_batch_line(level: :info) ⇒ Object

Convenience method, exactly the same as the common idiom

`mm.incr; mm.on_batch {|mm| log.info mm.batch_line}`

Parameters:

  • level (Symbol) (defaults to: :info)

    The level to log at



# File 'lib/milemarker.rb', line 123

def increment_and_log_batch_line(level: :info)
  increment_and_on_batch { log_batch_line(level: level) }
end

#log(msg, level: :info) ⇒ Object

Log a line using the internal logger. Do nothing if no logger is configured.

Parameters:

  • msg (String)

    The message to log

  • level (Symbol) (defaults to: :info)

    The level to log at



# File 'lib/milemarker.rb', line 229

def log(msg, level: :info)
  logger&.send(level, msg)
end
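
The "do nothing if no logger is configured" behaviour comes from the safe navigation operator. A free-standing sketch of that pattern follows; `log_with` is a hypothetical name for this example, and the logger is passed in explicitly rather than read from an instance variable.

```ruby
require 'logger'
require 'stringio'

# Hypothetical free-standing version of the pattern used by #log.
def log_with(logger, msg, level: :info)
  # &. short-circuits to nil when logger is nil, so nothing is sent
  logger&.send(level, msg)
end

log_with(nil, 'ignored') # returns nil and writes nothing

io = StringIO.new
log_with(Logger.new(io), 'batch finished', level: :warn)
io.string # contains a WARN line ending in "batch finished"
```

Because the level is dispatched by name, it has to be a method the logger responds to, such as :debug, :info, :warn, or :error.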

#log_batch_line(level: :info) ⇒ Object

Log the batch line, as described in #batch_line

Parameters:

  • level (Symbol) (defaults to: :info)

    The level to log at



# File 'lib/milemarker.rb', line 129

def log_batch_line(level: :info)
  log(batch_line, level: level)
end

#log_final_line(level: :info) ⇒ Object

Log the final line, as described in #final_line

Parameters:

  • level (Symbol) (defaults to: :info)

    The level to log at



# File 'lib/milemarker.rb', line 135

def log_final_line(level: :info)
  log(final_line, level: level)
end

#on_batch {|Milemarker| ... } ⇒ Object

Run the given block if we’ve exceeded the batch size for the current batch

Yields:

  • (Milemarker)

    self



# File 'lib/milemarker.rb', line 99

def on_batch
  if batch_size_exceeded?
    set_milemarker!
    yield self
  end
end
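
The increment-then-check loop can be sketched with a minimal stand-in class. `TinyMarker` and `batch_boundaries` are hypothetical, and the body of `batch_size_exceeded?` is an assumption, since that method is not shown on this page; the real class also tracks timing, which is omitted here.

```ruby
# Minimal stand-in, NOT the real Milemarker class.
class TinyMarker
  attr_reader :count

  def initialize(batch_size:)
    @batch_size   = batch_size
    @count        = 0
    @batch_number = 0
  end

  # Returns self so calls can be chained, as #incr does
  def incr(increase = 1)
    @count += increase
    self
  end

  # Assumption: a batch is due once count / batch_size passes the stored batch number
  def batch_size_exceeded?
    @count / @batch_size > @batch_number
  end

  # Yield self only when a batch boundary has been crossed
  def on_batch
    return unless batch_size_exceeded?

    @batch_number = @count / @batch_size
    yield self
  end
end

# Hypothetical driver: collect the counts at which the block fired.
def batch_boundaries(total, batch_size)
  fired = []
  mm = TinyMarker.new(batch_size: batch_size)
  total.times { mm.incr.on_batch { |m| fired << m.count } }
  fired
end

batch_boundaries(10, 3) # [3, 6, 9]
```

The tenth increment does not trigger the block, which is why a separate call is needed to report the final, incomplete batch.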

#reset_for_next_batch! ⇒ Object

Reset the internal counters/timers at the end of a batch. Taken care of by #on_batch; should probably not be called manually.



# File 'lib/milemarker.rb', line 220

def reset_for_next_batch!
  @batch_start_time  = batch_end_time
  @prev_count        = count
  @batch_number = batch_divisor
end

#set_milemarker! ⇒ Object

Set/reset all the internal state. Called by #on_batch when necessary; should probably not be called manually



# File 'lib/milemarker.rb', line 210

def set_milemarker!
  @batch_end_time     = Time.now
  @last_batch_size    = @count - @prev_count
  @last_batch_seconds = @batch_end_time - @batch_start_time

  reset_for_next_batch!
end

#threadsafe_increment_and_on_batch(&blk) ⇒ Object

Threadsafe version of #increment_and_on_batch, doing the whole thing as a single atomic action. It synchronizes on the mutex created by #threadsafify!, so call that method first.



# File 'lib/milemarker.rb', line 114

def threadsafe_increment_and_on_batch(&blk)
  @mutex.synchronize do
    _increment_and_on_batch(&blk)
  end
end

#threadsafify! ⇒ Milemarker

Turn `increment_and_on_batch` (and thus `increment_and_log_batch_line`) into a threadsafe version

Returns:

  • (Milemarker)

    self



# File 'lib/milemarker.rb', line 72

def threadsafify!
  @mutex = Mutex.new
  define_singleton_method(:increment_and_on_batch) do |&blk|
    threadsafe_increment_and_on_batch(&blk)
  end
  self
end
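
The technique here is to create a mutex and then shadow a public method on this one instance with a singleton method that wraps the original in `synchronize`. The sketch below applies the same pattern to a deliberately tiny stand-in; `SharedCounter`, `bump`, and `count_with_threads` are hypothetical names for this example.

```ruby
# Minimal stand-in, NOT the real Milemarker class.
class SharedCounter
  attr_reader :count

  def initialize
    @count = 0
  end

  # Unsynchronized read-modify-write
  def _bump
    @count += 1
    self
  end
  alias bump _bump

  # Replace bump on this instance only with a mutex-guarded version
  def threadsafify!
    @mutex = Mutex.new
    define_singleton_method(:bump) do
      @mutex.synchronize { _bump }
    end
    self
  end
end

# Hypothetical driver: hammer one counter from several threads.
def count_with_threads(threads:, per_thread:)
  counter = SharedCounter.new.threadsafify!
  workers = Array.new(threads) { Thread.new { per_thread.times { counter.bump } } }
  workers.each(&:join)
  counter.count
end

count_with_threads(threads: 4, per_thread: 1_000) # 4000
```

Other instances of the class are unaffected, so the locking cost is paid only where it has been requested.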

#total_rate ⇒ Float

Returns total rate so far (in rec/second).

Returns:

  • (Float)

    total rate so far (in rec/second)



# File 'lib/milemarker.rb', line 183

def total_rate
  return 0.0 if @count.zero?

  count / total_seconds_so_far
end

#total_rate_str(decimals = 0) ⇒ String

Returns Rate-per-second in form XXX.YY.

Parameters:

  • decimals (Integer) (defaults to: 0)

    Number of decimal places to the right of the decimal point

Returns:

  • (String)

    Rate-per-second in form XXX.YY



# File 'lib/milemarker.rb', line 192

def total_rate_str(decimals = 0)
  ppnum(total_rate, 0, decimals)
end

#total_seconds_so_far ⇒ Float

Total seconds since the beginning of this milemarker

Returns:

  • (Float)

    seconds since the milemarker was created



# File 'lib/milemarker.rb', line 198

def total_seconds_so_far
  Time.now - start_time
end