Class: Gitlab::Database::PostgresHll::Buckets

Inherits:
Object
Defined in:
lib/gitlab/database/postgres_hll/buckets.rb

Overview

Note:

HyperLogLog is a PROBABILISTIC algorithm that ESTIMATES the distinct count of a given attribute's values for the supplied relation. Like all probabilistic algorithms, it has an ERROR RATE that can affect the result. For this implementation, no error higher than 5.3% was reported (gitlab.com/gitlab-org/gitlab/-/merge_requests/45673#accuracy-estimation), and in most cases the error is lower. However, if an exact value is necessary, other tools have to be used.

Constant Summary

TOTAL_BUCKETS =
512

Instance Method Summary

Constructor Details

#initialize(buckets = {}) ⇒ Buckets

Returns a new instance of Buckets.



# File 'lib/gitlab/database/postgres_hll/buckets.rb', line 26

def initialize(buckets = {})
  @buckets = buckets
end

Instance Method Details

#estimated_distinct_count ⇒ Float

Estimates the number of unique elements in the analysed set, based on the HyperLogLog structure.

Returns:

  • (Float)

    Estimated number of unique elements



# File 'lib/gitlab/database/postgres_hll/buckets.rb', line 33

def estimated_distinct_count
  @estimated_distinct_count ||= estimate_cardinality
end
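
The body of estimate_cardinality is not shown on this page. As a rough illustration of how a HyperLogLog estimate can be derived from a bucket hash, the following is a minimal sketch of the textbook estimator with a small-range (linear counting) correction. The method name hll_estimate, its keyword parameter, and the sample hashes are hypothetical and are not part of this class; the actual implementation may differ in its constants and corrections.

```ruby
# Hypothetical helper, NOT the class's estimate_cardinality.
# buckets: Hash of bucket index => highest register value seen for that bucket.
# Buckets missing from the hash are treated as empty (register value 0).
def hll_estimate(buckets, total_buckets: 512)
  # Standard bias-correction constant for 128 or more buckets.
  alpha = 0.7213 / (1 + 1.079 / total_buckets)

  zero_buckets = total_buckets - buckets.size

  # Each empty bucket contributes 2**0 = 1 to the harmonic sum.
  harmonic_sum = zero_buckets + buckets.values.sum { |value| 2.0**(-value) }

  raw = alpha * total_buckets**2 / harmonic_sum

  if zero_buckets > 0 && raw <= 2.5 * total_buckets
    # Small-range correction: linear counting on the share of empty buckets.
    total_buckets * Math.log(total_buckets.to_f / zero_buckets)
  else
    raw
  end
end

empty_estimate  = hll_estimate({})
single_estimate = hll_estimate({ 0 => 1 })
```

With no filled buckets the sketch returns 0.0, and with a single filled bucket it returns a value close to 1.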

#merge_hash!(other_buckets_hash) ⇒ Object

Updates the instance's underlying HyperLogLog structure by merging it with another HyperLogLog structure, keeping the larger value for each bucket.

Parameters:

  • other_buckets_hash

    Hash representing a HyperLogLog structure



# File 'lib/gitlab/database/postgres_hll/buckets.rb', line 40

def merge_hash!(other_buckets_hash)
  buckets.merge!(other_buckets_hash) { |_key, old, new| new > old ? new : old }
end
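
To illustrate the merge rule used above, here is a small sketch that applies the same block to two plain hashes. The sample hashes are hypothetical; the keys stand for bucket indices and the values for the highest register value observed per bucket.

```ruby
# Hypothetical bucket hashes, for illustration only.
left  = { 0 => 3, 1 => 1, 7 => 2 }
right = { 1 => 4, 7 => 1, 9 => 5 }

# Same block as in merge_hash!: when both hashes contain a bucket,
# keep the larger value. Buckets present in only one hash are copied as-is.
merged = left.merge(right) { |_key, old, new| new > old ? new : old }
```

Here merged is { 0 => 3, 1 => 4, 7 => 2, 9 => 5 }. Because taking the maximum is commutative and associative, partial structures can be merged in any order and yield the same result.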

#to_json(_ = nil) ⇒ String

Serializes the instance's underlying HyperLogLog structure to JSON, which can be stored in various persistence layers.

Returns:

  • (String)

    HyperLogLog data structure serialized to JSON



# File 'lib/gitlab/database/postgres_hll/buckets.rb', line 47

def to_json(_ = nil)
  buckets.to_json
end
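
As a sketch of what a serialization round trip can look like for a plain bucket hash, consider the following. The sample hash is hypothetical, and it assumes integer bucket keys; this page does not show how the class itself restores a structure from stored JSON.

```ruby
require 'json'

# Hypothetical bucket hash with integer keys, for illustration only.
buckets = { 0 => 3, 9 => 5 }

json = buckets.to_json

# JSON object keys are always strings, so integer keys have to be
# converted back after parsing if the caller relies on them.
restored = JSON.parse(json).transform_keys(&:to_i)
```

In this example json is the string {"0":3,"9":5} and restored equals the original hash.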