Module: ElasticsearchRecord::Relation::CalculationMethods

Defined in:
lib/elasticsearch_record/relation/calculation_methods.rb


Instance Method Details

#average(column_name) ⇒ Object

Calculates the average value on a given column. Returns +nil+ if there's no row. See #calculate for the underlying aggregation options.

Person.all.average(:age) # => 35.8



# File 'lib/elasticsearch_record/relation/calculation_methods.rb', line 193

def average(column_name)
  calculate(:avg, column_name, node: :value)
end
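You can trace the path from #average to #calculate without a cluster. The sketch below is hypothetical scaffolding and not part of ElasticsearchRecord. FakeRelation stubs #aggregate so that it records the request and returns a canned result. It also copies the #calculate logic, using +nil?+ in place of ActiveSupport's +present?+. The canned result is keyed the same way #calculate indexes it: first by the aggregation name, then by the node symbol.

```ruby
# Hypothetical test scaffolding; not part of ElasticsearchRecord.
Result = Struct.new(:aggregations)

class FakeRelation
  attr_reader :last_request

  def initialize(canned_aggregations)
    @canned_aggregations = canned_aggregations
  end

  # Stub for #aggregate: remembers the definition that would be sent and
  # returns the canned aggregations instead of querying Elasticsearch.
  def aggregate(name, definition)
    @last_request = { name => definition }
    Result.new(@canned_aggregations)
  end

  # Same logic as #calculate, with nil? instead of ActiveSupport's present?.
  def calculate(metric, *columns, opts: {}, node: nil)
    metric_key = "calculate_#{metric}"
    body = columns.size == 1 ? { field: columns[0] } : { fields: columns }
    response = aggregate(metric_key, { metric => body.merge(opts) }).aggregations
    node.nil? ? response[metric_key] : response[metric_key][node]
  end

  def average(column_name)
    calculate(:avg, column_name, node: :value)
  end
end

relation = FakeRelation.new({ "calculate_avg" => { value: 35.8 } })
puts relation.average(:age) # prints 35.8
```

After the call, relation.last_request holds the avg definition that would have been sent for the age column.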

#boxplot(column_name) ⇒ Object

A boxplot metrics aggregation that computes a boxplot of numeric values extracted from the aggregated documents. These values can be generated from specific numeric or histogram fields in the documents.

The boxplot aggregation returns essential information for making a box plot: minimum, maximum, median, first quartile (25th percentile) and third quartile (75th percentile) values.

Person.all.boxplot(:age)
# => { "min": 0.0, "max": 990.0, "q1": 167.5, "q2": 445.0, "q3": 722.5, "lower": 0.0, "upper": 990.0 }



# File 'lib/elasticsearch_record/relation/calculation_methods.rb', line 68

def boxplot(column_name)
  calculate(:boxplot, column_name)
end
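The sketch below makes the returned fields concrete by computing an exact five-number summary in plain Ruby, with linear interpolation between ranks. Treat it only as a local reference. Elasticsearch approximates the quartiles (it uses TDigest), so its q1/q2/q3 may differ slightly, and the lower/upper whisker values are not reproduced here. The input values are illustrative.

```ruby
# Linear interpolation between the two closest ranks of a sorted array.
def quantile(sorted, fraction)
  pos = fraction * (sorted.size - 1)
  lo = pos.floor
  hi = pos.ceil
  sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo)
end

# Exact min, quartiles and max of a list of numbers.
def five_number_summary(values)
  raise ArgumentError, "no values" if values.empty?

  sorted = values.map(&:to_f).sort
  {
    "min" => sorted.first,
    "q1" => quantile(sorted, 0.25),
    "q2" => quantile(sorted, 0.5),
    "q3" => quantile(sorted, 0.75),
    "max" => sorted.last
  }
end

summary = five_number_summary([7, 1, 5, 3, 9])
puts summary["q2"] # prints 5.0
```

For [7, 1, 5, 3, 9] this yields min 1.0, q1 3.0, q2 5.0, q3 7.0 and max 9.0.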

#calculate(metric, *columns, opts: {}, node: nil) ⇒ Object

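Creates an aggregation with the provided metric (e.g. :sum) over the given columns: a single column is passed as +field+, several columns as +fields+. Returns the requested node (e.g. :value) of the aggregation result, or the whole metric result if no node is given.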

Parameters:

  • metric (Symbol, String)
  • columns (Array<Symbol|String>)
  • opts (Hash) (defaults to: {})
    • additional arguments that get merged with the metric definition
  • node (Symbol) (defaults to: nil)
    • the node of the metric result to return (e.g. :value); the whole metric result is returned when nil



# File 'lib/elasticsearch_record/relation/calculation_methods.rb', line 259

def calculate(metric, *columns, opts: {}, node: nil)
  metric_key = "calculate_#{metric}"

  # spawn a new aggregation and return the aggs
  response = if columns.size == 1
               aggregate(metric_key, { metric => { field: columns[0] }.merge(opts) }).aggregations
             else
               aggregate(metric_key, { metric => { fields: columns }.merge(opts) }).aggregations
             end

  if node.present?
    response[metric_key][node]
  else
    response[metric_key]
  end
end
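You can reproduce the request shape that #calculate builds with plain hashes. The helpers metric_definition and pick are hypothetical and mirror the code above. A single column becomes +field+ and several columns become +fields+. Any +opts+ are merged into the metric definition, and +node+ selects one entry of the result. The income field and the percents values are illustrative.

```ruby
# Hypothetical helpers mirroring #calculate; they only build and read hashes.
def metric_definition(metric, *columns, opts: {})
  body = columns.size == 1 ? { field: columns[0] } : { fields: columns }
  { "calculate_#{metric}" => { metric => body.merge(opts) } }
end

# Returns one node of the metric result, or the whole result when node is nil.
def pick(aggregations, metric, node = nil)
  result = aggregations["calculate_#{metric}"]
  node.nil? ? result : result[node]
end

single = metric_definition(:sum, :age)                     # uses `field`
multi  = metric_definition(:matrix_stats, :age, :income)   # uses `fields`
tuned  = metric_definition(:percentiles, :year, opts: { percents: [50, 99] })

canned = { "calculate_sum" => { value: 4562.0 } }
puts pick(canned, :sum, :value) # prints 4562.0
```

Through the relation API documented here, the last definition corresponds to Person.all.calculate(:percentiles, :year, opts: { percents: [50, 99] }, node: :values).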

#cardinality(column_name) ⇒ Object

Calculates the (approximate) number of distinct values on a given column. Returns +0+ if there's no row.

Person.all.cardinality(:age) # => 12



# File 'lib/elasticsearch_record/relation/calculation_methods.rb', line 182

def cardinality(column_name)
  calculate(:cardinality, column_name, node: :value)
end
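The Elasticsearch cardinality aggregation is approximate (it is based on HyperLogLog++), so on large data sets its result can differ from an exact distinct count. The sketch below computes the exact value in plain Ruby for comparison. It also builds the metric definition that #calculate would send with the cardinality option precision_threshold merged in through +opts+. The ages and the threshold of 1000 are illustrative.

```ruby
# Illustrative data standing in for the indexed ages.
ages = [31, 42, 31, 27, 42, 55]

# Exact number of distinct values; #cardinality approximates this.
exact_distinct = ages.uniq.size
puts exact_distinct # prints 4

# Metric definition as #calculate builds it for one column, with
# precision_threshold (a cardinality aggregation option) merged in via opts.
definition = { cardinality: { field: :age }.merge(precision_threshold: 1000) }
```

Through the relation, that definition corresponds to Person.all.calculate(:cardinality, :age, opts: { precision_threshold: 1000 }, node: :value).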

#count(column_name = nil) ⇒ Object

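Count the records.

Person.all.count # => the total count of all people

Person.all.count(:age) # => the count of all people whose age field exists in the index

If the relation is distinct and a column is given, the distinct values are counted via #cardinality. Relations with group or select values are delegated to #composite instead.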



# File 'lib/elasticsearch_record/relation/calculation_methods.rb', line 11

def count(column_name = nil)
  # fallback to default
  return super() if block_given?

  # check for already failed query
  return 0 if null_relation?

  # reset column_name, if +:all+ was provided ...
  column_name = nil if column_name == :all

  # check for combined cases
  if self.distinct_value && column_name
    self.cardinality(column_name)
  elsif column_name
    where(:filter, { exists: { field: column_name } }).count
  elsif self.group_values.any?
    self.composite(*self.group_values)
  elsif self.select_values.any?
    self.composite(*self.select_values)
  elsif limit_value == 0 # Shortcut when limit is zero.
    return 0
  elsif limit_value
    # since total will be limited to 10000 results, we need to resolve the real values by a custom query.
    # This query is called through +#select_count+.
    #
    # HINT: +:__query__+ directly interacts with the query-object and sets the 'terminate_after' argument
    # see @ ElasticsearchRecord::Query#arguments & Arel::Collectors::ElasticsearchQuery#assign
    arel = spawn.unscope!(:offset, :limit, :order, :configure, :aggs).configure!(:__query__, argument: { terminate_after: limit_value }).arel
    klass.connection.select_count(arel, "#{klass.name} Count")
  else
    # since total will be limited to 10000 results, we need to resolve the real values by a custom query.
    # This query is called through +#select_count+.
    arel = spawn.unscope!(:offset, :limit, :order, :configure, :aggs)
    klass.connection.select_count(arel, "#{klass.name} Count")
  end
end
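The branch order of #count decides which query runs. count_strategy below is a hypothetical, self-contained mirror of that order. It takes the relation state as keyword arguments and returns a label for the path #count would take, without querying Elasticsearch. The block form, which falls back to super, is left out.

```ruby
# Hypothetical mirror of #count's decision order; returns a label only.
def count_strategy(column_name = nil, distinct: false, group_values: [],
                   select_values: [], limit_value: nil, null_relation: false)
  return :zero if null_relation

  # :all means "no particular column", as in #count.
  column_name = nil if column_name == :all

  if distinct && column_name
    :cardinality
  elsif column_name
    :exists_filter_count
  elsif group_values.any?
    :composite_on_group_values
  elsif select_values.any?
    :composite_on_select_values
  elsif limit_value == 0
    :zero
  elsif limit_value
    :select_count_with_terminate_after
  else
    :select_count
  end
end

puts count_strategy(:age, distinct: true) # prints cardinality
```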

#matrix_stats(*column_names) ⇒ Object

The matrix_stats aggregation is a numeric aggregation that computes the following statistics over a set of document fields:

  • count: number of per-field samples included in the calculation.
  • mean: the average value for each field.
  • variance: per-field measurement of how spread out the samples are from the mean.
  • skewness: per-field measurement quantifying the asymmetric distribution around the mean.
  • kurtosis: per-field measurement quantifying the shape of the distribution.
  • covariance: a matrix that quantitatively describes how changes in one field are associated with another.
  • correlation: the covariance matrix scaled to a range of -1 to 1, inclusive. Describes the relationship between field distributions.



# File 'lib/elasticsearch_record/relation/calculation_methods.rb', line 122

def matrix_stats(*column_names)
  calculate(:matrix_stats, *column_names)
end
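As a plain-Ruby reference for the correlation entry, the sketch below computes the Pearson correlation of two numeric columns of equal length. The sample values are illustrative. The normalization factor (n or n - 1) cancels between the covariance and the variances, so the result does not depend on which one Elasticsearch uses. With the relation, such a pair of columns would be passed as Person.all.matrix_stats(:age, :income), where income is a hypothetical field.

```ruby
# Exact Pearson correlation of two equally long numeric columns.
def correlation(xs, ys)
  raise ArgumentError, "columns must have the same length" unless xs.size == ys.size

  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  # Unnormalized covariance and variances; the common factor cancels below.
  cov = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }
  cov / Math.sqrt(var_x * var_y)
end

puts correlation([1, 2, 3, 4], [2, 4, 6, 8]) # prints 1.0
puts correlation([1, 2, 3, 4], [8, 6, 4, 2]) # prints -1.0
```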

#maximum(column_name) ⇒ Object

Calculates the maximum value on a given column. The aggregation's value node is returned as-is, or +nil+ if there's no row. See #calculate for the underlying aggregation options.

Person.all.maximum(:age) # => 93



# File 'lib/elasticsearch_record/relation/calculation_methods.rb', line 219

def maximum(column_name)
  calculate(:max, column_name, node: :value)
end

#median_absolute_deviation(column_name) ⇒ Object

This single-value aggregation approximates the median absolute deviation of its search results. Median absolute deviation is a measure of variability. It is a robust statistic, meaning that it is useful for describing data that may have outliers, or may not be normally distributed. For such data it can be more descriptive than standard deviation.

It is calculated as the median of each data point’s deviation from the median of the entire sample. That is, for a random variable X, the median absolute deviation is median(|median(X) - Xi|).

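Unlike #average, this method returns the whole metric result, so the value sits under its "value" key.

Person.all.median_absolute_deviation(:age) # => { "value": 91 }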



# File 'lib/elasticsearch_record/relation/calculation_methods.rb', line 236

def median_absolute_deviation(column_name)
  calculate(:median_absolute_deviation, column_name)
end
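You can check the formula median(|median(X) - Xi|) by hand with an exact computation in plain Ruby. Elasticsearch approximates this statistic, so for large inputs its result may differ slightly from the exact value. The sample data is illustrative.

```ruby
# Exact median of a non-empty list of numbers.
def median(values)
  raise ArgumentError, "no values" if values.empty?

  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# median(|median(X) - Xi|)
def median_absolute_deviation(values)
  center = median(values)
  median(values.map { |x| (x - center).abs })
end

puts median_absolute_deviation([1, 1, 2, 2, 4, 6, 9]) # prints 1.0
```

Here the median is 2, the absolute deviations are 1, 1, 0, 0, 2, 4 and 7, and their median is 1. The single outlier 9 barely affects the result.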

#minimum(column_name) ⇒ Object

Calculates the minimum value on a given column. The aggregation's value node is returned as-is, or +nil+ if there's no row.

Person.all.minimum(:age) # => 7



# File 'lib/elasticsearch_record/relation/calculation_methods.rb', line 206

def minimum(column_name)
  calculate(:min, column_name, node: :value)
end

#percentile_ranks(column_name, values) ⇒ Object

A multi-value metrics aggregation that calculates one or more percentile ranks over numeric values extracted from the aggregated documents.

Percentile ranks show the percentage of observed values that are below a certain value. For example, if a value is greater than or equal to 95% of the observed values, it is said to be at the 95th percentile rank.

Person.all.percentile_ranks(:year, [500,600])
# => { "500.0" => <percentage of years at or below 500>, "600.0" => <percentage of years at or below 600> }



# File 'lib/elasticsearch_record/relation/calculation_methods.rb', line 170

def percentile_ranks(column_name, values)
  calculate(:percentile_ranks, column_name, opts: { values: values }, node: :values)
end
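The definition above says a value's rank is the share of observed values at or below it, and you can compute that exactly in plain Ruby. The sketch takes the definition literally. Elasticsearch approximates the ranks, so its numbers may differ. The keys use the same "500.0" format as the percentile keys on this page. The helper name exact_percentile_ranks and the observed values are illustrative.

```ruby
# Exact percentile rank: percentage of observed values <= each given value.
def exact_percentile_ranks(observed, values)
  raise ArgumentError, "no observed values" if observed.empty?

  values.to_h do |value|
    at_or_below = observed.count { |x| x <= value }
    [value.to_f.to_s, 100.0 * at_or_below / observed.size]
  end
end

ranks = exact_percentile_ranks([100, 200, 300, 400, 500, 600, 700, 800], [500, 600])
puts ranks["500.0"] # prints 62.5
puts ranks["600.0"] # prints 75.0
```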

#percentiles(column_name) ⇒ Object

A multi-value metrics aggregation that calculates one or more percentiles over numeric values extracted from the aggregated documents. Returns a hash with empty values (the keys still exist) if there is no row.

Person.all.percentiles(:year)
# => { "1.0" => 2016.0, "5.0" => 2016.0, "25.0" => 2016.0, "50.0" => 2017.0, "75.0" => 2017.0, "95.0" => 2021.0, "99.0" => 2022.0 }



# File 'lib/elasticsearch_record/relation/calculation_methods.rb', line 144

def percentiles(column_name)
  calculate(:percentiles, column_name, node: :values)
end
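The keys in the example above are Elasticsearch's default percents: 1, 5, 25, 50, 75, 95 and 99. The sketch below computes exact percentiles for those keys in plain Ruby, interpolating linearly between ranks. When there is no data, it keeps the keys and leaves the values empty (nil), mirroring the behavior described above. Elasticsearch approximates percentiles (TDigest by default), so its values can differ. The helper name and the years are illustrative.

```ruby
DEFAULT_PERCENTS = [1, 5, 25, 50, 75, 95, 99].freeze

# Exact percentiles keyed like the aggregation result ("50.0" => ...).
def exact_percentiles(values, percents = DEFAULT_PERCENTS)
  sorted = values.map(&:to_f).sort
  percents.to_h do |pct|
    key = pct.to_f.to_s
    # No data: keep the key, leave the value empty (nil).
    next [key, nil] if sorted.empty?

    pos = (pct / 100.0) * (sorted.size - 1)
    lo = pos.floor
    hi = pos.ceil
    [key, sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo)]
  end
end

years = exact_percentiles([2016, 2016, 2017, 2017, 2021])
puts years["50.0"] # prints 2017.0
```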

#stats(column_name) ⇒ Object

A multi-value metrics aggregation that computes stats over numeric values extracted from the aggregated documents. The stats that are returned consist of: min, max, sum, count and avg.

Person.all.stats(:age)
# => { "count": 10, "min": 0.0, "max": 990.0, "sum": 16859, "avg": 75.5 }



# File 'lib/elasticsearch_record/relation/calculation_methods.rb', line 87

def stats(column_name)
  calculate(:stats, column_name)
end
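The five stats are related: avg is sum divided by count, and every value lies between min and max. That makes a plain-Ruby computation a handy sanity check for results. simple_stats is a hypothetical helper over in-memory values. Its result for empty input is an assumption about the no-row case, not confirmed Elasticsearch output.

```ruby
# Plain-Ruby version of the stats metrics; avg is always sum / count.
def simple_stats(values)
  if values.empty?
    # Assumed shape for no rows: only count and sum have a meaningful value.
    return { "count" => 0, "min" => nil, "max" => nil, "sum" => 0.0, "avg" => nil }
  end

  floats = values.map(&:to_f)
  sum = floats.sum
  {
    "count" => floats.size,
    "min" => floats.min,
    "max" => floats.max,
    "sum" => sum,
    "avg" => sum / floats.size
  }
end

puts simple_stats([10, 20, 30, 40])["avg"] # prints 25.0
```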

#string_stats(column_name) ⇒ Object

A multi-value metrics aggregation that computes statistics over string values extracted from the aggregated documents. These values are retrieved from specific keyword fields.

Person.all.string_stats(:name)
# => { "count": 5, "min_length": 24, "max_length": 30, "avg_length": 28.8, "entropy": 3.94617750050791 }



# File 'lib/elasticsearch_record/relation/calculation_methods.rb', line 106

def string_stats(column_name)
  calculate(:string_stats, column_name)
end
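You can reproduce the returned fields in plain Ruby for a handful of strings. The sketch assumes that entropy is the Shannon entropy, in bits, of the character frequencies across all collected values. The exact Elasticsearch implementation may differ. The helper name simple_string_stats and the input strings are illustrative.

```ruby
# Plain-Ruby approximation of the string_stats fields (assumptions noted above).
def simple_string_stats(strings)
  raise ArgumentError, "no strings" if strings.empty?

  lengths = strings.map(&:length)
  chars = strings.join.chars
  total = chars.size.to_f
  # Assumed definition: Shannon entropy (base 2) of all characters combined.
  entropy = chars.tally.values.sum { |n|
    prob = n / total
    -prob * Math.log2(prob)
  }

  {
    "count" => strings.size,
    "min_length" => lengths.min,
    "max_length" => lengths.max,
    "avg_length" => lengths.sum / strings.size.to_f,
    "entropy" => entropy
  }
end

puts simple_string_stats(["aa", "bb"])["entropy"] # prints 1.0
```

Two equally frequent characters give exactly one bit of entropy, and a single repeated character gives zero.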

#sum(column_name) ⇒ Object

Calculates the sum of values on a given column. The aggregation's value node is returned as-is, or +0+ if there's no row. See #calculate for the underlying aggregation options.

Person.all.sum(:age) # => 4562



# File 'lib/elasticsearch_record/relation/calculation_methods.rb', line 249

def sum(column_name)
  calculate(:sum, column_name, node: :value)
end
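The empty-result behavior described on this page is 0 for #sum and nil for #minimum, #maximum and #average. Ruby's own Enumerable methods behave the same way on an empty array, which gives a quick mental model. summarize is a hypothetical helper over in-memory values, not a call to Elasticsearch.

```ruby
# In-memory analogy for the documented empty-result behavior.
def summarize(ages)
  {
    sum: ages.sum,                                          # 0 for no rows
    minimum: ages.min,                                      # nil for no rows
    maximum: ages.max,                                      # nil for no rows
    average: ages.empty? ? nil : ages.sum.fdiv(ages.size)   # nil for no rows
  }
end

puts summarize([])[:sum]                 # prints 0
puts summarize([20, 30, 40])[:average]   # prints 30.0
```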