Method: RedAmber::Group#summarize

Defined in:
lib/red_amber/group.rb

#summarize {|group| ... } ⇒ DataFrame

Summarizes the Group using the aggregation functions returned by the block.

Overloads:

  • #summarize {|group| ... } ⇒ DataFrame

    Summarize by a function.

    Examples:

    Single function and single variable

    group = penguins.group(:species)
    group
    
    # =>
    #<RedAmber::Group : 0x000000000000c314>
      species   group_count
      <string>      <uint8>
    0 Adelie            152
    1 Chinstrap          68
    2 Gentoo            124
    
    group.summarize { mean(:bill_length_mm) }
    
    # =>
    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000c364>
      species   mean(bill_length_mm)
      <string>              <double>
    0 Adelie                   38.79
    1 Chinstrap                48.83
    2 Gentoo                    47.5

    Single function only

    group.summarize { mean }
    
    # =>
    #<RedAmber::DataFrame : 3 x 6 Vectors, 0x000000000000c350>
      species   mean(bill_length_mm) mean(bill_depth_mm) ... mean(year)
      <string>              <double>            <double> ...   <double>
    0 Adelie                   38.79               18.35 ...    2008.01
    1 Chinstrap                48.83               18.42 ...    2007.97
    2 Gentoo                    47.5               14.98 ...    2008.08

    Yield Parameters:

    • group (Group)

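      the Group object itself (self). The block is evaluated with instance_eval, so aggregation methods such as mean can be called without a receiver.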

    Yield Returns:

    • (DataFrame)

      an aggregated DataFrame.

    Returns:

    • (DataFrame)

      the summarized DataFrame.

  • #summarize {|group| ... } ⇒ DataFrame

    Summarize by multiple functions.

    Examples:

    Multiple functions

    group.summarize { [min(:bill_length_mm), max(:bill_length_mm)] }
    
    # =>
    #<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000c378>
      species   min(bill_length_mm) max(bill_length_mm)
      <string>             <double>            <double>
    0 Adelie                   32.1                46.0
    1 Chinstrap                40.9                58.0
    2 Gentoo                   40.9                59.6

    Yield Parameters:

    • group (Group)

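      the Group object itself (self). The block is evaluated with instance_eval, so aggregation methods such as min and max can be called without a receiver.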

    Yield Returns:

    • (Array<DataFrame>)

      an array of aggregated DataFrames. The last column of each DataFrame is used as its aggregated column.

    Returns:

    • (DataFrame)

      the summarized DataFrame.

  • #summarize {|group| ... } ⇒ DataFrame

    Summarize by functions and rename the resulting columns with a Hash.

    Examples:

    Rename columns with a Hash

    group.summarize {
      {
        min_bill_length_mm: min(:bill_length_mm),
        max_bill_length_mm: max(:bill_length_mm),
      }
    }
    
    # =>
    #<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000c378>
      species   min_bill_length_mm max_bill_length_mm
      <string>            <double>           <double>
    0 Adelie                  32.1               46.0
    1 Chinstrap               40.9               58.0
    2 Gentoo                  40.9               59.6

    Yield Parameters:

    • group (Group)

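      the Group object itself (self). The block is evaluated with instance_eval, so aggregation methods such as min and max can be called without a receiver.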

    Yield Returns:

    • (Hash{Symbol, String => DataFrame})

      a Hash that maps each new column name to an aggregated DataFrame. Each DataFrame must contain only one aggregated column besides the group keys; otherwise a GroupArgumentError is raised.
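
The one-column rule for Hash values can be illustrated without RedAmber. The sketch below is a minimal stand-in, not the library's code: `aggregated_column!` is a hypothetical helper, it works on plain arrays of column names instead of DataFrames, and it raises `ArgumentError` where the library raises `GroupArgumentError`.

```ruby
# Hypothetical helper (not part of RedAmber): subtract the group keys from a
# table's column names and insist that exactly one aggregated column remains.
def aggregated_column!(column_names, group_keys)
  aggregated = column_names - group_keys
  if aggregated.size > 1
    raise ArgumentError,
          "accept only one column from the Hash: #{aggregated.join(', ')}"
  end
  aggregated.last
end

# One aggregated column next to the group key: accepted.
one = aggregated_column!(["species", "min(bill_length_mm)"], ["species"])

# Two aggregated columns (as an argument-less mean would produce): rejected.
error_message = nil
begin
  aggregated_column!(
    ["species", "mean(bill_length_mm)", "mean(bill_depth_mm)"],
    ["species"]
  )
rescue ArgumentError => e
  error_message = e.message
end
```

Under this reading, a Hash value such as `mean` with no column argument would be rejected, because it aggregates every numeric column at once.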

    Returns:

    • (DataFrame)

      the summarized DataFrame.



# File 'lib/red_amber/group.rb', line 549

def summarize(*args, &block)
  if block
    agg = instance_eval(&block)
    unless args.empty?
      agg = [agg] if agg.is_a?(DataFrame)
      agg = args.zip(agg).to_h
    end
  else
    agg = args
  end

  case agg
  when DataFrame
    agg
  when Array
    aggregations =
      agg.map do |df|
        v = df.vectors[-1]
        [v.key, v]
      end
    agg[0].assign(aggregations)
  when Hash
    aggregations =
      agg.map do |key, df|
        aggregated_keys = df.keys - @group_keys
        if aggregated_keys.size > 1
          message =
            "accept only one column from the Hash: #{aggregated_keys.join(', ')}"
          raise GroupArgumentError, message
        end

        v = df.vectors[-1]
        [key, v]
      end
    agg.values[-1].drop(-1).assign(aggregations)
  else
    raise GroupArgumentError, "Unknown argument: #{agg}"
  end
end
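
The listing dispatches on what the block returns: a single DataFrame, an Array of them, or a Hash of them. The following is a rough, dependency-free sketch of that idea only. `Table` and `MiniGroup` are hypothetical stand-ins invented for illustration, not RedAmber classes, and the merging is simplified to one aggregated column per table.

```ruby
# Hypothetical stand-in for an aggregated DataFrame (not RedAmber's class):
# it only wraps an ordered Hash of column name => values.
Table = Struct.new(:columns)

class MiniGroup
  def initialize(rows, group_key)
    @group_key = group_key
    @groups = rows.group_by { |row| row[group_key] }
  end

  def min(column)
    aggregate("min", column, &:min)
  end

  def max(column)
    aggregate("max", column, &:max)
  end

  # Same shape as the dispatch in the listing: evaluate the block on the
  # group, then branch on the class of the result.
  def summarize(&block)
    agg = instance_eval(&block) # bare min(...)/max(...) resolve on self
    case agg
    when Table
      agg
    when Array
      # Key column appears once; each table contributes its aggregated column.
      Table.new(agg.map(&:columns).reduce(:merge))
    when Hash
      # Hash keys become the new column names.
      renamed = agg.to_h { |name, table| [name, table.columns.values.last] }
      Table.new({ @group_key => @groups.keys }.merge(renamed))
    else
      raise ArgumentError, "Unknown argument: #{agg}"
    end
  end

  private

  # Builds a two-column table: the group key and "name(column)".
  def aggregate(name, column)
    values = @groups.values.map do |rows|
      yield(rows.map { |row| row[column] })
    end
    Table.new({ @group_key => @groups.keys, :"#{name}(#{column})" => values })
  end
end

rows = [
  { species: "A", bill: 32.1 },
  { species: "A", bill: 46.0 },
  { species: "B", bill: 40.9 },
  { species: "B", bill: 58.0 }
]
group = MiniGroup.new(rows, :species)

single  = group.summarize { min(:bill) }
several = group.summarize { [min(:bill), max(:bill)] }
renamed = group.summarize { { lo: min(:bill), hi: max(:bill) } }
p renamed.columns
```

The three calls correspond to the three overloads documented above; only the return value of the block changes, which is why a single method can serve all of them.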