Method: Enumerable#chunk

Defined in:
enum.c

#chunk {|array| ... } ⇒ Object

Each element in the returned enumerator is a 2-element array consisting of:

  • A value returned by the block.

  • An array (“chunk”) containing the element for which that value was returned, and all following elements for which the block returned the same value:

So that:

  • Each block return value that is different from its predecessor begins a new chunk.

  • Each block return value that is the same as its predecessor continues the same chunk.

Example:

e = (0..10).chunk {|i| (i / 3).floor } # => #<Enumerator: ...>
# The enumerator elements.
e.next # => [0, [0, 1, 2]]
e.next # => [1, [3, 4, 5]]
e.next # => [2, [6, 7, 8]]
e.next # => [3, [9, 10]]

Method chunk is especially useful for an enumerable that is already sorted. This example counts words for each initial letter in a large array of words:

# Get sorted words from a web page.
url = 'https://raw.githubusercontent.com/eneko/data-repository/master/data/words.txt'
words = URI::open(url).readlines
# Make chunks, one for each letter.
e = words.chunk {|word| word.upcase[0] } # => #<Enumerator: ...>
# Display 'A' through 'F'.
e.each {|c, words| p [c, words.length]; break if c == 'F' }

Output:

["A", 17096]
["B", 11070]
["C", 19901]
["D", 10896]
["E", 8736]
["F", 6860]

You can use the special symbol :_alone to force an element into its own separate chuck:

a = [0, 0, 1, 1]
e = a.chunk{|i| i.even? ? :_alone : true }
e.to_a # => [[:_alone, [0]], [:_alone, [0]], [true, [1, 1]]]

For example, you can put each line that contains a URL into its own chunk:

pattern = /http/
open(filename) { |f|
  f.chunk { |line| line =~ pattern ? :_alone : true }.each { |key, lines|
    pp lines
  }
}

You can use the special symbol :_separator or nil to force an element to be ignored (not included in any chunk):

a = [0, 0, -1, 1, 1]
e = a.chunk{|i| i < 0 ? :_separator : true }
e.to_a # => [[true, [0, 0]], [true, [1, 1]]]

Note that the separator does end the chunk:

a = [0, 0, -1, 1, -1, 1]
e = a.chunk{|i| i < 0 ? :_separator : true }
e.to_a # => [[true, [0, 0]], [true, [1]], [true, [1]]]

For example, the sequence of hyphens in svn log can be eliminated as follows:

sep = "-"*72 + "\n"
IO.popen("svn log README") { |f|
  f.chunk { |line|
    line != sep || nil
  }.each { |_, lines|
    pp lines
  }
}
#=> ["r20018 | knu | 2008-10-29 13:20:42 +0900 (Wed, 29 Oct 2008) | 2 lines\n",
#    "\n",
#    "* README, README.ja: Update the portability section.\n",
#    "\n"]
#   ["r16725 | knu | 2008-05-31 23:34:23 +0900 (Sat, 31 May 2008) | 2 lines\n",
#    "\n",
#    "* README, README.ja: Add a note about default C flags.\n",
#    "\n"]
#   ...

Paragraphs separated by empty lines can be parsed as follows:

File.foreach("README").chunk { |line|
  /\A\s*\z/ !~ line || nil
}.each { |_, lines|
  pp lines
}

Yields:

  • (array)


3993
3994
3995
3996
3997
3998
3999
4000
4001
4002
4003
4004
4005
# File 'enum.c', line 3993

static VALUE
enum_chunk(VALUE enumerable)
{
    VALUE enumerator;

    RETURN_SIZED_ENUMERATOR(enumerable, 0, 0, enum_size);

    enumerator = rb_obj_alloc(rb_cEnumerator);
    rb_ivar_set(enumerator, id_chunk_enumerable, enumerable);
    rb_ivar_set(enumerator, id_chunk_categorize, rb_block_proc());
    rb_block_call(enumerator, idInitialize, 0, 0, chunk_i, enumerator);
    return enumerator;
}