Class: BioDSL::Sort
- Inherits: Object
- Defined in: lib/BioDSL/commands/sort.rb
Overview
Sort records in the stream.
sort orders the records in the stream by the value of a single given key. Sorting on multiple keys is currently not supported.
Usage
sort(key: <value>[, reverse: <bool>[, block_size: <uint>]])
Options
- key: <value> - Sort records on the value for key.
- reverse: <bool> - Reverse the sort order.
- block_size: <uint> - Block size in bytes used for disk based sorting: the maximum to hold in memory before a block is written to disk (default=250_000_000).
Examples
Consider the following table in the file 'test.tab':
#COUNT ORGANISM
4 Dog
3 Cat
1 Eel
To sort this according to COUNT in ascending order do:
BD.new.read_table(input: "test.tab").sort(key: :COUNT).dump.run
{:COUNT=>1, :ORGANISM=>"Eel"}
{:COUNT=>3, :ORGANISM=>"Cat"}
{:COUNT=>4, :ORGANISM=>"Dog"}
And in descending order:
BD.new.
read_table(input: "test.tab").
sort(key: :COUNT, reverse: true).
dump.
run
{:COUNT=>4, :ORGANISM=>"Dog"}
{:COUNT=>3, :ORGANISM=>"Cat"}
{:COUNT=>1, :ORGANISM=>"Eel"}
The type of the value determines the ordering. String values such as ORGANISM are sorted in alphabetical order:
BD.new.read_table(input: "test.tab").sort(key: :ORGANISM).dump.run
{:COUNT=>3, :ORGANISM=>"Cat"}
{:COUNT=>4, :ORGANISM=>"Dog"}
{:COUNT=>1, :ORGANISM=>"Eel"}
And in reverse alphabetical order:
BD.new.
read_table(input: "test.tab").
sort(key: :ORGANISM, reverse: true).
dump.
run
{:COUNT=>1, :ORGANISM=>"Eel"}
{:COUNT=>4, :ORGANISM=>"Dog"}
{:COUNT=>3, :ORGANISM=>"Cat"}
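The point about value types can be illustrated with plain Ruby, independent of BioDSL. This is only a sketch of how Ruby compares integers versus strings; it assumes the sort relies on Ruby's ordinary comparison of the key's values, which the outputs above suggest but the source does not state explicitly. The variable names are illustrative.

```ruby
# Plain-Ruby illustration (not BioDSL code): the same digits order
# differently depending on whether they are Integers or Strings.
counts_as_integers = [10, 9, 100]
counts_as_strings  = %w[10 9 100]

numeric_order = counts_as_integers.sort # compared by magnitude
lexical_order = counts_as_strings.sort  # compared character by character

p numeric_order
p lexical_order
```

If a numeric column were held as strings, "100" would sort before "9", so it is worth checking the type of the key's values when the order looks unexpected.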
Constant Summary
- STATS = %i(records_in records_out)
- SORT_BLOCK_SIZE = 250_000_000
  Max bytes to hold in memory.
Instance Method Summary
- #initialize(options) ⇒ Sort (constructor)
  Constructor for Sort.
- #lmb ⇒ Proc
  Return command lambda for Sort.
Constructor Details
#initialize(options) ⇒ Sort
Constructor for Sort.
# File 'lib/BioDSL/commands/sort.rb', line 108
def initialize(options)
  @options    = options
  @block_size = options[:block_size] || SORT_BLOCK_SIZE
  @key        = options[:key].to_sym
  @files      = []
  @records    = []
  @size       = 0
  @pqueue     = pqueue_init
  @fds        = nil
end
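The option handling in the constructor can be tried in isolation. The sketch below is a hypothetical stand-in, not BioDSL code: resolve_options and its default_block_size parameter are names invented here, and the default value is copied from SORT_BLOCK_SIZE in the constant summary. It shows the two behaviours visible in the constructor: block_size falls back to the default when absent, and key is converted to a Symbol.

```ruby
# Hypothetical helper mirroring the constructor's option handling.
# The default mirrors SORT_BLOCK_SIZE (250_000_000) from the constant summary.
def resolve_options(options, default_block_size: 250_000_000)
  {
    block_size: options[:block_size] || default_block_size,
    key: options[:key].to_sym # a String key such as 'COUNT' becomes :COUNT
  }
end

defaults = resolve_options({ key: 'COUNT' })
custom   = resolve_options({ key: :ORGANISM, block_size: 1_000 })

p defaults
p custom
```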
Instance Method Details
#lmb ⇒ Proc
Return command lambda for Sort.
# File 'lib/BioDSL/commands/sort.rb', line 124
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      @records << record
      @size += record.to_s.size
      save_block if @size > @block_size
    end

    save_block
    open_block_files
    fill_pqueue
    output_pqueue(output)
  end
end
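The lambda above follows the shape of an external merge sort: records accumulate in memory until their estimated size exceeds the block size, each block is saved, and the saved blocks are then merged into the output. The bodies of save_block, open_block_files, fill_pqueue and output_pqueue are not shown on this page, so the following is a minimal sketch of the general technique under stated assumptions rather than BioDSL's implementation. external_sort is a hypothetical helper, blocks are serialised with Marshal into Tempfiles, and the merge uses a simple linear scan over block heads where the real code uses a priority queue.

```ruby
require 'tempfile'

# Hypothetical external merge sort; not BioDSL's actual code.
def external_sort(records, key:, block_size:)
  files = []
  block = []
  size  = 0

  # Sort the in-memory block by key and write it to its own temp file.
  flush = lambda do
    next if block.empty?

    file = Tempfile.new('sort_block')
    file.binmode
    block.sort_by { |r| r[key] }.each { |r| Marshal.dump(r, file) }
    file.rewind
    files << file
    block = []
    size  = 0
  end

  records.each do |record|
    block << record
    size += record.to_s.size # same size estimate as in lmb
    flush.call if size > block_size
  end
  flush.call # save whatever remains

  # Merge: keep the head record of every block file and repeatedly
  # emit the smallest head (linear scan instead of a priority queue).
  heads  = files.map { |f| f.eof? ? nil : Marshal.load(f) }
  sorted = []
  until heads.compact.empty?
    i = heads.each_index.reject { |j| heads[j].nil? }.min_by { |j| heads[j][key] }
    sorted << heads[i]
    heads[i] = files[i].eof? ? nil : Marshal.load(files[i])
  end
  sorted
ensure
  files&.each(&:close!) # close and delete the temp files
end

records = [
  { COUNT: 4, ORGANISM: 'Dog' },
  { COUNT: 3, ORGANISM: 'Cat' },
  { COUNT: 1, ORGANISM: 'Eel' }
]

# A tiny block size forces every record into its own block file.
sorted = external_sort(records, key: :COUNT, block_size: 10)
p sorted
```

Because each block is already sorted when it is written, the merge step only ever needs one record per block in memory, which is what keeps memory use bounded by the block size.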