Class: Classifier::Streaming::LineReader
- Includes: Enumerable
- Defined in: lib/classifier/streaming/line_reader.rb
Overview
Memory-efficient line reader for large files and IO streams. Reads lines one at a time and can yield in configurable batches.
Instance Attribute Summary
- #batch_size ⇒ Object (readonly)
  Returns the value of attribute batch_size.
Instance Method Summary
- #each ⇒ Object
  Iterates over each line in the IO stream.
- #each_batch {|batch| ... } ⇒ Object
  Iterates over batches of lines.
- #estimate_line_count(sample_size: 100) ⇒ Object
  Estimates the total number of lines in the IO stream.
- #initialize(io, batch_size: 100) ⇒ LineReader (constructor)
  Creates a new LineReader.
Constructor Details
#initialize(io, batch_size: 100) ⇒ LineReader
Creates a new LineReader.
# File 'lib/classifier/streaming/line_reader.rb', line 26

def initialize(io, batch_size: 100)
  @io = io
  @batch_size = batch_size
end
Instance Attribute Details
#batch_size ⇒ Object (readonly)
Returns the value of attribute batch_size.
# File 'lib/classifier/streaming/line_reader.rb', line 21

def batch_size
  @batch_size
end
Instance Method Details
#each ⇒ Object
Iterates over each line in the IO stream. Lines are chomped (trailing newlines removed). Returns an Enumerator when no block is given.
# File 'lib/classifier/streaming/line_reader.rb', line 36

def each
  return enum_for(:each) unless block_given?

  @io.each_line do |line|
    yield line.chomp
  end
end
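Because the class includes Enumerable and defines #each, the usual Enumerable methods should work on a reader. The sketch below illustrates this with a hypothetical stand-in class, SketchLineReader, which copies the #each listing above so that the example runs without the gem installed. The class name and sample data are assumptions made for illustration only.

```ruby
require 'stringio'

# Hypothetical stand-in that mirrors the #each listing above.
class SketchLineReader
  include Enumerable

  def initialize(io)
    @io = io
  end

  def each
    return enum_for(:each) unless block_given?

    @io.each_line { |line| yield line.chomp }
  end
end

reader = SketchLineReader.new(StringIO.new("alpha\nbeta\ngamma\n"))

# Enumerable#map is built on #each, so lines arrive already chomped.
p reader.map(&:length)   # prints [5, 4, 5]

# #each does not rewind, so a second pass over the same IO finds nothing.
p reader.to_a            # prints []

# Without a block, #each returns an Enumerator that can be chained.
numbered = SketchLineReader.new(StringIO.new("a\nb\n")).each.with_index(1).to_a
p numbered               # prints [["a", 1], ["b", 2]]
```

Note that the listing never rewinds the underlying IO, so callers who need a second pass would have to rewind the stream themselves or create a new reader.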
#each_batch {|batch| ... } ⇒ Object
Iterates over batches of lines. Each batch is an array of at most batch_size chomped lines; the final batch may be smaller. Returns an Enumerator when no block is given.
# File 'lib/classifier/streaming/line_reader.rb', line 49

def each_batch
  return enum_for(:each_batch) unless block_given?

  batch = [] #: Array[String]
  each do |line|
    batch << line
    if batch.size >= @batch_size
      yield batch
      batch = []
    end
  end
  yield batch unless batch.empty?
end
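The batching behavior is easiest to see when the number of lines is not a multiple of the batch size. The following sketch uses a hypothetical stand-in class, SketchBatchReader, that copies the #each and #each_batch listings above so it runs without the gem. The class name and input are illustrative assumptions.

```ruby
require 'stringio'

# Hypothetical stand-in mirroring the #each and #each_batch listings above.
class SketchBatchReader
  def initialize(io, batch_size: 100)
    @io = io
    @batch_size = batch_size
  end

  def each
    return enum_for(:each) unless block_given?

    @io.each_line { |line| yield line.chomp }
  end

  def each_batch
    return enum_for(:each_batch) unless block_given?

    batch = []
    each do |line|
      batch << line
      if batch.size >= @batch_size
        yield batch
        batch = [] # start a fresh array so yielded batches are never mutated
      end
    end
    yield batch unless batch.empty? # flush the short final batch
  end
end

io = StringIO.new("a\nb\nc\nd\ne\n")
SketchBatchReader.new(io, batch_size: 2).each_batch do |batch|
  puts batch.inspect
end
# Prints ["a", "b"], then ["c", "d"], then the short final batch ["e"].
```

Since a new array is assigned after every yield, callers can keep references to earlier batches without seeing them change.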
#estimate_line_count(sample_size: 100) ⇒ Object
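Estimates the total number of lines in the IO stream. This is a rough estimate: the stream's byte size divided by the average byte length of the first sample_size lines. The stream position is restored after sampling. Returns nil when the stream does not respond to size and rewind, when no lines can be sampled, or when seeking raises IOError or Errno::ESPIPE.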
# File 'lib/classifier/streaming/line_reader.rb', line 68

def estimate_line_count(sample_size: 100)
  return nil unless @io.respond_to?(:size) && @io.respond_to?(:rewind)

  begin
    original_pos = @io.pos
    @io.rewind

    sample_bytes = 0
    sample_lines = 0

    sample_size.times do
      line = @io.gets
      break unless line

      sample_bytes += line.bytesize
      sample_lines += 1
    end

    @io.seek(original_pos)
    return nil if sample_lines.zero?

    avg_line_size = sample_bytes.to_f / sample_lines
    io_size = @io.__send__(:size) #: Integer
    (io_size / avg_line_size).round
  rescue IOError, Errno::ESPIPE
    nil
  end
end
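To illustrate how the estimate behaves, the sketch below copies the listing above into a hypothetical stand-in class, SketchLineEstimator, and runs it on three inputs: uniform lines, lines whose sampled prefix is unrepresentative, and a pipe. The class name and inputs are assumptions chosen for illustration.

```ruby
require 'stringio'

# Hypothetical stand-in mirroring the #estimate_line_count listing above.
class SketchLineEstimator
  def initialize(io)
    @io = io
  end

  def estimate_line_count(sample_size: 100)
    return nil unless @io.respond_to?(:size) && @io.respond_to?(:rewind)

    begin
      original_pos = @io.pos
      @io.rewind
      sample_bytes = 0
      sample_lines = 0
      sample_size.times do
        line = @io.gets
        break unless line

        sample_bytes += line.bytesize
        sample_lines += 1
      end
      @io.seek(original_pos) # restore the caller's read position
      return nil if sample_lines.zero?

      avg_line_size = sample_bytes.to_f / sample_lines
      (@io.size / avg_line_size).round
    rescue IOError, Errno::ESPIPE
      nil
    end
  end
end

# Uniform 5-byte lines: 5000 bytes / 5.0 bytes per line.
uniform = StringIO.new("line\n" * 1000)
puts SketchLineEstimator.new(uniform).estimate_line_count # prints 1000

# Short sampled lines followed by a long one: 16 bytes / 2.0 bytes per line.
skewed = StringIO.new("a\nb\nlonger line\n")
puts SketchLineEstimator.new(skewed).estimate_line_count(sample_size: 2) # prints 8

# A pipe does not respond to #size, so no estimate is possible.
rd, wr = IO.pipe
p SketchLineEstimator.new(rd).estimate_line_count # prints nil
rd.close
wr.close
```

The skewed case reports 8 for a 3-line stream, which shows why the result is best treated as a hint (for example, for a progress bar) and not as an exact count.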