Class: IOStreams::Line::Reader
- Inherits: Object
  - Object
  - IOStreams::Line::Reader
- Defined in: lib/io_streams/line/reader.rb
Constant Summary
- MAX_BLOCKS_MULTIPLIER = 100
  Prevent denial of service when a delimiter is not found before this number * `buffer_size` characters are read.
- LINEFEED_REGEXP = Regexp.compile(/\r\n|\n|\r/).freeze
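As a rough illustration of the two constants: with the default `buffer_size` of 65,536, the multiplier of 100 gives a limit of about 6.5 million characters, and the regular expression matches any of the three common line endings. The helper names `max_scan_chars` and `line_endings` below are hypothetical and exist only so the sketch runs without the gem.

```ruby
# Hypothetical helper: characters that may be scanned for a delimiter
# before the documented limit is reached.
def max_scan_chars(buffer_size, multiplier = 100)
  multiplier * buffer_size
end

# Hypothetical helper: every line ending found by the documented pattern.
# "\r\n" is the first alternative, so a Windows ending is matched as one
# unit instead of as a bare "\r" followed by a bare "\n".
def line_endings(text)
  text.scan(/\r\n|\n|\r/)
end

puts max_scan_chars(65_536) # prints 6553600
p line_endings("first\r\nsecond\nthird\rfourth")
```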
Instance Attribute Summary
- #buffer_size ⇒ Object (readonly)
  Returns the value of attribute buffer_size.
- #delimiter ⇒ Object (readonly)
  Returns the value of attribute delimiter.
- #line_count ⇒ Object (readonly)
  Returns the value of attribute line_count.
Class Method Summary
- .open(file_name_or_io, **args) ⇒ Object
  Read a line at a time from a file or stream.
Instance Method Summary
- #each ⇒ Object
  Iterate over every line in the file/stream, passing each line to the supplied block in turn.
- #eof? ⇒ Boolean
  Returns whether the end of file has been reached for this stream.
- #initialize(input_stream, delimiter: nil, buffer_size: 65_536) ⇒ Reader (constructor)
  Create a delimited stream reader from the supplied input stream.
- #readline ⇒ Object
Constructor Details
#initialize(input_stream, delimiter: nil, buffer_size: 65_536) ⇒ Reader
Create a delimited stream reader from the supplied input stream.
Lines returned will be in the encoding of the input stream. To change the encoding of returned lines, use IOStreams::Encode::Reader.
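The constructor source re-encodes the delimiter to match the encoding of the first block it reads, so the search and the returned lines stay in the stream's encoding. A minimal sketch of that step follows, using a UTF-16LE string as a stand-in for a block read from a stream. The helper name `find_delimiter` is hypothetical and not part of IOStreams.

```ruby
# Hypothetical helper: locate a delimiter in a buffer after matching encodings.
def find_delimiter(buffer, delimiter)
  # Re-encode the delimiter so both strings share one encoding.
  matched = delimiter.encode(buffer.encoding)
  buffer.index(matched)
end

# Stand-in for a block read from a UTF-16LE stream.
buffer = "a\nb".encode("UTF-16LE")

index = find_delimiter(buffer, "\n")
line = buffer.slice(0, index)

puts index                 # character offset of the delimiter
puts line.encoding         # the line keeps the stream's encoding
puts line.encode("UTF-8")  # converted only when explicitly requested
```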
Parameters
  input_stream
    The input stream that implements #read
  delimiter: [String]
    Line / Record delimiter to use to break the stream up into records
      Any string to break the stream up by.
      This delimiter is removed from each line when `#each` or `#readline` is called.
    Default: nil
      Automatically detect line endings and break up by line
      Searches for the first "\r\n" or "\n" and then uses that as the
      delimiter for all subsequent records.
  buffer_size: [Integer]
    Size of blocks to read from the input stream at a time.
    Default: 65536 ( 64K )
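The default `delimiter: nil` behaviour described above can be sketched as follows. This is not the library's `auto_detect_line_endings`, which is not shown on this page. The helper name `detect_line_ending` and the fallback to "\n" when no line ending is present are assumptions made for the example.

```ruby
# Hypothetical stand-in for delimiter auto-detection; not the library's code.
def detect_line_ending(buffer, fallback: "\n")
  # String#[] with a Regexp returns the first match, or nil when there is none.
  # The leftmost match wins, so a "\r\n" pair is returned whole rather than
  # as a bare "\n". The fallback value is an assumption of this sketch.
  buffer[/\r\n|\n/] || fallback
end

p detect_line_ending("a,b\r\nc,d\r\n")
p detect_line_ending("a,b\nc,d\n")
p detect_line_ending("no line ending")
```

Whatever is detected first is then used for every subsequent record, so a file that mixes line endings is split only on the first style found.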
TODO:
- Handle embedded line feeds when reading CSV files.
- Skip Comment lines. RegExp?
- Skip “empty” / “blank” lines. RegExp?
- Extract header line(s) / first non-comment, non-blank line
- Embedded newline support, RegExp? or Proc?
    # File 'lib/io_streams/line/reader.rb', line 48
    def initialize(input_stream, delimiter: nil, buffer_size: 65_536)
      @input_stream = input_stream
      @buffer_size = buffer_size

      # More efficient read buffering only supported when the input stream `#read` method supports it.
      @use_read_cache_buffer = !@input_stream.method(:read).arity.between?(0, 1)

      @line_count = 0
      @eof = false
      @read_cache_buffer = nil
      @buffer = nil

      read_block
      # Auto-detect windows/linux line endings if not supplied. \n or \r\n
      @delimiter = delimiter || auto_detect_line_endings

      if @buffer
        # Change the delimiters encoding to match that of the input stream
        @delimiter = @delimiter.encode(@buffer.encoding)
        @delimiter_size = @delimiter.size
      end
    end
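The arity check in the constructor decides whether the read cache buffer is used. The sketch below shows how that expression evaluates for a few kinds of stream. The method `buffered_read_supported?` and the two stream classes are hypothetical, and reading the check as "does `#read` accept an output buffer" is an assumption based on the source comment.

```ruby
require "stringio"

# Mirrors the documented expression: false when #read takes exactly zero or
# one required positional argument and no optional ones, true otherwise.
def buffered_read_supported?(stream)
  !stream.method(:read).arity.between?(0, 1)
end

# Hypothetical stream whose #read only accepts a size (arity 1).
class SizeOnlyStream
  def read(size)
    nil
  end
end

# Hypothetical stream whose #read also accepts an output buffer (arity -1).
class OutbufStream
  def read(size = nil, outbuf = nil)
    nil
  end
end

puts buffered_read_supported?(StringIO.new("abc"))
puts buffered_read_supported?(SizeOnlyStream.new)
puts buffered_read_supported?(OutbufStream.new)
```

Methods with optional positional arguments report a negative arity, which is why both `StringIO#read` and the second hypothetical class fall outside the 0..1 range.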
Instance Attribute Details
#buffer_size ⇒ Object (readonly)
Returns the value of attribute buffer_size.
    # File 'lib/io_streams/line/reader.rb', line 4
    def buffer_size
      @buffer_size
    end
#delimiter ⇒ Object (readonly)
Returns the value of attribute delimiter.
    # File 'lib/io_streams/line/reader.rb', line 4
    def delimiter
      @delimiter
    end
#line_count ⇒ Object (readonly)
Returns the value of attribute line_count.
    # File 'lib/io_streams/line/reader.rb', line 4
    def line_count
      @line_count
    end
Class Method Details
.open(file_name_or_io, **args) ⇒ Object
Read a line at a time from a file or stream
    # File 'lib/io_streams/line/reader.rb', line 12
    def self.open(file_name_or_io, **args)
      if file_name_or_io.is_a?(String)
        IOStreams::File::Reader.open(file_name_or_io) { |io| yield new(io, **args) }
      else
        yield new(file_name_or_io, **args)
      end
    end
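The dispatch in `.open` (a String is treated as a file name, anything else as an already-open stream) can be sketched without the gem as shown below. The helper `with_input` is hypothetical, and plain `File.open` stands in for `IOStreams::File::Reader.open`, so this only illustrates the pattern.

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper mirroring the dispatch in `.open`.
def with_input(file_name_or_io)
  if file_name_or_io.is_a?(String)
    # A file name: open it, yield the stream, close it when the block returns.
    File.open(file_name_or_io, "rb") { |io| yield io }
  else
    # An existing stream: yield it as is; the caller remains responsible for closing it.
    yield file_name_or_io
  end
end

from_io = with_input(StringIO.new("one\ntwo\n")) { |io| io.read }

from_file = nil
Tempfile.create("lines") do |file|
  file.write("one\ntwo\n")
  file.flush
  from_file = with_input(file.path) { |io| io.read }
end

puts from_io == from_file
```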
Instance Method Details
#each ⇒ Object
Iterate over every line in the file/stream, passing each line to the supplied block in turn.
Returns [Integer] the number of lines read from the file/stream.
Note:
- The line delimiter is not returned.
    # File 'lib/io_streams/line/reader.rb', line 75
    def each
      until eof?
        line = readline
        yield(line) unless line.nil?
      end
      line_count
    end
#eof? ⇒ Boolean
Returns whether the end of file has been reached for this stream
    # File 'lib/io_streams/line/reader.rb', line 109
    def eof?
      @eof && (@buffer.nil? || @buffer.empty?)
    end
#readline ⇒ Object
    # File 'lib/io_streams/line/reader.rb', line 83
    def readline
      return if eof?

      # Keep reading until it finds the delimiter
      while (index = @buffer.index(@delimiter)).nil? && read_block
      end

      # Delimiter found?
      if index
        data = @buffer.slice(0, index)
        @buffer = @buffer.slice(index + @delimiter_size, @buffer.size)
        @line_count += 1
      elsif @eof && @buffer.empty?
        data = nil
        @buffer = nil
      else
        # Last line without delimiter
        data = @buffer
        @buffer = nil
        @line_count += 1
      end

      data
    end
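To see how `#readline`, `#each` and `#eof?` interact, the following is a simplified, self-contained stand-in for the read loop. The class name `SketchLineReader` and its `read_block` are hypothetical. The real `read_block` is not shown on this page, and the block limit, read caching, auto-detection and encoding handling are all omitted here. A deliberately small `buffer_size` forces lines to span several blocks.

```ruby
require "stringio"

# Simplified stand-in for the documented read loop; not the library's code.
class SketchLineReader
  attr_reader :line_count

  def initialize(input_stream, delimiter: "\n", buffer_size: 4)
    @input_stream = input_stream
    @delimiter = delimiter
    @buffer_size = buffer_size
    @buffer = +""
    @eof = false
    @line_count = 0
  end

  def eof?
    @eof && @buffer.empty?
  end

  def readline
    return if eof?

    # Pull in more blocks until the delimiter shows up or the stream ends.
    while (index = @buffer.index(@delimiter)).nil? && read_block
    end

    if index
      line = @buffer.slice(0, index)
      @buffer = @buffer.slice(index + @delimiter.size, @buffer.size)
    elsif @buffer.empty?
      return nil
    else
      # Final line with no trailing delimiter.
      line = @buffer
      @buffer = +""
    end
    @line_count += 1
    line
  end

  def each
    until eof?
      line = readline
      yield line unless line.nil?
    end
    line_count
  end

  private

  # Appends the next block to the buffer; returns false once the stream is exhausted.
  def read_block
    return false if @eof

    block = @input_stream.read(@buffer_size)
    if block.nil?
      @eof = true
      return false
    end
    @buffer << block
    true
  end
end

lines = []
count = SketchLineReader.new(StringIO.new("alpha\nbeta\ngamma")).each { |l| lines << l }
p lines # prints ["alpha", "beta", "gamma"]
p count # prints 3
```

Because unread data accumulates in the buffer across blocks, a delimiter that straddles a block boundary is still found on the next pass of the loop.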