Class: BufferedTokenizer
- Inherits: Object
- Defined in: lib/em/buftok.rb
Overview
BufferedTokenizer takes a delimiter upon instantiation, or acts line-based by default. It allows input to be spoon-fed from an outside source that receives arbitrary-length datagrams, which may or may not contain the token by which entities are delimited. In this respect it's ideally paired with something like EventMachine (http://rubyeventmachine.com/).
Instance Method Summary
- #extract(data) ⇒ Object
  Extract takes an arbitrary string of input data and returns an array of tokenized entities, provided there were any available to extract.
- #flush ⇒ Object
  Flush the contents of the input buffer, i.e. return the input buffer even though a token has not yet been encountered.
- #initialize(delimiter = $/) ⇒ BufferedTokenizer (constructor)
  A new BufferedTokenizer splits its input on the given delimiter, which defaults to the global input record separator $/ ("\n").
Constructor Details
#initialize(delimiter = $/) ⇒ BufferedTokenizer
A new BufferedTokenizer splits its input on the given delimiter, which defaults to the global input record separator $/ ("\n").
The input buffer is stored as an array. This is by far the most efficient approach given language constraints (in C a linked list would be a more appropriate data structure). Segments of input data are stored in a list which is only joined when a token is reached, substantially reducing the number of objects required for the operation.
    # File 'lib/em/buftok.rb', line 15
    def initialize(delimiter = $/)
      @delimiter = delimiter
      @input = []
      @tail = ''
      @trim = @delimiter.length - 1
    end
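The buffering strategy described above can be sketched in plain Ruby without the class itself. The chunk strings and variable names below are invented for illustration; only the array-then-join idea and the `delimiter.length - 1` calculation come from the listing.

```ruby
# Partial input is kept as an array of segments rather than as one
# string that is re-concatenated on every read (illustrative data).
segments = []
["GET /ind", "ex.html HT", "TP/1.1"].each { |chunk| segments << chunk }

# The segments are joined only once, when a complete token is known
# to be available, and the array is then emptied for reuse.
token = segments.join
segments.clear

# The constructor's @trim is the number of trailing characters that
# could be the beginning of a delimiter split across two chunks.
trim_for_newline = "\n".length - 1    # 0: a one-character delimiter cannot be split
trim_for_crlf    = "\r\n".length - 1  # 1: "\r" may arrive before "\n"
```

With the default "\n" delimiter the trim is zero, so the hold-back step in extract is skipped entirely.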
Instance Method Details
#extract(data) ⇒ Object
Extract takes an arbitrary string of input data and returns an array of tokenized entities, provided there were any available to extract. This makes for easy processing of datagrams using a pattern like:
tokenizer.extract(data).map { |entity| Decode(entity) }.each do ...
Passing -1 as the limit makes split return "" as the last element when the delimiter falls at the end of the string, so the last element is always the start of the next chunk.
    # File 'lib/em/buftok.rb', line 30
    def extract(data)
      if @trim > 0
        tail_end = @tail.slice!(-@trim, @trim) # returns nil if string is too short
        data = tail_end + data if tail_end
      end

      @input << @tail
      entities = data.split(@delimiter, -1)
      @tail = entities.shift

      unless entities.empty?
        @input << @tail
        entities.unshift @input.join
        @input.clear
        @tail = entities.pop
      end

      entities
    end
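Two details of extract are easier to see in isolation: the -1 limit passed to split, and the held-back tail used for multi-character delimiters. The following is a plain-Ruby sketch of both, using made-up sample strings and local variables in place of the instance variables.

```ruby
# Without a limit, String#split drops trailing empty strings, so a
# chunk ending exactly on the delimiter loses that information.
without_limit = "a\nb\n".split("\n")      # ["a", "b"]
with_limit    = "a\nb\n".split("\n", -1)  # ["a", "b", ""]

# A two-character delimiter can be cut in half by a chunk boundary.
delimiter = "\r\n"
trim = delimiter.length - 1

tail = "first line\r".dup  # previous chunk ended in the middle of the delimiter
data = "\nsecond"          # next chunk begins with the rest of it

# slice! removes and returns the last `trim` characters of the tail,
# which are then put in front of the new data before splitting.
tail_end = tail.slice!(-trim, trim)
data = tail_end + data if tail_end
pieces = data.split(delimiter, -1)  # ["", "second"]

# When the tail is shorter than `trim`, slice! returns nil and the
# new data is left untouched.
short = "".dup
held_back = short.slice!(-trim, trim)
```

After the slice, `tail` holds "first line" and the leading "" in `pieces` marks the delimiter boundary, which is how the buffered text ends up emitted as a complete entity.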
#flush ⇒ Object
Flush the contents of the input buffer, i.e. return whatever is currently buffered even though a token has not yet been encountered. The buffer is empty afterwards.
    # File 'lib/em/buftok.rb', line 52
    def flush
      @input << @tail
      buffer = @input.join
      @input.clear
      @tail = "" # @tail.clear is slightly faster, but not supported on 1.8.7
      buffer
    end
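To see extract and flush working together, here is a minimal end-to-end sketch. So that it runs without the eventmachine gem installed, it defines a stand-in class condensed from the three listings on this page (the only change is `.dup` on the empty string literals, an assumption made here to keep the tail mutable under frozen-string-literal settings). The chunk contents are invented for illustration.

```ruby
# Stand-in condensed from the listings above; skipped when the real
# class has already been loaded.
unless defined?(BufferedTokenizer)
  class BufferedTokenizer
    def initialize(delimiter = $/)
      @delimiter = delimiter
      @input = []
      @tail = ''.dup
      @trim = @delimiter.length - 1
    end

    def extract(data)
      if @trim > 0
        tail_end = @tail.slice!(-@trim, @trim)
        data = tail_end + data if tail_end
      end
      @input << @tail
      entities = data.split(@delimiter, -1)
      @tail = entities.shift
      unless entities.empty?
        @input << @tail
        entities.unshift @input.join
        @input.clear
        @tail = entities.pop
      end
      entities
    end

    def flush
      @input << @tail
      buffer = @input.join
      @input.clear
      @tail = ''.dup
      buffer
    end
  end
end

tokenizer = BufferedTokenizer.new("\r\n")

# Chunks arrive with arbitrary boundaries; the second one ends in the
# middle of the "\r\n" delimiter.
chunks = ["HELO exa", "mple.org\r", "\nMAIL FROM", ":<me>\r\nQU"]

lines = []
chunks.each { |chunk| lines.concat(tokenizer.extract(chunk)) }

# "QU" never saw a delimiter, so it is only available through flush.
leftover = tokenizer.flush

# A second flush has nothing left to return.
after = tokenizer.flush
```

Here `lines` ends up as ["HELO example.org", "MAIL FROM:<me>"], `leftover` is "QU", and `after` is the empty string. Calling flush is useful when the source closes before a final delimiter arrives.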