Class: UV::BufferedTokenizer
- Inherits: Object
  - Object
    - UV::BufferedTokenizer
- Defined in: lib/uv-rays/buffered_tokenizer.rb
Instance Attribute Summary
- #delimiter ⇒ Object
  Returns the value of attribute delimiter.
- #indicator ⇒ Object
  Returns the value of attribute indicator.
- #size_limit ⇒ Object
  Returns the value of attribute size_limit.
- #verbose ⇒ Object
  Returns the value of attribute verbose.
Instance Method Summary
- #empty? ⇒ Boolean
- #extract(data) ⇒ Object
  Extract takes an arbitrary string of input data and returns an array of tokenized entities, provided there were any available to extract.
- #flush ⇒ String
  Flush the contents of the input buffer, i.e. return the input buffer even though a token has not yet been encountered.
- #initialize(options) ⇒ BufferedTokenizer (constructor)
  A new instance of BufferedTokenizer.
Constructor Details
#initialize(options) ⇒ BufferedTokenizer
Returns a new instance of BufferedTokenizer.
    # File 'lib/uv-rays/buffered_tokenizer.rb', line 22
    def initialize(options)
        @delimiter  = options[:delimiter]
        @indicator  = options[:indicator]
        @size_limit = options[:size_limit]
        @verbose    = options[:verbose] if @size_limit

        raise ArgumentError, 'no delimiter provided' unless @delimiter

        @input = ''
    end
Instance Attribute Details
#delimiter ⇒ Object
Returns the value of attribute delimiter.
    # File 'lib/uv-rays/buffered_tokenizer.rb', line 19
    def delimiter
        @delimiter
    end
#indicator ⇒ Object
Returns the value of attribute indicator.
    # File 'lib/uv-rays/buffered_tokenizer.rb', line 19
    def indicator
        @indicator
    end
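The role of the indicator is easiest to see on a concrete buffer. The following is a minimal sketch in plain Ruby that repeats the indicator branch of #extract outside the class; the control characters and the sample input are assumptions picked for illustration, not values the library prescribes.

```ruby
# Illustration only: the marker characters and sample data are hypothetical.
delimiter = "\x03"   # marks the end of a message
indicator = "\x02"   # marks the start of a message

input = "noise\x02first\x03no start marker\x03\x02a\x02second\x03\x02partial"

# Same steps as the indicator branch of #extract
messages = input.split(delimiter, -1)
buffer   = messages.pop            # unfinished tail, kept for the next call

entities = []
messages.each do |msg|
  res = msg.split(indicator, -1)
  # Keep only the text after the last indicator; segments that contain
  # no indicator at all are discarded.
  entities << res.last if res.length > 1
end
```

In this sketch the segment without an indicator is dropped, and when a segment contains several indicators only the text after the last one is kept.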
#size_limit ⇒ Object
Returns the value of attribute size_limit.
    # File 'lib/uv-rays/buffered_tokenizer.rb', line 19
    def size_limit
        @size_limit
    end
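What happens when the buffer grows past size_limit can be sketched in isolation. The snippet below reuses the trimming expression from #extract on local variables; the limit, the two-character indicator and the buffer contents are assumptions chosen for illustration.

```ruby
# Illustration only: all values here are hypothetical.
size_limit = 8
indicator  = "<<"             # two-character start marker
buffer     = "0123456789<"    # 11 characters, ends with half an indicator

if buffer.size > size_limit
  if indicator.respond_to?(:length)   # a String has #length, a Regexp does not
    # Keep just enough of the tail that a split indicator can still match
    buffer = buffer[-(indicator.length - 1)..-1]
  else
    buffer = ''
  end
end
```

Here the buffer is reduced to its final character, so an indicator whose second half arrives in the next chunk can still be recognised. In the class itself an exception is raised afterwards only when verbose is set.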
#verbose ⇒ Object
Returns the value of attribute verbose.
    # File 'lib/uv-rays/buffered_tokenizer.rb', line 19
    def verbose
        @verbose
    end
Instance Method Details
#empty? ⇒ Boolean
    # File 'lib/uv-rays/buffered_tokenizer.rb', line 93
    def empty?
        @input.empty?
    end
#extract(data) ⇒ Object
Extract takes an arbitrary string of input data and returns an array of tokenized entities, provided there were any available to extract.
    # File 'lib/uv-rays/buffered_tokenizer.rb', line 42
    def extract(data)
        @input << data

        # Extract token-delimited entities from the input string with the split command.
        # There's a bit of craftiness here with the -1 parameter. Normally split would
        # behave no differently regardless of if the token lies at the very end of the
        # input buffer or not (i.e. a literal edge case). Specifying -1 forces split to
        # return "" in this case, meaning that the last entry in the list represents a
        # new segment of data where the token has not been encountered
        messages = @input.split(@delimiter, -1)

        if @indicator
            @input = messages.pop
            entities = []
            messages.each do |msg|
                res = msg.split(@indicator, -1)
                entities << res.last if res.length > 1
            end
        else
            entities = messages
            @input = entities.pop
        end

        # Check to see if the buffer has exceeded capacity, if we're imposing a limit
        if @size_limit && @input.size > @size_limit
            if @indicator && @indicator.respond_to?(:length) # check for regex
                # save enough of the buffer that if one character of the indicator were
                # missing we would match on next extract (very much an edge case) and
                # best we can do with a full buffer. If we were one char short of a
                # delimiter it would be unfortunate
                @input = @input[-(@indicator.length - 1)..-1]
            else
                @input = ''
            end
            raise 'input buffer exceeded limit' if @verbose
        end

        return entities
    end
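The comment about the -1 parameter refers to standard String#split behaviour, which can be checked without the library. A small sketch, with a newline delimiter and sample strings assumed purely for illustration:

```ruby
# Plain-Ruby illustration of split(..., -1); uv-rays is not required.
delimiter = "\n"

complete = "one\ntwo\n"      # ends exactly on a delimiter
partial  = "one\ntwo\nthr"   # ends in the middle of a token

default_split = complete.split(delimiter)       # trailing empty string is dropped
limited_split = complete.split(delimiter, -1)   # trailing empty string is kept

# With -1 the last element is always the unfinished remainder,
# which is what #extract pops back into the buffer.
remainder_complete = limited_split.last
remainder_partial  = partial.split(delimiter, -1).last
```

Because the last element is always the remainder, #extract can pop it unconditionally: it is an empty string when the input ended on a delimiter and the partial token otherwise.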
#flush ⇒ String
Flush the contents of the input buffer, i.e. return the input buffer even though a token has not yet been encountered.
    # File 'lib/uv-rays/buffered_tokenizer.rb', line 86
    def flush
        buffer = @input
        @input = ''
        buffer
    end
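How #extract, #flush and #empty? interact across chunk boundaries can be sketched with a small stand-in. SketchTokenizer below is a hypothetical class that copies only the delimiter-only path from the listings above, so the example runs without the uv-rays gem; it is not the library class and it leaves out the indicator and size limit handling.

```ruby
# Hypothetical stand-in mirroring the delimiter-only behaviour documented above.
class SketchTokenizer
  def initialize(options)
    @delimiter = options[:delimiter]
    raise ArgumentError, 'no delimiter provided' unless @delimiter
    @input = +''
  end

  def extract(data)
    @input << data
    entities = @input.split(@delimiter, -1)
    @input = entities.pop   # unfinished remainder stays buffered
    entities
  end

  def flush
    buffer = @input
    @input = +''
    buffer
  end

  def empty?
    @input.empty?
  end
end

tokenizer = SketchTokenizer.new(delimiter: "\n")

first  = tokenizer.extract("hello\nwor")   # "hello" is complete, "wor" is buffered
second = tokenizer.extract("ld\n")         # completes "world"
third  = tokenizer.extract("tail")         # no delimiter yet, nothing to return

pending  = !tokenizer.empty?               # "tail" is still waiting in the buffer
leftover = tokenizer.flush                 # hands back "tail" and clears the buffer
```

Calling flush is the way to retrieve data that never received its closing delimiter, for example when a connection closes mid-message.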