Class: UV::BufferedTokenizer

Inherits:

Object

Defined in: lib/uv-rays/buffered_tokenizer.rb

Instance Attribute Summary

#delimiter ⇒ Object

Returns the value of attribute delimiter.
#indicator ⇒ Object

Returns the value of attribute indicator.
#size_limit ⇒ Object

Returns the value of attribute size_limit.
#verbose ⇒ Object

Returns the value of attribute verbose.

Instance Method Summary

#empty? ⇒ Boolean
#extract(data) ⇒ Object

Extract takes an arbitrary string of input data and returns an array of tokenized entities, provided there were any available to extract.
#flush ⇒ String

Flush the contents of the input buffer, i.e. return the input buffer even though a token has not yet been encountered.
#initialize(options) ⇒ BufferedTokenizer constructor

A new instance of BufferedTokenizer.

Constructor Details

#initialize(options) ⇒ `BufferedTokenizer`

Returns a new instance of BufferedTokenizer.

Parameters:

options (Hash)

Raises:

(ArgumentError)

# File 'lib/uv-rays/buffered_tokenizer.rb', line 22

def initialize(options)
    @delimiter  = options[:delimiter]
    @indicator  = options[:indicator]
    @size_limit = options[:size_limit]
    @verbose    = options[:verbose] if @size_limit

    raise ArgumentError, 'no delimiter provided' unless @delimiter

    @input = ''
end
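
The following sketch shows how the constructor treats its options. It assumes nothing beyond the source listed above: `SketchTokenizer` is a hypothetical stand-in that copies the documented constructor so the snippet runs without the gem, and the option values are illustrative.

```ruby
# Hypothetical stand-in that mirrors the constructor listed above,
# so the example runs without loading uv-rays.
class SketchTokenizer
  attr_reader :delimiter, :indicator, :size_limit, :verbose

  def initialize(options)
    @delimiter  = options[:delimiter]
    @indicator  = options[:indicator]
    @size_limit = options[:size_limit]
    @verbose    = options[:verbose] if @size_limit

    raise ArgumentError, 'no delimiter provided' unless @delimiter

    @input = +''
  end
end

# :verbose is only stored when :size_limit is also given.
limited   = SketchTokenizer.new(delimiter: "\r\n", size_limit: 1024, verbose: true)
unlimited = SketchTokenizer.new(delimiter: "\r\n", verbose: true)

limited.verbose    # true
unlimited.verbose  # nil, because no :size_limit was supplied

# Omitting :delimiter raises ArgumentError.
begin
  SketchTokenizer.new(indicator: "\x02")
rescue ArgumentError => e
  missing_delimiter_error = e.message
end
```

Judging by the listing, `:delimiter` is the only required key; `:indicator` and `:size_limit` are optional, and `:verbose` has an effect only together with `:size_limit`.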

Instance Attribute Details

#delimiter ⇒ `Object`

Returns the value of attribute delimiter.



# File 'lib/uv-rays/buffered_tokenizer.rb', line 19

def delimiter
  @delimiter
end

#indicator ⇒ `Object`

Returns the value of attribute indicator.



# File 'lib/uv-rays/buffered_tokenizer.rb', line 19

def indicator
  @indicator
end

#size_limit ⇒ `Object`

Returns the value of attribute size_limit.



# File 'lib/uv-rays/buffered_tokenizer.rb', line 19

def size_limit
  @size_limit
end

#verbose ⇒ `Object`

Returns the value of attribute verbose.



# File 'lib/uv-rays/buffered_tokenizer.rb', line 19

def verbose
  @verbose
end

Instance Method Details

#empty? ⇒ `Boolean`

Returns:

(Boolean)



# File 'lib/uv-rays/buffered_tokenizer.rb', line 93

def empty?
    @input.empty?
end

#extract(data) ⇒ `Object`

Extract takes an arbitrary string of input data and returns an array of tokenized entities, provided there were any available to extract.

Examples:


tokenizer.extract(data).
    map { |entity| Decode(entity) }.each { ... }

Parameters:

data (String)

# File 'lib/uv-rays/buffered_tokenizer.rb', line 42

def extract(data)
    @input << data

    # Extract token-delimited entities from the input string with the split command.
    # There's a bit of craftiness here with the -1 parameter. Normally split would
    # behave no differently regardless of whether the token lies at the very end of
    # the input buffer or not (i.e. a literal edge case). Specifying -1 forces split
    # to return "" in this case, meaning that the last entry in the list represents
    # a new segment of data where the token has not been encountered.
    messages = @input.split(@delimiter, -1)

    if @indicator
        @input = messages.pop
        entities = []
        messages.each do |msg|
            res = msg.split(@indicator, -1)
            entities << res.last if res.length > 1
        end
    else
        entities = messages
        @input = entities.pop
    end

    # Check to see if the buffer has exceeded capacity, if we're imposing a limit
    if @size_limit && @input.size > @size_limit
        if @indicator && @indicator.respond_to?(:length) # check for regex
            # save enough of the buffer that if one character of the indicator were
            # missing we would match on next extract (very much an edge case) and
            # best we can do with a full buffer. If we were one char short of a
            # delimiter it would be unfortunate
            @input = @input[-(@indicator.length - 1)..-1]
        else
            @input = ''
        end
        raise 'input buffer exceeded limit' if @verbose
    end

    return entities
end
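
The three mechanisms inside `#extract` can be tried in isolation with plain Ruby strings. This is a minimal sketch rather than the library itself; the delimiter, the indicator byte, and the sample data are assumptions chosen for illustration.

```ruby
delimiter = "\n"

# 1. split with a limit of -1 keeps the trailing empty string, so the last
#    element is always the unfinished remainder of the buffer.
complete = "one\ntwo\n".split(delimiter, -1)  # ["one", "two", ""]
partial  = "one\ntw".split(delimiter, -1)     # ["one", "tw"]
default  = "one\ntwo\n".split(delimiter)      # ["one", "two"], trailing "" dropped

# 2. Buffering across chunks: pop the remainder and carry it forward.
buffer = +''
tokens = []
["one\ntw", "o\nthr"].each do |chunk|
  buffer << chunk
  parts  = buffer.split(delimiter, -1)
  buffer = parts.pop
  tokens.concat(parts)
end
# tokens is ["one", "two"]; buffer holds "thr" until more data arrives

# 3. With an indicator, only text after the last indicator in each message is
#    kept, and messages that never contain the indicator are discarded.
indicator = "\x02" # hypothetical start-of-message marker
entities = []
["noise\x02hello", "no marker here", "a\x02b\x02world"].each do |msg|
  res = msg.split(indicator, -1)
  entities << res.last if res.length > 1
end
# entities is ["hello", "world"]

# 4. On overflow with a string indicator, the buffer keeps its last
#    (indicator.length - 1) characters so a partially received indicator
#    can still match on the next call.
long_indicator = 'START'
overflowed     = 'xxxxxSTAR'
kept_tail      = overflowed[-(long_indicator.length - 1)..-1]  # "STAR"
```

When `verbose` is set, the overflow branch also raises `'input buffer exceeded limit'` after the buffer has been trimmed, so callers that enable it should be prepared to rescue that error.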

#flush ⇒ `String`

Flush the contents of the input buffer, i.e. return the input buffer even though a token has not yet been encountered.

Returns:

(String)

# File 'lib/uv-rays/buffered_tokenizer.rb', line 86

def flush
    buffer = @input
    @input = ''
    buffer
end
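
As a rough illustration of how `#extract`, `#empty?`, and `#flush` interact, the sketch below uses a hypothetical `SketchBuffer` class that copies only the delimiter-only path of the source above (no indicator, no size limit). The input strings are made up for the example.

```ruby
# Hypothetical stand-in covering only the delimiter-only path shown above.
class SketchBuffer
  def initialize(delimiter)
    @delimiter = delimiter
    @input = +''
  end

  # Append data, return complete tokens, keep the unfinished tail buffered.
  def extract(data)
    @input << data
    entities = @input.split(@delimiter, -1)
    @input = entities.pop
    entities
  end

  # Hand back whatever is buffered and reset the buffer.
  def flush
    buffer = @input
    @input = +''
    buffer
  end

  def empty?
    @input.empty?
  end
end

buf      = SketchBuffer.new("\n")
lines    = buf.extract("GET /\nHost: exa")  # ["GET /"]
pending  = buf.empty?                       # false, "Host: exa" is still buffered
leftover = buf.flush                        # "Host: exa"
drained  = buf.empty?                       # true
```

A typical use of `#flush` is at connection close, where any buffered partial message is either processed as-is or discarded.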
