Class: Tokkens::Tokens

Inherits:
Object
Defined in:
lib/tokkens/tokens.rb

Constructor Details

#initialize(offset: 1) ⇒ Tokens

Returns a new instance of Tokens.


# File 'lib/tokkens/tokens.rb', line 12

def initialize(offset: 1)
  # liblinear can't use offset 0; libsvm doesn't mind starting at one
  @tokens = {}
  @offset = offset
  @counter = offset
  @frozen = false
end

Instance Attribute Details

#offset ⇒ Fixnum

Returns the number of the first token.

Returns:

  • (Fixnum)

    Number of the first token.


# File 'lib/tokkens/tokens.rb', line 10

def offset
  @offset
end

Instance Method Details

#find(i, prefix: nil) ⇒ String, NilClass

Return a token by number.

This class is optimized for lookup by token, not by number, so this method scans all tokens until it finds a match.

Parameters:

  • i (Fixnum)

    number to return token for

  • prefix (String) (defaults to: nil)

    optional string to remove from beginning of token

Returns:

  • (String, NilClass)

    given token, or nil when not found


# File 'lib/tokkens/tokens.rb', line 78

def find(i, prefix: nil)
  @tokens.each do |s, data|
    if data[0] == i
      return (prefix && s.start_with?(prefix)) ? s[prefix.length..-1] : s
    end
  end
  nil
end
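
The scan and the prefix handling above can be tried in isolation. The following is a minimal sketch, assuming the internal storage is a Hash that maps each token string to a `[number, count]` pair, as the source suggests. The helper `find_token` and the sample data are hypothetical and not part of the library.

```ruby
# Hypothetical sample data in the assumed storage layout: token => [number, count]
tokens = {
  'title:hello' => [1, 3],
  'title:world' => [2, 1],
  'body:hello'  => [3, 5]
}

# Linear scan by number, mirroring the logic of #find.
def find_token(tokens, i, prefix: nil)
  tokens.each do |s, data|
    next unless data[0] == i
    # The prefix is stripped only when the token actually starts with it.
    return (prefix && s.start_with?(prefix)) ? s[prefix.length..-1] : s
  end
  nil
end

p find_token(tokens, 2)                    # "title:world"
p find_token(tokens, 2, prefix: 'title:')  # "world"
p find_token(tokens, 3, prefix: 'title:')  # "body:hello" (prefix does not match)
p find_token(tokens, 9)                    # nil
```

Because every call walks the hash, code that needs many reverse lookups may prefer to build its own number-to-token map once.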

#freeze! ⇒ Object

Stop assigning numbers to new tokens.


# File 'lib/tokkens/tokens.rb', line 23

def freeze!
  @frozen = true
end

#frozen? ⇒ Boolean

Returns whether the tokens are frozen.

Returns:

  • (Boolean)

    Whether the tokens are frozen.


# File 'lib/tokkens/tokens.rb', line 37

def frozen?
  @frozen
end

#get(s, **kwargs) ⇒ Fixnum, NilClass

Return a number for a new or existing token.

If the token was seen before, its existing number is returned. If the token is new and this instance isn't #frozen?, a new number is assigned and returned; otherwise nil is returned. A nil or blank token always returns nil.

Parameters:

  • s (String)

    token to return number for

  • kwargs (Hash)

    a customizable set of options

Options Hash (**kwargs):

  • :prefix (String)

    optional string to prepend to the token

Returns:

  • (Fixnum, NilClass)

    number for given token


# File 'lib/tokkens/tokens.rb', line 66

def get(s, **kwargs)
  return unless s and s.strip != ''
  @frozen ? retrieve(s, **kwargs) : upsert(s, **kwargs)
end
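
The interplay between #get, #freeze! and #thaw! may be easier to follow with a small stand-in. The sketch below is not the library's implementation: the class `SketchTokens` is hypothetical, and the bodies of `retrieve` and `upsert` (private helpers whose source is not shown on this page) are assumptions, as is the empty-string default for `prefix`.

```ruby
# Minimal stand-in for illustration only.
class SketchTokens
  def initialize(offset: 1)
    @tokens = {}       # token => [number, count]
    @counter = offset  # next number to hand out
    @frozen = false
  end

  def freeze!
    @frozen = true
  end

  def thaw!
    @frozen = false
  end

  def get(s, prefix: '')
    return unless s && s.strip != ''
    key = "#{prefix}#{s}"
    @frozen ? retrieve(key) : upsert(key)
  end

  private

  # Assumed behaviour: look up only, never assign a number.
  def retrieve(key)
    data = @tokens[key]
    data && data[0]
  end

  # Assumed behaviour: assign the next number on first sight, count every sighting.
  def upsert(key)
    unless @tokens.key?(key)
      @tokens[key] = [@counter, 0]
      @counter += 1
    end
    @tokens[key][1] += 1
    @tokens[key][0]
  end
end

tokens = SketchTokens.new(offset: 1)
p tokens.get('foo')  # 1
p tokens.get('bar')  # 2
p tokens.get('foo')  # 1 (seen before, same number)
tokens.freeze!
p tokens.get('baz')  # nil (frozen, unknown token)
tokens.thaw!
p tokens.get('baz')  # 3
```

Freezing is typically useful once a training set has been tokenized, so that unseen tokens in later data are ignored rather than given numbers the model never saw.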

#indexes ⇒ Array<Fixnum>

Return indexes for all of the current tokens.

Returns:

  • (Array<Fixnum>)

    All current token numbers.


# File 'lib/tokkens/tokens.rb', line 91

def indexes
  @tokens.values.map(&:first)
end

#limit!(count: nil, occurence: nil) ⇒ Fixnum

Limit the number of tokens, keeping the most frequently seen ones. Surviving tokens keep their original numbers.

Parameters:

  • count (Fixnum) (defaults to: nil)

    Maximum number of tokens to retain

  • occurence (Fixnum) (defaults to: nil)

    Keep only tokens seen at least this many times

Returns:

  • (Fixnum)

    Number of tokens left


# File 'lib/tokkens/tokens.rb', line 46

def limit!(count: nil, occurence: nil)
  # @todo raise if frozen
  if occurence
    @tokens.delete_if {|name, data| data[1] < occurence }
  end
  if count
    @tokens = @tokens.to_a.sort_by {|a| -a[1][1] }[0..(count-1)].to_h
  end
  @tokens.length
end
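
The two filters above can be illustrated on plain data. This is a minimal sketch, assuming the storage layout `token => [number, count]` seen in the source. The helper `limit_tokens` is hypothetical and, unlike #limit!, returns a new Hash instead of modifying the receiver.

```ruby
# Hypothetical, non-destructive variant of the filtering done by #limit!.
def limit_tokens(tokens, count: nil, occurence: nil)
  # Drop tokens seen fewer than `occurence` times.
  tokens = tokens.reject { |_name, data| data[1] < occurence } if occurence
  # Keep the `count` most frequent tokens. first(count) also handles count: 0,
  # where a [0..(count-1)] slice would keep everything.
  tokens = tokens.sort_by { |_name, data| -data[1] }.first(count).to_h if count
  tokens
end

sample = { 'a' => [1, 5], 'b' => [2, 1], 'c' => [3, 3], 'd' => [4, 2] }

p limit_tokens(sample, occurence: 2).keys  # ["a", "c", "d"]
p limit_tokens(sample, count: 2)           # {"a"=>[1, 5], "c"=>[3, 3]}
```

Note that the numbers are not reassigned, so the remaining numbering can contain gaps (here 1 and 3).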

#load(filename) ⇒ Object

Load tokens from file.

The tokens are always frozen after loading.

Parameters:

  • filename (String)

    Filename


# File 'lib/tokkens/tokens.rb', line 100

def load(filename)
  File.open(filename) do |f|
    f.each_line do |line|
      id, count, name = line.rstrip.split(/\s+/, 3)
      # store the count as an Integer so that limit! can compare it numerically
      @tokens[name.strip] = [id.to_i, count.to_i]
    end
  end
  # safer
  freeze!
end

#save(filename) ⇒ Object

Save tokens to file.

Parameters:

  • filename (String)

    Filename


# File 'lib/tokkens/tokens.rb', line 114

def save(filename)
  File.open(filename, 'w') do |f|
    @tokens.each do |token, (index, count)|
      f.puts "#{index} #{count} #{token}"
    end
  end
end
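
The file format written above is one token per line: number, count and token, separated by single spaces. The sketch below round-trips that format through a temporary file, assuming the storage layout `token => [number, count]`. The helpers `save_tokens` and `load_tokens` are hypothetical, and the loader converts the count to an Integer so that it can be compared numerically afterwards.

```ruby
require 'tempfile'

# Write one "number count token" line per entry.
def save_tokens(tokens, filename)
  File.open(filename, 'w') do |f|
    tokens.each { |token, (index, count)| f.puts "#{index} #{count} #{token}" }
  end
end

# Read the lines back; the split limit of 3 keeps spaces inside the token intact.
def load_tokens(filename)
  loaded = {}
  File.foreach(filename) do |line|
    id, count, name = line.rstrip.split(/\s+/, 3)
    loaded[name] = [id.to_i, count.to_i]
  end
  loaded
end

sample = { 'hello world' => [1, 3], 'foo' => [2, 1] }

Tempfile.create('tokens') do |f|
  save_tokens(sample, f.path)
  puts File.read(f.path)
  p load_tokens(f.path) == sample  # true
end
```

Since the format is line-based, tokens that contain newlines would not survive this round trip.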

#thaw! ⇒ Object

Allow new tokens to be created.


# File 'lib/tokkens/tokens.rb', line 30

def thaw!
  @frozen = false
end