Class: Tiktoken::Encoding

Inherits:
Object
Defined in:
lib/tiktoken_ruby/encoding.rb

Instance Attribute Details

#name ⇒ Object (readonly)

Returns the value of attribute name.



# File 'lib/tiktoken_ruby/encoding.rb', line 4

def name
  @name
end

Class Method Details

.for_name(encoding) ⇒ Tiktoken::Encoding

Returns a new Tiktoken::Encoding instance for the requested encoding.

Parameters:

  • encoding (Symbol, String)

    The name of the encoding to load

Returns:

  • (Tiktoken::Encoding)



# File 'lib/tiktoken_ruby/encoding.rb', line 9

def self.for_name(encoding)
  Tiktoken::Encoding.new(Tiktoken::BpeFactory.send(encoding.to_sym), encoding.to_sym)
end
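
The dispatch above relies on `send` with the encoding name, so an unknown name surfaces as a NoMethodError from the factory. The following is a minimal sketch of that pattern using stand-in classes; `DemoFactory`, `DemoEncoding`, and the `respond_to?` guard are hypothetical and are not part of tiktoken_ruby.

```ruby
# Stand-in for Tiktoken::BpeFactory (hypothetical). The real factory
# returns native BPE objects; here a symbol is used as a placeholder.
module DemoFactory
  def self.cl100k_base
    :cl100k_base_bpe
  end
end

# Stand-in for Tiktoken::Encoding, with the same constructor shape
# (bpe object, then name) that the documented for_name passes to new.
class DemoEncoding
  attr_reader :name

  def self.for_name(encoding)
    key = encoding.to_sym
    # Illustrative guard only: the documented method calls `send` directly.
    raise ArgumentError, "unknown encoding: #{key}" unless DemoFactory.respond_to?(key)

    new(DemoFactory.public_send(key), key)
  end

  def initialize(bpe, name)
    @bpe = bpe
    @name = name
  end
end

enc = DemoEncoding.for_name("cl100k_base") # a String works because of to_sym
puts enc.name.inspect
```

Whether such a guard is worthwhile depends on whether callers pass untrusted names; with a fixed set of symbols the direct `send` is simpler.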

.for_name_cached(encoding) ⇒ Tiktoken::Encoding

Returns a Tiktoken::Encoding instance for the requested encoding, reusing an existing instance if that encoding has already been loaded.

Parameters:

  • encoding (Symbol, String)

    The name of the encoding to load

Returns:

  • (Tiktoken::Encoding)



# File 'lib/tiktoken_ruby/encoding.rb', line 17

def self.for_name_cached(encoding)
  @encodings ||= {}
  @encodings[encoding.to_sym] ||= Tiktoken::Encoding.for_name(encoding)
end
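
The caching here is plain `||=` memoization in a class-level hash keyed by the symbolized name. The sketch below reproduces that pattern with a stand-in class so the effect can be observed without the native extension; `CachedDemo` and its `loads` counter are hypothetical and exist only for the demonstration.

```ruby
# Stand-in class (hypothetical) that mirrors the memoization shown above.
class CachedDemo
  attr_reader :name

  @loads = 0 # counts how often the uncached loader runs (demo only)

  class << self
    attr_reader :loads
  end

  def self.for_name(encoding)
    @loads += 1
    new(encoding.to_sym)
  end

  def self.for_name_cached(encoding)
    @encodings ||= {}
    # Keying on to_sym means "cl100k_base" and :cl100k_base share one entry.
    @encodings[encoding.to_sym] ||= for_name(encoding)
  end

  def initialize(name)
    @name = name
  end
end

a = CachedDemo.for_name_cached(:cl100k_base)
b = CachedDemo.for_name_cached("cl100k_base")
puts a.equal?(b)      # true: the same object is returned
puts CachedDemo.loads # 1: the loader ran only once
```

Note that the hash is not guarded by a lock, so two threads calling it at the same moment could each run the loader once before the entry is stored.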

Instance Method Details

#decode(tokens) ⇒ String

Decodes the tokens back into text

Parameters:

  • tokens (Array<Integer>)

    The tokens to decode

Returns:

  • (String)

    The decoded text



# File 'lib/tiktoken_ruby/encoding.rb', line 42

def decode(tokens)
  @ext_base_bpe.decode(tokens)
end

#encode(text, allowed_special: []) ⇒ Array<Integer>

Encodes the text as a list of integer tokens. Special (non-text) tokens that appear in the text are treated as ordinary text unless they are listed in the allowed_special array, as if the text had been escaped.

Parameters:

  • text (String)

    The text to encode

  • allowed_special (Array<String>) (defaults to: [])

    An array of special tokens to allow

Returns:

  • (Array<Integer>)

    The encoded tokens



# File 'lib/tiktoken_ruby/encoding.rb', line 35

def encode(text, allowed_special: [])
  @ext_base_bpe.encode(text, allowed_special)
end
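
To make the effect of allowed_special concrete, here is a toy illustration. It is not the real BPE algorithm: `ToyEncoding` is hypothetical, uses raw byte values 0..255 as ordinary tokens, and assigns the made-up id 256 to a single special token. Only the escaping behaviour described above is modelled.

```ruby
# Toy byte-level tokenizer (hypothetical, NOT the tiktoken BPE).
class ToyEncoding
  SPECIAL = { "<|endoftext|>" => 256 }.freeze # id 256 is made up for the demo

  def encode(text, allowed_special: [])
    allowed = allowed_special & SPECIAL.keys
    # With nothing allowed, special-token text is encoded like any other text.
    return text.bytes if allowed.empty?

    # The capture group makes split keep the matched special tokens.
    text.split(/(#{Regexp.union(allowed)})/).flat_map do |piece|
      allowed.include?(piece) ? [SPECIAL[piece]] : piece.bytes
    end
  end

  def decode(tokens)
    inverse = SPECIAL.invert
    tokens.map { |t| inverse.fetch(t) { [t].pack("C") } }
          .join
          .force_encoding(Encoding::UTF_8)
  end
end

toy = ToyEncoding.new
text = "hi<|endoftext|>"
puts toy.encode(text).inspect
puts toy.encode(text, allowed_special: ["<|endoftext|>"]).inspect # [104, 105, 256]
```

In the first call the marker is spelled out byte by byte; in the second it collapses to the single special id. Both token lists decode back to the same string.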

#encode_ordinary(text) ⇒ Array<Integer>

Encodes the text as a list of integer tokens without any special-token handling: special (non-text) tokens that appear in the input are encoded as ordinary text.

Parameters:

  • text (String)

    The text to encode

Returns:

  • (Array<Integer>)

    The encoded tokens



# File 'lib/tiktoken_ruby/encoding.rb', line 26

def encode_ordinary(text)
  @ext_base_bpe.encode_ordinary(text)
end