Class: Trie
Overview
Trie
Description
Implementation of a trie data structure, well-suited for storing and looking up strings or other sequences.
More specifically, this is an implementation of a Patricia trie (en.wikipedia.org/wiki/Patricia_trie).
Usage
require "trie"
# Create a new Trie and insert some values into it.
t = Trie.new
t.insert("the", 1)
t.insert("they", 2)
t.insert("they", 3)
t.insert("their", 4).insert("they're", 5)
# Search for an exact match of "they".
t.find("they").values # => [2, 3]
# Search for a prefix that will match all keys.
t2 = t.find_prefix("th")
puts t2.size # prints 5
# In the sub-Trie beginning with "th", search for the prefix "ey"
# (therefore, getting the three values with keys beginning with "they").
t2.find_prefix("ey").each_value {|v| puts v } # prints 2, 3, and 5
# Now search for "at" in the sub-Trie, which results in an empty Trie
# (as there are no keys beginning with "that").
puts t2.find_prefix("at").empty? # prints true
# Delete all values keyed by "they" (note that this must be performed on
# the root Trie rather than the one returned by Trie.find_prefix -- read
# the "Notes" section to find out why).
t.delete("they")
Notes
Keys are stored internally as Arrays. If you use Strings as keys they will be automatically converted, and when you use a method to access them later you’ll receive them as Arrays instead. For example:
t = Trie.new.insert("abc", 1).insert("def", 2)
t.keys # => [["a", "b", "c"], ["d", "e", "f"]]
t.keys.collect {|k| k.join } # => ["abc", "def"]
(I’m hesitant to add code that will return keys as Strings if the user has only passed in Strings so far.)
Empty nodes are compressed. The strings “row” and “ruby”, which would normally be stored as
''
/
r
/ \
o u
/ \
w b
\
y
are actually stored as
''
/
r
/ \
ow uby
Because of this implementation (and to allow Trie.find to be called on nodes returned by Trie.find that contain compressed elements), Trie.find and Trie.find_prefix will in some (most) cases return Trie objects that are not members of the root Trie. As a result, methods such as Trie.insert, Trie.delete, and Trie.clear should only be called on Trie objects that were directly returned by Trie.new.
Instance Method Summary collapse
-
#[](key) ⇒ Object
Return all of the items matching a key.
-
#clear ⇒ Object
Clear the trie.
-
#delete(key) ⇒ Object
Delete all values with a given key.
-
#delete_pair(key, value) ⇒ Object
Delete a (key, value) pair.
-
#delete_prefix(prefix) ⇒ Object
Delete all values keyed by a given prefix.
-
#delete_value(value) ⇒ Object
Delete all occurences of an value.
-
#each(prefix = []) ⇒ Object
Calls block once for each (key, value) pair in the Trie, passing the the key and value as parameters.
-
#each_key(prefix = []) {|prefix| ... } ⇒ Object
Calls block once for each key in the Trie, passing the key as a parameter.
-
#each_value ⇒ Object
Calls block once for each (key, value) pair in the Trie, passing the value as a parameter.
-
#empty? ⇒ Boolean
Does this Trie contain no values?.
-
#find(key) ⇒ Object
Get a new Trie object containing all values with the passed-in key.
-
#find_prefix(prefix) ⇒ Object
Get a new Trie object containing all values with keys that begin with the passed-in prefix.
-
#initialize ⇒ Trie
constructor
Create a new empty Trie.
-
#insert(key, value) ⇒ Object
Insert an value into this Trie, keyed by the passed-in key, which can be any sort of indexable object.
-
#keys ⇒ Object
Get an Array containing all keys in this Trie.
-
#num_nodes ⇒ Object
Get the number of nodes used to represent this Trie.
-
#size ⇒ Object
Get the number of values contained in this Trie.
-
#values ⇒ Object
Get an Array containing all values in this Trie.
Methods included from Enumerable
#*, #**, #average, #blank?, #combination, #counts, #cross_product, #foldl, #group_neighbours_by, #grouped_to_h, #groups, #map_recursively, #parallel_map, #permutation, #powerset, #reverse, #reverse_each, #rle, #rzip, #select_recursively, #skip, #sort_numerically, #split_after, #split_at, #split_before, #split_between, #sum, #to_iter, #uniq, #unzip
Methods included from Array::ToCSV
Constructor Details
Instance Method Details
#[](key) ⇒ Object
127 128 129 |
# File 'lib/epitools/trie.rb', line 127 def [](key) find(key).values end |
#clear ⇒ Object
138 139 140 141 142 143 144 |
# File 'lib/epitools/trie.rb', line 138 def clear @values.clear @children.clear @compressed_key.clear @compressed_values.clear self end |
#delete(key) ⇒ Object
153 154 155 156 157 158 159 160 161 162 163 164 165 |
# File 'lib/epitools/trie.rb', line 153 def delete(key) key = key.split('') if key.is_a?(String) if key.empty? @values.clear elsif key == @compressed_key @compressed_key.clear @compressed_values.clear elsif @children[key[0]] @children[key[0]].delete(key[1..-1]) @children.delete(key[0]) if @children[key[0]].empty? end self end |
#delete_pair(key, value) ⇒ Object
192 193 194 195 196 197 198 199 200 201 202 203 204 |
# File 'lib/epitools/trie.rb', line 192 def delete_pair(key, value) key = key.split('') if key.is_a?(String) if key.empty? @values.delete(value) elsif key == @compressed_key @compressed_values.delete(value) @compressed_key.clear elsif @children[key[0]] @children[key[0]].delete_pair(key[1..-1], value) @children.delete(key[0]) if @children[key[0]].empty? end self end |
#delete_prefix(prefix) ⇒ Object
213 214 215 216 217 218 219 220 221 222 |
# File 'lib/epitools/trie.rb', line 213 def delete_prefix(prefix) prefix = prefix.split('') if prefix.is_a?(String) if prefix.empty? or prefix == @compressed_key[0...prefix.size] clear elsif @children[prefix[0]] @children[prefix[0]].delete_prefix(prefix[1..-1]) @children.delete(prefix[0]) if @children[prefix[0]].empty? end self end |
#delete_value(value) ⇒ Object
174 175 176 177 178 179 180 181 182 183 |
# File 'lib/epitools/trie.rb', line 174 def delete_value(value) @compressed_values.delete(value) @compressed_key.clear if @compressed_values.empty? @values.delete(value) @children.each do |p, t| t.delete_value(value) @children.delete(p) if t.empty? end self end |
#each(prefix = []) ⇒ Object
Calls block once for each (key, value) pair in the Trie, passing the the key and value as parameters.
Example
t = Trie.new.insert("a", 1).insert("b", 2)
t.each {|k, v| puts "#{k.join()}: #{v} } # prints "a: 1" and "b: 2"
232 233 234 235 236 237 238 239 |
# File 'lib/epitools/trie.rb', line 232 def each(prefix=[]) @values.each {|v| yield(prefix, v) } @compressed_values.each {|v| yield(prefix.dup.concat(@compressed_key), v) } @children.each do |k, t| t.each(prefix.dup.push(k)) {|key, value| yield(key, value) } end self end |
#each_key(prefix = []) {|prefix| ... } ⇒ Object
249 250 251 252 253 254 255 256 |
# File 'lib/epitools/trie.rb', line 249 def each_key(prefix=[]) yield prefix if not @values.empty? yield prefix.dup.concat(@compressed_key) if not @compressed_values.empty? @children.each do |k, t| t.each_key(prefix.dup.push(k)) {|key| yield key } end self end |
#each_value ⇒ Object
266 267 268 269 270 271 |
# File 'lib/epitools/trie.rb', line 266 def each_value @compressed_values.each {|value| yield value } @values.each {|value| yield value } @children.each_value {|t| t.each_value {|value| yield value } } self end |
#empty? ⇒ Boolean
282 283 284 |
# File 'lib/epitools/trie.rb', line 282 def empty? size == 0 end |
#find(key) ⇒ Object
293 294 295 296 297 298 299 300 301 302 303 304 305 |
# File 'lib/epitools/trie.rb', line 293 def find(key) key = key.split('') if key.is_a?(String) if (key.empty? and @compressed_key.empty?) or key == @compressed_key trie = Trie.new @values.each {|v| trie.insert([], v) } @compressed_values.each {|v| trie.insert([], v) } trie elsif @children[key[0]] @children[key[0]].find(key[1..-1]) else Trie.new end end |
#find_prefix(prefix) ⇒ Object
317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 |
# File 'lib/epitools/trie.rb', line 317 def find_prefix(prefix) prefix = prefix.split('') if prefix.is_a?(String) if prefix.empty? self elsif prefix == @compressed_key[0...prefix.size] trie = Trie.new @compressed_values.each do |value| trie.insert(@compressed_key[prefix.size..-1], value) end trie elsif @children[prefix[0]] @children[prefix[0]].find_prefix(prefix[1..-1]) else Trie.new end end |
#insert(key, value) ⇒ Object
342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 |
# File 'lib/epitools/trie.rb', line 342 def insert(key, value) key = key.split('') if key.is_a?(String) if key != @compressed_key @compressed_values.each {|v| insert_in_child(@compressed_key, v) } @compressed_values.clear @compressed_key.clear end if key.empty? @values.add(value) elsif (@values.empty? and @children.empty?) or key == @compressed_key @compressed_key = key.dup @compressed_values.add(value) else insert_in_child(key, value) end self end |
#keys ⇒ Object
378 379 380 381 382 |
# File 'lib/epitools/trie.rb', line 378 def keys a = [] each_key {|key| a.push(key) } a end |
#num_nodes ⇒ Object
Get the number of nodes used to represent this Trie.
This is only useful for testing.
389 390 391 392 393 |
# File 'lib/epitools/trie.rb', line 389 def num_nodes node_count = 1 @children.each {|p, t| node_count += t.num_nodes } node_count end |
#size ⇒ Object
402 403 404 405 406 |
# File 'lib/epitools/trie.rb', line 402 def size child_count = 0 @children.each_value {|t| child_count += t.size } @compressed_values.size + @values.size + child_count end |
#values ⇒ Object
416 417 418 419 420 |
# File 'lib/epitools/trie.rb', line 416 def values a = [] each_value {|value| a.push(value) } a end |