Class: Guesslanguage
- Inherits:
- Object
- Defined in:
- lib/deplate/guesslanguage.rb
Overview
This is ported from/based on:
- Title: Guess language of text using ZIP
- Submitter: Dirk Holtwick
- Last Updated: 2004/12/07
- Version no: 1.2
- Category: Algorithms
aspn.activestate.com/ASPN/Cookbook/Python/Recipe/355807
www.heise.de/newsticker/data/wst-28.01.02-003/
xxx.uni-augsburg.de/format/cond-mat/0108530
Instance Method Summary
- #guess(part) ⇒ Object
- #guess_with_diff(part) ⇒ Object
  Compares the text <part> with the registered corpora and returns the normalized size difference of the best match together with what you defined as <name> in the registration process.
- #initialize ⇒ Guesslanguage (constructor)
  A new instance of Guesslanguage.
- #register(name, corpus) ⇒ Object
  Registers a text as the corpus for a language or author.
- #zip(text) ⇒ Object
Constructor Details
#initialize ⇒ Guesslanguage
Returns a new instance of Guesslanguage.
# File 'lib/deplate/guesslanguage.rb', line 21
def initialize
  @data = []
end
Instance Method Details
#guess(part) ⇒ Object
# File 'lib/deplate/guesslanguage.rb', line 53
def guess(part)
  diff, lang = guess_with_diff(part)
  lang
end
#guess_with_diff(part) ⇒ Object
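<part> is a text that will be compared with the registered corpora. The function returns a two-element array: the smallest growth in compressed size divided by the length of <part>, and what you defined as <name> in the registration process for that corpus.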
# File 'lib/deplate/guesslanguage.rb', line 40
def guess_with_diff(part)
  what = nil
  diff = nil
  for name, corpus, ziplen in @data
    nz = zip(corpus + part).size - ziplen
    if diff.nil? or nz < diff
      what = name
      diff = nz
    end
  end
  return [diff.to_f/part.size, what]
end
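The comparison in #guess_with_diff can be tried out in isolation. The following is a minimal sketch of the same idea using Zlib directly, not the deplate code itself: the helper names `deflated_size` and `extra_bytes` and the two tiny corpora are made up for illustration. It measures how many additional compressed bytes a sample costs when appended to each corpus; the corpus that already contains similar text should need fewer.

```ruby
require 'zlib'

# Hypothetical helper: size in bytes of the deflated text.
def deflated_size(text)
  Zlib::Deflate.deflate(text).size
end

# Hypothetical helper: extra compressed bytes needed once `part`
# is appended to `corpus` (the "nz" value in #guess_with_diff).
def extra_bytes(corpus, part)
  deflated_size(corpus + part) - deflated_size(corpus)
end

# Illustrative corpora; real ones would be much longer.
ENGLISH = 'the quick brown fox jumps over the lazy dog and then runs through the forest. '
GERMAN  = 'der schnelle braune fuchs springt ueber den faulen hund und rennt durch den wald. '
SAMPLE  = 'the quick brown fox jumps over the lazy dog'

{ 'english' => ENGLISH, 'german' => GERMAN }.each do |name, corpus|
  delta = extra_bytes(corpus, SAMPLE)
  # Dividing by the sample length mirrors diff.to_f/part.size above.
  puts format('%-8s delta=%d normalized=%.3f', name, delta, delta.to_f / SAMPLE.size)
end
```

Because the sample occurs verbatim in the English corpus, the compressor can encode it as a back-reference, so the English delta should come out smaller. With corpora this short the margin on other samples would be unreliable.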
#register(name, corpus) ⇒ Object
Registers a text as the corpus for a language or author. <name> may also be a function or whatever you need to handle the result.
# File 'lib/deplate/guesslanguage.rb', line 32
def register(name, corpus)
  ziplen = zip(corpus).size
  @data << [name, corpus, ziplen]
end
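As a usage sketch of the remark that <name> may be a function: the snippet below registers two lambdas as names and calls whichever one #guess returns. The class body is a lightly restyled stand-in adapted from the listings on this page, included only so the snippet runs without deplate on the load path. The corpora, the sample, and the constant names are illustrative assumptions.

```ruby
require 'zlib'

# Stand-in adapted from the method listings on this page.
class Guesslanguage
  def initialize
    @data = []
  end

  def zip(text)
    Zlib::Deflate.new.deflate(text, Zlib::FINISH)
  end

  def register(name, corpus)
    @data << [name, corpus, zip(corpus).size]
  end

  def guess_with_diff(part)
    what = nil
    diff = nil
    @data.each do |name, corpus, ziplen|
      nz = zip(corpus + part).size - ziplen
      if diff.nil? || nz < diff
        what = name
        diff = nz
      end
    end
    [diff.to_f / part.size, what]
  end

  def guess(part)
    guess_with_diff(part).last
  end
end

GUESSER = Guesslanguage.new
# The "names" are lambdas here (an illustrative choice); #guess simply
# hands back whichever object was registered with the best-matching corpus.
GUESSER.register(->(text) { "EN: #{text}" },
                 'the quick brown fox jumps over the lazy dog and then runs through the forest. ')
GUESSER.register(->(text) { "DE: #{text}" },
                 'der schnelle braune fuchs springt ueber den faulen hund und rennt durch den wald. ')

SAMPLE_TEXT = 'the quick brown fox jumps over the lazy dog'
handler = GUESSER.guess(SAMPLE_TEXT)
puts handler.call(SAMPLE_TEXT)
```

Since the class never inspects <name>, any object works: a symbol, a string, or a callable as above.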
#zip(text) ⇒ Object
# File 'lib/deplate/guesslanguage.rb', line 25
def zip(text)
  Zlib::Deflate.new.deflate(text, Zlib::FINISH)
end
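Despite its name, #zip does not create a ZIP archive; it returns the input as a zlib-deflated string, and only the size of that string is used elsewhere in the class. A small sketch of the call on its own, assuming `require 'zlib'` has been done (the helper name `zip_text` and the constants are hypothetical):

```ruby
require 'zlib'

# Hypothetical stand-alone copy of the #zip call shown above.
def zip_text(text)
  Zlib::Deflate.new.deflate(text, Zlib::FINISH)
end

REPETITIVE = 'abc' * 1000
PACKED = zip_text(REPETITIVE)

# Highly repetitive input shrinks to a small fraction of its size.
puts "#{REPETITIVE.size} bytes in, #{PACKED.size} bytes out"

# The result is a complete zlib stream, so Zlib::Inflate restores the input.
puts Zlib::Inflate.inflate(PACKED) == REPETITIVE
```

The one-shot form `Zlib::Deflate.deflate(text)` should behave the same way and also closes the deflate stream when it is done.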