Class: Guesslanguage

Inherits:
Object
Defined in:
lib/deplate/guesslanguage.rb

Overview

This is ported from, and based on, the following recipe:

  • Title: Guess language of text using ZIP

  • Submitter: Dirk Holtwick

  • Last Updated: 2004/12/07

  • Version no: 1.2

  • Category: Algorithms

aspn.activestate.com/ASPN/Cookbook/Python/Recipe/355807
www.heise.de/newsticker/data/wst-28.01.02-003/
xxx.uni-augsburg.de/format/cond-mat/0108530

Constructor Details

#initialize ⇒ Guesslanguage

Returns a new instance of Guesslanguage.



# File 'lib/deplate/guesslanguage.rb', line 21

def initialize
    @data = []
end

Instance Method Details

#guess(part) ⇒ Object

Returns the <name> registered for the corpus that best matches <part>, discarding the difference score computed by #guess_with_diff.



# File 'lib/deplate/guesslanguage.rb', line 53

def guess(part)
    diff, lang = guess_with_diff(part)
    lang
end

#guess_with_diff(part) ⇒ Object

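<part> is a text that is compared with each registered corpus. The method returns a two-element array: first the number of compressed bytes that appending <part> adds to the best-matching corpus, divided by the length of <part>, and second the <name> given to that corpus at registration.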



# File 'lib/deplate/guesslanguage.rb', line 40

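def guess_with_diff(part)
    what = nil   # name of the best-matching corpus so far
    diff = nil   # smallest growth in compressed size so far
    for name, corpus, ziplen in @data
        # Bytes that appending <part> adds to the compressed corpus.
        nz = zip(corpus + part).size - ziplen
        if diff.nil? or nz < diff
            what = name
            diff = nz
        end
    end
    # Normalize by the length of <part>; the name comes second.
    return [diff.to_f/part.size, what]
end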
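
A minimal usage sketch, assuming the class behaves as in the listings on this page. The class body is condensed from those listings so that the snippet runs on its own. The two tiny corpora and the sample text are made up for illustration; real corpora would need to be far larger for reliable guesses.

```ruby
require 'zlib'

# Condensed copy of the class as listed on this page, so the example is self-contained.
class Guesslanguage
  def initialize
    @data = []
  end

  def zip(text)
    Zlib::Deflate.new.deflate(text, Zlib::FINISH)
  end

  def register(name, corpus)
    @data << [name, corpus, zip(corpus).size]
  end

  def guess_with_diff(part)
    what = nil
    diff = nil
    @data.each do |name, corpus, ziplen|
      nz = zip(corpus + part).size - ziplen
      if diff.nil? || nz < diff
        what = name
        diff = nz
      end
    end
    [diff.to_f / part.size, what]
  end

  def guess(part)
    _diff, lang = guess_with_diff(part)
    lang
  end
end

# Illustrative corpora (assumptions, far too small for real use).
ENGLISH = 'the quick brown fox jumps over the lazy dog and then runs through the dark forest'
GERMAN  = 'der schnelle braune fuchs springt ueber den faulen hund und rennt durch den dunklen wald'

guesser = Guesslanguage.new
guesser.register(:english, ENGLISH)
guesser.register(:german, GERMAN)

sample = 'the lazy dog runs through the forest'
diff, lang = guesser.guess_with_diff(sample)
puts "best match: #{lang}, extra compressed bytes per character: #{diff.round(3)}"
```

A lower first element means the sample added less to the compressed corpus, so it can serve as a rough confidence score when comparing guesses for samples of different lengths.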

#register(name, corpus) ⇒ Object

Registers a text as the corpus for a language or author. <name> may also be a function, or whatever else you need to handle the result.



# File 'lib/deplate/guesslanguage.rb', line 32

def register(name, corpus)
    ziplen = zip(corpus).size
    @data << [name, corpus, ziplen]
end
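
Because <name> is stored and returned untouched, it can be any object, including a callable. The sketch below registers one lambda per corpus and then calls whatever #guess returns. The class body is condensed from the listings on this page so the snippet runs on its own; the constant names, corpora, and handler behaviour are illustrative assumptions.

```ruby
require 'zlib'

# Condensed copy of the class as listed on this page, so the example is self-contained.
class Guesslanguage
  def initialize
    @data = []
  end

  def zip(text)
    Zlib::Deflate.new.deflate(text, Zlib::FINISH)
  end

  def register(name, corpus)
    @data << [name, corpus, zip(corpus).size]
  end

  def guess(part)
    # Pick the entry whose compressed size grows least when <part> is appended.
    best = @data.min_by { |_name, corpus, ziplen| zip(corpus + part).size - ziplen }
    best && best.first
  end
end

# Hypothetical handlers: each one receives the text that was classified.
ON_ENGLISH = ->(text) { "English, #{text.size} characters" }
ON_GERMAN  = ->(text) { "German, #{text.size} characters" }

# Illustrative corpora (assumptions, far too small for real use).
EN_CORPUS = 'the quick brown fox jumps over the lazy dog and then runs through the dark forest'
DE_CORPUS = 'der schnelle braune fuchs springt ueber den faulen hund und rennt durch den dunklen wald'

guesser = Guesslanguage.new
guesser.register(ON_ENGLISH, EN_CORPUS)
guesser.register(ON_GERMAN, DE_CORPUS)

sample  = 'der hund rennt durch den wald'
handler = guesser.guess(sample)   # the lambda registered for the best match
puts handler.call(sample)
```

Here #guess is written with min_by for brevity; like the listed implementation, it keeps the first entry on ties.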

#zip(text) ⇒ Object

Returns <text> compressed with Zlib's deflate algorithm. The size of the result is what #register and #guess_with_diff compare.



# File 'lib/deplate/guesslanguage.rb', line 25

def zip(text)
    Zlib::Deflate.new.deflate(text, Zlib::FINISH)
end
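
The comparison in #guess_with_diff rests on one property of deflate: text that repeats material the compressor has already seen adds only a few bytes, while unfamiliar text adds many more. A small sketch of that effect using Zlib directly; the helper name extra_bytes and the sample strings are illustrative and not part of the library.

```ruby
require 'zlib'

# Number of compressed bytes that appending +part+ adds to +corpus+.
def extra_bytes(corpus, part)
  Zlib::Deflate.deflate(corpus + part).size - Zlib::Deflate.deflate(corpus).size
end

CORPUS    = 'the quick brown fox jumps over the lazy dog and then runs through the dark forest'
UNRELATED = 'zwoelf boxkaempfer jagen viktor quer'   # text with little overlap

repeated = CORPUS[-30, 30]   # text the corpus already contains verbatim

puts "repeated:  +#{extra_bytes(CORPUS, repeated)} bytes"
puts "unrelated: +#{extra_bytes(CORPUS, UNRELATED)} bytes"
```

The repeated text should report a much smaller increase than the unrelated text, which is the signal the class uses to rank corpora.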