Class: Lingua::EN::Readability

Inherits: Object

Defined in: lib/lingua/en/readability.rb

Overview

The class Lingua::EN::Readability takes English text and analyses its formal characteristics.

Instance Attribute Summary

#frequencies ⇒ Object readonly

Returns the value of attribute frequencies.
#paragraphs ⇒ Object readonly

Returns the value of attribute paragraphs.
#sentences ⇒ Object readonly

Returns the value of attribute sentences.
#text ⇒ Object readonly

Returns the value of attribute text.
#words ⇒ Object readonly

Returns the value of attribute words.

Instance Method Summary

#flesch ⇒ Object

Flesch reading ease of the text sample.
#fog ⇒ Object

The Gunning Fog Index of the text sample.
#initialize(text) ⇒ Readability constructor

The constructor accepts the text to be analysed and returns a report object that gives access to the readability statistics for that text.
#kincaid ⇒ Object

Flesch-Kincaid level of the text sample.
#num_chars ⇒ Object (also: #num_characters)

The number of characters in the sample.
#num_paragraphs ⇒ Object

The number of paragraphs in the sample.
#num_sentences ⇒ Object

The number of sentences in the sample.
#num_syllables ⇒ Object

The total number of syllables in the text sample.
#num_unique_words ⇒ Object

The number of unique words used in the text sample.
#num_words ⇒ Object

The total number of words used in the sample.
#occurrences(word) ⇒ Object

The number of occurrences of `word` in the text sample.
#percent_fog_complex_words ⇒ Object

The percentage of words that are defined as “complex” for the purpose of the Fog Index.
#report ⇒ Object

Returns a nicely formatted report on the sample, showing most of the useful statistics about the text sample.
#syllables_per_word ⇒ Object

The average number of syllables per word.
#unique_words ⇒ Object

An array containing each unique word used in the text sample.
#words_per_sentence ⇒ Object

The average number of words per sentence.

Constructor Details

#initialize(text) ⇒ `Readability`

The constructor accepts the text to be analysed and returns a report object that gives access to the readability statistics for that text.

# File 'lib/lingua/en/readability.rb', line 10

def initialize(text)
  @text                = text.dup
  @paragraphs          = Lingua::EN::Paragraph.paragraphs(self.text)
  @sentences           = Lingua::EN::Sentence.sentences(self.text)
  @words               = []
  @frequencies         = {}
  @frequencies.default = 0
  @syllables           = 0
  @complex_words       = 0
  count_words
end

Instance Attribute Details

#frequencies ⇒ `Object` (readonly)

Returns the value of attribute frequencies.

# File 'lib/lingua/en/readability.rb', line 6

def frequencies
  @frequencies
end

#paragraphs ⇒ `Object` (readonly)

Returns the value of attribute paragraphs.

# File 'lib/lingua/en/readability.rb', line 6

def paragraphs
  @paragraphs
end

#sentences ⇒ `Object` (readonly)

Returns the value of attribute sentences.

# File 'lib/lingua/en/readability.rb', line 6

def sentences
  @sentences
end

#text ⇒ `Object` (readonly)

Returns the value of attribute text.

# File 'lib/lingua/en/readability.rb', line 6

def text
  @text
end

#words ⇒ `Object` (readonly)

Returns the value of attribute words.

# File 'lib/lingua/en/readability.rb', line 6

def words
  @words
end

Instance Method Details

#flesch ⇒ `Object`

Flesch reading ease of the text sample. A higher score indicates text that is easier to read. The score is on a 100-point scale, and a score of 60-70 is regarded as optimal for ordinary text.

# File 'lib/lingua/en/readability.rb', line 90

def flesch
  206.835 - (1.015 * words_per_sentence) - (84.6 * syllables_per_word)
end
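
To see how the two averages drive the score, the same arithmetic can be tried on raw counts. The helper `flesch_from_counts` below is hypothetical (it is not part of this class) and only mirrors the formula used by #flesch:

```ruby
# Hypothetical helper, not part of Lingua::EN::Readability.
# Applies the same arithmetic as #flesch to raw counts.
def flesch_from_counts(words, sentences, syllables)
  words_per_sentence = words.to_f / sentences
  syllables_per_word = syllables.to_f / words
  206.835 - (1.015 * words_per_sentence) - (84.6 * syllables_per_word)
end

# 100 words in 5 sentences with 150 syllables:
# 20 words per sentence and 1.5 syllables per word.
score = flesch_from_counts(100, 5, 150)
puts format("%.1f", score)
```

With these counts the score is about 59.6, just under the 60-70 band. The syllable term carries the larger coefficient, so shorter words raise the score faster than shorter sentences do.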

#fog ⇒ `Object`

The Gunning Fog Index of the text sample. The index indicates the number of years of formal education that a reader of average intelligence would need to comprehend the text. A higher score indicates harder text; a value of around 12 is regarded as ideal for ordinary text.

# File 'lib/lingua/en/readability.rb', line 98

def fog
  ( words_per_sentence +  percent_fog_complex_words ) * 0.4
end
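
As a rough illustration, the index can be computed from raw counts. The helper `fog_from_counts` is hypothetical and assumes the number of complex words is already known:

```ruby
# Hypothetical helper, not part of Lingua::EN::Readability.
# Applies the same arithmetic as #fog to raw counts.
def fog_from_counts(words, sentences, complex_words)
  words_per_sentence = words.to_f / sentences
  percent_complex    = (complex_words.to_f / words) * 100
  (words_per_sentence + percent_complex) * 0.4
end

# 100 words in 5 sentences, 10 of them complex:
# (20 words per sentence + 10 percent complex) * 0.4
index = fog_from_counts(100, 5, 10)
puts format("%.1f", index)
```

These counts give an index of 12.0. Sentence length and the share of complex words are weighted equally, so one extra word per sentence costs as much as one extra percentage point of complex words.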

#kincaid ⇒ `Object`

Flesch-Kincaid level of the text sample. This measure scores text based on the American school grade system; a score of 7.0 would indicate that the text is readable by a seventh grader. A score of 7.0 to 8.0 is regarded as optimal for ordinary text.

# File 'lib/lingua/en/readability.rb', line 83

def kincaid
  (11.8 * syllables_per_word) +  (0.39 * words_per_sentence) - 15.59
end
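
The grade level uses the same two averages as the Flesch score, with different weights. A minimal sketch on raw counts, using a hypothetical helper `kincaid_from_counts` that is not part of this class:

```ruby
# Hypothetical helper, not part of Lingua::EN::Readability.
# Applies the same arithmetic as #kincaid to raw counts.
def kincaid_from_counts(words, sentences, syllables)
  syllables_per_word = syllables.to_f / words
  words_per_sentence = words.to_f / sentences
  (11.8 * syllables_per_word) + (0.39 * words_per_sentence) - 15.59
end

# 100 words in 5 sentences with 150 syllables:
# (11.8 * 1.5) + (0.39 * 20) - 15.59
grade = kincaid_from_counts(100, 5, 150)
puts format("%.2f", grade)
```

These counts give a grade level of about 9.91, above the 7.0 to 8.0 range described above.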

#num_chars ⇒ `Object` Also known as: num_characters

The number of characters in the sample.

# File 'lib/lingua/en/readability.rb', line 35

def num_chars
  text.length
end

#num_paragraphs ⇒ `Object`

The number of paragraphs in the sample. Paragraphs are separated by a newline followed by one or more empty or whitespace-only lines.

# File 'lib/lingua/en/readability.rb', line 24

def num_paragraphs
  paragraphs.length
end
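
The paragraph rule described above can be sketched with a regular expression. The helper `split_paragraphs` is hypothetical; the actual splitting is done by Lingua::EN::Paragraph, whose implementation is not shown here and may differ:

```ruby
# Hypothetical helper: breaks text wherever a newline is followed by one or
# more empty or whitespace-only lines. Lingua::EN::Paragraph may differ.
def split_paragraphs(text)
  text.split(/\n(?:[ \t]*\n)+/).map(&:strip).reject(&:empty?)
end

sample = "First paragraph,\nstill the first.\n\nSecond paragraph.\n   \n\nThird."
paragraphs = split_paragraphs(sample)
puts paragraphs.length
```

A single newline does not start a new paragraph in this sketch, so the sample yields three paragraphs and the first one keeps its internal line break.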

#num_sentences ⇒ `Object`

The number of sentences in the sample. The meaning of a “sentence” is defined by Lingua::EN::Sentence.

# File 'lib/lingua/en/readability.rb', line 30

def num_sentences
  sentences.length
end

#num_syllables ⇒ `Object`

The total number of syllables in the text sample. Just for completeness.

# File 'lib/lingua/en/readability.rb', line 47

def num_syllables
  @syllables
end

#num_unique_words ⇒ `Object`

The number of unique words used in the text sample.

# File 'lib/lingua/en/readability.rb', line 52

def num_unique_words
  @frequencies.keys.length
end

#num_words ⇒ `Object`

The total number of words used in the sample. Numbers as digits are not counted.

# File 'lib/lingua/en/readability.rb', line 42

def num_words
  words.length
end

#occurrences(word) ⇒ `Object`

The number of occurrences of `word` in the text sample.

# File 'lib/lingua/en/readability.rb', line 62

def occurrences(word)
  @frequencies[word]
end
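
Because the constructor sets the default value of the frequencies hash to 0, looking up a word that never appeared returns 0 rather than nil. The sketch below shows that behaviour on a plain hash; how the class itself fills the hash (for example, whether it normalises case) is not shown in this listing, so the counting loop is an assumption:

```ruby
# Stand-in for @frequencies: a hash whose missing keys read as 0.
frequencies = Hash.new(0)

# Assumed counting step: one increment per word seen.
%w[the cat saw the dog].each { |word| frequencies[word] += 1 }

puts frequencies["the"]         # a word seen twice
puts frequencies["bird"]        # a word never seen
puts frequencies.keys.length    # the lookup above did not add "bird"
```

Reading a missing key through the default does not store it, so the count of unique words is unaffected by lookups.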

#percent_fog_complex_words ⇒ `Object`

The percentage of words that are defined as “complex” for the purpose of the Fog Index. These are non-hyphenated words of three or more syllables.

# File 'lib/lingua/en/readability.rb', line 104

def percent_fog_complex_words
  ( @complex_words.to_f / words.length.to_f ) * 100
end
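
The definition of a complex word can be illustrated with a small sketch. The syllable counts below are assigned by hand for the example; the class obtains them from Lingua::EN::Syllable, and its exact classification logic is not shown in this listing:

```ruby
# Hand-assigned syllable counts, for illustration only.
syllable_counts = {
  "readability"      => 5,
  "is"               => 1,
  "well-intentioned" => 4,
  "measured"         => 2,
  "automatically"    => 6
}

# Complex for Fog purposes: three or more syllables and no hyphen.
complex = syllable_counts.count { |word, n| n >= 3 && !word.include?("-") }
percent = (complex.to_f / syllable_counts.size) * 100

puts complex
puts format("%.1f", percent)
```

Here two of the five words qualify, giving 40.0 percent. "well-intentioned" has enough syllables but is excluded because it is hyphenated.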

#report ⇒ `Object`

Returns a nicely formatted report on the sample, showing most of the useful statistics about the text sample.

# File 'lib/lingua/en/readability.rb', line 110

def report
  sprintf "Number of paragraphs           %d \n" <<
  "Number of sentences            %d \n" <<
  "Number of words                %d \n" <<
  "Number of characters           %d \n\n" <<
  "Average words per sentence     %.2f \n" <<
  "Average syllables per word     %.2f \n\n" <<
  "Flesch score                   %2.2f \n" <<
  "Flesch-Kincaid grade level     %2.2f \n" <<
  "Fog Index                      %2.2f \n",
    num_paragraphs, num_sentences, num_words, num_characters,
    words_per_sentence, syllables_per_word,
    flesch, kincaid, fog
end

#syllables_per_word ⇒ `Object`

The average number of syllables per word. The syllable count is performed by Lingua::EN::Syllable, and so may not be completely accurate, especially if the Carnegie-Mellon Pronouncing Dictionary is not installed.

# File 'lib/lingua/en/readability.rb', line 75

def syllables_per_word
  @syllables.to_f / words.length.to_f
end

#unique_words ⇒ `Object`

An array containing each unique word used in the text sample.

# File 'lib/lingua/en/readability.rb', line 57

def unique_words
  @frequencies.keys
end

#words_per_sentence ⇒ `Object`

The average number of words per sentence.

# File 'lib/lingua/en/readability.rb', line 67

def words_per_sentence
  words.length.to_f / sentences.length.to_f
end
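
Since both operands are converted to floats, a zero sentence count does not raise ZeroDivisionError. The sketch below shows what Ruby's float division returns in that case; whether the class actually produces empty arrays for empty input is an assumption, as the splitting code is not shown here:

```ruby
# Assumed state for an empty text: no words and no sentences.
words     = []
sentences = []
empty_ratio = words.length.to_f / sentences.length.to_f
puts empty_ratio.nan?

# Assumed state where words were found but no sentence was detected.
words = %w[no sentence detected]
open_ratio = words.length.to_f / sentences.length.to_f
puts open_ratio.finite?
```

The first ratio is NaN and the second is Infinity, so callers that feed short or empty samples into the scores may want to check `nan?` or `finite?` before using the result.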
