Class: Lingua::EN::Readability

Inherits:
Object
Defined in:
lib/lingua/en/readability.rb

Overview

The class Lingua::EN::Readability takes English text and analyses formal characteristic

Constructor Details

#initialize(text) ⇒ Readability

The constructor accepts the text to be analysed, and returns a report object which gives access to the



# File 'lib/lingua/en/readability.rb', line 10

def initialize(text)
  @text                = text.dup
  @paragraphs          = Lingua::EN::Paragraph.paragraphs(self.text)
  @sentences           = Lingua::EN::Sentence.sentences(self.text)
  @words               = []
  @frequencies         = {}
  @frequencies.default = 0
  @syllables           = 0
  @complex_words       = 0
  count_words
end

Instance Attribute Details

#frequencies ⇒ Object (readonly)

Returns the value of attribute frequencies.



# File 'lib/lingua/en/readability.rb', line 6

def frequencies
  @frequencies
end

#paragraphs ⇒ Object (readonly)

Returns the value of attribute paragraphs.



# File 'lib/lingua/en/readability.rb', line 6

def paragraphs
  @paragraphs
end

#sentences ⇒ Object (readonly)

Returns the value of attribute sentences.



# File 'lib/lingua/en/readability.rb', line 6

def sentences
  @sentences
end

#text ⇒ Object (readonly)

Returns the value of attribute text.



# File 'lib/lingua/en/readability.rb', line 6

def text
  @text
end

#words ⇒ Object (readonly)

Returns the value of attribute words.



# File 'lib/lingua/en/readability.rb', line 6

def words
  @words
end

Instance Method Details

#flesch ⇒ Object

Flesch reading ease of the text sample. A higher score indicates text that is easier to read. The score is on a 100-point scale, and a score of 60-70 is regarded as optimal for ordinary text.



# File 'lib/lingua/en/readability.rb', line 90

def flesch
  206.835 - (1.015 * words_per_sentence) - (84.6 * syllables_per_word)
end
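
As a worked illustration of the formula above, the following minimal sketch evaluates it for a sample averaging 15 words per sentence and 1.5 syllables per word. The helper name flesch_score and the input figures are assumptions chosen for illustration; they are not part of the library.

```ruby
# Hypothetical standalone helper that mirrors the arithmetic in #flesch.
# It takes the two averages directly instead of analysing a text.
def flesch_score(words_per_sentence, syllables_per_word)
  206.835 - (1.015 * words_per_sentence) - (84.6 * syllables_per_word)
end

# Illustrative inputs: 15 words per sentence, 1.5 syllables per word.
score = flesch_score(15.0, 1.5)
puts score.round(2) # prints 64.71
```

With these inputs the score falls inside the 60-70 band described above.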

#fog ⇒ Object

The Gunning Fog Index of the text sample. The index indicates the number of years of formal education that a reader of average intelligence would need to comprehend the text. A higher score indicates harder text; a value of around 12 is indicated as ideal for ordinary text.



# File 'lib/lingua/en/readability.rb', line 98

def fog
  ( words_per_sentence +  percent_fog_complex_words ) * 0.4
end
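
The arithmetic can be sketched in isolation as follows. The helper names and the sample counts (5 complex words out of 50, 15 words per sentence) are hypothetical and serve only to show how the two terms combine.

```ruby
# Hypothetical helpers mirroring the arithmetic of
# #percent_fog_complex_words and #fog; neither is part of the library.
def percent_complex(complex_words, total_words)
  (complex_words.to_f / total_words.to_f) * 100
end

def fog_index(words_per_sentence, percent_complex_words)
  (words_per_sentence + percent_complex_words) * 0.4
end

# Illustrative sample: 5 complex words out of 50, 15 words per sentence.
pct = percent_complex(5, 50)
puts pct.round(2)                  # prints 10.0
puts fog_index(15.0, pct).round(2) # prints 10.0
```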

#kincaid ⇒ Object

Flesch-Kincaid level of the text sample. This measure scores text based on the American school grade system; a score of 7.0 would indicate that the text is readable by a seventh grader. A score of 7.0 to 8.0 is regarded as optimal for ordinary text.



# File 'lib/lingua/en/readability.rb', line 83

def kincaid
  (11.8 * syllables_per_word) +  (0.39 * words_per_sentence) - 15.59
end
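
For a sense of scale, this minimal sketch applies the same formula to a sample averaging 15 words per sentence and 1.5 syllables per word. The helper name kincaid_grade and the inputs are assumptions for illustration only.

```ruby
# Hypothetical standalone helper that mirrors the arithmetic in #kincaid.
def kincaid_grade(words_per_sentence, syllables_per_word)
  (11.8 * syllables_per_word) + (0.39 * words_per_sentence) - 15.59
end

# Illustrative inputs: 15 words per sentence, 1.5 syllables per word.
grade = kincaid_grade(15.0, 1.5)
puts grade.round(2) # prints 7.96
```

These inputs land just inside the 7.0 to 8.0 range mentioned above.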

#num_chars ⇒ Object Also known as: num_characters

The number of characters in the sample.



# File 'lib/lingua/en/readability.rb', line 35

def num_chars
  text.length
end

#num_paragraphs ⇒ Object

The number of paragraphs in the sample. A paragraph break is defined as a newline followed by one or more empty or whitespace-only lines.



# File 'lib/lingua/en/readability.rb', line 24

def num_paragraphs
  paragraphs.length
end
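
The paragraph rule described above could be approximated with a regular expression, as in the sketch below. This is an assumption about how such a splitter might look, not the actual implementation of Lingua::EN::Paragraph, whose source is not shown here; split_paragraphs is a hypothetical name.

```ruby
# Hypothetical splitter: treats a newline followed by one or more
# empty or whitespace-only lines as a paragraph break.
def split_paragraphs(text)
  text.split(/\n(?:[ \t]*\n)+/).map(&:strip).reject(&:empty?)
end

sample = "First paragraph.\n\nSecond paragraph,\nstill second.\n  \nThird paragraph.\n"
paragraphs = split_paragraphs(sample)
puts paragraphs.length # prints 3
```

A single newline inside the second paragraph does not start a new one, because it is not followed by a blank line.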

#num_sentences ⇒ Object

The number of sentences in the sample. The meaning of a “sentence” is defined by Lingua::EN::Sentence.



# File 'lib/lingua/en/readability.rb', line 30

def num_sentences
  sentences.length
end

#num_syllables ⇒ Object

The total number of syllables in the text sample. Just for completeness.



# File 'lib/lingua/en/readability.rb', line 47

def num_syllables
  @syllables
end

#num_unique_words ⇒ Object

The number of unique words used in the text sample.



# File 'lib/lingua/en/readability.rb', line 52

def num_unique_words
  @frequencies.keys.length
end

#num_words ⇒ Object

The total number of words used in the sample. Numbers as digits are not counted.



# File 'lib/lingua/en/readability.rb', line 42

def num_words
  words.length
end

#occurrences(word) ⇒ Object

The number of occurrences of the given word in the text sample.



# File 'lib/lingua/en/readability.rb', line 62

def occurrences(word)
  @frequencies[word]
end
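
Because the constructor sets the default value of the frequencies hash to 0, a lookup for a word that never appeared returns 0 rather than nil. The sketch below reproduces that behaviour with a plain Hash. How count_words actually fills the hash is not shown on this page, so the counting loop here is an assumption.

```ruby
# Same setup as in the constructor: an empty hash whose default is 0.
frequencies = {}
frequencies.default = 0

# Assumed counting step, for illustration only.
%w[the cat saw the dog].each { |word| frequencies[word] += 1 }

puts frequencies["the"]       # prints 2
puts frequencies["bird"]      # prints 0 (falls back to the default)
puts frequencies.keys.length  # prints 4
puts frequencies.key?("bird") # prints false (the lookup added no key)
```

Reading a missing key does not insert it, so looking up unseen words leaves the unique-word count unchanged.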

#percent_fog_complex_words ⇒ Object

The percentage of words that are defined as “complex” for the purpose of the Fog Index. These are non-hyphenated words of three or more syllables.



# File 'lib/lingua/en/readability.rb', line 104

def percent_fog_complex_words
  ( @complex_words.to_f / words.length.to_f ) * 100
end

#report ⇒ Object

Return a nicely formatted report on the sample, showing most of the useful statistics about the text sample.



# File 'lib/lingua/en/readability.rb', line 110

def report
  sprintf "Number of paragraphs           %d \n" <<
  "Number of sentences            %d \n" <<
  "Number of words                %d \n" <<
  "Number of characters           %d \n\n" <<
  "Average words per sentence     %.2f \n" <<
  "Average syllables per word     %.2f \n\n" <<
  "Flesch score                   %2.2f \n" <<
  "Flesch-Kincaid grade level     %2.2f \n" <<
  "Fog Index                      %2.2f \n",
    num_paragraphs, num_sentences, num_words, num_characters,
    words_per_sentence, syllables_per_word,
    flesch, kincaid, fog
end

#syllables_per_word ⇒ Object

The average number of syllables per word. The syllable count is performed by Lingua::EN::Syllable, and so may not be completely accurate, especially if the Carnegie-Mellon Pronouncing Dictionary is not installed.



# File 'lib/lingua/en/readability.rb', line 75

def syllables_per_word
  @syllables.to_f / words.length.to_f
end

#unique_words ⇒ Object

An array containing each unique word used in the text sample.



# File 'lib/lingua/en/readability.rb', line 57

def unique_words
  @frequencies.keys
end

#words_per_sentence ⇒ Object

The average number of words per sentence.



# File 'lib/lingua/en/readability.rb', line 67

def words_per_sentence
  words.length.to_f / sentences.length.to_f
end
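
Both averages on this page are computed with floating-point division, so an empty denominator does not raise an error. The sketch below uses a hypothetical average helper with the same to_f pattern to show what plain Ruby returns in that case. Whether the library can actually reach this state depends on how it tokenizes the input, which is not shown here.

```ruby
# Hypothetical helper using the same to_f division pattern as
# #words_per_sentence and #syllables_per_word.
def average(total, count)
  total.to_f / count.to_f
end

puts average(30, 2)        # prints 15.0
puts average(0, 0).nan?    # prints true (0.0 / 0.0 is NaN)
puts average(5, 0)         # prints Infinity
```

Callers that feed very short or empty samples may want to check for nan? or infinite? before using the derived scores.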