Class: Lingua::EN::Readability

Inherits:
Object
Defined in:
lib/lingua/en/readability.rb

Overview

The class Lingua::EN::Readability takes English text and analyses its formal characteristics.

Constructor Details

#initialize(text) ⇒ Readability

The constructor accepts the text to be analysed and returns a report object which gives access to the statistics and readability scores described below.

# File 'lib/lingua/en/readability.rb', line 14

def initialize(text)
	@text                = text.dup
	@paragraphs          = text.split(/\n\s*\n\s*/)
	@sentences           = Lingua::EN::Sentence.sentences(@text)
	@words               = []
	@frequencies         = {}
	@frequencies.default = 0
	@syllables           = 0
	@complex_words       = 0
	count_words
end

Instance Attribute Details

#frequencies ⇒ Object (readonly)

Returns the value of attribute frequencies.

# File 'lib/lingua/en/readability.rb', line 10

def frequencies
  @frequencies
end

#paragraphs ⇒ Object (readonly)

Returns the value of attribute paragraphs.

# File 'lib/lingua/en/readability.rb', line 10

def paragraphs
  @paragraphs
end

#sentences ⇒ Object (readonly)

Returns the value of attribute sentences.

# File 'lib/lingua/en/readability.rb', line 10

def sentences
  @sentences
end

#text ⇒ Object (readonly)

Returns the value of attribute text.

# File 'lib/lingua/en/readability.rb', line 10

def text
  @text
end

#words ⇒ Object (readonly)

Returns the value of attribute words.

# File 'lib/lingua/en/readability.rb', line 10

def words
  @words
end

Instance Method Details

#flesch ⇒ Object

Flesch reading ease of the text sample. A higher score indicates text that is easier to read. The score is on a 100-point scale, and a score of 60-70 is regarded as optimal for ordinary text.

# File 'lib/lingua/en/readability.rb', line 93

def flesch
	206.835 - (1.015 * words_per_sentence) - (84.6 * syllables_per_word)
end
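
The formula can be checked by hand. The sketch below restates the `flesch` expression as a standalone function that takes the two averages as plain arguments; the name `flesch_score` and the sample figures are made up for illustration and are not part of the library.

```ruby
# Standalone restatement of the expression used by #flesch.
# Arguments are the two averages Readability computes internally.
def flesch_score(words_per_sentence, syllables_per_word)
  206.835 - (1.015 * words_per_sentence) - (84.6 * syllables_per_word)
end

# Hypothetical sample: 15 words per sentence, 1.5 syllables per word.
score = flesch_score(15.0, 1.5)
puts format("%.2f", score)   # prints 64.71
```

With these inputs the score falls inside the 60-70 band described above; longer sentences or longer words push it down.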

#fog ⇒ Object

The Gunning Fog Index of the text sample. The index indicates the number of years of formal education that a reader of average intelligence would need to comprehend the text. A higher score indicates harder text; a value of around 12 is regarded as ideal for ordinary text.

# File 'lib/lingua/en/readability.rb', line 101

def fog
  ( words_per_sentence +  percent_fog_complex_words ) * 0.4
end
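
Note that `percent_fog_complex_words` returns a percentage (0 to 100), not a fraction, which is what the 0.4 multiplier expects. A standalone sketch of the same expression, with a hypothetical function name and made-up inputs:

```ruby
# Standalone restatement of the expression used by #fog.
# percent_complex_words is a percentage (e.g. 10.0 for 10%), not 0.10.
def fog_index(words_per_sentence, percent_complex_words)
  (words_per_sentence + percent_complex_words) * 0.4
end

# Hypothetical sample: 15 words per sentence, 10% complex words.
index = fog_index(15.0, 10.0)
puts format("%.2f", index)   # prints 10.00
```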

#kincaid ⇒ Object

Flesch-Kincaid level of the text sample. This measure scores text based on the American school grade system; a score of 7.0 would indicate that the text is readable by a seventh grader. A score of 7.0 to 8.0 is regarded as optimal for ordinary text.

# File 'lib/lingua/en/readability.rb', line 86

def kincaid
	(11.8 * syllables_per_word) +  (0.39 * words_per_sentence) - 15.59
end
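
Because the grade level is built from the same two averages as the Flesch score, the two can be compared directly. A standalone sketch using the same made-up inputs as a worked example (the function name is hypothetical):

```ruby
# Standalone restatement of the expression used by #kincaid.
def kincaid_grade(words_per_sentence, syllables_per_word)
  (11.8 * syllables_per_word) + (0.39 * words_per_sentence) - 15.59
end

# Hypothetical sample: 15 words per sentence, 1.5 syllables per word.
grade = kincaid_grade(15.0, 1.5)
puts format("%.2f", grade)   # prints 7.96
```

A sample at roughly 7.96 would sit inside the 7.0 to 8.0 range regarded as optimal above.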

#num_chars ⇒ Object
Also known as: num_characters

The number of characters in the sample.

# File 'lib/lingua/en/readability.rb', line 39

def num_chars
	@text.length
end
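
Since the method returns `String#length` of the raw text, spaces, punctuation and newlines are all counted, and the count is in characters rather than bytes. A quick check with a made-up sample:

```ruby
sample = "One line.\n\nAnother line."
puts sample.length            # 24: spaces and newlines are included

accented = "caf\u00e9"        # "café" with a precomposed é
puts accented.length          # 4 characters
puts accented.bytesize        # 5 bytes in UTF-8
```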

#num_paragraphs ⇒ Object

The number of paragraphs in the sample. Paragraphs are separated by a newline followed by one or more empty or whitespace-only lines.

# File 'lib/lingua/en/readability.rb', line 28

def num_paragraphs
	@paragraphs.length
end
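
The constructor splits the text with `/\n\s*\n\s*/`. The snippet below applies the same regex to a made-up sample: a single newline does not start a new paragraph, a whitespace-only line counts as blank, and the trailing `\s*` also swallows the next paragraph's leading indentation. One edge case follows from `String#split` semantics: a text that begins with a blank line yields a leading empty string, which would be counted as a paragraph.

```ruby
text = "First paragraph.\nStill first.\n\n  \nSecond paragraph.\n\nThird."
paragraphs = text.split(/\n\s*\n\s*/)

puts paragraphs.length   # 3
p paragraphs             # ["First paragraph.\nStill first.", "Second paragraph.", "Third."]

# Leading blank lines produce an empty first element.
p "\n\nOnly one.".split(/\n\s*\n\s*/)   # ["", "Only one."]
```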

#num_sentences ⇒ Object

The number of sentences in the sample. The meaning of a “sentence” is defined by Lingua::EN::Sentence.

# File 'lib/lingua/en/readability.rb', line 34

def num_sentences
	@sentences.length
end

#num_syllables ⇒ Object

The total number of syllables in the text sample. Just for completeness.

# File 'lib/lingua/en/readability.rb', line 51

def num_syllables
	@syllables
end

#num_unique_words ⇒ Object

The number of unique words used in the text sample.

# File 'lib/lingua/en/readability.rb', line 56

def num_unique_words
	@frequencies.keys.length
end

#num_words ⇒ Object

The total number of words in the sample. Numbers written as digits are not counted.

# File 'lib/lingua/en/readability.rb', line 46

def num_words
	@words.length
end

#occurrences(word) ⇒ Object

The number of occurrences of the given word in the text sample.

# File 'lib/lingua/en/readability.rb', line 66

def occurrences(word)
	@frequencies[word]
end
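
The constructor sets `@frequencies.default = 0`, so `occurrences` returns 0 for a word that never appears instead of `nil`, and such a lookup does not add a key (so `num_unique_words` is unaffected). The private `count_words` method is not shown on this page, so the sketch below uses a plain whitespace split as a stand-in tokenizer; that choice is an assumption, not the library's behaviour.

```ruby
# Stand-in tokenizer (assumption): the library's count_words is not shown here.
words = "the cat saw the dog".split

# Same setup as the constructor: a Hash whose default value is 0.
frequencies = {}
frequencies.default = 0
words.each { |word| frequencies[word] += 1 }

puts frequencies["the"]        # 2
puts frequencies["zebra"]      # 0: a missing key returns the default
puts frequencies.keys.length   # 4: the lookup above did not store "zebra"
p frequencies.keys             # ["the", "cat", "saw", "dog"]
```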

#percent_fog_complex_words ⇒ Object

The percentage of words that are defined as “complex” for the purpose of the Fog Index: non-hyphenated words of three or more syllables.

# File 'lib/lingua/en/readability.rb', line 107

def percent_fog_complex_words
	( @complex_words.to_f / @words.length.to_f ) * 100
end
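
The complex-word rule can be illustrated with syllable counts supplied by hand. In the library the counts come from Lingua::EN::Syllable during word counting, which is not reproduced here, so the word list and counts below are hypothetical.

```ruby
# Hypothetical [word, syllable_count] pairs; real counts would come from
# Lingua::EN::Syllable and may differ.
tagged = [
  ["readability", 5],
  ["is", 1],
  ["well-understood", 4],   # hyphenated, so not complex despite 4 syllables
  ["everywhere", 3],
  ["today", 2]
]

# Complex: three or more syllables and no hyphen.
complex = tagged.count { |word, syllables| syllables >= 3 && !word.include?("-") }
percent = (complex.to_f / tagged.length.to_f) * 100

puts complex   # 2
puts percent   # 40.0
```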

#report ⇒ Object

Returns a nicely formatted report on the sample, showing the most useful statistics about the text.

# File 'lib/lingua/en/readability.rb', line 113

def report
	sprintf "Number of paragraphs           %d \n" <<
			"Number of sentences            %d \n" <<
			"Number of words                %d \n" <<
			"Number of characters           %d \n\n" <<
			"Average words per sentence     %.2f \n" <<
			"Average syllables per word     %.2f \n\n" <<
			"Flesch score                   %2.2f \n" <<
			"Flesch-Kincaid grade level     %2.2f \n" <<
			"Fog Index                      %2.2f \n",
			num_paragraphs, num_sentences, num_words, num_characters,
			words_per_sentence, syllables_per_word,
			flesch, kincaid, fog
end
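
The method builds its format string with `<<`, which appends to (mutates) the first string literal; in a file that enables `# frozen_string_literal: true` that call would raise an error. A sketch of the same layout using adjacent string literals, which Ruby joins at parse time without mutating anything. Only four of the report lines are shown, and the figures are hypothetical.

```ruby
# Hypothetical figures; in the library each value comes from a Readability method.
stats = { paragraphs: 2, sentences: 6, words: 90, flesch: 64.71 }

# Adjacent literals (joined with a trailing backslash) form one string.
template = "Number of paragraphs           %d \n" \
           "Number of sentences            %d \n" \
           "Number of words                %d \n" \
           "Flesch score                   %2.2f \n"

report = format(template, stats[:paragraphs], stats[:sentences],
                stats[:words], stats[:flesch])
puts report
```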

#syllables_per_word ⇒ Object

The average number of syllables per word. The syllable count is performed by Lingua::EN::Syllable, and so may not be completely accurate, especially if the Carnegie-Mellon Pronouncing Dictionary is not installed.

# File 'lib/lingua/en/readability.rb', line 78

def syllables_per_word
	@syllables.to_f / @words.length.to_f
end

#unique_words ⇒ Object

An array containing each unique word used in the text sample.

# File 'lib/lingua/en/readability.rb', line 61

def unique_words
	@frequencies.keys
end

#words_per_sentence ⇒ Object

The average number of words per sentence.

# File 'lib/lingua/en/readability.rb', line 71

def words_per_sentence
	@words.length.to_f / @sentences.length.to_f
end
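
Both averages (`words_per_sentence` and `syllables_per_word`) convert their operands with `to_f` before dividing. Without the conversion, Integer division would truncate the result. A side effect is that a zero denominator yields `Infinity` or `NaN` instead of raising `ZeroDivisionError`, and that value then carries through `flesch`, `kincaid` and `fog`. The numbers below are made up:

```ruby
words = 7
sentences = 2

puts words / sentences               # 3   (Integer division truncates)
puts words.to_f / sentences.to_f     # 3.5 (what words_per_sentence does)

# Once the operands are Floats, dividing by zero does not raise.
puts 7.to_f / 0.to_f                 # Infinity
empty_ratio = 0.to_f / 0.to_f
puts empty_ratio.nan?                # true
```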