Class: Lingua::EN::Readability
- Inherits: Object
  - Object
  - Lingua::EN::Readability
- Defined in: lib/lingua/en/readability.rb
Overview
The class Lingua::EN::Readability takes English text and analyses its formal characteristics.
Instance Attribute Summary
-
#frequencies ⇒ Object
readonly
Returns the value of attribute frequencies.
-
#paragraphs ⇒ Object
readonly
Returns the value of attribute paragraphs.
-
#sentences ⇒ Object
readonly
Returns the value of attribute sentences.
-
#text ⇒ Object
readonly
Returns the value of attribute text.
-
#words ⇒ Object
readonly
Returns the value of attribute words.
Instance Method Summary
-
#flesch ⇒ Object
Flesch reading ease of the text sample.
-
#fog ⇒ Object
The Gunning Fog Index of the text sample.
-
#initialize(text) ⇒ Readability
constructor
The constructor accepts the text to be analysed, and returns a report object which gives access to the statistics on that text.
-
#kincaid ⇒ Object
Flesch-Kincaid level of the text sample.
-
#num_chars ⇒ Object
(also: #num_characters)
The number of characters in the sample.
-
#num_paragraphs ⇒ Object
The number of paragraphs in the sample.
-
#num_sentences ⇒ Object
The number of sentences in the sample.
-
#num_syllables ⇒ Object
The total number of syllables in the text sample.
-
#num_unique_words ⇒ Object
The number of unique words used in the text sample.
-
#num_words ⇒ Object
The total number of words used in the sample.
-
#occurrences(word) ⇒ Object
The number of occurrences of the given word in the text sample.
-
#percent_fog_complex_words ⇒ Object
The percentage of words that are defined as “complex” for the purpose of the Fog Index.
-
#report ⇒ Object
Returns a nicely formatted report on the sample, showing most of the useful statistics about the text.
-
#syllables_per_word ⇒ Object
The average number of syllables per word.
-
#unique_words ⇒ Object
An array containing each unique word used in the text sample.
-
#words_per_sentence ⇒ Object
The average number of words per sentence.
Constructor Details
#initialize(text) ⇒ Readability
The constructor accepts the text to be analysed, and returns a report object which gives access to the statistics on that text.
# File 'lib/lingua/en/readability.rb', line 14
def initialize(text)
  @text = text.dup
  @paragraphs = text.split(/\n\s*\n\s*/)
  @sentences = Lingua::EN::Sentence.sentences(@text)
  @words = []
  @frequencies = {}
  @frequencies.default = 0
  @syllables = 0
  @complex_words = 0
  count_words
end
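The paragraph-splitting step of the constructor can be tried on its own. The following is a minimal sketch that reuses the same regular expression; the `sample` string is made up for illustration and nothing here calls the library itself.

```ruby
# Sketch of the paragraph-splitting step; `sample` is illustrative input.
sample = "First paragraph.\nStill first.\n\nSecond paragraph.\n   \nThird."

# Same regular expression as in the constructor: a newline, optional
# whitespace, another newline, then any further whitespace.
paragraphs = sample.split(/\n\s*\n\s*/)

puts paragraphs.length   # prints 3
paragraphs.each { |para| puts para.inspect }
```

A single newline does not end a paragraph, while a line containing only spaces does, which is why the sample yields three paragraphs.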
Instance Attribute Details
#frequencies ⇒ Object (readonly)
Returns the value of attribute frequencies.
# File 'lib/lingua/en/readability.rb', line 10
def frequencies
  @frequencies
end
#paragraphs ⇒ Object (readonly)
Returns the value of attribute paragraphs.
# File 'lib/lingua/en/readability.rb', line 10
def paragraphs
  @paragraphs
end
#sentences ⇒ Object (readonly)
Returns the value of attribute sentences.
# File 'lib/lingua/en/readability.rb', line 10
def sentences
  @sentences
end
#text ⇒ Object (readonly)
Returns the value of attribute text.
# File 'lib/lingua/en/readability.rb', line 10
def text
  @text
end
#words ⇒ Object (readonly)
Returns the value of attribute words.
# File 'lib/lingua/en/readability.rb', line 10
def words
  @words
end
Instance Method Details
#flesch ⇒ Object
Flesch reading ease of the text sample. A higher score indicates text that is easier to read. The score is on a 100-point scale, and a score of 60-70 is regarded as optimal for ordinary text.
# File 'lib/lingua/en/readability.rb', line 93
def flesch
  206.835 - (1.015 * words_per_sentence) - (84.6 * syllables_per_word)
end
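To see how the two averages feed into the score, here is a small standalone sketch of the same arithmetic. The helper name `flesch_reading_ease` and the input figures are illustrative assumptions, not part of the library.

```ruby
# Standalone restatement of the formula used by #flesch.
# `flesch_reading_ease` is a hypothetical helper, not a library method.
def flesch_reading_ease(words_per_sentence, syllables_per_word)
  206.835 - (1.015 * words_per_sentence) - (84.6 * syllables_per_word)
end

# Illustrative figures: 15 words per sentence, 1.5 syllables per word.
score = flesch_reading_ease(15.0, 1.5)
puts format('%.2f', score)   # prints 64.71
```

With these inputs the result falls inside the 60-70 band described above. Longer sentences or longer words both lower the score.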
#fog ⇒ Object
The Gunning Fog Index of the text sample. The index indicates the number of years of formal education that a reader of average intelligence would need to comprehend the text. A higher score indicates harder text; a value of around 12 is indicated as ideal for ordinary text.
# File 'lib/lingua/en/readability.rb', line 101
def fog
  ( words_per_sentence + percent_fog_complex_words ) * 0.4
end
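As a worked example of the index, the sketch below applies the same formula to made-up inputs. The helper name `fog_index` is an assumption for illustration only.

```ruby
# Standalone restatement of the formula used by #fog.
# `fog_index` is a hypothetical helper, not a library method.
def fog_index(words_per_sentence, percent_complex_words)
  (words_per_sentence + percent_complex_words) * 0.4
end

# Illustrative figures: 15 words per sentence, 10% complex words.
puts format('%.2f', fog_index(15.0, 10.0))   # prints 10.00
```

Note that the second argument is a percentage (0 to 100), not a fraction, which matches what #percent_fog_complex_words returns.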
#kincaid ⇒ Object
Flesch-Kincaid level of the text sample. This measure scores text based on the American school grade system; a score of 7.0 would indicate that the text is readable by a seventh grader. A score of 7.0 to 8.0 is regarded as optimal for ordinary text.
# File 'lib/lingua/en/readability.rb', line 86
def kincaid
  (11.8 * syllables_per_word) + (0.39 * words_per_sentence) - 15.59
end
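The grade-level arithmetic can be checked by hand with a short sketch. The helper name `kincaid_grade` and the inputs are illustrative assumptions; only the coefficients come from the source above.

```ruby
# Standalone restatement of the formula used by #kincaid.
# `kincaid_grade` is a hypothetical helper, not a library method.
def kincaid_grade(words_per_sentence, syllables_per_word)
  (11.8 * syllables_per_word) + (0.39 * words_per_sentence) - 15.59
end

# Illustrative figures: 15 words per sentence, 1.5 syllables per word.
# 11.8 * 1.5 = 17.7 and 0.39 * 15 = 5.85, so 17.7 + 5.85 - 15.59 = 7.96.
puts format('%.2f', kincaid_grade(15.0, 1.5))   # prints 7.96
```

These inputs land at the upper end of the 7.0 to 8.0 range mentioned above.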
#num_chars ⇒ Object Also known as: num_characters
The number of characters in the sample.
# File 'lib/lingua/en/readability.rb', line 39
def num_chars
  @text.length
end
#num_paragraphs ⇒ Object
The number of paragraphs in the sample. Paragraphs are delimited by a newline followed by one or more empty or whitespace-only lines.
# File 'lib/lingua/en/readability.rb', line 28
def num_paragraphs
  @paragraphs.length
end
#num_sentences ⇒ Object
The number of sentences in the sample. The meaning of a “sentence” is defined by Lingua::EN::Sentence.
# File 'lib/lingua/en/readability.rb', line 34
def num_sentences
  @sentences.length
end
#num_syllables ⇒ Object
The total number of syllables in the text sample. Just for completeness.
# File 'lib/lingua/en/readability.rb', line 51
def num_syllables
  @syllables
end
#num_unique_words ⇒ Object
The number of unique words used in the text sample.
# File 'lib/lingua/en/readability.rb', line 56
def num_unique_words
  @frequencies.keys.length
end
#num_words ⇒ Object
The total number of words used in the sample. Numbers as digits are not counted.
# File 'lib/lingua/en/readability.rb', line 46
def num_words
  @words.length
end
#occurrences(word) ⇒ Object
The number of occurrences of the given word in the text sample.
# File 'lib/lingua/en/readability.rb', line 66
def occurrences(word)
  @frequencies[word]
end
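Because the constructor sets the default value of the frequencies hash to 0, a lookup for a word that never appeared returns 0 rather than nil. The sketch below shows that behaviour with a plain hash. The tallying loop is an assumption about what count_words might do, since its source is not shown here, and the word list is made up.

```ruby
# Sketch of a frequency table with a default of 0, set up the same way
# as in the constructor. The tally loop is an assumed stand-in for
# count_words, whose source is not shown on this page.
frequencies = {}
frequencies.default = 0

%w[the cat saw the dog].each { |word| frequencies[word] += 1 }

puts frequencies['the']      # prints 2
puts frequencies['absent']   # prints 0
puts frequencies.keys.length # prints 4
```

Reading a missing key returns the default without storing it, so such lookups do not add to the key count.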
#percent_fog_complex_words ⇒ Object
The percentage of words that are defined as “complex” for the purpose of the Fog Index. These are non-hyphenated words of three or more syllables.
# File 'lib/lingua/en/readability.rb', line 107
def percent_fog_complex_words
  ( @complex_words.to_f / @words.length.to_f ) * 100
end
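The percentage is a plain ratio of two counts, computed in floating point. The following sketch restates it with a hypothetical helper, `percent_complex`, and made-up counts. It also shows what the same arithmetic produces when the word count is zero.

```ruby
# Standalone restatement of the ratio used by #percent_fog_complex_words.
# `percent_complex` is a hypothetical helper, not a library method.
def percent_complex(complex_words, total_words)
  (complex_words.to_f / total_words.to_f) * 100
end

# Illustrative counts: 12 complex words out of 150.
puts format('%.1f', percent_complex(12, 150))   # prints 8.0

# With zero words, float division of 0.0 by 0.0 yields NaN, not an error.
puts percent_complex(0, 0).nan?                 # prints true
```

Callers working with possibly empty text may therefore want to check the word count before relying on the result.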
#report ⇒ Object
Returns a nicely formatted report on the sample, showing most of the useful statistics about the text.
# File 'lib/lingua/en/readability.rb', line 113
def report
  sprintf "Number of paragraphs %d \n" <<
          "Number of sentences %d \n" <<
          "Number of words %d \n" <<
          "Number of characters %d \n\n" <<
          "Average words per sentence %.2f \n" <<
          "Average syllables per word %.2f \n\n" <<
          "Flesch score %2.2f \n" <<
          "Flesh-Kincaid grade level %2.2f \n" <<
          "Fog Index %2.2f \n",
          num_paragraphs, num_sentences, num_words, num_characters,
          words_per_sentence, syllables_per_word,
          flesch, kincaid, fog
end
#syllables_per_word ⇒ Object
The average number of syllables per word. The syllable count is performed by Lingua::EN::Syllable, and so may not be completely accurate, especially if the Carnegie-Mellon Pronouncing Dictionary is not installed.
# File 'lib/lingua/en/readability.rb', line 78
def syllables_per_word
  @syllables.to_f / @words.length.to_f
end
#unique_words ⇒ Object
An array containing each unique word used in the text sample.
# File 'lib/lingua/en/readability.rb', line 61
def unique_words
  @frequencies.keys
end
#words_per_sentence ⇒ Object
The average number of words per sentence.
# File 'lib/lingua/en/readability.rb', line 71
def words_per_sentence
  @words.length.to_f / @sentences.length.to_f
end