Class: Lingua::EN::Readability
- Inherits: Object
- Inheritance chain: Object, Lingua::EN::Readability
- Defined in: lib/lingua/en/readability.rb
Overview
The class Lingua::EN::Readability takes English text and analyses its formal characteristics.
Instance Attribute Summary
-
#frequencies ⇒ Object
readonly
Returns the value of attribute frequencies.
-
#paragraphs ⇒ Object
readonly
Returns the value of attribute paragraphs.
-
#sentences ⇒ Object
readonly
Returns the value of attribute sentences.
-
#text ⇒ Object
readonly
Returns the value of attribute text.
-
#words ⇒ Object
readonly
Returns the value of attribute words.
Instance Method Summary
-
#flesch ⇒ Object
Flesch reading ease of the text sample.
-
#fog ⇒ Object
The Gunning Fog Index of the text sample.
-
#initialize(text) ⇒ Readability
constructor
The constructor accepts the text to be analysed, and returns a report object which gives access to the.
-
#kincaid ⇒ Object
Flesch-Kincaid level of the text sample.
-
#num_chars ⇒ Object
(also: #num_characters)
The number of characters in the sample.
-
#num_paragraphs ⇒ Object
The number of paragraphs in the sample.
-
#num_sentences ⇒ Object
The number of sentences in the sample.
-
#num_syllables ⇒ Object
The total number of syllables in the text sample.
-
#num_unique_words ⇒ Object
The number of unique words used in the text sample.
-
#num_words ⇒ Object
The total number of words used in the sample.
-
#occurrences(word) ⇒ Object
The number of occurrences of the given word in the text sample.
-
#percent_fog_complex_words ⇒ Object
The percentage of words that are defined as “complex” for the purpose of the Fog Index.
-
#report ⇒ Object
Returns a nicely formatted report on the sample, showing most of the useful statistics about the text.
-
#syllables_per_word ⇒ Object
The average number of syllables per word.
-
#unique_words ⇒ Object
An array containing each unique word used in the text sample.
-
#words_per_sentence ⇒ Object
The average number of words per sentence.
Constructor Details
#initialize(text) ⇒ Readability
The constructor accepts the text to be analysed, and returns a report object which gives access to the
# File 'lib/lingua/en/readability.rb', line 10
def initialize(text)
  @text = text.dup
  @paragraphs = Lingua::EN::Paragraph.paragraphs(self.text)
  @sentences = Lingua::EN::Sentence.sentences(self.text)
  @words = []
  @frequencies = {}
  @frequencies.default = 0
  @syllables = 0
  @complex_words = 0
  count_words
end
Instance Attribute Details
#frequencies ⇒ Object (readonly)
Returns the value of attribute frequencies.
# File 'lib/lingua/en/readability.rb', line 6
def frequencies
  @frequencies
end
#paragraphs ⇒ Object (readonly)
Returns the value of attribute paragraphs.
# File 'lib/lingua/en/readability.rb', line 6
def paragraphs
  @paragraphs
end
#sentences ⇒ Object (readonly)
Returns the value of attribute sentences.
# File 'lib/lingua/en/readability.rb', line 6
def sentences
  @sentences
end
#text ⇒ Object (readonly)
Returns the value of attribute text.
# File 'lib/lingua/en/readability.rb', line 6
def text
  @text
end
#words ⇒ Object (readonly)
Returns the value of attribute words.
# File 'lib/lingua/en/readability.rb', line 6
def words
  @words
end
Instance Method Details
#flesch ⇒ Object
Flesch reading ease of the text sample. A higher score indicates text that is easier to read. The score is on a 100-point scale, and a score of 60-70 is regarded as optimal for ordinary text.
# File 'lib/lingua/en/readability.rb', line 90
def flesch
  206.835 - (1.015 * words_per_sentence) - (84.6 * syllables_per_word)
end
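As a rough worked example of the formula above, the following standalone sketch recomputes the score from two averages. The helper name flesch_reading_ease and the sample averages are illustrative assumptions and are not part of the library.

```ruby
# Hypothetical standalone helper mirroring the formula used by #flesch.
def flesch_reading_ease(words_per_sentence, syllables_per_word)
  206.835 - (1.015 * words_per_sentence) - (84.6 * syllables_per_word)
end

# Assumed sample: 15 words per sentence, 1.5 syllables per word.
score = flesch_reading_ease(15.0, 1.5)
puts format("%.2f", score) # prints 64.71
```

With these assumed averages the score falls inside the 60-70 band described above. Longer sentences or longer words both push the score down.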
#fog ⇒ Object
The Gunning Fog Index of the text sample. The index indicates the number of years of formal education that a reader of average intelligence would need to comprehend the text. A higher score indicates harder text; a value of around 12 is indicated as ideal for ordinary text.
# File 'lib/lingua/en/readability.rb', line 98
def fog
  ( words_per_sentence + percent_fog_complex_words ) * 0.4
end
#kincaid ⇒ Object
Flesch-Kincaid level of the text sample. This measure scores text based on the American school grade system; a score of 7.0 would indicate that the text is readable by a seventh grader. A score of 7.0 to 8.0 is regarded as optimal for ordinary text.
# File 'lib/lingua/en/readability.rb', line 83
def kincaid
  (11.8 * syllables_per_word) + (0.39 * words_per_sentence) - 15.59
end
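To make the grade-level arithmetic concrete, here is a small standalone sketch of the same formula. The helper name flesch_kincaid_grade and the input averages are assumptions chosen for illustration only.

```ruby
# Hypothetical standalone helper mirroring the formula used by #kincaid.
def flesch_kincaid_grade(words_per_sentence, syllables_per_word)
  (11.8 * syllables_per_word) + (0.39 * words_per_sentence) - 15.59
end

# Assumed sample: 15 words per sentence, 1.5 syllables per word.
grade = flesch_kincaid_grade(15.0, 1.5)
puts format("%.2f", grade) # prints 7.96
```

Unlike the Flesch reading ease, a higher value here means harder text, so the two scores move in opposite directions for the same sample.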
#num_chars ⇒ Object Also known as: num_characters
The number of characters in the sample.
# File 'lib/lingua/en/readability.rb', line 35
def num_chars
  text.length
end
#num_paragraphs ⇒ Object
The number of paragraphs in the sample. Paragraphs are delimited by a newline followed by one or more empty or whitespace-only lines.
# File 'lib/lingua/en/readability.rb', line 24
def num_paragraphs
  paragraphs.length
end
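The paragraph rule described above can be approximated with a regular expression. This is only a sketch of the stated definition: the pattern and the helper split_paragraphs are assumptions, and the actual Lingua::EN::Paragraph implementation may differ.

```ruby
# Assumed delimiter: a newline followed by one or more empty or
# whitespace-only lines. Illustrative pattern, not the library's own.
PARAGRAPH_BREAK = /\n(?:[ \t]*\n)+/

# Hypothetical helper: split on the delimiter and drop empty pieces.
def split_paragraphs(text)
  text.split(PARAGRAPH_BREAK).map(&:strip).reject(&:empty?)
end

sample = "First paragraph,\nsecond line.\n\nSecond paragraph.\n   \nThird."
paragraphs = split_paragraphs(sample)
puts paragraphs.length # prints 3
```

A single newline inside the first paragraph does not start a new one; only a blank or whitespace-only line does.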
#num_sentences ⇒ Object
The number of sentences in the sample. The meaning of a “sentence” is defined by Lingua::EN::Sentence.
# File 'lib/lingua/en/readability.rb', line 30
def num_sentences
  sentences.length
end
#num_syllables ⇒ Object
The total number of syllables in the text sample. Just for completeness.
# File 'lib/lingua/en/readability.rb', line 47
def num_syllables
  @syllables
end
#num_unique_words ⇒ Object
The number of unique words used in the text sample.
# File 'lib/lingua/en/readability.rb', line 52
def num_unique_words
  @frequencies.keys.length
end
#num_words ⇒ Object
The total number of words used in the sample. Numbers written as digits are not counted.
# File 'lib/lingua/en/readability.rb', line 42
def num_words
  words.length
end
#occurrences(word) ⇒ Object
The number of occurrences of the given word in the text sample.
# File 'lib/lingua/en/readability.rb', line 62
def occurrences(word)
  @frequencies[word]
end
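Because the constructor sets the default of the frequencies hash to 0, looking up a word that never appeared returns 0 rather than nil. The sketch below shows that behaviour with a plain Hash; the word list is made up for illustration, and how the library normalises words before counting is not assumed here.

```ruby
# A plain Hash with the same default value the constructor assigns.
frequencies = {}
frequencies.default = 0

# Illustrative word list; each occurrence increments its counter.
%w[the cat saw the dog].each { |word| frequencies[word] += 1 }

seen   = frequencies["the"]   # 2
unseen = frequencies["bird"]  # 0, because of the default value

# Reading a missing key does not add it to the hash.
still_absent = !frequencies.key?("bird")
puts seen, unseen, still_absent
```

Hash lookup is an exact match on the key, so "The" and "the" would be counted separately in this sketch.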
#percent_fog_complex_words ⇒ Object
The percentage of words that are defined as “complex” for the purpose of the Fog Index. These are non-hyphenated words of three or more syllables.
# File 'lib/lingua/en/readability.rb', line 104
def percent_fog_complex_words
  ( @complex_words.to_f / words.length.to_f ) * 100
end
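The following sketch walks through how a complex-word count feeds into the Fog Index, using the formula shown for #fog. The helper names and the counts (12 complex words out of 150, 15 words per sentence) are assumptions for illustration.

```ruby
# Hypothetical helper: share of complex words, as a percentage.
def percent_complex(complex_count, word_count)
  (complex_count.to_f / word_count.to_f) * 100
end

# Hypothetical helper mirroring the formula used by #fog.
def gunning_fog(words_per_sentence, percent_complex_words)
  (words_per_sentence + percent_complex_words) * 0.4
end

pct = percent_complex(12, 150)   # 8 percent of words are complex
fog = gunning_fog(15.0, pct)     # (15 + 8) * 0.4
puts format("%.1f", pct)         # prints 8.0
puts format("%.1f", fog)         # prints 9.2
```

The to_f conversions matter: with integer operands, 12 / 150 would evaluate to 0 in Ruby.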
#report ⇒ Object
Returns a nicely formatted report on the sample, showing most of the useful statistics about the text.
# File 'lib/lingua/en/readability.rb', line 110
def report
  sprintf "Number of paragraphs %d \n" <<
          "Number of sentences %d \n" <<
          "Number of words %d \n" <<
          "Number of characters %d \n\n" <<
          "Average words per sentence %.2f \n" <<
          "Average syllables per word %.2f \n\n" <<
          "Flesch score %2.2f \n" <<
          "Flesh-Kincaid grade level %2.2f \n" <<
          "Fog Index %2.2f \n",
          num_paragraphs, num_sentences, num_words, num_characters,
          words_per_sentence, syllables_per_word,
          flesch, kincaid, fog
end
#syllables_per_word ⇒ Object
The average number of syllables per word. The syllable count is performed by Lingua::EN::Syllable, and so may not be completely accurate, especially if the Carnegie-Mellon Pronouncing Dictionary is not installed.
# File 'lib/lingua/en/readability.rb', line 75
def syllables_per_word
  @syllables.to_f / words.length.to_f
end
#unique_words ⇒ Object
An array containing each unique word used in the text sample.
# File 'lib/lingua/en/readability.rb', line 57
def unique_words
  @frequencies.keys
end
#words_per_sentence ⇒ Object
The average number of words per sentence.
# File 'lib/lingua/en/readability.rb', line 67
def words_per_sentence
  words.length.to_f / sentences.length.to_f
end
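Since the averages are computed with floating-point division, a sample with no sentences or no words does not raise ZeroDivisionError in Ruby; it yields Infinity or NaN instead. The sketch below demonstrates this with a hypothetical average helper, plus an assumed guard (safe_average) that a caller might add. Neither helper is part of the library.

```ruby
# Hypothetical helper using the same float division as the averages above.
def average(total, count)
  total.to_f / count.to_f
end

# Assumed caller-side guard: fall back to 0.0 when there is nothing to count.
def safe_average(total, count)
  count.zero? ? 0.0 : total.to_f / count
end

normal   = average(30, 2)  # 15.0
no_count = average(5, 0)   # Infinity
empty    = average(0, 0)   # NaN

puts normal
puts no_count.infinite? ? "infinite" : "finite"
puts empty.nan? ? "not a number" : "number"
puts safe_average(0, 0)
```

Callers that format these values, for example with sprintf, may want to check for such cases on empty input first.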