Module: Linguistics::EN::TitleCase
Defined in: lib/linguistics/en/titlecase.rb
Overview
Methods for capitalizing a sentence as a title, capitalizing nouns as proper nouns, and converting a sentence to its CamelCase equivalent and back. This module is part of the English-language Linguistics module.
Constant Summary
- ARTICLES =
Exceptions: Articles
%w[a and the]
- SHORT_PREPOSITIONS =
Exceptions: Prepositions shorter than five letters
["amid", "at", "but", "by", "down", "for", "from", "in", "into", "like", "near", "of", "off", "on", "onto", "out", "over", "past", "save", "with", "till", "to", "unto", "up", "upon", "with"]
- COORD_CONJUNCTIONS =
Exceptions: Coordinating conjunctions
%w[and but as]
- TITLE_CASE_EXCEPTIONS =
Titlecase exceptions: “In titles, capitalize the first word, the last word, and all words in between except articles (a, an, and the), prepositions under five letters (in, of, to), and coordinating conjunctions (and, but). These rules apply to titles of long, short, and partial works as well as your own papers” (Anson, Schwegler, and Muth. The Longman Writer’s Companion 240).
ARTICLES | SHORT_PREPOSITIONS | COORD_CONJUNCTIONS
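As a rough illustration of how this union behaves, the standalone sketch below copies the three lists as they are given above and combines them with `Array#|`. The module name `ExceptionSketch` is hypothetical and is not part of the library.

```ruby
# Standalone sketch; ExceptionSketch is a hypothetical name, not library code.
module ExceptionSketch
  ARTICLES = %w[a and the].freeze
  SHORT_PREPOSITIONS = %w[
    amid at but by down for from in into like near of off on onto out
    over past save with till to unto up upon with
  ].freeze
  COORD_CONJUNCTIONS = %w[and but as].freeze

  # Array#| is a set union: first appearances are kept, repeats are dropped.
  TITLE_CASE_EXCEPTIONS = ARTICLES | SHORT_PREPOSITIONS | COORD_CONJUNCTIONS
end

exceptions = ExceptionSketch::TITLE_CASE_EXCEPTIONS
puts exceptions.count("with")   # the repeated "with" survives only once
puts exceptions.include?("of")  # a short preposition, so it is an exception
puts exceptions.include?("an")  # "an" is not in any of the lists as written
```

Membership is an exact, case-sensitive string comparison, so a word only counts as an exception when it appears in lower case in the input.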
- PROPER_NOUN_EXCEPTIONS =
The words which don’t get capitalized in a compound proper noun
%w{and the of}
Instance Method Summary
- #proper_noun ⇒ Object
Returns the proper noun form of the inflected object by capitalizing most of the words.
- #titlecase ⇒ Object
Returns the inflected object as a title-cased String.
- #to_camel_case ⇒ Object
Turns an English-language string into a CamelCase word.
- #un_camel_case ⇒ Object
Turns a camel-case string (“camelCaseToEnglish”) to plain English (“camel case to english”).
Instance Method Details
#proper_noun ⇒ Object
Returns the proper noun form of the inflected object by capitalizing most of the words.
Some examples:
"bosnia and herzegovina".en.proper_noun
# => "Bosnia and Herzegovina"
"macedonia, the former yugoslav republic of".en.proper_noun
# => "Macedonia, the Former Yugoslav Republic of"
"virgin islands, u.s.".en.proper_noun
# => "Virgin Islands, U.S."
# File 'lib/linguistics/en/titlecase.rb', line 110
def proper_noun
  return self.to_s.split(/([ .]+)/).collect do |word|
    next word unless
      /^[a-z]/.match( word ) &&
      ! (PROPER_NOUN_EXCEPTIONS.include?( word ))
    word.capitalize
  end.join
end
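The same logic can be tried without the library. The sketch below is a standalone approximation: `ProperNounSketch` is a hypothetical wrapper that takes the string as an argument instead of using the inflector object, and its exception list is copied from PROPER_NOUN_EXCEPTIONS.

```ruby
# Standalone sketch; ProperNounSketch is a hypothetical wrapper, not library code.
module ProperNounSketch
  EXCEPTIONS = %w[and the of].freeze

  def self.proper_noun(text)
    # The capture group makes split keep the separators (runs of spaces
    # and periods) as tokens, so join can rebuild the original spacing.
    text.to_s.split(/([ .]+)/).map do |token|
      if token.match?(/^[a-z]/) && !EXCEPTIONS.include?(token)
        token.capitalize
      else
        token # separators, exception words, tokens not starting in lower case
      end
    end.join
  end
end

puts ProperNounSketch.proper_noun("bosnia and herzegovina")
puts ProperNounSketch.proper_noun("virgin islands, u.s.")
```

Because periods are separators, each letter of an abbreviation such as "u.s." is a token of its own and is capitalized individually.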
#titlecase ⇒ Object
Returns the inflected object as a title-cased String.
Some examples:
"a portrait of the artist as a young man".en.titlecase
# => "A Portrait of the Artist as a Young Man"
"a seven-sided romance".en.titlecase
# => "A Seven-Sided Romance"
"the curious incident of the dog in the night-time".en.titlecase
# => "The Curious Incident of the Dog in the Night-Time"
"the rats of n.i.m.h.".en.titlecase
# => "The Rats of N.I.M.H."
# File 'lib/linguistics/en/titlecase.rb', line 68
def titlecase

  # Split on word-boundaries
  words = self.to_s.split( /\b/ )

  # Always capitalize the first and last words
  words.first.capitalize!
  words.last.capitalize!

  # Now scan the rest of the tokens, skipping non-words and capitalization
  # exceptions.
  words.each_with_index do |word, i|

    # Non-words
    next unless /^\w+$/.match( word )

    # Skip exception-words
    next if TITLE_CASE_EXCEPTIONS.include?( word )

    # Skip second parts of contractions
    next if words[i - 1] == "'" && /\w/.match( words[i - 2] )

    # Have to do it this way instead of capitalize! because that method
    # also downcases all other letters.
    word.gsub!( /^(\w)(.*)/ ) { $1.upcase + $2 }
  end

  return words.join
end
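A minimal standalone sketch of the same steps follows, for experimenting outside the library. It makes three assumptions that are not in the original: the wrapper module `TitlecaseSketch` is hypothetical, the exception list is a shortened stand-in for TITLE_CASE_EXCEPTIONS, and the empty-input check and `i >= 2` guard are additions.

```ruby
# Standalone sketch; TitlecaseSketch and its shortened exception list are
# hypothetical stand-ins, not library code.
module TitlecaseSketch
  EXCEPTIONS = %w[a and the as at but by for in of on to].freeze

  def self.titlecase(text)
    # Splitting on word boundaries yields words and the separators between them.
    words = text.to_s.split(/\b/)
    return "" if words.empty? # guard added for this sketch

    # The first and last tokens are capitalized regardless of the exception list.
    words.first.capitalize!
    words.last.capitalize!

    words.each_with_index do |word, i|
      # Skip spaces and punctuation.
      next unless word.match?(/\A\w+\z/)
      # Skip articles, short prepositions, and conjunctions.
      next if EXCEPTIONS.include?(word)
      # Skip the tail of a contraction, such as the "t" in "don't".
      next if i >= 2 && words[i - 1] == "'" && words[i - 2].match?(/\w/)

      # Upcase only the first character and leave the other letters alone.
      word[0] = word[0].upcase
    end

    words.join
  end
end

puts TitlecaseSketch.titlecase("a seven-sided romance")
puts TitlecaseSketch.titlecase("don't look now")
```

Hyphens and apostrophes are word boundaries, so "seven-sided" is handled as two words, while the contraction check keeps the "t" of "don't" in lower case.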
#to_camel_case ⇒ Object
Turns an English-language string into a CamelCase word.
# File 'lib/linguistics/en/titlecase.rb', line 48
def to_camel_case
  self.to_s.gsub( /\s+([a-z])/i ) { $1.upcase }
end
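To see what the substitution does in isolation, here is a small standalone sketch. `CamelSketch` is a hypothetical wrapper that takes the string as an argument; the regular expression is the one used above.

```ruby
# Standalone sketch; CamelSketch is a hypothetical wrapper, not library code.
module CamelSketch
  def self.to_camel_case(text)
    # Each run of whitespace plus the letter after it collapses to that
    # letter in upper case; nothing else in the string is changed.
    text.to_s.gsub(/\s+([a-z])/i) { Regexp.last_match(1).upcase }
  end
end

puts CamelSketch.to_camel_case("turns an english string")
puts CamelSketch.to_camel_case("Already Capitalized  words")
```

The first character is never touched, so input that starts in lower case yields lowerCamelCase output. Whitespace followed by a non-letter, such as a digit, is left in place.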
#un_camel_case ⇒ Object
Turns a camel-case string (“camelCaseToEnglish”) to plain English (“camel case to english”). Each word is decapitalized.
# File 'lib/linguistics/en/titlecase.rb', line 40
def un_camel_case
  self.to_s.
    gsub( /([A-Z])([A-Z])/ ) { "#$1 #$2" }.
    gsub( /([a-z])([A-Z])/ ) { "#$1 #$2" }.downcase
end
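The two substitutions can be sketched standalone as follows. `UnCamelSketch` is a hypothetical wrapper, and the block form is replaced by an equivalent backreference string.

```ruby
# Standalone sketch; UnCamelSketch is a hypothetical wrapper, not library code.
module UnCamelSketch
  def self.un_camel_case(text)
    # Pass 1: put a space between adjacent capitals, pair by pair.
    spaced = text.to_s.gsub(/([A-Z])([A-Z])/, '\1 \2')
    # Pass 2: put a space at every lower-to-upper boundary.
    spaced = spaced.gsub(/([a-z])([A-Z])/, '\1 \2')
    spaced.downcase
  end
end

puts UnCamelSketch.un_camel_case("camelCaseToEnglish")
puts UnCamelSketch.un_camel_case("parseHTMLString") # acronyms split unevenly
```

The first call prints "camel case to english". The second prints "parse h tm lstring": the first pass matches non-overlapping pairs of capitals, so a run of capitals is split unevenly. Because the result is always lowercased, the original capitalization cannot be recovered by a round trip through to_camel_case.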