Module: FatCore::String
- Included in:
- String
- Defined in:
- lib/fat_core/string.rb
Defined Under Namespace
Modules: ClassMethods
Transforming
- UPPERS =
('A'..'Z').to_a
Matching
- REGEXP_META_CHARACTERS =
"\\$()*+.<>?[]^{|}".chars
Transforming
-
#as_date ⇒ Date
Convert a string representing a date with only digits, hyphens, or slashes to a Date.
-
#as_string ⇒ String
Return self unmodified.
-
#as_sym ⇒ Symbol
Convert to a lower-case symbol with all white space converted to a single '_' and all non-alphanumerics deleted, such that the string will work as an unquoted Symbol.
-
#clean ⇒ String
Remove leading and trailing white space and compress internal runs of white space to a single space.
-
#entitle ⇒ String
Return self capitalized according to the conventions for capitalizing titles of books or articles.
-
#tex_quote ⇒ String
Return self with special TeX characters replaced with control-sequences that output the literal value of the special characters instead.
-
#wrap(width = 70, hang = 0) ⇒ String
Return a string wrapped to width characters with lines following the first indented by hang characters.
Matching
-
#distance(other) ⇒ Integer
Return the Damerau-Levenshtein distance between self and another string using a transposition block size of 1 and quitting if a max distance of 10 is reached.
-
#fuzzy_match(matcher) ⇒ String?
Return the matched portion of self, minus punctuation characters, if self matches the string matcher under the fuzzy-matching rules listed in the method details.
-
#matches_with(matcher) ⇒ nil, String
Test whether self matches the matcher, treating matcher as a case-insensitive regular expression if it is of the form '/.../' or as a string to #fuzzy_match against otherwise.
-
#to_regexp ⇒ Regexp
Convert a string of the form '/.../Iixm' to a regular expression.
Numbers
-
#commas(places = nil) ⇒ String
If the string is a valid number, return a string that adds grouping commas to the whole number part; otherwise, return self.
-
#number? ⇒ Boolean
Return whether self is convertible into a valid number.
Instance Method Details
#as_date ⇒ Date
Convert a string representing a date with only digits, hyphens, or slashes to a Date.
# File 'lib/fat_core/string.rb', line 110
def as_date
  ::Date.new($1.to_i, $2.to_i, $3.to_i) if self =~ %r{(\d\d\d\d)[-/]?(\d\d?)[-/]?(\d\d?)}
end
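To make the accepted input shapes concrete, here is a minimal sketch that applies the same regular expression outside the String class. The name as_date_sketch is hypothetical and not part of FatCore; it only mirrors the listing above.

```ruby
require 'date'

# Hypothetical stand-alone version of the listing above, for illustration only.
def as_date_sketch(str)
  m = str.match(%r{(\d\d\d\d)[-/]?(\d\d?)[-/]?(\d\d?)})
  # Return nil when the string has no year/month/day run at all.
  Date.new(m[1].to_i, m[2].to_i, m[3].to_i) if m
end

puts as_date_sketch('2024-03-15')  # hyphen-separated
puts as_date_sketch('2024/3/5')    # slash-separated, single-digit month and day
puts as_date_sketch('20240315')    # digits only
p as_date_sketch('no date here')   # no match
```

Because the pattern is not anchored, a date embedded in longer text (for example 'due 2024-03-15') would also be picked up by this sketch.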
#as_string ⇒ String
Return self unmodified. This method exists to match the API of Symbol#as_string, so that it can be applied to a variable that is either a String or a Symbol.
# File 'lib/fat_core/string.rb', line 43
def as_string
  self
end
#as_sym ⇒ Symbol
Convert to a lower-case symbol with all white space converted to a single '_' and all non-alphanumerics deleted, such that the string will work as an unquoted Symbol.
# File 'lib/fat_core/string.rb', line 31
def as_sym
  clean
    .gsub(/\s+/, '_')
    .gsub(/[^_A-Za-z0-9]/, '')
    .downcase.to_sym
end
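The chain above can be followed step by step with a small stand-alone sketch. The helper name as_sym_sketch is hypothetical; it inlines the strip.squeeze(' ') step that #clean performs in the listing for that method.

```ruby
# Hypothetical stand-alone version of the chain above, for illustration only.
def as_sym_sketch(str)
  str.strip.squeeze(' ')          # what #clean does in its listing
     .gsub(/\s+/, '_')            # each run of white space becomes one '_'
     .gsub(/[^_A-Za-z0-9]/, '')   # drop everything but letters, digits, '_'
     .downcase
     .to_sym
end

p as_sym_sketch("  Joe's   Diner ")
p as_sym_sketch('Hello, World!')
```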
#clean ⇒ String
Remove leading and trailing white space and compress internal runs of white space to a single space.
# File 'lib/fat_core/string.rb', line 18
def clean
  strip.squeeze(' ')
end
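A short sketch of what the one-line body does, using a hypothetical free function clean_sketch rather than the FatCore method itself. One detail worth seeing in action: as written, squeeze(' ') collapses runs of the space character only, so internal tabs pass through unchanged in this sketch.

```ruby
# Hypothetical stand-alone version of the listing above, for illustration only.
def clean_sketch(str)
  str.strip.squeeze(' ')
end

p clean_sketch('   too    many   spaces  ')
p clean_sketch("tabs\t\tstay")
```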
#commas(places = nil) ⇒ String
If the string is a valid number, return a string that adds grouping commas to the whole number part; otherwise, return self. Round the number to the given number of places after the decimal if places is positive; round to the left of the decimal if places is negative. Pad with zeroes on the right for positive places, on the left for negative places.
# File 'lib/fat_core/string.rb', line 347
def commas(places = nil)
  numeric_re = /\A([-+])?([\d_]*)((\.)?([\d_]*))?([eE][+-]?[\d_]+)?\z/
  return self unless clean =~ numeric_re
  sig = $1 || ''
  whole = $2 ? $2.delete('_') : ''
  frac = $5 || ''
  exp = $6 || ''
  # Round frac or whole if places given. For positive places, round fraction
  # to that many places; for negative, round the whole-number part to the
  # absolute value of places left of the decimal.
  if places
    new_frac = frac.dup
    new_whole = whole.dup
    if places.zero?
      new_frac = ''
    elsif places.positive? && places < frac.length
      new_frac = frac[0...places - 1]
      new_frac[places - 1] =
        if frac[places].to_i >= 5
          (frac[places - 1].to_i + 1).to_s
        else
          frac[places - 1]
        end
    elsif places >= frac.length
      new_frac = frac + '0' * (places - frac.length)
    else
      # Negative places, round whole to places.abs from decimal
      places = places.abs
      if places > whole.length
        lead = whole[0].to_i >= 5 ? '1' : '0'
        new_whole[0] = places == whole.length + 1 ? lead : '0'
        new_whole[1..-1] = '0' * (places - 1)
        new_frac = ''
      elsif places > 1
        target = whole.length - places
        new_whole[target] =
          if whole[target + 1].to_i >= 5
            (whole[target].to_i + 1).to_s
          else
            whole[target]
          end
        new_whole[target + 1..whole.length - 1] =
          '0' * (whole.length - target - 1)
        new_frac = ''
      else
        # Rounding to 1 place, therefore, no rounding
        new_frac = ''
        new_whole = whole
      end
    end
    frac = new_frac
    whole = new_whole
  end
  # Place the commas in the whole part only
  whole = whole.reverse
  whole.gsub!(/([0-9]{3})/, '\\1,')
  whole.gsub!(/,$/, '')
  whole.reverse!
  # Reassemble
  if frac.blank?
    sig + whole + exp
  else
    sig + whole + '.' + frac + exp
  end
end
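The comma placement near the end of the listing relies on a reverse, group, reverse trick. The sketch below isolates just that step; group_thousands is a hypothetical helper name, and the rounding and sign handling of #commas are deliberately left out.

```ruby
# Hypothetical helper isolating the comma-grouping step, for illustration only.
def group_thousands(whole)
  reversed = whole.reverse                         # "1234567" -> "7654321"
  grouped = reversed.gsub(/([0-9]{3})/, '\\1,')    # comma after every 3 digits
  grouped = grouped.sub(/,$/, '')                  # drop a comma left at the end
  grouped.reverse                                  # restore the original order
end

puts group_thousands('1234567')
puts group_thousands('123456')
puts group_thousands('12')
```

Reversing first means the groups of three are counted from the ones digit, which is why a length that is an exact multiple of three leaves a trailing comma that must be stripped before reversing back.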
#distance(other) ⇒ Integer
Return the Damerau-Levenshtein distance between self and another string using a transposition block size of 1 and quitting if a max distance of 10 is reached.
# File 'lib/fat_core/string.rb', line 216
def distance(other)
  DamerauLevenshtein.distance(self, other.to_s, 1, 10)
end
#entitle ⇒ String
Return self capitalized according to the conventions for capitalizing titles of books or articles. Tries to follow the rules of the University of Chicago's A Manual of Style, Section 7.123, except to the extent that doing so requires knowing the parts of speech of words in the title. Also tries to use sensible capitalization for things such as postal address abbreviations, like P.O. Box, Ave., Cir., etc. Considers all-consonant words of 3 or more characters as acronyms to be kept all uppercase, e.g., ddt => DDT, and words that are all uppercase in the input are kept that way, e.g., IBM stays IBM. Thus, if the source string is all uppercase, you should lowercase the whole string before using #entitle; otherwise it will not have the intended effect.
# File 'lib/fat_core/string.rb', line 148
def entitle
  little_words = %w[a an the at for up and but
                    or nor in on under of from as by to]
  preserve_acronyms = !all_upper?
  newwords = []
  capitalize_next = false
  words = split(/\s+/)
  last_k = words.size - 1
  words.each_with_index do |w, k|
    first = (k == 0)
    last = (k == last_k)
    if w =~ %r{c/o}i
      # Care of
      newwords.push('c/o')
    elsif w =~ /^p\.?o\.?$/i
      # Post office
      newwords.push('P.O.')
    elsif w =~ /^[0-9]+(st|nd|rd|th)$/i
      # Ordinals
      newwords.push(w.downcase)
    elsif w =~ /^(cr|dr|st|rd|ave|pk|cir)$/i
      # Common abbrs to capitalize
      newwords.push(w.capitalize)
    elsif w =~ /^(us|ne|se|rr)$/i
      # Common 2-letter abbrs to upcase
      newwords.push(w.upcase)
    elsif w =~ /^[0-9].*$/i
      # Other runs starting with numbers,
      # like 3-A
      newwords.push(w.upcase)
    elsif w =~ /^(N|S|E|W|NE|NW|SE|SW)$/i
      # Compass directions all caps
      newwords.push(w.upcase)
    elsif w =~ /^[^aeiouy]*$/i && w.size > 2
      # All consonants and at least 3 chars, probably abbr
      newwords.push(w.upcase)
    elsif w =~ /^[A-Z0-9]+\z/ && preserve_acronyms
      # All uppercase and numbers, keep as is
      newwords.push(w)
    elsif w =~ /^(\w+)-(\w+)$/i
      # Hyphenated double word
      newwords.push($1.capitalize + '-' + $2.capitalize)
    elsif capitalize_next
      # Last word ended with a ':'
      newwords.push(w.capitalize)
      capitalize_next = false
    elsif little_words.include?(w.downcase)
      # Only capitalize at beginning or end
      newwords.push(first || last ? w.capitalize : w.downcase)
    else
      # All else
      newwords.push(w.capitalize)
    end
    # Capitalize following a ':'
    capitalize_next = true if newwords.last =~ /:\s*\z/
  end
  newwords.join(' ')
end
#fuzzy_match(matcher) ⇒ String?
Return the matched portion of self, minus punctuation characters, if self matches the string matcher using the following notion of matching:
- Remove all periods, commas, apostrophes, and asterisks (the punctuation characters) from both self and matcher,
- Treat ':' in the matcher as the equivalent of '.*' in a regular expression, that is, match anything in self,
- Ignore case in the match,
- Match if any part of self matches matcher.
# File 'lib/fat_core/string.rb', line 260
def fuzzy_match(matcher)
  # Remove periods, asterisks, commas, and apostrophes
  matcher = matcher.gsub(/[\*.,']/, '')
  target = gsub(/[\*.,']/, '')
  matchers = matcher.split(/[: ]+/)
  regexp_string = matchers.map { |m| ".*?#{Regexp.escape(m)}.*?" }.join('[: ]')
  regexp_string.sub!(/^\.\*\?/, '')
  regexp_string.sub!(/\.\*\?$/, '')
  regexp = /#{regexp_string}/i
  matched_text =
    if (match = regexp.match(target))
      match[0]
    end
  matched_text
end
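To see how a matcher is turned into a regular expression, the following sketch re-expresses the listing as a free function. The name fuzzy_match_sketch and its two-argument form are assumptions made for illustration; the body follows the listing line for line.

```ruby
# Hypothetical stand-alone version of the listing above, for illustration only.
def fuzzy_match_sketch(target, matcher)
  # Strip the punctuation characters from both sides.
  matcher = matcher.gsub(/[\*.,']/, '')
  target = target.gsub(/[\*.,']/, '')
  # Split the matcher on colons and spaces, then rejoin the escaped pieces.
  parts = matcher.split(/[: ]+/)
  source = parts.map { |m| ".*?#{Regexp.escape(m)}.*?" }.join('[: ]')
  source.sub!(/^\.\*\?/, '')
  source.sub!(/\.\*\?$/, '')
  match = /#{source}/i.match(target)
  match && match[0]
end

p fuzzy_match_sketch("St. Luke's Hospital", 'st lukes')
p fuzzy_match_sketch('Hello, world', 'hel:wor')
p fuzzy_match_sketch('Helloworld', 'hel:wor')
```

In this sketch the pieces are joined with '[: ]', so the text between two matcher parts has to contain at least one space or colon. That is why the last call returns nil even though both fragments occur in the target.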
#matches_with(matcher) ⇒ nil, String
Test whether self matches the matcher, treating matcher as a case-insensitive regular expression if it is of the form '/.../' or as a string to #fuzzy_match against otherwise.
# File 'lib/fat_core/string.rb', line 230
def matches_with(matcher)
  if matcher.nil?
    nil
  elsif matcher =~ %r{^\s*/}
    re = matcher.to_regexp
    $& if to_s =~ re
  else
    to_s.fuzzy_match(matcher)
  end
end
#number? ⇒ Boolean
Return whether self is convertible into a valid number.
# File 'lib/fat_core/string.rb', line 326
def number?
  Float(self)
  true
rescue ArgumentError
  return false
end
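The test leans entirely on Kernel#Float raising ArgumentError for strings it cannot parse. A minimal sketch of the same pattern, using the hypothetical name number_like? so as not to suggest it is the library method:

```ruby
# Hypothetical stand-alone version of the listing above, for illustration only.
def number_like?(str)
  Float(str)   # raises ArgumentError when str is not a valid numeric literal
  true
rescue ArgumentError
  false
end

p number_like?('3.14')
p number_like?('1e5')
p number_like?('12abc')
p number_like?('')
```

Unlike String#to_f, which quietly returns 0.0 for unparseable input, Float() is strict, so trailing junk such as '12abc' is rejected.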
#tex_quote ⇒ String
Return self with special TeX characters replaced with control-sequences that output the literal value of the special characters instead. It handles _, $, &, %, #, {, }, \, ^, ~, |, <, and >.
# File 'lib/fat_core/string.rb', line 85
def tex_quote
  r = dup
  r = r.gsub(/[{]/, 'XzXzXobXzXzX')
  r = r.gsub(/[}]/, 'XzXzXcbXzXzX')
  r = r.gsub(/\\/, '\textbackslash{}')
  r = r.gsub(/\^/, '\textasciicircum{}')
  r = r.gsub(/~/, '\textasciitilde{}')
  r = r.gsub(/\|/, '\textbar{}')
  r = r.gsub(/\</, '\textless{}')
  r = r.gsub(/\>/, '\textgreater{}')
  r = r.gsub(/([_$&%#])/) { |m| '\\' + m }
  r = r.gsub('XzXzXobXzXzX', '\\{')
  r.gsub('XzXzXcbXzXzX', '\\}')
end
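The order of substitutions matters here: literal braces are parked behind placeholders first, so that the braces introduced by replacements such as \textbackslash{} are not themselves escaped later. The sketch below reproduces the same sequence as a free function; tex_quote_sketch is a hypothetical name used only for illustration.

```ruby
# Hypothetical stand-alone version of the listing above, for illustration only.
def tex_quote_sketch(str)
  r = str.dup
  # Park literal braces behind placeholders before adding any TeX braces.
  r = r.gsub(/[{]/, 'XzXzXobXzXzX')
  r = r.gsub(/[}]/, 'XzXzXcbXzXzX')
  r = r.gsub(/\\/, '\textbackslash{}')
  r = r.gsub(/\^/, '\textasciicircum{}')
  r = r.gsub(/~/, '\textasciitilde{}')
  r = r.gsub(/\|/, '\textbar{}')
  r = r.gsub(/</, '\textless{}')
  r = r.gsub(/>/, '\textgreater{}')
  # Characters that only need a leading backslash.
  r = r.gsub(/([_$&%#])/) { |m| '\\' + m }
  # Restore the parked braces in escaped form.
  r = r.gsub('XzXzXobXzXzX', '\\{')
  r.gsub('XzXzXcbXzXzX', '\\}')
end

puts tex_quote_sketch('50% of $10 {x}')
puts tex_quote_sketch('a\b')
```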
#to_regexp ⇒ Regexp
Convert a string of the form '/.../Iixm' to a regular expression. However, make the regular expression case-insensitive by default and extend the modifier syntax to allow '/I' to indicate case-sensitive matching. Without the surrounding '/', do not make the Regexp case-insensitive; just translate it to a Regexp with Regexp.new.
# File 'lib/fat_core/string.rb', line 290
def to_regexp
  if self =~ %r{^\s*/([^/]*)/([Iixm]*)\s*$}
    body = $1
    opts = $2
    flags = Regexp::IGNORECASE
    unless opts.blank?
      flags = 0 if opts.include?('I')
      flags |= Regexp::IGNORECASE if opts.include?('i')
      flags |= Regexp::EXTENDED if opts.include?('x')
      flags |= Regexp::MULTILINE if opts.include?('m')
    end
    flags = nil if flags.zero?
    body = Regexp.quote(body) if REGEXP_META_CHARACTERS.include?(body)
    Regexp.new(body, flags)
  else
    if REGEXP_META_CHARACTERS.include?(self)
      Regexp.new(Regexp.quote(self))
    else
      Regexp.new(self)
    end
  end
end
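The flag handling is easier to follow with a few concrete inputs. This sketch restates the listing as a free function under two stated assumptions: to_regexp_sketch and META_CHARS are hypothetical names, and String#empty? stands in for blank?, which plain Ruby does not provide.

```ruby
# Hypothetical stand-alone version of the listing above, for illustration only.
META_CHARS = "\\$()*+.<>?[]^{|}".chars

def to_regexp_sketch(str)
  m = str.match(%r{^\s*/([^/]*)/([Iixm]*)\s*$})
  unless m
    # No surrounding slashes: plain Regexp.new, quoting a lone metacharacter.
    return Regexp.new(META_CHARS.include?(str) ? Regexp.quote(str) : str)
  end

  body = m[1]
  opts = m[2]
  flags = Regexp::IGNORECASE               # case-insensitive by default
  unless opts.empty?
    flags = 0 if opts.include?('I')        # 'I' switches the default off
    flags |= Regexp::IGNORECASE if opts.include?('i')
    flags |= Regexp::EXTENDED if opts.include?('x')
    flags |= Regexp::MULTILINE if opts.include?('m')
  end
  body = Regexp.quote(body) if META_CHARS.include?(body)
  Regexp.new(body, flags)
end

p to_regexp_sketch('/abc/')    # case-insensitive by default
p to_regexp_sketch('/abc/I')   # explicitly case-sensitive
p to_regexp_sketch('abc')      # no slashes, so case-sensitive
```

Note that in this sketch modifiers other than 'I' leave the case-insensitive default in place, so '/a.c/m' is both multiline and case-insensitive.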
#wrap(width = 70, hang = 0) ⇒ String
Return a string wrapped to width characters with lines following the first indented by hang characters.
# File 'lib/fat_core/string.rb', line 51
def wrap(width = 70, hang = 0)
  result = ''
  first_line = true
  first_word_on_line = true
  line_width_so_far = 0
  words = split(' ')
  words.each do |w|
    if !first_line && first_word_on_line
      w = ' ' * hang + w
    end
    unless first_word_on_line
      w = ' ' + w
    end
    result << w
    first_word_on_line = false
    line_width_so_far += 1 + w.length
    if line_width_so_far >= width
      result << "\n"
      line_width_so_far = 0
      first_line = false
      first_word_on_line = true
    end
  end
  result.strip
end
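A worked example helps show where the breaks fall. The sketch below copies the loop into a free function; wrap_sketch is a hypothetical name, and the only change is starting from an explicitly mutable string.

```ruby
# Hypothetical stand-alone version of the listing above, for illustration only.
def wrap_sketch(str, width = 70, hang = 0)
  result = +''
  first_line = true
  first_word_on_line = true
  line_width_so_far = 0
  str.split(' ').each do |w|
    w = ' ' * hang + w if !first_line && first_word_on_line
    w = ' ' + w unless first_word_on_line
    result << w
    first_word_on_line = false
    line_width_so_far += 1 + w.length
    # Break after the word that brings the running count up to width.
    if line_width_so_far >= width
      result << "\n"
      line_width_so_far = 0
      first_line = false
      first_word_on_line = true
    end
  end
  result.strip
end

puts wrap_sketch('the quick brown fox jumps over the lazy dog', 20, 2)
```

Because the break is added after the word that pushes the running count to width, rather than before it, a long final word can make a line longer than width in this sketch.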