Module: Re
Overview
Regular Expression Construction
Complex regular expressions are hard to construct and even harder to read. The Re library allows users to construct complex regular expressions from simpler expressions. For example, consider the following regular expression that will parse dates:
/\A((?:19|20)[0-9]{2})[\- \/.](0[1-9]|1[012])[\- \/.](0[1-9]|[12][0-9]|3[01])\z/
Using the Re library, that regular expression can be built incrementally from smaller, easier-to-understand expressions. Perhaps something like this:
require 're'
include Re
delim = re.any("- /.")
century_prefix = re("19") | re("20")
under_ten = re("0") + re.any("1-9")
ten_to_twelve = re("1") + re.any("012")
ten_and_under_thirty = re.any("12") + re.any("0-9")
thirties = re("3") + re.any("01")
year = (century_prefix + re.digit.repeat(2)).capture(:year)
month = (under_ten | ten_to_twelve).capture(:month)
day = (under_ten | ten_and_under_thirty | thirties).capture(:day)
date = (year + delim + month + delim + day).all
Although it is more code, the individual pieces are smaller and easier to independently verify. As an additional bonus, the capture groups can be retrieved by name:
result = date.match("2009-01-23")
result[:year] # => "2009"
result[:month] # => "01"
result[:day] # => "23"
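For comparison, the same incremental idea can be sketched with nothing but Ruby's built-in Regexp, which helps show what a library like Re has to do on your behalf. This is only an illustration: the method name build_date_regexp is hypothetical, and the code does not use or reproduce the Re library's internals.

```ruby
# Plain-Ruby sketch of incremental construction. It does NOT use the Re
# library; build_date_regexp is a hypothetical name for illustration.
def build_date_regexp
  delim = /[\- \/.]/
  year  = /(?<year>(?:19|20)[0-9]{2})/
  month = /(?<month>0[1-9]|1[012])/
  day   = /(?<day>0[1-9]|[12][0-9]|3[01])/

  # Interpolating a Regexp wraps it in its own non-capturing group,
  # so the alternations inside month and day stay self-contained.
  /\A#{year}#{delim}#{month}#{delim}#{day}\z/
end

result = build_date_regexp.match("2009-01-23")
puts result[:year]   # prints 2009
puts result[:month]  # prints 01
puts result[:day]    # prints 23
```

The manual version works, but every piece has to be written in raw regular expression syntax, which is the readability cost Re is meant to remove.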
Usage
include Re
number = re.any("0-9").all
if number =~ string
  puts "Matches!"
else
  puts "No Match"
end
Examples
re("a") -- matches "a"
re("a") + re("b") -- matches "ab"
re("a") | re("b") -- matches "a" or "b"
re("a").many -- matches "", "a", "aaaaaa"
re("a").one_or_more -- matches "a", "aaaaaa", but not ""
re("a").optional -- matches "" or "a"
re("a").all -- matches "a", but not "xab"
See Re::Rexp for a complete list of expressions.
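As a rough guide to the examples above, the following sketch pairs each Re expression with a native Ruby Regexp that has the same matching behavior. The pairing is an assumption based on the descriptions in this section (and on the anchored date pattern shown earlier); the exact regular expression source that Re generates may differ. The method name native_equivalents is hypothetical.

```ruby
# Assumed native equivalents of the Re expressions listed above.
# Uses only Ruby's built-in Regexp; the Re library is not required.
def native_equivalents
  {
    're("a")'             => /a/,
    're("a") + re("b")'   => /ab/,
    're("a") | re("b")'   => /a|b/,
    're("a").many'        => /a*/,
    're("a").one_or_more' => /a+/,
    're("a").optional'    => /a?/,
    're("a").all'         => /\Aa\z/   # anchored to the whole string
  }
end

# Print a small two-column table of Re expression and native pattern.
native_equivalents.each do |re_source, regexp|
  puts format("%-22s %s", re_source, regexp.inspect)
end
```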
Using re without an argument allows access to a number of common regular expression constants. For example:
re.space / re.spaces -- matches " ", "\n" or "\t"
re.digit / re.digits -- matches a digit / sequence of digits
re without arguments can also be used to construct character classes:
re.any -- Matches any character
re.any("abc") -- Matches "a", "b", or "c"
re.any("0-9") -- Matches the digits 0 through 9
re.any("A-Z", "a-z", "0-9", "_")
-- Matches alphanumeric or an underscore
See Re::ConstructionMethods for a complete list of common constants and character class functions.
See Re.re, Re::Rexp, and Re::ConstructionMethods for details.
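To make the character class descriptions above concrete, here is a minimal plain-Ruby sketch of how range specifications can be joined into a single bracket expression. The helper any_of is hypothetical and is not part of Re; in particular, whether re.any with no arguments matches a newline is not stated here, so the fallback below is an assumption.

```ruby
# Hypothetical helper illustrating character class construction.
# Not the Re implementation; uses only Ruby's built-in Regexp.
def any_of(*specs)
  # With no specs, fall back to "any character". Note that /./ does not
  # match a newline unless Regexp::MULTILINE is set.
  return /./ if specs.empty?

  # Join the specs inside one bracket expression, e.g. "[A-Za-z0-9_]".
  Regexp.new("[#{specs.join}]")
end

word_char = any_of("A-Z", "a-z", "0-9", "_")
puts word_char.source      # prints [A-Za-z0-9_]
p(word_char =~ "_")        # prints 0
p(word_char =~ "-")        # prints nil
```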
Performance
We should say a word or two about performance.
First of all, building regular expressions using Re is slow. If you use Re to build regular expressions, you are encouraged to build the regular expression once and reuse it as needed. This means you won’t do a lot of inline expressions using Re, but rather assign the generated Re regular expression to a constant. For example:
PHONE_RE = re.digit.repeat(3).capture(:area) +
  re("-") +
  re.digit.repeat(3).capture(:exchange) +
  re("-") +
  re.digit.repeat(4).capture(:subscriber)
Alternatively, you can arrange for the regular expression to be constructed only when actually needed. Something like:
def phone_re
  @phone_re ||= re.digit.repeat(3).capture(:area) +
    re("-") +
    re.digit.repeat(3).capture(:exchange) +
    re("-") +
    re.digit.repeat(4).capture(:subscriber)
end
That method constructs the phone number regular expression once and returns a cached value thereafter. Just make sure you put the method in an object that is instantiated once (e.g. a class method).
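The advice about placing the cached method on a long-lived object can be sketched as a class-level memoized method. The example below uses a plain Ruby Regexp as a stand-in for the Re expression so that it runs without the library; the class name PhoneFormats and the build counter are hypothetical and exist only to make the caching visible.

```ruby
# Sketch of build-once caching at the class level. A plain Regexp stands
# in for the Re expression; PhoneFormats is a hypothetical name.
class PhoneFormats
  class << self
    # Number of times the expression was actually constructed.
    attr_reader :build_count

    # Builds the expression on first use and returns the cached
    # object on every later call.
    def phone_regexp
      @phone_regexp ||= build_phone_regexp
    end

    private

    def build_phone_regexp
      @build_count = (@build_count || 0) + 1
      Regexp.new('\A(?<area>\d{3})-(?<exchange>\d{3})-(?<subscriber>\d{4})\z')
    end
  end
end

first  = PhoneFormats.phone_regexp
second = PhoneFormats.phone_regexp
puts first.equal?(second)       # prints true
puts PhoneFormats.build_count   # prints 1
```

Because the instance variable lives on the class itself, the cache survives for the life of the process. The same `||=` inside an ordinary instance method would rebuild the expression once per object.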
When used in matching, Re regular expressions perform fairly well compared to native regular expressions. The overhead is a small number of extra method calls and the creation of a Re::Result object to return the match results.
If regular expression performance is at a premium in your application, you can still use Re to construct the regular expression and then extract the raw Ruby Regexp object for the actual matching. You lose easy access to named capture groups, but you get raw Ruby regular expression matching performance.
For example, if you wanted to use the raw regular expression from PHONE_RE defined above, you could extract the regular expression like this:
PHONE_REGEXP = PHONE_RE.regexp
And then use it directly:
if PHONE_REGEXP =~ string
# blah blah blah
end
The above match runs at full Ruby matching speed. If you still want named capture groups, you can do something like this:
match_data = PHONE_REGEXP.match(string)
area_code = match_data[PHONE_RE.name_map[:area]]
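The lookup above implies that the name map translates a symbolic capture name into a numeric group index. A minimal sketch of that idea, using a hand-written hash as a stand-in for PHONE_RE.name_map, might look like the following. The hash contents and the helper named_group are assumptions for illustration, not the library's actual data or API.

```ruby
# Hypothetical helper: resolve a symbolic name to a group index through a
# name map, then read that group from a plain MatchData object.
def named_group(match_data, name_map, name)
  return nil if match_data.nil?        # the match failed
  match_data[name_map.fetch(name)]     # raises KeyError for unknown names
end

# Stand-ins for PHONE_RE.regexp and PHONE_RE.name_map (assumed shape).
phone_regexp = /(\d{3})-(\d{3})-(\d{4})/
name_map     = { area: 1, exchange: 2, subscriber: 3 }

match_data = phone_regexp.match("555-867-5309")
puts named_group(match_data, name_map, :area)        # prints 555
puts named_group(match_data, name_map, :subscriber)  # prints 5309
```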
License and Copyright
Copyright 2009 by Jim Weirich ([email protected]). All rights reserved.
Re is provided under the MIT open source license (see MIT-LICENSE).
Links:
- Documentation
- Source
- GemCutter
- Download
- Bug Tracker
- Author
Defined Under Namespace
Modules: ConstructionMethods, Version Classes: Result, Rexp
Constant Summary
- VERSION =
Version::NUMBERS.join('.')
- GROUPED =
Precedence levels for regular expressions:
4
- POSTFIX =
(r), [chars] :nodoc:
3
- CONCAT =
r*, r+, r? :nodoc:
2
- ALT =
r + r, literal :nodoc:
1
- MULTILINE_MODE =
Mode Bits
Regexp::MULTILINE
- IGNORE_CASE_MODE =
Regexp::IGNORECASE
- NULL =
Matches an empty string. Additional common regular expression construction methods are defined on NULL. See Re::ConstructionMethods for details.
Rexp.literal("")
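The precedence constants in this list (GROUPED, POSTFIX, CONCAT, ALT) suggest why a builder has to track how tightly each expression binds: concatenating an alternation without grouping it first changes the meaning of the pattern. The plain-Ruby sketch below shows the difference with the century prefix from the date example. The method names are hypothetical, and how Re decides when to insert a group is not shown here.

```ruby
# Why precedence matters when joining pattern fragments as text.
# Plain Ruby only; ungrouped_year and grouped_year are hypothetical names.

# Naive concatenation: the "|" splits the WHOLE pattern in two,
# giving "\A19" OR "20[0-9]{2}\z".
def ungrouped_year
  Regexp.new('\A' + '19|20' + '[0-9]{2}' + '\z')
end

# Grouping the lower-precedence alternation first keeps it contained.
def grouped_year
  Regexp.new('\A' + '(?:19|20)' + '[0-9]{2}' + '\z')
end

p(ungrouped_year =~ "19abc")  # prints 0, an unintended match
p(grouped_year   =~ "19abc")  # prints nil
p(grouped_year   =~ "1999")   # prints 0
```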
Instance Method Summary
- #re(exp = nil) ⇒ Object
Construct a regular expression from the literal string.
Instance Method Details
#re(exp = nil) ⇒ Object
Construct a regular expression from the literal string. Special Regexp characters will be escaped before constructing the regular expression. If no literal is given, then the NULL regular expression is returned.
See Re for example usage.
# File 'lib/re.rb', line 492
def re(exp=nil)
  exp ? Rexp.literal(exp) : NULL
end
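The escaping behavior described for this method can be illustrated with Ruby's built-in Regexp.escape. The sketch below mirrors the literal-or-empty shape of the method using only the standard library; the name literal_regexp is hypothetical, and whether Rexp.literal relies on Regexp.escape internally is an assumption.

```ruby
# Hypothetical plain-Ruby analogue of a "literal" constructor: escape
# special characters so the text matches only itself.
def literal_regexp(text = nil)
  # With no text, return a pattern that matches the empty string.
  text ? Regexp.new(Regexp.escape(text)) : Regexp.new("")
end

puts literal_regexp("1.5").source        # prints 1\.5
p(literal_regexp("1.5") =~ "1x5")        # prints nil, "." is literal
p(Regexp.new("1.5") =~ "1x5")            # prints 0, unescaped "." is a wildcard
p(literal_regexp =~ "anything")          # prints 0, empty pattern matches at the start
```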