Class: Oniguruma::ORegexp
- Inherits: Object (ancestry: Object > Oniguruma::ORegexp)
- Defined in: lib/oniguruma.rb, ext/oregexp.c
Class Method Summary
- .escape(*args) ⇒ Object (also: quote)
  call-seq: ORegexp.escape(str) => a_str; ORegexp.quote(str) => a_str.
- .last_match(index = nil) ⇒ Object
  call-seq: ORegexp.last_match => matchdata; ORegexp.last_match(fixnum) => str. The first form returns the MatchData object generated by the last successful pattern match.
Instance Method Summary
- #==(regexp) ⇒ Object (also: #eql?)
  call-seq: rxp == other_rxp => true or false; rxp.eql?(other_rxp) => true or false. Equality: two regexps are equal if their patterns are identical, they have the same character set code, and their #casefold? values are the same.
- #===(str) ⇒ Boolean
  Case equality: returns true if rxp matches str and false otherwise. Used in case statements.
- #=~(string) ⇒ Integer?
  Matches rxp against string, returning the offset of the start of the match, or nil if the match failed.
- #casefold? ⇒ Boolean
  call-seq: rxp.casefold? => true or false.
- #gsub(*args) ⇒ Object
  Returns a copy of str with all occurrences of the rxp pattern replaced with either replacement or the value of the block.
- #gsub!(*args) ⇒ Object
  Performs the substitutions of ORegexp#gsub in place, returning str, or nil if no substitutions were performed.
- #initialize(pattern, options) ⇒ ORegexp (constructor)
  call-seq: ORegexp.new(pattern, options_hash); ORegexp.new(pattern, option_str, encoding_str=nil, syntax_str=nil).
- #inspect ⇒ Object
  call-seq: rxp.inspect => string.
- #kcode ⇒ Object
  call-seq: rxp.kcode => int.
- #match(*args) ⇒ Object
  Returns a MatchData object describing the match, or nil if there was no match.
- #old_initialize ⇒ Object
  Internal alias of the original initializer; #initialize calls it after normalizing its arguments.
- #options ⇒ Object
  call-seq: rxp.options => fixnum.
- #scan(str) ⇒ Object (also: #match_all)
  Iterates through str, matching the pattern, with or without a block.
- #source ⇒ Object
  call-seq: rxp.source => str.
- #sub(*args) ⇒ Object
  Returns a copy of str with the first occurrence of the rxp pattern replaced with either replacement or the value of the block.
- #sub!(*args) ⇒ Object
  Performs the substitutions of ORegexp#sub in place, returning str, or nil if no substitutions were performed.
- #to_s ⇒ Object
  call-seq: rxp.to_s => str.
Constructor Details
#initialize(pattern, options) ⇒ ORegexp
call-seq:
ORegexp.new( pattern, options_hash )
ORegexp.new( pattern, option_str, encoding_str=nil, syntax_str=nil)
Constructs a new regular expression from pattern, which is a String. The second parameter may be a Hash of the form:
{ :options => option_value, :encoding => encoding_value, :syntax => syntax_value }
where option_value is a bitwise OR of Oniguruma::OPTION_XXX constants, encoding_value is one of the Oniguruma::ENCODING_XXX constants, and syntax_value is one of the Oniguruma::SYNTAX_XXX constants. Omitted keys default to OPTION_DEFAULT, ENCODING_ASCII and SYNTAX_DEFAULT.
r1 = ORegexp.new('^a-z+:\\s+\w+') #=> /^a-z+:\s+\w+/
r2 = ORegexp.new('cat', :options => OPTION_IGNORECASE ) #=> /cat/i
r3 = ORegexp.new('dog', :options => OPTION_EXTEND ) #=> /dog/x
# Accept Java syntax on SJIS encoding:
r4 = ORegexp.new('ape', :syntax => SYNTAX_JAVA, :encoding => ENCODING_SJIS) #=> /ape/
The second form uses string shortcuts to set the options, encoding and syntax:
r = ORegexp.new('cat', 'i', 'utf8', 'java')
# File 'lib/oniguruma.rb', line 160
def initialize( pattern, *args )
  defaults = { :options => OPTION_DEFAULT, :encoding => ENCODING_ASCII, :syntax => SYNTAX_DEFAULT}
  if args[0].is_a?(String)
    options = {}
    option_str, encoding_str, syntax_str = *args
    opt = 0
    option_str.each_byte {|x| opt |= (OPTIONS_SHORTCUTS[x.chr] || 0) }
    options[:options] = opt
    if encoding_str && Oniguruma::const_defined?("ENCODING_#{encoding_str.upcase}")
      options[:encoding] = Oniguruma::const_get("ENCODING_#{encoding_str.upcase}")
    end
    if syntax_str && Oniguruma::const_defined?("SYNTAX_#{syntax_str.upcase}")
      options[:syntax] = Oniguruma::const_get("SYNTAX_#{syntax_str.upcase}")
    end
  else
    options = args[0] || {}
  end
  old_initialize( pattern, defaults.merge( options ).freeze )
end
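The argument normalisation in the constructor can be mirrored in plain Ruby to see which options hash each call form produces. This is a minimal sketch, not the gem's code: the module `FakeOniguruma`, its `normalize` method, the placeholder ENCODING_/SYNTAX_ values and the three-letter shortcut table are all hypothetical; only the OPTION_IGNORECASE, OPTION_EXTEND and OPTION_MULTILINE values (1, 2, 4) are taken from the #options documentation.

```ruby
# Hypothetical stand-in for the Oniguruma module, runnable without the gem.
module FakeOniguruma
  OPTION_DEFAULT    = 0
  OPTION_IGNORECASE = 1 # value documented under #options
  OPTION_EXTEND     = 2 # value documented under #options
  OPTION_MULTILINE  = 4 # value documented under #options
  ENCODING_ASCII = :ascii # placeholder, not the real constant value
  ENCODING_UTF8  = :utf8  # placeholder
  SYNTAX_DEFAULT = :ruby  # placeholder
  SYNTAX_JAVA    = :java  # placeholder

  # Assumed shortcut table; the gem's OPTIONS_SHORTCUTS may map more letters.
  OPTIONS_SHORTCUTS = { "i" => OPTION_IGNORECASE, "x" => OPTION_EXTEND,
                        "m" => OPTION_MULTILINE }.freeze

  DEFAULTS = { options: OPTION_DEFAULT, encoding: ENCODING_ASCII,
               syntax: SYNTAX_DEFAULT }.freeze

  # Returns the frozen hash that ORegexp#initialize would pass to old_initialize.
  def self.normalize(*args)
    if args[0].is_a?(String)
      option_str, encoding_str, syntax_str = *args
      # OR together the bit of every recognised letter; unknown letters add 0.
      bits = option_str.each_char.reduce(0) { |acc, c| acc | OPTIONS_SHORTCUTS.fetch(c, 0) }
      opts = { options: bits }
      # Look up ENCODING_<NAME> / SYNTAX_<NAME>; unknown names keep the default.
      enc = "ENCODING_#{encoding_str.upcase}" if encoding_str
      opts[:encoding] = const_get(enc) if enc && const_defined?(enc)
      syn = "SYNTAX_#{syntax_str.upcase}" if syntax_str
      opts[:syntax] = const_get(syn) if syn && const_defined?(syn)
    else
      opts = args[0] || {}
    end
    DEFAULTS.merge(opts).freeze
  end
end

string_form = FakeOniguruma.normalize("ix", "utf8", "java")
hash_form   = FakeOniguruma.normalize({ options: FakeOniguruma::OPTION_MULTILINE })
```

In this sketch the string form yields options 3 together with the UTF8 and JAVA placeholders, and an unrecognised encoding name such as `'bogus'` silently keeps the ASCII default, mirroring the `const_defined?` guard in the method above.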
Class Method Details
.escape(*args) ⇒ Object Also known as: quote
call-seq: ORegexp.escape(str) => a_str ORegexp.quote(str) => a_str
Escapes any characters that would have special meaning in a regular expression. Returns a new escaped string, or self if no characters are escaped. For any string str, Regexp.new(Regexp.escape(str)) =~ str returns 0, i.e. the escaped pattern always matches the original text.
ORegexp.escape('\\*?{}.') #=> \\\\\*\?\{\}\.
# File 'lib/oniguruma.rb', line 100
def escape( *args )
  Regexp.escape( *args )
end
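Because `ORegexp.escape` forwards to Ruby's own `Regexp.escape`, the round-trip property can be checked with core Ruby alone. The sample strings are arbitrary, and the sketch says nothing about syntaxes other than the default Ruby one.

```ruby
# ORegexp.escape delegates to core Regexp.escape, so core Ruby suffices here.
samples = ["a.b", "1+1=2", "(x|y)*", "\\*?{}.", "plain"]

samples.each do |s|
  escaped = Regexp.escape(s)
  # Compiled as a pattern, the escaped text matches the original literally,
  # starting at offset 0.
  offset = Regexp.new(escaped) =~ s
  puts "#{s.inspect} -> #{escaped.inspect}, matches at #{offset}"
end
```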
.last_match(index = nil) ⇒ Object
call-seq:
ORegexp.last_match => matchdata
ORegexp.last_match(fixnum) => str
The first form returns the MatchData object generated by the last successful pattern match. The second form returns the nth field in this MatchData object.
ORegexp.new( 'c(.)t' ) =~ 'cat' #=> 0
ORegexp.last_match #=> #<MatchData:0x401b3d30>
ORegexp.last_match(0) #=> "cat"
ORegexp.last_match(1) #=> "a"
ORegexp.last_match(2) #=> nil
# File 'lib/oniguruma.rb', line 121
def last_match( index = nil)
  if index
    @@last_match[index]
  else
    @@last_match
  end
end
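The class-variable pattern behind `last_match` can be sketched as follows. `LastMatchDemo` is a hypothetical class, core `Regexp` does the matching, and the sketch assumes the stored match is replaced only on success, as the description says; the code that actually assigns `@@last_match` in the gem is not shown on this page. Unlike the method above, the sketch returns nil for an index lookup made before any match instead of raising NoMethodError.

```ruby
# Hypothetical stand-in (not ORegexp) that remembers the last successful
# MatchData in a class variable, like ORegexp.last_match reads it.
class LastMatchDemo
  @@last_match = nil

  def self.last_match(index = nil)
    return @@last_match if index.nil?
    @@last_match && @@last_match[index] # guard against "no match yet"
  end

  def initialize(pattern)
    @re = Regexp.new(pattern)
  end

  # Returns the start offset like ORegexp#=~; failures leave @@last_match alone.
  def =~(str)
    md = @re.match(str)
    return nil unless md
    @@last_match = md
    md.begin(0)
  end
end

first  = LastMatchDemo.new('c(.)t') =~ 'cat' # => 0
second = LastMatchDemo.new('zzz') =~ 'cat'   # => nil, previous match is kept
group  = LastMatchDemo.last_match(1)         # => "a"
```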
Instance Method Details
#==(regexp) ⇒ Object Also known as: eql?
call-seq:
rxp == other_rxp => true or false
rxp.eql?(other_rxp) => true or false
Equality: two regexps are equal if their patterns are identical, they have the same character set code, and their #casefold? values are the same.
# File 'lib/oniguruma.rb', line 188
def == regexp
  @pattern == regexp.source && kcode == regexp.kcode && casefold? == regexp.casefold?
end
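The equality rule ignores every option bit except case folding. The hypothetical value object `RxpInfo` below holds only what `==` reads and reproduces its three comparisons, using the bit values documented under #options; the `:ascii` kcode is a placeholder. It shows that two patterns differing only in OPTION_EXTEND compare equal.

```ruby
# Hypothetical stand-in holding only the fields ORegexp#== compares.
class RxpInfo
  IGNORECASE = 1 # OPTION_IGNORECASE value, as documented under #options
  EXTEND     = 2 # OPTION_EXTEND value, as documented under #options

  attr_reader :source, :kcode

  def initialize(source, kcode, options)
    @source = source
    @kcode = kcode
    @options = options
  end

  # Same bit test as ORegexp#casefold?.
  def casefold?
    (@options & IGNORECASE) > 0
  end

  # Same three comparisons as ORegexp#==: pattern, kcode and casefold flag.
  def ==(other)
    source == other.source && kcode == other.kcode && casefold? == other.casefold?
  end
end

a = RxpInfo.new('cat', :ascii, RxpInfo::IGNORECASE)
b = RxpInfo.new('cat', :ascii, RxpInfo::IGNORECASE | RxpInfo::EXTEND)
c = RxpInfo.new('cat', :ascii, 0)
p a == b # => true: the extend bit is not compared
p a == c # => false: casefold? differs
```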
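#===(str) ⇒ Boolean
Case equality: returns true if rxp matches str and false otherwise. An argument that is not a String and cannot be converted to one also yields false. Used in case statements.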
# File 'ext/oregexp.c', line 694
static VALUE oregexp_m_eqq(VALUE self, VALUE str) {
VALUE match;
if (TYPE(str) != T_STRING) {
str = rb_check_string_type(str);
if (NIL_P(str)) {
return Qfalse;
}
}
StringValue(str);
VALUE args[] = {str};
match = oregexp_match(1, args, self);
if (Qnil == match) {
return Qfalse;
}
return Qtrue;
}
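The C function above can be restated in Ruby to show how `===` behaves inside a `case` expression. `CaseMatcher` and `classify` are hypothetical names, core `Regexp` performs the match, and the conversion step imitates `rb_check_string_type` by calling `to_str` only when the object supports it.

```ruby
# Ruby rendering of oregexp_m_eqq around a core Regexp (hypothetical class).
class CaseMatcher
  def initialize(pattern)
    @re = Regexp.new(pattern)
  end

  def ===(obj)
    # Like rb_check_string_type: accept Strings, convert via to_str when
    # available, otherwise treat the object as non-matching.
    str = if obj.is_a?(String) then obj
          elsif obj.respond_to?(:to_str) then obj.to_str
          end
    return false if str.nil?
    !@re.match(str).nil?
  end
end

def classify(value)
  case value
  when CaseMatcher.new('^\d+$') then :digits
  when CaseMatcher.new('[a-z]') then :letters
  else :other
  end
end
```

Because Integer and Symbol do not respond to `to_str`, both fall through to `:other` in this sketch.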
#=~(string) ⇒ Integer?
Matches rxp against string, returning the offset of the start of the match, or nil if the match failed. Sets $~ to the corresponding MatchData, or nil.
ORegexp.new( 'SIT' ) =~ "insensitive" #=> nil
ORegexp.new( 'SIT', :options => OPTION_IGNORECASE ) =~ "insensitive" #=> 5
# File 'ext/oregexp.c', line 722
static VALUE oregexp_match_op(VALUE self, VALUE str) {
VALUE args[] = {str};
VALUE ret = oregexp_match(1, args, self);
if(ret == Qnil)
return Qnil;
return INT2FIX(RMATCH(ret)->regs->beg[0]);
}
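The value returned by `=~` is simply the start of match group 0, which the C code reads as `RMATCH(ret)->regs->beg[0]`. A sketch with core `Regexp` standing in for ORegexp (the helper `match_offset` is hypothetical) shows the same relationship. One caveat: the C code reads a byte offset, whereas core Ruby's `begin(0)` counts characters, so the two agree only for single-byte text such as this example.

```ruby
# Offset of the match start, computed the way ORegexp#=~ derives it.
def match_offset(regexp, str)
  md = regexp.match(str)
  md && md.begin(0)
end

case_sensitive   = match_offset(/SIT/, "insensitive")  # => nil
case_insensitive = match_offset(/SIT/i, "insensitive") # => 5
same_as_op       = (/SIT/i =~ "insensitive") == case_insensitive # => true
```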
#casefold? ⇒ Boolean
call-seq:
rxp.casefold? => true or false
Returns the value of the case-insensitive flag.
# File 'lib/oniguruma.rb', line 198
def casefold?
  (@options[:options] & OPTION_IGNORECASE) > 0
end
#gsub(str, replacement) ⇒ Object
#gsub(str) {|match_data| ... } ⇒ Object
Returns a copy of str with all occurrences of the rxp pattern replaced with either replacement or the value of the block.
If a string is used as the replacement, the sequences \1, \2, and so on may be used to interpolate successive groups in the match.
In the block form, the current MatchData object is passed in as a parameter. The value returned by the block is substituted for the match on each call.
# File 'ext/oregexp.c', line 546
static VALUE oregexp_m_gsub(int argc, VALUE *argv, VALUE self) {
return oregexp_safe_gsub(self, argc, argv, 0, 0);
}
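Both replacement styles can be tried with core Ruby's `String#gsub`, used here as a stand-in. Note the difference in shape: ORegexp takes the string as an argument (`rxp.gsub(str, replacement)`), and its block receives the MatchData, whereas the core block receives the matched text and exposes the MatchData as `$~`.

```ruby
# Core String#gsub standing in for ORegexp#gsub.
text = "cat hat"
rx = /(.)at/

# String replacement: \1 interpolates group 1 of each match.
with_groups = text.gsub(rx, '<\1>')                 # => "<c> <h>"
# Block form: $~ holds the current MatchData, as ORegexp passes it directly.
from_block  = text.gsub(rx) { $~[1].upcase + "at" } # => "Cat Hat"
```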
#gsub!(str, replacement) ⇒ Object
#gsub!(str) {|match_data| ... } ⇒ Object
Performs the substitutions of ORegexp#gsub in place, returning str, or nil if no substitutions were performed.
# File 'ext/oregexp.c', line 579
static VALUE oregexp_m_gsub_bang(int argc, VALUE *argv, VALUE self) {
return oregexp_safe_gsub(self, argc, argv, 1, 0);
}
#inspect ⇒ Object
call-seq:
rxp.inspect => string
Returns a readable version of rxp.
ORegexp.new( 'cat', :options => OPTION_MULTILINE | OPTION_IGNORECASE ).inspect #=> /cat/im
ORegexp.new( 'cat', :options => OPTION_MULTILINE | OPTION_IGNORECASE ).to_s #=> (?im-x)cat
# File 'lib/oniguruma.rb', line 271
def inspect
  opt_str = ""
  opt_str += "i" if (@options[:options] & OPTION_IGNORECASE) > 0
  opt_str += "m" if (@options[:options] & OPTION_MULTILINE) > 0
  opt_str += "x" if (@options[:options] & OPTION_EXTEND) > 0
  "/" + @pattern + "/" + opt_str
end
#kcode ⇒ Object
call-seq:
rxp.kcode => int
Returns the character set code for the regexp.
# File 'lib/oniguruma.rb', line 206
def kcode
  @options[:encoding]
end
#match(str) ⇒ MatchData?
#match(str, begin) ⇒ MatchData?
Returns a MatchData object describing the match, or nil if there was no match. This is equivalent to retrieving the value of the special variable $~ following a normal match.
ORegexp.new('(.)(.)(.)').match("abc")[2] #=> "b"
The second form starts the search at the byte offset begin while still taking look-behinds and look-aheads into account, because the characters before begin remain part of the subject string. Only begin is accepted: the code that read an end offset is commented out, and passing more than two arguments raises ArgumentError.
ORegexp.new('1*2*').match('11221122', 4).offset(0) #=> [4, 8]
ORegexp.new('(?<=2)1*2*').match('11221122', 4).offset(0) #=> [4, 8]
Compare with matching a substring, where the look-behind can no longer see the "2" that was cut off:
ORegexp.new('(?<=2)1*2*').match('11221122'[4..-1]).offset(0) #=> [3, 4] (only "2" matches, not "1122")
# File 'ext/oregexp.c', line 204
static VALUE oregexp_match( int argc, VALUE * argv, VALUE self ) {
ORegexp *oregexp;
Data_Get_Struct( self, ORegexp, oregexp );
if ( argc == 0 || argc > 2) {
rb_raise(rb_eArgError, "wrong number of arguments (%d for 2)", argc);
/* rb_raise() does not return, so no statement is needed after it. */
}
VALUE string_str = StringValue( argv[0] );
UChar* str_ptr = RSTRING(string_str)->ptr;
int str_len = RSTRING(string_str)->len;
int begin = 0;
int end = str_len;
if (argc > 1 ) {
begin = NUM2INT( argv[1] );
}
// if (argc > 2) {
// end = NUM2INT( argv[2] );
// }
OnigRegion *region = onig_region_new();
int r = onig_search(oregexp->reg, str_ptr, str_ptr + str_len, str_ptr + begin, str_ptr + end, region, ONIG_OPTION_NONE);
rb_backref_set(Qnil);
if (r >= 0) {
VALUE matchData = oregexp_make_match_data( oregexp, region, string_str);
onig_region_free(region, 1 );
rb_backref_set(matchData);
rb_match_busy(matchData);
return matchData;
} else if (r == ONIG_MISMATCH) {
onig_region_free(region, 1 );
return Qnil;
} else {
onig_region_free(region, 1 );
char s[ONIG_MAX_ERROR_MESSAGE_LEN];
onig_error_code_to_str(s, r);
rb_raise(rb_eArgError, "Oniguruma Error: %s", s);
}
}
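The difference between searching from an offset and searching a substring can be reproduced with core Ruby, whose regexp engine (Onigmo) is derived from Oniguruma; `Regexp#match(str, pos)` is used here as a stand-in for the two-argument form. This is an illustration under that assumption, not a run of ORegexp itself.

```ruby
# Core Regexp#match(str, pos): the search starts at pos, but earlier
# characters stay visible to look-behinds.
full = '11221122'

m1 = /1*2*/.match(full, 4)
m2 = /(?<=2)1*2*/.match(full, 4)
# Cutting the string removes the "2" the look-behind needs at offset 0, so
# the first position that qualifies is offset 3 of the substring.
m3 = /(?<=2)1*2*/.match(full[4..-1])

p m1.offset(0) # => [4, 8]
p m2.offset(0) # => [4, 8]
p m3[0]        # => "2"
```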
#old_initialize ⇒ Object
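Internal alias of the original initializer; #initialize calls it with the normalized, frozen options hash. It is excluded from the generated documentation (:stopdoc:).
# File 'lib/oniguruma.rb', line 131
alias old_initialize initialize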
#options ⇒ Object
call-seq:
rxp.options => fixnum
Returns the set of bits corresponding to the options used when creating this ORegexp (see ORegexp::new for details). Note that additional bits may be set in the returned options: these are used internally by the regular expression code. These extra bits are ignored if the options are passed to ORegexp::new.
Oniguruma::OPTION_IGNORECASE #=> 1
Oniguruma::OPTION_EXTEND #=> 2
Oniguruma::OPTION_MULTILINE #=> 4
ORegexp.new(r.source, :options => Oniguruma::OPTION_EXTEND ).options #=> 2
# File 'lib/oniguruma.rb', line 225
def options
  @options[:options]
end
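Decoding the returned bit set is a matter of masking. Core Ruby's `Regexp` uses the same values (IGNORECASE = 1, EXTENDED = 2, MULTILINE = 4) as the Oniguruma constants listed above, so it can serve as a runnable stand-in; `FLAGS` and `flag_letters` are illustrative names, not part of the gem. Masking first also discards any extra internal bits of the kind the description mentions.

```ruby
# Letter for each documented option bit (same values as core Regexp's flags).
FLAGS = { "i" => Regexp::IGNORECASE, "x" => Regexp::EXTENDED,
          "m" => Regexp::MULTILINE }.freeze

def flag_letters(options)
  FLAGS.select { |_, bit| (options & bit) != 0 }.keys.join
end

r = Regexp.new('cat', Regexp::IGNORECASE | Regexp::MULTILINE)
known = r.options & (Regexp::IGNORECASE | Regexp::EXTENDED | Regexp::MULTILINE)
p known               # => 5
p flag_letters(known) # => "im"
```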
#scan(str) ⇒ Array?
#scan(str) {|match_data| ... } ⇒ Array?
Also known as: match_all
Both forms iterate through str, matching the pattern. For each match, a MatchData object is generated, passed to the block (in the block form), and added to the resulting array of MatchData objects.
If str does not match the pattern, nil is returned.
# File 'ext/oregexp.c', line 668
static VALUE oregexp_m_scan(VALUE self, VALUE str) {
OnigRegion * region = onig_region_new();
struct scan_packet call_args = {self, str, region};
return rb_ensure( oregexp_packed_scan, (VALUE)&call_args, oregexp_cleanup_region, (VALUE)region);
}
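The documented contract of `scan` (one MatchData per match, each passed to an optional block, nil when nothing matches) can be sketched with core `Regexp`. The helper `scan_match_data` is hypothetical and is not a transcription of the C code, part of which (`oregexp_packed_scan`) is not shown on this page.

```ruby
# Collect a MatchData per match, yielding each to the block if one is given;
# return nil when there were no matches.
def scan_match_data(regexp, str)
  results = []
  pos = 0
  while pos <= str.length && (md = regexp.match(str, pos))
    results << md
    yield md if block_given?
    # Step past the match; on an empty match advance one character so the
    # loop always makes progress.
    pos = md.end(0) > md.begin(0) ? md.end(0) : md.end(0) + 1
  end
  results.empty? ? nil : results
end

seen = []
all = scan_match_data(/\d+/, "a1b22c333") { |md| seen << md.begin(0) }
p all.map { |md| md[0] }       # => ["1", "22", "333"]
p seen                         # => [1, 3, 6]
p scan_match_data(/z/, "abc")  # => nil
```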
#source ⇒ Object
call-seq:
rxp.source => str
Returns the original string of the pattern.
ORegexp.new( 'ab+c', 'ix' ).source #=> "ab+c"
# File 'lib/oniguruma.rb', line 285
def source
  @pattern.freeze
end
#sub(str, replacement) ⇒ Object
#sub(str) {|match_data| ... } ⇒ Object
Returns a copy of str with the first occurrence of the rxp pattern replaced with either replacement or the value of the block.
If a string is used as the replacement, the sequences \1, \2, and so on may be used to interpolate successive groups in the match.
In the block form, the current MatchData object is passed in as a parameter, and the value returned by the block is substituted for the match.
# File 'ext/oregexp.c', line 566
static VALUE oregexp_m_sub(int argc, VALUE *argv, VALUE self) {
return oregexp_safe_gsub(self, argc, argv, 0, 1);
}
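The contrast between `sub` and `gsub`, and the nil result of the bang form, can be checked with core Ruby's `String#sub`, `#gsub` and `#sub!` as stand-ins. As before, ORegexp takes the string as an argument rather than being called on it.

```ruby
# Core String methods standing in for ORegexp#sub, #gsub and #sub!.
text = "cat hat"
rx = /(.)at/

first_only = text.sub(rx, '<\1>')  # => "<c> hat"
everywhere = text.gsub(rx, '<\1>') # => "<c> <h>"

u = "bat cat"
u.sub!(rx, 'X')                    # modifies u in place: "X cat"

s = "dog"
changed = s.sub!(rx, '<\1>')       # => nil: nothing matched, s is unchanged
```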
#sub!(str, replacement) ⇒ Object
#sub!(str) {|match_data| ... } ⇒ Object
Performs the substitutions of ORegexp#sub in place, returning str, or nil if no substitutions were performed.
# File 'ext/oregexp.c', line 592
static VALUE oregexp_m_sub_bang(int argc, VALUE *argv, VALUE self) {
return oregexp_safe_gsub(self, argc, argv, 1, 1);
}
#to_s ⇒ Object
call-seq:
rxp.to_s => str
Returns a string containing the regular expression and its options, using the (?on-off)pattern notation, for example (?ix-m)ab+c. This string can be fed back into ORegexp::new to produce a regular expression with the same semantics as the original. (However, ORegexp#== may not return true when comparing the two, because the source of the regular expression itself may differ, as the example shows.) ORegexp#inspect produces a generally more readable version of rxp.
r1 = ORegexp.new( 'ab+c', :options => OPTION_IGNORECASE | OPTION_EXTEND ) #=> /ab+c/ix
s1 = r1.to_s #=> "(?ix-m)ab+c"
r2 = ORegexp.new(s1) #=> /(?ix-m)ab+c/
r1 == r2 #=> false
r1.source #=> "ab+c"
r2.source #=> "(?ix-m)ab+c"
# File 'lib/oniguruma.rb', line 247
def to_s
  opt_str = "(?"
  opt_str += "i" if (@options[:options] & OPTION_IGNORECASE) > 0
  opt_str += "m" if (@options[:options] & OPTION_MULTILINE) > 0
  opt_str += "x" if (@options[:options] & OPTION_EXTEND) > 0
  unless opt_str == "(?imx"
    opt_str += "-"
    opt_str += "i" if (@options[:options] & OPTION_IGNORECASE) == 0
    opt_str += "m" if (@options[:options] & OPTION_MULTILINE) == 0
    opt_str += "x" if (@options[:options] & OPTION_EXTEND) == 0
  end
  opt_str += ")"
  opt_str + @pattern
end
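The flag-string logic of `inspect` and `to_s` can be re-expressed in plain Ruby to check its output and the round trip. The constant values are the ones documented under #options; `inspect_string` and `to_s_string` are illustrative helpers, not gem methods, and core `Regexp` (which accepts the inline `(?on-off)` prefix) stands in for `ORegexp::new` when the string is fed back.

```ruby
IGNORECASE = 1 # OPTION_IGNORECASE, as documented under #options
EXTEND     = 2 # OPTION_EXTEND
MULTILINE  = 4 # OPTION_MULTILINE

# Same letters, in the same i/m/x order, as ORegexp#inspect.
def inspect_string(pattern, options)
  flags = ""
  flags += "i" if (options & IGNORECASE) > 0
  flags += "m" if (options & MULTILINE) > 0
  flags += "x" if (options & EXTEND) > 0
  "/" + pattern + "/" + flags
end

# Same result as ORegexp#to_s: enabled letters, then "-" and disabled ones.
def to_s_string(pattern, options)
  letters = [["i", IGNORECASE], ["m", MULTILINE], ["x", EXTEND]]
  on  = letters.select { |_, bit| (options & bit) > 0 }.map(&:first).join
  off = letters.reject { |_, bit| (options & bit) > 0 }.map(&:first).join
  # As in the original, the "-" part is omitted only when all three are set.
  "(?" + on + (off.empty? ? "" : "-" + off) + ")" + pattern
end

s1 = to_s_string('ab+c', IGNORECASE | EXTEND)   # => "(?ix-m)ab+c"
i1 = inspect_string('ab+c', IGNORECASE | EXTEND) # => "/ab+c/ix"
# The inline-option prefix keeps case folding when the string is recompiled.
pos = Regexp.new(s1) =~ "xABBBC"                 # => 1
```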