Class: String
Overview
A String object holds and manipulates an arbitrary sequence of bytes, typically representing characters. String objects may be created using String::new or as literals.
Because of aliasing issues, users of strings should be aware of the methods that modify the contents of a String object. Typically, methods with names ending in “!” modify their receiver, while those without a “!” return a new String. However, there are exceptions, such as String#[]=.
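The aliasing concern described above can be illustrated with a short sketch. The variable names are made up for this example; `String.new` is used so the strings are mutable regardless of any frozen-string-literal setting.

```ruby
# Two variables that refer to the same String object (aliasing).
original = String.new("hello")
alias_ref = original

# A method without "!" returns a new String; the receiver is unchanged.
upper = original.upcase

# A method with "!" modifies the receiver, so alias_ref sees the change.
original.upcase!

# String#[]= has no "!" in its name but still modifies the receiver.
original[0] = "J"

puts upper      # HELLO
puts alias_ref  # JELLO
```

If an independent copy is needed before calling a modifying method, `dup` is the usual way to obtain one.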
Class Method Summary
-
.try_convert(obj) ⇒ String?
Tries to convert obj into a String using the to_str method.
Instance Method Summary
-
#%(arg) ⇒ String
Format—Uses str as a format specification, and returns the result of applying it to arg.
-
#*(integer) ⇒ String
Copy — Returns a new String containing integer copies of the receiver.
-
#+(other_str) ⇒ String
Concatenation—Returns a new String containing other_str concatenated to str.
-
#<< ⇒ Object
Append—Concatenates the given object to str.
-
#<=>(other_string) ⇒ -1, ...
Comparison—Returns -1, 0, +1 or nil depending on whether string is less than, equal to, or greater than other_string.
-
#== ⇒ Object
Equality.
-
#=== ⇒ Object
Equality.
-
#=~(obj) ⇒ Fixnum?
Match—If obj is a Regexp, uses it as a pattern to match against str, and returns the position where the match starts, or nil if there is no match.
-
#[] ⇒ Object
Element Reference — If passed a single index, returns a substring of one character at that index.
-
#[]= ⇒ Object
Element Assignment—Replaces some or all of the content of str.
-
#ascii_only? ⇒ Boolean
Returns true for a string which has only ASCII characters.
-
#b ⇒ String
Returns a copied string whose encoding is ASCII-8BIT.
-
#bytes ⇒ Array
Returns an array of bytes in str.
-
#bytesize ⇒ Integer
Returns the length of str in bytes.
-
#byteslice ⇒ Object
Byte Reference—If passed a single Fixnum, returns a substring of one byte at that position.
-
#capitalize ⇒ String
Returns a copy of str with the first character converted to uppercase and the remainder to lowercase.
-
#capitalize! ⇒ String?
Modifies str by converting the first character to uppercase and the remainder to lowercase.
-
#casecmp(other_str) ⇒ -1, ...
Case-insensitive version of String#<=>.
-
#center(width, padstr = ' ') ⇒ String
Centers str in width.
-
#chars ⇒ Array
Returns an array of characters in str.
-
#chomp(separator = $/) ⇒ String
Returns a new String with the given record separator removed from the end of str (if present).
-
#chomp!(separator = $/) ⇒ String?
Modifies str in place as described for String#chomp, returning str, or nil if no modifications were made.
-
#chop ⇒ String
Returns a new String with the last character removed.
-
#chop! ⇒ String?
Processes str as for String#chop, returning str, or nil if str is the empty string.
-
#chr ⇒ String
Returns a one-character string at the beginning of the string.
-
#clear ⇒ String
Makes string empty.
-
#codepoints ⇒ Array
Returns an array of the Integer ordinals of the characters in str.
-
#concat ⇒ Object
Append—Concatenates the given object to str.
-
#count([other_str]) ⇒ Fixnum
Each other_str parameter defines a set of characters to count.
-
#crypt(salt_str) ⇒ String
Applies a one-way cryptographic hash to str by invoking the standard library function crypt(3) with the given salt string.
-
#delete([other_str]) ⇒ String
Returns a copy of str with all characters in the intersection of its arguments deleted.
-
#delete!([other_str]) ⇒ String?
Performs a delete operation in place, returning str, or nil if str was not modified.
-
#downcase ⇒ String
Returns a copy of str with all uppercase letters replaced with their lowercase counterparts.
-
#downcase! ⇒ String?
Downcases the contents of str, returning nil if no changes were made.
-
#dump ⇒ String
Produces a version of str with all non-printing characters replaced by \nnn notation and all special characters escaped.
-
#each_byte ⇒ Object
Passes each byte in str to the given block, or returns an enumerator if no block is given.
-
#each_char ⇒ Object
Passes each character in str to the given block, or returns an enumerator if no block is given.
-
#each_codepoint ⇒ Object
Passes the Integer ordinal of each character in str (also known as a codepoint when applied to Unicode strings) to the given block.
-
#each_line ⇒ Object
Splits str using the supplied parameter as the record separator ($/ by default), passing each substring in turn to the supplied block.
-
#empty? ⇒ Boolean
Returns true if str has a length of zero.
-
#encode ⇒ Object
The first form returns a copy of str transcoded to encoding encoding.
-
#encode! ⇒ Object
The first form transcodes the contents of str from str.encoding to encoding.
-
#encoding ⇒ Encoding
Returns the Encoding object that represents the encoding of obj.
-
#end_with?([suffixes]) ⇒ Boolean
Returns true if str ends with one of the suffixes given.
-
#eql?(other) ⇒ Boolean
Two strings are equal if they have the same length and content.
-
#force_encoding(encoding) ⇒ String
Changes the encoding to encoding and returns self.
-
#freeze ⇒ Object
-
#getbyte(index) ⇒ 0 .. 255
Returns the indexth byte as an integer.
-
#gsub ⇒ Object
Returns a copy of str with all occurrences of pattern replaced by the second argument.
-
#gsub! ⇒ Object
Performs the substitutions of String#gsub in place, returning str, or nil if no substitutions were performed.
-
#hash ⇒ Fixnum
Return a hash based on the string’s length and content.
-
#hex ⇒ Integer
Treats leading characters from str as a string of hexadecimal digits (with an optional sign and an optional 0x) and returns the corresponding number.
-
#include?(other_str) ⇒ Boolean
Returns true if str contains the given string or character.
-
#index ⇒ Object
Returns the index of the first occurrence of the given substring or pattern (regexp) in str.
-
#new(str = "") ⇒ String
constructor
Returns a new string object containing a copy of str.
-
#replace(other_str) ⇒ String
Replaces the contents and taintedness of str with the corresponding values in other_str.
-
#insert(index, other_str) ⇒ String
Inserts other_str before the character at the given index, modifying str.
-
#inspect ⇒ String
Returns a printable version of str, surrounded by quote marks, with special characters escaped.
-
#intern ⇒ Object
Returns the Symbol corresponding to str, creating the symbol if it did not previously exist.
-
#length ⇒ Object
Returns the character length of str.
-
#lines(separator = $/) ⇒ Array
Returns an array of lines in str split using the supplied record separator ($/ by default).
-
#ljust(integer, padstr = ' ') ⇒ String
If integer is greater than the length of str, returns a new String of length integer with str left justified and padded with padstr; otherwise, returns str.
-
#lstrip ⇒ String
Returns a copy of str with leading whitespace removed.
-
#lstrip! ⇒ self?
Removes leading whitespace from str, returning nil if no change was made.
-
#match ⇒ Object
Converts pattern to a Regexp (if it isn’t already one), then invokes its match method on str.
-
#next ⇒ Object
Returns the successor to str.
-
#next! ⇒ Object
Equivalent to String#succ, but modifies the receiver in place.
-
#oct ⇒ Integer
Treats leading characters of str as a string of octal digits (with an optional sign) and returns the corresponding number.
-
#ord ⇒ Integer
Return the Integer ordinal of a one-character string.
-
#partition ⇒ Object
Searches sep or pattern (regexp) in the string and returns the part before it, the match, and the part after it.
-
#prepend(other_str) ⇒ String
Prepend—Prepend the given string to str.
-
#replace(other_str) ⇒ String
Replaces the contents and taintedness of str with the corresponding values in other_str.
-
#reverse ⇒ String
Returns a new string with the characters from str in reverse order.
-
#reverse! ⇒ String
Reverses str in place.
-
#rindex ⇒ Object
Returns the index of the last occurrence of the given substring or pattern (regexp) in str.
-
#rjust(integer, padstr = ' ') ⇒ String
If integer is greater than the length of str, returns a new String of length integer with str right justified and padded with padstr; otherwise, returns str.
-
#rpartition ⇒ Object
Searches sep or pattern (regexp) in the string from the end of the string, and returns the part before it, the match, and the part after it.
-
#rstrip ⇒ String
Returns a copy of str with trailing whitespace removed.
-
#rstrip! ⇒ self?
Removes trailing whitespace from str, returning nil if no change was made.
-
#scan ⇒ Object
Both forms iterate through str, matching the pattern (which may be a Regexp or a String).
-
#scrub ⇒ Object
If the string contains invalid byte sequences, replaces the invalid bytes with the given replacement character; otherwise returns self.
-
#scrub! ⇒ Object
If the string contains invalid byte sequences, replaces the invalid bytes with the given replacement character; otherwise returns self.
-
#setbyte(index, integer) ⇒ Integer
Modifies the indexth byte as integer.
-
#size ⇒ Object
Returns the character length of str.
-
#slice ⇒ Object
Element Reference — If passed a single index, returns a substring of one character at that index.
-
#slice! ⇒ Object
Deletes the specified portion from str, and returns the portion deleted.
-
#split(pattern = $;, [limit]) ⇒ Array
Divides str into substrings based on a delimiter, returning an array of these substrings.
-
#squeeze([other_str]) ⇒ String
Builds a set of characters from the other_str parameter(s) using the procedure described for String#count.
-
#squeeze!([other_str]) ⇒ String?
Squeezes str in place, returning either str, or nil if no changes were made.
-
#start_with?([prefixes]) ⇒ Boolean
Returns true if str starts with one of the prefixes given.
-
#strip ⇒ String
Returns a copy of str with leading and trailing whitespace removed.
-
#strip! ⇒ String?
Removes leading and trailing whitespace from str.
-
#sub ⇒ Object
Returns a copy of str with the first occurrence of pattern replaced by the second argument.
-
#sub! ⇒ Object
Performs the same substitution as String#sub in-place.
-
#succ ⇒ Object
Returns the successor to str.
-
#succ! ⇒ Object
Equivalent to String#succ, but modifies the receiver in place.
-
#sum(n = 16) ⇒ Integer
Returns a basic n-bit checksum of the characters in str, where n is the optional Fixnum parameter, defaulting to 16.
-
#swapcase ⇒ String
Returns a copy of str with uppercase alphabetic characters converted to lowercase and lowercase characters converted to uppercase.
-
#swapcase! ⇒ String?
Equivalent to String#swapcase, but modifies the receiver in place, returning str, or nil if no changes were made.
-
#to_c ⇒ Object
Returns a complex which denotes the string form.
-
#to_f ⇒ Float
Returns the result of interpreting leading characters in str as a floating point number.
-
#to_i(base = 10) ⇒ Integer
Returns the result of interpreting leading characters in str as an integer base base (between 2 and 36).
-
#to_r ⇒ Object
Returns a rational which denotes the string form.
-
#to_s ⇒ Object
Returns the receiver.
-
#to_str ⇒ Object
Returns the receiver.
-
#to_sym ⇒ Object
Returns the Symbol corresponding to str, creating the symbol if it did not previously exist.
-
#tr(from_str, to_str) ⇒ String
Returns a copy of str with the characters in from_str replaced by the corresponding characters in to_str.
-
#tr!(from_str, to_str) ⇒ String?
Translates str in place, using the same rules as String#tr.
-
#tr_s(from_str, to_str) ⇒ String
Processes a copy of str as described under String#tr, then removes duplicate characters in regions that were affected by the translation.
-
#tr_s!(from_str, to_str) ⇒ String?
Performs String#tr_s processing on str in place, returning str, or nil if no changes were made.
-
#unpack(format) ⇒ Array
Decodes str (which may contain binary data) according to the format string, returning an array of each value extracted.
-
#upcase ⇒ String
Returns a copy of str with all lowercase letters replaced with their uppercase counterparts.
-
#upcase! ⇒ String?
Upcases the contents of str, returning nil if no changes were made.
-
#upto ⇒ Object
Iterates through successive values, starting at str and ending at other_str inclusive, passing each value in turn to the block.
-
#valid_encoding? ⇒ Boolean
Returns true for a string which is encoded correctly.
Methods included from Comparable
Constructor Details
#new(str = "") ⇒ String
Returns a new string object containing a copy of str.
# File 'string.c', line 1084
static VALUE
rb_str_init(int argc, VALUE *argv, VALUE str)
{
    VALUE orig;

    if (argc > 0 && rb_scan_args(argc, argv, "01", &orig) == 1)
        rb_str_replace(str, orig);
    return str;
}
Class Method Details
.try_convert(obj) ⇒ String?
Tries to convert obj into a String using the to_str method. Returns the converted string, or nil if obj cannot be converted for any reason.
String.try_convert("str") #=> "str"
String.try_convert(/re/) #=> nil
# File 'string.c', line 1696
static VALUE
rb_str_s_try_convert(VALUE dummy, VALUE str)
{
    return rb_check_string_type(str);
}
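As a rough illustration of the to_str conversion described above, the sketch below defines a hypothetical `Label` class (not part of Ruby) that opts in to implicit string conversion, and compares it with an object that has no to_str method.

```ruby
# Hypothetical class used only for illustration: defining to_str tells
# Ruby that instances may be treated as strings.
class Label
  def initialize(text)
    @text = text
  end

  def to_str
    @text
  end
end

converted   = String.try_convert(Label.new("tag"))  # obtained via Label#to_str
unconverted = String.try_convert(42)                 # Integer has no to_str

p converted    # "tag"
p unconverted  # nil
```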
Instance Method Details
#%(arg) ⇒ String
Format—Uses str as a format specification, and returns the result of applying it to arg. If the format specification contains more than one substitution, then arg must be an Array or Hash containing the values to be substituted. See Kernel::sprintf for details of the format string.
"%05d" % 123 #=> "00123"
"%-5s: %08x" % [ "ID", self.object_id ] #=> "ID   : 200e14d6"
"foo = %{foo}" % { :foo => 'bar' } #=> "foo = bar"
# File 'string.c', line 1431
static VALUE
rb_str_format_m(VALUE str, VALUE arg)
{
    volatile VALUE tmp = rb_check_array_type(arg);

    if (!NIL_P(tmp)) {
        return rb_str_format(RARRAY_LENINT(tmp), RARRAY_CONST_PTR(tmp), str);
    }
    return rb_str_format(1, &arg, str);
}
#*(integer) ⇒ String
Copy — Returns a new String containing integer copies of the receiver. integer must be greater than or equal to 0.
"Ho! " * 3 #=> "Ho! Ho! Ho! "
"Ho! " * 0 #=> ""
# File 'string.c', line 1383
VALUE
rb_str_times(VALUE str, VALUE times)
{
    VALUE str2;
    long n, len;
    char *ptr2;

    len = NUM2LONG(times);
    if (len < 0) {
        rb_raise(rb_eArgError, "negative argument");
    }
    if (len && LONG_MAX/len < RSTRING_LEN(str)) {
        rb_raise(rb_eArgError, "argument too big");
    }
    str2 = rb_str_new5(str, 0, len *= RSTRING_LEN(str));
    ptr2 = RSTRING_PTR(str2);
    if (len) {
        n = RSTRING_LEN(str);
        memcpy(ptr2, RSTRING_PTR(str), n);
        while (n <= len/2) {
            memcpy(ptr2 + n, ptr2, n);
            n *= 2;
        }
        memcpy(ptr2 + n, ptr2, len-n);
    }
    ptr2[RSTRING_LEN(str2)] = '\0';
    OBJ_INFECT(str2, str);
    rb_enc_cr_str_copy_for_substr(str2, str);

    return str2;
}
#+(other_str) ⇒ String
Concatenation—Returns a new String containing other_str concatenated to str.
"Hello from " + self.to_s #=> "Hello from main"
# File 'string.c', line 1351
VALUE
rb_str_plus(VALUE str1, VALUE str2)
{
    VALUE str3;
    rb_encoding *enc;

    StringValue(str2);
    enc = rb_enc_check(str1, str2);
    str3 = rb_str_new(0, RSTRING_LEN(str1)+RSTRING_LEN(str2));
    memcpy(RSTRING_PTR(str3), RSTRING_PTR(str1), RSTRING_LEN(str1));
    memcpy(RSTRING_PTR(str3) + RSTRING_LEN(str1),
           RSTRING_PTR(str2), RSTRING_LEN(str2));
    RSTRING_PTR(str3)[RSTRING_LEN(str3)] = '\0';

    if (OBJ_TAINTED(str1) || OBJ_TAINTED(str2))
        OBJ_TAINT(str3);
    ENCODING_CODERANGE_SET(str3, rb_enc_to_index(enc),
                           ENC_CODERANGE_AND(ENC_CODERANGE(str1), ENC_CODERANGE(str2)));
    return str3;
}
#<<(integer) ⇒ String
#concat(integer) ⇒ String
#<<(obj) ⇒ String
#concat(obj) ⇒ String
Append—Concatenates the given object to str. If the object is an Integer, it is considered as a codepoint, and is converted to a character before concatenation.
a = "hello "
a << "world" #=> "hello world"
a.concat(33) #=> "hello world!"
# File 'string.c', line 2339
VALUE
rb_str_concat(VALUE str1, VALUE str2)
{
    unsigned int code;
    rb_encoding *enc = STR_ENC_GET(str1);

    if (FIXNUM_P(str2) || RB_TYPE_P(str2, T_BIGNUM)) {
        if (rb_num_to_uint(str2, &code) == 0) {
        }
        else if (FIXNUM_P(str2)) {
            rb_raise(rb_eRangeError, "%ld out of char range", FIX2LONG(str2));
        }
        else {
            rb_raise(rb_eRangeError, "bignum out of char range");
        }
    }
    else {
        return rb_str_append(str1, str2);
    }

    if (enc == rb_usascii_encoding()) {
        /* US-ASCII automatically extended to ASCII-8BIT */
        char buf[1];
        buf[0] = (char)code;
        if (code > 0xFF) {
            rb_raise(rb_eRangeError, "%u out of char range", code);
        }
        rb_str_cat(str1, buf, 1);
        if (code > 127) {
            rb_enc_associate(str1, rb_ascii8bit_encoding());
            ENC_CODERANGE_SET(str1, ENC_CODERANGE_VALID);
        }
    }
    else {
        long pos = RSTRING_LEN(str1);
        int cr = ENC_CODERANGE(str1);
        int len;
        char *buf;

        switch (len = rb_enc_codelen(code, enc)) {
          case ONIGERR_INVALID_CODE_POINT_VALUE:
            rb_raise(rb_eRangeError, "invalid codepoint 0x%X in %s", code, rb_enc_name(enc));
            break;
          case ONIGERR_TOO_BIG_WIDE_CHAR_VALUE:
          case 0:
            rb_raise(rb_eRangeError, "%u out of char range", code);
            break;
        }
        buf = ALLOCA_N(char, len + 1);
        rb_enc_mbcput(code, buf, enc);
        if (rb_enc_precise_mbclen(buf, buf + len + 1, enc) != len) {
            rb_raise(rb_eRangeError, "invalid codepoint 0x%X in %s", code, rb_enc_name(enc));
        }
        rb_str_resize(str1, pos+len);
        memcpy(RSTRING_PTR(str1) + pos, buf, len);
        if (cr == ENC_CODERANGE_7BIT && code > 127)
            cr = ENC_CODERANGE_VALID;
        ENC_CODERANGE_SET(str1, cr);
    }
    return str1;
}
#<=>(other_string) ⇒ -1, ...
Comparison—Returns -1, 0, +1 or nil depending on whether string is less than, equal to, or greater than other_string.
nil is returned if the two values are incomparable.
If the strings are of different lengths, and the strings are equal when compared up to the shortest length, then the longer string is considered greater than the shorter one.
<=> is the basis for the methods <, <=, >, >=, and between?, included from module Comparable. The method String#== does not use Comparable#==.
"abcdef" <=> "abcde" #=> 1
"abcdef" <=> "abcdef" #=> 0
"abcdef" <=> "abcdefg" #=> -1
"abcdef" <=> "ABCDEF" #=> 1
# File 'string.c', line 2595
static VALUE
rb_str_cmp_m(VALUE str1, VALUE str2)
{
    int result;

    if (!RB_TYPE_P(str2, T_STRING)) {
        VALUE tmp = rb_check_funcall(str2, rb_intern("to_str"), 0, 0);
        if (RB_TYPE_P(tmp, T_STRING)) {
            result = rb_str_cmp(str1, tmp);
        }
        else {
            return rb_invcmp(str1, str2);
        }
    }
    else {
        result = rb_str_cmp(str1, str2);
    }
    return INT2FIX(result);
}
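A brief sketch of how <=> underpins the Comparable methods mentioned above; the variable names are illustrative only.

```ruby
a = "apple"
b = "banana"

ordering     = (a <=> b)                 # -1, because "apple" sorts before "banana"
less_than    = a < b                     # Comparable#< is derived from <=>
in_range     = "b".between?("a", "c")    # Comparable#between? is derived from <=>
incomparable = ("abc" <=> 42)            # nil: a String and an Integer cannot be ordered

p ordering, less_than, in_range, incomparable
```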
#==(obj) ⇒ Boolean
#===(obj) ⇒ Boolean
Equality
Returns whether str == obj, similar to Object#==.
If obj is not an instance of String but responds to to_str, then the comparison is delegated to obj and evaluated as obj == str.
Otherwise, returns similarly to String#eql?, comparing length and content.
# File 'string.c', line 2542
VALUE
rb_str_equal(VALUE str1, VALUE str2)
{
    if (str1 == str2) return Qtrue;
    if (!RB_TYPE_P(str2, T_STRING)) {
        if (!rb_respond_to(str2, rb_intern("to_str"))) {
            return Qfalse;
        }
        return rb_equal(str2, str1);
    }
    return str_eql(str1, str2);
}
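The to_str branch of the source above can be exercised with a small sketch. `Wrapper` is a hypothetical class invented for this example; it defines both to_str and its own ==, which is the method that ends up deciding the result.

```ruby
# Hypothetical wrapper class, for illustration only.
class Wrapper
  def initialize(text)
    @text = text
  end

  def to_str
    @text
  end

  # Because Wrapper responds to to_str, String#== hands the comparison
  # over to this method with the string as the argument.
  def ==(other)
    @text == other.to_str
  end
end

w = Wrapper.new("abc")

delegated = ("abc" == w)      # Wrapper#== decides the result
mismatch  = ("xyz" == w)      # Wrapper#== decides the result
no_to_str = ("abc" == :abc)   # Symbol has no to_str, so this is false

p delegated, mismatch, no_to_str
```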
#=~(obj) ⇒ Fixnum?
Match—If obj is a Regexp, uses it as a pattern to match against str, and returns the position where the match starts, or nil if there is no match. Otherwise, invokes obj.=~, passing str as an argument. The default =~ in Object returns nil.
Note: str =~ regexp is not the same as regexp =~ str. Strings captured from named capture groups are assigned to local variables only in the second case.
"cat o' 9 tails" =~ /\d/ #=> 7
"cat o' 9 tails" =~ 9 #=> nil
# File 'string.c', line 2989
static VALUE
rb_str_match(VALUE x, VALUE y)
{
    if (SPECIAL_CONST_P(y)) goto generic;
    switch (BUILTIN_TYPE(y)) {
      case T_STRING:
        rb_raise(rb_eTypeError, "type mismatch: String given");

      case T_REGEXP:
        return rb_reg_match(y, x);

      generic:
      default:
        return rb_funcall(y, rb_intern("=~"), 1, x);
    }
}
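The note about named capture groups can be checked with a short sketch. The group names `year` and `month` and the sample text are arbitrary choices for this example; `binding.local_variable_defined?` is used so the undefined variable is never referenced directly.

```ruby
text = "released in 2014"

# Regexp literal on the left: the named group is assigned to a local variable.
pos_left = (/(?<year>\d{4})/ =~ text)

# String on the left: the match position is returned, but no local is created.
pos_right = (text =~ /(?<month>\d{4})/)

year_defined  = binding.local_variable_defined?(:year)
month_defined = binding.local_variable_defined?(:month)

p pos_left, pos_right   # both are the index where "2014" starts
p year                  # "2014"
p year_defined, month_defined
```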
#[](index) ⇒ String?
#[](start, length) ⇒ String?
#[](range) ⇒ String?
#[](regexp) ⇒ String?
#[](regexp, capture) ⇒ String?
#[](match_str) ⇒ String?
#slice(index) ⇒ String?
#slice(start, length) ⇒ String?
#slice(range) ⇒ String?
#slice(regexp) ⇒ String?
#slice(regexp, capture) ⇒ String?
#slice(match_str) ⇒ String?
Element Reference — If passed a single index, returns a substring of one character at that index. If passed a start index and a length, returns a substring containing length characters starting at the start index. If passed a range, its beginning and end are interpreted as offsets delimiting the substring to be returned.
In these three cases, if an index is negative, it is counted from the end of the string. For the start and range cases, the starting index may fall just before a character or be equal to the string’s size; in the latter case an empty string is returned.
Returns nil if the initial index falls outside the string or the length is negative.
If a Regexp is supplied, the matching portion of the string is returned. If a capture, which may be a capture group index or name, follows the regular expression, that component of the MatchData is returned instead.
If a match_str is given, that string is returned if it occurs in the string.
Returns nil if the regular expression does not match or the match string cannot be found.
a = "hello there"
a[1] #=> "e"
a[2, 3] #=> "llo"
a[2..3] #=> "ll"
a[-3, 2] #=> "er"
a[7..-2] #=> "her"
a[-4..-2] #=> "her"
a[-2..-4] #=> ""
a[11, 0] #=> ""
a[11] #=> nil
a[12, 0] #=> nil
a[12..-1] #=> nil
a[/[aeiou](.)\1/] #=> "ell"
a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil
a[/(?<vowel>[aeiou])(?<non_vowel>[^aeiou])/, "non_vowel"] #=> "l"
a[/(?<vowel>[aeiou])(?<non_vowel>[^aeiou])/, "vowel"] #=> "e"
a["lo"] #=> "lo"
a["bye"] #=> nil
# File 'string.c', line 3621
static VALUE
rb_str_aref_m(int argc, VALUE *argv, VALUE str)
{
    if (argc == 2) {
        if (RB_TYPE_P(argv[0], T_REGEXP)) {
            return rb_str_subpat(str, argv[0], argv[1]);
        }
        return rb_str_substr(str, NUM2LONG(argv[0]), NUM2LONG(argv[1]));
    }
    rb_check_arity(argc, 1, 2);
    return rb_str_aref(str, argv[0]);
}
#[]=(fixnum) ⇒ Object
#[]=(fixnum, fixnum) ⇒ Object
#[]=(range) ⇒ Object
#[]=(regexp) ⇒ Object
#[]=(regexp, fixnum) ⇒ Object
#[]=(regexp, name) ⇒ Object
#[]=(other_str) ⇒ Object
Element Assignment—Replaces some or all of the content of str. The portion of the string affected is determined using the same criteria as String#[]. If the replacement string is not the same length as the text it is replacing, the string will be adjusted accordingly. If the regular expression or string used as the index doesn’t match a position in the string, IndexError is raised. If the regular expression form is used, the optional second Fixnum allows you to specify which portion of the match to replace (effectively using the MatchData indexing rules). The forms that take a Fixnum will raise an IndexError if the value is out of range; the Range form will raise a RangeError, and the Regexp and String forms will raise an IndexError on negative match.
# File 'string.c', line 3849
static VALUE
rb_str_aset_m(int argc, VALUE *argv, VALUE str)
{
    if (argc == 3) {
        if (RB_TYPE_P(argv[0], T_REGEXP)) {
            rb_str_subpat_set(str, argv[0], argv[1], argv[2]);
        }
        else {
            rb_str_splice(str, NUM2LONG(argv[0]), NUM2LONG(argv[1]), argv[2]);
        }
        return argv[2];
    }
    rb_check_arity(argc, 2, 3);
    return rb_str_aset(str, argv[0], argv[1]);
}
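This entry has no usage examples, so here is a minimal sketch of several of the forms described above. The sample strings are arbitrary; `String.new` keeps the receiver mutable regardless of frozen-string-literal settings.

```ruby
s = String.new("hello there")

s[0] = "J"           # single index:   "Jello there"
s[0, 5] = "Hi"       # start, length:  "Hi there" (the string shrinks)
s[/th/] = "wh"       # regexp:         "Hi where"
s["where"] = "all"   # matching string: "Hi all"
s[3..-1] = "you"     # range:          "Hi you"

# A string index that does not occur in the receiver raises IndexError.
error_class = begin
  s["missing"] = "x"
  nil
rescue IndexError => e
  e.class
end

p s            # "Hi you"
p error_class  # IndexError
```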
#ascii_only? ⇒ Boolean
Returns true for a string which has only ASCII characters.
"abc".force_encoding("UTF-8").ascii_only? #=> true
"abc\u{6666}".force_encoding("UTF-8").ascii_only? #=> false
# File 'string.c', line 7949
static VALUE
rb_str_is_ascii_only_p(VALUE str)
{
    int cr = rb_enc_str_coderange(str);

    return cr == ENC_CODERANGE_7BIT ? Qtrue : Qfalse;
}
#b ⇒ String
Returns a copied string whose encoding is ASCII-8BIT.
# File 'string.c', line 7910
static VALUE
rb_str_b(VALUE str)
{
    VALUE str2 = str_alloc(rb_cString);
    str_replace_shared_without_enc(str2, str);
    OBJ_INFECT(str2, str);
    ENC_CODERANGE_CLEAR(str2);
    return str2;
}
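A small sketch of what the copy looks like in practice. The sample text is arbitrary; `Encoding::BINARY` is used here as the constant for the ASCII-8BIT encoding.

```ruby
utf8   = "caf\u00e9"   # the \u escape makes this a UTF-8 string
binary = utf8.b        # copy with the same bytes, tagged as ASCII-8BIT

same_bytes    = (binary.bytes == utf8.bytes)
binary_tagged = (binary.encoding == Encoding::BINARY)

p utf8.length     # 4 characters
p binary.length   # 5, because each byte now counts as one character
p utf8.encoding   # the receiver keeps its original encoding
```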
#bytes ⇒ Array
Returns an array of bytes in str. This is a shorthand for str.each_byte.to_a.
If a block is given, which is a deprecated form, works the same as each_byte.
# File 'string.c', line 6668
static VALUE
rb_str_bytes(VALUE str)
{
    return rb_str_enumerate_bytes(str, 1);
}
#bytesize ⇒ Integer
Returns the length of str in bytes.
"\x80\u3042".bytesize #=> 4
"hello".bytesize #=> 5
# File 'string.c', line 1316
static VALUE
rb_str_bytesize(VALUE str)
{
    return LONG2NUM(RSTRING_LEN(str));
}
#byteslice(fixnum) ⇒ String?
#byteslice(fixnum, fixnum) ⇒ String?
#byteslice(range) ⇒ String?
Byte Reference—If passed a single Fixnum, returns a substring of one byte at that position. If passed two Fixnum objects, returns a substring starting at the offset given by the first, and a length given by the second. If given a Range, a substring containing bytes at offsets given by the range is returned. In all three cases, if an offset is negative, it is counted from the end of str. Returns nil if the initial offset falls outside the string, the length is negative, or the beginning of the range is greater than the end. The resulting string keeps the original encoding.
"hello".byteslice(1) #=> "e"
"hello".byteslice(-1) #=> "o"
"hello".byteslice(1, 2) #=> "el"
"\x80\u3042".byteslice(1, 3) #=> "\u3042"
"\x03\u3042\xff".byteslice(1..3) #=> "\u3042"
# File 'string.c', line 4519
static VALUE
rb_str_byteslice(int argc, VALUE *argv, VALUE str)
{
    if (argc == 2) {
        return str_byte_substr(str, NUM2LONG(argv[0]), NUM2LONG(argv[1]));
    }
    rb_check_arity(argc, 1, 2);
    return str_byte_aref(str, argv[0]);
}
#capitalize ⇒ String
Returns a copy of str with the first character converted to uppercase and the remainder to lowercase. Note: case conversion is effective only in ASCII region.
"hello".capitalize #=> "Hello"
"HELLO".capitalize #=> "Hello"
"123ABC".capitalize #=> "123abc"
# File 'string.c', line 5267
static VALUE
rb_str_capitalize(VALUE str)
{
    str = rb_str_dup(str);
    rb_str_capitalize_bang(str);
    return str;
}
#capitalize! ⇒ String?
Modifies str by converting the first character to uppercase and the remainder to lowercase. Returns nil if no changes are made. Note: case conversion is effective only in ASCII region.
a = "hello"
a.capitalize! #=> "Hello"
a #=> "Hello"
a.capitalize! #=> nil
# File 'string.c', line 5219
static VALUE
rb_str_capitalize_bang(VALUE str)
{
    rb_encoding *enc;
    char *s, *send;
    int modify = 0;
    unsigned int c;
    int n;

    str_modify_keep_cr(str);
    enc = STR_ENC_GET(str);
    rb_str_check_dummy_enc(enc);
    if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return Qnil;
    s = RSTRING_PTR(str); send = RSTRING_END(str);

    c = rb_enc_codepoint_len(s, send, &n, enc);
    if (rb_enc_islower(c, enc)) {
        rb_enc_mbcput(rb_enc_toupper(c, enc), s, enc);
        modify = 1;
    }
    s += n;
    while (s < send) {
        c = rb_enc_codepoint_len(s, send, &n, enc);
        if (rb_enc_isupper(c, enc)) {
            rb_enc_mbcput(rb_enc_tolower(c, enc), s, enc);
            modify = 1;
        }
        s += n;
    }

    if (modify) return str;
    return Qnil;
}
#casecmp(other_str) ⇒ -1, ...
Case-insensitive version of String#<=>.
"abcdef".casecmp("abcde") #=> 1
"aBcDeF".casecmp("abcdef") #=> 0
"abcdef".casecmp("abcdefg") #=> -1
"abcdef".casecmp("ABCDEF") #=> 0
# File 'string.c', line 2627
static VALUE
rb_str_casecmp(VALUE str1, VALUE str2)
{
    long len;
    rb_encoding *enc;
    char *p1, *p1end, *p2, *p2end;

    StringValue(str2);
    enc = rb_enc_compatible(str1, str2);
    if (!enc) {
        return Qnil;
    }

    p1 = RSTRING_PTR(str1); p1end = RSTRING_END(str1);
    p2 = RSTRING_PTR(str2); p2end = RSTRING_END(str2);
    if (single_byte_optimizable(str1) && single_byte_optimizable(str2)) {
        while (p1 < p1end && p2 < p2end) {
            if (*p1 != *p2) {
                unsigned int c1 = TOUPPER(*p1 & 0xff);
                unsigned int c2 = TOUPPER(*p2 & 0xff);
                if (c1 != c2)
                    return INT2FIX(c1 < c2 ? -1 : 1);
            }
            p1++;
            p2++;
        }
    }
    else {
        while (p1 < p1end && p2 < p2end) {
            int l1, c1 = rb_enc_ascget(p1, p1end, &l1, enc);
            int l2, c2 = rb_enc_ascget(p2, p2end, &l2, enc);

            if (0 <= c1 && 0 <= c2) {
                c1 = TOUPPER(c1);
                c2 = TOUPPER(c2);
                if (c1 != c2)
                    return INT2FIX(c1 < c2 ? -1 : 1);
            }
            else {
                int r;
                l1 = rb_enc_mbclen(p1, p1end, enc);
                l2 = rb_enc_mbclen(p2, p2end, enc);
                len = l1 < l2 ? l1 : l2;
                r = memcmp(p1, p2, len);
                if (r != 0)
                    return INT2FIX(r < 0 ? -1 : 1);
                if (l1 != l2)
                    return INT2FIX(l1 < l2 ? -1 : 1);
            }
            p1 += l1;
            p2 += l2;
        }
    }
    if (RSTRING_LEN(str1) == RSTRING_LEN(str2)) return INT2FIX(0);
    if (RSTRING_LEN(str1) > RSTRING_LEN(str2)) return INT2FIX(1);
    return INT2FIX(-1);
}
#center(width, padstr = ' ') ⇒ String
Centers str in width. If width is greater than the length of str, returns a new String of length width with str centered and padded with padstr; otherwise, returns str.
"hello".center(4) #=> "hello"
"hello".center(20) #=> "       hello        "
"hello".center(20, '123') #=> "1231231hello12312312"
# File 'string.c', line 7710
static VALUE
rb_str_center(int argc, VALUE *argv, VALUE str)
{
    return rb_str_justify(argc, argv, str, 'c');
}
#chars ⇒ Array
Returns an array of characters in str. This is a shorthand for str.each_char.to_a.
If a block is given, which is a deprecated form, works the same as each_char.
# File 'string.c', line 6774
static VALUE
rb_str_chars(VALUE str)
{
    return rb_str_enumerate_chars(str, 1);
}
#chomp(separator = $/) ⇒ String
Returns a new String with the given record separator removed from the end of str (if present). If $/ has not been changed from the default Ruby record separator, then chomp also removes carriage return characters (that is, it will remove \n, \r, and \r\n). If $/ is an empty string, it will remove all trailing newlines from the string.
"hello".chomp #=> "hello"
"hello\n".chomp #=> "hello"
"hello\r\n".chomp #=> "hello"
"hello\n\r".chomp #=> "hello\n"
"hello\r".chomp #=> "hello"
"hello \n there".chomp #=> "hello \n there"
"hello".chomp("llo") #=> "he"
"hello\r\n\r\n".chomp('') #=> "hello"
"hello\r\n\r\r\n".chomp('') #=> "hello\r\n\r"
# File 'string.c', line 7075
static VALUE
rb_str_chomp(int argc, VALUE *argv, VALUE str)
{
    str = rb_str_dup(str);
    rb_str_chomp_bang(argc, argv, str);
    return str;
}
#chomp!(separator = $/) ⇒ String?
Modifies str in place as described for String#chomp, returning str, or nil if no modifications were made.
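Because the return value is nil when nothing was removed, chaining further calls onto chomp! can fail. The following sketch illustrates this; .dup is used only so that the example does not depend on whether string literals are frozen.

```ruby
line = "hello\n".dup

# The first call removes the newline and returns the receiver.
line.chomp!            #=> "hello"
line                   #=> "hello"

# A second call finds nothing to remove and returns nil;
# the string itself is unchanged.
line.chomp!            #=> nil
line                   #=> "hello"

# The non-bang form always returns a String, so it is safe to chain.
"hello".chomp.upcase   #=> "HELLO"
```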
# File 'string.c', line 6952 static VALUE rb_str_chomp_bang(int argc, VALUE *argv, VALUE str) { rb_encoding *enc; VALUE rs; int newline; char *p, *pp, *e; long len, rslen; str_modify_keep_cr(str); len = RSTRING_LEN(str); if (len == 0) return Qnil; p = RSTRING_PTR(str); e = p + len; if (argc == 0) { rs = rb_rs; if (rs == rb_default_rs) { smart_chomp: enc = rb_enc_get(str); if (rb_enc_mbminlen(enc) > 1) { pp = rb_enc_left_char_head(p, e-rb_enc_mbminlen(enc), e, enc); if (rb_enc_is_newline(pp, e, enc)) { e = pp; } pp = e - rb_enc_mbminlen(enc); if (pp >= p) { pp = rb_enc_left_char_head(p, pp, e, enc); if (rb_enc_ascget(pp, e, 0, enc) == '\r') { e = pp; } } if (e == RSTRING_END(str)) { return Qnil; } len = e - RSTRING_PTR(str); STR_SET_LEN(str, len); } else { if (RSTRING_PTR(str)[len-1] == '\n') { STR_DEC_LEN(str); if (RSTRING_LEN(str) > 0 && RSTRING_PTR(str)[RSTRING_LEN(str)-1] == '\r') { STR_DEC_LEN(str); } } else if (RSTRING_PTR(str)[len-1] == '\r') { STR_DEC_LEN(str); } else { return Qnil; } } RSTRING_PTR(str)[RSTRING_LEN(str)] = '\0'; return str; } } else { rb_scan_args(argc, argv, "01", &rs); } if (NIL_P(rs)) return Qnil; StringValue(rs); rslen = RSTRING_LEN(rs); if (rslen == 0) { while (len>0 && p[len-1] == '\n') { len--; if (len>0 && p[len-1] == '\r') len--; } if (len < RSTRING_LEN(str)) { STR_SET_LEN(str, len); RSTRING_PTR(str)[len] = '\0'; return str; } return Qnil; } if (rslen > len) return Qnil; newline = RSTRING_PTR(rs)[rslen-1]; if (rslen == 1 && newline == '\n') goto smart_chomp; enc = rb_enc_check(str, rs); if (is_broken_string(rs)) { return Qnil; } pp = e - rslen; if (p[len-1] == newline && (rslen <= 1 || memcmp(RSTRING_PTR(rs), pp, rslen) == 0)) { if (rb_enc_left_char_head(p, pp, e, enc) != pp) return Qnil; if (ENC_CODERANGE(str) != ENC_CODERANGE_7BIT) { ENC_CODERANGE_CLEAR(str); } STR_SET_LEN(str, RSTRING_LEN(str) - rslen); RSTRING_PTR(str)[RSTRING_LEN(str)] = '\0'; return str; } return Qnil; } |
#chop ⇒ String
Returns a new String with the last character removed. If the string ends with \r\n, both characters are removed. Applying chop to an empty string returns an empty string. String#chomp is often a safer alternative, as it leaves the string unchanged if it doesn’t end in a record separator.
"string\r\n".chop #=> "string"
"string\n\r".chop #=> "string\n"
"string\n".chop #=> "string"
"string".chop #=> "strin"
"x".chop.chop #=> ""
# File 'string.c', line 6937 static VALUE rb_str_chop(VALUE str) { return rb_str_subseq(str, 0, chopped_length(str)); } |
#chop! ⇒ String?
Processes str as for String#chop, returning str, or nil if str is the empty string. See also String#chomp!.
# File 'string.c', line 6902 static VALUE rb_str_chop_bang(VALUE str) { str_modify_keep_cr(str); if (RSTRING_LEN(str) > 0) { long len; len = chopped_length(str); STR_SET_LEN(str, len); RSTRING_PTR(str)[len] = '\0'; if (ENC_CODERANGE(str) != ENC_CODERANGE_7BIT) { ENC_CODERANGE_CLEAR(str); } return str; } return Qnil; } |
#chr ⇒ String
Returns a one-character string at the beginning of the string.
a = "abcde"
a.chr #=> "a"
# File 'string.c', line 4358 static VALUE rb_str_chr(VALUE str) { return rb_str_substr(str, 0, 1); } |
#clear ⇒ String
Makes string empty.
a = "abcde"
a.clear #=> ""
# File 'string.c', line 4334 static VALUE rb_str_clear(VALUE str) { str_discard(str); STR_SET_EMBED(str); STR_SET_EMBED_LEN(str, 0); RSTRING_PTR(str)[0] = 0; if (rb_enc_asciicompat(STR_ENC_GET(str))) ENC_CODERANGE_SET(str, ENC_CODERANGE_7BIT); else ENC_CODERANGE_SET(str, ENC_CODERANGE_VALID); return str; } |
#codepoints ⇒ Array
Returns an array of the Integer ordinals of the characters in str. This is a shorthand for str.each_codepoint.to_a.
If a block is given, which is a deprecated form, works the same as each_codepoint.
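For illustration, a short sketch contrasting codepoints with bytes. It assumes UTF-8 strings; the sample values are arbitrary.

```ruby
"hello".codepoints        #=> [104, 101, 108, 108, 111]

# For non-ASCII characters the codepoint and the bytes differ:
# U+00E9 (é) is one codepoint but two bytes in UTF-8.
"h\u00e9".codepoints      #=> [104, 233]
"h\u00e9".bytes           #=> [104, 195, 169]

# Array#pack with "U*" turns codepoints back into a UTF-8 string.
[104, 233].pack("U*")     #=> "hé"
```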
# File 'string.c', line 6868 static VALUE rb_str_codepoints(VALUE str) { return rb_str_enumerate_codepoints(str, 1); } |
#<<(integer) ⇒ String #concat(integer) ⇒ String #<<(obj) ⇒ String #concat(obj) ⇒ String
Append: concatenates the given object to str. If the object is an Integer, it is considered as a codepoint, and is converted to a character before concatenation.
a = "hello "
a << "world" #=> "hello world"
a.concat(33) #=> "hello world!"
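Since << and concat modify the receiver, every variable that refers to the same object sees the change. A small sketch of that aliasing effect, compared with +, which returns a new String (the Integer example assumes a UTF-8 string):

```ruby
a = "hello ".dup
b = a                 # b and a refer to the same String object

a << "world"          # modifies the object in place
b                     #=> "hello world"
b.equal?(a)           #=> true

c = a + "!"           # + builds a new String; a is untouched
a                     #=> "hello world"
c                     #=> "hello world!"

# An Integer argument is treated as a codepoint.
"caf".dup << 233      #=> "café"
```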
# File 'string.c', line 2339 VALUE rb_str_concat(VALUE str1, VALUE str2) { unsigned int code; rb_encoding *enc = STR_ENC_GET(str1); if (FIXNUM_P(str2) || RB_TYPE_P(str2, T_BIGNUM)) { if (rb_num_to_uint(str2, &code) == 0) { } else if (FIXNUM_P(str2)) { rb_raise(rb_eRangeError, "%ld out of char range", FIX2LONG(str2)); } else { rb_raise(rb_eRangeError, "bignum out of char range"); } } else { return rb_str_append(str1, str2); } if (enc == rb_usascii_encoding()) { /* US-ASCII automatically extended to ASCII-8BIT */ char buf[1]; buf[0] = (char)code; if (code > 0xFF) { rb_raise(rb_eRangeError, "%u out of char range", code); } rb_str_cat(str1, buf, 1); if (code > 127) { rb_enc_associate(str1, rb_ascii8bit_encoding()); ENC_CODERANGE_SET(str1, ENC_CODERANGE_VALID); } } else { long pos = RSTRING_LEN(str1); int cr = ENC_CODERANGE(str1); int len; char *buf; switch (len = rb_enc_codelen(code, enc)) { case ONIGERR_INVALID_CODE_POINT_VALUE: rb_raise(rb_eRangeError, "invalid codepoint 0x%X in %s", code, rb_enc_name(enc)); break; case ONIGERR_TOO_BIG_WIDE_CHAR_VALUE: case 0: rb_raise(rb_eRangeError, "%u out of char range", code); break; } buf = ALLOCA_N(char, len + 1); rb_enc_mbcput(code, buf, enc); if (rb_enc_precise_mbclen(buf, buf + len + 1, enc) != len) { rb_raise(rb_eRangeError, "invalid codepoint 0x%X in %s", code, rb_enc_name(enc)); } rb_str_resize(str1, pos+len); memcpy(RSTRING_PTR(str1) + pos, buf, len); if (cr == ENC_CODERANGE_7BIT && code > 127) cr = ENC_CODERANGE_VALID; ENC_CODERANGE_SET(str1, cr); } return str1; } |
#count([other_str]) ⇒ Fixnum
Each other_str parameter defines a set of characters to count. The intersection of these sets defines the characters to count in str. Any other_str that starts with a caret ^ is negated. The sequence c1-c2 means all characters between c1 and c2. The backslash character \ can be used to escape ^ or - and is otherwise ignored unless it appears at the end of a sequence or the end of an other_str.
a = "hello world"
a.count "lo" #=> 5
a.count "lo", "o" #=> 2
a.count "hello", "^l" #=> 4
a.count "ej-m" #=> 4
"hello^world".count "\\^aeiou" #=> 4
"hello-world".count "a\\-eo" #=> 4
c = "hello world\\r\\n"
c.count "\\" #=> 2
c.count "\\A" #=> 0
c.count "X-\\w" #=> 3
# File 'string.c', line 6057 static VALUE rb_str_count(int argc, VALUE *argv, VALUE str) { char table[TR_TABLE_SIZE]; rb_encoding *enc = 0; VALUE del = 0, nodel = 0, tstr; char *s, *send; int i; int ascompat; rb_check_arity(argc, 1, UNLIMITED_ARGUMENTS); tstr = argv[0]; StringValue(tstr); enc = rb_enc_check(str, tstr); if (argc == 1) { const char *ptstr; if (RSTRING_LEN(tstr) == 1 && rb_enc_asciicompat(enc) && (ptstr = RSTRING_PTR(tstr), ONIGENC_IS_ALLOWED_REVERSE_MATCH(enc, (const unsigned char *)ptstr, (const unsigned char *)ptstr+1)) && !is_broken_string(str)) { int n = 0; int clen; unsigned char c = rb_enc_codepoint_len(ptstr, ptstr+1, &clen, enc); s = RSTRING_PTR(str); if (!s || RSTRING_LEN(str) == 0) return INT2FIX(0); send = RSTRING_END(str); while (s < send) { if (*(unsigned char*)s++ == c) n++; } return INT2NUM(n); } } tr_setup_table(tstr, table, TRUE, &del, &nodel, enc); for (i=1; i<argc; i++) { tstr = argv[i]; StringValue(tstr); enc = rb_enc_check(str, tstr); tr_setup_table(tstr, table, FALSE, &del, &nodel, enc); } s = RSTRING_PTR(str); if (!s || RSTRING_LEN(str) == 0) return INT2FIX(0); send = RSTRING_END(str); ascompat = rb_enc_asciicompat(enc); i = 0; while (s < send) { unsigned int c; if (ascompat && (c = *(unsigned char*)s) < 0x80) { if (table[c]) { i++; } s++; } else { int clen; c = rb_enc_codepoint_len(s, send, &clen, enc); if (tr_find(c, table, del, nodel)) { i++; } s += clen; } } return INT2NUM(i); } |
#crypt(salt_str) ⇒ String
Applies a one-way cryptographic hash to str by invoking the standard library function crypt(3) with the given salt string. While the format and the result are system and implementation dependent, a salt matching the regular expression \A[a-zA-Z0-9./]{2} should be valid and safe on any platform; with such a salt, only the first two characters are significant.
This method is for use in system specific scripts, so if you want a cross-platform hash function consider using Digest or OpenSSL instead.
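As a sketch of the cross-platform alternative mentioned above, the Digest standard library can be used as follows. SHA-256 is chosen here only for illustration; it is a general-purpose hash, not a password-hashing scheme.

```ruby
require "digest/sha2"

# Digest::SHA256 gives the same result on every platform,
# unlike crypt(3), whose output format depends on the system.
digest = Digest::SHA256.hexdigest("secret")
digest.length                                  #=> 64

# The same input always yields the same digest.
digest == Digest::SHA256.hexdigest("secret")   #=> true
```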
# File 'string.c', line 7412 static VALUE rb_str_crypt(VALUE str, VALUE salt) { extern char *crypt(const char *, const char *); VALUE result; const char *s, *saltp; char *res; #ifdef BROKEN_CRYPT char salt_8bit_clean[3]; #endif StringValue(salt); if (RSTRING_LEN(salt) < 2) rb_raise(rb_eArgError, "salt too short (need >=2 bytes)"); s = RSTRING_PTR(str); if (!s) s = ""; saltp = RSTRING_PTR(salt); #ifdef BROKEN_CRYPT if (!ISASCII((unsigned char)saltp[0]) || !ISASCII((unsigned char)saltp[1])) { salt_8bit_clean[0] = saltp[0] & 0x7f; salt_8bit_clean[1] = saltp[1] & 0x7f; salt_8bit_clean[2] = '\0'; saltp = salt_8bit_clean; } #endif res = crypt(s, saltp); if (!res) { rb_sys_fail("crypt"); } result = rb_str_new2(res); OBJ_INFECT(result, str); OBJ_INFECT(result, salt); return result; } |
#delete([other_str]) ⇒ String
Returns a copy of str with all characters in the intersection of its arguments deleted. Uses the same rules for building the set of characters as String#count.
"hello".delete "l","lo" #=> "heo"
"hello".delete "lo" #=> "he"
"hello".delete "aeiou", "^e" #=> "hell"
"hello".delete "ej-m" #=> "ho"
# File 'string.c', line 5877 static VALUE rb_str_delete(int argc, VALUE *argv, VALUE str) { str = rb_str_dup(str); rb_str_delete_bang(argc, argv, str); return str; } |
#delete!([other_str]) ⇒ String?
Performs a delete operation in place, returning str, or nil if str was not modified.
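A brief sketch of the two possible return values; the sample string is arbitrary.

```ruby
s = "hello".dup

# Characters were removed: the receiver itself is returned.
s.delete!("l")   #=> "heo"
s                #=> "heo"

# Nothing matched: nil is returned and s is left unchanged.
s.delete!("z")   #=> nil
s                #=> "heo"
```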
# File 'string.c', line 5801 static VALUE rb_str_delete_bang(int argc, VALUE *argv, VALUE str) { char squeez[TR_TABLE_SIZE]; rb_encoding *enc = 0; char *s, *send, *t; VALUE del = 0, nodel = 0; int modify = 0; int i, ascompat, cr; if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return Qnil; rb_check_arity(argc, 1, UNLIMITED_ARGUMENTS); for (i=0; i<argc; i++) { VALUE s = argv[i]; StringValue(s); enc = rb_enc_check(str, s); tr_setup_table(s, squeez, i==0, &del, &nodel, enc); } str_modify_keep_cr(str); ascompat = rb_enc_asciicompat(enc); s = t = RSTRING_PTR(str); send = RSTRING_END(str); cr = ascompat ? ENC_CODERANGE_7BIT : ENC_CODERANGE_VALID; while (s < send) { unsigned int c; int clen; if (ascompat && (c = *(unsigned char*)s) < 0x80) { if (squeez[c]) { modify = 1; } else { if (t != s) *t = c; t++; } s++; } else { c = rb_enc_codepoint_len(s, send, &clen, enc); if (tr_find(c, squeez, del, nodel)) { modify = 1; } else { if (t != s) rb_enc_mbcput(c, t, enc); t += clen; if (cr == ENC_CODERANGE_7BIT) cr = ENC_CODERANGE_VALID; } s += clen; } } *t = '\0'; STR_SET_LEN(str, t - RSTRING_PTR(str)); ENC_CODERANGE_SET(str, cr); if (modify) return str; return Qnil; } |
#downcase ⇒ String
Returns a copy of str with all uppercase letters replaced with their lowercase counterparts. The operation is locale insensitive—only characters “A” to “Z” are affected. Note: case replacement is effective only in ASCII region.
"hEllO".downcase #=> "hello"
# File 'string.c', line 5196 static VALUE rb_str_downcase(VALUE str) { str = rb_str_dup(str); rb_str_downcase_bang(str); return str; } |
#downcase! ⇒ String?
Downcases the contents of str, returning nil if no changes were made. Note: case replacement is effective only in ASCII region.
# File 'string.c', line 5131 static VALUE rb_str_downcase_bang(VALUE str) { rb_encoding *enc; char *s, *send; int modify = 0; str_modify_keep_cr(str); enc = STR_ENC_GET(str); rb_str_check_dummy_enc(enc); s = RSTRING_PTR(str); send = RSTRING_END(str); if (single_byte_optimizable(str)) { while (s < send) { unsigned int c = *(unsigned char*)s; if (rb_enc_isascii(c, enc) && 'A' <= c && c <= 'Z') { *s = 'a' + (c - 'A'); modify = 1; } s++; } } else { int ascompat = rb_enc_asciicompat(enc); while (s < send) { unsigned int c; int n; if (ascompat && (c = *(unsigned char*)s) < 0x80) { if (rb_enc_isascii(c, enc) && 'A' <= c && c <= 'Z') { *s = 'a' + (c - 'A'); modify = 1; } s++; } else { c = rb_enc_codepoint_len(s, send, &n, enc); if (rb_enc_isupper(c, enc)) { /* assuming toupper returns codepoint with same size */ rb_enc_mbcput(rb_enc_tolower(c, enc), s, enc); modify = 1; } s += n; } } } if (modify) return str; return Qnil; } |
#dump ⇒ String
Produces a version of str with all non-printing characters replaced by \nnn notation and all special characters escaped.
"hello \n ''".dump #=> "\"hello \\n ''\""
# File 'string.c', line 4898 VALUE rb_str_dump(VALUE str) { rb_encoding *enc = rb_enc_get(str); long len; const char *p, *pend; char *q, *qend; VALUE result; int u8 = (enc == rb_utf8_encoding()); len = 2; /* "" */ p = RSTRING_PTR(str); pend = p + RSTRING_LEN(str); while (p < pend) { unsigned char c = *p++; switch (c) { case '"': case '\\': case '\n': case '\r': case '\t': case '\f': case '\013': case '\010': case '\007': case '\033': len += 2; break; case '#': len += IS_EVSTR(p, pend) ? 2 : 1; break; default: if (ISPRINT(c)) { len++; } else { if (u8 && c > 0x7F) { /* \u{NN} */ int n = rb_enc_precise_mbclen(p-1, pend, enc); if (MBCLEN_CHARFOUND_P(n)) { unsigned int cc = rb_enc_mbc_to_codepoint(p-1, pend, enc); while (cc >>= 4) len++; len += 5; p += MBCLEN_CHARFOUND_LEN(n)-1; break; } } len += 4; /* \xNN */ } break; } } if (!rb_enc_asciicompat(enc)) { len += 19; /* ".force_encoding('')" */ len += strlen(enc->name); } result = rb_str_new5(str, 0, len); p = RSTRING_PTR(str); pend = p + RSTRING_LEN(str); q = RSTRING_PTR(result); qend = q + len + 1; *q++ = '"'; while (p < pend) { unsigned char c = *p++; if (c == '"' || c == '\\') { *q++ = '\\'; *q++ = c; } else if (c == '#') { if (IS_EVSTR(p, pend)) *q++ = '\\'; *q++ = '#'; } else if (c == '\n') { *q++ = '\\'; *q++ = 'n'; } else if (c == '\r') { *q++ = '\\'; *q++ = 'r'; } else if (c == '\t') { *q++ = '\\'; *q++ = 't'; } else if (c == '\f') { *q++ = '\\'; *q++ = 'f'; } else if (c == '\013') { *q++ = '\\'; *q++ = 'v'; } else if (c == '\010') { *q++ = '\\'; *q++ = 'b'; } else if (c == '\007') { *q++ = '\\'; *q++ = 'a'; } else if (c == '\033') { *q++ = '\\'; *q++ = 'e'; } else if (ISPRINT(c)) { *q++ = c; } else { *q++ = '\\'; if (u8) { int n = rb_enc_precise_mbclen(p-1, pend, enc) - 1; if (MBCLEN_CHARFOUND_P(n)) { int cc = rb_enc_mbc_to_codepoint(p-1, pend, enc); p += n; snprintf(q, qend-q, "u{%x}", cc); q += strlen(q); continue; } } snprintf(q, qend-q, "x%02X", c); q += 3; } } *q++ = '"'; *q = '\0'; if 
(!rb_enc_asciicompat(enc)) { snprintf(q, qend-q, ".force_encoding(\"%s\")", enc->name); enc = rb_ascii8bit_encoding(); } OBJ_INFECT(result, str); /* result from dump is ASCII */ rb_enc_associate(result, enc); ENC_CODERANGE_SET(result, ENC_CODERANGE_7BIT); return result; } |
#each_byte {|fixnum| ... } ⇒ String #each_byte ⇒ Object
Passes each byte in str to the given block, or returns an enumerator if no block is given.
"hello".each_byte {|c| print c, ' ' }
produces:
104 101 108 108 111
# File 'string.c', line 6651 static VALUE rb_str_each_byte(VALUE str) { return rb_str_enumerate_bytes(str, 0); } |
#each_char {|cstr| ... } ⇒ String #each_char ⇒ Object
Passes each character in str to the given block, or returns an enumerator if no block is given.
"hello".each_char {|c| print c, ' ' }
produces:
h e l l o
# File 'string.c', line 6757 static VALUE rb_str_each_char(VALUE str) { return rb_str_enumerate_chars(str, 0); } |
#each_codepoint {|integer| ... } ⇒ String #each_codepoint ⇒ Object
Passes the Integer ordinal of each character in str, also known as a codepoint when applied to Unicode strings, to the given block.
If no block is given, an enumerator is returned instead.
"hello\u0639".each_codepoint {|c| print c, ' ' }
produces:
104 101 108 108 111 1593
# File 'string.c', line 6850 static VALUE rb_str_each_codepoint(VALUE str) { return rb_str_enumerate_codepoints(str, 0); } |
#each_line(separator = $/) {|substr| ... } ⇒ String #each_line(separator = $/) ⇒ Object
Splits str using the supplied parameter as the record separator ($/ by default), passing each substring in turn to the supplied block. If a zero-length record separator is supplied, the string is split into paragraphs delimited by multiple successive newlines.
If no block is given, an enumerator is returned instead.
print "Example one\n"
"hello\nworld".each_line {|s| p s}
print "Example two\n"
"hello\nworld".each_line('l') {|s| p s}
print "Example three\n"
"hello\n\n\nworld".each_line('') {|s| p s}
produces:
Example one
"hello\n"
"world"
Example two
"hel"
"l"
"o\nworl"
"d"
Example three
"hello\n\n\n"
"world"
# File 'string.c', line 6570 static VALUE rb_str_each_line(int argc, VALUE *argv, VALUE str) { return rb_str_enumerate_lines(argc, argv, str, 0); } |
#empty? ⇒ Boolean
Returns true if str has a length of zero.
"hello".empty? #=> false
" ".empty? #=> false
"".empty? #=> true
# File 'string.c', line 1333 static VALUE rb_str_empty(VALUE str) { if (RSTRING_LEN(str) == 0) return Qtrue; return Qfalse; } |
#encode(encoding[, options]) ⇒ String #encode(dst_encoding, src_encoding[, options]) ⇒ String #encode([options]) ⇒ String
The first form returns a copy of str transcoded to encoding encoding. The second form returns a copy of str transcoded from src_encoding to dst_encoding. The last form returns a copy of str transcoded to Encoding.default_internal.
By default, the first and second form raise Encoding::UndefinedConversionError for characters that are undefined in the destination encoding, and Encoding::InvalidByteSequenceError for invalid byte sequences in the source encoding. The last form by default does not raise exceptions but uses replacement strings.
The options Hash gives details for conversion and can have the following keys:
- :invalid
  If the value is :replace, #encode replaces invalid byte sequences in str with the replacement character. The default is to raise the Encoding::InvalidByteSequenceError exception.
- :undef
  If the value is :replace, #encode replaces characters which are undefined in the destination encoding with the replacement character. The default is to raise the Encoding::UndefinedConversionError exception.
- :replace
  Sets the replacement string to the given value. The default replacement string is “\uFFFD” for Unicode encoding forms, and “?” otherwise.
- :fallback
  Sets the replacement string by the given object for undefined characters. The object should be a Hash, a Proc, a Method, or an object which has a [] method. Its key is an undefined character encoded in the source encoding of the current transcoder. Its value can be in any encoding, as long as it can be converted into the destination encoding of the transcoder.
- :xml
  The value must be :text or :attr. If the value is :text, #encode replaces undefined characters with their (upper-case hexadecimal) numeric character references. ‘&’, ‘<’, and ‘>’ are converted to “&amp;”, “&lt;”, and “&gt;”, respectively. If the value is :attr, #encode also quotes the replacement result (using ‘"’), and replaces ‘"’ with “&quot;”.
- :cr_newline
  Replaces LF (“\n”) with CR (“\r”) if value is true.
- :crlf_newline
  Replaces LF (“\n”) with CRLF (“\r\n”) if value is true.
- :universal_newline
  Replaces CRLF (“\r\n”) and CR (“\r”) with LF (“\n”) if value is true.
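The options above can be sketched as follows. The sample string and the target encoding US-ASCII are arbitrary choices for illustration, and the source string is assumed to be UTF-8.

```ruby
s = "caf\u00e9 <ok>"   # "café <ok>" in UTF-8; é has no US-ASCII equivalent

# :undef with the default replacement ("?" for non-Unicode targets)
s.encode("US-ASCII", undef: :replace)                 #=> "caf? <ok>"

# :replace sets a custom replacement string
s.encode("US-ASCII", undef: :replace, replace: "*")   #=> "caf* <ok>"

# :fallback looks up each undefined character in the given Hash
s.encode("US-ASCII", fallback: { "\u00e9" => "e" })   #=> "cafe <ok>"

# :xml => :text escapes markup and uses numeric character references
s.encode("US-ASCII", xml: :text)                      #=> "caf&#xE9; &lt;ok&gt;"

# :universal_newline normalizes CRLF and CR to LF
"a\r\nb\rc".encode(universal_newline: true)           #=> "a\nb\nc"
```

Without any of these options, the first call raises Encoding::UndefinedConversionError, as described above.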
# File 'transcode.c', line 2876 static VALUE str_encode(int argc, VALUE *argv, VALUE str) { VALUE newstr = str; int encidx = str_transcode(argc, argv, &newstr); return encoded_dup(newstr, str, encidx); } |
#encode!(encoding[, options]) ⇒ String #encode!(dst_encoding, src_encoding[, options]) ⇒ String
The first form transcodes the contents of str from str.encoding to encoding. The second form transcodes the contents of str from src_encoding to dst_encoding. The options Hash gives details for conversion. See String#encode for details. Returns the string even if no changes were made.
# File 'transcode.c', line 2798 static VALUE str_encode_bang(int argc, VALUE *argv, VALUE str) { VALUE newstr; int encidx; rb_check_frozen(str); newstr = str; encidx = str_transcode(argc, argv, &newstr); if (encidx < 0) return str; if (newstr == str) { rb_enc_associate_index(str, encidx); return str; } rb_str_shared_replace(str, newstr); return str_encode_associate(str, encidx); } |
#encoding ⇒ Encoding
Returns the Encoding object that represents the encoding of obj.
# File 'encoding.c', line 929 VALUE rb_obj_encoding(VALUE obj) { int idx = rb_enc_get_index(obj); if (idx < 0) { rb_raise(rb_eTypeError, "unknown encoding"); } return rb_enc_from_encoding_index(idx & ENC_INDEX_MASK); } |
#end_with?([suffixes]) ⇒ Boolean
Returns true if str ends with one of the suffixes given.
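A few illustrative calls; the file name is a made-up sample.

```ruby
"hello".end_with?("llo")                  #=> true
"hello".end_with?("hell")                 #=> false

# With several suffixes, one match is enough.
"report.csv".end_with?(".txt", ".csv")    #=> true

# A suffix longer than the string never matches.
"lo".end_with?("hello")                   #=> false
```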
# File 'string.c', line 7854 static VALUE rb_str_end_with(int argc, VALUE *argv, VALUE str) { int i; char *p, *s, *e; rb_encoding *enc; for (i=0; i<argc; i++) { VALUE tmp = argv[i]; StringValue(tmp); enc = rb_enc_check(str, tmp); if (RSTRING_LEN(str) < RSTRING_LEN(tmp)) continue; p = RSTRING_PTR(str); e = p + RSTRING_LEN(str); s = e - RSTRING_LEN(tmp); if (rb_enc_left_char_head(p, s, e, enc) != s) continue; if (memcmp(s, RSTRING_PTR(tmp), RSTRING_LEN(tmp)) == 0) return Qtrue; } return Qfalse; } |
#eql?(other) ⇒ Boolean
Two strings are equal if they have the same length and content.
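A short sketch of the behavior. The last two lines reflect the type check visible in the source listing for this method: an argument that is not a String yields false.

```ruby
"abc".eql?("abc")    #=> true
"abc".eql?("abd")    #=> false  (same length, different content)
"abc".eql?("abcd")   #=> false  (different length)

# Non-String arguments are never eql? to a String.
"1".eql?(1)          #=> false
"abc".eql?(:abc)     #=> false
```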
# File 'string.c', line 2562 static VALUE rb_str_eql(VALUE str1, VALUE str2) { if (str1 == str2) return Qtrue; if (!RB_TYPE_P(str2, T_STRING)) return Qfalse; return str_eql(str1, str2); } |
#force_encoding(encoding) ⇒ String
Changes the encoding to encoding and returns self.
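Only the encoding label changes; the bytes stay as they are. This can be sketched with three bytes that happen to form one UTF-8 character (the byte values are an arbitrary sample):

```ruby
raw = "\xE3\x81\x82".b                  # three bytes, labeled ASCII-8BIT
raw.encoding == Encoding::ASCII_8BIT    #=> true
raw.length                              #=> 3

raw.force_encoding("UTF-8")             # relabel in place; returns raw

# The bytes are untouched, but they are now read as one character.
raw.bytes                               #=> [227, 129, 130]
raw.length                              #=> 1
raw.valid_encoding?                     #=> true
```

Unlike String#encode, no transcoding takes place, so relabeling bytes that are not valid in the new encoding leaves a string for which valid_encoding? returns false.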
# File 'string.c', line 7894 static VALUE rb_str_force_encoding(VALUE str, VALUE enc) { str_modifiable(str); rb_enc_associate(str, rb_to_encoding(enc)); ENC_CODERANGE_CLEAR(str); return str; } |
#freeze ⇒ Object
#getbyte(index) ⇒ 0 .. 255
Returns the byte at position index as an integer, or nil if index is out of range. Negative indices count from the end of the string.
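For illustration, a sketch of the in-range, negative, and out-of-range cases, following the bounds checks in the source listing for this method. The last line assumes a UTF-8 string.

```ruby
"hello".getbyte(0)      #=> 104
"hello".getbyte(-1)     #=> 111   (negative indices count from the end)
"hello".getbyte(5)      #=> nil   (out of range)

# The index addresses bytes, not characters.
"h\u00e9".getbyte(1)    #=> 195   (first byte of é)
```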
# File 'string.c', line 4370 static VALUE rb_str_getbyte(VALUE str, VALUE index) { long pos = NUM2LONG(index); if (pos < 0) pos += RSTRING_LEN(str); if (pos < 0 || RSTRING_LEN(str) <= pos) return Qnil; return INT2FIX((unsigned char)RSTRING_PTR(str)[pos]); } |
#gsub(pattern, replacement) ⇒ String #gsub(pattern, hash) ⇒ String #gsub(pattern) {|match| ... } ⇒ String #gsub(pattern) ⇒ Object
Returns a copy of str with all occurrences of pattern replaced by the second argument. The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\\d' will match a backslash followed by ‘d’, instead of a digit.
If replacement is a String it will be substituted for the matched text. It may contain back-references to the pattern’s capture groups of the form \\d, where d is a group number, or \\k<n>, where n is a group name. If it is a double-quoted string, both back-references must be preceded by an additional backslash. However, within replacement the special match variables, such as $&, will not refer to the current match.
If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string.
In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
The result inherits any tainting in the original string or any supplied replacement string.
When neither a block nor a second argument is supplied, an Enumerator is returned.
"hello".gsub(/[aeiou]/, '*') #=> "h*ll*"
"hello".gsub(/([aeiou])/, '<\1>') #=> "h<e>ll<o>"
"hello".gsub(/./) {|s| s.ord.to_s + ' '} #=> "104 101 108 108 111 "
"hello".gsub(/(?<foo>[aeiou])/, '{\k<foo>}') #=> "h{e}ll{o}"
'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"
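Two points from the description are easy to get wrong: the extra backslash needed in double-quoted replacements, and the literal treatment of String patterns. A small sketch of both, with arbitrary sample strings:

```ruby
# Single-quoted and double-quoted forms of the same back-reference.
"hello".gsub(/([aeiou])/, '<\1>')    #=> "h<e>ll<o>"
"hello".gsub(/([aeiou])/, "<\\1>")   #=> "h<e>ll<o>"

# Without the extra backslash, "\1" is the single character U+0001,
# not a back-reference.
"hello".gsub(/([aeiou])/, "<\1>")    #=> "h<\u0001>ll<\u0001>"

# A String pattern is matched literally; a Regexp is not.
"a.b".gsub(".", "-")                 #=> "a-b"
"a.b".gsub(/./, "-")                 #=> "---"
```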
# File 'string.c', line 4295 static VALUE rb_str_gsub(int argc, VALUE *argv, VALUE str) { return str_gsub(argc, argv, str, 0); } |
#gsub!(pattern, replacement) ⇒ String? #gsub!(pattern) {|match| ... } ⇒ String? #gsub!(pattern) ⇒ Object
Performs the substitutions of String#gsub in place, returning str, or nil if no substitutions were performed. If no block and no replacement is given, an enumerator is returned instead.
# File 'string.c', line 4244 static VALUE rb_str_gsub_bang(int argc, VALUE *argv, VALUE str) { str_modify_keep_cr(str); return str_gsub(argc, argv, str, 1); } |
#hash ⇒ Fixnum
Returns a hash based on the string’s length and content.
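Because the value depends on content rather than object identity, two distinct String objects with the same content hash alike and can be used interchangeably as Hash keys. A minimal sketch:

```ruby
a = "ruby"
b = "ru" + "by"       # a different object with the same content

a.equal?(b)           #=> false
a.hash == b.hash      #=> true

# Hash lookup relies on hash and eql?, so either object finds the entry.
table = { a => 1 }
table[b]              #=> 1
```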
# File 'string.c', line 2451 static VALUE rb_str_hash_m(VALUE str) { st_index_t hval = rb_str_hash(str); return INT2FIX(hval); } |
#hex ⇒ Integer
Treats leading characters from str as a string of hexadecimal digits (with an optional sign and an optional 0x) and returns the corresponding number. Zero is returned on error.
"0x0a".hex #=> 10
"-1234".hex #=> -4660
"0".hex #=> 0
"wombat".hex #=> 0
# File 'string.c', line 7367 static VALUE rb_str_hex(VALUE str) { return rb_str_to_inum(str, 16, FALSE); } |
#include?(other_str) ⇒ Boolean
Returns true if str contains the given string or character.
"hello".include? "lo" #=> true
"hello".include? "ol" #=> false
"hello".include? ?h #=> true
# File 'string.c', line 4641 static VALUE rb_str_include(VALUE str, VALUE arg) { long i; StringValue(arg); i = rb_str_index(str, arg, 0); if (i == -1) return Qfalse; return Qtrue; } |
#index(substring[, offset]) ⇒ Fixnum? #index(regexp[, offset]) ⇒ Fixnum?
Returns the index of the first occurrence of the given substring or pattern (regexp) in str. Returns nil if not found. If the second parameter is present, it specifies the position in the string to begin the search.
"hello".index('e') #=> 1
"hello".index('lo') #=> 3
"hello".index('a') #=> nil
"hello".index(?e) #=> 1
"hello".index(/[aeiou]/, -3) #=> 4
# File 'string.c', line 2747 static VALUE rb_str_index_m(int argc, VALUE *argv, VALUE str) { VALUE sub; VALUE initpos; long pos; if (rb_scan_args(argc, argv, "11", &sub, &initpos) == 2) { pos = NUM2LONG(initpos); } else { pos = 0; } if (pos < 0) { pos += str_strlen(str, STR_ENC_GET(str)); if (pos < 0) { if (RB_TYPE_P(sub, T_REGEXP)) { rb_backref_set(Qnil); } return Qnil; } } if (SPECIAL_CONST_P(sub)) goto generic; switch (BUILTIN_TYPE(sub)) { case T_REGEXP: if (pos > str_strlen(str, STR_ENC_GET(str))) return Qnil; pos = str_offset(RSTRING_PTR(str), RSTRING_END(str), pos, rb_enc_check(str, sub), single_byte_optimizable(str)); pos = rb_reg_search(sub, str, pos, 0); pos = rb_str_sublen(str, pos); break; generic: default: { VALUE tmp; tmp = rb_check_string_type(sub); if (NIL_P(tmp)) { rb_raise(rb_eTypeError, "type mismatch: %s given", rb_obj_classname(sub)); } sub = tmp; } /* fall through */ case T_STRING: pos = rb_str_index(str, sub, pos); pos = rb_str_sublen(str, pos); break; } if (pos == -1) return Qnil; return LONG2NUM(pos); } |
#replace(other_str) ⇒ String
Replaces the contents and taintedness of str with the corresponding values in other_str.
s = "hello" #=> "hello"
s.replace "world" #=> "world"
# File 'string.c', line 4313 VALUE rb_str_replace(VALUE str, VALUE str2) { str_modifiable(str); if (str == str2) return str; StringValue(str2); str_discard(str); return str_replace(str, str2); } |
#insert(index, other_str) ⇒ String
Inserts other_str before the character at the given index, modifying str. Negative indices count from the end of the string, and insert after the given character. The intent is to insert other_str so that it starts at the given index.
"abcd".insert(0, 'X') #=> "Xabcd"
"abcd".insert(3, 'X') #=> "abcXd"
"abcd".insert(4, 'X') #=> "abcdX"
"abcd".insert(-3, 'X') #=> "abXcd"
"abcd".insert(-1, 'X') #=> "abcdX"
# File 'string.c', line 3882 static VALUE rb_str_insert(VALUE str, VALUE idx, VALUE str2) { long pos = NUM2LONG(idx); if (pos == -1) { return rb_str_append(str, str2); } else if (pos < 0) { pos++; } rb_str_splice(str, pos, 0, str2); return str; } |
#inspect ⇒ String
Returns a printable version of str, surrounded by quote marks, with special characters escaped.
str = "hello"
str[3] = "\b"
str.inspect #=> "\"hel\\bo\""
# File 'string.c', line 4791 VALUE rb_str_inspect(VALUE str) { int encidx = ENCODING_GET(str); rb_encoding *enc = rb_enc_from_index(encidx), *actenc; const char *p, *pend, *prev; char buf[CHAR_ESC_LEN + 1]; VALUE result = rb_str_buf_new(0); rb_encoding *resenc = rb_default_internal_encoding(); int unicode_p = rb_enc_unicode_p(enc); int asciicompat = rb_enc_asciicompat(enc); if (resenc == NULL) resenc = rb_default_external_encoding(); if (!rb_enc_asciicompat(resenc)) resenc = rb_usascii_encoding(); rb_enc_associate(result, resenc); str_buf_cat2(result, "\""); p = RSTRING_PTR(str); pend = RSTRING_END(str); prev = p; actenc = get_actual_encoding(encidx, str); if (actenc != enc) { enc = actenc; if (unicode_p) unicode_p = rb_enc_unicode_p(enc); } while (p < pend) { unsigned int c, cc; int n; n = rb_enc_precise_mbclen(p, pend, enc); if (!MBCLEN_CHARFOUND_P(n)) { if (p > prev) str_buf_cat(result, prev, p - prev); n = rb_enc_mbminlen(enc); if (pend < p + n) n = (int)(pend - p); while (n--) { snprintf(buf, CHAR_ESC_LEN, "\\x%02X", *p & 0377); str_buf_cat(result, buf, strlen(buf)); prev = ++p; } continue; } n = MBCLEN_CHARFOUND_LEN(n); c = rb_enc_mbc_to_codepoint(p, pend, enc); p += n; if ((asciicompat || unicode_p) && (c == '"'|| c == '\\' || (c == '#' && p < pend && MBCLEN_CHARFOUND_P(rb_enc_precise_mbclen(p,pend,enc)) && (cc = rb_enc_codepoint(p,pend,enc), (cc == '$' || cc == '@' || cc == '{'))))) { if (p - n > prev) str_buf_cat(result, prev, p - n - prev); str_buf_cat2(result, "\\"); if (asciicompat || enc == resenc) { prev = p - n; continue; } } switch (c) { case '\n': cc = 'n'; break; case '\r': cc = 'r'; break; case '\t': cc = 't'; break; case '\f': cc = 'f'; break; case '\013': cc = 'v'; break; case '\010': cc = 'b'; break; case '\007': cc = 'a'; break; case 033: cc = 'e'; break; default: cc = 0; break; } if (cc) { if (p - n > prev) str_buf_cat(result, prev, p - n - prev); buf[0] = '\\'; buf[1] = (char)cc; str_buf_cat(result, buf, 2); prev = p; continue; } if ((enc 
== resenc && rb_enc_isprint(c, enc)) || (asciicompat && rb_enc_isascii(c, enc) && ISPRINT(c))) { continue; } else { if (p - n > prev) str_buf_cat(result, prev, p - n - prev); rb_str_buf_cat_escaped_char(result, c, unicode_p); prev = p; continue; } } if (p > prev) str_buf_cat(result, prev, p - prev); str_buf_cat2(result, "\""); OBJ_INFECT(result, str); return result; } |
#intern ⇒ Object #to_sym ⇒ Object
Returns the Symbol corresponding to str, creating the symbol if it did not previously exist. See Symbol#id2name.
"Koala".intern #=> :Koala
s = 'cat'.to_sym #=> :cat
s == :cat #=> true
s = '@cat'.to_sym #=> :@cat
s == :@cat #=> true
This can also be used to create symbols that cannot be represented using the :xxx notation.
'cat and dog'.to_sym #=> :"cat and dog"
# File 'string.c', line 7469 VALUE rb_str_intern(VALUE s) { VALUE str = RB_GC_GUARD(s); ID id; id = rb_intern_str(str); return ID2SYM(id); } |
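As a small sketch of the behaviour described for intern and to_sym (variable names are illustrative), the following shows that equal strings map to the same Symbol object and that a symbol with spaces round-trips back to its string:

```ruby
plain  = "cat".to_sym
spaced = "cat and dog".to_sym

# Equal strings always map to the same Symbol object.
identical = "cat".intern.equal?(plain)

# A symbol that needs quoting is shown in the :"..." form by inspect.
shown = spaced.inspect
round_trip = spaced.to_s

p plain, spaced, identical, round_trip
```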
#length ⇒ Integer #size ⇒ Integer
Returns the character length of str.
# File 'string.c', line 1297 VALUE rb_str_length(VALUE str) { long len; len = str_strlen(str, STR_ENC_GET(str)); return LONG2NUM(len); } |
#lines(separator = $/) ⇒ Array
Returns an array of lines in str split using the supplied record separator ($/ by default). This is a shorthand for str.each_line(separator).to_a.
If a block is given (a deprecated form), lines behaves the same as each_line.
# File 'string.c', line 6588 static VALUE rb_str_lines(int argc, VALUE *argv, VALUE str) { return rb_str_enumerate_lines(argc, argv, str, 1); } |
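The entry for lines has no example, so here is a minimal sketch (illustrative variable names). Note that each element keeps its trailing separator:

```ruby
text = "one\ntwo\nthree"

by_newline = text.lines          # default separator is $/ ("\n" unless changed)
by_comma   = "a,b,c".lines(",")  # custom record separator

# lines is shorthand for each_line(...).to_a
same = (text.lines == text.each_line.to_a)

p by_newline, by_comma, same
```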
#ljust(integer, padstr = ' ') ⇒ String
If integer is greater than the length of str, returns a new String of length integer with str left justified and padded with padstr; otherwise, returns str.
"hello".ljust(4) #=> "hello"
"hello".ljust(20) #=> "hello               "
"hello".ljust(20, '1234') #=> "hello123412341234123"
# File 'string.c', line 7670 static VALUE rb_str_ljust(int argc, VALUE *argv, VALUE str) { return rb_str_justify(argc, argv, str, 'l'); } |
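One common use of ljust, together with its counterpart rjust documented later in this reference, is aligning columns of text. The sketch below uses made-up data purely for illustration:

```ruby
# Hypothetical rows: a label and a count.
rows = [["apple", 3], ["kiwi", 12]]

# Pad the label on the right with dots, and right-align the number in 4 columns.
table = rows.map { |name, count| name.ljust(8, ".") + count.to_s.rjust(4) }

puts table
```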
#lstrip ⇒ String
Returns a copy of str with leading whitespace removed. See also String#rstrip and String#strip.
" hello ".lstrip #=> "hello "
"hello".lstrip #=> "hello"
# File 'string.c', line 7136 static VALUE rb_str_lstrip(VALUE str) { str = rb_str_dup(str); rb_str_lstrip_bang(str); return str; } |
#lstrip! ⇒ self?
Removes leading whitespace from str, returning nil if no change was made. See also String#rstrip! and String#strip!.
" hello ".lstrip! #=> "hello "
"hello".lstrip! #=> nil
# File 'string.c', line 7095 static VALUE rb_str_lstrip_bang(VALUE str) { rb_encoding *enc; char *s, *t, *e; str_modify_keep_cr(str); enc = STR_ENC_GET(str); s = RSTRING_PTR(str); if (!s || RSTRING_LEN(str) == 0) return Qnil; e = t = RSTRING_END(str); /* remove spaces at head */ while (s < e) { int n; unsigned int cc = rb_enc_codepoint_len(s, e, &n, enc); if (!rb_isspace(cc)) break; s += n; } if (s > RSTRING_PTR(str)) { STR_SET_LEN(str, t-s); memmove(RSTRING_PTR(str), s, RSTRING_LEN(str)); RSTRING_PTR(str)[RSTRING_LEN(str)] = '\0'; return str; } return Qnil; } |
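Because lstrip! returns nil when nothing was removed, chaining on its result is risky. A minimal sketch of the difference (variable names are illustrative):

```ruby
s = "  hello".dup

first  = s.lstrip!   # whitespace removed: returns the receiver
second = s.lstrip!   # nothing left to remove: returns nil

# For chaining, the non-bang form is safer because it always returns a String.
clean = "  hi".lstrip.upcase

p first, second, clean
```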
#match(pattern) ⇒ MatchData? #match(pattern, pos) ⇒ MatchData?
Converts pattern to a Regexp (if it isn’t already one), then invokes its match method on str. If the second parameter is present, it specifies the position in the string to begin the search.
'hello'.match('(.)\1') #=> #<MatchData "ll" 1:"l">
'hello'.match('(.)\1')[0] #=> "ll"
'hello'.match(/(.)\1/)[0] #=> "ll"
'hello'.match('xx') #=> nil
If a block is given, it is invoked with the MatchData when the match succeeds, so that you can write
str.match(pat) {|m| ...}
instead of
if m = str.match(pat)
...
end
In this case, the return value is the value returned by the block.
# File 'string.c', line 3039 static VALUE rb_str_match_m(int argc, VALUE *argv, VALUE str) { VALUE re, result; if (argc < 1) rb_check_arity(argc, 1, 2); re = argv[0]; argv[0] = str; result = rb_funcall2(get_pat(re, 0), rb_intern("match"), argc, argv); if (!NIL_P(result) && rb_block_given_p()) { return rb_yield(result); } return result; } |
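The position argument and the block form of match can be seen side by side in this sketch (variable names are illustrative):

```ruby
# Start searching at character index 3, skipping the first "l" at index 2.
m = "hello".match(/l/, 3)
offset = m.begin(0)

# Block form: the block runs only on success and its value is returned.
doubled = "hello".match(/(.)\1/) { |md| md[0].upcase }

# On failure the block is not called and nil is returned.
missing = "hello".match(/xx/) { |md| md[0] }

p offset, doubled, missing
```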
#succ ⇒ String #next ⇒ String
Returns the successor to str. The successor is calculated by incrementing characters starting from the rightmost alphanumeric (or the rightmost character if there are no alphanumerics) in the string. Incrementing a digit always results in another digit, and incrementing a letter results in another letter of the same case. Incrementing nonalphanumerics uses the underlying character set’s collating sequence.
If the increment generates a “carry,” the character to the left of it is incremented. This process repeats until there is no carry, adding an additional character if necessary.
"abcd".succ #=> "abce"
"THX1138".succ #=> "THX1139"
"<<koala>>".succ #=> "<<koalb>>"
"1999zzz".succ #=> "2000aaa"
"ZZZ9999".succ #=> "AAAA0000"
"***".succ #=> "**+"
# File 'string.c', line 3256 VALUE rb_str_succ(VALUE orig) { rb_encoding *enc; VALUE str; char *sbeg, *s, *e, *last_alnum = 0; int c = -1; long l; char carry[ONIGENC_CODE_TO_MBC_MAXLEN] = "\1"; long carry_pos = 0, carry_len = 1; enum neighbor_char neighbor = NEIGHBOR_FOUND; str = rb_str_new5(orig, RSTRING_PTR(orig), RSTRING_LEN(orig)); rb_enc_cr_str_copy_for_substr(str, orig); OBJ_INFECT(str, orig); if (RSTRING_LEN(str) == 0) return str; enc = STR_ENC_GET(orig); sbeg = RSTRING_PTR(str); s = e = sbeg + RSTRING_LEN(str); while ((s = rb_enc_prev_char(sbeg, s, e, enc)) != 0) { if (neighbor == NEIGHBOR_NOT_CHAR && last_alnum) { if (ISALPHA(*last_alnum) ? ISDIGIT(*s) : ISDIGIT(*last_alnum) ? ISALPHA(*s) : 0) { s = last_alnum; break; } } l = rb_enc_precise_mbclen(s, e, enc); if (!ONIGENC_MBCLEN_CHARFOUND_P(l)) continue; l = ONIGENC_MBCLEN_CHARFOUND_LEN(l); neighbor = enc_succ_alnum_char(s, l, enc, carry); switch (neighbor) { case NEIGHBOR_NOT_CHAR: continue; case NEIGHBOR_FOUND: return str; case NEIGHBOR_WRAPPED: last_alnum = s; break; } c = 1; carry_pos = s - sbeg; carry_len = l; } if (c == -1) { /* str contains no alnum */ s = e; while ((s = rb_enc_prev_char(sbeg, s, e, enc)) != 0) { enum neighbor_char neighbor; char tmp[ONIGENC_CODE_TO_MBC_MAXLEN]; l = rb_enc_precise_mbclen(s, e, enc); if (!ONIGENC_MBCLEN_CHARFOUND_P(l)) continue; l = ONIGENC_MBCLEN_CHARFOUND_LEN(l); MEMCPY(tmp, s, char, l); neighbor = enc_succ_char(tmp, l, enc); switch (neighbor) { case NEIGHBOR_FOUND: MEMCPY(s, tmp, char, l); return str; break; case NEIGHBOR_WRAPPED: MEMCPY(s, tmp, char, l); break; case NEIGHBOR_NOT_CHAR: break; } if (rb_enc_precise_mbclen(s, s+l, enc) != l) { /* wrapped to \0...\0. search next valid char. 
*/ enc_succ_char(s, l, enc); } if (!rb_enc_asciicompat(enc)) { MEMCPY(carry, s, char, l); carry_len = l; } carry_pos = s - sbeg; } } RESIZE_CAPA(str, RSTRING_LEN(str) + carry_len); s = RSTRING_PTR(str) + carry_pos; memmove(s + carry_len, s, RSTRING_LEN(str) - carry_pos); memmove(s, carry, carry_len); STR_SET_LEN(str, RSTRING_LEN(str) + carry_len); RSTRING_PTR(str)[RSTRING_LEN(str)] = '\0'; rb_enc_str_coderange(str); return str; } |
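The carry rule for succ is easiest to see on short inputs. This sketch collects a few input/output pairs; the inputs are chosen only for illustration:

```ruby
inputs = ["az", "zz", "a9", "Zz"]

# Pair each input with its successor.
pairs = inputs.map { |s| [s, s.succ] }
pairs.each { |from, to| puts "#{from} -> #{to}" }

# succ returns a new String; the receiver is left untouched.
original = "az"
successor = original.succ
```

A carry out of the leftmost character adds a new character of the same kind (letter case or digit), which is why "zz" grows to three letters.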
#succ! ⇒ String #next! ⇒ String
Equivalent to String#succ, but modifies the receiver in place.
# File 'string.c', line 3354 static VALUE rb_str_succ_bang(VALUE str) { rb_str_shared_replace(str, rb_str_succ(str)); return str; } |
#oct ⇒ Integer
Treats leading characters of str as a string of octal digits (with an optional sign) and returns the corresponding number. Returns 0 if the conversion fails.
"123".oct #=> 83
"-377".oct #=> -255
"bad".oct #=> 0
"0377bad".oct #=> 255
# File 'string.c', line 7388 static VALUE rb_str_oct(VALUE str) { return rb_str_to_inum(str, -8, FALSE); } |
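Since oct only reads leading octal digits, parsing stops silently at the first character that is not one. A brief sketch with illustrative inputs:

```ruby
full    = "17".oct    # 1*8 + 7
signed  = "-17".oct   # optional sign is honoured
partial = "789".oct   # stops at "8", which is not an octal digit
none    = "9".oct     # no leading octal digit at all

p full, signed, partial, none
```

Because failure and a genuine zero both yield 0, callers that need to reject bad input may prefer a stricter conversion.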
#ord ⇒ Integer
Returns the Integer ordinal of a one-character string.
"a".ord #=> 97
# File 'string.c', line 7489 VALUE rb_str_ord(VALUE s) { unsigned int c; c = rb_enc_codepoint(RSTRING_PTR(s), RSTRING_END(s), STR_ENC_GET(s)); return UINT2NUM(c); } |
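A short sketch of ord on a few inputs (illustrative names). For a longer string only the first character is considered, and an empty string raises an error:

```ruby
ascii     = "a".ord
first     = "abc".ord       # only the first character is used
non_ascii = "\u00e9".ord    # code point of "é" in a UTF-8 string

# An empty string has no first character.
error = begin
  "".ord
rescue ArgumentError => e
  e.class
end

p ascii, first, non_ascii, error
```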
#partition(sep) ⇒ Array #partition(regexp) ⇒ Array
Searches the string for sep or a pattern (regexp) and returns the part before it, the match, and the part after it. If it is not found, returns str followed by two empty strings.
"hello".partition("l") #=> ["he", "l", "lo"]
"hello".partition("x") #=> ["hello", "", ""]
"hello".partition(/.l/) #=> ["h", "el", "lo"]
# File 'string.c', line 7731 static VALUE rb_str_partition(VALUE str, VALUE sep) { long pos; int regex = FALSE; if (RB_TYPE_P(sep, T_REGEXP)) { pos = rb_reg_search(sep, str, 0, 0); regex = TRUE; } else { VALUE tmp; tmp = rb_check_string_type(sep); if (NIL_P(tmp)) { rb_raise(rb_eTypeError, "type mismatch: %s given", rb_obj_classname(sep)); } sep = tmp; pos = rb_str_index(str, sep, 0); } if (pos < 0) { failed: return rb_ary_new3(3, str, str_new_empty(str), str_new_empty(str)); } if (regex) { sep = rb_str_subpat(str, sep, INT2FIX(0)); if (pos == 0 && RSTRING_LEN(sep) == 0) goto failed; } return rb_ary_new3(3, rb_str_subseq(str, 0, pos), sep, rb_str_subseq(str, pos+RSTRING_LEN(sep), RSTRING_LEN(str)-pos-RSTRING_LEN(sep))); } |
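partition splits at the first occurrence, while rpartition (documented later in this reference) splits at the last. The sketch below contrasts the two on a made-up key/value string:

```ruby
entry = "key=value=x"

head  = entry.partition("=")    # split at the first "="
tail  = entry.rpartition("=")   # split at the last "="
plain = "novalue".partition("=")

key, _sep, value = head
p head, tail, plain, key, value
```

Destructuring the three-element result, as in the last assignment, is a convenient way to pick out just the parts that are needed.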
#prepend(other_str) ⇒ String
Prepend—Prepend the given string to str.
a = "world"
a.prepend("hello ") #=> "hello world"
a #=> "hello world"
# File 'string.c', line 2412 static VALUE rb_str_prepend(VALUE str, VALUE str2) { StringValue(str2); StringValue(str); rb_str_update(str, 0L, 0L, str2); return str; } |
#replace(other_str) ⇒ String
Replaces the contents and taintedness of str with the corresponding values in other_str.
s = "hello" #=> "hello"
s.replace "world" #=> "world"
# File 'string.c', line 4313 VALUE rb_str_replace(VALUE str, VALUE str2) { str_modifiable(str); if (str == str2) return str; StringValue(str2); str_discard(str); return str_replace(str, str2); } |
#reverse ⇒ String
Returns a new string with the characters from str in reverse order.
"stressed".reverse #=> "desserts"
# File 'string.c', line 4538 static VALUE rb_str_reverse(VALUE str) { rb_encoding *enc; VALUE rev; char *s, *e, *p; int single = 1; if (RSTRING_LEN(str) <= 1) return rb_str_dup(str); enc = STR_ENC_GET(str); rev = rb_str_new5(str, 0, RSTRING_LEN(str)); s = RSTRING_PTR(str); e = RSTRING_END(str); p = RSTRING_END(rev); if (RSTRING_LEN(str) > 1) { if (single_byte_optimizable(str)) { while (s < e) { *--p = *s++; } } else if (ENC_CODERANGE(str) == ENC_CODERANGE_VALID) { while (s < e) { int clen = rb_enc_fast_mbclen(s, e, enc); if (clen > 1 || (*s & 0x80)) single = 0; p -= clen; memcpy(p, s, clen); s += clen; } } else { while (s < e) { int clen = rb_enc_mbclen(s, e, enc); if (clen > 1 || (*s & 0x80)) single = 0; p -= clen; memcpy(p, s, clen); s += clen; } } } STR_SET_LEN(rev, RSTRING_LEN(str)); OBJ_INFECT(rev, str); if (ENC_CODERANGE(str) == ENC_CODERANGE_UNKNOWN) { if (single) { ENC_CODERANGE_SET(str, ENC_CODERANGE_7BIT); } else { ENC_CODERANGE_SET(str, ENC_CODERANGE_VALID); } } rb_enc_cr_str_copy_for_substr(rev, str); return rev; } |
#reverse! ⇒ String
Reverses str in place.
# File 'string.c', line 4602 static VALUE rb_str_reverse_bang(VALUE str) { if (RSTRING_LEN(str) > 1) { if (single_byte_optimizable(str)) { char *s, *e, c; str_modify_keep_cr(str); s = RSTRING_PTR(str); e = RSTRING_END(str) - 1; while (s < e) { c = *s; *s++ = *e; *e-- = c; } } else { rb_str_shared_replace(str, rb_str_reverse(str)); } } else { str_modify_keep_cr(str); } return str; } |
#rindex(substring[, fixnum]) ⇒ Fixnum? #rindex(regexp[, fixnum]) ⇒ Fixnum?
Returns the index of the last occurrence of the given substring or pattern (regexp) in str. Returns nil if not found. If the second parameter is present, it specifies the position in the string to end the search; characters beyond this point will not be considered.
"hello".rindex('e') #=> 1
"hello".rindex('l') #=> 3
"hello".rindex('a') #=> nil
"hello".rindex(?e) #=> 1
"hello".rindex(/[aeiou]/, -2) #=> 1
# File 'string.c', line 2912 static VALUE rb_str_rindex_m(int argc, VALUE *argv, VALUE str) { VALUE sub; VALUE vpos; rb_encoding *enc = STR_ENC_GET(str); long pos, len = str_strlen(str, enc); if (rb_scan_args(argc, argv, "11", &sub, &vpos) == 2) { pos = NUM2LONG(vpos); if (pos < 0) { pos += len; if (pos < 0) { if (RB_TYPE_P(sub, T_REGEXP)) { rb_backref_set(Qnil); } return Qnil; } } if (pos > len) pos = len; } else { pos = len; } if (SPECIAL_CONST_P(sub)) goto generic; switch (BUILTIN_TYPE(sub)) { case T_REGEXP: /* enc = rb_get_check(str, sub); */ pos = str_offset(RSTRING_PTR(str), RSTRING_END(str), pos, STR_ENC_GET(str), single_byte_optimizable(str)); if (!RREGEXP(sub)->ptr || RREGEXP_SRC_LEN(sub)) { pos = rb_reg_search(sub, str, pos, 1); pos = rb_str_sublen(str, pos); } if (pos >= 0) return LONG2NUM(pos); break; generic: default: { VALUE tmp; tmp = rb_check_string_type(sub); if (NIL_P(tmp)) { rb_raise(rb_eTypeError, "type mismatch: %s given", rb_obj_classname(sub)); } sub = tmp; } /* fall through */ case T_STRING: pos = rb_str_rindex(str, sub, pos); if (pos >= 0) return LONG2NUM(pos); break; } return Qnil; } |
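The effect of the optional end position on rindex is sketched below (variable names are illustrative). A negative position is counted from the end of the string:

```ruby
word = "hello"

last      = word.rindex("l")       # both "l"s considered
bounded   = word.rindex("l", 2)    # matches must start at index 2 or earlier
from_end  = word.rindex("l", -3)   # -3 is the same position as 2 here
too_early = word.rindex("l", 1)    # no "l" starts at index 0 or 1

p last, bounded, from_end, too_early
```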
#rjust(integer, padstr = ' ') ⇒ String
If integer is greater than the length of str, returns a new String of length integer with str right justified and padded with padstr; otherwise, returns str.
"hello".rjust(4) #=> "hello"
"hello".rjust(20) #=> "               hello"
"hello".rjust(20, '1234') #=> "123412341234123hello"
# File 'string.c', line 7690 static VALUE rb_str_rjust(int argc, VALUE *argv, VALUE str) { return rb_str_justify(argc, argv, str, 'r'); } |
#rpartition(sep) ⇒ Array #rpartition(regexp) ⇒ Array
Searches the string for sep or a pattern (regexp), starting from the end of the string, and returns the part before it, the match, and the part after it. If it is not found, returns two empty strings followed by str.
"hello".rpartition("l") #=> ["hel", "l", "o"]
"hello".rpartition("x") #=> ["", "", "hello"]
"hello".rpartition(/.l/) #=> ["he", "ll", "o"]
# File 'string.c', line 7781 static VALUE rb_str_rpartition(VALUE str, VALUE sep) { long pos = RSTRING_LEN(str); int regex = FALSE; if (RB_TYPE_P(sep, T_REGEXP)) { pos = rb_reg_search(sep, str, pos, 1); regex = TRUE; } else { VALUE tmp; tmp = rb_check_string_type(sep); if (NIL_P(tmp)) { rb_raise(rb_eTypeError, "type mismatch: %s given", rb_obj_classname(sep)); } sep = tmp; pos = rb_str_sublen(str, pos); pos = rb_str_rindex(str, sep, pos); } if (pos < 0) { return rb_ary_new3(3, str_new_empty(str), str_new_empty(str), str); } if (regex) { sep = rb_reg_nth_match(0, rb_backref_get()); } else { pos = rb_str_offset(str, pos); } return rb_ary_new3(3, rb_str_subseq(str, 0, pos), sep, rb_str_subseq(str, pos+RSTRING_LEN(sep), RSTRING_LEN(str)-pos-RSTRING_LEN(sep))); } |
#rstrip ⇒ String
Returns a copy of str with trailing whitespace removed. See also String#lstrip and String#strip.
" hello ".rstrip #=> " hello"
"hello".rstrip #=> "hello"
# File 'string.c', line 7206 static VALUE rb_str_rstrip(VALUE str) { str = rb_str_dup(str); rb_str_rstrip_bang(str); return str; } |
#rstrip! ⇒ self?
Removes trailing whitespace from str, returning nil if no change was made. See also String#lstrip! and String#strip!.
" hello ".rstrip! #=> " hello"
"hello".rstrip! #=> nil
# File 'string.c', line 7157 static VALUE rb_str_rstrip_bang(VALUE str) { rb_encoding *enc; char *s, *t, *e; str_modify_keep_cr(str); enc = STR_ENC_GET(str); rb_str_check_dummy_enc(enc); s = RSTRING_PTR(str); if (!s || RSTRING_LEN(str) == 0) return Qnil; t = e = RSTRING_END(str); /* remove trailing spaces or '\0's */ if (single_byte_optimizable(str)) { unsigned char c; while (s < t && ((c = *(t-1)) == '\0' || ascii_isspace(c))) t--; } else { char *tp; while ((tp = rb_enc_prev_char(s, t, e, enc)) != NULL) { unsigned int c = rb_enc_codepoint(tp, e, enc); if (c && !rb_isspace(c)) break; t = tp; } } if (t < e) { long len = t-RSTRING_PTR(str); STR_SET_LEN(str, len); RSTRING_PTR(str)[len] = '\0'; return str; } return Qnil; } |
#scan(pattern) ⇒ Array #scan(pattern) {|match, ...| ... } ⇒ String
Both forms iterate through str, matching the pattern (which may be a Regexp or a String). For each match, a result is generated and either added to the result array or passed to the block. If the pattern contains no groups, each individual result consists of the matched string, $&. If the pattern contains groups, each individual result is itself an array containing one entry per group.
a = "cruel world"
a.scan(/\w+/) #=> ["cruel", "world"]
a.scan(/.../) #=> ["cru", "el ", "wor"]
a.scan(/(...)/) #=> [["cru"], ["el "], ["wor"]]
a.scan(/(..)(..)/) #=> [["cr", "ue"], ["l ", "wo"]]
And the block form:
a.scan(/\w+/) {|w| print "<<#{w}>> " }
print "\n"
a.scan(/(.)(.)/) {|x,y| print y, x }
print "\n"
produces:
<<cruel>> <<world>>
rceu lowlr
# File 'string.c', line 7321 static VALUE rb_str_scan(VALUE str, VALUE pat) { VALUE result; long start = 0; long last = -1, prev = 0; char *p = RSTRING_PTR(str); long len = RSTRING_LEN(str); pat = get_pat(pat, 1); if (!rb_block_given_p()) { VALUE ary = rb_ary_new(); while (!NIL_P(result = scan_once(str, pat, &start))) { last = prev; prev = start; rb_ary_push(ary, result); } if (last >= 0) rb_reg_search(pat, str, last, 0); return ary; } while (!NIL_P(result = scan_once(str, pat, &start))) { last = prev; prev = start; rb_yield(result); str_mod_check(str, p, len); } if (last >= 0) rb_reg_search(pat, str, last, 0); return str; } |
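When the pattern has groups, scan yields one array per match, which makes it handy for pulling out pairs. The sketch below uses a made-up settings string; converting the pairs with `to_h` is one possible follow-up, not something this reference prescribes:

```ruby
source = "a=1, b=22"

# Two groups per match, so each result is a two-element array.
pairs = source.scan(/(\w+)=(\d+)/)
settings = pairs.to_h

# Block form: results are passed to the block and the receiver is returned.
words = []
returned = source.scan(/\w+/) { |w| words << w }

p pairs, settings, words
```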
#scrub ⇒ String #scrub(repl) ⇒ String #scrub {|bytes| ... } ⇒ String
If str contains invalid byte sequences, returns a copy in which the invalid bytes are replaced with the given replacement character; otherwise returns a copy of str. If a block is given, the invalid bytes are replaced with the value returned by the block.
"abc\u3042\x81".scrub #=> "abc\u3042\uFFFD"
"abc\u3042\x81".scrub("*") #=> "abc\u3042*"
"abc\u3042\xE3\x80".scrub{|bytes| '<'+bytes.unpack('H*')[0]+'>' } #=> "abc\u3042<e380>"
# File 'string.c', line 8289 static VALUE str_scrub(int argc, VALUE *argv, VALUE str) { VALUE repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil; VALUE new = rb_str_scrub(str, repl); return NIL_P(new) ? rb_str_dup(str): new; } |
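A typical use of scrub is cleaning input before further processing. This sketch assumes a UTF-8 source file, so the literal below is a UTF-8 string containing one invalid byte; the variable names are illustrative:

```ruby
broken = "abc\xFF"                 # \xFF can never appear in valid UTF-8
was_valid = broken.valid_encoding?

fixed = broken.scrub("?")          # replace each invalid sequence with "?"
now_valid = fixed.valid_encoding?

# Block form: build the replacement from the offending bytes.
hexed = broken.scrub { |bytes| "<#{bytes.unpack1('H*')}>" }

p was_valid, fixed, now_valid, hexed
```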
#scrub! ⇒ String #scrub!(repl) ⇒ String #scrub! {|bytes| ... } ⇒ String
If str contains invalid byte sequences, replaces the invalid bytes in place with the given replacement character and returns str; otherwise returns str unchanged. If a block is given, the invalid bytes are replaced with the value returned by the block.
"abc\u3042\x81".scrub! #=> "abc\u3042\uFFFD"
"abc\u3042\x81".scrub!("*") #=> "abc\u3042*"
"abc\u3042\xE3\x80".scrub!{|bytes| '<'+bytes.unpack('H*')[0]+'>' } #=> "abc\u3042<e380>"
# File 'string.c', line 8311 static VALUE str_scrub_bang(int argc, VALUE *argv, VALUE str) { VALUE repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil; VALUE new = rb_str_scrub(str, repl); if (!NIL_P(new)) rb_str_replace(str, new); return str; } |
#setbyte(index, integer) ⇒ Integer
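Sets the byte at position index to integer and returns integer. A negative index counts from the end of the string; an index outside the string raises IndexError.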
# File 'string.c', line 4389 static VALUE rb_str_setbyte(VALUE str, VALUE index, VALUE value) { long pos = NUM2LONG(index); int byte = NUM2INT(value); rb_str_modify(str); if (pos < -RSTRING_LEN(str) || RSTRING_LEN(str) <= pos) rb_raise(rb_eIndexError, "index %ld out of string", pos); if (pos < 0) pos += RSTRING_LEN(str); RSTRING_PTR(str)[pos] = byte; return value; } |
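The entry for setbyte has no example, so here is a minimal sketch based on the source shown above (variable names are illustrative):

```ruby
s = "hello".dup

returned = s.setbyte(0, 72)   # 72 is the byte for "H"
s.setbyte(-1, 79)             # negative index counts from the end; 79 is "O"

# An index outside the string raises IndexError.
error = begin
  s.setbyte(5, 0)
rescue IndexError => e
  e.class
end

p s, returned, error
```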
#[](index) ⇒ String? #[](start, length) ⇒ String? #[](range) ⇒ String? #[](regexp) ⇒ String? #[](regexp, capture) ⇒ String? #[](match_str) ⇒ String? #slice(index) ⇒ String? #slice(start, length) ⇒ String? #slice(range) ⇒ String? #slice(regexp) ⇒ String? #slice(regexp, capture) ⇒ String? #slice(match_str) ⇒ String?
Element Reference — If passed a single index
, returns a substring of one character at that index. If passed a start index and a length, returns a substring containing length characters starting at the start index. If passed a range, its beginning and end are interpreted as offsets delimiting the substring to be returned.
In these three cases, if an index is negative, it is counted from the end of the string. For the start and range cases, the starting index is just before a character, so an index equal to the string’s size is also accepted. Additionally, an empty string is returned when the starting index for a character range is at the end of the string.
Returns nil if the initial index falls outside the string or the length is negative.
If a Regexp is supplied, the matching portion of the string is returned. If a capture follows the regular expression, which may be a capture group index or name, that component of the MatchData is returned instead.
If a match_str is given, that string is returned if it occurs in the string.
Returns nil if the regular expression does not match or the match string cannot be found.
a = "hello there"
a[1] #=> "e"
a[2, 3] #=> "llo"
a[2..3] #=> "ll"
a[-3, 2] #=> "er"
a[7..-2] #=> "her"
a[-4..-2] #=> "her"
a[-2..-4] #=> ""
a[11, 0] #=> ""
a[11] #=> nil
a[12, 0] #=> nil
a[12..-1] #=> nil
a[/[aeiou](.)\1/] #=> "ell"
a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil
a[/(?<vowel>[aeiou])(?<non_vowel>[^aeiou])/, "non_vowel"] #=> "l"
a[/(?<vowel>[aeiou])(?<non_vowel>[^aeiou])/, "vowel"] #=> "e"
a["lo"] #=> "lo"
a["bye"] #=> nil
# File 'string.c', line 3621 static VALUE rb_str_aref_m(int argc, VALUE *argv, VALUE str) { if (argc == 2) { if (RB_TYPE_P(argv[0], T_REGEXP)) { return rb_str_subpat(str, argv[0], argv[1]); } return rb_str_substr(str, NUM2LONG(argv[0]), NUM2LONG(argv[1])); } rb_check_arity(argc, 1, 2); return rb_str_aref(str, argv[0]); } |
#slice!(fixnum) ⇒ String? #slice!(fixnum, fixnum) ⇒ String? #slice!(range) ⇒ String? #slice!(regexp) ⇒ String? #slice!(other_str) ⇒ String?
Deletes the specified portion from str, and returns the portion deleted.
string = "this is a string"
string.slice!(2) #=> "i"
string.slice!(3..6) #=> " is "
string.slice!(/s.*t/) #=> "sa st"
string.slice!("r") #=> "r"
string #=> "thing"
# File 'string.c', line 3917 static VALUE rb_str_slice_bang(int argc, VALUE *argv, VALUE str) { VALUE result; VALUE buf[3]; int i; rb_check_arity(argc, 1, 2); for (i=0; i<argc; i++) { buf[i] = argv[i]; } str_modify_keep_cr(str); result = rb_str_aref_m(argc, buf, str); if (!NIL_P(result)) { buf[i] = rb_str_new(0,0); rb_str_aset_m(argc+1, buf, str); } return result; } |
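Because slice! both removes and returns the matched portion, it can peel a prefix off a string in one step. The log line below is made up for illustration:

```ruby
line = "ERROR: disk full".dup

# Remove the leading "LEVEL: " prefix and keep it.
level = line.slice!(/\A\w+: /)

# When nothing matches, nil is returned and the receiver is unchanged.
missing = line.slice!("xyz")

p level, line, missing
```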
#split(pattern = $;, [limit]) ⇒ Array
Divides str into substrings based on a delimiter, returning an array of these substrings.
If pattern is a String, then its contents are used as the delimiter when splitting str. If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.
If pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern matches a zero-length string, str is split into individual characters. If pattern contains groups, the respective matches will be returned in the array as well.
If pattern is omitted, the value of $; is used. If $; is nil (which is the default), str is split on whitespace as if ' ' were specified.
If the limit parameter is omitted, trailing null fields are suppressed. If limit is a positive number, at most that number of fields will be returned (if limit is 1, the entire string is returned as the only entry in an array). If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed.
When the input str is empty, an empty Array is returned, as the string is considered to have no fields to split.
" now's  the time".split #=> ["now's", "the", "time"]
" now's  the time".split(' ') #=> ["now's", "the", "time"]
" now's  the time".split(/ /) #=> ["", "now's", "", "the", "time"]
"1, 2.34,56, 7".split(%r{,\s*}) #=> ["1", "2.34", "56", "7"]
"hello".split(//) #=> ["h", "e", "l", "l", "o"]
"hello".split(//, 3) #=> ["h", "e", "llo"]
"hi mom".split(%r{\s*}) #=> ["h", "i", "m", "o", "m"]
"mellow yellow".split("ello") #=> ["m", "w y", "w"]
"1,2,,3,4,,".split(',') #=> ["1", "2", "", "3", "4"]
"1,2,,3,4,,".split(',', 4) #=> ["1", "2", "", "3,4,,"]
"1,2,,3,4,,".split(',', -4) #=> ["1", "2", "", "3", "4", "", ""]
"".split(',', -1) #=> []
# File 'string.c', line 6195 static VALUE rb_str_split_m(int argc, VALUE *argv, VALUE str) { rb_encoding *enc; VALUE spat; VALUE limit; enum {awk, string, regexp} split_type; long beg, end, i = 0; int lim = 0; VALUE result, tmp; if (rb_scan_args(argc, argv, "02", &spat, &limit) == 2) { lim = NUM2INT(limit); if (lim <= 0) limit = Qnil; else if (lim == 1) { if (RSTRING_LEN(str) == 0) return rb_ary_new2(0); return rb_ary_new3(1, str); } i = 1; } enc = STR_ENC_GET(str); if (NIL_P(spat)) { if (!NIL_P(rb_fs)) { spat = rb_fs; goto fs_set; } split_type = awk; } else { fs_set: if (RB_TYPE_P(spat, T_STRING)) { rb_encoding *enc2 = STR_ENC_GET(spat); split_type = string; if (RSTRING_LEN(spat) == 0) { /* Special case - split into chars */ spat = rb_reg_regcomp(spat); split_type = regexp; } else if (rb_enc_asciicompat(enc2) == 1) { if (RSTRING_LEN(spat) == 1 && RSTRING_PTR(spat)[0] == ' '){ split_type = awk; } } else { int l; if (rb_enc_ascget(RSTRING_PTR(spat), RSTRING_END(spat), &l, enc2) == ' ' && RSTRING_LEN(spat) == l) { split_type = awk; } } } else { spat = get_pat(spat, 1); split_type = regexp; } } result = rb_ary_new(); beg = 0; if (split_type == awk) { char *ptr = RSTRING_PTR(str); char *eptr = RSTRING_END(str); char *bptr = ptr; int skip = 1; unsigned int c; end = beg; if (is_ascii_string(str)) { while (ptr < eptr) { c = (unsigned char)*ptr++; if (skip) { if (ascii_isspace(c)) { beg = ptr - bptr; } else { end = ptr - bptr; skip = 0; if (!NIL_P(limit) && lim <= i) break; } } else if (ascii_isspace(c)) { rb_ary_push(result, rb_str_subseq(str, beg, end-beg)); skip = 1; beg = ptr - bptr; if (!NIL_P(limit)) ++i; } else { end = ptr - bptr; } } } else { while (ptr < eptr) { int n; c = rb_enc_codepoint_len(ptr, eptr, &n, enc); ptr += n; if (skip) { if (rb_isspace(c)) { beg = ptr - bptr; } else { end = ptr - bptr; skip = 0; if (!NIL_P(limit) && lim <= i) break; } } else if (rb_isspace(c)) { rb_ary_push(result, rb_str_subseq(str, beg, end-beg)); skip = 1; beg = ptr - bptr; if 
(!NIL_P(limit)) ++i; } else { end = ptr - bptr; } } } } else if (split_type == string) { char *ptr = RSTRING_PTR(str); char *temp = ptr; char *eptr = RSTRING_END(str); char *sptr = RSTRING_PTR(spat); long slen = RSTRING_LEN(spat); if (is_broken_string(str)) { rb_raise(rb_eArgError, "invalid byte sequence in %s", rb_enc_name(STR_ENC_GET(str))); } if (is_broken_string(spat)) { rb_raise(rb_eArgError, "invalid byte sequence in %s", rb_enc_name(STR_ENC_GET(spat))); } enc = rb_enc_check(str, spat); while (ptr < eptr && (end = rb_memsearch(sptr, slen, ptr, eptr - ptr, enc)) >= 0) { /* Check we are at the start of a char */ char *t = rb_enc_right_char_head(ptr, ptr + end, eptr, enc); if (t != ptr + end) { ptr = t; continue; } rb_ary_push(result, rb_str_subseq(str, ptr - temp, end)); ptr += end + slen; if (!NIL_P(limit) && lim <= ++i) break; } beg = ptr - temp; } else { char *ptr = RSTRING_PTR(str); long len = RSTRING_LEN(str); long start = beg; long idx; int last_null = 0; struct re_registers *regs; while ((end = rb_reg_search(spat, str, start, 0)) >= 0) { regs = RMATCH_REGS(rb_backref_get()); if (start == end && BEG(0) == END(0)) { if (!ptr) { rb_ary_push(result, str_new_empty(str)); break; } else if (last_null == 1) { rb_ary_push(result, rb_str_subseq(str, beg, rb_enc_fast_mbclen(ptr+beg, ptr+len, enc))); beg = start; } else { if (ptr+start == ptr+len) start++; else start += rb_enc_fast_mbclen(ptr+start,ptr+len,enc); last_null = 1; continue; } } else { rb_ary_push(result, rb_str_subseq(str, beg, end-beg)); beg = start = END(0); } last_null = 0; for (idx=1; idx < regs->num_regs; idx++) { if (BEG(idx) == -1) continue; if (BEG(idx) == END(idx)) tmp = str_new_empty(str); else tmp = rb_str_subseq(str, BEG(idx), END(idx)-BEG(idx)); rb_ary_push(result, tmp); } if (!NIL_P(limit) && lim <= ++i) break; } } if (RSTRING_LEN(str) > 0 && (!NIL_P(limit) || RSTRING_LEN(str) > beg || lim < 0)) { if (RSTRING_LEN(str) == beg) tmp = str_new_empty(str); else tmp = rb_str_subseq(str, beg, 
RSTRING_LEN(str)-beg); rb_ary_push(result, tmp); } if (NIL_P(limit) && lim == 0) { long len; while ((len = RARRAY_LEN(result)) > 0 && (tmp = RARRAY_AREF(result, len-1), RSTRING_LEN(tmp) == 0)) rb_ary_pop(result); } return result; } |
#squeeze([other_str]) ⇒ String
Builds a set of characters from the other_str parameter(s) using the procedure described for String#count. Returns a new string where runs of the same character that occur in this set are replaced by a single character. If no arguments are given, all runs of identical characters are replaced by a single character.
"yellow moon".squeeze                  #=> "yelow mon"
"  now   is  the".squeeze(" ")         #=> " now is the"
"putters shoot balls".squeeze("m-z")   #=> "puters shot balls"
# File 'string.c', line 5984

static VALUE
rb_str_squeeze(int argc, VALUE *argv, VALUE str)
{
    str = rb_str_dup(str);
    rb_str_squeeze_bang(argc, argv, str);
    return str;
}
#squeeze!([other_str]) ⇒ String?
Squeezes str in place, returning either str, or nil if no changes were made.
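A short sketch of the two possible return values, in the style of the String#squeeze examples. The variable names are illustrative only, and dup is used so that the receiver is an unfrozen copy.

```ruby
s = "yellow moon".dup
s.squeeze!          #=> "yelow mon"
s                   #=> "yelow mon" (the receiver itself was changed)

t = "abc".dup
t.squeeze!          #=> nil (no runs to squeeze, so nothing changed)
t                   #=> "abc"

u = "aaa  bbb".dup
u.squeeze!("a")     #=> "a  bbb" (only runs of "a" are squeezed)
```

Because nil is returned when nothing changes, the result of squeeze! is best not chained; use String#squeeze when a string is always needed.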
# File 'string.c', line 5894 static VALUE rb_str_squeeze_bang(int argc, VALUE *argv, VALUE str) { char squeez[TR_TABLE_SIZE]; rb_encoding *enc = 0; VALUE del = 0, nodel = 0; char *s, *send, *t; int i, modify = 0; int ascompat, singlebyte = single_byte_optimizable(str); unsigned int save; if (argc == 0) { enc = STR_ENC_GET(str); } else { for (i=0; i<argc; i++) { VALUE s = argv[i]; StringValue(s); enc = rb_enc_check(str, s); if (singlebyte && !single_byte_optimizable(s)) singlebyte = 0; tr_setup_table(s, squeez, i==0, &del, &nodel, enc); } } str_modify_keep_cr(str); s = t = RSTRING_PTR(str); if (!s || RSTRING_LEN(str) == 0) return Qnil; send = RSTRING_END(str); save = -1; ascompat = rb_enc_asciicompat(enc); if (singlebyte) { while (s < send) { unsigned int c = *(unsigned char*)s++; if (c != save || (argc > 0 && !squeez[c])) { *t++ = save = c; } } } else { while (s < send) { unsigned int c; int clen; if (ascompat && (c = *(unsigned char*)s) < 0x80) { if (c != save || (argc > 0 && !squeez[c])) { *t++ = save = c; } s++; } else { c = rb_enc_codepoint_len(s, send, &clen, enc); if (c != save || (argc > 0 && !tr_find(c, squeez, del, nodel))) { if (t != s) rb_enc_mbcput(c, t, enc); save = c; t += clen; } s += clen; } } } *t = '\0'; if (t - RSTRING_PTR(str) != RSTRING_LEN(str)) { STR_SET_LEN(str, t - RSTRING_PTR(str)); modify = 1; } if (modify) return str; return Qnil; } |
#start_with?([prefixes]) ⇒ Boolean
Returns true if str starts with one of the prefixes given.
"hello".start_with?("hell") #=> true
# returns true if one of the prefixes matches.
"hello".start_with?("heaven", "hell") #=> true
"hello".start_with?("heaven", "paradise") #=> false
# File 'string.c', line 7831

static VALUE
rb_str_start_with(int argc, VALUE *argv, VALUE str)
{
    int i;

    for (i=0; i<argc; i++) {
        VALUE tmp = argv[i];
        StringValue(tmp);
        rb_enc_check(str, tmp);
        if (RSTRING_LEN(str) < RSTRING_LEN(tmp)) continue;
        if (memcmp(RSTRING_PTR(str), RSTRING_PTR(tmp), RSTRING_LEN(tmp)) == 0)
            return Qtrue;
    }
    return Qfalse;
}
#strip ⇒ String
Returns a copy of str with leading and trailing whitespace removed.
" hello ".strip #=> "hello"
"\tgoodbye\r\n".strip #=> "goodbye"
# File 'string.c', line 7244

static VALUE
rb_str_strip(VALUE str)
{
    str = rb_str_dup(str);
    rb_str_strip_bang(str);
    return str;
}
#strip! ⇒ String?
Removes leading and trailing whitespace from str. Returns nil if str was not altered.
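A brief illustration of the nil return value; the variable names are illustrative, and dup is used only to obtain an unfrozen receiver.

```ruby
s = "  hello  ".dup
s.strip!         #=> "hello"
s                #=> "hello"

t = "hello".dup
t.strip!         #=> nil (nothing to remove)
t                #=> "hello"

# Since strip! may return nil, call it first and then use the receiver:
name = "  Ruby ".dup
name.strip!
name.upcase      #=> "RUBY"
```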
# File 'string.c', line 7223

static VALUE
rb_str_strip_bang(VALUE str)
{
    VALUE l = rb_str_lstrip_bang(str);
    VALUE r = rb_str_rstrip_bang(str);

    if (NIL_P(l) && NIL_P(r)) return Qnil;
    return str;
}
#sub(pattern, replacement) ⇒ String
#sub(pattern, hash) ⇒ String
#sub(pattern) {|match| ... } ⇒ String
Returns a copy of str with the first occurrence of pattern replaced by the second argument. The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\\d' will match a backslash followed by ‘d’, instead of a digit.
If replacement is a String it will be substituted for the matched text. It may contain back-references to the pattern’s capture groups of the form "\d", where d is a group number, or "\k<n>", where n is a group name. If it is a double-quoted string, both back-references must be preceded by an additional backslash. However, within replacement the special match variables, such as $&, will not refer to the current match.
If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string.
In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
The result inherits any tainting in the original string or any supplied replacement string.
"hello".sub(/[aeiou]/, '*') #=> "h*llo"
"hello".sub(/([aeiou])/, '<\1>') #=> "h<e>llo"
"hello".sub(/./) {|s| s.ord.to_s + ' ' } #=> "104 ello"
"hello".sub(/(?<foo>[aeiou])/, '*\k<foo>*') #=> "h*e*llo"
'Is SHELL your preferred shell?'.sub(/[[:upper:]]{2,}/, ENV)
#=> "Is /bin/bash your preferred shell?"
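A few further sketches of the rules described above (quoting of back-references, literal String patterns, the Hash form and the block form). The inputs are made up for illustration.

```ruby
# Single-quoted replacement: one backslash is enough for a back-reference.
"hello".sub(/(l+)/, '[\1]')          #=> "he[ll]o"
# Double-quoted replacement: the backslash itself must be escaped.
"hello".sub(/(l+)/, "[\\1]")         #=> "he[ll]o"
# A String pattern is matched literally, so "." is not a wildcard here.
"a.b.c".sub(".", "-")                #=> "a-b.c"
# Hash form: the matched text is looked up as a key.
"cat".sub(/[a-z]/, "c" => "b")       #=> "bat"
# Block form: $1 refers to the current match while the block runs.
"hello".sub(/(h)/) { $1.upcase }     #=> "Hello"
```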
# File 'string.c', line 4111

static VALUE
rb_str_sub(int argc, VALUE *argv, VALUE str)
{
    str = rb_str_dup(str);
    rb_str_sub_bang(argc, argv, str);
    return str;
}
#sub!(pattern, replacement) ⇒ String?
#sub!(pattern) {|match| ... } ⇒ String?
Performs the same substitution as String#sub in-place.
Returns str if a substitution was performed or nil if no substitution was performed.
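A minimal sketch of both outcomes, assuming an unfrozen receiver (hence the dup); the variable name is illustrative.

```ruby
s = "hello".dup
s.sub!(/l/, "L")     #=> "heLlo"
s                    #=> "heLlo" (the receiver was modified)

s.sub!(/z/, "Z")     #=> nil (no match, so s is left unchanged)
s                    #=> "heLlo"
```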
# File 'string.c', line 3976 static VALUE rb_str_sub_bang(int argc, VALUE *argv, VALUE str) { VALUE pat, repl, hash = Qnil; int iter = 0; int tainted = 0; long plen; int min_arity = rb_block_given_p() ? 1 : 2; rb_check_arity(argc, min_arity, 2); if (argc == 1) { iter = 1; } else { repl = argv[1]; hash = rb_check_hash_type(argv[1]); if (NIL_P(hash)) { StringValue(repl); } if (OBJ_TAINTED(repl)) tainted = 1; } pat = get_pat(argv[0], 1); str_modifiable(str); if (rb_reg_search(pat, str, 0, 0) >= 0) { rb_encoding *enc; int cr = ENC_CODERANGE(str); VALUE match = rb_backref_get(); struct re_registers *regs = RMATCH_REGS(match); long beg0 = BEG(0); long end0 = END(0); char *p, *rp; long len, rlen; if (iter || !NIL_P(hash)) { p = RSTRING_PTR(str); len = RSTRING_LEN(str); if (iter) { repl = rb_obj_as_string(rb_yield(rb_reg_nth_match(0, match))); } else { repl = rb_hash_aref(hash, rb_str_subseq(str, beg0, end0 - beg0)); repl = rb_obj_as_string(repl); } str_mod_check(str, p, len); rb_check_frozen(str); } else { repl = rb_reg_regsub(repl, str, regs, pat); } enc = rb_enc_compatible(str, repl); if (!enc) { rb_encoding *str_enc = STR_ENC_GET(str); p = RSTRING_PTR(str); len = RSTRING_LEN(str); if (coderange_scan(p, beg0, str_enc) != ENC_CODERANGE_7BIT || coderange_scan(p+end0, len-end0, str_enc) != ENC_CODERANGE_7BIT) { rb_raise(rb_eEncCompatError, "incompatible character encodings: %s and %s", rb_enc_name(str_enc), rb_enc_name(STR_ENC_GET(repl))); } enc = STR_ENC_GET(repl); } rb_str_modify(str); rb_enc_associate(str, enc); if (OBJ_TAINTED(repl)) tainted = 1; if (ENC_CODERANGE_UNKNOWN < cr && cr < ENC_CODERANGE_BROKEN) { int cr2 = ENC_CODERANGE(repl); if (cr2 == ENC_CODERANGE_BROKEN || (cr == ENC_CODERANGE_VALID && cr2 == ENC_CODERANGE_7BIT)) cr = ENC_CODERANGE_UNKNOWN; else cr = cr2; } plen = end0 - beg0; rp = RSTRING_PTR(repl); rlen = RSTRING_LEN(repl); len = RSTRING_LEN(str); if (rlen > plen) { RESIZE_CAPA(str, len + rlen - plen); } p = RSTRING_PTR(str); if (rlen != plen) { 
memmove(p + beg0 + rlen, p + beg0 + plen, len - beg0 - plen); } memcpy(p + beg0, rp, rlen); len += rlen - plen; STR_SET_LEN(str, len); RSTRING_PTR(str)[len] = '\0'; ENC_CODERANGE_SET(str, cr); if (tainted) OBJ_TAINT(str); return str; } return Qnil; } |
#succ ⇒ String
#next ⇒ String
Returns the successor to str. The successor is calculated by incrementing characters starting from the rightmost alphanumeric (or the rightmost character if there are no alphanumerics) in the string. Incrementing a digit always results in another digit, and incrementing a letter results in another letter of the same case. Incrementing nonalphanumerics uses the underlying character set’s collating sequence.
If the increment generates a “carry,” the character to the left of it is incremented. This process repeats until there is no carry, adding an additional character if necessary.
"abcd".succ #=> "abce"
"THX1138".succ #=> "THX1139"
"<<koala>>".succ #=> "<<koalb>>"
"1999zzz".succ #=> "2000aaa"
"ZZZ9999".succ #=> "AAAA0000"
"***".succ #=> "**+"
# File 'string.c', line 3256 VALUE rb_str_succ(VALUE orig) { rb_encoding *enc; VALUE str; char *sbeg, *s, *e, *last_alnum = 0; int c = -1; long l; char carry[ONIGENC_CODE_TO_MBC_MAXLEN] = "\1"; long carry_pos = 0, carry_len = 1; enum neighbor_char neighbor = NEIGHBOR_FOUND; str = rb_str_new5(orig, RSTRING_PTR(orig), RSTRING_LEN(orig)); rb_enc_cr_str_copy_for_substr(str, orig); OBJ_INFECT(str, orig); if (RSTRING_LEN(str) == 0) return str; enc = STR_ENC_GET(orig); sbeg = RSTRING_PTR(str); s = e = sbeg + RSTRING_LEN(str); while ((s = rb_enc_prev_char(sbeg, s, e, enc)) != 0) { if (neighbor == NEIGHBOR_NOT_CHAR && last_alnum) { if (ISALPHA(*last_alnum) ? ISDIGIT(*s) : ISDIGIT(*last_alnum) ? ISALPHA(*s) : 0) { s = last_alnum; break; } } l = rb_enc_precise_mbclen(s, e, enc); if (!ONIGENC_MBCLEN_CHARFOUND_P(l)) continue; l = ONIGENC_MBCLEN_CHARFOUND_LEN(l); neighbor = enc_succ_alnum_char(s, l, enc, carry); switch (neighbor) { case NEIGHBOR_NOT_CHAR: continue; case NEIGHBOR_FOUND: return str; case NEIGHBOR_WRAPPED: last_alnum = s; break; } c = 1; carry_pos = s - sbeg; carry_len = l; } if (c == -1) { /* str contains no alnum */ s = e; while ((s = rb_enc_prev_char(sbeg, s, e, enc)) != 0) { enum neighbor_char neighbor; char tmp[ONIGENC_CODE_TO_MBC_MAXLEN]; l = rb_enc_precise_mbclen(s, e, enc); if (!ONIGENC_MBCLEN_CHARFOUND_P(l)) continue; l = ONIGENC_MBCLEN_CHARFOUND_LEN(l); MEMCPY(tmp, s, char, l); neighbor = enc_succ_char(tmp, l, enc); switch (neighbor) { case NEIGHBOR_FOUND: MEMCPY(s, tmp, char, l); return str; break; case NEIGHBOR_WRAPPED: MEMCPY(s, tmp, char, l); break; case NEIGHBOR_NOT_CHAR: break; } if (rb_enc_precise_mbclen(s, s+l, enc) != l) { /* wrapped to \0...\0. search next valid char. 
*/ enc_succ_char(s, l, enc); } if (!rb_enc_asciicompat(enc)) { MEMCPY(carry, s, char, l); carry_len = l; } carry_pos = s - sbeg; } } RESIZE_CAPA(str, RSTRING_LEN(str) + carry_len); s = RSTRING_PTR(str) + carry_pos; memmove(s + carry_len, s, RSTRING_LEN(str) - carry_pos); memmove(s, carry, carry_len); STR_SET_LEN(str, RSTRING_LEN(str) + carry_len); RSTRING_PTR(str)[RSTRING_LEN(str)] = '\0'; rb_enc_str_coderange(str); return str; } |
#succ! ⇒ String
#next! ⇒ String
Equivalent to String#succ, but modifies the receiver in place.
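A small sketch, assuming an unfrozen receiver (hence the dup). Unlike most methods ending in “!”, succ! is declared as returning String rather than String?, so it is not expected to return nil.

```ruby
s = "az".dup
s.succ!      #=> "ba"
s            #=> "ba" (the receiver itself was changed)

v = "Zz".dup
v.succ!      #=> "AAa" (the carry adds a character on the left)
v.next!      #=> "AAb" (next! is an alias)
```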
# File 'string.c', line 3354

static VALUE
rb_str_succ_bang(VALUE str)
{
    rb_str_shared_replace(str, rb_str_succ(str));

    return str;
}
#sum(n = 16) ⇒ Integer
Returns a basic n-bit checksum of the bytes in str, where n is the optional Fixnum parameter, defaulting to 16. The result is the sum of the binary value of each byte in str modulo 2**n, that is, the sum masked with 2**n - 1; if n is 0, the sum is returned without reduction. This is not a particularly good checksum.
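The arithmetic can be checked by hand. The helper byte_checksum below is hypothetical (it is not part of String) and is only a sketch of the same computation as read from the C source for this method: add the byte values, then keep the low n bits.

```ruby
"A".sum              #=> 65
"AB".sum             #=> 131   (65 + 66)
"ABCD".sum(8)        #=> 10    (266 & 0xff)
"ABCD".sum(0)        #=> 266   (no reduction when n is 0)
("\xff" * 300).sum   #=> 10964 (76500 & 0xffff)

# Hypothetical helper, for illustration only.
def byte_checksum(str, bits = 16)
  total = str.bytes.inject(0, :+)                    # add up every byte value
  bits.zero? ? total : total & ((1 << bits) - 1)     # keep the low `bits` bits
end

byte_checksum("ABCD", 8)   #=> 10
```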
# File 'string.c', line 7508

static VALUE
rb_str_sum(int argc, VALUE *argv, VALUE str)
{
    VALUE vbits;
    int bits;
    char *ptr, *p, *pend;
    long len;
    VALUE sum = INT2FIX(0);
    unsigned long sum0 = 0;

    if (argc == 0) {
        bits = 16;
    }
    else {
        rb_scan_args(argc, argv, "01", &vbits);
        bits = NUM2INT(vbits);
    }
    ptr = p = RSTRING_PTR(str);
    len = RSTRING_LEN(str);
    pend = p + len;

    while (p < pend) {
        if (FIXNUM_MAX - UCHAR_MAX < sum0) {
            sum = rb_funcall(sum, '+', 1, LONG2FIX(sum0));
            str_mod_check(str, ptr, len);
            sum0 = 0;
        }
        sum0 += (unsigned char)*p;
        p++;
    }

    if (bits == 0) {
        if (sum0) {
            sum = rb_funcall(sum, '+', 1, LONG2FIX(sum0));
        }
    }
    else {
        if (sum == INT2FIX(0)) {
            if (bits < (int)sizeof(long)*CHAR_BIT) {
                sum0 &= (((unsigned long)1)<<bits)-1;
            }
            sum = LONG2FIX(sum0);
        }
        else {
            VALUE mod;

            if (sum0) {
                sum = rb_funcall(sum, '+', 1, LONG2FIX(sum0));
            }

            mod = rb_funcall(INT2FIX(1), rb_intern("<<"), 1, INT2FIX(bits));
            mod = rb_funcall(mod, '-', 1, INT2FIX(1));
            sum = rb_funcall(sum, '&', 1, mod);
        }
    }
    return sum;
}
#swapcase ⇒ String
Returns a copy of str with uppercase alphabetic characters converted to lowercase and lowercase characters converted to uppercase. Note: case conversion is effective only in the ASCII region.
"Hello".swapcase #=> "hELLO"
"cYbEr_PuNk11".swapcase #=> "CyBeR_pUnK11"
# File 'string.c', line 5330

static VALUE
rb_str_swapcase(VALUE str)
{
    str = rb_str_dup(str);
    rb_str_swapcase_bang(str);
    return str;
}
#swapcase! ⇒ String?
Equivalent to String#swapcase, but modifies the receiver in place, returning str, or nil if no changes were made. Note: case conversion is effective only in the ASCII region.
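A minimal sketch of the two return values, using ASCII-only input and an unfrozen receiver (hence the dup); the variable names are illustrative.

```ruby
s = "Hello".dup
s.swapcase!    #=> "hELLO"
s              #=> "hELLO"

t = "1234".dup
t.swapcase!    #=> nil (no alphabetic characters, so nothing changed)
t              #=> "1234"
```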
# File 'string.c', line 5285

static VALUE
rb_str_swapcase_bang(VALUE str)
{
    rb_encoding *enc;
    char *s, *send;
    int modify = 0;
    int n;

    str_modify_keep_cr(str);
    enc = STR_ENC_GET(str);
    rb_str_check_dummy_enc(enc);
    s = RSTRING_PTR(str); send = RSTRING_END(str);
    while (s < send) {
        unsigned int c = rb_enc_codepoint_len(s, send, &n, enc);

        if (rb_enc_isupper(c, enc)) {
            /* assuming toupper returns codepoint with same size */
            rb_enc_mbcput(rb_enc_tolower(c, enc), s, enc);
            modify = 1;
        }
        else if (rb_enc_islower(c, enc)) {
            /* assuming tolower returns codepoint with same size */
            rb_enc_mbcput(rb_enc_toupper(c, enc), s, enc);
            modify = 1;
        }
        s += n;
    }

    if (modify) return str;
    return Qnil;
}
#to_c ⇒ Object
Returns a complex which denotes the string form. The parser ignores leading whitespace and trailing garbage. Any digit sequences can be separated by an underscore. Returns zero for a null or garbage string.
'9'.to_c #=> (9+0i)
'2.5'.to_c #=> (2.5+0i)
'2.5/1'.to_c #=> ((5/2)+0i)
'-3/2'.to_c #=> ((-3/2)+0i)
'-i'.to_c #=> (0-1i)
'45i'.to_c #=> (0+45i)
'3-4i'.to_c #=> (3-4i)
'-4e2-4e-2i'.to_c #=> (-400.0-0.04i)
'-0.0-0.0i'.to_c #=> (-0.0-0.0i)
'1/2+3/4i'.to_c #=> ((1/2)+(3/4)*i)
'ruby'.to_c #=> (0+0i)
See Kernel.Complex.
# File 'complex.c', line 1811

static VALUE
string_to_c(VALUE self)
{
    char *s;
    VALUE num;

    rb_must_asciicompat(self);

    s = RSTRING_PTR(self);

    if (s && s[RSTRING_LEN(self)]) {
        rb_str_modify(self);
        s = RSTRING_PTR(self);
        s[RSTRING_LEN(self)] = '\0';
    }

    if (!s)
        s = (char *)"";

    (void)parse_comp(s, 0, &num);

    return num;
}
#to_f ⇒ Float
Returns the result of interpreting leading characters in str as a floating point number. Extraneous characters past the end of a valid number are ignored. If there is not a valid number at the start of str, 0.0 is returned. This method never raises an exception.
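For contrast, a short sketch comparing the lenient behaviour described here with Kernel#Float, which parses strictly and raises ArgumentError on input that is not entirely a valid number. The sample strings are made up for illustration.

```ruby
"12.5kg".to_f        #=> 12.5 (trailing characters are ignored)
"kg12.5".to_f        #=> 0.0  (no number at the start)

begin
  Float("12.5kg")    # strict conversion of the same input
rescue ArgumentError
  # raised: the whole string must be a valid number
end
```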
"123.45e1".to_f #=> 1234.5
"45.67 degrees".to_f #=> 45.67
"thx1138".to_f #=> 0.0
# File 'string.c', line 4708

static VALUE
rb_str_to_f(VALUE str)
{
    return DBL2NUM(rb_str_to_dbl(str, FALSE));
}
#to_i(base = 10) ⇒ Integer
Returns the result of interpreting leading characters in str as an integer in the given base (between 2 and 36). Extraneous characters past the end of a valid number are ignored. If there is not a valid number at the start of str, 0 is returned. This method never raises an exception when base is valid.
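A short sketch of the edges of the base range. The inputs are illustrative; the last call shows what happens when base is not valid.

```ruby
"z".to_i(36)       #=> 35 (36 is the largest supported base)
"0x1A".to_i(16)    #=> 26 (a prefix matching the base is accepted)
"zzz".to_i(2)      #=> 0  (no valid binary digits at the start)

begin
  "10".to_i(37)    # 37 is outside the range 2..36
rescue ArgumentError
  # raised: invalid radix
end
```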
"12345".to_i #=> 12345
"99 red balloons".to_i #=> 99
"0a".to_i #=> 0
"0a".to_i(16) #=> 10
"hello".to_i #=> 0
"1100101".to_i(2) #=> 101
"1100101".to_i(8) #=> 294977
"1100101".to_i(10) #=> 1100101
"1100101".to_i(16) #=> 17826049
# File 'string.c', line 4675

static VALUE
rb_str_to_i(int argc, VALUE *argv, VALUE str)
{
    int base;

    if (argc == 0) base = 10;
    else {
        VALUE b;

        rb_scan_args(argc, argv, "01", &b);
        base = NUM2INT(b);
    }
    if (base < 0) {
        rb_raise(rb_eArgError, "invalid radix %d", base);
    }
    return rb_str_to_inum(str, base, FALSE);
}
#to_r ⇒ Object
Returns a rational which denotes the string form. The parser ignores leading whitespace and trailing garbage. Any digit sequences can be separated by an underscore. Returns zero for a null or garbage string.
NOTE: '0.3'.to_r isn't the same as 0.3.to_r. The former is equivalent to '3/10'.to_r, but the latter is not.
' 2 '.to_r #=> (2/1)
'300/2'.to_r #=> (150/1)
'-9.2'.to_r #=> (-46/5)
'-9.2e2'.to_r #=> (-920/1)
'1_234_567'.to_r #=> (1234567/1)
'21 june 09'.to_r #=> (21/1)
'21/06/09'.to_r #=> (7/2)
'bwv 1079'.to_r #=> (0/1)
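The difference mentioned in the note can be seen directly: the string is parsed as a decimal literal, while the Float 0.3 is converted from its binary approximation.

```ruby
'0.3'.to_r                 #=> (3/10)
'3/10'.to_r                #=> (3/10)
0.3.to_r                   #=> (5404319552844595/18014398509481984)
'0.3'.to_r == 0.3.to_r     #=> false
```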
See Kernel.Rational.
# File 'rational.c', line 2349

static VALUE
string_to_r(VALUE self)
{
    char *s;
    VALUE num;

    rb_must_asciicompat(self);

    s = RSTRING_PTR(self);

    if (s && s[RSTRING_LEN(self)]) {
        rb_str_modify(self);
        s = RSTRING_PTR(self);
        s[RSTRING_LEN(self)] = '\0';
    }

    if (!s)
        s = (char *)"";

    (void)parse_rat(s, 0, &num);

    if (RB_TYPE_P(num, T_FLOAT))
        rb_raise(rb_eFloatDomainError, "Infinity");
    return num;
}
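#to_s ⇒ String
#to_str ⇒ String
Returns the receiver. If the receiver is an instance of a subclass of String, a new String with the same contents is returned instead.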
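A small sketch of both cases, following the class check in the C source for this method. Label is a hypothetical subclass defined only for this illustration.

```ruby
class Label < String; end   # hypothetical subclass, for illustration only

s = "plain"
s.to_s.equal?(s)            #=> true  (the very same object)

l = Label.new("tagged")
l.to_s.class                #=> String
l.to_s.equal?(l)            #=> false (a plain String copy)
l.to_s == l                 #=> true  (same contents)
```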
# File 'string.c', line 4723

static VALUE
rb_str_to_s(VALUE str)
{
    if (rb_obj_class(str) != rb_cString) {
        return str_duplicate(rb_cString, str);
    }
    return str;
}
#intern ⇒ Object
#to_sym ⇒ Object
Returns the Symbol corresponding to str, creating the symbol if it did not previously exist. See Symbol#id2name.
"Koala".intern #=> :Koala
s = 'cat'.to_sym #=> :cat
s == :cat #=> true
s = '@cat'.to_sym #=> :@cat
s == :@cat #=> true
This can also be used to create symbols that cannot be represented using the :xxx notation.
'cat and dog'.to_sym #=> :"cat and dog"
# File 'string.c', line 7469

VALUE
rb_str_intern(VALUE s)
{
    VALUE str = RB_GC_GUARD(s);
    ID id;

    id = rb_intern_str(str);
    return ID2SYM(id);
}
#tr(from_str, to_str) ⇒ String
Returns a copy of str with the characters in from_str replaced by the corresponding characters in to_str. If to_str is shorter than from_str, it is padded with its last character in order to maintain the correspondence.
"hello".tr('el', 'ip') #=> "hippo"
"hello".tr('aeiou', '*') #=> "h*ll*"
"hello".tr('aeiou', 'AA*') #=> "hAll*"
Both strings may use the c1-c2 notation to denote ranges of characters, and from_str may start with a ^, which denotes all characters except those listed.
"hello".tr('a-y', 'b-z') #=> "ifmmp"
"hello".tr('^aeiou', '*') #=> "*e**o"
The backslash character \ can be used to escape ^ or - and is otherwise ignored unless it appears at the end of a range or the end of the from_str or to_str:
"hello^world".tr("\\^aeiou", "*") #=> "h*ll**w*rld"
"hello-world".tr("a\\-eo", "*") #=> "h*ll**w*rld"
"hello\r\nworld".tr("\r", "") #=> "hello\nworld"
"hello\r\nworld".tr("\\r", "") #=> "hello\r\nwold"
"hello\r\nworld".tr("\\\r", "") #=> "hello\nworld"
"X['\\b']".tr("X\\", "") #=> "['b']"
"X['\\b']".tr("X-\\]", "") #=> "'b'"
# File 'string.c', line 5698

static VALUE
rb_str_tr(VALUE str, VALUE src, VALUE repl)
{
    str = rb_str_dup(str);
    tr_trans(str, src, repl, 0);
    return str;
}
#tr!(from_str, to_str) ⇒ String?
Translates str in place, using the same rules as String#tr. Returns str, or nil if no changes were made.
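A minimal sketch of both return values, assuming an unfrozen receiver (hence the dup); the variable name is illustrative.

```ruby
s = "hello".dup
s.tr!('el', 'ip')    #=> "hippo"
s                    #=> "hippo"

s.tr!('xyz', '*')    #=> nil (none of x, y, z occur, so nothing changed)
s                    #=> "hippo"
```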
# File 'string.c', line 5656

static VALUE
rb_str_tr_bang(VALUE str, VALUE src, VALUE repl)
{
    return tr_trans(str, src, repl, 0);
}
#tr_s(from_str, to_str) ⇒ String
Processes a copy of str as described under String#tr, then removes duplicate characters in regions that were affected by the translation.
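To make “regions that were affected by the translation” concrete, the sketch below compares tr_s with String#tr and with tr followed by String#squeeze. The word "bookkeeper" is just a convenient input with several doubled letters.

```ruby
"hello".tr('el', '*')                #=> "h***o"
"hello".tr_s('el', '*')              #=> "h*o"

# Only the translated run ("kk") is squeezed; "oo" and "ee" are kept.
"bookkeeper".tr_s('k', 'x')          #=> "booxeeper"
# tr followed by squeeze collapses every run instead.
"bookkeeper".tr('k', 'x').squeeze    #=> "boxeper"
```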
"hello".tr_s('l', 'r') #=> "hero"
"hello".tr_s('el', '*') #=> "h*o"
"hello".tr_s('el', 'hx') #=> "hhxo"
# File 'string.c', line 6021

static VALUE
rb_str_tr_s(VALUE str, VALUE src, VALUE repl)
{
    str = rb_str_dup(str);
    tr_trans(str, src, repl, 1);
    return str;
}
#tr_s!(from_str, to_str) ⇒ String?
Performs String#tr_s processing on str in place, returning str, or nil if no changes were made.
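A minimal sketch of both return values, assuming an unfrozen receiver (hence the dup); the variable name is illustrative.

```ruby
s = "hello".dup
s.tr_s!('l', 'r')    #=> "hero"
s                    #=> "hero"

s.tr_s!('z', '*')    #=> nil (no "z" in the receiver, so nothing changed)
s                    #=> "hero"
```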
# File 'string.c', line 6001

static VALUE
rb_str_tr_s_bang(VALUE str, VALUE src, VALUE repl)
{
    return tr_trans(str, src, repl, 1);
}
#unpack(format) ⇒ Array
Decodes str (which may contain binary data) according to the format string, returning an array of each value extracted. The format string consists of a sequence of single-character directives, summarized in the table at the end of this entry. Each directive may be followed by a number, indicating the number of times to repeat this directive. An asterisk (“*”) will use up all remaining elements. The directives sSiIlL may each be followed by an underscore (“_”) or exclamation mark (“!”) to use the underlying platform’s native size for the specified type; otherwise, it uses a platform-independent consistent size. Spaces are ignored in the format string. See also Array#pack.
"abc \0\0abc \0\0".unpack('A6Z6') #=> ["abc", "abc "]
"abc \0\0".unpack('a3a3') #=> ["abc", " \000\000"]
"abc \0abc \0".unpack('Z*Z*') #=> ["abc ", "abc "]
"aa".unpack('b8B8') #=> ["10000110", "01100001"]
"aaa".unpack('h2H2c') #=> ["16", "61", 97]
"\xfe\xff\xfe\xff".unpack('sS') #=> [-2, 65534]
"now=20is".unpack('M*') #=> ["now is"]
"whole".unpack('xax2aX2aX1aX2a') #=> ["h", "e", "l", "l", "o"]
This table summarizes the various formats and the Ruby classes returned by each.
Integer    |         |
Directive  | Returns | Meaning
-----------------------------------------------------------------
C          | Integer | 8-bit unsigned (unsigned char)
S          | Integer | 16-bit unsigned, native endian (uint16_t)
L          | Integer | 32-bit unsigned, native endian (uint32_t)
Q          | Integer | 64-bit unsigned, native endian (uint64_t)
           |         |
c          | Integer | 8-bit signed (signed char)
s          | Integer | 16-bit signed, native endian (int16_t)
l          | Integer | 32-bit signed, native endian (int32_t)
q          | Integer | 64-bit signed, native endian (int64_t)
           |         |
S_, S!     | Integer | unsigned short, native endian
I, I_, I!  | Integer | unsigned int, native endian
L_, L!     | Integer | unsigned long, native endian
Q_, Q!     | Integer | unsigned long long, native endian (ArgumentError
           |         | if the platform has no long long type.)
           |         | (Q_ and Q! are available since Ruby 2.1.)
           |         |
s_, s!     | Integer | signed short, native endian
i, i_, i!  | Integer | signed int, native endian
l_, l!     | Integer | signed long, native endian
q_, q!     | Integer | signed long long, native endian (ArgumentError
           |         | if the platform has no long long type.)
           |         | (q_ and q! are available since Ruby 2.1.)
           |         |
S> L> Q>   | Integer | same as the directives without ">" except
s> l> q>   |         | big endian
S!> I!>    |         | (available since Ruby 1.9.3)
L!> Q!>    |         | "S>" is same as "n"
s!> i!>    |         | "L>" is same as "N"
l!> q!>    |         |
           |         |
S< L< Q<   | Integer | same as the directives without "<" except
s< l< q<   |         | little endian
S!< I!<    |         | (available since Ruby 1.9.3)
L!< Q!<    |         | "S<" is same as "v"
s!< i!<    |         | "L<" is same as "V"
l!< q!<    |         |
           |         |
n          | Integer | 16-bit unsigned, network (big-endian) byte order
N          | Integer | 32-bit unsigned, network (big-endian) byte order
v          | Integer | 16-bit unsigned, VAX (little-endian) byte order
V          | Integer | 32-bit unsigned, VAX (little-endian) byte order
           |         |
U          | Integer | UTF-8 character
w          | Integer | BER-compressed integer (see Array#pack)

Float      |         |
Directive  | Returns | Meaning
-----------------------------------------------------------------
D, d       | Float   | double-precision, native format
F, f       | Float   | single-precision, native format
E          | Float   | double-precision, little-endian byte order
e          | Float   | single-precision, little-endian byte order
G          | Float   | double-precision, network (big-endian) byte order
g          | Float   | single-precision, network (big-endian) byte order

String     |         |
Directive  | Returns | Meaning
-----------------------------------------------------------------
A          | String  | arbitrary binary string (remove trailing nulls and ASCII spaces)
a          | String  | arbitrary binary string
Z          | String  | null-terminated string
B          | String  | bit string (MSB first)
b          | String  | bit string (LSB first)
H          | String  | hex string (high nibble first)
h          | String  | hex string (low nibble first)
u          | String  | UU-encoded string
M          | String  | quoted-printable, MIME encoding (see RFC2045)
m          | String  | base64 encoded string (RFC 2045) (default)
           |         | base64 encoded string (RFC 4648) if followed by 0
P          | String  | pointer to a structure (fixed-length string)
p          | String  | pointer to a null-terminated string

Misc.      |         |
Directive  | Returns | Meaning
-----------------------------------------------------------------
@          | ---     | skip to the offset given by the length argument
X          | ---     | skip backward one byte
x          | ---     | skip forward one byte
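As a sketch of how the byte-order directives in the tables relate to one another, the same four bytes are decoded below with several of them. The variable name data is illustrative, and String#b is used only to obtain a binary-encoded string.

```ruby
data = "\x01\x02\x03\x04".b   # four bytes, binary encoding

data.unpack('C*')   #=> [1, 2, 3, 4]
data.unpack('n2')   #=> [258, 772]    (0x0102, 0x0304: big-endian 16-bit)
data.unpack('v2')   #=> [513, 1027]   (0x0201, 0x0403: little-endian 16-bit)
data.unpack('N')    #=> [16909060]    (0x01020304)
data.unpack('V')    #=> [67305985]    (0x04030201)

# The explicit-endian modifiers give the same results as n, N, v and V.
data.unpack('S>2') == data.unpack('n2')   #=> true
data.unpack('L<')  == data.unpack('V')    #=> true

# Array#pack with the same directive string reverses the decoding.
[258, 772].pack('n2') == data             #=> true
```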
# File 'pack.c', line 1197 static VALUE pack_unpack(VALUE str, VALUE fmt) { static const char hexdigits[] = "0123456789abcdef"; char *s, *send; char *p, *pend; VALUE ary; char type; long len, tmp_len; int star; #ifdef NATINT_PACK int natint; /* native integer */ #endif int block_p = rb_block_given_p(); int signed_p, integer_size, bigendian_p; #define UNPACK_PUSH(item) do { VALUE item_val = (item); if (block_p) { rb_yield(item_val); } else { rb_ary_push(ary, item_val); } } while (0) StringValue(str); StringValue(fmt); s = RSTRING_PTR(str); send = s + RSTRING_LEN(str); p = RSTRING_PTR(fmt); pend = p + RSTRING_LEN(fmt); ary = block_p ? Qnil : rb_ary_new(); while (p < pend) { int explicit_endian = 0; type = *p++; #ifdef NATINT_PACK natint = 0; #endif if (ISSPACE(type)) continue; if (type == '#') { while ((p < pend) && (*p != '\n')) { p++; } continue; } star = 0; { modifiers: switch (*p) { case '_': case '!': if (strchr(natstr, type)) { #ifdef NATINT_PACK natint = 1; #endif p++; } else { rb_raise(rb_eArgError, "'%c' allowed only after types %s", *p, natstr); } goto modifiers; case '<': case '>': if (!strchr(endstr, type)) { rb_raise(rb_eArgError, "'%c' allowed only after types %s", *p, endstr); } if (explicit_endian) { rb_raise(rb_eRangeError, "Can't use both '<' and '>'"); } explicit_endian = *p++; goto modifiers; } } if (p >= pend) len = 1; else if (*p == '*') { star = 1; len = send - s; p++; } else if (ISDIGIT(*p)) { errno = 0; len = STRTOUL(p, (char**)&p, 10); if (errno) { rb_raise(rb_eRangeError, "pack length too big"); } } else { len = (type != '@'); } switch (type) { case '%': rb_raise(rb_eArgError, "%% is not supported"); break; case 'A': if (len > send - s) len = send - s; { long end = len; char *t = s + len - 1; while (t >= s) { if (*t != ' ' && *t != '\0') break; t--; len--; } UNPACK_PUSH(infected_str_new(s, len, str)); s += end; } break; case 'Z': { char *t = s; if (len > send-s) len = send-s; while (t < s+len && *t) t++; UNPACK_PUSH(infected_str_new(s, t-s, 
str)); if (t < send) t++; s = star ? t : s+len; } break; case 'a': if (len > send - s) len = send - s; UNPACK_PUSH(infected_str_new(s, len, str)); s += len; break; case 'b': { VALUE bitstr; char *t; int bits; long i; if (p[-1] == '*' || len > (send - s) * 8) len = (send - s) * 8; bits = 0; UNPACK_PUSH(bitstr = rb_usascii_str_new(0, len)); t = RSTRING_PTR(bitstr); for (i=0; i<len; i++) { if (i & 7) bits >>= 1; else bits = *s++; *t++ = (bits & 1) ? '1' : '0'; } } break; case 'B': { VALUE bitstr; char *t; int bits; long i; if (p[-1] == '*' || len > (send - s) * 8) len = (send - s) * 8; bits = 0; UNPACK_PUSH(bitstr = rb_usascii_str_new(0, len)); t = RSTRING_PTR(bitstr); for (i=0; i<len; i++) { if (i & 7) bits <<= 1; else bits = *s++; *t++ = (bits & 128) ? '1' : '0'; } } break; case 'h': { VALUE bitstr; char *t; int bits; long i; if (p[-1] == '*' || len > (send - s) * 2) len = (send - s) * 2; bits = 0; UNPACK_PUSH(bitstr = rb_usascii_str_new(0, len)); t = RSTRING_PTR(bitstr); for (i=0; i<len; i++) { if (i & 1) bits >>= 4; else bits = *s++; *t++ = hexdigits[bits & 15]; } } break; case 'H': { VALUE bitstr; char *t; int bits; long i; if (p[-1] == '*' || len > (send - s) * 2) len = (send - s) * 2; bits = 0; UNPACK_PUSH(bitstr = rb_usascii_str_new(0, len)); t = RSTRING_PTR(bitstr); for (i=0; i<len; i++) { if (i & 1) bits <<= 4; else bits = *s++; *t++ = hexdigits[(bits >> 4) & 15]; } } break; case 'c': signed_p = 1; integer_size = 1; bigendian_p = BIGENDIAN_P(); /* not effective */ goto unpack_integer; case 'C': signed_p = 0; integer_size = 1; bigendian_p = BIGENDIAN_P(); /* not effective */ goto unpack_integer; case 's': signed_p = 1; integer_size = NATINT_LEN(short, 2); bigendian_p = BIGENDIAN_P(); goto unpack_integer; case 'S': signed_p = 0; integer_size = NATINT_LEN(short, 2); bigendian_p = BIGENDIAN_P(); goto unpack_integer; case 'i': signed_p = 1; integer_size = (int)sizeof(int); bigendian_p = BIGENDIAN_P(); goto unpack_integer; case 'I': signed_p = 0; integer_size = 
(int)sizeof(int); bigendian_p = BIGENDIAN_P(); goto unpack_integer; case 'l': signed_p = 1; integer_size = NATINT_LEN(long, 4); bigendian_p = BIGENDIAN_P(); goto unpack_integer; case 'L': signed_p = 0; integer_size = NATINT_LEN(long, 4); bigendian_p = BIGENDIAN_P(); goto unpack_integer; case 'q': signed_p = 1; integer_size = NATINT_LEN_Q; bigendian_p = BIGENDIAN_P(); goto unpack_integer; case 'Q': signed_p = 0; integer_size = NATINT_LEN_Q; bigendian_p = BIGENDIAN_P(); goto unpack_integer; case 'n': signed_p = 0; integer_size = 2; bigendian_p = 1; goto unpack_integer; case 'N': signed_p = 0; integer_size = 4; bigendian_p = 1; goto unpack_integer; case 'v': signed_p = 0; integer_size = 2; bigendian_p = 0; goto unpack_integer; case 'V': signed_p = 0; integer_size = 4; bigendian_p = 0; goto unpack_integer; unpack_integer: if (explicit_endian) { bigendian_p = explicit_endian == '>'; } PACK_LENGTH_ADJUST_SIZE(integer_size); while (len-- > 0) { int flags = bigendian_p ? INTEGER_PACK_BIG_ENDIAN : INTEGER_PACK_LITTLE_ENDIAN; VALUE val; if (signed_p) flags |= INTEGER_PACK_2COMP; val = rb_integer_unpack(s, integer_size, 1, 0, flags); UNPACK_PUSH(val); s += integer_size; } PACK_ITEM_ADJUST(); break; case 'f': case 'F': PACK_LENGTH_ADJUST_SIZE(sizeof(float)); while (len-- > 0) { float tmp; memcpy(&tmp, s, sizeof(float)); s += sizeof(float); UNPACK_PUSH(DBL2NUM((double)tmp)); } PACK_ITEM_ADJUST(); break; case 'e': PACK_LENGTH_ADJUST_SIZE(sizeof(float)); while (len-- > 0) { float tmp; FLOAT_CONVWITH(ftmp); memcpy(&tmp, s, sizeof(float)); s += sizeof(float); tmp = VTOHF(tmp,ftmp); UNPACK_PUSH(DBL2NUM((double)tmp)); } PACK_ITEM_ADJUST(); break; case 'E': PACK_LENGTH_ADJUST_SIZE(sizeof(double)); while (len-- > 0) { double tmp; DOUBLE_CONVWITH(dtmp); memcpy(&tmp, s, sizeof(double)); s += sizeof(double); tmp = VTOHD(tmp,dtmp); UNPACK_PUSH(DBL2NUM(tmp)); } PACK_ITEM_ADJUST(); break; case 'D': case 'd': PACK_LENGTH_ADJUST_SIZE(sizeof(double)); while (len-- > 0) { double tmp; 
memcpy(&tmp, s, sizeof(double)); s += sizeof(double); UNPACK_PUSH(DBL2NUM(tmp)); } PACK_ITEM_ADJUST(); break; case 'g': PACK_LENGTH_ADJUST_SIZE(sizeof(float)); while (len-- > 0) { float tmp; FLOAT_CONVWITH(ftmp); memcpy(&tmp, s, sizeof(float)); s += sizeof(float); tmp = NTOHF(tmp,ftmp); UNPACK_PUSH(DBL2NUM((double)tmp)); } PACK_ITEM_ADJUST(); break; case 'G': PACK_LENGTH_ADJUST_SIZE(sizeof(double)); while (len-- > 0) { double tmp; DOUBLE_CONVWITH(dtmp); memcpy(&tmp, s, sizeof(double)); s += sizeof(double); tmp = NTOHD(tmp,dtmp); UNPACK_PUSH(DBL2NUM(tmp)); } PACK_ITEM_ADJUST(); break; case 'U': if (len > send - s) len = send - s; while (len > 0 && s < send) { long alen = send - s; unsigned long l; l = utf8_to_uv(s, &alen); s += alen; len--; UNPACK_PUSH(ULONG2NUM(l)); } break; case 'u': { VALUE buf = infected_str_new(0, (send - s)*3/4, str); char *ptr = RSTRING_PTR(buf); long total = 0; while (s < send && *s > ' ' && *s < 'a') { long a,b,c,d; char hunk[4]; hunk[3] = '\0'; len = (*s++ - ' ') & 077; total += len; if (total > RSTRING_LEN(buf)) { len -= total - RSTRING_LEN(buf); total = RSTRING_LEN(buf); } while (len > 0) { long mlen = len > 3 ? 
3 : len; if (s < send && *s >= ' ') a = (*s++ - ' ') & 077; else a = 0; if (s < send && *s >= ' ') b = (*s++ - ' ') & 077; else b = 0; if (s < send && *s >= ' ') c = (*s++ - ' ') & 077; else c = 0; if (s < send && *s >= ' ') d = (*s++ - ' ') & 077; else d = 0; hunk[0] = (char)(a << 2 | b >> 4); hunk[1] = (char)(b << 4 | c >> 2); hunk[2] = (char)(c << 6 | d); memcpy(ptr, hunk, mlen); ptr += mlen; len -= mlen; } if (*s == '\r') s++; if (*s == '\n') s++; else if (s < send && (s+1 == send || s[1] == '\n')) s += 2; /* possible checksum byte */ } rb_str_set_len(buf, total); UNPACK_PUSH(buf); } break; case 'm': { VALUE buf = infected_str_new(0, (send - s + 3)*3/4, str); /* +3 is for skipping paddings */ char *ptr = RSTRING_PTR(buf); int a = -1,b = -1,c = 0,d = 0; static signed char b64_xtable[256]; if (b64_xtable['/'] <= 0) { int i; for (i = 0; i < 256; i++) { b64_xtable[i] = -1; } for (i = 0; i < 64; i++) { b64_xtable[(unsigned char)b64_table[i]] = (char)i; } } if (len == 0) { while (s < send) { a = b = c = d = -1; a = b64_xtable[(unsigned char)*s++]; if (s >= send || a == -1) rb_raise(rb_eArgError, "invalid base64"); b = b64_xtable[(unsigned char)*s++]; if (s >= send || b == -1) rb_raise(rb_eArgError, "invalid base64"); if (*s == '=') { if (s + 2 == send && *(s + 1) == '=') break; rb_raise(rb_eArgError, "invalid base64"); } c = b64_xtable[(unsigned char)*s++]; if (s >= send || c == -1) rb_raise(rb_eArgError, "invalid base64"); if (s + 1 == send && *s == '=') break; d = b64_xtable[(unsigned char)*s++]; if (d == -1) rb_raise(rb_eArgError, "invalid base64"); *ptr++ = castchar(a << 2 | b >> 4); *ptr++ = castchar(b << 4 | c >> 2); *ptr++ = castchar(c << 6 | d); } if (c == -1) { *ptr++ = castchar(a << 2 | b >> 4); if (b & 0xf) rb_raise(rb_eArgError, "invalid base64"); } else if (d == -1) { *ptr++ = castchar(a << 2 | b >> 4); *ptr++ = castchar(b << 4 | c >> 2); if (c & 0x3) rb_raise(rb_eArgError, "invalid base64"); } } else { while (s < send) { a = b = c = d = -1; while ((a = 
b64_xtable[(unsigned char)*s]) == -1 && s < send) {s++;} if (s >= send) break; s++; while ((b = b64_xtable[(unsigned char)*s]) == -1 && s < send) {s++;} if (s >= send) break; s++; while ((c = b64_xtable[(unsigned char)*s]) == -1 && s < send) {if (*s == '=') break; s++;} if (*s == '=' || s >= send) break; s++; while ((d = b64_xtable[(unsigned char)*s]) == -1 && s < send) {if (*s == '=') break; s++;} if (*s == '=' || s >= send) break; s++; *ptr++ = castchar(a << 2 | b >> 4); *ptr++ = castchar(b << 4 | c >> 2); *ptr++ = castchar(c << 6 | d); a = -1; } if (a != -1 && b != -1) { if (c == -1) *ptr++ = castchar(a << 2 | b >> 4); else { *ptr++ = castchar(a << 2 | b >> 4); *ptr++ = castchar(b << 4 | c >> 2); } } } rb_str_set_len(buf, ptr - RSTRING_PTR(buf)); UNPACK_PUSH(buf); } break; case 'M': { VALUE buf = infected_str_new(0, send - s, str); char *ptr = RSTRING_PTR(buf), *ss = s; int c1, c2; while (s < send) { if (*s == '=') { if (++s == send) break; if (s+1 < send && *s == '\r' && *(s+1) == '\n') s++; if (*s != '\n') { if ((c1 = hex2num(*s)) == -1) break; if (++s == send) break; if ((c2 = hex2num(*s)) == -1) break; *ptr++ = castchar(c1 << 4 | c2); } } else { *ptr++ = *s; } s++; ss = s; } rb_str_set_len(buf, ptr - RSTRING_PTR(buf)); rb_str_buf_cat(buf, ss, send-ss); ENCODING_CODERANGE_SET(buf, rb_ascii8bit_encindex(), ENC_CODERANGE_VALID); UNPACK_PUSH(buf); } break; case '@': if (len > RSTRING_LEN(str)) rb_raise(rb_eArgError, "@ outside of string"); s = RSTRING_PTR(str) + len; break; case 'X': if (len > s - RSTRING_PTR(str)) rb_raise(rb_eArgError, "X outside of string"); s -= len; break; case 'x': if (len > send - s) rb_raise(rb_eArgError, "x outside of string"); s += len; break; case 'P': if (sizeof(char *) <= (size_t)(send - s)) { VALUE tmp = Qnil; char *t; memcpy(&t, s, sizeof(char *)); s += sizeof(char *); if (t) { VALUE a; const VALUE *p, *pend; if (!(a = rb_str_associated(str))) { rb_raise(rb_eArgError, "no associated pointer"); } p = RARRAY_CONST_PTR(a); pend = p + 
RARRAY_LEN(a); while (p < pend) { if (RB_TYPE_P(*p, T_STRING) && RSTRING_PTR(*p) == t) { if (len < RSTRING_LEN(*p)) { tmp = rb_tainted_str_new(t, len); rb_str_associate(tmp, a); } else { tmp = *p; } break; } p++; } if (p == pend) { rb_raise(rb_eArgError, "non associated pointer"); } } UNPACK_PUSH(tmp); } break; case 'p': if (len > (long)((send - s) / sizeof(char *))) len = (send - s) / sizeof(char *); while (len-- > 0) { if ((size_t)(send - s) < sizeof(char *)) break; else { VALUE tmp = Qnil; char *t; memcpy(&t, s, sizeof(char *)); s += sizeof(char *); if (t) { VALUE a; const VALUE *p, *pend; if (!(a = rb_str_associated(str))) { rb_raise(rb_eArgError, "no associated pointer"); } p = RARRAY_CONST_PTR(a); pend = p + RARRAY_LEN(a); while (p < pend) { if (RB_TYPE_P(*p, T_STRING) && RSTRING_PTR(*p) == t) { tmp = *p; break; } p++; } if (p == pend) { rb_raise(rb_eArgError, "non associated pointer"); } } UNPACK_PUSH(tmp); } } break; case 'w': { char *s0 = s; while (len > 0 && s < send) { if (*s & 0x80) { s++; } else { s++; UNPACK_PUSH(rb_integer_unpack(s0, s-s0, 1, 1, INTEGER_PACK_BIG_ENDIAN)); len--; s0 = s; } } } break; default: rb_warning("unknown unpack directive '%c' in '%s'", type, RSTRING_PTR(fmt)); break; } } return ary; } |
#upcase ⇒ String
Returns a copy of str with all lowercase letters replaced with their uppercase counterparts. The operation is locale insensitive: only the characters “a” to “z” are affected. Note: case replacement is effective only in the ASCII region.
"hEllO".upcase #=> "HELLO"
# File 'string.c', line 5113
static VALUE
rb_str_upcase(VALUE str)
{
    str = rb_str_dup(str);
    rb_str_upcase_bang(str);
    return str;
}
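As the source above suggests, upcase duplicates the receiver before upcasing the copy, so the original string should be left untouched. A minimal sketch of that behaviour follows; the variable names are illustrative only, and the input is kept to ASCII letters because the note above limits case replacement to the ASCII region.

```ruby
# Illustrative names; not part of the String API.
original = "hEllO"
shouted  = original.upcase

# The receiver keeps its contents; the result is a separate object.
puts original   # hEllO
puts shouted    # HELLO
puts shouted.equal?(original)   # false
```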
#upcase! ⇒ String?
Upcases the contents of str, returning nil if no changes were made. Note: case replacement is effective only in the ASCII region.
# File 'string.c', line 5048
static VALUE
rb_str_upcase_bang(VALUE str)
{
    rb_encoding *enc;
    char *s, *send;
    int modify = 0;
    int n;

    str_modify_keep_cr(str);
    enc = STR_ENC_GET(str);
    rb_str_check_dummy_enc(enc);
    s = RSTRING_PTR(str); send = RSTRING_END(str);
    if (single_byte_optimizable(str)) {
        while (s < send) {
            unsigned int c = *(unsigned char*)s;

            if (rb_enc_isascii(c, enc) && 'a' <= c && c <= 'z') {
                *s = 'A' + (c - 'a');
                modify = 1;
            }
            s++;
        }
    }
    else {
        int ascompat = rb_enc_asciicompat(enc);

        while (s < send) {
            unsigned int c;

            if (ascompat && (c = *(unsigned char*)s) < 0x80) {
                if (rb_enc_isascii(c, enc) && 'a' <= c && c <= 'z') {
                    *s = 'A' + (c - 'a');
                    modify = 1;
                }
                s++;
            }
            else {
                c = rb_enc_codepoint_len(s, send, &n, enc);
                if (rb_enc_islower(c, enc)) {
                    /* assuming toupper returns codepoint with same size */
                    rb_enc_mbcput(rb_enc_toupper(c, enc), s, enc);
                    modify = 1;
                }
                s += n;
            }
        }
    }

    if (modify) return str;
    return Qnil;
}
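Because upcase! returns nil when nothing changed, its return value is better treated as a "was modified" signal than as the resulting string. The sketch below illustrates one way to use it; upcase_reporting is a hypothetical helper, not part of String, and it works on a copy so the caller's string is not modified.

```ruby
# Hypothetical helper: returns the upcased text plus a flag saying
# whether upcase! actually changed anything.
def upcase_reporting(text)
  copy = text.dup                 # work on a copy, leave the argument alone
  changed = !copy.upcase!.nil?    # nil means "no changes were made"
  [copy, changed]
end

p upcase_reporting("abc")   # ["ABC", true]
p upcase_reporting("ABC")   # ["ABC", false]
```

Chaining directly on the result, as in str.upcase!.reverse, is therefore unsafe when str may already be uppercase.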
#upto(other_str, exclusive = false) {|s| ... } ⇒ String
#upto(other_str, exclusive = false) ⇒ Object
Iterates through successive values, starting at str and ending at other_str inclusive, passing each value in turn to the block. The String#succ method is used to generate each value. If the optional second argument exclusive is omitted or is false, the last value will be included; otherwise it will be excluded.
If no block is given, an enumerator is returned instead.
"a8".upto("b6") {|s| print s, ' ' }
for s in "a8".."b6"
print s, ' '
end
produces:
a8 a9 b0 b1 b2 b3 b4 b5 b6
a8 a9 b0 b1 b2 b3 b4 b5 b6
If str and other_str contain only ASCII numeric characters, both are treated as decimal numbers. In addition, the width of the string (e.g. leading zeros) is handled appropriately.
"9".upto("11").to_a #=> ["9", "10", "11"]
"25".upto("5").to_a #=> []
"07".upto("11").to_a #=> ["07", "08", "09", "10", "11"]
# File 'string.c', line 3395
static VALUE
rb_str_upto(int argc, VALUE *argv, VALUE beg)
{
    VALUE end, exclusive;
    VALUE current, after_end;
    ID succ;
    int n, excl, ascii;
    rb_encoding *enc;

    rb_scan_args(argc, argv, "11", &end, &exclusive);
    RETURN_ENUMERATOR(beg, argc, argv);
    excl = RTEST(exclusive);
    CONST_ID(succ, "succ");
    StringValue(end);
    enc = rb_enc_check(beg, end);
    ascii = (is_ascii_string(beg) && is_ascii_string(end));
    /* single character */
    if (RSTRING_LEN(beg) == 1 && RSTRING_LEN(end) == 1 && ascii) {
        char c = RSTRING_PTR(beg)[0];
        char e = RSTRING_PTR(end)[0];

        if (c > e || (excl && c == e)) return beg;
        for (;;) {
            rb_yield(rb_enc_str_new(&c, 1, enc));
            if (!excl && c == e) break;
            c++;
            if (excl && c == e) break;
        }
        return beg;
    }
    /* both edges are all digits */
    if (ascii && ISDIGIT(RSTRING_PTR(beg)[0]) && ISDIGIT(RSTRING_PTR(end)[0])) {
        char *s, *send;
        VALUE b, e;
        int width;

        s = RSTRING_PTR(beg); send = RSTRING_END(beg);
        width = rb_long2int(send - s);
        while (s < send) {
            if (!ISDIGIT(*s)) goto no_digits;
            s++;
        }
        s = RSTRING_PTR(end); send = RSTRING_END(end);
        while (s < send) {
            if (!ISDIGIT(*s)) goto no_digits;
            s++;
        }
        b = rb_str_to_inum(beg, 10, FALSE);
        e = rb_str_to_inum(end, 10, FALSE);
        if (FIXNUM_P(b) && FIXNUM_P(e)) {
            long bi = FIX2LONG(b);
            long ei = FIX2LONG(e);
            rb_encoding *usascii = rb_usascii_encoding();

            while (bi <= ei) {
                if (excl && bi == ei) break;
                rb_yield(rb_enc_sprintf(usascii, "%.*ld", width, bi));
                bi++;
            }
        }
        else {
            ID op = excl ? '<' : rb_intern("<=");
            VALUE args[2], fmt = rb_obj_freeze(rb_usascii_str_new_cstr("%.*d"));

            args[0] = INT2FIX(width);
            while (rb_funcall(b, op, 1, e)) {
                args[1] = b;
                rb_yield(rb_str_format(numberof(args), args, fmt));
                b = rb_funcall(b, succ, 0, 0);
            }
        }
        return beg;
    }
    /* normal case */
  no_digits:
    n = rb_str_cmp(beg, end);
    if (n > 0 || (excl && n == 0)) return beg;

    after_end = rb_funcall(end, succ, 0, 0);
    current = rb_str_dup(beg);
    while (!rb_str_equal(current, after_end)) {
        VALUE next = Qnil;
        if (excl || !rb_str_equal(current, end))
            next = rb_funcall(current, succ, 0, 0);
        rb_yield(current);
        if (NIL_P(next)) break;
        current = next;
        StringValue(current);
        if (excl && rb_str_equal(current, end)) break;
        if (RSTRING_LEN(current) > RSTRING_LEN(end) || RSTRING_LEN(current) == 0)
            break;
    }

    return beg;
}
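The examples in the description do not exercise the exclusive argument or the enumerator form. The following sketch covers both, and touches each of the three branches visible in the source above (single characters, all-digit strings, and the general succ-based case). The variable names are illustrative only.

```ruby
# exclusive = true drops the final value (single-character branch).
letters = "a".upto("e", true).to_a

# All-digit strings are compared as numbers and keep the receiver's width.
padded = "08".upto("11", true).to_a

# General case: values are generated with String#succ.
mixed = "az".upto("bc").to_a

# Without a block an enumerator is returned, so Enumerable methods chain.
shouted = "a".upto("c").map(&:upcase)

p letters   # ["a", "b", "c", "d"]
p padded    # ["08", "09", "10"]
p mixed     # ["az", "ba", "bb", "bc"]
p shouted   # ["A", "B", "C"]
```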
#valid_encoding? ⇒ Boolean
Returns true for a string which is encoded correctly.
"\xc2\xa1".force_encoding("UTF-8").valid_encoding? #=> true
"\xc2".force_encoding("UTF-8").valid_encoding? #=> false
"\x80".force_encoding("UTF-8").valid_encoding? #=> false
# File 'string.c', line 7931
static VALUE
rb_str_valid_encoding_p(VALUE str)
{
    int cr = rb_enc_str_coderange(str);

    return cr == ENC_CODERANGE_BROKEN ? Qfalse : Qtrue;
}
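A common use of valid_encoding? is to check raw bytes after relabelling them with force_encoding, before treating them as text. The sketch below shows one possible guard; utf8_or_nil is a hypothetical helper name, not part of String, and returning nil for broken input is just one of several reasonable policies.

```ruby
# Hypothetical helper: label bytes as UTF-8 and return them only if
# they form a valid UTF-8 sequence, otherwise return nil.
def utf8_or_nil(bytes)
  text = bytes.dup.force_encoding("UTF-8")   # dup: do not relabel the caller's string
  text.valid_encoding? ? text : nil
end

good = utf8_or_nil("\xc2\xa1".b)   # complete two-byte sequence
bad  = utf8_or_nil("\xc2".b)       # truncated sequence

p good.nil?   # false
p bad.nil?    # true
```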