Class: String
Overview
A String object holds and manipulates an arbitrary sequence of bytes, typically representing characters. String objects may be created using String::new or as literals.
Because of aliasing issues, users of strings should be aware of the methods that modify the contents of a String object. Typically, methods with names ending in “!” modify their receiver, while those without a “!” return a new String. However, there are exceptions, such as String#[]=.
Direct Known Subclasses
Class Method Summary collapse
-
.try_convert(object) ⇒ Object?
If
object
is a String object, returnsobject
.
Instance Method Summary collapse
-
#%(object) ⇒ Object
Returns the result of formatting
object
into the format specificationself
(see Kernel#sprintf for formatting details): “%05d” % 123 # => “00123”. -
#*(integer) ⇒ Object
Returns a new String containing
integer
copies ofself
: “Ho! ” * 3 # => “Ho! Ho! Ho! ” “Ho! ” * 0 # => “”. -
#+(other_string) ⇒ Object
Returns a new String containing
other_string
concatenated toself
: “Hello from ” + self.to_s # => “Hello from main”. -
#+ ⇒ self
Returns
self
ifself
is not frozen. -
#- ⇒ Object
Returns a frozen, possibly pre-existing copy of the string.
-
#<<(object) ⇒ String
Returns a new String containing the concatenation of
self
andobject
: s = ‘foo’ s << ‘bar’ # => “foobar”. -
#<=>(other_string) ⇒ -1, ...
Compares
self
andother_string
, returning: - -1 ifother_string
is smaller. -
#==(str2) ⇒ Object
Returns
true
ifobject
has the same length and content; asself
;false
otherwise: s = ‘foo’ s == ‘foo’ # => true s == ‘food’ # => false s == ‘FOO’ # => false. -
#===(str2) ⇒ Object
Returns
true
ifobject
has the same length and content; asself
;false
otherwise: s = ‘foo’ s == ‘foo’ # => true s == ‘food’ # => false s == ‘FOO’ # => false. -
#=~(y) ⇒ Object
Returns the Integer index of the first substring that matches the given
regexp
, ornil
if no match found: ‘foo’ =~ /f/ # => 0 ‘foo’ =~ /o/ # => 1 ‘foo’ =~ /x/ # => nil. -
#[](*args) ⇒ Object
Returns the substring of
self
specified by the arguments. -
#[]=(*args) ⇒ Object
Element Assignment—Replaces some or all of the content of str.
-
#ascii_only? ⇒ Boolean
Returns true for a string which has only ASCII characters.
-
#b ⇒ String
Returns a copied string whose encoding is ASCII-8BIT.
-
#bytes ⇒ Array
Returns an array of bytes in str.
-
#bytesize ⇒ Integer
Returns the count of bytes in
self
: “x80u3042”.bytesize # => 4 “hello”.bytesize # => 5. -
#byteslice(*args) ⇒ Object
Byte Reference—If passed a single Integer, returns a substring of one byte at that position.
-
#capitalize(*args) ⇒ Object
Returns a copy of str with the first character converted to uppercase and the remainder to lowercase.
-
#capitalize!(*args) ⇒ Object
Modifies str by converting the first character to uppercase and the remainder to lowercase.
-
#casecmp(other_str) ⇒ -1, ...
Compares
self
andother_string
, ignoring case, and returning: - -1 ifother_string
is smaller. -
#casecmp?(other_string) ⇒ true, ...
Returns
true
ifself
andother_string
are equal after Unicode case folding, otherwisefalse
: ‘foo’.casecmp?(‘foo’) # => true ‘foo’.casecmp?(‘food’) # => false ‘food’.casecmp?(‘foo’) # => true ‘FOO’.casecmp?(‘foo’) # => true ‘foo’.casecmp?(‘FOO’) # => true. -
#center(width, padstr = ' ') ⇒ String
Centers
str
inwidth
. -
#chars ⇒ Array
Returns an array of characters in str.
-
#chomp(separator = $/) ⇒ String
Returns a new String with the given record separator removed from the end of str (if present).
-
#chomp!(separator = $/) ⇒ String?
Modifies str in place as described for String#chomp, returning str, or
nil
if no modifications were made. -
#chop ⇒ String
Returns a new String with the last character removed.
-
#chop! ⇒ String?
Processes str as for String#chop, returning str, or
nil
if str is the empty string. -
#chr ⇒ String
Returns a one-character string at the beginning of the string.
-
#clear ⇒ String
Makes string empty.
-
#codepoints ⇒ Array
Returns an array of the Integer ordinals of the characters in str.
-
#concat(*objects) ⇒ Object
Returns a new String containing the concatenation of
self
and all objects inobjects
:. -
#count([other_str]) ⇒ Integer
Each
other_str
parameter defines a set of characters to count. -
#crypt(salt_str) ⇒ String
Returns the string generated by calling
crypt(3)
standard library function withstr
andsalt_str
, in this order, as its arguments. -
#delete([other_str]) ⇒ String
Returns a copy of str with all characters in the intersection of its arguments deleted.
-
#delete!([other_str]) ⇒ String?
Performs a
delete
operation in place, returning str, ornil
if str was not modified. -
#delete_prefix(prefix) ⇒ String
Returns a copy of str with leading
prefix
deleted. -
#delete_prefix!(prefix) ⇒ self?
Deletes leading
prefix
from str, returningnil
if no change was made. -
#delete_suffix(suffix) ⇒ String
Returns a copy of str with trailing
suffix
deleted. -
#delete_suffix!(suffix) ⇒ self?
Deletes trailing
suffix
from str, returningnil
if no change was made. -
#downcase(*args) ⇒ Object
Returns a copy of str with all uppercase letters replaced with their lowercase counterparts.
-
#downcase!(*args) ⇒ Object
Downcases the contents of str, returning
nil
if no changes were made. -
#dump ⇒ String
Returns a quoted version of the string with all non-printing characters replaced by
\xHH
notation and all special characters escaped. -
#each_byte ⇒ Object
Passes each byte in str to the given block, or returns an enumerator if no block is given.
-
#each_char ⇒ Object
Passes each character in str to the given block, or returns an enumerator if no block is given.
-
#each_codepoint ⇒ Object
Passes the Integer ordinal of each character in str, also known as a codepoint when applied to Unicode strings to the given block.
-
#each_grapheme_cluster ⇒ Object
Passes each grapheme cluster in str to the given block, or returns an enumerator if no block is given.
-
#each_line(*args) ⇒ Object
Splits str using the supplied parameter as the record separator (
$/
by default), passing each substring in turn to the supplied block. -
#empty? ⇒ Boolean
Returns
true
if the length ofself
is zero,false
otherwise: “hello”.empty? # => false “ ”.empty? # => false “”.empty? # => true. -
#encode(*args) ⇒ Object
The first form returns a copy of
str
transcoded to encodingencoding
. -
#encode!(*args) ⇒ Object
The first form transcodes the contents of str from str.encoding to
encoding
. -
#encoding ⇒ Encoding
Returns the Encoding object that represents the encoding of obj.
-
#end_with?([suffixes]) ⇒ Boolean
Returns true if
str
ends with one of thesuffixes
given. -
#eql?(object) ⇒ Boolean
Returns
true
ifobject
has the same length and content; asself
;false
otherwise: s = ‘foo’ s.eql?(‘foo’) # => true s.eql?(‘food’) # => false s.eql?(‘FOO’) # => false. -
#force_encoding(encoding) ⇒ String
Changes the encoding to
encoding
and returns self. - #freeze ⇒ Object
-
#getbyte(index) ⇒ 0 .. 255
returns the indexth byte as an integer.
-
#grapheme_clusters ⇒ Array
Returns an array of grapheme clusters in str.
-
#gsub(*args) ⇒ Object
Returns a copy of str with all occurrences of pattern substituted for the second argument.
-
#gsub!(*args) ⇒ Object
Performs the substitutions of String#gsub in place, returning str, or
nil
if no substitutions were performed. -
#hash ⇒ Integer
Returns the integer hash value for
self
. -
#hex ⇒ Integer
Treats leading characters from str as a string of hexadecimal digits (with an optional sign and an optional
0x
) and returns the corresponding number. -
#include?(other_str) ⇒ Boolean
Returns
true
if str contains the given string or character. -
#index(*args) ⇒ Object
Returns the Integer index of the first occurrence of the given
substring
, ornil
if none found: ‘foo’.index(‘f’) # => 0 ‘foo’.index(‘o’) # => 1 ‘foo’.index(‘oo’) # => 1 ‘foo’.index(‘ooo’) # => nil. -
#initialize(*args) ⇒ Object
constructor
Returns a new String that is a copy of
string
. -
#replace(other_str) ⇒ String
Replaces the contents of str with the corresponding values in other_str.
-
#insert(index, other_string) ⇒ self
Inserts the given
other_string
intoself
; returnsself
. -
#inspect ⇒ String
Returns a printable version of str, surrounded by quote marks, with special characters escaped.
-
#intern ⇒ Object
Returns the Symbol corresponding to str, creating the symbol if it did not previously exist.
-
#length ⇒ Integer
Returns the count of characters (not bytes) in
self
: “x80u3042”.length # => 2 “hello”.length # => 5. -
#lines(separator = $/, chomp: false) ⇒ Array
Returns an array of lines in str split using the supplied record separator (
$/
by default). -
#ljust(integer, padstr = ' ') ⇒ String
If integer is greater than the length of str, returns a new String of length integer with str left justified and padded with padstr; otherwise, returns str.
-
#lstrip ⇒ String
Returns a copy of the receiver with leading whitespace removed.
-
#lstrip! ⇒ self?
Removes leading whitespace from the receiver.
-
#match(*args) ⇒ Object
Returns a Matchdata object (or
nil
) based onself
and the givenpattern
. -
#match?(pattern, offset = 0) ⇒ Boolean
Returns
true
orfalse
based on whether a match is found forself
andpattern
. -
#succ ⇒ String
Returns the successor to
self
. -
#succ! ⇒ self
Equivalent to String#succ, but modifies
self
in place; returnsself
. -
#oct ⇒ Integer
Treats leading characters of str as a string of octal digits (with an optional sign) and returns the corresponding number.
-
#ord ⇒ Integer
Returns the Integer ordinal of a one-character string.
-
#partition(sep) ⇒ Object
Searches sep or pattern (regexp) in the string and returns the part before it, the match, and the part after it.
-
#prepend(*other_strings) ⇒ String
Returns a new String containing the concatenation of all given
other_strings
andself
: s = ‘foo’ s.prepend(‘bar’, ‘baz’) # => “barbazfoo”. -
#replace(other_str) ⇒ String
Replaces the contents of str with the corresponding values in other_str.
-
#reverse ⇒ String
Returns a new string with the characters from str in reverse order.
-
#reverse! ⇒ String
Reverses str in place.
-
#rindex(*args) ⇒ Object
Returns the Integer index of the last occurrence of the given
substring
, ornil
if none found: ‘foo’.rindex(‘f’) # => 0 ‘foo’.rindex(‘o’) # => 2 ‘foo’.rindex(‘oo’) # => 1 ‘foo’.rindex(‘ooo’) # => nil. -
#rjust(integer, padstr = ' ') ⇒ String
If integer is greater than the length of str, returns a new String of length integer with str right justified and padded with padstr; otherwise, returns str.
-
#rpartition(sep) ⇒ Object
Searches sep or pattern (regexp) in the string from the end of the string, and returns the part before it, the match, and the part after it.
-
#rstrip ⇒ String
Returns a copy of the receiver with trailing whitespace removed.
-
#rstrip! ⇒ self?
Removes trailing whitespace from the receiver.
-
#scan(pat) ⇒ Object
Both forms iterate through str, matching the pattern (which may be a Regexp or a String).
-
#scrub(*args) ⇒ Object
If the string is invalid byte sequence then replace invalid bytes with given replacement character, else returns self.
-
#scrub!(*args) ⇒ Object
If the string is invalid byte sequence then replace invalid bytes with given replacement character, else returns self.
-
#setbyte(index, integer) ⇒ Integer
modifies the indexth byte as integer.
-
#length ⇒ Integer
Returns the count of characters (not bytes) in
self
: “x80u3042”.length # => 2 “hello”.length # => 5. -
#slice(*args) ⇒ Object
Returns the substring of
self
specified by the arguments. -
#slice!(*args) ⇒ Object
Deletes the specified portion from str, and returns the portion deleted.
-
#split(*args) ⇒ Object
Divides str into substrings based on a delimiter, returning an array of these substrings.
-
#squeeze([other_str]) ⇒ String
Builds a set of characters from the other_str parameter(s) using the procedure described for String#count.
-
#squeeze!([other_str]) ⇒ String?
Squeezes str in place, returning either str, or
nil
if no changes were made. -
#start_with?([prefixes]) ⇒ Boolean
Returns true if
str
starts with one of theprefixes
given. -
#strip ⇒ String
Returns a copy of the receiver with leading and trailing whitespace removed.
-
#strip! ⇒ self?
Removes leading and trailing whitespace from the receiver.
-
#sub(*args) ⇒ Object
Returns a copy of
str
with the first occurrence ofpattern
replaced by the second argument. -
#sub!(*args) ⇒ Object
Performs the same substitution as String#sub in-place.
-
#succ ⇒ String
Returns the successor to
self
. -
#succ! ⇒ self
Equivalent to String#succ, but modifies
self
in place; returnsself
. -
#sum(n = 16) ⇒ Integer
Returns a basic n-bit checksum of the characters in str, where n is the optional Integer parameter, defaulting to 16.
-
#swapcase(*args) ⇒ Object
Returns a copy of str with uppercase alphabetic characters converted to lowercase and lowercase characters converted to uppercase.
-
#swapcase!(*args) ⇒ Object
Equivalent to String#swapcase, but modifies the receiver in place, returning str, or
nil
if no changes were made. -
#to_c ⇒ Object
Returns a complex which denotes the string form.
-
#to_f ⇒ Float
Returns the result of interpreting leading characters in str as a floating point number.
-
#to_i(base = 10) ⇒ Integer
Returns the result of interpreting leading characters in str as an integer base base (between 2 and 36).
-
#to_r ⇒ Object
Returns the result of interpreting leading characters in
str
as a rational. -
#to_s ⇒ Object
Returns
self
. -
#to_str ⇒ Object
Returns
self
. -
#to_sym ⇒ Object
Returns the Symbol corresponding to str, creating the symbol if it did not previously exist.
-
#tr(from_str, to_str) ⇒ String
Returns a copy of
str
with the characters infrom_str
replaced by the corresponding characters into_str
. -
#tr!(from_str, to_str) ⇒ String?
Translates str in place, using the same rules as String#tr.
-
#tr_s(from_str, to_str) ⇒ String
Processes a copy of str as described under String#tr, then removes duplicate characters in regions that were affected by the translation.
-
#tr_s!(from_str, to_str) ⇒ String?
Performs String#tr_s processing on str in place, returning str, or
nil
if no changes were made. -
#undump ⇒ String
Returns an unescaped version of the string.
-
#unicode_normalize(form = :nfc) ⇒ Object
Unicode Normalization—Returns a normalized form of
str
, using Unicode normalizations NFC, NFD, NFKC, or NFKD. -
#unicode_normalize!(form = :nfc) ⇒ Object
Destructive version of String#unicode_normalize, doing Unicode normalization in place.
-
#unicode_normalized?(form = :nfc) ⇒ Boolean
Checks whether
str
is in Unicode normalization formform
, which can be any of the four values:nfc
,:nfd
,:nfkc
, or:nfkd
. -
#upcase(*args) ⇒ Object
Returns a copy of str with all lowercase letters replaced with their uppercase counterparts.
-
#upcase!(*args) ⇒ Object
Upcases the contents of str, returning
nil
if no changes were made. -
#upto(*args) ⇒ Object
With a block given, calls the block with each String value returned by successive calls to String#succ; the first value is
self
, the next isself.succ
, and so on; the sequence terminates when valueother_string
is reached; returnsself
: ‘a8’.upto(‘b6’) {|s| print s, ‘ ’ } # => “a8” Output: a8 a9 b0 b1 b2 b3 b4 b5 b6. -
#valid_encoding? ⇒ Boolean
Returns true for a string which is encoded correctly.
Methods included from Comparable
#<, #<=, #>, #>=, #between?, #clamp
Constructor Details
#new(string = '') ⇒ Object #new(string = '', encoding: encoding) ⇒ Object #new(string = '', capacity: size) ⇒ Object
Returns a new String that is a copy of string
.
With no arguments, returns the empty string with the Encoding ASCII-8BIT
:
s = String.new
s # => ""
s.encoding # => #<Encoding:ASCII-8BIT>
With the single String argument string
, returns a copy of string
with the same encoding as string
:
s = String.new("Que veut dire \u{e7}a?")
s # => "Que veut dire \u{e7}a?"
s.encoding # => #<Encoding:UTF-8>
Literal strings like ""
or here-documents always use script encoding, unlike String.new.
With keyword encoding
, returns a copy of str
with the specified encoding:
s = String.new(encoding: 'ASCII')
s.encoding # => #<Encoding:US-ASCII>
s = String.new('foo', encoding: 'ASCII')
s.encoding # => #<Encoding:US-ASCII>
Note that these are equivalent:
s0 = String.new('foo', encoding: 'ASCII')
s1 = 'foo'.force_encoding('ASCII')
s0.encoding == s1.encoding # => true
With keyword capacity
, returns a copy of str
; the given capacity
may set the size of the internal buffer, which may affect performance:
String.new(capacity: 1) # => ""
String.new(capacity: 4096) # => ""
The string
, encoding
, and capacity
arguments may all be used together:
String.new('hello', encoding: 'UTF-8', capacity: 25)
1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 |
# File 'string.c', line 1669
static VALUE
rb_str_init(int argc, VALUE *argv, VALUE str)
{
static ID keyword_ids[2];
VALUE orig, opt, venc, vcapa;
VALUE kwargs[2];
rb_encoding *enc = 0;
int n;
if (!keyword_ids[0]) {
keyword_ids[0] = rb_id_encoding();
CONST_ID(keyword_ids[1], "capacity");
}
n = rb_scan_args(argc, argv, "01:", &orig, &opt);
if (!NIL_P(opt)) {
rb_get_kwargs(opt, keyword_ids, 0, 2, kwargs);
venc = kwargs[0];
vcapa = kwargs[1];
if (venc != Qundef && !NIL_P(venc)) {
enc = rb_to_encoding(venc);
}
if (vcapa != Qundef && !NIL_P(vcapa)) {
long capa = NUM2LONG(vcapa);
long len = 0;
int termlen = enc ? rb_enc_mbminlen(enc) : 1;
if (capa < STR_BUF_MIN_SIZE) {
capa = STR_BUF_MIN_SIZE;
}
if (n == 1) {
StringValue(orig);
len = RSTRING_LEN(orig);
if (capa < len) {
capa = len;
}
if (orig == str) n = 0;
}
str_modifiable(str);
if (STR_EMBED_P(str)) { /* make noembed always */
char *new_ptr = ALLOC_N(char, (size_t)capa + termlen);
memcpy(new_ptr, RSTRING(str)->as.ary, RSTRING_EMBED_LEN_MAX + 1);
RSTRING(str)->as.heap.ptr = new_ptr;
}
else if (FL_TEST(str, STR_SHARED|STR_NOFREE)) {
const size_t size = (size_t)capa + termlen;
const char *const old_ptr = RSTRING_PTR(str);
const size_t osize = RSTRING(str)->as.heap.len + TERM_LEN(str);
char *new_ptr = ALLOC_N(char, (size_t)capa + termlen);
memcpy(new_ptr, old_ptr, osize < size ? osize : size);
FL_UNSET_RAW(str, STR_SHARED);
RSTRING(str)->as.heap.ptr = new_ptr;
}
else if (STR_HEAP_SIZE(str) != (size_t)capa + termlen) {
SIZED_REALLOC_N(RSTRING(str)->as.heap.ptr, char,
(size_t)capa + termlen, STR_HEAP_SIZE(str));
}
RSTRING(str)->as.heap.len = len;
TERM_FILL(&RSTRING(str)->as.heap.ptr[len], termlen);
if (n == 1) {
memcpy(RSTRING(str)->as.heap.ptr, RSTRING_PTR(orig), len);
rb_enc_cr_str_exact_copy(str, orig);
}
FL_SET(str, STR_NOEMBED);
RSTRING(str)->as.heap.aux.capa = capa;
}
else if (n == 1) {
rb_str_replace(str, orig);
}
if (enc) {
rb_enc_associate(str, enc);
ENC_CODERANGE_CLEAR(str);
}
}
else if (n == 1) {
rb_str_replace(str, orig);
}
return str;
}
|
Class Method Details
.try_convert(object) ⇒ Object?
If object
is a String object, returns object
.
Otherwise if object
responds to :to_str
, calls object.to_str
and returns the result.
Returns nil
if object
does not respond to :to_str
Raises an exception unless object.to_str
returns a String object.
2456 2457 2458 2459 2460 |
# File 'string.c', line 2456
static VALUE
rb_str_s_try_convert(VALUE dummy, VALUE str)
{
return rb_check_string_type(str);
}
|
Instance Method Details
#%(object) ⇒ Object
Returns the result of formatting object
into the format specification self
(see Kernel#sprintf for formatting details):
"%05d" % 123 # => "00123"
If self
contains multiple substitutions, object
must be an Array or Hash containing the values to be substituted:
"%-5s: %016x" % [ "ID", self.object_id ] # => "ID : 00002b054ec93168"
"foo = %{foo}" % {foo: 'bar'} # => "foo = bar"
"foo = %{foo}, baz = %{baz}" % {foo: 'bar', baz: 'bat'} # => "foo = bar, baz = bat"
2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 |
# File 'string.c', line 2158
static VALUE
rb_str_format_m(VALUE str, VALUE arg)
{
VALUE tmp = rb_check_array_type(arg);
if (!NIL_P(tmp)) {
return rb_str_format(RARRAY_LENINT(tmp), RARRAY_CONST_PTR(tmp), str);
}
return rb_str_format(1, &arg, str);
}
|
#*(integer) ⇒ Object
Returns a new String containing integer
copies of self
:
"Ho! " * 3 # => "Ho! Ho! Ho! "
"Ho! " * 0 # => ""
2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 |
# File 'string.c', line 2088
VALUE
rb_str_times(VALUE str, VALUE times)
{
VALUE str2;
long n, len;
char *ptr2;
int termlen;
if (times == INT2FIX(1)) {
return str_duplicate(rb_cString, str);
}
if (times == INT2FIX(0)) {
str2 = str_alloc(rb_cString);
rb_enc_copy(str2, str);
return str2;
}
len = NUM2LONG(times);
if (len < 0) {
rb_raise(rb_eArgError, "negative argument");
}
if (RSTRING_LEN(str) == 1 && RSTRING_PTR(str)[0] == 0) {
str2 = str_alloc(rb_cString);
if (!STR_EMBEDDABLE_P(len, 1)) {
RSTRING(str2)->as.heap.aux.capa = len;
RSTRING(str2)->as.heap.ptr = ZALLOC_N(char, (size_t)len + 1);
STR_SET_NOEMBED(str2);
}
STR_SET_LEN(str2, len);
rb_enc_copy(str2, str);
return str2;
}
if (len && LONG_MAX/len < RSTRING_LEN(str)) {
rb_raise(rb_eArgError, "argument too big");
}
len *= RSTRING_LEN(str);
termlen = TERM_LEN(str);
str2 = str_new0(rb_cString, 0, len, termlen);
ptr2 = RSTRING_PTR(str2);
if (len) {
n = RSTRING_LEN(str);
memcpy(ptr2, RSTRING_PTR(str), n);
while (n <= len/2) {
memcpy(ptr2 + n, ptr2, n);
n *= 2;
}
memcpy(ptr2 + n, ptr2, len-n);
}
STR_SET_LEN(str2, len);
TERM_FILL(&ptr2[len], termlen);
rb_enc_cr_str_copy_for_substr(str2, str);
return str2;
}
|
#+(other_string) ⇒ Object
Returns a new String containing other_string
concatenated to self
:
"Hello from " + self.to_s # => "Hello from main"
2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 |
# File 'string.c', line 2018
VALUE
rb_str_plus(VALUE str1, VALUE str2)
{
VALUE str3;
rb_encoding *enc;
char *ptr1, *ptr2, *ptr3;
long len1, len2;
int termlen;
StringValue(str2);
enc = rb_enc_check_str(str1, str2);
RSTRING_GETMEM(str1, ptr1, len1);
RSTRING_GETMEM(str2, ptr2, len2);
termlen = rb_enc_mbminlen(enc);
if (len1 > LONG_MAX - len2) {
rb_raise(rb_eArgError, "string size too big");
}
str3 = str_new0(rb_cString, 0, len1+len2, termlen);
ptr3 = RSTRING_PTR(str3);
memcpy(ptr3, ptr1, len1);
memcpy(ptr3+len1, ptr2, len2);
TERM_FILL(&ptr3[len1+len2], termlen);
ENCODING_CODERANGE_SET(str3, rb_enc_to_index(enc),
ENC_CODERANGE_AND(ENC_CODERANGE(str1), ENC_CODERANGE(str2)));
RB_GC_GUARD(str1);
RB_GC_GUARD(str2);
return str3;
}
|
#+ ⇒ self
Returns self
if self
is not frozen.
Otherwise. returns self.dup
, which is not frozen.
2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 |
# File 'string.c', line 2757
static VALUE
str_uplus(VALUE str)
{
if (OBJ_FROZEN(str)) {
return rb_str_dup(str);
}
else {
return str;
}
}
|
#- ⇒ Object
Returns a frozen, possibly pre-existing copy of the string.
The returned String will be deduplicated as long as it does not have any instance variables set on it.
2777 2778 2779 2780 2781 2782 2783 2784 |
# File 'string.c', line 2777
static VALUE
str_uminus(VALUE str)
{
if (!BARE_STRING_P(str) && !rb_obj_frozen_p(str)) {
str = rb_str_dup(str);
}
return rb_fstring(str);
}
|
#<<(object) ⇒ String
Returns a new String containing the concatenation of self
and object
:
s = 'foo'
s << 'bar' # => "foobar"
If object
is an Integer, the value is considered a codepoint and converted to a character before concatenation:
s = 'foo'
s << 33 # => "foo!"
Related: String#concat, which takes multiple arguments.
3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 |
# File 'string.c', line 3190
VALUE
rb_str_concat(VALUE str1, VALUE str2)
{
unsigned int code;
rb_encoding *enc = STR_ENC_GET(str1);
int encidx;
if (RB_INTEGER_TYPE_P(str2)) {
if (rb_num_to_uint(str2, &code) == 0) {
}
else if (FIXNUM_P(str2)) {
rb_raise(rb_eRangeError, "%ld out of char range", FIX2LONG(str2));
}
else {
rb_raise(rb_eRangeError, "bignum out of char range");
}
}
else {
return rb_str_append(str1, str2);
}
encidx = rb_enc_to_index(enc);
if (encidx == ENCINDEX_ASCII || encidx == ENCINDEX_US_ASCII) {
/* US-ASCII automatically extended to ASCII-8BIT */
char buf[1];
buf[0] = (char)code;
if (code > 0xFF) {
rb_raise(rb_eRangeError, "%u out of char range", code);
}
rb_str_cat(str1, buf, 1);
if (encidx == ENCINDEX_US_ASCII && code > 127) {
rb_enc_associate_index(str1, ENCINDEX_ASCII);
ENC_CODERANGE_SET(str1, ENC_CODERANGE_VALID);
}
}
else {
long pos = RSTRING_LEN(str1);
int cr = ENC_CODERANGE(str1);
int len;
char *buf;
switch (len = rb_enc_codelen(code, enc)) {
case ONIGERR_INVALID_CODE_POINT_VALUE:
rb_raise(rb_eRangeError, "invalid codepoint 0x%X in %s", code, rb_enc_name(enc));
break;
case ONIGERR_TOO_BIG_WIDE_CHAR_VALUE:
case 0:
rb_raise(rb_eRangeError, "%u out of char range", code);
break;
}
buf = ALLOCA_N(char, len + 1);
rb_enc_mbcput(code, buf, enc);
if (rb_enc_precise_mbclen(buf, buf + len + 1, enc) != len) {
rb_raise(rb_eRangeError, "invalid codepoint 0x%X in %s", code, rb_enc_name(enc));
}
rb_str_resize(str1, pos+len);
memcpy(RSTRING_PTR(str1) + pos, buf, len);
if (cr == ENC_CODERANGE_7BIT && code > 127)
cr = ENC_CODERANGE_VALID;
ENC_CODERANGE_SET(str1, cr);
}
return str1;
}
|
#<=>(other_string) ⇒ -1, ...
Compares self
and other_string
, returning:
-
-1 if
other_string
is smaller. -
0 if the two are equal.
-
1 if
other_string
is larger. -
nil
if the two are incomparable.
Examples:
'foo' <=> 'foo' # => 0
'foo' <=> 'food' # => -1
'food' <=> 'foo' # => 1
'FOO' <=> 'foo' # => -1
'foo' <=> 'FOO' # => 1
'foo' <=> 1 # => nil
3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 |
# File 'string.c', line 3451
static VALUE
rb_str_cmp_m(VALUE str1, VALUE str2)
{
int result;
VALUE s = rb_check_string_type(str2);
if (NIL_P(s)) {
return rb_invcmp(str1, str2);
}
result = rb_str_cmp(str1, s);
return INT2FIX(result);
}
|
#==(object) ⇒ Boolean #===(object) ⇒ Boolean
Returns true
if object
has the same length and content; as self
; false
otherwise:
s = 'foo'
s == 'foo' # => true
s == 'food' # => false
s == 'FOO' # => false
Returns false
if the two strings’ encodings are not compatible:
"\u{e4 f6 fc}".encode("ISO-8859-1") == ("\u{c4 d6 dc}") # => false
If object
is not an instance of String but responds to to_str
, then the two strings are compared using object.==
.
3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 |
# File 'string.c', line 3396
VALUE
rb_str_equal(VALUE str1, VALUE str2)
{
if (str1 == str2) return Qtrue;
if (!RB_TYPE_P(str2, T_STRING)) {
if (!rb_respond_to(str2, idTo_str)) {
return Qfalse;
}
return rb_equal(str2, str1);
}
return rb_str_eql_internal(str1, str2);
}
|
#==(object) ⇒ Boolean #===(object) ⇒ Boolean
Returns true
if object
has the same length and content; as self
; false
otherwise:
s = 'foo'
s == 'foo' # => true
s == 'food' # => false
s == 'FOO' # => false
Returns false
if the two strings’ encodings are not compatible:
"\u{e4 f6 fc}".encode("ISO-8859-1") == ("\u{c4 d6 dc}") # => false
If object
is not an instance of String but responds to to_str
, then the two strings are compared using object.==
.
3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 |
# File 'string.c', line 3396
VALUE
rb_str_equal(VALUE str1, VALUE str2)
{
if (str1 == str2) return Qtrue;
if (!RB_TYPE_P(str2, T_STRING)) {
if (!rb_respond_to(str2, idTo_str)) {
return Qfalse;
}
return rb_equal(str2, str1);
}
return rb_str_eql_internal(str1, str2);
}
|
#=~(regexp) ⇒ Integer? #=~(object) ⇒ Integer?
Returns the Integer index of the first substring that matches the given regexp
, or nil
if no match found:
'foo' =~ /f/ # => 0
'foo' =~ /o/ # => 1
'foo' =~ /x/ # => nil
Note: also updates Regexp-related global variables.
If the given object
is not a Regexp, returns the value returned by object =~ self
.
Note that string =~ regexp
is different from regexp =~ string
(see Regexp#=~):
number= nil
"no. 9" =~ /(?<number>\d+)/
number # => nil (not assigned)
/(?<number>\d+)/ =~ "no. 9"
number #=> "9"
3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 |
# File 'string.c', line 3936
static VALUE
rb_str_match(VALUE x, VALUE y)
{
switch (OBJ_BUILTIN_TYPE(y)) {
case T_STRING:
rb_raise(rb_eTypeError, "type mismatch: String given");
case T_REGEXP:
return rb_reg_match(y, x);
default:
return rb_funcall(y, idEqTilde, 1, x);
}
}
|
#[](index) ⇒ nil #[](start, length) ⇒ nil #[](range) ⇒ nil #[](regexp, capture = 0) ⇒ nil #[](substring) ⇒ nil
Returns the substring of self
specified by the arguments.
When the single Integer argument index
is given, returns the 1-character substring found in self
at offset index
:
'bar'[2] # => "r"
Counts backward from the end of self
if index
is negative:
'foo'[-3] # => "f"
Returns nil
if index
is out of range:
'foo'[3] # => nil
'foo'[-4] # => nil
When the two Integer arguments start
and length
are given, returns the substring of the given length
found in self
at offset start
:
'foo'[0, 2] # => "fo"
'foo'[0, 0] # => ""
Counts backward from the end of self
if start
is negative:
'foo'[-2, 2] # => "oo"
Special case: returns a new empty String if start
is equal to the length of self
:
'foo'[3, 2] # => ""
Returns nil
if start
is out of range:
'foo'[4, 2] # => nil
'foo'[-4, 2] # => nil
Returns the trailing substring of self
if length
is large:
'foo'[1, 50] # => "oo"
Returns nil
if length
is negative:
'foo'[0, -1] # => nil
When the single Range argument range
is given, derives start
and length
values from the given range
, and returns values as above:
-
'foo'[0..1]
is equivalent to'foo'[0, 2]
. -
'foo'[0...1]
is equivalent to'foo'[0, 1]
.
When the Regexp argument regexp
is given, and the capture
argument is 0
, returns the first matching substring found in self
, or nil
if none found:
'foo'[/o/] # => "o"
'foo'[/x/] # => nil
s = 'hello there'
s[/[aeiou](.)\1/] # => "ell"
s[/[aeiou](.)\1/, 0] # => "ell"
If argument capture
is given and not 0
, it should be either an Integer capture group index or a String or Symbol capture group name; the method call returns only the specified capture (see Regexp Capturing):
s = 'hello there'
s[/[aeiou](.)\1/, 1] # => "l"
s[/(?<vowel>[aeiou])(?<non_vowel>[^aeiou])/, "non_vowel"] # => "l"
s[/(?<vowel>[aeiou])(?<non_vowel>[^aeiou])/, :vowel] # => "e"
If an invalid capture group index is given, nil
is returned. If an invalid capture group name is given, IndexError
is raised.
When the single String argument substring
is given, returns the substring from self
if found, otherwise nil
:
'foo'['oo'] # => "oo"
'foo'['xx'] # => nil
String#slice is an alias for String#[].
4735 4736 4737 4738 4739 4740 4741 4742 4743 4744 4745 4746 4747 4748 4749 4750 |
# File 'string.c', line 4735
static VALUE
rb_str_aref_m(int argc, VALUE *argv, VALUE str)
{
if (argc == 2) {
if (RB_TYPE_P(argv[0], T_REGEXP)) {
return rb_str_subpat(str, argv[0], argv[1]);
}
else {
long beg = NUM2LONG(argv[0]);
long len = NUM2LONG(argv[1]);
return rb_str_substr(str, beg, len);
}
}
rb_check_arity(argc, 1, 2);
return rb_str_aref(str, argv[0]);
}
|
#[]=(integer) ⇒ Object #[]=(integer, integer) ⇒ Object #[]=(range) ⇒ Object #[]=(regexp) ⇒ Object #[]=(regexp, integer) ⇒ Object #[]=(regexp, name) ⇒ Object #[]=(other_str) ⇒ Object
Element Assignment—Replaces some or all of the content of str. The portion of the string affected is determined using the same criteria as String#[]. If the replacement string is not the same length as the text it is replacing, the string will be adjusted accordingly. If the regular expression or string is used as the index doesn’t match a position in the string, IndexError is raised. If the regular expression form is used, the optional second Integer allows you to specify which portion of the match to replace (effectively using the MatchData indexing rules. The forms that take an Integer will raise an IndexError if the value is out of range; the Range form will raise a RangeError, and the Regexp and String will raise an IndexError on negative match.
4960 4961 4962 4963 4964 4965 4966 4967 4968 4969 4970 4971 4972 4973 4974 |
# File 'string.c', line 4960
static VALUE
rb_str_aset_m(int argc, VALUE *argv, VALUE str)
{
if (argc == 3) {
if (RB_TYPE_P(argv[0], T_REGEXP)) {
rb_str_subpat_set(str, argv[0], argv[1], argv[2]);
}
else {
rb_str_splice(str, NUM2LONG(argv[0]), NUM2LONG(argv[1]), argv[2]);
}
return argv[2];
}
rb_check_arity(argc, 2, 3);
return rb_str_aset(str, argv[0], argv[1]);
}
|
#ascii_only? ⇒ Boolean
Returns true for a string which has only ASCII characters.
"abc".force_encoding("UTF-8").ascii_only? #=> true
"abc\u{6666}".force_encoding("UTF-8").ascii_only? #=> false
10438 10439 10440 10441 10442 10443 10444 |
# File 'string.c', line 10438
static VALUE
rb_str_is_ascii_only_p(VALUE str)
{
int cr = rb_enc_str_coderange(str);
return cr == ENC_CODERANGE_7BIT ? Qtrue : Qfalse;
}
|
#b ⇒ String
Returns a copied string whose encoding is ASCII-8BIT.
10400 10401 10402 10403 10404 10405 10406 10407 |
# File 'string.c', line 10400
static VALUE
rb_str_b(VALUE str)
{
VALUE str2 = str_alloc(rb_cString);
str_replace_shared_without_enc(str2, str);
ENC_CODERANGE_CLEAR(str2);
return str2;
}
|
#bytes ⇒ Array
Returns an array of bytes in str. This is a shorthand for str.each_byte.to_a
.
If a block is given, which is a deprecated form, works the same as each_byte
.
8692 8693 8694 8695 8696 8697 |
# File 'string.c', line 8692
static VALUE
rb_str_bytes(VALUE str)
{
VALUE ary = WANTARRAY("bytes", RSTRING_LEN(str));
return rb_str_enumerate_bytes(str, ary);
}
|
#bytesize ⇒ Integer
Returns the count of bytes in self
:
"\x80\u3042".bytesize # => 4
"hello".bytesize # => 5
Related: String#length.
1986 1987 1988 1989 1990 |
# File 'string.c', line 1986
static VALUE
rb_str_bytesize(VALUE str)
{
return LONG2NUM(RSTRING_LEN(str));
}
|
#byteslice(integer) ⇒ String? #byteslice(integer, integer) ⇒ String? #byteslice(range) ⇒ String?
Byte Reference—If passed a single Integer, returns a substring of one byte at that position. If passed two Integer objects, returns a substring starting at the offset given by the first, and a length given by the second. If given a Range, a substring containing bytes at offsets given by the range is returned. In all three cases, if an offset is negative, it is counted from the end of str. Returns nil
if the initial offset falls outside the string, the length is negative, or the beginning of the range is greater than the end. The encoding of the resulted string keeps original encoding.
"hello".byteslice(1) #=> "e"
"hello".byteslice(-1) #=> "o"
"hello".byteslice(1, 2) #=> "el"
"\x80\u3042".byteslice(1, 3) #=> "\u3042"
"\x03\u3042\xff".byteslice(1..3) #=> "\u3042"
5823 5824 5825 5826 5827 5828 5829 5830 5831 5832 5833 |
# File 'string.c', line 5823
static VALUE
rb_str_byteslice(int argc, VALUE *argv, VALUE str)
{
if (argc == 2) {
long beg = NUM2LONG(argv[0]);
long end = NUM2LONG(argv[1]);
return str_byte_substr(str, beg, end, TRUE);
}
rb_check_arity(argc, 1, 2);
return str_byte_aref(str, argv[0]);
}
|
#capitalize ⇒ String #capitalize([options]) ⇒ String
Returns a copy of str with the first character converted to uppercase and the remainder to lowercase.
See String#downcase for meaning of options
and use with different encodings.
"hello".capitalize #=> "Hello"
"HELLO".capitalize #=> "Hello"
"123ABC".capitalize #=> "123abc"
7127 7128 7129 7130 7131 7132 7133 7134 7135 7136 7137 7138 7139 7140 7141 7142 7143 7144 7145 |
# File 'string.c', line 7127
static VALUE
rb_str_capitalize(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_TITLECASE;
VALUE ret;
flags = check_case_options(argc, argv, flags);
enc = str_true_enc(str);
if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return str;
if (flags&ONIGENC_CASE_ASCII_ONLY) {
ret = rb_str_new(0, RSTRING_LEN(str));
rb_str_ascii_casemap(str, ret, &flags, enc);
}
else {
ret = rb_str_casemap(str, &flags, enc);
}
return ret;
}
|
#capitalize! ⇒ String? #capitalize!([options]) ⇒ String?
Modifies str by converting the first character to uppercase and the remainder to lowercase. Returns nil
if no changes are made. There is an exception for modern Georgian (mkhedruli/MTAVRULI), where the result is the same as for String#downcase, to avoid mixed case.
See String#downcase for meaning of options
and use with different encodings.
a = "hello"
a.capitalize! #=> "Hello"
a #=> "Hello"
a.capitalize! #=> nil
7092 7093 7094 7095 7096 7097 7098 7099 7100 7101 7102 7103 7104 7105 7106 7107 7108 7109 |
# File 'string.c', line 7092
static VALUE
rb_str_capitalize_bang(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_TITLECASE;
flags = check_case_options(argc, argv, flags);
str_modify_keep_cr(str);
enc = str_true_enc(str);
if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return Qnil;
if (flags&ONIGENC_CASE_ASCII_ONLY)
rb_str_ascii_casemap(str, str, &flags, enc);
else
str_shared_replace(str, rb_str_casemap(str, &flags, enc));
if (ONIGENC_CASE_MODIFIED&flags) return str;
return Qnil;
}
|
#casecmp(other_str) ⇒ -1, ...
Compares self
and other_string
, ignoring case, and returning:
-
-1 if
other_string
is smaller. -
0 if the two are equal.
-
1 if
other_string
is larger. -
nil
if the two are incomparable.
Examples:
'foo'.casecmp('foo') # => 0
'foo'.casecmp('food') # => -1
'food'.casecmp('foo') # => 1
'FOO'.casecmp('foo') # => 0
'foo'.casecmp('FOO') # => 0
'foo'.casecmp(1) # => nil
3485 3486 3487 3488 3489 3490 3491 3492 3493 |
# File 'string.c', line 3485
static VALUE
rb_str_casecmp(VALUE str1, VALUE str2)
{
VALUE s = rb_check_string_type(str2);
if (NIL_P(s)) {
return Qnil;
}
return str_casecmp(str1, s);
}
|
#casecmp?(other_string) ⇒ true, ...
Returns true
if self
and other_string
are equal after Unicode case folding, otherwise false
:
'foo'.casecmp?('foo') # => true
'foo'.casecmp?('food') # => false
'food'.casecmp?('foo') # => true
'FOO'.casecmp?('foo') # => true
'foo'.casecmp?('FOO') # => true
Returns nil
if the two values are incomparable:
'foo'.casecmp?(1) # => nil
3568 3569 3570 3571 3572 3573 3574 3575 3576 |
# File 'string.c', line 3568
static VALUE
rb_str_casecmp_p(VALUE str1, VALUE str2)
{
VALUE s = rb_check_string_type(str2);
if (NIL_P(s)) {
return Qnil;
}
return str_casecmp_p(str1, s);
}
|
#center(width, padstr = ' ') ⇒ String
Centers str
in width
. If width
is greater than the length of str
, returns a new String of length width
with str
centered and padded with padstr
; otherwise, returns str
.
"hello".center(4) #=> "hello"
"hello".center(20) #=> " hello "
"hello".center(20, '123') #=> "1231231hello12312312"
10020 10021 10022 10023 10024 |
# File 'string.c', line 10020
static VALUE
rb_str_center(int argc, VALUE *argv, VALUE str)
{
return rb_str_justify(argc, argv, str, 'c');
}
|
#chars ⇒ Array
Returns an array of characters in str. This is a shorthand for str.each_char.to_a
.
If a block is given, which is a deprecated form, works the same as each_char
.
8770 8771 8772 8773 8774 8775 |
# File 'string.c', line 8770
static VALUE
rb_str_chars(VALUE str)
{
VALUE ary = WANTARRAY("chars", rb_str_strlen(str));
return rb_str_enumerate_chars(str, ary);
}
|
#chomp(separator = $/) ⇒ String
Returns a new String with the given record separator removed from the end of str (if present). If $/
has not been changed from the default Ruby record separator, then chomp
also removes carriage return characters (that is it will remove \n
, \r
, and \r\n
). If $/
is an empty string, it will remove all trailing newlines from the string.
"hello".chomp #=> "hello"
"hello\n".chomp #=> "hello"
"hello\r\n".chomp #=> "hello"
"hello\n\r".chomp #=> "hello\n"
"hello\r".chomp #=> "hello"
"hello \n there".chomp #=> "hello \n there"
"hello".chomp("llo") #=> "he"
"hello\r\n\r\n".chomp('') #=> "hello"
"hello\r\n\r\r\n".chomp('') #=> "hello\r\n\r"
9251 9252 9253 9254 9255 9256 9257 |
# File 'string.c', line 9251
static VALUE
rb_str_chomp(int argc, VALUE *argv, VALUE str)
{
VALUE rs = chomp_rs(argc, argv);
if (NIL_P(rs)) return str_duplicate(rb_cString, str);
return rb_str_subseq(str, 0, chompped_length(str, rs));
}
|
#chomp!(separator = $/) ⇒ String?
Modifies str in place as described for String#chomp, returning str, or nil
if no modifications were made.
9217 9218 9219 9220 9221 9222 9223 9224 9225 9226 |
# File 'string.c', line 9217
static VALUE
rb_str_chomp_bang(int argc, VALUE *argv, VALUE str)
{
VALUE rs;
str_modifiable(str);
if (RSTRING_LEN(str) == 0) return Qnil;
rs = chomp_rs(argc, argv);
if (NIL_P(rs)) return Qnil;
return rb_str_chomp_string(str, rs);
}
|
#chop ⇒ String
Returns a new String with the last character removed. If the string ends with \r\n
, both characters are removed. Applying chop
to an empty string returns an empty string. String#chomp is often a safer alternative, as it leaves the string unchanged if it doesn’t end in a record separator.
"string\r\n".chop #=> "string"
"string\n\r".chop #=> "string\n"
"string\n".chop #=> "string"
"string".chop #=> "strin"
"x".chop.chop #=> ""
9064 9065 9066 9067 9068 |
# File 'string.c', line 9064
static VALUE
rb_str_chop(VALUE str)
{
return rb_str_subseq(str, 0, chopped_length(str));
}
|
#chop! ⇒ String?
Processes str as for String#chop, returning str, or nil
if str is the empty string. See also String#chomp!.
9028 9029 9030 9031 9032 9033 9034 9035 9036 9037 9038 9039 9040 9041 9042 9043 |
# File 'string.c', line 9028
static VALUE
rb_str_chop_bang(VALUE str)
{
str_modify_keep_cr(str);
if (RSTRING_LEN(str) > 0) {
long len;
len = chopped_length(str);
STR_SET_LEN(str, len);
TERM_FILL(&RSTRING_PTR(str)[len], TERM_LEN(str));
if (ENC_CODERANGE(str) != ENC_CODERANGE_7BIT) {
ENC_CODERANGE_CLEAR(str);
}
return str;
}
return Qnil;
}
|
#chr ⇒ String
Returns a one-character string at the beginning of the string.
a = "abcde"
a.chr #=> "a"
5634 5635 5636 5637 5638 |
# File 'string.c', line 5634
static VALUE
rb_str_chr(VALUE str)
{
return rb_str_substr(str, 0, 1);
}
|
#clear ⇒ String
Makes string empty.
a = "abcde"
a.clear #=> ""
5610 5611 5612 5613 5614 5615 5616 5617 5618 5619 5620 5621 5622 |
# File 'string.c', line 5610
static VALUE
rb_str_clear(VALUE str)
{
str_discard(str);
STR_SET_EMBED(str);
STR_SET_EMBED_LEN(str, 0);
RSTRING_PTR(str)[0] = 0;
if (rb_enc_asciicompat(STR_ENC_GET(str)))
ENC_CODERANGE_SET(str, ENC_CODERANGE_7BIT);
else
ENC_CODERANGE_SET(str, ENC_CODERANGE_VALID);
return str;
}
|
#codepoints ⇒ Array
Returns an array of the Integer ordinals of the characters in str. This is a shorthand for str.each_codepoint.to_a
.
If a block is given, which is a deprecated form, works the same as each_codepoint
.
8845 8846 8847 8848 8849 8850 |
# File 'string.c', line 8845
static VALUE
rb_str_codepoints(VALUE str)
{
VALUE ary = WANTARRAY("codepoints", rb_str_strlen(str));
return rb_str_enumerate_codepoints(str, ary);
}
|
#concat(*objects) ⇒ Object
Returns a new String containing the concatenation of self
and all objects in objects
:
s = 'foo'
s.concat('bar', 'baz') # => "foobarbaz"
For each given object object
that is an Integer, the value is considered a codepoint and converted to a character before concatenation:
s = 'foo'
s.concat(32, 'bar', 32, 'baz') # => "foo bar baz"
Related: String#<<, which takes a single argument.
3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 |
# File 'string.c', line 3153
static VALUE
rb_str_concat_multi(int argc, VALUE *argv, VALUE str)
{
str_modifiable(str);
if (argc == 1) {
return rb_str_concat(str, argv[0]);
}
else if (argc > 1) {
int i;
VALUE arg_str = rb_str_tmp_new(0);
rb_enc_copy(arg_str, str);
for (i = 0; i < argc; i++) {
rb_str_concat(arg_str, argv[i]);
}
rb_str_buf_append(str, arg_str);
}
return str;
}
|
#count([other_str]) ⇒ Integer
Each other_str
parameter defines a set of characters to count. The intersection of these sets defines the characters to count in str
. Any other_str
that starts with a caret ^
is negated. The sequence c1-c2
means all characters between c1 and c2. The backslash character \
can be used to escape ^
or -
and is otherwise ignored unless it appears at the end of a sequence or the end of a other_str
.
a = "hello world"
a.count "lo" #=> 5
a.count "lo", "o" #=> 2
a.count "hello", "^l" #=> 4
a.count "ej-m" #=> 4
"hello^world".count "\\^aeiou" #=> 4
"hello-world".count "a\\-eo" #=> 4
c = "hello world\\r\\n"
c.count "\\" #=> 2
c.count "\\A" #=> 0
c.count "X-\\w" #=> 3
7936 7937 7938 7939 7940 7941 7942 7943 7944 7945 7946 7947 7948 7949 7950 7951 7952 7953 7954 7955 7956 7957 7958 7959 7960 7961 7962 7963 7964 7965 7966 7967 7968 7969 7970 7971 7972 7973 7974 7975 7976 7977 7978 7979 7980 7981 7982 7983 7984 7985 7986 7987 7988 7989 7990 7991 7992 7993 7994 7995 7996 7997 7998 7999 8000 8001 8002 8003 8004 |
# File 'string.c', line 7936
static VALUE
rb_str_count(int argc, VALUE *argv, VALUE str)
{
char table[TR_TABLE_SIZE];
rb_encoding *enc = 0;
VALUE del = 0, nodel = 0, tstr;
char *s, *send;
int i;
int ascompat;
rb_check_arity(argc, 1, UNLIMITED_ARGUMENTS);
tstr = argv[0];
StringValue(tstr);
enc = rb_enc_check(str, tstr);
if (argc == 1) {
const char *ptstr;
if (RSTRING_LEN(tstr) == 1 && rb_enc_asciicompat(enc) &&
(ptstr = RSTRING_PTR(tstr),
ONIGENC_IS_ALLOWED_REVERSE_MATCH(enc, (const unsigned char *)ptstr, (const unsigned char *)ptstr+1)) &&
!is_broken_string(str)) {
int n = 0;
int clen;
unsigned char c = rb_enc_codepoint_len(ptstr, ptstr+1, &clen, enc);
s = RSTRING_PTR(str);
if (!s || RSTRING_LEN(str) == 0) return INT2FIX(0);
send = RSTRING_END(str);
while (s < send) {
if (*(unsigned char*)s++ == c) n++;
}
return INT2NUM(n);
}
}
tr_setup_table(tstr, table, TRUE, &del, &nodel, enc);
for (i=1; i<argc; i++) {
tstr = argv[i];
StringValue(tstr);
enc = rb_enc_check(str, tstr);
tr_setup_table(tstr, table, FALSE, &del, &nodel, enc);
}
s = RSTRING_PTR(str);
if (!s || RSTRING_LEN(str) == 0) return INT2FIX(0);
send = RSTRING_END(str);
ascompat = rb_enc_asciicompat(enc);
i = 0;
while (s < send) {
unsigned int c;
if (ascompat && (c = *(unsigned char*)s) < 0x80) {
if (table[c]) {
i++;
}
s++;
}
else {
int clen;
c = rb_enc_codepoint_len(s, send, &clen, enc);
if (tr_find(c, table, del, nodel)) {
i++;
}
s += clen;
}
}
return INT2NUM(i);
}
|
#crypt(salt_str) ⇒ String
Returns the string generated by calling crypt(3)
standard library function with str
and salt_str
, in this order, as its arguments. Please do not use this method any longer. It is legacy; provided only for backward compatibility with ruby scripts in earlier days. It is bad to use in contemporary programs for several reasons:
-
Behaviour of C’s
crypt(3)
depends on the OS it is run. The generated string lacks data portability. -
On some OSes such as Mac OS,
crypt(3)
never fails (i.e. silently ends up in unexpected results). -
On some OSes such as Mac OS,
crypt(3)
is not thread safe. -
So-called “traditional” usage of
crypt(3)
is very very very weak. According to its manpage, Linux’s traditionalcrypt(3)
output has only 2**56 variations; too easy to brute force today. And this is the default behaviour. -
In order to make things robust some OSes implement so-called “modular” usage. To go through, you have to do a complex build-up of the
salt_str
parameter, by hand. Failure in generation of a proper salt string tends not to yield any errors; typos in parameters are normally not detectable.-
For instance, in the following example, the second invocation of String#crypt is wrong; it has a typo in “round=” (lacks “s”). However the call does not fail and something unexpected is generated.
"foo".crypt("$5$rounds=1000$salt$") # OK, proper usage "foo".crypt("$5$round=1000$salt$") # Typo not detected
-
-
Even in the “modular” mode, some hash functions are considered archaic and no longer recommended at all; for instance module
$1$
is officially abandoned by its author: see phk.freebsd.dk/sagas/md5crypt_eol.html . For another instance module$3$
is considered completely broken: see the manpage of FreeBSD. -
On some OS such as Mac OS, there is no modular mode. Yet, as written above,
crypt(3)
on Mac OS never fails. This means even if you build up a proper salt string it generates a traditional DES hash anyways, and there is no way for you to be aware of."foo".crypt("$5$rounds=1000$salt$") # => "$5fNPQMxC5j6."
If for some reason you cannot migrate to other secure contemporary password hashing algorithms, install the string-crypt gem and require 'string/crypt'
to continue using it.
9733 9734 9735 9736 9737 9738 9739 9740 9741 9742 9743 9744 9745 9746 9747 9748 9749 9750 9751 9752 9753 9754 9755 9756 9757 9758 9759 9760 9761 9762 9763 9764 9765 9766 9767 9768 9769 9770 9771 9772 9773 9774 9775 9776 9777 9778 9779 9780 9781 9782 9783 9784 9785 9786 9787 9788 9789 9790 |
# File 'string.c', line 9733
static VALUE
rb_str_crypt(VALUE str, VALUE salt)
{
#ifdef HAVE_CRYPT_R
VALUE databuf;
struct crypt_data *data;
# define CRYPT_END() ALLOCV_END(databuf)
#else
extern char *crypt(const char *, const char *);
# define CRYPT_END() (void)0
#endif
VALUE result;
const char *s, *saltp;
char *res;
#ifdef BROKEN_CRYPT
char salt_8bit_clean[3];
#endif
StringValue(salt);
mustnot_wchar(str);
mustnot_wchar(salt);
if (RSTRING_LEN(salt) < 2) {
goto short_salt;
}
s = StringValueCStr(str);
saltp = RSTRING_PTR(salt);
if (!saltp[0] || !saltp[1]) goto short_salt;
#ifdef BROKEN_CRYPT
if (!ISASCII((unsigned char)saltp[0]) || !ISASCII((unsigned char)saltp[1])) {
salt_8bit_clean[0] = saltp[0] & 0x7f;
salt_8bit_clean[1] = saltp[1] & 0x7f;
salt_8bit_clean[2] = '\0';
saltp = salt_8bit_clean;
}
#endif
#ifdef HAVE_CRYPT_R
data = ALLOCV(databuf, sizeof(struct crypt_data));
# ifdef HAVE_STRUCT_CRYPT_DATA_INITIALIZED
data->initialized = 0;
# endif
res = crypt_r(s, saltp, data);
#else
res = crypt(s, saltp);
#endif
if (!res) {
int err = errno;
CRYPT_END();
rb_syserr_fail(err, "crypt");
}
result = rb_str_new_cstr(res);
CRYPT_END();
return result;
short_salt:
rb_raise(rb_eArgError, "salt too short (need >=2 bytes)");
UNREACHABLE_RETURN(Qundef);
}
|
#delete([other_str]) ⇒ String
Returns a copy of str with all characters in the intersection of its arguments deleted. Uses the same rules for building the set of characters as String#count.
"hello".delete "l","lo" #=> "heo"
"hello".delete "lo" #=> "he"
"hello".delete "aeiou", "^e" #=> "hell"
"hello".delete "ej-m" #=> "ho"
7755 7756 7757 7758 7759 7760 7761 |
# File 'string.c', line 7755
static VALUE
rb_str_delete(int argc, VALUE *argv, VALUE str)
{
str = str_duplicate(rb_cString, str);
rb_str_delete_bang(argc, argv, str);
return str;
}
|
#delete!([other_str]) ⇒ String?
Performs a delete
operation in place, returning str, or nil
if str was not modified.
7679 7680 7681 7682 7683 7684 7685 7686 7687 7688 7689 7690 7691 7692 7693 7694 7695 7696 7697 7698 7699 7700 7701 7702 7703 7704 7705 7706 7707 7708 7709 7710 7711 7712 7713 7714 7715 7716 7717 7718 7719 7720 7721 7722 7723 7724 7725 7726 7727 7728 7729 7730 7731 7732 7733 7734 7735 7736 7737 7738 |
# File 'string.c', line 7679
static VALUE
rb_str_delete_bang(int argc, VALUE *argv, VALUE str)
{
char squeez[TR_TABLE_SIZE];
rb_encoding *enc = 0;
char *s, *send, *t;
VALUE del = 0, nodel = 0;
int modify = 0;
int i, ascompat, cr;
if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return Qnil;
rb_check_arity(argc, 1, UNLIMITED_ARGUMENTS);
for (i=0; i<argc; i++) {
VALUE s = argv[i];
StringValue(s);
enc = rb_enc_check(str, s);
tr_setup_table(s, squeez, i==0, &del, &nodel, enc);
}
str_modify_keep_cr(str);
ascompat = rb_enc_asciicompat(enc);
s = t = RSTRING_PTR(str);
send = RSTRING_END(str);
cr = ascompat ? ENC_CODERANGE_7BIT : ENC_CODERANGE_VALID;
while (s < send) {
unsigned int c;
int clen;
if (ascompat && (c = *(unsigned char*)s) < 0x80) {
if (squeez[c]) {
modify = 1;
}
else {
if (t != s) *t = c;
t++;
}
s++;
}
else {
c = rb_enc_codepoint_len(s, send, &clen, enc);
if (tr_find(c, squeez, del, nodel)) {
modify = 1;
}
else {
if (t != s) rb_enc_mbcput(c, t, enc);
t += clen;
if (cr == ENC_CODERANGE_7BIT) cr = ENC_CODERANGE_VALID;
}
s += clen;
}
}
TERM_FILL(t, TERM_LEN(str));
STR_SET_LEN(str, t - RSTRING_PTR(str));
ENC_CODERANGE_SET(str, cr);
if (modify) return str;
return Qnil;
}
|
#delete_prefix(prefix) ⇒ String
Returns a copy of str with leading prefix
deleted.
"hello".delete_prefix("hel") #=> "lo"
"hello".delete_prefix("llo") #=> "hello"
10255 10256 10257 10258 10259 10260 10261 10262 10263 10264 |
# File 'string.c', line 10255
static VALUE
rb_str_delete_prefix(VALUE str, VALUE prefix)
{
long prefixlen;
prefixlen = deleted_prefix_length(str, prefix);
if (prefixlen <= 0) return str_duplicate(rb_cString, str);
return rb_str_subseq(str, prefixlen, RSTRING_LEN(str) - prefixlen);
}
|
#delete_prefix!(prefix) ⇒ self?
Deletes leading prefix
from str, returning nil
if no change was made.
"hello".delete_prefix!("hel") #=> "lo"
"hello".delete_prefix!("llo") #=> nil
10233 10234 10235 10236 10237 10238 10239 10240 10241 10242 10243 |
# File 'string.c', line 10233
static VALUE
rb_str_delete_prefix_bang(VALUE str, VALUE prefix)
{
long prefixlen;
str_modify_keep_cr(str);
prefixlen = deleted_prefix_length(str, prefix);
if (prefixlen <= 0) return Qnil;
return rb_str_drop_bytes(str, prefixlen);
}
|
#delete_suffix(suffix) ⇒ String
Returns a copy of str with trailing suffix
deleted.
"hello".delete_suffix("llo") #=> "he"
"hello".delete_suffix("hel") #=> "hello"
10341 10342 10343 10344 10345 10346 10347 10348 10349 10350 |
# File 'string.c', line 10341
static VALUE
rb_str_delete_suffix(VALUE str, VALUE suffix)
{
long suffixlen;
suffixlen = deleted_suffix_length(str, suffix);
if (suffixlen <= 0) return str_duplicate(rb_cString, str);
return rb_str_subseq(str, 0, RSTRING_LEN(str) - suffixlen);
}
|
#delete_suffix!(suffix) ⇒ self?
Deletes trailing suffix
from str, returning nil
if no change was made.
"hello".delete_suffix!("llo") #=> "he"
"hello".delete_suffix!("hel") #=> nil
10311 10312 10313 10314 10315 10316 10317 10318 10319 10320 10321 10322 10323 10324 10325 10326 10327 10328 10329 |
# File 'string.c', line 10311
static VALUE
rb_str_delete_suffix_bang(VALUE str, VALUE suffix)
{
long olen, suffixlen, len;
str_modifiable(str);
suffixlen = deleted_suffix_length(str, suffix);
if (suffixlen <= 0) return Qnil;
olen = RSTRING_LEN(str);
str_modify_keep_cr(str);
len = olen - suffixlen;
STR_SET_LEN(str, len);
TERM_FILL(&RSTRING_PTR(str)[len], TERM_LEN(str));
if (ENC_CODERANGE(str) != ENC_CODERANGE_7BIT) {
ENC_CODERANGE_CLEAR(str);
}
return str;
}
|
#downcase ⇒ String #downcase([options]) ⇒ String
Returns a copy of str with all uppercase letters replaced with their lowercase counterparts. Which letters exactly are replaced, and by which other letters, depends on the presence or absence of options, and on the encoding
of the string.
The meaning of the options
is as follows:
- No option
-
Full Unicode case mapping, suitable for most languages (see :turkic and :lithuanian options below for exceptions). Context-dependent case mapping as described in Table 3-14 of the Unicode standard is currently not supported.
- :ascii
-
Only the ASCII region, i.e. the characters “A” to “Z” and “a” to “z”, are affected. This option cannot be combined with any other option.
- :turkic
-
Full Unicode case mapping, adapted for Turkic languages (Turkish, Azerbaijani, …). This means that upper case I is mapped to lower case dotless i, and so on.
- :lithuanian
-
Currently, just full Unicode case mapping. In the future, full Unicode case mapping adapted for Lithuanian (keeping the dot on the lower case i even if there is an accent on top).
- :fold
-
Only available on
downcase
anddowncase!
. Unicode case folding, which is more far-reaching than Unicode case mapping. This option currently cannot be combined with any other option (i.e. there is currently no variant for turkic languages).
Please note that several assumptions that are valid for ASCII-only case conversions do not hold for more general case conversions. For example, the length of the result may not be the same as the length of the input (neither in characters nor in bytes), some roundtrip assumptions (e.g. str.downcase == str.upcase.downcase) may not apply, and Unicode normalization (i.e. String#unicode_normalize) is not necessarily maintained by case mapping operations.
Non-ASCII case mapping/folding is currently supported for UTF-8, UTF-16BE/LE, UTF-32BE/LE, and ISO-8859-1~16 Strings/Symbols. This support will be extended to other encodings.
"hEllO".downcase #=> "hello"
7048 7049 7050 7051 7052 7053 7054 7055 7056 7057 7058 7059 7060 7061 7062 7063 7064 7065 7066 7067 7068 7069 7070 7071 |
# File 'string.c', line 7048
static VALUE
rb_str_downcase(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_DOWNCASE;
VALUE ret;
flags = check_case_options(argc, argv, flags);
enc = str_true_enc(str);
if (case_option_single_p(flags, enc, str)) {
ret = rb_str_new(RSTRING_PTR(str), RSTRING_LEN(str));
str_enc_copy(ret, str);
downcase_single(ret);
}
else if (flags&ONIGENC_CASE_ASCII_ONLY) {
ret = rb_str_new(0, RSTRING_LEN(str));
rb_str_ascii_casemap(str, ret, &flags, enc);
}
else {
ret = rb_str_casemap(str, &flags, enc);
}
return ret;
}
|
#downcase! ⇒ String? #downcase!([options]) ⇒ String?
Downcases the contents of str, returning nil
if no changes were made.
See String#downcase for meaning of options
and use with different encodings.
6975 6976 6977 6978 6979 6980 6981 6982 6983 6984 6985 6986 6987 6988 6989 6990 6991 6992 6993 6994 6995 |
# File 'string.c', line 6975
static VALUE
rb_str_downcase_bang(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_DOWNCASE;
flags = check_case_options(argc, argv, flags);
str_modify_keep_cr(str);
enc = str_true_enc(str);
if (case_option_single_p(flags, enc, str)) {
if (downcase_single(str))
flags |= ONIGENC_CASE_MODIFIED;
}
else if (flags&ONIGENC_CASE_ASCII_ONLY)
rb_str_ascii_casemap(str, str, &flags, enc);
else
str_shared_replace(str, rb_str_casemap(str, &flags, enc));
if (ONIGENC_CASE_MODIFIED&flags) return str;
return Qnil;
}
|
#dump ⇒ String
Returns a quoted version of the string with all non-printing characters replaced by \xHH
notation and all special characters escaped.
This method can be used for round-trip: if the resulting new_str
is eval’ed, it will produce the original string.
"hello \n ''".dump #=> "\"hello \\n ''\""
"\f\x00\xff\\\"".dump #=> "\"\\f\\x00\\xFF\\\\\\\"\""
See also String#undump.
6268 6269 6270 6271 6272 6273 6274 6275 6276 6277 6278 6279 6280 6281 6282 6283 6284 6285 6286 6287 6288 6289 6290 6291 6292 6293 6294 6295 6296 6297 6298 6299 6300 6301 6302 6303 6304 6305 6306 6307 6308 6309 6310 6311 6312 6313 6314 6315 6316 6317 6318 6319 6320 6321 6322 6323 6324 6325 6326 6327 6328 6329 6330 6331 6332 6333 6334 6335 6336 6337 6338 6339 6340 6341 6342 6343 6344 6345 6346 6347 6348 6349 6350 6351 6352 6353 6354 6355 6356 6357 6358 6359 6360 6361 6362 6363 6364 6365 6366 6367 6368 6369 6370 6371 6372 6373 6374 6375 6376 6377 6378 6379 6380 6381 6382 6383 6384 6385 6386 6387 6388 6389 6390 6391 6392 6393 6394 6395 6396 6397 6398 6399 6400 6401 6402 6403 6404 6405 6406 6407 6408 6409 6410 6411 6412 6413 |
# File 'string.c', line 6268
VALUE
rb_str_dump(VALUE str)
{
int encidx = rb_enc_get_index(str);
rb_encoding *enc = rb_enc_from_index(encidx);
long len;
const char *p, *pend;
char *q, *qend;
VALUE result;
int u8 = (encidx == rb_utf8_encindex());
static const char nonascii_suffix[] = ".dup.force_encoding(\"%s\")";
len = 2; /* "" */
if (!rb_enc_asciicompat(enc)) {
len += strlen(nonascii_suffix) - rb_strlen_lit("%s");
len += strlen(enc->name);
}
p = RSTRING_PTR(str); pend = p + RSTRING_LEN(str);
while (p < pend) {
int clen;
unsigned char c = *p++;
switch (c) {
case '"': case '\\':
case '\n': case '\r':
case '\t': case '\f':
case '\013': case '\010': case '\007': case '\033':
clen = 2;
break;
case '#':
clen = IS_EVSTR(p, pend) ? 2 : 1;
break;
default:
if (ISPRINT(c)) {
clen = 1;
}
else {
if (u8 && c > 0x7F) { /* \u notation */
int n = rb_enc_precise_mbclen(p-1, pend, enc);
if (MBCLEN_CHARFOUND_P(n)) {
unsigned int cc = rb_enc_mbc_to_codepoint(p-1, pend, enc);
if (cc <= 0xFFFF)
clen = 6; /* \uXXXX */
else if (cc <= 0xFFFFF)
clen = 9; /* \u{XXXXX} */
else
clen = 10; /* \u{XXXXXX} */
p += MBCLEN_CHARFOUND_LEN(n)-1;
break;
}
}
clen = 4; /* \xNN */
}
break;
}
if (clen > LONG_MAX - len) {
rb_raise(rb_eRuntimeError, "string size too big");
}
len += clen;
}
result = rb_str_new(0, len);
p = RSTRING_PTR(str); pend = p + RSTRING_LEN(str);
q = RSTRING_PTR(result); qend = q + len + 1;
*q++ = '"';
while (p < pend) {
unsigned char c = *p++;
if (c == '"' || c == '\\') {
*q++ = '\\';
*q++ = c;
}
else if (c == '#') {
if (IS_EVSTR(p, pend)) *q++ = '\\';
*q++ = '#';
}
else if (c == '\n') {
*q++ = '\\';
*q++ = 'n';
}
else if (c == '\r') {
*q++ = '\\';
*q++ = 'r';
}
else if (c == '\t') {
*q++ = '\\';
*q++ = 't';
}
else if (c == '\f') {
*q++ = '\\';
*q++ = 'f';
}
else if (c == '\013') {
*q++ = '\\';
*q++ = 'v';
}
else if (c == '\010') {
*q++ = '\\';
*q++ = 'b';
}
else if (c == '\007') {
*q++ = '\\';
*q++ = 'a';
}
else if (c == '\033') {
*q++ = '\\';
*q++ = 'e';
}
else if (ISPRINT(c)) {
*q++ = c;
}
else {
*q++ = '\\';
if (u8) {
int n = rb_enc_precise_mbclen(p-1, pend, enc) - 1;
if (MBCLEN_CHARFOUND_P(n)) {
int cc = rb_enc_mbc_to_codepoint(p-1, pend, enc);
p += n;
if (cc <= 0xFFFF)
snprintf(q, qend-q, "u%04X", cc); /* \uXXXX */
else
snprintf(q, qend-q, "u{%X}", cc); /* \u{XXXXX} or \u{XXXXXX} */
q += strlen(q);
continue;
}
}
snprintf(q, qend-q, "x%02X", c);
q += 3;
}
}
*q++ = '"';
*q = '\0';
if (!rb_enc_asciicompat(enc)) {
snprintf(q, qend-q, nonascii_suffix, enc->name);
encidx = rb_ascii8bit_encindex();
}
/* result from dump is ASCII */
rb_enc_associate_index(result, encidx);
ENC_CODERANGE_SET(result, ENC_CODERANGE_7BIT);
return result;
}
|
#each_byte {|integer| ... } ⇒ String #each_byte ⇒ Object
Passes each byte in str to the given block, or returns an enumerator if no block is given.
"hello".each_byte {|c| print c, ' ' }
produces:
104 101 108 108 111
8674 8675 8676 8677 8678 8679 |
# File 'string.c', line 8674
static VALUE
rb_str_each_byte(VALUE str)
{
RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_byte_size);
return rb_str_enumerate_bytes(str, 0);
}
|
#each_char {|cstr| ... } ⇒ String #each_char ⇒ Object
Passes each character in str to the given block, or returns an enumerator if no block is given.
"hello".each_char {|c| print c, ' ' }
produces:
h e l l o
8752 8753 8754 8755 8756 8757 |
# File 'string.c', line 8752
static VALUE
rb_str_each_char(VALUE str)
{
RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
return rb_str_enumerate_chars(str, 0);
}
|
#each_codepoint {|integer| ... } ⇒ String #each_codepoint ⇒ Object
Passes the Integer ordinal of each character in str, also known as a codepoint when applied to Unicode strings to the given block. For encodings other than UTF-8/UTF-16(BE|LE)/UTF-32(BE|LE), values are directly derived from the binary representation of each character.
If no block is given, an enumerator is returned instead.
"hello\u0639".each_codepoint {|c| print c, ' ' }
produces:
104 101 108 108 111 1593
8826 8827 8828 8829 8830 8831 |
# File 'string.c', line 8826
static VALUE
rb_str_each_codepoint(VALUE str)
{
RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
return rb_str_enumerate_codepoints(str, 0);
}
|
#each_grapheme_cluster {|cstr| ... } ⇒ String #each_grapheme_cluster ⇒ Object
Passes each grapheme cluster in str to the given block, or returns an enumerator if no block is given. Unlike String#each_char, this enumerates by grapheme clusters defined by Unicode Standard Annex #29 unicode.org/reports/tr29/
"a\u0300".each_char.to_a.size #=> 2
"a\u0300".each_grapheme_cluster.to_a.size #=> 1
8976 8977 8978 8979 8980 8981 |
# File 'string.c', line 8976
static VALUE
rb_str_each_grapheme_cluster(VALUE str)
{
RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_grapheme_cluster_size);
return rb_str_enumerate_grapheme_clusters(str, 0);
}
|
#each_line(separator = $/, chomp: false) {|substr| ... } ⇒ String #each_line(separator = $/, chomp: false) ⇒ Object
Splits str using the supplied parameter as the record separator ($/
by default), passing each substring in turn to the supplied block. If a zero-length record separator is supplied, the string is split into paragraphs delimited by multiple successive newlines.
If chomp
is true
, separator
will be removed from the end of each line.
If no block is given, an enumerator is returned instead.
"hello\nworld".each_line {|s| p s}
# prints:
# "hello\n"
# "world"
"hello\nworld".each_line('l') {|s| p s}
# prints:
# "hel"
# "l"
# "o\nworl"
# "d"
"hello\n\n\nworld".each_line('') {|s| p s}
# prints
# "hello\n\n"
# "world"
"hello\nworld".each_line(chomp: true) {|s| p s}
# prints:
# "hello"
# "world"
"hello\nworld".each_line('l', chomp: true) {|s| p s}
# prints:
# "he"
# ""
# "o\nwor"
# "d"
8606 8607 8608 8609 8610 8611 |
# File 'string.c', line 8606
static VALUE
rb_str_each_line(int argc, VALUE *argv, VALUE str)
{
RETURN_SIZED_ENUMERATOR(str, argc, argv, 0);
return rb_str_enumerate_lines(argc, argv, str, 0);
}
|
#empty? ⇒ Boolean
Returns true
if the length of self
is zero, false
otherwise:
"hello".empty? # => false
" ".empty? # => false
"".empty? # => true
2002 2003 2004 2005 2006 2007 2008 |
# File 'string.c', line 2002
static VALUE
rb_str_empty(VALUE str)
{
if (RSTRING_LEN(str) == 0)
return Qtrue;
return Qfalse;
}
|
#encode(encoding, **options) ⇒ String #encode(dst_encoding, src_encoding, **options) ⇒ String #encode(**options) ⇒ String
The first form returns a copy of str
transcoded to encoding encoding
. The second form returns a copy of str
transcoded from src_encoding to dst_encoding. The last form returns a copy of str
transcoded to Encoding.default_internal
.
By default, the first and second form raise Encoding::UndefinedConversionError for characters that are undefined in the destination encoding, and Encoding::InvalidByteSequenceError for invalid byte sequences in the source encoding. The last form by default does not raise exceptions but uses replacement strings.
The options
keyword arguments give details for conversion. The arguments are:
- :invalid
-
If the value is
:replace
, #encode replaces invalid byte sequences instr
with the replacement character. The default is to raise the Encoding::InvalidByteSequenceError exception - :undef
-
If the value is
:replace
, #encode replaces characters which are undefined in the destination encoding with the replacement character. The default is to raise the Encoding::UndefinedConversionError. - :replace
-
Sets the replacement string to the given value. The default replacement string is “uFFFD” for Unicode encoding forms, and “?” otherwise.
- :fallback
-
Sets the replacement string by the given object for undefined character. The object should be a Hash, a Proc, a Method, or an object which has [] method. Its key is an undefined character encoded in the source encoding of current transcoder. Its value can be any encoding until it can be converted into the destination encoding of the transcoder.
- :xml
-
The value must be
:text
or:attr
. If the value is:text
#encode replaces undefined characters with their (upper-case hexadecimal) numeric character references. ‘&’, ‘<’, and ‘>’ are converted to “&”, “<”, and “>”, respectively. If the value is:attr
, #encode also quotes the replacement result (using ‘“’), and replaces ‘”’ with “"”. - :cr_newline
-
Replaces LF (“n”) with CR (“r”) if value is true.
- :crlf_newline
-
Replaces LF (“n”) with CRLF (“rn”) if value is true.
- :universal_newline
-
Replaces CRLF (“rn”) and CR (“r”) with LF (“n”) if value is true.
2877 2878 2879 2880 2881 2882 2883 |
# File 'transcode.c', line 2877
static VALUE
str_encode(int argc, VALUE *argv, VALUE str)
{
VALUE newstr = str;
int encidx = str_transcode(argc, argv, &newstr);
return encoded_dup(newstr, str, encidx);
}
|
#encode!(encoding, **options) ⇒ String #encode!(dst_encoding, src_encoding, **options) ⇒ String
The first form transcodes the contents of str from str.encoding to encoding
. The second form transcodes the contents of str from src_encoding to dst_encoding. The options
keyword arguments give details for conversion. See String#encode for details. Returns the string even if no changes were made.
2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 |
# File 'transcode.c', line 2799
static VALUE
str_encode_bang(int argc, VALUE *argv, VALUE str)
{
VALUE newstr;
int encidx;
rb_check_frozen(str);
newstr = str;
encidx = str_transcode(argc, argv, &newstr);
if (encidx < 0) return str;
if (newstr == str) {
rb_enc_associate_index(str, encidx);
return str;
}
rb_str_shared_replace(str, newstr);
return str_encode_associate(str, encidx);
}
|
#encoding ⇒ Encoding
Returns the Encoding object that represents the encoding of obj.
1201 1202 1203 1204 1205 1206 1207 1208 1209 |
# File 'encoding.c', line 1201
VALUE
rb_obj_encoding(VALUE obj)
{
int idx = rb_enc_get_index(obj);
if (idx < 0) {
rb_raise(rb_eTypeError, "unknown encoding");
}
return rb_enc_from_encoding_index(idx & ENC_INDEX_MASK);
}
|
#end_with?([suffixes]) ⇒ Boolean
Returns true if str
ends with one of the suffixes
given.
"hello".end_with?("ello") #=> true
# returns true if one of the +suffixes+ matches.
"hello".end_with?("heaven", "ello") #=> true
"hello".end_with?("heaven", "paradise") #=> false
10168 10169 10170 10171 10172 10173 10174 10175 10176 10177 10178 10179 10180 10181 10182 10183 10184 10185 10186 10187 10188 10189 |
# File 'string.c', line 10168
static VALUE
rb_str_end_with(int argc, VALUE *argv, VALUE str)
{
int i;
char *p, *s, *e;
rb_encoding *enc;
for (i=0; i<argc; i++) {
VALUE tmp = argv[i];
StringValue(tmp);
enc = rb_enc_check(str, tmp);
if (RSTRING_LEN(str) < RSTRING_LEN(tmp)) continue;
p = RSTRING_PTR(str);
e = p + RSTRING_LEN(str);
s = e - RSTRING_LEN(tmp);
if (rb_enc_left_char_head(p, s, e, enc) != s)
continue;
if (memcmp(s, RSTRING_PTR(tmp), RSTRING_LEN(tmp)) == 0)
return Qtrue;
}
return Qfalse;
}
|
#eql?(object) ⇒ Boolean
Returns true
if object
has the same length and content;
as +self+; +false+ otherwise:
s = 'foo'
s.eql?('foo') # => true
s.eql?('food') # => false
s.eql?('FOO') # => false
Returns +false+ if the two strings' encodings are not compatible:
"\u{e4 f6 fc}".encode("ISO-8859-1").eql?("\u{c4 d6 dc}") # => false
3424 3425 3426 3427 3428 3429 3430 |
# File 'string.c', line 3424
MJIT_FUNC_EXPORTED VALUE
rb_str_eql(VALUE str1, VALUE str2)
{
if (str1 == str2) return Qtrue;
if (!RB_TYPE_P(str2, T_STRING)) return Qfalse;
return rb_str_eql_internal(str1, str2);
}
|
#force_encoding(encoding) ⇒ String
Changes the encoding to encoding
and returns self.
10384 10385 10386 10387 10388 10389 10390 10391 |
# File 'string.c', line 10384
static VALUE
rb_str_force_encoding(VALUE str, VALUE enc)
{
str_modifiable(str);
rb_enc_associate(str, rb_to_encoding(enc));
ENC_CODERANGE_CLEAR(str);
return str;
}
|
#freeze ⇒ Object
2740 2741 2742 2743 2744 2745 2746 |
# File 'string.c', line 2740
VALUE
rb_str_freeze(VALUE str)
{
if (OBJ_FROZEN(str)) return str;
rb_str_resize(str, RSTRING_LEN(str));
return rb_obj_freeze(str);
}
|
#getbyte(index) ⇒ 0 .. 255
returns the indexth byte as an integer.
5646 5647 5648 5649 5650 5651 5652 5653 5654 5655 5656 5657 |
# File 'string.c', line 5646
static VALUE
rb_str_getbyte(VALUE str, VALUE index)
{
long pos = NUM2LONG(index);
if (pos < 0)
pos += RSTRING_LEN(str);
if (pos < 0 || RSTRING_LEN(str) <= pos)
return Qnil;
return INT2FIX((unsigned char)RSTRING_PTR(str)[pos]);
}
|
#grapheme_clusters ⇒ Array
Returns an array of grapheme clusters in str. This is a shorthand for str.each_grapheme_cluster.to_a
.
If a block is given, which is a deprecated form, works the same as each_grapheme_cluster
.
8994 8995 8996 8997 8998 8999 |
# File 'string.c', line 8994
static VALUE
rb_str_grapheme_clusters(VALUE str)
{
VALUE ary = WANTARRAY("grapheme_clusters", rb_str_strlen(str));
return rb_str_enumerate_grapheme_clusters(str, ary);
}
|
#gsub(pattern, replacement) ⇒ String #gsub(pattern, hash) ⇒ String #gsub(pattern) {|match| ... } ⇒ String #gsub(pattern) ⇒ Object
Returns a copy of str with all occurrences of pattern substituted for the second argument. The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. \d
will match a backslash followed by ‘d’, instead of a digit.
If replacement
is a String it will be substituted for the matched text. It may contain back-references to the pattern’s capture groups of the form \d
, where d is a group number, or \k<n>
, where n is a group name. Similarly, \&
, \'
, \`
, and +
correspond to special variables, $&
, $'
, $`
, and $+
, respectively. (See regexp.rdoc for details.) \0
is the same as \&
. \\
is interpreted as an escape, i.e., a single backslash. Note that, within replacement
the special match variables, such as $&
, will not refer to the current match.
If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string.
In the block form, the current match string is passed in as a parameter, and variables such as $1
, $2
, $`
, $&
, and $'
will be set appropriately. (See regexp.rdoc for details.) The value returned by the block will be substituted for the match on each call.
When neither a block nor a second argument is supplied, an Enumerator is returned.
"hello".gsub(/[aeiou]/, '*') #=> "h*ll*"
"hello".gsub(/([aeiou])/, '<\1>') #=> "h<e>ll<o>"
"hello".gsub(/./) {|s| s.ord.to_s + ' '} #=> "104 101 108 108 111 "
"hello".gsub(/(?<foo>[aeiou])/, '{\k<foo>}') #=> "h{e}ll{o}"
'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"
Note that a string literal consumes backslashes. (See syntax/literals.rdoc for details on string literals.) Back-references are typically preceded by an additional backslash. For example, if you want to write a back-reference \&
in replacement
with a double-quoted string literal, you need to write: "..\\&.."
. If you want to write a non-back-reference string \&
in replacement
, you need first to escape the backslash to prevent this method from interpreting it as a back-reference, and then you need to escape the backslashes again to prevent a string literal from consuming them: "..\\\\&.."
. You may want to use the block form to avoid a lot of backslashes.
5571 5572 5573 5574 5575 |
# File 'string.c', line 5571
static VALUE
rb_str_gsub(int argc, VALUE *argv, VALUE str)
{
return str_gsub(argc, argv, str, 0);
}
|
#gsub!(pattern, replacement) ⇒ String? #gsub!(pattern, hash) ⇒ String? #gsub!(pattern) {|match| ... } ⇒ String? #gsub!(pattern) ⇒ Object
Performs the substitutions of String#gsub in place, returning str, or nil
if no substitutions were performed. If no block and no replacement is given, an enumerator is returned instead.
5503 5504 5505 5506 5507 5508 |
# File 'string.c', line 5503
static VALUE
rb_str_gsub_bang(int argc, VALUE *argv, VALUE str)
{
str_modify_keep_cr(str);
return str_gsub(argc, argv, str, 1);
}
|
#hash ⇒ Integer
Returns the integer hash value for self
. The value is based on the length, content and encoding of self
.
3317 3318 3319 3320 3321 3322 |
# File 'string.c', line 3317
static VALUE
rb_str_hash_m(VALUE str)
{
st_index_t hval = rb_str_hash(str);
return ST2FIX(hval);
}
|
#hex ⇒ Integer
Treats leading characters from str as a string of hexadecimal digits (with an optional sign and an optional 0x
) and returns the corresponding number. Zero is returned on error.
"0x0a".hex #=> 10
"-1234".hex #=> -4660
"0".hex #=> 0
"wombat".hex #=> 0
9642 9643 9644 9645 9646 |
# File 'string.c', line 9642
static VALUE
rb_str_hex(VALUE str)
{
return rb_str_to_inum(str, 16, FALSE);
}
|
#include?(other_str) ⇒ Boolean
Returns true
if str contains the given string or character.
"hello".include? "lo" #=> true
"hello".include? "ol" #=> false
"hello".include? ?h #=> true
5941 5942 5943 5944 5945 5946 5947 5948 5949 5950 5951 |
# File 'string.c', line 5941
static VALUE
rb_str_include(VALUE str, VALUE arg)
{
long i;
StringValue(arg);
i = rb_str_index(str, arg, 0);
if (i == -1) return Qfalse;
return Qtrue;
}
|
#index(substring, offset = 0) ⇒ Integer? #index(regexp, offset = 0) ⇒ Integer?
Returns the Integer index of the first occurrence of the given substring
, or nil
if none found:
'foo'.index('f') # => 0
'foo'.index('o') # => 1
'foo'.index('oo') # => 1
'foo'.index('ooo') # => nil
Returns the Integer index of the first match for the given Regexp regexp
, or nil
if none found:
'foo'.index(/f/) # => 0
'foo'.index(/o/) # => 1
'foo'.index(/oo/) # => 1
'foo'.index(/ooo/) # => nil
Integer argument offset
, if given, specifies the position in the string to begin the search:
'foo'.index('o', 1) # => 1
'foo'.index('o', 2) # => 2
'foo'.index('o', 3) # => nil
If offset
is negative, counts backward from the end of self
:
'foo'.index('o', -1) # => 2
'foo'.index('o', -2) # => 1
'foo'.index('o', -3) # => 1
'foo'.index('o', -4) # => nil
Related: String#rindex
3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 |
# File 'string.c', line 3691
static VALUE
rb_str_index_m(int argc, VALUE *argv, VALUE str)
{
VALUE sub;
VALUE initpos;
long pos;
if (rb_scan_args(argc, argv, "11", &sub, &initpos) == 2) {
pos = NUM2LONG(initpos);
}
else {
pos = 0;
}
if (pos < 0) {
pos += str_strlen(str, NULL);
if (pos < 0) {
if (RB_TYPE_P(sub, T_REGEXP)) {
rb_backref_set(Qnil);
}
return Qnil;
}
}
if (RB_TYPE_P(sub, T_REGEXP)) {
if (pos > str_strlen(str, NULL))
return Qnil;
pos = str_offset(RSTRING_PTR(str), RSTRING_END(str), pos,
rb_enc_check(str, sub), single_byte_optimizable(str));
if (rb_reg_search(sub, str, pos, 0) < 0) {
return Qnil;
} else {
VALUE match = rb_backref_get();
struct re_registers *regs = RMATCH_REGS(match);
pos = rb_str_sublen(str, BEG(0));
return LONG2NUM(pos);
}
}
else {
StringValue(sub);
pos = rb_str_index(str, sub, pos);
pos = rb_str_sublen(str, pos);
}
if (pos == -1) return Qnil;
return LONG2NUM(pos);
}
|
#replace(other_str) ⇒ String
Replaces the contents of str with the corresponding values in other_str.
s = "hello" #=> "hello"
s.replace "world" #=> "world"
5589 5590 5591 5592 5593 5594 5595 5596 5597 5598 |
# File 'string.c', line 5589
VALUE
rb_str_replace(VALUE str, VALUE str2)
{
str_modifiable(str);
if (str == str2) return str;
StringValue(str2);
str_discard(str);
return str_replace(str, str2);
}
|
#insert(index, other_string) ⇒ self
Inserts the given other_string
into self
; returns self
.
If the Integer index
is positive, inserts other_string
at offset index
:
'foo'.insert(1, 'bar') # => "fbaroo"
If the Integer index
is negative, counts backward from the end of self
and inserts other_string
at offset index+1
(that is, after self[index]
):
'foo'.insert(-2, 'bar') # => "fobaro"
4991 4992 4993 4994 4995 4996 4997 4998 4999 5000 5001 5002 5003 5004 |
# File 'string.c', line 4991
static VALUE
rb_str_insert(VALUE str, VALUE idx, VALUE str2)
{
long pos = NUM2LONG(idx);
if (pos == -1) {
return rb_str_append(str, str2);
}
else if (pos < 0) {
pos++;
}
rb_str_splice(str, pos, 0, str2);
return str;
}
|
#inspect ⇒ String
Returns a printable version of str, surrounded by quote marks, with special characters escaped.
str = "hello"
str[3] = "\b"
str.inspect #=> "\"hel\\bo\""
6156 6157 6158 6159 6160 6161 6162 6163 6164 6165 6166 6167 6168 6169 6170 6171 6172 6173 6174 6175 6176 6177 6178 6179 6180 6181 6182 6183 6184 6185 6186 6187 6188 6189 6190 6191 6192 6193 6194 6195 6196 6197 6198 6199 6200 6201 6202 6203 6204 6205 6206 6207 6208 6209 6210 6211 6212 6213 6214 6215 6216 6217 6218 6219 6220 6221 6222 6223 6224 6225 6226 6227 6228 6229 6230 6231 6232 6233 6234 6235 6236 6237 6238 6239 6240 6241 6242 6243 6244 6245 6246 6247 6248 |
# File 'string.c', line 6156
VALUE
rb_str_inspect(VALUE str)
{
int encidx = ENCODING_GET(str);
rb_encoding *enc = rb_enc_from_index(encidx), *actenc;
const char *p, *pend, *prev;
char buf[CHAR_ESC_LEN + 1];
VALUE result = rb_str_buf_new(0);
rb_encoding *resenc = rb_default_internal_encoding();
int unicode_p = rb_enc_unicode_p(enc);
int asciicompat = rb_enc_asciicompat(enc);
if (resenc == NULL) resenc = rb_default_external_encoding();
if (!rb_enc_asciicompat(resenc)) resenc = rb_usascii_encoding();
rb_enc_associate(result, resenc);
str_buf_cat2(result, "\"");
p = RSTRING_PTR(str); pend = RSTRING_END(str);
prev = p;
actenc = get_actual_encoding(encidx, str);
if (actenc != enc) {
enc = actenc;
if (unicode_p) unicode_p = rb_enc_unicode_p(enc);
}
while (p < pend) {
unsigned int c, cc;
int n;
n = rb_enc_precise_mbclen(p, pend, enc);
if (!MBCLEN_CHARFOUND_P(n)) {
if (p > prev) str_buf_cat(result, prev, p - prev);
n = rb_enc_mbminlen(enc);
if (pend < p + n)
n = (int)(pend - p);
while (n--) {
snprintf(buf, CHAR_ESC_LEN, "\\x%02X", *p & 0377);
str_buf_cat(result, buf, strlen(buf));
prev = ++p;
}
continue;
}
n = MBCLEN_CHARFOUND_LEN(n);
c = rb_enc_mbc_to_codepoint(p, pend, enc);
p += n;
if ((asciicompat || unicode_p) &&
(c == '"'|| c == '\\' ||
(c == '#' &&
p < pend &&
MBCLEN_CHARFOUND_P(rb_enc_precise_mbclen(p,pend,enc)) &&
(cc = rb_enc_codepoint(p,pend,enc),
(cc == '$' || cc == '@' || cc == '{'))))) {
if (p - n > prev) str_buf_cat(result, prev, p - n - prev);
str_buf_cat2(result, "\\");
if (asciicompat || enc == resenc) {
prev = p - n;
continue;
}
}
switch (c) {
case '\n': cc = 'n'; break;
case '\r': cc = 'r'; break;
case '\t': cc = 't'; break;
case '\f': cc = 'f'; break;
case '\013': cc = 'v'; break;
case '\010': cc = 'b'; break;
case '\007': cc = 'a'; break;
case 033: cc = 'e'; break;
default: cc = 0; break;
}
if (cc) {
if (p - n > prev) str_buf_cat(result, prev, p - n - prev);
buf[0] = '\\';
buf[1] = (char)cc;
str_buf_cat(result, buf, 2);
prev = p;
continue;
}
if ((enc == resenc && rb_enc_isprint(c, enc)) ||
(asciicompat && rb_enc_isascii(c, enc) && ISPRINT(c))) {
continue;
}
else {
if (p - n > prev) str_buf_cat(result, prev, p - n - prev);
rb_str_buf_cat_escaped_char(result, c, unicode_p);
prev = p;
continue;
}
}
if (p > prev) str_buf_cat(result, prev, p - prev);
str_buf_cat2(result, "\"");
return result;
}
|
#intern ⇒ Object #to_sym ⇒ Object
Returns the Symbol corresponding to str, creating the symbol if it did not previously exist. See Symbol#id2name.
"Koala".intern #=> :Koala
s = 'cat'.to_sym #=> :cat
s == :cat #=> true
s = '@cat'.to_sym #=> :@cat
s == :@cat #=> true
This can also be used to create symbols that cannot be represented using the :xxx
notation.
'cat and dog'.to_sym #=> :"cat and dog"
839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 |
# File 'symbol.c', line 839
VALUE
rb_str_intern(VALUE str)
{
VALUE sym;
#if USE_SYMBOL_GC
rb_encoding *enc, *ascii;
int type;
#else
ID id;
#endif
GLOBAL_SYMBOLS_ENTER(symbols);
{
sym = lookup_str_sym_with_lock(symbols, str);
if (sym) {
// ok
}
else {
#if USE_SYMBOL_GC
enc = rb_enc_get(str);
ascii = rb_usascii_encoding();
if (enc != ascii && sym_check_asciionly(str)) {
str = rb_str_dup(str);
rb_enc_associate(str, ascii);
OBJ_FREEZE(str);
enc = ascii;
}
else {
str = rb_str_dup(str);
OBJ_FREEZE(str);
}
str = rb_fstring(str);
type = rb_str_symname_type(str, IDSET_ATTRSET_FOR_INTERN);
if (type < 0) type = ID_JUNK;
sym = dsymbol_alloc(symbols, rb_cSymbol, str, enc, type);
#else
id = intern_str(str, 0);
sym = ID2SYM(id);
#endif
}
}
GLOBAL_SYMBOLS_LEAVE();
return sym;
}
|
#length ⇒ Integer
Returns the count of characters (not bytes) in self
:
"\x80\u3042".length # => 2
"hello".length # => 5
String#size is an alias for String#length.
Related: String#bytesize.
1969 1970 1971 1972 1973 |
# File 'string.c', line 1969
VALUE
rb_str_length(VALUE str)
{
return LONG2NUM(str_strlen(str, NULL));
}
|
#lines(separator = $/, chomp: false) ⇒ Array
Returns an array of lines in str split using the supplied record separator ($/
by default). This is a shorthand for str.each_line(separator, getline_args).to_a
.
If chomp
is true
, separator
will be removed from the end of each line.
"hello\nworld\n".lines #=> ["hello\n", "world\n"]
"hello world".lines(' ') #=> ["hello ", " ", "world"]
"hello\nworld\n".lines(chomp: true) #=> ["hello", "world"]
If a block is given, which is a deprecated form, works the same as each_line
.
8632 8633 8634 8635 8636 8637 |
# File 'string.c', line 8632
static VALUE
rb_str_lines(int argc, VALUE *argv, VALUE str)
{
VALUE ary = WANTARRAY("lines", 0);
return rb_str_enumerate_lines(argc, argv, str, ary);
}
|
#ljust(integer, padstr = ' ') ⇒ String
If integer is greater than the length of str, returns a new String of length integer with str left justified and padded with padstr; otherwise, returns str.
"hello".ljust(4) #=> "hello"
"hello".ljust(20) #=> "hello "
"hello".ljust(20, '1234') #=> "hello123412341234123"
9980 9981 9982 9983 9984 |
# File 'string.c', line 9980
static VALUE
rb_str_ljust(int argc, VALUE *argv, VALUE str)
{
return rb_str_justify(argc, argv, str, 'l');
}
|
#lstrip ⇒ String
Returns a copy of the receiver with leading whitespace removed. See also String#rstrip and String#strip.
Refer to String#strip for the definition of whitespace.
" hello ".lstrip #=> "hello "
"hello".lstrip #=> "hello"
9335 9336 9337 9338 9339 9340 9341 9342 9343 9344 |
# File 'string.c', line 9335
static VALUE
rb_str_lstrip(VALUE str)
{
char *start;
long len, loffset;
RSTRING_GETMEM(str, start, len);
loffset = lstrip_offset(str, start, start+len, STR_ENC_GET(str));
if (loffset <= 0) return str_duplicate(rb_cString, str);
return rb_str_subseq(str, loffset, len - loffset);
}
|
#lstrip! ⇒ self?
Removes leading whitespace from the receiver. Returns the altered receiver, or nil
if no change was made. See also String#rstrip! and String#strip!.
Refer to String#strip for the definition of whitespace.
" hello ".lstrip! #=> "hello "
"hello ".lstrip! #=> nil
"hello".lstrip! #=> nil
9297 9298 9299 9300 9301 9302 9303 9304 9305 9306 9307 9308 9309 9310 9311 9312 9313 9314 9315 9316 9317 9318 9319 |
# File 'string.c', line 9297
static VALUE
rb_str_lstrip_bang(VALUE str)
{
rb_encoding *enc;
char *start, *s;
long olen, loffset;
str_modify_keep_cr(str);
enc = STR_ENC_GET(str);
RSTRING_GETMEM(str, start, olen);
loffset = lstrip_offset(str, start, start+olen, enc);
if (loffset > 0) {
long len = olen-loffset;
s = start + loffset;
memmove(start, s, len);
STR_SET_LEN(str, len);
#if !SHARABLE_MIDDLE_SUBSTRING
TERM_FILL(start+len, rb_enc_mbminlen(enc));
#endif
return str;
}
return Qnil;
}
|
#match(pattern, offset = 0) ⇒ MatchData? #match(pattern, offset = 0) {|matchdata| ... } ⇒ Object
Returns a Matchdata object (or nil
) based on self
and the given pattern
.
Note: also updates Regexp-related global variables.
-
Computes
regexp
by convertingpattern
(if not already a Regexp).regexp = Regexp.new(pattern)
-
Computes
matchdata
, which will be either a MatchData object ornil
(see Regexp#match):matchdata = <tt>regexp.match(self)
With no block given, returns the computed matchdata
:
'foo'.match('f') # => #<MatchData "f">
'foo'.match('o') # => #<MatchData "o">
'foo'.match('x') # => nil
If Integer argument offset
is given, the search begins at index offset
:
'foo'.match('f', 1) # => nil
'foo'.match('o', 1) # => #<MatchData "o">
With a block given, calls the block with the computed matchdata
and returns the block’s return value:
'foo'.match(/o/) {|matchdata| matchdata } # => #<MatchData "o">
'foo'.match(/x/) {|matchdata| matchdata } # => nil
'foo'.match(/f/, 1) {|matchdata| matchdata } # => nil
3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 |
# File 'string.c', line 3987
static VALUE
rb_str_match_m(int argc, VALUE *argv, VALUE str)
{
VALUE re, result;
if (argc < 1)
rb_check_arity(argc, 1, 2);
re = argv[0];
argv[0] = str;
result = rb_funcallv(get_pat(re), rb_intern("match"), argc, argv);
if (!NIL_P(result) && rb_block_given_p()) {
return rb_yield(result);
}
return result;
}
|
#match?(pattern, offset = 0) ⇒ Boolean
Returns true
or false
based on whether a match is found for self
and pattern
.
Note: does not update Regexp-related global variables.
Computes regexp
by converting pattern
(if not already a Regexp).
regexp = Regexp.new(pattern)
Returns true
if self+.match(regexp)
returns a Matchdata object, false
otherwise:
'foo'.match?(/o/) # => true
'foo'.match?('o') # => true
'foo'.match?(/x/) # => false
If Integer argument offset
is given, the search begins at index offset
:
'foo'.match?('f', 1) # => false
'foo'.match?('o', 1) # => true
4025 4026 4027 4028 4029 4030 4031 4032 |
# File 'string.c', line 4025
static VALUE
rb_str_match_m_p(int argc, VALUE *argv, VALUE str)
{
VALUE re;
rb_check_arity(argc, 1, 2);
re = get_pat(argv[0]);
return rb_reg_match_p(re, str, argc > 1 ? NUM2LONG(argv[1]) : 0);
}
|
#succ ⇒ String
Returns the successor to self
. The successor is calculated by incrementing characters.
The first character to be incremented is the rightmost alphanumeric: or, if no alphanumerics, the rightmost character:
'THX1138'.succ # => "THX1139"
'<<koala>>'.succ # => "<<koalb>>"
'***'.succ # => '**+'
The successor to a digit is another digit, “carrying” to the next-left character for a “rollover” from 9 to 0, and prepending another digit if necessary:
'00'.succ # => "01"
'09'.succ # => "10"
'99'.succ # => "100"
The successor to a letter is another letter of the same case, carrying to the next-left character for a rollover, and prepending another same-case letter if necessary:
'aa'.succ # => "ab"
'az'.succ # => "ba"
'zz'.succ # => "aaa"
'AA'.succ # => "AB"
'AZ'.succ # => "BA"
'ZZ'.succ # => "AAA"
The successor to a non-alphanumeric character is the next character in the underlying character set’s collating sequence, carrying to the next-left character for a rollover, and prepending another character if necessary:
s = 0.chr * 3
s # => "\x00\x00\x00"
s.succ # => "\x00\x00\x01"
s = 255.chr * 3
s # => "\xFF\xFF\xFF"
s.succ # => "\x01\x00\x00\x00"
Carrying can occur between and among mixtures of alphanumeric characters:
s = 'zz99zz99'
s.succ # => "aaa00aa00"
s = '99zz99zz'
s.succ # => "100aa00aa"
The successor to an empty String is a new empty String:
''.succ # => ""
String#next is an alias for String#succ.
4272 4273 4274 4275 4276 4277 4278 4279 |
# File 'string.c', line 4272
VALUE
rb_str_succ(VALUE orig)
{
VALUE str;
str = rb_str_new(RSTRING_PTR(orig), RSTRING_LEN(orig));
rb_enc_cr_str_copy_for_substr(str, orig);
return str_succ(str);
}
|
#succ! ⇒ self
Equivalent to String#succ, but modifies self
in place; returns self
.
String#next! is an alias for String#succ!.
4378 4379 4380 4381 4382 4383 4384 |
# File 'string.c', line 4378
static VALUE
rb_str_succ_bang(VALUE str)
{
rb_str_modify(str);
str_succ(str);
return str;
}
|
#oct ⇒ Integer
Treats leading characters of str as a string of octal digits (with an optional sign) and returns the corresponding number. Returns 0 if the conversion fails.
"123".oct #=> 83
"-377".oct #=> -255
"bad".oct #=> 0
"0377bad".oct #=> 255
If str
starts with 0
, radix indicators are honored. See Kernel#Integer.
9666 9667 9668 9669 9670 |
# File 'string.c', line 9666
static VALUE
rb_str_oct(VALUE str)
{
return rb_str_to_inum(str, -8, FALSE);
}
|
#ord ⇒ Integer
Returns the Integer ordinal of a one-character string.
"a".ord #=> 97
9802 9803 9804 9805 9806 9807 9808 9809 |
# File 'string.c', line 9802
static VALUE
rb_str_ord(VALUE s)
{
unsigned int c;
c = rb_enc_codepoint(RSTRING_PTR(s), RSTRING_END(s), STR_ENC_GET(s));
return UINT2NUM(c);
}
|
#partition(sep) ⇒ Array #partition(regexp) ⇒ Array
Searches sep or pattern (regexp) in the string and returns the part before it, the match, and the part after it. If it is not found, returns two empty strings and str.
"hello".partition("l") #=> ["he", "l", "lo"]
"hello".partition("x") #=> ["hello", "", ""]
"hello".partition(/.l/) #=> ["h", "el", "lo"]
10041 10042 10043 10044 10045 10046 10047 10048 10049 10050 10051 10052 10053 10054 10055 10056 10057 10058 10059 10060 10061 10062 10063 10064 10065 10066 10067 10068 |
# File 'string.c', line 10041
static VALUE
rb_str_partition(VALUE str, VALUE sep)
{
long pos;
sep = get_pat_quoted(sep, 0);
if (RB_TYPE_P(sep, T_REGEXP)) {
if (rb_reg_search(sep, str, 0, 0) < 0) {
goto failed;
}
VALUE match = rb_backref_get();
struct re_registers *regs = RMATCH_REGS(match);
pos = BEG(0);
sep = rb_str_subseq(str, pos, END(0) - pos);
}
else {
pos = rb_str_index(str, sep, 0);
if (pos < 0) goto failed;
}
return rb_ary_new3(3, rb_str_subseq(str, 0, pos),
sep,
rb_str_subseq(str, pos+RSTRING_LEN(sep),
RSTRING_LEN(str)-pos-RSTRING_LEN(sep)));
failed:
return rb_ary_new3(3, str_duplicate(rb_cString, str), str_new_empty_String(str), str_new_empty_String(str));
}
|
#prepend(*other_strings) ⇒ String
Returns a new String containing the concatenation of all given other_strings
and self
:
s = 'foo'
s.prepend('bar', 'baz') # => "barbazfoo"
Related: String#concat.
3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 |
# File 'string.c', line 3266
static VALUE
rb_str_prepend_multi(int argc, VALUE *argv, VALUE str)
{
str_modifiable(str);
if (argc == 1) {
rb_str_update(str, 0L, 0L, argv[0]);
}
else if (argc > 1) {
int i;
VALUE arg_str = rb_str_tmp_new(0);
rb_enc_copy(arg_str, str);
for (i = 0; i < argc; i++) {
rb_str_append(arg_str, argv[i]);
}
rb_str_update(str, 0L, 0L, arg_str);
}
return str;
}
|
#replace(other_str) ⇒ String
Replaces the contents of str with the corresponding values in other_str.
s = "hello" #=> "hello"
s.replace "world" #=> "world"
5589 5590 5591 5592 5593 5594 5595 5596 5597 5598 |
# File 'string.c', line 5589
VALUE
rb_str_replace(VALUE str, VALUE str2)
{
str_modifiable(str);
if (str == str2) return str;
StringValue(str2);
str_discard(str);
return str_replace(str, str2);
}
|
#reverse ⇒ String
Returns a new string with the characters from str in reverse order.
"stressed".reverse #=> "desserts"
5844 5845 5846 5847 5848 5849 5850 5851 5852 5853 5854 5855 5856 5857 5858 5859 5860 5861 5862 5863 5864 5865 5866 5867 5868 5869 5870 5871 5872 5873 5874 5875 5876 5877 5878 5879 5880 5881 5882 5883 5884 5885 5886 5887 5888 5889 5890 5891 5892 |
# File 'string.c', line 5844
static VALUE
rb_str_reverse(VALUE str)
{
rb_encoding *enc;
VALUE rev;
char *s, *e, *p;
int cr;
if (RSTRING_LEN(str) <= 1) return str_duplicate(rb_cString, str);
enc = STR_ENC_GET(str);
rev = rb_str_new(0, RSTRING_LEN(str));
s = RSTRING_PTR(str); e = RSTRING_END(str);
p = RSTRING_END(rev);
cr = ENC_CODERANGE(str);
if (RSTRING_LEN(str) > 1) {
if (single_byte_optimizable(str)) {
while (s < e) {
*--p = *s++;
}
}
else if (cr == ENC_CODERANGE_VALID) {
while (s < e) {
int clen = rb_enc_fast_mbclen(s, e, enc);
p -= clen;
memcpy(p, s, clen);
s += clen;
}
}
else {
cr = rb_enc_asciicompat(enc) ?
ENC_CODERANGE_7BIT : ENC_CODERANGE_VALID;
while (s < e) {
int clen = rb_enc_mbclen(s, e, enc);
if (clen > 1 || (*s & 0x80)) cr = ENC_CODERANGE_UNKNOWN;
p -= clen;
memcpy(p, s, clen);
s += clen;
}
}
}
STR_SET_LEN(rev, RSTRING_LEN(str));
str_enc_copy(rev, str);
ENC_CODERANGE_SET(rev, cr);
return rev;
}
|
#reverse! ⇒ String
Reverses str in place.
5902 5903 5904 5905 5906 5907 5908 5909 5910 5911 5912 5913 5914 5915 5916 5917 5918 5919 5920 5921 5922 5923 5924 5925 5926 |
# File 'string.c', line 5902
static VALUE
rb_str_reverse_bang(VALUE str)
{
if (RSTRING_LEN(str) > 1) {
if (single_byte_optimizable(str)) {
char *s, *e, c;
str_modify_keep_cr(str);
s = RSTRING_PTR(str);
e = RSTRING_END(str) - 1;
while (s < e) {
c = *s;
*s++ = *e;
*e-- = c;
}
}
else {
str_shared_replace(str, rb_str_reverse(str));
}
}
else {
str_modify_keep_cr(str);
}
return str;
}
|
#rindex(substring, offset = self.length) ⇒ Integer? #rindex(regexp, offset = self.length) ⇒ Integer?
Returns the Integer index of the last occurrence of the given substring
, or nil
if none found:
'foo'.rindex('f') # => 0
'foo'.rindex('o') # => 2
'foo'.rindex('oo') # => 1
'foo'.rindex('ooo') # => nil
Returns the Integer index of the last match for the given Regexp regexp
, or nil
if none found:
'foo'.rindex(/f/) # => 0
'foo'.rindex(/o/) # => 2
'foo'.rindex(/oo/) # => 1
'foo'.rindex(/ooo/) # => nil
Integer argument offset
, if given and non-negative, specifies the maximum starting position in the
string to _end_ the search:
'foo'.rindex('o', 0) # => nil
'foo'.rindex('o', 1) # => 1
'foo'.rindex('o', 2) # => 2
'foo'.rindex('o', 3) # => 2
If offset
is a negative Integer, the maximum starting position in the string to end the search is the sum of the string’s length and offset
:
'foo'.rindex('o', -1) # => 2
'foo'.rindex('o', -2) # => 1
'foo'.rindex('o', -3) # => nil
'foo'.rindex('o', -4) # => nil
Related: String#index
3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 |
# File 'string.c', line 3865
static VALUE
rb_str_rindex_m(int argc, VALUE *argv, VALUE str)
{
VALUE sub;
VALUE vpos;
rb_encoding *enc = STR_ENC_GET(str);
long pos, len = str_strlen(str, enc); /* str's enc */
if (rb_scan_args(argc, argv, "11", &sub, &vpos) == 2) {
pos = NUM2LONG(vpos);
if (pos < 0) {
pos += len;
if (pos < 0) {
if (RB_TYPE_P(sub, T_REGEXP)) {
rb_backref_set(Qnil);
}
return Qnil;
}
}
if (pos > len) pos = len;
}
else {
pos = len;
}
if (RB_TYPE_P(sub, T_REGEXP)) {
/* enc = rb_get_check(str, sub); */
pos = str_offset(RSTRING_PTR(str), RSTRING_END(str), pos,
enc, single_byte_optimizable(str));
if (rb_reg_search(sub, str, pos, 1) >= 0) {
VALUE match = rb_backref_get();
struct re_registers *regs = RMATCH_REGS(match);
pos = rb_str_sublen(str, BEG(0));
return LONG2NUM(pos);
}
}
else {
StringValue(sub);
pos = rb_str_rindex(str, sub, pos);
if (pos >= 0) return LONG2NUM(pos);
}
return Qnil;
}
|
#rjust(integer, padstr = ' ') ⇒ String
If integer is greater than the length of str, returns a new String of length integer with str right justified and padded with padstr; otherwise, returns str.
"hello".rjust(4) #=> "hello"
"hello".rjust(20) #=> " hello"
"hello".rjust(20, '1234') #=> "123412341234123hello"
10000 10001 10002 10003 10004 |
# File 'string.c', line 10000
static VALUE
rb_str_rjust(int argc, VALUE *argv, VALUE str)
{
return rb_str_justify(argc, argv, str, 'r');
}
|
#rpartition(sep) ⇒ Array #rpartition(regexp) ⇒ Array
Searches sep or pattern (regexp) in the string from the end of the string, and returns the part before it, the match, and the part after it. If it is not found, returns two empty strings and str.
"hello".rpartition("l") #=> ["hel", "l", "o"]
"hello".rpartition("x") #=> ["", "", "hello"]
"hello".rpartition(/.l/) #=> ["he", "ll", "o"]
10085 10086 10087 10088 10089 10090 10091 10092 10093 10094 10095 10096 10097 10098 10099 10100 10101 10102 10103 10104 10105 10106 10107 10108 10109 10110 10111 10112 10113 10114 10115 10116 |
# File 'string.c', line 10085
static VALUE
rb_str_rpartition(VALUE str, VALUE sep)
{
long pos = RSTRING_LEN(str);
sep = get_pat_quoted(sep, 0);
if (RB_TYPE_P(sep, T_REGEXP)) {
if (rb_reg_search(sep, str, pos, 1) < 0) {
goto failed;
}
VALUE match = rb_backref_get();
struct re_registers *regs = RMATCH_REGS(match);
pos = BEG(0);
sep = rb_str_subseq(str, pos, END(0) - pos);
}
else {
pos = rb_str_sublen(str, pos);
pos = rb_str_rindex(str, sep, pos);
if(pos < 0) {
goto failed;
}
pos = rb_str_offset(str, pos);
}
return rb_ary_new3(3, rb_str_subseq(str, 0, pos),
sep,
rb_str_subseq(str, pos+RSTRING_LEN(sep),
RSTRING_LEN(str)-pos-RSTRING_LEN(sep)));
failed:
return rb_ary_new3(3, str_new_empty_String(str), str_new_empty_String(str), str_duplicate(rb_cString, str));
}
|
#rstrip ⇒ String
Returns a copy of the receiver with trailing whitespace removed. See also String#lstrip and String#strip.
Refer to String#strip for the definition of whitespace.
" hello ".rstrip #=> " hello"
"hello".rstrip #=> "hello"
9424 9425 9426 9427 9428 9429 9430 9431 9432 9433 9434 9435 9436 9437 |
# File 'string.c', line 9424
static VALUE
rb_str_rstrip(VALUE str)
{
rb_encoding *enc;
char *start;
long olen, roffset;
enc = STR_ENC_GET(str);
RSTRING_GETMEM(str, start, olen);
roffset = rstrip_offset(str, start, start+olen, enc);
if (roffset <= 0) return str_duplicate(rb_cString, str);
return rb_str_subseq(str, 0, olen-roffset);
}
|
#rstrip! ⇒ self?
Removes trailing whitespace from the receiver. Returns the altered receiver, or nil
if no change was made. See also String#lstrip! and String#strip!.
Refer to String#strip for the definition of whitespace.
" hello ".rstrip! #=> " hello"
" hello".rstrip! #=> nil
"hello".rstrip! #=> nil
9387 9388 9389 9390 9391 9392 9393 9394 9395 9396 9397 9398 9399 9400 9401 9402 9403 9404 9405 9406 9407 9408 |
# File 'string.c', line 9387
static VALUE
rb_str_rstrip_bang(VALUE str)
{
rb_encoding *enc;
char *start;
long olen, roffset;
str_modify_keep_cr(str);
enc = STR_ENC_GET(str);
RSTRING_GETMEM(str, start, olen);
roffset = rstrip_offset(str, start, start+olen, enc);
if (roffset > 0) {
long len = olen - roffset;
STR_SET_LEN(str, len);
#if !SHARABLE_MIDDLE_SUBSTRING
TERM_FILL(start+len, rb_enc_mbminlen(enc));
#endif
return str;
}
return Qnil;
}
|
#scan(pattern) ⇒ Array #scan(pattern) {|match, ...| ... } ⇒ String
Both forms iterate through str, matching the pattern (which may be a Regexp or a String). For each match, a result is generated and either added to the result array or passed to the block. If the pattern contains no groups, each individual result consists of the matched string, $&
. If the pattern contains groups, each individual result is itself an array containing one entry per group.
a = "cruel world"
a.scan(/\w+/) #=> ["cruel", "world"]
a.scan(/.../) #=> ["cru", "el ", "wor"]
a.scan(/(...)/) #=> [["cru"], ["el "], ["wor"]]
a.scan(/(..)(..)/) #=> [["cr", "ue"], ["l ", "wo"]]
And the block form:
a.scan(/\w+/) {|w| print "<<#{w}>> " }
print "\n"
a.scan(/(.)(.)/) {|x,y| print y, x }
print "\n"
produces:
<<cruel>> <<world>>
rceu lowlr
9594 9595 9596 9597 9598 9599 9600 9601 9602 9603 9604 9605 9606 9607 9608 9609 9610 9611 9612 9613 9614 9615 9616 9617 9618 9619 9620 9621 9622 9623 9624 9625 |
# File 'string.c', line 9594
static VALUE
rb_str_scan(VALUE str, VALUE pat)
{
VALUE result;
long start = 0;
long last = -1, prev = 0;
char *p = RSTRING_PTR(str); long len = RSTRING_LEN(str);
pat = get_pat_quoted(pat, 1);
mustnot_broken(str);
if (!rb_block_given_p()) {
VALUE ary = rb_ary_new();
while (!NIL_P(result = scan_once(str, pat, &start, 0))) {
last = prev;
prev = start;
rb_ary_push(ary, result);
}
if (last >= 0) rb_pat_search(pat, str, last, 1);
else rb_backref_set(Qnil);
return ary;
}
while (!NIL_P(result = scan_once(str, pat, &start, 1))) {
last = prev;
prev = start;
rb_yield(result);
str_mod_check(str, p, len);
}
if (last >= 0) rb_pat_search(pat, str, last, 1);
return str;
}
|
#scrub ⇒ String #scrub(repl) ⇒ String #scrub {|bytes| ... } ⇒ String
If the string is invalid byte sequence then replace invalid bytes with given replacement character, else returns self. If block is given, replace invalid bytes with returned value of the block.
"abc\u3042\x81".scrub #=> "abc\u3042\uFFFD"
"abc\u3042\x81".scrub("*") #=> "abc\u3042*"
"abc\u3042\xE3\x80".scrub{|bytes| '<'+bytes.unpack('H*')[0]+'>' } #=> "abc\u3042<e380>"
10798 10799 10800 10801 10802 10803 10804 |
# File 'string.c', line 10798
static VALUE
str_scrub(int argc, VALUE *argv, VALUE str)
{
VALUE repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil;
VALUE new = rb_str_scrub(str, repl);
return NIL_P(new) ? str_duplicate(rb_cString, str): new;
}
|
#scrub! ⇒ String #scrub!(repl) ⇒ String #scrub! {|bytes| ... } ⇒ String
If the string is invalid byte sequence then replace invalid bytes with given replacement character, else returns self. If block is given, replace invalid bytes with returned value of the block.
"abc\u3042\x81".scrub! #=> "abc\u3042\uFFFD"
"abc\u3042\x81".scrub!("*") #=> "abc\u3042*"
"abc\u3042\xE3\x80".scrub!{|bytes| '<'+bytes.unpack('H*')[0]+'>' } #=> "abc\u3042<e380>"
10820 10821 10822 10823 10824 10825 10826 10827 |
# File 'string.c', line 10820
static VALUE
str_scrub_bang(int argc, VALUE *argv, VALUE str)
{
VALUE repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil;
VALUE new = rb_str_scrub(str, repl);
if (!NIL_P(new)) rb_str_replace(str, new);
return str;
}
|
#setbyte(index, integer) ⇒ Integer
modifies the indexth byte as integer.
5665 5666 5667 5668 5669 5670 5671 5672 5673 5674 5675 5676 5677 5678 5679 5680 5681 5682 5683 5684 5685 5686 5687 5688 5689 5690 5691 5692 5693 5694 5695 5696 5697 5698 5699 5700 5701 5702 5703 5704 5705 5706 5707 5708 5709 5710 5711 5712 5713 5714 5715 5716 5717 5718 5719 |
# File 'string.c', line 5665
static VALUE
rb_str_setbyte(VALUE str, VALUE index, VALUE value)
{
long pos = NUM2LONG(index);
long len = RSTRING_LEN(str);
char *head, *left = 0;
unsigned char *ptr;
rb_encoding *enc;
int cr = ENC_CODERANGE_UNKNOWN, width, nlen;
if (pos < -len || len <= pos)
rb_raise(rb_eIndexError, "index %ld out of string", pos);
if (pos < 0)
pos += len;
VALUE v = rb_to_int(value);
VALUE w = rb_int_and(v, INT2FIX(0xff));
unsigned char byte = NUM2INT(w) & 0xFF;
if (!str_independent(str))
str_make_independent(str);
enc = STR_ENC_GET(str);
head = RSTRING_PTR(str);
ptr = (unsigned char *)&head[pos];
if (!STR_EMBED_P(str)) {
cr = ENC_CODERANGE(str);
switch (cr) {
case ENC_CODERANGE_7BIT:
left = (char *)ptr;
*ptr = byte;
if (ISASCII(byte)) goto end;
nlen = rb_enc_precise_mbclen(left, head+len, enc);
if (!MBCLEN_CHARFOUND_P(nlen))
ENC_CODERANGE_SET(str, ENC_CODERANGE_BROKEN);
else
ENC_CODERANGE_SET(str, ENC_CODERANGE_VALID);
goto end;
case ENC_CODERANGE_VALID:
left = rb_enc_left_char_head(head, ptr, head+len, enc);
width = rb_enc_precise_mbclen(left, head+len, enc);
*ptr = byte;
nlen = rb_enc_precise_mbclen(left, head+len, enc);
if (!MBCLEN_CHARFOUND_P(nlen))
ENC_CODERANGE_SET(str, ENC_CODERANGE_BROKEN);
else if (MBCLEN_CHARFOUND_LEN(nlen) != width || ISASCII(byte))
ENC_CODERANGE_CLEAR(str);
goto end;
}
}
ENC_CODERANGE_CLEAR(str);
*ptr = byte;
end:
return value;
}
|
#length ⇒ Integer
Returns the count of characters (not bytes) in self
:
"\x80\u3042".length # => 2
"hello".length # => 5
String#size is an alias for String#length.
Related: String#bytesize.
1969 1970 1971 1972 1973 |
# File 'string.c', line 1969
VALUE
rb_str_length(VALUE str)
{
return LONG2NUM(str_strlen(str, NULL));
}
|
#[](index) ⇒ nil #[](start, length) ⇒ nil #[](range) ⇒ nil #[](regexp, capture = 0) ⇒ nil #[](substring) ⇒ nil
Returns the substring of self
specified by the arguments.
When the single Integer argument index
is given, returns the 1-character substring found in self
at offset index
:
'bar'[2] # => "r"
Counts backward from the end of self
if index
is negative:
'foo'[-3] # => "f"
Returns nil
if index
is out of range:
'foo'[3] # => nil
'foo'[-4] # => nil
When the two Integer arguments start
and length
are given, returns the substring of the given length
found in self
at offset start
:
'foo'[0, 2] # => "fo"
'foo'[0, 0] # => ""
Counts backward from the end of self
if start
is negative:
'foo'[-2, 2] # => "oo"
Special case: returns a new empty String if start
is equal to the length of self
:
'foo'[3, 2] # => ""
Returns nil
if start
is out of range:
'foo'[4, 2] # => nil
'foo'[-4, 2] # => nil
Returns the trailing substring of self
if length
is large:
'foo'[1, 50] # => "oo"
Returns nil
if length
is negative:
'foo'[0, -1] # => nil
When the single Range argument range
is given, derives start
and length
values from the given range
, and returns values as above:
-
'foo'[0..1]
is equivalent to'foo'[0, 2]
. -
'foo'[0...1]
is equivalent to'foo'[0, 1]
.
When the Regexp argument regexp
is given, and the capture
argument is 0
, returns the first matching substring found in self
, or nil
if none found:
'foo'[/o/] # => "o"
'foo'[/x/] # => nil
s = 'hello there'
s[/[aeiou](.)\1/] # => "ell"
s[/[aeiou](.)\1/, 0] # => "ell"
If argument capture
is given and not 0
, it should be either an Integer capture group index or a String or Symbol capture group name; the method call returns only the specified capture (see Regexp Capturing):
s = 'hello there'
s[/[aeiou](.)\1/, 1] # => "l"
s[/(?<vowel>[aeiou])(?<non_vowel>[^aeiou])/, "non_vowel"] # => "l"
s[/(?<vowel>[aeiou])(?<non_vowel>[^aeiou])/, :vowel] # => "e"
If an invalid capture group index is given, nil
is returned. If an invalid capture group name is given, IndexError
is raised.
When the single String argument substring
is given, returns the substring from self
if found, otherwise nil
:
'foo'['oo'] # => "oo"
'foo'['xx'] # => nil
String#slice is an alias for String#[].
4735 4736 4737 4738 4739 4740 4741 4742 4743 4744 4745 4746 4747 4748 4749 4750 |
# File 'string.c', line 4735
static VALUE
rb_str_aref_m(int argc, VALUE *argv, VALUE str)
{
if (argc == 2) {
if (RB_TYPE_P(argv[0], T_REGEXP)) {
return rb_str_subpat(str, argv[0], argv[1]);
}
else {
long beg = NUM2LONG(argv[0]);
long len = NUM2LONG(argv[1]);
return rb_str_substr(str, beg, len);
}
}
rb_check_arity(argc, 1, 2);
return rb_str_aref(str, argv[0]);
}
|
#slice!(integer) ⇒ String? #slice!(integer, integer) ⇒ String? #slice!(range) ⇒ String? #slice!(regexp) ⇒ String? #slice!(other_str) ⇒ String?
Deletes the specified portion from str, and returns the portion deleted.
string = "this is a string"
string.slice!(2) #=> "i"
string.slice!(3..6) #=> " is "
string.slice!(/s.*t/) #=> "sa st"
string.slice!("r") #=> "r"
string #=> "thing"
5026 5027 5028 5029 5030 5031 5032 5033 5034 5035 5036 5037 5038 5039 5040 5041 5042 5043 5044 5045 5046 5047 5048 5049 5050 5051 5052 5053 5054 5055 5056 5057 5058 5059 5060 5061 5062 5063 5064 5065 5066 5067 5068 5069 5070 5071 5072 5073 5074 5075 5076 5077 5078 5079 5080 5081 5082 5083 5084 5085 5086 5087 5088 5089 5090 5091 5092 5093 5094 5095 5096 5097 5098 5099 5100 5101 5102 5103 5104 5105 5106 5107 5108 5109 5110 5111 |
# File 'string.c', line 5026
static VALUE
rb_str_slice_bang(int argc, VALUE *argv, VALUE str)
{
VALUE result = Qnil;
VALUE indx;
long beg, len = 1;
char *p;
rb_check_arity(argc, 1, 2);
str_modify_keep_cr(str);
indx = argv[0];
if (RB_TYPE_P(indx, T_REGEXP)) {
if (rb_reg_search(indx, str, 0, 0) < 0) return Qnil;
VALUE match = rb_backref_get();
struct re_registers *regs = RMATCH_REGS(match);
int nth = 0;
if (argc > 1 && (nth = rb_reg_backref_number(match, argv[1])) < 0) {
if ((nth += regs->num_regs) <= 0) return Qnil;
}
else if (nth >= regs->num_regs) return Qnil;
beg = BEG(nth);
len = END(nth) - beg;
goto subseq;
}
else if (argc == 2) {
beg = NUM2LONG(indx);
len = NUM2LONG(argv[1]);
goto num_index;
}
else if (FIXNUM_P(indx)) {
beg = FIX2LONG(indx);
if (!(p = rb_str_subpos(str, beg, &len))) return Qnil;
if (!len) return Qnil;
beg = p - RSTRING_PTR(str);
goto subseq;
}
else if (RB_TYPE_P(indx, T_STRING)) {
beg = rb_str_index(str, indx, 0);
if (beg == -1) return Qnil;
len = RSTRING_LEN(indx);
result = str_duplicate(rb_cString, indx);
goto squash;
}
else {
switch (rb_range_beg_len(indx, &beg, &len, str_strlen(str, NULL), 0)) {
case Qnil:
return Qnil;
case Qfalse:
beg = NUM2LONG(indx);
if (!(p = rb_str_subpos(str, beg, &len))) return Qnil;
if (!len) return Qnil;
beg = p - RSTRING_PTR(str);
goto subseq;
default:
goto num_index;
}
}
num_index:
if (!(p = rb_str_subpos(str, beg, &len))) return Qnil;
beg = p - RSTRING_PTR(str);
subseq:
result = rb_str_new(RSTRING_PTR(str)+beg, len);
rb_enc_cr_str_copy_for_substr(result, str);
squash:
if (len > 0) {
if (beg == 0) {
rb_str_drop_bytes(str, len);
}
else {
char *sptr = RSTRING_PTR(str);
long slen = RSTRING_LEN(str);
if (beg + len > slen) /* pathological check */
len = slen - beg;
memmove(sptr + beg,
sptr + beg + len,
slen - (beg + len));
slen -= len;
STR_SET_LEN(str, slen);
TERM_FILL(&sptr[slen], TERM_LEN(str));
}
}
return result;
}
|
#split(pattern = nil, [limit]) ⇒ Array #split(pattern = nil, [limit]) {|sub| ... } ⇒ String
Divides str into substrings based on a delimiter, returning an array of these substrings.
If pattern is a String, then its contents are used as the delimiter when splitting str. If pattern is a single space, str is split on whitespace, with leading and trailing whitespace and runs of contiguous whitespace characters ignored.
If pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern matches a zero-length string, str is split into individual characters. If pattern contains groups, the respective matches will be returned in the array as well.
If pattern is nil
, the value of $;
is used. If $;
is nil
(which is the default), str is split on whitespace as if ‘ ’ were specified.
If the limit parameter is omitted, trailing null fields are suppressed. If limit is a positive number, at most that number of split substrings will be returned (captured groups will be returned as well, but are not counted towards the limit). If limit is 1
, the entire string is returned as the only entry in an array. If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed.
When the input str
is empty an empty Array is returned as the string is considered to have no fields to split.
" now's the time ".split #=> ["now's", "the", "time"]
" now's the time ".split(' ') #=> ["now's", "the", "time"]
" now's the time".split(/ /) #=> ["", "now's", "", "the", "time"]
"1, 2.34,56, 7".split(%r{,\s*}) #=> ["1", "2.34", "56", "7"]
"hello".split(//) #=> ["h", "e", "l", "l", "o"]
"hello".split(//, 3) #=> ["h", "e", "llo"]
"hi mom".split(%r{\s*}) #=> ["h", "i", "m", "o", "m"]
"mellow yellow".split("ello") #=> ["m", "w y", "w"]
"1,2,,3,4,,".split(',') #=> ["1", "2", "", "3", "4"]
"1,2,,3,4,,".split(',', 4) #=> ["1", "2", "", "3,4,,"]
"1,2,,3,4,,".split(',', -4) #=> ["1", "2", "", "3", "4", "", ""]
"1:2:3".split(/(:)()()/, 2) #=> ["1", ":", "", "", "2:3"]
"".split(',', -1) #=> []
If a block is given, invoke the block with each split substring.
8150 8151 8152 8153 8154 8155 8156 8157 8158 8159 8160 8161 8162 8163 8164 8165 8166 8167 8168 8169 8170 8171 8172 8173 8174 8175 8176 8177 8178 8179 8180 8181 8182 8183 8184 8185 8186 8187 8188 8189 8190 8191 8192 8193 8194 8195 8196 8197 8198 8199 8200 8201 8202 8203 8204 8205 8206 8207 8208 8209 8210 8211 8212 8213 8214 8215 8216 8217 8218 8219 8220 8221 8222 8223 8224 8225 8226 8227 8228 8229 8230 8231 8232 8233 8234 8235 8236 8237 8238 8239 8240 8241 8242 8243 8244 8245 8246 8247 8248 8249 8250 8251 8252 8253 8254 8255 8256 8257 8258 8259 8260 8261 8262 8263 8264 8265 8266 8267 8268 8269 8270 8271 8272 8273 8274 8275 8276 8277 8278 8279 8280 8281 8282 8283 8284 8285 8286 8287 8288 8289 8290 8291 8292 8293 8294 8295 8296 8297 8298 8299 8300 8301 8302 8303 8304 8305 8306 8307 8308 8309 8310 8311 8312 8313 8314 8315 8316 8317 8318 8319 8320 8321 8322 8323 8324 8325 8326 8327 8328 8329 8330 8331 8332 8333 8334 8335 8336 8337 8338 8339 8340 8341 8342 8343 8344 8345 8346 8347 8348 8349 8350 8351 8352 8353 8354 8355 8356 8357 8358 8359 8360 8361 8362 8363 8364 8365 8366 8367 |
# File 'string.c', line 8150
static VALUE
rb_str_split_m(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
VALUE spat;
VALUE limit;
split_type_t split_type;
long beg, end, i = 0, empty_count = -1;
int lim = 0;
VALUE result, tmp;
result = rb_block_given_p() ? Qfalse : Qnil;
if (rb_scan_args(argc, argv, "02", &spat, &limit) == 2) {
lim = NUM2INT(limit);
if (lim <= 0) limit = Qnil;
else if (lim == 1) {
if (RSTRING_LEN(str) == 0)
return result ? rb_ary_new2(0) : str;
tmp = str_duplicate(rb_cString, str);
if (!result) {
rb_yield(tmp);
return str;
}
return rb_ary_new3(1, tmp);
}
i = 1;
}
if (NIL_P(limit) && !lim) empty_count = 0;
enc = STR_ENC_GET(str);
split_type = SPLIT_TYPE_REGEXP;
if (!NIL_P(spat)) {
spat = get_pat_quoted(spat, 0);
}
else if (NIL_P(spat = rb_fs)) {
split_type = SPLIT_TYPE_AWK;
}
else if (!(spat = rb_fs_check(spat))) {
rb_raise(rb_eTypeError, "value of $; must be String or Regexp");
}
else {
rb_category_warn(RB_WARN_CATEGORY_DEPRECATED, "$; is set to non-nil value");
}
if (split_type != SPLIT_TYPE_AWK) {
switch (BUILTIN_TYPE(spat)) {
case T_REGEXP:
rb_reg_options(spat); /* check if uninitialized */
tmp = RREGEXP_SRC(spat);
split_type = literal_split_pattern(tmp, SPLIT_TYPE_REGEXP);
if (split_type == SPLIT_TYPE_AWK) {
spat = tmp;
split_type = SPLIT_TYPE_STRING;
}
break;
case T_STRING:
mustnot_broken(spat);
split_type = literal_split_pattern(spat, SPLIT_TYPE_STRING);
break;
default:
UNREACHABLE_RETURN(Qnil);
}
}
#define SPLIT_STR(beg, len) (empty_count = split_string(result, str, beg, len, empty_count))
if (result) result = rb_ary_new();
beg = 0;
char *ptr = RSTRING_PTR(str);
char *eptr = RSTRING_END(str);
if (split_type == SPLIT_TYPE_AWK) {
char *bptr = ptr;
int skip = 1;
unsigned int c;
end = beg;
if (is_ascii_string(str)) {
while (ptr < eptr) {
c = (unsigned char)*ptr++;
if (skip) {
if (ascii_isspace(c)) {
beg = ptr - bptr;
}
else {
end = ptr - bptr;
skip = 0;
if (!NIL_P(limit) && lim <= i) break;
}
}
else if (ascii_isspace(c)) {
SPLIT_STR(beg, end-beg);
skip = 1;
beg = ptr - bptr;
if (!NIL_P(limit)) ++i;
}
else {
end = ptr - bptr;
}
}
}
else {
while (ptr < eptr) {
int n;
c = rb_enc_codepoint_len(ptr, eptr, &n, enc);
ptr += n;
if (skip) {
if (rb_isspace(c)) {
beg = ptr - bptr;
}
else {
end = ptr - bptr;
skip = 0;
if (!NIL_P(limit) && lim <= i) break;
}
}
else if (rb_isspace(c)) {
SPLIT_STR(beg, end-beg);
skip = 1;
beg = ptr - bptr;
if (!NIL_P(limit)) ++i;
}
else {
end = ptr - bptr;
}
}
}
}
else if (split_type == SPLIT_TYPE_STRING) {
char *str_start = ptr;
char *substr_start = ptr;
char *sptr = RSTRING_PTR(spat);
long slen = RSTRING_LEN(spat);
mustnot_broken(str);
enc = rb_enc_check(str, spat);
while (ptr < eptr &&
(end = rb_memsearch(sptr, slen, ptr, eptr - ptr, enc)) >= 0) {
/* Check we are at the start of a char */
char *t = rb_enc_right_char_head(ptr, ptr + end, eptr, enc);
if (t != ptr + end) {
ptr = t;
continue;
}
SPLIT_STR(substr_start - str_start, (ptr+end) - substr_start);
ptr += end + slen;
substr_start = ptr;
if (!NIL_P(limit) && lim <= ++i) break;
}
beg = ptr - str_start;
}
else if (split_type == SPLIT_TYPE_CHARS) {
char *str_start = ptr;
int n;
mustnot_broken(str);
enc = rb_enc_get(str);
while (ptr < eptr &&
(n = rb_enc_precise_mbclen(ptr, eptr, enc)) > 0) {
SPLIT_STR(ptr - str_start, n);
ptr += n;
if (!NIL_P(limit) && lim <= ++i) break;
}
beg = ptr - str_start;
}
else {
long len = RSTRING_LEN(str);
long start = beg;
long idx;
int last_null = 0;
struct re_registers *regs;
VALUE match = 0;
for (; rb_reg_search(spat, str, start, 0) >= 0;
(match ? (rb_match_unbusy(match), rb_backref_set(match)) : (void)0)) {
match = rb_backref_get();
if (!result) rb_match_busy(match);
regs = RMATCH_REGS(match);
end = BEG(0);
if (start == end && BEG(0) == END(0)) {
if (!ptr) {
SPLIT_STR(0, 0);
break;
}
else if (last_null == 1) {
SPLIT_STR(beg, rb_enc_fast_mbclen(ptr+beg, eptr, enc));
beg = start;
}
else {
if (start == len)
start++;
else
start += rb_enc_fast_mbclen(ptr+start,eptr,enc);
last_null = 1;
continue;
}
}
else {
SPLIT_STR(beg, end-beg);
beg = start = END(0);
}
last_null = 0;
for (idx=1; idx < regs->num_regs; idx++) {
if (BEG(idx) == -1) continue;
SPLIT_STR(BEG(idx), END(idx)-BEG(idx));
}
if (!NIL_P(limit) && lim <= ++i) break;
}
if (match) rb_match_unbusy(match);
}
if (RSTRING_LEN(str) > 0 && (!NIL_P(limit) || RSTRING_LEN(str) > beg || lim < 0)) {
SPLIT_STR(beg, RSTRING_LEN(str)-beg);
}
return result ? result : str;
}
|
#squeeze([other_str]) ⇒ String
Builds a set of characters from the other_str parameter(s) using the procedure described for String#count. Returns a new string where runs of the same character that occur in this set are replaced by a single character. If no arguments are given, all runs of identical characters are replaced by a single character.
"yellow moon".squeeze #=> "yelow mon"
" now is the".squeeze(" ") #=> " now is the"
"putters shoot balls".squeeze("m-z") #=> "puters shot balls"
7863 7864 7865 7866 7867 7868 7869 |
# File 'string.c', line 7863
static VALUE
rb_str_squeeze(int argc, VALUE *argv, VALUE str)
{
str = str_duplicate(rb_cString, str);
rb_str_squeeze_bang(argc, argv, str);
return str;
}
|
#squeeze!([other_str]) ⇒ String?
Squeezes str in place, returning either str, or nil
if no changes were made.
7772 7773 7774 7775 7776 7777 7778 7779 7780 7781 7782 7783 7784 7785 7786 7787 7788 7789 7790 7791 7792 7793 7794 7795 7796 7797 7798 7799 7800 7801 7802 7803 7804 7805 7806 7807 7808 7809 7810 7811 7812 7813 7814 7815 7816 7817 7818 7819 7820 7821 7822 7823 7824 7825 7826 7827 7828 7829 7830 7831 7832 7833 7834 7835 7836 7837 7838 7839 7840 7841 7842 7843 7844 7845 |
# File 'string.c', line 7772
static VALUE
rb_str_squeeze_bang(int argc, VALUE *argv, VALUE str)
{
char squeez[TR_TABLE_SIZE];
rb_encoding *enc = 0;
VALUE del = 0, nodel = 0;
unsigned char *s, *send, *t;
int i, modify = 0;
int ascompat, singlebyte = single_byte_optimizable(str);
unsigned int save;
if (argc == 0) {
enc = STR_ENC_GET(str);
}
else {
for (i=0; i<argc; i++) {
VALUE s = argv[i];
StringValue(s);
enc = rb_enc_check(str, s);
if (singlebyte && !single_byte_optimizable(s))
singlebyte = 0;
tr_setup_table(s, squeez, i==0, &del, &nodel, enc);
}
}
str_modify_keep_cr(str);
s = t = (unsigned char *)RSTRING_PTR(str);
if (!s || RSTRING_LEN(str) == 0) return Qnil;
send = (unsigned char *)RSTRING_END(str);
save = -1;
ascompat = rb_enc_asciicompat(enc);
if (singlebyte) {
while (s < send) {
unsigned int c = *s++;
if (c != save || (argc > 0 && !squeez[c])) {
*t++ = save = c;
}
}
}
else {
while (s < send) {
unsigned int c;
int clen;
if (ascompat && (c = *s) < 0x80) {
if (c != save || (argc > 0 && !squeez[c])) {
*t++ = save = c;
}
s++;
}
else {
c = rb_enc_codepoint_len((char *)s, (char *)send, &clen, enc);
if (c != save || (argc > 0 && !tr_find(c, squeez, del, nodel))) {
if (t != s) rb_enc_mbcput(c, t, enc);
save = c;
t += clen;
}
s += clen;
}
}
}
TERM_FILL((char *)t, TERM_LEN(str));
if ((char *)t - RSTRING_PTR(str) != RSTRING_LEN(str)) {
STR_SET_LEN(str, (char *)t - RSTRING_PTR(str));
modify = 1;
}
if (modify) return str;
return Qnil;
}
|
#start_with?([prefixes]) ⇒ Boolean
Returns true if str
starts with one of the prefixes
given. Each of the prefixes
should be a String or a Regexp.
"hello".start_with?("hell") #=> true
"hello".start_with?(/H/i) #=> true
# returns true if one of the prefixes matches.
"hello".start_with?("heaven", "hell") #=> true
"hello".start_with?("heaven", "paradise") #=> false
10133 10134 10135 10136 10137 10138 10139 10140 10141 10142 10143 10144 10145 10146 10147 10148 10149 10150 10151 10152 10153 |
# File 'string.c', line 10133
static VALUE
rb_str_start_with(int argc, VALUE *argv, VALUE str)
{
int i;
for (i=0; i<argc; i++) {
VALUE tmp = argv[i];
if (RB_TYPE_P(tmp, T_REGEXP)) {
if (rb_reg_start_with_p(tmp, str))
return Qtrue;
}
else {
StringValue(tmp);
rb_enc_check(str, tmp);
if (RSTRING_LEN(str) < RSTRING_LEN(tmp)) continue;
if (memcmp(RSTRING_PTR(str), RSTRING_PTR(tmp), RSTRING_LEN(tmp)) == 0)
return Qtrue;
}
}
return Qfalse;
}
|
#strip ⇒ String
Returns a copy of the receiver with leading and trailing whitespace removed.
Whitespace is defined as any of the following characters: null, horizontal tab, line feed, vertical tab, form feed, carriage return, space.
" hello ".strip #=> "hello"
"\tgoodbye\r\n".strip #=> "goodbye"
"\x00\t\n\v\f\r ".strip #=> ""
"hello".strip #=> "hello"
9497 9498 9499 9500 9501 9502 9503 9504 9505 9506 9507 9508 9509 9510 |
# File 'string.c', line 9497
static VALUE
rb_str_strip(VALUE str)
{
char *start;
long olen, loffset, roffset;
rb_encoding *enc = STR_ENC_GET(str);
RSTRING_GETMEM(str, start, olen);
loffset = lstrip_offset(str, start, start+olen, enc);
roffset = rstrip_offset(str, start+loffset, start+olen, enc);
if (loffset <= 0 && roffset <= 0) return str_duplicate(rb_cString, str);
return rb_str_subseq(str, loffset, olen-loffset-roffset);
}
|
#strip! ⇒ self?
Removes leading and trailing whitespace from the receiver. Returns the altered receiver, or nil
if there was no change.
Refer to String#strip for the definition of whitespace.
" hello ".strip! #=> "hello"
"hello".strip! #=> nil
9453 9454 9455 9456 9457 9458 9459 9460 9461 9462 9463 9464 9465 9466 9467 9468 9469 9470 9471 9472 9473 9474 9475 9476 9477 9478 9479 |
# File 'string.c', line 9453
static VALUE
rb_str_strip_bang(VALUE str)
{
char *start;
long olen, loffset, roffset;
rb_encoding *enc;
str_modify_keep_cr(str);
enc = STR_ENC_GET(str);
RSTRING_GETMEM(str, start, olen);
loffset = lstrip_offset(str, start, start+olen, enc);
roffset = rstrip_offset(str, start+loffset, start+olen, enc);
if (loffset > 0 || roffset > 0) {
long len = olen-roffset;
if (loffset > 0) {
len -= loffset;
memmove(start, start + loffset, len);
}
STR_SET_LEN(str, len);
#if !SHARABLE_MIDDLE_SUBSTRING
TERM_FILL(start+len, rb_enc_mbminlen(enc));
#endif
return str;
}
return Qnil;
}
|
#sub(pattern, replacement) ⇒ String #sub(pattern, hash) ⇒ String #sub(pattern) {|match| ... } ⇒ String
Returns a copy of str
with the first occurrence of pattern
replaced by the second argument. The pattern
is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. \d
will match a backslash followed by ‘d’, instead of a digit.
If replacement
is a String it will be substituted for the matched text. It may contain back-references to the pattern’s capture groups of the form \d
, where d is a group number, or \k<n>
, where n is a group name. Similarly, \&
, \'
, \`
, and +
correspond to special variables, $&
, $'
, $`
, and $+
, respectively. (See regexp.rdoc for details.) \0
is the same as \&
. \\
is interpreted as an escape, i.e., a single backslash. Note that, within replacement
the special match variables, such as $&
, will not refer to the current match.
If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string.
In the block form, the current match string is passed in as a parameter, and variables such as $1
, $2
, $`
, $&
, and $'
will be set appropriately. (See regexp.rdoc for details.) The value returned by the block will be substituted for the match on each call.
"hello".sub(/[aeiou]/, '*') #=> "h*llo"
"hello".sub(/([aeiou])/, '<\1>') #=> "h<e>llo"
"hello".sub(/./) {|s| s.ord.to_s + ' ' } #=> "104 ello"
"hello".sub(/(?<foo>[aeiou])/, '*\k<foo>*') #=> "h*e*llo"
'Is SHELL your preferred shell?'.sub(/[[:upper:]]{2,}/, ENV)
#=> "Is /bin/bash your preferred shell?"
Note that a string literal consumes backslashes. (See syntax/literals.rdoc for details about string literals.) Back-references are typically preceded by an additional backslash. For example, if you want to write a back-reference \&
in replacement
with a double-quoted string literal, you need to write: "..\\&.."
. If you want to write a non-back-reference string \&
in replacement
, you need first to escape the backslash to prevent this method from interpreting it as a back-reference, and then you need to escape the backslashes again to prevent a string literal from consuming them: "..\\\\&.."
. You may want to use the block form to avoid a lot of backslashes.
5359 5360 5361 5362 5363 5364 5365 |
# File 'string.c', line 5359
static VALUE
rb_str_sub(int argc, VALUE *argv, VALUE str)
{
str = str_duplicate(rb_cString, str);
rb_str_sub_bang(argc, argv, str);
return str;
}
|
#sub!(pattern, replacement) ⇒ String? #sub!(pattern) {|match| ... } ⇒ String?
Performs the same substitution as String#sub in-place.
Returns str
if a substitution was performed or nil
if no substitution was performed.
5194 5195 5196 5197 5198 5199 5200 5201 5202 5203 5204 5205 5206 5207 5208 5209 5210 5211 5212 5213 5214 5215 5216 5217 5218 5219 5220 5221 5222 5223 5224 5225 5226 5227 5228 5229 5230 5231 5232 5233 5234 5235 5236 5237 5238 5239 5240 5241 5242 5243 5244 5245 5246 5247 5248 5249 5250 5251 5252 5253 5254 5255 5256 5257 5258 5259 5260 5261 5262 5263 5264 5265 5266 5267 5268 5269 5270 5271 5272 5273 5274 5275 5276 5277 5278 5279 5280 5281 5282 5283 5284 5285 5286 5287 5288 5289 5290 5291 5292 5293 5294 5295 5296 5297 5298 5299 5300 |
# File 'string.c', line 5194
static VALUE
rb_str_sub_bang(int argc, VALUE *argv, VALUE str)
{
VALUE pat, repl, hash = Qnil;
int iter = 0;
long plen;
int min_arity = rb_block_given_p() ? 1 : 2;
long beg;
rb_check_arity(argc, min_arity, 2);
if (argc == 1) {
iter = 1;
}
else {
repl = argv[1];
hash = rb_check_hash_type(argv[1]);
if (NIL_P(hash)) {
StringValue(repl);
}
}
pat = get_pat_quoted(argv[0], 1);
str_modifiable(str);
beg = rb_pat_search(pat, str, 0, 1);
if (beg >= 0) {
rb_encoding *enc;
int cr = ENC_CODERANGE(str);
long beg0, end0;
VALUE match, match0 = Qnil;
struct re_registers *regs;
char *p, *rp;
long len, rlen;
match = rb_backref_get();
regs = RMATCH_REGS(match);
if (RB_TYPE_P(pat, T_STRING)) {
beg0 = beg;
end0 = beg0 + RSTRING_LEN(pat);
match0 = pat;
}
else {
beg0 = BEG(0);
end0 = END(0);
if (iter) match0 = rb_reg_nth_match(0, match);
}
if (iter || !NIL_P(hash)) {
p = RSTRING_PTR(str); len = RSTRING_LEN(str);
if (iter) {
repl = rb_obj_as_string(rb_yield(match0));
}
else {
repl = rb_hash_aref(hash, rb_str_subseq(str, beg0, end0 - beg0));
repl = rb_obj_as_string(repl);
}
str_mod_check(str, p, len);
rb_check_frozen(str);
}
else {
repl = rb_reg_regsub(repl, str, regs, RB_TYPE_P(pat, T_STRING) ? Qnil : pat);
}
enc = rb_enc_compatible(str, repl);
if (!enc) {
rb_encoding *str_enc = STR_ENC_GET(str);
p = RSTRING_PTR(str); len = RSTRING_LEN(str);
if (coderange_scan(p, beg0, str_enc) != ENC_CODERANGE_7BIT ||
coderange_scan(p+end0, len-end0, str_enc) != ENC_CODERANGE_7BIT) {
rb_raise(rb_eEncCompatError, "incompatible character encodings: %s and %s",
rb_enc_name(str_enc),
rb_enc_name(STR_ENC_GET(repl)));
}
enc = STR_ENC_GET(repl);
}
rb_str_modify(str);
rb_enc_associate(str, enc);
if (ENC_CODERANGE_UNKNOWN < cr && cr < ENC_CODERANGE_BROKEN) {
int cr2 = ENC_CODERANGE(repl);
if (cr2 == ENC_CODERANGE_BROKEN ||
(cr == ENC_CODERANGE_VALID && cr2 == ENC_CODERANGE_7BIT))
cr = ENC_CODERANGE_UNKNOWN;
else
cr = cr2;
}
plen = end0 - beg0;
rlen = RSTRING_LEN(repl);
len = RSTRING_LEN(str);
if (rlen > plen) {
RESIZE_CAPA(str, len + rlen - plen);
}
p = RSTRING_PTR(str);
if (rlen != plen) {
memmove(p + beg0 + rlen, p + beg0 + plen, len - beg0 - plen);
}
rp = RSTRING_PTR(repl);
memmove(p + beg0, rp, rlen);
len += rlen - plen;
STR_SET_LEN(str, len);
TERM_FILL(&RSTRING_PTR(str)[len], TERM_LEN(str));
ENC_CODERANGE_SET(str, cr);
return str;
}
return Qnil;
}
|
#succ ⇒ String
Returns the successor to self
. The successor is calculated by incrementing characters.
The first character to be incremented is the rightmost alphanumeric: or, if no alphanumerics, the rightmost character:
'THX1138'.succ # => "THX1139"
'<<koala>>'.succ # => "<<koalb>>"
'***'.succ # => '**+'
The successor to a digit is another digit, “carrying” to the next-left character for a “rollover” from 9 to 0, and prepending another digit if necessary:
'00'.succ # => "01"
'09'.succ # => "10"
'99'.succ # => "100"
The successor to a letter is another letter of the same case, carrying to the next-left character for a rollover, and prepending another same-case letter if necessary:
'aa'.succ # => "ab"
'az'.succ # => "ba"
'zz'.succ # => "aaa"
'AA'.succ # => "AB"
'AZ'.succ # => "BA"
'ZZ'.succ # => "AAA"
The successor to a non-alphanumeric character is the next character in the underlying character set’s collating sequence, carrying to the next-left character for a rollover, and prepending another character if necessary:
s = 0.chr * 3
s # => "\x00\x00\x00"
s.succ # => "\x00\x00\x01"
s = 255.chr * 3
s # => "\xFF\xFF\xFF"
s.succ # => "\x01\x00\x00\x00"
Carrying can occur between and among mixtures of alphanumeric characters:
s = 'zz99zz99'
s.succ # => "aaa00aa00"
s = '99zz99zz'
s.succ # => "100aa00aa"
The successor to an empty String is a new empty String:
''.succ # => ""
String#next is an alias for String#succ.
4272 4273 4274 4275 4276 4277 4278 4279 |
# File 'string.c', line 4272
VALUE
rb_str_succ(VALUE orig)
{
VALUE str;
str = rb_str_new(RSTRING_PTR(orig), RSTRING_LEN(orig));
rb_enc_cr_str_copy_for_substr(str, orig);
return str_succ(str);
}
|
#succ! ⇒ self
Equivalent to String#succ, but modifies self
in place; returns self
.
String#next! is an alias for String#succ!.
4378 4379 4380 4381 4382 4383 4384 |
# File 'string.c', line 4378
static VALUE
rb_str_succ_bang(VALUE str)
{
rb_str_modify(str);
str_succ(str);
return str;
}
|
#sum(n = 16) ⇒ Integer
Returns a basic n-bit checksum of the characters in str, where n is the optional Integer parameter, defaulting to 16. The result is simply the sum of the binary value of each byte in str modulo 2**n - 1
. This is not a particularly good checksum.
9821 9822 9823 9824 9825 9826 9827 9828 9829 9830 9831 9832 9833 9834 9835 9836 9837 9838 9839 9840 9841 9842 9843 9844 9845 9846 9847 9848 9849 9850 9851 9852 9853 9854 9855 9856 9857 9858 9859 9860 9861 9862 9863 9864 9865 9866 9867 9868 9869 9870 9871 9872 |
# File 'string.c', line 9821
static VALUE
rb_str_sum(int argc, VALUE *argv, VALUE str)
{
int bits = 16;
char *ptr, *p, *pend;
long len;
VALUE sum = INT2FIX(0);
unsigned long sum0 = 0;
if (rb_check_arity(argc, 0, 1) && (bits = NUM2INT(argv[0])) < 0) {
bits = 0;
}
ptr = p = RSTRING_PTR(str);
len = RSTRING_LEN(str);
pend = p + len;
while (p < pend) {
if (FIXNUM_MAX - UCHAR_MAX < sum0) {
sum = rb_funcall(sum, '+', 1, LONG2FIX(sum0));
str_mod_check(str, ptr, len);
sum0 = 0;
}
sum0 += (unsigned char)*p;
p++;
}
if (bits == 0) {
if (sum0) {
sum = rb_funcall(sum, '+', 1, LONG2FIX(sum0));
}
}
else {
if (sum == INT2FIX(0)) {
if (bits < (int)sizeof(long)*CHAR_BIT) {
sum0 &= (((unsigned long)1)<<bits)-1;
}
sum = LONG2FIX(sum0);
}
else {
VALUE mod;
if (sum0) {
sum = rb_funcall(sum, '+', 1, LONG2FIX(sum0));
}
mod = rb_funcall(INT2FIX(1), idLTLT, 1, INT2FIX(bits));
mod = rb_funcall(mod, '-', 1, INT2FIX(1));
sum = rb_funcall(sum, '&', 1, mod);
}
}
return sum;
}
|
#swapcase ⇒ String #swapcase([options]) ⇒ String
Returns a copy of str with uppercase alphabetic characters converted to lowercase and lowercase characters converted to uppercase.
See String#downcase for meaning of options
and use with different encodings.
"Hello".swapcase #=> "hELLO"
"cYbEr_PuNk11".swapcase #=> "CyBeR_pUnK11"
7193 7194 7195 7196 7197 7198 7199 7200 7201 7202 7203 7204 7205 7206 7207 7208 7209 7210 7211 |
# File 'string.c', line 7193
static VALUE
rb_str_swapcase(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_DOWNCASE;
VALUE ret;
flags = check_case_options(argc, argv, flags);
enc = str_true_enc(str);
if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return str_duplicate(rb_cString, str);
if (flags&ONIGENC_CASE_ASCII_ONLY) {
ret = rb_str_new(0, RSTRING_LEN(str));
rb_str_ascii_casemap(str, ret, &flags, enc);
}
else {
ret = rb_str_casemap(str, &flags, enc);
}
return ret;
}
|
#swapcase! ⇒ String? #swapcase!([options]) ⇒ String?
Equivalent to String#swapcase, but modifies the receiver in place, returning str, or nil
if no changes were made.
See String#downcase for meaning of options
and use with different encodings.
7160 7161 7162 7163 7164 7165 7166 7167 7168 7169 7170 7171 7172 7173 7174 7175 7176 |
# File 'string.c', line 7160
static VALUE
rb_str_swapcase_bang(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_DOWNCASE;
flags = check_case_options(argc, argv, flags);
str_modify_keep_cr(str);
enc = str_true_enc(str);
if (flags&ONIGENC_CASE_ASCII_ONLY)
rb_str_ascii_casemap(str, str, &flags, enc);
else
str_shared_replace(str, rb_str_casemap(str, &flags, enc));
if (ONIGENC_CASE_MODIFIED&flags) return str;
return Qnil;
}
|
#to_c ⇒ Object
Returns a complex which denotes the string form. The parser ignores leading whitespaces and trailing garbage. Any digit sequences can be separated by an underscore. Returns zero for null or garbage string.
'9'.to_c #=> (9+0i)
'2.5'.to_c #=> (2.5+0i)
'2.5/1'.to_c #=> ((5/2)+0i)
'-3/2'.to_c #=> ((-3/2)+0i)
'-i'.to_c #=> (0-1i)
'45i'.to_c #=> (0+45i)
'3-4i'.to_c #=> (3-4i)
'-4e2-4e-2i'.to_c #=> (-400.0-0.04i)
'-0.0-0.0i'.to_c #=> (-0.0-0.0i)
'1/2+3/4i'.to_c #=> ((1/2)+(3/4)*i)
'ruby'.to_c #=> (0+0i)
See Kernel.Complex.
2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 |
# File 'complex.c', line 2038
static VALUE
string_to_c(VALUE self)
{
char *s;
VALUE num;
rb_must_asciicompat(self);
s = RSTRING_PTR(self);
if (s && s[RSTRING_LEN(self)]) {
rb_str_modify(self);
s = RSTRING_PTR(self);
s[RSTRING_LEN(self)] = '\0';
}
if (!s)
s = (char *)"";
(void)parse_comp(s, 0, &num);
return num;
}
|
#to_f ⇒ Float
Returns the result of interpreting leading characters in str as a floating point number. Extraneous characters past the end of a valid number are ignored. If there is not a valid number at the start of str, 0.0
is returned. This method never raises an exception.
"123.45e1".to_f #=> 1234.5
"45.67 degrees".to_f #=> 45.67
"thx1138".to_f #=> 0.0
6001 6002 6003 6004 6005 |
# File 'string.c', line 6001
static VALUE
rb_str_to_f(VALUE str)
{
return DBL2NUM(rb_str_to_dbl(str, FALSE));
}
|
#to_i(base = 10) ⇒ Integer
Returns the result of interpreting leading characters in str as an integer base base (between 2 and 36). Extraneous characters past the end of a valid number are ignored. If there is not a valid number at the start of str, 0
is returned. This method never raises an exception when base is valid.
"12345".to_i #=> 12345
"99 red balloons".to_i #=> 99
"0a".to_i #=> 0
"0a".to_i(16) #=> 10
"hello".to_i #=> 0
"1100101".to_i(2) #=> 101
"1100101".to_i(8) #=> 294977
"1100101".to_i(10) #=> 1100101
"1100101".to_i(16) #=> 17826049
5975 5976 5977 5978 5979 5980 5981 5982 5983 5984 |
# File 'string.c', line 5975
static VALUE
rb_str_to_i(int argc, VALUE *argv, VALUE str)
{
int base = 10;
if (rb_check_arity(argc, 0, 1) && (base = NUM2INT(argv[0])) < 0) {
rb_raise(rb_eArgError, "invalid radix %d", base);
}
return rb_str_to_inum(str, base, FALSE);
}
|
#to_r ⇒ Object
Returns the result of interpreting leading characters in str
as a rational. Leading whitespace and extraneous characters past the end of a valid number are ignored. Digit sequences can be separated by an underscore. If there is not a valid number at the start of str
, zero is returned. This method never raises an exception.
' 2 '.to_r #=> (2/1)
'300/2'.to_r #=> (150/1)
'-9.2'.to_r #=> (-46/5)
'-9.2e2'.to_r #=> (-920/1)
'1_234_567'.to_r #=> (1234567/1)
'21 June 09'.to_r #=> (21/1)
'21/06/09'.to_r #=> (7/2)
'BWV 1079'.to_r #=> (0/1)
NOTE: “0.3”.to_r isn’t the same as 0.3.to_r. The former is equivalent to “3/10”.to_r, but the latter isn’t so.
"0.3".to_r == 3/10r #=> true
0.3.to_r == 3/10r #=> false
See also Kernel#Rational.
2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 |
# File 'rational.c', line 2534
static VALUE
string_to_r(VALUE self)
{
VALUE num;
rb_must_asciicompat(self);
num = parse_rat(RSTRING_PTR(self), RSTRING_END(self), 0, TRUE);
if (RB_FLOAT_TYPE_P(num) && !FLOAT_ZERO_P(num))
rb_raise(rb_eFloatDomainError, "Infinity");
return num;
}
|
#to_s ⇒ String #to_str ⇒ String
Returns self
.
If called on a subclass of String, converts the receiver to a String object.
6018 6019 6020 6021 6022 6023 6024 6025 |
# File 'string.c', line 6018
static VALUE
rb_str_to_s(VALUE str)
{
if (rb_obj_class(str) != rb_cString) {
return str_duplicate(rb_cString, str);
}
return str;
}
|
#to_s ⇒ String #to_str ⇒ String
Returns self
.
If called on a subclass of String, converts the receiver to a String object.
6018 6019 6020 6021 6022 6023 6024 6025 |
# File 'string.c', line 6018
static VALUE
rb_str_to_s(VALUE str)
{
if (rb_obj_class(str) != rb_cString) {
return str_duplicate(rb_cString, str);
}
return str;
}
|
#intern ⇒ Object #to_sym ⇒ Object
Returns the Symbol corresponding to str, creating the symbol if it did not previously exist. See Symbol#id2name.
"Koala".intern #=> :Koala
s = 'cat'.to_sym #=> :cat
s == :cat #=> true
s = '@cat'.to_sym #=> :@cat
s == :@cat #=> true
This can also be used to create symbols that cannot be represented using the :xxx
notation.
'cat and dog'.to_sym #=> :"cat and dog"
839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 |
# File 'symbol.c', line 839
VALUE
rb_str_intern(VALUE str)
{
VALUE sym;
#if USE_SYMBOL_GC
rb_encoding *enc, *ascii;
int type;
#else
ID id;
#endif
GLOBAL_SYMBOLS_ENTER(symbols);
{
sym = lookup_str_sym_with_lock(symbols, str);
if (sym) {
// ok
}
else {
#if USE_SYMBOL_GC
enc = rb_enc_get(str);
ascii = rb_usascii_encoding();
if (enc != ascii && sym_check_asciionly(str)) {
str = rb_str_dup(str);
rb_enc_associate(str, ascii);
OBJ_FREEZE(str);
enc = ascii;
}
else {
str = rb_str_dup(str);
OBJ_FREEZE(str);
}
str = rb_fstring(str);
type = rb_str_symname_type(str, IDSET_ATTRSET_FOR_INTERN);
if (type < 0) type = ID_JUNK;
sym = dsymbol_alloc(symbols, rb_cSymbol, str, enc, type);
#else
id = intern_str(str, 0);
sym = ID2SYM(id);
#endif
}
}
GLOBAL_SYMBOLS_LEAVE();
return sym;
}
|
#tr(from_str, to_str) ⇒ String
Returns a copy of str
with the characters in from_str
replaced by the corresponding characters in to_str
. If to_str
is shorter than from_str
, it is padded with its last character in order to maintain the correspondence.
"hello".tr('el', 'ip') #=> "hippo"
"hello".tr('aeiou', '*') #=> "h*ll*"
"hello".tr('aeiou', 'AA*') #=> "hAll*"
Both strings may use the c1-c2
notation to denote ranges of characters, and from_str
may start with a ^
, which denotes all characters except those listed.
"hello".tr('a-y', 'b-z') #=> "ifmmp"
"hello".tr('^aeiou', '*') #=> "*e**o"
The backslash character \
can be used to escape ^
or -
and is otherwise ignored unless it appears at the end of a range or the end of the from_str
or to_str
:
"hello^world".tr("\\^aeiou", "*") #=> "h*ll**w*rld"
"hello-world".tr("a\\-eo", "*") #=> "h*ll**w*rld"
"hello\r\nworld".tr("\r", "") #=> "hello\nworld"
"hello\r\nworld".tr("\\r", "") #=> "hello\r\nwold"
"hello\r\nworld".tr("\\\r", "") #=> "hello\nworld"
"X['\\b']".tr("X\\", "") #=> "['b']"
"X['\\b']".tr("X-\\]", "") #=> "'b'"
7575 7576 7577 7578 7579 7580 7581 |
# File 'string.c', line 7575
static VALUE
rb_str_tr(VALUE str, VALUE src, VALUE repl)
{
str = str_duplicate(rb_cString, str);
tr_trans(str, src, repl, 0);
return str;
}
|
#tr!(from_str, to_str) ⇒ String?
Translates str in place, using the same rules as String#tr. Returns str, or nil
if no changes were made.
7533 7534 7535 7536 7537 |
# File 'string.c', line 7533
static VALUE
rb_str_tr_bang(VALUE str, VALUE src, VALUE repl)
{
return tr_trans(str, src, repl, 0);
}
|
#tr_s(from_str, to_str) ⇒ String
Processes a copy of str as described under String#tr, then removes duplicate characters in regions that were affected by the translation.
"hello".tr_s('l', 'r') #=> "hero"
"hello".tr_s('el', '*') #=> "h*o"
"hello".tr_s('el', 'hx') #=> "hhxo"
7900 7901 7902 7903 7904 7905 7906 |
# File 'string.c', line 7900
static VALUE
rb_str_tr_s(VALUE str, VALUE src, VALUE repl)
{
str = str_duplicate(rb_cString, str);
tr_trans(str, src, repl, 1);
return str;
}
|
#tr_s!(from_str, to_str) ⇒ String?
Performs String#tr_s processing on str in place, returning str, or nil
if no changes were made.
7880 7881 7882 7883 7884 |
# File 'string.c', line 7880
static VALUE
rb_str_tr_s_bang(VALUE str, VALUE src, VALUE repl)
{
return tr_trans(str, src, repl, 1);
}
|
#undump ⇒ String
Returns an unescaped version of the string. This does the inverse of String#dump.
"\"hello \\n ''\"".undump #=> "hello \n ''"
6558 6559 6560 6561 6562 6563 6564 6565 6566 6567 6568 6569 6570 6571 6572 6573 6574 6575 6576 6577 6578 6579 6580 6581 6582 6583 6584 6585 6586 6587 6588 6589 6590 6591 6592 6593 6594 6595 6596 6597 6598 6599 6600 6601 6602 6603 6604 6605 6606 6607 6608 6609 6610 6611 6612 6613 6614 6615 6616 6617 6618 6619 6620 6621 6622 6623 6624 6625 6626 6627 6628 6629 6630 6631 6632 6633 6634 6635 6636 6637 6638 6639 6640 6641 6642 6643 6644 6645 |
# File 'string.c', line 6558
static VALUE
str_undump(VALUE str)
{
const char *s = RSTRING_PTR(str);
const char *s_end = RSTRING_END(str);
rb_encoding *enc = rb_enc_get(str);
VALUE undumped = rb_enc_str_new(s, 0L, enc);
bool utf8 = false;
bool binary = false;
int w;
rb_must_asciicompat(str);
if (rb_str_is_ascii_only_p(str) == Qfalse) {
rb_raise(rb_eRuntimeError, "non-ASCII character detected");
}
if (!str_null_check(str, &w)) {
rb_raise(rb_eRuntimeError, "string contains null byte");
}
if (RSTRING_LEN(str) < 2) goto invalid_format;
if (*s != '"') goto invalid_format;
/* strip '"' at the start */
s++;
for (;;) {
if (s >= s_end) {
rb_raise(rb_eRuntimeError, "unterminated dumped string");
}
if (*s == '"') {
/* epilogue */
s++;
if (s == s_end) {
/* ascii compatible dumped string */
break;
}
else {
static const char force_encoding_suffix[] = ".force_encoding(\""; /* "\")" */
static const char dup_suffix[] = ".dup";
const char *encname;
int encidx;
ptrdiff_t size;
/* check separately for strings dumped by older versions */
size = sizeof(dup_suffix) - 1;
if (s_end - s > size && memcmp(s, dup_suffix, size) == 0) s += size;
size = sizeof(force_encoding_suffix) - 1;
if (s_end - s <= size) goto invalid_format;
if (memcmp(s, force_encoding_suffix, size) != 0) goto invalid_format;
s += size;
if (utf8) {
rb_raise(rb_eRuntimeError, "dumped string contained Unicode escape but used force_encoding");
}
encname = s;
s = memchr(s, '"', s_end-s);
size = s - encname;
if (!s) goto invalid_format;
if (s_end - s != 2) goto invalid_format;
if (s[0] != '"' || s[1] != ')') goto invalid_format;
encidx = rb_enc_find_index2(encname, (long)size);
if (encidx < 0) {
rb_raise(rb_eRuntimeError, "dumped string has unknown encoding name");
}
rb_enc_associate_index(undumped, encidx);
}
break;
}
if (*s == '\\') {
s++;
if (s >= s_end) {
rb_raise(rb_eRuntimeError, "invalid escape");
}
undump_after_backslash(undumped, &s, s_end, &enc, &utf8, &binary);
}
else {
rb_str_cat(undumped, s++, 1);
}
}
return undumped;
invalid_format:
rb_raise(rb_eRuntimeError, "invalid dumped string; not wrapped with '\"' nor '\"...\".force_encoding(\"...\")' form");
}
|
#unicode_normalize(form = :nfc) ⇒ Object
Unicode Normalization—Returns a normalized form of str
, using Unicode normalizations NFC, NFD, NFKC, or NFKD. The normalization form used is determined by form
, which can be any of the four values :nfc
, :nfd
, :nfkc
, or :nfkd
. The default is :nfc
.
If the string is not in a Unicode Encoding, then an Exception is raised. In this context, ‘Unicode Encoding’ means any of UTF-8, UTF-16BE/LE, and UTF-32BE/LE, as well as GB18030, UCS_2BE, and UCS_4BE. Anything other than UTF-8 is implemented by converting to UTF-8, which makes it slower than UTF-8.
"a\u0300".unicode_normalize #=> "\u00E0"
"a\u0300".unicode_normalize(:nfc) #=> "\u00E0"
"\u00E0".unicode_normalize(:nfd) #=> "a\u0300"
"\xE0".force_encoding('ISO-8859-1').unicode_normalize(:nfd)
#=> Encoding::CompatibilityError raised
10870 10871 10872 10873 10874 |
# File 'string.c', line 10870
static VALUE
rb_str_unicode_normalize(int argc, VALUE *argv, VALUE str)
{
return unicode_normalize_common(argc, argv, str, id_normalize);
}
|
#unicode_normalize!(form = :nfc) ⇒ Object
Destructive version of String#unicode_normalize, doing Unicode normalization in place.
10883 10884 10885 10886 10887 |
# File 'string.c', line 10883
static VALUE
rb_str_unicode_normalize_bang(int argc, VALUE *argv, VALUE str)
{
return rb_str_replace(str, unicode_normalize_common(argc, argv, str, id_normalize));
}
|
#unicode_normalized?(form = :nfc) ⇒ Boolean
Checks whether str
is in Unicode normalization form form
, which can be any of the four values :nfc
, :nfd
, :nfkc
, or :nfkd
. The default is :nfc
.
If the string is not in a Unicode Encoding, then an Exception is raised. For details, see String#unicode_normalize.
"a\u0300".unicode_normalized? #=> false
"a\u0300".unicode_normalized?(:nfd) #=> true
"\u00E0".unicode_normalized? #=> true
"\u00E0".unicode_normalized?(:nfd) #=> false
"\xE0".force_encoding('ISO-8859-1').unicode_normalized?
#=> Encoding::CompatibilityError raised
10906 10907 10908 10909 10910 |
# File 'string.c', line 10906
static VALUE
rb_str_unicode_normalized_p(int argc, VALUE *argv, VALUE str)
{
return unicode_normalize_common(argc, argv, str, id_normalized_p);
}
|
#upcase ⇒ String #upcase([options]) ⇒ String
Returns a copy of str with all lowercase letters replaced with their uppercase counterparts.
See String#downcase for meaning of options
and use with different encodings.
"hEllO".upcase #=> "HELLO"
6920 6921 6922 6923 6924 6925 6926 6927 6928 6929 6930 6931 6932 6933 6934 6935 6936 6937 6938 6939 6940 6941 6942 6943 |
# File 'string.c', line 6920
static VALUE
rb_str_upcase(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_UPCASE;
VALUE ret;
flags = check_case_options(argc, argv, flags);
enc = str_true_enc(str);
if (case_option_single_p(flags, enc, str)) {
ret = rb_str_new(RSTRING_PTR(str), RSTRING_LEN(str));
str_enc_copy(ret, str);
upcase_single(ret);
}
else if (flags&ONIGENC_CASE_ASCII_ONLY) {
ret = rb_str_new(0, RSTRING_LEN(str));
rb_str_ascii_casemap(str, ret, &flags, enc);
}
else {
ret = rb_str_casemap(str, &flags, enc);
}
return ret;
}
|
#upcase! ⇒ String? #upcase!([options]) ⇒ String?
Upcases the contents of str, returning nil
if no changes were made.
See String#downcase for meaning of options
and use with different encodings.
6884 6885 6886 6887 6888 6889 6890 6891 6892 6893 6894 6895 6896 6897 6898 6899 6900 6901 6902 6903 6904 |
# File 'string.c', line 6884
static VALUE
rb_str_upcase_bang(int argc, VALUE *argv, VALUE str)
{
rb_encoding *enc;
OnigCaseFoldType flags = ONIGENC_CASE_UPCASE;
flags = check_case_options(argc, argv, flags);
str_modify_keep_cr(str);
enc = str_true_enc(str);
if (case_option_single_p(flags, enc, str)) {
if (upcase_single(str))
flags |= ONIGENC_CASE_MODIFIED;
}
else if (flags&ONIGENC_CASE_ASCII_ONLY)
rb_str_ascii_casemap(str, str, &flags, enc);
else
str_shared_replace(str, rb_str_casemap(str, &flags, enc));
if (ONIGENC_CASE_MODIFIED&flags) return str;
return Qnil;
}
|
#upto(other_string, exclusive = false) {|string| ... } ⇒ self #upto(other_string, exclusive = false) ⇒ Object
With a block given, calls the block with each String value returned by successive calls to String#succ; the first value is self
, the next is self.succ
, and so on; the sequence terminates when value other_string
is reached; returns self
:
'a8'.upto('b6') {|s| print s, ' ' } # => "a8"
Output:
a8 a9 b0 b1 b2 b3 b4 b5 b6
If argument exclusive
is given as a truthy object, the last value is omitted:
'a8'.upto('b6', true) {|s| print s, ' ' } # => "a8"
Output:
a8 a9 b0 b1 b2 b3 b4 b5
If other_string
would not be reached, does not call the block:
'25'.upto('5') {|s| fail s }
'aa'.upto('a') {|s| fail s }
With no block given, returns a new Enumerator:
'a8'.upto('b6') # => #<Enumerator: "a8":upto("b6")>
4430 4431 4432 4433 4434 4435 4436 4437 4438 |
# File 'string.c', line 4430
static VALUE
rb_str_upto(int argc, VALUE *argv, VALUE beg)
{
VALUE end, exclusive;
rb_scan_args(argc, argv, "11", &end, &exclusive);
RETURN_ENUMERATOR(beg, argc, argv);
return rb_str_upto_each(beg, end, RTEST(exclusive), str_upto_i, Qnil);
}
|
#valid_encoding? ⇒ Boolean
Returns true for a string which is encoded correctly.
"\xc2\xa1".force_encoding("UTF-8").valid_encoding? #=> true
"\xc2".force_encoding("UTF-8").valid_encoding? #=> false
"\x80".force_encoding("UTF-8").valid_encoding? #=> false
10420 10421 10422 10423 10424 10425 10426 |
# File 'string.c', line 10420
static VALUE
rb_str_valid_encoding_p(VALUE str)
{
int cr = rb_enc_str_coderange(str);
return cr == ENC_CODERANGE_BROKEN ? Qfalse : Qtrue;
}
|