Class: HTMLEntities
- Inherits: Object
  - Object
    - HTMLEntities
- Defined in: lib/htmlentities.rb, lib/htmlentities/html4.rb, lib/htmlentities/legacy.rb, lib/htmlentities/xhtml1.rb
Overview
HTML entity encoding and decoding for Ruby
Defined Under Namespace
Classes: InstructionError, UnknownFlavor
Constant Summary
- VERSION = '4.0.0'
- FLAVORS = %w[html4 xhtml1]
- INSTRUCTIONS = [:basic, :named, :decimal, :hexadecimal]
- MAPPINGS = {}
Class Method Summary
- .decode_entities(*args) ⇒ Object
  Legacy compatibility class method allowing direct decoding of XHTML1 entities.
- .encode_entities(*args) ⇒ Object
  Legacy compatibility class method allowing direct encoding of XHTML1 entities.
Instance Method Summary
- #decode(source) ⇒ Object
  Decode entities in a string into their UTF-8 equivalents.
- #encode(source, *instructions) ⇒ Object
  Encode codepoints into their corresponding entities.
- #initialize(flavor = 'xhtml1') ⇒ HTMLEntities (constructor)
  Create a new HTMLEntities coder for the specified flavor.
Constructor Details
#initialize(flavor = 'xhtml1') ⇒ HTMLEntities
Create a new HTMLEntities coder for the specified flavor. Available flavors are ‘html4’ and ‘xhtml1’ (the default). The flavor name is converted with to_s and downcased, so :HTML4 is accepted as well; any other value raises UnknownFlavor. The only difference in functionality between the two flavors is the handling of the apos (apostrophe) named entity, which is not defined in HTML4.
```ruby
# File 'lib/htmlentities.rb', line 24
def initialize(flavor='xhtml1')
  @flavor = flavor.to_s.downcase
  raise UnknownFlavor, "Unknown flavor #{flavor}" unless FLAVORS.include?(@flavor)
end
```
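The flavor check and the apos difference can be illustrated with a small stand-alone sketch. MiniEntities, its decode_named method and its tiny entity tables are hypothetical stand-ins (the gem's full tables presumably live in html4.rb and xhtml1.rb); only the validation lines mirror the constructor listed above.

```ruby
# Hypothetical sketch: MiniEntities and its tables are illustrative only,
# not the gem's real classes or data.
class MiniEntities
  class UnknownFlavor < RuntimeError; end

  FLAVORS = %w[html4 xhtml1].freeze

  # A handful of entities; XHTML1 adds 'apos', which HTML4 lacks.
  HTML4_MAP  = { 'amp' => 38, 'lt' => 60, 'gt' => 62, 'quot' => 34 }.freeze
  XHTML1_MAP = HTML4_MAP.merge('apos' => 39).freeze

  def initialize(flavor = 'xhtml1')
    # Same normalization as the listed constructor: symbols and any case work.
    @flavor = flavor.to_s.downcase
    raise UnknownFlavor, "Unknown flavor #{flavor}" unless FLAVORS.include?(@flavor)
  end

  # Decode named entities only; names missing from the table are left as-is.
  def decode_named(source)
    map = @flavor == 'html4' ? HTML4_MAP : XHTML1_MAP
    source.to_s.gsub(/&([a-z]+);/i) { (cp = map[$1]) ? [cp].pack('U') : $& }
  end
end

puts MiniEntities.new.decode_named('&apos;')         # prints '
puts MiniEntities.new(:HTML4).decode_named('&apos;') # prints &apos;
```

Passing an unsupported name such as 'html5' raises MiniEntities::UnknownFlavor at construction time, so a misconfigured coder fails early rather than on first use.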
Class Method Details
.decode_entities(*args) ⇒ Object
Legacy compatibility class method allowing direct decoding of XHTML1 entities. See HTMLEntities#decode for description of parameters.
```ruby
# File 'lib/htmlentities/legacy.rb', line 16
def decode_entities(*args)
  xhtml1_entities.decode(*args)
end
```
.encode_entities(*args) ⇒ Object
Legacy compatibility class method allowing direct encoding of XHTML1 entities. See HTMLEntities#encode for description of parameters.
```ruby
# File 'lib/htmlentities/legacy.rb', line 8
def encode_entities(*args)
  xhtml1_entities.encode(*args)
end
```
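Both legacy class methods forward their arguments unchanged to an XHTML1 coder. The sketch below shows that facade pattern with a hypothetical LegacyCoder whose instance methods merely echo their input. It assumes, because this page does not show it, that xhtml1_entities returns a single memoized instance built with the 'xhtml1' flavor.

```ruby
# Hypothetical facade sketch; LegacyCoder is not part of the gem.
class LegacyCoder
  def initialize(flavor = 'xhtml1')
    @flavor = flavor
  end

  # Stand-in instance methods that just report how they were called.
  def encode(source, *instructions)
    "encode(#{source}, #{instructions.inspect}) via #{@flavor}"
  end

  def decode(source)
    "decode(#{source}) via #{@flavor}"
  end

  class << self
    # Class-level entry points forward every argument to the shared coder.
    def encode_entities(*args)
      xhtml1_entities.encode(*args)
    end

    def decode_entities(*args)
      xhtml1_entities.decode(*args)
    end

    private

    # Assumed memoization: one XHTML1 instance, created on first use.
    def xhtml1_entities
      @xhtml1_entities ||= new('xhtml1')
    end
  end
end
```

Memoizing the instance means repeated legacy calls do not allocate a new coder each time, while all argument handling stays in the instance methods.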
Instance Method Details
#decode(source) ⇒ Object
Decode entities in a string into their UTF-8 equivalents. If your string is not already in UTF-8, convert it before calling this method; otherwise the output will mix encodings and come out garbled.
Unknown named entities are not converted; they are left in the output unchanged.
```ruby
# File 'lib/htmlentities.rb', line 37
def decode(source)
  return source.to_s.gsub(named_entity_regexp) {
    (cp = map[$1]) ? [cp].pack('U') : $&
  }.gsub(/&#([0-9]{1,7});|&#x([0-9a-f]{1,6});/i) {
    $1 ? [$1.to_i].pack('U') : [$2.to_i(16)].pack('U')
  }
end
```
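Because the listed decode runs two separate gsub passes (named entities first, numeric ones second), an input such as &amp;#65; is decoded twice: the first pass yields &#65; and the second turns that into A. The sketch below avoids this with a single combined regular expression so each reference is decoded exactly once. SinglePassDecoder, its NAMED table and its name pattern are hypothetical; the gem's real named_entity_regexp and map are not shown on this page. As an extra assumption, it leaves references above U+10FFFF unchanged instead of letting Array#pack raise.

```ruby
# Hypothetical single-pass decoder; not the gem's implementation.
module SinglePassDecoder
  # Tiny illustrative table of entity name => code point.
  NAMED = { 'amp' => 38, 'lt' => 60, 'gt' => 62, 'quot' => 34, 'eacute' => 233 }.freeze

  # One pattern for named (&name;), decimal (&#nnn;) and hex (&#xhh;) forms.
  ENTITY = /&(?:([a-z][a-z0-9]*)|#([0-9]{1,7})|#x([0-9a-f]{1,6}));/i

  def self.decode(source)
    source.to_s.gsub(ENTITY) do
      original = $&
      cp = if $1 then NAMED[$1]
           elsif $2 then $2.to_i
           else $3.to_i(16)
           end
      # Unknown names and out-of-range code points are kept verbatim.
      cp && cp <= 0x10FFFF ? [cp].pack('U') : original
    end
  end
end
```

With this approach SinglePassDecoder.decode('&amp;#65;') returns &#65;, which is usually what an author who escaped the ampersand intended.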
#encode(source, *instructions) ⇒ Object
Encode codepoints into their corresponding entities. Several operations are available and can be combined:
- :basic
  Convert the five XML special characters (' " < > &) to entities
- :named
  Convert non-ASCII characters to their named HTML 4.01 equivalent
- :decimal
  Convert non-ASCII characters to decimal entities (e.g. &#1234;)
- :hexadecimal
  Convert non-ASCII characters to hexadecimal entities (e.g. &#x12ab;)
You can specify the commands in any order, but they will be executed in the order listed above to ensure that entity ampersands are not clobbered and that named entities are replaced before numeric ones.
If no instructions are specified, :basic will be used.
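Why the fixed order matters can be seen with two simplified helpers: escaping the XML characters after numeric encoding re-escapes the ampersand that begins each new entity. OrderDemo and both of its methods are hypothetical simplifications, not the gem's encoders.

```ruby
# Hypothetical helpers illustrating why :basic runs before numeric encoding.
module OrderDemo
  BASIC = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;' }.freeze

  # Escape the five XML special characters.
  def self.basic(s)
    s.gsub(/[&<>"']/) { BASIC[$&] }
  end

  # Replace every non-ASCII character with a decimal reference.
  def self.decimal(s)
    s.gsub(/[^\x00-\x7f]/) { "&##{$&.ord};" }
  end
end

s = 'café & co'
OrderDemo.decimal(OrderDemo.basic(s)) # => "caf&#233; &amp; co"      (correct)
OrderDemo.basic(OrderDemo.decimal(s)) # => "caf&amp;#233; &amp; co"  (entity clobbered)
```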
Examples (the legacy class method takes the same arguments as #encode):
- HTMLEntities.encode_entities(str): XML-safe
- HTMLEntities.encode_entities(str, :basic, :decimal): XML-safe and 7-bit clean
- HTMLEntities.encode_entities(str, :basic, :named, :decimal): 7-bit clean, with all non-ASCII characters replaced with their named entity where possible, and decimal equivalents otherwise
Note: It is the program’s responsibility to ensure that the source contains valid UTF-8 before calling this method.
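One way to meet that precondition with core String methods is to check the bytes as UTF-8 and, if they are invalid, transcode from a known legacy encoding. The helper name to_valid_utf8 and the ISO-8859-1 fallback are assumptions for illustration; substitute whatever encoding your input actually uses, or String#scrub if replacing bad bytes is acceptable.

```ruby
# Hypothetical helper; the Latin-1 fallback is an assumption about the input.
def to_valid_utf8(bytes, fallback: Encoding::ISO_8859_1)
  # Reinterpret the bytes as UTF-8 without converting them.
  s = bytes.dup.force_encoding(Encoding::UTF_8)
  return s if s.valid_encoding?

  # Otherwise treat them as the fallback encoding and transcode to UTF-8.
  bytes.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end
```

ISO-8859-1 defines every byte value, so the fallback conversion cannot fail, but it will silently misread input that was really in another legacy encoding.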
```ruby
# File 'lib/htmlentities.rb', line 70
def encode(source, *instructions)
  string = source.to_s.dup
  if instructions.empty?
    instructions = [:basic]
  elsif (unknown_instructions = instructions - INSTRUCTIONS) != []
    raise InstructionError,
      "unknown encode_entities command(s): #{unknown_instructions.inspect}"
  end

  # Choose how the five XML special characters are encoded.
  basic_entity_encoder =
    if instructions.include?(:basic) || instructions.include?(:named)
      :encode_named
    elsif instructions.include?(:decimal)
      :encode_decimal
    else # only :hexadecimal can remain once the list has been validated
      :encode_hexadecimal
    end
  string.gsub!(basic_entity_regexp){ __send__(basic_entity_encoder, $&) }

  # Encoders for non-ASCII characters: named first, then one numeric form.
  extended_entity_encoders = []
  if instructions.include?(:named)
    extended_entity_encoders << :encode_named
  end
  if instructions.include?(:decimal)
    extended_entity_encoders << :encode_decimal
  elsif instructions.include?(:hexadecimal)
    extended_entity_encoders << :encode_hexadecimal
  end
  unless extended_entity_encoders.empty?
    string.gsub!(extended_entity_regexp){
      encode_extended(extended_entity_encoders, $&)
    }
  end

  return string
end
```
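The examples above can be reproduced with a compact stand-alone sketch that follows the same two-step scheme as the listed encode method: the five XML characters are always escaped first (by name when :basic or :named is requested, otherwise numerically), then non-ASCII characters are offered to the named and numeric encoders in that order. MiniEncoder, its entity_for helper and the seven-entry NAMES table are hypothetical; the gem's private encode_named, encode_extended and regexp helpers are not shown on this page.

```ruby
# Hypothetical re-implementation of the instruction scheme; not the gem's code.
module MiniEncoder
  class InstructionError < ArgumentError; end

  INSTRUCTIONS = [:basic, :named, :decimal, :hexadecimal].freeze
  # Tiny sample of code point => entity name.
  NAMES = { 38 => 'amp', 60 => 'lt', 62 => 'gt', 34 => 'quot', 39 => 'apos',
            233 => 'eacute', 169 => 'copy' }.freeze

  module_function

  def encode(source, *instructions)
    instructions = [:basic] if instructions.empty?
    unknown = instructions - INSTRUCTIONS
    raise InstructionError, "unknown instruction(s): #{unknown.inspect}" unless unknown.empty?

    # Step 1: the XML special characters are always escaped.
    basic = if instructions.include?(:basic) || instructions.include?(:named)
              :named
            elsif instructions.include?(:decimal)
              :decimal
            else
              :hexadecimal
            end
    string = source.to_s.gsub(/[&<>"']/) { |ch| entity_for(ch.ord, [basic]) }

    # Step 2: non-ASCII characters, named form first, then one numeric form.
    extended = []
    extended << :named if instructions.include?(:named)
    if instructions.include?(:decimal)
      extended << :decimal
    elsif instructions.include?(:hexadecimal)
      extended << :hexadecimal
    end
    return string if extended.empty?

    # Characters no encoder can handle (e.g. unnamed with only :named) stay as-is.
    string.gsub(/[^\x00-\x7f]/) { |ch| entity_for(ch.ord, extended) || ch }
  end

  # Returns the entity produced by the first encoder that applies, or nil.
  def entity_for(cp, encoders)
    encoders.each do |enc|
      case enc
      when :named
        name = NAMES[cp]
        return "&#{name};" if name
      when :decimal     then return "&##{cp};"
      when :hexadecimal then return "&#x#{cp.to_s(16)};"
      end
    end
    nil
  end
end
```

For instance, MiniEncoder.encode('café © ü', :basic, :named, :decimal) gives caf&eacute; &copy; &#252;: é and © have names in the sample table, while ü falls through to its decimal form.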