Class: HTMLEntities
- Inherits:
- Object
- Object
- HTMLEntities
- Defined in:
- lib/htmlentities.rb,
lib/htmlentities/decoder.rb,
lib/htmlentities/encoder.rb,
lib/htmlentities/flavors.rb,
lib/htmlentities/version.rb,
lib/htmlentities/mappings/html4.rb,
lib/htmlentities/mappings/xhtml1.rb,
lib/htmlentities/mappings/expanded.rb
Overview
HTML entity encoding and decoding for Ruby
Defined Under Namespace
Modules: VERSION
Classes: Decoder, Encoder
Constant Summary
- UnknownFlavor = Class.new(RuntimeError)
- InstructionError = Class.new(RuntimeError)
- FLAVORS = %w[html4 xhtml1 expanded]
- MAPPINGS = {}
- SKIP_DUP_ENCODINGS = {}
Instance Method Summary
-
#decode(source) ⇒ Object
Decode entities in a string into their UTF-8 equivalents.
-
#encode(source, *instructions) ⇒ Object
Encode codepoints into their corresponding entities.
-
#initialize(flavor = 'xhtml1') ⇒ HTMLEntities
constructor
Create a new HTMLEntities coder for the specified flavor.
Constructor Details
#initialize(flavor = 'xhtml1') ⇒ HTMLEntities
Create a new HTMLEntities coder for the specified flavor. Available flavors are ‘html4’, ‘expanded’ and ‘xhtml1’ (the default).
The only difference in functionality between html4 and xhtml1 is in the handling of the apos (apostrophe) named entity, which is not defined in HTML4.
‘expanded’ includes a large number of additional SGML entities drawn from ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/SGML.TXT, which “maps SGML character entities from various public sets (namely, ISOamsa, ISOamsb, ISOamsc, ISOamsn, ISOamso, ISOamsr, ISObox, ISOcyr1, ISOcyr2, ISOdia, ISOgrk1, ISOgrk2, ISOgrk3, ISOgrk4, ISOlat1, ISOlat2, ISOnum, ISOpub, ISOtech, HTMLspecial, HTMLsymbol) to corresponding Unicode characters.” (sgml.txt)
‘expanded’ is a strict superset of the XHTML entities: every XHTML named entity encodes and decodes the same under :expanded as under :xhtml1.
# File 'lib/htmlentities.rb', line 32
def initialize(flavor='xhtml1')
  @flavor = flavor.to_s.downcase
  raise UnknownFlavor, "Unknown flavor #{flavor}" unless FLAVORS.include?(@flavor)
end
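The flavor handling described above can be illustrated with a small standalone sketch. It is not the gem's code: `ToyCoder`, `TOY_FLAVORS` and `knows?` are hypothetical names, the tables hold only a few entries, and the 'expanded' flavor is left out. It only shows the two points the documentation makes: the flavor argument is normalised with `to_s.downcase`, and `apos` exists under xhtml1 but not under html4.

```ruby
# Hypothetical miniature tables for illustration only; the real mappings
# live in lib/htmlentities/mappings/ and are far larger.
TOY_FLAVORS = {
  'html4'  => { 'quot' => 34, 'amp' => 38 },
  'xhtml1' => { 'quot' => 34, 'amp' => 38, 'apos' => 39 }
}.freeze

class ToyCoder
  def initialize(flavor = 'xhtml1')
    # Accept strings or symbols in any case, as the constructor above does.
    @flavor = flavor.to_s.downcase
    # The gem raises HTMLEntities::UnknownFlavor; this sketch uses ArgumentError.
    raise ArgumentError, "Unknown flavor #{flavor}" unless TOY_FLAVORS.key?(@flavor)
  end

  # True if the named entity is defined for this coder's flavor.
  def knows?(name)
    TOY_FLAVORS.fetch(@flavor).key?(name)
  end
end

ToyCoder.new.knows?('apos')         # => true  (xhtml1 is the default)
ToyCoder.new(:HTML4).knows?('apos') # => false (apos is not an HTML4 entity)
```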
Instance Method Details
#decode(source) ⇒ Object
Decode entities in a string into their UTF-8 equivalents. The string should already be in UTF-8 encoding.
Unknown named entities are left unconverted.
# File 'lib/htmlentities.rb', line 43
def decode(source)
  (@decoder ||= Decoder.new(@flavor)).decode(source)
end
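As a rough illustration of what decoding involves, the sketch below handles decimal, hexadecimal and a handful of named entities using only the standard library. It is a simplified stand-in, not the gem's Decoder: `toy_decode` and `TOY_ENTITIES` are hypothetical names, and the entity table is deliberately tiny. Note how an unknown named entity passes through untouched.

```ruby
# Hypothetical five-entry table; a real flavor defines hundreds of names.
TOY_ENTITIES = {
  'amp' => 38, 'lt' => 60, 'gt' => 62, 'quot' => 34, 'eacute' => 233
}.freeze

def toy_decode(source)
  # Match &#123; (decimal), &#x7b; (hexadecimal) or &name; (named).
  source.gsub(/&(?:#(\d+)|#[xX](\h+)|(\w+));/) do |entity|
    match = Regexp.last_match
    codepoint =
      if match[1]    then match[1].to_i
      elsif match[2] then match[2].to_i(16)
      else TOY_ENTITIES[match[3]]
      end
    # Unknown names and out-of-range numbers are returned as written.
    if codepoint && codepoint <= 0x10FFFF
      [codepoint].pack('U') # codepoint to a UTF-8 string
    else
      entity
    end
  end
end

toy_decode('&lt;&eacute;lan&gt;') # => "<élan>"
toy_decode('&bogus; &amp;')       # => "&bogus; &"
```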
#encode(source, *instructions) ⇒ Object
Encode codepoints into their corresponding entities. Several operations are available, listed here in the order in which they are applied:
- :basic
-
Convert the five XML entities ('"<>&)
- :named
-
Convert non-ASCII characters to their named HTML 4.01 equivalent
- :decimal
-
Convert non-ASCII characters to decimal entities (e.g. &#1234;)
- :hexadecimal
-
Convert non-ASCII characters to hexadecimal entities (e.g. &#x12ab;)
You can specify the commands in any order, but they will be executed in the order listed above to ensure that entity ampersands are not clobbered and that named entities are replaced before numeric ones.
If no instructions are specified, :basic will be used.
Examples:
encode(str) - XML-safe
encode(str, :basic, :decimal) - XML-safe and 7-bit clean
encode(str, :basic, :named, :decimal) - 7-bit clean, with all non-ASCII characters replaced with their named entity where possible, and decimal equivalents otherwise.
Note: It is the program’s responsibility to ensure that the source contains valid UTF-8 before calling this method.
# File 'lib/htmlentities.rb', line 72
def encode(source, *instructions)
  Encoder.new(@flavor, instructions).encode(source)
end
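The fixed execution order is easiest to see in a toy version. The sketch below is an assumption-laden illustration, not the gem's Encoder: `toy_encode`, `TOY_BASIC`, `TOY_NAMED` and `TOY_ORDER` are hypothetical names, the named table has a single entry, and apostrophe handling (which depends on the flavor) is omitted. It shows why :basic must run first: if a numeric pass ran earlier, the ampersand it produces would then be escaped again.

```ruby
TOY_BASIC = { '&' => 'amp', '<' => 'lt', '>' => 'gt', '"' => 'quot' }.freeze
TOY_NAMED = { 'é' => 'eacute' }.freeze # hypothetical one-entry table
TOY_ORDER = %i[basic named decimal hexadecimal].freeze
NON_ASCII = /[^[:ascii:]]/

def toy_encode(source, *instructions)
  instructions = [:basic] if instructions.empty?
  unknown = instructions - TOY_ORDER
  # The gem raises HTMLEntities::InstructionError; this sketch uses ArgumentError.
  raise ArgumentError, "unknown instructions: #{unknown.inspect}" unless unknown.empty?

  # Apply the requested passes in the fixed order, whatever order was given.
  steps = TOY_ORDER.select { |step| instructions.include?(step) }
  steps.reduce(source.dup) do |text, step|
    case step
    when :basic
      text.gsub(/[&<>"]/) { |c| "&#{TOY_BASIC[c]};" }
    when :named
      text.gsub(NON_ASCII) { |c| TOY_NAMED.key?(c) ? "&#{TOY_NAMED[c]};" : c }
    when :decimal
      text.gsub(NON_ASCII) { |c| "&##{c.ord};" }
    when :hexadecimal
      text.gsub(NON_ASCII) { |c| "&#x#{c.ord.to_s(16)};" }
    end
  end
end

toy_encode('<é>')                    # => "&lt;é&gt;"
toy_encode('<é>', :decimal, :basic)  # => "&lt;&#233;&gt;"
toy_encode('éő', :decimal, :named)   # => "&eacute;&#337;"
```

Because the named pass leaves only ASCII behind for the characters it knows, the later decimal pass touches just the characters that had no name.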