Class: HTML::Tokenizer
- Inherits: Object
  - Object
  - HTML::Tokenizer
- Defined in: lib/rails/deprecated_sanitizer/html-scanner/html/tokenizer.rb
Overview
A simple HTML tokenizer. It breaks a stream of text into tokens, where each token is a string representing either “text” or an HTML element.
This currently assumes valid XHTML, which means no free < or > characters.
Usage:
    tokenizer = HTML::Tokenizer.new(text)
    while token = tokenizer.next
      p token
    end
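To illustrate the valid-XHTML assumption noted above, the following sketch uses the same `/<\S/` check that `#next` applies when deciding whether the upcoming token is a tag. The helper name `starts_like_tag?` is hypothetical and not part of the library; it only shows how a free `<` followed by a non-whitespace character would be taken for the start of a tag.

```ruby
require "strscan"

# Hypothetical helper (not part of HTML::Tokenizer): reports whether the
# start of +text+ would pass the /<\S/ check used to detect a tag.
def starts_like_tag?(text)
  !StringScanner.new(text).check(/<\S/).nil?
end

real_tag   = starts_like_tag?("<b>bold</b>") # "<" followed by a name
spaced_lt  = starts_like_tag?("< 2")         # "<" followed by whitespace
free_lt    = starts_like_tag?("<2")          # free "<" that looks like a tag
```

Under this check, a stray `<` is only safe when whitespace follows it, which is why input with free `<` or `>` characters may be tokenized in surprising ways.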
Instance Attribute Summary
- #line ⇒ Object (readonly): The current line number.
- #position ⇒ Object (readonly): The current (byte) position in the text.
Instance Method Summary
- #initialize(text) ⇒ Tokenizer (constructor): Create a new Tokenizer for the given text.
- #next ⇒ Object: Returns the next token in the sequence, or nil if there are no more tokens in the stream.
Constructor Details
#initialize(text) ⇒ Tokenizer
Create a new Tokenizer for the given text.
    # File 'lib/rails/deprecated_sanitizer/html-scanner/html/tokenizer.rb', line 25
    def initialize(text)
      text.encode!
      @scanner = StringScanner.new(text)
      @position = 0
      @line = 0
      @current_line = 1
    end
Instance Attribute Details
#line ⇒ Object (readonly)
The current line number.

    # File 'lib/rails/deprecated_sanitizer/html-scanner/html/tokenizer.rb', line 22
    def line
      @line
    end
#position ⇒ Object (readonly)
The current (byte) position in the text.

    # File 'lib/rails/deprecated_sanitizer/html-scanner/html/tokenizer.rb', line 19
    def position
      @position
    end
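Because the position is a byte offset, it can differ from the character offset when the text contains multi-byte characters. The short sketch below uses a bare `StringScanner` (the class the tokenizer wraps) rather than the tokenizer itself; the sample string is an assumption chosen for illustration.

```ruby
require "strscan"

# "caf\u00e9" is four characters but five bytes in UTF-8.
scanner = StringScanner.new("caf\u00e9<br/>")
scanner.scan(/[^<]+/)          # consume the leading text

byte_offset = scanner.pos      # byte-based, like the value stored in @position
char_offset = scanner.charpos  # character-based, for comparison
```

Callers that use the position to slice the original string may want to work with bytes (for example `String#byteslice`) rather than character indexes.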
Instance Method Details
#next ⇒ Object
Returns the next token in the sequence, or nil if there are no more tokens in the stream.

    # File 'lib/rails/deprecated_sanitizer/html-scanner/html/tokenizer.rb', line 35
    def next
      return nil if @scanner.eos?
      @position = @scanner.pos
      @line = @current_line
      if @scanner.check(/<\S/)
        update_current_line(scan_tag)
      else
        update_current_line(scan_text)
      end
    end
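The flow of `#next` can be tried out with a small stand-in class. This is a minimal sketch, not the library's implementation: `SketchTokenizer` is a hypothetical name, and the bodies standing in for `scan_tag`, `scan_text`, and `update_current_line` are simplified assumptions, since those private methods are not shown on this page.

```ruby
require "strscan"

# Hypothetical stand-in that mirrors the structure of HTML::Tokenizer#next.
class SketchTokenizer
  attr_reader :position, :line

  def initialize(text)
    @scanner = StringScanner.new(text)
    @position = 0
    @line = 0
    @current_line = 1
  end

  def next
    return nil if @scanner.eos?
    @position = @scanner.pos   # byte offset where this token starts
    @line = @current_line      # line on which this token starts
    token =
      if @scanner.check(/<\S/)
        # Assumed simplification: a tag runs through the next ">",
        # or to the end of input if no ">" follows.
        @scanner.scan_until(/>/) || @scanner.scan(/.*/m)
      else
        # Assumed simplification: text is one character plus
        # everything up to the next "<".
        @scanner.getch + (@scanner.scan(/[^<]*/) || "")
      end
    # Assumed simplification: advance the line counter by the newlines
    # contained in the token just returned.
    @current_line += token.count("\n")
    token
  end
end

tokenizer = SketchTokenizer.new("<p>Hi\n<b>x</b></p>")
seen = []
while (token = tokenizer.next)
  seen << [token, tokenizer.line, tokenizer.position]
end
```

In this sketch, `line` and `position` describe where the most recently returned token began, which matches how `#next` records `@line` and `@position` before scanning.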