Class: Oga::XML::PullParser

Inherits:
Parser
  • Object
Defined in:
lib/oga/xml/pull_parser.rb

Overview

The PullParser class can be used to parse an XML document incrementally instead of parsing it as a whole. This results in lower memory usage and potentially faster parsing times. The downside is that pull parsers are typically more difficult to use compared to DOM parsers.

Basic parsing using this class works as follows:

parser = Oga::XML::PullParser.new('... xml here ...')

parser.parse do |node|
  if node.is_a?(Oga::XML::Element)
    # Handle the element here.
  end
end

This parser yields proper XML node instances such as Element. It ignores doctypes and XML declarations.

Constant Summary

DISABLED_CALLBACKS =
[
  :on_document,
  :on_doctype,
  :on_xml_decl,
  :on_element_children
]
BLOCK_CALLBACKS =
[
  :on_cdata,
  :on_comment,
  :on_text,
  :on_proc_ins
]
NODE_SHORTHANDS =

Returns the shorthands that can be used for various node classes.

{
  :text            => XML::Text,
  :node            => XML::Node,
  :cdata           => XML::Cdata,
  :element         => XML::Element,
  :doctype         => XML::Doctype,
  :comment         => XML::Comment,
  :xml_declaration => XML::XmlDeclaration
}

Instance Attribute Summary

Instance Method Summary

Instance Attribute Details

#nesting ⇒ Array (readonly)

Array containing the names of the currently nested elements.
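
As a rough illustration of how such a nesting array evolves, the sketch below replays a hand-written list of open and close events. Each element name is pushed when the element opens and popped when it closes, mirroring what `on_element` and `after_element` do in the source. The `EVENTS` list and the `trace_nesting` helper are hypothetical stand-ins, not part of Oga.

```ruby
# Hypothetical event stream for:
#   <people><person><name>Alice</name></person></people>
EVENTS = [
  [:open,  'people'],
  [:open,  'person'],
  [:open,  'name'],
  [:close, 'name'],
  [:close, 'person'],
  [:close, 'people']
].freeze

# Replays the events and returns a snapshot of the nesting stack taken
# each time an element is opened (after its own name has been pushed).
def trace_nesting(events)
  nesting   = []
  snapshots = []

  events.each do |type, name|
    case type
    when :open
      nesting << name
      snapshots << nesting.dup
    when :close
      nesting.pop
    end
  end

  snapshots
end

trace_nesting(EVENTS).each { |stack| puts stack.inspect }
```

Because the name is pushed before the element is yielded, the nesting seen inside the block already includes the element's own name.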


# File 'lib/oga/xml/pull_parser.rb', line 30

class PullParser < Parser
  attr_reader :node, :nesting

  ##
  # @return [Array]
  #
  DISABLED_CALLBACKS = [
    :on_document,
    :on_doctype,
    :on_xml_decl,
    :on_element_children
  ]

  ##
  # @return [Array]
  #
  BLOCK_CALLBACKS = [
    :on_cdata,
    :on_comment,
    :on_text,
    :on_proc_ins
  ]

  ##
  # Returns the shorthands that can be used for various node classes.
  #
  # @return [Hash]
  #
  NODE_SHORTHANDS = {
    :text            => XML::Text,
    :node            => XML::Node,
    :cdata           => XML::Cdata,
    :element         => XML::Element,
    :doctype         => XML::Doctype,
    :comment         => XML::Comment,
    :xml_declaration => XML::XmlDeclaration
  }

  ##
  # @see Oga::XML::Parser#reset
  #
  def reset
    super

    @block   = nil
    @nesting = []
    @node    = nil
  end

  ##
  # Parses the input and yields every node to the supplied block.
  #
  # @yieldparam [Oga::XML::Node]
  #
  def parse(&block)
    @block = block

    yyparse(self, :yield_next_token)

    reset

    return
  end

  ##
  # Calls the supplied block if the current node type and optionally the
  # nesting match. This method allows you to write this:
  #
  #     parser.parse do |node|
  #       parser.on(:text, %w{people person name}) do
  #         puts node.text
  #       end
  #     end
  #
  # Instead of this:
  #
  #     parser.parse do |node|
  #       if node.is_a?(Oga::XML::Text) and parser.nesting == %w{people person name}
  #         puts node.text
  #       end
  #     end
  #
  # When calling this method you can specify the following node types:
  #
  # * `:cdata`
  # * `:comment`
  # * `:element`
  # * `:text`
  #
  # @example
  #  parser.on(:element, %w{people person name}) do
  #
  #  end
  #
  # @param [Symbol] type The type of node to act upon. This is a symbol as
  #  returned by {Oga::XML::Node#node_type}.
  #
  # @param [Array] nesting The element name nesting to act upon.
  #
  def on(type, nesting = [])
    if node.is_a?(NODE_SHORTHANDS[type])
      if nesting.empty? or nesting == self.nesting
        yield
      end
    end
  end

  # eval is a heck of a lot faster than define_method on both Rubinius and
  # JRuby.
  DISABLED_CALLBACKS.each do |method|
    eval "def #{method}(*args)\nreturn\nend\n", nil, __FILE__, __LINE__ + 1
  end

  BLOCK_CALLBACKS.each do |method|
    eval "def #{method}(*args)\n@node = super\n@block.call(@node)\nreturn\nend\n", nil, __FILE__, __LINE__ + 1
  end

  ##
  # @see Oga::XML::Parser#on_element
  #
  def on_element(*args)
    @node = super

    nesting << @node.name

    @block.call(@node)

    return
  end

  ##
  # @see Oga::XML::Parser#on_element_children
  #
  def after_element(*args)
    nesting.pop

    return
  end
end
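
The two loops near the end of the class generate callback methods from strings. The sketch below shows the same idea in isolation, under some assumptions: `BaseHandler` and `QuietHandler` are hypothetical stand-ins rather than Oga classes, and `class_eval` with a heredoc is used in place of a bare `eval` call.

```ruby
# Stand-in for a parent parser whose callbacks build and return values.
class BaseHandler
  def on_text(value)
    "text:#{value}"
  end

  def on_comment(value)
    "comment:#{value}"
  end

  def on_doctype(value)
    "doctype:#{value}"
  end
end

class QuietHandler < BaseHandler
  IGNORED   = [:on_doctype].freeze
  FORWARDED = [:on_text, :on_comment].freeze

  attr_reader :seen

  def initialize
    @seen = []
  end

  # Ignored callbacks become no-ops that return nil.
  IGNORED.each do |name|
    class_eval <<-RUBY, __FILE__, __LINE__ + 1
      def #{name}(*args)
        return
      end
    RUBY
  end

  # Forwarded callbacks call the parent, record its result, return nil.
  FORWARDED.each do |name|
    class_eval <<-RUBY, __FILE__, __LINE__ + 1
      def #{name}(*args)
        @seen << super
        return
      end
    RUBY
  end
end

handler = QuietHandler.new
handler.on_doctype('html')
handler.on_text('hello')
handler.on_comment('note')
p handler.seen
```

Defining the methods with `def` inside an evaluated string also keeps the bare `super` call legal; Ruby does not allow implicit-argument `super` inside a method created with `define_method`.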

#node ⇒ Oga::XML::Node (readonly)

The current node.



Instance Method Details

#after_element(*args) ⇒ Object

See Also:

  • Oga::XML::Parser#on_element_children

# File 'lib/oga/xml/pull_parser.rb', line 173

def after_element(*args)
  nesting.pop

  return
end

#on(type, nesting = []) ⇒ Object

Calls the supplied block if the current node type and optionally the nesting match. This method allows you to write this:

parser.parse do |node|
  parser.on(:text, %w{people person name}) do
    puts node.text
  end
end

Instead of this:

parser.parse do |node|
  if node.is_a?(Oga::XML::Text) and parser.nesting == %w{people person name}
    puts node.text
  end
end

When calling this method you can specify the following node types:

  • :cdata
  • :comment
  • :element
  • :text

Examples:

parser.on(:element, %w{people person name}) do

end

# File 'lib/oga/xml/pull_parser.rb', line 129

def on(type, nesting = [])
  if node.is_a?(NODE_SHORTHANDS[type])
    if nesting.empty? or nesting == self.nesting
      yield
    end
  end
end
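
The matching rule can be tried out without a parser. The sketch below reimplements the same check against stand-in classes: `TextNode`, `ElementNode`, and `MiniMatcher` are hypothetical names used only for illustration, while the body of `on` follows the logic of the source above.

```ruby
# Stand-in node classes; Oga's real classes live under Oga::XML.
TextNode    = Struct.new(:text)
ElementNode = Struct.new(:name)

class MiniMatcher
  SHORTHANDS = { :text => TextNode, :element => ElementNode }.freeze

  attr_accessor :node, :nesting

  def initialize
    @node    = nil
    @nesting = []
  end

  # Yields only when the current node is of the requested type and, if a
  # nesting was given, the current nesting equals it exactly.
  def on(type, nesting = [])
    return unless node.is_a?(SHORTHANDS[type])

    yield if nesting.empty? || nesting == self.nesting
  end
end

matcher         = MiniMatcher.new
matcher.node    = TextNode.new('Alice')
matcher.nesting = %w{people person name}

matcher.on(:text) { puts 'any text node' }
matcher.on(:text, %w{people person name}) { puts 'exact nesting' }
matcher.on(:text, %w{person name}) { puts 'not printed: partial path' }
matcher.on(:element) { puts 'not printed: wrong node type' }
```

Two details follow from this logic: the nesting is compared with `==`, so a partial path such as `%w{person name}` does not match, and a type that is missing from the shorthand hash makes `is_a?` receive `nil`, which raises a `TypeError`.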

#on_element(*args) ⇒ Object

See Also:

  • Oga::XML::Parser#on_element

# File 'lib/oga/xml/pull_parser.rb', line 160

def on_element(*args)
  @node = super

  nesting << @node.name

  @block.call(@node)

  return
end

#parse {|node| ... } ⇒ Object

Parses the input and yields every node to the supplied block.

Yield Parameters:

  • (Oga::XML::Node)

# File 'lib/oga/xml/pull_parser.rb', line 84

def parse(&block)
  @block = block

  yyparse(self, :yield_next_token)

  reset

  return
end
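
One consequence of this method is that the parser state is only meaningful inside the block: `parse` calls `reset` once the input is consumed and then returns `nil`. The sketch below mimics that lifecycle with a hypothetical `MiniPull` class that reads a pre-tokenized event list instead of real XML; it is an illustration under those assumptions, not Oga's implementation.

```ruby
class MiniPull
  attr_reader :node, :nesting

  # `events` is a hypothetical pre-tokenized input, for example
  # [[:open, 'a'], [:text, 'hi'], [:close, 'a']].
  def initialize(events)
    @events = events
    reset
  end

  def reset
    @block   = nil
    @nesting = []
    @node    = nil
  end

  # Yields every node to the block, then clears all state.
  def parse(&block)
    @block = block

    @events.each do |type, value|
      case type
      when :open
        @nesting << value
        emit(value)
      when :text
        emit(value)
      when :close
        @nesting.pop
      end
    end

    reset

    return
  end

  private

  # Stores the current node and hands it to the caller's block.
  def emit(value)
    @node = value
    @block.call(@node)
  end
end

parser = MiniPull.new([[:open, 'a'], [:text, 'hi'], [:close, 'a']])
seen   = []
result = parser.parse { |node| seen << [node, parser.nesting.dup] }

p seen           # [["a", ["a"]], ["hi", ["a"]]]
p result         # nil
p parser.nesting # []
```

Anything needed after parsing, such as the nesting at a given node, has to be copied out inside the block, as the `dup` call does here.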

#reset ⇒ Object

See Also:

  • Oga::XML::Parser#reset

# File 'lib/oga/xml/pull_parser.rb', line 71

def reset
  super

  @block   = nil
  @nesting = []
  @node    = nil
end