Class: Undress::Grammar

Inherits:
Object
Defined in:
lib/undress/grammar.rb

Overview

Grammars give you a DSL to declare how to convert an HTML document into a different markup language.

Direct Known Subclasses

Textile

Constructor Details

#initialize ⇒ Grammar

:nodoc:



# File 'lib/undress/grammar.rb', line 79

def initialize #:nodoc:
  @pre_processing_rules = self.class.pre_processing_rules.dup
  @post_processing_rules = self.class.post_processing_rules.dup
end
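Because the constructor dups the class-level tables, rules added to one grammar instance do not leak back into the class. The sketch below uses a hypothetical RuleHolder class (not part of Undress) that copies only this pattern, reduced to post-processing rules. Note that Hash#dup is shallow, so the Regexp keys and handler objects are still shared.

```ruby
# Hypothetical stand-in mirroring Grammar#initialize, post-processing rules only.
class RuleHolder
  # Class-level rule table, lazily created (same idiom as the library).
  def self.post_processing_rules
    @post_processing_rules ||= {}
  end

  attr_reader :post_processing_rules

  def initialize
    # Shallow copy: the instance gets its own Hash, while the Regexp keys
    # and the replacement values are still the same objects as the class's.
    @post_processing_rules = self.class.post_processing_rules.dup
  end
end

RuleHolder.post_processing_rules[/\n\n+/] = "\n\n"

grammar = RuleHolder.new
grammar.post_processing_rules[/ {2,}/] = " " # added to this instance only

puts RuleHolder.post_processing_rules.size # prints 1
puts grammar.post_processing_rules.size    # prints 2
```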

Dynamic Method Handling

This class handles dynamic methods through the method_missing method.

#method_missing(tag, node, *args) ⇒ Object

:nodoc:



# File 'lib/undress/grammar.rb', line 123

def method_missing(tag, node, *args) #:nodoc:
  process(node.children)
end
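The fallback works because process calls send with the tag name, so any tag that has no rule falls through to method_missing, which drops the tag and recurses into its children. A minimal sketch of that dispatch, with hypothetical FakeText and FakeElem classes standing in for Hpricot nodes:

```ruby
# Hypothetical stand-ins for Hpricot text and element nodes.
class FakeText
  def initialize(str)
    @str = str
  end

  def text?; true; end
  def elem?; false; end
  def to_html; @str; end
end

class FakeElem
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end

  def text?; false; end
  def elem?; true; end
end

class FallbackGrammar
  # Same dispatch as Grammar#process: text is copied, elements call a rule.
  def process(nodes)
    Array(nodes).map do |node|
      if node.text?
        node.to_html
      elsif node.elem?
        send(node.name.to_sym, node)
      else
        ""
      end
    end.join("")
  end

  # No method is named after the tag, so the call lands here:
  # the tag itself is dropped and its children are processed instead.
  def method_missing(tag, node, *args)
    process(node.children)
  end
end

html = [FakeElem.new("span", [FakeText.new("Hello, "),
                              FakeElem.new("em", [FakeText.new("world")])])]
puts FallbackGrammar.new.process(html) # prints Hello, world
```

One caveat worth checking against the real library: send also reaches private methods, so a tag whose name matches an existing Kernel method (p is one) would call that method rather than method_missing unless a rule is defined for it.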

Instance Attribute Details

#post_processing_rules ⇒ Object (readonly)

:nodoc:



# File 'lib/undress/grammar.rb', line 77

def post_processing_rules
  @post_processing_rules
end

#pre_processing_rules ⇒ Object (readonly)

:nodoc:



# File 'lib/undress/grammar.rb', line 76

def pre_processing_rules
  @pre_processing_rules
end

Class Method Details

.default(&handler) ⇒ Object

Set a default rule for unrecognized tags.

Unless you define one, the grammar ignores unrecognized tags and outputs only their contents.



# File 'lib/undress/grammar.rb', line 30

def self.default(&handler) # :yields: element
  define_method :method_missing do |tag, node, *args|
    handler.call(node)
  end
end
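default redefines method_missing with define_method, so its block replaces the children-only fallback. Because the block is invoked with handler.call rather than being turned into a method, it runs with the self of the class body where it was written, not the grammar instance. The sketch below, with hypothetical MiniGrammar, LoudGrammar and FakeElem classes, shows both effects.

```ruby
# Hypothetical stand-in for an element node.
class FakeElem
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

class MiniGrammar
  # Same body as Grammar.default: replace method_missing with the block.
  def self.default(&handler)
    define_method(:method_missing) do |tag, node, *args|
      handler.call(node)
    end
  end

  # Simplified built-in fallback used before any default is declared.
  def method_missing(tag, node, *args)
    "(contents of #{node.name})"
  end
end

class LoudGrammar < MiniGrammar
  # `self` inside this block is LoudGrammar (the class), not an instance.
  default { |node| "[#{self}: unknown <#{node.name}>]" }
end

puts MiniGrammar.new.blink(FakeElem.new("blink")) # prints (contents of blink)
puts LoudGrammar.new.blink(FakeElem.new("blink")) # prints [LoudGrammar: unknown <blink>]
```

In practice this means a default block cannot call instance helpers such as content_of directly, unlike a rule_for block. Running the handler with instance_exec(node, &handler) inside the define_method body would give it the instance as self; that is a suggested alternative, not what the library does.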

.inherited(base) ⇒ Object

:nodoc:



# File 'lib/undress/grammar.rb', line 5

def self.inherited(base) # :nodoc:
  base.instance_variable_set(:@post_processing_rules, post_processing_rules)
  base.instance_variable_set(:@pre_processing_rules, pre_processing_rules)
end
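As written, the inherited hook hands each subclass the parent's Hash objects themselves rather than copies, so a rule declared in a subclass body also lands in the parent's table and in every other subclass's. The plain-Ruby sketch below reproduces only that hook for post-processing rules; BaseGrammar and ChildGrammar are hypothetical names.

```ruby
class BaseGrammar
  # Same hook as Grammar.inherited, reduced to post-processing rules.
  def self.inherited(base)
    base.instance_variable_set(:@post_processing_rules, post_processing_rules)
  end

  def self.post_processing_rules
    @post_processing_rules ||= {}
  end

  def self.post_processing(regexp, replacement = nil, &handler)
    post_processing_rules[regexp] = replacement || handler
  end
end

BaseGrammar.post_processing(/\r\n/, "\n")

# `inherited` fires at `class ChildGrammar < BaseGrammar`, before the body runs.
class ChildGrammar < BaseGrammar
  post_processing(/\t/, "  ")
end

puts ChildGrammar.post_processing_rules.equal?(BaseGrammar.post_processing_rules) # prints true
puts BaseGrammar.post_processing_rules.size # prints 2
```

Per-instance copies are still made in initialize, so this sharing affects only the class-level tables. If per-subclass isolation is wanted, passing post_processing_rules.dup and pre_processing_rules.dup to instance_variable_set would be one way to get it; that is a suggestion, not the library's behavior.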

.post_processing(regexp, replacement = nil, &handler) ⇒ Object

Add a post-processing rule to your parser.

This takes a regular expression that is applied to the output after all nodes have been processed. It accepts either a replacement string or a block, which is applied with String#gsub! (a block receives each matched string).

post_processing(/\n\n+/, "\n\n") # compress more than two newlines
post_processing(/whatever/) { ... }


# File 'lib/undress/grammar.rb', line 44

def self.post_processing(regexp, replacement = nil, &handler) #:yields: matched_string
  post_processing_rules[regexp] = replacement || handler
end
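Post-processing rules end up in a Hash keyed by Regexp, and process! applies them in insertion order with gsub!, passing either the replacement string or the block. This standalone sketch mirrors that loop on a plain string; the two rules are made-up examples.

```ruby
# A rule table shaped like the one post_processing builds:
# Regexp => replacement String, or Regexp => block.
rules = {}
rules[/\n\n+/] = "\n\n"                          # collapse runs of blank lines
rules[/TODO/]  = proc { |match| match.downcase } # block form receives the match

text = "first\n\n\n\nsecond TODO".dup # dup so gsub! works even with frozen literals

# Same loop shape as Grammar#process!: rules run in insertion order.
rules.each do |rule, handler|
  handler.is_a?(String) ? text.gsub!(rule, handler) : text.gsub!(rule, &handler)
end

p text # prints "first\n\nsecond todo"
```

Since a string replacement goes straight to gsub!, backreferences such as "\\1" work in it. If both a string and a block are given, replacement || handler keeps the string.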

.post_processing_rules ⇒ Object

:nodoc:



# File 'lib/undress/grammar.rb', line 64

def self.post_processing_rules #:nodoc:
  @post_processing_rules ||= {}
end

.pre_processing(selector, &handler) ⇒ Object

Add a pre-processing rule to your parser.

This lets you mutate the DOM before any rule defined with rule_for is applied. You pass a CSS or XPath selector and a block; the block is called with each Hpricot element that matches the selector.

pre_processing "ul.toc" do |element|
  element.swap("<p>[[toc]]</p>")
end

This swaps every unordered list that has the class toc for a paragraph containing the code [[toc]].



# File 'lib/undress/grammar.rb', line 60

def self.pre_processing(selector, &handler) # :yields: element
  pre_processing_rules[selector] = handler
end
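Each pre-processing rule is run by process!, which calls search(selector) on the document and passes every match to the block. The sketch below replays the ul.toc example against hypothetical FakeDoc and FakeElem classes; unlike Hpricot, their search understands only tag.class selectors and swap merely records the new HTML.

```ruby
# Hypothetical stand-in element: `swap` just stores the replacement HTML.
class FakeElem
  attr_reader :name, :css_class, :html

  def initialize(name, css_class, html)
    @name = name
    @css_class = css_class
    @html = html
  end

  def swap(new_html)
    @html = new_html
  end
end

# Hypothetical stand-in document: `search` handles only "tag.class" selectors.
class FakeDoc
  attr_reader :elements

  def initialize(elements)
    @elements = elements
  end

  def search(selector)
    tag, css_class = selector.split(".", 2)
    elements.select { |e| e.name == tag && e.css_class == css_class }
  end
end

# Same shape as the table pre_processing builds: selector => block.
pre_processing_rules = {
  "ul.toc" => proc { |element| element.swap("<p>[[toc]]</p>") }
}

doc = FakeDoc.new([
  FakeElem.new("ul", "toc", "<ul class=\"toc\">...</ul>"),
  FakeElem.new("ul", "nav", "<ul class=\"nav\">...</ul>")
])

# Same loop as the first half of Grammar#process!.
pre_processing_rules.each do |selector, handler|
  doc.search(selector).each(&handler)
end

doc.elements.each { |e| puts e.html }
```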

.pre_processing_rules ⇒ Object

:nodoc:



# File 'lib/undress/grammar.rb', line 68

def self.pre_processing_rules #:nodoc:
  @pre_processing_rules ||= {}
end

.process!(node) ⇒ Object

:nodoc:



# File 'lib/undress/grammar.rb', line 72

def self.process!(node) #:nodoc:
  new.process!(node)
end

.rule_for(*tags, &handler) ⇒ Object

Add a parsing rule for a group of HTML tags.

rule_for :p do |element|
  "<this was a paragraph>#{content_of(element)}</this was a paragraph>"
end

will replace your <p> tags with <this was a paragraph> tags, without altering the contents.

The element yielded to the block is an Hpricot element for the given tag.



# File 'lib/undress/grammar.rb', line 20

def self.rule_for(*tags, &handler) # :yields: element
  tags.each do |tag|
    define_method tag.to_sym, &handler
  end
end
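rule_for turns the block into an instance method named after each tag, which is why content_of is available inside it. A reduced sketch, with hypothetical FakeText, FakeElem, MiniGrammar and Emphasis classes standing in for Hpricot and the real Grammar:

```ruby
# Hypothetical stand-ins for Hpricot text and element nodes.
class FakeText
  def initialize(str)
    @str = str
  end

  def text?; true; end
  def elem?; false; end
  def to_html; @str; end
end

class FakeElem
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end

  def text?; false; end
  def elem?; true; end
end

class MiniGrammar
  # Same body as Grammar.rule_for: one instance method per tag name.
  def self.rule_for(*tags, &handler)
    tags.each do |tag|
      define_method tag.to_sym, &handler
    end
  end

  def process(nodes)
    Array(nodes).map do |node|
      if node.text?
        node.to_html
      elsif node.elem?
        send(node.name.to_sym, node)
      else
        ""
      end
    end.join("")
  end

  def content_of(node)
    process(node.respond_to?(:children) ? node.children : node)
  end
end

class Emphasis < MiniGrammar
  # define_method runs the block as an instance method, so content_of is in scope.
  rule_for(:em, :i) { |element| "_#{content_of(element)}_" }
end

nodes = [FakeElem.new("em", [FakeText.new("one")]),
         FakeText.new(" and "),
         FakeElem.new("i", [FakeText.new("two")])]
puts Emphasis.new.process(nodes) # prints _one_ and _two_
```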

Instance Method Details

#content_of(node) ⇒ Object

Get the result of parsing the contents of a node.



# File 'lib/undress/grammar.rb', line 112

def content_of(node)
  process(node.respond_to?(:children) ? node.children : node)
end

#process(nodes) ⇒ Object

Process one or more DOM nodes, converting them to your markup language according to your defined rules, and concatenate the results. A text node is returned as its string representation, an element is passed to the rule defined for its tag, and any other kind of node (such as a comment) produces an empty string.



# File 'lib/undress/grammar.rb', line 87

def process(nodes)
  Array(nodes).map do |node|
    if node.text?
      node.to_html
    elsif node.elem?
      send node.name.to_sym, node
    else
      ""
    end
  end.join("")
end
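The three branches of process can be seen with hypothetical stand-in nodes: text passes through, an element is sent to the method named after its tag, and anything else, such as a comment, is dropped. The strong rule here is hand-written for illustration rather than declared with rule_for.

```ruby
# Hypothetical stand-ins for three Hpricot node kinds.
class FakeText
  def initialize(str); @str = str; end
  def text?; true; end
  def elem?; false; end
  def to_html; @str; end
end

class FakeComment
  def text?; false; end
  def elem?; false; end
end

class FakeElem
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end

  def text?; false; end
  def elem?; true; end
end

class MiniGrammar
  # Same dispatch as Grammar#process.
  def process(nodes)
    Array(nodes).map do |node|
      if node.text?
        node.to_html                # text: copied through unchanged
      elsif node.elem?
        send node.name.to_sym, node # element: call the rule named after the tag
      else
        ""                          # anything else (e.g. a comment): dropped
      end
    end.join("")
  end

  # Example rule for <strong>, written by hand for this sketch.
  def strong(node)
    "*#{process(node.children)}*"
  end
end

nodes = [FakeText.new("Be "), FakeElem.new("strong", [FakeText.new("bold")]),
         FakeComment.new, FakeText.new("!")]
puts MiniGrammar.new.process(nodes) # prints Be *bold*!
```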

#process!(node) ⇒ Object

:nodoc:



# File 'lib/undress/grammar.rb', line 99

def process!(node) #:nodoc:
  pre_processing_rules.each do |selector, handler|
    node.search(selector).each(&handler)
  end

  process(node.children).tap do |text|
    post_processing_rules.each do |rule, handler|
      handler.is_a?(String) ?  text.gsub!(rule, handler) : text.gsub!(rule, &handler)
    end
  end
end
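process! runs three stages in a fixed order: pre-processing rules mutate the tree, the children are converted, and post-processing rules clean up the resulting string in place. The class method .process! simply wraps this in a fresh instance. The sketch below wires up the same order with hypothetical stand-in classes; the tag-only search, the writable name, and both rule tables are assumptions made for the example.

```ruby
# Hypothetical stand-ins; `search` matches tag names only, and `name` is
# writable so a pre-processing rule has something to mutate.
class FakeText
  def initialize(str); @str = str; end
  def text?; true; end
  def elem?; false; end
  def to_html; @str; end
end

class FakeElem
  attr_accessor :name
  attr_reader :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end

  def text?; false; end
  def elem?; true; end

  # Depth-first list of descendant elements whose tag equals `selector`.
  def search(selector)
    children.select(&:elem?).flat_map do |child|
      (child.name == selector ? [child] : []) + child.search(selector)
    end
  end
end

class MiniGrammar
  PRE  = { "b" => proc { |el| el.name = "strong" } } # rename <b> to <strong>
  POST = { / {2,}/ => " " }                          # squeeze repeated spaces

  def self.process!(node)
    new.process!(node)
  end

  def process!(node)
    # 1. Mutate the tree.
    PRE.each { |selector, handler| node.search(selector).each(&handler) }
    # 2. Convert the children, then 3. clean up the string in place.
    process(node.children).tap do |text|
      POST.each do |rule, handler|
        handler.is_a?(String) ? text.gsub!(rule, handler) : text.gsub!(rule, &handler)
      end
    end
  end

  # Simplified dispatch: text is copied, elements call a rule.
  def process(nodes)
    Array(nodes).map { |n| n.text? ? n.to_html : send(n.name.to_sym, n) }.join("")
  end

  def strong(node)
    "*#{process(node.children)}*"
  end
end

root = FakeElem.new("root", [FakeText.new("Very   "), FakeElem.new("b", [FakeText.new("loud")])])
puts MiniGrammar.process!(root) # prints Very *loud*
```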

#surrounded_by_whitespace?(node) ⇒ Boolean

Helper method that tells you if the given DOM node is immediately surrounded by whitespace.

Returns:

  • (Boolean)


# File 'lib/undress/grammar.rb', line 118

def surrounded_by_whitespace?(node)
  (node.previous.text? && node.previous.to_s =~ /\s+$/) ||
    (node.next.text? && node.next.to_s =~ /^\s+/)
end
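As written, the helper calls text? on node.previous and node.next directly, so if either can be nil (for example for a first or last child) it would raise NoMethodError. Also, because $ and ^ match at line boundaries, a previous sibling such as "a \nb" counts as ending in whitespace, and the result is a match position rather than strictly true or false. The sketch below is a hedged variant, not the library's code, using safe navigation and \z / \A anchors on hypothetical FakeText and FakeElem nodes.

```ruby
# Hypothetical stand-ins: a text node, and an element that knows its siblings.
class FakeText
  def initialize(str); @str = str; end
  def text?; true; end
  def to_s; @str; end
end

class FakeElem
  attr_reader :previous, :next

  def initialize(previous, following)
    @previous = previous
    @next = following
  end
end

# Nil-safe variant (an assumption, not Undress code): a missing sibling counts
# as "no whitespace", and \z / \A anchor to the ends of the sibling's text
# instead of to line boundaries. Always returns true or false.
def surrounded_by_whitespace?(node)
  before = node.previous
  after = node.next
  (before&.text? && before.to_s.match?(/\s\z/)) ||
    (after&.text? && after.to_s.match?(/\A\s/)) ||
    false
end

puts surrounded_by_whitespace?(FakeElem.new(FakeText.new("see "), nil))  # prints true
puts surrounded_by_whitespace?(FakeElem.new(nil, FakeText.new(" more"))) # prints true
puts surrounded_by_whitespace?(FakeElem.new(FakeText.new("a \nb"), FakeText.new("c"))) # prints false
```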