Class: Undress::Grammar

Inherits: Object

Defined in: lib/undress/grammar.rb

Overview

Grammars give you a DSL to declare how to convert an HTML document into a different markup language.

Direct Known Subclasses

Textile

Instance Attribute Summary

#post_processing_rules ⇒ Object readonly

:nodoc:.
#pre_processing_rules ⇒ Object readonly

:nodoc:.

Class Method Summary

.default(&handler) ⇒ Object

Set a default rule for unrecognized tags.
.inherited(base) ⇒ Object

:nodoc:.
.post_processing(regexp, replacement = nil, &handler) ⇒ Object

Add a post-processing rule to your parser.
.post_processing_rules ⇒ Object

:nodoc:.
.pre_processing(selector, &handler) ⇒ Object

Add a pre-processing rule to your parser.
.pre_processing_rules ⇒ Object

:nodoc:.
.process!(node) ⇒ Object

:nodoc:.
.rule_for(*tags, &handler) ⇒ Object

Add a parsing rule for a group of html tags.

Instance Method Summary

#complete_word?(node) ⇒ Boolean

Helper to determine whether a node contains a whole word; useful, for example, when converting a single italic letter inside a word.
#content_of(node) ⇒ Object

Get the result of parsing the contents of a node.
#initialize ⇒ Grammar constructor

:nodoc:.
#method_missing(tag, node, *args) ⇒ Object

:nodoc:.
#process(nodes) ⇒ Object

Process a DOM node, converting it to your markup language according to your defined rules.
#process!(node) ⇒ Object

:nodoc:.
#surrounded_by_whitespace?(node) ⇒ Boolean

Helper method that tells you if the given DOM node is immediately preceded or followed by whitespace.

Constructor Details

#initialize ⇒ `Grammar`

:nodoc:

# File 'lib/undress/grammar.rb', line 79

def initialize #:nodoc:
  @pre_processing_rules = self.class.pre_processing_rules.dup
  @post_processing_rules = self.class.post_processing_rules.dup
end

Dynamic Method Handling

This class handles dynamic methods through the method_missing method

#method_missing(tag, node, *args) ⇒ `Object`

:nodoc:

# File 'lib/undress/grammar.rb', line 142

def method_missing(tag, node, *args) #:nodoc:
  process(node.children)
end

Instance Attribute Details

#post_processing_rules ⇒ `Object` (readonly)

:nodoc:

# File 'lib/undress/grammar.rb', line 77

def post_processing_rules
  @post_processing_rules
end

#pre_processing_rules ⇒ `Object` (readonly)

:nodoc:

# File 'lib/undress/grammar.rb', line 76

def pre_processing_rules
  @pre_processing_rules
end

Class Method Details

.default(&handler) ⇒ `Object`

Set a default rule for unrecognized tags.

Unless you define a default rule, unrecognized tags are dropped and only their contents are output.

# File 'lib/undress/grammar.rb', line 30

def self.default(&handler) # :yields: element
  define_method :method_missing do |tag, node, *args|
    handler.call(node)
  end
end
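
As a rough illustration of the mechanism in the listing above, the following sketch re-creates default on a small stand-in class. LoudGrammar and UnknownNode are hypothetical names used only for this example; they are not part of Undress, and the node is a plain Struct rather than an Hpricot element.

```ruby
# Hypothetical stand-in for an Hpricot element: a tag name plus its text.
UnknownNode = Struct.new(:name, :text)

class LoudGrammar
  # Same mechanism as the listing above: the handler replaces method_missing,
  # so any tag without its own rule ends up here.
  def self.default(&handler)
    define_method(:method_missing) do |tag, node, *args|
      handler.call(node)
    end
  end

  # Mark unrecognized tags instead of silently dropping them.
  default { |node| "[unsupported: #{node.name}]#{node.text}" }
end

# No rule exists for <blink>, so the call is routed to the default handler.
output = LoudGrammar.new.blink(UnknownNode.new("blink", "hi"))
```

Because the handler is invoked with handler.call, the block keeps the self it had where it was written (the class body in this sketch), so instance helpers are not reachable by bare name inside it.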

.inherited(base) ⇒ `Object`

:nodoc:

# File 'lib/undress/grammar.rb', line 5

def self.inherited(base) # :nodoc:
  base.instance_variable_set(:@post_processing_rules, post_processing_rules)
  base.instance_variable_set(:@pre_processing_rules, pre_processing_rules)
end

.post_processing(regexp, replacement = nil, &handler) ⇒ `Object`

Add a post-processing rule to your parser.

This takes a regular expression that is applied to the output after all nodes have been processed. The replacement can be a string, or a block that will be passed to String#gsub.

post_processing(/\n\n+/, "\n\n") # compress more than two newlines
post_processing(/whatever/) { ... }

# File 'lib/undress/grammar.rb', line 44

def self.post_processing(regexp, replacement = nil, &handler) #:yields: matched_string
  post_processing_rules[regexp] = replacement || handler
end
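
The rule table built by post_processing maps each regexp to either a String or a block. A minimal sketch of how such a table can be applied, modelled on the gsub! loop in #process!, is shown below. The rules hash, the trailing-whitespace rule, and apply_post_processing are hypothetical and exist only for this example.

```ruby
# Stand-in for the rule table: regexp => String replacement or block.
rules = {}
rules[/\n\n+/] = "\n\n"                     # string replacement
rules[/[ \t]+$/] = proc { |_matched| "" }   # block form, receives the matched string

# Apply every rule in insertion order to a copy of the text.
def apply_post_processing(text, rules)
  rules.each_with_object(text.dup) do |(regexp, handler), out|
    if handler.is_a?(String)
      out.gsub!(regexp, handler)
    else
      out.gsub!(regexp, &handler)
    end
  end
end

result = apply_post_processing("one  \n\n\n\ntwo", rules)
# result == "one\n\ntwo"
```

Rules run in the order they were added, so a later rule sees the output of every earlier one.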

.post_processing_rules ⇒ `Object`

:nodoc:

# File 'lib/undress/grammar.rb', line 64

def self.post_processing_rules #:nodoc:
  @post_processing_rules ||= {}
end

.pre_processing(selector, &handler) ⇒ `Object`

Add a pre-processing rule to your parser.

This lets you mutate the DOM before any rule defined with rule_for is applied. You need to pass a CSS/XPath selector and a block that receives each matching Hpricot element.

pre_processing "ul.toc" do |element|
  element.swap("<p>[[toc]]</p>")
end

This would replace any unordered list with the class toc with a paragraph containing the code [[toc]].

# File 'lib/undress/grammar.rb', line 60

def self.pre_processing(selector, &handler) # :yields: element
  pre_processing_rules[selector] = handler
end

.pre_processing_rules ⇒ `Object`

:nodoc:

# File 'lib/undress/grammar.rb', line 68

def self.pre_processing_rules #:nodoc:
  @pre_processing_rules ||= {}
end

.process!(node) ⇒ `Object`

:nodoc:

# File 'lib/undress/grammar.rb', line 72

def self.process!(node) #:nodoc:
  new.process!(node)
end

.rule_for(*tags, &handler) ⇒ `Object`

Add a parsing rule for a group of html tags.

rule_for :p do |element|
  "<this was a paragraph>#{content_of(element)}</this was a paragraph>"
end

This will replace your <p> tags with <this was a paragraph> tags, without altering the contents.

The element yielded to the block is an Hpricot element for the given tag.

# File 'lib/undress/grammar.rb', line 20

def self.rule_for(*tags, &handler) # :yields: element
  tags.each do |tag|
    define_method tag.to_sym, &handler
  end
end
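
To show what the define_method call above gives you, here is a small sketch that reproduces rule_for on a stand-in base class. MiniGrammar, MiniTextile, and FakeElement are hypothetical; the real grammar works on Hpricot elements, and its content_of recurses into child nodes instead of reading a text field.

```ruby
# Hypothetical stand-in for an Hpricot element: a tag name plus rendered text.
FakeElement = Struct.new(:name, :text)

class MiniGrammar
  # Same mechanism as the listing above: one instance method per tag.
  def self.rule_for(*tags, &handler)
    tags.each { |tag| define_method(tag.to_sym, &handler) }
  end

  # Simplified content_of: the stand-in element carries its text directly.
  def content_of(element)
    element.text
  end
end

class MiniTextile < MiniGrammar
  # One block can serve several tags.
  rule_for(:em, :i) { |element| "_#{content_of(element)}_" }
  rule_for(:strong) { |element| "*#{content_of(element)}*" }
end

grammar  = MiniTextile.new
emphasis = grammar.i(FakeElement.new("i", "hello"))
strong   = grammar.strong(FakeElement.new("strong", "world"))
```

Since each block becomes the body of an instance method, instance helpers such as content_of can be called directly inside a rule.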

Instance Method Details

#complete_word?(node) ⇒ `Boolean`

Helper to determine whether a node contains a whole word; useful, for example, when converting a single italic letter inside a word.

Returns:

(Boolean)

# File 'lib/undress/grammar.rb', line 125

def complete_word?(node)
  return true if ! node.previous_node || ! node.next_node
  
  p, n = node.previous_node, node.next_node

  if p.respond_to?(:content)
    return false if p.content       !~ /\s$/
  elsif p.respond_to?(:inner_html)
    return false if p.inner_html    !~ /\s$/
  elsif n.respond_to?(:content)
    return false if n.content       !~ /^\s/
  elsif n.respond_to?(:inner_html)
    return false if n.inner_html    !~ /^\s/
  end
  true
end

#content_of(node) ⇒ `Object`

Get the result of parsing the contents of a node.

# File 'lib/undress/grammar.rb', line 112

def content_of(node)
  process(node.respond_to?(:children) ? node.children : node)
end

#process(nodes) ⇒ `Object`

Process a DOM node, converting it to your markup language according to your defined rules. If the node is a Text node, it will return its string representation. Otherwise it will call the rule defined for it.

# File 'lib/undress/grammar.rb', line 87

def process(nodes)
  Array(nodes).map do |node|
    if node.text?
      node.to_html
    elsif node.elem? 
      send node.name.to_sym, node if ! defined?(ALLOWED_TAGS) || ALLOWED_TAGS.empty? || ALLOWED_TAGS.include?(node.name)
    else
      ""
    end
  end.join("")
end
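
The dispatch in the listing above can be tried without Hpricot by using a few stand-in node classes. In this sketch SketchText, SketchElem, SketchComment, and SketchGrammar are all hypothetical, the ALLOWED_TAGS filter is left out, and the rule for strong is written as a plain method rather than through rule_for.

```ruby
# Hypothetical stand-ins for Hpricot nodes; only the methods that
# process relies on (text?, elem?, name, children, to_html) are modelled.
class SketchText
  attr_reader :to_html
  def initialize(html)
    @to_html = html
  end
  def text?; true; end
  def elem?; false; end
end

class SketchElem
  attr_reader :name, :children
  def initialize(name, children)
    @name = name
    @children = children
  end
  def text?; false; end
  def elem?; true; end
end

class SketchComment
  def text?; false; end
  def elem?; false; end
end

class SketchGrammar
  # Text nodes are emitted as-is, elements are dispatched by tag name,
  # anything else (such as a comment) contributes an empty string.
  def process(nodes)
    Array(nodes).map do |node|
      if node.text?
        node.to_html
      elsif node.elem?
        send(node.name.to_sym, node)
      else
        ""
      end
    end.join
  end

  # A rule for <strong>.
  def strong(node)
    "*#{process(node.children)}*"
  end

  # Fallback for tags without a rule: keep the contents, drop the tag.
  def method_missing(tag, node, *args)
    process(node.children)
  end
end

tree = [
  SketchText.new("Hello "),
  SketchElem.new("strong", [SketchText.new("big")]),
  SketchComment.new,
  SketchElem.new("span", [SketchText.new(" world")])
]
rendered = SketchGrammar.new.process(tree)
# rendered == "Hello *big* world"
```

Because dispatch goes through send, a tag name that collides with an existing private method would reach that method instead of method_missing unless a rule is defined for it.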

#process!(node) ⇒ `Object`

:nodoc:

# File 'lib/undress/grammar.rb', line 99

def process!(node) #:nodoc:
  pre_processing_rules.each do |selector, handler|
    node.search(selector).each(&handler)
  end

  process(node.children).tap do |text|
    post_processing_rules.each do |rule, handler|
      handler.is_a?(String) ?  text.gsub!(rule, handler) : text.gsub!(rule, &handler)
    end
  end
end

#surrounded_by_whitespace?(node) ⇒ `Boolean`

Helper method that tells you if the given DOM node is immediately preceded or followed by whitespace.

Returns:

(Boolean)

# File 'lib/undress/grammar.rb', line 118

def surrounded_by_whitespace?(node)
  (node.previous && node.previous.text? && node.previous.to_s =~ /\s+$/) ||
    (node.next && node.next.text? && node.next.to_s =~ /^\s+/)
end
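
The check above can be exercised with stand-in sibling nodes. In this sketch SketchSibling is a hypothetical class that models only previous, next, text?, and to_s; the method body is copied from the listing.

```ruby
# Hypothetical stand-in for a DOM node with sibling links.
class SketchSibling
  attr_accessor :previous, :next

  def initialize(source, is_text)
    @source = source
    @is_text = is_text
  end

  def text?
    @is_text
  end

  def to_s
    @source
  end
end

# Copied from the listing above.
def surrounded_by_whitespace?(node)
  (node.previous && node.previous.text? && node.previous.to_s =~ /\s+$/) ||
    (node.next && node.next.text? && node.next.to_s =~ /^\s+/)
end

# "see <em>x</em>y": whitespace before the element, none after it.
spaced = SketchSibling.new("<em>x</em>", false)
spaced.previous = SketchSibling.new("see ", true)
spaced.next = SketchSibling.new("y", true)

# "a<em>x</em>b": no whitespace on either side.
tight = SketchSibling.new("<em>x</em>", false)
tight.previous = SketchSibling.new("a", true)
tight.next = SketchSibling.new("b", true)

spaced_result = surrounded_by_whitespace?(spaced)
tight_result = surrounded_by_whitespace?(tight)
```

The two sides are combined with ||, so whitespace on one side is enough. The return value is a match index or nil rather than a strict true or false, so treat it as truthy or falsy.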
