Class: Mechanize::PluggableParser

Inherits: Object

Defined in: lib/mechanize/pluggable_parsers.rb

Overview

Mechanize allows different parsers for different content types. Mechanize uses PluggableParser to determine which parser to use for any content type. To use your own parser or to change the default parsers, register them with this class through Mechanize#pluggable_parser.

The default parser for unregistered content types is Mechanize::File.

The module Mechanize::Parser provides basic functionality for any content type, so you may use it in custom parsers you write. For small files you wish to perform in-memory operations on, you should subclass Mechanize::File. For large files you should subclass Mechanize::Download as the content is only loaded into memory in small chunks.

When writing your own pluggable parser, be sure to provide a method #body that returns a String containing the response body for compatibility with Mechanize#get_file.
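As a minimal sketch of that requirement, the class below does not inherit from Mechanize::File and so supplies #body itself. The name PlainTextParser and the #lines helper are hypothetical, and the sketch assumes the body arrives as a String, as it does in the CSVParser example on this page.

```ruby
# Hypothetical duck-typed parser: it does not subclass Mechanize::File,
# so it has to provide #body on its own.
class PlainTextParser
  attr_reader :uri, :response, :body, :code

  # Same four optional arguments as the CSVParser example.
  def initialize(uri = nil, response = nil, body = nil, code = nil)
    @uri      = uri
    @response = response
    @body     = body.to_s # always a String, even when no body is passed
    @code     = code
  end

  # Illustrative extra: the body split into lines without trailing newlines.
  def lines
    body.lines.map(&:chomp)
  end
end

parser = PlainTextParser.new(nil, nil, "alpha\nbeta\n", '200')
parser.body   # => "alpha\nbeta\n"
parser.lines  # => ["alpha", "beta"]
```

Subclassing Mechanize::File remains the simpler route when its in-memory behaviour is acceptable, since the superclass constructor is then reused, as the CSVParser example does.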

Example

To create your own parser, define a class whose constructor takes four parameters (uri, response, body and code). Here is an example of registering a parser that handles CSV files:

require 'csv'

class CSVParser < Mechanize::File
  attr_reader :csv

  def initialize uri = nil, response = nil, body = nil, code = nil
    super uri, response, body, code
    @csv = CSV.parse body
  end
end

agent = Mechanize.new
agent.pluggable_parser.csv = CSVParser
agent.get('http://example.com/test.csv')  # => CSVParser

Now any response with a content type of ‘text/csv’ will initialize a CSVParser and return that object to the caller.

To register a parser for a content type that Mechanize does not know about, use the hash syntax:

agent.pluggable_parser['text/something'] = SomeClass

To set the default parser, use #default:

agent.pluggable_parser.default = Mechanize::Download

Now all unknown content types will be saved to disk and not loaded into memory.

Constant Summary

CONTENT_TYPES =

{
  :html  => 'text/html',
  :wap   => 'application/vnd.wap.xhtml+xml',
  :xhtml => 'application/xhtml+xml',
  :pdf   => 'application/pdf',
  :csv   => 'text/csv',
  :xml   => ['text/xml', 'application/xml'],
}

InvalidContentTypeError =

if defined?(MIME::Type::InvalidContentType)
  # For mime-types >=2.1
  MIME::Type::InvalidContentType
else
  # For mime-types <2.1
  MIME::InvalidContentType
end

Instance Attribute Summary

#default ⇒ Object

Returns the value of attribute default.

Instance Method Summary

#[](content_type) ⇒ Object

Retrieves the parser for content_type content.

#[]=(content_type, klass) ⇒ Object

Sets the parser for content_type content to klass.

#csv=(klass) ⇒ Object

Registers klass as the parser for text/csv content.

#html=(klass) ⇒ Object

Registers klass as the parser for text/html and application/xhtml+xml content.

#initialize ⇒ PluggableParser (constructor)

A new instance of PluggableParser.

#parser(content_type) ⇒ Object

Returns the parser registered for the given content_type, falling back to #default.

#pdf=(klass) ⇒ Object

Registers klass as the parser for application/pdf content.

#register_parser(content_type, klass) ⇒ Object

Stores klass as the parser for content_type (marked :nodoc: in the source).

#xhtml=(klass) ⇒ Object

Registers klass as the parser for application/xhtml+xml content.

#xml=(klass) ⇒ Object

Registers klass as the parser for text/xml and application/xml content.

Constructor Details

#initialize ⇒ `PluggableParser`

Returns a new instance of PluggableParser.

# File 'lib/mechanize/pluggable_parsers.rb', line 83

def initialize
  @parsers = {
    CONTENT_TYPES[:html]  => Mechanize::Page,
    CONTENT_TYPES[:xhtml] => Mechanize::Page,
    CONTENT_TYPES[:wap]   => Mechanize::Page,
    'image'               => Mechanize::Image,
    'text/xml'            => Mechanize::XmlFile,
    'application/xml'     => Mechanize::XmlFile,
  }

  @default = Mechanize::File
end

Instance Attribute Details

#default ⇒ `Object`

Returns the value of attribute default.



# File 'lib/mechanize/pluggable_parsers.rb', line 81

def default
  @default
end

Instance Method Details

#[](content_type) ⇒ `Object`

Retrieves the parser for content_type content



# File 'lib/mechanize/pluggable_parsers.rb', line 164

def [](content_type)
  @parsers[content_type]
end

#[]=(content_type, klass) ⇒ `Object`

Sets the parser for content_type content to klass

The content_type may be a full MIME type, a simplified MIME type (‘text/x-csv’ simplifies to ‘text/csv’), or a media type like ‘image’.



# File 'lib/mechanize/pluggable_parsers.rb', line 174

def []= content_type, klass
  register_parser content_type, klass
end

#csv=(klass) ⇒ `Object`

Registers klass as the parser for text/csv content



# File 'lib/mechanize/pluggable_parsers.rb', line 148

def csv=(klass)
  register_parser(CONTENT_TYPES[:csv], klass)
end

#html=(klass) ⇒ `Object`

Registers klass as the parser for text/html and application/xhtml+xml content

# File 'lib/mechanize/pluggable_parsers.rb', line 126

def html=(klass)
  register_parser(CONTENT_TYPES[:html], klass)
  register_parser(CONTENT_TYPES[:xhtml], klass)
end

#parser(content_type) ⇒ `Object`

Returns the parser registered for the given content_type

# File 'lib/mechanize/pluggable_parsers.rb', line 99

def parser content_type
  return default unless content_type

  parser = @parsers[content_type]

  return parser if parser

  mime_type = MIME::Type.new content_type

  parser = @parsers[mime_type.to_s] ||
           @parsers[mime_type.simplified] ||
           # Starting from mime-types 3.0 x-prefix is deprecated as per IANA
           (@parsers[MIME::Type.simplified(mime_type.to_s, remove_x_prefix: true)] rescue nil) ||
           @parsers[mime_type.media_type] ||
           default
rescue InvalidContentTypeError
  default
end
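The fallback order in #parser can be easier to follow in a stripped-down model. The sketch below is a simplified stand-in, not the library's implementation: ParserRegistry is a hypothetical class, plain string handling replaces the mime-types gem (so edge cases may differ), and symbols stand in for parser classes.

```ruby
# Simplified model of the lookup chain in #parser.
# Assumption: string splitting approximates what MIME::Type does.
class ParserRegistry
  attr_accessor :default

  def initialize(default)
    @parsers = {}
    @default = default
  end

  def []=(content_type, klass)
    @parsers[content_type] = klass
  end

  def parser(content_type)
    return default unless content_type

    # 1. Exact key, which also covers bare media types such as 'image'.
    exact = @parsers[content_type]
    return exact if exact

    # Drop parameters such as "; charset=utf-8" and normalize case.
    simplified = content_type.split(';').first.to_s.strip.downcase
    media_type, sub_type = simplified.split('/', 2)

    # Malformed type: the real method rescues InvalidContentTypeError here.
    return default if media_type.to_s.empty? || sub_type.to_s.empty?

    @parsers[simplified] ||                                     # 2. simplified type
      @parsers["#{media_type}/#{sub_type.sub(/\Ax-/, '')}"] ||  # 3. x- prefix removed
      @parsers[media_type] ||                                   # 4. media type only
      default                                                   # 5. fallback
  end
end

registry = ParserRegistry.new(:file)
registry['text/csv'] = :csv_parser
registry['image']    = :image_parser

registry.parser('text/csv; charset=utf-8') # => :csv_parser
registry.parser('text/x-csv')              # => :csv_parser
registry.parser('image/png')               # => :image_parser
registry.parser('application/zip')         # => :file
registry.parser('garbage')                 # => :file
```

Because the media-type key is checked last before the default, a parser registered for a specific type such as image/png would take precedence over one registered for image.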

#pdf=(klass) ⇒ `Object`

Registers klass as the parser for application/pdf content



# File 'lib/mechanize/pluggable_parsers.rb', line 141

def pdf=(klass)
  register_parser(CONTENT_TYPES[:pdf], klass)
end

#register_parser(content_type, klass) ⇒ `Object`

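Stores klass as the parser for content_type. The method is marked :nodoc: in the source; #[]= delegates to it.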



# File 'lib/mechanize/pluggable_parsers.rb', line 118

def register_parser content_type, klass # :nodoc:
  @parsers[content_type] = klass
end

#xhtml=(klass) ⇒ `Object`

Registers klass as the parser for application/xhtml+xml content



# File 'lib/mechanize/pluggable_parsers.rb', line 134

def xhtml=(klass)
  register_parser(CONTENT_TYPES[:xhtml], klass)
end

#xml=(klass) ⇒ `Object`

Registers klass as the parser for text/xml and application/xml content

# File 'lib/mechanize/pluggable_parsers.rb', line 155

def xml=(klass)
  CONTENT_TYPES[:xml].each do |content_type|
    register_parser content_type, klass
  end
end
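The loop in #xml= exists because the :xml entry of CONTENT_TYPES is an array while the other entries are single strings. A rough sketch of handling both shapes uniformly follows; register_for is a hypothetical helper that is not part of Mechanize, SHORTCUT_TYPES is a local copy of the table from this page, and symbols stand in for parser classes.

```ruby
# Local copy of the CONTENT_TYPES table shown on this page.
SHORTCUT_TYPES = {
  html:  'text/html',
  wap:   'application/vnd.wap.xhtml+xml',
  xhtml: 'application/xhtml+xml',
  pdf:   'application/pdf',
  csv:   'text/csv',
  xml:   ['text/xml', 'application/xml'],
}.freeze

# Hypothetical helper: registers klass under every content type listed for
# key. Array() wraps a single String and leaves an Array unchanged.
def register_for(parsers, key, klass)
  Array(SHORTCUT_TYPES.fetch(key)).each { |type| parsers[type] = klass }
  parsers
end

parsers = {}
register_for(parsers, :xml, :xml_parser)
register_for(parsers, :csv, :csv_parser)
parsers.keys # => ["text/xml", "application/xml", "text/csv"]
```

Note that this helper would not reproduce #html=, which registers klass for both the :html and :xhtml entries.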
