Class: MicroformatParser::Extractor

Inherits:
Object
Defined in:
lib/uformatparser.rb

Overview

Implements an extractor using a simple expression format.

For more information see MicroformatParser.extractor.

Constant Summary

REGEX =

Parse each extraction rule into three parts:
  • $1 function name (excluding the parentheses)
  • $2 element name
  • $3 attribute name (including the leading @)
If a match is found, the result is either $1, or $2 and/or $3.

/^(\w+)\(\)|([A-Za-z][A-Za-z0-9_\-:]*)?(@[A-Za-z][A-Za-z0-9_\-:]*)?$/
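The following standalone Ruby snippet is an illustrative sketch: it copies the REGEX pattern and runs it against a few sample rules to show which group each part lands in. The helper parse_rule is hypothetical and not part of uformatparser. The last call shows that, because ^ applies only to the first alternative, an invalid rule such as "foo bar" still matches its trailing word.

```ruby
# Same pattern as the REGEX constant, copied so the snippet runs on its own.
REGEX = /^(\w+)\(\)|([A-Za-z][A-Za-z0-9_\-:]*)?(@[A-Za-z][A-Za-z0-9_\-:]*)?$/

# Hypothetical helper: returns the three capture groups for one rule,
# or nil if the pattern does not match at all.
def parse_rule(rule)
  m = REGEX.match(rule)
  m && [m[1], m[2], m[3]]
end

p parse_rule("text()")   # => ["text", nil, nil]      function
p parse_rule("a@href")   # => [nil, "a", "@href"]     element + attribute
p parse_rule("abbr")     # => [nil, "abbr", nil]      element only
p parse_rule("@title")   # => [nil, nil, "@title"]    attribute only
p parse_rule("foo bar")  # => [nil, "bar", nil]       only the trailing word matches
```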

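Instance Method Summary

  • #initialize(context, statement) ⇒ Extractor (constructor)
  • #extract(node) ⇒ Object: Extracts a value from the node based on the extractor expression.
  • #inspect ⇒ Object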

Constructor Details

#initialize(context, statement) ⇒ Extractor

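Compiles statement, one or more extraction rules separated by |, into a list of callables. A function rule (name()) is looked up as a method on context. Raises InvalidExtractorException if a rule is invalid, if a named function is not defined on context, or if the statement is empty.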



# File 'lib/uformatparser.rb', line 612

def initialize(context, statement)
    # Work on a stripped copy rather than calling strip!, which would
    # mutate the caller's string (and fail if the string is frozen).
    statement = statement.strip
    @extracts = []
    # Break the statement into multiple extraction rules, separated by |.
    statement.split('|').each do |extract|
        extract = extract.strip
        parts = REGEX.match(extract)
        # In REGEX only the first alternative is anchored at the start and
        # only the second at the end, so reject partial matches such as
        # "bar" in "foo bar" or "foo()" in "foo()bar".
        unless parts && parts.pre_match.empty? && parts.post_match.empty?
            raise InvalidExtractorException, "Invalid extraction statement: #{extract}"
        end
        if parts[1]
            # Function. Find a method in the context object (the rule class),
            # and report an error if it is not found.
            begin
                @extracts << context.method(parts[1])
            rescue NameError => error
                raise InvalidExtractorException, error.message, error.backtrace
            end
        elsif parts[2] && parts[3]
            # Apply only if element of this type, and extract the named attribute.
            attr_name = parts[3][1..-1]
            @extracts << proc { |node| node.attributes[attr_name] if node.name == parts[2] }
        elsif parts[2]
            # Apply only if element of this type, and extract the text value.
            @extracts << proc { |node| text(node) if node.name == parts[2] }
        elsif parts[3]
            # Extract the named attribute.
            attr_name = parts[3][1..-1]
            @extracts << proc { |node| node.attributes[attr_name] }
        else
            raise InvalidExtractorException, "Invalid extraction statement"
        end
    end
    raise InvalidExtractorException, "Invalid (empty) extraction statement" if @extracts.empty?
end
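The constructor compiles each rule into a callable once, so extraction later only has to invoke them. A minimal, self-contained sketch of that technique follows. It assumes a hypothetical Node struct in place of a real XML element, a hypothetical DemoRules context class, and a RULE pattern that anchors the whole alternation with \A and \z; ArgumentError stands in for InvalidExtractorException. None of these names come from uformatparser.

```ruby
# Hypothetical stand-in for an XML element (not part of uformatparser):
# an element name, an attribute hash and the element's text content.
Node = Struct.new(:name, :attributes, :text)

# Variant of REGEX with the whole alternation anchored by \A...\z, so a
# rule only matches if it is consumed in full (an assumption of this sketch).
RULE = /\A(?:(\w+)\(\)|([A-Za-z][A-Za-z0-9_\-:]*)?(@[A-Za-z][A-Za-z0-9_\-:]*)?)\z/

# Hypothetical rule class: its methods can be named as "name()" rules.
class DemoRules
  def value(node)
    node.attributes["value"]
  end
end

# Compiles "rule|rule|..." into callables, mirroring the branches of
# Extractor#initialize.
def compile(context, statement)
  rules = statement.strip.split('|').map do |rule|
    parts = RULE.match(rule.strip)
    raise ArgumentError, "invalid rule: #{rule.inspect}" unless parts
    element = parts[2]
    attr_name = parts[3] && parts[3][1..-1]   # drop the leading @
    if parts[1]
      context.method(parts[1])                # raises NameError if undefined
    elsif element && attr_name
      ->(node) { node.attributes[attr_name] if node.name == element }
    elsif element
      ->(node) { node.text if node.name == element }
    elsif attr_name
      ->(node) { node.attributes[attr_name] }
    else
      raise ArgumentError, "invalid rule: #{rule.inspect}"
    end
  end
  raise ArgumentError, "empty statement" if rules.empty?
  rules
end

rules = compile(DemoRules.new, "a@href | @title | value()")
link = Node.new("a", { "href" => "http://example.com/" }, "home")
p rules.map { |rule| rule.call(link) }   # => ["http://example.com/", nil, nil]
```

Each string rule becomes a lambda that closes over its own element and attribute names, while a function rule is simply the Method object returned by context.method, so every compiled rule is invoked the same way, with #call.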

Instance Method Details

#extract(node) ⇒ Object

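Extracts a value from the node based on the extractor expression: applies each extraction rule to node in turn and returns the first result that is neither nil nor false.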



# File 'lib/uformatparser.rb', line 646

def extract(node)
    # Iterate over all extraction rules, returning the first value.
    value = nil
    @extracts.each do |extract|
        value = extract.call(node)
        break if value
    end
    value
end
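The fallback order is easiest to see in isolation. This sketch reproduces the loop of #extract with hypothetical lambda rules and plain hashes as nodes; extract_first and the hash keys are illustrative names only, not part of uformatparser.

```ruby
# Hypothetical compiled rules: any object responding to #call(node) works.
# Nodes are plain hashes here purely for illustration.
rules = [
  ->(node) { node[:href] },
  ->(node) { node[:title] },
  ->(node) { node[:text] }
]

# Same loop as Extractor#extract: try each rule in order and stop at the
# first result that is neither nil nor false.
def extract_first(rules, node)
  value = nil
  rules.each do |rule|
    value = rule.call(node)
    break if value
  end
  value
end

p extract_first(rules, { title: "Tip", text: "x" })  # => "Tip"
p extract_first(rules, {})                            # => nil
p extract_first(rules, { href: "", title: "Tip" })   # => ""
```

Since Ruby treats an empty string as truthy, an empty attribute such as href="" ends the search; a rule that should defer to the next one has to return nil.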

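#inspect ⇒ Object

Returns the compiled extraction rules (Method and Proc objects) converted to strings and joined with |.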



# File 'lib/uformatparser.rb', line 656

def inspect
    @extracts.join('|')
end