Class: Scrubyt::PostProcessor
- Inherits:
- Object
- Object
- Scrubyt::PostProcessor
- Defined in:
- lib/scrubyt/output/post_processor.rb
Overview
The sole purpose of this class is to execute the post-processing tasks and checks listed below.
Class Method Summary
- .apply_post_processing(root_pattern) ⇒ Object
This is a convenience method that calls all the post-processing functionality and checks.
- .ensure_presence_of_pattern_full(pattern) ⇒ Object
Apply the ensure_presence_of_pattern constraint on the full extractor.
- .remove_multiple_filter_duplicates(pattern) ⇒ Object
Remove unneeded results of a pattern that arise when multiple filters are evaluated. In the B&N scenario, for example, the book titles are extracted twice for every pattern (both examples generate the same XPath for them), but only one of the two results ever has a price, so the other is discarded.
- .report_if_no_results(root_pattern) ⇒ Object
Issue an error report if the extractor did not extract anything from the document.
Class Method Details
.apply_post_processing(root_pattern) ⇒ Object
This is a convenience method that calls all the post-processing functionality and checks.
# File 'lib/scrubyt/output/post_processor.rb', line 18
def self.apply_post_processing(root_pattern)
  ensure_presence_of_pattern_full(root_pattern)
  remove_multiple_filter_duplicates(root_pattern) if root_pattern.children[0].filters.size > 1
  report_if_no_results(root_pattern) if root_pattern.evaluation_context.extractor.get_mode != :production
end
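The two guards in this method decide which steps actually run. The following is a minimal sketch of that decision only, using hypothetical stand-in structs (StubChild, StubRoot) rather than real scRUBYt! pattern objects, and flattening the evaluation_context.extractor.get_mode chain into a single mode field for brevity.

```ruby
# Hypothetical stand-ins; these are NOT scRUBYt! classes.
StubChild = Struct.new(:filters)
StubRoot  = Struct.new(:children, :mode)

# Mirrors the guards in apply_post_processing and returns the names of
# the steps that would run, instead of running them.
def post_processing_steps(root)
  steps = [:ensure_presence_of_pattern_full] # always runs
  # Duplicates can only appear when the first child has several filters.
  steps << :remove_multiple_filter_duplicates if root.children[0].filters.size > 1
  # The "no results" report is skipped in production mode.
  steps << :report_if_no_results if root.mode != :production
  steps
end

one_filter  = StubRoot.new([StubChild.new([:f1])], :production)
two_filters = StubRoot.new([StubChild.new([:f1, :f2])], :test)

p post_processing_steps(one_filter)
p post_processing_steps(two_filters)
```

Note that only the first child's filter count is inspected, so in this sketch (as in the listing above) a root whose later children have multiple filters would not trigger the duplicate removal.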
.ensure_presence_of_pattern_full(pattern) ⇒ Object
Apply the ensure_presence_of_pattern constraint on the full extractor, that is, to the given pattern and recursively to all of its descendants.
# File 'lib/scrubyt/output/post_processor.rb', line 27
def self.ensure_presence_of_pattern_full(pattern)
  ensure_presence_of_pattern(pattern)
  pattern.children.each {|child| ensure_presence_of_pattern_full(child)}
end
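The method above is a pre-order walk over the pattern tree: handle the current pattern, then recurse into each child. A small sketch of the same traversal shape, with a hypothetical SketchNode struct standing in for a pattern and a recorded name standing in for the ensure_presence_of_pattern call:

```ruby
# Hypothetical stand-in for a pattern; NOT a scRUBYt! class.
SketchNode = Struct.new(:name, :children)

# Visit the node first, then each child in order (pre-order traversal).
def visit_full(node, visited = [])
  visited << node.name # stands in for ensure_presence_of_pattern(node)
  node.children.each { |child| visit_full(child, visited) }
  visited
end

tree = SketchNode.new(:root, [
  SketchNode.new(:book, [
    SketchNode.new(:title, []),
    SketchNode.new(:price, [])
  ]),
  SketchNode.new(:footer, [])
])

p visit_full(tree) # => [:root, :book, :title, :price, :footer]
```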
.remove_multiple_filter_duplicates(pattern) ⇒ Object
Remove unneeded results of a pattern that arise when multiple filters are evaluated. In the B&N scenario, for example, the book titles are extracted twice for every pattern (both examples generate the same XPath for them), but only one of the two results ever has a price, so the other is discarded.
# File 'lib/scrubyt/output/post_processor.rb', line 37
def self.remove_multiple_filter_duplicates(pattern)
  remove_multiple_filter_duplicates_intern(pattern) if pattern.parent_of_leaf
  pattern.children.each {|child| remove_multiple_filter_duplicates(child)}
end
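The helper remove_multiple_filter_duplicates_intern is not shown on this page, so its actual algorithm is unknown here. Purely as an illustration of the idea in the description (duplicate titles, of which only one carries a price), the sketch below uses a hypothetical Candidate struct and a hypothetical drop_priceless_duplicates method. The fallback of keeping the first candidate when none has a price is an assumption of this sketch, not documented behavior.

```ruby
# Hypothetical record for one extracted result; NOT a scRUBYt! class.
Candidate = Struct.new(:title, :price)

# For each title, keep the candidates that have a price. If none of them
# has one, keep the first candidate (an assumption made for this sketch).
def drop_priceless_duplicates(candidates)
  candidates.group_by(&:title).flat_map do |_title, group|
    complete = group.reject { |c| c.price.nil? }
    complete.empty? ? group.first(1) : complete
  end
end

extracted = [
  Candidate.new("Ruby Basics", nil),
  Candidate.new("Ruby Basics", "$30"),
  Candidate.new("Rails Guide", "$25"),
  Candidate.new("Rails Guide", nil)
]

drop_priceless_duplicates(extracted).each do |c|
  puts "#{c.title}: #{c.price}"
end
```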
.report_if_no_results(root_pattern) ⇒ Object
Issue an error report if the extractor did not extract anything from the document. This probably happens because the structure of the page changed or because of some rather nasty bug. In any case, something is going wrong, and we need to inform the user about it.
# File 'lib/scrubyt/output/post_processor.rb', line 47
def self.report_if_no_results(root_pattern)
  results_found = false
  root_pattern.children.each {|child| return if (child.result.childmap.size > 0)}
  Scrubyt.log :WARNING, [
    "The extractor did not find any result instances. Most probably this is wrong.",
    "Check your extractor and if you are sure it should work, report a bug!"
  ]
end
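The check above returns early as soon as any child of the root pattern has at least one result, and only logs the warning when none does. A minimal sketch of that logic, assuming a hypothetical CountedChild struct whose result_count stands in for child.result.childmap.size, and returning the messages instead of calling Scrubyt.log:

```ruby
# Hypothetical stand-in for a child pattern; NOT a scRUBYt! class.
# result_count plays the role of child.result.childmap.size.
CountedChild = Struct.new(:result_count)

# Returns nil when at least one child has results, otherwise the
# warning lines that would be logged.
def no_results_warning(children)
  return nil if children.any? { |child| child.result_count > 0 }
  [
    "The extractor did not find any result instances. Most probably this is wrong.",
    "Check your extractor and if you are sure it should work, report a bug!"
  ]
end

p no_results_warning([CountedChild.new(0), CountedChild.new(3)]) # => nil
puts no_results_warning([CountedChild.new(0), CountedChild.new(0)])
```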