Class: Scrubyt::Constraint
- Inherits: Object
- Defined in: lib/scrubyt/core/scraping/constraint.rb
Overview
Rejecting result instances based on further rules
The two most trivial problems with a set of rules are that they match either fewer or more instances than we would like them to. Constraints are a way to remedy the second problem: they filter out some result instances based on rules. A typical example:
- ensure_presence_of_ancestor_pattern: consider this model:
  <book>
    <author>...</author>
    <title>...</title>
  </book>
  If I attach the ensure_presence_of_ancestor_pattern to the pattern ‘book’ with values ‘author’ and ‘title’, only those books that have an author and a title will be matched (i.e. the child patterns author and title must extract something). This is a way to say ‘a book MUST have an author and a title’.
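The filtering idea described above can be illustrated without the library itself. The following is only a sketch under stated assumptions: the `books` array, its hash layout, and the `required` list are hypothetical stand-ins for result instances and child patterns, not scrubyt's own data structures.

```ruby
# Hypothetical stand-in data: each hash represents one 'book' result instance
# together with whatever its child patterns managed to extract.
books = [
  { 'author' => 'Jane Doe', 'title' => 'Ruby Basics' },
  { 'author' => 'John Roe', 'title' => nil },         # title extracted nothing
  { 'author' => nil,        'title' => 'Anonymous' }  # author extracted nothing
]

# Names of the child patterns that must have extracted something.
required = %w[author title]

# Keep a book only if every required child pattern produced non-empty content.
kept = books.select do |book|
  required.all? { |name| !book[name].to_s.strip.empty? }
end

kept.each { |book| puts "#{book['title']} by #{book['author']}" }
```

Only the first book survives here, which mirrors the statement that a book MUST have an author and a title.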
Constant Summary
Different constraint types:
- CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_PATTERN = 0
- CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_ATTRIBUTE = 1
- CONSTRAINT_TYPE_ENSURE_ABSENCE_OF_ATTRIBUTE = 2
- CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_ANCESTOR_NODE = 3
- CONSTRAINT_TYPE_ENSURE_ABSENCE_OF_ANCESTOR_NODE = 4
Instance Attribute Summary
-
#target ⇒ Object
readonly
Returns the value of attribute target.
-
#type ⇒ Object
readonly
Returns the value of attribute type.
Class Method Summary
-
.add_ensure_absence_of_ancestor_node(node_name, attributes) ⇒ Object
If this type of constraint is added to a pattern, the HTML node extracted by the pattern must NOT have an HTML ancestor node called ‘node_name’ with the attribute set ‘attributes’.
-
.add_ensure_absence_of_attribute(attribute_hash) ⇒ Object
If this type of constraint is added to a pattern, the HTML node it targets must NOT have an attribute named “attribute_name” with the value “attribute_value”.
-
.add_ensure_presence_of_ancestor_node(node_name, attributes) ⇒ Object
If this type of constraint is added to a pattern, the HTML node extracted by the pattern must have an HTML ancestor node called ‘node_name’ with the attribute set ‘attributes’.
-
.add_ensure_presence_of_attribute(attribute_hash) ⇒ Object
If this type of constraint is added to a pattern, the HTML node it targets must have an attribute named “attribute_name” with the value “attribute_value”.
-
.add_ensure_presence_of_pattern(ancestor) ⇒ Object
If this type of constraint is added to a pattern, it must have an ancestor pattern (child pattern, or child pattern of a child pattern, etc.) denoted by “ancestor”. ‘Has an ancestor pattern’ means that the ancestor pattern actually extracts something (just by looking at the wrapper model, the ancestor pattern is always present). Note that there is no ‘ensure_absence’ version of this type of constraint, since I could not think of a use case for that.
Instance Method Summary
-
#check(result) ⇒ Object
Evaluate the constraint; if this function returns true, it means that the constraint passed, i.e. its filter will be added to the extracted content of the pattern.
Instance Attribute Details
#target ⇒ Object (readonly)
Returns the value of attribute target.
# File 'lib/scrubyt/core/scraping/constraint.rb', line 46
def target
  @target
end
#type ⇒ Object (readonly)
Returns the value of attribute type.
# File 'lib/scrubyt/core/scraping/constraint.rb', line 46
def type
  @type
end
Class Method Details
.add_ensure_absence_of_ancestor_node(node_name, attributes) ⇒ Object
If this type of constraint is added to a pattern, the HTML node extracted by the pattern must NOT have an HTML ancestor node called ‘node_name’ with the attribute set ‘attributes’.
“attributes” is an array of hashes, for example
[{... => 'red'}, {... => 'www.google.com'}]
If more than one value has to be checked for the same key (e.g. 'class' => 'small' and 'class' => 'wide'), it has to be written as [{'class' => ['small', 'wide']}].
“attributes” can be empty; in this case just the ‘node_name’ is checked.
# File 'lib/scrubyt/core/scraping/constraint.rb', line 89
def self.add_ensure_absence_of_ancestor_node(node_name, attributes)
  Constraint.new([node_name, attributes], CONSTRAINT_TYPE_ENSURE_ABSENCE_OF_ANCESTOR_NODE)
end
.add_ensure_absence_of_attribute(attribute_hash) ⇒ Object
If this type of constraint is added to a pattern, the HTML node it targets must NOT have an attribute named “attribute_name” with the value “attribute_value”
# File 'lib/scrubyt/core/scraping/constraint.rb', line 64
def self.add_ensure_absence_of_attribute(attribute_hash)
  Constraint.new(attribute_hash, CONSTRAINT_TYPE_ENSURE_ABSENCE_OF_ATTRIBUTE)
end
.add_ensure_presence_of_ancestor_node(node_name, attributes) ⇒ Object
If this type of constraint is added to a pattern, the HTML node extracted by the pattern must have an HTML ancestor node called ‘node_name’ with the attribute set ‘attributes’.
“attributes” is an array of hashes, for example
[{... => 'red'}, {... => 'www.google.com'}]
If more than one value has to be checked for the same key (e.g. 'class' => 'small' and 'class' => 'wide'), it has to be written as [{'class' => ['small', 'wide']}].
“attributes” can be empty; in this case just the ‘node_name’ is checked.
# File 'lib/scrubyt/core/scraping/constraint.rb', line 105
def self.add_ensure_presence_of_ancestor_node(node_name, attributes)
  Constraint.new([node_name, attributes], CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_ANCESTOR_NODE)
end
.add_ensure_presence_of_attribute(attribute_hash) ⇒ Object
If this type of constraint is added to a pattern, the HTML node it targets must have an attribute named “attribute_name” with the value “attribute_value”
# File 'lib/scrubyt/core/scraping/constraint.rb', line 73
def self.add_ensure_presence_of_attribute(attribute_hash)
  Constraint.new(attribute_hash, CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_ATTRIBUTE)
end
.add_ensure_presence_of_pattern(ancestor) ⇒ Object
If this type of constraint is added to a pattern, it must have an ancestor pattern (child pattern, or child pattern of a child pattern, etc.) denoted by “ancestor”. ‘Has an ancestor pattern’ means that the ancestor pattern actually extracts something (just by looking at the wrapper model, the ancestor pattern is always present). Note that there is no ‘ensure_absence’ version of this type of constraint, since I could not think of a use case for that.
# File 'lib/scrubyt/core/scraping/constraint.rb', line 56
def self.add_ensure_presence_of_pattern(ancestor)
  Constraint.new(ancestor, CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_PATTERN)
end
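To make the shape of the stored data concrete, here is a minimal stand-in that mirrors two of the factory methods listed above. It is a sketch, not the real class: `SketchConstraint` is a hypothetical name, the `(target, type)` constructor signature is inferred from the `Constraint.new` calls in the listings, and the single-pair form of `attribute_hash` is an assumption.

```ruby
# Minimal stand-in for Scrubyt::Constraint, reduced to a (target, type) pair
# plus two factory methods. Hypothetical; it does not load scrubyt.
class SketchConstraint
  CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_ATTRIBUTE    = 1
  CONSTRAINT_TYPE_ENSURE_ABSENCE_OF_ANCESTOR_NODE = 4

  attr_reader :target, :type

  # Assumed signature, inferred from the Constraint.new calls in the listings.
  def initialize(target, type)
    @target = target
    @type = type
  end

  # The attribute hash is stored as the target unchanged.
  def self.add_ensure_presence_of_attribute(attribute_hash)
    new(attribute_hash, CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_ATTRIBUTE)
  end

  # The node name and the attribute list are stored together as a pair.
  def self.add_ensure_absence_of_ancestor_node(node_name, attributes)
    new([node_name, attributes], CONSTRAINT_TYPE_ENSURE_ABSENCE_OF_ANCESTOR_NODE)
  end
end

# Assumed hash shape: attribute name => attribute value.
attr_constraint = SketchConstraint.add_ensure_presence_of_attribute({ 'class' => 'small' })

# Several values for one key are grouped into an array, as described above.
node_constraint = SketchConstraint.add_ensure_absence_of_ancestor_node(
  'div', [{ 'class' => %w[small wide] }]
)

p attr_constraint.target
p node_constraint.target
```

The point of the sketch is that the factory methods only package their arguments with a type code; the actual matching happens later, when the constraint is checked.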
Instance Method Details
#check(result) ⇒ Object
Evaluate the constraint; if this function returns true, it means that the constraint passed, i.e. its filter will be added to the extracted content of the pattern.
# File 'lib/scrubyt/core/scraping/constraint.rb', line 113
def check(result)
  case @type
  #checked after evaluation, so here always return true
  when CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_PATTERN
    return true
  when CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_ATTRIBUTE
    attribute_present(result)
  when CONSTRAINT_TYPE_ENSURE_ABSENCE_OF_ATTRIBUTE
    !attribute_present(result)
  when CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_ANCESTOR_NODE
    ancestor_node_present(result)
  when CONSTRAINT_TYPE_ENSURE_ABSENCE_OF_ANCESTOR_NODE
    !ancestor_node_present(result)
  end
end
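The dispatch in check can be tried in isolation with a small sketch. This is an illustration under assumptions: `sketch_check` is a hypothetical helper, and the private predicates attribute_present and ancestor_node_present (whose bodies are not shown on this page) are replaced by plain boolean arguments.

```ruby
# Hypothetical re-creation of the dispatch in #check. The two booleans stand in
# for the results of the private predicates, which are not shown on this page.
def sketch_check(type, attribute_present, ancestor_node_present)
  case type
  when 0 then true                    # ENSURE_PRESENCE_OF_PATTERN: checked after evaluation
  when 1 then attribute_present       # ENSURE_PRESENCE_OF_ATTRIBUTE
  when 2 then !attribute_present      # ENSURE_ABSENCE_OF_ATTRIBUTE
  when 3 then ancestor_node_present   # ENSURE_PRESENCE_OF_ANCESTOR_NODE
  when 4 then !ancestor_node_present  # ENSURE_ABSENCE_OF_ANCESTOR_NODE
  end
end

# Evaluate every constraint type for a node that has the attribute
# but no matching ancestor node.
(0..4).each do |type|
  puts "type #{type}: #{sketch_check(type, true, false)}"
end
```

Each absence check is the negation of the matching presence check. A `case` without an `else` branch evaluates to nil for an unrecognized type, so an unknown type code would be treated as falsy by a caller that tests the result.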