Class: ChupaText::Extractor
- Inherits:
- Object
- Includes:
- Loggable
- Defined in:
- lib/chupa-text/extractor.rb
Instance Method Summary
- #add_decomposer(decomposer) ⇒ Object
- #apply_configuration(configuration) ⇒ void
  Sets up the extractor according to the configuration.
- #extract(input) {|text_data| ... } ⇒ void
  Extracts texts from input.
- #initialize ⇒ Extractor (constructor)
  A new instance of Extractor.
Constructor Details
#initialize ⇒ Extractor
Returns a new instance of Extractor.
    # File 'lib/chupa-text/extractor.rb', line 24
    def initialize
      @decomposers = []
    end
Instance Method Details
#add_decomposer(decomposer) ⇒ Object
    # File 'lib/chupa-text/extractor.rb', line 43
    def add_decomposer(decomposer)
      @decomposers << decomposer
    end
#apply_configuration(configuration) ⇒ void
This method returns an undefined value.
Sets up the extractor according to the configuration by adding every decomposer that the configuration enables.
    # File 'lib/chupa-text/extractor.rb', line 35
    def apply_configuration(configuration)
      decomposers = Decomposers.create(Decomposer.registry,
                                       configuration.decomposer)
      decomposers.each do |decomposer|
        add_decomposer(decomposer)
      end
    end
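The create-then-add pattern above can be sketched without the library. This is a minimal, self-contained illustration, not ChupaText's implementation: how Decomposers.create decides which decomposers are enabled is not shown in this listing, so the name-list filter below is an assumption, and every Sketch* and sketch_* name is hypothetical.

```ruby
# Hypothetical decomposer classes, used only for this sketch.
class SketchCSVDecomposer; end
class SketchTarDecomposer; end

# Stand-in registry mapping a name to a decomposer class (assumed shape).
SKETCH_REGISTRY = {
  "csv" => SketchCSVDecomposer,
  "tar" => SketchTarDecomposer,
}.freeze

# Stand-in for Decomposers.create: instantiate only the enabled names.
# Filtering by a list of names is an assumption made for this sketch.
def sketch_create_decomposers(registry, enabled_names)
  registry.select { |name, _| enabled_names.include?(name) }
          .map { |_, decomposer_class| decomposer_class.new }
end

class SketchConfigurableExtractor
  attr_reader :decomposers

  def initialize
    @decomposers = []
  end

  def add_decomposer(decomposer)
    @decomposers << decomposer
  end

  # Same shape as apply_configuration: create the decomposers,
  # then register them one by one.
  def apply_configuration(enabled_names)
    created = sketch_create_decomposers(SKETCH_REGISTRY, enabled_names)
    created.each do |decomposer|
      add_decomposer(decomposer)
    end
  end
end
```

Calling apply_configuration(["csv"]) on a fresh SketchConfigurableExtractor leaves exactly one SketchCSVDecomposer registered. Because registration goes through add_decomposer, decomposers added by hand and decomposers added from a configuration end up in the same list.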
#extract(input) {|text_data| ... } ⇒ void
This method returns an undefined value.
Extracts texts from input. Each extracted text is passed to the given block.
    # File 'lib/chupa-text/extractor.rb', line 60
    def extract(input)
      targets = [ensure_data(input)]
      until targets.empty?
        target = targets.shift
        debug do
          "#{log_tag}[extract][target] <#{target.uri}>:<#{target.mime_type}>"
        end
        decomposer = find_decomposer(target)
        if decomposer.nil?
          if target.text_plain?
            debug {"#{log_tag}[extract][text-plain]"}
            yield(target)
            next
          else
            debug {"#{log_tag}[extract][decomposer] not found"}
            yield(target) if target.text?
            next
          end
        end
        debug {"#{log_tag}[extract][decomposer] #{decomposer.class}"}
        decomposer.decompose(target) do |decomposed|
          debug do
            "#{log_tag}[extract][decomposed] " +
              "#{decomposer.class}: " +
              "<#{target.uri}>: " +
              "<#{target.mime_type}> -> <#{decomposed.mime_type}>"
          end
          targets.push(decomposed)
        end
      end
    end
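The loop above is a queue-driven traversal: data with a matching decomposer is broken apart and its pieces are pushed to the back of the queue, while data without one is yielded if it is text and dropped otherwise. The following self-contained sketch reproduces that control flow without logging. It is an illustration under assumptions, not the library's code: SketchData, SketchBundleDecomposer, SketchExtractor, sketch_extract_uris, the "application/x-bundle" MIME type, and the target? lookup are all hypothetical, since find_decomposer and ensure_data are not shown in this listing.

```ruby
# Stand-in for a piece of data; the real data classes are not shown here.
SketchData = Struct.new(:uri, :mime_type, :children) do
  def text_plain?
    mime_type == "text/plain"
  end

  def text?
    mime_type.start_with?("text/")
  end
end

# Hypothetical decomposer that unpacks "application/x-bundle" data
# into the child data it contains.
class SketchBundleDecomposer
  def target?(data)
    data.mime_type == "application/x-bundle"
  end

  def decompose(data)
    data.children.each { |child| yield(child) }
  end
end

class SketchExtractor
  def initialize
    @decomposers = []
  end

  def add_decomposer(decomposer)
    @decomposers << decomposer
  end

  def extract(input)
    targets = [input]
    until targets.empty?
      target = targets.shift
      # Assumed lookup: the first decomposer that accepts the target.
      decomposer = @decomposers.find { |candidate| candidate.target?(target) }
      if decomposer.nil?
        # No decomposer: plain text and other text types are yielded,
        # everything else is silently skipped.
        yield(target) if target.text_plain? || target.text?
        next
      end
      # Decomposed data goes to the back of the queue for another pass.
      decomposer.decompose(target) { |decomposed| targets.push(decomposed) }
    end
  end
end

# Helper for trying the sketch: collects the URIs of the yielded text data.
def sketch_extract_uris(input)
  extractor = SketchExtractor.new
  extractor.add_decomposer(SketchBundleDecomposer.new)
  uris = []
  extractor.extract(input) { |text_data| uris << text_data.uri }
  uris
end
```

Because decomposed data is appended with push and consumed with shift, text nested inside an inner container is yielded after the text that sits next to that container, and non-text data with no decomposer never reaches the block.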