Class: Rika::Parser
- Inherits: Object
  - Object
  - Rika::Parser
- Defined in: lib/rika/parser.rb
Overview
Parses a document and returns a ParseResult. This class is intended to be used only by the Rika module, not by users of the gem, who should instead call Rika.parse.
Instance Method Summary
-
#initialize(data_source, key_sort: true, max_content_length: -1, detector: DefaultDetector.new) ⇒ Parser
constructor
A new instance of Parser.
-
#parse ⇒ ParseResult
Entry point method for parsing a document.
Constructor Details
#initialize(data_source, key_sort: true, max_content_length: -1, detector: DefaultDetector.new) ⇒ Parser
Returns a new instance of Parser.
    # File 'lib/rika/parser.rb', line 15

    def initialize(data_source, key_sort: true, max_content_length: -1, detector: DefaultDetector.new)
      @data_source = data_source
      @key_sort = key_sort
      @max_content_length = max_content_length
      @detector = detector
      @input_type = data_source_input_type
      @tika = Tika.new(@detector)
    end
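The default of -1 for max_content_length is passed straight to Tika's maximum string length setting, where -1 means "no limit". The following is a minimal pure-Ruby sketch of that convention only. The helper name limit_content is hypothetical and is not part of Rika or Tika, which apply the limit internally while extracting text.

```ruby
# Hypothetical helper illustrating the "-1 means unlimited" convention.
# It is not part of Rika or Tika; it only mirrors the semantics of the limit.
def limit_content(content, max_content_length)
  # A negative limit disables truncation entirely.
  return content if max_content_length.negative?

  # Otherwise keep at most max_content_length characters.
  content[0, max_content_length]
end

full_text = limit_content('hello world', -1)
short_text = limit_content('hello world', 5)
```

Under these assumptions, full_text keeps the whole string and short_text keeps only its first five characters.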
Instance Method Details
#parse ⇒ ParseResult
Entry point method for parsing a document
    # File 'lib/rika/parser.rb', line 26

    def parse
      metadata_java = Metadata.new
      @tika.set_max_string_length(@max_content_length)
      content = with_input_stream { |stream| @tika.parse_to_string(stream, metadata_java) }
      language = Rika.language(content)
      metadata_java.set('rika:language', language)
      metadata_java.set('rika:data-source', @data_source)
      metadata = metadata_java_to_ruby(metadata_java)
      metadata = metadata.sort_by { |key, _value| key.downcase }.to_h if @key_sort

      ParseResult.new(
        content: content,
        metadata: metadata,
        metadata_java: metadata_java,
        content_type: metadata['Content-Type'],
        language: language,
        input_type: @input_type,
        data_source: @data_source,
        max_content_length: @max_content_length
      )
    end
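When key_sort is true, the metadata hash is reordered with sort_by on the downcased key, which gives a case-insensitive ordering. The sketch below shows how that differs from a plain sort. The sample hash is made up for illustration and only stands in for the metadata a real parse might return.

```ruby
# Made-up metadata standing in for the hash built from Tika's metadata.
metadata = {
  'dc:title' => 'Report',
  'X-Parsed-By' => 'SomeParser',
  'Content-Type' => 'application/pdf',
  'rika:language' => 'en',
  'Author' => 'Jane'
}

# Same expression as in #parse: order by the downcased key, keep original keys.
case_insensitive = metadata.sort_by { |key, _value| key.downcase }.to_h

# A plain sort compares raw strings, so uppercase keys come before lowercase ones.
case_sensitive = metadata.sort.to_h

puts case_insensitive.keys.inspect
puts case_sensitive.keys.inspect
```

With this sample, the case-insensitive ordering places 'X-Parsed-By' last, while the plain sort places it ahead of 'dc:title' because uppercase letters compare lower than lowercase ones.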