Class: BioInterchange::TextMining::PDFxXMLReader

Inherits:

TMReader

Object
Reader
TMReader
BioInterchange::TextMining::PDFxXMLReader

Defined in: lib/biointerchange/textmining/pdfx_xml_reader.rb

Defined Under Namespace

Classes: MyListener

Instance Method Summary

#deserialize(inputstream) ⇒ Object

Reads input stream and returns associated BioInterchange::TextMining::Document model.

Methods inherited from TMReader

#initialize, #postponed?

Methods inherited from Reader

#initialize

Constructor Details

This class inherits a constructor from BioInterchange::TextMining::TMReader

Instance Method Details

#deserialize(inputstream) ⇒ Object

Reads input stream and returns associated BioInterchange::TextMining::Document model.

Presently I assume a single document per XML file and that <section> tags cannot nest. I also assume that a Content::DOCUMENT type covers everything between the <article> tags.
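
The assumptions above can be illustrated with a minimal sketch. It is not the library's MyListener: the class name SketchListener, the span layout, and the sample XML are all hypothetical, and the use of REXML's stream parser is an assumption made here for illustration. The sketch records a character span for the single <article> and for each <section>, and refuses nested <section> tags.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener (not the library's MyListener): records character
# offsets for the one <article> and for each non-nested <section>.
class SketchListener
  include REXML::StreamListener

  attr_reader :spans

  def initialize
    @text = +''   # all character data seen so far
    @spans = []   # entries of the form [tag name, start offset, length]
    @open = {}    # tag name => start offset of the currently open tag
  end

  def tag_start(name, _attributes)
    return unless %w[article section].include?(name)
    # Assumption from the text above: neither tag nests inside itself.
    raise ArgumentError, "nested <#{name}> is not supported" if @open.key?(name)

    @open[name] = @text.length
  end

  def text(data)
    @text << data
  end

  def tag_end(name)
    start = @open.delete(name)
    @spans << [name, start, @text.length - start] if start
  end
end

# Illustrative input only; real PDFx output contains more elements.
xml = '<pdfx><article><section>Intro.</section><section>Methods.</section></article></pdfx>'
listener = SketchListener.new
REXML::Document.parse_stream(xml, listener)
```

With this input, listener.spans holds two section entries, [section, 0, 6] and [section, 6, 8], followed by one article entry, [article, 0, 14], which covers everything between the <article> tags.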

inputstream: Input IO stream (or String) to deserialize

Raises:

(BioInterchange::Exceptions::ImplementationReaderError): raised when inputstream is neither an IO nor a String

# File 'lib/biointerchange/textmining/pdfx_xml_reader.rb', line 37

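def deserialize(inputstream)
  # Accept only IO or String input (the error message mentions IO only).
  raise BioInterchange::Exceptions::ImplementationReaderError, 'InputStream not of type IO, cannot read.' unless inputstream.kind_of?(IO) || inputstream.kind_of?(String)

  @input = inputstream

  # Parsing is handed off to the pdfx method, defined elsewhere.
  pdfx
end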
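
The type check in deserialize has one consequence worth noting: in Ruby, StringIO is not a subclass of IO, so a StringIO would be rejected by that check even though it behaves like a stream. The sketch below mirrors the check in a hypothetical helper, acceptable_input?, which is not part of BioInterchange, so that it runs without the library.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper mirroring the type check in deserialize.
def acceptable_input?(input)
  input.kind_of?(IO) || input.kind_of?(String)
end

xml = '<pdfx><article/></pdfx>'

string_ok   = acceptable_input?(xml)                 # a String passes
stringio_ok = acceptable_input?(StringIO.new(xml))   # a StringIO does not
file_ok     = Tempfile.create('pdfx') { |f| acceptable_input?(f) }  # File is an IO

# Reading the StringIO into a String first satisfies the check.
converted_ok = acceptable_input?(StringIO.new(xml).read)
```

Under this check, callers holding a StringIO would need to pass its contents (for example via read) or a real IO object instead.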