Class: TaliaCore::DataTypes::XmlData

Inherits:

FileRecord

Object
ActiveRecord::Base
DataRecord
FileRecord
TaliaCore::DataTypes::XmlData


Defined in: lib/talia_core/data_types/xml_data.rb

Overview

A FileRecord subclass that stores XML (or XHTML) files.

Instance Attribute Summary

Attributes inherited from DataRecord

#temp_path

Instance Method Summary

#create_from_data(location, data, options = {:tidy => true}) ⇒ Object

See the FileStore module for details on how data file objects are created.
#extract_mime_type(location) ⇒ Object

MIME type should be one of ‘text/html’ or ‘text/xml’ (‘text/hnml’ is supported for legacy reasons).
#get_content(options = {}) ⇒ Object

The content of this document.
#get_content_string(options = nil) ⇒ Object

Same as #get_content, but returns a string instead of REXML elements.
#get_escaped_content_string(options = nil) ⇒ Object

Same as #get_content_string, but with “<” and “>” escaped for display in HTML documents.
#mime_subtype ⇒ Object

The MIME subtype of this record (for example ‘xml’ for ‘text/xml’).

Instance Method Details

#create_from_data(location, data, options = {:tidy => true}) ⇒ `Object`

See the FileStore module for details on how data file objects are created. Unlike the superclass version, this method can optionally clean up (X)HTML with the “tidy” tool (see tidy.rubyforge.org/) before the data is written.

Tidy is applied only if all of the following hold:

The “tidy” option is true,
The Tidy library is available (the Tidy_enable flag is true), and
The file has a .htm, .html, or .xhtml extension (the check is case-sensitive).

Options:

tidy: Use the “tidy” tool to clean up (X)HTML. This is true by default only when the options argument is omitted entirely; if an options hash is passed without :tidy, tidying is skipped.

# File 'lib/talia_core/data_types/xml_data.rb', line 120

def create_from_data(location, data, options = {:tidy => true})
  # Tidy only (X)HTML files, and only when requested and the Tidy library is available
  if options[:tidy] == true && Tidy_enable == true &&
     ['.htm', '.html', '.xhtml'].include?(File.extname(location))
    # Clean the markup into XHTML, without tidy's generator meta tag
    data_to_write = Tidy.open(:show_warnings => false) do |tidy|
      tidy.options.output_xhtml = true
      tidy.options.tidy_mark = false
      tidy.clean(data)
    end
  else
    data_to_write = data
  end

  # Write the (possibly cleaned) data via the superclass
  super(location, data_to_write, options)
end
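The conditions under which create_from_data runs tidy can be isolated into a small predicate. The sketch below is a standalone illustration, not part of TaliaCore: `tidy_applies?` and its `tidy_available` argument (standing in for the library's Tidy_enable flag) are hypothetical names. It reproduces the case-sensitive extension check from the method above and shows that passing any options hash without `:tidy` switches tidying off.

```ruby
# Hypothetical predicate mirroring the check in XmlData#create_from_data.
# tidy_available stands in for the Tidy_enable constant used by TaliaCore.
HTML_EXTENSIONS = %w(.htm .html .xhtml).freeze

def tidy_applies?(location, options, tidy_available)
  options[:tidy] == true &&
    tidy_available == true &&
    HTML_EXTENSIONS.include?(File.extname(location)) # case-sensitive, as in the original
end

default_options = { :tidy => true } # the default used when no options are passed

tidy_applies?('page.html', default_options, true)  # => true
tidy_applies?('page.xml', default_options, true)   # => false (not an (X)HTML extension)
tidy_applies?('page.HTML', default_options, true)  # => false (extension check is case-sensitive)
tidy_applies?('page.html', {}, true)               # => false (options given without :tidy)
tidy_applies?('page.html', default_options, false) # => false (Tidy library unavailable)
```

Note that extract_mime_type lowercases the extension, so a file named page.HTML is typed as ‘text/html’ but, under this check, is not tidied.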

#extract_mime_type(location) ⇒ `Object`

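Returns the MIME type for the file at location, based on its extension (case-insensitive): ‘text/html’ for .htm, .html, and .xhtml; ‘text/xml’ for .xml; and ‘text/hnml’ for .hnml, which is supported for legacy reasons. Any other extension yields nil.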

# File 'lib/talia_core/data_types/xml_data.rb', line 37

def extract_mime_type(location)
  # TODO: Could probably use the Mime classes to get the
  # type, or move to the superclass
  case File.extname(location).downcase
  when '.htm', '.html','.xhtml'
    'text/html'
  when '.hnml'
    'text/hnml'
  when '.xml'
    'text/xml'
  end
end
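The TODO in extract_mime_type suggests moving the mapping elsewhere. As a standalone sketch (the `MIME_TYPES_BY_EXTENSION` constant and the `mime_type_for` helper are hypothetical, not TaliaCore API), the same rules fit in a frozen lookup table. Like the original case statement, the lookup lowercases the extension and returns nil for anything it does not recognize.

```ruby
# Hypothetical lookup table reproducing the rules of XmlData#extract_mime_type.
MIME_TYPES_BY_EXTENSION = {
  '.htm'   => 'text/html',
  '.html'  => 'text/html',
  '.xhtml' => 'text/html',
  '.hnml'  => 'text/hnml', # legacy format
  '.xml'   => 'text/xml'
}.freeze

def mime_type_for(location)
  # Hash#[] returns nil for unknown extensions, like the case statement's fall-through
  MIME_TYPES_BY_EXTENSION[File.extname(location).downcase]
end

mime_type_for('letters/0001.XML') # => "text/xml"
mime_type_for('notes.xhtml')      # => "text/html"
mime_type_for('scan.png')         # => nil
```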

#get_content(options = {}) ⇒ `Object`

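The content of this document, as REXML elements. For plain XML (and legacy HNML) files, this returns the child elements of the document root. For (X)HTML documents, and for XML documents transformed with an XSL file, it returns the child elements of the “body” tag. Each element is passed through wrapItem (which, per the source comment, adjusts or replaces item paths) before the collection is returned.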

Options:

xsl_file: If given, the document is transformed with this XSL file before its content is extracted.

# File 'lib/talia_core/data_types/xml_data.rb', line 64

def get_content(options = {})
  # get_content_string and get_escaped_content_string pass nil by default
  options ||= {}
  # TODO: Maybe port this to hpricot/nokogiri too
  text_to_parse = all_text

  # if xsl_file option is specified, execute transformation
  if options[:xsl_file]
    text_to_parse = xslt_transform(file_path, options[:xsl_file])
  end

  # create document object
  document = REXML::Document.new text_to_parse

  # HTML, and XML transformed with an XSL file, yield the children of <body>;
  # plain XML and legacy HNML yield the children of the root element
  if (mime_subtype == "html") or
        ((mime_subtype == "xml") and !options[:xsl_file].nil?)
    content = document.elements['//body'].elements
  elsif (mime_subtype == "xml") or (mime_subtype == "hnml")
    content = document.root.elements
  end

  # adjust/replace items path
  content.each { |i| wrapItem i }

  # return content
  return content
end
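To illustrate which elements get_content returns, the standalone sketch below reproduces only its selection rule on in-memory strings. The `content_elements` helper and its `transformed` flag are hypothetical; the XSL transformation and the wrapItem step are left out, and the sample documents are made up.

```ruby
require 'rexml/document'

# Hypothetical helper reproducing the element-selection rule of
# XmlData#get_content (without the XSL step and without wrapItem).
def content_elements(text, subtype, transformed = false)
  document = REXML::Document.new(text)
  if subtype == 'html' || (subtype == 'xml' && transformed)
    # (X)HTML, or XML that was run through an XSL file: children of <body>
    document.elements['//body'].elements
  elsif subtype == 'xml' || subtype == 'hnml'
    # Plain XML and legacy HNML: children of the root element
    document.root.elements
  end
end

xml   = '<letter><date>1881</date><p>Dear friend</p></letter>'
xhtml = '<html><head><title>t</title></head><body><h1>Title</h1><p>Text</p></body></html>'

content_elements(xml, 'xml').to_a.map(&:name)    # => ["date", "p"]
content_elements(xhtml, 'html').to_a.map(&:name) # => ["h1", "p"]
```

The same XHTML string typed as ‘xml’ would instead yield the root's children, head and body, which is why the MIME subtype matters here.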

#get_content_string(options = nil) ⇒ `Object`

Same as #get_content, but returns the content as a single string instead of REXML elements.

# File 'lib/talia_core/data_types/xml_data.rb', line 92

def get_content_string(options = nil)
  xml_str = ''
  get_content(options).each do |element|
    xml_str << element.to_s
  end
  xml_str
end
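The string conversion simply concatenates each element's serialized form. The standalone sketch below applies it to a REXML document parsed in place (no XmlData record is involved) and shows an equivalent formulation using `map` and `join`. Keep in mind that REXML re-serializes the elements, so attribute quoting and whitespace may differ from the stored file.

```ruby
require 'rexml/document'

doc = REXML::Document.new('<letter><date>1881</date><p>Dear friend</p></letter>')
elements = doc.root.elements

# Loop form, as in XmlData#get_content_string (''.dup gives a mutable string)
xml_str = ''.dup
elements.each { |element| xml_str << element.to_s }

# Equivalent one-liner
joined = elements.to_a.map(&:to_s).join

xml_str           # => "<date>1881</date><p>Dear friend</p>"
joined == xml_str # => true
```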

#get_escaped_content_string(options = nil) ⇒ `Object`

Same as #get_content_string, but with “<” and “>” replaced by “&lt;” and “&gt;”, so the markup can be shown as text in an HTML page.

# File 'lib/talia_core/data_types/xml_data.rb', line 102

def get_escaped_content_string(options = nil)
  get_content_string(options).gsub(/</, "&lt;").gsub(/>/, "&gt;")
end
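Because the method replaces only `<` and `>`, an entity already present in the content, such as `&amp;`, passes through unchanged and the browser renders it as a plain `&` rather than as the literal source text. The sketch below contrasts the method's gsub chain with a hypothetical variant that also escapes `&` (first, so the entities introduced for `<` and `>` are not escaped twice); the variant is an illustration, not TaliaCore behavior.

```ruby
markup = '<p>Fish &amp; chips</p>'

# The substitution used by get_escaped_content_string: only < and > change
escaped = markup.gsub(/</, '&lt;').gsub(/>/, '&gt;')

# Hypothetical variant: escape & first so existing entities stay visible as text
escaped_all = markup.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')

escaped     # => "&lt;p&gt;Fish &amp; chips&lt;/p&gt;"
escaped_all # => "&lt;p&gt;Fish &amp;amp; chips&lt;/p&gt;"
```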

#mime_subtype ⇒ `Object`

The MIME subtype of this record, i.e. the part of its MIME type after the slash (for example ‘xml’ for ‘text/xml’).

# File 'lib/talia_core/data_types/xml_data.rb', line 51

def mime_subtype
  mime_type.split(/\//)[1]
end
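mime_subtype takes the part of the record's MIME type after the slash. The sketch below applies the same expression to plain strings through a hypothetical `subtype_of` helper. It also shows that a nil MIME type makes that expression raise NoMethodError; whether a record can end up with a nil type (for instance from extract_mime_type and an unrecognized extension) is an assumption here. The hypothetical `safe_subtype_of` variant returns nil instead.

```ruby
# Hypothetical helper applying the same expression as XmlData#mime_subtype
def subtype_of(mime_type)
  mime_type.split(/\//)[1]
end

# Hypothetical nil-tolerant variant
def safe_subtype_of(mime_type)
  mime_type && mime_type.split('/')[1]
end

subtype_of('text/xml')  # => "xml"
subtype_of('text/html') # => "html"
subtype_of('text/hnml') # => "hnml"

begin
  subtype_of(nil) # nil has no #split
rescue NoMethodError => e
  puts "NoMethodError for ##{e.name}"
end

safe_subtype_of(nil) # => nil
```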

In addition to the methods above, XmlData inherits methods from FileRecord and DataRecord and includes methods from PathHelpers::ClassMethods, DataLoader::ClassMethods, IipLoader, TaliaUtil::IoHelper, PathHelpers, and FileStore.

Instance Method Details

#create_from_data(location, data, options = {:tidy => true}) ⇒ Object

#extract_mime_type(location) ⇒ Object

#get_content(options = {}) ⇒ Object

#get_content_string(options = nil) ⇒ Object