Class: TaliaCore::DataTypes::XmlData

Inherits:
FileRecord
Defined in:
lib/talia_core/data_types/xml_data.rb

Overview

FileRecord class to store XML (or XHTML) files.

Instance Attribute Summary

Attributes inherited from DataRecord

#temp_path

Instance Method Summary

Methods inherited from FileRecord

#all_bytes, #get_byte, #position, #reset, #seek, #size

Methods included from PathHelpers::ClassMethods

#data_path, #tempfile_path

Methods included from DataLoader::ClassMethods

#create_from_url

Methods included from IipLoader

#convert_original?, #create_from_files, #create_from_stream, #create_iip, #open_original_image, #open_original_image_file, #open_original_image_stream, #orig_location, #prepare_image_from_existing!

Methods included from TaliaUtil::IoHelper

#base_for, #file_url, #open_from_url, #open_generic

Methods included from PathHelpers

#data_directory, #data_path, #extract_filename, #file_path, #full_filename, #static_path, #tempfile_path

Methods included from FileStore

#all_text, #assign_type, #create_from_file, #is_file_open?, #write_file_after_save

Methods inherited from DataRecord

#all_bytes, #content_string, find_by_type_and_location!, find_data_records, #get_byte, #mime_type, #position, #reset, #seek, #size

Instance Method Details

#create_from_data(location, data, options = {:tidy => true}) ⇒ Object

See the FileStore module for details on how creation of data file objects works. This version differs from the superclass version in that it will (optionally) clean the HTML using the “tidy” tool. Also see tidy.rubyforge.org/

Tidy will be used under the following circumstances:

  • The “tidy” option is given and

  • The library itself is available and

  • The file appears to be a (X)HTML file

Options:

tidy

Use the “tidy” tool to clean up (X)HTML. Defaults to true if no options are given; if an options hash is passed without the tidy key, tidy is not used.



# File 'lib/talia_core/data_types/xml_data.rb', line 120

def create_from_data(location, data, options = {:tidy => true})
  # check tidy option
  if (((options[:tidy] == true) and (Tidy_enable == true)) and 
        ((File.extname(location) == '.htm') or (File.extname(location) == '.html') or (File.extname(location) == '.xhtml')))        
  
    # apply tidy on data
    data_to_write = Tidy.open(:show_warnings => false) do |tidy|
      tidy.options.output_xhtml = true
      tidy.options.tidy_mark = false
      xhtml = tidy.clean(data)
      xhtml
    end
  else
    data_to_write = data
  end

  # write data
  super(location, data_to_write, options)
end
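
The three conditions for running tidy can be sketched as a single predicate. This is a minimal illustration, not part of TaliaCore: the name tidy_applicable? and the tidy_available parameter (standing in for the Tidy_enable constant) are assumptions. Like the method above, it compares the extension without downcasing it.

```ruby
# Extensions that create_from_data treats as (X)HTML.
HTML_EXTENSIONS = %w[.htm .html .xhtml].freeze

# Hypothetical helper (not in TaliaCore): true only if the tidy option is
# set, the library is available, and the location has an (X)HTML extension.
def tidy_applicable?(location, options, tidy_available)
  options[:tidy] == true &&
    tidy_available == true &&
    HTML_EXTENSIONS.include?(File.extname(location))
end

puts tidy_applicable?('page.html', { :tidy => true }, true)
puts tidy_applicable?('data.xml', { :tidy => true }, true)
```

Because the comparison is case-sensitive, a location such as page.HTML would not be tidied under this sketch.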

#extract_mime_type(location) ⇒ Object

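Returns the MIME type for the given location, based on its file extension: ‘text/html’ for .htm, .html and .xhtml, ‘text/xml’ for .xml, and ‘text/hnml’ for .hnml (supported for legacy reasons). Any other extension yields nil.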



# File 'lib/talia_core/data_types/xml_data.rb', line 37

def extract_mime_type(location)
  # TODO: Could probably use the Mime classes to get the
  # type, or move to the superclass
  case File.extname(location).downcase
  when '.htm', '.html','.xhtml'
    'text/html'
  when '.hnml'
    'text/hnml'
  when '.xml'
    'text/xml'
  end
end

#get_content(options = {}) ⇒ Object

The content of this document, returned as REXML elements. For plain XML files, this is the children of the document root. For XHTML documents, this is the children of the “body” tag.

Options:

  • xsl_file

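    If given, the document will be transformed using this XSL file before the content is extracted. The transformed output of an XML file is treated as XHTML, so the children of its “body” tag are returned.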



# File 'lib/talia_core/data_types/xml_data.rb', line 64

def get_content(options = {})
  # TODO: Maybe port this to hpricot/nokogiri too
  # get_content_string passes nil by default, so fall back to an empty hash
  options ||= {}
  text_to_parse = all_text

  # if xsl_file option is specified, execute transformation
  if (options[:xsl_file])
    text_to_parse = xslt_transform(file_path, options[:xsl_file])
  end

  # create document object
  document = REXML::Document.new text_to_parse

  # get content
  if ((mime_subtype == "html") or 
        ((mime_subtype == "xml") and (!options.nil?) and (!options[:xsl_file].nil?)))
    content = document.elements['//body'].elements
  elsif ((mime_subtype == "xml") or (mime_subtype == "hnml"))
    content = document.root.elements
  end

  # adjust/replace items path
  content.each { |i| wrapItem i }

  # return content
  return content
end
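
The difference between the two extraction paths can be sketched with REXML alone. This is a simplified illustration under stated assumptions: content_elements is a hypothetical helper, it takes the text and MIME subtype as arguments instead of reading them from a record, and it leaves out the XSL transformation and the wrapItem step.

```ruby
require 'rexml/document'

# Hypothetical helper (not in TaliaCore): returns the child elements of
# <body> for the "html" subtype, and of the document root otherwise.
def content_elements(xml_text, subtype)
  document = REXML::Document.new(xml_text)
  container = subtype == 'html' ? document.elements['//body'] : document.root
  container.elements.to_a
end

xhtml = '<html><head><title>t</title></head><body><p>one</p><p>two</p></body></html>'
xml   = '<doc><a/><b/></doc>'

puts content_elements(xhtml, 'html').map(&:name).inspect
puts content_elements(xml, 'xml').map(&:name).inspect
```

For the XHTML sample only the two paragraph elements are returned; the head is skipped because it lies outside the “body” tag.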

#get_content_string(options = nil) ⇒ Object

Same as #get_content, but returns the content as a single string instead of REXML elements



# File 'lib/talia_core/data_types/xml_data.rb', line 92

def get_content_string(options = nil)
  xml_str = ''
  get_content(options).each do |element|
    xml_str << element.to_s
  end
  xml_str
end

#get_escaped_content_string(options = nil) ⇒ Object

Same as #get_content_string, but with angle brackets escaped (“<” and “>” become “&lt;” and “&gt;”) so that the result can be included in HTML documents



# File 'lib/talia_core/data_types/xml_data.rb', line 102

def get_escaped_content_string(options = nil)
  get_content_string(options).gsub(/</, "&lt;").gsub(/>/, "&gt;")
end
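
The effect of the escaping can be seen on a plain string. The helper name escape_angle_brackets is an assumption for illustration; it applies the same two substitutions as the method above without needing a data record.

```ruby
# Hypothetical standalone version of the escaping step (not in TaliaCore).
def escape_angle_brackets(xml_str)
  xml_str.gsub(/</, '&lt;').gsub(/>/, '&gt;')
end

puts escape_angle_brackets('<p>a &amp; b</p>')
# prints: &lt;p&gt;a &amp; b&lt;/p&gt;
```

Only the angle brackets are replaced. Ampersands pass through unchanged, so an existing entity such as &amp; is rendered as a literal “&” when the escaped string is embedded in an HTML page.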

#mime_subtype ⇒ Object

The MIME subtype of this record, i.e. the part of the MIME type after the slash (for example “html” for “text/html”)



# File 'lib/talia_core/data_types/xml_data.rb', line 51

def mime_subtype
  mime_type.split(/\//)[1]
end
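
How a file extension leads to a MIME type and then to a subtype can be sketched with two standalone functions. The names MIME_BY_EXTENSION, mime_type_for and subtype_of are assumptions for illustration; the mapping mirrors the case statement in #extract_mime_type.

```ruby
# Extension-to-type mapping, mirroring the case statement in extract_mime_type.
MIME_BY_EXTENSION = {
  '.htm'   => 'text/html',
  '.html'  => 'text/html',
  '.xhtml' => 'text/html',
  '.hnml'  => 'text/hnml',
  '.xml'   => 'text/xml'
}.freeze

# Hypothetical helper: case-insensitive lookup, nil for unknown extensions.
def mime_type_for(location)
  MIME_BY_EXTENSION[File.extname(location).downcase]
end

# Hypothetical helper: the part of the MIME type after the slash.
def subtype_of(mime_type)
  mime_type.split('/')[1]
end

puts subtype_of(mime_type_for('notes/Page.HTML'))  # prints "html"
```

In this sketch an unknown extension gives nil from mime_type_for, so subtype_of would raise a NoMethodError for such a location.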