Class: ActiveFedora::NokogiriDatastream
- Inherits:
  Datastream
  - Object
  - Rubydora::Datastream
  - Datastream
  - ActiveFedora::NokogiriDatastream
- Includes:
- OM::XML::Document, Solrizer::XML::TerminologyBasedSolrizer
- Defined in:
- lib/active_fedora/nokogiri_datastream.rb
Direct Known Subclasses
Instance Attribute Summary
- #internal_solr_doc ⇒ Object
  Returns the value of attribute internal_solr_doc.
- #xml_loaded ⇒ Object
  Returns the value of attribute xml_loaded.
Attributes inherited from Datastream
#digital_object, #fields, #last_modified
Class Method Summary
- .default_attributes ⇒ Object
- .from_xml(xml, tmpl = nil) ⇒ Object
  Create an instance of this class based on xml content. Careful! If you call this from a constructor, be sure to provide something (i.e. self) as the tmpl argument.
- .xml_template ⇒ Object
Instance Method Summary
- #changed? ⇒ Boolean
- #content ⇒ Object
- #content=(content) ⇒ Object
- #content_changed? ⇒ Boolean
- #datastream_content ⇒ Object
- #find_by_terms(*termpointer) ⇒ Object
- #from_solr(solr_doc) ⇒ Object
  ** Experimental **.
- #generate_solr_symbol(base, data_type) ⇒ Object
- #get_values(field_key, default = []) ⇒ Object
- #get_values_from_solr(*term_pointer) ⇒ Array
  ** Experimental ** This method is called by get_values if this datastream has been initialized by calling the from_solr method via ActiveFedora::Base.load_instance_from_solr.
- #has_content? ⇒ Boolean
- #has_solr_name?(name, solr_doc = Hash.new) ⇒ Boolean
  ** Experimental **.
- #is_hierarchical_term_pointer?(*term_pointer) ⇒ Boolean
  ** Experimental ** Example: [:image, {:title_set=>1}, :title] returns true; [:image, :title_set, :title] returns false.
- #metadata? ⇒ Boolean
  Indicates that this datastream has metadata content.
- #ng_xml ⇒ Object
- #ng_xml=(new_xml) ⇒ Object
- #ng_xml_changed? ⇒ Boolean
  Implemented by hand, rather than generated by define_attribute_methods, so that content is not eagerly loaded by the proxy.
- #ng_xml_doesnt_change! ⇒ Object
- #ng_xml_will_change! ⇒ Object
  Implemented by hand, rather than generated by define_attribute_methods, so that content is not eagerly loaded by the proxy.
- #om_term_values ⇒ Object
- #om_update_values ⇒ Object
- #save ⇒ Object
- #term_values(*term_pointer) ⇒ Object
  Overrides OM::XML::term_values so that values can be lazily loaded from solr if this datastream was initialized using from_solr.
- #to_xml(xml = nil) ⇒ Object
- #update_indexed_attributes(params = {}, opts = {}) ⇒ Object
  Update field values within the current datastream using #update_values, which is a wrapper for OM::TermValueOperators#update_values. Ignores any fields from params that this datastream’s Terminology doesn’t recognize.
- #update_values(params = {}) ⇒ Object
  Update values in the datastream’s xml. This wraps OM::TermValueOperators#update_values so that it raises an error if the datastream was loaded from solr, since datastreams loaded that way should be read-only.
Methods inherited from Datastream
#create, #dirty, #dirty=, #dirty?, #initialize, #inspect, #new_object?, #profile_from_hash, #serialize!, #solrize_profile, #to_param, #to_solr, #validate_content_present
Constructor Details
This class inherits a constructor from ActiveFedora::Datastream
Instance Attribute Details
#internal_solr_doc ⇒ Object
Returns the value of attribute internal_solr_doc.
# File 'lib/active_fedora/nokogiri_datastream.rb', line 20
def internal_solr_doc
  @internal_solr_doc
end
#xml_loaded ⇒ Object
Returns the value of attribute xml_loaded.
# File 'lib/active_fedora/nokogiri_datastream.rb', line 20
def xml_loaded
  @xml_loaded
end
Class Method Details
.default_attributes ⇒ Object
# File 'lib/active_fedora/nokogiri_datastream.rb', line 22
def self.default_attributes
  super.merge(:controlGroup => 'X', :mimeType => 'text/xml')
end
.from_xml(xml, tmpl = nil) ⇒ Object
Create an instance of this class based on xml content. Careful! If you call this from a constructor, be sure to provide something (i.e. self) as the tmpl argument. Otherwise, you will get an infinite loop!
# File 'lib/active_fedora/nokogiri_datastream.rb', line 30
def self.from_xml(xml, tmpl=nil)
  tmpl = self.new if tmpl.nil? ## This path is used only for unit testing (e.g. MarpaDCDatastream.from_xml(fixture("data.xml")) )

  if !xml.present?
    tmpl.ng_xml = self.xml_template
  elsif xml.kind_of?(Nokogiri::XML::Node) || xml.kind_of?(Nokogiri::XML::Document)
    tmpl.ng_xml = xml
  else
    tmpl.ng_xml = Nokogiri::XML::Document.parse(xml)
  end

  tmpl.ng_xml_doesnt_change!
  return tmpl
end
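The type check in from_xml tests one argument against more than one class. As a general Ruby note (not specific to this library), a kind_of? call written without parentheses takes the whole || expression as its argument, so parenthesising each call keeps the intent explicit. A minimal sketch using only core classes, with value chosen purely for illustration:

```ruby
value = 42

# Without parentheses, Ruby parses this as
#   value.kind_of?(String || value.kind_of?(Integer))
# String is truthy, so the right-hand check is never evaluated.
loose = (value.kind_of? String || value.kind_of?(Integer))

# With parentheses, each class is tested on its own.
strict = value.kind_of?(String) || value.kind_of?(Integer)

puts "loose=#{loose} strict=#{strict}"
```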
.xml_template ⇒ Object
# File 'lib/active_fedora/nokogiri_datastream.rb', line 46
def self.xml_template
  Nokogiri::XML::Document.parse("<xml/>")
end
Instance Method Details
#changed? ⇒ Boolean
# File 'lib/active_fedora/nokogiri_datastream.rb', line 114
def changed?
  ng_xml_changed? || super
end
#content ⇒ Object
# File 'lib/active_fedora/nokogiri_datastream.rb', line 141
def content
  return to_xml if xml_loaded or new?
  datastream_content
end
#content=(content) ⇒ Object
# File 'lib/active_fedora/nokogiri_datastream.rb', line 132
def content=(content)
  content_will_change! unless (xml_loaded && (content == datastream_content))
  @content = content
  self.xml_loaded=true
  self.ng_xml = Nokogiri::XML::Document.parse(datastream_content)
end
#content_changed? ⇒ Boolean
# File 'lib/active_fedora/nokogiri_datastream.rb', line 118
def content_changed?
  ng_xml_changed? || super
end
#datastream_content ⇒ Object
# File 'lib/active_fedora/nokogiri_datastream.rb', line 139
alias :datastream_content :content
#find_by_terms(*termpointer) ⇒ Object
# File 'lib/active_fedora/nokogiri_datastream.rb', line 401
def find_by_terms(*termpointer)
  super
end
#from_solr(solr_doc) ⇒ Object
** Experimental **
This method is called by ActiveFedora::Base.load_instance_from_solr in order to initialize a Nokogiri datastream’s values from a solr document. It merely sets internal_solr_doc to the document passed in. Any subsequent calls to get_values then read from the solr document on demand instead of directly from the xml stored in Fedora. Use this for read-only purposes only, in cases where you want to improve performance by getting data from solr instead of Fedora.
See ActiveFedora::Base.load_instance_from_solr and get_values_from_solr for more information.
# File 'lib/active_fedora/nokogiri_datastream.rb', line 181
def from_solr(solr_doc)
  #just initialize internal_solr_doc since any value retrieval will be done via lazy loading on this doc on-demand
  @internal_solr_doc = solr_doc
end
#generate_solr_symbol(base, data_type) ⇒ Object
# File 'lib/active_fedora/nokogiri_datastream.rb', line 319
def generate_solr_symbol(base, data_type)
  Solrizer::XML::TerminologyBasedSolrizer.default_field_mapper.solr_name(base.to_sym, data_type)
end
#get_values(field_key, default = []) ⇒ Object
# File 'lib/active_fedora/nokogiri_datastream.rb', line 396
def get_values(field_key,default=[])
  term_values(*field_key)
end
#get_values_from_solr(*term_pointer) ⇒ Array
** Experimental ** This method is called by get_values if this datastream has been initialized by calling the from_solr method via ActiveFedora::Base.load_instance_from_solr. It retrieves values from a preinitialized @internal_solr_doc instead of from xml. This makes the datastream read-only, and this method is not intended to be used in any other case.
Values are retrieved from the @internal_solr_doc on-demand instead of via xml preloaded into memory.
A term_pointer is passed in and if it contains hierarchical indexes it will detect which solr field values need to be returned.
Example 1 (non-hierarchical term_pointer):
term_pointer = [:image, :title_set, :title]
Returns value of "image_title_set_title_t" in @internal_solr_doc
Example 2 (hierarchical term_pointer that contains one or more indexes):
term_pointer = [:image, {:title_set=>1}, :title]
relevant xml:
<image>
  <title_set>
    <title>Title 1</title>
  </title_set>
</image>
<image>
  <title_set>
    <title>Title 2</title>
  </title_set>
  <title_set>
    <title>Title 3</title>
  </title_set>
</image>
Repeating element nodes are indexed and will be stored in solr as follows:
image_0_title_set_0_title_t = "Title 1"
image_1_title_set_0_title_t = "Title 2"
image_1_title_set_1_title_t = "Title 3"
Even though no image element index is specified, only the second image element has two title_set elements so the expected return value is
["Title 3"]
While loading from solr the xml hierarchy is not immediately apparent, so we must first detect how many image elements with a title_set element exist and then check which of those elements have a second title_set element.
As this nokogiri datastream is indexed in solr, a value at each level in the tree will be stored independently and therefore
if 'image_0_title_set_0_title_t' exists in solr 'image_0_title_set_t' will also exist in solr.
So, we will build up the relevant solr names incrementally for a given term_pointer. The last element in the
solr_name will not contain an index.
It then will do the following:
Because no index is supplied for :image, it will detect which indexes exist in solr:
image_0_title_set_t (found key, so add 'image_0_title_set' to base solr_name list)
image_1_title_set_t (found key, so add 'image_1_title_set' to base solr_name list)
image_2_title_set_t (not found, so stop checking indexes for image)
After iteration 1:
bases = ["image_0_title_set","image_1_title_set"]
Two image nodes were found. Next, an index of 1 is supplied for title_set, so only that index is tried, building on the bases found in the previous iteration:
image_0_title_set_1_title_t (not found, so remove 'image_0_title_set' from base solr_name list)
image_1_title_set_1_title_t (found, so replace 'image_1_title_set' with new base 'image_1_title_set_1_title')
After iteration 2:
bases = ["image_1_title_set_1_title"]
It always looks ahead one element so we check if any elements are after title. There are not any other elements so we are done iterating.
returns @internal_solr_doc["image_1_title_set_1_title_t"]
# File 'lib/active_fedora/nokogiri_datastream.rb', line 253
def get_values_from_solr(*term_pointer)
  values = []
  solr_doc = @internal_solr_doc
  return values if solr_doc.nil?
  term = self.class.terminology.retrieve_term(*OM.pointers_to_flat_array(term_pointer, false))
  #check if hierarchical term pointer
  if is_hierarchical_term_pointer?(*term_pointer)
    # if we are hierarchical need to detect all possible node values that exist
    # we do this by building up the possible solr names parent by parent and/or child by child
    # if an index is supplied for any node in the pointer it will be used
    # otherwise it will include all nodes and indexes that exist in solr
    bases = []
    #add first item in term_pointer as start of bases
    # then iterate through possible nodes that might exist
    term_pointer.first.kind_of?(Hash) ? bases << term_pointer.first.keys.first : bases << term_pointer.first
    for i in 1..(term_pointer.length-1)
      #iterate in reverse so that we can modify the bases array while iterating
      (bases.length-1).downto(0) do |j|
        current_last = (term_pointer[i].kind_of?(Hash) ? term_pointer[i].keys.first : term_pointer[i])
        if (term_pointer[i-1].kind_of?(Hash))
          #just use index supplied instead of trying possibilities
          index = term_pointer[i-1].values.first
          solr_name_base = OM::XML::Terminology.term_hierarchical_name({bases[j]=>index},current_last)
          solr_name = generate_solr_symbol(solr_name_base, term.type)
          bases.delete_at(j)
          #insert the new solr name base if found
          bases.insert(j,solr_name_base) if has_solr_name?(solr_name,solr_doc)
        else
          #detect how many nodes exist
          index = 0
          current_base = bases[j]
          bases.delete_at(j)
          solr_name_base = OM::XML::Terminology.term_hierarchical_name({current_base=>index},current_last)
          solr_name = generate_solr_symbol(solr_name_base, term.type)
          #check for indexes that exist until we find all nodes
          while has_solr_name?(solr_name,solr_doc) do
            #only reinsert if it exists
            bases.insert(j,solr_name_base)
            index = index + 1
            solr_name_base = OM::XML::Terminology.term_hierarchical_name({current_base=>index},current_last)
            solr_name = generate_solr_symbol(solr_name_base, term.type)
          end
        end
      end
    end

    #all existing applicable solr_names have been found and we can now grab all values and build up our value array
    bases.each do |base|
      field_name = generate_solr_symbol(base.to_sym, term.type)
      value = (solr_doc[field_name].nil? ? solr_doc[field_name.to_s]: solr_doc[field_name])
      unless value.nil?
        value.is_a?(Array) ? values.concat(value) : values << value
      end
    end
  else
    #this is not hierarchical and we can simply look for the solr name created using the terms without any indexes
    generic_field_name_base = OM::XML::Terminology.term_generic_name(*term_pointer)
    generic_field_name = generate_solr_symbol(generic_field_name_base, term.type)
    value = (solr_doc[generic_field_name].nil? ? solr_doc[generic_field_name.to_s]: solr_doc[generic_field_name])
    unless value.nil?
      value.is_a?(Array) ? values.concat(value) : values << value
    end
  end
  values
end
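The incremental name building described for get_values_from_solr can be tried outside the library. The following is a simplified sketch, not the library's implementation: hierarchical_values is a hypothetical helper, a plain Hash stands in for @internal_solr_doc, the "_t" suffix is hard-coded rather than produced by a Solrizer field mapper, and the values stored under the intermediate title_set keys are made up for illustration.

```ruby
# Hypothetical helper: resolves a term pointer against a Hash of solr-style
# field names by building up candidate base names one level at a time.
def hierarchical_values(term_pointer, solr_doc, suffix = "_t")
  name_of  = ->(p) { p.is_a?(Hash) ? p.keys.first : p }
  index_of = ->(p) { p.is_a?(Hash) ? p.values.first : nil }

  bases = [name_of.call(term_pointer.first).to_s]

  # Walk parent/child pairs, always looking one element ahead.
  term_pointer.each_cons(2) do |parent, child|
    child_name = name_of.call(child)
    bases = bases.flat_map do |base|
      if (index = index_of.call(parent))
        # An index was supplied for the parent: try only that index.
        candidate = "#{base}_#{index}_#{child_name}"
        solr_doc.key?(candidate + suffix) ? [candidate] : []
      else
        # No index supplied: probe 0, 1, 2, ... until a key is missing.
        found = []
        i = 0
        while solr_doc.key?("#{base}_#{i}_#{child_name}#{suffix}")
          found << "#{base}_#{i}_#{child_name}"
          i += 1
        end
        found
      end
    end
  end

  # Collect the values stored under every surviving base name.
  bases.flat_map { |base| Array(solr_doc[base + suffix]) }
end

# Illustrative document mirroring the field names in the walkthrough.
solr_doc = {
  "image_0_title_set_t"         => ["Title 1"],
  "image_1_title_set_t"         => ["Title 2", "Title 3"],
  "image_0_title_set_0_title_t" => ["Title 1"],
  "image_1_title_set_0_title_t" => ["Title 2"],
  "image_1_title_set_1_title_t" => ["Title 3"]
}

result = hierarchical_values([:image, { :title_set => 1 }, :title], solr_doc)
puts result.inspect
```

Unlike the library method, this sketch keeps the bases in discovery order and does not fall back to a generic field name for pointers without indexes.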
#has_content? ⇒ Boolean
# File 'lib/active_fedora/nokogiri_datastream.rb', line 122
def has_content?
  xml_loaded || super
end
#has_solr_name?(name, solr_doc = Hash.new) ⇒ Boolean
** Experimental **
# File 'lib/active_fedora/nokogiri_datastream.rb', line 327
def has_solr_name?(name, solr_doc=Hash.new)
  !solr_doc[name].nil? || !solr_doc[name.to_s].nil?
end
#is_hierarchical_term_pointer?(*term_pointer) ⇒ Boolean
** Experimental **
Example:
[:image, {:title_set=>1}, :title] returns true
[:image, :title_set, :title] returns false
# File 'lib/active_fedora/nokogiri_datastream.rb', line 336
def is_hierarchical_term_pointer?(*term_pointer)
  if term_pointer.length>1
    term_pointer.each do |pointer|
      if pointer.kind_of?(Hash)
        return true
      end
    end
  end
  return false
end
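The same check can be expressed as a one-liner and run without the library. This is a sketch with a hypothetical name (hierarchical_term_pointer?), intended only to show which pointers count as hierarchical; note that a single-element pointer is never hierarchical, even if it is a Hash.

```ruby
# Hypothetical standalone equivalent of the check above:
# a pointer is hierarchical when it has more than one element
# and at least one element carries an index (is a Hash).
def hierarchical_term_pointer?(*term_pointer)
  term_pointer.length > 1 && term_pointer.any? { |pointer| pointer.is_a?(Hash) }
end

puts hierarchical_term_pointer?(:image, { :title_set => 1 }, :title)
puts hierarchical_term_pointer?(:image, :title_set, :title)
puts hierarchical_term_pointer?({ :image => 0 })
```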
#metadata? ⇒ Boolean
Indicates that this datastream has metadata content.
# File 'lib/active_fedora/nokogiri_datastream.rb', line 128
def metadata?
  true
end
#ng_xml ⇒ Object
# File 'lib/active_fedora/nokogiri_datastream.rb', line 56
def ng_xml
  @ng_xml ||= begin
    self.xml_loaded = true
    if new?
      ## Load up the template
      self.class.xml_template
    else
      Nokogiri::XML::Document.parse(datastream_content)
    end
  end
end
#ng_xml=(new_xml) ⇒ Object
# File 'lib/active_fedora/nokogiri_datastream.rb', line 69
def ng_xml=(new_xml)
  nokogiri_document = case new_xml
  when Nokogiri::XML::Document
    new_xml
  when Nokogiri::XML::Node
    Nokogiri::XML(new_xml.to_s) ## Cast a fragment to a document
  when String
    Nokogiri::XML::Document.parse(new_xml)
  else
    raise TypeError, "You passed a #{new_xml.class} into the ng_xml of the #{self.dsid} datastream. NokogiriDatastream.ng_xml= only accepts Nokogiri::XML::Document, Nokogiri::XML::Element, Nokogiri::XML::Node, or raw XML (String) as inputs."
  end

  new_xml_string = nokogiri_document.to_xml {|config| config.no_declaration}

  ng_xml_will_change! unless (xml_loaded && (new_xml_string.to_s.strip == (datastream_content || '').strip))

  self.xml_loaded=true

  @ng_xml = nokogiri_document
end
#ng_xml_changed? ⇒ Boolean
Implemented by hand, rather than generated by define_attribute_methods, so that content is not eagerly loaded by the proxy.
# File 'lib/active_fedora/nokogiri_datastream.rb', line 102
def ng_xml_changed?
  return true if changed_attributes.has_key?('ng_xml')
  return false unless xml_loaded
  if new?
    !to_xml.empty?
  else
    (to_xml.strip != (datastream_content || '').strip)
  end
end
#ng_xml_doesnt_change! ⇒ Object
# File 'lib/active_fedora/nokogiri_datastream.rb', line 97
def ng_xml_doesnt_change!
  changed_attributes.delete('ng_xml')
end
#ng_xml_will_change! ⇒ Object
Implemented by hand, rather than generated by define_attribute_methods, so that content is not eagerly loaded by the proxy.
# File 'lib/active_fedora/nokogiri_datastream.rb', line 93
def ng_xml_will_change!
  changed_attributes['ng_xml'] = nil
end
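The ng_xml_* methods together form a small hand-rolled dirty-tracking scheme: a change flag that can be set or cleared explicitly, plus a comparison against stored content that only runs once the xml has actually been loaded. A minimal sketch of that idea follows. LazyXmlHolder and all of its methods are hypothetical, it uses plain strings instead of Nokogiri documents, and it is not the library's code.

```ruby
# Hypothetical stand-in class illustrating lazy loading with manual dirty tracking.
class LazyXmlHolder
  def initialize(stored_content)
    @stored_content = stored_content  # what the repository currently holds
    @changed_attributes = {}          # plays the role of changed_attributes
    @xml = nil                        # nothing is loaded until asked for
  end

  def xml_loaded?
    !@xml.nil?
  end

  # Loads a working copy on first access.
  def xml
    @xml ||= @stored_content.dup
  end

  # Flags a change unless the loaded content already matches the new value.
  def xml=(new_xml)
    xml_will_change! unless xml_loaded? && new_xml.strip == @stored_content.strip
    @xml = new_xml
  end

  def xml_will_change!
    @changed_attributes['xml'] = nil
  end

  def xml_doesnt_change!
    @changed_attributes.delete('xml')
  end

  # Never triggers a load: if nothing was loaded, nothing can have changed.
  def xml_changed?
    return true if @changed_attributes.key?('xml')
    return false unless xml_loaded?
    @xml.strip != @stored_content.strip
  end
end

holder = LazyXmlHolder.new("<a/>")
puts holder.xml_changed?   # asking does not load the content
holder.xml << "<b/>"       # in-place edit is caught by the comparison
puts holder.xml_changed?
```

The comparison step is what lets in-place edits to the loaded document be detected even when nobody called the will-change method.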
#om_term_values ⇒ Object
# File 'lib/active_fedora/nokogiri_datastream.rb', line 17
alias_method(:om_term_values, :term_values)
#om_update_values ⇒ Object
# File 'lib/active_fedora/nokogiri_datastream.rb', line 18
alias_method(:om_update_values, :update_values)
#save ⇒ Object
# File 'lib/active_fedora/nokogiri_datastream.rb', line 50
def save
  @content = to_xml if ng_xml_changed? and content_will_change!
  super
end
#term_values(*term_pointer) ⇒ Object
Overrides OM::XML::term_values so that values can be lazily loaded from solr if this datastream was initialized using from_solr.
# File 'lib/active_fedora/nokogiri_datastream.rb', line 422
def term_values(*term_pointer)
  if @internal_solr_doc
    #lazy load values from solr on demand
    get_values_from_solr(*term_pointer)
  else
    om_term_values(*term_pointer)
  end
end
#to_xml(xml = nil) ⇒ Object
# File 'lib/active_fedora/nokogiri_datastream.rb', line 147
def to_xml(xml = nil)
  xml = self.ng_xml if xml.nil?
  ng_xml = self.ng_xml
  if ng_xml.respond_to?(:root) && ng_xml.root.nil? && self.class.respond_to?(:root_property_ref) && !self.class.root_property_ref.nil?
    ng_xml = self.class.generate(self.class.root_property_ref, "")
    if xml.root.nil?
      xml = ng_xml
    end
  end

  unless xml == ng_xml || ng_xml.root.nil?
    if xml.kind_of?(Nokogiri::XML::Document)
      xml.root.add_child(ng_xml.root)
    elsif xml.kind_of?(Nokogiri::XML::Node)
      xml.add_child(ng_xml.root)
    else
      raise "You can only pass instances of Nokogiri::XML::Node into this method. You passed in #{xml}"
    end
  end

  return xml.to_xml {|config| config.no_declaration}
end
#update_indexed_attributes(params = {}, opts = {}) ⇒ Object
Update field values within the current datastream using #update_values, which is a wrapper for OM::TermValueOperators#update_values. Ignores any fields from params that this datastream’s Terminology doesn’t recognize.
Example:
@mods_ds.update_indexed_attributes( {[{":person"=>"0"}, "role"]=>{"0"=>"role1", "1"=>"role2", "2"=>"role3"} })
=> {"person_0_role"=>{"0"=>"role1", "1"=>"role2", "2"=>"role3"}}
@mods_ds.to_xml # (the following is an approximation)
<mods>
  <mods:name type="person">
    <mods:role>
      <mods:roleTerm>role1</mods:roleTerm>
    </mods:role>
    <mods:role>
      <mods:roleTerm>role2</mods:roleTerm>
    </mods:role>
    <mods:role>
      <mods:roleTerm>role3</mods:roleTerm>
    </mods:role>
  </mods:name>
</mods>
# File 'lib/active_fedora/nokogiri_datastream.rb', line 372
def update_indexed_attributes(params={}, opts={})
  if self.class.terminology.nil?
    raise "No terminology is set for this NokogiriDatastream class. Cannot perform update_indexed_attributes"
  end
  # remove any fields from params that this datastream doesn't recognize
  # make sure to make a copy of params so not to modify hash that might be passed to other methods
  current_params = params.clone
  current_params.delete_if do |term_pointer,new_values|
    if term_pointer.kind_of?(String)
      logger.warn "WARNING: #{dsid} ignoring {#{term_pointer.inspect} => #{new_values.inspect}} because #{term_pointer.inspect} is a String (only valid OM Term Pointers will be used). Make sure your html has the correct field_selector tags in it."
      true
    else
      !self.class.terminology.has_term?(*OM.destringify(term_pointer))
    end
  end

  result = {}
  unless current_params.empty?
    result = update_values( current_params )
  end

  return result
end
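The filtering step in update_indexed_attributes (copy the params, drop String keys, drop pointers the terminology does not know) can be illustrated in isolation. In this sketch KNOWN_TERMS, term_known? and filter_params are hypothetical stand-ins for the datastream's terminology and its has_term? lookup; the real method also logs a warning for String keys and destringifies pointers first.

```ruby
# Hypothetical stand-in for the terms a datastream's terminology recognizes.
KNOWN_TERMS = [[:person, :role], [:title]].freeze

# Strips indexes such as {:person => 0} down to the bare term name
# before looking the pointer up.
def term_known?(term_pointer)
  flat = term_pointer.map { |p| p.is_a?(Hash) ? p.keys.first : p }
  KNOWN_TERMS.include?(flat)
end

def filter_params(params)
  current = params.clone  # work on a copy so the caller's hash is untouched
  current.delete_if do |term_pointer, _new_values|
    term_pointer.is_a?(String) || !term_known?(term_pointer)
  end
  current
end

params = {
  [{ :person => 0 }, :role] => { "0" => "role1" },
  [:unknown_term]           => { "0" => "x" },
  "person_0_role"           => { "0" => "y" }
}

kept = filter_params(params)
puts kept.keys.inspect
```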
#update_values(params = {}) ⇒ Object
Update values in the datastream’s xml. This wraps OM::TermValueOperators#update_values so that it raises an error if the datastream was loaded from solr, since datastreams loaded that way should be read-only.
# File 'lib/active_fedora/nokogiri_datastream.rb', line 411
def update_values(params={})
  if @internal_solr_doc
    raise "No update performed, this object was initialized via Solr instead of Fedora and is therefore read-only. Please utilize ActiveFedora::Base.find to first load object via Fedora instead."
  else
    ng_xml_will_change!
    result = om_update_values(params)
    return result
  end
end
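Taken together, term_values and update_values implement a simple pattern: once an index document has been attached, reads are served from it and writes are refused. A minimal sketch of that pattern, assuming a hypothetical IndexBackedRecord class that uses plain hashes in place of both the xml and the solr document:

```ruby
# Hypothetical class: reads come from the index document when one is attached,
# and any attempt to update in that state raises.
class IndexBackedRecord
  def initialize(xml_values)
    @xml_values = xml_values  # stands in for values held in the xml
    @index_doc = nil          # stands in for @internal_solr_doc
  end

  # Mirrors from_solr: just remember the document for later lookups.
  def from_index(doc)
    @index_doc = doc
  end

  def term_values(key)
    source = @index_doc || @xml_values
    Array(source[key])
  end

  def update_values(params)
    raise "read-only: this record was initialized from the index" if @index_doc
    @xml_values.merge!(params)
  end
end

record = IndexBackedRecord.new("title" => ["From XML"])
record.update_values("title" => ["Edited"])
record.from_index("title" => ["From index"])
puts record.term_values("title").inspect
```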