Class: ContentData::ContentData
- Inherits: Object
- Defined in: lib/content_data/content_data.rb
Overview
A Content Data (CD) object holds file information, organized as contents and instances. The file info retrieved from the hardware is: checksum, size, modification time, server, device and path. These attributes are divided into content and instance attributes:
- checksum (unique) and size are content attributes
- modification time, server, device and path are instance attributes
The relationship between content and instances is 1:many, meaning that a content can have instances on many servers. A content also has a time attribute, whose value is the time of its first instance. This can be changed with the unify_time method, which sets all time attributes of a content and its instances to the minimum of those times. Different files (instances) with the same content (checksum) are grouped together under that content. Interface methods include:
- iterating over contents and instances info
- unifying time, adding/removing instances, queries, merge, directory removal and more.
Content info data structure:
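@contents_info = { checksum -> [size, *instances*, content_modification_time] }
*instances* = { [server, path] -> instance_modification_time }
@instances_info = { [server, path] -> checksum }  (reverse index, used to speed up instance queries)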
Notes:
1. content_modification_time is the instance_modification_time of the first instance that was added to @contents_info for that checksum.
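For illustration, the two structures can be written out as plain Ruby hash literals. The checksum, servers, paths and times below are made-up values, and only the layout follows the description above. The content time is the time of the first instance added, even though the second instance is older. unify_time is what would reconcile the two.

```ruby
# Made-up sample values; only the layout mirrors ContentData's internal hashes.
# @contents_info: checksum => [size, {[server, path] => instance_mtime}, content_mtime]
contents_info = {
  'a1b2c3' => [1024,
               { ['server1', '/data/a.txt']   => 1_300_000_500,
                 ['server2', '/backup/a.txt'] => 1_300_000_000 },
               1_300_000_500] # mtime of the first instance that was added
}

# @instances_info: [server, path] => checksum (reverse index for instance lookups)
instances_info = {
  ['server1', '/data/a.txt']   => 'a1b2c3',
  ['server2', '/backup/a.txt'] => 'a1b2c3'
}

size, instances, content_mtime = contents_info['a1b2c3']
puts "size=#{size} instances=#{instances.size} content_mtime=#{content_mtime}"

# Looking up an instance's mtime by location goes through the reverse index first.
location = ['server2', '/backup/a.txt']
puts contents_info[instances_info[location]][1][location]
```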
Instance Method Summary
- #==(other) ⇒ Object
- #add_instance(checksum, size, server, path, modification_time) ⇒ Object
- #checksum_instances_size(checksum) ⇒ Object
- #clone_contents_info ⇒ Object
- #clone_instances_info ⇒ Object
- #content_each_instance(checksum, &block) ⇒ Object
  Iterator over the instances of a specific content. The block is provided with: checksum, size, content modification time, instance modification time, server and file path.
- #content_exists(checksum) ⇒ Object
- #contents_size ⇒ Object
- #each_content(&block) ⇒ Object
  Iterator over the @contents_info data structure (not including instances). The block is provided with: checksum, size and content modification time.
- #each_instance(&block) ⇒ Object
  Iterator over the @contents_info data structure (including instances). The block is provided with: checksum, size, content modification time, instance modification time, server and file path.
- #empty? ⇒ Boolean
- #from_file(filename) ⇒ Object
  TODO: validation that the file indeed contains ContentData is missing.
- #get_instance_mod_time(checksum, location) ⇒ Object
- #get_query(variable, params) ⇒ Object
  TODO: simplify conditions. This method is experimental and shouldn't be used. nil is used to define +/- infinity for the to/from method arguments, and from/to values are exclusive in the conditions' calculations. Care is needed with the '==' operation that is used for object comparison.
- #initialize(other = nil) ⇒ ContentData
  Constructor. A new instance of ContentData.
- #instance_exists(path, server) ⇒ Object
- #instances_size ⇒ Object
- #remove_content(checksum) ⇒ Object
- #remove_directory(server, dir_to_remove) ⇒ Object
  Removes all instance records that are located under the input param dir_to_remove.
- #remove_instance(server, path) ⇒ Object
  Removes an instance record from both @contents_info and @instances_info.
- #stats_by_location(location) ⇒ Object
- #to_file(filename) ⇒ Object
- #to_s ⇒ Object
- #unify_time ⇒ Object
  For each content, all time fields (content and instances) are replaced with the minimum time found among them.
- #unique_id ⇒ ID
  Content Data unique identification.
- #validate(params = nil) ⇒ Boolean
  Validates the index against the file system, checking that all instances hold correct data about the files they represent.
Constructor Details
#initialize(other = nil) ⇒ ContentData
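Returns a new instance of ContentData. If other is given, its contents and instances info are deep-copied via clone_contents_info and clone_instances_info.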
```ruby
# File 'lib/content_data/content_data.rb', line 32

def initialize(other = nil)
  if other.nil?
    @contents_info = {}  # Checksum --> [size, paths-->time(instance), time(content)]
    @instances_info = {}  # location --> checksum to optimize instances query
  else
    @contents_info = other.clone_contents_info
    @instances_info = other.clone_instances_info  # location --> checksum to optimize instances query
  end
end
```
Instance Method Details
#==(other) ⇒ Object
```ruby
# File 'lib/content_data/content_data.rb', line 210

def ==(other)
  return false if other.nil?
  return false if @contents_info.length != other.contents_size
  other.each_instance { |checksum, size, content_mod_time, instance_mod_time, server, path|
    return false if instance_exists(path, server) != other.instance_exists(path, server)
    local_content_info = @contents_info[checksum]
    return false if local_content_info.nil?
    return false if local_content_info[0] != size
    return false if local_content_info[2] != content_mod_time
    #check instances
    local_instances = local_content_info[1]
    return false if other.checksum_instances_size(checksum) != local_instances.length
    location = [server, path]
    local_instance_mod_time = local_instances[location]
    return false if local_instance_mod_time.nil?
    return false if local_instance_mod_time != instance_mod_time
  }
  true
end
```
#add_instance(checksum, size, server, path, modification_time) ⇒ Object
```ruby
# File 'lib/content_data/content_data.rb', line 138

def add_instance(checksum, size, server, path, modification_time)
  location = [server, path]
  content_info = @contents_info[checksum]
  if content_info.nil?
    @contents_info[checksum] = [size,
                                {location => modification_time},
                                modification_time]
  else
    if size != content_info[0]
      Log.warning 'File size different from content size while same checksum'
      Log.warning("instance location:server:'#{location[0]}'  path:'#{location[1]}'")
      Log.warning("instance mod time:'#{modification_time}'")
    end
    #override file if needed
    content_info[0] = size
    instances = content_info[1]
    instances[location] = modification_time
  end
  @instances_info[location] = checksum
end
```
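The following is a minimal standalone sketch of the same bookkeeping on plain hashes. add_instance_to is a hypothetical helper, and Kernel#warn stands in for the library's Log.warning. The sketch shows that the first instance's time becomes the content time, while later instances only extend the instances hash and the reverse index.

```ruby
# Standalone sketch of add_instance's bookkeeping on plain hashes.
# add_instance_to is a hypothetical helper; Kernel#warn stands in for Log.warning.
def add_instance_to(contents_info, instances_info, checksum, size, server, path, mtime)
  location = [server, path]
  content_info = contents_info[checksum]
  if content_info.nil?
    # first instance of this checksum: its mtime becomes the content mtime
    contents_info[checksum] = [size, { location => mtime }, mtime]
  else
    warn 'File size different from content size while same checksum' if size != content_info[0]
    content_info[0] = size            # the most recently added size wins
    content_info[1][location] = mtime # the content mtime is left unchanged
  end
  instances_info[location] = checksum # keep the reverse index in sync
end

contents = {}
index = {}
add_instance_to(contents, index, 'abc', 100, 'srv1', '/a/x.txt', 200)
add_instance_to(contents, index, 'abc', 100, 'srv2', '/b/x.txt', 150)
p contents
p index
```

In the code as written, re-adding an existing location under a different checksum updates the reverse index but does not remove that location from the previous content's instances hash.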
#checksum_instances_size(checksum) ⇒ Object
```ruby
# File 'lib/content_data/content_data.rb', line 125

def checksum_instances_size(checksum)
  content_info = @contents_info[checksum]
  return 0 if content_info.nil?
  content_info[1].length
end
```
#clone_contents_info ⇒ Object
```ruby
# File 'lib/content_data/content_data.rb', line 55

def clone_contents_info
  @contents_info.keys.inject({}) { |clone_contents_info, checksum|
    instances = @contents_info[checksum]
    size = instances[0]
    content_time = instances[2]
    instances_db = instances[1]
    instances_db_cloned = {}
    instances_db.keys.each { |location|
      instance_mtime = instances_db[location]
      instances_db_cloned[[location[0].clone,location[1].clone]]=instance_mtime
    }
    clone_contents_info[checksum] = [size,
                                     instances_db_cloned,
                                     content_time]
    clone_contents_info
  }
end
```
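This sketch shows why the copy constructor clones the nested hashes and location keys instead of calling Hash#dup. deep_clone_contents is a hypothetical helper that copies a contents_info-shaped hash the same way clone_contents_info does.

```ruby
# Deep copy of a contents_info-shaped hash, mirroring what clone_contents_info does:
# the outer hash, every instances hash and every [server, path] key are new objects.
def deep_clone_contents(contents_info)
  contents_info.each_with_object({}) do |(checksum, (size, instances, content_time)), copy|
    cloned_instances = {}
    instances.each do |(server, path), mtime|
      cloned_instances[[server.clone, path.clone]] = mtime
    end
    copy[checksum] = [size, cloned_instances, content_time]
  end
end

original = { 'c1' => [10, { ['srv', '/tmp/f'] => 5 }, 5] }

# Hash#dup is shallow: the nested instances hash is shared with the original.
shallow = original.dup
shallow['c1'][1][['srv', '/tmp/f']] = 7
puts original['c1'][1][['srv', '/tmp/f']]   # the original changed too

deep = deep_clone_contents(original)
deep['c1'][1][['srv', '/tmp/f']] = 99
puts original['c1'][1][['srv', '/tmp/f']]   # unaffected by the change to the deep copy
```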
#clone_instances_info ⇒ Object
```ruby
# File 'lib/content_data/content_data.rb', line 48

def clone_instances_info
  @instances_info.keys.inject({}) { |clone_instances_info, location|
    clone_instances_info[[location[0].clone, location[1].clone]] = @instances_info[location].clone
    clone_instances_info
  }
end
```
#content_each_instance(checksum, &block) ⇒ Object
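Iterator over the instances of a specific content. The block is provided with: checksum, size, content modification time, instance modification time, server and file path. The checksum must exist in @contents_info. Otherwise content_info is nil and the call raises NoMethodError.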
```ruby
# File 'lib/content_data/content_data.rb', line 102

def content_each_instance(checksum, &block)
  content_info = @contents_info[checksum]
  content_info[1].keys.each {|location|
    # provide the block with: checksum, size, content modification time,instance modification time,
    # server and path.
    instance_modification_time = content_info[1][location]
    block.call(checksum,content_info[0], content_info[2], instance_modification_time,
               location[0], location[1])
  }
end
```
#content_exists(checksum) ⇒ Object
```ruby
# File 'lib/content_data/content_data.rb', line 163

def content_exists(checksum)
  @contents_info.has_key?(checksum)
end
```
#contents_size ⇒ Object
```ruby
# File 'lib/content_data/content_data.rb', line 113

def contents_size()
  @contents_info.length
end
```
#each_content(&block) ⇒ Object
Iterator over the @contents_info data structure (not including instances). The block is provided with: checksum, size and content modification time.
```ruby
# File 'lib/content_data/content_data.rb', line 75

def each_content(&block)
  @contents_info.keys.each { |checksum|
    content_val = @contents_info[checksum]
    # provide checksum, size and content modification time to the block
    block.call(checksum,content_val[0], content_val[2])
  }
end
```
#each_instance(&block) ⇒ Object
Iterator over the @contents_info data structure (including instances). The block is provided with: checksum, size, content modification time, instance modification time, server and file path.
```ruby
# File 'lib/content_data/content_data.rb', line 86

def each_instance(&block)
  @contents_info.keys.each { |checksum|
    content_info = @contents_info[checksum]
    content_info[1].keys.each {|location|
      # provide the block with: checksum, size, content modification time,instance modification time,
      # server and path.
      instance_modification_time = content_info[1][location]
      block.call(checksum,content_info[0], content_info[2], instance_modification_time,
                 location[0], location[1])
    }
  }
end
```
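The following sketch shows a typical use of the six block arguments: finding contents that have more than one instance, which means duplicate files. each_instance_of is a hypothetical standalone stand-in for each_instance that works on a plain contents_info-shaped hash.

```ruby
# Standalone equivalent of each_instance over a contents_info-shaped hash.
def each_instance_of(contents_info)
  contents_info.each do |checksum, (size, instances, content_mtime)|
    instances.each do |(server, path), instance_mtime|
      yield checksum, size, content_mtime, instance_mtime, server, path
    end
  end
end

contents_info = {
  'c1' => [10, { ['s1', '/a'] => 1, ['s2', '/b'] => 2 }, 1],
  'c2' => [20, { ['s1', '/c'] => 3 }, 3]
}

# Group "server:path" strings by checksum, then keep checksums seen more than once.
paths_by_checksum = Hash.new { |h, k| h[k] = [] }
each_instance_of(contents_info) do |checksum, _size, _cmtime, _imtime, server, path|
  paths_by_checksum[checksum] << "#{server}:#{path}"
end
duplicates = paths_by_checksum.select { |_, paths| paths.size > 1 }
p duplicates
```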
#empty? ⇒ Boolean
```ruby
# File 'lib/content_data/content_data.rb', line 159

def empty?
  @contents_info.empty?
end
```
#from_file(filename) ⇒ Object
TODO: validation that the file indeed contains ContentData is missing.
```ruby
# File 'lib/content_data/content_data.rb', line 273

def from_file(filename)
  lines = IO.readlines(filename)
  number_of_contents = lines[0].to_i
  i = 1 + number_of_contents
  number_of_instances = lines[i].to_i
  i += 1
  number_of_instances.times {
    if lines[i].nil?
      Log.warning "line ##{i} is nil !!!, Backing filename: #{filename} to #{filename}.bad"
      FileUtils.cp(filename, "#{filename}.bad")
      # lines[i] is nil here, so log the whole file content instead
      Log.warning("Lines:\n#{lines.join}")
    else
      parameters = lines[i].split(',')
      # bugfix: if the file name contains a comma, splitting on commas yields extra
      # fields; re-join everything between the server and the trailing mtime into the path
      if (parameters.size > 5)
        (4..parameters.size-2).each do |j|
          parameters[3] = [parameters[3], parameters[j]].join(",")
        end
        (4..parameters.size-2).each do
          parameters.delete_at(4)
        end
      end

      add_instance(parameters[0], parameters[1].to_i, parameters[2], parameters[3],
                   parameters[4].to_i)
    end
    i += 1
  }
end
```
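The comma bugfix in from_file re-joins every field between the server and the trailing modification time into the path. Below is the same idea in a compact standalone form. parse_instance_line is a hypothetical helper and not part of ContentData. Like the original, it assumes that only the path may contain commas.

```ruby
# Parse one instance line "checksum,size,server,path,mtime" whose path may contain
# commas: the first three fields and the last one are fixed, the rest is the path.
def parse_instance_line(line)
  fields = line.chomp.split(',', -1)   # -1 keeps empty fields (e.g. a path ending in ',')
  checksum, size, server = fields[0, 3]
  mtime = fields[-1]
  path = fields[3..-2].join(',')
  [checksum, size.to_i, server, path, mtime.to_i]
end

p parse_instance_line("abc,100,srv1,/data/a,b.txt,1300000000\n")
p parse_instance_line("abc,100,srv1,/plain.txt,5")
```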
#get_instance_mod_time(checksum, location) ⇒ Object
```ruby
# File 'lib/content_data/content_data.rb', line 131

def get_instance_mod_time(checksum, location)
  content_info = @contents_info[checksum]
  return nil if content_info.nil?
  instances = content_info[1]
  instance_time = instances[location]
end
```
#get_query(variable, params) ⇒ Object
TODO: simplify conditions. This method is experimental and shouldn't be used. nil is used to define +/- infinity for the to/from method arguments, and from/to values are exclusive in the conditions' calculations. Care is needed with the '==' operation that is used for object comparison. If needed, the user should define their own '==' implementation.
```ruby
# File 'lib/content_data/content_data.rb', line 461

def get_query(variable, params)
  raise RuntimeError.new 'This method is experimental and shouldn\'t be used'

  exact = params['exact'].nil? ? Array.new : params['exact']
  from = params['from']
  to = params['to']
  is_inside = params['is_inside']

  unless ContentInstance.new.instance_variable_defined?("@#{variable}")
    raise ArgumentError, "#{variable} isn't a ContentInstance variable"
  end

  if (exact.empty? && from.nil? && to.nil?)
    raise ArgumentError, 'At least one of the arguments {exact, from, to} must be defined'
  end

  if (!(from.nil? || to.nil?) && !from.kind_of?(to.class))
    raise ArgumentError, 'to and from arguments should be comparable one with another'
  end

  # FIXME add support for from/to for Strings
  if ((!from.nil? && !from.kind_of?(Numeric))\
        || (!to.nil? && !to.kind_of?(Numeric)))
    raise ArgumentError, 'from and to options supported only for numeric values'
  end

  if (!exact.empty? && (!from.nil? || !to.nil?))
    raise ArgumentError, 'exact and from/to options are mutually exclusive'
  end

  result_index = ContentData.new
  instances.each_value do |instance|
    is_match = false
    var_value = instance.instance_variable_get("@#{variable}")

    if exact.include? var_value
      is_match = true
    elsif (from.nil? || var_value > from) && (to.nil? || var_value < to)
      is_match = true
    end

    if (is_match && is_inside) || (!is_match && !is_inside)
      checksum = instance.checksum
      result_index.add_content(contents[checksum]) unless result_index.content_exists(checksum)
      result_index.add_instance instance
    end
  end
  result_index
end
```
#instance_exists(path, server) ⇒ Object
```ruby
# File 'lib/content_data/content_data.rb', line 167

def instance_exists(path, server)
  @instances_info.has_key?([server, path])
end
```
#instances_size ⇒ Object
```ruby
# File 'lib/content_data/content_data.rb', line 117

def instances_size()
  counter=0
  @contents_info.values.each { |content_info|
    counter += content_info[1].length
  }
  counter
end
```
#remove_content(checksum) ⇒ Object
```ruby
# File 'lib/content_data/content_data.rb', line 230

def remove_content(checksum)
  content_info = @contents_info[checksum]
  if content_info
    content_info[1].each_key { |location|
      @instances_info.delete(location)
    }
    @contents_info.delete(checksum)
  end
end
```
#remove_directory(server, dir_to_remove) ⇒ Object
Removes all instance records that are located under the input param dir_to_remove. Found records are removed from both @contents_info and @instances_info. Input params: server and dir_to_remove are used to check each instance's unique key (called location). Any content that becomes empty after its instances are removed is removed as well. The path test uses String#scan, so it matches dir_to_remove anywhere in the path, not only as a prefix.
```ruby
# File 'lib/content_data/content_data.rb', line 196

def remove_directory(server, dir_to_remove)
  @contents_info.keys.each { |checksum|
    instances = @contents_info[checksum][1]
    instances.each_key { |location|
      if location[0] == server and location[1].scan(dir_to_remove).size > 0
        instances.delete(location)
        @instances_info.delete(location)
      end
    }
    @contents_info.delete(checksum) if instances.empty?
  }
end
```
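The directory test in remove_directory is a substring search, so it also catches paths that merely contain dir_to_remove. The sample paths below are made up. The prefix-based test is shown only as a possible stricter alternative; it is not what the library does.

```ruby
dir_to_remove = '/data'
paths = ['/data/a.txt', '/backup/data/b.txt', '/database/c.txt', '/other/d.txt']

# The test used by remove_directory: dir_to_remove found anywhere in the path.
scan_matches = paths.select { |path| path.scan(dir_to_remove).size > 0 }

# A stricter prefix test (an alternative, not what remove_directory does).
prefix = dir_to_remove.end_with?('/') ? dir_to_remove : "#{dir_to_remove}/"
prefix_matches = paths.select { |path| path.start_with?(prefix) }

p scan_matches
p prefix_matches
```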
#remove_instance(server, path) ⇒ Object
Removes an instance record from both @contents_info and @instances_info. Input params: server and path are the instance's unique key (called location). Also removes the content if it becomes empty after the instance is removed. Returns nil if the location is unknown.
```ruby
# File 'lib/content_data/content_data.rb', line 181

def remove_instance(server, path)
  location = [server, path]
  checksum = @instances_info[location]
  content_info = @contents_info[checksum]
  return nil if content_info.nil?
  instances = content_info[1]
  instances.delete(location)
  @contents_info.delete(checksum) if instances.empty?
  @instances_info.delete(location)
end
```
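The following is a minimal standalone sketch of the same removal logic on plain hashes. remove_instance_from is a hypothetical helper. Because the reverse index maps a location straight to its checksum, no scan over all contents is needed, and removing the last instance drops the content as well.

```ruby
# Standalone sketch of remove_instance on plain hashes (hypothetical helper name).
def remove_instance_from(contents_info, instances_info, server, path)
  location = [server, path]
  checksum = instances_info[location]       # reverse index: location -> checksum
  content_info = contents_info[checksum]
  return nil if content_info.nil?           # unknown location
  instances = content_info[1]
  instances.delete(location)
  contents_info.delete(checksum) if instances.empty?  # drop content with no instances left
  instances_info.delete(location)
end

contents = { 'c1' => [10, { ['s1', '/a'] => 1, ['s2', '/b'] => 2 }, 1] }
index = { ['s1', '/a'] => 'c1', ['s2', '/b'] => 'c1' }
remove_instance_from(contents, index, 's1', '/a')
p contents   # c1 still has the s2 instance
remove_instance_from(contents, index, 's2', '/b')
p contents   # c1 is gone
```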
#stats_by_location(location) ⇒ Object
```ruby
# File 'lib/content_data/content_data.rb', line 171

def stats_by_location(location)
  checksum = @instances_info[location]
  content_info = @contents_info[checksum]
  return nil if content_info.nil?
  return [content_info[0], content_info[1][location]]
end
```
#to_file(filename) ⇒ Object
```ruby
# File 'lib/content_data/content_data.rb', line 257

def to_file(filename)
  content_data_dir = File.dirname(filename)
  FileUtils.makedirs(content_data_dir) unless File.directory?(content_data_dir)
  file = File.open(filename, 'w')
  file.write("#{@contents_info.length}\n")
  each_content { |checksum, size, content_mod_time|
    file.write("#{checksum},#{size},#{content_mod_time}\n")
  }
  file.write("#{@instances_info.length}\n")
  each_instance { |checksum, size, _, instance_mod_time, server, path|
    file.write("#{checksum},#{size},#{server},#{path},#{instance_mod_time}\n")
  }
  file.close
end
```
#to_s ⇒ Object
```ruby
# File 'lib/content_data/content_data.rb', line 240

def to_s
  return_str = ""
  contents_str = ""
  instances_str = ""
  each_content { |checksum, size, content_mod_time|
    contents_str << "%s,%d,%d\n" % [checksum, size, content_mod_time]
  }
  each_instance { |checksum, size, content_mod_time, instance_mod_time, server, path|
    instances_str <<  "%s,%d,%s,%s,%d\n" % [checksum, size, server, path, instance_mod_time]
  }
  return_str << "%d\n" % [@contents_info.length]
  return_str << contents_str
  return_str << "%d\n" % [@instances_info.length]
  return_str << instances_str
  return_str
end
```
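to_s and to_file both emit the same line-oriented layout: a contents count, one line per content, an instances count, then one line per instance. The standalone sketch below reproduces that layout for a plain contents_info-shaped hash. to_text is a hypothetical helper name, and unlike the methods above it derives the instances count from the contents rather than from @instances_info.

```ruby
# Standalone sketch of the text layout written by to_s / to_file:
#   <number of contents>
#   checksum,size,content_mtime                  (one line per content)
#   <number of instances>
#   checksum,size,server,path,instance_mtime     (one line per instance)
def to_text(contents_info)
  content_lines = []
  instance_lines = []
  contents_info.each do |checksum, (size, instances, content_mtime)|
    content_lines << "#{checksum},#{size},#{content_mtime}"
    instances.each do |(server, path), instance_mtime|
      instance_lines << "#{checksum},#{size},#{server},#{path},#{instance_mtime}"
    end
  end
  ([content_lines.size] + content_lines + [instance_lines.size] + instance_lines).join("\n") + "\n"
end

puts to_text('c1' => [10, { ['s1', '/a'] => 1, ['s2', '/b'] => 2 }, 1])
```

from_file skips the content lines and rebuilds everything from the instance lines through add_instance.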
#unify_time ⇒ Object
For each content, all time fields (content and instances) are replaced with the minimum time found among them.
```ruby
# File 'lib/content_data/content_data.rb', line 308

def unify_time()
  @contents_info.keys.each { |checksum|
    content_info = @contents_info[checksum]
    min_time_per_checksum = content_info[2]
    instances = content_info[1]
    instances.keys.each { |location|
      instance_mod_time = instances[location]
      if instance_mod_time < min_time_per_checksum
        min_time_per_checksum = instance_mod_time
      end
    }
    # update all instances with min time
    instances.keys.each { |location|
      instances[location] = min_time_per_checksum
    }
    # update content time with min time
    content_info[2] = min_time_per_checksum
  }
end
```
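The following is a compact standalone sketch of the same per-content minimum on a plain contents_info-shaped hash. unify_time! is a hypothetical helper. Assigning to keys that already exist while iterating a Ruby hash is allowed; only adding new keys during iteration is not.

```ruby
# Standalone sketch of unify_time on a contents_info-shaped hash: per content,
# the minimum of the content time and all instance times is written everywhere.
def unify_time!(contents_info)
  contents_info.each_value do |content_info|
    instances = content_info[1]
    min_time = ([content_info[2]] + instances.values).min
    instances.each_key { |location| instances[location] = min_time }
    content_info[2] = min_time
  end
end

contents = { 'c1' => [10, { ['s1', '/a'] => 500, ['s2', '/b'] => 300 }, 500] }
unify_time!(contents)
p contents
```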
#unique_id ⇒ ID
Content Data unique identification
```ruby
# File 'lib/content_data/content_data.rb', line 44

def unique_id
  @instances_info.hash
end
```
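unique_id is simply Hash#hash of @instances_info. Hashes with the same entries, in any insertion order, give the same value. Ruby seeds String hashing randomly per process, so this ID is only meaningful within one running process and should not be persisted. It also ignores modification times, because @instances_info maps locations to checksums only. The example below uses plain hashes with made-up entries.

```ruby
# Two location => checksum indexes with the same entries in a different order.
a = { ['srv', '/x'] => 'c1', ['srv', '/y'] => 'c2' }
b = { ['srv', '/y'] => 'c2', ['srv', '/x'] => 'c1' }
c = { ['srv', '/x'] => 'c1' }

puts a.hash == b.hash # => true: Hash#hash does not depend on insertion order
puts a.eql?(b)        # => true
puts a.hash == c.hash # different entries; a collision is possible but very unlikely
```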
#validate(params = nil) ⇒ Boolean
Validates the index against the file system, checking that all instances hold correct data about the files they represent.
There are two levels of validation, controlled by the instance_check_level system parameter:
- shallow: quick; tests each instance for file existence and attributes.
- deep: can take more time; in addition to the shallow checks, recalculates the hash sum.
```ruby
# File 'lib/content_data/content_data.rb', line 340

def validate(params = nil)
  # used to answer whether specific param was set
  param_exists = Proc.new do |param|
    !(params.nil? || params[param].nil?)
  end

  # used to process method parameters centrally
  process_params = Proc.new do |values|
    if param_exists.call(:failed)
      info = values[:details]
      unless info.nil?
        checksum = info[0]
        content_mtime = info[1]
        size = info[2]
        inst_mtime = info[3]
        server = info[4]
        file_path = info[5]
        params[:failed].add_instance(checksum, size, server, file_path, inst_mtime)
      end
    end
  end

  is_valid = true
  @contents_info.keys.each { |checksum|
    instances = @contents_info[checksum]
    content_size = instances[0]
    content_mtime = instances[2]
    instances[1].keys.each { |unique_path|
      instance_mtime = instances[1][unique_path]
      instance_info = [checksum, content_mtime, content_size, instance_mtime]
      instance_info.concat(unique_path)
      unless check_instance(instance_info)
        is_valid = false
        unless params.nil? || params.empty?
          process_params.call({:details => instance_info})
        end
      end
    }
  }
  is_valid
end
```
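check_instance itself is not shown on this page, so the following is only a sketch of what the two validation levels could look like for a single file. shallow_check and deep_check are hypothetical helpers. Two assumptions are made: stored times are whole Unix seconds, and checksums are SHA-1 hex digests. Substitute whatever algorithm actually produced the checksums.

```ruby
require 'digest/sha1'
require 'tempfile'

# Hypothetical shallow check: the file exists and its size and mtime (whole
# seconds) match the recorded values.
def shallow_check(path, expected_size, expected_mtime)
  return false unless File.file?(path)
  stat = File.stat(path)
  stat.size == expected_size && stat.mtime.to_i == expected_mtime
end

# Hypothetical deep check: shallow check plus recomputing the content digest.
# SHA-1 is an assumption here, not necessarily the library's checksum algorithm.
def deep_check(path, expected_size, expected_mtime, expected_checksum)
  shallow_check(path, expected_size, expected_mtime) &&
    Digest::SHA1.file(path).hexdigest == expected_checksum
end

results = Tempfile.create('content_data') do |f|
  f.write('hello')
  f.flush
  mtime = File.stat(f.path).mtime.to_i
  [shallow_check(f.path, 5, mtime),                                 # matching size
   shallow_check(f.path, 6, mtime),                                 # wrong size
   deep_check(f.path, 5, mtime, Digest::SHA1.hexdigest('hello'))]   # matching digest
end
p results
```

In validate itself, failing instances can be collected by passing another ContentData under the :failed key, for example content_data.validate(:failed => failed_index). Each failing instance is then added to failed_index through add_instance.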