Class: Orchard::Pairtree
- Inherits:
-
Object
- Object
- Orchard::Pairtree
- Defined in:
- lib/orchard/pairtree.rb
Overview
Provides a set of methods for working with Pairtree paths.
Constant Summary collapse
- MAX_SHORTY =
2
- ENCODE_REGEX =
/[\"*+,<=>?\\^|]|[^\x21-\x7e]/u
- DECODE_REGEX =
/\^(..)|(.)/u
- PPATH_REGEX =
/^(?:pairtree_root\/)?((?>[^:\/\.|.]{2}\/)*[^:\/\.|.]{1,2})(?:\/?$)/
- CHAR_ENCODE_CONV =
{'/'=>'=',':'=>'+','.'=>','}
- CHAR_DECODE_CONV =
{'='=>'/','+'=>':',','=>'.'}
Class Method Summary collapse
-
.decode(id) ⇒ Object
Decodes a given
id
(String)according to the pairtree 0.1 specifiaation. -
.encode(id) ⇒ Object
————————— Class Methods ————————— Encodes a given
id
(String) according to the “identifier string cleaning” in the pairtree 0.1 specification. -
.id_to_ppath(*args) ⇒ Object
Constructs the pairpath for a given
id
(String) andoptions
. -
.iterate(*args, &block) ⇒ Object
Iterates a given
pairpath
with ablock
. -
.ppath_to_id(*args) ⇒ Object
Reconstructs the id for a given
pairpath
andoptions
.
Instance Method Summary collapse
- #each ⇒ Object
-
#initialize(*args) ⇒ Pairtree
constructor
————————— Instance Methods —————————.
- #test(path) ⇒ Object
Constructor Details
#initialize(*args) ⇒ Pairtree
Instance Methods
15 16 17 18 19 |
# File 'lib/orchard/pairtree.rb', line 15 def initialize(*args) path = args[0] = args[1] || {} end |
Class Method Details
.decode(id) ⇒ Object
Decodes a given id
(String)according to the pairtree 0.1 specifiaation.
encode(id)
Examples
Pairtree.decode('ark+=13030=xt12t3')
# => ark:/13030/xt12t3
Pairtree.decode('http+==n2t,info=urn+nbn+se+kb+repos-1')
# => http://n2t.info/urn:nbn:se:kb:repos-1
Pairtree.decode('what-the-^2a@^3f#!^5e!^3f')
# => what-the-*@?#!^!?
126 127 128 129 130 131 132 |
# File 'lib/orchard/pairtree.rb', line 126 def self.decode(id) # first pass (reverse second from encode) first_pass_id = id.split(//).collect { |char| CHAR_DECODE_CONV[char] || char}.join # second pass (reverse first from encode) second_pass_id = first_pass_id.scan(DECODE_REGEX).map {|coded,chr| coded.nil? ? chr.ord : coded.hex}.pack('C*').force_encoding('utf-8') end |
.encode(id) ⇒ Object
Class Methods
Encodes a given id
(String) according to the “identifier string cleaning” in the pairtree 0.1 specification.
encode(id)
Examples
Pairtree.encode('ark:/13030/xt12t3')
# => ark+=13030=xt12t3
Pairtree.encode('http://n2t.info/urn:nbn:se:kb:repos-1')
# => http+==n2t,info=urn+nbn+se+kb+repos-1
Pairtree.encode('what-the-*@?#!^!?')
# => what-the-^2a@^3f#!^5e!^3f
Explanation (From Pairtree 0.1 Specification)
Identifier string cleaning
Prior to splitting into character pairs, identifier strings are cleaned in
two separate steps. One step would be simpler, but pairtree is designed so
that commonly used characters in reasonably opaque identifiers (e.g., not
containing natural language words, phrases, or hints) result in reasonably
short and familiar-looking paths. For completeness, the pairtree algorithm
specifies what to do with all possible UTF-8 characters, and relies for this
on a kind of URL hex-encoding. To avoid conflict with URLs, pairtree
hex-encoding is introduced with the '^' character instead of '%'.
First, the identifier string is cleaned of characters that are expected to
occur rarely in object identifiers but that would cause certain known
problems for file systems. In this step, every UTF-8 octet outside the range
of visible ASCII (94 characters with hexadecimal codes 21-7e) [ASCII], as
well as the following visible ASCII characters, must be converted to
their corresponding 3-character hexadecimal encoding, ^hh, where ^ is a
circumflex and hh is two hex digits. For example, ' ' (space) is converted
to ^20 and '*' to ^2a. In the second step, the following single-character to
single-character conversions must be done. These are characters that occur
quite commonly in opaque identifiers but present special problems for
filesystems. This step avoids requiring them to be hex encoded (hence
expanded to three characters), which keeps the typical ppath reasonably
short. Here are examples of identifier strings after cleaning and after
ppath mapping.
103 104 105 106 107 108 109 |
# File 'lib/orchard/pairtree.rb', line 103 def self.encode(id) #first pass first_pass_id = id.gsub(ENCODE_REGEX) { |m| m.bytes.map{|b| "^%02x"%b }.join} # second pass second_pass_id = first_pass_id.split(//).collect { |char| CHAR_ENCODE_CONV[char] || char}.join end |
.id_to_ppath(*args) ⇒ Object
Constructs the pairpath for a given id
(String) and options
.
id_to_ppath(id, = {})
Options
-
:prefix => Pairtree prefix
- This will remove the prefix from the id before creating a pairpath. (String)
Examples
Pairtree.id_to_ppath('abcde')
# => ab/cd/e
or with the prefix option
Pairtree.id_to_ppath('http://dom.org/abcde', :prefix => 'http://dom.org/')
# => ab/cd/e
Explanation (From Pairtree 0.1 Specification) The basic pairtree algorithm
The pairtree algorithm maps an arbitrary UTF-8 [RFC3629] encoded identifier
string into a filesystem directory path based on successive pairs of
characters, and also defines the reverse mapping (from pathname to
identifier).
In this document the word "directory" is used interchangeably with the word
"folder" and all examples conform to Unix-based filesystem conventions which
should tranlate easily to Windows conventions after substituting the path
separator ('\' instead of '/'). Pairtree places no limitations on file and
pathlengths, so implementors thinking about maximal interoperation may
wish to consider the issues listed in the Interoperability section of
this document.
The mapping from identifier string to path has two parts. First, the string
is cleaned by converting characters that would be illegal or especially
problemmaticin Unix or Windows filesystems. The cleaned string is then
split into pairs of characters, each of which becomes a directory name
in a filesystem path: successive pairs map to successive path components
until there are no characters left, with the last component being either
a 1- or 2-character directory name. The resulting path is known as
a pairpath, or ppath.
abcd -> ab/cd/
abcdefg -> ab/cd/ef/g/
12-986xy4 -> 12/-9/86/xy/4/
180 181 182 183 184 185 |
# File 'lib/orchard/pairtree.rb', line 180 def self.id_to_ppath(*args) id = args[0] = args[1] || {} id.sub!(/^#{[:prefix]}/,'') unless [:prefix].nil? self.string_to_dirpath(self.encode(id), MAX_SHORTY) end |
.iterate(*args, &block) ⇒ Object
Iterates a given pairpath
with a block
.
iterate(pairtree_path,,&block)
# pairtree_path is a String
Options
-
<tt>:raise_errors => Raise encountered errors/tt> - This will show (true) or surpress (false) ecountered errors. (Boolean)
-
:error_handling => Function to call on error
- The Proc will execute if errors occur. Error passed into Proc as parameter. (Proc or Nil)
Examples
Pairtree.iterate('repo/pairtree_root/', true) do |path|
puts path
end
# => /absolute_path/repo/pairtree_root/ab/cd/e/object
...
# => /absolute_path/repo/pairtree_root/xy/z/object
237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 |
# File 'lib/orchard/pairtree.rb', line 237 def self.iterate(*args,&block) #pairtree_path,raise_errors=false,error_handling=nil,&block ppath = args[0] = args[1] || {} Find.find(ppath) do |entry| begin if File.directory?(entry) case entry when /^.*\/[^\/:.]{1,2}$/ # in pairtree when /^.*[^\/]{3,}$/ # found object block.call(File.absolute_path(entry)) Find.prune when ppath # ignore initial path else raise UnexpectedPairpathError, File.absolute_path(entry) end else raise UnexpectedPairpathError, File.absolute_path(entry) end rescue Exception => e [:error_handling].call(e) unless [:error_handling].nil? raise e if [:raise_errors] == true end end end |
.ppath_to_id(*args) ⇒ Object
Reconstructs the id for a given pairpath
and options
.
ppath_to_id(id, = {})
# id is a String
Options
-
:prefix => Pairtree prefix
- This will remove the prefix from the id before creating a pairpath. (String)
Examples
Pairtree.ppath_to_id('ab/cd/e')
# => abcde
or with the prefix option
Pairtree.ppath_to_id('ab/cd/e', :prefix => 'http://dom.org/')
# => http://dom.org/abcde
206 207 208 209 210 211 212 213 214 215 |
# File 'lib/orchard/pairtree.rb', line 206 def self.ppath_to_id(*args) ppath = args[0] = args[1] || {} match = ppath.match(PPATH_REGEX) if match.nil? throw InvalidPPathError end id = self.decode(match[1].delete('/')) [:prefix].nil? ? id : [:prefix] + id end |
Instance Method Details
#each ⇒ Object
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
# File 'lib/orchard/pairtree.rb', line 21 def each dirs = ["pairtree_root"] excludes = [] for dir in dirs Find.find(dir) do |path| if FileTest.directory?(path) if excludes.include?(File.basename(path)) Find.prune # Don't look any further into this directory. else next end else p path end end end end |
#test(path) ⇒ Object
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 |
# File 'lib/orchard/pairtree.rb', line 39 def test(path) begin if File.lstat(path).directory? begin dir = Dir.open(path) dir.each do |f| unless f == "." or f == ".." test(f) end end ensure dir.close end end end end |