Module: RDF::SAK::Util
- Extended by:
- Util
- Included in:
- Context, Context::Document, Document, Util
- Defined in:
- lib/rdf/sak/util.rb
Constant Summary
- SCHEME_RANK = { https: 0, http: 1 }
- XHTMLNS = 'http://www.w3.org/1999/xhtml'.freeze
- XHV = 'http://www.w3.org/1999/xhtml/vocab#'.freeze
- XPATHNS = { html: XHTMLNS, svg: 'http://www.w3.org/2000/svg', atom: 'http://www.w3.org/2005/Atom', xlink: 'http://www.w3.org/1999/xlink', }.freeze
Class Method Summary
-
.asserted_types(repo, subject, type = nil) ⇒ Array
Obtain all and only the rdf:types directly asserted on the subject.
- .base_for(xmlnode, base) ⇒ Object
-
.canonical_uri(repo, subject, base: nil, unique: true, rdf: true, slugs: false, fragment: false) ⇒ RDF::URI, ...
Obtain the “best” dereferenceable URI for the subject.
-
.canonical_uuid(repo, uri, unique: true, published: false, scache: {}, ucache: {}, base: nil) ⇒ RDF::URI, Array
Obtain the canonical UUID for the given URI.
- .cmp_label(repo, a, b, labels: nil, supplant: true, reverse: false) ⇒ Object
-
.dates_for(repo, subject, predicate: RDF::Vocab::DC.date, datatype: [RDF::XSD.date, RDF::XSD.dateTime]) ⇒ Array
Obtain dates for the subject as instances of Date(Time).
-
.label_for(repo, subject, candidates: nil, unique: true, type: nil, lang: nil, desc: false, alt: false, base: nil) ⇒ Array
Obtain the most appropriate label(s) for the subject’s type(s).
-
.objects_for(repo, subject, predicate, entail: true, only: [], datatype: nil) ⇒ RDF::Term
Returns objects from the graph with entailment.
-
.published?(repo, uri, circulated: false, base: nil) ⇒ true, false
Determine whether the URI represents a published document.
-
.replacements_for(repo, subject, published: true, base: nil) ⇒ Set
Find the terminal replacements for the given subject, if any exist.
-
.struct_for(repo, subject, base: nil, rev: false, only: [], uuids: false, canon: false, ucache: {}, scache: {}) ⇒ Hash
Obtain a key-value structure for the given subject, optionally constraining the result by node type (:resource, :uri/:iri, :blank/:bnode, :literal).
-
.subjects_for(repo, predicate, object, entail: true, only: []) ⇒ RDF::Resource
Returns subjects from the graph with entailment.
-
.traverse_links(node, type: 'application/xhtml+xml', &block) ⇒ Object
Traverse links based on content type.
Instance Method Summary
-
#abbreviate(term, prefixes: {}, vocab: nil, noop: true, sort: true) ⇒ String, ...
Abbreviate one or more URIs into one or more CURIEs if we can.
-
#all_related(rdftype) ⇒ Array
Obtain everything that is an owl:equivalentClass or rdfs:subClassOf the given type.
-
#authors_for(repo, subject, unique: false, contrib: false, base: nil) ⇒ RDF::Value, Array
Assuming the subject is a thing that has authors, return the list of authors.
- #cmp_resource(a, b, www: nil) ⇒ Object
- #coerce_node_spec(spec, rev: false) ⇒ Object
-
#coerce_resource(arg, base = nil, as: :rdf) ⇒ URI, ...
Coerce a stringlike argument into a URI.
-
#coerce_uuid_urn(arg, base = nil) ⇒ Object
Coerce a stringlike argument into a UUID URN.
-
#dehydrate(doc) ⇒ Object
Strip the surrounding links and RDFa attributes off dfn/abbr/span tags.
-
#formats_for(repo, subject, predicate: RDF::Vocab::DC.format, datatype: [RDF::XSD.token]) ⇒ Array
Obtain any specified MIME types for the subject.
-
#get_base(elem, default: nil, coerce: nil) ⇒ nil, ...
Returns the base URI from the perspective of the given element.
-
#get_prefixes(elem, traverse: true, coerce: nil, descend: false) ⇒ Hash
Given an X(HT)ML element, returns a hash of prefixes of the form { prefix: "vocab" }, where the current @vocab is represented by the nil key.
- #invert_struct(struct) ⇒ Object
- #modernize(doc) ⇒ Object
- #node_matches?(node, spec) ⇒ Boolean
-
#predicate_set(predicates, seen: Set.new) ⇒ Array
Obtain the objects for a given subject-predicate pair.
-
#prefix_subset(prefixes, nodes) ⇒ Hash
Given a hash of prefixes and an array of nodes, obtain the subset of prefixes that abbreviate the nodes.
-
#prepare_collation(struct) {|p, o| ... } ⇒ Hash
Given a structure of the form { predicate => [objects] }, rearrange the structure into one more amenable to rendering RDFa.
-
#rehydrate(doc, graph, &block) ⇒ Object
(maybe add code/kbd/samp/var/time one day too).
-
#reindent(node, depth = 0, indent = ' ') ⇒ Object
reindent text nodes.
-
#resolve_curie(curie, prefixes: {}, vocab: nil, base: nil, refnode: nil, term: false, noop: true, scalar: false, coerce: nil) ⇒ nil, ...
Resolve a string or array or attribute node containing one or more terms/CURIEs against a set of prefixes.
-
#smush_struct(struct) ⇒ Object
turns any data structure into a set of nodes.
-
#split_pp(uri, only: false) ⇒ Array
Given a URI as input, split any path parameters out of the last path segment.
- #split_pp2(path, only: false) ⇒ Object
-
#split_qp(uri, only: false) ⇒ Array
Given a URI as input, split any query parameters into an array of key-value pairs.
-
#subject_for(node, prefixes: nil, base: nil, coerce: :rdf) ⇒ URI, ...
Given an X(HT)ML element, return the nearest RDFa subject.
-
#subtree(doc, xpath = '/*', reindent: true, prefixes: {}) ⇒ Object
isolate an element into a new document.
-
#terminal_slug(uri, base: nil) ⇒ String
Get the last non-empty path segment of the URI.
- #title_tag(predicates, content, prefixes: {}, vocab: nil, lang: nil, xhtml: true) ⇒ Object
-
#type_strata(rdftype) ⇒ Array
Obtain a stack of types for an asserted initial type or set thereof.
-
#uri_pp(uri, extra = '') ⇒ Object
really gotta stop carting this thing around.
Class Method Details
.asserted_types(repo, subject, type = nil) ⇒ Array
Obtain all and only the rdf:types directly asserted on the subject.
# File 'lib/rdf/sak/util.rb', line 431

def self.asserted_types repo, subject, type = nil
  asserted = nil
  if type
    type = type.respond_to?(:to_a) ? type.to_a : [type]
    asserted = type.select { |t| t.is_a? RDF::Value }.map do |t|
      RDF::Vocabulary.find_term t
    end
  end
  asserted ||= repo.query([subject, RDF.type, nil]).objects.map do |o|
    RDF::Vocabulary.find_term o
  end.compact
  asserted.select { |t| t && t.uri? }.uniq
end
.base_for(xmlnode, base) ⇒ Object
# File 'lib/rdf/sak/util.rb', line 1269

def self.base_for xmlnode, base
  base = URI(base.to_s) unless base.is_a? URI
  out  = base

  if xmlnode.at_xpath('self::html:*|/html', XPATHNS)
    b = URI(xmlnode.at_xpath(XPATH[:htmlbase], XPATHNS).to_s.strip)
    out = b if b.absolute?
  elsif b = xmlnode.root.at_xpath(XPATH[:xmlbase])
    b = URI(b.to_s.strip)
    out = b if b.absolute?
  end

  out
end
.canonical_uri(repo, subject, base: nil, unique: true, rdf: true, slugs: false, fragment: false) ⇒ RDF::URI, ...
Obtain the “best” dereferenceable URI for the subject. Optionally returns all candidates.
# File 'lib/rdf/sak/util.rb', line 951

def self.canonical_uri repo, subject, base: nil,
    unique: true, rdf: true, slugs: false, fragment: false
  subject = coerce_resource subject, base
  out = []

  # try to find it first
  out = objects_for(repo, subject,
    [RDF::SAK::CI.canonical, RDF::OWL.sameAs],
    entail: false, only: :resource).select do |o|
    # only consider the subjects
    repo.has_subject? o
  end.sort { |a, b| cmp_resource a, b }

  # try to generate in lieu
  if subject.uri? and (out.empty? or slugs)

    out += objects_for(repo, subject,
      [RDF::SAK::CI['canonical-slug'], RDF::SAK::CI.slug],
      only: :literal).map do |o|
      base + o.value
    end if slugs

    uri = URI(uri_pp(subject.to_s))
    if base and uri.respond_to? :uuid
      b = base.clone
      b.query = b.fragment = nil
      b.path = '/' + uri.uuid
      out << RDF::URI.new(b.to_s)
    else
      out << subject
    end
  end

  # remove all URIs with fragments unless specified
  unless fragment
    tmp = out.reject(&:fragment)
    out = tmp unless tmp.empty?
  end

  # coerce to URI objects if specified
  out.map! { |u| URI(uri_pp u.to_s) } unless rdf

  unique ? out.first : out.uniq
end
.canonical_uuid(repo, uri, unique: true, published: false, scache: {}, ucache: {}, base: nil) ⇒ RDF::URI, Array
Obtain the canonical UUID for the given URI.
# File 'lib/rdf/sak/util.rb', line 704

def self.canonical_uuid repo, uri, unique: true, published: false,
    scache: {}, ucache: {}, base: nil
  # make sure this is actually a uri
  orig = uri = coerce_resource uri, base
  unless uri.is_a? RDF::Node
    tu = URI(uri_pp(uri).to_s).normalize

    if tu.path && !tu.fragment &&
        UUID_RE.match?(uu = tu.path.delete_prefix(?/))
      tu = URI('urn:uuid:' + uu.downcase)
    end

    # unconditionally overwrite uri
    uri = RDF::URI(tu.to_s)

    # now check if it's a uuid
    if tu.respond_to? :uuid
      # warn "lol uuid #{orig}"
      # if it's a uuid, check that we have it as a subject
      # if we have it as a subject, return it
      return uri if scache[uri] ||= repo.has_subject?(uri)
      # note i don't want to screw around right now dealing with the
      # case that a UUID might not itself be canonical
    end
  end

  # spit up the cache if present
  if out = ucache[orig]
    # warn "lol cached #{orig}"
    return unique ? out.first : out
  end

  # otherwise we proceed:

  # goal: return the most "appropriate" UUID for the given URI

  # it is so lame i have to do this
  bits = { nil => 0, false => 0, true => 1 }

  # rank (0 is higher):
  # * (00) exact & canonical == 0,
  # * (01) exact == 1,
  # * (10) inexact & canonical == 2,
  # * (11) inexact == 3.

  # warn "WTF URI #{uri}"

  # handle path parameters by generating a bunch of candidates
  uris = if uri.respond_to? :path and uri.path.start_with? ?/
           # split any path parameters off
           uu, *pp = split_pp uri
           if pp.empty?
             [uri] # no path parameters
           else
             uu = RDF::URI(uu.to_s)
             bp = uu.path # base path
             (0..pp.length).to_a.reverse.map do |i|
               u = uu.dup
               u.path = ([bp] + pp.take(i)).join(';')
               u
             end
           end
         else
           [uri] # not a pathful URI
         end

  # collect the candidates by URI
  sa = predicate_set [RDF::SAK::CI.canonical,
    RDF::SAK::CI.alias, RDF::OWL.sameAs]
  candidates = nil
  uris.each do |u|
    candidates = subjects_for(repo, sa, u, entail: false) do |s, f|
      # there is no #to_i for booleans and also we xor this number
      [s, { rank: bits[f.include?(RDF::SAK::CI.canonical)] ^ 1,
        published: published?(repo, s),
        mtime: dates_for(repo, s).last || DateTime.new }]
    end.compact.to_h
    break unless candidates.empty?
  end

  # now collect by slug
  slug = terminal_slug uri, base: base
  if slug and !slug.empty?
    exact = uri == coerce_resource(slug, base) # slug represents exact match
    sl = [RDF::SAK::CI['canonical-slug'], RDF::SAK::CI.slug]
    [RDF::XSD.string, RDF::XSD.token].each do |t|
      subjects_for(repo, sl, RDF::Literal(slug, datatype: t)) do |s, f|
        # default to lowest rank if this candidate is new
        entry = candidates[s] ||= {
          published: published?(repo, s, base: base),
          rank: 0b11, mtime: dates_for(repo, s).last || DateTime.new }
        # true is 1 and false is zero so we xor this too
        rank = (BITS[exact] << 1 | BITS[f.include?(sl[0])]) ^ 0b11
        # now amend the rank if we have found a better one
        entry[:rank] = rank if rank < entry[:rank]
      end
    end
  end

  candidates.delete_if { |s, _| !/^urn:uuid:/.match?(s.to_s) }

  # scan all the candidates for replacements and remove any
  # candidates that have been replaced
  candidates.to_a.each do |k, v|
    # note that
    reps = replacements_for(repo, k, published: published) - [k]
    unless reps.empty?
      v[:replaced] = true
      reps.each do |r|
        c = candidates[r] ||= { rank: v[:rank],
          published: published?(repo, r),
          mtime: dates_for(repo, r).last || v[:mtime] || DateTime.new }
        # we give the replacement the rank and mtime of the
        # resource being replaced if it scores better
        c[:rank]  = v[:rank]  if v[:rank]  < c[:rank]
        c[:mtime] = v[:mtime] if v[:mtime] > c[:mtime]
      end
    end
  end

  # now we can remove all unpublished candidates if the context is
  # published
  candidates.select! do |_, v|
    !v[:replaced] && (published ? v[:published] : true)
  end

  # now we sort by rank and date; the highest-ranking newest
  # candidate is the one

  out = candidates.sort do |a, b|
    _, va = a
    _, vb = b
    cb = published ? BITS[vb[:published]] <=> BITS[va[:published]] : 0
    cr = va[:rank] <=> vb[:rank]
    cb == 0 ? cr == 0 ? vb[:mtime] <=> va[:mtime] : cr : cb
  end.map { |x| x.first }.compact

  # set cache
  ucache[orig] = out

  #warn "lol not cached #{orig}"

  unique ? out.first : out

  # an exact match is better than an inexact one

  # a canonical match is better than non-canonical

  # note this is four bits: exact, canon(exact), inexact, canon(inexact)
  # !canon(exact) should rank higher than canon(inexact)

  # unreplaced is better than replaced

  # newer is better than older (though no reason an older item
  # can't replace a newer one)

  # published is better than not, unless the context is
  # unpublished and an unpublished document replaces a published one
end
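The rank arithmetic used in the slug branch of canonical_uuid can be tried in isolation. This is a minimal sketch, assuming a BITS table equivalent to the `bits` hash in the listing; `candidate_rank` is a hypothetical helper name and is not part of RDF::SAK::Util.

```ruby
# Map booleans (and nil) to bits, since Ruby has no true.to_i.
BITS = { nil => 0, false => 0, true => 1 }.freeze

# Hypothetical helper: fold the "exact" and "canonical" flags into a
# two-bit rank where 0 is the best score and 3 is the worst.
def candidate_rank(exact, canonical)
  (BITS[exact] << 1 | BITS[canonical]) ^ 0b11
end

# Print the rank for every combination of the two flags.
[[true, true], [true, false], [false, true], [false, false]].each do |exact, canon|
  puts format('exact=%-5s canonical=%-5s rank=%d',
              exact, canon, candidate_rank(exact, canon))
end
```

The XOR with 0b11 inverts both bits, so a candidate that is both exact and canonical sorts first under an ascending comparison.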
.cmp_label(repo, a, b, labels: nil, supplant: true, reverse: false) ⇒ Object
# File 'lib/rdf/sak/util.rb', line 919

def self.cmp_label repo, a, b, labels: nil, supplant: true, reverse: false
  labels ||= {}

  # try supplied label or fall back
  pair = [a, b].map do |x|
    if labels[x]
      labels[x][1]
    elsif supplant and y = label_for(repo, x)
      labels[x] = y
      y[1]
    else
      x
    end
  end

  pair.reverse! if reverse
  # warn "#{pair[0]} <=> #{pair[1]}"
  pair[0].to_s <=> pair[1].to_s
end
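The comparison strategy of cmp_label (use a cached label if one exists, otherwise fall back to the term itself) can be sketched without a repository. The label table and the name `cmp_by_label` below are assumptions made for illustration; the real method also consults label_for when `supplant` is set.

```ruby
# Hypothetical label cache: key => [predicate, label], mirroring the
# [predicate, object] pairs that label_for is documented to return.
LABELS = {
  'urn:x:b' => ['dct:title', 'Alpha'],
  'urn:x:a' => ['dct:title', 'Beta'],
}.freeze

# Compare two keys by label, falling back to the key when no label exists.
def cmp_by_label(a, b, labels: {}, reverse: false)
  pair = [a, b].map { |x| labels[x] ? labels[x][1] : x }
  pair.reverse! if reverse
  pair[0].to_s <=> pair[1].to_s
end

keys   = ['urn:x:a', 'urn:x:b', 'urn:x:c']
sorted = keys.sort { |a, b| cmp_by_label(a, b, labels: LABELS) }
puts sorted.inspect
```

Because the fallback compares raw strings, unlabeled terms sort among the labels by plain string order rather than being grouped separately.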
.dates_for(repo, subject, predicate: RDF::Vocab::DC.date, datatype: [RDF::XSD.date, RDF::XSD.dateTime]) ⇒ Array
Obtain dates for the subject as instances of Date(Time). This is just shorthand for a common application of `objects_for`.
# File 'lib/rdf/sak/util.rb', line 1242

def self.dates_for repo, subject, predicate: RDF::Vocab::DC.date,
    datatype: [RDF::XSD.date, RDF::XSD.dateTime]
  objects_for(
    repo, subject, predicate, only: :literal, datatype: datatype) do |o|
    o.object
  end.sort.uniq
end
.label_for(repo, subject, candidates: nil, unique: true, type: nil, lang: nil, desc: false, alt: false, base: nil) ⇒ Array
Obtain the most appropriate label(s) for the subject’s type(s). Returns one or more (depending on the `unique` flag) predicate-object pairs in order of preference.
# File 'lib/rdf/sak/util.rb', line 1077

def self.label_for repo, subject, candidates: nil, unique: true, type: nil,
    lang: nil, desc: false, alt: false, base: nil
  raise ArgumentError, 'no repo!' unless repo.is_a? RDF::Queryable
  return unless subject.is_a? RDF::Value and subject.resource?

  asserted = asserted_types repo, subject, type

  # get all the inferred types by layer; add default class if needed
  strata = type_strata asserted
  strata.push [RDF::RDFS.Resource] if
    strata.empty? or not strata[-1].include?(RDF::RDFS.Resource)

  # get the key-value pairs for the subject
  candidates ||= struct_for repo, subject, only: :literal

  seen  = {}
  accum = []

  strata.each do |lst|
    lst.each do |cls|
      next unless STRINGS[cls] and
        preds = STRINGS[cls][desc ? :desc : :label][alt ? 1 : 0]
      # warn cls
      preds.each do |p|
        # warn p.inspect
        next unless vals = candidates[p]
        vals.each do |v|
          pair = [p, v]
          accum.push(pair) unless seen[pair]
          seen[pair] = true
        end
      end
    end
  end

  # try that for now
  unique ? accum[0] : accum.uniq

  # what we want to do is match the predicates from the subject to
  # the predicates in the label designation

  # get label predicate stack(s) for RDF type(s)

  # get all predicates in order (use alt stack if doubly specified)

  # filter out desired language(s)

  # XXX note we will probably want to return the predicate as well
end
.objects_for(repo, subject, predicate, entail: true, only: [], datatype: nil) ⇒ RDF::Term
Returns objects from the graph with entailment.
# File 'lib/rdf/sak/util.rb', line 631

def self.objects_for repo, subject, predicate,
    entail: true, only: [], datatype: nil
  raise "Subject must be a resource, not #{subject.inspect}" unless
    subject.is_a? RDF::Resource
  predicate = predicate.respond_to?(:to_a) ? predicate.to_a : [predicate]
  raise "Predicate must be a term, not #{predicate.first.class}" unless
    predicate.all? { |p| p.is_a? RDF::URI }

  predicate = predicate.map { |x| RDF::Vocabulary.find_term x }.compact

  only = coerce_node_spec only

  datatype = (
    datatype.respond_to?(:to_a) ? datatype.to_a : [datatype]).compact
  raise 'Datatype must be some kind of term' unless
    datatype.all? { |p| p.is_a? RDF::URI }

  # fluff this out
  predicate = predicate_set predicate if entail

  out = {}
  predicate.each do |p|
    repo.query([subject, p, nil]).objects.each do |o|

      # make sure it's in the spec
      next unless node_matches? o, only

      # constrain output
      next if o.literal? and
        !(datatype.empty? or datatype.include?(o.datatype))

      entry = out[o] ||= [Set.new, Set.new]
      entry.first << p
    end
  end

  # now we do the reverse
  unless only == [:literal]
    # generate reverse predicates
    revp = Set.new
    predicate.each do |p|
      revp += p.inverseOf.to_set
      revp << p if p.type.include? RDF::OWL.SymmetricProperty
    end
    revp = predicate_set revp if entail

    # now scan 'em
    revp.each do |p|
      repo.query([nil, p, subject]).subjects.each do |s|
        next unless node_matches? s, only
        # no need to check datatype; subject is never a literal

        entry = out[s] ||= [Set.new, Set.new]
        entry.last << p
      end
    end
  end

  # run this through a block to get access to the predicates
  return out.map { |p, v| yield p, *v } if block_given?

  out.keys
end
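The forward/reverse bookkeeping in objects_for (each result node maps to a pair of predicate sets, one for forward matches and one for inverse matches) can be illustrated over a plain array of triples. Everything here is a stand-in: `TRIPLES`, `INVERSE` and `objects_sketch` are hypothetical, and the sketch does no entailment or node-type filtering.

```ruby
require 'set'

# Hypothetical data: [subject, predicate, object] triples as symbols.
TRIPLES = [
  [:doc, :creator, :alice],
  [:doc, :title,   'Hello'],
  [:bob, :made,    :doc],
].freeze

# Hypothetical inverse table standing in for owl:inverseOf.
INVERSE = { creator: [:made] }.freeze

def objects_sketch(triples, subject, predicates)
  out = {}
  predicates.each do |p|
    # forward: subject --p--> object
    triples.each do |s, tp, o|
      next unless s == subject && tp == p
      (out[o] ||= [Set.new, Set.new]).first << p
    end
    # reverse: node --inverse(p)--> subject
    INVERSE.fetch(p, []).each do |ip|
      triples.each do |s, tp, o|
        next unless o == subject && tp == ip
        (out[s] ||= [Set.new, Set.new]).last << ip
      end
    end
  end
  # with a block, yield each node with its forward and reverse sets
  return out.map { |node, v| yield node, *v } if block_given?
  out.keys
end

puts objects_sketch(TRIPLES, :doc, [:creator]).inspect
```

Passing a block gives access to which predicates matched in which direction, which is how canonical_uuid inspects the predicates in the listing above it.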
.published?(repo, uri, circulated: false, base: nil) ⇒ true, false
Determine whether the URI represents a published document.
# File 'lib/rdf/sak/util.rb', line 1001

def self.published? repo, uri, circulated: false, base: nil
  uri = coerce_resource uri, base
  candidates = objects_for(
    repo, uri, RDF::Vocab::BIBO.status, only: :resource).to_set

  test = Set[RDF::Vocab::BIBO['status/published']]
  test << RDF::SAK::CI.circulated if circulated

  # warn candidates, test, candidates & test

  !(candidates & test).empty?
end
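The publication test reduces to a set intersection between the statuses found on the resource and the statuses that count as published. A minimal sketch, assuming plain strings in place of RDF terms; `published_sketch?` and the two constants are hypothetical names.

```ruby
require 'set'

# Hypothetical stand-ins for bibo:status/published and ci:circulated.
PUBLISHED  = 'bibo:status/published'
CIRCULATED = 'ci:circulated'

# True if any status on the resource is in the accepted set.
def published_sketch?(statuses, circulated: false)
  test = Set[PUBLISHED]
  test << CIRCULATED if circulated
  !(statuses.to_set & test).empty?
end

puts published_sketch?([CIRCULATED])                   # circulated alone
puts published_sketch?([CIRCULATED], circulated: true) # circulated accepted
```

The `circulated:` flag only widens the accepted set; it never narrows it, so anything published is still reported as published.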
.replacements_for(repo, subject, published: true, base: nil) ⇒ Set
Find the terminal replacements for the given subject, if any exist.
# File 'lib/rdf/sak/util.rb', line 1173

def self.replacements_for repo, subject, published: true, base: nil
  subject = coerce_resource subject, base

  # `seen` is a hash mapping resources to publication status and
  # subsequent replacements. it collects all the resources in the
  # replacement chain in :fwd (replaces) and :rev (replaced-by)
  # members, along with a boolean :pub. `seen` also performs a
  # duty as cycle-breaking sentinel.

  seen  = {}
  queue = [subject]
  while (test = queue.shift)
    # fwd is "replaces", rev is "replaced by"
    entry = seen[test] ||= {
      pub: published?(repo, test), fwd: Set.new, rev: Set.new }
    queue += (
      subjects_for(repo, RDF::Vocab::DC.replaces, subject) +
        objects_for(repo, subject, RDF::Vocab::DC.isReplacedBy,
                    only: :resource)
    ).uniq.map do |r| # r = replacement
      next if seen.include? r
      seen[r] ||= { pub: published?(repo, r), fwd: Set.new, rev: Set.new }
      seen[r][:fwd] << test
      entry[:rev] << r
      r
    end.compact.uniq
  end

  # if we're calling from a published context, we return the
  # (topologically) last published resource(s), even if they are
  # replaced ultimately by unpublished resources.

  out = seen.map { |k, v| v[:rev].empty? ? k : nil }.compact - [subject]

  # now we modify `out` based on the publication status of the context
  if published
    pubout = out.select { |o| seen[o][:pub] }
    # if there is anything left after this, return it
    return pubout unless pubout.empty?
    # now we want to find the penultimate elements of `seen` that
    # are farthest along the replacement chain but whose status is
    # published

    # start with `out`, take the union of their :fwd members, then
    # take the subset of those which are published. if the result
    # is empty, repeat. (this is walking backwards through the
    # graph we just walked forwards through to construct `seen`)
    loop do
      # XXX THIS NEEDS A TEST CASE
      out = seen.values_at(*out).map { |v| v[:fwd] }.reduce(:+).to_a
      break if out.empty?
      pubout = out.select { |o| seen[o][:pub] }
      return pubout unless pubout.empty?
    end
  end

  out
end
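The idea of a "terminal replacement" (the end of a replaced-by chain, found with a queue and a `seen` hash that doubles as a cycle guard) can be sketched over a plain lookup table. This is only an illustration under stated assumptions: `terminal_replacements` and the table are hypothetical, the sketch follows each dequeued node in turn, and it ignores publication status entirely.

```ruby
# Hypothetical "is replaced by" table: old => [new, ...].
CHAIN = {
  'v1' => ['v2'],
  'v2' => ['v3', 'v3b'],
  'v3' => [],
}.freeze

def terminal_replacements(table, subject)
  seen  = {}
  queue = [subject]
  while (node = queue.shift)
    next if seen.key?(node) # cycle guard: never expand a node twice
    seen[node] = table.fetch(node, [])
    queue.concat(seen[node])
  end
  # terminal nodes have no onward replacement; the subject is excluded
  seen.select { |_, nxt| nxt.empty? }.keys - [subject]
end

puts terminal_replacements(CHAIN, 'v1').inspect
```

A resource with no replacements yields an empty result, which matches the "if any exist" wording in the method summary.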
.struct_for(repo, subject, base: nil, rev: false, only: [], uuids: false, canon: false, ucache: {}, scache: {}) ⇒ Hash
Obtain a key-value structure for the given subject, optionally constraining the result by node type (:resource, :uri/:iri, :blank/:bnode, :literal).
# File 'lib/rdf/sak/util.rb', line 1026

def self.struct_for repo, subject, base: nil, rev: false, only: [],
    uuids: false, canon: false, ucache: {}, scache: {}
  only = coerce_node_spec only

  # coerce the subject
  subject = canonical_uuid(repo, subject,
    base: base, scache: scache, ucache: ucache) || subject if uuids

  rsrc = {}
  pattern = rev ? [nil, nil, subject] : [subject, nil, nil]
  repo.query(pattern) do |stmt|
    # this will skip over any term not matching the type
    node = rev ? stmt.subject : stmt.object
    next unless node_matches? node, only

    # coerce the node to uuid if told to
    if node.resource?
      if uuids
        uu = canonical_uuid(repo, node, scache: scache, ucache: ucache) unless
          ucache.key? node
        node = uu || (canon ? canonical_uri(repo, node) : node)
      elsif canon
        node = canonical_uri(repo, node)
      end
    end

    p = RDF::Vocabulary.find_term(stmt.predicate) || stmt.predicate
    o = rsrc[p] ||= []
    o.push node if node # may be nil
  end

  # XXX in here we can do fun stuff like filter/sort by language/datatype
  rsrc.values.each { |v| v.sort!.uniq! }

  rsrc
end
.subjects_for(repo, predicate, object, entail: true, only: []) ⇒ RDF::Resource
Returns subjects from the graph with entailment.
# File 'lib/rdf/sak/util.rb', line 573

def self.subjects_for repo, predicate, object, entail: true, only: []
  raise 'Object must be a Term' unless object.is_a? RDF::Term
  predicate = predicate.respond_to?(:to_a) ? predicate.to_a : [predicate]
  raise 'Predicate must be some kind of term' unless
    predicate.all? { |p| p.is_a? RDF::URI }

  only = coerce_node_spec only, rev: true

  predicate = predicate.map { |x| RDF::Vocabulary.find_term x }.compact
  predicate = predicate_set predicate if entail

  out  = {}
  revp = Set.new
  predicate.each do |p|
    repo.query([nil, p, object]).subjects.each do |s|
      next unless node_matches? s, only

      entry = out[s] ||= [Set.new, Set.new]
      entry[0] << p
    end

    # do this here while we're at it
    unless object.literal?
      revp += p.inverseOf.to_set
      revp << p if p.type.include? RDF::OWL.SymmetricProperty
    end
  end

  unless object.literal?
    revp = predicate_set revp if entail

    revp.each do |p|
      repo.query([object, p, nil]).objects.each do |o|
        next unless node_matches? o, only

        entry = out[o] ||= [Set.new, Set.new]
        entry[1] << p
      end
    end
  end

  # run this through a block to get access to the predicates
  return out.map { |p, v| yield p, *v } if block_given?

  out.keys
end
.traverse_links(node, type: 'application/xhtml+xml', &block) ⇒ Object
Traverse links based on content type.
# File 'lib/rdf/sak/util.rb', line 1286

def self.traverse_links node, type: 'application/xhtml+xml', &block
  enum_for :traverse_links, node, type: type unless block
  type  = type.strip.downcase.gsub(/\s*;.*/, '')
  xpath = LINK_MAP.fetch type, XPATH[:xlinks]
  node.xpath(xpath, XPATHNS).each { |node| block.call node }
end
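The content-type handling in traverse_links (trim, lowercase, drop any parameters, then look up an XPath with a fallback) works on plain strings. In this sketch `LINK_XPATHS`, `DEFAULT_XPATH` and `xpath_for` are hypothetical; the real LINK_MAP and XPATH tables are defined elsewhere in util.rb and their contents are not shown in this section.

```ruby
# Hypothetical stand-ins for LINK_MAP and XPATH[:xlinks]; the
# expressions themselves are made up for the example.
LINK_XPATHS   = { 'application/xhtml+xml' => './/html:a[@href]' }.freeze
DEFAULT_XPATH = './/*[@xlink:href]'

# Normalize a media type and pick the XPath expression for it.
def xpath_for(type)
  type = type.strip.downcase.gsub(/\s*;.*/, '') # drop "; charset=..." etc.
  LINK_XPATHS.fetch type, DEFAULT_XPATH
end

puts xpath_for(' Application/XHTML+XML; charset=utf-8 ')
puts xpath_for('image/svg+xml')
```

Using `Hash#fetch` with a default keeps unknown media types on the generic path instead of raising KeyError.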
Instance Method Details
#abbreviate(term, prefixes: {}, vocab: nil, noop: true, sort: true) ⇒ String, ...
Abbreviate one or more URIs into one or more CURIEs if we can. Passes the URI through unchanged if noop: is true, or, if false, returns nil for any URI that can’t be abbreviated this way. Takes a hash of prefix-URI mappings where the keys are assumed to be symbols, or nil to express the current vocabulary, which can be overridden via vocab:.
Note: only noop: true can be guaranteed to return a value.
# File 'lib/rdf/sak/util.rb', line 1643

def abbreviate term, prefixes: {}, vocab: nil, noop: true, sort: true
  # this returns a duplicate that we can mess with
  prefixes = sanitize_prefixes prefixes

  # sanitize vocab
  raise ArgumentError, 'vocab must be nil or stringable' unless
    vocab.nil? or vocab.respond_to? :to_s
  prefixes[nil] = vocab.to_s if vocab
  scalar = true

  term = if term.respond_to? :to_a
           scalar = false
           term.to_a
         else [term]; end

  rev = prefixes.invert

  term.map! do |t|
    t = t.to_s
    slug = nil # we want this value to be nil if no match and !noop

    # try matching each prefix URI from longest to shortest
    rev.sort { |a, b| b.first.length <=> a.first.length }.each do |uri, pfx|
      slug = t.delete_prefix uri
      # this is saying the URI either doesn't match or abbreviates to ""
      if slug == t or pfx.nil? && slug.empty?
        slug = nil
      else
        # it's already a slug so we add a prefix if there is one
        slug = '%s:%s' % [pfx, slug] unless pfx.nil?
        break # we have our match
      end
    end

    # at this point slug is either an abbreviated term or nil, so:
    slug ||= t if noop

    slug
  end

  # only sort if noop is set
  term.sort! if noop && sort

  scalar ? term.first : term
end
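The longest-prefix matching at the heart of abbreviate can be reproduced with nothing but String#delete_prefix. A minimal sketch for a single URI, assuming the prefix hash is already clean; `abbreviate_sketch` is a hypothetical name, and the real method additionally sanitizes prefixes, accepts arrays, and sorts.

```ruby
# Abbreviate one URI into a CURIE using the longest matching namespace.
# A nil key in `prefixes` stands for the current vocabulary.
def abbreviate_sketch(uri, prefixes, noop: true)
  uri = uri.to_s
  # try the longest namespace first so the most specific prefix wins
  prefixes.invert.sort_by { |ns, _| -ns.length }.each do |ns, pfx|
    slug = uri.delete_prefix(ns)
    next if slug == uri             # namespace does not match
    next if pfx.nil? && slug.empty? # would abbreviate to ""
    return pfx.nil? ? slug : "#{pfx}:#{slug}"
  end
  noop ? uri : nil
end

prefixes = {
  dct: 'http://purl.org/dc/terms/',
  nil => 'http://www.w3.org/1999/xhtml/vocab#',
}

puts abbreviate_sketch('http://purl.org/dc/terms/title', prefixes)
puts abbreviate_sketch('http://www.w3.org/1999/xhtml/vocab#next', prefixes)
puts abbreviate_sketch('http://example.com/x', prefixes)
```

With `noop: false` the unmatched URI yields nil instead of passing through, which is why only `noop: true` is guaranteed to return a value.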
#all_related(rdftype) ⇒ Array
Obtain everything that is an owl:equivalentClass or rdfs:subClassOf the given type.
# File 'lib/rdf/sak/util.rb', line 2046

def all_related rdftype
  t = RDF::Vocabulary.find_term(rdftype) or raise "No type #{rdftype.to_s}"
  q = [t] # queue
  c = {}  # cache

  while term = q.shift
    # add term to cache
    c[term] = term

    # keep this from tripping up
    next unless term.uri? and term.respond_to? :class?

    # entail equivalent classes
    term.entail(:equivalentClass).each do |ec|
      # add equivalent classes to queue (if not already cached)
      q.push ec unless c[ec]
      c[ec] = ec unless ec == term
    end

    # entail subclasses
    term.subClass.each do |sc|
      # add subclasses to queue (if not already cached)
      q.push sc unless c[sc]
      c[sc] = sc unless sc == term
    end
  end

  # smush the result
  c.keys
end
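The traversal in all_related is a breadth-first walk over equivalent classes and subclasses with a cache to prevent revisits. The sketch below shows the same walk over two plain hashes; the tables, the CURIE strings in them, and the name `related_sketch` are all assumptions made for the example, not data from any vocabulary.

```ruby
# Hypothetical class relations keyed by CURIE-like strings.
SUBCLASSES = {
  'ex:Agent'  => ['ex:Person', 'ex:Organization'],
  'ex:Person' => [],
}.freeze
EQUIVALENT = { 'ex:Person' => ['other:Person'] }.freeze

# Collect the type plus everything reachable through either relation.
def related_sketch(type)
  queue = [type]
  seen  = {}
  while (term = queue.shift)
    next if seen[term] # already visited
    seen[term] = true
    queue.concat(EQUIVALENT.fetch(term, []) + SUBCLASSES.fetch(term, []))
  end
  seen.keys
end

puts related_sketch('ex:Agent').inspect
```

Ruby hashes preserve insertion order, so the result lists types in the order the walk first reached them.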
#authors_for(repo, subject, unique: false, contrib: false, base: nil) ⇒ RDF::Value, Array
Assuming the subject is a thing that has authors, return the list of authors. Try bibo:authorList first for an explicit ordering, then continue to the various other predicates.
# File 'lib/rdf/sak/util.rb', line 1137

def authors_for repo, subject, unique: false, contrib: false, base: nil
  authors = []

  # try the author list
  lp = [RDF::Vocab::BIBO[contrib ? :contributorList : :authorList]]
  lp += lp.first.entail(:equivalentProperty) # XXX cache this
  lp.each do |pred|
    o = repo.first_object([subject, pred, nil])
    next unless o
    # note this use of RDF::List is not particularly well-documented
    authors += RDF::List.from(repo, o).to_a
  end

  # now try various permutations of the author/contributor predicate
  unsorted = []
  preds = contrib ? CONTRIB : AUTHOR
  preds.each do |pred|
    unsorted += repo.query([subject, pred, nil]).objects
  end

  # prefetch the author names
  labels = authors.map { |a| [a, label_for(repo, a)] }.to_h

  authors += unsorted.uniq.sort { |a, b| labels[a] <=> labels[b] }

  unique ? authors.first : authors.uniq
end
#cmp_resource(a, b, www: nil) ⇒ Object
# File 'lib/rdf/sak/util.rb', line 866

def cmp_resource a, b, www: nil
  raise 'Comparands must be instances of RDF::Value' unless
    [a, b].all? { |x| x.is_a? RDF::Value }

  # URI beats non-URI
  if a.uri?
    if b.uri?
      # https beats http beats other
      as = a.scheme.downcase.to_sym
      bs = b.scheme.downcase.to_sym
      cmp = SCHEME_RANK.fetch(as, 2) <=> SCHEME_RANK.fetch(bs, 2)

      # bail out early
      return cmp unless cmp == 0

      # this would have returned if the schemes were different, as
      # such we only need to test one of them
      if [:http, :https].any?(as) and not www.nil?
        # if www is non-nil, prefer www or no-www depending on
        # truthiness of `www` parameter
        pref = [false, true].zip(www ? [1, 0] : [0, 1]).to_h
        re = /^(?:(www)\.)?(.*?)$/

        ah = re.match(a.host.to_s.downcase)[1,2]
        bh = re.match(b.host.to_s.downcase)[1,2]

        # compare hosts sans www
        cmp = ah[1] <=> bh[1]
        return cmp unless cmp == 0

        # now compare presence of www
        cmp = pref[ah[0] == 'www'] <=> pref[bh[0] == 'www']
        return cmp unless cmp == 0

        # if we're still here, compare the path/query/fragment
        re = /^.*?\/\/.*?(\/.*)$/
        al = re.match(a.to_s)[1].to_s
        bl = re.match(b.to_s)[1].to_s

        return al <=> bl
      end

      return a <=> b
    else
      return -1
    end
  elsif b.uri?
    return 1
  else
    return a <=> b
  end
end
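The first tie-breaker in cmp_resource, "https beats http beats other", can be exercised with the standard library alone. A minimal sketch, assuming plain strings rather than RDF::Value instances; `cmp_scheme` is a hypothetical helper, while SCHEME_RANK repeats the constant listed in the Constant Summary.

```ruby
require 'uri'

SCHEME_RANK = { https: 0, http: 1 }.freeze

# Compare two URI strings by scheme rank, then by plain string order.
def cmp_scheme(a, b)
  as, bs = [a, b].map { |u| URI(u).scheme.to_s.downcase.to_sym }
  cmp = SCHEME_RANK.fetch(as, 2) <=> SCHEME_RANK.fetch(bs, 2)
  cmp.zero? ? a <=> b : cmp
end

uris = %w[urn:uuid:0 http://example.com/ https://example.com/]
puts uris.sort { |a, b| cmp_scheme(a, b) }.inspect
```

Any scheme missing from the table falls back to rank 2, so it sorts after both web schemes without needing its own entry.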
#coerce_node_spec(spec, rev: false) ⇒ Object
# File 'lib/rdf/sak/util.rb', line 409

def coerce_node_spec spec, rev: false
  spec = [spec] unless spec.respond_to? :to_a
  spec = spec - [:resource] + [:uri, :blank] if spec.include? :resource
  raise 'Subjects are never literals' if rev and spec.include? :literal
  spec = NMAP.values_at(*spec).reject(&:nil?).uniq
  spec = NTESTS.keys if spec.empty?
  spec.delete :literal if rev
  spec.uniq
end
#coerce_resource(arg, base = nil, as: :rdf) ⇒ URI, ...
Coerce a stringlike argument into a URI. Raises an exception if the string can’t be turned into a valid URI. Optionally resolves against a base, and the coercion can be tuned to either URI or RDF::URI via :as.
# File 'lib/rdf/sak/util.rb', line 1487

def coerce_resource arg, base = nil, as: :rdf
  as = assert_uri_coercion as
  return arg if as and arg.is_a?({ uri: URI, rdf: RDF::URI }[as])
  raise ArgumentError, 'arg must be stringable' unless
    arg.respond_to? :to_s

  arg = arg.to_s.strip

  if arg.start_with? '_:' and as
    # override the coercion if this is a blank node
    as = :rdf
  elsif base
    begin
      arg = (base.is_a?(URI) ? base : URI(uri_pp base.to_s.strip)).merge arg
    rescue URI::InvalidURIError => e
      warn "attempted to coerce #{arg} which turned out to be invalid: #{e}"
      return
    end
  end

  URI_COERCIONS[as].call arg
end
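Resolution against a base, as coerce_resource does with URI#merge, can be tried with the standard library. This sketch covers only the stdlib URI path; `resolve_sketch` is a hypothetical name, and blank-node handling, uri_pp, and the RDF::URI coercion are deliberately left out.

```ruby
require 'uri'

# Resolve a (possibly relative) reference against an optional base.
# Returns nil, with a warning, if the input is not a valid URI.
def resolve_sketch(arg, base = nil)
  arg = arg.to_s.strip
  return URI(arg) unless base
  URI(base.to_s.strip).merge(arg)
rescue URI::InvalidURIError => e
  warn "could not coerce #{arg.inspect}: #{e.message}"
  nil
end

puts resolve_sketch('foo', 'https://example.com/docs/')
puts resolve_sketch('https://example.org/x', 'https://example.com/docs/')
```

An absolute reference ignores the base, which matches the usual RFC 3986 resolution rules implemented by URI#merge.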
#coerce_uuid_urn(arg, base = nil) ⇒ Object
Coerce a stringlike argument into a UUID URN. Raises an ArgumentError if the result is not a UUID.
# File 'lib/rdf/sak/util.rb', line 1510

def coerce_uuid_urn arg, base = nil
  # if this is an ncname then change it
  if ([URI, RDF::URI] & arg.class.ancestors).empty? &&
      arg.respond_to?(:to_s)
    arg = arg.to_s

    # coerce ncname to uuid
    arg = UUID::NCName::from_ncname(arg, version: 1) if arg =~
      /^[A-P](?:[0-9A-Z_-]{20}|[2-7A-Z]{24})[A-P]$/i

    # now the string is either a UUID or it isn't
    arg = "urn:uuid:#{arg}" unless arg.start_with? 'urn:uuid:'
  else
    arg = arg.class.new arg.to_s.downcase unless arg == arg.to_s.downcase
  end

  raise ArgumentError, 'not a UUID' unless
    arg.to_s =~ /^urn:uuid:[0-9a-f]{8}(?:-[0-9a-f]{4}){4}[0-9a-f]{8}$/

  arg = coerce_resource arg, base
end
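The validation step of coerce_uuid_urn is a single regular expression over the lowercase URN form. A minimal sketch of that step on plain strings; `uuid_urn_sketch` and `UUID_URN` are hypothetical names, the NCName conversion is skipped, and unlike the listing the sketch lowercases bare strings before checking them.

```ruby
# The pattern reads as 8 hex digits, four "-xxxx" groups, then 8 more
# hex digits, which is the usual 8-4-4-4-12 layout.
UUID_URN = /\Aurn:uuid:[0-9a-f]{8}(?:-[0-9a-f]{4}){4}[0-9a-f]{8}\z/

# Normalize a bare UUID or UUID URN string to a lowercase urn:uuid: form.
def uuid_urn_sketch(arg)
  arg = arg.to_s.strip.downcase
  arg = "urn:uuid:#{arg}" unless arg.start_with? 'urn:uuid:'
  raise ArgumentError, 'not a UUID' unless UUID_URN.match?(arg)
  arg
end

puts uuid_urn_sketch('DEADBEEF-0000-4000-8000-0123456789AB')
```

Anchoring with \A and \z rather than ^ and $ is a choice made in this sketch so that multi-line input cannot slip past the check.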
#dehydrate(doc) ⇒ Object
Strip the surrounding links and the RDFa attributes off dfn/abbr/span tags. Assuming a construct like <a rel="some:relation" href="#…" typeof="skos:Concept"><dfn property="some:property">Term</dfn></a> is a link to a glossary entry, this method returns the term to an undecorated state (<dfn>Term</dfn>).
# File 'lib/rdf/sak/util.rb', line 1823
def dehydrate doc
  doc.xpath(XPATH[:dehydrate], XPATHNS).each do |e|
    e = e.replace e.elements.first.dup
    %w[about resource typeof rel rev property datatype].each do |a|
      e.delete a if e.key? a
    end
  end
end
#formats_for(repo, subject, predicate: RDF::Vocab::DC.format, datatype: [RDF::XSD.token]) ⇒ Array
Obtain any specified MIME types for the subject. Just shorthand for a common application of objects_for.
# File 'lib/rdf/sak/util.rb', line 1260
def formats_for repo, subject, predicate: RDF::Vocab::DC.format,
    datatype: [RDF::XSD.token]
  objects_for(
    repo, subject, predicate, only: :literal, datatype: datatype) do |o|
    t = o.object
    t =~ /\// ? RDF::SAK::MimeMagic.new(t.to_s.downcase) : nil
  end.compact.sort.uniq
end
#get_base(elem, default: nil, coerce: nil) ⇒ nil, ...
Returns the base URI from the perspective of the given element. Can optionally be coerced into either a URI or RDF::URI. Also takes a default value.
# File 'lib/rdf/sak/util.rb', line 1698
def get_base elem, default: nil, coerce: nil
  assert_uri_coercion coerce

  if elem.document?
    elem = elem.root
    return unless elem
  end

  # get the xpath
  xpath = (elem.namespace && elem.namespace.href == XHTMLNS or
    elem.at_xpath('/html')) ? :htmlbase : :xmlbase

  # now we go looking for the attribute
  if base = elem.at_xpath(XPATH[xpath], XPATHNS)
    base = base.value.strip
  else
    base = default.to_s.strip if default
  end

  # clear it out if it's the empty string
  base = nil if base and base.empty?

  # eh that's about all the input sanitation we're gonna get
  base && coerce ? URI_COERCIONS[coerce].call(base) : base
end
#get_prefixes(elem, traverse: true, coerce: nil, descend: false) ⇒ Hash
The descend: true parameter assumes we are trying to collect all the namespaces in use in the entire subtree, rather than resolve any particular CURIE. As such, the first prefix mapping in document order is preserved over subsequent/descendant ones.
Given an X(HT)ML element, returns a hash of prefixes of the form { prefix: "vocab" }, where the current @vocab, if any is declared, is represented by the nil key. An optional :traverse parameter can be set to false to prevent ascending the node tree. Any XML namespace declarations are superseded by the @prefix attribute.
# File 'lib/rdf/sak/util.rb', line 1743
def get_prefixes elem, traverse: true, coerce: nil, descend: false
  coerce = assert_uri_coercion coerce

  # deal with a common phenomenon
  elem = elem.root if elem.is_a? Nokogiri::XML::Document

  # get namespace definitions first
  prefix = elem.namespaces.reject do |k, _|
    k == 'xmlns'
  end.transform_keys { |k| k.split(?:)[1].to_sym }

  # now do the prefix attribute
  if elem.key? 'prefix'
    # XXX note this assumes largely that the input is clean
    elem['prefix'].strip.split.each_slice(2) do |k, v|
      pfx = k.split(?:)[0] or next # otherwise error
      prefix[pfx.to_sym] = v
    end
  end

  # encode the vocab as the null prefix
  if vocab = elem['vocab']
    vocab.strip!
    # note that a specified but empty @vocab means kill any existing vocab
    prefix[nil] = vocab.empty? ? nil : vocab
  end

  # don't forget we can coerce
  prefix.transform_values! { |v| URI_COERCIONS[coerce].call v } if coerce

  # don't proceed if `traverse` is false
  return prefix unless traverse

  # save us having to recurse in ruby by using xpath implemented in c
  xpath = '%s::*[namespace::*|@prefix|@vocab]' %
    (descend ? :descendant : :ancestor)
  elem.xpath(xpath).each do |e|
    # this will always merge our prefix on top irrespective of direction
    prefix = get_prefixes(e, traverse: false, coerce: coerce).merge prefix
  end

  prefix
end
#invert_struct(struct) ⇒ Object
# File 'lib/rdf/sak/util.rb', line 2000
def invert_struct struct
  nodes = {}

  struct.each do |p, v|
    v.each do |o|
      nodes[o] ||= Set.new
      nodes[o] << p
    end
  end

  nodes
end
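This method carries no description, but the listing shows it turning a { predicate => [objects] } hash into { object => Set[predicates] }. A small standalone sketch of the same inversion follows; plain strings stand in for RDF terms, and `invert` is a hypothetical name.

```ruby
require 'set'

# Hypothetical stand-in for the inversion: map each object back to
# the set of predicates that point at it.
def invert struct
  struct.each_with_object({}) do |(pred, objs), out|
    objs.each { |o| (out[o] ||= Set.new) << pred }
  end
end

struct   = { 'dct:title' => ['Hello'], 'rdfs:label' => ['Hello', 'Hi'] }
inverted = invert struct
```

After this runs, `inverted['Hello']` holds both predicates and `inverted['Hi']` holds only `rdfs:label`.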
#modernize(doc) ⇒ Object
# File 'lib/rdf/sak/util.rb', line 1809
def modernize doc
  doc.xpath(XPATH[:modernize], XPATHNS).each do |e|
    # gotta instance_exec because `markup` is otherwise unbound
    instance_exec e, &MODERNIZE[e.name.to_sym]
  end
end
#node_matches?(node, spec) ⇒ Boolean
# File 'lib/rdf/sak/util.rb', line 420
def node_matches? node, spec
  spec.any? { |k| node.send NTESTS[k] }
end
#predicate_set(predicates, seen: Set.new) ⇒ Array
Expand a predicate, or a set of predicates, to include all equivalent properties and, recursively, their subproperties.
# File 'lib/rdf/sak/util.rb', line 532
def predicate_set predicates, seen: Set.new
  predicates = Set[predicates] if predicates.is_a? RDF::URI
  unless predicates.is_a? Set
    raise "predicates must be a set" unless predicates.respond_to? :to_set
    predicates = predicates.to_set
  end

  # shortcut
  return predicates if predicates.empty?

  raise 'predicates must all be RDF::URI' unless predicates.all? do |p|
    p.is_a? RDF::URI
  end

  # first we generate the set of equivalent properties for the given
  # properties
  predicates += predicates.map do |p|
    p.entail :equivalentProperty
  end.flatten.to_set

  # then we take the resulting set of properties and
  # compute their subproperties
  subp = Set.new
  (predicates - seen).each do |p|
    subp += p.subProperty.flatten.to_set
  end

  # uhh this whole "seen" business might not be necessary
  predicates + predicate_set(subp - predicates - seen, seen: predicates)
end
#prefix_subset(prefixes, nodes) ⇒ Hash
Given a hash of prefixes and an array of nodes, obtain the subset of prefixes that abbreviate the nodes. Scans RDF URIs as well as RDF::Literal datatypes.
# File 'lib/rdf/sak/util.rb', line 1961
def prefix_subset prefixes, nodes
  prefixes = sanitize_prefixes prefixes, true

  raise 'nodes must be arrayable' unless nodes.respond_to? :to_a

  # sniff out all the URIs and datatypes
  resources = Set.new
  nodes.each do |n|
    next unless n.is_a? RDF::Term
    if n.literal? && n.datatype?
      resources << n.datatype
    elsif n.uri?
      resources << n
    end
  end

  # now we abbreviate all the resources
  pfx = abbreviate(resources.to_a,
    prefixes: prefixes, noop: false, sort: false).uniq.compact.map do |p|
    p.split(?:).first.to_sym
  end.uniq.to_set

  # now we return the subset
  prefixes.select { |k, _| pfx.include? k.to_sym }
end
#prepare_collation(struct) {|p, o| ... } ⇒ Hash
Given a structure of the form { predicate => [objects] }, rearrange the structure into one more amenable to rendering RDFa. Returns a hash of the form { resources: { r1 => Set[p1, pn] }, literals: { l1 => Set[p2, pm] }, types: Set[t1, tn], datatypes: Set[d1, dn] }. This inverted structure can then be conveniently traversed to generate the RDFa. An optional block lets us examine the predicate-object pairs as they go by.
# File 'lib/rdf/sak/util.rb', line 1922
def prepare_collation struct, &block
  resources = {}
  literals  = {}
  datatypes = Set.new
  types     = Set.new

  struct.each do |p, v|
    v.each do |o|
      block.call p, o if block

      if o.literal?
        literals[o] ||= Set.new
        literals[o].add p
        # collect the datatype
        datatypes.add o.datatype if o.has_datatype?
      else
        if p == RDF::RDFV.type
          # separate the type
          types.add o
        else
          # collect the resource
          resources[o] ||= Set.new
          resources[o].add p
        end
      end
    end
  end

  { resources: resources, literals: literals,
    datatypes: datatypes, types: types }
end
#rehydrate(doc, graph, &block) ⇒ Object
(maybe add code/kbd/samp/var/time one day too)
# File 'lib/rdf/sak/util.rb', line 1842
def rehydrate doc, graph, &block
  doc.xpath(XPATH[:rehydrate], XPATHNS).each do |e|
    lang = e.xpath(XPATH[:lang]).to_s.strip
    # dt   = e['datatype'] # XXX no datatype rn
    text = (e['content'] || e.xpath('.//text()').to_a.join).strip

    # now we have the literal
    lit = [RDF::Literal(text)]
    lit.unshift RDF::Literal(text, language: lang) unless lang.empty?

    # candidates
    cand = {}
    lit.map do |t|
      graph.query(object: t).to_a
    end.flatten.each do |x|
      y = cand[x.subject] ||= {}
      (y[:stmts] ||= []) << x
      y[:types] ||= graph.query([x.subject, RDF.type, nil]).objects.sort
    end

    # if there's only one candidate, this is basically a noop
    chosen = cand.keys.first if cand.size == 1

    # call the block to reconcile any gaps or conflicts
    if block_given? and cand.size != 1
      # the block is expected to return one of the candidates or
      # nil. we call the block with the graph so that the block can
      # manipulate its contents.
      chosen = block.call cand, graph
      raise ArgumentError, 'block must return nil or a term' unless
        chosen.nil? or chosen.is_a? RDF::Term
    end

    if chosen
      # we assume this has been retrieved from the graph
      cc = cand[chosen]
      unless cc
        cc = cand[chosen] = {}
        cc[:stmts] = graph.query([chosen, nil, lit[0]]).to_a.sort
        cc[:types] = graph.query([chosen, RDF.type, nil]).objects.sort
        # if either of these are empty then the graph was not
        # appropriately populated
        # (double quotes so that the interpolation actually happens)
        raise "Missing a statement relating #{chosen} to #{text}" if
          cc[:stmts].empty?
      end

      # we should actually probably move any prefix/vocab/xmlns
      # declarations from the inner node to the outer one (although
      # in practice this will be an unlikely configuration)
      pfx = get_prefixes e

      # here we have pretty much everything except for the prefixes
      # and wherever we want to actually link to.

      inner = e.dup
      spec  = { [inner] => :a, href: '' }
      # we should have types
      spec[:typeof] = abbreviate cc[:types], prefixes: pfx unless
        cc[:types].empty?

      markup replace: e, spec: spec
    end
  end
  # return maybe the elements that did/didn't get changed?
end
#reindent(node, depth = 0, indent = ' ') ⇒ Object
reindent text nodes
# File 'lib/rdf/sak/util.rb', line 1318
def reindent node, depth = 0, indent = ' '
  kids = node.children
  if kids and child = kids.first
    loop do
      if child.element?
        # recurse into the element
        reindent child, depth + 1, indent
      elsif child.text?
        text = child.content || ''

        # optional horizontal whitespace followed by at least
        # one newline (we don't care what kind), followed by
        # optional horizontal or vertical whitespace
        preamble = !!text.gsub!(/\A[ \t]*[\r\n]+\s*/, '')

        # then we don't care what's in the middle, but hey let's get
        # rid of dos newlines because we can always put them back
        # later if we absolutely have to
        text.gsub!(/\r+/, '')

        # then optionally any whitespace followed by at least
        # another newline again, followed by optional horizontal
        # whitespace and then the end of the string
        epilogue = !!text.gsub!(/\s*[\r\n]+[ \t]*\z/, '')

        # if we prune these off we'll have a text node that is
        # either the empty string or it isn't (note we will only
        # register an epilogue if the text has some non-whitespace
        # in it, because otherwise the first regex would have
        # snagged everything, so it's probably redundant)

        # if it's *not* empty then we *prepend* indented whitespace
        if preamble and !text.empty?
          d = depth + (child.previous ? 1 : 0)
          text = "\n" + (indent * d) + text
        end

        # then we unconditionally *append*, (modulo there being a
        # newline in the original at all), but we have to check by
        # how much: if this is *not* the last node then depth + 1,
        # otherwise depth
        if preamble or epilogue
          d = depth + (child.next ? 1 : 0)
          text << "\n" + (indent * d)
        end

        child.content = text
      end

      break unless child = child.next
    end
  end

  node
end
#resolve_curie(curie, prefixes: {}, vocab: nil, base: nil, refnode: nil, term: false, noop: true, scalar: false, coerce: nil) ⇒ nil, ...
:vocab overrides, and is the same as supplying prefixes[nil]. It is only meaningful when :term (i.e., when we expect the input to be an RDFa term) is true.
Resolve a string or array or attribute node containing one or more terms/CURIEs against a set of prefixes. The CURIE can be a string, Nokogiri::XML::Attr, or an array thereof. Strings are stripped and split on whitespace. :prefixes and :base can be supplied or gleaned from :refnode, which itself can be gleaned if curie is a Nokogiri::XML::Attr. Returns an array of (attempted) resolved terms unless :scalar is true, in which case only the first URI is returned. When :noop is true, this method will always return a value: a CURIE whose prefix is unknown is passed through unchanged rather than replaced by nil. Can coerce results to either RDF::URI or URI objects.
# File 'lib/rdf/sak/util.rb', line 1577
def resolve_curie curie, prefixes: {}, vocab: nil, base: nil,
    refnode: nil, term: false, noop: true, scalar: false, coerce: nil
  prefixes = sanitize_prefixes prefixes

  raise 'coerce must be either :uri or :rdf' if coerce and
    not %i[uri rdf].include?(coerce)

  # coerce curie to its value and set refnode if not present
  if curie.is_a? Nokogiri::XML::Attr
    refnode ||= curie.parent
    curie = curie.value.strip.split
  elsif curie.respond_to? :to_a
    curie = curie.to_a
    raise ArgumentError,
      'if curie is an array, it has to be all strings' unless
      curie.all? { |x| x.respond_to? :to_s }
    curie = curie.map { |x| x.to_s.strip.split }.flatten
  else
    raise ArgumentError, 'curie must be stringable' unless
      curie.respond_to? :to_s
    curie = curie.to_s.strip.split
  end

  if refnode
    raise ArgumentError, 'refnode must be an element' unless
      refnode.is_a? Nokogiri::XML::Element
    prefixes = get_prefixes refnode if prefixes.empty?
  end

  # now we overwrite the vocab
  if vocab
    raise ArgumentError, 'vocab must be stringable' unless
      vocab.respond_to? :to_s
    prefixes[nil] = vocab.to_s.strip
  end

  out = curie.map do |c|
    prefix, slug = /^\[?(?:([^:]+):)?(.*?)\]?$/.match(c).captures
    prefix = prefix.to_sym if prefix
    tmp = if prefixes[prefix]
            prefixes[prefix] + slug
          else
            noop ? c : nil
          end
    tmp && coerce ? URI_COERCIONS[coerce].call(tmp) : tmp
  end

  scalar ? out.first : out
end
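The core of the resolution is the regular expression that separates prefix from slug, followed by a lookup in the prefix map. The sketch below isolates just that step for a single CURIE. `expand_curie` and `PREFIXES` are hypothetical names, and the reference-node lookup, array handling and coercion are all omitted.

```ruby
# Illustrative prefix map; the nil key plays the role of @vocab.
PREFIXES = {
  dct: 'http://purl.org/dc/terms/',
  nil => 'http://schema.org/',
}

# Hypothetical single-CURIE resolver using the same pattern as the
# listing. Unknown prefixes pass through unless noop is false.
def expand_curie curie, prefixes, noop: true
  curie = curie.to_s.strip
  prefix, slug = /^\[?(?:([^:]+):)?(.*?)\]?$/.match(curie).captures
  prefix = prefix.to_sym if prefix
  ns = prefixes[prefix]
  ns ? ns + slug : (noop ? curie : nil)
end

title   = expand_curie 'dct:title', PREFIXES
safe    = expand_curie '[dct:title]', PREFIXES
term    = expand_curie 'name', PREFIXES
unknown = expand_curie 'foo:bar', PREFIXES
dropped = expand_curie 'foo:bar', PREFIXES, noop: false
```

Note that a bare term with no colon falls through to the nil key, which is why supplying :vocab has the same effect as setting that key.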
#smush_struct(struct) ⇒ Object
turns any data structure into a set of nodes
# File 'lib/rdf/sak/util.rb', line 1988
def smush_struct struct
  out = Set.new

  if struct.is_a? RDF::Term
    out << struct
  elsif struct.respond_to? :to_a
    out |= struct.to_a.map { |s| smush_struct(s).to_a }.flatten.to_set
  end

  out
end
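The recursion is easier to see with the RDF::Term test swapped for an arbitrary predicate. In this sketch, `collect_leaves` is a hypothetical generalization in which a block decides what counts as a node; anything else that responds to `to_a` is descended into, and everything else is dropped.

```ruby
require 'set'

# Hypothetical generalization: gather every element the block accepts
# from an arbitrarily nested structure into a Set.
def collect_leaves struct, &leaf
  out = Set.new
  if leaf.call struct
    out << struct
  elsif struct.respond_to? :to_a
    struct.to_a.each { |s| out |= collect_leaves(s, &leaf) }
  end
  out
end

nested = { a: [:x, [:y, :z]], b: :x }
nodes  = collect_leaves(nested) { |n| n.is_a? Symbol }
```

Because a hash converts to an array of key-value pairs, both keys and values are visited, and the duplicate `:x` collapses in the set.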
#split_pp(uri, only: false) ⇒ Array
Given a URI as input, split any path parameters out of the last path segment. Works the same way as #split_qp.
# File 'lib/rdf/sak/util.rb', line 1437
def split_pp uri, only: false
  begin
    u = (uri.is_a?(URI) ? uri : URI(uri_pp uri.to_s)).normalize
  rescue URI::InvalidURIError => e
    # these stock error messages don't even tell you what the uri is
    raise URI::InvalidURIError, "#{e.message} (#{uri.to_s})"
  end

  return only ? [] : [uri] unless u.path
  uri = u

  ps = uri.path.split '/', -1
  pp = ps.pop.split ';', -1
  bp = (ps + [pp.shift]).join '/'
  uri = uri.dup

  begin
    uri.path = bp
  rescue URI::InvalidURIError => e
    # these stock error messages don't even tell you what the uri is
    m = e.message
    raise URI::InvalidURIError, "#{m} (#{uri.to_s}, #{bp})"
  end

  return pp if only
  [uri] + pp
end
#split_pp2(path, only: false) ⇒ Object
# File 'lib/rdf/sak/util.rb', line 1466
def split_pp2 path, only: false
  # ugh apparently we need a special case for ''.split
  return only ? [] : [''] if !path or path.empty?

  ps = path.to_s.split ?/, -1   # path segments
  pp = ps.pop.to_s.split ?;, -1 # path parameters
  bp = (ps + [pp.shift]).join ?/ # base path

  only ? pp : [bp] + pp
end
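The same splitting logic can be run on a bare path string outside the module. The sketch below reproduces it under a hypothetical name, `split_path_params`, to show the destructuring idiom the return shape allows.

```ruby
# Hypothetical standalone copy of the path-parameter split: the base
# path comes first, followed by each ;-delimited parameter.
def split_path_params path, only: false
  return only ? [] : [''] if path.nil? || path.empty?

  segments = path.split '/', -1             # path segments
  params   = segments.pop.to_s.split ';', -1 # parameters on the last one
  base     = (segments + [params.shift]).join '/'

  only ? params : [base] + params
end

base, *params = split_path_params '/a/b;v=1;lang=en'
just_params   = split_path_params '/a/b;v=1;lang=en', only: true
```

Only the last segment is inspected, so a semicolon in an earlier segment stays part of the base path.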
#split_qp(uri, only: false) ⇒ Array
Given a URI as input, split any query parameters into an array of key-value pairs. If :only is true, this will just return the pairs. Otherwise it will prepend the query-less URI to the array, and can be captured with an idiom like uri, *qp = split_qp uri.
# File 'lib/rdf/sak/util.rb', line 1422
def split_qp uri, only: false
  uri = URI(uri_pp uri.to_s) unless uri.is_a? URI
  # the query is nil when the URI has none, which decode_www_form rejects
  qp = URI::decode_www_form(uri.query.to_s)
  return qp if only
  uri.query = nil
  [uri] + qp
end
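The capture idiom mentioned in the description can be demonstrated with the standard library alone. This is a sketch under a hypothetical name, `split_query`; unlike the listing it skips the `uri_pp` escaping pass and always works on a fresh copy, so a URI object passed in by the caller is not modified.

```ruby
require 'uri'

# Hypothetical stand-in: return the query-less URI followed by the
# decoded key-value pairs, or just the pairs when only is true.
def split_query uri, only: false
  uri   = URI(uri.to_s) # re-parse so the caller's object is untouched
  pairs = URI.decode_www_form uri.query.to_s
  return pairs if only
  uri.query = nil
  [uri] + pairs
end

uri, *qp = split_query 'http://example.com/search?q=rdf&lang=en'
none     = split_query 'http://example.com/', only: true
```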
#subject_for(node, prefixes: nil, base: nil, coerce: :rdf) ⇒ URI, ...
Given an X(HT)ML element, return the nearest RDFa subject. Optionally takes :prefixes and :base parameters which override anything found in the document tree.
# File 'lib/rdf/sak/util.rb', line 1797
def subject_for node, prefixes: nil, base: nil, coerce: :rdf
  assert_xml_node node
  coerce = assert_uri_coercion coerce

  if n = node.at_xpath(XPATH[:literal])
    return internal_subject_for n,
      prefixes: prefixes, base: base, coerce: coerce
  end

  internal_subject_for node, prefixes: prefixes, base: base, coerce: coerce
end
#subtree(doc, xpath = '/*', reindent: true, prefixes: {}) ⇒ Object
isolate an element into a new document
# File 'lib/rdf/sak/util.rb', line 1297
def subtree doc, xpath = '/*', reindent: true, prefixes: {}
  # at this time we shouldn't try to do anything cute with the xpath
  # even though it is attractive to want to prune out prefixes

  # how about we start with a noop
  return doc.root.dup if xpath == '/*'

  begin
    nodes = doc.xpath xpath, prefixes
    return unless
      nodes and nodes.is_a?(Nokogiri::XML::NodeSet) and !nodes.empty?
    out = Nokogiri::XML::Document.new
    out << nodes.first.dup
    reindent out.root if reindent
    out
  rescue Nokogiri::SyntaxError
    return
  end
end
#terminal_slug(uri, base: nil) ⇒ String
Get the last non-empty path segment of the URI
# File 'lib/rdf/sak/util.rb', line 1537
def terminal_slug uri, base: nil
  uri = coerce_resource uri, base
  return unless uri.respond_to? :path
  if p = uri.path
    if p = /^\/+(.*?)\/*$/.match(p)
      if p = p[1].split(/\/+/).last
        # we need to escape colons or it will think it's absolute
        return uri_pp(p.split(/;+/).first || '', ':')
      end
    end
  end
  ''
end
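A rough approximation of the slug extraction, using only the standard `uri` library, looks like this. `last_segment` is a hypothetical name, and the sketch leaves out both base resolution and the colon escaping that the listing performs.

```ruby
require 'uri'

# Hypothetical helper: last non-empty path segment, minus any path
# parameters, or the empty string if there is none.
def last_segment uri
  path = URI(uri.to_s).path or return ''
  seg  = path.split('/').reject(&:empty?).last or return ''
  seg.split(';').first || ''
end

trailing = last_segment 'http://example.com/a/b/'
params   = last_segment 'http://example.com/a/c;v=1'
root     = last_segment 'http://example.com/'
```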
#title_tag(predicates, content, prefixes: {}, vocab: nil, lang: nil, xhtml: true) ⇒ Object
# File 'lib/rdf/sak/util.rb', line 2013
def title_tag predicates, content,
    prefixes: {}, vocab: nil, lang: nil, xhtml: true

  # begin with the tag
  tag = { '#title' => content.to_s,
    property: abbreviate(predicates, prefixes: prefixes, vocab: vocab) }

  # we set the language if it exists and is different from the
  # body OR if it is xsd:string we set it to the empty string
  lang = (content.language? && content.language != lang ?
    content.language : nil) || (content.datatype == RDF::XSD.string &&
    lang ? '' : nil)
  if lang
    tag['xml:lang'] = lang if xhtml
    tag[:lang] = lang
  end
  if content.datatype? && content.datatype != RDF::XSD.string
    tag[:datatype] = abbreviate(content.datatype,
      prefixes: prefixes, vocab: vocab)
  end

  tag
end
#type_strata(rdftype) ⇒ Array
Obtain a stack of types for an asserted initial type or set thereof. Returns an array of arrays, where the first is the asserted types and their inferred equivalents, and subsequent elements are immediate superclasses and their equivalents. A given URI will only appear once in the entire structure.
# File 'lib/rdf/sak/util.rb', line 458
def type_strata rdftype
  # first we coerce this to an array
  if rdftype.respond_to? :to_a
    rdftype = rdftype.to_a
  else
    rdftype = [rdftype]
  end

  # now squash and coerce
  rdftype = rdftype.uniq.map { |t| RDF::Vocabulary.find_term t }.compact

  # bail out early
  return [] if rdftype.empty?

  # essentially what we want to do is construct a layer of
  # asserted classes and their inferred equivalents, then probe
  # the classes in the first layer for subClassOf assertions,
  # which will form the second layer, and so on.

  queue  = [rdftype]
  strata = []
  seen   = Set.new

  while qin = queue.shift
    qwork = []

    qin.each do |q|
      qwork << q # entail doesn't include q
      qwork += q.entail(:equivalentClass) if q.uri?
    end

    # grep and flatten
    qwork = qwork.map do |t|
      next t if t.is_a? RDF::Vocabulary::Term
      RDF::Vocabulary.find_term t
    end.compact.uniq - seen.to_a
    seen |= qwork

    # warn "qwork == #{qwork.inspect}"

    # push current layer out
    strata.push qwork.dup unless qwork.empty?

    # now deal with subClassOf
    qsuper = []
    qwork.each { |q| qsuper += q.subClassOf }

    # grep and flatten this too
    qsuper = qsuper.map do |t|
      next t if t.is_a? RDF::Vocabulary::Term
      RDF::Vocabulary.find_term t
    end.compact.uniq - seen.to_a
    # do not append qsuper to seen!

    # warn "qsuper == #{qsuper.inspect}"

    # same deal, conditionally push the input queue
    queue.push qsuper.dup unless qsuper.empty?
  end

  # voila
  strata
end
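The layering described above is a breadth-first walk up the class hierarchy in which each class appears only once. The sketch below shows the idea on a made-up hierarchy: `SUPERCLASSES` and `strata_for` are hypothetical, plain strings stand in for vocabulary terms, and equivalent-class entailment is left out entirely.

```ruby
require 'set'

# Illustrative hierarchy only; not drawn from any real vocabulary.
SUPERCLASSES = {
  'ex:Essay'    => ['ex:Article'],
  'ex:Article'  => ['ex:Document'],
  'ex:Memo'     => ['ex:Document'],
  'ex:Document' => [],
}

# Hypothetical helper: return one array per layer, starting with the
# asserted types and climbing one superclass step at a time.
def strata_for types
  seen   = Set.new
  strata = []
  layer  = Array(types).uniq
  until layer.empty?
    layer = layer.reject { |t| seen.include? t }
    break if layer.empty?
    seen.merge layer
    strata << layer
    layer = layer.flat_map { |t| SUPERCLASSES.fetch(t, []) }.uniq
  end
  strata
end

strata = strata_for ['ex:Essay', 'ex:Memo']
```

In this example `ex:Document` is reached from both `ex:Memo` and `ex:Article`, but it lands in the second layer only, since a class is placed in the first layer that reaches it.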
#uri_pp(uri, extra = '') ⇒ Object
really gotta stop carting this thing around
# File 'lib/rdf/sak/util.rb', line 1393
def uri_pp uri, extra = ''
  # take care of malformed escapes
  uri = uri.to_s.b.gsub(/%(?![0-9A-Fa-f]{2})/n, '%25')
  uri.gsub!(/([#{Regexp.quote extra}])/) do |s|
    sprintf('%%%02X', s.ord)
  end unless extra.empty?
  # we want the minimal amount of escaping so we split out the separators
  out = ''
  parts = RFC3986.match(uri).captures
  parts.each_index do |i|
    next if parts[i].nil?
    out << SEPS[i].first
    out << parts[i].b.gsub(SF) { |s| sprintf('%%%02X', s.ord) }
    out << SEPS[i].last
  end

  # make sure escaped hex is upper case like the rfc says
  out.gsub(/(%[0-9A-Fa-f]{2})/) { |x| x.upcase }
end
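Two of the steps in this method are self-contained enough to try on their own: repairing a stray percent sign and upper-casing the hex digits of existing escapes. The sketch below covers only those two steps under a hypothetical name, `tidy_escapes`; the per-component escaping that relies on the module's RFC3986, SEPS and SF constants is not reproduced.

```ruby
# Hypothetical helper covering the first and last steps of the listing.
def tidy_escapes str
  # a % not followed by two hex digits is a malformed escape
  str = str.to_s.gsub(/%(?![0-9A-Fa-f]{2})/, '%25')
  # well-formed escapes get upper-case hex digits
  str.gsub(/%[0-9A-Fa-f]{2}/) { |x| x.upcase }
end

tidied = tidy_escapes '100%/a%2fb'
```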