Module: RDF::Util::File

Defined in:
lib/rdf/util/file.rb

Overview

Wrapper for retrieving RDF resources from HTTP(S) and file: scheme locations.

By default, HTTP(S) resources are retrieved using Net::HTTP. However, if the [Rest Client](rubygems.org/gems/rest-client) gem is loaded, it is used instead. Combined with [REST Client Components](rubygems.org/gems/rest-client-components), this allows sophisticated HTTP caching through `Rack::Cache`, which avoids unnecessary network access.

To use other HTTP clients, consumers can subclass HttpAdapter and register the subclass with File.http_adapter=.

Also supports the file: scheme for access to local files.

Since:

  • 0.2.4

Defined Under Namespace

Classes: FaradayAdapter, HttpAdapter, NetHttpAdapter, RemoteDocument, RestClientAdapter

Class Method Details

.http_adapter(use_net_http = false) ⇒ HttpAdapter

Get the current HTTP adapter. If no adapter has been explicitly set, RestClientAdapter is used when RestClient is loaded; otherwise NetHttpAdapter is used.

Parameters:

  • use_net_http (Boolean) (defaults to: false)

    use the NetHttpAdapter, even if other adapters have been configured

Returns:

  • (HttpAdapter)

Since:

  • 1.2



# File 'lib/rdf/util/file.rb', line 244

def http_adapter(use_net_http = false)
  if use_net_http
    NetHttpAdapter
  else
    @http_adapter ||= begin
      # Prefer RestClient when it is loaded; otherwise fall back to Net::HTTP
      if defined?(RestClient)
        RestClientAdapter
      else
        NetHttpAdapter
      end
    end
  end
end

.http_adapter=(http_adapter) ⇒ HttpAdapter

Set the HTTP adapter

Parameters:

  • http_adapter (HttpAdapter)

Returns:

  • (HttpAdapter)

Since:

  • 1.2



# File 'lib/rdf/util/file.rb', line 232

def http_adapter=(http_adapter)
  @http_adapter = http_adapter
end
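
The getter and setter together behave like a small memoized registry: an explicit assignment wins, a default is chosen on first use, and `use_net_http` bypasses both. The following is a minimal, self-contained sketch of that pattern, not the library's code. `AdapterRegistry`, `FakeNetHttp`, `FakeRestClient`, and `CustomAdapter` are hypothetical stand-ins for the real classes under RDF::Util::File.

```ruby
# Stand-in adapter classes (hypothetical; not part of RDF.rb).
class FakeNetHttp; end
class FakeRestClient; end
class CustomAdapter; end

module AdapterRegistry
  class << self
    # Explicitly set the adapter, overriding automatic selection.
    attr_writer :http_adapter

    # Return the configured adapter, choosing a default on first use.
    # Passing true bypasses any configured adapter.
    def http_adapter(use_net_http = false)
      return FakeNetHttp if use_net_http

      # Memoize the default: prefer the RestClient stand-in only if the
      # RestClient constant has been loaded.
      @http_adapter ||= defined?(::RestClient) ? FakeRestClient : FakeNetHttp
    end
  end
end

default    = AdapterRegistry.http_adapter        # chosen automatically
AdapterRegistry.http_adapter = CustomAdapter     # explicit override
configured = AdapterRegistry.http_adapter        # now the custom adapter
forced     = AdapterRegistry.http_adapter(true)  # ignores the override
```

Because the default is memoized, loading RestClient after the first call would not change the selected adapter in this sketch; an explicit assignment would.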

.open_file(filename_or_url, proxy: nil, headers: {}, verify_none: false, **options) {|RemoteDocument| ... } ⇒ RemoteDocument, Object

Open the file, returning or yielding RemoteDocument.

Input received in a non-Unicode encoding is transcoded to UTF-8. With Ruby >= 2.2, all Unicode input is also normalized to [Unicode Normalization Form C (NFC)](unicode.org/reports/tr15/#Norm_Forms).
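
As an illustration of what transcoding followed by NFC normalization means, the sketch below uses only core Ruby `String` methods. The helper name `to_utf8_nfc` is hypothetical and is not how the library structures this step.

```ruby
# Reinterpret raw bytes in their source encoding, transcode to UTF-8,
# then normalize to NFC. (to_utf8_nfc is a hypothetical helper name.)
def to_utf8_nfc(bytes, source_encoding)
  text = bytes.dup.force_encoding(source_encoding)
  text.encode(Encoding::UTF_8).unicode_normalize(:nfc)
end

latin1     = "caf\xE9".b     # "café" as ISO-8859-1 bytes
decomposed = "cafe\u0301"    # "e" followed by a combining acute accent

from_latin1     = to_utf8_nfc(latin1, Encoding::ISO_8859_1)
from_decomposed = to_utf8_nfc(decomposed, Encoding::UTF_8)

# Both inputs end up as the same four-character UTF-8 string.
same = (from_latin1 == from_decomposed)
```

Normalizing to NFC means that visually identical strings compare equal regardless of how the source document composed its accented characters.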

HTTP resources may be retrieved through a proxy using the `proxy` option. If `RestClient` is loaded, the proxy is configured globally, with something like the following:

`RestClient.proxy = "http://proxy.example.com/"`

When retrieving documents over HTTP(S), use the mechanism described in [Providing and Discovering URI Documentation](www.w3.org/2001/tag/awwsw/issue57/latest/) to pass the appropriate `base_uri` to the block or as the return.

Applications needing HTTP caching may consider [Rest Client](rubygems.org/gems/rest-client) together with [REST Client Components](rubygems.org/gems/rest-client-components), which allow `Rack::Cache` to be used as a local file cache.

Examples:

using a local HTTP cache

require 'restclient/components'
require 'rack/cache'
RestClient.enable Rack::Cache
RDF::Util::File.open_file("http://example.org/some/resource")
  # => Cached resource if current, otherwise returned resource

Parameters:

  • filename_or_url (String)

    to open

  • proxy (String) (defaults to: nil)

    HTTP Proxy to use for requests.

  • headers (Hash) (defaults to: {})

    HTTP Request headers

    The `Accept` header defaults to a value built from the content types of the available readers, which allows content negotiation based on those readers.

    The `User-Agent` header is given a default value unless one is specified.

  • verify_none (Boolean) (defaults to: false)

    Don’t verify SSL certificates

  • options (Hash{Symbol => Object})

    Additional options. For HTTP(S) URLs they are passed to the adapter's `open_url`, and `:use_net_http` forces the NetHttpAdapter; when opening a file, they are passed to `Kernel.open`. Applications are encouraged to override this implementation to provide more control over HTTP headers and redirect following.
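
The header defaulting described for `headers` can be pictured as a merge in which caller-supplied values take precedence. This is a sketch only: the constant values and the `with_default_headers` helper are hypothetical, and the real defaults are computed by the library from the loaded readers.

```ruby
# Hypothetical default values, chosen for illustration only.
DEFAULT_ACCEPT     = "text/turtle, application/n-triples;q=0.9, */*;q=0.1"
DEFAULT_USER_AGENT = "ExampleClient/0.1"

# Return a copy of the caller's headers with missing defaults filled in.
# Caller-supplied values always win over the defaults.
def with_default_headers(headers = {})
  { "Accept" => DEFAULT_ACCEPT, "User-Agent" => DEFAULT_USER_AGENT }.merge(headers)
end

defaults_only = with_default_headers
custom_agent  = with_default_headers("User-Agent" => "MyCrawler/2.0")
```

Building a new hash instead of mutating the argument keeps the caller's `headers` untouched between requests.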

Yields:

  • (RemoteDocument)

Yield Returns:

  • (Object)

    returned from open_file

Returns:

  • (RemoteDocument, Object)

Raises:

  • (IOError)

    if not found

Since:

  • 0.2.4



# File 'lib/rdf/util/file.rb', line 299

def self.open_file(filename_or_url, proxy: nil, headers: {}, verify_none: false, **options, &block)
  filename_or_url = $1 if filename_or_url.to_s.match(/^file:(.*)$/)
  remote_document = nil

  if filename_or_url.to_s.match?(/^https?/)
    base_uri = filename_or_url.to_s

    remote_document = self.http_adapter(!!options[:use_net_http]).
      open_url(base_uri,
               proxy:       proxy,
               headers:     headers,
               verify_none: verify_none,
               **options)
  else
    # Fake content type based on found format
    format = RDF::Format.for(filename_or_url.to_s)
    content_type = format ? format.content_type.first : 'text/plain'
    # Open as a file, passing any options
    begin
      url_no_frag_or_query = RDF::URI(filename_or_url).dup
      url_no_frag_or_query.query = nil
      url_no_frag_or_query.fragment = nil
      options[:encoding] ||= Encoding::UTF_8
      Kernel.open(url_no_frag_or_query, "r", **options) do |file|
        document_options = {
          base_uri:     filename_or_url.to_s,
          charset:      file.external_encoding.to_s,
          code:         200,
          content_type: content_type,
          last_modified:file.mtime,
          headers:      {content_type: content_type, last_modified: file.mtime.xmlschema}
        }

        remote_document = RemoteDocument.new(file.read, document_options)
      end
    rescue Errno::ENOENT => e
      raise IOError, e.message
    end
  end

  if block_given?
    yield remote_document
  else
    remote_document
  end
end
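
The local-file branch above can be hard to follow in isolation, so here is a simplified sketch of the same steps using only the standard library: strip the `file:` prefix, drop any query or fragment, read the file as UTF-8, translate a missing file into `IOError`, and either yield or return the result. `LocalDocument` and `open_local` are hypothetical stand-ins for RemoteDocument and `open_file`, and stdlib `URI` is used here in place of RDF::URI.

```ruby
require "tempfile"
require "uri"

# Hypothetical stand-in for RemoteDocument: the body plus a few metadata fields.
LocalDocument = Struct.new(:body, :base_uri, :charset, :last_modified)

# Open a local path or file: URL, yielding the document if a block is given.
def open_local(filename_or_url, &block)
  location = filename_or_url.to_s.sub(/\Afile:/, "")
  # Query and fragment are not part of the on-disk path.
  path = URI.parse(location).path

  document = File.open(path, "r", encoding: Encoding::UTF_8) do |file|
    LocalDocument.new(file.read, location, file.external_encoding.to_s, file.mtime)
  end

  block ? block.call(document) : document
rescue Errno::ENOENT => e
  # Surface a missing file as IOError, as the listing above does.
  raise IOError, e.message
end

tmp = Tempfile.new(["example", ".ttl"])
tmp.write("<a> <b> <c> .\n")
tmp.flush

returned = open_local("file:#{tmp.path}#frag")              # returns the document
yielded  = open_local(tmp.path) { |doc| doc.body.bytesize } # returns the block's value
missing  = begin
  open_local("/no/such/file.ttl")
rescue IOError => e
  e.class
end
```

With a block, the caller receives whatever the block returns; without one, the document itself comes back, which mirrors the `RemoteDocument, Object` return type.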