Class: OAI::Client

Inherits:
Object
Defined in:
lib/oai/client.rb

Overview

An OAI::Client provides a client API for issuing OAI-PMH verbs against an OAI-PMH server. The six OAI-PMH verbs translate directly to methods you can call on an OAI::Client object. Verb arguments are passed as a hash:

```ruby
client = OAI::Client.new 'http://www.pubmedcentral.gov/oai/oai.cgi'
record = client.get_record :identifier => 'oai:pubmedcentral.gov:13901'
for identifier in client.list_identifiers
  puts identifier
end
```

Note that the API uses method and parameter names with underscores rather than studly caps: list_identifiers and metadata_prefix are used instead of the ListIdentifiers verb and metadataPrefix argument named in the OAI-PMH specification.
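The naming rule can be illustrated with a small helper. This is only a sketch: `oai_param_name` is a hypothetical name, not part of OAI::Client, and the library may perform the conversion differently.

```ruby
# Hypothetical helper (not part of the oai gem): convert an underscored
# Ruby option name into the camelCase argument name used by OAI-PMH.
def oai_param_name(key)
  head, *rest = key.to_s.split('_')
  head + rest.map(&:capitalize).join
end

puts oai_param_name(:metadata_prefix)   # metadataPrefix
puts oai_param_name(:resumption_token)  # resumptionToken
puts oai_param_name(:identifier)        # identifier
```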

Also, the from and until arguments which specify dates should be passed in as Date or DateTime objects depending on the granularity supported by the server.
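OAI-PMH defines two date granularities, YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ. The sketch below shows how a Date and a DateTime map onto them. `oai_date` is a hypothetical helper written for illustration; it is not the serialization code the client itself uses.

```ruby
require 'date'

# Hypothetical helper (not part of the oai gem): render a date argument in
# the OAI-PMH granularity that matches its class.
def oai_date(value)
  case value
  when DateTime
    # DateTime must be tested first because it is a subclass of Date.
    # Seconds granularity is always expressed in UTC.
    value.new_offset(0).strftime('%Y-%m-%dT%H:%M:%SZ')
  when Date
    value.strftime('%Y-%m-%d')
  else
    raise ArgumentError, "expected Date or DateTime, got #{value.class}"
  end
end

puts oai_date(Date.new(2020, 1, 2))                # 2020-01-02
puts oai_date(DateTime.new(2020, 1, 2, 3, 4, 5))   # 2020-01-02T03:04:05Z
```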

For detailed information on the arguments that can be used please consult the OAI-PMH docs at <www.openarchives.org/OAI/openarchivesprotocol.html>.


Constructor Details

#initialize(base_url, options = {}) ⇒ Client

The constructor must be passed a valid base URL for an OAI service:

client = OAI::Client.new 'http://www.pubmedcentral.gov/oai/oai.cgi'

If you want to see debugging messages on STDERR use:

client = OAI::Client.new 'http://example.com', :debug => true

By default, OAI verbs called on the client return REXML::Element objects for metadata records. To use libxml instead and get back XML::Node objects, pass the :parser option:

client = OAI::Client.new 'http://example.com', :parser => 'libxml'

You can configure the Faraday HTTP client by providing an alternate Faraday instance:

```ruby
client = OAI::Client.new 'http://example.com', :http => Faraday.new { |c| }
```

### HIGH PERFORMANCE

If you want to supercharge this API, install `libxml-ruby >= 0.3.8` and use the :parser option when you construct your OAI::Client.



# File 'lib/oai/client.rb', line 86

def initialize(base_url, options={})
  @base = URI.parse base_url
  @debug = options.fetch(:debug, false)
  @parser = options.fetch(:parser, 'rexml')
  @headers = options.fetch(:headers, {})

  @http_client = options.fetch(:http) do
    Faraday.new(:url => @base.clone) do |builder|
      follow_redirects = options.fetch(:redirects, true)
      follow_redirects = 5 if follow_redirects == true

      if follow_redirects
        require 'faraday_middleware'
        builder.response :follow_redirects, :limit => follow_redirects.to_i
      end
      builder.adapter :net_http
    end
  end

  # load appropriate parser
  case @parser
  when 'libxml'
    begin
      require 'rubygems'
      require 'xml/libxml'
    rescue
      raise OAI::Exception.new("xml/libxml not available")
    end
  when 'rexml'
    require 'rexml/document'
    require 'rexml/xpath'
  else
    raise OAI::Exception.new("unknown parser: #{@parser}")
  end
end
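The source above also reads a :redirects option that the prose does not describe. The following sketch isolates that logic so its three cases are easy to see. `redirect_limit` is a hypothetical helper name; only the two `follow_redirects` lines are taken from the constructor.

```ruby
# Hypothetical helper isolating the :redirects handling from the constructor:
# true (the default) means "follow up to 5 redirects", an Integer sets the
# limit, and false or nil disables redirect following.
def redirect_limit(options = {})
  follow_redirects = options.fetch(:redirects, true)
  follow_redirects = 5 if follow_redirects == true
  follow_redirects ? follow_redirects.to_i : nil
end

p redirect_limit                         # 5
p redirect_limit(:redirects => 2)        # 2
p redirect_limit(:redirects => false)    # nil
```

Note that the default Faraday connection, including the redirect middleware, is only built when no :http option is supplied, because it lives inside the block passed to `options.fetch(:http)`.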

Instance Method Details

#get_record(opts = {}) ⇒ Object

Equivalent to a GetRecord request. You must supply an :identifier argument. You should get back an OAI::GetRecordResponse object, from which you can extract an OAI::Record object.



# File 'lib/oai/client.rb', line 154

def get_record(opts={})
  OAI::GetRecordResponse.new(do_request('GetRecord', opts))
end
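For orientation, an OAI-PMH GetRecord request carries the verb and its arguments as request parameters against the base URL. The sketch below builds the equivalent GET URL with the standard library only. It does not call OAI::Client, and the `oai_dc` metadata prefix is an assumed value chosen for illustration.

```ruby
require 'uri'

# Illustration only: build the query string for a GetRecord request by hand.
base = URI.parse('http://www.pubmedcentral.gov/oai/oai.cgi')
params = {
  'verb'           => 'GetRecord',
  'identifier'     => 'oai:pubmedcentral.gov:13901',
  'metadataPrefix' => 'oai_dc' # assumed value; the protocol requires a prefix
}

request_uri = base.dup
request_uri.query = URI.encode_www_form(params) # percent-encodes the colons
puts request_uri
```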

#identify ⇒ Object

Equivalent to an Identify request. You'll get back an OAI::IdentifyResponse object, which is essentially a wrapper around a REXML::Document for the response. If you created your client using the libxml parser, you will get an XML::Node object instead.



# File 'lib/oai/client.rb', line 127

def identify
  OAI::IdentifyResponse.new(do_request('Identify'))
end

#list_identifiers(opts = {}) ⇒ Object

Equivalent to a ListIdentifiers request. Pass in :from, :until arguments as Date or DateTime objects as appropriate depending on the granularity supported by the server.

You can use seamless resumption with this verb, which allows you to mitigate (to some extent) the lack of a Count verb:

client.list_identifiers.full.count # Don't try this on PubMed though!


# File 'lib/oai/client.rb', line 147

def list_identifiers(opts={})
  do_resumable(OAI::ListIdentifiersResponse, 'ListIdentifiers', opts)
end

#list_metadata_formats(opts = {}) ⇒ Object

Equivalent to a ListMetadataFormats request. A ListMetadataFormatsResponse object is returned to you.



# File 'lib/oai/client.rb', line 134

def list_metadata_formats(opts={})
  OAI::ListMetadataFormatsResponse.new(do_request('ListMetadataFormats', opts))
end

#list_records(opts = {}) ⇒ Object

Equivalent to the ListRecords request. A ListRecordsResponse will be returned, which you can use to iterate through records:

response = client.list_records
response.each do |record|
  puts record.metadata
end

Alternately, you can use seamless resumption to avoid handling resumption tokens:

client.list_records.full.each do |record|
  puts record.metadata
end

### Memory Use

Using full avoids storing more than one page of records in memory, but you can use it in ways that override that behaviour. Be careful to avoid client.list_records.full.entries unless you really want to hold all the records in the feed in memory!



# File 'lib/oai/client.rb', line 178

def list_records(opts={})
  do_resumable(OAI::ListRecordsResponse, 'ListRecords', opts)
end
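The memory behaviour described for seamless resumption can be sketched with a lazy, page-at-a-time iterator. Everything here is hypothetical: `LazyFeed`, `PAGES` and the token values are stand-ins invented for the example, and this is not how the library implements full.

```ruby
# Fake three-page feed keyed by resumption token; nil marks the first request.
PAGES = {
  nil  => { :records => %w[r1 r2], :token => 't1' },
  't1' => { :records => %w[r3 r4], :token => 't2' },
  't2' => { :records => %w[r5],    :token => nil }
}

# Hypothetical stand-in for a resumable response: holds one page at a time
# and requests the next page only when the current one is exhausted.
class LazyFeed
  include Enumerable

  def initialize(fetch_page)
    @fetch_page = fetch_page # callable taking a resumption token
  end

  def each
    return enum_for(:each) unless block_given?

    token = nil
    loop do
      page = @fetch_page.call(token)
      page[:records].each { |record| yield record }
      token = page[:token]
      break if token.nil?
    end
  end
end

requested = []
feed = LazyFeed.new(lambda { |token| requested << token; PAGES.fetch(token) })

p feed.first(3)   # ["r1", "r2", "r3"]
p requested       # [nil, "t1"]
```

Taking the first three records touches only two pages, whereas calling `entries` or `to_a` on such a feed walks every page and keeps every record in memory at once.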

#list_sets(opts = {}) ⇒ Object

Equivalent to the ListSets request. A ListSetsResponse object will be returned, which you can use for iterating through the OAI::Set objects:

for set in client.list_sets
  puts set
end

A large number of sets is not unusual for some OAI-PMH feeds, so using seamless resumption may be preferable:

client.list_sets.full.each do |set|
  puts set
end


# File 'lib/oai/client.rb', line 196

def list_sets(opts={})
  do_resumable(OAI::ListSetsResponse, 'ListSets', opts)
end