Class: Bio::NCBI::REST

Inherits:
Object
Defined in:
lib/bio/io/ncbirest.rb

Overview

Description

The Bio::NCBI::REST class provides a REST client for the NCBI E-Utilities.

Entrez Programming Utilities Help:

Direct Known Subclasses

PubMed

Defined Under Namespace

Classes: EFetch, ESearch

Constant Summary

NCBI_INTERVAL =

Run retrieval scripts on weekends or between 9 pm and 5 am Eastern Time on weekdays for any series of more than 100 requests. (Not yet implemented in BioRuby.)

Wait 1/3 second between requests. NCBI’s restriction is: “Make no more than 3 requests every 1 second.”

1.0 / 3.0
@@last_access =
nil
@@last_access_mutex =
nil
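NCBI_INTERVAL, together with the @@last_access and @@last_access_mutex class variables, suggests that requests are spaced at least 1/3 second apart across threads; how BioRuby actually uses them is not shown on this page. The following is a minimal sketch of such a throttle, assuming a monotonic clock and one shared mutex. The class SimpleThrottle and its wait method are hypothetical and are not part of BioRuby.

```ruby
# Hypothetical throttle illustrating the NCBI_INTERVAL idea; not BioRuby's code.
class SimpleThrottle
  def initialize(interval)
    @interval = interval   # minimum seconds between two calls
    @last_access = nil     # monotonic time of the previous call
    @mutex = Mutex.new     # serializes access across threads
  end

  # Blocks until at least @interval seconds have passed since the
  # previous call, records the current time, then yields (if a block is given).
  def wait
    @mutex.synchronize do
      if @last_access
        elapsed = now - @last_access
        sleep(@interval - elapsed) if elapsed < @interval
      end
      @last_access = now
    end
    yield if block_given?
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

throttle = SimpleThrottle.new(1.0 / 3.0)
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
3.times { throttle.wait { } }   # first call is immediate, the next two wait
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
```

With three calls, the second and third each wait roughly 1/3 second, so the total is at least about 2/3 second.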

Class Method Summary

Instance Method Summary

Class Method Details

.efetch(*args) ⇒ Object



# File 'lib/bio/io/ncbirest.rb', line 391

def self.efetch(*args)
  self.new.efetch(*args)
end

.einfo ⇒ Object



# File 'lib/bio/io/ncbirest.rb', line 379

def self.einfo
  self.new.einfo
end

.esearch(*args) ⇒ Object



# File 'lib/bio/io/ncbirest.rb', line 383

def self.esearch(*args)
  self.new.esearch(*args)
end

.esearch_count(*args) ⇒ Object



# File 'lib/bio/io/ncbirest.rb', line 387

def self.esearch_count(*args)
  self.new.esearch_count(*args)
end
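Each class method above is a thin convenience wrapper: it builds a fresh instance with self.new and forwards its arguments to the instance method of the same name. The sketch below reproduces that pattern with a hypothetical stub class, FakeClient, whose esearch_count returns a canned value (the query length) instead of contacting NCBI, so it runs offline.

```ruby
# Hypothetical stand-in that mimics the delegation pattern used by
# Bio::NCBI::REST; it never contacts NCBI.
class FakeClient
  # Class-level shortcut: create a new instance and forward all arguments.
  def self.esearch_count(*args)
    self.new.esearch_count(*args)
  end

  # Instance method doing the "work" (here: a canned answer that
  # echoes the query length so the forwarded call is observable).
  def esearch_count(str, hash = {})
    str.length
  end
end

via_class    = FakeClient.esearch_count("tardigrada", { "db" => "nucleotide" })
via_instance = FakeClient.new.esearch_count("tardigrada")
```

Because every class-level call uses a new instance, any state that must persist between calls (such as a last-access time for throttling) has to live at class level, which is consistent with the @@last_access class variables listed above.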

Instance Method Details

#efetch(ids, hash = {}, step = 100) ⇒ Object

Retrieves database entries for the given IDs using the E-Utils (efetch) service.

For information on the possible arguments, see

Usage

ncbi = Bio::NCBI::REST.new
ncbi.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
ncbi.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml"})
ncbi.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

Bio::NCBI::REST.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
Bio::NCBI::REST.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb"})
Bio::NCBI::REST.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

Arguments:

  • ids: list of NCBI entry IDs (required)

  • hash: hash of E-Utils options, e.g. {“db” => “nuccore”, “rettype” => “gb”}

    • db: “sequences”, “nucleotide”, “protein”, “pubmed”, “omim”, …

    • retmode: “text”, “xml”, “html”, …

    • rettype: “gb”, “gbc”, “medline”, “count”,…

  • step: maximum number of entries retrieved at a time

Returns

String



# File 'lib/bio/io/ncbirest.rb', line 355

def efetch(ids, hash = {}, step = 100)
  serv = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
  opts = default_parameters.merge({ "retmode"  => "text" })
  opts.update(hash)

  case ids
  when Array
    list = ids
  else
    list = ids.to_s.split(/\s*,\s*/)
  end

  result = ""
  0.step(list.size, step) do |i|
    opts["id"] = list[i, step].join(',')
    unless opts["id"].empty?
      response = ncbi_post_form(serv, opts)
      result += response.body
    end
  end
  return result.strip
  #return result.strip.split(/\n\n+/)
end
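The loop above normalizes ids (an Array is used as-is; anything else is converted to a String and split on commas), then sends one request per group of at most step IDs, joining each group with commas into the “id” parameter. The sketch below reproduces only that batching and shows the form-encoded body each batch would carry, using Net::HTTP::Post#set_form_data from the Ruby standard library without sending anything. The method name efetch_batches is hypothetical; in BioRuby the request is built by the private ncbi_post_form helper together with default_parameters, neither of which is shown on this page, so their extra parameters are omitted here.

```ruby
require "net/http"
require "uri"

# Hypothetical helper mirroring the ID batching in Bio::NCBI::REST#efetch.
# Returns one form-encoded request body per batch; nothing is sent.
def efetch_batches(ids, hash = {}, step = 100)
  uri  = URI("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi")
  opts = { "retmode" => "text" }.merge(hash)   # default_parameters omitted

  # Same normalization as efetch: Array as-is, otherwise split "a, b,c".
  list = ids.is_a?(Array) ? ids : ids.to_s.split(/\s*,\s*/)

  bodies = []
  0.step(list.size, step) do |i|
    batch = list[i, step].join(",")
    # Skip the empty slice produced when list.size is a multiple of step.
    next if batch.empty?
    req = Net::HTTP::Post.new(uri)
    req.set_form_data(opts.merge("id" => batch))
    bodies << req.body
  end
  bodies
end

bodies = efetch_batches("185041, J00231,AAA52805", { "db" => "nuccore" }, 2)
```

With step = 2, the three IDs become two requests; the comma inside each “id” value is percent-encoded as %2C in the form body.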

#einfo ⇒ Object

Lists the NCBI database names using the E-Utils (einfo) service, for example:

pubmed protein nucleotide nuccore nucgss nucest structure genome
books cancerchromosomes cdd gap domains gene genomeprj gensat geo
gds homologene journals mesh ncbisearch nlmcatalog omia omim pmc
popset probe proteinclusters pcassay pccompound pcsubstance snp
taxonomy toolkit unigene unists

Usage

ncbi = Bio::NCBI::REST.new
ncbi.einfo

Bio::NCBI::REST.einfo

Returns

Array of Strings (database names)



# File 'lib/bio/io/ncbirest.rb', line 218

def einfo
  serv = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"
  opts = default_parameters.merge({})
  response = ncbi_post_form(serv, opts)
  result = response.body
  list = result.scan(/<DbName>(.*?)<\/DbName>/m).flatten
  return list
end
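einfo extracts the database names with a non-greedy regular expression over the raw XML rather than with an XML parser. The sketch below applies the same String#scan pattern to a small hand-written fragment (illustrative only, not a real NCBI response) to show what the flatten step produces.

```ruby
# Hand-written fragment shaped like an einfo response (illustrative only).
xml = <<~XML
  <eInfoResult>
    <DbList>
      <DbName>pubmed</DbName>
      <DbName>protein</DbName>
      <DbName>nuccore</DbName>
    </DbList>
  </eInfoResult>
XML

# scan with one capture group returns [["pubmed"], ["protein"], ["nuccore"]];
# flatten turns that into a plain array of strings.
names = xml.scan(/<DbName>(.*?)<\/DbName>/m).flatten
```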

#esearch(str, hash = {}, limit = nil, step = 10000) ⇒ Object

Searches the NCBI database for the given keywords using the E-Utils (esearch) service and returns an array of entry IDs.

For information on the possible arguments, see

Usage

ncbi = Bio::NCBI::REST.new
ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
ncbi.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

Arguments:

  • str: query string (required)

  • hash: hash of E-Utils options, e.g. {“db” => “nuccore”, “rettype” => “gb”}

    • db: “sequences”, “nucleotide”, “protein”, “pubmed”, “taxonomy”, …

    • retmode: “text”, “xml”, “html”, …

    • rettype: “gb”, “medline”, “count”, …

    • retmax: integer (default 100)

    • retstart: integer

    • field:

      • “titl”: Title [TI]

      • “tiab”: Title/Abstract [TIAB]

      • “word”: Text words [TW]

      • “auth”: Author [AU]

      • “affl”: Affiliation [AD]

      • “jour”: Journal [TA]

      • “vol”: Volume [VI]

      • “iss”: Issue [IP]

      • “page”: First page [PG]

      • “pdat”: Publication date [DP]

      • “ptyp”: Publication type [PT]

      • “lang”: Language [LA]

      • “mesh”: MeSH term [MH]

      • “majr”: MeSH major topic [MAJR]

      • “subh”: MeSH subheadings [SH]

      • “mhda”: MeSH date [MHDA]

      • “ecno”: EC/RN Number [RN]

      • “si”: Secondary source ID [SI]

      • “uid”: PubMed ID (PMID) [UI]

      • “fltr”: Filter [FILTER] [SB]

      • “subs”: Subset [SB]

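    • reldate: e.g. 365 (only items dated within the last N days)

    • mindate: e.g. 2001 (start of a date range)

    • maxdate: e.g. 2002/01/01 (end of a date range)

    • datetype: e.g. “edat” (the date field that reldate, mindate and maxdate refer to)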

  • limit: maximum number of entries to be returned (0 for unlimited; nil to use the “retmax” value in the hash, or the internal default of 100 if “retmax” is not given)

  • step: maximum number of entries retrieved at a time
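In the esearch source, limit is resolved by precedence: an explicit limit wins, otherwise “retmax” from the hash is used, otherwise 100; a limit of 0 is then replaced by the total hit count from esearch_count. The hypothetical resolve_limit function below restates that precedence offline, taking the hit count as a parameter instead of querying NCBI.

```ruby
# Hypothetical restatement of how esearch resolves +limit+.
# +total_count+ stands in for the value esearch_count would fetch.
def resolve_limit(limit, hash, total_count)
  limit ||= hash["retmax"].to_i if hash["retmax"]  # fall back to "retmax"
  limit ||= 100                                    # internal default
  limit = total_count if limit == 0                # 0 means "all hits"
  limit
end

resolve_limit(nil, {}, 5000)              # default
resolve_limit(nil, { "retmax" => 5 }, 5000)
resolve_limit(20, { "retmax" => 5 }, 5000)
resolve_limit(0, {}, 5000)                # unlimited
```

Because of the “retmax” fallback, passing “retmax” => 0 in the hash with limit nil also ends up meaning unlimited.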

Returns

Array of entry IDs, or the number of results (Integer) when “rettype” is “count”



# File 'lib/bio/io/ncbirest.rb', line 286

def esearch(str, hash = {}, limit = nil, step = 10000)
  serv = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  opts = default_parameters.merge({ "term" => str })
  opts.update(hash)

  case opts["rettype"]
  when "count"
    count = esearch_count(str, opts)
    return count
  else
    retstart = 0
    retstart = hash["retstart"].to_i if hash["retstart"]

    limit ||= hash["retmax"].to_i if hash["retmax"]
    limit ||= 100 # default limit is 100
    limit = esearch_count(str, opts) if limit == 0   # unlimit

    list = []
    0.step(limit, step) do |i|
      retmax = [step, limit - i].min
      opts.update("retmax" => retmax, "retstart" => i + retstart)
      response = ncbi_post_form(serv, opts)
      result = response.body
      list += result.scan(/<Id>(.*?)<\/Id>/m).flatten
    end
    return list
  end
end
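After resolving limit, esearch walks from 0 to limit in increments of step and, for each position i, requests retmax = min(step, limit - i) IDs starting at retstart + i. The hypothetical page_windows helper below computes those [retstart, retmax] pairs without contacting NCBI.

```ruby
# Hypothetical helper listing the [retstart, retmax] pairs that
# esearch's paging loop would request; nothing is sent to NCBI.
def page_windows(limit, step = 10000, retstart = 0)
  windows = []
  0.step(limit, step) do |i|
    retmax = [step, limit - i].min   # never request past +limit+
    windows << [i + retstart, retmax]
  end
  windows
end

page_windows(100)         # a single request
page_windows(25000)       # two full pages and a partial one
page_windows(20000)       # limit is a multiple of step
page_windows(5, 2, 10)    # offset by a caller-supplied retstart
```

When limit is an exact multiple of step, the last pair has retmax 0, which in the source corresponds to one trailing request that asks for zero IDs.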

#esearch_count(str, hash = {}) ⇒ Object

Arguments

Same as the esearch method (str and hash).

Returns

Number of matching entries (Integer)



# File 'lib/bio/io/ncbirest.rb', line 317

def esearch_count(str, hash = {})
  serv = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  opts = default_parameters.merge({ "term" => str })
  opts.update(hash)
  opts.update("rettype" => "count")
  response = ncbi_post_form(serv, opts)
  result = response.body
  count = result.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
  return count
end
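esearch_count forces “rettype” => “count” and reads the first <Count> element from the XML reply. The sketch below applies the same extraction to hand-written fragments (illustrative only, not real NCBI output). Should the XML contain more than one <Count> element, only the first is used; and because nil.to_i is 0, a reply without any <Count> element yields 0 rather than raising.

```ruby
# Hand-written fragment in the shape of an esearch "count" reply (illustrative only).
xml = "<eSearchResult><Count>1234</Count></eSearchResult>"

# Same extraction as esearch_count: first <Count> value, converted to Integer.
count = xml.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i

# With no <Count> element, first returns nil and nil.to_i is 0.
missing = "<eSearchResult></eSearchResult>".scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
```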