Class: Bio::PubMed

Inherits:
NCBI::REST
Defined in:
lib/bio/io/pubmed.rb

Overview

Description

The Bio::PubMed class provides several ways to retrieve bibliographic information from the PubMed database at

http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed

Basically, two types of queries are possible:

  • searching for PubMed IDs given a query string:

    • Bio::PubMed#esearch (recommended)

    • Bio::PubMed#search (only retrieves top 20 hits)

  • retrieving the MEDLINE text (authors, journal, abstract, …) given a PubMed ID

    • Bio::PubMed#efetch (recommended)

    • Bio::PubMed#query (unstable: it parses an HTML page and breaks whenever the page design changes)

    • Bio::PubMed#pmfetch (still working, but may be retired by NCBI)

The different methods within the same group are interchangeable and should return the same result.

Additional information about the MEDLINE format and PubMed programmable APIs can be found on the following websites:

  • PubMed Overview:

    http://www.ncbi.nlm.nih.gov/entrez/query/static/overview.html
    
  • PubMed help:

    http://www.ncbi.nlm.nih.gov/entrez/query/static/help/pmhelp.html
    
  • Entrez utilities index:

    http://www.ncbi.nlm.nih.gov/entrez/utils/utils_index.html
    
  • How to link:

    http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helplinks.chapter.linkshelp
    

Usage

require 'bio'

# If you don't know the PubMed ID:
Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics").each do |x|
  p x
end

Bio::PubMed.search("(genome AND analysis) OR bioinformatics").each do |x|
  p x
end

# To retrieve the MEDLINE entry for a given PubMed ID:
puts Bio::PubMed.efetch(["10592173", "14693808"])  # several IDs go in one Array; the second argument is an options hash
puts Bio::PubMed.query("10592173")
puts Bio::PubMed.pmfetch("10592173")

# This can be converted into a Bio::MEDLINE object:
manuscript = Bio::PubMed.query("10592173")
medline = Bio::MEDLINE.new(manuscript)

Constant Summary

Constants inherited from NCBI::REST

NCBI::REST::NCBI_INTERVAL

Class Method Summary

Instance Method Summary

Methods inherited from NCBI::REST

#einfo, einfo, #esearch_count, esearch_count

Class Method Details

.efetch(*args) ⇒ Object



# File 'lib/bio/io/pubmed.rb', line 204

def self.efetch(*args)
  self.new.efetch(*args)
end

.esearch(*args) ⇒ Object



# File 'lib/bio/io/pubmed.rb', line 200

def self.esearch(*args)
  self.new.esearch(*args)
end

.pmfetch(*args) ⇒ Object



# File 'lib/bio/io/pubmed.rb', line 216

def self.pmfetch(*args)
  self.new.pmfetch(*args)
end

.query(*args) ⇒ Object



# File 'lib/bio/io/pubmed.rb', line 212

def self.query(*args)
  self.new.query(*args)
end

.search(*args) ⇒ Object



# File 'lib/bio/io/pubmed.rb', line 208

def self.search(*args)
  self.new.search(*args)
end
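
All five class methods above follow the same pattern: build a fresh instance and forward the arguments unchanged. The sketch below illustrates that pattern with a hypothetical stand-in class (FakeClient and its fetch method are not part of BioRuby), so it runs without network access:

```ruby
# Hypothetical stand-in class; only the delegation pattern mirrors the listings above.
class FakeClient
  # Class-level shortcut: create a throwaway instance and forward all arguments.
  def self.fetch(*args)
    self.new.fetch(*args)
  end

  # Instance method that does the actual work (here it just echoes its input).
  def fetch(ids, opts = {})
    { ids: Array(ids), opts: opts }
  end
end

via_class    = FakeClient.fetch([1, 2], "retmode" => "xml")
via_instance = FakeClient.new.fetch([1, 2], "retmode" => "xml")
```

Both calls give the same result, which is why Bio::PubMed.efetch(...) and Bio::PubMed.new.efetch(...) can be used interchangeably.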

Instance Method Details

#efetch(ids, hash = {}) ⇒ Object

Retrieves PubMed entries by PMID using Entrez efetch and returns them in MEDLINE format. One or more PubMed IDs can be provided:

Bio::PubMed.efetch(123)
Bio::PubMed.efetch([123,456,789])

Arguments:

  • ids: list of PubMed IDs (required)

  • hash: hash of E-Utils options

    • retmode: “xml”, “html”, …

    • rettype: “medline”, …

    • retmax: integer (default 100)

    • retstart: integer

    • field

    • reldate

    • mindate

    • maxdate

    • datetype

Returns

Array of MEDLINE-formatted Strings when retmode is unset or "text"; otherwise the unsplit response



# File 'lib/bio/io/pubmed.rb', line 117

def efetch(ids, hash = {})
  opts = { "db" => "pubmed", "rettype"  => "medline" }
  opts.update(hash)
  result = super(ids, opts)
  if !opts["retmode"] or opts["retmode"] == "text"
    result = result.split(/\n\n+/)
  end
  result
end
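
To make the return value concrete, here is a minimal offline sketch of the post-processing step in the listing above. The sample response and the helper name split_records are assumptions for illustration; only the retmode test and the split on blank lines are taken from the source.

```ruby
# Hypothetical two-record response in MEDLINE text format (heavily abbreviated).
raw = "PMID- 111\nTI  - First title\n\nPMID- 222\nTI  - Second title\n"

# Same rule as the listing: split on runs of blank lines,
# but only when retmode is unset or "text".
def split_records(raw, opts)
  if !opts["retmode"] || opts["retmode"] == "text"
    raw.split(/\n\n+/)
  else
    raw
  end
end

records = split_records(raw, {})                       # Array, one String per entry
unsplit = split_records(raw, { "retmode" => "xml" })   # response left as a single String
```

Callers that pass a retmode other than "text" should therefore expect a single String rather than an Array.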

#esearch(str, hash = {}) ⇒ Object

Searches the PubMed database for the given keywords using E-Utils and returns an array of PubMed IDs.

For information on the possible arguments, see http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html#PubMed


Arguments:

  • str: query string (required)

  • hash: hash of E-Utils options

    • retmode: “xml”, “html”, …

    • rettype: “medline”, …

    • retmax: integer (default 100)

    • retstart: integer

    • field

    • reldate

    • mindate

    • maxdate

    • datetype

Returns

array of PubMed IDs or a number of results



# File 'lib/bio/io/pubmed.rb', line 93

def esearch(str, hash = {})
  opts = { "db" => "pubmed" }
  opts.update(hash)
  super(str, opts)
end
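
The options hash is merged into the defaults with Hash#update, so the key type matters. The sketch below shows plain Ruby behaviour only; the values are illustrative, and how the parent NCBI::REST#esearch treats unknown or Symbol keys is not shown in this document.

```ruby
defaults = { "db" => "pubmed" }

# String keys override or extend the defaults, as in opts.update(hash).
with_strings = defaults.dup.update("retmax" => 5)

# A Symbol key is a different Hash key, so it sits beside the String default
# instead of replacing it.
with_symbols = defaults.dup.update(db: "other")
```

Given that the defaults use String keys, passing options as "retmax" => 5 rather than retmax: 5 is the safer choice.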

#pmfetch(id) ⇒ Object

Retrieves a PubMed entry by PMID using Entrez pmfetch and returns it as a MEDLINE-formatted string.


Arguments:

  • id: PubMed ID (required)

Returns

MEDLINE formatted String



# File 'lib/bio/io/pubmed.rb', line 183

def pmfetch(id)
  host = "www.ncbi.nlm.nih.gov"
  path = "/entrez/utils/pmfetch.fcgi?tool=bioruby&mode=text&report=medline&db=PubMed&id="

  ncbi_access_wait

  http = Bio::Command.new_http(host)
  response = http.get(path + CGI.escape(id.to_s))
  result = response.body
  if result =~ /#{id}\s+Error/
    raise( result )
  else
    result = result.gsub("\r", "\n").squeeze("\n").gsub(/<\/?pre>/, '')
    return result
  end
end
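
The clean-up chain at the end of pmfetch can be tried offline. The response body below is a hypothetical example (the real pmfetch output is not reproduced in this document); the three string operations are copied from the listing.

```ruby
# Hypothetical response body: CR/LF line endings wrapped in <pre> tags.
body = "<pre>\r\nPMID- 111\r\nTI  - Example title\r\n\r\n</pre>"

cleaned = body.gsub("\r", "\n")      # carriage returns become newlines
              .squeeze("\n")         # runs of newlines collapse to one
              .gsub(/<\/?pre>/, '')  # opening and closing <pre> tags are dropped
```

Note that squeeze("\n") only collapses newlines, so the two spaces inside a field tag such as "TI  -" are preserved.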

#query(*ids) ⇒ Object

Retrieves PubMed entries by PMID using an Entrez query and returns them in MEDLINE format.


Arguments:

  • ids: one or more PubMed IDs (required)

Returns

MEDLINE-formatted String for a single ID, or an Array of Strings when several IDs are given



# File 'lib/bio/io/pubmed.rb', line 153

def query(*ids)
  host = "www.ncbi.nlm.nih.gov"
  path = "/sites/entrez?tool=bioruby&cmd=Text&dopt=MEDLINE&db=PubMed&uid="
  list = ids.collect { |x| CGI.escape(x.to_s) }.join(",")

  ncbi_access_wait

  http = Bio::Command.new_http(host)
  response = http.get(path + list)
  result = response.body
  result = result.scan(/<pre>\s*(.*?)<\/pre>/m).flatten

  if result =~ /id:.*Error occurred/
    # id: xxxxx Error occurred: Article does not exist
    raise( result )
  else
    if ids.size > 1
      return result
    else
      return result.first
    end
  end
end
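
The following offline sketch walks through the two string-handling steps of query: building the comma-separated ID list and extracting the pre blocks. The page body is a hypothetical example. In the listing, result is already an Array when it is tested with =~, so the sketch checks each extracted String instead; that per-element check is an assumed alternative, not the library's code.

```ruby
require 'cgi'

# IDs may be Integers or Strings; each is escaped, then joined with commas.
ids  = [10592173, "14693808"]
list = ids.collect { |x| CGI.escape(x.to_s) }.join(",")

# Hypothetical page body with one <pre> block per requested ID.
body = "<html><pre>\nPMID- 111\n</pre>" \
       "<pre>\nid: 999 Error occurred: Article does not exist\n</pre></html>"
blocks = body.scan(/<pre>\s*(.*?)<\/pre>/m).flatten

# Assumed alternative to the listing's check: test each String, not the Array.
failed = blocks.select { |text| text =~ /id:.*Error occurred/ }
```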

#search(str) ⇒ Object

Searches the PubMed database for the given keywords using an Entrez query and returns an array of PubMed IDs. Caution: this method returns only the first 20 hits. Using the ‘esearch’ method instead is strongly recommended.


Arguments:

  • str: query string (required)

Returns

array of PubMed IDs



# File 'lib/bio/io/pubmed.rb', line 134

def search(str)
  host = "www.ncbi.nlm.nih.gov"
  path = "/sites/entrez?tool=bioruby&cmd=Search&doptcmdl=Brief&db=PubMed&term="

  ncbi_access_wait

  http = Bio::Command.new_http(host)
  response = http.get(path + CGI.escape(str))
  result = response.body
  result = result.scan(/value="(\d+)" id="UidCheckBox"/m).flatten
  return result
end
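
The ID extraction in search relies on a single regular expression over the returned HTML. A minimal sketch, assuming a page fragment shaped like the one below (the real markup is not shown in this document and may differ):

```ruby
# Hypothetical fragment of a results page.
html = <<~HTML
  <input type="checkbox" value="10592173" id="UidCheckBox">
  <input type="checkbox" value="14693808" id="UidCheckBox">
  <input type="text" value="20" id="PageSize">
HTML

# Same pattern as the listing: capture the digits of every UidCheckBox value.
pmids = html.scan(/value="(\d+)" id="UidCheckBox"/m).flatten
```

The IDs come back as Strings, and any change to the attribute order or the id value in the page would make the pattern miss, which is one reason esearch is the recommended route.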