Class: Bio::PubMed
- Inherits: NCBI::REST (Object > NCBI::REST > Bio::PubMed)
- Defined in: lib/bio/io/pubmed.rb
Overview
Description
The Bio::PubMed class provides several ways to retrieve bibliographic information from the PubMed database at
http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed
Two types of queries are possible:
- searching for PubMed IDs given a query string:
  - Bio::PubMed#esearch (recommended)
  - Bio::PubMed#search (retrieves only the top 20 hits)
- retrieving the MEDLINE text (i.e. authors, journal, abstract, ...) given a PubMed ID:
  - Bio::PubMed#efetch (recommended)
  - Bio::PubMed#query (unstable, because it depends on the HTML design of the PubMed pages)
  - Bio::PubMed#pmfetch (still working, but could be made obsolete by NCBI)
The methods within each group are interchangeable and should return equivalent results (apart from the 20-hit limit of #search).
Additional information about the MEDLINE format and the PubMed programmable APIs can be found on the following websites:
- PubMed Overview: http://www.ncbi.nlm.nih.gov/entrez/query/static/overview.html
- PubMed help: http://www.ncbi.nlm.nih.gov/entrez/query/static/help/pmhelp.html
- Entrez utilities index: http://www.ncbi.nlm.nih.gov/entrez/utils/utils_index.html
- How to link: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helplinks.chapter.linkshelp
Usage
require 'bio'
# If you don't know the PubMed ID, search for it:
Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics").each do |x|
p x
end
Bio::PubMed.search("(genome AND analysis) OR bioinformatics").each do |x|
p x
end
# To retrieve the MEDLINE entry for a given PubMed ID:
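# efetch takes a single ID or an Array of IDs; a second argument would be treated as the options hash
puts Bio::PubMed.efetch(["10592173", "14693808"])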
puts Bio::PubMed.query("10592173")
puts Bio::PubMed.pmfetch("10592173")
# This can be converted into a Bio::MEDLINE object:
manuscript = Bio::PubMed.query("10592173")
medline = Bio::MEDLINE.new(manuscript)
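The recommended methods (esearch, efetch) talk to NCBI's E-utilities. As a stdlib-only sketch of what such requests look like, the helpers below build esearch and efetch URLs without sending anything. The base URL https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ and the db/term/id/rettype/retmode/retmax parameter names follow NCBI's public E-utilities conventions; the helper names are hypothetical, and this is not how BioRuby assembles its own requests internally.

```ruby
require "uri"

# Hypothetical helpers (not part of BioRuby): build E-utilities URLs
# similar in spirit to the requests behind Bio::PubMed#esearch/#efetch.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Build an esearch URL for a PubMed query string; extra options are
# appended after the defaults and override them on key collision.
def esearch_url(term, options = {})
  params = { "db" => "pubmed", "term" => term }.merge(options)
  URI(EUTILS_BASE + "esearch.fcgi?" + URI.encode_www_form(params))
end

# Build an efetch URL; accepts one ID or an Array of IDs, which are
# joined with commas (the comma is percent-encoded as %2C).
def efetch_url(ids, options = {})
  params = { "db" => "pubmed", "rettype" => "medline", "retmode" => "text",
             "id" => Array(ids).join(",") }.merge(options)
  URI(EUTILS_BASE + "efetch.fcgi?" + URI.encode_www_form(params))
end

puts esearch_url("(genome AND analysis) OR bioinformatics", "retmax" => 100)
puts efetch_url(["10592173", "14693808"])
```

Fetching a URL (for example with Net::HTTP.get) is left out on purpose; NCBI's usage policies on request rates apply to any direct access.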
Constant Summary
Constants inherited from NCBI::REST
Class Method Summary
- + (Object) efetch(*args)
- + (Object) esearch(*args)
- + (Object) pmfetch(*args)
- + (Object) query(*args)
- + (Object) search(*args)
Instance Method Summary
- (Object) efetch(ids, hash = {})
  Retrieves PubMed entries by PMID using Entrez efetch and returns them as MEDLINE-formatted strings.
- (Object) esearch(str, hash = {})
  Searches the PubMed database for the given keywords using E-Utils and returns an array of PubMed IDs.
- (Object) pmfetch(id)
  Retrieves a PubMed entry by PMID using Entrez pmfetch and returns it as a MEDLINE-formatted string.
- (Object) query(*ids)
  Retrieves PubMed entries by PMID using Entrez query and returns MEDLINE-formatted text.
- (Object) search(str)
  Searches the PubMed database for the given keywords using Entrez query and returns an array of PubMed IDs.
Methods inherited from NCBI::REST
#einfo, einfo, #esearch_count, esearch_count
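Class Method Details
Each class method is a convenience wrapper: it creates a new Bio::PubMed instance and passes all of its arguments to the instance method of the same name.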
+ (Object) efetch(*args)
# File 'lib/bio/io/pubmed.rb', line 204
def self.efetch(*args)
  self.new.efetch(*args)
end
+ (Object) esearch(*args)
# File 'lib/bio/io/pubmed.rb', line 200
def self.esearch(*args)
  self.new.esearch(*args)
end
+ (Object) pmfetch(*args)
# File 'lib/bio/io/pubmed.rb', line 216
def self.pmfetch(*args)
  self.new.pmfetch(*args)
end
+ (Object) query(*args)
# File 'lib/bio/io/pubmed.rb', line 212
def self.query(*args)
  self.new.query(*args)
end
+ (Object) search(*args)
# File 'lib/bio/io/pubmed.rb', line 208
def self.search(*args)
  self.new.search(*args)
end
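The class methods above forward *args unchanged, so argument checking happens only inside the instance method. The sketch below uses a hypothetical stand-in class, FakePubMed, whose efetch has the same signature and option handling as Bio::PubMed#efetch but returns the merged options instead of contacting NCBI. It shows the delegation and why a second ID string passed to efetch fails: it lands in the hash parameter, and Hash#update rejects it.

```ruby
# Hypothetical stand-in for Bio::PubMed (no network access).
class FakePubMed
  # Same signature as Bio::PubMed#efetch; returns the merged options
  # plus a comma-joined "id" entry instead of fetching anything.
  def efetch(ids, hash = {})
    opts = { "db" => "pubmed", "rettype" => "medline" }
    opts.update(hash) # raises TypeError if hash is not a Hash
    opts.merge("id" => Array(ids).join(","))
  end

  # Same shape as the class-method wrappers: new instance, forward all args.
  def self.efetch(*args)
    new.efetch(*args)
  end
end

p FakePubMed.efetch(["10592173", "14693808"], "retmode" => "xml")

begin
  FakePubMed.efetch("10592173", "14693808") # second ID becomes the options hash
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```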
Instance Method Details
- (Object) efetch(ids, hash = {})
Retrieves PubMed entries by PMID using Entrez efetch and returns them as MEDLINE-formatted strings. Multiple PubMed IDs can be provided:
Bio::PubMed.efetch(123)
Bio::PubMed.efetch([123,456,789])
Arguments:
- ids: a single PubMed ID or a list of PubMed IDs (required)
- hash: hash of E-Utils options
  - retmode: "xml", "html", ...
  - rettype: "medline", ...
  - retmax: integer (default 100)
  - retstart: integer
  - field
  - reldate
  - mindate
  - maxdate
  - datetype
Returns: Array of MEDLINE-formatted Strings when retmode is unset or "text"; otherwise the unsplit response String.
# File 'lib/bio/io/pubmed.rb', line 117
def efetch(ids, hash = {})
  opts = { "db" => "pubmed", "rettype" => "medline" }
  opts.update(hash)
  result = super(ids, opts)
  if !opts["retmode"] or opts["retmode"] == "text"
    result = result.split(/\n\n+/)
  end
  result
end
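The post-processing step in efetch can be tried in isolation. This sketch copies its condition and split: in text mode (retmode unset or "text") the raw response is cut at runs of blank lines, one MEDLINE record per element, which can then be handed to Bio::MEDLINE.new as in the Usage section. The function name split_medline and the sample records are made up for illustration, and, like the original, it assumes records never contain blank lines internally.

```ruby
# Hypothetical helper mirroring the tail of Bio::PubMed#efetch.
def split_medline(raw, opts = {})
  if !opts["retmode"] || opts["retmode"] == "text"
    raw.split(/\n\n+/) # one element per MEDLINE record
  else
    raw # e.g. XML is returned as a single String
  end
end

raw = "PMID- 10592173\nTI  - First title\n\nPMID- 14693808\nTI  - Second title\n"
records = split_medline(raw)
puts records.size
records.each { |r| puts r.lines.first }
```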
- (Object) esearch(str, hash = {})
Searches the PubMed database for the given keywords using E-Utils and returns an array of PubMed IDs.
For information on the possible arguments, see eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html#PubMed
Arguments:
- str: query string (required)
- hash: hash of E-Utils options
  - retmode: "xml", "html", ...
  - rettype: "medline", ...
  - retmax: integer (default 100)
  - retstart: integer
  - field
  - reldate
  - mindate
  - maxdate
  - datetype
Returns: an Array of PubMed IDs, or the number of results
# File 'lib/bio/io/pubmed.rb', line 93
def esearch(str, hash = {})
  opts = { "db" => "pubmed" }
  opts.update(hash)
  super(str, opts)
end
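esearch starts from { "db" => "pubmed" } and merges the caller's hash on top, so caller keys win. A natural use of the retstart and retmax options listed above is paging through a large hit list. The hypothetical helper esearch_pages below only builds the per-page option hashes, merged the same way; it assumes retstart/retmax are passed through to E-utilities as documented above, which should be checked against NCBI::REST#esearch before relying on it.

```ruby
# Hypothetical helper: option hashes for paging through `total` hits,
# `per_page` at a time. Defaults come first, then `extra`, then the
# paging keys, mirroring the "defaults, then caller hash" order of esearch.
def esearch_pages(total, per_page = 100, extra = {})
  (0...total).step(per_page).map do |start|
    opts = { "db" => "pubmed" }
    opts.update(extra)
    opts.update("retstart" => start, "retmax" => per_page)
  end
end

esearch_pages(250, 100).each { |o| p o }
```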
- (Object) pmfetch(id)
Retrieves a PubMed entry by PMID using Entrez pmfetch and returns it as a MEDLINE-formatted string.
Arguments:
- id: PubMed ID (required)
Returns: MEDLINE-formatted String
# File 'lib/bio/io/pubmed.rb', line 183
def pmfetch(id)
  host = "www.ncbi.nlm.nih.gov"
  path = "/entrez/utils/pmfetch.fcgi?tool=bioruby&mode=text&report=medline&db=PubMed&id="
  ncbi_access_wait
  http = Bio::Command.new_http(host)
  response = http.get(path + CGI.escape(id.to_s))
  result = response.body
  if result =~ /#{id}\s+Error/
    raise( result )
  else
    result = result.gsub("\r", "\n").squeeze("\n").gsub(/<\/?pre>/, '')
    return result
  end
end
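The response handling in pmfetch can be exercised offline. This sketch applies the same steps to a string: a body matching "<id> ... Error" is raised as an error; otherwise carriage returns become newlines, runs of newlines collapse, and <pre>/</pre> tags are removed. The function name clean_pmfetch_body and the sample body are illustrative only.

```ruby
# Hypothetical helper reproducing the body cleanup in Bio::PubMed#pmfetch.
def clean_pmfetch_body(id, body)
  raise body if body =~ /#{id}\s+Error/ # error page: raise with the body text
  body.gsub("\r", "\n").squeeze("\n").gsub(%r{</?pre>}, "")
end

body = "<pre>\r\nPMID- 10592173\r\nTI  - Example title\r\n</pre>"
puts clean_pmfetch_body("10592173", body)
```

Because squeeze("\n") collapses every run of newlines, any blank line in the response is removed as well, which is harmless for a single record.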
- (Object) query(*ids)
Retrieves PubMed entries by PMID using Entrez query and returns MEDLINE-formatted text.
Arguments:
- ids: one or more PubMed IDs (required)
Returns: a MEDLINE-formatted String for a single ID, or an Array of such Strings when more than one ID is given
# File 'lib/bio/io/pubmed.rb', line 153
def query(*ids)
  host = "www.ncbi.nlm.nih.gov"
  path = "/sites/entrez?tool=bioruby&cmd=Text&dopt=MEDLINE&db=PubMed&uid="
  list = ids.collect { |x| CGI.escape(x.to_s) }.join(",")
  ncbi_access_wait
  http = Bio::Command.new_http(host)
  response = http.get(path + list)
  result = response.body
  result = result.scan(/<pre>\s*(.*?)<\/pre>/m).flatten
  if result =~ /id:.*Error occurred/
    # id: xxxxx Error occurred: Article does not exist
    raise( result )
  else
    if ids.size > 1
      return result
    else
      return result.first
    end
  end
end
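The extraction step in query pulls every <pre>...</pre> block out of the HTML page and returns either the first block or all of them, depending on how many IDs were requested. In the listing above, result is already an Array when the error regex is applied; Array has no regex match of its own, so on older Rubies that check appears never to fire, and on Ruby 3.2+ (where Object#=~ was removed) it would raise NoMethodError. The sketch below, with the hypothetical name extract_medline, runs the check on the joined text instead; it is an illustration, not a patch to BioRuby.

```ruby
# Hypothetical helper: extract MEDLINE text from <pre> blocks in an HTML page.
def extract_medline(ids, html)
  records = html.scan(%r{<pre>\s*(.*?)</pre>}m).flatten
  text = records.join("\n")
  raise text if text =~ /id:.*Error occurred/ # check a String, not the Array
  ids.size > 1 ? records : records.first
end

html = "<html><pre>\nPMID- 1\nTI  - A\n</pre><pre>PMID- 2\nTI  - B\n</pre></html>"
p extract_medline(["1", "2"], html)
p extract_medline(["1"], html)
```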
- (Object) search(str)
Searches the PubMed database for the given keywords using Entrez query and returns an array of PubMed IDs. Caution: this method returns only the first 20 hits; using the esearch method instead is strongly recommended.
Arguments:
- str: query string (required)
Returns: array of PubMed IDs
# File 'lib/bio/io/pubmed.rb', line 134
def search(str)
  host = "www.ncbi.nlm.nih.gov"
  path = "/sites/entrez?tool=bioruby&cmd=Search&doptcmdl=Brief&db=PubMed&term="
  ncbi_access_wait
  http = Bio::Command.new_http(host)
  response = http.get(path + CGI.escape(str))
  result = response.body
  result = result.scan(/value="(\d+)" id="UidCheckBox"/m).flatten
  return result
end
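search works by scraping the result page: it collects the digits from every value="..." attribute that is immediately followed by id="UidCheckBox". The sketch below applies the same regular expression to a hand-written HTML fragment (the markup is an assumption modeled on that regex, not captured NCBI output). It also shows the fragility noted in the overview: if the page markup changes, the method silently returns an empty array rather than raising.

```ruby
# Hypothetical helper using the same pattern as Bio::PubMed#search.
def scrape_pmids(html)
  html.scan(/value="(\d+)" id="UidCheckBox"/m).flatten
end

html = <<~HTML
  <input type="checkbox" value="10592173" id="UidCheckBox">
  <input type="checkbox" value="14693808" id="UidCheckBox">
HTML
p scrape_pmids(html)
p scrape_pmids("<p>new page layout</p>") # markup changed: no IDs found
```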