Class: Bio::PubMed
- Inherits: NCBI::REST
  - Object
  - NCBI::REST
  - Bio::PubMed
- Defined in: lib/bio/io/pubmed.rb
Overview
Description
The Bio::PubMed class provides several ways to retrieve bibliographic information from the PubMed database at
http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed
Two types of queries are possible:
- Searching for PubMed IDs given a query string:
  - Bio::PubMed#esearch (recommended)
  - Bio::PubMed#search (retrieves only the top 20 hits)
- Retrieving the MEDLINE text (i.e. authors, journal, abstract, …) given a PubMed ID:
  - Bio::PubMed#efetch (recommended)
  - Bio::PubMed#query (unstable, because it depends on the HTML design of the NCBI pages, which may change)
  - Bio::PubMed#pmfetch (still works, but NCBI may retire it)
The methods within each group are interchangeable and should return the same result.
Additional information about the MEDLINE format and PubMed programmable APIs can be found on the following websites:
- PubMed Overview: http://www.ncbi.nlm.nih.gov/entrez/query/static/overview.html
- PubMed help: http://www.ncbi.nlm.nih.gov/entrez/query/static/help/pmhelp.html
- Entrez utilities index: http://www.ncbi.nlm.nih.gov/entrez/utils/utils_index.html
- How to link: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helplinks.chapter.linkshelp
Usage
require 'bio'
# If you don't know the PubMed ID:
Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics").each do |x|
  p x
end
Bio::PubMed.search("(genome AND analysis) OR bioinformatics").each do |x|
  p x
end
# To retrieve the MEDLINE entry for a given PubMed ID
# (pass several IDs to efetch as one array; its second argument is the options hash):
puts Bio::PubMed.efetch(["10592173", "14693808"])
puts Bio::PubMed.query("10592173")
puts Bio::PubMed.pmfetch("10592173")
# This can be converted into a Bio::MEDLINE object:
manuscript = Bio::PubMed.query("10592173")
medline = Bio::MEDLINE.new(manuscript)
Constant Summary
Constants inherited from NCBI::REST
Class Method Summary
- .efetch(*args) ⇒ Object
- .esearch(*args) ⇒ Object
- .pmfetch(*args) ⇒ Object
- .query(*args) ⇒ Object
- .search(*args) ⇒ Object
Instance Method Summary
- #efetch(ids, hash = {}) ⇒ Object
  Retrieves PubMed entries by PMID using Entrez efetch and returns them in MEDLINE format.
- #esearch(str, hash = {}) ⇒ Object
  Searches the PubMed database for the given keywords using E-Utils and returns an array of PubMed IDs.
- #pmfetch(id) ⇒ Object
  Retrieves a PubMed entry by PMID using Entrez pmfetch and returns it as a MEDLINE-formatted string.
- #query(*ids) ⇒ Object
  Retrieves PubMed entries by PMID using Entrez query and returns them in MEDLINE format.
- #search(str) ⇒ Object
  Searches the PubMed database for the given keywords using Entrez query and returns an array of PubMed IDs.
Methods inherited from NCBI::REST
#einfo, einfo, #esearch_count, esearch_count
Class Method Details
.efetch(*args) ⇒ Object
# File 'lib/bio/io/pubmed.rb', line 204
def self.efetch(*args)
  self.new.efetch(*args)
end
.esearch(*args) ⇒ Object
# File 'lib/bio/io/pubmed.rb', line 200
def self.esearch(*args)
  self.new.esearch(*args)
end
.pmfetch(*args) ⇒ Object
# File 'lib/bio/io/pubmed.rb', line 216
def self.pmfetch(*args)
  self.new.pmfetch(*args)
end
.query(*args) ⇒ Object
# File 'lib/bio/io/pubmed.rb', line 212
def self.query(*args)
  self.new.query(*args)
end
.search(*args) ⇒ Object
# File 'lib/bio/io/pubmed.rb', line 208
def self.search(*args)
  self.new.search(*args)
end
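Each class method above builds a new instance and forwards its arguments to the instance method of the same name. A minimal sketch of that pattern follows; the `Greeter` class is hypothetical and not part of BioRuby.

```ruby
# Hypothetical class, used only to illustrate the delegation pattern.
class Greeter
  # Instance method that does the real work.
  def greet(name, punctuation = "!")
    "Hello, #{name}#{punctuation}"
  end

  # Class-level convenience wrapper: create an instance and forward all arguments.
  def self.greet(*args)
    new.greet(*args)
  end
end

puts Greeter.greet("PubMed")
puts Greeter.new.greet("PubMed", "?")
```

With this arrangement, callers that need no per-instance state can use the shorter class-level call, while both forms run the same code.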
Instance Method Details
#efetch(ids, hash = {}) ⇒ Object
Retrieves PubMed entries by PMID using Entrez efetch and returns them in MEDLINE format. Multiple PubMed IDs can be provided:
Bio::PubMed.efetch(123)
Bio::PubMed.efetch([123,456,789])
Arguments:
- ids: list of PubMed IDs (required)
- hash: hash of E-Utils options
  - retmode: “xml”, “html”, …
  - rettype: “medline”, …
  - retmax: integer (default 100)
  - retstart: integer
  - field
  - reldate
  - mindate
  - maxdate
  - datetype
Returns:
- Array of MEDLINE-formatted Strings when “retmode” is unset or “text”; otherwise the unsplit result of NCBI::REST#efetch
# File 'lib/bio/io/pubmed.rb', line 117
def efetch(ids, hash = {})
  opts = { "db" => "pubmed", "rettype" => "medline" }
  opts.update(hash)
  result = super(ids, opts)
  if !opts["retmode"] or opts["retmode"] == "text"
    result = result.split(/\n\n+/)
  end
  result
end
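The return value of efetch depends on the "retmode" option: the text response is split on blank lines into one string per record, and any other mode is passed through unsplit. A minimal sketch of that rule, which needs no network access, follows; `SAMPLE_RESPONSE` and `split_records` are hypothetical names, and the sample text only approximates what NCBI returns.

```ruby
# Hypothetical stand-in for a text-mode efetch response holding two records.
SAMPLE_RESPONSE = "PMID- 111\nTI  - First example title\n\nPMID- 222\nTI  - Second example title\n"

# Mirrors the condition in the listing: split only when "retmode" is unset or "text".
def split_records(text, opts = {})
  if !opts["retmode"] || opts["retmode"] == "text"
    text.split(/\n\n+/)
  else
    text
  end
end

records = split_records(SAMPLE_RESPONSE)
puts records.size
puts split_records(SAMPLE_RESPONSE, { "retmode" => "xml" }).class
```

Callers that request another mode such as "xml" should therefore expect a single unsplit value, not an array of records.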
#esearch(str, hash = {}) ⇒ Object
Search the PubMed database by given keywords using E-Utils and returns an array of PubMed IDs.
For information on the possible arguments, see eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html#PubMed
Arguments:
- str: query string (required)
- hash: hash of E-Utils options
  - retmode: “xml”, “html”, …
  - rettype: “medline”, …
  - retmax: integer (default 100)
  - retstart: integer
  - field
  - reldate
  - mindate
  - maxdate
  - datetype
Returns:
- Array of PubMed IDs or a number of results
# File 'lib/bio/io/pubmed.rb', line 93
def esearch(str, hash = {})
  opts = { "db" => "pubmed" }
  opts.update(hash)
  super(str, opts)
end
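The listing merges the caller's options over a default hash before handing them to the parent class. A minimal sketch of just that merge step follows; `esearch_options` is a hypothetical helper, not a BioRuby method. The defaults use string keys, so this sketch assumes the caller passes string keys as well.

```ruby
# Mirrors the option handling in the listing: defaults first, caller's hash on top.
# Hypothetical helper for illustration only.
def esearch_options(hash = {})
  opts = { "db" => "pubmed" }
  opts.update(hash)
  opts
end

p esearch_options({ "retmax" => 5 })
p esearch_options({ "db" => "pmc" })
# A symbol key does not replace the string-keyed default; it is added next to it.
p esearch_options({ db: "pmc" })
```

Because Hash#update lets later values win, any default can be overridden by supplying the same string key.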
#pmfetch(id) ⇒ Object
Retrieves a PubMed entry by PMID using Entrez pmfetch and returns it as a MEDLINE-formatted string.
Arguments:
- id: PubMed ID (required)
Returns:
- MEDLINE-formatted String
# File 'lib/bio/io/pubmed.rb', line 183
def pmfetch(id)
  host = "www.ncbi.nlm.nih.gov"
  path = "/entrez/utils/pmfetch.fcgi?tool=bioruby&mode=text&report=medline&db=PubMed&id="

  ncbi_access_wait

  http = Bio::Command.new_http(host)
  response = http.get(path + CGI.escape(id.to_s))
  result = response.body
  if result =~ /#{id}\s+Error/
    raise( result )
  else
    result = result.gsub("\r", "\n").squeeze("\n").gsub(/<\/?pre>/, '')
    return result
  end
end
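The last step of the listing normalizes the response body: carriage returns become newlines, runs of newlines collapse to one, and `<pre>` tags are removed. A minimal sketch of that cleanup chain on a made-up body follows; `RAW_BODY` and `clean_medline` are hypothetical names, and the real response may look different.

```ruby
# Hypothetical response body with CRLF line endings wrapped in <pre> tags.
RAW_BODY = "<pre>\r\nPMID- 10592173\r\n\r\nTI  - Example title\r\n</pre>"

# Same chain as in the listing: CR -> LF, collapse repeated newlines, strip <pre> tags.
def clean_medline(body)
  body.gsub("\r", "\n").squeeze("\n").gsub(/<\/?pre>/, '')
end

puts clean_medline(RAW_BODY)
```

Note that squeezing newlines also removes blank lines inside the body, so the cleaned text keeps one field per line with no empty separators.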
#query(*ids) ⇒ Object
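Retrieves PubMed entries by PMID using Entrez query and returns them in MEDLINE format.
Arguments:
- ids: one or more PubMed IDs (required)
Returns:
- MEDLINE-formatted String when a single ID is given; Array of MEDLINE-formatted Strings when several IDs are given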
# File 'lib/bio/io/pubmed.rb', line 153
def query(*ids)
  host = "www.ncbi.nlm.nih.gov"
  path = "/sites/entrez?tool=bioruby&cmd=Text&dopt=MEDLINE&db=PubMed&uid="
  list = ids.collect { |x| CGI.escape(x.to_s) }.join(",")

  ncbi_access_wait

  http = Bio::Command.new_http(host)
  response = http.get(path + list)
  result = response.body
  result = result.scan(/<pre>\s*(.*?)<\/pre>/m).flatten

  if result =~ /id:.*Error occurred/
    # id: xxxxx Error occurred: Article does not exist
    raise( result )
  else
    if ids.size > 1
      return result
    else
      return result.first
    end
  end
end
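The listing extracts every `<pre>` block from the returned HTML and then looks for an error message. By the time the error pattern is applied, `result` is already an Array, so the sketch below checks each extracted block individually. That per-block check is an assumption about the intent, not the library's behavior. The page contents and helper names are hypothetical.

```ruby
# Hypothetical HTML pages; real NCBI output may differ.
OK_PAGE  = "<html><pre>\nPMID- 111\nTI  - First</pre><pre>\nPMID- 222\nTI  - Second</pre></html>"
ERR_PAGE = "<html><pre>\nid: 999 Error occurred: Article does not exist</pre></html>"

# Pull the contents of every <pre> block, as the listing does.
def extract_records(html)
  html.scan(/<pre>\s*(.*?)<\/pre>/m).flatten
end

# Assumed intent of the error check: test each extracted block individually.
def error_record?(records)
  records.any? { |r| r =~ /id:.*Error occurred/ }
end

records = extract_records(OK_PAGE)
puts records.first
puts error_record?(extract_records(ERR_PAGE))
```

The `m` flag lets `.` match newlines, so each multi-line record is captured whole, and the lazy `.*?` stops at the first closing tag, which keeps neighboring records separate.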
#search(str) ⇒ Object
Searches the PubMed database for the given keywords using Entrez query and returns an array of PubMed IDs. Caution: this method returns only the first 20 hits. Using the ‘esearch’ method instead is strongly recommended.
Arguments:
- str: query string (required)
Returns:
- Array of PubMed IDs
# File 'lib/bio/io/pubmed.rb', line 134
def search(str)
  host = "www.ncbi.nlm.nih.gov"
  path = "/sites/entrez?tool=bioruby&cmd=Search&doptcmdl=Brief&db=PubMed&term="

  ncbi_access_wait

  http = Bio::Command.new_http(host)
  response = http.get(path + CGI.escape(str))
  result = response.body
  result = result.scan(/value="(\d+)" id="UidCheckBox"/m).flatten
  return result
end
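The listing finds PubMed IDs by scraping checkbox elements out of the result page, which is why the method breaks whenever that markup changes. A minimal sketch of the extraction step on a made-up page fragment follows; `PAGE` and `extract_pmids` are hypothetical, and the attribute layout is assumed from the pattern in the listing.

```ruby
# Hypothetical result-page fragment; the real markup is controlled by NCBI.
PAGE = <<~HTML
  <input name="uid" type="checkbox" value="10592173" id="UidCheckBox" />
  <input name="uid" type="checkbox" value="14693808" id="UidCheckBox" />
  <input name="other" type="checkbox" value="42" id="SomethingElse" />
HTML

# Same pattern as in the listing: capture the digits of every UID checkbox.
def extract_pmids(html)
  html.scan(/value="(\d+)" id="UidCheckBox"/m).flatten
end

p extract_pmids(PAGE)
```

The pattern requires `value` to be immediately followed by `id="UidCheckBox"`, so reordering the attributes in the page would make the scan return an empty array without raising an error.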