Entrez
Entrez is a simple API for making HTTP requests to Entrez utilities (eutils: eutils.ncbi.nlm.nih.gov/).
Installation
gem install entrez
or if you use Bundler:
# Gemfile
gem 'entrez'
It requires httparty.
See the 'Email & Tool' section below for setup.
Usage
Supported Utilities
- EFetch
- EInfo
- ESearch
- ESummary
Not yet implemented
- EPost
- ELink
- EGQuery
- ESpell
You can copy and paste the request URLs shown in the examples below into a browser to see what the response would be.
EFetch, ESummary
args:
- database
- params hash (optional)
Entrez.EFetch('snp', id: 9268480, retmode: :xml)
#=> makes request to eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=9268480&retmode=xml.
#=> returns XML document with SNP rs9268480 data.
ESummary takes the same arguments.
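As a rough illustration of how a database name and a params hash map onto a request URL like the one above, here is a minimal sketch. The `eutils_url` helper and the `BASE_URL` constant are hypothetical names used only for this example (they are not part of the gem), and the `https` scheme is an assumption.

```ruby
require 'uri'

# Assumed base URL for illustration; the README only gives the host and path.
BASE_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

# Hypothetical helper (not part of the gem): builds an E-utilities URL
# from a utility name, a database, and an optional params hash.
def eutils_url(utility, database, params = {})
  # The database goes first, followed by the caller's params in order.
  query = URI.encode_www_form({ db: database }.merge(params))
  "#{BASE_URL}/#{utility}.fcgi?#{query}"
end

puts eutils_url('efetch', 'snp', id: 9268480, retmode: :xml)
# prints https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=9268480&retmode=xml
```

Swapping 'efetch' for 'esummary' gives the corresponding ESummary URL, since both take the same arguments.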
ESearch
args:
- database
- search terms (a hash will be converted to term[FIELD]+AND+another_term[FIELD] notation; a string literal is also accepted)
- params hash (optional)
Entrez.ESearch('genomeprj', {WORD: 'hapmap', SEQS: 'inprogress'}, retmode: :xml)
#=> makes request to eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=genomeprj&term=hapmap[WORD]+AND+inprogress[SEQS]&retmode=xml.
#=> returns XML document with list of Ids of genome projects that match the search term criteria.
#=> i.e. genome projects that have 'hapmap' in the description and whose sequencing status is 'inprogress'.
Customized Queries
You can build your own customized queries if you have something more complex with ANDs and ORs. Use Entrez.convert_search_term_hash() to help you. It converts a hash into a valid Entrez search string properly joined with the operator of your choosing. If you pass in the OR operator, the returned search string will be wrapped in a set of parentheses.
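To make the described conversion concrete, here is a minimal stand-alone sketch of the behaviour. `build_search_term` is a hypothetical stand-in written for this example, not the gem's `Entrez.convert_search_term_hash`, whose exact signature is not shown in this README.

```ruby
# Hypothetical stand-in for illustration; not the gem's implementation.
# Turns { FIELD: 'value' } pairs into value[FIELD] terms joined by an operator.
def build_search_term(terms, operator = 'AND')
  joined = terms.map { |field, value| "#{value}[#{field}]" }.join("+#{operator}+")
  # OR groups are wrapped in parentheses so they can be combined safely
  # with other terms.
  operator == 'OR' ? "(#{joined})" : joined
end

puts build_search_term({ WORD: 'hapmap', SEQS: 'inprogress' })
# prints hapmap[WORD]+AND+inprogress[SEQS]
puts build_search_term({ WORD: 'hapmap', SEQS: 'inprogress' }, 'OR')
# prints (hapmap[WORD]+OR+inprogress[SEQS])
```

Because the OR result is parenthesized, it can be concatenated with further terms using +AND+ and passed to ESearch as a string literal.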
EInfo
args:
- database
- params hash (optional)
Entrez.EInfo('gene', retmode: :xml)
#=> makes request to eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=gene&retmode=xml.
#=> returns XML document with list of searchable fields for gene database.
Email & Tool
NCBI asks that you supply the name of the tool you are using and your email address. The Entrez gem uses 'ruby' as the tool. The email is read from the ENTREZ_EMAIL environment variable on your computer. I set mine in my ~/.bash_profile:
export ENTREZ_EMAIL='you@example.com'
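A minimal sketch of how the tool name and email could be merged into every request's params, assuming the email comes from ENTREZ_EMAIL as described. `with_identity` is a hypothetical helper for illustration only, and the `tool` and `email` parameter names are assumptions here; the gem's internals may differ.

```ruby
# Hypothetical helper (not part of the gem): adds the tool name and the
# ENTREZ_EMAIL value to a params hash. `env` is injectable for testing.
def with_identity(params = {}, env = ENV)
  identity = { tool: 'ruby' }
  email = env['ENTREZ_EMAIL']
  # Only include the email when the variable is actually set.
  identity[:email] = email if email && !email.empty?
  identity.merge(params)
end

p with_identity({ retmode: :xml }, { 'ENTREZ_EMAIL' => 'you@example.com' })
```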
NCBI query limits
NCBI recommends no more than 3 URL requests per second: www.ncbi.nlm.nih.gov/books/NBK25497/#chapter2.Usage_Guidelines_and_Requiremen
This gem respects this limit. It delays the next request if the last 3 were made within 1 second. The delay is no longer than what is necessary to make the next request “respectful”.
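The throttling rule described above can be sketched roughly as follows. `RequestThrottle` and its parameters are hypothetical names for this example, not the gem's actual code; the clock and sleeper are injectable only so the logic can be exercised without real waiting.

```ruby
# Hypothetical sketch of the described throttling; not the gem's actual code.
class RequestThrottle
  def initialize(max_requests: 3, interval: 1.0,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @max_requests = max_requests
    @interval = interval
    @clock = clock
    @sleeper = sleeper
    @timestamps = [] # times of the most recent requests, oldest first
  end

  # Seconds to wait so that the next request stays within the limit.
  def required_delay(now = @clock.call)
    return 0.0 if @timestamps.size < @max_requests
    # The next request may go out once the oldest of the last
    # max_requests requests is at least one interval in the past.
    [@interval - (now - @timestamps.first), 0.0].max
  end

  # Waits only as long as necessary, records the request time, then yields.
  def throttle
    delay = required_delay
    @sleeper.call(delay) if delay > 0
    @timestamps << @clock.call
    @timestamps.shift while @timestamps.size > @max_requests
    yield if block_given?
  end
end
```

Wrapping each HTTP call in `throttle { ... }` would let the first 3 requests go out immediately and hold the 4th until one second after the 1st.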