ScienceDirect API in Python


by Vincent F. Scalfani

These recipe examples use the Elsevier ScienceDirect Article (Full-Text) API. Code was tested and sample data downloaded from the ScienceDirect API on April 20, 2022 via http://api.elsevier.com and https://www.sciencedirect.com/.

You will need to register for an API key from the Elsevier Developer portal in order to use the ScienceDirect API. This tutorial content is intended to help facilitate academic research. Before continuing or reusing any of this code, please be aware of Elsevier’s API policies and appropriate use cases; for example, Elsevier has detailed policies regarding text and data mining of Elsevier full-text content. If you have copyright or other related text and data mining questions, please contact The University of Alabama Libraries.

ScienceDirect APIs Specification: https://dev.elsevier.com/sd_api_spec.html

Elsevier How to Guide: Text Mining: https://dev.elsevier.com/tecdoc_text_mining.html

Setup#

Import Libraries#

import json
import requests
from time import sleep

Import API key#

As a good practice, do not display your API key in your computational notebook (to prevent accidental sharing). Save your API key to a separate Python file, then import your key.

from api_key import myAPIKey
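For the import above to work, the separate file (api_key.py) only needs a single assignment of your key to the variable myAPIKey. As an alternative, the sketch below reads the key from an environment variable instead of a file. The variable name ELSEVIER_API_KEY and the helper load_api_key are hypothetical names chosen for illustration; they are not part of the Elsevier API or of this tutorial's tested code.

```python
import os

def load_api_key(env_var="ELSEVIER_API_KEY"):
    """Return the API key stored in an environment variable.

    The default variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            "Set the " + env_var + " environment variable to your Elsevier API key"
        )
    return key
```

With the variable set in your shell, `myAPIKey = load_api_key()` could replace the import line, and the key never appears in the notebook itself.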

Identifier Note#

We will use DOIs as the article identifiers. See our Crossref and Scopus API tutorials for workflows on how to create lists of DOIs and identifiers for specific searches and journals. The Elsevier ScienceDirect Article (Full-Text) API also accepts other identifiers like Scopus IDs and PubMed IDs (see API specification documents linked above).
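To illustrate how a different identifier type changes the request, here is a minimal sketch of a URL builder. The helper build_article_url is hypothetical, and the path segment names other than doi (pii, eid, scopus_id, pubmed_id) are our reading of the API specification, so please confirm them there before relying on them.

```python
from urllib.parse import quote

BASE_URL = "https://api.elsevier.com/content/article/"

# Identifier path segments as we understand them from the API specification (assumption)
ALLOWED_ID_TYPES = {"doi", "pii", "eid", "scopus_id", "pubmed_id"}

def build_article_url(id_type, identifier):
    """Build an article retrieval URL for the given identifier type."""
    if id_type not in ALLOWED_ID_TYPES:
        raise ValueError("Unsupported identifier type: " + id_type)
    # keep '/' unescaped because DOIs contain it; other reserved characters are percent-encoded
    return BASE_URL + id_type + "/" + quote(identifier, safe="/")
```

For example, `build_article_url("doi", "10.1016/j.tetlet.2017.07.080")` gives the same base URL plus DOI used in the recipes below.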

1. Retrieve full-text XML of an article#

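# for xml download
elsevier_url = "https://api.elsevier.com/content/article/doi/"
doi1 = '10.1016/j.tetlet.2017.07.080' # example Tetrahedron Letters article
fulltext1 = requests.get(elsevier_url + doi1 + "?APIKey=" + myAPIKey + "&httpAccept=text/xml")
fulltext1.raise_for_status() # stop here if the request failed, so an error message is not saved as the article

# save to file (explicit UTF-8 so non-ASCII characters are written the same way on every platform)
with open('fulltext1.xml', 'w', encoding='utf-8') as outfile:
    outfile.write(fulltext1.text)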
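Once the XML is saved, a quick sanity check is to pull one field out of it. The sketch below searches for the first element with a given local name, ignoring namespaces, so it does not depend on the exact namespace URIs in the response. The helper name first_text_by_local_name is hypothetical, and the assumption that the response contains a title element should be checked against your own downloaded file.

```python
import xml.etree.ElementTree as ET

def first_text_by_local_name(xml_text, local_name):
    """Return the stripped text of the first non-empty element whose tag,
    with any namespace removed, equals local_name. Return None if not found."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # namespaced tags look like '{namespace-uri}name'; keep only 'name'
        if elem.tag.rsplit("}", 1)[-1] == local_name:
            text = "".join(elem.itertext()).strip()
            if text:
                return text
    return None
```

For instance, `first_text_by_local_name(fulltext1.text, "title")` would return the first title-like element found, if the response has one.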

2. Retrieve plain text of an article#

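# for simplified text download
elsevier_url = "https://api.elsevier.com/content/article/doi/"
doi2 = '10.1016/j.tetlet.2022.153680' # example Tetrahedron Letters article
fulltext2 = requests.get(elsevier_url + doi2 + "?APIKey=" + myAPIKey + "&httpAccept=text/plain")
fulltext2.raise_for_status() # stop here if the request failed, so an error message is not saved as the article

# save to file (explicit UTF-8 so non-ASCII characters are written the same way on every platform)
with open('fulltext2.txt', 'w', encoding='utf-8') as outfile:
    outfile.write(fulltext2.text)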
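The recipes above place the API key and the response format in the URL query string. A possible variation is to send both as HTTP headers, which keeps the key out of any URL that gets printed or logged. The sketch below only builds the URL and headers; build_request is a hypothetical helper, and the X-ELS-APIKey header name reflects our understanding of Elsevier's developer documentation, so verify it there.

```python
def build_request(doi, api_key, accept="text/plain"):
    """Return (url, headers) for an article request that carries the key in a header."""
    url = "https://api.elsevier.com/content/article/doi/" + doi
    headers = {
        "X-ELS-APIKey": api_key,  # header name is an assumption; check the Elsevier docs
        "Accept": accept,         # e.g. "text/plain" or "text/xml"
    }
    return url, headers
```

The result could then be passed along as `url, headers = build_request(doi2, myAPIKey)` followed by `requests.get(url, headers=headers)`.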

3. Retrieve full-text in a loop#

# make a list of 5 DOIs for testing
dois = ['10.1016/j.tetlet.2018.10.031',
        '10.1016/j.tetlet.2018.10.033',
        '10.1016/j.tetlet.2018.10.034',
        '10.1016/j.tetlet.2018.10.038',
        '10.1016/j.tetlet.2018.10.041']

# Retrieve article full text for each DOI in a loop and save each article to a separate file.
# Example shown for plain text, XML also works (replace 'plain' with 'xml')

elsevier_url = "https://api.elsevier.com/content/article/doi/"
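for doi in dois:
    article = requests.get(elsevier_url + doi + "?APIKey=" + myAPIKey + "&httpAccept=text/plain")
    if article.status_code != 200: # skip DOIs that could not be retrieved instead of saving an error message
        print('Skipping ' + doi + ': HTTP status ' + str(article.status_code))
        sleep(1)
        continue
    doi_name = doi.replace('/','_') # can't save files with a '/' character on Linux
    with open(doi_name + '_plain_text.txt', 'w', encoding='utf-8') as outfile:
        outfile.write(article.text)
    sleep(1) # pause for 1 second between API calls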
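For longer DOI lists, it can help to keep a record of which articles were saved and which failed, so the failures can be retried later. The sketch below is one way to do that. The function download_articles and its fetch argument are hypothetical; fetch is assumed to be any function that takes a DOI and returns a (status_code, text) pair, which also lets the loop be tried out without calling the API.

```python
from pathlib import Path
from time import sleep

def download_articles(dois, fetch, out_dir=".", pause=1.0):
    """Save each successfully fetched article to its own file.

    fetch(doi) must return (status_code, text). Returns a list of saved
    file paths and a dict mapping each failed DOI to its status code.
    """
    out_path = Path(out_dir)
    saved, failed = [], {}
    for doi in dois:
        status, text = fetch(doi)
        if status == 200:
            # '/' is not allowed in file names, so replace it as in the loop above
            target = out_path / (doi.replace('/', '_') + '_plain_text.txt')
            target.write_text(text, encoding='utf-8')
            saved.append(target)
        else:
            failed[doi] = status
        sleep(pause)  # pause between API calls
    return saved, failed
```

A fetch function for real use could wrap the same request as in the loop above and return `(article.status_code, article.text)`.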