# Crossref API in Python
by Avery Fernandez and Vincent F. Scalfani

**Crossref API documentation:** https://api.crossref.org/swagger-ui/index.html

These recipe examples were tested on January 21, 2022.

*From our testing, we have found that Crossref metadata can vary considerably across publishers and even across journals. As a result, it can be easier to work with one journal at a time when using the Crossref API, particularly when trying to extract selected data from records.*
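Because fields can be present in one record and missing from the next, it can help to use a small helper that walks nested keys and list indices and returns a default value instead of raising `KeyError` or `IndexError`. This is a minimal sketch. `get_field` is a hypothetical helper name and is not part of the Crossref API or of `requests`.

```python
def get_field(record, *path, default=None):
    """Follow a path of dict keys / list indices through nested Crossref JSON.

    Returns `default` as soon as any step is missing, instead of raising.
    (Hypothetical helper, for illustration only.)
    """
    current = record
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Hand-trimmed record shaped like the response shown in section 1
record = {"message": {"title": ["The Molecule Cloud - compact visualization "
                                "of large collections of molecules"],
                      "author": [{"given": "Peter", "family": "Ertl"}]}}

print(get_field(record, "message", "title", 0))
print(get_field(record, "message", "author", 0, "ORCID", default="no ORCID"))
```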

## 1. Basic crossref API call

### Import libraries

In [12]:
import json
import requests
from pprint import pprint

### Setup API parameters

In [13]:
base_url = "https://api.crossref.org/works/" # the base url for api calls
email = "your_email@ua.edu" # Change this to your email
mailto = "?mailto=" + email
doi = "10.1186/1758-2946-4-12" # example
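String concatenation works for this DOI, but an email address contains `@`, and a DOI can contain characters that are not safe in a URL (for example `#`, which a URL treats as the start of a fragment). The following sketch uses only the standard library's `urllib.parse` to percent-encode both parts. `build_work_url` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.crossref.org/works/"


def build_work_url(doi, email):
    """Build a Crossref /works/{doi} URL with a percent-encoded DOI and mailto."""
    # keep "/" unescaped so the DOI prefix/suffix separator stays as-is
    encoded_doi = quote(doi, safe="/")
    query = urlencode({"mailto": email})  # encodes "@" as %40
    return f"{BASE_URL}{encoded_doi}?{query}"


print(build_work_url("10.1186/1758-2946-4-12", "your_email@ua.edu"))
# https://api.crossref.org/works/10.1186/1758-2946-4-12?mailto=your_email%40ua.edu
```

With `requests`, passing `params={"mailto": email}` to `requests.get` encodes the query string in the same way.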

### Request data from crossref API

In [14]:
api_data = requests.get(base_url + doi + mailto).json()
pprint(api_data)

{'message': {'DOI': '10.1186/1758-2946-4-12',
             'ISSN': ['1758-2946'],
             'URL': 'http://dx.doi.org/10.1186/1758-2946-4-12',
             'alternative-id': ['336'],
             'article-number': '12',
             'author': [{'affiliation': [],
                         'family': 'Ertl',
                         'given': 'Peter',
                         'sequence': 'first'},
                        {'affiliation': [],
                         'family': 'Rohde',
                         'given': 'Bernhard',
                         'sequence': 'additional'}],
             'container-title': ['Journal of Cheminformatics'],
             'content-domain': {'crossmark-restriction': False, 'domain': []},
             'created': {'date-parts': [[2012, 7, 6]],
                         'date-time': '2012-07-06T12:14:34Z',
                         'timestamp': 1341576874000},
             'deposited': {'date-parts': [[2019, 6, 24]],
                           'date-time': '
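The call above assumes that the request succeeds. If a DOI is unknown to Crossref, the API typically answers with an HTTP error status (such as 404) whose body may not be JSON. In that case `.json()` itself can fail with a confusing error. A successful JSON response also carries a `status` field alongside `message`. The hedged sketch below checks both before returning the `message` part. `fetch_work` and `extract_message` are hypothetical names, and the 30-second timeout is an arbitrary choice.

```python
import requests


def extract_message(payload):
    """Return the 'message' part of a Crossref JSON response, or raise."""
    if payload.get("status") != "ok":
        raise ValueError(f"Crossref returned status {payload.get('status')!r}")
    return payload["message"]


def fetch_work(doi, email, timeout=30):
    """GET /works/{doi}; raise on HTTP errors instead of failing in .json()."""
    response = requests.get(
        "https://api.crossref.org/works/" + doi,
        params={"mailto": email},  # requests percent-encodes the query string
        timeout=timeout,
    )
    response.raise_for_status()  # raises requests.HTTPError for e.g. a 404
    return extract_message(response.json())


# Usage (makes a network request):
# message = fetch_work("10.1186/1758-2946-4-12", "your_email@ua.edu")
# print(message["container-title"])
```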

### Select Some Specific Data

In [15]:
# Get Journal title
api_data["message"]["container-title"]

['Journal of Cheminformatics']

In [16]:
# Get article title
api_data["message"]["title"]

['The Molecule Cloud - compact visualization of large collections of molecules']

In [17]:
# Get article author names
for author in api_data["message"]["author"]:
    print(author["given"] + " " + author["family"])

Peter Ertl
Bernhard Rohde
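Not every author entry has both `given` and `family`. Organizational authors, for example, are often recorded with a single `name` field, and some records list only a family name. The following sketch falls back gracefully in those cases. `format_author` is a hypothetical helper, the fallback order is an assumption, and the sample entries are hand-made for illustration.

```python
def format_author(author):
    """Return a display name for one Crossref author entry.

    Fallback order (an assumption): if "family" is present, use
    "given family" (or just "family"); otherwise use an organizational
    "name"; otherwise return a placeholder.
    """
    given = author.get("given", "").strip()
    family = author.get("family", "").strip()
    if family:
        return f"{given} {family}".strip()
    return author.get("name", "[unknown author]")


# Hand-made sample entries (not from a real record)
authors = [
    {"given": "Peter", "family": "Ertl", "sequence": "first"},
    {"family": "Doe", "sequence": "additional"},
    {"name": "Example Consortium", "sequence": "additional"},
]
for author in authors:
    print(format_author(author))
```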


In [20]:
# Get bibliography references and save to list
bib_refs = []
for ref in api_data["message"]["reference"]:
    bib_refs.append(ref["unstructured"])

# print the first 5
print(bib_refs[0:5])

['Martin E, Ertl P, Hunt P, Duca J, Lewis R: Gazing into the crystal ball; the future of computer-aided drug design. J Comp-Aided Mol Des. 2011, 26: 77-79.', 'Langdon SR, Brown N, Blagg J: Scaffold diversity of exemplified medicinal chemistry space. J Chem Inf Model. 2011, 26: 2174-2185.', 'Blum LC, Reymond J-C: 970 Million druglike small molecules for virtual screening in the chemical universe database GDB-13. J Am Chem Soc. 2009, 131: 8732-8733. 10.1021/ja902302h.', 'Dubois J, Bourg S, Vrain C, Morin-Allory L: Collections of compounds - how to deal with them?. Cur Comp-Aided Drug Des. 2008, 4: 156-168. 10.2174/157340908785747410.', 'Medina-Franco JL, Martinez-Mayorga K, Giulianotti MA, Houghten RA, Pinilla C: Visualization of the chemical space in drug discovery. Cur Comp-Aided Drug Des. 2008, 4: 322-333. 10.2174/157340908786786010.']
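The loop above relies on every reference having an `unstructured` string. That holds for this record, but many deposits contain structured references instead (with fields such as `DOI` or `year`), and some records have no `reference` list at all. The sketch below prefers `unstructured`, falls back to the DOI, and otherwise keeps the reference `key`. The fallback order is an assumption rather than something the API prescribes, and `reference_strings` is a hypothetical name.

```python
def reference_strings(message):
    """Return one display string per reference in a Crossref work 'message'."""
    strings = []
    for ref in message.get("reference", []):  # the list may be absent entirely
        if "unstructured" in ref:
            strings.append(ref["unstructured"])
        elif "DOI" in ref:
            strings.append("doi:" + ref["DOI"])
        else:
            strings.append(ref.get("key", "[unparsed reference]"))
    return strings


# Hand-made sample message mixing the three cases
sample_message = {"reference": [
    {"key": "ref1", "unstructured": "Doe J: An example article. 2020."},
    {"key": "ref2", "DOI": "10.1021/ja902302h"},
    {"key": "ref3", "year": "2009"},
]}
print(reference_strings(sample_message))
print(reference_strings({}))  # a record without references
```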


### Save JSON data to a file
Saving the results to a file is particularly useful for downstream testing or for returning to the results later, because there is no need to request the same data from Crossref again.

In [21]:
with open('my_data.json', 'w') as outfile:
    json.dump(api_data, outfile)

### Load JSON data from file

In [22]:
with open('my_data.json','r') as infile:
    loadedData = json.load(infile)
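The save and load steps can be combined into a simple cache. The helper reads the JSON file if it already exists, and otherwise calls the API once and saves the result. This is a minimal sketch. `load_or_fetch` is a hypothetical helper, and `fetch` stands for any zero-argument function that returns the parsed JSON, for example a `lambda` wrapping the `requests.get(...).json()` call from section 1.

```python
import json
import os


def load_or_fetch(path, fetch):
    """Return JSON data from `path` if it exists; else call fetch() and save it."""
    if os.path.exists(path):
        with open(path, "r") as infile:
            return json.load(infile)
    data = fetch()  # only reached when there is no cached copy yet
    with open(path, "w") as outfile:
        json.dump(data, outfile)
    return data


# Usage, reusing the names from section 1 (makes a request on the first run only):
# api_data = load_or_fetch("my_data.json",
#                          lambda: requests.get(base_url + doi + mailto).json())
```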

## 2. Crossref API call with a Loop

### Import libraries

In [23]:
import json
import requests
from pprint import pprint
from time import sleep

### Setup API parameters

In [24]:
base_url = "https://api.crossref.org/works/" # the base url for api calls
email = "your_email@ua.edu" # Change this to be your email
mailto = "?mailto=" + email

### Create a List of DOIs

In [25]:
doi_list = ['10.1021/acsomega.1c03250',
            '10.1021/acsomega.1c05512',
            '10.1021/acsomega.8b01647',
            '10.1021/acsomega.1c04287',
            '10.1021/acsomega.8b01834']

### Request metadata for each DOI from Crossref API and save to a list

In [26]:
doi_metadata = []
for doi in doi_list:
    doi_metadata.append(requests.get(base_url + doi + mailto).json())
    sleep(1) # important to add a delay between API calls

# print(doi_metadata) ## not shown here as it is long.
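With a list of DOIs, one bad DOI or one transient network error will stop the loop above with an exception. The hedged sketch below records failures and keeps going. `fetch_all` is a hypothetical helper, and `fetch` is any function that takes a DOI and returns parsed JSON, for example `lambda d: requests.get(base_url + d + mailto).json()`. The one-second default delay mirrors the `sleep(1)` above.

```python
from time import sleep


def fetch_all(dois, fetch, delay=1.0):
    """Fetch metadata for each DOI and return (results, failures).

    results: dict mapping DOI to parsed JSON for successful requests
    failures: dict mapping DOI to the exception that was raised
    """
    results, failures = {}, {}
    for doi in dois:
        try:
            results[doi] = fetch(doi)
        except Exception as exc:  # broad on purpose: record the error and move on
            failures[doi] = exc
        sleep(delay)  # keep pausing between calls, even after a failure
    return results, failures
```

Keying the results by DOI, rather than relying on list position, keeps each record tied to its DOI even when some requests fail.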

### Select Some Specific Data

In [27]:
# Get article titles
for item in doi_metadata:
    print(item["message"]["title"])

['Navigating into the Chemical Space of Monoamine Oxidase Inhibitors by Artificial Intelligence and Cheminformatics Approach']
['Impact of Artificial Intelligence on Compound Discovery, Design, and Synthesis']
['How Precise Are Our Quantitative Structure–Activity Relationship Derived Predictions for New Query Chemicals?']
['Applying Neuromorphic Computing Simulation in Band Gap Prediction and Chemical Reaction Classification']
['QSPR Modeling of the Refractive Index for Diverse Polymers Using 2D Descriptors']
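Each `title` value is a list (usually containing one string), which is why the output above shows brackets. The following sketch pairs each DOI with a plain title string and uses an empty string when no title is present. `titles_by_doi` is a hypothetical helper name. The sample records are trimmed by hand, and the second one is made up.

```python
def titles_by_doi(records):
    """Map each record's DOI to its first title string ('' if none)."""
    mapping = {}
    for record in records:
        message = record["message"]
        titles = message.get("title") or [""]  # handles a missing or empty list
        mapping[message["DOI"]] = titles[0]
    return mapping


sample_records = [
    {"message": {"DOI": "10.1021/acsomega.8b01834",
                 "title": ["QSPR Modeling of the Refractive Index for Diverse "
                           "Polymers Using 2D Descriptors"]}},
    {"message": {"DOI": "10.9999/example", "title": []}},  # hypothetical record
]
print(titles_by_doi(sample_records))
```

With live data, `titles_by_doi(doi_metadata)` would build the same kind of mapping for all five articles.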


In [28]:
# Get the first listed affiliation of each author, for each article
for item in doi_metadata:
    for author in item["message"]["author"]:
        if author.get("affiliation"):  # skip authors with no affiliation listed
            print(author["affiliation"][0]["name"])

Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
Department of Pharmaceutics and Industrial Pharmacy, College of Pharmacy, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
Department of Pharmaceutical Chemistry, College of Pharmacy, Jouf University, Sakak
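Taking only element `[0]` keeps just the first affiliation of each author. The following sketch collects every affiliation name per article and removes repeats while keeping their original order. It also copes with authors whose `affiliation` list is empty, as for both authors of the record in section 1. `article_affiliations` is a hypothetical helper name, and the sample message is hand-made.

```python
def article_affiliations(message):
    """Return the unique affiliation names of all authors, in first-seen order."""
    names = []
    for author in message.get("author", []):
        for affiliation in author.get("affiliation", []):  # may be empty
            name = affiliation.get("name")
            if name and name not in names:
                names.append(name)
    return names


sample_message = {"author": [
    {"family": "Doe", "affiliation": [{"name": "Dept. A, Example University"},
                                      {"name": "Institute B"}]},
    {"family": "Roe", "affiliation": [{"name": "Dept. A, Example University"}]},
    {"family": "Poe", "affiliation": []},
]}
print(article_affiliations(sample_message))
```

With live data, this could be applied per article with `for item in doi_metadata: print(article_affiliations(item["message"]))`.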

## 3. Crossref API call for journal information

### Import libraries

In [33]:
import json
import requests
from pprint import pprint

### Setup API parameters

In [34]:
jbase_url = "https://api.crossref.org/journals/" # the base url for api calls
email = "your_email@ua.edu" # Change this to be your email
mailto = "?mailto=" + email
issn = "1471-2105"  # ISSN of the journal BMC Bioinformatics

### Request journal data from crossref API

In [35]:
jour_data = requests.get(jbase_url + issn + mailto).json()
pprint(jour_data)

{'message': {'ISSN': ['1471-2105', '1471-2105'],
             'breakdowns': {'dois-by-issued-year': [[2010, 861],
                                                    [2019, 762],
                                                    [2008, 745],
                                                    [2009, 729],
                                                    [2011, 722],
                                                    [2006, 633],
                                                    [2014, 619],
                                                    [2007, 613],
                                                    [2012, 609],
                                                    [2021, 607],
                                                    [2013, 607],
                                                    [2015, 603],
                                                    [2020, 585],
                                                    [2017, 585],
                                         
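The `dois-by-issued-year` breakdown is a list of `[year, count]` pairs that appears to be ordered by count rather than by year. Converting it to a dictionary makes it easy to look up a single year, for example to know in advance how many DOIs a 2014 query should return. The following sketch uses a few pairs copied from the output above. `counts_by_year` is a hypothetical name.

```python
def counts_by_year(journal_message):
    """Turn [[year, count], ...] into {year: count}."""
    pairs = journal_message["breakdowns"]["dois-by-issued-year"]
    return {year: count for year, count in pairs}


# A few pairs copied from the output above
sample = {"breakdowns": {"dois-by-issued-year": [[2010, 861], [2019, 762],
                                                 [2014, 619], [2021, 607]]}}
by_year = counts_by_year(sample)
print(by_year[2014])    # 619
print(sorted(by_year))  # [2010, 2014, 2019, 2021]
```

With live data, the equivalent call is `counts_by_year(jour_data["message"])`.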

## 4. Crossref API - Get article DOIs for a journal

### Import libraries

In [36]:
import json
import requests
from pprint import pprint
from time import sleep

### Setup API Parameters

In [37]:
jbase_url = "https://api.crossref.org/journals/" # the base url for api calls
email = "your_email@ua.edu" # Change this to be your email
mailto = "&mailto=" + email
issn = "1471-2105"  # ISSN of the journal BMC Bioinformatics
journal_works2014 = "/works?filter=from-pub-date:2014,until-pub-date:2014&select=DOI" # query to get DOIs for 2014
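In this cell, `mailto` starts with `&` rather than `?` because the query string already begins with `?filter=`. Keeping track of `?` versus `&` by hand is error-prone. As an alternative, the sketch below passes the parameters as a dictionary and lets `urllib.parse.urlencode` join and percent-encode them. `journal_works_url` is a hypothetical helper. `filter`, `select`, `rows`, and `mailto` are the same Crossref parameters used in this section.

```python
from urllib.parse import urlencode

JOURNALS_URL = "https://api.crossref.org/journals/"


def journal_works_url(issn, email, **params):
    """Build a /journals/{issn}/works URL; keyword args become query params."""
    query = dict(params, mailto=email)
    return f"{JOURNALS_URL}{issn}/works?{urlencode(query)}"


url = journal_works_url(
    "1471-2105", "your_email@ua.edu",
    filter="from-pub-date:2014,until-pub-date:2014",
    select="DOI",
    rows=700,
)
print(url)
```

The same dictionary could be passed as `params=` to `requests.get(JOURNALS_URL + issn + "/works", ...)`, which performs the encoding itself.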

### Request DOI data from crossref API

In [38]:
doi_data = requests.get(jbase_url + issn + journal_works2014 + mailto).json()
pprint(doi_data)

{'message': {'facets': {},
             'items': [{'DOI': '10.1186/1471-2105-15-84'},
                       {'DOI': '10.1186/1471-2105-15-94'},
                       {'DOI': '10.1186/1471-2105-15-172'},
                       {'DOI': '10.1186/1471-2105-15-106'},
                       {'DOI': '10.1186/1471-2105-15-95'},
                       {'DOI': '10.1186/1471-2105-15-s9-s12'},
                       {'DOI': '10.1186/1471-2105-15-33'},
                       {'DOI': '10.1186/1471-2105-15-s10-p33'},
                       {'DOI': '10.1186/1471-2105-15-278'},
                       {'DOI': '10.1186/1471-2105-15-s13-s3'},
                       {'DOI': '10.1186/1471-2105-15-s16-s13'},
                       {'DOI': '10.1186/1471-2105-15-254'},
                       {'DOI': '10.1186/1471-2105-15-s10-p24'},
                       {'DOI': '10.1186/1471-2105-15-310'},
                       {'DOI': '10.1186/1471-2105-15-101'},
                       {'DOI': '10.1186/1471-2105-15-56'},


By default, the API returns 20 results per request. The rows parameter raises this to as many as 1000 results per request. To get all 619 results for 2014, we can increase the number of returned rows.

In [39]:
rows = "&rows=700"
doi_data_all = requests.get(jbase_url + issn + journal_works2014 + rows + mailto).json()

### Extract DOIs

In [40]:
dois_list = [item["DOI"] for item in doi_data_all["message"]["items"]]

In [41]:
len(dois_list)

619
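This count matches the 2014 value in the journal's `dois-by-issued-year` breakdown from section 3. More generally, you can check that rows was large enough by comparing the number of items received with the `total-results` field of the response. A small sketch; `is_complete` is a hypothetical name, and the sample response is a trimmed stand-in.

```python
def is_complete(response, items):
    """True if `items` holds as many records as the response's total-results."""
    return len(items) == response["message"]["total-results"]


# Trimmed stand-in for a response where rows was too small
sample_response = {"message": {"total-results": 619,
                               "items": [{"DOI": "10.1186/1471-2105-15-84"}]}}
print(is_complete(sample_response, sample_response["message"]["items"]))  # False
```

With the live data above, `is_complete(doi_data_all, dois_list)` should return `True`.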

In [42]:
# display the first 20
pprint(dois_list[0:20])

['10.1186/1471-2105-15-84',
 '10.1186/1471-2105-15-94',
 '10.1186/1471-2105-15-172',
 '10.1186/1471-2105-15-106',
 '10.1186/1471-2105-15-95',
 '10.1186/1471-2105-15-s9-s12',
 '10.1186/1471-2105-15-33',
 '10.1186/1471-2105-15-s10-p33',
 '10.1186/1471-2105-15-278',
 '10.1186/1471-2105-15-s13-s3',
 '10.1186/1471-2105-15-s16-s13',
 '10.1186/1471-2105-15-254',
 '10.1186/1471-2105-15-s10-p24',
 '10.1186/1471-2105-15-161',
 '10.1186/1471-2105-15-266',
 '10.1186/1471-2105-15-310',
 '10.1186/1471-2105-15-101',
 '10.1186/1471-2105-15-56',
 '10.1186/1471-2105-15-s10-p6',
 '10.1186/s12859-014-0411-1']



**What if we have more than 1000 results in a single query?**

For example, what if we wanted the DOIs from *BMC Bioinformatics* for 2014 through 2016?

In [43]:
jbase_url = "https://api.crossref.org/journals/" # the base url for api calls
email = "your_email@ua.edu" # Change this to be your email
mailto = "&mailto=" + email
issn = "1471-2105"  # ISSN of the journal BMC Bioinformatics
journal_works2014_2016 = "/works?filter=from-pub-date:2014,until-pub-date:2016&select=DOI" # query to get DOIs for 2014-2016
doi_data2 = requests.get(jbase_url + issn + journal_works2014_2016 + mailto).json()
pprint(doi_data2)

{'message': {'facets': {},
             'items': [{'DOI': '10.1186/1471-2105-15-84'},
                       {'DOI': '10.1186/1471-2105-15-94'},
                       {'DOI': '10.1186/1471-2105-16-s15-p11'},
                       {'DOI': '10.1186/s12859-016-1335-8'},
                       {'DOI': '10.1186/1471-2105-15-172'},
                       {'DOI': '10.1186/s12859-015-0538-8'},
                       {'DOI': '10.1186/1471-2105-15-106'},
                       {'DOI': '10.1186/s12859-016-1086-6'},
                       {'DOI': '10.1186/s12859-015-0468-5'},
                       {'DOI': '10.1186/s12859-015-0585-1'},
                       {'DOI': '10.1186/1471-2105-15-95'},
                       {'DOI': '10.1186/1471-2105-16-s15-p20'},
                       {'DOI': '10.1186/1471-2105-15-s9-s12'},
                       {'DOI': '10.1186/s12859-015-0845-0'},
                       {'DOI': '10.1186/s12859-016-1202-7'},
                       {'DOI': '10.1186/s12859-016-1198-z'

The response reports more than 1000 total results (total-results: 1772), which is more than a single request can return. An additional Crossref API parameter, "offset", lets us define the starting position of each set of records (e.g., the first 1000, and then the next set of up to 1000).

In [44]:
doi_list2 = []
rows = "&rows=1000"
numResults = requests.get(jbase_url + issn + journal_works2014_2016 + mailto).json()["message"]["total-results"]
sleep(1)
for n in range(int(numResults/1000)+1): # int(1772/1000) + 1 = 2, so n takes the values 0 and 1
    query = requests.get(jbase_url + issn + journal_works2014_2016 + rows + "&offset=" + str(1000*n) + mailto).json()
    sleep(1)
    for item in query["message"]["items"]:
        doi_list2.append(item["DOI"])

In [45]:
len(doi_list2)

1772
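`int(numResults/1000) + 1` gives the right number of pages for 1772 results. However, it requests one extra, empty page whenever the total is an exact multiple of 1000. Rounding up with `math.ceil` avoids that. The following sketch lists the offsets to request; `page_offsets` is a hypothetical name.

```python
import math


def page_offsets(total_results, rows=1000):
    """Offsets needed to cover `total_results` records, `rows` at a time."""
    pages = math.ceil(total_results / rows)
    return [page * rows for page in range(pages)]


print(page_offsets(1772))  # [0, 1000]
print(page_offsets(2000))  # [0, 1000]; int(2000/1000) + 1 would give 3 pages
print(page_offsets(0))     # []
```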

In [46]:
# show the 20 DOIs at positions 1000 through 1019 (a slice excludes its end index)
pprint(doi_list2[1000:1020])

['10.1186/s12859-015-0822-7',
 '10.1186/s12859-015-0758-y',
 '10.1186/1471-2105-15-s6-s1',
 '10.1186/s12859-015-0768-9',
 '10.1186/1471-2105-15-139',
 '10.1186/s12859-015-0664-3',
 '10.1186/s12859-016-1246-8',
 '10.1186/s12859-016-1155-x',
 '10.1186/1471-2105-15-67',
 '10.1186/s12859-016-1224-1',
 '10.1186/s12859-016-1137-z',
 '10.1186/s12859-016-1121-7',
 '10.1186/s12859-015-0863-y',
 '10.1186/1471-2105-15-s13-s4',
 '10.1186/1471-2105-15-20',
 '10.1186/s12859-016-1317-x',
 '10.1186/s12859-016-1384-z',
 '10.1186/1471-2105-15-231',
 '10.1186/s12859-015-0569-1',
 '10.1186/s12859-016-1404-z']
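Offset paging has a ceiling. According to Crossref's API documentation, offset only reaches the first 10,000 results, and cursors are recommended for deeper paging. The first request passes `cursor=*`, and each response's `message` contains a `next-cursor` value to send with the following request. Paging stops when a page comes back with no items. The hedged sketch below implements that loop. `iter_cursor_items` and `fetch_page` are hypothetical names, where `fetch_page` is any function that takes a cursor string and returns the parsed JSON for one page.

```python
def iter_cursor_items(fetch_page, max_pages=None):
    """Yield items from successive cursor pages until a page is empty.

    fetch_page(cursor) must return the parsed JSON of one page, e.g. the
    response to .../works?filter=...&rows=1000&cursor=<cursor>.
    """
    cursor = "*"  # "*" starts a new cursor session
    pages = 0
    while max_pages is None or pages < max_pages:
        message = fetch_page(cursor)["message"]
        items = message.get("items", [])
        if not items:
            break
        yield from items
        cursor = message["next-cursor"]
        pages += 1


# Usage with the BMC Bioinformatics query from above (makes network requests):
# def fetch_page(cursor):
#     sleep(1)
#     return requests.get(
#         jbase_url + issn + "/works",
#         params={"filter": "from-pub-date:2014,until-pub-date:2016",
#                 "select": "DOI", "rows": 1000, "cursor": cursor,
#                 "mailto": email},
#     ).json()
# doi_list3 = [item["DOI"] for item in iter_cursor_items(fetch_page)]
```

Passing the cursor through `params` lets `requests` percent-encode it. This matters because cursor values may contain characters that are not safe to paste into a URL unencoded.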
