Crossref API in Python#

by Avery Fernandez and Vincent F. Scalfani

Crossref API documentation: https://api.crossref.org/swagger-ui/index.html

These recipe examples were tested on January 21, 2022.

From our testing, we have found that Crossref metadata can vary considerably across publishers and even across journals. As a result, it can be easier to work with one journal at a time when using the Crossref API (particularly when trying to extract selected data from records).

1. Basic Crossref API call#

Import libraries#

import json
import requests
from pprint import pprint

Setup API parameters#

base_url = "https://api.crossref.org/works/" # the base url for api calls
email = "your_email@ua.edu" # Change this to your email
mailto = "?mailto=" + email
doi = "10.1186/1758-2946-4-12" # example
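The request in the next step joins base_url, doi, and mailto by plain string concatenation, which works for the DOIs used in these recipes. As a hedged alternative, the sketch below percent-encodes the DOI and the email address with the standard library before building the URL. This can matter for DOIs that contain reserved characters such as "+" (for example 10.1021/ci049714+, which appears in the reference list of the example article). The helper name build_work_url is hypothetical and not part of the original recipe.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.crossref.org/works/"  # same base URL as in the recipe

def build_work_url(doi, email):
    """Return a works URL with the DOI and mailto value percent-encoded.

    Hypothetical helper for illustration; "/" is left unescaped in the DOI.
    """
    encoded_doi = quote(doi, safe="/")
    query = urlencode({"mailto": email})
    return BASE_URL + encoded_doi + "?" + query

print(build_work_url("10.1186/1758-2946-4-12", "your_email@ua.edu"))
```

The returned string can be passed directly to requests.get in place of base_url + doi + mailto.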

Request data from Crossref API#

api_data = requests.get(base_url + doi + mailto).json()
pprint(api_data)
{'message': {'DOI': '10.1186/1758-2946-4-12',
             'ISSN': ['1758-2946'],
             'URL': 'http://dx.doi.org/10.1186/1758-2946-4-12',
             'alternative-id': ['336'],
             'article-number': '12',
             'author': [{'affiliation': [],
                         'family': 'Ertl',
                         'given': 'Peter',
                         'sequence': 'first'},
                        {'affiliation': [],
                         'family': 'Rohde',
                         'given': 'Bernhard',
                         'sequence': 'additional'}],
             'container-title': ['Journal of Cheminformatics'],
             'content-domain': {'crossmark-restriction': False, 'domain': []},
             'created': {'date-parts': [[2012, 7, 6]],
                         'date-time': '2012-07-06T12:14:34Z',
                         'timestamp': 1341576874000},
             'deposited': {'date-parts': [[2019, 6, 24]],
                           'date-time': '2019-06-24T14:22:07Z',
                           'timestamp': 1561386127000},
             'indexed': {'date-parts': [[2021, 12, 18]],
                         'date-time': '2021-12-18T19:58:47Z',
                         'timestamp': 1639857527355},
             'is-referenced-by-count': 25,
             'issn-type': [{'type': 'electronic', 'value': '1758-2946'}],
             'issue': '1',
             'issued': {'date-parts': [[2012, 7, 6]]},
             'journal-issue': {'issue': '1',
                               'published-print': {'date-parts': [[2012, 12]]}},
             'language': 'en',
             'license': [{'URL': 'http://creativecommons.org/licenses/by/2.0',
                          'content-version': 'tdm',
                          'delay-in-days': 0,
                          'start': {'date-parts': [[2012, 7, 6]],
                                    'date-time': '2012-07-06T00:00:00Z',
                                    'timestamp': 1341532800000}}],
             'link': [{'URL': 'http://link.springer.com/content/pdf/10.1186/1758-2946-4-12.pdf',
                       'content-type': 'application/pdf',
                       'content-version': 'vor',
                       'intended-application': 'text-mining'},
                      {'URL': 'http://link.springer.com/article/10.1186/1758-2946-4-12/fulltext.html',
                       'content-type': 'text/html',
                       'content-version': 'vor',
                       'intended-application': 'text-mining'},
                      {'URL': 'http://link.springer.com/content/pdf/10.1186/1758-2946-4-12.pdf',
                       'content-type': 'application/pdf',
                       'content-version': 'vor',
                       'intended-application': 'similarity-checking'}],
             'member': '297',
             'original-title': [],
             'prefix': '10.1186',
             'published': {'date-parts': [[2012, 7, 6]]},
             'published-online': {'date-parts': [[2012, 7, 6]]},
             'published-print': {'date-parts': [[2012, 12]]},
             'publisher': 'Springer Science and Business Media LLC',
             'reference': [{'DOI': '10.1007/s10822-011-9487-0',
                            'author': 'E Martin',
                            'doi-asserted-by': 'publisher',
                            'first-page': '77',
                            'journal-title': 'J Comp-Aided Mol Des',
                            'key': '336_CR1',
                            'unstructured': 'Martin E, Ertl P, Hunt P, Duca J, '
                                            'Lewis R: Gazing into the crystal '
                                            'ball; the future of '
                                            'computer-aided drug design. J '
                                            'Comp-Aided Mol Des. 2011, 26: '
                                            '77-79.',
                            'volume': '26',
                            'year': '2011'},
                           {'DOI': '10.1021/ci2001428',
                            'author': 'SR Langdon',
                            'doi-asserted-by': 'publisher',
                            'first-page': '2174',
                            'journal-title': 'J Chem Inf Model',
                            'key': '336_CR2',
                            'unstructured': 'Langdon SR, Brown N, Blagg J: '
                                            'Scaffold diversity of exemplified '
                                            'medicinal chemistry space. J Chem '
                                            'Inf Model. 2011, 26: 2174-2185.',
                            'volume': '26',
                            'year': '2011'},
                           {'DOI': '10.1021/ja902302h',
                            'author': 'LC Blum',
                            'doi-asserted-by': 'publisher',
                            'first-page': '8732',
                            'journal-title': 'J Am Chem Soc',
                            'key': '336_CR3',
                            'unstructured': 'Blum LC, Reymond J-C: 970 Million '
                                            'druglike small molecules for '
                                            'virtual screening in the chemical '
                                            'universe database GDB-13. J Am '
                                            'Chem Soc. 2009, 131: 8732-8733. '
                                            '10.1021/ja902302h.',
                            'volume': '131',
                            'year': '2009'},
                           {'DOI': '10.2174/157340908785747410',
                            'author': 'J Dubois',
                            'doi-asserted-by': 'publisher',
                            'first-page': '156',
                            'journal-title': 'Cur Comp-Aided Drug Des',
                            'key': '336_CR4',
                            'unstructured': 'Dubois J, Bourg S, Vrain C, '
                                            'Morin-Allory L: Collections of '
                                            'compounds - how to deal with '
                                            'them?. Cur Comp-Aided Drug Des. '
                                            '2008, 4: 156-168. '
                                            '10.2174/157340908785747410.',
                            'volume': '4',
                            'year': '2008'},
                           {'DOI': '10.2174/157340908786786010',
                            'author': 'JL Medina-Franco',
                            'doi-asserted-by': 'publisher',
                            'first-page': '322',
                            'journal-title': 'Cur Comp-Aided Drug Des',
                            'key': '336_CR5',
                            'unstructured': 'Medina-Franco JL, '
                                            'Martinez-Mayorga K, Giulianotti '
                                            'MA, Houghten RA, Pinilla C: '
                                            'Visualization of the chemical '
                                            'space in drug discovery. Cur '
                                            'Comp-Aided Drug Des. 2008, 4: '
                                            '322-333. '
                                            '10.2174/157340908786786010.',
                            'volume': '4',
                            'year': '2008'},
                           {'DOI': '10.1021/ci600338x',
                            'author': 'A Schuffenhauer',
                            'doi-asserted-by': 'publisher',
                            'first-page': '47',
                            'journal-title': 'J Chem Inf Model',
                            'key': '336_CR6',
                            'unstructured': 'Schuffenhauer A, Ertl P, Roggo S, '
                                            'Wetzel S, Koch MA, Waldmann H: '
                                            'The Scaffold Tree - visualization '
                                            'of the scaffold universe by '
                                            'hierarchical scaffold '
                                            'classification. J Chem Inf Model. '
                                            '2007, 47: 47-58. '
                                            '10.1021/ci600338x.',
                            'volume': '47',
                            'year': '2007'},
                           {'DOI': '10.1002/minf.201000019',
                            'author': 'S Langdon',
                            'doi-asserted-by': 'publisher',
                            'first-page': '366',
                            'journal-title': 'Mol Inf',
                            'key': '336_CR7',
                            'unstructured': 'Langdon S, Ertl P, Brown N: '
                                            'Bioisosteric replacement and '
                                            'scaffold hopping in lead '
                                            'generation and optimization. Mol '
                                            'Inf. 2010, 29: 366-385. '
                                            '10.1002/minf.201000019.',
                            'volume': '29',
                            'year': '2010'},
                           {'DOI': '10.1021/jo8001276',
                            'author': 'AH Lipkus',
                            'doi-asserted-by': 'publisher',
                            'first-page': '4443',
                            'journal-title': 'J Org Chem',
                            'key': '336_CR8',
                            'unstructured': 'Lipkus AH, Yuan Q, Lucas KA, Funk '
                                            'SA, Bartelt WF, Schenck RJ, '
                                            'Trippe AJ: Structural diversity '
                                            'of organic chemistry. A scaffold '
                                            'analysis of the CAS Registry. J '
                                            'Org Chem. 2008, 73: 4443-4451. '
                                            '10.1021/jo8001276.',
                            'volume': '73',
                            'year': '2008'},
                           {'key': '336_CR9',
                            'unstructured': 'mib 2010.10, Molinspiration '
                                            'Cheminformatics: \n'
                                            '                    '
                                            'http://www.molinspiration.com\n'
                                            '                    \n'
                                            '                  ,'},
                           {'key': '336_CR10',
                            'unstructured': 'Bernhard R: Avalon '
                                            'Cheminformatics Toolkit. \n'
                                            '                    '
                                            'http://sourceforge.net/projects/avalontoolkit/\n'
                                            '                    \n'
                                            '                  ,'},
                           {'DOI': '10.1093/nar/gkp965',
                            'author': 'Y Wang',
                            'doi-asserted-by': 'publisher',
                            'first-page': 'D255',
                            'journal-title': 'Nucleic Acids Res',
                            'key': '336_CR11',
                            'unstructured': 'Wang Y, Bolton E, Dracheva S, '
                                            'Karapetyan K, Shoemaker BA, Suzek '
                                            'TO, Wang J, Xiao J, Zhang J, '
                                            'Bryant SH: An overview of the '
                                            'PubChem BioAssay resource. '
                                            'Nucleic Acids Res. 2009, 38: '
                                            'D255-D266.',
                            'volume': '38',
                            'year': '2009'},
                           {'DOI': '10.1021/ci049714+',
                            'author': 'JJ Irwin',
                            'doi-asserted-by': 'publisher',
                            'first-page': '177',
                            'journal-title': 'J Chem Inf Model',
                            'key': '336_CR12',
                            'unstructured': 'Irwin JJ, Shoichet BK: ZINC\u2009'
                                            '−\u2009a free database of '
                                            'commercially available compounds '
                                            'for virtual screening. J Chem Inf '
                                            'Model. 2004, 45: 177-182.',
                            'volume': '45',
                            'year': '2004'},
                           {'DOI': '10.1093/nar/gkr777',
                            'author': 'A Gaulton',
                            'doi-asserted-by': 'publisher',
                            'first-page': 'D1100',
                            'journal-title': 'Nucleic Acids Res',
                            'key': '336_CR13',
                            'unstructured': 'Gaulton A, Bellis LJ, Bento AP, '
                                            'Chambers J, Davies M, Hersey A, '
                                            'Light Y, McGlinchey S, '
                                            'Michalovich D, Al-Lazikani B, '
                                            'Overington JP: ChEMBL: a '
                                            'large-scale bioactivity database '
                                            'for drug discovery. Nucleic Acids '
                                            'Res. 2012, 40: D1100-D1107. '
                                            '10.1093/nar/gkr777.',
                            'volume': '40',
                            'year': '2012'},
                           {'DOI': '10.1016/j.cbpa.2010.02.018',
                            'author': 'ME Welsch',
                            'doi-asserted-by': 'publisher',
                            'first-page': '347',
                            'journal-title': 'Curr Opin Chem Biol',
                            'key': '336_CR14',
                            'unstructured': 'Welsch ME, Snyder SA, Stockwell '
                                            'BR: Privileged scaffolds for '
                                            'library design and drug '
                                            'discovery. Curr Opin Chem Biol. '
                                            '2010, 14: 347-361. '
                                            '10.1016/j.cbpa.2010.02.018.',
                            'volume': '14',
                            'year': '2010'},
                           {'DOI': '10.1021/ci0255782',
                            'author': 'P Ertl',
                            'doi-asserted-by': 'publisher',
                            'first-page': '374',
                            'journal-title': 'J Chem Inf Comp Sci',
                            'key': '336_CR15',
                            'unstructured': 'Ertl P: Cheminformatics analysis '
                                            'of organic substituents: '
                                            'Identification of the most common '
                                            'substituents, calculation of '
                                            'substituent properties, and '
                                            'automatic identification of '
                                            'drug-like bioisosteric groups. J '
                                            'Chem Inf Comp Sci. 2003, 43: '
                                            '374-380. 10.1021/ci0255782.',
                            'volume': '43',
                            'year': '2003'},
                           {'key': '336_CR16',
                            'unstructured': 'TagCrowd: \n'
                                            '                    '
                                            'http://tagcrowd.com'}],
             'reference-count': 16,
             'references-count': 16,
             'relation': {},
             'score': 1,
             'short-container-title': ['J Cheminform'],
             'short-title': [],
             'source': 'Crossref',
             'subject': ['Library and Information Sciences',
                         'Computer Graphics and Computer-Aided Design',
                         'Physical and Theoretical Chemistry',
                         'Computer Science Applications'],
             'subtitle': [],
             'title': ['The Molecule Cloud - compact visualization of large '
                       'collections of molecules'],
             'type': 'journal-article',
             'volume': '4'},
 'message-type': 'work',
 'message-version': '1.0.0',
 'status': 'ok'}

Select Some Specific Data#

# Get Journal title
api_data["message"]["container-title"]
['Journal of Cheminformatics']
# Get article title
api_data["message"]["title"]
['The Molecule Cloud - compact visualization of large collections of molecules']
# Get article author names
for author in api_data["message"]["author"]:
    print(author["given"] + " " + author["family"])
Peter Ertl
Bernhard Rohde
# Get bibliography references and save to list
bib_refs = []
for ref in api_data["message"]["reference"]:
    bib_refs.append(ref["unstructured"])

# print the first 5
print(bib_refs[0:5])
['Martin E, Ertl P, Hunt P, Duca J, Lewis R: Gazing into the crystal ball; the future of computer-aided drug design. J Comp-Aided Mol Des. 2011, 26: 77-79.', 'Langdon SR, Brown N, Blagg J: Scaffold diversity of exemplified medicinal chemistry space. J Chem Inf Model. 2011, 26: 2174-2185.', 'Blum LC, Reymond J-C: 970 Million druglike small molecules for virtual screening in the chemical universe database GDB-13. J Am Chem Soc. 2009, 131: 8732-8733. 10.1021/ja902302h.', 'Dubois J, Bourg S, Vrain C, Morin-Allory L: Collections of compounds - how to deal with them?. Cur Comp-Aided Drug Des. 2008, 4: 156-168. 10.2174/157340908785747410.', 'Medina-Franco JL, Martinez-Mayorga K, Giulianotti MA, Houghten RA, Pinilla C: Visualization of the chemical space in drug discovery. Cur Comp-Aided Drug Des. 2008, 4: 322-333. 10.2174/157340908786786010.']
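Not every reference record carries the same keys: in the full output printed earlier, references 336_CR9, 336_CR10, and 336_CR16 have no 'DOI' field. Below is a minimal sketch, assuming a message dictionary shaped like the one from the API, that collects reference DOIs with dict.get so that a missing key does not raise a KeyError. The function name reference_dois and the small sample dictionary are hypothetical stand-ins, not part of the original recipe.

```python
def reference_dois(message):
    """Return (key, doi) pairs for references; doi is None when absent."""
    pairs = []
    for ref in message.get("reference", []):
        pairs.append((ref.get("key"), ref.get("DOI")))
    return pairs

# Trimmed, hand-written sample mirroring two records from the output above
sample_message = {
    "reference": [
        {"key": "336_CR1", "DOI": "10.1007/s10822-011-9487-0"},
        {"key": "336_CR16", "unstructured": "TagCrowd: http://tagcrowd.com"},
    ]
}

for key, doi in reference_dois(sample_message):
    print(key, doi if doi is not None else "(no DOI deposited)")
```

With live data, the same function could be called as reference_dois(api_data["message"]).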

Save JSON data to a file#

This is particularly useful for downstream testing or for returning to results in the future: once the results are saved to a file, there is no need to keep requesting the same data from Crossref.

with open('my_data.json', 'w') as outfile:
    json.dump(api_data, outfile)

Load JSON data from file#

with open('my_data.json','r') as infile:
    loadedData = json.load(infile)
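The save and load steps can be combined into a small caching pattern. The sketch below is one possible arrangement, not the authors' method: load_or_fetch is a hypothetical helper that reads the file if it already exists and otherwise calls a caller-supplied function and saves the result. A stand-in function replaces the real API call so the example runs offline.

```python
import json
import os
import tempfile

def load_or_fetch(path, fetch):
    """Load JSON from path if it exists; otherwise call fetch() and save it."""
    if os.path.exists(path):
        with open(path, "r") as infile:
            return json.load(infile)
    data = fetch()
    with open(path, "w") as outfile:
        json.dump(data, outfile)
    return data

# Stand-in for a real API call so the example runs offline
calls = []
def fake_fetch():
    calls.append(1)
    return {"status": "ok", "message": {"DOI": "10.1186/1758-2946-4-12"}}

with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, "my_data.json")
    first = load_or_fetch(cache_path, fake_fetch)   # calls fake_fetch, writes file
    second = load_or_fetch(cache_path, fake_fetch)  # reads file, no second call

print(first == second, len(calls))
```

In practice, fetch could be a function that wraps the requests.get call from the recipe above.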

2. Crossref API call with a Loop#

Import libraries#

import json
import requests
from pprint import pprint
from time import sleep

Setup API parameters#

base_url = "https://api.crossref.org/works/" # the base url for api calls
email = "your_email@ua.edu" # Change this to be your email
mailto = "?mailto=" + email

Create a List of DOIs#

doi_list = ['10.1021/acsomega.1c03250',
'10.1021/acsomega.1c05512',
'10.1021/acsomega.8b01647',
'10.1021/acsomega.1c04287',
'10.1021/acsomega.8b01834']

Request metadata for each DOI from Crossref API and save to a list#

doi_metadata = []
for doi in doi_list:
    doi_metadata.append(requests.get(base_url + doi + mailto).json())
    sleep(1) # important to add a delay between API calls

# print(doi_metadata) ## not shown here as it is long.
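The loop above assumes every request succeeds. As a hedged sketch of one way to keep going when a DOI cannot be retrieved, the hypothetical helper fetch_all below takes any callable that fetches a single DOI, records failures separately, and keeps the delay between calls. The stand-in fetch function and the invalid DOI are made up so the example runs offline.

```python
from time import sleep

def fetch_all(dois, fetch_one, delay=1.0):
    """Call fetch_one(doi) for each DOI, pausing between calls.

    Returns (results, failures): results maps DOI -> data, failures maps
    DOI -> error message. fetch_one is any callable supplied by the caller.
    """
    results, failures = {}, {}
    for doi in dois:
        try:
            results[doi] = fetch_one(doi)
        except Exception as err:  # broad on purpose for this sketch
            failures[doi] = str(err)
        sleep(delay)  # keep the pause between API calls
    return results, failures

# Offline stand-in: pretend the second DOI cannot be retrieved
def fake_fetch_one(doi):
    if doi == "10.0000/not-a-real-doi":
        raise ValueError("not found")
    return {"message": {"DOI": doi}}

results, failures = fetch_all(
    ["10.1021/acsomega.1c03250", "10.0000/not-a-real-doi"],
    fake_fetch_one,
    delay=0,
)
print(sorted(results), failures)
```

With the recipe's variables, fetch_one could be a small function that returns requests.get(base_url + doi + mailto).json(); a response whose body is not valid JSON makes .json() raise, which this loop would record as a failure.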

Select Some Specific Data#

# Get article titles
for record in doi_metadata:
    print(record["message"]["title"])
['Navigating into the Chemical Space of Monoamine Oxidase Inhibitors by Artificial Intelligence and Cheminformatics Approach']
['Impact of Artificial Intelligence on Compound Discovery, Design, and Synthesis']
['How Precise Are Our Quantitative Structure–Activity Relationship Derived Predictions for New Query Chemicals?']
['Applying Neuromorphic Computing Simulation in Band Gap Prediction and Chemical Reaction Classification']
['QSPR Modeling of the Refractive Index for Diverse Polymers Using 2D Descriptors']
# Get all author affiliations for each article
for record in doi_metadata:
    for author in record["message"]["author"]:
        # affiliation can be an empty list for some records, so check before indexing
        if author["affiliation"]:
            print(author["affiliation"][0]["name"])
Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
Department of Pharmaceutics and Industrial Pharmacy, College of Pharmacy, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
Department of Pharmaceutical Chemistry, College of Pharmacy, Jouf University, Sakaka, Al Jouf 72341, Saudi Arabia
Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India
Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India
Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, Jackson, Mississippi 39217, United States
Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, Ohio 43210, United States
Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, Ohio 43210, United States
Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, Ohio 43210, United States
Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, Ohio 43210, United States
Department of Pharmacoinformatics, National Institute of Pharmaceutical Educational and Research (NIPER), Chunilal Bhawan, 168, Manikata Main Road, 700054 Kolkata, India
Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58108-6050, United States
Drug Theoretics and Cheminformatics Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, 700032 Kolkata, India
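The printed list repeats the same affiliation once per author. Below is a minimal sketch, assuming records shaped like those in doi_metadata, that tallies affiliation names with collections.Counter and skips authors whose affiliation list is empty. The function name count_affiliations and the sample records (with shortened, made-up names) are hypothetical.

```python
from collections import Counter

def count_affiliations(records):
    """Count affiliation names across all authors in a list of API responses."""
    counts = Counter()
    for record in records:
        for author in record["message"].get("author", []):
            for affiliation in author.get("affiliation", []):
                name = affiliation.get("name")
                if name:
                    counts[name] += 1
    return counts

# Hand-written sample with shortened, made-up affiliation names
sample_records = [
    {"message": {"author": [
        {"given": "A", "family": "One", "affiliation": [{"name": "University X"}]},
        {"given": "B", "family": "Two", "affiliation": [{"name": "University X"}]},
        {"given": "C", "family": "Three", "affiliation": []},
    ]}},
    {"message": {"author": [
        {"given": "D", "family": "Four", "affiliation": [{"name": "Institute Y"}]},
    ]}},
]

for name, n in count_affiliations(sample_records).most_common():
    print(n, name)
```

With live data, count_affiliations(doi_metadata) would count every affiliation listed for each author, not only the first one.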

3. Crossref API call for journal information#

Import libraries#

import json
import requests
from pprint import pprint

Setup API parameters#

jbase_url = "https://api.crossref.org/journals/" # the base url for api calls
email = "your_email@ua.edu" # Change this to be your email
mailto = "?mailto=" + email
issn = "1471-2105"  # issn for the journal BMC Bioinformatics

Request journal data from Crossref API#

jour_data = requests.get(jbase_url + issn + mailto).json()
pprint(jour_data)
{'message': {'ISSN': ['1471-2105', '1471-2105'],
             'breakdowns': {'dois-by-issued-year': [[2010, 861],
                                                    [2019, 762],
                                                    [2008, 745],
                                                    [2009, 729],
                                                    [2011, 722],
                                                    [2006, 633],
                                                    [2014, 619],
                                                    [2007, 613],
                                                    [2012, 609],
                                                    [2021, 607],
                                                    [2013, 607],
                                                    [2015, 603],
                                                    [2020, 585],
                                                    [2017, 585],
                                                    [2016, 550],
                                                    [2018, 536],
                                                    [2005, 414],
                                                    [2004, 209],
                                                    [2003, 66],
                                                    [2022, 56],
                                                    [2002, 40],
                                                    [2001, 9],
                                                    [2000, 1]]},
             'counts': {'backfile-dois': 9913,
                        'current-dois': 1248,
                        'total-dois': 11161},
             'coverage': {'abstracts-backfile': 0.2970846363361243,
                          'abstracts-current': 0.9631410256410255,
                          'affiliations-backfile': 0.0,
                          'affiliations-current': 0.0,
                          'award-numbers-backfile': 0.1316453142338344,
                          'award-numbers-current': 0.5745192307692308,
                          'descriptions-backfile': 0.0,
                          'descriptions-current': 0.0,
                          'funders-backfile': 0.1451629173812166,
                          'funders-current': 0.640224358974359,
                          'licenses-backfile': 0.535761121759306,
                          'licenses-current': 1.0,
                          'open-references-backfile': 1.0,
                          'open-references-current': 1.0,
                          'orcids-backfile': 0.09623726419852718,
                          'orcids-current': 0.6578525641025641,
                          'references-backfile': 0.9367497225865026,
                          'references-current': 0.9967948717948718,
                          'resource-links-backfile': 0.5351558559467366,
                          'resource-links-current': 1.0,
                          'ror-ids-backfile': 0.0,
                          'ror-ids-current': 0.0,
                          'similarity-checking-backfile': 0.953495410067588,
                          'similarity-checking-current': 1.0,
                          'update-policies-backfile': 0.6406738626046605,
                          'update-policies-current': 1.0},
             'coverage-type': {'all': {'abstracts': 0.3715616880207867,
                                       'affiliations': 0.0,
                                       'award-numbers': 0.1811665621360093,
                                       'descriptions': 0.0,
                                       'funders': 0.2005196666965326,
                                       'last-status-check-time': 1642727046333,
                                       'licenses': 0.5876713556132963,
                                       'open-references': 1.0,
                                       'orcids': 0.1590359286802258,
                                       'references': 0.9434638473255085,
                                       'resource-links': 0.587133769375504,
                                       'ror-ids': 0.0,
                                       'similarity-checking': 0.9586954573962907,
                                       'update-policies': 0.6808529701639638},
                               'backfile': {'abstracts': 0.2970846363361243,
                                            'affiliations': 0.0,
                                            'award-numbers': 0.1316453142338344,
                                            'descriptions': 0.0,
                                            'funders': 0.1451629173812166,
                                            'last-status-check-time': 1642727046333,
                                            'licenses': 0.535761121759306,
                                            'open-references': 1.0,
                                            'orcids': 0.09623726419852718,
                                            'references': 0.9367497225865026,
                                            'resource-links': 0.5351558559467366,
                                            'ror-ids': 0.0,
                                            'similarity-checking': 0.953495410067588,
                                            'update-policies': 0.6406738626046605},
                               'current': {'abstracts': 0.9631410256410255,
                                           'affiliations': 0.0,
                                           'award-numbers': 0.5745192307692308,
                                           'descriptions': 0.0,
                                           'funders': 0.640224358974359,
                                           'last-status-check-time': 1642727046333,
                                           'licenses': 1.0,
                                           'open-references': 1.0,
                                           'orcids': 0.6578525641025641,
                                           'references': 0.9967948717948718,
                                           'resource-links': 1.0,
                                           'ror-ids': 0.0,
                                           'similarity-checking': 1.0,
                                           'update-policies': 1.0}},
             'flags': {'deposits': True,
                       'deposits-abstracts-backfile': True,
                       'deposits-abstracts-current': True,
                       'deposits-affiliations-backfile': False,
                       'deposits-affiliations-current': False,
                       'deposits-articles': True,
                       'deposits-award-numbers-backfile': True,
                       'deposits-award-numbers-current': True,
                       'deposits-descriptions-backfile': False,
                       'deposits-descriptions-current': False,
                       'deposits-funders-backfile': True,
                       'deposits-funders-current': True,
                       'deposits-licenses-backfile': True,
                       'deposits-licenses-current': True,
                       'deposits-open-references-backfile': True,
                       'deposits-open-references-current': True,
                       'deposits-orcids-backfile': True,
                       'deposits-orcids-current': True,
                       'deposits-references-backfile': True,
                       'deposits-references-current': True,
                       'deposits-resource-links-backfile': True,
                       'deposits-resource-links-current': True,
                       'deposits-ror-ids-backfile': False,
                       'deposits-ror-ids-current': False,
                       'deposits-similarity-checking-backfile': True,
                       'deposits-similarity-checking-current': True,
                       'deposits-update-policies-backfile': True,
                       'deposits-update-policies-current': True},
             'issn-type': [{'type': 'print', 'value': '1471-2105'},
                           {'type': 'electronic', 'value': '1471-2105'}],
             'last-status-check-time': 1642727046333,
             'publisher': 'Springer (Biomed Central Ltd.)',
             'subjects': [{'ASJC': 2604, 'name': 'Applied Mathematics'},
                          {'ASJC': 1706,
                           'name': 'Computer Science Applications'},
                          {'ASJC': 1312, 'name': 'Molecular Biology'},
                          {'ASJC': 1303, 'name': 'Biochemistry'},
                          {'ASJC': 1315, 'name': 'Structural Biology'}],
             'title': 'BMC Bioinformatics'},
 'message-type': 'journal',
 'message-version': '1.0.0',
 'status': 'ok'}

4. Crossref API - Get article DOIs for a journal#

Import libraries#

import json
import requests
from pprint import pprint
from time import sleep

Setup API Parameters#

jbase_url = "https://api.crossref.org/journals/" # the base url for api calls
email = "your_email@ua.edu" # Change this to be your email
mailto = "&mailto=" + email
issn = "1471-2105"  # issn for the journal BMC Bioinformatics
journal_works2014 = "/works?filter=from-pub-date:2014,until-pub-date:2014&select=DOI" # query to get DOIs for 2014

Request DOI data from crossref API#

doi_data = requests.get(jbase_url + issn + journal_works2014 + mailto).json()
pprint(doi_data)
{'message': {'facets': {},
             'items': [{'DOI': '10.1186/1471-2105-15-84'},
                       {'DOI': '10.1186/1471-2105-15-94'},
                       {'DOI': '10.1186/1471-2105-15-172'},
                       {'DOI': '10.1186/1471-2105-15-106'},
                       {'DOI': '10.1186/1471-2105-15-95'},
                       {'DOI': '10.1186/1471-2105-15-s9-s12'},
                       {'DOI': '10.1186/1471-2105-15-33'},
                       {'DOI': '10.1186/1471-2105-15-s10-p33'},
                       {'DOI': '10.1186/1471-2105-15-278'},
                       {'DOI': '10.1186/1471-2105-15-s13-s3'},
                       {'DOI': '10.1186/1471-2105-15-s16-s13'},
                       {'DOI': '10.1186/1471-2105-15-254'},
                       {'DOI': '10.1186/1471-2105-15-s10-p24'},
                       {'DOI': '10.1186/1471-2105-15-310'},
                       {'DOI': '10.1186/1471-2105-15-101'},
                       {'DOI': '10.1186/1471-2105-15-56'},
                       {'DOI': '10.1186/1471-2105-15-s10-p6'},
                       {'DOI': '10.1186/s12859-014-0411-1'},
                       {'DOI': '10.1186/s12859-014-0358-2'},
                       {'DOI': '10.1186/1471-2105-15-166'}],
             'items-per-page': 20,
             'query': {'search-terms': None, 'start-index': 0},
             'total-results': 619},
 'message-type': 'work-list',
 'message-version': '1.0.0',
 'status': 'ok'}

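By default, the API returns 20 results per request (items-per-page: 20). Crossref allows up to 1000 results per request using the rows parameter. Because this query has 619 total results, we can retrieve all of them in a single request by setting rows to a value between 619 and 1000.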

rows = "&rows=700"
doi_data_all = requests.get(jbase_url + issn + journal_works2014 + rows + mailto).json()
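
As the query string grows (filter, select, rows, mailto), concatenating fragments by hand becomes easy to get wrong. The following is a minimal sketch, not part of the original recipe, that assembles the same URL from a dictionary using the standard library. The helper name build_works_url is hypothetical, and it assumes a single publication year as in the example above.

```python
from urllib.parse import urlencode

def build_works_url(issn, year, rows=20, email="your_email@ua.edu"):
    """Build a journal works URL for one publication year (illustrative helper)."""
    params = {
        "filter": f"from-pub-date:{year},until-pub-date:{year}",
        "select": "DOI",
        "rows": rows,
        "mailto": email,
    }
    # keep ':' ',' and '@' unescaped so the URL reads like the hand-written one
    query = urlencode(params, safe=":,@")
    return f"https://api.crossref.org/journals/{issn}/works?{query}"

url = build_works_url("1471-2105", 2014, rows=700)
print(url)
```

The resulting string can be passed to requests.get in place of the concatenated URL.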

Extract DOIs#

dois_list = []
for item in doi_data_all["message"]["items"]: # each item is a dictionary with a single "DOI" key
    dois_list.append(item["DOI"])
len(dois_list)
619
# display the first 20
pprint(dois_list[0:20])
['10.1186/1471-2105-15-84',
 '10.1186/1471-2105-15-94',
 '10.1186/1471-2105-15-172',
 '10.1186/1471-2105-15-106',
 '10.1186/1471-2105-15-95',
 '10.1186/1471-2105-15-s9-s12',
 '10.1186/1471-2105-15-33',
 '10.1186/1471-2105-15-s10-p33',
 '10.1186/1471-2105-15-278',
 '10.1186/1471-2105-15-s13-s3',
 '10.1186/1471-2105-15-s16-s13',
 '10.1186/1471-2105-15-254',
 '10.1186/1471-2105-15-s10-p24',
 '10.1186/1471-2105-15-161',
 '10.1186/1471-2105-15-266',
 '10.1186/1471-2105-15-310',
 '10.1186/1471-2105-15-101',
 '10.1186/1471-2105-15-56',
 '10.1186/1471-2105-15-s10-p6',
 '10.1186/s12859-014-0411-1']
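
Since the rows value has to be chosen by hand, it can be useful to confirm that the number of DOIs extracted matches total-results. Below is a minimal sketch of such a check. The function name extract_dois is hypothetical, and the sample dictionary is a trimmed, hand-made stand-in shaped like the API output above, not a real response.

```python
def extract_dois(response):
    """Return (dois, complete) from a work-list response dictionary."""
    message = response.get("message", {})
    items = message.get("items", [])
    # skip any item that lacks a DOI key instead of raising a KeyError
    dois = [item["DOI"] for item in items if "DOI" in item]
    # complete is True only when every result was returned in this response
    complete = len(dois) == message.get("total-results")
    return dois, complete

# hand-made sample: two items returned, but 619 results reported in total
sample = {
    "message": {
        "items": [{"DOI": "10.1186/1471-2105-15-84"},
                  {"DOI": "10.1186/1471-2105-15-94"}],
        "items-per-page": 20,
        "total-results": 619,
    }
}
dois, complete = extract_dois(sample)
print(len(dois), complete)  # 2 False
```

Applied to doi_data_all from above, the second value should be True, because all 619 results fit in one request.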

What if a single query has more than 1000 results?

For example, what if we want the DOIs from BMC Bioinformatics for the years 2014 through 2016?

jbase_url = "https://api.crossref.org/journals/" # the base url for api calls
email = "your_email@ua.edu" # Change this to be your email
mailto = "&mailto=" + email
issn = "1471-2105"  # issn for the journal BMC Bioinformatics
journal_works2014_2016 = "/works?filter=from-pub-date:2014,until-pub-date:2016&select=DOI" # query to get DOIs for 2014-2016
doi_data2 = requests.get(jbase_url + issn + journal_works2014_2016 + mailto).json()
pprint(doi_data2)
{'message': {'facets': {},
             'items': [{'DOI': '10.1186/1471-2105-15-84'},
                       {'DOI': '10.1186/1471-2105-15-94'},
                       {'DOI': '10.1186/1471-2105-16-s15-p11'},
                       {'DOI': '10.1186/s12859-016-1335-8'},
                       {'DOI': '10.1186/1471-2105-15-172'},
                       {'DOI': '10.1186/s12859-015-0538-8'},
                       {'DOI': '10.1186/1471-2105-15-106'},
                       {'DOI': '10.1186/s12859-016-1086-6'},
                       {'DOI': '10.1186/s12859-015-0468-5'},
                       {'DOI': '10.1186/s12859-015-0585-1'},
                       {'DOI': '10.1186/1471-2105-15-95'},
                       {'DOI': '10.1186/1471-2105-16-s15-p20'},
                       {'DOI': '10.1186/1471-2105-15-s9-s12'},
                       {'DOI': '10.1186/s12859-015-0845-0'},
                       {'DOI': '10.1186/s12859-016-1202-7'},
                       {'DOI': '10.1186/s12859-016-1198-z'},
                       {'DOI': '10.1186/s12859-015-0843-2'},
                       {'DOI': '10.1186/1471-2105-15-33'},
                       {'DOI': '10.1186/s12859-015-0729-3'},
                       {'DOI': '10.1186/s12859-016-1234-z'}],
             'items-per-page': 20,
             'query': {'search-terms': None, 'start-index': 0},
             'total-results': 1772},
 'message-type': 'work-list',
 'message-version': '1.0.0',
 'status': 'ok'}

Here we see that the total number of results is over 1000 (total-results: 1772). An additional parameter that we can use with the crossref API is “offset”. The offset option sets the starting position within the result set, so we can retrieve the records in batches (e.g., the first 1000 with offset=0, and then the remaining 772 with offset=1000).

doi_list2 = []
rows = "&rows=1000"
numResults = requests.get(jbase_url + issn + journal_works2014_2016 + mailto).json()["message"]["total-results"]
sleep(1)
for n in range(int(numResults/1000)+1): # list(range(int(numResults/1000)+1)) = [0,1]
    query = requests.get(jbase_url + issn + journal_works2014_2016 + rows + "&offset=" + str(1000*n) + mailto).json()
    sleep(1) # pause between API calls
    for item in query["message"]["items"]: # each item is a dictionary with a single "DOI" key
        doi_list2.append(item["DOI"])
len(doi_list2)
1772
# show results 1000 through 1020
pprint(doi_list2[1000:1020])
['10.1186/s12859-015-0822-7',
 '10.1186/s12859-015-0758-y',
 '10.1186/1471-2105-15-s6-s1',
 '10.1186/s12859-015-0768-9',
 '10.1186/1471-2105-15-139',
 '10.1186/s12859-015-0664-3',
 '10.1186/s12859-016-1246-8',
 '10.1186/s12859-016-1155-x',
 '10.1186/1471-2105-15-67',
 '10.1186/s12859-016-1224-1',
 '10.1186/s12859-016-1137-z',
 '10.1186/s12859-016-1121-7',
 '10.1186/s12859-015-0863-y',
 '10.1186/1471-2105-15-s13-s4',
 '10.1186/1471-2105-15-20',
 '10.1186/s12859-016-1317-x',
 '10.1186/s12859-016-1384-z',
 '10.1186/1471-2105-15-231',
 '10.1186/s12859-015-0569-1',
 '10.1186/s12859-016-1404-z']
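
The paging arithmetic in the loop above can be isolated into small functions, which makes it testable without calling the API. The sketch below is one possible arrangement, not part of the original recipe: page_offsets, collect_dois, and fake_fetch are hypothetical names, and the DOIs produced by fake_fetch are placeholders rather than real records. It uses math.ceil, so a total that is an exact multiple of rows does not trigger an extra, empty request (int(total/1000)+1 would issue one more request in that case).

```python
import math

def page_offsets(total_results, rows=1000):
    """Return the offset value for each page needed to cover total_results."""
    n_pages = math.ceil(total_results / rows)
    return [rows * n for n in range(n_pages)]

def collect_dois(fetch_page, rows=1000):
    """Collect all DOIs using fetch_page(offset, rows), which returns a response dict."""
    first = fetch_page(0, rows)
    total = first["message"]["total-results"]
    dois = [item["DOI"] for item in first["message"]["items"]]
    # the first page is already in hand, so start from the second offset
    for offset in page_offsets(total, rows)[1:]:
        page = fetch_page(offset, rows)
        dois.extend(item["DOI"] for item in page["message"]["items"])
    return dois

def fake_fetch(offset, rows):
    """Stand-in for the API: serves slices of five placeholder records."""
    all_items = [{"DOI": f"10.0000/example-{i}"} for i in range(5)]
    return {"message": {"items": all_items[offset:offset + rows],
                        "total-results": len(all_items)}}

print(page_offsets(1772))                     # [0, 1000]
print(len(collect_dois(fake_fetch, rows=2)))  # 5
```

In real use, fetch_page would wrap the requests.get call from the loop above, including the sleep between calls.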