Springer Nature API in Python#

by Avery Fernandez and Vincent F. Scalfani

These recipe examples use the Springer Nature Open Access API to retrieve metadata and full-text content. About 650,000 full-text articles are available from BioMed Central and SpringerOpen Journals: https://dev.springernature.com/docs

An API key is required to access the Springer Nature API; you can sign up at https://dev.springernature.com/

Code was tested on October 13, 2023. This tutorial content is intended to help facilitate academic research. Before continuing or reusing any of this code, please be aware of the Springer Nature Text and Data Mining Policies, Terms and Conditions, and Terms for API Users.

Setup#

Import Libraries#

import requests
from time import sleep
from pprint import pprint

Import API Key#

We store our API key in a separate file for easy access and security.

from key import api_key
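
The import above assumes a small local file named key.py (kept out of version control) that defines api_key as a string. A minimal sketch of that file:

# key.py
api_key = 'replace-with-your-springer-nature-api-key'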

1. Retrieve full-text JATS XML of an article#

Before we can query, we must establish a few things:

  • base_url: The base URL for the Springer Nature Open Access API, which returns full-text articles in JATS format (https://jats.nlm.nih.gov/archiving/tag-library/1.1/index.html)

  • ?q=doi:: The query parameter; in this case, we are searching by DOI

  • doi: The DOI of the article

  • openaccess:true: This restricts results to open access content

  • &api_key=: This is the parameter for your API key

You can read more about the API parameters at https://dev.springernature.com/restfuloperations
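
This recipe builds the request URL directly with an f-string (shown next). As a sketch of how the pieces listed above map onto the request, an equivalent alternative is to pass them through the params argument of requests.get, which handles the URL encoding; this variant is an assumption of ours, not part of the original recipe:

# Equivalent request built with requests' params argument (a sketch)
base_url = 'https://api.springernature.com/openaccess/jats'
params = {
    'q': 'doi:"10.1007/s40708-014-0001-z" openaccess:true',  # DOI query plus open access filter
    'api_key': api_key,                                       # imported from key.py above
}
response = requests.get(base_url, params=params)
print(response.status_code)  # 200 indicates success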

base_url = 'https://api.springernature.com/openaccess/jats'

# example DOI from SpringerOpen Brain Informatics
doi = '"10.1007/s40708-014-0001-z"' # doi must be wrapped in double quotes
data = requests.get(f"{base_url}?q=doi:{doi} openaccess:true&api_key={api_key}")
pprint(data) # a status of 200 means the request was successful
<Response [200]>
# Save to a file
with open('fulltext.jats', 'w') as outfile:
    outfile.write(data.text)
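
As a quick check that the download worked, the saved JATS XML can be inspected with the standard library. This is a minimal sketch that assumes typical JATS tagging (an article-title element inside the title group); adjust the element names if your record differs.

import xml.etree.ElementTree as ET

# Parse the saved JATS file and print the first article title found
tree = ET.parse('fulltext.jats')
root = tree.getroot()
for title in root.iter('article-title'):
    print(''.join(title.itertext()))
    break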

2. Retrieve full-text in a loop#

# Examples from SpringerOpen Brain Informatics

dois = [
    '"10.1007/s40708-014-0001-z"',
    '"10.1007/s40708-014-0002-y"',
    '"10.1007/s40708-014-0003-x"',
    '"10.1007/s40708-014-0004-9"',
    '"10.1007/s40708-014-0005-8"',
]
base_url = 'https://api.springernature.com/openaccess/jats'
for doi in dois:
    data = requests.get(f"{base_url}?q=doi:{doi} openaccess:true&api_key={api_key}")
    sleep(1) # add a delay.
    doi_name = doi.replace('/', '_').replace('"', '') # remove / and " from doi
    with open(f'{doi_name}_jats_text.jats', 'w') as outfile:
        outfile.write(data.text)
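
When looping over many DOIs, it can help to skip failed requests rather than writing error responses to disk. A minimal sketch of the same loop with a status-code check added:

for doi in dois:
    data = requests.get(f"{base_url}?q=doi:{doi} openaccess:true&api_key={api_key}")
    sleep(1) # add a delay.
    if data.status_code != 200:
        print(f"Request for {doi} failed with status {data.status_code}; skipping")
        continue
    doi_name = doi.replace('/', '_').replace('"', '') # remove / and " from doi
    with open(f'{doi_name}_jats_text.jats', 'w') as outfile:
        outfile.write(data.text)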

3. Acquire and Parse Metadata#

We can also acquire only the metadata as JSON text.

base_url = 'https://api.springernature.com/openaccess/json'
doi = '"10.1007/s40708-014-0001-z"' # doi must be wrapped in double quotes
data = requests.get(f"{base_url}?q=doi:{doi} openaccess:true&api_key={api_key}").json()

We can now extract data from ["records"][0], where the article's data is stored.

# some examples:
pprint(data["apiMessage"])
pprint(data["query"])
pprint(data["records"][0]["abstract"])
pprint(data["records"][0]["doi"])
pprint(data["records"][0]["onlineDate"])
pprint(data["records"][0]["printDate"])
pprint(data["records"][0]["publicationName"])
pprint(data["records"][0]["title"])
'This JSON was provided by Springer Nature'
'doi:"10.1007/s40708-014-0001-z" openaccess:true'
{'h1': 'Abstract',
 'p': 'Big data is the term for a collection of datasets so huge and complex '
      'that it becomes difficult to be processed using on-hand theoretical '
      'models and technique tools. Brain big data is one of the most typical, '
      'important big data collected using powerful equipments of functional '
      'magnetic resonance imaging, multichannel electroencephalography, '
      'magnetoencephalography, Positron emission tomography, near infrared '
      'spectroscopic imaging, as well as other various devices. Granular '
      'computing with multiple granular layers, referred to as multi-granular '
      'computing (MGrC) for short hereafter, is an emerging computing paradigm '
      'of information processing, which simulates the multi-granular '
      'intelligent thinking model of human brain. It concerns the processing '
      'of complex information entities called information granules, which '
      'arise in the process of data abstraction and derivation of information '
      'and even knowledge from data. This paper analyzes three basic '
      'mechanisms of MGrC, namely granularity optimization, granularity '
      'conversion, and multi-granularity joint computation, and discusses the '
      'potential of introducing MGrC into intelligent processing of brain big '
      'data.'}
'10.1007/s40708-014-0001-z'
'2014-09-06'
'2015-01-30'
'Brain Informatics'
'Granular computing with multiple granular layers for brain big data processing'
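
The same metadata request can be repeated for several articles to build a small lookup of titles. A minimal sketch that reuses the dois list from Section 2 and only the fields shown above:

# Collect DOI -> title pairs from the JSON metadata
base_url = 'https://api.springernature.com/openaccess/json'
titles = {}
for doi in dois:
    record = requests.get(f"{base_url}?q=doi:{doi} openaccess:true&api_key={api_key}").json()
    sleep(1) # add a delay.
    if record["records"]:
        titles[record["records"][0]["doi"]] = record["records"][0]["title"]
pprint(titles)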