Wiley Text and Data Mining (TDM) in Python

by Michael T. Moen

Wiley TDM: https://onlinelibrary.wiley.com/library-info/resources/text-and-datamining

Wiley TDM Terms of Use: Please check with your institution to see their Text and Data Mining Agreement

The Wiley Text and Data Mining (TDM) API allows users to retrieve the full text of Wiley articles as PDF files.

These recipe examples were tested on January 19, 2024.

NOTE: The Wiley TDM API limits requests to a maximum of 3 requests per second.
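One way to stay under that limit is to enforce a minimum gap between consecutive requests. The following is a minimal sketch, not part of the Wiley API or of this tutorial's tested code; the `RateLimiter` class and its `max_per_second` parameter are hypothetical names chosen for illustration.

```python
import time


class RateLimiter:
    """Hypothetical helper that enforces a minimum interval between calls."""

    def __init__(self, max_per_second=3.0):
        # Smallest allowed gap (in seconds) between two consecutive calls
        self.min_interval = 1.0 / max_per_second
        self._last_call = None

    def wait(self):
        """Sleep just long enough to keep calls at or below the configured rate."""
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        # Record when this call was allowed to proceed
        self._last_call = time.monotonic()


# Create one limiter and reuse it for every request
limiter = RateLimiter(max_per_second=3)
```

Calling `limiter.wait()` immediately before each request keeps the request rate at or below the configured value.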

Setup

Text and Data Mining Token

A token is required to access the Wiley TDM API. Sign up can be found here:

Add your token below:

wiley_token = ""

Alternatively, you can save the token in a separate Python file named wiley_token.py and import it:

from wiley_token import wiley_token
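Another option, shown here only as a sketch, is to read the token from an environment variable so that it never appears in the notebook. The variable name `WILEY_TDM_TOKEN` and the helper `load_wiley_token` are assumptions made for this example, not names defined by Wiley.

```python
import os


def load_wiley_token(env_var="WILEY_TDM_TOKEN"):
    """Return the token stored in an environment variable.

    The default variable name is an assumption for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your Wiley TDM token")
    return token


# Example usage (requires the variable to be set beforehand):
# wiley_token = load_wiley_token()
```

Failing early with a clear message avoids sending requests with an empty token.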

Import Libraries

This tutorial uses the following libraries:

import requests                     # Manages API requests
from time import sleep              # Allows staggering of API requests to conform to rate limits

1. Retrieve full text of an article

The Wiley TDM API returns the full text of an article as a PDF when given the article’s DOI.

In the first example, we download the full text of the article with the DOI “10.1002/net.22207”. This article was found on the Wiley Online Library.

# DOI of article to download
doi = '10.1002/net.22207'

# Construct URL
url = f'https://api.wiley.com/onlinelibrary/tdm/v1/articles/{doi}'

# Include token in header
headers = {
    "Wiley-TDM-Client-Token": wiley_token
}

# Make a GET request to the Wiley TDM API
response = requests.get(url, headers=headers)

# Download PDF if status code indicates success
if response.status_code == 200:

    # Name file after the DOI
    filename = f"{doi.replace('/', '_')}.pdf"

    # Write data to PDF file
    with open(filename, 'wb') as file:
        file.write(response.content)

    print(f'{filename} downloaded successfully')

# Print status code if unsuccessful
else:
    print(f'Failed to download PDF. Status code: {response.status_code}')

10.1002_net.22207.pdf downloaded successfully
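As an optional sanity check, the downloaded bytes can be inspected before they are written to disk. The sketch below assumes that a well-formed PDF begins with the `%PDF-` header; `doi_to_filename` and `looks_like_pdf` are hypothetical helper names, not part of the Wiley TDM API.

```python
def doi_to_filename(doi):
    """Build a file name from a DOI by replacing '/' (not allowed in file names) with '_'."""
    return doi.replace("/", "_") + ".pdf"


def looks_like_pdf(data):
    """Return True if the bytes start with the standard PDF header."""
    return data.startswith(b"%PDF-")


# Example usage with the response from the request above:
# if response.status_code == 200 and looks_like_pdf(response.content):
#     with open(doi_to_filename(doi), "wb") as file:
#         file.write(response.content)
```

This guards against saving an error page or other non-PDF body under a `.pdf` name.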

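2. Retrieve full text of multiple articles

In this example, we attempt to download six DOIs: five articles found in the Wiley Online Library and one intentionally invalid DOI that shows how a failed request is reported: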

# DOIs of articles to download
dois = [
    '10.1111/j.1467-8624.2010.01564.x',
    '10.1111/1467-8624.00164',
    '10.1111/cdev.12864',
    '10.1111/j.1467-8624.2007.00995.x',
    '10.1111/j.1467-8624.2010.01499.x',
    '10.1111/j.1467-8624.2010.0149.x'       # Invalid DOI, request will return a 404 status code
]

# Include token in header
headers = {
    "Wiley-TDM-Client-Token": wiley_token
}

# Send an HTTP request for each DOI
for doi in dois:

    # Construct URL
    url = f'https://api.wiley.com/onlinelibrary/tdm/v1/articles/{doi}'

    # Make a GET request to the Wiley TDM API
    response = requests.get(url, headers=headers)

    # Download PDF if status code indicates success
    if response.status_code == 200:

        # Name file after the DOI
        filename = f"{doi.replace('/', '_')}.pdf"

        # Write data to PDF file
        with open(filename, 'wb') as file:
            file.write(response.content)

        print(f'{filename} downloaded successfully')

    # Print status code if unsuccessful
    else:
        print(f'Failed to download PDF for {doi}. Status code: {response.status_code}')
    
    # Wait 1 second to be nice on Wiley's servers
    sleep(1)

10.1111_j.1467-8624.2010.01564.x.pdf downloaded successfully
10.1111_1467-8624.00164.pdf downloaded successfully
10.1111_cdev.12864.pdf downloaded successfully
10.1111_j.1467-8624.2007.00995.x.pdf downloaded successfully
10.1111_j.1467-8624.2010.01499.x.pdf downloaded successfully
Failed to download PDF for 10.1111/j.1467-8624.2010.0149.x. Status code: 404
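The loop above can also be wrapped in a function that records which DOIs succeeded and which failed. This is a sketch under stated assumptions: `download_all`, its `fetch` callable, and the `out_dir` parameter are hypothetical names; `fetch` stands in for the `requests.get` call and must return a `(status_code, content)` pair.

```python
import os
import time


def download_all(dois, fetch, delay=1.0, out_dir="."):
    """Fetch each DOI, save successful responses as PDFs, and group DOIs by outcome.

    fetch(doi) is a caller-supplied function returning (status_code, content_bytes).
    """
    results = {"downloaded": [], "failed": {}}
    for index, doi in enumerate(dois):
        # Pause between requests (not before the first) to respect the rate limit
        if index > 0:
            time.sleep(delay)
        status_code, content = fetch(doi)
        if status_code == 200:
            # Name the file after the DOI, replacing '/' with '_'
            path = os.path.join(out_dir, doi.replace("/", "_") + ".pdf")
            with open(path, "wb") as file:
                file.write(content)
            results["downloaded"].append(path)
        else:
            # Keep the status code so failures can be reviewed afterwards
            results["failed"][doi] = status_code
    return results


# Example usage with the variables defined earlier (assumes requests is imported):
# def fetch(doi):
#     url = f'https://api.wiley.com/onlinelibrary/tdm/v1/articles/{doi}'
#     response = requests.get(url, headers=headers)
#     return response.status_code, response.content
#
# summary = download_all(dois, fetch)
# print(summary["failed"])
```

Passing the request logic in as `fetch` keeps the bookkeeping separate from the network call, so the function can be exercised with a stand-in that never contacts Wiley's servers.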