Wiley Text and Data Mining (TDM) in Python
by Michael T. Moen
Wiley TDM: https://onlinelibrary.wiley.com/library-info/resources/text-and-datamining
Wiley TDM Terms of Use: Please check with your institution to see their Text and Data Mining Agreement
The Wiley Text and Data Mining (TDM) API allows users to retrieve the full text of Wiley articles in PDF form.
These recipe examples were tested on January 19, 2024.
NOTE: The Wiley TDM API allows a maximum of 3 requests per second.
Setup
Text and Data Mining Token
A token is required to access the Wiley TDM API. Sign up can be found here:
Add your token below:
wiley_token = ""
Alternatively, you can save the token in a separate Python file (here, wiley_token.py) and import it:
from wiley_token import wiley_token
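A third option, sketched here as an assumption rather than anything Wiley prescribes, is to read the token from an environment variable so that it never appears in the notebook or in version control. The variable name `WILEY_TDM_TOKEN` and the helper `load_wiley_token` are hypothetical names chosen for this illustration.

```python
import os


def load_wiley_token(env_var="WILEY_TDM_TOKEN"):
    """Return the TDM token stored in an environment variable.

    Both the function and the default variable name are hypothetical;
    use whatever name fits your own setup.
    """
    # Treat a missing variable and a blank one the same way
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your Wiley TDM token")
    return token
```

With the variable set in your shell, `wiley_token = load_wiley_token()` can stand in for the assignment above.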
Import Libraries
This tutorial uses the following libraries:
import requests # Manages API requests
from time import sleep # Allows staggering of API requests to conform to rate limits
1. Retrieve full-text of an article
The Wiley TDM API returns the full text of an article as a PDF when given the article’s DOI.
In the first example, we download the full-text of the article with the DOI “10.1002/net.22207”. This article was found on the Wiley Online Library.
# DOI of article to download
doi = '10.1002/net.22207'
# Construct URL
url = f'https://api.wiley.com/onlinelibrary/tdm/v1/articles/{doi}'
# Include token in header
headers = {
    "Wiley-TDM-Client-Token": wiley_token
}
# Make a GET request to the Wiley TDM API
response = requests.get(url, headers=headers)
# Download PDF if status code indicates success
if response.status_code == 200:
    # Name file after the DOI
    filename = f'{doi.replace("/", "_")}.pdf'
    # Write data to PDF file
    with open(filename, 'wb') as file:
        file.write(response.content)
    print(f'{filename} downloaded successfully')
# Print status code if unsuccessful
else:
    print(f'Failed to download PDF. Status code: {response.status_code}')
10.1002_net.22207.pdf downloaded successfully
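The example names the file by replacing the "/" in the DOI. DOIs can also contain other characters, such as "<", ">", ":" and ";", that are not safe in file names on every operating system. The following is a minimal sketch of a more conservative naming helper; `doi_to_filename` is a hypothetical name and not part of the Wiley API or of `requests`.

```python
import re


def doi_to_filename(doi, extension=".pdf"):
    """Build a conservative file name from a DOI (hypothetical helper)."""
    # Keep ASCII letters, digits, dots, and hyphens; replace every other
    # character (for example '/', '<', '>', ':', ';') with an underscore
    safe = re.sub(r"[^A-Za-z0-9.\-]", "_", doi)
    return f"{safe}{extension}"


print(doi_to_filename("10.1002/net.22207"))  # 10.1002_net.22207.pdf
```

For the DOIs used in this tutorial the result matches the simple replacement above. Because several characters map to the same underscore, two unusual DOIs could in principle produce the same file name.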
2. Retrieve full-text of multiple articles
In this example, we request six DOIs from the Wiley Online Library: five valid articles and one deliberately invalid DOI that shows how a failed request is reported.
# DOIs of articles to download
dois = [
    '10.1111/j.1467-8624.2010.01564.x',
    '10.1111/1467-8624.00164',
    '10.1111/cdev.12864',
    '10.1111/j.1467-8624.2007.00995.x',
    '10.1111/j.1467-8624.2010.01499.x',
    '10.1111/j.1467-8624.2010.0149.x'  # Invalid DOI, the API responds with status code 404
]
# Include token in header
headers = {
    "Wiley-TDM-Client-Token": wiley_token
}
# Send an HTTP request for each DOI
for doi in dois:
    # Construct URL
    url = f'https://api.wiley.com/onlinelibrary/tdm/v1/articles/{doi}'
    # Make a GET request to the Wiley TDM API
    response = requests.get(url, headers=headers)
    # Download PDF if status code indicates success
    if response.status_code == 200:
        # Name file after the DOI
        filename = f'{doi.replace("/", "_")}.pdf'
        # Write data to PDF file
        with open(filename, 'wb') as file:
            file.write(response.content)
        print(f'{filename} downloaded successfully')
    # Print status code if unsuccessful
    else:
        print(f'Failed to download PDF for {doi}. Status code: {response.status_code}')
    # Wait 1 second to be nice on Wiley's servers
    sleep(1)
10.1111_j.1467-8624.2010.01564.x.pdf downloaded successfully
10.1111_1467-8624.00164.pdf downloaded successfully
10.1111_cdev.12864.pdf downloaded successfully
10.1111_j.1467-8624.2007.00995.x.pdf downloaded successfully
10.1111_j.1467-8624.2010.01499.x.pdf downloaded successfully
Failed to download PDF for 10.1111/j.1467-8624.2010.0149.x. Status code: 404
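The loop above pauses for a fixed second after every request, which stays well below the limit of 3 requests per second. For longer DOI lists, one possible alternative is to sleep only for whatever time is still needed to keep requests a minimum interval apart. The `Throttle` class below is a hypothetical sketch of that idea, not part of `requests` or of the Wiley API. The `clock` and `sleeper` parameters are assumptions added so the class can be tested without really waiting.

```python
import time


class Throttle:
    """Keep successive calls to wait() at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second, clock=time.monotonic, sleeper=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock      # returns the current time in seconds
        self._sleeper = sleeper  # pauses for the given number of seconds
        self._last = None        # time at which the previous call was released

    def wait(self):
        now = self._clock()
        if self._last is not None:
            # Time still needed before the next request is allowed
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleeper(remaining)
                now = self._clock()
        self._last = now


# Hypothetical usage inside the download loop, in place of sleep(1):
#   throttle = Throttle(max_per_second=2)  # deliberately below the limit of 3
#   for doi in dois:
#       throttle.wait()
#       response = requests.get(url, headers=headers)
```

Choosing a rate below the documented maximum leaves some margin for timing jitter. The fixed `sleep(1)` remains the simpler and more conservative choice for short lists.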