Wiley Text and Data Mining (TDM) in C
by Cyrus Gomes
Wiley TDM: https://onlinelibrary.wiley.com/library-info/resources/text-and-datamining
Wiley TDM terms of use: Please check with your institution to see their Text and Data Mining Agreement
The Wiley Text and Data Mining (TDM) API allows users to retrieve the full text of Wiley articles in PDF form.
These recipe examples were tested on August 21, 2024.
NOTE: The Wiley TDM API allows a maximum of 3 requests per second.
Setup
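The C program below links against libcurl, so the libcurl development headers must also be present (on Debian or Ubuntu they are typically provided by a package such as libcurl4-openssl-dev). First, install the curl and jq packages by running the following command in the terminal: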
!sudo apt install curl jq
Create a directory for the Wiley project:
!mkdir Wiley
Change to the newly created Wiley directory:
%cd Wiley
Create a variable for API Key
Text and Data Mining Token
A token is required to access the Wiley TDM API. You can sign up for one through the Wiley TDM page linked above. If you create a new account, make sure you are logged in to view your Wiley token.
The token is stored in a text file and passed to the C program below on the command line.
# Create the key file
!touch "wiley_token.txt"
We read the key from the file with the following command, because Jupyter does not share shell variables between separate bash cells.
# Input the key into the file by copy/paste or keying in manually
# Read the key from the file
!wiley_token=$(cat "wiley_token.txt")
We use the %%file command to create the following makefile, which compiles our program into an executable.
%%file makefile
# Set the variable CC to gcc, which is used to build the program
CC=gcc
# Enable debugging information and enable all compiler warnings
CFLAGS=-g -Wall
# Set the bin variable as the name of the binary file we are creating
BIN=wiley
# Create the binary file with the name we put
all: $(BIN)
# Map any file ending in .c to a binary executable.
# "$<" represents the .c file and "$@" represents the target binary executable
%: %.c
	# Compile the .c file using the gcc compiler with the CFLAGS and links
	# resulting binary with the CURL library
	$(CC) $(CFLAGS) $< -o $@ -lcurl
# Clean target which removes specific files
clean:
	# Remove the binary file and any ".dSYM" (debug symbols for debugging) directories
	# The RM command uses -r to remove directories and -f to force deletion
	$(RM) -rf $(BIN) *.dSYM
Writing makefile
The %%file command is used again to create wiley.c, which contains the source code for the program.
%%file ./wiley.c
#include <stdio.h>
#include <stdlib.h>
#include <curl/curl.h>
#include <string.h>

// Callback function to write response data to a file
size_t write_data(void *ptr, size_t size, size_t nmemb, FILE *stream) {
    return fwrite(ptr, size, nmemb, stream);
}

// Function to replace characters in a string
void replace_char(char *str, char find, char replace) {
    char *current_pos = strchr(str, find);
    while (current_pos) {
        *current_pos = replace;
        // Continue searching after the character that was just replaced
        current_pos = strchr(current_pos + 1, find);
    }
}

int main(int argc, char* argv[]) {
    // DOI and header buffers, zero-initialized
    char doi[200] = {0};
    char header[200] = {0};
    const char *doi_arg, *header_arg;

    // Expect exactly four arguments: -d <doi> -h <header>, in either order
    if (argc != 5) {
        fprintf(stderr, "Invalid number of arguments.\n");
        return -1;
    }

    // Check argument order
    if (strcmp(argv[1], "-h") == 0 && strcmp(argv[3], "-d") == 0) {
        header_arg = argv[2];
        doi_arg = argv[4];
    } else if (strcmp(argv[1], "-d") == 0 && strcmp(argv[3], "-h") == 0) {
        doi_arg = argv[2];
        header_arg = argv[4];
    } else {
        fprintf(stderr, "Invalid argument order.\n");
        return -1;
    }

    // Reject inputs that would overflow the fixed-size buffers
    if (strlen(doi_arg) >= sizeof(doi) || strlen(header_arg) >= sizeof(header)) {
        fprintf(stderr, "DOI or header is too long.\n");
        return -1;
    }
    strcpy(doi, doi_arg);
    strcpy(header, header_arg);

    // Construct URL
    char url[300];
    snprintf(url, sizeof(url), "https://api.wiley.com/onlinelibrary/tdm/v1/articles/%s", doi);

    // Initialize libcurl
    curl_global_init(CURL_GLOBAL_ALL);
    CURL *curl = curl_easy_init();
    if (!curl) {
        fprintf(stderr, "Failed to initialize libcurl\n");
        curl_global_cleanup();
        return 1;
    }

    // Include token in header
    struct curl_slist *headers = NULL;
    headers = curl_slist_append(headers, header);

    // Set URL and headers
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);

    // Follow redirects
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);

    // Generate file name (the DOI is at most 199 characters, so ".pdf" fits)
    char filename[300];
    strcpy(filename, doi);
    replace_char(filename, '/', '_');
    strcat(filename, ".pdf");

    // Open file for writing
    FILE *file = fopen(filename, "wb");
    if (!file) {
        fprintf(stderr, "Failed to open file for writing\n");
        curl_slist_free_all(headers);
        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return 1;
    }

    // Set callback function to write response data to file
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, file);

    // Perform GET request
    CURLcode res = curl_easy_perform(curl);

    // Debugging: Print the response code (stays 0 if no response was received)
    long response_code = 0;
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
    printf("Response code: %ld\n", response_code);

    // Cleanup
    fclose(file);
    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();

    if (res != CURLE_OK) {
        fprintf(stderr, "Failed to download PDF: %s\n", curl_easy_strerror(res));
        return 1;
    }
    if (response_code != 200) {
        fprintf(stderr, "PDF download failed: %s\n", filename);
        return 1;
    }
    printf("PDF downloaded successfully: %s\n", filename);
    return 0;
}
Writing ./wiley.c
!make
# Compile the .c file using the gcc compiler with the CFLAGS and links
# resulting binary with the CURL library
gcc -g -Wall wiley.c -o wiley -lcurl
1. Retrieve full-text of an article
The Wiley TDM API returns the full text of an article as a PDF when given the article’s DOI.
In the first example, we download the full text of the article with the DOI “10.1002/net.22207”. This article was found on the Wiley Online Library.
%%bash
# DOI of article to download
doi="10.1002/net.22207"
# Retrieve Wiley token from file
wiley_token=$(cat "wiley_token.txt")
# Download PDF using wiley tool
./wiley -d "$doi" -h "Wiley-TDM-Client-Token: $wiley_token"
Response code: 200
PDF downloaded successfully: 10.1002_net.22207.pdf
2. Retrieve full-text of multiple articles
In this example, we request six DOIs from the Wiley Online Library: five valid articles and one deliberately invalid DOI that demonstrates a failed download:
%%bash
# DOIs of articles to download
dois=(
'10.1111/j.1467-8624.2010.01564.x'
'10.1111/1467-8624.00164'
'10.1111/cdev.12864'
'10.1111/j.1467-8624.2007.00995.x'
'10.1111/j.1467-8624.2010.01499.x'
'10.1111/j.1467-8624.2010.0149.x' # Invalid DOI, the download will fail
)
# Retrieve Wiley token from file
wiley_token=$(cat "wiley_token.txt")
# Iterate through each DOI
for doi in "${dois[@]}"; do
# Download PDF using Wiley tool
./wiley -d "$doi" -h "Wiley-TDM-Client-Token: $wiley_token"
# Sleep for 1 second to stay well under the limit of 3 requests per second
sleep 1
done
Response code: 200
PDF downloaded successfully: 10.1111_j.1467-8624.2010.01564.x.pdf
Response code: 200
PDF downloaded successfully: 10.1111_1467-8624.00164.pdf
Response code: 200
PDF downloaded successfully: 10.1111_cdev.12864.pdf
Response code: 200
PDF downloaded successfully: 10.1111_j.1467-8624.2007.00995.x.pdf
Response code: 200
PDF downloaded successfully: 10.1111_j.1467-8624.2010.01499.x.pdf
PDF download failed: 10.1111_j.1467-8624.2010.0149.x.pdf
Response code: 404