Wiley Text and Data Mining (TDM) in C

by Cyrus Gomes

Wiley TDM: https://onlinelibrary.wiley.com/library-info/resources/text-and-datamining

Wiley TDM terms of use: Please check with your institution to see their Text and Data Mining Agreement

The Wiley Text and Data Mining (TDM) API allows users to retrieve the full text of Wiley articles in PDF form.

These recipe examples were tested on August 21, 2024.

NOTE: The Wiley TDM API limits requests to a maximum of 3 requests per second.
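
The rate limit above translates into a minimum gap between consecutive requests. The following is a minimal sketch of that arithmetic in C, assuming a POSIX system for nanosleep; the helper names min_delay_ms and pause_ms are hypothetical and are not part of the Wiley API or of the program in this tutorial.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose nanosleep under strict -std modes */
#endif
#include <time.h>

/* Smallest whole-millisecond gap that keeps the request rate at or below
   max_per_second. The division rounds up so the limit is never exceeded. */
long min_delay_ms(long max_per_second) {
    return (1000 + max_per_second - 1) / max_per_second;
}

/* Pause the calling thread for the given number of milliseconds.
   Returns 0 on success and -1 if the sleep was interrupted. */
int pause_ms(long ms) {
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    return nanosleep(&ts, NULL);
}
```

With a limit of 3 requests per second, min_delay_ms(3) gives 334 ms, so calling pause_ms(min_delay_ms(3)) between requests would stay within the limit. The examples below use a more conservative 1 second pause.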

Setup

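First, install the curl and jq packages by typing the following command in the terminal. Compiling the program below also requires gcc, make, and the libcurl development headers (on Debian and Ubuntu, for example, the libcurl4-openssl-dev package):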

!sudo apt install curl jq

Create a directory for the Wiley project:

!mkdir Wiley

Change to the newly created Wiley directory:

%cd Wiley

Create a variable for API Key

Text and Data Mining Token

A token is required to access the Wiley TDM API, and you must sign up with Wiley to obtain one. If you create a new account, make sure to log in to access your Wiley token.

Save the Wiley token in the wiley_token.txt file created below. The C program receives the token as part of a command-line argument.

# Create the key file
!touch "wiley_token.txt"

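The following command reads the key from the file. Jupyter runs each shell command in its own subshell, so the variable is not shared between cells; the %%bash cells later in this tutorial therefore read the file again.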

# Input the key into the file by copy/paste or keying in manually
# Read the key from the file
!wiley_token=$(cat "wiley_token.txt")
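
As an alternative to reading the file in the shell, the token could be read inside a C program and turned into the request header. The sketch below assumes the token sits on the first line of the file; the function name build_token_header is hypothetical, while the header name Wiley-TDM-Client-Token is the one used in the examples later in this tutorial.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of the token file and build the request header
   "Wiley-TDM-Client-Token: <token>" in out.
   Returns 0 on success, -1 if the file cannot be read, the token is
   empty, or the header does not fit in out. */
int build_token_header(const char *path, char *out, size_t out_size) {
    char token[256];
    FILE *fp = fopen(path, "r");
    if (!fp) {
        return -1;
    }
    if (!fgets(token, sizeof(token), fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* Strip the trailing newline (and carriage return) left by an editor */
    token[strcspn(token, "\r\n")] = '\0';
    if (token[0] == '\0') {
        return -1;
    }

    int written = snprintf(out, out_size, "Wiley-TDM-Client-Token: %s", token);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

The resulting string has the same form as the value passed with the -h option in the examples below, so it could be handed directly to curl_slist_append.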

We use the %%file magic command to create the following makefile, which compiles our program into an executable.

%%file makefile

# Set the variable CC to gcc, which is used to build the program
CC=gcc

# Enable debugging information and enable all compiler warnings
CFLAGS=-g -Wall

# Set the bin variable as the name of the binary file we are creating
BIN=wiley

# Create the binary file with the name we put
all: $(BIN)

# Map any file ending in .c to a binary executable. 
# "$<" represents the .c file and "$@" represents the target binary executable
%: %.c

	# Compile the .c file using the gcc compiler with the CFLAGS and links 
	# resulting binary with the CURL library
	$(CC) $(CFLAGS) $< -o $@ -lcurl

# Clean target which removes specific files
clean:

	# Remove the binary file and any ".dSYM" (debug symbols) directories
	# -r removes directories and -f forces deletion
	$(RM) -rf $(BIN) *.dSYM
Writing makefile

The %%file command is used again to create the .c file that contains the code for the program:

%%file ./wiley.c

#include <stdio.h>
#include <stdlib.h>
#include <curl/curl.h>
#include <string.h>

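// Callback function to write response data to a file.
// libcurl expects the number of bytes handled, so write size * nmemb
// single bytes and return how many were actually written.
size_t write_data(void *ptr, size_t size, size_t nmemb, FILE *stream) {
    return fwrite(ptr, 1, size * nmemb, stream);
}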

// Function to replace every occurrence of a character in a string
void replace_char(char *str, char find, char replace) {
    char *current_pos = strchr(str, find);
    while (current_pos) {
        *current_pos = replace;
        // Resume the search after the replaced character
        current_pos = strchr(current_pos + 1, find);
    }
}

int main(int argc, char* argv[]) {
    // Zero-initialized buffers for the DOI and the request header
    char doi[200] = {0};
    char header[200] = {0};

    // Expect exactly four arguments: -d <doi> -h <header>, in either order
    if (argc == 5) {
        // Check argument order; snprintf truncates long input instead of
        // writing past the end of the buffer
        if (strcmp(argv[1], "-h") == 0 && strcmp(argv[3], "-d") == 0) {
            snprintf(header, sizeof(header), "%s", argv[2]);
            snprintf(doi, sizeof(doi), "%s", argv[4]);
        } else if (strcmp(argv[1], "-d") == 0 && strcmp(argv[3], "-h") == 0) {
            snprintf(doi, sizeof(doi), "%s", argv[2]);
            snprintf(header, sizeof(header), "%s", argv[4]);
        } else {
            fprintf(stderr, "Invalid argument order.\n");
            return -1;
        }
    } else {
        fprintf(stderr, "Invalid number of arguments.\n");
        return -1;
    }

    // Construct URL
    char url[300];
    snprintf(url, sizeof(url), "https://api.wiley.com/onlinelibrary/tdm/v1/articles/%s", doi);

    // Include token in header
    struct curl_slist *headers = NULL;
    headers = curl_slist_append(headers, header);

    // Initialize libcurl
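    curl_global_init(CURL_GLOBAL_ALL);
    CURL *curl = curl_easy_init();
    if (!curl) {
        fprintf(stderr, "Failed to initialize libcurl\n");
        curl_slist_free_all(headers);
        curl_global_cleanup();
        return 1;
    }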

    // Set URL and headers
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);

    // Follow redirects
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);

    // Generate file name
    char filename[300];
    strcpy(filename, doi);
    replace_char(filename, '/', '_');
    strcat(filename, ".pdf");

    // Open file for writing
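    FILE *file = fopen(filename, "wb");
    if (!file) {
        fprintf(stderr, "Failed to open file for writing\n");
        // Release libcurl resources before exiting
        curl_slist_free_all(headers);
        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return 1;
    }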

    // Set callback function to write response data to file
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, file);

    // Perform GET request
    CURLcode res = curl_easy_perform(curl);

    // Debugging: Print the response code (stays 0 if no response was received)
    long response_code = 0;
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
    printf("Response code: %ld\n", response_code);

    // Cleanup
    fclose(file);
    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();

    if (res != CURLE_OK) {
        fprintf(stderr, "Failed to download PDF: %s\n", curl_easy_strerror(res));
        return 1;
    }

    if (response_code != 200) {
        fprintf(stderr, "PDF download failed: %s\n", filename);
        return 1;
    }

    printf("PDF downloaded successfully: %s\n", filename);

    return 0;
}
Writing ./wiley.c

Compile the program by running make:

!make
# Compile the .c file using the gcc compiler with the CFLAGS and links 
# resulting binary with the CURL library
gcc -g -Wall wiley.c -o wiley -lcurl

1. Retrieve full-text of an article

The Wiley TDM API returns the full-text of an article as a PDF when given the article’s DOI.

In the first example, we download the full-text of the article with the DOI “10.1002/net.22207”. This article was found on the Wiley Online Library.

%%bash

# DOI of article to download
doi="10.1002/net.22207"

# Retrieve Wiley token from file
wiley_token=$(cat "wiley_token.txt")

# Download PDF using wiley tool
./wiley -d "$doi" -h "Wiley-TDM-Client-Token: $wiley_token"
Response code: 200
PDF downloaded successfully: 10.1002_net.22207.pdf
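
The output file name above is derived from the DOI by replacing each '/' with '_' and appending ".pdf", while the request URL appends the unmodified DOI to the API endpoint. A minimal standalone sketch of both mappings, with bounds checking, might look as follows; the function names doi_to_url and doi_to_filename are hypothetical, and the endpoint is the one used in wiley.c.

```c
#include <stdio.h>
#include <string.h>

/* Build the request URL for a DOI.
   Returns 0 on success, -1 if the result does not fit in out. */
int doi_to_url(const char *doi, char *out, size_t out_size) {
    int n = snprintf(out, out_size,
                     "https://api.wiley.com/onlinelibrary/tdm/v1/articles/%s",
                     doi);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Build the output file name: the DOI with '/' replaced by '_' plus ".pdf".
   Returns 0 on success, -1 if the result does not fit in out. */
int doi_to_filename(const char *doi, char *out, size_t out_size) {
    int n = snprintf(out, out_size, "%s.pdf", doi);
    if (n < 0 || (size_t)n >= out_size) {
        return -1;
    }
    /* A '/' in a file name would be read as a directory separator */
    for (char *p = out; *p != '\0'; p++) {
        if (*p == '/') {
            *p = '_';
        }
    }
    return 0;
}
```

Returning an error when the buffer is too small avoids silently requesting or writing a truncated name.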

2. Retrieve full-text of multiple articles

In this example, we request six DOIs: five articles found in the Wiley Online Library and one deliberately invalid DOI that shows how a failed download is reported:

%%bash

# DOIs of articles to download
dois=(
    '10.1111/j.1467-8624.2010.01564.x'
    '10.1111/1467-8624.00164'
    '10.1111/cdev.12864'
    '10.1111/j.1467-8624.2007.00995.x'
    '10.1111/j.1467-8624.2010.01499.x'
    '10.1111/j.1467-8624.2010.0149.x'      # Invalid DOI; the request returns a 404 error
)

# Retrieve Wiley token from file
wiley_token=$(cat "wiley_token.txt")

# Iterate through each DOI
for doi in "${dois[@]}"; do
    # Download PDF using Wiley tool
    ./wiley -d "$doi" -h "Wiley-TDM-Client-Token: $wiley_token"
    
    # Sleep for 1 second to stay well below the limit of 3 requests per second
    sleep 1
done
Response code: 200
PDF downloaded successfully: 10.1111_j.1467-8624.2010.01564.x.pdf
Response code: 200
PDF downloaded successfully: 10.1111_1467-8624.00164.pdf
Response code: 200
PDF downloaded successfully: 10.1111_cdev.12864.pdf
Response code: 200
PDF downloaded successfully: 10.1111_j.1467-8624.2007.00995.x.pdf
Response code: 200
PDF downloaded successfully: 10.1111_j.1467-8624.2010.01499.x.pdf
PDF download failed: 10.1111_j.1467-8624.2010.0149.x.pdf
Response code: 404
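
Because wiley.c opens the output file before sending the request, a failed request such as the last one still leaves a file with a .pdf name on disk. One way to detect such files is to check for the "%PDF-" signature that PDF files normally begin with. The sketch below assumes the signature sits at the very start of the file; the function name looks_like_pdf is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the file starts with the PDF signature "%PDF-",
   0 if it does not, and -1 if the file cannot be opened. */
int looks_like_pdf(const char *path) {
    unsigned char magic[5];
    FILE *fp = fopen(path, "rb");
    if (!fp) {
        return -1;
    }
    size_t got = fread(magic, 1, sizeof(magic), fp);
    fclose(fp);
    /* Files shorter than the signature cannot be PDFs */
    return got == sizeof(magic) && memcmp(magic, "%PDF-", sizeof(magic)) == 0;
}
```

A caller could run this check on each downloaded file and delete any file for which it returns 0.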