PubMed API in C

by Cyrus Gomes

These recipe examples were tested on July 25, 2023.

NCBI Entrez Programming Utilities documentation: https://www.ncbi.nlm.nih.gov/books/NBK25501/

Please see NCBI’s Data Usage Policies and Disclaimers: https://www.ncbi.nlm.nih.gov/home/about/policies/

Setup

First, install curl, jq, and the libcurl development package by typing the following command in the terminal:

!sudo apt install -y curl jq libcurl4-openssl-dev

Next, we create a PubMed directory to hold our projects:

!mkdir PubMed

Finally, we change the directory to the folder we created:

%cd PubMed

1. Basic PubMed API call

We create a folder for the current project and then change into that directory:

!mkdir basic_api_call

%cd basic_api_call

Then we use the %%file magic command to create the following makefile, which compiles our program into an executable:

%%file makefile

# Set the variable CC to gcc, which is used to build the program
CC=gcc

# Enable debugging information and enable all compiler warnings
CFLAGS=-g -Wall

# Sets the bin variable as the name of the binary file we are creating
BIN=api_call

# Create the binary file with the name we put
all: $(BIN)

# Map any file ending in .c to a binary executable. 
# "$<" represents the .c file and "$@" represents the target binary executable
%: %.c

	# Compile the .c file using the gcc compiler with the CFLAGS and links 
	# resulting binary with the CURL library
	$(CC) $(CFLAGS) $< -o $@ -lcurl

# Clean target which removes specific files
clean:

	# Remove the binary file and any ".dSYM" (debug symbol) directories;
	# -r removes directories recursively and -f ignores missing files without prompting
	$(RM) -rf $(BIN) *.dSYM

We use the %%file command again to create the .c file that contains the program's code:

%%file api_call.c

#include <curl/curl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* CURL program that retrieves JSON data from the PubMed E-utilities ESummary API
This program allows a custom PubMed ID (or a comma-separated list of IDs) to be used */

/* We are going to be inputting the custom ID like this: ./api_call -i 42342346
If the arguments are missing then we use the default: "27933103" */

int main (int argc, char* argv[]) {
    
    // If arguments are invalid then return
    if (argc > 5) {                                                                                      
        printf("Error. Please try again correctly.\n");
        return -1;
    }

    // PubMed ID(s) to request; starts empty and is filled in below
    char indicator[100] = "";

    // If there is ./api_call or -i
    if ((argc == 1) || ((argc == 2) && (strcmp(argv[1], "-i")==0))) {
        // These arguments run the default parameters and keeps the codes as they are
        strcat(indicator, "27933103");
    }

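    // If there is ./api_call -i 34813985
    else if ((argc == 3) && (strcmp(argv[1], "-i")==0)) {
        // Only the PubMed ID is changed; reject input that would overflow the buffer
        if (strlen(argv[2]) >= sizeof(indicator)) {
            fprintf(stderr, "Error. The ID list is too long.\n");
            return -1;
        }
        strcat(indicator, argv[2]);
    }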

    else {
        printf("usage: ./api_call [-i ID]\n\n");
        printf("the api_call program is used to retrieve json data from the PubMed API\n\n");
        printf("optional arguments\n");
        printf("\t -i ID    optional custom PubMed ID; default is '27933103'\n");
        return -1;
    }

    // Initialize the CURL HTTP connection
    CURL *curl = curl_easy_init();

    // Bits of the url that are joined together later
    char api[] = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&";                                                                     
    char type1[] = "id=";                          
    char url[1000];
    char label[] = "&retmode=json";

    // Check if CURL initialization is a success or not
    if (!curl) {                                                                                         
        fprintf(stderr, "init failed\n");
        return EXIT_FAILURE;
    }
        
    // Combine all the bits to produce a functioning url; snprintf never writes past the end of url
    snprintf(url, sizeof(url), "%s%s%s%s", api, type1, indicator, label);
                                          
    // Set the url to which the HTTP request will be sent to
    // first parameter is for the initialized curl HTTP request, second for the option to be set, and third for the value to be set
    curl_easy_setopt(curl, CURLOPT_URL, url);

    // Perform the request; by default libcurl writes the response body to stdout
    CURLcode result = curl_easy_perform(curl);

    // If result is not retrieved then output error, clean up, and report failure
    if (result != CURLE_OK) {
        fprintf(stderr, "download problem: %s\n", curl_easy_strerror(result));
        curl_easy_cleanup(curl);
        return EXIT_FAILURE;
    }

    // Deallocate memory for the CURL connection
    curl_easy_cleanup(curl);
    return EXIT_SUCCESS;
}
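The URL assembly in api_call.c depends on the fixed-size url buffer being large enough. As a minimal sketch, the same step could be wrapped in a helper that uses snprintf's return value to detect truncation. build_esummary_url is a hypothetical name; the base URL and parameters are the ones the program already uses:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of api_call.c): write the ESummary URL for
   one or more comma-separated PubMed IDs into buf.
   Returns 0 on success, or -1 on an empty ID list or if the URL would not
   fit in buf_size bytes. */
int build_esummary_url(char *buf, size_t buf_size, const char *ids) {
    const char *api = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi";

    /* Reject a missing or empty ID list */
    if (ids == NULL || strlen(ids) == 0) {
        return -1;
    }

    /* snprintf returns the length it needed, excluding the terminating '\0' */
    int needed = snprintf(buf, buf_size, "%s?db=pubmed&id=%s&retmode=json", api, ids);

    /* A negative value is an encoding error; needed >= buf_size means truncation */
    if (needed < 0 || (size_t)needed >= buf_size) {
        return -1;
    }
    return 0;
}
```

In api_call.c, a call such as build_esummary_url(url, sizeof(url), indicator) could replace the string-joining step, with a non-zero return treated as an error before curl_easy_setopt is called.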

We compile the program and create an executable with the following command:

!make

The article we are requesting has PubMed ID: 27933103

To pretty-print the JSON response, we pipe it through jq:

!./api_call | jq '.'

To output the data for multiple IDs from the PubMed API, we pass them as a comma-separated list:

!./api_call -i "34813985,34813140" | jq '.'

To output only the author list for PubMed ID 27933103:

!./api_call | jq '.["result"]["27933103"]["authors"][]'

To output only the author names:

!./api_call | jq '.["result"]["27933103"]["authors"][]["name"]'

"Scalfani VF"
"Williams AJ"
"Tkachenko V"
"Karapetyan K"
"Pshenichnov A"
"Hanson RM"
"Liddie JM"
"Bara JE"

To output the source name from the PubMed API, we enter the following command:

!./api_call -i 34813072 | jq '.["result"]["34813072"]["source"]'

"Methods Mol Biol"

Here, we output the source name for multiple IDs, one request at a time:

%%bash

# List of IDs
idList=('34813985' '34813932' '34813684' '34813661' '34813372' '34813140' '34813072')

for id in "${idList[@]}"; do 

    # Retrieve the source name for the given id
    ./api_call -i "$id" | jq --arg location "$id" '.["result"][$location]["source"]'
    
    # Sleep delay
    sleep 1
    
done
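The loop above waits one second between calls. NCBI asks clients without an API key to send no more than three E-utilities requests per second, so a C program that loops over IDs itself would need a similar pause. A minimal sketch, assuming a POSIX system for clock_gettime and nanosleep; pace_request is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime and nanosleep */
#include <errno.h>
#include <time.h>

/* Seconds elapsed from a to b. */
double seconds_between(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Hypothetical helper: wait until at least min_interval seconds have passed
   since the time stored in *last, then store the current time in *last.
   Set *last to {0, 0} before the first request so that call returns at once. */
void pace_request(struct timespec *last, double min_interval) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);

    if (last->tv_sec != 0 || last->tv_nsec != 0) {
        double wait = min_interval - seconds_between(*last, now);
        if (wait > 0) {
            struct timespec delay;
            delay.tv_sec = (time_t)wait;
            delay.tv_nsec = (long)((wait - (double)delay.tv_sec) * 1e9);
            /* If a signal interrupts the sleep, sleep again for the time remaining */
            while (nanosleep(&delay, &delay) == -1 && errno == EINTR) {
                continue;
            }
            clock_gettime(CLOCK_MONOTONIC, &now);
        }
    }
    *last = now;
}
```

Calling pace_request(&last, 1.0) before each curl_easy_perform would match the one-second sleep used here; an interval of 0.34 seconds stays just under three requests per second.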

"Cell Calcium"
"Methods"
"FEBS J"
"Dev Growth Differ"
"CRISPR J"
"Chembiochem"
"Methods Mol Biol"

2. PubMed API Calls with Requests & Parameters

We go back to the parent PubMed directory:

%cd ..

We create a folder for the current project:

!mkdir api_request_parameter

We then change into that directory:

%cd api_request_parameter

Next, we use %%file to create the makefile for this project:

%%file makefile

# Set the variable CC to gcc, which is used to build the program
CC=gcc

# Enable debugging information and enable all compiler warnings
CFLAGS=-g -Wall

# Set the bin variable as the name of the binary file we are creating
BIN=api_req_par

# Create the binary file with the name we put
all: $(BIN)

# Map any file ending in .c to a binary executable. 
# "$<" represents the .c file and "$@" represents the target binary executable
%: %.c

	# Compile the .c file using the gcc compiler with the CFLAGS and links 
	# resulting binary with the CURL library
	$(CC) $(CFLAGS) $< -o $@ -lcurl

# Clean target which removes specific files
clean:

	# Remove the binary file and any ".dSYM" (debug symbol) directories;
	# -r removes directories recursively and -f ignores missing files without prompting
	$(RM) -rf $(BIN) *.dSYM

We use the %%file command again to create the .c file that contains the program's code:

%%file api_req_par.c

#include <curl/curl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* CURL program that retrieves JSON data from the PubMed E-utilities ESearch API
This program allows a custom query (request) and database (parameter) to be used */

/* We will input the custom database and query like this: ./api_req_par -d "pubmed" -q "neuroscience+intervention+learning"
If the arguments are missing then we use the default: "pubmed" "neuroscience" */

int main (int argc, char* argv[]) {
    
    // If arguments are invalid just return
    if (argc > 5) {                                                                                      
        printf("Error. Please try again correctly.\n");
        return -1;
    }

    // Database (parameter) and query (request) buffers; filled in below
    char parameter[100] = "";
    char request[500] = "";

    // If there is ./api_req_par -d/-q
    if ((argc == 1) || ((argc == 2) && ((strcmp(argv[1], "-d")==0) || (strcmp(argv[1], "-q")==0)))) {
        // These arguments run the default parameters and keeps the codes as they are
        strcat(parameter,"pubmed");
        strcat(request, "neuroscience");
    }

    // If there is ./api_req_par -d "pubmed"
    else if ((argc == 3) && (strcmp(argv[1], "-d")==0)) {
        // Only the parameter code is changed
        strcat(parameter,argv[2]);
        strcat(request, "neuroscience");
    }

    // If there is ./api_req_par -d "pubmed" -q
    else if ((argc == 4) && (strcmp(argv[1], "-d")==0) && (strcmp(argv[3], "-q")==0)) {
        // Only the parameter code is changed
        strcat(parameter,argv[2]);
        strcat(request, "neuroscience");
    }

    // If there is ./api_req_par -d "pubmed" -q "neuroscience+intervention+learning"
    else if ((argc == 5) && (strcmp(argv[1], "-d")==0) && (strcmp(argv[3], "-q")==0)) {
        // Both the parameter and request codes are changed
        strcat(parameter,argv[2]);
        strcat(request, argv[4]);
    }

    // If there is ./api_req_par -q "neuroscience+intervention+learning"
    else if ((argc == 3) && (strcmp(argv[1], "-q")==0)) {
        // Only the request code is changed
        strcat(parameter,"pubmed");
        strcat(request, argv[2]);
    }

    // If there is ./api_req_par -q "neuroscience+intervention+learning" -d
    else if ((argc == 4) && (strcmp(argv[1], "-q")==0) && (strcmp(argv[3], "-d")==0)) {
        // Only the request code is changed
        strcat(parameter,"pubmed");
        strcat(request, argv[2]);
    }

    // If there is ./api_req_par -q "neuroscience+intervention+learning" -d "pubmed" 
    else if ((argc == 5) && (strcmp(argv[1], "-q")==0) && (strcmp(argv[3], "-d")==0)) {
        // Both the request and parameter codes are changed
        strcat(parameter,argv[4]);
        strcat(request, argv[2]);
    }

    else {
        printf("usage: ./api_req_par [-q] request [-d] parameter\n\n");
        printf("the api_req_par program is used to retrieve json data from the PubMed API\n\n");
        printf("optional arguments\n");
        printf("\t -q query        optional custom query; default is 'neuroscience'\n");
        printf("\t -d parameter    optional custom database code; default is 'pubmed',  see: https://www.ncbi.nlm.nih.gov/books/NBK25499/\n");
        return -1;
    }

    // Initialize the CURL HTTP connection
    CURL *curl = curl_easy_init();

    // Bits of the url that are joined together later
    char api[] = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=";                                                                     
    char type1[] = "&";
    char type2[] = "term=";
    char type3[] = "&retmode=json";                           
    char url[1000];

    // Check if CURL initialization is a success or not
    if (!curl) {                                                                                         
        fprintf(stderr, "init failed\n");
        return EXIT_FAILURE;
    }
        
    // Combine all the bits to produce a functioning url; snprintf never writes past the end of url
    snprintf(url, sizeof(url), "%s%s%s%s%s%s", api, parameter, type1, type2, request, type3);

    // Set the url to which the HTTP request will be sent to
    // first parameter is for the initialized curl HTTP request, second for the option to be set, and third for the value to be set
    curl_easy_setopt(curl, CURLOPT_URL, url);

    // Perform the request; by default libcurl writes the response body to stdout
    CURLcode result = curl_easy_perform(curl);

    // If result is not retrieved then output error, clean up, and report failure
    if (result != CURLE_OK) {
        fprintf(stderr, "download problem: %s\n", curl_easy_strerror(result));
        curl_easy_cleanup(curl);
        return EXIT_FAILURE;
    }

    // Deallocate memory for the CURL connection
    curl_easy_cleanup(curl);
    return EXIT_SUCCESS;
}
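The chain of if/else branches in api_req_par.c enumerates every accepted argument order. One way to handle -d and -q in either order with less repetition is a single loop over argv. The sketch below uses a hypothetical parse_req_args helper; unlike the original, it also accepts a repeated flag (the last value wins) and treats any value starting with "-" as another flag:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of api_req_par.c): parse "-d database" and
   "-q query" in either order, falling back to the same defaults as
   api_req_par. A flag given without a value keeps its default.
   Returns 0 on success, or -1 on an unknown argument or an over-long value. */
int parse_req_args(int argc, char *argv[],
                   char *db, size_t db_size, char *term, size_t term_size) {
    const char *db_value = "pubmed";
    const char *term_value = "neuroscience";

    for (int i = 1; i < argc; i++) {
        const char **target;

        if (strcmp(argv[i], "-d") == 0) {
            target = &db_value;
        } else if (strcmp(argv[i], "-q") == 0) {
            target = &term_value;
        } else {
            return -1;   /* unknown flag or a stray value */
        }
        /* Use the next argument as the value unless it is another flag */
        if (i + 1 < argc && argv[i + 1][0] != '-') {
            *target = argv[++i];
        }
    }

    /* snprintf returns the length it needed; >= size means the value was truncated */
    int n = snprintf(db, db_size, "%s", db_value);
    if (n < 0 || (size_t)n >= db_size) {
        return -1;
    }
    n = snprintf(term, term_size, "%s", term_value);
    if (n < 0 || (size_t)n >= term_size) {
        return -1;
    }
    return 0;
}
```

In main, parse_req_args(argc, argv, parameter, sizeof(parameter), request, sizeof(request)) could replace the branches, with the usage message printed when it returns -1.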

We compile the program and create an executable with the following command:

!make

The default database (parameter) is “pubmed” and the default query (request) is “neuroscience”.

We run the program with these defaults and pretty-print the JSON response:

!./api_req_par | jq '.'

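We can also search a different Entrez database, such as PubChem Compound (“pccompound”), here for “aspirin”:

!./api_req_par -q "aspirin" -d "pccompound" | jq '.'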

The number of returned IDs can be adjusted with the retmax parameter, which we append to the query string:

!./api_req_par -q "neuroscience+intervention+learning&retmax=25" | jq '.esearchresult.idlist'

[
  "38305455",
  "38304851",
  "38304576",
  "38303964",
  "38303627",
  "38302998",
  "38302981",
  "38302296",
  "38301832",
  "38301514",
  "38301234",
  "38300213",
  "38299388",
  "38298927",
  "38298912",
  "38298803",
  "38298796",
  "38298788",
  "38298783",
  "38298781",
  "38298775",
  "38297494",
  "38296969",
  "38295471",
  "38293166"
]
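In the command above, &retmax=25 is passed inside the -q argument, so it reaches the URL only because api_req_par pastes the query in verbatim. ESearch returns 20 IDs when retmax is not given. A sketch of building the ESearch URL with retmax as a separate argument, using a hypothetical build_esearch_url helper and the same base URL as api_req_par.c:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of api_req_par.c): build an ESearch URL for
   database db and an already URL-encoded term ('+' for spaces, as in the
   examples above). retmax sets how many IDs to return; pass 0 or a negative
   value to omit it and keep the server default.
   Returns 0 on success, or -1 on empty input or if buf is too small. */
int build_esearch_url(char *buf, size_t buf_size,
                      const char *db, const char *term, int retmax) {
    const char *api = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";
    int needed;

    /* Reject an empty database or search term */
    if (strlen(db) == 0 || strlen(term) == 0) {
        return -1;
    }
    if (retmax > 0) {
        needed = snprintf(buf, buf_size, "%s?db=%s&term=%s&retmode=json&retmax=%d",
                          api, db, term, retmax);
    } else {
        needed = snprintf(buf, buf_size, "%s?db=%s&term=%s&retmode=json",
                          api, db, term);
    }
    /* Negative means an encoding error; >= buf_size means the URL was truncated */
    return (needed < 0 || (size_t)needed >= buf_size) ? -1 : 0;
}
```

Keeping retmax out of the query string means a later change to the term, such as adding a filter, cannot accidentally drop or duplicate it.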

We can confirm that 25 IDs were returned by taking the length of the list:

!./api_req_par -q "neuroscience+intervention+learning&retmax=25" | jq '.esearchresult.idlist | length'

We can also use the query to search for an author by adding the [au] field tag after the name:

!./api_req_par -q "Darwin[au]" | jq '.esearchresult.count'

"630"

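The query is also pasted into the URL verbatim, which is why the examples write spaces as + by hand. The bracketed [au] tag worked as-is above, but square brackets are reserved characters in URLs. A minimal sketch of a hypothetical encode_term helper that writes spaces as + and percent-encodes every byte outside the unreserved set (letters, digits, -, ., _, ~):

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of api_req_par.c): encode a search term for
   use after "term=". Spaces become '+', unreserved characters
   (A-Z a-z 0-9 - . _ ~) are copied, and every other byte becomes %XX.
   Returns 0 on success, or -1 if out is too small. */
int encode_term(const char *in, char *out, size_t out_size) {
    size_t len = 0;

    if (out_size == 0) {
        return -1;
    }
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        char piece[4];   /* one encoded character: at most "%XX" plus '\0' */

        if (*p == ' ') {
            strcpy(piece, "+");
        } else if (isalnum(*p) || strchr("-._~", *p) != NULL) {
            piece[0] = (char)*p;
            piece[1] = '\0';
        } else {
            snprintf(piece, sizeof piece, "%%%02X", (unsigned int)*p);
        }

        size_t piece_len = strlen(piece);
        if (len + piece_len + 1 > out_size) {
            return -1;   /* no room for this piece plus the terminating '\0' */
        }
        memcpy(out + len, piece, piece_len);
        len += piece_len;
    }
    out[len] = '\0';
    return 0;
}
```

Because a literal + is itself encoded as %2B, a caller using this helper would pass ordinary spaces, for example "Reed LK[au]", instead of the +-joined form typed on the command line. libcurl also provides curl_easy_escape for percent-encoding, though it encodes spaces as %20 rather than +.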
We get the ID list for a custom request that also sets usehistory=y and sorts the results by publication date:

!./api_req_par -q "Coral+Reefs&retmode=json&usehistory=y&sort=pub+date" | jq '.esearchresult.idlist'

[
  "37393678",
  "37315600",
  "37209734",
  "37290662",
  "37286001",
  "37257610",
  "37247740",
  "37286027",
  "37399735",
  "37385181",
  "37331272",
  "37311517",
  "37137368",
  "37105476",
  "37022443",
  "36549653",
  "37465983",
  "37487981",
  "37481620",
  "37100135"
]

Searching based on publication type:

We can do this by adding AND to the search term:

term=<searchQuery>+AND+filter[filterType]

Here, [pt] specifies that the filter type is publication type.

More filters can be found at https://pubmed.ncbi.nlm.nih.gov/help/

!./api_req_par -q "stem+cells+AND+clinical+trial[pt]" | jq '{esearchresult: .esearchresult}'
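The term=<searchQuery>+AND+filter[filterType] pattern is plain string joining. A sketch with a hypothetical add_filter helper that appends one field-tagged filter to a query already written with + for spaces:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "<query>+AND+<value>[<tag>]" into out, for example
   query "stem+cells", value "clinical+trial", tag "pt".
   query and value use '+' for spaces, as on the command line above.
   Returns 0 on success, or -1 on empty input or if out is too small. */
int add_filter(char *out, size_t out_size,
               const char *query, const char *value, const char *tag) {
    if (strlen(query) == 0 || strlen(value) == 0 || strlen(tag) == 0) {
        return -1;
    }
    int needed = snprintf(out, out_size, "%s+AND+%s[%s]", query, value, tag);
    /* Negative means an encoding error; >= out_size means truncation */
    return (needed < 0 || (size_t)needed >= out_size) ? -1 : 0;
}
```

Filters can be chained by calling add_filter again with the previous result as query, as long as a different output buffer is used, since snprintf must not read from the buffer it is writing to.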

3. PubMed API metadata visualization

Frequency of topic sortpubdate field

Extracting the sortpubdate field for “hydrogel drug” search results, limited to the clinical trial publication type:

!./api_req_par -q "hydrogel+drug+AND+clinical+trial[pt]&sort=pub+date&retmax=500" | jq '.esearchresult.idlist[0:10]'

[
  "36418469",
  "36870516",
  "36842739",
  "36203046",
  "36261491",
  "35830550",
  "34653384",
  "35556170",
  "35413602",
  "35041809"
]

!./api_req_par -q "hydrogel+drug+AND+clinical+trial[pt]&sort=pub+date&retmax=500" | jq '.esearchresult.idlist | length'

The following code will store the list of IDs in a text file:

!./api_req_par -q "hydrogel+drug+AND+clinical+trial[pt]&sort=pub+date&retmax=500" | jq '.esearchresult.idlist' > idList.txt

To format the text file, we delete the quotes, commas, and brackets, and then remove the empty lines:

!cat idList.txt | tr -d '",[]' > idList2.txt

!sed -i '/^$/d' idList2.txt

!cat idList2.txt | wc -l
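The tr and sed steps strip the JSON punctuation jq prints around each ID (jq's -r option with '.esearchresult.idlist[]' would print bare IDs directly). For comparison, a sketch of the same cleanup in C using a hypothetical clean_id_line helper. It keeps only the digits of each line, so it also drops the leading spaces that tr leaves behind:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy only the digits from a line of jq output such as
   '  "36418469",' into out, giving "36418469". Lines with no digits, such as
   "[" or "]", produce an empty string, which the caller can skip like sed
   skips empty lines. Returns the number of digits copied, or -1 if out is
   too small. */
int clean_id_line(const char *line, char *out, size_t out_size) {
    size_t len = 0;

    if (out_size == 0) {
        return -1;
    }
    for (; *line != '\0'; line++) {
        if (isdigit((unsigned char)*line)) {
            if (len + 1 >= out_size) {
                return -1;  /* keep room for the terminating '\0' */
            }
            out[len++] = *line;
        }
    }
    out[len] = '\0';
    return (int)len;
}
```

A reader loop would call clean_id_line on each line from fgets and write out only the lines for which it returns a positive count.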

Show the first 10 IDs:

!head -10 idList2.txt

We want to get the ESummary of each ID, so we copy the api_call program from our previous project into the current directory:

!cp ../basic_api_call/api_call .

We test whether we can get the date for one ID:

!./api_call -i 34813072 | jq '.["result"]["34813072"]["sortpubdate"][0:10]'

"2022/01/01"

We then do the same for all the IDs and append the dates to a .txt file:

%%bash

# Now loop through each ID and get the sortpubdate field.
# Note that this sortpubdate field may not necessarily be equivalent to a publication date

while read id; do

  # Retrieve data from the api and append the date to the .txt file
  ./api_call -i "$id" | jq --arg ids "$id" '.["result"][$ids]["sortpubdate"][0:10]' >> date_time.txt
  
  # Sleep delay
  sleep 1
  
done < idList2.txt

!head -10 date_time.txt

"2010/09/01"
"2009/03/01"
"2009/03/01"
"2010/08/01"
"2009/03/01"
"2010/08/01"
"2009/02/15"
"2010/07/01"
"2009/02/01"
"2010/01/01"

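The dates above are the raw material for the frequency named in this section's heading, but the steps stop at collecting them. A minimal sketch that tallies the lines of date_time.txt by year, assuming every line is either a quoted YYYY/MM/DD string as shown or something to skip. tally_years and print_year_counts are hypothetical helpers, and the 1900 to 2099 range is an arbitrary bound:

```c
#include <stdio.h>

#define FIRST_YEAR 1900
#define LAST_YEAR  2099
#define NUM_YEARS  (LAST_YEAR - FIRST_YEAR + 1)

/* Hypothetical helper: read lines like "2010/09/01" (with or without the
   surrounding quotes jq prints) and add one to counts[year - FIRST_YEAR]
   for each. counts must hold NUM_YEARS zero-initialized ints.
   Returns the number of lines that parsed as a date in range. */
int tally_years(FILE *in, int counts[NUM_YEARS]) {
    char line[64];
    int parsed = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        int year, month, day;
        const char *p = (line[0] == '"') ? line + 1 : line;   /* skip a leading quote */

        if (sscanf(p, "%d/%d/%d", &year, &month, &day) == 3 &&
            year >= FIRST_YEAR && year <= LAST_YEAR) {
            counts[year - FIRST_YEAR]++;
            parsed++;
        }
    }
    return parsed;
}

/* Print "year count" for every year that occurred, oldest first. */
void print_year_counts(FILE *out, const int counts[NUM_YEARS]) {
    for (int i = 0; i < NUM_YEARS; i++) {
        if (counts[i] > 0) {
            fprintf(out, "%d %d\n", FIRST_YEAR + i, counts[i]);
        }
    }
}
```

A main function could open date_time.txt with fopen, call tally_years on it, and pass the counts to print_year_counts(stdout, counts). Lines that do not parse as a date are skipped rather than counted.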
Frequency of publication for an author search

We first check how many articles the author search for “Reed LK” returns:

!./api_req_par -q "Reed+LK[au]&sort=pub+date&retmax=500" | jq '.["esearchresult"]["count"]'

"59"

We store the ID list in a .txt file:

!./api_req_par -q "Reed+LK[au]&sort=pub+date&retmax=500" | jq '.["esearchresult"]["idlist"]' > id_list3.txt

To format the text file, we delete the quotes, commas, and brackets, and then remove the empty lines:

!cat id_list3.txt | tr -d '",[]' > idList4.txt

!sed -i '/^$/d' idList4.txt

!cat idList4.txt | wc -l

Show the first 10 IDs:

!head -10 idList4.txt

%%bash

# Loop through each ID and append its sortpubdate to date_time2.txt

while read id; do

  ./api_call -i "$id" | jq --arg ids "$id" '.["result"][$ids]["sortpubdate"][0:10]' >> date_time2.txt
  
  # Sleep delay
  sleep 1

done < idList4.txt
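The section heading refers to visualization, but the C standard library has no plotting support. One lightweight option is a text bar chart with one row per year, built from per-year counts of the dates in date_time2.txt. A sketch, using a hypothetical format_bar helper:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one row of a text bar chart, e.g.
   year 2015 with count 3 becomes "2015 | ### (3)".
   Bars longer than max_width are capped so the chart fits in a terminal.
   Returns 0 on success, or -1 if out is too small. */
int format_bar(char *out, size_t out_size, int year, int count, int max_width) {
    int bar = count < max_width ? count : max_width;
    if (bar < 0) {
        bar = 0;
    }

    int used = snprintf(out, out_size, "%d | ", year);
    if (used < 0 || (size_t)used >= out_size) {
        return -1;
    }
    /* Need room for the bar characters plus the terminating '\0' */
    if ((size_t)used + (size_t)bar + 1 > out_size) {
        return -1;
    }
    memset(out + used, '#', (size_t)bar);
    used += bar;

    /* Append the exact count; snprintf also terminates the string */
    int tail = snprintf(out + used, out_size - (size_t)used, " (%d)", count);
    if (tail < 0 || (size_t)tail >= out_size - (size_t)used) {
        return -1;
    }
    return 0;
}
```

Printing one row per year with a max_width of around 50 keeps the chart readable in a terminal, and the count in parentheses preserves the exact value when a bar is capped.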
