PubMed API in C
by Cyrus Gomes
These recipe examples were tested on July 25, 2023.
NCBI Entrez Programming Utilities documentation: https://www.ncbi.nlm.nih.gov/books/NBK25501/
Please see NCBI’s Data Usage Policies and Disclaimers: https://www.ncbi.nlm.nih.gov/home/about/policies/
Setup
First, install the curl and jq packages, along with the libcurl development headers, by running the following command in the terminal:
!sudo apt install curl jq libcurl4-openssl-dev
Then we create a PubMed directory to hold our projects:
!mkdir PubMed
Finally, we change the directory to the folder we created:
%cd PubMed
1. Basic PubMed API call
We create a folder for the current project and then change to that directory:
!mkdir basic_api_call
%cd basic_api_call
Then we use the %%file command to create the following makefile, which compiles our program and creates an executable:
%%file makefile
# Set the variable CC to gcc, which is used to build the program
CC=gcc
# Enable debugging information and enable all compiler warnings
CFLAGS=-g -Wall
# Set the BIN variable to the name of the binary file we are creating
BIN=api_call
# Create the binary file with the name we put
all: $(BIN)
# Map any file ending in .c to a binary executable.
# "$<" represents the .c file and "$@" represents the target binary executable.
# The recipe compiles the .c file with gcc using CFLAGS and links the
# resulting binary with the CURL library. Recipe lines must start with a tab.
%: %.c
	$(CC) $(CFLAGS) $< -o $@ -lcurl
# Clean target which removes the binary file and any ".dSYM" (debug symbols) directories.
# The RM command uses -r to remove directories and -f to force deletion.
clean:
	$(RM) -rf $(BIN) *.dSYM
We use the %%file command again to create the .c file that contains the code for the program:
%%file api_call.c
#include <curl/curl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* CURL program that retrieves JSON data from the PubMed API.
   This program allows a custom PubMed ID to be used. */
/* The custom ID is passed like this: ./api_call -i 42342346
   If the argument is missing then we use the default: "27933103" */
int main (int argc, char* argv[]) {
    // Default PubMed ID; several IDs can be passed as a comma-separated list
    const char *id = "27933103";
    // If there is ./api_call or ./api_call -i
    if ((argc == 1) || ((argc == 2) && (strcmp(argv[1], "-i")==0))) {
        // These arguments keep the default ID
    }
    // If there is ./api_call -i 34813985
    else if ((argc == 3) && (strcmp(argv[1], "-i")==0)) {
        // Use the ID given on the command line
        id = argv[2];
    }
    else {
        printf("usage: ./api_call [-i] ID\n\n");
        printf("the api_call program is used to retrieve json data from the PubMed API\n\n");
        printf("optional arguments\n");
        printf("\t -i ID optional custom PubMed ID; default is '27933103'\n");
        return -1;
    }
    // Bits of the url that are joined together later
    char api[] = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&";
    char type1[] = "id=";
    char label[] = "&retmode=json";
    char url[1000];
    // Combine all the bits to produce a functioning url.
    // snprintf never writes past the end of the buffer and reports truncation.
    int length = snprintf(url, sizeof(url), "%s%s%s%s", api, type1, id, label);
    if (length < 0 || (size_t)length >= sizeof(url)) {
        fprintf(stderr, "the ID list is too long\n");
        return EXIT_FAILURE;
    }
    // Initialize the CURL HTTP connection
    CURL *curl = curl_easy_init();
    // Check if CURL initialization is a success or not
    if (!curl) {
        fprintf(stderr, "init failed\n");
        return EXIT_FAILURE;
    }
    // Set the url to which the HTTP request will be sent
    // first parameter is for the initialized curl HTTP request, second for the option to be set, and third for the value to be set
    curl_easy_setopt(curl, CURLOPT_URL, url);
    // Perform the request; by default libcurl writes the response body to stdout
    CURLcode result = curl_easy_perform(curl);
    // If result is not retrieved then output error
    if (result != CURLE_OK) {
        fprintf(stderr, "download problem: %s\n", curl_easy_strerror(result));
    }
    // Deallocate memory for the CURL connection
    curl_easy_cleanup(curl);
    return (result == CURLE_OK) ? EXIT_SUCCESS : EXIT_FAILURE;
}
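The program above relies on libcurl's default behavior of writing the response body to stdout, which is what lets us pipe the output to jq. If the JSON needs to stay inside the C program instead, libcurl accepts a write callback through CURLOPT_WRITEFUNCTION. The following is a minimal sketch of such a callback; the struct response type and the collect_body name are assumptions made for illustration and are not part of the original recipe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Growable buffer that accumulates the response body (hypothetical helper type) */
struct response {
    char *data;   /* NUL-terminated bytes received so far, or NULL */
    size_t size;  /* number of bytes stored, excluding the terminator */
};

/* Has the signature libcurl expects for CURLOPT_WRITEFUNCTION.
   Returns the number of bytes consumed; returning a different number
   than size * nmemb tells libcurl to abort the transfer. */
size_t collect_body(char *chunk, size_t size, size_t nmemb, void *userdata) {
    assert(userdata != NULL);
    struct response *resp = userdata;
    size_t total = size * nmemb;

    /* Grow the buffer to hold the new chunk plus a terminating NUL */
    char *grown = realloc(resp->data, resp->size + total + 1);
    if (grown == NULL) {
        return 0;  /* out of memory: signal an error to libcurl */
    }
    memcpy(grown + resp->size, chunk, total);
    resp->data = grown;
    resp->size += total;
    resp->data[resp->size] = '\0';
    return total;
}
```

To use it, one would declare `struct response resp = {NULL, 0};`, call `curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect_body);` and `curl_easy_setopt(curl, CURLOPT_WRITEDATA, &resp);` before `curl_easy_perform`, and call `free(resp.data)` when done.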
The program is compiled, and an executable is created, by running the following command:
!make
The article we are requesting has PubMed ID: 27933103
To print the JSON data for this article, we run:
!./api_call | jq '.'
To output the data for multiple IDs from the PubMed API, we pass them as a comma-separated list:
!./api_call -i "34813985,34813140" | jq '.'
To output the author records for the default ID, we enter the following command:
!./api_call | jq '.["result"]["27933103"]["authors"][]'
To output only the author names:
!./api_call | jq '.["result"]["27933103"]["authors"][]["name"]'
"Scalfani VF"
"Williams AJ"
"Tkachenko V"
"Karapetyan K"
"Pshenichnov A"
"Hanson RM"
"Liddie JM"
"Bara JE"
To output the source name from the PubMed API, we enter the following command:
!./api_call -i 34813072 | jq '.["result"]["34813072"]["source"]'
"Methods Mol Biol"
Here, we output the source name for multiple ids:
%%bash
# List of IDs
idList=('34813985' '34813932' '34813684' '34813661' '34813372' '34813140' '34813072')
for id in "${idList[@]}"; do
    # Retrieve the source name for the given id
    ./api_call -i "$id" | jq --arg location "$id" '.["result"][$location]["source"]'
    # Sleep delay between requests
    sleep 1
done
"Cell Calcium"
"Methods"
"FEBS J"
"Dev Growth Differ"
"CRISPR J"
"Chembiochem"
"Methods Mol Biol"
2. PubMed API Calls with Requests & Parameters
We go back to the PubMed directory:
%cd ..
We create a folder for the current project:
!mkdir api_request_parameter
We then change to that directory:
%cd api_request_parameter
%%file makefile
# Set the variable CC to gcc, which is used to build the program
CC=gcc
# Enable debugging information and enable all compiler warnings
CFLAGS=-g -Wall
# Set the BIN variable to the name of the binary file we are creating
BIN=api_req_par
# Create the binary file with the name we put
all: $(BIN)
# Map any file ending in .c to a binary executable.
# "$<" represents the .c file and "$@" represents the target binary executable.
# The recipe compiles the .c file with gcc using CFLAGS and links the
# resulting binary with the CURL library. Recipe lines must start with a tab.
%: %.c
	$(CC) $(CFLAGS) $< -o $@ -lcurl
# Clean target which removes the binary file and any ".dSYM" (debug symbols) directories.
# The RM command uses -r to remove directories and -f to force deletion.
clean:
	$(RM) -rf $(BIN) *.dSYM
We use the %%file command again to create the .c file that contains the code for the program:
%%file api_req_par.c
#include <curl/curl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* CURL program that retrieves JSON data from the PubMed API.
   This program allows a custom database and query to be used. */
/* We will input the custom database and query like this: ./api_req_par -d "pubmed" -q "neuroscience+intervention+learning"
   If the arguments are missing then we use the defaults: "pubmed" "neuroscience" */
int main (int argc, char* argv[]) {
    // Default database (parameter) and query (request)
    const char *parameter = "pubmed";
    const char *request = "neuroscience";
    // If there is ./api_req_par, ./api_req_par -d or ./api_req_par -q
    if ((argc == 1) || ((argc == 2) && ((strcmp(argv[1], "-d")==0) || (strcmp(argv[1], "-q")==0)))) {
        // These arguments keep the default database and query
    }
    // If there is ./api_req_par -d "pubmed"
    else if ((argc == 3) && (strcmp(argv[1], "-d")==0)) {
        // Only the database is changed
        parameter = argv[2];
    }
    // If there is ./api_req_par -d "pubmed" -q
    else if ((argc == 4) && (strcmp(argv[1], "-d")==0) && (strcmp(argv[3], "-q")==0)) {
        // Only the database is changed
        parameter = argv[2];
    }
    // If there is ./api_req_par -d "pubmed" -q "neuroscience+intervention+learning"
    else if ((argc == 5) && (strcmp(argv[1], "-d")==0) && (strcmp(argv[3], "-q")==0)) {
        // Both the database and the query are changed
        parameter = argv[2];
        request = argv[4];
    }
    // If there is ./api_req_par -q "neuroscience+intervention+learning"
    else if ((argc == 3) && (strcmp(argv[1], "-q")==0)) {
        // Only the query is changed
        request = argv[2];
    }
    // If there is ./api_req_par -q "neuroscience+intervention+learning" -d
    else if ((argc == 4) && (strcmp(argv[1], "-q")==0) && (strcmp(argv[3], "-d")==0)) {
        // Only the query is changed
        request = argv[2];
    }
    // If there is ./api_req_par -q "neuroscience+intervention+learning" -d "pubmed"
    else if ((argc == 5) && (strcmp(argv[1], "-q")==0) && (strcmp(argv[3], "-d")==0)) {
        // Both the query and the database are changed
        parameter = argv[4];
        request = argv[2];
    }
    else {
        printf("usage: ./api_req_par [-q] request [-d] parameter\n\n");
        printf("the api_req_par program is used to retrieve json data from the PubMed API\n\n");
        printf("optional arguments\n");
        printf("\t -q query optional custom query; default is 'neuroscience'\n");
        printf("\t -d parameter optional custom database code; default is 'pubmed', see: https://www.ncbi.nlm.nih.gov/books/NBK25499/\n");
        return -1;
    }
    // Bits of the url that are joined together later
    char api[] = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=";
    char type1[] = "&";
    char type2[] = "term=";
    char type3[] = "&retmode=json";
    char url[1000];
    // Combine all the bits to produce a functioning url.
    // snprintf never writes past the end of the buffer and reports truncation.
    int length = snprintf(url, sizeof(url), "%s%s%s%s%s%s", api, parameter, type1, type2, request, type3);
    if (length < 0 || (size_t)length >= sizeof(url)) {
        fprintf(stderr, "the database or query is too long\n");
        return EXIT_FAILURE;
    }
    // Initialize the CURL HTTP connection
    CURL *curl = curl_easy_init();
    // Check if CURL initialization is a success or not
    if (!curl) {
        fprintf(stderr, "init failed\n");
        return EXIT_FAILURE;
    }
    // Set the url to which the HTTP request will be sent
    // first parameter is for the initialized curl HTTP request, second for the option to be set, and third for the value to be set
    curl_easy_setopt(curl, CURLOPT_URL, url);
    // Perform the request; by default libcurl writes the response body to stdout
    CURLcode result = curl_easy_perform(curl);
    // If result is not retrieved then output error
    if (result != CURLE_OK) {
        fprintf(stderr, "download problem: %s\n", curl_easy_strerror(result));
    }
    // Deallocate memory for the CURL connection
    curl_easy_cleanup(curl);
    return (result == CURLE_OK) ? EXIT_SUCCESS : EXIT_FAILURE;
}
The program is compiled, and an executable is created, by running the following command:
!make
The default database is “pubmed” and the default query is “neuroscience”.
To run the program with these defaults, we enter the following command:
!./api_req_par | jq '.'
We can also search a different database, for example PubChem Compound (pccompound):
!./api_req_par -q "aspirin" -d "pccompound" | jq '.'
The number of returned IDs can be adjusted with the retmax parameter, which we append to the query string:
!./api_req_par -q "neuroscience+intervention+learning&retmax=25" | jq '.esearchresult.idlist'
[
"38305455",
"38304851",
"38304576",
"38303964",
"38303627",
"38302998",
"38302981",
"38302296",
"38301832",
"38301514",
"38301234",
"38300213",
"38299388",
"38298927",
"38298912",
"38298803",
"38298796",
"38298788",
"38298783",
"38298781",
"38298775",
"38297494",
"38296969",
"38295471",
"38293166"
]
!./api_req_par -q "neuroscience+intervention+learning&retmax=25" | jq '.esearchresult.idlist | length'
25
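The retmax example works because the program places the -q value into the URL unchanged, so &retmax=25 travels along as an extra URL parameter. As a minimal sketch, the same URL could be assembled with retmax as a separate argument. The build_esearch_url helper below is hypothetical and is not part of the program above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes an esearch URL into out.
   Returns 0 on success, -1 if the term is empty or the URL does not fit. */
int build_esearch_url(char *out, size_t out_size,
                      const char *db, const char *term, int retmax) {
    assert(out != NULL && db != NULL && term != NULL);
    if (strlen(term) == 0) {
        return -1;  /* an empty search term is rejected */
    }
    int written = snprintf(out, out_size,
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
        "?db=%s&term=%s&retmax=%d&retmode=json",
        db, term, retmax);
    /* snprintf returns the length it wanted to write; a value at or above
       out_size means the result was truncated */
    if (written < 0 || (size_t)written >= out_size) {
        return -1;
    }
    return 0;
}
```

Keeping retmax out of the search term makes it harder to send a malformed URL by accident, at the cost of one more argument to parse.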
We can also use the query to search for an author by adding [au] after the name:
!./api_req_par -q "Darwin[au]" | jq '.esearchresult.count'
"630"
We get the idlist for a custom request sorted by publication date:
!./api_req_par -q "Coral+Reefs&retmode=json&usehistory=y&sort=pub+date" | jq '.esearchresult.idlist'
[
"37393678",
"37315600",
"37209734",
"37290662",
"37286001",
"37257610",
"37247740",
"37286027",
"37399735",
"37385181",
"37331272",
"37311517",
"37137368",
"37105476",
"37022443",
"36549653",
"37465983",
"37487981",
"37481620",
"37100135"
]
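The queries above are typed with + in place of spaces and with brackets left as they are, because the program copies the -q value into the URL verbatim. A sketch of a helper that encodes a raw term before it is placed in the URL is shown below. The encode_term name and its encoding rules are assumptions for illustration; libcurl also provides curl_easy_escape, which encodes a space as %20 rather than +.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: encodes a raw search term for use in a URL query.
   Letters, digits and -._~ are copied, a space becomes '+', and every
   other byte becomes %XX. Returns 0 on success, -1 if out is too small. */
int encode_term(char *out, size_t out_size, const char *raw) {
    assert(out != NULL && raw != NULL);
    size_t used = 0;
    for (const unsigned char *p = (const unsigned char *)raw; *p != '\0'; p++) {
        /* Worst case needs 3 bytes for %XX plus 1 for the terminator */
        if (used + 4 > out_size) {
            return -1;
        }
        if (isalnum(*p) || strchr("-._~", *p) != NULL) {
            out[used++] = (char)*p;
        } else if (*p == ' ') {
            out[used++] = '+';
        } else {
            snprintf(out + used, 4, "%%%02X", (unsigned)*p);
            used += 3;
        }
    }
    if (used >= out_size) {
        return -1;
    }
    out[used] = '\0';
    return 0;
}
```

Note that a literal + in the raw term is encoded as %2B, so with this helper the term should be written with real spaces, for example "Coral Reefs".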
Searching based on publication types:
We can do this by adding AND to the search term, in the form term=<searchQuery>+AND+filter[filterType]. Here, [pt] specifies that the filter type is publication type.
More filters can be found at https://pubmed.ncbi.nlm.nih.gov/help/
!./api_req_par -q "stem+cells+AND+clinical+trial[pt]" | jq '{esearchresult: .esearchresult}'
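The pattern term=<searchQuery>+AND+filter[filterType] can also be assembled in code. The following sketch joins the parts and replaces spaces with +, matching how the queries are typed in this recipe. The build_filtered_term helper is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: joins a query and a field-tagged filter as
   <query>+AND+<filter>[<tag>]. Spaces in either part become '+'.
   Returns 0 on success, -1 if the result does not fit in out. */
int build_filtered_term(char *out, size_t out_size,
                        const char *query, const char *filter, const char *tag) {
    assert(out != NULL && query != NULL && filter != NULL && tag != NULL);
    int written = snprintf(out, out_size, "%s+AND+%s[%s]", query, filter, tag);
    if (written < 0 || (size_t)written >= out_size) {
        return -1;
    }
    /* Replace each space so the term matches the form passed to -q above */
    for (char *p = strchr(out, ' '); p != NULL; p = strchr(p, ' ')) {
        *p = '+';
    }
    return 0;
}
```

Called with "stem cells", "clinical trial" and "pt", the helper produces the same term that is passed to -q in the command above.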
3. PubMed API metadata visualization
Frequency of topic sortpubdate field
Extracting the sortpubdate field for “hydrogel drug” search results, limited to the publication type clinical trial:
!./api_req_par -q "hydrogel+drug+AND+clinical+trial[pt]&sort=pub+date&retmax=500" | jq '.esearchresult.idlist[0:10]'
[
"36418469",
"36870516",
"36842739",
"36203046",
"36261491",
"35830550",
"34653384",
"35556170",
"35413602",
"35041809"
]
!./api_req_par -q "hydrogel+drug+AND+clinical+trial[pt]&sort=pub+date&retmax=500" | jq '.esearchresult.idlist | length'
302
The following code will store the list of IDs in a text file:
!./api_req_par -q "hydrogel+drug+AND+clinical+trial[pt]&sort=pub+date&retmax=500" | jq '.esearchresult.idlist' > idList.txt
To remove the JSON punctuation and blank lines, leaving one ID per line, we use:
!cat idList.txt | tr -d '",[]' > idList2.txt
!sed -i '/^$/d' idList2.txt
!cat idList2.txt | wc -l
302
Show the first 10 IDs:
!head -10 idList2.txt
36418469
36870516
36842739
36203046
36261491
35830550
34653384
35556170
35413602
35041809
We want to get the ESummary record for each of the IDs, so we copy the api_call program from our previous project to the current directory:
!cp ../basic_api_call/api_call .
We test to see if we get the date for one ID
!./api_call -i 34813072 | jq '.["result"]["34813072"]["sortpubdate"][0:10]'
"2022/01/01"
We then do the same for all the IDs and store the dates in a .txt file:
%%bash
# Now loop through each ID and get the sortpubdate field.
# Note that this sortpubdate field may not necessarily be equivalent to a publication date
while read -r id; do
    # Retrieve data from the API and append the date to the .txt file
    ./api_call -i "$id" | jq --arg ids "$id" '.["result"][$ids]["sortpubdate"][0:10]' >> date_time.txt
    # Sleep delay between requests
    sleep 1
done < idList2.txt
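Once date_time.txt has been written, the frequency of the sortpubdate field can be summarized by counting how many dates fall in each year. The sketch below shows one way to do that in C. The parse_year and tally_year helpers, the struct year_count type and the MAX_YEARS bound are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_YEARS 200  /* assumed upper bound on distinct years */

/* One row of the frequency table (hypothetical helper type) */
struct year_count {
    int year;
    int count;
};

/* Parses the year from a line such as "2010/09/01", with or without the
   surrounding quotes that jq prints. Returns -1 if no year is found. */
int parse_year(const char *line) {
    assert(line != NULL);
    int year;
    /* Skip leading whitespace and the opening quote, if present */
    line += strspn(line, " \t\"");
    if (sscanf(line, "%4d", &year) != 1) {
        return -1;
    }
    return year;
}

/* Adds one observation of year to the table and returns the new number
   of rows; the table is left unchanged if it is already full. */
size_t tally_year(struct year_count *table, size_t rows, int year) {
    for (size_t i = 0; i < rows; i++) {
        if (table[i].year == year) {
            table[i].count++;
            return rows;
        }
    }
    if (rows < MAX_YEARS) {
        table[rows].year = year;
        table[rows].count = 1;
        rows++;
    }
    return rows;
}
```

A caller would read date_time.txt line by line with fgets, skip lines for which parse_year returns -1 (for example a null value), pass the remaining years to tally_year, and then print or plot each year and count.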
!head -10 date_time.txt
"2010/09/01"
"2009/03/01"
"2009/03/01"
"2010/08/01"
"2009/03/01"
"2010/08/01"
"2009/02/15"
"2010/07/01"
"2009/02/01"
"2010/01/01"