PubMed API in Mathematica
by Vishank Patel
NCBI API documentation: https://www.ncbi.nlm.nih.gov/books/NBK25501/
Mathematica PubMed documentation: https://reference.wolfram.com/language/ref/service/PubMed.html
These recipe examples were tested on January 20, 2022.
1. Basic PubMed API call
Establish Connection
pubmed = ServiceConnect["PubMed"]
Request data from PubMed API
Using a PubMed ID:
id = "27933103";
paper = pubmed["PublicationSearch", "ID" -> id]
To get more information about the paper, add the “Elements” parameter and set it to “FullData”:
paper = pubmed["PublicationSearch", "ID" -> id, "Elements" -> "FullData"]
Note: The output shown above is truncated. The Dataset is interactive in a Mathematica notebook, but that capability is lost when it is displayed in a Jupyter Notebook. Applying Normal converts the Dataset to a list of associations so that it can be viewed in full:
paper = pubmed["PublicationSearch", "ID" -> id, "Elements" -> "FullData"] //Normal
Let us extract the authors of the paper:
paper[[All, "Authors"]]
(*Writing just paper["Authors"] throws an error as paper is a list of associations.
paper[[All,"Authors"]] works as it goes over every association in the list and extracts the authors*)
{{<|Name -> Scalfani VF, AuthorType -> Author, ClusterID -> |>,
  <|Name -> Williams AJ, AuthorType -> Author, ClusterID -> |>,
  <|Name -> Tkachenko V, AuthorType -> Author, ClusterID -> |>,
  <|Name -> Karapetyan K, AuthorType -> Author, ClusterID -> |>,
  <|Name -> Pshenichnov A, AuthorType -> Author, ClusterID -> |>,
  <|Name -> Hanson RM, AuthorType -> Author, ClusterID -> |>,
  <|Name -> Liddie JM, AuthorType -> Author, ClusterID -> |>,
  <|Name -> Bara JE, AuthorType -> Author, ClusterID -> |>}}
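To keep only the author names, the nested structure can be unpacked with Lookup. The following is a minimal sketch rather than part of the tested recipe: `samplePaper` is a hypothetical, hand-built stand-in that mirrors the shape of the output above, and it assumes that the keys ("Authors", "Name") are strings, as the display suggests. The empty "ClusterID" values are placeholders.

```wolfram
(* Hand-built stand-in for `paper`, mirroring the structure shown above.
   Assumption: keys such as "Authors" and "Name" are strings. *)
samplePaper = {
   <|"Authors" -> {
       <|"Name" -> "Scalfani VF", "AuthorType" -> "Author", "ClusterID" -> ""|>,
       <|"Name" -> "Williams AJ", "AuthorType" -> "Author", "ClusterID" -> ""|>
      }|>
   };

(* For each paper, look up its author list, then look up each author's name *)
authorNames = Lookup[#, "Name"] & /@ Lookup[samplePaper, "Authors"]
(* {{"Scalfani VF", "Williams AJ"}} *)
```

If the real result has the same shape, replacing `samplePaper` with `paper` should give one list of names per publication.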
Request and select data from multiple sources
idList = {"34813985", "34813932", "34813684", "34813661", "34813372", "34813140", "34813072"};
multiPapers = pubmed["PublicationSearch", "ID" -> idList, "Elements" -> "FullData"]
(*Setting the "Elements" rule to "FullData" is crucial to displaying
information like the journal names, DOIs, and other attributes*)
To extract their publishing journal names:
multiPapers[All, "FullJournalName"] //Normal
{Cell calcium,
 Methods (San Diego, Calif.),
 The FEBS journal,
 Development, growth & differentiation,
 The CRISPR journal,
 Chembiochem : a European journal of chemical biology,
 Methods in molecular biology (Clifton, N.J.)}
2. PubMed API Calls with Requests & Parameters
Requests
“PublicationSearch”: a Dataset with information about publications matching search criteria
“PublicationTypes”: a List of publication types
“PublicationAbstract”: an Abstract for the specified PubMed ID
A list of all the available parameters for these requests can be found in the official Mathematica PubMed documentation: https://reference.wolfram.com/language/ref/service/PubMed.html
We can, for example, use a query to search PubMed:
papers = pubmed["PublicationSearch", "Query" -> "Neuroscience", MaxItems -> 10]
(*Note that Mathematica only shows a subset of all the information about the papers.
Set "Elements"->"FullData" for complete information.*)
Or search by author:
pubmed["PublicationSearch", "Author" -> "Darwin", MaxItems -> 5]
The results can be sorted with the “SortBy” parameter, which accepts values such as “MostRecent”, “Relevance”, “PublicationDate”, “Author”, “Journal”, and “Title”:
pubmed["PublicationSearch", "Query" -> "Coral Reefs", MaxItems -> 5, "SortBy" -> "PublicationDate"]
Available publication types
pubmed["PublicationTypes"][[;;20]] (*Lists the first 20 publication types*)
{Addresses,
 Autobiography,
 Bibliography,
 Biography,
 Case Reports,
 Classical Article,
 Clinical Conference,
 Clinical Trial,
 Clinical Trial, Phase I,
 Clinical Trial, Phase II,
 Clinical Trial, Phase III,
 Clinical Trial, Phase IV,
 Collected Works,
 Comment,
 Comparative Study,
 Congresses,
 Consensus Development Conference,
 Consensus Development Conference, NIH,
 Controlled Clinical Trial,
 Corrected and Republished Article}
Searching based on publication types:
pubmed["PublicationSearch", "Query" -> "Stem Cells", "PublicationType" -> "Clinical Trial", MaxItems -> 5]
3. PubMed API metadata visualization
Frequency of publication
Extract the publication dates of clinical trials matching a “hydrogel drug delivery” search and plot how many were published each year:
pubDates =
  pubmed["PublicationSearch", "Query" -> "Hydrogel Drug Delivery",
    "PublicationType" -> "Clinical Trial",
    "PublicationDate" -> {"2000", "2021"},
    "SortBy" -> "PublicationDate", MaxItems -> 60][All, "PubDate"] // Normal
DateHistogram[pubDates, "Year"]
Frequency of publication for a specific author:
authorDates =
  pubmed["PublicationSearch", "Query" -> "Reed LK",
    "SortBy" -> "PublicationDate", MaxItems -> 50][All, "PubDate"] // Normal
DateHistogram[authorDates, "Year"]
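The counts behind such a histogram can also be tabulated directly. The sketch below is an illustration under stated assumptions, not part of the tested recipe: `sampleDates` is a hypothetical, hand-built list of DateObject values standing in for the "PubDate" column, and it assumes the real values are in a form that DateValue accepts.

```wolfram
(* Hand-built stand-in for the "PubDate" column (hypothetical values) *)
sampleDates = {
   DateObject[{2019, 3, 1}],
   DateObject[{2019, 7, 15}],
   DateObject[{2021, 1, 2}]
   };

(* Extract the year of each date, count occurrences, and sort by year *)
yearCounts = KeySort[Counts[DateValue[#, "Year"] & /@ sampleDates]]
(* <|2019 -> 2, 2021 -> 1|> *)
```

A table like this makes it easy to check the bar heights or to pass the counts to another plotting function.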
WordCloud for the titles of the author’s works:
titleList =
  pubmed["PublicationSearch", "Query" -> "Reed LK",
    "SortBy" -> "PublicationDate", MaxItems -> 50][All, "Title"] // Normal;
(*Join the titles into a single string, separated by spaces so that words
from adjacent titles do not run together; WordCloud then weights individual words*)
WordCloud[StringRiffle[Flatten[titleList], " "]]
WordCloud of the journals that published the most recent “Mental Health” papers:
journalList =
  pubmed["PublicationSearch", "Query" -> "Mental Health",
    "SortBy" -> "PublicationDate", MaxItems -> 100,
    "Elements" -> "FullData"][All, "FullJournalName"];
WordCloud[journalList]
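A word cloud shows relative frequency only visually. To see the actual counts, the journal names can be tallied. This is a minimal sketch with hypothetical placeholder names (`sampleJournals` is not real PubMed output). With the real data, it assumes the Dataset is first converted to a plain list, for example with Normal.

```wolfram
(* Hypothetical placeholder journal names standing in for Normal[journalList] *)
sampleJournals = {"Journal A", "Journal B", "Journal A",
   "Journal C", "Journal A", "Journal B"};

(* Count how often each journal appears *)
journalCounts = Counts[sampleJournals];

(* Keep the two most frequent journals, largest count first *)
topJournals = TakeLargest[journalCounts, 2]
(* <|"Journal A" -> 3, "Journal B" -> 2|> *)
```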