PubMed API in Mathematica

by Vishank Patel

NCBI API documentation: https://www.ncbi.nlm.nih.gov/books/NBK25501/

Mathematica PubMed documentation: https://reference.wolfram.com/language/ref/service/PubMed.html

These recipe examples were tested on January 20, 2022.

1. Basic PubMed API call

Establish Connection

pubmed = ServiceConnect["PubMed"]
Output

Request data from PubMed API

Using a PubMed ID:

id = "27933103";
paper = pubmed["PublicationSearch", "ID" -> id]
Output

To get more information about the paper, add the “Elements” parameter and set it to “FullData”:

paper = pubmed["PublicationSearch", "ID" -> id, "Elements" -> "FullData"]
Output

Note: The output above is truncated. The Dataset is interactive in a Mathematica notebook, but that capability is lost when it is displayed in a Jupyter Notebook. Applying Normal converts the Dataset to a list of associations, which displays in full:

paper = pubmed["PublicationSearch", "ID" -> id, "Elements" -> "FullData"] //Normal
Output

Next, extract the authors of the paper:

paper[[All, "Authors"]]
(*Writing just paper["Authors"] throws an error as paper is a list of associations. 
  paper[[All,"Authors"]] works as it goes over every association in the list and extracts the authors*)
{{<|Name -> Scalfani VF, AuthorType -> Author, ClusterID -> |>,
  <|Name -> Williams AJ, AuthorType -> Author, ClusterID -> |>,
  <|Name -> Tkachenko V, AuthorType -> Author, ClusterID -> |>,
  <|Name -> Karapetyan K, AuthorType -> Author, ClusterID -> |>,
  <|Name -> Pshenichnov A, AuthorType -> Author, ClusterID -> |>,
  <|Name -> Hanson RM, AuthorType -> Author, ClusterID -> |>,
  <|Name -> Liddie JM, AuthorType -> Author, ClusterID -> |>,
  <|Name -> Bara JE, AuthorType -> Author, ClusterID -> |>}}
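
The access pattern is easier to see on a small hand-built example. The sketch below uses a made-up record (the title and names are placeholders, not PubMed data) shaped like the list of associations that Normal returns:

```mathematica
(* Hypothetical stand-in for the Normal-ized result: a list holding one association *)
demo = {<|"Title" -> "Placeholder title",
    "Authors" -> {<|"Name" -> "Doe J"|>, <|"Name" -> "Roe R"|>}|>};

(* Part walks the outer list with All, then looks up the key in each association *)
demo[[All, "Authors"]]
(* {{<|"Name" -> "Doe J"|>, <|"Name" -> "Roe R"|>}} *)

(* Take the author list of the first record and keep only the names *)
names = Lookup[demo[[1, "Authors"]], "Name"]
(* {"Doe J", "Roe R"} *)

(* Lookup also threads over the outer list, matching the Part form above *)
Lookup[demo, "Authors"] === demo[[All, "Authors"]]
(* True *)
```

Lookup is often the more forgiving choice when a key might be missing, since it returns Missing["KeyAbsent", ...] instead of failing.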

Request and select data from multiple sources

idList = {"34813985", "34813932", "34813684", "34813661", "34813372", "34813140", "34813072"};
multiPapers = pubmed["PublicationSearch", "ID" -> idList, "Elements" -> "FullData"]   
(*Setting the "Elements" rule to "FullData" is crucial to displaying
  information like the journal names, DOIs, and other attributes*)
Output

To extract their publishing journal names:

multiPapers[All, "FullJournalName"] //Normal
{Cell calcium,
 Methods (San Diego, Calif.),
 The FEBS journal,
 Development, growth & differentiation,
 The CRISPR journal,
 Chembiochem : a European journal of chemical biology,
 Methods in molecular biology (Clifton, N.J.)}
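
Several columns can be pulled from a Dataset in one query by passing a list of keys. The following is a minimal sketch on a hand-built Dataset; the rows and values are hypothetical placeholders, and only the column names are taken from this recipe:

```mathematica
(* Hypothetical two-row Dataset; values are placeholders, not PubMed data *)
ds = Dataset[{
    <|"Title" -> "Placeholder A", "FullJournalName" -> "Journal X", "PubDate" -> "2021"|>,
    <|"Title" -> "Placeholder B", "FullJournalName" -> "Journal Y", "PubDate" -> "2021"|>}];

(* One column: Normal unwraps the resulting Dataset into a plain list *)
journals = ds[All, "FullJournalName"] // Normal
(* {"Journal X", "Journal Y"} *)

(* Several columns: pass a list of keys to keep only those fields in each row *)
subset = ds[All, {"Title", "FullJournalName"}] // Normal
```

Here subset is a list of associations that each contain only the "Title" and "FullJournalName" keys.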

2. PubMed API Calls with Requests & Parameters

Requests

  • “PublicationSearch”: a Dataset with information about publications matching search criteria

  • “PublicationTypes”: a List of publication types

  • “PublicationAbstract”: an Abstract for the specified PubMed ID

A list of all the available parameters for these requests can be found in the official Wolfram documentation for the PubMed service connection: https://reference.wolfram.com/language/ref/service/PubMed.html
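
The “PublicationAbstract” request is not used elsewhere in this recipe. A minimal sketch follows, assuming the request accepts the same "ID" parameter as “PublicationSearch”; the form of the returned value is not verified here, so check the documentation linked above before relying on it:

```mathematica
(* Requires network access; opens the same connection used throughout this recipe *)
pubmed = ServiceConnect["PubMed"];

(* Assumption: "PublicationAbstract" takes a PubMed ID through the "ID" parameter *)
abstract = pubmed["PublicationAbstract", "ID" -> "27933103"]
```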

We can, for example, use a query to search PubMed:

papers = pubmed["PublicationSearch", "Query" -> "Neuroscience", MaxItems -> 10]
(*Note that Mathematica only shows a subset of all the information about the papers. 
  Set "Elements"->"FullData" for complete information.*)
Output

or search by author:

pubmed["PublicationSearch", "Author" -> "Darwin", MaxItems -> 5]
Output

The results can be sorted with the “SortBy” parameter, which accepts values such as “MostRecent”, “Relevance”, “PublicationDate”, “Author”, “Journal”, and “Title”:

pubmed["PublicationSearch", "Query" -> "Coral Reefs", MaxItems -> 5, "SortBy" -> "PublicationDate"]
Output

Available publication types

pubmed["PublicationTypes"][[;;20]] (*Lists the first 20 publication types*)
{Addresses,
 Autobiography,
 Bibliography,
 Biography,
 Case Reports,
 Classical Article,
 Clinical Conference,
 Clinical Trial,
 Clinical Trial, Phase I,
 Clinical Trial, Phase II,
 Clinical Trial, Phase III,
 Clinical Trial, Phase IV,
 Collected Works,
 Comment,
 Comparative Study,
 Congresses,
 Consensus Development Conference,
 Consensus Development Conference, NIH,
 Controlled Clinical Trial,
 Corrected and Republished Article}

Searching based on publication types:

pubmed["PublicationSearch", "Query" -> "Stem Cells", "PublicationType" -> "Clinical Trial", MaxItems -> 5]
Output

3. PubMed API metadata visualization

Frequency of publication

Plot the publication frequency of clinical trials that match a “Hydrogel Drug Delivery” search and were published between 2000 and 2021:

pubDates = 
 pubmed["PublicationSearch", "Query" -> "Hydrogel Drug Delivery", 
    "PublicationType" -> "Clinical Trial", 
    "PublicationDate" -> {"2000", "2021"}, 
    "SortBy" -> "PublicationDate", MaxItems -> 60][All, "PubDate"] // Normal
Output
DateHistogram[pubDates, "Year"]
Output
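
To read off the per-year numbers behind a histogram like this, the dates can be tallied directly. The sketch below runs on made-up DateObject values; it assumes the “PubDate” entries are, or can be converted to, DateObject expressions, which is not confirmed here:

```mathematica
(* Hypothetical dates standing in for the retrieved publication dates *)
dates = {DateObject[{2019, 3, 1}], DateObject[{2019, 11, 20}], DateObject[{2021, 6, 5}]};

(* Extract the year of each date, tally the years, and sort by year *)
yearCounts = KeySort[Counts[DateValue[#, "Year"] & /@ dates]]
(* <|2019 -> 2, 2021 -> 1|> *)

(* The same counts as a bar chart labeled by year *)
BarChart[yearCounts, ChartLabels -> Keys[yearCounts]]
```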

Frequency of publication for a specific author:

authorDates = 
 pubmed["PublicationSearch", "Query" -> "Reed LK", 
     "SortBy" -> "PublicationDate", MaxItems -> 50][All, "PubDate"] // Normal
Output
DateHistogram[authorDates, "Year"]
Output

WordCloud for the titles of the author’s works:

titleList = 
  pubmed["PublicationSearch", "Query" -> "Reed LK", 
     "SortBy" -> "PublicationDate", MaxItems -> 50][All, "Title"] // Normal;
(*Join the titles into one string, separated by spaces so that words from adjacent titles stay distinct*)
WordCloud[StringRiffle[Flatten[titleList], " "]]
Output
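
When titles are joined into a single string before counting words, the separator matters: an empty separator fuses the last word of one title with the first word of the next. The sketch below illustrates this on two made-up titles and builds explicit word counts, which WordCloud also accepts as an association of weights:

```mathematica
(* Hypothetical titles standing in for the retrieved title list *)
titles = {"Reef fish ecology", "Ecology of coral"};

(* An empty separator glues neighbouring titles together *)
fused = StringRiffle[titles, ""]
(* "Reef fish ecologyEcology of coral" *)

(* A space keeps the words of adjacent titles apart *)
joined = StringRiffle[titles, " "]
(* "Reef fish ecology Ecology of coral" *)

(* Split into words, normalize case, and count occurrences *)
counts = Counts[ToLowerCase[TextWords[joined]]]
(* <|"reef" -> 1, "fish" -> 1, "ecology" -> 2, "of" -> 1, "coral" -> 1|> *)

(* WordCloud can take the word -> weight association directly *)
WordCloud[counts]
```

Building the counts explicitly also makes it possible to drop unwanted words, for example with KeyDrop, before drawing the cloud.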

WordCloud of the journals that appear most often among the latest Mental Health papers:

journalList =
  pubmed["PublicationSearch", "Query" -> "Mental Health",
    "SortBy" -> "PublicationDate", MaxItems -> 100, "Elements" -> "FullData"][All, "FullJournalName"] // Normal;
(*Each journal name is one item; names that repeat more often are drawn larger*)
WordCloud[journalList]
Output
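
A word cloud shows relative frequency but not the counts themselves. As a rough sketch, the most frequent journals can be listed with Counts and TakeLargest; the journal names below are hypothetical placeholders rather than retrieved data:

```mathematica
(* Hypothetical journal names standing in for the retrieved journal list *)
journals = {"Journal X", "Journal Y", "Journal X", "Journal Z", "Journal X", "Journal Y"};

(* Tally each name, then keep the two most frequent entries *)
topJournals = TakeLargest[Counts[journals], 2]
(* <|"Journal X" -> 3, "Journal Y" -> 2|> *)
```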