Crossref API in Mathematica

by Vishank Patel

Crossref API documentation: https://api.crossref.org/swagger-ui/index.html

Also see the CrossRef Mathematica documentation: https://reference.wolfram.com/language/ref/service/CrossRef.html

These recipe examples were tested on January 20, 2022.

In our testing, we found that Crossref metadata can vary considerably across publishers and even across journals. As a result, it is often easier to work with one journal at a time when using the Crossref API, particularly when trying to extract selected fields from records.

1. Basic Crossref API call

Establish a connection

crossref = ServiceConnect["CrossRef"]
Output

Request data from Crossref API

doi = "10.1038/357225a0";
paper = crossref["WorkInformation", "DOI" -> doi];
paper
Output

Format data output

To present the same data as a table, we can apply Dataset to the result using postfix notation (//).

paper // Dataset
Output

Select some specific data

paper["Title"]
{Instability and reconnection in the head-on collision of two vortex rings}
authorNum = 2; (*Second author*)

(*authorNum variable has been created only for clarity, 
the same output can be achieved without defining the variable.*)

sAuthor = paper["Author"][[authorNum, "Family"]]
Nickels
paper["Reference"][[1]] // Dataset     
(*Extracting the first reference*)
Output

To see which journal published the paper’s first reference, we first convert the default list of rules into an Association.

pAssociation = Association[paper["Reference"][[1]]];
pAssociation["journal-title"]
phys. Soc. Japan

To do this for all of the paper’s references:

allRefs = paper["Reference"]; (*returns a list of lists*)
allRefsAssoc = Association @@@ allRefs; (*converts lists at the first level into associations*)
allRefsAssoc[[All, "journal-title"]]
Output
% // Dataset
Output

Similarly, we can also extract the reference years.

allRefsAssoc[[All, "year"]] // Dataset
Output
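
Not every reference record necessarily carries every field, so a lookup with a default value can be more robust than direct part extraction. The following is a minimal sketch on made-up data, assuming the references have the same shape as above (a list of lists of rules); mockRefs and its values are hypothetical and are not Crossref output.

```wolfram
(* Mock reference data in the same shape as paper["Reference"]:
   a list of lists of rules. The values are invented for illustration. *)
mockRefs = {
   {"key" -> "ref1", "journal-title" -> "J. Fluid Mech.", "year" -> "1987"},
   {"key" -> "ref2", "year" -> "1990"}, (* no "journal-title" field *)
   {"key" -> "ref3", "journal-title" -> "Phys. Fluids", "year" -> "1989"}
   };

(* Convert each inner list of rules into an association *)
mockAssoc = Association @@@ mockRefs;

(* Lookup returns the supplied default when a key is absent *)
journals = Lookup[mockAssoc, "journal-title", "n/a"]
(* {"J. Fluid Mech.", "n/a", "Phys. Fluids"} *)
```

The same pattern applies to any other field, for example Lookup[mockAssoc, "year", "n/a"].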

Request and select data from multiple sources

Let us retrieve three papers by DOI using the “WorkInformation” request.

vortexP = crossref["WorkInformation", "DOI" -> "10.1038/357225a0"];
crisprP = crossref["WorkInformation", "DOI" -> "10.1038/nprot.2013.143"];
globalWarmingP = crossref["WorkInformation", "DOI" -> "10.1038/d41586-018-07586-5"];

Get article titles:

{vortexP[#], crisprP[#], globalWarmingP[#]} &["Title"]
(*Here, # is a placeholder (slot) in a pure function; the argument given in brackets after the & is substituted for every #*)
{{Instability and reconnection in the head-on collision of two vortex rings},
 {Genome engineering using the CRISPR-Cas9 system},
 {Global warming will happen faster than we think}}

Looking at the publisher, we can see that all of them have been published by Springer Science and Business Media LLC.

{vortexP[#], crisprP[#], globalWarmingP[#]} &["Publisher"]
{Springer Science and Business Media LLC, Springer Science and Business Media LLC,
 Springer Science and Business Media LLC}

Published print date:

{vortexP[#], crisprP[#], globalWarmingP[#]} &["PublishedPrint"] //Dataset
Output
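
When the number of records grows, it can be convenient to keep them in one labeled association and map the field lookup over it, instead of listing each variable by hand. This is only a sketch on mock data: records and its contents are hypothetical stand-ins for the three crossref["WorkInformation", ...] results, and it assumes each result behaves like an association.

```wolfram
(* Mock records standing in for the three WorkInformation results *)
records = <|
   "vortex" -> <|"Title" -> {"Paper A"}, "Publisher" -> "Publisher X"|>,
   "crispr" -> <|"Title" -> {"Paper B"}, "Publisher" -> "Publisher X"|>,
   "warming" -> <|"Title" -> {"Paper C"}, "Publisher" -> "Publisher Y"|>
   |>;

(* Mapping over an association keeps the labels in the result *)
titles = #["Title"] & /@ records
(* <|"vortex" -> {"Paper A"}, "crispr" -> {"Paper B"}, "warming" -> {"Paper C"}|> *)

publishers = #["Publisher"] & /@ records
```

Because the labels are preserved, the result shows which value belongs to which paper.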

2. Acquiring a list of DOIs

Let us build a list of DOIs by requesting papers published in a particular journal from 2019 to 2021. We will work with the Journal of Cheminformatics; Crossref can be queried using the journal’s ISSN (International Standard Serial Number), which is 1758-2946.

papers = crossref["WorksDataset", "ISSN" -> "1758-2946","IssuedDate" -> {DateObject[{2019}], DateObject[{2021}]}, MaxItems -> 10];

As we can see below, all the papers are from J Cheminform.

papers[All, "ShortContainerTitle"] //Normal
{{J Cheminform}, {J Cheminform}, {J Cheminform}, {J Cheminform}, {J Cheminform},
 {J Cheminform}, {J Cheminform}, {J Cheminform}, {J Cheminform}, {J Cheminform}}

Extract their respective DOIs:

doiList = papers[All, "DOI"] //Normal
{10.1186/s13321-019-0377-0, 10.1186/s13321-020-00463-2, 10.1186/s13321-020-00424-9,
 10.1186/s13321-019-0373-4, 10.1186/s13321-019-0357-4, 10.1186/s13321-019-0340-0,
 10.1186/s13321-019-0351-x, 10.1186/s13321-019-0363-6, 10.1186/s13321-020-00468-x,
 10.1186/s13321-020-00474-z}

Note that the number of DOIs returned can be changed with the MaxItems parameter of the query. If the parameter is not specified, Mathematica sets it to 20.

3. Crossref API call with a loop

Now that we know how to obtain a list of DOIs, here are a few ways to loop over such a list and request data for each DOI.

listDOIs = {"10.1093/oso/9780198828044.003.0003", "10.1093/oso/9780198714934.003.0006", "10.7551/mitpress/13811.003.0005", "10.1093/oso/9780190941659.003.0001", "10.7551/mitpress/8996.003.0003", "10.1017/9781107338548.009", "10.1002/9781119557500.ch1", "10.7551/mitpress/8996.003.0016", "10.7551/mitpress/13811.003.0004", "10.1002/9781119557500.ch12"};
For[i = 1, i <= Length[listDOIs], i++,
 Print[crossref["WorkInformation", "DOI" -> listDOIs[[i]]]["Title"]]]
{Machine learning with sklearn}
{Statistical machine learning}
{Machine Learning, Statistics, and Data Analytics}
{Why Use Automated Machine Learning?}
{Introduction: Optimization and Machine Learning}
{Adversarial Machine Learning Challenges}
{Introduction to Machine Learning}
{Robust Optimization in Machine Learning}
{Why We Are Interested in Machine Learning}
{Deploying Machine Learning Models}

To check their respective publishers:

For[i = 1, i <= Length[listDOIs], i++,
 Print[crossref["WorkInformation", "DOI" -> listDOIs[[i]]]["Publisher"]]]
Oxford University Press
Oxford University Press
The MIT Press
Oxford University Press
The MIT Press
Cambridge University Press
John Wiley & Sons, Inc.
The MIT Press
The MIT Press
John Wiley & Sons, Inc.

Extracting author last names from the papers:

Because many of the records in “listDOIs” are missing author information, we will switch to the “doiList” defined in the previous section.

For[i = 1, i <= Length[doiList], i++,
 Print[crossref["WorkInformation", "DOI" -> doiList[[i]]]["Author"][[All, "Family"]]]]
{Thibault, Roe, Facelli, Cheatham}
{Steinberg, Russo, Frey}
{Kuhn, Neumann, Steinbeck, Wittekindt, Zielesny}
{Zhang, Zhang, Li, Wang, Zhang, Hou}
{Krüger, Gohlke}
{Rupp, Tkatchenko, Müller, von Lilienfeld}
{Spjuth, Rydberg, Willighagen, Evelo, Jeliazkova}
{Barnard, Downs}
{Mavridis, Mitchell}
{Baumann, Baumann}
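
Each loop above sends a fresh request for every DOI, so asking for titles and then publishers doubles the number of API calls. One alternative, sketched below under stated assumptions, is to request each record once, store it keyed by DOI, and extract fields afterwards. Here fetchWork is a hypothetical stand-in for crossref["WorkInformation", "DOI" -> doi] so that the sketch runs offline, and sampleDOIs contains placeholder strings rather than real DOIs.

```wolfram
(* Hypothetical stand-in for crossref["WorkInformation", "DOI" -> doi].
   Replace the right-hand side with the real call when connected. *)
fetchWork[doi_String] := <|"Title" -> {"Title for " <> doi},
   "Publisher" -> "Example Press"|>;

sampleDOIs = {"10.0000/example.1", "10.0000/example.2"}; (* placeholders *)

(* One request per DOI, with a short pause between requests;
   AssociationMap keys each record by its DOI *)
works = AssociationMap[(Pause[0.1]; fetchWork[#]) &, sampleDOIs];

(* Extract as many fields as needed without further requests *)
titles = First[#["Title"]] & /@ works
publishers = #["Publisher"] & /@ works
```

Keeping the records in works also makes it easy to inspect a single DOI later, for example works["10.0000/example.1"].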

4. Crossref metadata visualization

Let us use a word cloud to visualize which publishers publish research related to tectonic plates. We will start by generating a list of publishers.

To do that, we make one hundred calls to Crossref, each starting at a random index so that a different paper is likely to be returned each time. A pause is added between calls to avoid exceeding Crossref’s rate limits.

Note: run Off[DateObject::date] to suppress the common DateObject error (irrelevant to our current application).

publisherList = {};

For[i = 1, i <= 100, i++,
  Pause[0.5]; (*pause between requests*)
  (*use = rather than := so the request is evaluated once and its result stored*)
  publisherName = crossref["WorksList", "Query" -> "Tectonic Plates", MaxItems -> 1,
       "StartIndex" -> RandomInteger[{1, 10000}]][[1, "Publisher"]];
  AppendTo[publisherList, publisherName]];

publisherList
{US Geological Survey, Test accounts, Geological Society of America,
 Springer Science and Business Media LLC, CSIRO Publishing, John Wiley & Sons, Inc.,
 BSI British Standards, Victoria University of Wellington Library,
 ASTM International, Elsevier, EAGE Publications, John Wiley & Sons, Ltd,
 American Association for the Advancement of Science (AAAS), AIP, Elsevier, Wiley,
 Mercator - Revista de Geografia da UFC, Pluto Press, I.B.Tauris, ASTM International,
 Elsevier, European Association of Geoscientists & Engineers,
 Simpkin & Marshall ... ;, Springer Netherlands, CRC Press,
 American Association of Petroleum Geologists, Elsevier, Elsevier,
 Oxford University Press, University of California Press, ASTM International,
 Elsevier, Geological Society of America, Oxbow Books,
 American Geophysical Union (AGU), CRC Press,
 Natural Resources Canada/CMSS/Information Management, Springer Berlin Heidelberg,
 US Geological Survey, John Wiley & Sons, Inc., Liverpool University Press,
 Palgrave Macmillan, BSI British Standards, University of Pennsylvania Press, Inc.,
 Springer US, American Geophysical Union (AGU), Springer Singapore, Wiley,
 Canadian Museum of History, Springer Netherlands,
 American Association for the Advancement of Science (AAAS), Springer Netherlands,
 Elsevier, Copernicus GmbH, Geological Society of America, John Wiley & Sons, Inc.,
 Elsevier BV, Geological Society of America, Elsevier,
 American Association of Petroleum Geologists AAPG/Datapages, Elsevier,
 The Littman Library of Jewish Civilization, Indonesian Petroleum Association (IPA),
 American Association of Petroleum Geologists AAPG/Datapages, Elsevier BV,
 MIT Press - Journals, Springer Science and Business Media LLC,
 Manchester University Press, Springer International Publishing, Elsevier, Wiley,
 American Association for the Advancement of Science (AAAS), Copernicus GmbH,
 University Press of Florida, West Virginia University Libraries, ASM Press,
 Elsevier, JSTOR, National Bureau of Standards, Springer Berlin Heidelberg,
 ASTM International, US Geological Survey, AIMS Press, Elsevier,
 American Geophysical Union (AGU), Elsevier, Informa UK Limited,
 Cambridge University Press (CUP), John Wiley & Sons, Inc.,
 Geological Society of America, The University of Hong Kong Libraries,
 Wiley-VCH Verlag GmbH & Co. KGaA, European Association of Geoscientists & Engineers,
 Elsevier, University of California Press, Yale University Press, Routledge,
 Geological Society of America, Elsevier BV, Geological Society of America}
WordCloud[publisherList]
Output
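
A word cloud shows relative frequency only visually, and name variants such as “Elsevier” and “Elsevier BV” are counted separately. The sketch below tallies a short sample of names taken from the list above and merges variants through a small alias table. The alias table is an illustrative assumption about which names refer to the same publisher, not something Crossref provides.

```wolfram
(* A short sample of publisher names from the output above *)
samplePublishers = {"Elsevier", "Elsevier BV", "Wiley",
   "John Wiley & Sons, Inc.", "Elsevier", "Geological Society of America",
   "Elsevier", "Wiley"};

(* Hypothetical alias table: variant name -> preferred name *)
aliases = <|"Elsevier BV" -> "Elsevier", "John Wiley & Sons, Inc." -> "Wiley"|>;

(* Replace known variants; leave every other name unchanged *)
normalized = Lookup[aliases, #, #] & /@ samplePublishers;

(* Count occurrences and order from most to least frequent *)
publisherCounts = Reverse[Sort[Counts[normalized]]]
(* <|"Elsevier" -> 4, "Wiley" -> 3, "Geological Society of America" -> 1|> *)
```

An association of counts like publisherCounts can also be passed to WordCloud, which then uses the counts as weights.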

Next, a word cloud of title words for research by a particular author:

titleList = {};    
For[i = 1, i <= 30, i++,
  Pause[0.5];
  (*use = rather than := so the request is evaluated once and its result stored*)
  titleTemp = crossref["WorksList", "Query" -> "Aaleti, S.",
      MaxItems -> 1, "StartIndex" -> i, "SortBy" -> "Published"][[1, "Title"]];
  AppendTo[titleList, titleTemp]];
Flatten[titleList]
{Experimental Investigation of Surface Preparation on Normal and Ultrahigh-Performance Concrete Interface Behavior,
 Experimental Dynamic Testing of Full-Scale Light-Frame-CLT Wood Shear Wall System,
 Reinforced-Concrete Shear Walls Retrofitted Using Weakening and Self-Centering: Numerical Modeling,
 Seismic Retrofit of Reinforced Concrete Shear Walls to Ensure Reparability,
 Searching for Communities,
 An Experimental and Statistical Study of Normal Strength Concrete (NSC) to Ultra High Performance Concrete (UHPC) Interface Shear Behavior,
 Experimental Evaluation of Test Methods to Characterize Tensile Behavior of UHPC,
 Quantifying Bonding Characteristics between UHPC and Normal-Strength Concrete for Bridge Deck Application,
 Long-Span Hybrid Precast Concrete Bridge Girder Using Ultra-High-Performance Concrete and Normalweight Concrete,
 Numerical Model for Creep Behavior of Axially Loaded CLT Panels,
 Experimental investigation of bond behavior of mild steel reinforcement in UHPC,
 Seismic assessment of a three-story wood building with an integrated CLT-lightframe system using RTHS,
 Experimental and analytical investigation of end zone cracking in BT-78 girders,
 Hybrid System of Unbonded Post-Tensioned CLT Panels and Light-Frame Wood Shear Walls,
 Single precast concrete rocking walls as earthquake force-resisting elements,
 Design Optimization of Bridge Decks with Precast UHPC Waffle Panels,
 Experimental and Analytical Investigation of UHPC Pile-to-Abutment Connections,
 Development length of reinforcing bars in UHPC: An experimental and analytical investigation,
 Bridge Decks with Precast UHPC Waffle Panels: A Field Evaluation and Design Optimization,
 Precast concrete wall with end columns (PreWEC) for earthquake resistant design,
 Closure to “Cyclic Response of Reinforced Concrete Walls with Different Anchorage Details: Experimental Investigation” by Sriram Aaleti, Beth L. Brueggen, Benton Johnson, Catherine E. French, and Sri Sritharan,
 Discussion of “Cyclic Response of Reinforced Concrete Walls with Different Anchorage Details: Experimental Investigation” by Sriram Aaleti, Beth L. Brueggen, Benton Johnson, Catherine E. French, and Sri Sritharan,
 Design of Ultrahigh-Performance Concrete Waffle Deck for Accelerated Bridge Construction,
 Cyclic Response of Reinforced Concrete Walls with Different Anchorage Details: Experimental Investigation,
 Structural Behavior of Waffle Bridge Deck Panels and Connections of Precast Ultra-High-Performance Concrete,
 Concept and Finite-Element Modeling of New Steel Shear Connectors for Self-Centering Wall Systems,
 A simplified analysis method for characterizing unbonded post-tensioned precast wall systems,
 Tests of Structural Walls to Determine Deformation Contributions of Interest for Performance-Based Design,
 Seismic Analysis and Design of Precast Concrete Jointed Wall Systems,
 Behavior of rectangular concrete walls subjected to simulated seismic loading}
WordCloud[StringRiffle[Flatten[titleList], " "]] (*join with a space so the last word of one title does not fuse with the first word of the next*)
Output
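
To see the numbers behind a title word cloud, the words can be counted directly. The following is a rough sketch using two of the titles above; the stop-word list is a small illustrative assumption, and splitting on whitespace is a simplification that leaves punctuation attached to words.

```wolfram
(* Two titles taken from the output above *)
sampleTitles = {
   "Seismic Analysis and Design of Precast Concrete Jointed Wall Systems",
   "Design of Ultrahigh-Performance Concrete Waffle Deck for Accelerated Bridge Construction"
   };

(* Small illustrative stop-word list *)
stopwords = {"and", "of", "for"};

(* Join titles with a space so words at title boundaries stay separate,
   then lowercase and split on whitespace *)
words = StringSplit[ToLowerCase[StringRiffle[sampleTitles, " "]]];

(* Drop stop words, count, and order from most to least frequent *)
wordCounts = Reverse[Sort[Counts[DeleteCases[words, Alternatives @@ stopwords]]]];

wordCounts["design"]   (* 2 *)
wordCounts["concrete"] (* 2 *)
```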

We can also see the frequency of publication for the author using a histogram:

DatesList = {};
For[i = 1, i <= 30, i++,
  Pause[0.5];
  (*use = rather than := so the request is evaluated once and its result stored*)
  DateTemp = crossref["WorksList", "Query" -> "Aaleti, S.", MaxItems -> 1,
      "StartIndex" -> i][[1, "Published"]];
  AppendTo[DatesList, DateTemp]];
DatesList
Output
DateHistogram[Flatten[Values[DatesList], 2], "Year"]
Output
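
The same information can be tabulated as counts per year. This is a minimal sketch on made-up dates, assuming the publication dates have already been flattened into a plain list of DateObject values; sampleDates is hypothetical and does not come from Crossref.

```wolfram
(* Made-up publication dates with mixed granularity *)
sampleDates = {DateObject[{2021, 3, 1}], DateObject[{2021}],
   DateObject[{2019, 11}], DateObject[{2015, 6, 15}]};

(* Extract the year of each date *)
years = DateValue[#, "Year"] & /@ sampleDates;

(* Count publications per year, ordered by year *)
yearCounts = KeySort[Counts[years]]
(* <|2015 -> 1, 2019 -> 1, 2021 -> 2|> *)

(* Optional bar chart of the counts, labeled by year *)
BarChart[Values[yearCounts], ChartLabels -> Keys[yearCounts]]
```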