PubChem API in Mathematica

by Vishank Patel

PubChem API Documentation: https://pubchemdocs.ncbi.nlm.nih.gov/programmatic-access

Mathematica PubChem documentation: https://reference.wolfram.com/language/ref/service/PubChem.html

These recipe examples were tested on March 30, 2022.

Attribution: This tutorial was adapted from supporting information in:

Scalfani, V. F.; Ralph, S. C.; Alshaikh, A. A.; Bara, J. E. Programmatic Compilation of Chemical Data and Literature From PubChem Using Matlab. Chemical Engineering Education, 2020, 54, 230. https://doi.org/10.18260/2-1-370.660-115508 and vfscalfani/MATLAB-cheminformatics

Setup

Establish the Mathematica PubChem connection:

pubchem = ServiceConnect["PubChem"]

1. PubChem Similarity

Search for chemical structures in PubChem via a Fingerprint Tanimoto Similarity Search.
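As background for the similarity search, the Tanimoto coefficient between two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below is only an illustration of that formula: the function name `tanimoto` and the two 8-bit vectors are hypothetical, and real PubChem substructure fingerprints are much longer (881 bits). It is not the code PubChem itself runs.

```mathematica
(* Tanimoto coefficient of two equal-length bit vectors (lists of 0s and 1s):
   bits set in both vectors divided by bits set in at least one of them.
   Assumes at least one bit is set, otherwise the denominator is zero. *)
tanimoto[a_List, b_List] /; Length[a] == Length[b] :=
  Total[BitAnd[a, b]]/Total[BitOr[a, b]]

(* Hypothetical 8-bit fingerprints, for illustration only *)
fpA = {1, 0, 1, 1, 0, 0, 1, 0};
fpB = {1, 1, 1, 0, 0, 0, 1, 0};

tanimoto[fpA, fpB]
```

For these two vectors, three bits are shared and five are set in at least one vector, so the expression evaluates to 3/5. A similarity search keeps the compounds whose coefficient against the query meets a chosen threshold.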

Get compound image

compoundID = "2734162";

pubchem["CompoundImage", {"CompoundID" -> compoundID}]
(* Change the compoundID value above to retrieve the image of a different compound *)

Retrieve InChI and SMILES

compProperties = pubchem["CompoundProperties", {"CompoundID" -> compoundID}][[1]]
(* The request returns a list of associations; keeping only the first
element (the one we need) makes the data easier to query *)
compProperties // OutputForm  (* display the result as plain text *)
Dataset[<|CompoundID -> 2734162, MolecularFormula -> C8H15N2+,
>    MolecularWeight -> 139.22 grams per mole, CanonicalSMILES -> CCCCN1C=C[N+](=C1)C,
>    IsomericSMILES -> CCCCN1C=C[N+](=C1)C,
>    InChI -> InChI=1S/C8H15N2/c1-3-4-5-10-7-6-9(2)8-10/h6-8H,3-5H2,1-2H3/q+1,
>    InChIKey -> IQQRAVYLUAZUGX-UHFFFAOYSA-N,
>    IUPACName -> 1-butyl-3-methylimidazol-3-ium, XLogP -> 1.3,
>    ExactMass -> 139.123523487 grams per mole,
>    MonoisotopicMass -> 139.123523487 grams per mole, TPSA -> 8.8, Complexity -> 93,
>    Charge -> 1, HBondDonorCount -> 0, HBondAcceptorCount -> 0,
>    RotatableBondCount -> 3, HeavyAtomCount -> 10, IsotopeAtomCount -> 0,
>    AtomStereoCount -> 0, DefinedAtomStereoCount -> 0, UndefinedAtomStereoCount -> 0,
>    BondStereoCount -> 0, DefinedBondStereoCount -> 0, UndefinedBondStereoCount -> 0,
>    CovalentUnitCount -> 1, Volume3D -> 121.3, XStericQuadrupole3D -> 4.97,
>    YStericQuadrupole3D -> 1.63, ZStericQuadrupole3D -> 0.91, FeatureCount3D -> 3,
>    FeatureAcceptorCount3D -> 0, FeatureDonorCount3D -> 0, FeatureAnionCount3D -> 0,
>    FeatureCationCount3D -> 1, FeatureRingCount3D -> 1, FeatureHydrophobeCount3D -> 1,
>    ConformerModelRMSD3D -> 0.6, EffectiveRotorCount3D -> 3, ConformerCount3D -> 10,
>    Fingerprint2D ->
>     AAADccBzAAAAAAAAAAAAAAAAAAAAAWAAAAAAAAAAAAAAAAABgAAAHAAAAAAACADBAgQvkBcMEACgABAnZA\
>      AAgC0REqAJQAAYMACASAAAiAAUAAAIAAKAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==|>,
>   TypeSystem`Assoc[TypeSystem`Atom[String], TypeSystem`AnyType, 41], <||>]

To extract an individual property, look it up by its key:

compProperties["InChI"]
InChI=1S/C8H15N2/c1-3-4-5-10-7-6-9(2)8-10/h6-8H,3-5H2,1-2H3/q+1
compProperties["IsomericSMILES"]
CCCCN1C=C[N+](=C1)C
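When several identifiers are needed at once, they can be taken from an association in a single step rather than one lookup per key. The sketch below is self-contained: `props` is a hand-typed stand-in holding a few of the values printed above, not a live query result, and the name `identifiers` is chosen here for illustration. With the live result, `Normal[compProperties]` should give a comparable association to work from.

```mathematica
(* Literal stand-in for the compound record, using values shown above *)
props = <|
   "CompoundID" -> 2734162,
   "MolecularFormula" -> "C8H15N2+",
   "IsomericSMILES" -> "CCCCN1C=C[N+](=C1)C",
   "InChI" -> "InChI=1S/C8H15N2/c1-3-4-5-10-7-6-9(2)8-10/h6-8H,3-5H2,1-2H3/q+1",
   "InChIKey" -> "IQQRAVYLUAZUGX-UHFFFAOYSA-N"
   |>;

(* KeyTake keeps only the requested keys *)
identifiers = KeyTake[props, {"InChI", "InChIKey", "IsomericSMILES"}]

(* Lookup with a third argument returns that default when a key is absent *)
Lookup[props, "XLogP", "not available"]
```

Here `identifiers` holds only the three structure identifiers, and the final `Lookup` returns "not available" because the stand-in association deliberately omits "XLogP".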