PubChem API in Python

by Avery Fernandez

PubChem API Documentation: https://pubchemdocs.ncbi.nlm.nih.gov/programmatic-access

These recipe examples were tested on May 16, 2022.

Attribution: This tutorial was adapted from supporting information in:

Scalfani, V. F.; Ralph, S. C.; Alshaikh, A. A.; Bara, J. E. Programmatic Compilation of Chemical Data and Literature From PubChem Using Matlab. Chemical Engineering Education, 2020, 54, 230. https://doi.org/10.18260/2-1-370.660-115508 and vfscalfani/MATLAB-cheminformatics

Setup

First, import libraries:

import requests
from pprint import pprint
from time import sleep

Define the PubChem PUG-REST API base URL:

api = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/'
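Every request in this tutorial appends path segments to this base URL: an input (such as cid/2734162), an optional operation (such as property/...), and an output format (such as JSON or PNG). As a minimal sketch, the composition can be wrapped in a small function. The name build_url is a hypothetical helper introduced here for illustration and is not part of PubChem or requests.

```python
from urllib.parse import quote

# Same base URL as defined above
api = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/'

def build_url(*segments):
    """Hypothetical helper: join PUG-REST path segments onto the base URL."""
    # quote() percent-encodes characters that are unsafe in a URL path;
    # commas are kept because they separate property names
    return api + "/".join(quote(str(segment), safe=",") for segment in segments)

# Input / operation / output for a property request
print(build_url("cid", 2734162, "property", "InChI,IsomericSMILES", "JSON"))

# Input / output for an image request (no operation segment)
print(build_url("cid", 2734162, "PNG"))
```

Because the base URL already ends with a slash, the segments themselves should not start with one.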

1. PubChem Similarity

Get compound image

We can look up a compound by its PubChem compound ID (CID) and display its structure image. For example, 1-butyl-3-methylimidazolium has CID 2734162:

# Request PNG from PubChem and save file
compoundID = "2734162"
# api already ends with '/', so the path segment must not start with one
img = requests.get(api + 'cid/' + compoundID + "/PNG").content
with open("2734162.png", "wb") as out:
    out.write(img)
# Display compound PNG with Matplotlib
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
img = mpimg.imread('2734162.png')
plt.imshow(img)
plt.show()

Retrieve InChI and SMILES

# Request the InChI and isomeric SMILES for the CID defined above
request = requests.get(api + 'cid/' + compoundID + '/property/inchi,IsomericSMILES/JSON').json()
pprint(request)
{'PropertyTable': {'Properties': [{'CID': 2734162,
                                   'InChI': 'InChI=1S/C8H15N2/c1-3-4-5-10-7-6-9(2)8-10/h6-8H,3-5H2,1-2H3/q+1',
                                   'IsomericSMILES': 'CCCCN1C=C[N+](=C1)C'}]}}
# Extract InChI
request["PropertyTable"]["Properties"][0]["InChI"]
'InChI=1S/C8H15N2/c1-3-4-5-10-7-6-9(2)8-10/h6-8H,3-5H2,1-2H3/q+1'
# Extract Isomeric SMILES
request["PropertyTable"]["Properties"][0]["IsomericSMILES"]
'CCCCN1C=C[N+](=C1)C'
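Indexing directly into the nested response raises a KeyError or IndexError when the expected keys are missing, for example if a request did not return a property table. A minimal sketch of a more defensive lookup follows. The function name get_property is a hypothetical helper, and the sample response is copied from the output above so the block runs without a network call.

```python
# Sample response copied from the output above
response = {'PropertyTable': {'Properties': [{
    'CID': 2734162,
    'InChI': 'InChI=1S/C8H15N2/c1-3-4-5-10-7-6-9(2)8-10/h6-8H,3-5H2,1-2H3/q+1',
    'IsomericSMILES': 'CCCCN1C=C[N+](=C1)C'}]}}

def get_property(response, name, default=None):
    """Hypothetical helper: return one property of the first record, or default."""
    records = response.get("PropertyTable", {}).get("Properties", [])
    if not records:
        return default
    return records[0].get(name, default)

print(get_property(response, "InChI"))
print(get_property(response, "IsomericSMILES"))
print(get_property(response, "MolecularWeight"))  # was not requested, so the default is returned
```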

Retrieve Identifier and Property Data

Get the following data for each of the retrieved CIDs (idList): InChI, isomeric SMILES, molecular weight, heavy atom count, rotatable bond count, and charge.
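The request loop in this section expects idList to already hold a list of CIDs, but its definition does not appear in this section. One plausible way to build it, sketched here as an assumption rather than as the original method, is a PUG-REST 2D similarity search (fastsimilarity_2d) seeded with the CID used earlier. The helper names and the threshold of 95 are illustrative choices, and the sample response is hand-written to show the expected shape.

```python
api = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/'

def similarity_url(cid, threshold=95):
    """Build a 2D similarity search URL; the default threshold is an assumed value."""
    return f"{api}fastsimilarity_2d/cid/{cid}/cids/JSON?Threshold={threshold}"

def parse_cids(response):
    """Pull the CID list out of an IdentifierList response (empty list if absent)."""
    return response.get("IdentifierList", {}).get("CID", [])

print(similarity_url("2734162"))

# Hand-written sample shaped like a similarity response (illustrative only)
sample = {"IdentifierList": {"CID": [529334, 304622, 118785]}}
print(parse_cids(sample))

# To run the search against PubChem (needs network access and requests):
# import requests
# idList = parse_cids(requests.get(similarity_url("2734162"), timeout=30).json())
```

Once idList is defined this way, the property requests can run unchanged.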

api = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/'
properties = "InChI,IsomericSMILES,MolecularWeight,HeavyAtomCount,RotatableBondCount,Charge"
compoundDictionary = []  # despite the name, a list holding one property dictionary per CID
for cid in idList:
    request = requests.get(api + 'cid/' + str(cid) + "/property/" + properties + "/JSON").json()
    compoundDictionary.append(request["PropertyTable"]["Properties"][0])
    sleep(1)  # pause between requests to limit load on the PubChem servers
len(compoundDictionary)
297
pprint(compoundDictionary[0:5])
[{'CID': 529334,
  'Charge': 0,
  'HeavyAtomCount': 10,
  'InChI': 'InChI=1S/C8H14N2/c1-2-3-4-6-10-7-5-9-8-10/h5,7-8H,2-4,6H2,1H3',
  'IsomericSMILES': 'CCCCCN1C=CN=C1',
  'MolecularWeight': '138.21',
  'RotatableBondCount': 4},
 {'CID': 304622,
  'Charge': 0,
  'HeavyAtomCount': 10,
  'InChI': 'InChI=1S/C8H14N2/c1-3-4-6-10-7-5-9-8(10)2/h5,7H,3-4,6H2,1-2H3',
  'IsomericSMILES': 'CCCCN1C=CN=C1C',
  'MolecularWeight': '138.21',
  'RotatableBondCount': 3},
 {'CID': 118785,
  'Charge': 0,
  'HeavyAtomCount': 8,
  'InChI': 'InChI=1S/C6H10N2/c1-2-4-8-5-3-7-6-8/h3,5-6H,2,4H2,1H3',
  'IsomericSMILES': 'CCCN1C=CN=C1',
  'MolecularWeight': '110.16',
  'RotatableBondCount': 2},
 {'CID': 61347,
  'Charge': 0,
  'HeavyAtomCount': 9,
  'InChI': 'InChI=1S/C7H12N2/c1-2-3-5-9-6-4-8-7-9/h4,6-7H,2-3,5H2,1H3',
  'IsomericSMILES': 'CCCCN1C=CN=C1',
  'MolecularWeight': '124.18',
  'RotatableBondCount': 3},
 {'CID': 12971008,
  'Charge': 0,
  'HeavyAtomCount': 10,
  'InChI': 'InChI=1S/C7H13N2.HI/c1-3-4-9-6-5-8(2)7-9;/h5-7H,3-4H2,1-2H3;1H/q+1;/p-1',
  'IsomericSMILES': 'CCCN1C=C[N+](=C1)C.[I-]',
  'MolecularWeight': '252.10',
  'RotatableBondCount': 2}]
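Requesting one CID at a time with a one-second pause takes about five minutes for 297 compounds. PUG-REST also accepts a comma-separated list of CIDs in a single URL, so the same data can be fetched in a few larger requests. The sketch below shows the batching and URL construction only. The helper names and the batch size are assumptions chosen for illustration.

```python
api = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/'
properties = "InChI,IsomericSMILES,MolecularWeight,HeavyAtomCount,RotatableBondCount,Charge"

def chunks(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def batch_url(cids):
    """Build one property request covering several CIDs."""
    joined = ",".join(str(cid) for cid in cids)
    return f"{api}cid/{joined}/property/{properties}/JSON"

# The first five CIDs from the output above, in batches of two
sample_ids = [529334, 304622, 118785, 61347, 12971008]
for batch in chunks(sample_ids, 2):
    print(batch_url(batch))

# To fetch the data (needs network access, requests, and sleep):
# for batch in chunks(idList, 100):  # batch size of 100 is an arbitrary assumption
#     reply = requests.get(batch_url(batch), timeout=30).json()
#     compoundDictionary.extend(reply["PropertyTable"]["Properties"])
#     sleep(1)
```

Each batched response carries one entry per CID in its Properties list, so extend() replaces the append() used for single requests.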

Data Table

We can display the retrieved records as a data table. To keep the output short, we show only the first 25 compounds:

# Each {:<N} left-aligns a value in a column N characters wide
row_format = "{:<10} {:<8} {:<16} {:<35} {:<40} {:<18} {:<4} "
print(row_format.format("CID", "Charge", "HeavyAtomCount", "InChI", "IsomericSMILES", "MolecularWeight", "RotatableBondCount"))
for compound in compoundDictionary[0:25]:
    cid = compound["CID"]
    charge = compound["Charge"]
    heavyAtom = compound["HeavyAtomCount"]
    inchi = compound["InChI"][0:30] + "..." # only display first 30 characters of InChI
    isomeric = compound["IsomericSMILES"]
    molecular = compound["MolecularWeight"]
    rotatable = compound["RotatableBondCount"]
    print(row_format.format(cid, charge, heavyAtom, inchi, isomeric, molecular, rotatable))
CID        Charge   HeavyAtomCount   InChI                               IsomericSMILES                           MolecularWeight    RotatableBondCount 
529334     0        10               InChI=1S/C8H14N2/c1-2-3-4-6-10...   CCCCCN1C=CN=C1                           138.21             4    
304622     0        10               InChI=1S/C8H14N2/c1-3-4-6-10-7...   CCCCN1C=CN=C1C                           138.21             3    
118785     0        8                InChI=1S/C6H10N2/c1-2-4-8-5-3-...   CCCN1C=CN=C1                             110.16             2    
61347      0        9                InChI=1S/C7H12N2/c1-2-3-5-9-6-...   CCCCN1C=CN=C1                            124.18             3    
12971008   0        10               InChI=1S/C7H13N2.HI/c1-3-4-9-6...   CCCN1C=C[N+](=C1)C.[I-]                  252.10             2    
11448496   0        11               InChI=1S/C8H15N2.HI/c1-3-4-5-1...   CCCCN1C=C[N+](=C1)C.[I-]                 266.12             3    
11424151   0        13               InChI=1S/C8H15N2.CHNS/c1-3-4-5...   CCCCN1C=C[N+](=C1)C.C(#N)[S-]            197.30             3    
11171745   0        15               InChI=1S/C8H15N2.C2N3/c1-3-4-5...   CCCCN1C=C[N+](=C1)C.C(=[N-])=NC#N        205.26             3    
11160028   0        10               InChI=1S/C7H13N2.BrH/c1-3-4-9-...   CCCN1C=C[N+](=C1)C.[Br-]                 205.10             2    
2734236    0        11               InChI=1S/C8H15N2.BrH/c1-3-4-5-...   CCCCN1C=C[N+](=C1)C.[Br-]                219.12             3    
2734162    1        10               InChI=1S/C8H15N2/c1-3-4-5-10-7...   CCCCN1C=C[N+](=C1)C                      139.22             3    
2734161    0        11               InChI=1S/C8H15N2.ClH/c1-3-4-5-...   CCCCN1C=C[N+](=C1)C.[Cl-]                174.67             3    
11245926   0        13               InChI=1S/C8H15N2.Br2.BrH/c1-3-...   CCCCN1C=C[N+](=C1)C.[Br-].BrBr           378.93             3    
53384410   0        13               InChI=1S/C8H15N2.Br3/c1-3-4-5-...   CCCCN1C=C[N+](=C1)C.Br[Br-]Br            378.93             3    
11788435   0        11               InChI=1S/C8H15N2.H2O/c1-3-4-5-...   CCCCN1C=C[N+](=C1)C.[OH-]                156.23             3    
5245884    1        9                InChI=1S/C7H13N2/c1-3-4-9-6-5-...   CCCN1C=C[N+](=C1)C                       125.19             2    
2734168    1        11               InChI=1S/C9H17N2/c1-4-5-6-11-8...   CCCCN1C=C[N+](=C1C)C                     153.24             3    
139254006  0        12               InChI=1S/C9H15N2.HI/c1-3-5-6-1...   CCCC[N+]1=CN(C=C1)C=C.[I-]               278.13             4    
91983981   -2       13               InChI=1S/C8H15N2.3BrH/c1-3-4-5...   CCCCN1C=C[N+](=C1)C.[Br-].[Br-].[Br-]    378.93             3    
87560886   0        12               InChI=1S/C9H15N2.BrH/c1-3-5-6-...   CCCC[N+]1=CN(C=C1)C=C.[Br-]              231.13             4    
87559770   0        12               InChI=1S/C9H15N2.ClH/c1-3-5-6-...   CCCC[N+]1=CN(C=C1)C=C.[Cl-]              186.68             4    
11448364   0        14               InChI=1S/C11H21N2.BrH/c1-3-5-7...   CCCCN1C=C[N+](=C1)CCCC.[Br-]             261.20             6    
10537570   1        11               InChI=1S/C9H17N2/c1-3-4-5-6-11...   CCCCCN1C=C[N+](=C1)C                     153.24             4    
10154187   0        10               InChI=1S/C7H13N2.ClH/c1-3-4-9-...   CCCN1C=C[N+](=C1)C.[Cl-]                 160.64             2    
141109628  0        10               InChI=1S/C7H11FN2/c1-2-3-5-10-...   CCCCN1C=CN=C1F                           142.17             3
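The printed table truncates each InChI and covers only 25 rows. To keep the full records for every compound, one option is to write them out as CSV with the standard library. This is a sketch under assumptions: to_csv is a hypothetical helper, and the two sample records are copied from the output above so the block runs on its own. It also converts MolecularWeight, which the API returns as a string, to a float.

```python
import csv
import io

# Two records copied from the property output above
sample = [
    {'CID': 529334, 'Charge': 0, 'HeavyAtomCount': 10,
     'InChI': 'InChI=1S/C8H14N2/c1-2-3-4-6-10-7-5-9-8-10/h5,7-8H,2-4,6H2,1H3',
     'IsomericSMILES': 'CCCCCN1C=CN=C1', 'MolecularWeight': '138.21',
     'RotatableBondCount': 4},
    {'CID': 304622, 'Charge': 0, 'HeavyAtomCount': 10,
     'InChI': 'InChI=1S/C8H14N2/c1-3-4-6-10-7-5-9-8(10)2/h5,7H,3-4,6H2,1-2H3',
     'IsomericSMILES': 'CCCCN1C=CN=C1C', 'MolecularWeight': '138.21',
     'RotatableBondCount': 3},
]

fields = ["CID", "Charge", "HeavyAtomCount", "InChI",
          "IsomericSMILES", "MolecularWeight", "RotatableBondCount"]

def to_csv(compounds):
    """Hypothetical helper: return the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for compound in compounds:
        row = dict(compound)  # copy so the original record is not modified
        row["MolecularWeight"] = float(row["MolecularWeight"])
        writer.writerow(row)
    return buffer.getvalue()

print(to_csv(sample))
```

The csv module quotes the InChI strings automatically because they contain commas. Passing compoundDictionary instead of sample would export all retrieved compounds.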