CAS Common Chemistry API in MATLAB

by Anastasia Ramig

CAS Common Chemistry API Documentation (requires registration): https://www.cas.org/services/commonchemistry-api

These recipe examples were tested on April 21, 2022 in MATLAB R2021a.

Attribution: This tutorial uses the CAS Common Chemistry API. Example data shown is licensed under the CC BY-NC 4.0 license.

1. Common Chemistry Record Detail Retrieval

Records for substances in CAS Common Chemistry can be retrieved from the /detail API by CAS Registry Number (CAS RN).

Set up API parameters

detail_base_url = "https://commonchemistry.cas.org/api/detail?"; % query parameters are appended after the "?"
options = weboptions('Timeout', 30); % wait up to 30 seconds for a response
casrn1 = "10094-36-7"; % ethyl cyclohexanepropionate

Request data from CAS Common Chemistry Detail API

casrn1_data = webread(detail_base_url + "cas_rn=" + casrn1, options)

Output:

casrn1_data = struct with fields:
                    uri: 'substance/pt/10094367'
                     rn: '10094-36-7'
                   name: 'Ethyl cyclohexanepropionate'
                  image: '<svg width="228.6" viewBox="0 0 7620 3716" text-rendering="auto" stroke-width="1" stroke-opacity="1" stroke-miterlimit="10" stroke-linejoin="miter" stroke-linecap="square" stroke-dashoffset="0" stroke-dasharray="none" stroke="black" shape-rendering="auto" image-rendering="auto" height="111.48" font-weight="normal" font-style="normal" font-size="12" font-family="'Dialog'" fill-opacity="1" fill="black" color-rendering="auto" color-interpolation="auto" xmlns="http://www.w3.org/2000/svg"><g><g stroke="white" fill="white"><rect y="0" x="0" width="7620" stroke="none" height="3716"/></g><g transform="translate(32866,32758)" text-rendering="geometricPrecision" stroke-width="44" stroke-linejoin="round" stroke-linecap="round"><line y2="-30850" y1="-31419" x2="-30792" x1="-31777" fill="none"/><line y2="-29715" y1="-30850" x2="-30792" x1="-30792" fill="none"/><line y2="-31419" y1="-30850" x2="-31777" x1="-32762" fill="none"/><line y2="-29146" y1="-29715" x2="-31777" x1="-30792" fill="none"/><line y2="-30850" y1="-29715" x2="-32762" x1="-32762" fill="none"/><line y2="-29715" y1="-29146" x2="-32762" x1="-31777" fill="none"/><line y2="-31376" y1="-30850" x2="-29885" x1="-30792" fill="none"/><line y2="-30850" y1="-31376" x2="-28978" x1="-29885" fill="none"/><line y2="-31376" y1="-30850" x2="-28071" x1="-28978" fill="none"/><line y2="-30960" y1="-31376" x2="-27352" x1="-28071" fill="none"/><line y2="-31376" y1="-30960" x2="-26257" x1="-26976" fill="none"/><line y2="-30850" y1="-31376" x2="-25350" x1="-26257" fill="none"/><line y2="-32202" y1="-31376" x2="-28140" x1="-28140" fill="none"/><line y2="-32202" y1="-31376" x2="-28002" x1="-28002" fill="none"/><text y="-30671" xml:space="preserve" x="-27317" stroke="none" font-size="433.3333" font-family="sans-serif">O</text><text y="-32242" xml:space="preserve" x="-28224" stroke="none" font-size="433.3333" font-family="sans-serif">O</text></g></g></svg>'
                  inchi: 'InChI=1S/C11H20O2/c1-2-13-11(12)9-8-10-6-4-3-5-7-10/h10H,2-9H2,1H3'
               inchiKey: 'InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N'
                  smile: 'C(CC(OCC)=O)C1CCCCC1'
         canonicalSmile: 'O=C(OCC)CCC1CCCCC1'
       molecularFormula: 'C<sub>11</sub>H<sub>20</sub>O<sub>2</sub>'
          molecularMass: '184.28'
 experimentalProperties: [1×1 struct]
      propertyCitations: [1×1 struct]
               synonyms: {9×1 cell}
            replacedRns: []
             hasMolfile: 1
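
The same request can also be written with the query passed to webread as a name-value pair, wrapped in try/catch so that a failed call does not stop the script. This is a minimal sketch rather than part of the original recipe: the variable name detail_url (the base URL without the trailing "?") and the warning text are assumptions made for illustration.

```matlab
% Sketch: same request as above, with the query given as a name-value pair.
% detail_url is an illustrative name; it is the base URL without the trailing "?".
detail_url = "https://commonchemistry.cas.org/api/detail";
options = weboptions('Timeout', 30);
casrn1 = "10094-36-7"; % ethyl cyclohexanepropionate

try
    % webread builds the query string (?cas_rn=10094-36-7) from the name-value pair
    casrn1_data = webread(detail_url, 'cas_rn', char(casrn1), options);
catch err
    % err.message describes why the request failed (network, timeout, HTTP status)
    warning('Request for %s failed: %s', casrn1, err.message);
    casrn1_data = [];
end

% Only use the record if the request succeeded
if ~isempty(casrn1_data)
    disp(casrn1_data.name)
end
```

Returning an empty value on failure lets later code test isempty(casrn1_data) before reading any fields.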

Select some specific data

%% Get Experimental Properties
casrn1_data.experimentalProperties

Output:

ans = struct with fields:
         name: 'Boiling Point'
     property: '105-113 °C @ Press: 17 Torr'
 sourceNumber: 1

%% Get Boiling Point property
casrn1_data.experimentalProperties.property

Output:

ans = '105-113 °C @ Press: 17 Torr'

%% Get InChIKey
casrn1_data.inchiKey

Output:

ans = 'InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N'

%% Get Canonical Smiles
casrn1_data.canonicalSmile

Output:

ans = 'O=C(OCC)CCC1CCCCC1'
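
Several of the fields above arrive as display text: the InChIKey carries an "InChIKey=" prefix, the formula contains HTML subscript tags, and the mass and boiling point are character vectors. The sketch below shows one way to clean them up. It uses values copied from the record shown above (stored in a stand-in struct named rec, an assumption for illustration) so that it runs without an API call. The boiling point pattern assumes a "low-high" range of non-negative numbers; other records may format this text differently.

```matlab
% Stand-in record with values copied from the output above (no API call needed)
rec = struct();
rec.inchiKey = 'InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N';
rec.molecularFormula = 'C<sub>11</sub>H<sub>20</sub>O<sub>2</sub>';
rec.molecularMass = '184.28';
rec.experimentalProperties = struct('name', 'Boiling Point', ...
    'property', '105-113 °C @ Press: 17 Torr', 'sourceNumber', 1);

% Drop the "InChIKey=" prefix to leave the bare key
inchikey = erase(rec.inchiKey, 'InChIKey=');

% Remove the HTML subscript tags from the formula
formula = regexprep(rec.molecularFormula, '</?sub>', '');

% Convert the mass from text to a number
mass = str2double(rec.molecularMass);

% Extract the temperature range; assumes the text starts with "low-high"
bp_text = rec.experimentalProperties.property;
bp_tokens = regexp(bp_text, '([\d.]+)-([\d.]+)', 'tokens', 'once');
bp_range = str2double(bp_tokens); % [low high]

fprintf('%s  %s  %.2f  %g-%g\n', inchikey, formula, mass, bp_range(1), bp_range(2));
```

If a record reports a single temperature instead of a range, bp_tokens comes back empty, so production code should check isempty(bp_tokens) before converting.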

2. Common Chemistry API record detail retrieval in a loop

Set up API parameters

detail_base_url = "https://commonchemistry.cas.org/api/detail?";
casrn_list = ["10094-36-7", "10031-92-2", "10199-61-8", "10036-21-2", "1019020-13-3"];
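
Before sending a list of identifiers to the API, it can be worth checking them locally. The last digit of a CAS RN is a check digit: each preceding digit is multiplied by its position counted from the right, and the sum modulo 10 must equal the final digit. The sketch below applies that rule to the list above; the variable names is_valid, d, body, and weights are illustrative and not part of the original recipe.

```matlab
% Sketch: validate CAS RN check digits before making any API calls
casrn_list = ["10094-36-7", "10031-92-2", "10199-61-8", "10036-21-2", "1019020-13-3"];

is_valid = false(size(casrn_list));
for k = 1:numel(casrn_list)
    d = char(erase(casrn_list(k), "-")) - '0'; % digits as a numeric row vector
    body = d(1:end-1);                         % every digit except the check digit
    weights = numel(body):-1:1;                % rightmost body digit has weight 1
    is_valid(k) = mod(sum(body .* weights), 10) == d(end);
end

disp(is_valid)
```

For "10094-36-7" the weighted sum is 1*7 + 0*6 + 0*5 + 9*4 + 4*3 + 3*2 + 6*1 = 67, and 67 mod 10 is 7, which matches the final digit.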

Request data for each CAS RN and save to a cell array

casrn_data = cell(1, numel(casrn_list)); % preallocate one cell per CAS RN
for c = 1:numel(casrn_list)
    casrn = casrn_list(c);
    casrn_data{c} = webread(detail_base_url + "cas_rn=" + casrn);
    pause(1); % add a delay between API calls
end
disp(casrn_data{1}) % display the record for the first CAS RN

Output:

                   uri: 'substance/pt/10094367'
                    rn: '10094-36-7'
                  name: 'Ethyl cyclohexanepropionate'
                 image: '<svg width="228.6" viewBox="0 0 7620 3716" text-rendering="auto" stroke-width="1" stroke-opacity="1" stroke-miterlimit="10" stroke-linejoin="miter" stroke-linecap="square" stroke-dashoffset="0" stroke-dasharray="none" stroke="black" shape-rendering="auto" image-rendering="auto" height="111.48" font-weight="normal" font-style="normal" font-size="12" font-family="'Dialog'" fill-opacity="1" fill="black" color-rendering="auto" color-interpolation="auto" xmlns="http://www.w3.org/2000/svg"><g><g stroke="white" fill="white"><rect y="0" x="0" width="7620" stroke="none" height="3716"/></g><g transform="translate(32866,32758)" text-rendering="geometricPrecision" stroke-width="44" stroke-linejoin="round" stroke-linecap="round"><line y2="-30850" y1="-31419" x2="-30792" x1="-31777" fill="none"/><line y2="-29715" y1="-30850" x2="-30792" x1="-30792" fill="none"/><line y2="-31419" y1="-30850" x2="-31777" x1="-32762" fill="none"/><line y2="-29146" y1="-29715" x2="-31777" x1="-30792" fill="none"/><line y2="-30850" y1="-29715" x2="-32762" x1="-32762" fill="none"/><line y2="-29715" y1="-29146" x2="-32762" x1="-31777" fill="none"/><line y2="-31376" y1="-30850" x2="-29885" x1="-30792" fill="none"/><line y2="-30850" y1="-31376" x2="-28978" x1="-29885" fill="none"/><line y2="-31376" y1="-30850" x2="-28071" x1="-28978" fill="none"/><line y2="-30960" y1="-31376" x2="-27352" x1="-28071" fill="none"/><line y2="-31376" y1="-30960" x2="-26257" x1="-26976" fill="none"/><line y2="-30850" y1="-31376" x2="-25350" x1="-26257" fill="none"/><line y2="-32202" y1="-31376" x2="-28140" x1="-28140" fill="none"/><line y2="-32202" y1="-31376" x2="-28002" x1="-28002" fill="none"/><text y="-30671" xml:space="preserve" x="-27317" stroke="none" font-size="433.3333" font-family="sans-serif">O</text><text y="-32242" xml:space="preserve" x="-28224" stroke="none" font-size="433.3333" font-family="sans-serif">O</text></g></g></svg>'
                 inchi: 'InChI=1S/C11H20O2/c1-2-13-11(12)9-8-10-6-4-3-5-7-10/h10H,2-9H2,1H3'
              inchiKey: 'InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N'
                 smile: 'C(CC(OCC)=O)C1CCCCC1'
        canonicalSmile: 'O=C(OCC)CCC1CCCCC1'
      molecularFormula: 'C<sub>11</sub>H<sub>20</sub>O<sub>2</sub>'
         molecularMass: '184.28'
experimentalProperties: [1×1 struct]
     propertyCitations: [1×1 struct]
              synonyms: {9×1 cell}
           replacedRns: []
            hasMolfile: 1
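
Once the records are in a cell array, selected fields can be gathered into a table for easier viewing and export. The following is a sketch under stated assumptions: sample_data stands in for casrn_data and holds only the rn and canonicalSmile values for the first three substances in this tutorial, and the table column names are illustrative.

```matlab
% Stand-in for casrn_data: three records with two fields each
sample_data = {
    struct('rn', '10094-36-7', 'canonicalSmile', 'O=C(OCC)CCC1CCCCC1')
    struct('rn', '10031-92-2', 'canonicalSmile', 'O=C(C#CCCCCCC)OCC')
    struct('rn', '10199-61-8', 'canonicalSmile', 'O=C(OCC)CN1N=CC=C1')
    };

n = numel(sample_data);
rn = strings(n, 1);     % preallocate string columns
smiles = strings(n, 1);
for k = 1:n
    rn(k) = sample_data{k}.rn;                 % char values convert to string on assignment
    smiles(k) = sample_data{k}.canonicalSmile;
end

% One row per substance
summary = table(rn, smiles, 'VariableNames', {'CASRN', 'CanonicalSMILES'});
disp(summary)
```

With real data, replacing sample_data with casrn_data gives one row per retrieved record, and writetable(summary, 'casrn_summary.csv') would save the result (the file name is only an example).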

Select some specific data

%% Get canonical SMILES
cansmiles = cell(1, numel(casrn_data));
for s = 1:numel(casrn_data)
    % no API call is made here, so no delay is needed
    cansmiles{s} = string(casrn_data{s}.canonicalSmile);
end
disp(cansmiles)

Output:

Columns 1 through 3

 {["O=C(OCC)CCC1CCCCC1"]}    {["O=C(C#CCCCCCC)OCC"]}    {["O=C(OCC)CN1N=CC=C1"]}

Columns 4 through 5

 {["O=C(OCC)C1=CC=CC(=C1)CCC(=O)OCC"]}    {["N=C(OCC)C1=CCCCC1"]}

%% Get synonyms
synonyms_list = cell(1, numel(casrn_data));
for syn = 1:numel(casrn_data)
    synonyms_list{syn} = casrn_data{syn}.synonyms;
    synonyms_list{syn} % no semicolon, so each synonym list is displayed
end

Output:

ans = 9×1 cell
'Cyclohexanepropanoic acid, …
'Cyclohexanepropionic acid, …
'Ethyl cyclohexanepropionate'
'Ethyl cyclohexylpropanoate'
'Ethyl 3-cyclohexylpropionate'
'Ethyl 3-cyclohexylpropanoate'
'3-Cyclohexylpropionic acid …
'NSC 71463'
'Ethyl 3-cyclohexanepropionate'

ans = 3×1 cell
'2-Nonynoic acid, ethyl ester'
'Ethyl 2-nonynoate'
'NSC 190985'

ans = 5×1 cell
'1<em>H</em>-Pyrazole-1-acet…
'Pyrazole-1-acetic acid, eth…
'Ethyl 1<em>H</em>-pyrazole-…
'Ethyl 1-pyrazoleacetate'
'Ethyl 2-(1<em>H</em>-pyrazo…

ans = 3×1 cell
'Benzenepropanoic acid, 3-(e…
'Hydrocinnamic acid, <em>m</…
'Ethyl 3-(ethoxycarbonyl)ben…

ans = 2×1 cell
'1-Cyclohexene-1-carboximidi…
'Ethyl 1-cyclohexene-1-carbo…
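
Synonym lists can also be searched rather than just displayed. As a sketch, the code below looks for the synonym that begins with "NSC " in each list. It uses a stand-in cell array, synonyms_sample, that contains only a subset of the synonyms shown in full above for the first two substances; the variable names are illustrative.

```matlab
% Stand-in for synonyms_list: a subset of the synonyms shown in full above
rns = ["10094-36-7", "10031-92-2"];
synonyms_sample = {
    {'Ethyl cyclohexanepropionate'; 'Ethyl cyclohexylpropanoate'; 'NSC 71463'}
    {'2-Nonynoic acid, ethyl ester'; 'Ethyl 2-nonynoate'; 'NSC 190985'}
    };

nsc = strings(size(rns)); % stays "" where no match is found
for k = 1:numel(rns)
    syns = string(synonyms_sample{k});        % cell array of char -> string array
    hit = syns(startsWith(syns, "NSC "));     % keep synonyms with the NSC prefix
    if ~isempty(hit)
        nsc(k) = hit(1);
    end
end

disp([rns; nsc])
```

Some synonyms returned by the API contain HTML markup such as <em>H</em>; a call like regexprep(syns, '</?em>', '') should strip those tags before matching or display.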