CAS Common Chemistry API in Mathematica
by Vishank Patel
These recipe examples were tested on March 31, 2022 using Mathematica 12.3.
CAS Common Chemistry API Documentation (requires registration): https://www.cas.org/services/commonchemistry-api
Attribution: This tutorial uses the CAS Common Chemistry API. Example data shown is licensed under the CC BY-NC 4.0 license.
1. Common Chemistry Record Detail Retrieval
Information about substances in CAS Common Chemistry can be retrieved using the /detail API and a CAS Registry Number (CAS RN) identifier:
Set Up API Parameters
detailBaseURL = "https://commonchemistry.cas.org/api/detail?";
casrn1 = "10094-36-7"; (*ethyl cyclohexanepropionate*)
Request data from CAS Common Chemistry Detail API
casrn1Data = Import[detailBaseURL <> "cas_rn=" <> casrn1, "RawJSON"];
casrn1Data
Output not shown
casrn1Data //OutputForm (*Changed to plain text output*)
<|uri -> substance/pt/10094367, rn -> 10094-36-7, name -> Ethyl cyclohexanepropionate, > image -> <svg width="228.6" viewBox="0 0 7620 3716" text-rendering="auto"\ > stroke-width="1" stroke-opacity="1" stroke-miterlimit="10" stroke-linejoin="miter"\ > stroke-linecap="square" stroke-dashoffset="0" stroke-dasharray="none"\ > stroke="black" shape-rendering="auto" image-rendering="auto" height="111.48"\ > font-weight="normal" font-style="normal" font-size="12" font-family="'Dialog'"\ > fill-opacity="1" fill="black" color-rendering="auto" color-interpolation="auto"\ > xmlns="http://www.w3.org/2000/svg"><g><g stroke="white" fill="white"><rect y="0"\ > x="0" width="7620" stroke="none" height="3716"/></g><g\ > transform="translate(32866,32758)" text-rendering="geometricPrecision"\ > stroke-width="44" stroke-linejoin="round" stroke-linecap="round"><line y2="-30850"\ > y1="-31419" x2="-30792" x1="-31777" fill="none"/><line y2="-29715" y1="-30850"\ > x2="-30792" x1="-30792" fill="none"/><line y2="-31419" y1="-30850" x2="-31777"\ > x1="-32762" fill="none"/><line y2="-29146" y1="-29715" x2="-31777" x1="-30792"\ > fill="none"/><line y2="-30850" y1="-29715" x2="-32762" x1="-32762"\ > fill="none"/><line y2="-29715" y1="-29146" x2="-32762" x1="-31777"\ > fill="none"/><line y2="-31376" y1="-30850" x2="-29885" x1="-30792"\ > fill="none"/><line y2="-30850" y1="-31376" x2="-28978" x1="-29885"\ > fill="none"/><line y2="-31376" y1="-30850" x2="-28071" x1="-28978"\ > fill="none"/><line y2="-30960" y1="-31376" x2="-27352" x1="-28071"\ > fill="none"/><line y2="-31376" y1="-30960" x2="-26257" x1="-26976"\ > fill="none"/><line y2="-30850" y1="-31376" x2="-25350" x1="-26257"\ > fill="none"/><line y2="-32202" y1="-31376" x2="-28140" x1="-28140"\ > fill="none"/><line y2="-32202" y1="-31376" x2="-28002" x1="-28002"\ > fill="none"/><text y="-30671" xml:space="preserve" x="-27317" stroke="none"\ > font-size="433.3333" font-family="sans-serif">O</text><text y="-32242"\ > xml:space="preserve" 
x="-28224" stroke="none" font-size="433.3333"\ > font-family="sans-serif">O</text></g></g></svg>, > inchi -> InChI=1S/C11H20O2/c1-2-13-11(12)9-8-10-6-4-3-5-7-10/h10H,2-9H2,1H3, > inchiKey -> InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N, smile -> C(CC(OCC)=O)C1CCCCC1, > canonicalSmile -> O=C(OCC)CCC1CCCCC1, > molecularFormula -> C<sub>11</sub>H<sub>20</sub>O<sub>2</sub>, > molecularMass -> 184.28, experimentalProperties -> > {<|name -> Boiling Point, property -> 105-113 °C @ Press: 17 Torr, > sourceNumber -> 1|>}, propertyCitations -> > {<|docUri -> document/pt/document/22252593, sourceNumber -> 1, > source -> De Benneville, Peter L.; Journal of the American Chemical Society,\ > (1940), 62, 283-7, CAplus|>}, > synonyms -> {Cyclohexanepropanoic acid, ethyl ester, > Cyclohexanepropionic acid, ethyl ester, Ethyl cyclohexanepropionate, > Ethyl cyclohexylpropanoate, Ethyl 3-cyclohexylpropionate, > Ethyl 3-cyclohexylpropanoate, 3-Cyclohexylpropionic acid ethyl ester, NSC 71463, > Ethyl 3-cyclohexanepropionate}, replacedRns -> {}, hasMolfile -> True|>
Display the Molecule Drawing
MoleculePlot[casrn1Data["smile"]]
Select some specific data
Get experimental properties:
casrn1Data["experimentalProperties"][[1]]
<|name -> Boiling Point, property -> 105-113 °C @ Press: 17 Torr, sourceNumber -> 1|>
Get boiling point property
casrn1Data["experimentalProperties"][[1]]["property"]
105-113 °C @ Press: 17 Torr
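Property values come back as free-text strings. If you need the numbers themselves, a string pattern can pull them out. Below is a minimal sketch using a hypothetical helper, tempRangeC, which is not part of the API. It assumes the value looks like the boiling point string above: a "low-high" pair of whole numbers followed by " °C". Any other string returns Missing.

```mathematica
(* Hypothetical helper: extract {low, high} from strings such as
   "105-113 °C @ Press: 17 Torr". Assumes whole-number temperatures
   written as "low-high" and followed by " °C". *)
tempRangeC[prop_String] := Module[{ranges},
  ranges = StringCases[prop,
    lo : (DigitCharacter ..) ~~ "-" ~~ hi : (DigitCharacter ..) ~~ " °C" :>
      {ToExpression[lo], ToExpression[hi]}];
  (* No "low-high °C" range found: report the value as missing *)
  If[ranges === {}, Missing["NotAvailable"], First[ranges]]
];

tempRangeC["105-113 °C @ Press: 17 Torr"]
(* {105, 113} *)
```

With live data, the call would be tempRangeC[casrn1Data["experimentalProperties"][[1]]["property"]]. This assumes the character before °C in the API string is an ordinary space.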
Get InChIKey
casrn1Data["inchiKey"]
InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N
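The inchiKey value includes an "InChIKey=" prefix. The sketch below defines a pair of hypothetical helpers (bareInChIKey and inchiKeyQ are illustrative names, not API functions). The first strips the prefix. The second checks the standard InChIKey layout: 14 letters, 10 letters, and 1 letter, separated by hyphens.

```mathematica
(* Hypothetical helper: drop the leading "InChIKey=" prefix *)
bareInChIKey[k_String] := StringDelete[k, StartOfString ~~ "InChIKey="];

(* Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter *)
inchiKeyQ[k_String] := StringMatchQ[bareInChIKey[k], RegularExpression["[A-Z]{14}-[A-Z]{10}-[A-Z]"]];

bareInChIKey["InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N"]
(* "NRVPMFHPHGBQLP-UHFFFAOYSA-N" *)
inchiKeyQ["InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N"]
(* True *)
```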
Get canonical SMILES
casrn1Data["canonicalSmile"]
O=C(OCC)CCC1CCCCC1
2. Common Chemistry API Record Detail Retrieval in a Loop
Set up API parameters
detailBaseURL = "https://commonchemistry.cas.org/api/detail?";
casrnList = {"10094-36-7", "10031-92-2", "10199-61-8", "10036-21-2", "1019020-13-3"};
Request data for each CAS RN and save to a list
casrnData = {};
For[i = 1, i <= Length[casrnList], i++,
AppendTo[casrnData,
Import[detailBaseURL <> "cas_rn=" <> casrnList[[i]], "RawJSON"]];
Pause[1] (*Adding a delay between API calls*)
]
casrnData[[1]]
Output not shown
casrnData[[1]] //OutputForm (*Changed to plain text output*)
<|uri -> substance/pt/10094367, rn -> 10094-36-7, name -> Ethyl cyclohexanepropionate, > image -> <svg width="228.6" viewBox="0 0 7620 3716" text-rendering="auto"\ > stroke-width="1" stroke-opacity="1" stroke-miterlimit="10" stroke-linejoin="miter"\ > stroke-linecap="square" stroke-dashoffset="0" stroke-dasharray="none"\ > stroke="black" shape-rendering="auto" image-rendering="auto" height="111.48"\ > font-weight="normal" font-style="normal" font-size="12" font-family="'Dialog'"\ > fill-opacity="1" fill="black" color-rendering="auto" color-interpolation="auto"\ > xmlns="http://www.w3.org/2000/svg"><g><g stroke="white" fill="white"><rect y="0"\ > x="0" width="7620" stroke="none" height="3716"/></g><g\ > transform="translate(32866,32758)" text-rendering="geometricPrecision"\ > stroke-width="44" stroke-linejoin="round" stroke-linecap="round"><line y2="-30850"\ > y1="-31419" x2="-30792" x1="-31777" fill="none"/><line y2="-29715" y1="-30850"\ > x2="-30792" x1="-30792" fill="none"/><line y2="-31419" y1="-30850" x2="-31777"\ > x1="-32762" fill="none"/><line y2="-29146" y1="-29715" x2="-31777" x1="-30792"\ > fill="none"/><line y2="-30850" y1="-29715" x2="-32762" x1="-32762"\ > fill="none"/><line y2="-29715" y1="-29146" x2="-32762" x1="-31777"\ > fill="none"/><line y2="-31376" y1="-30850" x2="-29885" x1="-30792"\ > fill="none"/><line y2="-30850" y1="-31376" x2="-28978" x1="-29885"\ > fill="none"/><line y2="-31376" y1="-30850" x2="-28071" x1="-28978"\ > fill="none"/><line y2="-30960" y1="-31376" x2="-27352" x1="-28071"\ > fill="none"/><line y2="-31376" y1="-30960" x2="-26257" x1="-26976"\ > fill="none"/><line y2="-30850" y1="-31376" x2="-25350" x1="-26257"\ > fill="none"/><line y2="-32202" y1="-31376" x2="-28140" x1="-28140"\ > fill="none"/><line y2="-32202" y1="-31376" x2="-28002" x1="-28002"\ > fill="none"/><text y="-30671" xml:space="preserve" x="-27317" stroke="none"\ > font-size="433.3333" font-family="sans-serif">O</text><text y="-32242"\ > xml:space="preserve" 
x="-28224" stroke="none" font-size="433.3333"\ > font-family="sans-serif">O</text></g></g></svg>, > inchi -> InChI=1S/C11H20O2/c1-2-13-11(12)9-8-10-6-4-3-5-7-10/h10H,2-9H2,1H3, > inchiKey -> InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N, smile -> C(CC(OCC)=O)C1CCCCC1, > canonicalSmile -> O=C(OCC)CCC1CCCCC1, > molecularFormula -> C<sub>11</sub>H<sub>20</sub>O<sub>2</sub>, > molecularMass -> 184.28, experimentalProperties -> > {<|name -> Boiling Point, property -> 105-113 °C @ Press: 17 Torr, > sourceNumber -> 1|>}, propertyCitations -> > {<|docUri -> document/pt/document/22252593, sourceNumber -> 1, > source -> De Benneville, Peter L.; Journal of the American Chemical Society,\ > (1940), 62, 283-7, CAplus|>}, > synonyms -> {Cyclohexanepropanoic acid, ethyl ester, > Cyclohexanepropionic acid, ethyl ester, Ethyl cyclohexanepropionate, > Ethyl cyclohexylpropanoate, Ethyl 3-cyclohexylpropionate, > Ethyl 3-cyclohexylpropanoate, 3-Cyclohexylpropionic acid ethyl ester, NSC 71463, > Ethyl 3-cyclohexanepropionate}, replacedRns -> {}, hasMolfile -> True|>
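The For/AppendTo loop above works. Mapping a function over casrnList is the more idiomatic Wolfram Language pattern, and that function is a convenient place to add the delay and a failure check. This is a sketch under assumptions: fetchDetail and validRecordQ are hypothetical names, and a "valid" record is taken to be an Association that contains an "rn" key. Under that rule, $Failed or any error response from Import becomes a Missing[...] entry instead of a broken record.

```mathematica
(* Treat a response as a valid detail record only if it is an Association
   with an "rn" key; $Failed or an error body fails this check *)
validRecordQ[res_] := AssociationQ[res] && KeyExistsQ[res, "rn"];

(* Hypothetical helper: fetch one detail record, pausing 1 s before each request *)
fetchDetail[casrn_String] := Module[{res},
  Pause[1];
  res = Quiet[Import["https://commonchemistry.cas.org/api/detail?cas_rn=" <> casrn, "RawJSON"]];
  If[validRecordQ[res], res, Missing["RequestFailed", casrn]]
];

(* Usage (one network request per CAS RN):
   casrnData = fetchDetail /@ casrnList; *)
```

Mapping with /@ builds the whole list in one pass instead of repeatedly calling AppendTo. Any CAS RN that fails to resolve is easy to spot as Missing["RequestFailed", rn].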
Display the molecule drawings
plotList = {};
For[i = 1, i <= Length[casrnList], i++,
AppendTo[plotList, MoleculePlot[casrnData[[i]]["smile"]]]
]
plotList
Select some specific data
Get canonical SMILES
cansmiles = {};
For[i = 1, i <= Length[casrnList], i++,
AppendTo[cansmiles,
casrnData[[i]]["canonicalSmile"]]
]
cansmiles
{O=C(OCC)CCC1CCCCC1, O=C(C#CCCCCCC)OCC, O=C(OCC)CN1N=CC=C1, > O=C(OCC)C1=CC=CC(=C1)CCC(=O)OCC, N=C(OCC)C1=CCCCC1}
Get synonyms
synonymsList = {};
For[i = 1, i <= Length[casrnList], i++,
AppendTo[synonymsList,
casrnData[[i]]["synonyms"]]
]
synonymsList
{{Cyclohexanepropanoic acid, ethyl ester, Cyclohexanepropionic acid, ethyl ester, > Ethyl cyclohexanepropionate, Ethyl cyclohexylpropanoate, > Ethyl 3-cyclohexylpropionate, Ethyl 3-cyclohexylpropanoate, > 3-Cyclohexylpropionic acid ethyl ester, NSC 71463, Ethyl 3-cyclohexanepropionate}, > {2-Nonynoic acid, ethyl ester, Ethyl 2-nonynoate, NSC 190985}, > {1<em>H</em>-Pyrazole-1-acetic acid, ethyl ester, > Pyrazole-1-acetic acid, ethyl ester, Ethyl 1<em>H</em>-pyrazole-1-acetate, > Ethyl 1-pyrazoleacetate, Ethyl 2-(1<em>H</em>-pyrazol-1-yl)acetate}, > {Benzenepropanoic acid, 3-(ethoxycarbonyl)-, ethyl ester, > Hydrocinnamic acid, <em>m</em>-carboxy-, diethyl ester, > Ethyl 3-(ethoxycarbonyl)benzenepropanoate}, > {1-Cyclohexene-1-carboximidic acid, ethyl ester, > Ethyl 1-cyclohexene-1-carboximidate}}
Transform synonym “list of lists” to a flat list
Flatten[synonymsList]
{Cyclohexanepropanoic acid, ethyl ester, Cyclohexanepropionic acid, ethyl ester, > Ethyl cyclohexanepropionate, Ethyl cyclohexylpropanoate, > Ethyl 3-cyclohexylpropionate, Ethyl 3-cyclohexylpropanoate, > 3-Cyclohexylpropionic acid ethyl ester, NSC 71463, Ethyl 3-cyclohexanepropionate, > 2-Nonynoic acid, ethyl ester, Ethyl 2-nonynoate, NSC 190985, > 1<em>H</em>-Pyrazole-1-acetic acid, ethyl ester, > Pyrazole-1-acetic acid, ethyl ester, Ethyl 1<em>H</em>-pyrazole-1-acetate, > Ethyl 1-pyrazoleacetate, Ethyl 2-(1<em>H</em>-pyrazol-1-yl)acetate, > Benzenepropanoic acid, 3-(ethoxycarbonyl)-, ethyl ester, > Hydrocinnamic acid, <em>m</em>-carboxy-, diethyl ester, > Ethyl 3-(ethoxycarbonyl)benzenepropanoate, > 1-Cyclohexene-1-carboximidic acid, ethyl ester, Ethyl 1-cyclohexene-1-carboximidate}
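Several synonyms, as well as the name and molecularFormula fields, contain inline HTML markup such as <em>H</em> or <sub>11</sub>. The sketch below defines a hypothetical stripTags helper that deletes anything between angle brackets. It assumes the only < and > characters in these fields belong to tags.

```mathematica
(* Hypothetical helper: remove inline HTML tags such as <em>, </em>, <sub>, </sub> *)
stripTags[s_String] := StringDelete[s, "<" ~~ Except[">"] .. ~~ ">"];
(* Apply element-wise to a list of strings *)
stripTags[l_List] := stripTags /@ l;

stripTags["Ethyl 1<em>H</em>-pyrazole-1-acetate"]
(* "Ethyl 1H-pyrazole-1-acetate" *)
stripTags["C<sub>11</sub>H<sub>20</sub>O<sub>2</sub>"]
(* "C11H20O2" *)
```

For example, stripTags[Flatten[synonymsList]] gives a plain-text synonym list.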
Create a dataset
Table[casrnData[[All, i]], {i, {"uri", "rn", "name"}}] // Dataset
Dataset[{{substance/pt/10094367, substance/pt/10031922, substance/pt/10199618, > substance/pt/10036212, substance/pt/1019020133}, > {10094-36-7, 10031-92-2, 10199-61-8, 10036-21-2, 1019020-13-3}, > {Ethyl cyclohexanepropionate, Ethyl 2-nonynoate, > Ethyl 1<em>H</em>-pyrazole-1-acetate, Ethyl 3-(ethoxycarbonyl)benzenepropanoate, > Ethyl 1-cyclohexene-1-carboximidate}}, > TypeSystem`Vector[TypeSystem`Vector[TypeSystem`Atom[String], 5], 3], <||>]
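The Table expression above produces one row per field (uri, rn, name) rather than one row per substance. Taking Part with a list of keys instead keeps just those keys in each record, so Dataset shows one row per substance. So that it runs offline, the sketch uses two records trimmed from the output above. The records variable is illustrative; with live data, the equivalent is Dataset[casrnData[[All, {"uri", "rn", "name"}]]].

```mathematica
(* Two records trimmed to values shown earlier; the extra molecularMass key
   in the first record is dropped by the key selection below *)
records = {
  <|"uri" -> "substance/pt/10094367", "rn" -> "10094-36-7",
    "name" -> "Ethyl cyclohexanepropionate", "molecularMass" -> "184.28"|>,
  <|"uri" -> "substance/pt/10031922", "rn" -> "10031-92-2",
    "name" -> "Ethyl 2-nonynoate"|>
};

(* Part with a list of keys returns an Association with only those keys,
   so each record becomes one row of the Dataset *)
ds = Dataset[records[[All, {"uri", "rn", "name"}]]]

(* Query a single column, e.g. all CAS RNs *)
ds[All, "rn"]
```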
3. Common Chemistry Search
In addition to the /detail API, the CAS Common Chemistry API has a /search method that allows searching by CAS RN, SMILES, InChI/InChIKey, and name.
Set Up API Parameters
searchBaseURL = "https://commonchemistry.cas.org/api/search?q=";
(*InChIKey for Quinine*)
IK = "InChIKey=LOUPRKONTZGTKE-WZBLMQSHSA-N";
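The search URL is built by plain string concatenation. That works for the InChIKey here but can break for other queries. A "#" (a triple bond in SMILES) starts a URL fragment, and a "+" (which appears in InChI stereo layers such as /t13-,14-,19-,20+/) may be decoded as a space. The sketch below is a hypothetical searchURL helper built on URLBuild, which percent-encodes query values. It assumes the API decodes percent-encoded queries as standard web servers do.

```mathematica
(* Hypothetical helper: URLBuild percent-encodes the value of q *)
searchURL[q_String] := URLBuild["https://commonchemistry.cas.org/api/search", {"q" -> q}];

(* Acetylene SMILES: the "#" is sent as %23 instead of ending the URL *)
searchURL["C#C"]

(* Usage: quinineSearchData = Import[searchURL[IK], "RawJSON"]; *)
```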
Request data from CAS Common Chemistry Search API
Search query:
quinineSearchData = Import[searchBaseURL <> IK, "RawJSON"];
quinineSearchData
<|count -> 1, results -> > {<|rn -> 130-95-0, name -> Quinine, > image -> <svg width="309.3" viewBox="0 0 10310 5592" text-rendering="auto"\ > stroke-width="1" stroke-opacity="1" stroke-miterlimit="10"\ > stroke-linejoin="miter" stroke-linecap="square" stroke-dashoffset="0"\ > stroke-dasharray="none" stroke="black" shape-rendering="auto"\ > image-rendering="auto" height="167.76" font-weight="normal" font-style="normal"\ > font-size="12" font-family="'Dialog'" fill-opacity="1" fill="black"\ > color-rendering="auto" color-interpolation="auto"\ > xmlns="http://www.w3.org/2000/svg"><g><g stroke="white" fill="white"><rect\ > y="0" x="0" width="10310" stroke="none" height="5592"/></g><g\ > transform="translate(32866,32758)" text-rendering="geometricPrecision"\ > stroke-width="44" stroke-linejoin="round" stroke-linecap="round"><line\ > y2="-28559" y1="-28036" x2="-26635" x1="-25742" fill="none"/><line y2="-29819"\ > y1="-28559" x2="-26635" x1="-26635" fill="none"/><line y2="-28036" y1="-28559"\ > x2="-25367" x1="-24474" fill="none"/><line y2="-30451" y1="-29819" x2="-25555"\ > x1="-26635" fill="none"/><line y2="-28559" y1="-29819" x2="-24474" x1="-24474"\ > fill="none"/><line y2="-29504" y1="-28828" x2="-25194" x1="-26005"\ > fill="none"/><line y2="-29819" y1="-30451" x2="-24474" x1="-25555"\ > fill="none"/><line y2="-29082" y1="-28559" x2="-27542" x1="-26635"\ > fill="none"/><line y2="-29819" y1="-30344" x2="-22660" x1="-23567"\ > fill="none"/><line y2="-29700" y1="-30223" x2="-22729" x1="-23636"\ > fill="none"/><line y2="-28779" y1="-29082" x2="-28071" x1="-27542"\ > fill="none"/><line y2="-30703" y1="-30131" x2="-28524" x1="-27542"\ > fill="none"/><line y2="-31850" y1="-30703" x2="-28524" x1="-28524"\ > fill="none"/><line y2="-31705" y1="-30847" x2="-28354" x1="-28354"\ > fill="none"/><line y2="-30131" y1="-30703" x2="-29507" x1="-28524"\ > fill="none"/><line y2="-30131" y1="-30703" x2="-27542" x1="-26560"\ > fill="none"/><line y2="-30347" y1="-30778" x2="-27505" 
x1="-26768"\ > fill="none"/><line y2="-31850" y1="-32422" x2="-28524" x1="-29507"\ > fill="none"/><line y2="-32312" y1="-31850" x2="-27730" x1="-28524"\ > fill="none"/><line y2="-30703" y1="-30131" x2="-30489" x1="-29507"\ > fill="none"/><line y2="-30778" y1="-30347" x2="-30281" x1="-29544"\ > fill="none"/><line y2="-30703" y1="-31850" x2="-26560" x1="-26560"\ > fill="none"/><line y2="-32422" y1="-31850" x2="-29507" x1="-30489"\ > fill="none"/><line y2="-32205" y1="-31774" x2="-29544" x1="-30281"\ > fill="none"/><line y2="-31850" y1="-32312" x2="-26560" x1="-27354"\ > fill="none"/><line y2="-31760" y1="-32107" x2="-26745" x1="-27340"\ > fill="none"/><line y2="-31850" y1="-30703" x2="-30489" x1="-30489"\ > fill="none"/><line y2="-30275" y1="-30703" x2="-31200" x1="-30489"\ > fill="none"/><line y2="-30541" y1="-30272" x2="-32040" x1="-31575"\ > fill="none"/><polygon stroke-width="1" stroke="none" points=" -24474 -29819\ > -23602 -30402 -23532 -30284"/><polygon stroke-width="1" points=" -24474 -29819\ > -23602 -30402 -23532 -30284" fill="none"/><polygon stroke-width="1"\ > stroke="none" points=" -26635 -28559 -26973 -27837 -27092 -27903"/><polygon\ > stroke-width="1" points=" -26635 -28559 -26973 -27837 -27092 -27903"\ > fill="none"/><line y2="-28860" y1="-28796" x2="-25945" x1="-26066"\ > fill="none"/><line y2="-28657" y1="-28611" x2="-25865" x1="-25952"\ > fill="none"/><line y2="-28454" y1="-28427" x2="-25785" x1="-25838"\ > fill="none"/><line y2="-28252" y1="-28242" x2="-25706" x1="-25723"\ > fill="none"/><line y2="-29478" y1="-29530" x2="-25257" x1="-25130"\ > fill="none"/><line y2="-29686" y1="-29727" x2="-25321" x1="-25221"\ > fill="none"/><line y2="-29894" y1="-29924" x2="-25384" x1="-25312"\ > fill="none"/><line y2="-30102" y1="-30121" x2="-25448" x1="-25403"\ > fill="none"/><line y2="-30310" y1="-30317" x2="-25512" x1="-25493"\ > fill="none"/><line y2="-30131" y1="-30128" x2="-27473" x1="-27612"\ > fill="none"/><line y2="-29914" y1="-29912" x2="-27487" 
x1="-27598"\ > fill="none"/><line y2="-29697" y1="-29695" x2="-27502" x1="-27583"\ > fill="none"/><line y2="-29480" y1="-29479" x2="-27516" x1="-27569"\ > fill="none"/><line y2="-29263" y1="-29263" x2="-27530" x1="-27554"\ > fill="none"/><text y="-28380" xml:space="preserve" x="-28602" stroke="none"\ > font-size="433.3333" font-family="sans-serif">OH</text><text y="-29983"\ > xml:space="preserve" x="-31540" stroke="none" font-size="433.3333"\ > font-family="sans-serif">O</text><text y="-30691" xml:space="preserve"\ > x="-32762" stroke="none" font-size="433.3333"\ > font-family="sans-serif">CH</text><text y="-30602" xml:space="preserve"\ > x="-32185" stroke="none" font-size="313.3333"\ > font-family="sans-serif">3</text><text y="-32242" xml:space="preserve"\ > x="-27695" stroke="none" font-size="433.3333"\ > font-family="sans-serif">N</text><text y="-27747" xml:space="preserve"\ > x="-25708" stroke="none" font-size="433.3333"\ > font-family="sans-serif">N</text><text y="-27473" xml:space="preserve"\ > x="-27311" stroke="none" font-size="433.3333"\ > font-family="sans-serif">H</text><text y="-28600" xml:space="preserve"\ > x="-27695" stroke="none" font-style="italic" font-size="313.3333"\ > font-family="sans-serif">R</text><text y="-28522" xml:space="preserve"\ > x="-26540" stroke="none" font-style="italic" font-size="313.3333"\ > font-family="sans-serif">S</text><text y="-27337" xml:space="preserve"\ > x="-25818" stroke="none" font-style="italic" font-size="313.3333"\ > font-family="sans-serif">S</text><text y="-30573" xml:space="preserve"\ > x="-25708" stroke="none" font-style="italic" font-size="313.3333"\ > font-family="sans-serif">S</text><text y="-29495" xml:space="preserve"\ > x="-24876" stroke="none" font-style="italic" font-size="313.3333"\ > font-family="sans-serif">R</text></g></g></svg>|>}|>
Note that the CAS Common Chemistry Search API returns only the CAS RN, name, and image data for each result. To retrieve the full record, combine the search with the detail API:
quinineRN = quinineSearchData["results"][[1, "rn"]]
130-95-0
detailBaseURL = "https://commonchemistry.cas.org/api/detail?";
quinineDetailData = Import[detailBaseURL <> "cas_rn=" <> quinineRN, "RawJSON"];
quinineDetailData
Output not shown
quinineDetailData //OutputForm (*Changed to plain text output*)
<|uri -> substance/pt/130950, rn -> 130-95-0, name -> Quinine, > image -> <svg width="309.3" viewBox="0 0 10310 5592" text-rendering="auto"\ > stroke-width="1" stroke-opacity="1" stroke-miterlimit="10" stroke-linejoin="miter"\ > stroke-linecap="square" stroke-dashoffset="0" stroke-dasharray="none"\ > stroke="black" shape-rendering="auto" image-rendering="auto" height="167.76"\ > font-weight="normal" font-style="normal" font-size="12" font-family="'Dialog'"\ > fill-opacity="1" fill="black" color-rendering="auto" color-interpolation="auto"\ > xmlns="http://www.w3.org/2000/svg"><g><g stroke="white" fill="white"><rect y="0"\ > x="0" width="10310" stroke="none" height="5592"/></g><g\ > transform="translate(32866,32758)" text-rendering="geometricPrecision"\ > stroke-width="44" stroke-linejoin="round" stroke-linecap="round"><line y2="-28559"\ > y1="-28036" x2="-26635" x1="-25742" fill="none"/><line y2="-29819" y1="-28559"\ > x2="-26635" x1="-26635" fill="none"/><line y2="-28036" y1="-28559" x2="-25367"\ > x1="-24474" fill="none"/><line y2="-30451" y1="-29819" x2="-25555" x1="-26635"\ > fill="none"/><line y2="-28559" y1="-29819" x2="-24474" x1="-24474"\ > fill="none"/><line y2="-29504" y1="-28828" x2="-25194" x1="-26005"\ > fill="none"/><line y2="-29819" y1="-30451" x2="-24474" x1="-25555"\ > fill="none"/><line y2="-29082" y1="-28559" x2="-27542" x1="-26635"\ > fill="none"/><line y2="-29819" y1="-30344" x2="-22660" x1="-23567"\ > fill="none"/><line y2="-29700" y1="-30223" x2="-22729" x1="-23636"\ > fill="none"/><line y2="-28779" y1="-29082" x2="-28071" x1="-27542"\ > fill="none"/><line y2="-30703" y1="-30131" x2="-28524" x1="-27542"\ > fill="none"/><line y2="-31850" y1="-30703" x2="-28524" x1="-28524"\ > fill="none"/><line y2="-31705" y1="-30847" x2="-28354" x1="-28354"\ > fill="none"/><line y2="-30131" y1="-30703" x2="-29507" x1="-28524"\ > fill="none"/><line y2="-30131" y1="-30703" x2="-27542" x1="-26560"\ > fill="none"/><line y2="-30347" y1="-30778" x2="-27505" 
x1="-26768"\ > fill="none"/><line y2="-31850" y1="-32422" x2="-28524" x1="-29507"\ > fill="none"/><line y2="-32312" y1="-31850" x2="-27730" x1="-28524"\ > fill="none"/><line y2="-30703" y1="-30131" x2="-30489" x1="-29507"\ > fill="none"/><line y2="-30778" y1="-30347" x2="-30281" x1="-29544"\ > fill="none"/><line y2="-30703" y1="-31850" x2="-26560" x1="-26560"\ > fill="none"/><line y2="-32422" y1="-31850" x2="-29507" x1="-30489"\ > fill="none"/><line y2="-32205" y1="-31774" x2="-29544" x1="-30281"\ > fill="none"/><line y2="-31850" y1="-32312" x2="-26560" x1="-27354"\ > fill="none"/><line y2="-31760" y1="-32107" x2="-26745" x1="-27340"\ > fill="none"/><line y2="-31850" y1="-30703" x2="-30489" x1="-30489"\ > fill="none"/><line y2="-30275" y1="-30703" x2="-31200" x1="-30489"\ > fill="none"/><line y2="-30541" y1="-30272" x2="-32040" x1="-31575"\ > fill="none"/><polygon stroke-width="1" stroke="none" points=" -24474 -29819 -23602\ > -30402 -23532 -30284"/><polygon stroke-width="1" points=" -24474 -29819 -23602\ > -30402 -23532 -30284" fill="none"/><polygon stroke-width="1" stroke="none"\ > points=" -26635 -28559 -26973 -27837 -27092 -27903"/><polygon stroke-width="1"\ > points=" -26635 -28559 -26973 -27837 -27092 -27903" fill="none"/><line y2="-28860"\ > y1="-28796" x2="-25945" x1="-26066" fill="none"/><line y2="-28657" y1="-28611"\ > x2="-25865" x1="-25952" fill="none"/><line y2="-28454" y1="-28427" x2="-25785"\ > x1="-25838" fill="none"/><line y2="-28252" y1="-28242" x2="-25706" x1="-25723"\ > fill="none"/><line y2="-29478" y1="-29530" x2="-25257" x1="-25130"\ > fill="none"/><line y2="-29686" y1="-29727" x2="-25321" x1="-25221"\ > fill="none"/><line y2="-29894" y1="-29924" x2="-25384" x1="-25312"\ > fill="none"/><line y2="-30102" y1="-30121" x2="-25448" x1="-25403"\ > fill="none"/><line y2="-30310" y1="-30317" x2="-25512" x1="-25493"\ > fill="none"/><line y2="-30131" y1="-30128" x2="-27473" x1="-27612"\ > fill="none"/><line y2="-29914" y1="-29912" x2="-27487" 
x1="-27598"\ > fill="none"/><line y2="-29697" y1="-29695" x2="-27502" x1="-27583"\ > fill="none"/><line y2="-29480" y1="-29479" x2="-27516" x1="-27569"\ > fill="none"/><line y2="-29263" y1="-29263" x2="-27530" x1="-27554"\ > fill="none"/><text y="-28380" xml:space="preserve" x="-28602" stroke="none"\ > font-size="433.3333" font-family="sans-serif">OH</text><text y="-29983"\ > xml:space="preserve" x="-31540" stroke="none" font-size="433.3333"\ > font-family="sans-serif">O</text><text y="-30691" xml:space="preserve" x="-32762"\ > stroke="none" font-size="433.3333" font-family="sans-serif">CH</text><text\ > y="-30602" xml:space="preserve" x="-32185" stroke="none" font-size="313.3333"\ > font-family="sans-serif">3</text><text y="-32242" xml:space="preserve" x="-27695"\ > stroke="none" font-size="433.3333" font-family="sans-serif">N</text><text\ > y="-27747" xml:space="preserve" x="-25708" stroke="none" font-size="433.3333"\ > font-family="sans-serif">N</text><text y="-27473" xml:space="preserve" x="-27311"\ > stroke="none" font-size="433.3333" font-family="sans-serif">H</text><text\ > y="-28600" xml:space="preserve" x="-27695" stroke="none" font-style="italic"\ > font-size="313.3333" font-family="sans-serif">R</text><text y="-28522"\ > xml:space="preserve" x="-26540" stroke="none" font-style="italic"\ > font-size="313.3333" font-family="sans-serif">S</text><text y="-27337"\ > xml:space="preserve" x="-25818" stroke="none" font-style="italic"\ > font-size="313.3333" font-family="sans-serif">S</text><text y="-30573"\ > xml:space="preserve" x="-25708" stroke="none" font-style="italic"\ > font-size="313.3333" font-family="sans-serif">S</text><text y="-29495"\ > xml:space="preserve" x="-24876" stroke="none" font-style="italic"\ > font-size="313.3333" font-family="sans-serif">R</text></g></g></svg>, > inchi -> InChI=1S/C20H24N2O2/c1-3-13-12-22-9-7-14(13)10-19(22)20(23)16-6-8-21-18-5-4\ > -15(24-2)11-17(16)18/h3-6,8,11,13-14,19-20,23H,1,7,9-10,12H2,2H3/t13-,14-,19-,20+/\ > 
m0/s1, inchiKey -> InChIKey=LOUPRKONTZGTKE-WZBLMQSHSA-N, > smile -> [C@@H](O)(C=1C2=C(C=CC(OC)=C2)N=CC1)[C@]3([N@@]4C[C@H](C=C)[C@H](C3)CC4)[\ > H], canonicalSmile -> OC(C=1C=CN=C2C=CC(OC)=CC21)C3N4CCC(C3)C(C=C)C4, > molecularFormula -> C<sub>20</sub>H<sub>24</sub>N<sub>2</sub>O<sub>2</sub>, > molecularMass -> 324.42, experimentalProperties -> > {<|name -> Melting Point, property -> 57 °C, sourceNumber -> 1|>}, > propertyCitations -> > {<|docUri -> , sourceNumber -> 1, > source -> PhysProp data were obtained from Syracuse Research Corporation of\ > Syracuse, New York (US)|>}, > synonyms -> {Cinchonan-9-ol, 6′-methoxy-, (8α,9<em>R</em>)-, Quinine, > (8α,9<em>R</em>)-6′-Methoxycinchonan-9-ol, 6′-Methoxycinchonidine, (-)-Quinine, > (8<em>S</em>,9<em>R</em>)-Quinine, (<em>R</em>)-(-)-Quinine, NSC 192949, WR297608, > Qualaquin, Mosgard, Quinlup, Quine 9, Cinkona, Quinex, Quinlex, Rezquin, QSM, > SW 85833, (<em>R</em>)-(6-Methoxy-4-quinolyl)[(2<em>S</em>)-5-vinylquinuclidin-2-y\ > l]methanol}, replacedRns -> > {6912-57-8, 12239-42-8, 21480-31-9, 55980-20-6, 72646-90-3, 95650-40-1, > 128544-03-6, 767303-40-2, 840482-04-4, 857212-53-4, 864908-93-0, 875538-34-4, > 888714-03-2, 890027-24-4, 894767-09-0, 898813-59-7, 898814-28-3, 899813-83-3, > 900786-66-5, 900789-95-9, 906550-97-8, 909263-47-4, 909767-48-2, 909882-78-6, > 910878-25-0, 910880-97-6, 911445-75-5, 918778-04-8, 1071756-51-8, 1267651-57-9, > 1628705-47-4, 2244812-93-7, 2244812-97-1, 2409557-51-1, 2566761-34-8}, > hasMolfile -> True|>
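The replacedRns field lists other CAS RNs associated with this record. Assuming these are superseded numbers that now resolve to the current rn, a hypothetical replacedMap helper can turn a detail record into a lookup table from old to current CAS RN. The example record (quinineLite, an illustrative name) is trimmed to two of the replacedRns values shown above.

```mathematica
(* Hypothetical helper: map each CAS RN in "replacedRns" to the record's "rn",
   assuming replacedRns lists superseded numbers for the same substance *)
replacedMap[rec_Association] := AssociationMap[rec["rn"] &, rec["replacedRns"]];

(* Record trimmed to two of the replacedRns values shown above *)
quinineLite = <|"rn" -> "130-95-0", "replacedRns" -> {"6912-57-8", "12239-42-8"}|>;

replacedMap[quinineLite]
(* <|"6912-57-8" -> "130-95-0", "12239-42-8" -> "130-95-0"|> *)
```

Lookup tables from several detail records can be merged with Join @@ (replacedMap /@ records).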
Handle multiple results
Set up search query parameters
searchBaseURL = "https://commonchemistry.cas.org/api/search?q=";
smiBD = "C=CC=C"; (*SMILES for butadiene*)
Request data from CAS Common Chemistry Search API
smiSearchData = Import[searchBaseURL <> smiBD, "RawJSON"];
smiSearchData["count"]
7
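Here count (7) is smaller than the 50 results the API returns per page, so every match is already in results. The sketch below defines a hypothetical needsPagingQ check that makes this explicit by comparing count with the number of results actually returned. It assumes each response has the count and results keys shown above.

```mathematica
(* Hypothetical helper: True when "count" exceeds the number of results on this page *)
needsPagingQ[resp_Association] := resp["count"] > Length[resp["results"]];

needsPagingQ[<|"count" -> 7, "results" -> ConstantArray[<||>, 7]|>]
(* False *)
needsPagingQ[<|"count" -> 191, "results" -> ConstantArray[<||>, 50]|>]
(* True *)

(* Usage: needsPagingQ[smiSearchData] *)
```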
Extract the CAS RNs
smicasRNList = smiSearchData["results"][[All, "rn"]]
{106-99-0, 16422-75-6, 26952-74-9, 29406-96-0, 29989-19-3, 31567-90-5, 9003-17-2}
Now use the detail API to retrieve full records
detailBaseURL = "https://commonchemistry.cas.org/api/detail?";
smiDetailData = {};
For[i = 1, i <= Length[smicasRNList], i++,
AppendTo[smiDetailData,
Import[detailBaseURL <> "cas_rn=" <> smicasRNList[[i]],"RawJSON"]];
Pause[1] (*Adding a delay between API calls*)
]
Get some specific data such as name from the detail records
names = smiDetailData[[All, "name"]]
{1,3-Butadiene, Butadiene trimer, Butadiene dimer, > 1,3-Butadiene, homopolymer, isotactic, > 1,3-Butadiene-<em>1</em>,<em>1</em>,<em>2</em>,<em>3</em>,<em>4</em>,<em>4</em>-<em>\ > d</em><sub>6</sub>, homopolymer, Syndiotactic polybutadiene, Polybutadiene}
Handle multiple page results
The CAS Common Chemistry API returns 50 results per page, and only the first page is returned by default. If a search matches more than 50 records, add the offset query parameter to page through and obtain all results.
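With 50 results per page, a search with count matches needs Ceiling[count/50] pages, at offsets 0, 50, 100, and so on. The sketch below defines a hypothetical pageOffsets helper that computes exactly those offsets, so a paging loop never requests an empty trailing page. The page size defaults to 50, as described above.

```mathematica
(* Hypothetical helper: offsets for every page of a search with `count`
   matches, assuming a fixed page size (50 by default) *)
pageOffsets[count_Integer, pageSize_Integer : 50] := Range[0, count - 1, pageSize];

pageOffsets[191]
(* {0, 50, 100, 150} *)

(* Usage sketch, one request per page (searchBaseURL and n as defined in this section):
   nSearchData = (Pause[1];
      Import[searchBaseURL <> n <> "&offset=" <> ToString[#], "RawJSON"]) & /@
     pageOffsets[numResults]; *)
```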
Set up search query parameters
searchBaseURL = "https://commonchemistry.cas.org/api/search?q=";
n = "selen*";
Get the total result count for the search
numResults = Import[searchBaseURL <> n, "RawJSON"]["count"]
191
Request data and save to a list in a loop for each page
nSearchData = {};
For[i = 0, i < Ceiling[numResults/50], i++,
pageData = Import[searchBaseURL <> n <> "&offset=" <> ToString[i*50],"RawJSON"];
AppendTo[nSearchData, pageData];
Pause[1];
]
Length[nSearchData[[1]]["results"]]
Length[nSearchData[[2]]["results"]]
Length[nSearchData[[3]]["results"]]
Length[nSearchData[[4]]["results"]]
50
50
50
41
Index into the first page of results to extract the first CAS RN:
nSearchData[[1]]["results"][[1]]["rn"]
15123-69-0
Extract all CAS RNs from the list of lists
nCasRNList = Flatten[nSearchData[[All, "results"]][[All, All, "rn"]]];
nCasRNList //Shallow //OutputForm
{15123-69-0, 15123-97-4, 1544-55-4, 15457-71-3, 15586-47-7, 15593-51-8, 15593-52-9, > 15702-34-8, 15857-43-9, 159417-17-1, <<131>>}
Now loop through each CAS RN and use the detail API to obtain the full record. This queries CAS Common Chemistry once per CAS RN (191 times here) and takes about 5 minutes, including the 1-second pause between calls.
detailBaseURL = "https://commonchemistry.cas.org/api/detail?";
nDetailData = {};
For[i = 1, i <= Length[nCasRNList], i++,
AppendTo[nDetailData,
Import[detailBaseURL <> "cas_rn=" <> nCasRNList[[i]], "RawJSON"]];
Pause[1] (*Add a delay between API calls*)
]
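Because this loop takes several minutes, it can be worth saving the records to disk and reusing them. The sketch below defines a hypothetical cachedFetch helper; its name and the file path in the usage comment are illustrative. If the file exists, it returns the saved JSON. Otherwise, it runs the fetch, writes the result with Export[..., "RawJSON"], and returns it.

```mathematica
(* Hypothetical helper: return cached JSON from `file` if it exists; otherwise
   call fetch[], save the result to `file` as JSON, and return it *)
cachedFetch[file_String, fetch_] := If[FileExistsQ[file],
  Import[file, "RawJSON"],
  With[{data = fetch[]}, Export[file, data, "RawJSON"]; data]
];

(* Usage sketch (the file name is illustrative):
   nDetailData = cachedFetch[
     FileNameJoin[{$TemporaryDirectory, "selen_detail.json"}],
     Function[Table[Pause[1]; Import[detailBaseURL <> "cas_rn=" <> rn, "RawJSON"], {rn, nCasRNList}]]]; *)
```

Delete the cache file to force a fresh download.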
Extract specific data, such as the molecular mass:
nDetailData[[All, "molecularMass"]]
{, , 163.00, , , , , , , , , 231.58, 1174.29, 227.08, , , , , , 138.03, , 122.03, > 120.01, 269.16, 304.07, , , , , , , 234.02, , 362.76, 182.08, 160.96, 210.13, , > 182.08, , , , 209.95, 93.97, 121.04, 131.03, 334.09, 182.08, 334.09, 196.11, 691.59, > , 174.85, 520.07, 168.05, 324.10, 163.12, , 294.87, 197.09, 268.84, 198.08, 171.10, > 159.00, 196.11, 203.48, 238.01, 149.09, 566.37, 151.07, 179.12, , 333.28, 227.08, > 199.07, , 196.11, 151.07, , 226.09, 198.08, , 265.17, 199.11, 109.03, 225.10, , , > 242.43, , 494.39, 123.02, 257.10, 225.10, 157.07, 95.00, 92.99, 165.06, , , 290.26, > 389.24, , , 110.96, , , , 111.03, , 143.09, 350.31, 444.10, 157.03, 362.14, , , > 128.97, , 144.97, , , 132.96, 192.95, , , 254.77, 317.73, 398.58, , 165.87, , > 307.16, 314.10, 342.11, 262.03, 237.22, , , 467.46, }
Records without a molecular mass return an empty string, so replace every empty string with Nothing, which removes it from the list:
mmStrings = nDetailData[[All, "molecularMass"]] /. "" -> Nothing
{163.00, 231.58, 1174.29, 227.08, 138.03, 122.03, 120.01, 269.16, 304.07, 234.02, > 362.76, 182.08, 160.96, 210.13, 182.08, 209.95, 93.97, 121.04, 131.03, 334.09, > 182.08, 334.09, 196.11, 691.59, 174.85, 520.07, 168.05, 324.10, 163.12, 294.87, > 197.09, 268.84, 198.08, 171.10, 159.00, 196.11, 203.48, 238.01, 149.09, 566.37, > 151.07, 179.12, 333.28, 227.08, 199.07, 196.11, 151.07, 226.09, 198.08, 265.17, > 199.11, 109.03, 225.10, 242.43, 494.39, 123.02, 257.10, 225.10, 157.07, 95.00, > 92.99, 165.06, 290.26, 389.24, 110.96, 111.03, 143.09, 350.31, 444.10, 157.03, > 362.14, 128.97, 144.97, 132.96, 192.95, 254.77, 317.73, 398.58, 165.87, 307.16, > 314.10, 342.11, 262.03, 237.22, 467.46}
Convert the strings into real numbers by mapping ToExpression over each element of the list:
mm = ToExpression /@ mmStrings
{163., 231.58, 1174.29, 227.08, 138.03, 122.03, 120.01, 269.16, 304.07, 234.02, 362.76, > 182.08, 160.96, 210.13, 182.08, 209.95, 93.97, 121.04, 131.03, 334.09, 182.08, > 334.09, 196.11, 691.59, 174.85, 520.07, 168.05, 324.1, 163.12, 294.87, 197.09, > 268.84, 198.08, 171.1, 159., 196.11, 203.48, 238.01, 149.09, 566.37, 151.07, 179.12, > 333.28, 227.08, 199.07, 196.11, 151.07, 226.09, 198.08, 265.17, 199.11, 109.03, > 225.1, 242.43, 494.39, 123.02, 257.1, 225.1, 157.07, 95., 92.99, 165.06, 290.26, > 389.24, 110.96, 111.03, 143.09, 350.31, 444.1, 157.03, 362.14, 128.97, 144.97, > 132.96, 192.95, 254.77, 317.73, 398.58, 165.87, 307.16, 314.1, 342.11, 262.03, > 237.22, 467.46}
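ToExpression evaluates whatever string it is given, so it is safer to check that a value looks like a number first. The sketch below defines a hypothetical toMass helper that filters empty strings and converts values in one step. It uses NumberString as the check and DeleteMissing to drop empty values; the sample strings are taken from the output above.

```mathematica
(* Hypothetical helper: a molecularMass string becomes a number only if it
   matches NumberString; empty or non-numeric strings become Missing *)
toMass[s_String] := If[StringMatchQ[s, NumberString], ToExpression[s], Missing["NotAvailable"]];

(* Sample values from the molecularMass output above; DeleteMissing drops the empty one *)
DeleteMissing[toMass /@ {"163.00", "", "231.58", "1174.29"}]
(* {163., 231.58, 1174.29} *)

(* Usage: mm = DeleteMissing[toMass /@ nDetailData[[All, "molecularMass"]]]; *)
```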
Finally, create a simple histogram of the molecularMass values extracted from the selen* search:
Histogram[mm, AxesLabel -> {"molecularMass", "Count"}]