CAS Common Chemistry API in Unix Shell#

by Avery Fernandez and Vincent F. Scalfani

These recipe examples were tested on April 1, 2022 using GNOME Terminal (with Bash 4.4.20) in Ubuntu 18.04.

CAS Common Chemistry API Documentation (requires registration): https://www.cas.org/services/commonchemistry-api

Attribution: This tutorial uses the CAS Common Chemistry API. Example data shown is licensed under the CC BY-NC 4.0 license.

Program requirements#

To run this code, first install curl, jq, and gnuplot. curl requests the data from the API, jq parses the JSON data, and gnuplot plots the data. To print molecules as ASCII characters in your terminal, you will also need to install RDKit and download the print_mols Python script.

1. Common Chemistry Record Detail Retrieval#

Information about substances in CAS Common Chemistry can be retrieved using the /detail API and a CAS RN identifier.

Setup API parameters#

Create variables for the CAS Common Chemistry detail base URL and an example CAS RN (10094-36-7, ethyl cyclohexanepropionate):

detail_base_url="https://commonchemistry.cas.org/api/detail?"
casrn1="10094-36-7"
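Before sending a request, it can help to catch a mistyped CAS RN locally. A CAS RN ends in a check digit: each preceding digit is weighted by its position counted from the right, the products are summed, and the sum modulo 10 must equal the check digit. The following is a minimal Bash sketch of that rule; the function name valid_casrn is an assumption made for illustration and is not part of the CAS API or the original recipe.

```shell
# Hypothetical helper: succeeds (exit status 0) when the check digit matches.
valid_casrn() {
  local rn=$1
  # Expected shape: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit
  [[ $rn =~ ^[0-9]{2,7}-[0-9]{2}-[0-9]$ ]] || return 1
  local digits=${rn//-/}     # drop the hyphens, e.g. 10094367
  local check=${digits: -1}  # last digit is the check digit
  local body=${digits%?}     # everything before the check digit
  local sum=0 pos=1 i
  # Walk the body right to left, weighting each digit by its position
  for (( i = ${#body} - 1; i >= 0; i-- )); do
    sum=$(( sum + ${body:i:1} * pos ))
    pos=$(( pos + 1 ))
  done
  [ $(( sum % 10 )) -eq "$check" ]
}

for rn in "10094-36-7" "10094-36-8"; do
  if valid_casrn "$rn"; then
    echo "$rn: check digit ok"
  else
    echo "$rn: check digit mismatch"
  fi
done
```

For 10094-36-7 the weighted sum is 6 + 6 + 12 + 36 + 0 + 0 + 7 = 67, and 67 modulo 10 is 7, which matches the final digit.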

Request data from CAS Common Chemistry Detail API#

casrn1_data=$(curl "${detail_base_url}cas_rn=${casrn1}")
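A plain curl call stores whatever the server returns, including an error page. As a hedged sketch, the helpers below separate building the URL from fetching it and ask curl to fail on HTTP errors. The function names detail_url and fetch_detail are assumptions for illustration; the curl options used are --silent, --show-error, and --fail.

```shell
detail_base_url="https://commonchemistry.cas.org/api/detail?"

# Hypothetical helper: build the request URL for one CAS RN
detail_url() {
  printf '%scas_rn=%s\n' "$detail_base_url" "$1"
}

# Hypothetical helper: fetch one record.
# --silent hides the progress meter, --show-error still prints error messages,
# --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
fetch_detail() {
  curl --silent --show-error --fail "$(detail_url "$1")"
}

# Print the URL that would be requested (no network access needed)
detail_url "10094-36-7"

# Usage (requires network access):
# casrn1_data=$(fetch_detail "10094-36-7") || echo "request failed" >&2
```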

View data#

echo "$casrn1_data" | jq '.'

Output:

{
  "uri": "substance/pt/10094367",
  "rn": "10094-36-7",
  "name": "Ethyl cyclohexanepropionate",
  "image": "<svg width=\"228.6\" viewBox=\"0 0 7620 3716\" text-rendering=\"auto\" stroke-width=\"1\" stroke-opacity=\"1\" stroke-miterlimit=\"10\" stroke-linejoin=\"miter\" stroke-linecap=\"square\" stroke-dashoffset=\"0\" stroke-dasharray=\"none\" stroke=\"black\" shape-rendering=\"auto\" image-rendering=\"auto\" height=\"111.48\" font-weight=\"normal\" font-style=\"normal\" font-size=\"12\" font-family=\"'Dialog'\" fill-opacity=\"1\" fill=\"black\" color-rendering=\"auto\" color-interpolation=\"auto\" xmlns=\"http://www.w3.org/2000/svg\"><g><g stroke=\"white\" fill=\"white\"><rect y=\"0\" x=\"0\" width=\"7620\" stroke=\"none\" height=\"3716\"/></g><g transform=\"translate(32866,32758)\" text-rendering=\"geometricPrecision\" stroke-width=\"44\" stroke-linejoin=\"round\" stroke-linecap=\"round\"><line y2=\"-30850\" y1=\"-31419\" x2=\"-30792\" x1=\"-31777\" fill=\"none\"/><line y2=\"-29715\" y1=\"-30850\" x2=\"-30792\" x1=\"-30792\" fill=\"none\"/><line y2=\"-31419\" y1=\"-30850\" x2=\"-31777\" x1=\"-32762\" fill=\"none\"/><line y2=\"-29146\" y1=\"-29715\" x2=\"-31777\" x1=\"-30792\" fill=\"none\"/><line y2=\"-30850\" y1=\"-29715\" x2=\"-32762\" x1=\"-32762\" fill=\"none\"/><line y2=\"-29715\" y1=\"-29146\" x2=\"-32762\" x1=\"-31777\" fill=\"none\"/><line y2=\"-31376\" y1=\"-30850\" x2=\"-29885\" x1=\"-30792\" fill=\"none\"/><line y2=\"-30850\" y1=\"-31376\" x2=\"-28978\" x1=\"-29885\" fill=\"none\"/><line y2=\"-31376\" y1=\"-30850\" x2=\"-28071\" x1=\"-28978\" fill=\"none\"/><line y2=\"-30960\" y1=\"-31376\" x2=\"-27352\" x1=\"-28071\" fill=\"none\"/><line y2=\"-31376\" y1=\"-30960\" x2=\"-26257\" x1=\"-26976\" fill=\"none\"/><line y2=\"-30850\" y1=\"-31376\" x2=\"-25350\" x1=\"-26257\" fill=\"none\"/><line y2=\"-32202\" y1=\"-31376\" x2=\"-28140\" x1=\"-28140\" fill=\"none\"/><line y2=\"-32202\" y1=\"-31376\" x2=\"-28002\" x1=\"-28002\" fill=\"none\"/><text y=\"-30671\" xml:space=\"preserve\" x=\"-27317\" stroke=\"none\" font-size=\"433.3333\" 
font-family=\"sans-serif\">O</text><text y=\"-32242\" xml:space=\"preserve\" x=\"-28224\" stroke=\"none\" font-size=\"433.3333\" font-family=\"sans-serif\">O</text></g></g></svg>",
  "inchi": "InChI=1S/C11H20O2/c1-2-13-11(12)9-8-10-6-4-3-5-7-10/h10H,2-9H2,1H3",
  "inchiKey": "InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N",
  "smile": "C(CC(OCC)=O)C1CCCCC1",
  "canonicalSmile": "O=C(OCC)CCC1CCCCC1",
  "molecularFormula": "C<sub>11</sub>H<sub>20</sub>O<sub>2</sub>",
  "molecularMass": "184.28",
  "experimentalProperties": [
    {
      "name": "Boiling Point",
      "property": "105-113 °C @ Press: 17 Torr",
      "sourceNumber": 1
    }
  ],
  "propertyCitations": [
    {
      "docUri": "document/pt/document/22252593",
      "sourceNumber": 1,
      "source": "De Benneville, Peter L.; Journal of the American Chemical Society, (1940), 62, 283-7, CAplus"
    }
  ],
  "synonyms": [
    "Cyclohexanepropanoic acid, ethyl ester",
    "Cyclohexanepropionic acid, ethyl ester",
    "Ethyl cyclohexanepropionate",
    "Ethyl cyclohexylpropanoate",
    "Ethyl 3-cyclohexylpropionate",
    "Ethyl 3-cyclohexylpropanoate",
    "3-Cyclohexylpropionic acid ethyl ester",
    "NSC 71463",
    "Ethyl 3-cyclohexanepropionate"
  ],
  "replacedRns": [],
  "hasMolfile": true
}
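Two fields in the record above carry extra markup: inchiKey is prefixed with the label InChIKey=, and molecularFormula uses HTML sub tags. If plain values are needed, one possible cleanup is sketched below. The values are copied from the output above, and the variable names are illustrative only.

```shell
inchikey_field="InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N"
formula_html="C<sub>11</sub>H<sub>20</sub>O<sub>2</sub>"

# Parameter expansion removes the leading "InChIKey=" label
inchikey=${inchikey_field#InChIKey=}

# sed deletes anything that looks like an HTML tag
formula=$(printf '%s\n' "$formula_html" | sed 's/<[^>]*>//g')

echo "$inchikey"
echo "$formula"
```

The same sed expression should also strip the em tags that appear in some synonym strings.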

Display a Molecule Drawing#

To display the molecule drawing, we could extract the SVG image string and open it in an image viewer program. However, since we are working within a terminal without graphics, we will instead extract the SMILES string and pipe it to the print_mols Python script, which uses the cheminformatics toolkit RDKit to parse the SMILES, compute drawing coordinates, and then print the molecule as ASCII characters:

echo "$casrn1_data" | jq '.["smile"]' | tr -d '"' | python3 print_mols.py -

Output:

                        O

                        *

        C               C                   C               C

    *       *       *       *         *         *       *       *

C               O               C                   C               C

                                                    *               *

                                                    C               C
                                                        *       *
                                                            C

Note

jq '.["smile"]' extracts the SMILES string from the smile field; tr -d '"' removes the surrounding double quotes; python3 print_mols.py - prints the molecule.
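As an alternative to removing the quotes with tr, jq can print the bare string itself through its -r (--raw-output) option. The sketch below uses a trimmed stand-in for the API response (only two fields, copied from the record above) so that it runs without a network request.

```shell
# Trimmed stand-in for the API response, keeping only two fields
record='{"rn":"10094-36-7","smile":"C(CC(OCC)=O)C1CCCCC1"}'

# Default output is JSON-encoded, so the string keeps its quotes
quoted=$(echo "$record" | jq '.smile')

# -r prints the string itself, so there are no quotes to strip
raw=$(echo "$record" | jq -r '.smile')

echo "$quoted"
echo "$raw"
```

One design note: tr -d '"' would also delete any double quote inside the value, whereas -r only removes the JSON encoding.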

Select some specific data#

Get Experimental Properties:

echo "$casrn1_data" | jq '.["experimentalProperties"][0]'

Output:

{
  "name": "Boiling Point",
  "property": "105-113 °C @ Press: 17 Torr",
  "sourceNumber": 1
}
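The property value above packs a temperature range and a pressure into a single string. If the parts are needed separately, Bash parameter expansion can split them. This is only a sketch that assumes the exact layout shown here ("low-high °C @ Press: value unit"); other records may format the string differently, and a negative temperature would confuse the hyphen split.

```shell
bp="105-113 °C @ Press: 17 Torr"

temp_part=${bp%% @*}         # text before " @"       -> 105-113 °C
temp_range=${temp_part%% *}  # text before first space -> 105-113
temp_low=${temp_range%%-*}   # text before the hyphen  -> 105
temp_high=${temp_range##*-}  # text after the hyphen   -> 113
pressure=${bp##*Press: }     # text after "Press: "    -> 17 Torr

echo "low=$temp_low high=$temp_high pressure=$pressure"
```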

Get Boiling Point property:

echo "$casrn1_data" | jq '.["experimentalProperties"][0]["property"]'

Output:

"105-113 °C @ Press: 17 Torr"

Get InChIKey:

echo "$casrn1_data" | jq '.["inchiKey"]'

Output:

"InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N"

Get Canonical SMILES:

echo "$casrn1_data" | jq '.["canonicalSmile"]'

Output:

"O=C(OCC)CCC1CCCCC1"

2. Common Chemistry API record detail retrieval in a loop#

Setup API parameters#

detail_base_url="https://commonchemistry.cas.org/api/detail?"
declare -a casrn_list=("10094-36-7" "10031-92-2" "10199-61-8" "10036-21-2" "1019020-13-3")
echo "${casrn_list[@]}"

Output:

10094-36-7 10031-92-2 10199-61-8 10036-21-2 1019020-13-3

Request data for each CAS RN and save to an array#

declare -a casrn_data
for casrn in "${casrn_list[@]}"
do
  data=$(curl "${detail_base_url}cas_rn=${casrn}")
  casrn_data+=("$data")
  sleep 1
done
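The loop above stores every response, even one that is not a usable record. One hedged way to guard against that is to test each response with jq before appending it. In the sketch below, fetch_detail is a hypothetical stand-in that returns canned text so the example runs offline, and 00000-00-0 is a made-up placeholder rather than a real CAS RN. In real use the function would wrap the curl request, and the sleep between requests would be kept.

```shell
# Hypothetical stand-in for the curl request (returns canned text)
fetch_detail() {
  case $1 in
    10094-36-7) echo '{"rn":"10094-36-7","name":"Ethyl cyclohexanepropionate"}' ;;
    *)          echo 'not json' ;;
  esac
}

declare -a good_records=()
declare -a failed_rns=()
for casrn in "10094-36-7" "00000-00-0"
do
  data=$(fetch_detail "$casrn")
  # jq -e exits non-zero when the result is null or false;
  # jq also exits non-zero when the input is not valid JSON
  if echo "$data" | jq -e '.rn' > /dev/null 2>&1; then
    good_records+=("$data")
  else
    failed_rns+=("$casrn")
  fi
done
echo "stored: ${#good_records[@]}, failed: ${failed_rns[*]}"
```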

View the first record:

echo "${casrn_data[0]}" | jq '.'

Output:

{
  "uri": "substance/pt/10094367",
  "rn": "10094-36-7",
  "name": "Ethyl cyclohexanepropionate",
  "image": "<svg width=\"228.6\" viewBox=\"0 0 7620 3716\" text-rendering=\"auto\" stroke-width=\"1\" stroke-opacity=\"1\" stroke-miterlimit=\"10\" stroke-linejoin=\"miter\" stroke-linecap=\"square\" stroke-dashoffset=\"0\" stroke-dasharray=\"none\" stroke=\"black\" shape-rendering=\"auto\" image-rendering=\"auto\" height=\"111.48\" font-weight=\"normal\" font-style=\"normal\" font-size=\"12\" font-family=\"'Dialog'\" fill-opacity=\"1\" fill=\"black\" color-rendering=\"auto\" color-interpolation=\"auto\" xmlns=\"http://www.w3.org/2000/svg\"><g><g stroke=\"white\" fill=\"white\"><rect y=\"0\" x=\"0\" width=\"7620\" stroke=\"none\" height=\"3716\"/></g><g transform=\"translate(32866,32758)\" text-rendering=\"geometricPrecision\" stroke-width=\"44\" stroke-linejoin=\"round\" stroke-linecap=\"round\"><line y2=\"-30850\" y1=\"-31419\" x2=\"-30792\" x1=\"-31777\" fill=\"none\"/><line y2=\"-29715\" y1=\"-30850\" x2=\"-30792\" x1=\"-30792\" fill=\"none\"/><line y2=\"-31419\" y1=\"-30850\" x2=\"-31777\" x1=\"-32762\" fill=\"none\"/><line y2=\"-29146\" y1=\"-29715\" x2=\"-31777\" x1=\"-30792\" fill=\"none\"/><line y2=\"-30850\" y1=\"-29715\" x2=\"-32762\" x1=\"-32762\" fill=\"none\"/><line y2=\"-29715\" y1=\"-29146\" x2=\"-32762\" x1=\"-31777\" fill=\"none\"/><line y2=\"-31376\" y1=\"-30850\" x2=\"-29885\" x1=\"-30792\" fill=\"none\"/><line y2=\"-30850\" y1=\"-31376\" x2=\"-28978\" x1=\"-29885\" fill=\"none\"/><line y2=\"-31376\" y1=\"-30850\" x2=\"-28071\" x1=\"-28978\" fill=\"none\"/><line y2=\"-30960\" y1=\"-31376\" x2=\"-27352\" x1=\"-28071\" fill=\"none\"/><line y2=\"-31376\" y1=\"-30960\" x2=\"-26257\" x1=\"-26976\" fill=\"none\"/><line y2=\"-30850\" y1=\"-31376\" x2=\"-25350\" x1=\"-26257\" fill=\"none\"/><line y2=\"-32202\" y1=\"-31376\" x2=\"-28140\" x1=\"-28140\" fill=\"none\"/><line y2=\"-32202\" y1=\"-31376\" x2=\"-28002\" x1=\"-28002\" fill=\"none\"/><text y=\"-30671\" xml:space=\"preserve\" x=\"-27317\" stroke=\"none\" font-size=\"433.3333\" 
font-family=\"sans-serif\">O</text><text y=\"-32242\" xml:space=\"preserve\" x=\"-28224\" stroke=\"none\" font-size=\"433.3333\" font-family=\"sans-serif\">O</text></g></g></svg>",
  "inchi": "InChI=1S/C11H20O2/c1-2-13-11(12)9-8-10-6-4-3-5-7-10/h10H,2-9H2,1H3",
  "inchiKey": "InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N",
  "smile": "C(CC(OCC)=O)C1CCCCC1",
  "canonicalSmile": "O=C(OCC)CCC1CCCCC1",
  "molecularFormula": "C<sub>11</sub>H<sub>20</sub>O<sub>2</sub>",
  "molecularMass": "184.28",
  "experimentalProperties": [
    {
      "name": "Boiling Point",
      "property": "105-113 °C @ Press: 17 Torr",
      "sourceNumber": 1
    }
  ],
  "propertyCitations": [
    {
      "docUri": "document/pt/document/22252593",
      "sourceNumber": 1,
      "source": "De Benneville, Peter L.; Journal of the American Chemical Society, (1940), 62, 283-7, CAplus"
    }
  ],
  "synonyms": [
    "Cyclohexanepropanoic acid, ethyl ester",
    "Cyclohexanepropionic acid, ethyl ester",
    "Ethyl cyclohexanepropionate",
    "Ethyl cyclohexylpropanoate",
    "Ethyl 3-cyclohexylpropionate",
    "Ethyl 3-cyclohexylpropanoate",
    "3-Cyclohexylpropionic acid ethyl ester",
    "NSC 71463",
    "Ethyl 3-cyclohexanepropionate"
  ],
  "replacedRns": [],
  "hasMolfile": true
}
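Viewing only the first record does not confirm what was stored for the others. Because jq reads a stream of JSON values, every element of the array can be passed through a single call. The sketch below prints one tab-separated line per record; the two records are trimmed stand-ins that use values shown elsewhere in this tutorial.

```shell
# Trimmed stand-ins for two stored API responses
declare -a casrn_data=(
  '{"rn":"10094-36-7","canonicalSmile":"O=C(OCC)CCC1CCCCC1"}'
  '{"rn":"10031-92-2","canonicalSmile":"O=C(C#CCCCCCC)OCC"}'
)

# printf emits one record per line; jq processes them one after another.
# @tsv joins the selected fields with tabs, and -r prints the raw text.
summary=$(printf '%s\n' "${casrn_data[@]}" | jq -r '[.rn, .canonicalSmile] | @tsv')
echo "$summary"
```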

Display Molecule Drawings#

We can display the molecules with the same technique used for the single record above: extract each SMILES string, then print it as ASCII characters using the print_mols Python script.

for record in "${casrn_data[@]}"
do
  echo "$record" | jq '.["smile"]' | tr -d '"' | python3 print_mols.py -
done

Output:

                        O

                        *

        C               C                   C               C

    *       *       *       *         *         *       *       *

C               O               C                   C               C

                                                    *               *

                                                    C               C
                                                        *       *
                                                            C





                                                O

                                                *

                                                C           C
                                              *     *     *     *
                                            C           O           C
                                        *
C           C           C           C
  *     *       *     *     *     *
    C               C           C





    C                           O
          *
  *             C
                                *
C               *

    *           N               C                   C
            *       *       *       *         *         *
        N               C               O                   C






                O                                   O

                *                                   *

    C           C           C           C           C       C
  *     *     *     *     *   *     *     *     *     *   *     *
C           O           C       C           C           O           C
                                *           *
                                C           C
                                    *     *
                                        C



                        N

                        *

        C               C                   C

    *       *       *         *         *       *

C               O                   C               C

                                    *               *

                                    C               C
                                        *       *
                                            C

Select some specific data#

Get canonical SMILES:

declare -a cansmiles
for record in "${casrn_data[@]}"
do
  cansmiles+=("$(echo "$record" | jq '.["canonicalSmile"]')")
done
echo "${cansmiles[@]}"

Output:

"O=C(OCC)CCC1CCCCC1" "O=C(C#CCCCCCC)OCC" "O=C(OCC)CN1N=CC=C1" "O=C(OCC)C1=CC=CC(=C1)CCC(=O)OCC" "N=C(OCC)C1=CCCCC1"

Get synonyms:

declare -a synonyms_list
for record in "${casrn_data[@]}"
do
  synonyms_list+=("$(echo "$record" | jq '.["synonyms"]')")
done
echo "${synonyms_list[@]}"

Output:

[
  "Cyclohexanepropanoic acid, ethyl ester",
  "Cyclohexanepropionic acid, ethyl ester",
  "Ethyl cyclohexanepropionate",
  "Ethyl cyclohexylpropanoate",
  "Ethyl 3-cyclohexylpropionate",
  "Ethyl 3-cyclohexylpropanoate",
  "3-Cyclohexylpropionic acid ethyl ester",
  "NSC 71463",
  "Ethyl 3-cyclohexanepropionate"
] [
  "2-Nonynoic acid, ethyl ester",
  "Ethyl 2-nonynoate",
  "NSC 190985"
] [
  "1<em>H</em>-Pyrazole-1-acetic acid, ethyl ester",
  "Pyrazole-1-acetic acid, ethyl ester",
  "Ethyl 1<em>H</em>-pyrazole-1-acetate",
  "Ethyl 1-pyrazoleacetate",
  "Ethyl 2-(1<em>H</em>-pyrazol-1-yl)acetate"
] [
  "Benzenepropanoic acid, 3-(ethoxycarbonyl)-, ethyl ester",
  "Hydrocinnamic acid, <em>m</em>-carboxy-, diethyl ester",
  "Ethyl 3-(ethoxycarbonyl)benzenepropanoate"
] [
  "1-Cyclohexene-1-carboximidic acid, ethyl ester",
  "Ethyl 1-cyclohexene-1-carboximidate"
]
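The output above is a sequence of separate JSON arrays, one per record. If a single combined JSON array is wanted instead, jq can merge them itself: -s (--slurp) collects the input stream into one array, and add concatenates the inner arrays. A minimal sketch with two trimmed stand-in records follows.

```shell
# Trimmed stand-ins for two stored API responses
declare -a casrn_data=(
  '{"rn":"10031-92-2","synonyms":["2-Nonynoic acid, ethyl ester","Ethyl 2-nonynoate","NSC 190985"]}'
  '{"rn":"1019020-13-3","synonyms":["1-Cyclohexene-1-carboximidic acid, ethyl ester","Ethyl 1-cyclohexene-1-carboximidate"]}'
)

# -s slurps all records into one array; map(.synonyms) yields an array of
# arrays; add concatenates them; -c prints the result on a single line
all_synonyms=$(printf '%s\n' "${casrn_data[@]}" | jq -c -s 'map(.synonyms) | add')
echo "$all_synonyms"
```

The result stays valid JSON, which may be convenient if it is passed to another jq filter rather than stored in a Bash array.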

Transform synonym array of lists to a flat structure:

declare -a synonyms_flat
for record in "${casrn_data[@]}"
do
  # count the synonyms once per record, then append each one to the flat array
  n=$(echo "$record" | jq '.["synonyms"] | length')
  for (( i = 0 ; i < n ; i++ ))
  do
    synonyms_flat+=("$(echo "$record" | jq ".synonyms[$i]")")
  done
done
echo "${synonyms_flat[@]}"

Output:

"Cyclohexanepropanoic acid, ethyl ester" "Cyclohexanepropionic acid, ethyl ester" "Ethyl cyclohexanepropionate" "Ethyl cyclohexylpropanoate" "Ethyl 3-cyclohexylpropionate" "Ethyl 3-cyclohexylpropanoate" "3-Cyclohexylpropionic acid ethyl ester" "NSC 71463" "Ethyl 3-cyclohexanepropionate" "2-Nonynoic acid, ethyl ester" "Ethyl 2-nonynoate" "NSC 190985" "1<em>H</em>-Pyrazole-1-acetic acid, ethyl ester" "Pyrazole-1-acetic acid, ethyl ester" "Ethyl 1<em>H</em>-pyrazole-1-acetate" "Ethyl 1-pyrazoleacetate" "Ethyl 2-(1<em>H</em>-pyrazol-1-yl)acetate" "Benzenepropanoic acid, 3-(ethoxycarbonyl)-, ethyl ester" "Hydrocinnamic acid, <em>m</em>-carboxy-, diethyl ester" "Ethyl 3-(ethoxycarbonyl)benzenepropanoate" "1-Cyclohexene-1-carboximidic acid, ethyl ester" "Ethyl 1-cyclohexene-1-carboximidate"