Chronicling America API in C

Chronicling America API in C#

by Cyrus Gomes

LOC Chronicling America API Documentation: https://chroniclingamerica.loc.gov/about/api/

These recipe examples were tested November, 2023.

Attribution: We thank Professor Jessica Kincaid (UA Libraries, Hoole Special Collections) for the use-cases. All data was collected from the Library of Congress, Chronicling America: Historic American Newspapers site, using the API.

Note that the data from the Alabama state intelligencer, The age-herald, and the Birmingham age-herald were contributed to Chronicling America by The University of Alabama Libraries: https://chroniclingamerica.loc.gov/awardees/au/

Setup#

First, install the CURL package by typing the following command in the terminal:

!sudo apt install curl

Then, install the jq package by typing the following command in the terminal:

!sudo apt install jq

Now, we set a directory where we want the Chronam directory for our projects to be created:

!mkdir Chronam

Finally, we change the directory to the folder we created:

%cd Chronam

1. Basic API request#

The Chronicling America API identifies newspapers and other records using LCCNs. We can query the API once we have the LCCN for the newspaper and even ask for particular issues and editions. For example, the following link lists newspapers published in the state of Alabama, from which the LCCN can be obtained: https://chroniclingamerica.loc.gov/newspapers/?state=Alabama

Here is an example with the Alabama State Intelligencer:

First, we can initialize a folder for the current project that we are working on:

!mkdir APIdata

Then, we can change to our newly created directory:

%cd APIdata

We utilize the %%file command to create the following makefile, which will compile our program and create an executable.

%%file makefile

# Set the variable CC to gcc, which is used to build the program
CC=gcc

# Enable debugging information and enable all compiler warnings
CFLAGS=-g -Wall

# Set the bin variable as the name of the binary file we are creating
BIN=api_data

# Create the binary file with the name we put
all: $(BIN)

# Map any file ending in .c to a binary executable
# "$<" represents the .c file and "$@" represents the target binary executable
%: %.c

	# Compile the .c file using the gcc compiler with the CFLAGS and links 
	# resulting binary with the CURL library
	$(CC) $(CFLAGS) $< -o $@ -lcurl

# Clean target which removes specific files
clean:

	# Remove the binary file and an ".dSYM" (debug symbols for debugging) directories
	# the RM command used -r to remove directories and -f to force delete
	$(RM) -rf $(BIN) *.dSYM

The command is used again to create the .c file that contains the code for the program:

%%file api_data.c

#include <curl/curl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* CURL program that retrieves API data with added id.
Custom property fields can be added */

int main (int argc, char* argv[]){
    
    // If arguments are invalid just return
    if (argc < 2){                                                                                      
        printf("Error. Please try again correctly. (./apidata -id [id])\n");
        return -1;
    }

    // Initialize the CURL HTTP connection
    CURL *curl = curl_easy_init();

    // bit of the url that are joined together later                                                                      
    char api[] = "https://chroniclingamerica.loc.gov/";                            
    char url[1000];
    char label_1[] = "lccn/";
    char format[] = ".json";
    char default_id[] = "sn84023600";

    // Check if CURL initialization is a success or not
    if (!curl){                                                                                     
        fprintf(stderr, "init failed\n");
        return EXIT_FAILURE;
    }

    // Check if default id should be used
    if (((argc==2) && (strcmp(argv[1],"-id")==0))){
        
        // Combine all the bits to produce a functioning url
        sprintf(url, "%s%s%s%s", api, label_1, default_id, format); 
        
    }
    
    // Check if the conditions match for using a specified id
    else if (((argc==3) && (strcmp(argv[1],"-id")==0))){
        
        // Combines all the bits to produce a functioning url
        sprintf(url, "%s%s%s%s", api, label_1, argv[2], format);                                              
    
    }

    // If the arguments are invalid then return
    else {                                                                                              
        curl_easy_cleanup(curl);
        return 0;
    }                                            

    // Set the url to which the HTTP request will be sent to
    // first parameter is for the initialized curl HTTP request, second for the option to be set, and third for the value to be set
    curl_easy_setopt(curl, CURLOPT_URL, url);

    // If result is not retrieved then output error
    CURLcode result = curl_easy_perform(curl);

    // If result is not retrieved then output error
    if (result != CURLE_OK){                                                                            
        fprintf(stderr, "download problem: %s\n", curl_easy_strerror(result));
    }

    // Deallocate memory for the CURL connection
    curl_easy_cleanup(curl);                                                                            
    return EXIT_SUCCESS;
}

Now, we use the make command to compile our executable:

!make

gcc -g -Wall api_data.c -o api_data -lcurl

Now we can run the executable with the an LCCN as an input:

!./api_data -id sn84023600

{"place_of_publication": "Tuskaloosa [sic], Ala.", "lccn": "sn84023600", "start_year": "183?", "place": ["Alabama--Tuscaloosa--Tuscaloosa"], "name": "Alabama State intelligencer. [volume]", "publisher": "T.M. Bradford", "url": "https://chroniclingamerica.loc.gov/lccn/sn84023600.json", "end_year": "18??", "issues": [], "subject": []}

Indexing into the json output allows data to be extracted using key names as demonstrated below:

!./api_data -id sn84023600 | jq '.["name"]'

"Alabama State intelligencer. [volume]"

!./api_data -id sn84023600 | jq '.["publisher"]'

"T.M. Bradford"

Moving on to another publication, we can get the 182nd page (seq-182) of the Evening Star newspaper published on November 19, 1961:

!./api_data -id "sn83045462/1961-11-19/ed-1/seq-182" | jq '.'

{
  "jp2": "https://chroniclingamerica.loc.gov/lccn/sn83045462/1961-11-19/ed-1/seq-182.jp2",
  "sequence": 182,
  "text": "https://chroniclingamerica.loc.gov/lccn/sn83045462/1961-11-19/ed-1/seq-182/ocr.txt",
  "title": {
    "url": "https://chroniclingamerica.loc.gov/lccn/sn83045462.json",
    "name": "Evening star. [volume]"
  },
  "pdf": "https://chroniclingamerica.loc.gov/lccn/sn83045462/1961-11-19/ed-1/seq-182.pdf",
  "ocr": "https://chroniclingamerica.loc.gov/lccn/sn83045462/1961-11-19/ed-1/seq-182/ocr.xml",
  "issue": {
    "url": "https://chroniclingamerica.loc.gov/lccn/sn83045462/1961-11-19/ed-1.json",
    "date_issued": "1961-11-19"
  }
}

We can also download this page as a PDF:

%%bash

# Call the api to get the pdf link
url=$(./api_data -id "sn83045462/1961-11-19/ed-1/seq-182" | jq -r '.pdf')

# Use the wget function to download the pdf file
wget "$url" -O file.pdf

2. Frequency of “University of Alabama” mentions#

The URL below limits to searching newspapers in the state of Alabama and provides 75 results of “University of Alabama” mentions. Note that phrases can be searched by putting them inside parentheses for the query.

We change the directory in the Chronam folder to create a new one for this project

%cd ..

!mkdir frequency

%cd frequency

We utilize the %%file command to create the following makefile, which will compile our program and create an executable:

%%file makefile

# Set the variable CC to gcc, which is used to build the program
CC=gcc

# Enable debugging information and enable all compiler warnings
CFLAGS=-g -Wall

# Set the bin variable as the name of the binary file we are creating
BIN=frequency_of_mentions

# Create the binary file with the name we put
all: $(BIN)

# Map any file ending in .c to a binary executable. 
# "$<" represents the .c file and "$@" represents the target binary executable
%: %.c

	# Compile the .c file using the gcc compiler with the CFLAGS and links 
	# resulting binary with the CURL library
	$(CC) $(CFLAGS) $< -o $@ -lcurl

# Clean target which removes specific files
clean:

	# Remove the binary file and an ".dSYM" (debug symbols for debugging) directories
	# the RM command used -r to remove directories and -f to force delete
	$(RM) -rf $(BIN) *.dSYM

The %%file command is used again to create our .c file which contains the code for the program

%%file frequency_of_mentions.c

#include <curl/curl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* CURL program that retrieves frequency of mentions with added search id.
Custom property fields can be added*/

int main (int argc, char* argv[]){
    
    // If arguments are invalid then return
    if (argc < 2) {                                                                                      
        printf("Error. Please try again correctly. (./frequency_of_mentions -s [s])\n");
        return -1;
    }

    // Initialize the CURL HTTP connection
    CURL *curl = curl_easy_init();

    // Bits of the url that are joined together later                                                                      
    char api[] = "https://chroniclingamerica.loc.gov/";                            
    char url[1000];
    char default_id[] = "search/pages/results/?state=Alabama&proxtext=(University%20of%20Alabama)&rows=75&format=json";

    // Check if CURL initialization is a success or not
    if (!curl) {                                                                                         
        fprintf(stderr, "init failed\n");
        return EXIT_FAILURE;
    }

    // Check if default search id should be used
    if ((argc==2) && (strcmp(argv[1],"-s")==0)) {
        
        // Combine all the bits to produce a functioning url
        sprintf(url, "%s%s", api, default_id); 
        
    }
    
    // Check if the conditions match for using an id specified
    else if ((argc==3) && (strcmp(argv[1],"-s")==0)) {
        
        // Combine all the bits to produce a functioning url
        sprintf(url, "%s%s", api, argv[2]);                                              
    
    }

    // If the arguments are invalid then return
    else {                                                                                              
        curl_easy_cleanup(curl);
        return 0;
    }                                            

    // Set the url to which the HTTP request will be sent to
    // first parameter is for the initialized curl HTTP request, second for the option to be set, and third for the value to be set
    curl_easy_setopt(curl, CURLOPT_URL, url);

    // If result is not retrieved then output error
    CURLcode result = curl_easy_perform(curl);

    // If result is not retrieved then output error
    if (result != CURLE_OK) {                                                                            
        fprintf(stderr, "download problem: %s\n", curl_easy_strerror(result));
    }

    // Deallocate memory for the CURL connection
    curl_easy_cleanup(curl);                                                                            
    return EXIT_SUCCESS;
}

!make

gcc -g -Wall frequency_of_mentions.c -o frequency_of_mentions -lcurl

# Output not shown because it is too long
!./frequency_of_mentions -s "search/pages/results/?state=Alabama&proxtext=(University%20of%20Alabama)&rows=75&format=json" | jq "."

Here’s the first result from the API:

# Output not shown because it is too long
!./frequency_of_mentions -s | jq '.["items"][0]'

To view the number of results retrieved by the API:

!./frequency_of_mentions -s | jq '.["items"] | length'

We retrieve each date and store all of them in a file called “dates.txt” by using the tee command:

%%bash

# Create a list of dates (YYYY-MM-DD) from each item record
# Show the first 10 lines
# Algorithm adapted from ChatGPT


# Create the associative array to store dates
dates=(); 

# Store the number of results retrieved
length=$(./frequency_of_mentions -s | jq '.["items"] | length'); 

for ((i = 0; i < length; i++)); do 

    # Retrieve the date for each result
    date=$(./frequency_of_mentions -s | jq ".items[$i].date")
    
    # Sleep delay
    sleep 1
    
    # Modify the data to be yyyy-mm-dd
    date=${date//\"/}
    date=$(date -d "${date}" "+%Y-%m-%d")
    
    # Add the 
    dates+=("$date")
        
    echo "${dates[$i]}"
    
done | tee "dates.txt" | head -n 5

We can also use the data to output the years and the frequency:

%%bash

# Read the dates in the file and count the number of times a year is repeated
# Algorithm adapted from ChatGPT

input_file="dates.txt"
if [ ! -f "$input_file" ]; then
    echo "Input file not found: $input_file"
    exit 1
fi
 
# Create an array to store year counts
declare -A year_count

# Read data from file
while read -r date; do
    year="${date%%-*}"
    
    ((year_count[$year]++))
done < "$input_file"

# Print the frequencies
for year in "${!year_count[@]}"; do
    count="${year_count[$year]}"
    echo "$year $count"
done | tee "frequencies.txt" | head -n 5

3. Sunday Comic Titles in the Age-herald#

The Age - Herald published comics every Sunday, we will try to extract the titles of those published on page 15 of the 17th October 1897 edition.

%cd ..

!mkdir title_extraction

%cd title_extraction

We reuse the api_data.c program from Step 1 and modify it to give us .txt outputs

%%file makefile

# Set the variable CC to gcc, which is used to build the program
CC=gcc

# Enable debugging information and enable all compiler warnings
CFLAGS=-g -Wall

# Set the bin variable as the name of the binary file we are creating
BIN=title_extract

# Create the binary file with the name we put
all: $(BIN)

# Map any file ending in .c to a binary executable. 
# "$<" represents the .c file and "$@" represents the target binary executable
%: %.c

	# Compile the .c file using the gcc compiler with the CFLAGS and links 
	# resulting binary with the CURL library
	$(CC) $(CFLAGS) $< -o $@ -lcurl

# Clean target which removes specific files
clean:

	# Remove the binary file and an ".dSYM" (debug symbols for debugging) directories
	# the RM command used -r to remove directories and -f to force delete
	$(RM) -rf $(BIN) *.dSYM

%%file title_extract.c

#include <curl/curl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* CURL program that extracts title from given id. 
Custom property fields can be added */

int main (int argc, char* argv[]) {
    
    // If arguments are invalid then return
    if (argc < 2) {                                                                                      
        printf("Error. Please try again correctly. (./title_extract -id [id])\n");
        return -1;
    }

    // Initialize the CURL HTTP connection
    CURL *curl = curl_easy_init();

    // Bits of the url that are joined together later                                                                      
    char api[] = "https://chroniclingamerica.loc.gov/";                            
    char url[1000];
    char label_1[] = "lccn/";
    char default_id[] = "sn84023600";

    // Check if CURL initialization is a success or not
    if (!curl) {                                                                                         
        fprintf(stderr, "init failed\n");
        return EXIT_FAILURE;
    }

    // Check if default id should be used
    if ((argc==2) && (strcmp(argv[1],"-id")==0)) {
        
        // Combine all the bits to produce a functioning url
        sprintf(url, "%s%s%s", api, label_1, default_id); 
        
    }
    
    // Check if the conditions match for using an id specified
    else if ((argc==3) && (strcmp(argv[1],"-id")==0)) {
        
        // Combine all the bits to produce a functioning url
        sprintf(url, "%s%s%s", api, label_1, argv[2]);                                              
    
    }
    
    // If the arguments are invalid then return
    else {                                                                                              
        curl_easy_cleanup(curl);
        return 0;
    }                                            

    // Set the url to which the HTTP request will be sent to
    // first parameter is for the initialized curl HTTP request, second for the option to be set, and third for the value to be set
    curl_easy_setopt(curl, CURLOPT_URL, url);

    // If result is not retrieved then output error
    CURLcode result = curl_easy_perform(curl);

    // If result is not retrieved then output error
    if (result != CURLE_OK) {                                                                            
        fprintf(stderr, "download problem: %s\n", curl_easy_strerror(result));
    }

    // Deallocate memory for the CURL connection
    curl_easy_cleanup(curl);                                                                            
    return EXIT_SUCCESS;
}

!make

gcc -g -Wall title_extract.c -o title_extract -lcurl

!./title_extract -id "sn86072192/1897-10-31/ed-1/seq-14/ocr.txt" | head -10

SONGS AND THEIR SINGERS.
V- —rm QBmAigb* ,-• ... *•** -j.
ih
” 'Tis hard to give the hand where the heart can never be!”
—Punch.
A SIMILE FAR FETCHED. A CHANGE OF HEART.
Priscilla is my Klondike girl, He—I think I shall have to preach
At least I call her so. a bicycle sermon tomorrow advis
There's gold in every straggling ing all my parishioners to ride a
•i- curl, wheel.

In order to extract the readable date from the text with random characters and non-interpretable characters, we create a bash script:

%%bash

# There is a lot of text here along with random characters and non-interpretable characters.
# Our approach here to get some of the titles will be to only keep 
# uppercase letters and lines that are at least 75% letters
# We use an IFS (Internal Field Seperator) to seperate the lines read by a newline
# Algorithm adapted from ChatGPT

input_text=$(./title_extract -id "sn86072192/1897-10-31/ed-1/seq-14/ocr.txt")
IFS=$'\n'

for line in $input_text
do
    line=$(echo "$line" | sed 's/[^A-Z]/ /g')
    spaces=$(echo "$line" | tr -cd ' ' | wc -c)
    size=${#line}
    letters=$((size - spaces))
    
    if ((letters * 4 >= size * 3))
    then
        echo "$line"
    fi
done

SONGS AND THEIR SINGERS 
A SIMILE FAR FETCHED  A CHANGE OF HEART 
THE PUG DOG PAPA S LAMENT 
TRUE UP TO A CERTAIN POINT  SURE TEST 
    SCORCHING AFTER NEW YORK S BICYCLE VOTE 
A HORSE SHOW SUGGESTION 
VAN WYCK ON ONE WHEEL  GEORGE IN A BROWN STUDY 
THE DOCTOR S MOTTO  SHREWDNESS NEEDED   X
HER REPUTATIONS 
THE FINAL CALL 
THE REPLY OF SPAIN 
LOW RIDES ERECT  GEN  TRACY S CLEVER DODGE 
WHY HE LIKED IT 
PAPA KNOWS 
AN EXCUSE 
NOT FOR HIM 
IN THE FIRELIGHT 
L WHERE NIGHTS LAST SIX MONTHS 
ALACK  ALACK 
MUCH THE SAME THING 
A KLONDIKER 
THE MAN WHO IS WEARING A DIAMOND RING FOR THE FIRST TIME 

4. Industrialization keywords frequency in the Birmingham Age-herald#

We will try to obtain the frequency of “Iron” on the front pages of the Birmingham Age- herald newspapers from the year 1900 to 1920 (limited to the first 75 rows for testing here).

%cd ..

!mkdir Industrialization_keywords

%cd Industrialization_keywords

We reuse the frequency_of_mentions.c program to obtain the frequency of the keywords:

%%file makefile

# Set the variable CC to gcc, which is used to build the program
CC=gcc

# Enable debugging information and enable all compiler warnings
CFLAGS=-g -Wall

# Set the bin variable as the name of the binary file we are creating
BIN=frequency_of_mentions

# Create the binary file with the name we put
all: $(BIN)

# Map any file ending in .c to a binary executable. 
# "$<" represents the .c file and "$@" represents the target binary executable
%: %.c

	# Compile the .c file using the gcc compiler with the CFLAGS and links 
	# resulting binary with the CURL library
	$(CC) $(CFLAGS) $< -o $@ -lcurl

# Clean target which removes specific files
clean:

	# Remove the binary file and an ".dSYM" (debug symbols for debugging) directories
	# the RM command used -r to remove directories and -f to force delete
	$(RM) -rf $(BIN) *.dSYM

%%file frequency_of_mentions.c

#include <curl/curl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* CURL program that retrieves frequency of mentions with added search id.
Custom property fields can be added */

int main (int argc, char* argv[]) {
    
    // If arguments are invalid then return
    if (argc < 2) {                                                                                      
        printf("Error. Please try again correctly. (./frequency_of_mentions -s [s])\n");
        return -1;
    }

    // Initialize the CURL HTTP connection
    CURL *curl = curl_easy_init();

    // Bits of the url that are joined together later                                                                      
    char api[] = "https://chroniclingamerica.loc.gov/";                            
    char url[1000];
    char default_id[] = "search/pages/results/?state=Alabama&proxtext=(University%20of%20Alabama)&rows=75&format=json";

    // Check if CURL initialization is a success or not
    if (!curl) {                                                                                         
        fprintf(stderr, "init failed\n");
        return EXIT_FAILURE;
    }

    // Check if default search id should be used
    if ((argc==2) && (strcmp(argv[1],"-s")==0)) {
        
        // Combine all the bits to produce a functioning url
        sprintf(url, "%s%s", api, default_id); 
        
    }
    
    // Check if the conditions match for using an id specified
    else if ((argc==3) && (strcmp(argv[1],"-s")==0)) {
        
        // Combine all the bits to produce a functioning url
        sprintf(url, "%s%s", api, argv[2]);                                              
    
    }
    
    // If the arguments are invalid then return
    else {                                                                                              
        curl_easy_cleanup(curl);
        return 0;
    }                                            

    // Set the url to which the HTTP request will be sent to
    // first parameter is for the initialized curl HTTP request, second for the option to be set, and third for the value to be set
    curl_easy_setopt(curl, CURLOPT_URL, url);

    // If result is not retrieved then output error
    CURLcode result = curl_easy_perform(curl);

    // If result is not retrieved then output error
    if (result != CURLE_OK) {                                                                            
        fprintf(stderr, "download problem: %s\n", curl_easy_strerror(result));
    }

    // Deallocate memory for the CURL connection
    curl_easy_cleanup(curl);                                                                            
    return EXIT_SUCCESS;
}

!make

gcc -g -Wall frequency_of_mentions.c -o frequency_of_mentions -lcurl

# Output not shown because it is too long
!./frequency_of_mentions -s "search/pages/results/?state=Alabama&lccn=sn85038485&dateFilterType=yearRange&date1=1900&date2=1920&sequence=1&andtext=Iron&rows=75&searchType=advanced&format=json" | jq '.["items"]'

%%bash

# Create a list of dates (YYYY-MM-DD) from each item record
dates=()

# Calculate the length of the dois
length=$(./frequency_of_mentions -s "search/pages/results/?state=Alabama&lccn=sn85038485&dateFilterType=yearRange&date1=1900&date2=1920&sequence=1&andtext=Iron&rows=75&searchType=advanced&format=json" | jq '.["items"] | length')

# Sleep delay
sleep 1

for ((i = 0; i < length; i++)); do 
    
    date=$(./frequency_of_mentions -s "search/pages/results/?state=Alabama&lccn=sn85038485&dateFilterType=yearRange&date1=1900&date2=1920&sequence=1&andtext=Iron&rows=75&searchType=advanced&format=json" | jq ".items[$i].date")
    
    # Sleep delay
    sleep 1
    
    date=${date//\"/}
    date=$(date -d "${date}" "+%Y-%m-%d")
    dates+=("$date")
    echo "${dates[$i]}"
    
done | tee "dates.txt" | head -n 5

!wc -l dates.txt

75 dates.txt

We can also use the data to output the years and the frequency and store them in a file:

%%bash

# Read the dates in the file and count the number of times a year is repeated
# useful for plotting graphs
# Algorithm adapted from ChatGPT

input_file="dates.txt"
if [ ! -f "$input_file" ]; then
    echo "Input file not found: $input_file"
    exit 1
fi
 
# Create an array to store year counts
declare -A year_count

# Read data from file
while read -r date; do
    year="${date%%-*}"
    
    ((year_count[$year]++))
done < "$input_file"

# Print the first 5 frequencies
for year in "${!year_count[@]}"; do
    count="${year_count[$year]}"
    echo "$year $count"
done | tee "frequencies.txt" | head -n 5