U.S. Census Geocoding API in Python

by Michael T. Moen

U.S. Census Geocoding API documentation: https://geocoding.geo.census.gov/geocoder/Geocoding_Services_API.html

U.S. Census Bureau APIs terms of use: https://www.census.gov/data/developers/about/terms-of-service.html

This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau.

These recipe examples were tested on June 6, 2024.

Import Libraries

This tutorial uses the following libraries:

import requests             # Manages API requests
import csv                  # Facilitates reading and writing to CSV files    
from pprint import pprint   # Formats code outputs

1. Address Lookup

One of the main use cases of this API is finding the latitude and longitude of an address. In this example, we find the latitude and longitude of the Bruno Business Library at the University of Alabama.

The API supports two search types for this: address and onelineaddress. They are nearly identical; the only difference is the format of the parameters passed to the API.
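As a minimal sketch of the two search types, the snippet below builds one request URL for each and extracts the latitude and longitude from a response. The helper names build_lookup_url and first_match_lat_lon are hypothetical, and the assumption that each match carries a coordinates object with x (longitude) and y (latitude) should be checked against a live response.

```python
from urllib.parse import urlencode

BASE_URL = 'https://geocoding.geo.census.gov/geocoder'

def build_lookup_url(search_type, **fields):
    """Build a locations lookup URL for 'address' or 'onelineaddress' (hypothetical helper)."""
    query = {**fields, 'benchmark': 'Public_AR_Current', 'format': 'json'}
    return f'{BASE_URL}/locations/{search_type}?{urlencode(query)}'

# Structured form: one parameter per address component
address_url = build_lookup_url(
    'address', street='425 Stadium Dr', city='Tuscaloosa', state='AL', zip='35401'
)

# Single-line form: the whole address goes in one 'address' parameter
oneline_url = build_lookup_url(
    'onelineaddress', address='425 Stadium Dr, Tuscaloosa, AL 35401'
)

def first_match_lat_lon(payload):
    """Return (latitude, longitude) of the first match, or None if nothing matched."""
    matches = payload['result']['addressMatches']
    if not matches:
        return None
    coords = matches[0]['coordinates']
    return coords['y'], coords['x']   # assumed: y is latitude, x is longitude

print(address_url)
print(oneline_url)
```

Either URL can then be passed to requests.get, and the parsed JSON from response.json() handed to first_match_lat_lon.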

2. Batch Address Lookup

The U.S. Census Geocoding API also supports batch geocoding through the submission of a CSV, TXT, DAT, XLS, or XLSX file. These files must contain one record per line, with each record formatted as follows: Unique ID, Street address, City, State, ZIP. Each batch file is limited to 10,000 records.
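Because of the 10,000-record limit, a longer address list has to be split across several batch files. The sketch below shows one way to do that; chunk_records is a hypothetical helper and the 25,000 repeated records are made-up placeholder data.

```python
def chunk_records(records, max_records=10_000):
    """Yield successive slices of at most max_records records."""
    for start in range(0, len(records), max_records):
        yield records[start:start + max_records]

# Placeholder data: 25,000 records in the five-field format described above
records = [[str(i), '425 Stadium Dr', 'Tuscaloosa', 'AL', '35401']
           for i in range(1, 25_001)]

batches = list(chunk_records(records))
print([len(batch) for batch in batches])   # [10000, 10000, 5000]
```

Each batch can then be written to its own file and submitted separately.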

The batch lookup in this section uses the CSV file created below:

# Create list of addresses for the batch lookup
# Note that each record must begin with a unique ID
addresses = []
addresses.append(['1', '425 Stadium Dr', 'Tuscaloosa', 'AL', '35401'])
addresses.append(['2', '1600 Pennsylvania Avenue NW', 'Washington', 'DC', '20500'])
addresses.append(['3', '350 Fifth Avenue', 'New York', 'NY', '10118'])
addresses.append(['4', '660 Cannery Row', 'Monterey', 'CA', '93940'])
addresses.append(['5', '700 Clark Ave', 'St. Louis', 'MO', '63102'])

# Export addresses to a CSV file
input_filename = 'batch_addresses.csv'
with open(input_filename, 'w', newline='') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerows(addresses)

# Format parameters needed for POST request
return_type = 'locations'
parameters = {
    'benchmark': 'Public_AR_Current'
}

url = f'https://geocoding.geo.census.gov/geocoder/{return_type}/addressbatch'

# Open the file in a with block so it is closed once the upload finishes
with open(input_filename, 'rb') as address_file:
    files = {'addressFile': address_file}
    response = requests.post(url, data=parameters, files=files)

# Status code of 200 indicates success
response.status_code
200
# Save content of response to a new CSV
output_filename = 'geocoded_addresses.csv'
with open(output_filename, 'wb') as f:
    f.write(response.content)

# Print contents of the CSV for demonstration purposes
with open(output_filename, newline='') as f:
    csv_reader = csv.reader(f)
    for row in csv_reader:
        print(row)
['1', '425 Stadium Dr, Tuscaloosa, AL, 35401', 'Match', 'Exact', '425 STADIUM DR, TUSCALOOSA, AL, 35401', '-87.54970041625677,33.211054033780556', '636109874', 'L']
['2', '1600 Pennsylvania Avenue NW, Washington, DC, 20500', 'Match', 'Exact', '1600 PENNSYLVANIA AVE NW, WASHINGTON, DC, 20500', '-77.03654395730786,38.89869091865552', '76225813', 'L']
['3', '350 Fifth Avenue, New York, NY, 10118', 'Match', 'Exact', '350 5TH AVE, NEW YORK, NY, 10118', '-73.98507715289111,40.747848600317354', '59653473', 'L']
['4', '660 Cannery Row, Monterey, CA, 93940', 'Match', 'Exact', '660 CANNERY ROW, MONTEREY, CA, 93940', '-121.90128030457356,36.617235842516266', '647390330', 'R']
['5', '700 Clark Ave, St. Louis, MO, 63102', 'Match', 'Non_Exact', '700 CLARK AVE, SAINT LOUIS, MO, 63119', '-90.34036943803642,38.60242241714883', '100141071', 'R']

Note that the coordinates column lists longitude first, then latitude. The last two columns are the TIGER/Line ID and TIGER/Line Side. For more information on these values, please see the U.S. Census TIGER/Line Geodatabase Documentation. This tutorial does not use any TIGER/Line data.
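The batch output has no header row, so it can help to attach column labels and convert the coordinates to numbers before further use. The following is a rough sketch: the labels in COLUMNS are our own and not part of the API, and the quoting of the sample rows is an assumption about how the returned file is written.

```python
import csv
import io

# Hypothetical column labels; the returned file itself has no header row
COLUMNS = ['id', 'input_address', 'match', 'match_type',
           'matched_address', 'coordinates', 'tigerline_id', 'tigerline_side']

def parse_batch_rows(lines):
    """Turn rows of batch output into dicts with float longitude and latitude."""
    records = []
    for row in csv.reader(lines):
        record = dict(zip(COLUMNS, row))
        coords = record.get('coordinates')
        if coords:
            lon, lat = coords.split(',')   # longitude comes first
            record['longitude'], record['latitude'] = float(lon), float(lat)
        records.append(record)
    return records

# Records 1 and 5 from the output above, written as quoted CSV (assumed quoting)
sample = io.StringIO(
    '"1","425 Stadium Dr, Tuscaloosa, AL, 35401","Match","Exact",'
    '"425 STADIUM DR, TUSCALOOSA, AL, 35401",'
    '"-87.54970041625677,33.211054033780556","636109874","L"\n'
    '"5","700 Clark Ave, St. Louis, MO, 63102","Match","Non_Exact",'
    '"700 CLARK AVE, SAINT LOUIS, MO, 63119",'
    '"-90.34036943803642,38.60242241714883","100141071","R"\n'
)
records = parse_batch_rows(sample)

# Flag anything that is not an exact match for manual review
needs_review = [r['id'] for r in records if r.get('match_type') != 'Exact']
print(needs_review)   # ['5']
```

Flagging non-exact matches is worthwhile here: record 5 above was matched to ZIP code 63119 although the input ZIP code was 63102.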

3. Retrieving Additional Geographic Data

The geographies return type retrieves additional data associated with a given address or set of coordinates. The example below retrieves this data using the address of the Bruno Business Library at the University of Alabama.

Note that the geographies return type requires the vintage parameter to be specified.

Users may additionally include the layers parameter, which determines the types of geography data returned. For a list of all layers, see here.

return_type = 'geographies'
search_type = 'address'

parameters = '&'.join([

    # Specify the address to look up with the following parameters
    'street=425 Stadium Dr',
    'city=Tuscaloosa',
    'state=AL',
    'zip=35401',

    # Specify the version of the locator to be searched
    'benchmark=Public_AR_Current',

    # Specify the vintage
    'vintage=Current_Current',

    # Specify what categories of geographic data to retrieve
    'layers=all',

    # Specify that data should be returned in JSON format
    'format=json'
])

url = f'https://geocoding.geo.census.gov/geocoder/{return_type}/{search_type}?{parameters}'
response = requests.get(url)

# Status code of 200 indicates success
response.status_code
200

Note that the geographies return type returns all of the data that the locations return type does in addition to the geographies data.

pprint(response.json()['result']['addressMatches'][0], depth=1)
{'addressComponents': {...},
 'coordinates': {...},
 'geographies': {...},
 'matchedAddress': '425 STADIUM DR, TUSCALOOSA, AL, 35401',
 'tigerLine': {...}}
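Since the geographies return type also accepts a set of coordinates, the coordinates found for an address can be fed back into a coordinates search. The sketch below only builds the request URL; build_coordinates_url is a hypothetical helper, the x (longitude) and y (latitude) parameter names are taken from the API documentation as we understand it, and the values are rounded from the batch output for record 1.

```python
from urllib.parse import urlencode

def build_coordinates_url(x, y, benchmark='Public_AR_Current',
                          vintage='Current_Current', layers='all'):
    """Build a geographies/coordinates URL (hypothetical helper).

    x is the longitude and y is the latitude.
    """
    query = {
        'x': x,
        'y': y,
        'benchmark': benchmark,
        'vintage': vintage,     # required for the geographies return type
        'layers': layers,
        'format': 'json',
    }
    return ('https://geocoding.geo.census.gov/geocoder/geographies/coordinates?'
            + urlencode(query))

# Rounded coordinates of 425 Stadium Dr from the batch output
coordinates_url = build_coordinates_url(x=-87.5497, y=33.2111)
print(coordinates_url)
```

The resulting URL can be passed to requests.get in the same way as the address-based request. The response layout for a coordinates search may differ from the address search, so it is worth inspecting with pprint before indexing into it.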

The geographies data contains the following categories:

pprint(response.json()['result']['addressMatches'][0]['geographies'], depth=1)
{'118th Congressional Districts': [...],
 '2020 Census Blocks': [...],
 '2020 Census Public Use Microdata Areas': [...],
 '2020 Census ZIP Code Tabulation Areas': [...],
 '2022 State Legislative Districts - Lower': [...],
 '2022 State Legislative Districts - Upper': [...],
 'Census Block Groups': [...],
 'Census Divisions': [...],
 'Census Regions': [...],
 'Census Tracts': [...],
 'Counties': [...],
 'County Subdivisions': [...],
 'Incorporated Places': [...],
 'Metropolitan Statistical Areas': [...],
 'States': [...],
 'Unified School Districts': [...],
 'Urban Areas': [...]}

As an example, this is how the Counties data is formatted.

response.json()['result']['addressMatches'][0]['geographies']['Counties']
[{'GEOID': '01125',
  'CENTLAT': '+33.2894031',
  'AREAWATER': 78703449,
  'STATE': '01',
  'BASENAME': 'Tuscaloosa',
  'OID': '2759075608325',
  'LSADC': '06',
  'FUNCSTAT': 'A',
  'INTPTLAT': '+33.2902197',
  'NAME': 'Tuscaloosa County',
  'OBJECTID': 598,
  'CENTLON': '-087.5250366',
  'COUNTYCC': 'H1',
  'COUNTYNS': '00161588',
  'AREALAND': 3420980038,
  'INTPTLON': '-087.5227834',
  'MTFCC': 'G4020',
  'COUNTY': '125'}]
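To get a quick overview of everything returned, the name and GEOID of each record can be collected per layer. This is a minimal sketch: summarize_geographies is a hypothetical helper, and it assumes that records in other layers carry NAME and GEOID keys as the Counties record does (missing keys come back as None).

```python
def summarize_geographies(geographies):
    """Map each layer name to a list of (NAME, GEOID) pairs."""
    return {
        layer: [(item.get('NAME'), item.get('GEOID')) for item in items]
        for layer, items in geographies.items()
    }

# Trimmed copy of the Counties record shown above
geographies = {
    'Counties': [{'GEOID': '01125', 'NAME': 'Tuscaloosa County',
                  'STATE': '01', 'COUNTY': '125'}]
}

summary = summarize_geographies(geographies)
print(summary)   # {'Counties': [('Tuscaloosa County', '01125')]}

# In this record, the county GEOID is the state code followed by the county code
county = geographies['Counties'][0]
print(county['STATE'] + county['COUNTY'] == county['GEOID'])   # True
```

With a live response, the same function can be called on response.json()['result']['addressMatches'][0]['geographies'].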