@jefftj86 made this script using the census data csv
I’ll let you know how the agent takes it. I’m taking your and @mikeamancuso’s feedback, and I’m pacing around refining my prompt to the agent. This one has to be surgical.
I’m hoping that once I scaffold the counties, the agent will be able to take on the 50 state sites.
I may need to go further with the site collection scraping though and have Claude pull all of the counties direct links.
I’d prefer to do it once so that in 2026 this automatically updates with the actual site, which makes pointing to the source of truth the real game changer for longevity.
I mean, if a few change the URL, NBD. I just don’t want to have to upload data again next year; I want this sucker to pull it automatically moving forward.
I’m not planning to generate revenue from this, by the way. I’m actually going to use it for myself and my family members over several years, so it’s not just a learning project. I’m genuinely going to use this tool.
That said, I might want to self-host and run my own database when using my own real data. This way, Replit won’t be able to store and have access to all my information, haha. I think it could be useful for other gig workers though.
Regardless, I knew I needed this tool, and this was an excellent way to immerse myself in government data and discover the best ways to integrate it so that I can complete all my learning now.
I’m ranting so…
Here’s the Claude code and slop post:
Tomorrow I’m going to get back into it. Took the day off to get “Slop Talk” out.
GigPrep County Ingest Script (ingest_counties.py)
Seeds the GigPrep SQLite database with all 50 US states + DC and their 3,143 counties, sourced directly from the Census Bureau FIPS file (united-states.xlsx).
What it does
- Reads united-states.xlsx (3 columns: FIPS code, geographic name, Census RAE report URL)
- Splits rows automatically: short FIPS (1–2 digits) = state, long FIPS (4–5 digits) = county
- Creates two tables: states (51 rows) and counties (3,143 rows)
- Stores the Census RAE report URL per county in rae_url, ready to use as a per-county source of truth
- Idempotent: safe to re-run; duplicates are skipped (INSERT OR IGNORE in the raw sqlite3 path, an existence check in the SQLAlchemy path)
- Works with SQLAlchemy (Flask stack) or falls back to raw sqlite3
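The state/county split above hinges entirely on FIPS length plus zero-padding. A minimal sketch of that heuristic (the function name `classify_fips` is illustrative, not from the script):

```python
def classify_fips(raw: str) -> tuple[str, str]:
    """Classify a raw FIPS string as state or county and zero-pad it."""
    raw = raw.strip()
    if len(raw) <= 2:                    # 1-2 digits: state-level row
        return ("state", raw.zfill(2))
    return ("county", raw.zfill(5))      # 4-5 digits: county-level row

print(classify_fips("1"))      # ('state', '01')
print(classify_fips("1001"))   # ('county', '01001')
print(classify_fips("56045"))  # ('county', '56045')
```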
Requirements
pip install pandas openpyxl sqlalchemy
Usage
# Default — looks for united-states.xlsx in project root, writes to instance/app.db
python ingest_counties.py
# Custom paths
python ingest_counties.py --xlsx path/to/united-states.xlsx
python ingest_counties.py --xlsx united-states.xlsx --db sqlite:///instance/app.db
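Re-running the script is safe because duplicate rows are silently skipped. A small self-contained demonstration of the INSERT OR IGNORE behavior the sqlite3 fallback relies on (in-memory database, sample row only):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE states (fips TEXT NOT NULL UNIQUE, name TEXT)")

for _ in range(2):  # simulate running the ingest twice
    con.execute(
        "INSERT OR IGNORE INTO states (fips, name) VALUES (?, ?)",
        ("01", "Alabama"),
    )

count = con.execute("SELECT COUNT(*) FROM states").fetchone()[0]
print(count)  # 1 -- the second insert hit the UNIQUE constraint and was ignored
```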
Database schema
states

| column | type | notes |
|---|---|---|
| id | INTEGER PK | |
| fips | TEXT | zero-padded, e.g. "01" |
| abbr | TEXT | e.g. "AL" |
| name | TEXT | e.g. "Alabama" |
| rae_url | TEXT | Census RAE state report link |
| created_at | DATETIME | |
counties

| column | type | notes |
|---|---|---|
| id | INTEGER PK | |
| fips_full | TEXT UNIQUE | 5-digit, e.g. "01001" |
| fips_state | TEXT | e.g. "01" |
| fips_county | TEXT | e.g. "001" |
| name | TEXT | e.g. "Autauga County" |
| full_name | TEXT | e.g. "Autauga County, Alabama" |
| state_abbr | TEXT | e.g. "AL" |
| state_name | TEXT | e.g. "Alabama" |
| rae_url | TEXT | Census RAE county report link |
| created_at | DATETIME | |
Full script
"""
ingest_counties.py — GigPrep
================================
Reads united-states.xlsx (Census Bureau FIPS + RAE Report Links)
and seeds two tables into the GigPrep SQLite database:
states — 51 rows (50 states + DC)
counties — 3,143 rows
Source file columns:
FIPS State and County Codes | Geographic area name | RAE Report Links
State rows : FIPS is 1–2 digits (e.g. 1 → Alabama)
County rows : FIPS is 4–5 digits (e.g. 1001 → Autauga County, Alabama)
USAGE
-----
python ingest_counties.py
python ingest_counties.py --xlsx path/to/united-states.xlsx
python ingest_counties.py --xlsx united-states.xlsx --db sqlite:///instance/app.db
REQUIREMENTS
------------
pip install pandas openpyxl sqlalchemy
"""
import argparse
import os
import re
import sys
from datetime import datetime
import pandas as pd
# ---------------------------------------------------------------------------
# Optional SQLAlchemy — falls back to raw sqlite3
# ---------------------------------------------------------------------------
try:
    from sqlalchemy import (
        Column, Integer, String, DateTime, UniqueConstraint,
        create_engine,
    )
    from sqlalchemy.orm import DeclarativeBase, Session
    SQLALCHEMY = True
except ImportError:
    import sqlite3
    SQLALCHEMY = False
# ---------------------------------------------------------------------------
# Defaults
# ---------------------------------------------------------------------------
DEFAULT_XLSX = "united-states.xlsx"
DEFAULT_DB = os.environ.get("DATABASE_URL", "sqlite:///instance/app.db")
# ---------------------------------------------------------------------------
# FIPS → (abbreviation, full state name)
# ---------------------------------------------------------------------------
FIPS_TO_STATE = {
    "01": ("AL", "Alabama"), "02": ("AK", "Alaska"),
    "04": ("AZ", "Arizona"), "05": ("AR", "Arkansas"),
    "06": ("CA", "California"), "08": ("CO", "Colorado"),
    "09": ("CT", "Connecticut"), "10": ("DE", "Delaware"),
    "11": ("DC", "District of Columbia"),
    "12": ("FL", "Florida"), "13": ("GA", "Georgia"),
    "15": ("HI", "Hawaii"), "16": ("ID", "Idaho"),
    "17": ("IL", "Illinois"), "18": ("IN", "Indiana"),
    "19": ("IA", "Iowa"), "20": ("KS", "Kansas"),
    "21": ("KY", "Kentucky"), "22": ("LA", "Louisiana"),
    "23": ("ME", "Maine"), "24": ("MD", "Maryland"),
    "25": ("MA", "Massachusetts"), "26": ("MI", "Michigan"),
    "27": ("MN", "Minnesota"), "28": ("MS", "Mississippi"),
    "29": ("MO", "Missouri"), "30": ("MT", "Montana"),
    "31": ("NE", "Nebraska"), "32": ("NV", "Nevada"),
    "33": ("NH", "New Hampshire"), "34": ("NJ", "New Jersey"),
    "35": ("NM", "New Mexico"), "36": ("NY", "New York"),
    "37": ("NC", "North Carolina"), "38": ("ND", "North Dakota"),
    "39": ("OH", "Ohio"), "40": ("OK", "Oklahoma"),
    "41": ("OR", "Oregon"), "42": ("PA", "Pennsylvania"),
    "44": ("RI", "Rhode Island"), "45": ("SC", "South Carolina"),
    "46": ("SD", "South Dakota"), "47": ("TN", "Tennessee"),
    "48": ("TX", "Texas"), "49": ("UT", "Utah"),
    "50": ("VT", "Vermont"), "51": ("VA", "Virginia"),
    "53": ("WA", "Washington"), "54": ("WV", "West Virginia"),
    "55": ("WI", "Wisconsin"), "56": ("WY", "Wyoming"),
}
# ===========================================================================
# SCHEMA (SQLAlchemy)
# ===========================================================================
if SQLALCHEMY:
    class Base(DeclarativeBase):
        pass

    class State(Base):
        __tablename__ = "states"
        id = Column(Integer, primary_key=True)
        fips = Column(String(2), nullable=False, unique=True)
        abbr = Column(String(2), nullable=False, unique=True)
        name = Column(String(64), nullable=False, unique=True)
        rae_url = Column(String(512))
        created_at = Column(DateTime, default=datetime.utcnow)

    class County(Base):
        __tablename__ = "counties"
        id = Column(Integer, primary_key=True)
        fips_full = Column(String(5), nullable=False, unique=True)   # "01001"
        fips_state = Column(String(2), nullable=False)               # "01"
        fips_county = Column(String(3), nullable=False)              # "001"
        name = Column(String(128), nullable=False)                   # "Autauga County"
        full_name = Column(String(192), nullable=False)              # "Autauga County, Alabama"
        state_abbr = Column(String(2), nullable=False)
        state_name = Column(String(64), nullable=False)
        rae_url = Column(String(512))
        created_at = Column(DateTime, default=datetime.utcnow)
        __table_args__ = (
            UniqueConstraint("fips_full", name="uq_county_fips"),
        )
# ===========================================================================
# PARSE XLSX
# ===========================================================================
def load_xlsx(path: str) -> tuple[list[dict], list[dict]]:
    """
    Returns (state_records, county_records) parsed from the Census xlsx.

    State rows have FIPS length <= 2 (e.g. "1", "2" ... "56").
    County rows have FIPS length >= 4 (e.g. "1001", "56045").
    """
    df = pd.read_excel(path, dtype=str)
    df.columns = ["fips_raw", "geo_name", "rae_url"]
    df = df.fillna("")
    df["fips_raw"] = df["fips_raw"].str.strip()

    states: list[dict] = []
    counties: list[dict] = []

    for _, row in df.iterrows():
        raw = row["fips_raw"]
        if not raw:  # skip blank/padding rows
            continue
        geo_name = row["geo_name"].strip()
        rae_url = row["rae_url"].strip()

        if len(raw) <= 2:  # state-level row
            fips_s = raw.zfill(2)
            abbr, full_name = FIPS_TO_STATE.get(fips_s, ("??", geo_name))
            states.append({
                "fips": fips_s,
                "abbr": abbr,
                "name": full_name,
                "rae_url": rae_url,
            })
        else:  # county-level row
            fips_padded = raw.zfill(5)
            fips_state = fips_padded[:2]
            fips_county = fips_padded[2:]
            abbr, s_name = FIPS_TO_STATE.get(fips_state, ("??", "Unknown"))
            # Strip ", StateName" suffix to get just the county name
            county_name = re.sub(r",\s*.+$", "", geo_name).strip()
            counties.append({
                "fips_full": fips_padded,
                "fips_state": fips_state,
                "fips_county": fips_county,
                "name": county_name,
                "full_name": geo_name,
                "state_abbr": abbr,
                "state_name": s_name,
                "rae_url": rae_url,
            })
    return states, counties
# ===========================================================================
# INGEST — SQLAlchemy
# ===========================================================================
def ingest_sqlalchemy(states: list[dict], counties: list[dict], db_url: str) -> None:
    print(f"[ingest] SQLAlchemy → {db_url}")
    if db_url.startswith("sqlite:///"):
        db_file = db_url[len("sqlite:///"):]
        os.makedirs(os.path.dirname(db_file) or ".", exist_ok=True)

    engine = create_engine(db_url, echo=False)
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        s_inserted = s_skipped = 0
        for rec in states:
            if session.query(State).filter_by(fips=rec["fips"]).first():
                s_skipped += 1
                continue
            session.add(State(**rec))
            s_inserted += 1
        session.commit()
        print(f"[ingest] States — inserted: {s_inserted} skipped: {s_skipped}")

        c_inserted = c_skipped = 0
        for rec in counties:
            if session.query(County).filter_by(fips_full=rec["fips_full"]).first():
                c_skipped += 1
                continue
            session.add(County(**rec))
            c_inserted += 1
            if c_inserted % 500 == 0:
                session.commit()
                print(f"[ingest] … {c_inserted} counties committed")
        session.commit()
        print(f"[ingest] Counties — inserted: {c_inserted} skipped: {c_skipped}")
# ===========================================================================
# INGEST — raw sqlite3 fallback
# ===========================================================================
def ingest_sqlite3(states: list[dict], counties: list[dict], db_path: str) -> None:
    import sqlite3 as _sqlite3
    print(f"[ingest] sqlite3 → {db_path}")
    os.makedirs(os.path.dirname(db_path) or ".", exist_ok=True)
    con = _sqlite3.connect(db_path)
    cur = con.cursor()
    cur.executescript("""
        CREATE TABLE IF NOT EXISTS states (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            fips TEXT NOT NULL UNIQUE,
            abbr TEXT NOT NULL UNIQUE,
            name TEXT NOT NULL UNIQUE,
            rae_url TEXT,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE IF NOT EXISTS counties (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            fips_full TEXT NOT NULL UNIQUE,
            fips_state TEXT NOT NULL,
            fips_county TEXT NOT NULL,
            name TEXT NOT NULL,
            full_name TEXT NOT NULL,
            state_abbr TEXT NOT NULL,
            state_name TEXT NOT NULL,
            rae_url TEXT,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
    """)
    con.commit()

    s_inserted = s_skipped = 0
    for rec in states:
        cur.execute(
            "INSERT OR IGNORE INTO states (fips, abbr, name, rae_url) VALUES (?,?,?,?)",
            (rec["fips"], rec["abbr"], rec["name"], rec["rae_url"])
        )
        s_inserted += cur.rowcount
        s_skipped += 1 - cur.rowcount
    con.commit()
    print(f"[ingest] States — inserted: {s_inserted} skipped: {s_skipped}")

    c_inserted = c_skipped = 0
    for rec in counties:
        cur.execute(
            """INSERT OR IGNORE INTO counties
               (fips_full, fips_state, fips_county, name, full_name,
                state_abbr, state_name, rae_url)
               VALUES (?,?,?,?,?,?,?,?)""",
            (rec["fips_full"], rec["fips_state"], rec["fips_county"],
             rec["name"], rec["full_name"], rec["state_abbr"],
             rec["state_name"], rec["rae_url"])
        )
        c_inserted += cur.rowcount
        c_skipped += 1 - cur.rowcount
        if c_inserted % 500 == 0 and c_inserted:
            con.commit()
            print(f"[ingest] … {c_inserted} counties committed")
    con.commit()
    con.close()
    print(f"[ingest] Counties — inserted: {c_inserted} skipped: {c_skipped}")
# ===========================================================================
# ENTRY POINT
# ===========================================================================
def main():
    parser = argparse.ArgumentParser(
        description="Seed GigPrep states + counties from Census XLSX."
    )
    parser.add_argument("--xlsx", default=DEFAULT_XLSX,
                        help=f"Path to united-states.xlsx (default: {DEFAULT_XLSX})")
    parser.add_argument("--db", default=DEFAULT_DB,
                        help=f"SQLAlchemy DB URL or sqlite path (default: {DEFAULT_DB})")
    args = parser.parse_args()

    if not os.path.exists(args.xlsx):
        sys.exit(
            f"[ingest] ERROR: File not found — {args.xlsx}\n"
            f"         Place united-states.xlsx in the project root and retry."
        )

    print(f"[ingest] Reading {args.xlsx} …")
    states, counties = load_xlsx(args.xlsx)
    print(f"[ingest] Parsed {len(states)} states + {len(counties)} counties")

    if SQLALCHEMY:
        ingest_sqlalchemy(states, counties, args.db)
    else:
        db_path = args.db.replace("sqlite:///", "")
        ingest_sqlite3(states, counties, db_path)

    print("\n[ingest] ✓ Complete.")
    print("         Tables ready : states, counties")
    print("         rae_url field: Census RAE report link per county — use as source of truth")
    print("         Next step    : point the Replit agent at rae_url for per-county data lookups")


if __name__ == "__main__":
    main()
Expected output
[ingest] Reading united-states.xlsx …
[ingest] Parsed 51 states + 3143 counties
[ingest] SQLAlchemy → sqlite:///instance/app.db
[ingest] States — inserted: 51 skipped: 0
[ingest] … 500 counties committed
[ingest] … 1000 counties committed
[ingest] … 1500 counties committed
[ingest] … 2000 counties committed
[ingest] … 2500 counties committed
[ingest] … 3000 counties committed
[ingest] Counties — inserted: 3143 skipped: 0
[ingest] ✓ Complete.
Tables ready : states, counties
rae_url field: Census RAE report link per county — use as source of truth
Next step : point the Replit agent at rae_url for per-county data lookups