Datasets

What the curated catalog covers.

The dataset catalog lives at config/datasets.yaml. It is the source of truth for discover_datasets. The frontend mirror lives at app/lib/catalog.ts.

Coverage today

Austin (data.austintexas.gov)

Dataset                               Socrata ID  Updated
Issued Construction Permits           3syk-w9eu   daily
Food Establishment Inspection Scores  ecmv-9xxi   weekly
Austin 311 Public Data                xwdj-i9he   daily
Austin Code Complaint Cases           6wtj-zbtb   daily
Crime Reports                         fdj4-gpfu   weekly
Austin Crash Report Data              y2wy-tgr5   monthly

Texas (data.texas.gov)

Dataset                              Socrata ID  Updated
Active Franchise Tax Permit Holders  9cir-efmm   quarterly
Texas State Expenditures             2zpi-yjjs   monthly
Mixed Beverage Gross Receipts        naix-2893   monthly

The skill is also scoped to the Dallas, San Antonio, and Houston open-data portals. Entries for those portals are added to the catalog as the team verifies field names against the live APIs.

Adding a dataset

  1. Add an entry to config/datasets.yaml under the matching portal (city or state).
  2. Document id, name, key_columns, updated cadence, and any sensitivity flags.
  3. Verify field names against the live API:

         curl -s "https://<portal>/api/views/<id>.json" | jq '.columns[].fieldName'

  4. The MCP server picks it up on the next reload; no code change is required.
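
The verification step can also be scripted. The following is a minimal sketch, not the project's actual tooling: it assumes a catalog entry can be read as a dict with the id and key_columns fields listed in step 2, and the helper names (field_names, check_entry, fetch_view_metadata) are hypothetical. It reads the same /api/views/<id>.json metadata that the curl command retrieves.

```python
import json
import re
from urllib.request import urlopen

# Socrata dataset IDs are "four-by-four" strings such as "3syk-w9eu".
SOCRATA_ID = re.compile(r"^[a-z0-9]{4}-[a-z0-9]{4}$")


def field_names(view_metadata: dict) -> set[str]:
    """Collect fieldName values from a /api/views/<id>.json payload."""
    return {col["fieldName"] for col in view_metadata.get("columns", [])}


def check_entry(entry: dict, view_metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry looks consistent."""
    problems = []
    if not SOCRATA_ID.match(entry.get("id", "")):
        problems.append(f"id {entry.get('id')!r} is not a four-by-four Socrata ID")
    # Assumption: key_columns is a list of API field names.
    missing = set(entry.get("key_columns", [])) - field_names(view_metadata)
    for name in sorted(missing):
        problems.append(f"key column {name!r} not found in live schema")
    return problems


def fetch_view_metadata(portal: str, dataset_id: str) -> dict:
    """Fetch the metadata document for one dataset from a Socrata portal."""
    url = f"https://{portal}/api/views/{dataset_id}.json"
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

A typical call would be check_entry(entry, fetch_view_metadata("data.austintexas.gov", entry["id"])). Keeping the network fetch separate from the check makes the check easy to run against a saved metadata file.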

Adding a city portal

  1. Add a new top-level key in config/datasets.yaml with its portal host.
  2. Confirm the portal speaks SODA (most TX cities do).
  3. Add a smoke-test query in tests/smoke/.
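
As a rough illustration of steps 2 and 3, a smoke test can request a single row from the portal's SODA resource endpoint and confirm that a JSON list comes back. This is a sketch under assumptions: the function names soda_url and smoke_test are hypothetical, and the real tests in tests/smoke/ may be structured differently.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def soda_url(host: str, dataset_id: str, **soql: str) -> str:
    """Build a SODA resource URL; keyword names map to $-prefixed SoQL parameters."""
    # safe="$" keeps the "$" prefix readable instead of encoding it as %24.
    query = urlencode({f"${key}": value for key, value in soql.items()}, safe="$")
    base = f"https://{host}/resource/{dataset_id}.json"
    return f"{base}?{query}" if query else base


def smoke_test(host: str, dataset_id: str) -> bool:
    """Return True if the portal answers a one-row SODA query with a JSON list."""
    with urlopen(soda_url(host, dataset_id, limit="1"), timeout=30) as resp:
        return isinstance(json.load(resp), list)
```

For example, smoke_test("data.austintexas.gov", "3syk-w9eu") exercises the Issued Construction Permits dataset. A portal that does not speak SODA would be expected to fail with an HTTP error or a non-JSON response.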

Sources we use

Source                 URL                    API
Texas Open Data        data.texas.gov         Socrata SODA
Austin Open Data       data.austintexas.gov   Socrata SODA
Dallas Open Data       dallasopendata.com     Socrata SODA
San Antonio Open Data  data.sanantonio.gov    Socrata SODA
Houston Open Data      data.houstontx.gov     Socrata SODA
TX Secretary of State  sos.state.tx.us        Web scraping (Playwright)
TX Comptroller         comptroller.texas.gov  CSV/Excel downloads

Field-name gotchas

A few that have bitten us — documented in config/datasets.yaml:

  • Austin code violations: status is Active | Closed | Pending, not Open.
  • Austin crime reports: date field is occ_date, not occurred_date. Use category_description for text labels.
  • Austin crashes: crash_fatal_fl is the string 'true'/'false', not a boolean.
  • TX state expenditures: no county field — use agency_name + major_spending_category + amount.
  • TX mixed beverage: no location_city — use taxpayer_city.
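
To make a few of these concrete, here is a small sketch of SoQL query strings that respect the gotchas above. The field names and values come from the list; the soql_params helper is hypothetical and only builds the query string, so it is not how the MCP server necessarily composes queries.

```python
from urllib.parse import urlencode


def soql_params(where: str, select: str = "*", limit: int = 100) -> str:
    """Encode a SoQL query string, leaving "$", "*" and "," readable."""
    return urlencode({"$select": select, "$where": where, "$limit": limit}, safe="$*,")


# Code complaints: open cases have status 'Active'; there is no 'Open' value.
active_cases = soql_params("status = 'Active'")

# Crashes: crash_fatal_fl is a string, so compare against 'true' in quotes.
fatal_crashes = soql_params("crash_fatal_fl = 'true'")

# Crime reports: the date field is occ_date; labels are in category_description.
crime_labels = soql_params(
    "occ_date IS NOT NULL", select="occ_date,category_description"
)
```

Each string can be appended after "?" to the dataset's /resource/<id>.json URL.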

Always call get_dataset_schema before composing a non-trivial query.