Datasets
What the curated catalog covers.
The dataset catalog lives at config/datasets.yaml. It is the source of truth for discover_datasets. The frontend mirror lives at app/lib/catalog.ts.
Coverage today
Austin (data.austintexas.gov)
| Dataset | Socrata ID | Updated |
|---|---|---|
| Issued Construction Permits | 3syk-w9eu | daily |
| Food Establishment Inspection Scores | ecmv-9xxi | weekly |
| Austin 311 Public Data | xwdj-i9he | daily |
| Austin Code Complaint Cases | 6wtj-zbtb | daily |
| Crime Reports | fdj4-gpfu | weekly |
| Austin Crash Report Data | y2wy-tgr5 | monthly |
Texas (data.texas.gov)
| Dataset | Socrata ID | Updated |
|---|---|---|
| Active Franchise Tax Permit Holders | 9cir-efmm | quarterly |
| Texas State Expenditures | 2zpi-yjjs | monthly |
| Mixed Beverage Gross Receipts | naix-2893 | monthly |
The skill also targets the Dallas, San Antonio, and Houston open-data portals. Entries for those portals are added to the catalog as the team verifies field names against the live APIs.
Adding a dataset
- Add an entry to `config/datasets.yaml` under the right city.
- Document `id`, `name`, `key_columns`, `updated` cadence, and any sensitivity flags.
- Verify field names against the live API:
  `curl https://<portal>/api/views/<id>.json | jq '.columns[].fieldName'`
- The MCP server picks it up on next reload; no code change is required.
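The verification step can also be scripted. The following is a minimal Python sketch, assuming the metadata endpoint returns a `columns` array whose items carry a `fieldName` key (as the `jq` filter implies). The helper names `field_names`, `missing_columns`, and `fetch_view_metadata` are hypothetical and not part of the project.

```python
import json
from urllib.request import urlopen


def field_names(view_metadata: dict) -> set:
    """Collect the fieldName values from a /api/views/<id>.json payload."""
    return {col["fieldName"] for col in view_metadata.get("columns", [])}


def missing_columns(view_metadata: dict, key_columns: list) -> list:
    """Return the catalog key_columns that the live schema does not expose."""
    live = field_names(view_metadata)
    return [name for name in key_columns if name not in live]


def fetch_view_metadata(portal: str, dataset_id: str) -> dict:
    """Download dataset metadata from the portal (performs a network call)."""
    url = f"https://{portal}/api/views/{dataset_id}.json"
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


# Offline example with a hand-written payload (illustrative, not real API output).
sample = {"columns": [{"fieldName": "occ_date"}, {"fieldName": "category_description"}]}
print(missing_columns(sample, ["occ_date", "occurred_date"]))  # ['occurred_date']
```

An empty result means every documented key column exists in the live schema. Anything listed should be corrected in the catalog entry before reloading.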
Adding a city portal
- Add a new top-level key in `config/datasets.yaml` with its `portal` host.
- Confirm the portal speaks SODA (most TX cities do).
- Add a smoke-test query in `tests/smoke/`.
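A smoke test for a new portal might look like the sketch below. It assumes the standard SODA resource path (`/resource/<id>.json`) and the `$limit` parameter. The function names are hypothetical, and the actual layout of files under `tests/smoke/` is not described here, so treat this as a starting point rather than the project's test format.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def smoke_url(portal: str, dataset_id: str, limit: int = 1) -> str:
    """Build a SODA query URL that asks for at most `limit` rows."""
    query = urlencode({"$limit": limit})
    return f"https://{portal}/resource/{dataset_id}.json?{query}"


def portal_speaks_soda(portal: str, dataset_id: str) -> bool:
    """Return True if the portal answers with a JSON list of rows (network call)."""
    with urlopen(smoke_url(portal, dataset_id), timeout=30) as resp:
        return isinstance(json.load(resp), list)


# Building the URL needs no network access; "$" is percent-encoded as %24.
print(smoke_url("data.austintexas.gov", "3syk-w9eu"))
```

Requesting a single row keeps the check cheap while still confirming that the host resolves, the dataset ID exists, and the response is SODA-shaped JSON.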
Sources we use
| Source | URL | API |
|---|---|---|
| Texas Open Data | data.texas.gov | Socrata SODA |
| Austin Open Data | data.austintexas.gov | Socrata SODA |
| Dallas Open Data | dallasopendata.com | Socrata SODA |
| San Antonio Open Data | data.sanantonio.gov | Socrata SODA |
| Houston Open Data | data.houstontx.gov | Socrata SODA |
| TX Secretary of State | sos.state.tx.us | Web scraping (Playwright) |
| TX Comptroller | comptroller.texas.gov | CSV/Excel downloads |
Field-name gotchas
A few that have bitten us — documented in config/datasets.yaml:
- Austin code violations: status is `Active | Closed | Pending`, not `Open`.
- Austin crime reports: date field is `occ_date`, not `occurred_date`. Use `category_description` for text labels.
- Austin crashes: `crash_fatal_fl` is the string `'true'`/`'false'`, not a boolean.
- TX state expenditures: no `county` field; use `agency_name` + `major_spending_category` + `amount`.
- TX mixed beverage: no `location_city`; use `taxpayer_city`.
Always call get_dataset_schema before composing a non-trivial query.
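To illustrate how one of these gotchas affects a query, here is a minimal sketch that builds SoQL parameters for fatal Austin crashes. It assumes the standard `$where` and `$limit` parameters. `fatal_crash_params` is a hypothetical helper, and the field name is taken from the list above rather than from a fresh schema check.

```python
from urllib.parse import urlencode


def fatal_crash_params(limit: int = 10) -> dict:
    """SoQL parameters for fatal Austin crashes.

    crash_fatal_fl is stored as text, so the filter compares against the
    quoted string 'true' rather than a bare boolean literal.
    """
    return {"$where": "crash_fatal_fl = 'true'", "$limit": str(limit)}


params = fatal_crash_params()
# urlencode percent-encodes the quotes and spaces for use in a query string.
print(urlencode(params))
```

The same pattern applies to the other entries: filter crime reports on `occ_date`, and filter mixed beverage receipts on `taxpayer_city`.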