TXLookup · Civic-data agent · v0.1

Texas civic data, in plain English.

6,061 Texas datasets indexed across six open-data portals, including Austin, Dallas, San Antonio, Houston, and the State of Texas. Nine are deeply curated (full schema, locally mirrored). The rest are answered live: an agent reads the catalog metadata, plans a query, and runs it against the source-of-truth portal. A smart layer over public data: every claim citable, every step replayable.

Corpus · indexed
6.1k datasets
3 active · 6 portals

The motivation

Texas publishes everything.
Hard to navigate, until now.

The state and its cities run six open-data portals. Together they expose 6,061 datasets covering permits, inspections, 311 calls, code violations, traffic fatalities, franchise tax, contracts, and library checkouts: millions of rows, refreshed daily. All of it public. To use it directly, you have to hand-write queries against six different APIs.

Six portals

6

different APIs

Austin and Dallas run Socrata; San Antonio and Houston run CKAN. Different dataset IDs, different conventions, different filter syntax.
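What "different APIs" means in practice: the same "recent rows" question takes two different request shapes depending on the portal engine. A minimal sketch; the San Antonio resource ID and the helper names are illustrative, not the real catalog.

```python
# Sketch: one question, two portal dialects. Socrata exposes per-dataset
# JSON endpoints filtered with SoQL $-parameters; CKAN routes everything
# through a single datastore_search action endpoint.
from urllib.parse import urlencode

def socrata_url(domain: str, dataset_id: str, where: str, limit: int = 100) -> str:
    """Build a Socrata resource URL with SoQL $-params."""
    params = urlencode({"$where": where, "$limit": limit})
    return f"https://{domain}/resource/{dataset_id}.json?{params}"

def ckan_url(domain: str, resource_id: str, query: str, limit: int = 100) -> str:
    """Build a CKAN datastore_search URL for the same kind of question."""
    params = urlencode({"resource_id": resource_id, "q": query, "limit": limit})
    return f"https://{domain}/api/3/action/datastore_search?{params}"

# Austin 311 (dataset ID from the tiles below) vs. a hypothetical
# San Antonio resource ID:
austin = socrata_url("data.austintexas.gov", "xwdj-i9he",
                     "created_date > '2024-12-01'")
sanantonio = ckan_url("data.sanantonio.gov", "311-service-calls", "2024-12")
```

Knowing which shape to emit, per portal, is exactly the kind of bookkeeping the agent layer absorbs.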

Schema drift

180+

columns just for permits

Each dataset has its own column names, types, and code values. permittype vs work_class vs permit_class_mapped: same idea, three columns, three meanings.
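The fix for drift is a canonical column map. A minimal sketch, assuming a hypothetical per-portal mapping: the three source column names come from the text above, but which portal uses which, and the canonical key, are illustrative.

```python
# Sketch: fold three portal-specific "what kind of permit?" columns into
# one canonical key. The portal-to-column assignment is an assumption.
CANONICAL = "permit_type"

COLUMN_MAP = {
    "austin":  "permittype",
    "dallas":  "work_class",
    "houston": "permit_class_mapped",
}

def normalize(portal: str, row: dict) -> dict:
    """Rename the portal-specific permit-type column to the canonical key."""
    src = COLUMN_MAP[portal]
    out = {k: v for k, v in row.items() if k != src}
    out[CANONICAL] = row.get(src)
    return out

print(normalize("austin", {"permittype": "Building", "zip": "78704"}))
# → {'zip': '78704', 'permit_type': 'Building'}
```

Multiply this by 180+ permit columns and six portals, and the value of doing it once, centrally, is the whole point.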

SoQL syntax

Brutal

to hand-write

$select, $where, $group, $order, $limit, date_extract_y, double-quoting strings, escaping single quotes. One typo and the whole query 400s.
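This is why the analyst agent generates clauses instead of typing them. A hedged sketch: the quoting helper and the column names (issued_date, contractor_company_name) are illustrative, not the portal's actual schema.

```python
# Sketch: assemble the SoQL clauses listed above programmatically, so the
# quoting and escaping rules live in one place instead of in your fingers.
from urllib.parse import urlencode

def soql_literal(value: str) -> str:
    """SoQL string literals are single-quoted; embedded quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"

name = soql_literal("O'Brien Builders")   # the classic 400-trigger, handled
params = urlencode({
    "$select": "date_extract_y(issued_date) AS yr, count(*) AS n",
    "$where":  f"contractor_company_name = {name}",
    "$group":  "yr",
    "$order":  "yr DESC",
    "$limit":  "50",
})
```

One unescaped apostrophe in a contractor name is the difference between an answer and a 400.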

Download + sift

Hours

of CSV manual review

The current path: download a 200,000-row CSV, open it in a spreadsheet, filter by hand, hope you didn't miss a column. Most people give up before getting to an answer.

A team of OpenAI-powered agents stands between you and 6,061 datasets. If you can search Google or read a news article, you can ask Texas civic data anything. The planner picks the dataset; the analyst writes the SoQL; the reporter composes the answer; the critic verifies citations. Same data the experts use — now reachable in plain English.
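The planner-analyst-reporter-critic handoff can be sketched as a small loop. The role names come from the text; the function signatures and the single-retry rule are illustrative assumptions, not the production pipeline.

```python
# Sketch: the four-role handoff, with the critic gating the final answer.
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    citation: str  # e.g. a dataset ID plus portal URL

def run_pipeline(question, planner, analyst, reporter, critic, retries=1):
    dataset = planner(question)                # planner picks the dataset
    for _ in range(retries + 1):
        soql = analyst(question, dataset)      # analyst writes the SoQL
        draft = reporter(question, soql)       # reporter composes the answer
        if critic(draft):                      # critic verifies the citation
            return draft
    raise RuntimeError("citation could not be verified")
```

The critic's veto is what turns "an LLM said so" into "the portal said so, here's the dataset ID."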

Local mirror · refreshed every 6h

9 curated, locally mirrored. The other 6,052 answered live.

The 9 datasets behind these tiles are mirrored to a local SQLite store every 6 hours by an autonomous ingestor cron. Pages render from the mirror in milliseconds and survive upstream throttling. The remaining 6,052-dataset catalog across 6 portals is queried on demand — each tile shows a freshness badge so the source is never ambiguous.
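A minimal sketch of one ingestor pass in the spirit described above. The table shape and the badge text formatting are assumptions, not the real schema.

```python
# Sketch: replace the local copy of one dataset, stamp the fetch time,
# and render the freshness text a tile badge might show.
import sqlite3
import time

def mirror(db: sqlite3.Connection, dataset_id: str, payloads: list[str]) -> None:
    """Replace the mirrored rows for one dataset and record when we fetched."""
    db.execute("""CREATE TABLE IF NOT EXISTS mirror
                  (dataset_id TEXT, fetched_at REAL, payload TEXT)""")
    db.execute("DELETE FROM mirror WHERE dataset_id = ?", (dataset_id,))
    db.executemany("INSERT INTO mirror VALUES (?, ?, ?)",
                   [(dataset_id, time.time(), p) for p in payloads])
    db.commit()

def freshness(fetched_at: float) -> str:
    """Badge text: 'just now', '4h ago', '3d ago'."""
    age = int(time.time() - fetched_at)
    if age < 60:
        return "just now"
    if age < 86_400:
        return f"{age // 3600}h ago"
    return f"{age // 86_400}d ago"
```

A full-table replace per pass keeps the mirror logic trivially idempotent: a crashed or rerun cron cycle can never double-count rows.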

Corpus · indexed
6.1k datasets
3 active · 6 portals

Austin · top inspection zip · 30d

ecmv-9xxi

Live · just now

Dallas · 311 requests · 30d

643

gc4d-8a49

Live · just now

TX · active franchise permits

9cir-efmm

Mirror · 3d ago

Austin · 311 requests · 30d

25,328

xwdj-i9he

Live · just now

Dallas · police active calls

63

9fxf-t2tr

Live · just now

Austin · open code violations

3,397

6wtj-zbtb

Live · just now

Austin permits · 7-day pulse

+1,217

Improvement flywheel

Five agents.
One sourced answer.

The orchestrator dispatches parallel queries. The critic catches a window bug. The reporter composes the answer. The citation locks in. A real run, looped on autoplay.

5
agents
1
self-correct
7.3s
end-to-end

Live replay · marquee question

cycle: 0.00s / 7.3s

Orchestrator0.00s

reason: parsing · domain=permits geo=78704 window=2024-Q4

The selling point

Any dataset. Any portal. Knowledge in 24 hours.

The scout + ingestor + multi-agent loop is portable. Texas is the demo corpus; the same pipeline ingests Chicago, NYC, federal data.gov, anywhere with a Socrata- or CKAN-compatible API.

Use as agent

Install in 30 seconds.

MCP server + agent skill. Drops into Claude Code, Codex, Cursor. Bounded queries, citation enforced.

~/txlookup · install
# 1. install in claude code
$ claude mcp add txlookup -- python -m mcp.server

# 2. ask
$ claude
> use txlookup: food truck permits 78702 last 6 months

# 3. answer with citation
→ count by month, % change vs prior 6mo
→ cite: dataset_id · portal_url · age_seconds

8 tools · 5,000-row cap · 30s timeout · backoff on 429 · citation enforced
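The guardrails above can be sketched as a wrapper a tool handler passes through before anything reaches the model. The row cap matches the text; the result field names (rows, dataset_id, portal_url, age_seconds) mirror the citation line in the transcript but are otherwise assumptions about the server's internal shape.

```python
# Sketch: enforce the 5,000-row cap and the no-citation-no-answer rule.
ROW_CAP = 5_000

def guarded(tool):
    """Truncate oversized results; refuse any result missing its citation."""
    def wrapper(*args, **kwargs):
        result = tool(*args, **kwargs)
        rows = result.get("rows", [])
        if len(rows) > ROW_CAP:
            result["rows"] = rows[:ROW_CAP]
            result["truncated"] = True
        for field in ("dataset_id", "portal_url", "age_seconds"):
            if field not in result:          # no citation, no answer
                raise ValueError(f"missing citation field: {field}")
        return result
    return wrapper
```

Enforcing the citation at the tool boundary, rather than trusting the reporter agent, is what makes "citation enforced" a property of the system instead of a prompt.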