Architecture · the whole system
How the system fits together.
TXLookup is one Codex-driven multi-agent loop surfaced through six flows. All six share a typed plan/dispatch contract, a single skill policy, the same bounded Socrata + CKAN client, and a local-mirror resilience layer. The diagram below shows the layers; the cards walk through each flow end-to-end.
6,061
Datasets indexed
6 portals · Socrata + CKAN
11
Deeply curated
Schema + cached rows + glossary
7
Specialists
5 in /q loop · 2 scheduled crons
8
MCP tools
Installable in Claude Code, Cursor, Codex
Why agents
Each agent does the work you used to do by hand.
The legacy path was: visit a portal, find a dataset, download a 200k-row CSV, open it in a spreadsheet, filter by hand, build a chart. TXLookup replaces every step with a specialist agent powered by OpenAI Codex / GPT-4o. You ask in plain English. The agents do the rest.
Planner
What you used to do
Pick the right dataset, learn its schema, write SoQL.
What this agent does
Reads catalog metadata, picks the dataset, drafts a structured plan with bounded tool calls.
Data analyst
What you used to do
Hand-write group-by + window math + null handling.
What this agent does
Runs the bounded query, computes deltas / top-N / YoY with quality flags (null rate, top concentration, sample factor).
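The analyst's three quality flags can be sketched as one pure function over a sampled column; the function name and the exact flag definitions here are illustrative, not the repo's actual code.

```python
from collections import Counter

def quality_flags(values, population_size=None):
    """Null rate, top-value concentration, and sample factor for one column."""
    n = len(values)
    nulls = sum(v is None for v in values)
    non_null = [v for v in values if v is not None]
    # Share of the single most common value among non-null rows.
    top_share = (Counter(non_null).most_common(1)[0][1] / len(non_null)
                 if non_null else 0.0)
    return {
        "null_rate": nulls / n if n else 1.0,
        "top_concentration": top_share,
        # How many real rows each sampled row stands in for.
        "sample_factor": (population_size / n) if (population_size and n) else 1.0,
    }
```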
Reporter
What you used to do
Skim the spreadsheet, paraphrase, hope you didn't misread.
What this agent does
Composes a plain-English answer grounded in the analyst's findings — no hallucinated numbers.
Critic
What you used to do
Hope the answer is right. No way to check.
What this agent does
Reviews plan + answer for groundedness, scope, citation. Forces a corrective revision on reject.
Support
What you used to do
Re-Google when a column name is unfamiliar.
What this agent does
Handles meta-questions and disambiguation ("south austin" → which zip?). No SoQL fired.
Scout + ingestor
What you used to do
Notice when a portal added a new dataset. (Most people don't.)
What this agent does
Cron-driven. Scout indexes new portal datasets every 6h. Ingestor refreshes the local-mirror cache so pages stay fast and survive throttling.
The user-facing change: if you can search Google or read a news article, you can ask civic data a question. Same data the experts use — reachable from a single search box, with citations on every claim.
Seven layers, top to bottom.
Markdown source: docs/architecture.md
Six flows, one agent.
User asks a question (live agent)
1. Browser POST /api/agent { query }
2. Server SSE stream opens (text/event-stream)
3. phase=reasoning — Codex parses intent
4. phase=planning — Codex returns structured Plan { intent, steps[] }
5. phase=executing — for each step, dispatch tool
6. phase=replanning (if step fails ≤2 times) — Codex emits a new Plan
7. phase=completing — Codex synthesizes final answer
8. phase=done — answer + citation + artifacts streamed back
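The plan/dispatch/replan core of this flow can be sketched as a small loop. `Step`, `Plan`, and `run_plan` are illustrative names standing in for the repo's typed contract, not its actual code.

```python
from dataclasses import dataclass

@dataclass
class Step:
    tool: str   # name of the tool to dispatch
    args: dict  # bounded arguments for that tool

@dataclass
class Plan:
    intent: str
    steps: list

MAX_RETRIES = 2  # mirrors "replanning if a step fails <= 2 times"

def run_plan(plan, tools, replan):
    """Dispatch each step; after repeated failure, ask for a fresh Plan."""
    results = []
    for step in plan.steps:
        for attempt in range(MAX_RETRIES + 1):
            try:
                results.append(tools[step.tool](**step.args))
                break
            except Exception:
                if attempt == MAX_RETRIES:
                    # Sketch-level simplification: restart with the new Plan.
                    return run_plan(replan(plan, step), tools, replan)
    return results
```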
Browse a dataset
1. Server component renders at request time (revalidate 600s)
2. Promise.all fetches /api/views/{id}.json + /resource/{id}.json?$limit=5
3. Schema columns + sample rows + last refresh rendered as static HTML
4. Scoped 'ask about this dataset' search submits back to /q with dataset=<id>
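The two fetches behind that fan-out hit the standard Socrata metadata and resource endpoints; a minimal sketch, with a hypothetical helper and a placeholder dataset id:

```python
from urllib.parse import urlencode

def dataset_page_urls(host: str, dataset_id: str, sample_rows: int = 5):
    """Return (metadata_url, sample_url) for one dataset page."""
    meta = f"https://{host}/api/views/{dataset_id}.json"
    # $limit bounds the sample-row fetch; urlencode percent-encodes the $.
    sample = f"https://{host}/resource/{dataset_id}.json?" + urlencode(
        {"$limit": sample_rows}
    )
    return meta, sample
```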
Live homepage stats
1. Server-render at request time (revalidate 300s)
2. Promise.all fans out 5+ Socrata queries:
   - Austin permits last 7 days (group by day)
   - Austin permits 7d total
   - Top inspection zip last 30 days
   - 311 requests last 30 days
   - Open code violations
   - Per-dataset metadata for the cards
3. Sparkline + ticker render with real numbers
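One of the fanned-out queries (permits grouped by day) might look like this in SoQL; the column name issued_date is an assumption about the permits dataset, not taken from its real schema:

```python
def permits_by_day(date_col: str, since_iso: str) -> dict:
    """SoQL query params: daily permit counts since a cutoff date."""
    return {
        "$select": f"date_trunc_ymd({date_col}) AS day, count(*) AS n",
        "$where": f"{date_col} >= '{since_iso}'",
        "$group": "day",
        "$order": "day",
    }
```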
Cache-resilience layer (the local mirror)
1. GitHub Actions ingestor cron fires every 6h
2. Pulls 5,000 most-recent rows per curated dataset to JSON
3. Commits data/cache/*.json (~5 MB total) to main
4. Vercel build bundles cache files into every serverless function
5. Reader: try cache → on miss, hit live Socrata → on 429/5xx, fall back to stale cache with caveat
6. Each visible stat tile carries a freshness badge (Mirror · Nh ago / Live · just now)
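The reader's fallback chain reduces to a small function; the cache entry shape, the 6-hour freshness window, and the badge strings here are illustrative, not the repo's exact implementation:

```python
import time

def read_with_mirror(key, cache, fetch_live, max_age_s=6 * 3600):
    """cache -> live -> stale-cache fallback; returns (rows, freshness badge)."""
    hit = cache.get(key)  # assumed shape: {"rows": [...], "fetched_at": epoch_s}
    now = time.time()
    if hit and now - hit["fetched_at"] <= max_age_s:
        age_h = int((now - hit["fetched_at"]) // 3600)
        return hit["rows"], f"Mirror · {age_h}h ago"
    try:
        return fetch_live(key), "Live · just now"
    except Exception:
        # Portal throttled or erroring (429 / 5xx): serve stale with a caveat.
        if hit:
            return hit["rows"], "Mirror (stale) · served with caveat"
        raise
```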
External agent installs TXLookup
1. Developer runs claude/codex/cursor mcp add against mcp/server.py
2. FastMCP advertises 8 tools (ask_data, discover_datasets, get_dataset_schema, fetch_data, get_task_status, create_miro_board, add_to_miro, list_known_tools)
3. Skill doc (skills/txlookup/SKILL.md) teaches the runtime when to call each
4. Tool calls land at the same data layer (agent/tools/data.py)
5. Citations enforced — every reply includes portal + dataset_id + last_refreshed
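The citation rule in step 5 amounts to a small guard on every outgoing reply; `cited` is a hypothetical helper sketching the contract, not code from mcp/server.py:

```python
REQUIRED_CITATION_KEYS = {"portal", "dataset_id", "last_refreshed"}

def cited(reply: dict) -> dict:
    """Refuse to return a reply that is missing any citation field."""
    missing = REQUIRED_CITATION_KEYS - set(reply)
    if missing:
        raise ValueError(f"uncited reply, missing: {sorted(missing)}")
    return reply
```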
Agent-to-agent (A2A) — render to Miro
1. Planner emits create_miro_board for visualizable answers
2. Executor calls Miro REST API with title + summary + records
3. Miro returns board_id + view_link
4. View link surfaced as an artifact alongside the answer
5. Judge clicks → opens the live, persistent Miro board
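The board-creation call in step 2 maps onto Miro's v2 REST API (POST /v2/boards) roughly as below; this only builds the request rather than sending it, and `create_board_request` is an illustrative helper, not the repo's executor:

```python
import json
import urllib.request

MIRO_BOARDS_API = "https://api.miro.com/v2/boards"

def create_board_request(token: str, title: str, summary: str):
    """Build the authenticated POST that would create a board."""
    body = json.dumps({"name": title, "description": summary}).encode()
    return urllib.request.Request(
        MIRO_BOARDS_API,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The response body would carry the board id and a view link, which the executor surfaces as an artifact next to the answer.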
Companion docs.
How it works
End-to-end live trace of the marquee question — every SSE event, every tool call, every Socrata response.
Read on GitHub →
Agents strategy
Codex's five distinct roles in the loop, and the explicit 'why this isn't a wrapper' framing for the Agents Track.
Read on GitHub →
Agent skill
The deliverable agent skill — when to invoke TXLookup, which tool to pick, the safety rules, worked examples.
Read on GitHub →
Integration guide
Install the MCP server in Claude Code / Codex / standalone; full tool catalog with examples.
Read on GitHub →