Architecture · the whole system

How the system fits together.

TXLookup is one Codex-driven multi-agent loop surfaced through six flows. All six share a typed plan/dispatch contract, a single skill policy, the same bounded Socrata + CKAN client, and a local-mirror resilience layer. The diagram below shows the layers; the cards walk through each flow end-to-end.

  • 6,061 datasets indexed · 6 portals · Socrata + CKAN
  • 11 deeply curated · schema + cached rows + glossary
  • 7 specialists · 5 in the /q loop · 2 scheduled crons
  • 8 MCP tools · installable in Claude Code, Cursor, Codex

Why agents

Each agent does the work you used to do by hand.

The legacy path was: visit a portal, find a dataset, download a 200k-row CSV, open it in a spreadsheet, filter by hand, build a chart. TXLookup replaces every step with a specialist agent powered by OpenAI Codex / GPT-4o. You ask in plain English. The agents do the rest.

  • Planner

    What you used to do

    Pick the right dataset, learn its schema, write SoQL.

    What this agent does

    Reads catalog metadata, picks the dataset, drafts a structured plan with bounded tool calls.

  • Data analyst

    What you used to do

    Hand-write group-by + window math + null handling.

    What this agent does

    Runs the bounded query, computes deltas / top-N / YoY with quality flags (null rate, top concentration, sample factor).

  • Reporter

    What you used to do

    Skim the spreadsheet, paraphrase, hope you didn't misread.

    What this agent does

    Composes a plain-English answer grounded in the analyst's findings — no hallucinated numbers.

  • Critic

    What you used to do

    Hope the answer is right. No way to check.

    What this agent does

    Reviews plan + answer for groundedness, scope, citation. Forces a corrective revision on reject.

  • Support

    What you used to do

    Re-Google when a column name is unfamiliar.

    What this agent does

    Handles meta-questions and disambiguation ("south austin" → which zip?). No SoQL fired.

  • Scout + ingestor

    What you used to do

    Notice when a portal added a new dataset. (Most people don't.)

    What this agent does

    Cron-driven. Scout indexes new portal datasets every 6h. Ingestor refreshes the local-mirror cache so pages stay fast and survive throttling.
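
The analyst's quality flags above reduce to simple aggregate checks. A minimal sketch, assuming the flags are computed per grouping column; the function name and return shape are illustrative, not TXLookup's actual implementation:

```python
from collections import Counter

def quality_flags(rows, group_key):
    """Compute two analyst-style quality flags for one column:
    null rate and top-group concentration."""
    total = len(rows)
    values = [r.get(group_key) for r in rows]
    nulls = sum(1 for v in values if v in (None, ""))
    counts = Counter(v for v in values if v not in (None, ""))
    top_share = (counts.most_common(1)[0][1] / total) if counts else 0.0
    return {
        "null_rate": nulls / total if total else 0.0,
        "top_concentration": top_share,
    }
```

A high `top_concentration` warns that one group dominates the aggregate; a high `null_rate` warns that the grouped column is sparsely populated.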

The user-facing change: if you can search Google or read a news article, you can ask civic data a question. Same data the experts use — reachable from a single search box, with citations on every claim.

Seven layers, top to bottom.

01 User surface
Browser · MCP client (Claude Code, Codex, Cursor)
02 Edge
Vercel — / · /q · /chat · /datasets · /reports · /sources · /api/agent (SSE)
03 Agent loop
7 specialists (planner · analyst · reporter · support · critic · scout · ingestor) · doom-loop guard · replanner · synthesizer
04 Tool dispatch
8 MCP tools · discover · describe · fetch · summarize · cite · status · miro_create · miro_add
05 Data + I/O
Socrata SODA (Austin / Austin Hub / Dallas / TX state) · CKAN (San Antonio / Houston) · data/cache/*.json local mirror · Miro REST API
06 Resilience
cache → live → stale-cache → error chain · 5,000-row cap · 30s timeout · 429 backoff · freshness badge per visible stat
07 Bounds + safety
Skill doc · citation enforced · doom-loop pattern detection · replan preserves intent

Markdown source: docs/architecture.md

Six flows, one agent.

Flow 01 · /q?q=…

User asks a question (live agent)

  1. Browser POST /api/agent { query }
  2. Server SSE stream opens (text/event-stream)
  3. phase=reasoning — Codex parses intent
  4. phase=planning — Codex returns structured Plan { intent, steps[] }
  5. phase=executing — for each step, dispatch tool
  6. phase=replanning (if a step fails ≤2 times) — Codex emits a new Plan
  7. phase=completing — Codex synthesizes final answer
  8. phase=done — answer + citation + artifacts streamed back

Nodes: Browser · /api/agent (Vercel) · Codex (gpt-4o) · Socrata SODA · Miro REST
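
The phases above arrive over a standard text/event-stream body. A minimal client-side sketch of pulling the phase out of each event; the exact wire format (`data:` lines carrying a JSON object with a `phase` field) is an assumption, not confirmed by the source:

```python
import json

def parse_sse_phases(stream_text):
    """Split a text/event-stream body into its data: payloads
    and return the phase carried by each one."""
    phases = []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
            phases.append(payload.get("phase"))
    return phases
```

In a real client the same loop would run incrementally over the open HTTP response instead of a completed string.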
Flow 02 · /datasets/[id]

Browse a dataset

  1. Server component renders at request time (revalidate 600s)
  2. Promise.all fetches /api/views/{id}.json + /resource/{id}.json?$limit=5
  3. Schema columns + sample rows + last refresh rendered as static HTML
  4. Scoped 'ask about this dataset' search submits back to /q with dataset=<id>

Nodes: Browser · /datasets/[id] server component · Socrata SODA
Flow 03 · /

Live homepage stats

  1. Server-render at request time (revalidate 300s)
  2. Promise.all fans out 5+ Socrata queries:
     • Austin permits last 7 days (group by day)
     • Austin permits 7d total
     • Top inspection zip last 30 days
     • 311 requests last 30 days
     • Open code violations
     • Per-dataset metadata for the cards
  3. Sparkline + ticker render with real numbers

Nodes: Browser · / server component · Socrata SODA
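
Each fanned-out query is an ordinary SODA request with SoQL parameters. A minimal sketch of building one (permits grouped by day); the dataset id and column names are placeholders, not the real Austin identifiers:

```python
from urllib.parse import urlencode

def soda_url(domain, dataset_id, **params):
    """Build a Socrata SODA resource URL from SoQL parameters,
    keeping the $-prefixed parameter names readable."""
    query = urlencode({f"${k}": v for k, v in params.items()}, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"

url = soda_url(
    "data.austintexas.gov", "xxxx-xxxx",  # placeholder dataset id
    select="date_trunc_ymd(issue_date) AS day, count(*) AS permits",
    where="issue_date > '2024-01-01'",
    group="day",
    order="day",
)
```

The same builder serves every query in the fan-out; only the dataset id and SoQL clauses change.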
Flow 04 · data/cache/<id>.json

Cache-resilience layer (the local mirror)

  1. GitHub Actions ingestor cron fires every 6h
  2. Pulls 5,000 most-recent rows per curated dataset to JSON
  3. Commits data/cache/*.json (~5 MB total) to main
  4. Vercel build bundles cache files into every serverless function
  5. Reader: try cache → on miss, hit live Socrata → on 429/5xx, fall back to stale cache with caveat
  6. Each visible stat tile carries a freshness badge (Mirror · Nh ago / Live · just now)

Nodes: GitHub Actions cron · ingestor.py · data/cache/*.json · app/lib/cache.ts
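
The reader's fallback chain in step 5 can be sketched with injected fetchers. Function names and the cache-entry shape are illustrative, not the actual app/lib/cache.ts API:

```python
def read_with_fallback(dataset_id, cache_get, live_fetch):
    """cache → live → stale-cache → error, tagged for the freshness badge."""
    cached = cache_get(dataset_id)
    if cached is not None and cached.get("fresh"):
        return cached["rows"], "Mirror"
    try:
        return live_fetch(dataset_id), "Live"
    except Exception:
        if cached is not None:  # stale mirror with a caveat beats an error
            return cached["rows"], "Mirror (stale)"
        raise
```

The second element of the tuple is what feeds the per-tile freshness badge; only when both the mirror and the live portal fail does the error propagate.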
Flow 05 · claude mcp add txlookup …

External agent installs TXLookup

  1. Developer runs claude/codex/cursor mcp add against mcp/server.py
  2. FastMCP advertises 8 tools (ask_data, discover_datasets, get_dataset_schema, fetch_data, get_task_status, create_miro_board, add_to_miro, list_known_tools)
  3. Skill doc (skills/txlookup/SKILL.md) teaches the runtime when to call each
  4. Tool calls land at the same data layer (agent/tools/data.py)
  5. Citations enforced — every reply includes portal + dataset_id + last_refreshed
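
The citation rule in step 5 is easy to sketch as a post-processing guard. The reply shape and function name are assumptions for illustration, not the server's real interface:

```python
def enforce_citation(reply, portal, dataset_id, last_refreshed):
    """Refuse to emit a reply that lacks any of its provenance fields."""
    citation = {
        "portal": portal,
        "dataset_id": dataset_id,
        "last_refreshed": last_refreshed,
    }
    if any(v is None for v in citation.values()):
        raise ValueError("reply blocked: incomplete citation")
    return {**reply, "citation": citation}
```

Running every tool reply through a guard like this is what makes "citations enforced" a property of the server rather than a convention the model is asked to follow.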
Flow 06 · create_miro_board tool

Agent-to-agent (A2A) — render to Miro

  1. Planner emits create_miro_board for visualizable answers
  2. Executor calls Miro REST API with title + summary + records
  3. Miro returns board_id + view_link
  4. View link surfaced as an artifact alongside the answer
  5. Judge clicks → opens the live, persistent Miro board
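
The executor call in step 2 reduces to one REST request. A minimal sketch that only builds the request (nothing is sent); the endpoint and field names follow Miro's v2 create-board API as an assumption, and the helper name is illustrative:

```python
def miro_board_request(token, title, summary):
    """Build the POST request for Miro's v2 create-board endpoint."""
    return {
        "url": "https://api.miro.com/v2/boards",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "json": {"name": title, "description": summary},
    }
```

The resulting dict could be handed straight to an HTTP client (e.g. `requests.post(**req)`); the `id` and `viewLink` in Miro's response would then become the board_id + view_link artifact in step 3.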