Dataset Scout
Grows the corpus
Scans Socrata catalog APIs across Austin, Dallas, San Antonio, Houston, and TX state portals. Scores newly published or recently updated datasets on quality (row count, temporal column, geographic column, freshness). Files GitHub issues with suggested catalog entries + 4 sample questions for human curation.
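The quality heuristic can be sketched as a small scoring function. This is a minimal sketch: the field names (row_count, has_time_col, has_geo_col, updated_at, license) and the one-point-per-signal weighting are illustrative assumptions, not the scout's actual schema.

```python
from datetime import datetime, timedelta, timezone

def quality_score(meta: dict) -> int:
    """Score a candidate dataset on the four signals the scout checks.

    Field names and the equal per-signal weighting are assumptions.
    """
    score = 0
    if meta.get("row_count", 0) > 1000:  # large enough to be useful
        score += 1
    if meta.get("has_time_col") and meta.get("has_geo_col"):  # time + geo columns
        score += 1
    updated = meta.get("updated_at")
    if updated and datetime.now(timezone.utc) - updated < timedelta(days=30):
        score += 1  # fresh within 30 days
    if meta.get("license"):  # license clarity
        score += 1
    return score
```

A candidate whose score clears a configurable threshold moves on to human curation.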
What this agent actually does, step by step.
1. The GitHub Actions workflow .github/workflows/dataset-scout.yml fires on its cron schedule.
2. Hits /api/views.json on each portal with $where=updatedAt > last_seen.
3. For each candidate, probes /api/views/{id}.json for its schema.
4. Computes a quality score: row count > 1000, has time + geo columns, updated within 30 days, license clarity.
5. Opens a GitHub issue for each top-N candidate above the threshold, with metadata and a suggested catalog entry.
6. Updates data/scout/last_seen.json and commits as github-actions[bot].
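Steps 2 and 5 above can be sketched as follows. The URL shape simply mirrors the $where clause described in step 2, and catalog_url, shortlist, threshold, and top_n are illustrative names, not the scout's real API.

```python
from urllib.parse import quote

def catalog_url(portal: str, last_seen_iso: str) -> str:
    # Step 2 sketch: poll each portal's views endpoint for items
    # updated since the last run (query shape as described above).
    query = quote(f"updatedAt > '{last_seen_iso}'")
    return f"https://{portal}/api/views.json?$where={query}"

def shortlist(scored: list[tuple[str, int]], threshold: int, top_n: int) -> list[str]:
    # Step 5 sketch: keep (dataset_id, score) pairs at or above the
    # threshold, best score first, capped at top_n for issue filing.
    keepers = sorted((c for c in scored if c[1] >= threshold), key=lambda c: -c[1])
    return [dataset_id for dataset_id, _ in keepers[:top_n]]
```

In practice the scout would iterate catalog_url over the configured portals, score each probed candidate, and file one issue per shortlisted dataset.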
Inputs & outputs.
Inputs
- portals: string[] (configurable)
- since: datetime (last_seen)

Outputs
- GitHub issues with label scout-find
- data/scout/last_seen.json updates
Where this agent lives in the codebase.
- Python scout — catalog scan + scoring
- State file — data/scout/last_seen.json