Dataset Scout
Grows the corpus
Scans Socrata catalog APIs across Austin, Dallas, San Antonio, Houston, and TX state portals. Scores newly published or recently updated datasets on quality (row count, temporal column, geographic column, freshness). Files GitHub issues with suggested catalog entries + 4 sample questions for human curation.
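The quality heuristic can be sketched as a small scoring function. This is a minimal sketch: the field names (row_count, has_time_col, has_geo_col, updated_at, license) and the one-point-per-signal weighting are illustrative assumptions, not the scout's actual schema.

```python
from datetime import datetime, timedelta, timezone

def quality_score(meta: dict) -> int:
    """Score a candidate dataset on the four signals the scout checks.

    Field names and the equal per-signal weighting are assumptions.
    """
    score = 0
    if meta.get("row_count", 0) > 1000:  # large enough to be useful
        score += 1
    if meta.get("has_time_col") and meta.get("has_geo_col"):  # time + geo columns
        score += 1
    updated = meta.get("updated_at")
    if updated and datetime.now(timezone.utc) - updated < timedelta(days=30):
        score += 1  # fresh within 30 days
    if meta.get("license"):  # license clarity
        score += 1
    return score
```

A candidate whose score clears a configurable threshold moves on to human curation.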
What this agent actually does, step by step.
1. The GitHub Actions workflow .github/workflows/dataset-scout.yml fires on its cron schedule.
2. Hits /api/views.json on each portal with $where=updatedAt > last_seen.
3. For each candidate, probes /api/views/{id}.json for its schema.
4. Computes a quality score: row count > 1000, has time + geo columns, updated within 30 days, license clarity.
5. Opens a GitHub issue for each top-N candidate above the threshold, with metadata and a suggested catalog entry.
6. Updates data/scout/last_seen.json and commits as github-actions[bot].
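Steps 2 and 5 above can be sketched as follows. The URL shape simply mirrors the $where clause described in step 2, and catalog_url, shortlist, threshold, and top_n are illustrative names, not the scout's real API.

```python
from urllib.parse import quote

def catalog_url(portal: str, last_seen_iso: str) -> str:
    # Step 2 sketch: poll each portal's views endpoint for items
    # updated since the last run (query shape as described above).
    query = quote(f"updatedAt > '{last_seen_iso}'")
    return f"https://{portal}/api/views.json?$where={query}"

def shortlist(scored: list[tuple[str, int]], threshold: int, top_n: int) -> list[str]:
    # Step 5 sketch: keep (dataset_id, score) pairs at or above the
    # threshold, best score first, capped at top_n for issue filing.
    keepers = sorted((c for c in scored if c[1] >= threshold), key=lambda c: -c[1])
    return [dataset_id for dataset_id, _ in keepers[:top_n]]
```

In practice the scout would iterate catalog_url over the configured portals, score each probed candidate, and file one issue per shortlisted dataset.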
Inputs & outputs.
Inputs
- portals: string[] (configurable)
- since: datetime (last_seen)

Outputs
- GitHub issues with label scout-find
- data/scout/last_seen.json updates
Where this agent lives in the codebase.
- Python scout — catalog scan + scoring
- State file — data/scout/last_seen.json