Dataset Scout

Grows the corpus

Scans the Socrata catalog APIs of the Austin, Dallas, San Antonio, Houston, and Texas state portals. Scores newly published or recently updated datasets on quality (row count, temporal column, geographic column, freshness). Files GitHub issues containing a suggested catalog entry and 4 sample questions for human curation.

Cron: 0 */6 * * * (every 6 hours)

Status: scheduled
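To make the schedule concrete, the sketch below expands the minute and hour fields of the cron expression into the times of day at which the workflow would fire. GitHub Actions evaluates scheduled workflows in UTC. The helper `expand_cron_field` is hypothetical and only handles the three field forms needed here (`*`, `*/n`, and a single integer); it is not a full cron parser.

```python
def expand_cron_field(field: str, lo: int, hi: int) -> list[int]:
    """Expand one cron field. Supports only '*', '*/n', or a single integer."""
    if field == "*":
        return list(range(lo, hi + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(lo, hi + 1, step))
    return [int(field)]


# Only the minute and hour fields matter for a daily timetable.
minute, hour, *_ = "0 */6 * * *".split()

fire_times = [
    f"{h:02d}:{m:02d}"
    for h in expand_cron_field(hour, 0, 23)
    for m in expand_cron_field(minute, 0, 59)
]
print(fire_times)  # ['00:00', '06:00', '12:00', '18:00']
```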

What this agent actually does, step by step.

  1. The GitHub Actions workflow .github/workflows/dataset-scout.yml fires on cron.

  2. Hits /api/views.json on each portal with $where=updatedAt > last_seen.

  3. For each candidate, probes /api/views/{id}.json for the schema.

  4. Computes a quality score from: row count > 1000, presence of time and geo columns, updated within the last 30 days, and license clarity.

  5. For the top-N candidates above the threshold, opens a GitHub issue with metadata and a suggested catalog entry.

  6. Updates data/scout/last_seen.json and commits via github-actions[bot].
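Steps 4 and 5 can be sketched as follows. This is a minimal illustration, not the agent's actual scoring code: the `Candidate` fields, the equal weighting of the five checks, and the default `threshold` and `n` values are all assumptions. Only the checks themselves (row count, time and geo columns, 30-day freshness, license) come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Candidate:
    # Hypothetical shape; the real schema probe may expose different fields.
    dataset_id: str
    row_count: int
    has_time_col: bool
    has_geo_col: bool
    updated_at: datetime
    license: Optional[str]


def quality_score(c: Candidate, now: datetime) -> int:
    """Count how many of the five checks pass (0 to 5). Equal weights are assumed."""
    checks = [
        c.row_count > 1000,
        c.has_time_col,
        c.has_geo_col,
        now - c.updated_at < timedelta(days=30),
        bool(c.license),  # "license clarity" reduced to "a license is stated"
    ]
    return sum(checks)


def top_candidates(cands, now, threshold=3, n=3):
    """Return up to n candidates scoring strictly above threshold, best first."""
    scored = [(quality_score(c, now), c) for c in cands]
    keep = [(s, c) for s, c in scored if s > threshold]
    keep.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in keep[:n]]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
fresh = Candidate("aaaa-1111", 5000, True, True, now - timedelta(days=3), "Public Domain")
stale = Candidate("bbbb-2222", 200, True, False, now - timedelta(days=90), None)

print(quality_score(fresh, now))  # 5
print(quality_score(stale, now))  # 1
print([c.dataset_id for c in top_candidates([stale, fresh], now)])  # ['aaaa-1111']
```

Counting passed checks keeps the score easy to explain in the filed issue; a weighted score would work the same way with a different `sum`.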

Inputs & outputs.

Inputs

  • portals: string[] (configurable)
  • since: datetime (last_seen)
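As a rough sketch of how the two inputs might combine into one catalog request per portal: the portal domains below are placeholders, the function name `catalog_url` is hypothetical, and encoding `since` as epoch seconds is an assumption, since the format of last_seen is not specified here. The `/api/views.json` path and the `$where=updatedAt > last_seen` filter are taken from the step list.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode, urlsplit


def catalog_url(portal: str, since: datetime) -> str:
    """Build the catalog query URL for one portal (epoch seconds assumed)."""
    query = urlencode({"$where": f"updatedAt > {int(since.timestamp())}"})
    return f"https://{portal}/api/views.json?{query}"


# Placeholder domains, not the real portal hosts.
portals = ["data.example-city.gov", "data.example-state.gov"]
since = datetime(2024, 6, 1, tzinfo=timezone.utc)

urls = [catalog_url(p, since) for p in portals]
for u in urls:
    print(u)

# Round-trip check: the decoded query carries the filter unchanged.
print(parse_qs(urlsplit(urls[0]).query))
```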

Outputs

  • GitHub issues with label scout-find
  • data/scout/last_seen.json updates
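The two outputs could look roughly like this. The sketch builds a request body for GitHub's "create an issue" REST endpoint (`POST /repos/{owner}/{repo}/issues`, which accepts `title`, `body`, and `labels`) and updates a last_seen file. The title prefix, the issue body layout, and the portal-to-timestamp layout of last_seen.json are assumptions; the real file format is not documented here. The example writes into a temporary directory rather than the repository.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def issue_payload(dataset_id: str, name: str, portal: str, score: int) -> dict:
    """Request body for creating an issue; the body layout is illustrative."""
    body = "\n".join([
        f"Portal: {portal}",
        f"Dataset: {dataset_id}",
        f"Quality score: {score}",
    ])
    return {"title": f"[scout] {name}", "body": body, "labels": ["scout-find"]}


def update_last_seen(path: Path, portal: str, seen_at: datetime) -> dict:
    """Record the latest timestamp per portal (assumed file layout)."""
    state = json.loads(path.read_text()) if path.exists() else {}
    state[portal] = seen_at.isoformat()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2, sort_keys=True) + "\n")
    return state


payload = issue_payload("aaaa-1111", "Example Permits", "data.example-city.gov", 5)
print(payload["labels"])  # ['scout-find']

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data" / "scout" / "last_seen.json"
    state = update_last_seen(
        path, "data.example-city.gov", datetime(2024, 6, 1, tzinfo=timezone.utc)
    )
    saved = json.loads(path.read_text())

print(saved)  # {'data.example-city.gov': '2024-06-01T00:00:00+00:00'}
```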

Where this agent lives in the codebase.

Last 0 runs that touched this agent.

No recent runs touched Dataset Scout yet.