What shipped
Built a focused, runnable Python CLI MVP for CivicBrief Radar, a Nashville-first civic agenda monitor. The pipeline:
- Loads
config/sources.jsonandconfig/watchlist.json. - Captures fixture or live source documents into
data/raw/<source>/. - Extracts text from HTML, plain text, and optionally PDFs.
- Splits documents into normalized agenda items.
- Matches items against watchlist keywords, regexes, entities, addresses, and urgency terms.
- Scores alerts with deterministic urgency heuristics.
- Writes operator-ready artifacts:
- alerts.json
- daily_digest.md
- source_index.json
- watchlist_matches.csv
The seeded demo covers Metro Council, Planning Commission, Board of Zoning Appeals, MNPS Board, WeGo Transit, and Franklin BOMA-style public bodies.
Architecture
- Python standard library first, so the demo runs without package installation.
- File-backed JSON/CSV/Markdown outputs to match the service-led pilot workflow.
- Configurable JSON registries instead of hard-coded public bodies or watchlists.
- Deterministic matching and scoring for inspectability.
- Optional live URL mode with
urllibplus HTML link discovery. - Optional PDF extraction through
pypdfwhen installed; missing PDF support does not break the run. - Dataclasses for transparent internal models instead of a heavier framework.
Trimmed scope
- No SaaS dashboard.
- No automatic email or Slack delivery.
- No LLM summaries in this pass.
- No video transcription.
- No database; all state is local filesystem output.
- No attempt to perfectly support every agenda portal style.
- Live sources are supported generically, but the verified path is seeded local data.
Limitations
- Agenda splitting is heuristic and will need source-specific adapters for difficult portal/PDF formats.
- PDF page-level extraction depends on optional
pypdfavailability and source PDF quality. - Matching is keyword/regex based and may miss semantic matches that do not share configured language.
- Scoring is deterministic but not calibrated against historical meeting outcomes.
- Live crawling is conservative and limited to a small number of candidate documents per source.
- No deduplicated longitudinal state across multiple days beyond content hashes in the current run outputs.
Suggested next steps
- Replace demo fixture URLs with verified official Nashville source URLs and run in
--fetch-livemode. - Add source-specific parsers for Metro Council, Planning Commission, BZA, MNPS, and WeGo portals.
- Add a small review UI or static HTML report for the daily digest.
- Add persisted dedupe/history with SQLite once daily operation begins.
- Add optional OpenAI summarization behind a flag after deterministic outputs are stable.
- Add manual QA fields for service-led delivery, such as analyst notes and customer-ready status.
- Build a weekly rollup generator using the same alert JSON outputs.