feat(X13): scheduled drain — fully-autonomous digest→fetch→ingest loop

- scripts/drain_court_fetch.py: drives orchestrator.drain_pending (host-only;
  no-op when queue empty). Mirrors drain_halacha_queue.py.
- scripts/legal-court-fetch-drain.config.cjs: pm2 cron (hourly :17, one-shot),
  COURT_FETCH_DRAIN_CRON override.
- fix: orchestrator default service URL 127.0.0.1 → 10.0.1.1 (the service binds
  the docker0 gateway; the host can't reach it on loopback). Found live — the
  first drain failed "connection refused" until corrected.
- SCRIPTS.md entries.
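
A minimal sketch of the one-shot drain pattern described in the list above, assuming nothing about the real `orchestrator.drain_pending` signature (it is not shown in this commit). The names `run_drain_once`, `fetch_pending`, and `process` are hypothetical stand-ins for illustration only:

```python
from typing import Callable, Iterable


def run_drain_once(
    fetch_pending: Callable[[], Iterable[str]],
    process: Callable[[str], bool],
) -> int:
    """Process every pending job exactly once and return a shell-style exit code.

    Both callables are hypothetical stand-ins; the real orchestrator API
    is not shown in this commit.
    """
    jobs = list(fetch_pending())
    if not jobs:
        # Empty queue: the scheduled run is a no-op and exits cleanly.
        return 0
    # Count jobs whose processing reported failure; keep going after a failure
    # so one bad job does not block the rest of the queue.
    failures = sum(1 for job in jobs if not process(job))
    return 1 if failures else 0
```

Returning an exit code rather than looping keeps the script one-shot, so the scheduler (here, the pm2 cron entry) owns the cadence.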

Validated end-to-end in PRODUCTION on a real digest: עת"מ (administrative
petition) 43830-12-24 (החברה להגנת הטבע, the Society for the Protection of
Nature) fetched from נט המשפט (Net HaMishpat) → case_law (79 chunks,
source_url), digest relinked (INV-DIG3 closed), halacha queued
pending_review. job=done.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 20:31:53 +00:00
parent 540d39b958
commit f4f110f0d1
4 changed files with 90 additions and 4 deletions

@@ -41,11 +41,12 @@ logger = logging.getLogger(__name__)
 # human (INV-CF3). Kept low — the .gov site shouldn't be hammered (INV-CF4).
 MAX_AUTONOMOUS_ATTEMPTS = int(os.environ.get("COURT_FETCH_MAX_ATTEMPTS", "2"))
-# The host-side Tier-1 browser service (pm2). The MCP server runs on the host,
-# so it reaches the service over loopback directly (the container bridge in
-# web/court_fetch_proxy.py is a separate, optional entry point).
+# The host-side Tier-1 browser service (pm2). It binds the docker0 bridge
+# gateway (10.0.1.1) — same as legal-chat-service — so both the host MCP server
+# and containers can reach it; the host reaches 10.0.1.1 as a local interface.
+# Override with COURT_FETCH_SERVICE_URL.
 COURT_FETCH_SERVICE_URL = os.environ.get(
-    "COURT_FETCH_SERVICE_URL", "http://127.0.0.1:8771"
+    "COURT_FETCH_SERVICE_URL", "http://10.0.1.1:8771"
 )
 _SHARED_SECRET = os.environ.get("COURT_FETCH_SHARED_SECRET", "").strip()
 _TIER1_TIMEOUT_S = float(os.environ.get("COURT_FETCH_TIER1_TIMEOUT_S", "300"))
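
The "connection refused" failure mentioned in the commit message is the kind of thing a quick TCP probe can surface before a drain runs. The following is only a sketch under stated assumptions: `service_endpoint` and `is_reachable` are hypothetical helpers, not part of the orchestrator, and the only details taken from this commit are the `COURT_FETCH_SERVICE_URL` variable and its new default.

```python
import os
import socket
from urllib.parse import urlsplit

# Default taken from this commit; everything else here is illustrative.
DEFAULT_URL = "http://10.0.1.1:8771"


def service_endpoint(env=None):
    """Resolve (host, port) from COURT_FETCH_SERVICE_URL or the default."""
    env = os.environ if env is None else env
    parts = urlsplit(env.get("COURT_FETCH_SERVICE_URL", DEFAULT_URL))
    # Fall back to port 80 when the URL carries no explicit port.
    return parts.hostname, parts.port or 80


def is_reachable(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable networks.
        return False
```

A service bound only to the bridge gateway address does not listen on loopback, so `is_reachable("127.0.0.1", 8771)` would be expected to return False on such a host while the gateway address succeeds.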