fix(claude_session): surface real CLI error + sanitize nested env (#85)

write_interim_draft failed for all blocks from the CEO MCP instance with
"Claude CLI failed (exit 1): unknown error". Two fixes:

1. Error surfacing (the certain win): on non-zero exit, capture and log
   both stderr AND stdout (the CLI sometimes writes its diagnostic to
   stdout or nowhere), so the next occurrence is diagnosable instead of
   collapsing to "unknown error". This is why #85 was unsolved — the real
   error was swallowed (engineering rule §6: no silent swallow).

2. Defensive hardening: strip Claude Code session markers (CLAUDECODE,
   CLAUDE_CODE_*, CLAUDE_AGENT_*, AI_AGENT, CLAUDE_EFFORT) from the env of
   nested `claude -p` calls and run them from $HOME, decoupling them from
   the parent agent's session/project state. Aligns query() with the
   existing query_streaming() path (which already sets cwd=HOME). Auth/
   config vars are preserved.

Note: the original adapter-context failure could not be reproduced in a
plain interactive session (nested claude -p succeeds there in both old and
new code), so the env markers are a suspect, not a proven cause. The real
value is the diagnostics. Verified: nested query() returns PONG from
inside a CLAUDECODE=1 session; unit tests cover env sanitization.

Invariants: G1 (normalize at source — fix the spawn, not readers),
G2 (no parallel path — same query()), §6 (no silent error swallow).
INV: feedback_claude_session_local_only preserved (all calls stay local).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-06 19:29:36 +00:00
parent 37c00bac13
commit 8ec24cf822
2 changed files with 87 additions and 2 deletions

View File

@@ -29,6 +29,7 @@ from __future__ import annotations
import asyncio
import json
import logging
import os
from legal_mcp.config import parse_llm_json
@@ -40,6 +41,32 @@ logger = logging.getLogger(__name__)
DEFAULT_TIMEOUT = 1800
LONG_TIMEOUT = 3600 # opus block writing on full case context
# Environment markers a running Claude Code session exports into its child
# processes. When an agent itself runs as ``claude -p`` (e.g. the CEO MCP
# instance launched by Paperclip's claude_local adapter), its MCP server
# inherits these, and a *nested* ``claude -p`` we spawn here inherits them
# too. Stripping them makes every nested invocation launch as a clean
# top-level session, decoupled from the parent's session/project state —
# defensive hardening against the #85 nested-``exit 1`` failure (which could
# not be reproduced in a plain interactive session, so the markers are a
# suspect, not a proven cause). Auth/config vars (CLAUDE_CONFIG_DIR,
# ANTHROPIC_*, PATH, HOME) are kept. See TaskMaster legal-ai #85.
_SESSION_MARKER_PREFIXES = ("CLAUDECODE", "CLAUDE_CODE_", "CLAUDE_AGENT_")
_SESSION_MARKER_EXACT = frozenset({"AI_AGENT", "CLAUDE_EFFORT"})
def _clean_subprocess_env() -> dict[str, str]:
"""Copy the current env minus Claude Code session markers.
Lets a nested ``claude -p`` start fresh instead of detecting it is
already inside a Claude Code session (the #85 nested-exit-1 bug).
"""
env = dict(os.environ)
for key in list(env):
if key in _SESSION_MARKER_EXACT or key.startswith(_SESSION_MARKER_PREFIXES):
del env[key]
return env
async def query(
prompt: str,
@@ -100,6 +127,8 @@ async def query(
stdin=asyncio.subprocess.PIPE,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
env=_clean_subprocess_env(),
cwd=os.path.expanduser("~"),
)
except FileNotFoundError:
raise RuntimeError(
@@ -125,9 +154,20 @@ async def query(
raise RuntimeError(f"Claude CLI timed out after {timeout}s")
if proc.returncode != 0:
stderr = stderr_b.decode("utf-8", errors="replace").strip()[:500] or "unknown error"
# Surface the real cause: the CLI sometimes writes its diagnostic to
# stdout (or nowhere) rather than stderr, so a stderr-only message
# collapsed to "unknown error" and hid the actual failure (#85).
stderr = stderr_b.decode("utf-8", errors="replace").strip()
stdout = stdout_b.decode("utf-8", errors="replace").strip()
diagnostic = stderr or stdout or "no output on stderr/stdout"
size_info = f"; prompt_len={len(full_prompt):,} chars" if len(full_prompt) > 100_000 else ""
raise RuntimeError(f"Claude CLI failed (exit {proc.returncode}): {stderr}{size_info}")
logger.error(
"claude -p failed (exit %s): stderr=%r stdout=%r",
proc.returncode, stderr[:500], stdout[:500],
)
raise RuntimeError(
f"Claude CLI failed (exit {proc.returncode}): {diagnostic[:500]}{size_info}"
)
stdout = stdout_b.decode("utf-8", errors="replace").strip()
if not stdout:
@@ -232,6 +272,7 @@ async def query_streaming(
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
cwd=cwd,
env=_clean_subprocess_env(),
)
except FileNotFoundError:
yield {