# 2026-04-03 L3 Runtime Hardening Plan

## Goal

Solidify Jarvis's L3 main chain first. This pass addresses only consistency issues across runtime / graph / tools / service integration / tests / docs; it does not expand unrelated feature domains for now.

## Scope

- `backend/app/agents/graph.py`
- `backend/app/agents/state.py`
- `backend/app/agents/tools/__init__.py`
- `backend/app/agents/tools/search.py`
- `backend/app/agents/tools/schedule.py`
- `backend/app/agents/tools/task.py`
- `backend/app/services/agent_service.py`
- `backend/app/services/document_service.py`
- `backend/app/services/memory_service.py`
- `backend/tests/backend/app/agents/test_graph*.py`
- `backend/tests/backend/app/services/test_brain_ingestion.py`
- related design/plan docs under `docs/superpowers/`

## Non-goals

- No new frontend pages in this round
- No further expansion of the accounting / weather / RSS runtime domains until L3 is stable
- No rework of the graph architecture; only convergence, alignment, and additional tests

## Current High-Priority Gaps

1. **continuity / clarification schema drift**
   - graph runtime already uses `owning_agent` / `owning_sub_commander` / `target_action`
   - brain ingestion tests still rely heavily on legacy snapshot fields such as `active_sub_flow` / `awaiting_user_input`
2. **tool execution drift**
   - `_run_async()` in `search.py` behaves inconsistently under a running loop
   - schedule/task canonicalization still has parameter-mapping drift
3. **service integration drift**
   - `agent_service` already derives role-scoped memory sections, but continuity snapshot / graph runtime / persisted attachments still need to be converged
4. **docs drift**
   - existing docs record L3 merge progress, but there is no same-day, actionable hardening tracker

## Workstreams

### Workstream A: Continuity Contract

Owner: worker-1

Target:
- Align the canonical clarification / continuity schema
- Make graph runtime and the persisted snapshot share one contract, or explicitly support the legacy fields
- Add targeted tests

Done when:
- graph and ingestion tests make consistent assertions about clarification/continuity
- stale continuity / resume-after-clarification scenarios have regression coverage
- Docs explicitly list the canonical fields and the compatibility rules

### Workstream B: Tool Execution Path

Owner: worker-2

Target:
- Fix the search async bridge
- Align task / schedule canonicalization
- Pin down the tool/fallback rules actually supported in the current L3 scope

Current status:
- Unified `search.py` / `schedule.py` / `task.py` on the shared `app.agents.tools.async_bridge.run_async`, avoiding sync-bridge drift under a running loop.
- Converged graph canonicalization: `create_todo` keeps date/todo_date semantics and is promoted to `create_schedule_task` only when a timed task signal is present; `create_goal` consistently lands on `goal_date`; `create_reminder` normalizes `date` before clarification.
- Added targeted regressions covering the active event loop search path, timed todo promotion, and reminder clarification date normalization.

Done when:
- Related tool tests pass
- graph canonicalization behavior is clear and has no dead branches
- Docs explicitly describe the supported tool paths and the deferred domains

### Workstream C: Service Integration

Owner: worker-3

Target:
- Align the entry-point semantics of graph runtime and `agent_service`
- Converge continuity snapshot, role-scoped context, and stream/sync behavior
- Add integration-layer tests or targeted assertions

Done when:
- `agent_service` and graph state injection rules are consistent
- continuity snapshot load/persist behavior is backed by test evidence
- Docs explicitly state the graph/service boundary and responsibilities

## Runtime Contract Notes

### Clarification context

Canonical target shape:
- `owning_agent`
- `owning_sub_commander`
- `target_action`
- `question`
- `partial_args`
- `missing_fields`
- `status`

### Continuity state

Current known active markers:
- `status: fresh|stale`
- `mode: resume_after_clarification` for clarification continuation
- routing continuation should only survive when the new request is still semantically a continuation

### Tool strategy

Current target contract:
- native tools and JSON fallback should converge on the same normalized tool name + normalized args before execution
- system messages should remain coalesced into one system message for OpenAI-compatible providers that reject multiple system messages
- sync tool shims in current L3 scope must route through shared `async_bridge.run_async` instead of per-file event-loop wrappers

### Current L3 tool path rules

- `librarian_retrieval` current allowlist: `search_knowledge`, `hybrid_search`, `web_search`, `get_knowledge_graph_context`
- search-family sync wrappers must be safe under an already-running event loop
- `create_todo` keeps day-level intent on `todo_date`; do not silently remap date-only todo requests to task due dates
- `create_todo` upgrades to `create_schedule_task` only for timed/task-shaped payloads such as `due_time`, `due_datetime`, `start_time`, `end_time`
- `create_goal` date aliases normalize to `goal_date`
- `create_reminder` aliases normalize before clarification so resumed flows keep canonical partial args

### Explicitly deferred domains in this hardening pass

- accounting runtime expansion
- weather runtime expansion
- RSS runtime expansion
- any new tool domains outside current schedule / task / forum / knowledge L3 path

## Documentation Rule For This Hardening Pass

After completing each workstream:

1. Update the status in this file
2. Add a "current state / decisions made / known boundaries" paragraph to the relevant spec/notes
3. Then mark the task complete

## Status

- [x] Hardening tracker created
- [x] Workstream A complete
- [x] Workstream B complete
- [x] Workstream C complete
- [x] Final verification pass complete

## Verification Checklist

- [x] `test_graph_system_messages.py` → 8 passed
- [x] `test_tool_async_bridge.py` + `test_task_tools.py` → 18 passed
- [x] `test_brain_ingestion.py` full file → 40 passed
- [x] targeted continuity persistence/rehydration checks → 3 passed
- [x] targeted graph regressions for timed todo / reminder clarification / active event loop paths
- [ ] broader graph suite beyond this L3 slice

## Final Notes

- L3 continuity persistence now uses one canonical envelope and normalizes legacy snapshot shapes on rehydration.
- Service/runtime integration is aligned on the canonical continuity schema rather than legacy raw snapshot persistence.
- Tool sync shims now share one async bridge across search / schedule / task / forum paths.
- Final verification was executed with `uv run pytest` from `backend/`, which bypassed the broken plain `python` launcher in this environment.
- A reviewer flagged async bridge timeout/cancellation semantics as a follow-up reliability concern for mutating tools, but it is not blocking this L3 hardening pass.

## Next Action

- Treat this L3 hardening slice as complete.
- If continuing, the next best follow-up is either broader graph regression coverage or a dedicated fix for async bridge timeout/cancellation semantics.
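
If the broader graph regression coverage is picked up, the `create_todo` and `create_goal` rules from "Current L3 tool path rules" could be pinned down with a small table-driven check. The sketch below is illustrative only and is not the code in `graph.py`: the `canonicalize` helper, the `date` to `todo_date` mapping, and the `GOAL_DATE_ALIASES` list are assumptions made for the example. Only the four timed keys and the target field names come from this document.

```python
from typing import Any, Dict, Tuple

# Timed/task-shaped signals named in the tool path rules.
TIMED_KEYS = ("due_time", "due_datetime", "start_time", "end_time")

# ASSUMPTION: illustrative alias list; the real alias set is not given here.
GOAL_DATE_ALIASES = ("date", "due_date", "target_date")


def canonicalize(name: str, args: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: return a normalized (tool name, args) pair."""
    out = dict(args)  # copy so the caller's payload is never mutated

    if name == "create_todo":
        # Promote only when a timed signal is present. Args are left
        # untouched here because the task-side mapping is not specified.
        if any(out.get(key) for key in TIMED_KEYS):
            return "create_schedule_task", out
        # Date-only request: keep day-level intent on todo_date (assumed mapping).
        if "date" in out and "todo_date" not in out:
            out["todo_date"] = out.pop("date")
        return name, out

    if name == "create_goal":
        # Fold every known alias into goal_date; an explicit goal_date wins.
        for alias in GOAL_DATE_ALIASES:
            if alias in out:
                value = out.pop(alias)
                out.setdefault("goal_date", value)
        return name, out

    return name, out
```

A regression suite could then assert that a date-only todo stays a todo, that any timed key triggers promotion, and that goal aliases collapse to `goal_date`.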
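
For the async bridge follow-up, one possible shape for a sync-to-async bridge with an explicit timeout is sketched below. This is an assumption-laden illustration, not the actual `app.agents.tools.async_bridge.run_async`: the `timeout` parameter and the worker-thread strategy are hypothetical, and only standard-library `asyncio` and `concurrent.futures` calls are used.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, Optional, TypeVar

T = TypeVar("T")


def run_async(coro: Coroutine[Any, Any, T], timeout: Optional[float] = None) -> T:
    """Run `coro` to completion from synchronous code (hypothetical sketch)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(asyncio.wait_for(coro, timeout))

    # A loop is already running in this thread and asyncio.run would raise,
    # so drive the coroutine on a fresh loop in a worker thread instead.
    def _worker() -> T:
        return asyncio.run(asyncio.wait_for(coro, timeout))

    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        # Blocks the calling thread until the worker finishes or times out.
        return pool.submit(_worker).result()


async def echo(value: int) -> int:
    """Tiny example coroutine used to exercise the bridge."""
    await asyncio.sleep(0)
    return value
```

Two caveats apply to this sketch. The coroutine runs on a different loop than the caller's, so it must not depend on loop-bound objects created by the outer loop. And when the timeout fires, `asyncio.wait_for` cancels the coroutine, which for a mutating tool may leave a write partially applied; that is the kind of semantics the reviewer's concern would need to settle.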