docs/superpowers/plans/2026-04-03-l3-runtime-hardening-plan.md

# 2026-04-03 L3 Runtime Hardening Plan

## Goal
先把 Jarvis 的 L3 主链夯实，只处理 runtime / graph / tools / service integration / tests / docs 的一致性问题；暂不继续扩 unrelated feature domain。

## Scope
- `backend/app/agents/graph.py`
- `backend/app/agents/state.py`
- `backend/app/agents/tools/__init__.py`
- `backend/app/agents/tools/search.py`
- `backend/app/agents/tools/schedule.py`
- `backend/app/agents/tools/task.py`
- `backend/app/services/agent_service.py`
- `backend/app/services/document_service.py`
- `backend/app/services/memory_service.py`
- `backend/tests/backend/app/agents/test_graph*.py`
- `backend/tests/backend/app/services/test_brain_ingestion.py`
- related design/plan docs under `docs/superpowers/`

## Non-goals
- 不在本轮新增前端页面
- 不在 L3 未稳定前继续扩 accounting / weather / RSS 等运行时域
- 不重做 graph 架构，只做收敛、对齐和补测试

## Current High-Priority Gaps
1. **continuity / clarification schema drift**
   - graph runtime 已使用 `owning_agent` / `owning_sub_commander` / `target_action`
   - brain ingestion tests 仍大量使用旧快照字段：`active_sub_flow` / `awaiting_user_input` 等
2. **tool execution drift**
   - `search.py` 的 `_run_async()` 在 running loop 下实现不一致
   - schedule/task canonicalization 仍存在参数映射漂移
3. **service integration drift**
   - `agent_service` 已派生 role-scoped memory sections，但 continuity snapshot / graph runtime / persisted attachments 需要继续收口
4. **docs drift**
   - 现有文档已记录 L3 merge progress，但缺少一份当天可执行的 hardening 追踪文档

## Workstreams

### Workstream A — Continuity Contract
Owner: worker-1

Target:
- 对齐 clarification / continuity canonical schema
- 让 graph runtime 与 persisted snapshot 使用同一套契约，或显式兼容旧字段
- 补针对性测试

Done when:
- graph 与 ingestion tests 对 clarification/continuity 断言一致
- stale continuity / resume-after-clarification 场景有回归覆盖
- 文档明确列出 canonical 字段和兼容规则

### Workstream B — Tool Execution Path
Owner: worker-2

Target:
- 修复 search async bridge
- 对齐 task / schedule canonicalization
- 固定当前 L3 scope 下真实支持的 tool/fallback 规则

Current status:
- 已统一 `search.py` / `schedule.py` / `task.py` 到共享 `app.agents.tools.async_bridge.run_async`，避免 running loop 下的同步桥接漂移。
- 已收敛 graph canonicalization：`create_todo` 保留 date/todo_date 语义；仅在出现 timed task 信号时提升为 `create_schedule_task`；`create_goal` 统一落到 `goal_date`；`create_reminder` clarification 前会先标准化 `date`。
- 已补 targeted regressions，覆盖 active event loop search path、timed todo promotion、reminder clarification date normalization。

Done when:
- 相关工具测试通过
- graph canonicalization 行为清晰且无死分支
- 文档明确说明支持的 tool path 与 deferred domains

### Workstream C — Service Integration
Owner: worker-3

Target:
- 对齐 graph runtime 与 `agent_service` 入口语义
- 收敛 continuity snapshot、role-scoped context、stream/sync 行为
- 补接入层测试或针对性断言

Done when:
- `agent_service` 与 graph 状态注入规则一致
- continuity snapshot load/persist 行为有测试证据
- 文档明确 graph/service 边界和责任

## Runtime Contract Notes
### Clarification context
Canonical target shape:
- `owning_agent`
- `owning_sub_commander`
- `target_action`
- `question`
- `partial_args`
- `missing_fields`
- `status`

### Continuity state
Current known active markers:
- `status: fresh|stale`
- `mode: resume_after_clarification` for clarification continuation
- routing continuation should only survive when the new request is still semantically a continuation

### Tool strategy
Current target contract:
- native tools and JSON fallback should converge on the same normalized tool name + normalized args before execution
- system messages should remain coalesced into one system message for OpenAI-compatible providers that reject multiple system messages
- sync tool shims in current L3 scope must route through shared `async_bridge.run_async` instead of per-file event-loop wrappers

### Current L3 tool path rules
- `librarian_retrieval` current allowlist: `search_knowledge`, `hybrid_search`, `web_search`, `get_knowledge_graph_context`
- search-family sync wrappers must be safe under an already-running event loop
- `create_todo` keeps day-level intent on `todo_date`; do not silently remap date-only todo requests to task due dates
- `create_todo` upgrades to `create_schedule_task` only for timed/task-shaped payloads such as `due_time`, `due_datetime`, `start_time`, `end_time`
- `create_goal` date aliases normalize to `goal_date`
- `create_reminder` aliases normalize before clarification so resumed flows keep canonical partial args

### Explicitly deferred domains in this hardening pass
- accounting runtime expansion
- weather runtime expansion
- RSS runtime expansion
- any new tool domains outside current schedule / task / forum / knowledge L3 path

## Documentation Rule For This Hardening Pass
每完成一个 workstream：
1. 更新本文件的 status
2. 在相关 spec/notes 中补一段“当前状态 / 已决策 / 已知边界”
3. 再标记任务完成

## Status
- [x] Hardening tracker created
- [x] Workstream A complete
- [x] Workstream B complete
- [x] Workstream C complete
- [x] Final verification pass complete

## Verification Checklist
- [x] `test_graph_system_messages.py` → 8 passed
- [x] `test_tool_async_bridge.py` + `test_task_tools.py` → 18 passed
- [x] `test_brain_ingestion.py` full file → 40 passed
- [x] targeted continuity persistence/rehydration checks → 3 passed
- [x] targeted graph regressions for timed todo / reminder clarification / active event loop paths
- [ ] broader graph suite beyond this L3 slice

## Final Notes
- L3 continuity persistence now uses one canonical envelope and normalizes legacy snapshot shapes on rehydration.
- Service/runtime integration is aligned on the canonical continuity schema rather than legacy raw snapshot persistence.
- Tool sync shims now share one async bridge across search / schedule / task / forum paths.
- Final verification was executed with `uv run pytest` from `backend/`, which bypassed the broken plain `python` launcher in this environment.
- A reviewer flagged async bridge timeout/cancellation semantics as a follow-up reliability concern for mutating tools, but it is not blocking this L3 hardening pass.

## Next Action
- Treat this L3 hardening slice as complete.
- If continuing, the next best follow-up is either broader graph regression coverage or a dedicated fix for async bridge timeout/cancellation semantics.
-												fix: harden L3 runtime continuity and tool execution

Align the L3 graph, agent service, and sync tool shims on one canonical continuity contract so clarification resumes and persisted snapshots behave consistently. Add targeted regressions and hardening notes covering system-message coalescing, async bridge usage, and continuity rehydration.

											
										
										
											2026-04-03 13:14:59 +08:00
+								# 2026-04-03 L3 Runtime Hardening Plan
 								## Goal
 								先把 Jarvis 的 L3 主链夯实，只处理 runtime / graph / tools / service integration / tests / docs 的一致性问题；暂不继续扩 unrelated feature domain。
 								## Scope
 								- `backend/app/agents/graph.py`
 								- `backend/app/agents/state.py`
 								- `backend/app/agents/tools/__init__.py`
 								- `backend/app/agents/tools/search.py`
 								- `backend/app/agents/tools/schedule.py`
 								- `backend/app/agents/tools/task.py`
 								- `backend/app/services/agent_service.py`
 								- `backend/app/services/document_service.py`
 								- `backend/app/services/memory_service.py`
 								- `backend/tests/backend/app/agents/test_graph*.py`
 								- `backend/tests/backend/app/services/test_brain_ingestion.py`
 								- related design/plan docs under `docs/superpowers/`
 								## Non-goals
 								- 不在本轮新增前端页面
 								- 不在 L3 未稳定前继续扩 accounting / weather / RSS 等运行时域
 								- 不重做 graph 架构，只做收敛、对齐和补测试
 								## Current High-Priority Gaps
 . **continuity / clarification schema drift**
 								   - graph runtime 已使用 `owning_agent` / `owning_sub_commander` / `target_action`
 								   - brain ingestion tests 仍大量使用旧快照字段：`active_sub_flow` / `awaiting_user_input` 等
 . **tool execution drift**
 								   - `search.py` 的 `_run_async()` 在 running loop 下实现不一致
 								   - schedule/task canonicalization 仍存在参数映射漂移
 . **service integration drift**
 								   - `agent_service` 已派生 role-scoped memory sections，但 continuity snapshot / graph runtime / persisted attachments 需要继续收口
 . **docs drift**
 								   - 现有文档已记录 L3 merge progress，但缺少一份当天可执行的 hardening 追踪文档
 								## Workstreams
 								### Workstream A — Continuity Contract
 								Owner: worker-1
 								Target:
 								- 对齐 clarification / continuity canonical schema
 								- 让 graph runtime 与 persisted snapshot 使用同一套契约，或显式兼容旧字段
 								- 补针对性测试
 								Done when:
 								- graph 与 ingestion tests 对 clarification/continuity 断言一致
 								- stale continuity / resume-after-clarification 场景有回归覆盖
 								- 文档明确列出 canonical 字段和兼容规则
 								### Workstream B — Tool Execution Path
 								Owner: worker-2
 								Target:
 								- 修复 search async bridge
 								- 对齐 task / schedule canonicalization
 								- 固定当前 L3 scope 下真实支持的 tool/fallback 规则
 								Current status:
 								- 已统一 `search.py` / `schedule.py` / `task.py` 到共享 `app.agents.tools.async_bridge.run_async`，避免 running loop 下的同步桥接漂移。
 								- 已收敛 graph canonicalization：`create_todo` 保留 date/todo_date 语义；仅在出现 timed task 信号时提升为 `create_schedule_task`；`create_goal` 统一落到 `goal_date`；`create_reminder` clarification 前会先标准化 `date`。
 								- 已补 targeted regressions，覆盖 active event loop search path、timed todo promotion、reminder clarification date normalization。
 								Done when:
 								- 相关工具测试通过
 								- graph canonicalization 行为清晰且无死分支
 								- 文档明确说明支持的 tool path 与 deferred domains
 								### Workstream C — Service Integration
 								Owner: worker-3
 								Target:
 								- 对齐 graph runtime 与 `agent_service` 入口语义
 								- 收敛 continuity snapshot、role-scoped context、stream/sync 行为
 								- 补接入层测试或针对性断言
 								Done when:
 								- `agent_service` 与 graph 状态注入规则一致
 								- continuity snapshot load/persist 行为有测试证据
 								- 文档明确 graph/service 边界和责任
 								## Runtime Contract Notes
 								### Clarification context
 								Canonical target shape:
 								- `owning_agent`
 								- `owning_sub_commander`
 								- `target_action`
 								- `question`
 								- `partial_args`
 								- `missing_fields`
 								- `status`
 								### Continuity state
 								Current known active markers:
 								- `status: fresh|stale`
 								- `mode: resume_after_clarification` for clarification continuation
 								- routing continuation should only survive when the new request is still semantically a continuation
 								### Tool strategy
 								Current target contract:
 								- native tools and JSON fallback should converge on the same normalized tool name + normalized args before execution
 								- system messages should remain coalesced into one system message for OpenAI-compatible providers that reject multiple system messages
 								- sync tool shims in current L3 scope must route through shared `async_bridge.run_async` instead of per-file event-loop wrappers
 								### Current L3 tool path rules
 								- `librarian_retrieval` current allowlist: `search_knowledge`, `hybrid_search`, `web_search`, `get_knowledge_graph_context`
 								- search-family sync wrappers must be safe under an already-running event loop
 								- `create_todo` keeps day-level intent on `todo_date`; do not silently remap date-only todo requests to task due dates
 								- `create_todo` upgrades to `create_schedule_task` only for timed/task-shaped payloads such as `due_time`, `due_datetime`, `start_time`, `end_time`
 								- `create_goal` date aliases normalize to `goal_date`
 								- `create_reminder` aliases normalize before clarification so resumed flows keep canonical partial args
 								### Explicitly deferred domains in this hardening pass
 								- accounting runtime expansion
 								- weather runtime expansion
 								- RSS runtime expansion
 								- any new tool domains outside current schedule / task / forum / knowledge L3 path
 								## Documentation Rule For This Hardening Pass
 								每完成一个 workstream：
 . 更新本文件的 status
 . 在相关 spec/notes 中补一段“当前状态 / 已决策 / 已知边界”
 . 再标记任务完成
 								## Status
 								- [x] Hardening tracker created
 								- [x] Workstream A complete
 								- [x] Workstream B complete
 								- [x] Workstream C complete
 								- [x] Final verification pass complete
 								## Verification Checklist
 								- [x] `test_graph_system_messages.py` → 8 passed
 								- [x] `test_tool_async_bridge.py` + `test_task_tools.py` → 18 passed
 								- [x] `test_brain_ingestion.py` full file → 40 passed
 								- [x] targeted continuity persistence/rehydration checks → 3 passed
 								- [x] targeted graph regressions for timed todo / reminder clarification / active event loop paths
 								- [ ] broader graph suite beyond this L3 slice
 								## Final Notes
 								- L3 continuity persistence now uses one canonical envelope and normalizes legacy snapshot shapes on rehydration.
 								- Service/runtime integration is aligned on the canonical continuity schema rather than legacy raw snapshot persistence.
 								- Tool sync shims now share one async bridge across search / schedule / task / forum paths.
 								- Final verification was executed with `uv run pytest` from `backend/`, which bypassed the broken plain `python` launcher in this environment.
 								- A reviewer flagged async bridge timeout/cancellation semantics as a follow-up reliability concern for mutating tools, but it is not blocking this L3 hardening pass.
 								## Next Action
 								- Treat this L3 hardening slice as complete.
 								- If continuing, the next best follow-up is either broader graph regression coverage or a dedicated fix for async bridge timeout/cancellation semantics.