fix: harden L3 runtime continuity and tool execution

Align the L3 graph, agent service, and sync tool shims on one canonical continuity contract so clarification resumes and persisted snapshots behave consistently. Add targeted regressions and hardening notes covering system-message coalescing, async bridge usage, and continuity rehydration.
This commit is contained in:
2026-04-03 13:14:59 +08:00
parent b3f9b5e715
commit 4972b4e6b1
18 changed files with 4755 additions and 735 deletions

View File

@@ -0,0 +1,150 @@
# 2026-04-03 L3 Runtime Hardening Plan
## Goal
先把 Jarvis 的 L3 主链夯实,只处理 runtime / graph / tools / service integration / tests / docs 的一致性问题;暂不继续扩 unrelated feature domain。
## Scope
- `backend/app/agents/graph.py`
- `backend/app/agents/state.py`
- `backend/app/agents/tools/__init__.py`
- `backend/app/agents/tools/search.py`
- `backend/app/agents/tools/schedule.py`
- `backend/app/agents/tools/task.py`
- `backend/app/services/agent_service.py`
- `backend/app/services/document_service.py`
- `backend/app/services/memory_service.py`
- `backend/tests/backend/app/agents/test_graph*.py`
- `backend/tests/backend/app/services/test_brain_ingestion.py`
- related design/plan docs under `docs/superpowers/`
## Non-goals
- 不在本轮新增前端页面
- 不在 L3 未稳定前继续扩 accounting / weather / RSS 等运行时域
- 不重做 graph 架构,只做收敛、对齐和补测试
## Current High-Priority Gaps
1. **continuity / clarification schema drift**
- graph runtime 已使用 `owning_agent` / `owning_sub_commander` / `target_action`
- brain ingestion tests 仍大量使用旧快照字段:`active_sub_flow` / `awaiting_user_input`
2. **tool execution drift**
- `search.py``_run_async()` 在 running loop 下实现不一致
- schedule/task canonicalization 仍存在参数映射漂移
3. **service integration drift**
- `agent_service` 已派生 role-scoped memory sections但 continuity snapshot / graph runtime / persisted attachments 需要继续收口
4. **docs drift**
- 现有文档已记录 L3 merge progress但缺少一份当天可执行的 hardening 追踪文档
## Workstreams
### Workstream A — Continuity Contract
Owner: worker-1
Target:
- 对齐 clarification / continuity canonical schema
- 让 graph runtime 与 persisted snapshot 使用同一套契约,或显式兼容旧字段
- 补针对性测试
Done when:
- graph 与 ingestion tests 对 clarification/continuity 断言一致
- stale continuity / resume-after-clarification 场景有回归覆盖
- 文档明确列出 canonical 字段和兼容规则
### Workstream B — Tool Execution Path
Owner: worker-2
Target:
- 修复 search async bridge
- 对齐 task / schedule canonicalization
- 固定当前 L3 scope 下真实支持的 tool/fallback 规则
Current status:
- 已统一 `search.py` / `schedule.py` / `task.py` 到共享 `app.agents.tools.async_bridge.run_async`,避免 running loop 下的同步桥接漂移。
- 已收敛 graph canonicalization`create_todo` 保留 date/todo_date 语义;仅在出现 timed task 信号时提升为 `create_schedule_task``create_goal` 统一落到 `goal_date``create_reminder` clarification 前会先标准化 `date`
- 已补 targeted regressions覆盖 active event loop search path、timed todo promotion、reminder clarification date normalization。
Done when:
- 相关工具测试通过
- graph canonicalization 行为清晰且无死分支
- 文档明确说明支持的 tool path 与 deferred domains
### Workstream C — Service Integration
Owner: worker-3
Target:
- 对齐 graph runtime 与 `agent_service` 入口语义
- 收敛 continuity snapshot、role-scoped context、stream/sync 行为
- 补接入层测试或针对性断言
Done when:
- `agent_service` 与 graph 状态注入规则一致
- continuity snapshot load/persist 行为有测试证据
- 文档明确 graph/service 边界和责任
## Runtime Contract Notes
### Clarification context
Canonical target shape:
- `owning_agent`
- `owning_sub_commander`
- `target_action`
- `question`
- `partial_args`
- `missing_fields`
- `status`
### Continuity state
Current known active markers:
- `status: fresh|stale`
- `mode: resume_after_clarification` for clarification continuation
- routing continuation should only survive when the new request is still semantically a continuation
### Tool strategy
Current target contract:
- native tools and JSON fallback should converge on the same normalized tool name + normalized args before execution
- system messages should remain coalesced into one system message for OpenAI-compatible providers that reject multiple system messages
- sync tool shims in current L3 scope must route through shared `async_bridge.run_async` instead of per-file event-loop wrappers
### Current L3 tool path rules
- `librarian_retrieval` current allowlist: `search_knowledge`, `hybrid_search`, `web_search`, `get_knowledge_graph_context`
- search-family sync wrappers must be safe under an already-running event loop
- `create_todo` keeps day-level intent on `todo_date`; do not silently remap date-only todo requests to task due dates
- `create_todo` upgrades to `create_schedule_task` only for timed/task-shaped payloads such as `due_time`, `due_datetime`, `start_time`, `end_time`
- `create_goal` date aliases normalize to `goal_date`
- `create_reminder` aliases normalize before clarification so resumed flows keep canonical partial args
### Explicitly deferred domains in this hardening pass
- accounting runtime expansion
- weather runtime expansion
- RSS runtime expansion
- any new tool domains outside current schedule / task / forum / knowledge L3 path
## Documentation Rule For This Hardening Pass
每完成一个 workstream
1. 更新本文件的 status
2. 在相关 spec/notes 中补一段“当前状态 / 已决策 / 已知边界”
3. 再标记任务完成
## Status
- [x] Hardening tracker created
- [x] Workstream A complete
- [x] Workstream B complete
- [x] Workstream C complete
- [x] Final verification pass complete
## Verification Checklist
- [x] `test_graph_system_messages.py` → 8 passed
- [x] `test_tool_async_bridge.py` + `test_task_tools.py` → 18 passed
- [x] `test_brain_ingestion.py` full file → 40 passed
- [x] targeted continuity persistence/rehydration checks → 3 passed
- [x] targeted graph regressions for timed todo / reminder clarification / active event loop paths
- [ ] broader graph suite beyond this L3 slice
## Final Notes
- L3 continuity persistence now uses one canonical envelope and normalizes legacy snapshot shapes on rehydration.
- Service/runtime integration is aligned on the canonical continuity schema rather than legacy raw snapshot persistence.
- Tool sync shims now share one async bridge across search / schedule / task / forum paths.
- Final verification was executed with `uv run pytest` from `backend/`, which bypassed the broken plain `python` launcher in this environment.
- A reviewer flagged async bridge timeout/cancellation semantics as a follow-up reliability concern for mutating tools, but it is not blocking this L3 hardening pass.
## Next Action
- Treat this L3 hardening slice as complete.
- If continuing, the next best follow-up is either broader graph regression coverage or a dedicated fix for async bridge timeout/cancellation semantics.

View File

@@ -40,3 +40,18 @@
- normalized_content should be persisted on documents so preview, rebuild, and future chunking can reuse the same canonical text.
- Lightweight hierarchy should be represented in chunk metadata first, not in a new relational tree schema.
- Current DOCX upload failure in the running environment is caused by a missing python-docx installation in the active backend environment.
## Additional Findings: L3 Merge Progress
- `backend/app/agents/state.py` has been expanded to the newer L3 runtime state shape so graph/runtime code can rely on structured continuity, tool-round, retry, routing-hop, and datetime-reference fields.
- `backend/app/agents/graph.py` no longer contains merge markers and the phantom `EXECUTOR_ACCOUNTING` branch has been removed from graph registration and routing.
- Accounting-style prompts are currently normalized onto `AgentRole.EXECUTOR` instead of a separate executor-accounting role, which avoids dangling enum/runtime references while keeping those intents routable.
- `backend/tests/backend/app/agents/test_graph.py` has been reconciled onto the newer L3 runtime test branch and stale `EXECUTOR_ACCOUNTING` expectations were updated to `AgentRole.EXECUTOR`.
- Tool execution now uses a shared async bridge in `backend/app/agents/tools/async_bridge.py`, and `search.py`, `schedule.py`, `task.py`, plus `forum.py` all route synchronous tool entrypoints through that same bridge to keep runtime behavior consistent inside and outside active event loops.
- Current task/schedule canonicalization remains intentionally narrow for L3: task aliases (`content`, `date`, legacy priorities) and reminder aliases (`datetime`, `at`, `remind_at`, `time`, timezone variants) are normalized; deferred domains such as weather/accounting-specific tool routing remain outside this stabilization slice.
- Targeted verification now covers async bridge behavior plus task/schedule alias persistence tests; local pytest invocation still depends on resolving environment-level startup issues when the interpreter exits before running the selected files.
- L3 runtime/service integration now persists continuity snapshots in a single canonical envelope (`kind`, `version`, `state`) on both assistant message attachments and `Conversation.agent_state`, so streaming and sync chat entrypoints rehydrate the same shape.
- The continuity rehydration path is also tolerant of older `Conversation` rows/models that do not expose `agent_state`, falling back to assistant message attachments instead of failing before graph execution.
- The finalized L3 continuity contract persists a canonical `agent_continuity_state` snapshot: `turn_context.active_sub_commander`, `pending_action.type|owner_agent|owner_sub_commander|status`, `clarification_context.owning_agent|owning_sub_commander|target_action|question|status`, and `continuity_state.status|mode`.
- `backend/app/services/agent_service.py` normalizes legacy persisted snapshots (`active_sub_flow`, `agent`, `sub_flow`, `action_type`, `awaiting_user_input`, `awaiting_clarification`) into that canonical shape on both save and rehydration so older brain-ingestion records still resume correctly.
- Edge cases: explicit new requests may keep stale continuity in memory for override-aware routing, but only `continuity_state.status == fresh` participates in active continuation; clarification resumes use `continuity_state.mode = resume_after_clarification`.
- `memory_service.build_memory_context(...)` remains the shared retrieval join point for conversation summaries, user memory, and BrainMemory recall, while `document_service` continues emitting BrainEvent records from upload flow without changing the graph runtime contract.