# 2026-04-03 L3 Runtime Hardening Plan

## Goal

Solidify Jarvis's L3 main chain first: address only consistency issues across runtime, graph, tools, service integration, tests, and docs. Do not expand into unrelated feature domains for now.

## Scope

- `backend/app/agents/graph.py`
- `backend/app/agents/state.py`
- `backend/app/agents/tools/__init__.py`
- `backend/app/agents/tools/search.py`
- `backend/app/agents/tools/schedule.py`
- `backend/app/agents/tools/task.py`
- `backend/app/services/agent_service.py`
- `backend/app/services/document_service.py`
- `backend/app/services/memory_service.py`
- `backend/tests/backend/app/agents/test_graph*.py`
- `backend/tests/backend/app/services/test_brain_ingestion.py`
- related design/plan docs under `docs/superpowers/`

## Non-goals

- No new frontend pages in this round
- No further expansion of runtime domains such as accounting, weather, or RSS until L3 is stable
- No rework of the graph architecture; only convergence, alignment, and added tests

## Current High-Priority Gaps

1. **continuity / clarification schema drift**
   - The graph runtime already uses `owning_agent` / `owning_sub_commander` / `target_action`
   - Brain ingestion tests still rely heavily on legacy snapshot fields such as `active_sub_flow` and `awaiting_user_input`
2. **tool execution drift**
   - `_run_async()` in `search.py` behaves inconsistently under a running event loop
   - schedule/task canonicalization still has parameter-mapping drift
3. **service integration drift**
   - `agent_service` already derives role-scoped memory sections, but the continuity snapshot, graph runtime, and persisted attachments still need to be converged
4. **docs drift**
   - Existing docs record L3 merge progress, but there is no actionable same-day hardening tracker

## Workstreams

### Workstream A: Continuity Contract

Owner: worker-1

Target:

- Align the canonical clarification / continuity schema
- Make the graph runtime and the persisted snapshot share one contract, or explicitly support legacy fields
- Add targeted tests

Done when:

- Graph and ingestion tests make consistent assertions about clarification/continuity
- Stale-continuity and resume-after-clarification scenarios have regression coverage
- The docs explicitly list the canonical fields and compatibility rules

### Workstream B: Tool Execution Path

Owner: worker-2

Target:

- Fix the search async bridge
- Align task / schedule canonicalization
- Pin down the tool/fallback rules actually supported in the current L3 scope

Current status:

- `search.py`, `schedule.py`, and `task.py` now all use the shared `app.agents.tools.async_bridge.run_async`, avoiding sync-bridge drift under a running event loop.
- Graph canonicalization has been converged: `create_todo` keeps its date/todo_date semantics and is promoted to `create_schedule_task` only when timed-task signals are present; `create_goal` always lands on `goal_date`; `create_reminder` normalizes `date` before clarification.
- Targeted regressions have been added, covering the active-event-loop search path, timed todo promotion, and reminder clarification date normalization.

Done when:

- The relevant tool tests pass
- Graph canonicalization behavior is clear and has no dead branches
- The docs explicitly describe the supported tool paths and the deferred domains

### Workstream C: Service Integration

Owner: worker-3

Target:

- Align the entry-point semantics of the graph runtime and `agent_service`
- Converge continuity snapshot, role-scoped context, and stream/sync behavior
- Add integration-layer tests or targeted assertions

Done when:

- `agent_service` and the graph follow the same state-injection rules
- Continuity snapshot load/persist behavior is backed by test evidence
- The docs make the graph/service boundary and responsibilities explicit

## Runtime Contract Notes

### Clarification context

Canonical target shape:

- `owning_agent`
- `owning_sub_commander`
- `target_action`
- `question`
- `partial_args`
- `missing_fields`
- `status`
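
The canonical shape above can be sketched as a typed dictionary with a small completeness check. This is an illustration only: the field types, the `ClarificationContext` name, and both helper functions are assumptions not taken from the codebase, and the allowed `status` values are not listed in this document. The legacy keys are the two named under "Current High-Priority Gaps".

```python
from typing import Any, TypedDict


class ClarificationContext(TypedDict):
    """Hypothetical typing of the canonical clarification context."""

    owning_agent: str
    owning_sub_commander: str
    target_action: str
    question: str
    partial_args: dict[str, Any]   # assumed: arguments collected so far
    missing_fields: list[str]      # assumed: names of fields still needed
    status: str                    # allowed values are not specified here


CANONICAL_KEYS = frozenset(ClarificationContext.__annotations__)

# Legacy snapshot fields this plan mentions as still used by older tests.
LEGACY_KEYS = frozenset({"active_sub_flow", "awaiting_user_input"})


def missing_canonical_keys(payload: dict[str, Any]) -> list[str]:
    """Return the canonical keys absent from payload, sorted by name."""
    return sorted(CANONICAL_KEYS - set(payload))


def legacy_keys_present(payload: dict[str, Any]) -> list[str]:
    """Return any legacy snapshot keys found in payload, sorted by name."""
    return sorted(LEGACY_KEYS & set(payload))
```

A check like this could back the "consistent assertions" goal in Workstream A, since graph and ingestion tests would then assert against one key set instead of two.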
### Continuity state

Current known active markers:

- `status: fresh|stale`
- `mode: resume_after_clarification` for clarification continuation
- routing continuation should only survive when the new request is still semantically a continuation
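
One way to read these markers is as a three-part gate for resuming a clarification flow. The sketch below is hypothetical: the `ContinuityState` type and the function name are invented for illustration, the rule that a `stale` snapshot never resumes is an assumption, and the semantic continuation check is reduced to a boolean that the caller supplies.

```python
from typing import Literal, TypedDict


class ContinuityState(TypedDict, total=False):
    """Hypothetical subset of the continuity snapshot."""

    status: Literal["fresh", "stale"]
    mode: str


def should_resume_clarification(
    state: ContinuityState, looks_like_continuation: bool
) -> bool:
    """Resume only for a fresh snapshot, in clarification mode, when the
    new request still reads as a continuation (decided by the caller)."""
    return (
        state.get("status") == "fresh"
        and state.get("mode") == "resume_after_clarification"
        and looks_like_continuation
    )
```

Keeping the semantic judgment outside the function leaves the gate trivially testable, which fits the regression coverage asked for in Workstream A.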
### Tool strategy

Current target contract:

- native tools and JSON fallback should converge on the same normalized tool name + normalized args before execution
- system messages should remain coalesced into one system message for OpenAI-compatible providers that reject multiple system messages
- sync tool shims in current L3 scope must route through shared `async_bridge.run_async` instead of per-file event-loop wrappers
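
For the last rule, a minimal sketch of what a shared sync-to-async bridge might look like is given below. It is not the actual `async_bridge.run_async` implementation, which this document does not show; only the function name is borrowed. The approach (run the coroutine directly when no loop is running, otherwise run it on a fresh loop in a worker thread) is one common pattern and is an assumption here.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_async(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code.

    Works whether or not the calling thread already has a running
    event loop.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: asyncio.run can own one directly.
        return asyncio.run(coro)

    # A loop is already running here, so asyncio.run would raise.
    # Execute the coroutine on its own loop in a worker thread instead.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

Note that in the running-loop branch the caller blocks its own loop until the worker finishes, and this sketch has no timeout or cancellation handling.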
### Current L3 tool path rules

- `librarian_retrieval` current allowlist: `search_knowledge`, `hybrid_search`, `web_search`, `get_knowledge_graph_context`
- search-family sync wrappers must be safe under an already-running event loop
- `create_todo` keeps day-level intent on `todo_date`; do not silently remap date-only todo requests to task due dates
- `create_todo` upgrades to `create_schedule_task` only for timed/task-shaped payloads such as `due_time`, `due_datetime`, `start_time`, `end_time`
- `create_goal` date aliases normalize to `goal_date`
- `create_reminder` aliases normalize before clarification so resumed flows keep canonical partial args
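
The todo and goal rules above can be sketched as a single normalization step that both native tool calls and the JSON fallback could pass through. This is a hedged illustration rather than the code in `graph.py`: the `canonicalize` function and the alias table are hypothetical, `date` is assumed to be the only alias, and argument remapping after promotion is left out because this document does not specify it.

```python
from typing import Any

# Timed signals listed in the rules above.
TIMED_SIGNALS = ("due_time", "due_datetime", "start_time", "end_time")

# Hypothetical alias table: tool name -> (canonical field, aliases).
DATE_ALIASES: dict[str, tuple[str, tuple[str, ...]]] = {
    "create_todo": ("todo_date", ("date",)),
    "create_goal": ("goal_date", ("date",)),
}


def canonicalize(tool: str, args: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return a normalized (tool name, args) pair without mutating args."""
    out = dict(args)

    # Promote a todo only when the payload carries a timed signal.
    if tool == "create_todo" and any(out.get(key) for key in TIMED_SIGNALS):
        return "create_schedule_task", out

    # Fold date aliases into the canonical field; an explicit canonical
    # value wins over an alias.
    if tool in DATE_ALIASES:
        canonical, aliases = DATE_ALIASES[tool]
        for alias in aliases:
            if alias in out:
                value = out.pop(alias)
                out.setdefault(canonical, value)

    return tool, out
```

For example, `canonicalize("create_todo", {"title": "x", "date": "2026-04-03"})` keeps the tool name and moves the value to `todo_date`, while the same call with a `due_time` argument returns `create_schedule_task`.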
### Explicitly deferred domains in this hardening pass

- accounting runtime expansion
- weather runtime expansion
- RSS runtime expansion
- any new tool domains outside current schedule / task / forum / knowledge L3 path

## Documentation Rule For This Hardening Pass

After completing each workstream:

1. Update the status in this file
2. Add a "current state / decisions made / known boundaries" paragraph to the relevant spec/notes
3. Only then mark the task complete

## Status

- [x] Hardening tracker created
- [x] Workstream A complete
- [x] Workstream B complete
- [x] Workstream C complete
- [x] Final verification pass complete

## Verification Checklist

- [x] `test_graph_system_messages.py` → 8 passed
- [x] `test_tool_async_bridge.py` + `test_task_tools.py` → 18 passed
- [x] `test_brain_ingestion.py` full file → 40 passed
- [x] targeted continuity persistence/rehydration checks → 3 passed
- [x] targeted graph regressions for timed todo / reminder clarification / active event loop paths
- [ ] broader graph suite beyond this L3 slice

## Final Notes

- L3 continuity persistence now uses one canonical envelope and normalizes legacy snapshot shapes on rehydration.
- Service/runtime integration is aligned on the canonical continuity schema rather than legacy raw snapshot persistence.
- Tool sync shims now share one async bridge across search / schedule / task / forum paths.
- Final verification was executed with `uv run pytest` from `backend/`, which bypassed the broken plain `python` launcher in this environment.
- A reviewer flagged async bridge timeout/cancellation semantics as a follow-up reliability concern for mutating tools, but it is not blocking this L3 hardening pass.

## Next Action

- Treat this L3 hardening slice as complete.
- If continuing, the next best follow-up is either broader graph regression coverage or a dedicated fix for async bridge timeout/cancellation semantics.