# H3 Durable Session Lifecycle
## 1. Goal
Upgrade `HermesSessionManager` from an in-process session cache to a durable lifecycle manager that supports recovery, rebuild, and observability.
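## 2. Current Problems
The current `backend/app/services/agent_runtime/hermes_session_manager.py` already has:
- A basic conversation -> session mapping
- A per-conversation lock
- last_used / restart_count / metadata
But it is still closer to a prototype:
- It depends on memory in the current process
- Recovery after a backend restart is insufficient
- warm / resumed / cold are not explicit states
- The recovery policy is not clear enough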
## 3. Lifecycle Goal
```text
message arrives
-> lookup by conversation
-> warm session exists? reuse
-> else hydrate from agent_state
-> if hydrate success => resumed
-> else create fresh => cold
-> execute turn
-> update state/metrics
-> idle reclaim if needed
```
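The lookup part of this flow can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the real `HermesSessionManager`: `LifecycleSketch`, `SessionRecord`, `SessionOrigin`, and the `session_id` key inside `runtime_state.hermes` are hypothetical names chosen for the example, and "creating" a session is reduced to generating an id.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class SessionOrigin(str, Enum):
    WARM = "warm"        # reused from in-process memory
    RESUMED = "resumed"  # rebuilt from persisted agent_state
    COLD = "cold"        # created fresh, nothing usable was persisted


@dataclass
class SessionRecord:
    conversation_id: str
    session_id: str
    origin: SessionOrigin
    last_used: float = field(default_factory=time.monotonic)


class LifecycleSketch:
    """Hypothetical stand-in for the lookup path of HermesSessionManager."""

    def __init__(self) -> None:
        self._warm: Dict[str, SessionRecord] = {}
        self._counter = 0

    def acquire(
        self, conversation_id: str, agent_state: Optional[Dict[str, Any]]
    ) -> SessionRecord:
        # 1. Warm session exists? Reuse it.
        record = self._warm.get(conversation_id)
        if record is not None:
            record.origin = SessionOrigin.WARM
            record.last_used = time.monotonic()
            return record

        # 2. Otherwise try to hydrate from agent_state.
        #    The "session_id" key is an assumption made for this sketch.
        runtime_state = (agent_state or {}).get("runtime_state") or {}
        hermes = runtime_state.get("hermes") or {}
        session_id = hermes.get("session_id")

        if isinstance(session_id, str) and session_id:
            record = SessionRecord(conversation_id, session_id, SessionOrigin.RESUMED)
        else:
            # 3. Hydrate failed: create a fresh session.
            self._counter += 1
            record = SessionRecord(
                conversation_id, f"fresh-{self._counter}", SessionOrigin.COLD
            )

        self._warm[conversation_id] = record
        return record
```

Calling `acquire` twice for the same conversation with a persisted `session_id` would yield `resumed` on the first call and `warm` on the second; a conversation with no persisted state would yield `cold`. Recording the origin on every turn is what makes the three states observable in metrics.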
## 4. Required Capabilities
1. Distinguish warm / resumed / cold states
2. Conversation-level locking
3. Runtime health checks
4. Restart / recreate policy
5. Idle reclaim
6. Safe rehydrate
7. Stale session detection
8. Error state recording
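Idle reclaim and the restart / recreate policy from the list above could look roughly like the following. This is a sketch only: `WarmSession`, `reclaim_idle`, `decide_recovery`, the `idle_ttl` parameter, and the `max_restarts` threshold are all hypothetical, and a real health check would probe the Hermes runtime rather than read a boolean.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WarmSession:
    session_id: str
    last_used: float                 # monotonic seconds
    restart_count: int = 0
    healthy: bool = True             # result of a (hypothetical) health check
    last_error: Optional[str] = None # error state recorded for observability


def reclaim_idle(
    sessions: Dict[str, WarmSession], idle_ttl: float, now: Optional[float] = None
) -> List[str]:
    """Drop warm sessions idle longer than idle_ttl; return their conversation ids."""
    now = time.monotonic() if now is None else now
    expired = [cid for cid, s in sessions.items() if now - s.last_used > idle_ttl]
    for cid in expired:
        del sessions[cid]
    return expired


def decide_recovery(session: WarmSession, max_restarts: int = 3) -> str:
    """Return 'reuse', 'restart', or 'recreate' for one session."""
    if session.healthy:
        return "reuse"
    if session.restart_count < max_restarts:
        return "restart"
    # Too many restarts: give up on this session and build a new one.
    return "recreate"
```

Passing `now` explicitly keeps the reclaim logic deterministic in tests. Because reclaimed sessions are only removed from memory, a later message for the same conversation would fall through to the hydrate path instead of failing.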
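## 5. Relationship to the envelope
Persistent source:
- `Conversation.agent_state.runtime_state.hermes`
Runtime source:
- `HermesSessionManager`
Principles:
- Warm sessions improve performance
- Durable metadata guarantees recoverability
- A Hermes process cannot be required to live forever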
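To make the split between persistent and runtime state concrete, here is a hedged sketch of reading and writing the `runtime_state.hermes` slot. It assumes `agent_state` is a JSON-like `dict`; the helper names `write_hermes_state` and `read_hermes_state` and the example fields are hypothetical, not taken from the codebase.

```python
import copy
import json
from typing import Any, Dict, Optional


def write_hermes_state(
    agent_state: Optional[Dict[str, Any]], hermes: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of agent_state with runtime_state.hermes replaced."""
    updated = copy.deepcopy(agent_state) if agent_state else {}
    runtime_state = updated.get("runtime_state")
    if not isinstance(runtime_state, dict):
        runtime_state = {}
        updated["runtime_state"] = runtime_state
    runtime_state["hermes"] = dict(hermes)
    json.dumps(updated)  # fail early if a value is not JSON-serializable
    return updated


def read_hermes_state(agent_state: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Safe rehydrate: tolerate a missing or malformed envelope."""
    if not isinstance(agent_state, dict):
        return {}
    runtime_state = agent_state.get("runtime_state")
    if not isinstance(runtime_state, dict):
        return {}
    hermes = runtime_state.get("hermes")
    return dict(hermes) if isinstance(hermes, dict) else {}
```

Returning a new dict from `write_hermes_state`, instead of mutating in place, is a deliberate choice for the sketch: assigning a fresh value to the column is a simple way to make sure an ORM notices the change. `read_hermes_state` never raises on bad input, so a corrupted envelope degrades to a cold start.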
## 6. Recommended File Changes
- `backend/app/services/agent_runtime/hermes_session_manager.py`
- `backend/app/services/agent_runtime/hermes_runtime.py`
- `backend/app/services/agent_service.py`
- `backend/app/models/conversation.py`
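- Add or extend tests: session resume / recreate / restart / idle reclaim
## 7. Completion Criteria
- [ ] A conversation can be restored to the correct Hermes session, or a new session is rebuilt
- [ ] warm / resumed / cold states are distinguishable
- [ ] Continuity does not break outright after a backend restart
- [ ] Recovery and failure are clearly recorded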