# H3 Durable Session Lifecycle

## 1. Goal

Upgrade `HermesSessionManager` from an in-process session cache to a durable lifecycle manager that supports recovery, rebuilding, and observability.

## 2. Current Problems

`backend/app/services/agent_runtime/hermes_session_manager.py` already provides:
- a basic conversation -> session mapping
- a per-conversation lock
- last_used / restart_count / metadata

But it is still closer to a prototype:
- it relies on memory in the current process
- recovery after a backend restart is weak
- warm / resumed / cold are not explicit states
- the recovery policy is not clearly defined

## 3. Target Lifecycle

```text
message arrives
  -> lookup by conversation
  -> warm session exists? reuse
  -> else hydrate from agent_state
       -> if hydrate success => resumed
       -> else create fresh => cold
  -> execute turn
  -> update state/metrics
  -> idle reclaim if needed
```

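
The lookup order in the flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual `HermesSessionManager` code: `SessionOrigin`, `Session`, `resolve_session`, and the `hydrate` / `create_fresh` callables are hypothetical names, and `agent_state` is assumed to be a plain dict.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, Optional


class SessionOrigin(str, Enum):
    """How the session for the current turn was obtained (hypothetical enum)."""
    WARM = "warm"        # reused from the in-process cache
    RESUMED = "resumed"  # rebuilt from persisted agent_state
    COLD = "cold"        # created fresh, nothing was restored


@dataclass
class Session:
    conversation_id: str
    session_id: str
    origin: SessionOrigin


def resolve_session(
    conversation_id: str,
    warm_cache: Dict[str, Session],
    agent_state: Optional[Dict[str, Any]],
    hydrate: Callable[[Dict[str, Any]], Optional[str]],
    create_fresh: Callable[[], str],
) -> Session:
    """Follow the warm -> resumed -> cold order from the lifecycle diagram."""
    cached = warm_cache.get(conversation_id)
    if cached is not None:
        cached.origin = SessionOrigin.WARM
        return cached

    # Assumed layout: agent_state["runtime_state"]["hermes"] holds the durable metadata.
    persisted = ((agent_state or {}).get("runtime_state") or {}).get("hermes")
    session_id = hydrate(persisted) if persisted else None

    if session_id is not None:
        session = Session(conversation_id, session_id, SessionOrigin.RESUMED)
    else:
        session = Session(conversation_id, create_fresh(), SessionOrigin.COLD)

    warm_cache[conversation_id] = session
    return session
```

In this sketch `hydrate` returns `None` when the persisted data cannot be used, so a failed hydrate falls through to a cold start instead of raising.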
## 4. Required Capabilities

1. Distinguish warm / resumed / cold states
2. Conversation-level locking
3. Runtime health checks
4. Restart / recreate policy
5. Idle reclaim
6. Safe rehydrate
7. Stale session detection
8. Error state recording

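
Two of the capabilities above, conversation-level locking and idle reclaim, could look roughly like the following. This is a sketch that assumes an `asyncio`-based backend; `SessionRegistry`, `SessionEntry`, and `idle_ttl_seconds` are hypothetical names, and health checks and the restart policy are left out.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SessionEntry:
    session_id: str
    last_used: float
    restart_count: int = 0
    last_error: Optional[str] = None  # slot for error state recording


class SessionRegistry:
    """Hypothetical in-process registry with per-conversation locks."""

    def __init__(self, idle_ttl_seconds: float = 900.0) -> None:
        self._entries: Dict[str, SessionEntry] = {}
        self._locks: Dict[str, asyncio.Lock] = {}
        self._idle_ttl = idle_ttl_seconds

    def lock_for(self, conversation_id: str) -> asyncio.Lock:
        # One lock per conversation, so turns of the same conversation serialize
        # while different conversations run concurrently.
        lock = self._locks.get(conversation_id)
        if lock is None:
            lock = self._locks[conversation_id] = asyncio.Lock()
        return lock

    def touch(self, conversation_id: str, session_id: str,
              now: Optional[float] = None) -> SessionEntry:
        now = time.monotonic() if now is None else now
        entry = self._entries.get(conversation_id)
        if entry is None or entry.session_id != session_id:
            entry = SessionEntry(session_id=session_id, last_used=now)
            self._entries[conversation_id] = entry
        entry.last_used = now
        return entry

    def get(self, conversation_id: str) -> Optional[SessionEntry]:
        return self._entries.get(conversation_id)

    def reclaim_idle(self, now: Optional[float] = None) -> List[str]:
        """Drop entries idle longer than the TTL, skipping conversations mid-turn."""
        now = time.monotonic() if now is None else now
        reclaimed: List[str] = []
        for cid, entry in list(self._entries.items()):
            lock = self._locks.get(cid)
            busy = lock is not None and lock.locked()
            if not busy and now - entry.last_used > self._idle_ttl:
                del self._entries[cid]
                reclaimed.append(cid)
        return reclaimed
```

A turn would then run inside `async with registry.lock_for(conversation_id):`. Reclaiming an entry only drops the warm session; the conversation should still be able to resume from durable metadata on its next message.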
## 5. Relationship to the Envelope

Persistent source:
- `Conversation.agent_state.runtime_state.hermes`

Runtime source:
- `HermesSessionManager`

Principles:
- Warm sessions improve performance.
- Durable metadata guarantees recoverability.
- The design must not require a Hermes process to live forever.

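
To make the split between the persistent and runtime sources concrete, here is a hedged sketch of writing durable metadata under `runtime_state.hermes` and reading it back defensively. It assumes `agent_state` is a JSON-like dict; `HermesDurableState`, its fields, `SCHEMA_VERSION`, and both helper functions are hypothetical and only illustrate the idea of a safe rehydrate with stale detection.

```python
import time
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional

SCHEMA_VERSION = 1  # hypothetical version tag for the persisted payload


@dataclass
class HermesDurableState:
    session_id: str
    last_used: float  # wall-clock timestamp, so it survives process restarts
    restart_count: int = 0
    last_error: Optional[str] = None
    schema_version: int = SCHEMA_VERSION


def write_durable_state(agent_state: Optional[Dict[str, Any]],
                        state: HermesDurableState) -> Dict[str, Any]:
    """Return a copy of agent_state with runtime_state.hermes replaced."""
    updated = dict(agent_state or {})
    runtime_state = dict(updated.get("runtime_state") or {})
    runtime_state["hermes"] = asdict(state)
    updated["runtime_state"] = runtime_state
    return updated


def read_durable_state(agent_state: Optional[Dict[str, Any]],
                       max_age_seconds: float,
                       now: Optional[float] = None) -> Optional[HermesDurableState]:
    """Safe rehydrate: return None on missing, malformed, or stale data."""
    raw = ((agent_state or {}).get("runtime_state") or {}).get("hermes")
    if not isinstance(raw, dict) or raw.get("schema_version") != SCHEMA_VERSION:
        return None
    now = time.time() if now is None else now
    try:
        state = HermesDurableState(**raw)
        age = now - float(state.last_used)
    except (TypeError, ValueError):
        return None  # unexpected keys or wrong types: treat as unusable
    if age > max_age_seconds:
        return None  # stale: the caller should create a fresh session
    return state
```

Returning `None` rather than raising keeps the caller's decision simple: anything that cannot be trusted leads to a cold start, which matches the principle that no Hermes process is expected to live forever.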
## 6. Recommended File Changes

- `backend/app/services/agent_runtime/hermes_session_manager.py`
- `backend/app/services/agent_runtime/hermes_runtime.py`
- `backend/app/services/agent_service.py`
- `backend/app/models/conversation.py`
- New or additional tests: session resume / recreate / restart / idle reclaim

## 7. Completion Criteria

- [ ] A conversation can resume the correct Hermes session or rebuild a new one
- [ ] warm / resumed / cold states are distinguishable
- [ ] Continuity does not break outright after a backend restart
- [ ] Recovery and failure are clearly recorded