Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
H3 Durable Session Lifecycle
1. Goal
Upgrade HermesSessionManager from an in-process session cache into a durable lifecycle manager that supports recovery, rebuilding, and observability.
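2. Current problems
backend/app/services/agent_runtime/hermes_session_manager.py already provides:
- a basic conversation -> session mapping
- a per-conversation lock
- last_used / restart_count / metadata
It is still closer to a prototype, however:
- it depends on memory inside the current process
- it cannot reliably recover sessions after a backend restart
- warm / resumed / cold are not explicit states
- the recovery policy is not clearly defined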
3. Target lifecycle
message arrives
-> lookup by conversation
-> warm session exists? reuse
-> else hydrate from agent_state
-> if hydrate success => resumed
-> else create fresh => cold
-> execute turn
-> update state/metrics
-> idle reclaim if needed
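The lookup path above can be sketched roughly as follows. This is a minimal illustration and not the actual HermesSessionManager code: `LifecycleSketch`, `SessionRecord`, `SessionOrigin`, and the `hydrate` / `create` callbacks are hypothetical names, and a hydrate callback that returns `None` or raises is assumed to mean that no durable state could be restored.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Optional


class SessionOrigin(str, Enum):
    WARM = "warm"        # reused from the in-process cache
    RESUMED = "resumed"  # rehydrated from persisted agent_state
    COLD = "cold"        # created fresh


@dataclass
class SessionRecord:
    conversation_id: str
    session_id: str
    origin: SessionOrigin
    last_used: float = field(default_factory=time.monotonic)


class LifecycleSketch:
    """Hypothetical stand-in for the lookup path of HermesSessionManager."""

    def __init__(
        self,
        hydrate: Callable[[str], Optional[str]],
        create: Callable[[str], str],
    ) -> None:
        self._warm: Dict[str, SessionRecord] = {}
        self._hydrate = hydrate  # assumed: returns a session id or None
        self._create = create    # assumed: always returns a new session id

    def acquire(self, conversation_id: str) -> SessionRecord:
        # 1. Warm session exists? Reuse it.
        record = self._warm.get(conversation_id)
        if record is not None:
            record.origin = SessionOrigin.WARM
            record.last_used = time.monotonic()
            return record

        # 2. Otherwise try to hydrate from durable state.
        try:
            session_id = self._hydrate(conversation_id)
        except Exception:
            session_id = None  # a failed hydrate falls through to cold

        if session_id is not None:
            origin = SessionOrigin.RESUMED
        else:
            # 3. Nothing to resume: create a fresh session.
            session_id = self._create(conversation_id)
            origin = SessionOrigin.COLD

        record = SessionRecord(conversation_id, session_id, origin)
        self._warm[conversation_id] = record
        return record
```

Keeping the origin on the record lets the caller report warm / resumed / cold in metrics after the turn executes, without a second lookup.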
4. Required capabilities
- distinct warm / resumed / cold states
- conversation-level locking
- runtime health checks
- a restart / recreate policy
- idle reclaim
- safe rehydrate
- stale session detection
- error state recording
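As one possible shape for the idle reclaim and stale session detection items, the sketch below evicts warm entries that are unhealthy or have been idle longer than a TTL. `IdleReclaimer`, `WarmEntry`, `idle_ttl_seconds`, and the injectable `clock` are assumptions made for illustration. Writing the error to durable state and taking the per-conversation lock before eviction are deliberately left out.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class WarmEntry:
    session_id: str
    last_used: float
    healthy: bool = True
    last_error: str = ""


class IdleReclaimer:
    """Hypothetical helper: tracks warm entries and evicts stale ones."""

    def __init__(
        self,
        idle_ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.idle_ttl_seconds = idle_ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self.entries: Dict[str, WarmEntry] = {}

    def touch(self, conversation_id: str, session_id: str) -> None:
        # Record (or refresh) a warm session after a successful turn.
        self.entries[conversation_id] = WarmEntry(session_id, self._clock())

    def mark_error(self, conversation_id: str, message: str) -> None:
        # Record the failure so the next reclaim pass treats it as stale.
        entry = self.entries.get(conversation_id)
        if entry is not None:
            entry.healthy = False
            entry.last_error = message

    def is_stale(self, entry: WarmEntry) -> bool:
        idle_for = self._clock() - entry.last_used
        return (not entry.healthy) or idle_for > self.idle_ttl_seconds

    def reclaim(self) -> List[str]:
        # Collect first, then delete, to avoid mutating while iterating.
        stale = [cid for cid, e in self.entries.items() if self.is_stale(e)]
        for cid in stale:
            del self.entries[cid]
        return stale
```

An evicted conversation is not lost under this design: its next message simply takes the hydrate path and comes back as resumed or cold.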
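5. Relationship to the envelope
Persistence source:
Conversation.agent_state.runtime_state.hermes
Runtime source:
HermesSessionManager
Principles:
- warm sessions improve performance
- durable metadata guarantees recoverability
- a Hermes process must never be required to stay alive forever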
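To make the split between durable metadata and runtime state concrete, the following sketch reads and writes a small record under `agent_state["runtime_state"]["hermes"]`. Only that path comes from this document. The helper names and the field names (`session_id`, `status`, `restart_count`, `last_error`) are assumptions, and `agent_state` is assumed to be a JSON-like dict.

```python
from typing import Any, Dict, Optional


def write_hermes_state(
    agent_state: Optional[Dict[str, Any]],
    *,
    session_id: str,
    status: str,
    restart_count: int,
    last_error: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a copy of agent_state with runtime_state.hermes replaced."""
    new_state = dict(agent_state or {})
    runtime_state = dict(new_state.get("runtime_state") or {})
    runtime_state["hermes"] = {
        "session_id": session_id,
        "status": status,  # assumed values: warm / resumed / cold
        "restart_count": restart_count,
        "last_error": last_error,
    }
    new_state["runtime_state"] = runtime_state
    return new_state


def read_hermes_state(
    agent_state: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Return the persisted Hermes record, or None if unusable."""
    runtime_state = (agent_state or {}).get("runtime_state") or {}
    hermes = runtime_state.get("hermes")
    # Treat a missing or malformed record as "nothing to resume".
    if not isinstance(hermes, dict) or not hermes.get("session_id"):
        return None
    return hermes
```

Returning a copy rather than mutating in place keeps the helper safe to use with ORM columns that only detect reassignment, though whether that applies to the Conversation model here is not stated in this document.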
6. Recommended file changes
- backend/app/services/agent_runtime/hermes_session_manager.py
- backend/app/services/agent_runtime/hermes_runtime.py
- backend/app/services/agent_service.py
- backend/app/models/conversation.py
- add or extend tests: session resume / recreate / restart / idle reclaim
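7. Completion criteria
- a conversation is restored to the correct Hermes session, or a new session is rebuilt for it
- warm / resumed / cold states are distinguishable
- continuity does not break outright after a backend restart
- recovery and failure events are clearly recorded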