2026-05-11 01:53:30 +00:00
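
# Semantic Ontology Protocol Design

## 1. Purpose

The semantic ontology protocol is the unified intermediate layer between user questions, scheduled tasks, the rule center, MCP, database queries, and Agents.

It answers the following questions:

- Which business domain is the user actually asking about?
- Which scenario does this belong to?
- What does the user want to do?
- Which objects does the question involve?
- Are there filter conditions such as time, amount, status, or department?
- Is any risk involved?
- Should the next step be to search the knowledge base, query the database, run rules, call MCP, or ask a clarifying question?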

## 2. Core Fields in Version 1

Version 1 should make only these 8 fields mandatory.

```json
{
  "domain": "",
  "scenario": "",
  "intent": "",
  "entities": [],
  "time_range": {},
  "constraints": {},
  "risk_signals": [],
  "next_step": ""
}
```
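A minimal sketch, assuming a Python service consumes the protocol: the eight mandatory fields as a dataclass whose defaults mirror the empty template above, plus a helper that reports which mandatory keys a parser's JSON output left out. `SemanticFrame`, `CORE_FIELDS`, and `missing_core_fields` are hypothetical names, not part of the protocol.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any


@dataclass
class SemanticFrame:
    """The eight mandatory version-1 fields; defaults mirror the empty template."""

    domain: str = ""
    scenario: str = ""
    intent: str = ""
    entities: list[dict[str, Any]] = field(default_factory=list)
    time_range: dict[str, Any] = field(default_factory=dict)
    constraints: dict[str, Any] = field(default_factory=dict)
    risk_signals: list[str] = field(default_factory=list)
    next_step: str = ""

    def to_dict(self) -> dict[str, Any]:
        # asdict() deep-copies nested lists and dicts into a plain dict
        return asdict(self)


# Field names in declaration order, used to check parser output.
CORE_FIELDS = tuple(f.name for f in fields(SemanticFrame))


def missing_core_fields(payload: dict[str, Any]) -> list[str]:
    """Return the mandatory field names absent from a parser's JSON output."""
    return [name for name in CORE_FIELDS if name not in payload]
```

Keeping the field list in one class means a check for the mandatory keys cannot drift from the template.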

### 2.1 domain

Top-level business domain.

Suggested enumeration:

```text
reimbursement
accounts_receivable
accounts_payable
general_finance
system_operation
```

Meanings:

- `reimbursement`: expense reimbursement, business travel, invoices, supplementary documents.
- `accounts_receivable`: accounts receivable, customer invoicing, collections, aging.
- `accounts_payable`: accounts payable, vendor invoices, payments, reconciliation.
- `general_finance`: general financial knowledge, policies, statistics.
- `system_operation`: system inspections, task runs, rule maintenance, MCP health checks.

### 2.2 scenario

Fine-grained scenario within the domain.

Reimbursement:

```text
travel_reimbursement
daily_expense
invoice_validation
attachment_review
policy_overrun
reimbursement_audit
```

Accounts receivable:

```text
customer_invoice
collection_followup
receivable_aging
payment_matching
bad_debt_risk
contract_receivable
```

Accounts payable:

```text
vendor_invoice
payment_request
payable_aging
vendor_reconciliation
invoice_matching
cash_outflow_forecast
```

System operation:

```text
daily_risk_scan
daily_finance_statistics
knowledge_accumulation
mcp_health_check
rule_quality_review
```
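One way to keep the domain and scenario enumerations machine-checkable is to hold them in a single catalog. The sketch below copies the lists from sections 2.1 and 2.2; `Domain`, `SCENARIOS`, and `check_domain_scenario` are hypothetical names. It returns warnings instead of raising, because the text does not say whether a scenario is owned by exactly one domain (example 4.3, for instance, pairs `reimbursement` with `daily_risk_scan`).

```python
from enum import Enum


class Domain(str, Enum):
    REIMBURSEMENT = "reimbursement"
    ACCOUNTS_RECEIVABLE = "accounts_receivable"
    ACCOUNTS_PAYABLE = "accounts_payable"
    GENERAL_FINANCE = "general_finance"
    SYSTEM_OPERATION = "system_operation"


# Scenario lists copied from section 2.2; general_finance has none listed yet.
SCENARIOS: dict[Domain, frozenset[str]] = {
    Domain.REIMBURSEMENT: frozenset({
        "travel_reimbursement", "daily_expense", "invoice_validation",
        "attachment_review", "policy_overrun", "reimbursement_audit",
    }),
    Domain.ACCOUNTS_RECEIVABLE: frozenset({
        "customer_invoice", "collection_followup", "receivable_aging",
        "payment_matching", "bad_debt_risk", "contract_receivable",
    }),
    Domain.ACCOUNTS_PAYABLE: frozenset({
        "vendor_invoice", "payment_request", "payable_aging",
        "vendor_reconciliation", "invoice_matching", "cash_outflow_forecast",
    }),
    Domain.GENERAL_FINANCE: frozenset(),
    Domain.SYSTEM_OPERATION: frozenset({
        "daily_risk_scan", "daily_finance_statistics", "knowledge_accumulation",
        "mcp_health_check", "rule_quality_review",
    }),
}


def check_domain_scenario(domain: str, scenario: str) -> list[str]:
    """Return warnings; an empty list means the pair matches the catalog."""
    try:
        d = Domain(domain)
    except ValueError:
        return [f"unknown domain: {domain!r}"]
    if scenario in SCENARIOS[d]:
        return []
    # The scenario exists but is listed under another domain.
    owners = [o.value for o, names in SCENARIOS.items() if scenario in names]
    if owners:
        return [f"scenario {scenario!r} is catalogued under {owners}, not {d.value!r}"]
    return [f"unknown scenario: {scenario!r}"]
```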

### 2.3 intent

The intent of the user or task.

Suggested enumeration:

```text
query
explain
create
validate
summarize
reconcile
monitor
predict
remind
generate
optimize
```

### 2.4 entities

The business objects recognized in the input.

Unified structure:

```json
{
  "type": "invoice",
  "value": "INV-202605001",
  "normalized_value": "INV-202605001",
  "role": "target",
  "confidence": 0.92
}
```

Common entity types:

```text
employee
department
customer
vendor
invoice
contract
reimbursement_request
payment_order
receipt
bank_transaction
cost_center
project
policy
approval_node
rule
task
```
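The entity structure can be sketched as a frozen dataclass that checks `type` against the common entity list and keeps `confidence` between 0 and 1. Making `normalized_value` and `confidence` optional is an assumption, chosen so that the shorter entities in the examples section also load; `Entity` and `lookup_key` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Entity types listed in section 2.4.
ENTITY_TYPES = frozenset({
    "employee", "department", "customer", "vendor", "invoice", "contract",
    "reimbursement_request", "payment_order", "receipt", "bank_transaction",
    "cost_center", "project", "policy", "approval_node", "rule", "task",
})


@dataclass(frozen=True)
class Entity:
    type: str
    value: str                              # surface text from the input
    role: str = "target"                    # e.g. "target" or "group_by"
    normalized_value: Optional[str] = None  # canonical ID once resolved
    confidence: Optional[float] = None      # 0.0 to 1.0 when provided

    def __post_init__(self) -> None:
        if self.type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.type!r}")
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")

    @property
    def lookup_key(self) -> str:
        # Prefer the normalized form for lookups; fall back to the raw text.
        return self.normalized_value or self.value
```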

### 2.5 time_range

A unified description of time.

```json
{
  "raw": "last month",
  "start": "2026-04-01",
  "end": "2026-04-30",
  "granularity": "month"
}
```

Hermes scheduled tasks use the same field.

For example, the daily risk scan:

```json
{
  "raw": "yesterday",
  "start": "2026-05-09",
  "end": "2026-05-09",
  "granularity": "day"
}
```
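Resolving a relative phrase into `start` and `end` is plain date arithmetic once a reference date is fixed. This sketch handles only the two phrases used above, in English; a real phrase table would have to cover the user's language and time zone. `resolve_time_range` is a hypothetical helper.

```python
from datetime import date, timedelta


def resolve_time_range(raw: str, today: date) -> dict[str, str]:
    """Resolve a relative phrase into the time_range structure.

    Only two phrases are handled; anything else raises ValueError.
    """
    phrase = raw.strip().lower()
    if phrase == "yesterday":
        day = today - timedelta(days=1)
        start, end, granularity = day, day, "day"
    elif phrase == "last month":
        first_of_this_month = today.replace(day=1)
        end = first_of_this_month - timedelta(days=1)  # last day of previous month
        start = end.replace(day=1)
        granularity = "month"
    else:
        raise ValueError(f"unsupported relative time: {raw!r}")
    return {
        "raw": raw,
        "start": start.isoformat(),
        "end": end.isoformat(),
        "granularity": granularity,
    }
```

With a run date of 2026-05-10, `yesterday` resolves to the 2026-05-09 range in the daily-scan example. A scheduled task could therefore store only `raw` and resolve it against each run date.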

### 2.6 constraints

Conditions for querying, evaluation, or execution.

```json
{
  "status": "overdue",
  "aging_days": ">30",
  "amount": {
    "operator": ">",
    "value": 50000,
    "currency": "CNY"
  },
  "department": "Sales",
  "risk_level": ["medium", "high"]
}
```
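The constraints above mix a string shorthand (`"aging_days": ">30"`) with a structured comparison (`amount`). A minimal sketch, assuming both are normalized into one `{operator, value}` shape before a query or rule is built; the helper names and the operator set are assumptions, and currency is carried along without any conversion.

```python
import operator
import re
from typing import Any

_OPS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "=": operator.eq, "!=": operator.ne,
}
# Two-character operators come first so ">=" is not read as ">".
_SHORTHAND = re.compile(r"^\s*(>=|<=|!=|>|<|=)\s*(-?\d+(?:\.\d+)?)\s*$")


def normalize_comparison(spec: Any) -> dict[str, Any]:
    """Turn '>30' or {'operator': '>', 'value': 50000, ...} into one shape."""
    if isinstance(spec, str):
        m = _SHORTHAND.match(spec)
        if not m:
            raise ValueError(f"bad comparison shorthand: {spec!r}")
        op, num = m.groups()
        value = float(num) if "." in num else int(num)
        return {"operator": op, "value": value}
    if isinstance(spec, dict) and spec.get("operator") in _OPS:
        return dict(spec)  # extra keys such as currency pass through unchanged
    raise ValueError(f"unsupported comparison: {spec!r}")


def matches(actual: float, spec: Any) -> bool:
    """Evaluate one comparison; currency is not checked or converted here."""
    cmp = normalize_comparison(spec)
    return _OPS[cmp["operator"]](actual, cmp["value"])
```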

### 2.7 risk_signals

Risk signals.

Suggested enumeration:

```text
duplicate_invoice
missing_attachment
policy_overrun
over_budget
overdue_receivable
bad_debt_risk
vendor_payment_risk
payment_mismatch
contract_mismatch
cashflow_pressure
mcp_unavailable
rule_quality_issue
```

### 2.8 next_step

The next action.

Suggested enumeration:

```text
answer
ask_clarification
query_database
run_rule
call_mcp
search_knowledge
create_draft
create_task
generate_report
notify_user
escalate_to_human
```
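Because `next_step` is a closed enumeration, an Orchestrator could route on it with a lookup table. The sketch is hypothetical (`make_dispatcher` and the handler signature are assumptions): it rejects handlers for undefined steps and sends unknown or unhandled steps to `escalate_to_human`, a fallback choice the document does not prescribe.

```python
from typing import Any, Callable

# next_step values from section 2.8.
NEXT_STEPS = frozenset({
    "answer", "ask_clarification", "query_database", "run_rule", "call_mcp",
    "search_knowledge", "create_draft", "create_task", "generate_report",
    "notify_user", "escalate_to_human",
})

Frame = dict[str, Any]
Handler = Callable[[Frame], str]


def make_dispatcher(handlers: dict[str, Handler]) -> Callable[[Frame], str]:
    """Build a router from next_step values to handler functions."""
    undefined = set(handlers) - NEXT_STEPS
    if undefined:
        raise ValueError(f"handlers for undefined next_step values: {sorted(undefined)}")
    if "escalate_to_human" not in handlers:
        raise ValueError("an escalate_to_human handler is required as the fallback")

    def dispatch(frame: Frame) -> str:
        # Missing or unhandled steps go to a human instead of failing silently.
        handler = handlers.get(frame.get("next_step", ""), handlers["escalate_to_human"])
        return handler(frame)

    return dispatch
```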

## 3. Extension Fields

Fields that can be added later:

```json
{
  "schema_version": "1.1",
  "confidence": 0.86,
  "ambiguity": [],
  "missing_slots": [],
  "required_capabilities": [],
  "normalized_query": "",
  "permission_scope": {},
  "audit_tags": []
}
```

2026-05-12 01:20:53 +00:00

## 4. Hybrid Semantic Parsing Architecture

A production-ready first version should not rely on keywords and regular expressions alone.

Recommended pipeline:

```text
Input context assembly
  user text + page context + attachment names + OCR/VLM summaries
    ↓
Pre-extraction
  time, amounts, document numbers, explicit objects
    ↓
LLM structured parsing
  outputs scenario / intent / entities / missing_slots / ambiguity
    ↓
Schema validation
  JSON parsing, enumeration checks, required-field checks, type normalization
    ↓
Rule fallback
  fall back to rule-based parsing when the model fails, confidence is low, or fields are missing
    ↓
Clarifying question
  no direct database query is allowed under low confidence, ambiguity, or missing slots
```
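The pipeline above can be sketched as a single function with the model and the rule parser injected as callables. Everything here is an assumption for illustration: `call_llm` and `rule_parse` stand in for real components, the 0.7 threshold is arbitrary, and `llm_primary` / `rule_fallback` are hypothetical `parse_strategy` values. Only the domain and intent enumerations are checked, to keep it short.

```python
import json
from typing import Any, Callable, Optional

DOMAINS = {"reimbursement", "accounts_receivable", "accounts_payable",
           "general_finance", "system_operation"}
INTENTS = {"query", "explain", "create", "validate", "summarize", "reconcile",
           "monitor", "predict", "remind", "generate", "optimize"}
MIN_CONFIDENCE = 0.7  # assumed threshold; the document does not fix one


def schema_errors(frame: Any) -> list[str]:
    """Schema-validation step: object type plus enumeration checks."""
    if not isinstance(frame, dict):
        return ["not_an_object"]
    errors = []
    if frame.get("domain") not in DOMAINS:
        errors.append("domain")
    if frame.get("intent") not in INTENTS:
        errors.append("intent")
    if not isinstance(frame.get("entities", []), list):
        errors.append("entities")
    return errors


def parse(text: str,
          call_llm: Callable[[str], str],
          rule_parse: Callable[[str], dict[str, Any]]) -> dict[str, Any]:
    frame: Optional[dict[str, Any]] = None
    try:
        candidate = json.loads(call_llm(text))
        if (not schema_errors(candidate)
                and candidate.get("confidence", 0.0) >= MIN_CONFIDENCE):
            frame = dict(candidate, parse_strategy="llm_primary")
    except (json.JSONDecodeError, TimeoutError, ConnectionError):
        pass  # model failure: fall through to the rule parser
    if frame is None:
        frame = dict(rule_parse(text), parse_strategy="rule_fallback")
    # Clarification gate: a frame without confidence counts as low confidence;
    # ambiguity, missing slots, or schema errors also block any tool call.
    if (frame.get("confidence", 0.0) < MIN_CONFIDENCE or frame.get("ambiguity")
            or frame.get("missing_slots") or schema_errors(frame)):
        frame["next_step"] = "ask_clarification"
    return frame
```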
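
Design principles:

- The model is primarily responsible for understanding intent and scenario.
- Rules are primarily responsible for validation, completion, and fallback.
- Attachment names, OCR output, and VLM output are evidence only; they are not confirmed facts.
- Every semantic output must be tagged with its confidence and source.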

## 5. Recommended New Fields

To support model-first parsing, add at least the following to the extension fields:

```json
{
  "missing_slots": [],
  "ambiguity": [],
  "field_confidence": {},
  "field_source": {},
  "attachment_context": [],
  "parse_strategy": "llm_primary_with_rule_fallback"
}
```

Field descriptions:

- `missing_slots`: which key fields are still missing, e.g. expense type, document number, customer organization.
- `ambiguity`: the candidate interpretations that could currently be confused with one another.
- `field_confidence`: per-field confidence, rather than only an overall score.
- `field_source`: whether each field came from `llm`, `rule`, `ocr`, `vlm`, or `user_context`.
- `attachment_context`: summaries of the attachments available to semantic parsing for this request.
- `parse_strategy`: records whether this parse used the model as the primary parser or fell back to rules.
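Per-field confidence and source make it possible to decide which individual fields need confirmation. A minimal sketch, assuming a 0.7 threshold and treating `ocr` and `vlm` sources as evidence that always needs confirmation, in line with the principle that attachment-derived results are not confirmed facts; `fields_needing_confirmation` is a hypothetical name.

```python
from collections.abc import Mapping

# Sources that interpret the request itself; ocr and vlm only supply evidence.
REQUEST_SOURCES = {"llm", "rule", "user_context"}


def fields_needing_confirmation(field_confidence: Mapping[str, float],
                                field_source: Mapping[str, str],
                                threshold: float = 0.7) -> list[str]:
    """List fields that should be confirmed before a tool acts on them."""
    flagged = []
    for name in sorted(set(field_confidence) | set(field_source)):
        conf = field_confidence.get(name, 0.0)      # no score counts as untrusted
        source = field_source.get(name, "unknown")  # no source counts as untrusted
        if conf < threshold or source not in REQUEST_SOURCES:
            flagged.append(name)
    return flagged
```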

## 6. Narrative Financial Input

The semantic layer must support natural narratives that are not phrased as queries.

Typical examples:

```text
I went to a client site today and entertained the client; it cost 1000 yuan
I paid for taxis and meals out of pocket; help me figure out how to claim them
I uploaded three invoices; help me organize them into a reimbursement draft
```

Such input must not default to `query`.

Recommended default strategy:

- Prefer the `reimbursement` domain.
- Prefer `daily_expense`, `travel_reimbursement`, or `attachment_review` as the scenario.
- Prefer `create`, `generate`, or `validate` as the intent.
- When key fields are missing, return `ask_clarification` instead of querying the database directly.
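The default strategy for narrative input can be sketched as: fill the domain and intent defaults, compute `missing_slots` from a per-scenario table, then choose between drafting and asking. The `REQUIRED_SLOTS` table, its slot names, and `plan_narrative` are hypothetical; the document only names expense type, document number, and customer organization as typical missing slots.

```python
from typing import Any

# Hypothetical required slots per scenario (names are illustrative only).
REQUIRED_SLOTS: dict[str, tuple[str, ...]] = {
    "daily_expense": ("expense_type", "amount", "expense_date"),
    "travel_reimbursement": ("expense_type", "amount", "trip_dates"),
    "attachment_review": ("attachments",),
}


def plan_narrative(frame: dict[str, Any], slots: dict[str, Any]) -> dict[str, Any]:
    """Apply the narrative-input defaults, then choose drafting or asking."""
    out = dict(frame)
    out.setdefault("domain", "reimbursement")
    # Narrative input must not default to query; use create instead.
    if out.get("intent") in (None, "", "query"):
        out["intent"] = "create"
    scenario = out.get("scenario", "")
    if scenario not in REQUIRED_SLOTS:
        out["missing_slots"] = ["scenario"]  # unknown scenario: ask what this is
    else:
        out["missing_slots"] = [s for s in REQUIRED_SLOTS[scenario]
                                if slots.get(s) in (None, "", [])]
    out["next_step"] = "ask_clarification" if out["missing_slots"] else "create_draft"
    return out
```

For the taxi-and-meals example, the expense type is present but the amount and date are not, so the result is `ask_clarification` rather than a database query.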

## 7. Vague Short Inputs and Clarification Rules

The following inputs should trigger a clarifying question first:

```text
I want to get reimbursed
Why hasn't this been processed yet
Take a look at this for me
It's uploaded, what's next
```

Handling principles:

- Do not execute tools directly.
- Do not route directly to accounts receivable or accounts payable queries.
- A clarifying question must be generated.
- The reason that triggered the clarifying question must be recorded in the audit log.
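A sketch of these rules, assuming vagueness is judged from the parsed frame (no resolved scenario, ambiguity, or missing slots) rather than from keywords: any planned step is replaced by `ask_clarification`, a question is attached, and an audit record captures the triggering reasons and the step that was blocked. The question templates, the `clarifying_question` key, and the audit record layout are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any

# Hypothetical question templates keyed by the reason that triggered clarification.
QUESTIONS = {
    "scenario_unresolved": "What would you like to do: submit an expense, check a status, or something else?",
    "ambiguous": "Your request could mean several things. Which one did you mean?",
    "missing_slots": "Please provide the missing details: {slots}.",
}


def clarification_reasons(frame: dict[str, Any]) -> list[str]:
    reasons = []
    if not frame.get("scenario"):
        reasons.append("scenario_unresolved")
    if frame.get("ambiguity"):
        reasons.append("ambiguous")
    if frame.get("missing_slots"):
        reasons.append("missing_slots")
    return reasons


def gate(frame: dict[str, Any], audit_log: list[dict[str, Any]]) -> dict[str, Any]:
    """Replace any planned step with a clarifying question and audit why."""
    reasons = clarification_reasons(frame)
    if not reasons:
        return frame
    slots = ", ".join(frame.get("missing_slots", []))
    # str.format ignores the unused slots argument for templates without it.
    gated = dict(frame, next_step="ask_clarification",
                 clarifying_question=QUESTIONS[reasons[0]].format(slots=slots))
    audit_log.append({
        "event": "clarification_triggered",
        "reasons": reasons,
        "blocked_next_step": frame.get("next_step"),
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return gated
```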

Extension principles (for the extension fields in section 3):

- For now, do not turn every field into a database column.
- Store semantic results as JSONB.
- Manage versions with `schema_version`.
- The Orchestrator depends only on stable fields.
- Add new fields as optional so they do not affect existing tasks.
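Under these principles, a consumer reads the stored JSON, checks `schema_version`, and projects the result onto the stable fields, so rows written before and after an optional field was added both load. A sketch under stated assumptions: rows without `schema_version` are treated as `1.0`, and minor versions are assumed to add only optional fields; `load_for_orchestrator` and the constants are hypothetical names.

```python
import json
from typing import Any, Union

# The eight version-1 fields the Orchestrator is allowed to depend on.
STABLE_FIELDS = ("domain", "scenario", "intent", "entities", "time_range",
                 "constraints", "risk_signals", "next_step")
# Factories so every call gets fresh empty containers; other fields default to "".
DEFAULT_FACTORY = {"entities": list, "time_range": dict,
                   "constraints": dict, "risk_signals": list}
SUPPORTED_MAJOR = 1  # assumption: minor versions only add optional fields


def load_for_orchestrator(stored: Union[str, dict[str, Any]]) -> dict[str, Any]:
    """Project a stored semantic result onto the stable fields.

    Accepts JSON text or an already-decoded dict (some drivers decode jsonb).
    Unknown optional fields are dropped; absent stable fields get empty defaults.
    """
    data = json.loads(stored) if isinstance(stored, str) else dict(stored)
    version = str(data.get("schema_version", "1.0"))  # assumed default for old rows
    if int(version.split(".")[0]) != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported schema_version: {version}")
    out = {name: data[name] if name in data else DEFAULT_FACTORY.get(name, str)()
           for name in STABLE_FIELDS}
    out["schema_version"] = version
    return out
```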

## 4. Examples

### 4.1 User Queries Receivable Aging

The user asks:

```text
Which customers had receivables more than 30 days overdue last month?
```

Parse result:

```json
{
  "domain": "accounts_receivable",
  "scenario": "receivable_aging",
  "intent": "query",
  "entities": [
    {
      "type": "customer",
      "value": "customers",
      "role": "group_by"
    }
  ],
  "time_range": {
    "raw": "last month",
    "start": "2026-04-01",
    "end": "2026-04-30",
    "granularity": "month"
  },
  "constraints": {
    "aging_days": ">30",
    "status": "overdue"
  },
  "risk_signals": ["overdue_receivable"],
  "next_step": "query_database"
}
```

### 4.2 User Asks Why an Invoice Was Blocked

The user asks:

```text
Why was this invoice blocked from reimbursement?
```

Parse result:

```json
{
  "domain": "reimbursement",
  "scenario": "invoice_validation",
  "intent": "explain",
  "entities": [
    {
      "type": "invoice",
      "value": "this invoice",
      "role": "target"
    }
  ],
  "time_range": {},
  "constraints": {},
  "risk_signals": ["unknown"],
  "next_step": "run_rule"
}
```

### 4.3 Hermes Daily Risk Scan

Task configuration:

```json
{
  "domain": "reimbursement",
  "scenario": "daily_risk_scan",
  "intent": "monitor",
  "entities": [],
  "time_range": {
    "raw": "yesterday"
  },
  "constraints": {
    "risk_level": ["medium", "high"]
  },
  "risk_signals": [
    "duplicate_invoice",
    "missing_attachment",
    "policy_overrun"
  ],
  "next_step": "run_rule"
}
```