# Data Contracts and Governance Requirements

## 1. Recommended Tables

### 1.1 Semantic Ontology

```text
semantic_ontology_schemas
```

Fields:

```text
id
schema_version
schema_json
status
created_by
created_at
updated_at
```

```text
semantic_parse_logs
```

Fields:

```text
id
source
user_id
raw_text
ontology_json
confidence
parse_strategy
created_at
```

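As a rough sketch of how `semantic_parse_logs` might be materialized, the following uses SQLite from Python's standard library. The column types, the `NOT NULL` constraints, the sample values, and the `log_parse` helper are all assumptions; the list above only names the columns.

```python
import json
import sqlite3

# Column types and constraints are assumptions; the source names only the columns.
DDL = """
CREATE TABLE semantic_parse_logs (
    id             INTEGER PRIMARY KEY,
    source         TEXT NOT NULL,
    user_id        TEXT NOT NULL,
    raw_text       TEXT NOT NULL,
    ontology_json  TEXT NOT NULL,
    confidence     REAL,
    parse_strategy TEXT,
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def log_parse(conn, source, user_id, raw_text, ontology, confidence, parse_strategy):
    """Insert one parse log row and return its id (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO semantic_parse_logs "
        "(source, user_id, raw_text, ontology_json, confidence, parse_strategy) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (source, user_id, raw_text, json.dumps(ontology, ensure_ascii=False),
         confidence, parse_strategy),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
row_id = log_parse(
    conn, "user_message", "emp_001", "Why was this invoice blocked?",
    {"domain": "reimbursement", "intent": "explain"}, 0.9, "llm_primary",
)
```

Storing `ontology_json` as serialized text keeps the sketch portable; a database with a native JSON column type could use that instead.
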
### 1.2 Agent Assets

```text
agent_rules
agent_skills
agent_mcp_services
agent_tasks
```

Common fields:

```text
id
code
name
description
status
owner
reviewer
config_json
created_at
updated_at
```

### 1.3 Versions and Reviews

```text
agent_asset_versions
```

Fields:

```text
id
asset_type
asset_id
version
content
change_note
created_by
created_at
```

```text
agent_asset_reviews
```

Fields:

```text
id
asset_type
asset_id
version
reviewer
review_status
review_note
reviewed_at
```

### 1.4 Run Logs

```text
agent_runs
```

Fields:

```text
id
agent
source
task_id
user_id
ontology_json
status
started_at
finished_at
result_summary
error_message
```

```text
agent_tool_calls
```

Fields:

```text
id
run_id
tool_type
tool_name
request_json
response_json
status
duration_ms
created_at
```

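One way to fill an `agent_tool_calls` record, including `duration_ms`, is to wrap each tool invocation in a context manager. This is a minimal sketch: `record_tool_call`, the in-memory `calls` list, the tool name, and the status values `running`, `succeeded`, and `failed` are assumptions, not part of the contract above.

```python
import time
from contextlib import contextmanager


@contextmanager
def record_tool_call(calls, run_id, tool_type, tool_name, request):
    """Append one agent_tool_calls-style record to `calls`, timing the body."""
    record = {
        "run_id": run_id,
        "tool_type": tool_type,
        "tool_name": tool_name,
        "request_json": request,
        "response_json": None,
        "status": "running",  # assumed status value
        "duration_ms": None,
    }
    started = time.perf_counter()
    try:
        yield record
        record["status"] = "succeeded"  # assumed status value
    except Exception as exc:
        record["status"] = "failed"  # assumed status value
        record["response_json"] = {"error": str(exc)}
        raise
    finally:
        # Always record the elapsed time, whether the call succeeded or failed.
        record["duration_ms"] = int((time.perf_counter() - started) * 1000)
        calls.append(record)


calls = []
with record_tool_call(calls, "run_001", "mcp", "invoice_lookup", {"invoice_no": "X1"}) as call:
    call["response_json"] = {"found": True}  # stand-in for a real tool invocation
```

Because the record is appended in `finally`, a failed call still leaves a row behind, which fits the auditing requirement later in this document.
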
### 1.5 Core Finance Tables

```text
expense_claims
expense_claim_items
accounts_receivable
accounts_payable
approval_records
```

Governance requirements:

- `expense_claims` is the primary reimbursement table; do not extend `reimbursement_requests` any further.
- `expense_claim_items` is the finest granularity for reimbursement line items; OCR matching, risk detection, and document attachment should attach at this level first.
- `accounts_receivable` and `accounts_payable` stay separate, so that integrating the Agent semantic layer does not blur their definitions.

### 1.6 Document and File Asset Tables

```text
document_assets
document_asset_versions
document_derivatives
expense_item_documents
document_access_logs
```

Responsibilities:

- `document_assets`: primary index of original attachments
- `document_asset_versions`: version history of originals
- `document_derivatives`: previews, thumbnails, redacted copies, and per-page images
- `expense_item_documents`: links between expense line items and documents
- `document_access_logs`: audit trail for preview, download, and export

### 1.7 OCR, Verification, and Risk Tables

```text
document_ocr_results
invoice_structured_records
invoice_verification_records
risk_events
risk_actions
```

Responsibilities:

- `document_ocr_results`: snapshot of each OCR run
- `invoice_structured_records`: normalized invoice fields
- `invoice_verification_records`: record of invoice verification results
- `risk_events`: risk hits, recorded as facts
- `risk_actions`: actions taken in response to risks

## 2. API Contracts

### 2.1 Semantic Parsing

```text
POST /api/v1/semantic/parse
```

Request:

```json
{
  "source": "user_message",
  "text": "Why was this invoice blocked?",
  "context": {
    "user_id": "emp_001",
    "current_page": "reimbursement_detail"
  }
}
```

Response:

```json
{
  "domain": "reimbursement",
  "scenario": "invoice_validation",
  "intent": "explain",
  "entities": [],
  "time_range": {},
  "constraints": {},
  "risk_signals": ["unknown"],
  "parse_strategy": "llm_primary",
  "next_step": "run_rule"
}
```

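A caller may want to check the shape of a parse response before handing it to the orchestrator. The sketch below derives the expected field names and JSON types from the example response; treating every field as required, and the `validate_parse_response` helper itself, are assumptions.

```python
# Expected top-level fields and their JSON types, taken from the example response.
# Treating every field as required is an assumption.
EXPECTED_FIELDS = {
    "domain": str,
    "scenario": str,
    "intent": str,
    "entities": list,
    "time_range": dict,
    "constraints": dict,
    "risk_signals": list,
    "parse_strategy": str,
    "next_step": str,
}


def validate_parse_response(payload):
    """Return a list of problems; an empty list means the shape matches."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            problems.append(f"wrong type for {name}: expected {expected_type.__name__}")
    return problems


example = {
    "domain": "reimbursement",
    "scenario": "invoice_validation",
    "intent": "explain",
    "entities": [],
    "time_range": {},
    "constraints": {},
    "risk_signals": ["unknown"],
    "parse_strategy": "llm_primary",
    "next_step": "run_rule",
}
problems = validate_parse_response(example)
```
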
### 2.2 Orchestrator Execution

```text
POST /api/v1/agent/orchestrate
```

Request:

```json
{
  "source": "user_message",
  "ontology": {},
  "context": {}
}
```

Response:

```json
{
  "agent": "user_agent",
  "tools_called": [],
  "answer": "",
  "requires_confirmation": false,
  "audit_id": ""
}
```

### 2.3 File Upload Contract

```text
POST /api/v1/documents/upload
```

Request:

```json
{
  "biz_domain": "expense",
  "biz_object_type": "expense_claim",
  "biz_object_id": "claim_001",
  "upload_source": "user_workbench",
  "files": [
    {
      "filename": "invoice.jpg",
      "mime_type": "image/jpeg"
    }
  ]
}
```

Response:

```json
{
  "documents": [
    {
      "document_id": "",
      "version_no": 1,
      "storage_status": "stored",
      "ocr_status": "pending"
    }
  ]
}
```

### 2.4 Hermes Tasks

```text
POST /api/v1/hermes/tasks/run
```

Request:

```json
{
  "task_code": "daily_risk_scan",
  "ontology": {},
  "dry_run": false,
  "context_json": {
    "folder": "报销制度",
    "changed_only": true,
    "force": false
  }
}
```

Response:

```json
{
  "run_id": "",
  "status": "accepted"
}
```

Additional notes:

- Hermes tasks should preferably invoke the Hermes CLI running in the system backend, or an equivalent Hermes process.
- When `changed_only=true`, only knowledge-base documents that have changed are processed.
- Change detection must compare at least `original_name`, `stored_name`, `sha256`, `version_number`, and `updated_at`.
- If a document has not changed, return `unchanged_skipped` instead of regenerating the LLM Wiki.

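The change-detection rule in the notes above can be sketched as a signature comparison. In this sketch the `id` key, the `process` label, the `last_seen` store, and the idea that `force` overrides the skip are assumptions; only the five compared fields and `unchanged_skipped` come from the contract.

```python
# Fields the contract says change detection must compare, at minimum.
CHANGE_FIELDS = ("original_name", "stored_name", "sha256", "version_number", "updated_at")


def change_signature(doc):
    """Collapse the compared fields into one tuple for equality checks."""
    return tuple(doc.get(field) for field in CHANGE_FIELDS)


def plan_documents(current_docs, last_seen, changed_only=True, force=False):
    """Map each document id to "process" or "unchanged_skipped".

    `last_seen` maps a document id to the signature stored by the previous run.
    Letting `force` override the skip is an assumption about that flag.
    """
    plan = {}
    for doc in current_docs:
        unchanged = last_seen.get(doc["id"]) == change_signature(doc)
        if changed_only and unchanged and not force:
            plan[doc["id"]] = "unchanged_skipped"
        else:
            plan[doc["id"]] = "process"
    return plan


old = {"id": "doc_1", "original_name": "policy.pdf", "stored_name": "a1.pdf",
       "sha256": "aaa", "version_number": 1, "updated_at": "2024-01-01T00:00:00Z"}
new = dict(old, id="doc_2", sha256="bbb", version_number=2)
last_seen = {
    "doc_1": change_signature(old),
    "doc_2": change_signature(dict(new, sha256="aaa", version_number=1)),
}
plan = plan_documents([old, new], last_seen)
```

A document that has never been seen has no stored signature, so it compares as changed and is processed.
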
## 3. Security Principles

### 3.1 Least Privilege

Agents must not use superuser privileges when calling tools.

Permission sources:

- User permissions
- Task permissions
- Service account permissions

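The text names three permission sources but does not say how they combine. One conservative reading, sketched here as an assumption, is that a tool call is allowed only when all three sources grant the permission. The permission strings and both helper functions are hypothetical.

```python
def effective_permissions(user_perms, task_perms, service_perms):
    """Intersect the three sources so no single source can widen access."""
    return frozenset(user_perms) & frozenset(task_perms) & frozenset(service_perms)


def can_call_tool(required_permission, user_perms, task_perms, service_perms):
    """Allow a tool call only if every source grants the required permission."""
    return required_permission in effective_permissions(user_perms, task_perms, service_perms)


# Hypothetical permission strings, for illustration only.
user_perms = {"claims:read", "claims:submit"}
task_perms = {"claims:read", "documents:read"}
service_perms = {"claims:read", "claims:submit", "documents:read"}
granted = effective_permissions(user_perms, task_perms, service_perms)
```

With an intersection, a broad service account cannot lift an Agent above what the triggering user or task is allowed to do.
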
### 3.2 Confirmation for High-Risk Actions

The following actions require confirmation:

- Submitting a reimbursement
- Initiating a payment
- Generating a formal approval opinion
- Publishing rules
- Publishing the knowledge base
- Creating external notifications

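A confirmation gate for the actions listed above could look like the following sketch. The action identifiers and the `execute` helper are hypothetical; the `requires_confirmation` and `answer` keys mirror the orchestrator response in section 2.2.

```python
from enum import Enum


class HighRiskAction(str, Enum):
    """Hypothetical identifiers for the six actions that require confirmation."""
    SUBMIT_REIMBURSEMENT = "submit_reimbursement"
    INITIATE_PAYMENT = "initiate_payment"
    GENERATE_APPROVAL_OPINION = "generate_approval_opinion"
    PUBLISH_RULE = "publish_rule"
    PUBLISH_KNOWLEDGE_BASE = "publish_knowledge_base"
    CREATE_EXTERNAL_NOTIFICATION = "create_external_notification"


HIGH_RISK = frozenset(action.value for action in HighRiskAction)


def requires_confirmation(action):
    """Return True when the action is on the high-risk list."""
    return action in HIGH_RISK


def execute(action, confirmed, run):
    """Run `run()` only if the action is low risk or already confirmed."""
    if requires_confirmation(action) and not confirmed:
        return {"requires_confirmation": True, "answer": ""}
    return {"requires_confirmation": False, "answer": run()}


pending = execute("initiate_payment", confirmed=False, run=lambda: "paid")
done = execute("initiate_payment", confirmed=True, run=lambda: "paid")
```
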
### 3.3 Auditing Is Mandatory

The following must be recorded:

- Who triggered the action
- What the input was
- What the parse result was
- Which tools were called
- What the output was
- Whether the action was confirmed

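The six items above map naturally onto one immutable record per run. The following is a sketch under assumptions: the field names, the generated `audit_id`, the `recorded_at` timestamp, and the JSON-line serialization are illustrative, not a prescribed schema.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One audit entry covering the six required items (field names assumed)."""
    triggered_by: str    # who triggered the action
    input_payload: dict  # what the input was
    parse_result: dict   # what the parse result was
    tools_called: list   # which tools were called
    output: str          # what the output was
    confirmed: bool      # whether the action was confirmed
    audit_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(record):
    """Serialize a record as one JSON line for an append-only log."""
    return json.dumps(asdict(record), ensure_ascii=False)


record = AuditRecord(
    triggered_by="emp_001",
    input_payload={"text": "Why was this invoice blocked?"},
    parse_result={"intent": "explain"},
    tools_called=["run_rule"],
    output="The invoice matched a duplicate fingerprint.",
    confirmed=False,
)
line = to_json_line(record)
```
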
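### 3.4 File Storage Governance

The following rules apply:

- Raw file binaries must not be stored in core business tables or in large blob columns.
- Every file must have `storage_provider`, `storage_key`, `sha256`, `file_size_bytes`, and `mime_type`.
- Originals must never be overwritten; changes are recorded as new versions.
- Deletion defaults to unlinking from the business object or a soft delete; physical deletion must go through an audited process.
- Object storage must be accessed through signed URLs or a backend proxy; fixed public URLs must not be exposed directly.
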
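The required metadata and the append-only rule can be illustrated with a small sketch. The helper names, the in-memory `versions` list, and the sample provider and key values are assumptions; the five metadata fields come from the list above, and `version_no` from the upload response.

```python
import hashlib


def describe_file(data, storage_provider, storage_key, mime_type):
    """Build the five required metadata fields for one stored file."""
    return {
        "storage_provider": storage_provider,
        "storage_key": storage_key,
        "sha256": hashlib.sha256(data).hexdigest(),
        "file_size_bytes": len(data),
        "mime_type": mime_type,
    }


def add_version(versions, metadata):
    """Append a new version instead of overwriting; return its version number."""
    version_no = len(versions) + 1
    versions.append({"version_no": version_no, **metadata})
    return version_no


versions = []
first = add_version(
    versions, describe_file(b"scan-1", "s3", "claims/claim_001/v1.jpg", "image/jpeg")
)
second = add_version(
    versions, describe_file(b"scan-2", "s3", "claims/claim_001/v2.jpg", "image/jpeg")
)
```

Each version keeps its own `storage_key`, so the original bytes stay retrievable after a newer upload.
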
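### 3.5 Sensitive Data Governance

For sensitive information in invoices, itineraries, contracts, and payment vouchers:

- Redacted derivatives should be supported
- Views and downloads should be logged
- Visibility should be scoped separately for applicants, approvers, finance staff, and administrators
- A `legal_hold` retention policy should be supported for disputed documents
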
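### 3.6 AI Evidence Governance

Agent and OCR capabilities must follow these rules:

- Attachment content must not be assumed known unless it has actually been parsed by OCR/VLM.
- When Agent output cites an invoice amount, number, or date, it must be traceable to `invoice_structured_records` or to a manual correction record.
- When a risk explanation cites a judgment such as "duplicate reimbursement" or "amount mismatch", it must be traceable to `risk_events.evidence_json`.
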
## 4. Data Quality Requirements

### 4.1 Key Uniqueness

- `expense_claims.claim_no` is unique
- `document_assets.sha256` may repeat but must be searchable
- `document_asset_versions(document_id, version_no)` is unique
- `invoice_structured_records.duplicate_fingerprint` must be indexable

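These four rules translate directly into constraints and indexes. The sketch below uses SQLite for illustration; the column types, index names, reduced column sets, and the `insert_claim` helper are assumptions.

```python
import sqlite3

# Only the columns relevant to the uniqueness rules are shown; types are assumed.
SCHEMA = """
CREATE TABLE expense_claims (
    id       INTEGER PRIMARY KEY,
    claim_no TEXT NOT NULL UNIQUE
);
CREATE TABLE document_assets (
    id     INTEGER PRIMARY KEY,
    sha256 TEXT NOT NULL
);
CREATE INDEX idx_document_assets_sha256 ON document_assets (sha256);
CREATE TABLE document_asset_versions (
    id          INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document_assets (id),
    version_no  INTEGER NOT NULL,
    UNIQUE (document_id, version_no)
);
CREATE TABLE invoice_structured_records (
    id                    INTEGER PRIMARY KEY,
    duplicate_fingerprint TEXT
);
CREATE INDEX idx_invoice_duplicate_fingerprint
    ON invoice_structured_records (duplicate_fingerprint);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)


def insert_claim(conn, claim_no):
    """Return False instead of raising when claim_no already exists."""
    try:
        conn.execute("INSERT INTO expense_claims (claim_no) VALUES (?)", (claim_no,))
        return True
    except sqlite3.IntegrityError:
        return False


first_insert = insert_claim(conn, "EC-0001")
second_insert = insert_claim(conn, "EC-0001")
```

`document_assets.sha256` gets a plain index, not a unique one, so the same file can be uploaded twice and both rows remain findable by hash.
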
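### 4.2 Timestamp and Status Fields

- All core business tables must have `created_at` and `updated_at`
- File upload, OCR, verification, risk control, and disposition must each have their own timestamp
- Status fields should use controlled enums; free-form spelling by the frontend is not allowed
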
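A controlled enum can be enforced at the API boundary so that free-form spellings are rejected early. In this sketch only `pending` comes from the upload response example; the other members and the `parse_ocr_status` helper are assumptions.

```python
from enum import Enum


class OcrStatus(str, Enum):
    PENDING = "pending"      # appears in the upload response example
    RUNNING = "running"      # assumed value
    SUCCEEDED = "succeeded"  # assumed value
    FAILED = "failed"        # assumed value


def parse_ocr_status(raw):
    """Convert a raw string to OcrStatus, rejecting anything outside the enum."""
    try:
        return OcrStatus(raw)
    except ValueError:
        raise ValueError(f"unknown ocr_status: {raw!r}") from None


status = parse_ocr_status("pending")
```

Matching is exact, so a variant such as `"Pending"` is rejected instead of being silently normalized.
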
### 4.3 Traceability

Any reimbursement claim, invoice, or risk conclusion should be traceable to at least:

- The original input text
- The original attachments
- The structured results
- The rule or model judgments
- Any manual corrections

## 5. Implementation Priorities

First priority:

- `expense_claims`
- `expense_claim_items`
- `document_assets`
- `document_asset_versions`
- `expense_item_documents`

Second priority:

- `document_ocr_results`
- `invoice_structured_records`
- `invoice_verification_records`
- `document_derivatives`

Third priority:

- `risk_events`
- `risk_actions`
- `document_access_logs`

Implementation principles:

- First, make sure originals can be received, stored, and retrieved
- Next, make sure documents can be recognized, verified, and backfilled
- Finally, add explanation, auditing, and batch inspection