
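# OCR Invoice Recognition Architecture

## 1. Positioning

OCR invoice recognition is not a simple image-to-text feature.

Within X-Financial it has four responsibilities:

1. Turn user-uploaded attachments into structured invoice data.
2. Provide the rule center with fields it can evaluate.
3. Provide User Agent and Hermes with explainable evidence.
4. Preserve traceable originals for later audit, review, and dispute handling.

OCR should therefore be registered in the Capability Registry as an independent capability.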

```text
capability_type = mcp | document_processor
capability_code = invoice_ocr
```

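## 2. End-to-End Pipeline

```text
Attachment upload
  ↓
File persistence / object storage
  ↓
File classification
  ↓
OCR recognition
  ↓
Field structuring
  ↓
Invoice type normalization
  ↓
Invoice verification MCP
  ↓
Matching against expense items
  ↓
Rule center checks
  ↓
Manual correction
  ↓
Correction feedback retained
```

Key principles:

- Persist the file first, then run OCR. Running OCR purely in memory and discarding the file afterwards is not allowed.
- Originals must never be overwritten; only new versions may be added.
- An Agent must not assume image content is known; it may cite attachment content only after OCR/VLM has actually parsed it.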

## 3. Phases

### Phase A: Attachment Intake and File Classification

Goal: first identify what was uploaded.

Input:

- Images
- PDF
- Excel
- Word
- Archives

Output:

```json
{
  "document_type": "invoice",
  "mime_type": "image/png",
  "page_count": 1,
  "confidence": 0.91
}
```

Classification results:

```text
invoice
itinerary
contract
payment_receipt
approval_screenshot
other
```
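
As a rough illustration of the intake step, the sketch below sniffs the MIME type from a file's leading bytes rather than trusting its extension, and returns a record shaped like the Phase A output. The function names and the `MAGIC_PREFIXES` table are hypothetical; the actual `document_type` decision would come from a real classifier, which is not modeled here.

```python
# Hypothetical sketch: detect the MIME type from magic bytes at intake time.
# Names below are illustrative assumptions, not part of X-Financial.
MAGIC_PREFIXES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),  # also the container for .xlsx / .docx
]


def sniff_mime_type(head: bytes) -> str:
    """Return a MIME type guessed from the first bytes of the file."""
    for prefix, mime in MAGIC_PREFIXES:
        if head.startswith(prefix):
            return mime
    return "application/octet-stream"


def classify_attachment(head: bytes, page_count: int = 1) -> dict:
    """Build a Phase A style record.

    document_type stays "other" with zero confidence until a real
    classifier (not shown) has looked at the content.
    """
    return {
        "document_type": "other",
        "mime_type": sniff_mime_type(head),
        "page_count": page_count,
        "confidence": 0.0,
    }
```

Because Office files and archives share the ZIP signature, a real implementation would need to inspect the container before distinguishing Excel, Word, and plain archives.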

### Phase B: OCR Field Extraction

Goal: extract invoice fields from an image or PDF.

Structure:

```json
{
  "invoice_code": "",
  "invoice_number": "",
  "seller_name": "",
  "seller_tax_no": "",
  "buyer_name": "",
  "buyer_tax_no": "",
  "issue_date": "",
  "total_amount": 0,
  "tax_amount": 0,
  "currency": "CNY",
  "ocr_confidence": 0.88
}
```

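### Phase C: Field Normalization

Goal: different OCR services return different field names, so they must be mapped to one schema.

Example:

```text
发票号码 / invoiceNo / invoice_number
  -> invoice_number
```

Amount normalization: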

```json
{
  "raw": "￥1,280.00",
  "value": 1280.00,
  "currency": "CNY"
}
```
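
A minimal sketch of both normalizations, assuming a simple alias table and a currency-symbol prefix. `FIELD_ALIASES`, `CURRENCY_SYMBOLS`, and the function names are illustrative. The sketch parses amounts into `Decimal` rather than float, which is a design choice made here to avoid binary rounding on money and is not something the structure above prescribes.

```python
import re
from decimal import Decimal

# Hypothetical alias table: vendor-specific field names -> canonical name.
FIELD_ALIASES = {
    "发票号码": "invoice_number",
    "invoiceNo": "invoice_number",
    "invoice_number": "invoice_number",
}

# Assumed mapping of leading currency symbols to ISO codes.
CURRENCY_SYMBOLS = {"￥": "CNY", "¥": "CNY"}


def normalize_field_name(name: str) -> str:
    """Map a raw OCR field name to its canonical name; unknown names pass through."""
    key = name.strip()
    return FIELD_ALIASES.get(key, key)


def normalize_amount(raw: str, default_currency: str = "CNY") -> dict:
    """Parse a printed amount such as '￥1,280.00' into raw / value / currency."""
    text = raw.strip()
    currency = default_currency
    for symbol, code in CURRENCY_SYMBOLS.items():
        if text.startswith(symbol):
            currency = code
            text = text[len(symbol):]
            break
    cleaned = re.sub(r"[,\s]", "", text)  # drop thousands separators and spaces
    return {"raw": raw, "value": Decimal(cleaned), "currency": currency}
```

Unparseable input raises `decimal.InvalidOperation`, which a caller could route to manual correction instead of silently storing a zero.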

### Phase D: Verification and Status Check

Call the invoice verification MCP.

Output:

```json
{
  "verify_status": "verified",
  "voided": false,
  "red_reversed": false,
  "verified_at": ""
}
```

### Phase E: Matching Against Expense Items

Compare:

- Invoice amount vs claimed amount
- Issue date vs expense date
- Seller vs merchant
- Invoice type vs expense type

Output:

```json
{
  "match_status": "matched",
  "mismatch_fields": [],
  "match_confidence": 0.94
}
```
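
The comparison could be sketched as below. The dataclasses, the `max_day_gap` tolerance, and the `"mismatched"` status value are assumptions made for illustration. The invoice-type check and the `match_confidence` score are left out because the document does not define how they are computed.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class InvoiceRecord:  # hypothetical subset of the normalized invoice fields
    total_amount: Decimal
    issue_date: date
    seller_name: str


@dataclass
class ExpenseItem:  # hypothetical shape of one expense claim line
    amount: Decimal
    expense_date: date
    merchant: str


def match_invoice(invoice: InvoiceRecord, item: ExpenseItem, max_day_gap: int = 31) -> dict:
    """Compare an invoice with an expense item and list the fields that disagree."""
    mismatches = []
    if invoice.total_amount != item.amount:
        mismatches.append("total_amount")
    # Assumed tolerance: issue date may differ from the expense date by a few weeks.
    if abs((invoice.issue_date - item.expense_date).days) > max_day_gap:
        mismatches.append("issue_date")
    if invoice.seller_name.strip() != item.merchant.strip():
        mismatches.append("seller_name")
    return {
        "match_status": "matched" if not mismatches else "mismatched",
        "mismatch_fields": mismatches,
    }
```

Exact string equality on seller names is deliberately naive; real merchant names usually need fuzzy or alias-based matching.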

### Phase F: Manual Correction and Feedback

OCR results must be manually correctable.

Corrections go into the feedback pool:

```json
{
  "field": "invoice_number",
  "before": "12345B",
  "after": "123456",
  "corrected_by": "finance_user",
  "corrected_at": ""
}
```
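
One way to produce such a record, sketched under the assumption that the structured record is a plain dict: the correction is applied to a copy so the raw OCR output stays untouched, and the before/after pair is returned for the feedback pool. `apply_correction` is a hypothetical helper, not an existing API.

```python
from datetime import datetime, timezone
from typing import Tuple


def apply_correction(record: dict, field: str, after: str, corrected_by: str) -> Tuple[dict, dict]:
    """Return (corrected_record, feedback_entry) without mutating the input."""
    corrected = dict(record)  # shallow copy keeps the original OCR result intact
    feedback = {
        "field": field,
        "before": record.get(field),
        "after": after,
        "corrected_by": corrected_by,
        "corrected_at": datetime.now(timezone.utc).isoformat(),
    }
    corrected[field] = after
    return corrected, feedback
```

Keeping the original value in `before` is what lets later analysis see which fields a given OCR service tends to misread.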

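## 4. File Storage Strategy

### 4.1 Why Files Should Not Go Straight into the Database

- Original invoices, contracts, and itineraries are large and bloat database rows noticeably.
- Previews, thumbnails, per-page images, and redacted copies are derivative files and should not be stored alongside business rows.
- Financial originals need version history and immutable traceability, which a file system or object storage handles better.

Conclusion:

- File binaries live in the file system or object storage.
- The database stores only metadata, indexes, versions, OCR results, verification results, access audit records, and business associations.

### 4.2 Development Environment Directory Layout

The root directory is `STORAGE_ROOT_DIR` from the backend configuration.

Suggested layout: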

```text
<STORAGE_ROOT_DIR>/
  finance-documents/
    expense_claim/
      2026/
        05/
          <claim_id>/
            <document_id>/
              v1/
                original/
                  source.jpg
                preview/
                  preview.pdf
                pages/
                  page-1.png
                thumbs/
                  thumb.webp
                ocr/
                  ocr-1.json
                verify/
                  verify-1.json
```

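Notes:

- When `claim_id` is empty, attach the file under `draft/<conversation_id>/<document_id>/...` first, then backfill the business association once the claim is formally created.
- `v1`, `v2` denote file versions; `v1` must never be overwritten directly.
- The original file name is for display only; actual lookup relies on `storage_key` and `sha256`.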

### 4.3 Production Storage

Recommended options for production:

- MinIO
- S3
- Alibaba Cloud OSS
- Tencent Cloud COS

Recommended object keys:

```text
finance-documents/expense_claim/2026/05/<claim_id>/<document_id>/v1/original/source.jpg
finance-documents/expense_claim/2026/05/<claim_id>/<document_id>/v1/preview/preview.pdf
finance-documents/expense_claim/2026/05/<claim_id>/<document_id>/v1/thumbs/thumb.webp
```

The database must store:

```text
storage_provider
storage_bucket
storage_key
sha256
file_size_bytes
mime_type
current_version_no
```

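### 4.4 Rules for Originals, Versions, and Derivatives

- Originals are immutable: once uploaded, they must not be overwritten.
- Replacing an attachment may only add a new `document_asset_versions` record.
- Raw OCR output, verification responses, previews, and thumbnails are all managed as derivatives.
- Deletion by default only logically removes the business association; physically deleting an original is not allowed.
- Documents involved in an audit or dispute can be switched to the `legal_hold` retention policy, which suspends cleanup.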
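
For the local file system case, the no-overwrite rule can be enforced at write time. The sketch below relies on Python's exclusive-creation mode (`"x"`), which raises `FileExistsError` if the target already exists. `write_original_once` is a hypothetical helper; object stores would need their own mechanism, which is not covered here.

```python
from pathlib import Path


def write_original_once(root: Path, storage_key: str, data: bytes) -> Path:
    """Write an original under root/storage_key, refusing to overwrite an existing file."""
    target = root / storage_key
    target.parent.mkdir(parents=True, exist_ok=True)
    # "xb" = exclusive binary create: fails with FileExistsError if the file exists,
    # so a replacement has to go to a new version directory instead.
    with open(target, "xb") as fh:
        fh.write(data)
    return target
```

A caller that catches `FileExistsError` can then allocate the next version number and add a new `document_asset_versions` record rather than retrying the same path.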

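### 4.5 Deduplication and Traceability

- A `sha256` must be computed for every original file.
- A matching `sha256` may trigger a duplicate-upload warning but must not automatically overwrite the older version.
- Invoice duplicate detection cannot rely on the file hash alone; it must also combine `invoice_code + invoice_number + issue_date + total_amount`.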
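
Both checks can be sketched with the standard library. The file hash is computed in chunks so large scans are not loaded into memory at once, and the business key combines the four fields named above. The normalization choices (trimming whitespace, two decimal places) are assumptions, and the function names are illustrative.

```python
import hashlib
from datetime import date
from decimal import Decimal
from typing import BinaryIO, Tuple


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash a binary stream chunk by chunk and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def invoice_dedup_key(
    invoice_code: str,
    invoice_number: str,
    issue_date: date,
    total_amount: Decimal,
) -> Tuple[str, str, str, str]:
    """Business-level duplicate key, independent of how the file was scanned."""
    return (
        invoice_code.strip(),
        invoice_number.strip(),
        issue_date.isoformat(),
        # Assumption: amounts are compared at two decimal places.
        str(total_amount.quantize(Decimal("0.01"))),
    )
```

Two photos of the same paper invoice will have different `sha256` values but the same dedup key, which is why the file hash alone is not enough.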

## 5. Suggested Data Model

Recommended tables:

```text
document_assets
document_asset_versions
document_derivatives
document_ocr_results
invoice_structured_records
invoice_verification_records
expense_item_documents
document_access_logs
```

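Table responsibilities:

- `document_assets`: primary file index
- `document_asset_versions`: versions of originals
- `document_derivatives`: thumbnails, previews, per-page images, redacted copies
- `document_ocr_results`: result of each OCR run
- `invoice_structured_records`: normalized invoice fields
- `invoice_verification_records`: verification results
- `expense_item_documents`: links between expense items and invoices
- `document_access_logs`: audit of file views, downloads, and exports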

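## 6. Relationship with the Rule Center

OCR output feeds the following rules:

```text
Duplicate reimbursement detection rule
Voided invoice check rule
Abnormal invoice title (buyer name) rule
Attachment completeness rule
Amount mismatch rule
Low-confidence OCR re-entry rule
```

Rule reading principles:

- Read normalized fields; do not depend directly on any OCR service's raw field names.
- When evidence must be traced, locate the original via `document_assets` and `document_asset_versions`.
- When an explanation is needed, cite evidence from `document_ocr_results` and `invoice_verification_records`.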

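## 7. Relationship with Agents

User Agent uses OCR to:

- Explain why an invoice was blocked
- Help the user fill in missing invoice information
- Remind the user to upload a legible attachment
- Auto-fill the reimbursement draft from OCR results

Hermes uses OCR to:

- Run nightly batch verification
- Scan for duplicate invoices
- Compile trends in invoice anomalies
- Reprocess historical low-confidence invoices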

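## 8. Security and Audit Requirements

### 8.1 Access Control

- Preview and download of original invoices should be controlled by user role.
- Finance staff, approvers, and applicants may see different sets of files.
- Do not expose permanent public links from object storage; always go through signed URLs or a backend download proxy.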
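
To make the signed-URL idea concrete, here is a generic HMAC sketch for a backend download proxy. It is not the presigning algorithm of S3, OSS, COS, or MinIO; for direct object-store links, the provider's own SDK should generate the URL. The parameter names (`key`, `expires`, `sig`) and function names are assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import urlencode


def _signature(storage_key: str, expires_at: int, secret: bytes) -> str:
    """HMAC-SHA256 over the key and expiry, hex encoded."""
    payload = f"{storage_key}\n{expires_at}".encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign_download_url(
    base_url: str,
    storage_key: str,
    secret: bytes,
    ttl_seconds: int = 300,
    now: Optional[int] = None,
) -> str:
    """Return a short-lived download URL for the backend proxy."""
    issued = int(time.time()) if now is None else now
    expires_at = issued + ttl_seconds
    query = urlencode({
        "key": storage_key,
        "expires": expires_at,
        "sig": _signature(storage_key, expires_at, secret),
    })
    return f"{base_url}?{query}"


def verify_download(
    storage_key: str,
    expires_at: int,
    sig: str,
    secret: bytes,
    now: Optional[int] = None,
) -> bool:
    """Check expiry first, then compare signatures in constant time."""
    current = int(time.time()) if now is None else now
    if current > expires_at:
        return False
    return hmac.compare_digest(_signature(storage_key, expires_at, secret), sig)
```

The role check still has to happen before a URL is issued; the signature only proves that the backend approved this key until the expiry time.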

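### 8.2 Sensitive Information Handling

- When sensitive fields such as ID card numbers, bank card numbers, or phone numbers are recognized, redacted preview copies should be supported.
- For external display, prefer derivatives over exposing originals directly.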
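
Redacting the image itself depends on the OCR engine's bounding boxes and is not sketched here. For the recognized text values, a simple masking helper might look like the following; `mask_middle` and its default head/tail lengths are assumptions.

```python
def mask_middle(value: str, keep_head: int = 3, keep_tail: int = 4, mask_char: str = "*") -> str:
    """Mask the middle of a sensitive string, keeping a few leading and trailing characters."""
    if len(value) <= keep_head + keep_tail:
        # Too short to keep anything safely: mask everything.
        return mask_char * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + mask_char * hidden + value[-keep_tail:]
```

For example, `mask_middle("13812345678")` keeps the first three and last four digits and masks the four in between.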

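### 8.3 Audit Requirements

The following must be recorded:

- Who uploaded the original
- Who triggered OCR
- Who viewed or downloaded the original
- Who corrected OCR results
- Who initiated verification
- Which risk decision referenced which invoices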

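## 9. Suggested Development Steps

```text
Step 1: Attachment upload, with document_assets / document_asset_versions persisted
Step 2: Local file directory layout working end to end
Step 3: Integrate an OCR MCP or OCR service
Step 4: Structured field normalization
Step 5: Invoice verification MCP
Step 6: Matching against expense_claim_items
Step 7: Risk rule center integration
Step 8: Manual correction UI
Step 9: Hermes nightly batch OCR and verification inspection
```

Priorities for the current phase:

- First make "originals can be stored, found, and traced" solid.
- Then integrate OCR and verification.
- Finally build large-scale automated inspection and redacted export.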