20 Commits

Author SHA1 Message Date
caoxiaozhu
222ba0bfdc refactor(server): split oversized backend services 2026-05-22 10:42:31 +08:00
caoxiaozhu
2e57702638 docs: add agent code size standards 2026-05-22 10:42:19 +08:00
caoxiaozhu
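5fe3b201d9 feat: refactor the expense claim service and improve front-end submission and review interactions
Restructure the expense_claims service modules and optimize the travel receipt review logic;
improve receipt type recognition in the user agent service; split the front-end reimbursement
creation page into attachment-model and session-model modules; refactor the submission
orchestrator and the draft-association confirmation flow; update the knowledge base index; add unit tests.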
2026-05-22 08:58:59 +08:00
caoxiaozhu
f6f787ff38 refactor(frontend): split large reimbursement and audit modules 2026-05-21 23:53:03 +08:00
caoxiaozhu
2908dda024 fix(reimbursement): harden assistant draft and claim cleanup 2026-05-21 23:52:34 +08:00
caoxiaozhu
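e701fa01da feat: enhance the travel reimbursement review flow and intelligent receipt inference
Improve travel-scenario handling in ontology parsing and the orchestrator; refine draft claim
saving and expense item synchronization; add itinerary inference and receipt review interactions
to the front-end reimbursement creation page; add assistant session snapshot utility functions; add unit tests.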
2026-05-21 16:09:47 +08:00
caoxiaozhu
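f28d7e6d16 feat: improve travel receipt itinerary extraction and expense item backfill logic
Strengthen receipt-scenario keywords and field extraction in document intelligence; optimize
the resolution path for session-linked draft claims; fix edge cases in expense item merging
and receipt deduplication; improve front-end reimbursement creation and approval detail
interactions; extend unit test coverage.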
2026-05-21 14:24:51 +08:00
caoxiaozhu
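b183b0bd5e feat: refine travel receipt expense item categories and compute the travel allowance automatically
Split travel expense items into finer types such as train ticket, flight ticket, lodging
receipt, and ride; generate the itinerary/reason description from receipt fields; compute the
travel allowance amount automatically through the rule engine; adapt the front end for expense
item editing and travel receipt review; extend unit test coverage.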
2026-05-21 10:57:06 +08:00
caoxiaozhu
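8f65661809 feat: add travel reimbursement standard calculation and a finance final-review flow
Add a travel reimbursement calculation endpoint and Spreadsheet rule parsing; split the approval
flow into two stages, line-manager approval and finance final review, with finer-grained
permissions; fix the automatic fallback to OCR when a PDF has no text layer; clean up linked
sessions after submission; adapt the front end to the approval flow and add unit tests.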
2026-05-21 09:28:33 +08:00
caoxiaozhu
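002bf4f756 feat: improve the expense claim approval flow and return-reason tracking
Add a line-manager approval endpoint and an approval to-do list query; support reason-code
classification and approval-stage marking when a claim is returned; optimize receipt attachment
deduplication and path fallback lookup; add a return-reason dialog, an approval inbox, and
workbench icon components on the front end; add utility functions and unit test coverage.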
2026-05-20 21:00:47 +08:00
caoxiaozhu
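f8b25a7ccc fix: fix details in the employee service, expense claim approval, and front-end interactions
- Fix organization linkage and email validation logic when creating an employee
- Fix expense claim API endpoint parameters and pre-review flow calls
- Improve front-end page interactions such as the approval center and travel details
- Update sidebar navigation and the request view model
- Add test cases for the employee service and expense claims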
2026-05-20 14:32:35 +08:00
caoxiaozhu
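d7e98a58b9 feat: enhance employee management and the end-to-end expense claim flow
- Add an employee Excel import service (employee_spreadsheet) and import/export API endpoints
- Add batch creation, email uniqueness validation, and organization linkage to the employee service
- On claim submission, backfill identity, pass department information through, and improve how pre-review results are shown
- Add department information (departmentName) to the authentication flow and extend the schema accordingly
- Add department linkage and expense claim backfill logic to the user Agent service
- Rebuild the front-end employee management page with import/export, search filters, and pagination
- Improve interactions and styles of the front-end approval center, audit, and travel reimbursement views
- Add the shared TableLoadingState component and employee import test cases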
2026-05-20 14:21:56 +08:00
caoxiaozhu
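57957d11a0 feat: refactor the expense claim AI pre-review flow and add a platform risk rule engine
- Change AI verification to AI pre-review: high risk no longer blocks a claim but travels with it for the approver to re-check
- Add a platform risk rule evaluation engine with evaluators for too-brief reasons, abnormal receipts, duplicate invoices, and more
- Add department information (department_name) to the user context; the authentication flow links the organization structure accordingly
- Change rule scenario_json to Chinese labels (差旅 / 费用科目) to unify scenario classification
- Add orchestrator review flow test cases
- Update the front-end audit view, travel reimbursement, and related pages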
2026-05-20 09:36:01 +08:00
caoxiaozhu
2574bc81d1 chore: update personal workbench and travel reimbursement features 2026-05-19 17:24:13 +00:00
caoxiaozhu
54ffef66d3 feat: add risk rules and enhance agent assets 2026-05-19 16:19:03 +00:00
caoxiaozhu
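d460ee0fe7 fix(agent): fix rule center spreadsheet versions and change records
Add the missing rule asset JSON read/write endpoints and front-end calls; fix the missing AuditView import.

Online Excel editing now compares all sheets and generates a recent-change record; version snapshots are saved uniformly to rules/finance-rules/.versions.

Isolate rule spreadsheet test storage so that tests or legacy entry points do not write to the real rules directory or storage/agent_assets.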
2026-05-19 15:41:53 +00:00
caoxiaozhu
9472813739 refactor: refactor AuditView and TravelReimbursementCreateView code
- Improve the service-layer structure of agent_assets, agent_foundation, and user_agent
- Update the AuditView view and script
- Update the TravelReimbursementCreateView script

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 20:23:58 +08:00
caoxiaozhu
dc007f948a feat(rules): add the company communication expense reimbursement rule Excel file and update the travel expense rule spreadsheet 2026-05-18 10:06:23 +00:00
caoxiaozhu
9db663e81f feat(AuditView): extend scenario label configuration
- Add the communication expense reimbursement (communication_expense) and expense standard (expense_standard) scenario labels
2026-05-18 10:02:04 +00:00
caoxiaozhu
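813ac81950 feat(finance): add the company communication expense reimbursement rule
- Add rule code and file name constants for the communication expense rule
- Create the company communication expense rule asset in the seed data
- Add the corresponding version and review records
- Mark it as v1.0.0 and approved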
2026-05-18 10:01:40 +00:00
279 changed files with 67857 additions and 33811 deletions

39
AGENTS.md Normal file

@@ -0,0 +1,39 @@
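# X-Financial Agent Collaboration Guidelines
## Language
- All analysis, explanations, plans, commit messages, and final replies default to Simplified Chinese.
- Technical conclusions should get straight to the point and, where necessary, cite verifiable files, commands, or test results.
## General code-splitting guidelines
Whether writing front-end, back-end, or algorithm code, actively avoid piling every method into one class, one component, or one module. When a class, component, or core module keeps growing, split it by responsibility first instead of appending more methods and state.
### Line-count and complexity targets
- The hard limit for a single class, core component, or core algorithm module is 800 lines.
- Ordinary files should stay within 300-600 lines.
- Complex business files may approach 800 lines but must have clear responsibility boundaries.
- A file or class over 800 lines must be treated as a refactoring warning; do not keep adding features to it directly.
- A single class should not carry dozens of unrelated methods for long, let alone grow into a catch-all class with hundreds of methods.
### Splitting principles
- Keep the external API as stable as possible; move the internal implementation into small modules first.
- Split by responsibility: orchestration, state management, persistence, permissions, file storage, OCR/receipt analysis, rule review, response building, serialization, UI interaction, algorithm strategy, data conversion.
- When adding a capability, first decide which module owns it; if none fits, add a small module rather than defaulting to the main class, main component, or main Service.
- Split in small steps: extract one clear responsibility at a time and run the related tests each time.
### X-Financial focus areas
- `ExpenseClaimService`: first split out claims, line items, attachments, receipt analysis, drafts, rule review, permissions, and serialization.
- `UserAgentService`: first split out knowledge base Q&A, reimbursement pre-review payloads, Markdown replies, travel policy, form slots, receipt classification, and suggested actions.
- `OrchestratorService`: first split out agent routing, tool calls, reimbursement queries, and response building.
- Large front-end Vue pages: first split out composables, view models, style fragments, business utility functions, and child components.
- Algorithm/rule modules: first split out input parsing, rule matching, scoring strategy, result explanation, and exception handling.
## Verification
- Prefer running back-end verification inside the Docker container `x-financial-main`.
- Set reasonable timeouts for unit tests to avoid long hangs.
- After each refactor, run at least the targeted tests for the affected service; add end-to-end or API tests when shared protocols are involved.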
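
The line-count targets in this file could be checked mechanically. The following is a minimal sketch, not part of this change set: the function names, the scanned suffixes, and the "review" tier for files between 600 and 800 lines are assumptions made for illustration.

```python
from pathlib import Path

HARD_LIMIT = 800  # hard limit stated in the guidelines
SOFT_LIMIT = 600  # upper end of the recommended 300-600 range


def classify_line_count(line_count: int) -> str:
    """Map a line count to 'ok', 'review' (above the soft range) or 'refactor'."""
    if line_count > HARD_LIMIT:
        return "refactor"
    if line_count > SOFT_LIMIT:
        return "review"
    return "ok"


def scan(root: Path, suffixes: tuple[str, ...] = (".py", ".vue", ".ts")) -> dict[str, str]:
    """Return every matching file under root that is not 'ok', keyed by relative path."""
    findings: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            line_count = sum(1 for _ in handle)
        verdict = classify_line_count(line_count)
        if verdict != "ok":
            findings[str(path.relative_to(root))] = verdict
    return findings
```

Calling `scan(Path("server/src"))` (a hypothetical path) would list the files that need attention; a file of exactly 800 lines is reported as "review", since the guideline only treats files over 800 lines as a refactoring warning.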


@@ -32,7 +32,21 @@ services:
- >
apt-get update &&
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends
python3 python3-pip python3-venv &&
python3 python3-pip python3-venv fontconfig fonts-noto-cjk fonts-noto-cjk-extra &&
printf '%s\n'
'<?xml version="1.0"?>'
'<!DOCTYPE fontconfig SYSTEM "fonts.dtd">'
'<fontconfig>'
' <alias><family>SimSun</family><prefer><family>Noto Serif CJK SC</family></prefer></alias>'
' <alias><family>NSimSun</family><prefer><family>Noto Serif CJK SC</family></prefer></alias>'
' <alias><family>KaiTi</family><prefer><family>Noto Serif CJK SC</family></prefer></alias>'
' <alias><family>FangSong</family><prefer><family>Noto Serif CJK SC</family></prefer></alias>'
' <alias><family>SimHei</family><prefer><family>Noto Sans CJK SC</family></prefer></alias>'
' <alias><family>DengXian</family><prefer><family>Noto Sans CJK SC</family></prefer></alias>'
' <alias><family>Microsoft YaHei</family><prefer><family>Noto Sans CJK SC</family></prefer></alias>'
'</fontconfig>'
> /etc/fonts/local.conf &&
fc-cache -f &&
mkdir -p /run/sshd && /usr/sbin/sshd &&
printf '%s\n' 'cd /app >/dev/null 2>&1 || true' > /etc/profile.d/zz-x-financial-app-dir.sh &&
chmod 644 /etc/profile.d/zz-x-financial-app-dir.sh &&
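
The fontconfig aliases written by the `printf` call in this hunk map Windows font family names to the installed Noto CJK families. As a rough illustration of the same mapping, the sketch below renders an equivalent `local.conf` from a dictionary; the helper name `render_local_conf` is hypothetical and the container itself uses the shell command shown in the hunk, not this code.

```python
from xml.sax.saxutils import escape

# Requested family -> preferred installed family, copied from the hunk above.
ALIASES = {
    "SimSun": "Noto Serif CJK SC",
    "NSimSun": "Noto Serif CJK SC",
    "KaiTi": "Noto Serif CJK SC",
    "FangSong": "Noto Serif CJK SC",
    "SimHei": "Noto Sans CJK SC",
    "DengXian": "Noto Sans CJK SC",
    "Microsoft YaHei": "Noto Sans CJK SC",
}


def render_local_conf(aliases: dict[str, str]) -> str:
    """Render a fontconfig document with one <alias> element per mapping."""
    lines = [
        '<?xml version="1.0"?>',
        '<!DOCTYPE fontconfig SYSTEM "fonts.dtd">',
        "<fontconfig>",
    ]
    for family, preferred in aliases.items():
        lines.append(
            f"  <alias><family>{escape(family)}</family>"
            f"<prefer><family>{escape(preferred)}</family></prefer></alias>"
        )
    lines.append("</fontconfig>")
    return "\n".join(lines) + "\n"
```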

1
nul

@@ -1 +0,0 @@
/usr/bin/bash: line 1: rg: command not found


@@ -0,0 +1,32 @@
{
"schema_version": "1.0",
"rule_code": "risk.expense.consecutive_transport_receipts",
"name": "连号交通票据",
"enabled": true,
"risk_dimension": "consecutive_receipts",
"ontology_signal": "consecutive_transport_receipts",
"evaluator": "consecutive_transport_receipts",
"applies_to": {
"expense_types": ["transport", "travel"],
"min_attachments": 2
},
"inputs": {
"invoice_no": "attachment.invoice_no"
},
"params": {
"min_consecutive_count": 3
},
"outcomes": {
"pass": { "severity": "none", "action": "continue" },
"fail": {
"severity": "medium",
"action": "manual_review"
}
},
"metadata": {
"owner": "风控与审计部",
"stability": "platform_builtin",
"source_ref": "常用risk.txt / 三、车辆交通 / 连号票集中报销",
"updated_at": "2026-05-19"
}
}
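
The manifest above names the evaluator `consecutive_transport_receipts` and a `min_consecutive_count` of 3, but the evaluator code is not part of this excerpt. The sketch below is one plausible reading, assuming that invoice numbers are purely numeric strings and that "consecutive" means they differ by exactly 1; both function names are hypothetical.

```python
def longest_consecutive_run(invoice_numbers: list[str]) -> int:
    """Length of the longest run of invoice numbers that each differ by 1."""
    # Non-numeric values are skipped; duplicates collapse into one entry.
    values = sorted({int(n) for n in invoice_numbers if n.isdecimal()})
    best = run = 0
    previous = None
    for value in values:
        run = run + 1 if previous is not None and value == previous + 1 else 1
        best = max(best, run)
        previous = value
    return best


def evaluate(invoice_numbers: list[str], min_consecutive_count: int = 3) -> dict[str, str]:
    """Return the manifest's 'fail' outcome when the run reaches the threshold."""
    if longest_consecutive_run(invoice_numbers) >= min_consecutive_count:
        return {"severity": "medium", "action": "manual_review"}
    return {"severity": "none", "action": "continue"}
```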


@@ -0,0 +1,29 @@
{
"schema_version": "1.0",
"rule_code": "risk.expense.entertainment_missing_detail",
"name": "招待费事由不完整",
"enabled": true,
"risk_dimension": "entertainment_detail",
"ontology_signal": "entertainment_missing_detail",
"evaluator": "entertainment_reason_missing",
"applies_to": {
"domains": ["meal"]
},
"inputs": {
"reason": "claim.reason_corpus"
},
"params": {},
"outcomes": {
"pass": { "severity": "none", "action": "continue" },
"fail": {
"severity": "medium",
"action": "warn"
}
},
"metadata": {
"owner": "风控与审计部",
"stability": "platform_builtin",
"source_ref": "常用risk.txt / 三、餐费招待 / 业务招待无事由对象",
"updated_at": "2026-05-19"
}
}


@@ -0,0 +1,30 @@
{
"schema_version": "1.0",
"rule_code": "risk.expense.meal_localized_as_travel",
"name": "同城餐饮混入差旅",
"enabled": true,
"risk_dimension": "meal_travel_mix",
"ontology_signal": "meal_as_travel",
"evaluator": "meal_as_travel_same_city",
"applies_to": {
"domains": ["travel"]
},
"inputs": {
"declared": "claim.location",
"meal_city": "attachment.cities"
},
"params": {},
"outcomes": {
"pass": { "severity": "none", "action": "continue" },
"fail": {
"severity": "medium",
"action": "warn"
}
},
"metadata": {
"owner": "风控与审计部",
"stability": "platform_builtin",
"source_ref": "常用risk.txt / 三、餐费招待 / 同城餐饮归集异地差旅",
"updated_at": "2026-05-19"
}
}


@@ -0,0 +1,29 @@
{
"schema_version": "1.0",
"rule_code": "risk.expense.reason_too_brief",
"name": "报销事由过短",
"enabled": true,
"risk_dimension": "reason_quality",
"ontology_signal": "reason_too_brief",
"evaluator": "reason_too_brief",
"applies_to": {},
"inputs": {
"reason": "claim.reason_corpus"
},
"params": {
"min_reason_length": 6
},
"outcomes": {
"pass": { "severity": "none", "action": "continue" },
"fail": {
"severity": "medium",
"action": "warn"
}
},
"metadata": {
"owner": "风控与审计部",
"stability": "platform_builtin",
"source_ref": "常用risk.txt / 通用 / 事由不足以支撑真实性判断",
"updated_at": "2026-05-19"
}
}
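
Each manifest in this directory follows the same shape: an `evaluator` name, `inputs` given as dotted paths, `params`, and `pass`/`fail` outcomes. The sketch below shows how such a manifest might be dispatched, under stated assumptions: the `EVALUATORS` registry, `run_rule`, and the flat `context` dictionary keyed by dotted path are hypothetical, and length is counted in characters after stripping whitespace.

```python
from typing import Any, Callable

# Trimmed copy of the manifest above; only the fields the sketch reads.
RULE = {
    "rule_code": "risk.expense.reason_too_brief",
    "evaluator": "reason_too_brief",
    "inputs": {"reason": "claim.reason_corpus"},
    "params": {"min_reason_length": 6},
    "outcomes": {
        "pass": {"severity": "none", "action": "continue"},
        "fail": {"severity": "medium", "action": "warn"},
    },
}


def reason_too_brief(inputs: dict[str, Any], params: dict[str, Any]) -> bool:
    """Return True when the check fails, i.e. the reason is shorter than required."""
    reason = str(inputs.get("reason") or "").strip()
    return len(reason) < int(params.get("min_reason_length", 0))


# Hypothetical registry: evaluator name from the manifest -> callable.
EVALUATORS: dict[str, Callable[[dict[str, Any], dict[str, Any]], bool]] = {
    "reason_too_brief": reason_too_brief,
}


def run_rule(rule: dict[str, Any], context: dict[str, Any]) -> dict[str, str]:
    """Resolve inputs from the context, run the evaluator, pick the outcome."""
    inputs = {name: context.get(path) for name, path in rule["inputs"].items()}
    failed = EVALUATORS[rule["evaluator"]](inputs, rule.get("params", {}))
    return rule["outcomes"]["fail" if failed else "pass"]
```

Keeping thresholds and outcomes in the manifest means a change such as raising `min_reason_length` would not require touching the evaluator code.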


@@ -0,0 +1,32 @@
{
"schema_version": "1.0",
"rule_code": "risk.invoice.claimant_buyer_name_match",
"name": "报销人与发票抬头一致",
"enabled": true,
"risk_dimension": "identity_consistency",
"ontology_signal": "buyer_name_mismatch",
"evaluator": "identity_consistency",
"applies_to": {
"min_attachments": 1
},
"inputs": {
"claimant": "claim.employee_name",
"buyer": "attachment.buyer_name"
},
"params": {
"allow_keywords": ["代报", "集团", "公司", "有限公司"]
},
"outcomes": {
"pass": { "severity": "none", "action": "continue" },
"fail": {
"severity": "high",
"action": "manual_review"
}
},
"metadata": {
"owner": "风控与审计部",
"stability": "platform_builtin",
"source_ref": "常用risk.txt / 二、发票类 / 抬头错误",
"updated_at": "2026-05-19"
}
}
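
The `allow_keywords` parameter above suggests that company-style invoice titles are accepted even when they differ from the claimant's name. A minimal sketch of that check follows; the function name, the substring matching, and the choice to treat a missing buyer name as "not evaluable, therefore passing" are all assumptions, since the real `identity_consistency` evaluator is not shown here.

```python
ALLOW_KEYWORDS = ["代报", "集团", "公司", "有限公司"]  # copied from the manifest params


def buyer_matches_claimant(claimant: str, buyer: str, allow_keywords: list[str]) -> bool:
    """Return True when the invoice title is acceptable for this claimant."""
    claimant = claimant.strip()
    buyer = buyer.strip()
    if not buyer:
        # Assumption: without a recognized buyer name there is nothing to compare.
        return True
    if claimant and claimant in buyer:
        return True
    # Company-style titles are allowed through the keyword list.
    return any(keyword in buyer for keyword in allow_keywords)
```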


@@ -0,0 +1,30 @@
{
"schema_version": "1.0",
"rule_code": "risk.invoice.cross_year_invoice",
"name": "跨年发票入账",
"enabled": true,
"risk_dimension": "cross_year_invoice",
"ontology_signal": "cross_year_invoice",
"evaluator": "cross_year_invoice",
"applies_to": {
"min_attachments": 1
},
"inputs": {
"invoice_date": "attachment.invoice_date",
"claim_date": ["claim.occurred_at", "item.item_date"]
},
"params": {},
"outcomes": {
"pass": { "severity": "none", "action": "continue" },
"fail": {
"severity": "medium",
"action": "warn"
}
},
"metadata": {
"owner": "风控与审计部",
"stability": "platform_builtin",
"source_ref": "常用risk.txt / 二、发票类 / 跨年发票",
"updated_at": "2026-05-19"
}
}
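
In this manifest `claim_date` lists two paths, `claim.occurred_at` and `item.item_date`. The sketch below assumes the list is a fallback order (the first available value wins) and that "cross-year" simply means the invoice year differs from the claim year; neither assumption is confirmed by the excerpt, and `is_cross_year` is a hypothetical name.

```python
from __future__ import annotations

from datetime import date


def is_cross_year(invoice_date: date | None, claim_dates: list[date | None]) -> bool:
    """Return True when the invoice year differs from the first available claim date."""
    claim_date = next((d for d in claim_dates if d is not None), None)
    if invoice_date is None or claim_date is None:
        # Assumption: missing dates are not flagged by this rule.
        return False
    return invoice_date.year != claim_date.year
```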


@@ -0,0 +1,30 @@
{
"schema_version": "1.0",
"rule_code": "risk.invoice.document_expense_mismatch",
"name": "开票内容与报销场景不符",
"enabled": true,
"risk_dimension": "document_expense_mismatch",
"ontology_signal": "document_expense_mismatch",
"evaluator": "document_expense_mismatch",
"applies_to": {
"min_attachments": 1
},
"inputs": {
"document_type": "attachment.document_type",
"expense_type": ["claim.expense_type", "item.item_type"]
},
"params": {},
"outcomes": {
"pass": { "severity": "none", "action": "continue" },
"fail": {
"severity": "medium",
"action": "warn"
}
},
"metadata": {
"owner": "风控与审计部",
"stability": "platform_builtin",
"source_ref": "常用risk.txt / 二、发票类 / 开票内容与业务不符",
"updated_at": "2026-05-19"
}
}


@@ -0,0 +1,29 @@
{
"schema_version": "1.0",
"rule_code": "risk.invoice.duplicate_invoice",
"name": "发票重复报销",
"enabled": true,
"risk_dimension": "duplicate_invoice",
"ontology_signal": "duplicate_invoice",
"evaluator": "duplicate_invoice",
"applies_to": {
"min_attachments": 1
},
"inputs": {
"invoice_no": "attachment.invoice_no"
},
"params": {},
"outcomes": {
"pass": { "severity": "none", "action": "continue" },
"fail": {
"severity": "high",
"action": "block"
}
},
"metadata": {
"owner": "风控与审计部",
"stability": "platform_builtin",
"source_ref": "常用risk.txt / 二、发票类 / 重复报销",
"updated_at": "2026-05-19"
}
}


@@ -0,0 +1,30 @@
{
"schema_version": "1.0",
"rule_code": "risk.invoice.vague_goods_description",
"name": "发票品名过于笼统",
"enabled": true,
"risk_dimension": "vague_goods_description",
"ontology_signal": "vague_goods_description",
"evaluator": "vague_goods_description",
"applies_to": {
"expense_types": ["office", "other"],
"min_attachments": 1
},
"inputs": {
"ocr": "attachment.ocr_text"
},
"params": {},
"outcomes": {
"pass": { "severity": "none", "action": "continue" },
"fail": {
"severity": "medium",
"action": "warn"
}
},
"metadata": {
"owner": "风控与审计部",
"stability": "platform_builtin",
"source_ref": "常用risk.txt / 二、发票类 / 品名笼统",
"updated_at": "2026-05-19"
}
}


@@ -0,0 +1,30 @@
{
"schema_version": "1.0",
"rule_code": "risk.invoice.void_or_red_invoice",
"name": "作废或红冲发票",
"enabled": true,
"risk_dimension": "void_or_red_invoice",
"ontology_signal": "void_or_red_invoice",
"evaluator": "invoice_void_or_red",
"applies_to": {
"min_attachments": 1
},
"inputs": {
"status": "attachment.invoice_status",
"ocr": "attachment.ocr_text"
},
"params": {},
"outcomes": {
"pass": { "severity": "none", "action": "continue" },
"fail": {
"severity": "high",
"action": "block"
}
},
"metadata": {
"owner": "风控与审计部",
"stability": "platform_builtin",
"source_ref": "常用risk.txt / 二、发票类 / 作废红冲发票",
"updated_at": "2026-05-19"
}
}


@@ -0,0 +1,30 @@
{
"schema_version": "1.0",
"rule_code": "risk.travel.base_location_overlap",
"name": "常驻地重合出差风险",
"enabled": true,
"risk_dimension": "base_location_overlap",
"ontology_signal": "base_location_overlap",
"evaluator": "base_location_overlap",
"applies_to": {
"domains": ["travel"]
},
"inputs": {
"employee_base": "employee.location",
"declared": "claim.location"
},
"params": {},
"outcomes": {
"pass": { "severity": "none", "action": "continue" },
"fail": {
"severity": "high",
"action": "manual_review"
}
},
"metadata": {
"owner": "风控与审计部",
"stability": "platform_builtin",
"source_ref": "常用risk.txt / 一、出差类 / 两头在外",
"updated_at": "2026-05-19"
}
}


@@ -0,0 +1,29 @@
{
"schema_version": "1.0",
"rule_code": "risk.travel.destination_receipt_location",
"name": "申报地点与票据地点一致",
"risk_dimension": "location_consistency",
"ontology_signal": "location_mismatch",
"evaluator": "location_consistency",
"inputs": {
"declared": "claim.location",
"evidence": ["attachment.cities", "item.item_location"]
},
"params": {
"match_mode": "city_fuzzy",
"missing_evidence": "warn"
},
"outcomes": {
"pass": { "severity": "none", "action": "continue" },
"fail": {
"severity": "high",
"action": "manual_review",
"message_template": "申报地点 {declared} 与票据识别地点 {evidence} 不一致"
}
},
"metadata": {
"owner": "风控与审计部",
"stability": "platform_builtin",
"updated_at": "2026-05-18"
}
}
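
This manifest combines three ideas: fuzzy city matching (`match_mode: city_fuzzy`), a separate action when no evidence is available (`missing_evidence: warn`), and a `message_template` with `{declared}` and `{evidence}` placeholders. The sketch below illustrates how they could fit together. The fuzzy rule used here (strip a trailing "市", then substring match) is an assumption, as are the function names and the reduced result shape.

```python
TEMPLATE = "申报地点 {declared} 与票据识别地点 {evidence} 不一致"  # from the manifest


def normalize_city(name: str) -> str:
    """Strip whitespace and a trailing '市' so '上海市' and '上海' compare equal."""
    name = name.strip()
    return name[:-1] if name.endswith("市") and len(name) > 1 else name


def check_location(declared: str, evidence: list[str], missing_evidence: str = "warn") -> dict[str, str]:
    """Compare the declared location with the cities recognized on receipts."""
    cities = [normalize_city(city) for city in evidence if city and city.strip()]
    if not cities:
        return {"action": missing_evidence, "message": ""}
    target = normalize_city(declared)
    if target and any(target in city or city in target for city in cities):
        return {"action": "continue", "message": ""}
    message = TEMPLATE.format(declared=declared, evidence="、".join(cities))
    return {"action": "manual_review", "message": message}
```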


@@ -0,0 +1,32 @@
{
"schema_version": "1.0",
"rule_code": "risk.travel.hotel_without_itinerary",
"name": "住宿城市与行程不一致",
"enabled": true,
"risk_dimension": "hotel_itinerary",
"ontology_signal": "hotel_itinerary_mismatch",
"evaluator": "hotel_without_itinerary",
"applies_to": {
"domains": ["travel"],
"expense_types": ["hotel", "travel"]
},
"inputs": {
"declared": "claim.location",
"hotel": "attachment.hotel_city",
"itinerary": "attachment.route_cities"
},
"params": {},
"outcomes": {
"pass": { "severity": "none", "action": "continue" },
"fail": {
"severity": "high",
"action": "manual_review"
}
},
"metadata": {
"owner": "风控与审计部",
"stability": "platform_builtin",
"source_ref": "常用risk.txt / 三、住宿费 / 夜间异地住宿、酒店连续多天",
"updated_at": "2026-05-19"
}
}


@@ -0,0 +1,30 @@
{
"schema_version": "1.0",
"rule_code": "risk.travel.intracity_travel_claim",
"name": "同城虚报差旅补贴",
"enabled": true,
"risk_dimension": "intracity_travel",
"ontology_signal": "intracity_travel",
"evaluator": "intracity_travel_claim",
"applies_to": {
"domains": ["travel"]
},
"inputs": {
"declared": "claim.location",
"evidence": ["attachment.route", "attachment.cities"]
},
"params": {},
"outcomes": {
"pass": { "severity": "none", "action": "continue" },
"fail": {
"severity": "high",
"action": "manual_review"
}
},
"metadata": {
"owner": "风控与审计部",
"stability": "platform_builtin",
"source_ref": "常用risk.txt / 一、出差类 / 同城虚报差旅",
"updated_at": "2026-05-19"
}
}


@@ -0,0 +1,30 @@
{
"schema_version": "1.0",
"rule_code": "risk.travel.multi_city_reason_required",
"name": "多城市行程需说明",
"enabled": true,
"risk_dimension": "multi_city_itinerary",
"ontology_signal": "multi_city_itinerary",
"evaluator": "multi_city_reason_required",
"applies_to": {
"domains": ["travel"]
},
"inputs": {
"reason": "claim.reason_corpus",
"cities": ["attachment.cities", "item.item_location"]
},
"params": {},
"outcomes": {
"pass": { "severity": "none", "action": "continue" },
"fail": {
"severity": "medium",
"action": "warn"
}
},
"metadata": {
"owner": "风控与审计部",
"stability": "platform_builtin",
"source_ref": "常用risk.txt / 一、出差类 / 绕道出行、行程不符",
"updated_at": "2026-05-19"
}
}


@@ -0,0 +1,28 @@
#!/usr/bin/env python3
"""Sync platform risk rule assets from server/rules/risk-rules/*.json."""

from __future__ import annotations

import sys
from pathlib import Path

SERVER_SRC = Path(__file__).resolve().parents[1] / "src"
if str(SERVER_SRC) not in sys.path:
    sys.path.insert(0, str(SERVER_SRC))

from app.db.session import get_session_factory  # noqa: E402
from app.services.agent_foundation import AgentFoundationService  # noqa: E402


def main() -> None:
    db = get_session_factory()()
    try:
        count = AgentFoundationService(db).sync_platform_risk_rules_from_library()
        db.commit()
        print(f"Synced {count} risk rule manifest(s) from library.")
    finally:
        db.close()


if __name__ == "__main__":
    main()
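
The body of `sync_platform_risk_rules_from_library()` is not included in this excerpt. As a rough picture of what loading the manifest library could involve, the sketch below reads every `*.json` file in a directory and rejects manifests that lack basic keys. `load_manifests` and the `REQUIRED_KEYS` list are hypothetical; the keys were chosen because every manifest shown in this diff contains them.

```python
import json
from pathlib import Path

# Assumed minimum; every manifest in this diff carries these keys.
REQUIRED_KEYS = ("schema_version", "rule_code", "name", "evaluator", "inputs", "outcomes")


def load_manifests(directory: Path) -> list[dict]:
    """Load all JSON manifests in a directory, failing fast on incomplete ones."""
    manifests = []
    for path in sorted(directory.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        missing = [key for key in REQUIRED_KEYS if key not in data]
        if missing:
            raise ValueError(f"{path.name}: missing keys {missing}")
        manifests.append(data)
    return manifests
```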


@@ -0,0 +1,13 @@
#!/usr/bin/env python3
import json
import urllib.request

base = "http://127.0.0.1:8000/api/v1"
items = json.loads(urllib.request.urlopen(f"{base}/agent-assets?asset_type=rule").read())
risk = next((i for i in items if str(i.get("code", "")).startswith("risk.")), None)
print("risk asset:", risk.get("code") if risk else None)
if not risk:
    raise SystemExit(1)
resp = urllib.request.urlopen(f"{base}/agent-assets/{risk['id']}/rule-json")
payload = json.loads(resp.read())
print("rule-json ok:", payload.get("file_name"), payload.get("evaluator"))


@@ -22,6 +22,7 @@ class CurrentUserContext:
name: str
role_codes: list[str]
is_admin: bool
department_name: str = ""
def get_current_user(
@@ -41,6 +42,10 @@ def get_current_user(
str | None,
Header(description="是否管理员,支持 `true/false/1/0`。"),
] = None,
x_auth_department: Annotated[
str | None,
Header(description="当前登录人的所属部门。"),
] = None,
) -> CurrentUserContext:
role_codes = [item.strip() for item in (x_auth_role_codes or "").split(",") if item.strip()]
is_admin = str(x_auth_is_admin or "").strip().lower() in {"1", "true", "yes", "on"}
@@ -59,6 +64,7 @@ def get_current_user(
name=name or username,
role_codes=role_codes,
is_admin=is_admin,
department_name=(x_auth_department or "").strip(),
)
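
The header handling in `get_current_user` can be exercised without FastAPI. The sketch below lifts the three parsing expressions from the hunk into standalone helpers; the helper names are hypothetical, while the expressions themselves mirror the lines shown above.

```python
from __future__ import annotations

TRUTHY = {"1", "true", "yes", "on"}


def parse_role_codes(raw: str | None) -> list[str]:
    """Split a comma-separated header value, dropping blanks and whitespace."""
    return [item.strip() for item in (raw or "").split(",") if item.strip()]


def parse_is_admin(raw: str | None) -> bool:
    """Interpret the admin header case-insensitively; a missing header means False."""
    return str(raw or "").strip().lower() in TRUTHY


def parse_department(raw: str | None) -> str:
    """Normalize the department header; a missing header becomes an empty string."""
    return (raw or "").strip()
```

Because `department_name` defaults to an empty string, callers that do not send the department header keep working unchanged.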


@@ -23,8 +23,9 @@ from app.schemas.agent_asset import (
AgentAssetRead,
AgentAssetReviewCreate,
AgentAssetReviewRead,
AgentAssetRuleJsonRead,
AgentAssetRuleJsonWrite,
AgentAssetSpreadsheetChangeRecordRead,
AgentAssetVersionCompareRead,
AgentAssetUpdate,
AgentAssetVersionCreate,
AgentAssetVersionRead,
@@ -50,7 +51,7 @@ RuleReviewerUser = Annotated[CurrentUserContext, Depends(require_rule_reviewer_u
def _handle_asset_error(exc: Exception) -> None:
if isinstance(exc, LookupError):
if isinstance(exc, (LookupError, FileNotFoundError)):
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail=str(exc)) from exc
if isinstance(exc, PermissionError):
raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(exc)) from exc
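
The change to `_handle_asset_error` matters because `FileNotFoundError` derives from `OSError`, not `LookupError`, so the original `isinstance` check never matched a missing rule file. The sketch below reproduces only the two branches visible in this hunk as a plain function returning status codes; `status_for` is a hypothetical name and the `None` fallback stands in for whatever the rest of the handler does.

```python
from __future__ import annotations


def status_for(exc: Exception) -> int | None:
    """Map an exception to the HTTP status used by the two branches in the hunk."""
    if isinstance(exc, (LookupError, FileNotFoundError)):
        return 404
    if isinstance(exc, PermissionError):
        return 400
    return None  # remaining branches are outside this excerpt
```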
@@ -111,6 +112,48 @@ def get_agent_asset(asset_id: str, db: DbSession) -> AgentAssetRead:
return asset
@router.get(
"/{asset_id}/rule-json",
response_model=AgentAssetRuleJsonRead,
summary="读取风险规则 JSON",
description="读取 JSON 风险规则资产绑定的规则文件内容。",
)
def get_agent_asset_rule_json(
asset_id: str,
_: CurrentUser,
db: DbSession,
) -> AgentAssetRuleJsonRead:
try:
return AgentAssetService(db).read_rule_json(asset_id)
except Exception as exc:
_handle_asset_error(exc)
@router.put(
"/{asset_id}/rule-json",
response_model=AgentAssetRuleJsonRead,
summary="保存风险规则 JSON",
description="保存 JSON 风险规则资产绑定的规则文件内容,并写入审计日志。",
)
def save_agent_asset_rule_json(
asset_id: str,
payload: AgentAssetRuleJsonWrite,
current_user: RuleEditorUser,
db: DbSession,
x_actor: ActorHeader = None,
x_request_id: RequestIdHeader = None,
) -> AgentAssetRuleJsonRead:
try:
return AgentAssetService(db).write_rule_json(
asset_id,
body=payload,
actor=(x_actor or current_user.name or "system").strip() or "system",
request_id=x_request_id,
)
except Exception as exc:
_handle_asset_error(exc)
@router.get(
"/{asset_id}/spreadsheet/onlyoffice-config",
response_model=AgentAssetOnlyOfficeConfigRead,
@@ -123,7 +166,7 @@ def get_agent_asset_spreadsheet_onlyoffice_config(
db: DbSession,
version: Annotated[
str | None,
Query(description="可选的规则版本号;不传时默认当前版本"),
Query(description="兼容旧前端的可选参数;表格规则始终打开当前规则表"),
] = None,
) -> AgentAssetOnlyOfficeConfigRead:
try:
@@ -140,7 +183,7 @@ def get_agent_asset_spreadsheet_onlyoffice_config(
"/{asset_id}/spreadsheet/content",
response_class=FileResponse,
summary="下载或预览规则 Excel 文件",
description="按版本返回规则 Excel 快照,用于浏览器预览或下载。",
description="返回当前规则 Excel 文件,用于浏览器预览或下载。",
)
def get_agent_asset_spreadsheet_content(
asset_id: str,
@@ -148,7 +191,7 @@ def get_agent_asset_spreadsheet_content(
db: DbSession,
version: Annotated[
str | None,
Query(description="可选的规则版本号;不传时默认当前版本"),
Query(description="兼容旧前端的可选参数;不传时返回当前规则表"),
] = None,
) -> FileResponse:
try:
@@ -171,18 +214,18 @@ def get_agent_asset_spreadsheet_content(
def get_agent_asset_spreadsheet_onlyoffice_content(
asset_id: str,
db: DbSession,
version: Annotated[
str,
Query(min_length=1, description="规则版本号。"),
],
access_token: Annotated[
str,
Query(min_length=1, description="ONLYOFFICE 临时访问令牌。"),
],
version: Annotated[
str | None,
Query(description="兼容旧 ONLYOFFICE URL;当前表格模式不再使用。"),
] = None,
) -> FileResponse:
try:
service = AgentAssetService(db)
service.validate_rule_spreadsheet_access_token(asset_id, version, access_token)
service.validate_rule_spreadsheet_access_token(asset_id, access_token)
file_path, media_type, filename = service.get_rule_spreadsheet_content(
asset_id,
version=version,
@@ -202,7 +245,7 @@ def get_agent_asset_spreadsheet_onlyoffice_content(
response_model=AgentAssetRead,
status_code=status.HTTP_201_CREATED,
summary="上传规则 Excel 文件",
description="为指定规则上传新的 Excel 快照,并自动生成新规则版本",
description="为指定规则上传新的 Excel 文件,并记录本次表格修改",
)
def upload_agent_asset_spreadsheet(
asset_id: str,
@@ -267,16 +310,16 @@ def import_agent_asset_spreadsheet_content(
"/{asset_id}/spreadsheet/onlyoffice/callback",
response_model=AgentAssetOnlyOfficeCallbackRead,
summary="接收规则 Excel 的 ONLYOFFICE 回调",
description="接收 ONLYOFFICE 回写内容,并自动生成新的规则版本",
description="接收 ONLYOFFICE 回写内容,并记录本次表格修改",
)
def handle_agent_asset_spreadsheet_onlyoffice_callback(
asset_id: str,
payload: AgentAssetOnlyOfficeCallbackWrite,
db: DbSession,
version: Annotated[
str,
Query(min_length=1, description="打开编辑器时对应的规则版本号"),
],
str | None,
Query(description="兼容旧 ONLYOFFICE 回调;当前表格模式不再使用"),
] = None,
actor_name: Annotated[
str | None,
Query(description="发起编辑的用户显示名。"),
@@ -557,25 +600,3 @@ def get_agent_asset_version_timeline(
except Exception as exc:
_handle_asset_error(exc)
@router.get(
"/{asset_id}/versions/compare",
response_model=AgentAssetVersionCompareRead,
summary="比较两个规则表版本",
description="对比两个 Excel 规则表版本的工作表变化与单元格级差异。",
)
def compare_agent_asset_spreadsheet_versions(
asset_id: str,
_: CurrentUser,
db: DbSession,
base_version: Annotated[str, Query(min_length=1, description="基准版本号")],
target_version: Annotated[str, Query(min_length=1, description="对比版本号")],
) -> AgentAssetVersionCompareRead:
try:
return AgentAssetService(db).compare_spreadsheet_versions(
asset_id,
base_version=base_version,
target_version=target_version,
)
except Exception as exc:
_handle_asset_error(exc)


@@ -2,12 +2,19 @@ from __future__ import annotations
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException, Query, status
from fastapi import APIRouter, Depends, File, HTTPException, Query, UploadFile, status
from fastapi.responses import Response
from sqlalchemy.orm import Session
from app.api.deps import get_db
from app.schemas.common import ErrorResponse
from app.schemas.employee import EmployeeCreate, EmployeeMetaRead, EmployeeRead, EmployeeUpdate
from app.schemas.employee import (
EmployeeCreate,
EmployeeImportResultRead,
EmployeeMetaRead,
EmployeeRead,
EmployeeUpdate,
)
from app.services.employee import EmployeeService
router = APIRouter()
@@ -44,6 +51,67 @@ def list_employees(
return EmployeeService(db).list_employees(status=status_filter, keyword=keyword)
@router.get(
"/import-template",
summary="下载员工导入模板",
description="下载固定格式的员工 Excel 导入模板。",
)
def download_employee_import_template(db: DbSession) -> Response:
content = EmployeeService(db).build_import_template()
return Response(
content=content,
media_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
headers={
"Content-Disposition": 'attachment; filename="employee-import-template.xlsx"'
},
)
@router.get(
"/export",
summary="导出员工 Excel",
description="按筛选条件导出员工目录 Excel 文件。",
)
def export_employees(
db: DbSession,
status_filter: Annotated[
str | None,
Query(alias="status", description="员工状态筛选值。"),
] = None,
keyword: Annotated[
str | None,
Query(description="姓名、工号、邮箱等关键字模糊查询。"),
] = None,
) -> Response:
content = EmployeeService(db).export_employees(status=status_filter, keyword=keyword)
return Response(
content=content,
media_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
headers={"Content-Disposition": 'attachment; filename="employee-export.xlsx"'},
)
@router.post(
"/import",
response_model=EmployeeImportResultRead,
summary="导入员工 Excel",
description="按模板批量导入员工。全部校验通过后才写入数据库,任一行有错则整批不导入。",
)
async def import_employees(
db: DbSession,
file: Annotated[UploadFile, File(description="待导入的员工 Excel 文件。")],
) -> EmployeeImportResultRead:
filename = (file.filename or "").lower()
if not filename.endswith(".xlsx"):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="当前仅支持上传 .xlsx 格式的员工表格。",
)
content = await file.read()
return EmployeeService(db).import_employees(content)
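
The import endpoint's description promises all-or-nothing behavior: rows are written only when every row validates. The sketch below shows that pattern on plain dictionaries. The field names (`name`, `email`), the error wording, the in-memory `store`, and the assumption that the first spreadsheet row is a header are all hypothetical; the real `EmployeeService.import_employees` parses an `.xlsx` payload and is not shown in this excerpt.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_rows(rows: list[dict[str, str]]) -> list[str]:
    """Collect every validation error instead of stopping at the first one."""
    errors: list[str] = []
    seen_emails: set[str] = set()
    # Spreadsheet row 1 is assumed to be the header, so data starts at row 2.
    for row_number, row in enumerate(rows, start=2):
        name = (row.get("name") or "").strip()
        email = (row.get("email") or "").strip().lower()
        if not name:
            errors.append(f"row {row_number}: name is required")
        if not EMAIL_RE.match(email):
            errors.append(f"row {row_number}: invalid email")
        elif email in seen_emails:
            errors.append(f"row {row_number}: duplicate email {email}")
        seen_emails.add(email)
    return errors


def import_rows(rows: list[dict[str, str]], store: list[dict[str, str]]) -> dict:
    """Write rows to the store only if the whole batch is valid."""
    errors = validate_rows(rows)
    if errors:
        return {"imported": 0, "errors": errors}
    store.extend(rows)
    return {"imported": len(rows), "errors": []}
```

Validating the full batch before any write lets the caller report every failing row in one response.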
@router.post(
"",
response_model=EmployeeRead,


@@ -12,15 +12,21 @@ from app.schemas.reimbursement import (
ExpenseClaimAttachmentActionResponse,
ExpenseClaimActionResponse,
ExpenseClaimAttachmentRead,
ExpenseClaimApprovalPayload,
ExpenseClaimItemCreate,
ExpenseClaimItemActionResponse,
ExpenseClaimItemUpdate,
ExpenseClaimRead,
ExpenseClaimReturnPayload,
ExpenseClaimUpdate,
ReimbursementCreate,
ReimbursementRead,
TravelReimbursementCalculatorRequest,
TravelReimbursementCalculatorResponse,
)
from app.services.expense_claims import ExpenseClaimService
from app.services.reimbursement import ReimbursementService
from app.services.travel_reimbursement_calculator import TravelReimbursementCalculatorService
router = APIRouter()
DbSession = Annotated[Session, Depends(get_db)]
@@ -48,6 +54,29 @@ def create_reimbursement(payload: ReimbursementCreate, db: DbSession) -> Reimbur
return ReimbursementService(db).create_reimbursement(payload)
@router.post(
"/travel-calculator",
response_model=TravelReimbursementCalculatorResponse,
summary="差旅报销标准测算",
description="根据规则中心的差旅报销表、当前员工职级、出差天数与地点测算住宿和补贴参考金额。",
responses={
status.HTTP_400_BAD_REQUEST: {
"model": ErrorResponse,
"description": "测算入参或规则匹配失败。",
}
},
)
def calculate_travel_reimbursement(
payload: TravelReimbursementCalculatorRequest,
db: DbSession,
current_user: CurrentUser,
) -> TravelReimbursementCalculatorResponse:
try:
return TravelReimbursementCalculatorService(db).calculate(payload, current_user)
except ValueError as error:
raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(error)) from error
@router.get(
"/claims",
response_model=list[ExpenseClaimRead],
@@ -58,6 +87,16 @@ def list_expense_claims(db: DbSession, current_user: CurrentUser) -> list[Expens
return ExpenseClaimService(db).list_claims(current_user)
@router.get(
"/claims/approvals",
response_model=list[ExpenseClaimRead],
summary="查询当前用户审批待办报销单列表",
description="返回当前登录用户有权处理的待审批报销单据,不混入个人报销列表。",
)
def list_expense_claim_approvals(db: DbSession, current_user: CurrentUser) -> list[ExpenseClaimRead]:
return ExpenseClaimService(db).list_approval_claims(current_user)
@router.get(
"/claims/{claim_id}",
response_model=ExpenseClaimRead,
@@ -77,6 +116,43 @@ def get_expense_claim(claim_id: str, db: DbSession, current_user: CurrentUser) -
return claim
@router.patch(
"/claims/{claim_id}",
response_model=ExpenseClaimRead,
summary="更新草稿报销单",
description="更新草稿待提交报销单的主说明等草稿字段。",
responses={
status.HTTP_404_NOT_FOUND: {
"model": ErrorResponse,
"description": "报销单不存在。",
},
status.HTTP_400_BAD_REQUEST: {
"model": ErrorResponse,
"description": "报销单状态不允许更新。",
},
},
)
def update_expense_claim(
claim_id: str,
payload: ExpenseClaimUpdate,
db: DbSession,
current_user: CurrentUser,
) -> ExpenseClaimRead:
service = ExpenseClaimService(db)
try:
claim = service.update_claim(
claim_id=claim_id,
payload=payload,
current_user=current_user,
)
except ValueError as error:
raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(error)) from error
if claim is None:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Claim not found")
return claim
@router.patch(
"/claims/{claim_id}/items/{item_id}",
response_model=ExpenseClaimRead,
@@ -415,11 +491,11 @@ def submit_expense_claim(claim_id: str, db: DbSession, current_user: CurrentUser
return claim
@router.delete(
"/claims/{claim_id}",
response_model=ExpenseClaimActionResponse,
summary="删除个人报销草稿",
description="删除当前登录用户可见的草稿报销单",
@router.post(
"/claims/{claim_id}/return",
response_model=ExpenseClaimRead,
summary="退回报销单",
description="财务人员、高级管理人员或当前审批人可将可见报销单退回到待提交状态",
responses={
status.HTTP_404_NOT_FOUND: {
"model": ErrorResponse,
@@ -427,7 +503,73 @@ def submit_expense_claim(claim_id: str, db: DbSession, current_user: CurrentUser
},
status.HTTP_400_BAD_REQUEST: {
"model": ErrorResponse,
"description": "仅草稿状态允许删除",
"description": "当前用户或单据状态不允许退回",
},
},
)
def return_expense_claim(
claim_id: str,
payload: ExpenseClaimReturnPayload,
db: DbSession,
current_user: CurrentUser,
) -> ExpenseClaimRead:
service = ExpenseClaimService(db)
try:
claim = service.return_claim(claim_id, current_user, reason=payload.reason, reason_codes=payload.reason_codes)
except ValueError as error:
raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(error)) from error
if claim is None:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Claim not found")
return claim
@router.post(
"/claims/{claim_id}/approve",
response_model=ExpenseClaimRead,
summary="审批通过报销单",
description="直属领导审批通过后流转到财务审批;财务终审通过后进入归档入账。",
responses={
status.HTTP_404_NOT_FOUND: {
"model": ErrorResponse,
"description": "报销单不存在。",
},
status.HTTP_400_BAD_REQUEST: {
"model": ErrorResponse,
"description": "当前用户或单据状态不允许审批通过。",
},
},
)
def approve_expense_claim(
claim_id: str,
payload: ExpenseClaimApprovalPayload,
db: DbSession,
current_user: CurrentUser,
) -> ExpenseClaimRead:
service = ExpenseClaimService(db)
try:
claim = service.approve_claim(claim_id, current_user, opinion=payload.opinion)
except ValueError as error:
raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(error)) from error
if claim is None:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Claim not found")
return claim
@router.delete(
"/claims/{claim_id}",
response_model=ExpenseClaimActionResponse,
summary="删除报销单",
description="申请人仅可删除自己的草稿、待补充或退回单据;高级管理人员可删除可见单据,财务人员没有删除权限。",
responses={
status.HTTP_404_NOT_FOUND: {
"model": ErrorResponse,
"description": "报销单不存在。",
},
status.HTTP_400_BAD_REQUEST: {
"model": ErrorResponse,
"description": "当前用户或单据状态不允许删除。",
},
},
)
@@ -442,7 +584,7 @@ def delete_expense_claim(claim_id: str, db: DbSession, current_user: CurrentUser
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Claim not found")
return ExpenseClaimActionResponse(
message=f"{claim.claim_no} 草稿已删除。",
message=f"{claim.claim_no} 报销单已删除。",
claim_id=claim.id,
status="deleted",
)
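
The `/approve` and `/return` endpoints describe a two-stage flow: line-manager approval hands the claim to finance, finance final review archives it, and a return sends it back to the submitter. The transition table below is a sketch of that flow only. The status strings are invented for illustration, the real `ExpenseClaimService` also checks who is acting, and neither appears in this excerpt.

```python
# Hypothetical status names; the service's real values are not shown in this diff.
NEXT_ON_APPROVE = {
    "pending_manager": "pending_finance",  # line manager approves -> finance review
    "pending_finance": "archived",         # finance final review -> archived and booked
}
RETURNED_STATUS = "pending_submission"


def approve(status: str) -> str:
    """Advance a claim one approval stage, or raise if it is not awaiting approval."""
    try:
        return NEXT_ON_APPROVE[status]
    except KeyError:
        raise ValueError(f"claim in status {status!r} cannot be approved") from None


def return_to_submitter(status: str) -> str:
    """Send a claim that is awaiting approval back to the submitter."""
    if status not in NEXT_ON_APPROVE:
        raise ValueError(f"claim in status {status!r} cannot be returned")
    return RETURNED_STATUS
```

Raising `ValueError` for an invalid transition matches how the endpoints above translate `ValueError` into a 400 response.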


@@ -93,6 +93,10 @@ class ExpenseClaimItem(Base):
claim = relationship("ExpenseClaim", back_populates="items")
@property
def is_system_generated(self) -> bool:
return str(self.item_type or "").strip().lower() in {"travel_allowance"}
class AccountsReceivableRecord(Base):
__tablename__ = "accounts_receivable"

View File

@@ -56,6 +56,17 @@ class AgentAssetRepository:
stmt = stmt.limit(limit)
return list(self.db.scalars(stmt).all())

    def list_versions_for_assets(self, asset_ids: list[str]) -> list[AgentAssetVersion]:
        if not asset_ids:
            return []
        stmt = (
            select(AgentAssetVersion)
            .where(AgentAssetVersion.asset_id.in_(asset_ids))
            .order_by(AgentAssetVersion.asset_id, AgentAssetVersion.created_at.desc())
        )
        return list(self.db.scalars(stmt).all())

    def get_version(self, asset_id: str, version: str) -> AgentAssetVersion | None:
        stmt = select(AgentAssetVersion).where(
            AgentAssetVersion.asset_id == asset_id,
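
`list_versions_for_assets` returns one flat list for many assets, ordered by asset and then newest first. A caller that wants the newest version per asset could group that list as sketched below. `VersionRow` and `latest_version_per_asset` are hypothetical stand-ins for illustration (the real rows are `AgentAssetVersion` ORM objects), and the sketch re-sorts defensively so it does not depend on the query's ordering.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from itertools import groupby


@dataclass(frozen=True)
class VersionRow:
    # Hypothetical stand-in for an AgentAssetVersion row.
    asset_id: str
    version: str
    created_at: datetime


def latest_version_per_asset(rows: list[VersionRow]) -> dict[str, VersionRow]:
    """Pick the most recently created row for each asset_id."""
    # Sort newest first, then stable-sort by asset so each group stays newest first.
    ordered = sorted(rows, key=lambda row: row.created_at, reverse=True)
    ordered.sort(key=lambda row: row.asset_id)
    return {
        asset_id: next(group)
        for asset_id, group in groupby(ordered, key=lambda row: row.asset_id)
    }
```

Fetching all versions in one query and grouping in memory avoids issuing one query per asset when rendering a list page.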


@@ -28,6 +28,28 @@ class AuditLogRepository:
        stmt = stmt.order_by(AuditLog.created_at.desc()).limit(limit)
        return list(self.db.scalars(stmt).all())

    def list_for_resources(
        self,
        *,
        resource_type: str,
        resource_ids: list[str],
        action: str | None = None,
        limit: int | None = None,
    ) -> list[AuditLog]:
        if not resource_ids:
            return []
        stmt = select(AuditLog).where(
            AuditLog.resource_type == resource_type,
            AuditLog.resource_id.in_(resource_ids),
        )
        if action:
            stmt = stmt.where(AuditLog.action == action)
        stmt = stmt.order_by(AuditLog.created_at.desc())
        if limit is not None:
            stmt = stmt.limit(limit)
        return list(self.db.scalars(stmt).all())

    def create(self, log: AuditLog) -> AuditLog:
        self.db.add(log)
        self.db.commit()


@@ -93,6 +93,22 @@ class AgentAssetOnlyOfficeCallbackWrite(BaseModel):
users: list[str] = Field(default_factory=list, description="当前编辑用户列表。")
class AgentAssetRuleJsonWrite(BaseModel):
payload: dict[str, Any] = Field(default_factory=dict)
class AgentAssetRuleJsonRead(BaseModel):
file_name: str
rule_code: str
name: str
description: str = ""
evaluator: str = ""
ontology_signal: str | None = None
inputs: dict[str, Any] = Field(default_factory=dict)
outcomes: dict[str, Any] = Field(default_factory=dict)
payload: dict[str, Any] = Field(default_factory=dict)
class AgentAssetVersionTimelineItemRead(BaseModel):
event_type: str
version: str
@@ -117,18 +133,8 @@ class AgentAssetSpreadsheetDiffSheetRead(BaseModel):
change_type: str
class AgentAssetVersionCompareRead(BaseModel):
base_version: str
target_version: str
added_sheet_count: int = 0
removed_sheet_count: int = 0
changed_sheet_count: int = 0
changed_cell_count: int = 0
sheet_changes: list[AgentAssetSpreadsheetDiffSheetRead] = Field(default_factory=list)
cell_changes: list[AgentAssetSpreadsheetDiffCellRead] = Field(default_factory=list)
class AgentAssetSpreadsheetChangeRecordRead(BaseModel):
id: str
actor: str
changed_at: datetime
summary: str
@@ -172,6 +178,8 @@ class AgentAssetListItem(BaseModel):
published_version: str | None
working_version: str | None
config_json: dict[str, Any]
change_count: int = 0
modified_by: str | None = None
created_at: datetime
updated_at: datetime


@@ -1,5 +1,7 @@
from __future__ import annotations
from typing import Any
from pydantic import BaseModel, EmailStr, Field
@@ -12,8 +14,16 @@ class AuthUserRead(BaseModel):
username: str
name: str
role: str
department: str = ""
departmentName: str = ""
position: str = ""
grade: str = ""
employeeNo: str = ""
managerName: str = ""
location: str = ""
costCenter: str = ""
financeOwnerName: str = ""
riskProfile: dict[str, Any] = Field(default_factory=dict)
roleCodes: list[str] = Field(default_factory=list)
email: EmailStr | str
avatar: str


@@ -50,6 +50,7 @@ class EmployeeMetaRead(BaseModel):
totalEmployees: int
statusSummary: list[EmployeeStatusSummaryRead]
roleOptions: list[EmployeeRoleOptionRead]
organizationOptions: list[EmployeeOrganizationRead] = Field(default_factory=list)
class EmployeeRead(BaseModel):
@@ -63,6 +64,7 @@ class EmployeeRead(BaseModel):
position: str
grade: str
manager: str
managerEmployeeNo: str | None = None
financeOwner: str
roles: list[str] = Field(default_factory=list)
roleCodes: list[str] = Field(default_factory=list)
@@ -112,6 +114,28 @@ class EmployeeCreate(BaseModel):
return _parse_optional_date(self.join_date, "入职日期")
class EmployeeImportErrorRead(BaseModel):
row: int
column: str
employeeNo: str = ""
message: str
class EmployeeImportSummaryRead(BaseModel):
totalRows: int = 0
created: int = 0
updated: int = 0
errorCount: int = 0
class EmployeeImportResultRead(BaseModel):
success: bool
message: str
summary: EmployeeImportSummaryRead
errors: list[EmployeeImportErrorRead] = Field(default_factory=list)
importedAt: str | None = None
class EmployeeUpdate(BaseModel):
name: str | None = Field(default=None, min_length=1, max_length=100)
gender: str | None = Field(default=None, max_length=20)
@@ -124,6 +148,8 @@ class EmployeeUpdate(BaseModel):
grade: str | None = Field(default=None, min_length=1, max_length=20)
cost_center: str | None = Field(default=None, max_length=50)
finance_owner_name: str | None = Field(default=None, max_length=100)
organization_unit_code: str | None = Field(default=None, max_length=50)
manager_employee_no: str | None = Field(default=None, max_length=50)
role_codes: list[str] | None = None
password: str | None = Field(default=None, min_length=5, max_length=128)


@@ -41,6 +41,7 @@ class ExpenseClaimItemRead(BaseModel):
item_location: str
item_amount: Decimal
invoice_id: str | None
is_system_generated: bool = False
created_at: datetime
updated_at: datetime
@@ -51,6 +52,7 @@ class ExpenseClaimAttachmentAnalysisRead(BaseModel):
headline: str
summary: str
points: list[str] = Field(default_factory=list)
rule_basis: list[str] = Field(default_factory=list)
suggestion: str = ""
@@ -112,6 +114,10 @@ class ExpenseClaimItemCreate(BaseModel):
invoice_id: str | None = None
class ExpenseClaimUpdate(BaseModel):
reason: str | None = Field(default=None, max_length=500)
class ExpenseClaimRead(BaseModel):
model_config = ConfigDict(from_attributes=True)
@@ -148,11 +154,54 @@ class ExpenseClaimActionResponse(BaseModel):
status: str | None = None
class ExpenseClaimReturnPayload(BaseModel):
reason: str | None = Field(default=None, max_length=500)
reason_codes: list[str] = Field(default_factory=list, max_length=10)
class ExpenseClaimApprovalPayload(BaseModel):
opinion: str | None = Field(default=None, max_length=500)
class TravelReimbursementCalculatorRequest(BaseModel):
days: int = Field(ge=1, le=365)
location: str = Field(min_length=1, max_length=120)
grade: str | None = Field(default=None, max_length=30)
class TravelReimbursementCalculatorResponse(BaseModel):
days: int
location: str
matched_city: str
city_tier: str
grade: str
grade_band: str
grade_band_label: str
hotel_rate: Decimal
hotel_amount: Decimal
allowance_region: str
meal_allowance_rate: Decimal
basic_allowance_rate: Decimal
total_allowance_rate: Decimal
allowance_amount: Decimal
total_amount: Decimal
rule_name: str
rule_version: str
formula_text: str
summary_text: str
class ExpenseClaimAttachmentActionResponse(BaseModel):
message: str
claim_id: str
item_id: str
invoice_id: str | None = None
item_date: date | None = None
item_type: str | None = None
item_reason: str | None = None
item_location: str | None = None
item_amount: Decimal | None = None
claim_amount: Decimal | None = None
attachment: ExpenseClaimAttachmentRead | None = None
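
The field names on `TravelReimbursementCalculatorResponse` suggest how the totals relate to the per-day rates. The sketch below spells out one plausible reading. The formula is an assumption inferred from the field names only (the diff does not show the calculator itself), including the choice to charge one hotel night per travel day, and `estimate_travel_reimbursement` is a hypothetical helper.

```python
from __future__ import annotations

from decimal import Decimal


def estimate_travel_reimbursement(
    days: int,
    hotel_rate: Decimal,
    meal_allowance_rate: Decimal,
    basic_allowance_rate: Decimal,
) -> dict[str, Decimal]:
    """Combine per-day rates into totals (assumed formula, not the real rule engine)."""
    # Same bounds as the request schema: days is constrained to 1..365.
    if not 1 <= days <= 365:
        raise ValueError("days must be between 1 and 365")
    total_allowance_rate = meal_allowance_rate + basic_allowance_rate
    # Assumption: one hotel night is counted per travel day.
    hotel_amount = hotel_rate * days
    allowance_amount = total_allowance_rate * days
    return {
        "hotel_amount": hotel_amount,
        "total_allowance_rate": total_allowance_rate,
        "allowance_amount": allowance_amount,
        "total_amount": hotel_amount + allowance_amount,
    }
```

Using `Decimal` throughout, as the schema does, avoids binary floating-point rounding in currency amounts.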


@@ -22,6 +22,7 @@ class UserAgentSuggestedAction(BaseModel):
label: str = Field(description="建议动作文案。")
action_type: str = Field(description="动作类型,例如 open_detail / create_draft。")
description: str = Field(default="", description="动作说明。")
payload: dict[str, Any] = Field(default_factory=dict, description="动作携带的结构化参数。")
class UserAgentDraftPayload(BaseModel):
@@ -85,6 +86,8 @@ class UserAgentReviewRiskBrief(BaseModel):
title: str = Field(description="风险或注意事项标题。")
level: str = Field(default="info", description="级别,例如 info / warning / high。")
content: str = Field(description="面向用户展示的摘要说明。")
detail: str = Field(default="", description="点击风险项后展示的详细解释。")
suggestion: str = Field(default="", description="面向用户的处理建议。")
class UserAgentReviewSlotCard(BaseModel):


@@ -0,0 +1,98 @@
from __future__ import annotations
from app.models.agent_asset import AgentAsset
from app.schemas.agent_asset import AgentAssetRuleJsonRead, AgentAssetRuleJsonWrite
from app.services.agent_asset_spreadsheet import RISK_RULES_LIBRARY, RULE_LIBRARY_NAMES
class AgentAssetJsonRuleMixin:
def _resolve_json_risk_rule_document(self, asset: AgentAsset) -> tuple[str, str]:
config_json = dict(asset.config_json or {})
detail_mode = str(config_json.get("detail_mode") or "").strip().lower()
if detail_mode != "json_risk":
raise ValueError("当前资产不是 JSON 风险规则。")
rule_library = str(config_json.get("rule_library") or RISK_RULES_LIBRARY).strip()
if rule_library not in RULE_LIBRARY_NAMES:
raise ValueError("规则库目录不合法。")
rule_document = config_json.get("rule_document")
if not isinstance(rule_document, dict):
raise ValueError("规则资产缺少 rule_document 配置。")
file_name = str(rule_document.get("file_name") or "").strip()
if not file_name:
raise ValueError("规则资产缺少 JSON 文件名。")
return rule_library, file_name
def read_rule_json(self, asset_id: str) -> AgentAssetRuleJsonRead:
asset = self.repository.get(asset_id)
if asset is None:
raise LookupError("资产不存在。")
rule_library, file_name = self._resolve_json_risk_rule_document(asset)
payload = self.rule_library_manager.read_rule_library_json(
library=rule_library,
file_name=file_name,
)
return AgentAssetRuleJsonRead(
file_name=file_name,
rule_code=str(payload.get("rule_code") or asset.code or ""),
name=str(payload.get("name") or asset.name or ""),
description=str(payload.get("description") or asset.description or "").strip(),
evaluator=str(payload.get("evaluator") or ""),
ontology_signal=str(payload.get("ontology_signal") or "") or None,
inputs=payload.get("inputs") if isinstance(payload.get("inputs"), dict) else {},
outcomes=payload.get("outcomes") if isinstance(payload.get("outcomes"), dict) else {},
payload=payload,
)
def write_rule_json(
self,
asset_id: str,
*,
body: AgentAssetRuleJsonWrite,
actor: str,
request_id: str | None = None,
) -> AgentAssetRuleJsonRead:
asset = self.repository.get(asset_id)
if asset is None:
raise LookupError("资产不存在。")
rule_library, file_name = self._resolve_json_risk_rule_document(asset)
payload = dict(body.payload or {})
asset_code = str(asset.code or "").strip()
if asset_code and str(payload.get("rule_code") or "").strip() not in {"", asset_code}:
raise ValueError("规则 JSON 的 rule_code 必须与资产编码一致。")
if asset_code and not str(payload.get("rule_code") or "").strip():
payload["rule_code"] = asset_code
saved = self.rule_library_manager.write_rule_library_json(
library=rule_library,
file_name=file_name,
payload=payload,
)
rule_description = str(saved.get("description") or "").strip()
if rule_description:
asset.description = rule_description
rule_name = str(saved.get("name") or "").strip()
if rule_name:
asset.name = rule_name
risk_category = str(saved.get("risk_category") or "").strip()
if risk_category:
config_json = dict(asset.config_json or {})
config_json["risk_category"] = risk_category
asset.config_json = config_json
asset.scenario_json = [risk_category]
self.audit_service.log_action(
actor=actor,
action="update_agent_asset_rule_json",
resource_type=asset.asset_type,
resource_id=asset.id,
before_json={"file_name": file_name},
after_json={"file_name": file_name, "rule_code": saved.get("rule_code")},
request_id=request_id,
)
self.db.commit()
return self.read_rule_json(asset_id)
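
`write_rule_json` reconciles the `rule_code` in the submitted payload with the asset's own code before saving. The standalone sketch below isolates that check so it can be read and tested on its own. `reconcile_rule_code` is a hypothetical helper name and the error text is illustrative; the branching mirrors the method above.

```python
from __future__ import annotations

from typing import Any


def reconcile_rule_code(payload: dict[str, Any], asset_code: str) -> dict[str, Any]:
    """Return a copy of payload whose rule_code agrees with asset_code."""
    result = dict(payload)
    normalized_asset_code = str(asset_code or "").strip()
    supplied = str(result.get("rule_code") or "").strip()
    # A non-empty rule_code that differs from the asset code is rejected.
    if normalized_asset_code and supplied not in {"", normalized_asset_code}:
        raise ValueError("rule_code must match the asset code")
    # A missing rule_code is filled in from the asset.
    if normalized_asset_code and not supplied:
        result["rule_code"] = normalized_asset_code
    return result
```

Working on a copy keeps the caller's payload untouched if validation fails.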


@@ -0,0 +1,450 @@
from __future__ import annotations
from dataclasses import dataclass
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
from urllib.parse import quote
from urllib.request import Request, urlopen
import jwt
from app.api.deps import CurrentUserContext
from app.core.config import get_settings
from app.schemas.agent_asset import AgentAssetOnlyOfficeConfigRead
from app.services.agent_asset_spreadsheet import (
COMPANY_TRAVEL_EXPENSE_RULE_FILENAME,
FINANCE_RULES_LIBRARY,
SPREADSHEET_MIME_TYPE,
AgentAssetSpreadsheetManager,
RuleSpreadsheetMeta,
)
from app.services.settings import resolve_onlyoffice_settings
PREVIEW_RULE_ASSET_ID = "preview-rule-expense-company-travel-expense"
PREVIEW_RULE_CURRENT_VERSION = "v1.2.0"
PREVIEW_RULE_VERSION_FILENAMES = {
    PREVIEW_RULE_CURRENT_VERSION: COMPANY_TRAVEL_EXPENSE_RULE_FILENAME,
    "v1.1.0": "公司差旅费报销规则-v1.1.0.xlsx",
    "v1.0.0": "公司差旅费报销规则-v1.0.0.xlsx",
}
@dataclass(slots=True)
class OnlyOfficeCallbackPayload:
status: int
download_url: str
users: list[str]
class AgentAssetOnlyOfficeMixin:
@staticmethod
def _resolve_onlyoffice_settings():
from app.services import agent_assets
return agent_assets.resolve_onlyoffice_settings()
def build_rule_spreadsheet_onlyoffice_config(
self,
asset_id: str,
current_user: CurrentUserContext,
*,
version: str | None = None,
) -> AgentAssetOnlyOfficeConfigRead:
self._ensure_ready()
if asset_id == PREVIEW_RULE_ASSET_ID:
resolved_version, metadata = self._ensure_preview_rule_spreadsheet(version=version)
return self._build_onlyoffice_spreadsheet_config(
asset_id=asset_id,
current_user=current_user,
metadata=metadata,
editable=resolved_version == PREVIEW_RULE_CURRENT_VERSION,
)
asset = self._require_spreadsheet_rule(asset_id)
_, metadata = self._resolve_current_spreadsheet_meta(asset)
editable = self._can_edit_current_spreadsheet(current_user)
return self._build_onlyoffice_spreadsheet_config(
asset_id=asset.id,
current_user=current_user,
metadata=metadata,
editable=editable,
)
def get_rule_spreadsheet_content(
self,
asset_id: str,
*,
version: str | None = None,
) -> tuple[Path, str, str]:
self._ensure_ready()
if asset_id == PREVIEW_RULE_ASSET_ID:
_, metadata = self._ensure_preview_rule_spreadsheet(version=version)
file_path = self.spreadsheet_manager.resolve_storage_path(metadata.storage_key)
if not file_path.exists():
raise FileNotFoundError(metadata.file_name)
return file_path, metadata.mime_type, metadata.file_name
asset = self._require_spreadsheet_rule(asset_id)
requested_version = str(version or "").strip()
if requested_version and requested_version != "current":
_, metadata = self._resolve_spreadsheet_version_meta(asset, version=requested_version)
else:
_, metadata = self._resolve_current_spreadsheet_meta(asset)
file_path = self.spreadsheet_manager.resolve_storage_path(metadata.storage_key)
if not file_path.exists():
raise FileNotFoundError(metadata.file_name)
return file_path, metadata.mime_type, metadata.file_name
def validate_rule_spreadsheet_access_token(
self,
asset_id: str,
access_token: str,
) -> None:
onlyoffice_settings = self._resolve_onlyoffice_settings()
try:
payload = jwt.decode(
access_token,
onlyoffice_settings.jwt_secret,
algorithms=["HS256"],
)
except jwt.PyJWTError as exc:
raise ValueError("ONLYOFFICE 文件访问令牌无效。") from exc
if (
payload.get("scope") != "agent-asset-spreadsheet"
or payload.get("asset_id") != asset_id
):
raise ValueError("ONLYOFFICE 文件访问令牌无效。")
def upload_rule_spreadsheet(
self,
asset_id: str,
*,
filename: str,
content: bytes,
actor: str,
request_id: str | None = None,
change_note: str | None = None,
source: str = "upload",
) -> AgentAssetRead:
self._ensure_ready()
asset = self._require_spreadsheet_rule(asset_id)
normalized_name = Path(str(filename or "").strip()).name.strip()
if not normalized_name:
raise ValueError("规则表文件名不能为空。")
if Path(normalized_name).suffix.lower() != ".xlsx":
raise ValueError("当前仅支持上传 .xlsx 格式的规则表。")
if not content:
raise ValueError("规则表文件内容不能为空。")
_, current_metadata = self._resolve_current_spreadsheet_meta(asset)
file_name = current_metadata.file_name or self._resolve_default_spreadsheet_file_name(asset)
sheet_changes, cell_changes = self._collect_workbook_changes_from_content(
current_metadata,
content,
)
changed_sheet_count = self._count_changed_sheets(sheet_changes, cell_changes)
changed_cell_count = len(cell_changes)
metadata = self._store_current_rule_spreadsheet(
asset,
file_name=file_name,
content=content,
actor=actor,
source=source,
)
summary = self._build_spreadsheet_change_summary(
sheet_changes,
cell_changes,
)
self.audit_service.log_action(
actor=actor,
action="edit_rule_spreadsheet",
resource_type=asset.asset_type,
resource_id=asset.id,
before_json={"storage_key": current_metadata.storage_key},
after_json={
"summary": summary,
"changed_sheet_count": changed_sheet_count,
"changed_cell_count": changed_cell_count,
"sheet_changes": [item.model_dump() for item in sheet_changes],
"cell_changes": [item.model_dump() for item in cell_changes[:500]],
"storage_key": metadata.storage_key,
},
request_id=request_id,
)
return self.get_asset(asset.id) # type: ignore[return-value]
def import_rule_spreadsheet_content(
self,
asset_id: str,
*,
filename: str,
content: bytes,
actor: str,
request_id: str | None = None,
) -> AgentAssetRead:
self._ensure_ready()
asset = self._require_spreadsheet_rule(asset_id)
normalized_name = Path(str(filename or "").strip()).name.strip()
if not normalized_name:
raise ValueError("待导入表格文件名不能为空。")
if Path(normalized_name).suffix.lower() != ".xlsx":
raise ValueError("当前仅支持导入 .xlsx 格式的规则表。")
_, current_metadata = self._resolve_current_spreadsheet_meta(asset)
imported_content = self.spreadsheet_manager.rebuild_from_uploaded_content(content)
return self.upload_rule_spreadsheet(
asset.id,
filename=current_metadata.file_name,
content=imported_content,
actor=actor,
request_id=request_id,
change_note=f"导入 Excel 表格内容:{normalized_name}",
source="content-import",
)
def handle_rule_spreadsheet_onlyoffice_callback(
self,
asset_id: str,
*,
version: str | None = None,
payload: dict[str, Any],
actor_name: str | None = None,
) -> None:
self._ensure_ready()
if asset_id == PREVIEW_RULE_ASSET_ID:
self._handle_preview_rule_spreadsheet_onlyoffice_callback(
version=version,
payload=payload,
)
return
asset = self._require_spreadsheet_rule(asset_id)
callback = self._parse_onlyoffice_callback(payload)
if callback.status not in {2, 6} or not callback.download_url:
return
_, current_metadata = self._resolve_current_spreadsheet_meta(asset)
request = Request(
callback.download_url,
headers={"User-Agent": "x-financial-onlyoffice-agent-asset"},
)
with urlopen(request, timeout=30) as response: # noqa: S310
content = response.read()
if current_metadata.checksum and current_metadata.checksum == self._hash_bytes(content):
return
resolved_actor_name = str(actor_name or "").strip() or (
callback.users[0] if callback.users else "ONLYOFFICE"
)
self.upload_rule_spreadsheet(
asset.id,
filename=current_metadata.file_name,
content=content,
actor=resolved_actor_name,
source="onlyoffice",
)
@staticmethod
def _can_edit_current_spreadsheet(current_user: CurrentUserContext) -> bool:
role_codes = {str(item).strip() for item in current_user.role_codes}
return current_user.is_admin or "manager" in role_codes or "finance" in role_codes
@staticmethod
def _build_onlyoffice_document_key(
asset_id: str,
metadata: RuleSpreadsheetMeta,
) -> str:
fingerprint = metadata.checksum or metadata.updated_at or metadata.file_name
raw_key = f"{asset_id}-{fingerprint}"
return "".join(
character if character.isalnum() or character in {"-", "_", ".", "="} else "_"
for character in raw_key
)
def _build_onlyoffice_access_token(self, asset_id: str) -> str:
onlyoffice_settings = self._resolve_onlyoffice_settings()
payload = {
"scope": "agent-asset-spreadsheet",
"asset_id": asset_id,
}
return jwt.encode(payload, onlyoffice_settings.jwt_secret, algorithm="HS256")
@staticmethod
def _parse_onlyoffice_callback(payload: dict[str, Any]) -> OnlyOfficeCallbackPayload:
return OnlyOfficeCallbackPayload(
status=int(payload.get("status") or 0),
download_url=str(payload.get("url") or "").strip(),
users=[str(item).strip() for item in payload.get("users") or [] if str(item).strip()],
)
def _build_onlyoffice_spreadsheet_config(
self,
*,
asset_id: str,
current_user: CurrentUserContext,
metadata: RuleSpreadsheetMeta,
editable: bool,
) -> AgentAssetOnlyOfficeConfigRead:
onlyoffice_settings = self._resolve_onlyoffice_settings()
settings = get_settings()
if not onlyoffice_settings.enabled:
raise ValueError("ONLYOFFICE 预览未启用。")
if not onlyoffice_settings.public_url or not onlyoffice_settings.backend_url:
raise ValueError("ONLYOFFICE 地址配置不完整。")
if not onlyoffice_settings.jwt_secret:
raise ValueError("ONLYOFFICE JWT 密钥未配置。")
backend_base_url = onlyoffice_settings.backend_url.rstrip("/")
public_url = onlyoffice_settings.public_url.rstrip("/")
access_token = self._build_onlyoffice_access_token(asset_id)
document_url = (
f"{backend_base_url}{settings.api_v1_prefix}/agent-assets/{asset_id}/spreadsheet/onlyoffice/content"
f"?access_token={access_token}"
)
callback_url = (
f"{backend_base_url}{settings.api_v1_prefix}/agent-assets/{asset_id}/spreadsheet/onlyoffice/callback"
f"?actor_name={quote(current_user.name)}"
)
config: dict[str, Any] = {
"documentType": "cell",
"document": {
"fileType": Path(metadata.file_name).suffix.lstrip(".").lower() or "xlsx",
"key": self._build_onlyoffice_document_key(asset_id, metadata),
"title": metadata.file_name,
"url": document_url,
"permissions": {
"download": True,
"edit": editable,
"print": True,
"copy": True,
},
},
"editorConfig": {
"mode": "edit" if editable else "view",
"lang": "zh-CN",
"callbackUrl": callback_url,
"user": {
"id": current_user.username,
"name": current_user.name,
},
"customization": {
"compactHeader": True,
"compactToolbar": False,
"toolbarNoTabs": False,
"autosave": False,
"forcesave": editable,
},
},
"width": "100%",
"height": "100%",
}
config["token"] = jwt.encode(config, onlyoffice_settings.jwt_secret, algorithm="HS256")
return AgentAssetOnlyOfficeConfigRead(documentServerUrl=public_url, config=config)
def _ensure_preview_rule_spreadsheet(
self,
*,
version: str | None = None,
) -> tuple[str, RuleSpreadsheetMeta]:
resolved_version = str(version or PREVIEW_RULE_CURRENT_VERSION).strip()
if resolved_version not in PREVIEW_RULE_VERSION_FILENAMES:
raise LookupError(f"版本 {resolved_version} 不存在")
file_name = PREVIEW_RULE_VERSION_FILENAMES[resolved_version]
storage_key = (
Path("rules")
/ FINANCE_RULES_LIBRARY
/ ".versions"
/ PREVIEW_RULE_ASSET_ID
/ resolved_version
/ file_name
).as_posix()
try:
file_path = self.spreadsheet_manager.resolve_storage_path(storage_key)
except FileNotFoundError:
file_path = None
if file_path is not None and file_path.exists():
content = file_path.read_bytes()
updated_at = datetime.fromtimestamp(file_path.stat().st_mtime, UTC).isoformat()
return resolved_version, RuleSpreadsheetMeta(
file_name=file_name,
storage_key=storage_key,
mime_type=SPREADSHEET_MIME_TYPE,
size_bytes=file_path.stat().st_size,
checksum=self._hash_bytes(content),
updated_at=updated_at,
updated_by="ONLYOFFICE 预览",
source="preview",
)
metadata = self.spreadsheet_manager.store_rule_library_spreadsheet_snapshot(
library=FINANCE_RULES_LIBRARY,
asset_id=PREVIEW_RULE_ASSET_ID,
version=resolved_version,
file_name=file_name,
content=AgentAssetSpreadsheetManager.build_company_travel_rule_template(),
actor_name="ONLYOFFICE 预览",
source="preview",
)
return resolved_version, metadata
def _handle_preview_rule_spreadsheet_onlyoffice_callback(
self,
*,
version: str,
payload: dict[str, Any],
) -> None:
callback = self._parse_onlyoffice_callback(payload)
if callback.status not in {2, 6} or not callback.download_url:
return
resolved_version, metadata = self._ensure_preview_rule_spreadsheet(version=version)
request = Request(
callback.download_url,
headers={"User-Agent": "x-financial-onlyoffice-agent-asset-preview"},
)
with urlopen(request, timeout=30) as response: # noqa: S310
content = response.read()
if metadata.checksum and metadata.checksum == self._hash_bytes(content):
return
actor_name = callback.users[0] if callback.users else "ONLYOFFICE"
self.spreadsheet_manager.store_rule_library_spreadsheet_snapshot(
library=FINANCE_RULES_LIBRARY,
asset_id=PREVIEW_RULE_ASSET_ID,
version=resolved_version,
file_name=metadata.file_name,
content=content,
actor_name=actor_name,
source="onlyoffice-preview",
)
@staticmethod
def _read_current_rule_document_meta(asset: AgentAsset) -> RuleSpreadsheetMeta | None:
payload = (asset.config_json or {}).get("rule_document")
if not isinstance(payload, dict):
return None
return RuleSpreadsheetMeta(
file_name=str(payload.get("file_name") or "").strip(),
storage_key=str(payload.get("storage_key") or "").strip(),
mime_type=(
str(payload.get("mime_type") or "").strip()
or "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
),
size_bytes=int(payload.get("size_bytes") or 0),
checksum=str(payload.get("checksum") or "").strip(),
updated_at=str(payload.get("updated_at") or "").strip(),
updated_by=str(payload.get("updated_by") or "system").strip() or "system",
source=str(payload.get("source") or "upload").strip() or "upload",
)
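
Two small pieces of the mixin above are pure functions and can be illustrated without ONLYOFFICE or a database: deciding whether a callback should be persisted, and building a document key. In the ONLYOFFICE callback API, status 2 means the document is ready for saving and status 6 means a force-save happened while editing continues, which is why only those two trigger a download. The sketch below is a simplified restatement with hypothetical names (`CallbackInfo`, `should_persist`, `build_document_key`), not the project's code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# ONLYOFFICE callback statuses that carry a saved document to download.
SAVE_STATUSES = {2, 6}


@dataclass
class CallbackInfo:
    status: int
    download_url: str
    users: list[str]


def parse_callback(payload: dict[str, Any]) -> CallbackInfo:
    """Normalize a raw callback body, tolerating missing or blank fields."""
    return CallbackInfo(
        status=int(payload.get("status") or 0),
        download_url=str(payload.get("url") or "").strip(),
        users=[str(item).strip() for item in payload.get("users") or [] if str(item).strip()],
    )


def should_persist(info: CallbackInfo) -> bool:
    """Only save when the status signals a saved document and a URL is present."""
    return info.status in SAVE_STATUSES and bool(info.download_url)


def build_document_key(asset_id: str, fingerprint: str) -> str:
    """Replace characters outside a conservative set so the key is URL-safe."""
    raw_key = f"{asset_id}-{fingerprint}"
    return "".join(
        character if character.isalnum() or character in {"-", "_", ".", "="} else "_"
        for character in raw_key
    )
```

Because the fingerprint is derived from the file checksum where available, a changed file yields a new key, which tells the document server not to reuse a cached editing session.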


@@ -0,0 +1,84 @@
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

from app.core.config import SERVER_DIR
from app.services.agent_asset_spreadsheet import RULE_LIBRARY_NAMES

JSON_RULE_MIME_TYPE = "application/json"


class AgentAssetRuleLibraryManager:
    def __init__(self, rule_root: Path | None = None) -> None:
        self.rule_root = Path(rule_root or (SERVER_DIR / "rules")).resolve()

    def ensure_rule_library_dirs(self) -> None:
        for library in sorted(RULE_LIBRARY_NAMES):
            (self.rule_root / library).mkdir(parents=True, exist_ok=True)

    def resolve_rule_library_path(self, *, library: str, file_name: str) -> Path:
        normalized_library = str(library or "").strip()
        if normalized_library not in RULE_LIBRARY_NAMES:
            raise ValueError("Invalid rule library.")
        normalized_name = Path(str(file_name or "").strip()).name.strip()
        if not normalized_name or not normalized_name.endswith(".json"):
            raise ValueError("Rule JSON file name must end with .json.")
        library_dir = (self.rule_root / normalized_library).resolve()
        target_path = (library_dir / normalized_name).resolve()
        try:
            target_path.relative_to(library_dir)
        except ValueError:
            raise ValueError("Invalid rule JSON path.") from None
        return target_path

    def read_rule_library_json(self, *, library: str, file_name: str) -> dict[str, Any]:
        target_path = self.resolve_rule_library_path(library=library, file_name=file_name)
        if not target_path.exists():
            raise FileNotFoundError("Rule JSON file not found.")
        try:
            payload = json.loads(target_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            raise ValueError("Rule JSON file is invalid.") from exc
        if not isinstance(payload, dict):
            raise ValueError("Rule JSON payload must be an object.")
        return payload

    def write_rule_library_json(
        self,
        *,
        library: str,
        file_name: str,
        payload: dict[str, Any],
    ) -> dict[str, Any]:
        if not isinstance(payload, dict):
            raise ValueError("Rule JSON payload must be an object.")
        rule_code = str(payload.get("rule_code") or "").strip()
        if not rule_code:
            raise ValueError("Rule JSON must include rule_code.")
        evaluator = str(payload.get("evaluator") or "").strip()
        if not evaluator:
            raise ValueError("Rule JSON must include evaluator.")
        target_path = self.resolve_rule_library_path(library=library, file_name=file_name)
        target_path.parent.mkdir(parents=True, exist_ok=True)
        target_path.write_text(
            f"{json.dumps(payload, ensure_ascii=False, indent=2)}\n",
            encoding="utf-8",
        )
        return payload

    def list_rule_library_json_files(self, *, library: str) -> list[str]:
        library_dir = self.resolve_rule_library_path(
            library=library,
            file_name="placeholder.json",
        ).parent
        library_dir.mkdir(parents=True, exist_ok=True)
        return sorted(path.name for path in library_dir.glob("*.json") if path.is_file())
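
`resolve_rule_library_path` guards against path traversal in two layers: it keeps only the final component of the supplied file name, then checks that the resolved target still sits under the library directory. The sketch below restates that pattern as a free function so the effect on a hostile file name is easy to see. `resolve_inside` and its parameters are hypothetical names, and the error messages are illustrative.

```python
from __future__ import annotations

from pathlib import Path


def resolve_inside(root: Path, library: str, file_name: str, allowed: set[str]) -> Path:
    """Resolve file_name under root/library, refusing anything that escapes it."""
    if library not in allowed:
        raise ValueError("unknown library")
    # Path(...).name drops every directory component, including "..".
    name = Path(file_name.strip()).name.strip()
    if not name or not name.endswith(".json"):
        raise ValueError("file name must end with .json")
    library_dir = (root / library).resolve()
    target = (library_dir / name).resolve()
    try:
        # Second layer: the resolved path must still be under library_dir.
        target.relative_to(library_dir)
    except ValueError:
        raise ValueError("path escapes the library directory") from None
    return target
```

With this shape, a name such as `../../etc/passwd.json` is reduced to `passwd.json` inside the library directory instead of reaching outside it.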


@@ -22,6 +22,8 @@ RULE_SPREADSHEET_BLOCK_PATTERN = re.compile(
COMPANY_TRAVEL_EXPENSE_RULE_CODE = "rule.expense.company_travel_expense_reimbursement"
COMPANY_TRAVEL_EXPENSE_RULE_FILENAME = "公司差旅费报销规则.xlsx"
COMPANY_COMMUNICATION_EXPENSE_RULE_CODE = "rule.expense.company_communication_expense_reimbursement"
COMPANY_COMMUNICATION_EXPENSE_RULE_FILENAME = "公司通信费报销规则.xlsx"
FINANCE_RULES_LIBRARY = "finance-rules"
RISK_RULES_LIBRARY = "risk-rules"
RULE_LIBRARY_NAMES = {FINANCE_RULES_LIBRARY, RISK_RULES_LIBRARY}
@@ -67,26 +69,13 @@ class AgentAssetSpreadsheetManager:
actor_name: str,
source: str = "upload",
) -> RuleSpreadsheetMeta:
normalized_name = Path(str(file_name or "").strip()).name.strip()
if not normalized_name:
raise ValueError("规则表文件名不能为空。")
if not content:
raise ValueError("规则表文件内容不能为空。")
relative_path = Path("agent_assets") / asset_id / "rule_spreadsheets" / version / normalized_name
target_path = (self.storage_root / relative_path).resolve()
target_path.parent.mkdir(parents=True, exist_ok=True)
target_path.write_bytes(content)
mime_type = mimetypes.guess_type(normalized_name)[0] or SPREADSHEET_MIME_TYPE
return RuleSpreadsheetMeta(
file_name=normalized_name,
storage_key=relative_path.as_posix(),
mime_type=mime_type,
size_bytes=len(content),
checksum=hashlib.sha256(content).hexdigest(),
updated_at=datetime.now(UTC).isoformat(),
updated_by=str(actor_name or "system").strip() or "system",
return self.store_rule_library_spreadsheet_snapshot(
library=FINANCE_RULES_LIBRARY,
asset_id=asset_id,
version=version,
file_name=file_name,
content=content,
actor_name=actor_name,
source=source,
)
@@ -115,7 +104,74 @@ class AgentAssetSpreadsheetManager:
         try:
             target_path.relative_to(self.rule_root)
         except ValueError:
-            raise ValueError("规则库文件路径不合法。")
+            raise ValueError("规则库文件路径不合法。") from None
target_path.parent.mkdir(parents=True, exist_ok=True)
target_path.write_bytes(content)
mime_type = mimetypes.guess_type(normalized_name)[0] or SPREADSHEET_MIME_TYPE
return RuleSpreadsheetMeta(
file_name=normalized_name,
storage_key=relative_path.as_posix(),
mime_type=mime_type,
size_bytes=len(content),
checksum=hashlib.sha256(content).hexdigest(),
updated_at=datetime.now(UTC).isoformat(),
updated_by=str(actor_name or "system").strip() or "system",
source=source,
)
def store_rule_library_spreadsheet_snapshot(
self,
*,
library: str,
asset_id: str,
version: str,
file_name: str,
content: bytes,
actor_name: str,
source: str = "rule-library-version",
) -> RuleSpreadsheetMeta:
normalized_library = str(library or "").strip()
if normalized_library not in RULE_LIBRARY_NAMES:
raise ValueError("规则库目录不合法。")
raw_asset_id = str(asset_id or "").strip()
raw_version = str(version or "").strip()
normalized_asset_id = Path(raw_asset_id).name.strip()
normalized_version = Path(raw_version).name.strip()
normalized_name = Path(str(file_name or "").strip()).name.strip()
if (
not normalized_asset_id
or normalized_asset_id in {".", ".."}
or normalized_asset_id != raw_asset_id
):
raise ValueError("规则资产 ID 不合法。")
if (
not normalized_version
or normalized_version in {".", ".."}
or normalized_version != raw_version
):
raise ValueError("规则表版本号不合法。")
if not normalized_name:
raise ValueError("规则表文件名不能为空。")
if not content:
raise ValueError("规则表文件内容不能为空。")
self.ensure_rule_library_dirs()
relative_path = (
Path("rules")
/ normalized_library
/ ".versions"
/ normalized_asset_id
/ normalized_version
/ normalized_name
)
target_path = (SERVER_DIR / relative_path).resolve()
try:
target_path.relative_to(self.rule_root)
except ValueError:
raise ValueError("规则库版本文件路径不合法。") from None
target_path.parent.mkdir(parents=True, exist_ok=True)
target_path.write_bytes(content)
@@ -147,7 +203,7 @@ class AgentAssetSpreadsheetManager:
         try:
             resolved.relative_to(allowed_root)
         except ValueError:
-            raise FileNotFoundError("规则表文件不存在。")
+            raise FileNotFoundError("规则表文件不存在。") from None
         return resolved
@staticmethod
@@ -228,11 +284,46 @@ class AgentAssetSpreadsheetManager:
def build_company_travel_rule_template() -> bytes:
standard_rows = [
["费用分类", "适用场景", "票据要求", "报销标准", "审批要求", "备注"],
["长途交通", "飞机、高铁、火车等跨城出行", "行程单、车票、发票", "据实报销", "超预算需直属领导审批", "优先选择公共交通"],
["住宿费", "出差住宿", "酒店发票、入住清单", "一线城市 650/晚;二线城市 500/晚;其他城市 380/晚", "超标需总监审批", "协议酒店优先"],
["市内交通", "出租车、网约车、地铁、公交", "发票或电子行程单", "150/天", "超限需补充说明", "夜间或无公共交通场景可豁免"],
["餐补", "出差期间日常补助", "无需票据", "120/天", "系统自动核定", "当天往返默认不享受"],
["招待餐费", "客户接待或项目宴请", "餐饮发票、参与人清单", "300/人", "需业务负责人审批", "需关联客户或项目"],
[
"长途交通",
"飞机、高铁、火车等跨城出行",
"行程单、车票、发票",
"据实报销",
"超预算需直属领导审批",
"优先选择公共交通",
],
[
"住宿费",
"出差住宿",
"酒店发票、入住清单",
"一线城市 650/晚;二线城市 500/晚;其他城市 380/晚",
"超标需总监审批",
"协议酒店优先",
],
[
"市内交通",
"出租车、网约车、地铁、公交",
"发票或电子行程单",
"150/天",
"超限需补充说明",
"夜间或无公共交通场景可豁免",
],
[
"餐补",
"出差期间日常补助",
"无需票据",
"120/天",
"系统自动核定",
"当天往返默认不享受",
],
[
"招待餐费",
"客户接待或项目宴请",
"餐饮发票、参与人清单",
"300/人",
"需业务负责人审批",
"需关联客户或项目",
],
]
instruction_rows = [
["字段", "填写说明"],
@@ -306,21 +397,41 @@ def _build_xlsx_bytes(sheets: list[tuple[str, list[list[object]]]]) -> bytes:
def _build_content_types_xml(sheets: list[tuple[str, list[list[object]]]]) -> str:
overrides = [
'<Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>',
'<Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/>',
'<Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>',
'<Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/>',
(
'<Override PartName="/xl/workbook.xml" '
'ContentType="application/vnd.openxmlformats-officedocument.'
'spreadsheetml.sheet.main+xml"/>'
),
(
'<Override PartName="/xl/styles.xml" '
'ContentType="application/vnd.openxmlformats-officedocument.'
'spreadsheetml.styles+xml"/>'
),
(
'<Override PartName="/docProps/core.xml" '
'ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>'
),
(
'<Override PartName="/docProps/app.xml" '
'ContentType="application/vnd.openxmlformats-officedocument.'
'extended-properties+xml"/>'
),
]
overrides.extend(
[
f'<Override PartName="/xl/worksheets/sheet{index}.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>'
(
f'<Override PartName="/xl/worksheets/sheet{index}.xml" '
'ContentType="application/vnd.openxmlformats-officedocument.'
'spreadsheetml.worksheet+xml"/>'
)
for index, _ in enumerate(sheets, start=1)
]
)
return (
'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
'<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
'<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
'<Default Extension="rels" '
'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
'<Default Extension="xml" ContentType="application/xml"/>'
f'{"".join(overrides)}'
"</Types>"
@@ -331,9 +442,15 @@ def _build_root_rels_xml() -> str:
return (
'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
'<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
'<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="xl/workbook.xml"/>'
'<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="docProps/core.xml"/>'
'<Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties" Target="docProps/app.xml"/>'
'<Relationship Id="rId1" '
'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
'relationships/officeDocument" Target="xl/workbook.xml"/>'
'<Relationship Id="rId2" '
'Type="http://schemas.openxmlformats.org/package/2006/relationships/'
'metadata/core-properties" Target="docProps/core.xml"/>'
'<Relationship Id="rId3" '
'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
'relationships/extended-properties" Target="docProps/app.xml"/>'
"</Relationships>"
)
@@ -345,11 +462,16 @@ def _build_app_xml(sheets: list[tuple[str, list[list[object]]]]) -> str:
sheet_count = len(sheets)
return (
'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
'<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties" '
'<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/'
'extended-properties" '
'xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">'
'<Application>Microsoft Excel</Application>'
f"<HeadingPairs><vt:vector size=\"2\" baseType=\"variant\"><vt:variant><vt:lpstr>Worksheets</vt:lpstr></vt:variant><vt:variant><vt:i4>{sheet_count}</vt:i4></vt:variant></vt:vector></HeadingPairs>"
f"<TitlesOfParts><vt:vector size=\"{sheet_count}\" baseType=\"lpstr\">{titles}</vt:vector></TitlesOfParts>"
'<HeadingPairs><vt:vector size="2" baseType="variant">'
"<vt:variant><vt:lpstr>Worksheets</vt:lpstr></vt:variant>"
f"<vt:variant><vt:i4>{sheet_count}</vt:i4></vt:variant>"
"</vt:vector></HeadingPairs>"
f'<TitlesOfParts><vt:vector size="{sheet_count}" baseType="lpstr">'
f"{titles}</vt:vector></TitlesOfParts>"
"</Properties>"
)
@@ -357,7 +479,8 @@ def _build_app_xml(sheets: list[tuple[str, list[list[object]]]]) -> str:
def _build_core_xml(created_at: str) -> str:
return (
'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
'<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
'<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/'
'2006/metadata/core-properties" '
'xmlns:dc="http://purl.org/dc/elements/1.1/" '
'xmlns:dcterms="http://purl.org/dc/terms/" '
'xmlns:dcmitype="http://purl.org/dc/dcmitype/" '
@@ -390,7 +513,11 @@ def _build_workbook_xml(sheets: list[tuple[str, list[list[object]]]]) -> str:
def _build_workbook_rels_xml(sheets: list[tuple[str, list[list[object]]]]) -> str:
relationships = "".join(
[
f'<Relationship Id="rId{index}" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet{index}.xml"/>'
(
f'<Relationship Id="rId{index}" '
'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
f'relationships/worksheet" Target="worksheets/sheet{index}.xml"/>'
)
for index, _ in enumerate(sheets, start=1)
]
)
@@ -412,10 +539,15 @@ def _build_styles_xml() -> str:
'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
'<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
'<fonts count="1"><font><sz val="11"/><name val="Calibri"/></font></fonts>'
'<fills count="2"><fill><patternFill patternType="none"/></fill><fill><patternFill patternType="gray125"/></fill></fills>'
'<fills count="2"><fill><patternFill patternType="none"/></fill>'
'<fill><patternFill patternType="gray125"/></fill></fills>'
'<borders count="1"><border><left/><right/><top/><bottom/><diagonal/></border></borders>'
'<cellStyleXfs count="1"><xf numFmtId="0" fontId="0" fillId="0" borderId="0"/></cellStyleXfs>'
'<cellXfs count="1"><xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/></cellXfs>'
'<cellStyleXfs count="1">'
'<xf numFmtId="0" fontId="0" fillId="0" borderId="0"/>'
"</cellStyleXfs>"
'<cellXfs count="1">'
'<xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/>'
"</cellXfs>"
'<cellStyles count="1"><cellStyle name="Normal" xfId="0" builtinId="0"/></cellStyles>'
'</styleSheet>'
)


@@ -0,0 +1,298 @@
from __future__ import annotations
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
from app.core.agent_enums import AgentAssetType
from app.models.agent_asset import AgentAsset
from app.schemas.agent_asset import (
AgentAssetSpreadsheetDiffCellRead,
AgentAssetSpreadsheetDiffSheetRead,
)
from app.services.agent_asset_spreadsheet import (
COMPANY_COMMUNICATION_EXPENSE_RULE_CODE,
COMPANY_COMMUNICATION_EXPENSE_RULE_FILENAME,
COMPANY_TRAVEL_EXPENSE_RULE_CODE,
COMPANY_TRAVEL_EXPENSE_RULE_FILENAME,
FINANCE_RULES_LIBRARY,
RULE_LIBRARY_NAMES,
SPREADSHEET_MIME_TYPE,
AgentAssetSpreadsheetManager,
RuleSpreadsheetMeta,
)
class AgentAssetSpreadsheetHelperMixin:
def _require_spreadsheet_rule(self, asset_id: str) -> AgentAsset:
asset = self.repository.get(asset_id)
if asset is None:
raise LookupError("Asset not found")
if asset.asset_type != AgentAssetType.RULE.value:
raise ValueError("仅规则资产支持 Excel 规则表。")
detail_mode = str((asset.config_json or {}).get("detail_mode") or "").strip().lower()
if detail_mode != "spreadsheet":
raise ValueError("当前规则未配置 Excel 规则表。")
return asset
def _resolve_spreadsheet_version_meta(
self,
asset: AgentAsset,
*,
version: str | None = None,
) -> tuple[str, RuleSpreadsheetMeta]:
resolved_version = str(version or self._resolve_working_version(asset) or "").strip()
if not resolved_version:
raise ValueError("当前规则尚未配置表格版本。")
version_row = self.repository.get_version(asset.id, resolved_version)
if version_row is None:
raise LookupError(f"版本 {resolved_version} 不存在")
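# The snapshot stored in the version record is the immutable source of truth.
# The workbook under `/rules` is only the current editable copy, so later
# writes must not leak back into the content of an existing version.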
metadata = self.spreadsheet_manager.parse_version_markdown(str(version_row.content or ""))
if metadata is None and self._resolve_working_version(asset) == resolved_version:
metadata = self._read_current_rule_document_meta(asset)
if metadata is None:
raise FileNotFoundError("规则表版本快照不存在。")
return resolved_version, metadata
def _resolve_current_spreadsheet_meta(
self,
asset: AgentAsset,
) -> tuple[str, RuleSpreadsheetMeta]:
config_json = dict(asset.config_json or {})
current_meta = self._read_current_rule_document_meta(asset)
file_name = (
current_meta.file_name
if current_meta is not None and current_meta.file_name
else self._resolve_default_spreadsheet_file_name(asset)
)
library = self._resolve_spreadsheet_rule_library(asset)
storage_key = (Path("rules") / library / file_name).as_posix()
file_path = self.spreadsheet_manager.resolve_storage_path(storage_key)
if not file_path.exists():
content: bytes | None = None
if current_meta is not None and current_meta.storage_key:
try:
legacy_path = self.spreadsheet_manager.resolve_storage_path(
current_meta.storage_key
)
except FileNotFoundError:
legacy_path = None
if legacy_path is not None and legacy_path.exists():
content = legacy_path.read_bytes()
if content is None:
content = AgentAssetSpreadsheetManager.build_blank_rule_workbook(
Path(file_name).stem or "规则表"
)
meta = self.spreadsheet_manager.store_rule_library_spreadsheet(
library=library,
file_name=file_name,
content=content,
actor_name=(
current_meta.updated_by
if current_meta is not None and current_meta.updated_by
else "system"
),
source="current-rule",
)
else:
content = file_path.read_bytes()
meta = RuleSpreadsheetMeta(
file_name=file_name,
storage_key=storage_key,
mime_type=(
current_meta.mime_type
if current_meta is not None and current_meta.mime_type
else SPREADSHEET_MIME_TYPE
),
size_bytes=file_path.stat().st_size,
checksum=self._hash_bytes(content),
updated_at=datetime.fromtimestamp(file_path.stat().st_mtime, UTC).isoformat(),
updated_by=(
current_meta.updated_by
if current_meta is not None and current_meta.updated_by
else "system"
),
source=(
current_meta.source
if current_meta is not None and current_meta.source
else "current-rule"
),
)
expected_document = {
**self.spreadsheet_manager.build_rule_document_config(
meta,
asset_version="current",
),
"storage_key": meta.storage_key,
}
if config_json.get("rule_document") != expected_document:
config_json["detail_mode"] = "spreadsheet"
config_json["tag"] = str(config_json.get("tag") or "财务规则").strip() or "财务规则"
config_json["rule_library"] = library
config_json["rule_document"] = expected_document
asset.config_json = config_json
self.repository.save_asset(asset)
return "current", meta
def _store_current_rule_spreadsheet(
self,
asset: AgentAsset,
*,
file_name: str,
content: bytes,
actor: str,
source: str,
) -> RuleSpreadsheetMeta:
library = self._resolve_spreadsheet_rule_library(asset)
metadata = self.spreadsheet_manager.store_rule_library_spreadsheet(
library=library,
file_name=file_name,
content=content,
actor_name=actor,
source=source,
)
config_json = dict(asset.config_json or {})
config_json["detail_mode"] = "spreadsheet"
config_json["tag"] = str(config_json.get("tag") or "财务规则").strip() or "财务规则"
config_json["rule_library"] = library
config_json["rule_document"] = {
**self.spreadsheet_manager.build_rule_document_config(
metadata,
asset_version="current",
),
"storage_key": metadata.storage_key,
}
asset.config_json = config_json
self.repository.save_asset(asset)
return metadata
@staticmethod
def _resolve_spreadsheet_rule_library(asset: AgentAsset) -> str:
config_json = dict(asset.config_json or {})
library = str(config_json.get("rule_library") or FINANCE_RULES_LIBRARY).strip()
if library not in RULE_LIBRARY_NAMES:
return FINANCE_RULES_LIBRARY
return library
@staticmethod
def _resolve_default_spreadsheet_file_name(asset: AgentAsset) -> str:
if asset.code == COMPANY_TRAVEL_EXPENSE_RULE_CODE:
return COMPANY_TRAVEL_EXPENSE_RULE_FILENAME
if asset.code == COMPANY_COMMUNICATION_EXPENSE_RULE_CODE:
return COMPANY_COMMUNICATION_EXPENSE_RULE_FILENAME
fallback = Path(str(asset.name or "规则表").strip()).name
return fallback if fallback.lower().endswith(".xlsx") else f"{fallback}.xlsx"
def _load_spreadsheet_for_compare(self, metadata: RuleSpreadsheetMeta):
from io import BytesIO
from openpyxl import load_workbook
file_path = self.spreadsheet_manager.resolve_storage_path(metadata.storage_key)
if not file_path.exists():
raise FileNotFoundError(metadata.file_name)
return load_workbook(BytesIO(file_path.read_bytes()), read_only=False, data_only=False)
def _collect_workbook_changes_from_content(
self,
base_metadata: RuleSpreadsheetMeta,
target_content: bytes,
) -> tuple[list[AgentAssetSpreadsheetDiffSheetRead], list[AgentAssetSpreadsheetDiffCellRead]]:
from io import BytesIO
from openpyxl import load_workbook
base_workbook = self._load_spreadsheet_for_compare(base_metadata)
target_workbook = load_workbook(BytesIO(target_content), read_only=False, data_only=False)
return self._collect_workbook_changes(base_workbook, target_workbook)
def _collect_workbook_changes(
self, base_workbook, target_workbook
) -> tuple[list[AgentAssetSpreadsheetDiffSheetRead], list[AgentAssetSpreadsheetDiffCellRead]]:
base_sheet_names = set(base_workbook.sheetnames)
target_sheet_names = set(target_workbook.sheetnames)
sheet_changes: list[AgentAssetSpreadsheetDiffSheetRead] = []
for sheet_name in sorted(target_sheet_names - base_sheet_names):
sheet_changes.append(
AgentAssetSpreadsheetDiffSheetRead(sheet_name=sheet_name, change_type="added")
)
for sheet_name in sorted(base_sheet_names - target_sheet_names):
sheet_changes.append(
AgentAssetSpreadsheetDiffSheetRead(sheet_name=sheet_name, change_type="removed")
)
cell_changes: list[AgentAssetSpreadsheetDiffCellRead] = []
for sheet_name in sorted(base_sheet_names & target_sheet_names):
base_sheet = base_workbook[sheet_name]
target_sheet = target_workbook[sheet_name]
max_row = max(base_sheet.max_row, target_sheet.max_row)
max_column = max(base_sheet.max_column, target_sheet.max_column)
for row_index in range(1, max_row + 1):
for column_index in range(1, max_column + 1):
before_value = base_sheet.cell(row=row_index, column=column_index).value
after_value = target_sheet.cell(row=row_index, column=column_index).value
if before_value == after_value:
continue
if before_value in (None, ""):
change_type = "added"
elif after_value in (None, ""):
change_type = "removed"
else:
change_type = "modified"
cell_changes.append(
AgentAssetSpreadsheetDiffCellRead(
sheet_name=sheet_name,
cell=target_sheet.cell(row=row_index, column=column_index).coordinate,
change_type=change_type,
before_value=before_value,
after_value=after_value,
)
)
for sheet_name in sorted({item.sheet_name for item in cell_changes}):
sheet_changes.append(
AgentAssetSpreadsheetDiffSheetRead(sheet_name=sheet_name, change_type="modified")
)
return sheet_changes, cell_changes
@staticmethod
def _count_changed_sheets(
sheet_changes: list[AgentAssetSpreadsheetDiffSheetRead],
cell_changes: list[AgentAssetSpreadsheetDiffCellRead],
) -> int:
return len(
{item.sheet_name for item in sheet_changes}
| {item.sheet_name for item in cell_changes}
)
@staticmethod
def _build_spreadsheet_change_summary(
sheet_changes: list[AgentAssetSpreadsheetDiffSheetRead],
cell_changes: list[AgentAssetSpreadsheetDiffCellRead],
) -> str:
sheet_names = sorted(
{item.sheet_name for item in sheet_changes}
| {item.sheet_name for item in cell_changes}
)
if not sheet_names:
return "文件内容已保存,未发现单元格级差异。"
preview = "、".join(sheet_names[:3])
if len(sheet_names) > 3:
preview = f"{preview} 等"
sheet_text = f"涉及 {len(sheet_names)} 个工作表({preview})"
if cell_changes:
return f"{sheet_text},共 {len(cell_changes)} 处单元格改动。"
return f"{sheet_text},工作表结构发生变化。"
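
Review note: the cell-by-cell comparison in `_collect_workbook_changes` can be illustrated without openpyxl. The sketch below assumes each sheet is a plain list of rows; `Grid`, `cell_value`, `classify_change`, and `diff_grids` are hypothetical names used only for illustration. The classification rules (empty before means "added", empty after means "removed", otherwise "modified") follow the diff above.

```python
from __future__ import annotations

Grid = list[list[object]]


def cell_value(grid: Grid, row: int, column: int) -> object:
    """Return the value at a 0-indexed position, or None when out of range."""
    if row < len(grid) and column < len(grid[row]):
        return grid[row][column]
    return None


def classify_change(before: object, after: object) -> str | None:
    """Mirror the added/removed/modified rules used for workbook cells."""
    if before == after:
        return None
    if before in (None, ""):
        return "added"
    if after in (None, ""):
        return "removed"
    return "modified"


def diff_grids(base: Grid, target: Grid) -> list[tuple[int, int, str]]:
    """Return (row, column, change_type) for each differing cell, 1-indexed."""
    max_row = max(len(base), len(target))
    max_column = max((len(row) for row in [*base, *target]), default=0)
    changes: list[tuple[int, int, str]] = []
    for row in range(max_row):
        for column in range(max_column):
            change = classify_change(
                cell_value(base, row, column),
                cell_value(target, row, column),
            )
            if change is not None:
                changes.append((row + 1, column + 1, change))
    return changes


if __name__ == "__main__":
    print(diff_grids([["a", "b"]], [["a", "x"], ["new"]]))
```

Scanning the union of both extents, as the original does with `max_row` and `max_column`, is what lets trailing rows that exist in only one workbook show up as additions or removals.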


@@ -0,0 +1,132 @@
from __future__ import annotations
from app.core.agent_enums import AgentReviewStatus
from app.schemas.agent_asset import (
AgentAssetSpreadsheetChangeRecordRead,
AgentAssetSpreadsheetDiffCellRead,
AgentAssetSpreadsheetDiffSheetRead,
AgentAssetVersionTimelineItemRead,
)
class AgentAssetTimelineMixin:
def list_version_timeline(self, asset_id: str) -> list[AgentAssetVersionTimelineItemRead]:
self._ensure_ready()
asset = self.repository.get(asset_id)
if asset is None:
raise LookupError("Asset not found")
events: list[AgentAssetVersionTimelineItemRead] = []
versions = self.repository.list_versions(asset_id)
for version in versions:
source_version = self._extract_restore_source_version(version.change_note)
events.append(
AgentAssetVersionTimelineItemRead(
event_type="restored" if source_version else "created",
version=version.version,
actor=version.created_by,
event_time=version.created_at,
title="恢复生成工作稿" if source_version else "创建工作版本",
description=version.change_note or "生成新版本",
note=version.change_note,
source_version=source_version,
)
)
for review in self.repository.list_reviews(asset_id):
event_type = {
AgentReviewStatus.PENDING.value: "submitted",
AgentReviewStatus.APPROVED.value: "approved",
AgentReviewStatus.REJECTED.value: "rejected",
}.get(review.review_status, "reviewed")
title = {
"submitted": "提交审核",
"approved": "审核通过",
"rejected": "审核驳回",
}.get(event_type, "审核处理")
events.append(
AgentAssetVersionTimelineItemRead(
event_type=event_type,
version=review.version,
actor=review.reviewer,
event_time=review.reviewed_at or review.created_at,
title=title,
description=review.review_note or "",
note=review.review_note,
)
)
audit_logs = self.audit_service.repository.list(
resource_type=asset.asset_type,
resource_id=asset.id,
limit=200,
)
for log in audit_logs:
if log.action != "activate_agent_asset":
continue
after_json = log.after_json or {}
version = str(
after_json.get("published_version")
or after_json.get("current_version")
or ""
).strip()
if not version:
continue
events.append(
AgentAssetVersionTimelineItemRead(
event_type="published",
version=version,
actor=log.actor,
event_time=log.created_at,
title="正式上线",
description="该版本已切换为线上正式版本。",
)
)
return sorted(events, key=lambda item: item.event_time)
def list_spreadsheet_change_records(
self,
asset_id: str,
*,
limit: int = 30,
) -> list[AgentAssetSpreadsheetChangeRecordRead]:
self._ensure_ready()
asset = self._require_spreadsheet_rule(asset_id)
logs = self.audit_service.repository.list(
resource_type=asset.asset_type,
resource_id=asset.id,
action="edit_rule_spreadsheet",
limit=min(max(limit, 1), 30),
)
return [
AgentAssetSpreadsheetChangeRecordRead(
id=log.id,
actor=log.actor,
changed_at=log.created_at,
summary=str((log.after_json or {}).get("summary") or "表格内容已保存。"),
sheet_changes=[
AgentAssetSpreadsheetDiffSheetRead.model_validate(item)
for item in ((log.after_json or {}).get("sheet_changes") or [])
],
cell_changes=[
AgentAssetSpreadsheetDiffCellRead.model_validate(item)
for item in ((log.after_json or {}).get("cell_changes") or [])
],
changed_sheet_count=int((log.after_json or {}).get("changed_sheet_count") or 0),
changed_cell_count=int((log.after_json or {}).get("changed_cell_count") or 0),
)
for log in logs
]
@staticmethod
def _extract_restore_source_version(change_note: str | None) -> str | None:
normalized = str(change_note or "").strip()
prefix = "基于历史版本 "
suffix = " 恢复生成工作稿"
if not normalized.startswith(prefix) or suffix not in normalized:
return None
return normalized.removeprefix(prefix).split(suffix, 1)[0].strip() or None
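
Review note: `_extract_restore_source_version` depends on the exact wording of the change note. The standalone copy below shows which inputs it accepts; the sample notes and the version string `v1.2.0` are made up for illustration, and the real note format written elsewhere in the service is assumed to match the prefix and suffix shown in the diff.

```python
from __future__ import annotations

RESTORE_PREFIX = "基于历史版本 "
RESTORE_SUFFIX = " 恢复生成工作稿"


def extract_restore_source_version(change_note: str | None) -> str | None:
    """Return the source version named in a restore note, or None."""
    normalized = str(change_note or "").strip()
    if not normalized.startswith(RESTORE_PREFIX) or RESTORE_SUFFIX not in normalized:
        return None
    # Keep only the text between the prefix and the first suffix occurrence.
    middle = normalized.removeprefix(RESTORE_PREFIX).split(RESTORE_SUFFIX, 1)[0]
    return middle.strip() or None


if __name__ == "__main__":
    print(extract_restore_source_version("基于历史版本 v1.2.0 恢复生成工作稿"))
    print(extract_restore_source_version("生成新版本"))
```

Because the timeline event type is derived from free text, any change to the wording of restore notes would silently turn "restored" events into "created" ones; a structured field would avoid that coupling.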

File diff suppressed because it is too large


@@ -18,6 +18,50 @@ STATEFUL_CONTEXT_KEYS = (
"attachment_count",
"ocr_summary",
"ocr_documents",
"review_form_values",
"business_time_context",
)
REVIEW_FLOW_CONTEXT_KEYS = {
"draft_claim_id",
"draft_claim_no",
"draft_status",
"request_context",
"attachment_names",
"attachment_count",
"ocr_summary",
"ocr_documents",
"review_form_values",
"business_time_context",
}
REVIEW_FLOW_CONTINUATION_KEYWORDS = (
"补充",
"继续",
"继续上传",
"当前",
"这张",
"这个",
"该单据",
"现有",
"已有",
"关联",
"合并",
"修改",
"更正",
"改成",
"调整",
"下一步",
"保存草稿",
)
NEW_EXPENSE_PROMPT_KEYWORDS = (
"申请报销",
"我要报销",
"我想报销",
"帮我报销",
"发起报销",
"提交报销",
"生成报销",
"创建报销",
"新建报销",
)
DEFAULT_CONVERSATION_RETENTION_DAYS = 3
@@ -39,6 +83,7 @@ class AgentConversationService:
normalized_id = str(conversation_id or "").strip()
normalized_user_id = str(user_id or "").strip() or None
incoming_session_type = str(context_json.get("session_type") or "").strip() or "expense"
incoming_draft_claim_id = self._resolve_draft_claim_id(context_json)
conversation = self.get_conversation(normalized_id) if normalized_id else None
if conversation is not None and conversation.user_id != normalized_user_id:
normalized_id = ""
@@ -56,6 +101,7 @@ class AgentConversationService:
source=source,
entry_source=str(context_json.get("entry_source") or "").strip() or None,
title=self._resolve_title(context_json),
draft_claim_id=incoming_draft_claim_id or None,
state_json=self._extract_state_json(context_json),
)
self.db.add(conversation)
@@ -69,6 +115,8 @@ class AgentConversationService:
conversation.entry_source = str(context_json.get("entry_source") or "").strip() or None
if not conversation.title:
conversation.title = self._resolve_title(context_json)
if incoming_draft_claim_id:
conversation.draft_claim_id = incoming_draft_claim_id
conversation.state_json = self._merge_state_json(
conversation.state_json,
self._extract_state_json(context_json),
@@ -86,7 +134,11 @@ class AgentConversationService:
resolved_retention_days = retention_days or self._resolve_retention_days()
cutoff = datetime.now(UTC) - timedelta(days=max(1, resolved_retention_days))
stmt = select(AgentConversation).where(AgentConversation.updated_at < cutoff)
expired_conversations = list(self.db.scalars(stmt).all())
expired_conversations = [
conversation
for conversation in self.db.scalars(stmt).all()
if not self._is_saved_conversation(conversation)
]
if not expired_conversations:
return 0
@@ -96,6 +148,13 @@ class AgentConversationService:
self.db.commit()
return len(expired_conversations)
@staticmethod
def _is_saved_conversation(conversation: AgentConversation) -> bool:
if str(conversation.draft_claim_id or "").strip():
return True
state_json = dict(conversation.state_json or {})
return bool(str(state_json.get("draft_claim_id") or "").strip())
def _resolve_retention_days(self) -> int:
try:
settings_row, _ = SettingsService(self.db).ensure_settings_ready()
@@ -178,10 +237,18 @@ class AgentConversationService:
*,
conversation: AgentConversation,
context_json: dict[str, Any],
message: str | None = None,
history_limit: int = 8,
) -> dict[str, Any]:
merged = dict(context_json or {})
state_json = dict(conversation.state_json or {})
should_hydrate_review_flow = self._should_hydrate_review_flow_context(
context_json=merged,
message=message,
)
if not should_hydrate_review_flow:
for key in REVIEW_FLOW_CONTEXT_KEYS:
merged.pop(key, None)
merged["conversation_id"] = conversation.conversation_id
merged["conversation_history"] = self.list_message_history(
@@ -192,16 +259,58 @@ class AgentConversationService:
merged.setdefault("conversation_scenario", conversation.last_scenario)
if conversation.last_intent:
merged.setdefault("conversation_intent", conversation.last_intent)
if conversation.draft_claim_id and not str(merged.get("draft_claim_id") or "").strip():
if (
should_hydrate_review_flow
and conversation.draft_claim_id
and not str(merged.get("draft_claim_id") or "").strip()
):
merged["draft_claim_id"] = conversation.draft_claim_id
merged["conversation_state"] = state_json
for key in STATEFUL_CONTEXT_KEYS:
if key in REVIEW_FLOW_CONTEXT_KEYS and not should_hydrate_review_flow:
continue
if self._is_empty_value(merged.get(key)) and not self._is_empty_value(state_json.get(key)):
merged[key] = state_json.get(key)
return merged
@staticmethod
def _should_hydrate_review_flow_context(
*,
context_json: dict[str, Any],
message: str | None,
) -> bool:
if isinstance(context_json.get("expense_scene_selection"), dict):
return True
if AgentConversationService._resolve_draft_claim_id(context_json):
compact_message = str(message or "").replace(" ", "")
if compact_message and any(keyword in compact_message for keyword in NEW_EXPENSE_PROMPT_KEYWORDS):
return False
return True
if str(context_json.get("review_action") or "").strip():
return True
if str(context_json.get("entry_source") or "").strip() == "detail":
return True
if not AgentConversationService._is_empty_value(context_json.get("attachment_names")):
return True
if not AgentConversationService._is_empty_value(context_json.get("ocr_documents")):
return True
if str(context_json.get("ocr_summary") or "").strip():
return True
try:
if int(context_json.get("attachment_count") or 0) > 0:
return True
except (TypeError, ValueError):
pass
compact_message = str(message or "").replace(" ", "")
if not compact_message:
return False
if any(keyword in compact_message for keyword in NEW_EXPENSE_PROMPT_KEYWORDS):
return False
return any(keyword in compact_message for keyword in REVIEW_FLOW_CONTINUATION_KEYWORDS)
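
Review note: the precedence in `_should_hydrate_review_flow_context` is easier to see in a reduced form. The sketch below keeps three of the signals (an existing draft, attachments, and message keywords) and omits the others (`expense_scene_selection`, `review_action`, `entry_source`, OCR fields). The function name `should_hydrate`, its parameters, and the shortened keyword tuples are assumptions for illustration only.

```python
from __future__ import annotations

# Shortened subsets of the keyword tuples defined in the diff.
NEW_EXPENSE_KEYWORDS = ("申请报销", "我要报销", "新建报销")
CONTINUATION_KEYWORDS = ("补充", "继续", "修改", "保存草稿")


def should_hydrate(
    message: str | None,
    *,
    has_draft: bool = False,
    attachment_count: int = 0,
) -> bool:
    """Decide whether stored review-flow state should be merged back in."""
    compact = str(message or "").replace(" ", "")
    starts_new = any(keyword in compact for keyword in NEW_EXPENSE_KEYWORDS)
    if has_draft:
        # A draft keeps its context unless the user clearly starts a new claim.
        return not starts_new
    if attachment_count > 0:
        return True
    if not compact or starts_new:
        return False
    return any(keyword in compact for keyword in CONTINUATION_KEYWORDS)


if __name__ == "__main__":
    print(should_hydrate("继续上传"))
    print(should_hydrate("我要报销", has_draft=True))
```

In both the sketch and the original, a new-expense phrase wins over a continuation phrase when a message contains both, which keeps stale draft state out of a freshly started claim.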
def append_message(
self,
*,
@@ -354,6 +463,38 @@ class AgentConversationService:
self.db.commit()
return len(conversations)
def delete_conversations_for_draft_claim(
self,
*,
claim_id: str | None,
source: str | None = "user_message",
session_type: str | None = "expense",
) -> int:
normalized_claim_id = str(claim_id or "").strip()
if not normalized_claim_id:
return 0
stmt = select(AgentConversation).where(AgentConversation.draft_claim_id == normalized_claim_id)
if source:
stmt = stmt.where(AgentConversation.source == source)
conversations = list(self.db.scalars(stmt).all())
normalized_session_type = str(session_type or "").strip()
if normalized_session_type:
conversations = [
conversation
for conversation in conversations
if (str((conversation.state_json or {}).get("session_type") or "").strip() or "expense")
== normalized_session_type
]
if not conversations:
return 0
for conversation in conversations:
self.db.delete(conversation)
self.db.commit()
return len(conversations)
def delete_conversation(
self,
*,
@@ -478,11 +619,28 @@ class AgentConversationService:
continue
state_json[key] = value
draft_claim_id = str(context_json.get("draft_claim_id") or "").strip()
draft_claim_id = AgentConversationService._resolve_draft_claim_id(context_json)
if draft_claim_id:
state_json["draft_claim_id"] = draft_claim_id
return state_json
@staticmethod
def _resolve_draft_claim_id(context_json: dict[str, Any]) -> str:
draft_claim_id = str((context_json or {}).get("draft_claim_id") or "").strip()
if draft_claim_id:
return draft_claim_id
request_context = (context_json or {}).get("request_context")
if isinstance(request_context, dict):
return str(
request_context.get("claim_id")
or request_context.get("claimId")
or request_context.get("draft_claim_id")
or request_context.get("draftClaimId")
or ""
).strip()
return ""
@staticmethod
def _merge_state_json(
current_state: dict[str, Any] | None,

File diff suppressed because it is too large


@@ -0,0 +1,322 @@
from __future__ import annotations
import hashlib
import json
from datetime import UTC, date, datetime
from decimal import Decimal
from pathlib import Path
from sqlalchemy import inspect, select, text
from app.core.agent_enums import (
AgentAssetContentType,
AgentAssetDomain,
AgentAssetStatus,
AgentAssetType,
AgentName,
AgentPermissionLevel,
AgentReviewStatus,
AgentRunSource,
AgentRunStatus,
AgentToolType,
)
from app.models.agent_asset import AgentAsset, AgentAssetReview, AgentAssetVersion
from app.models.agent_run import AgentRun, AgentToolCall, SemanticParseLog
from app.models.audit_log import AuditLog
from app.models.financial_record import (
AccountsPayableRecord,
AccountsReceivableRecord,
ExpenseClaim,
ExpenseClaimItem,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import (
AgentAssetSpreadsheetManager,
COMPANY_COMMUNICATION_EXPENSE_RULE_CODE,
COMPANY_COMMUNICATION_EXPENSE_RULE_FILENAME,
COMPANY_TRAVEL_EXPENSE_RULE_CODE,
COMPANY_TRAVEL_EXPENSE_RULE_FILENAME,
FINANCE_RULES_LIBRARY,
RISK_RULES_LIBRARY,
)
from app.services.expense_rule_runtime import (
build_scene_submission_standard_markdown,
build_travel_risk_control_standard_markdown,
)
from app.services.agent_foundation_constants import (
ATTACHMENT_RULE_ASSET_CODE,
ATTACHMENT_RULE_RUNTIME_CONFIG,
COMPANY_COMMUNICATION_RULE_SCENARIO_JSON,
COMPANY_COMMUNICATION_RULE_VERSION,
COMPANY_TRAVEL_RULE_SCENARIO_JSON,
COMPANY_TRAVEL_RULE_VERSION,
DEMO_EXPENSE_CLAIM_SIGNATURES,
DEMO_PAYABLE_SIGNATURES,
DEMO_RECEIVABLE_SIGNATURES,
LEGACY_RULE_CODES,
PLATFORM_DESTINATION_LOCATION_RULE_FILENAME,
)
from app.core.logging import get_logger
logger = get_logger("app.services.agent_foundation")
class AgentFoundationAssetHelperMixin:
def _create_seed_asset(
self,
*,
asset_type: str,
code: str,
name: str,
description: str,
domain: str,
scenario_json: list[str],
owner: str,
reviewer: str,
status: str,
current_version: str,
config_json: dict[str, object],
) -> AgentAsset:
asset = AgentAsset(
asset_type=asset_type,
code=code,
name=name,
description=description,
domain=domain,
scenario_json=scenario_json,
owner=owner,
reviewer=reviewer,
status=status,
current_version=current_version,
published_version=current_version if status == AgentAssetStatus.ACTIVE.value else None,
working_version=current_version,
config_json=config_json,
)
self.db.add(asset)
self.db.flush()
return asset
def _ensure_asset_version(
self,
asset: AgentAsset,
*,
version: str,
content: str,
content_type: str,
change_note: str,
created_by: str,
) -> None:
existing = self.db.scalar(
select(AgentAssetVersion).where(
AgentAssetVersion.asset_id == asset.id,
AgentAssetVersion.version == version,
)
)
if existing is not None:
return
self.db.add(
AgentAssetVersion(
asset_id=asset.id,
version=version,
content=content,
content_type=content_type,
change_note=change_note,
created_by=created_by,
)
)
def _ensure_asset_review(
self,
asset: AgentAsset,
*,
version: str,
reviewer: str,
review_status: str,
review_note: str,
reviewed_at: datetime | None,
) -> None:
existing = self.db.scalar(
select(AgentAssetReview).where(
AgentAssetReview.asset_id == asset.id,
AgentAssetReview.version == version,
AgentAssetReview.review_status == review_status,
)
)
if existing is not None:
return
self.db.add(
AgentAssetReview(
asset_id=asset.id,
version=version,
reviewer=reviewer,
review_status=review_status,
review_note=review_note,
reviewed_at=reviewed_at,
)
)
def _remove_legacy_rule_assets(self) -> None:
assets = list(
self.db.scalars(
select(AgentAsset).where(AgentAsset.code.in_(LEGACY_RULE_CODES))
).all()
)
for asset in assets:
self.db.delete(asset)
obsolete_logs = list(
self.db.scalars(
select(AuditLog).where(AuditLog.resource_id.in_(LEGACY_RULE_CODES))
).all()
)
for log in obsolete_logs:
self.db.delete(log)
def _ensure_agent_asset_schema(self) -> None:
bind = self.db.get_bind()
inspector = inspect(bind)
if "agent_assets" not in inspector.get_table_names():
return
column_names = {column["name"] for column in inspector.get_columns("agent_assets")}
migration_statements: list[str] = []
if "published_version" not in column_names:
migration_statements.append("ALTER TABLE agent_assets ADD COLUMN published_version VARCHAR(30)")
if "working_version" not in column_names:
migration_statements.append("ALTER TABLE agent_assets ADD COLUMN working_version VARCHAR(30)")
for statement in migration_statements:
self.db.execute(text(statement))
self.db.execute(
text(
"UPDATE agent_assets "
"SET working_version = COALESCE(working_version, current_version), "
"published_version = CASE "
"WHEN published_version IS NOT NULL THEN published_version "
"WHEN status = 'active' THEN current_version "
"ELSE published_version END"
)
)
if migration_statements:
self.db.commit()
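
Review note: `_ensure_agent_asset_schema` adds missing columns and then backfills them. The sketch below reproduces that idea with the standard-library `sqlite3` module instead of SQLAlchemy's inspector, so it is a swapped-in technique rather than the code in this change. The helper `ensure_columns` and the sample rows are hypothetical; the backfill statement is the one from the diff.

```python
import sqlite3

BACKFILL_SQL = (
    "UPDATE agent_assets "
    "SET working_version = COALESCE(working_version, current_version), "
    "published_version = CASE "
    "WHEN published_version IS NOT NULL THEN published_version "
    "WHEN status = 'active' THEN current_version "
    "ELSE published_version END"
)


def ensure_columns(
    conn: sqlite3.Connection, table: str, columns: dict[str, str]
) -> list[str]:
    """Add missing columns and return the names that were added.

    Table and column names are interpolated into SQL, so they must come
    from trusted constants, never from user input.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added: list[str] = []
    for name, ddl_type in columns.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {ddl_type}")
            added.append(name)
    return added


def run_demo() -> tuple[list[str], list[str], list[tuple]]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE agent_assets "
        "(id INTEGER PRIMARY KEY, status TEXT, current_version TEXT)"
    )
    conn.execute(
        "INSERT INTO agent_assets (status, current_version) "
        "VALUES ('active', 'v1'), ('draft', 'v2')"
    )
    wanted = {"published_version": "VARCHAR(30)", "working_version": "VARCHAR(30)"}
    first = ensure_columns(conn, "agent_assets", wanted)
    second = ensure_columns(conn, "agent_assets", wanted)  # second run is a no-op
    conn.execute(BACKFILL_SQL)
    rows = conn.execute(
        "SELECT status, published_version, working_version "
        "FROM agent_assets ORDER BY id"
    ).fetchall()
    conn.close()
    return first, second, rows


if __name__ == "__main__":
    print(run_demo())
```

One point to confirm against the original source, since indentation is lost in this view: the `UPDATE` backfill appears to run on every call, while `commit()` appears to run only when a column was added.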

View File

@@ -0,0 +1,599 @@
from __future__ import annotations

import hashlib
import json
from datetime import UTC, date, datetime
from decimal import Decimal
from pathlib import Path

from sqlalchemy import inspect, select, text

from app.core.agent_enums import (
    AgentAssetContentType,
    AgentAssetDomain,
    AgentAssetStatus,
    AgentAssetType,
    AgentName,
    AgentPermissionLevel,
    AgentReviewStatus,
    AgentRunSource,
    AgentRunStatus,
    AgentToolType,
)
from app.models.agent_asset import AgentAsset, AgentAssetReview, AgentAssetVersion
from app.models.agent_run import AgentRun, AgentToolCall, SemanticParseLog
from app.models.audit_log import AuditLog
from app.models.financial_record import (
    AccountsPayableRecord,
    AccountsReceivableRecord,
    ExpenseClaim,
    ExpenseClaimItem,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import (
    AgentAssetSpreadsheetManager,
    COMPANY_COMMUNICATION_EXPENSE_RULE_CODE,
    COMPANY_COMMUNICATION_EXPENSE_RULE_FILENAME,
    COMPANY_TRAVEL_EXPENSE_RULE_CODE,
    COMPANY_TRAVEL_EXPENSE_RULE_FILENAME,
    FINANCE_RULES_LIBRARY,
    RISK_RULES_LIBRARY,
)
from app.services.expense_rule_runtime import (
    build_scene_submission_standard_markdown,
    build_travel_risk_control_standard_markdown,
)
from app.services.agent_foundation_constants import (
    ATTACHMENT_RULE_ASSET_CODE,
    ATTACHMENT_RULE_RUNTIME_CONFIG,
    COMPANY_COMMUNICATION_RULE_SCENARIO_JSON,
    COMPANY_COMMUNICATION_RULE_VERSION,
    COMPANY_TRAVEL_RULE_SCENARIO_JSON,
    COMPANY_TRAVEL_RULE_VERSION,
    DEMO_EXPENSE_CLAIM_SIGNATURES,
    DEMO_PAYABLE_SIGNATURES,
    DEMO_RECEIVABLE_SIGNATURES,
    LEGACY_RULE_CODES,
    PLATFORM_DESTINATION_LOCATION_RULE_FILENAME,
)
from app.core.logging import get_logger

logger = get_logger("app.services.agent_foundation")

class AgentFoundationAssetSeedMixin:
def _seed_agent_assets(self) -> None:
existing_codes = set(self.db.scalars(select(AgentAsset.code)).all())
if existing_codes:
self._top_up_agent_assets(existing_codes)
return
attachment_rule = AgentAsset(
asset_type=AgentAssetType.RULE.value,
code=ATTACHMENT_RULE_ASSET_CODE,
name="报销附件与单据完整性规则",
description="统一定义报销提交时的附件数量、票据类型和补件处理口径,作为上线前待审核规则。",
domain=AgentAssetDomain.EXPENSE.value,
scenario_json=["expense", "risk_check", "attachment_policy", "invoice_anomaly"],
owner="财务制度管理组",
reviewer="高嘉禾",
status=AgentAssetStatus.REVIEW.value,
current_version="v1.0.0",
published_version=None,
working_version="v1.0.0",
config_json={
"severity": "high",
"enabled": False,
"runtime_kind": "policy_rule_draft",
"rule_template_key": "attachment_requirement_v1",
"rule_template_label": "附件要求模板",
"runtime_rule": ATTACHMENT_RULE_RUNTIME_CONFIG,
},
)
scene_submission_rule = AgentAsset(
asset_type=AgentAssetType.RULE.value,
code="rule.expense.scene_submission_standard",
name="报销场景提交与附件标准",
description="统一定义各报销场景的必填字段、附件类型要求和金额阈值。",
domain=AgentAssetDomain.EXPENSE.value,
scenario_json=["expense", "risk_check", "scene_policy", "attachment_policy"],
owner="费用运营组",
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
published_version="v1.0.0",
working_version="v1.0.0",
config_json={
"severity": "high",
"enabled": True,
"runtime_kind": "scene_matrix",
"rule_template_label": "系统内置场景矩阵规则",
},
)
travel_policy_rule = AgentAsset(
asset_type=AgentAssetType.RULE.value,
code="rule.expense.travel_risk_control_standard",
name="差旅报销风险管控制度",
description="统一定义差旅报销的行程闭环、酒店地点一致性、职级差标和风险处置口径。",
domain=AgentAssetDomain.EXPENSE.value,
scenario_json=["expense", "risk_check", "travel_policy", "travel_standard"],
owner="风控与审计部",
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.1.0",
published_version="v1.1.0",
working_version="v1.1.0",
config_json={
"severity": "high",
"enabled": True,
"block_on_high_risk": True,
"warning_on_medium_risk": True,
"source_doc": "document/development/risks/travel-risk-control-standard.md",
"runtime_kind": "travel_policy",
"rule_template_key": "travel_standard_v1",
"rule_template_label": "差旅标准模板",
},
)
company_travel_rule = AgentAsset(
asset_type=AgentAssetType.RULE.value,
code=COMPANY_TRAVEL_EXPENSE_RULE_CODE,
name="公司差旅费报销规则",
description="通过 Excel 明细表维护差旅费报销标准、票据要求和审批口径。",
domain=AgentAssetDomain.EXPENSE.value,
scenario_json=list(COMPANY_TRAVEL_RULE_SCENARIO_JSON),
owner="财务制度管理组",
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version=COMPANY_TRAVEL_RULE_VERSION,
published_version=COMPANY_TRAVEL_RULE_VERSION,
working_version=COMPANY_TRAVEL_RULE_VERSION,
config_json={
"severity": "medium",
"enabled": True,
"tag": "财务规则",
"detail_mode": "spreadsheet",
"rule_library": FINANCE_RULES_LIBRARY,
"scenario_category": COMPANY_TRAVEL_RULE_SCENARIO_JSON[0],
"ai_review_category": COMPANY_TRAVEL_RULE_SCENARIO_JSON[0],
"rule_template_label": "差旅报销 Excel 模板",
},
)
platform_risk_assets = self._build_platform_risk_seed_assets()
company_communication_rule = AgentAsset(
asset_type=AgentAssetType.RULE.value,
code=COMPANY_COMMUNICATION_EXPENSE_RULE_CODE,
name="公司通信费报销规则",
description="通过 Excel 明细表维护员工通信费报销标准、专项补充口径和审批要求。",
domain=AgentAssetDomain.EXPENSE.value,
scenario_json=list(COMPANY_COMMUNICATION_RULE_SCENARIO_JSON),
owner="财务制度管理组",
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version=COMPANY_COMMUNICATION_RULE_VERSION,
published_version=COMPANY_COMMUNICATION_RULE_VERSION,
working_version=COMPANY_COMMUNICATION_RULE_VERSION,
config_json={
"severity": "medium",
"enabled": True,
"tag": "财务规则",
"detail_mode": "spreadsheet",
"rule_library": FINANCE_RULES_LIBRARY,
"scenario_category": COMPANY_COMMUNICATION_RULE_SCENARIO_JSON[0],
"ai_review_category": COMPANY_COMMUNICATION_RULE_SCENARIO_JSON[0],
"rule_template_label": "通信费报销 Excel 模板",
},
)
skill_expense_asset = AgentAsset(
asset_type=AgentAssetType.SKILL.value,
code="skill.expense.summary_lookup",
name="报销汇总查询技能",
description="根据时间、员工和部门汇总报销金额与单据数量。",
domain=AgentAssetDomain.EXPENSE.value,
scenario_json=["expense", "query", "summary"],
owner="平台研发组",
reviewer="陈硕",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
published_version="v1.0.0",
working_version="v1.0.0",
config_json={"input_schema": ["time_range", "employee", "department"]},
)
skill_ar_asset = AgentAsset(
asset_type=AgentAssetType.SKILL.value,
code="skill.ar.aging_summary",
name="应收账龄汇总技能",
description="按客户、账龄和逾期状态汇总应收风险分布。",
domain=AgentAssetDomain.AR.value,
scenario_json=["accounts_receivable", "query", "aging_summary"],
owner="平台研发组",
reviewer="陈硕",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
published_version="v1.0.0",
working_version="v1.0.0",
config_json={"input_schema": ["customer", "aging_bucket", "status"]},
)
invoice_mcp_asset = AgentAsset(
asset_type=AgentAssetType.MCP.value,
code="mcp.invoice.verify_mock",
name="发票验真 Mock 服务",
description="模拟发票验真、发票状态查询和异常降级说明。",
domain=AgentAssetDomain.SYSTEM.value,
scenario_json=["expense", "invoice_validation"],
owner="平台研发组",
reviewer="周悦宁",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
published_version="v1.0.0",
working_version="v1.0.0",
config_json={"endpoint": "mock://invoice/verify", "timeout_ms": 1200},
)
ledger_mcp_asset = AgentAsset(
asset_type=AgentAssetType.MCP.value,
code="mcp.ledger.snapshot_mock",
name="总账快照 Mock 服务",
description="模拟返回应收、应付和费用汇总快照,供 Agent 查询和巡检。",
domain=AgentAssetDomain.SYSTEM.value,
scenario_json=["expense", "accounts_receivable", "accounts_payable"],
owner="平台研发组",
reviewer="周悦宁",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
published_version="v1.0.0",
working_version="v1.0.0",
config_json={"endpoint": "mock://ledger/snapshot", "timeout_ms": 1500},
)
task_asset = AgentAsset(
asset_type=AgentAssetType.TASK.value,
code="task.hermes.daily_risk_scan",
name="Hermes 每日风险巡检",
description="每天早上巡检重复报销、金额超标、逾期应收和异常付款。",
domain=AgentAssetDomain.SYSTEM.value,
scenario_json=["schedule", "risk_check"],
owner="风控与审计部",
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
published_version="v1.0.0",
working_version="v1.0.0",
config_json={"cron": "0 9 * * *", "agent": AgentName.HERMES.value},
)
ar_summary_task = AgentAsset(
asset_type=AgentAssetType.TASK.value,
code="task.hermes.weekly_ar_summary",
name="Hermes 每周应收账龄汇总",
description="每周汇总逾期应收、账龄分布和客户风险变化。",
domain=AgentAssetDomain.SYSTEM.value,
scenario_json=["schedule", "accounts_receivable", "summary"],
owner="风控与审计部",
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
published_version="v1.0.0",
working_version="v1.0.0",
config_json={"cron": "0 10 * * 1", "agent": AgentName.HERMES.value},
)
rule_digest_task = AgentAsset(
asset_type=AgentAssetType.TASK.value,
code="task.hermes.rule_review_digest",
name="Hermes 规则待审摘要",
description="每天汇总待审规则、待补样例和被拒规则修订建议。",
domain=AgentAssetDomain.SYSTEM.value,
scenario_json=["schedule", "rule_center", "review_digest"],
owner="风控与审计部",
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
published_version="v1.0.0",
working_version="v1.0.0",
config_json={"cron": "0 18 * * *", "agent": AgentName.HERMES.value},
)
knowledge_index_task = AgentAsset(
asset_type=AgentAssetType.TASK.value,
code="task.hermes.knowledge_index_sync",
name="Hermes ??????",
description="?????????? LightRAG ???????",
domain=AgentAssetDomain.SYSTEM.value,
scenario_json=["schedule", "knowledge", "rule_center"],
owner="财务制度管理组",
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
published_version="v1.0.0",
working_version="v1.0.0",
config_json={"cron": "0 0 * * *", "agent": AgentName.HERMES.value},
)
self.db.add_all(
[
attachment_rule,
scene_submission_rule,
travel_policy_rule,
*platform_risk_assets,
company_travel_rule,
company_communication_rule,
skill_expense_asset,
skill_ar_asset,
invoice_mcp_asset,
ledger_mcp_asset,
task_asset,
ar_summary_task,
rule_digest_task,
knowledge_index_task,
]
)
self.db.flush()
company_travel_rule_meta = self._ensure_company_travel_rule_spreadsheet_seed(
company_travel_rule,
version=COMPANY_TRAVEL_RULE_VERSION,
actor_name="系统初始化",
)
company_communication_rule_meta = self._ensure_company_communication_rule_spreadsheet_seed(
company_communication_rule,
version=COMPANY_COMMUNICATION_RULE_VERSION,
actor_name="系统初始化",
)
self.db.add_all(
[
AgentAssetVersion(
asset=attachment_rule,
version="v0.9.0",
content=self._attachment_submission_requirement_markdown(
version_note="首版附件完整性规则草稿,覆盖基础票据与补件口径。",
include_review_note=True,
),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note="首版草稿。",
created_by="高嘉禾",
),
AgentAssetVersion(
asset=attachment_rule,
version="v1.0.0",
content=self._attachment_submission_requirement_markdown(
version_note="补充票据缺失、收据替代和差旅等效凭证口径,待审核。",
include_review_note=True,
),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note="补充票据替代与差旅等效凭证口径,待审核。",
created_by="高嘉禾",
),
AgentAssetVersion(
asset=scene_submission_rule,
version="v1.0.0",
content=self._scene_submission_standard_markdown(),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note="首版报销场景提交标准,覆盖附件类型、必填字段和金额阈值。",
created_by="系统初始化",
),
AgentAssetVersion(
asset=travel_policy_rule,
version="v1.0.0",
content=self._travel_risk_control_standard_markdown(version="v1.0.0"),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note="首版差旅制度执行规则,覆盖行程闭环与基础差标校验。",
created_by="系统初始化",
),
AgentAssetVersion(
asset=travel_policy_rule,
version="v1.1.0",
content=self._travel_risk_control_standard_markdown(version="v1.1.0"),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note="补充可执行规则块,供审核引擎直接消费差旅制度标准。",
created_by="系统初始化",
),
*[
AgentAssetVersion(
asset=asset,
version="v1.0.0",
content=self._platform_risk_rule_markdown(asset),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note=f"平台通用风险规则:{asset.name}",
created_by="系统初始化",
)
for asset in platform_risk_assets
],
AgentAssetVersion(
asset=company_travel_rule,
version=COMPANY_TRAVEL_RULE_VERSION,
content=AgentAssetSpreadsheetManager.build_version_markdown(
rule_name=company_travel_rule.name,
version=COMPANY_TRAVEL_RULE_VERSION,
metadata=company_travel_rule_meta,
),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note="初始化差旅费报销 Excel 规则表。",
created_by="系统初始化",
),
AgentAssetVersion(
asset=company_communication_rule,
version=COMPANY_COMMUNICATION_RULE_VERSION,
content=AgentAssetSpreadsheetManager.build_version_markdown(
rule_name=company_communication_rule.name,
version=COMPANY_COMMUNICATION_RULE_VERSION,
metadata=company_communication_rule_meta,
),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note="初始化通信费报销 Excel 规则表。",
created_by="系统初始化",
),
AgentAssetVersion(
asset=skill_expense_asset,
version="v1.0.0",
content=self._json_content(
{
"inputs": ["time_range", "employee", "department"],
"outputs": ["total_amount", "claim_count"],
"dependencies": ["database.expense_claims"],
}
),
content_type=AgentAssetContentType.JSON.value,
change_note="初始化技能快照。",
created_by="系统初始化",
),
AgentAssetVersion(
asset=skill_ar_asset,
version="v1.0.0",
content=self._json_content(
{
"inputs": ["customer", "aging_bucket", "status"],
"outputs": ["receivable_total", "overdue_total", "customer_count"],
"dependencies": ["database.accounts_receivable"],
}
),
content_type=AgentAssetContentType.JSON.value,
change_note="初始化应收账龄技能快照。",
created_by="系统初始化",
),
AgentAssetVersion(
asset=invoice_mcp_asset,
version="v1.0.0",
content=self._json_content(
{
"service_type": "mock",
"auth_mode": "none",
"degrade_strategy": "return_stub_with_warning",
}
),
content_type=AgentAssetContentType.JSON.value,
change_note="初始化 MCP 快照。",
created_by="系统初始化",
),
AgentAssetVersion(
asset=ledger_mcp_asset,
version="v1.0.0",
content=self._json_content(
{
"service_type": "mock",
"auth_mode": "service_account",
"degrade_strategy": "return_cached_snapshot_with_warning",
}
),
content_type=AgentAssetContentType.JSON.value,
change_note="初始化总账快照 MCP。",
created_by="系统初始化",
),
AgentAssetVersion(
asset=task_asset,
version="v1.0.0",
content=self._json_content(
{
"task_type": "daily_risk_scan",
"schedule": "0 9 * * *",
"target_agent": AgentName.HERMES.value,
}
),
content_type=AgentAssetContentType.JSON.value,
change_note="初始化任务快照。",
created_by="系统初始化",
),
AgentAssetVersion(
asset=ar_summary_task,
version="v1.0.0",
content=self._json_content(
{
"task_type": "weekly_ar_summary",
"schedule": "0 10 * * 1",
"target_agent": AgentName.HERMES.value,
}
),
content_type=AgentAssetContentType.JSON.value,
change_note="初始化应收账龄汇总任务。",
created_by="系统初始化",
),
AgentAssetVersion(
asset=rule_digest_task,
version="v1.0.0",
content=self._json_content(
{
"task_type": "rule_review_digest",
"schedule": "0 18 * * *",
"target_agent": AgentName.HERMES.value,
}
),
content_type=AgentAssetContentType.JSON.value,
change_note="初始化规则待审摘要任务。",
created_by="系统初始化",
),
AgentAssetVersion(
asset=knowledge_index_task,
version="v1.0.0",
content=self._json_content(
{
"task_type": "knowledge_index_sync",
"schedule": "0 0 * * *",
"target_agent": AgentName.HERMES.value,
"folder": "报销制度",
"changed_only": True,
"index_engine": "lightrag",
}
),
content_type=AgentAssetContentType.JSON.value,
change_note="初始化制度知识与规则草稿形成任务。",
created_by="系统初始化",
),
]
)
self.db.add_all(
[
AgentAssetReview(
asset=attachment_rule,
version="v1.0.0",
reviewer="高嘉禾",
review_status=AgentReviewStatus.PENDING.value,
review_note="等待制度管理员确认收据替代与补件时限口径。",
reviewed_at=None,
),
AgentAssetReview(
asset=scene_submission_rule,
version="v1.0.0",
reviewer="顾承宇",
review_status=AgentReviewStatus.APPROVED.value,
review_note="可作为报销场景统一审核标准正式执行。",
reviewed_at=datetime.now(UTC),
),
AgentAssetReview(
asset=travel_policy_rule,
version="v1.1.0",
reviewer="顾承宇",
review_status=AgentReviewStatus.APPROVED.value,
review_note="制度口径已确认,并已补充可执行配置供审核引擎读取。",
reviewed_at=datetime.now(UTC),
),
AgentAssetReview(
asset=company_travel_rule,
version=COMPANY_TRAVEL_RULE_VERSION,
reviewer="顾承宇",
review_status=AgentReviewStatus.APPROVED.value,
review_note="首版 Excel 规则表已确认,可作为财务规则使用。",
reviewed_at=datetime.now(UTC),
),
AgentAssetReview(
asset=company_communication_rule,
version=COMPANY_COMMUNICATION_RULE_VERSION,
reviewer="顾承宇",
review_status=AgentReviewStatus.APPROVED.value,
review_note="首版 Excel 规则表已确认,可作为财务规则使用。",
reviewed_at=datetime.now(UTC),
),
]
)
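
The seed method above inserts every built-in asset when the table is empty and otherwise hands off to `_top_up_agent_assets`, which adds only what is missing. A minimal sketch of that seed-or-top-up dispatch, assuming a plain dict keyed by asset code in place of the database session. `DEFAULT_ASSETS`, `seed_assets`, and `top_up_assets` are hypothetical names, and only three of the asset codes from the diff are included.

```python
# Hypothetical in-memory stand-in: asset code -> current version.
DEFAULT_ASSETS: dict[str, str] = {
    "rule.expense.scene_submission_standard": "v1.0.0",
    "rule.expense.travel_risk_control_standard": "v1.1.0",
    "task.hermes.weekly_ar_summary": "v1.0.0",
}


def top_up_assets(store: dict[str, str]) -> list[str]:
    """Add only the missing default codes and leave existing entries untouched."""
    added: list[str] = []
    for code, version in DEFAULT_ASSETS.items():
        if code not in store:
            store[code] = version
            added.append(code)
    return sorted(added)


def seed_assets(store: dict[str, str]) -> list[str]:
    """Seed everything into an empty store, otherwise top up; return the codes added."""
    if not store:
        store.update(DEFAULT_ASSETS)
        return sorted(DEFAULT_ASSETS)
    return top_up_assets(store)
```

Calling `seed_assets` a second time on the same store returns an empty list, and a store that already holds a customised version of one code keeps that version while the other codes are added.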

@@ -0,0 +1,667 @@
from __future__ import annotations

import hashlib
import json
from datetime import UTC, date, datetime
from decimal import Decimal
from pathlib import Path

from sqlalchemy import inspect, select, text

from app.core.agent_enums import (
    AgentAssetContentType,
    AgentAssetDomain,
    AgentAssetStatus,
    AgentAssetType,
    AgentName,
    AgentPermissionLevel,
    AgentReviewStatus,
    AgentRunSource,
    AgentRunStatus,
    AgentToolType,
)
from app.models.agent_asset import AgentAsset, AgentAssetReview, AgentAssetVersion
from app.models.agent_run import AgentRun, AgentToolCall, SemanticParseLog
from app.models.audit_log import AuditLog
from app.models.financial_record import (
    AccountsPayableRecord,
    AccountsReceivableRecord,
    ExpenseClaim,
    ExpenseClaimItem,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import (
    AgentAssetSpreadsheetManager,
    COMPANY_COMMUNICATION_EXPENSE_RULE_CODE,
    COMPANY_COMMUNICATION_EXPENSE_RULE_FILENAME,
    COMPANY_TRAVEL_EXPENSE_RULE_CODE,
    COMPANY_TRAVEL_EXPENSE_RULE_FILENAME,
    FINANCE_RULES_LIBRARY,
    RISK_RULES_LIBRARY,
)
from app.services.expense_rule_runtime import (
    build_scene_submission_standard_markdown,
    build_travel_risk_control_standard_markdown,
)
from app.services.agent_foundation_constants import (
    ATTACHMENT_RULE_ASSET_CODE,
    ATTACHMENT_RULE_RUNTIME_CONFIG,
    COMPANY_COMMUNICATION_RULE_SCENARIO_JSON,
    COMPANY_COMMUNICATION_RULE_VERSION,
    COMPANY_TRAVEL_RULE_SCENARIO_JSON,
    COMPANY_TRAVEL_RULE_VERSION,
    DEMO_EXPENSE_CLAIM_SIGNATURES,
    DEMO_PAYABLE_SIGNATURES,
    DEMO_RECEIVABLE_SIGNATURES,
    LEGACY_RULE_CODES,
    PLATFORM_DESTINATION_LOCATION_RULE_FILENAME,
)
from app.core.logging import get_logger

logger = get_logger("app.services.agent_foundation")

class AgentFoundationAssetTopUpMixin:
def _top_up_agent_assets(self, existing_codes: set[str]) -> None:
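self._remove_legacy_rule_assets()
# Refresh the code set after the legacy cleanup; the snapshot passed in by
# the caller may be stale, so it is intentionally replaced here.
existing_codes = set(self.db.scalars(select(AgentAsset.code)).all())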
attachment_rule = self.db.scalar(
select(AgentAsset).where(AgentAsset.code == ATTACHMENT_RULE_ASSET_CODE)
)
scene_submission_rule = self.db.scalar(
select(AgentAsset).where(AgentAsset.code == "rule.expense.scene_submission_standard")
)
travel_policy_rule = self.db.scalar(
select(AgentAsset).where(AgentAsset.code == "rule.expense.travel_risk_control_standard")
)
company_travel_rule = self.db.scalar(
select(AgentAsset).where(AgentAsset.code == COMPANY_TRAVEL_EXPENSE_RULE_CODE)
)
company_communication_rule = self.db.scalar(
select(AgentAsset).where(AgentAsset.code == COMPANY_COMMUNICATION_EXPENSE_RULE_CODE)
)
if ATTACHMENT_RULE_ASSET_CODE not in existing_codes:
attachment_rule = self._create_seed_asset(
asset_type=AgentAssetType.RULE.value,
code=ATTACHMENT_RULE_ASSET_CODE,
name="报销附件与单据完整性规则",
description="统一定义报销提交时的附件数量、票据类型和补件处理口径,作为上线前待审核规则。",
domain=AgentAssetDomain.EXPENSE.value,
scenario_json=["expense", "risk_check", "attachment_policy", "invoice_anomaly"],
owner="财务制度管理组",
reviewer="高嘉禾",
status=AgentAssetStatus.REVIEW.value,
current_version="v1.0.0",
config_json={
"severity": "high",
"enabled": False,
"runtime_kind": "policy_rule_draft",
"rule_template_key": "attachment_requirement_v1",
"rule_template_label": "附件要求模板",
"runtime_rule": ATTACHMENT_RULE_RUNTIME_CONFIG,
},
)
if attachment_rule is not None:
if not str(attachment_rule.current_version or "").strip():
attachment_rule.current_version = "v1.0.0"
if not str(attachment_rule.working_version or "").strip():
attachment_rule.working_version = attachment_rule.current_version
attachment_rule.status = attachment_rule.status or AgentAssetStatus.REVIEW.value
attachment_rule.description = (
"统一定义报销提交时的附件数量、票据类型和补件处理口径,作为上线前待审核规则。"
)
attachment_rule.config_json = {
"severity": "high",
"enabled": False,
"runtime_kind": "policy_rule_draft",
"rule_template_key": "attachment_requirement_v1",
"rule_template_label": "附件要求模板",
"runtime_rule": ATTACHMENT_RULE_RUNTIME_CONFIG,
}
self._ensure_asset_version(
attachment_rule,
version="v0.9.0",
content=self._attachment_submission_requirement_markdown(
version_note="首版附件完整性规则草稿,覆盖基础票据与补件口径。",
include_review_note=True,
),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note="首版草稿。",
created_by="高嘉禾",
)
self._ensure_asset_version(
attachment_rule,
version="v1.0.0",
content=self._attachment_submission_requirement_markdown(
version_note="补充票据缺失、收据替代和差旅等效凭证口径,待审核。",
include_review_note=True,
),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note="补充票据替代与差旅等效凭证口径,待审核。",
created_by="高嘉禾",
)
self._ensure_asset_review(
attachment_rule,
version="v1.0.0",
reviewer="高嘉禾",
review_status=AgentReviewStatus.PENDING.value,
review_note="等待制度管理员确认收据替代与补件时限口径。",
reviewed_at=None,
)
if "rule.expense.scene_submission_standard" not in existing_codes:
scene_submission_rule = self._create_seed_asset(
asset_type=AgentAssetType.RULE.value,
code="rule.expense.scene_submission_standard",
name="报销场景提交与附件标准",
description="统一定义各报销场景的必填字段、附件类型要求和金额阈值。",
domain=AgentAssetDomain.EXPENSE.value,
scenario_json=["expense", "risk_check", "scene_policy", "attachment_policy"],
owner="费用运营组",
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
config_json={
"severity": "high",
"enabled": True,
"runtime_kind": "scene_matrix",
"rule_template_label": "系统内置场景矩阵规则",
},
)
if scene_submission_rule is not None:
if not str(scene_submission_rule.current_version or "").strip():
scene_submission_rule.current_version = "v1.0.0"
if not str(scene_submission_rule.working_version or "").strip():
scene_submission_rule.working_version = scene_submission_rule.current_version
if not str(scene_submission_rule.published_version or "").strip():
scene_submission_rule.published_version = scene_submission_rule.current_version
scene_submission_rule.status = (
scene_submission_rule.status or AgentAssetStatus.ACTIVE.value
)
scene_submission_rule.description = (
"统一定义各报销场景的必填字段、附件类型要求和金额阈值。"
)
scene_submission_rule.config_json = {
"severity": "high",
"enabled": True,
"runtime_kind": "scene_matrix",
"rule_template_label": "系统内置场景矩阵规则",
}
self._ensure_asset_version(
scene_submission_rule,
version="v1.0.0",
content=self._scene_submission_standard_markdown(),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note="首版报销场景提交标准,覆盖附件类型、必填字段和金额阈值。",
created_by="系统初始化",
)
self._ensure_asset_review(
scene_submission_rule,
version="v1.0.0",
reviewer="顾承宇",
review_status=AgentReviewStatus.APPROVED.value,
review_note="可作为报销场景统一审核标准正式执行。",
reviewed_at=datetime.now(UTC),
)
if "rule.expense.travel_risk_control_standard" not in existing_codes:
travel_policy_rule = self._create_seed_asset(
asset_type=AgentAssetType.RULE.value,
code="rule.expense.travel_risk_control_standard",
name="差旅报销风险管控制度",
description="统一定义差旅报销的行程闭环、酒店地点一致性、职级差标和风险处置口径。",
domain=AgentAssetDomain.EXPENSE.value,
scenario_json=["expense", "risk_check", "travel_policy", "travel_standard"],
owner="风控与审计部",
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.1.0",
config_json={
"severity": "high",
"enabled": True,
"block_on_high_risk": True,
"warning_on_medium_risk": True,
"source_doc": "document/development/risks/travel-risk-control-standard.md",
"runtime_kind": "travel_policy",
"rule_template_key": "travel_standard_v1",
"rule_template_label": "差旅标准模板",
},
)
if travel_policy_rule is not None:
if not str(travel_policy_rule.current_version or "").strip():
travel_policy_rule.current_version = "v1.1.0"
if not str(travel_policy_rule.working_version or "").strip():
travel_policy_rule.working_version = travel_policy_rule.current_version
if not str(travel_policy_rule.published_version or "").strip():
travel_policy_rule.published_version = travel_policy_rule.current_version
travel_policy_rule.status = travel_policy_rule.status or AgentAssetStatus.ACTIVE.value
travel_policy_rule.config_json = {
"severity": "high",
"enabled": True,
"block_on_high_risk": True,
"warning_on_medium_risk": True,
"source_doc": "document/development/risks/travel-risk-control-standard.md",
"runtime_kind": "travel_policy",
"rule_template_key": "travel_standard_v1",
"rule_template_label": "差旅标准模板",
}
self._ensure_asset_version(
travel_policy_rule,
version="v1.0.0",
content=self._travel_risk_control_standard_markdown(version="v1.0.0"),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note="首版差旅制度执行规则,覆盖行程闭环与基础差标校验。",
created_by="系统初始化",
)
self._ensure_asset_version(
travel_policy_rule,
version="v1.1.0",
content=self._travel_risk_control_standard_markdown(version="v1.1.0"),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note="补充可执行规则块,供审核引擎直接消费差旅制度标准。",
created_by="系统初始化",
)
self._ensure_asset_review(
travel_policy_rule,
version="v1.1.0",
reviewer="顾承宇",
review_status=AgentReviewStatus.APPROVED.value,
review_note="制度口径已确认,并已补充可执行配置供审核引擎读取。",
reviewed_at=datetime.now(UTC),
)
self.sync_platform_risk_rules_from_library()
if COMPANY_TRAVEL_EXPENSE_RULE_CODE not in existing_codes:
company_travel_rule = self._create_seed_asset(
asset_type=AgentAssetType.RULE.value,
code=COMPANY_TRAVEL_EXPENSE_RULE_CODE,
name="公司差旅费报销规则",
description="通过 Excel 明细表维护差旅费报销标准、票据要求和审批口径。",
domain=AgentAssetDomain.EXPENSE.value,
scenario_json=list(COMPANY_TRAVEL_RULE_SCENARIO_JSON),
owner="财务制度管理组",
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version=COMPANY_TRAVEL_RULE_VERSION,
config_json={
"severity": "medium",
"enabled": True,
"tag": "财务规则",
"detail_mode": "spreadsheet",
"scenario_category": COMPANY_TRAVEL_RULE_SCENARIO_JSON[0],
"ai_review_category": COMPANY_TRAVEL_RULE_SCENARIO_JSON[0],
"rule_template_label": "差旅报销 Excel 模板",
},
)
if COMPANY_COMMUNICATION_EXPENSE_RULE_CODE not in existing_codes:
company_communication_rule = self._create_seed_asset(
asset_type=AgentAssetType.RULE.value,
code=COMPANY_COMMUNICATION_EXPENSE_RULE_CODE,
name="公司通信费报销规则",
description="通过 Excel 明细表维护员工通信费报销标准、专项补充口径和审批要求。",
domain=AgentAssetDomain.EXPENSE.value,
scenario_json=list(COMPANY_COMMUNICATION_RULE_SCENARIO_JSON),
owner="财务制度管理组",
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version=COMPANY_COMMUNICATION_RULE_VERSION,
config_json={
"severity": "medium",
"enabled": True,
"tag": "财务规则",
"detail_mode": "spreadsheet",
"scenario_category": COMPANY_COMMUNICATION_RULE_SCENARIO_JSON[0],
"ai_review_category": COMPANY_COMMUNICATION_RULE_SCENARIO_JSON[0],
"rule_template_label": "通信费报销 Excel 模板",
},
)
if company_travel_rule is not None:
company_travel_rule.scenario_json = list(COMPANY_TRAVEL_RULE_SCENARIO_JSON)
if not str(company_travel_rule.current_version or "").strip():
company_travel_rule.current_version = COMPANY_TRAVEL_RULE_VERSION
if not str(company_travel_rule.working_version or "").strip():
company_travel_rule.working_version = company_travel_rule.current_version
if not str(company_travel_rule.published_version or "").strip():
company_travel_rule.published_version = company_travel_rule.current_version
if not str(company_travel_rule.status or "").strip():
company_travel_rule.status = AgentAssetStatus.ACTIVE.value
company_travel_rule.description = (
"通过 Excel 明细表维护差旅费报销标准、票据要求和审批口径。"
)
company_travel_rule.config_json = {
**(company_travel_rule.config_json or {}),
"severity": "medium",
"enabled": True,
"tag": "财务规则",
"detail_mode": "spreadsheet",
"rule_library": FINANCE_RULES_LIBRARY,
"scenario_category": COMPANY_TRAVEL_RULE_SCENARIO_JSON[0],
"ai_review_category": COMPANY_TRAVEL_RULE_SCENARIO_JSON[0],
"rule_template_label": "差旅报销 Excel 模板",
}
company_travel_rule_meta = self._ensure_company_travel_rule_spreadsheet_seed(
company_travel_rule,
version=str(company_travel_rule.current_version or COMPANY_TRAVEL_RULE_VERSION),
actor_name="系统初始化",
)
self._ensure_asset_version(
company_travel_rule,
version=str(company_travel_rule.current_version or COMPANY_TRAVEL_RULE_VERSION),
content=AgentAssetSpreadsheetManager.build_version_markdown(
rule_name=company_travel_rule.name,
version=str(company_travel_rule.current_version or COMPANY_TRAVEL_RULE_VERSION),
metadata=company_travel_rule_meta,
),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note="初始化差旅费报销 Excel 规则表。",
created_by="系统初始化",
)
if (
str(company_travel_rule.current_version or "").strip()
== COMPANY_TRAVEL_RULE_VERSION
):
self._ensure_asset_review(
company_travel_rule,
version=COMPANY_TRAVEL_RULE_VERSION,
reviewer="顾承宇",
review_status=AgentReviewStatus.APPROVED.value,
review_note="首版 Excel 规则表已确认,可作为财务规则使用。",
reviewed_at=datetime.now(UTC),
)
if company_communication_rule is not None:
company_communication_rule.scenario_json = list(
COMPANY_COMMUNICATION_RULE_SCENARIO_JSON
)
if not str(company_communication_rule.current_version or "").strip():
company_communication_rule.current_version = COMPANY_COMMUNICATION_RULE_VERSION
if not str(company_communication_rule.working_version or "").strip():
company_communication_rule.working_version = (
company_communication_rule.current_version
)
if not str(company_communication_rule.published_version or "").strip():
company_communication_rule.published_version = (
company_communication_rule.current_version
)
if not str(company_communication_rule.status or "").strip():
company_communication_rule.status = AgentAssetStatus.ACTIVE.value
company_communication_rule.description = (
"通过 Excel 明细表维护员工通信费报销标准、专项补充口径和审批要求。"
)
company_communication_rule.config_json = {
**(company_communication_rule.config_json or {}),
"severity": "medium",
"enabled": True,
"tag": "财务规则",
"detail_mode": "spreadsheet",
"rule_library": FINANCE_RULES_LIBRARY,
"scenario_category": COMPANY_COMMUNICATION_RULE_SCENARIO_JSON[0],
"ai_review_category": COMPANY_COMMUNICATION_RULE_SCENARIO_JSON[0],
"rule_template_label": "通信费报销 Excel 模板",
}
company_communication_rule_meta = (
self._ensure_company_communication_rule_spreadsheet_seed(
company_communication_rule,
version=str(
company_communication_rule.current_version
or COMPANY_COMMUNICATION_RULE_VERSION
),
actor_name="系统初始化",
)
)
self._ensure_asset_version(
company_communication_rule,
version=str(
company_communication_rule.current_version or COMPANY_COMMUNICATION_RULE_VERSION
),
content=AgentAssetSpreadsheetManager.build_version_markdown(
rule_name=company_communication_rule.name,
version=str(
company_communication_rule.current_version
or COMPANY_COMMUNICATION_RULE_VERSION
),
metadata=company_communication_rule_meta,
),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note="初始化通信费报销 Excel 规则表。",
created_by="系统初始化",
)
if (
str(company_communication_rule.current_version or "").strip()
== COMPANY_COMMUNICATION_RULE_VERSION
):
self._ensure_asset_review(
company_communication_rule,
version=COMPANY_COMMUNICATION_RULE_VERSION,
reviewer="顾承宇",
review_status=AgentReviewStatus.APPROVED.value,
review_note="首版 Excel 规则表已确认,可作为财务规则使用。",
reviewed_at=datetime.now(UTC),
)
if "skill.ar.aging_summary" not in existing_codes:
asset = self._create_seed_asset(
asset_type=AgentAssetType.SKILL.value,
code="skill.ar.aging_summary",
name="应收账龄汇总技能",
description="按客户、账龄和逾期状态汇总应收风险分布。",
domain=AgentAssetDomain.AR.value,
scenario_json=["accounts_receivable", "query", "aging_summary"],
owner="平台研发组",
reviewer="陈硕",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
config_json={"input_schema": ["customer", "aging_bucket", "status"]},
)
self._ensure_asset_version(
asset,
version="v1.0.0",
content=self._json_content(
{
"inputs": ["customer", "aging_bucket", "status"],
"outputs": ["receivable_total", "overdue_total", "customer_count"],
"dependencies": ["database.accounts_receivable"],
}
),
content_type=AgentAssetContentType.JSON.value,
change_note="初始化应收账龄技能快照。",
created_by="系统初始化",
)
if "mcp.ledger.snapshot_mock" not in existing_codes:
asset = self._create_seed_asset(
asset_type=AgentAssetType.MCP.value,
code="mcp.ledger.snapshot_mock",
name="总账快照 Mock 服务",
description="模拟返回应收、应付和费用汇总快照,供 Agent 查询和巡检。",
domain=AgentAssetDomain.SYSTEM.value,
scenario_json=["expense", "accounts_receivable", "accounts_payable"],
owner="平台研发组",
reviewer="周悦宁",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
config_json={"endpoint": "mock://ledger/snapshot", "timeout_ms": 1500},
)
self._ensure_asset_version(
asset,
version="v1.0.0",
content=self._json_content(
{
"service_type": "mock",
"auth_mode": "service_account",
"degrade_strategy": "return_cached_snapshot_with_warning",
}
),
content_type=AgentAssetContentType.JSON.value,
change_note="初始化总账快照 MCP。",
created_by="系统初始化",
)
if "task.hermes.weekly_ar_summary" not in existing_codes:
asset = self._create_seed_asset(
asset_type=AgentAssetType.TASK.value,
code="task.hermes.weekly_ar_summary",
name="Hermes 每周应收账龄汇总",
description="每周汇总逾期应收、账龄分布和客户风险变化。",
domain=AgentAssetDomain.SYSTEM.value,
scenario_json=["schedule", "accounts_receivable", "summary"],
owner="风控与审计部",
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
config_json={"cron": "0 10 * * 1", "agent": AgentName.HERMES.value},
)
self._ensure_asset_version(
asset,
version="v1.0.0",
content=self._json_content(
{
"task_type": "weekly_ar_summary",
"schedule": "0 10 * * 1",
"target_agent": AgentName.HERMES.value,
}
),
content_type=AgentAssetContentType.JSON.value,
change_note="初始化应收账龄汇总任务。",
created_by="系统初始化",
)
if "task.hermes.rule_review_digest" not in existing_codes:
asset = self._create_seed_asset(
asset_type=AgentAssetType.TASK.value,
code="task.hermes.rule_review_digest",
name="Hermes 规则待审摘要",
description="每天汇总待审规则、待补样例和被拒规则修订建议。",
domain=AgentAssetDomain.SYSTEM.value,
scenario_json=["schedule", "rule_center", "review_digest"],
owner="风控与审计部",
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
config_json={"cron": "0 18 * * *", "agent": AgentName.HERMES.value},
)
self._ensure_asset_version(
asset,
version="v1.0.0",
content=self._json_content(
{
"task_type": "rule_review_digest",
"schedule": "0 18 * * *",
"target_agent": AgentName.HERMES.value,
}
),
content_type=AgentAssetContentType.JSON.value,
change_note="初始化规则待审摘要任务。",
created_by="系统初始化",
)
if "task.hermes.knowledge_index_sync" not in existing_codes:
asset = self._create_seed_asset(
asset_type=AgentAssetType.TASK.value,
code="task.hermes.knowledge_index_sync",
name="Hermes ??????",
description="?????????? LightRAG ???????",
domain=AgentAssetDomain.SYSTEM.value,
scenario_json=["schedule", "knowledge", "rule_center"],
owner="财务制度管理组",
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
config_json={"cron": "0 0 * * *", "agent": AgentName.HERMES.value},
)
self._ensure_asset_version(
asset,
version="v1.0.0",
content=self._json_content(
{
"task_type": "knowledge_index_sync",
"schedule": "0 0 * * *",
"target_agent": AgentName.HERMES.value,
"folder": "报销制度",
"changed_only": True,
}
),
content_type=AgentAssetContentType.JSON.value,
change_note="初始化制度知识与规则草稿形成任务。",
created_by="系统初始化",
)
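
The seeded Hermes tasks above store their schedule as a five-field cron string (for example `"0 10 * * 1"`). As a minimal sketch of how such a string breaks down, the helper below splits it into named fields. `split_cron` and `CRON_FIELDS` are hypothetical names used only for illustration; nothing in this commit shows how the scheduler actually parses the value.

```python
# Illustrative only: split_cron is a hypothetical helper, not part of the commit.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression: str) -> dict[str, str]:
    """Split a standard five-field cron expression into named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


weekly = split_cron("0 10 * * 1")  # schedule of task.hermes.weekly_ar_summary
daily = split_cron("0 18 * * *")   # schedule of task.hermes.rule_review_digest
```

In standard cron syntax a `1` in the last field means Monday, which matches the weekly summary task; the timezone the schedule runs in is not stated in this code.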


@@ -0,0 +1,207 @@
from __future__ import annotations
PLATFORM_DESTINATION_LOCATION_RULE_CODE = "risk.travel.destination_receipt_location"
PLATFORM_DESTINATION_LOCATION_RULE_FILENAME = "risk.travel.destination_receipt_location.json"
DEMO_EXPENSE_CLAIM_SIGNATURES = {
(
"EXP-202605-001",
"张三",
"华南客户拜访差旅报销",
"3280.00",
"submitted",
),
(
"EXP-202605-002",
"李四",
"客户路演餐费",
"860.00",
"approved",
),
(
"EXP-202605-003",
"王五",
"市场活动会务差旅",
"3280.00",
"review",
),
}
DEMO_RECEIVABLE_SIGNATURES = {
("AR-202605-001", "客户A", "50000.00", "partial"),
("AR-202605-002", "客户B", "78000.00", "overdue"),
}
DEMO_PAYABLE_SIGNATURES = {
("AP-202605-001", "供应商A", "33000.00", "scheduled"),
("AP-202605-002", "供应商B", "96000.00", "overdue"),
}
LEGACY_RULE_CODES = (
"rule.expense.duplicate_expense_check",
"rule.expense.travel_receipt_requirements",
"rule.ap.payment_dual_review",
)
ATTACHMENT_RULE_ASSET_CODE = "rule.expense.attachment_submission_requirements"
COMPANY_TRAVEL_RULE_VERSION = "v1.0.0"
COMPANY_COMMUNICATION_RULE_VERSION = "v1.0.0"
COMPANY_TRAVEL_RULE_SCENARIO_JSON = ("差旅",)
COMPANY_COMMUNICATION_RULE_SCENARIO_JSON = ("费用科目",)
ATTACHMENT_RULE_RUNTIME_CONFIG = {
"kind": "policy_rule_draft",
"version": 1,
"template_key": "attachment_requirement_v1",
"rule_name": "报销附件与单据完整性规则",
"scenario": "attachment_policy",
"source_document_name": "报销制度 / 单据与附件要求",
"review_required": True,
"target": {
"expense_types": [
"travel",
"hotel",
"transport",
"meal",
"office",
"meeting",
"training",
"communication",
"welfare",
"other",
],
"scene_codes": ["expense", "attachment_policy", "invoice_anomaly"],
},
"attachment_requirements": {
"min_attachment_count": 1,
"items": [
{
"document_type": "vat_invoice",
"required": True,
"min_count": 1,
"description": "金额类报销原则上必须提供合法票据。",
},
{
"document_type": "receipt",
"required": False,
"min_count": 1,
"description": "特殊场景无发票时需补充收据与情况说明。",
},
{
"document_type": "flight_itinerary",
"required": False,
"min_count": 1,
"description": "差旅交通报销需提供行程单或等效凭证。",
},
{
"document_type": "hotel_invoice",
"required": False,
"min_count": 1,
"description": "住宿报销需提供酒店票据或等效住宿凭证。",
},
],
"manual_fill_required": False,
},
"missing_attachment_action": "block",
"output": {
"risk_code": "invoice_anomaly",
"action": "block",
"message": "附件或单据不完整,需补件后再提交。",
},
}
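
The `attachment_requirements` section above can be read as a small checklist: a minimum attachment count plus per-document-type items. The sketch below shows one way such a config might be evaluated against the document types attached to a claim. `check_attachments` and the trimmed `REQUIREMENTS` copy are assumptions made for illustration; the real evaluator is not part of this file.

```python
from collections import Counter

# Trimmed, illustrative copy of the attachment_requirements section above.
REQUIREMENTS = {
    "min_attachment_count": 1,
    "items": [
        {"document_type": "vat_invoice", "required": True, "min_count": 1},
        {"document_type": "receipt", "required": False, "min_count": 1},
    ],
}


def check_attachments(document_types: list[str], requirements: dict = REQUIREMENTS) -> list[str]:
    """Return a list of problems; an empty list means the submission passes."""
    problems: list[str] = []
    if len(document_types) < requirements["min_attachment_count"]:
        problems.append("too_few_attachments")
    counts = Counter(document_types)
    for item in requirements["items"]:
        # Only items flagged as required can fail the check in this sketch.
        if item["required"] and counts[item["document_type"]] < item["min_count"]:
            problems.append(f"missing:{item['document_type']}")
    return problems
```

Under this reading, a non-empty result would map to the configured `missing_attachment_action` of `"block"`.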


@@ -0,0 +1,726 @@
from __future__ import annotations
import hashlib
import json
from datetime import UTC, date, datetime
from decimal import Decimal
from pathlib import Path
from sqlalchemy import inspect, select, text
from app.core.agent_enums import (
AgentAssetContentType,
AgentAssetDomain,
AgentAssetStatus,
AgentAssetType,
AgentName,
AgentPermissionLevel,
AgentReviewStatus,
AgentRunSource,
AgentRunStatus,
AgentToolType,
)
from app.models.agent_asset import AgentAsset, AgentAssetReview, AgentAssetVersion
from app.models.agent_run import AgentRun, AgentToolCall, SemanticParseLog
from app.models.audit_log import AuditLog
from app.models.financial_record import (
AccountsPayableRecord,
AccountsReceivableRecord,
ExpenseClaim,
ExpenseClaimItem,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import (
AgentAssetSpreadsheetManager,
COMPANY_COMMUNICATION_EXPENSE_RULE_CODE,
COMPANY_COMMUNICATION_EXPENSE_RULE_FILENAME,
COMPANY_TRAVEL_EXPENSE_RULE_CODE,
COMPANY_TRAVEL_EXPENSE_RULE_FILENAME,
FINANCE_RULES_LIBRARY,
RISK_RULES_LIBRARY,
)
from app.services.expense_rule_runtime import (
build_scene_submission_standard_markdown,
build_travel_risk_control_standard_markdown,
)
from app.services.agent_foundation_constants import (
ATTACHMENT_RULE_ASSET_CODE,
ATTACHMENT_RULE_RUNTIME_CONFIG,
COMPANY_COMMUNICATION_RULE_SCENARIO_JSON,
COMPANY_COMMUNICATION_RULE_VERSION,
COMPANY_TRAVEL_RULE_SCENARIO_JSON,
COMPANY_TRAVEL_RULE_VERSION,
DEMO_EXPENSE_CLAIM_SIGNATURES,
DEMO_PAYABLE_SIGNATURES,
DEMO_RECEIVABLE_SIGNATURES,
LEGACY_RULE_CODES,
PLATFORM_DESTINATION_LOCATION_RULE_FILENAME,
)
from app.core.logging import get_logger
logger = get_logger("app.services.agent_foundation")
class AgentFoundationFinancialSeedMixin:
def _seed_financial_records(self) -> None:
if self.db.scalar(select(ExpenseClaim.id).limit(1)) is not None:
return
claim_1 = ExpenseClaim(
claim_no="EXP-202605-001",
employee_name="张三",
department_name="财务共享中心",
project_code="PRJ-EXP-01",
expense_type="travel",
reason="华南客户拜访差旅报销",
location="深圳",
amount=Decimal("3280.00"),
currency="CNY",
invoice_count=3,
occurred_at=datetime(2026, 5, 6, 9, 0, tzinfo=UTC),
submitted_at=datetime(2026, 5, 7, 10, 20, tzinfo=UTC),
status="submitted",
approval_stage="finance_review",
risk_flags_json=["amount_over_limit"],
)
claim_1.items = [
ExpenseClaimItem(
item_date=date(2026, 5, 5),
item_type="hotel",
item_reason="客户拜访住宿",
item_location="深圳",
item_amount=Decimal("1880.00"),
invoice_id="INV-HOTEL-001",
),
ExpenseClaimItem(
item_date=date(2026, 5, 6),
item_type="transport",
item_reason="往返交通",
item_location="深圳",
item_amount=Decimal("1400.00"),
invoice_id="INV-TRANS-009",
),
]
claim_2 = ExpenseClaim(
claim_no="EXP-202605-002",
employee_name="李四",
department_name="华东销售部",
project_code="PRJ-SALES-02",
expense_type="meal",
reason="客户路演餐费",
location="上海",
amount=Decimal("860.00"),
currency="CNY",
invoice_count=1,
occurred_at=datetime(2026, 5, 8, 12, 0, tzinfo=UTC),
submitted_at=datetime(2026, 5, 8, 18, 30, tzinfo=UTC),
status="approved",
approval_stage="completed",
risk_flags_json=[],
)
claim_3 = ExpenseClaim(
claim_no="EXP-202605-003",
employee_name="王五",
department_name="市场品牌部",
project_code="PRJ-MKT-08",
expense_type="travel",
reason="市场活动会务差旅",
location="北京",
amount=Decimal("3280.00"),
currency="CNY",
invoice_count=2,
occurred_at=datetime(2026, 5, 6, 11, 30, tzinfo=UTC),
submitted_at=datetime(2026, 5, 8, 9, 10, tzinfo=UTC),
status="review",
approval_stage="risk_check",
risk_flags_json=["duplicate_expense"],
)
ar_records = [
AccountsReceivableRecord(
receivable_no="AR-202605-001",
customer_id="CUS-A",
customer_name="客户A",
contract_no="CTR-AR-1001",
invoice_no="INV-AR-9001",
amount_receivable=Decimal("120000.00"),
amount_received=Decimal("70000.00"),
amount_outstanding=Decimal("50000.00"),
currency="CNY",
posting_date=date(2026, 4, 1),
due_date=date(2026, 4, 30),
aging_days=11,
status="partial",
risk_flags_json=[],
),
AccountsReceivableRecord(
receivable_no="AR-202605-002",
customer_id="CUS-B",
customer_name="客户B",
contract_no="CTR-AR-1002",
invoice_no="INV-AR-9002",
amount_receivable=Decimal("88000.00"),
amount_received=Decimal("10000.00"),
amount_outstanding=Decimal("78000.00"),
currency="CNY",
posting_date=date(2026, 3, 15),
due_date=date(2026, 4, 15),
aging_days=26,
status="overdue",
risk_flags_json=["ar_overdue"],
),
]
ap_records = [
AccountsPayableRecord(
payable_no="AP-202605-001",
vendor_id="VEN-A",
vendor_name="供应商A",
invoice_no="INV-AP-5001",
amount_payable=Decimal("43000.00"),
amount_paid=Decimal("10000.00"),
amount_outstanding=Decimal("33000.00"),
currency="CNY",
posting_date=date(2026, 4, 20),
due_date=date(2026, 5, 12),
aging_days=0,
status="scheduled",
risk_flags_json=[],
),
AccountsPayableRecord(
payable_no="AP-202605-002",
vendor_id="VEN-B",
vendor_name="供应商B",
invoice_no="INV-AP-5002",
amount_payable=Decimal("96000.00"),
amount_paid=Decimal("0.00"),
amount_outstanding=Decimal("96000.00"),
currency="CNY",
posting_date=date(2026, 4, 10),
due_date=date(2026, 5, 5),
aging_days=6,
status="overdue",
risk_flags_json=["ap_overdue"],
),
]
self.db.add_all([claim_1, claim_2, claim_3, *ar_records, *ap_records])
def _purge_demo_financial_records(self) -> None:
demo_claims = list(self.db.scalars(select(ExpenseClaim)).all())
for claim in demo_claims:
signature = (
str(claim.claim_no or "").strip(),
str(claim.employee_name or "").strip(),
str(claim.reason or "").strip(),
f"{Decimal(claim.amount or 0):.2f}",
str(claim.status or "").strip(),
)
if signature in DEMO_EXPENSE_CLAIM_SIGNATURES:
self.db.delete(claim)
demo_receivables = list(self.db.scalars(select(AccountsReceivableRecord)).all())
for record in demo_receivables:
signature = (
str(record.receivable_no or "").strip(),
str(record.customer_name or "").strip(),
f"{Decimal(record.amount_outstanding or 0):.2f}",
str(record.status or "").strip(),
)
if signature in DEMO_RECEIVABLE_SIGNATURES:
self.db.delete(record)
demo_payables = list(self.db.scalars(select(AccountsPayableRecord)).all())
for record in demo_payables:
signature = (
str(record.payable_no or "").strip(),
str(record.vendor_name or "").strip(),
f"{Decimal(record.amount_outstanding or 0):.2f}",
str(record.status or "").strip(),
)
if signature in DEMO_PAYABLE_SIGNATURES:
self.db.delete(record)
def _seed_runs_and_logs(self) -> None:
if self.db.scalar(select(AgentRun.id).limit(1)) is not None:
return
task_asset = self.db.scalar(
select(AgentAsset).where(AgentAsset.code == "task.hermes.daily_risk_scan")
)
user_run = AgentRun(
run_id="run_user_20260511_001",
agent=AgentName.USER_AGENT.value,
source=AgentRunSource.USER_MESSAGE.value,
user_id="emp_001",
task_id=None,
ontology_json={"scenario": "expense", "intent": "query"},
route_json={"selected_agent": AgentName.USER_AGENT.value, "route_reason": "user query"},
permission_level=AgentPermissionLevel.READ.value,
status=AgentRunStatus.SUCCEEDED.value,
result_summary="已返回本周报销金额和风险摘要。",
started_at=datetime(2026, 5, 11, 8, 35, tzinfo=UTC),
finished_at=datetime(2026, 5, 11, 8, 35, 2, tzinfo=UTC),
)
hermes_run = AgentRun(
run_id="run_hermes_20260511_001",
agent=AgentName.HERMES.value,
source=AgentRunSource.SCHEDULE.value,
user_id=None,
task_id=task_asset.id if task_asset else None,
ontology_json={"scenario": "expense", "intent": "risk_check"},
route_json={
"selected_agent": AgentName.HERMES.value,
"route_reason": "scheduled risk scan",
},
permission_level=AgentPermissionLevel.READ.value,
status=AgentRunStatus.SUCCEEDED.value,
result_summary="Hermes 已生成今日风险巡检摘要。",
started_at=datetime(2026, 5, 11, 9, 0, tzinfo=UTC),
finished_at=datetime(2026, 5, 11, 9, 0, 4, tzinfo=UTC),
)
blocked_run = AgentRun(
run_id="run_user_20260511_002",
agent=AgentName.ORCHESTRATOR.value,
source=AgentRunSource.USER_MESSAGE.value,
user_id="emp_002",
task_id=None,
ontology_json={"scenario": "accounts_payable", "intent": "operate"},
route_json={
"selected_agent": AgentName.USER_AGENT.value,
"route_reason": "payment request",
},
permission_level=AgentPermissionLevel.APPROVAL_REQUIRED.value,
status=AgentRunStatus.BLOCKED.value,
result_summary="动作需要人工确认。",
error_message="直接付款属于高风险动作,已阻断自动执行。",
started_at=datetime(2026, 5, 11, 10, 5, tzinfo=UTC),
finished_at=datetime(2026, 5, 11, 10, 5, 1, tzinfo=UTC),
)
self.db.add_all([user_run, hermes_run, blocked_run])
self.db.flush()
self.db.add_all(
[
AgentToolCall(
run_id=user_run.run_id,
tool_type=AgentToolType.DATABASE.value,
tool_name="expense_claims.lookup",
request_json={"time_range": "this_week", "employee": "all"},
response_json={"claim_count": 3, "total_amount": "7420.00"},
status="succeeded",
duration_ms=48,
),
AgentToolCall(
run_id=hermes_run.run_id,
tool_type=AgentToolType.MCP.value,
tool_name="invoice.verify_mock",
request_json={"claim_no": "EXP-202605-003"},
response_json={
"warning": "external service degraded",
"fallback": "used mock response",
},
status="failed",
duration_ms=132,
error_message="mock upstream timeout",
),
AgentToolCall(
run_id=blocked_run.run_id,
tool_type=AgentToolType.RULE_ENGINE.value,
tool_name="permission.guard",
request_json={"action": "direct_payment"},
response_json={"requires_confirmation": True},
status="succeeded",
duration_ms=5,
),
SemanticParseLog(
run_id=user_run.run_id,
user_id="emp_001",
raw_query="查一下本周报销超标风险",
scenario="expense",
intent="risk_check",
entities_json=[],
time_range_json={"start_date": "2026-05-11", "end_date": "2026-05-17"},
metrics_json=["amount"],
constraints_json=[],
risk_flags_json=["amount_over_limit"],
permission_json={"level": AgentPermissionLevel.READ.value},
confidence=0.93,
),
SemanticParseLog(
run_id=blocked_run.run_id,
user_id="emp_002",
raw_query="帮我直接付款给供应商B",
scenario="accounts_payable",
intent="operate",
entities_json=[{"type": "vendor", "value": "供应商B"}],
time_range_json={},
metrics_json=["amount"],
constraints_json=[],
risk_flags_json=["ap_overdue"],
permission_json={"level": AgentPermissionLevel.APPROVAL_REQUIRED.value},
confidence=0.96,
),
]
)
if self.db.scalar(select(AuditLog.id).limit(1)) is None:
self.db.add_all(
[
AuditLog(
actor="系统初始化",
action="save_rule_markdown",
resource_type="rule",
resource_id=ATTACHMENT_RULE_ASSET_CODE,
before_json=None,
after_json={"version": "v1.0.0"},
request_id="seed-audit-001",
),
AuditLog(
actor="高嘉禾",
action="review_rule",
resource_type="rule",
resource_id=ATTACHMENT_RULE_ASSET_CODE,
before_json={"review_status": "pending"},
after_json={"review_status": "pending"},
request_id="seed-audit-002",
),
AuditLog(
actor="系统初始化",
action="activate_rule",
resource_type="rule",
resource_id="rule.expense.scene_submission_standard",
before_json={"status": "draft"},
after_json={"status": "active"},
request_id="seed-audit-003",
),
AuditLog(
actor="Hermes",
action="update_task_status",
resource_type="task",
resource_id="task.hermes.daily_risk_scan",
before_json={"status": "idle"},
after_json={"status": "succeeded"},
request_id="seed-audit-004",
),
]
)
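
`_purge_demo_financial_records` above deletes a row only when its normalized signature tuple matches one of the seeded demo signatures exactly. The sketch below reproduces that matching step outside the ORM. `ClaimRow` is a hypothetical stand-in for `ExpenseClaim`, and `DEMO_SIGNATURES` holds a single entry copied from `DEMO_EXPENSE_CLAIM_SIGNATURES`.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# One entry copied from DEMO_EXPENSE_CLAIM_SIGNATURES, for illustration.
DEMO_SIGNATURES = {
    ("EXP-202605-001", "张三", "华南客户拜访差旅报销", "3280.00", "submitted"),
}


@dataclass
class ClaimRow:
    """Hypothetical stand-in for the ExpenseClaim ORM model."""

    claim_no: Optional[str]
    employee_name: Optional[str]
    reason: Optional[str]
    amount: Optional[Decimal]
    status: Optional[str]


def claim_signature(claim: ClaimRow) -> tuple[str, str, str, str, str]:
    # Same normalization as the mixin: strip strings, format amount to 2 places.
    return (
        str(claim.claim_no or "").strip(),
        str(claim.employee_name or "").strip(),
        str(claim.reason or "").strip(),
        f"{Decimal(claim.amount or 0):.2f}",
        str(claim.status or "").strip(),
    )


def is_demo_claim(claim: ClaimRow) -> bool:
    return claim_signature(claim) in DEMO_SIGNATURES
```

Because the status and amount are part of the signature, a demo claim that a user has since edited or approved no longer matches and would be left in place.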


@@ -0,0 +1,202 @@
from __future__ import annotations
import hashlib
import json
from datetime import UTC, date, datetime
from decimal import Decimal
from pathlib import Path
from sqlalchemy import inspect, select, text
from app.core.agent_enums import (
AgentAssetContentType,
AgentAssetDomain,
AgentAssetStatus,
AgentAssetType,
AgentName,
AgentPermissionLevel,
AgentReviewStatus,
AgentRunSource,
AgentRunStatus,
AgentToolType,
)
from app.models.agent_asset import AgentAsset, AgentAssetReview, AgentAssetVersion
from app.models.agent_run import AgentRun, AgentToolCall, SemanticParseLog
from app.models.audit_log import AuditLog
from app.models.financial_record import (
AccountsPayableRecord,
AccountsReceivableRecord,
ExpenseClaim,
ExpenseClaimItem,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import (
AgentAssetSpreadsheetManager,
COMPANY_COMMUNICATION_EXPENSE_RULE_CODE,
COMPANY_COMMUNICATION_EXPENSE_RULE_FILENAME,
COMPANY_TRAVEL_EXPENSE_RULE_CODE,
COMPANY_TRAVEL_EXPENSE_RULE_FILENAME,
FINANCE_RULES_LIBRARY,
RISK_RULES_LIBRARY,
)
from app.services.expense_rule_runtime import (
build_scene_submission_standard_markdown,
build_travel_risk_control_standard_markdown,
)
from app.services.agent_foundation_constants import (
ATTACHMENT_RULE_ASSET_CODE,
ATTACHMENT_RULE_RUNTIME_CONFIG,
COMPANY_COMMUNICATION_RULE_SCENARIO_JSON,
COMPANY_COMMUNICATION_RULE_VERSION,
COMPANY_TRAVEL_RULE_SCENARIO_JSON,
COMPANY_TRAVEL_RULE_VERSION,
DEMO_EXPENSE_CLAIM_SIGNATURES,
DEMO_PAYABLE_SIGNATURES,
DEMO_RECEIVABLE_SIGNATURES,
LEGACY_RULE_CODES,
PLATFORM_DESTINATION_LOCATION_RULE_FILENAME,
)
from app.core.logging import get_logger
logger = get_logger("app.services.agent_foundation")
class AgentFoundationMarkdownMixin:
def _attachment_submission_requirement_markdown(
self,
*,
version_note: str,
include_review_note: bool,
) -> str:
sections = [
"# 报销附件与单据完整性规则",
"",
"## 模板信息",
"",
"- 模板键:`attachment_requirement_v1`",
"- 来源文档:报销制度 / 单据与附件要求",
"- 审核状态:待审核",
"",
"## 目标",
"",
"统一约束报销提交时的票据、附件与替代凭证要求,避免缺件、错件和无依据流转。",
"",
"## 适用范围",
"",
"适用于员工报销提交场景,重点覆盖差旅、住宿、交通、餐费、办公和其他费用的附件校验。",
"",
"## 输入字段",
"",
"- expense_type",
"- attachments",
"- invoice_count",
"- reason",
"",
"## 判断规则",
"",
"- 报销提交前至少需要 1 份有效附件。",
"- 金额类报销原则上应提供合法票据;特殊场景无发票时,必须补充收据与情况说明。",
"- 差旅交通报销需提供行程单或等效凭证;住宿报销需提供酒店票据或等效住宿凭证。",
"- 缺少必要附件时直接拦截,并提示补件后重新提交。",
"",
"## 输出",
"",
"- 风险编码:`invoice_anomaly`",
"- 默认动作:`block`",
"- 处理说明:附件或单据不完整时退回补充。",
"",
"## 来源依据",
"",
"- 报销制度对票据、附件、替代凭证和补件要求的统一约束。",
"",
"## 审核约束",
"",
"- 当前规则属于真实业务规则,但仍处于待审核状态。",
"- 上线前需由制度管理员确认收据替代、补件时限和特殊场景豁免口径。",
f"- 当前版本说明:{version_note}",
"",
"## 管理员备注",
"",
"需要结合公司正式报销制度,补充各场景附件替代口径与例外审批要求。",
]
if include_review_note:
sections.extend(["", "```expense-rule", json.dumps(ATTACHMENT_RULE_RUNTIME_CONFIG, ensure_ascii=False, indent=2), "```"])
return "\n".join(sections)
def _scene_submission_standard_markdown(self) -> str:
return self._markdown_content(build_scene_submission_standard_markdown())
def _travel_risk_control_standard_markdown(self, *, version: str = "v1.1.0") -> str:
return self._markdown_content(build_travel_risk_control_standard_markdown())
@staticmethod
def _markdown_content(content: str) -> str:
return content
@staticmethod
def _json_content(content: dict[str, object]) -> str:
return json.dumps(content, ensure_ascii=False, sort_keys=True, indent=2)
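
The `_json_content` helper above relies on three `json.dumps` options: `ensure_ascii=False`, `sort_keys=True` and `indent=2`. A minimal standalone sketch of what those options do to a seed payload follows; `json_content` here is a module-level copy made for illustration, and the sample payload is abbreviated from the knowledge-sync task.

```python
import json


def json_content(content: dict[str, object]) -> str:
    # Standalone copy of the static method, for illustration only.
    return json.dumps(content, ensure_ascii=False, sort_keys=True, indent=2)


snapshot = json_content({"task_type": "knowledge_index_sync", "folder": "报销制度"})
```

With `ensure_ascii=False` the Chinese folder name is stored as readable text rather than `\u` escapes, and `sort_keys=True` keeps the serialized snapshot stable regardless of dictionary insertion order.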


@@ -0,0 +1,474 @@
from __future__ import annotations
import hashlib
import json
from datetime import UTC, date, datetime
from decimal import Decimal
from pathlib import Path
from sqlalchemy import inspect, select, text
from app.core.agent_enums import (
AgentAssetContentType,
AgentAssetDomain,
AgentAssetStatus,
AgentAssetType,
AgentName,
AgentPermissionLevel,
AgentReviewStatus,
AgentRunSource,
AgentRunStatus,
AgentToolType,
)
from app.models.agent_asset import AgentAsset, AgentAssetReview, AgentAssetVersion
from app.models.agent_run import AgentRun, AgentToolCall, SemanticParseLog
from app.models.audit_log import AuditLog
from app.models.financial_record import (
AccountsPayableRecord,
AccountsReceivableRecord,
ExpenseClaim,
ExpenseClaimItem,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import (
AgentAssetSpreadsheetManager,
COMPANY_COMMUNICATION_EXPENSE_RULE_CODE,
COMPANY_COMMUNICATION_EXPENSE_RULE_FILENAME,
COMPANY_TRAVEL_EXPENSE_RULE_CODE,
COMPANY_TRAVEL_EXPENSE_RULE_FILENAME,
FINANCE_RULES_LIBRARY,
RISK_RULES_LIBRARY,
)
from app.services.expense_rule_runtime import (
build_scene_submission_standard_markdown,
build_travel_risk_control_standard_markdown,
)
from app.services.agent_foundation_constants import (
ATTACHMENT_RULE_ASSET_CODE,
ATTACHMENT_RULE_RUNTIME_CONFIG,
COMPANY_COMMUNICATION_RULE_SCENARIO_JSON,
COMPANY_COMMUNICATION_RULE_VERSION,
COMPANY_TRAVEL_RULE_SCENARIO_JSON,
COMPANY_TRAVEL_RULE_VERSION,
DEMO_EXPENSE_CLAIM_SIGNATURES,
DEMO_PAYABLE_SIGNATURES,
DEMO_RECEIVABLE_SIGNATURES,
LEGACY_RULE_CODES,
PLATFORM_DESTINATION_LOCATION_RULE_FILENAME,
)
from app.core.logging import get_logger
logger = get_logger("app.services.agent_foundation")
class AgentFoundationRiskRuleMixin:
def _iter_platform_risk_manifests(self) -> list[tuple[str, dict[str, object]]]:
manager = AgentAssetRuleLibraryManager()
manifests: list[tuple[str, dict[str, object]]] = []
for file_name in sorted(manager.list_rule_library_json_files(library=RISK_RULES_LIBRARY)):
payload = manager.read_rule_library_json(library=RISK_RULES_LIBRARY, file_name=file_name)
if payload.get("enabled") is False:
continue
manifests.append((file_name, payload))
return manifests
@staticmethod
def _resolve_platform_risk_category(manifest: dict[str, object]) -> str:
explicit = str(manifest.get("risk_category") or "").strip()
if explicit:
return explicit
rule_code = str(manifest.get("rule_code") or "").strip().lower()
applies_to = manifest.get("applies_to") if isinstance(manifest.get("applies_to"), dict) else {}
domains = {str(item or "").strip().lower() for item in applies_to.get("domains") or []}
expense_types = {
str(item or "").strip().lower() for item in applies_to.get("expense_types") or []
}
if rule_code.startswith("risk.invoice."):
return "发票"
if "meal" in domains or "entertainment" in expense_types:
return "餐饮招待"
if "transport" in expense_types or "consecutive_transport" in rule_code:
return "交通出行"
if "office" in expense_types:
return "办公物料"
if "travel" in domains or rule_code.startswith("risk.travel."):
return "差旅"
if rule_code.startswith("risk.expense."):
return "费用科目"
return "通用"
def _platform_risk_scenario_json(self, manifest: dict[str, object]) -> list[str]:
category = self._resolve_platform_risk_category(manifest)
return [category] if category else ["通用"]
def _platform_risk_config_json(self, file_name: str, manifest: dict[str, object]) -> dict[str, object]:
outcomes = manifest.get("outcomes") if isinstance(manifest.get("outcomes"), dict) else {}
fail_outcome = outcomes.get("fail") if isinstance(outcomes.get("fail"), dict) else {}
risk_category = self._resolve_platform_risk_category(manifest)
return {
"severity": str(fail_outcome.get("severity") or "medium"),
"enabled": True,
"tag": "风险规则",
"detail_mode": "json_risk",
"risk_category": risk_category,
"rule_library": RISK_RULES_LIBRARY,
"rule_document": {
"file_name": file_name,
"storage_key": f"rules/{RISK_RULES_LIBRARY}/{file_name}",
},
"ontology_signal": str(manifest.get("ontology_signal") or "").strip(),
"evaluator": str(manifest.get("evaluator") or "").strip(),
"source_ref": (
(manifest.get("metadata") or {}).get("source_ref")
if isinstance(manifest.get("metadata"), dict)
else ""
),
}
def _build_platform_risk_seed_assets(self) -> list[AgentAsset]:
assets: list[AgentAsset] = []
for file_name, manifest in self._iter_platform_risk_manifests():
rule_code = str(manifest.get("rule_code") or "").strip()
if not rule_code:
continue
metadata = manifest.get("metadata") if isinstance(manifest.get("metadata"), dict) else {}
source_ref = str(metadata.get("source_ref") or "").strip()
rule_description = str(manifest.get("description") or "").strip()
assets.append(
AgentAsset(
asset_type=AgentAssetType.RULE.value,
code=rule_code,
name=str(manifest.get("name") or rule_code),
description=rule_description
or f"平台通用风险规则:{source_ref or manifest.get('name') or rule_code}",
domain=AgentAssetDomain.EXPENSE.value,
scenario_json=self._platform_risk_scenario_json(manifest),
owner=str(metadata.get("owner") or "风控与审计部"),
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
published_version="v1.0.0",
working_version="v1.0.0",
config_json=self._platform_risk_config_json(file_name, manifest),
)
)
return assets
def sync_platform_risk_rules_from_library(self) -> int:
existing_codes = set(self.db.scalars(select(AgentAsset.code)).all())
before_count = len(existing_codes)
self._ensure_platform_risk_rules_from_library(existing_codes)
self.db.flush()
after_codes = set(self.db.scalars(select(AgentAsset.code)).all())
synced = max(len(after_codes) - before_count, 0)
manifest_count = len(self._iter_platform_risk_manifests())
logger.info(
"Platform risk rules synced from library",
extra={"manifest_count": manifest_count, "created_count": synced, "total": len(after_codes)},
)
return manifest_count
def _ensure_platform_risk_rules_from_library(self, existing_codes: set[str]) -> None:
for file_name, manifest in self._iter_platform_risk_manifests():
rule_code = str(manifest.get("rule_code") or "").strip()
if not rule_code:
continue
metadata = manifest.get("metadata") if isinstance(manifest.get("metadata"), dict) else {}
source_ref = str(metadata.get("source_ref") or "").strip()
rule_description = str(manifest.get("description") or "").strip()
config_json = self._platform_risk_config_json(file_name, manifest)
scenario_json = self._platform_risk_scenario_json(manifest)
asset = self.db.scalar(select(AgentAsset).where(AgentAsset.code == rule_code))
if asset is None and rule_code not in existing_codes:
asset = self._create_seed_asset(
asset_type=AgentAssetType.RULE.value,
code=rule_code,
name=str(manifest.get("name") or rule_code),
description=rule_description
or f"平台通用风险规则:{source_ref or manifest.get('name') or rule_code}",
domain=AgentAssetDomain.EXPENSE.value,
scenario_json=scenario_json,
owner=str(metadata.get("owner") or "风控与审计部"),
reviewer="顾承宇",
status=AgentAssetStatus.ACTIVE.value,
current_version="v1.0.0",
config_json=config_json,
)
if asset is None:
continue
if not str(asset.current_version or "").strip():
asset.current_version = "v1.0.0"
if not str(asset.working_version or "").strip():
asset.working_version = asset.current_version
if not str(asset.published_version or "").strip():
asset.published_version = asset.current_version
asset.status = asset.status or AgentAssetStatus.ACTIVE.value
asset.name = str(manifest.get("name") or asset.name or rule_code)
if rule_description:
asset.description = rule_description
asset.config_json = config_json
asset.scenario_json = scenario_json
self._ensure_asset_version(
asset,
version="v1.0.0",
content=self._platform_risk_rule_markdown(asset, manifest=manifest, file_name=file_name),
content_type=AgentAssetContentType.MARKDOWN.value,
change_note=f"平台通用风险规则:{asset.name}",
created_by="系统初始化",
)
self._ensure_asset_review(
asset,
version="v1.0.0",
reviewer="顾承宇",
review_status=AgentReviewStatus.APPROVED.value,
review_note="平台内置风险规则,供提交验审与风险问答共用。",
reviewed_at=datetime.now(UTC),
)
@staticmethod
def _platform_risk_rule_markdown(
asset: AgentAsset,
*,
manifest: dict[str, object] | None = None,
file_name: str = "",
) -> str:
config = asset.config_json if isinstance(asset.config_json, dict) else {}
rule_document = config.get("rule_document") if isinstance(config.get("rule_document"), dict) else {}
resolved_file_name = file_name or str(rule_document.get("file_name") or "").strip()
evaluator = str(config.get("evaluator") or (manifest or {}).get("evaluator") or "").strip()
ontology_signal = str(config.get("ontology_signal") or (manifest or {}).get("ontology_signal") or "").strip()
source_ref = str(config.get("source_ref") or "").strip()
if not source_ref and isinstance(manifest, dict):
metadata = manifest.get("metadata") if isinstance(manifest.get("metadata"), dict) else {}
source_ref = str(metadata.get("source_ref") or "").strip()
lines = [
f"# {asset.name}",
"",
"## 规则类型",
"",
"- 平台内置通用风险规则(`json_risk`",
]
if evaluator:
lines.append(f"- 检查器:`{evaluator}`")
if ontology_signal:
lines.append(f"- 本体信号:`{ontology_signal}`")
if source_ref:
lines.extend(["", "## 来源", "", f"- {source_ref}"])
if resolved_file_name:
lines.extend(
[
"",
"## 配置文件",
"",
f"- `rules/{RISK_RULES_LIBRARY}/{resolved_file_name}`",
]
)
return "\n".join(lines)
@staticmethod
def _platform_destination_location_risk_markdown() -> str:
return AgentFoundationRiskRuleMixin._platform_risk_rule_markdown(
AgentAsset(name="申报地点与票据地点一致", config_json={"evaluator": "location_consistency"}),
manifest={
"evaluator": "location_consistency",
"ontology_signal": "location_mismatch",
"metadata": {"source_ref": "常用risk.txt / 一、出差类 / 行程不符"},
},
file_name=PLATFORM_DESTINATION_LOCATION_RULE_FILENAME,
)


@@ -0,0 +1,400 @@
from __future__ import annotations
import hashlib
import json
from datetime import UTC, date, datetime
from decimal import Decimal
from pathlib import Path
from sqlalchemy import inspect, select, text
from app.core.agent_enums import (
AgentAssetContentType,
AgentAssetDomain,
AgentAssetStatus,
AgentAssetType,
AgentName,
AgentPermissionLevel,
AgentReviewStatus,
AgentRunSource,
AgentRunStatus,
AgentToolType,
)
from app.models.agent_asset import AgentAsset, AgentAssetReview, AgentAssetVersion
from app.models.agent_run import AgentRun, AgentToolCall, SemanticParseLog
from app.models.audit_log import AuditLog
from app.models.financial_record import (
AccountsPayableRecord,
AccountsReceivableRecord,
ExpenseClaim,
ExpenseClaimItem,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import (
AgentAssetSpreadsheetManager,
COMPANY_COMMUNICATION_EXPENSE_RULE_CODE,
COMPANY_COMMUNICATION_EXPENSE_RULE_FILENAME,
COMPANY_TRAVEL_EXPENSE_RULE_CODE,
COMPANY_TRAVEL_EXPENSE_RULE_FILENAME,
FINANCE_RULES_LIBRARY,
RISK_RULES_LIBRARY,
)
from app.services.expense_rule_runtime import (
build_scene_submission_standard_markdown,
build_travel_risk_control_standard_markdown,
)
from app.services.agent_foundation_constants import (
ATTACHMENT_RULE_ASSET_CODE,
ATTACHMENT_RULE_RUNTIME_CONFIG,
COMPANY_COMMUNICATION_RULE_SCENARIO_JSON,
COMPANY_COMMUNICATION_RULE_VERSION,
COMPANY_TRAVEL_RULE_SCENARIO_JSON,
COMPANY_TRAVEL_RULE_VERSION,
DEMO_EXPENSE_CLAIM_SIGNATURES,
DEMO_PAYABLE_SIGNATURES,
DEMO_RECEIVABLE_SIGNATURES,
LEGACY_RULE_CODES,
PLATFORM_DESTINATION_LOCATION_RULE_FILENAME,
)
from app.core.logging import get_logger
logger = get_logger("app.services.agent_foundation")
class AgentFoundationSpreadsheetMixin:
def _ensure_company_travel_rule_spreadsheet_seed(
self,
asset: AgentAsset,
*,
version: str,
actor_name: str,
):
manager = AgentAssetSpreadsheetManager()
manager.ensure_rule_library_dirs()
live_document = manager.store_rule_library_spreadsheet(
library=FINANCE_RULES_LIBRARY,
file_name=COMPANY_TRAVEL_EXPENSE_RULE_FILENAME,
content=self._read_or_build_company_travel_rule_file(manager),
actor_name=actor_name,
source="rule-library",
)
existing_document = (
asset.config_json.get("rule_document")
if isinstance(asset.config_json, dict)
else None
)
storage_key = (
str(existing_document.get("storage_key") or "").strip()
if isinstance(existing_document, dict)
else ""
)
if storage_key:
try:
existing_path = manager.resolve_storage_path(storage_key)
except FileNotFoundError:
existing_path = None
if existing_path is not None and existing_path.exists():
asset.config_json = {
**(asset.config_json or {}),
"detail_mode": "spreadsheet",
"tag": "财务规则",
"rule_library": FINANCE_RULES_LIBRARY,
"rule_document": {
**AgentAssetSpreadsheetManager.build_rule_document_config(
live_document,
asset_version=version,
),
"storage_key": live_document.storage_key,
},
}
return live_document
asset.config_json = {
**(asset.config_json or {}),
"detail_mode": "spreadsheet",
"tag": "财务规则",
"rule_library": FINANCE_RULES_LIBRARY,
"rule_document": {
**AgentAssetSpreadsheetManager.build_rule_document_config(
live_document,
asset_version=version,
),
"storage_key": live_document.storage_key,
},
}
return live_document
def _ensure_company_communication_rule_spreadsheet_seed(
self,
asset: AgentAsset,
*,
version: str,
actor_name: str,
):
return self._ensure_finance_rule_spreadsheet_seed(
asset,
version=version,
actor_name=actor_name,
file_name=COMPANY_COMMUNICATION_EXPENSE_RULE_FILENAME,
fallback_sheet_name="通信费报销规则",
)
@staticmethod
def _read_or_build_company_travel_rule_file(
manager: AgentAssetSpreadsheetManager,
) -> bytes:
live_key = (
Path("rules")
/ FINANCE_RULES_LIBRARY
/ COMPANY_TRAVEL_EXPENSE_RULE_FILENAME
).as_posix()
live_path = manager.resolve_storage_path(live_key)
if live_path.exists():
return live_path.read_bytes()
return AgentAssetSpreadsheetManager.build_blank_rule_workbook("差旅费报销规则")
def _ensure_finance_rule_spreadsheet_seed(
self,
asset: AgentAsset,
*,
version: str,
actor_name: str,
file_name: str,
fallback_sheet_name: str,
):
manager = AgentAssetSpreadsheetManager()
manager.ensure_rule_library_dirs()
live_document = manager.store_rule_library_spreadsheet(
library=FINANCE_RULES_LIBRARY,
file_name=file_name,
content=self._read_or_build_finance_rule_file(
manager,
file_name=file_name,
fallback_sheet_name=fallback_sheet_name,
),
actor_name=actor_name,
source="rule-library",
)
existing_document = (
asset.config_json.get("rule_document")
if isinstance(asset.config_json, dict)
else None
)
storage_key = (
str(existing_document.get("storage_key") or "").strip()
if isinstance(existing_document, dict)
else ""
)
if storage_key:
try:
existing_path = manager.resolve_storage_path(storage_key)
except FileNotFoundError:
existing_path = None
if existing_path is not None and existing_path.exists():
asset.config_json = {
**(asset.config_json or {}),
"detail_mode": "spreadsheet",
"tag": "财务规则",
"rule_library": FINANCE_RULES_LIBRARY,
"rule_document": {
**AgentAssetSpreadsheetManager.build_rule_document_config(
live_document,
asset_version=version,
),
"storage_key": live_document.storage_key,
},
}
return live_document
asset.config_json = {
**(asset.config_json or {}),
"detail_mode": "spreadsheet",
"tag": "财务规则",
"rule_library": FINANCE_RULES_LIBRARY,
"rule_document": {
**AgentAssetSpreadsheetManager.build_rule_document_config(
live_document,
asset_version=version,
),
"storage_key": live_document.storage_key,
},
}
return live_document
@staticmethod
def _read_or_build_finance_rule_file(
manager: AgentAssetSpreadsheetManager,
*,
file_name: str,
fallback_sheet_name: str,
) -> bytes:
live_key = (
Path("rules")
/ FINANCE_RULES_LIBRARY
/ file_name
).as_posix()
live_path = manager.resolve_storage_path(live_key)
if live_path.exists():
return live_path.read_bytes()
return AgentAssetSpreadsheetManager.build_blank_rule_workbook(fallback_sheet_name)
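Review note: in both seed helpers above, the branch taken when the stored document still resolves on disk assigns exactly the same `config_json` as the fall-through path, so the check does not change the result. A minimal sketch of building that payload in one place, assuming plain dicts stand in for `asset.config_json` and for the output of `build_rule_document_config` (the helper name `merge_rule_document_config` and its parameters are hypothetical and do not appear in the diff):

```python
from typing import Any, Optional


# Hypothetical helper; not part of the diff.
def merge_rule_document_config(
    current: Optional[dict[str, Any]],
    document_config: dict[str, Any],
    *,
    storage_key: str,
    rule_library: str,
    tag: str = "财务规则",
) -> dict[str, Any]:
    """Return a new config dict with the spreadsheet rule metadata merged in."""
    return {
        **(current or {}),  # keep unrelated keys already stored on the asset
        "detail_mode": "spreadsheet",
        "tag": tag,
        "rule_library": rule_library,
        # storage_key always points at the freshly stored live document
        "rule_document": {**document_config, "storage_key": storage_key},
    }
```

Each seed method could then call such a helper once after `store_rule_library_spreadsheet` and return `live_document`.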


@@ -1,14 +1,17 @@
from __future__ import annotations
from dataclasses import dataclass
from datetime import UTC, datetime, timedelta
from typing import Any
from sqlalchemy import func, select
from sqlalchemy import func, or_, select
from sqlalchemy.orm import Session, selectinload
from app.core.config import get_settings
from app.core.logging import get_logger
from app.core.security import verify_password
from app.models.employee import Employee
from app.models.financial_record import ExpenseClaim
from app.schemas.auth import AuthUserRead, LoginRequest, LoginResponse
from app.services.employee import EmployeeService
from app.services.employee_seed import ROLE_DISPLAY_ORDER
@@ -31,8 +34,15 @@ class AuthenticatedUser:
username: str
name: str
role: str
department: str
position: str
grade: str
employee_no: str
manager_name: str
location: str
cost_center: str
finance_owner_name: str
risk_profile: dict[str, Any]
role_codes: list[str]
email: str
avatar: str
@@ -78,8 +88,15 @@ class AuthService:
username=admin_username or admin_email,
name=display_name,
role="管理员",
department="",
position="系统管理员",
grade="",
employee_no="",
manager_name="",
location="",
cost_center="",
finance_owner_name="",
risk_profile={},
role_codes=["manager"],
email=admin_email or f"{admin_username}@local",
avatar=display_name[:1].upper(),
@@ -94,7 +111,11 @@ class AuthService:
stmt = (
select(Employee)
.options(selectinload(Employee.roles))
.options(
selectinload(Employee.organization_unit),
selectinload(Employee.manager),
selectinload(Employee.roles),
)
.where(func.lower(Employee.email) == identifier.lower())
)
employee = self.db.execute(stmt).scalars().first()
@@ -115,27 +136,91 @@ class AuthService:
)
role_codes = [role.role_code for role in sorted_roles]
primary_role_code = role_codes[0] if role_codes else "user"
department = employee.organization_unit.name if employee.organization_unit is not None else ""
manager_name = self._resolve_manager_name(employee)
return AuthenticatedUser(
username=employee.email,
name=employee.name,
role=ROLE_LABELS.get(primary_role_code, "使用者"),
department=department,
position=employee.position,
grade=employee.grade,
employee_no=employee.employee_no,
manager_name=manager_name,
location=employee.location or "",
cost_center=employee.cost_center or "",
finance_owner_name=employee.finance_owner_name or "",
risk_profile=self._build_risk_profile(employee),
role_codes=role_codes or ["user"],
email=employee.email,
avatar=(employee.name or "?")[:1].upper(),
is_admin=False,
)
@staticmethod
def _resolve_manager_name(employee: Employee) -> str:
if employee.manager is not None and employee.manager.name:
return str(employee.manager.name).strip()
if employee.organization_unit is not None and employee.organization_unit.manager_name:
return str(employee.organization_unit.manager_name).strip()
return ""
def _build_risk_profile(self, employee: Employee) -> dict[str, Any]:
since = datetime.now(UTC) - timedelta(days=90)
identity_values = [
str(employee.name or "").strip(),
str(employee.email or "").strip(),
str(employee.employee_no or "").strip(),
]
name_candidates = [item for item in dict.fromkeys(identity_values) if item]
conditions = [ExpenseClaim.employee_id == employee.id]
if name_candidates:
conditions.append(ExpenseClaim.employee_name.in_(name_candidates))
stmt = (
select(ExpenseClaim)
.where(or_(*conditions), ExpenseClaim.occurred_at >= since)
.order_by(ExpenseClaim.occurred_at.desc())
.limit(30)
)
claims = list(self.db.scalars(stmt).all())
recent_risk_flags: list[str] = []
for claim in claims:
for flag in claim.risk_flags_json or []:
normalized = str(flag or "").strip()
if normalized and normalized not in recent_risk_flags:
recent_risk_flags.append(normalized)
if len(recent_risk_flags) >= 6:
break
if len(recent_risk_flags) >= 6:
break
return {
"windowDays": 90,
"totalClaimCount": len(claims),
"riskyClaimCount": sum(1 for claim in claims if claim.risk_flags_json),
"draftClaimCount": sum(1 for claim in claims if claim.status == "draft"),
"recentRiskFlags": recent_risk_flags,
"lastClaimAt": claims[0].occurred_at.isoformat() if claims and claims[0].occurred_at else "",
}
@staticmethod
def _serialize_user(user: AuthenticatedUser) -> AuthUserRead:
return AuthUserRead(
username=user.username,
name=user.name,
role=user.role,
department=user.department,
departmentName=user.department,
position=user.position,
grade=user.grade,
employeeNo=user.employee_no,
managerName=user.manager_name,
location=user.location,
costCenter=user.cost_center,
financeOwnerName=user.finance_owner_name,
riskProfile=user.risk_profile,
roleCodes=user.role_codes,
email=user.email,
avatar=user.avatar,
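Review note: the nested loop in `_build_risk_profile` collects unique risk flags in first-seen order and stops at six. A small standalone sketch of that behaviour, assuming each claim's flags arrive as a list or `None` (the function name `collect_recent_risk_flags` and the `limit` parameter are illustrative, not from the diff):

```python
from typing import Iterable, Optional, Sequence


# Hypothetical helper; mirrors the dedup-and-cap loop on plain Python data.
def collect_recent_risk_flags(
    flags_per_claim: Iterable[Optional[Sequence[object]]],
    limit: int = 6,
) -> list[str]:
    """Collect unique, non-empty flags in first-seen order, capped at `limit`."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for flags in flags_per_claim:
        for flag in flags or []:
            normalized = str(flag or "").strip()
            if normalized:
                seen.setdefault(normalized, None)
            if len(seen) >= limit:
                return list(seen)
    return list(seen)
```

Using a dict for membership keeps the lookup constant-time; with a cap of six the difference from the list scan in the diff is negligible, so this is a readability choice rather than a performance fix.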


@@ -2,188 +2,31 @@ from __future__ import annotations
import json
import re
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Any
from pydantic import BaseModel, Field, ValidationError
from pydantic import ValidationError
from sqlalchemy.orm import Session
@dataclass(frozen=True, slots=True)
class DocumentField:
key: str
label: str
value: str
@dataclass(frozen=True, slots=True)
class DocumentInsight:
document_type: str
document_type_label: str
scene_code: str
scene_label: str
expense_type: str
fields: tuple[DocumentField, ...] = ()
classification_source: str = "rule"
classification_confidence: float = 0.0
evidence: tuple[str, ...] = ()
warnings: tuple[str, ...] = ()
@dataclass(frozen=True, slots=True)
class DocumentRule:
document_type: str
document_type_label: str
scene_code: str
scene_label: str
expense_type: str
keywords: tuple[str, ...]
score_bias: float = 0.0
@dataclass(frozen=True, slots=True)
class RuleMatch:
rule: DocumentRule | None
confidence: float
evidence: tuple[str, ...]
score: float
class LlmDocumentClassification(BaseModel):
document_type: str = Field(default="other")
scene_code: str = Field(default="other")
scene_label: str = Field(default="其他票据")
expense_type: str = Field(default="other")
confidence: float = Field(default=0.0, ge=0.0, le=1.0)
evidence: list[str] = Field(default_factory=list)
fields: list[DocumentField] = Field(default_factory=list)
DEFAULT_RULE = DocumentRule(
document_type="other",
document_type_label="其他单据",
scene_code="other",
scene_label="其他票据",
expense_type="other",
keywords=(),
score_bias=0.0,
from app.services.document_intelligence_rules import DEFAULT_RULE, DOCUMENT_RULES, DOCUMENT_TYPE_RULE_MAP, SUPPORTED_DOCUMENT_TYPES
from app.services.document_intelligence_types import (
DocumentField,
DocumentInsight,
LlmDocumentClassification,
RuleMatch,
)
DOCUMENT_RULES: tuple[DocumentRule, ...] = (
DocumentRule(
document_type="flight_itinerary",
document_type_label="机票/航班行程单",
scene_code="travel",
scene_label="差旅票据",
expense_type="travel",
keywords=("电子行程单", "航班号", "航班", "机票", "登机", "航空", "客票"),
score_bias=0.34,
),
DocumentRule(
document_type="train_ticket",
document_type_label="火车/高铁票",
scene_code="travel",
scene_label="差旅票据",
expense_type="travel",
keywords=("高铁", "火车", "动车", "铁路", "车次", "检票", "二等座", "一等座"),
score_bias=0.32,
),
DocumentRule(
document_type="hotel_invoice",
document_type_label="酒店住宿票据",
scene_code="hotel",
scene_label="住宿票据",
expense_type="hotel",
keywords=("住宿", "房费", "客房", "入住", "离店", "酒店", "宾馆", "间夜"),
score_bias=0.16,
),
DocumentRule(
document_type="taxi_receipt",
document_type_label="出租车/网约车票据",
scene_code="transport",
scene_label="交通票据",
expense_type="transport",
keywords=("滴滴出行", "滴滴", "网约车", "出租车", "打车", "快车", "专车", "订单号", "上车", "下车", "起点", "终点", "里程", "司机"),
score_bias=0.38,
),
DocumentRule(
document_type="parking_toll_receipt",
document_type_label="停车/通行费票据",
scene_code="transport",
scene_label="交通票据",
expense_type="transport",
keywords=("停车费", "通行费", "过路费", "收费站", "停车场", "停车"),
score_bias=0.28,
),
DocumentRule(
document_type="meal_receipt",
document_type_label="餐饮票据",
scene_code="meal",
scene_label="餐饮票据",
expense_type="meal",
keywords=("餐饮", "餐费", "用餐", "饭店", "酒楼", "餐厅", "食品", "外卖", "咖啡"),
score_bias=0.14,
),
DocumentRule(
document_type="office_invoice",
document_type_label="办公用品票据",
scene_code="office",
scene_label="办公用品票据",
expense_type="office",
keywords=("办公用品", "文具", "耗材", "打印纸", "墨盒", "硒鼓", "键盘", "鼠标"),
score_bias=0.14,
),
DocumentRule(
document_type="meeting_invoice",
document_type_label="会议/会务票据",
scene_code="meeting",
scene_label="会务票据",
expense_type="meeting",
keywords=("会议", "会务", "会展", "论坛", "会议室", "会场"),
score_bias=0.12,
),
DocumentRule(
document_type="training_invoice",
document_type_label="培训票据",
scene_code="training",
scene_label="培训票据",
expense_type="training",
keywords=("培训", "课程", "讲师", "教材", "学费", "认证"),
score_bias=0.12,
),
DocumentRule(
document_type="vat_invoice",
document_type_label="增值税发票",
scene_code="other",
scene_label="通用发票",
expense_type="other",
keywords=("发票代码", "发票号码", "价税合计", "增值税", "电子发票"),
score_bias=-0.08,
),
DocumentRule(
document_type="receipt",
document_type_label="一般收据/凭证",
scene_code="other",
scene_label="其他票据",
expense_type="other",
keywords=("收据", "凭证", "票据"),
score_bias=-0.18,
),
)
DOCUMENT_TYPE_RULE_MAP = {rule.document_type: rule for rule in DOCUMENT_RULES}
SUPPORTED_DOCUMENT_TYPES = tuple(DOCUMENT_TYPE_RULE_MAP.keys()) + ("other",)
AMOUNT_PATTERNS = (
re.compile(
r"(?:价税合计|合计金额|费用合计|订单(?:总)?金额|支付(?:金额)?|实付(?:金额)?|实收(?:金额)?|总(?:额|计|价)|票价|金额|车费|消费金额)"
r"[:\s¥¥人民币]*([0-9]+(?:[.,][0-9]{1,2})?)"
r"(?:价税合计|合计金额|费用合计|总费用|费用总计|订单(?:总)?金额|支付(?:金额)?|实付(?:金额)?|实收(?:金额)?|总(?:额|计|价)|票价|金额|车费|消费金额|房费|住宿费)"
r"[:\s¥¥人民币为是]*([0-9]+(?:[.,][0-9]{1,2})?)"
),
re.compile(r"[¥¥]\s*([0-9]+(?:[.,][0-9]{1,2})?)"),
re.compile(r"([0-9]+(?:[.,][0-9]{1,2})?)\s*元"),
)
DATE_PATTERN = re.compile(r"((?:20\d{2}|19\d{2})[-/年.](?:1[0-2]|0?[1-9])[-/月.](?:3[01]|[12]\d|0?[1-9])日?)")
TIME_PATTERN = re.compile(r"(?<!\d)([01]?\d|2[0-3])[:]([0-5]\d)(?!\d)")
INVOICE_NUMBER_PATTERN = re.compile(r"(?:发票号码|票号|单号|订单号)[:\s]*([A-Za-z0-9-]{6,24})")
INVOICE_CODE_PATTERN = re.compile(r"(?:发票代码)[:\s]*([A-Za-z0-9-]{6,24})")
TRIP_NO_PATTERN = re.compile(r"(?:车次|航班(?:号)?)[:\s]*([A-Za-z0-9]{2,12})")
@@ -192,6 +35,58 @@ MERCHANT_PATTERNS = (
re.compile(r"(?:销售方(?:名称)?|商户(?:名称)?|开票方(?:名称)?|收款方(?:名称)?)[:\s]*([A-Za-z0-9\u4e00-\u9fa5()·&\\-]{2,40})"),
re.compile(r"([A-Za-z0-9\u4e00-\u9fa5()·&\\-]{2,40}(?:酒店|宾馆|饭店|酒楼|餐厅|航空|铁路|滴滴出行|停车场|服务区))"),
)
DATE_FIELD_KEYS = {
"date",
"time",
"issued_at",
"invoice_date",
"issue_date",
"travel_date",
"trip_date",
"journey_date",
"departure_date",
"departure_time",
"depart_date",
"depart_time",
"boarding_date",
"boarding_time",
"train_date",
"train_time",
"train_departure_time",
"scheduled_departure_time",
"flight_date",
"flight_time",
"ride_date",
"ride_time",
"pickup_time",
"start_time",
}
TRIP_DATE_LABEL_BY_DOCUMENT_TYPE = {
"train_ticket": "列车出发时间",
"flight_itinerary": "起飞日期",
"taxi_receipt": "乘车时间",
"transport_receipt": "乘车时间",
"parking_toll_receipt": "通行日期",
}
TRIP_DATE_FIELD_LABEL_TOKENS = (
"日期",
"时间",
"开票日期",
"发生时间",
"行程日期",
"出发日期",
"出发时间",
"列车出发时间",
"发车日期",
"发车时间",
"开车时间",
"乘车日期",
"乘车时间",
"起飞日期",
"航班日期",
"上车时间",
"用车时间",
)
class DocumentIntelligenceService:
@@ -212,7 +107,10 @@ class DocumentIntelligenceService:
compact = re.sub(r"\s+", "", raw_text).lower()
rule_match = _match_document_rule(compact)
base_rule = rule_match.rule or DEFAULT_RULE
fields = tuple(_extract_document_fields(raw_text))
fields = _apply_document_type_field_labels(
tuple(_extract_document_fields(raw_text, base_rule.document_type)),
base_rule.document_type,
)
rule_insight = DocumentInsight(
document_type=base_rule.document_type,
document_type_label=base_rule.document_type_label,
@@ -275,7 +173,10 @@ class DocumentIntelligenceService:
for item in parsed.evidence
if str(item or "").strip()
][:4]
normalized_fields = _normalize_llm_document_fields(parsed.fields)
normalized_fields = _apply_document_type_field_labels(
tuple(_normalize_llm_document_fields(parsed.fields)),
normalized_type,
)
return LlmDocumentClassification(
document_type=normalized_type,
@@ -312,7 +213,10 @@ class DocumentIntelligenceService:
scene_code=rule_insight.scene_code,
scene_label=rule_insight.scene_label,
expense_type=rule_insight.expense_type,
fields=merged_fields,
fields=_apply_document_type_field_labels(
merged_fields,
rule_insight.document_type,
),
classification_source=rule_insight.classification_source,
classification_confidence=rule_insight.classification_confidence,
evidence=rule_insight.evidence,
@@ -337,7 +241,10 @@ class DocumentIntelligenceService:
scene_code=rule_insight.scene_code,
scene_label=rule_insight.scene_label,
expense_type=rule_insight.expense_type,
fields=merged_fields,
fields=_apply_document_type_field_labels(
merged_fields,
rule_insight.document_type,
),
classification_source=rule_insight.classification_source,
classification_confidence=rule_insight.classification_confidence,
evidence=rule_insight.evidence,
@@ -354,7 +261,7 @@ class DocumentIntelligenceService:
scene_code=rule.scene_code if parsed.scene_code == "other" else parsed.scene_code,
scene_label=rule.scene_label if parsed.scene_label == "其他票据" else parsed.scene_label,
expense_type=rule.expense_type if parsed.expense_type == "other" else parsed.expense_type,
fields=merged_fields,
fields=_apply_document_type_field_labels(merged_fields, rule.document_type),
classification_source=source,
classification_confidence=max(parsed.confidence, rule_insight.classification_confidence),
evidence=tuple(parsed.evidence or rule_insight.evidence),
@@ -464,8 +371,49 @@ def _normalize_llm_document_field_key(key: str, label: str) -> str:
token in compact_label for token in ("金额", "价税合计", "合计", "总额", "总计", "票价", "支付金额", "实付金额", "实收金额")
):
return "amount"
if compact_key in {"date", "time", "issued_at", "invoice_date"} or any(
token in compact_label for token in ("日期", "时间", "开票日期", "发生时间")
if compact_key in {
"travel_date",
"trip_date",
"journey_date",
"departure_date",
"departure_time",
"depart_date",
"depart_time",
"boarding_date",
"boarding_time",
"train_date",
"train_time",
"train_departure_time",
"scheduled_departure_time",
"flight_date",
"flight_time",
"ride_date",
"ride_time",
"pickup_time",
"start_time",
} or any(
token in compact_label
for token in (
"行程日期",
"出发日期",
"出发时间",
"列车出发时间",
"发车日期",
"发车时间",
"开车时间",
"乘车日期",
"乘车时间",
"起飞日期",
"航班日期",
"上车时间",
"用车时间",
)
):
return "trip_date"
if compact_key in {"issued_at", "issue_date", "invoice_date"} or "开票日期" in compact_label:
return "invoice_date"
if compact_key in {"date", "time"} or any(
token in compact_label for token in ("日期", "时间", "发生时间")
):
return "date"
if compact_key in {"merchant_name", "merchant", "seller_name", "vendor_name"} or any(
@@ -504,7 +452,7 @@ def _normalize_llm_document_field_value(key: str, value: str) -> str:
return ""
text_value = format(candidate.quantize(Decimal("0.01")), "f").rstrip("0").rstrip(".")
return f"{text_value}"
if key == "date":
if key in {"date", "time", "invoice_date", "trip_date"}:
return _extract_date(raw_value) or _clean_field_value(raw_value)
if key == "route":
return _extract_route(raw_value) or _clean_field_value(
@@ -517,6 +465,8 @@ def _llm_document_field_label(key: str) -> str:
return {
"amount": "金额",
"date": "日期",
"invoice_date": "开票日期",
"trip_date": "行程日期",
"merchant_name": "商户",
"invoice_number": "票据号码",
"invoice_code": "发票代码",
@@ -525,6 +475,35 @@ def _llm_document_field_label(key: str) -> str:
}.get(key, key)
def _apply_document_type_field_labels(
fields: tuple[DocumentField, ...],
document_type: str,
) -> tuple[DocumentField, ...]:
date_label = TRIP_DATE_LABEL_BY_DOCUMENT_TYPE.get(
str(document_type or "").strip().lower()
)
if not date_label:
return fields
adjusted: list[DocumentField] = []
for field in fields:
compact_key = str(field.key or "").strip().lower()
compact_label = str(field.label or "").replace(" ", "")
if compact_key in {"issued_at", "issue_date", "invoice_date"} or any(
token in compact_label for token in ("开票日期", "发票日期")
):
adjusted.append(field)
continue
is_date_field = compact_key in DATE_FIELD_KEYS or any(
token in compact_label for token in TRIP_DATE_FIELD_LABEL_TOKENS
)
if is_date_field:
adjusted.append(DocumentField(key=field.key, label=date_label, value=field.value))
continue
adjusted.append(field)
return tuple(adjusted)
def _merge_document_fields(
base_fields: tuple[DocumentField, ...],
override_fields: tuple[DocumentField, ...],
@@ -540,13 +519,13 @@ def _merge_document_fields(
return tuple(merged[key] for key in order if key in merged)
def _extract_document_fields(text: str) -> list[DocumentField]:
def _extract_document_fields(text: str, document_type: str = "") -> list[DocumentField]:
fields: list[DocumentField] = []
amount = _extract_amount(text)
if amount:
fields.append(DocumentField(key="amount", label="金额", value=amount))
date_value = _extract_date(text)
date_value = _extract_date(text, document_type=document_type)
if date_value:
fields.append(DocumentField(key="date", label="日期", value=date_value))
@@ -584,6 +563,8 @@ def _extract_amount(text: str) -> str:
continue
if candidate <= Decimal("0.00"):
continue
if _is_amount_match_date_fragment(candidate, text, match.start(1), match.end(1)):
continue
if best_value is None or candidate > best_value:
best_value = candidate
@@ -594,10 +575,49 @@ def _extract_amount(text: str) -> str:
return f"{text_value}"
def _extract_date(text: str) -> str:
match = DATE_PATTERN.search(text)
if not match:
def _is_amount_match_date_fragment(amount: Decimal, text: str, start: int, end: int) -> bool:
if start < 0 or end < 0:
return False
normalized = amount.quantize(Decimal("0.01"))
if normalized != normalized.to_integral_value() or normalized < Decimal("1900") or normalized > Decimal("2099"):
return False
before = str(text or "")[max(0, start - 8):start]
after = str(text or "")[end:end + 10]
if re.match(r"\s*(?:年|[-/.])\s*\d{1,2}", after):
return True
if re.search(r"\d{1,2}\s*(?:年|[-/.])\s*$", before):
return True
return False
def _extract_date(text: str, *, document_type: str = "") -> str:
matches = list(DATE_PATTERN.finditer(text))
if not matches:
return ""
normalized_type = str(document_type or "").strip().lower()
if normalized_type in TRIP_DATE_LABEL_BY_DOCUMENT_TYPE:
candidates: list[tuple[int, int, bool, str]] = []
for index, match in enumerate(matches):
value = _format_date_match_with_time(text, match)
if not value:
continue
invoice_context = _is_invoice_date_context(text, match)
score = _score_trip_date_context(text, match, value, invoice_context)
candidates.append((score, index, invoice_context, value))
non_invoice_candidates = [candidate for candidate in candidates if not candidate[2]]
if non_invoice_candidates:
return max(non_invoice_candidates, key=lambda candidate: (candidate[0], -candidate[1]))[3]
return ""
return _format_date_match_with_time(text, matches[0])
def _format_date_match_with_time(text: str, match: re.Match[str]) -> str:
raw_value = str(match.group(1) or "").strip()
normalized = raw_value.replace("年", "-").replace("月", "-").replace("日", "")
normalized = normalized.replace("/", "-").replace(".", "-")
@@ -605,7 +625,60 @@ def _extract_date(text: str) -> str:
if len(parts) != 3:
return raw_value
year, month, day = parts
return f"{year.zfill(4)}-{month.zfill(2)}-{day.zfill(2)}"
date_value = f"{year.zfill(4)}-{month.zfill(2)}-{day.zfill(2)}"
surrounding = str(text or "")[max(0, match.start() - 18): match.end() + 24]
time_match = TIME_PATTERN.search(surrounding)
if time_match:
hour = str(time_match.group(1) or "").zfill(2)
minute = str(time_match.group(2) or "").zfill(2)
return f"{date_value} {hour}:{minute}"
return date_value
def _is_invoice_date_context(text: str, match: re.Match[str]) -> bool:
window = str(text or "")[max(0, match.start() - 12): match.end() + 8]
compact = window.replace(" ", "")
return any(token in compact for token in ("开票日期", "发票日期", "开票时间", "开票"))
def _score_trip_date_context(
text: str,
match: re.Match[str],
value: str,
invoice_context: bool,
) -> int:
window = str(text or "")[max(0, match.start() - 32): match.end() + 32]
compact = window.replace(" ", "")
score = -20 if invoice_context else 0
if ":" in value or ":" in value:
score += 8
if any(
token in compact
for token in (
"行程日期",
"出发日期",
"出发时间",
"列车出发时间",
"发车日期",
"发车时间",
"开车时间",
"乘车日期",
"乘车时间",
"起飞日期",
"起飞时间",
"航班日期",
"上车时间",
"用车时间",
)
):
score += 6
if any(token in compact for token in ("车次", "检票", "二等座", "一等座", "商务座", "软卧", "硬卧")):
score += 3
if re.search(r"[A-Z]\d{1,4}", compact):
score += 2
if re.search(r"[\u4e00-\u9fa5A-Za-z0-9()·]{2,20}(?:至|到|→|->|—|-)[\u4e00-\u9fa5A-Za-z0-9()·]{2,20}", compact):
score += 2
return score
def _extract_merchant(text: str) -> str:
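Review note: `_format_date_match_with_time` normalizes a matched date to `YYYY-MM-DD` and appends a nearby `HH:MM` when one is found. A self-contained sketch of that flow, assuming the same date pattern as `DATE_PATTERN` and an ASCII-colon time pattern (the names `DATE_RE`, `TIME_RE` and `normalize_date` are local to this example; unlike the diff, it always takes the first date in the text):

```python
import re

# Local stand-ins for DATE_PATTERN / TIME_PATTERN; names are illustrative.
DATE_RE = re.compile(
    r"((?:20\d{2}|19\d{2})[-/年.](?:1[0-2]|0?[1-9])[-/月.](?:3[01]|[12]\d|0?[1-9])日?)"
)
TIME_RE = re.compile(r"(?<!\d)([01]?\d|2[0-3]):([0-5]\d)(?!\d)")


def normalize_date(text: str) -> str:
    """Return the first date in `text` as YYYY-MM-DD, plus HH:MM if one is nearby."""
    match = DATE_RE.search(text)
    if not match:
        return ""
    raw = match.group(1)
    # Map Chinese and punctuation separators onto "-" before splitting.
    normalized = raw.replace("年", "-").replace("月", "-").replace("日", "")
    normalized = normalized.replace("/", "-").replace(".", "-")
    year, month, day = normalized.split("-")
    date_value = f"{year}-{month.zfill(2)}-{day.zfill(2)}"
    # Look for a time within the same window sizes the diff uses (18 before, 24 after).
    window = text[max(0, match.start() - 18): match.end() + 24]
    time_match = TIME_RE.search(window)
    if time_match:
        return f"{date_value} {time_match.group(1).zfill(2)}:{time_match.group(2)}"
    return date_value
```

For trip-type documents the diff additionally scores every candidate date and skips those in an invoice-date context; that selection step is left out here.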

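Review note: `_is_amount_match_date_fragment` keeps the year of a date such as `2026年5月21日` from being picked up as an amount. A standalone sketch of the same check, assuming the caller passes the character span of the captured number (the name `looks_like_year_fragment` is illustrative):

```python
import re
from decimal import Decimal


# Hypothetical standalone version of the year-fragment guard.
def looks_like_year_fragment(amount: Decimal, text: str, start: int, end: int) -> bool:
    """True if an integer in 1900-2099 at text[start:end] sits next to date separators."""
    if amount != amount.to_integral_value():
        return False  # values with a fractional part are never treated as years
    if not (Decimal("1900") <= amount <= Decimal("2099")):
        return False
    before = text[max(0, start - 8):start]
    after = text[end:end + 10]
    # "2026年5", "2026-05", "2026/5" ... right after the number
    if re.match(r"\s*(?:年|[-/.])\s*\d{1,2}", after):
        return True
    # a day or month followed by a separator right before the number
    return bool(re.search(r"\d{1,2}\s*(?:年|[-/.])\s*$", before))
```

The guard only rejects whole-number candidates, so a genuine amount like `2026.50` still passes through.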

@@ -0,0 +1,120 @@
from __future__ import annotations
from app.services.document_intelligence_types import DocumentRule
DEFAULT_RULE = DocumentRule(
document_type="other",
document_type_label="其他单据",
scene_code="other",
scene_label="其他票据",
expense_type="other",
keywords=(),
score_bias=0.0,
)
DOCUMENT_RULES: tuple[DocumentRule, ...] = (
DocumentRule(
document_type="flight_itinerary",
document_type_label="机票/航班行程单",
scene_code="travel",
scene_label="差旅票据",
expense_type="travel",
keywords=("电子行程单", "航班号", "航班", "机票", "登机", "航空", "客票"),
score_bias=0.34,
),
DocumentRule(
document_type="train_ticket",
document_type_label="火车/高铁票",
scene_code="travel",
scene_label="差旅票据",
expense_type="travel",
keywords=("铁路电子客票", "电子客票", "高铁", "火车", "动车", "铁路", "车次", "检票", "二等座", "一等座", "票价"),
score_bias=0.32,
),
DocumentRule(
document_type="hotel_invoice",
document_type_label="酒店住宿票据",
scene_code="hotel",
scene_label="住宿票据",
expense_type="hotel",
keywords=("住宿", "房费", "客房", "入住", "离店", "酒店", "宾馆", "间夜"),
score_bias=0.16,
),
DocumentRule(
document_type="taxi_receipt",
document_type_label="出租车/网约车票据",
scene_code="transport",
scene_label="交通票据",
expense_type="transport",
keywords=("滴滴出行", "滴滴", "网约车", "出租车", "打车", "乘车", "用车", "叫车", "车费", "车资", "的士", "快车", "专车", "订单号", "上车", "下车", "起点", "终点", "里程", "司机"),
score_bias=0.38,
),
DocumentRule(
document_type="parking_toll_receipt",
document_type_label="停车/通行费票据",
scene_code="transport",
scene_label="交通票据",
expense_type="transport",
keywords=("停车费", "通行费", "过路费", "收费站", "停车场", "停车"),
score_bias=0.28,
),
DocumentRule(
document_type="meal_receipt",
document_type_label="餐饮票据",
scene_code="meal",
scene_label="餐饮票据",
expense_type="meal",
keywords=("餐饮", "餐费", "用餐", "饭店", "酒楼", "餐厅", "食品", "外卖", "咖啡"),
score_bias=0.14,
),
DocumentRule(
document_type="office_invoice",
document_type_label="办公用品票据",
scene_code="office",
scene_label="办公用品票据",
expense_type="office",
keywords=("办公用品", "文具", "耗材", "打印纸", "墨盒", "硒鼓", "键盘", "鼠标"),
score_bias=0.14,
),
DocumentRule(
document_type="meeting_invoice",
document_type_label="会议/会务票据",
scene_code="meeting",
scene_label="会务票据",
expense_type="meeting",
keywords=("会议", "会务", "会展", "论坛", "会议室", "会场"),
score_bias=0.12,
),
DocumentRule(
document_type="training_invoice",
document_type_label="培训票据",
scene_code="training",
scene_label="培训票据",
expense_type="training",
keywords=("培训", "课程", "讲师", "教材", "学费", "认证"),
score_bias=0.12,
),
DocumentRule(
document_type="vat_invoice",
document_type_label="增值税发票",
scene_code="other",
scene_label="通用发票",
expense_type="other",
keywords=("发票代码", "发票号码", "价税合计", "增值税", "电子发票"),
score_bias=-0.08,
),
DocumentRule(
document_type="receipt",
document_type_label="一般收据/凭证",
scene_code="other",
scene_label="其他票据",
expense_type="other",
keywords=("收据", "凭证", "票据"),
score_bias=-0.18,
),
)
DOCUMENT_TYPE_RULE_MAP = {rule.document_type: rule for rule in DOCUMENT_RULES}
SUPPORTED_DOCUMENT_TYPES = tuple(DOCUMENT_TYPE_RULE_MAP.keys()) + ("other",)
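Review note: the matcher that consumes this table (`_match_document_rule`) is not part of this diff, so the following is only a guess at how `keywords` and `score_bias` might interact. The scoring formula (keyword hits plus bias), the trimmed `Rule` class and the three sample rules are all assumptions made for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:  # trimmed stand-in for DocumentRule
    document_type: str
    keywords: tuple[str, ...]
    score_bias: float = 0.0


RULES = (
    Rule("train_ticket", ("高铁", "车次", "检票"), 0.32),
    Rule("hotel_invoice", ("住宿", "酒店", "入住"), 0.16),
    Rule("receipt", ("收据", "票据"), -0.18),
)


def pick_rule(compact_text: str) -> Optional[Rule]:
    """Pick the rule with the highest (keyword hits + bias); None if nothing hits."""
    best: Optional[Rule] = None
    best_score = float("-inf")
    for rule in RULES:
        hits = sum(1 for keyword in rule.keywords if keyword in compact_text)
        if hits == 0:
            continue  # assumed: a rule needs at least one keyword hit
        score = hits + rule.score_bias
        if score > best_score:
            best, best_score = rule, score
    return best
```

Under this assumed scheme, the negative bias on generic types such as `receipt` lets a specific rule win whenever both match the same number of keywords.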


@@ -0,0 +1,53 @@
from __future__ import annotations
from dataclasses import dataclass
from pydantic import BaseModel, ConfigDict, Field
@dataclass(frozen=True, slots=True)
class DocumentField:
key: str
label: str
value: str
@dataclass(frozen=True, slots=True)
class DocumentInsight:
document_type: str
document_type_label: str
scene_code: str
scene_label: str
expense_type: str
fields: tuple[DocumentField, ...] = ()
classification_source: str = "rule"
classification_confidence: float = 0.0
evidence: tuple[str, ...] = ()
warnings: tuple[str, ...] = ()
@dataclass(frozen=True, slots=True)
class DocumentRule:
document_type: str
document_type_label: str
scene_code: str
scene_label: str
expense_type: str
keywords: tuple[str, ...]
score_bias: float = 0.0
@dataclass(frozen=True, slots=True)
class RuleMatch:
rule: DocumentRule | None
confidence: float
evidence: tuple[str, ...]
score: float
class LlmDocumentClassification(BaseModel):
model_config = ConfigDict(arbitrary_types_allowed=True)
document_type: str = Field(default="other")
scene_code: str = Field(default="other")
scene_label: str = Field(default="其他票据")
expense_type: str = Field(default="other")
confidence: float = Field(default=0.0, ge=0.0, le=1.0)
evidence: list[str] = Field(default_factory=list)
fields: list[DocumentField] = Field(default_factory=list)
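Review note: because `DocumentField` is frozen, helpers such as `_apply_document_type_field_labels` have to build a new instance instead of editing the label in place. A short sketch of that pattern using the standard library's `dataclasses.replace`, with a local stand-in for `DocumentField` (`slots=True` is left out and the helper name `relabel` is hypothetical):

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class DocumentField:  # local stand-in for the class in the diff
    key: str
    label: str
    value: str


# Hypothetical helper; the diff constructs DocumentField(...) directly instead.
def relabel(field: DocumentField, label: str) -> DocumentField:
    """Return a copy with a new label; the original instance is left untouched."""
    return replace(field, label=label)
```

Assigning to `field.label` directly would raise `FrozenInstanceError`, which is what makes sharing these tuples between the rule and LLM paths safe.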


@@ -1,10 +1,11 @@
from __future__ import annotations
from collections import Counter
from datetime import date, datetime
from datetime import UTC, date, datetime
from typing import Any
from zoneinfo import ZoneInfo
from sqlalchemy import inspect, text
from sqlalchemy import inspect, select, text
from sqlalchemy.orm import Session
from app.core.config import get_settings
@@ -19,7 +20,7 @@ from app.models.role import Role
from app.repositories.employee import EmployeeRepository
from app.schemas.employee import (
EmployeeCreate,
EmployeeHistoryRead,
EmployeeImportResultRead,
EmployeeMetaRead,
EmployeeOrganizationRead,
EmployeeRead,
@@ -27,8 +28,15 @@ from app.schemas.employee import (
EmployeeStatusSummaryRead,
EmployeeUpdate,
)
from app.services.employee_import import EmployeeImportCoordinator
from app.services.employee_serialization import (
format_history_datetime as serialize_history_datetime,
serialize_employee,
)
from app.services.employee_spreadsheet import build_import_template_bytes
from app.services.employee_seed import (
EMPLOYEE_DEFINITIONS,
EMPLOYEE_PROFILE_REPAIRS,
ORGANIZATION_DEFINITIONS,
ROLE_DEFINITIONS,
ROLE_DISPLAY_ORDER,
@@ -37,6 +45,8 @@ from app.services.employee_seed import (
logger = get_logger("app.services.employee")
DEFAULT_EMPLOYEE_PASSWORD = "123456"
MAX_EMPLOYEE_CHANGE_LOGS = 5
DISPLAY_TIMEZONE = ZoneInfo("Asia/Shanghai")
STATUS_TONE_MAP = {
"在职": "success",
@@ -57,7 +67,9 @@ def prepare_employee_directory() -> None:
session_factory = get_session_factory()
with session_factory() as db:
EmployeeService(db).ensure_directory_ready()
service = EmployeeService(db)
service.ensure_directory_ready()
service.apply_profile_repairs()
class EmployeeService:
@@ -120,10 +132,27 @@ class EmployeeService:
for role in self._sorted_roles(self.repository.list_roles())
]
organization_options = [
EmployeeOrganizationRead(
id=unit.id,
code=unit.unit_code,
name=unit.name,
unitType=unit.unit_type,
costCenter=unit.cost_center,
location=unit.location,
managerName=unit.manager_name,
)
for unit in sorted(
self.repository.list_organization_units(),
key=lambda item: item.name,
)
]
return EmployeeMetaRead(
totalEmployees=len(employees),
statusSummary=status_summary,
roleOptions=role_options,
organizationOptions=organization_options,
)
def create_employee(self, payload: EmployeeCreate) -> EmployeeRead:
@@ -152,7 +181,7 @@ class EmployeeService:
sync_state=payload.sync_state,
spotlight=payload.spotlight,
password_hash=hash_password(DEFAULT_EMPLOYEE_PASSWORD),
last_sync_at=datetime.now(),
last_sync_at=datetime.now(UTC),
)
if payload.organization_unit_code:
@@ -261,6 +290,43 @@ class EmployeeService:
employee.finance_owner_name = finance_owner_name
changed_fields.append("财务归口")
if "organization_unit_code" in payload.model_fields_set:
organization_code = self._normalize_optional_text(payload.organization_unit_code)
current_code = (
employee.organization_unit.unit_code if employee.organization_unit else None
)
if organization_code != current_code:
if organization_code:
organization = self.repository.get_organization_by_code(organization_code)
if organization is None:
raise ValueError(f"部门编码 {organization_code} 不存在")
employee.organization_unit = organization
else:
employee.organization_unit = None
changed_fields.append("所属部门")
if "manager_employee_no" in payload.model_fields_set:
manager_employee_no = self._normalize_optional_text(payload.manager_employee_no)
current_manager_no = employee.manager.employee_no if employee.manager else None
if manager_employee_no:
if manager_employee_no == employee.employee_no:
raise ValueError("直属上级不能是员工本人")
manager = self.repository.get_by_employee_no(manager_employee_no)
if manager is None:
raise ValueError(f"直属上级工号 {manager_employee_no} 不存在")
if manager_employee_no != current_manager_no:
employee.manager = manager
changed_fields.append("直属上级")
elif current_manager_no is not None:
employee.manager = None
changed_fields.append("直属上级")
role_changed = False
sorted_roles: list[Role] = []
if "role_codes" in payload.model_fields_set and payload.role_codes is not None:
requested_codes = list(dict.fromkeys(payload.role_codes))
roles: list[Role] = []
@@ -280,7 +346,7 @@ class EmployeeService:
current_role_codes = [role.role_code for role in self._sorted_roles(list(employee.roles))]
if next_role_codes != current_role_codes:
employee.roles = sorted_roles
changed_fields.append("系统角色")
role_changed = True
if "password" in payload.model_fields_set and payload.password:
password = payload.password.strip()
@@ -289,10 +355,10 @@ class EmployeeService:
employee.password_hash = hash_password(password)
password_changed = True
if not changed_fields and not password_changed:
if not changed_fields and not password_changed and not role_changed:
return self._serialize_employee(employee)
now = datetime.now()
now = datetime.now(UTC)
employee.last_sync_at = now
employee.sync_state = "已同步"
@@ -303,13 +369,25 @@ class EmployeeService:
occurred_at=now,
)
if role_changed:
role_labels = "、".join(role.name for role in sorted_roles)
self._append_change_log(
employee,
action=f"更新系统角色({role_labels})",
occurred_at=now,
)
if password_changed:
self._append_change_log(employee, action="重置员工登录密码", occurred_at=now)
saved = self.repository.save(employee)
hydrated = self.repository.get(saved.id)
logger.info("Updated employee id=%s fields=%s", employee.id, ",".join(changed_fields))
return self._serialize_employee(hydrated or saved)
hydrated = self._save_employee_and_reload(employee)
logger.info(
"Updated employee id=%s fields=%s role_changed=%s",
employee.id,
",".join(changed_fields),
role_changed,
)
return self._serialize_employee(hydrated)
def disable_employee(self, employee_id: str) -> EmployeeRead:
self.ensure_directory_ready()
@@ -321,17 +399,16 @@ class EmployeeService:
if employee.employment_status == "停用":
return self._serialize_employee(employee)
now = datetime.now()
now = datetime.now(UTC)
employee.employment_status = "停用"
employee.sync_state = "已同步"
employee.last_sync_at = now
employee.spotlight = False
self._append_change_log(employee, action="停用员工账号", occurred_at=now)
saved = self.repository.save(employee)
hydrated = self.repository.get(saved.id)
hydrated = self._save_employee_and_reload(employee)
logger.info("Disabled employee id=%s no=%s", employee.id, employee.employee_no)
return self._serialize_employee(hydrated or saved)
return self._serialize_employee(hydrated)
def enable_employee(self, employee_id: str) -> EmployeeRead:
self.ensure_directory_ready()
@@ -343,16 +420,38 @@ class EmployeeService:
if employee.employment_status != "停用":
return self._serialize_employee(employee)
now = datetime.now()
now = datetime.now(UTC)
employee.employment_status = "在职"
employee.sync_state = "已同步"
employee.last_sync_at = now
self._append_change_log(employee, action="启用员工账号", occurred_at=now)
saved = self.repository.save(employee)
hydrated = self.repository.get(saved.id)
hydrated = self._save_employee_and_reload(employee)
logger.info("Enabled employee id=%s no=%s", employee.id, employee.employee_no)
return self._serialize_employee(hydrated or saved)
return self._serialize_employee(hydrated)
def build_import_template(self) -> bytes:
self.ensure_directory_ready()
return build_import_template_bytes()
def export_employees(self, status: str | None = None, keyword: str | None = None) -> bytes:
self.ensure_directory_ready()
return self._import_coordinator().export_employees(status=status, keyword=keyword)
def import_employees(self, content: bytes, actor: str = "系统管理员") -> EmployeeImportResultRead:
self.ensure_directory_ready()
return self._import_coordinator().import_employees(content, actor=actor)
def _import_coordinator(self) -> EmployeeImportCoordinator:
return EmployeeImportCoordinator(
self.db,
self.repository,
sorted_roles=self._sorted_roles,
append_change_log=self._append_change_log,
format_date=self._format_date,
format_datetime=self._format_datetime,
default_password=DEFAULT_EMPLOYEE_PASSWORD,
)
def _seed_roles(self) -> None:
existing_by_code = {role.role_code: role for role in self.repository.list_roles()}
@@ -471,6 +570,69 @@ class EmployeeService:
self.db.flush()
def apply_profile_repairs(self) -> None:
"""Apply one-off demo profile repairs. Intended for startup/bootstrap only."""
try:
self._repair_employee_profiles()
self._trim_all_employee_change_logs()
self.db.commit()
except Exception:
self.db.rollback()
logger.exception("Failed to apply employee profile repairs")
raise
def _repair_employee_profiles(self) -> None:
if not EMPLOYEE_PROFILE_REPAIRS:
return
employees = self.repository.list()
employees_by_email = {employee.email.lower(): employee for employee in employees if employee.email}
employees_by_no = {employee.employee_no: employee for employee in employees if employee.employee_no}
roles_by_code = {role.role_code: role for role in self.repository.list_roles()}
organizations_by_code = {
unit.unit_code: unit for unit in self.repository.list_organization_units()
}
for definition in EMPLOYEE_PROFILE_REPAIRS:
email = str(definition.get("email") or "").strip().lower()
employee_no = str(definition.get("employee_no") or "").strip()
employee = employees_by_email.get(email) or employees_by_no.get(employee_no)
if employee is None:
continue
for field_name in (
"position",
"grade",
"location",
"cost_center",
"finance_owner_name",
"employment_status",
"sync_state",
):
value = definition.get(field_name)
if value:
setattr(employee, field_name, value)
organization_code = definition.get("organization_unit_code")
if organization_code:
employee.organization_unit = organizations_by_code.get(organization_code)
manager_employee_no = definition.get("manager_employee_no")
if manager_employee_no:
employee.manager = employees_by_no.get(manager_employee_no)
if not employee.password_hash:
employee.password_hash = hash_password(DEFAULT_EMPLOYEE_PASSWORD)
role_codes = [item for item in definition.get("role_codes", []) if item in roles_by_code]
if role_codes:
merged_roles = {role.role_code: role for role in employee.roles}
for role_code in role_codes:
merged_roles[role_code] = roles_by_code[role_code]
employee.roles = self._sorted_roles(list(merged_roles.values()))
self.db.flush()
def _prune_extra_seed_employees(self) -> None:
if not EXTRA_SEED_EMPLOYEE_NOS:
return
@@ -530,6 +692,12 @@ class EmployeeService:
)
existing_keys.add(identity)
def _save_employee_and_reload(self, employee: Employee) -> Employee:
saved = self.repository.save(employee)
self._trim_employee_change_logs(saved.id)
self.db.commit()
return self.repository.get(saved.id) or saved
def _append_change_log(
self,
employee: Employee,
@@ -542,82 +710,43 @@ class EmployeeService:
employee=employee,
action=action,
owner=owner,
occurred_at=occurred_at or datetime.now(),
occurred_at=occurred_at or datetime.now(UTC),
)
)
def _trim_all_employee_change_logs(self) -> None:
for employee in self.repository.list():
self._trim_employee_change_logs(employee.id)
def _sorted_change_logs(self, employee: Employee) -> list[EmployeeChangeLog]:
return sorted(employee.change_logs, key=lambda item: item.occurred_at, reverse=True)
def _trim_employee_change_logs(self, employee_id: str) -> None:
stmt = (
select(EmployeeChangeLog)
.where(EmployeeChangeLog.employee_id == employee_id)
.order_by(EmployeeChangeLog.occurred_at.desc())
)
logs = list(self.db.execute(stmt).scalars().all())
if len(logs) <= MAX_EMPLOYEE_CHANGE_LOGS:
return
for stale in logs[MAX_EMPLOYEE_CHANGE_LOGS:]:
self.db.delete(stale)
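
Review note: the trimming rule in `_trim_employee_change_logs` (order by `occurred_at` descending, keep the first `MAX_EMPLOYEE_CHANGE_LOGS` entries, delete the rest) can be illustrated without a database. The following is only a sketch. The `ChangeLog` dataclass, the `split_stale_logs` helper, and the cap of 3 are hypothetical stand-ins, because the real model and the actual value of `MAX_EMPLOYEE_CHANGE_LOGS` are not shown in this diff.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical cap; the real MAX_EMPLOYEE_CHANGE_LOGS value is not visible in the diff.
MAX_CHANGE_LOGS = 3


@dataclass
class ChangeLog:
    """Stand-in for the EmployeeChangeLog ORM model."""
    action: str
    occurred_at: datetime


def split_stale_logs(logs, limit=MAX_CHANGE_LOGS):
    """Return (kept, stale): the newest `limit` logs and everything older."""
    ordered = sorted(logs, key=lambda item: item.occurred_at, reverse=True)
    return ordered[:limit], ordered[limit:]


if __name__ == "__main__":
    base = datetime(2026, 5, 21, tzinfo=timezone.utc)
    logs = [ChangeLog(f"change-{i}", base + timedelta(hours=i)) for i in range(5)]
    kept, stale = split_stale_logs(logs)
    print([log.action for log in kept])
    print([log.action for log in stale])
```

In the service the stale entries are passed to `self.db.delete`, and the commit happens in `_save_employee_and_reload`, so the trim and the save land in the same transaction.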
def _serialize_employee(self, employee: Employee) -> EmployeeRead:
organization = employee.organization_unit
roles = self._sorted_roles(list(employee.roles))
role_labels = [role.name for role in roles]
role_codes = [role.role_code for role in roles]
history = [
EmployeeHistoryRead(
action=item.action,
owner=item.owner,
time=self._format_datetime(item.occurred_at) or "",
occurredAt=self._format_datetime(item.occurred_at) or "",
)
for item in employee.change_logs
]
return EmployeeRead(
id=employee.id,
avatar=(employee.name or "?")[:1],
name=employee.name,
employeeNo=employee.employee_no,
department=organization.name if organization else "",
position=employee.position,
grade=employee.grade,
manager=employee.manager.name if employee.manager else "CEO",
financeOwner=employee.finance_owner_name or "",
roles=role_labels,
roleCodes=role_codes,
status=employee.employment_status,
statusTone=STATUS_TONE_MAP.get(employee.employment_status, "neutral"),
gender=employee.gender,
age=self._calculate_age(employee.birth_date),
birthDate=self._format_date(employee.birth_date),
email=employee.email,
phone=employee.phone,
joinDate=self._format_date(employee.join_date),
location=employee.location,
costCenter=employee.cost_center,
updatedAt=self._format_datetime(employee.updated_at or employee.created_at),
lastSync=self._format_datetime(employee.last_sync_at),
syncState=employee.sync_state,
spotlight=employee.spotlight,
permissions=self._collect_permissions(role_codes),
history=history,
organization=(
EmployeeOrganizationRead(
id=organization.id,
code=organization.unit_code,
name=organization.name,
unitType=organization.unit_type,
costCenter=organization.cost_center,
location=organization.location,
managerName=organization.manager_name,
)
if organization
else None
),
return serialize_employee(
employee,
sorted_roles=self._sorted_roles(list(employee.roles)),
sorted_change_logs=self._sorted_change_logs(employee),
format_date=self._format_date,
format_datetime=self._format_datetime,
format_history_datetime=self._format_history_datetime,
role_permission_map=ROLE_PERMISSION_MAP,
status_tone_map=STATUS_TONE_MAP,
max_change_logs=MAX_EMPLOYEE_CHANGE_LOGS,
)
def _collect_permissions(self, role_codes: list[str]) -> list[str]:
permissions: list[str] = []
seen: set[str] = set()
for role_code in role_codes:
for permission in ROLE_PERMISSION_MAP.get(role_code, []):
if permission in seen:
continue
permissions.append(permission)
seen.add(permission)
return permissions
def _sorted_roles(self, roles: list[Role]) -> list[Role]:
return sorted(roles, key=lambda item: (ROLE_DISPLAY_ORDER.get(item.role_code, 999), item.name))
@@ -648,19 +777,24 @@ class EmployeeService:
return None
return value.strftime("%Y-%m-%d")
@staticmethod
def _to_display_datetime(value: datetime) -> datetime:
if value.tzinfo is None:
normalized = value.replace(tzinfo=UTC)
else:
normalized = value.astimezone(UTC)
return normalized.astimezone(DISPLAY_TIMEZONE)
@staticmethod
def _format_datetime(value: datetime | None) -> str | None:
if value is None:
return None
return value.strftime("%Y-%m-%d %H:%M")
local = EmployeeService._to_display_datetime(value)
return local.strftime("%Y-%m-%d %H:%M")
@staticmethod
def _calculate_age(birth_date: date | None) -> int | None:
if birth_date is None:
return None
today = date.today()
age = today.year - birth_date.year
if (today.month, today.day) < (birth_date.month, birth_date.day):
age -= 1
return age
def _format_history_datetime(value: datetime | None) -> str:
return serialize_history_datetime(
value,
to_display_datetime=EmployeeService._to_display_datetime,
)
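
Review note: the switch to `datetime.now(UTC)` together with `_to_display_datetime` means timestamps are stored in UTC and converted only when formatted. The sketch below mirrors that logic in isolation. It assumes `DISPLAY_TIMEZONE` is a fixed UTC+8 offset, which is a guess based on the commit timestamps; the real constant is not shown in this diff. It also uses `timezone.utc` instead of `datetime.UTC` so that it runs on Python versions before 3.11.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Assumption: the display timezone is UTC+8. The actual DISPLAY_TIMEZONE is not in the diff.
DISPLAY_TIMEZONE = timezone(timedelta(hours=8))


def to_display_datetime(value: datetime) -> datetime:
    """Treat naive values as UTC, then convert to the display timezone."""
    if value.tzinfo is None:
        normalized = value.replace(tzinfo=UTC)
    else:
        normalized = value.astimezone(UTC)
    return normalized.astimezone(DISPLAY_TIMEZONE)


def format_datetime(value):
    """Format a datetime for display, or return None when there is no value."""
    if value is None:
        return None
    return to_display_datetime(value).strftime("%Y-%m-%d %H:%M")


if __name__ == "__main__":
    naive_utc = datetime(2026, 5, 21, 16, 30)
    aware_local = datetime(2026, 5, 21, 16, 30, tzinfo=DISPLAY_TIMEZONE)
    print(format_datetime(naive_utc))
    print(format_datetime(aware_local))
```

One consequence worth checking during review: rows written before this change with naive local `datetime.now()` values will now be read as UTC and displayed shifted by the display offset.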


@@ -0,0 +1,331 @@
from __future__ import annotations
from collections.abc import Callable
from datetime import UTC, date, datetime
from sqlalchemy.orm import Session
from app.core.logging import get_logger
from app.core.security import hash_password
from app.models.employee import Employee
from app.models.role import Role
from app.repositories.employee import EmployeeRepository
from app.schemas.employee import (
EmployeeImportErrorRead,
EmployeeImportResultRead,
EmployeeImportSummaryRead,
)
from app.services.employee_spreadsheet import (
EmployeeImportRow,
EmployeeSpreadsheetError,
build_export_workbook_bytes,
parse_employee_workbook,
)
logger = get_logger("app.services.employee")
class EmployeeImportCoordinator:
def __init__(
self,
db: Session,
repository: EmployeeRepository,
*,
sorted_roles: Callable[[list[Role]], list[Role]],
append_change_log: Callable[..., None],  # invoked with keyword arguments (action=, owner=, occurred_at=)
format_date: Callable[[date | None], str | None],
format_datetime: Callable[[datetime | None], str | None],
default_password: str,
) -> None:
self.db = db
self.repository = repository
self.sorted_roles = sorted_roles
self.append_change_log = append_change_log
self.format_date = format_date
self.format_datetime = format_datetime
self.default_password = default_password
def export_employees(self, status: str | None = None, keyword: str | None = None) -> bytes:
employees = self.repository.list(status=status, keyword=keyword)
rows: list[list[str]] = []
for employee in employees:
organization = employee.organization_unit
role_codes = ",".join(role.role_code for role in self.sorted_roles(list(employee.roles)))
rows.append(
[
employee.employee_no,
employee.name,
employee.email,
employee.gender or "",
self.format_date(employee.birth_date) or "",
employee.phone or "",
self.format_date(employee.join_date) or "",
employee.location or "",
employee.position,
employee.grade,
organization.unit_code if organization else "",
employee.manager.employee_no if employee.manager else "",
employee.finance_owner_name or "",
employee.cost_center or "",
employee.employment_status,
role_codes,
]
)
return build_export_workbook_bytes(rows)
def import_employees(self, content: bytes, actor: str = "系统管理员") -> EmployeeImportResultRead:
parsed_rows, parse_errors = parse_employee_workbook(content)
if parse_errors:
return self._build_import_failure(parse_errors, total_rows=len(parsed_rows))
validation_errors = self._validate_import_rows(parsed_rows)
if validation_errors:
return self._build_import_failure(validation_errors, total_rows=len(parsed_rows))
try:
summary = self._apply_import_rows(parsed_rows, actor=actor)
except Exception:
self.db.rollback()
logger.exception("Employee import failed during database write")
raise
imported_at = self.format_datetime(datetime.now(UTC)) or ""
message = f"导入成功:新增 {summary['created']} 人,更新 {summary['updated']} 人。"
logger.info(
"Imported employees created=%d updated=%d total=%d",
summary["created"],
summary["updated"],
len(parsed_rows),
)
return EmployeeImportResultRead(
success=True,
message=message,
summary=EmployeeImportSummaryRead(
totalRows=len(parsed_rows),
created=summary["created"],
updated=summary["updated"],
errorCount=0,
),
errors=[],
importedAt=imported_at,
)
def _validate_import_rows(
self, rows: list[EmployeeImportRow]
) -> list[EmployeeSpreadsheetError]:
errors: list[EmployeeSpreadsheetError] = []
employee_nos_in_file: dict[str, int] = {}
emails_in_file: dict[str, int] = {}
roles_by_code = {role.role_code: role for role in self.repository.list_roles()}
organizations_by_code = {
unit.unit_code: unit for unit in self.repository.list_organization_units()
}
employees_by_no = {
employee.employee_no: employee for employee in self.repository.list()
}
import_employee_nos = {row.employee_no for row in rows}
for row in rows:
if row.employee_no in employee_nos_in_file:
errors.append(
EmployeeSpreadsheetError(
row=row.row_number,
column="员工编号*",
employee_no=row.employee_no,
message=f"员工编号 {row.employee_no} 在文件中重复。",
)
)
else:
employee_nos_in_file[row.employee_no] = row.row_number
if row.email in emails_in_file:
errors.append(
EmployeeSpreadsheetError(
row=row.row_number,
column="邮箱*",
employee_no=row.employee_no,
message=f"邮箱 {row.email} 在文件中重复。",
)
)
else:
emails_in_file[row.email] = row.row_number
existing_by_email = self.repository.get_by_email(row.email)
if existing_by_email is not None and existing_by_email.employee_no != row.employee_no:
errors.append(
EmployeeSpreadsheetError(
row=row.row_number,
column="邮箱*",
employee_no=row.employee_no,
message=(
f"邮箱 {row.email} 已被员工 "
f"{existing_by_email.employee_no} 使用。"
),
)
)
if row.organization_unit_code and row.organization_unit_code not in organizations_by_code:
errors.append(
EmployeeSpreadsheetError(
row=row.row_number,
column="部门编码",
employee_no=row.employee_no,
message=f"部门编码 {row.organization_unit_code} 不存在。",
)
)
if row.manager_employee_no:
manager_exists = (
row.manager_employee_no in employees_by_no
or row.manager_employee_no in import_employee_nos
)
if not manager_exists:
errors.append(
EmployeeSpreadsheetError(
row=row.row_number,
column="直属上级工号",
employee_no=row.employee_no,
message=f"直属上级工号 {row.manager_employee_no} 不存在。",
)
)
if row.manager_employee_no == row.employee_no:
errors.append(
EmployeeSpreadsheetError(
row=row.row_number,
column="直属上级工号",
employee_no=row.employee_no,
message="直属上级不能是员工本人。",
)
)
invalid_role_codes = [
code for code in row.role_codes if code not in roles_by_code
]
if invalid_role_codes:
errors.append(
EmployeeSpreadsheetError(
row=row.row_number,
column="角色编码",
employee_no=row.employee_no,
message=f"角色不存在:{', '.join(invalid_role_codes)}",
)
)
return errors
def _apply_import_rows(
self,
rows: list[EmployeeImportRow],
*,
actor: str,
) -> dict[str, int]:
roles_by_code = {role.role_code: role for role in self.repository.list_roles()}
organizations_by_code = {
unit.unit_code: unit for unit in self.repository.list_organization_units()
}
employees_by_no = {
employee.employee_no: employee for employee in self.repository.list()
}
created = 0
updated = 0
now = datetime.now(UTC)
try:
for row in rows:
employee = employees_by_no.get(row.employee_no)
is_new = employee is None
if is_new:
employee = Employee(
employee_no=row.employee_no,
name=row.name,
email=row.email,
password_hash=hash_password(self.default_password),
)
self.db.add(employee)
employees_by_no[row.employee_no] = employee
created += 1
else:
updated += 1
employee.name = row.name
employee.email = row.email
employee.gender = row.gender
employee.birth_date = row.birth_date
employee.phone = row.phone
employee.join_date = row.join_date
employee.location = row.location
employee.position = row.position
employee.grade = row.grade
employee.finance_owner_name = row.finance_owner_name
employee.cost_center = row.cost_center
employee.employment_status = row.employment_status
employee.sync_state = "已同步"
employee.last_sync_at = now
if row.organization_unit_code:
employee.organization_unit = organizations_by_code[row.organization_unit_code]
else:
employee.organization_unit = None
employee.roles = self.sorted_roles(
[roles_by_code[code] for code in row.role_codes if code in roles_by_code]
)
action = (
"通过 Excel 导入新建员工档案"
if is_new
else "通过 Excel 导入更新员工档案"
)
self.append_change_log(employee, action=action, owner=actor, occurred_at=now)
self.db.flush()
for row in rows:
employee = employees_by_no[row.employee_no]
if row.manager_employee_no:
employee.manager = employees_by_no.get(row.manager_employee_no)
else:
employee.manager = None
self.db.commit()
except Exception:
self.db.rollback()
raise
return {"created": created, "updated": updated}
def _build_import_failure(
self,
errors: list[EmployeeSpreadsheetError],
*,
total_rows: int,
) -> EmployeeImportResultRead:
error_reads = [
EmployeeImportErrorRead(
row=item.row,
column=item.column,
employeeNo=item.employee_no,
message=item.message,
)
for item in errors
]
return EmployeeImportResultRead(
success=False,
message=(
f"导入未执行:共发现 {len(error_reads)} 处错误,请修正后重新导入。"
"原有员工数据未变更。"
),
summary=EmployeeImportSummaryRead(
totalRows=total_rows,
created=0,
updated=0,
errorCount=len(error_reads),
),
errors=error_reads,
importedAt=None,
)
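
Review note: the coordinator validates every row before writing anything, then applies rows in two passes so that a manager defined later in the same file can still be linked. The sketch below shows that shape with plain in-memory objects. `Row`, `Person`, `validate_rows`, and `apply_rows` are hypothetical names for illustration and are not part of the codebase; the real implementation works on SQLAlchemy models and also checks emails, departments, and roles.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Row:
    """Stand-in for EmployeeImportRow, reduced to the fields needed here."""
    employee_no: str
    manager_employee_no: Optional[str] = None


@dataclass
class Person:
    """Stand-in for the Employee model."""
    employee_no: str
    manager: Optional["Person"] = None


def validate_rows(rows: List[Row], existing_nos: Set[str]) -> List[Tuple[int, str]]:
    """Collect all errors up front; nothing is written if any are found."""
    errors: List[Tuple[int, str]] = []
    seen: Set[str] = set()
    file_nos = {row.employee_no for row in rows}
    for index, row in enumerate(rows, start=1):
        if row.employee_no in seen:
            errors.append((index, "duplicate employee_no"))
        seen.add(row.employee_no)
        manager_no = row.manager_employee_no
        if manager_no:
            if manager_no == row.employee_no:
                errors.append((index, "manager is the employee"))
            elif manager_no not in existing_nos and manager_no not in file_nos:
                errors.append((index, "unknown manager"))
    return errors


def apply_rows(rows: List[Row], people_by_no: Dict[str, Person]) -> None:
    # Pass 1: create or reuse every person so all employee numbers resolve.
    for row in rows:
        people_by_no.setdefault(row.employee_no, Person(row.employee_no))
    # Pass 2: link managers, including ones that appear later in the file.
    for row in rows:
        person = people_by_no[row.employee_no]
        if row.manager_employee_no:
            person.manager = people_by_no.get(row.manager_employee_no)
        else:
            person.manager = None


if __name__ == "__main__":
    rows = [Row("E2", "E1"), Row("E1")]
    people: Dict[str, Person] = {}
    if not validate_rows(rows, existing_nos=set()):
        apply_rows(rows, people)
    print(people["E2"].manager.employee_no)
```

Splitting creation from linking avoids ordering requirements on the spreadsheet, at the cost of iterating the rows twice.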


@@ -1,986 +1,17 @@
from __future__ import annotations
ROLE_DISPLAY_ORDER = {
"manager": 1,
"finance": 2,
"approver": 3,
"executive": 4,
"auditor": 5,
"user": 6,
}
from app.services.employee_seed_roles import ROLE_DEFINITIONS, ROLE_DISPLAY_ORDER, ROLE_PERMISSION_MAP
from app.services.employee_seed_organizations import EMPLOYEE_PROFILE_REPAIRS, ORGANIZATION_DEFINITIONS
from app.services.employee_seed_part1 import EMPLOYEE_DEFINITIONS_PART_1
from app.services.employee_seed_part2 import EMPLOYEE_DEFINITIONS_PART_2
ROLE_DEFINITIONS = [
{
"role_code": "user",
"name": "使用者",
"description": "可以发起报销、查看个人单据和使用 AI 助手。",
},
{
"role_code": "finance",
"name": "财务人员",
"description": "可以处理复核、查看财务知识与风险校验结果。",
},
{
"role_code": "manager",
"name": "管理员",
"description": "可以维护员工档案、组织结构和角色权限。",
},
{
"role_code": "executive",
"name": "高级管理人员",
"description": "可以查看跨部门数据看板与关键审批结果。",
},
{
"role_code": "approver",
"name": "审批负责人",
"description": "可以处理审批中心中的待审单据。",
},
{
"role_code": "auditor",
"name": "审计观察员",
"description": "可以查看变更记录和权限调整历史。",
},
]
ROLE_PERMISSION_MAP = {
"user": ["可发起差旅申请与报销", "可查看个人单据与票据识别结果"],
"finance": ["可处理财务复核任务", "可查看风险校验与财务知识库"],
"manager": ["可维护员工档案与组织结构", "可配置系统角色与访问边界"],
"executive": ["可查看跨部门经营看板", "可处理高金额报销最终审批"],
"approver": ["可处理本部门待审单据", "可查看审批链路与 SLA 状态"],
"auditor": ["可查看权限变更与审计留痕", "可导出员工权限观察记录"],
}
ORGANIZATION_DEFINITIONS = [
{
"unit_code": "ORG-ROOT",
"name": "星海科技",
"unit_type": "company",
"parent_code": None,
"cost_center": "CC-0000",
"location": "上海",
"manager_name": "李文静",
},
{
"unit_code": "EXEC-OFFICE",
"name": "总经办",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-1001",
"location": "上海",
"manager_name": "李文静",
},
{
"unit_code": "FIN-SSC",
"name": "财务共享中心",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-2108",
"location": "上海",
"manager_name": "张晓晴",
},
{
"unit_code": "HR-OD",
"name": "人力与组织",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-3206",
"location": "杭州",
"manager_name": "陈硕",
},
{
"unit_code": "SALES-SOUTH",
"name": "华南销售部",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-4102",
"location": "深圳",
"manager_name": "陈嘉",
},
{
"unit_code": "SALES-EAST",
"name": "华东销售部",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-4108",
"location": "上海",
"manager_name": "秦墨然",
},
{
"unit_code": "MKT-BRAND",
"name": "市场品牌部",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-5203",
"location": "北京",
"manager_name": "刘思雨",
},
{
"unit_code": "RND-CENTER",
"name": "产品研发中心",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-6105",
"location": "北京",
"manager_name": "吴磊",
},
{
"unit_code": "OPS-ADMIN",
"name": "行政采购部",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-7204",
"location": "南京",
"manager_name": "梁雨辰",
},
{
"unit_code": "AUDIT-RISK",
"name": "风控与审计部",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-8102",
"location": "上海",
"manager_name": "顾承宇",
},
]
EMPLOYEE_DEFINITIONS = [
{
"employee_no": "E10018",
"name": "李文静",
"gender": "",
"birth_date": "1987-03-26",
"phone": "13900187688",
"email": "wenjing.li@xfinance.com",
"join_date": "2018-06-21",
"location": "上海",
"position": "高级财务总监",
"grade": "D2",
"organization_unit_code": "EXEC-OFFICE",
"manager_employee_no": None,
"finance_owner_name": "集团财务",
"cost_center": "CC-1001",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-05 16:20",
"last_sync_at": "2026-05-05 16:20",
"role_codes": ["executive", "approver"],
},
{
"employee_no": "E10234",
"name": "张晓晴",
"gender": "",
"birth_date": "1994-08-12",
"phone": "13810234567",
"email": "xiaoqing.zhang@xfinance.com",
"join_date": "2021-03-15",
"location": "上海",
"position": "费用运营经理",
"grade": "M3",
"organization_unit_code": "FIN-SSC",
"manager_employee_no": "E10018",
"finance_owner_name": "华东财务组",
"cost_center": "CC-2108",
"employment_status": "在职",
"sync_state": "待生效",
"spotlight": True,
"updated_at": "2026-05-06 10:24",
"last_sync_at": "2026-05-06 10:24",
"role_codes": ["manager", "finance", "approver"],
"history": [
{
"action": "新增“审批负责人”角色",
"owner": "系统管理员 · 王敏",
"occurred_at": "2026-05-06 10:24",
},
{
"action": "调整财务归口为华东财务组",
"owner": "组织管理员 · 陈硕",
"occurred_at": "2026-05-05 18:10",
},
],
},
{
"employee_no": "E10258",
"name": "孙楠",
"gender": "",
"birth_date": "1992-09-17",
"phone": "13722580312",
"email": "nan.sun@xfinance.com",
"join_date": "2020-11-09",
"location": "上海",
"position": "财务分析师",
"grade": "P5",
"organization_unit_code": "FIN-SSC",
"manager_employee_no": "E10234",
"finance_owner_name": "华东财务组",
"cost_center": "CC-2111",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-04 15:18",
"last_sync_at": "2026-05-04 15:18",
"role_codes": ["finance"],
},
{
"employee_no": "E10271",
"name": "周悦宁",
"gender": "",
"birth_date": "1993-04-21",
"phone": "13622711986",
"email": "yuening.zhou@xfinance.com",
"join_date": "2021-07-05",
"location": "上海",
"position": "财务系统专员",
"grade": "P5",
"organization_unit_code": "FIN-SSC",
"manager_employee_no": "E10234",
"finance_owner_name": "华东财务组",
"cost_center": "CC-2112",
"employment_status": "在职",
"sync_state": "同步中",
"spotlight": False,
"updated_at": "2026-05-07 09:35",
"last_sync_at": "2026-05-07 09:10",
"role_codes": ["finance", "auditor"],
},
{
"employee_no": "E10289",
"name": "高嘉禾",
"gender": "",
"birth_date": "1996-02-14",
"phone": "13522895642",
"email": "jiahe.gao@xfinance.com",
"join_date": "2023-01-10",
"location": "上海",
"position": "差旅合规专员",
"grade": "P4",
"organization_unit_code": "FIN-SSC",
"manager_employee_no": "E10234",
"finance_owner_name": "华东财务组",
"cost_center": "CC-2115",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-03 11:42",
"last_sync_at": "2026-05-03 11:42",
"role_codes": ["finance"],
},
{
"employee_no": "E10867",
"name": "王敏",
"gender": "",
"birth_date": "1996-11-05",
"phone": "13688671200",
"email": "min.wang@xfinance.com",
"join_date": "2022-08-08",
"location": "杭州",
"position": "组织发展主管",
"grade": "P6",
"organization_unit_code": "HR-OD",
"manager_employee_no": "E11618",
"finance_owner_name": "总部财务BP",
"cost_center": "CC-3206",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-05 09:18",
"last_sync_at": "2026-05-05 09:18",
"role_codes": ["manager", "auditor"],
},
{
"employee_no": "E11618",
"name": "陈硕",
"gender": "",
"birth_date": "1990-05-09",
"phone": "13816186540",
"email": "shuo.chen@xfinance.com",
"join_date": "2019-09-16",
"location": "杭州",
"position": "人力资源经理",
"grade": "M2",
"organization_unit_code": "HR-OD",
"manager_employee_no": "E10018",
"finance_owner_name": "总部财务BP",
"cost_center": "CC-3201",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-04 17:08",
"last_sync_at": "2026-05-04 17:08",
"role_codes": ["manager", "approver"],
},
{
"employee_no": "E12311",
"name": "何思成",
"gender": "",
"birth_date": "1998-07-19",
"phone": "13723117654",
"email": "sicheng.he@xfinance.com",
"join_date": "2026-02-17",
"location": "杭州",
"position": "HRBP",
"grade": "P4",
"organization_unit_code": "HR-OD",
"manager_employee_no": "E11618",
"finance_owner_name": "总部财务BP",
"cost_center": "CC-3208",
"employment_status": "试用中",
"sync_state": "待生效",
"spotlight": False,
"updated_at": "2026-05-07 08:42",
"last_sync_at": "2026-05-07 08:42",
"role_codes": ["user"],
},
{
"employee_no": "E11026",
"name": "刘思雨",
"gender": "",
"birth_date": "1991-12-03",
"phone": "13921036540",
"email": "siyu.liu@xfinance.com",
"join_date": "2020-04-13",
"location": "北京",
"position": "品牌市场经理",
"grade": "M2",
"organization_unit_code": "MKT-BRAND",
"manager_employee_no": "E10018",
"finance_owner_name": "市场财务BP",
"cost_center": "CC-5203",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 14:36",
"last_sync_at": "2026-05-06 14:36",
"role_codes": ["user", "approver"],
},
{
"employee_no": "E12408",
"name": "冯可欣",
"gender": "",
"birth_date": "1997-10-28",
"phone": "13624085542",
"email": "kexin.feng@xfinance.com",
"join_date": "2024-03-11",
"location": "北京",
"position": "品牌策划",
"grade": "P4",
"organization_unit_code": "MKT-BRAND",
"manager_employee_no": "E11026",
"finance_owner_name": "市场财务BP",
"cost_center": "CC-5207",
"employment_status": "在职",
"sync_state": "同步中",
"spotlight": False,
"updated_at": "2026-05-07 10:02",
"last_sync_at": "2026-05-07 09:48",
"role_codes": ["user"],
},
{
"employee_no": "E12419",
"name": "许泽航",
"gender": "",
"birth_date": "1995-05-15",
"phone": "13524199508",
"email": "zehang.xu@xfinance.com",
"join_date": "2023-06-19",
"location": "北京",
"position": "数字营销专员",
"grade": "P4",
"organization_unit_code": "MKT-BRAND",
"manager_employee_no": "E11026",
"finance_owner_name": "市场财务BP",
"cost_center": "CC-5209",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-03 16:52",
"last_sync_at": "2026-05-03 16:52",
"role_codes": ["user"],
},
{
"employee_no": "E11602",
"name": "陈嘉",
"gender": "",
"birth_date": "1997-02-18",
"phone": "13716029901",
"email": "jia.chen@xfinance.com",
"join_date": "2026-03-01",
"location": "深圳",
"position": "区域销售经理",
"grade": "M2",
"organization_unit_code": "SALES-SOUTH",
"manager_employee_no": "E10018",
"finance_owner_name": "华南财务组",
"cost_center": "CC-4102",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-04 14:12",
"last_sync_at": "2026-05-04 14:12",
"role_codes": ["user", "approver"],
},
{
"employee_no": "E12476",
"name": "马骁然",
"gender": "",
"birth_date": "1994-01-08",
"phone": "13824760139",
"email": "xiaoran.ma@xfinance.com",
"join_date": "2022-09-05",
"location": "深圳",
"position": "销售运营专家",
"grade": "P5",
"organization_unit_code": "SALES-SOUTH",
"manager_employee_no": "E11602",
"finance_owner_name": "华南财务组",
"cost_center": "CC-4106",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 18:15",
"last_sync_at": "2026-05-06 18:15",
"role_codes": ["user"],
},
{
"employee_no": "E12508",
"name": "唐子墨",
"gender": "",
"birth_date": "1996-06-11",
"phone": "13925088761",
"email": "zimo.tang@xfinance.com",
"join_date": "2024-02-26",
"location": "深圳",
"position": "大客户代表",
"grade": "P4",
"organization_unit_code": "SALES-SOUTH",
"manager_employee_no": "E11602",
"finance_owner_name": "华南财务组",
"cost_center": "CC-4109",
"employment_status": "停用",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-01 11:06",
"last_sync_at": "2026-05-01 11:06",
"role_codes": ["user"],
},
{
"employee_no": "E12514",
"name": "罗欣怡",
"gender": "",
"birth_date": "2000-03-02",
"phone": "13625141227",
"email": "xinyi.luo@xfinance.com",
"join_date": "2026-02-24",
"location": "深圳",
"position": "销售协调专员",
"grade": "P3",
"organization_unit_code": "SALES-SOUTH",
"manager_employee_no": "E11602",
"finance_owner_name": "华南财务组",
"cost_center": "CC-4112",
"employment_status": "试用中",
"sync_state": "待生效",
"spotlight": False,
"updated_at": "2026-05-05 15:42",
"last_sync_at": "2026-05-05 15:42",
"role_codes": ["user"],
},
{
"employee_no": "E11745",
"name": "吴磊",
"gender": "",
"birth_date": "1989-09-27",
"phone": "13817459812",
"email": "lei.wu@xfinance.com",
"join_date": "2019-12-09",
"location": "北京",
"position": "研发平台主管",
"grade": "M3",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E10018",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6105",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 13:08",
"last_sync_at": "2026-05-06 13:08",
"role_codes": ["user", "approver", "auditor"],
},
{
"employee_no": "E11991",
"name": "赵明",
"gender": "",
"birth_date": "1994-06-09",
"phone": "13519913300",
"email": "ming.zhao@xfinance.com",
"join_date": "2023-11-18",
"location": "北京",
"position": "产品经理",
"grade": "P5",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E11745",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6112",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-02 11:32",
"last_sync_at": "2026-05-02 11:32",
"role_codes": ["user"],
},
{
"employee_no": "E12611",
"name": "彭一凡",
"gender": "",
"birth_date": "1995-02-03",
"phone": "13726114588",
"email": "yifan.peng@xfinance.com",
"join_date": "2022-04-18",
"location": "北京",
"position": "后端工程师",
"grade": "P5",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E11745",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6114",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 09:44",
"last_sync_at": "2026-05-06 09:44",
"role_codes": ["user"],
},
{
"employee_no": "E12618",
"name": "苏清禾",
"gender": "",
"birth_date": "1994-12-25",
"phone": "13626188763",
"email": "qinghe.su@xfinance.com",
"join_date": "2022-05-16",
"location": "北京",
"position": "数据工程师",
"grade": "P5",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E11745",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6116",
"employment_status": "在职",
"sync_state": "同步中",
"spotlight": False,
"updated_at": "2026-05-07 10:26",
"last_sync_at": "2026-05-07 10:18",
"role_codes": ["user"],
},
{
"employee_no": "E12624",
"name": "沈知远",
"gender": "",
"birth_date": "1992-11-06",
"phone": "13926241855",
"email": "zhiyuan.shen@xfinance.com",
"join_date": "2021-11-22",
"location": "北京",
"position": "测试负责人",
"grade": "P6",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E11745",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6119",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-05 13:12",
"last_sync_at": "2026-05-05 13:12",
"role_codes": ["user"],
},
{
"employee_no": "E11852",
"name": "周晓彤",
"gender": "",
"birth_date": "1997-05-27",
"phone": "13818529954",
"email": "xiaotong.zhou@xfinance.com",
"join_date": "2022-06-30",
"location": "南京",
"position": "行政采购专员",
"grade": "P4",
"organization_unit_code": "OPS-ADMIN",
"manager_employee_no": "E12653",
"finance_owner_name": "行政财务BP",
"cost_center": "CC-7204",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-05 11:22",
"last_sync_at": "2026-05-05 11:22",
"role_codes": ["user"],
},
{
"employee_no": "E12653",
"name": "梁雨辰",
"gender": "",
"birth_date": "1991-08-30",
"phone": "13726539876",
"email": "yuchen.liang@xfinance.com",
"join_date": "2021-01-04",
"location": "南京",
"position": "行政运营经理",
"grade": "M1",
"organization_unit_code": "OPS-ADMIN",
"manager_employee_no": "E10018",
"finance_owner_name": "行政财务BP",
"cost_center": "CC-7201",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 17:44",
"last_sync_at": "2026-05-06 17:44",
"role_codes": ["user", "approver"],
},
{
"employee_no": "E12661",
"name": "顾承宇",
"gender": "",
"birth_date": "1988-04-16",
"phone": "13926614528",
"email": "chengyu.gu@xfinance.com",
"join_date": "2020-02-03",
"location": "上海",
"position": "风控审计经理",
"grade": "M2",
"organization_unit_code": "AUDIT-RISK",
"manager_employee_no": "E10018",
"finance_owner_name": "集团财务",
"cost_center": "CC-8102",
"employment_status": "在职",
"sync_state": "待生效",
"spotlight": True,
"updated_at": "2026-05-07 09:52",
"last_sync_at": "2026-05-07 09:52",
"role_codes": ["auditor", "finance"],
"history": [
{
"action": "更新审计观察范围",
"owner": "系统管理员 · 张晓晴",
"occurred_at": "2026-05-07 09:52",
},
{
"action": "补充高风险费用抽样规则",
"owner": "审计管理员 · 王敏",
"occurred_at": "2026-05-06 18:30",
},
],
},
{
"employee_no": "E12679",
"name": "郑若彤",
"gender": "",
"birth_date": "1997-09-13",
"phone": "13626794520",
"email": "ruotong.zheng@xfinance.com",
"join_date": "2024-01-08",
"location": "上海",
"position": "审计专员",
"grade": "P4",
"organization_unit_code": "AUDIT-RISK",
"manager_employee_no": "E12661",
"finance_owner_name": "集团财务",
"cost_center": "CC-8105",
"employment_status": "在职",
"sync_state": "同步中",
"spotlight": False,
"updated_at": "2026-05-07 08:58",
"last_sync_at": "2026-05-07 08:40",
"role_codes": ["auditor"],
},
{
"employee_no": "E12688",
"name": "方逸晨",
"gender": "",
"birth_date": "1995-01-20",
"phone": "13526881142",
"email": "yichen.fang@xfinance.com",
"join_date": "2023-08-14",
"location": "南京",
"position": "采购合规分析师",
"grade": "P4",
"organization_unit_code": "OPS-ADMIN",
"manager_employee_no": "E12653",
"finance_owner_name": "行政财务BP",
"cost_center": "CC-7208",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-03 14:16",
"last_sync_at": "2026-05-03 14:16",
"role_codes": ["user", "finance"],
},
{
"employee_no": "E12067",
"name": "秦墨然",
"gender": "",
"birth_date": "1990-10-10",
"phone": "13820674519",
"email": "moran.qin@xfinance.com",
"join_date": "2020-07-20",
"location": "上海",
"position": "华东销售总监",
"grade": "M2",
"organization_unit_code": "SALES-EAST",
"manager_employee_no": "E10018",
"finance_owner_name": "华东财务组",
"cost_center": "CC-4108",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 12:40",
"last_sync_at": "2026-05-06 12:40",
"role_codes": ["user", "approver"],
},
{
"employee_no": "E12703",
"name": "宋知夏",
"gender": "",
"birth_date": "1994-07-07",
"phone": "13727031129",
"email": "zhixia.song@xfinance.com",
"join_date": "2022-12-12",
"location": "上海",
"position": "重点客户经理",
"grade": "P5",
"organization_unit_code": "SALES-EAST",
"manager_employee_no": "E12067",
"finance_owner_name": "华东财务组",
"cost_center": "CC-4111",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-04 10:58",
"last_sync_at": "2026-05-04 10:58",
"role_codes": ["user"],
},
{
"employee_no": "E12716",
"name": "杜嘉宁",
"gender": "",
"birth_date": "1999-11-16",
"phone": "13627161248",
"email": "jianing.du@xfinance.com",
"join_date": "2026-01-19",
"location": "上海",
"position": "销售代表",
"grade": "P3",
"organization_unit_code": "SALES-EAST",
"manager_employee_no": "E12067",
"finance_owner_name": "华东财务组",
"cost_center": "CC-4114",
"employment_status": "试用中",
"sync_state": "待生效",
"spotlight": False,
"updated_at": "2026-05-05 12:26",
"last_sync_at": "2026-05-05 12:26",
"role_codes": ["user"],
},
{
"employee_no": "E12722",
"name": "邵宁远",
"gender": "",
"birth_date": "1998-12-01",
"phone": "13527221506",
"email": "ningyuan.shao@xfinance.com",
"join_date": "2026-02-08",
"location": "北京",
"position": "数据分析师",
"grade": "P4",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E11745",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6122",
"employment_status": "试用中",
"sync_state": "同步中",
"spotlight": False,
"updated_at": "2026-05-07 09:06",
"last_sync_at": "2026-05-07 08:55",
"role_codes": ["user"],
},
{
"employee_no": "E12739",
"name": "林可昕",
"gender": "",
"birth_date": "1996-10-23",
"phone": "13827394510",
"email": "kexin.lin@xfinance.com",
"join_date": "2023-04-17",
"location": "上海",
"position": "费用核算专员",
"grade": "P4",
"organization_unit_code": "FIN-SSC",
"manager_employee_no": "E10234",
"finance_owner_name": "华东财务组",
"cost_center": "CC-2118",
"employment_status": "停用",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-04-30 18:05",
"last_sync_at": "2026-04-30 18:05",
"role_codes": ["finance"],
},
{
"employee_no": "E12744",
"name": "赵予安",
"gender": "",
"birth_date": "1993-01-30",
"phone": "13727442139",
"email": "yuan.zhao@xfinance.com",
"join_date": "2021-10-11",
"location": "上海",
"position": "预算控制经理",
"grade": "M1",
"organization_unit_code": "FIN-SSC",
"manager_employee_no": "E10234",
"finance_owner_name": "集团财务",
"cost_center": "CC-2120",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 15:34",
"last_sync_at": "2026-05-06 15:34",
"role_codes": ["finance", "approver"],
},
{
"employee_no": "E12750",
"name": "谢知行",
"gender": "",
"birth_date": "1995-09-14",
"phone": "13627501386",
"email": "zhixing.xie@xfinance.com",
"join_date": "2022-07-25",
"location": "深圳",
"position": "渠道销售经理",
"grade": "P5",
"organization_unit_code": "SALES-SOUTH",
"manager_employee_no": "E11602",
"finance_owner_name": "华南财务组",
"cost_center": "CC-4116",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-04 09:48",
"last_sync_at": "2026-05-04 09:48",
"role_codes": ["user"],
},
{
"employee_no": "E12758",
"name": "顾南枝",
"gender": "",
"birth_date": "1994-04-12",
"phone": "13827584522",
"email": "nanzhi.gu@xfinance.com",
"join_date": "2022-05-09",
"location": "北京",
"position": "内容运营经理",
"grade": "P5",
"organization_unit_code": "MKT-BRAND",
"manager_employee_no": "E11026",
"finance_owner_name": "市场财务BP",
"cost_center": "CC-5211",
"employment_status": "在职",
"sync_state": "同步中",
"spotlight": False,
"updated_at": "2026-05-07 11:08",
"last_sync_at": "2026-05-07 10:50",
"role_codes": ["user"],
},
{
"employee_no": "E12763",
"name": "孟书言",
"gender": "",
"birth_date": "1992-02-09",
"phone": "13527633148",
"email": "shuyan.meng@xfinance.com",
"join_date": "2021-06-28",
"location": "北京",
"position": "架构工程师",
"grade": "P6",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E11745",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6125",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 19:05",
"last_sync_at": "2026-05-06 19:05",
"role_codes": ["user"],
},
{
"employee_no": "E12771",
"name": "孔令谦",
"gender": "",
"birth_date": "1993-07-18",
"phone": "13627711572",
"email": "lingqian.kong@xfinance.com",
"join_date": "2021-09-13",
"location": "南京",
"position": "供应商管理专员",
"grade": "P4",
"organization_unit_code": "OPS-ADMIN",
"manager_employee_no": "E12653",
"finance_owner_name": "行政财务BP",
"cost_center": "CC-7210",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-02 17:22",
"last_sync_at": "2026-05-02 17:22",
"role_codes": ["user"],
},
{
"employee_no": "E12782",
"name": "乔语岚",
"gender": "",
"birth_date": "1996-05-06",
"phone": "13727823045",
"email": "yulan.qiao@xfinance.com",
"join_date": "2023-03-06",
"location": "上海",
"position": "风控策略分析师",
"grade": "P4",
"organization_unit_code": "AUDIT-RISK",
"manager_employee_no": "E12661",
"finance_owner_name": "集团财务",
"cost_center": "CC-8108",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-03 13:18",
"last_sync_at": "2026-05-03 13:18",
"role_codes": ["auditor"],
},
{
"employee_no": "E12790",
"name": "邹闻韬",
"gender": "",
"birth_date": "1991-03-11",
"phone": "13827903167",
"email": "wentao.zou@xfinance.com",
"join_date": "2020-10-26",
"location": "上海",
"position": "合规产品负责人",
"grade": "P7",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E11745",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6128",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 08:56",
"last_sync_at": "2026-05-06 08:56",
"role_codes": ["user", "auditor"],
},
EMPLOYEE_DEFINITIONS = EMPLOYEE_DEFINITIONS_PART_1 + EMPLOYEE_DEFINITIONS_PART_2

__all__ = [
    "ROLE_DISPLAY_ORDER",
    "ROLE_DEFINITIONS",
    "ROLE_PERMISSION_MAP",
    "ORGANIZATION_DEFINITIONS",
    "EMPLOYEE_PROFILE_REPAIRS",
    "EMPLOYEE_DEFINITIONS",
]
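
Concatenating the two parts only gives a usable seed list if no `employee_no` appears in both. A minimal sketch of such a check follows; the helper name `find_duplicate_employee_nos` and the trimmed `PART_1` / `PART_2` lists are hypothetical stand-ins, not part of this commit.

```python
from collections import Counter

# Hypothetical, trimmed stand-ins for EMPLOYEE_DEFINITIONS_PART_1 / PART_2.
# Real records carry many more fields; only employee_no matters here.
PART_1 = [{"employee_no": "E10018"}, {"employee_no": "E10234"}]
PART_2 = [{"employee_no": "E12624"}, {"employee_no": "E12790"}]


def find_duplicate_employee_nos(*parts):
    """Return employee numbers that occur more than once across all parts."""
    counts = Counter(record["employee_no"] for part in parts for record in part)
    return sorted(no for no, seen in counts.items() if seen > 1)


merged = PART_1 + PART_2
duplicates = find_duplicate_employee_nos(PART_1, PART_2)
```

A check like this could run in a unit test so that a record accidentally copied into both parts is caught before seeding.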

@@ -0,0 +1,112 @@
from __future__ import annotations
ORGANIZATION_DEFINITIONS = [
{
"unit_code": "ORG-ROOT",
"name": "星海科技",
"unit_type": "company",
"parent_code": None,
"cost_center": "CC-0000",
"location": "上海",
"manager_name": "李文静",
},
{
"unit_code": "EXEC-OFFICE",
"name": "总经办",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-1001",
"location": "上海",
"manager_name": "李文静",
},
{
"unit_code": "FIN-SSC",
"name": "财务共享中心",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-2108",
"location": "上海",
"manager_name": "张晓晴",
},
{
"unit_code": "HR-OD",
"name": "人力与组织",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-3206",
"location": "杭州",
"manager_name": "陈硕",
},
{
"unit_code": "SALES-SOUTH",
"name": "华南销售部",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-4102",
"location": "深圳",
"manager_name": "陈嘉",
},
{
"unit_code": "SALES-EAST",
"name": "华东销售部",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-4108",
"location": "上海",
"manager_name": "秦墨然",
},
{
"unit_code": "MKT-BRAND",
"name": "市场品牌部",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-5203",
"location": "北京",
"manager_name": "刘思雨",
},
{
"unit_code": "RND-CENTER",
"name": "产品研发中心",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-6105",
"location": "北京",
"manager_name": "吴磊",
},
{
"unit_code": "OPS-ADMIN",
"name": "行政采购部",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-7204",
"location": "南京",
"manager_name": "梁雨辰",
},
{
"unit_code": "AUDIT-RISK",
"name": "风控与审计部",
"unit_type": "department",
"parent_code": "ORG-ROOT",
"cost_center": "CC-8102",
"location": "上海",
"manager_name": "顾承宇",
},
]
EMPLOYEE_PROFILE_REPAIRS = [
{
"employee_no": "E90919",
"name": "曹笑竹",
"email": "caoxiaozhu@xf.com",
"location": "武汉",
"position": "财务智能化产品经理",
"grade": "P5",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E11745",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6112",
"employment_status": "在职",
"sync_state": "已同步",
"role_codes": ["user"],
},
]
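
Each organization record points at its parent through `parent_code`, with `None` marking the root. The sketch below shows one way to index children by parent and reject a parent code that is not defined; `build_children_index` is a hypothetical helper and `ORGS` is a trimmed copy of three records, assumed for illustration only.

```python
from collections import defaultdict

# Trimmed copies of three ORGANIZATION_DEFINITIONS records (illustration only).
ORGS = [
    {"unit_code": "ORG-ROOT", "parent_code": None},
    {"unit_code": "FIN-SSC", "parent_code": "ORG-ROOT"},
    {"unit_code": "RND-CENTER", "parent_code": "ORG-ROOT"},
]


def build_children_index(units):
    """Map each parent unit_code to the unit_codes of its direct children."""
    known = {unit["unit_code"] for unit in units}
    children = defaultdict(list)
    for unit in units:
        parent = unit["parent_code"]
        if parent is None:
            continue  # the root has no parent
        if parent not in known:
            raise ValueError(f"unknown parent_code: {parent}")
        children[parent].append(unit["unit_code"])
    return dict(children)


index = build_children_index(ORGS)
```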

@@ -0,0 +1,434 @@
from __future__ import annotations
EMPLOYEE_DEFINITIONS_PART_1 = [
{
"employee_no": "E10018",
"name": "李文静",
"gender": "",
"birth_date": "1987-03-26",
"phone": "13900187688",
"email": "wenjing.li@xfinance.com",
"join_date": "2018-06-21",
"location": "上海",
"position": "高级财务总监",
"grade": "D2",
"organization_unit_code": "EXEC-OFFICE",
"manager_employee_no": None,
"finance_owner_name": "集团财务",
"cost_center": "CC-1001",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-05 16:20",
"last_sync_at": "2026-05-05 16:20",
"role_codes": ["executive", "approver"],
},
{
"employee_no": "E10234",
"name": "张晓晴",
"gender": "",
"birth_date": "1994-08-12",
"phone": "13810234567",
"email": "xiaoqing.zhang@xfinance.com",
"join_date": "2021-03-15",
"location": "上海",
"position": "费用运营经理",
"grade": "M3",
"organization_unit_code": "FIN-SSC",
"manager_employee_no": "E10018",
"finance_owner_name": "华东财务组",
"cost_center": "CC-2108",
"employment_status": "在职",
"sync_state": "待生效",
"spotlight": True,
"updated_at": "2026-05-06 10:24",
"last_sync_at": "2026-05-06 10:24",
"role_codes": ["manager", "finance", "approver"],
"history": [
{
"action": "新增“审批负责人”角色",
"owner": "系统管理员 · 王敏",
"occurred_at": "2026-05-06 10:24",
},
{
"action": "调整财务归口为华东财务组",
"owner": "组织管理员 · 陈硕",
"occurred_at": "2026-05-05 18:10",
},
],
},
{
"employee_no": "E10258",
"name": "孙楠",
"gender": "",
"birth_date": "1992-09-17",
"phone": "13722580312",
"email": "nan.sun@xfinance.com",
"join_date": "2020-11-09",
"location": "上海",
"position": "财务分析师",
"grade": "P5",
"organization_unit_code": "FIN-SSC",
"manager_employee_no": "E10234",
"finance_owner_name": "华东财务组",
"cost_center": "CC-2111",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-04 15:18",
"last_sync_at": "2026-05-04 15:18",
"role_codes": ["finance"],
},
{
"employee_no": "E10271",
"name": "周悦宁",
"gender": "",
"birth_date": "1993-04-21",
"phone": "13622711986",
"email": "yuening.zhou@xfinance.com",
"join_date": "2021-07-05",
"location": "上海",
"position": "财务系统专员",
"grade": "P5",
"organization_unit_code": "FIN-SSC",
"manager_employee_no": "E10234",
"finance_owner_name": "华东财务组",
"cost_center": "CC-2112",
"employment_status": "在职",
"sync_state": "同步中",
"spotlight": False,
"updated_at": "2026-05-07 09:35",
"last_sync_at": "2026-05-07 09:10",
"role_codes": ["finance", "auditor"],
},
{
"employee_no": "E10289",
"name": "高嘉禾",
"gender": "",
"birth_date": "1996-02-14",
"phone": "13522895642",
"email": "jiahe.gao@xfinance.com",
"join_date": "2023-01-10",
"location": "上海",
"position": "差旅合规专员",
"grade": "P4",
"organization_unit_code": "FIN-SSC",
"manager_employee_no": "E10234",
"finance_owner_name": "华东财务组",
"cost_center": "CC-2115",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-03 11:42",
"last_sync_at": "2026-05-03 11:42",
"role_codes": ["finance"],
},
{
"employee_no": "E10867",
"name": "王敏",
"gender": "",
"birth_date": "1996-11-05",
"phone": "13688671200",
"email": "min.wang@xfinance.com",
"join_date": "2022-08-08",
"location": "杭州",
"position": "组织发展主管",
"grade": "P6",
"organization_unit_code": "HR-OD",
"manager_employee_no": "E11618",
"finance_owner_name": "总部财务BP",
"cost_center": "CC-3206",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-05 09:18",
"last_sync_at": "2026-05-05 09:18",
"role_codes": ["manager", "auditor"],
},
{
"employee_no": "E11618",
"name": "陈硕",
"gender": "",
"birth_date": "1990-05-09",
"phone": "13816186540",
"email": "shuo.chen@xfinance.com",
"join_date": "2019-09-16",
"location": "杭州",
"position": "人力资源经理",
"grade": "M2",
"organization_unit_code": "HR-OD",
"manager_employee_no": "E10018",
"finance_owner_name": "总部财务BP",
"cost_center": "CC-3201",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-04 17:08",
"last_sync_at": "2026-05-04 17:08",
"role_codes": ["manager", "approver"],
},
{
"employee_no": "E12311",
"name": "何思成",
"gender": "",
"birth_date": "1998-07-19",
"phone": "13723117654",
"email": "sicheng.he@xfinance.com",
"join_date": "2026-02-17",
"location": "杭州",
"position": "HRBP",
"grade": "P4",
"organization_unit_code": "HR-OD",
"manager_employee_no": "E11618",
"finance_owner_name": "总部财务BP",
"cost_center": "CC-3208",
"employment_status": "试用中",
"sync_state": "待生效",
"spotlight": False,
"updated_at": "2026-05-07 08:42",
"last_sync_at": "2026-05-07 08:42",
"role_codes": ["user"],
},
{
"employee_no": "E11026",
"name": "刘思雨",
"gender": "",
"birth_date": "1991-12-03",
"phone": "13921036540",
"email": "siyu.liu@xfinance.com",
"join_date": "2020-04-13",
"location": "北京",
"position": "品牌市场经理",
"grade": "M2",
"organization_unit_code": "MKT-BRAND",
"manager_employee_no": "E10018",
"finance_owner_name": "市场财务BP",
"cost_center": "CC-5203",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 14:36",
"last_sync_at": "2026-05-06 14:36",
"role_codes": ["user", "approver"],
},
{
"employee_no": "E12408",
"name": "冯可欣",
"gender": "",
"birth_date": "1997-10-28",
"phone": "13624085542",
"email": "kexin.feng@xfinance.com",
"join_date": "2024-03-11",
"location": "北京",
"position": "品牌策划",
"grade": "P4",
"organization_unit_code": "MKT-BRAND",
"manager_employee_no": "E11026",
"finance_owner_name": "市场财务BP",
"cost_center": "CC-5207",
"employment_status": "在职",
"sync_state": "同步中",
"spotlight": False,
"updated_at": "2026-05-07 10:02",
"last_sync_at": "2026-05-07 09:48",
"role_codes": ["user"],
},
{
"employee_no": "E12419",
"name": "许泽航",
"gender": "",
"birth_date": "1995-05-15",
"phone": "13524199508",
"email": "zehang.xu@xfinance.com",
"join_date": "2023-06-19",
"location": "北京",
"position": "数字营销专员",
"grade": "P4",
"organization_unit_code": "MKT-BRAND",
"manager_employee_no": "E11026",
"finance_owner_name": "市场财务BP",
"cost_center": "CC-5209",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-03 16:52",
"last_sync_at": "2026-05-03 16:52",
"role_codes": ["user"],
},
{
"employee_no": "E11602",
"name": "陈嘉",
"gender": "",
"birth_date": "1997-02-18",
"phone": "13716029901",
"email": "jia.chen@xfinance.com",
"join_date": "2026-03-01",
"location": "深圳",
"position": "区域销售经理",
"grade": "M2",
"organization_unit_code": "SALES-SOUTH",
"manager_employee_no": "E10018",
"finance_owner_name": "华南财务组",
"cost_center": "CC-4102",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-04 14:12",
"last_sync_at": "2026-05-04 14:12",
"role_codes": ["user", "approver"],
},
{
"employee_no": "E12476",
"name": "马骁然",
"gender": "",
"birth_date": "1994-01-08",
"phone": "13824760139",
"email": "xiaoran.ma@xfinance.com",
"join_date": "2022-09-05",
"location": "深圳",
"position": "销售运营专家",
"grade": "P5",
"organization_unit_code": "SALES-SOUTH",
"manager_employee_no": "E11602",
"finance_owner_name": "华南财务组",
"cost_center": "CC-4106",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 18:15",
"last_sync_at": "2026-05-06 18:15",
"role_codes": ["user"],
},
{
"employee_no": "E12508",
"name": "唐子墨",
"gender": "",
"birth_date": "1996-06-11",
"phone": "13925088761",
"email": "zimo.tang@xfinance.com",
"join_date": "2024-02-26",
"location": "深圳",
"position": "大客户代表",
"grade": "P4",
"organization_unit_code": "SALES-SOUTH",
"manager_employee_no": "E11602",
"finance_owner_name": "华南财务组",
"cost_center": "CC-4109",
"employment_status": "停用",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-01 11:06",
"last_sync_at": "2026-05-01 11:06",
"role_codes": ["user"],
},
{
"employee_no": "E12514",
"name": "罗欣怡",
"gender": "",
"birth_date": "2000-03-02",
"phone": "13625141227",
"email": "xinyi.luo@xfinance.com",
"join_date": "2026-02-24",
"location": "深圳",
"position": "销售协调专员",
"grade": "P3",
"organization_unit_code": "SALES-SOUTH",
"manager_employee_no": "E11602",
"finance_owner_name": "华南财务组",
"cost_center": "CC-4112",
"employment_status": "试用中",
"sync_state": "待生效",
"spotlight": False,
"updated_at": "2026-05-05 15:42",
"last_sync_at": "2026-05-05 15:42",
"role_codes": ["user"],
},
{
"employee_no": "E11745",
"name": "吴磊",
"gender": "",
"birth_date": "1989-09-27",
"phone": "13817459812",
"email": "lei.wu@xfinance.com",
"join_date": "2019-12-09",
"location": "北京",
"position": "研发平台主管",
"grade": "M3",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E10018",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6105",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 13:08",
"last_sync_at": "2026-05-06 13:08",
"role_codes": ["user", "approver", "auditor"],
},
{
"employee_no": "E11991",
"name": "赵明",
"gender": "",
"birth_date": "1994-06-09",
"phone": "13519913300",
"email": "ming.zhao@xfinance.com",
"join_date": "2023-11-18",
"location": "北京",
"position": "产品经理",
"grade": "P5",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E11745",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6112",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-02 11:32",
"last_sync_at": "2026-05-02 11:32",
"role_codes": ["user"],
},
{
"employee_no": "E12611",
"name": "彭一凡",
"gender": "",
"birth_date": "1995-02-03",
"phone": "13726114588",
"email": "yifan.peng@xfinance.com",
"join_date": "2022-04-18",
"location": "北京",
"position": "后端工程师",
"grade": "P5",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E11745",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6114",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 09:44",
"last_sync_at": "2026-05-06 09:44",
"role_codes": ["user"],
},
{
"employee_no": "E12618",
"name": "苏清禾",
"gender": "",
"birth_date": "1994-12-25",
"phone": "13626188763",
"email": "qinghe.su@xfinance.com",
"join_date": "2022-05-16",
"location": "北京",
"position": "数据工程师",
"grade": "P5",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E11745",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6116",
"employment_status": "在职",
"sync_state": "同步中",
"spotlight": False,
"updated_at": "2026-05-07 10:26",
"last_sync_at": "2026-05-07 10:18",
"role_codes": ["user"],
},
]
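
In the records above, entries whose `sync_state` is `同步中` appear to carry an `updated_at` later than `last_sync_at`, while the others have equal timestamps. A small sketch of computing that gap, assuming both fields always use the `YYYY-MM-DD HH:MM` layout seen in the seed data; `sync_lag` is a hypothetical helper.

```python
from datetime import datetime, timedelta

# Assumed layout of updated_at / last_sync_at, inferred from the seed values.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"


def sync_lag(record):
    """Return how far updated_at is ahead of last_sync_at (never negative)."""
    updated = datetime.strptime(record["updated_at"], TIMESTAMP_FORMAT)
    synced = datetime.strptime(record["last_sync_at"], TIMESTAMP_FORMAT)
    return max(updated - synced, timedelta(0))


# Timestamps copied from the E10271 record.
record = {
    "employee_no": "E10271",
    "updated_at": "2026-05-07 09:35",
    "last_sync_at": "2026-05-07 09:10",
}
lag = sync_lag(record)
```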

@@ -0,0 +1,412 @@
from __future__ import annotations
EMPLOYEE_DEFINITIONS_PART_2 = [
{
"employee_no": "E12624",
"name": "沈知远",
"gender": "",
"birth_date": "1992-11-06",
"phone": "13926241855",
"email": "zhiyuan.shen@xfinance.com",
"join_date": "2021-11-22",
"location": "北京",
"position": "测试负责人",
"grade": "P6",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E11745",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6119",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-05 13:12",
"last_sync_at": "2026-05-05 13:12",
"role_codes": ["user"],
},
{
"employee_no": "E11852",
"name": "周晓彤",
"gender": "",
"birth_date": "1997-05-27",
"phone": "13818529954",
"email": "xiaotong.zhou@xfinance.com",
"join_date": "2022-06-30",
"location": "南京",
"position": "行政采购专员",
"grade": "P4",
"organization_unit_code": "OPS-ADMIN",
"manager_employee_no": "E12653",
"finance_owner_name": "行政财务BP",
"cost_center": "CC-7204",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-05 11:22",
"last_sync_at": "2026-05-05 11:22",
"role_codes": ["user"],
},
{
"employee_no": "E12653",
"name": "梁雨辰",
"gender": "",
"birth_date": "1991-08-30",
"phone": "13726539876",
"email": "yuchen.liang@xfinance.com",
"join_date": "2021-01-04",
"location": "南京",
"position": "行政运营经理",
"grade": "M1",
"organization_unit_code": "OPS-ADMIN",
"manager_employee_no": "E10018",
"finance_owner_name": "行政财务BP",
"cost_center": "CC-7201",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 17:44",
"last_sync_at": "2026-05-06 17:44",
"role_codes": ["user", "approver"],
},
{
"employee_no": "E12661",
"name": "顾承宇",
"gender": "",
"birth_date": "1988-04-16",
"phone": "13926614528",
"email": "chengyu.gu@xfinance.com",
"join_date": "2020-02-03",
"location": "上海",
"position": "风控审计经理",
"grade": "M2",
"organization_unit_code": "AUDIT-RISK",
"manager_employee_no": "E10018",
"finance_owner_name": "集团财务",
"cost_center": "CC-8102",
"employment_status": "在职",
"sync_state": "待生效",
"spotlight": True,
"updated_at": "2026-05-07 09:52",
"last_sync_at": "2026-05-07 09:52",
"role_codes": ["auditor", "finance"],
"history": [
{
"action": "更新审计观察范围",
"owner": "系统管理员 · 张晓晴",
"occurred_at": "2026-05-07 09:52",
},
{
"action": "补充高风险费用抽样规则",
"owner": "审计管理员 · 王敏",
"occurred_at": "2026-05-06 18:30",
},
],
},
{
"employee_no": "E12679",
"name": "郑若彤",
"gender": "",
"birth_date": "1997-09-13",
"phone": "13626794520",
"email": "ruotong.zheng@xfinance.com",
"join_date": "2024-01-08",
"location": "上海",
"position": "审计专员",
"grade": "P4",
"organization_unit_code": "AUDIT-RISK",
"manager_employee_no": "E12661",
"finance_owner_name": "集团财务",
"cost_center": "CC-8105",
"employment_status": "在职",
"sync_state": "同步中",
"spotlight": False,
"updated_at": "2026-05-07 08:58",
"last_sync_at": "2026-05-07 08:40",
"role_codes": ["auditor"],
},
{
"employee_no": "E12688",
"name": "方逸晨",
"gender": "",
"birth_date": "1995-01-20",
"phone": "13526881142",
"email": "yichen.fang@xfinance.com",
"join_date": "2023-08-14",
"location": "南京",
"position": "采购合规分析师",
"grade": "P4",
"organization_unit_code": "OPS-ADMIN",
"manager_employee_no": "E12653",
"finance_owner_name": "行政财务BP",
"cost_center": "CC-7208",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-03 14:16",
"last_sync_at": "2026-05-03 14:16",
"role_codes": ["user", "finance"],
},
{
"employee_no": "E12067",
"name": "秦墨然",
"gender": "",
"birth_date": "1990-10-10",
"phone": "13820674519",
"email": "moran.qin@xfinance.com",
"join_date": "2020-07-20",
"location": "上海",
"position": "华东销售总监",
"grade": "M2",
"organization_unit_code": "SALES-EAST",
"manager_employee_no": "E10018",
"finance_owner_name": "华东财务组",
"cost_center": "CC-4108",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 12:40",
"last_sync_at": "2026-05-06 12:40",
"role_codes": ["user", "approver"],
},
{
"employee_no": "E12703",
"name": "宋知夏",
"gender": "",
"birth_date": "1994-07-07",
"phone": "13727031129",
"email": "zhixia.song@xfinance.com",
"join_date": "2022-12-12",
"location": "上海",
"position": "重点客户经理",
"grade": "P5",
"organization_unit_code": "SALES-EAST",
"manager_employee_no": "E12067",
"finance_owner_name": "华东财务组",
"cost_center": "CC-4111",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-04 10:58",
"last_sync_at": "2026-05-04 10:58",
"role_codes": ["user"],
},
{
"employee_no": "E12716",
"name": "杜嘉宁",
"gender": "",
"birth_date": "1999-11-16",
"phone": "13627161248",
"email": "jianing.du@xfinance.com",
"join_date": "2026-01-19",
"location": "上海",
"position": "销售代表",
"grade": "P3",
"organization_unit_code": "SALES-EAST",
"manager_employee_no": "E12067",
"finance_owner_name": "华东财务组",
"cost_center": "CC-4114",
"employment_status": "试用中",
"sync_state": "待生效",
"spotlight": False,
"updated_at": "2026-05-05 12:26",
"last_sync_at": "2026-05-05 12:26",
"role_codes": ["user"],
},
{
"employee_no": "E12722",
"name": "邵宁远",
"gender": "",
"birth_date": "1998-12-01",
"phone": "13527221506",
"email": "ningyuan.shao@xfinance.com",
"join_date": "2026-02-08",
"location": "北京",
"position": "数据分析师",
"grade": "P4",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E11745",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6122",
"employment_status": "试用中",
"sync_state": "同步中",
"spotlight": False,
"updated_at": "2026-05-07 09:06",
"last_sync_at": "2026-05-07 08:55",
"role_codes": ["user"],
},
{
"employee_no": "E12739",
"name": "林可昕",
"gender": "",
"birth_date": "1996-10-23",
"phone": "13827394510",
"email": "kexin.lin@xfinance.com",
"join_date": "2023-04-17",
"location": "上海",
"position": "费用核算专员",
"grade": "P4",
"organization_unit_code": "FIN-SSC",
"manager_employee_no": "E10234",
"finance_owner_name": "华东财务组",
"cost_center": "CC-2118",
"employment_status": "停用",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-04-30 18:05",
"last_sync_at": "2026-04-30 18:05",
"role_codes": ["finance"],
},
{
"employee_no": "E12744",
"name": "赵予安",
"gender": "",
"birth_date": "1993-01-30",
"phone": "13727442139",
"email": "yuan.zhao@xfinance.com",
"join_date": "2021-10-11",
"location": "上海",
"position": "预算控制经理",
"grade": "M1",
"organization_unit_code": "FIN-SSC",
"manager_employee_no": "E10234",
"finance_owner_name": "集团财务",
"cost_center": "CC-2120",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 15:34",
"last_sync_at": "2026-05-06 15:34",
"role_codes": ["finance", "approver"],
},
{
"employee_no": "E12750",
"name": "谢知行",
"gender": "",
"birth_date": "1995-09-14",
"phone": "13627501386",
"email": "zhixing.xie@xfinance.com",
"join_date": "2022-07-25",
"location": "深圳",
"position": "渠道销售经理",
"grade": "P5",
"organization_unit_code": "SALES-SOUTH",
"manager_employee_no": "E11602",
"finance_owner_name": "华南财务组",
"cost_center": "CC-4116",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-04 09:48",
"last_sync_at": "2026-05-04 09:48",
"role_codes": ["user"],
},
{
"employee_no": "E12758",
"name": "顾南枝",
"gender": "",
"birth_date": "1994-04-12",
"phone": "13827584522",
"email": "nanzhi.gu@xfinance.com",
"join_date": "2022-05-09",
"location": "北京",
"position": "内容运营经理",
"grade": "P5",
"organization_unit_code": "MKT-BRAND",
"manager_employee_no": "E11026",
"finance_owner_name": "市场财务BP",
"cost_center": "CC-5211",
"employment_status": "在职",
"sync_state": "同步中",
"spotlight": False,
"updated_at": "2026-05-07 11:08",
"last_sync_at": "2026-05-07 10:50",
"role_codes": ["user"],
},
{
"employee_no": "E12763",
"name": "孟书言",
"gender": "",
"birth_date": "1992-02-09",
"phone": "13527633148",
"email": "shuyan.meng@xfinance.com",
"join_date": "2021-06-28",
"location": "北京",
"position": "架构工程师",
"grade": "P6",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E11745",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6125",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 19:05",
"last_sync_at": "2026-05-06 19:05",
"role_codes": ["user"],
},
{
"employee_no": "E12771",
"name": "孔令谦",
"gender": "",
"birth_date": "1993-07-18",
"phone": "13627711572",
"email": "lingqian.kong@xfinance.com",
"join_date": "2021-09-13",
"location": "南京",
"position": "供应商管理专员",
"grade": "P4",
"organization_unit_code": "OPS-ADMIN",
"manager_employee_no": "E12653",
"finance_owner_name": "行政财务BP",
"cost_center": "CC-7210",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-02 17:22",
"last_sync_at": "2026-05-02 17:22",
"role_codes": ["user"],
},
{
"employee_no": "E12782",
"name": "乔语岚",
"gender": "",
"birth_date": "1996-05-06",
"phone": "13727823045",
"email": "yulan.qiao@xfinance.com",
"join_date": "2023-03-06",
"location": "上海",
"position": "风控策略分析师",
"grade": "P4",
"organization_unit_code": "AUDIT-RISK",
"manager_employee_no": "E12661",
"finance_owner_name": "集团财务",
"cost_center": "CC-8108",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-03 13:18",
"last_sync_at": "2026-05-03 13:18",
"role_codes": ["auditor"],
},
{
"employee_no": "E12790",
"name": "邹闻韬",
"gender": "",
"birth_date": "1991-03-11",
"phone": "13827903167",
"email": "wentao.zou@xfinance.com",
"join_date": "2020-10-26",
"location": "上海",
"position": "合规产品负责人",
"grade": "P7",
"organization_unit_code": "RND-CENTER",
"manager_employee_no": "E11745",
"finance_owner_name": "研发财务BP",
"cost_center": "CC-6128",
"employment_status": "在职",
"sync_state": "已同步",
"spotlight": False,
"updated_at": "2026-05-06 08:56",
"last_sync_at": "2026-05-06 08:56",
"role_codes": ["user", "auditor"],
},
]
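
Records in this part reference managers such as `E11745` and `E10018` that are defined in the other part, so `manager_employee_no` can only be validated against the merged list. A minimal sketch of that check, with the hypothetical helper `find_dangling_managers` and trimmed sample records:

```python
# Trimmed sample records; real entries carry many more fields.
part_1 = [
    {"employee_no": "E10018", "manager_employee_no": None},
    {"employee_no": "E11745", "manager_employee_no": "E10018"},
]
part_2 = [
    {"employee_no": "E12624", "manager_employee_no": "E11745"},
]


def find_dangling_managers(employees):
    """Return manager numbers that do not match any employee_no in the list."""
    known = {employee["employee_no"] for employee in employees}
    missing = {
        employee["manager_employee_no"]
        for employee in employees
        if employee["manager_employee_no"] is not None
        and employee["manager_employee_no"] not in known
    }
    return sorted(missing)


# Checked alone, part 2 looks broken; checked after merging, it resolves.
dangling_in_part_2 = find_dangling_managers(part_2)
dangling_in_merged = find_dangling_managers(part_1 + part_2)
```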

@@ -0,0 +1,52 @@
from __future__ import annotations
ROLE_DISPLAY_ORDER = {
"manager": 1,
"finance": 2,
"approver": 3,
"executive": 4,
"auditor": 5,
"user": 6,
}
ROLE_DEFINITIONS = [
{
"role_code": "user",
"name": "使用者",
"description": "可以发起报销、查看个人单据和使用 AI 助手。",
},
{
"role_code": "finance",
"name": "财务人员",
"description": "可以处理复核、查看财务知识与风险校验结果。",
},
{
"role_code": "manager",
"name": "管理员",
"description": "可以维护员工档案、组织结构和角色权限。",
},
{
"role_code": "executive",
"name": "高级管理人员",
"description": "可以查看跨部门数据看板与关键审批结果。",
},
{
"role_code": "approver",
"name": "审批负责人",
"description": "可以处理审批中心中的待审单据。",
},
{
"role_code": "auditor",
"name": "审计观察员",
"description": "可以查看变更记录和权限调整历史。",
},
]
ROLE_PERMISSION_MAP = {
"user": ["可发起差旅申请与报销", "可查看个人单据与票据识别结果"],
"finance": ["可处理财务复核任务", "可查看风险校验与财务知识库"],
"manager": ["可维护员工档案与组织结构", "可配置系统角色与访问边界"],
"executive": ["可查看跨部门经营看板", "可处理高金额报销最终审批"],
"approver": ["可处理本部门待审单据", "可查看审批链路与 SLA 状态"],
"auditor": ["可查看权限变更与审计留痕", "可导出员工权限观察记录"],
}
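
A minimal sketch of how a display-order map such as `ROLE_DISPLAY_ORDER` could be used to sort role codes before rendering. The helper name `sort_role_codes` and the fallback rank `UNKNOWN_ROLE_RANK` are assumptions made for this illustration; neither appears in the commit.

```python
from __future__ import annotations

# Copied from the constants above so the sketch is self-contained.
ROLE_DISPLAY_ORDER = {
    "manager": 1,
    "finance": 2,
    "approver": 3,
    "executive": 4,
    "auditor": 5,
    "user": 6,
}

# Hypothetical fallback: unknown role codes sort after every known role.
UNKNOWN_ROLE_RANK = 99


def sort_role_codes(role_codes: list[str]) -> list[str]:
    """Order role codes by display rank, breaking ties alphabetically."""
    return sorted(
        role_codes,
        key=lambda code: (ROLE_DISPLAY_ORDER.get(code, UNKNOWN_ROLE_RANK), code),
    )


if __name__ == "__main__":
    print(sort_role_codes(["user", "auditor", "finance"]))
```

Sorting on a `(rank, code)` tuple keeps the result deterministic even when two codes share the fallback rank.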

@@ -0,0 +1,126 @@
from __future__ import annotations
from collections.abc import Callable
from datetime import date, datetime
from app.models.employee import Employee
from app.models.employee_change_log import EmployeeChangeLog
from app.models.role import Role
from app.schemas.employee import (
EmployeeHistoryRead,
EmployeeOrganizationRead,
EmployeeRead,
)
def serialize_employee(
employee: Employee,
*,
sorted_roles: list[Role],
sorted_change_logs: list[EmployeeChangeLog],
format_date: Callable[[date | None], str | None],
format_datetime: Callable[[datetime | None], str | None],
format_history_datetime: Callable[[datetime | None], str],
role_permission_map: dict[str, list[str]],
status_tone_map: dict[str, str],
max_change_logs: int,
) -> EmployeeRead:
organization = employee.organization_unit
role_labels = [role.name for role in sorted_roles]
role_codes = [role.role_code for role in sorted_roles]
history = [
EmployeeHistoryRead(
action=item.action,
owner=item.owner,
time=format_history_datetime(item.occurred_at),
occurredAt=format_history_datetime(item.occurred_at),
)
for item in sorted_change_logs[:max_change_logs]
]
return EmployeeRead(
id=employee.id,
avatar=(employee.name or "?")[:1],
name=employee.name,
employeeNo=employee.employee_no,
department=organization.name if organization else "",
position=employee.position,
grade=employee.grade,
manager=employee.manager.name if employee.manager else "CEO",
managerEmployeeNo=employee.manager.employee_no if employee.manager else None,
financeOwner=employee.finance_owner_name or "",
roles=role_labels,
roleCodes=role_codes,
status=employee.employment_status,
statusTone=status_tone_map.get(employee.employment_status, "neutral"),
gender=employee.gender,
age=calculate_age(employee.birth_date),
birthDate=format_date(employee.birth_date),
email=employee.email,
phone=employee.phone,
joinDate=format_date(employee.join_date),
location=employee.location,
costCenter=employee.cost_center,
updatedAt=format_datetime(employee.updated_at or employee.created_at),
lastSync=format_datetime(employee.last_sync_at),
syncState=employee.sync_state,
spotlight=employee.spotlight,
permissions=collect_permissions(role_codes, role_permission_map),
history=history,
organization=(
EmployeeOrganizationRead(
id=organization.id,
code=organization.unit_code,
name=organization.name,
unitType=organization.unit_type,
costCenter=organization.cost_center,
location=organization.location,
managerName=organization.manager_name,
)
if organization
else None
),
)
def collect_permissions(
    role_codes: list[str],
    role_permission_map: dict[str, list[str]],
) -> list[str]:
    # Merge the permissions of every role, keeping first-seen order and dropping duplicates.
    permissions: list[str] = []
    seen: set[str] = set()
    for role_code in role_codes:
        for permission in role_permission_map.get(role_code, []):
            if permission in seen:
                continue
            permissions.append(permission)
            seen.add(permission)
    return permissions
def format_history_datetime(
value: datetime | None,
*,
to_display_datetime: Callable[[datetime], datetime],
) -> str:
if value is None:
return ""
local = to_display_datetime(value)
return (
f"{local.year}{local.month}{local.day}"
f"{local.hour}{local.minute}"
)
def calculate_age(birth_date: date | None) -> int | None:
if birth_date is None:
return None
today = date.today()
age = today.year - birth_date.year
if (today.month, today.day) < (birth_date.month, birth_date.day):
age -= 1
return age
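
The age calculation above reads `date.today()` directly, which makes it awkward to check boundary cases. The sketch below shows the same arithmetic with the reference date passed in; the name `calculate_age_on` and its `today` parameter are assumptions for this example, not part of the commit.

```python
from __future__ import annotations

from datetime import date


def calculate_age_on(birth_date: date | None, today: date) -> int | None:
    """Return completed years between birth_date and today, or None if unknown."""
    if birth_date is None:
        return None
    age = today.year - birth_date.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        age -= 1
    return age


if __name__ == "__main__":
    print(calculate_age_on(date(1993, 7, 18), date(2026, 5, 22)))
```

Comparing `(month, day)` tuples avoids building a "birthday this year" date, so a 29 February birth date does not raise in non-leap years.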

@@ -0,0 +1,368 @@
from __future__ import annotations
from dataclasses import dataclass
from datetime import date, datetime
from email.utils import parseaddr
from io import BytesIO
from typing import Any
from openpyxl import Workbook, load_workbook
EMPLOYEE_SHEET_NAME = "员工目录"
INSTRUCTION_SHEET_NAME = "填表说明"
EMPLOYEE_HEADERS: tuple[str, ...] = (
"员工编号*",
"姓名*",
"邮箱*",
"性别",
"出生日期",
"手机号",
"入职日期",
"办公地点",
"岗位*",
"职级*",
"部门编码",
"直属上级工号",
"财务归口",
"成本中心",
"在职状态*",
"角色编码",
)
HEADER_TO_FIELD: dict[str, str] = {
"员工编号*": "employee_no",
"姓名*": "name",
"邮箱*": "email",
"性别": "gender",
"出生日期": "birth_date",
"手机号": "phone",
"入职日期": "join_date",
"办公地点": "location",
"岗位*": "position",
"职级*": "grade",
"部门编码": "organization_unit_code",
"直属上级工号": "manager_employee_no",
"财务归口": "finance_owner_name",
"成本中心": "cost_center",
"在职状态*": "employment_status",
"角色编码": "role_codes",
}
VALID_EMPLOYMENT_STATUSES = {"在职", "试用中", "停用"}
DEFAULT_ROLE_CODES = ("user",)
MAX_IMPORT_ROWS = 2000
MAX_IMPORT_BYTES = 5 * 1024 * 1024
@dataclass(frozen=True)
class EmployeeImportRow:
row_number: int
employee_no: str
name: str
email: str
gender: str | None
birth_date: date | None
phone: str | None
join_date: date | None
location: str | None
position: str
grade: str
organization_unit_code: str | None
manager_employee_no: str | None
finance_owner_name: str | None
cost_center: str | None
employment_status: str
role_codes: list[str]
@dataclass(frozen=True)
class EmployeeSpreadsheetError:
row: int
column: str
employee_no: str
message: str
def build_import_template_bytes() -> bytes:
workbook = Workbook()
sheet = workbook.active
sheet.title = EMPLOYEE_SHEET_NAME
sheet.append(list(EMPLOYEE_HEADERS))
instructions = workbook.create_sheet(INSTRUCTION_SHEET_NAME)
instructions.append(["字段", "说明"])
instruction_rows = [
("员工编号*", "必填,全局唯一,导入时用于判断新建或覆盖。"),
("姓名*", "必填。"),
("邮箱*", "必填,全局唯一。"),
("性别", "可选:男、女,留空表示不填写。"),
("出生日期", "可选,格式 YYYY-MM-DD。"),
("手机号", "可选。"),
("入职日期", "可选,格式 YYYY-MM-DD。"),
("办公地点", "可选。"),
("岗位*", "必填。"),
("职级*", "必填,例如 P3、P5。"),
("部门编码", "可选,须与系统组织编码一致,例如 FIN-SSC。"),
("直属上级工号", "可选,须为系统中已有员工编号,或出现在本次导入表中。"),
("财务归口", "可选。"),
("成本中心", "可选。"),
("在职状态*", "必填:在职、试用中、停用。"),
("角色编码", "可选,多个角色用英文逗号分隔,例如 user,finance;留空默认为 user。"),
("导入规则", "全部校验通过后才写入数据库;任一行有错则整批不导入,原有数据保持不变。"),
]
for row in instruction_rows:
instructions.append(list(row))
buffer = BytesIO()
workbook.save(buffer)
return buffer.getvalue()
def build_export_workbook_bytes(rows: list[list[Any]]) -> bytes:
workbook = Workbook()
sheet = workbook.active
sheet.title = EMPLOYEE_SHEET_NAME
sheet.append(list(EMPLOYEE_HEADERS))
for row in rows:
sheet.append(row)
buffer = BytesIO()
workbook.save(buffer)
return buffer.getvalue()
def parse_employee_workbook(content: bytes) -> tuple[list[EmployeeImportRow], list[EmployeeSpreadsheetError]]:
errors: list[EmployeeSpreadsheetError] = []
if not content:
return [], [
EmployeeSpreadsheetError(
row=0,
column="文件",
employee_no="",
message="上传文件不能为空。",
)
]
if len(content) > MAX_IMPORT_BYTES:
return [], [
EmployeeSpreadsheetError(
row=0,
column="文件",
employee_no="",
message=f"文件大小不能超过 {MAX_IMPORT_BYTES // (1024 * 1024)}MB。",
)
]
try:
workbook = load_workbook(filename=BytesIO(content), read_only=True, data_only=True)
except Exception:
return [], [
EmployeeSpreadsheetError(
row=0,
column="文件",
employee_no="",
message="无法解析 Excel 文件,请使用系统提供的 .xlsx 模板。",
)
]
if EMPLOYEE_SHEET_NAME not in workbook.sheetnames:
return [], [
EmployeeSpreadsheetError(
row=0,
column="工作表",
employee_no="",
message=f"缺少工作表“{EMPLOYEE_SHEET_NAME}”。",
)
]
worksheet = workbook[EMPLOYEE_SHEET_NAME]
raw_rows = list(worksheet.iter_rows(values_only=True))
if not raw_rows:
return [], [
EmployeeSpreadsheetError(
row=0,
column="文件",
employee_no="",
message="Excel 中没有可导入的数据行。",
)
]
header_row = [_normalize_cell(value) for value in raw_rows[0]]
if list(header_row) != list(EMPLOYEE_HEADERS):
return [], [
EmployeeSpreadsheetError(
row=1,
column="表头",
employee_no="",
message="表头与员工导入模板不一致,请下载最新模板后重试。",
)
]
parsed_rows: list[EmployeeImportRow] = []
for index, raw_row in enumerate(raw_rows[1:], start=2):
if index - 1 > MAX_IMPORT_ROWS:
errors.append(
EmployeeSpreadsheetError(
row=index,
column="文件",
employee_no="",
message=f"单次最多导入 {MAX_IMPORT_ROWS} 行数据。",
)
)
break
if _is_empty_data_row(raw_row):
continue
row_errors, parsed = _parse_data_row(index, raw_row)
errors.extend(row_errors)
if parsed is not None:
parsed_rows.append(parsed)
if not parsed_rows and not errors:
errors.append(
EmployeeSpreadsheetError(
row=0,
column="文件",
employee_no="",
message="Excel 中没有可导入的数据行。",
)
)
return parsed_rows, errors
def _parse_data_row(
row_number: int,
raw_row: tuple[Any, ...],
) -> tuple[list[EmployeeSpreadsheetError], EmployeeImportRow | None]:
errors: list[EmployeeSpreadsheetError] = []
values = {
HEADER_TO_FIELD[header]: _normalize_cell(raw_row[index] if index < len(raw_row) else "")
for index, header in enumerate(EMPLOYEE_HEADERS)
}
employee_no = values["employee_no"]
def add_error(column: str, message: str) -> None:
errors.append(
EmployeeSpreadsheetError(
row=row_number,
column=column,
employee_no=employee_no,
message=message,
)
)
if not employee_no:
add_error("员工编号*", "员工编号不能为空。")
name = values["name"]
if not name:
add_error("姓名*", "姓名不能为空。")
email = values["email"].lower() if values["email"] else ""
if not email:
add_error("邮箱*", "邮箱不能为空。")
elif not _is_valid_email(email):
add_error("邮箱*", "邮箱格式不正确。")
position = values["position"]
if not position:
add_error("岗位*", "岗位不能为空。")
grade = values["grade"]
if not grade:
add_error("职级*", "职级不能为空。")
employment_status = values["employment_status"]
if not employment_status:
add_error("在职状态*", "在职状态不能为空。")
elif employment_status not in VALID_EMPLOYMENT_STATUSES:
add_error("在职状态*", "在职状态必须为:在职、试用中、停用。")
gender = values["gender"] or None
if gender and gender not in {"男", "女"}:
add_error("性别", "性别只能填写:男、女,或留空。")
birth_date, birth_error = _parse_optional_date(values["birth_date"], "出生日期")
if birth_error:
add_error("出生日期", birth_error)
join_date, join_error = _parse_optional_date(values["join_date"], "入职日期")
if join_error:
add_error("入职日期", join_error)
role_codes = _parse_role_codes(values["role_codes"])
if values["role_codes"] and not role_codes:
add_error("角色编码", "角色编码不能为空片段,多个角色请用英文逗号分隔。")
if errors:
return errors, None
return (
[],
EmployeeImportRow(
row_number=row_number,
employee_no=employee_no,
name=name,
email=email,
gender=gender,
birth_date=birth_date,
phone=values["phone"] or None,
join_date=join_date,
location=values["location"] or None,
position=position,
grade=grade,
organization_unit_code=values["organization_unit_code"] or None,
manager_employee_no=values["manager_employee_no"] or None,
finance_owner_name=values["finance_owner_name"] or None,
cost_center=values["cost_center"] or None,
employment_status=employment_status,
role_codes=role_codes or list(DEFAULT_ROLE_CODES),
),
)
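def _parse_role_codes(value: str) -> list[str]:
    if not value:
        return []
    # Normalize full-width commas (,) to ASCII commas before splitting.
    codes = [item.strip() for item in value.replace(",", ",").split(",")]
    # dict.fromkeys de-duplicates while preserving first-seen order.
    return list(dict.fromkeys(code for code in codes if code))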
def _parse_optional_date(value: str, label: str) -> tuple[date | None, str | None]:
if not value:
return None, None
if isinstance(value, datetime):
return value.date(), None
if isinstance(value, date):
return value, None
text = str(value).strip()
try:
return datetime.strptime(text, "%Y-%m-%d").date(), None
except ValueError:
return None, f"{label}格式必须为 YYYY-MM-DD。"
def _is_valid_email(value: str) -> bool:
_, address = parseaddr(value)
return bool(address) and "@" in address
def _normalize_cell(value: Any) -> str:
    if value is None:
        return ""
    # datetime is a subclass of date, so a single check covers both cell types.
    if isinstance(value, date):
        return value.strftime("%Y-%m-%d")
    return str(value).strip()
def _is_empty_data_row(raw_row: tuple[Any, ...]) -> bool:
return not any(_normalize_cell(value) for value in raw_row)
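
A small sketch of the role-code parsing step used by the importer, assuming the intent is to accept both ASCII and full-width commas as separators. The constant name `FULL_WIDTH_COMMA` and the public function name `parse_role_codes` are introduced only for this illustration.

```python
from __future__ import annotations

# Hypothetical constant: the full-width comma often typed with a Chinese IME.
FULL_WIDTH_COMMA = "\uff0c"


def parse_role_codes(value: str) -> list[str]:
    """Split a cell such as 'user, finance' into unique, trimmed role codes."""
    if not value:
        return []
    normalized = value.replace(FULL_WIDTH_COMMA, ",")
    codes = [item.strip() for item in normalized.split(",")]
    # dict.fromkeys drops duplicates but keeps the original order.
    return list(dict.fromkeys(code for code in codes if code))


if __name__ == "__main__":
    print(parse_role_codes("user, finance,user"))
```

Note that passing an empty string as the first argument of `str.replace` inserts the replacement between every character, so the separator being normalized has to be a real character.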

@@ -0,0 +1,206 @@
from __future__ import annotations
import re
from decimal import Decimal, InvalidOperation
from typing import Any
DOCUMENT_AMOUNT_PATTERNS = (
re.compile(
r"(?:价税合计|合计金额|费用合计|总费用|费用总计|订单(?:总)?金额|支付(?:金额)?|实付(?:金额)?|实收(?:金额)?|总(?:额|计|价)|票价|金额|车费|消费金额|房费|住宿费)"
r"[:\s¥¥人民币为是]*([0-9]+(?:[.,][0-9]{1,2})?)"
),
re.compile(r"[¥¥]\s*([0-9]+(?:[.,][0-9]{1,2})?)"),
re.compile(r"([0-9]+(?:[.,][0-9]{1,2})?)\s*元"),
)
DOCUMENT_AMOUNT_FIELD_KEYS = {
"amount",
"totalamount",
"paymentamount",
"paidamount",
"actualamount",
}
DOCUMENT_AMOUNT_LABEL_TOKENS = (
"金额",
"价税合计",
"合计",
"总额",
"总计",
"票价",
"支付金额",
"实付金额",
"实收金额",
)
DOCUMENT_TEXT_AMOUNT_PATTERNS = (
r"(?:金额|价税合计|合计|小写|实收金额|支付金额|订单金额|总额|总计|总费用|费用总计|票价|房费|住宿费|餐费)[:\s¥¥人民币为是]*([0-9]{1,6}(?:[.,][0-9]{1,2})?)",
r"[¥¥]\s*([0-9]{1,6}(?:[.,][0-9]{1,2})?)",
r"([0-9]{1,6}(?:[.,][0-9]{1,2})?)\s*元",
)
def resolve_document_item_amount(document: dict[str, Any]) -> Decimal | None:
text = " ".join(
[
str(document.get("summary") or "").strip(),
str(document.get("text") or "").strip(),
]
).strip()
field_amount = resolve_document_field_amount(document)
text_amount = resolve_document_text_amount(text)
if field_amount is not None:
if is_date_like_amount_candidate(field_amount, text):
return text_amount
return field_amount
return text_amount
def resolve_document_field_amount(document: dict[str, Any]) -> Decimal | None:
for field in list(document.get("document_fields") or []):
if not isinstance(field, dict):
continue
key = str(field.get("key") or "").strip().lower().replace("_", "")
label = str(field.get("label") or "").replace(" ", "")
is_amount_field = key in DOCUMENT_AMOUNT_FIELD_KEYS or any(
token in label for token in DOCUMENT_AMOUNT_LABEL_TOKENS
)
if not is_amount_field:
continue
raw_value = str(field.get("value") or "")
value = parse_document_amount_value(raw_value) or parse_plain_document_amount_value(
raw_value
)
if value is not None:
return value
return None
def resolve_document_text_amount(text: str) -> Decimal | None:
candidates = [
candidate
for candidate in extract_amount_candidates(text)
if not is_date_like_amount_candidate(candidate, text)
]
if not candidates:
return None
return max(candidates)
def parse_document_amount_value(value: str) -> Decimal | None:
raw_value = str(value or "").strip()
if not raw_value:
return None
for pattern in DOCUMENT_AMOUNT_PATTERNS:
match = pattern.search(raw_value)
if not match:
continue
numeric = str(match.group(1) or "").replace(",", ".").strip()
try:
amount = Decimal(numeric).quantize(Decimal("0.01"))
except (InvalidOperation, ValueError):
continue
if amount > Decimal("0.00"):
return amount
return None
def parse_plain_document_amount_value(value: str) -> Decimal | None:
raw_value = str(value or "").strip()
if not re.fullmatch(r"[0-9]{1,6}(?:[.,][0-9]{1,2})?", raw_value):
return None
try:
amount = Decimal(raw_value.replace(",", ".")).quantize(Decimal("0.01"))
except (InvalidOperation, ValueError):
return None
return amount if amount > Decimal("0.00") else None
def is_probable_year_amount(amount: Decimal | None) -> bool:
    # An integral value between 1900 and 2099 is more likely a year than a price.
    if amount is None:
        return False
    try:
        normalized = Decimal(amount).quantize(Decimal("0.01"))
    except (InvalidOperation, ValueError):
        return False
    return (
        normalized == normalized.to_integral_value()
        and Decimal("1900") <= normalized <= Decimal("2099")
    )


def is_date_like_amount_candidate(amount: Decimal | None, text: str) -> bool:
    # Treat a year-like amount as a date fragment only when the text contains it
    # followed by a date separator and a month, e.g. "2026-05" or "2026年5".
    if not is_probable_year_amount(amount):
        return False
    year = str(int(Decimal(amount or 0)))
    pattern = re.compile(rf"(?<!\d){re.escape(year)}\s*(?:年|[-/.])\s*\d{{1,2}}")
    return bool(pattern.search(str(text or "")))
def format_decimal_amount(amount: Decimal | None) -> str:
if amount is None:
return ""
normalized = Decimal(amount).quantize(Decimal("0.01"))
return format(normalized, "f")
def extract_amount_candidates(text: str) -> list[Decimal]:
values: list[Decimal] = []
seen: set[Decimal] = set()
def append_candidate(
raw: str,
*,
source_text: str = "",
start: int = -1,
end: int = -1,
) -> None:
compact = str(raw or "").replace(",", ".").strip()
if not compact:
return
try:
candidate = Decimal(compact).quantize(Decimal("0.01"))
except (InvalidOperation, ValueError):
return
if is_amount_match_date_fragment(candidate, source_text, start, end):
return
if candidate in seen:
return
seen.add(candidate)
values.append(candidate)
for pattern in DOCUMENT_TEXT_AMOUNT_PATTERNS:
for match in re.finditer(pattern, text, flags=re.IGNORECASE):
append_candidate(
match.group(1),
source_text=text,
start=match.start(1),
end=match.end(1),
)
if values:
return values
for match in re.finditer(r"(?<!\d)(\d{1,6}\.\d{1,2})(?!\d)", text):
append_candidate(match.group(1), source_text=text, start=match.start(1), end=match.end(1))
return values
def is_amount_match_date_fragment(
amount: Decimal,
text: str,
start: int,
end: int,
) -> bool:
if start < 0 or end < 0 or not is_probable_year_amount(amount):
return False
before = str(text or "")[max(0, start - 8):start]
after = str(text or "")[end:end + 10]
if re.match(r"\s*(?:年|[-/.])\s*\d{1,2}", after):
return True
if re.search(r"\d{1,2}\s*(?:年|[-/.])\s*$", before):
return True
return False
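
The year-versus-amount heuristic above can be sketched in isolation as follows. This is a simplified illustration under stated assumptions: the function names `looks_like_year` and `is_year_inside_date` are invented for the example, and the sketch checks the whole text rather than the position of a specific match.

```python
from __future__ import annotations

import re
from decimal import Decimal


def looks_like_year(amount: Decimal) -> bool:
    """True for integral values in the 1900-2099 range."""
    return (
        amount == amount.to_integral_value()
        and Decimal("1900") <= amount <= Decimal("2099")
    )


def is_year_inside_date(amount: Decimal, text: str) -> bool:
    """True when a year-like amount appears in text as the start of a date."""
    if not looks_like_year(amount):
        return False
    year = str(int(amount))
    # The year must be followed by a date separator and a 1-2 digit month.
    pattern = re.compile(rf"(?<!\d){re.escape(year)}\s*(?:年|[-/.])\s*\d{{1,2}}")
    return bool(pattern.search(text))


if __name__ == "__main__":
    print(is_year_inside_date(Decimal("2026"), "开票日期 2026-05-21"))
    print(is_year_inside_date(Decimal("2026"), "合计 2026 元"))
```

A text-wide check like this one would also discard a genuine amount of 2026 when the same document contains a 2026 date, which is presumably why the source adds a position-based check (`is_amount_match_date_fragment`) alongside it.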

@@ -0,0 +1,401 @@
from __future__ import annotations
import re
from typing import Any
from sqlalchemy import and_, func, or_, select
from sqlalchemy.orm import Session, selectinload
from app.api.deps import CurrentUserContext
from app.models.employee import Employee
from app.models.financial_record import ExpenseClaim
from app.models.organization import OrganizationUnit
PRIVILEGED_CLAIM_ROLE_CODES = {"finance", "executive"}
APPROVAL_VISIBLE_CLAIM_ROLE_CODES = {"manager", "approver"}
CLAIM_DELETE_ROLE_CODES = {"executive"}
class ExpenseClaimAccessPolicy:
def __init__(self, db: Session) -> None:
self.db = db
@staticmethod
def has_privileged_claim_access(current_user: CurrentUserContext) -> bool:
if current_user.is_admin:
return True
return bool(ExpenseClaimAccessPolicy.normalize_role_codes(current_user) & PRIVILEGED_CLAIM_ROLE_CODES)
@staticmethod
def has_claim_delete_access(current_user: CurrentUserContext) -> bool:
if current_user.is_admin:
return True
return bool(ExpenseClaimAccessPolicy.normalize_role_codes(current_user) & CLAIM_DELETE_ROLE_CODES)
    def can_return_claim(self, current_user: CurrentUserContext, claim: ExpenseClaim) -> bool:
        # Privileged roles may always return a claim; everyone else must pass the
        # same direct-manager checks that gate approval.
        if self.has_privileged_claim_access(current_user):
            return True
        return self.is_current_direct_manager_approver(current_user, claim)
def can_approve_claim(self, current_user: CurrentUserContext, claim: ExpenseClaim) -> bool:
stage = str(claim.approval_stage or "").strip()
if stage == "直属领导审批":
return self.is_current_direct_manager_approver(current_user, claim)
if stage == "财务审批":
role_codes = self.normalize_role_codes(current_user)
return current_user.is_admin or "finance" in role_codes
return False
def is_current_direct_manager_approver(self, current_user: CurrentUserContext, claim: ExpenseClaim) -> bool:
role_codes = self.normalize_role_codes(current_user)
if not (role_codes & APPROVAL_VISIBLE_CLAIM_ROLE_CODES):
return False
if str(claim.status or "").strip().lower() != "submitted":
return False
if str(claim.approval_stage or "").strip() != "直属领导审批":
return False
current_employee = self.resolve_current_employee(current_user)
if current_employee is not None and str(claim.employee_id or "").strip() == current_employee.id:
return False
claim_employee = claim.employee
if current_employee is not None and claim_employee is not None:
if claim_employee.manager_id == current_employee.id:
return True
if claim_employee.manager is not None and claim_employee.manager.id == current_employee.id:
return True
approver_name = str(
current_employee.name if current_employee is not None and current_employee.name else current_user.name or ""
).strip()
if not approver_name:
return False
return self.resolve_claim_manager_name(claim) == approver_name
@staticmethod
def normalize_role_codes(current_user: CurrentUserContext) -> set[str]:
return {
str(item).strip().lower()
for item in current_user.role_codes
if str(item).strip()
}
def resolve_current_employee(self, current_user: CurrentUserContext) -> Employee | None:
return self.resolve_employee_by_identity_candidates(
[
str(current_user.username or "").strip(),
str(current_user.name or "").strip(),
]
)
def resolve_current_user_display_name(self, current_user: CurrentUserContext) -> str:
current_employee = self.resolve_current_employee(current_user)
if current_employee is not None and str(current_employee.name or "").strip():
return str(current_employee.name).strip()
for candidate in (current_user.name, current_user.username):
normalized = str(candidate or "").strip()
if normalized and not self.is_email_like(normalized):
return normalized
return str(current_user.username or current_user.name or "anonymous").strip() or "anonymous"
def is_claim_owned_by_current_user(self, claim: ExpenseClaim, current_user: CurrentUserContext) -> bool:
current_employee = self.resolve_current_employee(current_user)
if current_employee is not None:
if str(claim.employee_id or "").strip() == current_employee.id:
return True
identity_values = {
str(current_employee.name or "").strip(),
str(current_employee.email or "").strip(),
str(current_employee.employee_no or "").strip(),
}
else:
identity_values = set()
identity_values.update(
{
str(current_user.username or "").strip(),
str(current_user.name or "").strip(),
}
)
identity_values.discard("")
return str(claim.employee_name or "").strip() in identity_values
@staticmethod
def is_email_like(value: str) -> bool:
return bool(re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", str(value or "").strip()))
def resolve_claim_employee_for_backfill(self, claim: ExpenseClaim) -> Employee | None:
if claim.employee is not None:
employee = self.db.scalar(
select(Employee)
.options(
selectinload(Employee.organization_unit),
selectinload(Employee.manager),
selectinload(Employee.roles),
)
.where(Employee.id == claim.employee.id)
.limit(1)
)
return employee or claim.employee
employee_id = str(claim.employee_id or "").strip()
if employee_id:
employee = self.db.scalar(
select(Employee)
.options(
selectinload(Employee.organization_unit),
selectinload(Employee.manager),
selectinload(Employee.roles),
)
.where(Employee.id == employee_id)
.limit(1)
)
if employee is not None:
return employee
return self.resolve_employee_by_identity_candidates([str(claim.employee_name or "").strip()])
def resolve_employee_by_identity_candidates(self, candidates: list[str]) -> Employee | None:
normalized_candidates = [
item
for item in dict.fromkeys(str(candidate or "").strip() for candidate in candidates)
if item
]
if not normalized_candidates:
return None
load_options = (
selectinload(Employee.organization_unit),
selectinload(Employee.manager),
selectinload(Employee.roles),
)
for candidate in normalized_candidates:
employee = self.db.scalar(
select(Employee)
.options(*load_options)
.where(
or_(
func.lower(Employee.email) == candidate.lower(),
func.lower(Employee.employee_no) == candidate.lower(),
)
)
.limit(1)
)
if employee is not None:
return employee
for candidate in normalized_candidates:
matches = list(
self.db.scalars(
select(Employee)
.options(*load_options)
.where(Employee.name == candidate)
.limit(2)
).all()
)
if len(matches) == 1:
return matches[0]
return None
def backfill_claim_identity_from_current_user(
self,
claim: ExpenseClaim,
current_user: CurrentUserContext,
) -> None:
employee = self.resolve_claim_employee_for_backfill(claim) or self.resolve_current_employee(current_user)
if employee is not None:
claim_employee_id = str(claim.employee_id or "").strip()
claim_employee_name = str(claim.employee_name or "").strip()
employee_names = {
str(employee.name or "").strip(),
str(employee.email or "").strip(),
str(employee.employee_no or "").strip(),
}
employee_names.discard("")
can_apply_employee = (
not claim_employee_id
or claim_employee_id == employee.id
or self.is_missing_value(claim_employee_name)
or claim_employee_name in employee_names
)
if can_apply_employee:
claim.employee = employee
claim.employee_id = employee.id
if employee.name:
claim.employee_name = employee.name
if employee.organization_unit is not None:
claim.department_id = employee.organization_unit_id
claim.department_name = employee.organization_unit.name
return
context_department = str(
getattr(current_user, "department_name", "")
or getattr(current_user, "department", "")
or getattr(current_user, "departmentName", "")
or ""
).strip()
if context_department and self.is_missing_value(claim.department_name):
claim.department_name = context_department
context_name = str(current_user.name or current_user.username or "").strip()
if context_name and self.is_missing_value(claim.employee_name):
claim.employee_name = context_name
def employee_name_is_unique(self, employee: Employee) -> bool:
normalized_name = str(employee.name or "").strip()
if not normalized_name:
return False
same_name_count = int(
self.db.scalar(
select(func.count()).select_from(Employee).where(Employee.name == normalized_name)
)
or 0
)
return same_name_count == 1
def build_personal_claim_conditions(self, current_user: CurrentUserContext) -> list[Any]:
conditions = []
username = str(current_user.username or "").strip()
employee = self.resolve_current_employee(current_user)
def add_condition(field_name: str, value: str | None) -> None:
normalized = str(value or "").strip()
if not normalized:
return
if field_name == "employee_id":
conditions.append(ExpenseClaim.employee_id == normalized)
return
conditions.append(ExpenseClaim.employee_name == normalized)
if employee is not None:
add_condition("employee_id", employee.id)
add_condition("employee_name", employee.email)
if self.employee_name_is_unique(employee):
add_condition("employee_name", employee.name)
else:
add_condition("employee_id", username)
add_condition("employee_name", username)
return conditions
def build_approval_claim_conditions(self, current_user: CurrentUserContext) -> list[Any]:
role_codes = self.normalize_role_codes(current_user)
if not (role_codes & APPROVAL_VISIBLE_CLAIM_ROLE_CODES):
return []
employee = self.resolve_current_employee(current_user)
manager_name = str(
employee.name if employee is not None and employee.name else current_user.name or ""
).strip()
pending_leader_approval_parts = [
ExpenseClaim.status == "submitted",
ExpenseClaim.approval_stage == "直属领导审批",
]
if employee is not None:
pending_leader_approval_parts.append(
or_(ExpenseClaim.employee_id.is_(None), ExpenseClaim.employee_id != employee.id)
)
if manager_name:
pending_leader_approval_parts.append(ExpenseClaim.employee_name != manager_name)
pending_leader_approval = and_(*pending_leader_approval_parts)
conditions = []
if employee is not None:
subordinate_ids = select(Employee.id).where(Employee.manager_id == employee.id)
conditions.append(and_(pending_leader_approval, ExpenseClaim.employee_id.in_(subordinate_ids)))
if manager_name:
managed_department_ids = select(OrganizationUnit.id).where(OrganizationUnit.manager_name == manager_name)
managed_department_names = select(OrganizationUnit.name).where(OrganizationUnit.manager_name == manager_name)
conditions.append(and_(pending_leader_approval, ExpenseClaim.department_id.in_(managed_department_ids)))
conditions.append(and_(pending_leader_approval, ExpenseClaim.department_name.in_(managed_department_names)))
return conditions
def apply_approval_claim_scope(self, stmt: Any, current_user: CurrentUserContext) -> Any:
role_codes = self.normalize_role_codes(current_user)
if current_user.is_admin or "executive" in role_codes:
return stmt.where(ExpenseClaim.status == "submitted")
if "finance" in role_codes:
return stmt.where(
ExpenseClaim.status == "submitted",
ExpenseClaim.approval_stage == "财务审批",
)
conditions = self.build_approval_claim_conditions(current_user)
if not conditions:
return stmt.where(ExpenseClaim.id == "__no_visible_claim__")
return stmt.where(or_(*conditions))
def apply_claim_scope(
self,
stmt: Any,
current_user: CurrentUserContext,
*,
include_approval_scope: bool = False,
) -> Any:
if self.has_privileged_claim_access(current_user):
return stmt
conditions = self.build_personal_claim_conditions(current_user)
if not conditions:
return stmt.where(ExpenseClaim.id == "__no_visible_claim__")
if include_approval_scope:
conditions.extend(self.build_approval_claim_conditions(current_user))
return stmt.where(or_(*conditions))
@staticmethod
def resolve_claim_manager_name(claim: ExpenseClaim) -> str:
if claim.employee is not None:
if claim.employee.manager is not None and claim.employee.manager.name:
return str(claim.employee.manager.name).strip()
if claim.employee.organization_unit is not None and claim.employee.organization_unit.manager_name:
return str(claim.employee.organization_unit.manager_name).strip()
return ""
@staticmethod
def is_missing_value(value: Any) -> bool:
normalized = str(value or "").strip()
return not normalized or normalized in {"待补充", "待确认", "N/A", "n/a", ""}
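
The role checks in this policy class reduce to normalizing role codes into a set and intersecting it with an allow-list. A minimal sketch of that pattern follows; `FakeUser` is a hypothetical stand-in for `CurrentUserContext` that carries only the two attributes the check reads.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PRIVILEGED_CLAIM_ROLE_CODES = {"finance", "executive"}


@dataclass
class FakeUser:
    """Hypothetical stand-in for CurrentUserContext (illustration only)."""

    role_codes: list[str] = field(default_factory=list)
    is_admin: bool = False


def normalize_role_codes(user: FakeUser) -> set[str]:
    """Trim, lower-case, and de-duplicate role codes, skipping blanks."""
    return {
        str(item).strip().lower()
        for item in user.role_codes
        if str(item).strip()
    }


def has_privileged_claim_access(user: FakeUser) -> bool:
    """Admins always pass; otherwise any overlap with the allow-list is enough."""
    if user.is_admin:
        return True
    return bool(normalize_role_codes(user) & PRIVILEGED_CLAIM_ROLE_CODES)


if __name__ == "__main__":
    print(has_privileged_claim_access(FakeUser(role_codes=[" Finance "])))
```

Normalizing before the intersection means stray whitespace or casing in stored role codes does not silently deny access.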

@@ -0,0 +1,668 @@
from __future__ import annotations
import json
import re
import shutil
import uuid
from collections import defaultdict
from datetime import UTC, date, datetime, timedelta
from decimal import Decimal, InvalidOperation
from pathlib import Path
from types import SimpleNamespace
from typing import Any
from sqlalchemy import func, or_, select
from sqlalchemy import inspect as sqlalchemy_inspect
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, selectinload
from app.api.deps import CurrentUserContext
from app.core.agent_enums import AgentAssetDomain, AgentAssetStatus, AgentAssetType
from app.models.agent_asset import AgentAsset
from app.models.employee import Employee
from app.models.financial_record import ExpenseClaim, ExpenseClaimItem
from app.schemas.ontology import OntologyEntity, OntologyParseResult
from app.schemas.reimbursement import (
ExpenseClaimItemCreate,
ExpenseClaimItemUpdate,
ExpenseClaimUpdate,
TravelReimbursementCalculatorRequest,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import RISK_RULES_LIBRARY
from app.services.agent_foundation import AgentFoundationService
from app.services.audit import AuditLogService
from app.services.document_intelligence import build_document_insight
from app.services.expense_claim_access_policy import ExpenseClaimAccessPolicy
from app.services.expense_claim_attachment_presentation import ExpenseClaimAttachmentPresentation
from app.services.expense_claim_attachment_storage import ExpenseClaimAttachmentStorage
from app.services.expense_claim_constants import (
EXPENSE_TYPE_LABELS,
MAX_DRAFT_CLAIMS_PER_USER,
EDITABLE_CLAIM_STATUSES,
SYSTEM_GENERATED_ITEM_TYPES,
TRAVEL_DETAIL_ITEM_TYPES,
TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES,
DOCUMENT_TYPE_ITEM_TYPE_MAP,
DOCUMENT_TYPE_SCENE_MAP,
DOCUMENT_FACT_ITEM_TYPES,
ROUTE_DESCRIPTION_ITEM_TYPES,
DOCUMENT_TRIP_DATE_LABELS,
DOCUMENT_TRIP_DATE_REQUIREMENT_LABELS,
DOCUMENT_TRIP_DATE_KEYS,
DOCUMENT_GENERIC_DATE_KEYS,
DOCUMENT_INVOICE_DATE_KEYS,
DOCUMENT_TRIP_DATE_LABEL_TOKENS,
DOCUMENT_GENERIC_DATE_LABEL_TOKENS,
DOCUMENT_INVOICE_DATE_LABEL_TOKENS,
DOCUMENT_ROUTE_FORMAT_PATTERN,
DOCUMENT_ROUTE_TEXT_PATTERN,
DOCUMENT_ROUTE_ORIGIN_LABELS,
DOCUMENT_ROUTE_DESTINATION_LABELS,
GENERIC_ATTACHMENT_BACKFILL_ITEM_TYPES,
LOCATION_REQUIRED_EXPENSE_TYPES,
EXPENSE_SCENE_KEYWORDS,
EXPENSE_TYPE_ALLOWED_DOCUMENT_SCENES,
DOCUMENT_SCENE_LABELS,
DOCUMENT_ASSOCIATION_REVIEW_ACTIONS,
PERSISTENT_EXPENSE_REVIEW_ACTIONS,
RETURN_REASON_OPTIONS,
MAX_CLAIM_NO_RETRY_ATTEMPTS,
DOCUMENT_DATE_PATTERN,
SYSTEM_GENERATED_REASON_PREFIXES,
LEADING_REASON_TIME_PATTERNS,
AI_REVIEW_LOOKBACK_DAYS,
AI_REVIEW_REPEAT_RISK_WARNING_COUNT,
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT,
TRAVEL_REVIEW_RELEVANT_EXPENSE_TYPES,
TRAVEL_REVIEW_LONG_DISTANCE_DOCUMENT_TYPES,
TRAVEL_POLICY_CITY_TIERS,
TRAVEL_POLICY_CITY_MATCH_ORDER,
TRAVEL_POLICY_BAND_LABELS,
TRAVEL_POLICY_HOTEL_LIMITS,
TRAVEL_POLICY_ALLOWED_TRANSPORT_LEVELS,
TRAVEL_POLICY_ROUTE_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_STANDARD_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_FLIGHT_CLASS_PATTERNS,
TRAVEL_POLICY_TRAIN_CLASS_PATTERNS,
TRAVEL_POLICY_HOTEL_NIGHT_PATTERN,
)
from app.services.expense_claim_risk_review import ExpenseClaimRiskReviewMixin
from app.services.expense_amounts import (
extract_amount_candidates,
format_decimal_amount,
is_amount_match_date_fragment,
is_date_like_amount_candidate,
is_probable_year_amount,
parse_document_amount_value,
parse_plain_document_amount_value,
resolve_document_field_amount,
resolve_document_item_amount,
resolve_document_text_amount,
)
from app.services.expense_rule_runtime import (
DEFAULT_SCENE_RULE_ASSET_CODE,
ExpenseRuleRuntimeService,
RuntimeTravelPolicy,
build_default_expense_rule_catalog,
resolve_document_type_label,
)
from app.services.ocr import OcrService
class ExpenseClaimAttachmentAnalysisMixin:
    def _build_attachment_expense_audit_points(
        self,
        *,
        document: Any,
        item: ExpenseClaimItem,
        document_info: dict[str, Any],
    ) -> list[str]:
        """Compare the OCR field amount, the text-audited amount and the item amount."""
        text = " ".join(
            [
                str(getattr(document, "summary", "") or "").strip(),
                str(getattr(document, "text", "") or "").strip(),
            ]
        ).strip()
        document_payload = {
            "document_fields": document_info.get("fields") or [],
            "summary": str(getattr(document, "summary", "") or ""),
            "text": str(getattr(document, "text", "") or ""),
        }
        field_amount = self._resolve_document_field_amount(document_payload)
        audited_amount = self._resolve_document_item_amount(document_payload)
        item_amount = Decimal(item.item_amount or Decimal("0.00")).quantize(Decimal("0.01"))
        points: list[str] = []
        # The OCR field value looks like a date fragment and disagrees with the audited total.
        if (
            field_amount is not None
            and audited_amount is not None
            and self._is_date_like_amount_candidate(field_amount, text)
            and abs(field_amount - audited_amount) > Decimal("1.00")
        ):
            points.append(
                "费用核算:OCR 金额疑似误取日期"
                f" {self._format_decimal_amount(field_amount)},"
                f"已按票据文本中的总费用 {self._format_decimal_amount(audited_amount)} 元回填,"
                "请核对酒店或票据原文总额。"
            )
        # The audited total differs from the amount currently stored on the item.
        if (
            audited_amount is not None
            and item_amount > Decimal("0.00")
            and abs(audited_amount - item_amount) > Decimal("1.00")
        ):
            points.append(
                f"费用核算:票据文本复核金额为 {self._format_decimal_amount(audited_amount)} 元,"
                f"当前明细金额为 {self._format_decimal_amount(item_amount)} 元,请确认是否需要调整。"
            )
        return points

    def _build_attachment_travel_policy_audit(
        self,
        *,
        document: Any,
        item: ExpenseClaimItem,
        document_info: dict[str, Any],
        claim: ExpenseClaim | None = None,
    ) -> dict[str, Any]:
        """Check a hotel item's per-night amount against the travel policy cap."""
        policy = self._get_expense_rule_catalog().travel_policy
        if policy is None:
            return {"points": [], "rule_basis": [], "has_high_risk": False}
        item_type = str(item.item_type or "").strip().lower()
        document_type = str(document_info.get("document_type") or "").strip().lower()
        scene_code = str(document_info.get("scene_code") or "").strip().lower()
        # Only hotel items or hotel documents are audited here.
        if not (
            item_type in {"hotel", "hotel_ticket"}
            or document_type == "hotel_invoice"
            or scene_code == "hotel"
        ):
            return {"points": [], "rule_basis": [], "has_high_risk": False}
        item_amount = Decimal(item.item_amount or Decimal("0.00")).quantize(Decimal("0.01"))
        if item_amount <= Decimal("0.00"):
            return {"points": [], "rule_basis": [], "has_high_risk": False}
        claim = claim or getattr(item, "claim", None)
        grade_band = self._resolve_travel_policy_band(getattr(claim, "employee_grade", None))
        rule_name = str(policy.standard_rule_name or policy.rule_name or "公司差旅费报销规则").strip()
        rule_version = str(policy.standard_rule_version or policy.rule_version or "").strip()
        version_text = f"{rule_version}" if rule_version else ""
        rule_basis = [
            f"依据《{rule_name}》{version_text},住宿费按员工职级、出差城市和每晚金额进行差标核算。"
        ]
        if grade_band is None:
            return {
                "points": ["住宿标准:当前员工职级缺失,无法匹配规则中心的住宿报销标准。"],
                "rule_basis": rule_basis,
                "has_high_risk": False,
            }
        text = " ".join(
            [
                str(getattr(document, "summary", "") or "").strip(),
                str(getattr(document, "text", "") or "").strip(),
            ]
        ).strip()
        context = {
            "item": item,
            "document_info": document_info,
            "ocr_summary": str(getattr(document, "summary", "") or "").strip(),
            "ocr_text": str(getattr(document, "text", "") or "").strip(),
        }
        # City resolution order: hotel name, claim location, claim reason, then raw OCR text.
        hotel_city = self._extract_hotel_city(context, policy)
        claim_city = self._extract_city_from_text(str(getattr(claim, "location", "") or ""), policy) if claim else ""
        reason_city = self._extract_city_from_text(str(getattr(claim, "reason", "") or ""), policy) if claim else ""
        baseline_city = hotel_city or claim_city or reason_city
        if not baseline_city:
            baseline_city = self._extract_city_from_text(text, policy)
        if not baseline_city:
            return {
                "points": ["住宿标准:未能从酒店名称、出差地点或票据内容匹配到规则中心城市,无法核算住宿差标。"],
                "rule_basis": rule_basis,
                "has_high_risk": False,
            }
        standard = self._resolve_travel_policy_hotel_standard(
            policy=policy,
            grade_band=grade_band,
            city=baseline_city,
        )
        if standard is None:
            return {"points": [], "rule_basis": rule_basis, "has_high_risk": False}
        cap, standard_label = standard
        night_count = self._extract_hotel_night_count(context)
        # Guard against a zero night count by treating it as one night.
        nightly_amount = (item_amount / Decimal(max(night_count, 1))).quantize(Decimal("0.01"))
        if nightly_amount <= cap:
            return {"points": [], "rule_basis": rule_basis, "has_high_risk": False}
        band_label = policy.band_labels.get(grade_band, str(getattr(claim, "employee_grade", "") or "当前职级").strip())
        over_amount = (nightly_amount - cap).quantize(Decimal("0.01"))
        return {
            "points": [
                (
                    f"住宿标准:{band_label}{standard_label}的住宿标准为 "
                    f"{self._format_decimal_amount(cap)} 元/晚,票据识别金额 "
                    f"{self._format_decimal_amount(item_amount)} 元 / {night_count} 晚,"
                    f"{self._format_decimal_amount(nightly_amount)} 元/晚,"
                    f"超出 {self._format_decimal_amount(over_amount)} 元/晚。"
                )
            ],
            "rule_basis": rule_basis,
            "has_high_risk": True,
        }

    def _build_attachment_requirement_check(
        self,
        *,
        item: ExpenseClaimItem,
        document_info: dict[str, Any],
    ) -> dict[str, Any]:
        """Check whether the recognized document fits the expense type's attachment rule."""
        expense_type = str(item.item_type or "").strip().lower() or "other"
        policy = self._get_expense_scene_policy(expense_type)
        expense_label = policy.label if policy is not None else self._resolve_expense_type_label(expense_type)
        allowed_scenes = set(policy.allowed_scene_codes) if policy is not None else set()
        allowed_document_types = set(policy.allowed_document_types) if policy is not None else set()
        allowed_scene_labels = [self._resolve_document_scene_label(code) for code in sorted(allowed_scenes)]
        allowed_document_type_labels = [
            resolve_document_type_label(document_type)
            for document_type in sorted(allowed_document_types)
        ]
        recognized_scene_code = str(document_info.get("scene_code") or "other").strip() or "other"
        recognized_scene_label = str(
            document_info.get("scene_label") or self._resolve_document_scene_label(recognized_scene_code)
        ).strip()
        recognized_document_type = str(document_info.get("document_type") or "other").strip() or "other"
        recognized_document_type_label = str(document_info.get("document_type_label") or "其他单据").strip() or "其他单据"
        # With no restrictions configured, every document matches.
        matches = (
            (not allowed_scenes and not allowed_document_types)
            or recognized_scene_code in allowed_scenes
            or recognized_document_type in allowed_document_types
        )
        if matches:
            if allowed_scene_labels or allowed_document_type_labels:
                message = (
                    f"当前费用项目为{expense_label},已识别为{recognized_document_type_label},"
                    f"符合当前{expense_label}场景的附件要求。"
                )
            else:
                message = f"当前费用项目为{expense_label},已识别为{recognized_document_type_label}。"
        else:
            expected_parts = [label + "相关票据" for label in allowed_scene_labels]
            expected_parts.extend(allowed_document_type_labels)
            # De-duplicate while keeping order, then list the expected documents.
            expected_text = "、".join(dict.fromkeys(part for part in expected_parts if part)) or "对应场景票据"
            message = (
                f"当前费用项目为{expense_label},要求上传{expected_text},"
                f"当前识别为{recognized_document_type_label},不符合当前场景,建议过滤或更换附件。"
            )
        return {
            "matches": matches,
            "current_expense_type": expense_type,
            "current_expense_type_label": expense_label,
            "allowed_scene_labels": allowed_scene_labels,
            "allowed_document_type_labels": allowed_document_type_labels,
            "recognized_scene_code": recognized_scene_code,
            "recognized_scene_label": recognized_scene_label,
            "recognized_document_type": recognized_document_type,
            "recognized_document_type_label": recognized_document_type_label,
            "mismatch_severity": policy.attachment_mismatch_severity if policy is not None else "high",
            "rule_code": policy.rule_code if policy is not None else DEFAULT_SCENE_RULE_ASSET_CODE,
            "rule_name": policy.rule_name if policy is not None else "报销场景提交与附件标准",
            "message": message,
        }

    @staticmethod
    def _resolve_document_scene_label(scene_code: str) -> str:
        normalized = str(scene_code or "").strip().lower()
        return DOCUMENT_SCENE_LABELS.get(normalized, "其他票据")

    @staticmethod
    def _extract_amount_candidates(text: str) -> list[Decimal]:
        return extract_amount_candidates(text)

    @staticmethod
    def _is_amount_match_date_fragment(
        amount: Decimal,
        text: str,
        start: int,
        end: int,
    ) -> bool:
        return is_amount_match_date_fragment(amount, text, start, end)

    @staticmethod
    def _has_date_like_text(text: str) -> bool:
        return bool(re.search(r"(20\d{2}[年/\-.]\d{1,2}[月/\-.]\d{1,2}日?)", text))

    @staticmethod
    def _normalize_match_text(text: str) -> str:
        return re.sub(r"\s+", "", str(text or "")).lower()

    @staticmethod
    def _resolve_expense_type_label(expense_type: str | None) -> str:
        normalized = str(expense_type or "").strip().lower()
        return EXPENSE_TYPE_LABELS.get(normalized, "其他")

    def _resolve_allowed_document_scenes(self, expense_type: str | None) -> set[str]:
        normalized = str(expense_type or "").strip().lower()
        policy = self._get_expense_scene_policy(normalized)
        allowed_scenes = set(policy.allowed_scene_codes) if policy is not None else set()
        allowed_scenes.update(EXPENSE_TYPE_ALLOWED_DOCUMENT_SCENES.get(normalized, set()))
        return allowed_scenes

    def _resolve_document_analysis_scenes(self, document_info: dict[str, Any], text: str) -> set[str]:
        scenes: set[str] = set()
        recognized_scene_code = str(document_info.get("scene_code") or "").strip().lower()
        if recognized_scene_code and recognized_scene_code != "other":
            scenes.add(recognized_scene_code)
        recognized_document_type = str(document_info.get("document_type") or "").strip().lower()
        mapped_scene = DOCUMENT_TYPE_SCENE_MAP.get(recognized_document_type)
        if mapped_scene:
            scenes.add(mapped_scene)
        if scenes:
            return scenes
        # Fall back to keyword detection only when structured recognition gave nothing.
        return set(self._detect_expense_scenes(text).keys())

    def _detect_expense_scenes(self, text: str) -> dict[str, list[str]]:
        normalized = self._normalize_match_text(text)
        if not normalized:
            return {}
        matches: dict[str, list[str]] = {}
        for scene, keywords in EXPENSE_SCENE_KEYWORDS.items():
            matched = [keyword for keyword in keywords if keyword in normalized]
            if matched:
                # Keep at most three matched keywords per scene.
                matches[scene] = matched[:3]
        return matches

    def _format_scene_labels(self, scene_codes: set[str]) -> str:
        labels = [self._resolve_expense_type_label(code) for code in scene_codes]
        unique_labels = list(dict.fromkeys(label for label in labels if label))
        return "、".join(unique_labels) if unique_labels else "其他"

    def _build_purpose_mismatch_point(
        self,
        *,
        item: ExpenseClaimItem,
        document_scenes: set[str],
    ) -> str | None:
        if not document_scenes:
            return None
        allowed_scenes = self._resolve_allowed_document_scenes(item.item_type)
        document_scene_labels = self._format_scene_labels(document_scenes)
        if allowed_scenes and document_scenes.isdisjoint(allowed_scenes):
            expense_label = self._resolve_expense_type_label(item.item_type)
            return f"附件类型:当前费用项目为{expense_label},但附件内容更像{document_scene_labels}相关票据。"
        return None

    @staticmethod
    def _is_valid_route_description(value: str) -> bool:
        text = str(value or "").strip()
        if not text:
            return False
        if DOCUMENT_DATE_PATTERN.search(text):
            return False
        return bool(DOCUMENT_ROUTE_FORMAT_PATTERN.match(text))

    def _build_route_format_point(
        self,
        *,
        item: ExpenseClaimItem,
        document_info: dict[str, Any],
    ) -> str | None:
        item_type = str(item.item_type or "").strip().lower()
        document_type = str(document_info.get("document_type") or "").strip().lower()
        route_required = item_type in ROUTE_DESCRIPTION_ITEM_TYPES or document_type in {
            "train_ticket",
            "flight_itinerary",
            "taxi_receipt",
            "transport_receipt",
        }
        if not route_required:
            return None
        reason = str(item.item_reason or "").strip()
        if self._is_valid_route_description(reason):
            return None
        example = "广州南-北京南" if item_type != "ride_ticket" else "深圳北站-腾讯滨海大厦"
        current = f"当前为“{reason[:30]}”," if reason else ""
        return (
            f"行程说明:{current}格式应为“起始地-目的地”,"
            f"例如“{example}”,请按票据行程补充。"
        )

    def _build_fallback_attachment_analysis(
        self,
        *,
        media_type: str | None,
        item: ExpenseClaimItem,
    ) -> dict[str, Any]:
        return {
            "severity": "medium",
            "label": "中风险",
            "headline": "AI提示附件已上传待识别结果",
            "summary": "附件已成功保存,但当前尚未拿到有效识别结果,建议人工先核对票据内容。",
            "points": [
                f"附件格式:{self._attachment_presentation.resolve_media_type('attachment', fallback=media_type)}",
                f"费用金额:当前明细金额为 {item.item_amount}",
            ],
            "suggestion": "建议打开附件确认金额、日期和票据类型是否完整,再继续提交审批。",
        }

    def _build_failed_ocr_attachment_analysis(
        self,
        *,
        media_type: str | None,
        error_message: str,
        item: ExpenseClaimItem,
    ) -> dict[str, Any]:
        return {
            "severity": "medium",
            "label": "中风险",
            "headline": "AI提示附件已上传但识别失败",
            "summary": "文件已经保存成功,但本次 AI 识别未完成,因此无法给出完整票据核验结论。",
            "points": [
                f"识别异常:{error_message or 'OCR 服务暂不可用'}",
                f"费用金额:当前明细金额为 {item.item_amount}",
                f"附件格式:{self._attachment_presentation.resolve_media_type('attachment', fallback=media_type)}",
            ],
            "suggestion": "建议重新上传更清晰的票据图片,或稍后重试识别后再提交。",
        }

    def _build_attachment_analysis(
        self,
        *,
        document: Any,
        item: ExpenseClaimItem,
        claim: ExpenseClaim | None = None,
        document_info: dict[str, Any] | None = None,
        requirement_check: dict[str, Any] | None = None,
    ) -> dict[str, Any]:
        """Collect all audit points for an attachment and grade them as pass/low/medium/high."""
        warnings = [str(value).strip() for value in list(getattr(document, "warnings", []) or []) if str(value).strip()]
        text = " ".join(
            [
                str(getattr(document, "summary", "") or "").strip(),
                str(getattr(document, "text", "") or "").strip(),
            ]
        ).strip()
        compact_text = text.replace(" ", "")
        avg_score = float(getattr(document, "avg_score", 0.0) or 0.0)
        line_count = int(getattr(document, "line_count", 0) or 0)
        document_info = document_info or self._build_attachment_document_info(document)
        requirement_check = requirement_check or self._build_attachment_requirement_check(
            item=item,
            document_info=document_info,
        )
        document_scenes = self._resolve_document_analysis_scenes(document_info, text)
        purpose_mismatch_point = self._build_purpose_mismatch_point(
            item=item,
            document_scenes=document_scenes,
        )
        route_format_point = self._build_route_format_point(
            item=item,
            document_info=document_info,
        )
        expense_audit_points = self._build_attachment_expense_audit_points(
            document=document,
            item=item,
            document_info=document_info,
        )
        travel_policy_audit = self._build_attachment_travel_policy_audit(
            document=document,
            item=item,
            claim=claim,
            document_info=document_info,
        )
        travel_policy_points = [
            str(point).strip()
            for point in list(travel_policy_audit.get("points") or [])
            if str(point).strip()
        ]
        travel_policy_rule_basis = [
            str(point).strip()
            for point in list(travel_policy_audit.get("rule_basis") or [])
            if str(point).strip()
        ]
        travel_policy_high_risk = bool(travel_policy_audit.get("has_high_risk"))
        recognized_document_type = str(document_info.get("document_type") or "other").strip().lower() or "other"
        recognized_document_label = str(document_info.get("document_type_label") or "其他单据").strip() or "其他单据"
        requirement_matches = bool(requirement_check.get("matches"))
        mismatch_severity = str(requirement_check.get("mismatch_severity") or "high").strip().lower() or "high"
        has_ticket_keyword = any(
            keyword in compact_text
            for keyword in (
                "发票",
                "票据",
                "增值税",
                "电子行程单",
                "购买方",
                "销售方",
                "税额",
                "价税",
                "票号",
                "发票代码",
                "凭证",
            )
        )
        amount_candidates = self._extract_amount_candidates(text)
        item_amount = Decimal(item.item_amount or Decimal("0.00")).quantize(Decimal("0.01"))
        # A candidate within 1.00 of the item amount counts as a match.
        has_matching_amount = any(abs(candidate - item_amount) <= Decimal("1.00") for candidate in amount_candidates)
        has_date_text = self._has_date_like_text(text)
        amount_mismatch = bool(amount_candidates) and item_amount > Decimal("0.00") and not has_matching_amount
        points: list[str] = []
        if warnings:
            points.append(f"识别提示:{warnings[0]}")
        if line_count == 0 or not compact_text:
            points.append("附件内容:未识别到有效文字,当前附件更像普通图片或内容过于模糊。")
        if recognized_document_type == "other" and not has_ticket_keyword:
            points.append("票据类型:未识别到发票、票据、电子行程单等关键字,暂无法判断票据类型。")
        if not amount_candidates:
            points.append("金额字段:未识别到可用于核对的金额。")
        elif amount_mismatch:
            candidate_text = "、".join(str(candidate) for candidate in amount_candidates[:3])
            points.append(f"金额字段:附件识别金额 {candidate_text} 元与报销金额 {item_amount} 元不一致。")
        if not has_date_text:
            date_requirement = DOCUMENT_TRIP_DATE_REQUIREMENT_LABELS.get(
                recognized_document_type,
                "开票日期或业务发生日期",
            )
            points.append(f"日期字段:未识别到{date_requirement}。")
        if not requirement_matches:
            points.append(f"附件类型要求:{requirement_check.get('message')}")
        points.extend(expense_audit_points)
        points.extend(travel_policy_points)
        if purpose_mismatch_point:
            points.append(purpose_mismatch_point)
        if route_format_point:
            points.append(route_format_point)
        if avg_score and avg_score < 0.72:
            points.append(f"识别质量:OCR 置信度偏低({avg_score:.0%}),可能影响票据核验准确性。")
        issue_count = len(points)
        if issue_count == 0:
            return {
                "severity": "pass",
                "label": "AI提示符合条件",
                "headline": "AI提示附件符合基础校验条件",
                "summary": "已识别到票据类型和关键字段,且符合当前费用场景的附件要求。",
                "points": [
                    f"票据类型:已识别为{recognized_document_label}。",
                    f"附件类型要求:{requirement_check.get('message')}",
                    f"金额字段:已识别到与当前明细接近的金额 {item_amount} 元。",
                ],
                "rule_basis": travel_policy_rule_basis,
                "suggestion": "建议继续核对报销分类、费用说明和业务场景是否一致。",
            }
        # Default grade is low; the branches below escalate to high or medium.
        severity = "low"
        label = "低风险"
        headline = "AI提示附件存在轻微待核对项"
        summary = "当前附件已识别出部分票据要素,但仍建议人工继续复核。"
        if travel_policy_high_risk:
            severity = "high"
            label = "高风险"
            headline = "AI提示住宿金额超出报销标准"
            summary = "当前住宿票据金额超过规则中心差旅住宿标准,强行提交前需补充超标原因。"
        elif (
            line_count == 0
            or not compact_text
            or (recognized_document_type == "other" and not has_ticket_keyword and issue_count >= 2)
            or (not requirement_matches and mismatch_severity == "high")
            or (purpose_mismatch_point and amount_mismatch)
        ):
            severity = "high"
            label = "高风险"
            headline = "AI提示附件不符合票据校验条件"
            summary = "当前附件存在明显异常,票据类型与当前费用场景不匹配,或无法作为有效报销材料。"
        elif (
            purpose_mismatch_point
            or route_format_point
            or expense_audit_points
            or travel_policy_points
            or amount_mismatch
            or issue_count >= 2
            or warnings
            or (avg_score and avg_score < 0.72)
            or (not requirement_matches and mismatch_severity in {"medium", "low"})
        ):
            severity = "medium"
            label = "中风险"
            headline = "AI提示附件存在明显待整改项"
            summary = "当前附件可见部分内容,但金额、用途、日期或附件类型仍有缺失或不一致。"
            # Use a narrower summary when a single kind of issue accounts for every point.
            if route_format_point and issue_count == 1:
                summary = "票据行程已识别,但费用明细说明未按“起始地-目的地”格式填写。"
            elif expense_audit_points and issue_count == len(expense_audit_points):
                summary = "OCR 金额已完成二次核算,请按票据原文总额复核。"
            elif travel_policy_points and issue_count == len(travel_policy_points):
                summary = "住宿票据已识别,但当前缺少职级或城市信息,无法完成差旅住宿标准核算。"
        suggestion = {
            "high": "建议过滤当前不匹配的票据,重新上传符合当前费用场景的清晰原件。",
            "medium": "建议根据风险点补齐清晰票据,或修正金额、日期、费用说明后再提交。",
            "low": "建议人工再次核对金额和业务说明,确认后可继续流转。",
        }[severity]
        if travel_policy_high_risk:
            suggestion = "请核对住宿发票金额、晚数和出差城市;如确需超标,需在附加说明中补充超标说明并提交审批重点复核。"
        return {
            "severity": severity,
            "label": label,
            "headline": headline,
            "summary": summary,
            "points": points,
            "rule_basis": list(dict.fromkeys(travel_policy_rule_basis)),
            "suggestion": suggestion,
        }
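
The per-night check in `_build_attachment_travel_policy_audit` divides the invoice total by the night count (never less than one), rounds to cents, and compares the result with the policy cap. The standalone sketch below isolates that arithmetic for illustration. The function name `nightly_overage` and its parameters are hypothetical and do not exist in the repository, and the rounding mode is simply the `decimal` module's default.

```python
from decimal import Decimal

CENT = Decimal("0.01")


def nightly_overage(total: Decimal, nights: int, cap: Decimal) -> Decimal:
    """Return the per-night amount above the cap, or 0.00 when within it.

    Hypothetical helper: mirrors the arithmetic only, not the policy lookup.
    """
    # A missing or zero night count is treated as a single night.
    nightly = (total / Decimal(max(nights, 1))).quantize(CENT)
    if nightly <= cap:
        return Decimal("0.00")
    return (nightly - cap).quantize(CENT)


if __name__ == "__main__":
    # 1500.00 over 3 nights is 500.00 per night, 50.00 above a 450.00 cap.
    print(nightly_overage(Decimal("1500.00"), 3, Decimal("450.00")))
```

Keeping every amount as `Decimal` avoids the binary floating-point drift that a plain `float` division could introduce into a currency comparison.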

@@ -0,0 +1,336 @@
from __future__ import annotations
import json
import re
import shutil
import uuid
from collections import defaultdict
from datetime import UTC, date, datetime, timedelta
from decimal import Decimal, InvalidOperation
from pathlib import Path
from types import SimpleNamespace
from typing import Any
from sqlalchemy import func, or_, select
from sqlalchemy import inspect as sqlalchemy_inspect
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, selectinload
from app.api.deps import CurrentUserContext
from app.core.agent_enums import AgentAssetDomain, AgentAssetStatus, AgentAssetType
from app.models.agent_asset import AgentAsset
from app.models.employee import Employee
from app.models.financial_record import ExpenseClaim, ExpenseClaimItem
from app.schemas.ontology import OntologyEntity, OntologyParseResult
from app.schemas.reimbursement import (
ExpenseClaimItemCreate,
ExpenseClaimItemUpdate,
ExpenseClaimUpdate,
TravelReimbursementCalculatorRequest,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import RISK_RULES_LIBRARY
from app.services.agent_foundation import AgentFoundationService
from app.services.audit import AuditLogService
from app.services.document_intelligence import build_document_insight
from app.services.expense_claim_access_policy import ExpenseClaimAccessPolicy
from app.services.expense_claim_attachment_presentation import ExpenseClaimAttachmentPresentation
from app.services.expense_claim_attachment_storage import ExpenseClaimAttachmentStorage
from app.services.expense_claim_constants import (
EXPENSE_TYPE_LABELS,
MAX_DRAFT_CLAIMS_PER_USER,
EDITABLE_CLAIM_STATUSES,
SYSTEM_GENERATED_ITEM_TYPES,
TRAVEL_DETAIL_ITEM_TYPES,
TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES,
DOCUMENT_TYPE_ITEM_TYPE_MAP,
DOCUMENT_TYPE_SCENE_MAP,
DOCUMENT_FACT_ITEM_TYPES,
ROUTE_DESCRIPTION_ITEM_TYPES,
DOCUMENT_TRIP_DATE_LABELS,
DOCUMENT_TRIP_DATE_REQUIREMENT_LABELS,
DOCUMENT_TRIP_DATE_KEYS,
DOCUMENT_GENERIC_DATE_KEYS,
DOCUMENT_INVOICE_DATE_KEYS,
DOCUMENT_TRIP_DATE_LABEL_TOKENS,
DOCUMENT_GENERIC_DATE_LABEL_TOKENS,
DOCUMENT_INVOICE_DATE_LABEL_TOKENS,
DOCUMENT_ROUTE_FORMAT_PATTERN,
DOCUMENT_ROUTE_TEXT_PATTERN,
DOCUMENT_ROUTE_ORIGIN_LABELS,
DOCUMENT_ROUTE_DESTINATION_LABELS,
GENERIC_ATTACHMENT_BACKFILL_ITEM_TYPES,
LOCATION_REQUIRED_EXPENSE_TYPES,
EXPENSE_SCENE_KEYWORDS,
EXPENSE_TYPE_ALLOWED_DOCUMENT_SCENES,
DOCUMENT_SCENE_LABELS,
DOCUMENT_ASSOCIATION_REVIEW_ACTIONS,
PERSISTENT_EXPENSE_REVIEW_ACTIONS,
RETURN_REASON_OPTIONS,
MAX_CLAIM_NO_RETRY_ATTEMPTS,
DOCUMENT_DATE_PATTERN,
SYSTEM_GENERATED_REASON_PREFIXES,
LEADING_REASON_TIME_PATTERNS,
AI_REVIEW_LOOKBACK_DAYS,
AI_REVIEW_REPEAT_RISK_WARNING_COUNT,
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT,
TRAVEL_REVIEW_RELEVANT_EXPENSE_TYPES,
TRAVEL_REVIEW_LONG_DISTANCE_DOCUMENT_TYPES,
TRAVEL_POLICY_CITY_TIERS,
TRAVEL_POLICY_CITY_MATCH_ORDER,
TRAVEL_POLICY_BAND_LABELS,
TRAVEL_POLICY_HOTEL_LIMITS,
TRAVEL_POLICY_ALLOWED_TRANSPORT_LEVELS,
TRAVEL_POLICY_ROUTE_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_STANDARD_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_FLIGHT_CLASS_PATTERNS,
TRAVEL_POLICY_TRAIN_CLASS_PATTERNS,
TRAVEL_POLICY_HOTEL_NIGHT_PATTERN,
)
from app.services.expense_claim_risk_review import ExpenseClaimRiskReviewMixin
from app.services.expense_amounts import (
extract_amount_candidates,
format_decimal_amount,
is_amount_match_date_fragment,
is_date_like_amount_candidate,
is_probable_year_amount,
parse_document_amount_value,
parse_plain_document_amount_value,
resolve_document_field_amount,
resolve_document_item_amount,
resolve_document_text_amount,
)
from app.services.expense_rule_runtime import (
DEFAULT_SCENE_RULE_ASSET_CODE,
ExpenseRuleRuntimeService,
RuntimeTravelPolicy,
build_default_expense_rule_catalog,
resolve_document_type_label,
)
from app.services.ocr import OcrService
class ExpenseClaimAttachmentDocumentMixin:
    def _build_attachment_payload(self, item: ExpenseClaimItem) -> dict[str, Any]:
        file_path, media_type, filename = self._resolve_item_attachment_content(item)
        metadata = self._attachment_storage.read_meta(file_path)
        metadata = self._repair_pdf_text_layer_metadata_if_needed(
            file_path=file_path,
            metadata=metadata,
            item=item,
        )
        uploaded_at_value = metadata.get("uploaded_at")
        uploaded_at = None
        if isinstance(uploaded_at_value, str) and uploaded_at_value.strip():
            try:
                uploaded_at = datetime.fromisoformat(uploaded_at_value)
            except ValueError:
                uploaded_at = None
        analysis = metadata.get("analysis")
        if not isinstance(analysis, dict):
            analysis = None
        document_info = metadata.get("document_info")
        if not isinstance(document_info, dict):
            document_info = None
        requirement_check = metadata.get("requirement_check")
        if not isinstance(requirement_check, dict):
            requirement_check = None
        preview_kind = str(metadata.get("preview_kind") or "").strip()
        previewable = bool(metadata.get("previewable", self._attachment_presentation.is_previewable_media_type(media_type, filename)))
        preview_url = self._attachment_presentation.build_preview_client_path(item.claim_id, item.id) if previewable else ""
        return {
            "file_name": str(metadata.get("file_name") or filename),
            "storage_key": str(item.invoice_id or ""),
            "media_type": str(metadata.get("media_type") or media_type),
            "size_bytes": int(metadata.get("size_bytes") or file_path.stat().st_size),
            "uploaded_at": uploaded_at,
            "previewable": previewable,
            "preview_kind": preview_kind or self._attachment_presentation.resolve_preview_kind(media_type, filename),
            "preview_url": preview_url,
            "analysis": analysis,
            "document_info": document_info,
            "requirement_check": requirement_check,
        }

    def _build_attachment_document_info(self, document: Any) -> dict[str, Any]:
        """Normalize OCR output, falling back to document insight for generic values."""
        insight = build_document_insight(
            filename=str(getattr(document, "filename", "") or ""),
            summary=str(getattr(document, "summary", "") or ""),
            text=str(getattr(document, "text", "") or ""),
        )
        document_type = str(getattr(document, "document_type", "") or "").strip()
        if document_type in {"", "other"}:
            document_type = insight.document_type
        document_type_label = str(getattr(document, "document_type_label", "") or "").strip()
        if not document_type_label or document_type_label == "其他单据":
            document_type_label = insight.document_type_label
        scene_code = str(getattr(document, "scene_code", "") or "").strip()
        if scene_code in {"", "other"}:
            scene_code = insight.scene_code
        scene_label = str(getattr(document, "scene_label", "") or "").strip()
        if not scene_label or scene_label == "其他票据":
            scene_label = insight.scene_label
        raw_fields = list(getattr(document, "document_fields", []) or [])
        normalized_fields: list[dict[str, str]] = []
        # Fields may arrive as dicts or as objects with key/label/value attributes.
        for item in raw_fields:
            key = ""
            label = ""
            value = ""
            if isinstance(item, dict):
                key = str(item.get("key") or "").strip()
                label = str(item.get("label") or "").strip()
                value = str(item.get("value") or "").strip()
            else:
                key = str(getattr(item, "key", "") or "").strip()
                label = str(getattr(item, "label", "") or "").strip()
                value = str(getattr(item, "value", "") or "").strip()
            if key and label and value:
                label = self._resolve_document_field_display_label(
                    document_type=document_type,
                    key=key,
                    label=label,
                )
                normalized_fields.append(
                    {
                        "key": key,
                        "label": label,
                        "value": value,
                    }
                )
        if not normalized_fields:
            normalized_fields = [
                {
                    "key": field.key,
                    "label": field.label,
                    "value": field.value,
                }
                for field in insight.fields
                if field.value
            ]
        return {
            "document_type": document_type,
            "document_type_label": document_type_label,
            "scene_code": scene_code,
            "scene_label": scene_label,
            "fields": normalized_fields,
        }

    @staticmethod
    def _resolve_document_field_display_label(
        *,
        document_type: str,
        key: str,
        label: str,
    ) -> str:
        trip_label = DOCUMENT_TRIP_DATE_LABELS.get(
            str(document_type or "").strip().lower()
        )
        if not trip_label:
            return label
        normalized_key = str(key or "").strip().lower().replace("_", "")
        normalized_label = str(label or "").replace(" ", "")
        if normalized_key in DOCUMENT_INVOICE_DATE_KEYS or any(
            token in normalized_label for token in DOCUMENT_INVOICE_DATE_LABEL_TOKENS
        ):
            return label
        is_date_field = (
            normalized_key
            in DOCUMENT_TRIP_DATE_KEYS
            | DOCUMENT_GENERIC_DATE_KEYS
            or any(
                token in normalized_label
                for token in (
                    *DOCUMENT_TRIP_DATE_LABEL_TOKENS,
                    *DOCUMENT_GENERIC_DATE_LABEL_TOKENS,
                )
            )
        )
        return trip_label if is_date_field else label

    def _backfill_item_type_from_attachment(
        self,
        *,
        item: ExpenseClaimItem,
        document_info: dict[str, Any],
    ) -> None:
        current_type = str(item.item_type or "").strip().lower()
        if current_type not in GENERIC_ATTACHMENT_BACKFILL_ITEM_TYPES:
            return
        document_type = str(document_info.get("document_type") or "").strip()
        mapped_type = DOCUMENT_TYPE_ITEM_TYPE_MAP.get(document_type)
        if mapped_type:
            item.item_type = mapped_type

    def _backfill_item_amount_from_attachment(
        self,
        *,
        item: ExpenseClaimItem,
        document: Any,
        document_info: dict[str, Any],
    ) -> None:
        current_amount = Decimal(item.item_amount or Decimal("0.00")).quantize(Decimal("0.01"))
        if current_amount > Decimal("0.00"):
            return
        amount = self._resolve_document_item_amount(
            {
                "document_fields": document_info.get("fields") or [],
                "summary": str(getattr(document, "summary", "") or ""),
                "text": str(getattr(document, "text", "") or ""),
            }
        )
        if amount is not None and amount > Decimal("0.00"):
            item.item_amount = amount

    def _backfill_item_date_from_attachment(
        self,
        *,
        item: ExpenseClaimItem,
        document: Any,
        document_info: dict[str, Any],
    ) -> None:
        document_payload = {
            "document_type": str(document_info.get("document_type") or "").strip(),
            "scene_code": str(document_info.get("scene_code") or "").strip(),
            "summary": str(getattr(document, "summary", "") or "").strip(),
            "text": str(getattr(document, "text", "") or "").strip(),
            "document_fields": list(document_info.get("fields") or []),
        }
        parsed = self._resolve_document_item_date_candidate(document_payload)
        if parsed is not None:
            item.item_date = parsed

    def _backfill_item_reason_from_attachment(
        self,
        *,
        item: ExpenseClaimItem,
        document: Any,
        document_info: dict[str, Any],
    ) -> None:
        reason = self._resolve_document_item_reason(
            {
                "document_type": str(document_info.get("document_type") or "").strip(),
                "scene_code": str(document_info.get("scene_code") or "").strip(),
                "scene_label": str(document_info.get("scene_label") or "").strip(),
                "document_fields": document_info.get("fields") or [],
                "summary": str(getattr(document, "summary", "") or ""),
                "text": str(getattr(document, "text", "") or ""),
            },
            fallback=str(item.item_reason or "").strip(),
        )
        if reason:
            item.item_reason = reason
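
`_build_attachment_document_info` repeats one pattern four times: keep the value the OCR document reports unless it is empty or a generic placeholder such as `"other"`, and otherwise use the value from `build_document_insight`. A minimal sketch of that pattern follows. The helper name `prefer_specific` and its `generic` parameter are assumptions made for illustration and are not part of the commit.

```python
from __future__ import annotations


def prefer_specific(
    value: str | None,
    fallback: str,
    generic: frozenset[str] = frozenset({"", "other"}),
) -> str:
    """Return the stripped value unless it is empty or generic, else the fallback.

    Hypothetical helper, not code from the repository.
    """
    cleaned = str(value or "").strip()
    return fallback if cleaned in generic else cleaned


if __name__ == "__main__":
    # A generic OCR result gives way to the inferred document type.
    print(prefer_specific("other", "hotel_invoice"))
    # A specific OCR result is kept, with surrounding whitespace removed.
    print(prefer_specific(" train_ticket ", "hotel_invoice"))
```

For the label fields, the same helper would take the generic label strings (for example `frozenset({"", "其他单据"})`) as `generic`.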

@@ -0,0 +1,495 @@
from __future__ import annotations
import json
import re
import shutil
import uuid
from collections import defaultdict
from datetime import UTC, date, datetime, timedelta
from decimal import Decimal, InvalidOperation
from pathlib import Path
from types import SimpleNamespace
from typing import Any
from sqlalchemy import func, or_, select
from sqlalchemy import inspect as sqlalchemy_inspect
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, selectinload
from app.api.deps import CurrentUserContext
from app.core.agent_enums import AgentAssetDomain, AgentAssetStatus, AgentAssetType
from app.models.agent_asset import AgentAsset
from app.models.employee import Employee
from app.models.financial_record import ExpenseClaim, ExpenseClaimItem
from app.schemas.ontology import OntologyEntity, OntologyParseResult
from app.schemas.reimbursement import (
ExpenseClaimItemCreate,
ExpenseClaimItemUpdate,
ExpenseClaimUpdate,
TravelReimbursementCalculatorRequest,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import RISK_RULES_LIBRARY
from app.services.agent_foundation import AgentFoundationService
from app.services.audit import AuditLogService
from app.services.document_intelligence import build_document_insight
from app.services.expense_claim_access_policy import ExpenseClaimAccessPolicy
from app.services.expense_claim_attachment_presentation import ExpenseClaimAttachmentPresentation
from app.services.expense_claim_attachment_storage import ExpenseClaimAttachmentStorage
from app.services.expense_claim_constants import (
EXPENSE_TYPE_LABELS,
MAX_DRAFT_CLAIMS_PER_USER,
EDITABLE_CLAIM_STATUSES,
SYSTEM_GENERATED_ITEM_TYPES,
TRAVEL_DETAIL_ITEM_TYPES,
TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES,
DOCUMENT_TYPE_ITEM_TYPE_MAP,
DOCUMENT_TYPE_SCENE_MAP,
DOCUMENT_FACT_ITEM_TYPES,
ROUTE_DESCRIPTION_ITEM_TYPES,
DOCUMENT_TRIP_DATE_LABELS,
DOCUMENT_TRIP_DATE_REQUIREMENT_LABELS,
DOCUMENT_TRIP_DATE_KEYS,
DOCUMENT_GENERIC_DATE_KEYS,
DOCUMENT_INVOICE_DATE_KEYS,
DOCUMENT_TRIP_DATE_LABEL_TOKENS,
DOCUMENT_GENERIC_DATE_LABEL_TOKENS,
DOCUMENT_INVOICE_DATE_LABEL_TOKENS,
DOCUMENT_ROUTE_FORMAT_PATTERN,
DOCUMENT_ROUTE_TEXT_PATTERN,
DOCUMENT_ROUTE_ORIGIN_LABELS,
DOCUMENT_ROUTE_DESTINATION_LABELS,
GENERIC_ATTACHMENT_BACKFILL_ITEM_TYPES,
LOCATION_REQUIRED_EXPENSE_TYPES,
EXPENSE_SCENE_KEYWORDS,
EXPENSE_TYPE_ALLOWED_DOCUMENT_SCENES,
DOCUMENT_SCENE_LABELS,
DOCUMENT_ASSOCIATION_REVIEW_ACTIONS,
PERSISTENT_EXPENSE_REVIEW_ACTIONS,
RETURN_REASON_OPTIONS,
MAX_CLAIM_NO_RETRY_ATTEMPTS,
DOCUMENT_DATE_PATTERN,
SYSTEM_GENERATED_REASON_PREFIXES,
LEADING_REASON_TIME_PATTERNS,
AI_REVIEW_LOOKBACK_DAYS,
AI_REVIEW_REPEAT_RISK_WARNING_COUNT,
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT,
TRAVEL_REVIEW_RELEVANT_EXPENSE_TYPES,
TRAVEL_REVIEW_LONG_DISTANCE_DOCUMENT_TYPES,
TRAVEL_POLICY_CITY_TIERS,
TRAVEL_POLICY_CITY_MATCH_ORDER,
TRAVEL_POLICY_BAND_LABELS,
TRAVEL_POLICY_HOTEL_LIMITS,
TRAVEL_POLICY_ALLOWED_TRANSPORT_LEVELS,
TRAVEL_POLICY_ROUTE_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_STANDARD_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_FLIGHT_CLASS_PATTERNS,
TRAVEL_POLICY_TRAIN_CLASS_PATTERNS,
TRAVEL_POLICY_HOTEL_NIGHT_PATTERN,
)
from app.services.expense_claim_risk_review import ExpenseClaimRiskReviewMixin
from app.services.expense_amounts import (
extract_amount_candidates,
format_decimal_amount,
is_amount_match_date_fragment,
is_date_like_amount_candidate,
is_probable_year_amount,
parse_document_amount_value,
parse_plain_document_amount_value,
resolve_document_field_amount,
resolve_document_item_amount,
resolve_document_text_amount,
)
from app.services.expense_rule_runtime import (
DEFAULT_SCENE_RULE_ASSET_CODE,
ExpenseRuleRuntimeService,
RuntimeTravelPolicy,
build_default_expense_rule_catalog,
resolve_document_type_label,
)
from app.services.ocr import OcrService
class ExpenseClaimAttachmentOperationsMixin:
    def upload_claim_item_attachment(
        self,
        *,
        claim_id: str,
        item_id: str,
        filename: str,
        content: bytes,
        media_type: str | None,
        current_user: CurrentUserContext,
    ) -> dict[str, Any] | None:
        claim, item = self._get_claim_item_or_raise(
            claim_id=claim_id,
            item_id=item_id,
            current_user=current_user,
        )
        if claim is None:
            return None
        self._ensure_draft_claim(claim)
        self._ensure_mutable_claim_item(item)
        normalized_name = self._attachment_storage.normalize_filename(filename)
        if not content:
            raise ValueError("上传文件不能为空。")
        before_json = self._serialize_claim(claim)
        # Each item keeps a single attachment: clear the item directory before writing.
        attachment_dir = self._attachment_storage.build_item_dir(claim.id, item.id)
        shutil.rmtree(attachment_dir, ignore_errors=True)
        attachment_dir.mkdir(parents=True, exist_ok=True)
        file_path = attachment_dir / normalized_name
        file_path.write_bytes(content)
        resolved_media_type = self._attachment_presentation.resolve_media_type(
            normalized_name,
            fallback=media_type,
        )
        attachment_analysis = self._build_fallback_attachment_analysis(
            media_type=media_type,
            item=item,
        )
        ocr_document = None
        document_info = None
        requirement_check = None
        ocr_status = "empty"
        ocr_error = ""
        try:
            ocr_result = OcrService(self.db).recognize_files(
                [(normalized_name, content, media_type or "application/octet-stream")]
            )
            documents = list(ocr_result.documents or [])
            if documents:
                ocr_document = documents[0]
                ocr_status = "recognized"
                document_info = self._build_attachment_document_info(ocr_document)
                self._backfill_item_type_from_attachment(
                    item=item,
                    document_info=document_info,
                )
                self._backfill_item_amount_from_attachment(
                    item=item,
                    document=ocr_document,
                    document_info=document_info,
                )
                self._backfill_item_date_from_attachment(
                    item=item,
                    document=ocr_document,
                    document_info=document_info,
                )
                self._backfill_item_reason_from_attachment(
                    item=item,
                    document=ocr_document,
                    document_info=document_info,
                )
                requirement_check = self._build_attachment_requirement_check(
                    item=item,
                    document_info=document_info,
                )
                attachment_analysis = self._build_attachment_analysis(
                    document=ocr_document,
                    item=item,
                    claim=claim,
                    document_info=document_info,
                    requirement_check=requirement_check,
                )
        except Exception as exc:  # pragma: no cover - fallback path depends on OCR runtime
            ocr_status = "failed"
            ocr_error = str(exc)
            attachment_analysis = self._build_failed_ocr_attachment_analysis(
                media_type=media_type,
                error_message=ocr_error,
                item=item,
            )
        item.invoice_id = self._attachment_storage.to_storage_key(file_path)
        preview_meta = self._attachment_presentation.build_preview_meta(
            file_path=file_path,
            media_type=resolved_media_type,
            ocr_document=ocr_document,
        )
        meta = {
            "file_name": normalized_name,
            "storage_key": item.invoice_id,
            "media_type": resolved_media_type,
            "size_bytes": len(content),
            "uploaded_at": datetime.now(UTC).isoformat(),
            "previewable": bool(preview_meta["previewable"]),
            "preview_kind": str(preview_meta["preview_kind"]),
            "preview_storage_key": str(preview_meta["preview_storage_key"]),
            "preview_media_type": str(preview_meta["preview_media_type"]),
            "preview_file_name": str(preview_meta["preview_file_name"]),
            "analysis": attachment_analysis,
            "document_info": document_info,
            "requirement_check": requirement_check,
            "ocr_status": ocr_status,
            "ocr_error": ocr_error,
            "ocr_text": str(getattr(ocr_document, "text", "") or ""),
            "ocr_summary": str(getattr(ocr_document, "summary", "") or ""),
            "ocr_avg_score": float(getattr(ocr_document, "avg_score", 0.0) or 0.0),
            "ocr_line_count": int(getattr(ocr_document, "line_count", 0) or 0),
            "ocr_classification_source": str(getattr(ocr_document, "classification_source", "") or ""),
            "ocr_classification_confidence": float(getattr(ocr_document, "classification_confidence", 0.0) or 0.0),
            "ocr_classification_evidence": [
                str(entry)
                for entry in getattr(ocr_document, "classification_evidence", []) or []
                if str(entry).strip()
            ],
            "ocr_warnings": [str(entry) for entry in getattr(ocr_document, "warnings", []) or []],
        }
        self._attachment_storage.write_meta(file_path, meta)
        self._sync_claim_from_items(claim)
        self.db.commit()
        self.db.refresh(claim)
        self.audit_service.log_action(
            actor=current_user.name or current_user.username,
            action="expense_claim.attachment_upload",
            resource_type="expense_claim",
            resource_id=claim.id,
            before_json=before_json,
            after_json=self._serialize_claim(claim),
        )
        return {
            "message": f"{normalized_name} 已上传并关联到当前费用明细。",
            "claim_id": claim.id,
            "item_id": item.id,
            "invoice_id": item.invoice_id,
            "item_date": item.item_date.isoformat() if item.item_date else None,
            "item_type": item.item_type,
            "item_reason": item.item_reason,
            "item_location": item.item_location,
            "item_amount": item.item_amount,
            "claim_amount": claim.amount,
            "attachment": self._build_attachment_payload(item),
        }

    def get_claim_item_attachment_meta(
        self,
        *,
        claim_id: str,
        item_id: str,
        current_user: CurrentUserContext,
    ) -> dict[str, Any] | None:
        claim, item = self._get_claim_item_or_raise(
            claim_id=claim_id,
            item_id=item_id,
            current_user=current_user,
        )
        if claim is None:
            return None
        return self._build_attachment_payload(item)

    def get_claim_item_attachment_content(
        self,
        *,
        claim_id: str,
        item_id: str,
        current_user: CurrentUserContext,
    ) -> tuple[Path, str, str] | None:
        claim, item = self._get_claim_item_or_raise(
            claim_id=claim_id,
            item_id=item_id,
            current_user=current_user,
        )
        if claim is None:
            return None
        return self._resolve_item_attachment_content(item)

    def get_claim_item_attachment_preview_content(
        self,
        *,
        claim_id: str,
        item_id: str,
        current_user: CurrentUserContext,
    ) -> tuple[Path, str, str] | None:
        claim, item = self._get_claim_item_or_raise(
            claim_id=claim_id,
            item_id=item_id,
            current_user=current_user,
        )
        if claim is None:
            return None
        return self._resolve_item_attachment_preview_content(item)

    def delete_claim_item_attachment(
        self,
        *,
        claim_id: str,
        item_id: str,
        current_user: CurrentUserContext,
    ) -> dict[str, Any] | None:
        claim, item = self._get_claim_item_or_raise(
            claim_id=claim_id,
            item_id=item_id,
            current_user=current_user,
        )
        if claim is None:
            return None
        self._ensure_draft_claim(claim)
        self._ensure_mutable_claim_item(item)
        before_json = self._serialize_claim(claim)
        previous_name = self._attachment_presentation.resolve_display_name(item.invoice_id)
        self._attachment_storage.delete_item_files(item)
        item.invoice_id = None
        self._sync_claim_from_items(claim)
        self.db.commit()
        self.db.refresh(claim)
        self.audit_service.log_action(
            actor=current_user.name or current_user.username,
            action="expense_claim.attachment_delete",
            resource_type="expense_claim",
            resource_id=claim.id,
            before_json=before_json,
            after_json=self._serialize_claim(claim),
        )
        return {
            "message": f"{previous_name or '附件'} 已删除。",
            "claim_id": claim.id,
            "item_id": item.id,
            "invoice_id": item.invoice_id,
            "attachment": None,
        }

    def _get_claim_item_or_raise(
        self,
        *,
        claim_id: str,
        item_id: str,
        current_user: CurrentUserContext,
    ) -> tuple[ExpenseClaim | None, ExpenseClaimItem]:
        # A missing claim is reported as (None, None); a missing item raises.
        claim = self.get_claim(claim_id, current_user)
        if claim is None:
            return None, None  # type: ignore[return-value]
        item = next((entry for entry in claim.items if entry.id == item_id), None)
        if item is None:
            raise LookupError("Item not found")
        return claim, item

    def _resolve_item_attachment_content(self, item: ExpenseClaimItem) -> tuple[Path, str, str]:
        file_path = self._attachment_storage.resolve_item_path(item)
        if file_path is None or not file_path.exists():
            raise FileNotFoundError("Attachment not found")
        metadata = self._attachment_storage.read_meta(file_path)
        filename = str(metadata.get("file_name") or file_path.name)
        media_type = self._attachment_presentation.resolve_media_type(
            filename,
            fallback=str(metadata.get("media_type") or ""),
        )
        return file_path, media_type, filename

    def _repair_pdf_text_layer_metadata_if_needed(
        self,
        *,
        file_path: Path,
        metadata: dict[str, Any],
        item: ExpenseClaimItem | None = None,
    ) -> dict[str, Any]:
        if not metadata:
            return metadata
        media_type = str(metadata.get("media_type") or self._attachment_presentation.resolve_media_type(file_path.name)).strip()
        if media_type != "application/pdf":
            return metadata
        ocr_text = str(metadata.get("ocr_text") or "")
        ocr_summary = str(metadata.get("ocr_summary") or "")
        # Only repair when the stored OCR output is mostly placeholder characters.
        if OcrService._placeholder_ratio(f"{ocr_summary}\n{ocr_text}") < 0.12:
            return metadata
        text_layer = OcrService(self.db)._extract_pdf_text_layer(file_path)
        repaired_text, used_text_layer = OcrService._choose_document_text(
            ocr_text=ocr_text,
            text_layer=text_layer,
        )
        if not used_text_layer or not repaired_text:
            return metadata
        repaired_summary = OcrService._summarize_text(repaired_text)
        document = SimpleNamespace(
            filename=str(metadata.get("file_name") or file_path.name),
            text=repaired_text,
            summary=repaired_summary,
            avg_score=float(metadata.get("ocr_avg_score") or 0.0),
            line_count=int(metadata.get("ocr_line_count") or 0),
            document_type="",
            document_type_label="",
            scene_code="",
            scene_label="",
            document_fields=[],
            warnings=[str(value) for value in list(metadata.get("ocr_warnings") or []) if str(value).strip()],
        )
        document_info = self._build_attachment_document_info(document)
        document.document_type = document_info.get("document_type", "")
        document.document_type_label = document_info.get("document_type_label", "")
        document.scene_code = document_info.get("scene_code", "")
        document.scene_label = document_info.get("scene_label", "")
        document.document_fields = list(document_info.get("fields") or [])
        metadata["ocr_text"] = repaired_text
        metadata["ocr_summary"] = repaired_summary
        metadata["document_info"] = document_info
        metadata["previewable"] = True
        metadata["preview_kind"] = "pdf"
        metadata["preview_storage_key"] = str(
            metadata.get("storage_key") or self._attachment_storage.to_storage_key(file_path)
        )
        metadata["preview_media_type"] = "application/pdf"
        metadata["preview_file_name"] = str(metadata.get("file_name") or file_path.name)
        if item is not None:
            requirement_check = self._build_attachment_requirement_check(
                item=item,
                document_info=document_info,
            )
            metadata["requirement_check"] = requirement_check
            metadata["analysis"] = self._build_attachment_analysis(
                document=document,
                item=item,
                claim=getattr(item, "claim", None),
                document_info=document_info,
                requirement_check=requirement_check,
            )
        self._attachment_storage.write_meta(file_path, metadata)
        return metadata

    def _resolve_item_attachment_preview_content(self, item: ExpenseClaimItem) -> tuple[Path, str, str]:
        file_path, media_type, filename = self._resolve_item_attachment_content(item)
        metadata = self._attachment_storage.read_meta(file_path)
        metadata = self._repair_pdf_text_layer_metadata_if_needed(
            file_path=file_path,
            metadata=metadata,
            item=item,
        )
        preview_storage_key = str(metadata.get("preview_storage_key") or "").strip()
        preview_file_name = str(metadata.get("preview_file_name") or "").strip()
        preview_media_type = str(metadata.get("preview_media_type") or "").strip()
        # Prefer the stored preview asset; fall back to the original file when it is previewable.
        if preview_storage_key:
            preview_path = self._attachment_storage.resolve_path(preview_storage_key)
            if preview_path is not None and preview_path.exists():
                resolved_name = preview_file_name or preview_path.name
                resolved_media_type = self._attachment_presentation.resolve_media_type(
                    resolved_name,
                    fallback=preview_media_type,
                )
                return preview_path, resolved_media_type, resolved_name
        if self._attachment_presentation.is_previewable_media_type(media_type, filename):
            return file_path, media_type, filename
        raise FileNotFoundError("Attachment preview not found")
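
Note on the upload path above: OCR is treated as best effort, and the upload records one of three statuses ("empty", "recognized", "failed") instead of letting a recognizer error abort the request. The following is a minimal standalone sketch of that pattern, not the service's actual code. `recognize_with_fallback`, `broken_recognizer`, and the shape of the returned dict are hypothetical names introduced for illustration; the real method also runs its backfill steps inside the same `try` block.

```python
from __future__ import annotations

from typing import Any, Callable


def recognize_with_fallback(
    recognize: Callable[[bytes], list[dict[str, Any]]],
    content: bytes,
) -> dict[str, Any]:
    """Run a recognizer and report a status instead of raising."""
    result: dict[str, Any] = {"ocr_status": "empty", "ocr_error": "", "document": None}
    try:
        documents = list(recognize(content) or [])
    except Exception as exc:  # the upload should survive any recognizer failure
        result["ocr_status"] = "failed"
        result["ocr_error"] = str(exc)
        return result
    if documents:
        # Only the first recognized document is used, as in the upload method.
        result["ocr_status"] = "recognized"
        result["document"] = documents[0]
    return result


def broken_recognizer(_: bytes) -> list[dict[str, Any]]:
    """Hypothetical recognizer that always fails."""
    raise RuntimeError("engine offline")


ok = recognize_with_fallback(lambda data: [{"text": data.decode()}], b"ticket")
empty = recognize_with_fallback(lambda data: [], b"ticket")
failed = recognize_with_fallback(broken_recognizer, b"ticket")
```

Keeping the status and error text in the result lets the caller persist them next to the file, which is what the `ocr_status` and `ocr_error` metadata fields above appear to be for.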


@@ -0,0 +1,138 @@
from __future__ import annotations
import base64
import binascii
import mimetypes
import re
from pathlib import Path
from typing import Any
from urllib.parse import quote
from app.services.expense_claim_attachment_storage import ExpenseClaimAttachmentStorage
class ExpenseClaimAttachmentPresentation:
    def __init__(self, storage: ExpenseClaimAttachmentStorage) -> None:
        self.storage = storage

    def build_preview_meta(
        self,
        *,
        file_path: Path,
        media_type: str,
        ocr_document: Any | None,
    ) -> dict[str, Any]:
        filename = file_path.name
        storage_key = self.storage.to_storage_key(file_path)
        preview_kind = self.resolve_preview_kind(media_type, filename)
        preview_data_url = str(getattr(ocr_document, "preview_data_url", "") or "").strip()
        preview_source_kind = str(getattr(ocr_document, "preview_kind", "") or "").strip()
        # Prefer an image preview rendered by OCR when one is available.
        if preview_source_kind == "image" and preview_data_url:
            preview_asset = self._write_preview_asset_from_data_url(
                attachment_dir=file_path.parent,
                original_filename=filename,
                preview_data_url=preview_data_url,
            )
            if preview_asset is not None:
                preview_path, preview_media_type, preview_file_name = preview_asset
                return {
                    "previewable": True,
                    "preview_kind": "image",
                    "preview_storage_key": self.storage.to_storage_key(preview_path),
                    "preview_media_type": preview_media_type,
                    "preview_file_name": preview_file_name,
                }
        # Otherwise preview the original file if it is an image or a PDF.
        if preview_kind:
            return {
                "previewable": True,
                "preview_kind": preview_kind,
                "preview_storage_key": storage_key,
                "preview_media_type": media_type,
                "preview_file_name": filename,
            }
        return {
            "previewable": False,
            "preview_kind": "",
            "preview_storage_key": "",
            "preview_media_type": "",
            "preview_file_name": "",
        }

    @staticmethod
    def resolve_preview_kind(media_type: str | None, filename: str) -> str:
        resolved = str(media_type or "").strip() or (mimetypes.guess_type(filename)[0] or "")
        if resolved.startswith("image/"):
            return "image"
        if resolved == "application/pdf":
            return "pdf"
        return ""

    @staticmethod
    def decode_data_url(payload: str) -> tuple[str, bytes] | None:
        normalized = str(payload or "").strip()
        matched = re.match(r"^data:(?P<media>[\w.+-]+/[\w.+-]+);base64,(?P<body>.+)$", normalized, flags=re.DOTALL)
        if not matched:
            return None
        try:
            content = base64.b64decode(matched.group("body"), validate=True)
        except (binascii.Error, ValueError):
            return None
        return matched.group("media"), content

    def _write_preview_asset_from_data_url(
        self,
        *,
        attachment_dir: Path,
        original_filename: str,
        preview_data_url: str,
    ) -> tuple[Path, str, str] | None:
        decoded = self.decode_data_url(preview_data_url)
        if decoded is None:
            return None
        preview_media_type, preview_content = decoded
        suffix = mimetypes.guess_extension(preview_media_type) or ".bin"
        preview_name = f"{Path(original_filename).stem}.preview{suffix}"
        preview_path = attachment_dir / preview_name
        preview_path.write_bytes(preview_content)
        return preview_path, preview_media_type, preview_name

    @staticmethod
    def build_preview_client_path(claim_id: str, item_id: str) -> str:
        return (
            "/reimbursements/claims/"
            f"{quote(str(claim_id or '').strip(), safe='')}"
            f"/items/{quote(str(item_id or '').strip(), safe='')}/attachment/preview"
        )

    @staticmethod
    def resolve_media_type(filename: str, *, fallback: str | None = None) -> str:
        guessed = mimetypes.guess_type(filename)[0]
        return str(guessed or fallback or "application/octet-stream")

    @staticmethod
    def is_previewable_media_type(media_type: str | None, filename: str) -> bool:
        resolved = str(media_type or "").strip() or (mimetypes.guess_type(filename)[0] or "")
        return resolved.startswith("image/") or resolved == "application/pdf"

    @staticmethod
    def resolve_display_name(storage_key: str | None) -> str:
        return Path(str(storage_key or "").strip()).name

    @classmethod
    def merge_reference(cls, current_invoice_id: str | None, next_invoice_id: str | None) -> str | None:
        normalized_next = str(next_invoice_id or "").strip()
        if not normalized_next:
            return None
        normalized_current = str(current_invoice_id or "").strip()
        # Keep the stored key when the incoming reference names the same file.
        if (
            normalized_current
            and cls.resolve_display_name(normalized_current) == cls.resolve_display_name(normalized_next)
        ):
            return normalized_current
        return normalized_next
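
Note on `decode_data_url` above: it accepts only base64 data URLs of the form `data:<type>/<subtype>;base64,<body>` and returns `None` for anything else. The following is a standalone sketch of the same check for experimenting outside the class. The module-level `DATA_URL_PATTERN` name and the sample inputs are introduced here for illustration only.

```python
from __future__ import annotations

import base64
import binascii
import re

# Same expression as in decode_data_url, precompiled under a hypothetical name.
DATA_URL_PATTERN = re.compile(
    r"^data:(?P<media>[\w.+-]+/[\w.+-]+);base64,(?P<body>.+)$",
    flags=re.DOTALL,
)


def decode_data_url(payload: str) -> tuple[str, bytes] | None:
    """Return (media_type, raw_bytes), or None when the payload is not usable."""
    matched = DATA_URL_PATTERN.match(str(payload or "").strip())
    if not matched:
        return None
    try:
        # validate=True rejects characters outside the base64 alphabet.
        content = base64.b64decode(matched.group("body"), validate=True)
    except (binascii.Error, ValueError):
        return None
    return matched.group("media"), content


sample = "data:image/png;base64," + base64.b64encode(b"hello").decode("ascii")
decoded = decode_data_url(sample)
rejected = decode_data_url("data:image/png;base64,@@@")
not_a_data_url = decode_data_url("hello")
with_parameter = decode_data_url("data:text/plain;charset=utf-8;base64,aGVsbG8=")
```

Because the pattern expects `;base64,` directly after the media type, a data URL that carries extra parameters such as `charset=utf-8` does not match and yields `None`, so callers fall back to previewing the original file.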


@@ -0,0 +1,129 @@
from __future__ import annotations
import json
import re
import shutil
from pathlib import Path
from app.core.config import get_settings
from app.models.financial_record import ExpenseClaim, ExpenseClaimItem
class ExpenseClaimAttachmentStorage:
    """Centralizes filesystem operations for expense claim attachments."""

    def root(self) -> Path:
        return (get_settings().resolved_storage_root_dir / "expense_claims").resolve()

    def build_item_dir(self, claim_id: str, item_id: str) -> Path:
        return (self.root() / claim_id / item_id).resolve()

    def delete_claim_files(self, claim: ExpenseClaim) -> None:
        for item in list(claim.items or []):
            self.delete_item_files(item)
        self.delete_claim_root(claim.id)

    def delete_claim_root(self, claim_id: str) -> None:
        claim_root = self._assert_child(self.root() / claim_id)
        self._delete_path(claim_root)

    @staticmethod
    def normalize_filename(filename: str | None) -> str:
        # Drop any directory part, then replace unsupported characters with "_".
        normalized = Path(str(filename or "").strip()).name
        normalized = re.sub(r"[^\w.\-\u4e00-\u9fff]+", "_", normalized).strip("._")
        suffix = Path(normalized).suffix
        if normalized:
            return normalized
        return f"attachment{suffix or '.bin'}"

    def resolve_path(self, storage_key: str | None) -> Path | None:
        normalized = str(storage_key or "").strip()
        if not normalized:
            return None
        root = self.root()
        path = (root / normalized).resolve()
        # Reject keys that resolve outside the storage root.
        try:
            path.relative_to(root)
        except ValueError as exc:
            raise FileNotFoundError("Attachment path is invalid") from exc
        return path

    def resolve_item_path(self, item: ExpenseClaimItem) -> Path | None:
        if not str(item.invoice_id or "").strip():
            return None
        file_path = self.resolve_path(item.invoice_id)
        if file_path is not None and file_path.exists():
            return file_path
        # Fall back to looking the file up by name inside the item directory.
        filename = self.normalize_filename(item.invoice_id)
        if not filename:
            return file_path
        fallback_path = (self.build_item_dir(item.claim_id, item.id) / filename).resolve()
        try:
            fallback_path.relative_to(self.root())
        except ValueError as exc:
            raise FileNotFoundError("Attachment path is invalid") from exc
        return fallback_path

    def to_storage_key(self, file_path: Path) -> str:
        return file_path.resolve().relative_to(self.root()).as_posix()

    def delete_item_files(self, item: ExpenseClaimItem) -> None:
        file_path = self.resolve_item_path(item)
        if file_path is None:
            return
        root = self.root()
        # Files stored directly under the root are removed one by one;
        # otherwise the whole item directory is removed.
        if file_path.parent == root:
            self._delete_path(file_path)
            self._delete_path(self.meta_path(file_path))
            return
        self._delete_path(file_path.parent)

    @staticmethod
    def meta_path(file_path: Path) -> Path:
        return file_path.with_name(f"{file_path.name}.meta.json")

    def write_meta(self, file_path: Path, payload: dict) -> None:
        meta_path = self.meta_path(file_path)
        meta_path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")

    def read_meta(self, file_path: Path) -> dict:
        meta_path = self.meta_path(file_path)
        if not meta_path.exists():
            return {}
        try:
            payload = json.loads(meta_path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, OSError):
            return {}
        return payload if isinstance(payload, dict) else {}

    def _assert_child(self, path: Path) -> Path:
        root = self.root()
        resolved = path.resolve()
        try:
            resolved.relative_to(root)
        except ValueError as exc:
            raise FileNotFoundError("Attachment path is invalid") from exc
        return resolved

    def _delete_path(self, path: Path | None) -> None:
        if path is None:
            return
        target = self._assert_child(path)
        if not target.exists():
            return
        if target.is_dir():
            shutil.rmtree(target)
        else:
            target.unlink()
        if target.exists():
            raise OSError(f"Attachment path was not deleted: {target}")
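
Note on `resolve_path` and `_assert_child` above: both guard against path traversal by resolving the candidate path and then requiring it to be relative to the storage root. The following is a minimal standalone sketch of that containment check, assuming a temporary directory in place of the configured storage root. `resolve_inside` and `is_rejected` are hypothetical helper names used only for this illustration.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_inside(root: Path, storage_key: str) -> Path:
    """Resolve storage_key under root, refusing anything that escapes it."""
    resolved_root = root.resolve()
    candidate = (resolved_root / storage_key).resolve()
    try:
        # relative_to raises ValueError when candidate is not under resolved_root.
        candidate.relative_to(resolved_root)
    except ValueError as exc:
        raise FileNotFoundError("Attachment path is invalid") from exc
    return candidate


def is_rejected(root: Path, storage_key: str) -> bool:
    """Report whether resolve_inside refuses the given key."""
    try:
        resolve_inside(root, storage_key)
    except FileNotFoundError:
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "expense_claims"
    root.mkdir()
    inside = resolve_inside(root, "claim-1/item-1/ticket.pdf")
    inside_ok = inside.is_relative_to(root.resolve())
    traversal_rejected = is_rejected(root, "../outside.txt")
    nested_traversal_rejected = is_rejected(root, "claim-1/../../outside.txt")
```

Resolving before comparing matters: a purely textual prefix check would accept `claim-1/../../outside.txt`, while the resolved path clearly falls outside the root.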


@@ -0,0 +1,361 @@
from __future__ import annotations
import re
from decimal import Decimal
EXPENSE_TYPE_LABELS = {
"travel": "差旅",
"train_ticket": "火车票",
"flight_ticket": "机票",
"hotel_ticket": "住宿票",
"ride_ticket": "乘车",
"travel_allowance": "出差补贴",
"hotel": "住宿",
"transport": "交通",
"meal": "餐费",
"meeting": "会务",
"entertainment": "招待",
"office": "办公",
"training": "培训",
"communication": "通讯",
"welfare": "福利",
}
MAX_DRAFT_CLAIMS_PER_USER = 3
EDITABLE_CLAIM_STATUSES = ("draft", "supplement", "returned")
SYSTEM_GENERATED_ITEM_TYPES = {"travel_allowance"}
TRAVEL_DETAIL_ITEM_TYPES = {
"train_ticket",
"flight_ticket",
"hotel_ticket",
"ride_ticket",
"travel_allowance",
}
TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES = {"train_ticket", "flight_ticket"}
DOCUMENT_TYPE_ITEM_TYPE_MAP = {
"train_ticket": "train_ticket",
"flight_itinerary": "flight_ticket",
"hotel_invoice": "hotel_ticket",
"taxi_receipt": "ride_ticket",
"transport_receipt": "ride_ticket",
}
DOCUMENT_TYPE_SCENE_MAP = {
"train_ticket": "travel",
"flight_itinerary": "travel",
"hotel_invoice": "hotel",
"taxi_receipt": "transport",
"transport_receipt": "transport",
"parking_toll_receipt": "transport",
"meal_receipt": "meal",
"office_invoice": "office",
"meeting_invoice": "meeting",
"training_invoice": "training",
}
DOCUMENT_FACT_ITEM_TYPES = {"train_ticket", "flight_ticket", "hotel_ticket", "ride_ticket", "ship_ticket", "ferry_ticket"}
ROUTE_DESCRIPTION_ITEM_TYPES = {"train_ticket", "flight_ticket", "ship_ticket", "ferry_ticket", "ride_ticket"}
DOCUMENT_TRIP_DATE_LABELS = {
"train_ticket": "列车出发时间",
"flight_itinerary": "起飞日期",
"taxi_receipt": "乘车时间",
"transport_receipt": "乘车时间",
"parking_toll_receipt": "通行日期",
}
DOCUMENT_TRIP_DATE_REQUIREMENT_LABELS = {
"train_ticket": "列车出发时间或乘车日期",
"flight_itinerary": "起飞日期或航班日期",
"taxi_receipt": "乘车时间",
"transport_receipt": "乘车时间",
"parking_toll_receipt": "通行日期",
"hotel_invoice": "入住或离店日期",
}
DOCUMENT_TRIP_DATE_KEYS = {
"traveldate",
"tripdate",
"journeydate",
"departuredate",
"departuretime",
"departdate",
"departtime",
"boardingdate",
"boardingtime",
"traindate",
"traintime",
"traindeparturetime",
"scheduleddeparturetime",
"flightdate",
"flighttime",
"ridedate",
"ridetime",
"pickuptime",
"starttime",
}
DOCUMENT_GENERIC_DATE_KEYS = {"date", "time", "occurredat", "occurreddate", "businessdate"}
DOCUMENT_INVOICE_DATE_KEYS = {"issuedat", "issuedate", "invoicedate", "billingdate"}
DOCUMENT_TRIP_DATE_LABEL_TOKENS = (
"出发日期",
"出发时间",
"列车出发时间",
"发车日期",
"发车时间",
"开车时间",
"乘车日期",
"乘车时间",
"起飞日期",
"航班日期",
"行程日期",
"上车时间",
"用车时间",
"通行日期",
)
DOCUMENT_GENERIC_DATE_LABEL_TOKENS = ("日期", "时间", "发生时间", "业务发生日期")
DOCUMENT_INVOICE_DATE_LABEL_TOKENS = ("开票日期", "发票日期")
DOCUMENT_ROUTE_FORMAT_PATTERN = re.compile(
r"^[A-Za-z0-9\u4e00-\u9fa5()·]{2,40}\s*-\s*"
r"[A-Za-z0-9\u4e00-\u9fa5()·]{2,40}$"
)
DOCUMENT_ROUTE_TEXT_PATTERN = re.compile(
    r"([A-Za-z0-9\u4e00-\u9fa5()·]{2,40})\s*(?:至|到|→|->|—|-)\s*"
    r"([A-Za-z0-9\u4e00-\u9fa5()·]{2,40})"
)
DOCUMENT_ROUTE_ORIGIN_LABELS = {"起点", "上车", "上车地点", "上车地址", "出发", "出发地", "出发站", "始发站", "乘车起点"}
DOCUMENT_ROUTE_DESTINATION_LABELS = {
"终点",
"下车",
"下车地点",
"下车地址",
"到达",
"到达地",
"到达站",
"目的地",
"乘车终点",
}
GENERIC_ATTACHMENT_BACKFILL_ITEM_TYPES = {"", "other", "travel", "transport", "hotel"}
LOCATION_REQUIRED_EXPENSE_TYPES = {"travel", "meeting", "entertainment"}
EXPENSE_SCENE_KEYWORDS = {
"travel": ("差旅", "出差", "行程"),
"hotel": ("酒店", "住宿", "房费", "客房", "入住", "离店"),
"transport": (
"交通",
"打车",
"出租车",
"网约车",
"滴滴",
"出行",
"乘车",
"用车",
"叫车",
"车费",
"车资",
"的士",
"高铁",
"动车",
"火车",
"机票",
"航班",
"行程单",
"登机",
"客票",
"公交",
"地铁",
"过路费",
"通行费",
"停车",
),
"meal": ("餐饮", "餐费", "用餐", "外卖", "快餐", "酒楼", "饭店", "饭馆", "食品", "咖啡"),
"entertainment": ("招待", "宴请", "接待", "客户餐", "商务餐", "业务招待"),
"office": ("办公", "办公用品", "文具", "耗材", "打印", "纸张", "硒鼓", "墨盒", "鼠标", "键盘", "电脑"),
"meeting": ("会议", "会务", "会展", "会议室", "会场", "场地费", "论坛"),
"training": ("培训", "课程", "讲师", "教材", "学费", "认证"),
}
EXPENSE_TYPE_ALLOWED_DOCUMENT_SCENES = {
"travel": {"travel", "hotel", "transport", "meal"},
"train_ticket": {"travel"},
"flight_ticket": {"travel"},
"hotel_ticket": {"hotel"},
"ride_ticket": {"transport"},
"travel_allowance": set(),
"hotel": {"hotel"},
"transport": {"transport", "travel"},
"meal": {"meal", "entertainment"},
"entertainment": {"entertainment", "meal"},
"office": {"office"},
"meeting": {"meeting"},
"training": {"training"},
}
DOCUMENT_SCENE_LABELS = {
"travel": "差旅",
"hotel": "住宿",
"transport": "交通",
"meal": "餐饮",
"entertainment": "业务招待",
"office": "办公用品",
"meeting": "会务",
"training": "培训",
"other": "其他票据",
}
DOCUMENT_ASSOCIATION_REVIEW_ACTIONS = {
"link_to_existing_draft",
"create_new_claim_from_documents",
}
PERSISTENT_EXPENSE_REVIEW_ACTIONS = {
"save_draft",
"next_step",
*DOCUMENT_ASSOCIATION_REVIEW_ACTIONS,
}
RETURN_REASON_OPTIONS = {
"missing_attachment": "附件缺失或不清晰",
"invoice_mismatch": "票据类型/金额与明细不一致",
"over_policy": "超出制度标准或缺少超标说明",
"business_explanation": "业务事由/地点/人员信息不完整",
"duplicate_or_abnormal": "疑似重复或异常票据",
"approval_question": "审批人需要补充说明",
}
MAX_CLAIM_NO_RETRY_ATTEMPTS = 3
DOCUMENT_DATE_PATTERN = re.compile(r"((?:20\d{2}|19\d{2})[-/年.](?:1[0-2]|0?[1-9])[-/月.](?:3[01]|[12]\d|0?[1-9])日?)")
SYSTEM_GENERATED_REASON_PREFIXES = (
"我上传了",
"请按当前已识别信息",
"请把当前上传的票据",
"请基于当前上传的多张票据",
"我已核对右侧识别结果",
"请同步修正逐票据识别结果",
"我已修改识别信息",
"查看报销草稿",
"请解释一下当前这笔报销的合规风险和待补充项",
)
LEADING_REASON_TIME_PATTERNS = (
    re.compile(
        r"^\s*(?:识别事项(?:有)?[:]\s*)?"
        r"(?:业务发生(?:时间|日期)|费用发生(?:时间|日期)|发生(?:时间|日期)|报销(?:时间|日期)|时间)[:]?\s*"
        r"(?:19|20)\d{2}[-/年.]\d{1,2}[-/月.]\d{1,2}日?"
        r"(?:\s*(?:至|到|~|—|-)\s*(?:19|20)\d{2}[-/年.]\d{1,2}[-/月.]\d{1,2}日?)?"
        r"\s*[,。;、]?\s*"
    ),
    re.compile(
        r"^\s*(?:19|20)\d{2}[-/年.]\d{1,2}[-/月.]\d{1,2}日?"
        r"(?:\s*(?:至|到|~|—|-)\s*(?:19|20)\d{2}[-/年.]\d{1,2}[-/月.]\d{1,2}日?)?"
        r"\s*[,。;、]\s*"
    ),
)
AI_REVIEW_LOOKBACK_DAYS = 90
AI_REVIEW_REPEAT_RISK_WARNING_COUNT = 1
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT = 2
TRAVEL_REVIEW_RELEVANT_EXPENSE_TYPES = {"travel", "hotel", "transport"}
TRAVEL_REVIEW_LONG_DISTANCE_DOCUMENT_TYPES = {"flight_itinerary", "train_ticket"}
TRAVEL_POLICY_CITY_TIERS = {
"北京": "tier_1",
"上海": "tier_1",
"广州": "tier_1",
"深圳": "tier_1",
"杭州": "tier_2",
"南京": "tier_2",
"苏州": "tier_2",
"武汉": "tier_2",
"成都": "tier_2",
"重庆": "tier_2",
"西安": "tier_2",
"天津": "tier_2",
"宁波": "tier_2",
"厦门": "tier_2",
"青岛": "tier_2",
"长沙": "tier_2",
"郑州": "tier_2",
"合肥": "tier_2",
"济南": "tier_2",
"沈阳": "tier_2",
"大连": "tier_2",
"福州": "tier_2",
"昆明": "tier_2",
"海口": "tier_2",
"三亚": "tier_2",
"无锡": "tier_2",
"东莞": "tier_2",
"佛山": "tier_2",
}
TRAVEL_POLICY_CITY_MATCH_ORDER = tuple(
sorted(TRAVEL_POLICY_CITY_TIERS.keys(), key=lambda item: len(item), reverse=True)
)
TRAVEL_POLICY_BAND_LABELS = {
"junior": "P1-P3",
"mid": "P4-P5",
"senior": "P6-P7",
"manager": "M1-M2",
"executive": "M3及以上 / D序列",
}
TRAVEL_POLICY_HOTEL_LIMITS = {
"junior": {
"tier_1": Decimal("450.00"),
"tier_2": Decimal("380.00"),
"tier_3": Decimal("320.00"),
},
"mid": {
"tier_1": Decimal("550.00"),
"tier_2": Decimal("480.00"),
"tier_3": Decimal("380.00"),
},
"senior": {
"tier_1": Decimal("700.00"),
"tier_2": Decimal("620.00"),
"tier_3": Decimal("520.00"),
},
"manager": {
"tier_1": Decimal("900.00"),
"tier_2": Decimal("820.00"),
"tier_3": Decimal("720.00"),
},
"executive": {
"tier_1": Decimal("1200.00"),
"tier_2": Decimal("1000.00"),
"tier_3": Decimal("900.00"),
},
}
TRAVEL_POLICY_ALLOWED_TRANSPORT_LEVELS = {
"junior": {"flight": 1, "train": 1},
"mid": {"flight": 1, "train": 1},
"senior": {"flight": 2, "train": 2},
"manager": {"flight": 3, "train": 3},
"executive": {"flight": 4, "train": 3},
}
TRAVEL_POLICY_ROUTE_EXCEPTION_KEYWORDS = (
"中转",
"转机",
"经停",
"改签",
"多地出差",
"多城市",
"多站",
"异地返程",
"异地结束",
"临时变更",
"继续前往",
"第二站",
)
TRAVEL_POLICY_STANDARD_EXCEPTION_KEYWORDS = (
"超标说明",
"无直达",
"展会高峰",
"会议高峰",
"协议酒店满房",
"客户指定",
"临时改签",
"行程变更",
"红眼航班",
"晚到店",
)
TRAVEL_POLICY_FLIGHT_CLASS_PATTERNS = (
("头等舱", 4),
("公务舱", 3),
("商务舱", 3),
("超级经济舱", 2),
("高端经济舱", 2),
("明珠经济舱", 2),
("经济舱", 1),
)
TRAVEL_POLICY_TRAIN_CLASS_PATTERNS = (
("商务座", 3),
("一等座", 2),
("软卧", 2),
("二等座", 1),
("二等卧", 1),
("硬卧", 1),
)
TRAVEL_POLICY_HOTEL_NIGHT_PATTERN = re.compile(r"(\d+)\s*(?:晚|间夜)")
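
Note on the travel policy tables above: the class pattern tuples list more specific labels (for example "超级经济舱") before the generic one ("经济舱"), which suggests they are meant to be scanned in order with the first hit winning. The following is a sketch of how such a lookup could work. It assumes substring matching and assumes that the allowed level is an upper bound on the ticket's class level; the matcher that actually consumes these constants is not part of this diff, and `resolve_class_level` and `is_flight_class_allowed` are hypothetical names.

```python
from __future__ import annotations

# Values copied from TRAVEL_POLICY_FLIGHT_CLASS_PATTERNS above.
FLIGHT_CLASS_PATTERNS = (
    ("头等舱", 4),
    ("公务舱", 3),
    ("商务舱", 3),
    ("超级经济舱", 2),
    ("高端经济舱", 2),
    ("明珠经济舱", 2),
    ("经济舱", 1),
)
# Flight column of TRAVEL_POLICY_ALLOWED_TRANSPORT_LEVELS above.
ALLOWED_FLIGHT_LEVELS = {"junior": 1, "mid": 1, "senior": 2, "manager": 3, "executive": 4}


def resolve_class_level(text: str, patterns: tuple[tuple[str, int], ...]) -> int | None:
    """Return the level of the first keyword found in text, or None."""
    for keyword, level in patterns:
        if keyword in text:
            return level
    return None


def is_flight_class_allowed(band: str, text: str) -> bool | None:
    """Assumed rule: a ticket is within policy when its level does not exceed the band's level."""
    level = resolve_class_level(text, FLIGHT_CLASS_PATTERNS)
    if level is None:
        return None  # cabin class could not be determined from the text
    return level <= ALLOWED_FLIGHT_LEVELS[band]


premium_economy = resolve_class_level("CA1234 超级经济舱", FLIGHT_CLASS_PATTERNS)
economy = resolve_class_level("CA1234 经济舱", FLIGHT_CLASS_PATTERNS)
junior_business = is_flight_class_allowed("junior", "公务舱")
executive_first = is_flight_class_allowed("executive", "头等舱")
unknown_cabin = is_flight_class_allowed("mid", "CA1234")
```

Ordering is what keeps "超级经济舱" from being read as plain economy: if the generic entry came first, every premium economy ticket would resolve to level 1.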


@@ -0,0 +1,560 @@
from __future__ import annotations
import json
import re
import shutil
import uuid
from collections import defaultdict
from datetime import UTC, date, datetime, timedelta
from decimal import Decimal, InvalidOperation
from pathlib import Path
from types import SimpleNamespace
from typing import Any
from sqlalchemy import func, or_, select
from sqlalchemy import inspect as sqlalchemy_inspect
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, selectinload
from app.api.deps import CurrentUserContext
from app.core.agent_enums import AgentAssetDomain, AgentAssetStatus, AgentAssetType
from app.models.agent_asset import AgentAsset
from app.models.employee import Employee
from app.models.financial_record import ExpenseClaim, ExpenseClaimItem
from app.schemas.ontology import OntologyEntity, OntologyParseResult
from app.schemas.reimbursement import (
ExpenseClaimItemCreate,
ExpenseClaimItemUpdate,
ExpenseClaimUpdate,
TravelReimbursementCalculatorRequest,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import RISK_RULES_LIBRARY
from app.services.agent_foundation import AgentFoundationService
from app.services.audit import AuditLogService
from app.services.document_intelligence import build_document_insight
from app.services.expense_claim_access_policy import ExpenseClaimAccessPolicy
from app.services.expense_claim_attachment_presentation import ExpenseClaimAttachmentPresentation
from app.services.expense_claim_attachment_storage import ExpenseClaimAttachmentStorage
from app.services.expense_claim_constants import (
EXPENSE_TYPE_LABELS,
MAX_DRAFT_CLAIMS_PER_USER,
EDITABLE_CLAIM_STATUSES,
SYSTEM_GENERATED_ITEM_TYPES,
TRAVEL_DETAIL_ITEM_TYPES,
TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES,
DOCUMENT_TYPE_ITEM_TYPE_MAP,
DOCUMENT_TYPE_SCENE_MAP,
DOCUMENT_FACT_ITEM_TYPES,
ROUTE_DESCRIPTION_ITEM_TYPES,
DOCUMENT_TRIP_DATE_LABELS,
DOCUMENT_TRIP_DATE_REQUIREMENT_LABELS,
DOCUMENT_TRIP_DATE_KEYS,
DOCUMENT_GENERIC_DATE_KEYS,
DOCUMENT_INVOICE_DATE_KEYS,
DOCUMENT_TRIP_DATE_LABEL_TOKENS,
DOCUMENT_GENERIC_DATE_LABEL_TOKENS,
DOCUMENT_INVOICE_DATE_LABEL_TOKENS,
DOCUMENT_ROUTE_FORMAT_PATTERN,
DOCUMENT_ROUTE_TEXT_PATTERN,
DOCUMENT_ROUTE_ORIGIN_LABELS,
DOCUMENT_ROUTE_DESTINATION_LABELS,
GENERIC_ATTACHMENT_BACKFILL_ITEM_TYPES,
LOCATION_REQUIRED_EXPENSE_TYPES,
EXPENSE_SCENE_KEYWORDS,
EXPENSE_TYPE_ALLOWED_DOCUMENT_SCENES,
DOCUMENT_SCENE_LABELS,
DOCUMENT_ASSOCIATION_REVIEW_ACTIONS,
PERSISTENT_EXPENSE_REVIEW_ACTIONS,
RETURN_REASON_OPTIONS,
MAX_CLAIM_NO_RETRY_ATTEMPTS,
DOCUMENT_DATE_PATTERN,
SYSTEM_GENERATED_REASON_PREFIXES,
LEADING_REASON_TIME_PATTERNS,
AI_REVIEW_LOOKBACK_DAYS,
AI_REVIEW_REPEAT_RISK_WARNING_COUNT,
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT,
TRAVEL_REVIEW_RELEVANT_EXPENSE_TYPES,
TRAVEL_REVIEW_LONG_DISTANCE_DOCUMENT_TYPES,
TRAVEL_POLICY_CITY_TIERS,
TRAVEL_POLICY_CITY_MATCH_ORDER,
TRAVEL_POLICY_BAND_LABELS,
TRAVEL_POLICY_HOTEL_LIMITS,
TRAVEL_POLICY_ALLOWED_TRANSPORT_LEVELS,
TRAVEL_POLICY_ROUTE_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_STANDARD_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_FLIGHT_CLASS_PATTERNS,
TRAVEL_POLICY_TRAIN_CLASS_PATTERNS,
TRAVEL_POLICY_HOTEL_NIGHT_PATTERN,
)
from app.services.expense_claim_risk_review import ExpenseClaimRiskReviewMixin
from app.services.expense_amounts import (
extract_amount_candidates,
format_decimal_amount,
is_amount_match_date_fragment,
is_date_like_amount_candidate,
is_probable_year_amount,
parse_document_amount_value,
parse_plain_document_amount_value,
resolve_document_field_amount,
resolve_document_item_amount,
resolve_document_text_amount,
)
from app.services.expense_rule_runtime import (
DEFAULT_SCENE_RULE_ASSET_CODE,
ExpenseRuleRuntimeService,
RuntimeTravelPolicy,
build_default_expense_rule_catalog,
resolve_document_type_label,
)
from app.services.ocr import OcrService
class ExpenseClaimDocumentItemBuilderMixin:
def _resolve_context_documents(self, context_json: dict[str, Any]) -> list[dict[str, Any]]:
documents = context_json.get("ocr_documents")
if not isinstance(documents, list):
documents = []
normalized: list[dict[str, Any]] = []
for index, item in enumerate(documents[:10], start=1):
if not isinstance(item, dict):
continue
normalized.append(
{
"index": index,
"filename": str(item.get("filename") or "").strip(),
"summary": str(item.get("summary") or "").strip(),
"text": str(item.get("text") or "").strip(),
"document_type": str(item.get("document_type") or "").strip(),
"scene_code": str(item.get("scene_code") or "").strip(),
"scene_label": str(item.get("scene_label") or "").strip(),
"document_fields": self._normalize_document_fields(item.get("document_fields")),
}
)
overrides = context_json.get("review_document_form_values")
if not isinstance(overrides, list) or not normalized:
return normalized
override_map: dict[tuple[int, str], dict[str, Any]] = {}
for item in overrides:
if not isinstance(item, dict):
continue
filename = str(item.get("filename") or "").strip()
index = int(item.get("index") or 0)
if not filename and index <= 0:
continue
override_map[(index, filename)] = item
for item in normalized:
override = override_map.get((int(item["index"]), str(item["filename"])))
if override is None:
override = override_map.get((int(item["index"]), ""))
if override is None:
continue
summary = str(override.get("summary") or "").strip()
scene_label = str(override.get("scene_label") or "").strip()
fields = override.get("fields")
if summary:
item["summary"] = summary
if scene_label:
item["scene_label"] = scene_label
if isinstance(fields, list):
item["document_fields"] = self._normalize_document_fields(fields)
return normalized
@staticmethod
def _normalize_document_fields(raw_fields: Any) -> list[dict[str, str]]:
if not isinstance(raw_fields, list):
return []
normalized: list[dict[str, str]] = []
for field in raw_fields:
if not isinstance(field, dict):
continue
label = str(field.get("label") or "").strip()
value = str(field.get("value") or "").strip()
key = str(field.get("key") or label or "").strip()
if not label or not value:
continue
normalized.append(
{
"key": key,
"label": label,
"value": value,
}
)
return normalized
def _build_context_item_specs(
self,
*,
context_documents: list[dict[str, Any]],
attachment_names: list[str],
occurred_at: datetime,
expense_type: str,
amount: Decimal,
reason: str,
location: str,
context_json: dict[str, Any],
employee_grade: str | None = None,
user_id: str = "",
) -> list[dict[str, Any]]:
specs: list[dict[str, Any]] = []
if context_documents:
for document in context_documents:
specs.append(
{
"item_date": self._resolve_document_item_date(document, fallback=occurred_at.date()),
"item_type": self._resolve_document_item_type(document, fallback=expense_type),
"item_reason": self._resolve_document_item_reason(document, fallback=reason),
"item_location": location,
"item_amount": self._resolve_document_item_amount(document),
"invoice_id": str(document.get("filename") or "").strip() or None,
}
)
elif attachment_names:
for attachment_name in attachment_names:
specs.append(
{
"item_date": occurred_at.date(),
"item_type": expense_type,
"item_reason": reason,
"item_location": location,
"item_amount": None,
"invoice_id": attachment_name,
}
)
if not specs:
return []
total_recognized = sum(
spec["item_amount"] for spec in specs if isinstance(spec.get("item_amount"), Decimal)
)
missing_specs = [spec for spec in specs if spec.get("item_amount") is None]
if missing_specs:
remaining = (amount - total_recognized).quantize(Decimal("0.01"))
if remaining > Decimal("0.00"):
missing_specs[0]["item_amount"] = remaining
for spec in specs:
if spec.get("item_amount") is None:
spec["item_amount"] = Decimal("0.00")
allowance_spec = self._build_travel_allowance_item_spec(
context_documents=context_documents,
specs=specs,
occurred_at=occurred_at,
expense_type=expense_type,
location=location,
context_json=context_json,
employee_grade=employee_grade,
user_id=user_id,
)
if allowance_spec is not None:
specs = [spec for spec in specs if str(spec.get("item_type") or "").strip() != "travel_allowance"]
specs.append(allowance_spec)
return specs
def _build_travel_allowance_item_spec(
self,
*,
context_documents: list[dict[str, Any]],
specs: list[dict[str, Any]],
occurred_at: datetime,
expense_type: str,
location: str,
context_json: dict[str, Any],
employee_grade: str | None,
user_id: str,
) -> dict[str, Any] | None:
if not self._should_add_travel_allowance_item(
expense_type=expense_type,
context_documents=context_documents,
context_json=context_json,
):
return None
grade = str(employee_grade or context_json.get("grade") or "").strip()
if not grade:
return None
days, _, end_date = self._resolve_travel_allowance_days(
context_json=context_json,
occurred_at=occurred_at,
)
allowance_location = self._resolve_travel_allowance_location(
location=location,
context_documents=context_documents,
)
if days < 1 or not allowance_location:
return None
try:
from app.services.travel_reimbursement_calculator import (
TravelReimbursementCalculatorService,
)
result = TravelReimbursementCalculatorService(self.db).calculate(
TravelReimbursementCalculatorRequest(
days=days,
location=allowance_location,
grade=grade,
),
CurrentUserContext(
username=user_id,
name="",
role_codes=[],
is_admin=False,
),
)
except ValueError:
return None
allowance_amount = Decimal(result.allowance_amount or Decimal("0.00")).quantize(Decimal("0.01"))
allowance_rate = Decimal(result.total_allowance_rate or Decimal("0.00")).quantize(Decimal("0.01"))
if allowance_amount <= Decimal("0.00") or allowance_rate <= Decimal("0.00"):
return None
return {
"item_date": end_date,
"item_type": "travel_allowance",
"item_reason": (
f"系统自动计算出差补贴:{result.matched_city}{days}天,"
f"{allowance_rate:.2f}元/天"
),
"item_location": str(result.allowance_region or allowance_location).strip(),
"item_amount": allowance_amount,
"invoice_id": None,
}
@staticmethod
def _should_add_travel_allowance_item(
*,
expense_type: str,
context_documents: list[dict[str, Any]],
context_json: dict[str, Any],
) -> bool:
normalized_expense_type = str(expense_type or "").strip().lower()
if normalized_expense_type == "travel":
return True
review_form_values = context_json.get("review_form_values")
if isinstance(review_form_values, dict):
review_type = str(
review_form_values.get("expense_type")
or review_form_values.get("scene_label")
or review_form_values.get("reason_value")
or ""
)
if any(keyword in review_type for keyword in ("差旅", "出差")):
return True
for document in context_documents:
document_type = str(document.get("document_type") or "").strip()
scene_code = str(document.get("scene_code") or "").strip()
if document_type in {"train_ticket", "flight_itinerary"} or scene_code == "travel":
return True
return False
def _resolve_travel_allowance_days(
self,
*,
context_json: dict[str, Any],
occurred_at: datetime,
) -> tuple[int, date, date]:
start_date = occurred_at.date()
end_date = start_date
explicit_days = self._extract_travel_allowance_days_from_context(context_json)
business_time_context = context_json.get("business_time_context")
if isinstance(business_time_context, dict):
start_date = self._parse_iso_date_or_default(business_time_context.get("start_date"), start_date)
end_date = self._parse_iso_date_or_default(business_time_context.get("end_date"), start_date)
else:
review_form_values = context_json.get("review_form_values")
if isinstance(review_form_values, dict):
time_text = str(
review_form_values.get("time_range")
or review_form_values.get("business_time")
or review_form_values.get("occurred_date")
or ""
).strip()
matched_dates = re.findall(r"\d{4}-\d{2}-\d{2}", time_text)
if matched_dates:
start_date = self._parse_iso_date_or_default(matched_dates[0], start_date)
end_date = self._parse_iso_date_or_default(matched_dates[-1], start_date)
if end_date < start_date:
end_date = start_date
if explicit_days > 0:
return explicit_days, start_date, start_date + timedelta(days=explicit_days - 1)
days = (end_date - start_date).days + 1
return max(1, days), start_date, end_date
@staticmethod
def _extract_travel_allowance_days_from_context(context_json: dict[str, Any]) -> int:
review_form_values = context_json.get("review_form_values")
text_parts: list[str] = []
if isinstance(review_form_values, dict):
text_parts.extend(
str(review_form_values.get(key) or "")
for key in (
"reason",
"business_reason",
"reason_value",
"scene_label",
"time_range",
"business_time",
)
)
text_parts.extend(
str(context_json.get(key) or "")
for key in ("user_input_text", "message", "raw_text", "ocr_summary")
)
return ExpenseClaimDocumentItemBuilderMixin._extract_travel_day_count(" ".join(text_parts))
@staticmethod
def _extract_travel_day_count(text: str) -> int:
normalized = str(text or "").replace(" ", "")
if not normalized:
return 0
patterns = (
r"(?:出差|差旅|行程|支撑|支持|部署|项目|业务)\D{0,12}?(\d{1,2})天",
r"(\d{1,2})天(?:出差|差旅|行程)",
)
for pattern in patterns:
match = re.search(pattern, normalized)
if not match:
continue
try:
return max(1, int(match.group(1)))
except ValueError:
continue
return 0
@staticmethod
def _parse_iso_date_or_default(value: Any, fallback: date) -> date:
try:
return date.fromisoformat(str(value or "").strip())
except ValueError:
return fallback
@staticmethod
def _resolve_travel_allowance_location(
*,
location: str,
context_documents: list[dict[str, Any]],
) -> str:
normalized_location = str(location or "").strip()
if normalized_location and normalized_location not in {"待补充", "未知", "暂无"}:
return normalized_location
for document in context_documents:
for field in list(document.get("document_fields") or []):
if not isinstance(field, dict):
continue
key = str(field.get("key") or "").strip().lower()
label = str(field.get("label") or "").strip()
value = str(field.get("value") or "").strip()
if key == "route" or "行程" in label:
# Check "->" and the non-ASCII separators before the bare hyphen, so "A->B" is not split on "-".
separators = ("->", "→", "—", "-")
for separator in separators:
if separator in value:
return value.split(separator)[-1].strip()
if key in {"destination", "arrival_city"} or label in {"目的地", "到达城市"}:
return value
return ""
def _collect_invoice_keys_from_incoming_document(self, document: dict[str, Any]) -> list[str]:
document_info = dict(document or {})
if "fields" not in document_info and isinstance(document_info.get("document_fields"), list):
document_info["fields"] = document_info.get("document_fields")
return self._collect_invoice_keys_from_document_info(document_info)
def _resolve_document_item_type(self, document: dict[str, Any], *, fallback: str) -> str:
document_type = str(document.get("document_type") or "").strip()
mapped_type = DOCUMENT_TYPE_ITEM_TYPE_MAP.get(document_type)
if mapped_type:
return mapped_type
scene_code = str(document.get("scene_code") or "").strip()
if scene_code in {"travel", "hotel", "transport", "meal", "office", "meeting", "training"}:
return scene_code
if document_type in {"flight_itinerary", "train_ticket"}:
return "travel"
if document_type in {"taxi_receipt", "parking_toll_receipt", "transport_receipt"}:
return "transport"
if document_type == "hotel_invoice":
return "hotel"
if document_type == "meal_receipt":
return "meal"
if document_type == "office_invoice":
return "office"
if document_type == "meeting_invoice":
return "meeting"
if document_type == "training_invoice":
return "training"
scene_label = str(document.get("scene_label") or "").strip()
if "交通" in scene_label:
return "transport"
if "住宿" in scene_label:
return "hotel"
if "餐" in scene_label:
return "meal"
if "会务" in scene_label or "会议" in scene_label:
return "meeting"
if "培训" in scene_label:
return "training"
return fallback or "other"
def _resolve_document_item_reason(self, document: dict[str, Any], *, fallback: str) -> str:
document_type = str(document.get("document_type") or "").strip().lower()
item_type = self._resolve_document_item_type(document, fallback="")
if document_type in {"train_ticket", "flight_itinerary"} or item_type in {"train_ticket", "flight_ticket"}:
route = self._resolve_document_route_value(document)
trip_no = self._resolve_document_fact_field(
document,
keys={"trip_no", "flight_no", "train_no"},
labels={"车次", "航班"},
)
if route and trip_no:
return f"{self._format_document_route(route)}{trip_no}"
if route:
return self._format_document_route(route)
if document_type in {"taxi_receipt", "transport_receipt"} or item_type == "ride_ticket":
route = self._resolve_document_route_value(document)
if route:
return self._format_document_route(route)
if document_type == "hotel_invoice" or item_type == "hotel_ticket":
merchant = self._resolve_document_fact_field(
document,
keys={"merchant_name", "merchant", "seller_name", "vendor_name", "hotel_name"},
labels={"商户", "酒店", "宾馆", "销售方", "开票方"},
)
stay_range = self._resolve_document_stay_range(document)
if merchant and stay_range:
return f"{merchant}{stay_range}"
if merchant:
return merchant
if stay_range:
return stay_range
merchant = self._resolve_document_fact_field(
document,
keys={"merchant_name", "merchant", "seller_name", "vendor_name"},
labels={"商户", "销售方", "开票方", "收款方"},
)
if merchant:
return merchant
summary = str(document.get("summary") or "").strip()
return summary or fallback or ""
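The amount backfill performed in `_build_context_item_specs` can be illustrated in isolation. This is a minimal sketch rather than the service code: `backfill_missing_amounts` is a hypothetical standalone helper, and the sample file names are made up. It assumes the same rule as the method above: the part of the claim total not covered by recognized document amounts goes to the first item without an amount, and any other item without an amount falls back to `0.00`.

```python
from decimal import Decimal
from typing import Any


def backfill_missing_amounts(
    specs: list[dict[str, Any]], claim_total: Decimal
) -> list[dict[str, Any]]:
    """Hypothetical helper mirroring the backfill rule, for illustration only."""
    # Sum only the amounts that were actually recognized from documents.
    recognized = sum(
        (spec["item_amount"] for spec in specs if isinstance(spec.get("item_amount"), Decimal)),
        Decimal("0.00"),
    )
    missing = [spec for spec in specs if spec.get("item_amount") is None]
    if missing:
        remaining = (claim_total - recognized).quantize(Decimal("0.01"))
        # Only a positive remainder is assigned, and only to the first missing item.
        if remaining > Decimal("0.00"):
            missing[0]["item_amount"] = remaining
    # Every item still without an amount defaults to zero.
    for spec in specs:
        if spec.get("item_amount") is None:
            spec["item_amount"] = Decimal("0.00")
    return specs


example_specs = [
    {"invoice_id": "train.pdf", "item_amount": Decimal("120.50")},
    {"invoice_id": "hotel.pdf", "item_amount": None},
    {"invoice_id": "taxi.pdf", "item_amount": None},
]
filled = backfill_missing_amounts(example_specs, Decimal("200.00"))
```

With a claim total of 200.00 and one recognized amount of 120.50, the first unrecognized item receives 79.50 and the second stays at 0.00. When the recognized amounts already exceed the claim total, nothing is redistributed and all missing items become 0.00.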

@@ -0,0 +1,396 @@
from __future__ import annotations
import json
import re
import shutil
import uuid
from collections import defaultdict
from datetime import UTC, date, datetime, timedelta
from decimal import Decimal, InvalidOperation
from pathlib import Path
from types import SimpleNamespace
from typing import Any
from sqlalchemy import func, or_, select
from sqlalchemy import inspect as sqlalchemy_inspect
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, selectinload
from app.api.deps import CurrentUserContext
from app.core.agent_enums import AgentAssetDomain, AgentAssetStatus, AgentAssetType
from app.models.agent_asset import AgentAsset
from app.models.employee import Employee
from app.models.financial_record import ExpenseClaim, ExpenseClaimItem
from app.schemas.ontology import OntologyEntity, OntologyParseResult
from app.schemas.reimbursement import (
ExpenseClaimItemCreate,
ExpenseClaimItemUpdate,
ExpenseClaimUpdate,
TravelReimbursementCalculatorRequest,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import RISK_RULES_LIBRARY
from app.services.agent_foundation import AgentFoundationService
from app.services.audit import AuditLogService
from app.services.document_intelligence import build_document_insight
from app.services.expense_claim_access_policy import ExpenseClaimAccessPolicy
from app.services.expense_claim_attachment_presentation import ExpenseClaimAttachmentPresentation
from app.services.expense_claim_attachment_storage import ExpenseClaimAttachmentStorage
from app.services.expense_claim_constants import (
EXPENSE_TYPE_LABELS,
MAX_DRAFT_CLAIMS_PER_USER,
EDITABLE_CLAIM_STATUSES,
SYSTEM_GENERATED_ITEM_TYPES,
TRAVEL_DETAIL_ITEM_TYPES,
TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES,
DOCUMENT_TYPE_ITEM_TYPE_MAP,
DOCUMENT_TYPE_SCENE_MAP,
DOCUMENT_FACT_ITEM_TYPES,
ROUTE_DESCRIPTION_ITEM_TYPES,
DOCUMENT_TRIP_DATE_LABELS,
DOCUMENT_TRIP_DATE_REQUIREMENT_LABELS,
DOCUMENT_TRIP_DATE_KEYS,
DOCUMENT_GENERIC_DATE_KEYS,
DOCUMENT_INVOICE_DATE_KEYS,
DOCUMENT_TRIP_DATE_LABEL_TOKENS,
DOCUMENT_GENERIC_DATE_LABEL_TOKENS,
DOCUMENT_INVOICE_DATE_LABEL_TOKENS,
DOCUMENT_ROUTE_FORMAT_PATTERN,
DOCUMENT_ROUTE_TEXT_PATTERN,
DOCUMENT_ROUTE_ORIGIN_LABELS,
DOCUMENT_ROUTE_DESTINATION_LABELS,
GENERIC_ATTACHMENT_BACKFILL_ITEM_TYPES,
LOCATION_REQUIRED_EXPENSE_TYPES,
EXPENSE_SCENE_KEYWORDS,
EXPENSE_TYPE_ALLOWED_DOCUMENT_SCENES,
DOCUMENT_SCENE_LABELS,
DOCUMENT_ASSOCIATION_REVIEW_ACTIONS,
PERSISTENT_EXPENSE_REVIEW_ACTIONS,
RETURN_REASON_OPTIONS,
MAX_CLAIM_NO_RETRY_ATTEMPTS,
DOCUMENT_DATE_PATTERN,
SYSTEM_GENERATED_REASON_PREFIXES,
LEADING_REASON_TIME_PATTERNS,
AI_REVIEW_LOOKBACK_DAYS,
AI_REVIEW_REPEAT_RISK_WARNING_COUNT,
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT,
TRAVEL_REVIEW_RELEVANT_EXPENSE_TYPES,
TRAVEL_REVIEW_LONG_DISTANCE_DOCUMENT_TYPES,
TRAVEL_POLICY_CITY_TIERS,
TRAVEL_POLICY_CITY_MATCH_ORDER,
TRAVEL_POLICY_BAND_LABELS,
TRAVEL_POLICY_HOTEL_LIMITS,
TRAVEL_POLICY_ALLOWED_TRANSPORT_LEVELS,
TRAVEL_POLICY_ROUTE_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_STANDARD_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_FLIGHT_CLASS_PATTERNS,
TRAVEL_POLICY_TRAIN_CLASS_PATTERNS,
TRAVEL_POLICY_HOTEL_NIGHT_PATTERN,
)
from app.services.expense_claim_risk_review import ExpenseClaimRiskReviewMixin
from app.services.expense_amounts import (
extract_amount_candidates,
format_decimal_amount,
is_amount_match_date_fragment,
is_date_like_amount_candidate,
is_probable_year_amount,
parse_document_amount_value,
parse_plain_document_amount_value,
resolve_document_field_amount,
resolve_document_item_amount,
resolve_document_text_amount,
)
from app.services.expense_rule_runtime import (
DEFAULT_SCENE_RULE_ASSET_CODE,
ExpenseRuleRuntimeService,
RuntimeTravelPolicy,
build_default_expense_rule_catalog,
resolve_document_type_label,
)
from app.services.ocr import OcrService
class ExpenseClaimDocumentParsingMixin:
def _resolve_document_route_value(self, document: dict[str, Any]) -> str:
route = self._resolve_document_fact_field(
document,
keys={"route", "trip_route"},
labels={"行程", "路线"},
)
if route:
return route
origin = self._resolve_document_fact_field(
document,
keys={
"origin",
"from",
"from_city",
"departure",
"departure_city",
"start",
"start_location",
"start_address",
"pickup_location",
"pickup_address",
"boarding_station",
},
labels=DOCUMENT_ROUTE_ORIGIN_LABELS,
)
destination = self._resolve_document_fact_field(
document,
keys={
"destination",
"to",
"to_city",
"arrival",
"arrival_city",
"end",
"end_location",
"end_address",
"dropoff_location",
"dropoff_address",
"alighting_station",
},
labels=DOCUMENT_ROUTE_DESTINATION_LABELS,
)
if origin and destination:
return f"{origin}-{destination}"
text = " ".join(
[
str(document.get("summary") or "").strip(),
str(document.get("text") or "").strip(),
]
).strip()
text_route = self._extract_document_route_from_text(text)
if text_route:
return text_route
text_origin = self._extract_document_labeled_text_value(text, DOCUMENT_ROUTE_ORIGIN_LABELS)
text_destination = self._extract_document_labeled_text_value(text, DOCUMENT_ROUTE_DESTINATION_LABELS)
if text_origin and text_destination:
return f"{text_origin}-{text_destination}"
return ""
@staticmethod
def _resolve_document_fact_field(
document: dict[str, Any],
*,
keys: set[str],
labels: set[str],
) -> str:
raw_fields = document.get("document_fields")
if not isinstance(raw_fields, list):
raw_fields = document.get("fields")
if not isinstance(raw_fields, list):
return ""
normalized_keys = {str(key or "").strip().lower().replace("_", "") for key in keys}
for field in raw_fields:
if not isinstance(field, dict):
continue
field_key = str(field.get("key") or "").strip().lower().replace("_", "")
label = str(field.get("label") or "").replace(" ", "")
value = str(field.get("value") or "").strip()
if not value:
continue
if field_key in normalized_keys or any(token in label for token in labels):
return value
return ""
@staticmethod
def _format_document_route(route: str) -> str:
normalized = (
str(route or "")
.strip()
.replace("->", "-")
.replace("→", "-")
.replace("—", "-")
.replace("–", "-")
.replace("~", "-")
.replace("至", "-")
)
if "-" not in normalized:
return str(route or "").strip()
origin, destination = [part.strip() for part in normalized.split("-", 1)]
origin = origin.removeprefix("从").strip()
destination = destination.removeprefix("到").removeprefix("往").strip()
if not origin or not destination or origin == destination:
return str(route or "").strip()
return f"{origin}-{destination}"
@staticmethod
def _extract_document_route_from_text(text: str) -> str:
for match in DOCUMENT_ROUTE_TEXT_PATTERN.finditer(str(text or "")):
origin = str(match.group(1) or "").strip()
destination = str(match.group(2) or "").strip()
if not origin or not destination or origin == destination:
continue
if origin.isdigit() and destination.isdigit():
continue
if DOCUMENT_DATE_PATTERN.search(f"{origin}-{destination}"):
continue
return f"{origin}-{destination}"
return ""
@staticmethod
def _extract_document_labeled_text_value(text: str, labels: set[str]) -> str:
for label in sorted(labels, key=len, reverse=True):
pattern = re.compile(
rf"{re.escape(label)}[:\s]*"
r"([A-Za-z0-9\u4e00-\u9fa5()·\-路街道号弄区县市省园桥站机场中心]{2,50})"
)
match = pattern.search(str(text or ""))
if match:
return str(match.group(1) or "").strip()
return ""
def _resolve_document_stay_range(self, document: dict[str, Any]) -> str:
check_in = self._resolve_document_fact_field(
document,
keys={"check_in", "checkin", "arrival_date", "start_date"},
labels={"入住", "入住日期", "到店", "开始日期"},
)
check_out = self._resolve_document_fact_field(
document,
keys={"check_out", "checkout", "departure_date", "end_date"},
labels={"离店", "退房", "离店日期", "结束日期"},
)
if check_in and check_out:
return f"{check_in}至{check_out}"
nights = self._resolve_document_fact_field(
document,
keys={"nights", "night_count", "room_nights"},
labels={"间夜", "晚数", "入住天数"},
)
if nights:
return f"{nights}晚"
return ""
def _resolve_document_item_amount(self, document: dict[str, Any]) -> Decimal | None:
return resolve_document_item_amount(document)
def _resolve_document_field_amount(self, document: dict[str, Any]) -> Decimal | None:
return resolve_document_field_amount(document)
def _resolve_document_text_amount(self, text: str) -> Decimal | None:
return resolve_document_text_amount(text)
def _parse_document_amount_value(self, value: str) -> Decimal | None:
return parse_document_amount_value(value)
@staticmethod
def _parse_plain_document_amount_value(value: str) -> Decimal | None:
return parse_plain_document_amount_value(value)
@staticmethod
def _is_probable_year_amount(amount: Decimal | None) -> bool:
return is_probable_year_amount(amount)
@classmethod
def _is_date_like_amount_candidate(cls, amount: Decimal | None, text: str) -> bool:
return is_date_like_amount_candidate(amount, text)
@staticmethod
def _format_decimal_amount(amount: Decimal | None) -> str:
return format_decimal_amount(amount)
def _resolve_document_item_date(self, document: dict[str, Any], *, fallback: date) -> date:
return self._resolve_document_item_date_candidate(document) or fallback
def _resolve_document_item_date_candidate(self, document: dict[str, Any]) -> date | None:
document_type = str(document.get("document_type") or "").strip().lower()
if document_type in DOCUMENT_TRIP_DATE_LABELS:
parsed = self._resolve_document_date_from_fields(
document,
keys=DOCUMENT_TRIP_DATE_KEYS,
labels=DOCUMENT_TRIP_DATE_LABEL_TOKENS,
)
if parsed is not None:
return parsed
parsed = self._resolve_document_date_from_fields(
document,
keys=DOCUMENT_GENERIC_DATE_KEYS,
labels=DOCUMENT_GENERIC_DATE_LABEL_TOKENS,
excluded_labels=DOCUMENT_INVOICE_DATE_LABEL_TOKENS,
)
if parsed is not None:
return parsed
parsed = self._parse_document_date(
" ".join(
[
str(document.get("summary") or "").strip(),
str(document.get("text") or "").strip(),
]
).strip()
)
if parsed is not None:
return parsed
return None
for field in list(document.get("document_fields") or []):
if not isinstance(field, dict):
continue
key = str(field.get("key") or "").strip().lower().replace("_", "")
label = str(field.get("label") or "").replace(" ", "")
value = str(field.get("value") or "").strip()
if not value:
continue
if key in {"date", "time", "issuedat", "issuedate", "invoicedate"} or any(
token in label for token in ("日期", "时间", "开票日期", "发生时间")
):
parsed = self._parse_document_date(value)
if parsed is not None:
return parsed
parsed = self._parse_document_date(
" ".join(
[
str(document.get("summary") or "").strip(),
str(document.get("text") or "").strip(),
]
).strip()
)
return parsed
def _resolve_document_date_from_fields(
self,
document: dict[str, Any],
*,
keys: set[str],
labels: tuple[str, ...],
excluded_labels: tuple[str, ...] = (),
) -> date | None:
for field in list(document.get("document_fields") or []):
if not isinstance(field, dict):
continue
key = str(field.get("key") or "").strip().lower().replace("_", "")
label = str(field.get("label") or "").replace(" ", "")
if excluded_labels and any(token in label for token in excluded_labels):
continue
if key not in keys and not any(token in label for token in labels):
continue
parsed = self._parse_document_date(str(field.get("value") or ""))
if parsed is not None:
return parsed
return None
@staticmethod
def _parse_document_date(value: str) -> date | None:
match = DOCUMENT_DATE_PATTERN.search(str(value or ""))
if not match:
return None
raw_value = str(match.group(1) or "").strip()
normalized = raw_value.replace("年", "-").replace("月", "-").replace("日", "")
normalized = normalized.replace("/", "-").replace(".", "-")
parts = [part for part in normalized.split("-") if part]
if len(parts) != 3:
return None
try:
return date(int(parts[0]), int(parts[1]), int(parts[2]))
except ValueError:
return None
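The normalization done by `_parse_document_date` can be sketched as a standalone function. Two things here are assumptions and not taken from the repository: `ASSUMED_DATE_PATTERN` is a stand-in for `DOCUMENT_DATE_PATTERN` (the real constant lives in `expense_claim_constants` and is not shown), and the 年/月/日 markers are assumed to be the characters the method rewrites before splitting. The sketch only shows the general approach: capture a date-like fragment, reduce all separators to `-`, and let `date(...)` reject impossible values.

```python
import re
from datetime import date
from typing import Optional

# Assumed stand-in for DOCUMENT_DATE_PATTERN; the real pattern may differ.
ASSUMED_DATE_PATTERN = re.compile(r"(\d{4}[-/.年]\d{1,2}[-/.月]\d{1,2}日?)")


def parse_document_date(value: str) -> Optional[date]:
    """Illustrative re-implementation; returns None when no valid date is found."""
    match = ASSUMED_DATE_PATTERN.search(str(value or ""))
    if not match:
        return None
    raw_value = match.group(1).strip()
    # Reduce Chinese date markers and "/" or "." to a single "-" separator.
    normalized = raw_value.replace("年", "-").replace("月", "-").replace("日", "")
    normalized = normalized.replace("/", "-").replace(".", "-")
    parts = [part for part in normalized.split("-") if part]
    if len(parts) != 3:
        return None
    try:
        return date(int(parts[0]), int(parts[1]), int(parts[2]))
    except ValueError:
        # For example, 2026-02-30 matches the pattern but is not a real date.
        return None


parsed_cn = parse_document_date("开票日期:2026年5月21日")
parsed_slash = parse_document_date("2026/05/21")
parsed_invalid = parse_document_date("2026-02-30")
```

Returning `None` instead of raising keeps the caller's fallback chain simple: `_resolve_document_item_date` can try fields, then free text, then the claim-level date.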

@@ -0,0 +1,612 @@
from __future__ import annotations
import json
import re
import shutil
import uuid
from collections import defaultdict
from datetime import UTC, date, datetime, timedelta
from decimal import Decimal, InvalidOperation
from pathlib import Path
from types import SimpleNamespace
from typing import Any
from sqlalchemy import func, or_, select
from sqlalchemy import inspect as sqlalchemy_inspect
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, selectinload
from app.api.deps import CurrentUserContext
from app.core.agent_enums import AgentAssetDomain, AgentAssetStatus, AgentAssetType
from app.models.agent_asset import AgentAsset
from app.models.employee import Employee
from app.models.financial_record import ExpenseClaim, ExpenseClaimItem
from app.schemas.ontology import OntologyEntity, OntologyParseResult
from app.schemas.reimbursement import (
ExpenseClaimItemCreate,
ExpenseClaimItemUpdate,
ExpenseClaimUpdate,
TravelReimbursementCalculatorRequest,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import RISK_RULES_LIBRARY
from app.services.agent_foundation import AgentFoundationService
from app.services.audit import AuditLogService
from app.services.document_intelligence import build_document_insight
from app.services.expense_claim_access_policy import ExpenseClaimAccessPolicy
from app.services.expense_claim_attachment_presentation import ExpenseClaimAttachmentPresentation
from app.services.expense_claim_attachment_storage import ExpenseClaimAttachmentStorage
from app.services.expense_claim_constants import (
EXPENSE_TYPE_LABELS,
MAX_DRAFT_CLAIMS_PER_USER,
EDITABLE_CLAIM_STATUSES,
SYSTEM_GENERATED_ITEM_TYPES,
TRAVEL_DETAIL_ITEM_TYPES,
TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES,
DOCUMENT_TYPE_ITEM_TYPE_MAP,
DOCUMENT_TYPE_SCENE_MAP,
DOCUMENT_FACT_ITEM_TYPES,
ROUTE_DESCRIPTION_ITEM_TYPES,
DOCUMENT_TRIP_DATE_LABELS,
DOCUMENT_TRIP_DATE_REQUIREMENT_LABELS,
DOCUMENT_TRIP_DATE_KEYS,
DOCUMENT_GENERIC_DATE_KEYS,
DOCUMENT_INVOICE_DATE_KEYS,
DOCUMENT_TRIP_DATE_LABEL_TOKENS,
DOCUMENT_GENERIC_DATE_LABEL_TOKENS,
DOCUMENT_INVOICE_DATE_LABEL_TOKENS,
DOCUMENT_ROUTE_FORMAT_PATTERN,
DOCUMENT_ROUTE_TEXT_PATTERN,
DOCUMENT_ROUTE_ORIGIN_LABELS,
DOCUMENT_ROUTE_DESTINATION_LABELS,
GENERIC_ATTACHMENT_BACKFILL_ITEM_TYPES,
LOCATION_REQUIRED_EXPENSE_TYPES,
EXPENSE_SCENE_KEYWORDS,
EXPENSE_TYPE_ALLOWED_DOCUMENT_SCENES,
DOCUMENT_SCENE_LABELS,
DOCUMENT_ASSOCIATION_REVIEW_ACTIONS,
PERSISTENT_EXPENSE_REVIEW_ACTIONS,
RETURN_REASON_OPTIONS,
MAX_CLAIM_NO_RETRY_ATTEMPTS,
DOCUMENT_DATE_PATTERN,
SYSTEM_GENERATED_REASON_PREFIXES,
LEADING_REASON_TIME_PATTERNS,
AI_REVIEW_LOOKBACK_DAYS,
AI_REVIEW_REPEAT_RISK_WARNING_COUNT,
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT,
TRAVEL_REVIEW_RELEVANT_EXPENSE_TYPES,
TRAVEL_REVIEW_LONG_DISTANCE_DOCUMENT_TYPES,
TRAVEL_POLICY_CITY_TIERS,
TRAVEL_POLICY_CITY_MATCH_ORDER,
TRAVEL_POLICY_BAND_LABELS,
TRAVEL_POLICY_HOTEL_LIMITS,
TRAVEL_POLICY_ALLOWED_TRANSPORT_LEVELS,
TRAVEL_POLICY_ROUTE_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_STANDARD_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_FLIGHT_CLASS_PATTERNS,
TRAVEL_POLICY_TRAIN_CLASS_PATTERNS,
TRAVEL_POLICY_HOTEL_NIGHT_PATTERN,
)
from app.services.expense_claim_risk_review import ExpenseClaimRiskReviewMixin
from app.services.expense_amounts import (
extract_amount_candidates,
format_decimal_amount,
is_amount_match_date_fragment,
is_date_like_amount_candidate,
is_probable_year_amount,
parse_document_amount_value,
parse_plain_document_amount_value,
resolve_document_field_amount,
resolve_document_item_amount,
resolve_document_text_amount,
)
from app.services.expense_rule_runtime import (
DEFAULT_SCENE_RULE_ASSET_CODE,
ExpenseRuleRuntimeService,
RuntimeTravelPolicy,
build_default_expense_rule_catalog,
resolve_document_type_label,
)
from app.services.ocr import OcrService
class ExpenseClaimDraftFlowMixin:
def upsert_draft_from_ontology(
self,
*,
run_id: str,
user_id: str | None,
message: str,
ontology: OntologyParseResult,
context_json: dict[str, Any],
) -> dict[str, Any]:
self._ensure_ready()
context_json = dict(context_json or {})
retry_count = self._resolve_claim_no_retry_count(context_json)
review_action = str(context_json.get("review_action") or "").strip()
attachment_names = self._resolve_attachment_names(context_json)
context_documents = self._resolve_context_documents(context_json)
employee = self._resolve_employee(
ontology=ontology,
context_json=context_json,
user_id=user_id,
)
draft_owner_name = (
employee.name
if employee is not None
else self._resolve_employee_name(
ontology=ontology,
context_json=context_json,
user_id=user_id,
)
)
association_candidate = self._find_association_candidate(
ontology=ontology,
context_json=context_json,
user_id=user_id,
employee=employee,
)
if self._should_defer_multi_document_association(
context_json=context_json,
review_action=review_action,
association_candidate=association_candidate,
context_documents=context_documents,
):
document_count = max(len(context_documents), len(attachment_names), self._resolve_attachment_count(context_json))
return {
"message": (
f"检测到你已有草稿 {association_candidate.claim_no},"
f"当前新上传了 {document_count} 张票据,请先选择关联到现有草稿,或单独建立新的报销单。"
),
"draft_only": False,
"status": "pending_association_decision",
"pending_association_decision": True,
"association_candidate_claim_id": association_candidate.id,
"association_candidate_claim_no": association_candidate.claim_no,
}
claim = self._find_target_claim(
ontology=ontology,
context_json=context_json,
review_action=review_action,
association_candidate=association_candidate,
)
is_new_claim = claim is None
before_json = self._serialize_claim(claim) if claim is not None else None
if is_new_claim:
existing_draft_count = self._count_draft_claims_for_owner(
employee=employee,
user_id=user_id,
)
if existing_draft_count >= MAX_DRAFT_CLAIMS_PER_USER:
return {
"message": (
f"你当前已保存 {MAX_DRAFT_CLAIMS_PER_USER} 个草稿,请先完成已保存的草稿,"
"才能再次新建草稿。"
),
"draft_limit_reached": True,
"draft_only": False,
"status": "blocked",
"draft_count": existing_draft_count,
"max_draft_count": MAX_DRAFT_CLAIMS_PER_USER,
}
amount = self._resolve_amount(ontology.entities, context_json=context_json)
occurred_at = self._resolve_occurred_at(ontology, context_json=context_json)
explicit_expense_type = self._resolve_explicit_review_expense_type(context_json)
inferred_expense_type = self._resolve_expense_type(ontology.entities, context_json=context_json)
locked_expense_type = explicit_expense_type
if not locked_expense_type and claim is not None and review_action in DOCUMENT_ASSOCIATION_REVIEW_ACTIONS:
locked_expense_type = str(claim.expense_type or "").strip()
expense_type = locked_expense_type or inferred_expense_type
location = self._resolve_location(message=message, context_json=context_json)
reason = self._resolve_reason(
message=message,
context_json=context_json,
allow_message_fallback=is_new_claim,
)
attachment_count = len(attachment_names) or self._resolve_attachment_count(context_json)
final_amount = amount if amount is not None else (claim.amount if claim is not None else Decimal("0.00"))
final_occurred_at = (
occurred_at if occurred_at is not None else (claim.occurred_at if claim is not None else datetime.now(UTC))
)
final_expense_type = expense_type or (claim.expense_type if claim is not None else "other")
final_location = location or (claim.location if claim is not None else "待补充")
final_reason = reason or (claim.reason if claim is not None else "待补充")
final_attachment_count = (
attachment_count
if attachment_count > 0
else (int(claim.invoice_count or 0) if claim is not None else 0)
)
final_risk_flags = self._merge_persistent_claim_risk_flags(
existing_flags=list(claim.risk_flags_json or []) if claim is not None else [],
next_flags=list(ontology.risk_flags),
)
if context_documents or attachment_names:
document_specs = self._build_context_item_specs(
context_documents=context_documents,
attachment_names=attachment_names,
occurred_at=final_occurred_at,
expense_type=final_expense_type,
amount=final_amount,
reason=final_reason,
location=final_location,
context_json=context_json,
employee_grade=str(employee.grade or "").strip() if employee is not None else "",
user_id=user_id,
)
else:
document_specs = []
if claim is not None and review_action == "link_to_existing_draft" and document_specs:
duplicate_result = self._build_duplicate_attachment_block_result(
claim=claim,
document_specs=document_specs,
context_documents=context_documents,
)
if duplicate_result is not None:
return duplicate_result
try:
if claim is None:
claim = ExpenseClaim(
claim_no=self._generate_claim_no(final_occurred_at),
employee_id=employee.id if employee is not None else None,
employee_name=draft_owner_name,
department_id=employee.organization_unit_id if employee is not None else None,
department_name=self._resolve_department_name(
employee=employee,
context_json=context_json,
),
project_code=self._resolve_project_code(ontology.entities),
expense_type=final_expense_type,
reason=final_reason,
location=final_location,
amount=final_amount,
currency="CNY",
invoice_count=final_attachment_count,
occurred_at=final_occurred_at,
status="draft",
approval_stage="待提交",
risk_flags_json=final_risk_flags,
)
self.db.add(claim)
else:
claim.employee_id = employee.id if employee is not None else claim.employee_id
claim.employee_name = (
employee.name
if employee is not None
else self._resolve_employee_name(
ontology=ontology,
context_json=context_json,
user_id=user_id,
fallback=claim.employee_name,
)
)
claim.department_id = employee.organization_unit_id if employee is not None else claim.department_id
claim.department_name = self._resolve_department_name(
employee=employee,
context_json=context_json,
fallback=claim.department_name,
)
claim.project_code = self._resolve_project_code(ontology.entities) or claim.project_code
claim.expense_type = final_expense_type
claim.reason = final_reason
claim.location = final_location
claim.amount = final_amount
claim.invoice_count = final_attachment_count
claim.occurred_at = final_occurred_at
claim.status = "draft"
claim.approval_stage = "待提交"
claim.risk_flags_json = final_risk_flags
self.db.flush()
if document_specs and (is_new_claim or review_action in DOCUMENT_ASSOCIATION_REVIEW_ACTIONS):
if review_action == "link_to_existing_draft" and claim.items:
self._append_document_items(
claim=claim,
item_specs=document_specs,
)
else:
self._replace_claim_items(
claim=claim,
item_specs=document_specs,
)
self._sync_claim_from_items(claim)
else:
self._upsert_primary_item(
claim=claim,
occurred_at=final_occurred_at,
expense_type=final_expense_type,
amount=final_amount,
reason=final_reason,
location=final_location,
attachment_names=attachment_names,
)
self._sync_claim_from_items(claim)
if locked_expense_type:
claim.expense_type = locked_expense_type
self.db.commit()
self.db.refresh(claim)
except IntegrityError as exc:
self.db.rollback()
if (
is_new_claim
and retry_count < MAX_CLAIM_NO_RETRY_ATTEMPTS
and self._is_claim_no_conflict_error(exc)
):
retry_context = dict(context_json)
retry_context["_claim_no_retry_count"] = retry_count + 1
return self.upsert_draft_from_ontology(
run_id=run_id,
user_id=user_id,
message=message,
ontology=ontology,
context_json=retry_context,
)
raise
except Exception:
self.db.rollback()
raise
self.audit_service.log_action(
actor=user_id or claim.employee_name or "anonymous",
action="expense_claim.draft_upsert",
resource_type="expense_claim",
resource_id=claim.id,
before_json=before_json,
after_json=self._serialize_claim(claim),
request_id=run_id,
)
return {
"message": (
f"{'创建' if is_new_claim else '更新'}报销草稿 {claim.claim_no},当前状态为 draft。"
"请核对识别结果,确认无误后继续提交。"
),
"draft_only": True,
"claim_id": claim.id,
"claim_no": claim.claim_no,
"status": claim.status,
"amount": float(claim.amount),
"invoice_count": int(claim.invoice_count or 0),
}
def _find_target_claim(
self,
*,
ontology: OntologyParseResult,
context_json: dict[str, Any],
review_action: str = "",
association_candidate: ExpenseClaim | None = None,
) -> ExpenseClaim | None:
"""Resolve the claim to update, or None when a new claim should be created.

Precedence: the explicit review action, then the editable claim named by
``draft_claim_id`` in the context, then claim numbers found among the
ontology entities.
"""
if review_action == "create_new_claim_from_documents":
return None
if review_action == "link_to_existing_draft" and association_candidate is not None:
return association_candidate
draft_claim_id = str(context_json.get("draft_claim_id") or "").strip()
if draft_claim_id:
claim = self.db.get(ExpenseClaim, draft_claim_id)
if claim is not None and self._is_editable_claim_status(claim.status):
return claim
return None
claim_codes = [
item.normalized_value
for item in ontology.entities
if item.type == "expense_claim" and item.normalized_value
]
if not claim_codes:
return None
stmt = (
select(ExpenseClaim)
.where(ExpenseClaim.claim_no.in_(claim_codes))
.where(ExpenseClaim.status.in_(EDITABLE_CLAIM_STATUSES))
.limit(1)
)
return self.db.scalar(stmt)
def _find_association_candidate(
self,
*,
ontology: OntologyParseResult,
context_json: dict[str, Any],
user_id: str | None,
employee: Employee | None,
) -> ExpenseClaim | None:
draft_claim_id = str(context_json.get("draft_claim_id") or "").strip()
if draft_claim_id:
claim = self.db.get(ExpenseClaim, draft_claim_id)
if claim is not None and self._is_editable_claim_status(claim.status):
return claim
owner_filters = self._build_draft_owner_filters(
employee=employee,
user_id=user_id,
)
if not owner_filters:
fallback_name = self._resolve_employee_name(
ontology=ontology,
context_json=context_json,
user_id=user_id,
fallback="",
)
if fallback_name:
owner_filters = [ExpenseClaim.employee_name == fallback_name]
if not owner_filters:
return None
stmt = (
select(ExpenseClaim)
.where(ExpenseClaim.status.in_(EDITABLE_CLAIM_STATUSES))
.where(or_(*owner_filters))
.order_by(ExpenseClaim.updated_at.desc(), ExpenseClaim.created_at.desc())
.limit(1)
)
return self.db.scalar(stmt)
def _should_defer_multi_document_association(
self,
*,
context_json: dict[str, Any],
review_action: str,
association_candidate: ExpenseClaim | None,
context_documents: list[dict[str, Any]],
) -> bool:
if association_candidate is None:
return False
if review_action in DOCUMENT_ASSOCIATION_REVIEW_ACTIONS:
return False
document_count = max(
len(context_documents),
len(self._resolve_attachment_names(context_json)),
self._resolve_attachment_count(context_json),
)
return document_count > 1
def _replace_claim_items(
self,
*,
claim: ExpenseClaim,
item_specs: list[dict[str, Any]],
) -> None:
"""Overwrite existing items positionally with ``item_specs`` and delete leftovers.

Existing items are ordered by item date, then creation time, before matching.
"""
existing_items = sorted(
list(claim.items),
key=lambda item: (
item.item_date or date.max,
self._normalize_sort_datetime(item.created_at),
),
)
for index, spec in enumerate(item_specs):
item = existing_items[index] if index < len(existing_items) else None
if item is None:
item = ExpenseClaimItem(claim_id=claim.id)
claim.items.append(item)
self.db.add(item)
item.item_date = spec["item_date"]
item.item_type = spec["item_type"]
item.item_reason = spec["item_reason"]
item.item_location = spec["item_location"]
item.item_amount = spec["item_amount"]
item.invoice_id = (
None
if str(spec.get("item_type") or "").strip() in SYSTEM_GENERATED_ITEM_TYPES
else self._attachment_presentation.merge_reference(item.invoice_id, spec["invoice_id"])
)
for stale_item in existing_items[len(item_specs) :]:
claim.items.remove(stale_item)
self.db.delete(stale_item)
def _append_document_items(
self,
*,
claim: ExpenseClaim,
item_specs: list[dict[str, Any]],
) -> None:
system_specs = [
spec for spec in item_specs if str(spec.get("item_type") or "").strip() in SYSTEM_GENERATED_ITEM_TYPES
]
normal_specs = [
spec for spec in item_specs if str(spec.get("item_type") or "").strip() not in SYSTEM_GENERATED_ITEM_TYPES
]
existing_invoice_ids = {
str(item.invoice_id or "").strip()
for item in claim.items
if str(item.invoice_id or "").strip()
}
existing_invoice_names = {
self._attachment_presentation.resolve_display_name(item.invoice_id)
for item in claim.items
if str(item.invoice_id or "").strip()
}
for spec in normal_specs:
invoice_id = str(spec.get("invoice_id") or "").strip()
invoice_name = self._attachment_presentation.resolve_display_name(invoice_id)
if invoice_id and (invoice_id in existing_invoice_ids or invoice_name in existing_invoice_names):
continue
claim.items.append(
ExpenseClaimItem(
claim_id=claim.id,
item_date=spec["item_date"],
item_type=spec["item_type"],
item_reason=spec["item_reason"],
item_location=spec["item_location"],
item_amount=spec["item_amount"],
invoice_id=spec["invoice_id"],
)
)
self.db.add(claim.items[-1])
if invoice_id:
existing_invoice_ids.add(invoice_id)
existing_invoice_names.add(invoice_name)
if system_specs:
existing_system_items = [
item for item in list(claim.items) if str(item.item_type or "").strip() in SYSTEM_GENERATED_ITEM_TYPES
]
for stale_item in existing_system_items:
claim.items.remove(stale_item)
self.db.delete(stale_item)
for spec in system_specs:
claim.items.append(
ExpenseClaimItem(
claim_id=claim.id,
item_date=spec["item_date"],
item_type=spec["item_type"],
item_reason=spec["item_reason"],
item_location=spec["item_location"],
item_amount=spec["item_amount"],
invoice_id=spec["invoice_id"],
)
)
self.db.add(claim.items[-1])
def _build_duplicate_attachment_block_result(
self,
*,
claim: ExpenseClaim,
document_specs: list[dict[str, Any]],
context_documents: list[dict[str, Any]],
) -> dict[str, Any] | None:
duplicate_matches = self._find_duplicate_attachment_matches(
claim=claim,
document_specs=document_specs,
context_documents=context_documents,
)
if not duplicate_matches:
return None
duplicate_labels = list(
dict.fromkeys(
str(item.get("incoming_label") or item.get("existing_label") or "").strip()
for item in duplicate_matches
if str(item.get("incoming_label") or item.get("existing_label") or "").strip()
)
)
# Join up to three duplicate labels with a list separator so file names do not run together.
duplicate_text = "、".join(duplicate_labels[:3]) or "本次上传票据"
reason = (
f"检测到本次上传的票据与草稿 {claim.claim_no} 中已有票据重复:{duplicate_text}。"
"请重新上传不同的票据后再归集。"
)
return {
"message": reason,
"draft_only": False,
"status": "blocked",
"duplicate_attachment_blocked": True,
"duplicate_invoice_blocked": True,
"submission_blocked": True,
"submission_blocked_reasons": [reason],
"missing_fields": [reason],
"risk_flags": ["duplicate_invoice"],
"duplicate_attachments": duplicate_matches,
"claim_id": claim.id,
"claim_no": claim.claim_no,
"amount": float(claim.amount or Decimal("0.00")),
"invoice_count": int(claim.invoice_count or 0),
}
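
The early return for `pending_association_decision` in the file above depends on a small rule: ask the user only when a candidate draft exists, no association action has been chosen yet, and more than one document was uploaded. A minimal standalone sketch of that rule follows. The contents of `ASSOCIATION_ACTIONS` are an assumption (the real `DOCUMENT_ASSOCIATION_REVIEW_ACTIONS` constant is not shown in this diff), and `should_defer_association` is a hypothetical name.

```python
from __future__ import annotations

# Assumed contents; the real DOCUMENT_ASSOCIATION_REVIEW_ACTIONS is defined elsewhere.
ASSOCIATION_ACTIONS = {"link_to_existing_draft", "create_new_claim_from_documents"}


def should_defer_association(
    *,
    has_candidate: bool,
    review_action: str,
    document_count: int,
) -> bool:
    """Return True when the user must first choose how to file the uploads."""
    if not has_candidate:
        return False  # no existing draft, so there is nothing to choose between
    if review_action in ASSOCIATION_ACTIONS:
        return False  # the user has already made the choice
    return document_count > 1  # a single document never triggers the prompt


deferred = should_defer_association(has_candidate=True, review_action="", document_count=3)
```

Keeping the rule as a pure function of three values makes it straightforward to unit test without a database session.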


@@ -0,0 +1,343 @@
from __future__ import annotations
import json
import re
import shutil
import uuid
from collections import defaultdict
from datetime import UTC, date, datetime, timedelta
from decimal import Decimal, InvalidOperation
from pathlib import Path
from types import SimpleNamespace
from typing import Any
from sqlalchemy import func, or_, select
from sqlalchemy import inspect as sqlalchemy_inspect
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, selectinload
from app.api.deps import CurrentUserContext
from app.core.agent_enums import AgentAssetDomain, AgentAssetStatus, AgentAssetType
from app.models.agent_asset import AgentAsset
from app.models.employee import Employee
from app.models.financial_record import ExpenseClaim, ExpenseClaimItem
from app.schemas.ontology import OntologyEntity, OntologyParseResult
from app.schemas.reimbursement import (
ExpenseClaimItemCreate,
ExpenseClaimItemUpdate,
ExpenseClaimUpdate,
TravelReimbursementCalculatorRequest,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import RISK_RULES_LIBRARY
from app.services.agent_foundation import AgentFoundationService
from app.services.audit import AuditLogService
from app.services.document_intelligence import build_document_insight
from app.services.expense_claim_access_policy import ExpenseClaimAccessPolicy
from app.services.expense_claim_attachment_presentation import ExpenseClaimAttachmentPresentation
from app.services.expense_claim_attachment_storage import ExpenseClaimAttachmentStorage
from app.services.expense_claim_constants import (
EXPENSE_TYPE_LABELS,
MAX_DRAFT_CLAIMS_PER_USER,
EDITABLE_CLAIM_STATUSES,
SYSTEM_GENERATED_ITEM_TYPES,
TRAVEL_DETAIL_ITEM_TYPES,
TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES,
DOCUMENT_TYPE_ITEM_TYPE_MAP,
DOCUMENT_TYPE_SCENE_MAP,
DOCUMENT_FACT_ITEM_TYPES,
ROUTE_DESCRIPTION_ITEM_TYPES,
DOCUMENT_TRIP_DATE_LABELS,
DOCUMENT_TRIP_DATE_REQUIREMENT_LABELS,
DOCUMENT_TRIP_DATE_KEYS,
DOCUMENT_GENERIC_DATE_KEYS,
DOCUMENT_INVOICE_DATE_KEYS,
DOCUMENT_TRIP_DATE_LABEL_TOKENS,
DOCUMENT_GENERIC_DATE_LABEL_TOKENS,
DOCUMENT_INVOICE_DATE_LABEL_TOKENS,
DOCUMENT_ROUTE_FORMAT_PATTERN,
DOCUMENT_ROUTE_TEXT_PATTERN,
DOCUMENT_ROUTE_ORIGIN_LABELS,
DOCUMENT_ROUTE_DESTINATION_LABELS,
GENERIC_ATTACHMENT_BACKFILL_ITEM_TYPES,
LOCATION_REQUIRED_EXPENSE_TYPES,
EXPENSE_SCENE_KEYWORDS,
EXPENSE_TYPE_ALLOWED_DOCUMENT_SCENES,
DOCUMENT_SCENE_LABELS,
DOCUMENT_ASSOCIATION_REVIEW_ACTIONS,
PERSISTENT_EXPENSE_REVIEW_ACTIONS,
RETURN_REASON_OPTIONS,
MAX_CLAIM_NO_RETRY_ATTEMPTS,
DOCUMENT_DATE_PATTERN,
SYSTEM_GENERATED_REASON_PREFIXES,
LEADING_REASON_TIME_PATTERNS,
AI_REVIEW_LOOKBACK_DAYS,
AI_REVIEW_REPEAT_RISK_WARNING_COUNT,
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT,
TRAVEL_REVIEW_RELEVANT_EXPENSE_TYPES,
TRAVEL_REVIEW_LONG_DISTANCE_DOCUMENT_TYPES,
TRAVEL_POLICY_CITY_TIERS,
TRAVEL_POLICY_CITY_MATCH_ORDER,
TRAVEL_POLICY_BAND_LABELS,
TRAVEL_POLICY_HOTEL_LIMITS,
TRAVEL_POLICY_ALLOWED_TRANSPORT_LEVELS,
TRAVEL_POLICY_ROUTE_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_STANDARD_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_FLIGHT_CLASS_PATTERNS,
TRAVEL_POLICY_TRAIN_CLASS_PATTERNS,
TRAVEL_POLICY_HOTEL_NIGHT_PATTERN,
)
from app.services.expense_claim_risk_review import ExpenseClaimRiskReviewMixin
from app.services.expense_amounts import (
extract_amount_candidates,
format_decimal_amount,
is_amount_match_date_fragment,
is_date_like_amount_candidate,
is_probable_year_amount,
parse_document_amount_value,
parse_plain_document_amount_value,
resolve_document_field_amount,
resolve_document_item_amount,
resolve_document_text_amount,
)
from app.services.expense_rule_runtime import (
DEFAULT_SCENE_RULE_ASSET_CODE,
ExpenseRuleRuntimeService,
RuntimeTravelPolicy,
build_default_expense_rule_catalog,
resolve_document_type_label,
)
from app.services.ocr import OcrService
class ExpenseClaimDraftPersistenceMixin:
def _find_duplicate_attachment_matches(
self,
*,
claim: ExpenseClaim,
document_specs: list[dict[str, Any]],
context_documents: list[dict[str, Any]],
) -> list[dict[str, str]]:
existing_tokens: dict[str, dict[str, str]] = {}
for item in list(claim.items or []):
if str(item.item_type or "").strip() in SYSTEM_GENERATED_ITEM_TYPES:
continue
invoice_id = str(item.invoice_id or "").strip()
if not invoice_id:
continue
display_name = self._attachment_presentation.resolve_display_name(invoice_id)
for token in self._build_duplicate_attachment_tokens(invoice_id):
existing_tokens.setdefault(
token,
{
"existing_label": display_name or invoice_id,
"existing_item_id": str(item.id or ""),
"match_type": "filename",
},
)
file_path = self._attachment_storage.resolve_item_path(item)
if file_path is not None and file_path.exists():
metadata = self._attachment_storage.read_meta(file_path)
document_info = metadata.get("document_info")
if isinstance(document_info, dict):
for invoice_key in self._collect_invoice_keys_from_document_info(document_info):
token = self._normalize_duplicate_attachment_token(invoice_key)
if token:
existing_tokens.setdefault(
token,
{
"existing_label": display_name or invoice_id,
"existing_item_id": str(item.id or ""),
"match_type": "invoice_key",
},
)
if not existing_tokens:
return []
document_by_filename = {
str(document.get("filename") or "").strip(): document
for document in context_documents
if isinstance(document, dict) and str(document.get("filename") or "").strip()
}
matches: list[dict[str, str]] = []
seen_tokens: set[str] = set()
for spec in document_specs:
if str(spec.get("item_type") or "").strip() in SYSTEM_GENERATED_ITEM_TYPES:
continue
invoice_id = str(spec.get("invoice_id") or "").strip()
if not invoice_id:
continue
incoming_tokens = self._build_duplicate_attachment_tokens(invoice_id)
document = document_by_filename.get(invoice_id)
if document is not None:
incoming_tokens.extend(
self._normalize_duplicate_attachment_token(invoice_key)
for invoice_key in self._collect_invoice_keys_from_incoming_document(document)
)
for token in incoming_tokens:
if not token or token in seen_tokens or token not in existing_tokens:
continue
seen_tokens.add(token)
existing = existing_tokens[token]
matches.append(
{
"incoming_label": self._attachment_presentation.resolve_display_name(invoice_id) or invoice_id,
"existing_label": existing.get("existing_label", ""),
"existing_item_id": existing.get("existing_item_id", ""),
"match_type": existing.get("match_type", "filename"),
}
)
return matches
@classmethod
def _build_duplicate_attachment_tokens(cls, value: str | None) -> list[str]:
raw = str(value or "").strip()
display_name = ExpenseClaimAttachmentPresentation.resolve_display_name(raw)
candidates = [raw, display_name]
return list(
dict.fromkeys(
token
for token in (cls._normalize_duplicate_attachment_token(candidate) for candidate in candidates)
if token
)
)
@staticmethod
def _normalize_duplicate_attachment_token(value: str | None) -> str:
normalized = Path(str(value or "").strip()).name.lower()
normalized = re.sub(r"\s+", "", normalized)
normalized = re.sub(r"[^\w.\-\u4e00-\u9fff]+", "_", normalized).strip("._")
return normalized
def _upsert_primary_item(
self,
*,
claim: ExpenseClaim,
occurred_at: datetime,
expense_type: str,
amount: Decimal,
reason: str,
location: str,
attachment_names: list[str],
) -> None:
item = claim.items[0] if claim.items else None
if item is None:
item = ExpenseClaimItem(
claim_id=claim.id,
item_date=occurred_at.date(),
item_type=expense_type,
item_reason=reason,
item_location=location,
item_amount=amount,
invoice_id=attachment_names[0] if attachment_names else None,
)
claim.items.append(item)
self.db.add(item)
return
item.item_date = occurred_at.date()
item.item_type = expense_type
item.item_reason = reason
item.item_location = location
item.item_amount = amount
item.invoice_id = (
self._attachment_presentation.merge_reference(item.invoice_id, attachment_names[0])
if attachment_names
else item.invoice_id
)
def _generate_claim_no(self, occurred_at: datetime) -> str:
"""Return the next ``EXP-YYYYMM-NNN`` number for the month of ``occurred_at``.

The value is computed from existing rows and is not reserved, so concurrent
callers can collide; the caller retries on a claim_no IntegrityError.
"""
month_code = occurred_at.strftime("%Y%m")
prefix = f"EXP-{month_code}-"
existing_claim_nos = list(
self.db.scalars(
select(ExpenseClaim.claim_no).where(ExpenseClaim.claim_no.like(f"{prefix}%"))
)
)
max_suffix = 0
for claim_no in existing_claim_nos:
normalized = str(claim_no or "").strip()
if not normalized.startswith(prefix):
continue
suffix = normalized[len(prefix):]
if not suffix.isdigit():
continue
max_suffix = max(max_suffix, int(suffix))
return f"{prefix}{max_suffix + 1:03d}"
@staticmethod
def _resolve_claim_no_retry_count(context_json: dict[str, Any]) -> int:
try:
return max(0, int(context_json.get("_claim_no_retry_count") or 0))
except (TypeError, ValueError):
return 0
@staticmethod
def _is_claim_no_conflict_error(exc: IntegrityError) -> bool:
message = str(exc).lower()
return (
"claim_no" in message
and (
"unique" in message
or "duplicate key" in message
or "ix_expense_claims_claim_no" in message
or "expense_claims.claim_no" in message
)
)
def _count_draft_claims_for_owner(
self,
*,
employee: Employee | None,
user_id: str | None,
) -> int:
owner_filters = self._build_draft_owner_filters(
employee=employee,
user_id=user_id,
)
if not owner_filters:
return 0
stmt = (
select(func.count())
.select_from(ExpenseClaim)
.where(ExpenseClaim.status == "draft")
.where(or_(*owner_filters))
)
return int(self.db.scalar(stmt) or 0)
def _build_draft_owner_filters(
self,
*,
employee: Employee | None,
user_id: str | None,
) -> list[Any]:
conditions: list[Any] = []
seen: set[tuple[str, str]] = set()
def add_condition(field_name: str, value: str | None) -> None:
normalized = str(value or "").strip()
if not normalized or normalized == "待补充":
return
marker = (field_name, normalized.lower())
if marker in seen:
return
seen.add(marker)
if field_name == "employee_id":
conditions.append(ExpenseClaim.employee_id == normalized)
return
conditions.append(ExpenseClaim.employee_name == normalized)
if employee is not None:
add_condition("employee_id", employee.id)
add_condition("employee_name", employee.email)
if self._access_policy.employee_name_is_unique(employee):
add_condition("employee_name", employee.name)
add_condition("employee_name", user_id)
return conditions
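
The numbering scheme in `_generate_claim_no` above can be exercised without a database. The sketch below reproduces the same suffix scan over a plain list; `next_claim_no` is a hypothetical helper name, and the list argument stands in for the rows the real method loads through SQLAlchemy.

```python
from __future__ import annotations

from datetime import datetime


def next_claim_no(occurred_at: datetime, existing_claim_nos: list[str]) -> str:
    """Return the next EXP-YYYYMM-NNN number given the numbers already in use."""
    prefix = f"EXP-{occurred_at.strftime('%Y%m')}-"
    max_suffix = 0
    for claim_no in existing_claim_nos:
        normalized = str(claim_no or "").strip()
        if not normalized.startswith(prefix):
            continue  # belongs to a different month
        suffix = normalized[len(prefix):]
        if suffix.isdigit():  # ignore malformed suffixes such as "abc"
            max_suffix = max(max_suffix, int(suffix))
    # :03d pads to at least three digits and grows naturally past 999.
    return f"{prefix}{max_suffix + 1:03d}"


example = next_claim_no(
    datetime(2026, 5, 21),
    ["EXP-202605-001", "EXP-202605-007", "EXP-202604-099", "EXP-202605-abc"],
)
```

Because the number is derived from a read rather than reserved, two concurrent requests can compute the same value. That is consistent with the retry on `IntegrityError` in `upsert_draft_from_ontology`, which relies on the unique constraint to detect the collision.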


@@ -0,0 +1,7 @@
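from __future__ import annotations


class ExpenseClaimSubmissionBlockedError(ValueError):
    """Raised when a claim cannot be submitted until the listed issues are fixed."""

    def __init__(self, issues: list[str]) -> None:
        # Keep only non-empty, stripped issue descriptions.
        self.issues = [str(issue or "").strip() for issue in issues if str(issue or "").strip()]
        # Message prefix means "Please complete the following before submitting:".
        # Issues are joined with a separator so they stay readable in the message.
        super().__init__("提交前请先补全信息:" + ";".join(self.issues))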
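
One way a caller might use an exception of this shape is to collect every validation problem first and raise once, so the API layer can return the whole list. The sketch below is illustrative only: `SubmissionBlockedError` is a stand-in with the same constructor contract, and `check_before_submit` with its two rules is hypothetical, not part of this commit.

```python
from __future__ import annotations


class SubmissionBlockedError(ValueError):
    """Stand-in with the same shape as ExpenseClaimSubmissionBlockedError."""

    def __init__(self, issues: list[str]) -> None:
        # Drop empty or whitespace-only entries, as the original class does.
        self.issues = [str(issue or "").strip() for issue in issues if str(issue or "").strip()]
        super().__init__("Submission blocked: " + "; ".join(self.issues))


def check_before_submit(amount: float, reason: str) -> None:
    """Hypothetical validator: gather all problems, then raise a single error."""
    issues: list[str] = []
    if amount <= 0:
        issues.append("amount must be greater than zero")
    if not reason.strip():
        issues.append("reason is required")
    if issues:
        raise SubmissionBlockedError(issues)


try:
    check_before_submit(0, "  ")
except SubmissionBlockedError as exc:
    blocked_reasons = exc.issues  # structured list, suitable for a response payload
```

Exposing `issues` as a list alongside the formatted message lets callers render each problem separately instead of parsing the string.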


@@ -0,0 +1,461 @@
from __future__ import annotations
import re
from datetime import UTC, date, datetime, timedelta
from decimal import Decimal
from types import SimpleNamespace
from typing import Any
from sqlalchemy import or_, select
from sqlalchemy import inspect as sqlalchemy_inspect
from app.api.deps import CurrentUserContext
from app.core.agent_enums import AgentAssetDomain, AgentAssetStatus, AgentAssetType
from app.models.agent_asset import AgentAsset
from app.models.financial_record import ExpenseClaim, ExpenseClaimItem
from app.schemas.reimbursement import TravelReimbursementCalculatorRequest
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import RISK_RULES_LIBRARY
from app.services.expense_claim_constants import (
AI_REVIEW_LOOKBACK_DAYS,
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT,
AI_REVIEW_REPEAT_RISK_WARNING_COUNT,
DOCUMENT_FACT_ITEM_TYPES,
LOCATION_REQUIRED_EXPENSE_TYPES,
SYSTEM_GENERATED_ITEM_TYPES,
TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES,
TRAVEL_POLICY_HOTEL_NIGHT_PATTERN,
)
from app.services.expense_rule_runtime import (
ExpenseRuleRuntimeService,
RuntimeTravelPolicy,
build_default_expense_rule_catalog,
)
class ExpenseClaimItemSyncMixin:
def _sync_travel_allowance_item(self, claim: ExpenseClaim) -> None:
items = list(claim.items or [])
allowance_items = [
item for item in items if str(item.item_type or "").strip().lower() == "travel_allowance"
]
business_items = [
item for item in items if str(item.item_type or "").strip().lower() != "travel_allowance"
]
business_types = {str(item.item_type or "").strip().lower() for item in business_items}
is_travel_claim = str(claim.expense_type or "").strip().lower() == "travel"
has_travel_detail = bool(business_types & TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES)
if not is_travel_claim and not has_travel_detail:
for item in allowance_items:
self._discard_claim_item(claim, item)
return
grade = str(claim.employee_grade or "").strip()
if not grade:
return
allowance_location = self._resolve_travel_allowance_location_from_claim(
claim=claim,
business_items=business_items,
)
if not allowance_location:
return
existing_allowance = allowance_items[0] if allowance_items else None
days, start_date, end_date = self._resolve_travel_allowance_days_from_claim(
claim=claim,
business_items=business_items,
existing_allowance=existing_allowance,
)
if days < 1:
return
try:
from app.services.travel_reimbursement_calculator import (
TravelReimbursementCalculatorService,
)
result = TravelReimbursementCalculatorService(self.db).calculate(
TravelReimbursementCalculatorRequest(
days=days,
location=allowance_location,
grade=grade,
),
CurrentUserContext(
username=str(claim.employee_id or claim.employee_name or "system"),
name=str(claim.employee_name or ""),
role_codes=[],
is_admin=False,
),
)
except ValueError:
return
allowance_amount = Decimal(result.allowance_amount or Decimal("0.00")).quantize(Decimal("0.01"))
allowance_rate = Decimal(result.total_allowance_rate or Decimal("0.00")).quantize(Decimal("0.01"))
if allowance_amount <= Decimal("0.00") or allowance_rate <= Decimal("0.00"):
return
item = existing_allowance
if item is None:
item = ExpenseClaimItem(claim_id=claim.id)
claim.items.append(item)
self.db.add(item)
for duplicate in allowance_items[1:]:
self._discard_claim_item(claim, duplicate)
item.item_date = end_date
item.item_type = "travel_allowance"
item.item_reason = (
f"系统自动计算出差补贴:{result.matched_city}{days}天,"
f"{allowance_rate:.2f}元/天"
)
item.item_location = str(result.allowance_region or allowance_location).strip()
item.item_amount = allowance_amount
item.invoice_id = None
def _discard_claim_item(self, claim: ExpenseClaim, item: ExpenseClaimItem) -> None:
if item in claim.items:
claim.items.remove(item)
state = sqlalchemy_inspect(item)
if state.persistent:
self.db.delete(item)
elif state.pending:
self.db.expunge(item)
def _resolve_travel_allowance_days_from_claim(
self,
*,
claim: ExpenseClaim,
business_items: list[ExpenseClaimItem],
existing_allowance: ExpenseClaimItem | None,
) -> tuple[int, date, date]:
dated_items = sorted(
[item.item_date for item in business_items if item.item_date is not None]
)
if dated_items:
start_date = dated_items[0]
end_date = dated_items[-1]
elif claim.occurred_at is not None:
start_date = claim.occurred_at.date()
end_date = start_date
else:
start_date = date.today()
end_date = start_date
days = (end_date - start_date).days + 1
explicit_days = max(
(self._extract_travel_day_count(item.item_reason) for item in business_items),
default=0,
)
if explicit_days > 0:
days = explicit_days
end_date = start_date + timedelta(days=days - 1)
return max(1, days), start_date, end_date
existing_days = self._extract_travel_allowance_days(existing_allowance)
unique_dates = {value for value in dated_items}
if existing_days > days and len(unique_dates) <= 1:
days = existing_days
end_date = start_date + timedelta(days=days - 1)
return max(1, days), start_date, end_date
@staticmethod
def _extract_travel_allowance_days(item: ExpenseClaimItem | None) -> int:
if item is None:
return 0
match = re.search(r"(\d+)\s*天", str(item.item_reason or ""))
if not match:
return 0
try:
return max(0, int(match.group(1)))
except ValueError:
return 0
def _resolve_travel_allowance_location_from_claim(
self,
*,
claim: ExpenseClaim,
business_items: list[ExpenseClaimItem],
) -> str:
claim_location = str(claim.location or "").strip()
if claim_location and claim_location not in {"待补充", "未知", "暂无", "非必填"}:
return claim_location
sorted_items = sorted(
business_items,
key=lambda item: (item.item_date or date.max, self._normalize_sort_datetime(item.created_at)),
)
for item in sorted_items:
location = str(item.item_location or "").strip()
if location and location not in {"待补充", "未知", "暂无", "非必填"}:
return location
reason = str(item.item_reason or "").strip()
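# Check "->" before "-" so an arrow is not split on its hyphen. Empty-string
# separators are excluded: "" is contained in every string and str.split("") raises ValueError.
for separator in ("->", "-"):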
if separator in reason:
destination = reason.split(separator)[-1].strip()
if destination:
return destination
return ""
def _sync_claim_from_items(self, claim: ExpenseClaim) -> None:
"""Recompute claim-level fields from the claim's items.

Refreshes the travel allowance item first, then derives amount, invoice
count, occurrence date, expense type, reason, location, and risk flags.
"""
self._sync_travel_allowance_item(claim)
if not claim.items:
claim.amount = Decimal("0.00")
claim.invoice_count = 0
claim.risk_flags_json = self._merge_claim_attachment_risk_flags(claim, [])
return
ordered_items = sorted(
claim.items,
key=lambda item: (
item.item_date or date.max,
self._normalize_sort_datetime(item.created_at),
),
)
primary_item = ordered_items[0]
total_amount = sum((item.item_amount for item in ordered_items), Decimal("0.00"))
claim.amount = total_amount.quantize(Decimal("0.01"))
claim.invoice_count = sum(1 for item in ordered_items if str(item.invoice_id or "").strip())
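# The sort key above tolerates a missing item_date, so guard before reading it.
if primary_item.item_date is not None:
    claim.occurred_at = datetime(
        primary_item.item_date.year,
        primary_item.item_date.month,
        primary_item.item_date.day,
        tzinfo=UTC,
    )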
claim.expense_type = self._resolve_claim_expense_type_from_items(
ordered_items,
fallback=str(primary_item.item_type or claim.expense_type or "other").strip() or "other",
)
primary_item_type = str(primary_item.item_type or "").strip()
if primary_item_type not in DOCUMENT_FACT_ITEM_TYPES:
claim.reason = (
self._normalize_optional_text(primary_item.item_reason, fallback=claim.reason or "待补充")
or "待补充"
)
claim.location = (
self._normalize_optional_text(primary_item.item_location, fallback=claim.location or "待补充")
or "待补充"
)
claim.risk_flags_json = self._merge_claim_attachment_risk_flags(
claim,
self._build_claim_attachment_risk_flags(ordered_items),
)
if str(claim.status or "").strip().lower() == "draft":
claim.approval_stage = "待提交"
@staticmethod
def _resolve_claim_expense_type_from_items(
items: list[ExpenseClaimItem],
*,
fallback: str,
) -> str:
fallback_type = str(fallback or "").strip() or "other"
item_types = {str(item.item_type or "").strip().lower() for item in items}
if item_types & (TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES | {"travel_allowance"}):
return "travel"
return fallback_type
def _refresh_item_attachment_analysis(self, item: ExpenseClaimItem) -> None:
file_path = self._attachment_storage.resolve_path(item.invoice_id)
if file_path is None or not file_path.exists():
return
metadata = self._attachment_storage.read_meta(file_path)
media_type = str(metadata.get("media_type") or self._attachment_presentation.resolve_media_type(file_path.name)).strip()
ocr_status = str(metadata.get("ocr_status") or "").strip().lower()
if ocr_status == "failed":
analysis = self._build_failed_ocr_attachment_analysis(
media_type=media_type,
error_message=str(metadata.get("ocr_error") or ""),
item=item,
)
elif ocr_status == "recognized" or any(
(
str(metadata.get("ocr_text") or "").strip(),
str(metadata.get("ocr_summary") or "").strip(),
int(metadata.get("ocr_line_count") or 0),
list(metadata.get("ocr_warnings") or []),
)
):
stored_document_info = metadata.get("document_info")
if not isinstance(stored_document_info, dict):
stored_document_info = {}
document = SimpleNamespace(
filename=str(metadata.get("file_name") or file_path.name),
text=str(metadata.get("ocr_text") or ""),
summary=str(metadata.get("ocr_summary") or ""),
avg_score=float(metadata.get("ocr_avg_score") or 0.0),
line_count=int(metadata.get("ocr_line_count") or 0),
document_type=str(stored_document_info.get("document_type") or ""),
document_type_label=str(stored_document_info.get("document_type_label") or ""),
scene_code=str(stored_document_info.get("scene_code") or ""),
scene_label=str(stored_document_info.get("scene_label") or ""),
document_fields=list(stored_document_info.get("fields") or []),
warnings=[str(value) for value in list(metadata.get("ocr_warnings") or []) if str(value).strip()],
)
document_info = self._build_attachment_document_info(document)
requirement_check = self._build_attachment_requirement_check(
item=item,
document_info=document_info,
)
analysis = self._build_attachment_analysis(
document=document,
item=item,
claim=getattr(item, "claim", None),
document_info=document_info,
requirement_check=requirement_check,
)
metadata["document_info"] = document_info
metadata["requirement_check"] = requirement_check
else:
analysis = self._build_fallback_attachment_analysis(media_type=media_type, item=item)
metadata["analysis"] = analysis
self._attachment_storage.write_meta(file_path, metadata)
    def _build_claim_attachment_risk_flags(
        self, ordered_items: list[ExpenseClaimItem]
    ) -> list[dict[str, Any]]:
        derived_flags: list[dict[str, Any]] = []
        for index, item in enumerate(ordered_items, start=1):
            file_path = self._attachment_storage.resolve_path(item.invoice_id)
            if file_path is None or not file_path.exists():
                continue
            metadata = self._attachment_storage.read_meta(file_path)
            analysis = metadata.get("analysis")
            if not isinstance(analysis, dict):
                continue
            severity = str(analysis.get("severity") or "").strip().lower()
            # Only medium and high severity findings become claim-level flags.
            if severity in {"", "pass", "low"}:
                continue
            summary = (
                str(analysis.get("summary") or analysis.get("headline") or "").strip()
                or "附件存在待核对风险。"
            )
            points = [
                str(point or "").strip()
                for point in list(analysis.get("points") or [])
                if str(point or "").strip()
            ]
            # Join the first three points with a separator so they stay readable.
            message_detail = ";".join(points[:3]) if points else summary
            label = str(
                analysis.get("label") or ("高风险" if severity == "high" else "中风险")
            ).strip()
            derived_flags.append(
                {
                    "source": "attachment_analysis",
                    "item_id": item.id,
                    "severity": severity,
                    "label": label,
                    "message": f"费用明细第 {index} 条:{message_detail}",
                    "summary": summary,
                    "points": points,
                }
            )
        return derived_flags
def _get_expense_rule_catalog(self) -> Any:
cached = getattr(self, "_expense_rule_catalog", None)
if cached is not None:
return cached
db = getattr(self, "db", None)
if db is None:
catalog = build_default_expense_rule_catalog()
else:
catalog = ExpenseRuleRuntimeService(db).load_catalog()
setattr(self, "_expense_rule_catalog", catalog)
return catalog
def _get_expense_scene_policy(self, expense_type: str | None) -> Any | None:
return self._get_expense_rule_catalog().get_scene_policy(expense_type)
def _resolve_min_attachment_count(self, expense_type: str | None) -> int:
policy = self._get_expense_scene_policy(expense_type)
if policy is None:
return 1
return max(0, int(policy.min_attachment_count or 0))
def _build_scene_reason_corpus(self, claim: ExpenseClaim) -> str:
parts = [str(claim.reason or "").strip(), str(claim.location or "").strip()]
for item in claim.items:
parts.append(str(item.item_reason or "").strip())
parts.append(str(item.item_location or "").strip())
return "\n".join(part for part in parts if part)
@staticmethod
def _merge_claim_attachment_risk_flags(
claim: ExpenseClaim,
attachment_risk_flags: list[dict[str, Any]],
) -> list[Any]:
preserved_flags = [
flag
for flag in list(claim.risk_flags_json or [])
if not (isinstance(flag, dict) and str(flag.get("source") or "").strip() == "attachment_analysis")
]
return preserved_flags + attachment_risk_flags
    @staticmethod
    def _format_submission_blocked_message(issues: list[str]) -> str:
        normalized_issues = [str(issue or "").strip() for issue in issues if str(issue or "").strip()]
        if not normalized_issues:
            return "AI预审未通过,但没有返回明确原因,请刷新草稿后重试。"
        # One numbered line per blocking issue.
        return "AI预审暂未通过,原因如下:\n" + "\n".join(
            f"{index}. {issue}" for index, issue in enumerate(normalized_issues, start=1)
        )
def _validate_claim_for_submission(self, claim: ExpenseClaim) -> list[str]:
issues: list[str] = []
claim_location_required = self._is_location_required_expense_type(claim.expense_type)
claim_min_attachment_count = self._resolve_min_attachment_count(claim.expense_type)
if self._is_missing_value(claim.employee_name):
issues.append("申请人未完善")
if self._is_missing_value(claim.department_name):
issues.append("所属部门未完善")
if self._is_missing_value(claim.expense_type):
issues.append("报销类型未完善")
if self._is_missing_value(claim.reason):
issues.append("报销事由未完善")
if claim_location_required and self._is_missing_value(claim.location):
issues.append("业务地点未完善")
if claim.amount is None or claim.amount <= Decimal("0.00"):
issues.append("报销金额未完善")
if claim.occurred_at is None:
issues.append("发生时间未完善")
if int(claim.invoice_count or 0) < claim_min_attachment_count:
issues.append("票据附件数量不足")
if not claim.items:
issues.append("费用明细不能为空")
for index, item in enumerate(claim.items, start=1):
prefix = f"费用明细第 {index}"
is_system_generated = str(item.item_type or "").strip().lower() in SYSTEM_GENERATED_ITEM_TYPES
item_location_required = self._is_location_required_expense_type(item.item_type or claim.expense_type)
if item.item_date is None:
issues.append(f"{prefix}缺少日期")
if self._is_missing_value(item.item_type):
issues.append(f"{prefix}缺少费用项目")
if self._is_missing_value(item.item_reason):
issues.append(f"{prefix}缺少说明")
if item_location_required and self._is_missing_value(item.item_location):
issues.append(f"{prefix}缺少地点")
if item.item_amount is None or item.item_amount <= Decimal("0.00"):
issues.append(f"{prefix}缺少金额")
if not is_system_generated and self._is_missing_value(item.invoice_id):
issues.append(f"{prefix}缺少票据标识")
return issues
def _is_location_required_expense_type(self, expense_type: str | None) -> bool:
policy = self._get_expense_scene_policy(expense_type)
if policy is None:
return str(expense_type or "").strip().lower() in LOCATION_REQUIRED_EXPENSE_TYPES
return bool(policy.location_required)
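
The merge step in `_merge_claim_attachment_risk_flags` can be read as a replace-by-source pattern: flags whose `source` is `attachment_analysis` are dropped, everything else is kept, and the freshly derived flags are appended, so re-running the sync should not accumulate duplicates. The following is a minimal standalone sketch of that idea, assuming plain lists and dicts instead of the ORM model. The function name `merge_attachment_risk_flags` and the sample flags are hypothetical and not part of the commit.

```python
from __future__ import annotations

from typing import Any

ATTACHMENT_SOURCE = "attachment_analysis"


def merge_attachment_risk_flags(
    existing: list[Any] | None,
    derived: list[dict[str, Any]],
) -> list[Any]:
    """Drop stale attachment-derived flags, keep the rest, append the fresh ones."""
    preserved = [
        flag
        for flag in list(existing or [])
        if not (
            isinstance(flag, dict)
            and str(flag.get("source") or "").strip() == ATTACHMENT_SOURCE
        )
    ]
    return preserved + derived


# Hypothetical sample data: one manual flag, one stale attachment flag,
# and one legacy non-dict flag that must survive untouched.
existing = [
    {"source": "manual_review", "label": "needs receipt"},
    {"source": "attachment_analysis", "label": "stale"},
    "legacy-string-flag",
]
fresh = [{"source": "attachment_analysis", "label": "high risk", "severity": "high"}]
merged = merge_attachment_risk_flags(existing, fresh)
```

Because only entries tagged with the attachment source are replaced, calling the function again with the same derived flags leaves the result unchanged.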


@@ -0,0 +1,392 @@
from __future__ import annotations
import json
import re
import shutil
import uuid
from collections import defaultdict
from datetime import UTC, date, datetime, timedelta
from decimal import Decimal, InvalidOperation
from pathlib import Path
from types import SimpleNamespace
from typing import Any
from sqlalchemy import func, or_, select
from sqlalchemy import inspect as sqlalchemy_inspect
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, selectinload
from app.api.deps import CurrentUserContext
from app.core.agent_enums import AgentAssetDomain, AgentAssetStatus, AgentAssetType
from app.models.agent_asset import AgentAsset
from app.models.employee import Employee
from app.models.financial_record import ExpenseClaim, ExpenseClaimItem
from app.schemas.ontology import OntologyEntity, OntologyParseResult
from app.schemas.reimbursement import (
ExpenseClaimItemCreate,
ExpenseClaimItemUpdate,
ExpenseClaimUpdate,
TravelReimbursementCalculatorRequest,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import RISK_RULES_LIBRARY
from app.services.agent_foundation import AgentFoundationService
from app.services.audit import AuditLogService
from app.services.document_intelligence import build_document_insight
from app.services.expense_claim_access_policy import ExpenseClaimAccessPolicy
from app.services.expense_claim_attachment_presentation import ExpenseClaimAttachmentPresentation
from app.services.expense_claim_attachment_storage import ExpenseClaimAttachmentStorage
from app.services.expense_claim_constants import (
EXPENSE_TYPE_LABELS,
MAX_DRAFT_CLAIMS_PER_USER,
EDITABLE_CLAIM_STATUSES,
SYSTEM_GENERATED_ITEM_TYPES,
TRAVEL_DETAIL_ITEM_TYPES,
TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES,
DOCUMENT_TYPE_ITEM_TYPE_MAP,
DOCUMENT_TYPE_SCENE_MAP,
DOCUMENT_FACT_ITEM_TYPES,
ROUTE_DESCRIPTION_ITEM_TYPES,
DOCUMENT_TRIP_DATE_LABELS,
DOCUMENT_TRIP_DATE_REQUIREMENT_LABELS,
DOCUMENT_TRIP_DATE_KEYS,
DOCUMENT_GENERIC_DATE_KEYS,
DOCUMENT_INVOICE_DATE_KEYS,
DOCUMENT_TRIP_DATE_LABEL_TOKENS,
DOCUMENT_GENERIC_DATE_LABEL_TOKENS,
DOCUMENT_INVOICE_DATE_LABEL_TOKENS,
DOCUMENT_ROUTE_FORMAT_PATTERN,
DOCUMENT_ROUTE_TEXT_PATTERN,
DOCUMENT_ROUTE_ORIGIN_LABELS,
DOCUMENT_ROUTE_DESTINATION_LABELS,
GENERIC_ATTACHMENT_BACKFILL_ITEM_TYPES,
LOCATION_REQUIRED_EXPENSE_TYPES,
EXPENSE_SCENE_KEYWORDS,
EXPENSE_TYPE_ALLOWED_DOCUMENT_SCENES,
DOCUMENT_SCENE_LABELS,
DOCUMENT_ASSOCIATION_REVIEW_ACTIONS,
PERSISTENT_EXPENSE_REVIEW_ACTIONS,
RETURN_REASON_OPTIONS,
MAX_CLAIM_NO_RETRY_ATTEMPTS,
DOCUMENT_DATE_PATTERN,
SYSTEM_GENERATED_REASON_PREFIXES,
LEADING_REASON_TIME_PATTERNS,
AI_REVIEW_LOOKBACK_DAYS,
AI_REVIEW_REPEAT_RISK_WARNING_COUNT,
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT,
TRAVEL_REVIEW_RELEVANT_EXPENSE_TYPES,
TRAVEL_REVIEW_LONG_DISTANCE_DOCUMENT_TYPES,
TRAVEL_POLICY_CITY_TIERS,
TRAVEL_POLICY_CITY_MATCH_ORDER,
TRAVEL_POLICY_BAND_LABELS,
TRAVEL_POLICY_HOTEL_LIMITS,
TRAVEL_POLICY_ALLOWED_TRANSPORT_LEVELS,
TRAVEL_POLICY_ROUTE_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_STANDARD_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_FLIGHT_CLASS_PATTERNS,
TRAVEL_POLICY_TRAIN_CLASS_PATTERNS,
TRAVEL_POLICY_HOTEL_NIGHT_PATTERN,
)
from app.services.expense_claim_risk_review import ExpenseClaimRiskReviewMixin
from app.services.expense_amounts import (
extract_amount_candidates,
format_decimal_amount,
is_amount_match_date_fragment,
is_date_like_amount_candidate,
is_probable_year_amount,
parse_document_amount_value,
parse_plain_document_amount_value,
resolve_document_field_amount,
resolve_document_item_amount,
resolve_document_text_amount,
)
from app.services.expense_rule_runtime import (
DEFAULT_SCENE_RULE_ASSET_CODE,
ExpenseRuleRuntimeService,
RuntimeTravelPolicy,
build_default_expense_rule_catalog,
resolve_document_type_label,
)
from app.services.ocr import OcrService
class ExpenseClaimOntologyResolverMixin:
def _resolve_employee(
self,
*,
ontology: OntologyParseResult,
context_json: dict[str, Any],
user_id: str | None,
) -> Employee | None:
normalized_user_id = str(user_id or "").strip()
if normalized_user_id:
stmt = (
select(Employee)
.options(selectinload(Employee.organization_unit), selectinload(Employee.manager))
.where(func.lower(Employee.email) == normalized_user_id.lower())
.limit(1)
)
employee = self.db.scalar(stmt)
if employee is not None:
return employee
employee_name = self._resolve_employee_name(
ontology=ontology,
context_json=context_json,
user_id=None,
)
if not employee_name:
return None
stmt = (
select(Employee)
.options(selectinload(Employee.organization_unit), selectinload(Employee.manager))
.where(Employee.name == employee_name)
.limit(1)
)
return self.db.scalar(stmt)
@staticmethod
def _resolve_employee_name(
*,
ontology: OntologyParseResult,
context_json: dict[str, Any],
user_id: str | None,
fallback: str = "待补充",
) -> str:
review_form_values = context_json.get("review_form_values")
if isinstance(review_form_values, dict):
for key in ("reporter_name", "employee_name", "claimant_name"):
value = str(review_form_values.get(key) or "").strip()
if value:
return value
for item in ontology.entities:
if item.type == "employee" and item.value.strip():
return item.value.strip()
for key in ("name", "user_name", "employee_name"):
value = str(context_json.get(key) or "").strip()
if value:
return value
return str(user_id or fallback).strip() or fallback
@staticmethod
def _resolve_department_name(
*,
employee: Employee | None,
context_json: dict[str, Any],
fallback: str = "待补充",
) -> str:
if employee is not None and employee.organization_unit is not None:
return employee.organization_unit.name
request_context = context_json.get("request_context")
if isinstance(request_context, dict):
for key in ("department", "department_name", "deptName"):
value = str(request_context.get(key) or "").strip()
if value:
return value
for key in ("department_name", "department"):
value = str(context_json.get(key) or "").strip()
if value:
return value
return fallback
@staticmethod
def _resolve_project_code(entities: list[OntologyEntity]) -> str | None:
for item in entities:
if item.type == "project" and item.normalized_value.strip():
return item.normalized_value.strip()
return None
@staticmethod
def _resolve_explicit_review_expense_type(context_json: dict[str, Any]) -> str | None:
review_form_values = context_json.get("review_form_values")
if isinstance(review_form_values, dict):
compact = str(
review_form_values.get("expense_type")
or review_form_values.get("reimbursement_type")
or ""
).replace(" ", "")
if compact:
if "招待" in compact or ("客户" in compact and any(word in compact for word in ("吃饭", "宴请", "请客", "用餐"))):
return "entertainment"
if any(word in compact for word in ("差旅", "出差", "机票", "行程")):
return "travel"
if any(word in compact for word in ("住宿", "酒店", "宾馆")):
return "hotel"
if any(word in compact for word in ("交通", "打车", "网约车", "出租车", "乘车", "用车", "叫车", "车费", "车资", "的士", "停车")):
return "transport"
if any(word in compact for word in ("餐费", "用餐", "午餐", "晚餐", "早餐", "伙食")):
return "meal"
if "会务" in compact:
return "meeting"
if any(word in compact for word in ("办公费", "办公用品", "文具", "耗材", "办公耗材", "打印纸", "办公设备", "键盘", "鼠标", "白板")):
return "office"
if any(word in compact for word in ("培训费", "培训", "讲师费", "课时费", "课程费")):
return "training"
if any(word in compact for word in ("通讯费", "话费", "流量费", "宽带费")):
return "communication"
if any(word in compact for word in ("福利费", "团建", "慰问", "节日福利", "体检费")):
return "welfare"
return None
@staticmethod
def _resolve_expense_type(
entities: list[OntologyEntity],
*,
context_json: dict[str, Any],
) -> str | None:
explicit_expense_type = ExpenseClaimOntologyResolverMixin._resolve_explicit_review_expense_type(context_json)
if explicit_expense_type:
return explicit_expense_type
for item in entities:
if item.type == "expense_type":
normalized = item.normalized_value.strip()
if normalized:
return normalized
return None
@staticmethod
def _resolve_reason(
*,
message: str,
context_json: dict[str, Any],
allow_message_fallback: bool,
) -> str | None:
review_form_values = context_json.get("review_form_values")
if isinstance(review_form_values, dict):
for key in ("reason", "business_reason"):
value = str(review_form_values.get(key) or "").strip()
if value:
return ExpenseClaimOntologyResolverMixin._strip_leading_time_from_reason(value)
explicit_text = context_json.get("user_input_text")
if isinstance(explicit_text, str):
normalized_explicit_text = explicit_text.strip()
if normalized_explicit_text:
return ExpenseClaimOntologyResolverMixin._strip_leading_time_from_reason(normalized_explicit_text)[:500] or None
return None
request_context = context_json.get("request_context")
if (
isinstance(request_context, dict)
and str(context_json.get("entry_source") or "").strip() == "detail"
):
for key in ("reason", "title"):
value = str(request_context.get(key) or "").strip()
if value:
return value
if not allow_message_fallback:
return None
normalized_message = str(message or "").strip()
compact_message = re.sub(r"\s+", "", normalized_message)
if compact_message.startswith(SYSTEM_GENERATED_REASON_PREFIXES):
return None
return ExpenseClaimOntologyResolverMixin._strip_leading_time_from_reason(normalized_message)[:500] or None
@staticmethod
def _strip_leading_time_from_reason(value: str) -> str:
reason = str(value or "").strip()
for pattern in LEADING_REASON_TIME_PATTERNS:
next_reason = pattern.sub("", reason).strip()
if next_reason != reason:
return next_reason
return reason
@staticmethod
def _resolve_location(*, message: str, context_json: dict[str, Any]) -> str | None:
review_form_values = context_json.get("review_form_values")
if isinstance(review_form_values, dict):
for key in ("business_location", "location"):
value = str(review_form_values.get(key) or "").strip()
if value:
return value
request_context = context_json.get("request_context")
if (
isinstance(request_context, dict)
and str(context_json.get("entry_source") or "").strip() == "detail"
):
for key in ("city", "location"):
value = str(request_context.get(key) or "").strip()
if value:
return value
compact = str(message or "").replace(" ", "")
city_match = re.search(
r"去(?P<city>[\u4e00-\u9fa5]{2,8}?)(?:出差|拜访|参会|见客户|客户现场|支撑|支持|部署|实施|处理|协助)",
compact,
)
if city_match:
return city_match.group("city").strip()
if "客户现场" in compact:
return "客户现场"
return None
@staticmethod
def _resolve_occurred_at(
ontology: OntologyParseResult,
*,
context_json: dict[str, Any],
) -> datetime | None:
review_form_values = context_json.get("review_form_values")
if isinstance(review_form_values, dict):
for key in ("occurred_date", "time_range", "business_time"):
value = str(review_form_values.get(key) or "").strip()
if not value:
continue
try:
parsed = date.fromisoformat(value)
return datetime(parsed.year, parsed.month, parsed.day, tzinfo=UTC)
except ValueError:
continue
start_date = ontology.time_range.start_date
if start_date:
try:
parsed = date.fromisoformat(start_date)
return datetime(parsed.year, parsed.month, parsed.day, tzinfo=UTC)
except ValueError:
pass
return None
    @staticmethod
    def _resolve_amount(
        entities: list[OntologyEntity],
        *,
        context_json: dict[str, Any],
    ) -> Decimal | None:
        review_form_values = context_json.get("review_form_values")
        if isinstance(review_form_values, dict):
            raw_value = str(review_form_values.get("amount") or "").strip()
            if raw_value:
                # Assumption: the original replace("", "") was a no-op whose first
                # argument was lost; strip common currency markers and separators.
                compact = re.sub(r"[¥¥元,,\s]", "", raw_value)
                try:
                    return Decimal(compact).quantize(Decimal("0.01"))
                except (InvalidOperation, ValueError):
                    pass
        # Fall back to the first parsed amount entity that is not a threshold.
        for item in entities:
            if item.type != "amount" or item.role == "threshold":
                continue
            try:
                return Decimal(item.normalized_value).quantize(Decimal("0.01"))
            except (InvalidOperation, ValueError):
                continue
        return None
@staticmethod
def _resolve_attachment_names(context_json: dict[str, Any]) -> list[str]:
names = context_json.get("attachment_names")
if not isinstance(names, list):
return []
return [str(name).strip() for name in names if str(name).strip()]
def _resolve_attachment_count(self, context_json: dict[str, Any]) -> int:
names = self._resolve_attachment_names(context_json)
if names:
return len(names)
try:
return max(0, int(context_json.get("attachment_count") or 0))
except (TypeError, ValueError):
return 0
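
The amount handling in `_resolve_amount` relies on `Decimal(...).quantize(Decimal("0.01"))` to normalise user input to two decimal places. The sketch below shows that normalisation in isolation. It is an illustration under stated assumptions, not the service code: the helper name `parse_form_amount` is hypothetical, it only strips ASCII thousands separators, and it rejects non-finite values such as `NaN`, which the original does not check.

```python
from __future__ import annotations

from decimal import Decimal, InvalidOperation


def parse_form_amount(raw: object) -> Decimal | None:
    """Parse a form value into a two-decimal Decimal, or None if unusable."""
    text = str(raw or "").strip()
    if not text:
        return None
    # Assumption: only ASCII commas are used as thousands separators.
    compact = text.replace(",", "")
    try:
        amount = Decimal(compact)
    except InvalidOperation:
        return None
    if not amount.is_finite():
        # Decimal accepts "NaN" and "Infinity"; neither is a valid amount.
        return None
    # quantize pads or rounds to exactly two decimal places.
    return amount.quantize(Decimal("0.01"))


padded = parse_form_amount("1,234.5")
invalid = parse_form_amount("abc")
```

Using `Decimal` rather than `float` keeps currency arithmetic exact, which is why the comparison against `Decimal("0.00")` in the submission validation is safe.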


@@ -0,0 +1,733 @@
from __future__ import annotations
import re
from datetime import UTC, date, datetime, timedelta
from decimal import Decimal
from types import SimpleNamespace
from typing import Any
from sqlalchemy import or_, select
from sqlalchemy import inspect as sqlalchemy_inspect
from app.api.deps import CurrentUserContext
from app.core.agent_enums import AgentAssetDomain, AgentAssetStatus, AgentAssetType
from app.models.agent_asset import AgentAsset
from app.models.financial_record import ExpenseClaim, ExpenseClaimItem
from app.schemas.reimbursement import TravelReimbursementCalculatorRequest
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import RISK_RULES_LIBRARY
from app.services.expense_claim_constants import (
AI_REVIEW_LOOKBACK_DAYS,
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT,
AI_REVIEW_REPEAT_RISK_WARNING_COUNT,
DOCUMENT_FACT_ITEM_TYPES,
LOCATION_REQUIRED_EXPENSE_TYPES,
SYSTEM_GENERATED_ITEM_TYPES,
TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES,
TRAVEL_POLICY_HOTEL_NIGHT_PATTERN,
)
from app.services.expense_rule_runtime import (
ExpenseRuleRuntimeService,
RuntimeTravelPolicy,
build_default_expense_rule_catalog,
)
class ExpenseClaimPlatformRiskMixin:
def evaluate_platform_risk_rules(
self,
claim: ExpenseClaim,
*,
rule_codes: list[str] | None = None,
) -> dict[str, list[Any]]:
manifests = self._load_platform_risk_rule_manifests(rule_codes=rule_codes)
if not manifests:
return {"flags": [], "blocking_reasons": []}
contexts = self._build_claim_attachment_contexts(claim)
flags: list[dict[str, Any]] = []
blocking_reasons: list[str] = []
for manifest in manifests:
if not self._risk_manifest_applies_to_claim(manifest, claim=claim, contexts=contexts):
continue
flag = self._evaluate_platform_risk_manifest(
manifest,
claim=claim,
contexts=contexts,
)
if flag is None:
continue
flags.append(flag)
severity = str(flag.get("severity") or "").strip().lower()
action = str(flag.get("action") or "").strip().lower()
if severity == "high" or action == "block":
blocking_reasons.append(str(flag.get("message") or flag.get("label") or "").strip())
deduplicated_reasons = list(
dict.fromkeys(reason for reason in blocking_reasons if reason)
)
return {"flags": flags, "blocking_reasons": deduplicated_reasons}
def _load_platform_risk_rule_manifests(
self,
*,
rule_codes: list[str] | None,
) -> list[dict[str, Any]]:
code_filter = {
str(code or "").strip()
for code in list(rule_codes or [])
if str(code or "").strip()
}
manifests_by_code: dict[str, dict[str, Any]] = {}
assets = list(
self.db.scalars(
select(AgentAsset)
.where(AgentAsset.asset_type == AgentAssetType.RULE.value)
.where(AgentAsset.status == AgentAssetStatus.ACTIVE.value)
.where(AgentAsset.domain == AgentAssetDomain.EXPENSE.value)
.order_by(AgentAsset.updated_at.desc(), AgentAsset.created_at.desc())
).all()
)
library_manager = AgentAssetRuleLibraryManager()
for asset in assets:
config_json = asset.config_json if isinstance(asset.config_json, dict) else {}
if str(config_json.get("detail_mode") or "").strip().lower() != "json_risk":
continue
rule_code = str(asset.code or "").strip()
if code_filter and rule_code not in code_filter:
continue
rule_document = config_json.get("rule_document")
if not isinstance(rule_document, dict):
continue
file_name = str(rule_document.get("file_name") or "").strip()
rule_library = (
str(config_json.get("rule_library") or RISK_RULES_LIBRARY).strip()
or RISK_RULES_LIBRARY
)
if not file_name:
continue
try:
payload = library_manager.read_rule_library_json(
library=rule_library,
file_name=file_name,
)
except (FileNotFoundError, ValueError):
continue
manifest_code = str(payload.get("rule_code") or rule_code).strip()
if not manifest_code or (code_filter and manifest_code not in code_filter):
continue
if payload.get("enabled") is False:
continue
payload = dict(payload)
payload.setdefault("rule_code", manifest_code)
payload["_rule_version"] = str(
asset.published_version or asset.current_version or "v1.0.0"
)
payload["_rule_asset_id"] = asset.id
manifests_by_code[manifest_code] = payload
missing_codes = code_filter - set(manifests_by_code)
should_load_fallback = not code_filter or bool(missing_codes)
if should_load_fallback:
try:
files = library_manager.list_rule_library_json_files(library=RISK_RULES_LIBRARY)
except ValueError:
files = []
for file_name in files:
try:
payload = library_manager.read_rule_library_json(
library=RISK_RULES_LIBRARY,
file_name=file_name,
)
except (FileNotFoundError, ValueError):
continue
rule_code = str(payload.get("rule_code") or "").strip()
if not rule_code or rule_code in manifests_by_code:
continue
if code_filter and rule_code not in missing_codes:
continue
if payload.get("enabled") is False:
continue
payload = dict(payload)
payload["_rule_version"] = "v1.0.0"
manifests_by_code[rule_code] = payload
return list(manifests_by_code.values())
def _risk_manifest_applies_to_claim(
self,
manifest: dict[str, Any],
*,
claim: ExpenseClaim,
contexts: list[dict[str, Any]],
) -> bool:
applies_to = manifest.get("applies_to")
if not isinstance(applies_to, dict):
applies_to = {}
try:
min_attachments = int(applies_to.get("min_attachments") or 0)
except (TypeError, ValueError):
min_attachments = 0
if min_attachments and int(claim.invoice_count or 0) < min_attachments and not contexts:
return False
expense_types = {
str(claim.expense_type or "").strip().lower(),
*{
str(item.item_type or "").strip().lower()
for item in list(claim.items or [])
if str(item.item_type or "").strip()
},
}
domains = {
str(value or "").strip().lower()
for value in list(applies_to.get("domains") or [])
if str(value or "").strip()
}
configured_expense_types = {
str(value or "").strip().lower()
for value in list(applies_to.get("expense_types") or [])
if str(value or "").strip()
}
if configured_expense_types and not (expense_types & configured_expense_types):
return False
if domains and not self._risk_domains_match_claim(
domains,
expense_types=expense_types,
contexts=contexts,
):
return False
return True
def _risk_domains_match_claim(
self,
domains: set[str],
*,
expense_types: set[str],
contexts: list[dict[str, Any]],
) -> bool:
normalized_contexts: list[dict[str, str]] = []
for context in contexts:
document_info = context.get("document_info") or {}
normalized_contexts.append(
{
"scene_code": str(document_info.get("scene_code") or "").strip().lower(),
"document_type": str(
document_info.get("document_type") or ""
).strip().lower(),
"item_type": str(
getattr(context.get("item"), "item_type", "") or ""
).strip().lower(),
}
)
if "travel" in domains:
if expense_types & {"travel", "hotel", "transport"}:
return True
if any(
item["scene_code"] in {"travel", "hotel", "transport"}
or item["document_type"]
in {
"flight_itinerary",
"train_ticket",
"hotel_invoice",
"taxi_receipt",
}
for item in normalized_contexts
):
return True
if "meal" in domains:
if expense_types & {"meal", "entertainment"}:
return True
if any(
item["scene_code"] == "meal" or item["document_type"] == "meal_receipt"
for item in normalized_contexts
):
return True
return bool(domains & expense_types)
def _evaluate_platform_risk_manifest(
self,
manifest: dict[str, Any],
*,
claim: ExpenseClaim,
contexts: list[dict[str, Any]],
) -> dict[str, Any] | None:
evaluator = str(manifest.get("evaluator") or "").strip().lower()
if evaluator == "reason_too_brief":
return self._evaluate_reason_too_brief_risk(manifest, claim=claim)
if evaluator == "entertainment_reason_missing":
return self._evaluate_entertainment_reason_missing_risk(manifest, claim=claim)
if evaluator == "document_expense_mismatch":
return self._evaluate_document_expense_mismatch_risk(
manifest,
claim=claim,
contexts=contexts,
)
if evaluator == "location_consistency":
return self._evaluate_location_consistency_risk(
manifest,
claim=claim,
contexts=contexts,
)
if evaluator == "duplicate_invoice":
return self._evaluate_duplicate_invoice_risk(manifest, claim=claim, contexts=contexts)
if evaluator == "identity_consistency":
return self._evaluate_identity_consistency_risk(
manifest,
claim=claim,
contexts=contexts,
)
if evaluator == "cross_year_invoice":
return self._evaluate_cross_year_invoice_risk(manifest, claim=claim, contexts=contexts)
if evaluator == "void_or_red_invoice":
return self._evaluate_text_keyword_risk(
manifest,
contexts=contexts,
keywords=["作废", "红冲", "红字", "冲红"],
fallback_message="票据文本中出现作废、红冲或红字发票相关信息,建议退回补充或人工复核。",
)
if evaluator == "vague_goods_description":
return self._evaluate_text_keyword_risk(
manifest,
contexts=contexts,
keywords=["详见清单", "服务费", "咨询费", "其他", "办公用品"],
fallback_message="票据商品或服务描述较笼统,建议审批人核对真实用途和明细清单。",
)
if evaluator == "multi_city_reason_required":
return self._evaluate_multi_city_reason_required_risk(
manifest,
claim=claim,
contexts=contexts,
)
return None
def _evaluate_reason_too_brief_risk(
self,
manifest: dict[str, Any],
*,
claim: ExpenseClaim,
) -> dict[str, Any] | None:
params = manifest.get("params") if isinstance(manifest.get("params"), dict) else {}
try:
min_reason_length = max(1, int(params.get("min_reason_length") or 6))
except (TypeError, ValueError):
min_reason_length = 6
reason_corpus = re.sub(r"\s+", "", self._build_scene_reason_corpus(claim))
if len(reason_corpus) >= min_reason_length:
return None
return self._build_platform_risk_flag(
manifest,
message=f"报销事由有效描述不足 {min_reason_length} 个字符,暂不足以支撑真实性判断。",
evidence={"reason_length": len(reason_corpus), "min_reason_length": min_reason_length},
)
def _evaluate_entertainment_reason_missing_risk(
self,
manifest: dict[str, Any],
*,
claim: ExpenseClaim,
) -> dict[str, Any] | None:
expense_types = {
str(claim.expense_type or "").strip().lower(),
*{str(item.item_type or "").strip().lower() for item in list(claim.items or [])},
}
reason_corpus = self._build_scene_reason_corpus(claim)
compact_reason = re.sub(r"\s+", "", reason_corpus)
looks_like_entertainment = (
"entertainment" in expense_types
or "招待" in compact_reason
or "客户" in compact_reason
)
if not looks_like_entertainment:
return None
required_keywords = ("客户", "项目", "参与", "人员", "对象", "商务", "会议")
has_detail = any(keyword in compact_reason for keyword in required_keywords)
if has_detail:
return None
return self._build_platform_risk_flag(
manifest,
message="招待或餐饮类费用未识别到客户、项目、参与人员等必要说明,建议补充后再流转。",
evidence={"reason": reason_corpus[:300]},
)
    def _evaluate_document_expense_mismatch_risk(
        self,
        manifest: dict[str, Any],
        *,
        claim: ExpenseClaim,
        contexts: list[dict[str, Any]],
    ) -> dict[str, Any] | None:
        mismatches: list[str] = []
        for context in contexts:
            item = context["item"]
            item_type = (
                str(item.item_type or claim.expense_type or "other").strip().lower()
                or "other"
            )
            policy = self._get_expense_scene_policy(item_type)
            if policy is None:
                continue
            document_info = context.get("document_info") or {}
            recognized_scene_code = (
                str(document_info.get("scene_code") or "other").strip().lower()
                or "other"
            )
            recognized_document_type = (
                str(document_info.get("document_type") or "other").strip().lower()
                or "other"
            )
            # A match on either the scene or the document type is accepted.
            if (
                recognized_scene_code in set(policy.allowed_scene_codes)
                or recognized_document_type in set(policy.allowed_document_types)
            ):
                continue
            recognized_label = str(
                document_info.get("document_type_label")
                or recognized_document_type
                or "未知票据"
            )
            mismatches.append(f"第 {context['index']} 条明细为{policy.label},附件识别为{recognized_label}")
        if not mismatches:
            return None
        return self._build_platform_risk_flag(
            manifest,
            # Separate the individual mismatches so the message stays readable.
            message=";".join(mismatches[:3]) + ",与当前费用场景不匹配。",
            evidence={"mismatches": mismatches[:5]},
        )
    def _evaluate_location_consistency_risk(
        self,
        manifest: dict[str, Any],
        *,
        claim: ExpenseClaim,
        contexts: list[dict[str, Any]],
    ) -> dict[str, Any] | None:
        policy = self._get_expense_rule_catalog().travel_policy
        if policy is None:
            return None
        declared_cities = self._extract_known_cities_from_text(
            " ".join(
                [
                    str(claim.location or ""),
                    *[str(item.item_location or "") for item in list(claim.items or [])],
                ]
            ),
            policy,
        )
        evidence_cities = self._collect_attachment_cities(contexts, policy)
        # Without cities on both sides there is nothing to compare.
        if not declared_cities or not evidence_cities:
            return None
        if set(declared_cities) & set(evidence_cities):
            return None
        # Separate city names so the message does not run them together.
        declared_text = "、".join(declared_cities)
        evidence_text = "、".join(evidence_cities[:5])
        return self._build_platform_risk_flag(
            manifest,
            message=f"申报地点 {declared_text} 与票据识别地点 {evidence_text} 不一致,建议补充异地说明或更换附件。",
            evidence={"declared_cities": declared_cities, "evidence_cities": evidence_cities},
        )
    def _evaluate_duplicate_invoice_risk(
        self,
        manifest: dict[str, Any],
        *,
        claim: ExpenseClaim,
        contexts: list[dict[str, Any]],
    ) -> dict[str, Any] | None:
        invoice_keys = self._collect_invoice_keys_from_contexts(contexts)
        # First pass: duplicates inside the current claim.
        duplicate_keys = [
            key
            for key, count in self._count_values(invoice_keys).items()
            if count > 1
        ]
        if duplicate_keys:
            return self._build_platform_risk_flag(
                manifest,
                message=f"当前报销单内存在重复票据号码:{'、'.join(duplicate_keys[:3])}",
                evidence={"duplicate_invoice_keys": duplicate_keys[:5]},
            )
        if not invoice_keys:
            return None
        # Second pass: the same invoice key appearing on another claim.
        other_items = list(
            self.db.scalars(
                select(ExpenseClaimItem)
                .where(ExpenseClaimItem.claim_id != claim.id)
                .where(ExpenseClaimItem.invoice_id.is_not(None))
            ).all()
        )
        matched_claim_ids: set[str] = set()
        for other_item in other_items:
            other_path = self._attachment_storage.resolve_path(other_item.invoice_id)
            if other_path is None or not other_path.exists():
                continue
            other_meta = self._attachment_storage.read_meta(other_path)
            other_document_info = other_meta.get("document_info")
            if not isinstance(other_document_info, dict):
                continue
            other_keys = self._collect_invoice_keys_from_document_info(other_document_info)
            if set(invoice_keys) & set(other_keys):
                matched_claim_ids.add(str(other_item.claim_id or ""))
        if not matched_claim_ids:
            return None
        return self._build_platform_risk_flag(
            manifest,
            message=f"票据号码已在其他报销单中出现,疑似重复报销:{'、'.join(invoice_keys[:3])}",
            evidence={
                "invoice_keys": invoice_keys[:5],
                "matched_claim_ids": sorted(matched_claim_ids)[:5],
            },
        )
def _evaluate_identity_consistency_risk(
self,
manifest: dict[str, Any],
*,
claim: ExpenseClaim,
contexts: list[dict[str, Any]],
) -> dict[str, Any] | None:
params = manifest.get("params") if isinstance(manifest.get("params"), dict) else {}
allow_keywords = [
str(value)
for value in list(params.get("allow_keywords") or [])
if str(value).strip()
]
claimant = str(claim.employee_name or "").strip()
if not claimant:
return None
mismatched_buyers: list[str] = []
for context in contexts:
buyer = self._resolve_first_document_field_value(
context.get("document_info") or {},
keys={"buyer_name", "buyer", "purchaser_name", "claimant"},
labels={"购买方", "抬头", "买方", "购方"},
)
if not buyer:
continue
if claimant in buyer or any(keyword in buyer for keyword in allow_keywords):
continue
mismatched_buyers.append(buyer)
if not mismatched_buyers:
return None
return self._build_platform_risk_flag(
manifest,
message=f"发票抬头 {mismatched_buyers[0]} 与报销人 {claimant} 不一致,建议人工复核。",
evidence={"claimant": claimant, "buyers": mismatched_buyers[:5]},
)
def _evaluate_cross_year_invoice_risk(
self,
manifest: dict[str, Any],
*,
claim: ExpenseClaim,
contexts: list[dict[str, Any]],
) -> dict[str, Any] | None:
claim_year = claim.occurred_at.year if claim.occurred_at is not None else None
if claim_year is None:
return None
issue_years: list[int] = []
for context in contexts:
text = " ".join(
[
self._resolve_first_document_field_value(
context.get("document_info") or {},
keys={"date", "issue_date", "invoice_date"},
labels={"日期", "开票日期", "发生时间"},
),
str(context.get("ocr_summary") or ""),
str(context.get("ocr_text") or ""),
]
)
for match in re.findall(r"(20\d{2}|19\d{2})[年/\-.]", text):
try:
issue_years.append(int(match))
except ValueError:
continue
mismatch_years = sorted({year for year in issue_years if year != claim_year})
if not mismatch_years:
return None
return self._build_platform_risk_flag(
manifest,
message=f"票据年份 {mismatch_years[0]} 与费用发生年份 {claim_year} 不一致,建议确认是否跨年报销。",
evidence={"claim_year": claim_year, "invoice_years": mismatch_years},
)
def _evaluate_text_keyword_risk(
self,
manifest: dict[str, Any],
*,
contexts: list[dict[str, Any]],
keywords: list[str],
fallback_message: str,
) -> dict[str, Any] | None:
matched: list[str] = []
for context in contexts:
text = f"{context.get('ocr_summary') or ''}\n{context.get('ocr_text') or ''}"
for keyword in keywords:
if keyword in text and keyword not in matched:
matched.append(keyword)
if not matched:
return None
return self._build_platform_risk_flag(
manifest,
message=fallback_message,
evidence={"matched_keywords": matched},
)
def _evaluate_multi_city_reason_required_risk(
self,
manifest: dict[str, Any],
*,
claim: ExpenseClaim,
contexts: list[dict[str, Any]],
) -> dict[str, Any] | None:
policy = self._get_expense_rule_catalog().travel_policy
if policy is None:
return None
cities = self._collect_attachment_cities(contexts, policy)
for item in list(claim.items or []):
for city in self._extract_known_cities_from_text(str(item.item_location or ""), policy):
if city not in cities:
cities.append(city)
if len(cities) <= 2:
return None
reason_corpus = self._build_travel_reason_corpus(claim)
if self._text_contains_keywords(reason_corpus, policy.route_exception_keywords):
return None
return self._build_platform_risk_flag(
manifest,
message=f"本次报销识别到多城市行程({'、'.join(cities[:5])}),但事由中未说明中转、多地拜访或改签原因。",
evidence={"cities": cities[:8]},
)
def _build_platform_risk_flag(
self,
manifest: dict[str, Any],
*,
message: str,
evidence: dict[str, Any],
) -> dict[str, Any]:
outcomes = manifest.get("outcomes") if isinstance(manifest.get("outcomes"), dict) else {}
fail_outcome = outcomes.get("fail") if isinstance(outcomes.get("fail"), dict) else {}
severity = str(fail_outcome.get("severity") or "medium").strip().lower() or "medium"
default_action = "block" if severity == "high" else "manual_review"
action = str(fail_outcome.get("action") or default_action).strip()
label = str(manifest.get("name") or manifest.get("rule_code") or "风险规则命中").strip()
return {
"source": "submission_review",
"hit_source": "rule_center",
"rule_type": "risk",
"rule_code": str(manifest.get("rule_code") or "").strip(),
"rule_version": str(manifest.get("_rule_version") or "v1.0.0").strip(),
"severity": severity,
"action": action,
"label": label,
"message": message,
"evidence": evidence,
}
    @staticmethod
    def _count_values(values: list[str]) -> dict[str, int]:
        # Count occurrences of each non-empty value after trimming whitespace.
        counts: dict[str, int] = {}
        for value in values:
            normalized = str(value or "").strip()
            if not normalized:
                continue
            counts[normalized] = counts.get(normalized, 0) + 1
        return counts
def _collect_invoice_keys_from_contexts(self, contexts: list[dict[str, Any]]) -> list[str]:
invoice_keys: list[str] = []
for context in contexts:
document_info = context.get("document_info") or {}
for key in self._collect_invoice_keys_from_document_info(document_info):
if key not in invoice_keys:
invoice_keys.append(key)
return invoice_keys
def _collect_invoice_keys_from_document_info(self, document_info: dict[str, Any]) -> list[str]:
keys: list[str] = []
for field in list(document_info.get("fields") or []):
if not isinstance(field, dict):
continue
field_key = str(field.get("key") or "").strip().lower().replace("_", "")
label = str(field.get("label") or "").replace(" ", "")
value = str(field.get("value") or "").strip()
if not value:
continue
if field_key in {"invoiceno", "invoicenumber", "number", "code"} or any(
token in label for token in ("发票号码", "票号", "发票代码", "号码")
):
normalized = re.sub(r"\s+", "", value)
if normalized and normalized not in keys:
keys.append(normalized)
return keys
def _collect_attachment_cities(
self,
contexts: list[dict[str, Any]],
policy: RuntimeTravelPolicy,
) -> list[str]:
cities: list[str] = []
for context in contexts:
document_info = context.get("document_info") or {}
parts = [
str(context.get("ocr_summary") or ""),
str(context.get("ocr_text") or ""),
str(getattr(context.get("item"), "item_location", "") or ""),
]
for field in list(document_info.get("fields") or []):
if isinstance(field, dict):
parts.append(str(field.get("value") or ""))
for city in self._extract_known_cities_from_text(" ".join(parts), policy):
if city not in cities:
cities.append(city)
return cities
@staticmethod
def _extract_known_cities_from_text(text: str, policy: RuntimeTravelPolicy) -> list[str]:
normalized = str(text or "").strip()
if not normalized:
return []
cities: list[str] = []
for city in sorted(policy.city_tiers.keys(), key=len, reverse=True):
if city in normalized and city not in cities:
cities.append(city)
return cities
@staticmethod
def _resolve_first_document_field_value(
document_info: dict[str, Any],
*,
keys: set[str],
labels: set[str],
) -> str:
normalized_keys = {key.replace("_", "").lower() for key in keys}
for field in list(document_info.get("fields") or []):
if not isinstance(field, dict):
continue
field_key = str(field.get("key") or "").strip().lower().replace("_", "")
label = str(field.get("label") or "").replace(" ", "")
value = str(field.get("value") or "").strip()
if not value:
continue
if field_key in normalized_keys or any(token in label for token in labels):
return value
return ""
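
The duplicate-invoice check above normalizes invoice numbers, counts them, and flags any that occur more than once. A minimal standalone sketch of that idea follows. The function name `find_duplicate_invoice_keys` is hypothetical and not part of this commit, and the sketch folds whitespace removal and counting into one step, whereas the mixin splits them across `_collect_invoice_keys_from_document_info` and `_count_values`.

```python
import re
from collections import Counter
from typing import Iterable, Optional


def find_duplicate_invoice_keys(values: Iterable[Optional[str]]) -> list[str]:
    """Return invoice numbers that appear more than once, in first-seen order.

    Hypothetical helper for illustration only: whitespace inside a number is
    removed before counting, and empty or missing values are ignored.
    """
    normalized = (re.sub(r"\s+", "", str(value or "")) for value in values)
    counts = Counter(key for key in normalized if key)
    # Counter keeps insertion order, so duplicates come out in first-seen order.
    return [key for key, count in counts.items() if count > 1]
```

Counting with `Counter` keeps the check linear in the number of attachments. The cross-claim comparison in `_evaluate_duplicate_invoice_risk` would still need the database lookup shown in the diff.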

@@ -0,0 +1,654 @@
from __future__ import annotations
import re
from collections import defaultdict
from datetime import UTC, date, datetime, timedelta
from decimal import Decimal
from types import SimpleNamespace
from typing import Any
from sqlalchemy import or_, select
from sqlalchemy import inspect as sqlalchemy_inspect
from app.api.deps import CurrentUserContext
from app.core.agent_enums import AgentAssetDomain, AgentAssetStatus, AgentAssetType
from app.models.agent_asset import AgentAsset
from app.models.financial_record import ExpenseClaim, ExpenseClaimItem
from app.schemas.reimbursement import TravelReimbursementCalculatorRequest
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import RISK_RULES_LIBRARY
from app.services.expense_claim_constants import (
AI_REVIEW_LOOKBACK_DAYS,
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT,
AI_REVIEW_REPEAT_RISK_WARNING_COUNT,
DOCUMENT_FACT_ITEM_TYPES,
LOCATION_REQUIRED_EXPENSE_TYPES,
SYSTEM_GENERATED_ITEM_TYPES,
TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES,
TRAVEL_POLICY_HOTEL_NIGHT_PATTERN,
)
from app.services.expense_rule_runtime import (
ExpenseRuleRuntimeService,
RuntimeTravelPolicy,
build_default_expense_rule_catalog,
)
class ExpenseClaimPolicyReviewMixin:
def _run_scene_policy_review(self, claim: ExpenseClaim) -> dict[str, list[Any]]:
catalog = self._get_expense_rule_catalog()
flags: list[dict[str, Any]] = []
blocking_reasons: list[str] = []
reason_corpus = self._build_scene_reason_corpus(claim)
scene_totals: dict[str, Decimal] = defaultdict(lambda: Decimal("0.00"))
scene_warned: set[str] = set()
for item in claim.items:
item_type = str(item.item_type or claim.expense_type or "other").strip().lower() or "other"
policy = catalog.get_scene_policy(item_type)
if policy is None:
continue
scene_totals[item_type] += Decimal(item.item_amount or Decimal("0.00")).quantize(Decimal("0.01"))
if policy.always_warn and item_type not in scene_warned:
scene_warned.add(item_type)
flags.append(
{
"source": "submission_review",
"severity": "medium",
"label": f"{policy.label}人工重点复核",
"message": policy.always_warn_message or f"{policy.label}默认需要人工重点复核。",
"rule_code": policy.rule_code,
}
)
item_limit = policy.item_amount_limit
item_amount = Decimal(item.item_amount or Decimal("0.00")).quantize(Decimal("0.01"))
if item_limit is not None and item_amount > Decimal("0.00"):
exceeded = self._evaluate_amount_limit(
amount=item_amount,
limit_config=item_limit,
reason_text="\n".join(
part
for part in [reason_corpus, str(item.item_reason or "").strip()]
if part
),
)
if exceeded is not None:
severity, threshold = exceeded
label = (
f"{policy.label}金额超标待说明"
if severity == "high"
else f"{policy.label}金额超标提醒"
)
message = (
f"{policy.label}当前识别金额为 {item_amount} 元,"
f"已超过制度阈值 {threshold} 元。"
)
if severity == "high":
message += " 当前未识别到例外说明,请先补充原因。"
blocking_reasons.append(f"{policy.label}金额超出制度阈值,且未补充例外说明。")
else:
message += " 已识别到例外说明,请审批人重点复核。"
flags.append(
{
"source": "submission_review",
"severity": severity,
"label": label,
"message": message,
"rule_code": policy.rule_code,
}
)
for scene_code, total_amount in scene_totals.items():
policy = catalog.get_scene_policy(scene_code)
if policy is None or policy.claim_amount_limit is None or total_amount <= Decimal("0.00"):
continue
exceeded = self._evaluate_amount_limit(
amount=total_amount,
limit_config=policy.claim_amount_limit,
reason_text=reason_corpus,
)
if exceeded is None:
continue
severity, threshold = exceeded
label = f"{policy.label}合计超标待说明" if severity == "high" else f"{policy.label}合计超标提醒"
message = (
f"{policy.label}当前合计金额为 {total_amount} 元,"
f"已超过制度阈值 {threshold} 元。"
)
if severity == "high":
message += " 当前未识别到例外说明,请先补充原因。"
blocking_reasons.append(f"{policy.label}合计金额超出制度阈值,且未补充例外说明。")
else:
message += " 已识别到例外说明,请审批人重点复核。"
flags.append(
{
"source": "submission_review",
"severity": severity,
"label": label,
"message": message,
"rule_code": policy.rule_code,
}
)
return {
"flags": flags,
"blocking_reasons": list(dict.fromkeys(reason for reason in blocking_reasons if reason)),
}
    def _evaluate_amount_limit(
        self,
        *,
        amount: Decimal,
        limit_config: Any,
        reason_text: str,
    ) -> tuple[str, Decimal] | None:
        # Returns (severity, threshold) when a limit is exceeded, otherwise None.
        block_amount = getattr(limit_config, "block_amount", None)
        warn_amount = getattr(limit_config, "warn_amount", None)
        exception_keywords = list(getattr(limit_config, "exception_keywords", []) or [])
        has_exception = self._text_contains_keywords(reason_text, exception_keywords)
        # Over the block threshold: "high" unless an exception reason downgrades it to "medium".
        if block_amount is not None and amount > Decimal(block_amount):
            return ("medium" if has_exception else "high", Decimal(block_amount))
        # Over the warn threshold only: always a "medium" reminder.
        if warn_amount is not None and amount > Decimal(warn_amount):
            return ("medium", Decimal(warn_amount))
        return None
def _run_travel_policy_review(self, claim: ExpenseClaim) -> dict[str, list[Any]]:
policy = self._get_expense_rule_catalog().travel_policy
if policy is None:
return {"flags": [], "blocking_reasons": []}
contexts = [
context
for context in self._build_claim_attachment_contexts(claim)
if self._is_travel_policy_relevant_context(context, policy)
]
if not contexts:
return {"flags": [], "blocking_reasons": []}
reason_corpus = self._build_travel_reason_corpus(claim)
has_route_exception = self._text_contains_keywords(
reason_corpus,
policy.route_exception_keywords,
)
has_standard_exception = self._text_contains_keywords(
reason_corpus,
policy.standard_exception_keywords,
)
grade_band = self._resolve_travel_policy_band(claim.employee_grade)
band_label = policy.band_labels.get(grade_band or "", str(claim.employee_grade or "").strip() or "当前职级")
itinerary_segments: list[dict[str, Any]] = []
itinerary_cities: list[str] = []
hotel_contexts: list[dict[str, Any]] = []
flags: list[dict[str, Any]] = []
blocking_reasons: list[str] = []
for context in contexts:
route_segment = self._extract_route_segment(context, policy)
if route_segment and self._is_long_distance_travel_context(context, policy):
itinerary_segments.append(
{
"item": context["item"],
"origin": route_segment[0],
"destination": route_segment[1],
}
)
itinerary_cities.extend([route_segment[0], route_segment[1]])
scene_code = str(context["document_info"].get("scene_code") or "").strip().lower()
document_type = str(context["document_info"].get("document_type") or "").strip().lower()
item_type = str(context["item"].item_type or "").strip().lower()
if "hotel" in {scene_code, document_type, item_type} or document_type == "hotel_invoice":
hotel_contexts.append(context)
unique_itinerary_cities = list(dict.fromkeys(city for city in itinerary_cities if city))
expected_destination_city = self._resolve_expected_travel_city(
claim,
contexts,
unique_itinerary_cities,
policy,
)
if itinerary_segments:
unique_destinations = list(
dict.fromkeys(segment["destination"] for segment in itinerary_segments if segment["destination"])
)
first_origin = str(itinerary_segments[0]["origin"] or "").strip()
last_destination = str(itinerary_segments[-1]["destination"] or "").strip()
for previous, current in zip(itinerary_segments, itinerary_segments[1:]):
previous_destination = str(previous["destination"] or "").strip()
current_origin = str(current["origin"] or "").strip()
if previous_destination and current_origin and previous_destination != current_origin:
message = (
f"差旅行程未形成连续链路:上一段到达 {previous_destination},"
f"下一段却从 {current_origin} 出发,请补充中转或改签说明。"
)
flags.append(
{
"source": "submission_review",
"severity": "high",
"label": "行程闭环异常",
"message": message,
"rule_code": policy.rule_code,
}
)
blocking_reasons.append("差旅行程未形成连续闭环,请补充中转、改签或异地出发原因。")
break
if (
expected_destination_city
and last_destination
and last_destination not in {expected_destination_city, first_origin}
):
message = (
f"差旅行程终点识别为 {last_destination},"
f"与申报目的地 {expected_destination_city} 不一致,请补充多地出差或后续行程说明。"
)
flags.append(
{
"source": "submission_review",
"severity": "high",
"label": "行程终点异常",
"message": message,
"rule_code": policy.rule_code,
}
)
blocking_reasons.append("差旅行程终点与申报目的地不一致,请补充多地出差说明或补齐后续票据。")
expected_city_set = {
city
for city in (expected_destination_city, first_origin)
if city
}
extra_destinations = [
city
for city in unique_destinations
if city and city not in expected_city_set
]
if extra_destinations and not has_route_exception:
destinations_text = "、".join(extra_destinations[:3])
flags.append(
{
"source": "submission_review",
"severity": "high",
"label": "多城市行程待说明",
"message": (
f"检测到本次差旅涉及 {destinations_text} 多个目的地,"
"但当前报销事由未说明中转、多地拜访或改签原因。"
),
"rule_code": policy.rule_code,
}
)
blocking_reasons.append("检测到多城市差旅行程,但当前未补充中转或多地出差说明。")
allowed_hotel_cities = {
city
for city in [expected_destination_city, *unique_itinerary_cities]
if city
}
for context in hotel_contexts:
hotel_city = self._extract_hotel_city(context, policy)
if hotel_city and allowed_hotel_cities and hotel_city not in allowed_hotel_cities:
expected_text = "、".join(sorted(allowed_hotel_cities))
flags.append(
{
"source": "submission_review",
"severity": "high",
"label": "酒店地点异常",
"message": (
f"酒店票据识别城市为 {hotel_city},"
f"与当前差旅目的地/行程城市 {expected_text} 不一致,请补充异地住宿原因。"
),
"rule_code": policy.rule_code,
}
)
blocking_reasons.append("酒店票据地点与差旅目的地不一致,请补充异地住宿原因或更换附件。")
if grade_band is None:
continue
baseline_city = hotel_city or expected_destination_city
standard = self._resolve_travel_policy_hotel_standard(
policy=policy,
grade_band=grade_band,
city=baseline_city,
)
if standard is None:
continue
cap, standard_label = standard
night_count = self._extract_hotel_night_count(context)
item_amount = Decimal(context["item"].item_amount or Decimal("0.00")).quantize(Decimal("0.01"))
nightly_amount = (item_amount / Decimal(max(night_count, 1))).quantize(Decimal("0.01"))
if nightly_amount <= cap:
continue
hotel_message = (
f"{band_label} 职级在{standard_label}的住宿标准为 {cap} 元/晚,"
f"当前酒店识别金额约 {nightly_amount} 元/晚。"
)
item_reason = str(context["item"].item_reason or "").strip()
item_has_exception = self._text_contains_keywords(item_reason, policy.standard_exception_keywords)
if has_standard_exception or item_has_exception:
flags.append(
{
"source": "submission_review",
"severity": "medium",
"label": "住宿超标提醒",
"message": hotel_message + " 已识别到补充说明,请直属领导重点复核。",
"rule_code": policy.rule_code,
}
)
else:
flags.append(
{
"source": "submission_review",
"severity": "high",
"label": "住宿超标待说明",
"message": hotel_message + " 当前未识别到超标说明,请先补充原因。",
"rule_code": policy.rule_code,
}
)
blocking_reasons.append("住宿金额超出当前职级差标,且未补充超标说明。")
if grade_band is not None:
for context in contexts:
transport_class = self._detect_transport_class(context, policy)
if transport_class is None:
continue
transport_kind, class_label, class_level = transport_class
allowed_level = policy.transport_limits.get(grade_band, {}).get(transport_kind)
if allowed_level is None or class_level <= allowed_level:
continue
item_reason = str(context["item"].item_reason or "").strip()
item_has_exception = self._text_contains_keywords(item_reason, policy.standard_exception_keywords)
message = f"{band_label} 职级当前默认不可报销 {class_label}。"
if has_standard_exception or item_has_exception:
flags.append(
{
"source": "submission_review",
"severity": "medium",
"label": "交通舱位超标提醒",
"message": message + " 已识别到补充说明,请审批人重点复核。",
"rule_code": policy.rule_code,
}
)
else:
flags.append(
{
"source": "submission_review",
"severity": "high",
"label": "交通舱位超标待说明",
"message": message + " 当前未识别到例外说明,请先补充原因。",
"rule_code": policy.rule_code,
}
)
blocking_reasons.append("交通舱位或席别超出当前职级差标,且未补充例外说明。")
return {
"flags": flags,
"blocking_reasons": list(dict.fromkeys(reason for reason in blocking_reasons if reason)),
}
def _build_claim_attachment_contexts(self, claim: ExpenseClaim) -> list[dict[str, Any]]:
contexts: list[dict[str, Any]] = []
ordered_items = sorted(
claim.items,
key=lambda item: (
item.item_date or date.max,
self._normalize_sort_datetime(item.created_at),
),
)
for index, item in enumerate(ordered_items, start=1):
file_path = self._attachment_storage.resolve_path(item.invoice_id)
if file_path is None or not file_path.exists():
continue
metadata = self._attachment_storage.read_meta(file_path)
document_info = metadata.get("document_info")
contexts.append(
{
"index": index,
"item": item,
"document_info": document_info if isinstance(document_info, dict) else {},
"ocr_text": str(metadata.get("ocr_text") or ""),
"ocr_summary": str(metadata.get("ocr_summary") or ""),
}
)
return contexts
def _is_travel_policy_relevant_context(
self,
context: dict[str, Any],
policy: RuntimeTravelPolicy,
) -> bool:
item = context.get("item")
document_info = context.get("document_info") or {}
item_type = str(getattr(item, "item_type", "") or "").strip().lower()
scene_code = str(document_info.get("scene_code") or "").strip().lower()
document_type = str(document_info.get("document_type") or "").strip().lower()
return (
item_type in set(policy.relevant_expense_types)
or scene_code in set(policy.relevant_expense_types)
or document_type in {"hotel_invoice", *set(policy.long_distance_document_types)}
)
@staticmethod
def _resolve_document_field_value(document_info: dict[str, Any], key: str) -> str:
normalized_key = str(key or "").strip().lower()
for field in list(document_info.get("fields") or []):
if not isinstance(field, dict):
continue
field_key = str(field.get("key") or "").strip().lower()
if field_key == normalized_key:
return str(field.get("value") or "").strip()
return ""
@staticmethod
def _text_contains_keywords(text: str, keywords: tuple[str, ...] | list[str]) -> bool:
compact = re.sub(r"\s+", "", str(text or ""))
if not compact:
return False
return any(keyword in compact for keyword in keywords)
def _build_travel_reason_corpus(self, claim: ExpenseClaim) -> str:
parts = [str(claim.reason or "").strip(), str(claim.location or "").strip()]
for item in claim.items:
parts.append(str(item.item_reason or "").strip())
parts.append(str(item.item_location or "").strip())
return "\n".join(part for part in parts if part)
    @staticmethod
    def _resolve_travel_policy_band(grade: str | None) -> str | None:
        # Map an employee grade such as "P4", "M2", or "D1" to a travel policy band.
        normalized = str(grade or "").strip().upper()
        if not normalized:
            return None
        # Professional track: P1-P3 junior, P4-P5 mid, P6 and above senior.
        p_match = re.search(r"P(\d+)", normalized)
        if p_match:
            level = int(p_match.group(1))
            if level <= 3:
                return "junior"
            if level <= 5:
                return "mid"
            return "senior"
        # Management track: M1-M2 manager, M3 and above executive.
        m_match = re.search(r"M(\d+)", normalized)
        if m_match:
            level = int(m_match.group(1))
            if level <= 2:
                return "manager"
            return "executive"
        # Director-style grades are treated as executive.
        if normalized.startswith("D"):
            return "executive"
        return None
def _resolve_expected_travel_city(
self,
claim: ExpenseClaim,
contexts: list[dict[str, Any]],
itinerary_cities: list[str],
policy: RuntimeTravelPolicy,
) -> str:
claim_city = self._extract_city_from_text(str(claim.location or ""), policy)
if claim_city:
return claim_city
for context in contexts:
hotel_city = self._extract_hotel_city(context, policy)
if hotel_city:
return hotel_city
if len(itinerary_cities) >= 2 and itinerary_cities[1]:
return itinerary_cities[1]
for city in itinerary_cities:
if city:
return city
return ""
def _extract_route_segment(
self,
context: dict[str, Any],
policy: RuntimeTravelPolicy,
) -> tuple[str, str] | None:
document_info = context["document_info"]
route_value = self._resolve_document_field_value(document_info, "route")
if not route_value or "-" not in route_value:
return None
origin_text, destination_text = [segment.strip() for segment in route_value.split("-", 1)]
origin_city = self._extract_city_from_text(origin_text, policy)
destination_city = self._extract_city_from_text(destination_text, policy)
if not origin_city or not destination_city or origin_city == destination_city:
return None
return origin_city, destination_city
def _extract_hotel_city(self, context: dict[str, Any], policy: RuntimeTravelPolicy) -> str:
document_info = context["document_info"]
item = context["item"]
merchant_name = self._resolve_document_field_value(document_info, "merchant_name")
for candidate in (
merchant_name,
str(item.item_location or ""),
str(context.get("ocr_summary") or ""),
str(context.get("ocr_text") or ""),
):
city = self._extract_city_from_text(candidate, policy)
if city:
return city
return ""
@staticmethod
def _format_travel_policy_city_tier(city_tier: str) -> str:
return {
"tier_1": "一线城市",
"tier_2": "重点城市",
"tier_3": "其他城市",
}.get(str(city_tier or "").strip(), "当前城市")
def _resolve_travel_policy_hotel_standard(
self,
*,
policy: RuntimeTravelPolicy,
grade_band: str,
city: str,
) -> tuple[Decimal, str] | None:
normalized_city = str(city or "").strip()
city_limits = getattr(policy, "hotel_city_limits", {}) or {}
city_entry = city_limits.get(normalized_city) if normalized_city else None
if city_entry and city_entry.get(grade_band) is not None:
cap = Decimal(city_entry[grade_band]).quantize(Decimal("0.01"))
return cap, normalized_city
city_tier = (getattr(policy, "city_tiers", {}) or {}).get(normalized_city, "tier_3")
tier_entry = (getattr(policy, "hotel_limits", {}) or {}).get(grade_band, {})
tier_cap = tier_entry.get(city_tier)
if tier_cap is None:
return None
tier_label = self._format_travel_policy_city_tier(city_tier)
cap = Decimal(tier_cap).quantize(Decimal("0.01"))
return cap, tier_label
@staticmethod
def _extract_city_from_text(text: str, policy: RuntimeTravelPolicy) -> str:
normalized = str(text or "").strip()
if not normalized:
return ""
city_names = set(policy.city_tiers.keys())
city_names.update((getattr(policy, "hotel_city_limits", {}) or {}).keys())
city_match_order = sorted(city_names, key=lambda item: len(item), reverse=True)
for city in city_match_order:
if city in normalized:
return city
return ""
@staticmethod
def _extract_hotel_night_count(context: dict[str, Any]) -> int:
text = " ".join(
[
str(context.get("ocr_summary") or "").strip(),
str(context.get("ocr_text") or "").strip(),
]
).strip()
match = TRAVEL_POLICY_HOTEL_NIGHT_PATTERN.search(text)
if not match:
return 1
try:
return max(1, int(match.group(1)))
except (TypeError, ValueError):
return 1
def _detect_transport_class(
self,
context: dict[str, Any],
policy: RuntimeTravelPolicy,
) -> tuple[str, str, int] | None:
document_info = context["document_info"]
document_type = str(document_info.get("document_type") or "").strip().lower()
text = " ".join(
[
str(context.get("ocr_summary") or "").strip(),
str(context.get("ocr_text") or "").strip(),
]
).strip()
compact_text = re.sub(r"\s+", "", text)
if not compact_text:
return None
if document_type == "flight_itinerary":
for config in policy.flight_classes:
label = str(config.keyword or "").strip()
level = int(config.level)
if label in compact_text:
return "flight", label, level
return None
if document_type == "train_ticket":
for config in policy.train_classes:
label = str(config.keyword or "").strip()
level = int(config.level)
if label in compact_text:
return "train", label, level
return None
return None
def _is_long_distance_travel_context(
self,
context: dict[str, Any],
policy: RuntimeTravelPolicy,
) -> bool:
document_info = context["document_info"]
document_type = str(document_info.get("document_type") or "").strip().lower()
scene_code = str(document_info.get("scene_code") or "").strip().lower()
if document_type in set(policy.long_distance_document_types):
return True
return scene_code == "travel"
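
The two-tier limit check in `_evaluate_amount_limit` can be tried in isolation. The sketch below is only an illustration under stated assumptions: the `AmountLimit` dataclass and its sample values are hypothetical stand-ins for whatever object the rule catalog actually returns, and the keyword match mirrors `_text_contains_keywords` by removing whitespace before searching.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Tuple


@dataclass(frozen=True)
class AmountLimit:
    """Hypothetical limit config; the real catalog object may differ."""

    warn_amount: Optional[Decimal] = None
    block_amount: Optional[Decimal] = None
    exception_keywords: Tuple[str, ...] = ()


def evaluate_amount_limit(
    amount: Decimal, limit: AmountLimit, reason_text: str
) -> Optional[Tuple[str, Decimal]]:
    """Return (severity, threshold) when a limit is exceeded, else None."""
    # Drop all whitespace so keywords still match across OCR line breaks.
    compact = "".join(reason_text.split())
    has_exception = any(keyword in compact for keyword in limit.exception_keywords)
    # Block tier: "high" unless an exception reason downgrades it to "medium".
    if limit.block_amount is not None and amount > limit.block_amount:
        return ("medium" if has_exception else "high", limit.block_amount)
    # Warn tier: always a "medium" reminder.
    if limit.warn_amount is not None and amount > limit.warn_amount:
        return ("medium", limit.warn_amount)
    return None


sample_limit = AmountLimit(
    warn_amount=Decimal("300"),
    block_amount=Decimal("500"),
    exception_keywords=("客户要求",),
)
```

With these sample values, an amount of 600 with no explanation yields a `"high"` result against the 500 threshold, while the same amount with a matching exception keyword in the reason is downgraded to `"medium"`.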

@@ -0,0 +1,269 @@
from __future__ import annotations
import json
import re
import shutil
import uuid
from collections import defaultdict
from datetime import UTC, date, datetime, timedelta
from decimal import Decimal, InvalidOperation
from pathlib import Path
from types import SimpleNamespace
from typing import Any
from sqlalchemy import func, or_, select
from sqlalchemy import inspect as sqlalchemy_inspect
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, selectinload
from app.api.deps import CurrentUserContext
from app.core.agent_enums import AgentAssetDomain, AgentAssetStatus, AgentAssetType
from app.models.agent_asset import AgentAsset
from app.models.employee import Employee
from app.models.financial_record import ExpenseClaim, ExpenseClaimItem
from app.schemas.ontology import OntologyEntity, OntologyParseResult
from app.schemas.reimbursement import (
ExpenseClaimItemCreate,
ExpenseClaimItemUpdate,
ExpenseClaimUpdate,
TravelReimbursementCalculatorRequest,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import RISK_RULES_LIBRARY
from app.services.agent_foundation import AgentFoundationService
from app.services.audit import AuditLogService
from app.services.document_intelligence import build_document_insight
from app.services.expense_claim_access_policy import ExpenseClaimAccessPolicy
from app.services.expense_claim_attachment_presentation import ExpenseClaimAttachmentPresentation
from app.services.expense_claim_attachment_storage import ExpenseClaimAttachmentStorage
from app.services.expense_claim_constants import (
EXPENSE_TYPE_LABELS,
MAX_DRAFT_CLAIMS_PER_USER,
EDITABLE_CLAIM_STATUSES,
SYSTEM_GENERATED_ITEM_TYPES,
TRAVEL_DETAIL_ITEM_TYPES,
TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES,
DOCUMENT_TYPE_ITEM_TYPE_MAP,
DOCUMENT_TYPE_SCENE_MAP,
DOCUMENT_FACT_ITEM_TYPES,
ROUTE_DESCRIPTION_ITEM_TYPES,
DOCUMENT_TRIP_DATE_LABELS,
DOCUMENT_TRIP_DATE_REQUIREMENT_LABELS,
DOCUMENT_TRIP_DATE_KEYS,
DOCUMENT_GENERIC_DATE_KEYS,
DOCUMENT_INVOICE_DATE_KEYS,
DOCUMENT_TRIP_DATE_LABEL_TOKENS,
DOCUMENT_GENERIC_DATE_LABEL_TOKENS,
DOCUMENT_INVOICE_DATE_LABEL_TOKENS,
DOCUMENT_ROUTE_FORMAT_PATTERN,
DOCUMENT_ROUTE_TEXT_PATTERN,
DOCUMENT_ROUTE_ORIGIN_LABELS,
DOCUMENT_ROUTE_DESTINATION_LABELS,
GENERIC_ATTACHMENT_BACKFILL_ITEM_TYPES,
LOCATION_REQUIRED_EXPENSE_TYPES,
EXPENSE_SCENE_KEYWORDS,
EXPENSE_TYPE_ALLOWED_DOCUMENT_SCENES,
DOCUMENT_SCENE_LABELS,
DOCUMENT_ASSOCIATION_REVIEW_ACTIONS,
PERSISTENT_EXPENSE_REVIEW_ACTIONS,
RETURN_REASON_OPTIONS,
MAX_CLAIM_NO_RETRY_ATTEMPTS,
DOCUMENT_DATE_PATTERN,
SYSTEM_GENERATED_REASON_PREFIXES,
LEADING_REASON_TIME_PATTERNS,
AI_REVIEW_LOOKBACK_DAYS,
AI_REVIEW_REPEAT_RISK_WARNING_COUNT,
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT,
TRAVEL_REVIEW_RELEVANT_EXPENSE_TYPES,
TRAVEL_REVIEW_LONG_DISTANCE_DOCUMENT_TYPES,
TRAVEL_POLICY_CITY_TIERS,
TRAVEL_POLICY_CITY_MATCH_ORDER,
TRAVEL_POLICY_BAND_LABELS,
TRAVEL_POLICY_HOTEL_LIMITS,
TRAVEL_POLICY_ALLOWED_TRANSPORT_LEVELS,
TRAVEL_POLICY_ROUTE_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_STANDARD_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_FLIGHT_CLASS_PATTERNS,
TRAVEL_POLICY_TRAIN_CLASS_PATTERNS,
TRAVEL_POLICY_HOTEL_NIGHT_PATTERN,
)
from app.services.expense_claim_risk_review import ExpenseClaimRiskReviewMixin
from app.services.expense_amounts import (
extract_amount_candidates,
format_decimal_amount,
is_amount_match_date_fragment,
is_date_like_amount_candidate,
is_probable_year_amount,
parse_document_amount_value,
parse_plain_document_amount_value,
resolve_document_field_amount,
resolve_document_item_amount,
resolve_document_text_amount,
)
from app.services.expense_rule_runtime import (
DEFAULT_SCENE_RULE_ASSET_CODE,
ExpenseRuleRuntimeService,
RuntimeTravelPolicy,
build_default_expense_rule_catalog,
resolve_document_type_label,
)
from app.services.ocr import OcrService


class ExpenseClaimReadModelMixin:
    @staticmethod
    def _serialize_claim(claim: ExpenseClaim) -> dict[str, Any]:
        return {
            "id": claim.id,
            "claim_no": claim.claim_no,
            "employee_name": claim.employee_name,
            "department_name": claim.department_name,
            "project_code": claim.project_code,
            "expense_type": claim.expense_type,
            "reason": claim.reason,
            "location": claim.location,
            "amount": float(claim.amount),
            "invoice_count": int(claim.invoice_count or 0),
            "status": claim.status,
            "approval_stage": claim.approval_stage,
            "risk_flags_json": list(claim.risk_flags_json or []),
        }

    @staticmethod
    def _collect_return_flags(risk_flags: Any) -> list[dict[str, Any]]:
        if not isinstance(risk_flags, list):
            return []
        return [
            flag
            for flag in risk_flags
            if isinstance(flag, dict) and str(flag.get("source") or "").strip() == "manual_return"
        ]

    @staticmethod
    def _normalize_return_reason_codes(reason_codes: list[str] | None) -> list[str]:
        return ExpenseClaimReadModelMixin._normalize_return_reason_code_payload(reason_codes)["reason_codes"]

    @staticmethod
    def _normalize_return_reason_code_payload(reason_codes: list[str] | None) -> dict[str, list[str]]:
        normalized_codes: list[str] = []
        unknown_codes: list[str] = []
        for item in reason_codes or []:
            code = str(item or "").strip()
            if not code:
                continue
            if code in RETURN_REASON_OPTIONS and code not in normalized_codes:
                normalized_codes.append(code)
            elif code not in RETURN_REASON_OPTIONS and code not in unknown_codes:
                unknown_codes.append(code)
        return {
            "reason_codes": normalized_codes,
            "unknown_reason_codes": unknown_codes,
        }

    @staticmethod
    def _merge_persistent_claim_risk_flags(*, existing_flags: list[Any], next_flags: list[Any]) -> list[Any]:
        if not next_flags:
            return list(existing_flags or [])
        merged_flags = list(next_flags or [])
        next_return_markers = {
            ExpenseClaimReadModelMixin._build_return_flag_marker(flag)
            for flag in merged_flags
            if isinstance(flag, dict) and str(flag.get("source") or "").strip() == "manual_return"
        }
        for flag in list(existing_flags or []):
            if not (isinstance(flag, dict) and str(flag.get("source") or "").strip() == "manual_return"):
                continue
            marker = ExpenseClaimReadModelMixin._build_return_flag_marker(flag)
            if marker in next_return_markers:
                continue
            merged_flags.append(flag)
            next_return_markers.add(marker)
        return merged_flags

    @staticmethod
    def _build_return_flag_marker(flag: dict[str, Any]) -> tuple[str, str, str]:
        event_id = str(flag.get("return_event_id") or "").strip()
        if event_id:
            return ("event_id", event_id, "")
        return (
            str(flag.get("return_count") or "").strip(),
            str(flag.get("created_at") or "").strip(),
            str(flag.get("message") or flag.get("reason") or "").strip(),
        )

    @staticmethod
    def _build_default_return_message(*, operator: str, risk_points: list[str]) -> str:
        if risk_points:
            return f"{operator} 退回该报销单：{'；'.join(risk_points)}。请申请人调整后重新提交。"
        return f"{operator} 已退回该报销单，请申请人调整后重新提交。"

    @staticmethod
    def _normalize_return_stage_key(stage: str | None) -> str:
        normalized = str(stage or "").strip()
        if "直属" in normalized or "领导" in normalized or "负责人" in normalized:
            return "direct_manager"
        if "财务" in normalized:
            return "finance"
        if "AI" in normalized or "预审" in normalized:
            return "ai_review"
        if "归档" in normalized or "入账" in normalized:
            return "archive"
        return "unknown"

    @staticmethod
    def _is_editable_claim_status(status: str | None) -> bool:
        return str(status or "").strip().lower() in EDITABLE_CLAIM_STATUSES

    @staticmethod
    def _normalize_optional_text(value: str | None, *, fallback: str = "", allow_empty: bool = False) -> str | None:
        normalized = str(value or "").strip()
        if normalized:
            return normalized
        if allow_empty:
            return None
        return fallback

    @staticmethod
    def _normalize_sort_datetime(value: datetime | None) -> datetime:
        if value is None:
            return datetime.max.replace(tzinfo=UTC)
        if value.tzinfo is None:
            return value.replace(tzinfo=UTC)
        return value

    @staticmethod
    def _is_missing_value(value: Any) -> bool:
        text = str(value or "").strip()
        if not text:
            return True
        compact = text.replace(" ", "")
        return compact in {"待补充", "暂无", "", "未知", "处理中"}

    def _ensure_draft_claim(self, claim: ExpenseClaim) -> None:
        if not self._is_editable_claim_status(claim.status):
            raise ValueError("只有草稿、待补充或退回待提交状态的报销单才允许执行该操作。")

    @staticmethod
    def _ensure_draft_pending_claim(claim: ExpenseClaim) -> None:
        status = str(claim.status or "").strip().lower()
        if status != "draft":
            raise ValueError("只有草稿待提交状态的报销单才允许编辑附加说明。")

    @staticmethod
    def _ensure_mutable_claim_item(item: ExpenseClaimItem) -> None:
        if str(item.item_type or "").strip().lower() in SYSTEM_GENERATED_ITEM_TYPES:
            raise ValueError("系统自动计算的费用明细不可手动修改。")

    def _delete_claim_assistant_sessions(self, claim_id: str | None) -> None:
        # Imported lazily inside the method rather than at module level.
        from app.services.agent_conversations import AgentConversationService

        AgentConversationService(self.db).delete_conversations_for_draft_claim(
            claim_id=claim_id,
            source="user_message",
            session_type="expense",
        )

    def _ensure_ready(self) -> None:
        AgentFoundationService(self.db).ensure_foundation_ready()
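
The return-flag merge in `_merge_persistent_claim_risk_flags` can be hard to follow in isolation. The sketch below restates it as plain functions, under the assumption that flags are simple dictionaries; the function names and sample payloads are hypothetical and only illustrate the dedup-by-marker idea, not the service's actual call sites.

```python
from typing import Any


def is_manual_return(flag: Any) -> bool:
    # Only flags whose source is "manual_return" take part in the carry-over.
    return isinstance(flag, dict) and str(flag.get("source") or "").strip() == "manual_return"


def build_marker(flag: dict) -> tuple:
    # Prefer the event id; otherwise fall back to (count, timestamp, text).
    event_id = str(flag.get("return_event_id") or "").strip()
    if event_id:
        return ("event_id", event_id, "")
    return (
        str(flag.get("return_count") or "").strip(),
        str(flag.get("created_at") or "").strip(),
        str(flag.get("message") or flag.get("reason") or "").strip(),
    )


def merge_flags(existing: list, incoming: list) -> list:
    if not incoming:
        return list(existing)
    merged = list(incoming)
    seen = {build_marker(flag) for flag in merged if is_manual_return(flag)}
    for flag in existing:
        if not is_manual_return(flag):
            continue  # non-return flags from the old list are not carried over
        marker = build_marker(flag)
        if marker in seen:
            continue  # same return event already present in the new list
        merged.append(flag)
        seen.add(marker)
    return merged


# Hypothetical sample data.
existing = [
    {"source": "manual_return", "return_event_id": "evt-1"},
    {"source": "manual_return", "return_event_id": "evt-2"},
    {"source": "attachment_analysis", "severity": "high"},
]
incoming = [
    {"source": "manual_return", "return_event_id": "evt-2"},
    {"source": "submission_review", "severity": "medium"},
]
merged = merge_flags(existing, incoming)
print([flag.get("return_event_id") or flag["source"] for flag in merged])
```

In this sketch the new list wins, earlier manual returns are appended once each, and other stale flags are dropped, which mirrors the behavior of the method above.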


@@ -0,0 +1,393 @@
from __future__ import annotations
import json
import re
import shutil
import uuid
from collections import defaultdict
from datetime import UTC, date, datetime, timedelta
from decimal import Decimal, InvalidOperation
from pathlib import Path
from types import SimpleNamespace
from typing import Any
from sqlalchemy import func, or_, select
from sqlalchemy import inspect as sqlalchemy_inspect
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, selectinload
from app.api.deps import CurrentUserContext
from app.core.agent_enums import AgentAssetDomain, AgentAssetStatus, AgentAssetType
from app.models.agent_asset import AgentAsset
from app.models.employee import Employee
from app.models.financial_record import ExpenseClaim, ExpenseClaimItem
from app.schemas.ontology import OntologyEntity, OntologyParseResult
from app.schemas.reimbursement import (
ExpenseClaimItemCreate,
ExpenseClaimItemUpdate,
ExpenseClaimUpdate,
TravelReimbursementCalculatorRequest,
)
from app.services.agent_asset_rule_library import AgentAssetRuleLibraryManager
from app.services.agent_asset_spreadsheet import RISK_RULES_LIBRARY
from app.services.agent_foundation import AgentFoundationService
from app.services.audit import AuditLogService
from app.services.document_intelligence import build_document_insight
from app.services.expense_claim_access_policy import ExpenseClaimAccessPolicy
from app.services.expense_claim_attachment_presentation import ExpenseClaimAttachmentPresentation
from app.services.expense_claim_attachment_storage import ExpenseClaimAttachmentStorage
from app.services.expense_claim_errors import ExpenseClaimSubmissionBlockedError
from app.services.expense_claim_constants import (
EXPENSE_TYPE_LABELS,
MAX_DRAFT_CLAIMS_PER_USER,
EDITABLE_CLAIM_STATUSES,
SYSTEM_GENERATED_ITEM_TYPES,
TRAVEL_DETAIL_ITEM_TYPES,
TRAVEL_ALLOWANCE_TRIGGER_ITEM_TYPES,
DOCUMENT_TYPE_ITEM_TYPE_MAP,
DOCUMENT_TYPE_SCENE_MAP,
DOCUMENT_FACT_ITEM_TYPES,
ROUTE_DESCRIPTION_ITEM_TYPES,
DOCUMENT_TRIP_DATE_LABELS,
DOCUMENT_TRIP_DATE_REQUIREMENT_LABELS,
DOCUMENT_TRIP_DATE_KEYS,
DOCUMENT_GENERIC_DATE_KEYS,
DOCUMENT_INVOICE_DATE_KEYS,
DOCUMENT_TRIP_DATE_LABEL_TOKENS,
DOCUMENT_GENERIC_DATE_LABEL_TOKENS,
DOCUMENT_INVOICE_DATE_LABEL_TOKENS,
DOCUMENT_ROUTE_FORMAT_PATTERN,
DOCUMENT_ROUTE_TEXT_PATTERN,
DOCUMENT_ROUTE_ORIGIN_LABELS,
DOCUMENT_ROUTE_DESTINATION_LABELS,
GENERIC_ATTACHMENT_BACKFILL_ITEM_TYPES,
LOCATION_REQUIRED_EXPENSE_TYPES,
EXPENSE_SCENE_KEYWORDS,
EXPENSE_TYPE_ALLOWED_DOCUMENT_SCENES,
DOCUMENT_SCENE_LABELS,
DOCUMENT_ASSOCIATION_REVIEW_ACTIONS,
PERSISTENT_EXPENSE_REVIEW_ACTIONS,
RETURN_REASON_OPTIONS,
MAX_CLAIM_NO_RETRY_ATTEMPTS,
DOCUMENT_DATE_PATTERN,
SYSTEM_GENERATED_REASON_PREFIXES,
LEADING_REASON_TIME_PATTERNS,
AI_REVIEW_LOOKBACK_DAYS,
AI_REVIEW_REPEAT_RISK_WARNING_COUNT,
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT,
TRAVEL_REVIEW_RELEVANT_EXPENSE_TYPES,
TRAVEL_REVIEW_LONG_DISTANCE_DOCUMENT_TYPES,
TRAVEL_POLICY_CITY_TIERS,
TRAVEL_POLICY_CITY_MATCH_ORDER,
TRAVEL_POLICY_BAND_LABELS,
TRAVEL_POLICY_HOTEL_LIMITS,
TRAVEL_POLICY_ALLOWED_TRANSPORT_LEVELS,
TRAVEL_POLICY_ROUTE_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_STANDARD_EXCEPTION_KEYWORDS,
TRAVEL_POLICY_FLIGHT_CLASS_PATTERNS,
TRAVEL_POLICY_TRAIN_CLASS_PATTERNS,
TRAVEL_POLICY_HOTEL_NIGHT_PATTERN,
)
from app.services.expense_claim_risk_review import ExpenseClaimRiskReviewMixin
from app.services.expense_amounts import (
extract_amount_candidates,
format_decimal_amount,
is_amount_match_date_fragment,
is_date_like_amount_candidate,
is_probable_year_amount,
parse_document_amount_value,
parse_plain_document_amount_value,
resolve_document_field_amount,
resolve_document_item_amount,
resolve_document_text_amount,
)
from app.services.expense_rule_runtime import (
DEFAULT_SCENE_RULE_ASSET_CODE,
ExpenseRuleRuntimeService,
RuntimeTravelPolicy,
build_default_expense_rule_catalog,
resolve_document_type_label,
)
from app.services.ocr import OcrService


class ExpenseClaimReviewPreviewMixin:
    def save_or_submit_from_ontology(
        self,
        *,
        run_id: str,
        user_id: str | None,
        message: str,
        ontology: OntologyParseResult,
        context_json: dict[str, Any],
    ) -> dict[str, Any]:
        review_action = str(context_json.get("review_action") or "").strip()
        # Non-persistent actions only produce a preview; nothing is written.
        if review_action not in PERSISTENT_EXPENSE_REVIEW_ACTIONS:
            return self._build_expense_review_preview_result(
                user_id=user_id,
                message=message,
                ontology=ontology,
                context_json=context_json,
            )
        result = self.upsert_draft_from_ontology(
            run_id=run_id,
            user_id=user_id,
            message=message,
            ontology=ontology,
            context_json=context_json,
        )
        # Only "next_step" continues from saving the draft to submitting it.
        if review_action != "next_step":
            return result
        claim_id = str(result.get("claim_id") or "").strip()
        if not claim_id or result.get("draft_limit_reached"):
            return result
        current_user = CurrentUserContext(
            username=str(user_id or context_json.get("name") or "anonymous").strip() or "anonymous",
            name=str(context_json.get("name") or user_id or "anonymous").strip() or "anonymous",
            role_codes=[
                str(item).strip()
                for item in list(context_json.get("role_codes") or [])
                if str(item).strip()
            ],
            is_admin=bool(context_json.get("is_admin")),
            department_name=str(context_json.get("department_name") or context_json.get("department") or "").strip(),
        )
        try:
            claim = self.submit_claim(claim_id, current_user)
        except ExpenseClaimSubmissionBlockedError as exc:
            return {
                **result,
                "message": self._format_submission_blocked_message(exc.issues),
                "submission_blocked": True,
                "submission_blocked_reasons": exc.issues,
                "missing_fields": exc.issues,
                "draft_only": False,
            }
        except ValueError as exc:
            message = str(exc)
            return {
                **result,
                "message": message,
                "submission_blocked": True,
                "submission_blocked_reasons": [message] if message else [],
                "missing_fields": [message] if message else [],
                "draft_only": False,
            }
        if claim is None:
            return {
                **result,
                "message": "未找到可提交的报销单，请刷新后重试。",
                "submission_blocked": True,
                "draft_only": False,
            }
        if str(claim.status or "").strip().lower() != "submitted":
            # Surface the first non-empty submission-review message, if any.
            review_message = ""
            for flag in list(claim.risk_flags_json or []):
                if not isinstance(flag, dict):
                    continue
                if str(flag.get("source") or "").strip() != "submission_review":
                    continue
                review_message = str(flag.get("message") or "").strip()
                if review_message:
                    break
            return {
                "message": review_message or f"报销单 {claim.claim_no} 经 AI预审后转为待补充，请先修正后再提交。",
                "submission_blocked": True,
                "draft_only": False,
                "claim_id": claim.id,
                "claim_no": claim.claim_no,
                "status": claim.status,
                "approval_stage": claim.approval_stage,
                "amount": float(claim.amount),
                "invoice_count": int(claim.invoice_count or 0),
            }
        return {
            "message": (
                f"报销单 {claim.claim_no} 已完成 AI预审，"
                f"当前节点为 {claim.approval_stage or '审批中'}"
            ),
            "draft_only": False,
            "claim_id": claim.id,
            "claim_no": claim.claim_no,
            "status": claim.status,
            "approval_stage": claim.approval_stage,
            "amount": float(claim.amount),
            "invoice_count": int(claim.invoice_count or 0),
        }

    def _build_expense_review_preview_result(
        self,
        *,
        user_id: str | None,
        message: str,
        ontology: OntologyParseResult,
        context_json: dict[str, Any],
    ) -> dict[str, Any]:
        attachment_count = self._resolve_attachment_count(context_json)
        calculation_copy = self._build_expense_review_preview_calculation_copy(
            user_id=user_id,
            message=message,
            ontology=ontology,
            context_json=context_json,
        )
        return {
            "message": "\n\n".join(
                item
                for item in [
                    "我已先整理出本次报销的待核对信息。下面是基于当前信息的制度测算，票据补齐后会按真实金额重新复核。",
                    calculation_copy,
                ]
                if item
            ),
            "draft_only": True,
            "preview_only": True,
            "status": "preview",
            "invoice_count": attachment_count,
        }

    def _build_expense_review_preview_calculation_copy(
        self,
        *,
        user_id: str | None,
        message: str,
        ontology: OntologyParseResult,
        context_json: dict[str, Any],
    ) -> str:
        expense_type = self._resolve_explicit_review_expense_type(context_json) or self._resolve_expense_type(
            ontology.entities,
            context_json=context_json,
        )
        if expense_type == "travel" or (
            (not expense_type or expense_type == "other")
            and self._should_preview_as_travel(message=message, context_json=context_json)
        ):
            return self._build_travel_review_preview_calculation_copy(
                user_id=user_id,
                message=message,
                ontology=ontology,
                context_json=context_json,
            )
        amount = self._resolve_amount(ontology.entities, context_json=context_json) or Decimal("0.00")
        expense_label = EXPENSE_TYPE_LABELS.get(str(expense_type or "").strip(), "当前费用")
        return "\n".join(
            [
                "报销测算参考：",
                "",
                "| 项目 | 当前信息 | 复核口径 |",
                "| --- | --- | --- |",
                f"| 费用类型 | {expense_label} | 匹配规则中心对应费用标准 |",
                f"| 票据金额 | {self._format_decimal_amount(amount)} 元 | 以真实票据识别金额和用户确认金额为准 |",
                "| 规则校验 | 待票据和关键信息补齐 | 按费用类型、发生地点、业务事由和审批口径复核 |",
            ]
        )

    def _build_travel_review_preview_calculation_copy(
        self,
        *,
        user_id: str | None,
        message: str,
        ontology: OntologyParseResult,
        context_json: dict[str, Any],
    ) -> str:
        location = self._resolve_location(message=message, context_json=context_json) or "待确认"
        occurred_at = self._resolve_occurred_at(ontology, context_json=context_json) or datetime.now(UTC)
        days, _, _ = self._resolve_travel_allowance_days(
            context_json=context_json,
            occurred_at=occurred_at,
        )
        amount = self._resolve_amount(ontology.entities, context_json=context_json) or Decimal("0.00")
        employee = self._resolve_employee(
            ontology=ontology,
            context_json=context_json,
            user_id=user_id,
        )
        grade = str(
            context_json.get("employee_grade")
            or context_json.get("grade")
            or context_json.get("user_grade")
            or (employee.grade if employee is not None else "")
            or ""
        ).strip()
        # Without a location or a grade no standard can be matched yet.
        if location == "待确认" or not grade:
            return "\n".join(
                [
                    "报销测算参考：",
                    "",
                    "| 项目 | 当前信息 | 测算说明 |",
                    "| --- | --- | --- |",
                    f"| 出差地点 | {location} | 用于匹配城市住宿标准和补贴区域 |",
                    f"| 出差天数 | {days} 天 | 来自业务发生时间或用户描述 |",
                    f"| 职级 | {grade or '待确认'} | 补齐后才能匹配住宿标准和补贴档位 |",
                    f"| 交通票据 | {self._format_decimal_amount(amount)} 元 | 上传票据后按真实金额重新复核 |",
                ]
            )
        try:
            from app.services.travel_reimbursement_calculator import (
                TravelReimbursementCalculatorService,
            )

            result = TravelReimbursementCalculatorService(self.db).calculate(
                TravelReimbursementCalculatorRequest(days=days, location=location, grade=grade),
                CurrentUserContext(
                    username=str(user_id or context_json.get("name") or "anonymous").strip() or "anonymous",
                    name=str(context_json.get("name") or user_id or "anonymous").strip() or "anonymous",
                    role_codes=[],
                    is_admin=False,
                ),
            )
        except ValueError:
            return "\n".join(
                [
                    "报销测算参考：",
                    "",
                    "| 项目 | 当前信息 | 测算说明 |",
                    "| --- | --- | --- |",
                    f"| 出差地点 | {location} | 暂时未能匹配规则中心地点 |",
                    f"| 出差天数 | {days} 天 | 来自业务发生时间或用户描述 |",
                    f"| 职级 | {grade} | 暂时无法自动匹配差旅标准 |",
                    f"| 交通票据 | {self._format_decimal_amount(amount)} 元 | 上传票据后按真实金额重新复核 |",
                ]
            )
        ticket_amount = amount.quantize(Decimal("0.01"))
        total_amount = (
            ticket_amount
            + Decimal(result.hotel_amount or Decimal("0.00"))
            + Decimal(result.allowance_amount or Decimal("0.00"))
        ).quantize(Decimal("0.01"))
        ticket_basis = "当前未上传交通票据，先按 0.00 元占位" if ticket_amount <= Decimal("0.00") else "已识别或填写的交通票据金额"
        return "\n".join(
            [
                "报销测算参考：",
                "",
                f"职级 {grade}，目的地 {location}，匹配城市 {result.matched_city}；补齐交通、酒店等票据后，我会按真实票据金额和规则中心标准重新复核。",
                "",
                "| 项目 | 测算口径 | 金额 |",
                "| --- | --- | ---: |",
                f"| 交通票据 | {ticket_basis} | {self._format_decimal_amount(ticket_amount)} 元 |",
                f"| 住宿标准 | {self._format_decimal_amount(result.hotel_rate)} 元/天 × {days} 天 | {self._format_decimal_amount(result.hotel_amount)} 元 |",
                f"| 出差补贴 | {self._format_decimal_amount(result.total_allowance_rate)} 元/天 × {days} 天 | {self._format_decimal_amount(result.allowance_amount)} 元 |",
                f"| 参考合计 | 交通票据 + 住宿标准 + 出差补贴 | {self._format_decimal_amount(total_amount)} 元 |",
            ]
        )

    @staticmethod
    def _should_preview_as_travel(*, message: str, context_json: dict[str, Any]) -> bool:
        text_parts = [message]
        review_form_values = context_json.get("review_form_values")
        if isinstance(review_form_values, dict):
            text_parts.extend(str(value or "") for value in review_form_values.values())
        text_parts.extend(str(context_json.get(key) or "") for key in ("user_input_text", "raw_text", "ocr_summary"))
        compact = "".join(text_parts)
        return any(keyword in compact for keyword in ("差旅", "出差", "火车票", "机票", "酒店", "住宿票"))
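
The "reference total" row of the travel preview is the ticket amount plus the hotel and allowance amounts, each rounded to cents. The sketch below works through that arithmetic with `Decimal`. It assumes, as the preview table suggests but the calculator service is not shown to confirm, that each amount is a daily rate multiplied by the number of days; the function name and the rates are hypothetical.

```python
from decimal import Decimal

CENTS = Decimal("0.01")


def preview_total(ticket_amount: Decimal, hotel_rate: Decimal, allowance_rate: Decimal, days: int) -> Decimal:
    # Assumption: hotel and allowance amounts are "rate per day x days".
    ticket = ticket_amount.quantize(CENTS)
    hotel = (hotel_rate * days).quantize(CENTS)
    allowance = (allowance_rate * days).quantize(CENTS)
    return (ticket + hotel + allowance).quantize(CENTS)


# No ticket uploaded yet, so the ticket row is a 0.00 placeholder.
total = preview_total(Decimal("0"), Decimal("480"), Decimal("100"), days=3)
print(total)  # 1740.00
```

Keeping every term in `Decimal` avoids the binary rounding drift that `float` arithmetic would introduce into currency totals.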


@@ -0,0 +1,177 @@
from __future__ import annotations
from datetime import UTC, datetime, timedelta
from typing import Any
from sqlalchemy import or_, select
from app.models.financial_record import ExpenseClaim
from app.services.expense_claim_constants import (
AI_REVIEW_LOOKBACK_DAYS,
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT,
AI_REVIEW_REPEAT_RISK_WARNING_COUNT,
)
from app.services.expense_claim_item_sync import ExpenseClaimItemSyncMixin
from app.services.expense_claim_platform_risk import ExpenseClaimPlatformRiskMixin
from app.services.expense_claim_policy_review import ExpenseClaimPolicyReviewMixin


class ExpenseClaimRiskReviewMixin(
    ExpenseClaimPlatformRiskMixin,
    ExpenseClaimPolicyReviewMixin,
    ExpenseClaimItemSyncMixin,
):
    def _run_ai_submission_review(self, claim: ExpenseClaim) -> dict[str, Any]:
        base_flags = list(claim.risk_flags_json or [])
        attachment_flags = [
            flag
            for flag in base_flags
            if isinstance(flag, dict) and str(flag.get("source") or "").strip() == "attachment_analysis"
        ]
        # Drop flags from earlier submission reviews; this run regenerates them.
        preserved_flags = [
            flag
            for flag in base_flags
            if not (isinstance(flag, dict) and str(flag.get("source") or "").strip() == "submission_review")
        ]
        review_flags: list[dict[str, Any]] = []
        attention_reasons: list[str] = []
        high_attachment_flags = [
            flag
            for flag in attachment_flags
            if str(flag.get("severity") or "").strip().lower() == "high"
        ]
        medium_attachment_flags = [
            flag
            for flag in attachment_flags
            if str(flag.get("severity") or "").strip().lower() == "medium"
        ]
        if high_attachment_flags:
            attention_reasons.append("存在高风险票据，需审批人重点复核。")
            review_flags.append(
                {
                    "source": "submission_review",
                    "severity": "high",
                    "label": "AI预审重点复核",
                    "message": (
                        f"AI预审发现 {len(high_attachment_flags)} 条高风险附件，"
                        "已随单流转给审批人重点复核。"
                    ),
                }
            )
        elif medium_attachment_flags:
            review_flags.append(
                {
                    "source": "submission_review",
                    "severity": "medium",
                    "label": "AI预审提醒",
                    "message": f"AI预审发现 {len(medium_attachment_flags)} 条中风险附件，已随单流转给审批人复核。",
                }
            )
        manager_name = self._resolve_claim_manager_name(claim)
        if not manager_name:
            attention_reasons.append("未识别到该员工的直属领导，需审批环节补充分配。")
            review_flags.append(
                {
                    "source": "submission_review",
                    "severity": "medium",
                    "label": "审批链待分配",
                    "message": "AI预审发现直属领导缺失，已提交到审批环节等待分配或复核。",
                }
            )
        historical_risk_count = self._count_recent_risky_claims(claim)
        if historical_risk_count >= AI_REVIEW_REPEAT_RISK_BLOCK_COUNT:
            review_flags.append(
                {
                    "source": "submission_review",
                    "severity": "medium",
                    "label": "历史风险偏高",
                    "message": (
                        f"近 {AI_REVIEW_LOOKBACK_DAYS} 天内该员工已有 {historical_risk_count} 笔带风险标记的报销，"
                        "本次已追加到审批链重点关注。"
                    ),
                }
            )
        elif historical_risk_count >= AI_REVIEW_REPEAT_RISK_WARNING_COUNT:
            review_flags.append(
                {
                    "source": "submission_review",
                    "severity": "low",
                    "label": "历史风险提醒",
                    "message": (
                        f"近 {AI_REVIEW_LOOKBACK_DAYS} 天内该员工已有 {historical_risk_count} 笔带风险标记的报销，"
                        "建议直属领导重点复核。"
                    ),
                }
            )
        travel_review = self._run_travel_policy_review(claim)
        attention_reasons.extend(travel_review["blocking_reasons"])
        review_flags.extend(travel_review["flags"])
        scene_policy_review = self._run_scene_policy_review(claim)
        attention_reasons.extend(scene_policy_review["blocking_reasons"])
        review_flags.extend(scene_policy_review["flags"])
        platform_risk_review = self.evaluate_platform_risk_rules(claim)
        attention_reasons.extend(platform_risk_review["blocking_reasons"])
        review_flags.extend(platform_risk_review["flags"])
        if attention_reasons:
            # dict.fromkeys de-duplicates the reasons while preserving their order.
            summary_message = "AI预审发现需审批重点关注事项：" + "".join(
                dict.fromkeys(attention_reasons)
            )
            review_flags.insert(
                0,
                {
                    "source": "submission_review",
                    "severity": "medium",
                    "label": "AI预审重点复核",
                    "message": summary_message,
                },
            )
        return {
            "status": "submitted",
            "approval_stage": "直属领导审批",
            "risk_flags": preserved_flags + review_flags,
            "message": (
                f"报销单 {claim.claim_no} 已完成 AI预审，"
                f"现已提交给直属领导 {manager_name or '审批人'} 审批。"
            ),
            "passed": True,
        }

    @staticmethod
    def _resolve_claim_manager_name(claim: ExpenseClaim) -> str:
        if claim.employee is not None:
            if claim.employee.manager is not None and claim.employee.manager.name:
                return str(claim.employee.manager.name).strip()
            if claim.employee.organization_unit is not None and claim.employee.organization_unit.manager_name:
                return str(claim.employee.organization_unit.manager_name).strip()
        return ""

    def _count_recent_risky_claims(self, claim: ExpenseClaim) -> int:
        filters = []
        if claim.employee_id:
            filters.append(ExpenseClaim.employee_id == claim.employee_id)
        elif claim.employee_name:
            filters.append(ExpenseClaim.employee_name == claim.employee_name)
        if not filters:
            return 0
        since = datetime.now(UTC) - timedelta(days=AI_REVIEW_LOOKBACK_DAYS)
        stmt = (
            select(ExpenseClaim)
            .where(or_(*filters))
            .where(ExpenseClaim.id != claim.id)
            .where(ExpenseClaim.occurred_at >= since)
        )
        recent_claims = list(self.db.scalars(stmt).all())
        return sum(1 for item in recent_claims if list(item.risk_flags_json or []))
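
The historical-risk branch in `_run_ai_submission_review` is a two-threshold check on the count returned by `_count_recent_risky_claims`. The sketch below isolates that decision. The threshold values are hypothetical stand-ins (the real ones live in `expense_claim_constants` and are not shown here), and `classify_history` is an illustrative name, not a function from the codebase.

```python
from __future__ import annotations

# Hypothetical values; the real constants are defined in expense_claim_constants.
AI_REVIEW_REPEAT_RISK_WARNING_COUNT = 2
AI_REVIEW_REPEAT_RISK_BLOCK_COUNT = 4


def classify_history(risky_claim_count: int) -> tuple[str, str] | None:
    """Return (severity, label) for the review flag, or None when no flag applies."""
    # The higher threshold is checked first so it takes precedence.
    if risky_claim_count >= AI_REVIEW_REPEAT_RISK_BLOCK_COUNT:
        return ("medium", "历史风险偏高")
    if risky_claim_count >= AI_REVIEW_REPEAT_RISK_WARNING_COUNT:
        return ("low", "历史风险提醒")
    return None


for count in (0, 2, 5):
    print(count, classify_history(count))
```

Note that even the higher threshold only raises a `medium` flag: in the method above the claim is still returned with status `submitted`.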

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,299 @@
from __future__ import annotations
import re
from typing import Any
EXPENSE_RULE_CODE_BLOCK_PATTERN = re.compile(r"```expense-rule\s*(\{.*?\})\s*```", re.DOTALL)
DOCUMENT_TYPE_LABELS = {
"flight_itinerary": "机票/航班行程单",
"train_ticket": "火车/高铁票",
"hotel_invoice": "酒店住宿票据",
"taxi_receipt": "出租车/网约车票据",
"parking_toll_receipt": "停车/通行费票据",
"meal_receipt": "餐饮票据",
"office_invoice": "办公用品票据",
"meeting_invoice": "会议/会务票据",
"training_invoice": "培训票据",
"vat_invoice": "增值税发票",
"receipt": "一般收据/凭证",
"other": "其他单据",
}
SCENE_LABELS = {
"travel": "差旅",
"hotel": "住宿",
"transport": "交通",
"meal": "餐饮",
"entertainment": "业务招待",
"office": "办公",
"meeting": "会务",
"training": "培训",
"communication": "通讯",
"welfare": "福利",
"other": "其他",
}
DEFAULT_SCENE_RULE_ASSET_CODE = "rule.expense.scene_submission_standard"
DEFAULT_TRAVEL_RULE_ASSET_CODE = "rule.expense.travel_risk_control_standard"
DEFAULT_SCENE_MATRIX_CONFIG: dict[str, Any] = {
"kind": "scene_matrix",
"version": 1,
"scenes": {
"travel": {
"label": "差旅费",
"location_required": True,
"min_attachment_count": 1,
"allowed_scene_codes": ["travel"],
"allowed_document_types": ["flight_itinerary", "train_ticket"],
"attachment_mismatch_severity": "high",
},
"hotel": {
"label": "住宿费",
"location_required": False,
"min_attachment_count": 1,
"allowed_scene_codes": ["hotel"],
"allowed_document_types": ["hotel_invoice", "vat_invoice", "receipt"],
"attachment_mismatch_severity": "high",
},
"transport": {
"label": "交通费",
"location_required": False,
"min_attachment_count": 1,
"allowed_scene_codes": ["transport"],
"allowed_document_types": ["taxi_receipt", "parking_toll_receipt", "vat_invoice", "receipt"],
"attachment_mismatch_severity": "high",
"item_amount_limit": {
"scope": "item_amount",
"warn_amount": "300.00",
"block_amount": "800.00",
"exception_keywords": ["跨城", "夜间", "应急", "无公共交通", "机场", "火车站", "超标说明"],
"metric_label": "单笔交通金额",
},
},
"meal": {
"label": "餐费",
"location_required": False,
"min_attachment_count": 1,
"allowed_scene_codes": ["meal"],
"allowed_document_types": ["meal_receipt", "vat_invoice", "receipt"],
"attachment_mismatch_severity": "high",
"claim_amount_limit": {
"scope": "claim_total",
"warn_amount": "300.00",
"block_amount": "800.00",
"exception_keywords": ["客户接待", "团队活动", "加班", "展会", "超标说明"],
"metric_label": "餐费合计",
},
},
"entertainment": {
"label": "业务招待费",
"location_required": True,
"min_attachment_count": 1,
"allowed_scene_codes": ["meal"],
"allowed_document_types": ["meal_receipt", "vat_invoice", "receipt"],
"attachment_mismatch_severity": "high",
"claim_amount_limit": {
"scope": "claim_total",
"warn_amount": "2000.00",
"block_amount": "5000.00",
"exception_keywords": ["重要客户", "商务宴请", "项目签约", "超标说明"],
"metric_label": "招待费合计",
},
},
"office": {
"label": "办公费",
"location_required": False,
"min_attachment_count": 1,
"allowed_scene_codes": ["office"],
"allowed_document_types": ["office_invoice", "vat_invoice", "receipt"],
"attachment_mismatch_severity": "high",
"claim_amount_limit": {
"scope": "claim_total",
"warn_amount": "1500.00",
"block_amount": "5000.00",
"exception_keywords": ["批量采购", "固定资产", "部门集中采购", "超标说明"],
"metric_label": "办公费合计",
},
},
"meeting": {
"label": "会务费",
"location_required": True,
"min_attachment_count": 1,
"allowed_scene_codes": ["meeting"],
"allowed_document_types": ["meeting_invoice", "vat_invoice", "receipt"],
"attachment_mismatch_severity": "high",
"claim_amount_limit": {
"scope": "claim_total",
"warn_amount": "5000.00",
"block_amount": "30000.00",
"exception_keywords": ["大型会议", "外部场地", "超标说明"],
"metric_label": "会务费合计",
},
},
"training": {
"label": "培训费",
"location_required": False,
"min_attachment_count": 1,
"allowed_scene_codes": ["training"],
"allowed_document_types": ["training_invoice", "vat_invoice", "receipt"],
"attachment_mismatch_severity": "high",
"claim_amount_limit": {
"scope": "claim_total",
"warn_amount": "3000.00",
"block_amount": "15000.00",
"exception_keywords": ["认证考试", "外部培训", "超标说明"],
"metric_label": "培训费合计",
},
},
"communication": {
"label": "通讯费",
"location_required": False,
"min_attachment_count": 1,
"allowed_scene_codes": ["other"],
"allowed_document_types": ["vat_invoice", "receipt"],
"attachment_mismatch_severity": "medium",
"claim_amount_limit": {
"scope": "claim_total",
"warn_amount": "300.00",
"block_amount": "1000.00",
"exception_keywords": ["国际漫游", "专项通信", "超标说明"],
"metric_label": "通讯费合计",
},
},
"welfare": {
"label": "福利费",
"location_required": False,
"min_attachment_count": 1,
"allowed_scene_codes": ["other"],
"allowed_document_types": ["vat_invoice", "receipt"],
"attachment_mismatch_severity": "medium",
"claim_amount_limit": {
"scope": "claim_total",
"warn_amount": "1000.00",
"block_amount": "5000.00",
"exception_keywords": ["节日福利", "团队活动", "员工关怀", "超标说明"],
"metric_label": "福利费合计",
},
},
"other": {
"label": "其他费用",
"location_required": False,
"min_attachment_count": 1,
"allowed_scene_codes": ["other"],
"allowed_document_types": ["vat_invoice", "receipt"],
"attachment_mismatch_severity": "medium",
"always_warn": True,
"always_warn_message": "其他费用默认进入人工重点复核,请补充清晰用途说明并由审批人重点确认。",
"claim_amount_limit": {
"scope": "claim_total",
"warn_amount": "1000.00",
"block_amount": "3000.00",
"exception_keywords": ["特殊事项", "临时采购", "超标说明"],
"metric_label": "其他费用合计",
},
},
},
}
DEFAULT_TRAVEL_POLICY_CONFIG: dict[str, Any] = {
"kind": "travel_policy",
"version": 1,
"relevant_expense_types": ["travel", "hotel", "transport"],
"long_distance_document_types": ["flight_itinerary", "train_ticket"],
"route_exception_keywords": [
"中转",
"转机",
"经停",
"改签",
"多地出差",
"多城市",
"多站",
"异地返程",
"异地结束",
"临时变更",
"继续前往",
"第二站",
],
"standard_exception_keywords": [
"超标说明",
"无直达",
"展会高峰",
"会议高峰",
"协议酒店满房",
"客户指定",
"临时改签",
"行程变更",
"红眼航班",
"晚到店",
],
"band_labels": {
"junior": "P1-P3",
"mid": "P4-P5",
"senior": "P6-P7",
"manager": "M1-M2",
"executive": "M3及以上 / D序列",
},
"city_tiers": {
"北京": "tier_1",
"上海": "tier_1",
"广州": "tier_1",
"深圳": "tier_1",
"杭州": "tier_2",
"南京": "tier_2",
"苏州": "tier_2",
"武汉": "tier_2",
"成都": "tier_2",
"重庆": "tier_2",
"西安": "tier_2",
"天津": "tier_2",
"宁波": "tier_2",
"厦门": "tier_2",
"青岛": "tier_2",
"长沙": "tier_2",
"郑州": "tier_2",
"合肥": "tier_2",
"济南": "tier_2",
"沈阳": "tier_2",
"大连": "tier_2",
"福州": "tier_2",
"昆明": "tier_2",
"海口": "tier_2",
"三亚": "tier_2",
"无锡": "tier_2",
"东莞": "tier_2",
"佛山": "tier_2",
},
"hotel_limits": {
"junior": {"tier_1": "450.00", "tier_2": "380.00", "tier_3": "320.00"},
"mid": {"tier_1": "550.00", "tier_2": "480.00", "tier_3": "380.00"},
"senior": {"tier_1": "700.00", "tier_2": "620.00", "tier_3": "520.00"},
"manager": {"tier_1": "900.00", "tier_2": "820.00", "tier_3": "720.00"},
"executive": {"tier_1": "1200.00", "tier_2": "1000.00", "tier_3": "900.00"},
},
"transport_limits": {
"junior": {"flight": 1, "train": 1},
"mid": {"flight": 1, "train": 1},
"senior": {"flight": 2, "train": 2},
"manager": {"flight": 3, "train": 3},
"executive": {"flight": 4, "train": 3},
},
"flight_classes": [
{"keyword": "头等舱", "level": 4},
{"keyword": "公务舱", "level": 3},
{"keyword": "商务舱", "level": 3},
{"keyword": "超级经济舱", "level": 2},
{"keyword": "高端经济舱", "level": 2},
{"keyword": "明珠经济舱", "level": 2},
{"keyword": "经济舱", "level": 1},
],
"train_classes": [
{"keyword": "商务座", "level": 3},
{"keyword": "一等座", "level": 2},
{"keyword": "软卧", "level": 2},
{"keyword": "二等座", "level": 1},
{"keyword": "二等卧", "level": 1},
{"keyword": "硬卧", "level": 1},
],
}
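
The `flight_classes` list places longer keywords such as `超级经济舱` before the shorter `经济舱`, which matters if matching is a first-hit substring scan. The code that consumes these lists is not part of this diff, so the sketch below is only an assumption about how a level could be resolved and compared with `transport_limits`; `resolve_level` and `exceeds_limit` are hypothetical helpers.

```python
from __future__ import annotations

# Copied from DEFAULT_TRAVEL_POLICY_CONFIG above.
FLIGHT_CLASSES = [
    {"keyword": "头等舱", "level": 4},
    {"keyword": "公务舱", "level": 3},
    {"keyword": "商务舱", "level": 3},
    {"keyword": "超级经济舱", "level": 2},
    {"keyword": "高端经济舱", "level": 2},
    {"keyword": "明珠经济舱", "level": 2},
    {"keyword": "经济舱", "level": 1},
]
TRANSPORT_LIMITS = {
    "junior": {"flight": 1, "train": 1},
    "senior": {"flight": 2, "train": 2},
}


def resolve_level(text: str, classes: list[dict]) -> int | None:
    # Assumption: the first keyword found in the text wins, so list order matters.
    for entry in classes:
        if entry["keyword"] in text:
            return entry["level"]
    return None


def exceeds_limit(band: str, mode: str, text: str) -> bool:
    level = resolve_level(text, FLIGHT_CLASSES)
    if level is None:
        return False  # an unrecognised class is not judged in this sketch
    return level > TRANSPORT_LIMITS[band][mode]


print(resolve_level("超级经济舱", FLIGHT_CLASSES))
print(exceeds_limit("junior", "flight", "超级经济舱"))
print(exceeds_limit("senior", "flight", "超级经济舱"))
```

If `经济舱` were listed first, a premium-economy ticket would resolve to level 1, which is why the ordering in the defaults looks deliberate.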


@@ -0,0 +1,116 @@
from __future__ import annotations
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Literal
from pydantic import BaseModel, Field
from app.services.expense_rule_runtime_defaults import (
DEFAULT_SCENE_MATRIX_CONFIG,
DEFAULT_SCENE_RULE_ASSET_CODE,
DEFAULT_TRAVEL_POLICY_CONFIG,
DEFAULT_TRAVEL_RULE_ASSET_CODE,
DOCUMENT_TYPE_LABELS,
)


class AmountLimitConfig(BaseModel):
    scope: Literal["claim_total", "item_amount"] = "claim_total"
    warn_amount: Decimal | None = None
    block_amount: Decimal | None = None
    exception_keywords: list[str] = Field(default_factory=list)
    metric_label: str = "金额"


class ScenePolicyConfig(BaseModel):
    label: str
    location_required: bool = False
    min_attachment_count: int = 1
    allowed_scene_codes: list[str] = Field(default_factory=list)
    allowed_document_types: list[str] = Field(default_factory=list)
    attachment_mismatch_severity: Literal["low", "medium", "high"] = "high"
    claim_amount_limit: AmountLimitConfig | None = None
    item_amount_limit: AmountLimitConfig | None = None
    always_warn: bool = False
    always_warn_message: str = ""


class SceneMatrixRuleConfig(BaseModel):
    kind: Literal["scene_matrix"]
    version: int = 1
    scenes: dict[str, ScenePolicyConfig]


class TravelClassConfig(BaseModel):
    keyword: str
    level: int


class TravelPolicyConfig(BaseModel):
    kind: Literal["travel_policy"]
    version: int = 1
    relevant_expense_types: list[str] = Field(default_factory=list)
    long_distance_document_types: list[str] = Field(default_factory=list)
    route_exception_keywords: list[str] = Field(default_factory=list)
    standard_exception_keywords: list[str] = Field(default_factory=list)
    band_labels: dict[str, str] = Field(default_factory=dict)
    city_tiers: dict[str, str] = Field(default_factory=dict)
    hotel_limits: dict[str, dict[str, Decimal]] = Field(default_factory=dict)
    hotel_city_limits: dict[str, dict[str, Decimal]] = Field(default_factory=dict)
    allowance_limits: dict[str, dict[str, Decimal]] = Field(default_factory=dict)
    standard_rule_code: str = ""
    standard_rule_name: str = ""
    standard_rule_version: str = ""
    transport_limits: dict[str, dict[str, int]] = Field(default_factory=dict)
    flight_classes: list[TravelClassConfig] = Field(default_factory=list)
    train_classes: list[TravelClassConfig] = Field(default_factory=list)


class ExpenseScenePolicy(ScenePolicyConfig):
    expense_type: str
    rule_code: str
    rule_name: str
    rule_version: str


class RuntimeTravelPolicy(TravelPolicyConfig):
    rule_code: str
    rule_name: str
    rule_version: str


@dataclass
class ExpenseRuleCatalog:
    scene_policies: dict[str, ExpenseScenePolicy] = field(default_factory=dict)
    travel_policy: RuntimeTravelPolicy | None = None

    def get_scene_policy(self, expense_type: str | None) -> ExpenseScenePolicy | None:
        # Empty or missing expense types fall back to the "other" scene.
        normalized = str(expense_type or "").strip().lower() or "other"
        return self.scene_policies.get(normalized)


def resolve_document_type_label(document_type: str | None) -> str:
    normalized = str(document_type or "").strip().lower() or "other"
    return DOCUMENT_TYPE_LABELS.get(normalized, normalized or "其他单据")


def build_default_expense_rule_catalog() -> ExpenseRuleCatalog:
    catalog = ExpenseRuleCatalog()
    scene_matrix = SceneMatrixRuleConfig.model_validate(DEFAULT_SCENE_MATRIX_CONFIG)
    for expense_type, config in scene_matrix.scenes.items():
        catalog.scene_policies[expense_type] = ExpenseScenePolicy(
            expense_type=expense_type,
            rule_code=DEFAULT_SCENE_RULE_ASSET_CODE,
            rule_name="报销场景提交与附件标准",
            rule_version="v1.0.0",
            **config.model_dump(),
        )
    travel_policy = TravelPolicyConfig.model_validate(DEFAULT_TRAVEL_POLICY_CONFIG)
    catalog.travel_policy = RuntimeTravelPolicy(
        rule_code=DEFAULT_TRAVEL_RULE_ASSET_CODE,
        rule_name="差旅报销风险管控制度",
        rule_version="v1.1.0",
        **travel_policy.model_dump(),
    )
    return catalog
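
`AmountLimitConfig` carries a warn threshold, a block threshold and a list of exception keywords, but the code that evaluates them is in a file whose diff is suppressed. The sketch below is therefore only one plausible reading, using a stdlib dataclass as a stand-in for the Pydantic model: it assumes strict "greater than" comparisons and that an exception keyword in the reason downgrades a block to a warning. `AmountLimit` and `evaluate_amount` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class AmountLimit:
    # Stand-in for AmountLimitConfig; field names follow the model above.
    warn_amount: Decimal | None = None
    block_amount: Decimal | None = None
    exception_keywords: list[str] = field(default_factory=list)


def evaluate_amount(limit: AmountLimit, amount: Decimal, reason: str) -> str:
    """Return "ok", "warn" or "block" (assumed semantics, see lead-in)."""
    has_exception = any(keyword in reason for keyword in limit.exception_keywords)
    if limit.block_amount is not None and amount > limit.block_amount:
        # Assumption: a documented exception softens a block into a warning.
        return "warn" if has_exception else "block"
    if limit.warn_amount is not None and amount > limit.warn_amount:
        return "warn"
    return "ok"


# Thresholds taken from the "meal" scene in DEFAULT_SCENE_MATRIX_CONFIG.
meal_limit = AmountLimit(
    warn_amount=Decimal("300.00"),
    block_amount=Decimal("800.00"),
    exception_keywords=["客户接待", "团队活动", "加班", "展会", "超标说明"],
)
print(evaluate_amount(meal_limit, Decimal("100"), ""))
print(evaluate_amount(meal_limit, Decimal("350"), ""))
print(evaluate_amount(meal_limit, Decimal("850"), ""))
print(evaluate_amount(meal_limit, Decimal("850"), "客户接待晚宴"))
```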


@@ -0,0 +1,166 @@
from __future__ import annotations
import json
from app.services.expense_rule_runtime_defaults import (
DEFAULT_SCENE_MATRIX_CONFIG,
DEFAULT_TRAVEL_POLICY_CONFIG,
SCENE_LABELS,
)
from app.services.expense_rule_runtime_models import (
SceneMatrixRuleConfig,
resolve_document_type_label,
)
def build_scene_submission_standard_markdown() -> str:
scene_matrix = SceneMatrixRuleConfig.model_validate(DEFAULT_SCENE_MATRIX_CONFIG)
sections: list[str] = [
"# 报销场景提交与附件标准",
"",
"## 模板信息",
"",
"- 模板类型:系统内置场景矩阵规则",
"- 运行时类型:`scene_matrix`",
"- 适用对象:报销提交与附件校验",
"",
"## 目标",
"",
"统一约束各报销场景的必填字段、附件类型和金额预警口径,在上传附件和提交审核两个时点直接输出可执行风险判断。",
"",
"## 适用范围",
"",
"适用于差旅、住宿、交通、餐费、业务招待、办公、会务、培训、通讯、福利和其他费用场景。",
"",
"## 输入字段",
"",
"- expense_type",
"- attachments",
"- location",
"- amount / item_amount",
"- reason",
"",
"## 判断规则",
"",
]
for index, (expense_type, config) in enumerate(scene_matrix.scenes.items(), start=1):
expected_document_labels = "、".join(
resolve_document_type_label(item) for item in config.allowed_document_types
)
expected_scene_labels = "、".join(
SCENE_LABELS.get(item, item) for item in config.allowed_scene_codes
)
sections.extend(
[
f"### 规则 {index} {config.label}`{expense_type}`",
"",
f"- 业务地点:{'必填' if config.location_required else '非必填'}",
f"- 最少附件数:{config.min_attachment_count}",
f"- 允许识别场景:{expected_scene_labels or '不限制'}",
f"- 允许附件类型:{expected_document_labels or '不限制'}",
f"- 附件不匹配处理:{config.attachment_mismatch_severity.upper()}",
]
)
if config.claim_amount_limit is not None:
sections.append(
f"- 合计金额阈值:预警 {config.claim_amount_limit.warn_amount or '-'} 元,"
f"拦截 {config.claim_amount_limit.block_amount or '-'} 元"
)
if config.item_amount_limit is not None:
sections.append(
f"- 单笔金额阈值:预警 {config.item_amount_limit.warn_amount or '-'} 元,"
f"拦截 {config.item_amount_limit.block_amount or '-'} 元"
)
if config.always_warn and config.always_warn_message:
sections.append(f"- 特殊处理:{config.always_warn_message}")
sections.append("")
sections.extend(
[
"## 输出",
"",
"- 命中高风险时退回待补充。",
"- 命中中风险时继续流转,并提示审批人重点复核。",
"- 命中 always_warn 场景时追加人工重点复核提示。",
"",
"## 来源依据",
"",
"- 公司报销制度中关于场景识别、附件要求、金额阈值和人工复核的统一口径。",
"",
"## 审核约束",
"",
"- 当前规则为系统内置真实运行规则,变更后需重新审核并评估回滚影响。",
"- 规则 JSON 与 Markdown 说明必须保持一致。",
"",
"## 管理员备注",
"",
"如后续制度调整附件类型、金额阈值或人工复核口径,应优先修改运行时 JSON 并同步更新说明。",
"",
"```expense-rule",
json.dumps(DEFAULT_SCENE_MATRIX_CONFIG, ensure_ascii=False, indent=2),
"```",
]
)
return "\n".join(sections)
def build_travel_risk_control_standard_markdown() -> str:
return "\n".join(
[
"# 差旅报销风险管控制度",
"",
"## 模板信息",
"",
"- 模板键:`travel_standard_v1`",
"- 运行时类型:`travel_policy`",
"- 适用对象:差旅、住宿、交通相关报销审核",
"",
"## 目标",
"",
"校验差旅行程闭环、酒店地点一致性、住宿标准、飞机舱位和火车席别是否符合制度,并对例外情况保留人工复核入口。",
"",
"## 适用范围",
"",
"适用于差旅费、住宿费和交通费相关报销单,重点覆盖跨城市出差、改签、中转和超标说明场景。",
"",
"## 输入字段",
"",
"- expense_type",
"- attachments / OCR routes",
"- location",
"- employee_grade",
"- reason",
"",
"## 判断规则",
"",
"- 两段及以上长途交通票据必须首尾衔接。",
"- 最终终点应与申报目的地一致,或返回首段出发城市。",
"- 检测到多城市行程但无说明时,按高风险退回待补充。",
"- 酒店城市必须落在目的地或交通链路停留城市中。",
"- 住宿标准、飞机舱位和火车席别按职级与城市分级执行。",
"- 超标但有说明时记为中风险;超标且无说明时记为高风险。",
"",
"## 输出",
"",
"- 行程异常时输出高风险退回。",
"- 差标超限但有合理说明时输出中风险提醒。",
"- 命中差旅制度规则时,保留 `rule_code` 和 `rule_version` 供审批链追踪。",
"",
"## 来源依据",
"",
"- 公司差旅制度关于行程闭环、酒店地点一致性、职级差标和例外说明的规定。",
"",
"## 审核约束",
"",
"- 当前规则为系统内置真实运行规则,修改前需确认差旅制度版本与灰度回滚方案。",
"- 规则 JSON 与 Markdown 说明必须保持一致。",
"",
"## 管理员备注",
"",
"如制度调整职级带、城市分级或交通等级,应先更新运行时 JSON,再同步修改本说明。",
"",
"```expense-rule",
json.dumps(DEFAULT_TRAVEL_POLICY_CONFIG, ensure_ascii=False, indent=2),
"```",
]
)
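
Both builders end by embedding the runtime JSON in an `expense-rule` fenced block, and the generated text itself requires the JSON and the Markdown description to stay consistent. A rough sketch of how that block could be read back for a consistency check is shown here; `embed_rule_json` and `extract_rule_json` are hypothetical helper names, not functions from this repository.

```python
import json
import re

# Built programmatically so the fence marker never appears at the start of a line here.
FENCE = "`" * 3


def embed_rule_json(title: str, config: dict) -> str:
    """Render a tiny Markdown document that ends with an expense-rule block."""
    return "\n".join(
        [
            f"# {title}",
            "",
            f"{FENCE}expense-rule",
            json.dumps(config, ensure_ascii=False, indent=2),
            FENCE,
        ]
    )


def extract_rule_json(markdown: str) -> dict:
    """Parse the first expense-rule block back into a dict."""
    pattern = re.compile(
        re.escape(FENCE) + r"expense-rule\n(.*?)\n" + re.escape(FENCE),
        re.DOTALL,
    )
    match = pattern.search(markdown)
    if match is None:
        raise ValueError("no expense-rule block found")
    return json.loads(match.group(1))
```

A test that round-trips the default configs through such a parser would catch drift between the rendered document and the runtime JSON.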

File diff suppressed because it is too large


@@ -0,0 +1,66 @@
from __future__ import annotations
FIXED_KNOWLEDGE_FOLDERS = [
"财务知识库",
"制度政策",
"报销制度",
"差旅规范",
"发票管理",
"税务合规",
"预算管理",
"财务共享",
"培训资料",
"常见问答",
]
ICON_BY_TYPE = {
"pdf": "mdi mdi-file-document-outline-pdf pdf",
"word": "mdi mdi-file-document-outline-word word",
"excel": "mdi mdi-file-document-outline-excel excel",
"ppt": "mdi mdi-file-powerpoint-box ppt",
"image": "mdi mdi-file-image-outline image",
"text": "mdi mdi-file-document-outline text",
"archive": "mdi mdi-folder-zip-outline archive",
"binary": "mdi mdi-file-outline",
}
TEXT_EXTENSIONS = {"txt", "md", "csv", "json", "xml", "yml", "yaml", "log"}
WORD_EXTENSIONS = {"doc", "docx"}
EXCEL_EXTENSIONS = {"xls", "xlsx", "csv"}
PPT_EXTENSIONS = {"ppt", "pptx"}
IMAGE_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "bmp", "webp", "svg"}
ARCHIVE_EXTENSIONS = {"zip", "rar", "7z"}
STRUCTURED_PREVIEW_EXTENSIONS = {"docx", "xlsx", "pptx"} | TEXT_EXTENSIONS
INLINE_PREVIEW_EXTENSIONS = {"pdf"} | IMAGE_EXTENSIONS
ONLYOFFICE_EDITABLE_EXTENSIONS = {"docx", "xlsx", "pptx"}
KNOWLEDGE_INGEST_SYNC_STALE_SECONDS = 90
KNOWLEDGE_SEARCH_RESULT_LIMIT = 3
KNOWLEDGE_SEARCH_STOP_TERMS = {
"什么",
"怎么",
"如何",
"多少",
"是否",
"可以",
"一下",
"请问",
"帮我",
"一下子",
"这个",
"那个",
"哪些",
"一下吧",
}
KNOWLEDGE_INGEST_STATUS_PUBLISHED = 1
KNOWLEDGE_INGEST_STATUS_SYNCING = 2
KNOWLEDGE_INGEST_STATUS_INGESTED = 3
KNOWLEDGE_INGEST_STATUS_FAILED = 4
KNOWLEDGE_INGEST_STATUS_META = {
KNOWLEDGE_INGEST_STATUS_PUBLISHED: ("待归纳", "muted"),
KNOWLEDGE_INGEST_STATUS_SYNCING: ("正归纳", "warning"),
KNOWLEDGE_INGEST_STATUS_INGESTED: ("已归纳", "success"),
KNOWLEDGE_INGEST_STATUS_FAILED: ("归纳失败", "danger"),
}
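
The four status codes and their display metadata lend themselves to a defensive normalization step. The sketch below mirrors the constants above under shortened, illustrative names (`STATUS_META`, `normalize_status`) and shows how unparseable or unknown values could collapse to the "published" default.

```python
STATUS_PUBLISHED = 1
STATUS_META = {
    1: ("待归纳", "muted"),
    2: ("正归纳", "warning"),
    3: ("已归纳", "success"),
    4: ("归纳失败", "danger"),
}


def normalize_status(value) -> int:
    # Accept ints and numeric strings; anything else falls back to the default.
    try:
        code = int(value)
    except (TypeError, ValueError):
        return STATUS_PUBLISHED
    # Codes outside the known table are treated as "published" as well.
    return code if code in STATUS_META else STATUS_PUBLISHED
```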


@@ -0,0 +1,223 @@
from __future__ import annotations
import re
import shutil
import subprocess
from pathlib import Path
from xml.etree import ElementTree
from zipfile import BadZipFile, ZipFile
from app.services.knowledge_constants import IMAGE_EXTENSIONS, TEXT_EXTENSIONS
from app.services.knowledge_file_utils import extract_extension
def _read_text_preview(file_path: Path) -> str:
encodings = ("utf-8", "utf-8-sig", "gbk")
for encoding in encodings:
try:
return file_path.read_text(encoding=encoding)
except UnicodeDecodeError:
continue
return "当前文本文件编码暂不支持在线解析。"
def _extract_docx_text(file_path: Path) -> str:
    try:
        with ZipFile(file_path) as archive:
            xml_content = archive.read("word/document.xml")
        # Malformed XML inside an otherwise valid archive is also a parse failure.
        root = ElementTree.fromstring(xml_content)
    except (BadZipFile, KeyError, ElementTree.ParseError):
        return "当前 Word 文件解析失败。"
    texts = [node.text.strip() for node in root.iter() if node.tag.endswith("}t") and node.text]
    return "\n".join(texts)
def _extract_document_text_from_path(
*,
file_path: Path,
original_name: str,
mime_type: str,
) -> str:
extension = extract_extension(original_name)
if extension in TEXT_EXTENSIONS:
return _normalize_extracted_text(_read_text_preview(file_path))
if extension == "docx":
return _normalize_extracted_text(_extract_docx_text(file_path))
if extension == "pdf":
text = _normalize_extracted_text(_extract_pdf_text(file_path))
if text:
return text
return _normalize_extracted_text(
_extract_text_with_ocr(
file_path=file_path,
original_name=original_name,
mime_type=mime_type,
)
)
if extension in IMAGE_EXTENSIONS:
return _normalize_extracted_text(
_extract_text_with_ocr(
file_path=file_path,
original_name=original_name,
mime_type=mime_type,
)
)
return ""
def _normalize_extracted_text(text: str) -> str:
normalized = str(text or "").replace("\r\n", "\n").replace("\r", "\n")
normalized = re.sub(r"\n{3,}", "\n\n", normalized)
return normalized.strip()
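def _extract_pdf_text(file_path: Path) -> str:
    pdftotext_bin = shutil.which("pdftotext")
    if not pdftotext_bin:
        return ""
    try:
        completed = subprocess.run(
            [pdftotext_bin, "-layout", str(file_path), "-"],
            capture_output=True,
            text=True,
            timeout=40,
            check=False,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or unlaunchable pdftotext is treated as "no text layer",
        # which lets the caller fall back to OCR.
        return ""
    if completed.returncode != 0:
        return ""
    return str(completed.stdout or "")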
def _extract_text_with_ocr(
*,
file_path: Path,
original_name: str,
mime_type: str,
) -> str:
try:
from app.services.ocr import OcrService
result = OcrService().recognize_files(
[(original_name, file_path.read_bytes(), mime_type)]
)
except Exception:
return ""
parts: list[str] = []
for document in result.documents:
text = str(getattr(document, "text", "") or "").strip()
summary = str(getattr(document, "summary", "") or "").strip()
if text:
parts.append(text)
elif summary:
parts.append(summary)
return "\n\n".join(part for part in parts if part)
def _extract_xlsx_sheets(file_path: Path) -> list[tuple[str, list[list[str]]]]:
try:
with ZipFile(file_path) as archive:
shared_strings: list[str] = []
if "xl/sharedStrings.xml" in archive.namelist():
shared_root = ElementTree.fromstring(archive.read("xl/sharedStrings.xml"))
shared_strings = [
"".join(node.itertext()).strip()
for node in shared_root.iter()
if node.tag.endswith("}si")
]
sheet_files = sorted(
name
for name in archive.namelist()
if re.fullmatch(r"xl/worksheets/sheet\d+\.xml", name)
)
if not sheet_files:
return []
relationship_targets: dict[str, str] = {}
if "xl/_rels/workbook.xml.rels" in archive.namelist():
rel_root = ElementTree.fromstring(archive.read("xl/_rels/workbook.xml.rels"))
for node in rel_root.iter():
if not node.tag.endswith("Relationship"):
continue
rel_id = node.attrib.get("Id")
target = node.attrib.get("Target")
if not rel_id or not target:
continue
normalized = target.lstrip("/")
if not normalized.startswith("xl/"):
normalized = f"xl/{normalized.lstrip('./')}"
relationship_targets[rel_id] = normalized
ordered_sheets: list[tuple[str, str]] = []
if "xl/workbook.xml" in archive.namelist():
workbook_root = ElementTree.fromstring(archive.read("xl/workbook.xml"))
sheet_nodes = [node for node in workbook_root.iter() if node.tag.endswith("sheet")]
for index, node in enumerate(sheet_nodes):
sheet_name = node.attrib.get("name") or f"Sheet {index + 1}"
relationship_id = next(
(value for key, value in node.attrib.items() if key.endswith("}id")),
None,
)
target = relationship_targets.get(relationship_id or "")
if target:
ordered_sheets.append((sheet_name, target))
if not ordered_sheets:
ordered_sheets = [
(f"Sheet {index + 1}", sheet_file)
for index, sheet_file in enumerate(sheet_files)
]
preview_sheets: list[tuple[str, list[list[str]]]] = []
for sheet_name, target in ordered_sheets:
if target not in archive.namelist():
continue
sheet_root = ElementTree.fromstring(archive.read(target))
rows: list[list[str]] = []
for row in sheet_root.iter():
if not row.tag.endswith("}row"):
continue
row_values: list[str] = []
for cell in row:
if not cell.tag.endswith("}c"):
continue
cell_type = cell.attrib.get("t")
value_node = next((item for item in cell if item.tag.endswith("}v")), None)
if cell_type == "inlineStr":
text_node = next((item for item in cell.iter() if item.tag.endswith("}t")), None)
row_values.append((text_node.text or "").strip() if text_node is not None else "")
continue
if value_node is None or value_node.text is None:
row_values.append("")
continue
raw_value = value_node.text.strip()
if cell_type == "s" and raw_value.isdigit():
index = int(raw_value)
row_values.append(shared_strings[index] if index < len(shared_strings) else raw_value)
else:
row_values.append(raw_value)
if row_values:
rows.append(row_values)
preview_sheets.append((sheet_name, rows))
return preview_sheets
except (BadZipFile, ElementTree.ParseError, KeyError, ValueError):
return []
def _extract_pptx_slides(file_path: Path) -> list[list[str]]:
try:
with ZipFile(file_path) as archive:
slide_names = sorted(
name
for name in archive.namelist()
if re.fullmatch(r"ppt/slides/slide\d+\.xml", name)
)
slides: list[list[str]] = []
for slide_name in slide_names:
root = ElementTree.fromstring(archive.read(slide_name))
texts = [node.text.strip() for node in root.iter() if node.tag.endswith("}t") and node.text]
slides.append(texts)
return slides
except (BadZipFile, ElementTree.ParseError, KeyError):
return []
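
The extractors above treat Office files as zip archives of XML parts and match elements by the tail of their namespaced tag. A self-contained sketch of the DOCX case follows; the in-memory sample document and the names `extract_docx_text` and `build_sample_docx` are assumptions made for illustration only.

```python
import io
from xml.etree import ElementTree
from zipfile import ZipFile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_docx_text(source) -> str:
    """Collect the text of every w:t run from word/document.xml."""
    with ZipFile(source) as archive:
        root = ElementTree.fromstring(archive.read("word/document.xml"))
    # ElementTree expands prefixes to "{namespace}local", so "}t" matches w:t only.
    texts = [node.text.strip() for node in root.iter() if node.tag.endswith("}t") and node.text]
    return "\n".join(texts)


def build_sample_docx() -> io.BytesIO:
    """Create a minimal archive containing just the main document part."""
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>World</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer
```

Note that this approach emits one line per text run rather than per paragraph, so a paragraph split across several runs comes out as several lines.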


@@ -0,0 +1,112 @@
from __future__ import annotations
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
from uuid import uuid4
import jwt
from app.services.knowledge_constants import (
ARCHIVE_EXTENSIONS,
EXCEL_EXTENSIONS,
FIXED_KNOWLEDGE_FOLDERS,
IMAGE_EXTENSIONS,
INLINE_PREVIEW_EXTENSIONS,
PPT_EXTENSIONS,
STRUCTURED_PREVIEW_EXTENSIONS,
TEXT_EXTENSIONS,
WORD_EXTENSIONS,
)
from app.services.settings import resolve_onlyoffice_settings
def normalize_filename(filename: str) -> str:
normalized = Path(str(filename or "").strip()).name.strip()
normalized = normalized.replace("/", "_").replace("\\", "_")
if not normalized:
raise ValueError("文件名不能为空。")
return normalized
def normalize_folder(folder: str) -> str:
normalized = str(folder or "").strip()
if normalized not in FIXED_KNOWLEDGE_FOLDERS:
raise ValueError("只能上传到预设知识库文件夹。")
return normalized
def extract_extension(filename: str) -> str:
suffix = Path(filename).suffix.lower().lstrip(".")
return suffix
def _build_onlyoffice_document_key(entry: dict[str, Any]) -> str:
    version = int(entry.get("version_number", 1))
    checksum = str(entry.get("sha256") or "")[:12]
    return f"{entry['id']}-v{version}-{checksum or 'nochecksum'}"
def _build_onlyoffice_access_token(document_id: str) -> str:
    # Module-level function: the stray `self` parameter left over from the service class is removed.
    onlyoffice_settings = resolve_onlyoffice_settings()
    payload = {
        "scope": "onlyoffice-content",
        "document_id": document_id,
    }
    return jwt.encode(payload, onlyoffice_settings.jwt_secret, algorithm="HS256")
def _resolve_onlyoffice_document_type(extension: str) -> str:
if extension in WORD_EXTENSIONS:
return "word"
if extension in EXCEL_EXTENSIONS:
return "cell"
if extension in PPT_EXTENSIONS:
return "slide"
raise ValueError("当前文件格式不支持 ONLYOFFICE 预览。")
def parse_stored_name(stored_name: str) -> tuple[str, str]:
if "__" not in stored_name:
return uuid4().hex, stored_name
document_id, original_name = stored_name.split("__", 1)
return document_id or uuid4().hex, original_name or stored_name
def format_time(value: str | None) -> str:
if not value:
return ""
try:
parsed = datetime.fromisoformat(value)
except ValueError:
return value
return parsed.astimezone(UTC).strftime("%Y-%m-%d %H:%M")
def format_size(size_bytes: int) -> str:
if size_bytes < 1024:
return f"{size_bytes} B"
if size_bytes < 1024 * 1024:
return f"{size_bytes / 1024:.1f} KB"
return f"{size_bytes / (1024 * 1024):.1f} MB"
def resolve_file_type(extension: str) -> str:
if extension == "pdf":
return "pdf"
if extension in WORD_EXTENSIONS:
return "word"
if extension in EXCEL_EXTENSIONS:
return "excel"
if extension in PPT_EXTENSIONS:
return "ppt"
if extension in IMAGE_EXTENSIONS:
return "image"
if extension in TEXT_EXTENSIONS:
return "text"
if extension in ARCHIVE_EXTENSIONS:
return "archive"
return "binary"
def resolve_file_type_label(file_type: str) -> str:
mapping = {
"pdf": "PDF 预览",
"word": "Word 预览",
"excel": "Excel 预览",
"ppt": "PPT 预览",
"image": "图片预览",
"text": "文本预览",
"archive": "压缩包",
"binary": "文件预览",
}
return mapping.get(file_type, "文件预览")
def can_preview(extension: str) -> bool:
return extension in INLINE_PREVIEW_EXTENSIONS or extension in STRUCTURED_PREVIEW_EXTENSIONS
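
One detail of `resolve_file_type` is easy to miss: `csv` belongs to both `TEXT_EXTENSIONS` and `EXCEL_EXTENSIONS`, so the order of the checks decides the result. The sketch below reproduces that ordering with trimmed-down extension sets to make the precedence visible; it is an illustration, not the full classifier.

```python
from pathlib import Path

# Reduced copies of the constants; "csv" deliberately appears in both sets.
TEXT_EXTENSIONS = {"txt", "md", "csv", "json"}
EXCEL_EXTENSIONS = {"xls", "xlsx", "csv"}


def extract_extension(filename: str) -> str:
    # Only the last suffix is used, lower-cased and without the dot.
    return Path(filename).suffix.lower().lstrip(".")


def resolve_file_type(extension: str) -> str:
    # Spreadsheet check runs first, so "csv" never reaches the text branch.
    if extension in EXCEL_EXTENSIONS:
        return "excel"
    if extension in TEXT_EXTENSIONS:
        return "text"
    return "binary"
```

With this ordering a CSV upload gets the Excel icon and label, while `can_preview` still accepts it through the structured-preview set, which includes the text extensions.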


@@ -0,0 +1,69 @@
from __future__ import annotations
from datetime import UTC, datetime
from typing import Any
from sqlalchemy import select
from sqlalchemy.orm import Session
from app.core.agent_enums import AgentRunStatus
from app.models.agent_run import AgentRun
from app.services.knowledge_constants import (
KNOWLEDGE_INGEST_STATUS_META,
KNOWLEDGE_INGEST_STATUS_PUBLISHED,
KNOWLEDGE_INGEST_SYNC_STALE_SECONDS,
)
def normalize_ingest_status_code(value: Any) -> int:
try:
status_code = int(value)
except (TypeError, ValueError):
return KNOWLEDGE_INGEST_STATUS_PUBLISHED
if status_code not in KNOWLEDGE_INGEST_STATUS_META:
return KNOWLEDGE_INGEST_STATUS_PUBLISHED
return status_code
def is_syncing_status_stale(entry: dict[str, Any]) -> bool:
raw_value = str(entry.get("ingest_status_updated_at") or "").strip()
if not raw_value:
return True
try:
updated_at = datetime.fromisoformat(raw_value)
except ValueError:
return True
if updated_at.tzinfo is None:
updated_at = updated_at.replace(tzinfo=UTC)
age_seconds = (datetime.now(UTC) - updated_at.astimezone(UTC)).total_seconds()
return age_seconds >= KNOWLEDGE_INGEST_SYNC_STALE_SECONDS
def should_preserve_syncing_status(entry: dict[str, Any], *, db: Session | None) -> bool:
agent_run_id = str(entry.get("ingest_agent_run_id") or "").strip()
if not agent_run_id or db is None:
return not is_syncing_status_stale(entry)
run = db.scalar(select(AgentRun).where(AgentRun.run_id == agent_run_id))
if run is None:
return not is_syncing_status_stale(entry)
if run.status != AgentRunStatus.RUNNING.value:
return False
heartbeat_at = str((run.route_json or {}).get("heartbeat_at") or "").strip()
if heartbeat_at:
probe_entry = {"ingest_status_updated_at": heartbeat_at}
return not is_syncing_status_stale(probe_entry)
return not is_syncing_status_stale(entry)
def resolve_linked_ingest_run_status(entry: dict[str, Any], *, db: Session | None) -> str:
agent_run_id = str(entry.get("ingest_agent_run_id") or "").strip()
if not agent_run_id or db is None:
return ""
run = db.scalar(select(AgentRun).where(AgentRun.run_id == agent_run_id))
if run is None:
return ""
return str(run.status or "").strip()
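
The staleness rule used for the syncing status can be sketched in isolation. The version below assumes an injectable `now` argument (not present in the original) so that the 90-second threshold can be exercised deterministically; `is_stale` is an illustrative name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_SECONDS = 90  # mirrors KNOWLEDGE_INGEST_SYNC_STALE_SECONDS


def is_stale(raw_value: Optional[str], *, now: Optional[datetime] = None) -> bool:
    text = str(raw_value or "").strip()
    if not text:
        return True  # no timestamp at all counts as stale
    try:
        updated_at = datetime.fromisoformat(text)
    except ValueError:
        return True  # unparseable timestamps count as stale too
    if updated_at.tzinfo is None:
        # Naive timestamps are interpreted as UTC, as in the original helper.
        updated_at = updated_at.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return (current - updated_at).total_seconds() >= STALE_SECONDS
```

In `should_preserve_syncing_status` the same check is applied to the run's `heartbeat_at` value when one exists, so a live worker that keeps sending heartbeats keeps the document in the syncing state.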


@@ -0,0 +1,166 @@
from __future__ import annotations
from dataclasses import dataclass
from typing import Any
import jwt
from app.api.deps import CurrentUserContext
from app.core.config import get_settings
from app.core.logging import get_logger
from app.schemas.knowledge import KnowledgeOnlyOfficeConfigRead
from app.services.knowledge_constants import (
EXCEL_EXTENSIONS,
ONLYOFFICE_EDITABLE_EXTENSIONS,
PPT_EXTENSIONS,
WORD_EXTENSIONS,
)
from app.services.knowledge_file_utils import extract_extension
from app.services.settings import resolve_onlyoffice_settings
logger = get_logger("app.services.knowledge")
@dataclass(slots=True)
class OnlyOfficeCallbackPayload:
status: int
download_url: str
users: list[str]
def parse_onlyoffice_callback(payload: dict[str, Any]) -> OnlyOfficeCallbackPayload:
status = int(payload.get("status") or 0)
download_url = str(payload.get("url") or "").strip()
users = [str(item).strip() for item in payload.get("users") or [] if str(item).strip()]
return OnlyOfficeCallbackPayload(status=status, download_url=download_url, users=users)
def build_onlyoffice_document_key(entry: dict[str, Any]) -> str:
version = int(entry.get("version_number", 1))
checksum = str(entry.get("sha256") or "")[:12]
return f"{entry['id']}-v{version}-{checksum or 'nochecksum'}"
def build_onlyoffice_access_token(document_id: str) -> str:
onlyoffice_settings = resolve_onlyoffice_settings()
payload = {
"scope": "onlyoffice-content",
"document_id": document_id,
}
return jwt.encode(payload, onlyoffice_settings.jwt_secret, algorithm="HS256")
def build_onlyoffice_config(
*,
document_id: str,
entry: dict[str, Any],
current_user: CurrentUserContext,
) -> KnowledgeOnlyOfficeConfigRead:
settings = get_settings()
onlyoffice_settings = resolve_onlyoffice_settings()
if not onlyoffice_settings.enabled:
logger.warning(
"ONLYOFFICE disabled in runtime config doc=%s enabled=%s public_url=%s backend_url=%s jwt_set=%s",
document_id,
onlyoffice_settings.enabled,
onlyoffice_settings.public_url,
onlyoffice_settings.backend_url,
bool(onlyoffice_settings.jwt_secret),
)
raise ValueError("ONLYOFFICE 预览未启用。")
if not onlyoffice_settings.public_url or not onlyoffice_settings.backend_url:
logger.warning(
"ONLYOFFICE config incomplete doc=%s enabled=%s public_url=%s backend_url=%s jwt_set=%s",
document_id,
onlyoffice_settings.enabled,
onlyoffice_settings.public_url,
onlyoffice_settings.backend_url,
bool(onlyoffice_settings.jwt_secret),
)
raise ValueError("ONLYOFFICE 地址配置不完整。")
if not onlyoffice_settings.jwt_secret:
logger.warning(
"ONLYOFFICE JWT missing doc=%s enabled=%s public_url=%s backend_url=%s jwt_set=%s",
document_id,
onlyoffice_settings.enabled,
onlyoffice_settings.public_url,
onlyoffice_settings.backend_url,
bool(onlyoffice_settings.jwt_secret),
)
raise ValueError("ONLYOFFICE JWT 密钥未配置。")
extension = extract_extension(entry["original_name"])
if extension not in ONLYOFFICE_EDITABLE_EXTENSIONS:
raise ValueError("当前文件格式不支持 ONLYOFFICE 预览。")
backend_base_url = onlyoffice_settings.backend_url.rstrip("/")
public_url = onlyoffice_settings.public_url.rstrip("/")
access_token = build_onlyoffice_access_token(document_id)
document_url = (
f"{backend_base_url}{settings.api_v1_prefix}/knowledge/documents/{document_id}/onlyoffice/content"
f"?access_token={access_token}"
)
callback_url = (
f"{backend_base_url}{settings.api_v1_prefix}/knowledge/documents/{document_id}/onlyoffice/callback"
)
config: dict[str, Any] = {
"documentType": resolve_onlyoffice_document_type(extension),
"document": {
"fileType": extension,
"key": build_onlyoffice_document_key(entry),
"title": entry["original_name"],
"url": document_url,
"permissions": {
"download": True,
"edit": False,
"print": True,
"copy": True,
},
},
"editorConfig": {
"mode": "view",
"lang": "zh-CN",
"callbackUrl": callback_url,
"user": {
"id": current_user.username,
"name": current_user.name,
},
"customization": {
"compactHeader": True,
"compactToolbar": True,
"toolbarNoTabs": False,
"autosave": False,
"forcesave": False,
},
},
"width": "100%",
"height": "100%",
}
config["token"] = jwt.encode(config, onlyoffice_settings.jwt_secret, algorithm="HS256")
return KnowledgeOnlyOfficeConfigRead(documentServerUrl=public_url, config=config)
def validate_onlyoffice_access_token(document_id: str, access_token: str) -> None:
onlyoffice_settings = resolve_onlyoffice_settings()
try:
payload = jwt.decode(
access_token,
onlyoffice_settings.jwt_secret,
algorithms=["HS256"],
)
except jwt.PyJWTError as exc:
raise ValueError("ONLYOFFICE 文件访问令牌无效。") from exc
if payload.get("scope") != "onlyoffice-content" or payload.get("document_id") != document_id:
raise ValueError("ONLYOFFICE 文件访问令牌无效。")
def resolve_onlyoffice_document_type(extension: str) -> str:
if extension in WORD_EXTENSIONS:
return "word"
if extension in EXCEL_EXTENSIONS:
return "cell"
if extension in PPT_EXTENSIONS:
return "slide"
raise ValueError("当前文件格式不支持 ONLYOFFICE 预览。")
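
The document key passed to ONLYOFFICE combines the document id, version number and a checksum prefix, presumably so that a new version or changed content yields a new key and is not served from a stale editor cache. A small sketch of that derivation, using the illustrative name `build_document_key`:

```python
from typing import Any


def build_document_key(entry: dict[str, Any]) -> str:
    # A missing version defaults to 1.
    version = int(entry.get("version_number", 1))
    # Only the first 12 hex characters of the checksum are used to keep the key short.
    checksum = str(entry.get("sha256") or "")[:12]
    return f"{entry['id']}-v{version}-{checksum or 'nochecksum'}"
```

Entries without a stored checksum all share the `nochecksum` suffix, so for those the key only changes when the version number does.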


@@ -0,0 +1,157 @@
from __future__ import annotations
from typing import Any
from app.schemas.knowledge import (
KnowledgePreviewBlockRead,
KnowledgePreviewPageRead,
KnowledgePreviewStatRead,
)
from app.services.knowledge_constants import IMAGE_EXTENSIONS, TEXT_EXTENSIONS
from app.services.knowledge_document_extractors import (
_extract_docx_text,
_extract_pptx_slides,
_extract_xlsx_sheets,
_read_text_preview,
)
from app.services.knowledge_file_utils import extract_extension, format_size
def build_preview(
entry: dict[str, Any],
*,
resolve_document_path,
) -> tuple[str, list[KnowledgePreviewPageRead]]:
extension = extract_extension(entry["original_name"])
file_path = resolve_document_path(entry)
if extension == "pdf":
return "pdf", []
if extension in IMAGE_EXTENSIONS:
return "image", []
if extension in TEXT_EXTENSIONS:
text = _read_text_preview(file_path)
return "text", [_build_text_preview_page(entry, text)]
if extension == "docx":
text = _extract_docx_text(file_path)
return "text", [_build_text_preview_page(entry, text)]
if extension == "xlsx":
return "table", _build_xlsx_preview_pages(entry, file_path)
if extension == "pptx":
return "slides", _build_pptx_preview_pages(entry, file_path)
return (
"unsupported",
[
KnowledgePreviewPageRead(
title=entry["original_name"],
subtitle="当前格式暂不支持在线解析预览。",
stats=[
KnowledgePreviewStatRead(label="文件格式", value=extension.upper() or "FILE"),
KnowledgePreviewStatRead(label="文件大小", value=format_size(entry["size_bytes"])),
KnowledgePreviewStatRead(label="建议操作", value="下载后查看"),
],
blocks=[
KnowledgePreviewBlockRead(
heading="预览说明",
lines=[
"当前系统已支持该文件的上传、下载和权限控制。",
"如需在线预览,可后续接入专门的文档转换服务。",
],
)
],
)
],
)
def _build_text_preview_page(
entry: dict[str, Any], text: str
) -> KnowledgePreviewPageRead:
lines = [line.strip() for line in text.splitlines() if line.strip()]
if not lines:
lines = ["文件内容为空,或当前文档未提取到可展示文本。"]
groups = [lines[index : index + 8] for index in range(0, min(len(lines), 24), 8)]
blocks = [
KnowledgePreviewBlockRead(heading=f"内容片段 {index + 1}", lines=group)
for index, group in enumerate(groups)
]
return KnowledgePreviewPageRead(
title=entry["original_name"],
subtitle="文本提取预览",
stats=[
KnowledgePreviewStatRead(label="文件格式", value=entry["extension"].upper() or "TEXT"),
KnowledgePreviewStatRead(label="可见行数", value=str(len(lines))),
KnowledgePreviewStatRead(label="文件大小", value=format_size(entry["size_bytes"])),
],
blocks=blocks,
)
def _build_xlsx_preview_pages(
    entry: dict[str, Any], file_path
) -> list[KnowledgePreviewPageRead]:
    # Module-level helper: call the imported extractor directly (there is no `self` here).
    sheets = _extract_xlsx_sheets(file_path)
    if not sheets:
        sheets = [("Sheet 1", [["未提取到表格内容。"]])]
    preview_pages: list[KnowledgePreviewPageRead] = []
    sheet_count = len(sheets)
    for sheet_name, rows in sheets[:8]:
        visible_rows = rows[:12] if rows else [["未提取到表格内容。"]]
        blocks = [
            KnowledgePreviewBlockRead(
                heading=f"{index + 1}",
                lines=[" | ".join((cell or "") for cell in row)],
            )
            for index, row in enumerate(visible_rows)
        ]
        preview_pages.append(
            KnowledgePreviewPageRead(
                title=sheet_name,
                subtitle="表格内容预览",
                stats=[
                    KnowledgePreviewStatRead(label="工作表数量", value=str(sheet_count)),
                    KnowledgePreviewStatRead(label="预览行数", value=str(len(visible_rows))),
                    KnowledgePreviewStatRead(label="文件大小", value=format_size(entry["size_bytes"])),
                ],
                blocks=blocks,
            )
        )
    return preview_pages
def _build_pptx_preview_pages(
    entry: dict[str, Any], file_path
) -> list[KnowledgePreviewPageRead]:
    # Module-level helper: call the imported extractor directly (there is no `self` here).
    slides = _extract_pptx_slides(file_path)
    if not slides:
        slides = [["未提取到幻灯片文本。"]]
    pages: list[KnowledgePreviewPageRead] = []
    for index, slide_lines in enumerate(slides[:8]):
        pages.append(
            KnowledgePreviewPageRead(
                title=entry["original_name"],
                subtitle=f"幻灯片 {index + 1}",
                stats=[
                    KnowledgePreviewStatRead(label="页码", value=str(index + 1)),
                    KnowledgePreviewStatRead(label="文本条数", value=str(len(slide_lines))),
                    KnowledgePreviewStatRead(label="文件格式", value="PPTX"),
                ],
                blocks=[
                    KnowledgePreviewBlockRead(
                        heading="幻灯片内容",
                        lines=slide_lines or ["该页未提取到文本内容。"],
                    )
                ],
            )
        )
    return pages
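
The text preview keeps only the first 24 non-empty lines and splits them into blocks of 8. A compact sketch of that grouping, with `group_preview_lines` and its keyword parameters as illustrative names:

```python
def group_preview_lines(text: str, *, group_size: int = 8, max_lines: int = 24) -> list[list[str]]:
    # Drop blank lines and surrounding whitespace before grouping.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Slicing starts stop at max_lines, so at most max_lines / group_size blocks are produced.
    return [
        lines[index : index + group_size]
        for index in range(0, min(len(lines), max_lines), group_size)
    ]
```

In `_build_text_preview_page` the "可见行数" stat is computed from the full list of non-empty lines, so it can exceed the number of lines actually shown in the blocks.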


@@ -1,27 +1,36 @@
from __future__ import annotations
import asyncio
import json
import os
import re
import socket
import threading
from dataclasses import dataclass
from datetime import UTC, datetime
from functools import partial
from http import HTTPStatus
from pathlib import Path
from time import perf_counter
from typing import Any
from urllib.error import HTTPError, URLError
from urllib.parse import quote
from urllib.request import Request, urlopen
from sqlalchemy.orm import Session
from app.core.config import get_settings
from app.core.logging import get_logger
from app.db.session import get_session_factory
from app.services.knowledge_rag_runtime import (
DEFAULT_EMBEDDING_TIMEOUT_SECONDS,
DEFAULT_LIGHTRAG_QUERY_MODE,
DEFAULT_LLM_TIMEOUT_SECONDS,
KnowledgeRagError,
RuntimeModelConfig,
_LightRagRuntime,
_build_ali_rerank_request,
_build_azure_deployment_base,
_build_headers,
_ensure_path,
_extract_chat_text,
_extract_embedding_vectors,
_extract_error_message,
_extract_rerank_results,
_normalize_endpoint,
_parse_json_body,
_send_json_request,
)
from app.services.settings import SettingsService
logger = get_logger("app.services.knowledge_rag")
@@ -29,9 +38,6 @@ logger = get_logger("app.services.knowledge_rag")
DEFAULT_QDRANT_URL = "http://127.0.0.1:6333"
CONTAINER_QDRANT_URL = "http://qdrant:6333"
DEFAULT_LIGHTRAG_WORKSPACE = "x_financial_knowledge"
DEFAULT_LIGHTRAG_QUERY_MODE = "naive"
DEFAULT_LLM_TIMEOUT_SECONDS = 180
DEFAULT_EMBEDDING_TIMEOUT_SECONDS = 120
MAX_KNOWLEDGE_HIT_CONTENT_LENGTH = 2200
MAX_KNOWLEDGE_HIT_EXCERPT_LENGTH = 220
MAX_QUERY_TERMS = 12
@@ -71,450 +77,12 @@ STRUCTURED_APPENDIX_LEADING_MARKERS = (
)
STRUCTURED_APPENDIX_LEADING_WINDOW = 220
_runtime_lock = threading.RLock()
_runtime_instance: _LightRagRuntime | None = None
_runtime_signature: tuple[Any, ...] | None = None
class KnowledgeRagError(RuntimeError):
pass
@dataclass(frozen=True, slots=True)
class RuntimeModelConfig:
slot: str
provider: str
model: str
endpoint: str
api_key: str
capability: str
class _LightRagRuntime:
def __init__(
self,
*,
working_dir: Path,
workspace: str,
qdrant_url: str,
qdrant_api_key: str,
primary_chat: RuntimeModelConfig,
backup_chat: RuntimeModelConfig | None,
embedding: RuntimeModelConfig,
reranker: RuntimeModelConfig | None,
) -> None:
self.working_dir = working_dir
self.workspace = workspace
self.qdrant_url = qdrant_url
self.qdrant_api_key = qdrant_api_key
self.primary_chat = primary_chat
self.backup_chat = backup_chat
self.embedding = embedding
self.reranker = reranker
self._rag = self._build_rag()
self._initialize()
self._graph_has_content_cache: bool | None = None
@property
def rag(self):
return self._rag
def _build_rag(self):
try:
from lightrag import LightRAG
from lightrag.utils import EmbeddingFunc
except ImportError as exc: # pragma: no cover - exercised in runtime env
raise KnowledgeRagError(
"LightRAG 依赖未安装,请先在 server 环境执行依赖安装。"
) from exc
self.working_dir.mkdir(parents=True, exist_ok=True)
if self.qdrant_url:
os.environ["QDRANT_URL"] = self.qdrant_url
if self.qdrant_api_key:
os.environ["QDRANT_API_KEY"] = self.qdrant_api_key
embedding_dim = self._probe_embedding_dimension(self.embedding)
logger.info(
"Initialize LightRAG runtime workspace=%s qdrant=%s embedding_model=%s dim=%s",
self.workspace,
self.qdrant_url,
self.embedding.model,
embedding_dim,
)
async def embedding_func(texts: list[str]) -> Any:
return await asyncio.to_thread(self._embed_sync, texts)
async def llm_model_func(
prompt: str,
system_prompt: str | None = None,
history_messages: list[dict[str, Any]] | None = None,
keyword_extraction: bool = False,
**kwargs: Any,
) -> str:
return await asyncio.to_thread(
self._complete_sync,
prompt,
system_prompt,
history_messages or [],
keyword_extraction,
kwargs,
)
async def rerank_model_func(
query: str,
documents: list[str],
top_n: int | None = None,
**_kwargs: Any,
) -> list[dict[str, Any]]:
return await asyncio.to_thread(
self._rerank_sync,
query,
documents,
top_n,
)
return LightRAG(
working_dir=str(self.working_dir),
workspace=self.workspace,
kv_storage="JsonKVStorage",
graph_storage="NetworkXStorage",
vector_storage="QdrantVectorDBStorage",
doc_status_storage="JsonDocStatusStorage",
llm_model_name=self.primary_chat.model,
llm_model_func=llm_model_func,
embedding_func=EmbeddingFunc(
embedding_dim=embedding_dim,
func=embedding_func,
max_token_size=8192,
model_name=self.embedding.model,
supports_asymmetric=False,
),
rerank_model_func=rerank_model_func if self.reranker is not None else None,
enable_llm_cache=False,
enable_llm_cache_for_entity_extract=False,
)
def _initialize(self) -> None:
from lightrag.utils import always_get_an_event_loop
loop = always_get_an_event_loop()
loop.run_until_complete(self._rag.initialize_storages())
def finalize(self) -> None:
from lightrag.utils import always_get_an_event_loop
loop = always_get_an_event_loop()
loop.run_until_complete(self._rag.finalize_storages())
def query_data(self, query: str, *, conversation_history: list[dict[str, str]] | None = None) -> dict[str, Any]:
from lightrag import QueryParam
configured_mode = os.environ.get("LIGHTRAG_QUERY_MODE", DEFAULT_LIGHTRAG_QUERY_MODE).strip() or DEFAULT_LIGHTRAG_QUERY_MODE
mode = "naive" if configured_mode != "naive" and not self._graph_has_content() else configured_mode
started_at = perf_counter()
param = QueryParam(
mode=mode,
top_k=8,
chunk_top_k=10,
only_need_context=True,
response_type="Multiple Paragraphs",
conversation_history=conversation_history or [],
include_references=True,
)
try:
result = self._rag.query_data(query, param)
logger.info("LightRAG query completed mode=%s elapsed=%.2fs", mode, perf_counter() - started_at)
return result
except Exception:
if mode == "naive":
raise
logger.warning("LightRAG query mode=%s failed, retry with naive mode", mode)
fallback_param = QueryParam(
mode="naive",
top_k=8,
chunk_top_k=10,
only_need_context=True,
response_type="Multiple Paragraphs",
conversation_history=conversation_history or [],
include_references=True,
)
result = self._rag.query_data(query, fallback_param)
logger.info("LightRAG query completed mode=naive elapsed=%.2fs", perf_counter() - started_at)
return result
def _graph_has_content(self) -> bool:
if self._graph_has_content_cache is not None:
return self._graph_has_content_cache
graph_path = self.working_dir / self.workspace / "graph_chunk_entity_relation.graphml"
try:
graph_text = graph_path.read_text(encoding="utf-8")
except OSError:
self._graph_has_content_cache = False
return False
self._graph_has_content_cache = "<node " in graph_text or "<edge " in graph_text
return self._graph_has_content_cache
def insert_documents(
self,
*,
texts: list[str],
document_ids: list[str],
file_paths: list[str],
) -> str:
return self._rag.insert(texts, ids=document_ids, file_paths=file_paths)
def get_document_statuses(self, document_ids: list[str]) -> dict[str, Any]:
from lightrag.utils import always_get_an_event_loop
loop = always_get_an_event_loop()
return loop.run_until_complete(self._rag.aget_docs_by_ids(document_ids))
def delete_document(self, document_id: str) -> None:
from lightrag.utils import always_get_an_event_loop
loop = always_get_an_event_loop()
result = loop.run_until_complete(self._rag.adelete_by_doc_id(document_id))
status = str(getattr(result, "status", "") or "")
if status not in {"success", "not_found"}:
raise KnowledgeRagError(str(getattr(result, "message", "") or "LightRAG 删除文档失败。"))
def _probe_embedding_dimension(self, config: RuntimeModelConfig) -> int:
vectors = self._request_embeddings(config, ["dimension probe"])
if not vectors or not isinstance(vectors[0], list):
raise KnowledgeRagError("无法从 embedding 模型返回结果中解析向量维度。")
dimension = len(vectors[0])
if dimension <= 0:
raise KnowledgeRagError("embedding 模型返回了无效的向量维度。")
return dimension
def _embed_sync(self, texts: list[str]) -> Any:
import numpy as np
vectors = self._request_embeddings(self.embedding, texts)
return np.array(vectors, dtype=float)
def _rerank_sync(
self,
query: str,
documents: list[str],
top_n: int | None,
) -> list[dict[str, Any]]:
if self.reranker is None:
return []
status_code, body = self._request_rerank(
self.reranker,
query=query,
documents=documents,
top_n=top_n,
)
if status_code >= HTTPStatus.BAD_REQUEST:
raise KnowledgeRagError(f"reranker 模型返回异常状态码 {status_code}")
return _extract_rerank_results(body, provider=self.reranker.provider)
def _complete_sync(
self,
prompt: str,
system_prompt: str | None,
history_messages: list[dict[str, Any]],
keyword_extraction: bool,
kwargs: dict[str, Any],
) -> str:
del keyword_extraction
last_error: Exception | None = None
for config in [self.primary_chat, self.backup_chat]:
if config is None:
continue
try:
return self._request_chat_completion(
config,
prompt=prompt,
system_prompt=system_prompt,
history_messages=history_messages,
max_tokens=int(kwargs.get("max_tokens") or 1200),
temperature=float(kwargs.get("temperature") or 0.1),
)
except Exception as exc: # pragma: no cover - runtime fallback
last_error = exc
logger.warning(
"LightRAG LLM request failed slot=%s provider=%s model=%s: %s",
config.slot,
config.provider,
config.model,
exc,
)
continue
raise KnowledgeRagError(f"LightRAG 调用知识模型失败:{last_error or '没有可用模型配置'}")
def _request_chat_completion(
self,
config: RuntimeModelConfig,
*,
prompt: str,
system_prompt: str | None,
history_messages: list[dict[str, Any]],
max_tokens: int,
temperature: float,
) -> str:
messages: list[dict[str, Any]] = []
if system_prompt:
messages.append({"role": "system", "content": system_prompt})
messages.extend(history_messages)
messages.append({"role": "user", "content": prompt})
if config.provider == "Azure OpenAI":
url = f"{_build_azure_deployment_base(config.endpoint, config.model)}/chat/completions?api-version={AZURE_API_VERSION}"
payload = {
"messages": messages,
"max_tokens": max_tokens,
"temperature": temperature,
}
status_code, body = _send_json_request(
"POST",
url,
headers=_build_headers(config.api_key, use_bearer=False, use_api_key=True),
payload=payload,
timeout_seconds=DEFAULT_LLM_TIMEOUT_SECONDS,
)
elif config.provider == "Ollama":
url = _ensure_path(_normalize_endpoint(config.endpoint), "api/chat")
payload = {
"model": config.model,
"messages": messages,
"stream": False,
"options": {
"num_predict": max_tokens,
"temperature": temperature,
},
}
status_code, body = _send_json_request(
"POST",
url,
headers={"Content-Type": "application/json", "Accept": "application/json"},
payload=payload,
timeout_seconds=DEFAULT_LLM_TIMEOUT_SECONDS,
)
else:
url = _ensure_path(_normalize_endpoint(config.endpoint), "chat/completions")
payload = {
"model": config.model,
"messages": messages,
"max_tokens": max_tokens,
"temperature": temperature,
}
status_code, body = _send_json_request(
"POST",
url,
headers=_build_headers(config.api_key, use_bearer=True),
payload=payload,
timeout_seconds=DEFAULT_LLM_TIMEOUT_SECONDS,
)
if status_code >= HTTPStatus.BAD_REQUEST:
raise KnowledgeRagError(f"知识模型返回异常状态码 {status_code}")
return _extract_chat_text(body, provider=config.provider)
def _request_embeddings(self, config: RuntimeModelConfig, texts: list[str]) -> list[list[float]]:
if config.provider == "Azure OpenAI":
url = f"{_build_azure_deployment_base(config.endpoint, config.model)}/embeddings?api-version={AZURE_API_VERSION}"
payload = {"input": texts}
status_code, body = _send_json_request(
"POST",
url,
headers=_build_headers(config.api_key, use_bearer=False, use_api_key=True),
payload=payload,
timeout_seconds=DEFAULT_EMBEDDING_TIMEOUT_SECONDS,
)
elif config.provider == "Ollama":
url = _ensure_path(_normalize_endpoint(config.endpoint), "api/embed")
payload = {"model": config.model, "input": texts}
status_code, body = _send_json_request(
"POST",
url,
headers={"Content-Type": "application/json", "Accept": "application/json"},
payload=payload,
timeout_seconds=DEFAULT_EMBEDDING_TIMEOUT_SECONDS,
)
else:
url = _ensure_path(_normalize_endpoint(config.endpoint), "embeddings")
payload = {"model": config.model, "input": texts}
status_code, body = _send_json_request(
"POST",
url,
headers=_build_headers(config.api_key, use_bearer=True),
payload=payload,
timeout_seconds=DEFAULT_EMBEDDING_TIMEOUT_SECONDS,
)
if status_code >= HTTPStatus.BAD_REQUEST:
raise KnowledgeRagError(f"embedding 模型返回异常状态码 {status_code}")
return _extract_embedding_vectors(body, provider=config.provider)
def _request_rerank(
self,
config: RuntimeModelConfig,
*,
query: str,
documents: list[str],
top_n: int | None,
) -> tuple[int, Any]:
if config.provider == "Azure OpenAI":
url = f"{_build_azure_deployment_base(config.endpoint, config.model)}/rerank?api-version={AZURE_API_VERSION}"
payload: dict[str, Any] = {
"query": query,
"documents": documents,
}
if top_n is not None:
payload["top_n"] = top_n
return _send_json_request(
"POST",
url,
headers=_build_headers(config.api_key, use_bearer=False, use_api_key=True),
payload=payload,
timeout_seconds=DEFAULT_LLM_TIMEOUT_SECONDS,
)
if config.provider == "Ali":
url, payload = _build_ali_rerank_request(
config.model,
query=query,
documents=documents,
top_n=top_n,
)
return _send_json_request(
"POST",
url,
headers=_build_headers(config.api_key, use_bearer=True),
payload=payload,
timeout_seconds=DEFAULT_LLM_TIMEOUT_SECONDS,
)
url = _ensure_path(_normalize_endpoint(config.endpoint), "rerank")
payload = {
"model": config.model,
"query": query,
"documents": documents,
}
if top_n is not None:
payload["top_n"] = top_n
return _send_json_request(
"POST",
url,
headers=_build_headers(config.api_key, use_bearer=True),
payload=payload,
timeout_seconds=DEFAULT_LLM_TIMEOUT_SECONDS,
)
class KnowledgeRagService:
def __init__(self, db: Session | None = None, storage_root: Path | None = None) -> None:
self.db = db
@@ -955,219 +523,6 @@ def shutdown_knowledge_rag_runtime() -> None:
_runtime_signature = None
def _normalize_endpoint(endpoint: str) -> str:
normalized = str(endpoint or "").strip()
if not normalized:
raise KnowledgeRagError("模型 endpoint 不能为空。")
return normalized.rstrip("/")
def _ensure_path(endpoint: str, suffix: str) -> str:
suffix = suffix.lstrip("/")
if endpoint.endswith(suffix):
return endpoint
return f"{endpoint}/{suffix}"
def _build_azure_deployment_base(endpoint: str, model: str) -> str:
normalized_endpoint = _normalize_endpoint(endpoint)
quoted_model = quote(model, safe="")
if "/openai/deployments/" in normalized_endpoint:
return normalized_endpoint
if "/openai/v1" in normalized_endpoint:
resource_root = normalized_endpoint.split("/openai/v1", maxsplit=1)[0]
return f"{resource_root}/openai/deployments/{quoted_model}"
if normalized_endpoint.endswith("/openai"):
return f"{normalized_endpoint}/deployments/{quoted_model}"
return f"{normalized_endpoint}/openai/deployments/{quoted_model}"
def _build_headers(
api_key: str,
*,
use_bearer: bool,
use_api_key: bool = False,
) -> dict[str, str]:
headers = {
"Content-Type": "application/json",
"Accept": "application/json",
}
normalized_key = str(api_key or "").strip()
if normalized_key:
if use_api_key:
headers["api-key"] = normalized_key
elif use_bearer:
headers["Authorization"] = f"Bearer {normalized_key}"
return headers
def _send_json_request(
method: str,
url: str,
*,
headers: dict[str, str],
payload: dict[str, Any],
timeout_seconds: int,
) -> tuple[int, Any]:
data = json.dumps(payload).encode("utf-8")
request = Request(url=url, data=data, headers=headers, method=method)
try:
with urlopen(request, timeout=timeout_seconds) as response: # noqa: S310
body = response.read().decode("utf-8") if response.length != 0 else ""
return response.status, _parse_json_body(body)
except HTTPError as exc: # pragma: no cover - runtime path
body = exc.read().decode("utf-8", errors="ignore")
detail = _extract_error_message(_parse_json_body(body)) or f"接口返回 {exc.code}"
raise KnowledgeRagError(detail) from exc
except URLError as exc: # pragma: no cover - runtime path
raise KnowledgeRagError(f"无法连接模型接口:{getattr(exc, 'reason', exc)}") from exc
except TimeoutError as exc: # pragma: no cover - runtime path
raise KnowledgeRagError("模型接口调用超时。") from exc
def _parse_json_body(body: str) -> Any:
if not body:
return None
try:
return json.loads(body)
except json.JSONDecodeError:
return {"message": body}
def _extract_error_message(payload: Any) -> str | None:
if payload is None:
return None
if isinstance(payload, dict):
if isinstance(payload.get("detail"), str):
return payload["detail"]
if isinstance(payload.get("message"), str):
return payload["message"]
error_payload = payload.get("error")
if isinstance(error_payload, dict) and isinstance(error_payload.get("message"), str):
return error_payload["message"]
if isinstance(payload, str):
return payload
return None
def _extract_chat_text(payload: Any, *, provider: str) -> str:
if provider == "Ollama":
message = payload.get("message") if isinstance(payload, dict) else None
if isinstance(message, dict):
return str(message.get("content") or "").strip()
return ""
if not isinstance(payload, dict):
return ""
choices = payload.get("choices")
if not isinstance(choices, list) or not choices:
return ""
first_choice = choices[0]
if not isinstance(first_choice, dict):
return ""
message = first_choice.get("message")
if isinstance(message, dict):
content = message.get("content")
if isinstance(content, str):
return content.strip()
if isinstance(content, list):
parts: list[str] = []
for item in content:
if isinstance(item, dict) and item.get("type") == "text":
parts.append(str(item.get("text") or "").strip())
return "\n".join(part for part in parts if part).strip()
text = first_choice.get("text")
if isinstance(text, str):
return text.strip()
return ""
def _extract_embedding_vectors(payload: Any, *, provider: str) -> list[list[float]]:
if provider == "Ollama":
embeddings = payload.get("embeddings") if isinstance(payload, dict) else None
if isinstance(embeddings, list):
return [[float(value) for value in item] for item in embeddings if isinstance(item, list)]
embedding = payload.get("embedding") if isinstance(payload, dict) else None
if isinstance(embedding, list):
return [[float(value) for value in embedding]]
raise KnowledgeRagError("Ollama embedding 返回格式无法识别。")
if not isinstance(payload, dict):
raise KnowledgeRagError("embedding 接口返回格式无效。")
data = payload.get("data")
if not isinstance(data, list) or not data:
raise KnowledgeRagError("embedding 接口没有返回 data。")
vectors: list[list[float]] = []
for item in data:
if not isinstance(item, dict):
continue
embedding = item.get("embedding")
if isinstance(embedding, list):
vectors.append([float(value) for value in embedding])
if not vectors:
raise KnowledgeRagError("embedding 接口返回中未找到向量数据。")
return vectors
def _build_ali_rerank_request(
model: str,
*,
query: str,
documents: list[str],
top_n: int | None,
) -> tuple[str, dict[str, Any]]:
normalized_model = str(model or "").strip()
if normalized_model == "qwen3-rerank":
payload: dict[str, Any] = {
"model": normalized_model,
"query": query,
"documents": documents,
}
if top_n is not None:
payload["top_n"] = top_n
return "https://dashscope.aliyuncs.com/compatible-api/v1/reranks", payload
payload = {
"model": normalized_model,
"input": {
"query": query,
"documents": documents,
},
"parameters": {
"return_documents": False,
},
}
if top_n is not None:
payload["parameters"]["top_n"] = top_n
return "https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank", payload
def _extract_rerank_results(payload: Any, *, provider: str) -> list[dict[str, Any]]:
if not isinstance(payload, dict):
return []
if provider == "Ali" and isinstance(payload.get("output"), dict):
results = payload["output"].get("results")
else:
results = payload.get("results")
if not isinstance(results, list):
return []
normalized: list[dict[str, Any]] = []
for item in results:
if not isinstance(item, dict):
continue
try:
normalized.append(
{
"index": int(item["index"]),
"relevance_score": float(item["relevance_score"]),
}
)
except (KeyError, TypeError, ValueError):
continue
return normalized
def _parse_document_identity(file_path: str) -> tuple[str, str]:
path = Path(str(file_path or "").strip())
name = path.name

@@ -0,0 +1,672 @@
from __future__ import annotations
import asyncio
import json
import os
from dataclasses import dataclass
from http import HTTPStatus
from pathlib import Path
from time import perf_counter
from typing import Any
from urllib.error import HTTPError, URLError
from urllib.parse import quote
from urllib.request import Request, urlopen
from app.core.logging import get_logger
from app.services.model_connectivity import AZURE_API_VERSION
logger = get_logger("app.services.knowledge_rag")
DEFAULT_LIGHTRAG_QUERY_MODE = "naive"
DEFAULT_LLM_TIMEOUT_SECONDS = 180
DEFAULT_EMBEDDING_TIMEOUT_SECONDS = 120
class KnowledgeRagError(RuntimeError):
pass
@dataclass(frozen=True, slots=True)
class RuntimeModelConfig:
slot: str
provider: str
model: str
endpoint: str
api_key: str
capability: str
class _LightRagRuntime:
def __init__(
self,
*,
working_dir: Path,
workspace: str,
qdrant_url: str,
qdrant_api_key: str,
primary_chat: RuntimeModelConfig,
backup_chat: RuntimeModelConfig | None,
embedding: RuntimeModelConfig,
reranker: RuntimeModelConfig | None,
) -> None:
self.working_dir = working_dir
self.workspace = workspace
self.qdrant_url = qdrant_url
self.qdrant_api_key = qdrant_api_key
self.primary_chat = primary_chat
self.backup_chat = backup_chat
self.embedding = embedding
self.reranker = reranker
self._rag = self._build_rag()
self._initialize()
self._graph_has_content_cache: bool | None = None
@property
def rag(self):
return self._rag
def _build_rag(self):
try:
from lightrag import LightRAG
from lightrag.utils import EmbeddingFunc
except ImportError as exc: # pragma: no cover - exercised in runtime env
raise KnowledgeRagError(
"LightRAG 依赖未安装,请先在 server 环境执行依赖安装。"
) from exc
self.working_dir.mkdir(parents=True, exist_ok=True)
if self.qdrant_url:
os.environ["QDRANT_URL"] = self.qdrant_url
if self.qdrant_api_key:
os.environ["QDRANT_API_KEY"] = self.qdrant_api_key
embedding_dim = self._probe_embedding_dimension(self.embedding)
logger.info(
"Initialize LightRAG runtime workspace=%s qdrant=%s embedding_model=%s dim=%s",
self.workspace,
self.qdrant_url,
self.embedding.model,
embedding_dim,
)
async def embedding_func(texts: list[str]) -> Any:
return await asyncio.to_thread(self._embed_sync, texts)
async def llm_model_func(
prompt: str,
system_prompt: str | None = None,
history_messages: list[dict[str, Any]] | None = None,
keyword_extraction: bool = False,
**kwargs: Any,
) -> str:
return await asyncio.to_thread(
self._complete_sync,
prompt,
system_prompt,
history_messages or [],
keyword_extraction,
kwargs,
)
async def rerank_model_func(
query: str,
documents: list[str],
top_n: int | None = None,
**_kwargs: Any,
) -> list[dict[str, Any]]:
return await asyncio.to_thread(
self._rerank_sync,
query,
documents,
top_n,
)
return LightRAG(
working_dir=str(self.working_dir),
workspace=self.workspace,
kv_storage="JsonKVStorage",
graph_storage="NetworkXStorage",
vector_storage="QdrantVectorDBStorage",
doc_status_storage="JsonDocStatusStorage",
llm_model_name=self.primary_chat.model,
llm_model_func=llm_model_func,
embedding_func=EmbeddingFunc(
embedding_dim=embedding_dim,
func=embedding_func,
max_token_size=8192,
model_name=self.embedding.model,
supports_asymmetric=False,
),
rerank_model_func=rerank_model_func if self.reranker is not None else None,
enable_llm_cache=False,
enable_llm_cache_for_entity_extract=False,
)
def _initialize(self) -> None:
from lightrag.utils import always_get_an_event_loop
loop = always_get_an_event_loop()
loop.run_until_complete(self._rag.initialize_storages())
def finalize(self) -> None:
from lightrag.utils import always_get_an_event_loop
loop = always_get_an_event_loop()
loop.run_until_complete(self._rag.finalize_storages())
def query_data(self, query: str, *, conversation_history: list[dict[str, str]] | None = None) -> dict[str, Any]:
from lightrag import QueryParam
configured_mode = os.environ.get("LIGHTRAG_QUERY_MODE", DEFAULT_LIGHTRAG_QUERY_MODE).strip() or DEFAULT_LIGHTRAG_QUERY_MODE
mode = "naive" if configured_mode != "naive" and not self._graph_has_content() else configured_mode
started_at = perf_counter()
param = QueryParam(
mode=mode,
top_k=8,
chunk_top_k=10,
only_need_context=True,
response_type="Multiple Paragraphs",
conversation_history=conversation_history or [],
include_references=True,
)
try:
result = self._rag.query_data(query, param)
logger.info("LightRAG query completed mode=%s elapsed=%.2fs", mode, perf_counter() - started_at)
return result
except Exception:
if mode == "naive":
raise
logger.warning("LightRAG query mode=%s failed, retry with naive mode", mode)
fallback_param = QueryParam(
mode="naive",
top_k=8,
chunk_top_k=10,
only_need_context=True,
response_type="Multiple Paragraphs",
conversation_history=conversation_history or [],
include_references=True,
)
result = self._rag.query_data(query, fallback_param)
logger.info("LightRAG query completed mode=naive elapsed=%.2fs", perf_counter() - started_at)
return result
def _graph_has_content(self) -> bool:
if self._graph_has_content_cache is not None:
return self._graph_has_content_cache
graph_path = self.working_dir / self.workspace / "graph_chunk_entity_relation.graphml"
try:
graph_text = graph_path.read_text(encoding="utf-8")
except OSError:
self._graph_has_content_cache = False
return False
self._graph_has_content_cache = "<node " in graph_text or "<edge " in graph_text
return self._graph_has_content_cache
def insert_documents(
self,
*,
texts: list[str],
document_ids: list[str],
file_paths: list[str],
) -> str:
return self._rag.insert(texts, ids=document_ids, file_paths=file_paths)
def get_document_statuses(self, document_ids: list[str]) -> dict[str, Any]:
from lightrag.utils import always_get_an_event_loop
loop = always_get_an_event_loop()
return loop.run_until_complete(self._rag.aget_docs_by_ids(document_ids))
def delete_document(self, document_id: str) -> None:
from lightrag.utils import always_get_an_event_loop
loop = always_get_an_event_loop()
result = loop.run_until_complete(self._rag.adelete_by_doc_id(document_id))
status = str(getattr(result, "status", "") or "")
if status not in {"success", "not_found"}:
raise KnowledgeRagError(str(getattr(result, "message", "") or "LightRAG 删除文档失败。"))
def _probe_embedding_dimension(self, config: RuntimeModelConfig) -> int:
vectors = self._request_embeddings(config, ["dimension probe"])
if not vectors or not isinstance(vectors[0], list):
raise KnowledgeRagError("无法从 embedding 模型返回结果中解析向量维度。")
dimension = len(vectors[0])
if dimension <= 0:
raise KnowledgeRagError("embedding 模型返回了无效的向量维度。")
return dimension
def _embed_sync(self, texts: list[str]) -> Any:
import numpy as np
vectors = self._request_embeddings(self.embedding, texts)
return np.array(vectors, dtype=float)
def _rerank_sync(
self,
query: str,
documents: list[str],
top_n: int | None,
) -> list[dict[str, Any]]:
if self.reranker is None:
return []
status_code, body = self._request_rerank(
self.reranker,
query=query,
documents=documents,
top_n=top_n,
)
if status_code >= HTTPStatus.BAD_REQUEST:
raise KnowledgeRagError(f"reranker 模型返回异常状态码 {status_code}")
return _extract_rerank_results(body, provider=self.reranker.provider)
def _complete_sync(
self,
prompt: str,
system_prompt: str | None,
history_messages: list[dict[str, Any]],
keyword_extraction: bool,
kwargs: dict[str, Any],
) -> str:
del keyword_extraction
last_error: Exception | None = None
for config in [self.primary_chat, self.backup_chat]:
if config is None:
continue
try:
return self._request_chat_completion(
config,
prompt=prompt,
system_prompt=system_prompt,
history_messages=history_messages,
max_tokens=int(kwargs.get("max_tokens") or 1200),
temperature=float(kwargs.get("temperature") or 0.1),
)
except Exception as exc: # pragma: no cover - runtime fallback
last_error = exc
logger.warning(
"LightRAG LLM request failed slot=%s provider=%s model=%s: %s",
config.slot,
config.provider,
config.model,
exc,
)
continue
raise KnowledgeRagError(f"LightRAG 调用知识模型失败:{last_error or '没有可用模型配置'}")
def _request_chat_completion(
self,
config: RuntimeModelConfig,
*,
prompt: str,
system_prompt: str | None,
history_messages: list[dict[str, Any]],
max_tokens: int,
temperature: float,
) -> str:
messages: list[dict[str, Any]] = []
if system_prompt:
messages.append({"role": "system", "content": system_prompt})
messages.extend(history_messages)
messages.append({"role": "user", "content": prompt})
if config.provider == "Azure OpenAI":
url = f"{_build_azure_deployment_base(config.endpoint, config.model)}/chat/completions?api-version={AZURE_API_VERSION}"
payload = {
"messages": messages,
"max_tokens": max_tokens,
"temperature": temperature,
}
status_code, body = _send_json_request(
"POST",
url,
headers=_build_headers(config.api_key, use_bearer=False, use_api_key=True),
payload=payload,
timeout_seconds=DEFAULT_LLM_TIMEOUT_SECONDS,
)
elif config.provider == "Ollama":
url = _ensure_path(_normalize_endpoint(config.endpoint), "api/chat")
payload = {
"model": config.model,
"messages": messages,
"stream": False,
"options": {
"num_predict": max_tokens,
"temperature": temperature,
},
}
status_code, body = _send_json_request(
"POST",
url,
headers={"Content-Type": "application/json", "Accept": "application/json"},
payload=payload,
timeout_seconds=DEFAULT_LLM_TIMEOUT_SECONDS,
)
else:
url = _ensure_path(_normalize_endpoint(config.endpoint), "chat/completions")
payload = {
"model": config.model,
"messages": messages,
"max_tokens": max_tokens,
"temperature": temperature,
}
status_code, body = _send_json_request(
"POST",
url,
headers=_build_headers(config.api_key, use_bearer=True),
payload=payload,
timeout_seconds=DEFAULT_LLM_TIMEOUT_SECONDS,
)
if status_code >= HTTPStatus.BAD_REQUEST:
raise KnowledgeRagError(f"知识模型返回异常状态码 {status_code}")
return _extract_chat_text(body, provider=config.provider)
def _request_embeddings(self, config: RuntimeModelConfig, texts: list[str]) -> list[list[float]]:
if config.provider == "Azure OpenAI":
url = f"{_build_azure_deployment_base(config.endpoint, config.model)}/embeddings?api-version={AZURE_API_VERSION}"
payload = {"input": texts}
status_code, body = _send_json_request(
"POST",
url,
headers=_build_headers(config.api_key, use_bearer=False, use_api_key=True),
payload=payload,
timeout_seconds=DEFAULT_EMBEDDING_TIMEOUT_SECONDS,
)
elif config.provider == "Ollama":
url = _ensure_path(_normalize_endpoint(config.endpoint), "api/embed")
payload = {"model": config.model, "input": texts}
status_code, body = _send_json_request(
"POST",
url,
headers={"Content-Type": "application/json", "Accept": "application/json"},
payload=payload,
timeout_seconds=DEFAULT_EMBEDDING_TIMEOUT_SECONDS,
)
else:
url = _ensure_path(_normalize_endpoint(config.endpoint), "embeddings")
payload = {"model": config.model, "input": texts}
status_code, body = _send_json_request(
"POST",
url,
headers=_build_headers(config.api_key, use_bearer=True),
payload=payload,
timeout_seconds=DEFAULT_EMBEDDING_TIMEOUT_SECONDS,
)
if status_code >= HTTPStatus.BAD_REQUEST:
raise KnowledgeRagError(f"embedding 模型返回异常状态码 {status_code}")
return _extract_embedding_vectors(body, provider=config.provider)
def _request_rerank(
self,
config: RuntimeModelConfig,
*,
query: str,
documents: list[str],
top_n: int | None,
) -> tuple[int, Any]:
if config.provider == "Azure OpenAI":
url = f"{_build_azure_deployment_base(config.endpoint, config.model)}/rerank?api-version={AZURE_API_VERSION}"
payload: dict[str, Any] = {
"query": query,
"documents": documents,
}
if top_n is not None:
payload["top_n"] = top_n
return _send_json_request(
"POST",
url,
headers=_build_headers(config.api_key, use_bearer=False, use_api_key=True),
payload=payload,
timeout_seconds=DEFAULT_LLM_TIMEOUT_SECONDS,
)
if config.provider == "Ali":
url, payload = _build_ali_rerank_request(
config.model,
query=query,
documents=documents,
top_n=top_n,
)
return _send_json_request(
"POST",
url,
headers=_build_headers(config.api_key, use_bearer=True),
payload=payload,
timeout_seconds=DEFAULT_LLM_TIMEOUT_SECONDS,
)
url = _ensure_path(_normalize_endpoint(config.endpoint), "rerank")
payload = {
"model": config.model,
"query": query,
"documents": documents,
}
if top_n is not None:
payload["top_n"] = top_n
return _send_json_request(
"POST",
url,
headers=_build_headers(config.api_key, use_bearer=True),
payload=payload,
timeout_seconds=DEFAULT_LLM_TIMEOUT_SECONDS,
)
def _normalize_endpoint(endpoint: str) -> str:
normalized = str(endpoint or "").strip()
if not normalized:
raise KnowledgeRagError("模型 endpoint 不能为空。")
return normalized.rstrip("/")
def _ensure_path(endpoint: str, suffix: str) -> str:
suffix = suffix.lstrip("/")
if endpoint.endswith(suffix):
return endpoint
return f"{endpoint}/{suffix}"
def _build_azure_deployment_base(endpoint: str, model: str) -> str:
normalized_endpoint = _normalize_endpoint(endpoint)
quoted_model = quote(model, safe="")
if "/openai/deployments/" in normalized_endpoint:
return normalized_endpoint
if "/openai/v1" in normalized_endpoint:
resource_root = normalized_endpoint.split("/openai/v1", maxsplit=1)[0]
return f"{resource_root}/openai/deployments/{quoted_model}"
if normalized_endpoint.endswith("/openai"):
return f"{normalized_endpoint}/deployments/{quoted_model}"
return f"{normalized_endpoint}/openai/deployments/{quoted_model}"
def _build_headers(
api_key: str,
*,
use_bearer: bool,
use_api_key: bool = False,
) -> dict[str, str]:
headers = {
"Content-Type": "application/json",
"Accept": "application/json",
}
normalized_key = str(api_key or "").strip()
if normalized_key:
if use_api_key:
headers["api-key"] = normalized_key
elif use_bearer:
headers["Authorization"] = f"Bearer {normalized_key}"
return headers
def _send_json_request(
method: str,
url: str,
*,
headers: dict[str, str],
payload: dict[str, Any],
timeout_seconds: int,
) -> tuple[int, Any]:
data = json.dumps(payload).encode("utf-8")
request = Request(url=url, data=data, headers=headers, method=method)
try:
with urlopen(request, timeout=timeout_seconds) as response: # noqa: S310
body = response.read().decode("utf-8") if response.length != 0 else ""
return response.status, _parse_json_body(body)
except HTTPError as exc: # pragma: no cover - runtime path
body = exc.read().decode("utf-8", errors="ignore")
detail = _extract_error_message(_parse_json_body(body)) or f"接口返回 {exc.code}"
raise KnowledgeRagError(detail) from exc
except URLError as exc: # pragma: no cover - runtime path
raise KnowledgeRagError(f"无法连接模型接口:{getattr(exc, 'reason', exc)}") from exc
except TimeoutError as exc: # pragma: no cover - runtime path
raise KnowledgeRagError("模型接口调用超时。") from exc
def _parse_json_body(body: str) -> Any:
if not body:
return None
try:
return json.loads(body)
except json.JSONDecodeError:
return {"message": body}
def _extract_error_message(payload: Any) -> str | None:
if payload is None:
return None
if isinstance(payload, dict):
if isinstance(payload.get("detail"), str):
return payload["detail"]
if isinstance(payload.get("message"), str):
return payload["message"]
error_payload = payload.get("error")
if isinstance(error_payload, dict) and isinstance(error_payload.get("message"), str):
return error_payload["message"]
if isinstance(payload, str):
return payload
return None
def _extract_chat_text(payload: Any, *, provider: str) -> str:
if provider == "Ollama":
message = payload.get("message") if isinstance(payload, dict) else None
if isinstance(message, dict):
return str(message.get("content") or "").strip()
return ""
if not isinstance(payload, dict):
return ""
choices = payload.get("choices")
if not isinstance(choices, list) or not choices:
return ""
first_choice = choices[0]
if not isinstance(first_choice, dict):
return ""
message = first_choice.get("message")
if isinstance(message, dict):
content = message.get("content")
if isinstance(content, str):
return content.strip()
if isinstance(content, list):
parts: list[str] = []
for item in content:
if isinstance(item, dict) and item.get("type") == "text":
parts.append(str(item.get("text") or "").strip())
return "\n".join(part for part in parts if part).strip()
text = first_choice.get("text")
if isinstance(text, str):
return text.strip()
return ""
def _extract_embedding_vectors(payload: Any, *, provider: str) -> list[list[float]]:
if provider == "Ollama":
embeddings = payload.get("embeddings") if isinstance(payload, dict) else None
if isinstance(embeddings, list):
return [[float(value) for value in item] for item in embeddings if isinstance(item, list)]
embedding = payload.get("embedding") if isinstance(payload, dict) else None
if isinstance(embedding, list):
return [[float(value) for value in embedding]]
raise KnowledgeRagError("Ollama embedding 返回格式无法识别。")
if not isinstance(payload, dict):
raise KnowledgeRagError("embedding 接口返回格式无效。")
data = payload.get("data")
if not isinstance(data, list) or not data:
raise KnowledgeRagError("embedding 接口没有返回 data。")
vectors: list[list[float]] = []
for item in data:
if not isinstance(item, dict):
continue
embedding = item.get("embedding")
if isinstance(embedding, list):
vectors.append([float(value) for value in embedding])
if not vectors:
raise KnowledgeRagError("embedding 接口返回中未找到向量数据。")
return vectors
def _build_ali_rerank_request(
model: str,
*,
query: str,
documents: list[str],
top_n: int | None,
) -> tuple[str, dict[str, Any]]:
normalized_model = str(model or "").strip()
if normalized_model == "qwen3-rerank":
payload: dict[str, Any] = {
"model": normalized_model,
"query": query,
"documents": documents,
}
if top_n is not None:
payload["top_n"] = top_n
return "https://dashscope.aliyuncs.com/compatible-api/v1/reranks", payload
payload = {
"model": normalized_model,
"input": {
"query": query,
"documents": documents,
},
"parameters": {
"return_documents": False,
},
}
if top_n is not None:
payload["parameters"]["top_n"] = top_n
return "https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank", payload
def _extract_rerank_results(payload: Any, *, provider: str) -> list[dict[str, Any]]:
if not isinstance(payload, dict):
return []
if provider == "Ali" and isinstance(payload.get("output"), dict):
results = payload["output"].get("results")
else:
results = payload.get("results")
if not isinstance(results, list):
return []
normalized: list[dict[str, Any]] = []
for item in results:
if not isinstance(item, dict):
continue
try:
normalized.append(
{
"index": int(item["index"]),
"relevance_score": float(item["relevance_score"]),
}
)
except (KeyError, TypeError, ValueError):
continue
return normalized
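
Note on the Azure endpoint handling above: `_build_azure_deployment_base` accepts several endpoint shapes (bare resource root, `/openai`, `/openai/v1`, or a full deployment path) and maps them onto one deployment base URL. The sketch below mirrors that logic in standalone form so the mapping can be checked in isolation. It is an illustration under stated assumptions, not the committed code: the public function names, the use of `ValueError` in place of `KnowledgeRagError`, and the `example.openai.azure.com` host are hypothetical.

```python
from urllib.parse import quote


def normalize_endpoint(endpoint: str) -> str:
    """Trim whitespace and trailing slashes; reject empty endpoints."""
    normalized = str(endpoint or "").strip()
    if not normalized:
        # Assumption: ValueError stands in for the module's KnowledgeRagError.
        raise ValueError("model endpoint must not be empty")
    return normalized.rstrip("/")


def build_azure_deployment_base(endpoint: str, model: str) -> str:
    """Map the accepted endpoint shapes onto <root>/openai/deployments/<model>."""
    base = normalize_endpoint(endpoint)
    deployment = quote(model, safe="")  # percent-encode every reserved character
    if "/openai/deployments/" in base:
        # Already a deployment URL: returned unchanged, the model name is not re-applied.
        return base
    if "/openai/v1" in base:
        # OpenAI-compatible v1 path: strip it and rebuild from the resource root.
        resource_root = base.split("/openai/v1", maxsplit=1)[0]
        return f"{resource_root}/openai/deployments/{deployment}"
    if base.endswith("/openai"):
        return f"{base}/deployments/{deployment}"
    return f"{base}/openai/deployments/{deployment}"


if __name__ == "__main__":
    # Hypothetical host, used only to show the mapping.
    for raw in (
        "https://example.openai.azure.com/",
        "https://example.openai.azure.com/openai",
        "https://example.openai.azure.com/openai/v1/",
    ):
        print(raw, "->", build_azure_deployment_base(raw, "gpt-4o"))
```

One consequence worth keeping in mind when configuring models: if the endpoint already contains `/openai/deployments/`, the deployment name embedded in the URL takes precedence over `config.model`.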

@@ -2,6 +2,7 @@ from __future__ import annotations
import base64
import json
import re
import shutil
import subprocess
from dataclasses import dataclass, field
@@ -27,6 +28,7 @@ class PreparedOcrInput:
page_index: int | None = None
preview_kind: str = ""
preview_data_url: str = ""
text_layer: str = ""
@dataclass(slots=True)
@@ -38,6 +40,7 @@ class AggregatedOcrDocument:
model: str = "PP-OCRv5_mobile"
summary_fragments: list[str] = field(default_factory=list)
text_fragments: list[str] = field(default_factory=list)
text_layer_fragments: list[str] = field(default_factory=list)
score_values: list[float] = field(default_factory=list)
warnings: list[str] = field(default_factory=list)
lines: list[OcrRecognizeLineRead] = field(default_factory=list)
@@ -112,12 +115,14 @@ class OcrService:
if suffix == ".pdf":
try:
text_layer = self._extract_pdf_text_layer(temp_path)
prepared_inputs.extend(
self._prepare_pdf_inputs(
pdf_path=temp_path,
filename=normalized_name,
media_type=resolved_media_type,
cleanup_paths=cleanup_paths,
text_layer=text_layer,
)
)
except RuntimeError as exc:
@@ -261,6 +266,7 @@ class OcrService:
filename: str,
media_type: str,
cleanup_paths: list[Path],
text_layer: str = "",
) -> list[PreparedOcrInput]:
output_dir = pdf_path.with_suffix("")
output_dir.mkdir(parents=True, exist_ok=True)
@@ -283,10 +289,33 @@ class OcrService:
page_index=page_index,
preview_kind="image" if page_index == 0 else "",
preview_data_url=preview_data_url if page_index == 0 else "",
text_layer=text_layer if page_index == 0 else "",
)
)
return descriptors
+    def _extract_pdf_text_layer(self, pdf_path: Path) -> str:
+        try:
+            completed = subprocess.run(
+                [
+                    "pdftotext",
+                    "-layout",
+                    str(pdf_path),
+                    "-",
+                ],
+                capture_output=True,
+                text=True,
+                timeout=self.settings.ocr_timeout_seconds,
+                check=False,
+            )
+        except (OSError, subprocess.SubprocessError, UnicodeError):
+            return ""
+        if completed.returncode != 0:
+            return ""
+        return self._normalize_extracted_text(completed.stdout)
def _convert_pdf_to_images(self, *, pdf_path: Path, output_dir: Path) -> list[Path]:
prefix = output_dir / "page"
completed = subprocess.run(
@@ -367,6 +396,8 @@ class OcrService:
aggregated.preview_kind = descriptor.preview_kind
if descriptor.preview_data_url and not aggregated.preview_data_url:
aggregated.preview_data_url = descriptor.preview_data_url
if descriptor.text_layer and descriptor.text_layer not in aggregated.text_layer_fragments:
aggregated.text_layer_fragments.append(descriptor.text_layer)
page_summary = str(payload.get("summary", "") or "").strip()
if page_summary:
@@ -401,6 +432,20 @@ class OcrService:
aggregated = aggregated_by_source.get(source_key)
if aggregated is None:
first_descriptor = descriptors[0]
text_layer = self._collect_descriptor_text_layer(descriptors)
if text_layer:
fallback = AggregatedOcrDocument(
filename=first_descriptor.filename,
media_type=first_descriptor.media_type,
source_key=first_descriptor.source_key,
page_count=max(1, len(descriptors)),
preview_kind=first_descriptor.preview_kind,
preview_data_url=first_descriptor.preview_data_url,
warnings=["The OCR worker returned no recognition result for this file; the PDF text layer was used instead."],
)
fallback.text_layer_fragments.append(text_layer)
documents.append(self._finalize_document(fallback))
continue
documents.append(
OcrRecognizeDocumentRead(
filename=first_descriptor.filename,
@@ -416,6 +461,13 @@ class OcrService:
return documents
+    @staticmethod
+    def _collect_descriptor_text_layer(descriptors: list[PreparedOcrInput]) -> str:
+        for descriptor in descriptors:
+            if descriptor.text_layer:
+                return descriptor.text_layer
+        return ""
@staticmethod
def _build_lines(
items: list[dict],
@@ -451,13 +503,26 @@ class OcrService:
return summary
     def _finalize_document(self, aggregated: AggregatedOcrDocument) -> OcrRecognizeDocumentRead:
-        full_text = "\n".join(fragment for fragment in aggregated.text_fragments if fragment).strip()
+        ocr_text = "\n".join(fragment for fragment in aggregated.text_fragments if fragment).strip()
+        text_layer = "\n".join(fragment for fragment in aggregated.text_layer_fragments if fragment).strip()
+        full_text, used_text_layer = self._choose_document_text(ocr_text=ocr_text, text_layer=text_layer)
         summary = self._truncate_summary(aggregated.summary_fragments or aggregated.text_fragments)
+        if used_text_layer or self._placeholder_ratio(summary) >= 0.12:
+            summary = self._summarize_text(full_text)
+        preview_kind = aggregated.preview_kind
+        preview_data_url = aggregated.preview_data_url
+        if (
+            used_text_layer
+            and aggregated.media_type == "application/pdf"
+            and self._placeholder_ratio(ocr_text) >= 0.12
+        ):
+            preview_kind = ""
+            preview_data_url = ""
         insight = self.document_intelligence_service.build_document_insight(
             filename=aggregated.filename,
             summary=summary,
             text=full_text,
-            preview_data_url=aggregated.preview_data_url,
+            preview_data_url=preview_data_url,
         )
warnings = list(aggregated.warnings)
for warning in insight.warnings:
@@ -493,8 +558,8 @@ class OcrService:
)
for field in insight.fields
],
-            preview_kind=aggregated.preview_kind,
-            preview_data_url=aggregated.preview_data_url,
+            preview_kind=preview_kind,
+            preview_data_url=preview_data_url,
warnings=warnings,
lines=sorted(
aggregated.lines,
@@ -502,6 +567,45 @@ class OcrService:
),
)
+    @classmethod
+    def _choose_document_text(cls, *, ocr_text: str, text_layer: str) -> tuple[str, bool]:
+        normalized_ocr_text = cls._normalize_extracted_text(ocr_text)
+        normalized_text_layer = cls._normalize_extracted_text(text_layer)
+        if not normalized_text_layer:
+            return normalized_ocr_text, False
+        if not normalized_ocr_text:
+            return normalized_text_layer, True
+        if cls._placeholder_ratio(normalized_ocr_text) >= 0.12 and cls._meaningful_char_count(normalized_text_layer) >= 8:
+            return normalized_text_layer, True
+        if cls._meaningful_char_count(normalized_text_layer) > cls._meaningful_char_count(normalized_ocr_text) * 1.3:
+            return normalized_text_layer, True
+        return normalized_ocr_text, False
+    @staticmethod
+    def _normalize_extracted_text(value: str) -> str:
+        lines = [re.sub(r"[ \t]+", " ", line).strip() for line in str(value or "").replace("\r", "\n").split("\n")]
+        return "\n".join(line for line in lines if line).strip()
+    @staticmethod
+    def _summarize_text(value: str) -> str:
+        lines = [line.strip() for line in str(value or "").splitlines() if line.strip()]
+        summary = "".join(lines[:3])
+        if len(summary) > 180:
+            return f"{summary[:177]}..."
+        return summary
+    @staticmethod
+    def _meaningful_char_count(value: str) -> int:
+        return len(re.findall(r"[0-9A-Za-z\u4e00-\u9fff]", str(value or "")))
+    @staticmethod
+    def _placeholder_ratio(value: str) -> float:
+        chars = [char for char in str(value or "") if not char.isspace()]
+        if not chars:
+            return 0.0
+        placeholder_count = sum(1 for char in chars if char in {"", "\ufffd"})
+        return placeholder_count / len(chars)
@staticmethod
def _cleanup_temp_paths(paths: list[Path]) -> None:
for path in reversed(paths):
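
Reviewer note: the selection rule added in `_choose_document_text` is easier to follow with concrete inputs. The sketch below re-implements the helpers as standalone functions, so the names drop the leading underscore and the class. It assumes that only U+FFFD counts as a placeholder character, because the other member of the set is not legible in this diff. The sample strings are invented for illustration.

```python
import re

# Assumption: only the Unicode replacement character is treated as a placeholder.
PLACEHOLDER_CHARS = {"\ufffd"}


def normalize_extracted_text(value: str) -> str:
    # Collapse runs of spaces/tabs and drop empty lines.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in str(value or "").replace("\r", "\n").split("\n")]
    return "\n".join(line for line in lines if line).strip()


def meaningful_char_count(value: str) -> int:
    # Count ASCII letters, digits, and common CJK ideographs.
    return len(re.findall(r"[0-9A-Za-z\u4e00-\u9fff]", str(value or "")))


def placeholder_ratio(value: str) -> float:
    chars = [char for char in str(value or "") if not char.isspace()]
    if not chars:
        return 0.0
    return sum(1 for char in chars if char in PLACEHOLDER_CHARS) / len(chars)


def choose_document_text(*, ocr_text: str, text_layer: str) -> tuple[str, bool]:
    ocr = normalize_extracted_text(ocr_text)
    layer = normalize_extracted_text(text_layer)
    if not layer:
        return ocr, False
    if not ocr:
        return layer, True
    # Garbled OCR output plus a usable text layer: prefer the text layer.
    if placeholder_ratio(ocr) >= 0.12 and meaningful_char_count(layer) >= 8:
        return layer, True
    # Text layer is substantially richer than the OCR output.
    if meaningful_char_count(layer) > meaningful_char_count(ocr) * 1.3:
        return layer, True
    return ocr, False


# Case 1: OCR output is mostly replacement characters, so the text layer wins.
garbled = choose_document_text(
    ocr_text="发票\ufffd\ufffd\ufffd金额",
    text_layer="发票号码 12345678  金额 100.00",
)
# Case 2: OCR output is clean and richer than the text layer, so OCR is kept.
clean = choose_document_text(ocr_text="Invoice total 100", text_layer="Invoice")
print(garbled)
print(clean)
```

The second element of the returned tuple corresponds to `used_text_layer` in `_finalize_document`, which is what triggers the summary rebuild and, for garbled PDFs, the cleared preview.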

File diff suppressed because it is too large

Some files were not shown because too many files have changed in this diff