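feat: Improve system configuration, security hardening, and knowledge base features

- .env.example: change the API base path to the relative path /api/v1 so requests can be forwarded by a proxy
- README.md: expand the project structure and startup documentation
- docker-compose.yml: add Docker Compose orchestration for containerized deployment
- docker/: add Docker deployment docs and configuration
- server_start.sh: refactor the startup script; add container-environment detection, an isolated virtualenv path, and environment-variable overrides
- deps.py: improve API dependency injection and strengthen permission checks
- admin_secret.py: improve encrypted storage and verification of the admin secret
- config.py: extend configuration management with support for binding multiple environment variables
- security.py: strengthen the security module; improve encryption and authentication
- db/base.py: improve the database base layer and connection management
- main.py: update the application entry point and register the new module routes
- models/: improve system model configuration; persist model settings
- repositories/settings.py: improve the settings repository layer and data persistence
- services/settings.py: refactor the settings service and slim down the code
- router.py: update API route configuration
- endpoints/knowledge.py: add knowledge base API endpoints
- schemas/knowledge.py: add knowledge base data models
- services/knowledge.py: add knowledge base business logic
- storage/knowledge/.index.json: knowledge base index storage
- api.js: improve the API service layer and error handling
- bootstrap.js: improve frontend initialization and the bootstrap flow
- useSetupView.js / useSystemState.js: refactor composables
- TopBar.vue: improve the top navigation bar component
- SettingsView.vue: redesign the settings page UI for a better user experience
- SetupView.vue / SetupRouteView.vue: improve the setup-wizard pages
- PoliciesView.vue: improve the policies view component
- vite.config.js: update the Vite build configuration
- web_start.sh: improve the frontend startup script
- views/scripts/: improve the JS logic of the business views
- settings-view.css: restyle the settings page
- setup-view.css: improve setup page styles
- policies-view.css: improve policies page styles
- test_auth_service.py: expand authentication service tests
- test_settings_persistence.py: strengthen settings persistence tests
- document/: add development docs and work logs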
document/development/plan/ai_agent_dual_layer_arch.md (Normal file, 169 lines)
@@ -0,0 +1,169 @@

# X-Financial Intelligent Finance System: Dual-Layer Agent Architecture Design and End-to-End Implementation Guide

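> **Core design principle: cleanly decoupling the deterministic from the probabilistic**
>
> In an enterprise finance system, compliance and accuracy are non-negotiable. Large language models (LLMs) are inherently probabilistic (they can hallucinate), so they must never be given the authority to modify core financial data or to give final approval on requests.
>
> The core of this architecture is a **two-layer line of defense**:
>
> 1. **Outer Agent (in-house process brain)**: provides 100% determinism. It is the system's executor: it strictly follows predefined workflows and fixed rules, has no "self-awareness", and is responsible only for routing, blocking, and recording.
> 2. **Inner Agent (Hermes reasoning core)**: provides strong probabilistic reasoning. It is the system's thinker and handles every complex, ambiguous, or unstructured task (such as reading long documents or spotting potential risks). Its output **never acts on the business directly**; instead it is turned into **rule configuration** or **recommendations** that the Outer Agent or a human administrator then carries out.
>
> The two layers are not two independent systems; together they form a **closed loop**: the inner layer distills rules and the outer layer enforces them; the outer layer collects data and the inner layer analyzes it. This tight collaboration keeps the system safe while giving it a high degree of intelligence.
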
---

## I. System Architecture and Responsibility Boundaries

### 1. Outer Agent: Full Control of Process and Routing

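**Essence: a highly configurable business workflow engine and intent dispatcher.**

* **Suggested tech stack**: FastAPI (backend) + Vue 3 (frontend) + PostgreSQL (persistence) + Redis (optional, for state caching).
* **Interaction model**: it faces users directly. It can look like a chat window, but the logic behind it is driven by a **state machine**.
* **Core modules and responsibilities (what to do and how to do it)**:
    * **Module 1: Intent Router**
        * **Responsibility**: identify the primary need behind each user request and send it to the correct processing pipeline.
        * **Method**:
            * *Rules first*: use simple keywords or regular expressions (for example, a request mentioning "reimbursement" or "taxi" immediately launches the reimbursement wizard).
            * *Lightweight classifier as fallback*: for ambiguous requests (such as "Why haven't I received the money for last week's Shanghai conference trip?"), call a small classification model (or a quick endpoint on the inner layer) to classify the intent as "status query" and extract the key entities (time: last week; location: Shanghai).
    * **Module 2: State & Flow Controller**
        * **Responsibility**: manage the lifecycle of every business object (such as an expense claim): "Draft" -> "Submitted" -> "Level-1 approval" -> "Finance review" -> "Paid".
        * **Method**: never let the LLM decide where the flow goes. Every transition must come from conditional logic in code (for example: if the amount is < 500 and the employee grade is M1, skip level-1 approval and go straight to finance review). The Outer Agent maintains and advances this state.
    * **Module 3: Rule Execution Engine**
        * **Responsibility**: the first hard line of defense for financial compliance. No arguments, only data.
        * **Method**: when a user submits expense data, this module queries the local `business_rules` table. If the submitted hotel cost is 850 and the rule caps it at 800, it immediately raises a **blocking error**. **This step must never call an LLM for real-time inference.**
    * **Module 4: API Gateway & Handshake Layer**
        * **Responsibility**: wrap all communication with external systems (such as ERP and HR) and with the inner Hermes layer; control concurrency and log every call.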
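
A minimal Python sketch of the rules-first router described in Module 1. The keyword table, the intent names (`reimbursement_wizard`, `invoice_help`, `status_query`) and `stub_classifier` are hypothetical; the stub stands in for the small classification model or Hermes quick endpoint and simply returns a fixed result.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RoutedIntent:
    intent: str
    source: str  # "rule" when a keyword rule matched, "classifier" otherwise
    entities: dict[str, str] = field(default_factory=dict)


# Hypothetical keyword rules, checked in order; the first match wins.
KEYWORD_RULES: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"\b(reimburs\w*|taxi)\b", re.IGNORECASE), "reimbursement_wizard"),
    (re.compile(r"\binvoices?\b", re.IGNORECASE), "invoice_help"),
]

Classifier = Callable[[str], RoutedIntent]


def route_intent(text: str, classifier: Classifier) -> RoutedIntent:
    """Try the deterministic keyword rules first; only unmatched text reaches the classifier."""
    for pattern, intent in KEYWORD_RULES:
        if pattern.search(text):
            return RoutedIntent(intent=intent, source="rule")
    return classifier(text)


def stub_classifier(text: str) -> RoutedIntent:
    # Placeholder for the small classification model (or a Hermes quick endpoint).
    # It ignores `text` and returns a fixed result so the example stays self-contained.
    return RoutedIntent(
        intent="status_query",
        source="classifier",
        entities={"time": "last week", "location": "Shanghai"},
    )
```

With this table, "Please reimburse my taxi fare" is resolved by a keyword rule, while "Why haven't I received the money for last week's Shanghai conference trip?" matches no rule and falls through to the classifier. The word boundaries in the patterns keep a keyword that merely appears inside another word from misrouting a request.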
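
Module 2's principle that flow control stays in code can be sketched as an explicit transition table. This is an illustration only: the status names mirror the stages listed above, the `< 500 and M1` condition is the example from the text, and rejection or return-to-draft paths are left out for brevity. `transition` is meant to be the only code path that changes `status`.

```python
from dataclasses import dataclass
from enum import Enum


class ClaimStatus(str, Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    LEVEL1_APPROVAL = "level1_approval"
    FINANCE_REVIEW = "finance_review"
    PAID = "paid"


# Every legal edge of the workflow; anything else is rejected.
ALLOWED_TRANSITIONS: dict[ClaimStatus, set[ClaimStatus]] = {
    ClaimStatus.DRAFT: {ClaimStatus.SUBMITTED},
    ClaimStatus.SUBMITTED: {ClaimStatus.LEVEL1_APPROVAL, ClaimStatus.FINANCE_REVIEW},
    ClaimStatus.LEVEL1_APPROVAL: {ClaimStatus.FINANCE_REVIEW},
    ClaimStatus.FINANCE_REVIEW: {ClaimStatus.PAID},
    ClaimStatus.PAID: set(),
}


class IllegalTransition(Exception):
    pass


@dataclass
class Claim:
    amount: float
    employee_grade: str
    status: ClaimStatus = ClaimStatus.DRAFT


def next_status(claim: Claim) -> ClaimStatus:
    """Pick the next state from plain code conditions; no LLM is consulted."""
    if claim.status is ClaimStatus.SUBMITTED:
        # Example rule from the text: amount < 500 and grade M1 skips level-1 approval.
        if claim.amount < 500 and claim.employee_grade == "M1":
            return ClaimStatus.FINANCE_REVIEW
        return ClaimStatus.LEVEL1_APPROVAL
    candidates = ALLOWED_TRANSITIONS[claim.status]
    if not candidates:
        raise IllegalTransition(f"{claim.status.value} is a terminal state")
    (only,) = candidates  # every other state has exactly one successor
    return only


def transition(claim: Claim, target: ClaimStatus) -> None:
    """The single entry point allowed to change `claim.status`."""
    if target not in ALLOWED_TRANSITIONS[claim.status]:
        raise IllegalTransition(f"{claim.status.value} -> {target.value} is not allowed")
    claim.status = target


def advance(claim: Claim) -> ClaimStatus:
    transition(claim, next_status(claim))
    return claim.status
```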
|
|
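
### 2. Inner Agent (Hermes): Distiller of Unstructured Information and Deep Thinker

**Essence: a strictly isolated intelligent computation engine dedicated to the "soft logic" that people handle well but conventional code struggles with.**

* **Suggested tech stack**: the Hermes framework + a vector database (such as Milvus or PGVector) + a strong LLM (such as GPT-4 or an open-source model).
* **Interaction model**: invisible to users; it exists only as a backend service for the Outer Agent.
* **Core modules and responsibilities (what to do and how to do it)**:
    * **Module 1: Policy Distiller (the key to turning knowledge into action)**
        * **Responsibility**: break down the wall between the knowledge base (static files) and the business flow (live code).
        * **Method (core idea)**:
            1. *Trigger*: an administrator uploads a document such as "New Travel Rules.pdf".
            2. *Parse*: Hermes reads the document paragraph by paragraph.
            3. *Extract*: a carefully designed **few-shot prompt chain** forces the model to identify specific "control variables".

               *(Example prompt: "You are a professional financial compliance auditor. Read the following paragraph. If it contains any rule about expense caps, grade restrictions, or approval levels, output it strictly according to this JSON Schema: {category, location, level_req, max_amount, is_hard_limit}. If none is found, output nothing.")*
            4. *Write back*: Hermes converts the extracted JSON into a standard SQL update statement (or calls a dedicated API) to update the `business_rules` table that the Outer Agent relies on.
    * **Module 2: Deep RAG & Interpretation**
        * **Responsibility**: give users personalized interpretations of complex policies.
        * **Method**: when the Outer Agent cannot answer a user's compliance question (intent classified as "policy inquiry"), it forwards the request to Hermes. Hermes retrieves the relevant passages from the vector store and, combining them with the user's current context (such as employee grade and travel destination), produces a coherent, human-friendly answer.
    * **Module 3: Asynchronous Risk Auditor**
        * **Responsibility**: like a seasoned accountant, look for subtle clues in large volumes of completed or in-progress business data.
        * **Method**:
            1. *Scheduled job*: runs every day in the early morning.
            2. *Data aggregation*: pull the day's reimbursement records from the outer database, with sensitive personal data removed.
            3. *Pattern recognition*: apply targeted prompts (for example, to look for "split claims" or "abnormally frequent taxi receipts").
            4. *Report*: produce a structured risk-alert report and store it in a dedicated table for administrators to review the next morning, rather than taking direct action such as freezing an employee's account.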
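
The aggregation step of Module 3 (removing sensitive personal data before anything leaves the outer layer) might look like this. The field names, the HMAC-based pseudonym, and the prompt wording are all assumptions for illustration. The keyed hash keeps one employee's claims linkable within a snapshot, but the token cannot be mapped back to an employee ID without the key held by the outer layer.

```python
import hashlib
import hmac
import json

# Hypothetical column names; adjust to the real reimbursement table.
SENSITIVE_FIELDS = {"employee_id", "employee_name", "id_number", "bank_account", "phone"}


def pseudonymize(employee_id: str, secret: bytes) -> str:
    """Keyed hash: stable per employee, unlinkable to the ID without `secret`."""
    return hmac.new(secret, employee_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def build_audit_snapshot(records: list[dict], secret: bytes) -> list[dict]:
    snapshot = []
    for record in records:
        cleaned = {key: value for key, value in record.items() if key not in SENSITIVE_FIELDS}
        cleaned["employee_token"] = pseudonymize(record["employee_id"], secret)
        snapshot.append(cleaned)
    return snapshot


def build_audit_prompt(snapshot: list[dict]) -> str:
    instructions = (
        "You are an experienced accountant reviewing one day of expense claims. "
        "Look for split claims and unusually frequent taxi receipts. "
        "Reply with a JSON list of {employee_token, pattern, evidence}; reply [] if nothing is suspicious."
    )
    return instructions + "\n" + json.dumps(snapshot, ensure_ascii=False)
```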
|
|
---
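
## II. Core Communication Protocol (The Handshake): How the Two Layers Exchange Data

The dual-layer architecture succeeds or fails on whether the two layers can exchange information smoothly and securely, so the interface protocol between them must be clearly defined.

### 1. Synchronous Query Interface (Outer -> Inner: Asking for Answers)

Triggered when the outer layer meets "soft logic" it cannot handle itself.
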
* **Endpoint (example)**: `POST /hermes/api/v1/consult`
* **Outer-layer request body**:

    ```json
    {
      "context": {
        "user_id": "emp_1001",
        "current_task": "travel_reimbursement",
        "form_data": {"city": "Beijing", "amount": 900}
      },
      "query": "All the hotels were full because of a trade show, so I could only book one for 900. Can I get reimbursed?"
    }
    ```

* **Hermes response body** (`action_recommendation` is the action Hermes recommends the outer layer take):

    ```json
    {
      "status": "success",
      "interpretation": "Under Article 15 of the Travel Management Policy, the limit may be raised by 20% during trade shows. Your standard is 800, which becomes 960 after the increase, so this expense can be reimbursed.",
      "action_recommendation": "require_special_approval",
      "citations": ["policy_doc_v2_page_4"]
    }
    ```
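
On the outer side, the consult call reduces to building the request body and validating the reply. In this hedged sketch the HTTP transport is omitted, and the allowlist `ALLOWED_ACTIONS` and its fallback are assumptions. The point it illustrates is that `action_recommendation` is only a suggestion, so an unrecognized value becomes a manual review instead of being executed.

```python
import json
from dataclasses import dataclass

# Hypothetical allowlist: the only recommendations the outer layer knows how to act on.
ALLOWED_ACTIONS = {"none", "require_special_approval"}
FALLBACK_ACTION = "require_special_approval"  # fail safe: route to a human


@dataclass(frozen=True)
class ConsultResult:
    interpretation: str
    action: str
    citations: list[str]


def build_consult_request(user_id: str, task: str, form_data: dict, query: str) -> bytes:
    body = {
        "context": {"user_id": user_id, "current_task": task, "form_data": form_data},
        "query": query,
    }
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


def parse_consult_response(raw: bytes) -> ConsultResult:
    data = json.loads(raw)
    if data.get("status") != "success":
        raise RuntimeError(f"Hermes consult failed with status {data.get('status')!r}")
    action = data.get("action_recommendation") or "none"
    if action not in ALLOWED_ACTIONS:
        # An unknown recommendation is never executed; it becomes a manual review.
        action = FALLBACK_ACTION
    return ConsultResult(
        interpretation=str(data.get("interpretation", "")),
        action=action,
        citations=[str(item) for item in data.get("citations", [])],
    )
```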

### 2. Asynchronous Task Interface (Outer -> Inner: Dispatching Long-Running Work)

Used, for example, to generate a long analytical report or to run a full risk sweep.

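* **Flow**:
    1. The outer layer calls `POST /hermes/api/v1/jobs/generate_report`.
    2. Hermes immediately returns `202 Accepted` along with a `job_id`.
    3. Hermes processes the job in the background.
    4. When the job finishes, Hermes calls the outer layer's notification endpoint via a webhook, and the outer layer then tells the user through a system message: "Your report is ready."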

### 3. Rule Push Mechanism (Inner -> Outer: Automated Rule-Making)

This is the most important reverse channel: how do rules distilled by the inner layer take effect?

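* **Flow**:
    1. Hermes distills a new rule.
    2. Hermes calls a privileged outer-layer API (such as `POST /admin/api/rules/sync`) to push the rule payload.
    3. On receipt, the Outer Agent performs an `UPSERT` on the `business_rules` table.
    4. *(Optional but strongly recommended)*: the new rule enters a "pending activation" state and takes effect only after a human administrator clicks "Confirm and apply new rules" in the system.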

---

## III. Phased Implementation Roadmap

Development should follow the principle of "infrastructure before upper layers, determinism before intelligence".

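### Phase 1: Skeleton and Foundations (Foundation & Outer Shell)

*Goal: build a robust, process-driven system that works even without AI, and establish the isolation between the two layers.*

1. **Validate the architectural split**: at the server level, make sure the Outer Agent (FastAPI) and the inner Hermes run in separate processes (or containers) and communicate only over HTTP/gRPC.
2. **Implement the dynamic rule engine (core infrastructure)**:
    * Design the `business_rules` table in PostgreSQL. It must be highly extensible (for example, use a `JSONB` column to store the specific constraint parameters).
    * In the Outer Agent, build a Rule Validation Service that intercepts every reimbursement action before it happens and checks it against `business_rules`.
3. **Close the standard workflow loop**: build a complete, hard-rule-driven travel reimbursement flow (fill in form -> validate -> submit -> approve), and verify that the system runs well on hard rules alone.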

### Phase 2: Knowledge Injection and Basic Q&A (Hermes RAG Integration)

*Goal: give the system the ability to answer questions.*

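1. **Inner-layer infrastructure**: set up the Hermes environment and connect it to the vector database.
2. **Document cleaning pipeline (ETL pipeline)**: clean the existing financial policy PDF/Word documents, split them into chunks, embed them, and load them into the vector store.
3. **Q&A bridge**:
    * Add an "AI consultation" floating panel or standalone page to the outer frontend (Vue 3).
    * The Outer Agent receives the question, attaches the user's context (role, permissions), and forwards both to Hermes.
    * Verify that Hermes answers accurately from the vector store content and cites its sources.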
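
A minimal chunking step for the ETL pipeline, assuming the text has already been extracted from the PDF/Word files. Fixed-size character windows with overlap are a simple stand-in (real pipelines often split on headings, sentences, or tokens), and `chunk_text` is a hypothetical name. Each chunk keeps its `source` and position so that answers can cite where they came from, as the `citations` field of the consult response does.

```python
def chunk_text(text: str, source: str, max_chars: int = 500, overlap: int = 100) -> list[dict]:
    """Split extracted document text into overlapping windows with provenance metadata."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    cleaned = " ".join(text.split())  # collapse line breaks and spacing left by extraction
    chunks: list[dict] = []
    step = max_chars - overlap
    for start in range(0, len(cleaned), step):
        chunks.append(
            {
                "source": source,
                "index": len(chunks),
                "start": start,
                "text": cleaned[start : start + max_chars],
            }
        )
        if start + max_chars >= len(cleaned):
            break  # this window already reaches the end of the text
    return chunks
```

The overlap means a sentence cut at one window's edge still appears whole in the next window, at the cost of some duplicated storage.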

### Phase 3: Core Challenge, Automated Rule-Making and Two-Layer Integration (Policy Distillation & Sync)

*Goal: automatically turn "dead documents" into "live rules".*

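1. **Distillation prompt engineering**: iterate on the policy-distillation prompt in Hermes, tuning it to the way your company typically words its policies.
2. **Structured extraction testing**: manually upload different versions of policy documents and check whether Hermes outputs the rule parameters as JSON reliably and accurately.
3. **End-to-end integration**:
    * Build the API through which Hermes pushes rules to the outer layer.
    * Run the full-chain test: an administrator uploads a new document in the admin UI -> Hermes parses it in the background -> the outer rule store is updated automatically -> the frontend immediately enforces the new amount limits.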
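
Structured-extraction testing can be automated with a small golden set: for each policy document, a hand-checked list of the rules it should yield. The sketch below (hypothetical names; `KEY_FIELDS` decides what counts as "the same rule") diffs one extraction run against the golden set and checks that repeated runs agree, since an unstable extraction is itself a warning sign.

```python
import json
from typing import Any, Callable

Rule = dict[str, Any]
KEY_FIELDS = ("category", "location", "level_req")  # what identifies "the same" rule


def _key(rule: Rule) -> tuple:
    return tuple(rule.get(name) for name in KEY_FIELDS)


def compare_to_golden(expected: list[Rule], actual: list[Rule]) -> dict[str, list[tuple]]:
    """Diff one extraction run against a hand-checked golden set."""
    exp = {_key(rule): rule for rule in expected}
    act = {_key(rule): rule for rule in actual}
    return {
        "missing": [key for key in exp if key not in act],
        "unexpected": [key for key in act if key not in exp],
        "mismatched": [key for key in exp if key in act and exp[key] != act[key]],
    }


def is_stable(extract: Callable[[str], list[Rule]], document: str, runs: int = 3) -> bool:
    """Run the same extraction several times; every run must produce the same rule set."""

    def normalize(rules: list[Rule]) -> list[str]:
        return sorted(json.dumps(rule, sort_keys=True) for rule in rules)

    first = normalize(extract(document))
    return all(normalize(extract(document)) == first for _ in range(runs - 1))
```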

### Phase 4: Advanced Evolution, Asynchronous Auditing and Proactive Defense (Proactive Risk Auditing)

*Goal: move the system from reactive responses to proactive protection.*

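1. **Secure data tunnel**: set up a channel that passes de-identified business snapshots from the outer business database to Hermes.
2. **Define risk patterns**: identify 3-5 typical financial risk patterns (such as unusually clustered meal invoices or high transport costs on consecutive days).
3. **Hermes audit job**: write the scheduled-job logic that uses the LLM's reasoning to compare these patterns against the day's business snapshot.
4. **Risk dashboard**: build a "risk report console" in the outer system's admin backend to display the alerts Hermes generates.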

---

## IV. Key Risks and Mitigations

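1. **LLM hallucinations polluting the rule store**:
    * **Mitigation**: every hard rule Hermes distills (especially amounts and approval levels) must pass a **human-in-the-loop** review before it is written to the outer layer's production store. The system prompts: "Policy update detected; 5 new rules extracted. Administrator, please confirm before applying."
2. **State machine chaos**:
    * **Mitigation**: the Outer Agent's flow-control code must use strong typing and strict transactions. No component (including the AI) may ever modify the `status` column in the database directly without passing the state machine's validity checks.
3. **Performance bottlenecks**:
    * **Mitigation**: everything the outer layer must do itself (blocking checks, lookups) must complete within milliseconds. Every operation that calls Hermes (Q&A, distillation, analysis) must be asynchronous or give clear loading feedback.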
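
One way to keep Hermes calls from stalling the outer layer is a hard time budget around each call. This is a sketch using `asyncio.wait_for`, with hypothetical names: on timeout the call is cancelled and a fallback value is returned so the UI can show a loading or "still processing" state. Long-running work such as distillation or full analysis is better sent through the asynchronous task interface instead.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_budget(
    call: Callable[[], Awaitable[T]], timeout_s: float, fallback: T
) -> tuple[T, bool]:
    """Await a Hermes call for at most `timeout_s` seconds.

    Returns (value, timed_out). On timeout the pending call is cancelled and
    `fallback` is returned, so the outer flow stays responsive.
    """
    try:
        return await asyncio.wait_for(call(), timeout=timeout_s), False
    except asyncio.TimeoutError:
        return fallback, True
```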

@@ -1,5 +1,8 @@
from collections.abc import Generator
from dataclasses import dataclass
from typing import Annotated

from fastapi import Depends, Header, HTTPException, status
from sqlalchemy.orm import Session

from app.db.session import get_session_factory
@@ -11,3 +14,49 @@ def get_db() -> Generator[Session, None, None]:
        yield db
    finally:
        db.close()


@dataclass(slots=True)
class CurrentUserContext:
    username: str
    name: str
    role_codes: list[str]
    is_admin: bool


def get_current_user(
    x_auth_username: Annotated[str | None, Header()] = None,
    x_auth_name: Annotated[str | None, Header()] = None,
    x_auth_role_codes: Annotated[str | None, Header()] = None,
    x_auth_is_admin: Annotated[str | None, Header()] = None,
) -> CurrentUserContext:
    # Identity comes from the X-Auth-* request headers. Clients can set these freely,
    # so they are only trustworthy if an upstream layer sets or strips them.
    role_codes = [item.strip() for item in (x_auth_role_codes or "").split(",") if item.strip()]
    is_admin = str(x_auth_is_admin or "").strip().lower() in {"1", "true", "yes", "on"}

    username = (x_auth_username or "").strip()
    name = (x_auth_name or username).strip()

    if not username and not name:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="请先登录后再访问知识库。",  # "Please log in before accessing the knowledge base."
        )

    return CurrentUserContext(
        username=username or name,
        name=name or username,
        role_codes=role_codes,
        is_admin=is_admin,
    )


def require_admin_user(
    current_user: Annotated[CurrentUserContext, Depends(get_current_user)],
) -> CurrentUserContext:
    # Administrators and users holding the "manager" role may modify the knowledge base.
    if current_user.is_admin or "manager" in current_user.role_codes:
        return current_user

    raise HTTPException(
        status_code=status.HTTP_403_FORBIDDEN,
        detail="只有管理员可以上传、删除或修改知识库文件。",  # "Only administrators can upload, delete, or modify knowledge base files."
    )

server/src/app/api/v1/endpoints/knowledge.py (Normal file, 76 lines)
@@ -0,0 +1,76 @@
from __future__ import annotations

from typing import Annotated

from fastapi import APIRouter, Depends, HTTPException, Query, Request, status
from fastapi.responses import FileResponse

from app.api.deps import CurrentUserContext, get_current_user, require_admin_user
from app.schemas.knowledge import (
    KnowledgeActionResponse,
    KnowledgeDocumentDetailRead,
    KnowledgeLibraryRead,
)
from app.services.knowledge import KnowledgeService

router = APIRouter(prefix="/knowledge")


@router.get("/library", response_model=KnowledgeLibraryRead)
def get_knowledge_library(
    _: Annotated[CurrentUserContext, Depends(get_current_user)],
) -> KnowledgeLibraryRead:
    return KnowledgeService().list_library()


@router.get("/documents/{document_id}", response_model=KnowledgeDocumentDetailRead)
def get_knowledge_document(
    document_id: str,
    _: Annotated[CurrentUserContext, Depends(get_current_user)],
) -> KnowledgeDocumentDetailRead:
    try:
        return KnowledgeService().get_document_detail(document_id)
    except FileNotFoundError as exc:
        # "Knowledge base file not found."
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="知识库文件不存在。") from exc


@router.post("/documents", response_model=KnowledgeDocumentDetailRead, status_code=status.HTTP_201_CREATED)
async def upload_knowledge_document(
    request: Request,
    folder: Annotated[str, Query(min_length=1)],
    filename: Annotated[str, Query(min_length=1)],
    current_user: Annotated[CurrentUserContext, Depends(require_admin_user)],
) -> KnowledgeDocumentDetailRead:
    # The request body is the raw file bytes (not multipart form data);
    # the target folder and file name travel as query parameters.
    content = await request.body()
    try:
        return KnowledgeService().upload_document(folder, filename, content, current_user)
    except ValueError as exc:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(exc)) from exc


@router.delete("/documents/{document_id}", response_model=KnowledgeActionResponse)
def delete_knowledge_document(
    document_id: str,
    _: Annotated[CurrentUserContext, Depends(require_admin_user)],
) -> KnowledgeActionResponse:
    try:
        KnowledgeService().delete_document(document_id)
    except FileNotFoundError as exc:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="知识库文件不存在。") from exc

    return KnowledgeActionResponse(detail="知识库文件已删除。")  # "Knowledge base file deleted."


@router.get("/documents/{document_id}/content")
def get_knowledge_document_content(
    document_id: str,
    _: Annotated[CurrentUserContext, Depends(get_current_user)],
    disposition: Annotated[str, Query(pattern="^(inline|attachment)$")] = "inline",
) -> FileResponse:
    try:
        file_path, media_type, filename = KnowledgeService().get_document_content(document_id)
    except FileNotFoundError as exc:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="知识库文件不存在。") from exc

    # Pass the requested disposition through; FileResponse defaults to "attachment",
    # which would force a download even when inline preview was requested.
    return FileResponse(
        file_path,
        media_type=media_type,
        filename=filename,
        content_disposition_type=disposition,
    )

@@ -4,6 +4,7 @@ from app.api.v1.endpoints.auth import router as auth_router
from app.api.v1.endpoints.bootstrap import router as bootstrap_router
from app.api.v1.endpoints.employees import router as employees_router
from app.api.v1.endpoints.health import router as health_router
from app.api.v1.endpoints.knowledge import router as knowledge_router
from app.api.v1.endpoints.reimbursements import router as reimbursements_router
from app.api.v1.endpoints.settings import router as settings_router

@@ -11,6 +12,7 @@ router = APIRouter()
router.include_router(health_router, tags=["health"])
router.include_router(bootstrap_router, tags=["bootstrap"])
router.include_router(auth_router, tags=["auth"])
router.include_router(knowledge_router, tags=["knowledge"])
router.include_router(employees_router, prefix="/employees", tags=["employees"])
router.include_router(reimbursements_router, prefix="/reimbursements", tags=["reimbursements"])
router.include_router(settings_router, tags=["settings"])

@@ -51,6 +51,7 @@ class Settings(BaseSettings):
    log_level: str = Field(default="INFO", alias="LOG_LEVEL")
    log_dir: str = Field(default="logs", alias="LOG_DIR")
    log_file_enabled: bool = Field(default=True, alias="LOG_FILE_ENABLED")
    storage_root_dir: str = Field(default="storage", alias="STORAGE_ROOT_DIR")

    @property
    def resolved_database_url(self) -> str:
@@ -62,6 +63,13 @@ class Settings(BaseSettings):
            f"@{self.postgres_host}:{self.postgres_port}/{self.postgres_db}"
        )

    @property
    def resolved_storage_root_dir(self) -> Path:
        path = Path(self.storage_root_dir)
        if not path.is_absolute():
            path = SERVER_DIR / path
        return path.resolve()


@lru_cache
def get_settings() -> Settings:

@@ -8,6 +8,7 @@ from app.core.config import get_settings
from app.core.logging import get_logger, setup_logging
from app.middleware.logging import AccessLogMiddleware
from app.services.employee import prepare_employee_directory
from app.services.knowledge import prepare_knowledge_library


def create_app() -> FastAPI:
@@ -50,6 +51,7 @@ def create_app() -> FastAPI:
    @app.on_event("startup")
    def _on_startup() -> None:
        prepare_employee_directory()
        prepare_knowledge_library()
        logger.info(
            "Server ready - host=%s port=%s prefix=%s",
            settings.app_host,

@@ -1 +1 @@
-__all__ = ["employee", "reimbursement"]
+__all__ = ["employee", "knowledge", "reimbursement"]

server/src/app/schemas/knowledge.py (Normal file, 61 lines)
@@ -0,0 +1,61 @@
from __future__ import annotations

from pydantic import BaseModel, Field


class KnowledgeFolderRead(BaseModel):
    name: str
    count: int
    icon: str = "mdi mdi-folder"


class KnowledgePreviewStatRead(BaseModel):
    label: str
    value: str


class KnowledgePreviewBlockRead(BaseModel):
    heading: str
    lines: list[str] = Field(default_factory=list)


class KnowledgePreviewPageRead(BaseModel):
    title: str
    subtitle: str
    stats: list[KnowledgePreviewStatRead] = Field(default_factory=list)
    blocks: list[KnowledgePreviewBlockRead] = Field(default_factory=list)


class KnowledgeDocumentRead(BaseModel):
    id: str
    name: str
    folder: str
    tag: str
    time: str
    version: str
    state: str
    stateTone: str
    owner: str
    icon: str
    fileType: str
    fileTypeLabel: str
    summary: str
    mimeType: str
    extension: str
    sizeBytes: int
    canPreview: bool = False


class KnowledgeDocumentDetailRead(KnowledgeDocumentRead):
    previewKind: str
    previewPages: list[KnowledgePreviewPageRead] = Field(default_factory=list)


class KnowledgeLibraryRead(BaseModel):
    folders: list[KnowledgeFolderRead] = Field(default_factory=list)
    documents: list[KnowledgeDocumentRead] = Field(default_factory=list)


class KnowledgeActionResponse(BaseModel):
    ok: bool = True
    detail: str
server/src/app/services/knowledge.py (Normal file, 634 lines)
@@ -0,0 +1,634 @@
from __future__ import annotations

import hashlib
import json
import mimetypes
import re
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
from uuid import uuid4
from xml.etree import ElementTree
from zipfile import BadZipFile, ZipFile

from app.api.deps import CurrentUserContext
from app.core.config import get_settings
from app.core.logging import get_logger
from app.schemas.knowledge import (
    KnowledgeDocumentDetailRead,
    KnowledgeDocumentRead,
    KnowledgeFolderRead,
    KnowledgeLibraryRead,
    KnowledgePreviewBlockRead,
    KnowledgePreviewPageRead,
    KnowledgePreviewStatRead,
)

logger = get_logger("app.services.knowledge")

# Folder names double as on-disk directory names under storage/knowledge/.
FIXED_KNOWLEDGE_FOLDERS = [
    "财务知识库",  # finance knowledge base
    "制度政策",  # policies and regulations
    "报销制度",  # reimbursement policy
    "差旅规范",  # travel guidelines
    "发票管理",  # invoice management
    "税务合规",  # tax compliance
    "预算管理",  # budget management
    "财务共享",  # shared financial services
    "培训资料",  # training materials
    "常见问答",  # FAQ
]

ICON_BY_TYPE = {
    "pdf": "mdi mdi-file-document-outline-pdf pdf",
    "word": "mdi mdi-file-document-outline-word word",
    "excel": "mdi mdi-file-document-outline-excel excel",
    "ppt": "mdi mdi-file-powerpoint-box ppt",
    "image": "mdi mdi-file-image-outline image",
    "text": "mdi mdi-file-document-outline text",
    "archive": "mdi mdi-folder-zip-outline archive",
    "binary": "mdi mdi-file-outline",
}

TEXT_EXTENSIONS = {"txt", "md", "csv", "json", "xml", "yml", "yaml", "log"}
WORD_EXTENSIONS = {"doc", "docx"}
EXCEL_EXTENSIONS = {"xls", "xlsx", "csv"}
PPT_EXTENSIONS = {"ppt", "pptx"}
IMAGE_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "bmp", "webp", "svg"}
ARCHIVE_EXTENSIONS = {"zip", "rar", "7z"}
STRUCTURED_PREVIEW_EXTENSIONS = {"docx", "xlsx", "pptx"} | TEXT_EXTENSIONS
INLINE_PREVIEW_EXTENSIONS = {"pdf"} | IMAGE_EXTENSIONS


def prepare_knowledge_library() -> None:
    KnowledgeService().ensure_library_ready()


class KnowledgeService:
    def __init__(self, storage_root: Path | None = None) -> None:
        settings = get_settings()
        self.storage_root = Path(storage_root or settings.resolved_storage_root_dir)
        self.library_root = self.storage_root / "knowledge"
        self.index_path = self.library_root / ".index.json"

    def ensure_library_ready(self) -> None:
        self.library_root.mkdir(parents=True, exist_ok=True)
        for folder_name in FIXED_KNOWLEDGE_FOLDERS:
            (self.library_root / folder_name).mkdir(parents=True, exist_ok=True)

        if not self.index_path.exists():
            self._save_index({"version": 1, "documents": []})

        index = self._load_index()
        if self._reconcile_index(index):
            self._save_index(index)

    def list_library(self) -> KnowledgeLibraryRead:
        documents = self._load_documents()
        folders = [
            KnowledgeFolderRead(
                name=folder_name,
                count=sum(1 for item in documents if item.folder == folder_name),
                icon="mdi mdi-folder-open" if folder_name == "差旅规范" else "mdi mdi-folder",
            )
            for folder_name in FIXED_KNOWLEDGE_FOLDERS
        ]
        return KnowledgeLibraryRead(folders=folders, documents=documents)

    def get_document_detail(self, document_id: str) -> KnowledgeDocumentDetailRead:
        self.ensure_library_ready()
        index = self._load_index()
        entry = self._require_entry(index, document_id)
        preview_kind, preview_pages = self._build_preview(entry)
        document = self._serialize_document(entry)
        return KnowledgeDocumentDetailRead(
            **document.model_dump(),
            previewKind=preview_kind,
            previewPages=preview_pages,
        )

    def upload_document(
        self,
        folder: str,
        filename: str,
        content: bytes,
        current_user: CurrentUserContext,
    ) -> KnowledgeDocumentDetailRead:
        self.ensure_library_ready()
        normalized_folder = self._normalize_folder(folder)
        normalized_name = self._normalize_filename(filename)

        if not content:
            raise ValueError("上传文件不能为空。")  # "The uploaded file must not be empty."

        # Re-uploading a file with the same name (case-insensitive) into the same
        # folder replaces it and bumps its version instead of creating a new document.
        index = self._load_index()
        existing_entry = next(
            (
                item
                for item in index["documents"]
                if item["folder"] == normalized_folder
                and item["original_name"].lower() == normalized_name.lower()
            ),
            None,
        )

        document_id = existing_entry["id"] if existing_entry else uuid4().hex
        stored_name = f"{document_id}__{normalized_name}"
        target_path = self.library_root / normalized_folder / stored_name

        if existing_entry is not None and existing_entry["stored_name"] != stored_name:
            old_path = self.library_root / existing_entry["folder"] / existing_entry["stored_name"]
            if old_path.exists():
                old_path.unlink()

        target_path.write_bytes(content)

        now = datetime.now(UTC).isoformat()
        mime_type = mimetypes.guess_type(normalized_name)[0] or "application/octet-stream"
        checksum = hashlib.sha256(content).hexdigest()
        extension = self._extract_extension(normalized_name)

        if existing_entry is None:
            entry = {
                "id": document_id,
                "folder": normalized_folder,
                "original_name": normalized_name,
                "stored_name": stored_name,
                "mime_type": mime_type,
                "extension": extension,
                "size_bytes": len(content),
                "sha256": checksum,
                "created_at": now,
                "updated_at": now,
                "uploaded_by": current_user.name,
                "version_number": 1,
            }
            index["documents"].append(entry)
            logger.info(
                "Knowledge document uploaded id=%s folder=%s filename=%s by=%s",
                document_id,
                normalized_folder,
                normalized_name,
                current_user.name,
            )
        else:
            existing_entry.update(
                {
                    "stored_name": stored_name,
                    "mime_type": mime_type,
                    "extension": extension,
                    "size_bytes": len(content),
                    "sha256": checksum,
                    "updated_at": now,
                    "uploaded_by": current_user.name,
                    "version_number": int(existing_entry.get("version_number", 1)) + 1,
                }
            )
            entry = existing_entry
            logger.info(
                "Knowledge document updated id=%s folder=%s filename=%s by=%s",
                document_id,
                normalized_folder,
                normalized_name,
                current_user.name,
            )

        self._save_index(index)
        return self.get_document_detail(document_id)

    def delete_document(self, document_id: str) -> None:
        self.ensure_library_ready()
        index = self._load_index()
        entry = self._require_entry(index, document_id)
        file_path = self._resolve_document_path(entry)
        if file_path.exists():
            file_path.unlink()

        index["documents"] = [item for item in index["documents"] if item["id"] != document_id]
        self._save_index(index)
        logger.info("Knowledge document deleted id=%s filename=%s", document_id, entry["original_name"])

    def get_document_content(self, document_id: str) -> tuple[Path, str, str]:
        self.ensure_library_ready()
        index = self._load_index()
        entry = self._require_entry(index, document_id)
        file_path = self._resolve_document_path(entry)

        if not file_path.exists():
            raise FileNotFoundError(entry["original_name"])

        return file_path, entry["mime_type"], entry["original_name"]

    def _load_documents(self) -> list[KnowledgeDocumentRead]:
        self.ensure_library_ready()
        index = self._load_index()
        self._reconcile_index(index)
        self._save_index(index)

        documents = [self._serialize_document(entry) for entry in index["documents"]]
        return sorted(documents, key=lambda item: item.time, reverse=True)

    def _serialize_document(self, entry: dict[str, Any]) -> KnowledgeDocumentRead:
        extension = entry.get("extension") or self._extract_extension(entry["original_name"])
        file_type = self._resolve_file_type(extension)
        size_bytes = int(entry.get("size_bytes") or 0)
        updated_at = self._format_time(entry.get("updated_at") or entry.get("created_at"))

        return KnowledgeDocumentRead(
            id=entry["id"],
            name=entry["original_name"],
            folder=entry["folder"],
            tag=f"{entry['folder']} / {extension.upper() or 'FILE'}",
            time=updated_at,
            version=f"v{int(entry.get('version_number', 1))}.0",
            state="已发布",  # "Published"
            stateTone="success",
            owner=entry.get("uploaded_by") or "系统导入",  # "Imported by the system"
            icon=ICON_BY_TYPE.get(file_type, ICON_BY_TYPE["binary"]),
            fileType=file_type,
            fileTypeLabel=self._resolve_file_type_label(file_type),
            summary=f"{entry['folder']} · {extension.upper() or 'FILE'} · {self._format_size(size_bytes)}",
            mimeType=entry.get("mime_type") or "application/octet-stream",
            extension=extension,
            sizeBytes=size_bytes,
            canPreview=self._can_preview(extension),
        )

def _build_preview(
|
||||||
|
self, entry: dict[str, Any]
|
||||||
|
) -> tuple[str, list[KnowledgePreviewPageRead]]:
|
||||||
|
extension = self._extract_extension(entry["original_name"])
|
||||||
|
file_path = self._resolve_document_path(entry)
|
||||||
|
|
||||||
|
if extension == "pdf":
|
||||||
|
return "pdf", []
|
||||||
|
|
||||||
|
if extension in IMAGE_EXTENSIONS:
|
||||||
|
return "image", []
|
||||||
|
|
||||||
|
if extension in TEXT_EXTENSIONS:
|
||||||
|
text = self._read_text_preview(file_path)
|
||||||
|
return "text", [self._build_text_preview_page(entry, text)]
|
||||||
|
|
||||||
|
if extension == "docx":
|
||||||
|
text = self._extract_docx_text(file_path)
|
||||||
|
return "text", [self._build_text_preview_page(entry, text)]
|
||||||
|
|
||||||
|
if extension == "xlsx":
|
||||||
|
return "table", [self._build_xlsx_preview_page(entry, file_path)]
|
||||||
|
|
||||||
|
if extension == "pptx":
|
||||||
|
return "slides", self._build_pptx_preview_pages(entry, file_path)
|
||||||
|
|
||||||
|
return (
|
||||||
|
"unsupported",
|
||||||
|
[
|
||||||
|
KnowledgePreviewPageRead(
|
||||||
|
title=entry["original_name"],
|
||||||
|
subtitle="当前格式暂不支持在线解析预览。",
|
||||||
|
stats=[
|
||||||
|
KnowledgePreviewStatRead(label="文件格式", value=extension.upper() or "FILE"),
|
||||||
|
KnowledgePreviewStatRead(label="文件大小", value=self._format_size(entry["size_bytes"])),
|
||||||
|
KnowledgePreviewStatRead(label="建议操作", value="下载后查看"),
|
||||||
|
],
|
||||||
|
blocks=[
|
||||||
|
KnowledgePreviewBlockRead(
|
||||||
|
heading="预览说明",
|
||||||
|
lines=[
|
||||||
|
"当前系统已支持该文件的上传、下载和权限控制。",
|
||||||
|
"如需在线预览,可后续接入专门的文档转换服务。",
|
||||||
|
],
|
||||||
|
)
|
||||||
|
],
|
||||||
|
)
|
||||||
|
],
|
||||||
|
)
|
||||||
|
|
||||||
    def _build_text_preview_page(
        self, entry: dict[str, Any], text: str
    ) -> KnowledgePreviewPageRead:
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        if not lines:
            lines = ["文件内容为空,或当前文档未提取到可展示文本。"]

        # Show at most 24 non-empty lines, split into blocks of 8.
        groups = [lines[index : index + 8] for index in range(0, min(len(lines), 24), 8)]
        blocks = [
            KnowledgePreviewBlockRead(heading=f"内容片段 {index + 1}", lines=group)
            for index, group in enumerate(groups)
        ]
        # Fall back to the file name, as _build_preview does, if the entry has no "extension" key.
        extension = entry.get("extension") or self._extract_extension(entry["original_name"])

        return KnowledgePreviewPageRead(
            title=entry["original_name"],
            subtitle="文本提取预览",
            stats=[
                KnowledgePreviewStatRead(label="文件格式", value=extension.upper() or "TEXT"),
                KnowledgePreviewStatRead(label="可见行数", value=str(len(lines))),
                KnowledgePreviewStatRead(label="文件大小", value=self._format_size(entry["size_bytes"])),
            ],
            blocks=blocks,
        )

    def _build_xlsx_preview_page(
        self, entry: dict[str, Any], file_path: Path
    ) -> KnowledgePreviewPageRead:
        rows, sheet_count = self._extract_xlsx_rows(file_path)
        if not rows:
            rows = [["未提取到表格内容。"]]

        blocks = [
            KnowledgePreviewBlockRead(
                heading=f"第 {index + 1} 行",
                lines=[" | ".join(cell for cell in row if cell) or "(空行)"],
            )
            for index, row in enumerate(rows[:12])
        ]

        return KnowledgePreviewPageRead(
            title=entry["original_name"],
            subtitle="表格内容预览",
            stats=[
                KnowledgePreviewStatRead(label="工作表数量", value=str(sheet_count)),
                KnowledgePreviewStatRead(label="预览行数", value=str(min(len(rows), 12))),
                KnowledgePreviewStatRead(label="文件大小", value=self._format_size(entry["size_bytes"])),
            ],
            blocks=blocks,
        )

    def _build_pptx_preview_pages(
        self, entry: dict[str, Any], file_path: Path
    ) -> list[KnowledgePreviewPageRead]:
        slides = self._extract_pptx_slides(file_path)
        if not slides:
            slides = [["未提取到幻灯片文本。"]]

        pages: list[KnowledgePreviewPageRead] = []
        for index, slide_lines in enumerate(slides[:8]):
            pages.append(
                KnowledgePreviewPageRead(
                    title=entry["original_name"],
                    subtitle=f"幻灯片 {index + 1}",
                    stats=[
                        KnowledgePreviewStatRead(label="页码", value=str(index + 1)),
                        KnowledgePreviewStatRead(label="文本条数", value=str(len(slide_lines))),
                        KnowledgePreviewStatRead(label="文件格式", value="PPTX"),
                    ],
                    blocks=[
                        KnowledgePreviewBlockRead(
                            heading="幻灯片内容",
                            lines=slide_lines or ["该页未提取到文本内容。"],
                        )
                    ],
                )
            )

        return pages

    def _load_index(self) -> dict[str, Any]:
        try:
            payload = json.loads(self.index_path.read_text(encoding="utf-8"))
        except (FileNotFoundError, json.JSONDecodeError):
            payload = {"version": 1, "documents": []}
        payload.setdefault("documents", [])
        return payload

    def _save_index(self, index: dict[str, Any]) -> None:
        self.index_path.write_text(
            json.dumps(index, ensure_ascii=False, indent=2),
            encoding="utf-8",
        )

    def _reconcile_index(self, index: dict[str, Any]) -> bool:
        changed = False
        documents = index.setdefault("documents", [])
        known_by_stored = {
            (item["folder"], item["stored_name"]): item
            for item in documents
            if item.get("folder") and item.get("stored_name")
        }

        # Keep entries whose file still exists (refreshing size, extension and MIME type); drop the rest.
        existing_items: list[dict[str, Any]] = []
        for item in documents:
            file_path = self._resolve_document_path(item)
            if file_path.exists():
                item["size_bytes"] = file_path.stat().st_size
                item["extension"] = self._extract_extension(item["original_name"])
                item["mime_type"] = item.get("mime_type") or (
                    mimetypes.guess_type(item["original_name"])[0] or "application/octet-stream"
                )
                existing_items.append(item)
            else:
                changed = True

        # Adopt files placed directly in a fixed folder that the index does not know about yet.
        for folder_name in FIXED_KNOWLEDGE_FOLDERS:
            folder_path = self.library_root / folder_name
            for file_path in folder_path.iterdir():
                if not file_path.is_file() or file_path.name.startswith("."):
                    continue

                key = (folder_name, file_path.name)
                if key in known_by_stored:
                    continue

                document_id, original_name = self._parse_stored_name(file_path.name)
                stat = file_path.stat()
                existing_items.append(
                    {
                        "id": document_id,
                        "folder": folder_name,
                        "original_name": original_name,
                        "stored_name": file_path.name,
                        "mime_type": mimetypes.guess_type(original_name)[0]
                        or "application/octet-stream",
                        "extension": self._extract_extension(original_name),
                        "size_bytes": stat.st_size,
                        "sha256": "",
                        # On Unix-like systems st_ctime is the inode change time, not the creation time.
                        "created_at": datetime.fromtimestamp(stat.st_ctime, tz=UTC).isoformat(),
                        "updated_at": datetime.fromtimestamp(stat.st_mtime, tz=UTC).isoformat(),
                        "uploaded_by": "系统导入",  # "Imported by the system"
                        "version_number": 1,
                    }
                )
                changed = True

        if changed or len(existing_items) != len(documents):
            index["documents"] = existing_items
            return True
        return False

    def _require_entry(self, index: dict[str, Any], document_id: str) -> dict[str, Any]:
        for entry in index["documents"]:
            if entry["id"] == document_id:
                return entry
        raise FileNotFoundError(document_id)

    def _resolve_document_path(self, entry: dict[str, Any]) -> Path:
        return self.library_root / entry["folder"] / entry["stored_name"]

    @staticmethod
    def _normalize_filename(filename: str) -> str:
        # Keep only the last path component and replace any remaining separators.
        normalized = Path(str(filename or "").strip()).name.strip()
        normalized = normalized.replace("/", "_").replace("\\", "_")
        if not normalized:
            raise ValueError("文件名不能为空。")  # "File name must not be empty."
        return normalized

    @staticmethod
    def _normalize_folder(folder: str) -> str:
        normalized = str(folder or "").strip()
        if normalized not in FIXED_KNOWLEDGE_FOLDERS:
            raise ValueError("只能上传到预设知识库文件夹。")  # "Uploads must target a preset knowledge-base folder."
        return normalized

    @staticmethod
    def _extract_extension(filename: str) -> str:
        return Path(filename).suffix.lower().lstrip(".")

    @staticmethod
    def _parse_stored_name(stored_name: str) -> tuple[str, str]:
        # Stored names are expected to look like "<document_id>__<original_name>".
        if "__" not in stored_name:
            return uuid4().hex, stored_name
        document_id, original_name = stored_name.split("__", 1)
        return document_id or uuid4().hex, original_name or stored_name

    @staticmethod
    def _format_time(value: str | None) -> str:
        if not value:
            return ""
        try:
            parsed = datetime.fromisoformat(value)
        except ValueError:
            return value
        return parsed.astimezone(UTC).strftime("%Y-%m-%d %H:%M")

    @staticmethod
    def _format_size(size_bytes: int) -> str:
        if size_bytes < 1024:
            return f"{size_bytes} B"
        if size_bytes < 1024 * 1024:
            return f"{size_bytes / 1024:.1f} KB"
        return f"{size_bytes / (1024 * 1024):.1f} MB"

    @staticmethod
    def _resolve_file_type(extension: str) -> str:
        if extension == "pdf":
            return "pdf"
        if extension in WORD_EXTENSIONS:
            return "word"
        if extension in EXCEL_EXTENSIONS:
            return "excel"
        if extension in PPT_EXTENSIONS:
            return "ppt"
        if extension in IMAGE_EXTENSIONS:
            return "image"
        if extension in TEXT_EXTENSIONS:
            return "text"
        if extension in ARCHIVE_EXTENSIONS:
            return "archive"
        return "binary"

    @staticmethod
    def _resolve_file_type_label(file_type: str) -> str:
        mapping = {
            "pdf": "PDF 预览",
            "word": "Word 预览",
            "excel": "Excel 预览",
            "ppt": "PPT 预览",
            "image": "图片预览",
            "text": "文本预览",
            "archive": "压缩包",
            "binary": "文件预览",
        }
        return mapping.get(file_type, "文件预览")

    @staticmethod
    def _can_preview(extension: str) -> bool:
        return extension in INLINE_PREVIEW_EXTENSIONS or extension in STRUCTURED_PREVIEW_EXTENSIONS

    @staticmethod
    def _read_text_preview(file_path: Path) -> str:
        # "utf-8-sig" goes first: it strips a leading BOM and also decodes plain UTF-8, whereas
        # plain "utf-8" would succeed on BOM files and leave an invisible "\ufeff" in the text.
        encodings = ("utf-8-sig", "gbk")
        for encoding in encodings:
            try:
                return file_path.read_text(encoding=encoding)
            except UnicodeDecodeError:
                continue
        return "当前文本文件编码暂不支持在线解析。"

    @staticmethod
    def _extract_docx_text(file_path: Path) -> str:
        try:
            with ZipFile(file_path) as archive:
                xml_content = archive.read("word/document.xml")
            # Parse inside the try block so malformed XML is reported like any other parse failure.
            root = ElementTree.fromstring(xml_content)
        except (BadZipFile, KeyError, ElementTree.ParseError):
            return "当前 Word 文件解析失败。"

        # Each <w:t> run becomes its own line, so one paragraph may span several lines.
        texts = [node.text.strip() for node in root.iter() if node.tag.endswith("}t") and node.text]
        return "\n".join(texts)

    @staticmethod
    def _extract_xlsx_rows(file_path: Path) -> tuple[list[list[str]], int]:
        try:
            with ZipFile(file_path) as archive:
                # Cells of type "s" hold an index into the shared-strings table.
                shared_strings: list[str] = []
                if "xl/sharedStrings.xml" in archive.namelist():
                    shared_root = ElementTree.fromstring(archive.read("xl/sharedStrings.xml"))
                    shared_strings = [
                        "".join(node.itertext()).strip()
                        for node in shared_root.iter()
                        if node.tag.endswith("}si")
                    ]

                sheet_names = sorted(
                    name
                    for name in archive.namelist()
                    if re.fullmatch(r"xl/worksheets/sheet\d+\.xml", name)
                )
                if not sheet_names:
                    return [], 0

                # Only the first worksheet file (by name) is previewed.
                first_sheet = ElementTree.fromstring(archive.read(sheet_names[0]))
                rows: list[list[str]] = []
                for row in first_sheet.iter():
                    if not row.tag.endswith("}row"):
                        continue
                    row_values: list[str] = []
                    for cell in row:
                        if not cell.tag.endswith("}c"):
                            continue
                        cell_type = cell.attrib.get("t")
                        # Inline-string cells (t="inlineStr") carry <is> instead of <v>, so they come through as "".
                        value_node = next((item for item in cell if item.tag.endswith("}v")), None)
                        if value_node is None or value_node.text is None:
                            row_values.append("")
                            continue
                        raw_value = value_node.text.strip()
                        if cell_type == "s" and raw_value.isdigit():
                            index = int(raw_value)
                            row_values.append(shared_strings[index] if index < len(shared_strings) else raw_value)
                        else:
                            row_values.append(raw_value)
                    if row_values:
                        rows.append(row_values)

                return rows, len(sheet_names)
        except (BadZipFile, ElementTree.ParseError, KeyError, ValueError):
            return [], 0

    @staticmethod
    def _extract_pptx_slides(file_path: Path) -> list[list[str]]:
        try:
            with ZipFile(file_path) as archive:
                # Order by slide number: a plain sorted() would put "slide10.xml" before "slide2.xml".
                # (Deck order proper is defined in ppt/presentation.xml, which is not read here.)
                slide_names = sorted(
                    (name for name in archive.namelist() if re.fullmatch(r"ppt/slides/slide\d+\.xml", name)),
                    key=lambda name: int(re.search(r"(\d+)\.xml$", name).group(1)),
                )
                slides: list[list[str]] = []
                for slide_name in slide_names:
                    root = ElementTree.fromstring(archive.read(slide_name))
                    texts = [node.text.strip() for node in root.iter() if node.tag.endswith("}t") and node.text]
                    slides.append(texts)
                return slides
        except (BadZipFile, ElementTree.ParseError, KeyError):
            return []
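`_extract_pptx_slides` reads the `ppt/slides/slideN.xml` entries straight out of the ZIP container and keeps every element whose local name is `t`. The sketch below reproduces that approach on an in-memory archive so it runs without a real deck, and sorts by the numeric part of the file name; a plain `sorted()` would place `slide10.xml` before `slide2.xml`. `build_fake_pptx` and `extract_slide_texts` are hypothetical names, the generated slides are only enough XML for the parser (not valid PresentationML), and the true deck order lives in `ppt/presentation.xml`, which neither this sketch nor the service reads.

```python
import io
import re
from xml.etree import ElementTree
from zipfile import ZipFile

SLIDE_NAME = re.compile(r"ppt/slides/slide(\d+)\.xml")
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"


def build_fake_pptx(slide_texts: list[str]) -> bytes:
    """Hypothetical test helper: one minimal ppt/slides/slideN.xml per text, nothing else."""
    buffer = io.BytesIO()
    with ZipFile(buffer, "w") as archive:
        for number, text in enumerate(slide_texts, start=1):
            xml = f'<p:sld xmlns:p="{P_NS}" xmlns:a="{A_NS}"><a:t>{text}</a:t></p:sld>'
            archive.writestr(f"ppt/slides/slide{number}.xml", xml)
    return buffer.getvalue()


def extract_slide_texts(data: bytes) -> list[list[str]]:
    """Collect the text runs of each slide, ordered by the number N in slideN.xml."""
    with ZipFile(io.BytesIO(data)) as archive:
        numbered: list[tuple[int, str]] = []
        for name in archive.namelist():
            match = SLIDE_NAME.fullmatch(name)
            if match:
                numbered.append((int(match.group(1)), name))

        slides: list[list[str]] = []
        for _, name in sorted(numbered):
            root = ElementTree.fromstring(archive.read(name))
            # Like the service, match any element whose local name is "t" (namespaced tags end in "}t").
            slides.append([node.text.strip() for node in root.iter() if node.tag.endswith("}t") and node.text])
        return slides


data = build_fake_pptx([f"Slide {n}" for n in range(1, 12)])
print([texts[0] for texts in extract_slide_texts(data)][:3])  # ['Slide 1', 'Slide 2', 'Slide 3']
```

With eleven slides, lexicographic ordering would have previewed slides 1, 10 and 11 as the first three pages.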
4
server/storage/knowledge/.index.json
Normal file
@@ -0,0 +1,4 @@
{
  "version": 1,
  "documents": []
}
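The index starts empty and is filled in by `_reconcile_index`, which adopts any file it finds in a fixed folder. The sketch below rebuilds one such entry outside the class, assuming the `<id>__<original name>` stored-name convention that `_parse_stored_name` decodes. `build_index_entry` and the folder name `policies` are hypothetical, and `timezone.utc` stands in for the `UTC` alias the service imports.

```python
import json
import mimetypes
from datetime import datetime, timezone
from pathlib import Path
from uuid import uuid4


def parse_stored_name(stored_name: str) -> tuple[str, str]:
    """Same rule as _parse_stored_name: '<id>__<original name>', else mint a fresh id."""
    if "__" not in stored_name:
        return uuid4().hex, stored_name
    document_id, original_name = stored_name.split("__", 1)
    return document_id or uuid4().hex, original_name or stored_name


def build_index_entry(folder: str, stored_name: str, size_bytes: int, ctime: float, mtime: float) -> dict:
    """Hypothetical helper mirroring the dict _reconcile_index appends for an unindexed file."""
    document_id, original_name = parse_stored_name(stored_name)
    return {
        "id": document_id,
        "folder": folder,
        "original_name": original_name,
        "stored_name": stored_name,
        "mime_type": mimetypes.guess_type(original_name)[0] or "application/octet-stream",
        "extension": Path(original_name).suffix.lower().lstrip("."),
        "size_bytes": size_bytes,
        "sha256": "",  # adopted files are not hashed
        "created_at": datetime.fromtimestamp(ctime, tz=timezone.utc).isoformat(),
        "updated_at": datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat(),
        "uploaded_by": "系统导入",  # "Imported by the system", same placeholder as the service
        "version_number": 1,
    }


index = {"version": 1, "documents": [build_index_entry("policies", "3f2a__报销制度.pdf", 2048, 0.0, 0.0)]}
print(json.dumps(index, ensure_ascii=False, indent=2))
```

Only the first `__` separates the id, so an original name that itself contains `__` survives intact.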
33
web/src/services/knowledge.js
Normal file
@@ -0,0 +1,33 @@
import { apiRequest } from './api.js'

export function fetchKnowledgeLibrary() {
  return apiRequest('/knowledge/library')
}

export function fetchKnowledgeDocument(documentId) {
  return apiRequest(`/knowledge/documents/${documentId}`)
}

export function uploadKnowledgeDocument({ folder, file }) {
  return apiRequest(
    `/knowledge/documents?folder=${encodeURIComponent(folder)}&filename=${encodeURIComponent(file.name)}`,
    {
      method: 'POST',
      body: file,
      contentType: file.type || 'application/octet-stream'
    }
  )
}

export function deleteKnowledgeDocument(documentId) {
  return apiRequest(`/knowledge/documents/${documentId}`, {
    method: 'DELETE'
  })
}

export function fetchKnowledgeDocumentBlob(documentId, disposition = 'inline') {
  return apiRequest(`/knowledge/documents/${documentId}/content?disposition=${disposition}`, {
    responseType: 'blob',
    contentType: null
  })
}
@@ -10,11 +10,10 @@ export const DEFAULT_APP_VIEW_ORDER = [
   'settings'
 ]
 
-const ALWAYS_VISIBLE_VIEWS = new Set(['workbench', 'requests', 'chat'])
+const ALWAYS_VISIBLE_VIEWS = new Set(['workbench', 'requests', 'chat', 'policies'])
 const VIEW_ROLE_RULES = {
   overview: ['finance', 'executive'],
   approval: ['approver'],
-  policies: ['manager'],
   audit: ['auditor'],
   employees: ['manager'],
   settings: ['manager']
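After this change `policies` is presumably shown to every signed-in user instead of only to `manager`. The code that consumes `ALWAYS_VISIBLE_VIEWS` and `VIEW_ROLE_RULES` is outside this hunk, so the Python sketch below re-expresses the two tables under stated assumptions: holding any one listed role is enough, and a view missing from both tables stays hidden. `is_view_visible` is a hypothetical name.

```python
# Python mirror of the two JS tables after this change.
ALWAYS_VISIBLE_VIEWS = frozenset({"workbench", "requests", "chat", "policies"})
VIEW_ROLE_RULES: dict[str, tuple[str, ...]] = {
    "overview": ("finance", "executive"),
    "approval": ("approver",),
    "audit": ("auditor",),
    "employees": ("manager",),
    "settings": ("manager",),
}


def is_view_visible(view: str, roles: set[str]) -> bool:
    """Assumed semantics: always-visible views pass; otherwise any one listed role is enough."""
    if view in ALWAYS_VISIBLE_VIEWS:
        return True
    allowed = VIEW_ROLE_RULES.get(view)
    # Assumption: a view absent from both tables stays hidden (deny by default).
    return allowed is not None and not roles.isdisjoint(allowed)


print(is_view_visible("policies", {"employee"}))  # True: no longer manager-only
```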