feat: 完善系统配置、安全增强与知识库功能

- .env.example: API基础路径改为相对路径 /api/v1,支持代理转发
- README.md: 完善项目结构与启动说明文档
- docker-compose.yml: 新增Docker编排配置,支持容器化部署
- docker/: 新增Docker部署相关文档与配置

- server_start.sh: 重构启动脚本,添加容器环境检测、隔离虚拟环境路径、环境变量覆盖机制
- deps.py: 完善API依赖注入,增强权限验证逻辑
- admin_secret.py: 优化管理员密钥加密存储与验证
- config.py: 扩展配置管理,支持多环境变量绑定
- security.py: 增强安全模块,完善加密与认证机制
- db/base.py: 优化数据库基础架构与连接管理
- main.py: 更新应用入口,整合新模块路由
- models/: 完善系统模型配置,支持模型设置持久化
- repositories/settings.py: 优化设置仓储层,增强数据持久化
- services/settings.py: 重构设置服务,精简代码结构
- router.py: 更新API路由配置

- endpoints/knowledge.py: 新增知识库API端点
- schemas/knowledge.py: 新增知识库数据模型
- services/knowledge.py: 新增知识库业务逻辑
- storage/knowledge/.index.json: 知识库索引存储

- api.js: 完善API服务层,增强错误处理
- bootstrap.js: 优化前端初始化与引导流程
- useSetupView.js / useSystemState.js: 重构组合式函数
- TopBar.vue: 优化顶部导航栏组件
- SettingsView.vue: 重构设置页面UI,增强用户体验
- SetupView.vue / SetupRouteView.vue: 完善引导流程页面
- PoliciesView.vue: 优化策略视图组件
- vite.config.js: 更新Vite构建配置
- web_start.sh: 完善前端启动脚本
- views/scripts/: 优化各业务视图JS逻辑

- settings-view.css: 重构设置页面样式
- setup-view.css: 完善引导页样式
- policies-view.css: 优化策略页样式

- test_auth_service.py: 完善认证服务测试
- test_settings_persistence.py: 增强设置持久化测试
- document/: 新增开发文档与工作日志
This commit is contained in:
caoxiaozhu
2026-05-09 03:04:09 +00:00
parent c2315f68dc
commit 619281afc3
43 changed files with 9337 additions and 8300 deletions

View File

@@ -0,0 +1,169 @@
# X-Financial 智能化财务系统:双层 Agent 架构设计与开发落地全景指南
> **核心设计理念:确定性与概率性的完美解耦**
>
> 在企业级财务系统中“合规性”与“准确性”是不可妥协的底线。大语言模型LLM天生具有概率性会产生幻觉因此不能直接赋予其修改核心财务数据或放行审批的最高权限。
>
> 本架构设计的核心,在于构建一个**“双层防线”**
> 1. **外层 Agent (自研流程大脑)**:提供 100% 的确定性。它是系统的执行者,严格按照预设流程和固化的规则行事,不具备“自我意识”,只负责“路由”、“拦截”和“记录”。
> 2. **内层 Agent (Hermes 智囊核心)**:提供强大的概率性推理能力。它是系统的思考者,负责处理所有复杂、模糊、非结构化的任务(如阅读长文档、识别潜在风险),但它的输出**不能直接作用于业务**,而是转化为**规则配置**或**建议意见**,交由外层 Agent 或人类管理员执行。
>
> 这两层架构不是相互独立的两个系统,而是形成一个**“闭环”**:内层提炼规则,外层执行规则;外层收集数据,内层分析数据。这种深度协同,既保障了系统的安全性,又赋予了系统极高的智能化水平。
---
## 一、 系统架构图景与职责边界深度剖析
### 1. 外层 Agent (Outer Agent):流程与路由的绝对掌控者
**本质:一个高度可配置的业务工作流引擎与意图分发器。**
* **开发技术栈建议**FastAPI (后端) + Vue3 (前端) + PostgreSQL (持久化) + Redis (可选,用于状态缓存)。
* **交互形态**:它直接面对用户。它可以是一个类似对话框的界面,但背后的逻辑是基于**状态机 (State Machine)** 驱动的。
* **核心模块与职责 (What to do & How to do)**
* **模块 1: 意图漏斗 (Intent Router)**
* **职责**:精准捕捉用户请求的第一诉求,并将其导向正确的处理管线。
* **方法**
* *规则匹配优先*:使用简单的关键词或正则(例如:匹配到“报销”、“打车”字眼,直接激活报销向导)。
* *轻量级分类模型兜底*:对于模糊表述(如:“我上周去上海开会的钱怎么还没发?”),调用一个小参数的分类模型(或内层的快捷接口),将其分类为“状态查询”意图,并提取关键实体(如时间:上周,地点:上海)。
* **模块 2: 结构化状态机引擎 (State & Flow Controller)**
* **职责**:管理每一个业务对象(如一张报销单)的生命周期。从“草稿” -> “提交” -> “一级审批” -> “财务复核” -> “已打款”。
* **方法**:拒绝让大模型控制流程走向。流程流转必须基于代码逻辑中的条件判断(例如:如果金额 < 500且员工级别为 M1则跳过一级审批直接进入财务复核。外层 Agent 负责维护并推进这个状态。
* **模块 3: 确定性规则执行器 (Rule Execution Engine)**
* **职责**:财务合规的第一道硬性防线。不讲道理,只看数据。
* **方法**:当用户提交报销数据时,该模块会查询本地的 `business_rules` 数据库表。如果用户提交的住宿费是 850而数据库规则明确上限是 800则立刻抛出“阻断型错误” (Blocking Error)。**此过程绝对禁止调用大模型进行实时推断。**
* **模块 4: 标准化 API 网关 (API Gateway & Handshake Layer)**
* **职责**:封装所有对外层系统(如 ERP、HR 系统)和对内层 Hermes 的通信接口。控制并发,记录调用日志。
### 2. 内层 Agent (Hermes):非结构化信息的提炼者与深度思考者
**本质:一个被严格隔离的智能计算引擎,专门处理人类擅长但传统代码难以处理的“软逻辑”。**
* **开发技术栈建议**Hermes 框架 + 向量数据库 (如 Milvus/PGVector) + 强力 LLM (如 GPT-4 或开源大模型)。
* **交互形态**:对用户不可见,只作为外层 Agent 的“后端服务”存在。
* **核心模块与职责 (What to do & How to do)**
* **模块 1: 政策蒸馏器 (Policy Distiller) —— 解决“知行合一”的关键**
* **职责**:打破知识库(死文件)与业务流(活代码)之间的壁垒。
* **方法 (核心思路)**
1. *触发*:管理员上传一份《差旅新规.pdf》。
2. *解析*Hermes 逐段阅读文档。
3. *提取*:使用精心设计的 **Few-Shot Prompt 链**,强制模型识别特定的“控制变量”。
*(Prompt 示例: "你是一个专业的财务合规审计员。请阅读以下段落,如果包含任何关于费用上限、职级限制、审批层级的规定,请严格按照以下 JSON Schema 输出:{category, location, level_req, max_amount, is_hard_limit}。如果未找到,输出空。")*
4. *回写*Hermes 将提炼出的 JSON 结构转化为标准的 SQL Update 指令(或通过专用 API 接口),更新外层 Agent 依赖的 `business_rules` 表。
* **模块 2: 深度知识检索 (Deep RAG & Interpretation)**
* **职责**:为用户提供复杂制度的个性化解读。
* **方法**:当外层 Agent 无法解答用户的合规疑问时(意图识别为“政策咨询”),外层将请求转发给 Hermes。Hermes 在向量库中检索相关段落,并结合用户当前的上下文(如:员工职级、出差地),生成一份连贯、人性化的解答。
* **模块 3: 异步风险探针 (Asynchronous Risk Auditor)**
* **职责**:像“老会计”一样,在海量已发生或正在发生的业务数据中寻找蛛丝马迹。
* **方法**
1. *定时任务*:每天凌晨启动。
2. *数据聚合*:从外层数据库提取当天的报销流水(去除敏感个资)。
3. *模式识别*:通过特定的 Prompt例如寻找“拆单报销”、“异常高频的出租车票”
4. *生成报告*:生成结构化的风险预警报告,存入专用表,供管理员次日早晨审核,而不是直接去冻结员工账号。
---
## 二、 核心通信协议 (The Handshake):两层的握手与数据交互
双层架构的成败,取决于这两层能否顺畅地交换信息,且保证安全。我们需要定义清晰的接口协议。
### 1. 同步查询接口 (外 -> 内:求知与解惑)
当外层遇到处理不了的“软逻辑”时触发。
* **Endpoint (示例)**: `POST /hermes/api/v1/consult`
* **外层 Request 结构**:
```json
{
"context": {
"user_id": "emp_1001",
"current_task": "travel_reimbursement",
"form_data": {"city": "北京", "amount": 900}
},
"query": "因为展会原因酒店全满只能订900的能报销吗"
}
```
* **内层 Hermes Response 结构**:
```json
{
"status": "success",
"interpretation": "根据《差旅管理办法》第15条展会期间允许上浮 20%。您的标准是800上浮后为960可以报销。",
"action_recommendation": "require_special_approval", // 建议外层采取的动作
"citations": ["policy_doc_v2_page_4"]
}
```
### 2. 异步任务接口 (外 -> 内:派发耗时任务)
例如请求生成长篇分析报告或进行全量风险巡检。
* **流程**
1. 外层调用 `POST /hermes/api/v1/jobs/generate_report`。
2. 内层 Hermes 立即返回 `202 Accepted` 和一个 `job_id`。
3. 内层 Hermes 在后台慢慢算。
4. 计算完成后,内层通过 Webhook 回调外层的通知接口,外层再通过系统消息通知用户“您的报告已就绪”。
### 3. 规则推送机制 (内 -> 外:自动化立法)
这是最核心的逆向通信。内层提炼出的规则如何生效?
* **流程**
1. Hermes 提炼出新规则。
2. Hermes 调用外层的特权 API (如 `POST /admin/api/rules/sync`),推送规则 payload。
3. 外层 Agent 收到后,执行数据库 `UPSERT` 操作更新 `business_rules` 表。
4. *(可选但强烈建议)*:进入“待激活”状态,需要人类管理员在系统中点击“确认应用新规则”后,新规才正式生效。
---
## 三、 分阶段开发落地全景计划 (Implementation Roadmap)
开发应当遵循“先基建后上层、先确定后智能”的原则。
### Phase 1: 骨架搭建与基石铺设 (Foundation & Outer Shell)
*目标:构建一个哪怕没有 AI 也能运转的硬核流程系统,确立两层隔离。*
1. **架构拆分验证**:在服务器层面,确保 Outer Agent (FastAPI) 和 Inner Hermes 分别在独立的进程(或容器)中运行,仅通过 HTTP/gRPC 通信。
2. **动态规则引擎实现 (核心基建)**
* 在 PostgreSQL 中设计 `business_rules` 表结构。必须支持高度扩展性(例如采用 `JSONB` 字段存储具体约束参数)。
* 在外层 Agent 开发一个“规则校验服务 (Rule Validation Service)”,该服务能够在任何报销动作发生前,拦截并比对 `business_rules`。
3. **标准化流程闭环**:开发一个完整的、基于硬规则驱动的差旅报销单据流转全流程(填单 -> 校验 -> 提交 -> 审批)。验证在“硬规则”下系统运转良好。
### Phase 2: 知识注入与基础问答 (Hermes RAG Integration)
*目标:赋予系统“解答疑问”的能力。*
1. **内层基建**:配置 Hermes 环境,接入向量数据库。
2. **文档清洗管道 (ETL pipeline)**:将现有的财务政策 PDF/Word 文档清洗、分块 (Chunking) 并向量化入库。
3. **问答桥接**
* 在外层前端 (Vue3) 提供一个“智能咨询”悬浮窗或独立页面。
* 外层 Agent 接收问题,附带上用户的上下文(角色、权限),一并转发给内层 Hermes。
* 验证 Hermes 能够根据向量库的内容,给出带出处的准确回答。
### Phase 3: 核心攻坚 —— 自动立法与双层联通 (Policy Distillation & Sync)
*目标:实现从“死文档”到“活规则”的自动化转化。*
1. **蒸馏 Prompt 工程**:在 Hermes 中反复打磨“政策提炼”的 Prompt。针对你们公司常见的政策描述方式进行微调。
2. **结构化提取测试**:手动上传不同版本的政策文档,测试 Hermes 能否稳定、准确地输出 JSON 格式的规则参数。
3. **闭环联调**
* 开发 Hermes 向外层推送规则的 API。
* 完成全链路测试:管理员界面上传新文档 -> Hermes 后台解析 -> 外层规则库自动更新 -> 前端即时生效新的金额限制。
### Phase 4: 高阶进化 —— 异步审计与主动防御 (Proactive Risk Auditing)
*目标:将系统从“被动响应”升级为“主动防护”。*
1. **数据安全隧道**:建立从外层业务库向内层 Hermes 传递“脱敏业务快照”的通道。
2. **风险模式定义**:梳理出 3-5 种典型的财务风险模式(如:异常聚集的餐饮发票、连续的单日高额交通费)。
3. **Hermes 巡检任务**:编写定时任务逻辑,利用大模型的推理能力去比对这些模式和当天的业务快照数据。
4. **风险看板**:在外层系统的管理后台开发“风险报告台”,展示 Hermes 生成的预警结果。
---
## 四、 关键风险与防范策略总结
1. **大模型幻觉污染规则库**
* **防范**Hermes 提炼的所有硬性规则(尤其是金额、审批级数),在写入外层正式库之前,必须增加一个**“人工审核 (Human-in-the-loop)”** 环节。系统提示“检测到政策更新,提炼出 5 条新规则,请管理员确认应用”。
2. **状态机混乱**
* **防范**:外层 Agent 的流程控制代码必须使用强类型和严格的事务控制 (Transaction)。绝不允许任何组件(包括 AI在不经过状态机合法校验的情况下直接修改数据库中的 `status` 字段。
3. **性能瓶颈**
* **防范**:所有外层必须做的事情(拦截、查询)必须在毫秒级完成。所有涉及调用 Hermes 的操作(问答、提炼、分析)全部采用异步设计或提供明确的 Loading 反馈。

View File

@@ -1,5 +1,8 @@
from collections.abc import Generator
from dataclasses import dataclass
from typing import Annotated
from fastapi import Depends, Header, HTTPException, status
from sqlalchemy.orm import Session
from app.db.session import get_session_factory
@@ -11,3 +14,49 @@ def get_db() -> Generator[Session, None, None]:
yield db
finally:
db.close()
@dataclass(slots=True)
class CurrentUserContext:
username: str
name: str
role_codes: list[str]
is_admin: bool
def get_current_user(
x_auth_username: Annotated[str | None, Header()] = None,
x_auth_name: Annotated[str | None, Header()] = None,
x_auth_role_codes: Annotated[str | None, Header()] = None,
x_auth_is_admin: Annotated[str | None, Header()] = None,
) -> CurrentUserContext:
role_codes = [item.strip() for item in (x_auth_role_codes or "").split(",") if item.strip()]
is_admin = str(x_auth_is_admin or "").strip().lower() in {"1", "true", "yes", "on"}
username = (x_auth_username or "").strip()
name = (x_auth_name or username).strip()
if not username and not name:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="请先登录后再访问知识库。",
)
return CurrentUserContext(
username=username or name,
name=name or username,
role_codes=role_codes,
is_admin=is_admin,
)
def require_admin_user(
current_user: Annotated[CurrentUserContext, Depends(get_current_user)],
) -> CurrentUserContext:
if current_user.is_admin or "manager" in current_user.role_codes:
return current_user
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="只有管理员可以上传、删除或修改知识库文件。",
)

View File

@@ -0,0 +1,76 @@
from __future__ import annotations
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException, Query, Request, status
from fastapi.responses import FileResponse
from app.api.deps import CurrentUserContext, get_current_user, require_admin_user
from app.schemas.knowledge import (
KnowledgeActionResponse,
KnowledgeDocumentDetailRead,
KnowledgeLibraryRead,
)
from app.services.knowledge import KnowledgeService
router = APIRouter(prefix="/knowledge")
@router.get("/library", response_model=KnowledgeLibraryRead)
def get_knowledge_library(
_: Annotated[CurrentUserContext, Depends(get_current_user)],
) -> KnowledgeLibraryRead:
return KnowledgeService().list_library()
@router.get("/documents/{document_id}", response_model=KnowledgeDocumentDetailRead)
def get_knowledge_document(
document_id: str,
_: Annotated[CurrentUserContext, Depends(get_current_user)],
) -> KnowledgeDocumentDetailRead:
try:
return KnowledgeService().get_document_detail(document_id)
except FileNotFoundError as exc:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="知识库文件不存在。") from exc
@router.post("/documents", response_model=KnowledgeDocumentDetailRead, status_code=status.HTTP_201_CREATED)
async def upload_knowledge_document(
request: Request,
folder: Annotated[str, Query(min_length=1)],
filename: Annotated[str, Query(min_length=1)],
current_user: Annotated[CurrentUserContext, Depends(require_admin_user)],
) -> KnowledgeDocumentDetailRead:
content = await request.body()
try:
return KnowledgeService().upload_document(folder, filename, content, current_user)
except ValueError as exc:
raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(exc)) from exc
@router.delete("/documents/{document_id}", response_model=KnowledgeActionResponse)
def delete_knowledge_document(
document_id: str,
_: Annotated[CurrentUserContext, Depends(require_admin_user)],
) -> KnowledgeActionResponse:
try:
KnowledgeService().delete_document(document_id)
except FileNotFoundError as exc:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="知识库文件不存在。") from exc
return KnowledgeActionResponse(detail="知识库文件已删除。")
@router.get("/documents/{document_id}/content")
def get_knowledge_document_content(
document_id: str,
disposition: Annotated[str, Query(pattern="^(inline|attachment)$")] = "inline",
_: Annotated[CurrentUserContext, Depends(get_current_user)] = None,
) -> FileResponse:
try:
file_path, media_type, filename = KnowledgeService().get_document_content(document_id)
except FileNotFoundError as exc:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="知识库文件不存在。") from exc
_ = disposition
return FileResponse(file_path, media_type=media_type, filename=filename)

View File

@@ -4,6 +4,7 @@ from app.api.v1.endpoints.auth import router as auth_router
from app.api.v1.endpoints.bootstrap import router as bootstrap_router
from app.api.v1.endpoints.employees import router as employees_router
from app.api.v1.endpoints.health import router as health_router
from app.api.v1.endpoints.knowledge import router as knowledge_router
from app.api.v1.endpoints.reimbursements import router as reimbursements_router
from app.api.v1.endpoints.settings import router as settings_router
@@ -11,6 +12,7 @@ router = APIRouter()
router.include_router(health_router, tags=["health"])
router.include_router(bootstrap_router, tags=["bootstrap"])
router.include_router(auth_router, tags=["auth"])
router.include_router(knowledge_router, tags=["knowledge"])
router.include_router(employees_router, prefix="/employees", tags=["employees"])
router.include_router(reimbursements_router, prefix="/reimbursements", tags=["reimbursements"])
router.include_router(settings_router, tags=["settings"])

View File

@@ -51,6 +51,7 @@ class Settings(BaseSettings):
log_level: str = Field(default="INFO", alias="LOG_LEVEL")
log_dir: str = Field(default="logs", alias="LOG_DIR")
log_file_enabled: bool = Field(default=True, alias="LOG_FILE_ENABLED")
storage_root_dir: str = Field(default="storage", alias="STORAGE_ROOT_DIR")
@property
def resolved_database_url(self) -> str:
@@ -62,6 +63,13 @@ class Settings(BaseSettings):
f"@{self.postgres_host}:{self.postgres_port}/{self.postgres_db}"
)
@property
def resolved_storage_root_dir(self) -> Path:
path = Path(self.storage_root_dir)
if not path.is_absolute():
path = SERVER_DIR / path
return path.resolve()
@lru_cache
def get_settings() -> Settings:

View File

@@ -8,6 +8,7 @@ from app.core.config import get_settings
from app.core.logging import get_logger, setup_logging
from app.middleware.logging import AccessLogMiddleware
from app.services.employee import prepare_employee_directory
from app.services.knowledge import prepare_knowledge_library
def create_app() -> FastAPI:
@@ -50,6 +51,7 @@ def create_app() -> FastAPI:
@app.on_event("startup")
def _on_startup() -> None:
prepare_employee_directory()
prepare_knowledge_library()
logger.info(
"Server ready - host=%s port=%s prefix=%s",
settings.app_host,

View File

@@ -1 +1 @@
__all__ = ["employee", "reimbursement"]
__all__ = ["employee", "knowledge", "reimbursement"]

View File

@@ -0,0 +1,61 @@
from __future__ import annotations
from pydantic import BaseModel, Field
class KnowledgeFolderRead(BaseModel):
name: str
count: int
icon: str = "mdi mdi-folder"
class KnowledgePreviewStatRead(BaseModel):
label: str
value: str
class KnowledgePreviewBlockRead(BaseModel):
heading: str
lines: list[str] = Field(default_factory=list)
class KnowledgePreviewPageRead(BaseModel):
title: str
subtitle: str
stats: list[KnowledgePreviewStatRead] = Field(default_factory=list)
blocks: list[KnowledgePreviewBlockRead] = Field(default_factory=list)
class KnowledgeDocumentRead(BaseModel):
id: str
name: str
folder: str
tag: str
time: str
version: str
state: str
stateTone: str
owner: str
icon: str
fileType: str
fileTypeLabel: str
summary: str
mimeType: str
extension: str
sizeBytes: int
canPreview: bool = False
class KnowledgeDocumentDetailRead(KnowledgeDocumentRead):
previewKind: str
previewPages: list[KnowledgePreviewPageRead] = Field(default_factory=list)
class KnowledgeLibraryRead(BaseModel):
folders: list[KnowledgeFolderRead] = Field(default_factory=list)
documents: list[KnowledgeDocumentRead] = Field(default_factory=list)
class KnowledgeActionResponse(BaseModel):
ok: bool = True
detail: str

View File

@@ -0,0 +1,634 @@
from __future__ import annotations
import hashlib
import json
import mimetypes
import re
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
from uuid import uuid4
from xml.etree import ElementTree
from zipfile import BadZipFile, ZipFile
from app.api.deps import CurrentUserContext
from app.core.config import get_settings
from app.core.logging import get_logger
from app.schemas.knowledge import (
KnowledgeDocumentDetailRead,
KnowledgeDocumentRead,
KnowledgeFolderRead,
KnowledgeLibraryRead,
KnowledgePreviewBlockRead,
KnowledgePreviewPageRead,
KnowledgePreviewStatRead,
)
logger = get_logger("app.services.knowledge")
FIXED_KNOWLEDGE_FOLDERS = [
"财务知识库",
"制度政策",
"报销制度",
"差旅规范",
"发票管理",
"税务合规",
"预算管理",
"财务共享",
"培训资料",
"常见问答",
]
ICON_BY_TYPE = {
"pdf": "mdi mdi-file-document-outline-pdf pdf",
"word": "mdi mdi-file-document-outline-word word",
"excel": "mdi mdi-file-document-outline-excel excel",
"ppt": "mdi mdi-file-powerpoint-box ppt",
"image": "mdi mdi-file-image-outline image",
"text": "mdi mdi-file-document-outline text",
"archive": "mdi mdi-folder-zip-outline archive",
"binary": "mdi mdi-file-outline",
}
TEXT_EXTENSIONS = {"txt", "md", "csv", "json", "xml", "yml", "yaml", "log"}
WORD_EXTENSIONS = {"doc", "docx"}
EXCEL_EXTENSIONS = {"xls", "xlsx", "csv"}
PPT_EXTENSIONS = {"ppt", "pptx"}
IMAGE_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "bmp", "webp", "svg"}
ARCHIVE_EXTENSIONS = {"zip", "rar", "7z"}
STRUCTURED_PREVIEW_EXTENSIONS = {"docx", "xlsx", "pptx"} | TEXT_EXTENSIONS
INLINE_PREVIEW_EXTENSIONS = {"pdf"} | IMAGE_EXTENSIONS
def prepare_knowledge_library() -> None:
KnowledgeService().ensure_library_ready()
class KnowledgeService:
def __init__(self, storage_root: Path | None = None) -> None:
settings = get_settings()
self.storage_root = Path(storage_root or settings.resolved_storage_root_dir)
self.library_root = self.storage_root / "knowledge"
self.index_path = self.library_root / ".index.json"
def ensure_library_ready(self) -> None:
self.library_root.mkdir(parents=True, exist_ok=True)
for folder_name in FIXED_KNOWLEDGE_FOLDERS:
(self.library_root / folder_name).mkdir(parents=True, exist_ok=True)
if not self.index_path.exists():
self._save_index({"version": 1, "documents": []})
index = self._load_index()
if self._reconcile_index(index):
self._save_index(index)
def list_library(self) -> KnowledgeLibraryRead:
documents = self._load_documents()
folders = [
KnowledgeFolderRead(
name=folder_name,
count=sum(1 for item in documents if item.folder == folder_name),
icon="mdi mdi-folder-open" if folder_name == "差旅规范" else "mdi mdi-folder",
)
for folder_name in FIXED_KNOWLEDGE_FOLDERS
]
return KnowledgeLibraryRead(folders=folders, documents=documents)
def get_document_detail(self, document_id: str) -> KnowledgeDocumentDetailRead:
self.ensure_library_ready()
index = self._load_index()
entry = self._require_entry(index, document_id)
preview_kind, preview_pages = self._build_preview(entry)
document = self._serialize_document(entry)
return KnowledgeDocumentDetailRead(
**document.model_dump(),
previewKind=preview_kind,
previewPages=preview_pages,
)
def upload_document(
self,
folder: str,
filename: str,
content: bytes,
current_user: CurrentUserContext,
) -> KnowledgeDocumentDetailRead:
self.ensure_library_ready()
normalized_folder = self._normalize_folder(folder)
normalized_name = self._normalize_filename(filename)
if not content:
raise ValueError("上传文件不能为空。")
index = self._load_index()
existing_entry = next(
(
item
for item in index["documents"]
if item["folder"] == normalized_folder
and item["original_name"].lower() == normalized_name.lower()
),
None,
)
document_id = existing_entry["id"] if existing_entry else uuid4().hex
stored_name = f"{document_id}__{normalized_name}"
target_path = self.library_root / normalized_folder / stored_name
if existing_entry is not None and existing_entry["stored_name"] != stored_name:
old_path = self.library_root / existing_entry["folder"] / existing_entry["stored_name"]
if old_path.exists():
old_path.unlink()
target_path.write_bytes(content)
now = datetime.now(UTC).isoformat()
mime_type = mimetypes.guess_type(normalized_name)[0] or "application/octet-stream"
checksum = hashlib.sha256(content).hexdigest()
extension = self._extract_extension(normalized_name)
if existing_entry is None:
entry = {
"id": document_id,
"folder": normalized_folder,
"original_name": normalized_name,
"stored_name": stored_name,
"mime_type": mime_type,
"extension": extension,
"size_bytes": len(content),
"sha256": checksum,
"created_at": now,
"updated_at": now,
"uploaded_by": current_user.name,
"version_number": 1,
}
index["documents"].append(entry)
logger.info(
"Knowledge document uploaded id=%s folder=%s filename=%s by=%s",
document_id,
normalized_folder,
normalized_name,
current_user.name,
)
else:
existing_entry.update(
{
"stored_name": stored_name,
"mime_type": mime_type,
"extension": extension,
"size_bytes": len(content),
"sha256": checksum,
"updated_at": now,
"uploaded_by": current_user.name,
"version_number": int(existing_entry.get("version_number", 1)) + 1,
}
)
entry = existing_entry
logger.info(
"Knowledge document updated id=%s folder=%s filename=%s by=%s",
document_id,
normalized_folder,
normalized_name,
current_user.name,
)
self._save_index(index)
return self.get_document_detail(document_id)
def delete_document(self, document_id: str) -> None:
self.ensure_library_ready()
index = self._load_index()
entry = self._require_entry(index, document_id)
file_path = self._resolve_document_path(entry)
if file_path.exists():
file_path.unlink()
index["documents"] = [item for item in index["documents"] if item["id"] != document_id]
self._save_index(index)
logger.info("Knowledge document deleted id=%s filename=%s", document_id, entry["original_name"])
def get_document_content(self, document_id: str) -> tuple[Path, str, str]:
self.ensure_library_ready()
index = self._load_index()
entry = self._require_entry(index, document_id)
file_path = self._resolve_document_path(entry)
if not file_path.exists():
raise FileNotFoundError(entry["original_name"])
return file_path, entry["mime_type"], entry["original_name"]
def _load_documents(self) -> list[KnowledgeDocumentRead]:
self.ensure_library_ready()
index = self._load_index()
self._reconcile_index(index)
self._save_index(index)
documents = [self._serialize_document(entry) for entry in index["documents"]]
return sorted(documents, key=lambda item: item.time, reverse=True)
def _serialize_document(self, entry: dict[str, Any]) -> KnowledgeDocumentRead:
extension = entry.get("extension") or self._extract_extension(entry["original_name"])
file_type = self._resolve_file_type(extension)
size_bytes = int(entry.get("size_bytes") or 0)
updated_at = self._format_time(entry.get("updated_at") or entry.get("created_at"))
return KnowledgeDocumentRead(
id=entry["id"],
name=entry["original_name"],
folder=entry["folder"],
tag=f"{entry['folder']} / {extension.upper() or 'FILE'}",
time=updated_at,
version=f"v{int(entry.get('version_number', 1))}.0",
state="已发布",
stateTone="success",
owner=entry.get("uploaded_by") or "系统导入",
icon=ICON_BY_TYPE.get(file_type, ICON_BY_TYPE["binary"]),
fileType=file_type,
fileTypeLabel=self._resolve_file_type_label(file_type),
summary=f"{entry['folder']} · {extension.upper() or 'FILE'} · {self._format_size(size_bytes)}",
mimeType=entry.get("mime_type") or "application/octet-stream",
extension=extension,
sizeBytes=size_bytes,
canPreview=self._can_preview(extension),
)
def _build_preview(
self, entry: dict[str, Any]
) -> tuple[str, list[KnowledgePreviewPageRead]]:
extension = self._extract_extension(entry["original_name"])
file_path = self._resolve_document_path(entry)
if extension == "pdf":
return "pdf", []
if extension in IMAGE_EXTENSIONS:
return "image", []
if extension in TEXT_EXTENSIONS:
text = self._read_text_preview(file_path)
return "text", [self._build_text_preview_page(entry, text)]
if extension == "docx":
text = self._extract_docx_text(file_path)
return "text", [self._build_text_preview_page(entry, text)]
if extension == "xlsx":
return "table", [self._build_xlsx_preview_page(entry, file_path)]
if extension == "pptx":
return "slides", self._build_pptx_preview_pages(entry, file_path)
return (
"unsupported",
[
KnowledgePreviewPageRead(
title=entry["original_name"],
subtitle="当前格式暂不支持在线解析预览。",
stats=[
KnowledgePreviewStatRead(label="文件格式", value=extension.upper() or "FILE"),
KnowledgePreviewStatRead(label="文件大小", value=self._format_size(entry["size_bytes"])),
KnowledgePreviewStatRead(label="建议操作", value="下载后查看"),
],
blocks=[
KnowledgePreviewBlockRead(
heading="预览说明",
lines=[
"当前系统已支持该文件的上传、下载和权限控制。",
"如需在线预览,可后续接入专门的文档转换服务。",
],
)
],
)
],
)
def _build_text_preview_page(
self, entry: dict[str, Any], text: str
) -> KnowledgePreviewPageRead:
lines = [line.strip() for line in text.splitlines() if line.strip()]
if not lines:
lines = ["文件内容为空,或当前文档未提取到可展示文本。"]
groups = [lines[index : index + 8] for index in range(0, min(len(lines), 24), 8)]
blocks = [
KnowledgePreviewBlockRead(heading=f"内容片段 {index + 1}", lines=group)
for index, group in enumerate(groups)
]
return KnowledgePreviewPageRead(
title=entry["original_name"],
subtitle="文本提取预览",
stats=[
KnowledgePreviewStatRead(label="文件格式", value=entry["extension"].upper() or "TEXT"),
KnowledgePreviewStatRead(label="可见行数", value=str(len(lines))),
KnowledgePreviewStatRead(label="文件大小", value=self._format_size(entry["size_bytes"])),
],
blocks=blocks,
)
def _build_xlsx_preview_page(
self, entry: dict[str, Any], file_path: Path
) -> KnowledgePreviewPageRead:
rows, sheet_count = self._extract_xlsx_rows(file_path)
if not rows:
rows = [["未提取到表格内容。"]]
blocks = [
KnowledgePreviewBlockRead(
heading=f"{index + 1}",
lines=[" | ".join(cell for cell in row if cell) or "(空行)"],
)
for index, row in enumerate(rows[:12])
]
return KnowledgePreviewPageRead(
title=entry["original_name"],
subtitle="表格内容预览",
stats=[
KnowledgePreviewStatRead(label="工作表数量", value=str(sheet_count)),
KnowledgePreviewStatRead(label="预览行数", value=str(min(len(rows), 12))),
KnowledgePreviewStatRead(label="文件大小", value=self._format_size(entry["size_bytes"])),
],
blocks=blocks,
)
def _build_pptx_preview_pages(
self, entry: dict[str, Any], file_path: Path
) -> list[KnowledgePreviewPageRead]:
slides = self._extract_pptx_slides(file_path)
if not slides:
slides = [["未提取到幻灯片文本。"]]
pages: list[KnowledgePreviewPageRead] = []
for index, slide_lines in enumerate(slides[:8]):
pages.append(
KnowledgePreviewPageRead(
title=entry["original_name"],
subtitle=f"幻灯片 {index + 1}",
stats=[
KnowledgePreviewStatRead(label="页码", value=str(index + 1)),
KnowledgePreviewStatRead(label="文本条数", value=str(len(slide_lines))),
KnowledgePreviewStatRead(label="文件格式", value="PPTX"),
],
blocks=[
KnowledgePreviewBlockRead(
heading="幻灯片内容",
lines=slide_lines or ["该页未提取到文本内容。"],
)
],
)
)
return pages
def _load_index(self) -> dict[str, Any]:
try:
payload = json.loads(self.index_path.read_text(encoding="utf-8"))
except (FileNotFoundError, json.JSONDecodeError):
payload = {"version": 1, "documents": []}
payload.setdefault("documents", [])
return payload
def _save_index(self, index: dict[str, Any]) -> None:
self.index_path.write_text(
json.dumps(index, ensure_ascii=False, indent=2),
encoding="utf-8",
)
def _reconcile_index(self, index: dict[str, Any]) -> bool:
changed = False
documents = index.setdefault("documents", [])
known_by_stored = {
(item["folder"], item["stored_name"]): item
for item in documents
if item.get("folder") and item.get("stored_name")
}
existing_items: list[dict[str, Any]] = []
for item in documents:
file_path = self._resolve_document_path(item)
if file_path.exists():
item["size_bytes"] = file_path.stat().st_size
item["extension"] = self._extract_extension(item["original_name"])
item["mime_type"] = item.get("mime_type") or (
mimetypes.guess_type(item["original_name"])[0] or "application/octet-stream"
)
existing_items.append(item)
else:
changed = True
for folder_name in FIXED_KNOWLEDGE_FOLDERS:
folder_path = self.library_root / folder_name
for file_path in folder_path.iterdir():
if not file_path.is_file() or file_path.name.startswith("."):
continue
key = (folder_name, file_path.name)
if key in known_by_stored:
continue
document_id, original_name = self._parse_stored_name(file_path.name)
stat = file_path.stat()
existing_items.append(
{
"id": document_id,
"folder": folder_name,
"original_name": original_name,
"stored_name": file_path.name,
"mime_type": mimetypes.guess_type(original_name)[0]
or "application/octet-stream",
"extension": self._extract_extension(original_name),
"size_bytes": stat.st_size,
"sha256": "",
"created_at": datetime.fromtimestamp(stat.st_ctime, tz=UTC).isoformat(),
"updated_at": datetime.fromtimestamp(stat.st_mtime, tz=UTC).isoformat(),
"uploaded_by": "系统导入",
"version_number": 1,
}
)
changed = True
if changed or len(existing_items) != len(documents):
index["documents"] = existing_items
return True
return False
def _require_entry(self, index: dict[str, Any], document_id: str) -> dict[str, Any]:
for entry in index["documents"]:
if entry["id"] == document_id:
return entry
raise FileNotFoundError(document_id)
def _resolve_document_path(self, entry: dict[str, Any]) -> Path:
return self.library_root / entry["folder"] / entry["stored_name"]
@staticmethod
def _normalize_filename(filename: str) -> str:
normalized = Path(str(filename or "").strip()).name.strip()
normalized = normalized.replace("/", "_").replace("\\", "_")
if not normalized:
raise ValueError("文件名不能为空。")
return normalized
@staticmethod
def _normalize_folder(folder: str) -> str:
normalized = str(folder or "").strip()
if normalized not in FIXED_KNOWLEDGE_FOLDERS:
raise ValueError("只能上传到预设知识库文件夹。")
return normalized
@staticmethod
def _extract_extension(filename: str) -> str:
suffix = Path(filename).suffix.lower().lstrip(".")
return suffix
@staticmethod
def _parse_stored_name(stored_name: str) -> tuple[str, str]:
if "__" not in stored_name:
return uuid4().hex, stored_name
document_id, original_name = stored_name.split("__", 1)
return document_id or uuid4().hex, original_name or stored_name
@staticmethod
def _format_time(value: str | None) -> str:
if not value:
return ""
try:
parsed = datetime.fromisoformat(value)
except ValueError:
return value
return parsed.astimezone(UTC).strftime("%Y-%m-%d %H:%M")
@staticmethod
def _format_size(size_bytes: int) -> str:
if size_bytes < 1024:
return f"{size_bytes} B"
if size_bytes < 1024 * 1024:
return f"{size_bytes / 1024:.1f} KB"
return f"{size_bytes / (1024 * 1024):.1f} MB"
@staticmethod
def _resolve_file_type(extension: str) -> str:
if extension == "pdf":
return "pdf"
if extension in WORD_EXTENSIONS:
return "word"
if extension in EXCEL_EXTENSIONS:
return "excel"
if extension in PPT_EXTENSIONS:
return "ppt"
if extension in IMAGE_EXTENSIONS:
return "image"
if extension in TEXT_EXTENSIONS:
return "text"
if extension in ARCHIVE_EXTENSIONS:
return "archive"
return "binary"
@staticmethod
def _resolve_file_type_label(file_type: str) -> str:
mapping = {
"pdf": "PDF 预览",
"word": "Word 预览",
"excel": "Excel 预览",
"ppt": "PPT 预览",
"image": "图片预览",
"text": "文本预览",
"archive": "压缩包",
"binary": "文件预览",
}
return mapping.get(file_type, "文件预览")
@staticmethod
def _can_preview(extension: str) -> bool:
return extension in INLINE_PREVIEW_EXTENSIONS or extension in STRUCTURED_PREVIEW_EXTENSIONS
@staticmethod
def _read_text_preview(file_path: Path) -> str:
encodings = ("utf-8", "utf-8-sig", "gbk")
for encoding in encodings:
try:
return file_path.read_text(encoding=encoding)
except UnicodeDecodeError:
continue
return "当前文本文件编码暂不支持在线解析。"
@staticmethod
def _extract_docx_text(file_path: Path) -> str:
try:
with ZipFile(file_path) as archive:
xml_content = archive.read("word/document.xml")
except (BadZipFile, KeyError):
return "当前 Word 文件解析失败。"
root = ElementTree.fromstring(xml_content)
texts = [node.text.strip() for node in root.iter() if node.tag.endswith("}t") and node.text]
return "\n".join(texts)
@staticmethod
def _extract_xlsx_rows(file_path: Path) -> tuple[list[list[str]], int]:
try:
with ZipFile(file_path) as archive:
shared_strings: list[str] = []
if "xl/sharedStrings.xml" in archive.namelist():
shared_root = ElementTree.fromstring(archive.read("xl/sharedStrings.xml"))
shared_strings = [
"".join(node.itertext()).strip()
for node in shared_root.iter()
if node.tag.endswith("}si")
]
sheet_names = sorted(
name
for name in archive.namelist()
if re.fullmatch(r"xl/worksheets/sheet\d+\.xml", name)
)
if not sheet_names:
return [], 0
first_sheet = ElementTree.fromstring(archive.read(sheet_names[0]))
rows: list[list[str]] = []
for row in first_sheet.iter():
if not row.tag.endswith("}row"):
continue
row_values: list[str] = []
for cell in row:
if not cell.tag.endswith("}c"):
continue
cell_type = cell.attrib.get("t")
value_node = next((item for item in cell if item.tag.endswith("}v")), None)
if value_node is None or value_node.text is None:
row_values.append("")
continue
raw_value = value_node.text.strip()
if cell_type == "s" and raw_value.isdigit():
index = int(raw_value)
row_values.append(shared_strings[index] if index < len(shared_strings) else raw_value)
else:
row_values.append(raw_value)
if row_values:
rows.append(row_values)
return rows, len(sheet_names)
except (BadZipFile, ElementTree.ParseError, KeyError, ValueError):
return [], 0
@staticmethod
def _extract_pptx_slides(file_path: Path) -> list[list[str]]:
try:
with ZipFile(file_path) as archive:
slide_names = sorted(
name
for name in archive.namelist()
if re.fullmatch(r"ppt/slides/slide\d+\.xml", name)
)
slides: list[list[str]] = []
for slide_name in slide_names:
root = ElementTree.fromstring(archive.read(slide_name))
texts = [node.text.strip() for node in root.iter() if node.tag.endswith("}t") and node.text]
slides.append(texts)
return slides
except (BadZipFile, ElementTree.ParseError, KeyError):
return []

View File

@@ -0,0 +1,4 @@
{
"version": 1,
"documents": []
}

View File

@@ -0,0 +1,33 @@
import { apiRequest } from './api.js'
export function fetchKnowledgeLibrary() {
return apiRequest('/knowledge/library')
}
export function fetchKnowledgeDocument(documentId) {
return apiRequest(`/knowledge/documents/${documentId}`)
}
export function uploadKnowledgeDocument({ folder, file }) {
return apiRequest(
`/knowledge/documents?folder=${encodeURIComponent(folder)}&filename=${encodeURIComponent(file.name)}`,
{
method: 'POST',
body: file,
contentType: file.type || 'application/octet-stream'
}
)
}
export function deleteKnowledgeDocument(documentId) {
return apiRequest(`/knowledge/documents/${documentId}`, {
method: 'DELETE'
})
}
export function fetchKnowledgeDocumentBlob(documentId, disposition = 'inline') {
return apiRequest(`/knowledge/documents/${documentId}/content?disposition=${disposition}`, {
responseType: 'blob',
contentType: null
})
}

View File

@@ -10,11 +10,10 @@ export const DEFAULT_APP_VIEW_ORDER = [
'settings'
]
const ALWAYS_VISIBLE_VIEWS = new Set(['workbench', 'requests', 'chat'])
const ALWAYS_VISIBLE_VIEWS = new Set(['workbench', 'requests', 'chat', 'policies'])
const VIEW_ROLE_RULES = {
overview: ['finance', 'executive'],
approval: ['approver'],
policies: ['manager'],
audit: ['auditor'],
employees: ['manager'],
settings: ['manager']