Commit Graph

4 Commits

Author SHA1 Message Date
caoxiaozhu
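88e91a5900 feat(ocr): skip the worker call when a PDF text layer is usable, and install poppler-data
- After OcrService extracts the PDF text layer, if the number of meaningful characters reaches the threshold, it builds the document directly and writes it to the result cache without triggering the OCR worker; python_bin/worker_path are resolved and the worker is called only when there is no text layer
- _build_text_layer_document reuses AggregatedOcrDocument to aggregate text-layer fragments; _has_usable_pdf_text_layer decides based on meaningful_char_count
- docker-compose and the paddleocr bootstrap script now install poppler-data so that Chinese text extracted from the PDF text layer is decoded correctly
- Add two ocr_service unit tests: direct text-layer extraction and runtime dependencies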
2026-06-21 23:23:59 +08:00
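The text-layer short-circuit in 88e91a5900 can be illustrated with a minimal sketch. This is not the repository's code: the function names, the default threshold of 20, and the definition of a "meaningful" character (any alphanumeric character, which includes CJK) are all assumptions made for illustration.

```python
from typing import Callable, Tuple


def meaningful_char_count(text: str) -> int:
    """Count letters, digits, and CJK characters; skip whitespace and punctuation."""
    return sum(1 for ch in text if ch.isalnum())


def has_usable_text_layer(text: str, min_chars: int = 20) -> bool:
    """Hypothetical threshold check; the real threshold is not stated in the commit."""
    return meaningful_char_count(text) >= min_chars


def extract_document(
    text_layer: str,
    run_worker: Callable[[], str],
    min_chars: int = 20,
) -> Tuple[str, str]:
    """Return (source, text), calling the OCR worker only as a fallback."""
    if has_usable_text_layer(text_layer, min_chars):
        # Text layer is good enough: the worker is never invoked.
        return ("text_layer", text_layer)
    # No usable text layer: only now pay the cost of running OCR.
    return ("ocr_worker", run_worker())
```

Passing the worker as a callable keeps the expensive path lazy, which mirrors the commit's point that python_bin/worker_path are resolved only when OCR is actually needed.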
caoxiaozhu
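59ba76c74a feat(startup): server startup bootstrap and cache warmup
- Add STARTUP_BOOTSTRAP_ENABLED / STARTUP_CACHE_WARMUP_ENABLED configuration switches
- lifespan splits bootstrap into separate steps and warms the cache in a background thread; on failure, startup degrades and continues
- server_start.sh / web_start.sh extend env overrides to cover SERVER_PORT and the startup and scheduler switches
- bootstrap_paddleocr_mobile.sh switches to python3 and adds the poppler-utils dependency
- Add tests for startup bootstrap and env override precedence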
2026-06-18 22:11:37 +08:00
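The switch-gated, failure-tolerant warmup in 59ba76c74a might look roughly like the sketch below. Only the name STARTUP_CACHE_WARMUP_ENABLED comes from the commit message; the helper names, the accepted truthy strings, and the default of enabled are assumptions, not the project's actual behavior.

```python
import logging
import os
import threading
from typing import Callable, Mapping, Optional

logger = logging.getLogger("startup")

_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean switch; an unset or empty variable falls back to the default."""
    source = os.environ if env is None else env
    raw = source.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in _TRUTHY


def start_cache_warmup(
    warm: Callable[[], None],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[threading.Thread]:
    """Warm the cache in a background thread; failures are logged, never raised."""
    if not env_flag("STARTUP_CACHE_WARMUP_ENABLED", True, env):
        return None

    def _run() -> None:
        try:
            warm()
        except Exception:
            # Degrade gracefully: the server keeps starting with a cold cache.
            logger.exception("cache warmup failed; continuing with a cold cache")

    thread = threading.Thread(target=_run, name="cache-warmup", daemon=True)
    thread.start()
    return thread
```

A startup hook could call `start_cache_warmup(my_warm_fn)` and proceed immediately, since the thread is a daemon and never blocks serving or shutdown.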
caoxiaozhu
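e124e4bbcb feat: refactor the reimbursement approval flow and connect the butler plan pipeline end to end
- Refactor the reimbursement status registry, approval flow routing, and platform risk flags
- Complete the end-to-end pipeline of the butler intent planner and model plan builder
- Add the OCR worker script, database session management, and notification status
- Improve interactions in the document center, log view, budget center, and employee management
- Enhance the workbench summary, icon assets, and global theme styles
- Add test coverage for approval routing, status registration, the OCR service, and the butler planner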
2026-06-06 17:19:07 +08:00
caoxiaozhu
fb23a6976a feat(server): add OCR invoice processing functionality
New endpoints:
- server/src/app/api/v1/endpoints/ocr.py: OCR API endpoints for invoice scanning

New schemas:
- server/src/app/schemas/ocr.py: OCR request/response data schemas

New services:
- server/src/app/services/ocr.py: OCR processing business logic
- server/src/app/services/expense_claims.py: expense claims management service

Scripts:
- server/scripts/bootstrap_paddleocr_mobile.sh: PaddleOCR mobile setup script
- server/scripts/paddle_ocr_worker.py: PaddleOCR worker process
2026-05-12 03:04:10 +00:00