caoxiaozhu
88e91a5900
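feat(ocr): skip the worker call when a PDF text layer is usable, and install poppler-data
- After extracting the PDF text layer, OcrService builds the document directly and writes it to the result cache if the meaningful character count reaches the threshold, without triggering the OCR worker; python_bin/worker_path are resolved and the worker is invoked only when there is no text layer
- _build_text_layer_document reuses AggregatedOcrDocument to aggregate the text-layer fragments; _has_usable_pdf_text_layer makes its decision based on meaningful_char_count
- docker-compose and the paddleocr bootstrap script now also install poppler-data, so that Chinese text extracted from the PDF text layer is decoded correctly
- Add two ocr_service unit tests: direct text-layer extraction and runtime dependencies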
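
The control flow described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the threshold value, the `recognize` function, the `extract_text_layer` and `run_worker` callables, and the simplified `AggregatedOcrDocument` stand-in are all assumptions made for the example. Only the names `_has_usable_pdf_text_layer`, `_build_text_layer_document`, `AggregatedOcrDocument`, and `meaningful_char_count` come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical threshold; the commit message does not state the real value.
MIN_MEANINGFUL_CHARS = 20


def meaningful_char_count(text: str) -> int:
    """Count letters and digits (including CJK), ignoring whitespace and punctuation."""
    return sum(1 for ch in text if ch.isalnum())


@dataclass
class AggregatedOcrDocument:
    """Simplified stand-in for the real class of the same name."""
    pages: List[str] = field(default_factory=list)
    source: str = "text_layer"

    @property
    def text(self) -> str:
        return "\n".join(self.pages)


def _has_usable_pdf_text_layer(pages: List[str],
                               threshold: int = MIN_MEANINGFUL_CHARS) -> bool:
    # The text layer is usable once enough meaningful characters were extracted.
    return sum(meaningful_char_count(p) for p in pages) >= threshold


def _build_text_layer_document(pages: List[str]) -> AggregatedOcrDocument:
    # Aggregate the non-empty text-layer fragments into one document.
    return AggregatedOcrDocument(pages=[p.strip() for p in pages if p.strip()])


def recognize(pdf_key: str,
              extract_text_layer: Callable[[str], List[str]],
              run_worker: Callable[[str], AggregatedOcrDocument],
              cache: Dict[str, AggregatedOcrDocument]) -> AggregatedOcrDocument:
    """Use the text layer when it is usable; otherwise fall back to the OCR worker."""
    pages = extract_text_layer(pdf_key)
    if _has_usable_pdf_text_layer(pages):
        doc = _build_text_layer_document(pages)
        cache[pdf_key] = doc  # result cached, worker never touched
        return doc
    # Only on this path would python_bin/worker_path need to be resolved.
    return run_worker(pdf_key)
```

Keeping the worker behind a callable means the expensive setup (resolving the interpreter and worker script) happens only on the fallback path, which matches the intent stated in the first bullet.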
2026-06-21 23:23:59 +08:00