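feat: improve the AI-Core document parsers

- Add parsers for multiple document formats (PDF, Word, Excel, Markdown, etc.)
- Add a base parser and a chain parser
- Add storage and registry mechanisms
- Add the gRPC service implementation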

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 15:01:52 +08:00
parent 54473bc378
commit d24b29afe4
19 changed files with 4056 additions and 31 deletions

@@ -1,38 +1,10 @@
 """
-Parser module for WeKnora document processing system.
-This module provides document parsers for various file formats including:
-- Microsoft Word documents (.doc, .docx)
-- PDF documents
-- Markdown files
-- Plain text files
-- Images with text content
-- Web pages
-The parsers extract content from documents and can split them into
-meaningful chunks for further processing and indexing.
+Parser module for AI-Core document processing.
 """
-from .doc_parser import DocParser
-from .docx2_parser import Docx2Parser
-from .excel_parser import ExcelParser
-from .image_parser import ImageParser
-from .markdown_parser import MarkdownParser
-from .parser import Parser
-from .pdf_parser import PDFParser
-from .registry import ParserEngineRegistry, registry
-from .web_parser import WebParser
+from .parser_simple import Parser, Document
 # Export public classes and modules
 __all__ = [
-    "Docx2Parser",
-    "DocParser",
-    "PDFParser",
-    "MarkdownParser",
-    "ImageParser",
-    "WebParser",
     "Parser",
-    "ExcelParser",
-    "ParserEngineRegistry",
-    "registry",
+    "Document",
 ]
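
The commit message mentions a base parser and a chain parser, but this hunk does not show either. As a rough illustration of the general pattern only, here is a minimal sketch in which a chain tries each parser in order and returns the first successful result. Every name below (`SketchDocument`, `SketchParser`, `Utf8TextParser`, `Latin1TextParser`, `ChainParser`) is hypothetical and is not taken from `parser_simple`; the real `Parser` and `Document` classes may look quite different.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SketchDocument:
    """Hypothetical parse result; NOT the real Document from parser_simple."""
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class SketchParser:
    """Hypothetical base class: parse() returns None when the input is not supported."""

    def parse(self, data: bytes) -> Optional[SketchDocument]:
        raise NotImplementedError


class Utf8TextParser(SketchParser):
    """Accepts only input that is valid UTF-8."""

    def parse(self, data: bytes) -> Optional[SketchDocument]:
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            return None
        return SketchDocument(text, {"parser": "utf8"})


class Latin1TextParser(SketchParser):
    """Fallback: latin-1 maps every byte to a code point, so decoding never fails."""

    def parse(self, data: bytes) -> Optional[SketchDocument]:
        return SketchDocument(data.decode("latin-1"), {"parser": "latin1"})


class ChainParser(SketchParser):
    """Tries each wrapped parser in order and returns the first non-None result."""

    def __init__(self, parsers: List[SketchParser]) -> None:
        self.parsers = list(parsers)

    def parse(self, data: bytes) -> Optional[SketchDocument]:
        for parser in self.parsers:
            doc = parser.parse(data)
            if doc is not None:
                return doc
        return None


# Assumed usage: strict parser first, permissive fallback last.
chain = ChainParser([Utf8TextParser(), Latin1TextParser()])
```

Because the chain exposes the same `parse()` method as a single parser, callers would not need to know whether they hold one parser or several. Whether the chain parser in this commit follows that design is an assumption.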