chore: remove the start-local.ps1 startup script

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
refactor: restructure the ai-core code
2026-03-09 16:08:55 +08:00 · 2026-03-09 16:08:44 +08:00 · 2026-03-09 16:08:38 +08:00
12 changed files with 204 additions and 762 deletions
--- a/ai-core/README.md
+++ b/ai-core/README.md
@@ -1,33 +1,31 @@
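 # AI-Core Document Parsing Service

-A gRPC document parsing service built on Python and Microsoft MarkItDown that converts many file formats to Markdown.
+A Python-based gRPC document parsing service that converts many file formats to Markdown.

-## Features
+## Functional Features

-- **Unified parsing engine** - Uses Microsoft MarkItDown, one library for every format
-- **Broad format support** - PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, CSV, images, web pages, and more
-- **gRPC interface** - High-performance, type-safe RPC communication
-- **Few dependencies** - Only 3 core packages to install
-- **Easy to deploy** - One-command startup, works out of the box
+- Supports many file formats: PDF, DOCX, DOC, XLSX, XLS, CSV, Markdown, images, and more
+- Multiple parsing engines (builtin, markitdown)
+- gRPC interface for high-performance communication
+- Downloads and parses files from a URL
+- Configurable parsing engine and parameters

 ## Project Structure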

 ```
 ai-core/
 ├── main.py                      # Service entry point
-├── requirements.txt             # Python dependencies (only 3 packages)
-├── generate_grpc.py             # gRPC code generation script
-├── start.sh                     # Linux/Mac startup script
-├── start.ps1                    # Windows startup script
+├── requirements.txt             # Python dependencies
 ├── proto/                       # gRPC protocol definitions
-│   ├── document_parser.proto    # Protocol Buffers definition
-│   ├── document_parser_pb2.py  # Generated Python code
-│   └── document_parser_pb2_grpc.py
+│   └── document_parser.proto    # Protocol Buffers definition
 ├── parser/                      # Document parser module
-│   ├── __init__.py
-│   └── parser.py               # MarkItDown parser
+│   ├── base_parser.py           # Base parser interface
+│   ├── parser.py                # Parser facade
+│   ├── registry.py              # Parser registry
+│   ├── docx_parser.py           # DOCX parser
+│   ├── pdf_parser.py            # PDF parser
+│   └── ...
 └── service/                     # gRPC service implementation
-    ├── __init__.py
    └── grpc_server.py           # gRPC server
 ```

@@ -39,39 +37,19 @@ ai-core/
 pip install -r requirements.txt
 ```

-Dependencies:
-- `grpcio` - gRPC framework
-- `grpcio-tools` - gRPC tooling
-- `grpcio-reflection` - gRPC reflection
-- `protobuf` - Protocol Buffers
-- `requests` - HTTP requests
-- `markitdown` - Microsoft document parsing engine
-
 ### 2. Generate gRPC Code

 ```bash
-python generate_grpc.py
+python -m grpc_tools.protoc \
+    --proto_path=proto \
+    --python_out=proto \
+    --grpc_python_out=proto \
+    proto/document_parser.proto
 ```

-This generates two files in the `proto` directory:
-- `document_parser_pb2.py`
-- `document_parser_pb2_grpc.py`
-
 ## Usage

-### Option 1: Use the startup script (recommended)
-
-**Windows:**
-```powershell
-.\start.ps1
-```
-
-**Linux/Mac:**
-```bash
-bash start.sh
-```
-
-### Option 2: Run directly
+### Start the Service

 ```bash
 python main.py --port 50051 --max-workers 10
@@ -82,9 +60,9 @@ python main.py --port 50051 --max-workers 10
 - `--max-workers`: maximum number of worker threads (default 10)
 - `--log-level`: log level (DEBUG/INFO/WARNING/ERROR, default INFO)

-## gRPC Interface
+### gRPC Interface

-### ParseDocument
+#### ParseDocument

 Parses a document into Markdown.

@@ -92,8 +70,8 @@ python main.py --port 50051 --max-workers 10
 message ParseRequest {
  string file_url = 1;                    // File URL (required)
  string file_name = 2;                   // File name (required)
-  string file_type = 3;                   // File type (optional)
-  string parser_engine = 4;               // Parsing engine (optional)
+  string file_type = 3;                   // File type (required, e.g. pdf, docx)
+  string parser_engine = 4;               // Parsing engine (optional, defaults to builtin)
  map<string, string> engine_overrides = 5;// Engine parameter overrides (optional)
 }

@@ -107,26 +85,17 @@ message ParseResponse {
 }
 ```

-### GetSupportedFormats
+#### GetSupportedFormats

 Returns the list of supported file formats.

-### GetEngines
+#### GetEngines

 Returns the list of available parsing engines.

 ## Go Client Example

 ```go
-import (
-    "context"
-    "log"
-    
-    "google.golang.org/grpc"
-    "google.golang.org/grpc/credentials/insecure"
-)
-
-func main() {
 conn, err := grpc.Dial("localhost:50051", grpc.WithTransportCredentials(insecure.NewCredentials()))
 if err != nil {
    log.Fatalf("Failed to connect: %v", err)
@@ -139,82 +108,42 @@ func main() {
    FileUrl:   "http://localhost:8082/files/abc123.pdf",
    FileName:  "example.pdf",
    FileType:  "pdf",
+    ParserEngine: "builtin",
 })

 if err != nil {
    log.Fatalf("Failed to parse: %v", err)
 }

-    log.Printf("Success: %v", resp.Success)
-    log.Printf("Content length: %d", resp.ContentLength)
-    log.Printf("Markdown content:\n%s", resp.Content)
-}
+fmt.Println("Markdown content:")
+fmt.Println(resp.Content)
 ```

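 ## Supported File Formats

-| Category | Supported extensions |
-|------|-------------|
-| **Documents** | pdf, docx, doc, pptx, ppt |
-| **Spreadsheets** | xlsx, xls, csv |
-| **Text** | md, markdown, txt |
-| **Images** | jpg, jpeg, png, gif, bmp, tiff, webp |
-| **Web pages** | html, htm |
-
-## Why MarkItDown?
-
-1. **Official Microsoft support** - Developed by Microsoft and actively maintained
-2. **Broad format coverage** - One library supports all common formats
-3. **Unified interface** - No need to implement each format separately
-4. **Simple installation** - Just `pip install markitdown`
-5. **Strong performance** - Built on optimized parsing algorithms
-
-## Troubleshooting
-
-### Port already in use
-
-If port 50051 is reported as already in use, switch to another port: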
-
-```bash
-python main.py --port 50052
-```
-
-### gRPC code not generated
-
-If `docparser_pb2` cannot be found, run:
-
-```bash
-python generate_grpc.py
-```
-
-### Dependency installation fails
-
-Make sure you are using Python 3.8+:
-
-```bash
-python --version
-pip --version
-```
+| Format | Extensions | Description |
+|------|--------|------|
+| PDF | pdf | PDF documents |
+| Word | docx, doc | Microsoft Word documents |
+| Excel | xlsx, xls | Microsoft Excel spreadsheets |
+| CSV | csv | Comma-separated values files |
+| Markdown | md, markdown | Markdown files |
+| Images | jpg, jpeg, png, gif, bmp, tiff, webp | Common image formats |
+| PowerPoint | pptx, ppt | PowerPoint presentations |

 ## Development

-### Testing the parser
+### Adding a new parser

-```python
-from parser import Parser
+1. Subclass `BaseParser`
+2. Implement the `parse_into_text` method
+3. Register it in `registry.py`

-parser = Parser()
+### Adding a new parsing engine

-# Parse a file
-result = parser.parse("path/to/file.pdf")
-print(result["content"])
-
-# Parse raw bytes
-with open("file.pdf", "rb") as f:
-    content = f.read()
-result = parser.parse_bytes(content, "file.pdf")
-print(result["content"])
-```
+1. Register it with the `register()` method in `registry.py`
+2. Provide a `check_available` function that checks its dependencies
+3. Add the corresponding parser class

 ## License

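The two README checklists in the hunk above (subclass `BaseParser`, implement `parse_into_text`, register through `register()` with a `check_available` function) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in `parser/base_parser.py` or `parser/registry.py`: the class bodies, the `register()` signature, the `get_parser` helper, and `TxtParser` are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Tuple, Type


class BaseParser(ABC):
    """Hypothetical stand-in for the base parser interface."""

    @abstractmethod
    def parse_into_text(self, content: bytes) -> str:
        """Return the text extracted from the raw file bytes."""


class ParserEngineRegistry:
    """Hypothetical registry: engine name -> (availability check, parsers by file type)."""

    def __init__(self) -> None:
        self._engines: Dict[
            str, Tuple[Callable[[], bool], Dict[str, Type[BaseParser]]]
        ] = {}

    def register(
        self,
        name: str,
        parsers: Dict[str, Type[BaseParser]],
        check_available: Callable[[], bool] = lambda: True,
    ) -> None:
        # Assumed signature: the engine's parsers plus a dependency check.
        self._engines[name] = (check_available, dict(parsers))

    def get_parser(self, engine: str, file_type: str) -> BaseParser:
        # Assumed helper: refuse engines whose dependencies are missing.
        check_available, parsers = self._engines[engine]
        if not check_available():
            raise RuntimeError(f"engine {engine!r} is not available")
        return parsers[file_type.lower()]()


class TxtParser(BaseParser):
    """Illustrative parser: decodes plain text as UTF-8."""

    def parse_into_text(self, content: bytes) -> str:
        return content.decode("utf-8", errors="replace")


registry = ParserEngineRegistry()
registry.register("builtin", {"txt": TxtParser})

text = registry.get_parser("builtin", "TXT").parse_into_text(b"hello")
```

Keeping the availability check next to the engine registration means a missing optional dependency surfaces as a clear error at lookup time rather than as an import failure at startup; whether the real registry behaves this way is not shown in this diff.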
--- a/ai-core/main.py
+++ b/ai-core/main.py
@@ -1,59 +0,0 @@
-import argparse
-import logging
-import os
-import sys
-
-sys.path.insert(0, os.path.dirname(__file__))
-
-from service.grpc_server import serve
-
-DEFAULT_PORT = 50051
-DEFAULT_MAX_WORKERS = 10
-
-def main():
-    parser = argparse.ArgumentParser(
-        description="Document Parser gRPC Server (MarkItDown)",
-        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
-    )
-    parser.add_argument(
-        "--port",
-        type=int,
-        default=DEFAULT_PORT,
-        help="Port to listen on",
-    )
-    parser.add_argument(
-        "--max-workers",
-        type=int,
-        default=DEFAULT_MAX_WORKERS,
-        help="Maximum number of worker threads",
-    )
-    parser.add_argument(
-        "--log-level",
-        type=str,
-        default="INFO",
-        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
-        help="Log level",
-    )
-    
-    args = parser.parse_args()
-
-    logging.basicConfig(
-        level=getattr(logging, args.log_level),
-        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
-    )
-    
-    logger = logging.getLogger(__name__)
-    logger.info("Starting Document Parser gRPC Server (MarkItDown)")
-    logger.info("Port: %d", args.port)
-    logger.info("Max workers: %d", args.max_workers)
-    
-    try:
-        serve(port=args.port, max_workers=args.max_workers)
-    except KeyboardInterrupt:
-        logger.info("Server shutdown requested")
-    except Exception as e:
-        logger.error("Server error: %s", str(e), exc_info=True)
-        sys.exit(1)
-
-if __name__ == "__main__":
-    main()
--- a/ai-core/parser/__init__.py
+++ b/ai-core/parser/__init__.py
@@ -1,9 +1,38 @@
 """
-Parser module for AI-Core document processing system.
+Parser module for WeKnora document processing system.

-This module provides document parsing using Microsoft MarkItDown.
+This module provides document parsers for various file formats including:
+- Microsoft Word documents (.doc, .docx)
+- PDF documents
+- Markdown files
+- Plain text files
+- Images with text content
+- Web pages
+
+The parsers extract content from documents and can split them into
+meaningful chunks for further processing and indexing.
 """

+from .doc_parser import DocParser
+from .docx2_parser import Docx2Parser
+from .excel_parser import ExcelParser
+from .image_parser import ImageParser
+from .markdown_parser import MarkdownParser
 from .parser import Parser
+from .pdf_parser import PDFParser
+from .registry import ParserEngineRegistry, registry
+from .web_parser import WebParser

-__all__ = ["Parser"]
+# Export public classes and modules
+__all__ = [
+    "Docx2Parser",
+    "DocParser",
+    "PDFParser",
+    "MarkdownParser",
+    "ImageParser",
+    "WebParser",
+    "Parser",
+    "ExcelParser",
+    "ParserEngineRegistry",
+    "registry",
+]
--- a/ai-core/parser/parser.py
+++ b/ai-core/parser/parser.py
@@ -1,199 +0,0 @@
-import logging
-import os
-import tempfile
-from typing import Optional, Dict, Any
-from markitdown import MarkItDown
-
-from .vlm_client import VLMClient
-from .config import get_vlm_config
-
-logger = logging.getLogger(__name__)
-
-
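-class Parser:
-    """Unified document parser built on MarkItDown + VLM.
-
-    Supported formats: PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, CSV, images, web pages, Markdown, and more.
-
-    VLM parsing:
-    - Option 1: configure at startup (config.yaml or environment variables)
-    - Option 2: pass a VLM config in the gRPC request (takes precedence)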
-    """
-
-    def __init__(self):
-        self.markitdown = MarkItDown()
-        self.vlm_client: Optional[VLMClient] = None
-
-        # Try to load the configured VLM
-        vlm_config = get_vlm_config()
-        if vlm_config:
-            self.vlm_client = VLMClient(vlm_config)
-            logger.info(f"VLM enabled: provider={vlm_config.get('provider')}, model={vlm_config.get('model')}")
-        else:
-            logger.info("VLM not configured, using MarkItDown only")
-
-    def set_vlm_config(self, config: Dict[str, Any]) -> None:
-        """Set the VLM config manually (takes precedence over the global config)."""
-        if config and config.get("enabled") and config.get("api_key"):
-            self.vlm_client = VLMClient(config)
-            logger.info(f"VLM enabled: provider={config.get('provider')}, model={config.get('model')}")
-        else:
-            self.vlm_client = None
-            logger.info("VLM disabled")
-
-    def parse(self, file_path: str, file_type: Optional[str] = None, vlm_config: Optional[Dict[str, Any]] = None) -> dict:
-        """Parse a document into Markdown.
-
-        Args:
-            file_path: File path or URL
-            file_type: File type (optional; MarkItDown detects it automatically)
-            vlm_config: VLM config (optional; takes precedence over the global config)
-
-        Returns:
-            dict: Markdown content and metadata
-        """
-        # If a VLM config is given, override the global config
-        if vlm_config:
-            self.set_vlm_config(vlm_config)
-
-        try:
-            logger.info(f"Parsing file: {file_path}")
-
-            result = self.markitdown.convert(file_path)
-
-            logger.info(f"Parse successful: {len(result.text_content)} characters")
-
-            return {
-                "success": True,
-                "content": result.text_content,
-                "content_length": len(result.text_content),
-                "metadata": result.metadata if hasattr(result, 'metadata') else {}
-            }
-        except Exception as e:
-            logger.error(f"Parse error: {e}", exc_info=True)
-            return {
-                "success": False,
-                "content": "",
-                "content_length": 0,
-                "error": str(e)
-            }
-
-    def parse_bytes(self, content: bytes, file_name: str, file_type: Optional[str] = None, vlm_config: Optional[Dict[str, Any]] = None) -> dict:
-        """Parse raw bytes into Markdown.
-
-        Args:
-            content: File content as bytes
-            file_name: File name
-            file_type: File type (optional)
-            vlm_config: VLM config (optional; takes precedence over the global config)
-
-        Returns:
-            dict: Markdown content and metadata
-        """
-        # If a VLM config is given, override the global config
-        if vlm_config:
-            self.set_vlm_config(vlm_config)
-
-        try:
-            logger.info(f"Parsing bytes: {file_name}, size: {len(content)} bytes")
-
-            # Check whether to use the VLM (decided automatically from the file name)
-            if self._should_use_vlm(file_name):
-                logger.info("Using VLM for parsing")
-                return self._parse_with_vlm(content, file_name)
-
-            # Otherwise fall back to MarkItDown
-            with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file_name)[1] or '') as temp_file:
-                temp_file.write(content)
-                temp_path = temp_file.name
-
-            try:
-                result = self.markitdown.convert(temp_path)
-
-                logger.info(f"Parse successful: {len(result.text_content)} characters")
-
-                return {
-                    "success": True,
-                    "content": result.text_content,
-                    "content_length": len(result.text_content),
-                    "metadata": result.metadata if hasattr(result, 'metadata') else {}
-                }
-            finally:
-                os.unlink(temp_path)
-        except Exception as e:
-            logger.error(f"Parse bytes error: {e}", exc_info=True)
-            return {
-                "success": False,
-                "content": "",
-                "content_length": 0,
-                "error": str(e)
-            }
-
-    def _should_use_vlm(self, file_name: str) -> bool:
-        """Decide whether the VLM should be used."""
-        if not self.vlm_client:
-            return False
-
-        # Image files go through the VLM
-        image_exts = ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp', '.tiff']
-        ext = os.path.splitext(file_name)[1].lower()
-        return ext in image_exts
-
-    def _parse_with_vlm(self, content: bytes, file_name: str) -> dict:
-        """Parse with the VLM."""
-        if not self.vlm_client:
-            return {
-                "success": False,
-                "content": "",
-                "content_length": 0,
-                "error": "VLM not configured"
-            }
-
-        # Determine the MIME type
-        ext = os.path.splitext(file_name)[1].lower()
-        mime_types = {
-            '.jpg': 'image/jpeg',
-            '.jpeg': 'image/jpeg',
-            '.png': 'image/png',
-            '.gif': 'image/gif',
-            '.bmp': 'image/bmp',
-            '.webp': 'image/webp',
-            '.tiff': 'image/tiff',
-        }
-        mime_type = mime_types.get(ext, 'image/png')
-
-        try:
-            result = self.vlm_client.analyze_image(content, mime_type)
-
-            if result.get("success"):
-                return {
-                    "success": True,
-                    "content": result["content"],
-                    "content_length": len(result["content"]),
-                    "metadata": {"vlm_used": True}
-                }
-            else:
-                return {
-                    "success": False,
-                    "content": "",
-                    "content_length": 0,
-                    "error": result.get("error", "VLM parsing failed")
-                }
-        except Exception as e:
-            logger.error(f"VLM parsing error: {e}")
-            return {
-                "success": False,
-                "content": "",
-                "content_length": 0,
-                "error": str(e)
-            }
-
-
-if __name__ == "__main__":
-    parser = Parser()
-
-    # Test
-    test_url = "https://example.com"
-    result = parser.parse(test_url)
-    print(f"Success: {result['success']}")
-    print(f"Content length: {result['content_length']}")
--- a/ai-core/requirements.txt
+++ b/ai-core/requirements.txt
@@ -1,14 +0,0 @@
-# AI-Core Document Parser - based on MarkItDown
-
-# gRPC framework
-grpcio>=1.60.0
-grpcio-tools>=1.60.0
-grpcio-reflection>=1.60.0
-protobuf>=4.25.0
-
-# Config file parsing
-pyyaml>=6.0
-requests>=2.31.0
-
-# Document parsing - markitdown and all of its dependencies
-markitdown[pdf,docx,pptx,xlsx,all]>=0.0.1
--- a/ai-core/service/grpc_server.py
+++ b/ai-core/service/grpc_server.py
@@ -1,259 +0,0 @@
-import logging
-import requests
-from concurrent import futures
-
-import grpc
-from grpc_reflection.v1alpha import reflection
-
-import sys
-import os
-
-sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
-sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "proto"))
-
-from parser.parser import Parser
-
-logger = logging.getLogger(__name__)
-
-docparser_pb2 = None
-docparser_pb2_grpc = None
-
-def _import_grpc_protobuf():
-    """Import gRPC protobuf modules"""
-    global docparser_pb2, docparser_pb2_grpc
-    if docparser_pb2 is not None and docparser_pb2_grpc is not None:
-        return
-    
-    try:
-        import document_parser_pb2 as dpb2
-        import document_parser_pb2_grpc as dpb2_grpc
-        docparser_pb2 = dpb2
-        docparser_pb2_grpc = dpb2_grpc
-        logger.info("Successfully imported gRPC protobuf modules")
-    except ImportError as e:
-        logger.error(f"Failed to import gRPC protobuf: {e}")
-        raise ImportError(
-            "gRPC protobuf files not found. Please run: python generate_grpc.py"
-        ) from e
-
-
-class DocumentParserServicer:
-    """gRPC service implementation backed by MarkItDown."""
-
-    def __init__(self, max_workers: int = 10):
-        _import_grpc_protobuf()
-        self.parser = Parser()
-        self.max_workers = max_workers
-        logger.info("DocumentParserServicer initialized")
-
-    def ParseDocument(self, request, context):
-        """Parse a document."""
-        try:
-            logger.info(
-                "ParseDocument request: file_url=%s, file_name=%s, file_type=%s",
-                request.file_url,
-                request.file_name,
-                request.file_type,
-            )
-
-            file_url = request.file_url
-            file_name = request.file_name
-
-            if not file_url:
-                return docparser_pb2.ParseResponse(
-                    success=False,
-                    content="",
-                    message="file_url is required",
-                    content_length=0,
-                )
-
-            if not file_name:
-                return docparser_pb2.ParseResponse(
-                    success=False,
-                    content="",
-                    message="file_name is required",
-                    content_length=0,
-                )
-
-            # Extract the VLM config
-            vlm_config = None
-            if hasattr(request, 'vlm_config') and request.vlm_config:
-                vlm_cfg = request.vlm_config
-                if vlm_cfg.enabled:
-                    vlm_config = {
-                        "enabled": vlm_cfg.enabled,
-                        "provider": vlm_cfg.provider,
-                        "model": vlm_cfg.model,
-                        "api_key": vlm_cfg.api_key,
-                        "base_url": vlm_cfg.base_url,
-                        "prompt": vlm_cfg.prompt,
-                    }
-                    logger.info(f"VLM config: provider={vlm_cfg.provider}, model={vlm_cfg.model}")
-
-            logger.info("Downloading file from URL: %s", file_url)
-
-            try:
-                response = requests.get(
-                    file_url,
-                    timeout=60,
-                    headers={"User-Agent": "DocParser/1.0"},
-                )
-                response.raise_for_status()
-                content = response.content
-                logger.info("Downloaded %d bytes", len(content))
-            except requests.RequestException as e:
-                logger.error("Failed to download file: %s", str(e))
-                return docparser_pb2.ParseResponse(
-                    success=False,
-                    content="",
-                    message=f"Failed to download file: {str(e)}",
-                    content_length=0,
-                )
-
-            logger.info("Parsing file with MarkItDown + VLM")
-
-            result = self.parser.parse_bytes(content, file_name, vlm_config=vlm_config)
-
-            if not result.get("success", False):
-                logger.warning("Parser returned failure: %s", result.get("error", "Unknown error"))
-                return docparser_pb2.ParseResponse(
-                    success=False,
-                    content="",
-                    message=result.get("error", "Parse failed"),
-                    content_length=0,
-                )
-
-            markdown_content = result.get("content", "")
-            logger.info(
-                "Parse successful: content_length=%d",
-                len(markdown_content),
-            )
-
-            return docparser_pb2.ParseResponse(
-                success=True,
-                content=markdown_content,
-                message="Parse successful",
-                content_length=len(markdown_content),
-                file_type=request.file_type or "auto",
-                parser_engine="markitdown",
-            )
-
-        except Exception as e:
-            logger.error("ParseDocument error: %s", str(e), exc_info=True)
-            return docparser_pb2.ParseResponse(
-                success=False,
-                content="",
-                message=f"Parse error: {str(e)}",
-                content_length=0,
-            )
-
-    def GetSupportedFormats(self, request, context):
-        """Return the supported file formats."""
-        try:
-            logger.info("GetSupportedFormats request")
-
-            file_types = [
-                "pdf", "docx", "doc", "pptx", "ppt", 
-                "xlsx", "xls", "csv", 
-                "md", "markdown",
-                "jpg", "jpeg", "png", "gif", "bmp", "tiff", "webp",
-                "html", "htm", "txt",
-            ]
-
-            file_type_descriptions = {
-                "pdf": "PDF Document",
-                "docx": "Microsoft Word Document",
-                "doc": "Microsoft Word Document (Legacy)",
-                "pptx": "Microsoft PowerPoint Presentation",
-                "ppt": "Microsoft PowerPoint Presentation (Legacy)",
-                "xlsx": "Microsoft Excel Spreadsheet",
-                "xls": "Microsoft Excel Spreadsheet (Legacy)",
-                "csv": "Comma-Separated Values",
-                "md": "Markdown File",
-                "markdown": "Markdown File",
-                "jpg": "JPEG Image",
-                "jpeg": "JPEG Image",
-                "png": "PNG Image",
-                "gif": "GIF Image",
-                "bmp": "BMP Image",
-                "tiff": "TIFF Image",
-                "webp": "WebP Image",
-                "html": "HTML Document",
-                "htm": "HTML Document",
-                "txt": "Plain Text File",
-            }
-
-            return docparser_pb2.SupportedFormatsResponse(
-                file_types=file_types,
-                file_type_descriptions=file_type_descriptions,
-            )
-        except Exception as e:
-            logger.error("GetSupportedFormats error: %s", str(e), exc_info=True)
-            context.set_code(grpc.StatusCode.INTERNAL)
-            context.set_details(str(e))
-            return docparser_pb2.SupportedFormatsResponse()
-
-    def GetEngines(self, request, context):
-        """Return the list of available parsing engines."""
-        try:
-            logger.info("GetEngines request")
-
-            engine_info = docparser_pb2.EngineInfo(
-                name="markitdown",
-                description="Microsoft MarkItDown - unified document parsing engine",
-                supported_file_types=[
-                    "pdf", "docx", "doc", "pptx", "ppt", 
-                    "xlsx", "xls", "csv", 
-                    "md", "markdown",
-                    "jpg", "jpeg", "png", "gif", "bmp", "tiff", "webp",
-                    "html", "htm", "txt",
-                ],
-                available=True,
-                unavailable_reason="",
-            )
-
-            return docparser_pb2.EnginesResponse(engines=[engine_info])
-        except Exception as e:
-            logger.error("GetEngines error: %s", str(e), exc_info=True)
-            context.set_code(grpc.StatusCode.INTERNAL)
-            context.set_details(str(e))
-            return docparser_pb2.EnginesResponse()
-
-
-def serve(port: int = 50051, max_workers: int = 10):
-    """Start the gRPC server."""
-    _import_grpc_protobuf()
-    
-    server = grpc.server(futures.ThreadPoolExecutor(max_workers=max_workers))
-
-    servicer = DocumentParserServicer(max_workers=max_workers)
-    docparser_pb2_grpc.add_DocumentParserServicer_to_server(servicer, server)
-
-    reflection.enable_server_reflection(
-        service_names=[
-            docparser_pb2.DESCRIPTOR.services_by_name["DocumentParser"].full_name,
-            reflection.SERVICE_NAME,
-        ],
-        server=server,
-    )
-
-    server.add_insecure_port(f"0.0.0.0:{port}")
-    server.start()
-
-    logger.info("DocumentParser gRPC server (MarkItDown) started on port %d", port)
-    logger.info("gRPC reflection enabled")
-    
-    try:
-        server.wait_for_termination()
-    except KeyboardInterrupt:
-        logger.info("Shutting down server...")
-        server.stop(0)
-        logger.info("Server stopped")
-
-
-if __name__ == "__main__":
-    logging.basicConfig(
-        level=logging.INFO,
-        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
-    )
-    serve()
--- a/docker-compose.dev.yml
+++ b/docker-compose.dev.yml
@@ -1,33 +0,0 @@
-services:
-  # MySQL database
-  x-agent-mysql:
-    image: mysql:8.0
-    container_name: x-agents-mysql
-    environment:
-      MYSQL_ROOT_PASSWORD: root
-      MYSQL_DATABASE: x_agents
-    volumes:
-      - mysql-data:/var/lib/mysql
-    ports:
-      - "6036:3306"
-    healthcheck:
-      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
-      interval: 10s
-      timeout: 5s
-      retries: 5
-    restart: unless-stopped
-    command: --default-authentication-plugin=mysql_native_password
-
-  # Redis
-  x-agent-redis:
-    image: redis:7-alpine
-    container_name: x-agents-redis
-    ports:
-      - "6037:6379"
-    volumes:
-      - redis-data:/data
-    restart: unless-stopped
-
-volumes:
-  mysql-data:
-  redis-data:
--- a/start-local.ps1
+++ b/start-local.ps1
@@ -1,41 +0,0 @@
-# X-Agents local startup script (Go + frontend)
-# Usage: .\start-local.ps1
-
-$ErrorActionPreference = "Stop"
-
-Write-Host "======================================" -ForegroundColor Cyan
-Write-Host "  X-Agents local startup (Go + frontend)" -ForegroundColor Cyan
-Write-Host "======================================" -ForegroundColor Cyan
-
-# 1. Start the databases
-Write-Host "[Start] Databases..." -ForegroundColor Green
-docker compose -f docker-compose.dev.yml up -d
-
-# 2. Start the Go service
-Write-Host "[Start] Go API service..." -ForegroundColor Green
-Start-Process powershell -ArgumentList "-NoExit", "-Command", @"
-cd $PWD\server
-go run ./cmd/api
-"@ -WindowStyle Normal
-
-# 3. Start the frontend
-Write-Host "[Start] Frontend service..." -ForegroundColor Green
-if (Test-Path "web/package.json") {
-    Start-Process powershell -ArgumentList "-NoExit", "-Command", @"
-cd $PWD\web
-npm run dev
-"@ -WindowStyle Normal
-}
-
-Write-Host ""
-Write-Host "======================================" -ForegroundColor Green
-Write-Host "  Services started!" -ForegroundColor Green
-Write-Host "======================================" -ForegroundColor Green
-Write-Host ""
-Write-Host "Service addresses:" -ForegroundColor White
-Write-Host "  - Go API:  http://localhost:8080" -ForegroundColor Cyan
-Write-Host "  - Frontend: http://localhost:5173" -ForegroundColor Cyan
-Write-Host "  - MySQL:   localhost:6036" -ForegroundColor Cyan
-Write-Host "  - Redis:   localhost:6037" -ForegroundColor Cyan
-
-Read-Host | Out-Null
--- a/web/src/components/Sidebar.vue
+++ b/web/src/components/Sidebar.vue
@@ -1,7 +1,9 @@
 <script setup lang="ts">
-import { computed, ref } from 'vue'
+import { computed, ref, onMounted } from 'vue'
 import { useRouter, useRoute } from 'vue-router'
 import { ElMessageBox } from 'element-plus'
+import { fetchKnowledgeBases } from '@/views/knowledge/useKnowledge'
+import { useDatabase } from '@/views/database/useDatabase'

 // Dropdown menu open state
 const userDropdownVisible = ref(false)
@@ -9,6 +11,26 @@ const userDropdownVisible = ref(false)
 const router = useRouter()
 const route = useRoute()

+// Fetch the Knowledge count
+const knowledgeCount = ref(0)
+const fetchKnowledgeCount = async () => {
+  try {
+    const data = await fetchKnowledgeBases()
+    knowledgeCount.value = data?.length || 0
+  } catch (e) {
+    console.error('Failed to fetch knowledge count:', e)
+  }
+}
+
+// Fetch the Database count
+const { databases, fetchDatabases } = useDatabase()
+const databaseCount = computed(() => databases.value?.length || 0)
+
+onMounted(() => {
+  fetchKnowledgeCount()
+  fetchDatabases()
+})
+
 interface MenuItem {
  name: string
  icon: string
@@ -17,13 +39,13 @@ interface MenuItem {
  path?: string
 }

-const mainMenu: MenuItem[] = [
+const mainMenu = computed<MenuItem[]>(() => [
  { name: 'Dashboard', icon: 'fa-gauge', path: '/dashboard' },
  { name: 'Agents', icon: 'fa-robot', badge: 3, path: '/agents' },
-  { name: 'Team', icon: 'fa-users', path: '/team' },
-  { name: 'Database', icon: 'fa-database', path: '/database' },
-  { name: 'Knowledge', icon: 'fa-brain', path: '/knowledge' },
-]
+  { name: 'Script', icon: 'fa-code', path: '/script' },
+  { name: 'Database', icon: 'fa-database', path: '/database', badge: databaseCount.value },
+  { name: 'Knowledge', icon: 'fa-brain', path: '/knowledge', badge: knowledgeCount.value },
+])

 const middleMenu: MenuItem[] = [
  { name: 'Skills', icon: 'fa-wand-magic-sparkles', badge: 21, path: '/mcp' },
@@ -35,17 +57,23 @@ const bottomMenu: MenuItem[] = [
 ]

 const bottomMenu2: MenuItem[] = [
-  { name: 'Account', icon: 'fa-user' },
+  { name: 'Account', icon: 'fa-user', path: '/account' },
 ]

 const activeMenu = computed(() => {
  const currentPath = route.path
  // Check main menu
-  const menuItem = mainMenu.find(item => item.path === currentPath)
+  const menuItem = mainMenu.value.find(item => item.path === currentPath)
  if (menuItem) return menuItem.name
-  // Check bottom menu
+  // Check middle menu (Skills, Tools)
+  const middleItem = middleMenu.find(item => item.path === currentPath)
+  if (middleItem) return middleItem.name
+  // Check bottom menu (Settings)
  const bottomItem = bottomMenu.find(item => item.path === currentPath)
  if (bottomItem) return bottomItem.name
+  // Check bottomMenu2 (Account)
+  const bottomItem2 = bottomMenu2.find(item => item.path === currentPath)
+  if (bottomItem2) return bottomItem2.name
  return 'Dashboard'
 })

@@ -101,8 +129,8 @@ const handleUserCommand = (command: string) => {
    <!-- Navigation menu -->
    <nav class="flex-1 px-3 py-2">
      <ul class="space-y-1">
-        <!-- Dashboard active item -->
-        <li v-for="item in mainMenu" :key="item.name">
+        <!-- Dashboard, Agents -->
+        <li v-for="item in mainMenu.slice(0, 2)" :key="item.name">
          <a
            href="#"
            class="flex items-center justify-between px-3 py-2.5 rounded-lg transition-colors text-sm"
@@ -117,7 +145,26 @@ const handleUserCommand = (command: string) => {
          </a>
        </li>

-        <!-- Divider -->
+        <!-- Divider 1 -->
+        <li class="my-4 border-t border-dark-500"></li>
+
+        <!-- Script, Database, Knowledge -->
+        <li v-for="item in mainMenu.slice(2)" :key="item.name">
+          <a
+            href="#"
+            class="flex items-center justify-between px-3 py-2.5 rounded-lg transition-colors text-sm"
+            :class="activeMenu === item.name ? 'bg-dark-600 text-white' : 'text-gray-400 hover:bg-dark-600 hover:text-white'"
+            @click="navigateTo(item)"
+          >
+            <div class="flex items-center gap-3">
+              <i :class="['fa-solid', item.icon, 'w-5', 'text-center']"></i>
+              <span>{{ item.name }}</span>
+            </div>
+            <span v-if="item.badge" class="bg-dark-500 text-xs px-2 py-0.5 rounded-full">{{ item.badge }}</span>
+          </a>
+        </li>
+
+        <!-- Divider 2 -->
        <li class="my-4 border-t border-dark-500"></li>

        <!-- Skills & Tools -->
@@ -136,9 +183,6 @@ const handleUserCommand = (command: string) => {
          </a>
        </li>

-        <!-- Divider -->
-        <li class="my-4 border-t border-dark-500"></li>
-
        <!-- Settings -->
        <li v-for="item in bottomMenu" :key="item.name">
          <a
@@ -166,11 +210,14 @@ const handleUserCommand = (command: string) => {
        <li v-for="item in bottomMenu2" :key="item.name">
          <a
            href="#"
-            class="flex items-center gap-3 px-3 py-2.5 rounded-lg hover:bg-dark-600 text-gray-400 hover:text-white transition-colors text-sm"
+            class="flex items-center justify-between px-3 py-2.5 rounded-lg transition-colors text-sm"
+            :class="activeMenu === item.name ? 'bg-dark-600 text-white' : 'text-gray-400 hover:bg-dark-600 hover:text-white'"
            @click="navigateTo(item)"
          >
+            <div class="flex items-center gap-3">
              <i :class="['fa-solid', item.icon, 'w-5', 'text-center']"></i>
              <span>{{ item.name }}</span>
+            </div>
          </a>
        </li>
      </ul>
--- a/web/src/router/index.ts
+++ b/web/src/router/index.ts
@@ -6,8 +6,10 @@ import Team from '@/views/Team.vue'
 import Skill from '@/views/Skill.vue'
 import ModelAPIs from '@/views/ModelAPIs.vue'
 import Database from '@/views/Database.vue'
+import Script from '@/views/Script.vue'
 import Knowledge from '@/views/Knowledge.vue'
 import Settings from '@/views/Settings.vue'
+import Account from '@/views/Account.vue'

 const router = createRouter({
  history: createWebHistory(import.meta.env.BASE_URL),
@@ -47,6 +49,11 @@ const router = createRouter({
      name: 'database',
      component: Database
    },
+    {
+      path: '/script',
+      name: 'script',
+      component: Script
+    },
    {
      path: '/knowledge',
      name: 'knowledge',
@@ -56,6 +63,11 @@ const router = createRouter({
      path: '/settings',
      name: 'settings',
      component: Settings
+    },
+    {
+      path: '/account',
+      name: 'account',
+      component: Account
    }
  ]
 })
--- a/web/src/views/Account.vue
+++ b/web/src/views/Account.vue
@@ -0,0 +1,15 @@
+<script setup lang="ts">
+// Account page - placeholder
+</script>
+
+<template>
+  <div class="p-6 min-h-screen">
+    <div class="flex items-center gap-2 mb-6">
+      <i class="fa-solid fa-user text-gray-400"></i>
+      <span class="font-medium">Account</span>
+    </div>
+    <div class="text-gray-400">
+      Account management coming soon...
+    </div>
+  </div>
+</template>
--- a/web/src/views/Script.vue
+++ b/web/src/views/Script.vue
@@ -0,0 +1,15 @@
+<script setup lang="ts">
+// Script page - placeholder
+</script>
+
+<template>
+  <div class="p-6 min-h-screen">
+    <div class="flex items-center gap-2 mb-6">
+      <i class="fa-solid fa-code text-gray-400"></i>
+      <span class="font-medium">Script</span>
+    </div>
+    <div class="text-gray-400">
+      Script management coming soon...
+    </div>
+  </div>
+</template>