# AI-Core Document Parsing Service: API Integration Guide

## Service Address

```
localhost:50051
```

## gRPC API Definition

### 1. ParseDocument - Parse a Document

**Request (ParseRequest)**
```protobuf
message ParseRequest {
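  string file_url = 1;      // File URL (required)
  string file_name = 2;     // File name, including the extension (required)
  string file_type = 3;     // File type (optional; auto-detected when omitted)
  map<string, string> engine_overrides = 4;  // Parser engine configuration overrides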
}
```

**Response (ParseResponse)**
```protobuf
message ParseResponse {
  bool success = 1;           // Whether parsing succeeded
  string content = 2;         // Parsed content as Markdown
  string message = 3;         // Status message
  int32 content_length = 4;   // Length of the content
  string file_type = 5;       // File type
  string parser_engine = 6;   // Parser engine used (markitdown)
}
```

### 2. GetSupportedFormats - List Supported Formats

**Request**: empty message

**Response**
- `file_types`: string[] - list of supported file extensions
- `file_type_descriptions`: map<string, string> - description of each format

---

## Go Integration Example

### 1. Install Dependencies

```bash
go get google.golang.org/grpc
go get google.golang.org/protobuf
```

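### 2. Generate Go Code from the Proto File

First generate Go code from `proto/document_parser.proto`. This requires the `protoc` compiler, and the `.proto` file must declare a `go_package` option (or have one supplied through a `--go_opt=M...` mapping):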

```bash
# Install the protoc plugins for Go and gRPC, then generate the code
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
protoc --go_out=. --go_opt=paths=source_relative \
       --go-grpc_out=. --go-grpc_opt=paths=source_relative \
       proto/document_parser.proto
```

### 3. Complete Client Code

```go
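package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"

    pb "your-project/proto" // replace with the import path of your generated proto package
)

func main() {
    // Create a client for the gRPC service. The connection is plaintext (no TLS),
    // which is suitable for local development only.
    // grpc.NewClient requires grpc-go v1.63 or later and connects lazily on the
    // first RPC; it replaces the deprecated grpc.Dial + grpc.WithBlock combination.
    conn, err := grpc.NewClient(
        "localhost:50051",
        grpc.WithTransportCredentials(insecure.NewCredentials()),
    )
    if err != nil {
        log.Fatalf("failed to create client: %v", err)
    }
    defer conn.Close()

    client := pb.NewDocumentParserClient(conn)

    // Bound the call so an unreachable server or a slow download cannot hang forever.
    // The 60-second value is illustrative; tune it to your documents.
    ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
    defer cancel()

    // Call ParseDocument
    req := &pb.ParseRequest{
        FileUrl:  "https://example.com/document.pdf",
        FileName: "document.pdf",
    }

    resp, err := client.ParseDocument(ctx, req)
    if err != nil {
        // Transport-level or server-side RPC error
        log.Fatalf("ParseDocument call failed: %v", err)
    }

    // Handle the response: the RPC can succeed while parsing itself fails
    if resp.GetSuccess() {
        fmt.Println("Parsed successfully!")
        fmt.Printf("Content length: %d characters\n", resp.GetContentLength())
        fmt.Printf("Markdown content:\n%s\n", resp.GetContent())
    } else {
        fmt.Printf("Parsing failed: %s\n", resp.GetMessage())
    }
}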
```
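
A call can fail in two ways: the RPC itself returns an error, or it completes with `success` set to false and a reason in `message`. The sketch below shows one possible way to fold the second case into an ordinary Go `error`, assuming that is how you want to surface it. The `parseResult` interface and the `contentOrError` helper are hypothetical names, not part of the service; the getter methods are the ones `protoc-gen-go` generates for the `ParseResponse` fields listed earlier. `fakeResponse` stands in for the generated type so the sketch runs without the generated package.

```go
package main

import (
	"errors"
	"fmt"
)

// parseResult lists the generated getters this sketch relies on.
// A *pb.ParseResponse from the generated code should satisfy it.
type parseResult interface {
	GetSuccess() bool
	GetMessage() string
	GetContent() string
}

// contentOrError returns the Markdown content, or an error carrying the
// service's status message when the response reports success=false.
func contentOrError(r parseResult) (string, error) {
	if !r.GetSuccess() {
		return "", errors.New("parse failed: " + r.GetMessage())
	}
	return r.GetContent(), nil
}

// fakeResponse is a hypothetical stand-in for the generated response type.
type fakeResponse struct {
	success bool
	content string
	message string
}

func (f fakeResponse) GetSuccess() bool   { return f.success }
func (f fakeResponse) GetMessage() string { return f.message }
func (f fakeResponse) GetContent() string { return f.content }

func main() {
	responses := []fakeResponse{
		{success: true, content: "# Title"},
		{success: false, message: "unsupported file type"},
	}
	for _, r := range responses {
		content, err := contentOrError(r)
		fmt.Printf("content=%q err=%v\n", content, err)
	}
}
```

With a helper like this, callers check a single `err` value after the RPC error check instead of branching on `success` at every call site.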

### 4. List Supported Formats

```go
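// Fetch the supported file formats (reuses client and ctx from the example above).
// pb.Empty is the empty request message; its actual name depends on the .proto definition.
formatsReq := &pb.Empty{}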
formatsResp, err := client.GetSupportedFormats(ctx, formatsReq)
if err != nil {
    log.Fatal(err)
}

fmt.Println("Supported formats:")
for _, ft := range formatsResp.FileTypes {
    desc := formatsResp.FileTypeDescriptions[ft]
    fmt.Printf("  - %s: %s\n", ft, desc)
}
```
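
The returned `file_types` list can also be used to reject unsupported files before calling `ParseDocument`. The following is a minimal sketch of such a check. `isSupported` and `normalizeExt` are hypothetical helpers, and because this document does not say whether the service returns extensions with or without a leading dot, the sketch normalizes both forms. The sample list in `main` is made up for illustration and is not real service output.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// normalizeExt lowercases an extension and strips surrounding spaces
// and a leading dot, so ".PDF" and "pdf" compare equal.
func normalizeExt(ext string) string {
	return strings.ToLower(strings.TrimPrefix(strings.TrimSpace(ext), "."))
}

// isSupported reports whether fileName's extension appears in fileTypes.
// A file name without an extension is never considered supported.
func isSupported(fileTypes []string, fileName string) bool {
	ext := normalizeExt(filepath.Ext(fileName))
	if ext == "" {
		return false
	}
	for _, ft := range fileTypes {
		if normalizeExt(ft) == ext {
			return true
		}
	}
	return false
}

func main() {
	// Sample values only; in practice pass formatsResp.FileTypes.
	fileTypes := []string{".pdf", "docx"}

	for _, name := range []string{"report.PDF", "notes.docx", "archive.zip", "README"} {
		fmt.Printf("%s supported: %v\n", name, isSupported(fileTypes, name))
	}
}
```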

---

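## Notes

1. **File URL**: Must be a directly accessible URL; the service downloads the file into memory and parses it there.
2. **File name**: Must include an extension (such as `.pdf` or `.docx`), which is used to detect the file type automatically.
3. **Returned content**: The response contains Markdown text directly, ready for vector retrieval or LLM processing.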
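
The first two notes can be checked on the client side before a request is sent. Below is a minimal sketch of such a pre-flight check. `validateParseInput` is a hypothetical helper, and restricting the URL to `http` and `https` is an assumption about what "directly accessible" means here; the service's own validation rules are not described in this document and may differ.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"path/filepath"
	"strings"
)

// validateParseInput checks the two required ParseRequest fields locally.
// Assumption: only absolute http(s) URLs count as directly accessible.
func validateParseInput(fileURL, fileName string) error {
	u, err := url.Parse(fileURL)
	if err != nil {
		return fmt.Errorf("invalid file URL: %w", err)
	}
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return errors.New("file URL must be an absolute http(s) URL")
	}

	// filepath.Ext returns "" when there is no dot and "." for a trailing dot.
	ext := filepath.Ext(strings.TrimSpace(fileName))
	if ext == "" || ext == "." {
		return errors.New("file name must include an extension such as .pdf")
	}
	return nil
}

func main() {
	inputs := [][2]string{
		{"https://example.com/document.pdf", "document.pdf"},
		{"document.pdf", "document.pdf"},
		{"https://example.com/document", "document"},
	}
	for _, in := range inputs {
		err := validateParseInput(in[0], in[1])
		fmt.Printf("url=%q name=%q err=%v\n", in[0], in[1], err)
	}
}
```

A check like this only catches malformed input early; whether the URL is actually reachable from the service can still only be determined by the call itself.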