first-update

2026-03-17 14:36:31 +08:00
parent 72f08aee7c
commit 4eddf05e79
516 changed files with 115270 additions and 1 deletions


@@ -0,0 +1,16 @@
node_modules
.next
.git
.github
README.md
README.zh-CN.md
.gitignore
.env.local
.env.development.local
.env.test.local
.env.production.local
/test
/local-db
/video
/prisma/*.sqlite
/prisma/*.sqlite-*

easy-dataset-main/.gitattributes

@@ -0,0 +1,6 @@
# Ensure shell scripts always use LF line endings
*.sh text eol=lf
docker-entrypoint.sh text eol=lf
# Ensure Dockerfile uses LF
Dockerfile text eol=lf


@@ -0,0 +1,40 @@
---
name: Bug report
about: Create a report to help us improve
title: '[Bug]'
labels: bug
assignees: ''
---
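**Note: Please fill in the issue according to this template; otherwise the issue will not receive a reply.**
**Problem description**
A clear and concise description of the problem.
**Desktop (please complete the following information)**
- OS: [e.g., Windows, macOS]
- Browser: [e.g., Chrome, Safari]
- Easy Dataset version: [e.g., 1.2.2]
**Model used**
- Model provider: e.g., Volcano Engine
- Model name: e.g., DeepSeek R1
**Steps to reproduce**
Steps to reproduce the problem:
1. Go to the "..." page.
2. Click "...".
3. Scroll down to "...".
4. See the error message.
**Expected result**
A clear and concise description of what you expected to happen.
**Screenshots**
If necessary, attach screenshots to help explain your problem.
**Additional context**
Add any other relevant background information about the problem here.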


@@ -0,0 +1,19 @@
---
name: 'Feature or enhancement '
about: Suggest an idea for this project
title: '[Feature]'
labels: enhancement
assignees: ''
---
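**Is your feature request related to a problem? Please describe.**
A clear and concise description of the problem. For example: I am always frustrated when [specific situation].
**Describe the solution you would like**
A clear and concise description of what you want to happen.
**Describe alternatives you have considered**
A clear and concise description of any other solutions or features you have considered.
**Additional context**
Add any other background information or screenshots related to the feature request here.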


@@ -0,0 +1,40 @@
---
name: Question
about: Ask questions you want to know
title: '[Question]'
labels: question
assignees: ''
---
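**Note: Please fill in the issue according to this template; otherwise the issue will not receive a reply.**
**Problem description**
A clear and concise description of the problem.
**Desktop (please complete the following information)**
- OS: [e.g., Windows, macOS]
- Browser: [e.g., Chrome, Safari]
- Easy Dataset version: [e.g., 1.2.2]
**Model used**
- Model provider: e.g., Volcano Engine
- Model name: e.g., DeepSeek R1
**Steps to reproduce**
Steps to reproduce the problem:
1. Go to the "..." page.
2. Click "...".
3. Scroll down to "...".
4. See the error message.
**Expected result**
A clear and concise description of what you expected to happen.
**Screenshots**
If necessary, attach screenshots to help explain your problem.
**Additional context**
Add any other relevant background information about the problem here.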


@@ -0,0 +1,12 @@
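### Change type
- [ ] New feature (feat)
- [ ] Bug fix (fix)
- [ ] Documentation (docs)
- [ ] Refactor (refactor)
### Change description
- Briefly describe the changes (related issue: #123)
### Documentation updates
- [ ] README.md
- [ ] Contributing guide
- [ ] API documentation (if any)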


@@ -0,0 +1,48 @@
name: Build and Push Docker image on Tag
on:
  push:
    tags:
      - '*'
jobs:
  docker-image-release:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata for Docker
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository_owner }}/easy-dataset
          tags: |
            type=ref,event=tag
            type=raw,value=latest,enable={{is_default_branch}}
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          platforms: linux/amd64,linux/arm64
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

easy-dataset-main/.gitignore

@@ -0,0 +1,22 @@
node_modules
build
.vscode
website-local.json
ai-local.json
.next
.DS_Store
tsconfig.tsbuildinfo
mock-login-callback.ts
.env.local
/src/test/crawler
/src/test/mock
/test
/dist
/prisma/*.sqlite
.idea
!local-db/empty.txt
/local-db
prisma/local-db/db.sqlite
/local-db2
.trae
opencode.json


@@ -0,0 +1,3 @@
#!/usr/bin/env sh
npx commitlint --edit "$1"


@@ -0,0 +1 @@
npx lint-staged

easy-dataset-main/.npmrc

@@ -0,0 +1,3 @@
# Users in mainland China can use the npmmirror (Taobao) registry for faster downloads
# registry=https://registry.npmmirror.com
registry=https://registry.npmjs.org


@@ -0,0 +1,13 @@
module.exports = {
  semi: true,
  trailingComma: 'none',
  singleQuote: true,
  tabWidth: 2,
  useTabs: false,
  bracketSpacing: true,
  arrowParens: 'avoid',
  proseWrap: 'preserve',
  jsxBracketSameLine: true,
  printWidth: 120,
  endOfLine: 'auto'
};


@@ -0,0 +1,124 @@
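# Easy DataSet Project Architecture Design
## Project Overview
Easy DataSet is an application for creating fine-tuning datasets for large language models. Users upload text files; the system automatically splits the text, generates questions, and finally produces a dataset for fine-tuning.
## Tech Stack
- **Frontend framework**: Next.js 14 (App Router)
- **UI framework**: Material-UI (MUI)
- **Data storage**: the file system (fs), used as a simulated database
- **Language**: JavaScript
- **Dependency management**: pnpm
## Directory Structure
```
easy-dataset/
├── app/ # Next.js app directory
│ ├── api/ # API routes
│ │ └── projects/ # Project-related APIs
│ ├── projects/ # Project-related pages
│ │ ├── [projectId]/ # Project detail pages
│ └── page.js # Home page
├── components/ # React components
│ ├── home/ # Home page components
│ │ ├── HeroSection.js
│ │ ├── ProjectList.js
│ │ └── StatsCard.js
│ ├── Navbar.js # Navigation bar component
│ └── CreateProjectDialog.js
├── lib/ # Utility library
│ └── db/ # Database module
│ ├── base.js # Basic utility functions
│ ├── projects.js # Project management
│ ├── texts.js # Text processing
│ ├── datasets.js # Dataset management
│ └── index.js # Module exports
├── styles/ # Style files
│ └── home.js # Home page styles
└── local-db/ # Local database directory
```
## Core Module Design
### 1. Database Module (`lib/db/`)
#### base.js
- Provides basic file operations
- Ensures the database directory exists
- Utility functions for reading and writing JSON files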
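The three responsibilities listed for `base.js` can be sketched with Node's built-in `fs` module. This is a minimal illustration, not the project's actual code: the helper names `ensureDir`, `readJson` and `writeJson`, and the `config.json` file name, are assumptions.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create the directory (and any missing parents) if it does not exist yet.
function ensureDir(dirPath) {
  fs.mkdirSync(dirPath, { recursive: true });
  return dirPath;
}

// Read a JSON file; return `fallback` when the file is missing.
function readJson(filePath, fallback = null) {
  if (!fs.existsSync(filePath)) return fallback;
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

// Write a value as pretty-printed JSON, creating the parent directory first.
function writeJson(filePath, data) {
  ensureDir(path.dirname(filePath));
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2), 'utf8');
  return data;
}

// Usage: store a hypothetical project config in a throwaway directory.
const dbRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'local-db-'));
const configPath = path.join(dbRoot, 'demo-project', 'config.json');
writeJson(configPath, { name: 'demo', description: 'example project' });
console.log(readJson(configPath).name); // prints "demo"
```

Synchronous calls keep the sketch short; a server handling concurrent requests would more likely use `fs.promises`.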
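#### projects.js
- CRUD operations for projects
- Project configuration management
- Maintenance of the project directory structure
#### texts.js
- Document processing
- Storage and retrieval of text chunks
- File upload handling
#### datasets.js
- Dataset generation and management
- Question list management
- Tag tree management
### 2. Frontend Components (`components/`)
#### Navbar.js
- Top navigation bar
- Project switching
- Model selection
- Theme switching
#### Components in home/
- HeroSection.js: top showcase area of the home page
- ProjectList.js: project list display
- StatsCard.js: statistics display
- CreateProjectDialog.js: dialog for creating a project
### 3. Page Routes (`app/`)
#### Home page (`page.js`)
- Project list display
- Entry point for creating a project
- Statistics display
#### Project detail pages (`projects/[projectId]/`)
- text-split/: document processing page
- questions/: question list page
- datasets/: dataset page
- settings/: project settings page
#### API routes (`api/`)
- projects/: project management API
- texts/: text processing API
- questions/: question generation API
- datasets/: dataset management API
## Data Flow Design
### Project creation flow
1. The user creates a new project from the home page or the navigation bar
2. The user fills in basic project information (name, description)
3. The system creates the project directory and the initial configuration file
4. The user is redirected to the project detail page
### Document processing flow
1. The user uploads a Markdown file
2. The system saves the original file to the project directory
3. The text splitting service is called to produce chunks and a table of contents
4. The split result and the extracted table of contents are displayed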
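Step 3 of the document processing flow (splitting into chunks and extracting a table of contents) can be sketched as a heading-based splitter. This is a simplified illustration under stated assumptions: the function name `splitMarkdown` and the chunk shape are hypothetical, and the project's real splitter may use different rules. Heading-like lines inside fenced code blocks are not handled.

```javascript
// Split Markdown into chunks at heading lines and collect a table of contents.
function splitMarkdown(markdown) {
  const chunks = [];
  const toc = [];
  let current = { title: '', level: 0, lines: [] };

  // Store the chunk collected so far, skipping empty ones.
  const flush = () => {
    const content = current.lines.join('\n').trim();
    if (content) chunks.push({ title: current.title, level: current.level, content });
  };

  for (const line of markdown.split('\n')) {
    const match = /^(#{1,6})\s+(.+?)\s*$/.exec(line);
    if (match) {
      flush();
      current = { title: match[2], level: match[1].length, lines: [line] };
      toc.push({ level: match[1].length, title: match[2] });
    } else {
      current.lines.push(line);
    }
  }
  flush();
  return { chunks, toc };
}

const sample = '# Guide\nIntro text.\n## Setup\nInstall it.\n## Usage\nRun it.';
const { chunks, toc } = splitMarkdown(sample);
console.log(toc.map(entry => '#'.repeat(entry.level) + ' ' + entry.title));
console.log(chunks.length);
```

Each chunk keeps its heading line, so a chunk remains understandable when it is later sent to a model on its own.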
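### Question generation flow
1. The user selects the text chunks to generate questions for
2. The system calls the LLM API to generate questions
3. The questions are saved to the question list and the tag tree
### Dataset generation flow
1. The user selects the questions to generate answers for
2. The system calls the LLM API to generate answers
3. The dataset results are saved
4. An export function is provided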

easy-dataset-main/AGENTS.md

@@ -0,0 +1,254 @@
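# Easy Dataset Agent Guide
## Project Overview
Easy Dataset is an application designed for creating fine-tuning datasets for large language models (LLMs). It provides a complete workflow, from document processing to dataset export, and supports multiple file formats and AI models.
## Tech Stack
- **Frontend**: Next.js 14 (App Router), React 18, Material-UI v5
- **Backend**: Node.js, Prisma ORM, SQLite
- **AI integration**: OpenAI API, Ollama, Zhipu AI, OpenRouter
- **Desktop app**: Electron
- **Internationalization**: i18next
- **Build tools**: npm/pnpm, Electron Builder
## Core Architecture
### 1. Data Flow Architecture
```
Document upload → Text splitting → Question generation → Answer generation → Dataset export
       ↓                ↓                  ↓                    ↓                   ↓
File processing   Smart chunking     LLM generation       LLM generation    Format conversion
```
### 2. Module Structure
```
lib/
├── api/ # API layer
├── db/ # Data access layer
├── file/ # File processing module
├── llm/ # AI model integration
├── services/ # Business logic layer
└── util/ # Utility functions
```
## Development Guide
### Environment Setup
```bash
# Install dependencies
npm install
# Initialize the database
npm run db:push
# Development mode
npm run dev
# Build
npm run build
```
### Code Conventions
- Use ES6+ syntax
- Develop in modules
- Use async/await for asynchronous operations
- Use try/catch for error handling
- Write comments in JSDoc format
### Important File Paths
- **Main entry**: `app/page.js`
- **Project routes**: `app/projects/[projectId]/`
- **API routes**: `app/api/`
- **LLM core**: `lib/llm/core/index.js`
- **Task processing**: `lib/services/tasks/`
## Feature Modules in Detail
### 1. Document Processing Module (`lib/file/`)
- **Supported formats**: PDF, Markdown, DOCX, EPUB, TXT
- **Core features**:
  - Intelligent text splitting
  - Table of contents extraction
  - Chunking by custom separators
  - Multi-language support
### 2. AI Model Integration (`lib/llm/`)
- **Supported providers**:
  - OpenAI (GPT series)
  - Ollama (local models)
  - Zhipu AI (GLM series)
  - OpenRouter (multi-model aggregation)
- **Features**:
  - Unified API interface
  - Streaming output support
  - Multi-language prompts
  - Error retry mechanism
### 3. Task System (`lib/services/tasks/`)
- **Task types**:
  - File processing tasks
  - Question generation tasks
  - Answer generation tasks
  - Data cleaning tasks
- **Status management**: pending, processing, completed, failed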
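The four task states listed above can be modeled as a small state machine that rejects invalid transitions. This is a sketch only: the transition table, the `transition` helper and the task fields are assumptions, not the project's actual task model.

```javascript
// Allowed transitions between the four task states named above.
const TASK_TRANSITIONS = {
  pending: ['processing', 'failed'],
  processing: ['completed', 'failed'],
  completed: [],
  failed: ['pending'] // assumption: a failed task may be queued again
};

// Return a copy of the task in the new state, or throw if the move is not allowed.
function transition(task, nextStatus) {
  const allowed = TASK_TRANSITIONS[task.status] || [];
  if (!allowed.includes(nextStatus)) {
    throw new Error(`Invalid transition: ${task.status} -> ${nextStatus}`);
  }
  return { ...task, status: nextStatus, updatedAt: Date.now() };
}

// Usage with a hypothetical question-generation task.
let task = { id: 't1', type: 'question-generation', status: 'pending' };
task = transition(task, 'processing');
task = transition(task, 'completed');
console.log(task.status); // prints "completed"
```

Keeping the table in one place makes it easy to see which states are terminal and to reject accidental updates, such as restarting a completed task.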
### 4. Data Management (`lib/db/`)
- **Data models**:
  - Project
  - Text/Chunk (text chunks)
  - Question
  - Dataset
  - Tag
## Common Development Tasks
### Adding a new AI model provider
1. Create a new provider file in `lib/llm/core/providers/`
2. Implement the base interface (generate, streamGenerate)
3. Register the provider in `lib/llm/core/index.js`
4. Update the configuration files and the UI
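Steps 2 and 3 above (implement `generate` and `streamGenerate`, then register the provider) might look like the following sketch. The registry functions `registerProvider` and `getProvider` and the `echo` provider are hypothetical; the real registration code in `lib/llm/core/index.js` may be organized differently.

```javascript
const providers = new Map();

// Register a provider after checking that it implements the base interface.
function registerProvider(name, provider) {
  for (const method of ['generate', 'streamGenerate']) {
    if (typeof provider[method] !== 'function') {
      throw new TypeError(`Provider "${name}" must implement ${method}()`);
    }
  }
  providers.set(name, provider);
}

// Look up a registered provider by name.
function getProvider(name) {
  const provider = providers.get(name);
  if (!provider) throw new Error(`Unknown provider: ${name}`);
  return provider;
}

// Hypothetical provider that echoes the prompt instead of calling a real API.
registerProvider('echo', {
  async generate(prompt) {
    return `echo: ${prompt}`;
  },
  async *streamGenerate(prompt) {
    for (const word of prompt.split(' ')) yield word;
  }
});

getProvider('echo').generate('hello').then(console.log);
```

Validating the interface at registration time surfaces a missing method when the application starts, rather than on the first request.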
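### Adding support for a new file format
1. Create a format handler in `lib/file/file-process/`
2. Implement content extraction and text conversion logic
3. Update file type detection and validation
4. Add the corresponding UI components
### Customizing prompt templates
1. Create a new prompt file in `lib/llm/prompts/`
2. Use i18n to support multiple languages
3. Add configuration options to the settings page
4. Test the results with different models
### Adding a new export format
1. Create a new export component in `components/export/`
2. Implement the data format conversion logic
3. Update the export dialog UI
4. Add format validation and error handling
## Debugging Tips
### 1. Database debugging
```bash
# Open Prisma Studio
npm run db:studio
# Inspect the database file
sqlite3 prisma/db.sqlite
```
### 2. LLM API debugging
```javascript
// Add logging in lib/llm/core/index.js
console.log('LLM Request:', { provider, model, prompt });
console.log('LLM Response:', response);
```
### 3. File processing debugging
```javascript
// Add debug output in lib/file/
console.log('File processing:', fileName, fileType);
console.log('Text chunks:', chunks.length, chunks[0]);
```
## Performance Optimization Suggestions
### 1. File processing optimization
- Process large files in slices
- Asynchronous concurrent processing
- Memory usage monitoring
- Progress bar display
### 2. LLM call optimization
- Request caching
- Batch processing of requests
- Retry strategy tuning
- Concurrency limits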
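Batch processing and concurrency limits can be combined by grouping requests into fixed-size batches and sending one batch at a time. The helper below only shows the grouping step; `toBatches` is a hypothetical name, and how each batch is dispatched (for example with `Promise.all`) is left to the caller.

```javascript
// Group items into consecutive batches of at most `batchSize` elements.
function toBatches(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Usage: five hypothetical chunk IDs, at most two requests in flight at once.
const chunkIds = ['c1', 'c2', 'c3', 'c4', 'c5'];
console.log(toBatches(chunkIds, 2));
```

With a batch size of two, at most two requests are in flight at any moment, which gives a simple upper bound on concurrent API calls.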
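### 3. Frontend performance optimization
- Lazy-loaded components
- Virtualized scrolling lists
- Lazy-loaded images
- Code splitting
## Troubleshooting Common Problems
### 1. Database problems
- **Problem**: The database connection fails
- **Solution**: Check the Prisma configuration and make sure the database file exists
### 2. LLM API problems
- **Problem**: API calls time out
- **Solution**: Adjust the timeout, check the network connection, and add retries
### 3. File processing problems
- **Problem**: Out-of-memory errors when processing large files
- **Solution**: Use streaming, read in chunks, and raise the memory limit
### 4. Electron packaging problems
- **Problem**: The packaged application fails to start
- **Solution**: Check the dependency configuration and make sure native modules are packaged correctly
## Deployment Guide
### Docker deployment
```bash
# Build the image
docker build -t easy-dataset .
# Run the container
docker run -d -p 1717:1717 -v ./local-db:/app/local-db easy-dataset
```
### Building the desktop app
```bash
# Build installers for each platform
npm run electron-build-mac # macOS
npm run electron-build-win # Windows
npm run electron-build-linux # Linux
```
## Contribution Guide
### Commit conventions
- Use the Conventional Commits format
- Run lint checks before committing
- Update the related documentation
- Add test cases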
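A Conventional Commits header has the shape `type(scope): subject`. The check below is a rough sketch of what a commit linter verifies; the list of allowed types mirrors commonly used defaults and is an assumption, and the project's own commitlint configuration remains authoritative.

```javascript
// type, optional (scope), optional "!" for breaking changes, then ": subject"
const COMMIT_HEADER =
  /^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([\w./-]+\))?(!)?: (.+)$/;

// Parse a commit header; return null when it does not follow the convention.
function parseCommitHeader(header) {
  const match = COMMIT_HEADER.exec(header);
  if (!match) return null;
  return {
    type: match[1],
    scope: match[2] ? match[2].slice(1, -1) : null,
    breaking: Boolean(match[3]),
    subject: match[4]
  };
}

console.log(parseCommitHeader('feat(export): add ShareGPT format'));
console.log(parseCommitHeader('update stuff')); // prints null
```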
### Branch strategy
- `main`: main branch, stable releases
- `dev`: development branch, integrates new features
- `feature/*`: feature branches
- `fix/*`: bug-fix branches
---


@@ -0,0 +1,183 @@
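# Easy DataSet Project Architecture Design
## Project Overview
Easy DataSet is an application for creating fine-tuning datasets for large language models. Users upload text files; the system automatically splits the text, generates questions, and finally produces a dataset for fine-tuning.
## Tech Stack
- **Frontend framework**: Next.js 14 (App Router)
- **UI framework**: Material-UI (MUI)
- **Data storage**: the file system (fs), used as a simulated database
- **Language**: JavaScript
## Directory Structure
```
easy-dataset/
├── app/ # Next.js app directory
│ ├── api/ # API routes
│ │ └── projects/ # Project-related APIs
│ ├── projects/ # Project-related pages
│ │ ├── [projectId]/ # Project detail pages
│ └── page.js # Home page
├── components/ # React components
│ ├── home/ # Home page components
│ │ ├── HeroSection.js
│ │ ├── ProjectList.js
│ │ └── StatsCard.js
│ ├── Navbar.js # Navigation bar component
│ └── CreateProjectDialog.js
├── lib/ # Utility library
│ └── db/ # Database module
│ ├── base.js # Basic utility functions
│ ├── projects.js # Project management
│ ├── texts.js # Text processing
│ ├── datasets.js # Dataset management
│ └── index.js # Module exports
├── styles/ # Style files
│ └── home.js # Home page styles
└── local-db/ # Local database directory
```
## Core Module Design
### 1. Database Module (`lib/db/`)
#### base.js
- Provides basic file operations
- Ensures the database directory exists
- Utility functions for reading and writing JSON files
#### projects.js
- CRUD operations for projects
- Project configuration management
- Maintenance of the project directory structure
#### texts.js
- Document processing
- Storage and retrieval of text chunks
- File upload handling
#### datasets.js
- Dataset generation and management
- Question list management
- Tag tree management
### 2. Frontend Components (`components/`)
#### Navbar.js
- Top navigation bar
- Project switching
- Model selection
- Theme switching
#### Components in home/
- HeroSection.js: top showcase area of the home page
- ProjectList.js: project list display
- StatsCard.js: statistics display
- CreateProjectDialog.js: dialog for creating a project
### 3. Page Routes (`app/`)
#### Home page (`page.js`)
- Project list display
- Entry point for creating a project
- Statistics display
#### Project detail pages (`projects/[projectId]/`)
- text-split/: document processing page
- questions/: question list page
- datasets/: dataset page
- settings/: project settings page
#### API routes (`api/`)
- projects/: project management API
- texts/: text processing API
- questions/: question generation API
- datasets/: dataset management API
## Data Flow Design
### Project creation flow
1. The user creates a new project from the home page or the navigation bar
2. The user fills in basic project information (name, description)
3. The system creates the project directory and the initial configuration file
4. The user is redirected to the project detail page
### Document processing flow
1. The user uploads a Markdown file
2. The system saves the original file to the project directory
3. The text splitting service is called to produce chunks and a table of contents
4. The split result and the extracted table of contents are displayed
### Question generation flow
1. The user selects the text chunks to generate questions for
2. The system calls the LLM API to generate questions
3. The questions are saved to the question list and the tag tree
### Dataset generation flow
1. The user selects the questions to generate answers for
2. The system calls the LLM API to generate answers
3. The dataset results are saved
4. An export function is provided
## Model Configuration
Multiple LLM providers can be configured:
- Ollama
- OpenAI
- SiliconFlow
- DeepSeek
- Zhipu AI
Each provider supports the following settings:
- API endpoint
- API key
- Model name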
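The three per-provider settings can be checked before a configuration is saved. The sketch below is illustrative: the field names `endpoint`, `apiKey` and `modelName`, and the rule that a local Ollama setup needs no key, are assumptions rather than the project's actual schema.

```javascript
// Return a list of problems found in a provider configuration (empty when valid).
function validateModelConfig(config) {
  const errors = [];
  try {
    const url = new URL(config.endpoint);
    if (!['http:', 'https:'].includes(url.protocol)) {
      errors.push('endpoint must use http or https');
    }
  } catch {
    errors.push('endpoint must be a valid URL');
  }
  if (typeof config.modelName !== 'string' || config.modelName.trim() === '') {
    errors.push('modelName is required');
  }
  // Assumption: a locally hosted Ollama server does not require an API key.
  if (config.provider !== 'ollama' && !config.apiKey) {
    errors.push('apiKey is required');
  }
  return errors;
}

// Usage with illustrative values.
console.log(validateModelConfig({
  provider: 'ollama',
  endpoint: 'http://localhost:11434',
  modelName: 'example-model'
}));
console.log(validateModelConfig({ provider: 'openai', endpoint: 'not a url', modelName: '' }));
```

Returning a list of messages instead of throwing lets a settings form display every problem at once.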
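## Future Directions
1. Support more file formats (PDF, DOC, etc.)
2. Add dataset quality evaluation
3. Add dataset version management
4. Implement team collaboration
5. Add more dataset export formats
## Internationalization
### Technology choices
- **i18n library**: i18next + react-i18next
- **Language detection**: i18next-browser-languagedetector
- **Supported languages**: English (en), Simplified Chinese (zh-CN)
### Directory structure
```
easy-dataset/
├── locales/ # i18n resource directory
│ ├── en/ # English translations
│ │ └── translation.json
│ ├── zh-CN/ # Chinese translations
│ │ └── translation.json
│ └── pt-BR/ # Portuguese (Brazil) translations
│ └── translation.json
├── lib/
│ └── i18n.js # i18next configuration
```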
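Language detection ultimately has to map a detected locale onto one of the folders under `locales/`. The sketch below shows one simple fallback rule in plain JavaScript; it is not i18next's own resolution algorithm, and `resolveLanguage` is a hypothetical helper.

```javascript
// Locale folders shown in the tree above.
const SUPPORTED = ['en', 'zh-CN', 'pt-BR'];

// Pick an exact match first, then any locale with the same base language, then the fallback.
function resolveLanguage(requested, supported = SUPPORTED, fallback = 'en') {
  if (!requested) return fallback;
  const wanted = requested.toLowerCase();
  const exact = supported.find(lang => lang.toLowerCase() === wanted);
  if (exact) return exact;
  const base = wanted.split('-')[0];
  const sameBase = supported.find(lang => lang.toLowerCase().split('-')[0] === base);
  return sameBase || fallback;
}

console.log(resolveLanguage('zh')); // prints "zh-CN"
console.log(resolveLanguage('pt-PT')); // prints "pt-BR"
console.log(resolveLanguage('fr')); // prints "en"
```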


@@ -0,0 +1,86 @@
# Base image that includes pnpm
FROM node:20-alpine AS pnpm-base
RUN npm install -g pnpm@9
# Build stage
FROM pnpm-base AS builder
WORKDIR /app
# Build argument used to identify the target platform
ARG TARGETPLATFORM
# Install build dependencies
RUN apk add --no-cache --virtual .build-deps \
    python3 \
    make \
    g++ \
    cairo-dev \
    pango-dev \
    jpeg-dev \
    giflib-dev \
    librsvg-dev \
    build-base \
    pixman-dev \
    pkgconfig
# Copy the dependency manifests and npm config, then install (a registry mirror can be configured in .npmrc)
COPY package.json pnpm-lock.yaml .npmrc ./
RUN pnpm install
# Copy the source code
COPY . .
# Set the Prisma binary target for the target platform and build the app
RUN if [ "$TARGETPLATFORM" = "linux/arm64" ]; then \
        echo "Configuring for ARM64 platform"; \
        sed -i 's/binaryTargets = \[.*\]/binaryTargets = \["linux-musl-arm64-openssl-3.0.x"\]/' prisma/schema.prisma; \
        PRISMA_CLI_BINARY_TARGETS="linux-musl-arm64-openssl-3.0.x" pnpm build; \
    else \
        echo "Configuring for AMD64 platform (default)"; \
        sed -i 's/binaryTargets = \[.*\]/binaryTargets = \["linux-musl-openssl-3.0.x"\]/' prisma/schema.prisma; \
        PRISMA_CLI_BINARY_TARGETS="linux-musl-openssl-3.0.x" pnpm build; \
    fi
# After the build, remove dev dependencies and keep only production dependencies
RUN pnpm prune --prod
# Runtime stage
FROM pnpm-base AS runner
WORKDIR /app
# Install runtime dependencies only
RUN apk add --no-cache \
    cairo \
    pango \
    jpeg \
    giflib \
    librsvg \
    pixman
# Copy package.json and the .env file
COPY package.json .env ./
# Copy the pruned node_modules (production dependencies only) from the build stage
COPY --from=builder /app/node_modules ./node_modules
# Copy the build output from the build stage
COPY --from=builder /app/.next ./.next
COPY --from=builder /app/public ./public
COPY --from=builder /app/electron ./electron
# Copy prisma to a template directory (used for automatic initialization)
COPY --from=builder /app/prisma /app/prisma-template
# Copy and set up the entrypoint script (sed strips Windows \r line endings so CRLF does not cause "no such file or directory")
COPY docker-entrypoint.sh /usr/local/bin/
RUN sed -i 's/\r$//' /usr/local/bin/docker-entrypoint.sh && \
    chmod +x /usr/local/bin/docker-entrypoint.sh
# Set the production environment
ENV NODE_ENV=production
EXPOSE 1717
# Use the entrypoint script
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
CMD ["pnpm", "start"]

easy-dataset-main/LICENSE

@@ -0,0 +1,40 @@
GNU AFFERO GENERAL PUBLIC LICENSE
Version 3, 19 November 2007
Copyright (C) 2025 Easy Dataset Project
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see https://www.gnu.org/licenses/.
Additional Terms for Easy Dataset:
1. Contact Information
If you wish to use Easy Dataset under different terms, please contact the
copyright holders at: 1009903985@qq.com
2. Branding Restrictions
You may not use the names "Easy Dataset" or "EasyDataset" to endorse or
promote products derived from this software without prior written permission.
3. Disclaimer of Warranty
The software is provided "as is", without warranty of any kind, express or
implied, including but not limited to the warranties of merchantability,
fitness for a particular purpose and noninfringement. In no event shall the
authors or copyright holders be liable for any claim, damages or other
liability, whether in an action of contract, tort or otherwise, arising from,
out of or in connection with the software or the use or other dealings in the
software.
4. Compliance with Laws
You are responsible for ensuring your use of the software complies with all
applicable laws, including but not limited to export control regulations.

easy-dataset-main/README.md

@@ -0,0 +1,294 @@
<div align="center">
![](./public//imgs/bg2.png)
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ConardLi/easy-dataset">
<img alt="GitHub Downloads (all assets, all releases)" src="https://img.shields.io/github/downloads/ConardLi/easy-dataset/total">
<img alt="GitHub Release" src="https://img.shields.io/github/v/release/ConardLi/easy-dataset">
<img src="https://img.shields.io/badge/license-AGPL--3.0-green.svg" alt="AGPL 3.0 License"/>
<img alt="GitHub contributors" src="https://img.shields.io/github/contributors/ConardLi/easy-dataset">
<img alt="GitHub last commit" src="https://img.shields.io/github/last-commit/ConardLi/easy-dataset">
<a href="https://arxiv.org/abs/2507.04009v1" target="_blank">
<img src="https://img.shields.io/badge/arXiv-2507.04009-b31b1b.svg" alt="arXiv:2507.04009">
</a>
<a href="https://trendshift.io/repositories/13944" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13944" alt="ConardLi%2Feasy-dataset | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
**A powerful tool for creating fine-tuning datasets for Large Language Models**
[简体中文](./README.zh-CN.md) | [English](./README.md) | [Türkçe](./README.tr.md)
[Features](#features) • [Quick Start](#local-run) • [Documentation](https://docs.easy-dataset.com/ed/en) • [Contributing](#contributing) • [License](#license)
If you like this project, please give it a Star⭐, or buy the author a coffee => [Donate](./public/imgs/aw.jpg) ❤️!
</div>
## Overview
Easy Dataset is an application specifically designed for building large language model (LLM) datasets. It features an intuitive interface, along with built-in powerful document parsing tools, intelligent segmentation algorithms, data cleaning and augmentation capabilities. The application can convert domain-specific documents in various formats into high-quality structured datasets, which are applicable to scenarios such as model fine-tuning, retrieval-augmented generation (RAG), and model performance evaluation.
![](./public/imgs/arc3.png)
## News
🎉🎉 Easy Dataset Version 1.7.0 launches brand-new evaluation capabilities! You can effortlessly convert domain-specific documents into evaluation datasets (test sets) and automatically run multi-dimensional evaluation tasks. Additionally, it comes with a human blind test system, enabling you to easily meet needs such as vertical domain model evaluation, post-fine-tuning model performance assessment, and RAG recall rate evaluation. Tutorial: [https://www.bilibili.com/video/BV1CRrVB7Eb4/](https://www.bilibili.com/video/BV1CRrVB7Eb4/)
## Features
### 📄 Document Processing & Data Generation
- **Intelligent Document Processing**: Supports PDF, Markdown, DOCX, TXT, EPUB and more formats with intelligent recognition
- **Intelligent Text Splitting**: Multiple splitting algorithms (Markdown structure, recursive separators, fixed length, code-aware chunking), with customizable visual segmentation
- **Intelligent Question Generation**: Auto-extract relevant questions from text segments, with question templates and batch generation
- **Domain Label Tree**: Intelligently builds global domain label trees based on document structure, with auto-tagging capabilities
- **Answer Generation**: Uses LLM API to generate comprehensive answers and Chain of Thought (COT), with AI optimization
- **Data Cleaning**: Intelligent text cleaning to remove noise and improve data quality
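Of the splitting strategies listed above, fixed-length splitting is the simplest to illustrate. The sketch below cuts text into windows of `size` characters that share `overlap` characters with the previous window; the function name and the overlap parameter are assumptions, not the application's actual implementation.

```javascript
// Split text into fixed-size windows; consecutive windows share `overlap` characters.
function chunkFixedLength(text, size, overlap = 0) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= size) {
    throw new RangeError('overlap must be an integer in [0, size)');
  }
  const chunks = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // the last window reached the end
  }
  return chunks;
}

console.log(chunkFixedLength('abcdefghij', 4, 1)); // prints [ 'abcd', 'defg', 'ghij' ]
```

The overlap keeps a sentence that straddles a boundary at least partly visible in both neighbouring chunks. Note that `slice` counts UTF-16 code units, so characters outside the Basic Multilingual Plane can be split.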
### 🔄 Multiple Dataset Types
- **Single-Turn QA Datasets**: Standard question-answer pairs for basic fine-tuning
- **Multi-Turn Dialogue Datasets**: Customizable roles and scenarios for conversational format
- **Image QA Datasets**: Generate visual QA data from images, with multiple import methods (directory, PDF, ZIP)
- **Data Distillation**: Generate label trees and questions directly from domain topics without uploading documents
### 📊 Model Evaluation System
- **Evaluation Datasets**: Generate true/false, single-choice, multiple-choice, short-answer, and open-ended questions
- **Automated Model Evaluation**: Use Judge Model to automatically evaluate model answer quality with customizable scoring rules
- **Human Blind Test (Arena)**: Double-blind comparison of two models' answers for unbiased evaluation
- **AI Quality Assessment**: Automatic quality scoring and filtering of generated datasets
### 🛠️ Advanced Features
- **Custom Prompts**: Project-level customization of all prompt templates (question generation, answer generation, data cleaning, etc.)
- **GA Pair Generation**: Genre-Audience pair generation to enrich data diversity
- **Task Management Center**: Background batch task processing with monitoring and interruption support
- **Resource Monitoring Dashboard**: Token consumption statistics, API call tracking, model performance analysis
- **Model Testing Playground**: Compare up to 3 models simultaneously
### 📤 Export & Integration
- **Multiple Export Formats**: Alpaca, ShareGPT, Multilingual-Thinking formats with JSON/JSONL file types
- **Balanced Export**: Configure export counts per tag for dataset balancing
- **LLaMA Factory Integration**: One-click LLaMA Factory configuration file generation
- **Hugging Face Upload**: Direct upload datasets to Hugging Face Hub
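As a rough illustration of the export step, the sketch below converts question-answer pairs into Alpaca-style and ShareGPT-style records and serializes them as JSONL. The field names follow the commonly used layouts of those two formats; the input shape (`question`, `answer`) and the function names are assumptions, and the application's real exporter may emit additional fields.

```javascript
// Alpaca-style record: instruction / input / output.
function toAlpaca(pairs) {
  return pairs.map(({ question, answer }) => ({
    instruction: question,
    input: '',
    output: answer
  }));
}

// ShareGPT-style record: a list of conversation turns.
function toShareGPT(pairs) {
  return pairs.map(({ question, answer }) => ({
    conversations: [
      { from: 'human', value: question },
      { from: 'gpt', value: answer }
    ]
  }));
}

// JSONL: one JSON document per line.
function toJsonl(records) {
  return records.map(record => JSON.stringify(record)).join('\n');
}

const pairs = [
  { question: 'What is fine-tuning?', answer: 'Further training of a pretrained model on task data.' },
  { question: 'What is JSONL?', answer: 'A file with one JSON object per line.' }
];
console.log(toJsonl(toAlpaca(pairs)));
console.log(JSON.stringify(toShareGPT(pairs), null, 2));
```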
### 🤖 Model Support
- **Wide Model Compatibility**: Compatible with all LLM APIs that follow the OpenAI format
- **Multi-Provider Support**: OpenAI, Ollama (local models), Zhipu AI, Alibaba Bailian, OpenRouter, and more
- **Vision Models**: Support Gemini, Claude, etc. for PDF parsing and image QA
### 🌐 User Experience
- **User-Friendly Interface**: Modern, intuitive UI designed for both technical and non-technical users
- **Multi-Language Support**: Complete Chinese, English, Turkish and Portuguese language support 🇹🇷
- **Dataset Square**: Discover and explore public dataset resources
- **Desktop Clients**: Available for Windows, macOS, and Linux
## Quick Demo
https://github.com/user-attachments/assets/6ddb1225-3d1b-4695-90cd-aa4cb01376a8
## Local Run
### Download Client
<table style="width: 100%">
<tr>
<td width="20%" align="center">
<b>Windows</b>
</td>
<td width="30%" align="center" colspan="2">
<b>MacOS</b>
</td>
<td width="20%" align="center">
<b>Linux</b>
</td>
</tr>
<tr style="text-align: center">
<td align="center" valign="middle">
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
<img src='./public/imgs/windows.png' style="height:24px; width: 24px" />
<br />
<b>Setup.exe</b>
</a>
</td>
<td align="center" valign="middle">
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
<img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
<br />
<b>Intel</b>
</a>
</td>
<td align="center" valign="middle">
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
<img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
<br />
<b>M</b>
</a>
</td>
<td align="center" valign="middle">
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
<img src='./public/imgs/linux.png' style="height:24px; width: 24px" />
<br />
<b>AppImage</b>
</a>
</td>
</tr>
</table>
### Install with NPM
1. Clone the repository:
```bash
git clone https://github.com/ConardLi/easy-dataset.git
cd easy-dataset
```
2. Install dependencies:
```bash
npm install
```
3. Build and start the application:
```bash
npm run build
npm run start
```
4. Open your browser and visit `http://localhost:1717`
### Using the Official Docker Image
1. Clone the repository:
```bash
git clone https://github.com/ConardLi/easy-dataset.git
cd easy-dataset
```
2. Modify the `docker-compose.yml` file:
```yml
services:
  easy-dataset:
    image: ghcr.io/conardli/easy-dataset
    container_name: easy-dataset
    ports:
      - '1717:1717'
    volumes:
      - ./local-db:/app/local-db
      - ./prisma:/app/prisma
    restart: unless-stopped
```
> **Note:** It is recommended to use the `local-db` and `prisma` folders in the current code repository directory as mount paths to maintain consistency with the database paths when starting via NPM.
> **Note:** The database file will be automatically initialized on first startup, no need to manually run `npm run db:push`.
3. Start with docker-compose:
```bash
docker-compose up -d
```
4. Open a browser and visit `http://localhost:1717`
### Building with a Local Dockerfile
If you want to build the image yourself, use the Dockerfile in the project root directory:
1. Clone the repository:
```bash
git clone https://github.com/ConardLi/easy-dataset.git
cd easy-dataset
```
2. Build the Docker image:
```bash
docker build -t easy-dataset .
```
3. Run the container:
```bash
docker run -d \
-p 1717:1717 \
-v ./local-db:/app/local-db \
-v ./prisma:/app/prisma \
--name easy-dataset \
easy-dataset
```
> **Note:** It is recommended to use the `local-db` and `prisma` folders in the current code repository directory as mount paths to maintain consistency with the database paths when starting via NPM.
> **Note:** The database file will be automatically initialized on first startup, no need to manually run `npm run db:push`.
4. Open a browser and visit `http://localhost:1717`
## Documentation
- View the demo video of this project: [Easy Dataset Demo Video](https://www.bilibili.com/video/BV1y8QpYGE57/)
- For detailed documentation on all features and APIs, visit our [Documentation Site](https://docs.easy-dataset.com/ed/en)
- View the paper of this project: [Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents](https://arxiv.org/abs/2507.04009v1)
## Community Practice
- [Complete test set generation and model evaluation with Easy Dataset](https://www.bilibili.com/video/BV1CRrVB7Eb4/)
- [Easy Dataset × LLaMA Factory: Enabling LLMs to Efficiently Learn Domain Knowledge](https://buaa-act.feishu.cn/wiki/GVzlwYcRFiR8OLkHbL6cQpYin7g)
- [Easy Dataset Practical Guide: How to Build High-Quality Datasets?](https://www.bilibili.com/video/BV1MRMnz1EGW)
- [Interpretation of Key Feature Updates in Easy Dataset](https://www.bilibili.com/video/BV1fyJhzHEb7/)
- [Foundation Models Fine-tuning Datasets: Basic Knowledge Popularization](https://docs.easy-dataset.com/zhi-shi-ke-pu)
## Contributing
We welcome contributions from the community! If you'd like to contribute to Easy Dataset, please follow these steps:
1. Fork the repository
2. Create a new branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Commit your changes (`git commit -m 'Add some amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request (submit to the DEV branch)
Please ensure that tests are appropriately updated and adhere to the existing coding style.
## Join Discussion Group & Contact the Author
https://docs.easy-dataset.com/geng-duo/lian-xi-wo-men
## License
This project is licensed under the AGPL 3.0 License - see the [LICENSE](LICENSE) file for details.
## Citation
If this work is helpful, please kindly cite as:
```bibtex
@misc{miao2025easydataset,
title={Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents},
author={Ziyang Miao and Qiyu Sun and Jingyuan Wang and Yuchen Gong and Yaowei Zheng and Shiqi Li and Richong Zhang},
year={2025},
eprint={2507.04009},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2507.04009}
}
```
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=ConardLi/easy-dataset&type=Date)](https://www.star-history.com/#ConardLi/easy-dataset&Date)
<div align="center">
<sub>Built with ❤️ by <a href="https://github.com/ConardLi">ConardLi</a> • Follow me: <a href="./public/imgs/weichat.jpg">WeChat Official Account</a><a href="https://space.bilibili.com/474921808">Bilibili</a><a href="https://juejin.cn/user/3949101466785709">Juejin</a><a href="https://www.zhihu.com/people/wen-ti-chao-ji-duo-de-xiao-qi">Zhihu</a><a href="https://www.youtube.com/@garden-conard">Youtube</a></sub>
</div>


@@ -0,0 +1,319 @@
<div align="center">
![](./public//imgs/bg2.png)
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ConardLi/easy-dataset">
<img alt="GitHub Downloads (all assets, all releases)" src="https://img.shields.io/github/downloads/ConardLi/easy-dataset/total">
<img alt="GitHub Release" src="https://img.shields.io/github/v/release/ConardLi/easy-dataset">
<img src="https://img.shields.io/badge/license-AGPL--3.0-green.svg" alt="AGPL 3.0 License"/>
<img alt="GitHub contributors" src="https://img.shields.io/github/contributors/ConardLi/easy-dataset">
<img alt="GitHub last commit" src="https://img.shields.io/github/last-commit/ConardLi/easy-dataset">
<a href="https://arxiv.org/abs/2507.04009v1" target="_blank">
<img src="https://img.shields.io/badge/arXiv-2507.04009-b31b1b.svg" alt="arXiv:2507.04009">
</a>
<a href="https://trendshift.io/repositories/13944" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13944" alt="ConardLi%2Feasy-dataset | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
**A powerful tool for creating fine-tuning datasets for Large Language Models**
[简体中文](./README.zh-CN.md) | [English](./README.md) | [Türkçe](./README.tr.md)
[Features](#features) • [Quick Start](#local-run) • [Documentation](https://docs.easy-dataset.com/ed/en) • [Contributing](#katkıda-bulunma) • [License](#lisans)
If you like this project, please give it a Star⭐ or buy the author a coffee => [Donate](./public/imgs/aw.jpg) ❤️!
</div>
## Overview
Easy Dataset is an application for creating fine-tuning datasets, designed specifically for Large Language Models (LLMs). It provides an intuitive interface for uploading domain-specific files, splitting content intelligently, generating questions, and producing high-quality training data for model fine-tuning.
With Easy Dataset you can turn domain knowledge into structured datasets, work with any LLM API that follows the OpenAI format, and make the fine-tuning process simple and efficient.
![](./public/imgs/arc3.png)
## Features
- **Intelligent Document Processing**: Intelligent recognition and processing of multiple formats, including PDF, Markdown, and DOCX
- **Intelligent Text Splitting**: Multiple intelligent text splitting algorithms and customizable visual segmentation
- **Intelligent Question Generation**: Extracts relevant questions from each text segment
- **Domain Labels**: Intelligently builds global domain labels for datasets, with global understanding capabilities
- **Answer Generation**: Uses an LLM API to generate comprehensive answers and Chain of Thought (COT)
- **Flexible Editing**: Edit questions, answers, and datasets at any stage of the process
- **Multiple Export Formats**: Export datasets in various formats (Alpaca, ShareGPT, multilingual-thinking) and file types (JSON, JSONL)
- **Wide Model Support**: Compatible with all LLM APIs that follow the OpenAI format
- **Full Turkish Language Support**: Complete Turkish translations for the entire interface and AI operations 🇹🇷
- **User-Friendly Interface**: Intuitive UI designed for both technical and non-technical users
- **Custom System Prompts**: Add custom system prompts to guide model responses
## Quick Demo
https://github.com/user-attachments/assets/6ddb1225-3d1b-4695-90cd-aa4cb01376a8
## Local Run
### Download Client
<table style="width: 100%">
<tr>
<td width="20%" align="center">
<b>Windows</b>
</td>
<td width="30%" align="center" colspan="2">
<b>MacOS</b>
</td>
<td width="20%" align="center">
<b>Linux</b>
</td>
</tr>
<tr style="text-align: center">
<td align="center" valign="middle">
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
<img src='./public/imgs/windows.png' style="height:24px; width: 24px" />
<br />
<b>Setup.exe</b>
</a>
</td>
<td align="center" valign="middle">
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
<img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
<br />
<b>Intel</b>
</a>
</td>
<td align="center" valign="middle">
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
<img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
<br />
<b>M</b>
</a>
</td>
<td align="center" valign="middle">
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
<img src='./public/imgs/linux.png' style="height:24px; width: 24px" />
<br />
<b>AppImage</b>
</a>
</td>
</tr>
</table>
### NPM ile Kurulum
```bash
npm install
npm run db:push
npm run dev
```
### Docker ile Kurulum
```bash
docker-compose up -d
```
Then open `http://localhost:1717` in your browser.
## Supported AI Providers
Easy Dataset supports multiple AI providers, including:
- **OpenAI**: GPT-4, GPT-3.5-turbo, and other models
- **Ollama**: Run models locally
- **Zhipu AI (GLM)**: Chinese-language models
- **OpenRouter**: Multi-model aggregator
- **Custom API Endpoints**: Any API that follows the OpenAI format
## Project Structure
```
easy-dataset/
├── app/ # Next.js app router
│ ├── api/ # API routes
│ ├── projects/ # Project pages
│ └── dataset-square/ # Dataset gallery
├── components/ # React components
├── lib/ # Core libraries
│ ├── llm/ # LLM integration
│ ├── db/ # Database access
│ ├── file/ # File processing
│ └── services/ # Business logic
├── locales/ # i18n translations
│ ├── en/ # English
│ ├── zh-CN/ # Simplified Chinese
│ └── tr/ # Turkish
├── prisma/ # Database schema
└── electron/ # Electron desktop app
```
## Usage Guide
### 1. Create a Project
First, create a new project and configure its name, description, and other basic information.
### 2. Upload Files
Upload your domain-specific documents. Supported formats:
- PDF
- Markdown (.md)
- Microsoft Word (.docx)
- EPUB
- Plain text (.txt)
### 3. Split Text
Files can be split intelligently using the following methods:
- Semantic splitting based on natural language processing
- Splitting based on custom separators
- Fixed-size splitting based on character count
- Manual visual splitting
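Two of the splitting strategies above (fixed size and custom separator) are simple enough to sketch. The helper names `splitByFixedSize` and `splitBySeparator` are hypothetical and only illustrate the idea; they are not Easy Dataset's actual implementation.

```javascript
// Illustrative sketch only; not the project's real splitter.

// Split text into chunks of at most `size` characters.
function splitByFixedSize(text, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let start = 0; start < text.length; start += size) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

// Split text on a custom separator, dropping empty or whitespace-only pieces.
function splitBySeparator(text, separator) {
  return text
    .split(separator)
    .map(piece => piece.trim())
    .filter(piece => piece.length > 0);
}
```

Fixed-size splitting is predictable but can cut sentences in half, which is one reason a tool might also offer separator-based or semantic splitting.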
### 4. Generate Domain Labels
The system can automatically generate hierarchical domain labels from document content, with support for two levels.
### 5. Generate Questions
For each text block, the system:
- Generates relevant questions based on the content
- Supports questioning from Genre-Audience perspectives
- Lets you customize the number of questions
### 6. Generate Answers
Using the configured LLM API, the system:
- Generates comprehensive answers for each question
- Supports Chain of Thought (COT) generation
- Supports different answer templates
### 7. Export the Dataset
Export your dataset in various formats:
- **Alpaca Format**: Simple instruction-following format
- **ShareGPT Format**: Multi-turn conversation format
- **Multilingual Thinking**: Extended format with COT
- **Custom Format**: Define your own JSON structure
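As a rough illustration of the first two formats, the sketch below converts one question-answer record into the commonly used Alpaca shape (`instruction`, `input`, `output`) and ShareGPT shape (a `conversations` array of `from`/`value` turns). The input field names `question` and `answer` and the helper names are assumptions for this example; the project's exporter may emit additional or different fields.

```javascript
// Illustrative sketch; the input record shape { question, answer } is assumed.

// Alpaca-style record: one instruction, optional input, one output.
function toAlpaca(record) {
  return { instruction: record.question, input: '', output: record.answer };
}

// ShareGPT-style record: an ordered list of conversation turns.
function toShareGPT(record) {
  return {
    conversations: [
      { from: 'human', value: record.question },
      { from: 'gpt', value: record.answer }
    ]
  };
}

// JSONL: one JSON document per line.
function toJsonl(items) {
  return items.map(item => JSON.stringify(item)).join('\n');
}
```

JSON output would wrap all records in one array, while JSONL keeps one record per line, which is easier to stream for large datasets.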
Export targets:
- Local file system
- Hugging Face Hub
- LLaMA Factory compatibility
## Advanced Features
### Data Distillation
Generate new training examples from existing datasets:
- Question distillation: Generate new questions from existing question-answer pairs
- Label distillation: Automatically generate labels and categories
### Genre-Audience (GA) Pairs
Tailor datasets to specific content styles and target audiences:
- Genre: Academic, technical, creative writing, etc.
- Audience: Beginners, experts, students, etc.
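A GA pair can be pictured as a small object with a genre and an audience. The sketch below uses the field names that appear in this commit's batch GA route (`genreTitle`, `genreDesc`, `audienceTitle`, `audienceDesc`); the helper name `normalizeGaPair` is hypothetical and only shows the validation and trimming that route performs.

```javascript
// Illustrative sketch; `normalizeGaPair` is a made-up helper name.
function normalizeGaPair(gaPair) {
  // Both titles are required; descriptions are optional.
  if (!gaPair || !gaPair.genreTitle || !gaPair.audienceTitle) {
    throw new Error('GA pair with genreTitle and audienceTitle is required');
  }
  return {
    genreTitle: gaPair.genreTitle.trim(),
    genreDesc: gaPair.genreDesc?.trim() || '',
    audienceTitle: gaPair.audienceTitle.trim(),
    audienceDesc: gaPair.audienceDesc?.trim() || ''
  };
}
```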
### Batch Operations
Process multiple items efficiently:
- Batch question generation
- Batch answer generation
- Batch dataset export
### Task Management
Monitor and manage all background tasks:
- File processing tasks
- Question generation tasks
- Answer generation tasks
- Export tasks
## Configuration
### LLM API Configuration
Configure your LLM API on the settings page:
1. **Provider**: Choose OpenAI, Ollama, Zhipu AI, or a custom provider
2. **API Key**: Enter your API key (if required)
3. **Model**: Select the model to use
4. **Base URL**: Set the base URL for custom APIs
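To make the role of the base URL and API key concrete, the sketch below shows one way to derive the model-listing URL and request headers from these settings. It loosely follows the normalization done by the model-listing route in this commit (OpenAI-compatible providers use `/models`, Ollama uses `/api/tags`); the helper names `buildModelListUrl` and `buildAuthHeaders` are hypothetical.

```javascript
// Illustrative sketch; helper names are assumptions, not project APIs.

// Build the URL used to list models for a given provider.
function buildModelListUrl(endpoint, providerId) {
  let url = endpoint.replace(/\/$/, ''); // Remove one trailing slash
  if (providerId === 'ollama') {
    url = url.replace(/\/v\d+$/, ''); // Drop a version suffix such as /v1
    if (!url.endsWith('/api')) {
      url += '/api';
    }
    return url + '/tags';
  }
  return url + '/models'; // OpenAI-compatible default
}

// Only send an Authorization header when a key is configured.
function buildAuthHeaders(apiKey) {
  return apiKey ? { Authorization: `Bearer ${apiKey}` } : {};
}
```

A local Ollama server typically needs no API key, which is why the header is optional here.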
### Task Settings
Customize task execution parameters:
- Concurrency for question generation
- Concurrency for answer generation
- Default number of questions
- Default answer template
### Custom Prompts
Add custom system prompts for each task type:
- Question generation prompt
- Answer generation prompt
- Label generation prompt
- Distillation prompt
## Contributing
Contributions are welcome! Please follow these steps:
1. Fork the repo
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the AGPL-3.0 License. See the [LICENSE](./LICENSE) file for details.
## Contact
- **GitHub Issues**: [Open a new issue](https://github.com/ConardLi/easy-dataset/issues)
- **Email**: lhj19950927@gmail.com
- **WeChat Group**: See the QR code in the README
## Citation
If you use this tool in your research, please cite it as follows:
```bibtex
@misc{easy-dataset-2025,
title={Easy Dataset: A Tool for Creating Fine-tuning Datasets for Large Language Models},
author={Conard Li},
year={2025},
publisher={GitHub},
howpublished={\url{https://github.com/ConardLi/easy-dataset}}
}
```
## Acknowledgements
This project uses the following great open source projects:
- [Next.js](https://nextjs.org/)
- [React](https://reactjs.org/)
- [Material-UI](https://mui.com/)
- [Prisma](https://www.prisma.io/)
- [Electron](https://www.electronjs.org/)
---
<div align="center">
⭐️ If you like this project, please give it a star! ⭐️
</div>

View File

@@ -0,0 +1,300 @@
<div align="center">
![](./public/imgs/bg2.png)
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ConardLi/easy-dataset">
<img alt="GitHub Downloads (all assets, all releases)" src="https://img.shields.io/github/downloads/ConardLi/easy-dataset/total">
<img alt="GitHub Release" src="https://img.shields.io/github/v/release/ConardLi/easy-dataset">
<img src="https://img.shields.io/badge/license-AGPL--3.0-green.svg" alt="AGPL 3.0 License"/>
<img alt="GitHub contributors" src="https://img.shields.io/github/contributors/ConardLi/easy-dataset">
<img alt="GitHub last commit" src="https://img.shields.io/github/last-commit/ConardLi/easy-dataset">
<a href="https://arxiv.org/abs/2507.04009v1" target="_blank">
<img src="https://img.shields.io/badge/arXiv-2507.04009-b31b1b.svg" alt="arXiv:2507.04009">
</a>
<a href="https://trendshift.io/repositories/13944" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13944" alt="ConardLi%2Feasy-dataset | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
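**A powerful tool for creating fine-tuning datasets for large language models**
[简体中文](./README.zh-CN.md) | [English](./README.md)
[Features](#功能特点) • [Quick Start](#本地运行) • [Documentation](https://docs.easy-dataset.com/) • [Contributing](#贡献) • [License](#许可证)
If you like this project, please give it a Star ⭐, or buy the author a coffee => [Tip the author](./public/imgs/aw.jpg) ❤️!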
</div>
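## Overview
Easy Dataset is an application designed for creating datasets for large language models. It provides an intuitive interface with built-in document parsing tools, intelligent splitting algorithms, and data cleaning and augmentation capabilities. It converts domain literature in various formats into high-quality structured datasets for model fine-tuning, RAG, and model evaluation.
![Easy Dataset product architecture](./public/imgs/arc3.png)
## News
🎉🎉 Easy Dataset 1.7.0 is out with brand-new evaluation capabilities. You can convert domain literature into evaluation datasets (test sets) and automatically run multi-dimensional evaluation tasks. A human blind-test system is also included, helping you evaluate vertical-domain models, assess results after fine-tuning, and measure RAG recall. Tutorial: [https://www.bilibili.com/video/BV1CRrVB7Eb4/](https://www.bilibili.com/video/BV1CRrVB7Eb4/)
<a id="功能特点"></a>
## Features
### 📄 Document Processing and Data Generation
- **Intelligent Document Processing**: Recognizes and processes PDF, Markdown, DOCX, TXT, EPUB, and other formats
- **Intelligent Text Splitting**: Supports multiple splitting algorithms (Markdown structure, recursive separators, fixed length, code-aware chunking, and more), plus custom visual segmentation
- **Intelligent Question Generation**: Automatically extracts relevant questions from each text segment, with support for question templates and batch generation
- **Domain Label Tree**: Builds a global domain label tree from the document's table of contents, with global understanding and automatic labeling
- **Answer Generation**: Uses an LLM API to generate comprehensive answers and Chain of Thought (COT) for each question, with AI-assisted optimization
- **Data Cleaning**: Cleans text block content to remove noise and improve data quality
### 🔄 Multiple Dataset Types
- **Single-Turn QA Datasets**: Standard question-answer pairs, suitable for basic fine-tuning
- **Multi-Turn Dialogue Datasets**: Multi-turn conversation format with customizable roles and scenarios
- **Image QA Datasets**: Generates visual question-answering data from images, with multiple import methods (directory, PDF, archive)
- **Data Distillation**: Generates label trees and questions directly from domain topics, with no document upload required
### 📊 Model Evaluation System
- **Evaluation Datasets**: Generates test sets with true/false, single-choice, multiple-choice, short-answer, and open-ended questions
- **Automated Model Evaluation**: Uses a teacher model (Judge Model) to score model answers automatically, with customizable scoring rules
- **Human Blind Test (Arena)**: Double-blind comparison of two models' answers for unbiased judgment
- **AI Quality Assessment**: Automatically scores and filters generated datasets
### 🛠️ Advanced Features
- **Custom Prompts**: Project-level customization of prompt templates (question generation, answer generation, data cleaning, and more)
- **GA Pair Generation**: Genre-Audience pair generation to enrich data diversity
- **Task Management Center**: Background batch task processing with monitoring and interruption
- **Resource Monitoring Dashboard**: Token usage statistics, call count tracking, and model performance analysis
- **Model Testing Playground**: Compare up to 3 models side by side
### 📤 Export and Integration
- **Multiple Export Formats**: Alpaca, ShareGPT, Multilingual-Thinking, and other formats, as JSON or JSONL files
- **Balanced Export**: Configure export counts per label to balance the dataset
- **LLaMA Factory Integration**: Generate LLaMA Factory configuration files in one click
- **Hugging Face Upload**: Upload datasets directly to the Hugging Face Hub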
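The idea behind balanced export can be sketched as a per-label cap applied while walking the dataset. This is a minimal illustration under stated assumptions: each item is assumed to carry a `label` field, `limits` is assumed to map labels to maximum counts, and labels without a configured limit are kept in full. The function name `balancedExport` is hypothetical and the project's real behavior may differ.

```javascript
// Illustrative sketch; item shape and `limits` semantics are assumptions.
function balancedExport(items, limits) {
  const taken = new Map(); // label -> number of items already kept
  return items.filter(item => {
    const hasLimit = Object.prototype.hasOwnProperty.call(limits, item.label);
    const limit = hasLimit ? limits[item.label] : Infinity;
    const count = taken.get(item.label) ?? 0;
    if (count >= limit) {
      return false; // Cap reached for this label
    }
    taken.set(item.label, count + 1);
    return true;
  });
}
```

Keeping the first N items per label preserves the original order; a real exporter might sample randomly instead.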
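### 🤖 Model Support
- **Broad Model Compatibility**: Compatible with all LLM APIs that follow the OpenAI format
- **Multi-Provider Support**: OpenAI, Ollama (local models), Zhipu AI, Alibaba Bailian, OpenRouter, and more
- **Vision Models**: Supports vision models such as Gemini and Claude for PDF parsing and image QA
### 🌐 User Experience
- **User-Friendly Interface**: Modern, intuitive UI designed for both technical and non-technical users
- **Multi-Language Support**: Full Chinese and English interface support
- **Dataset Square**: Discover and explore public dataset resources
- **Desktop Client**: Desktop apps for Windows, macOS, and Linux
## Quick Demo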
https://github.com/user-attachments/assets/6ddb1225-3d1b-4695-90cd-aa4cb01376a8
<a id="本地运行"></a>
## Run Locally
### Download the Client
<table style="width: 100%">
<tr>
<td width="20%" align="center">
<b>Windows</b>
</td>
<td width="30%" align="center" colspan="2">
<b>MacOS</b>
</td>
<td width="20%" align="center">
<b>Linux</b>
</td>
</tr>
<tr style="text-align: center">
<td align="center" valign="middle">
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
<img src='./public/imgs/windows.png' style="height:24px; width: 24px" />
<br />
<b>Setup.exe</b>
</a>
</td>
<td align="center" valign="middle">
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
<img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
<br />
<b>Intel</b>
</a>
</td>
<td align="center" valign="middle">
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
<img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
<br />
<b>M</b>
</a>
</td>
<td align="center" valign="middle">
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
<img src='./public/imgs/linux.png' style="height:24px; width: 24px" />
<br />
<b>AppImage</b>
</a>
</td>
</tr>
</table>
### Install with NPM
1. Clone the repository:
```bash
git clone https://github.com/ConardLi/easy-dataset.git
cd easy-dataset
```
2. Install dependencies:
```bash
npm install
```
3. Build and start the server:
```bash
npm run build
npm run start
```
4. Open your browser and visit `http://localhost:1717`
### Use the Official Docker Image
1. Clone the repository:
```bash
git clone https://github.com/ConardLi/easy-dataset.git
cd easy-dataset
```
2. Modify the `docker-compose.yml` file:
```yml
services:
easy-dataset:
image: ghcr.io/conardli/easy-dataset
container_name: easy-dataset
ports:
- '1717:1717'
volumes:
- ./local-db:/app/local-db
- ./prisma:/app/prisma
restart: unless-stopped
```
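> **Note:** It is recommended to mount the `local-db` and `prisma` folders from the current repository directory, so the database paths stay consistent with those used when starting via NPM.
> **Note:** The database file is initialized automatically on first startup; there is no need to run `npm run db:push` manually.
3. Start with docker-compose: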
```bash
docker-compose up -d
```
4. Open your browser and visit `http://localhost:1717`
### Build with the Local Dockerfile
To build the image yourself, use the Dockerfile in the project root:
1. Clone the repository:
```bash
git clone https://github.com/ConardLi/easy-dataset.git
cd easy-dataset
```
2. Build the Docker image:
```bash
docker build -t easy-dataset .
```
3. Run the container:
```bash
docker run -d \
-p 1717:1717 \
-v ./local-db:/app/local-db \
-v ./prisma:/app/prisma \
--name easy-dataset \
easy-dataset
```
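> **Note:** It is recommended to mount the `local-db` and `prisma` folders from the current repository directory, so the database paths stay consistent with those used when starting via NPM.
> **Note:** The database file is initialized automatically on first startup; there is no need to run `npm run db:push` manually.
4. Open your browser and visit `http://localhost:1717`
## Documentation
- For detailed documentation on all features and APIs, visit our [documentation site](https://docs.easy-dataset.com/)
- Watch the project demo video: [Easy Dataset Demo Video](https://www.bilibili.com/video/BV1y8QpYGE57/)
- Read the project paper: [Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents](https://arxiv.org/abs/2507.04009v1)
## Community Tutorials
- [Test set generation and model evaluation with Easy Dataset](https://www.bilibili.com/video/BV1CRrVB7Eb4/)
- [Easy Dataset × LLaMA Factory: Helping large models learn domain knowledge efficiently](https://buaa-act.feishu.cn/wiki/KY9xwTGs1iqHrRkjXBwcZP9WnL9)
- [Easy Dataset in practice: How to build high-quality datasets](https://www.bilibili.com/video/BV1MRMnz1EGW)
- [Easy Dataset 1.4: Key feature updates explained](https://www.bilibili.com/video/BV1fyJhzHEb7/)
- [Easy Dataset 1.6: Key feature updates explained](https://www.bilibili.com/video/BV1Rq1hBtEJa/)
- [LLM fine-tuning datasets: The basics](https://docs.easy-dataset.com/zhi-shi-ke-pu)
- [Case study 1: Generating a car image recognition dataset](https://docs.easy-dataset.com/bo-ke/shi-zhan-an-li/an-li-1-sheng-cheng-qi-che-tu-pian-shi-bie-shu-ju-ji)
- [Case study 2: Review sentiment classification dataset](https://docs.easy-dataset.com/bo-ke/shi-zhan-an-li/an-li-2-ping-lun-qing-gan-fen-lei-shu-ju-ji)
- [Case study 3: Physics multi-turn dialogue dataset](https://docs.easy-dataset.com/bo-ke/shi-zhan-an-li/an-li-3-wu-li-xue-duo-lun-dui-hua-shu-ju-ji)
- [Case study 4: AI agent safety dataset](https://docs.easy-dataset.com/bo-ke/shi-zhan-an-li/an-li-4ai-zhi-neng-ti-an-quan-shu-ju-ji)
- [Case study 5: Extracting a dataset from illustrated PPT slides](https://docs.easy-dataset.com/bo-ke/shi-zhan-an-li/an-li-5-cong-tu-wen-ppt-zhong-ti-qu-shu-ju-ji)
<a id="贡献"></a>
## Contributing
We welcome contributions from the community! To contribute to Easy Dataset, follow these steps:
1. Fork the repository
2. Create a new branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Commit your changes (`git commit -m 'Add some amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request (targeting the DEV branch)
Please update tests as appropriate and follow the existing coding style.
## Join the Community & Contact the Author
https://docs.easy-dataset.com/geng-duo/lian-xi-wo-men
<a id="许可证"></a>
## License
This project is licensed under AGPL 3.0. See the [LICENSE](LICENSE) file for details.
## Citation
If you find this project helpful, please consider citing it as follows: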
```bibtex
@misc{miao2025easydataset,
title={Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents},
author={Ziyang Miao and Qiyu Sun and Jingyuan Wang and Yuchen Gong and Yaowei Zheng and Shiqi Li and Richong Zhang},
year={2025},
eprint={2507.04009},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2507.04009}
}
```
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=ConardLi/easy-dataset&type=Date)](https://www.star-history.com/#ConardLi/easy-dataset&Date)
<div align="center">
<sub>Built with ❤️ by <a href="https://github.com/ConardLi">ConardLi</a> • Follow me: <a href="./public/imgs/weichat.jpg">WeChat Official Account</a> | <a href="https://space.bilibili.com/474921808">Bilibili</a> | <a href="https://juejin.cn/user/3949101466785709">Juejin</a> | <a href="https://www.zhihu.com/people/wen-ti-chao-ji-duo-de-xiao-qi">Zhihu</a> | <a href="https://www.youtube.com/@garden-conard">Youtube</a></sub>
</div>

View File

@@ -0,0 +1,86 @@
import { NextResponse } from 'next/server';
import path from 'path';
import fs from 'fs';
// Get current version
function getCurrentVersion() {
try {
const packageJsonPath = path.join(process.cwd(), 'package.json');
const packageJson = JSON.parse(fs.readFileSync(packageJsonPath, 'utf8'));
return packageJson.version;
} catch (error) {
console.error('Failed to read version from package.json:', String(error));
return '1.0.0';
}
}
// Get latest version from GitHub
async function getLatestVersion() {
try {
const owner = 'ConardLi';
const repo = 'easy-dataset';
const response = await fetch(`https://api.github.com/repos/${owner}/${repo}/releases/latest`);
if (!response.ok) {
throw new Error(`GitHub API request failed: ${response.status}`);
}
const data = await response.json();
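// Strip only a leading "v" (e.g. "v1.7.0" -> "1.7.0"); a plain string pattern would remove the first "v" anywhere in the tag
return data.tag_name.replace(/^v/, '');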
} catch (error) {
console.error('Failed to fetch latest version:', String(error));
return null;
}
}
// Check for updates
export async function GET() {
try {
const currentVersion = getCurrentVersion();
const latestVersion = await getLatestVersion();
if (!latestVersion) {
return NextResponse.json({
hasUpdate: false,
currentVersion,
latestVersion: null,
error: 'Failed to fetch latest version'
});
}
// Simple semver-like comparison
const hasUpdate = compareVersions(latestVersion, currentVersion) > 0;
return NextResponse.json({
hasUpdate,
currentVersion,
latestVersion,
releaseUrl: hasUpdate ? `https://github.com/ConardLi/easy-dataset/releases/tag/v${latestVersion}` : null
});
} catch (error) {
console.error('Failed to check for updates:', String(error));
return NextResponse.json(
{
hasUpdate: false,
error: 'Failed to check for updates'
},
{ status: 500 }
);
}
}
// Simple version comparison
function compareVersions(a, b) {
const partsA = a.split('.').map(Number);
const partsB = b.split('.').map(Number);
for (let i = 0; i < Math.max(partsA.length, partsB.length); i++) {
const numA = i < partsA.length ? partsA[i] : 0;
const numB = i < partsB.length ? partsB[i] : 0;
if (numA > numB) return 1;
if (numA < numB) return -1;
}
return 0;
}

View File

@@ -0,0 +1,75 @@
import { NextResponse } from 'next/server';
import axios from 'axios';
// Fetch model list from provider
export async function POST(request) {
try {
const { endpoint, providerId, apiKey } = await request.json();
if (!endpoint) {
return NextResponse.json({ error: 'Missing required parameter: endpoint' }, { status: 400 });
}
let url = endpoint.replace(/\/$/, ''); // Remove trailing slash
// Handle Ollama endpoint
if (providerId === 'ollama') {
// Remove possible /v1 or other version suffix
url = url.replace(/\/v\d+$/, '');
// Append /api if missing
// Use endsWith: includes('/api') also matches hosts such as http://api.example.com
if (!url.endsWith('/api')) {
url += '/api';
}
url += '/tags';
} else {
url += '/models';
}
const headers = {};
if (apiKey) {
headers.Authorization = `Bearer ${apiKey}`;
}
const response = await axios.get(url, { headers });
// Format response per provider
let formattedModels = [];
if (providerId === 'ollama') {
// Ollama /api/tags format: { models: [{ name: 'model-name', ... }] }
if (response.data.models && Array.isArray(response.data.models)) {
formattedModels = response.data.models.map(item => ({
modelId: item.name,
modelName: item.name,
providerId
}));
}
} else {
// Default handling (OpenAI-compatible)
if (response.data.data && Array.isArray(response.data.data)) {
formattedModels = response.data.data.map(item => ({
modelId: item.id,
modelName: item.id,
providerId
}));
}
}
return NextResponse.json(formattedModels);
} catch (error) {
console.error('Failed to fetch model list:', String(error));
// Handle known error shapes
if (error.response) {
if (error.response.status === 401) {
return NextResponse.json({ error: 'Invalid API key' }, { status: 401 });
}
return NextResponse.json(
{ error: `Failed to fetch model list: ${error.response.statusText}` },
{ status: error.response.status }
);
}
return NextResponse.json({ error: `Failed to fetch model list: ${error.message}` }, { status: 500 });
}
}

View File

@@ -0,0 +1,39 @@
import { NextResponse } from 'next/server';
import { getLlmModelsByProviderId } from '@/lib/db/llm-models';
// Get LLM models
export async function GET(request) {
try {
const searchParams = request.nextUrl.searchParams;
let providerId = searchParams.get('providerId');
if (!providerId) {
return NextResponse.json({ error: 'Invalid parameters' }, { status: 400 });
}
const models = await getLlmModelsByProviderId(providerId);
if (!models) {
return NextResponse.json({ error: 'LLM provider not found' }, { status: 404 });
}
return NextResponse.json(models);
} catch (error) {
console.error('Database query error:', String(error));
return NextResponse.json({ error: 'Database query failed' }, { status: 500 });
}
}
// Sync latest model list
export async function POST(request) {
try {
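const { newModels, providerId } = await request.json();
if (!providerId || !Array.isArray(newModels)) {
return NextResponse.json({ error: 'Invalid parameters' }, { status: 400 });
}
// Treat a missing provider as having no stored models instead of throwing on null
const models = (await getLlmModelsByProviderId(providerId)) || [];
const existingModelIds = new Set(models.map(model => model.modelId));
const diffModels = newModels.filter(item => !existingModelIds.has(item.modelId));
if (diffModels.length > 0) {
// Persisting new models is currently disabled (createLlmModels is not imported in this file)
// return NextResponse.json(await createLlmModels(diffModels));
return NextResponse.json({ message: `${diffModels.length} new models found; insertion is currently disabled` }, { status: 200 });
} else {
return NextResponse.json({ message: 'No new models to insert' }, { status: 200 });
}
} catch (error) {
console.error('Failed to sync model list:', String(error));
return NextResponse.json({ error: 'Database insert failed' }, { status: 500 });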
}

View File

@@ -0,0 +1,26 @@
import { NextResponse } from 'next/server';
const OllamaClient = require('@/lib/llm/core/providers/ollama');
// Force dynamic route to prevent static generation
export const dynamic = 'force-dynamic';
export async function GET(request) {
try {
// Read host and port from query params
const { searchParams } = new URL(request.url);
const host = searchParams.get('host') || '127.0.0.1';
const port = searchParams.get('port') || '11434';
// Create Ollama API client
const ollama = new OllamaClient({
endpoint: `http://${host}:${port}/api`
});
// Fetch model list
const models = await ollama.getModels();
return NextResponse.json(models);
} catch (error) {
console.error('Failed to fetch Ollama models:', String(error));
return NextResponse.json({ error: 'fetch Models failed' }, { status: 500 });
}
}

View File

@@ -0,0 +1,14 @@
import { NextResponse } from 'next/server';
import { getLlmProviders } from '@/lib/db/llm-providers';
import { sortProvidersByPriority } from '@/lib/util/providerLogo';
// Get LLM provider data
export async function GET() {
try {
const result = await getLlmProviders();
return NextResponse.json(sortProvidersByPriority(result, item => item.id));
} catch (error) {
console.error('Database query error:', String(error));
return NextResponse.json({ error: 'Database query failed' }, { status: 500 });
}
}

View File

@@ -0,0 +1,107 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db';
export const dynamic = 'force-dynamic';
export async function GET(request) {
try {
const { searchParams } = new URL(request.url);
const timeRange = searchParams.get('timeRange') || '7d';
const projectId = searchParams.get('projectId');
const provider = searchParams.get('provider');
const status = searchParams.get('status');
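// Fall back to defaults when the query values are non-numeric or less than 1
const page = Math.max(1, parseInt(searchParams.get('page') || '1', 10) || 1);
const pageSize = Math.max(1, parseInt(searchParams.get('pageSize') || '10', 10) || 10);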
const searchTerm = searchParams.get('search') || '';
let startDate = new Date();
if (timeRange === '24h') {
startDate.setHours(startDate.getHours() - 24);
} else if (timeRange === '30d') {
startDate.setDate(startDate.getDate() - 30);
} else {
startDate.setDate(startDate.getDate() - 7);
}
const where = {
createAt: {
gte: startDate
}
};
if (projectId && projectId !== 'all') {
where.projectId = projectId;
}
if (provider && provider !== 'all') {
where.provider = provider;
}
if (status && status !== 'all') {
where.status = status;
}
if (searchTerm) {
where.OR = [{ model: { contains: searchTerm } }, { errorMessage: { contains: searchTerm } }];
}
const total = await db.llmUsageLogs.count({ where });
const logs = await db.llmUsageLogs.findMany({
where,
select: {
id: true,
projectId: true,
provider: true,
model: true,
inputTokens: true,
outputTokens: true,
totalTokens: true,
latency: true,
status: true,
errorMessage: true,
createAt: true
},
orderBy: {
createAt: 'desc'
},
skip: (page - 1) * pageSize,
take: pageSize
});
const projectIds = [...new Set(logs.map(log => log.projectId))];
const projects = await db.projects.findMany({
where: { id: { in: projectIds } },
select: { id: true, name: true }
});
const projectMap = projects.reduce((acc, p) => {
acc[p.id] = p.name;
return acc;
}, {});
const details = logs.map(log => ({
id: log.id,
projectId: log.projectId,
projectName: projectMap[log.projectId] || 'Unknown Project',
provider: log.provider,
model: log.model,
status: log.status,
failureReason: log.errorMessage,
inputTokens: log.inputTokens,
outputTokens: log.outputTokens,
totalTokens: log.totalTokens,
calls: 1, // Single record
avgLatency: log.status === 'SUCCESS' ? (log.latency / 1000).toFixed(2) + 's' : '-',
createAt: log.createAt
}));
return NextResponse.json({
details,
total,
page,
pageSize,
totalPages: Math.ceil(total / pageSize)
});
} catch (error) {
console.error('Failed to fetch monitoring logs:', error);
return NextResponse.json({ error: error.message }, { status: 500 });
}
}

View File

@@ -0,0 +1,188 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db';
export const dynamic = 'force-dynamic';
export async function GET(request) {
try {
const { searchParams } = new URL(request.url);
const timeRange = searchParams.get('timeRange') || '7d'; // 24h, 7d, 30d
const projectId = searchParams.get('projectId');
const provider = searchParams.get('provider');
const status = searchParams.get('status');
let startDate = new Date();
if (timeRange === '24h') {
startDate.setHours(startDate.getHours() - 24);
} else if (timeRange === '30d') {
startDate.setDate(startDate.getDate() - 30);
} else {
startDate.setDate(startDate.getDate() - 7);
}
const where = {
createAt: {
gte: startDate
}
};
if (projectId && projectId !== 'all') {
where.projectId = projectId;
}
if (provider && provider !== 'all') {
where.provider = provider;
}
if (status && status !== 'all') {
where.status = status;
}
// 1. Fetch data for aggregation
// Note: Prisma aggregation can be slow on very large datasets. If needed, optimize with pre-aggregated tables.
const logs = await db.llmUsageLogs.findMany({
where,
select: {
id: true,
projectId: true,
provider: true,
model: true,
inputTokens: true,
outputTokens: true,
totalTokens: true,
latency: true,
status: true,
errorMessage: true,
createAt: true,
dateString: true
},
orderBy: {
createAt: 'desc'
}
});
// Build project name map
const projects = await db.projects.findMany({
select: { id: true, name: true }
});
const projectMap = projects.reduce((acc, p) => {
acc[p.id] = p.name;
return acc;
}, {});
// 2. Process and aggregate
const summary = {
totalTokens: 0,
inputTokens: 0,
outputTokens: 0,
totalCalls: logs.length,
successCalls: 0,
failedCalls: 0,
totalLatency: 0,
avgLatency: 0
};
const trendMap = {};
const modelStats = {};
const detailedStatsMap = {}; // Key: projectId-model-status-errorMessage
logs.forEach(log => {
// Summary
summary.totalTokens += log.totalTokens;
summary.inputTokens += log.inputTokens;
summary.outputTokens += log.outputTokens;
if (log.status === 'SUCCESS') {
summary.successCalls++;
summary.totalLatency += log.latency;
} else {
summary.failedCalls++;
}
// Trend (by day or hour)
let timeKey;
if (timeRange === '24h') {
const date = new Date(log.createAt);
timeKey = `${String(date.getHours()).padStart(2, '0')}:00`;
} else {
timeKey = log.dateString.slice(5); // MM-DD
}
if (!trendMap[timeKey]) {
trendMap[timeKey] = { name: timeKey, input: 0, output: 0 };
}
trendMap[timeKey].input += log.inputTokens;
trendMap[timeKey].output += log.outputTokens;
// Model Distribution
const modelKey = log.model;
if (!modelStats[modelKey]) {
modelStats[modelKey] = { name: modelKey, value: 0 };
}
modelStats[modelKey].value += log.totalTokens;
// Detailed Table Aggregation
// Key: projectId + model + status + (errorMessage || '')
const errorKey = log.errorMessage || '';
const detailKey = `${log.projectId}|${log.model}|${log.status}|${errorKey}`;
if (!detailedStatsMap[detailKey]) {
detailedStatsMap[detailKey] = {
projectId: log.projectId,
projectName: projectMap[log.projectId] || 'Unknown Project',
provider: log.provider,
model: log.model,
status: log.status,
failureReason: log.errorMessage,
inputTokens: 0,
outputTokens: 0,
totalTokens: 0,
calls: 0,
totalLatency: 0
};
}
const detailItem = detailedStatsMap[detailKey];
detailItem.inputTokens += log.inputTokens;
detailItem.outputTokens += log.outputTokens;
detailItem.totalTokens += log.totalTokens;
detailItem.calls += 1;
if (log.status === 'SUCCESS') {
detailItem.totalLatency += log.latency;
}
});
// Calculate averages
if (summary.successCalls > 0) {
summary.avgLatency = Math.round(summary.totalLatency / summary.successCalls);
}
summary.avgTokensPerCall = summary.totalCalls > 0 ? Math.round(summary.totalTokens / summary.totalCalls) : 0;
summary.failureRate = summary.totalCalls > 0 ? summary.failedCalls / summary.totalCalls : 0;
// Format chart data
const trend = Object.values(trendMap).sort((a, b) => {
// Simple sorting; for production use, consider stricter time ordering.
return a.name.localeCompare(b.name);
});
const modelDistribution = Object.values(modelStats).sort((a, b) => b.value - a.value);
// Format detailed table data
const details = Object.values(detailedStatsMap)
.map(item => ({
...item,
avgLatency:
item.status === 'SUCCESS' && item.calls > 0 ? (item.totalLatency / item.calls / 1000).toFixed(2) + 's' : '-'
}))
.sort((a, b) => b.totalTokens - a.totalTokens); // Default sorting by token usage
return NextResponse.json({
summary,
trend,
modelDistribution,
details,
projects
});
} catch (error) {
console.error('Failed to fetch monitoring stats:', error);
return NextResponse.json({ error: error.message }, { status: 500 });
}
}

View File

@@ -0,0 +1,132 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db';
export const dynamic = 'force-dynamic';
export async function GET(request) {
try {
const { searchParams } = new URL(request.url);
const timeRange = searchParams.get('timeRange') || '7d';
const projectId = searchParams.get('projectId');
const provider = searchParams.get('provider');
const status = searchParams.get('status');
let startDate = new Date();
if (timeRange === '24h') {
startDate.setHours(startDate.getHours() - 24);
} else if (timeRange === '30d') {
startDate.setDate(startDate.getDate() - 30);
} else {
startDate.setDate(startDate.getDate() - 7);
}
const where = {
createAt: {
gte: startDate
}
};
if (projectId && projectId !== 'all') {
where.projectId = projectId;
}
if (provider && provider !== 'all') {
where.provider = provider;
}
if (status && status !== 'all') {
where.status = status;
}
const logs = await db.llmUsageLogs.findMany({
where,
select: {
inputTokens: true,
outputTokens: true,
totalTokens: true,
latency: true,
status: true,
createAt: true,
dateString: true,
model: true
}
});
const summary = {
totalTokens: 0,
inputTokens: 0,
outputTokens: 0,
totalCalls: logs.length,
successCalls: 0,
failedCalls: 0,
totalLatency: 0,
avgLatency: 0
};
const trendMap = {};
const modelStats = {};
logs.forEach(log => {
summary.totalTokens += log.totalTokens;
summary.inputTokens += log.inputTokens;
summary.outputTokens += log.outputTokens;
if (log.status === 'SUCCESS') {
summary.successCalls++;
summary.totalLatency += log.latency;
} else {
summary.failedCalls++;
}
let timeKey;
if (timeRange === '24h') {
const date = new Date(log.createAt);
timeKey = `${String(date.getHours()).padStart(2, '0')}:00`;
} else {
timeKey = log.dateString.slice(5);
}
if (!trendMap[timeKey]) {
trendMap[timeKey] = { name: timeKey, input: 0, output: 0 };
}
trendMap[timeKey].input += log.inputTokens;
trendMap[timeKey].output += log.outputTokens;
const modelKey = log.model;
if (!modelStats[modelKey]) {
modelStats[modelKey] = { name: modelKey, value: 0 };
}
modelStats[modelKey].value += log.totalTokens;
});
if (summary.successCalls > 0) {
summary.avgLatency = Math.round(summary.totalLatency / summary.successCalls);
}
summary.avgTokensPerCall = summary.totalCalls > 0 ? Math.round(summary.totalTokens / summary.totalCalls) : 0;
summary.failureRate = summary.totalCalls > 0 ? summary.failedCalls / summary.totalCalls : 0;
const trend = Object.values(trendMap).sort((a, b) => a.name.localeCompare(b.name));
const modelDistribution = Object.values(modelStats).sort((a, b) => b.value - a.value);
const projects = await db.projects.findMany({
select: { id: true, name: true },
orderBy: { createAt: 'desc' }
});
const allLogs = await db.llmUsageLogs.findMany({
select: { provider: true },
distinct: ['provider']
});
const providers = allLogs.map(log => log.provider).filter(Boolean);
return NextResponse.json({
summary,
trend,
modelDistribution,
projects,
providers
});
} catch (error) {
console.error('Failed to fetch monitoring summary:', error);
return NextResponse.json({ error: error.message }, { status: 500 });
}
}

View File

@@ -0,0 +1,176 @@
import { NextResponse } from 'next/server';
import { getUploadFileInfoById } from '@/lib/db/upload-files';
import { createGaPairs, getGaPairsByFileId } from '@/lib/db/ga-pairs';
/**
* Manually add a GA pair to multiple files in one batch
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
const body = await request.json();
if (!projectId) {
return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
}
const { fileIds, gaPair, appendMode = false } = body;
if (!fileIds || !Array.isArray(fileIds) || fileIds.length === 0) {
return NextResponse.json({ error: 'File IDs array is required' }, { status: 400 });
}
if (!gaPair || !gaPair.genreTitle || !gaPair.audienceTitle) {
return NextResponse.json({ error: 'GA pair with genreTitle and audienceTitle is required' }, { status: 400 });
}
console.log('Handling batch manual GA pair request');
console.log('Project ID:', projectId);
console.log('Requested file IDs:', fileIds);
console.log('GA pair:', gaPair);
// Validate each file individually with getUploadFileInfoById
const validFiles = [];
const invalidFileIds = [];
for (const fileId of fileIds) {
try {
console.log(`Validating file: ${fileId}`);
const fileInfo = await getUploadFileInfoById(fileId);
if (fileInfo && fileInfo.projectId === projectId) {
console.log(`File validated: ${fileInfo.fileName}`);
validFiles.push(fileInfo);
} else if (fileInfo) {
console.log(`File belongs to another project: ${fileInfo.projectId} != ${projectId}`);
invalidFileIds.push(fileId);
} else {
console.log(`File not found: ${fileId}`);
invalidFileIds.push(fileId);
}
} catch (error) {
console.error(`Error validating file ${fileId}:`, String(error));
invalidFileIds.push(fileId);
}
}
console.log(`File validation complete: ${validFiles.length} valid, ${invalidFileIds.length} invalid`);
if (validFiles.length === 0) {
return NextResponse.json(
{
error: 'No valid files found',
debug: {
projectId,
requestedIds: fileIds,
invalidIds: invalidFileIds,
message: 'None of the requested files belong to this project or exist in the database'
}
},
{ status: 404 }
);
}
// Add the GA pair to each valid file
console.log('Starting batch manual GA pair creation...');
console.log('Append mode:', appendMode);
const results = [];
for (const file of validFiles) {
try {
console.log(`Processing file: ${file.fileName}`);
// Check whether GA pairs already exist
const existingPairs = await getGaPairsByFileId(file.id);
let pairNumber = 1;
if (appendMode && existingPairs && existingPairs.length > 0) {
// Append mode: add after the existing GA pairs
pairNumber = existingPairs.length + 1;
} else if (!appendMode && existingPairs && existingPairs.length > 0) {
// Non-append mode: skip files that already have GA pairs
console.log(`File ${file.fileName} already has GA pairs, skipping`);
results.push({
fileId: file.id,
fileName: file.fileName,
success: true,
skipped: true,
message: 'GA pairs already exist'
});
continue;
}
// Build the GA pair record
const gaPairData = [
{
projectId,
fileId: file.id,
pairNumber,
genreTitle: gaPair.genreTitle.trim(),
genreDesc: gaPair.genreDesc?.trim() || '',
audienceTitle: gaPair.audienceTitle.trim(),
audienceDesc: gaPair.audienceDesc?.trim() || '',
isActive: true
}
];
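// Save the GA pair
if (appendMode) {
// Append mode: only create the new GA pair
await createGaPairs(gaPairData);
} else {
// Non-append mode: write via saveGaPairs (files that already have GA pairs were skipped above)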
const { saveGaPairs } = await import('@/lib/db/ga-pairs');
await saveGaPairs(projectId, file.id, [
{
genre: { title: gaPair.genreTitle.trim(), description: gaPair.genreDesc?.trim() || '' },
audience: { title: gaPair.audienceTitle.trim(), description: gaPair.audienceDesc?.trim() || '' }
}
]);
}
results.push({
fileId: file.id,
fileName: file.fileName,
success: true,
skipped: false,
message: 'GA pair added successfully'
});
console.log(`Added GA pair to file ${file.fileName}`);
} catch (error) {
console.error(`Failed to add GA pair to file ${file.fileName}:`, error);
results.push({
fileId: file.id,
fileName: file.fileName,
success: false,
skipped: false,
error: error.message,
message: `Failed: ${error.message}`
});
}
}
// Summarize results
const successCount = results.filter(r => r.success).length;
const failureCount = results.filter(r => !r.success).length;
console.log(`Batch manual add finished: ${successCount} succeeded, ${failureCount} failed`);
return NextResponse.json({
success: true,
data: results,
summary: {
total: results.length,
success: successCount,
failure: failureCount,
processed: validFiles.length,
skipped: invalidFileIds.length
},
message: `Added GA pairs to ${successCount} files, ${failureCount} failed, ${invalidFileIds.length} files not found`
});
} catch (error) {
console.error('Error batch adding manual GA pairs:', String(error));
return NextResponse.json({ error: String(error) || 'Failed to batch add manual GA pairs' }, { status: 500 });
}
}
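
The append/skip branching inside the loop of the route above can be isolated as a small pure function, which makes the rule easier to read and to unit-test. This is only a minimal sketch and not part of the commit: `planGaPairWrite` is a hypothetical helper name, and it mirrors nothing beyond the branching visible in the route (append mode numbers the new pair after the existing ones, non-append mode skips files that already have GA pairs).

```javascript
// Hypothetical helper (not in the repository): mirrors the route's branching.
// existingCount: number of GA pairs already stored for the file
// appendMode:    the `appendMode` flag from the request body
function planGaPairWrite(existingCount, appendMode) {
  // Non-append mode never touches files that already have GA pairs
  if (!appendMode && existingCount > 0) {
    return { skip: true, pairNumber: null };
  }
  // Append mode continues the numbering; otherwise the first pair is number 1
  return { skip: false, pairNumber: appendMode ? existingCount + 1 : 1 };
}

const appendPlan = planGaPairWrite(2, true);
const skipPlan = planGaPairWrite(2, false);
const freshPlan = planGaPairWrite(0, false);
```

Keeping the decision separate from the database calls would let the route body shrink to "plan, then write", but that refactor is a suggestion, not something the commit does.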

View File

@@ -0,0 +1,196 @@
import { NextResponse } from 'next/server';
import { getUploadFileInfoById, delUploadFileInfoById } from '@/lib/db/upload-files';
import { getProject } from '@/lib/db/projects';
import { getProjectChunks, getProjectTocByName } from '@/lib/file/text-splitter';
import { batchSaveTags } from '@/lib/db/tags';
import { handleDomainTree } from '@/lib/util/domain-tree';
import path from 'path';
import { getProjectRoot } from '@/lib/db/base';
import { promises as fs } from 'fs';
/**
* Batch delete files
* Reuses the full single-file deletion logic, including domain tree revision
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
const body = await request.json();
if (!projectId) {
return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
}
const { fileIds, domainTreeAction = 'keep', model, language = '中文' } = body;
if (!fileIds || !Array.isArray(fileIds) || fileIds.length === 0) {
return NextResponse.json({ error: 'File IDs array is required' }, { status: 400 });
}
console.log('Start handling batch file deletion request');
console.log('Project ID:', projectId);
console.log('Requested file IDs:', fileIds);
console.log('Domain tree action:', domainTreeAction);
// Fetch project info
const project = await getProject(projectId);
if (!project) {
return NextResponse.json({ error: 'The project does not exist' }, { status: 404 });
}
// Validate and delete each file
const results = [];
const deletedTocs = [];
let deletedCount = 0;
let failedCount = 0;
let totalStats = {
deletedChunks: 0,
deletedQuestions: 0,
deletedDatasets: 0
};
for (const fileId of fileIds) {
try {
console.log(`Validating file: ${fileId}`);
const fileInfo = await getUploadFileInfoById(fileId);
if (!fileInfo) {
console.log(`File not found: ${fileId}`);
results.push({
fileId,
success: false,
error: 'File not found'
});
failedCount++;
continue;
}
if (fileInfo.projectId !== projectId) {
console.log(`File belongs to another project: ${fileInfo.projectId} != ${projectId}`);
results.push({
fileId,
success: false,
error: 'File belongs to another project'
});
failedCount++;
continue;
}
// Delete the file and its related chunks, questions, and datasets
console.log(`Deleting file: ${fileInfo.fileName}`);
const { stats, fileName } = await delUploadFileInfoById(fileId);
// Accumulate statistics
totalStats.deletedChunks += stats.deletedChunks || 0;
totalStats.deletedQuestions += stats.deletedQuestions || 0;
totalStats.deletedDatasets += stats.deletedDatasets || 0;
// Fetch and keep the TOC of the deleted file
const deleteToc = await getProjectTocByName(projectId, fileName);
if (deleteToc) {
deletedTocs.push(deleteToc);
}
// Delete the TOC file
try {
const projectRoot = await getProjectRoot();
const projectPath = path.join(projectRoot, projectId);
const tocDir = path.join(projectPath, 'toc');
const baseName = path.basename(fileInfo.fileName, path.extname(fileInfo.fileName));
const tocPath = path.join(tocDir, `${baseName}-toc.json`);
await fs.unlink(tocPath);
console.log(`Deleted TOC file: ${tocPath}`);
} catch (error) {
console.error(`Failed to delete TOC file:`, String(error));
}
results.push({
fileId,
fileName: fileInfo.fileName,
success: true,
stats
});
deletedCount++;
console.log(`Deleted file: ${fileInfo.fileName}`);
} catch (error) {
console.error(`Error deleting file ${fileId}:`, error);
results.push({
fileId,
success: false,
error: error.message
});
failedCount++;
}
}
console.log(`Batch deletion finished: ${deletedCount} succeeded, ${failedCount} failed`);
// If the domain tree should be kept unchanged, return the deletion result directly
if (domainTreeAction === 'keep') {
return NextResponse.json({
success: true,
deletedCount,
failedCount,
total: fileIds.length,
results,
stats: totalStats,
domainTreeAction: 'keep',
message: `Successfully deleted ${deletedCount} files, ${failedCount} failed`
});
}
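// Update the domain tree
try {
// Fetch all remaining chunks and the TOC for the project
const { chunks, toc } = await getProjectChunks(projectId);
// No chunks left means the project no longer has any files
if (!chunks || chunks.length === 0) {
// Clear the domain tree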
await batchSaveTags(projectId, []);
return NextResponse.json({
success: true,
deletedCount,
failedCount,
total: fileIds.length,
results,
stats: totalStats,
domainTreeAction,
message: `Successfully deleted ${deletedCount} files, domain tree cleared`,
domainTreeCleared: true
});
}
// Call the domain tree handler
await handleDomainTree({
projectId,
action: domainTreeAction,
allToc: toc,
model: model,
language,
deleteToc: deletedTocs.length > 0 ? deletedTocs : undefined,
project
});
console.log('Domain tree updated');
} catch (error) {
console.error('Error updating domain tree after batch deletion:', String(error));
// A domain tree update failure does not affect the file deletion result
}
return NextResponse.json({
success: true,
deletedCount,
failedCount,
total: fileIds.length,
results,
stats: totalStats,
domainTreeAction,
message: `Successfully deleted ${deletedCount} files, ${failedCount} failed`
});
} catch (error) {
console.error('Error batch deleting files:', String(error));
return NextResponse.json({ error: String(error) || 'Failed to batch delete files' }, { status: 500 });
}
}

View File

@@ -0,0 +1,106 @@
import { NextResponse } from 'next/server';
import { batchGenerateGaPairs } from '@/lib/services/ga/ga-pairs';
import { getUploadFileInfoById } from '@/lib/db/upload-files'; // Single-file lookup helper
/**
* Batch generate GA pairs for multiple files
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
const body = await request.json();
if (!projectId) {
return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
}
const { fileIds, modelConfigId, language = '中文', appendMode = false } = body;
if (!fileIds || !Array.isArray(fileIds) || fileIds.length === 0) {
return NextResponse.json({ error: 'File IDs array is required' }, { status: 400 });
}
if (!modelConfigId) {
return NextResponse.json({ error: 'Model configuration ID is required' }, { status: 400 });
}
console.log('Start handling batch GA pair generation request');
console.log('Project ID:', projectId);
console.log('Requested file IDs:', fileIds);
// Validate files one by one with getUploadFileInfoById
const validFiles = [];
const invalidFileIds = [];
for (const fileId of fileIds) {
try {
console.log(`Validating file: ${fileId}`);
const fileInfo = await getUploadFileInfoById(fileId);
if (fileInfo && fileInfo.projectId === projectId) {
console.log(`File validated: ${fileInfo.fileName}`);
validFiles.push(fileInfo);
} else if (fileInfo) {
console.log(`File belongs to another project: ${fileInfo.projectId} != ${projectId}`);
invalidFileIds.push(fileId);
} else {
console.log(`File not found: ${fileId}`);
invalidFileIds.push(fileId);
}
} catch (error) {
console.error(`Error validating file ${fileId}:`, String(error));
invalidFileIds.push(fileId);
}
}
console.log(`File validation finished: ${validFiles.length} valid, ${invalidFileIds.length} invalid`);
if (validFiles.length === 0) {
return NextResponse.json(
{
error: 'No valid files found',
debug: {
projectId,
requestedIds: fileIds,
invalidIds: invalidFileIds,
message: 'None of the requested files belong to this project or exist in the database'
}
},
{ status: 404 }
);
}
// Batch generate GA pairs
console.log('Starting batch GA pair generation...');
console.log('Append mode:', appendMode);
const results = await batchGenerateGaPairs(
projectId,
validFiles,
modelConfigId,
language,
appendMode // Pass through the append-mode flag
);
// Summarize results
const successCount = results.filter(r => r.success).length;
const failureCount = results.filter(r => !r.success).length;
console.log(`Batch generation finished: ${successCount} succeeded, ${failureCount} failed`);
return NextResponse.json({
success: true,
data: results,
summary: {
total: results.length,
success: successCount,
failure: failureCount,
processed: validFiles.length,
skipped: invalidFileIds.length
},
message: `Generated GA pairs for ${successCount} files, ${failureCount} failed, ${invalidFileIds.length} files not found`
});
} catch (error) {
console.error('Error batch generating GA pairs:', String(error));
return NextResponse.json({ error: String(error) || 'Failed to batch generate GA pairs' }, { status: 500 });
}
}

View File

@@ -0,0 +1,161 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db/index';
import LLMClient from '@/lib/llm/core/index';
import { getModelConfigById } from '@/lib/db/model-config';
/**
* Get current question and generate answers from two models
*/
export async function GET(request, { params }) {
try {
const { projectId, taskId } = params;
const task = await db.task.findFirst({
where: {
id: taskId,
projectId,
taskType: 'blind-test'
}
});
if (!task) {
return NextResponse.json({ code: 404, error: 'Task not found' }, { status: 404 });
}
if (task.status !== 0) {
return NextResponse.json({ code: 400, error: 'Task has ended' }, { status: 400 });
}
// Parse task detail
let detail = {};
let modelInfo = {};
try {
detail = task.detail ? JSON.parse(task.detail) : {};
modelInfo = task.modelInfo ? JSON.parse(task.modelInfo) : {};
} catch (e) {
console.error('Failed to parse task detail:', e);
}
const questionIds = detail.questionIds || detail.evalDatasetIds || [];
const currentIndex = detail.currentIndex || 0;
// Check if all questions are completed
if (questionIds.length === 0 || currentIndex >= questionIds.length) {
return NextResponse.json({
code: 0,
data: {
completed: true,
message: 'All questions completed'
}
});
}
// Fetch current question
const currentQuestionId = questionIds[currentIndex];
const currentQuestion = await db.evalDatasets.findUnique({
where: { id: currentQuestionId },
select: {
id: true,
question: true,
questionType: true,
correctAnswer: true,
tags: true
}
});
if (!currentQuestion) {
return NextResponse.json({ code: 404, error: 'Question not found' }, { status: 404 });
}
// Fetch both model configs
const [modelConfigA, modelConfigB] = await Promise.all([
getModelConfigById(modelInfo.modelA.providerId),
getModelConfigById(modelInfo.modelB.providerId)
]);
if (!modelConfigA || !modelConfigB) {
return NextResponse.json({ code: 400, error: 'Model configuration not found' }, { status: 400 });
}
// Build prompts
const systemPrompt = "You are a helpful assistant. Provide detailed and accurate answers to the user's question.";
const userPrompt = currentQuestion.question;
// Call the models one after the other, timing each call separately
let answerA = '';
let answerB = '';
let errorA = null;
let errorB = null;
let durationA = 0;
let durationB = 0;
const startTimeA = Date.now();
try {
// Call model A
const clientA = new LLMClient(modelConfigA);
const resultA = await clientA.chat([
{ role: 'system', content: systemPrompt },
{ role: 'user', content: userPrompt }
]);
answerA = resultA.text || '';
durationA = Date.now() - startTimeA;
} catch (err) {
console.error('Model A call failed:', err);
errorA = err.message;
durationA = Date.now() - startTimeA;
}
// Start model B's timer only after model A has finished, so durationB excludes model A's latency
const startTimeB = Date.now();
try {
// Call model B
const clientB = new LLMClient(modelConfigB);
const resultB = await clientB.chat([
{ role: 'system', content: systemPrompt },
{ role: 'user', content: userPrompt }
]);
answerB = resultB.text || '';
durationB = Date.now() - startTimeB;
} catch (err) {
console.error('Model B call failed:', err);
errorB = err.message;
durationB = Date.now() - startTimeB;
}
// Randomly swap positions (core blind-test behavior)
const isSwapped = Math.random() > 0.5;
return NextResponse.json({
code: 0,
data: {
completed: false,
currentIndex,
totalCount: questionIds.length,
question: currentQuestion,
// Blind test: do not reveal which model is which
leftAnswer: {
content: isSwapped ? answerB : answerA,
error: isSwapped ? errorB : errorA,
duration: isSwapped ? durationB : durationA
},
rightAnswer: {
content: isSwapped ? answerA : answerB,
error: isSwapped ? errorA : errorB,
duration: isSwapped ? durationA : durationB
},
// The swap flag is returned to the client, which sends it back with the vote for scoring
_swap: isSwapped
}
});
} catch (error) {
console.error('Failed to fetch current question:', error);
return NextResponse.json(
{ code: 500, error: 'Failed to fetch current question', message: error.message },
{ status: 500 }
);
}
}

View File

@@ -0,0 +1,64 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db/index';
/**
* Get current question info (including random swap info)
*/
export async function GET(request, { params }) {
const { projectId, taskId } = params;
try {
if (!projectId || !taskId) {
return NextResponse.json({ error: 'Missing required parameters' }, { status: 400 });
}
// Fetch task
const task = await db.task.findUnique({
where: { id: taskId }
});
if (!task || task.taskType !== 'blind-test') {
return NextResponse.json({ error: 'Task not found' }, { status: 404 });
}
// Parse task detail
const detail = JSON.parse(task.detail || '{}');
// Support both evalDatasetIds and questionIds
const questionIds = detail.questionIds || detail.evalDatasetIds || [];
const currentIndex = detail.currentIndex || 0;
// Check if task is completed
if (questionIds.length === 0 || currentIndex >= questionIds.length) {
return NextResponse.json({
completed: true,
currentIndex,
totalQuestions: questionIds.length
});
}
// Fetch current question
const currentQuestionId = questionIds[currentIndex];
const currentQuestion = await db.evalDatasets.findUnique({
where: { id: currentQuestionId }
});
if (!currentQuestion) {
return NextResponse.json({ error: 'Question not found' }, { status: 404 });
}
// Randomly decide whether to swap (core blind-test behavior)
const isSwapped = Math.random() > 0.5;
return NextResponse.json({
questionId: currentQuestion.id,
question: currentQuestion.question,
answer: currentQuestion.correctAnswer || '',
questionIndex: currentIndex + 1,
totalQuestions: questionIds.length,
isSwapped
});
} catch (error) {
console.error('Failed to fetch question info:', error);
return NextResponse.json({ error: error.message }, { status: 500 });
}
}

View File

@@ -0,0 +1,190 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db/index';
/**
* Get blind-test task details
* Results are fetched from EvalResults table
*/
export async function GET(request, { params }) {
try {
const { projectId, taskId } = params;
const task = await db.task.findFirst({
where: {
id: taskId,
projectId,
taskType: 'blind-test'
}
});
if (!task) {
return NextResponse.json({ code: 404, error: 'Task not found' }, { status: 404 });
}
let detail = {};
let modelInfo = {};
try {
detail = task.detail ? JSON.parse(task.detail) : {};
modelInfo = task.modelInfo ? JSON.parse(task.modelInfo) : {};
} catch (e) {
console.error('Failed to parse task detail:', e);
}
// Fetch all related evaluation questions
const evalDatasetIds = detail.evalDatasetIds || [];
const evalDatasets = await db.evalDatasets.findMany({
where: {
id: { in: evalDatasetIds }
},
select: {
id: true,
question: true,
questionType: true,
correctAnswer: true,
tags: true
}
});
// Sort by evalDatasetIds order
const orderedDatasets = evalDatasetIds.map(id => evalDatasets.find(d => d.id === id)).filter(Boolean);
// Fetch results from EvalResults table
const evalResults = await db.evalResults.findMany({
where: { taskId },
orderBy: { createAt: 'asc' }
});
// Parse results into the format expected by frontend
const results = evalResults.map(r => {
let modelAnswer = {};
let judgeData = {};
try {
modelAnswer = JSON.parse(r.modelAnswer || '{}');
judgeData = JSON.parse(r.judgeResponse || '{}');
} catch (e) {
// Ignore parse errors
}
return {
questionId: r.evalDatasetId,
vote: judgeData.vote,
isSwapped: judgeData.isSwapped,
modelAScore: judgeData.modelAScore || 0,
modelBScore: judgeData.modelBScore || 0,
leftAnswer: modelAnswer.leftAnswer || '',
rightAnswer: modelAnswer.rightAnswer || '',
timestamp: r.createAt
};
});
return NextResponse.json({
code: 0,
data: {
...task,
detail: {
...detail,
results // Include results from EvalResults table
},
modelInfo,
evalDatasets: orderedDatasets
}
});
} catch (error) {
console.error('Failed to fetch blind-test task details:', error);
return NextResponse.json(
{ code: 500, error: 'Failed to fetch blind-test task details', message: error.message },
{ status: 500 }
);
}
}
/**
* Update blind-test task (interrupt/stop)
*/
export async function PUT(request, { params }) {
try {
const { projectId, taskId } = params;
const { action } = await request.json();
const task = await db.task.findFirst({
where: {
id: taskId,
projectId,
taskType: 'blind-test'
}
});
if (!task) {
return NextResponse.json({ code: 404, error: 'Task not found' }, { status: 404 });
}
if (action === 'interrupt') {
if (task.status !== 0) {
return NextResponse.json({ code: 400, error: 'Only running tasks can be interrupted' }, { status: 400 });
}
const updatedTask = await db.task.update({
where: { id: taskId },
data: {
status: 3, // Interrupted
endTime: new Date()
}
});
return NextResponse.json({
code: 0,
data: updatedTask,
message: 'Task interrupted'
});
}
return NextResponse.json({ code: 400, error: 'Unknown action' }, { status: 400 });
} catch (error) {
console.error('Failed to update blind-test task:', error);
return NextResponse.json(
{ code: 500, error: 'Failed to update blind-test task', message: error.message },
{ status: 500 }
);
}
}
/**
* Delete blind-test task and its results
*/
export async function DELETE(request, { params }) {
try {
const { projectId, taskId } = params;
const task = await db.task.findFirst({
where: {
id: taskId,
projectId,
taskType: 'blind-test'
}
});
if (!task) {
return NextResponse.json({ code: 404, error: 'Task not found' }, { status: 404 });
}
// Delete related EvalResults first
await db.evalResults.deleteMany({
where: { taskId }
});
// Then delete the task
await db.task.delete({
where: { id: taskId }
});
return NextResponse.json({
code: 0,
message: 'Task deleted'
});
} catch (error) {
console.error('Failed to delete blind-test task:', error);
return NextResponse.json(
{ code: 500, error: 'Failed to delete blind-test task', message: error.message },
{ status: 500 }
);
}
}

View File

@@ -0,0 +1,92 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db/index';
import LLMClient from '@/lib/llm/core/index';
import { getModelConfigById } from '@/lib/db/model-config';
/**
* Stream answer for a specified model
* Query param: model=A or model=B
*/
export async function GET(request, { params }) {
const { projectId, taskId } = params;
const { searchParams } = new URL(request.url);
const modelType = searchParams.get('model'); // 'A' or 'B'
try {
if (!projectId || !taskId) {
return NextResponse.json({ error: 'Missing required parameters' }, { status: 400 });
}
if (!modelType || !['A', 'B'].includes(modelType)) {
return NextResponse.json({ error: 'Model type must be specified (A or B)' }, { status: 400 });
}
// Fetch task
const task = await db.task.findUnique({
where: { id: taskId }
});
if (!task || task.taskType !== 'blind-test') {
return NextResponse.json({ error: 'Task not found' }, { status: 404 });
}
// Parse task detail
const detail = JSON.parse(task.detail || '{}');
const modelInfo = JSON.parse(task.modelInfo || '{}');
// Support both evalDatasetIds and questionIds
const questionIds = detail.questionIds || detail.evalDatasetIds || [];
const currentIndex = detail.currentIndex || 0;
// Check if task is completed
if (questionIds.length === 0 || currentIndex >= questionIds.length) {
return NextResponse.json({ completed: true });
}
// Fetch current question
const currentQuestionId = questionIds[currentIndex];
const currentQuestion = await db.evalDatasets.findUnique({
where: { id: currentQuestionId }
});
if (!currentQuestion) {
return NextResponse.json({ error: 'Question not found' }, { status: 404 });
}
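// Resolve model config based on modelType (the other blind-test routes look it up by providerId)
const modelConfigKey = modelType === 'A' ? 'modelA' : 'modelB';
const modelRef = modelInfo[modelConfigKey] || {};
const modelConfig = await getModelConfigById(modelRef.providerId ?? modelRef.id);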
if (!modelConfig) {
return NextResponse.json({ error: 'Model configuration not found' }, { status: 400 });
}
// Prepare messages
const messages = [
{
role: 'system',
content: "You are a helpful assistant. Provide detailed and accurate answers to the user's question."
},
{ role: 'user', content: currentQuestion.question }
];
// Create LLM client
const client = new LLMClient({
projectId,
...modelConfig
});
// Call streaming API and return response directly
const response = await client.chatStreamAPI(messages);
return new Response(response.body, {
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Cache-Control': 'no-cache',
Connection: 'keep-alive'
}
});
} catch (error) {
console.error(`Model ${modelType} streaming call failed:`, error);
return NextResponse.json({ error: error.message }, { status: 500 });
}
}

View File

@@ -0,0 +1,213 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db/index';
import LLMClient from '@/lib/llm/core/index';
import { getModelConfigById } from '@/lib/db/model-config';
/**
* Stream answers from two models for the current question
*/
export async function GET(request, { params }) {
const { projectId, taskId } = params;
try {
if (!projectId || !taskId) {
return NextResponse.json({ error: 'Missing required parameters' }, { status: 400 });
}
// Fetch task
const task = await db.task.findUnique({
where: { id: taskId }
});
if (!task || task.taskType !== 'blind-test') {
return NextResponse.json({ error: 'Task not found' }, { status: 404 });
}
// Parse task detail
const detail = JSON.parse(task.detail || '{}');
const modelInfo = JSON.parse(task.modelInfo || '{}');
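// Support both evalDatasetIds and questionIds, as the other blind-test routes do
const questionIds = detail.questionIds || detail.evalDatasetIds || [];
const currentIndex = detail.currentIndex || 0;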
// Check if task is completed
if (currentIndex >= questionIds.length) {
return NextResponse.json({ completed: true });
}
// Fetch current question
const currentQuestionId = questionIds[currentIndex];
const currentQuestion = await db.evalDatasets.findUnique({
where: { id: currentQuestionId }
});
if (!currentQuestion) {
return NextResponse.json({ error: 'Question not found' }, { status: 404 });
}
// Fetch model configs
const [modelConfigA, modelConfigB] = await Promise.all([
getModelConfigById(modelInfo.modelA.providerId),
getModelConfigById(modelInfo.modelB.providerId)
]);
if (!modelConfigA || !modelConfigB) {
return NextResponse.json({ error: 'Model configuration not found' }, { status: 400 });
}
// Randomly swap positions (core blind-test behavior)
const isSwapped = Math.random() > 0.5;
// Create streaming response
const encoder = new TextEncoder();
const stream = new ReadableStream({
async start(controller) {
try {
// Send init message
controller.enqueue(
encoder.encode(
JSON.stringify({
type: 'init',
question: currentQuestion.question,
questionId: currentQuestion.id,
questionIndex: currentIndex + 1,
totalQuestions: questionIds.length,
isSwapped
}) + '\n'
)
);
// Prepare messages
const messages = [
{
role: 'system',
content: "You are a helpful assistant. Provide detailed and accurate answers to the user's question."
},
{ role: 'user', content: currentQuestion.question }
];
// Create LLM clients
const clientA = new LLMClient({
projectId,
...modelConfigA
});
const clientB = new LLMClient({
projectId,
...modelConfigB
});
let answerA = '';
let answerB = '';
const startTime = Date.now();
// Call both models in parallel (streaming)
await Promise.all([
(async () => {
try {
const response = await clientA.chatStreamAPI(messages);
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value, { stream: true });
answerA += chunk;
// Send chunk update
controller.enqueue(
encoder.encode(
JSON.stringify({
type: 'chunk',
model: isSwapped ? 'B' : 'A',
content: chunk
}) + '\n'
)
);
}
} catch (err) {
console.error('Model A call failed:', err);
controller.enqueue(
encoder.encode(
JSON.stringify({
type: 'error',
model: isSwapped ? 'B' : 'A',
error: err.message
}) + '\n'
)
);
}
})(),
(async () => {
try {
const response = await clientB.chatStreamAPI(messages);
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value, { stream: true });
answerB += chunk;
// Send chunk update
controller.enqueue(
encoder.encode(
JSON.stringify({
type: 'chunk',
model: isSwapped ? 'A' : 'B',
content: chunk
}) + '\n'
)
);
}
} catch (err) {
console.error('Model B call failed:', err);
controller.enqueue(
encoder.encode(
JSON.stringify({
type: 'error',
model: isSwapped ? 'A' : 'B',
error: err.message
}) + '\n'
)
);
}
})()
]);
const duration = Date.now() - startTime;
// Send done message
controller.enqueue(
encoder.encode(
JSON.stringify({
type: 'done',
duration,
answerA: isSwapped ? answerB : answerA,
answerB: isSwapped ? answerA : answerB
}) + '\n'
)
);
controller.close();
} catch (error) {
console.error('Streaming handler failed:', error);
controller.error(error);
}
}
});
return new Response(stream, {
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Cache-Control': 'no-cache',
Connection: 'keep-alive'
}
});
} catch (error) {
console.error('API error:', error);
return NextResponse.json({ error: error.message }, { status: 500 });
}
}
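
The route above writes one JSON object per line (`type` is `init`, `chunk`, `error`, or `done`), so a consumer has to reassemble lines that arrive split across network chunks. The following is a minimal sketch of such a consumer, under the assumption that the client reads the body as text fragments; `createNdjsonParser` is a hypothetical helper name and does not appear in this commit.

```javascript
// Hypothetical client-side helper: buffers incoming text and calls
// onMessage once for every complete, newline-terminated JSON line.
function createNdjsonParser(onMessage) {
  let buffer = '';
  return {
    push(text) {
      buffer += text;
      const lines = buffer.split('\n');
      // The last element is either '' or an incomplete line; keep it for later
      buffer = lines.pop();
      for (const line of lines) {
        if (line.trim() !== '') {
          onMessage(JSON.parse(line));
        }
      }
    },
    // Text received so far that has not been terminated by a newline
    pending() {
      return buffer;
    }
  };
}

// Usage: the second message is split across two fragments
const received = [];
const parser = createNdjsonParser(msg => received.push(msg));
parser.push('{"type":"init","questionIndex":1}\n{"type":"chu');
parser.push('nk","model":"A","content":"Hi"}\n');
```

In a browser the fragments would typically come from `response.body.getReader()` combined with a `TextDecoder` in streaming mode, the same pattern the route itself uses when reading from the model clients.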

View File

@@ -0,0 +1,154 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db/index';
/**
* Submit vote result
* vote: 'left' | 'right' | 'both_good' | 'both_bad'
* Results are stored in EvalResults table
*/
export async function POST(request, { params }) {
try {
const { projectId, taskId } = params;
const { vote, questionId, isSwapped, leftAnswer, rightAnswer } = await request.json();
// Validate vote option
const validVotes = ['left', 'right', 'both_good', 'both_bad'];
if (!validVotes.includes(vote)) {
return NextResponse.json({ code: 400, error: 'Invalid vote option' }, { status: 400 });
}
if (!questionId) {
return NextResponse.json({ code: 400, error: 'Question ID is required' }, { status: 400 });
}
const task = await db.task.findFirst({
where: {
id: taskId,
projectId,
taskType: 'blind-test'
}
});
if (!task) {
return NextResponse.json({ code: 404, error: 'Task not found' }, { status: 404 });
}
if (task.status !== 0) {
return NextResponse.json({ code: 400, error: 'Task has ended' }, { status: 400 });
}
// Parse task details
let detail = {};
try {
detail = task.detail ? JSON.parse(task.detail) : {};
} catch (e) {
console.error('Failed to parse task detail:', e);
}
// Calculate scores
// isSwapped: true means left is model B and right is model A
// isSwapped: false means left is model A and right is model B
let modelAScore = 0;
let modelBScore = 0;
if (vote === 'left') {
if (isSwapped) {
modelBScore = 1; // Left is B
} else {
modelAScore = 1; // Left is A
}
} else if (vote === 'right') {
if (isSwapped) {
modelAScore = 1; // Right is A
} else {
modelBScore = 1; // Right is B
}
} else if (vote === 'both_good') {
modelAScore = 0.5;
modelBScore = 0.5;
}
// both_bad: both scores remain 0
// Store result in EvalResults table
const evalResult = await db.evalResults.create({
data: {
projectId,
taskId,
evalDatasetId: questionId,
modelAnswer: JSON.stringify({
leftAnswer: leftAnswer || '',
rightAnswer: rightAnswer || ''
}),
score: modelAScore, // Store modelA score for sorting/aggregation
isCorrect: false, // Not applicable for blind-test
judgeResponse: JSON.stringify({
vote,
isSwapped,
modelAScore,
modelBScore
}),
duration: 0,
status: 0
}
});
// Update task progress
const evalDatasetIds = detail.questionIds || detail.evalDatasetIds || []; // Accept either key, as the read routes do
const newCurrentIndex = (detail.currentIndex || 0) + 1;
const isCompleted = newCurrentIndex >= evalDatasetIds.length;
const updatedDetail = {
...detail,
currentIndex: newCurrentIndex
};
await db.task.update({
where: { id: taskId },
data: {
detail: JSON.stringify(updatedDetail),
completedCount: newCurrentIndex,
status: isCompleted ? 1 : 0, // 1-completed, 0-running
endTime: isCompleted ? new Date() : null
}
});
// Calculate current total scores from EvalResults
const allResults = await db.evalResults.findMany({
where: { taskId },
select: { judgeResponse: true }
});
let totalModelAScore = 0;
let totalModelBScore = 0;
for (const r of allResults) {
try {
const judge = JSON.parse(r.judgeResponse || '{}');
totalModelAScore += judge.modelAScore || 0;
totalModelBScore += judge.modelBScore || 0;
} catch (e) {
// Ignore parse errors
}
}
return NextResponse.json({
code: 0,
data: {
success: true,
isCompleted,
currentIndex: newCurrentIndex,
totalCount: evalDatasetIds.length,
scores: {
modelA: totalModelAScore,
modelB: totalModelBScore
}
},
message: isCompleted ? 'Blind-test task completed' : 'Vote recorded'
});
} catch (error) {
console.error('Failed to submit vote result:', error);
return NextResponse.json(
{ code: 500, error: 'Failed to submit vote result', message: error.message },
{ status: 500 }
);
}
}
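
The vote-to-score mapping in the route above is easiest to verify as a pure function. This is a sketch only: `scoreVote` is a hypothetical name, not a function in this commit, and it restates the same rules (`isSwapped: true` means the left answer came from model B; `both_good` gives each model 0.5; `both_bad` gives both 0).

```javascript
// Hypothetical helper (not in the repository): same scoring rules as the route.
function scoreVote(vote, isSwapped) {
  switch (vote) {
    case 'left':
      // Left slot holds model B when swapped, model A otherwise
      return isSwapped ? { modelAScore: 0, modelBScore: 1 } : { modelAScore: 1, modelBScore: 0 };
    case 'right':
      // Right slot holds model A when swapped, model B otherwise
      return isSwapped ? { modelAScore: 1, modelBScore: 0 } : { modelAScore: 0, modelBScore: 1 };
    case 'both_good':
      return { modelAScore: 0.5, modelBScore: 0.5 };
    case 'both_bad':
      return { modelAScore: 0, modelBScore: 0 };
    default:
      throw new Error(`Invalid vote option: ${vote}`);
  }
}

const leftSwapped = scoreVote('left', true);
const rightUnswapped = scoreVote('right', false);
const tie = scoreVote('both_good', false);
```

Note that the route takes `isSwapped` from the request body, so the mapping is only as trustworthy as the client that reports it.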

View File

@@ -0,0 +1,226 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db/index';
/**
* Get all blind-test tasks for a project
*/
export async function GET(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
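// Parse pagination params with an explicit radix; fall back to defaults on invalid or non-positive values
const page = Math.max(1, parseInt(searchParams.get('page') || '1', 10) || 1);
const pageSize = Math.max(1, parseInt(searchParams.get('pageSize') || '20', 10) || 20);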
if (!projectId) {
return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
}
const skip = (page - 1) * pageSize;
// Fetch task list and total count
const [tasks, total] = await Promise.all([
db.task.findMany({
where: {
projectId,
taskType: 'blind-test'
},
orderBy: { createAt: 'desc' },
skip,
take: pageSize
}),
db.task.count({
where: {
projectId,
taskType: 'blind-test'
}
})
]);
// Fetch evaluation results for all tasks to calculate scores
const taskIds = tasks.map(t => t.id);
const allEvalResults = await db.evalResults.findMany({
where: { taskId: { in: taskIds } },
select: {
taskId: true,
judgeResponse: true
}
});
// Group results by taskId and calculate scores
const taskScores = {};
for (const result of allEvalResults) {
if (!taskScores[result.taskId]) {
taskScores[result.taskId] = { modelAScore: 0, modelBScore: 0 };
}
try {
const judge = JSON.parse(result.judgeResponse || '{}');
taskScores[result.taskId].modelAScore += judge.modelAScore || 0;
taskScores[result.taskId].modelBScore += judge.modelBScore || 0;
} catch (e) {
// Ignore parse errors
}
}
// Parse task detail fields and attach scores
const tasksWithDetails = tasks.map(task => {
let detail = {};
let modelInfo = {};
try {
detail = task.detail ? JSON.parse(task.detail) : {};
modelInfo = task.modelInfo ? JSON.parse(task.modelInfo) : {};
} catch (e) {
console.error('Failed to parse task detail:', e);
}
// Attach calculated scores as results array
const scores = taskScores[task.id] || { modelAScore: 0, modelBScore: 0 };
const results = [
{
modelAScore: scores.modelAScore,
modelBScore: scores.modelBScore
}
];
return {
...task,
detail: {
...detail,
results // Attach results for display in task card
},
modelInfo
};
});
return NextResponse.json({
code: 0,
data: {
items: tasksWithDetails,
total,
page,
pageSize,
totalPages: Math.ceil(total / pageSize)
}
});
} catch (error) {
console.error('Failed to fetch blind-test task list:', error);
return NextResponse.json(
{ code: 500, error: 'Failed to fetch blind-test task list', message: error.message },
{ status: 500 }
);
}
}
/**
* Create a blind-test task
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
const data = await request.json();
const { modelA, modelB, evalDatasetIds, language = 'zh-CN' } = data;
if (!modelA || !modelA.modelId || !modelA.providerId) {
return NextResponse.json({ code: 400, error: 'Please select model A' }, { status: 400 });
}
if (!modelB || !modelB.modelId || !modelB.providerId) {
return NextResponse.json({ code: 400, error: 'Please select model B' }, { status: 400 });
}
if (modelA.modelId === modelB.modelId && modelA.providerId === modelB.providerId) {
return NextResponse.json({ code: 400, error: 'The two models must be different' }, { status: 400 });
}
if (!evalDatasetIds || evalDatasetIds.length === 0) {
return NextResponse.json({ code: 400, error: 'Please select questions to evaluate' }, { status: 400 });
}
const evalDatasets = await db.evalDatasets.findMany({
where: {
id: { in: evalDatasetIds },
projectId
},
select: { id: true, questionType: true }
});
const invalidQuestions = evalDatasets.filter(
q => q.questionType !== 'short_answer' && q.questionType !== 'open_ended'
);
if (invalidQuestions.length > 0) {
return NextResponse.json(
{
code: 400,
error: 'Blind-test tasks only support short-answer and open-ended questions'
},
{ status: 400 }
);
}
// Fetch model config info
const [modelConfigA, modelConfigB] = await Promise.all([
db.modelConfig.findFirst({
where: { projectId, providerId: modelA.providerId, modelId: modelA.modelId }
}),
db.modelConfig.findFirst({
where: { projectId, providerId: modelB.providerId, modelId: modelB.modelId }
})
]);
// Build model info (two models)
const modelInfo = {
modelA: {
id: modelConfigA?.id,
modelId: modelA.modelId,
modelName: modelConfigA?.modelName || modelA.modelId,
providerId: modelA.providerId,
providerName: modelConfigA?.providerName || modelA.providerId
},
modelB: {
id: modelConfigB?.id,
modelId: modelB.modelId,
modelName: modelConfigB?.modelName || modelB.modelId,
providerId: modelB.providerId,
providerName: modelConfigB?.providerName || modelB.providerId
}
};
// Build task detail (only store evalDatasetIds and currentIndex)
const taskDetail = {
evalDatasetIds,
currentIndex: 0 // Current question index
};
// Create task
const newTask = await db.task.create({
data: {
projectId,
taskType: 'blind-test',
status: 0, // Running
modelInfo: JSON.stringify(modelInfo),
language,
detail: JSON.stringify(taskDetail),
totalCount: evalDatasetIds.length,
completedCount: 0,
note: ''
}
});
return NextResponse.json({
code: 0,
data: {
...newTask,
detail: taskDetail,
modelInfo
},
message: 'Blind-test task created'
});
} catch (error) {
console.error('Failed to create blind-test task:', error);
return NextResponse.json(
{ code: 500, error: 'Failed to create blind-test task', message: error.message },
{ status: 500 }
);
}
}
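The GET handler in this file derives `skip` from the `page` and `pageSize` query parameters. The sketch below shows one way to parse those values defensively so that non-numeric or non-positive input falls back to defaults. `parsePagination` is a hypothetical helper introduced only for illustration; it relies on the standard `URLSearchParams` API.

```javascript
// Hypothetical helper (not part of the original commit): parses page and
// pageSize from query parameters, falling back to defaults on bad input.
function parsePagination(searchParams, defaults = { page: 1, pageSize: 20 }) {
  const toPositiveInt = (value, fallback) => {
    const n = Number.parseInt(value ?? '', 10);
    return Number.isInteger(n) && n > 0 ? n : fallback;
  };
  const page = toPositiveInt(searchParams.get('page'), defaults.page);
  const pageSize = toPositiveInt(searchParams.get('pageSize'), defaults.pageSize);
  // skip is the number of rows to pass over before the requested page
  return { page, pageSize, skip: (page - 1) * pageSize };
}
```

A call such as `parsePagination(new URL(request.url).searchParams)` would then yield values that are safe to hand to a paginated query.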

View File

@@ -0,0 +1,40 @@
import { NextResponse } from 'next/server';
import logger from '@/lib/util/logger';
import cleanService from '@/lib/services/clean';
// Clean the data of the specified text chunk
export async function POST(request, { params }) {
try {
const { projectId, chunkId } = params;
// Validate the project ID and chunk ID
if (!projectId || !chunkId) {
return NextResponse.json({ error: 'Project ID or text block ID cannot be empty' }, { status: 400 });
}
// Read the request body
const { model, language = '中文' } = await request.json();
if (!model) {
return NextResponse.json({ error: 'Model cannot be empty' }, { status: 400 });
}
// Run the data-cleaning service
const result = await cleanService.cleanDataForChunk(projectId, chunkId, {
model,
language
});
// Return the cleaning result
return NextResponse.json({
chunkId,
originalLength: result.originalLength,
cleanedLength: result.cleanedLength,
success: result.success,
message: 'Data cleaning completed'
});
} catch (error) {
logger.error('Error cleaning data:', error);
return NextResponse.json({ error: error.message || 'Error cleaning data' }, { status: 500 });
}
}

View File

@@ -0,0 +1,35 @@
import { NextResponse } from 'next/server';
import { generateEvalQuestionsForChunk } from '@/lib/services/eval';
import logger from '@/lib/util/logger';
/**
* Generate evaluation questions for the specified text chunk
*/
export async function POST(request, { params }) {
try {
const { projectId, chunkId } = params;
// Validate parameters
if (!projectId || !chunkId) {
return NextResponse.json({ error: 'Project ID and Chunk ID are required' }, { status: 400 });
}
// Read the request body
const { model, language = 'zh-CN' } = await request.json();
if (!model) {
return NextResponse.json({ error: 'Model configuration is required' }, { status: 400 });
}
// Call the service layer to generate evaluation questions
const result = await generateEvalQuestionsForChunk(projectId, chunkId, {
model,
language
});
return NextResponse.json(result);
} catch (error) {
logger.error('Error generating eval questions:', error);
return NextResponse.json({ error: error.message || 'Failed to generate eval questions' }, { status: 500 });
}
}

View File

@@ -0,0 +1,73 @@
import { NextResponse } from 'next/server';
import { getQuestionsForChunk } from '@/lib/db/questions';
import logger from '@/lib/util/logger';
import questionService from '@/lib/services/questions';
// Generate questions for the specified text chunk
export async function POST(request, { params }) {
try {
const { projectId, chunkId } = params;
// Validate the project ID and chunk ID
if (!projectId || !chunkId) {
return NextResponse.json({ error: 'Project ID or text block ID cannot be empty' }, { status: 400 });
}
// Read the request body (language defaults to '中文', i.e. Chinese)
const { model, language = '中文', number, enableGaExpansion = false } = await request.json();
if (!model) {
return NextResponse.json({ error: 'Model cannot be empty' }, { status: 400 });
}
// The service function will later be chosen based on whether GA pairs exist; for now the GA-aware variant is always used
const serviceFunc = questionService.generateQuestionsForChunkWithGA;
// Run the question-generation service
const result = await serviceFunc(projectId, chunkId, {
model,
language,
number,
enableGaExpansion
});
// Normalize the response shape so it always includes GA expansion info
const response = {
chunkId,
questions: result.questions || result.labelQuestions || [],
total: result.total || (result.questions || result.labelQuestions || []).length,
gaExpansionUsed: result.gaExpansionUsed || false,
gaPairsCount: result.gaPairsCount || 0,
expectedTotal: result.expectedTotal || result.total
};
// Return the generated questions
return NextResponse.json(response);
} catch (error) {
logger.error('Error generating questions:', error);
return NextResponse.json({ error: error.message || 'Error generating questions' }, { status: 500 });
}
}
// Get the questions for the specified text chunk
export async function GET(request, { params }) {
try {
const { projectId, chunkId } = params;
// Validate the project ID and chunk ID
if (!projectId || !chunkId) {
return NextResponse.json({ error: 'Project ID or text block ID cannot be empty' }, { status: 400 });
}
// Fetch the questions for the chunk
const questions = await getQuestionsForChunk(projectId, chunkId);
// Return the question list
return NextResponse.json({
chunkId,
questions,
total: questions.length
});
} catch (error) {
console.error('Error getting questions:', String(error));
return NextResponse.json({ error: error.message || 'Error getting questions' }, { status: 500 });
}
}

View File

@@ -0,0 +1,73 @@
import { NextResponse } from 'next/server';
import { deleteChunkById, getChunkById, updateChunkById } from '@/lib/db/chunks';
// Get the content of a text chunk
export async function GET(request, { params }) {
try {
const { projectId, chunkId } = params;
// Validate parameters
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
if (!chunkId) {
return NextResponse.json({ error: 'Text block ID cannot be empty' }, { status: 400 });
}
// Fetch the chunk content
const chunk = await getChunkById(chunkId);
return NextResponse.json(chunk);
} catch (error) {
console.error('Failed to get text block content:', String(error));
return NextResponse.json({ error: error.message || 'Failed to get text block content' }, { status: 500 });
}
}
// Delete a text chunk
export async function DELETE(request, { params }) {
try {
const { projectId, chunkId } = params;
// Validate parameters
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
if (!chunkId) {
return NextResponse.json({ error: 'Text block ID cannot be empty' }, { status: 400 });
}
await deleteChunkById(chunkId);
return NextResponse.json({ message: 'Text block deleted successfully' });
} catch (error) {
console.error('Failed to delete text block:', String(error));
return NextResponse.json({ error: error.message || 'Failed to delete text block' }, { status: 500 });
}
}
// Edit the content of a text chunk
export async function PATCH(request, { params }) {
try {
const { projectId, chunkId } = params;
// Validate parameters
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
if (!chunkId) {
return NextResponse.json({ error: 'Text block ID cannot be empty' }, { status: 400 });
}
// Parse the request body to get the new content
const requestData = await request.json();
const { content } = requestData;
if (!content) {
return NextResponse.json({ error: 'Content cannot be empty' }, { status: 400 });
}
const res = await updateChunkById(chunkId, { content });
return NextResponse.json(res);
} catch (error) {
console.error('Failed to edit text block:', String(error));
return NextResponse.json({ error: error.message || 'Failed to edit text block' }, { status: 500 });
}
}

View File

@@ -0,0 +1,20 @@
import { getChunkContentsByNames } from '@/lib/db/chunks';
import { NextResponse } from 'next/server';
export async function POST(request, { params }) {
try {
const { projectId } = params;
const { chunkNames } = await request.json();
if (!chunkNames || !Array.isArray(chunkNames)) {
return NextResponse.json({ error: 'chunkNames must be an array' }, { status: 400 });
}
const chunkContentMap = await getChunkContentsByNames(projectId, chunkNames);
return NextResponse.json(chunkContentMap);
} catch (error) {
console.error('Failed to batch-fetch text block contents:', error);
return NextResponse.json({ error: 'Failed to batch-fetch text block contents' }, { status: 500 });
}
}

View File

@@ -0,0 +1,102 @@
import { NextResponse } from 'next/server';
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();
/**
* Batch-edit text chunk content
* POST /api/projects/[projectId]/chunks/batch-edit
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
const body = await request.json();
const { position, content, chunkIds } = body;
// Validate parameters
if (!position || !content || !chunkIds || !Array.isArray(chunkIds) || chunkIds.length === 0) {
return NextResponse.json({ error: 'Missing required parameters: position, content, chunkIds' }, { status: 400 });
}
if (!['start', 'end'].includes(position)) {
return NextResponse.json({ error: 'Position must be "start" or "end"' }, { status: 400 });
}
// Verify project ownership (fetch the chunks to edit, scoped to this project)
const chunksToUpdate = await prisma.chunks.findMany({
where: {
id: { in: chunkIds },
projectId: projectId
},
select: {
id: true,
content: true,
name: true
}
});
if (chunksToUpdate.length === 0) {
return NextResponse.json({ error: 'Not found' }, { status: 404 });
}
if (chunksToUpdate.length !== chunkIds.length) {
return NextResponse.json({ error: 'Some chunks not found' }, { status: 400 });
}
// Prepare the update payloads
const updates = chunksToUpdate.map(chunk => {
let newContent;
if (position === 'start') {
// Prepend the content
newContent = content + '\n\n' + chunk.content;
} else {
// Append the content
newContent = chunk.content + '\n\n' + content;
}
return {
where: { id: chunk.id },
data: {
content: newContent,
size: newContent.length,
updateAt: new Date()
}
};
});
async function processBatches(items, batchSize, processFn) {
const results = [];
for (let i = 0; i < items.length; i += batchSize) {
const batch = items.slice(i, i + batchSize);
const batchResults = await Promise.all(batch.map(processFn));
results.push(...batchResults);
}
return results;
}
const BATCH_SIZE = 50; // Process 50 updates per batch
await processBatches(updates, BATCH_SIZE, update => prisma.chunks.update(update));
// Log the operation (optional)
console.log(`Successfully updated ${chunksToUpdate.length} chunks`);
return NextResponse.json({
success: true,
updatedCount: chunksToUpdate.length,
message: `Successfully updated ${chunksToUpdate.length} chunks`
});
} catch (error) {
console.error('Failed to batch-edit text chunks:', error);
return NextResponse.json(
{
error: 'Batch edit chunks failed',
details: error.message
},
{ status: 500 }
);
} finally {
await prisma.$disconnect();
}
}
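The handler above builds each chunk's new content by joining the supplied text and the existing text with a blank line. A minimal sketch of that step in isolation follows. `applyEdit` is a hypothetical name used only for illustration, and the explicit error on an unknown position is an addition of this sketch (the route validates `position` before reaching this step).

```javascript
// Hypothetical helper (not part of the original commit): returns the chunk
// content after adding `addition` at the start or end, separated by a blank line.
function applyEdit(original, addition, position) {
  if (position !== 'start' && position !== 'end') {
    throw new Error('Position must be "start" or "end"');
  }
  return position === 'start'
    ? addition + '\n\n' + original
    : original + '\n\n' + addition;
}
```

Isolating the string step like this makes it possible to check the separator and ordering without a database connection.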

View File

@@ -0,0 +1,35 @@
import { NextResponse } from 'next/server';
import { getChunkByName } from '@/lib/db/chunks';
/**
* Get a text chunk by its name
* @param {Request} request The request object
* @param {object} context Context containing the route params
* @returns {Promise<NextResponse>} The response object
*/
export async function GET(request, { params }) {
try {
const { projectId } = params;
// Read chunkName from the query string
const { searchParams } = new URL(request.url);
const chunkName = searchParams.get('chunkName');
if (!chunkName) {
return NextResponse.json({ error: 'Text block name cannot be empty' }, { status: 400 });
}
// Look up the chunk by name and project ID
const chunk = await getChunkByName(projectId, chunkName);
if (!chunk) {
return NextResponse.json({ error: 'The specified text block was not found' }, { status: 404 });
}
// Return the chunk
return NextResponse.json(chunk);
} catch (error) {
console.error('Failed to get text block by name:', String(error));
return NextResponse.json({ error: 'Failed to get text block: ' + error.message }, { status: 500 });
}
}

View File

@@ -0,0 +1,21 @@
import { NextResponse } from 'next/server';
import { getChunksByFileIds } from '@/lib/db/chunks';
// Get the text chunks that belong to the given file IDs
export async function POST(request, { params }) {
try {
const { projectId } = params;
// Validate parameters
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
const { array } = await request.json();
// Fetch the chunks for the given file IDs
const chunk = await getChunksByFileIds(array);
return NextResponse.json(chunk);
} catch (error) {
console.error('Failed to get text block content:', String(error));
return NextResponse.json({ error: error.message || 'Failed to get text block content' }, { status: 500 });
}
}

View File

@@ -0,0 +1,36 @@
import { NextResponse } from 'next/server';
import { getProject, updateProject, getTaskConfig } from '@/lib/db/projects';
// Get the project configuration
export async function GET(request, { params }) {
try {
const projectId = params.projectId;
const config = await getProject(projectId);
const taskConfig = await getTaskConfig(projectId);
return NextResponse.json({ ...config, ...taskConfig });
} catch (error) {
console.error('Failed to get project configuration:', String(error));
return NextResponse.json({ error: error.message }, { status: 500 });
}
}
// Update the project configuration
export async function PUT(request, { params }) {
try {
const projectId = params.projectId;
const newConfig = await request.json();
const currentConfig = await getProject(projectId);
// Only the prompts portion is updated
const updatedConfig = {
...currentConfig,
...newConfig.prompts
};
const config = await updateProject(projectId, updatedConfig);
return NextResponse.json(config);
} catch (error) {
console.error('Failed to update project configuration:', String(error));
return NextResponse.json({ error: error.message }, { status: 500 });
}
}

View File

@@ -0,0 +1,105 @@
import { NextResponse } from 'next/server';
import {
getCustomPrompts,
saveCustomPrompt,
deleteCustomPrompt,
batchSaveCustomPrompts,
getPromptTemplates
} from '@/lib/db/custom-prompts';
// Get the project's custom prompts
export async function GET(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
const promptType = searchParams.get('promptType');
const language = searchParams.get('language');
if (!projectId) {
return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
}
const customPrompts = await getCustomPrompts(projectId, promptType, language);
const templates = await getPromptTemplates();
return NextResponse.json({
success: true,
customPrompts,
templates
});
} catch (error) {
console.error('Failed to get custom prompts:', error);
return NextResponse.json({ error: error.message }, { status: 500 });
}
}
// Save custom prompts
export async function POST(request, { params }) {
try {
const { projectId } = params;
const body = await request.json();
if (!projectId) {
return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
}
// Batch save
if (body.prompts && Array.isArray(body.prompts)) {
const results = await batchSaveCustomPrompts(projectId, body.prompts);
return NextResponse.json({
success: true,
results
});
}
// Single save
const { promptType, promptKey, language, content } = body;
if (!promptType || !promptKey || !language || content === undefined) {
return NextResponse.json(
{
error: 'promptType, promptKey, language and content are required'
},
{ status: 400 }
);
}
const result = await saveCustomPrompt(projectId, promptType, promptKey, language, content);
return NextResponse.json({
success: true,
result
});
} catch (error) {
console.error('Failed to save custom prompt:', error);
return NextResponse.json({ error: error.message }, { status: 500 });
}
}
// Delete a custom prompt
export async function DELETE(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
const promptType = searchParams.get('promptType');
const promptKey = searchParams.get('promptKey');
const language = searchParams.get('language');
if (!projectId || !promptType || !promptKey || !language) {
return NextResponse.json(
{
error: 'projectId, promptType, promptKey and language are required'
},
{ status: 400 }
);
}
const success = await deleteCustomPrompt(projectId, promptType, promptKey, language);
return NextResponse.json({
success
});
} catch (error) {
console.error('Failed to delete custom prompt:', error);
return NextResponse.json({ error: error.message }, { status: 500 });
}
}

View File

@@ -0,0 +1,116 @@
import { NextResponse } from 'next/server';
import { saveChunks, deleteChunksByFileId } from '@/lib/db/chunks';
import path from 'path';
import fs from 'fs/promises';
import { getProjectRoot } from '@/lib/db/base';
/**
* Handle a custom chunk-splitting request
* @param {Request} request - The request object
* @param {Object} params - Route parameters
* @returns {Promise<Response>} - The response object
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
const { fileId, fileName, content, splitPoints } = await request.json();
// Validate parameters
if (!projectId || !fileId || !fileName || !content || !splitPoints) {
return NextResponse.json({ error: 'Missing required parameters' }, { status: 400 });
}
// Get the project root directory
const projectRoot = await getProjectRoot();
const projectPath = path.join(projectRoot, projectId);
// Check that the project exists
try {
await fs.access(projectPath);
} catch (error) {
return NextResponse.json({ error: 'Project does not exist' }, { status: 404 });
}
// Delete the file's existing text chunks first
await deleteChunksByFileId(projectId, fileId);
// Split the file content into chunks at the split points
const customChunks = generateCustomChunks(projectId, fileId, fileName, content, splitPoints);
// Save the new text chunks
await saveChunks(customChunks);
return NextResponse.json({
success: true,
message: 'Custom chunks saved successfully',
totalChunks: customChunks.length
});
} catch (error) {
console.error('Failed to process custom split:', String(error));
return NextResponse.json({ error: error.message || 'Failed to process custom split request' }, { status: 500 });
}
}
/**
* Generate custom text chunks from the split points
* @param {string} projectId - Project ID
* @param {string} fileId - File ID
* @param {string} fileName - File name
* @param {string} content - File content
* @param {Array} splitPoints - Array of split points
* @returns {Array} - The generated text chunks
*/
function generateCustomChunks(projectId, fileId, fileName, content, splitPoints) {
// Sort the split points by position
const sortedPoints = [...splitPoints].sort((a, b) => a.position - b.position);
// Build the chunks
const chunks = [];
let startPos = 0;
// Process each split point
for (let i = 0; i < sortedPoints.length; i++) {
const endPos = sortedPoints[i].position;
// Extract the content of the current chunk
const chunkContent = content.substring(startPos, endPos);
// Skip blank chunks
if (chunkContent.trim().length === 0) {
startPos = endPos;
continue;
}
// Build the chunk object
const chunk = {
projectId,
name: `${path.basename(fileName, path.extname(fileName))}-part-${i + 1}`,
fileId,
fileName,
content: chunkContent,
summary: `${fileName} 自定义分块 ${i + 1}/${sortedPoints.length + 1}`,
size: chunkContent.length
};
chunks.push(chunk);
startPos = endPos;
}
// Add the final chunk (if it has content)
const lastChunkContent = content.substring(startPos);
if (lastChunkContent.trim().length > 0) {
const lastChunk = {
projectId,
name: `${path.basename(fileName, path.extname(fileName))}-part-${sortedPoints.length + 1}`,
fileId,
fileName,
content: lastChunkContent,
summary: `${fileName} 自定义分块 ${sortedPoints.length + 1}/${sortedPoints.length + 1}`,
size: lastChunkContent.length
};
chunks.push(lastChunk);
}
return chunks;
}
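To see how `generateCustomChunks` divides a file, it can help to look at the splitting step without the chunk metadata. The sketch below is a simplified illustration, not the route's actual function: `splitAtPositions` is a hypothetical name, and it assumes split points are objects with a numeric `position` character offset, as the code above reads them.

```javascript
// Simplified, hypothetical version of the splitting step: returns only the
// non-blank text segments, without the chunk metadata built by the route.
function splitAtPositions(content, splitPoints) {
  // Copy before sorting so the caller's array is left untouched
  const sorted = [...splitPoints].sort((a, b) => a.position - b.position);
  const segments = [];
  let start = 0;
  for (const point of sorted) {
    const piece = content.substring(start, point.position);
    if (piece.trim().length > 0) segments.push(piece);
    start = point.position;
  }
  // Whatever follows the last split point forms the final segment
  const tail = content.substring(start);
  if (tail.trim().length > 0) segments.push(tail);
  return segments;
}
```

With N split points the content yields at most N + 1 segments, and whitespace-only segments are dropped, which is why the chunk names produced by the route can skip part numbers.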

View File

@@ -0,0 +1,183 @@
/**
* API for operations on a single multi-turn conversation dataset
*/
import { NextResponse } from 'next/server';
import {
getDatasetConversationById,
updateDatasetConversation,
deleteDatasetConversation,
getConversationNavigationItems
} from '@/lib/db/dataset-conversations';
/**
* Get the details of a single multi-turn conversation dataset
*/
export async function GET(request, { params }) {
try {
const { projectId, conversationId } = params;
const { searchParams } = new URL(request.url);
const operateType = searchParams.get('operateType');
// For navigation operations, return the navigation items
if (operateType !== null) {
const data = await getConversationNavigationItems(projectId, conversationId, operateType);
return NextResponse.json(data);
}
const conversation = await getDatasetConversationById(conversationId);
if (!conversation) {
return NextResponse.json(
{
success: false,
message: 'Conversation dataset does not exist'
},
{ status: 404 }
);
}
if (conversation.projectId !== projectId) {
return NextResponse.json(
{
success: false,
message: 'Conversation dataset does not belong to the specified project'
},
{ status: 403 }
);
}
return NextResponse.json(conversation);
} catch (error) {
console.error('Failed to get multi-turn conversation dataset details:', error);
return NextResponse.json(
{
success: false,
message: error.message
},
{ status: 500 }
);
}
}
/**
* Update a multi-turn conversation dataset
*/
export async function PUT(request, { params }) {
try {
const { projectId, conversationId } = params;
const body = await request.json();
// Verify that the conversation dataset exists and belongs to the project
const conversation = await getDatasetConversationById(conversationId);
if (!conversation) {
return NextResponse.json(
{
success: false,
message: 'Conversation dataset does not exist'
},
{ status: 404 }
);
}
if (conversation.projectId !== projectId) {
return NextResponse.json(
{
success: false,
message: 'Conversation dataset does not belong to the specified project'
},
{ status: 403 }
);
}
// Only allow specific fields to be updated
const allowedFields = ['score', 'tags', 'note', 'confirmed', 'aiEvaluation', 'messages'];
const updateData = {};
allowedFields.forEach(field => {
if (Object.prototype.hasOwnProperty.call(body, field)) {
if (field === 'messages') {
// Serialize the messages array into the rawMessages string field for storage
updateData['rawMessages'] = JSON.stringify(body[field]);
} else {
updateData[field] = body[field];
}
}
});
if (Object.keys(updateData).length === 0) {
return NextResponse.json(
{
success: false,
message: 'No valid fields to update'
},
{ status: 400 }
);
}
const updatedConversation = await updateDatasetConversation(conversationId, updateData);
return NextResponse.json({
success: true,
data: updatedConversation
});
} catch (error) {
console.error('Failed to update multi-turn conversation dataset:', error);
return NextResponse.json(
{
success: false,
message: error.message
},
{ status: 500 }
);
}
}
/**
* Delete a multi-turn conversation dataset
*/
export async function DELETE(request, { params }) {
try {
const { projectId, conversationId } = params;
// Verify that the conversation dataset exists and belongs to the project
const conversation = await getDatasetConversationById(conversationId);
if (!conversation) {
return NextResponse.json(
{
success: false,
message: 'Conversation dataset does not exist'
},
{ status: 404 }
);
}
if (conversation.projectId !== projectId) {
return NextResponse.json(
{
success: false,
message: 'Conversation dataset does not belong to the specified project'
},
{ status: 403 }
);
}
await deleteDatasetConversation(conversationId);
return NextResponse.json({
success: true,
message: 'Deleted successfully'
});
} catch (error) {
console.error('Failed to delete multi-turn conversation dataset:', error);
return NextResponse.json(
{
success: false,
message: error.message
},
{ status: 500 }
);
}
}

View File

@@ -0,0 +1,68 @@
/**
* Export API for multi-turn conversation datasets
* Exports the raw dataset directly in ShareGPT format
*/
import { NextResponse } from 'next/server';
import { getAllDatasetConversations } from '@/lib/db/dataset-conversations';
/**
* Export multi-turn conversation datasets
*/
export async function GET(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
// Filters
const filters = {
confirmed: searchParams.get('confirmed')
};
// Drop empty values
Object.keys(filters).forEach(key => {
if (!filters[key]) delete filters[key];
});
// Fetch all conversation datasets
const conversations = await getAllDatasetConversations(projectId, filters);
if (conversations.length === 0) {
return NextResponse.json([]);
}
// Convert to an array in ShareGPT format
const shareGptData = [];
for (const conversation of conversations) {
try {
// Parse rawMessages
const messages = JSON.parse(conversation.rawMessages || '[]');
if (messages.length > 0) {
// Build the ShareGPT-format object
const shareGptItem = {
messages: messages
};
shareGptData.push(shareGptItem);
}
} catch (error) {
console.error(`Failed to parse conversation messages ${conversation.id}:`, error);
// Skip conversations that fail to parse and continue with the rest
continue;
}
}
return NextResponse.json(shareGptData);
} catch (error) {
console.error('Failed to export multi-turn conversation datasets:', error);
return NextResponse.json(
{
success: false,
message: error.message
},
{ status: 500 }
);
}
}
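The conversion loop in this handler can be read as a pure function from stored rows to export items. A minimal sketch under stated assumptions follows: `toShareGpt` is a hypothetical name, the `role`/`content` message shape used when calling it is assumed for illustration only, and the `Array.isArray` guard is an addition of this sketch that the handler does not perform.

```javascript
// Hypothetical helper (not part of the original commit): converts stored
// conversation rows into export items of the form { messages: [...] }.
function toShareGpt(conversations) {
  const items = [];
  for (const conversation of conversations) {
    try {
      const messages = JSON.parse(conversation.rawMessages || '[]');
      // Extra guard (assumption of this sketch): only accept non-empty arrays
      if (Array.isArray(messages) && messages.length > 0) {
        items.push({ messages });
      }
    } catch (e) {
      // Rows whose rawMessages is not valid JSON are skipped
    }
  }
  return items;
}
```

Rows with empty, missing, or unparseable `rawMessages` simply do not appear in the export, so the output length can be smaller than the number of stored conversations.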

View File

@@ -0,0 +1,135 @@
/**
* Management API for multi-turn conversation datasets
*/
import { NextResponse } from 'next/server';
import {
getDatasetConversationsByPagination,
getAllDatasetConversationIds
} from '@/lib/db/dataset-conversations';
import { generateMultiTurnConversation } from '@/lib/services/multi-turn/index';
/**
* Get the list of multi-turn conversation datasets (supports pagination and filtering)
*/
export async function GET(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
const getAllIds = searchParams.get('getAllIds') === 'true'; // Flag: return all conversation IDs
// Filters
const filters = {
keyword: searchParams.get('keyword'),
roleA: searchParams.get('roleA'),
roleB: searchParams.get('roleB'),
scenario: searchParams.get('scenario'),
scoreMin: searchParams.get('scoreMin'),
scoreMax: searchParams.get('scoreMax'),
confirmed: searchParams.get('confirmed')
};
// Drop empty values
Object.keys(filters).forEach(key => {
if (!filters[key]) delete filters[key];
});
// If all IDs were requested
if (getAllIds) {
const allConversationIds = await getAllDatasetConversationIds(projectId, filters);
return NextResponse.json({ allConversationIds });
}
// Regular paginated query
const page = Math.max(1, parseInt(searchParams.get('page') || '1', 10) || 1);
const pageSize = Math.max(1, parseInt(searchParams.get('pageSize') || '20', 10) || 20);
const result = await getDatasetConversationsByPagination(projectId, page, pageSize, filters);
return NextResponse.json({
success: true,
...result
});
} catch (error) {
console.error('Failed to get multi-turn conversation datasets:', error);
return NextResponse.json(
{
success: false,
message: error.message
},
{ status: 500 }
);
}
}
/**
* Create a multi-turn conversation dataset
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
const body = await request.json();
const { questionId, systemPrompt, scenario, rounds, roleA, roleB, model, language = '中文' } = body;
if (!questionId) {
return NextResponse.json(
{
success: false,
message: 'Question ID cannot be empty'
},
{ status: 400 }
);
}
if (!model || !model.modelId) {
return NextResponse.json(
{
success: false,
message: 'Model configuration cannot be empty'
},
{ status: 400 }
);
}
// Build the configuration
const config = {
systemPrompt: systemPrompt || '',
scenario: scenario || '',
rounds: rounds || 3,
roleA: roleA || '用户', // Default role A: "user"
roleB: roleB || '助手', // Default role B: "assistant"
model,
language
};
// Generate the multi-turn conversation
const result = await generateMultiTurnConversation(projectId, questionId, config);
if (!result.success) {
return NextResponse.json(
{
success: false,
message: result.error
},
{ status: 500 }
);
}
return NextResponse.json({
success: true,
data: result.data
});
} catch (error) {
console.error('Failed to create multi-turn conversation dataset:', error);
return NextResponse.json(
{
success: false,
message: error.message
},
{ status: 500 }
);
}
}

View File

@@ -0,0 +1,42 @@
import { NextResponse } from 'next/server';
import { getAllDatasetConversations } from '@/lib/db/dataset-conversations';
/**
* Get all tags of the multi-turn conversation datasets in a project
*/
export async function GET(request, { params }) {
try {
const { projectId } = params;
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
// Fetch all conversation datasets for the project
const conversations = await getAllDatasetConversations(projectId);
// Collect all tags
const allTags = new Set();
conversations.forEach(conversation => {
if (conversation.tags && typeof conversation.tags === 'string') {
const tags = conversation.tags.split(/\s+/).filter(tag => tag.trim().length > 0);
tags.forEach(tag => allTags.add(tag.trim()));
}
});
return NextResponse.json({
success: true,
tags: Array.from(allTags).sort()
});
} catch (error) {
console.error('Failed to get conversation tags:', error);
return NextResponse.json(
{
success: false,
message: error.message
},
{ status: 500 }
);
}
}
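The tag extraction in this handler treats each conversation's `tags` value as a whitespace-separated string and returns the distinct tags in sorted order. The sketch below restates that logic as a standalone function for illustration. `collectTags` is a hypothetical name, not a function in this repository.

```javascript
// Hypothetical helper (not part of the original commit): gathers the distinct
// whitespace-separated tags across conversations, sorted lexicographically.
function collectTags(conversations) {
  const allTags = new Set();
  for (const conversation of conversations) {
    if (conversation.tags && typeof conversation.tags === 'string') {
      for (const tag of conversation.tags.split(/\s+/)) {
        // Leading or trailing whitespace produces empty strings; skip them
        if (tag.length > 0) allTags.add(tag);
      }
    }
  }
  return Array.from(allTags).sort();
}
```

Because the split is on whitespace, a tag that itself contains a space would be treated as two tags under this scheme.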

View File

@@ -0,0 +1,77 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db';
export async function POST(req, { params }) {
try {
const { projectId, datasetId } = params;
// 1. Fetch the dataset details
const dataset = await db.datasets.findUnique({
where: { id: datasetId, projectId }
});
if (!dataset) {
return NextResponse.json({ error: 'Dataset not found' }, { status: 404 });
}
// 2. Try to find the associated chunkId via questionId
let chunkId = null;
if (dataset.questionId) {
const question = await db.questions.findUnique({
where: { id: dataset.questionId }
});
if (question) {
chunkId = question.chunkId;
}
}
// 3. Create the evaluation dataset record
// Default to the open_ended type, since datasets are usually Q&A pairs that suit evaluation
let evalTags = [];
try {
evalTags = JSON.parse(dataset.tags || '[]');
if (!Array.isArray(evalTags)) evalTags = [];
} catch (e) {
evalTags = [];
}
// Exclude the 'Eval' tag and join the array into a comma-separated string
const evalTagsString = evalTags.filter(tag => tag !== 'Eval').join(',');
const evalDataset = await db.evalDatasets.create({
data: {
projectId,
question: dataset.question,
questionType: 'open_ended',
correctAnswer: dataset.answer,
tags: evalTagsString,
note: dataset.note,
chunkId: chunkId,
options: '' // Open-ended questions need no options
}
});
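// 4. Update the original dataset by adding the 'Eval' tag
let currentTags = [];
try {
currentTags = JSON.parse(dataset.tags || '[]');
if (!Array.isArray(currentTags)) currentTags = [];
} catch (e) {
// Fall back to an empty tag list when the stored tags are not valid JSON
}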
if (!currentTags.includes('Eval')) {
currentTags.push('Eval');
await db.datasets.update({
where: { id: datasetId },
data: {
tags: JSON.stringify(currentTags)
}
});
}
return NextResponse.json({ success: true, evalDataset });
} catch (error) {
console.error('Failed to copy dataset to eval:', error);
return NextResponse.json({ error: 'Internal Server Error' }, { status: 500 });
}
}

View File

@@ -0,0 +1,36 @@
import { NextResponse } from 'next/server';
import { evaluateDataset } from '@/lib/services/datasets/evaluation';
/**
* Evaluate the quality of a single dataset
*/
export async function POST(request, { params }) {
try {
const { projectId, datasetId } = params;
const { model, language = 'zh-CN' } = await request.json();
if (!projectId || !datasetId) {
return NextResponse.json({ success: false, message: '项目ID和数据集ID不能为空' }, { status: 400 });
}
if (!model) {
return NextResponse.json({ success: false, message: '模型配置不能为空' }, { status: 400 });
}
// 使用评估服务进行数据集评估
const result = await evaluateDataset(projectId, datasetId, model, language);
if (!result.success) {
return NextResponse.json({ success: false, message: result.error }, { status: 500 });
}
return NextResponse.json({
success: true,
message: 'Dataset evaluation completed',
data: result.data
});
} catch (error) {
console.error('Dataset evaluation failed:', error);
return NextResponse.json({ success: false, message: `Evaluation failed: ${error.message}` }, { status: 500 });
}
}

View File

@@ -0,0 +1,82 @@
import { NextResponse } from 'next/server';
import { getDatasetsById, getDatasetsCounts, getNavigationItems, updateDatasetMetadata } from '@/lib/db/datasets';
/**
* Get a single dataset's details, or its navigation neighbors when operateType is provided
*/
export async function GET(request, { params }) {
try {
const { projectId, datasetId } = params;
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
if (!datasetId) {
return NextResponse.json({ error: 'Dataset ID cannot be empty' }, { status: 400 });
}
const { searchParams } = new URL(request.url);
const operateType = searchParams.get('operateType');
if (operateType !== null) {
const data = await getNavigationItems(projectId, datasetId, operateType);
return NextResponse.json(data);
}
const datasets = await getDatasetsById(datasetId);
let counts = await getDatasetsCounts(projectId);
return NextResponse.json({ datasets, ...counts });
} catch (error) {
console.error('Failed to get dataset details:', String(error));
return NextResponse.json(
{
error: error.message || 'Failed to get dataset details'
},
{ status: 500 }
);
}
}
/**
* Update dataset metadata (score, tags, note)
*/
export async function PATCH(request, { params }) {
try {
const { projectId, datasetId } = params;
// Validate parameters
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
if (!datasetId) {
return NextResponse.json({ error: 'Dataset ID cannot be empty' }, { status: 400 });
}
const body = await request.json();
const { score, tags, note } = body;
// Validate the score range
if (score !== undefined && (score < 0 || score > 5)) {
return NextResponse.json({ error: 'Score must be between 0 and 5' }, { status: 400 });
}
// Validate the tags format
if (tags !== undefined && !Array.isArray(tags)) {
return NextResponse.json({ error: 'Tags must be an array' }, { status: 400 });
}
// Update the dataset metadata
const updatedDataset = await updateDatasetMetadata(datasetId, { score, tags, note });
return NextResponse.json({
success: true,
dataset: updatedDataset
});
} catch (error) {
console.error('Failed to update dataset metadata:', String(error));
return NextResponse.json(
{
error: error.message || 'Failed to update dataset metadata'
},
{ status: 500 }
);
}
}

View File

@@ -0,0 +1,52 @@
import { NextResponse } from 'next/server';
import { getDatasetsById } from '@/lib/db/datasets';
import { getEncoding } from '@langchain/core/utils/tiktoken';
/**
* Asynchronously count the tokens in a dataset's answer and chain-of-thought text
*/
export async function GET(request, { params }) {
try {
const { projectId, datasetId } = params;
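if (!datasetId) {
return NextResponse.json({ error: 'Dataset ID cannot be empty' }, { status: 400 });
}
const datasets = await getDatasetsById(datasetId);
// Return 404 instead of failing with a 500 when the dataset is missing
if (!datasets) {
return NextResponse.json({ error: 'Dataset does not exist' }, { status: 404 });
}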
const tokenCounts = {
answerTokens: 0,
cotTokens: 0
};
try {
if (datasets.answer || datasets.cot) {
// Use the cl100k_base encoding, which is the one used by gpt-3.5-turbo and gpt-4
const encoding = await getEncoding('cl100k_base');
if (datasets.answer) {
const tokens = encoding.encode(datasets.answer);
tokenCounts.answerTokens = tokens.length;
}
if (datasets.cot) {
const tokens = encoding.encode(datasets.cot);
tokenCounts.cotTokens = tokens.length;
}
}
} catch (error) {
console.error('Failed to count tokens:', String(error));
return NextResponse.json({ error: 'Failed to count tokens' }, { status: 500 });
}
return NextResponse.json(tokenCounts);
} catch (error) {
console.error('Failed to get token counts:', String(error));
return NextResponse.json(
{
error: error.message || 'Failed to get token counts'
},
{ status: 500 }
);
}
}

View File

@@ -0,0 +1,55 @@
/**
* Batch dataset evaluation task API
* Creates an asynchronous task that evaluates dataset quality in batches
*/
import { NextResponse } from 'next/server';
import { db } from '@/lib/db/index';
import { processTask } from '@/lib/services/tasks/index';
/**
* Create a batch dataset evaluation task
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
const { model, language = 'zh-CN' } = await request.json();
if (!projectId) {
return NextResponse.json({ success: false, message: 'Project ID cannot be empty' }, { status: 400 });
}
if (!model || !model.modelId) {
return NextResponse.json({ success: false, message: 'Model configuration cannot be empty' }, { status: 400 });
}
// Create the batch evaluation task
const newTask = await db.task.create({
data: {
projectId,
taskType: 'dataset-evaluation',
status: 0, // initial status: processing
modelInfo: JSON.stringify(model),
language: language || 'zh-CN',
detail: '',
totalCount: 0,
note: 'Preparing to start batch evaluation of dataset quality...',
completedCount: 0
}
});
// Process the task asynchronously (fire and forget)
processTask(newTask.id).catch(err => {
console.error(`Failed to start batch evaluation task: ${newTask.id}`, String(err));
});
return NextResponse.json({
success: true,
message: 'Batch evaluation task created',
data: { taskId: newTask.id }
});
} catch (error) {
console.error('Failed to create batch evaluation task:', error);
return NextResponse.json({ success: false, message: `Failed to create task: ${error.message}` }, { status: 500 });
}
}

View File

@@ -0,0 +1,128 @@
import { NextResponse } from 'next/server';
import {
getDatasets,
getBalancedDatasetsByTags,
getTagsWithDatasetCounts,
getDatasetsBatch,
getBalancedDatasetsByTagsBatch,
getDatasetsByIds,
getDatasetsByIdsBatch
} from '@/lib/db/datasets';
/**
* Get tag statistics (dataset counts per tag) for the project
*/
export async function GET(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
const confirmedParam = searchParams.get('confirmed');
const confirmed = confirmedParam === null ? undefined : confirmedParam === 'true';
// Get the tag statistics
const tagStats = await getTagsWithDatasetCounts(projectId, confirmed);
return NextResponse.json(tagStats);
} catch (error) {
console.error('Failed to get tag statistics:', String(error));
return NextResponse.json(
{
error: error.message || 'Failed to get tag statistics'
},
{ status: 500 }
);
}
}
/**
* Export datasets: one-shot or in batches, optionally balanced by tag or limited to selected IDs
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
const body = await request.json();
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
let status = body.status;
let confirmed = undefined;
if (status === 'confirmed') confirmed = true;
if (status === 'unconfirmed') confirmed = false;
// Check whether batched export mode is requested
const batchMode = body.batchMode ? 'true' : 'false';
const offset = body.offset ?? 0;
const batchSize = body.batchSize ?? 1000;
// Check whether balanced export is requested
const balanceMode = body.balanceMode ? 'true' : 'false';
const balanceConfig = body.balanceConfig;
// Check whether specific dataset IDs were selected
const selectedIds = Array.isArray(body.selectedIds) ? body.selectedIds : null;
if (batchMode === 'true') {
// Batched export mode
if (selectedIds && selectedIds.length > 0) {
// Batched export of the selected IDs
const datasets = await getDatasetsByIdsBatch(projectId, selectedIds, offset, batchSize);
const hasMore = datasets.length === batchSize;
return NextResponse.json({
data: datasets,
hasMore,
offset: offset + datasets.length
});
} else if (balanceMode === 'true' && balanceConfig) {
// Balanced batched export
const parsedConfig = typeof balanceConfig === 'string' ? JSON.parse(balanceConfig) : balanceConfig;
const result = await getBalancedDatasetsByTagsBatch(projectId, parsedConfig, confirmed, offset, batchSize);
return NextResponse.json({
data: result.data,
hasMore: result.hasMore,
offset: offset + result.data.length
});
} else {
// Regular batched export
const datasets = await getDatasetsBatch(projectId, confirmed, offset, batchSize);
const hasMore = datasets.length === batchSize;
return NextResponse.json({
data: datasets,
hasMore,
offset: offset + datasets.length
});
}
} else {
// Legacy one-shot export mode (kept for backward compatibility)
if (selectedIds && selectedIds.length > 0) {
// Export the selected IDs
const datasets = await getDatasetsByIds(projectId, selectedIds);
return NextResponse.json(datasets);
} else if (balanceMode === 'true' && balanceConfig) {
// Balanced export mode
const parsedConfig = typeof balanceConfig === 'string' ? JSON.parse(balanceConfig) : balanceConfig;
const datasets = await getBalancedDatasetsByTags(projectId, parsedConfig, confirmed);
return NextResponse.json(datasets);
} else {
// Regular export mode
const datasets = await getDatasets(projectId, confirmed);
return NextResponse.json(datasets);
}
}
} catch (error) {
console.error('Failed to get datasets:', String(error));
return NextResponse.json(
{
error: error.message || 'Failed to get datasets'
},
{ status: 500 }
);
}
}
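
The batched branch of this handler returns `{ data, hasMore, offset }`, so a caller can keep requesting pages until `hasMore` is false. The sketch below shows one way a client might drain it. `exportAllInBatches`, `postBatch`, and `makeFakeServer` are hypothetical names introduced for illustration; `postBatch` stands in for the real HTTP POST, and the fake server only mirrors the regular batched branch.

```javascript
// Sketch of a client draining the batched export endpoint.
// `postBatch` is a hypothetical function standing in for the HTTP POST; it receives
// { batchMode, offset, batchSize } and resolves to { data, hasMore, offset }.
async function exportAllInBatches(postBatch, batchSize = 1000) {
  const all = [];
  let offset = 0;
  let hasMore = true;
  while (hasMore) {
    const page = await postBatch({ batchMode: true, offset, batchSize });
    all.push(...page.data);
    // The server reports the next offset, so the client does not compute it.
    offset = page.offset;
    hasMore = page.hasMore;
  }
  return all;
}

// In-memory stand-in for the server, mirroring the regular batched branch:
// hasMore is inferred from whether the page came back full.
function makeFakeServer(records) {
  return async ({ offset, batchSize }) => {
    const data = records.slice(offset, offset + batchSize);
    return { data, hasMore: data.length === batchSize, offset: offset + data.length };
  };
}

exportAllInBatches(makeFakeServer([1, 2, 3, 4, 5]), 2).then(rows => console.log(rows.length)); // 5
```

Because `hasMore` is inferred from a full page, a total that is an exact multiple of `batchSize` costs one extra request that returns an empty page; the loop above handles that case without special treatment.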

View File

@@ -0,0 +1,44 @@
import { NextResponse } from 'next/server';
import { getDatasetsById } from '@/lib/db/datasets';
import LLMClient from '@/lib/llm/core/index';
import { getEvalQuestionPrompt } from '@/lib/llm/prompts/evalQuestion';
import { extractJsonFromLLMOutput } from '@/lib/llm/common/util';
export async function POST(request, { params }) {
try {
const { projectId } = params;
const { datasetId, model, language, questionType = 'open_ended', count = 1 } = await request.json();
if (!datasetId || !model) {
return NextResponse.json({ error: 'Missing required parameters' }, { status: 400 });
}
// 1. Fetch the source dataset
const dataset = await getDatasetsById(datasetId);
if (!dataset) {
return NextResponse.json({ error: 'Dataset not found' }, { status: 404 });
}
// 2. Build the prompt
// Combine the original question and answer into the context text
const text = `Question: ${dataset.question}\nAnswer: ${dataset.answer}`;
const prompt = await getEvalQuestionPrompt(language || 'zh-CN', questionType, { text, number: count }, projectId);
// 3. Call the LLM
const client = new LLMClient(model);
const response = await client.getResponse(prompt);
const result = extractJsonFromLLMOutput(response);
// The result is expected to be an array
if (!result || !Array.isArray(result)) {
throw new Error('Failed to parse LLM output or output is not an array');
}
return NextResponse.json({ success: true, data: result });
} catch (error) {
console.error('Generate eval variant failed:', error);
return NextResponse.json({ error: error.message || 'Internal Server Error' }, { status: 500 });
}
}

View File

@@ -0,0 +1,109 @@
import { NextResponse } from 'next/server';
import { createDataset } from '@/lib/db/datasets';
import { nanoid } from 'nanoid';
export async function POST(request, { params }) {
try {
const { projectId } = params;
const { datasets, sourceInfo } = await request.json();
if (!datasets || !Array.isArray(datasets)) {
return NextResponse.json({ error: 'Invalid datasets data' }, { status: 400 });
}
const results = [];
const errors = [];
let successCount = 0;
let skippedCount = 0;
for (let i = 0; i < datasets.length; i++) {
try {
const dataset = datasets[i];
// Safely read and sanitize the fields
const q = typeof dataset?.question === 'string' ? dataset.question.trim() : '';
const a = typeof dataset?.answer === 'string' ? dataset.answer.trim() : '';
// Validate required fields: skip the record if any is missing
if (!q || !a) {
errors.push(`Record ${i + 1} is missing required fields (question/answer) and was skipped`);
skippedCount++;
continue;
}
// Normalize optional fields
const chunkName = dataset?.chunkName || 'Imported Data';
const chunkContent = dataset?.chunkContent || 'Imported from external source';
const model = dataset?.model || 'imported';
const questionLabel = dataset?.questionLabel || '';
const cot = typeof dataset?.cot === 'string' ? dataset.cot : '';
const confirmed = typeof dataset?.confirmed === 'boolean' ? dataset.confirmed : false;
const score = typeof dataset?.score === 'number' ? dataset.score : 0;
// tags: accepts an array, a string, or an object
let tags = '[]';
if (Array.isArray(dataset?.tags)) {
try {
tags = JSON.stringify(dataset.tags);
} catch {
tags = '[]';
}
} else if (typeof dataset?.tags === 'string') {
tags = dataset.tags;
} else if (dataset?.tags && typeof dataset.tags === 'object') {
try {
tags = JSON.stringify(dataset.tags);
} catch {
tags = '[]';
}
}
// other: accepts an object or a string
let other = '{}';
if (typeof dataset?.other === 'string') {
other = dataset.other;
} else if (dataset?.other && typeof dataset.other === 'object') {
try {
other = JSON.stringify(dataset.other);
} catch {
other = '{}';
}
}
const note = typeof dataset?.note === 'string' ? dataset.note : '';
// Create the dataset record
const newDataset = await createDataset({
projectId,
questionId: nanoid(), // generate a unique question ID
question: q,
answer: a,
chunkName,
chunkContent,
model,
questionLabel,
cot,
confirmed,
score,
tags,
note,
other
});
results.push(newDataset);
successCount++;
} catch (error) {
errors.push(`Record ${i + 1}: ${error.message}`);
}
}
return NextResponse.json({
success: successCount,
total: datasets.length,
failed: errors.length,
skipped: skippedCount,
errors,
sourceInfo
});
} catch (error) {
console.error('Import datasets error:', error);
return NextResponse.json({ error: error.message }, { status: 500 });
}
}

View File

@@ -0,0 +1,89 @@
import { NextResponse } from 'next/server';
import { getDatasetsById, updateDataset } from '@/lib/db/datasets';
import { getQuestionById } from '@/lib/db/questions';
import { getChunkById } from '@/lib/db/chunks';
import LLMClient from '@/lib/llm/core/index';
import { getNewAnswerPrompt } from '@/lib/llm/prompts/newAnswer';
import { extractJsonFromLLMOutput } from '@/lib/llm/common/util';
// Optimize a dataset's answer
export async function POST(request, { params }) {
try {
const { projectId } = params;
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
// Read the request body
const { datasetId, model, advice, language } = await request.json();
if (!datasetId) {
return NextResponse.json({ error: 'Dataset ID cannot be empty' }, { status: 400 });
}
if (!model) {
return NextResponse.json({ error: 'Model cannot be empty' }, { status: 400 });
}
if (!advice) {
return NextResponse.json({ error: 'Please provide optimization suggestions' }, { status: 400 });
}
// Fetch the dataset
const dataset = await getDatasetsById(datasetId);
if (!dataset) {
return NextResponse.json({ error: 'Dataset does not exist' }, { status: 404 });
}
// Create the LLM client
const llmClient = new LLMClient(model);
const { question, answer, cot, chunkContent: storedChunkContent, questionId } = dataset;
let chunkContent = storedChunkContent || '';
if (!chunkContent && questionId) {
try {
const questionRecord = await getQuestionById(questionId);
if (questionRecord?.chunkId) {
const chunkRecord = await getChunkById(questionRecord.chunkId);
chunkContent = chunkRecord?.content || '';
}
} catch (error) {
console.error('Failed to load chunk content by questionId:', error);
}
}
// Generate the optimized answer and chain of thought
const prompt = await getNewAnswerPrompt(language, { question, answer, cot, advice, chunkContent }, projectId);
const response = await llmClient.getResponse(prompt);
// Extract the JSON-formatted optimization result from the LLM output
const optimizedResult = extractJsonFromLLMOutput(response);
if (!optimizedResult || !optimizedResult.answer) {
return NextResponse.json({ error: 'Failed to optimize answer, please try again' }, { status: 500 });
}
// Update the dataset
const updatedDataset = {
...dataset,
answer: optimizedResult.answer,
cot: cot ? optimizedResult.cot || cot : '' // only update the chain of thought if the dataset already had one
};
await updateDataset(updatedDataset);
// Return the optimized dataset
return NextResponse.json({
success: true,
dataset: updatedDataset
});
} catch (error) {
console.error('Failed to optimize answer:', String(error));
return NextResponse.json({ error: error.message || 'Failed to optimize answer' }, { status: 500 });
}
}

View File

@@ -0,0 +1,193 @@
import { NextResponse } from 'next/server';
import {
deleteDataset,
getDatasetsByPagination,
getDatasetsIds,
getDatasetsById,
updateDataset
} from '@/lib/db/datasets';
import datasetService from '@/lib/services/datasets';
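// The chain-of-thought optimization function has been moved to the service layer
/**
* Generate a dataset (generate an answer for a single question)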
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
const { questionId, model, language } = await request.json();
// Use the dataset generation service
const result = await datasetService.generateDatasetForQuestion(projectId, questionId, {
model,
language
});
return NextResponse.json(result);
} catch (error) {
console.error('Failed to generate dataset:', String(error));
return NextResponse.json(
{
error: error.message || 'Failed to generate dataset'
},
{ status: 500 }
);
}
}
/**
* Get the project's datasets (paginated and filtered), or only the matching IDs when selectedAll is set
*/
export async function GET(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
const page = parseInt(searchParams.get('page'), 10) || 1;
const size = parseInt(searchParams.get('size'), 10) || 10;
const input = searchParams.get('input');
const field = searchParams.get('field') || 'question';
const status = searchParams.get('status');
const hasCot = searchParams.get('hasCot');
const isDistill = searchParams.get('isDistill');
const scoreRange = searchParams.get('scoreRange');
const customTag = searchParams.get('customTag');
const noteKeyword = searchParams.get('noteKeyword');
const chunkName = searchParams.get('chunkName');
let confirmed = undefined;
if (status === 'confirmed') confirmed = true;
if (status === 'unconfirmed') confirmed = false;
let selectedAll = searchParams.get('selectedAll');
if (selectedAll) {
let data = await getDatasetsIds(
projectId,
confirmed,
input,
field,
hasCot,
isDistill,
scoreRange,
customTag,
noteKeyword,
chunkName
);
return NextResponse.json(data);
}
// Fetch the datasets
const datasets = await getDatasetsByPagination(
projectId,
page,
size,
confirmed,
input,
field, // search field
hasCot, // chain-of-thought filter
isDistill, // distilled-dataset filter
scoreRange, // score range filter
customTag, // custom tag filter
noteKeyword, // note keyword filter
chunkName // chunk name filter
);
return NextResponse.json(datasets);
} catch (error) {
console.error('Failed to get datasets:', String(error));
return NextResponse.json(
{
error: error.message || 'Failed to get datasets'
},
{ status: 500 }
);
}
}
/**
* Delete a dataset
*/
export async function DELETE(request) {
try {
const { searchParams } = new URL(request.url);
const datasetId = searchParams.get('id');
if (!datasetId) {
return NextResponse.json(
{
error: 'Dataset ID cannot be empty'
},
{ status: 400 }
);
}
await deleteDataset(datasetId);
return NextResponse.json({
success: true,
message: 'Dataset deleted successfully'
});
} catch (error) {
console.error('Failed to delete dataset:', error);
return NextResponse.json(
{
error: error.message || 'Failed to delete dataset'
},
{ status: 500 }
);
}
}
/**
* Edit a dataset
*/
export async function PATCH(request) {
try {
const { searchParams } = new URL(request.url);
const datasetId = searchParams.get('id');
const { answer, cot, question, confirmed } = await request.json();
if (!datasetId) {
return NextResponse.json(
{
error: 'Dataset ID cannot be empty'
},
{ status: 400 }
);
}
// Fetch the existing dataset
let dataset = await getDatasetsById(datasetId);
if (!dataset) {
return NextResponse.json(
{
error: 'Dataset does not exist'
},
{ status: 404 }
);
}
let data = { id: datasetId };
if (confirmed !== undefined) data.confirmed = confirmed;
if (answer) data.answer = answer;
if (cot) data.cot = cot;
if (question) data.question = question;
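// Persist the changed fields
await updateDataset(data);
return NextResponse.json({
success: true,
message: 'Dataset updated successfully',
// Merge the changes so the response reflects the updated record, not the stale one
dataset: { ...dataset, ...data }
});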
} catch (error) {
console.error('Failed to update dataset:', String(error));
return NextResponse.json(
{
error: error.message || 'Failed to update dataset'
},
{ status: 500 }
);
}
}

View File

@@ -0,0 +1,28 @@
import { NextResponse } from 'next/server';
import { getUsedCustomTags } from '@/lib/db/datasets';
/**
* Get the custom tags that have been used in the project
*/
export async function GET(request, { params }) {
try {
const { projectId } = params;
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
const tags = await getUsedCustomTags(projectId);
return NextResponse.json({ tags });
} catch (error) {
console.error('Failed to get custom tags:', String(error));
return NextResponse.json(
{
error: error.message || 'Failed to get custom tags'
},
{ status: 500 }
);
}
}

View File

@@ -0,0 +1,38 @@
import { NextResponse } from 'next/server';
// Get the default prompt content
export async function GET(request, { params }) {
try {
const { searchParams } = new URL(request.url);
const promptType = searchParams.get('promptType');
const promptKey = searchParams.get('promptKey');
if (!promptType || !promptKey) {
return NextResponse.json({ error: 'promptType and promptKey are required' }, { status: 400 });
}
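// Only allow plain module names so the dynamic import cannot escape the prompts directory
if (!/^[\w-]+$/.test(promptType)) {
return NextResponse.json({ error: 'Invalid promptType' }, { status: 400 });
}
// Dynamically import the matching prompt module
let promptModule;
try {
promptModule = await import(`@/lib/llm/prompts/${promptType}`);
} catch (error) {
return NextResponse.json({ error: `Prompt module ${promptType} not found` }, { status: 404 });
}
// Look up the requested prompt constant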
const promptContent = promptModule[promptKey];
if (!promptContent) {
return NextResponse.json({ error: `Prompt key ${promptKey} not found in module ${promptType}` }, { status: 404 });
}
return NextResponse.json({
success: true,
content: promptContent,
promptType,
promptKey
});
} catch (error) {
console.error('Failed to get default prompt:', error);
return NextResponse.json({ error: error.message }, { status: 500 });
}
}

View File

@@ -0,0 +1,67 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db';
/**
* Get the list of questions for a tag ID
*/
export async function GET(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
const tagId = searchParams.get('tagId');
// Validate parameters
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
if (!tagId) {
return NextResponse.json({ error: 'Tag ID cannot be empty' }, { status: 400 });
}
// Fetch the tag
const tag = await db.tags.findUnique({
where: { id: tagId }
});
if (!tag) {
return NextResponse.json({ error: 'Tag does not exist' }, { status: 404 });
}
// Get or create the distillation chunk
let distillChunk = await db.chunks.findFirst({
where: {
projectId,
name: 'Distilled Content'
}
});
if (!distillChunk) {
// Create a special chunk that holds distilled content
distillChunk = await db.chunks.create({
data: {
name: 'Distilled Content',
projectId,
fileId: 'distilled',
fileName: 'distilled.md',
content:
'This text block is used to store questions generated through data distillation and is not related to actual literature.',
summary: 'Questions generated through data distillation',
size: 0
}
});
}
const questions = await db.questions.findMany({
where: {
projectId,
label: tag.label,
chunkId: distillChunk.id
}
});
return NextResponse.json(questions);
} catch (error) {
console.error('[distill/questions/by-tag] Failed to get questions:', String(error));
return NextResponse.json({ error: error.message || 'Failed to get questions' }, { status: 500 });
}
}

View File

@@ -0,0 +1,101 @@
import { NextResponse } from 'next/server';
import { distillQuestionsPrompt } from '@/lib/llm/prompts/distillQuestions';
import { db } from '@/lib/db';
const LLMClient = require('@/lib/llm/core');
/**
* Question generation endpoint: generates the requested number of questions for a given tag path
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
const { tagPath, currentTag, tagId, count = 5, model, language = 'zh' } = await request.json();
if (!currentTag || !tagPath) {
const errorMsg = language === 'en' ? 'Tag information cannot be empty' : '标签信息不能为空';
return NextResponse.json({ error: errorMsg }, { status: 400 });
}
// First, get or create the distillation chunk
let distillChunk = await db.chunks.findFirst({
where: {
projectId,
name: 'Distilled Content'
}
});
if (!distillChunk) {
// Create a special chunk that holds distilled content
distillChunk = await db.chunks.create({
data: {
name: 'Distilled Content',
projectId,
fileId: 'distilled',
fileName: 'distilled.md',
content:
'This text block is used to store questions generated through data distillation and is not related to actual literature.',
summary: 'Questions generated through data distillation',
size: 0
}
});
}
// Fetch the existing questions to avoid generating duplicates
const existingQuestions = await db.questions.findMany({
where: {
projectId,
label: currentTag,
chunkId: distillChunk.id // use the distillation chunk's ID
},
select: { question: true }
});
const existingQuestionTexts = existingQuestions.map(q => q.question);
const llmClient = new LLMClient(model);
const prompt = await distillQuestionsPrompt(
language,
{ tagPath, currentTag, count, existingQuestionTexts },
projectId
);
const { answer } = await llmClient.getResponseWithCOT(prompt);
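let questions = [];
try {
questions = JSON.parse(answer);
} catch (error) {
console.error('Failed to parse questions JSON:', String(error));
// Fall back to extracting quoted strings with a regular expression
const matches = answer.match(/"([^"]+)"/g);
if (matches) {
questions = matches.map(match => match.replace(/"/g, ''));
}
}
// Guard against valid JSON that is not an array (for example an object or a string)
if (!Array.isArray(questions)) questions = [];
// Save the questions to the database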
const savedQuestions = [];
for (const questionText of questions) {
const question = await db.questions.create({
data: {
question: questionText,
projectId,
label: currentTag,
chunkId: distillChunk.id
}
});
savedQuestions.push(question);
}
return NextResponse.json(savedQuestions);
} catch (error) {
console.error('Failed to generate questions:', String(error));
return NextResponse.json({ error: error.message || 'Failed to generate questions' }, { status: 500 });
}
}
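
The handler above parses the model's answer with `JSON.parse` and falls back to a regular expression that collects quoted strings. A minimal sketch of that fallback as a reusable function follows; `parseStringList` is a hypothetical name, and the choice to return an empty list for valid non-array JSON is an assumption made here, not something the repository defines.

```javascript
// Hypothetical helper (not part of the repository): turns an LLM answer into a
// list of strings, preferring strict JSON and falling back to quoted substrings.
function parseStringList(answer) {
  let parsed;
  try {
    parsed = JSON.parse(answer);
  } catch {
    // Not valid JSON (e.g. wrapped in prose or a code fence): collect every "quoted" run.
    const matches = answer.match(/"([^"]+)"/g);
    return matches ? matches.map(match => match.replace(/"/g, '')) : [];
  }
  // Valid JSON that is not an array is treated as an empty result.
  return Array.isArray(parsed) ? parsed.filter(item => typeof item === 'string') : [];
}

console.log(parseStringList('["What is X?","Why Y?"]')); // [ 'What is X?', 'Why Y?' ]
console.log(parseStringList('Sure! "What is X?", "Why Y?"')); // [ 'What is X?', 'Why Y?' ]
```

The regex fallback is lossy: it cannot represent strings that contain escaped quotes, so it is best seen as a last resort rather than a parser.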

View File

@@ -0,0 +1,61 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db';
/**
* Tag update endpoint
*/
export async function PUT(request, { params }) {
try {
const { projectId, tagId } = params;
// Validate parameters
if (!projectId || !tagId) {
return NextResponse.json({ error: 'Project ID and tag ID cannot be empty' }, { status: 400 });
}
const { label } = await request.json();
if (!label || !label.trim()) {
return NextResponse.json({ error: 'Tag name cannot be empty' }, { status: 400 });
}
// Check that the tag exists
const existingTag = await db.tags.findUnique({
where: { id: tagId }
});
if (!existingTag) {
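return NextResponse.json({ error: 'Tag does not exist' }, { status: 404 });
}
// Check that the tag belongs to this project
if (existingTag.projectId !== projectId) {
return NextResponse.json({ error: 'No permission to edit this tag' }, { status: 403 });
}
// Check whether the new name is already used by a sibling tag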
const duplicateTag = await db.tags.findFirst({
where: {
projectId,
label: label.trim(),
parentId: existingTag.parentId,
id: { not: tagId }
}
});
if (duplicateTag) {
return NextResponse.json({ error: 'A sibling tag with this name already exists' }, { status: 400 });
}
// Update the tag
const updatedTag = await db.tags.update({
where: { id: tagId },
data: { label: label.trim() }
});
return NextResponse.json(updatedTag);
} catch (error) {
console.error('[tag edit] Failed to update tag:', String(error));
return NextResponse.json({ error: error.message || 'Failed to update tag' }, { status: 500 });
}
}

View File

@@ -0,0 +1,31 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db';
/**
* Get all distillation tags of the project
*/
export async function GET(request, { params }) {
try {
const { projectId } = params;
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
// Fetch all tags
const tags = await db.tags.findMany({
where: {
projectId
},
orderBy: {
label: 'asc'
}
});
return NextResponse.json(tags);
} catch (error) {
console.error('Failed to get distillation tags:', String(error));
return NextResponse.json({ error: error.message || 'Failed to get distillation tags' }, { status: 500 });
}
}

View File

@@ -0,0 +1,88 @@
import { NextResponse } from 'next/server';
import { distillTagsPrompt } from '@/lib/llm/prompts/distillTags';
import { db } from '@/lib/db';
import { getProject } from '@/lib/db/projects';
const LLMClient = require('@/lib/llm/core');
/**
* Tag generation endpoint: generates the requested number of child tags under a top-level topic or a given parent tag
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
const { parentTag, parentTagId, tagPath, count = 10, model, language = 'zh' } = await request.json();
if (!parentTag) {
const errorMsg = language === 'en' ? 'Topic tag name cannot be empty' : '主题标签名称不能为空';
return NextResponse.json({ error: errorMsg }, { status: 400 });
}
// Query the existing tags at this level
const existingTags = await db.tags.findMany({
where: {
projectId,
parentId: parentTagId || null
}
});
const existingTagNames = existingTags.map(tag => tag.label);
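// Create the LLM client
const llmClient = new LLMClient(model);
// Build the prompt
const prompt = await distillTagsPrompt(
language,
{ tagPath, parentTag, existingTags: existingTagNames, count },
projectId
);
// Call the LLM to generate tags
const { answer } = await llmClient.getResponseWithCOT(prompt);
// Parse the returned tags
let tags = [];
try {
tags = JSON.parse(answer);
} catch (error) {
console.error('Failed to parse tags JSON:', String(error));
// Fall back to extracting quoted strings with a regular expression
const matches = answer.match(/"([^"]+)"/g);
if (matches) {
tags = matches.map(match => match.replace(/"/g, ''));
}
}
// Guard against valid JSON that is not an array (for example an object or a string)
if (!Array.isArray(tags)) tags = [];
// Save the tags to the database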
const savedTags = [];
for (let i = 0; i < tags.length; i++) {
const tagName = tags[i];
try {
const tag = await db.tags.create({
data: {
label: tagName,
projectId,
parentId: parentTagId || null
}
});
savedTags.push(tag);
} catch (error) {
console.error(`[tag generation] Failed to save tag ${tagName}:`, String(error));
throw error;
}
}
return NextResponse.json(savedTags);
} catch (error) {
console.error('[tag generation] Failed to generate tags:', String(error));
console.error('[tag generation] Error stack:', error.stack);
return NextResponse.json({ error: error.message || 'Failed to generate tags' }, { status: 500 });
}
}

View File

@@ -0,0 +1,108 @@
import { NextResponse } from 'next/server';
import { getEvalQuestionById, updateEvalQuestion, deleteEvalQuestion } from '@/lib/db/evalDatasets';
import { db } from '@/lib/db/index';
/**
* Get evaluation dataset details by ID
* Supports operateType=prev|next to navigate neighbors
*/
export async function GET(request, { params }) {
try {
const { projectId, evalId } = params;
const { searchParams } = new URL(request.url);
const operateType = searchParams.get('operateType');
// Navigation request (prev/next)
if (operateType) {
const current = await db.evalDatasets.findUnique({
where: { id: evalId },
select: { createAt: true }
});
if (!current) {
return NextResponse.json(null);
}
let neighbor = null;
if (operateType === 'prev') {
// Get previous item (newer createAt when list is sorted desc)
neighbor = await db.evalDatasets.findFirst({
where: {
projectId,
createAt: { gt: current.createAt }
},
orderBy: { createAt: 'asc' },
select: { id: true }
});
} else if (operateType === 'next') {
// Get next item (older createAt)
neighbor = await db.evalDatasets.findFirst({
where: {
projectId,
createAt: { lt: current.createAt }
},
orderBy: { createAt: 'desc' },
select: { id: true }
});
}
return NextResponse.json(neighbor || null);
}
// Regular detail request
const evalQuestion = await getEvalQuestionById(evalId);
if (!evalQuestion) {
return NextResponse.json({ error: 'Eval question not found' }, { status: 404 });
}
return NextResponse.json(evalQuestion);
} catch (error) {
console.error('Failed to get eval question:', error);
return NextResponse.json({ error: error.message || 'Failed to get eval question' }, { status: 500 });
}
}
/**
* Update evaluation dataset
*/
export async function PUT(request, { params }) {
try {
const { evalId } = params;
const data = await request.json();
// Only allow specific fields
const allowedFields = ['question', 'options', 'correctAnswer', 'tags', 'note'];
const updateData = {};
for (const field of allowedFields) {
if (data[field] !== undefined) {
updateData[field] = data[field];
}
}
const updated = await updateEvalQuestion(evalId, updateData);
return NextResponse.json(updated);
} catch (error) {
console.error('Failed to update eval question:', error);
return NextResponse.json({ error: error.message || 'Failed to update eval question' }, { status: 500 });
}
}
/**
* Delete evaluation dataset
*/
export async function DELETE(request, { params }) {
try {
const { evalId } = params;
await deleteEvalQuestion(evalId);
return NextResponse.json({ success: true });
} catch (error) {
console.error('Failed to delete eval question:', error);
return NextResponse.json({ error: error.message || 'Failed to delete eval question' }, { status: 500 });
}
}
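
The `prev`/`next` navigation in the GET handler relies on the list being shown newest first: `prev` is the closest record with a larger `createAt`, and `next` is the closest record with a smaller one. The in-memory sketch below illustrates that selection rule only. `findNeighbor` is a hypothetical name, and plain numbers stand in for the `createAt` timestamps.

```javascript
// Hypothetical in-memory illustration of the neighbor lookup (not repository code).
// `items` is an array of { id, createAt }, with createAt as a comparable number.
function findNeighbor(items, currentId, operateType) {
  const current = items.find(item => item.id === currentId);
  if (!current) return null;
  const ascending = [...items].sort((a, b) => a.createAt - b.createAt);
  if (operateType === 'prev') {
    // Closest newer record: first match when scanning in ascending order.
    return ascending.find(item => item.createAt > current.createAt) || null;
  }
  if (operateType === 'next') {
    // Closest older record: first match when scanning in descending order.
    return [...ascending].reverse().find(item => item.createAt < current.createAt) || null;
  }
  return null;
}

const items = [
  { id: 'a', createAt: 1 },
  { id: 'b', createAt: 2 },
  { id: 'c', createAt: 3 }
];
console.log(findNeighbor(items, 'b', 'prev').id); // c
console.log(findNeighbor(items, 'b', 'next').id); // a
```

One consequence of comparing with strict `gt`/`lt` is that records sharing the same `createAt` value are skipped over as neighbors of each other.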

View File

@@ -0,0 +1,63 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db';
import { buildEvalQuestionWhere } from '@/lib/db/evalDatasets';
export async function GET(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
const questionType = searchParams.get('questionType') || '';
const keyword = searchParams.get('keyword') || '';
const chunkId = searchParams.get('chunkId') || '';
const questionTypes = searchParams.getAll('questionTypes') || [];
const tags =
searchParams.getAll('tags').length > 0
? searchParams.getAll('tags')
: searchParams.get('tag')
? searchParams.get('tag').split(',')
: [];
const where = buildEvalQuestionWhere(projectId, {
questionType: questionType || undefined,
questionTypes: questionTypes.length > 0 ? questionTypes : undefined,
keyword: keyword || undefined,
chunkId: chunkId || undefined,
tags: tags.length > 0 ? tags : undefined
});
const [total, byTypeRaw] = await Promise.all([
db.evalDatasets.count({ where }),
db.evalDatasets.groupBy({
by: ['questionType'],
where,
_count: { id: true }
})
]);
const byType = {};
byTypeRaw.forEach(item => {
byType[item.questionType] = item._count.id;
});
const hasShortAnswer = (byType.short_answer || 0) > 0;
const hasOpenEnded = (byType.open_ended || 0) > 0;
const hasSubjective = hasShortAnswer || hasOpenEnded;
return NextResponse.json(
{
code: 0,
data: { total, byType, hasSubjective, hasShortAnswer, hasOpenEnded }
},
{ status: 200 }
);
} catch (error) {
console.error('Failed to count eval datasets:', error);
return NextResponse.json(
{ code: 500, error: 'Failed to count eval datasets', message: error.message },
{ status: 500 }
);
}
}
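The count route above accepts tags either as repeated `tags` parameters or as one comma-separated `tag` parameter, with the repeated form taking precedence. A minimal sketch of that rule, built on the standard `URLSearchParams` API. The helper name `parseTagsParam` is hypothetical (it does not exist in the project), and the trimming and dropping of empty entries is an assumption added here, not something the route does:

```javascript
// Hypothetical helper, not part of the project source.
// Repeated `tags` params win; otherwise fall back to a comma-separated `tag`.
function parseTagsParam(searchParams) {
  const repeated = searchParams.getAll('tags');
  if (repeated.length > 0) return repeated;

  const single = searchParams.get('tag');
  if (!single) return [];

  // Assumption: trim entries and drop empties (the route keeps them as-is).
  return single
    .split(',')
    .map(t => t.trim())
    .filter(Boolean);
}
```

Called as `parseTagsParam(new URL(request.url).searchParams)`, this would replace the nested ternary while keeping the same precedence.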

@@ -0,0 +1,231 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db/index';
import { buildEvalQuestionWhere } from '@/lib/db/evalDatasets';
const BATCH_SIZE = 500;
/**
* Convert an evaluation item to a CSV row
*/
function convertToCSVRow(item, isHeader = false) {
if (isHeader) {
return ['questionType', 'question', 'options', 'correctAnswer', 'tags'].join(',');
}
const escapeCSV = str => {
if (str === null || str === undefined) return '';
const strValue = String(str);
if (strValue.includes(',') || strValue.includes('"') || strValue.includes('\n')) {
return `"${strValue.replace(/"/g, '""')}"`;
}
return strValue;
};
return [
escapeCSV(item.questionType),
escapeCSV(item.question),
escapeCSV(item.options),
escapeCSV(item.correctAnswer),
escapeCSV(item.tags)
].join(',');
}
/**
* Convert an evaluation item to export format
*/
function formatExportItem(item) {
return {
questionType: item.questionType,
question: item.question,
options: item.options,
correctAnswer: item.correctAnswer,
tags: item.tags
};
}
/**
* Export evaluation datasets
* Supports JSON, JSONL, and CSV
* Uses batched streaming for large datasets
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
const body = await request.json();
const {
format = 'json', // json | jsonl | csv
questionTypes = [],
tags = [],
keyword = ''
} = body;
// Validate format
if (!['json', 'jsonl', 'csv'].includes(format)) {
return NextResponse.json({ code: 400, error: 'Unsupported export format' }, { status: 400 });
}
// Build query conditions
const where = buildEvalQuestionWhere(projectId, {
questionTypes: questionTypes.length > 0 ? questionTypes : undefined,
tags: tags.length > 0 ? tags : undefined,
keyword: keyword || undefined
});
// Fetch total count
const total = await db.evalDatasets.count({ where });
if (total === 0) {
return NextResponse.json({ code: 400, error: 'No data matches the criteria' }, { status: 400 });
}
// Return directly for small datasets
if (total <= 1000) {
const items = await db.evalDatasets.findMany({
where,
orderBy: { createAt: 'desc' }
});
const formattedItems = items.map(formatExportItem);
if (format === 'json') {
return new Response(JSON.stringify(formattedItems, null, 2), {
headers: {
'Content-Type': 'application/json',
'Content-Disposition': `attachment; filename="eval-datasets-${Date.now()}.json"`
}
});
}
if (format === 'jsonl') {
const jsonlContent = formattedItems.map(item => JSON.stringify(item)).join('\n');
return new Response(jsonlContent, {
headers: {
'Content-Type': 'application/x-ndjson',
'Content-Disposition': `attachment; filename="eval-datasets-${Date.now()}.jsonl"`
}
});
}
if (format === 'csv') {
const csvContent = [convertToCSVRow(null, true), ...items.map(item => convertToCSVRow(item))].join('\n');
return new Response('\uFEFF' + csvContent, {
headers: {
'Content-Type': 'text/csv; charset=utf-8',
'Content-Disposition': `attachment; filename="eval-datasets-${Date.now()}.csv"`
}
});
}
}
// Stream export for large datasets
const stream = new ReadableStream({
async start(controller) {
const encoder = new TextEncoder();
let isFirst = true;
// CSV outputs header row first
if (format === 'csv') {
controller.enqueue(encoder.encode('\uFEFF' + convertToCSVRow(null, true) + '\n'));
}
// JSON outputs opening bracket
if (format === 'json') {
controller.enqueue(encoder.encode('[\n'));
}
// Fetch data in batches
const totalBatches = Math.ceil(total / BATCH_SIZE);
for (let batch = 0; batch < totalBatches; batch++) {
const items = await db.evalDatasets.findMany({
where,
orderBy: { createAt: 'desc' },
skip: batch * BATCH_SIZE,
take: BATCH_SIZE
});
for (const item of items) {
const formattedItem = formatExportItem(item);
if (format === 'json') {
const prefix = isFirst ? '' : ',\n';
controller.enqueue(encoder.encode(prefix + JSON.stringify(formattedItem)));
isFirst = false;
} else if (format === 'jsonl') {
controller.enqueue(encoder.encode(JSON.stringify(formattedItem) + '\n'));
} else if (format === 'csv') {
controller.enqueue(encoder.encode(convertToCSVRow(item) + '\n'));
}
}
}
// JSON outputs closing bracket
if (format === 'json') {
controller.enqueue(encoder.encode('\n]'));
}
controller.close();
}
});
const contentTypes = {
json: 'application/json',
jsonl: 'application/x-ndjson',
csv: 'text/csv; charset=utf-8'
};
const extensions = {
json: 'json',
jsonl: 'jsonl',
csv: 'csv'
};
return new Response(stream, {
headers: {
'Content-Type': contentTypes[format],
'Content-Disposition': `attachment; filename="eval-datasets-${Date.now()}.${extensions[format]}"`,
'Transfer-Encoding': 'chunked'
}
});
} catch (error) {
console.error('Failed to export eval datasets:', error);
return NextResponse.json({ code: 500, error: error.message || 'Export failed' }, { status: 500 });
}
}
/**
* Get export preview (count only)
*/
export async function GET(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
// Parse query params
const questionTypes = searchParams.getAll('questionTypes');
const tags = searchParams.getAll('tags');
const keyword = searchParams.get('keyword') || '';
// Build query conditions
const where = buildEvalQuestionWhere(projectId, {
questionTypes: questionTypes.length > 0 ? questionTypes : undefined,
tags: tags.length > 0 ? tags : undefined,
keyword: keyword || undefined
});
// Count rows
const total = await db.evalDatasets.count({ where });
return NextResponse.json({
code: 0,
data: {
total,
isLargeDataset: total > 1000
}
});
} catch (error) {
console.error('Failed to get export preview:', error);
return NextResponse.json({ code: 500, error: error.message || 'Failed to get export preview' }, { status: 500 });
}
}
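The CSV branch of the export route quotes a field when it contains a comma, a double quote, or a newline, and doubles any embedded quotes. A small standalone sketch of that escaping rule. The names `escapeCsvField` and `toCsvLine` are hypothetical, and quoting on `\r` is an addition assumed here; the route itself only checks for `\n`:

```javascript
// Hypothetical helpers, not part of the project source.
function escapeCsvField(value) {
  if (value === null || value === undefined) return '';
  const text = String(value);
  // Quote when the field contains a delimiter, a quote, or a line break.
  // Assumption: \r is treated like \n (the route only checks \n).
  if (/[",\r\n]/.test(text)) {
    return `"${text.replace(/"/g, '""')}"`;
  }
  return text;
}

// Join already-ordered field values into one CSV line.
function toCsvLine(fields) {
  return fields.map(escapeCsvField).join(',');
}
```

Note that `options` and multiple-choice `correctAnswer` are stored as JSON strings, so they always contain quotes and will always be wrapped this way.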

@@ -0,0 +1,380 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db/index';
import { nanoid } from 'nanoid';
import * as XLSX from 'xlsx';
/**
* Validate true/false item schema
*/
function validateTrueFalse(item, index) {
const errors = [];
if (!item.question || typeof item.question !== 'string') {
errors.push(`Item ${index + 1}: missing or invalid "question"`);
}
if (!item.correctAnswer || (item.correctAnswer !== '✅' && item.correctAnswer !== '❌')) {
errors.push(`Item ${index + 1}: "correctAnswer" must be "✅" or "❌"`);
}
return errors;
}
/**
* Validate single-choice item schema
*/
function validateSingleChoice(item, index) {
const errors = [];
if (!item.question || typeof item.question !== 'string') {
errors.push(`Item ${index + 1}: missing or invalid "question"`);
}
// Normalize options
let options = item.options;
if (typeof options === 'string') {
try {
options = JSON.parse(options);
} catch (e) {
errors.push(`Item ${index + 1}: invalid "options" format; unable to parse`);
return errors;
}
}
if (!options || !Array.isArray(options) || options.length < 2) {
errors.push(`Item ${index + 1}: "options" must be an array with at least 2 items`);
}
if (!item.correctAnswer || !/^[A-Z]$/.test(item.correctAnswer)) {
errors.push(`Item ${index + 1}: "correctAnswer" must be a single uppercase letter (A-Z)`);
}
return errors;
}
/**
* Validate multiple-choice item schema
*/
function validateMultipleChoice(item, index) {
const errors = [];
if (!item.question || typeof item.question !== 'string') {
errors.push(`Item ${index + 1}: missing or invalid "question"`);
}
// Normalize options
let options = item.options;
if (typeof options === 'string') {
try {
options = JSON.parse(options);
} catch (e) {
errors.push(`Item ${index + 1}: invalid "options" format; unable to parse`);
return errors;
}
}
if (!options || !Array.isArray(options) || options.length < 2) {
errors.push(`Item ${index + 1}: "options" must be an array with at least 2 items`);
}
// Normalize correctAnswer
let correctAnswer = item.correctAnswer;
if (typeof correctAnswer === 'string') {
try {
correctAnswer = JSON.parse(correctAnswer);
} catch (e) {
errors.push(`Item ${index + 1}: invalid "correctAnswer" format; unable to parse`);
return errors;
}
}
if (!correctAnswer || !Array.isArray(correctAnswer) || correctAnswer.length < 1) {
errors.push(`Item ${index + 1}: "correctAnswer" must be an array with at least 1 item`);
}
// Validate each answer token
if (Array.isArray(correctAnswer)) {
for (const ans of correctAnswer) {
if (!/^[A-Z]$/.test(ans)) {
errors.push(`Item ${index + 1}: "${ans}" is not a valid option letter in "correctAnswer"`);
}
}
}
return errors;
}
/**
* Validate QA item schema (short_answer and open_ended)
*/
function validateQA(item, index) {
const errors = [];
if (!item.question || typeof item.question !== 'string') {
errors.push(`Item ${index + 1}: missing or invalid "question"`);
}
if (!item.correctAnswer || typeof item.correctAnswer !== 'string') {
errors.push(`Item ${index + 1}: missing or invalid "correctAnswer"`);
}
return errors;
}
/**
* Validate data by question type
*/
function validateData(data, questionType) {
const allErrors = [];
for (let i = 0; i < data.length; i++) {
let errors = [];
switch (questionType) {
case 'true_false':
errors = validateTrueFalse(data[i], i);
break;
case 'single_choice':
errors = validateSingleChoice(data[i], i);
break;
case 'multiple_choice':
errors = validateMultipleChoice(data[i], i);
break;
case 'short_answer':
case 'open_ended':
errors = validateQA(data[i], i);
break;
default:
errors = [`Unsupported question type: ${questionType}`];
}
allErrors.push(...errors);
}
return allErrors;
}
/**
* Parse an Excel file
*/
function parseExcel(buffer, questionType) {
const excelHeaders = {
question: '\u9898\u76ee',
correctAnswer: '\u6b63\u786e\u7b54\u6848',
answer: '\u7b54\u6848',
options: '\u9009\u9879'
};
const workbook = XLSX.read(buffer, { type: 'buffer' });
const sheetName = workbook.SheetNames[0];
const sheet = workbook.Sheets[sheetName];
const rawData = XLSX.utils.sheet_to_json(sheet, { defval: '' });
// Convert to normalized schema
const data = rawData.map(row => {
const item = {
question: row.question || row[excelHeaders.question] || '',
correctAnswer: row.correctAnswer || row[excelHeaders.correctAnswer] || row[excelHeaders.answer] || ''
};
// Handle options (choice questions)
if (questionType === 'single_choice' || questionType === 'multiple_choice') {
// Try to parse from options column
if (row.options || row[excelHeaders.options]) {
// Coerce to string first: spreadsheet cells may hold numbers
let optionsStr = String(row.options || row[excelHeaders.options]).trim();
// Replace single quotes so it becomes valid JSON
if (optionsStr.startsWith('[') && optionsStr.includes("'")) {
optionsStr = optionsStr.replace(/'/g, '"');
}
try {
// Try JSON parsing
item.options = JSON.parse(optionsStr);
} catch {
// Fallback: split by separators
item.options = optionsStr
.split(/[,;|]/)
.map(o => o.trim())
.filter(Boolean);
}
}
}
// Handle multiple-choice correctAnswer
if (questionType === 'multiple_choice') {
if (typeof item.correctAnswer === 'string') {
let answerStr = item.correctAnswer.trim();
// Replace single quotes so it becomes valid JSON
if (answerStr.startsWith('[') && answerStr.includes("'")) {
answerStr = answerStr.replace(/'/g, '"');
}
// Try JSON parsing
try {
item.correctAnswer = JSON.parse(answerStr);
} catch {
// Split a delimited string such as "A,B,C" (ASCII or full-width comma)
if (answerStr.includes(',') || answerStr.includes('\uFF0C')) {
item.correctAnswer = answerStr.split(/[,\uFF0C]/).map(a => a.trim().toUpperCase());
} else {
// Split characters such as "ABC" -> ["A", "B", "C"]
item.correctAnswer = answerStr
.toUpperCase()
.split('')
.filter(c => /[A-Z]/.test(c));
}
}
}
}
return item;
});
return data;
}
/**
* Parse a JSON file
*/
function parseJSON(content) {
return JSON.parse(content);
}
/**
* POST - Import evaluation datasets
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
const formData = await request.formData();
const file = formData.get('file');
const questionType = formData.get('questionType');
const tags = formData.get('tags') || '';
console.log(`[Import] Start processing. Project: ${projectId}, questionType: ${questionType}, tags: ${tags}`);
if (!file) {
return NextResponse.json({ code: 400, error: 'Please upload a file' }, { status: 400 });
}
if (!questionType) {
return NextResponse.json({ code: 400, error: 'Please select a question type' }, { status: 400 });
}
// Validate question type
const validTypes = ['true_false', 'single_choice', 'multiple_choice', 'short_answer', 'open_ended'];
if (!validTypes.includes(questionType)) {
return NextResponse.json({ code: 400, error: `Unsupported question type: ${questionType}` }, { status: 400 });
}
// Get file extension
const fileName = file.name;
const fileExt = fileName.split('.').pop().toLowerCase();
console.log(`[Import] File name: ${fileName}, extension: ${fileExt}`);
// Validate file type
if (!['json', 'xls', 'xlsx'].includes(fileExt)) {
return NextResponse.json(
{ code: 400, error: 'Unsupported file format. Please upload a json, xls, or xlsx file' },
{ status: 400 }
);
}
// Read file content
const buffer = await file.arrayBuffer();
let data = [];
// Parse file
console.log('[Import] Parsing file...');
if (fileExt === 'json') {
const content = new TextDecoder().decode(buffer);
data = parseJSON(content);
} else {
data = parseExcel(Buffer.from(buffer), questionType);
}
console.log(`[Import] Parsing completed. Total items: ${data.length}`);
if (!Array.isArray(data) || data.length === 0) {
return NextResponse.json({ code: 400, error: 'File is empty or has an invalid format' }, { status: 400 });
}
// Validate data
console.log('[Import] Validating data...');
const errors = validateData(data, questionType);
if (errors.length > 0) {
console.log(`[Import] Validation failed. Error count: ${errors.length}`);
return NextResponse.json(
{
code: 400,
error: 'Data validation failed',
details: errors.slice(0, 10),
totalErrors: errors.length
},
{ status: 400 }
);
}
console.log('[Import] Validation passed. Writing to database...');
// Prepare data
const now = new Date();
const evalDatasets = data.map(item => {
// Normalize options
let options = item.options;
if (typeof options === 'string') {
try {
options = JSON.parse(options);
} catch (e) {
// Keep original on parse failure
}
}
// Normalize correctAnswer
let correctAnswer = item.correctAnswer;
if (typeof correctAnswer === 'string' && questionType === 'multiple_choice') {
try {
correctAnswer = JSON.parse(correctAnswer);
} catch (e) {
// Keep original on parse failure
}
}
return {
id: nanoid(),
projectId,
question: item.question,
questionType,
options: options ? JSON.stringify(options) : '',
// For multiple_choice, store correctAnswer as JSON array string
correctAnswer: Array.isArray(correctAnswer) ? JSON.stringify(correctAnswer) : correctAnswer,
tags: tags || '',
note: '',
createAt: now,
updateAt: now
};
});
// Batch insert
const batchSize = 100;
let insertedCount = 0;
for (let i = 0; i < evalDatasets.length; i += batchSize) {
const batch = evalDatasets.slice(i, i + batchSize);
await db.evalDatasets.createMany({ data: batch });
insertedCount += batch.length;
console.log(`[Import] Inserted ${insertedCount}/${evalDatasets.length} items`);
}
console.log(`[Import] Import completed. Total inserted: ${insertedCount}`);
return NextResponse.json({
code: 0,
data: {
total: insertedCount,
questionType,
tags
},
message: `Successfully imported ${insertedCount} evaluation items`
});
} catch (error) {
console.error('[Import] Import failed:', error);
return NextResponse.json(
{
code: 500,
error: 'Import failed',
message: error.message
},
{ status: 500 }
);
}
}
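The Excel parser above accepts multiple-choice answers in several shapes: a JSON-like array (possibly with single quotes), a comma-delimited string, or a bare run of letters. A minimal sketch of that normalization as one pure function. The name `normalizeAnswerLetters` is hypothetical, and treating U+FF0C (the full-width comma) as a delimiter is an assumption based on the tag-splitting comment elsewhere in this commit:

```javascript
// Hypothetical helper, not part of the project source.
function normalizeAnswerLetters(raw) {
  const clean = values => values.map(v => String(v).trim().toUpperCase());
  if (Array.isArray(raw)) return clean(raw);

  const text = String(raw).trim();

  // 1. JSON-like array, tolerating single quotes: "['A','C']"
  if (text.startsWith('[')) {
    try {
      const parsed = JSON.parse(text.replace(/'/g, '"'));
      if (Array.isArray(parsed)) return clean(parsed);
    } catch {
      // Not valid JSON: fall through to the delimiter-based paths.
    }
  }

  // 2. Delimited list: "A,B" or "A\uFF0CB" (assumed full-width comma support)
  if (/[,\uFF0C]/.test(text)) {
    return text
      .split(/[,\uFF0C]/)
      .map(a => a.trim().toUpperCase())
      .filter(Boolean);
  }

  // 3. Bare letters: "abd" -> ["A", "B", "D"]
  return text
    .toUpperCase()
    .split('')
    .filter(c => /[A-Z]/.test(c));
}
```

Each returned token still has to pass the `/^[A-Z]$/` check in `validateMultipleChoice`; this sketch only normalizes shape and case.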

@@ -0,0 +1,164 @@
import { NextResponse } from 'next/server';
import { getEvalQuestionsWithPagination, getEvalQuestionsStats, deleteEvalQuestion } from '@/lib/db/evalDatasets';
/**
* Get project's evaluation dataset list (paginated)
*/
export async function GET(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
// Parse query params
const page = parseInt(searchParams.get('page') || '1', 10);
const pageSize = parseInt(searchParams.get('pageSize') || '20', 10);
const questionType = searchParams.get('questionType') || '';
const questionTypes = searchParams.getAll('questionTypes');
const keyword = searchParams.get('keyword') || '';
const chunkId = searchParams.get('chunkId') || '';
// Support multiple tags params or comma-separated tag
const tags =
searchParams.getAll('tags').length > 0
? searchParams.getAll('tags')
: searchParams.get('tag')
? searchParams.get('tag').split(',')
: [];
const includeStats = searchParams.get('includeStats') === 'true';
const queryOptions = {
page,
pageSize,
questionType: questionType || undefined,
questionTypes: questionTypes.length > 0 ? questionTypes : undefined,
keyword: keyword || undefined,
chunkId: chunkId || undefined,
tags: tags.length > 0 ? tags : undefined
};
if (includeStats) {
const [result, stats] = await Promise.all([
getEvalQuestionsWithPagination(projectId, queryOptions),
getEvalQuestionsStats(projectId)
]);
result.stats = stats;
return NextResponse.json(result);
}
const result = await getEvalQuestionsWithPagination(projectId, queryOptions);
return NextResponse.json(result);
} catch (error) {
console.error('Failed to get eval datasets:', error);
return NextResponse.json({ error: error.message || 'Failed to get eval datasets' }, { status: 500 });
}
}
/**
* Batch delete evaluation datasets
*/
export async function DELETE(request, { params }) {
try {
const { ids } = await request.json();
if (!ids || !Array.isArray(ids) || ids.length === 0) {
return NextResponse.json({ error: 'Invalid request: ids array is required' }, { status: 400 });
}
const results = await Promise.all(ids.map(id => deleteEvalQuestion(id).catch(err => ({ error: err.message, id }))));
// Optional chaining guards against deleteEvalQuestion resolving to undefined
const failed = results.filter(r => r?.error).length;
const deleted = results.length - failed;
return NextResponse.json({
success: true,
deleted,
failed,
message: `Successfully deleted ${deleted} items${failed > 0 ? `, ${failed} failed` : ''}`
});
} catch (error) {
console.error('Failed to delete eval datasets:', error);
return NextResponse.json({ error: error.message || 'Failed to delete eval datasets' }, { status: 500 });
}
}
/**
* Create a new evaluation dataset (or batch create)
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
const body = await request.json();
const { createEvalQuestion, createManyEvalQuestions } = require('@/lib/db/evalDatasets');
// Handle batch creation
if (Array.isArray(body) || (body.items && Array.isArray(body.items))) {
const items = Array.isArray(body) ? body : body.items;
if (items.length === 0) {
return NextResponse.json({ success: true, count: 0 });
}
// Validate items
const validItems = items
.map(item => {
// Ensure tags are in the correct format: convert an array to a comma-separated string
let tagsStr = item.tags || '';
if (Array.isArray(tagsStr)) {
tagsStr = tagsStr.join(',');
}
return {
projectId,
question: item.question,
questionType: item.questionType || 'open_ended',
correctAnswer:
typeof item.correctAnswer === 'object' ? JSON.stringify(item.correctAnswer) : item.correctAnswer,
tags: tagsStr,
note: item.note || '',
chunkId: item.chunkId || null,
options: item.options
? typeof item.options === 'object'
? JSON.stringify(item.options)
: item.options
: ''
};
})
.filter(item => item.question && item.correctAnswer);
if (validItems.length === 0) {
return NextResponse.json({ error: 'No valid items to create' }, { status: 400 });
}
const result = await createManyEvalQuestions(validItems);
return NextResponse.json({ success: true, count: result.count });
}
// Handle single creation
const { question, correctAnswer, questionType = 'open_ended', tags, note, chunkId, options } = body;
if (!question || !correctAnswer) {
return NextResponse.json({ error: 'Question and Correct Answer are required' }, { status: 400 });
}
// Ensure tags are in the correct format: convert an array to a comma-separated string
let tagsStr = tags || '';
if (Array.isArray(tagsStr)) {
tagsStr = tagsStr.join(',');
}
const evalDataset = await createEvalQuestion({
projectId,
question,
questionType,
correctAnswer: typeof correctAnswer === 'object' ? JSON.stringify(correctAnswer) : correctAnswer,
tags: tagsStr,
note: note || '',
chunkId: chunkId || null,
options: options ? (typeof options === 'object' ? JSON.stringify(options) : options) : ''
});
return NextResponse.json({ success: true, evalDataset });
} catch (error) {
console.error('Failed to create eval dataset:', error);
return NextResponse.json({ error: error.message || 'Failed to create eval dataset' }, { status: 500 });
}
}

@@ -0,0 +1,124 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db';
import { buildEvalQuestionWhere } from '@/lib/db/evalDatasets';
const SMALL_TOTAL_THRESHOLD = 5000;
const HARD_LIMIT = 50000;
function shuffleArray(arr) {
const result = [...arr];
for (let i = result.length - 1; i > 0; i -= 1) {
const j = Math.floor(Math.random() * (i + 1));
[result[i], result[j]] = [result[j], result[i]];
}
return result;
}
export async function POST(request, { params }) {
try {
const { projectId } = params;
const body = await request.json();
const {
questionType = '',
questionTypes = [],
keyword = '',
chunkId = '',
tags = [],
limit = 0,
strategy = 'random'
} = body || {};
const where = buildEvalQuestionWhere(projectId, {
questionType: questionType || undefined,
questionTypes: Array.isArray(questionTypes) && questionTypes.length > 0 ? questionTypes : undefined,
keyword: keyword || undefined,
chunkId: chunkId || undefined,
tags: Array.isArray(tags) && tags.length > 0 ? tags : undefined
});
const total = await db.evalDatasets.count({ where });
if (total === 0) {
return NextResponse.json(
{
code: 0,
data: {
total: 0,
selectedCount: 0,
ids: [],
strategyUsed: strategy
}
},
{ status: 200 }
);
}
const normalizedLimit = typeof limit === 'number' && limit > 0 ? Math.min(limit, HARD_LIMIT) : HARD_LIMIT;
if (normalizedLimit >= total) {
const items = await db.evalDatasets.findMany({
where,
select: { id: true },
orderBy: { createAt: 'desc' }
});
const ids = items.map(item => item.id);
return NextResponse.json(
{
code: 0,
data: {
total,
selectedCount: ids.length,
ids,
strategyUsed: total > HARD_LIMIT ? 'top' : strategy
}
},
{ status: 200 }
);
}
let ids = [];
let strategyUsed = strategy;
if (total <= SMALL_TOTAL_THRESHOLD) {
const items = await db.evalDatasets.findMany({
where,
select: { id: true },
orderBy: { createAt: 'desc' }
});
const shuffled = shuffleArray(items);
ids = shuffled.slice(0, normalizedLimit).map(item => item.id);
strategyUsed = 'random-small';
} else {
const items = await db.evalDatasets.findMany({
where,
select: { id: true },
orderBy: { createAt: 'desc' },
take: normalizedLimit
});
ids = items.map(item => item.id);
strategyUsed = 'top-latest';
}
return NextResponse.json(
{
code: 0,
data: {
total,
selectedCount: ids.length,
ids,
strategyUsed
}
},
{ status: 200 }
);
} catch (error) {
console.error('Failed to sample eval datasets:', error);
return NextResponse.json(
{ code: 500, error: 'Failed to sample eval datasets', message: error.message },
{ status: 500 }
);
}
}
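For small result sets the sampling route shuffles every ID with Fisher-Yates and then slices off `limit` of them. A sketch of a variant that stops shuffling once `limit` positions are fixed, which yields the same kind of uniform sample with fewer swaps. This is an alternative technique, not the route's implementation; `sampleWithoutReplacement` and its injectable `random` parameter are hypothetical names introduced for illustration:

```javascript
// Hypothetical helper, not part of the project source.
// Partial Fisher-Yates: fixes only the last `limit` positions of a copy.
function sampleWithoutReplacement(items, limit, random = Math.random) {
  const pool = [...items]; // never mutate the caller's array
  const count = Math.max(0, Math.min(limit, pool.length));
  const stop = pool.length - count;

  for (let i = pool.length - 1; i >= stop && i > 0; i -= 1) {
    const j = Math.floor(random() * (i + 1)); // 0 <= j <= i
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(stop);
}
```

Passing a custom `random` makes the result reproducible in tests; in the route the default `Math.random` would be used.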

@@ -0,0 +1,35 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db/index';
/**
* Get all evaluation dataset tags in the project
*/
export async function GET(request, { params }) {
try {
const { projectId } = params;
// Fetch tags for all datasets in the project
const datasets = await db.evalDatasets.findMany({
where: { projectId },
select: { tags: true }
});
// Extract and de-duplicate tags
const tagsSet = new Set();
datasets.forEach(dataset => {
if (dataset.tags) {
// Support both English and Chinese commas
const tags = dataset.tags
.split(/[,\uFF0C]/)
.map(t => t.trim())
.filter(Boolean);
tags.forEach(tag => tagsSet.add(tag));
}
});
return NextResponse.json({ tags: Array.from(tagsSet).sort() });
} catch (error) {
console.error('Failed to get tags:', error);
return NextResponse.json({ error: error.message || 'Failed to get tags' }, { status: 500 });
}
}
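The tags route collects every row's `tags` string, splits it on English and Chinese commas, and returns a sorted, de-duplicated list. The same logic as a pure function over plain objects, so it can be exercised without a database. The name `collectTags` is hypothetical, and the full-width comma is written as the escape `\uFF0C` on the assumption that this is the "Chinese comma" the comment refers to:

```javascript
// Hypothetical helper, not part of the project source.
// rows: array of objects shaped like { tags: 'a,b' } (tags may be empty or null).
function collectTags(rows) {
  const unique = new Set();
  for (const row of rows) {
    if (!row.tags) continue;
    // Split on the ASCII comma and the full-width comma (U+FF0C).
    for (const tag of row.tags.split(/[,\uFF0C]/)) {
      const trimmed = tag.trim();
      if (trimmed) unique.add(trimmed);
    }
  }
  return Array.from(unique).sort();
}
```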

@@ -0,0 +1,176 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db/index';
import { getEvalResultsByTaskId, getEvalResultsStats } from '@/lib/db/evalResults';
/**
* Get evaluation task details and results
*/
export async function GET(request, { params }) {
try {
const { projectId, taskId } = params;
if (!projectId || !taskId) {
return NextResponse.json({ error: 'Project ID and Task ID are required' }, { status: 400 });
}
// Fetch task details
const task = await db.task.findUnique({
where: { id: taskId }
});
if (!task) {
return NextResponse.json({ error: 'Task not found' }, { status: 404 });
}
if (task.projectId !== projectId) {
return NextResponse.json({ error: 'Task does not belong to this project' }, { status: 403 });
}
// Parse task detail fields
let detail = {};
let modelInfo = {};
try {
detail = task.detail ? JSON.parse(task.detail) : {};
modelInfo = task.modelInfo ? JSON.parse(task.modelInfo) : {};
} catch (e) {
console.error('Failed to parse task detail:', e);
}
// Parse query params
const { searchParams } = new URL(request.url);
const page = parseInt(searchParams.get('page') || '1', 10);
const pageSize = parseInt(searchParams.get('pageSize') || '10', 10);
const type = searchParams.get('type') || null;
const isCorrectStr = searchParams.get('isCorrect');
const isCorrect = isCorrectStr === 'true' ? true : isCorrectStr === 'false' ? false : null;
// Fetch results (supports pagination and filters)
const { items: results, total } = await getEvalResultsByTaskId(taskId, {
page,
pageSize,
type,
isCorrect
});
// Fetch stats
const stats = await getEvalResultsStats(taskId);
return NextResponse.json({
code: 0,
data: {
task: {
...task,
detail,
modelInfo
},
results,
total,
page,
pageSize,
stats
}
});
} catch (error) {
console.error('Failed to fetch evaluation task details:', error);
return NextResponse.json(
{ code: 500, error: 'Failed to fetch evaluation task details', message: error.message },
{ status: 500 }
);
}
}
/**
* Delete evaluation task
*/
export async function DELETE(request, { params }) {
try {
const { projectId, taskId } = params;
if (!projectId || !taskId) {
return NextResponse.json({ error: 'Project ID and Task ID are required' }, { status: 400 });
}
// Validate task exists and belongs to this project
const task = await db.task.findUnique({
where: { id: taskId }
});
if (!task) {
return NextResponse.json({ error: 'Task not found' }, { status: 404 });
}
if (task.projectId !== projectId) {
return NextResponse.json({ error: 'Task does not belong to this project' }, { status: 403 });
}
// Delete evaluation results
await db.evalResults.deleteMany({
where: { taskId }
});
// Delete task
await db.task.delete({
where: { id: taskId }
});
return NextResponse.json({
code: 0,
message: 'Deleted'
});
} catch (error) {
console.error('Failed to delete evaluation task:', error);
return NextResponse.json(
{ code: 500, error: 'Failed to delete evaluation task', message: error.message },
{ status: 500 }
);
}
}
/**
* Interrupt evaluation task
*/
export async function PUT(request, { params }) {
try {
const { projectId, taskId } = params;
const data = await request.json();
const { action } = data;
if (!projectId || !taskId) {
return NextResponse.json({ error: 'Project ID and Task ID are required' }, { status: 400 });
}
// Validate task exists and belongs to this project
const task = await db.task.findUnique({
where: { id: taskId }
});
if (!task) {
return NextResponse.json({ error: 'Task not found' }, { status: 404 });
}
if (task.projectId !== projectId) {
return NextResponse.json({ error: 'Task does not belong to this project' }, { status: 403 });
}
if (action === 'interrupt') {
// Interrupt task
await db.task.update({
where: { id: taskId },
data: {
status: 3, // Interrupted
endTime: new Date()
}
});
return NextResponse.json({
code: 0,
message: 'Task interrupted'
});
}
return NextResponse.json({ error: 'Unknown action' }, { status: 400 });
} catch (error) {
console.error('Failed to operate evaluation task:', error);
return NextResponse.json({ code: 500, error: 'Operation failed', message: error.message }, { status: 500 });
}
}
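The task routes read `page` and `pageSize` with `parseInt` and pass them straight to the query, so a value such as `page=abc` becomes `NaN`. A hedged sketch of a stricter parser that falls back to defaults for missing, non-numeric, or non-positive values. The helper `parsePagination` and the `maxPageSize` cap are assumptions introduced here; the source defines neither:

```javascript
// Hypothetical helper, not part of the project source.
function parsePagination(searchParams, defaults = { page: 1, pageSize: 10, maxPageSize: 100 }) {
  const toPositiveInt = (raw, fallback) => {
    const n = Number.parseInt(raw ?? '', 10);
    return Number.isInteger(n) && n > 0 ? n : fallback;
  };

  const page = toPositiveInt(searchParams.get('page'), defaults.page);
  // Assumption: cap pageSize so one request cannot ask for an unbounded page.
  const pageSize = Math.min(toPositiveInt(searchParams.get('pageSize'), defaults.pageSize), defaults.maxPageSize);

  return { page, pageSize, skip: (page - 1) * pageSize };
}
```

The returned `skip` matches the `(page - 1) * pageSize` offset that the task list route computes.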

@@ -0,0 +1,207 @@
import { NextResponse } from 'next/server';
import { db } from '@/lib/db/index';
import { processTask } from '@/lib/services/tasks';
/**
* Get all evaluation tasks for a project
*/
export async function GET(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
const page = parseInt(searchParams.get('page') || '1', 10);
const pageSize = parseInt(searchParams.get('pageSize') || '20', 10);
if (!projectId) {
return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
}
const skip = (page - 1) * pageSize;
// Fetch task list and total count
const [tasks, total] = await Promise.all([
db.task.findMany({
where: {
projectId,
taskType: 'model-evaluation'
},
orderBy: { createAt: 'desc' },
skip,
take: pageSize
}),
db.task.count({
where: {
projectId,
taskType: 'model-evaluation'
}
})
]);
// Parse task detail fields
const tasksWithDetails = tasks.map(task => {
let detail = {};
let modelInfo = {};
try {
detail = task.detail ? JSON.parse(task.detail) : {};
modelInfo = task.modelInfo ? JSON.parse(task.modelInfo) : {};
} catch (e) {
console.error('Failed to parse task detail:', e);
}
return {
...task,
detail,
modelInfo
};
});
return NextResponse.json({
code: 0,
data: {
items: tasksWithDetails,
total,
page,
pageSize,
totalPages: Math.ceil(total / pageSize)
}
});
} catch (error) {
console.error('Failed to fetch evaluation task list:', error);
return NextResponse.json(
{ code: 500, error: 'Failed to fetch evaluation task list', message: error.message },
{ status: 500 }
);
}
}
/**
* Create evaluation tasks
* Supports selecting multiple models and creating one task per model
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
const data = await request.json();
const {
models, // Models to evaluate: [{ modelId, providerId }]
evalDatasetIds, // Evaluation question IDs
judgeModelId, // Judge model ID (for subjective grading)
judgeProviderId, // Judge provider ID
language = 'zh-CN',
filterOptions = {}, // Filter options (for display)
customScoreAnchors = null // Custom score anchors for subjective grading
} = data;
// Validate required fields
if (!models || models.length === 0) {
return NextResponse.json({ code: 400, error: 'Please select at least one model to evaluate' }, { status: 400 });
}
if (!evalDatasetIds || evalDatasetIds.length === 0) {
return NextResponse.json({ code: 400, error: 'Please select questions to evaluate' }, { status: 400 });
}
// Check for subjective questions
const evalDatasets = await db.evalDatasets.findMany({
where: {
id: { in: evalDatasetIds },
projectId
},
select: { questionType: true }
});
const hasSubjectiveQuestions = evalDatasets.some(
q => q.questionType === 'short_answer' || q.questionType === 'open_ended'
);
// If there are subjective questions, a judge model is required
if (hasSubjectiveQuestions && (!judgeModelId || !judgeProviderId)) {
return NextResponse.json(
{ code: 400, error: 'Short-answer or open-ended questions found. Please select a judge model for grading' },
{ status: 400 }
);
}
// Judge model must not be the same as any test model
if (judgeModelId && judgeProviderId) {
const judgeModel = { modelId: judgeModelId, providerId: judgeProviderId };
const isJudgeInTestModels = models.some(
m => m.modelId === judgeModel.modelId && m.providerId === judgeModel.providerId
);
if (isJudgeInTestModels) {
return NextResponse.json(
{ code: 400, error: 'Judge model cannot be the same as a test model' },
{ status: 400 }
);
}
}
// Create one task per model
const createdTasks = [];
for (const model of models) {
const { modelId, providerId } = model;
// Fetch full model config
const modelConfig = await db.modelConfig.findFirst({
where: {
projectId,
providerId,
modelId
}
});
// Keep providerId for lookup, add providerName for display
const modelInfo = {
modelId,
modelName: modelConfig?.modelName || modelId,
providerId: providerId, // Provider ID (DB ID)
providerName: modelConfig?.providerName || providerId // Provider display name
};
// Build task detail
const taskDetail = {
evalDatasetIds,
judgeModelId: judgeModelId || null,
judgeProviderId: judgeProviderId || null,
filterOptions,
hasSubjectiveQuestions,
customScoreAnchors: customScoreAnchors || null // Store custom score anchors
};
// Create task
const newTask = await db.task.create({
data: {
projectId,
taskType: 'model-evaluation',
status: 0, // Processing
modelInfo: JSON.stringify(modelInfo),
language,
detail: JSON.stringify(taskDetail),
totalCount: evalDatasetIds.length,
completedCount: 0,
note: ''
}
});
createdTasks.push(newTask);
// Start task processing asynchronously
processTask(newTask.id).catch(err => {
console.error(`Failed to start evaluation task: ${newTask.id}`, err);
});
}
return NextResponse.json({
code: 0,
data: createdTasks,
message: `Successfully created ${createdTasks.length} evaluation tasks`
});
} catch (error) {
console.error('Failed to create evaluation task:', error);
return NextResponse.json(
{ code: 500, error: 'Failed to create evaluation task', message: error.message },
{ status: 500 }
);
}
}
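
The validation steps in the handler above could be factored into a pure function so they can be unit-tested without a database. This is a minimal sketch of a hypothetical refactor, not code from this commit: the name `validateEvalRequest` and its return shape are assumptions, and `hasSubjectiveQuestions` is expected to be computed by the caller from the `evalDatasets` lookup.

```javascript
// Hypothetical helper (not part of this commit): mirrors the checks in the POST handler.
// Returns an error message string, or null when the request is valid.
function validateEvalRequest({ models, evalDatasetIds, judgeModelId, judgeProviderId, hasSubjectiveQuestions }) {
  if (!Array.isArray(models) || models.length === 0) {
    return 'Please select at least one model to evaluate';
  }
  if (!Array.isArray(evalDatasetIds) || evalDatasetIds.length === 0) {
    return 'Please select questions to evaluate';
  }
  const hasJudge = Boolean(judgeModelId && judgeProviderId);
  // Subjective questions need a judge model for grading.
  if (hasSubjectiveQuestions && !hasJudge) {
    return 'Short-answer or open-ended questions found. Please select a judge model for grading';
  }
  // A model must not grade its own answers.
  if (hasJudge && models.some(m => m.modelId === judgeModelId && m.providerId === judgeProviderId)) {
    return 'Judge model cannot be the same as a test model';
  }
  return null;
}
```

The handler would then return a 400 response whenever the helper yields a non-null message, which keeps the HTTP wiring separate from the rules themselves.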

View File

@@ -0,0 +1,313 @@
import { NextResponse } from 'next/server';
import { getGaPairsByFileId, toggleGaPairActive, saveGaPairs, createGaPairs } from '@/lib/db/ga-pairs';
import { getUploadFileInfoById } from '@/lib/db/upload-files';
import { generateGaPairs } from '@/lib/services/ga/ga-generation';
import logger from '@/lib/util/logger';
import { db } from '@/lib/db/index';
/**
* Generate GA pairs for a file
*/
export async function POST(request, { params }) {
try {
const { projectId, fileId } = params;
const { regenerate = false, appendMode = false, language = '中文' } = await request.json();
// Validate parameters
if (!projectId || !fileId) {
return NextResponse.json({ error: 'Project ID and File ID are required' }, { status: 400 });
}
logger.info(`Starting GA pairs generation for project: ${projectId}, file: ${fileId}, appendMode: ${appendMode}`);
// Check that the file exists and belongs to the project
const file = await getUploadFileInfoById(fileId);
if (!file || file.projectId !== projectId) {
return NextResponse.json({ error: 'File not found or does not belong to the project' }, { status: 404 });
}
// Fetch existing GA pairs
const existingGaPairs = await getGaPairsByFileId(fileId);
// If neither regenerating nor appending and GA pairs already exist, return them unchanged
if (!regenerate && !appendMode && existingGaPairs.length > 0) {
return NextResponse.json({
success: true,
message: 'GA pairs already exist for this file',
data: existingGaPairs
});
}
// Read the file content
const fileContent = await getFileContent(projectId, file.fileName);
if (!fileContent) {
return NextResponse.json({ error: 'Failed to read file content' }, { status: 500 });
}
logger.info(`File content loaded successfully, length: ${fileContent.length}`);
// Check the model configuration
try {
const { getActiveModel } = await import('@/lib/services/models');
const activeModel = await getActiveModel(projectId);
if (!activeModel) {
logger.error('No active model configuration found');
return NextResponse.json(
{ error: 'No active AI model configured. Please configure a model in settings first.' },
{ status: 400 }
);
}
logger.info(`Using active model: ${activeModel.provider} - ${activeModel.model}`);
} catch (modelError) {
logger.error('Error checking model configuration:', modelError);
return NextResponse.json(
{ error: 'Failed to load model configuration. Please check your AI model settings.' },
{ status: 500 }
);
}
// Call the LLM to generate GA pairs
logger.info(`Generating GA pairs for file: ${file.fileName}`);
let generatedGaPairs;
try {
generatedGaPairs = await generateGaPairs(fileContent, projectId, language);
if (!generatedGaPairs || generatedGaPairs.length === 0) {
logger.warn('No GA pairs generated from LLM');
return NextResponse.json(
{
error:
'No GA pairs could be generated from the file content. The content might be too short or not suitable for GA pair generation.'
},
{ status: 400 }
);
}
logger.info(`Successfully generated ${generatedGaPairs.length} GA pairs from LLM`);
} catch (generationError) {
logger.error('GA pairs generation failed:', generationError);
// Map known failure causes to user-facing messages
const reason = generationError?.message || '';
let errorMessage = 'Failed to generate GA pairs';
if (reason.includes('No active model')) {
errorMessage = 'No active AI model available. Please configure and activate a model in settings.';
} else if (reason.includes('API key')) {
errorMessage = 'Invalid API key or model configuration. Please check your AI model settings.';
} else if (reason.includes('rate limit')) {
errorMessage = 'API rate limit exceeded. Please try again later.';
} else if (reason) {
errorMessage = `AI model error: ${reason}`;
}
return NextResponse.json({ error: errorMessage }, { status: 500 });
}
// Save to the database
try {
if (appendMode && existingGaPairs.length > 0) {
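// Append mode: save only the newly generated GA pairs and keep the existing ones
logger.info(`Appending ${generatedGaPairs.length} new GA pairs to existing ${existingGaPairs.length} pairs`);
// Number the new GA pairs so they continue after the existing ones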
const startPairNumber = existingGaPairs.length + 1;
const newGaPairData = generatedGaPairs.map((pair, index) => ({
projectId,
fileId,
pairNumber: startPairNumber + index,
genreTitle: pair.genre?.title || pair.genreTitle || '',
genreDesc: pair.genre?.description || pair.genreDesc || '',
audienceTitle: pair.audience?.title || pair.audienceTitle || '',
audienceDesc: pair.audience?.description || pair.audienceDesc || '',
isActive: true
}));
// Create only the new GA pairs; existing ones are not deleted
await createGaPairs(newGaPairData);
logger.info('New GA pairs appended to database successfully');
} else {
// Overwrite mode: delete the existing pairs and save the new ones
await saveGaPairs(projectId, fileId, generatedGaPairs);
logger.info('GA pairs saved to database successfully');
}
} catch (saveError) {
logger.error('Failed to save GA pairs to database:', saveError);
return NextResponse.json(
{ error: 'Generated GA pairs successfully but failed to save to database' },
{ status: 500 }
);
}
// Fetch all GA pairs after saving
const allGaPairs = await getGaPairsByFileId(fileId);
if (appendMode && existingGaPairs.length > 0) {
// Append mode: return only the newly generated GA pairs
const newGaPairs = allGaPairs.slice(existingGaPairs.length);
logger.info(`Successfully appended ${newGaPairs.length} GA pairs. Total pairs: ${allGaPairs.length}`);
return NextResponse.json({
success: true,
message: `${newGaPairs.length} new GA pairs appended successfully`,
data: newGaPairs,
total: allGaPairs.length
});
} else {
// Overwrite mode: return all GA pairs
logger.info(`Successfully generated and saved ${allGaPairs.length} GA pairs for file: ${file.fileName}`);
return NextResponse.json({
success: true,
message: 'GA pairs generated successfully',
data: allGaPairs
});
}
} catch (error) {
logger.error('Unexpected error in GA pairs generation:', error);
return NextResponse.json(
{ error: error.message || 'Unexpected error occurred during GA pairs generation' },
{ status: 500 }
);
}
}
/**
* Get the GA pairs of a file
*/
export async function GET(request, { params }) {
try {
const { projectId, fileId } = params;
if (!projectId || !fileId) {
return NextResponse.json({ error: 'Project ID and File ID are required' }, { status: 400 });
}
const gaPairs = await getGaPairsByFileId(fileId);
return NextResponse.json({
success: true,
data: gaPairs
});
} catch (error) {
console.error('Error getting GA pairs:', String(error));
return NextResponse.json({ error: 'Failed to get GA pairs' }, { status: 500 });
}
}
/**
* Update (replace) all GA pairs of a file
*/
export async function PUT(request, { params }) {
try {
const { projectId, fileId } = params;
const body = await request.json();
if (!projectId || !fileId) {
return NextResponse.json({ error: 'Project ID and File ID are required' }, { status: 400 });
}
const { updates } = body;
if (!updates || !Array.isArray(updates)) {
return NextResponse.json({ error: 'Updates array is required' }, { status: 400 });
}
logger.info(`Replacing all GA pairs for file ${fileId} with ${updates.length} pairs`);
// Use a database transaction so the replacement is atomic
const results = await db.$transaction(async tx => {
// 1. Delete all existing GA pairs
await tx.gaPairs.deleteMany({
where: { fileId }
});
// 2. Create the new GA pairs
if (updates.length > 0) {
const gaPairData = updates.map((pair, index) => ({
projectId,
fileId,
pairNumber: index + 1,
genreTitle: pair.genreTitle || pair.genre?.title || pair.genre || '',
genreDesc: pair.genreDesc || pair.genre?.description || '',
audienceTitle: pair.audienceTitle || pair.audience?.title || pair.audience || '',
audienceDesc: pair.audienceDesc || pair.audience?.description || '',
isActive: pair.isActive !== undefined ? pair.isActive : true
}));
// Validate the data
for (const data of gaPairData) {
if (!data.genreTitle || !data.audienceTitle) {
throw new Error(`Invalid GA pair data: missing genre or audience title`);
}
}
await tx.gaPairs.createMany({ data: gaPairData });
}
// 3. Return the newly created GA pairs
return await tx.gaPairs.findMany({
where: { fileId },
orderBy: { pairNumber: 'asc' }
});
});
logger.info(`Successfully replaced GA pairs, new count: ${results.length}`);
return NextResponse.json({
success: true,
data: results
});
} catch (error) {
logger.error('Error updating GA pairs:', error);
return NextResponse.json({ error: error.message || 'Failed to update GA pairs' }, { status: 500 });
}
}
/**
* Toggle the active state of a GA pair
*/
export async function PATCH(request, { params }) {
try {
const { projectId, fileId } = params;
const body = await request.json();
if (!projectId || !fileId) {
return NextResponse.json({ error: 'Project ID and File ID are required' }, { status: 400 });
}
const { gaPairId, isActive } = body;
if (!gaPairId || typeof isActive !== 'boolean') {
return NextResponse.json({ error: 'GA pair ID and active status are required' }, { status: 400 });
}
const updatedPair = await toggleGaPairActive(gaPairId, isActive);
return NextResponse.json({
success: true,
data: updatedPair
});
} catch (error) {
console.error('Error toggling GA pair active status:', String(error));
return NextResponse.json({ error: 'Failed to toggle GA pair active status' }, { status: 500 });
}
}
// Helper function to read file content
async function getFileContent(projectId, fileName) {
try {
const { getProjectRoot } = await import('@/lib/db/base');
const path = await import('path');
const fs = await import('fs');
const projectRoot = await getProjectRoot();
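// Uploaded PDFs are read from their converted Markdown file; only a trailing ".pdf" extension is swapped
const filePath = path.join(projectRoot, projectId, 'files', fileName.replace(/\.pdf$/i, '.md'));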
return await fs.promises.readFile(filePath, 'utf8');
} catch (error) {
logger.error('Failed to read file content:', error);
return null;
}
}
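
Append mode in the POST handler above numbers new pairs from `existingGaPairs.length + 1`. If the stored pairs were ever not numbered contiguously, that could repeat an existing `pairNumber`. The sketch below shows one possible alternative that starts after the highest stored number; `nextPairNumber` is a hypothetical helper, and it assumes each stored pair carries a numeric `pairNumber` field as in the records built above.

```javascript
// Hypothetical helper (not part of this commit): pick the first pairNumber for appended pairs.
// Starts after the highest existing number instead of after the count of existing pairs.
function nextPairNumber(existingPairs) {
  const highest = existingPairs.reduce(
    (max, pair) => Math.max(max, Number(pair.pairNumber) || 0),
    0
  );
  return highest + 1;
}
```

With contiguous numbering this gives the same result as the length-based start, so it would be a drop-in change for `startPairNumber`.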

View File

@@ -0,0 +1,243 @@
import { NextResponse } from 'next/server';
import { getProject } from '@/lib/db/projects';
import path from 'path';
import { getProjectRoot, ensureDir } from '@/lib/db/base';
import { promises as fs } from 'fs';
import {
checkUploadFileInfoByMD5,
createUploadFileInfo,
delUploadFileInfoById,
getUploadFilesPagination
} from '@/lib/db/upload-files';
import { getFileMD5 } from '@/lib/util/file';
import { batchSaveTags } from '@/lib/db/tags';
import { getProjectChunks, getProjectTocByName } from '@/lib/file/text-splitter';
import { handleDomainTree } from '@/lib/util/domain-tree';
// Replace the deprecated config export with the new export syntax
export const dynamic = 'force-dynamic';
// This tells Next.js not to parse the request body automatically
export const bodyParser = false;
// Get the project's file list
export async function GET(request, { params }) {
try {
const { projectId } = params;
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'The project ID cannot be empty' }, { status: 400 });
}
const { searchParams } = new URL(request.url);
const page = parseInt(searchParams.get('page'), 10) || 1;
const pageSize = parseInt(searchParams.get('pageSize'), 10) || 10; // Defaults to 10 files per page
const fileName = searchParams.get('fileName') || '';
const getAllIds = searchParams.get('getAllIds') === 'true'; // When true, return every file ID
// If all file IDs are requested, return the ID list directly
if (getAllIds) {
const allFiles = await getUploadFilesPagination(projectId, 1, 9999, fileName); // Fetches up to 9999 files in one page
const allFileIds = allFiles.data?.map(file => String(file.id)) || [];
return NextResponse.json({ allFileIds });
}
// Get the file list
const files = await getUploadFilesPagination(projectId, page, pageSize, fileName);
return NextResponse.json(files);
} catch (error) {
console.error('Error obtaining file list:', String(error));
return NextResponse.json({ error: error.message || 'Error obtaining file list' }, { status: 500 });
}
}
// Delete a file
export async function DELETE(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
const fileId = searchParams.get('fileId');
const domainTreeAction = searchParams.get('domainTreeAction') || 'keep';
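// Read model info and language from the request body; a missing or invalid body is treated as empty
const requestData = (await request.json().catch(() => null)) || {};
const model = requestData.model;
const language = requestData.language || 'en';
// Validate the project ID and file ID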
if (!projectId) {
return NextResponse.json({ error: 'The project ID cannot be empty' }, { status: 400 });
}
if (!fileId) {
return NextResponse.json({ error: 'The file ID cannot be empty' }, { status: 400 });
}
// Get the project information
const project = await getProject(projectId);
if (!project) {
return NextResponse.json({ error: 'The project does not exist' }, { status: 404 });
}
// Delete the file together with its chunks, questions, and datasets
const { stats, fileName, fileInfo } = await delUploadFileInfoById(fileId);
const deleteToc = await getProjectTocByName(projectId, fileName);
try {
const projectRoot = await getProjectRoot();
const projectPath = path.join(projectRoot, projectId);
const tocDir = path.join(projectPath, 'toc');
const baseName = path.basename(fileInfo.fileName, path.extname(fileInfo.fileName));
const tocPath = path.join(tocDir, `${baseName}-toc.json`);
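// Remove the TOC file; a missing file throws and is handled below
await fs.unlink(tocPath);
console.log(`Deleted TOC file: ${tocPath}`);
} catch (error) {
console.error('Failed to delete TOC file:', String(error));
// A failed TOC deletion does not affect the overall result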
}
// If the domain tree should be kept unchanged, return the deletion result directly
if (domainTreeAction === 'keep') {
return NextResponse.json({
message: 'File deleted successfully',
stats: stats,
domainTreeAction: 'keep',
cascadeDelete: true
});
}
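// Update the domain tree
try {
// Load the chunks and TOC of the remaining project files
const { chunks, toc } = await getProjectChunks(projectId);
// No chunks left means the project no longer has any files
if (!chunks || chunks.length === 0) {
// Clear the domain tree
await batchSaveTags(projectId, []);
return NextResponse.json({
message: 'File deleted successfully; the domain tree has been cleared',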
stats: stats,
domainTreeAction,
cascadeDelete: true
});
}
// Delegate to the domain tree handler
await handleDomainTree({
projectId,
action: domainTreeAction,
allToc: toc,
model,
language,
deleteToc,
project
});
} catch (error) {
console.error('Error updating domain tree after file deletion:', String(error));
// A failed domain tree update does not affect the file deletion result
}
return NextResponse.json({
message: 'File deleted successfully',
stats: stats,
domainTreeAction,
cascadeDelete: true
});
} catch (error) {
console.error('Error deleting file:', String(error));
return NextResponse.json({ error: error.message || 'Error deleting file' }, { status: 500 });
}
}
// Upload a file
export async function POST(request, { params }) {
console.log('File upload request processing, parameters:', params);
const { projectId } = params;
// Validate the project ID
if (!projectId) {
console.log('The project ID cannot be empty, returning 400 error');
return NextResponse.json({ error: 'The project ID cannot be empty' }, { status: 400 });
}
// Get the project information
const project = await getProject(projectId);
if (!project) {
console.log('The project does not exist, returning 404 error');
return NextResponse.json({ error: 'The project does not exist' }, { status: 404 });
}
console.log('Project information retrieved successfully:', project.name || project.id);
try {
console.log('Try using alternate methods for file upload...');
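// Read the file name from the request header; keep only the base name so it cannot point outside the files directory
const encodedFileName = request.headers.get('x-file-name');
const fileName = encodedFileName ? path.basename(decodeURIComponent(encodedFileName)) : null;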
console.log('Get file name from request header:', fileName);
if (!fileName) {
console.log('The request header does not contain a file name');
return NextResponse.json(
{ error: 'The request header does not contain a file name (x-file-name)' },
{ status: 400 }
);
}
// Check the file type
if (!fileName.endsWith('.md') && !fileName.endsWith('.pdf')) {
return NextResponse.json({ error: 'Only Markdown (.md) and PDF (.pdf) files are supported' }, { status: 400 });
}
// Read the binary data directly from the request body
const fileBuffer = Buffer.from(await request.arrayBuffer());
// Save the file
const projectRoot = await getProjectRoot();
const projectPath = path.join(projectRoot, projectId);
const filesDir = path.join(projectPath, 'files');
await ensureDir(filesDir);
const filePath = path.join(filesDir, fileName);
await fs.writeFile(filePath, fileBuffer);
// Get the file size
const stats = await fs.stat(filePath);
// Compute the file's MD5 hash
const md5 = await getFileMD5(filePath);
// Get the file extension
const ext = path.extname(filePath);
// Duplicate detection by MD5 is currently disabled:
// let res = await checkUploadFileInfoByMD5(projectId, md5);
// if (res) {
// return NextResponse.json({ error: `File "${fileName}" already exists in this project` }, { status: 400 });
// }
let fileInfo = await createUploadFileInfo({
projectId,
fileName,
size: stats.size,
md5,
fileExt: ext,
path: filesDir
});
console.log('The file upload process is complete, and a successful response is returned');
return NextResponse.json({
message: 'File uploaded successfully',
fileName,
filePath,
fileId: fileInfo.id
});
} catch (error) {
console.error('Error processing file upload:', String(error));
console.error('Error stack:', error.stack);
return NextResponse.json(
{
error: 'File upload failed: ' + (error.message || 'Unknown error')
},
{ status: 500 }
);
}
}
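
The upload handler above takes the file name from the `x-file-name` header and joins it onto the project's `files` directory, so a name containing directory segments could escape that directory. A minimal sketch of a stricter check follows; `safeUploadName` is a hypothetical helper, not something this commit defines, and it is written without imports so it can be tried in isolation.

```javascript
// Hypothetical helper (not part of this commit): reduce a client-supplied name to a bare file name.
function safeUploadName(rawName) {
  // Keep only the last path segment, treating both "/" and "\" as separators.
  const base = String(rawName).split(/[\\/]/).pop();
  // Reject names that are empty or that refer to a directory rather than a file.
  if (base === '' || base === '.' || base === '..') {
    throw new Error('Invalid file name');
  }
  return base;
}
```

Splitting on both separator styles is a deliberate choice here: the header value comes from a browser that may run on a different operating system than the server.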

View File

@@ -0,0 +1,126 @@
import { NextResponse } from 'next/server';
import { getProjectChunks } from '@/lib/file/text-splitter';
import { getTaskConfig } from '@/lib/db/projects';
import { getChunkById } from '@/lib/db/chunks';
import { generateQuestionsForChunk, generateQuestionsForChunkWithGA } from '@/lib/services/questions';
// Generate questions in batch
export async function POST(request, { params }) {
try {
const { projectId } = params;
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'The project ID cannot be empty' }, { status: 400 });
}
// Read the request body
const { model, chunkIds, language = '中文', enableGaExpansion = false } = await request.json();
if (!model) {
return NextResponse.json({ error: 'The model cannot be empty' }, { status: 400 });
}
// If no chunk IDs are given, use all chunks of the project
let chunks = [];
if (!chunkIds || chunkIds.length === 0) {
const result = await getProjectChunks(projectId);
chunks = result.chunks || [];
} else {
// Fetch the requested chunks
chunks = await Promise.all(
chunkIds.map(async chunkId => {
const chunk = await getChunkById(chunkId);
if (chunk) {
return {
id: chunk.id,
content: chunk.content,
length: chunk.content.length
};
}
return null;
})
);
chunks = chunks.filter(Boolean); // Drop chunks that were not found
}
if (chunks.length === 0) {
return NextResponse.json({ error: 'No valid text blocks found' }, { status: 404 });
}
const results = [];
const errors = [];
// Load the project's task-config settings
const taskConfig = await getTaskConfig(projectId);
const { questionGenerationLength } = taskConfig;
for (const chunk of chunks) {
try {
// Derive the number of questions from the chunk length
const questionNumber = Math.floor(chunk.length / questionGenerationLength);
let result;
if (enableGaExpansion) {
// Use GA-enhanced question generation
result = await generateQuestionsForChunkWithGA(projectId, chunk.id, {
model,
language,
number: questionNumber
});
} else {
// Use standard question generation
result = await generateQuestionsForChunk(projectId, chunk.id, {
model,
language,
number: questionNumber
});
}
// Normalize the two result shapes
if (result && result.questions && Array.isArray(result.questions)) {
// Result shape of GA-enhanced mode
results.push({
chunkId: chunk.id,
success: true,
questions: result.questions,
total: result.total,
gaExpansionUsed: result.gaExpansionUsed,
gaPairsCount: result.gaPairsCount
});
} else if (result && result.labelQuestions && Array.isArray(result.labelQuestions)) {
// Result shape of standard mode
results.push({
chunkId: chunk.id,
success: true,
questions: result.labelQuestions,
total: result.total,
gaExpansionUsed: false,
gaPairsCount: 0
});
} else {
errors.push({
chunkId: chunk.id,
error: 'Failed to parse questions'
});
}
} catch (error) {
console.error(`Failed to generate questions for text block ${chunk.id}:`, String(error));
errors.push({
chunkId: chunk.id,
error: error.message || 'Failed to generate questions'
});
}
}
// Return the generation results
return NextResponse.json({
results,
errors,
totalSuccess: results.length,
totalErrors: errors.length,
totalChunks: chunks.length
});
} catch (error) {
console.error('Failed to generate questions:', String(error));
return NextResponse.json({ error: error.message || 'Failed to generate questions' }, { status: 500 });
}
}
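
The batch handler above computes `Math.floor(chunk.length / questionGenerationLength)`, which is 0 for chunks shorter than the configured length and `NaN` when the setting is missing. Whether the downstream generators tolerate those values is not visible in this file, so the following is only a sketch of a defensive variant; `questionCountFor` is a hypothetical name.

```javascript
// Hypothetical helper (not part of this commit): number of questions to request for one chunk.
function questionCountFor(chunkLength, lengthPerQuestion) {
  const ratio = Math.floor(chunkLength / lengthPerQuestion);
  // NaN (missing setting), Infinity (zero divisor), and values below 1 all fall back to 1.
  return Number.isFinite(ratio) && ratio >= 1 ? ratio : 1;
}
```

Falling back to a single question is an assumption about the desired behavior; skipping very short chunks entirely would be an equally reasonable policy.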

View File

@@ -0,0 +1,310 @@
import { NextResponse } from 'next/server';
import { getProject } from '@/lib/db/projects';
import { getDatasets } from '@/lib/db/datasets';
import fs from 'fs';
import path from 'path';
import os from 'os';
import { uploadFiles, createRepo, checkRepoAccess } from '@huggingface/hub';
// Upload a dataset to HuggingFace
export async function POST(request, { params }) {
try {
const projectId = params.projectId;
const {
token,
datasetName,
isPrivate,
formatType,
systemPrompt,
confirmedOnly,
includeCOT,
fileFormat,
customFields,
reasoningLanguage
} = await request.json();
// Get the project information
const project = await getProject(projectId);
if (!project) {
return NextResponse.json({ error: 'Project not found' }, { status: 404 });
}
// Get the dataset questions
const questions = await getDatasets(projectId, confirmedOnly);
if (!questions || questions.length === 0) {
return NextResponse.json({ error: 'No dataset questions available' }, { status: 400 });
}
// Format the dataset (reasoningLanguage is passed along for the multilingualthinking format)
const formattedData = formatDataset(questions, formatType, systemPrompt, includeCOT, customFields, reasoningLanguage);
// Create a temporary directory
const tempDir = path.join(os.tmpdir(), `hf-upload-${projectId}-${Date.now()}`);
fs.mkdirSync(tempDir, { recursive: true });
// Write the dataset file
const datasetFilePath = path.join(tempDir, `dataset.${fileFormat}`);
if (fileFormat === 'json') {
fs.writeFileSync(datasetFilePath, JSON.stringify(formattedData, null, 2));
} else if (fileFormat === 'jsonl') {
const jsonlContent = formattedData.map(item => JSON.stringify(item)).join('\n');
fs.writeFileSync(datasetFilePath, jsonlContent);
} else if (fileFormat === 'csv') {
const csvContent = convertToCSV(formattedData);
fs.writeFileSync(datasetFilePath, csvContent);
}
// Write the README.md file
const readmePath = path.join(tempDir, 'README.md');
const readmeContent = generateReadme(project.name, project.description, formatType);
fs.writeFileSync(readmePath, readmeContent);
// Upload the dataset with the @huggingface/hub client
try {
// Prepare the repository descriptor
const repo = { type: 'dataset', name: datasetName };
// Check whether the repository exists
let repoExists = true;
try {
await checkRepoAccess({ repo, accessToken: token });
console.log(`Repository ${datasetName} exists, continuing to upload files`);
} catch (error) {
// If error code is 404, the repository does not exist
if (error.statusCode === 404) {
repoExists = false;
console.log(`Repository ${datasetName} does not exist, preparing to create`);
} else {
// Other errors (e.g., permission errors)
throw new Error(`Failed to check repository access: ${error.message}`);
}
}
// If the repository does not exist, create a new one
if (!repoExists) {
try {
await createRepo({
repo,
accessToken: token,
private: isPrivate,
license: 'mit',
description: project.description || 'Dataset created with Easy Dataset'
});
console.log(`Successfully created dataset repository: ${datasetName}`);
} catch (error) {
throw new Error(`Failed to create dataset repository: ${error.message}`);
}
}
// Upload the dataset file
await uploadFile(token, datasetName, datasetFilePath, `dataset.${fileFormat}`);
// Upload README.md
await uploadFile(token, datasetName, readmePath, 'README.md');
} catch (error) {
console.error('Upload to HuggingFace failed:', String(error));
// Remove the temporary directory before returning the error
fs.rmSync(tempDir, { recursive: true, force: true });
return NextResponse.json({ error: `Upload Error: ${error.message}` }, { status: 500 });
}
// Clean up the temporary directory
fs.rmSync(tempDir, { recursive: true, force: true });
// Return the success response
const datasetUrl = `https://huggingface.co/datasets/${datasetName}`;
return NextResponse.json({
success: true,
message: 'Uploaded to HuggingFace successfully',
url: datasetUrl
});
} catch (error) {
console.error('Upload failed:', String(error));
return NextResponse.json({ error: error.message }, { status: 500 });
}
}
// Format the dataset; reasoningLanguage is only used by the multilingualthinking format
function formatDataset(questions, formatType, systemPrompt, includeCOT, customFields, reasoningLanguage) {
if (formatType === 'alpaca') {
return questions.map(q => {
const item = {
instruction: q.question,
input: '',
output: includeCOT && q.cot ? `${q.cot}\n\n${q.answer}` : q.answer
};
if (systemPrompt) {
item.system = systemPrompt;
}
return item;
});
} else if (formatType === 'sharegpt') {
return questions.map(q => {
const messages = [];
if (systemPrompt) {
messages.push({
role: 'system',
content: systemPrompt
});
}
messages.push({
role: 'user',
content: q.question
});
messages.push({
role: 'assistant',
content: includeCOT && q.cot ? `${q.cot}\n\n${q.answer}` : q.answer
});
return { messages };
});
} else if (formatType === 'multilingualthinking') {
return questions.map(q => {
const messages = [];
// Main message block
const mainMsg = {
reasoning_language: reasoningLanguage ? reasoningLanguage : 'English',
user: q.question,
analysis: includeCOT && q.cot ? `${q.cot}` : null,
final: q.answer
};
if (systemPrompt) {
mainMsg.developer = systemPrompt;
}
messages.push(mainMsg);
// Optional system prompt
if (systemPrompt) {
messages.push({
role: 'system',
content: systemPrompt,
thinking: null
});
}
// User message
messages.push({
role: 'user',
content: q.question,
thinking: null
});
// Assistant message
messages.push({
role: 'assistant',
content: q.answer,
thinking: includeCOT && q.cot ? `${q.cot}` : null
});
return { messages };
});
} else if (formatType === 'custom' && customFields) {
return questions.map(q => {
const item = {
[customFields.questionField]: q.question,
[customFields.answerField]: q.answer
};
if (includeCOT && q.cot) {
item[customFields.cotField] = q.cot;
}
if (customFields.includeLabels && q.labels) {
item.labels = q.labels;
}
if (customFields.includeChunk && q.chunkId) {
item.chunkId = q.chunkId;
}
return item;
});
}
// Fall back to the alpaca format
return questions.map(q => ({
instruction: q.question,
output: includeCOT && q.cot ? `${q.cot}\n\n${q.answer}` : q.answer
}));
}
// Convert the data to CSV
function convertToCSV(data) {
if (!data || data.length === 0) return '';
const headers = Object.keys(data[0]);
const headerRow = headers.join(',');
const rows = data.map(item => {
return headers
.map(header => {
const value = item[header];
if (typeof value === 'string') {
// Quote the string and double any embedded quotes so commas and quotes are preserved
return `"${value.replace(/"/g, '""')}"`;
} else if (Array.isArray(value)) {
return `"${JSON.stringify(value).replace(/"/g, '""')}"`;
} else if (typeof value === 'object' && value !== null) {
return `"${JSON.stringify(value).replace(/"/g, '""')}"`;
}
return value;
})
.join(',');
});
return [headerRow, ...rows].join('\n');
}
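// Upload a file to HuggingFace with the @huggingface/hub package
async function uploadFile(token, datasetName, filePath, destFileName) {
try {
// Prepare the repository descriptor
const repo = { type: 'dataset', name: datasetName };
// Build a file:// URL for the local file; pathToFileURL also handles Windows paths and special characters
const { pathToFileURL } = await import('url');
const fileUrl = pathToFileURL(filePath);
// Upload the file with the @huggingface/hub package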
await uploadFiles({
repo,
accessToken: token,
files: [
{
path: destFileName,
content: fileUrl
}
],
commitTitle: `Upload ${destFileName}`,
commitDescription: `Files uploaded using Easy Dataset`
});
return { success: true };
} catch (error) {
console.error(`File ${destFileName} Upload Error:`, String(error));
throw error;
}
}
// Generate README.md file
function generateReadme(projectName, projectDescription, formatType) {
return `# ${projectName}
## Description
${projectDescription || 'This dataset was created using the Easy Dataset tool.'}
## Format
This dataset is in ${formatType} format.
## Creation Method
This dataset was created using the [Easy Dataset](https://github.com/ConardLi/easy-dataset) tool.
> Easy Dataset is a specialized application designed to streamline the creation of fine-tuning datasets for Large Language Models (LLMs). It offers an intuitive interface for uploading domain-specific files, intelligently splitting content, generating questions, and producing high-quality training data for model fine-tuning.
`;
}
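
`convertToCSV` above quotes string and object values but writes header names and other values unquoted. The sketch below shows one way to apply a single escaping rule to every field, loosely following the quoting convention of RFC 4180; `toCsvField` and `toCsvRow` are hypothetical helpers, and unlike the original they quote a field only when it needs quoting.

```javascript
// Hypothetical helpers (not part of this commit): one escaping rule for headers and values alike.
function toCsvField(value) {
  if (value === null || value === undefined) return '';
  // Arrays and plain objects are serialized as JSON text.
  const text = typeof value === 'object' ? JSON.stringify(value) : String(value);
  // Quote only when the field contains a comma, a double quote, or a line break.
  if (/[",\r\n]/.test(text)) {
    return `"${text.replace(/"/g, '""')}"`;
  }
  return text;
}

function toCsvRow(values) {
  return values.map(toCsvField).join(',');
}
```

A header row would then be `toCsvRow(headers)` and each data row `toCsvRow(headers.map(h => item[h]))`.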

View File

@@ -0,0 +1,109 @@
import { NextResponse } from 'next/server';
import { getImageDatasetById, updateImageDataset, deleteImageDataset } from '@/lib/db/imageDatasets';
import { getProjectPath } from '@/lib/db/base';
import fs from 'fs/promises';
import path from 'path';
// Get the details of a single dataset
export async function GET(request, { params }) {
try {
const { projectId, datasetId } = params;
const dataset = await getImageDatasetById(datasetId);
if (!dataset || dataset.projectId !== projectId) {
return NextResponse.json({ error: 'Dataset not found' }, { status: 404 });
}
// Get the project path
const projectPath = await getProjectPath(projectId);
// Read the image as a base64 data URL
let base64 = null;
try {
const imagePath = path.join(projectPath, 'images', dataset.imageName);
const imageBuffer = await fs.readFile(imagePath);
const base64Data = imageBuffer.toString('base64');
const ext = path.extname(dataset.imageName).toLowerCase();
const mimeType = ext === '.png' ? 'image/png' : ext === '.gif' ? 'image/gif' : 'image/jpeg';
base64 = `data:${mimeType};base64,${base64Data}`;
} catch (error) {
console.error(`Failed to read image ${dataset.imageName}:`, error);
}
// Attach the base64 image
const datasetWithImage = {
...dataset,
base64
};
return NextResponse.json(datasetWithImage);
} catch (error) {
console.error('Failed to get dataset detail:', error);
return NextResponse.json({ error: error.message || 'Failed to get dataset detail' }, { status: 500 });
}
}
// Update a dataset
export async function PUT(request, { params }) {
try {
const { projectId, datasetId } = params;
const updates = await request.json();
// Verify that the dataset exists and belongs to the project
const dataset = await getImageDatasetById(datasetId);
if (!dataset || dataset.projectId !== projectId) {
return NextResponse.json({ error: 'Dataset not found' }, { status: 404 });
}
// Apply the updates
const updated = await updateImageDataset(datasetId, updates);
// Get the project path
const projectPath = await getProjectPath(projectId);
// Read the image as a base64 data URL
let base64 = null;
try {
const imagePath = path.join(projectPath, 'images', updated.imageName);
const imageBuffer = await fs.readFile(imagePath);
const base64Data = imageBuffer.toString('base64');
const ext = path.extname(updated.imageName).toLowerCase();
const mimeType = ext === '.png' ? 'image/png' : ext === '.gif' ? 'image/gif' : 'image/jpeg';
base64 = `data:${mimeType};base64,${base64Data}`;
} catch (error) {
console.error(`Failed to read image ${updated.imageName}:`, error);
}
// Attach the base64 image
const updatedWithImage = {
...updated,
base64
};
return NextResponse.json(updatedWithImage);
} catch (error) {
console.error('Failed to update dataset:', error);
return NextResponse.json({ error: error.message || 'Failed to update dataset' }, { status: 500 });
}
}
// Delete a dataset
export async function DELETE(request, { params }) {
try {
const { projectId, datasetId } = params;
// Verify that the dataset exists and belongs to the project
const dataset = await getImageDatasetById(datasetId);
if (!dataset || dataset.projectId !== projectId) {
return NextResponse.json({ error: 'Dataset not found' }, { status: 404 });
}
await deleteImageDataset(datasetId);
return NextResponse.json({ success: true });
} catch (error) {
console.error('Failed to delete dataset:', error);
return NextResponse.json({ error: error.message || 'Failed to delete dataset' }, { status: 500 });
}
}
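
The GET and PUT handlers above repeat the same steps to turn an image into a base64 data URL, and both pick the MIME type with a nested ternary. A minimal sketch of a shared helper follows; `imageMimeType` and `toDataUrl` are hypothetical names, the `.webp` entry is an illustrative addition, and the `image/jpeg` fallback is kept to match the original behavior.

```javascript
// Hypothetical helpers (not part of this commit): shared MIME lookup and data URL builder.
const IMAGE_MIME_TYPES = {
  '.png': 'image/png',
  '.gif': 'image/gif',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.webp': 'image/webp' // illustrative addition, not handled by the original code
};

function imageMimeType(imageName) {
  const dot = imageName.lastIndexOf('.');
  const ext = dot >= 0 ? imageName.slice(dot).toLowerCase() : '';
  // Unknown extensions fall back to JPEG, as in the original handlers.
  return IMAGE_MIME_TYPES[ext] || 'image/jpeg';
}

function toDataUrl(imageName, buffer) {
  return `data:${imageMimeType(imageName)};base64,${buffer.toString('base64')}`;
}
```

Each handler could then call `toDataUrl(dataset.imageName, imageBuffer)` inside its existing `try` block.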

View File

@@ -0,0 +1,85 @@
import { NextResponse } from 'next/server';
import { getImageDatasetsForExport } from '@/lib/db/imageDatasets';
import archiver from 'archiver';
import { getProjectPath } from '@/lib/db/base';
import path from 'path';
import fs from 'fs';
/**
* Export the image files as a zip archive
*/
export async function GET(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
const confirmedOnly = searchParams.get('confirmedOnly') === 'true';
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
// Get the datasets (used to determine which images are needed)
const datasets = await getImageDatasetsForExport(projectId, confirmedOnly);
if (!datasets || datasets.length === 0) {
return NextResponse.json({ error: 'No data to export' }, { status: 404 });
}
// Collect the names of all required images
const imageNames = new Set(datasets.map(d => d.imageName).filter(Boolean));
if (imageNames.size === 0) {
return NextResponse.json({ error: 'No images to export' }, { status: 404 });
}
// Create the archive
const archive = archiver('zip', {
zlib: { level: 9 }
});
// Build the download filename for the response headers
const dateStr = new Date().toISOString().slice(0, 10);
const filename = `images-${projectId}-${dateStr}.zip`;
// Add the image files to the archive
const projectPath = await getProjectPath(projectId);
const imageDir = path.join(projectPath, 'images');
if (!fs.existsSync(imageDir)) {
return NextResponse.json({ error: 'Image directory not found' }, { status: 404 });
}
let addedCount = 0;
for (const imageName of imageNames) {
const imagePath = path.join(imageDir, imageName);
if (fs.existsSync(imagePath)) {
archive.file(imagePath, { name: imageName });
addedCount++;
}
}
if (addedCount === 0) {
return NextResponse.json({ error: 'No image files found' }, { status: 404 });
}
// Finalize the archive
archive.finalize();
// Return a streaming response
return new NextResponse(archive, {
headers: {
'Content-Type': 'application/zip',
'Content-Disposition': `attachment; filename="${filename}"`
}
});
} catch (error) {
console.error('Failed to export images:', String(error));
return NextResponse.json(
{
error: error.message || 'Failed to export images'
},
{ status: 500 }
);
}
}

View File

@@ -0,0 +1,32 @@
import { NextResponse } from 'next/server';
import { getImageDatasetsForExport } from '@/lib/db/imageDatasets';
/**
* Export image datasets
*/
export async function POST(request, { params }) {
try {
const { projectId } = params;
const body = await request.json();
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
}
const confirmedOnly = body.confirmedOnly || false;
// Fetch the datasets
const datasets = await getImageDatasetsForExport(projectId, confirmedOnly);
return NextResponse.json(datasets);
} catch (error) {
console.error('Failed to export image datasets:', String(error));
return NextResponse.json(
{
error: error.message || 'Failed to export image datasets'
},
{ status: 500 }
);
}
}

View File

@@ -0,0 +1,72 @@
import { NextResponse } from 'next/server';
import { getImageDatasetsByProject } from '@/lib/db/imageDatasets';
import { getProjectPath } from '@/lib/db/base';
import fs from 'fs/promises';
import path from 'path';
// List image datasets
export async function GET(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
const page = parseInt(searchParams.get('page'), 10) || 1;
const pageSize = parseInt(searchParams.get('pageSize'), 10) || 20;
const search = searchParams.get('search') || '';
const confirmed = searchParams.get('confirmed');
const minScore = searchParams.get('minScore');
const maxScore = searchParams.get('maxScore');
// Build the filter conditions
const filters = {};
if (search) {
filters.search = search;
}
if (confirmed !== null && confirmed !== undefined) {
filters.confirmed = confirmed === 'true';
}
if (minScore) {
filters.minScore = parseInt(minScore, 10);
}
if (maxScore) {
filters.maxScore = parseInt(maxScore, 10);
}
const result = await getImageDatasetsByProject(projectId, page, pageSize, filters);
// Resolve the project path
const projectPath = await getProjectPath(projectId);
// Attach a base64 data URL of the image to each dataset
const datasetsWithImages = await Promise.all(
result.data.map(async dataset => {
try {
const imagePath = path.join(projectPath, 'images', dataset.imageName);
const imageBuffer = await fs.readFile(imagePath);
const base64 = imageBuffer.toString('base64');
const ext = path.extname(dataset.imageName).toLowerCase();
const mimeType = ext === '.png' ? 'image/png' : ext === '.gif' ? 'image/gif' : 'image/jpeg';
return {
...dataset,
base64: `data:${mimeType};base64,${base64}`
};
} catch (error) {
console.error(`Failed to read image ${dataset.imageName}:`, error);
return {
...dataset,
base64: null
};
}
})
);
return NextResponse.json({
data: datasetsWithImages,
total: result.total
});
} catch (error) {
console.error('Failed to get image datasets:', error);
return NextResponse.json({ error: error.message || 'Failed to get image datasets' }, { status: 500 });
}
}
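The list route above recognizes only `.png` and `.gif` and labels every other extension `image/jpeg`, while the ZIP import route in this commit also accepts `.bmp`, `.webp`, and `.svg`. The following is a minimal sketch of a lookup-table alternative, not the project's implementation: `MIME_BY_EXT`, `mimeTypeFor`, and `toDataUrl` are hypothetical names, and the snippet uses CommonJS `require` so that it runs standalone.

```javascript
const path = require('path');

// Hypothetical lookup table covering the extensions the ZIP import route accepts.
const MIME_BY_EXT = {
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.gif': 'image/gif',
  '.bmp': 'image/bmp',
  '.webp': 'image/webp',
  '.svg': 'image/svg+xml'
};

// Returns the MIME type for a file name, falling back to image/jpeg as the routes do.
function mimeTypeFor(imageName) {
  const ext = path.extname(imageName).toLowerCase();
  return MIME_BY_EXT[ext] || 'image/jpeg';
}

// Builds a data URL from a Buffer and the image file name.
function toDataUrl(imageBuffer, imageName) {
  return `data:${mimeTypeFor(imageName)};base64,${imageBuffer.toString('base64')}`;
}
```

A shared helper like this would keep the single-dataset and list routes consistent, since both currently repeat the same ternary.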

View File

@@ -0,0 +1,37 @@
import { NextResponse } from 'next/server';
import { getImageDatasetsTagsByProject } from '@/lib/db/imageDatasets';
// Get all tags used in the project
export async function GET(request, { params }) {
try {
const { projectId } = params;
// Fetch the tag data for all datasets in the project
const datasets = await getImageDatasetsTagsByProject(projectId);
// Extract all tags
const tagsSet = new Set();
datasets.forEach(dataset => {
if (dataset.tags) {
try {
const tags = JSON.parse(dataset.tags);
if (Array.isArray(tags)) {
tags.forEach(tag => tagsSet.add(tag));
}
} catch (e) {
// Ignore parse errors
}
}
});
// Convert to an array and sort
const tags = Array.from(tagsSet).sort();
return NextResponse.json({ tags });
} catch (error) {
console.error('Failed to get tags:', error);
return NextResponse.json({ error: error.message || 'Failed to get tags' }, { status: 500 });
}
}
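The tag route above treats each row's `tags` field as a JSON-encoded array string and silently skips rows that fail to parse. As a rough sketch of that logic in isolation, assuming the rows have the shape `{ tags: string | null }` (the helper name `collectTags` and the sample rows are made up for illustration):

```javascript
// Hypothetical pure helper: collects unique tags from rows whose `tags`
// field is a JSON-encoded array string, skipping rows that fail to parse.
function collectTags(datasets) {
  const tagsSet = new Set();
  for (const dataset of datasets) {
    if (!dataset.tags) continue;
    try {
      const tags = JSON.parse(dataset.tags);
      if (Array.isArray(tags)) {
        tags.forEach(tag => tagsSet.add(tag));
      }
    } catch (e) {
      // Malformed JSON is ignored, matching the route's behavior.
    }
  }
  // Default sort() orders strings by UTF-16 code units.
  return Array.from(tagsSet).sort();
}

// Illustrative input only; real rows come from the database.
const sample = [
  { tags: '["cat","animal"]' },
  { tags: '["dog","animal"]' },
  { tags: 'not json' },
  { tags: null }
];
console.log(collectTags(sample));
```

Pulling the loop into a pure function would make the deduplication and error-skipping behavior testable without a database.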

View File

@@ -0,0 +1,31 @@
import { NextResponse } from 'next/server';
import { getImageDetailWithQuestions } from '@/lib/services/images';
// Get image details by image ID (including the question list and annotated data)
export async function GET(request, { params }) {
try {
const { projectId, imageId } = params;
// Call the service layer to fetch the image details
const imageData = await getImageDetailWithQuestions(projectId, imageId);
return NextResponse.json({
success: true,
data: imageData
});
} catch (error) {
console.error('Failed to get image details:', error);
// Map the error message to a status code (the strings below are matched verbatim against error.message)
let statusCode = 500;
if (error.message === '缺少图片ID') {
statusCode = 400;
} else if (error.message === '图片不存在') {
statusCode = 404;
} else if (error.message === '图片不属于指定项目') {
statusCode = 403;
}
return NextResponse.json({ error: error.message || 'Failed to get image details' }, { status: statusCode });
}
}

View File

@@ -0,0 +1,89 @@
import { NextResponse } from 'next/server';
import { PrismaClient } from '@prisma/client';
import { getImageById } from '@/lib/db/images';
import { createImageDataset } from '@/lib/db/imageDatasets';
const prisma = new PrismaClient();
// Create an annotation
export async function POST(request, { params }) {
try {
const { projectId } = params;
const { imageId, questionId, question, answerType, answer, note } = await request.json();
// Validate required fields
if (!imageId || !question || !answerType || answer === undefined || answer === null) {
return NextResponse.json({ error: '缺少必要参数imageId, question, answerType, answer' }, { status: 400 });
}
// Verify the image exists and belongs to this project
const image = await getImageById(imageId);
if (!image || image.projectId !== projectId) {
return NextResponse.json({ error: '图片不存在' }, { status: 404 });
}
// Validate the answer type
if (!['text', 'label', 'custom_format'].includes(answerType)) {
return NextResponse.json({ error: '无效的答案类型' }, { status: 400 });
}
// Validate the answer content
if (answerType === 'text' && typeof answer !== 'string') {
return NextResponse.json({ error: '文本类型答案必须是字符串' }, { status: 400 });
}
if (answerType === 'label' && !Array.isArray(answer)) {
return NextResponse.json({ error: '标签类型答案必须是数组' }, { status: 400 });
}
// Serialize the answer
let answerString = answer;
if (answerType !== 'text' && typeof answerString !== 'string') {
answerString = JSON.stringify(answer, null, 2);
}
// 1. Fetch the question record (the questionId sent by the frontend refers to an existing question)
if (!questionId) {
return NextResponse.json({ error: '缺少必要参数questionId' }, { status: 400 });
}
const questionRecord = await prisma.questions.findUnique({
where: { id: questionId }
});
if (!questionRecord) {
return NextResponse.json({ error: '问题不存在' }, { status: 404 });
}
// Verify the question belongs to this image
if (questionRecord.imageId !== imageId) {
return NextResponse.json({ error: '问题不属于该图片' }, { status: 400 });
}
// 2. Mark the question as answered
await prisma.questions.update({
where: { id: questionRecord.id },
data: { answered: true }
});
// 3. Create the ImageDataset record
const dataset = await createImageDataset(projectId, {
imageId: image.id,
imageName: image.imageName,
questionId: questionRecord.id,
question,
answer: answerString,
answerType,
model: 'manual',
note: note || ''
});
return NextResponse.json({
success: true,
dataset,
questionId: questionRecord.id
});
} catch (error) {
console.error('Failed to create annotation:', error);
return NextResponse.json({ error: error.message || 'Failed to create annotation' }, { status: 500 });
}
}
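The annotation route above validates the answer against its declared type and then serializes non-text answers to JSON. A small sketch of those two steps as pure functions follows; `validateAnswer` and `serializeAnswer` are hypothetical names, and the English error strings are placeholders rather than the messages the route returns.

```javascript
const ANSWER_TYPES = ['text', 'label', 'custom_format'];

// Hypothetical helper: returns an error message string, or null when the answer is valid.
function validateAnswer(answerType, answer) {
  if (!ANSWER_TYPES.includes(answerType)) return 'Invalid answer type';
  if (answerType === 'text' && typeof answer !== 'string') return 'Text answers must be strings';
  if (answerType === 'label' && !Array.isArray(answer)) return 'Label answers must be arrays';
  return null;
}

// Serializes non-text answers to pretty-printed JSON unless they are already strings.
function serializeAnswer(answerType, answer) {
  if (answerType !== 'text' && typeof answer !== 'string') {
    return JSON.stringify(answer, null, 2);
  }
  return answer;
}
```

Note that `custom_format` answers pass validation with any non-null value, which mirrors the route: only `text` and `label` have a shape check.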

View File

@@ -0,0 +1,41 @@
import { NextResponse } from 'next/server';
import { getImageByName } from '@/lib/db/images';
import imageService from '@/lib/services/images';
// Generate an image dataset
export async function POST(request, { params }) {
try {
const { projectId } = params;
const { imageName, question, model, language = 'zh', previewOnly = false } = await request.json();
if (!imageName || !question) {
return NextResponse.json({ error: '缺少必要参数' }, { status: 400 });
}
if (!model) {
return NextResponse.json({ error: '请选择一个视觉模型' }, { status: 400 });
}
// Look up the image
const image = await getImageByName(projectId, imageName);
if (!image) {
return NextResponse.json({ error: '图片不存在' }, { status: 404 });
}
// Call the image dataset generation service
const result = await imageService.generateDatasetForImage(projectId, image.id, question, {
model,
language,
previewOnly
});
return NextResponse.json({
success: true,
answer: result.answer,
dataset: result.dataset
});
} catch (error) {
console.error('Failed to generate image dataset:', error);
return NextResponse.json({ error: error.message || 'Failed to generate dataset' }, { status: 500 });
}
}

View File

@@ -0,0 +1,41 @@
import { NextResponse } from 'next/server';
import { PrismaClient } from '@prisma/client';
import { getImageDetailWithQuestions } from '@/lib/services/images';
const prisma = new PrismaClient();
// Get the next image that has unanswered questions
export async function GET(request, { params }) {
try {
const { projectId } = params;
// Find the first unanswered question that is attached to an image
const unansweredQuestion = await prisma.questions.findFirst({
where: {
projectId,
imageId: {
not: null
},
answered: false
}
});
if (!unansweredQuestion) {
return NextResponse.json({
success: true,
data: null
});
}
// Call the service layer to fetch the image details
const imageData = await getImageDetailWithQuestions(projectId, unansweredQuestion.imageId);
return NextResponse.json({
success: true,
data: imageData
});
} catch (error) {
console.error('Failed to get next unanswered image:', error);
return NextResponse.json({ error: error.message || 'Failed to get next unanswered image' }, { status: 500 });
}
}

View File

@@ -0,0 +1,98 @@
import { NextResponse } from 'next/server';
import { getProjectPath } from '@/lib/db/base';
import { importImagesFromDirectories } from '@/lib/services/images';
import fs from 'fs/promises';
import path from 'path';
import { savePdfAsImages } from '@/lib/util/file';
// Convert a PDF to images and import them
export async function POST(request, { params }) {
let tempPdfPath = null;
let tempImagesDir = null;
try {
const { projectId } = params;
const formData = await request.formData();
const pdfFile = formData.get('file');
if (!pdfFile) {
return NextResponse.json({ error: '请选择 PDF 文件' }, { status: 400 });
}
if (!pdfFile.name.toLowerCase().endsWith('.pdf')) {
return NextResponse.json({ error: '只支持 PDF 文件' }, { status: 400 });
}
const projectPath = await getProjectPath(projectId);
const tempDir = path.join(projectPath, 'temp');
await fs.mkdir(tempDir, { recursive: true });
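// 1. Save the PDF to the temporary directory (basename guards against path separators in the uploaded name)
tempPdfPath = path.join(tempDir, `temp_${Date.now()}_${path.basename(pdfFile.name)}`);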
const pdfBuffer = Buffer.from(await pdfFile.arrayBuffer());
await fs.writeFile(tempPdfPath, pdfBuffer);
// 2. Create the temporary image directory
tempImagesDir = path.join(tempDir, `pdf_images_${Date.now()}`);
await fs.mkdir(tempImagesDir, { recursive: true });
// 3. Convert the PDF to images with pdf2md-js
console.log('Converting PDF to images...');
const imagePaths = await savePdfAsImages(tempPdfPath, tempImagesDir, 3);
console.log('PDF conversion finished, images generated:', imagePaths.length);
if (!imagePaths || imagePaths.length === 0) {
throw new Error('PDF 转换失败,未生成图片');
}
// 4. Import the images directly through the service layer
const importResult = await importImagesFromDirectories(projectId, [tempImagesDir]);
// 5. Clean up temporary files
try {
if (tempPdfPath) {
await fs.unlink(tempPdfPath);
}
if (tempImagesDir) {
const tempImages = await fs.readdir(tempImagesDir);
for (const img of tempImages) {
await fs.unlink(path.join(tempImagesDir, img));
}
await fs.rmdir(tempImagesDir);
}
const tempDirContents = await fs.readdir(tempDir);
if (tempDirContents.length === 0) {
await fs.rmdir(tempDir);
}
} catch (cleanupErr) {
console.warn('Failed to clean up temporary files:', cleanupErr);
}
return NextResponse.json({
success: true,
count: importResult.count,
images: importResult.images,
pdfName: pdfFile.name
});
} catch (error) {
console.error('Failed to convert PDF:', error);
// Clean up temporary files
try {
if (tempPdfPath) {
await fs.unlink(tempPdfPath).catch(() => {});
}
if (tempImagesDir) {
const tempImages = await fs.readdir(tempImagesDir).catch(() => []);
for (const img of tempImages) {
await fs.unlink(path.join(tempImagesDir, img)).catch(() => {});
}
await fs.rmdir(tempImagesDir).catch(() => {});
}
} catch (cleanupErr) {
console.warn('Failed to clean up temporary files:', cleanupErr);
}
return NextResponse.json({ error: error.message || 'Failed to convert PDF' }, { status: 500 });
}
}

View File

@@ -0,0 +1,40 @@
import { NextResponse } from 'next/server';
import { getImageByName } from '@/lib/db/images';
import imageService from '@/lib/services/images';
// Generate questions for an image
export async function POST(request, { params }) {
try {
const { projectId } = params;
const { imageName, count = 3, model, language = 'zh' } = await request.json();
if (!imageName) {
return NextResponse.json({ error: '缺少图片名称' }, { status: 400 });
}
if (!model) {
return NextResponse.json({ error: '请选择一个视觉模型' }, { status: 400 });
}
// Look up the image
const image = await getImageByName(projectId, imageName);
if (!image) {
return NextResponse.json({ error: '图片不存在' }, { status: 404 });
}
// Call the image question generation service
const result = await imageService.generateQuestionsForImage(projectId, image.id, {
model,
language,
count
});
return NextResponse.json({
success: true,
questions: result.questions
});
} catch (error) {
console.error('Failed to generate image questions:', error);
return NextResponse.json({ error: error.message || 'Failed to generate questions' }, { status: 500 });
}
}

View File

@@ -0,0 +1,92 @@
import { NextResponse } from 'next/server';
import { getImages, deleteImage, getImageDetail } from '@/lib/db/images';
import { getProjectPath } from '@/lib/db/base';
import { db } from '@/lib/db/index';
import { importImagesFromDirectories } from '@/lib/services/images';
import fs from 'fs/promises';
import path from 'path';
// List images
export async function GET(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
const page = parseInt(searchParams.get('page'), 10) || 1;
const pageSize = parseInt(searchParams.get('pageSize'), 10) || 20;
const imageName = searchParams.get('imageName') || '';
const hasQuestions = searchParams.get('hasQuestions');
const hasDatasets = searchParams.get('hasDatasets');
const simple = searchParams.get('simple');
const result = await getImages(projectId, page, pageSize, imageName, hasQuestions, hasDatasets, simple);
return NextResponse.json(result);
} catch (error) {
console.error('Failed to get images:', error);
return NextResponse.json({ error: error.message || 'Failed to get images' }, { status: 500 });
}
}
// Import images
export async function POST(request, { params }) {
try {
const { projectId } = params;
const { directories } = await request.json();
// Call the service layer to import the images
const result = await importImagesFromDirectories(projectId, directories);
return NextResponse.json(result);
} catch (error) {
console.error('Failed to import images:', error);
return NextResponse.json({ error: error.message || 'Failed to import images' }, { status: 500 });
}
}
// Delete an image
export async function DELETE(request, { params }) {
try {
const { projectId } = params;
const { searchParams } = new URL(request.url);
const imageId = searchParams.get('imageId');
if (!imageId) {
return NextResponse.json({ error: '缺少图片ID' }, { status: 400 });
}
// Look up the image
const image = await getImageDetail(imageId);
if (!image) {
return NextResponse.json({ error: '图片不存在' }, { status: 404 });
}
// Delete the associated datasets
await db.imageDatasets.deleteMany({
where: { imageId }
});
// Delete the associated questions
await db.questions.deleteMany({
where: { imageId }
});
// Delete the file
const projectPath = await getProjectPath(projectId);
const filePath = path.join(projectPath, 'images', image.imageName);
try {
await fs.unlink(filePath);
} catch (err) {
console.warn('Failed to delete file:', err);
}
// Delete the database record
await deleteImage(imageId);
return NextResponse.json({ success: true });
} catch (error) {
console.error('Failed to delete image:', error);
return NextResponse.json({ error: error.message || 'Failed to delete image' }, { status: 500 });
}
}

View File

@@ -0,0 +1,127 @@
import { NextResponse } from 'next/server';
import { getProjectPath } from '@/lib/db/base';
import { importImagesFromDirectories } from '@/lib/services/images';
import fs from 'fs/promises';
import path from 'path';
import AdmZip from 'adm-zip';
// Extract a ZIP archive and import its images
export async function POST(request, { params }) {
let tempZipPath = null;
let tempExtractDir = null;
try {
const { projectId } = params;
const formData = await request.formData();
const zipFile = formData.get('file');
if (!zipFile) {
return NextResponse.json({ error: '请选择压缩包文件' }, { status: 400 });
}
if (!zipFile.name.toLowerCase().endsWith('.zip')) {
return NextResponse.json({ error: '只支持 ZIP 格式的压缩包' }, { status: 400 });
}
const projectPath = await getProjectPath(projectId);
const tempDir = path.join(projectPath, 'temp');
await fs.mkdir(tempDir, { recursive: true });
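// 1. Save the archive to the temporary directory (basename guards against path separators in the uploaded name)
tempZipPath = path.join(tempDir, `temp_${Date.now()}_${path.basename(zipFile.name)}`);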
const zipBuffer = Buffer.from(await zipFile.arrayBuffer());
await fs.writeFile(tempZipPath, zipBuffer);
// 2. Create the temporary extraction directory
tempExtractDir = path.join(tempDir, `zip_extract_${Date.now()}`);
await fs.mkdir(tempExtractDir, { recursive: true });
// 3. Extract the files with adm-zip
console.log('Extracting ZIP archive...');
const zip = new AdmZip(tempZipPath);
const zipEntries = zip.getEntries();
// Supported image extensions
const imageExtensions = ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp', '.svg'];
let extractedCount = 0;
// Iterate over every entry in the archive
for (const entry of zipEntries) {
// Skip directories and hidden files
if (
entry.isDirectory ||
entry.entryName.startsWith('__MACOSX') ||
path.basename(entry.entryName).startsWith('.')
) {
continue;
}
const ext = path.extname(entry.entryName).toLowerCase();
if (imageExtensions.includes(ext)) {
// Use the bare file name (without its path inside the archive)
const fileName = path.basename(entry.entryName);
// Extract the entry into the temporary directory
zip.extractEntryTo(entry, tempExtractDir, false, true, false, fileName);
extractedCount++;
}
}
console.log(`ZIP extraction finished, images extracted: ${extractedCount}`);
if (extractedCount === 0) {
throw new Error('压缩包中没有找到支持的图片文件');
}
// 4. Import the images through the service layer
const importResult = await importImagesFromDirectories(projectId, [tempExtractDir]);
// 5. Clean up temporary files
try {
if (tempZipPath) {
await fs.unlink(tempZipPath);
}
if (tempExtractDir) {
const tempImages = await fs.readdir(tempExtractDir);
for (const img of tempImages) {
await fs.unlink(path.join(tempExtractDir, img));
}
await fs.rmdir(tempExtractDir);
}
const tempDirContents = await fs.readdir(tempDir);
if (tempDirContents.length === 0) {
await fs.rmdir(tempDir);
}
} catch (cleanupErr) {
console.warn('Failed to clean up temporary files:', cleanupErr);
}
return NextResponse.json({
success: true,
count: importResult.count,
images: importResult.images,
zipName: zipFile.name
});
} catch (error) {
console.error('Failed to import ZIP:', error);
// Clean up temporary files
try {
if (tempZipPath) {
await fs.unlink(tempZipPath).catch(() => {});
}
if (tempExtractDir) {
const tempImages = await fs.readdir(tempExtractDir).catch(() => []);
for (const img of tempImages) {
await fs.unlink(path.join(tempExtractDir, img)).catch(() => {});
}
await fs.rmdir(tempExtractDir).catch(() => {});
}
} catch (cleanupErr) {
console.warn('Failed to clean up temporary files:', cleanupErr);
}
return NextResponse.json({ error: error.message || 'Failed to import ZIP' }, { status: 500 });
}
}

View File

@@ -0,0 +1,27 @@
import { NextResponse } from 'next/server';
import path from 'path';
import fs from 'fs';
import { getProjectRoot } from '@/lib/db/base';
export async function GET(request, { params }) {
try {
const { projectId } = params;
if (!projectId) {
return NextResponse.json({ error: 'The project ID cannot be empty' }, { status: 400 });
}
const projectRoot = await getProjectRoot();
const projectPath = path.join(projectRoot, projectId);
const configPath = path.join(projectPath, 'dataset_info.json');
const exists = fs.existsSync(configPath);
return NextResponse.json({
exists,
configPath: exists ? configPath : null
});
} catch (error) {
console.error('Error checking Llama Factory config:', String(error));
return NextResponse.json({ error: error.message }, { status: 500 });
}
}

View File

@@ -0,0 +1,141 @@
import { NextResponse } from 'next/server';
import path from 'path';
import fs from 'fs';
import { getProjectRoot } from '@/lib/db/base';
import { getDatasets } from '@/lib/db/datasets';
export async function POST(request, { params }) {
try {
const { projectId } = params;
const { formatType, systemPrompt, confirmedOnly, includeCOT, reasoningLanguage } = await request.json();
if (!projectId) {
return NextResponse.json({ error: 'The project ID cannot be empty' }, { status: 400 });
}
// Resolve the project root directory
const projectRoot = await getProjectRoot();
const projectPath = path.join(projectRoot, projectId);
const configPath = path.join(projectPath, 'dataset_info.json');
const alpacaPath = path.join(projectPath, 'alpaca.json');
const sharegptPath = path.join(projectPath, 'sharegpt.json');
const multilingualThinkingPath = path.join(projectPath, 'multilingual-thinking.json');
// Fetch the datasets
let datasets = await getDatasets(projectId, !!confirmedOnly);
// Build the dataset_info.json configuration
const config = {
[`[Easy Dataset] [${projectId}] Alpaca`]: {
file_name: 'alpaca.json',
columns: {
prompt: 'instruction',
query: 'input',
response: 'output',
system: 'system'
}
},
[`[Easy Dataset] [${projectId}] ShareGPT`]: {
file_name: 'sharegpt.json',
formatting: 'sharegpt',
columns: {
messages: 'messages'
},
tags: {
role_tag: 'role',
content_tag: 'content',
user_tag: 'user',
assistant_tag: 'assistant',
system_tag: 'system'
}
},
[`[Easy Dataset] [${projectId}] multilingual-thinking`]: {
file_name: 'multilingual-thinking.json',
formatting: 'multilingual-thinking',
columns: {
messages: 'messages'
},
tags: {
role_tag: 'role',
content_tag: 'content',
user_tag: 'user',
assistant_tag: 'assistant',
system_tag: 'system'
}
}
};
// Generate the data files
const alpacaData = datasets.map(({ question, answer, cot }) => ({
instruction: question,
input: '',
output: cot && includeCOT ? `<think>${cot}</think>\n${answer}` : answer,
system: systemPrompt || ''
}));
const sharegptData = datasets.map(({ question, answer, cot }) => {
const messages = [];
if (systemPrompt) {
messages.push({
role: 'system',
content: systemPrompt
});
}
messages.push({
role: 'user',
content: question
});
messages.push({
role: 'assistant',
content: cot && includeCOT ? `<think>${cot}</think>\n${answer}` : answer
});
return { messages };
});
const multilingualThinkingData = datasets.map(({ question, answer, cot }) => ({
reasoning_language: reasoningLanguage ? reasoningLanguage : 'English',
developer: systemPrompt ? systemPrompt : '', // system prompt (may be empty)
user: question,
analysis: includeCOT && cot ? cot : null, // null if no COT
final: answer,
messages: [
{
content: systemPrompt ? systemPrompt : '',
role: 'system',
thinking: null
},
{
content: question,
role: 'user',
thinking: null
},
{
content: answer,
role: 'assistant',
thinking: includeCOT && cot ? cot : null
}
]
}));
// Serialize as a single JSON array so the .json file is valid JSON, like the Alpaca and ShareGPT outputs
const multilingualThinkingJson = JSON.stringify(multilingualThinkingData, null, 2);
await fs.promises.writeFile(multilingualThinkingPath, multilingualThinkingJson, 'utf8');
// Write the files
await fs.promises.writeFile(configPath, JSON.stringify(config, null, 2));
await fs.promises.writeFile(alpacaPath, JSON.stringify(alpacaData, null, 2));
await fs.promises.writeFile(sharegptPath, JSON.stringify(sharegptData, null, 2));
return NextResponse.json({
success: true,
configPath,
files: [
{ path: alpacaPath, format: 'alpaca' },
{ path: sharegptPath, format: 'sharegpt' },
{ path: multilingualThinkingPath, format: 'multilingual-thinking' }
]
});
} catch (error) {
console.error('Error generating Llama Factory config:', String(error));
return NextResponse.json({ error: error.message }, { status: 500 });
}
}
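The export route above embeds the chain of thought in `<think>` tags only when a record has a `cot` value and the caller sets `includeCOT`. A minimal sketch of the Alpaca mapping as a standalone function, assuming rows shaped like `{ question, answer, cot }`; `formatOutput`, `toAlpacaRecord`, and the sample row are hypothetical and only illustrate the rule.

```javascript
// Hypothetical helper mirroring the route's output rule: wrap the chain of
// thought in <think> tags only when it exists and includeCOT is truthy.
function formatOutput(answer, cot, includeCOT) {
  return cot && includeCOT ? `<think>${cot}</think>\n${answer}` : answer;
}

// Maps one dataset row to an Alpaca-style record.
function toAlpacaRecord({ question, answer, cot }, { systemPrompt = '', includeCOT = false } = {}) {
  return {
    instruction: question,
    input: '',
    output: formatOutput(answer, cot, includeCOT),
    system: systemPrompt
  };
}

// Illustrative row only; real rows come from getDatasets().
const row = { question: 'What is 2 + 2?', answer: '4', cot: 'Add the two numbers.' };
const withCot = toAlpacaRecord(row, { includeCOT: true });
const withoutCot = toAlpacaRecord(row);
console.log(withCot.output);
console.log(withoutCot.output);
```

Because the ShareGPT mapping uses the same ternary for the assistant message, a shared `formatOutput` would keep the two formats from drifting apart.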

View File

@@ -0,0 +1,18 @@
import { NextResponse } from 'next/server';
import { deleteModelConfigById } from '@/lib/db/model-config';
// Delete a model configuration
export async function DELETE(request, { params }) {
try {
const { projectId, modelConfigId } = params;
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'The project ID cannot be empty' }, { status: 400 });
}
await deleteModelConfigById(modelConfigId);
return NextResponse.json(true);
} catch (error) {
console.error('Error deleting model configuration:', String(error));
return NextResponse.json({ error: 'Failed to delete model configuration' }, { status: 500 });
}
}

View File

@@ -0,0 +1,103 @@
import { NextResponse } from 'next/server';
import { createInitModelConfig, getModelConfigByProjectId, saveModelConfig } from '@/lib/db/model-config';
import { DEFAULT_MODEL_SETTINGS, MODEL_PROVIDERS } from '@/constant/model';
import { getProject } from '@/lib/db/projects';
import { sortProvidersByPriority } from '@/lib/util/providerLogo';
function normalizeModelEndpoint(endpoint = '') {
let normalizedEndpoint = String(endpoint).trim();
if (!normalizedEndpoint) {
return '';
}
if (normalizedEndpoint.includes('/chat/completions')) {
normalizedEndpoint = normalizedEndpoint.replace('/chat/completions', '');
}
return normalizedEndpoint;
}
// List model configurations
export async function GET(request, { params }) {
try {
const { projectId } = params;
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'The project ID cannot be empty' }, { status: 400 });
}
let modelConfigList = await getModelConfigByProjectId(projectId);
if (!modelConfigList || modelConfigList.length === 0) {
let insertModelConfigList = [];
const sortedProviders = sortProvidersByPriority(MODEL_PROVIDERS, item => item.id);
sortedProviders.forEach(item => {
let data = {
projectId: projectId,
providerId: item.id,
providerName: item.name,
endpoint: item.defaultEndpoint,
apiKey: '',
modelId: '',
modelName: '',
type: 'text',
temperature: DEFAULT_MODEL_SETTINGS.temperature,
maxTokens: DEFAULT_MODEL_SETTINGS.maxTokens,
topK: 0,
topP: DEFAULT_MODEL_SETTINGS.topP,
status: 1
};
insertModelConfigList.push(data);
});
modelConfigList = await createInitModelConfig(insertModelConfigList);
}
modelConfigList = sortProvidersByPriority(modelConfigList, item => item.providerId);
let project = await getProject(projectId);
return NextResponse.json({ data: modelConfigList, defaultModelConfigId: project.defaultModelConfigId });
} catch (error) {
console.error('Error obtaining model configuration:', String(error));
return NextResponse.json({ error: 'Failed to obtain model configuration' }, { status: 500 });
}
}
// Save a model configuration
export async function POST(request, { params }) {
try {
const { projectId } = params;
// Validate the project ID
if (!projectId) {
return NextResponse.json({ error: 'The project ID cannot be empty' }, { status: 400 });
}
// Read the request body
const modelConfig = await request.json();
// Validate the request body
if (!modelConfig) {
return NextResponse.json({ error: 'The model configuration cannot be empty' }, { status: 400 });
}
modelConfig.projectId = projectId;
modelConfig.endpoint = normalizeModelEndpoint(modelConfig.endpoint);
// If modelId is missing, fall back to modelName (kept for backward compatibility)
if (!modelConfig.modelId && modelConfig.modelName) {
modelConfig.modelId = modelConfig.modelName;
}
// If modelName is missing, fall back to modelId
if (!modelConfig.modelName && modelConfig.modelId) {
modelConfig.modelName = modelConfig.modelId;
}
if (!modelConfig.topK) {
modelConfig.topK = 0;
}
if (!modelConfig.status) {
modelConfig.status = 1;
}
const parsedMaxTokens = Number(modelConfig.maxTokens ?? DEFAULT_MODEL_SETTINGS.maxTokens);
if (!Number.isInteger(parsedMaxTokens) || parsedMaxTokens < 1) {
return NextResponse.json({ error: 'maxTokens must be a positive integer' }, { status: 400 });
}
modelConfig.maxTokens = parsedMaxTokens;
const res = await saveModelConfig(modelConfig);
return NextResponse.json(res);
} catch (error) {
console.error('Error updating model configuration:', String(error));
return NextResponse.json({ error: 'Failed to update model configuration' }, { status: 500 });
}
}
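The save route above normalizes the endpoint by trimming whitespace and stripping a `/chat/completions` path segment before the configuration is stored. The function is reproduced here so that it runs standalone, with a few sample inputs; `api.example.com` is a placeholder host, not a real provider endpoint.

```javascript
// Same logic as normalizeModelEndpoint in the route, reproduced so it runs standalone.
function normalizeModelEndpoint(endpoint = '') {
  let normalizedEndpoint = String(endpoint).trim();
  if (!normalizedEndpoint) {
    return '';
  }
  if (normalizedEndpoint.includes('/chat/completions')) {
    // String.prototype.replace with a string pattern removes only the first occurrence.
    normalizedEndpoint = normalizedEndpoint.replace('/chat/completions', '');
  }
  return normalizedEndpoint;
}

// Placeholder inputs for illustration.
const samples = [
  ' https://api.example.com/v1/chat/completions ',
  'https://api.example.com/v1',
  ''
];
console.log(samples.map(normalizeModelEndpoint));
```

One thing to keep in mind: the default parameter only applies to `undefined`, so a `null` endpoint is stringified to `'null'` rather than normalized to an empty string.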

Some files were not shown because too many files have changed in this diff