first-update
This commit is contained in:
79
.claude/agents/backend-algorithm-developer.md
Normal file
79
.claude/agents/backend-algorithm-developer.md
Normal file
@@ -0,0 +1,79 @@
|
|||||||
|
---
|
||||||
|
name: backend-algorithm-developer
|
||||||
|
description: "Use this agent when you need to develop backend services, implement algorithms, or build system components using Java, Python, or Go. Examples include: designing and implementing RESTful APIs, writing efficient algorithms for data processing, creating microservices, optimizing database queries, or building high-performance server applications."
|
||||||
|
model: sonnet
|
||||||
|
color: red
|
||||||
|
memory: user
|
||||||
|
---
|
||||||
|
|
||||||
|
You are an expert backend algorithm development engineer with deep proficiency in Java, Python, and Go. You specialize in designing and implementing efficient, scalable backend services and solving complex algorithmic problems.
|
||||||
|
|
||||||
|
**Core Responsibilities:**
|
||||||
|
- Design and implement robust backend services and APIs
|
||||||
|
- Write efficient algorithms optimized for performance and scalability
|
||||||
|
- Choose the appropriate language (Java/Python/Go) based on use case requirements
|
||||||
|
- Ensure code quality through proper testing and optimization
|
||||||
|
- Handle database design, caching, and performance tuning
|
||||||
|
|
||||||
|
**Language-Specific Expertise:**
|
||||||
|
- **Java**: Spring Boot, Spring Cloud, Maven/Gradle, concurrency handling, JVM optimization
|
||||||
|
- **Python**: FastAPI/Flask/Django, asyncio, data processing libraries, ML integration
|
||||||
|
- **Go**: Goroutines, channels, Gin/Echo frameworks, microservices patterns
|
||||||
|
|
||||||
|
**Development Approach:**
|
||||||
|
1. Understand requirements thoroughly before writing code
|
||||||
|
2. Choose the most appropriate technology stack for the specific use case
|
||||||
|
3. Write clean, well-documented, and maintainable code
|
||||||
|
4. Implement proper error handling and logging
|
||||||
|
5. Consider scalability, performance, and security at every step
|
||||||
|
6. Write unit tests and integration tests
|
||||||
|
7. Optimize critical code paths using appropriate data structures and algorithms
|
||||||
|
|
||||||
|
**Quality Standards:**
|
||||||
|
- Follow language-specific best practices and coding conventions
|
||||||
|
- Use appropriate design patterns
|
||||||
|
- Implement proper input validation and security measures
|
||||||
|
- Ensure code is testable and documented
|
||||||
|
- Consider edge cases and failure scenarios
|
||||||
|
|
||||||
|
**When to use each language:**
|
||||||
|
- Use **Java** for enterprise-scale applications, complex transaction systems, and when strong typing and ecosystem libraries are needed
|
||||||
|
- Use **Python** for rapid prototyping, data processing, ML integration, and scripts
|
||||||
|
- Use **Go** for high-concurrency services, microservices, and performance-critical components
|
||||||
|
|
||||||
|
Provide well-structured, production-ready code with clear explanations. Always consider the trade-offs of your technical choices.
|
||||||
|
|
||||||
|
# Persistent Agent Memory
|
||||||
|
|
||||||
|
You have a persistent Persistent Agent Memory directory at `C:\Users\caoxiaozhu\.claude\agent-memory\backend-algorithm-developer\`. This directory already exists — write to it directly with the Write tool (do not run mkdir or check for its existence). Its contents persist across conversations.
|
||||||
|
|
||||||
|
As you work, consult your memory files to build on previous experience. When you encounter a mistake that seems like it could be common, check your Persistent Agent Memory for relevant notes — and if nothing is written yet, record what you learned.
|
||||||
|
|
||||||
|
Guidelines:
|
||||||
|
- `MEMORY.md` is always loaded into your system prompt — lines after 200 will be truncated, so keep it concise
|
||||||
|
- Create separate topic files (e.g., `debugging.md`, `patterns.md`) for detailed notes and link to them from MEMORY.md
|
||||||
|
- Update or remove memories that turn out to be wrong or outdated
|
||||||
|
- Organize memory semantically by topic, not chronologically
|
||||||
|
- Use the Write and Edit tools to update your memory files
|
||||||
|
|
||||||
|
What to save:
|
||||||
|
- Stable patterns and conventions confirmed across multiple interactions
|
||||||
|
- Key architectural decisions, important file paths, and project structure
|
||||||
|
- User preferences for workflow, tools, and communication style
|
||||||
|
- Solutions to recurring problems and debugging insights
|
||||||
|
|
||||||
|
What NOT to save:
|
||||||
|
- Session-specific context (current task details, in-progress work, temporary state)
|
||||||
|
- Information that might be incomplete — verify against project docs before writing
|
||||||
|
- Anything that duplicates or contradicts existing CLAUDE.md instructions
|
||||||
|
- Speculative or unverified conclusions from reading a single file
|
||||||
|
|
||||||
|
Explicit user requests:
|
||||||
|
- When the user asks you to remember something across sessions (e.g., "always use bun", "never auto-commit"), save it — no need to wait for multiple interactions
|
||||||
|
- When the user asks to forget or stop remembering something, find and remove the relevant entries from your memory files
|
||||||
|
- When the user corrects you on something you stated from memory, you MUST update or remove the incorrect entry. A correction means the stored memory is wrong — fix it at the source before continuing, so the same mistake does not repeat in future conversations.
|
||||||
|
- Since this memory is user-scope, keep learnings general since they apply across all projects
|
||||||
|
|
||||||
|
## MEMORY.md
|
||||||
|
|
||||||
|
Your MEMORY.md is currently empty. When you notice a pattern worth preserving across sessions, save it here. Anything in MEMORY.md will be included in your system prompt next time.
|
||||||
98
.claude/agents/elegant-frontend-designer.md
Normal file
98
.claude/agents/elegant-frontend-designer.md
Normal file
@@ -0,0 +1,98 @@
|
|||||||
|
---
|
||||||
|
name: elegant-frontend-designer
|
||||||
|
description: "Use this agent when you need to create elegant, visually stunning front-end designs for products. Examples include: designing a new landing page, creating a component library, improving existing UI/UX, building a design system, or crafting a complete product interface with modern, sophisticated aesthetics."
|
||||||
|
model: sonnet
|
||||||
|
color: purple
|
||||||
|
memory: project
|
||||||
|
---
|
||||||
|
|
||||||
|
You are an elite front-end designer with deep expertise in creating elegant, sophisticated user interfaces. You have mastered the art of combining aesthetics with functionality, understanding that true elegance lies in the balance between visual beauty and seamless user experience.
|
||||||
|
|
||||||
|
**Your Design Philosophy:**
|
||||||
|
- Embrace minimalism: Less is more. Every element must serve a purpose.
|
||||||
|
- Typography is paramount: Choose fonts that communicate personality while ensuring readability.
|
||||||
|
- Color should be intentional: Use restrained palettes with purposeful accent colors.
|
||||||
|
-Whitespace is your friend: Generous spacing creates breath and sophistication.
|
||||||
|
- Motion should feel natural: Animations should enhance, not distract.
|
||||||
|
- Consistency builds trust: A cohesive design system ensures harmony across the product.
|
||||||
|
|
||||||
|
**Technical Expertise:**
|
||||||
|
You are proficient in:
|
||||||
|
- Modern CSS (Flexbox, Grid, CSS Variables, Subgrid)
|
||||||
|
- CSS frameworks (Tailwind CSS, UnoCSS,styled-components)
|
||||||
|
- Design systems and component libraries
|
||||||
|
- Responsive and mobile-first design
|
||||||
|
- Micro-interactions and transitions
|
||||||
|
- CSS animations and keyframes
|
||||||
|
- Dark mode and theme switching
|
||||||
|
- Accessibility standards (WCAG)
|
||||||
|
|
||||||
|
**Design Style References:**
|
||||||
|
- Apple's human interface guidelines
|
||||||
|
- Material Design 3
|
||||||
|
- Minimalist Japanese design aesthetics
|
||||||
|
- Swiss design principles
|
||||||
|
- Modern neumorphism and glassmorphism (when appropriate)
|
||||||
|
- Subtle gradients and frosted glass effects
|
||||||
|
|
||||||
|
**When designing, you will:**
|
||||||
|
1. Analyze the requirements and determine the optimal design approach
|
||||||
|
2. Choose appropriate color palettes, typography, and spacing systems
|
||||||
|
3. Create responsive, mobile-first layouts
|
||||||
|
4. Implement elegant micro-interactions and transitions
|
||||||
|
5. Ensure accessibility and semantic HTML
|
||||||
|
6. Provide clean, well-structured code
|
||||||
|
7. Consider performance implications of visual effects
|
||||||
|
|
||||||
|
**Output Format:**
|
||||||
|
When presenting designs, provide:
|
||||||
|
- Conceptual overview and design rationale
|
||||||
|
- Color palette with hex codes
|
||||||
|
- Typography choices with font families and sizes
|
||||||
|
- Layout structure (can use ASCII or describe flex/grid)
|
||||||
|
- Component designs with states
|
||||||
|
- Animation specifications
|
||||||
|
- Code implementation (HTML/CSS/JS as appropriate)
|
||||||
|
|
||||||
|
**You will proactively ask clarifying questions when:**
|
||||||
|
- The target audience or use case is unclear
|
||||||
|
- Brand guidelines or existing design language conflict with elegant design suggestions
|
||||||
|
- Technical constraints might limit design choices
|
||||||
|
- The scope is too broad to provide focused recommendations
|
||||||
|
|
||||||
|
Be confident in your design decisions while remaining open to feedback and iteration.
|
||||||
|
|
||||||
|
# Persistent Agent Memory
|
||||||
|
|
||||||
|
You have a persistent Persistent Agent Memory directory at `D:\Code\Project\YG-Datasets\.claude\agent-memory\elegant-frontend-designer\`. This directory already exists — write to it directly with the Write tool (do not run mkdir or check for its existence). Its contents persist across conversations.
|
||||||
|
|
||||||
|
As you work, consult your memory files to build on previous experience. When you encounter a mistake that seems like it could be common, check your Persistent Agent Memory for relevant notes — and if nothing is written yet, record what you learned.
|
||||||
|
|
||||||
|
Guidelines:
|
||||||
|
- `MEMORY.md` is always loaded into your system prompt — lines after 200 will be truncated, so keep it concise
|
||||||
|
- Create separate topic files (e.g., `debugging.md`, `patterns.md`) for detailed notes and link to them from MEMORY.md
|
||||||
|
- Update or remove memories that turn out to be wrong or outdated
|
||||||
|
- Organize memory semantically by topic, not chronologically
|
||||||
|
- Use the Write and Edit tools to update your memory files
|
||||||
|
|
||||||
|
What to save:
|
||||||
|
- Stable patterns and conventions confirmed across multiple interactions
|
||||||
|
- Key architectural decisions, important file paths, and project structure
|
||||||
|
- User preferences for workflow, tools, and communication style
|
||||||
|
- Solutions to recurring problems and debugging insights
|
||||||
|
|
||||||
|
What NOT to save:
|
||||||
|
- Session-specific context (current task details, in-progress work, temporary state)
|
||||||
|
- Information that might be incomplete — verify against project docs before writing
|
||||||
|
- Anything that duplicates or contradicts existing CLAUDE.md instructions
|
||||||
|
- Speculative or unverified conclusions from reading a single file
|
||||||
|
|
||||||
|
Explicit user requests:
|
||||||
|
- When the user asks you to remember something across sessions (e.g., "always use bun", "never auto-commit"), save it — no need to wait for multiple interactions
|
||||||
|
- When the user asks to forget or stop remembering something, find and remove the relevant entries from your memory files
|
||||||
|
- When the user corrects you on something you stated from memory, you MUST update or remove the incorrect entry. A correction means the stored memory is wrong — fix it at the source before continuing, so the same mistake does not repeat in future conversations.
|
||||||
|
- Since this memory is project-scope and shared with your team via version control, tailor your memories to this project
|
||||||
|
|
||||||
|
## MEMORY.md
|
||||||
|
|
||||||
|
Your MEMORY.md is currently empty. When you notice a pattern worth preserving across sessions, save it here. Anything in MEMORY.md will be included in your system prompt next time.
|
||||||
94
.claude/agents/robustness-tester-submitter.md
Normal file
94
.claude/agents/robustness-tester-submitter.md
Normal file
@@ -0,0 +1,94 @@
|
|||||||
|
---
|
||||||
|
name: robustness-tester-submitter
|
||||||
|
description: "Use this agent when you need to validate code quality before submission, including testing robustness, error handling, edge cases, and submitting code to repositories. Examples:\\n- <example>After writing a new function, use this agent to test boundary conditions, invalid inputs, and error scenarios to ensure the code handles them gracefully.</example>\\n- <example>Before committing code to the repository, use this agent to run comprehensive robustness tests and submit the validated code.</example>\\n- <example>When refactoring code, use this agent to verify the changes don't introduce new vulnerabilities or failure points.</example>"
|
||||||
|
tools: Glob, Grep, Read, WebFetch, WebSearch
|
||||||
|
model: opus
|
||||||
|
color: yellow
|
||||||
|
memory: project
|
||||||
|
---
|
||||||
|
|
||||||
|
You are a senior QA engineer and code robustness expert specializing in testing software reliability and handling code submission workflows.
|
||||||
|
|
||||||
|
**Core Responsibilities:**
|
||||||
|
1. **Robustness Testing**: Evaluate code for resilience against:
|
||||||
|
- Edge cases and boundary conditions
|
||||||
|
- Invalid or unexpected inputs
|
||||||
|
- Race conditions and concurrency issues
|
||||||
|
- Resource exhaustion (memory, CPU, file handles)
|
||||||
|
- Network failures and timeouts
|
||||||
|
- Error handling completeness
|
||||||
|
|
||||||
|
2. **Code Submission**: Handle the process of committing and pushing code to repositories, including:
|
||||||
|
- Running pre-submission checks
|
||||||
|
- Creating meaningful commit messages
|
||||||
|
- Following repository conventions
|
||||||
|
- Handling merge conflicts if needed
|
||||||
|
|
||||||
|
**Testing Methodologies:**
|
||||||
|
- **Boundary Value Analysis**: Test at and beyond input limits
|
||||||
|
- **Equivalence Partitioning**: Group inputs into valid/invalid partitions
|
||||||
|
- **Fault Injection**: Introduce failures to test recovery mechanisms
|
||||||
|
- **Stress Testing**: Push code beyond normal operational limits
|
||||||
|
- **Negative Testing**: Verify proper handling of invalid scenarios
|
||||||
|
|
||||||
|
**Quality Standards:**
|
||||||
|
- All critical paths must have proper error handling
|
||||||
|
- Input validation must occur at entry points
|
||||||
|
- Resource cleanup must be guaranteed (use defer, finally, etc.)
|
||||||
|
- Concurrent code must have proper synchronization
|
||||||
|
- External dependencies should have appropriate timeouts and fallbacks
|
||||||
|
|
||||||
|
**Submission Process:**
|
||||||
|
1. Run all existing tests to ensure no regressions
|
||||||
|
2. Execute robustness test suite
|
||||||
|
3. Verify code passes linting and formatting standards
|
||||||
|
4. Stage changes with appropriate git commands
|
||||||
|
5. Create descriptive commit messages following conventional commits format
|
||||||
|
6. Push to remote repository
|
||||||
|
|
||||||
|
**Output Expectations:**
|
||||||
|
- Provide detailed test results with pass/fail status
|
||||||
|
- Document any robustness issues found with severity levels
|
||||||
|
- Suggest specific fixes for identified problems
|
||||||
|
- Confirm successful submission with commit hash
|
||||||
|
|
||||||
|
**Update your agent memory** as you discover common robustness patterns, testing strategies, and code submission workflows. Record:
|
||||||
|
- Common failure modes in different code patterns
|
||||||
|
- Effective test cases that catch edge case bugs
|
||||||
|
- Repository-specific submission conventions
|
||||||
|
- Successful robustness testing approaches
|
||||||
|
|
||||||
|
# Persistent Agent Memory
|
||||||
|
|
||||||
|
You have a persistent Persistent Agent Memory directory at `D:\Code\Project\YG-Datasets\.claude\agent-memory\robustness-tester-submitter\`. This directory already exists — write to it directly with the Write tool (do not run mkdir or check for its existence). Its contents persist across conversations.
|
||||||
|
|
||||||
|
As you work, consult your memory files to build on previous experience. When you encounter a mistake that seems like it could be common, check your Persistent Agent Memory for relevant notes — and if nothing is written yet, record what you learned.
|
||||||
|
|
||||||
|
Guidelines:
|
||||||
|
- `MEMORY.md` is always loaded into your system prompt — lines after 200 will be truncated, so keep it concise
|
||||||
|
- Create separate topic files (e.g., `debugging.md`, `patterns.md`) for detailed notes and link to them from MEMORY.md
|
||||||
|
- Update or remove memories that turn out to be wrong or outdated
|
||||||
|
- Organize memory semantically by topic, not chronologically
|
||||||
|
- Use the Write and Edit tools to update your memory files
|
||||||
|
|
||||||
|
What to save:
|
||||||
|
- Stable patterns and conventions confirmed across multiple interactions
|
||||||
|
- Key architectural decisions, important file paths, and project structure
|
||||||
|
- User preferences for workflow, tools, and communication style
|
||||||
|
- Solutions to recurring problems and debugging insights
|
||||||
|
|
||||||
|
What NOT to save:
|
||||||
|
- Session-specific context (current task details, in-progress work, temporary state)
|
||||||
|
- Information that might be incomplete — verify against project docs before writing
|
||||||
|
- Anything that duplicates or contradicts existing CLAUDE.md instructions
|
||||||
|
- Speculative or unverified conclusions from reading a single file
|
||||||
|
|
||||||
|
Explicit user requests:
|
||||||
|
- When the user asks you to remember something across sessions (e.g., "always use bun", "never auto-commit"), save it — no need to wait for multiple interactions
|
||||||
|
- When the user asks to forget or stop remembering something, find and remove the relevant entries from your memory files
|
||||||
|
- When the user corrects you on something you stated from memory, you MUST update or remove the incorrect entry. A correction means the stored memory is wrong — fix it at the source before continuing, so the same mistake does not repeat in future conversations.
|
||||||
|
- Since this memory is project-scope and shared with your team via version control, tailor your memories to this project
|
||||||
|
|
||||||
|
## MEMORY.md
|
||||||
|
|
||||||
|
Your MEMORY.md is currently empty. When you notice a pattern worth preserving across sessions, save it here. Anything in MEMORY.md will be included in your system prompt next time.
|
||||||
95
.claude/agents/ux-ui-requirements-analyst.md
Normal file
95
.claude/agents/ux-ui-requirements-analyst.md
Normal file
@@ -0,0 +1,95 @@
|
|||||||
|
---
|
||||||
|
name: ux-ui-requirements-analyst
|
||||||
|
description: "Use this agent when you need to analyze user requirements, evaluate UX/UI design quality, assess interface reasonableness, provide recommendations for improving user experience, or review design consistency and usability in a project."
|
||||||
|
tools: Glob, Grep, Read, WebFetch, WebSearch
|
||||||
|
model: sonnet
|
||||||
|
color: blue
|
||||||
|
memory: project
|
||||||
|
---
|
||||||
|
|
||||||
|
You are an expert Requirements Analyst specializing in UX/UI evaluation and interface design analysis. Your role is to help projects thoroughly analyze user requirements, evaluate the quality and reasonableness of UX/UI designs, and provide actionable recommendations for improvement.
|
||||||
|
|
||||||
|
**Your expertise includes:**
|
||||||
|
- User experience (UX) analysis and best practices
|
||||||
|
- User interface (UI) design principles and standards
|
||||||
|
- Interface usability and reasonableness evaluation
|
||||||
|
- User requirements gathering and analysis
|
||||||
|
- Design consistency and coherence assessment
|
||||||
|
- Accessibility considerations (WCAG guidelines)
|
||||||
|
- User flow and journey mapping
|
||||||
|
- Information architecture evaluation
|
||||||
|
|
||||||
|
**Your approach to analysis:**
|
||||||
|
1. Examine the design or requirements from multiple perspectives:
|
||||||
|
- Visual hierarchy and layout structure
|
||||||
|
- Color scheme, typography, and visual consistency
|
||||||
|
- Interactive elements and feedback mechanisms
|
||||||
|
- Navigation and information architecture
|
||||||
|
- Consistency across different screens/pages
|
||||||
|
- Accessibility and inclusivity
|
||||||
|
- Overall user satisfaction and task efficiency
|
||||||
|
|
||||||
|
2. For each analysis, identify:
|
||||||
|
- Strengths and good practices
|
||||||
|
- Issues, pain points, or potential improvements
|
||||||
|
- Specific, actionable recommendations
|
||||||
|
- Priority of improvements based on user impact
|
||||||
|
|
||||||
|
3. Provide rationale for your recommendations, referencing established UX/UI principles and best practices when possible.
|
||||||
|
|
||||||
|
**When analyzing interface reasonableness:**
|
||||||
|
- Evaluate if the interface aligns with user expectations and mental models
|
||||||
|
- Check if workflows are intuitive and efficient
|
||||||
|
- Assess if error prevention and recovery mechanisms are adequate
|
||||||
|
- Verify that key features are easily discoverable
|
||||||
|
- Consider the learning curve for new users
|
||||||
|
|
||||||
|
**Important guidelines:**
|
||||||
|
- Ask clarifying questions when project context, target users, or business objectives are unclear
|
||||||
|
- Consider both user needs and technical feasibility in recommendations
|
||||||
|
- Provide concrete examples or references to design patterns when helpful
|
||||||
|
- Be constructive and solution-oriented in your feedback
|
||||||
|
- When analyzing existing designs, be specific about what works and what doesn't
|
||||||
|
|
||||||
|
**Output format:**
|
||||||
|
Structure your analysis clearly with:
|
||||||
|
- Summary of findings
|
||||||
|
- Strengths identified
|
||||||
|
- Issues/areas for improvement (prioritized)
|
||||||
|
- Specific recommendations with rationale
|
||||||
|
- Optional: Questions for further clarification
|
||||||
|
|
||||||
|
# Persistent Agent Memory
|
||||||
|
|
||||||
|
You have a persistent Persistent Agent Memory directory at `D:\Code\Project\YG-Datasets\.claude\agent-memory\ux-ui-requirements-analyst\`. This directory already exists — write to it directly with the Write tool (do not run mkdir or check for its existence). Its contents persist across conversations.
|
||||||
|
|
||||||
|
As you work, consult your memory files to build on previous experience. When you encounter a mistake that seems like it could be common, check your Persistent Agent Memory for relevant notes — and if nothing is written yet, record what you learned.
|
||||||
|
|
||||||
|
Guidelines:
|
||||||
|
- `MEMORY.md` is always loaded into your system prompt — lines after 200 will be truncated, so keep it concise
|
||||||
|
- Create separate topic files (e.g., `debugging.md`, `patterns.md`) for detailed notes and link to them from MEMORY.md
|
||||||
|
- Update or remove memories that turn out to be wrong or outdated
|
||||||
|
- Organize memory semantically by topic, not chronologically
|
||||||
|
- Use the Write and Edit tools to update your memory files
|
||||||
|
|
||||||
|
What to save:
|
||||||
|
- Stable patterns and conventions confirmed across multiple interactions
|
||||||
|
- Key architectural decisions, important file paths, and project structure
|
||||||
|
- User preferences for workflow, tools, and communication style
|
||||||
|
- Solutions to recurring problems and debugging insights
|
||||||
|
|
||||||
|
What NOT to save:
|
||||||
|
- Session-specific context (current task details, in-progress work, temporary state)
|
||||||
|
- Information that might be incomplete — verify against project docs before writing
|
||||||
|
- Anything that duplicates or contradicts existing CLAUDE.md instructions
|
||||||
|
- Speculative or unverified conclusions from reading a single file
|
||||||
|
|
||||||
|
Explicit user requests:
|
||||||
|
- When the user asks you to remember something across sessions (e.g., "always use bun", "never auto-commit"), save it — no need to wait for multiple interactions
|
||||||
|
- When the user asks to forget or stop remembering something, find and remove the relevant entries from your memory files
|
||||||
|
- When the user corrects you on something you stated from memory, you MUST update or remove the incorrect entry. A correction means the stored memory is wrong — fix it at the source before continuing, so the same mistake does not repeat in future conversations.
|
||||||
|
- Since this memory is project-scope and shared with your team via version control, tailor your memories to this project
|
||||||
|
|
||||||
|
## MEMORY.md
|
||||||
|
|
||||||
|
Your MEMORY.md is currently empty. When you notice a pattern worth preserving across sessions, save it here. Anything in MEMORY.md will be included in your system prompt next time.
|
||||||
12
.gitignore
vendored
12
.gitignore
vendored
@@ -1,3 +1,15 @@
|
|||||||
|
# Node.js
|
||||||
|
node_modules/
|
||||||
|
npm-debug.log*
|
||||||
|
yarn-debug.log*
|
||||||
|
yarn-error.log*
|
||||||
|
pnpm-debug.log*
|
||||||
|
|
||||||
|
# Package lock files (optional - uncomment if you want to ignore them)
|
||||||
|
# package-lock.json
|
||||||
|
# yarn.lock
|
||||||
|
# pnpm-lock.yaml
|
||||||
|
|
||||||
# ---> Python
|
# ---> Python
|
||||||
# Byte-compiled / optimized / DLL files
|
# Byte-compiled / optimized / DLL files
|
||||||
__pycache__/
|
__pycache__/
|
||||||
|
|||||||
62
README.md
62
README.md
@@ -1,2 +1,62 @@
|
|||||||
# YG-Datasets
|
# YG-Dataset 本地启动指南
|
||||||
|
|
||||||
|
## 快速启动
|
||||||
|
|
||||||
|
### 1. 安装后端依赖
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd backend
|
||||||
|
pip install -r requirements.txt
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. 启动后端
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd backend
|
||||||
|
uvicorn app.main:app --reload --port 8000
|
||||||
|
```
|
||||||
|
|
||||||
|
后端地址: http://localhost:8000
|
||||||
|
API 文档: http://localhost:8000/docs
|
||||||
|
|
||||||
|
### 3. 安装前端依赖
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd frontend
|
||||||
|
npm install
|
||||||
|
```
|
||||||
|
|
||||||
|
### 4. 启动前端
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm run dev
|
||||||
|
```
|
||||||
|
|
||||||
|
前端地址: http://localhost:3000
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 目录结构
|
||||||
|
|
||||||
|
```
|
||||||
|
YG-Datasets/
|
||||||
|
├── backend/ # FastAPI 后端
|
||||||
|
│ ├── app/
|
||||||
|
│ │ ├── api/v1/ # API 路由
|
||||||
|
│ │ ├── models/ # 数据库模型
|
||||||
|
│ │ └── services/ # 业务逻辑
|
||||||
|
│ └── requirements.txt
|
||||||
|
├── frontend/ # Vue 3 前端
|
||||||
|
│ ├── src/
|
||||||
|
│ │ ├── views/ # 页面
|
||||||
|
│ │ └── api/ # API 封装
|
||||||
|
│ └── package.json
|
||||||
|
└── uploads/ # 上传文件存储目录
|
||||||
|
```
|
||||||
|
|
||||||
|
## 默认配置
|
||||||
|
|
||||||
|
- 数据库: SQLite (`backend/ygdataset.db`)
|
||||||
|
- 上传目录: `backend/uploads/`
|
||||||
|
- 后端端口: 8000
|
||||||
|
- 前端端口: 3000
|
||||||
|
|||||||
27
backend/Dockerfile
Normal file
27
backend/Dockerfile
Normal file
@@ -0,0 +1,27 @@
|
|||||||
|
FROM python:3.11-slim
|
||||||
|
|
||||||
|
WORKDIR /app
|
||||||
|
|
||||||
|
# Install system dependencies
|
||||||
|
RUN apt-get update && apt-get install -y \
|
||||||
|
build-essential \
|
||||||
|
libpq-dev \
|
||||||
|
&& rm -rf /var/lib/apt/lists/*
|
||||||
|
|
||||||
|
# Copy requirements
|
||||||
|
COPY requirements.txt .
|
||||||
|
|
||||||
|
# Install Python dependencies
|
||||||
|
RUN pip install --no-cache-dir -r requirements.txt
|
||||||
|
|
||||||
|
# Copy application
|
||||||
|
COPY . .
|
||||||
|
|
||||||
|
# Create uploads directory
|
||||||
|
RUN mkdir -p uploads
|
||||||
|
|
||||||
|
# Expose port
|
||||||
|
EXPOSE 8000
|
||||||
|
|
||||||
|
# Run application
|
||||||
|
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]
|
||||||
3
backend/app/api/__init__.py
Normal file
3
backend/app/api/__init__.py
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
"""
|
||||||
|
API module initialization
|
||||||
|
"""
|
||||||
17
backend/app/api/v1/__init__.py
Normal file
17
backend/app/api/v1/__init__.py
Normal file
@@ -0,0 +1,17 @@
|
|||||||
|
"""
|
||||||
|
API v1 Router
|
||||||
|
"""
|
||||||
|
|
||||||
|
from fastapi import APIRouter
|
||||||
|
|
||||||
|
from app.api.v1 import files, projects, chunks, questions, datasets, eval
|
||||||
|
|
||||||
|
api_router = APIRouter()
|
||||||
|
|
||||||
|
# Include sub-routers
|
||||||
|
api_router.include_router(projects.router, prefix="/projects", tags=["projects"])
|
||||||
|
api_router.include_router(files.router, prefix="/files", tags=["files"])
|
||||||
|
api_router.include_router(chunks.router, prefix="/chunks", tags=["chunks"])
|
||||||
|
api_router.include_router(questions.router, prefix="/questions", tags=["questions"])
|
||||||
|
api_router.include_router(datasets.router, prefix="/datasets", tags=["datasets"])
|
||||||
|
api_router.include_router(eval.router, prefix="/eval", tags=["eval"])
|
||||||
182
backend/app/api/v1/chunks/__init__.py
Normal file
182
backend/app/api/v1/chunks/__init__.py
Normal file
@@ -0,0 +1,182 @@
|
|||||||
|
"""
|
||||||
|
Chunks API Router
|
||||||
|
"""
|
||||||
|
from typing import List, Optional
|
||||||
|
from uuid import UUID
|
||||||
|
from pydantic import BaseModel
|
||||||
|
from fastapi import APIRouter, Depends, HTTPException, Query
|
||||||
|
from sqlalchemy.ext.asyncio import AsyncSession
|
||||||
|
from sqlalchemy import select
|
||||||
|
from app.core.database import get_db
|
||||||
|
from app.models.models import Chunk, File
|
||||||
|
from app.schemas.base import ChunkCreate, ChunkResponse
|
||||||
|
from app.services.text_splitter.splitter import get_splitter
|
||||||
|
from app.services.file_processor.pdf_processor import process_pdf
|
||||||
|
from app.services.file_processor.docx_processor import process_docx
|
||||||
|
from app.services.file_processor.excel_processor import process_csv, process_excel
|
||||||
|
|
||||||
|
router = APIRouter()
|
||||||
|
|
||||||
|
|
||||||
|
class SplitRequest(BaseModel):
|
||||||
|
"""Request model for splitting text"""
|
||||||
|
file_id: Optional[UUID] = None
|
||||||
|
method: str = "recursive"
|
||||||
|
chunk_size: int = 500
|
||||||
|
overlap: int = 50
|
||||||
|
separator: Optional[str] = None
|
||||||
|
|
||||||
|
|
||||||
|
class ChunkListResponse(BaseModel):
|
||||||
|
"""Response for chunk list"""
|
||||||
|
chunks: List[ChunkResponse]
|
||||||
|
total: int
|
||||||
|
|
||||||
|
|
||||||
|
def process_file_by_type(file: File) -> str:
|
||||||
|
"""Process file based on its type"""
|
||||||
|
if not file.file_path:
|
||||||
|
raise HTTPException(status_code=400, detail="File path not found")
|
||||||
|
|
||||||
|
processors = {
|
||||||
|
"pdf": process_pdf,
|
||||||
|
"docx": process_docx,
|
||||||
|
"xlsx": process_excel,
|
||||||
|
"csv": process_csv,
|
||||||
|
}
|
||||||
|
|
||||||
|
processor = processors.get(file.file_type)
|
||||||
|
if not processor:
|
||||||
|
# Return raw text for txt, md files
|
||||||
|
with open(file.file_path, 'r', encoding='utf-8') as f:
|
||||||
|
return f.read()
|
||||||
|
|
||||||
|
return processor(file.file_path)
|
||||||
|
|
||||||
|
|
||||||
|
@router.post("/split", response_model=dict)
|
||||||
|
async def split_text(
|
||||||
|
project_id: UUID,
|
||||||
|
request: SplitRequest,
|
||||||
|
db: AsyncSession = Depends(get_db)
|
||||||
|
):
|
||||||
|
"""Split text into chunks"""
|
||||||
|
# Get file
|
||||||
|
if request.file_id:
|
||||||
|
result = await db.execute(
|
||||||
|
select(File).where(File.id == request.file_id, File.project_id == project_id)
|
||||||
|
)
|
||||||
|
file = result.scalar_one_or_none()
|
||||||
|
if not file:
|
||||||
|
raise HTTPException(status_code=404, detail="File not found")
|
||||||
|
|
||||||
|
# Process file
|
||||||
|
text = process_file_by_type(file)
|
||||||
|
|
||||||
|
# Update file status
|
||||||
|
file.status = "processing"
|
||||||
|
await db.commit()
|
||||||
|
else:
|
||||||
|
raise HTTPException(status_code=400, detail="file_id is required")
|
||||||
|
|
||||||
|
# Split text
|
||||||
|
kwargs = {"chunk_size": request.chunk_size, "overlap": request.overlap}
|
||||||
|
if request.method == "custom" and request.separator:
|
||||||
|
kwargs["separator"] = request.separator
|
||||||
|
|
||||||
|
splitter = get_splitter(request.method, **kwargs)
|
||||||
|
split_results = splitter.split(text)
|
||||||
|
|
||||||
|
# Save chunks
|
||||||
|
chunks = []
|
||||||
|
for chunk_data in split_results:
|
||||||
|
db_chunk = Chunk(
|
||||||
|
project_id=project_id,
|
||||||
|
file_id=file.id,
|
||||||
|
name=chunk_data.get("name", f"Chunk {chunk_data['index'] + 1}"),
|
||||||
|
content=chunk_data["content"],
|
||||||
|
word_count=chunk_data.get("word_count", len(chunk_data["content"].split()))
|
||||||
|
)
|
||||||
|
db.add(db_chunk)
|
||||||
|
chunks.append(db_chunk)
|
||||||
|
|
||||||
|
await db.commit()
|
||||||
|
|
||||||
|
# Update file status
|
||||||
|
file.status = "completed"
|
||||||
|
await db.commit()
|
||||||
|
|
||||||
|
return {"chunks": len(chunks), "message": f"Successfully split into {len(chunks)} chunks"}
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/", response_model=dict)
|
||||||
|
async def list_chunks(
|
||||||
|
project_id: UUID,
|
||||||
|
file_id: Optional[UUID] = Query(None),
|
||||||
|
db: AsyncSession = Depends(get_db)
|
||||||
|
):
|
||||||
|
"""List chunks for a project"""
|
||||||
|
query = select(Chunk).where(Chunk.project_id == project_id)
|
||||||
|
|
||||||
|
if file_id:
|
||||||
|
query = query.where(Chunk.file_id == file_id)
|
||||||
|
|
||||||
|
query = query.order_by(Chunk.created_at.desc())
|
||||||
|
|
||||||
|
result = await db.execute(query)
|
||||||
|
chunks = result.scalars().all()
|
||||||
|
|
||||||
|
return {
|
||||||
|
"chunks": [ChunkResponse.model_validate(c) for c in chunks],
|
||||||
|
"total": len(chunks)
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/{chunk_id}", response_model=dict)
|
||||||
|
async def get_chunk(project_id: UUID, chunk_id: UUID, db: AsyncSession = Depends(get_db)):
|
||||||
|
"""Get chunk by ID"""
|
||||||
|
result = await db.execute(
|
||||||
|
select(Chunk).where(Chunk.id == chunk_id, Chunk.project_id == project_id)
|
||||||
|
)
|
||||||
|
chunk = result.scalar_one_or_none()
|
||||||
|
if not chunk:
|
||||||
|
raise HTTPException(status_code=404, detail="Chunk not found")
|
||||||
|
return ChunkResponse.model_validate(chunk)
|
||||||
|
|
||||||
|
|
||||||
|
@router.put("/{chunk_id}", response_model=dict)
|
||||||
|
async def update_chunk(
|
||||||
|
project_id: UUID,
|
||||||
|
chunk_id: UUID,
|
||||||
|
chunk: ChunkCreate,
|
||||||
|
db: AsyncSession = Depends(get_db)
|
||||||
|
):
|
||||||
|
"""Update chunk"""
|
||||||
|
result = await db.execute(
|
||||||
|
select(Chunk).where(Chunk.id == chunk_id, Chunk.project_id == project_id)
|
||||||
|
)
|
||||||
|
db_chunk = result.scalar_one_or_none()
|
||||||
|
if not db_chunk:
|
||||||
|
raise HTTPException(status_code=404, detail="Chunk not found")
|
||||||
|
|
||||||
|
for key, value in chunk.model_dump(exclude_unset=True).items():
|
||||||
|
setattr(db_chunk, key, value)
|
||||||
|
|
||||||
|
await db.commit()
|
||||||
|
await db.refresh(db_chunk)
|
||||||
|
return ChunkResponse.model_validate(db_chunk)
|
||||||
|
|
||||||
|
|
||||||
|
@router.delete("/{chunk_id}", response_model=dict)
|
||||||
|
async def delete_chunk(project_id: UUID, chunk_id: UUID, db: AsyncSession = Depends(get_db)):
|
||||||
|
"""Delete chunk"""
|
||||||
|
result = await db.execute(
|
||||||
|
select(Chunk).where(Chunk.id == chunk_id, Chunk.project_id == project_id)
|
||||||
|
)
|
||||||
|
chunk = result.scalar_one_or_none()
|
||||||
|
if not chunk:
|
||||||
|
raise HTTPException(status_code=404, detail="Chunk not found")
|
||||||
|
|
||||||
|
await db.delete(chunk)
|
||||||
|
await db.commit()
|
||||||
|
return {"message": "Chunk deleted successfully"}
|
||||||
126
backend/app/api/v1/datasets/__init__.py
Normal file
126
backend/app/api/v1/datasets/__init__.py
Normal file
@@ -0,0 +1,126 @@
|
|||||||
|
"""
|
||||||
|
Datasets API Router
|
||||||
|
"""
|
||||||
|
from typing import List, Optional
|
||||||
|
from uuid import UUID
|
||||||
|
from pydantic import BaseModel
|
||||||
|
from fastapi import APIRouter, Depends, HTTPException, Query
|
||||||
|
from sqlalchemy.ext.asyncio import AsyncSession
|
||||||
|
from sqlalchemy import select, func
|
||||||
|
from app.core.database import get_db
|
||||||
|
from app.models.models import Dataset, Question
|
||||||
|
from app.schemas.base import DatasetCreate, DatasetResponse
|
||||||
|
|
||||||
|
router = APIRouter()
|
||||||
|
|
||||||
|
|
||||||
|
class ExportRequest(BaseModel):
|
||||||
|
"""Export request schema"""
|
||||||
|
format: str = "alpaca" # alpaca, sharegpt, llama_factory, json
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/", response_model=dict)
|
||||||
|
async def list_datasets(project_id: UUID, db: AsyncSession = Depends(get_db)):
|
||||||
|
"""List datasets for a project"""
|
||||||
|
result = await db.execute(
|
||||||
|
select(Dataset).where(Dataset.project_id == project_id).order_by(Dataset.created_at.desc())
|
||||||
|
)
|
||||||
|
datasets = result.scalars().all()
|
||||||
|
|
||||||
|
# Get question count for each dataset
|
||||||
|
dataset_list = []
|
||||||
|
for dataset in datasets:
|
||||||
|
dataset_data = DatasetResponse.model_validate(dataset)
|
||||||
|
# TODO: Count questions in dataset
|
||||||
|
dataset_data.question_count = 0
|
||||||
|
dataset_list.append(dataset_data)
|
||||||
|
|
||||||
|
return {"datasets": dataset_list}
|
||||||
|
|
||||||
|
|
||||||
|
@router.post("/", response_model=dict)
|
||||||
|
async def create_dataset(
|
||||||
|
project_id: UUID,
|
||||||
|
dataset: DatasetCreate,
|
||||||
|
db: AsyncSession = Depends(get_db)
|
||||||
|
):
|
||||||
|
"""Create a new dataset"""
|
||||||
|
db_dataset = Dataset(project_id=project_id, **dataset.model_dump())
|
||||||
|
db.add(db_dataset)
|
||||||
|
await db.commit()
|
||||||
|
await db.refresh(db_dataset)
|
||||||
|
|
||||||
|
return {"id": str(db_dataset.id)}
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/{dataset_id}", response_model=dict)
|
||||||
|
async def get_dataset(
|
||||||
|
project_id: UUID,
|
||||||
|
dataset_id: UUID,
|
||||||
|
db: AsyncSession = Depends(get_db)
|
||||||
|
):
|
||||||
|
"""Get dataset by ID"""
|
||||||
|
result = await db.execute(
|
||||||
|
select(Dataset).where(Dataset.id == dataset_id, Dataset.project_id == project_id)
|
||||||
|
)
|
||||||
|
dataset = result.scalar_one_or_none()
|
||||||
|
if not dataset:
|
||||||
|
raise HTTPException(status_code=404, detail="Dataset not found")
|
||||||
|
|
||||||
|
return DatasetResponse.model_validate(dataset)
|
||||||
|
|
||||||
|
|
||||||
|
@router.delete("/{dataset_id}", response_model=dict)
|
||||||
|
async def delete_dataset(
|
||||||
|
project_id: UUID,
|
||||||
|
dataset_id: UUID,
|
||||||
|
db: AsyncSession = Depends(get_db)
|
||||||
|
):
|
||||||
|
"""Delete dataset"""
|
||||||
|
result = await db.execute(
|
||||||
|
select(Dataset).where(Dataset.id == dataset_id, Dataset.project_id == project_id)
|
||||||
|
)
|
||||||
|
dataset = result.scalar_one_or_none()
|
||||||
|
if not dataset:
|
||||||
|
raise HTTPException(status_code=404, detail="Dataset not found")
|
||||||
|
|
||||||
|
await db.delete(dataset)
|
||||||
|
await db.commit()
|
||||||
|
|
||||||
|
return {"message": "Dataset deleted successfully"}
|
||||||
|
|
||||||
|
|
||||||
|
@router.post("/{dataset_id}/export")
|
||||||
|
async def export_dataset(
|
||||||
|
project_id: UUID,
|
||||||
|
dataset_id: UUID,
|
||||||
|
request: ExportRequest,
|
||||||
|
db: AsyncSession = Depends(get_db)
|
||||||
|
):
|
||||||
|
"""Export dataset in specified format"""
|
||||||
|
# TODO: Implement actual export logic
|
||||||
|
|
||||||
|
# Get dataset
|
||||||
|
result = await db.execute(
|
||||||
|
select(Dataset).where(Dataset.id == dataset_id, Dataset.project_id == project_id)
|
||||||
|
)
|
||||||
|
dataset = result.scalar_one_or_none()
|
||||||
|
if not dataset:
|
||||||
|
raise HTTPException(status_code=404, detail="Dataset not found")
|
||||||
|
|
||||||
|
# Get questions for this dataset (placeholder)
|
||||||
|
# In real implementation, would link questions to datasets
|
||||||
|
|
||||||
|
# Return sample data based on format
|
||||||
|
sample_data = [
|
||||||
|
{
|
||||||
|
"instruction": "这是一个示例指令",
|
||||||
|
"input": "",
|
||||||
|
"output": "这是一个示例输出"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
|
||||||
|
if request.format == "json":
|
||||||
|
return sample_data
|
||||||
|
|
||||||
|
return {"data": sample_data, "format": request.format}
|
||||||
100
backend/app/api/v1/eval/__init__.py
Normal file
100
backend/app/api/v1/eval/__init__.py
Normal file
@@ -0,0 +1,100 @@
|
|||||||
|
"""
|
||||||
|
Evaluation API Router
|
||||||
|
"""
|
||||||
|
from typing import List, Optional
|
||||||
|
from uuid import UUID
|
||||||
|
from pydantic import BaseModel
|
||||||
|
from fastapi import APIRouter, Depends, HTTPException
|
||||||
|
from sqlalchemy.ext.asyncio import AsyncSession
|
||||||
|
from sqlalchemy import select
|
||||||
|
from app.core.database import get_db
|
||||||
|
from app.models.models import EvalDataset, Task
|
||||||
|
from app.schemas.base import EvalDatasetCreate, EvalDatasetResponse, TaskResponse
|
||||||
|
|
||||||
|
router = APIRouter()
|
||||||
|
|
||||||
|
|
||||||
|
class GenerateEvalRequest(BaseModel):
|
||||||
|
"""Request for generating evaluation dataset"""
|
||||||
|
name: str
|
||||||
|
question_type: str = "mixed"
|
||||||
|
count: int = 50
|
||||||
|
|
||||||
|
|
||||||
|
class RunEvalRequest(BaseModel):
|
||||||
|
"""Request for running evaluation"""
|
||||||
|
model_config_id: Optional[UUID] = None
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/", response_model=dict)
|
||||||
|
async def list_eval_datasets(project_id: UUID, db: AsyncSession = Depends(get_db)):
|
||||||
|
"""List evaluation datasets"""
|
||||||
|
result = await db.execute(
|
||||||
|
select(EvalDataset).where(EvalDataset.project_id == project_id).order_by(EvalDataset.created_at.desc())
|
||||||
|
)
|
||||||
|
datasets = result.scalars().all()
|
||||||
|
|
||||||
|
return {"datasets": [EvalDatasetResponse.model_validate(d) for d in datasets]}
|
||||||
|
|
||||||
|
|
||||||
|
@router.post("/", response_model=dict)
|
||||||
|
async def create_eval_dataset(
|
||||||
|
project_id: UUID,
|
||||||
|
request: GenerateEvalRequest,
|
||||||
|
db: AsyncSession = Depends(get_db)
|
||||||
|
):
|
||||||
|
"""Create evaluation dataset"""
|
||||||
|
db_dataset = EvalDataset(
|
||||||
|
project_id=project_id,
|
||||||
|
name=request.name,
|
||||||
|
question_type=request.question_type
|
||||||
|
)
|
||||||
|
db.add(db_dataset)
|
||||||
|
await db.commit()
|
||||||
|
await db.refresh(db_dataset)
|
||||||
|
|
||||||
|
return {"id": str(db_dataset.id)}
|
||||||
|
|
||||||
|
|
||||||
|
@router.post("/{eval_id}/evaluate", response_model=dict)
|
||||||
|
async def run_evaluation(
|
||||||
|
project_id: UUID,
|
||||||
|
eval_id: UUID,
|
||||||
|
request: RunEvalRequest,
|
||||||
|
db: AsyncSession = Depends(get_db)
|
||||||
|
):
|
||||||
|
"""Run evaluation on dataset"""
|
||||||
|
# Check dataset exists
|
||||||
|
result = await db.execute(
|
||||||
|
select(EvalDataset).where(EvalDataset.id == eval_id, EvalDataset.project_id == project_id)
|
||||||
|
)
|
||||||
|
dataset = result.scalar_one_or_none()
|
||||||
|
if not dataset:
|
||||||
|
raise HTTPException(status_code=404, detail="Evaluation dataset not found")
|
||||||
|
|
||||||
|
# Create evaluation task
|
||||||
|
task = Task(
|
||||||
|
project_id=project_id,
|
||||||
|
task_type="eval",
|
||||||
|
status="pending"
|
||||||
|
)
|
||||||
|
db.add(task)
|
||||||
|
await db.commit()
|
||||||
|
await db.refresh(task)
|
||||||
|
|
||||||
|
# TODO: Start evaluation in background
|
||||||
|
|
||||||
|
return {"task_id": str(task.id), "message": "Evaluation task started"}
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/results", response_model=dict)
|
||||||
|
async def get_eval_results(project_id: UUID, task_id: UUID, db: AsyncSession = Depends(get_db)):
|
||||||
|
"""Get evaluation results"""
|
||||||
|
result = await db.execute(
|
||||||
|
select(Task).where(Task.id == task_id, Task.project_id == project_id)
|
||||||
|
)
|
||||||
|
task = result.scalar_one_or_none()
|
||||||
|
if not task:
|
||||||
|
raise HTTPException(status_code=404, detail="Task not found")
|
||||||
|
|
||||||
|
return TaskResponse.model_validate(task)
|
||||||
110
backend/app/api/v1/files/__init__.py
Normal file
110
backend/app/api/v1/files/__init__.py
Normal file
@@ -0,0 +1,110 @@
|
|||||||
|
"""
|
||||||
|
Files API Router
|
||||||
|
"""
|
||||||
|
import os
|
||||||
|
import aiofiles
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import List
|
||||||
|
from uuid import UUID
|
||||||
|
from fastapi import APIRouter, Depends, HTTPException, UploadFile, File, Form
|
||||||
|
from sqlalchemy.ext.asyncio import AsyncSession
|
||||||
|
from sqlalchemy import select
|
||||||
|
from app.core.database import get_db
|
||||||
|
from app.core.config import get_settings
|
||||||
|
from app.models.models import File
|
||||||
|
from app.schemas.base import FileResponse
|
||||||
|
|
||||||
|
settings = get_settings()
|
||||||
|
router = APIRouter()
|
||||||
|
|
||||||
|
# Ensure upload directory exists
|
||||||
|
UPLOAD_DIR = Path(settings.UPLOAD_DIR)
|
||||||
|
UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
|
||||||
|
def get_file_type(filename: str) -> str:
|
||||||
|
"""Get file type from extension"""
|
||||||
|
ext = filename.rsplit('.', 1)[-1].lower() if '.' in filename else ''
|
||||||
|
type_map = {
|
||||||
|
'pdf': 'pdf',
|
||||||
|
'docx': 'docx',
|
||||||
|
'doc': 'docx',
|
||||||
|
'xlsx': 'xlsx',
|
||||||
|
'xls': 'xlsx',
|
||||||
|
'csv': 'csv',
|
||||||
|
'epub': 'epub',
|
||||||
|
'md': 'md',
|
||||||
|
'markdown': 'md',
|
||||||
|
'txt': 'txt'
|
||||||
|
}
|
||||||
|
return type_map.get(ext, 'txt')
|
||||||
|
|
||||||
|
|
||||||
|
@router.post("/upload", response_model=dict)
|
||||||
|
async def upload_file(
|
||||||
|
project_id: UUID,
|
||||||
|
file: UploadFile = File(...),
|
||||||
|
db: AsyncSession = Depends(get_db)
|
||||||
|
):
|
||||||
|
"""Upload a file"""
|
||||||
|
# Save file to disk
|
||||||
|
file_path = UPLOAD_DIR / f"{project_id}_{file.filename}"
|
||||||
|
async with aiofiles.open(file_path, 'wb') as f:
|
||||||
|
content = await file.read()
|
||||||
|
await f.write(content)
|
||||||
|
|
||||||
|
# Create file record
|
||||||
|
db_file = File(
|
||||||
|
project_id=project_id,
|
||||||
|
filename=file.filename,
|
||||||
|
file_type=get_file_type(file.filename),
|
||||||
|
file_path=str(file_path),
|
||||||
|
size=len(content),
|
||||||
|
status="pending"
|
||||||
|
)
|
||||||
|
db.add(db_file)
|
||||||
|
await db.commit()
|
||||||
|
await db.refresh(db_file)
|
||||||
|
|
||||||
|
return {"id": str(db_file.id), "filename": db_file.filename, "status": db_file.status}
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/", response_model=dict)
|
||||||
|
async def list_files(project_id: UUID, db: AsyncSession = Depends(get_db)):
|
||||||
|
"""List files for a project"""
|
||||||
|
result = await db.execute(
|
||||||
|
select(File).where(File.project_id == project_id).order_by(File.created_at.desc())
|
||||||
|
)
|
||||||
|
files = result.scalars().all()
|
||||||
|
return {"files": [FileResponse.model_validate(f) for f in files]}
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/{file_id}", response_model=dict)
|
||||||
|
async def get_file(project_id: UUID, file_id: UUID, db: AsyncSession = Depends(get_db)):
|
||||||
|
"""Get file by ID"""
|
||||||
|
result = await db.execute(
|
||||||
|
select(File).where(File.id == file_id, File.project_id == project_id)
|
||||||
|
)
|
||||||
|
file = result.scalar_one_or_none()
|
||||||
|
if not file:
|
||||||
|
raise HTTPException(status_code=404, detail="File not found")
|
||||||
|
return FileResponse.model_validate(file)
|
||||||
|
|
||||||
|
|
||||||
|
@router.delete("/{file_id}", response_model=dict)
|
||||||
|
async def delete_file(project_id: UUID, file_id: UUID, db: AsyncSession = Depends(get_db)):
|
||||||
|
"""Delete file"""
|
||||||
|
result = await db.execute(
|
||||||
|
select(File).where(File.id == file_id, File.project_id == project_id)
|
||||||
|
)
|
||||||
|
file = result.scalar_one_or_none()
|
||||||
|
if not file:
|
||||||
|
raise HTTPException(status_code=404, detail="File not found")
|
||||||
|
|
||||||
|
# Delete file from disk
|
||||||
|
if file.file_path and os.path.exists(file.file_path):
|
||||||
|
os.remove(file.file_path)
|
||||||
|
|
||||||
|
await db.delete(file)
|
||||||
|
await db.commit()
|
||||||
|
return {"message": "File deleted successfully"}
|
||||||
74
backend/app/api/v1/projects/__init__.py
Normal file
74
backend/app/api/v1/projects/__init__.py
Normal file
@@ -0,0 +1,74 @@
|
|||||||
|
"""
|
||||||
|
Projects API Router
|
||||||
|
"""
|
||||||
|
from typing import List
|
||||||
|
from uuid import UUID
|
||||||
|
from fastapi import APIRouter, Depends, HTTPException
|
||||||
|
from sqlalchemy.ext.asyncio import AsyncSession
|
||||||
|
from sqlalchemy import select
|
||||||
|
from app.core.database import get_db
|
||||||
|
from app.models.models import Project
|
||||||
|
from app.schemas.base import (
|
||||||
|
ProjectCreate,
|
||||||
|
ProjectUpdate,
|
||||||
|
ProjectResponse
|
||||||
|
)
|
||||||
|
|
||||||
|
router = APIRouter()
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/", response_model=dict)
|
||||||
|
async def list_projects(db: AsyncSession = Depends(get_db)):
|
||||||
|
"""List all projects"""
|
||||||
|
result = await db.execute(select(Project).order_by(Project.created_at.desc()))
|
||||||
|
projects = result.scalars().all()
|
||||||
|
return {"projects": [ProjectResponse.model_validate(p) for p in projects]}
|
||||||
|
|
||||||
|
|
||||||
|
@router.post("/", response_model=dict)
|
||||||
|
async def create_project(project: ProjectCreate, db: AsyncSession = Depends(get_db)):
|
||||||
|
"""Create a new project"""
|
||||||
|
db_project = Project(**project.model_dump())
|
||||||
|
db.add(db_project)
|
||||||
|
await db.commit()
|
||||||
|
await db.refresh(db_project)
|
||||||
|
return {"id": str(db_project.id)}
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/{project_id}", response_model=dict)
|
||||||
|
async def get_project(project_id: UUID, db: AsyncSession = Depends(get_db)):
|
||||||
|
"""Get project by ID"""
|
||||||
|
result = await db.execute(select(Project).where(Project.id == project_id))
|
||||||
|
project = result.scalar_one_or_none()
|
||||||
|
if not project:
|
||||||
|
raise HTTPException(status_code=404, detail="Project not found")
|
||||||
|
return ProjectResponse.model_validate(project)
|
||||||
|
|
||||||
|
|
||||||
|
@router.put("/{project_id}", response_model=dict)
|
||||||
|
async def update_project(project_id: UUID, project: ProjectUpdate, db: AsyncSession = Depends(get_db)):
|
||||||
|
"""Update project"""
|
||||||
|
result = await db.execute(select(Project).where(Project.id == project_id))
|
||||||
|
db_project = result.scalar_one_or_none()
|
||||||
|
if not db_project:
|
||||||
|
raise HTTPException(status_code=404, detail="Project not found")
|
||||||
|
|
||||||
|
for key, value in project.model_dump(exclude_unset=True).items():
|
||||||
|
setattr(db_project, key, value)
|
||||||
|
|
||||||
|
await db.commit()
|
||||||
|
await db.refresh(db_project)
|
||||||
|
return ProjectResponse.model_validate(db_project)
|
||||||
|
|
||||||
|
|
||||||
|
@router.delete("/{project_id}", response_model=dict)
|
||||||
|
async def delete_project(project_id: UUID, db: AsyncSession = Depends(get_db)):
|
||||||
|
"""Delete project"""
|
||||||
|
result = await db.execute(select(Project).where(Project.id == project_id))
|
||||||
|
project = result.scalar_one_or_none()
|
||||||
|
if not project:
|
||||||
|
raise HTTPException(status_code=404, detail="Project not found")
|
||||||
|
|
||||||
|
await db.delete(project)
|
||||||
|
await db.commit()
|
||||||
|
return {"message": "Project deleted successfully"}
|
||||||
122
backend/app/api/v1/questions/__init__.py
Normal file
122
backend/app/api/v1/questions/__init__.py
Normal file
@@ -0,0 +1,122 @@
|
|||||||
|
"""
|
||||||
|
Questions API Router
|
||||||
|
"""
|
||||||
|
from typing import List, Optional
|
||||||
|
from uuid import UUID
|
||||||
|
from pydantic import BaseModel
|
||||||
|
from fastapi import APIRouter, Depends, HTTPException, Query
|
||||||
|
from sqlalchemy.ext.asyncio import AsyncSession
|
||||||
|
from sqlalchemy import select
|
||||||
|
from app.core.database import get_db
|
||||||
|
from app.models.models import Question, Chunk
|
||||||
|
from app.schemas.base import QuestionCreate, QuestionResponse
|
||||||
|
|
||||||
|
router = APIRouter()
|
||||||
|
|
||||||
|
|
||||||
|
class GenerateRequest(BaseModel):
|
||||||
|
"""Request model for generating questions"""
|
||||||
|
chunk_ids: List[UUID] = []
|
||||||
|
count: int = 5
|
||||||
|
question_types: List[str] = ["fact", "summary"]
|
||||||
|
|
||||||
|
|
||||||
|
@router.post("/generate", response_model=dict)
|
||||||
|
async def generate_questions(
|
||||||
|
project_id: UUID,
|
||||||
|
request: GenerateRequest,
|
||||||
|
db: AsyncSession = Depends(get_db)
|
||||||
|
):
|
||||||
|
"""Generate questions from chunks using LLM"""
|
||||||
|
# TODO: Implement LLM-based question generation
|
||||||
|
# This is a placeholder that creates sample questions
|
||||||
|
|
||||||
|
if not request.chunk_ids:
|
||||||
|
raise HTTPException(status_code=400, detail="chunk_ids is required")
|
||||||
|
|
||||||
|
# Get chunks
|
||||||
|
result = await db.execute(
|
||||||
|
select(Chunk).where(Chunk.id.in_(request.chunk_ids), Chunk.project_id == project_id)
|
||||||
|
)
|
||||||
|
chunks = result.scalars().all()
|
||||||
|
|
||||||
|
if not chunks:
|
||||||
|
raise HTTPException(status_code=404, detail="No chunks found")
|
||||||
|
|
||||||
|
# Create sample questions (placeholder)
|
||||||
|
created_questions = []
|
||||||
|
for chunk in chunks:
|
||||||
|
for i in range(request.count):
|
||||||
|
question = Question(
|
||||||
|
project_id=project_id,
|
||||||
|
chunk_id=chunk.id,
|
||||||
|
content=f"这是关于「{chunk.name}」的问题 {i+1}?",
|
||||||
|
answer=f"这是问题 {i+1} 的答案。",
|
||||||
|
question_type=request.question_types[0] if request.question_types else "fact",
|
||||||
|
source="generated"
|
||||||
|
)
|
||||||
|
db.add(question)
|
||||||
|
created_questions.append(question)
|
||||||
|
|
||||||
|
await db.commit()
|
||||||
|
|
||||||
|
return {
|
||||||
|
"questions": len(created_questions),
|
||||||
|
"message": f"Successfully generated {len(created_questions)} questions"
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/", response_model=dict)
|
||||||
|
async def list_questions(
|
||||||
|
project_id: UUID,
|
||||||
|
chunk_id: Optional[UUID] = Query(None),
|
||||||
|
db: AsyncSession = Depends(get_db)
|
||||||
|
):
|
||||||
|
"""List questions for a project"""
|
||||||
|
query = select(Question).where(Question.project_id == project_id)
|
||||||
|
|
||||||
|
if chunk_id:
|
||||||
|
query = query.where(Question.chunk_id == chunk_id)
|
||||||
|
|
||||||
|
result = await db.execute(query)
|
||||||
|
questions = result.scalars().all()
|
||||||
|
|
||||||
|
return {"questions": [QuestionResponse.model_validate(q) for q in questions]}
|
||||||
|
|
||||||
|
|
||||||
|
@router.put("/{question_id}", response_model=dict)
|
||||||
|
async def update_question(
|
||||||
|
project_id: UUID,
|
||||||
|
question_id: UUID,
|
||||||
|
question: QuestionCreate,
|
||||||
|
db: AsyncSession = Depends(get_db)
|
||||||
|
):
|
||||||
|
"""Update question"""
|
||||||
|
result = await db.execute(
|
||||||
|
select(Question).where(Question.id == question_id, Question.project_id == project_id)
|
||||||
|
)
|
||||||
|
db_question = result.scalar_one_or_none()
|
||||||
|
if not db_question:
|
||||||
|
raise HTTPException(status_code=404, detail="Question not found")
|
||||||
|
|
||||||
|
for key, value in question.model_dump(exclude_unset=True).items():
|
||||||
|
setattr(db_question, key, value)
|
||||||
|
|
||||||
|
await db.commit()
|
||||||
|
await db.refresh(db_question)
|
||||||
|
return QuestionResponse.model_validate(db_question)
|
||||||
|
|
||||||
|
|
||||||
|
@router.delete("/{question_id}", response_model=dict)
|
||||||
|
async def delete_question(project_id: UUID, question_id: UUID, db: AsyncSession = Depends(get_db)):
|
||||||
|
"""Delete question"""
|
||||||
|
result = await db.execute(
|
||||||
|
select(Question).where(Question.id == question_id, Question.project_id == project_id)
|
||||||
|
)
|
||||||
|
question = result.scalar_one_or_none()
|
||||||
|
if not question:
|
||||||
|
raise HTTPException(status_code=404, detail="Question not found")
|
||||||
|
|
||||||
|
await db.delete(question)
|
||||||
|
await db.commit()
|
||||||
|
return {"message": "Question deleted successfully"}
|
||||||
3
backend/app/core/__init__.py
Normal file
3
backend/app/core/__init__.py
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
"""
|
||||||
|
Core module initialization
|
||||||
|
"""
|
||||||
49
backend/app/core/config.py
Normal file
49
backend/app/core/config.py
Normal file
@@ -0,0 +1,49 @@
|
|||||||
|
"""
|
||||||
|
Application Configuration
|
||||||
|
"""
|
||||||
|
|
||||||
|
from functools import lru_cache
|
||||||
|
from pydantic_settings import BaseSettings
|
||||||
|
from pydantic import Field
|
||||||
|
|
||||||
|
|
||||||
|
class Settings(BaseSettings):
|
||||||
|
"""Application settings"""
|
||||||
|
|
||||||
|
# App
|
||||||
|
APP_NAME: str = "YG-Dataset"
|
||||||
|
DEBUG: bool = True
|
||||||
|
HOST: str = "0.0.0.0"
|
||||||
|
PORT: int = 8000
|
||||||
|
|
||||||
|
# Database - 使用 SQLite 进行开发/测试
|
||||||
|
# 生产环境可切换为 PostgreSQL
|
||||||
|
DATABASE_URL: str = Field(
|
||||||
|
default="sqlite:///./ygdataset.db",
|
||||||
|
description="Database connection URL (sqlite:// or postgresql+asyncpg://)"
|
||||||
|
)
|
||||||
|
DATABASE_URL_SYNC: str = Field(
|
||||||
|
default="sqlite:///./ygdataset.db",
|
||||||
|
description="Synchronous database connection URL"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Redis
|
||||||
|
REDIS_URL: str = "redis://localhost:6379/0"
|
||||||
|
|
||||||
|
# File Storage
|
||||||
|
UPLOAD_DIR: str = "./uploads"
|
||||||
|
MAX_FILE_SIZE: int = 100 * 1024 * 1024 # 100MB
|
||||||
|
|
||||||
|
# LLM Settings
|
||||||
|
DEFAULT_MODEL_PROVIDER: str = "openai"
|
||||||
|
DEFAULT_MODEL_NAME: str = "gpt-4o-mini"
|
||||||
|
|
||||||
|
class Config:
|
||||||
|
env_file = ".env"
|
||||||
|
extra = "allow"
|
||||||
|
|
||||||
|
|
||||||
|
@lru_cache()
|
||||||
|
def get_settings() -> Settings:
|
||||||
|
"""Get cached settings"""
|
||||||
|
return Settings()
|
||||||
68
backend/app/core/database.py
Normal file
68
backend/app/core/database.py
Normal file
@@ -0,0 +1,68 @@
|
|||||||
|
"""
|
||||||
|
Database Configuration and Session Management
|
||||||
|
支持 SQLite 和 PostgreSQL
|
||||||
|
"""
|
||||||
|
|
||||||
|
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine, async_sessionmaker
|
||||||
|
from sqlalchemy.orm import DeclarativeBase
|
||||||
|
from sqlalchemy import create_engine
|
||||||
|
from app.core.config import get_settings
|
||||||
|
|
||||||
|
settings = get_settings()
|
||||||
|
|
||||||
|
|
||||||
|
def get_engine_config():
|
||||||
|
"""根据数据库类型返回引擎配置"""
|
||||||
|
if settings.DATABASE_URL.startswith("sqlite"):
|
||||||
|
return {"echo": settings.DEBUG}
|
||||||
|
else:
|
||||||
|
return {
|
||||||
|
"echo": settings.DEBUG,
|
||||||
|
"pool_pre_ping": True,
|
||||||
|
"pool_size": 10,
|
||||||
|
"max_overflow": 20,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# Async engine for FastAPI
|
||||||
|
async_engine = create_async_engine(
|
||||||
|
settings.DATABASE_URL,
|
||||||
|
**get_engine_config()
|
||||||
|
)
|
||||||
|
|
||||||
|
# Sync engine for migrations
|
||||||
|
sync_engine = create_engine(
|
||||||
|
settings.DATABASE_URL_SYNC,
|
||||||
|
echo=settings.DEBUG,
|
||||||
|
pool_pre_ping=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
# Async session factory
|
||||||
|
AsyncSessionLocal = async_sessionmaker(
|
||||||
|
async_engine,
|
||||||
|
class_=AsyncSession,
|
||||||
|
expire_on_commit=False,
|
||||||
|
autocommit=False,
|
||||||
|
autoflush=False,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
class Base(DeclarativeBase):
|
||||||
|
"""Base class for all models"""
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
async def init_db():
|
||||||
|
"""Initialize database tables"""
|
||||||
|
async with async_engine.begin() as conn:
|
||||||
|
await conn.run_sync(Base.metadata.create_all)
|
||||||
|
|
||||||
|
|
||||||
|
async def get_db() -> AsyncSession:
|
||||||
|
"""Dependency for getting database session"""
|
||||||
|
async with AsyncSessionLocal() as session:
|
||||||
|
try:
|
||||||
|
yield session
|
||||||
|
finally:
|
||||||
|
await session.close()
|
||||||
58
backend/app/main.py
Normal file
58
backend/app/main.py
Normal file
@@ -0,0 +1,58 @@
|
|||||||
|
"""
|
||||||
|
YG-Dataset Backend Application
|
||||||
|
FastAPI-based API server for dataset generation platform
|
||||||
|
"""
|
||||||
|
|
||||||
|
from contextlib import asynccontextmanager
|
||||||
|
from fastapi import FastAPI
|
||||||
|
from fastapi.middleware.cors import CORSMiddleware
|
||||||
|
|
||||||
|
from app.api.v1 import api_router
|
||||||
|
from app.core.config import settings
|
||||||
|
from app.core.database import init_db
|
||||||
|
|
||||||
|
|
||||||
|
@asynccontextmanager
|
||||||
|
async def lifespan(app: FastAPI):
|
||||||
|
"""Application lifespan events"""
|
||||||
|
# Startup
|
||||||
|
await init_db()
|
||||||
|
yield
|
||||||
|
# Shutdown
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
app = FastAPI(
|
||||||
|
title="YG-Dataset API",
|
||||||
|
description="Dataset Generation Platform API",
|
||||||
|
version="1.0.0",
|
||||||
|
lifespan=lifespan,
|
||||||
|
)
|
||||||
|
|
||||||
|
# CORS
|
||||||
|
app.add_middleware(
|
||||||
|
CORSMiddleware,
|
||||||
|
allow_origins=["*"],
|
||||||
|
allow_credentials=True,
|
||||||
|
allow_methods=["*"],
|
||||||
|
allow_headers=["*"],
|
||||||
|
)
|
||||||
|
|
||||||
|
# Include API routes
|
||||||
|
app.include_router(api_router, prefix="/api/v1")
|
||||||
|
|
||||||
|
|
||||||
|
@app.get("/health")
|
||||||
|
async def health_check():
|
||||||
|
"""Health check endpoint"""
|
||||||
|
return {"status": "healthy", "version": "1.0.0"}
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
import uvicorn
|
||||||
|
uvicorn.run(
|
||||||
|
"app.main:app",
|
||||||
|
host=settings.HOST,
|
||||||
|
port=settings.PORT,
|
||||||
|
reload=settings.DEBUG,
|
||||||
|
)
|
||||||
3
backend/app/models/__init__.py
Normal file
3
backend/app/models/__init__.py
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
"""
|
||||||
|
Database Models
|
||||||
|
"""
|
||||||
19
backend/app/models/base.py
Normal file
19
backend/app/models/base.py
Normal file
@@ -0,0 +1,19 @@
|
|||||||
|
"""
|
||||||
|
Base Model with UUID support
|
||||||
|
"""
|
||||||
|
import uuid
|
||||||
|
from datetime import datetime
|
||||||
|
from sqlalchemy import Column, DateTime
|
||||||
|
from sqlalchemy.dialects.postgresql import UUID
|
||||||
|
from app.core.database import Base
|
||||||
|
|
||||||
|
|
||||||
|
class TimestampMixin:
|
||||||
|
"""Mixin for created_at and updated_at timestamps"""
|
||||||
|
created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
|
||||||
|
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow, nullable=False)
|
||||||
|
|
||||||
|
|
||||||
|
class UUIDMixin:
|
||||||
|
"""Mixin for UUID primary key"""
|
||||||
|
id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4, index=True)
|
||||||
161
backend/app/models/models.py
Normal file
161
backend/app/models/models.py
Normal file
@@ -0,0 +1,161 @@
|
|||||||
|
"""
|
||||||
|
Database Models for YG-Dataset
|
||||||
|
"""
|
||||||
|
from sqlalchemy import Column, String, Text, Integer, BigInteger, ForeignKey, JSON
|
||||||
|
from sqlalchemy.dialects.postgresql import UUID
|
||||||
|
from sqlalchemy.orm import relationship
|
||||||
|
from app.core.database import Base
|
||||||
|
from app.models.base import UUIDMixin, TimestampMixin
|
||||||
|
|
||||||
|
|
||||||
|
class Project(Base, UUIDMixin, TimestampMixin):
|
||||||
|
"""Project model"""
|
||||||
|
__tablename__ = "projects"
|
||||||
|
|
||||||
|
name = Column(String(255), nullable=False)
|
||||||
|
description = Column(Text)
|
||||||
|
|
||||||
|
# Relationships
|
||||||
|
files = relationship("File", back_populates="project", cascade="all, delete-orphan")
|
||||||
|
chunks = relationship("Chunk", back_populates="project", cascade="all, delete-orphan")
|
||||||
|
tags = relationship("Tag", back_populates="project", cascade="all, delete-orphan")
|
||||||
|
datasets = relationship("Dataset", back_populates="project", cascade="all, delete-orphan")
|
||||||
|
eval_datasets = relationship("EvalDataset", back_populates="project", cascade="all, delete-orphan")
|
||||||
|
model_configs = relationship("ModelConfig", back_populates="project", cascade="all, delete-orphan")
|
||||||
|
tasks = relationship("Task", back_populates="project", cascade="all, delete-orphan")
|
||||||
|
|
||||||
|
|
||||||
|
class File(Base, UUIDMixin, TimestampMixin):
|
||||||
|
"""File model for uploaded documents"""
|
||||||
|
__tablename__ = "files"
|
||||||
|
|
||||||
|
project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False)
|
||||||
|
filename = Column(String(255), nullable=False)
|
||||||
|
file_type = Column(String(50), nullable=False) # pdf, docx, xlsx, csv, epub, md, txt
|
||||||
|
file_path = Column(String(500))
|
||||||
|
size = Column(BigInteger) # file size in bytes
|
||||||
|
status = Column(String(20), default="pending") # pending, processing, completed, failed
|
||||||
|
|
||||||
|
# Relationships
|
||||||
|
project = relationship("Project", back_populates="files")
|
||||||
|
chunks = relationship("Chunk", back_populates="file", cascade="all, delete-orphan")
|
||||||
|
|
||||||
|
|
||||||
|
class Chunk(Base, UUIDMixin, TimestampMixin):
|
||||||
|
"""Text chunk model after splitting"""
|
||||||
|
__tablename__ = "chunks"
|
||||||
|
|
||||||
|
project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False)
|
||||||
|
file_id = Column(UUID(as_uuid=True), ForeignKey("files.id", ondelete="CASCADE"))
|
||||||
|
name = Column(String(255))
|
||||||
|
content = Column(Text, nullable=False)
|
||||||
|
summary = Column(Text)
|
||||||
|
word_count = Column(Integer)
|
||||||
|
metadata = Column(JSON) # store additional info like headings, page numbers
|
||||||
|
|
||||||
|
# Relationships
|
||||||
|
project = relationship("Project", back_populates="chunks")
|
||||||
|
file = relationship("File", back_populates="chunks")
|
||||||
|
questions = relationship("Question", back_populates="chunk", cascade="all, delete-orphan")
|
||||||
|
chunk_tags = relationship("ChunkTag", back_populates="chunk", cascade="all, delete-orphan")
|
||||||
|
|
||||||
|
|
||||||
|
class Tag(Base, UUIDMixin, TimestampMixin):
|
||||||
|
"""Tag/Label model for categorizing content"""
|
||||||
|
__tablename__ = "tags"
|
||||||
|
|
||||||
|
project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False)
|
||||||
|
label = Column(String(255), nullable=False)
|
||||||
|
parent_id = Column(UUID(as_uuid=True), ForeignKey("tags.id", ondelete="CASCADE"))
|
||||||
|
color = Column(String(20)) # hex color code
|
||||||
|
|
||||||
|
# Relationships
|
||||||
|
project = relationship("Project", back_populates="tags")
|
||||||
|
parent = relationship("Tag", remote_side="Tag.id", back_populates="children")
|
||||||
|
children = relationship("Tag", back_populates="parent")
|
||||||
|
chunk_tags = relationship("ChunkTag", back_populates="tag")
|
||||||
|
|
||||||
|
|
||||||
|
class ChunkTag(Base, UUIDMixin):
|
||||||
|
"""Many-to-many relationship between chunks and tags"""
|
||||||
|
__tablename__ = "chunk_tags"
|
||||||
|
|
||||||
|
chunk_id = Column(UUID(as_uuid=True), ForeignKey("chunks.id", ondelete="CASCADE"), nullable=False)
|
||||||
|
tag_id = Column(UUID(as_uuid=True), ForeignKey("tags.id", ondelete="CASCADE"), nullable=False)
|
||||||
|
|
||||||
|
# Relationships
|
||||||
|
chunk = relationship("Chunk", back_populates="chunk_tags")
|
||||||
|
tag = relationship("Tag", back_populates="chunk_tags")
|
||||||
|
|
||||||
|
|
||||||
|
class Question(Base, UUIDMixin, TimestampMixin):
|
||||||
|
"""Question/QA pair model"""
|
||||||
|
__tablename__ = "questions"
|
||||||
|
|
||||||
|
project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False)
|
||||||
|
chunk_id = Column(UUID(as_uuid=True), ForeignKey("chunks.id", ondelete="CASCADE"))
|
||||||
|
content = Column(Text, nullable=False) # question content
|
||||||
|
answer = Column(Text) # answer content
|
||||||
|
question_type = Column(String(50)) # fact, summary, reasoning, etc.
|
||||||
|
source = Column(String(50), default="manual") # manual, generated
|
||||||
|
|
||||||
|
# Relationships
|
||||||
|
project = relationship("Project")
|
||||||
|
chunk = relationship("Chunk", back_populates="questions")
|
||||||
|
|
||||||
|
|
||||||
|
class Dataset(Base, UUIDMixin, TimestampMixin):
|
||||||
|
"""Dataset model"""
|
||||||
|
__tablename__ = "datasets"
|
||||||
|
|
||||||
|
project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False)
|
||||||
|
name = Column(String(255), nullable=False)
|
||||||
|
description = Column(Text)
|
||||||
|
dataset_type = Column(String(50)) # qa, conversation, instruction
|
||||||
|
metadata = Column(JSON)
|
||||||
|
|
||||||
|
# Relationships
|
||||||
|
project = relationship("Project", back_populates="datasets")
|
||||||
|
|
||||||
|
|
||||||
|
class EvalDataset(Base, UUIDMixin, TimestampMixin):
|
||||||
|
"""Evaluation dataset model"""
|
||||||
|
__tablename__ = "eval_datasets"
|
||||||
|
|
||||||
|
project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False)
|
||||||
|
name = Column(String(255), nullable=False)
|
||||||
|
question_type = Column(String(50)) # mixed, fact, reasoning
|
||||||
|
metadata = Column(JSON)
|
||||||
|
|
||||||
|
# Relationships
|
||||||
|
project = relationship("Project", back_populates="eval_datasets")
|
||||||
|
|
||||||
|
|
||||||
|
class ModelConfig(Base, UUIDMixin, TimestampMixin):
|
||||||
|
"""Model configuration for LLM providers"""
|
||||||
|
__tablename__ = "model_configs"
|
||||||
|
|
||||||
|
project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False)
|
||||||
|
provider = Column(String(50), nullable=False) # openai, anthropic, ollama, custom
|
||||||
|
model_name = Column(String(100))
|
||||||
|
api_key = Column(String(500))
|
||||||
|
api_base = Column(String(500))
|
||||||
|
is_default = Column(String(10), default="false")
|
||||||
|
|
||||||
|
# Relationships
|
||||||
|
project = relationship("Project", back_populates="model_configs")
|
||||||
|
|
||||||
|
|
||||||
|
class Task(Base, UUIDMixin, TimestampMixin):
|
||||||
|
"""Task model for background jobs"""
|
||||||
|
__tablename__ = "tasks"
|
||||||
|
|
||||||
|
project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"))
|
||||||
|
task_type = Column(String(50)) # split, generate, eval, export
|
||||||
|
status = Column(String(20), default="pending") # pending, running, completed, failed
|
||||||
|
progress = Column(Integer, default=0) # 0-100
|
||||||
|
result = Column(JSON)
|
||||||
|
error = Column(Text)
|
||||||
|
|
||||||
|
# Relationships
|
||||||
|
project = relationship("Project", back_populates="tasks")
|
||||||
3
backend/app/schemas/__init__.py
Normal file
3
backend/app/schemas/__init__.py
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
"""
|
||||||
|
Pydantic Schemas
|
||||||
|
"""
|
||||||
170
backend/app/schemas/base.py
Normal file
170
backend/app/schemas/base.py
Normal file
@@ -0,0 +1,170 @@
|
|||||||
|
"""
|
||||||
|
Base Pydantic schemas
|
||||||
|
"""
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Optional, Any
|
||||||
|
from uuid import UUID
|
||||||
|
from pydantic import BaseModel, ConfigDict
|
||||||
|
|
||||||
|
|
||||||
|
class TimestampMixin(BaseModel):
|
||||||
|
"""Mixin for timestamps"""
|
||||||
|
created_at: Optional[datetime] = None
|
||||||
|
updated_at: Optional[datetime] = None
|
||||||
|
|
||||||
|
|
||||||
|
class UUIDMixin(BaseModel):
|
||||||
|
"""Mixin for UUID"""
|
||||||
|
model_config = ConfigDict(from_attributes=True)
|
||||||
|
|
||||||
|
id: UUID
|
||||||
|
|
||||||
|
|
||||||
|
class ProjectBase(BaseModel):
|
||||||
|
"""Base project schema"""
|
||||||
|
name: str
|
||||||
|
description: Optional[str] = None
|
||||||
|
|
||||||
|
|
||||||
|
class ProjectCreate(ProjectBase):
|
||||||
|
"""Project create schema"""
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
class ProjectUpdate(ProjectBase):
|
||||||
|
"""Project update schema"""
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
class ProjectResponse(ProjectBase, UUIDMixin, TimestampMixin):
|
||||||
|
"""Project response schema"""
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
class FileBase(BaseModel):
|
||||||
|
"""Base file schema"""
|
||||||
|
filename: str
|
||||||
|
file_type: str
|
||||||
|
size: Optional[int] = None
|
||||||
|
|
||||||
|
|
||||||
|
class FileResponse(FileBase, UUIDMixin, TimestampMixin):
|
||||||
|
"""File response schema"""
|
||||||
|
status: str
|
||||||
|
|
||||||
|
|
||||||
|
class ChunkBase(BaseModel):
|
||||||
|
"""Base chunk schema"""
|
||||||
|
name: Optional[str] = None
|
||||||
|
content: str
|
||||||
|
summary: Optional[str] = None
|
||||||
|
word_count: Optional[int] = None
|
||||||
|
|
||||||
|
|
||||||
|
class ChunkCreate(ChunkBase):
|
||||||
|
"""Chunk create schema"""
|
||||||
|
file_id: Optional[UUID] = None
|
||||||
|
|
||||||
|
|
||||||
|
class ChunkResponse(ChunkBase, UUIDMixin, TimestampMixin):
|
||||||
|
"""Chunk response schema"""
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
class QuestionBase(BaseModel):
|
||||||
|
"""Base question schema"""
|
||||||
|
content: str
|
||||||
|
answer: Optional[str] = None
|
||||||
|
question_type: Optional[str] = None
|
||||||
|
|
||||||
|
|
||||||
|
class QuestionCreate(QuestionBase):
|
||||||
|
"""Question create schema"""
|
||||||
|
chunk_id: Optional[UUID] = None
|
||||||
|
|
||||||
|
|
||||||
|
class QuestionResponse(QuestionBase, UUIDMixin, TimestampMixin):
|
||||||
|
"""Question response schema"""
|
||||||
|
source: str
|
||||||
|
|
||||||
|
|
||||||
|
class DatasetBase(BaseModel):
|
||||||
|
"""Base dataset schema"""
|
||||||
|
name: str
|
||||||
|
description: Optional[str] = None
|
||||||
|
dataset_type: Optional[str] = None
|
||||||
|
|
||||||
|
|
||||||
|
class DatasetCreate(DatasetBase):
|
||||||
|
"""Dataset create schema"""
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
class DatasetResponse(DatasetBase, UUIDMixin, TimestampMixin):
|
||||||
|
"""Dataset response schema"""
|
||||||
|
question_count: Optional[int] = None
|
||||||
|
|
||||||
|
|
||||||
|
class EvalDatasetBase(BaseModel):
|
||||||
|
"""Base eval dataset schema"""
|
||||||
|
name: str
|
||||||
|
question_type: Optional[str] = None
|
||||||
|
|
||||||
|
|
||||||
|
class EvalDatasetCreate(EvalDatasetBase):
|
||||||
|
"""Eval dataset create schema"""
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
class EvalDatasetResponse(EvalDatasetBase, UUIDMixin, TimestampMixin):
|
||||||
|
"""Eval dataset response schema"""
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
class TagBase(BaseModel):
|
||||||
|
"""Base tag schema"""
|
||||||
|
label: str
|
||||||
|
parent_id: Optional[UUID] = None
|
||||||
|
color: Optional[str] = None
|
||||||
|
|
||||||
|
|
||||||
|
class TagCreate(TagBase):
|
||||||
|
"""Tag create schema"""
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
class TagResponse(TagBase, UUIDMixin, TimestampMixin):
|
||||||
|
"""Tag response schema"""
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
class ModelConfigBase(BaseModel):
|
||||||
|
"""Base model config schema"""
|
||||||
|
provider: str
|
||||||
|
model_name: Optional[str] = None
|
||||||
|
api_key: Optional[str] = None
|
||||||
|
api_base: Optional[str] = None
|
||||||
|
is_default: Optional[str] = "false"
|
||||||
|
|
||||||
|
|
||||||
|
class ModelConfigCreate(ModelConfigBase):
|
||||||
|
"""Model config create schema"""
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
class ModelConfigResponse(ModelConfigBase, UUIDMixin, TimestampMixin):
|
||||||
|
"""Model config response schema"""
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
class TaskBase(BaseModel):
|
||||||
|
"""Base task schema"""
|
||||||
|
task_type: str
|
||||||
|
status: Optional[str] = "pending"
|
||||||
|
progress: Optional[int] = 0
|
||||||
|
|
||||||
|
|
||||||
|
class TaskResponse(TaskBase, UUIDMixin, TimestampMixin):
|
||||||
|
"""Task response schema"""
|
||||||
|
result: Optional[Any] = None
|
||||||
|
error: Optional[str] = None
|
||||||
3
backend/app/services/__init__.py
Normal file
3
backend/app/services/__init__.py
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
"""
|
||||||
|
Services module
|
||||||
|
"""
|
||||||
3
backend/app/services/file_processor/__init__.py
Normal file
3
backend/app/services/file_processor/__init__.py
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
"""
|
||||||
|
File Processing Services
|
||||||
|
"""
|
||||||
53
backend/app/services/file_processor/docx_processor.py
Normal file
53
backend/app/services/file_processor/docx_processor.py
Normal file
@@ -0,0 +1,53 @@
|
|||||||
|
"""
|
||||||
|
DOCX Text Extractor
|
||||||
|
"""
|
||||||
|
from docx import Document
|
||||||
|
from typing import Dict, List
|
||||||
|
|
||||||
|
|
||||||
|
class DOCXProcessor:
|
||||||
|
"""Extract text from DOCX files"""
|
||||||
|
|
||||||
|
def extract_text(self, file_path: str) -> str:
|
||||||
|
"""Extract all text from DOCX"""
|
||||||
|
doc = Document(file_path)
|
||||||
|
text_parts = []
|
||||||
|
|
||||||
|
for para in doc.paragraphs:
|
||||||
|
if para.text.strip():
|
||||||
|
text_parts.append(para.text)
|
||||||
|
|
||||||
|
# Also extract text from tables
|
||||||
|
for table in doc.tables:
|
||||||
|
for row in table.rows:
|
||||||
|
for cell in row.cells:
|
||||||
|
if cell.text.strip():
|
||||||
|
text_parts.append(cell.text)
|
||||||
|
|
||||||
|
return "\n\n".join(text_parts)
|
||||||
|
|
||||||
|
def extract_with_metadata(self, file_path: str) -> Dict:
|
||||||
|
"""Extract text with DOCX metadata"""
|
||||||
|
doc = Document(file_path)
|
||||||
|
|
||||||
|
result = {
|
||||||
|
"text": self.extract_text(file_path),
|
||||||
|
"paragraphs": len(doc.paragraphs),
|
||||||
|
"tables": len(doc.tables),
|
||||||
|
"sections": len(doc.sections),
|
||||||
|
"metadata": {
|
||||||
|
"author": doc.core_properties.author,
|
||||||
|
"title": doc.core_properties.title,
|
||||||
|
"subject": doc.core_properties.subject,
|
||||||
|
"created": doc.core_properties.created,
|
||||||
|
"modified": doc.core_properties.modified
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return result
|
||||||
|
|
||||||
|
|
||||||
|
def process_docx(file_path: str) -> str:
|
||||||
|
"""Process DOCX file and return text"""
|
||||||
|
processor = DOCXProcessor()
|
||||||
|
return processor.extract_text(file_path)
|
||||||
66
backend/app/services/file_processor/excel_processor.py
Normal file
66
backend/app/services/file_processor/excel_processor.py
Normal file
@@ -0,0 +1,66 @@
|
|||||||
|
"""
|
||||||
|
Excel/CSV Text Extractor
|
||||||
|
"""
|
||||||
|
import pandas as pd
|
||||||
|
from typing import Dict, List
|
||||||
|
|
||||||
|
|
||||||
|
class ExcelProcessor:
|
||||||
|
"""Extract text from Excel and CSV files"""
|
||||||
|
|
||||||
|
def extract_csv(self, file_path: str) -> str:
|
||||||
|
"""Extract text from CSV file"""
|
||||||
|
df = pd.read_csv(file_path)
|
||||||
|
return self._dataframe_to_text(df)
|
||||||
|
|
||||||
|
def extract_excel(self, file_path: str, sheet_name: str = None) -> str:
|
||||||
|
"""Extract text from Excel file"""
|
||||||
|
if sheet_name:
|
||||||
|
df = pd.read_excel(file_path, sheet_name=sheet_name)
|
||||||
|
return self._dataframe_to_text(df)
|
||||||
|
else:
|
||||||
|
# Read all sheets
|
||||||
|
sheets = pd.read_excel(file_path, sheet_name=None)
|
||||||
|
text_parts = []
|
||||||
|
for sheet_name, df in sheets.items():
|
||||||
|
text_parts.append(f"=== Sheet: {sheet_name} ===\n")
|
||||||
|
text_parts.append(self._dataframe_to_text(df))
|
||||||
|
return "\n\n".join(text_parts)
|
||||||
|
|
||||||
|
def _dataframe_to_text(self, df: pd.DataFrame) -> str:
|
||||||
|
"""Convert DataFrame to readable text"""
|
||||||
|
text_parts = []
|
||||||
|
|
||||||
|
# Add column headers
|
||||||
|
if not df.empty:
|
||||||
|
text_parts.append(" | ".join(str(col) for col in df.columns))
|
||||||
|
text_parts.append("-" * len(text_parts[-1]))
|
||||||
|
|
||||||
|
# Add rows
|
||||||
|
for _, row in df.iterrows():
|
||||||
|
row_text = " | ".join(str(val) for val in row.values)
|
||||||
|
text_parts.append(row_text)
|
||||||
|
|
||||||
|
return "\n".join(text_parts)
|
||||||
|
|
||||||
|
def extract_all_sheets(self, file_path: str) -> Dict[str, str]:
|
||||||
|
"""Extract all sheets from Excel file"""
|
||||||
|
sheets = pd.read_excel(file_path, sheet_name=None)
|
||||||
|
return {name: self._dataframe_to_text(df) for name, df in sheets.items()}
|
||||||
|
|
||||||
|
def get_sheet_names(self, file_path: str) -> List[str]:
|
||||||
|
"""Get all sheet names from Excel file"""
|
||||||
|
xl = pd.ExcelFile(file_path)
|
||||||
|
return xl.sheet_names
|
||||||
|
|
||||||
|
|
||||||
|
def process_csv(file_path: str) -> str:
|
||||||
|
"""Process CSV file and return text"""
|
||||||
|
processor = ExcelProcessor()
|
||||||
|
return processor.extract_csv(file_path)
|
||||||
|
|
||||||
|
|
||||||
|
def process_excel(file_path: str) -> str:
|
||||||
|
"""Process Excel file and return text"""
|
||||||
|
processor = ExcelProcessor()
|
||||||
|
return processor.extract_excel(file_path)
|
||||||
65
backend/app/services/file_processor/pdf_processor.py
Normal file
65
backend/app/services/file_processor/pdf_processor.py
Normal file
@@ -0,0 +1,65 @@
|
|||||||
|
"""
|
||||||
|
PDF Text Extractor
|
||||||
|
"""
|
||||||
|
import pdfplumber
|
||||||
|
from typing import Dict, List, Optional
|
||||||
|
|
||||||
|
|
||||||
|
class PDFProcessor:
|
||||||
|
"""Extract text from PDF files"""
|
||||||
|
|
||||||
|
def extract_text(self, file_path: str) -> str:
|
||||||
|
"""Extract all text from PDF"""
|
||||||
|
text_parts = []
|
||||||
|
|
||||||
|
with pdfplumber.open(file_path) as pdf:
|
||||||
|
for page_num, page in enumerate(pdf.pages, 1):
|
||||||
|
text = page.extract_text()
|
||||||
|
if text:
|
||||||
|
text_parts.append(f"--- Page {page_num} ---\n{text}")
|
||||||
|
|
||||||
|
return "\n\n".join(text_parts)
|
||||||
|
|
||||||
|
def extract_pages(self, file_path: str) -> List[Dict]:
|
||||||
|
"""Extract text page by page with metadata"""
|
||||||
|
pages = []
|
||||||
|
|
||||||
|
with pdfplumber.open(file_path) as pdf:
|
||||||
|
for page_num, page in enumerate(pdf.pages, 1):
|
||||||
|
text = page.extract_text()
|
||||||
|
if text:
|
||||||
|
pages.append({
|
||||||
|
"page_number": page_num,
|
||||||
|
"text": text.strip(),
|
||||||
|
"word_count": len(text.split())
|
||||||
|
})
|
||||||
|
|
||||||
|
return pages
|
||||||
|
|
||||||
|
def extract_with_metadata(self, file_path: str) -> Dict:
|
||||||
|
"""Extract text with PDF metadata"""
|
||||||
|
result = {
|
||||||
|
"text": "",
|
||||||
|
"pages": [],
|
||||||
|
"metadata": {}
|
||||||
|
}
|
||||||
|
|
||||||
|
with pdfplumber.open(file_path) as pdf:
|
||||||
|
# Get metadata
|
||||||
|
result["metadata"] = {
|
||||||
|
"page_count": len(pdf.pages),
|
||||||
|
"metadata": pdf.metadata
|
||||||
|
}
|
||||||
|
|
||||||
|
# Extract pages
|
||||||
|
pages = self.extract_pages(file_path)
|
||||||
|
result["pages"] = pages
|
||||||
|
result["text"] = "\n\n".join([p["text"] for p in pages])
|
||||||
|
|
||||||
|
return result
|
||||||
|
|
||||||
|
|
||||||
|
def process_pdf(file_path: str) -> str:
|
||||||
|
"""Process PDF file and return text"""
|
||||||
|
processor = PDFProcessor()
|
||||||
|
return processor.extract_with_metadata(file_path)["text"]
|
||||||
3
backend/app/services/text_splitter/__init__.py
Normal file
3
backend/app/services/text_splitter/__init__.py
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
"""
|
||||||
|
Text Splitter Services
|
||||||
|
"""
|
||||||
248
backend/app/services/text_splitter/splitter.py
Normal file
248
backend/app/services/text_splitter/splitter.py
Normal file
@@ -0,0 +1,248 @@
|
|||||||
|
"""
|
||||||
|
Text Splitter
|
||||||
|
"""
|
||||||
|
import re
|
||||||
|
from typing import List, Dict, Optional
|
||||||
|
|
||||||
|
|
||||||
|
class TextSplitter:
|
||||||
|
"""Base text splitter"""
|
||||||
|
|
||||||
|
def __init__(self, chunk_size: int = 500, overlap: int = 50):
|
||||||
|
self.chunk_size = chunk_size
|
||||||
|
self.overlap = overlap
|
||||||
|
|
||||||
|
def split(self, text: str) -> List[Dict]:
|
||||||
|
"""Split text into chunks"""
|
||||||
|
raise NotImplementedError
|
||||||
|
|
||||||
|
|
||||||
|
class RecursiveTextSplitter(TextSplitter):
|
||||||
|
"""Recursive character text splitter"""
|
||||||
|
|
||||||
|
def __init__(self, chunk_size: int = 500, overlap: int = 50, separators: List[str] = None):
|
||||||
|
super().__init__(chunk_size, overlap)
|
||||||
|
self.separators = separators or ["\n\n", "\n", ". ", " ", ""]
|
||||||
|
|
||||||
|
def split(self, text: str) -> List[Dict]:
|
||||||
|
"""Split text recursively"""
|
||||||
|
chunks = []
|
||||||
|
current_chunk = ""
|
||||||
|
chunk_index = 0
|
||||||
|
|
||||||
|
for separator in self.separators:
|
||||||
|
if separator in text:
|
||||||
|
parts = text.split(separator)
|
||||||
|
for part in parts:
|
||||||
|
if len(current_chunk) + len(part) > self.chunk_size:
|
||||||
|
if current_chunk:
|
||||||
|
chunks.append({
|
||||||
|
"index": chunk_index,
|
||||||
|
"content": current_chunk.strip(),
|
||||||
|
"word_count": len(current_chunk.split())
|
||||||
|
})
|
||||||
|
chunk_index += 1
|
||||||
|
|
||||||
|
# Handle overlap
|
||||||
|
if self.overlap > 0 and chunks:
|
||||||
|
overlap_text = " ".join(chunks[-1]["content"].split()[-self.overlap:])
|
||||||
|
current_chunk = overlap_text + separator + part
|
||||||
|
else:
|
||||||
|
current_chunk = part
|
||||||
|
else:
|
||||||
|
current_chunk += separator + part if current_chunk else part
|
||||||
|
|
||||||
|
if current_chunk:
|
||||||
|
chunks.append({
|
||||||
|
"index": chunk_index,
|
||||||
|
"content": current_chunk.strip(),
|
||||||
|
"word_count": len(current_chunk.split())
|
||||||
|
})
|
||||||
|
break
|
||||||
|
else:
|
||||||
|
continue
|
||||||
|
|
||||||
|
return chunks
|
||||||
|
|
||||||
|
|
||||||
|
class MarkdownStructureSplitter(TextSplitter):
|
||||||
|
"""Split text based on Markdown structure (headings)"""
|
||||||
|
|
||||||
|
def __init__(self, chunk_size: int = 2000, overlap: int = 100):
|
||||||
|
super().__init__(chunk_size, overlap)
|
||||||
|
|
||||||
|
def split(self, text: str) -> List[Dict]:
|
||||||
|
"""Split text by Markdown headings"""
|
||||||
|
# Find all heading patterns
|
||||||
|
heading_pattern = r'^(#{1,6})\s+(.+)$'
|
||||||
|
lines = text.split('\n')
|
||||||
|
|
||||||
|
chunks = []
|
||||||
|
current_chunk = ""
|
||||||
|
current_heading = "文档开头"
|
||||||
|
chunk_index = 0
|
||||||
|
|
||||||
|
for line in lines:
|
||||||
|
heading_match = re.match(heading_pattern, line.strip())
|
||||||
|
|
||||||
|
if heading_match:
|
||||||
|
# Save previous chunk if exists
|
||||||
|
if current_chunk.strip():
|
||||||
|
chunks.append({
|
||||||
|
"index": chunk_index,
|
||||||
|
"name": current_heading,
|
||||||
|
"content": current_chunk.strip(),
|
||||||
|
"word_count": len(current_chunk.split())
|
||||||
|
})
|
||||||
|
chunk_index += 1
|
||||||
|
|
||||||
|
current_heading = heading_match.group(2).strip()
|
||||||
|
current_chunk = line + "\n"
|
||||||
|
else:
|
||||||
|
# Check chunk size
|
||||||
|
if len(current_chunk) > self.chunk_size:
|
||||||
|
chunks.append({
|
||||||
|
"index": chunk_index,
|
||||||
|
"name": current_heading,
|
||||||
|
"content": current_chunk.strip(),
|
||||||
|
"word_count": len(current_chunk.split())
|
||||||
|
})
|
||||||
|
chunk_index += 1
|
||||||
|
|
||||||
|
# Handle overlap
|
||||||
|
if self.overlap > 0:
|
||||||
|
overlap_lines = current_chunk.split('\n')[-self.overlap:]
|
||||||
|
current_chunk = '\n'.join(overlap_lines) + '\n'
|
||||||
|
else:
|
||||||
|
current_chunk = ""
|
||||||
|
|
||||||
|
current_chunk += line + "\n"
|
||||||
|
|
||||||
|
# Add last chunk
|
||||||
|
if current_chunk.strip():
|
||||||
|
chunks.append({
|
||||||
|
"index": chunk_index,
|
||||||
|
"name": current_heading,
|
||||||
|
"content": current_chunk.strip(),
|
||||||
|
"word_count": len(current_chunk.split())
|
||||||
|
})
|
||||||
|
|
||||||
|
return chunks
|
||||||
|
|
||||||
|
|
||||||
|
class TokenSplitter(TextSplitter):
|
||||||
|
"""Split text by token count"""
|
||||||
|
|
||||||
|
def __init__(self, chunk_size: int = 500, overlap: int = 50):
|
||||||
|
super().__init__(chunk_size, overlap)
|
||||||
|
|
||||||
|
def split(self, text: str) -> List[Dict]:
|
||||||
|
"""Split text by approximate token count"""
|
||||||
|
words = text.split()
|
||||||
|
chunks = []
|
||||||
|
chunk_index = 0
|
||||||
|
|
||||||
|
for i in range(0, len(words), self.chunk_size - self.overlap):
|
||||||
|
chunk_words = words[i:i + self.chunk_size]
|
||||||
|
chunk_text = " ".join(chunk_words)
|
||||||
|
|
||||||
|
chunks.append({
|
||||||
|
"index": chunk_index,
|
||||||
|
"content": chunk_text,
|
||||||
|
"word_count": len(chunk_words),
|
||||||
|
"token_estimate": len(chunk_words) * 1.3 # rough token estimate
|
||||||
|
})
|
||||||
|
chunk_index += 1
|
||||||
|
|
||||||
|
return chunks
|
||||||
|
|
||||||
|
|
||||||
|
class CodeSplitter(TextSplitter):
|
||||||
|
"""Split text with code awareness"""
|
||||||
|
|
||||||
|
def __init__(self, chunk_size: int = 500, overlap: int = 50):
|
||||||
|
super().__init__(chunk_size, overlap)
|
||||||
|
|
||||||
|
def split(self, text: str) -> List[Dict]:
|
||||||
|
"""Split text preserving code blocks"""
|
||||||
|
# Split by code blocks first
|
||||||
|
code_pattern = r'```[\s\S]*?```'
|
||||||
|
parts = re.split(code_pattern, text)
|
||||||
|
|
||||||
|
chunks = []
|
||||||
|
chunk_index = 0
|
||||||
|
current_chunk = ""
|
||||||
|
|
||||||
|
for part in parts:
|
||||||
|
if len(current_chunk) + len(part) > self.chunk_size:
|
||||||
|
if current_chunk.strip():
|
||||||
|
chunks.append({
|
||||||
|
"index": chunk_index,
|
||||||
|
"content": current_chunk.strip(),
|
||||||
|
"word_count": len(current_chunk.split())
|
||||||
|
})
|
||||||
|
chunk_index += 1
|
||||||
|
current_chunk = part
|
||||||
|
else:
|
||||||
|
current_chunk += part
|
||||||
|
|
||||||
|
if current_chunk.strip():
|
||||||
|
chunks.append({
|
||||||
|
"index": chunk_index,
|
||||||
|
"content": current_chunk.strip(),
|
||||||
|
"word_count": len(current_chunk.split())
|
||||||
|
})
|
||||||
|
|
||||||
|
return chunks
|
||||||
|
|
||||||
|
|
||||||
|
class CustomSplitter(TextSplitter):
|
||||||
|
"""Custom separator splitter"""
|
||||||
|
|
||||||
|
def __init__(self, separator: str = "\n\n", chunk_size: int = 500):
|
||||||
|
super().__init__(chunk_size, 0)
|
||||||
|
self.separator = separator
|
||||||
|
|
||||||
|
def split(self, text: str) -> List[Dict]:
|
||||||
|
"""Split by custom separator"""
|
||||||
|
parts = text.split(self.separator)
|
||||||
|
chunks = []
|
||||||
|
|
||||||
|
current_chunk = ""
|
||||||
|
chunk_index = 0
|
||||||
|
|
||||||
|
for part in parts:
|
||||||
|
if len(current_chunk) + len(part) > self.chunk_size:
|
||||||
|
if current_chunk.strip():
|
||||||
|
chunks.append({
|
||||||
|
"index": chunk_index,
|
||||||
|
"content": current_chunk.strip(),
|
||||||
|
"word_count": len(current_chunk.split())
|
||||||
|
})
|
||||||
|
chunk_index += 1
|
||||||
|
current_chunk = part
|
||||||
|
else:
|
||||||
|
current_chunk += self.separator + part if current_chunk else part
|
||||||
|
|
||||||
|
if current_chunk.strip():
|
||||||
|
chunks.append({
|
||||||
|
"index": chunk_index,
|
||||||
|
"content": current_chunk.strip(),
|
||||||
|
"word_count": len(current_chunk.split())
|
||||||
|
})
|
||||||
|
|
||||||
|
return chunks
|
||||||
|
|
||||||
|
|
||||||
|
def get_splitter(method: str, **kwargs) -> TextSplitter:
|
||||||
|
"""Get text splitter by method name"""
|
||||||
|
splitters = {
|
||||||
|
"recursive": RecursiveTextSplitter,
|
||||||
|
"markdown_structure": MarkdownStructureSplitter,
|
||||||
|
"token": TokenSplitter,
|
||||||
|
"code": CodeSplitter,
|
||||||
|
"custom": CustomSplitter
|
||||||
|
}
|
||||||
|
|
||||||
|
splitter_class = splitters.get(method, RecursiveTextSplitter)
|
||||||
|
return splitter_class(**kwargs)
|
||||||
37
backend/requirements.txt
Normal file
37
backend/requirements.txt
Normal file
@@ -0,0 +1,37 @@
|
|||||||
|
# FastAPI
|
||||||
|
fastapi>=0.115.0
|
||||||
|
uvicorn[standard]>=0.30.0
|
||||||
|
python-multipart>=0.0.9
|
||||||
|
|
||||||
|
# Database - SQLite (默认), PostgreSQL 可选
|
||||||
|
sqlalchemy>=2.0.0
|
||||||
|
alembic>=1.13.0
|
||||||
|
# asyncpg>=0.29.0 # PostgreSQL 异步驱动(生产环境使用)
|
||||||
|
# psycopg2-binary>=2.9.9 # PostgreSQL 同步驱动
|
||||||
|
|
||||||
|
# Pydantic
|
||||||
|
pydantic>=2.0.0
|
||||||
|
pydantic-settings>=2.0.0
|
||||||
|
|
||||||
|
# Redis - 可选,用于缓存/队列(开发环境可省略)
|
||||||
|
# redis>=5.0.0
|
||||||
|
|
||||||
|
# File Processing
|
||||||
|
pdfplumber>=0.10.4
|
||||||
|
python-docx>=1.1.0
|
||||||
|
openpyxl>=3.1.2
|
||||||
|
pandas>=2.2.0
|
||||||
|
ebooklib>=0.5
|
||||||
|
PyMuPDF>=1.24.0
|
||||||
|
|
||||||
|
# LLM & Text
|
||||||
|
langchain>=0.3.0
|
||||||
|
langchain-community>=0.2.0
|
||||||
|
langchain-openai>=0.1.0
|
||||||
|
tiktoken>=0.7.0
|
||||||
|
python-dotenv>=1.0.0
|
||||||
|
|
||||||
|
# Utils
|
||||||
|
python-dateutil>=2.8.2
|
||||||
|
httpx>=0.27.0
|
||||||
|
aiofiles>=23.2.1
|
||||||
20
bug修改.md
Normal file
20
bug修改.md
Normal file
@@ -0,0 +1,20 @@
|
|||||||
|
# Bug 修改记录
|
||||||
|
|
||||||
|
## 2026-03-17
|
||||||
|
|
||||||
|
### 初始项目创建
|
||||||
|
- 创建 YG-Dataset 重构项目
|
||||||
|
- 搭建 FastAPI + Vue 3 基础架构
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 修复记录格式
|
||||||
|
|
||||||
|
### 日期
|
||||||
|
**问题描述:**
|
||||||
|
**原因:**
|
||||||
|
**修复方案:**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*持续更新中...*
|
||||||
52
docker-compose.yml
Normal file
52
docker-compose.yml
Normal file
@@ -0,0 +1,52 @@
|
|||||||
|
version: '3.8'
|
||||||
|
|
||||||
|
services:
|
||||||
|
# FastAPI 后端 (SQLite 数据库,随项目文件存储)
|
||||||
|
backend:
|
||||||
|
build:
|
||||||
|
context: ./backend
|
||||||
|
dockerfile: Dockerfile
|
||||||
|
container_name: ygdataset-backend
|
||||||
|
ports:
|
||||||
|
- "8000:8000"
|
||||||
|
environment:
|
||||||
|
- DATABASE_URL=sqlite:///./ygdataset.db
|
||||||
|
- DEBUG=true
|
||||||
|
volumes:
|
||||||
|
- ./backend:/app
|
||||||
|
- uploads:/app/uploads
|
||||||
|
restart: unless-stopped
|
||||||
|
|
||||||
|
# Vue 前端
|
||||||
|
frontend:
|
||||||
|
build:
|
||||||
|
context: ./frontend
|
||||||
|
dockerfile: Dockerfile
|
||||||
|
container_name: ygdataset-frontend
|
||||||
|
ports:
|
||||||
|
- "3000:80"
|
||||||
|
volumes:
|
||||||
|
- ./frontend:/app
|
||||||
|
- /app/node_modules
|
||||||
|
depends_on:
|
||||||
|
- backend
|
||||||
|
restart: unless-stopped
|
||||||
|
|
||||||
|
volumes:
|
||||||
|
uploads:
|
||||||
|
|
||||||
|
# 如需 PostgreSQL,取消注释以下配置:
|
||||||
|
# services:
|
||||||
|
# postgres:
|
||||||
|
# image: postgres:15
|
||||||
|
# environment:
|
||||||
|
# POSTGRES_USER: ygdataset
|
||||||
|
# POSTGRES_PASSWORD: your_password
|
||||||
|
# POSTGRES_DB: ygdataset
|
||||||
|
# ports:
|
||||||
|
# - "5432:5432"
|
||||||
|
# volumes:
|
||||||
|
# - postgres_data:/var/lib/postgresql/data
|
||||||
|
|
||||||
|
# volumes:
|
||||||
|
# postgres_data:
|
||||||
306
easy-dataset-main-架构分析报告.md
Normal file
306
easy-dataset-main-架构分析报告.md
Normal file
@@ -0,0 +1,306 @@
|
|||||||
|
# Easy Dataset 项目架构分析报告
|
||||||
|
|
||||||
|
## 一、项目概述
|
||||||
|
|
||||||
|
**Easy Dataset** 是一个功能强大的大模型微调数据集创建工具,由 ConardLi 开发维护。该应用提供直观的界面和强大的内置文档解析、智能分割、数据清洗和增强功能,可将各种格式的领域文档转换为高质量的结构化数据集,适用于模型微调、RAG(检索增强生成)和模型性能评估等场景。
|
||||||
|
|
||||||
|
**项目地址**: https://github.com/ConardLi/easy-dataset
|
||||||
|
**当前版本**: 1.7.2
|
||||||
|
**许可证**: AGPL 3.0
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 二、技术栈分析
|
||||||
|
|
||||||
|
### 2.1 核心框架
|
||||||
|
|
||||||
|
| 类别 | 技术选型 | 说明 |
|
||||||
|
|------|----------|------|
|
||||||
|
| 前端框架 | Next.js 14 | App Router 架构 |
|
||||||
|
| UI 框架 | Material-UI (MUI) | v5.16.14 |
|
||||||
|
| 状态管理 | Jotai | 轻量级原子化状态管理 |
|
||||||
|
| 数据库 | Prisma + SQLite | 使用 Prisma ORM |
|
||||||
|
| 开发语言 | JavaScript | 全栈 JavaScript |
|
||||||
|
|
||||||
|
### 2.2 关键依赖
|
||||||
|
|
||||||
|
| 类别 | 库名称 | 用途 |
|
||||||
|
|------|--------|------|
|
||||||
|
| AI/ML | ai SDK, langchain | 大模型集成 |
|
||||||
|
| LLM 提供商 | @ai-sdk/openai, ollama-ai-provider, zhipu-ai-provider | 多模型支持 |
|
||||||
|
| 国际化 | i18next, react-i18next | 多语言支持 |
|
||||||
|
| 文档处理 | @opendocsg/pdf2md, mammoth, pdf2md-js | PDF/DOCX 解析 |
|
||||||
|
| 桌面应用 | Electron | 跨平台桌面客户端 |
|
||||||
|
| 数据处理 | xlsx, adm-zip, jszip | 文件处理 |
|
||||||
|
|
||||||
|
### 2.3 开发工具
|
||||||
|
|
||||||
|
- **包管理器**: pnpm
|
||||||
|
- **代码规范**: ESLint + Prettier
|
||||||
|
- **Git Hooks**: Husky + lint-staged
|
||||||
|
- **构建工具**: electron-builder (桌面应用打包)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 三、目录结构
|
||||||
|
|
||||||
|
```
|
||||||
|
easy-dataset-main/
|
||||||
|
├── app/ # Next.js 应用目录 (App Router)
|
||||||
|
│ ├── api/ # API 路由 (150+ 个路由)
|
||||||
|
│ │ ├── check-update/ # 版本检查
|
||||||
|
│ │ ├── llm/ # LLM 模型相关 API
|
||||||
|
│ │ │ ├── fetch-models/ # 获取模型列表
|
||||||
|
│ │ │ ├── model/ # 模型配置
|
||||||
|
│ │ │ ├── ollama/ # Ollama 本地模型
|
||||||
|
│ │ │ └── providers/ # LLM 提供商
|
||||||
|
│ │ ├── monitoring/ # 监控 API
|
||||||
|
│ │ │ ├── logs/ # 日志
|
||||||
|
│ │ │ ├── stats/ # 统计
|
||||||
|
│ │ │ └── summary/ # 摘要
|
||||||
|
│ │ └── projects/ # 项目相关 API
|
||||||
|
│ │ └── [projectId]/ # 动态项目路由
|
||||||
|
│ │ ├── chunks/ # 文本分块
|
||||||
|
│ │ ├── datasets/ # 数据集
|
||||||
|
│ │ ├── eval-datasets/ # 评估数据集
|
||||||
|
│ │ ├── eval-tasks/ # 评估任务
|
||||||
|
│ │ ├── files/ # 文件管理
|
||||||
|
│ │ ├── images/ # 图片处理
|
||||||
|
│ │ ├── questions/ # 问题生成
|
||||||
|
│ │ ├── distill/ # 数据蒸馏
|
||||||
|
│ │ ├── blind-test-tasks/ # 盲测任务
|
||||||
|
│ │ ├── playground/ # 模型测试场
|
||||||
|
│ │ └── ...
|
||||||
|
│ └── (页面路由)
|
||||||
|
├── components/ # React 组件 (100+ 组件)
|
||||||
|
│ ├── common/ # 通用组件
|
||||||
|
│ ├── home/ # 首页组件
|
||||||
|
│ ├── Navbar/ # 导航栏
|
||||||
|
│ ├── dataset-square/ # 数据集广场
|
||||||
|
│ ├── datasets/ # 数据集组件
|
||||||
|
│ ├── distill/ # 数据蒸馏组件
|
||||||
|
│ ├── export/ # 导出组件
|
||||||
|
│ ├── questions/ # 问题组件
|
||||||
|
│ ├── text-split/ # 文本分割组件
|
||||||
|
│ ├── tasks/ # 任务管理组件
|
||||||
|
│ ├── playground/ # 测试场组件
|
||||||
|
│ └── settings/ # 设置组件
|
||||||
|
├── prisma/ # 数据库 schema
|
||||||
|
│ ├── schema.prisma # Prisma 数据模型
|
||||||
|
│ ├── sql.json # SQL 模板
|
||||||
|
│ └── generate-template.js # 模板生成
|
||||||
|
├── locales/ # 国际化资源
|
||||||
|
│ ├── en/ # 英文
|
||||||
|
│ ├── zh-CN/ # 简体中文
|
||||||
|
│ └── pt-BR/ # 葡萄牙语
|
||||||
|
├── electron/ # Electron 桌面应用
|
||||||
|
│ ├── main.js # 主进程
|
||||||
|
│ └── preload.js # 预加载脚本
|
||||||
|
├── public/ # 静态资源
|
||||||
|
├── desktop/ # 桌面端入口
|
||||||
|
└── package.json # 项目配置
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 四、核心模块设计
|
||||||
|
|
||||||
|
### 4.1 数据模型 (Prisma Schema)
|
||||||
|
|
||||||
|
项目使用 Prisma ORM 管理数据,主要数据模型包括:
|
||||||
|
|
||||||
|
- **Project**: 项目
|
||||||
|
- **File**: 上传的文件
|
||||||
|
- **Chunk**: 文本分块
|
||||||
|
- **Question**: 生成的问题
|
||||||
|
- **Dataset**: 微调数据集
|
||||||
|
- **EvalDataset**: 评估数据集
|
||||||
|
- **EvalTask**: 评估任务
|
||||||
|
- **BlindTestTask**: 盲测任务
|
||||||
|
- **ModelConfig**: 模型配置
|
||||||
|
- **Tag**: 标签
|
||||||
|
- **Conversation**: 对话记录
|
||||||
|
- **Image**: 图片数据
|
||||||
|
- **Task**: 后台任务
|
||||||
|
|
||||||
|
### 4.2 核心功能模块
|
||||||
|
|
||||||
|
#### 4.2.1 文档处理模块 (Text Split)
|
||||||
|
|
||||||
|
- 支持 PDF、Markdown、DOCX、TXT、EPUB 格式
|
||||||
|
- 多种分割算法:Markdown结构、递归分隔符、固定长度、代码感知分块
|
||||||
|
- 目录结构提取
|
||||||
|
- PDF 转 Markdown
|
||||||
|
|
||||||
|
#### 4.2.2 问题生成模块 (Question Generation)
|
||||||
|
|
||||||
|
- 自动从文本片段提取相关问题
|
||||||
|
- 问题模板管理
|
||||||
|
- 批量生成
|
||||||
|
- 标签树自动构建
|
||||||
|
|
||||||
|
#### 4.2.3 数据集生成模块 (Dataset Generation)
|
||||||
|
|
||||||
|
- 单轮问答数据集
|
||||||
|
- 多轮对话数据集
|
||||||
|
- 图片问答数据集
|
||||||
|
- 数据蒸馏(无需上传文档)
|
||||||
|
|
||||||
|
#### 4.2.4 评估模块 (Evaluation)
|
||||||
|
|
||||||
|
- 评估数据集生成(判断题、单选、多选、简答、开放题)
|
||||||
|
- 自动化模型评估(Judge Model)
|
||||||
|
- 人类盲测系统(Arena)
|
||||||
|
- AI 质量评估
|
||||||
|
|
||||||
|
#### 4.2.5 LLM 集成模块
|
||||||
|
|
||||||
|
支持的模型提供商:
|
||||||
|
- OpenAI
|
||||||
|
- Ollama (本地模型)
|
||||||
|
- 智谱 AI
|
||||||
|
- 阿里百炼
|
||||||
|
- OpenRouter
|
||||||
|
- Google Gemini
|
||||||
|
- Anthropic Claude
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 五、API 架构
|
||||||
|
|
||||||
|
### 5.1 API 设计原则
|
||||||
|
|
||||||
|
- RESTful 风格路由
|
||||||
|
- 基于 Next.js App Router 的 Route Handlers
|
||||||
|
- 使用 Zod 进行请求/响应验证
|
||||||
|
|
||||||
|
### 5.2 主要 API 分组
|
||||||
|
|
||||||
|
| API 分组 | 路由前缀 | 功能 |
|
||||||
|
|----------|----------|------|
|
||||||
|
| 项目管理 | `/api/projects` | 项目 CRUD |
|
||||||
|
| 文件管理 | `/api/projects/[id]/files` | 文件上传/处理 |
|
||||||
|
| 文本分块 | `/api/projects/[id]/chunks` | 文本分割 |
|
||||||
|
| 问题生成 | `/api/projects/[id]/questions` | 问题生成/管理 |
|
||||||
|
| 数据集 | `/api/projects/[id]/datasets` | 数据集管理 |
|
||||||
|
| 评估 | `/api/projects/[id]/eval-*` | 评估相关 |
|
||||||
|
| 盲测 | `/api/projects/[id]/blind-test-tasks` | 盲测系统 |
|
||||||
|
| LLM | `/api/llm/*` | 模型配置/调用 |
|
||||||
|
| 监控 | `/api/monitoring/*` | 日志/统计 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 六、前端架构
|
||||||
|
|
||||||
|
### 6.1 组件设计模式
|
||||||
|
|
||||||
|
- **Jotai 状态管理**: 使用原子化状态管理,便于细粒度更新
|
||||||
|
- **MUI 组件库**: 统一的 UI 组件
|
||||||
|
- **Framer Motion**: 动画效果
|
||||||
|
|
||||||
|
### 6.2 主要页面
|
||||||
|
|
||||||
|
1. **首页** (`/`): 项目列表、创建项目、统计卡片
|
||||||
|
2. **项目页** (`/projects/[id]`):
|
||||||
|
- 文本分割 (`/text-split`)
|
||||||
|
- 问题列表 (`/questions`)
|
||||||
|
- 数据集 (`/datasets`)
|
||||||
|
- 评估 (`/eval-datasets`)
|
||||||
|
- 盲测 Arena (`/arena`)
|
||||||
|
- 设置 (`/settings`)
|
||||||
|
3. **模型测试场** (`/playground`)
|
||||||
|
4. **数据集广场** (`/datasets-square`)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 七、部署架构
|
||||||
|
|
||||||
|
### 7.1 多平台支持
|
||||||
|
|
||||||
|
- **Web 应用**: Next.js 生产构建
|
||||||
|
- **桌面应用**: Electron
|
||||||
|
- Windows (NSIS 安装包)
|
||||||
|
- macOS (DMG)
|
||||||
|
- Linux (AppImage)
|
||||||
|
- **Docker**: 支持 Docker 部署
|
||||||
|
|
||||||
|
### 7.2 开发命令
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 开发
|
||||||
|
pnpm dev # 启动开发服务器 (端口 1717)
|
||||||
|
|
||||||
|
# 构建
|
||||||
|
pnpm build # 构建 Next.js 生产版本
|
||||||
|
pnpm electron-build # 构建桌面应用
|
||||||
|
|
||||||
|
# 数据库
|
||||||
|
pnpm db:push # 推送 schema 到数据库
|
||||||
|
pnpm db:studio # 打开 Prisma Studio
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 八、数据流设计
|
||||||
|
|
||||||
|
### 8.1 核心业务流程
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
|
||||||
|
│ 上传文档 │ -> │ 文本分割 │ -> │ 问题生成 │ -> │ 数据集生成 │
|
||||||
|
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
|
||||||
|
│ │ │ │
|
||||||
|
PDF/DOCX Chunk Question Dataset
|
||||||
|
Markdown 目录结构 标签树 导出格式
|
||||||
|
```
|
||||||
|
|
||||||
|
### 8.2 评估流程
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
|
||||||
|
│ 评估数据集 │ -> │ 评估任务 │ -> │ 模型评估 │ -> │ 结果分析 │
|
||||||
|
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
|
||||||
|
生成题目 批量处理 Judge Model Arena盲测
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 九、国际化
|
||||||
|
|
||||||
|
- **技术选型**: i18next + react-i18next
|
||||||
|
- **支持语言**:
|
||||||
|
- 英文 (en)
|
||||||
|
- 简体中文 (zh-CN)
|
||||||
|
- 土耳其语 (tr)
|
||||||
|
- 葡萄牙语 (pt-BR)
|
||||||
|
- **语言检测**: i18next-browser-languagedetector
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 十、特性亮点
|
||||||
|
|
||||||
|
1. **智能文档处理**: 支持多种格式,智能识别
|
||||||
|
2. **多种分割算法**: 灵活适应不同文档结构
|
||||||
|
3. **自动标签树**: 基于文档结构智能构建
|
||||||
|
4. **多类型数据集**: 单轮问答、多轮对话、图片问答
|
||||||
|
5. **完整评估体系**: 自动化评估 + 人类盲测
|
||||||
|
6. **多模型支持**: 兼容 OpenAI 格式的所有 API
|
||||||
|
7. **一键导出**: 支持多种格式和 LLaMA Factory 集成
|
||||||
|
8. **桌面客户端**: 跨平台支持
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 十一、扩展方向
|
||||||
|
|
||||||
|
根据项目发展路线,未来可能扩展的方向包括:
|
||||||
|
|
||||||
|
1. 更多文件格式支持
|
||||||
|
2. 数据集版本管理
|
||||||
|
3. 团队协作功能
|
||||||
|
4. 更多导出格式
|
||||||
|
5. 更强大的数据分析功能
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*报告生成时间: 2026-03-17*
|
||||||
|
*基于 easy-dataset-main 项目源码分析*
|
||||||
16
easy-dataset-main/.dockerignore
Normal file
16
easy-dataset-main/.dockerignore
Normal file
@@ -0,0 +1,16 @@
|
|||||||
|
node_modules
|
||||||
|
.next
|
||||||
|
.git
|
||||||
|
.github
|
||||||
|
README.md
|
||||||
|
README.zh-CN.md
|
||||||
|
.gitignore
|
||||||
|
.env.local
|
||||||
|
.env.development.local
|
||||||
|
.env.test.local
|
||||||
|
.env.production.local
|
||||||
|
/test
|
||||||
|
/local-db
|
||||||
|
/video
|
||||||
|
/prisma/*.sqlite
|
||||||
|
/prisma/*.sqlite-*
|
||||||
6
easy-dataset-main/.gitattributes
vendored
Normal file
6
easy-dataset-main/.gitattributes
vendored
Normal file
@@ -0,0 +1,6 @@
|
|||||||
|
# Ensure shell scripts always use LF line endings
|
||||||
|
*.sh text eol=lf
|
||||||
|
docker-entrypoint.sh text eol=lf
|
||||||
|
|
||||||
|
# Ensure Dockerfile uses LF
|
||||||
|
Dockerfile text eol=lf
|
||||||
40
easy-dataset-main/.github/ISSUE_TEMPLATE/bug_report.md
vendored
Normal file
40
easy-dataset-main/.github/ISSUE_TEMPLATE/bug_report.md
vendored
Normal file
@@ -0,0 +1,40 @@
|
|||||||
|
---
|
||||||
|
name: Bug report
|
||||||
|
about: Create a report to help us improve
|
||||||
|
title: '[Bug]'
|
||||||
|
labels: bug
|
||||||
|
assignees: ''
|
||||||
|
---
|
||||||
|
|
||||||
|
**注意:请务必按照此模版填写 ISSUES 信息,否则 ISSUE 将不会得到回复**
|
||||||
|
|
||||||
|
**问题描述**
|
||||||
|
清晰、简洁地描述该问题的具体情况。
|
||||||
|
|
||||||
|
**桌面设备(请完善以下信息)**
|
||||||
|
|
||||||
|
- 操作系统:[例如:、Window、MAC]
|
||||||
|
- 浏览器:[例如:谷歌浏览器(Chrome),苹果浏览器(Safari)]
|
||||||
|
- Easy Dataset 版本:[例如:1.2.2]
|
||||||
|
|
||||||
|
**使用模型**
|
||||||
|
|
||||||
|
- 模型提供商:例如火山引擎
|
||||||
|
- 模型名称:例如 DeepSeek R1
|
||||||
|
|
||||||
|
**复现步骤**
|
||||||
|
重现该问题的操作步骤:
|
||||||
|
|
||||||
|
1. 进入“……”页面。
|
||||||
|
2. 点击“……”。
|
||||||
|
3. 向下滚动到“……”。
|
||||||
|
4. 这时会看到错误提示。
|
||||||
|
|
||||||
|
**预期结果**
|
||||||
|
清晰、简洁地描述你原本期望出现的情况。
|
||||||
|
|
||||||
|
**截图**
|
||||||
|
如果有必要,请附上截图,以便更好地说明你的问题。
|
||||||
|
|
||||||
|
**其他相关信息**
|
||||||
|
在此处添加关于该问题的其他任何相关背景信息。
|
||||||
19
easy-dataset-main/.github/ISSUE_TEMPLATE/feature-or-enhancement-.md
vendored
Normal file
19
easy-dataset-main/.github/ISSUE_TEMPLATE/feature-or-enhancement-.md
vendored
Normal file
@@ -0,0 +1,19 @@
|
|||||||
|
---
|
||||||
|
name: 'Feature or enhancement '
|
||||||
|
about: Suggest an idea for this project
|
||||||
|
title: '[Feature]'
|
||||||
|
labels: enhancement
|
||||||
|
assignees: ''
|
||||||
|
---
|
||||||
|
|
||||||
|
**你的功能请求是否与某个问题相关?请描述。**
|
||||||
|
清晰、简洁地描述一下存在的问题是什么。例如:当我[具体情况]时,我总是感到很沮丧。
|
||||||
|
|
||||||
|
**描述你期望的解决方案**
|
||||||
|
清晰、简洁地描述你希望实现的情况。
|
||||||
|
|
||||||
|
**描述你考虑过的替代方案**
|
||||||
|
清晰、简洁地描述你所考虑过的任何其他解决方案或功能。
|
||||||
|
|
||||||
|
**其他相关信息**
|
||||||
|
在此处添加与该功能请求相关的其他任何背景信息或截图。
|
||||||
40
easy-dataset-main/.github/ISSUE_TEMPLATE/question.md
vendored
Normal file
40
easy-dataset-main/.github/ISSUE_TEMPLATE/question.md
vendored
Normal file
@@ -0,0 +1,40 @@
|
|||||||
|
---
|
||||||
|
name: Question
|
||||||
|
about: Ask questions you want to know
|
||||||
|
title: '[Question]'
|
||||||
|
labels: question
|
||||||
|
assignees: ''
|
||||||
|
---
|
||||||
|
|
||||||
|
**注意:请务必按照此模版填写 ISSUES 信息,否则 ISSUE 将不会得到回复**
|
||||||
|
|
||||||
|
**问题描述**
|
||||||
|
清晰、简洁地描述该问题的具体情况。
|
||||||
|
|
||||||
|
**桌面设备(请完善以下信息)**
|
||||||
|
|
||||||
|
- 操作系统:[例如:、Window、MAC]
|
||||||
|
- 浏览器:[例如:谷歌浏览器(Chrome),苹果浏览器(Safari)]
|
||||||
|
- Easy Dataset 版本:[例如:1.2.2]
|
||||||
|
|
||||||
|
**使用模型**
|
||||||
|
|
||||||
|
- 模型提供商:例如火山引擎
|
||||||
|
- 模型名称:例如 DeepSeek R1
|
||||||
|
|
||||||
|
**复现步骤**
|
||||||
|
重现该问题的操作步骤:
|
||||||
|
|
||||||
|
1. 进入“……”页面。
|
||||||
|
2. 点击“……”。
|
||||||
|
3. 向下滚动到“……”。
|
||||||
|
4. 这时会看到错误提示。
|
||||||
|
|
||||||
|
**预期结果**
|
||||||
|
清晰、简洁地描述你原本期望出现的情况。
|
||||||
|
|
||||||
|
**截图**
|
||||||
|
如果有必要,请附上截图,以便更好地说明你的问题。
|
||||||
|
|
||||||
|
**其他相关信息**
|
||||||
|
在此处添加关于该问题的其他任何相关背景信息。
|
||||||
12
easy-dataset-main/.github/PULL_REQUEST_TEMPLATE.md
vendored
Normal file
12
easy-dataset-main/.github/PULL_REQUEST_TEMPLATE.md
vendored
Normal file
@@ -0,0 +1,12 @@
|
|||||||
|
### 变更类型- [ ] 新功能(feat)
|
||||||
|
|
||||||
|
- [ ] 修复(fix)
|
||||||
|
- [ ] 文档(docs)
|
||||||
|
- [ ] 重构(refactor)
|
||||||
|
|
||||||
|
### 变更描述- 简要说明修改内容(关联Issue:#123)
|
||||||
|
|
||||||
|
### 文档更新- [ ] README.md
|
||||||
|
|
||||||
|
- [ ] 贡献指南
|
||||||
|
- [ ] 接口文档(如有)
|
||||||
48
easy-dataset-main/.github/workflows/docker-build.yml
vendored
Normal file
48
easy-dataset-main/.github/workflows/docker-build.yml
vendored
Normal file
@@ -0,0 +1,48 @@
|
|||||||
|
name: Build and Push Docker image on Tag
|
||||||
|
|
||||||
|
on:
|
||||||
|
push:
|
||||||
|
tags:
|
||||||
|
- '*'
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
docker-image-release:
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
|
||||||
|
permissions:
|
||||||
|
contents: read
|
||||||
|
packages: write
|
||||||
|
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Set up Docker Buildx
|
||||||
|
uses: docker/setup-buildx-action@v3
|
||||||
|
|
||||||
|
- name: Log in to GitHub Container Registry
|
||||||
|
uses: docker/login-action@v3
|
||||||
|
with:
|
||||||
|
registry: ghcr.io
|
||||||
|
username: ${{ github.actor }}
|
||||||
|
password: ${{ secrets.GITHUB_TOKEN }}
|
||||||
|
|
||||||
|
- name: Extract metadata for Docker
|
||||||
|
id: meta
|
||||||
|
uses: docker/metadata-action@v5
|
||||||
|
with:
|
||||||
|
images: ghcr.io/${{ github.repository_owner }}/easy-dataset
|
||||||
|
tags: |
|
||||||
|
type=ref,event=tag
|
||||||
|
type=raw,value=latest,enable={{is_default_branch}}
|
||||||
|
|
||||||
|
- name: Build and push Docker image
|
||||||
|
uses: docker/build-push-action@v5
|
||||||
|
with:
|
||||||
|
context: .
|
||||||
|
push: true
|
||||||
|
platforms: linux/amd64,linux/arm64
|
||||||
|
tags: ${{ steps.meta.outputs.tags }}
|
||||||
|
labels: ${{ steps.meta.outputs.labels }}
|
||||||
|
cache-from: type=gha
|
||||||
|
cache-to: type=gha,mode=max
|
||||||
22
easy-dataset-main/.gitignore
vendored
Normal file
22
easy-dataset-main/.gitignore
vendored
Normal file
@@ -0,0 +1,22 @@
|
|||||||
|
node_modules
|
||||||
|
build
|
||||||
|
.vscode
|
||||||
|
website-local.json
|
||||||
|
ai-local.json
|
||||||
|
.next
|
||||||
|
.DS_Store
|
||||||
|
tsconfig.tsbuildinfo
|
||||||
|
mock-login-callback.ts
|
||||||
|
.env.local
|
||||||
|
/src/test/crawler
|
||||||
|
/src/test/mock
|
||||||
|
/test
|
||||||
|
/dist
|
||||||
|
/prisma/*.sqlite
|
||||||
|
.idea
|
||||||
|
!local-db/empty.txt
|
||||||
|
/local-db
|
||||||
|
prisma/local-db/db.sqlite
|
||||||
|
/local-db2
|
||||||
|
.trae
|
||||||
|
opencode.json
|
||||||
3
easy-dataset-main/.husky/commit-msg
Normal file
3
easy-dataset-main/.husky/commit-msg
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
#!/usr/bin/env sh
|
||||||
|
|
||||||
|
npx commitlint --edit "$1"
|
||||||
1
easy-dataset-main/.husky/pre-commit
Normal file
1
easy-dataset-main/.husky/pre-commit
Normal file
@@ -0,0 +1 @@
|
|||||||
|
npx lint-staged
|
||||||
3
easy-dataset-main/.npmrc
Normal file
3
easy-dataset-main/.npmrc
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
# 国内用户可使用淘宝源加速 (Chinese users can use Taobao registry for faster downloads)
|
||||||
|
# registry=https://registry.npmmirror.com
|
||||||
|
registry=https://registry.npmjs.org
|
||||||
13
easy-dataset-main/.prettierrc.js
Normal file
13
easy-dataset-main/.prettierrc.js
Normal file
@@ -0,0 +1,13 @@
|
|||||||
|
module.exports = {
|
||||||
|
semi: true,
|
||||||
|
trailingComma: 'none',
|
||||||
|
singleQuote: true,
|
||||||
|
tabWidth: 2,
|
||||||
|
useTabs: false,
|
||||||
|
bracketSpacing: true,
|
||||||
|
arrowParens: 'avoid',
|
||||||
|
proseWrap: 'preserve',
|
||||||
|
jsxBracketSameLine: true,
|
||||||
|
printWidth: 120,
|
||||||
|
endOfLine: 'auto'
|
||||||
|
};
|
||||||
124
easy-dataset-main/.windsurfrules
Normal file
124
easy-dataset-main/.windsurfrules
Normal file
@@ -0,0 +1,124 @@
|
|||||||
|
# Easy DataSet 项目架构设计
|
||||||
|
|
||||||
|
## 项目概述
|
||||||
|
|
||||||
|
Easy DataSet 是一个用于创建大模型微调数据集的应用程序。用户可以上传文本文件,系统会自动分割文本并生成问题,最终生成用于微调的数据集。
|
||||||
|
|
||||||
|
## 技术栈
|
||||||
|
|
||||||
|
- **前端框架**: Next.js 14 (App Router)
|
||||||
|
- **UI 框架**: Material-UI (MUI)
|
||||||
|
- **数据存储**: fs 文件系统模拟数据库
|
||||||
|
- **开发语言**: JavaScript
|
||||||
|
- **依赖管理**: pnpm
|
||||||
|
|
||||||
|
## 目录结构
|
||||||
|
|
||||||
|
```
|
||||||
|
easy-dataset/
|
||||||
|
├── app/ # Next.js 应用目录
|
||||||
|
│ ├── api/ # API 路由
|
||||||
|
│ │ └── projects/ # 项目相关 API
|
||||||
|
│ ├── projects/ # 项目相关页面
|
||||||
|
│ │ ├── [projectId]/ # 项目详情页面
|
||||||
|
│ └── page.js # 主页
|
||||||
|
├── components/ # React 组件
|
||||||
|
│ ├── home/ # 主页相关组件
|
||||||
|
│ │ ├── HeroSection.js
|
||||||
|
│ │ ├── ProjectList.js
|
||||||
|
│ │ └── StatsCard.js
|
||||||
|
│ ├── Navbar.js # 导航栏组件
|
||||||
|
│ └── CreateProjectDialog.js
|
||||||
|
├── lib/ # 工具库
|
||||||
|
│ └── db/ # 数据库模块
|
||||||
|
│ ├── base.js # 基础工具函数
|
||||||
|
│ ├── projects.js # 项目管理
|
||||||
|
│ ├── texts.js # 文本处理
|
||||||
|
│ ├── datasets.js # 数据集管理
|
||||||
|
│ └── index.js # 模块导出
|
||||||
|
├── styles/ # 样式文件
|
||||||
|
│ └── home.js # 主页样式
|
||||||
|
└── local-db/ # 本地数据库目录
|
||||||
|
```
|
||||||
|
|
||||||
|
## 核心模块设计
|
||||||
|
|
||||||
|
### 1. 数据库模块 (`lib/db/`)
|
||||||
|
|
||||||
|
#### base.js
|
||||||
|
- 提供基础的文件操作功能
|
||||||
|
- 确保数据库目录存在
|
||||||
|
- 读写 JSON 文件的工具函数
|
||||||
|
|
||||||
|
#### projects.js
|
||||||
|
- 项目的 CRUD 操作
|
||||||
|
- 项目配置管理
|
||||||
|
- 项目目录结构维护
|
||||||
|
|
||||||
|
#### texts.js
|
||||||
|
- 文献处理功能
|
||||||
|
- 文本片段存储和检索
|
||||||
|
- 文件上传处理
|
||||||
|
|
||||||
|
#### datasets.js
|
||||||
|
- 数据集生成和管理
|
||||||
|
- 问题列表管理
|
||||||
|
- 标签树管理
|
||||||
|
|
||||||
|
### 2. 前端组件 (`components/`)
|
||||||
|
|
||||||
|
#### Navbar.js
|
||||||
|
- 顶部导航栏
|
||||||
|
- 项目切换
|
||||||
|
- 模型选择
|
||||||
|
- 主题切换
|
||||||
|
|
||||||
|
#### home/ 目录组件
|
||||||
|
- HeroSection.js: 主页顶部展示区
|
||||||
|
- ProjectList.js: 项目列表展示
|
||||||
|
- StatsCard.js: 数据统计展示
|
||||||
|
- CreateProjectDialog.js: 创建项目的对话框
|
||||||
|
|
||||||
|
### 3. 页面路由 (`app/`)
|
||||||
|
|
||||||
|
#### 主页 (`page.js`)
|
||||||
|
- 项目列表展示
|
||||||
|
- 创建项目入口
|
||||||
|
- 数据统计展示
|
||||||
|
|
||||||
|
#### 项目详情页 (`projects/[projectId]/`)
|
||||||
|
- text-split/: 文献处理页面
|
||||||
|
- questions/: 问题列表页面
|
||||||
|
- datasets/: 数据集页面
|
||||||
|
- settings/: 项目设置页面
|
||||||
|
|
||||||
|
#### API 路由 (`api/`)
|
||||||
|
- projects/: 项目管理 API
|
||||||
|
- texts/: 文本处理 API
|
||||||
|
- questions/: 问题生成 API
|
||||||
|
- datasets/: 数据集管理 API
|
||||||
|
|
||||||
|
## 数据流设计
|
||||||
|
|
||||||
|
### 项目创建流程
|
||||||
|
1. 用户通过主页或导航栏创建新项目
|
||||||
|
2. 填写项目基本信息(名称、描述)
|
||||||
|
3. 系统创建项目目录和初始配置文件
|
||||||
|
4. 重定向到项目详情页
|
||||||
|
|
||||||
|
### 文献处理流程
|
||||||
|
1. 用户上传 Markdown 文件
|
||||||
|
2. 系统保存原始文件到项目目录
|
||||||
|
3. 调用文本分割服务,生成片段和目录结构
|
||||||
|
4. 展示分割结果和提取的目录
|
||||||
|
|
||||||
|
### 问题生成流程
|
||||||
|
1. 用户选择需要生成问题的文本片段
|
||||||
|
2. 系统调用大模型API生成问题
|
||||||
|
3. 保存问题到问题列表和标签树
|
||||||
|
|
||||||
|
### 数据集生成流程
|
||||||
|
1. 用户选择需要生成答案的问题
|
||||||
|
2. 系统调用大模型API生成答案
|
||||||
|
3. 保存数据集结果
|
||||||
|
4. 提供导出功能
|
||||||
254
easy-dataset-main/AGENTS.md
Normal file
254
easy-dataset-main/AGENTS.md
Normal file
@@ -0,0 +1,254 @@
|
|||||||
|
# Easy Dataset Agent 指南
|
||||||
|
|
||||||
|
## 项目概述
|
||||||
|
|
||||||
|
Easy Dataset 是一个专为大型语言模型(LLM)微调数据集创建而设计的应用程序。它提供完整的workflow,从文档处理到数据集导出,支持多种文件格式和AI模型。
|
||||||
|
|
||||||
|
## 技术栈
|
||||||
|
|
||||||
|
- **前端**: Next.js 14 (App Router), React 18, Material-UI v5
|
||||||
|
- **后端**: Node.js, Prisma ORM, SQLite
|
||||||
|
- **AI集成**: OpenAI API, Ollama, 智谱AI, OpenRouter
|
||||||
|
- **桌面应用**: Electron
|
||||||
|
- **国际化**: i18next
|
||||||
|
- **构建工具**: npm/pnpm, Electron Builder
|
||||||
|
|
||||||
|
## 核心架构
|
||||||
|
|
||||||
|
### 1. 数据流架构
|
||||||
|
|
||||||
|
```
|
||||||
|
文档上传 → 文本分割 → 问题生成 → 答案生成 → 数据集导出
|
||||||
|
↓ ↓ ↓ ↓ ↓
|
||||||
|
文件处理 智能分块 LLM生成 LLM生成 格式转换
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. 模块结构
|
||||||
|
|
||||||
|
```
|
||||||
|
lib/
|
||||||
|
├── api/ # API接口层
|
||||||
|
├── db/ # 数据访问层
|
||||||
|
├── file/ # 文件处理模块
|
||||||
|
├── llm/ # AI模型集成
|
||||||
|
├── services/ # 业务逻辑层
|
||||||
|
└── util/ # 工具函数
|
||||||
|
```
|
||||||
|
|
||||||
|
## 开发指南
|
||||||
|
|
||||||
|
### 环境设置
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 安装依赖
|
||||||
|
npm install
|
||||||
|
|
||||||
|
# 数据库初始化
|
||||||
|
npm run db:push
|
||||||
|
|
||||||
|
# 开发模式
|
||||||
|
npm run dev
|
||||||
|
|
||||||
|
# 构建
|
||||||
|
npm run build
|
||||||
|
```
|
||||||
|
|
||||||
|
### 代码规范
|
||||||
|
|
||||||
|
- 使用ES6+语法
|
||||||
|
- 模块化开发
|
||||||
|
- 异步操作使用async/await
|
||||||
|
- 错误处理使用try/catch
|
||||||
|
- 注释使用JSDoc格式
|
||||||
|
|
||||||
|
### 重要文件路径
|
||||||
|
|
||||||
|
- **主入口**: `app/page.js`
|
||||||
|
- **项目路由**: `app/projects/[projectId]/`
|
||||||
|
- **API路由**: `app/api/`
|
||||||
|
- **LLM核心**: `lib/llm/core/index.js`
|
||||||
|
- **任务处理**: `lib/services/tasks/`
|
||||||
|
|
||||||
|
## 功能模块详解
|
||||||
|
|
||||||
|
### 1. 文档处理模块 (`lib/file/`)
|
||||||
|
|
||||||
|
- **支持的格式**: PDF, Markdown, DOCX, EPUB, TXT
|
||||||
|
- **核心功能**:
|
||||||
|
- 智能文本分割
|
||||||
|
- 目录结构提取
|
||||||
|
- 自定义分隔符分块
|
||||||
|
- 多语言支持
|
||||||
|
|
||||||
|
### 2. AI模型集成 (`lib/llm/`)
|
||||||
|
|
||||||
|
- **支持的提供商**:
|
||||||
|
- OpenAI (GPT系列)
|
||||||
|
- Ollama (本地模型)
|
||||||
|
- 智谱AI (GLM系列)
|
||||||
|
- OpenRouter (多模型聚合)
|
||||||
|
- **功能特性**:
|
||||||
|
- 统一API接口
|
||||||
|
- 流式输出支持
|
||||||
|
- 多语言提示词
|
||||||
|
- 错误重试机制
|
||||||
|
|
||||||
|
### 3. 任务系统 (`lib/services/tasks/`)
|
||||||
|
|
||||||
|
- **任务类型**:
|
||||||
|
- 文件处理任务
|
||||||
|
- 问题生成任务
|
||||||
|
- 答案生成任务
|
||||||
|
- 数据清洗任务
|
||||||
|
- **状态管理**: 待处理、处理中、完成、失败
|
||||||
|
|
||||||
|
### 4. 数据管理 (`lib/db/`)
|
||||||
|
|
||||||
|
- **数据模型**:
|
||||||
|
- Project (项目)
|
||||||
|
- Text/Chunk (文本块)
|
||||||
|
- Question (问题)
|
||||||
|
- Dataset (数据集)
|
||||||
|
- Tag (标签)
|
||||||
|
|
||||||
|
## 常用开发任务
|
||||||
|
|
||||||
|
### 添加新的AI模型提供商
|
||||||
|
|
||||||
|
1. 在 `lib/llm/core/providers/` 创建新的provider文件
|
||||||
|
2. 实现基础接口 (generate, streamGenerate)
|
||||||
|
3. 在 `lib/llm/core/index.js` 中注册provider
|
||||||
|
4. 更新配置文件和UI界面
|
||||||
|
|
||||||
|
### 添加新的文件格式支持
|
||||||
|
|
||||||
|
1. 在 `lib/file/file-process/` 创建格式处理器
|
||||||
|
2. 实现内容提取和文本转换逻辑
|
||||||
|
3. 更新文件类型检测和验证
|
||||||
|
4. 添加相应的UI组件
|
||||||
|
|
||||||
|
### 自定义提示词模板
|
||||||
|
|
||||||
|
1. 在 `lib/llm/prompts/` 创建新的提示词文件
|
||||||
|
2. 使用i18n支持多语言
|
||||||
|
3. 在设置界面添加配置选项
|
||||||
|
4. 测试不同模型的效果
|
||||||
|
|
||||||
|
### 添加新的导出格式
|
||||||
|
|
||||||
|
1. 在 `components/export/` 创建新的导出组件
|
||||||
|
2. 实现数据格式转换逻辑
|
||||||
|
3. 更新导出对话框界面
|
||||||
|
4. 添加格式验证和错误处理
|
||||||
|
|
||||||
|
## 调试技巧
|
||||||
|
|
||||||
|
### 1. 数据库调试
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 打开Prisma Studio
|
||||||
|
npm run db:studio
|
||||||
|
|
||||||
|
# 查看数据库文件
|
||||||
|
sqlite3 prisma/db.sqlite
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. LLM API调试
|
||||||
|
|
||||||
|
```javascript
|
||||||
|
// 在lib/llm/core/index.js中添加日志
|
||||||
|
console.log('LLM Request:', { provider, model, prompt });
|
||||||
|
console.log('LLM Response:', response);
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3. 文件处理调试
|
||||||
|
|
||||||
|
```javascript
|
||||||
|
// 在lib/file/中添加调试信息
|
||||||
|
console.log('File processing:', fileName, fileType);
|
||||||
|
console.log('Text chunks:', chunks.length, chunks[0]);
|
||||||
|
```
|
||||||
|
|
||||||
|
## 性能优化建议
|
||||||
|
|
||||||
|
### 1. 文件处理优化
|
||||||
|
|
||||||
|
- 大文件分片处理
|
||||||
|
- 异步并发处理
|
||||||
|
- 内存使用监控
|
||||||
|
- 进度条显示
|
||||||
|
|
||||||
|
### 2. LLM调用优化
|
||||||
|
|
||||||
|
- 请求缓存机制
|
||||||
|
- 批量处理请求
|
||||||
|
- 重试策略优化
|
||||||
|
- 并发数控制
|
||||||
|
|
||||||
|
### 3. 前端性能优化
|
||||||
|
|
||||||
|
- 组件懒加载
|
||||||
|
- 虚拟滚动列表
|
||||||
|
- 图片懒加载
|
||||||
|
- 代码分割
|
||||||
|
|
||||||
|
## 常见问题解决
|
||||||
|
|
||||||
|
### 1. 数据库相关问题
|
||||||
|
|
||||||
|
- **问题**: 数据库连接失败
|
||||||
|
- **解决**: 检查prisma配置,确保数据库文件存在
|
||||||
|
|
||||||
|
### 2. LLM API相关问题
|
||||||
|
|
||||||
|
- **问题**: API调用超时
|
||||||
|
- **解决**: 调整超时时间,检查网络连接,增加重试机制
|
||||||
|
|
||||||
|
### 3. 文件处理问题
|
||||||
|
|
||||||
|
- **问题**: 大文件处理内存溢出
|
||||||
|
- **解决**: 使用流式处理,分块读取,增加内存限制
|
||||||
|
|
||||||
|
### 4. Electron打包问题
|
||||||
|
|
||||||
|
- **问题**: 打包后应用无法启动
|
||||||
|
- **解决**: 检查依赖项配置,确保native模块正确打包
|
||||||
|
|
||||||
|
## 部署指南
|
||||||
|
|
||||||
|
### Docker部署
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 构建镜像
|
||||||
|
docker build -t easy-dataset .
|
||||||
|
|
||||||
|
# 运行容器
|
||||||
|
docker run -d -p 1717:1717 -v ./local-db:/app/local-db easy-dataset
|
||||||
|
```
|
||||||
|
|
||||||
|
### 桌面应用构建
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 构建各平台安装包
|
||||||
|
npm run electron-build-mac # macOS
|
||||||
|
npm run electron-build-win # Windows
|
||||||
|
npm run electron-build-linux # Linux
|
||||||
|
```
|
||||||
|
|
||||||
|
## 贡献指南
|
||||||
|
|
||||||
|
### 提交规范
|
||||||
|
|
||||||
|
- 使用conventional commits格式
|
||||||
|
- 提交前运行lint检查
|
||||||
|
- 更新相关文档
|
||||||
|
- 添加测试用例
|
||||||
|
|
||||||
|
### 分支策略
|
||||||
|
|
||||||
|
- `main`: 主分支,稳定版本
|
||||||
|
- `dev`: 开发分支,集成新功能
|
||||||
|
- `feature/*`: 功能分支
|
||||||
|
- `fix/*`: 修复分支
|
||||||
|
|
||||||
|
---
|
||||||
183
easy-dataset-main/ARCHITECTURE.md
Normal file
183
easy-dataset-main/ARCHITECTURE.md
Normal file
@@ -0,0 +1,183 @@
|
|||||||
|
# Easy DataSet 项目架构设计
|
||||||
|
|
||||||
|
## 项目概述
|
||||||
|
|
||||||
|
Easy DataSet 是一个用于创建大模型微调数据集的应用程序。用户可以上传文本文件,系统会自动分割文本并生成问题,最终生成用于微调的数据集。
|
||||||
|
|
||||||
|
## 技术栈
|
||||||
|
|
||||||
|
- **前端框架**: Next.js 14 (App Router)
|
||||||
|
- **UI 框架**: Material-UI (MUI)
|
||||||
|
- **数据存储**: fs 文件系统模拟数据库
|
||||||
|
- **开发语言**: JavaScript
|
||||||
|
|
||||||
|
## 目录结构
|
||||||
|
|
||||||
|
```
|
||||||
|
easy-dataset/
|
||||||
|
├── app/ # Next.js 应用目录
|
||||||
|
│ ├── api/ # API 路由
|
||||||
|
│ │ └── projects/ # 项目相关 API
|
||||||
|
│ ├── projects/ # 项目相关页面
|
||||||
|
│ │ ├── [projectId]/ # 项目详情页面
|
||||||
|
│ └── page.js # 主页
|
||||||
|
├── components/ # React 组件
|
||||||
|
│ ├── home/ # 主页相关组件
|
||||||
|
│ │ ├── HeroSection.js
|
||||||
|
│ │ ├── ProjectList.js
|
||||||
|
│ │ └── StatsCard.js
|
||||||
|
│ ├── Navbar.js # 导航栏组件
|
||||||
|
│ └── CreateProjectDialog.js
|
||||||
|
├── lib/ # 工具库
|
||||||
|
│ └── db/ # 数据库模块
|
||||||
|
│ ├── base.js # 基础工具函数
|
||||||
|
│ ├── projects.js # 项目管理
|
||||||
|
│ ├── texts.js # 文本处理
|
||||||
|
│ ├── datasets.js # 数据集管理
|
||||||
|
│ └── index.js # 模块导出
|
||||||
|
├── styles/ # 样式文件
|
||||||
|
│ └── home.js # 主页样式
|
||||||
|
└── local-db/ # 本地数据库目录
|
||||||
|
```
|
||||||
|
|
||||||
|
## 核心模块设计
|
||||||
|
|
||||||
|
### 1. 数据库模块 (`lib/db/`)
|
||||||
|
|
||||||
|
#### base.js
|
||||||
|
|
||||||
|
- 提供基础的文件操作功能
|
||||||
|
- 确保数据库目录存在
|
||||||
|
- 读写 JSON 文件的工具函数
|
||||||
|
|
||||||
|
#### projects.js
|
||||||
|
|
||||||
|
- 项目的 CRUD 操作
|
||||||
|
- 项目配置管理
|
||||||
|
- 项目目录结构维护
|
||||||
|
|
||||||
|
#### texts.js
|
||||||
|
|
||||||
|
- 文献处理功能
|
||||||
|
- 文本片段存储和检索
|
||||||
|
- 文件上传处理
|
||||||
|
|
||||||
|
#### datasets.js
|
||||||
|
|
||||||
|
- 数据集生成和管理
|
||||||
|
- 问题列表管理
|
||||||
|
- 标签树管理
|
||||||
|
|
||||||
|
### 2. 前端组件 (`components/`)
|
||||||
|
|
||||||
|
#### Navbar.js
|
||||||
|
|
||||||
|
- 顶部导航栏
|
||||||
|
- 项目切换
|
||||||
|
- 模型选择
|
||||||
|
- 主题切换
|
||||||
|
|
||||||
|
#### home/ 目录组件
|
||||||
|
|
||||||
|
- HeroSection.js: 主页顶部展示区
|
||||||
|
- ProjectList.js: 项目列表展示
|
||||||
|
- StatsCard.js: 数据统计展示
|
||||||
|
- CreateProjectDialog.js: 创建项目的对话框
|
||||||
|
|
||||||
|
### 3. 页面路由 (`app/`)
|
||||||
|
|
||||||
|
#### 主页 (`page.js`)
|
||||||
|
|
||||||
|
- 项目列表展示
|
||||||
|
- 创建项目入口
|
||||||
|
- 数据统计展示
|
||||||
|
|
||||||
|
#### 项目详情页 (`projects/[projectId]/`)
|
||||||
|
|
||||||
|
- text-split/: 文献处理页面
|
||||||
|
- questions/: 问题列表页面
|
||||||
|
- datasets/: 数据集页面
|
||||||
|
- settings/: 项目设置页面
|
||||||
|
|
||||||
|
#### API 路由 (`api/`)
|
||||||
|
|
||||||
|
- projects/: 项目管理 API
|
||||||
|
- texts/: 文本处理 API
|
||||||
|
- questions/: 问题生成 API
|
||||||
|
- datasets/: 数据集管理 API
|
||||||
|
|
||||||
|
## 数据流设计
|
||||||
|
|
||||||
|
### 项目创建流程
|
||||||
|
|
||||||
|
1. 用户通过主页或导航栏创建新项目
|
||||||
|
2. 填写项目基本信息(名称、描述)
|
||||||
|
3. 系统创建项目目录和初始配置文件
|
||||||
|
4. 重定向到项目详情页
|
||||||
|
|
||||||
|
### 文献处理流程
|
||||||
|
|
||||||
|
1. 用户上传 Markdown 文件
|
||||||
|
2. 系统保存原始文件到项目目录
|
||||||
|
3. 调用文本分割服务,生成片段和目录结构
|
||||||
|
4. 展示分割结果和提取的目录
|
||||||
|
|
||||||
|
### 问题生成流程
|
||||||
|
|
||||||
|
1. 用户选择需要生成问题的文本片段
|
||||||
|
2. 系统调用大模型API生成问题
|
||||||
|
3. 保存问题到问题列表和标签树
|
||||||
|
|
||||||
|
### 数据集生成流程
|
||||||
|
|
||||||
|
1. 用户选择需要生成答案的问题
|
||||||
|
2. 系统调用大模型API生成答案
|
||||||
|
3. 保存数据集结果
|
||||||
|
4. 提供导出功能
|
||||||
|
|
||||||
|
## 模型配置
|
||||||
|
|
||||||
|
支持多种大模型提供商配置:
|
||||||
|
|
||||||
|
- Ollama
|
||||||
|
- OpenAI
|
||||||
|
- 硅基流动
|
||||||
|
- 深度求索
|
||||||
|
- 智谱AI
|
||||||
|
|
||||||
|
每个提供商支持配置:
|
||||||
|
|
||||||
|
- API 地址
|
||||||
|
- API 密钥
|
||||||
|
- 模型名称
|
||||||
|
|
||||||
|
## 未来扩展方向
|
||||||
|
|
||||||
|
1. 支持更多文件格式(PDF、DOC等)
|
||||||
|
2. 增加数据集质量评估功能
|
||||||
|
3. 添加数据集版本管理
|
||||||
|
4. 实现团队协作功能
|
||||||
|
5. 增加更多数据集导出格式
|
||||||
|
|
||||||
|
## 国际化处理
|
||||||
|
|
||||||
|
### 技术选型
|
||||||
|
|
||||||
|
- **国际化库**: i18next + react-i18next
|
||||||
|
- **语言检测**: i18next-browser-languagedetector
|
||||||
|
- **支持语言**: 英文(en)、简体中文(zh-CN)
|
||||||
|
|
||||||
|
### 目录结构
|
||||||
|
|
||||||
|
```
|
||||||
|
easy-dataset/
|
||||||
|
├── locales/ # 国际化资源目录
|
||||||
|
│ ├── en/ # 英文翻译
|
||||||
|
│ │ └── translation.json
|
||||||
|
│ ├── zh-CN/ # 中文翻译
|
||||||
|
│ │ └── translation.json
|
||||||
|
│ └── pt-BR/ # 中文翻译
|
||||||
|
│ └── translation.json
|
||||||
|
├── lib/
|
||||||
|
│ └── i18n.js # i18next 配置
|
||||||
|
```
|
||||||
86
easy-dataset-main/Dockerfile
Normal file
86
easy-dataset-main/Dockerfile
Normal file
@@ -0,0 +1,86 @@
|
|||||||
|
# 创建包含pnpm的基础镜像
|
||||||
|
FROM node:20-alpine AS pnpm-base
|
||||||
|
RUN npm install -g pnpm@9
|
||||||
|
|
||||||
|
# 构建阶段
|
||||||
|
FROM pnpm-base AS builder
|
||||||
|
WORKDIR /app
|
||||||
|
|
||||||
|
# 添加构建参数,用于识别目标平台
|
||||||
|
ARG TARGETPLATFORM
|
||||||
|
|
||||||
|
# 安装构建依赖
|
||||||
|
RUN apk add --no-cache --virtual .build-deps \
|
||||||
|
python3 \
|
||||||
|
make \
|
||||||
|
g++ \
|
||||||
|
cairo-dev \
|
||||||
|
pango-dev \
|
||||||
|
jpeg-dev \
|
||||||
|
giflib-dev \
|
||||||
|
librsvg-dev \
|
||||||
|
build-base \
|
||||||
|
pixman-dev \
|
||||||
|
pkgconfig
|
||||||
|
|
||||||
|
# 复制依赖文件和npm配置并安装(.npmrc中可配置国内源加速)
|
||||||
|
COPY package.json pnpm-lock.yaml .npmrc ./
|
||||||
|
RUN pnpm install
|
||||||
|
|
||||||
|
# 复制源代码
|
||||||
|
COPY . .
|
||||||
|
|
||||||
|
# 根据目标平台设置Prisma二进制目标并构建应用
|
||||||
|
RUN if [ "$TARGETPLATFORM" = "linux/arm64" ]; then \
|
||||||
|
echo "Configuring for ARM64 platform"; \
|
||||||
|
sed -i 's/binaryTargets = \[.*\]/binaryTargets = \["linux-musl-arm64-openssl-3.0.x"\]/' prisma/schema.prisma; \
|
||||||
|
PRISMA_CLI_BINARY_TARGETS="linux-musl-arm64-openssl-3.0.x" pnpm build; \
|
||||||
|
else \
|
||||||
|
echo "Configuring for AMD64 platform (default)"; \
|
||||||
|
sed -i 's/binaryTargets = \[.*\]/binaryTargets = \["linux-musl-openssl-3.0.x"\]/' prisma/schema.prisma; \
|
||||||
|
PRISMA_CLI_BINARY_TARGETS="linux-musl-openssl-3.0.x" pnpm build; \
|
||||||
|
fi
|
||||||
|
|
||||||
|
# 构建完成后移除开发依赖,只保留生产依赖
|
||||||
|
RUN pnpm prune --prod
|
||||||
|
|
||||||
|
# 运行阶段
|
||||||
|
FROM pnpm-base AS runner
|
||||||
|
WORKDIR /app
|
||||||
|
|
||||||
|
# 只安装运行时依赖
|
||||||
|
RUN apk add --no-cache \
|
||||||
|
cairo \
|
||||||
|
pango \
|
||||||
|
jpeg \
|
||||||
|
giflib \
|
||||||
|
librsvg \
|
||||||
|
pixman
|
||||||
|
|
||||||
|
# 复制package.json和.env文件
|
||||||
|
COPY package.json .env ./
|
||||||
|
|
||||||
|
# 从构建阶段复制精简后的node_modules(只包含生产依赖)
|
||||||
|
COPY --from=builder /app/node_modules ./node_modules
|
||||||
|
|
||||||
|
# 从构建阶段复制构建产物
|
||||||
|
COPY --from=builder /app/.next ./.next
|
||||||
|
COPY --from=builder /app/public ./public
|
||||||
|
COPY --from=builder /app/electron ./electron
|
||||||
|
|
||||||
|
# 复制 prisma 到模板目录(用于自动初始化)
|
||||||
|
COPY --from=builder /app/prisma /app/prisma-template
|
||||||
|
|
||||||
|
# 复制并设置 entrypoint 脚本(sed 去除 Windows 换行符 \r,防止 CRLF 导致 "no such file or directory")
|
||||||
|
COPY docker-entrypoint.sh /usr/local/bin/
|
||||||
|
RUN sed -i 's/\r$//' /usr/local/bin/docker-entrypoint.sh && \
|
||||||
|
chmod +x /usr/local/bin/docker-entrypoint.sh
|
||||||
|
|
||||||
|
# 设置生产环境
|
||||||
|
ENV NODE_ENV=production
|
||||||
|
|
||||||
|
EXPOSE 1717
|
||||||
|
|
||||||
|
# 使用 entrypoint 脚本
|
||||||
|
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
|
||||||
|
CMD ["pnpm", "start"]
|
||||||
40
easy-dataset-main/LICENSE
Normal file
40
easy-dataset-main/LICENSE
Normal file
@@ -0,0 +1,40 @@
|
|||||||
|
GNU AFFERO GENERAL PUBLIC LICENSE
|
||||||
|
Version 3, 19 November 2007
|
||||||
|
|
||||||
|
Copyright (C) 2025 Easy Dataset Project
|
||||||
|
|
||||||
|
This program is free software: you can redistribute it and/or modify
|
||||||
|
it under the terms of the GNU Affero General Public License as published
|
||||||
|
by the Free Software Foundation, either version 3 of the License, or
|
||||||
|
(at your option) any later version.
|
||||||
|
|
||||||
|
This program is distributed in the hope that it will be useful,
|
||||||
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
GNU Affero General Public License for more details.
|
||||||
|
|
||||||
|
You should have received a copy of the GNU Affero General Public License
|
||||||
|
along with this program. If not, see https://www.gnu.org/licenses/.
|
||||||
|
|
||||||
|
Additional Terms for Easy Dataset:
|
||||||
|
|
||||||
|
1. Contact Information
|
||||||
|
If you wish to use Easy Dataset under different terms, please contact the
|
||||||
|
copyright holders at: 1009903985@qq.com
|
||||||
|
|
||||||
|
2. Branding Restrictions
|
||||||
|
You may not use the names "Easy Dataset" or "EasyDataset" to endorse or
|
||||||
|
promote products derived from this software without prior written permission.
|
||||||
|
|
||||||
|
3. Disclaimer of Warranty
|
||||||
|
The software is provided "as is", without warranty of any kind, express or
|
||||||
|
implied, including but not limited to the warranties of merchantability,
|
||||||
|
fitness for a particular purpose and noninfringement. In no event shall the
|
||||||
|
authors or copyright holders be liable for any claim, damages or other
|
||||||
|
liability, whether in an action of contract, tort or otherwise, arising from,
|
||||||
|
out of or in connection with the software or the use or other dealings in the
|
||||||
|
software.
|
||||||
|
|
||||||
|
4. Compliance with Laws
|
||||||
|
You are responsible for ensuring your use of the software complies with all
|
||||||
|
applicable laws, including but not limited to export control regulations.
|
||||||
294
easy-dataset-main/README.md
Normal file
294
easy-dataset-main/README.md
Normal file
@@ -0,0 +1,294 @@
|
|||||||
|
<div align="center">
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ConardLi/easy-dataset">
|
||||||
|
<img alt="GitHub Downloads (all assets, all releases)" src="https://img.shields.io/github/downloads/ConardLi/easy-dataset/total">
|
||||||
|
<img alt="GitHub Release" src="https://img.shields.io/github/v/release/ConardLi/easy-dataset">
|
||||||
|
<img src="https://img.shields.io/badge/license-AGPL--3.0-green.svg" alt="AGPL 3.0 License"/>
|
||||||
|
<img alt="GitHub contributors" src="https://img.shields.io/github/contributors/ConardLi/easy-dataset">
|
||||||
|
<img alt="GitHub last commit" src="https://img.shields.io/github/last-commit/ConardLi/easy-dataset">
|
||||||
|
<a href="https://arxiv.org/abs/2507.04009v1" target="_blank">
|
||||||
|
<img src="https://img.shields.io/badge/arXiv-2507.04009-b31b1b.svg" alt="arXiv:2507.04009">
|
||||||
|
</a>
|
||||||
|
|
||||||
|
<a href="https://trendshift.io/repositories/13944" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13944" alt="ConardLi%2Feasy-dataset | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
|
||||||
|
|
||||||
|
**A powerful tool for creating fine-tuning datasets for Large Language Models**
|
||||||
|
|
||||||
|
[简体中文](./README.zh-CN.md) | [English](./README.md) | [Türkçe](./README.tr.md)
|
||||||
|
|
||||||
|
[Features](#features) • [Quick Start](#local-run) • [Documentation](https://docs.easy-dataset.com/ed/en) • [Contributing](#contributing) • [License](#license)
|
||||||
|
|
||||||
|
If you like this project, please give it a Star⭐️, or buy the author a coffee => [Donate](./public/imgs/aw.jpg) ❤️!
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Easy Dataset is an application specifically designed for building large language model (LLM) datasets. It features an intuitive interface, along with built-in powerful document parsing tools, intelligent segmentation algorithms, data cleaning and augmentation capabilities. The application can convert domain-specific documents in various formats into high-quality structured datasets, which are applicable to scenarios such as model fine-tuning, retrieval-augmented generation (RAG), and model performance evaluation.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
## News
|
||||||
|
|
||||||
|
🎉🎉 Easy Dataset Version 1.7.0 launches brand-new evaluation capabilities! You can effortlessly convert domain-specific documents into evaluation datasets (test sets) and automatically run multi-dimensional evaluation tasks. Additionally, it comes with a human blind test system, enabling you to easily meet needs such as vertical domain model evaluation, post-fine-tuning model performance assessment, and RAG recall rate evaluation. Tutorial: [https://www.bilibili.com/video/BV1CRrVB7Eb4/](https://www.bilibili.com/video/BV1CRrVB7Eb4/)
|
||||||
|
|
||||||
|
## Features
|
||||||
|
|
||||||
|
### 📄 Document Processing & Data Generation
|
||||||
|
|
||||||
|
- **Intelligent Document Processing**: Supports PDF, Markdown, DOCX, TXT, EPUB and more formats with intelligent recognition
|
||||||
|
- **Intelligent Text Splitting**: Multiple splitting algorithms (Markdown structure, recursive separators, fixed length, code-aware chunking), with customizable visual segmentation
|
||||||
|
- **Intelligent Question Generation**: Auto-extract relevant questions from text segments, with question templates and batch generation
|
||||||
|
- **Domain Label Tree**: Intelligently builds global domain label trees based on document structure, with auto-tagging capabilities
|
||||||
|
- **Answer Generation**: Uses LLM API to generate comprehensive answers and Chain of Thought (COT), with AI optimization
|
||||||
|
- **Data Cleaning**: Intelligent text cleaning to remove noise and improve data quality
|
||||||
|
|
||||||
|
### 🔄 Multiple Dataset Types
|
||||||
|
|
||||||
|
- **Single-Turn QA Datasets**: Standard question-answer pairs for basic fine-tuning
|
||||||
|
- **Multi-Turn Dialogue Datasets**: Customizable roles and scenarios for conversational format
|
||||||
|
- **Image QA Datasets**: Generate visual QA data from images, with multiple import methods (directory, PDF, ZIP)
|
||||||
|
- **Data Distillation**: Generate label trees and questions directly from domain topics without uploading documents
|
||||||
|
|
||||||
|
### 📊 Model Evaluation System
|
||||||
|
|
||||||
|
- **Evaluation Datasets**: Generate true/false, single-choice, multiple-choice, short-answer, and open-ended questions
|
||||||
|
- **Automated Model Evaluation**: Use Judge Model to automatically evaluate model answer quality with customizable scoring rules
|
||||||
|
- **Human Blind Test (Arena)**: Double-blind comparison of two models' answers for unbiased evaluation
|
||||||
|
- **AI Quality Assessment**: Automatic quality scoring and filtering of generated datasets
|
||||||
|
|
||||||
|
### 🛠️ Advanced Features
|
||||||
|
|
||||||
|
- **Custom Prompts**: Project-level customization of all prompt templates (question generation, answer generation, data cleaning, etc.)
|
||||||
|
- **GA Pair Generation**: Genre-Audience pair generation to enrich data diversity
|
||||||
|
- **Task Management Center**: Background batch task processing with monitoring and interruption support
|
||||||
|
- **Resource Monitoring Dashboard**: Token consumption statistics, API call tracking, model performance analysis
|
||||||
|
- **Model Testing Playground**: Compare up to 3 models simultaneously
|
||||||
|
|
||||||
|
### 📤 Export & Integration
|
||||||
|
|
||||||
|
- **Multiple Export Formats**: Alpaca, ShareGPT, Multilingual-Thinking formats with JSON/JSONL file types
|
||||||
|
- **Balanced Export**: Configure export counts per tag for dataset balancing
|
||||||
|
- **LLaMA Factory Integration**: One-click LLaMA Factory configuration file generation
|
||||||
|
- **Hugging Face Upload**: Direct upload datasets to Hugging Face Hub
|
||||||
|
|
||||||
|
### 🤖 Model Support
|
||||||
|
|
||||||
|
- **Wide Model Compatibility**: Compatible with all LLM APIs that follow the OpenAI format
|
||||||
|
- **Multi-Provider Support**: OpenAI, Ollama (local models), Zhipu AI, Alibaba Bailian, OpenRouter, and more
|
||||||
|
- **Vision Models**: Support Gemini, Claude, etc. for PDF parsing and image QA
|
||||||
|
|
||||||
|
### 🌐 User Experience
|
||||||
|
|
||||||
|
- **User-Friendly Interface**: Modern, intuitive UI designed for both technical and non-technical users
|
||||||
|
- **Multi-Language Support**: Complete Chinese, English, Turkish and Portuguese language support 🇹🇷
|
||||||
|
- **Dataset Square**: Discover and explore public dataset resources
|
||||||
|
- **Desktop Clients**: Available for Windows, macOS, and Linux
|
||||||
|
|
||||||
|
## Quick Demo
|
||||||
|
|
||||||
|
https://github.com/user-attachments/assets/6ddb1225-3d1b-4695-90cd-aa4cb01376a8
|
||||||
|
|
||||||
|
## Local Run
|
||||||
|
|
||||||
|
### Download Client
|
||||||
|
|
||||||
|
<table style="width: 100%">
|
||||||
|
<tr>
|
||||||
|
<td width="20%" align="center">
|
||||||
|
<b>Windows</b>
|
||||||
|
</td>
|
||||||
|
<td width="30%" align="center" colspan="2">
|
||||||
|
<b>MacOS</b>
|
||||||
|
</td>
|
||||||
|
<td width="20%" align="center">
|
||||||
|
<b>Linux</b>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr style="text-align: center">
|
||||||
|
<td align="center" valign="middle">
|
||||||
|
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
|
||||||
|
<img src='./public/imgs/windows.png' style="height:24px; width: 24px" />
|
||||||
|
<br />
|
||||||
|
<b>Setup.exe</b>
|
||||||
|
</a>
|
||||||
|
</td>
|
||||||
|
<td align="center" valign="middle">
|
||||||
|
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
|
||||||
|
<img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
|
||||||
|
<br />
|
||||||
|
<b>Intel</b>
|
||||||
|
</a>
|
||||||
|
</td>
|
||||||
|
<td align="center" valign="middle">
|
||||||
|
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
|
||||||
|
<img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
|
||||||
|
<br />
|
||||||
|
<b>M</b>
|
||||||
|
</a>
|
||||||
|
</td>
|
||||||
|
<td align="center" valign="middle">
|
||||||
|
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
|
||||||
|
<img src='./public/imgs/linux.png' style="height:24px; width: 24px" />
|
||||||
|
<br />
|
||||||
|
<b>AppImage</b>
|
||||||
|
</a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
### Install with NPM
|
||||||
|
|
||||||
|
1. Clone the repository:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/ConardLi/easy-dataset.git
|
||||||
|
cd easy-dataset
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Install dependencies:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm install
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Start the development server:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm run build
|
||||||
|
|
||||||
|
npm run start
|
||||||
|
```
|
||||||
|
|
||||||
|
4. Open your browser and visit `http://localhost:1717`
|
||||||
|
|
||||||
|
### Using the Official Docker Image
|
||||||
|
|
||||||
|
1. Clone the repository:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/ConardLi/easy-dataset.git
|
||||||
|
cd easy-dataset
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Modify the `docker-compose.yml` file:
|
||||||
|
|
||||||
|
```yml
|
||||||
|
services:
|
||||||
|
easy-dataset:
|
||||||
|
image: ghcr.io/conardli/easy-dataset
|
||||||
|
container_name: easy-dataset
|
||||||
|
ports:
|
||||||
|
- '1717:1717'
|
||||||
|
volumes:
|
||||||
|
- ./local-db:/app/local-db
|
||||||
|
- ./prisma:/app/prisma
|
||||||
|
restart: unless-stopped
|
||||||
|
```
|
||||||
|
|
||||||
|
> **Note:** It is recommended to use the `local-db` and `prisma` folders in the current code repository directory as mount paths to maintain consistency with the database paths when starting via NPM.
|
||||||
|
|
||||||
|
> **Note:** The database file will be automatically initialized on first startup, no need to manually run `npm run db:push`.
|
||||||
|
|
||||||
|
3. Start with docker-compose:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker-compose up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
4. Open a browser and visit `http://localhost:1717`
|
||||||
|
|
||||||
|
### Building with a Local Dockerfile
|
||||||
|
|
||||||
|
If you want to build the image yourself, use the Dockerfile in the project root directory:
|
||||||
|
|
||||||
|
1. Clone the repository:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/ConardLi/easy-dataset.git
|
||||||
|
cd easy-dataset
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Build the Docker image:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker build -t easy-dataset .
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Run the container:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker run -d \
|
||||||
|
-p 1717:1717 \
|
||||||
|
-v ./local-db:/app/local-db \
|
||||||
|
-v ./prisma:/app/prisma \
|
||||||
|
--name easy-dataset \
|
||||||
|
easy-dataset
|
||||||
|
```
|
||||||
|
|
||||||
|
> **Note:** It is recommended to use the `local-db` and `prisma` folders in the current code repository directory as mount paths to maintain consistency with the database paths when starting via NPM.
|
||||||
|
|
||||||
|
> **Note:** The database file will be automatically initialized on first startup, no need to manually run `npm run db:push`.
|
||||||
|
|
||||||
|
4. Open a browser and visit `http://localhost:1717`
|
||||||
|
|
||||||
|
## Documentation
|
||||||
|
|
||||||
|
- View the demo video of this project: [Easy Dataset Demo Video](https://www.bilibili.com/video/BV1y8QpYGE57/)
|
||||||
|
- For detailed documentation on all features and APIs, visit our [Documentation Site](https://docs.easy-dataset.com/ed/en)
|
||||||
|
- View the paper of this project: [Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents](https://arxiv.org/abs/2507.04009v1)
|
||||||
|
|
||||||
|
## Community Practice
|
||||||
|
|
||||||
|
- [Complete test set generation and model evaluation with Easy Dataset](https://www.bilibili.com/video/BV1CRrVB7Eb4/)
|
||||||
|
- [Easy Dataset × LLaMA Factory: Enabling LLMs to Efficiently Learn Domain Knowledge](https://buaa-act.feishu.cn/wiki/GVzlwYcRFiR8OLkHbL6cQpYin7g)
|
||||||
|
- [Easy Dataset Practical Guide: How to Build High-Quality Datasets?](https://www.bilibili.com/video/BV1MRMnz1EGW)
|
||||||
|
- [Interpretation of Key Feature Updates in Easy Dataset](https://www.bilibili.com/video/BV1fyJhzHEb7/)
|
||||||
|
- [Foundation Models Fine-tuning Datasets: Basic Knowledge Popularization](https://docs.easy-dataset.com/zhi-shi-ke-pu)
|
||||||
|
|
||||||
|
## Contributing
|
||||||
|
|
||||||
|
We welcome contributions from the community! If you'd like to contribute to Easy Dataset, please follow these steps:
|
||||||
|
|
||||||
|
1. Fork the repository
|
||||||
|
2. Create a new branch (`git checkout -b feature/amazing-feature`)
|
||||||
|
3. Make your changes
|
||||||
|
4. Commit your changes (`git commit -m 'Add some amazing feature'`)
|
||||||
|
5. Push to the branch (`git push origin feature/amazing-feature`)
|
||||||
|
6. Open a Pull Request (submit to the DEV branch)
|
||||||
|
|
||||||
|
Please ensure that tests are appropriately updated and adhere to the existing coding style.
|
||||||
|
|
||||||
|
## Join Discussion Group & Contact the Author
|
||||||
|
|
||||||
|
https://docs.easy-dataset.com/geng-duo/lian-xi-wo-men
|
||||||
|
|
||||||
|
## License
|
||||||
|
|
||||||
|
This project is licensed under the AGPL 3.0 License - see the [LICENSE](LICENSE) file for details.
|
||||||
|
|
||||||
|
## Citation
|
||||||
|
|
||||||
|
If this work is helpful, please kindly cite as:
|
||||||
|
|
||||||
|
```bibtex
|
||||||
|
@misc{miao2025easydataset,
|
||||||
|
title={Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents},
|
||||||
|
author={Ziyang Miao and Qiyu Sun and Jingyuan Wang and Yuchen Gong and Yaowei Zheng and Shiqi Li and Richong Zhang},
|
||||||
|
year={2025},
|
||||||
|
eprint={2507.04009},
|
||||||
|
archivePrefix={arXiv},
|
||||||
|
primaryClass={cs.CL},
|
||||||
|
url={https://arxiv.org/abs/2507.04009}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Star History
|
||||||
|
|
||||||
|
[](https://www.star-history.com/#ConardLi/easy-dataset&Date)
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
<sub>Built with ❤️ by <a href="https://github.com/ConardLi">ConardLi</a> • Follow me: <a href="./public/imgs/weichat.jpg">WeChat Official Account</a>|<a href="https://space.bilibili.com/474921808">Bilibili</a>|<a href="https://juejin.cn/user/3949101466785709">Juejin</a>|<a href="https://www.zhihu.com/people/wen-ti-chao-ji-duo-de-xiao-qi">Zhihu</a>|<a href="https://www.youtube.com/@garden-conard">Youtube</a></sub>
|
||||||
|
</div>
|
||||||
319
easy-dataset-main/README.tr.md
Normal file
319
easy-dataset-main/README.tr.md
Normal file
@@ -0,0 +1,319 @@
|
|||||||
|
<div align="center">
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ConardLi/easy-dataset">
|
||||||
|
<img alt="GitHub Downloads (all assets, all releases)" src="https://img.shields.io/github/downloads/ConardLi/easy-dataset/total">
|
||||||
|
<img alt="GitHub Release" src="https://img.shields.io/github/v/release/ConardLi/easy-dataset">
|
||||||
|
<img src="https://img.shields.io/badge/license-AGPL--3.0-green.svg" alt="AGPL 3.0 License"/>
|
||||||
|
<img alt="GitHub contributors" src="https://img.shields.io/github/contributors/ConardLi/easy-dataset">
|
||||||
|
<img alt="GitHub last commit" src="https://img.shields.io/github/last-commit/ConardLi/easy-dataset">
|
||||||
|
<a href="https://arxiv.org/abs/2507.04009v1" target="_blank">
|
||||||
|
<img src="https://img.shields.io/badge/arXiv-2507.04009-b31b1b.svg" alt="arXiv:2507.04009">
|
||||||
|
</a>
|
||||||
|
|
||||||
|
<a href="https://trendshift.io/repositories/13944" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13944" alt="ConardLi%2Feasy-dataset | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
|
||||||
|
|
||||||
|
**Büyük Dil Modelleri için ince ayar veri setleri oluşturmak için güçlü bir araç**
|
||||||
|
|
||||||
|
[简体中文](./README.zh-CN.md) | [English](./README.md) | [Türkçe](./README.tr.md)
|
||||||
|
|
||||||
|
[Özellikler](#özellikler) • [Hızlı Başlangıç](#yerel-çalıştırma) • [Dokümantasyon](https://docs.easy-dataset.com/ed/en) • [Katkıda Bulunma](#katkıda-bulunma) • [Lisans](#lisans)
|
||||||
|
|
||||||
|
Bu projeyi beğendiyseniz, lütfen bir Yıldız⭐️ verin veya yazara bir kahve ısmarlayın => [Bağış](./public/imgs/aw.jpg) ❤️!
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
## Genel Bakış
|
||||||
|
|
||||||
|
Easy Dataset, Büyük Dil Modelleri (LLM'ler) için özel olarak tasarlanmış ince ayar veri setleri oluşturmak için bir uygulamadır. Alana özgü dosyaları yüklemek, içeriği akıllıca bölmek, sorular oluşturmak ve model ince ayarı için yüksek kaliteli eğitim verileri üretmek için sezgisel bir arayüz sağlar.
|
||||||
|
|
||||||
|
Easy Dataset ile alan bilgisini yapılandırılmış veri setlerine dönüştürebilir, OpenAI formatını takip eden tüm LLM API'leriyle uyumlu çalışabilir ve ince ayar sürecini basit ve verimli hale getirebilirsiniz.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
## Özellikler
|
||||||
|
|
||||||
|
- **Akıllı Belge İşleme**: PDF, Markdown, DOCX dahil birden fazla formatın akıllı tanınması ve işlenmesi desteği
|
||||||
|
- **Akıllı Metin Bölme**: Birden fazla akıllı metin bölme algoritması ve özelleştirilebilir görsel segmentasyon desteği
|
||||||
|
- **Akıllı Soru Üretimi**: Her metin bölümünden ilgili soruları çıkarır
|
||||||
|
- **Alan Etiketleri**: Veri setleri için global alan etiketlerini akıllıca oluşturur, küresel anlama yeteneklerine sahiptir
|
||||||
|
- **Cevap Üretimi**: Kapsamlı cevaplar ve Düşünce Zinciri (COT) oluşturmak için LLM API kullanır
|
||||||
|
- **Esnek Düzenleme**: Sürecin herhangi bir aşamasında soruları, cevapları ve veri setlerini düzenleyin
|
||||||
|
- **Çoklu Dışa Aktarma Formatları**: Veri setlerini çeşitli formatlarda (Alpaca, ShareGPT, çok dilli düşünme) ve dosya türlerinde (JSON, JSONL) dışa aktarın
|
||||||
|
- **Geniş Model Desteği**: OpenAI formatını takip eden tüm LLM API'leriyle uyumlu
|
||||||
|
- **Tam Türkçe Dil Desteği**: Tüm arayüz ve AI işlemleri için eksiksiz Türkçe çeviriler 🇹🇷
|
||||||
|
- **Kullanıcı Dostu Arayüz**: Hem teknik hem de teknik olmayan kullanıcılar için tasarlanmış sezgisel kullanıcı arayüzü
|
||||||
|
- **Özel Sistem İstemleri**: Model yanıtlarını yönlendirmek için özel sistem istemleri ekleyin
|
||||||
|
|
||||||
|
## Hızlı Demo
|
||||||
|
|
||||||
|
https://github.com/user-attachments/assets/6ddb1225-3d1b-4695-90cd-aa4cb01376a8
|
||||||
|
|
||||||
|
## Yerel Çalıştırma
|
||||||
|
|
||||||
|
### İstemciyi İndirin
|
||||||
|
|
||||||
|
<table style="width: 100%">
|
||||||
|
<tr>
|
||||||
|
<td width="20%" align="center">
|
||||||
|
<b>Windows</b>
|
||||||
|
</td>
|
||||||
|
<td width="30%" align="center" colspan="2">
|
||||||
|
<b>MacOS</b>
|
||||||
|
</td>
|
||||||
|
<td width="20%" align="center">
|
||||||
|
<b>Linux</b>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr style="text-align: center">
|
||||||
|
<td align="center" valign="middle">
|
||||||
|
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
|
||||||
|
<img src='./public/imgs/windows.png' style="height:24px; width: 24px" />
|
||||||
|
<br />
|
||||||
|
<b>Setup.exe</b>
|
||||||
|
</a>
|
||||||
|
</td>
|
||||||
|
<td align="center" valign="middle">
|
||||||
|
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
|
||||||
|
<img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
|
||||||
|
<br />
|
||||||
|
<b>Intel</b>
|
||||||
|
</a>
|
||||||
|
</td>
|
||||||
|
<td align="center" valign="middle">
|
||||||
|
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
|
||||||
|
<img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
|
||||||
|
<br />
|
||||||
|
<b>M</b>
|
||||||
|
</a>
|
||||||
|
</td>
|
||||||
|
<td align="center" valign="middle">
|
||||||
|
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
|
||||||
|
<img src='./public/imgs/linux.png' style="height:24px; width: 24px" />
|
||||||
|
<br />
|
||||||
|
<b>AppImage</b>
|
||||||
|
</a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
### NPM ile Kurulum
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm install
|
||||||
|
npm run db:push
|
||||||
|
npm run dev
|
||||||
|
```
|
||||||
|
|
||||||
|
### Docker ile Kurulum
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker-compose up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
Ardından `http://localhost:1717` adresine gidin.
|
||||||
|
|
||||||
|
## Desteklenen AI Sağlayıcıları
|
||||||
|
|
||||||
|
Easy Dataset, aşağıdakiler dahil olmak üzere birden fazla AI sağlayıcısını destekler:
|
||||||
|
|
||||||
|
- **OpenAI**: GPT-4, GPT-3.5-turbo ve diğer modeller
|
||||||
|
- **Ollama**: Yerel model çalıştırma
|
||||||
|
- **智谱AI (GLM)**: Çince modeller
|
||||||
|
- **OpenRouter**: Çoklu model aggregatör
|
||||||
|
- **Özel API Uç Noktaları**: OpenAI formatını takip eden herhangi bir API
|
||||||
|
|
||||||
|
## Proje Yapısı
|
||||||
|
|
||||||
|
```
|
||||||
|
easy-dataset/
|
||||||
|
├── app/ # Next.js uygulama yönlendiricisi
|
||||||
|
│ ├── api/ # API rotaları
|
||||||
|
│ ├── projects/ # Proje sayfaları
|
||||||
|
│ └── dataset-square/ # Veri seti galerisi
|
||||||
|
├── components/ # React bileşenleri
|
||||||
|
├── lib/ # Temel kütüphaneler
|
||||||
|
│ ├── llm/ # LLM entegrasyonu
|
||||||
|
│ ├── db/ # Veritabanı erişimi
|
||||||
|
│ ├── file/ # Dosya işleme
|
||||||
|
│ └── services/ # İş mantığı
|
||||||
|
├── locales/ # i18n çevirileri
|
||||||
|
│ ├── en/ # İngilizce
|
||||||
|
│ ├── zh-CN/ # Basitleştirilmiş Çince
|
||||||
|
│ └── tr/ # Türkçe
|
||||||
|
├── prisma/ # Veritabanı şeması
|
||||||
|
└── electron/ # Electron masaüstü uygulaması
|
||||||
|
```
|
||||||
|
|
||||||
|
## Kullanım Rehberi
|
||||||
|
|
||||||
|
### 1. Proje Oluşturma
|
||||||
|
|
||||||
|
İlk olarak, yeni bir proje oluşturun ve proje adını, açıklamasını ve diğer temel bilgileri yapılandırın.
|
||||||
|
|
||||||
|
### 2. Dosya Yükleme
|
||||||
|
|
||||||
|
Alana özgü belgelerinizi yükleyin. Desteklenen formatlar:
|
||||||
|
|
||||||
|
- PDF
|
||||||
|
- Markdown (.md)
|
||||||
|
- Microsoft Word (.docx)
|
||||||
|
- EPUB
|
||||||
|
- Düz metin (.txt)
|
||||||
|
|
||||||
|
### 3. Metin Bölme
|
||||||
|
|
||||||
|
Dosyalar aşağıdaki yöntemlerle akıllıca bölünebilir:
|
||||||
|
|
||||||
|
- Doğal dil işleme tabanlı semantik bölme
|
||||||
|
- Özel ayırıcılara dayalı bölme
|
||||||
|
- Karakter sayısına dayalı sabit boyutlu bölme
|
||||||
|
- Manuel görsel bölme
|
||||||
|
|
||||||
|
### 4. Alan Etiketleri Oluşturma
|
||||||
|
|
||||||
|
Sistem, belge içeriğine dayalı olarak otomatik olarak hiyerarşik alan etiketleri oluşturabilir ve iki seviyeyi destekler.
|
||||||
|
|
||||||
|
### 5. Soru Üretimi
|
||||||
|
|
||||||
|
Her metin bloğu için sistem:
|
||||||
|
|
||||||
|
- İçeriğe dayalı alakalı sorular oluşturur
|
||||||
|
- Tür ve hedef kitle perspektifi sorgulamayı destekler
|
||||||
|
- Soru sayısını özelleştirme seçeneği sunar
|
||||||
|
|
||||||
|
### 6. Cevap Üretimi
|
||||||
|
|
||||||
|
Yapılandırılmış LLM API'si kullanarak:
|
||||||
|
|
||||||
|
- Her soru için kapsamlı cevaplar oluşturur
|
||||||
|
- Düşünce Zinciri (COT) üretimini destekler
|
||||||
|
- Farklı cevap şablonları destekler
|
||||||
|
|
||||||
|
### 7. Veri Seti Dışa Aktarma
|
||||||
|
|
||||||
|
Veri setinizi çeşitli formatlarda dışa aktarın:
|
||||||
|
|
||||||
|
- **Alpaca Format**: Basit talimat-takip formatı
|
||||||
|
- **ShareGPT Format**: Çok turlu konuşma formatı
|
||||||
|
- **Çok Dilli Düşünme**: COT ile genişletilmiş format
|
||||||
|
- **Özel Format**: Kendi JSON yapınızı tanımlayın
|
||||||
|
|
||||||
|
Dışa aktarma hedefleri:
|
||||||
|
|
||||||
|
- Yerel dosya sistemi
|
||||||
|
- Hugging Face Hub
|
||||||
|
- LLaMA Factory uyumluluğu
|
||||||
|
|
||||||
|
## Gelişmiş Özellikler
|
||||||
|
|
||||||
|
### Veri Damıtma
|
||||||
|
|
||||||
|
Mevcut veri setlerinden yeni eğitim örnekleri oluşturun:
|
||||||
|
|
||||||
|
- Soru damıtma: Mevcut soru-cevap çiftlerinden yeni sorular oluşturun
|
||||||
|
- Etiket damıtma: Otomatik etiket ve kategorizasyon oluşturma
|
||||||
|
|
||||||
|
### Tür-Hedef Kitle (GA) Çiftleri
|
||||||
|
|
||||||
|
Spesifik içerik stilleri ve hedef kitleler için veri setlerini uyarlayın:
|
||||||
|
|
||||||
|
- Tür: Akademik, teknik, yaratıcı yazma, vb.
|
||||||
|
- Hedef Kitle: Yeni başlayanlar, uzmanlar, öğrenciler, vb.
|
||||||
|
|
||||||
|
### Toplu İşlemler
|
||||||
|
|
||||||
|
Birden fazla öğeye verimli bir şekilde işlem:
|
||||||
|
|
||||||
|
- Toplu soru üretimi
|
||||||
|
- Toplu cevap üretimi
|
||||||
|
- Toplu veri seti dışa aktarma
|
||||||
|
|
||||||
|
### Görev Yönetimi
|
||||||
|
|
||||||
|
Tüm arka plan görevlerini izleyin ve yönetin:
|
||||||
|
|
||||||
|
- Dosya işleme görevleri
|
||||||
|
- Soru üretim görevleri
|
||||||
|
- Cevap üretim görevleri
|
||||||
|
- Dışa aktarma görevleri
|
||||||
|
|
||||||
|
## Yapılandırma
|
||||||
|
|
||||||
|
### LLM API Yapılandırması
|
||||||
|
|
||||||
|
Ayarlar sayfasında LLM API'nizi yapılandırın:
|
||||||
|
|
||||||
|
1. **Sağlayıcı**: OpenAI, Ollama, 智谱AI veya özel seçin
|
||||||
|
2. **API Anahtarı**: API anahtarınızı girin (gerekirse)
|
||||||
|
3. **Model**: Kullanılacak modeli seçin
|
||||||
|
4. **Temel URL**: Özel API'ler için temel URL'yi ayarlayın
|
||||||
|
|
||||||
|
### Görev Ayarları
|
||||||
|
|
||||||
|
Görev yürütme parametrelerini özelleştirin:
|
||||||
|
|
||||||
|
- Soru üretimi için eşzamanlılık
|
||||||
|
- Cevap üretimi için eşzamanlılık
|
||||||
|
- Varsayılan soru sayısı
|
||||||
|
- Varsayılan cevap şablonu
|
||||||
|
|
||||||
|
### Özel İstemler
|
||||||
|
|
||||||
|
Her görev türü için özel sistem istemleri ekleyin:
|
||||||
|
|
||||||
|
- Soru üretim istemi
|
||||||
|
- Cevap üretim istemi
|
||||||
|
- Etiket üretim istemi
|
||||||
|
- Damıtma istemi
|
||||||
|
|
||||||
|
## Katkıda Bulunma
|
||||||
|
|
||||||
|
Katkılara hoş geldiniz! Lütfen şu adımları izleyin:
|
||||||
|
|
||||||
|
1. Repo'yu fork edin
|
||||||
|
2. Bir özellik dalı oluşturun (`git checkout -b feature/amazing-feature`)
|
||||||
|
3. Değişikliklerinizi commit edin (`git commit -m 'Add some amazing feature'`)
|
||||||
|
4. Dala push edin (`git push origin feature/amazing-feature`)
|
||||||
|
5. Bir Pull Request açın
|
||||||
|
|
||||||
|
## Lisans
|
||||||
|
|
||||||
|
Bu proje AGPL-3.0 Lisansı altında lisanslanmıştır. Detaylar için [LICENSE](./LICENSE) dosyasına bakın.
|
||||||
|
|
||||||
|
## İletişim
|
||||||
|
|
||||||
|
- **GitHub Issues**: [Yeni bir sorun oluşturun](https://github.com/ConardLi/easy-dataset/issues)
|
||||||
|
- **Email**: lhj19950927@gmail.com
|
||||||
|
- **WeChat Grubu**: README'deki QR koduna bakın
|
||||||
|
|
||||||
|
## Alıntı
|
||||||
|
|
||||||
|
Bu aracı araştırmanızda kullanırsanız, lütfen şu şekilde alıntı yapın:
|
||||||
|
|
||||||
|
```bibtex
|
||||||
|
@misc{easy-dataset-2025,
|
||||||
|
title={Easy Dataset: A Tool for Creating Fine-tuning Datasets for Large Language Models},
|
||||||
|
author={Conard Li},
|
||||||
|
year={2025},
|
||||||
|
publisher={GitHub},
|
||||||
|
howpublished={\url{https://github.com/ConardLi/easy-dataset}}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Teşekkürler
|
||||||
|
|
||||||
|
Bu proje aşağıdaki harika açık kaynak projelerini kullanır:
|
||||||
|
|
||||||
|
- [Next.js](https://nextjs.org/)
|
||||||
|
- [React](https://reactjs.org/)
|
||||||
|
- [Material-UI](https://mui.com/)
|
||||||
|
- [Prisma](https://www.prisma.io/)
|
||||||
|
- [Electron](https://www.electronjs.org/)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
⭐️ Bu projeyi beğendiyseniz, lütfen bir yıldız verin! ⭐️
|
||||||
|
</div>
|
||||||
300
easy-dataset-main/README.zh-CN.md
Normal file
300
easy-dataset-main/README.zh-CN.md
Normal file
@@ -0,0 +1,300 @@
|
|||||||
|
<div align="center">
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ConardLi/easy-dataset">
|
||||||
|
<img alt="GitHub Downloads (all assets, all releases)" src="https://img.shields.io/github/downloads/ConardLi/easy-dataset/total">
|
||||||
|
<img alt="GitHub Release" src="https://img.shields.io/github/v/release/ConardLi/easy-dataset">
|
||||||
|
<img src="https://img.shields.io/badge/license-AGPL--3.0-green.svg" alt="AGPL 3.0 License"/>
|
||||||
|
<img alt="GitHub contributors" src="https://img.shields.io/github/contributors/ConardLi/easy-dataset">
|
||||||
|
<img alt="GitHub last commit" src="https://img.shields.io/github/last-commit/ConardLi/easy-dataset">
|
||||||
|
<a href="https://arxiv.org/abs/2507.04009v1" target="_blank">
|
||||||
|
<img src="https://img.shields.io/badge/arXiv-2507.04009-b31b1b.svg" alt="arXiv:2507.04009">
|
||||||
|
</a>
|
||||||
|
|
||||||
|
<a href="https://trendshift.io/repositories/13944" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13944" alt="ConardLi%2Feasy-dataset | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
|
||||||
|
|
||||||
|
**一个强大的大型语言模型微调数据集创建工具**
|
||||||
|
|
||||||
|
[简体中文](./README.zh-CN.md) | [English](./README.md)
|
||||||
|
|
||||||
|
[功能特点](#功能特点) • [快速开始](#本地运行) • [使用文档](https://docs.easy-dataset.com/) • [贡献](#贡献) • [许可证](#许可证)
|
||||||
|
|
||||||
|
如果喜欢本项目,请给本项目留下 Star⭐️,或者请作者喝杯咖啡呀 => [打赏作者](./public/imgs/aw.jpg) ❤️!
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
## 概述
|
||||||
|
|
||||||
|
Easy Dataset 是一个专为创建大型语言模型数据集而设计的应用程序。它提供了直观的界面,内置了强大的文档解析工具、智能分割算法、数据清洗和数据增强能力,可以将各种格式的领域文献转化为高质量结构化数据集,可用于模型微调、RAG、模型效果评估等场景。
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
## 新闻
|
||||||
|
|
||||||
|
🎉🎉 Easy Dataset 1.7.0 版本上线全新的评估能力,你可以轻松将领域文献转换为评估数据集(测试集),并且可以自动执行多维度评估任务,另外还配备人工盲测系统,可以轻松助你完成垂直领域模型评估、模型微调后效果评估、RAG 召回率评估等需求,使用教程: [https://www.bilibili.com/video/BV1CRrVB7Eb4/](https://www.bilibili.com/video/BV1CRrVB7Eb4/)
|
||||||
|
|
||||||
|
## 功能特点
|
||||||
|
|
||||||
|
### 📄 文档处理与数据生成
|
||||||
|
|
||||||
|
- **智能文档处理**:支持 PDF、Markdown、DOCX、TXT、EPUB 等多种格式智能识别和处理
|
||||||
|
- **智能文本分割**:支持多种智能文本分割算法(Markdown 结构、递归分隔符、固定长度、代码智能分块等),支持自定义可视化分段
|
||||||
|
- **智能问题生成**:从每个文本片段中自动提取相关问题,支持问题模板和批量生成
|
||||||
|
- **领域标签树**:基于文档目录智能构建全局领域标签树,具备全局理解和自动打标能力
|
||||||
|
- **答案生成**:使用 LLM API 为每个问题生成全面的答案和思维链(COT),支持 AI 智能优化
|
||||||
|
- **数据清洗**:智能清洗文本块内容,去除噪音数据,提升数据质量
|
||||||
|
|
||||||
|
### 🔄 多种数据集类型
|
||||||
|
|
||||||
|
- **单轮问答数据集**:标准的问答对格式,适合基础微调
|
||||||
|
- **多轮对话数据集**:支持自定义角色和场景的多轮对话格式
|
||||||
|
- **图片问答数据集**:基于图片生成视觉问答数据,支持多种导入方式(目录、PDF、压缩包)
|
||||||
|
- **数据蒸馏**:无需上传文档,直接从领域主题自动生成标签树和问题
|
||||||
|
|
||||||
|
### 📊 模型评估体系
|
||||||
|
|
||||||
|
- **评估数据集**:支持生成判断题、单选题、多选题、简答题、开放题等多种题型的评估测试集
|
||||||
|
- **模型自动评估**:使用教师模型(Judge Model)自动评估模型回答质量,支持自定义评分规则
|
||||||
|
- **人工盲测 (Arena)**:双盲对比两个模型的回答质量,消除偏见进行公正评判
|
||||||
|
- **AI 质量评估**:对生成的数据集进行自动质量评分和筛选
|
||||||
|
|
||||||
|
### 🛠️ 高级功能
|
||||||
|
|
||||||
|
- **自定义提示词**:项目级自定义各类提示词模板(问题生成、答案生成、数据清洗等)
|
||||||
|
- **GA 组合生成**:文体-受众对生成,丰富数据多样性
|
||||||
|
- **任务管理中心**:后台批量任务处理,支持任务监控和中断
|
||||||
|
- **资源监控看板**:Token 消耗统计、调用次数追踪、模型性能分析
|
||||||
|
- **模型测试 Playground**:支持最多 3 个模型同时对比测试
|
||||||
|
|
||||||
|
### 📤 导出与集成
|
||||||
|
|
||||||
|
- **多种导出格式**:支持 Alpaca、ShareGPT、Multilingual-Thinking 等格式,JSON/JSONL 文件类型
|
||||||
|
- **平衡导出**:按标签配置导出数量,实现数据集均衡
|
||||||
|
- **LLaMA Factory 集成**:一键生成 LLaMA Factory 配置文件
|
||||||
|
- **Hugging Face 上传**:直接将数据集上传至 Hugging Face Hub
|
||||||
|
|
||||||
|
### 🤖 模型支持
|
||||||
|
|
||||||
|
- **广泛的模型兼容**:兼容所有遵循 OpenAI 格式的 LLM API
|
||||||
|
- **多提供商支持**:OpenAI、Ollama(本地模型)、智谱 AI、阿里百炼、OpenRouter 等
|
||||||
|
- **视觉模型**:支持 Gemini、Claude 等视觉模型用于 PDF 解析和图片问答
|
||||||
|
|
||||||
|
### 🌐 用户体验
|
||||||
|
|
||||||
|
- **用户友好界面**:为技术和非技术用户设计的现代化直观 UI
|
||||||
|
- **多语言支持**:完整的中英文界面支持
|
||||||
|
- **数据集广场**:发现和探索各种公开数据集资源
|
||||||
|
- **桌面客户端**:提供 Windows、macOS、Linux 桌面应用
|
||||||
|
|
||||||
|
## 快速演示
|
||||||
|
|
||||||
|
https://github.com/user-attachments/assets/6ddb1225-3d1b-4695-90cd-aa4cb01376a8
|
||||||
|
|
||||||
|
## 本地运行
|
||||||
|
|
||||||
|
### 下载客户端
|
||||||
|
|
||||||
|
<table style="width: 100%">
|
||||||
|
<tr>
|
||||||
|
<td width="20%" align="center">
|
||||||
|
<b>Windows</b>
|
||||||
|
</td>
|
||||||
|
<td width="30%" align="center" colspan="2">
|
||||||
|
<b>MacOS</b>
|
||||||
|
</td>
|
||||||
|
<td width="20%" align="center">
|
||||||
|
<b>Linux</b>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr style="text-align: center">
|
||||||
|
<td align="center" valign="middle">
|
||||||
|
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
|
||||||
|
<img src='./public/imgs/windows.png' style="height:24px; width: 24px" />
|
||||||
|
<br />
|
||||||
|
<b>Setup.exe</b>
|
||||||
|
</a>
|
||||||
|
</td>
|
||||||
|
<td align="center" valign="middle">
|
||||||
|
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
|
||||||
|
<img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
|
||||||
|
<br />
|
||||||
|
<b>Intel</b>
|
||||||
|
</a>
|
||||||
|
</td>
|
||||||
|
<td align="center" valign="middle">
|
||||||
|
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
|
||||||
|
<img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
|
||||||
|
<br />
|
||||||
|
<b>M</b>
|
||||||
|
</a>
|
||||||
|
</td>
|
||||||
|
<td align="center" valign="middle">
|
||||||
|
<a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
|
||||||
|
<img src='./public/imgs/linux.png' style="height:24px; width: 24px" />
|
||||||
|
<br />
|
||||||
|
<b>AppImage</b>
|
||||||
|
</a>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
### 使用 NPM 安装
|
||||||
|
|
||||||
|
1. 克隆仓库:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/ConardLi/easy-dataset.git
|
||||||
|
cd easy-dataset
|
||||||
|
```
|
||||||
|
|
||||||
|
2. 安装依赖:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm install
|
||||||
|
```
|
||||||
|
|
||||||
|
3. 启动开发服务器:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm run build
|
||||||
|
|
||||||
|
npm run start
|
||||||
|
```
|
||||||
|
|
||||||
|
4. 打开浏览器并访问 `http://localhost:1717`
|
||||||
|
|
||||||
|
### 使用官方 Docker 镜像
|
||||||
|
|
||||||
|
1. 克隆仓库:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/ConardLi/easy-dataset.git
|
||||||
|
cd easy-dataset
|
||||||
|
```
|
||||||
|
|
||||||
|
2. 更改 `docker-compose.yml` 文件:
|
||||||
|
|
||||||
|
```yml
|
||||||
|
services:
|
||||||
|
easy-dataset:
|
||||||
|
image: ghcr.io/conardli/easy-dataset
|
||||||
|
container_name: easy-dataset
|
||||||
|
ports:
|
||||||
|
- '1717:1717'
|
||||||
|
volumes:
|
||||||
|
- ./local-db:/app/local-db
|
||||||
|
- ./prisma:/app/prisma
|
||||||
|
restart: unless-stopped
|
||||||
|
```
|
||||||
|
|
||||||
|
> **注意:** 建议直接使用当前代码仓库目录下的 `local-db` 和 `prisma` 文件夹作为挂载路径,这样可以和 NPM 启动时的数据库路径保持一致。
|
||||||
|
|
||||||
|
> **注意:** 数据库文件会在首次启动时自动初始化,无需手动执行 `npm run db:push`。
|
||||||
|
|
||||||
|
3. 使用 docker-compose 启动
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker-compose up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
4. 打开浏览器并访问 `http://localhost:1717`
|
||||||
|
|
||||||
|
### 使用本地 Dockerfile 构建
|
||||||
|
|
||||||
|
如果你想自行构建镜像,可以使用项目根目录中的 Dockerfile:
|
||||||
|
|
||||||
|
1. 克隆仓库:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/ConardLi/easy-dataset.git
|
||||||
|
cd easy-dataset
|
||||||
|
```
|
||||||
|
|
||||||
|
2. 构建 Docker 镜像:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker build -t easy-dataset .
|
||||||
|
```
|
||||||
|
|
||||||
|
3. 运行容器:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker run -d \
|
||||||
|
-p 1717:1717 \
|
||||||
|
-v ./local-db:/app/local-db \
|
||||||
|
-v ./prisma:/app/prisma \
|
||||||
|
--name easy-dataset \
|
||||||
|
easy-dataset
|
||||||
|
```
|
||||||
|
|
||||||
|
> **注意:** 建议直接使用当前代码仓库目录下的 `local-db` 和 `prisma` 文件夹作为挂载路径,这样可以和 NPM 启动时的数据库路径保持一致。
|
||||||
|
|
||||||
|
> **注意:** 数据库文件会在首次启动时自动初始化,无需手动执行 `npm run db:push`。
|
||||||
|
|
||||||
|
4. 打开浏览器,访问 `http://localhost:1717`
|
||||||
|
|
||||||
|
## 文档
|
||||||
|
|
||||||
|
- 有关所有功能和 API 的详细文档,请访问我们的 [文档站点](https://docs.easy-dataset.com/)
|
||||||
|
- 查看本项目的演示视频:[Easy Dataset 演示视频](https://www.bilibili.com/video/BV1y8QpYGE57/)
|
||||||
|
- 查看本项目的论文:[Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents](https://arxiv.org/abs/2507.04009v1)
|
||||||
|
|
||||||
|
## 社区教程
|
||||||
|
|
||||||
|
- [使用 Easy Dataset 完成测试集生成和模型评估](https://www.bilibili.com/video/BV1CRrVB7Eb4/)
|
||||||
|
- [Easy Dataset × LLaMA Factory: 让大模型高效学习领域知识](https://buaa-act.feishu.cn/wiki/KY9xwTGs1iqHrRkjXBwcZP9WnL9)
|
||||||
|
- [Easy Dataset 使用实战: 如何构建高质量数据集?](https://www.bilibili.com/video/BV1MRMnz1EGW)
|
||||||
|
- [Easy Dataset 1.4 重点功能更新解读](https://www.bilibili.com/video/BV1fyJhzHEb7/)
|
||||||
|
- [Easy Dataset 1.6 重点功能更新解读](https://www.bilibili.com/video/BV1Rq1hBtEJa/)
|
||||||
|
- [大模型微调数据集: 基础知识科普](https://docs.easy-dataset.com/zhi-shi-ke-pu)
|
||||||
|
- [实战案例1:生成汽车图片识别数据集](https://docs.easy-dataset.com/bo-ke/shi-zhan-an-li/an-li-1-sheng-cheng-qi-che-tu-pian-shi-bie-shu-ju-ji)
|
||||||
|
- [实战案例2:评论情感分类数据集](https://docs.easy-dataset.com/bo-ke/shi-zhan-an-li/an-li-2-ping-lun-qing-gan-fen-lei-shu-ju-ji)
|
||||||
|
- [实战案例3:物理学多轮对话数据集](https://docs.easy-dataset.com/bo-ke/shi-zhan-an-li/an-li-3-wu-li-xue-duo-lun-dui-hua-shu-ju-ji)
|
||||||
|
- [实战案例4:AI 智能体安全数据集](https://docs.easy-dataset.com/bo-ke/shi-zhan-an-li/an-li-4ai-zhi-neng-ti-an-quan-shu-ju-ji)
|
||||||
|
- [实战案例5:从图文 PPT 中提取数据集](https://docs.easy-dataset.com/bo-ke/shi-zhan-an-li/an-li-5-cong-tu-wen-ppt-zhong-ti-qu-shu-ju-ji)
|
||||||
|
|
||||||
|
## 贡献
|
||||||
|
|
||||||
|
我们欢迎社区的贡献!如果您想为 Easy Dataset 做出贡献,请按照以下步骤操作:
|
||||||
|
|
||||||
|
1. Fork 仓库
|
||||||
|
2. 创建新分支(`git checkout -b feature/amazing-feature`)
|
||||||
|
3. 进行更改
|
||||||
|
4. 提交更改(`git commit -m '添加一些惊人的功能'`)
|
||||||
|
5. 推送到分支(`git push origin feature/amazing-feature`)
|
||||||
|
6. 打开 Pull Request(提交至 DEV 分支)
|
||||||
|
|
||||||
|
请确保适当更新测试并遵守现有的编码风格。
|
||||||
|
|
||||||
|
## 加交流群 & 联系作者
|
||||||
|
|
||||||
|
https://docs.easy-dataset.com/geng-duo/lian-xi-wo-men
|
||||||
|
|
||||||
|
## 许可证
|
||||||
|
|
||||||
|
本项目采用 AGPL 3.0 许可证 - 有关详细信息,请参阅 [LICENSE](LICENSE) 文件。
|
||||||
|
|
||||||
|
## 引用
|
||||||
|
|
||||||
|
如果您觉得此项目有帮助,请考虑以下列格式引用
|
||||||
|
|
||||||
|
```bibtex
|
||||||
|
@misc{miao2025easydataset,
|
||||||
|
title={Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents},
|
||||||
|
author={Ziyang Miao and Qiyu Sun and Jingyuan Wang and Yuchen Gong and Yaowei Zheng and Shiqi Li and Richong Zhang},
|
||||||
|
year={2025},
|
||||||
|
eprint={2507.04009},
|
||||||
|
archivePrefix={arXiv},
|
||||||
|
primaryClass={cs.CL},
|
||||||
|
url={https://arxiv.org/abs/2507.04009}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Star History
|
||||||
|
|
||||||
|
[](https://www.star-history.com/#ConardLi/easy-dataset&Date)
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
<sub>由 <a href="https://github.com/ConardLi">ConardLi</a> 用 ❤️ 构建 • 关注我:<a href="./public/imgs/weichat.jpg">公众号</a>|<a href="https://space.bilibili.com/474921808">B站</a>|<a href="https://juejin.cn/user/3949101466785709">掘金</a>|<a href="https://www.zhihu.com/people/wen-ti-chao-ji-duo-de-xiao-qi">知乎</a>|<a href="https://www.youtube.com/@garden-conard">Youtube</a></sub>
|
||||||
|
</div>
|
||||||
86
easy-dataset-main/app/api/check-update/route.js
Normal file
86
easy-dataset-main/app/api/check-update/route.js
Normal file
@@ -0,0 +1,86 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import path from 'path';
|
||||||
|
import fs from 'fs';
|
||||||
|
|
||||||
|
// Get current version
|
||||||
|
function getCurrentVersion() {
|
||||||
|
try {
|
||||||
|
const packageJsonPath = path.join(process.cwd(), 'package.json');
|
||||||
|
const packageJson = JSON.parse(fs.readFileSync(packageJsonPath, 'utf8'));
|
||||||
|
return packageJson.version;
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to read version from package.json:', String(error));
|
||||||
|
return '1.0.0';
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Get latest version from GitHub
|
||||||
|
async function getLatestVersion() {
|
||||||
|
try {
|
||||||
|
const owner = 'ConardLi';
|
||||||
|
const repo = 'easy-dataset';
|
||||||
|
const response = await fetch(`https://api.github.com/repos/${owner}/${repo}/releases/latest`);
|
||||||
|
|
||||||
|
if (!response.ok) {
|
||||||
|
throw new Error(`GitHub API request failed: ${response.status}`);
|
||||||
|
}
|
||||||
|
|
||||||
|
const data = await response.json();
|
||||||
|
return data.tag_name.replace('v', '');
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to fetch latest version:', String(error));
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check for updates
|
||||||
|
export async function GET() {
|
||||||
|
try {
|
||||||
|
const currentVersion = getCurrentVersion();
|
||||||
|
const latestVersion = await getLatestVersion();
|
||||||
|
|
||||||
|
if (!latestVersion) {
|
||||||
|
return NextResponse.json({
|
||||||
|
hasUpdate: false,
|
||||||
|
currentVersion,
|
||||||
|
latestVersion: null,
|
||||||
|
error: 'Failed to fetch latest version'
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
// Simple semver-like comparison
|
||||||
|
const hasUpdate = compareVersions(latestVersion, currentVersion) > 0;
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
hasUpdate,
|
||||||
|
currentVersion,
|
||||||
|
latestVersion,
|
||||||
|
releaseUrl: hasUpdate ? `https://github.com/ConardLi/easy-dataset/releases/tag/v${latestVersion}` : null
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to check for updates:', String(error));
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
hasUpdate: false,
|
||||||
|
error: 'Failed to check for updates'
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Simple version comparison
|
||||||
|
function compareVersions(a, b) {
|
||||||
|
const partsA = a.split('.').map(Number);
|
||||||
|
const partsB = b.split('.').map(Number);
|
||||||
|
|
||||||
|
for (let i = 0; i < Math.max(partsA.length, partsB.length); i++) {
|
||||||
|
const numA = i < partsA.length ? partsA[i] : 0;
|
||||||
|
const numB = i < partsB.length ? partsB[i] : 0;
|
||||||
|
|
||||||
|
if (numA > numB) return 1;
|
||||||
|
if (numA < numB) return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
75
easy-dataset-main/app/api/llm/fetch-models/route.js
Normal file
75
easy-dataset-main/app/api/llm/fetch-models/route.js
Normal file
@@ -0,0 +1,75 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import axios from 'axios';
|
||||||
|
|
||||||
|
// Fetch model list from provider
|
||||||
|
export async function POST(request) {
|
||||||
|
try {
|
||||||
|
const { endpoint, providerId, apiKey } = await request.json();
|
||||||
|
|
||||||
|
if (!endpoint) {
|
||||||
|
return NextResponse.json({ error: 'Missing required parameter: endpoint' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
let url = endpoint.replace(/\/$/, ''); // Remove trailing slash
|
||||||
|
|
||||||
|
// Handle Ollama endpoint
|
||||||
|
if (providerId === 'ollama') {
|
||||||
|
// Remove possible /v1 or other version suffix
|
||||||
|
url = url.replace(/\/v\d+$/, '');
|
||||||
|
|
||||||
|
// Append /api if missing
|
||||||
|
if (!url.includes('/api')) {
|
||||||
|
url += '/api';
|
||||||
|
}
|
||||||
|
url += '/tags';
|
||||||
|
} else {
|
||||||
|
url += '/models';
|
||||||
|
}
|
||||||
|
|
||||||
|
const headers = {};
|
||||||
|
if (apiKey) {
|
||||||
|
headers.Authorization = `Bearer ${apiKey}`;
|
||||||
|
}
|
||||||
|
|
||||||
|
const response = await axios.get(url, { headers });
|
||||||
|
|
||||||
|
// Format response per provider
|
||||||
|
let formattedModels = [];
|
||||||
|
if (providerId === 'ollama') {
|
||||||
|
// Ollama /api/tags format: { models: [{ name: 'model-name', ... }] }
|
||||||
|
if (response.data.models && Array.isArray(response.data.models)) {
|
||||||
|
formattedModels = response.data.models.map(item => ({
|
||||||
|
modelId: item.name,
|
||||||
|
modelName: item.name,
|
||||||
|
providerId
|
||||||
|
}));
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
// Default handling (OpenAI-compatible)
|
||||||
|
if (response.data.data && Array.isArray(response.data.data)) {
|
||||||
|
formattedModels = response.data.data.map(item => ({
|
||||||
|
modelId: item.id,
|
||||||
|
modelName: item.id,
|
||||||
|
providerId
|
||||||
|
}));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return NextResponse.json(formattedModels);
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to fetch model list:', String(error));
|
||||||
|
|
||||||
|
// Handle known error shapes
|
||||||
|
if (error.response) {
|
||||||
|
if (error.response.status === 401) {
|
||||||
|
return NextResponse.json({ error: 'Invalid API key' }, { status: 401 });
|
||||||
|
}
|
||||||
|
return NextResponse.json(
|
||||||
|
{ error: `Failed to fetch model list: ${error.response.statusText}` },
|
||||||
|
{ status: error.response.status }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
return NextResponse.json({ error: `Failed to fetch model list: ${error.message}` }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
39
easy-dataset-main/app/api/llm/model/route.js
Normal file
39
easy-dataset-main/app/api/llm/model/route.js
Normal file
@@ -0,0 +1,39 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { getLlmModelsByProviderId } from '@/lib/db/llm-models';
|
||||||
|
|
||||||
|
// Get LLM models
|
||||||
|
export async function GET(request) {
|
||||||
|
try {
|
||||||
|
const searchParams = request.nextUrl.searchParams;
|
||||||
|
let providerId = searchParams.get('providerId');
|
||||||
|
if (!providerId) {
|
||||||
|
return NextResponse.json({ error: 'Invalid parameters' }, { status: 400 });
|
||||||
|
}
|
||||||
|
const models = await getLlmModelsByProviderId(providerId);
|
||||||
|
if (!models) {
|
||||||
|
return NextResponse.json({ error: 'LLM provider not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
return NextResponse.json(models);
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Database query error:', String(error));
|
||||||
|
return NextResponse.json({ error: 'Database query failed' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Sync latest model list
|
||||||
|
export async function POST(request) {
|
||||||
|
try {
|
||||||
|
const { newModels, providerId } = await request.json();
|
||||||
|
const models = await getLlmModelsByProviderId(providerId);
|
||||||
|
const existingModelIds = models.map(model => model.modelId);
|
||||||
|
const diffModels = newModels.filter(item => !existingModelIds.includes(item.modelId));
|
||||||
|
if (diffModels.length > 0) {
|
||||||
|
// return NextResponse.json(await createLlmModels(diffModels));
|
||||||
|
return NextResponse.json({ message: 'No new models to insert' }, { status: 200 });
|
||||||
|
} else {
|
||||||
|
return NextResponse.json({ message: 'No new models to insert' }, { status: 200 });
|
||||||
|
}
|
||||||
|
} catch (error) {
|
||||||
|
return NextResponse.json({ error: 'Database insert failed' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
26
easy-dataset-main/app/api/llm/ollama/models/route.js
Normal file
26
easy-dataset-main/app/api/llm/ollama/models/route.js
Normal file
@@ -0,0 +1,26 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
|
||||||
|
const OllamaClient = require('@/lib/llm/core/providers/ollama');
|
||||||
|
|
||||||
|
// Force dynamic route to prevent static generation
|
||||||
|
export const dynamic = 'force-dynamic';
|
||||||
|
|
||||||
|
export async function GET(request) {
|
||||||
|
try {
|
||||||
|
// Read host and port from query params
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
const host = searchParams.get('host') || '127.0.0.1';
|
||||||
|
const port = searchParams.get('port') || '11434';
|
||||||
|
|
||||||
|
// Create Ollama API client
|
||||||
|
const ollama = new OllamaClient({
|
||||||
|
endpoint: `http://${host}:${port}/api`
|
||||||
|
});
|
||||||
|
// Fetch model list
|
||||||
|
const models = await ollama.getModels();
|
||||||
|
return NextResponse.json(models);
|
||||||
|
} catch (error) {
|
||||||
|
// console.error('fetch Ollama models error:', error);
|
||||||
|
return NextResponse.json({ error: 'fetch Models failed' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
14
easy-dataset-main/app/api/llm/providers/route.js
Normal file
14
easy-dataset-main/app/api/llm/providers/route.js
Normal file
@@ -0,0 +1,14 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { getLlmProviders } from '@/lib/db/llm-providers';
|
||||||
|
import { sortProvidersByPriority } from '@/lib/util/providerLogo';
|
||||||
|
|
||||||
|
// Get LLM provider data
|
||||||
|
export async function GET() {
|
||||||
|
try {
|
||||||
|
const result = await getLlmProviders();
|
||||||
|
return NextResponse.json(sortProvidersByPriority(result, item => item.id));
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Database query error:', String(error));
|
||||||
|
return NextResponse.json({ error: 'Database query failed' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
107
easy-dataset-main/app/api/monitoring/logs/route.js
Normal file
107
easy-dataset-main/app/api/monitoring/logs/route.js
Normal file
@@ -0,0 +1,107 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { db } from '@/lib/db';
|
||||||
|
|
||||||
|
export const dynamic = 'force-dynamic';
|
||||||
|
|
||||||
|
export async function GET(request) {
|
||||||
|
try {
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
const timeRange = searchParams.get('timeRange') || '7d';
|
||||||
|
const projectId = searchParams.get('projectId');
|
||||||
|
const provider = searchParams.get('provider');
|
||||||
|
const status = searchParams.get('status');
|
||||||
|
const page = parseInt(searchParams.get('page') || '1', 10);
|
||||||
|
const pageSize = parseInt(searchParams.get('pageSize') || '10', 10);
|
||||||
|
const searchTerm = searchParams.get('search') || '';
|
||||||
|
|
||||||
|
let startDate = new Date();
|
||||||
|
|
||||||
|
if (timeRange === '24h') {
|
||||||
|
startDate.setHours(startDate.getHours() - 24);
|
||||||
|
} else if (timeRange === '30d') {
|
||||||
|
startDate.setDate(startDate.getDate() - 30);
|
||||||
|
} else {
|
||||||
|
startDate.setDate(startDate.getDate() - 7);
|
||||||
|
}
|
||||||
|
|
||||||
|
const where = {
|
||||||
|
createAt: {
|
||||||
|
gte: startDate
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
if (projectId && projectId !== 'all') {
|
||||||
|
where.projectId = projectId;
|
||||||
|
}
|
||||||
|
if (provider && provider !== 'all') {
|
||||||
|
where.provider = provider;
|
||||||
|
}
|
||||||
|
if (status && status !== 'all') {
|
||||||
|
where.status = status;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (searchTerm) {
|
||||||
|
where.OR = [{ model: { contains: searchTerm } }, { errorMessage: { contains: searchTerm } }];
|
||||||
|
}
|
||||||
|
|
||||||
|
const total = await db.llmUsageLogs.count({ where });
|
||||||
|
const logs = await db.llmUsageLogs.findMany({
|
||||||
|
where,
|
||||||
|
select: {
|
||||||
|
id: true,
|
||||||
|
projectId: true,
|
||||||
|
provider: true,
|
||||||
|
model: true,
|
||||||
|
inputTokens: true,
|
||||||
|
outputTokens: true,
|
||||||
|
totalTokens: true,
|
||||||
|
latency: true,
|
||||||
|
status: true,
|
||||||
|
errorMessage: true,
|
||||||
|
createAt: true
|
||||||
|
},
|
||||||
|
orderBy: {
|
||||||
|
createAt: 'desc'
|
||||||
|
},
|
||||||
|
skip: (page - 1) * pageSize,
|
||||||
|
take: pageSize
|
||||||
|
});
|
||||||
|
|
||||||
|
const projectIds = [...new Set(logs.map(log => log.projectId))];
|
||||||
|
const projects = await db.projects.findMany({
|
||||||
|
where: { id: { in: projectIds } },
|
||||||
|
select: { id: true, name: true }
|
||||||
|
});
|
||||||
|
const projectMap = projects.reduce((acc, p) => {
|
||||||
|
acc[p.id] = p.name;
|
||||||
|
return acc;
|
||||||
|
}, {});
|
||||||
|
|
||||||
|
const details = logs.map(log => ({
|
||||||
|
id: log.id,
|
||||||
|
projectId: log.projectId,
|
||||||
|
projectName: projectMap[log.projectId] || 'Unknown Project',
|
||||||
|
provider: log.provider,
|
||||||
|
model: log.model,
|
||||||
|
status: log.status,
|
||||||
|
failureReason: log.errorMessage,
|
||||||
|
inputTokens: log.inputTokens,
|
||||||
|
outputTokens: log.outputTokens,
|
||||||
|
totalTokens: log.totalTokens,
|
||||||
|
calls: 1, // Single record
|
||||||
|
avgLatency: log.status === 'SUCCESS' ? (log.latency / 1000).toFixed(2) + 's' : '-',
|
||||||
|
createAt: log.createAt
|
||||||
|
}));
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
details,
|
||||||
|
total,
|
||||||
|
page,
|
||||||
|
pageSize,
|
||||||
|
totalPages: Math.ceil(total / pageSize)
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to fetch monitoring logs:', error);
|
||||||
|
return NextResponse.json({ error: error.message }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
188
easy-dataset-main/app/api/monitoring/stats/route.js
Normal file
188
easy-dataset-main/app/api/monitoring/stats/route.js
Normal file
@@ -0,0 +1,188 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { db } from '@/lib/db';
|
||||||
|
|
||||||
|
export const dynamic = 'force-dynamic';
|
||||||
|
|
||||||
|
export async function GET(request) {
|
||||||
|
try {
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
const timeRange = searchParams.get('timeRange') || '7d'; // 24h, 7d, 30d
|
||||||
|
const projectId = searchParams.get('projectId');
|
||||||
|
const provider = searchParams.get('provider');
|
||||||
|
const status = searchParams.get('status');
|
||||||
|
|
||||||
|
let startDate = new Date();
|
||||||
|
|
||||||
|
if (timeRange === '24h') {
|
||||||
|
startDate.setHours(startDate.getHours() - 24);
|
||||||
|
} else if (timeRange === '30d') {
|
||||||
|
startDate.setDate(startDate.getDate() - 30);
|
||||||
|
} else {
|
||||||
|
startDate.setDate(startDate.getDate() - 7);
|
||||||
|
}
|
||||||
|
|
||||||
|
const where = {
|
||||||
|
createAt: {
|
||||||
|
gte: startDate
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
if (projectId && projectId !== 'all') {
|
||||||
|
where.projectId = projectId;
|
||||||
|
}
|
||||||
|
if (provider && provider !== 'all') {
|
||||||
|
where.provider = provider;
|
||||||
|
}
|
||||||
|
if (status && status !== 'all') {
|
||||||
|
where.status = status;
|
||||||
|
}
|
||||||
|
|
||||||
|
// 1. Fetch data for aggregation
|
||||||
|
// Note: Prisma aggregation can be slow on very large datasets. If needed, optimize with pre-aggregated tables.
|
||||||
|
const logs = await db.llmUsageLogs.findMany({
|
||||||
|
where,
|
||||||
|
select: {
|
||||||
|
id: true,
|
||||||
|
projectId: true,
|
||||||
|
provider: true,
|
||||||
|
model: true,
|
||||||
|
inputTokens: true,
|
||||||
|
outputTokens: true,
|
||||||
|
totalTokens: true,
|
||||||
|
latency: true,
|
||||||
|
status: true,
|
||||||
|
errorMessage: true,
|
||||||
|
createAt: true,
|
||||||
|
dateString: true
|
||||||
|
},
|
||||||
|
orderBy: {
|
||||||
|
createAt: 'desc'
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// Build project name map
|
||||||
|
const projects = await db.projects.findMany({
|
||||||
|
select: { id: true, name: true }
|
||||||
|
});
|
||||||
|
const projectMap = projects.reduce((acc, p) => {
|
||||||
|
acc[p.id] = p.name;
|
||||||
|
return acc;
|
||||||
|
}, {});
|
||||||
|
|
||||||
|
// 2. Process and aggregate
|
||||||
|
const summary = {
|
||||||
|
totalTokens: 0,
|
||||||
|
inputTokens: 0,
|
||||||
|
outputTokens: 0,
|
||||||
|
totalCalls: logs.length,
|
||||||
|
successCalls: 0,
|
||||||
|
failedCalls: 0,
|
||||||
|
totalLatency: 0,
|
||||||
|
avgLatency: 0
|
||||||
|
};
|
||||||
|
|
||||||
|
const trendMap = {};
|
||||||
|
const modelStats = {};
|
||||||
|
const detailedStatsMap = {}; // Key: projectId-model-status-errorMessage
|
||||||
|
|
||||||
|
logs.forEach(log => {
|
||||||
|
// Summary
|
||||||
|
summary.totalTokens += log.totalTokens;
|
||||||
|
summary.inputTokens += log.inputTokens;
|
||||||
|
summary.outputTokens += log.outputTokens;
|
||||||
|
|
||||||
|
if (log.status === 'SUCCESS') {
|
||||||
|
summary.successCalls++;
|
||||||
|
summary.totalLatency += log.latency;
|
||||||
|
} else {
|
||||||
|
summary.failedCalls++;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Trend (by day or hour)
|
||||||
|
let timeKey;
|
||||||
|
if (timeRange === '24h') {
|
||||||
|
const date = new Date(log.createAt);
|
||||||
|
timeKey = `${String(date.getHours()).padStart(2, '0')}:00`;
|
||||||
|
} else {
|
||||||
|
timeKey = log.dateString.slice(5); // MM-DD
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!trendMap[timeKey]) {
|
||||||
|
trendMap[timeKey] = { name: timeKey, input: 0, output: 0 };
|
||||||
|
}
|
||||||
|
trendMap[timeKey].input += log.inputTokens;
|
||||||
|
trendMap[timeKey].output += log.outputTokens;
|
||||||
|
|
||||||
|
// Model Distribution
|
||||||
|
const modelKey = log.model;
|
||||||
|
if (!modelStats[modelKey]) {
|
||||||
|
modelStats[modelKey] = { name: modelKey, value: 0 };
|
||||||
|
}
|
||||||
|
modelStats[modelKey].value += log.totalTokens;
|
||||||
|
|
||||||
|
// Detailed Table Aggregation
|
||||||
|
// Key: projectId + model + status + (errorMessage || '')
|
||||||
|
const errorKey = log.errorMessage || '';
|
||||||
|
const detailKey = `${log.projectId}|${log.model}|${log.status}|${errorKey}`;
|
||||||
|
|
||||||
|
if (!detailedStatsMap[detailKey]) {
|
||||||
|
detailedStatsMap[detailKey] = {
|
||||||
|
projectId: log.projectId,
|
||||||
|
projectName: projectMap[log.projectId] || 'Unknown Project',
|
||||||
|
provider: log.provider,
|
||||||
|
model: log.model,
|
||||||
|
status: log.status,
|
||||||
|
failureReason: log.errorMessage,
|
||||||
|
inputTokens: 0,
|
||||||
|
outputTokens: 0,
|
||||||
|
totalTokens: 0,
|
||||||
|
calls: 0,
|
||||||
|
totalLatency: 0
|
||||||
|
};
|
||||||
|
}
|
||||||
|
const detailItem = detailedStatsMap[detailKey];
|
||||||
|
detailItem.inputTokens += log.inputTokens;
|
||||||
|
detailItem.outputTokens += log.outputTokens;
|
||||||
|
detailItem.totalTokens += log.totalTokens;
|
||||||
|
detailItem.calls += 1;
|
||||||
|
if (log.status === 'SUCCESS') {
|
||||||
|
detailItem.totalLatency += log.latency;
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// Calculate averages
|
||||||
|
if (summary.successCalls > 0) {
|
||||||
|
summary.avgLatency = Math.round(summary.totalLatency / summary.successCalls);
|
||||||
|
}
|
||||||
|
summary.avgTokensPerCall = summary.totalCalls > 0 ? Math.round(summary.totalTokens / summary.totalCalls) : 0;
|
||||||
|
summary.failureRate = summary.totalCalls > 0 ? summary.failedCalls / summary.totalCalls : 0;
|
||||||
|
|
||||||
|
// Format chart data
|
||||||
|
const trend = Object.values(trendMap).sort((a, b) => {
|
||||||
|
// Simple sorting; for production use, consider stricter time ordering.
|
||||||
|
return a.name.localeCompare(b.name);
|
||||||
|
});
|
||||||
|
|
||||||
|
const modelDistribution = Object.values(modelStats).sort((a, b) => b.value - a.value);
|
||||||
|
|
||||||
|
// Format detailed table data
|
||||||
|
const details = Object.values(detailedStatsMap)
|
||||||
|
.map(item => ({
|
||||||
|
...item,
|
||||||
|
avgLatency:
|
||||||
|
item.status === 'SUCCESS' && item.calls > 0 ? (item.totalLatency / item.calls / 1000).toFixed(2) + 's' : '-'
|
||||||
|
}))
|
||||||
|
.sort((a, b) => b.totalTokens - a.totalTokens); // Default sorting by token usage
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
summary,
|
||||||
|
trend,
|
||||||
|
modelDistribution,
|
||||||
|
details,
|
||||||
|
projects
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to fetch monitoring stats:', error);
|
||||||
|
return NextResponse.json({ error: error.message }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
132
easy-dataset-main/app/api/monitoring/summary/route.js
Normal file
132
easy-dataset-main/app/api/monitoring/summary/route.js
Normal file
@@ -0,0 +1,132 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { db } from '@/lib/db';
|
||||||
|
|
||||||
|
export const dynamic = 'force-dynamic';
|
||||||
|
|
||||||
|
export async function GET(request) {
|
||||||
|
try {
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
const timeRange = searchParams.get('timeRange') || '7d';
|
||||||
|
const projectId = searchParams.get('projectId');
|
||||||
|
const provider = searchParams.get('provider');
|
||||||
|
const status = searchParams.get('status');
|
||||||
|
|
||||||
|
let startDate = new Date();
|
||||||
|
|
||||||
|
if (timeRange === '24h') {
|
||||||
|
startDate.setHours(startDate.getHours() - 24);
|
||||||
|
} else if (timeRange === '30d') {
|
||||||
|
startDate.setDate(startDate.getDate() - 30);
|
||||||
|
} else {
|
||||||
|
startDate.setDate(startDate.getDate() - 7);
|
||||||
|
}
|
||||||
|
|
||||||
|
const where = {
|
||||||
|
createAt: {
|
||||||
|
gte: startDate
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
if (projectId && projectId !== 'all') {
|
||||||
|
where.projectId = projectId;
|
||||||
|
}
|
||||||
|
if (provider && provider !== 'all') {
|
||||||
|
where.provider = provider;
|
||||||
|
}
|
||||||
|
if (status && status !== 'all') {
|
||||||
|
where.status = status;
|
||||||
|
}
|
||||||
|
|
||||||
|
const logs = await db.llmUsageLogs.findMany({
|
||||||
|
where,
|
||||||
|
select: {
|
||||||
|
inputTokens: true,
|
||||||
|
outputTokens: true,
|
||||||
|
totalTokens: true,
|
||||||
|
latency: true,
|
||||||
|
status: true,
|
||||||
|
createAt: true,
|
||||||
|
dateString: true,
|
||||||
|
model: true
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
const summary = {
|
||||||
|
totalTokens: 0,
|
||||||
|
inputTokens: 0,
|
||||||
|
outputTokens: 0,
|
||||||
|
totalCalls: logs.length,
|
||||||
|
successCalls: 0,
|
||||||
|
failedCalls: 0,
|
||||||
|
totalLatency: 0,
|
||||||
|
avgLatency: 0
|
||||||
|
};
|
||||||
|
|
||||||
|
const trendMap = {};
|
||||||
|
const modelStats = {};
|
||||||
|
|
||||||
|
logs.forEach(log => {
|
||||||
|
summary.totalTokens += log.totalTokens;
|
||||||
|
summary.inputTokens += log.inputTokens;
|
||||||
|
summary.outputTokens += log.outputTokens;
|
||||||
|
|
||||||
|
if (log.status === 'SUCCESS') {
|
||||||
|
summary.successCalls++;
|
||||||
|
summary.totalLatency += log.latency;
|
||||||
|
} else {
|
||||||
|
summary.failedCalls++;
|
||||||
|
}
|
||||||
|
|
||||||
|
let timeKey;
|
||||||
|
if (timeRange === '24h') {
|
||||||
|
const date = new Date(log.createAt);
|
||||||
|
timeKey = `${String(date.getHours()).padStart(2, '0')}:00`;
|
||||||
|
} else {
|
||||||
|
timeKey = log.dateString.slice(5);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!trendMap[timeKey]) {
|
||||||
|
trendMap[timeKey] = { name: timeKey, input: 0, output: 0 };
|
||||||
|
}
|
||||||
|
trendMap[timeKey].input += log.inputTokens;
|
||||||
|
trendMap[timeKey].output += log.outputTokens;
|
||||||
|
|
||||||
|
const modelKey = log.model;
|
||||||
|
if (!modelStats[modelKey]) {
|
||||||
|
modelStats[modelKey] = { name: modelKey, value: 0 };
|
||||||
|
}
|
||||||
|
modelStats[modelKey].value += log.totalTokens;
|
||||||
|
});
|
||||||
|
|
||||||
|
if (summary.successCalls > 0) {
|
||||||
|
summary.avgLatency = Math.round(summary.totalLatency / summary.successCalls);
|
||||||
|
}
|
||||||
|
summary.avgTokensPerCall = summary.totalCalls > 0 ? Math.round(summary.totalTokens / summary.totalCalls) : 0;
|
||||||
|
summary.failureRate = summary.totalCalls > 0 ? summary.failedCalls / summary.totalCalls : 0;
|
||||||
|
|
||||||
|
const trend = Object.values(trendMap).sort((a, b) => a.name.localeCompare(b.name));
|
||||||
|
const modelDistribution = Object.values(modelStats).sort((a, b) => b.value - a.value);
|
||||||
|
|
||||||
|
const projects = await db.projects.findMany({
|
||||||
|
select: { id: true, name: true },
|
||||||
|
orderBy: { createAt: 'desc' }
|
||||||
|
});
|
||||||
|
|
||||||
|
const allLogs = await db.llmUsageLogs.findMany({
|
||||||
|
select: { provider: true },
|
||||||
|
distinct: ['provider']
|
||||||
|
});
|
||||||
|
const providers = allLogs.map(log => log.provider).filter(Boolean);
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
summary,
|
||||||
|
trend,
|
||||||
|
modelDistribution,
|
||||||
|
projects,
|
||||||
|
providers
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to fetch monitoring summary:', error);
|
||||||
|
return NextResponse.json({ error: error.message }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,176 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { getUploadFileInfoById } from '@/lib/db/upload-files';
|
||||||
|
import { createGaPairs, getGaPairsByFileId } from '@/lib/db/ga-pairs';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 批量手动添加 GA 对到多个文件
|
||||||
|
*/
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const body = await request.json();
|
||||||
|
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
const { fileIds, gaPair, appendMode = false } = body;
|
||||||
|
|
||||||
|
if (!fileIds || !Array.isArray(fileIds) || fileIds.length === 0) {
|
||||||
|
return NextResponse.json({ error: 'File IDs array is required' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!gaPair || !gaPair.genreTitle || !gaPair.audienceTitle) {
|
||||||
|
return NextResponse.json({ error: 'GA pair with genreTitle and audienceTitle is required' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
console.log('开始处理批量手动添加GA对请求');
|
||||||
|
console.log('项目ID:', projectId);
|
||||||
|
console.log('请求的文件IDs:', fileIds);
|
||||||
|
console.log('GA对:', gaPair);
|
||||||
|
|
||||||
|
// 使用 getUploadFileInfoById 逐个验证文件
|
||||||
|
const validFiles = [];
|
||||||
|
const invalidFileIds = [];
|
||||||
|
|
||||||
|
for (const fileId of fileIds) {
|
||||||
|
try {
|
||||||
|
console.log(`正在验证文件: ${fileId}`);
|
||||||
|
const fileInfo = await getUploadFileInfoById(fileId);
|
||||||
|
|
||||||
|
if (fileInfo && fileInfo.projectId === projectId) {
|
||||||
|
console.log(`文件验证成功: ${fileInfo.fileName}`);
|
||||||
|
validFiles.push(fileInfo);
|
||||||
|
} else if (fileInfo) {
|
||||||
|
console.log(`文件属于其他项目: ${fileInfo.projectId} != ${projectId}`);
|
||||||
|
invalidFileIds.push(fileId);
|
||||||
|
} else {
|
||||||
|
console.log(`文件不存在: ${fileId}`);
|
||||||
|
invalidFileIds.push(fileId);
|
||||||
|
}
|
||||||
|
} catch (error) {
|
||||||
|
console.error(`验证文件 ${fileId} 时出错:`, String(error));
|
||||||
|
invalidFileIds.push(fileId);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
console.log(`文件验证完成: 有效${validFiles.length}个, 无效${invalidFileIds.length}个`);
|
||||||
|
|
||||||
|
if (validFiles.length === 0) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: 'No valid files found',
|
||||||
|
debug: {
|
||||||
|
projectId,
|
||||||
|
requestedIds: fileIds,
|
||||||
|
invalidIds: invalidFileIds,
|
||||||
|
message: 'None of the requested files belong to this project or exist in the database'
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{ status: 404 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// 批量手动添加 GA 对
|
||||||
|
console.log('开始批量手动添加GA对...');
|
||||||
|
console.log('追加模式:', appendMode);
|
||||||
|
const results = [];
|
||||||
|
|
||||||
|
for (const file of validFiles) {
|
||||||
|
try {
|
||||||
|
console.log(`处理文件: ${file.fileName}`);
|
||||||
|
|
||||||
|
// 检查是否已存在 GA 对
|
||||||
|
const existingPairs = await getGaPairsByFileId(file.id);
|
||||||
|
|
||||||
|
let pairNumber = 1;
|
||||||
|
if (appendMode && existingPairs && existingPairs.length > 0) {
|
||||||
|
// 追加模式:在现有 GA 对后面添加
|
||||||
|
pairNumber = existingPairs.length + 1;
|
||||||
|
} else if (!appendMode && existingPairs && existingPairs.length > 0) {
|
||||||
|
// 非追加模式:如果已存在 GA 对则跳过
|
||||||
|
console.log(`文件 ${file.fileName} 已存在GA对,跳过`);
|
||||||
|
results.push({
|
||||||
|
fileId: file.id,
|
||||||
|
fileName: file.fileName,
|
||||||
|
success: true,
|
||||||
|
skipped: true,
|
||||||
|
message: 'GA pairs already exist'
|
||||||
|
});
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
// 创建 GA 对数据
|
||||||
|
const gaPairData = [
|
||||||
|
{
|
||||||
|
projectId,
|
||||||
|
fileId: file.id,
|
||||||
|
pairNumber,
|
||||||
|
genreTitle: gaPair.genreTitle.trim(),
|
||||||
|
genreDesc: gaPair.genreDesc?.trim() || '',
|
||||||
|
audienceTitle: gaPair.audienceTitle.trim(),
|
||||||
|
audienceDesc: gaPair.audienceDesc?.trim() || '',
|
||||||
|
isActive: true
|
||||||
|
}
|
||||||
|
];
|
||||||
|
|
||||||
|
// 保存 GA 对
|
||||||
|
if (appendMode) {
|
||||||
|
// 追加模式:只创建新的 GA 对
|
||||||
|
await createGaPairs(gaPairData);
|
||||||
|
} else {
|
||||||
|
// 非追加模式:使用 saveGaPairs 替换现有的
|
||||||
|
const { saveGaPairs } = await import('@/lib/db/ga-pairs');
|
||||||
|
await saveGaPairs(projectId, file.id, [
|
||||||
|
{
|
||||||
|
genre: { title: gaPair.genreTitle.trim(), description: gaPair.genreDesc?.trim() || '' },
|
||||||
|
audience: { title: gaPair.audienceTitle.trim(), description: gaPair.audienceDesc?.trim() || '' }
|
||||||
|
}
|
||||||
|
]);
|
||||||
|
}
|
||||||
|
|
||||||
|
results.push({
|
||||||
|
fileId: file.id,
|
||||||
|
fileName: file.fileName,
|
||||||
|
success: true,
|
||||||
|
skipped: false,
|
||||||
|
message: 'GA pair added successfully'
|
||||||
|
});
|
||||||
|
|
||||||
|
console.log(`成功为文件 ${file.fileName} 添加GA对`);
|
||||||
|
} catch (error) {
|
||||||
|
console.error(`为文件 ${file.fileName} 添加GA对失败:`, error);
|
||||||
|
results.push({
|
||||||
|
fileId: file.id,
|
||||||
|
fileName: file.fileName,
|
||||||
|
success: false,
|
||||||
|
skipped: false,
|
||||||
|
error: error.message,
|
||||||
|
message: `Failed: ${error.message}`
|
||||||
|
});
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// 统计结果
|
||||||
|
const successCount = results.filter(r => r.success).length;
|
||||||
|
const failureCount = results.filter(r => !r.success).length;
|
||||||
|
|
||||||
|
console.log(`批量手动添加完成: 成功${successCount}个, 失败${failureCount}个`);
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
data: results,
|
||||||
|
summary: {
|
||||||
|
total: results.length,
|
||||||
|
success: successCount,
|
||||||
|
failure: failureCount,
|
||||||
|
processed: validFiles.length,
|
||||||
|
skipped: invalidFileIds.length
|
||||||
|
},
|
||||||
|
message: `Added GA pairs to ${successCount} files, ${failureCount} failed, ${invalidFileIds.length} files not found`
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Error batch adding manual GA pairs:', String(error));
|
||||||
|
return NextResponse.json({ error: String(error) || 'Failed to batch add manual GA pairs' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,196 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { getUploadFileInfoById, delUploadFileInfoById } from '@/lib/db/upload-files';
|
||||||
|
import { getProject } from '@/lib/db/projects';
|
||||||
|
import { getProjectChunks, getProjectTocByName } from '@/lib/file/text-splitter';
|
||||||
|
import { batchSaveTags } from '@/lib/db/tags';
|
||||||
|
import { handleDomainTree } from '@/lib/util/domain-tree';
|
||||||
|
import path from 'path';
|
||||||
|
import { getProjectRoot } from '@/lib/db/base';
|
||||||
|
import { promises as fs } from 'fs';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 批量删除文件
|
||||||
|
* 复用单个文件删除的完整逻辑,包括领域树修订
|
||||||
|
*/
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const body = await request.json();
|
||||||
|
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
const { fileIds, domainTreeAction = 'keep', model, language = '中文' } = body;
|
||||||
|
|
||||||
|
if (!fileIds || !Array.isArray(fileIds) || fileIds.length === 0) {
|
||||||
|
return NextResponse.json({ error: 'File IDs array is required' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
console.log('开始处理批量删除文件请求');
|
||||||
|
console.log('项目ID:', projectId);
|
||||||
|
console.log('请求的文件IDs:', fileIds);
|
||||||
|
console.log('领域树操作:', domainTreeAction);
|
||||||
|
|
||||||
|
// 获取项目信息
|
||||||
|
const project = await getProject(projectId);
|
||||||
|
if (!project) {
|
||||||
|
return NextResponse.json({ error: 'The project does not exist' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 验证文件并删除
|
||||||
|
const results = [];
|
||||||
|
const deletedTocs = [];
|
||||||
|
let deletedCount = 0;
|
||||||
|
let failedCount = 0;
|
||||||
|
let totalStats = {
|
||||||
|
deletedChunks: 0,
|
||||||
|
deletedQuestions: 0,
|
||||||
|
deletedDatasets: 0
|
||||||
|
};
|
||||||
|
|
||||||
|
for (const fileId of fileIds) {
|
||||||
|
try {
|
||||||
|
console.log(`正在验证文件: ${fileId}`);
|
||||||
|
const fileInfo = await getUploadFileInfoById(fileId);
|
||||||
|
|
||||||
|
if (!fileInfo) {
|
||||||
|
console.log(`文件不存在: ${fileId}`);
|
||||||
|
results.push({
|
||||||
|
fileId,
|
||||||
|
success: false,
|
||||||
|
error: 'File not found'
|
||||||
|
});
|
||||||
|
failedCount++;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (fileInfo.projectId !== projectId) {
|
||||||
|
console.log(`文件属于其他项目: ${fileInfo.projectId} != ${projectId}`);
|
||||||
|
results.push({
|
||||||
|
fileId,
|
||||||
|
success: false,
|
||||||
|
error: 'File belongs to another project'
|
||||||
|
});
|
||||||
|
failedCount++;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
// 删除文件及其相关的文本块、问题和数据集
|
||||||
|
console.log(`删除文件: ${fileInfo.fileName}`);
|
||||||
|
const { stats, fileName } = await delUploadFileInfoById(fileId);
|
||||||
|
|
||||||
|
// 累计统计信息
|
||||||
|
totalStats.deletedChunks += stats.deletedChunks || 0;
|
||||||
|
totalStats.deletedQuestions += stats.deletedQuestions || 0;
|
||||||
|
totalStats.deletedDatasets += stats.deletedDatasets || 0;
|
||||||
|
|
||||||
|
// 获取并保存删除的 TOC 信息
|
||||||
|
const deleteToc = await getProjectTocByName(projectId, fileName);
|
||||||
|
if (deleteToc) {
|
||||||
|
deletedTocs.push(deleteToc);
|
||||||
|
}
|
||||||
|
|
||||||
|
// 删除 TOC 文件
|
||||||
|
try {
|
||||||
|
const projectRoot = await getProjectRoot();
|
||||||
|
const projectPath = path.join(projectRoot, projectId);
|
||||||
|
const tocDir = path.join(projectPath, 'toc');
|
||||||
|
const baseName = path.basename(fileInfo.fileName, path.extname(fileInfo.fileName));
|
||||||
|
const tocPath = path.join(tocDir, `${baseName}-toc.json`);
|
||||||
|
await fs.unlink(tocPath);
|
||||||
|
console.log(`成功删除 TOC 文件: ${tocPath}`);
|
||||||
|
} catch (error) {
|
||||||
|
console.error(`删除 TOC 文件失败:`, String(error));
|
||||||
|
}
|
||||||
|
|
||||||
|
results.push({
|
||||||
|
fileId,
|
||||||
|
fileName: fileInfo.fileName,
|
||||||
|
success: true,
|
||||||
|
stats
|
||||||
|
});
|
||||||
|
deletedCount++;
|
||||||
|
|
||||||
|
console.log(`成功删除文件: ${fileInfo.fileName}`);
|
||||||
|
} catch (error) {
|
||||||
|
console.error(`删除文件 ${fileId} 时出错:`, error);
|
||||||
|
results.push({
|
||||||
|
fileId,
|
||||||
|
success: false,
|
||||||
|
error: error.message
|
||||||
|
});
|
||||||
|
failedCount++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
console.log(`批量删除完成: 成功${deletedCount}个, 失败${failedCount}个`);
|
||||||
|
|
||||||
|
// 如果选择了保持领域树不变,直接返回删除结果
|
||||||
|
if (domainTreeAction === 'keep') {
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
deletedCount,
|
||||||
|
failedCount,
|
||||||
|
total: fileIds.length,
|
||||||
|
results,
|
||||||
|
stats: totalStats,
|
||||||
|
domainTreeAction: 'keep',
|
||||||
|
message: `Successfully deleted ${deletedCount} files, ${failedCount} failed`
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
// 处理领域树更新
|
||||||
|
try {
|
||||||
|
// 获取项目的所有文件
|
||||||
|
const { chunks, toc } = await getProjectChunks(projectId);
|
||||||
|
|
||||||
|
// 如果不存在文本块,说明项目已经没有文件了
|
||||||
|
if (!chunks || chunks.length === 0) {
|
||||||
|
// 清空领域树
|
||||||
|
await batchSaveTags(projectId, []);
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
deletedCount,
|
||||||
|
failedCount,
|
||||||
|
total: fileIds.length,
|
||||||
|
results,
|
||||||
|
stats: totalStats,
|
||||||
|
domainTreeAction,
|
||||||
|
message: `Successfully deleted ${deletedCount} files, domain tree cleared`,
|
||||||
|
domainTreeCleared: true
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
// 调用领域树处理模块
|
||||||
|
await handleDomainTree({
|
||||||
|
projectId,
|
||||||
|
action: domainTreeAction,
|
||||||
|
allToc: toc,
|
||||||
|
model: model,
|
||||||
|
language,
|
||||||
|
deleteToc: deletedTocs.length > 0 ? deletedTocs : undefined,
|
||||||
|
project
|
||||||
|
});
|
||||||
|
|
||||||
|
console.log('领域树更新成功');
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Error updating domain tree after batch deletion:', String(error));
|
||||||
|
// 即使领域树更新失败,也不影响文件删除的结果
|
||||||
|
}
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
deletedCount,
|
||||||
|
failedCount,
|
||||||
|
total: fileIds.length,
|
||||||
|
results,
|
||||||
|
stats: totalStats,
|
||||||
|
domainTreeAction,
|
||||||
|
message: `Successfully deleted ${deletedCount} files, ${failedCount} failed`
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Error batch deleting files:', String(error));
|
||||||
|
return NextResponse.json({ error: String(error) || 'Failed to batch delete files' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,106 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { batchGenerateGaPairs } from '@/lib/services/ga/ga-pairs';
|
||||||
|
import { getUploadFileInfoById } from '@/lib/db/upload-files'; // 导入单个文件查询函数
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 批量生成多个文件的 GA 对
|
||||||
|
*/
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const body = await request.json();
|
||||||
|
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
const { fileIds, modelConfigId, language = '中文', appendMode = false } = body;
|
||||||
|
|
||||||
|
if (!fileIds || !Array.isArray(fileIds) || fileIds.length === 0) {
|
||||||
|
return NextResponse.json({ error: 'File IDs array is required' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!modelConfigId) {
|
||||||
|
return NextResponse.json({ error: 'Model configuration ID is required' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
console.log('开始处理批量生成GA对请求');
|
||||||
|
console.log('项目ID:', projectId);
|
||||||
|
console.log('请求的文件IDs:', fileIds);
|
||||||
|
|
||||||
|
// 使用 getUploadFileInfoById 逐个验证文件
|
||||||
|
const validFiles = [];
|
||||||
|
const invalidFileIds = [];
|
||||||
|
|
||||||
|
for (const fileId of fileIds) {
|
||||||
|
try {
|
||||||
|
console.log(`正在验证文件: ${fileId}`);
|
||||||
|
const fileInfo = await getUploadFileInfoById(fileId);
|
||||||
|
|
||||||
|
if (fileInfo && fileInfo.projectId === projectId) {
|
||||||
|
console.log(`文件验证成功: ${fileInfo.fileName}`);
|
||||||
|
validFiles.push(fileInfo);
|
||||||
|
} else if (fileInfo) {
|
||||||
|
console.log(`文件属于其他项目: ${fileInfo.projectId} != ${projectId}`);
|
||||||
|
invalidFileIds.push(fileId);
|
||||||
|
} else {
|
||||||
|
console.log(`文件不存在: ${fileId}`);
|
||||||
|
invalidFileIds.push(fileId);
|
||||||
|
}
|
||||||
|
} catch (error) {
|
||||||
|
console.error(`验证文件 ${fileId} 时出错:`, String(error));
|
||||||
|
invalidFileIds.push(fileId);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
console.log(`文件验证完成: 有效${validFiles.length}个, 无效${invalidFileIds.length}个`);
|
||||||
|
|
||||||
|
if (validFiles.length === 0) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: 'No valid files found',
|
||||||
|
debug: {
|
||||||
|
projectId,
|
||||||
|
requestedIds: fileIds,
|
||||||
|
invalidIds: invalidFileIds,
|
||||||
|
message: 'None of the requested files belong to this project or exist in the database'
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{ status: 404 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// 批量生成 GA 对
|
||||||
|
console.log('开始批量生成GA对...');
|
||||||
|
console.log('追加模式:', appendMode);
|
||||||
|
const results = await batchGenerateGaPairs(
|
||||||
|
projectId,
|
||||||
|
validFiles,
|
||||||
|
modelConfigId,
|
||||||
|
language,
|
||||||
|
appendMode // 传递追加模式参数
|
||||||
|
);
|
||||||
|
|
||||||
|
// 统计结果
|
||||||
|
const successCount = results.filter(r => r.success).length;
|
||||||
|
const failureCount = results.filter(r => !r.success).length;
|
||||||
|
|
||||||
|
console.log(`批量生成完成: 成功${successCount}个, 失败${failureCount}个`);
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
data: results,
|
||||||
|
summary: {
|
||||||
|
total: results.length,
|
||||||
|
success: successCount,
|
||||||
|
failure: failureCount,
|
||||||
|
processed: validFiles.length,
|
||||||
|
skipped: invalidFileIds.length
|
||||||
|
},
|
||||||
|
message: `Generated GA pairs for ${successCount} files, ${failureCount} failed, ${invalidFileIds.length} files not found`
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Error batch generating GA pairs:', String(error));
|
||||||
|
return NextResponse.json({ error: String(error) || 'Failed to batch generate GA pairs' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,161 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { db } from '@/lib/db/index';
|
||||||
|
import LLMClient from '@/lib/llm/core/index';
|
||||||
|
import { getModelConfigById } from '@/lib/db/model-config';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Get current question and generate answers from two models
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, taskId } = params;
|
||||||
|
|
||||||
|
const task = await db.task.findFirst({
|
||||||
|
where: {
|
||||||
|
id: taskId,
|
||||||
|
projectId,
|
||||||
|
taskType: 'blind-test'
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!task) {
|
||||||
|
return NextResponse.json({ code: 404, error: 'Task not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (task.status !== 0) {
|
||||||
|
return NextResponse.json({ code: 400, error: 'Task has ended' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Parse task detail
|
||||||
|
let detail = {};
|
||||||
|
let modelInfo = {};
|
||||||
|
try {
|
||||||
|
detail = task.detail ? JSON.parse(task.detail) : {};
|
||||||
|
modelInfo = task.modelInfo ? JSON.parse(task.modelInfo) : {};
|
||||||
|
} catch (e) {
|
||||||
|
console.error('Failed to parse task detail:', e);
|
||||||
|
}
|
||||||
|
|
||||||
|
const questionIds = detail.questionIds || detail.evalDatasetIds || [];
|
||||||
|
const currentIndex = detail.currentIndex || 0;
|
||||||
|
|
||||||
|
// Check if all questions are completed
|
||||||
|
if (questionIds.length === 0 || currentIndex >= questionIds.length) {
|
||||||
|
return NextResponse.json({
|
||||||
|
code: 0,
|
||||||
|
data: {
|
||||||
|
completed: true,
|
||||||
|
message: 'All questions completed'
|
||||||
|
}
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
// Fetch current question
|
||||||
|
const currentQuestionId = questionIds[currentIndex];
|
||||||
|
const currentQuestion = await db.evalDatasets.findUnique({
|
||||||
|
where: { id: currentQuestionId },
|
||||||
|
select: {
|
||||||
|
id: true,
|
||||||
|
question: true,
|
||||||
|
questionType: true,
|
||||||
|
correctAnswer: true,
|
||||||
|
tags: true
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!currentQuestion) {
|
||||||
|
return NextResponse.json({ code: 404, error: 'Question not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Fetch both model configs
|
||||||
|
const [modelConfigA, modelConfigB] = await Promise.all([
|
||||||
|
getModelConfigById(modelInfo.modelA.providerId),
|
||||||
|
getModelConfigById(modelInfo.modelB.providerId)
|
||||||
|
]);
|
||||||
|
|
||||||
|
if (!modelConfigA || !modelConfigB) {
|
||||||
|
return NextResponse.json({ code: 400, error: 'Model configuration not found' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Build prompts
|
||||||
|
const systemPrompt = "You are a helpful assistant. Provide detailed and accurate answers to the user's question.";
|
||||||
|
const userPrompt = currentQuestion.question;
|
||||||
|
|
||||||
|
// Call both models in parallel
|
||||||
|
const startTimeA = Date.now();
|
||||||
|
const startTimeB = Date.now();
|
||||||
|
|
||||||
|
let answerA = '';
|
||||||
|
let answerB = '';
|
||||||
|
let errorA = null;
|
||||||
|
let errorB = null;
|
||||||
|
let durationA = 0;
|
||||||
|
let durationB = 0;
|
||||||
|
|
||||||
|
try {
|
||||||
|
// Call model A
|
||||||
|
const clientA = new LLMClient(modelConfigA);
|
||||||
|
|
||||||
|
const resultA = await clientA.chat([
|
||||||
|
{ role: 'system', content: systemPrompt },
|
||||||
|
{ role: 'user', content: userPrompt }
|
||||||
|
]);
|
||||||
|
|
||||||
|
answerA = resultA.text || '';
|
||||||
|
durationA = Date.now() - startTimeA;
|
||||||
|
} catch (err) {
|
||||||
|
console.error('Model A call failed:', err);
|
||||||
|
errorA = err.message;
|
||||||
|
durationA = Date.now() - startTimeA;
|
||||||
|
}
|
||||||
|
|
||||||
|
try {
|
||||||
|
// Call model B
|
||||||
|
const clientB = new LLMClient(modelConfigB);
|
||||||
|
|
||||||
|
const resultB = await clientB.chat([
|
||||||
|
{ role: 'system', content: systemPrompt },
|
||||||
|
{ role: 'user', content: userPrompt }
|
||||||
|
]);
|
||||||
|
|
||||||
|
answerB = resultB.text || '';
|
||||||
|
durationB = Date.now() - startTimeB;
|
||||||
|
} catch (err) {
|
||||||
|
console.error('Model B call failed:', err);
|
||||||
|
errorB = err.message;
|
||||||
|
durationB = Date.now() - startTimeB;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Randomly swap positions (core blind-test behavior)
|
||||||
|
const isSwapped = Math.random() > 0.5;
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
code: 0,
|
||||||
|
data: {
|
||||||
|
completed: false,
|
||||||
|
currentIndex,
|
||||||
|
totalCount: evalDatasetIds.length,
|
||||||
|
question: currentQuestion,
|
||||||
|
// Blind test: do not reveal which model is which
|
||||||
|
leftAnswer: {
|
||||||
|
content: isSwapped ? answerB : answerA,
|
||||||
|
error: isSwapped ? errorB : errorA,
|
||||||
|
duration: isSwapped ? durationB : durationA
|
||||||
|
},
|
||||||
|
rightAnswer: {
|
||||||
|
content: isSwapped ? answerA : answerB,
|
||||||
|
error: isSwapped ? errorA : errorB,
|
||||||
|
duration: isSwapped ? durationA : durationB
|
||||||
|
},
|
||||||
|
// Server stores the actual mapping for scoring
|
||||||
|
_swap: isSwapped
|
||||||
|
}
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to fetch current question:', error);
|
||||||
|
return NextResponse.json(
|
||||||
|
{ code: 500, error: 'Failed to fetch current question', message: error.message },
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,64 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { db } from '@/lib/db/index';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Get current question info (including random swap info)
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
const { projectId, taskId } = params;
|
||||||
|
|
||||||
|
try {
|
||||||
|
if (!projectId || !taskId) {
|
||||||
|
return NextResponse.json({ error: 'Missing required parameters' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Fetch task
|
||||||
|
const task = await db.task.findUnique({
|
||||||
|
where: { id: taskId }
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!task || task.taskType !== 'blind-test') {
|
||||||
|
return NextResponse.json({ error: 'Task not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Parse task detail
|
||||||
|
const detail = JSON.parse(task.detail || '{}');
|
||||||
|
// Support both evalDatasetIds and questionIds
|
||||||
|
const questionIds = detail.questionIds || detail.evalDatasetIds || [];
|
||||||
|
const currentIndex = detail.currentIndex || 0;
|
||||||
|
|
||||||
|
// Check if task is completed
|
||||||
|
if (questionIds.length === 0 || currentIndex >= questionIds.length) {
|
||||||
|
return NextResponse.json({
|
||||||
|
completed: true,
|
||||||
|
currentIndex,
|
||||||
|
totalQuestions: questionIds.length
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
// Fetch current question
|
||||||
|
const currentQuestionId = questionIds[currentIndex];
|
||||||
|
const currentQuestion = await db.evalDatasets.findUnique({
|
||||||
|
where: { id: currentQuestionId }
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!currentQuestion) {
|
||||||
|
return NextResponse.json({ error: 'Question not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Randomly decide whether to swap (core blind-test behavior)
|
||||||
|
const isSwapped = Math.random() > 0.5;
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
questionId: currentQuestion.id,
|
||||||
|
question: currentQuestion.question,
|
||||||
|
answer: currentQuestion.correctAnswer || '',
|
||||||
|
questionIndex: currentIndex + 1,
|
||||||
|
totalQuestions: questionIds.length,
|
||||||
|
isSwapped
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to fetch question info:', error);
|
||||||
|
return NextResponse.json({ error: error.message }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,190 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { db } from '@/lib/db/index';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Get blind-test task details
|
||||||
|
* Results are fetched from EvalResults table
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, taskId } = params;
|
||||||
|
|
||||||
|
const task = await db.task.findFirst({
|
||||||
|
where: {
|
||||||
|
id: taskId,
|
||||||
|
projectId,
|
||||||
|
taskType: 'blind-test'
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!task) {
|
||||||
|
return NextResponse.json({ code: 404, error: 'Task not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
let detail = {};
|
||||||
|
let modelInfo = {};
|
||||||
|
try {
|
||||||
|
detail = task.detail ? JSON.parse(task.detail) : {};
|
||||||
|
modelInfo = task.modelInfo ? JSON.parse(task.modelInfo) : {};
|
||||||
|
} catch (e) {
|
||||||
|
console.error('Failed to parse task detail:', e);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Fetch all related evaluation questions
|
||||||
|
const evalDatasetIds = detail.evalDatasetIds || [];
|
||||||
|
const evalDatasets = await db.evalDatasets.findMany({
|
||||||
|
where: {
|
||||||
|
id: { in: evalDatasetIds }
|
||||||
|
},
|
||||||
|
select: {
|
||||||
|
id: true,
|
||||||
|
question: true,
|
||||||
|
questionType: true,
|
||||||
|
correctAnswer: true,
|
||||||
|
tags: true
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// Sort by evalDatasetIds order
|
||||||
|
const orderedDatasets = evalDatasetIds.map(id => evalDatasets.find(d => d.id === id)).filter(Boolean);
|
||||||
|
|
||||||
|
// Fetch results from EvalResults table
|
||||||
|
const evalResults = await db.evalResults.findMany({
|
||||||
|
where: { taskId },
|
||||||
|
orderBy: { createAt: 'asc' }
|
||||||
|
});
|
||||||
|
|
||||||
|
// Parse results into the format expected by frontend
|
||||||
|
const results = evalResults.map(r => {
|
||||||
|
let modelAnswer = {};
|
||||||
|
let judgeData = {};
|
||||||
|
try {
|
||||||
|
modelAnswer = JSON.parse(r.modelAnswer || '{}');
|
||||||
|
judgeData = JSON.parse(r.judgeResponse || '{}');
|
||||||
|
} catch (e) {
|
||||||
|
// Ignore parse errors
|
||||||
|
}
|
||||||
|
return {
|
||||||
|
questionId: r.evalDatasetId,
|
||||||
|
vote: judgeData.vote,
|
||||||
|
isSwapped: judgeData.isSwapped,
|
||||||
|
modelAScore: judgeData.modelAScore || 0,
|
||||||
|
modelBScore: judgeData.modelBScore || 0,
|
||||||
|
leftAnswer: modelAnswer.leftAnswer || '',
|
||||||
|
rightAnswer: modelAnswer.rightAnswer || '',
|
||||||
|
timestamp: r.createAt
|
||||||
|
};
|
||||||
|
});
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
code: 0,
|
||||||
|
data: {
|
||||||
|
...task,
|
||||||
|
detail: {
|
||||||
|
...detail,
|
||||||
|
results // Include results from EvalResults table
|
||||||
|
},
|
||||||
|
modelInfo,
|
||||||
|
evalDatasets: orderedDatasets
|
||||||
|
}
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to fetch blind-test task details:', error);
|
||||||
|
return NextResponse.json(
|
||||||
|
{ code: 500, error: 'Failed to fetch blind-test task details', message: error.message },
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Update blind-test task (interrupt/stop)
|
||||||
|
*/
|
||||||
|
export async function PUT(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, taskId } = params;
|
||||||
|
const { action } = await request.json();
|
||||||
|
|
||||||
|
const task = await db.task.findFirst({
|
||||||
|
where: {
|
||||||
|
id: taskId,
|
||||||
|
projectId,
|
||||||
|
taskType: 'blind-test'
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!task) {
|
||||||
|
return NextResponse.json({ code: 404, error: 'Task not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (action === 'interrupt') {
|
||||||
|
if (task.status !== 0) {
|
||||||
|
return NextResponse.json({ code: 400, error: 'Only running tasks can be interrupted' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
const updatedTask = await db.task.update({
|
||||||
|
where: { id: taskId },
|
||||||
|
data: {
|
||||||
|
status: 3, // Interrupted
|
||||||
|
endTime: new Date()
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
code: 0,
|
||||||
|
data: updatedTask,
|
||||||
|
message: 'Task interrupted'
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
return NextResponse.json({ code: 400, error: 'Unknown action' }, { status: 400 });
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to update blind-test task:', error);
|
||||||
|
return NextResponse.json(
|
||||||
|
{ code: 500, error: 'Failed to update blind-test task', message: error.message },
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Delete blind-test task and its results
|
||||||
|
*/
|
||||||
|
export async function DELETE(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, taskId } = params;
|
||||||
|
|
||||||
|
const task = await db.task.findFirst({
|
||||||
|
where: {
|
||||||
|
id: taskId,
|
||||||
|
projectId,
|
||||||
|
taskType: 'blind-test'
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!task) {
|
||||||
|
return NextResponse.json({ code: 404, error: 'Task not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Delete related EvalResults first
|
||||||
|
await db.evalResults.deleteMany({
|
||||||
|
where: { taskId }
|
||||||
|
});
|
||||||
|
|
||||||
|
// Then delete the task
|
||||||
|
await db.task.delete({
|
||||||
|
where: { id: taskId }
|
||||||
|
});
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
code: 0,
|
||||||
|
message: 'Task deleted'
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to delete blind-test task:', error);
|
||||||
|
return NextResponse.json(
|
||||||
|
{ code: 500, error: 'Failed to delete blind-test task', message: error.message },
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,92 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { db } from '@/lib/db/index';
|
||||||
|
import LLMClient from '@/lib/llm/core/index';
|
||||||
|
import { getModelConfigById } from '@/lib/db/model-config';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Stream answer for a specified model
|
||||||
|
* Query param: model=A or model=B
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
const { projectId, taskId } = params;
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
const modelType = searchParams.get('model'); // 'A' or 'B'
|
||||||
|
|
||||||
|
try {
|
||||||
|
if (!projectId || !taskId) {
|
||||||
|
return NextResponse.json({ error: 'Missing required parameters' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!modelType || !['A', 'B'].includes(modelType)) {
|
||||||
|
return NextResponse.json({ error: 'Model type must be specified (A or B)' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Fetch task
|
||||||
|
const task = await db.task.findUnique({
|
||||||
|
where: { id: taskId }
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!task || task.taskType !== 'blind-test') {
|
||||||
|
return NextResponse.json({ error: 'Task not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Parse task detail
|
||||||
|
const detail = JSON.parse(task.detail || '{}');
|
||||||
|
const modelInfo = JSON.parse(task.modelInfo || '{}');
|
||||||
|
// Support both evalDatasetIds and questionIds
|
||||||
|
const questionIds = detail.questionIds || detail.evalDatasetIds || [];
|
||||||
|
const currentIndex = detail.currentIndex || 0;
|
||||||
|
|
||||||
|
// Check if task is completed
|
||||||
|
if (questionIds.length === 0 || currentIndex >= questionIds.length) {
|
||||||
|
return NextResponse.json({ completed: true });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Fetch current question
|
||||||
|
const currentQuestionId = questionIds[currentIndex];
|
||||||
|
const currentQuestion = await db.evalDatasets.findUnique({
|
||||||
|
where: { id: currentQuestionId }
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!currentQuestion) {
|
||||||
|
return NextResponse.json({ error: 'Question not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Resolve model config based on modelType
|
||||||
|
const modelConfigKey = modelType === 'A' ? 'modelA' : 'modelB';
|
||||||
|
const modelConfig = await getModelConfigById(modelInfo[modelConfigKey].id);
|
||||||
|
|
||||||
|
if (!modelConfig) {
|
||||||
|
return NextResponse.json({ error: 'Model configuration not found' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Prepare messages
|
||||||
|
const messages = [
|
||||||
|
{
|
||||||
|
role: 'system',
|
||||||
|
content: "You are a helpful assistant. Provide detailed and accurate answers to the user's question."
|
||||||
|
},
|
||||||
|
{ role: 'user', content: currentQuestion.question }
|
||||||
|
];
|
||||||
|
|
||||||
|
// Create LLM client
|
||||||
|
const client = new LLMClient({
|
||||||
|
projectId,
|
||||||
|
...modelConfig
|
||||||
|
});
|
||||||
|
|
||||||
|
// Call streaming API and return response directly
|
||||||
|
const response = await client.chatStreamAPI(messages);
|
||||||
|
|
||||||
|
return new Response(response.body, {
|
||||||
|
headers: {
|
||||||
|
'Content-Type': 'text/plain; charset=utf-8',
|
||||||
|
'Cache-Control': 'no-cache',
|
||||||
|
Connection: 'keep-alive'
|
||||||
|
}
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error(`Model ${modelType} streaming call failed:`, error);
|
||||||
|
return NextResponse.json({ error: error.message }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,213 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { db } from '@/lib/db/index';
|
||||||
|
import LLMClient from '@/lib/llm/core/index';
|
||||||
|
import { getModelConfigById } from '@/lib/db/model-config';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Stream answers from two models for the current question
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
const { projectId, taskId } = params;
|
||||||
|
|
||||||
|
try {
|
||||||
|
if (!projectId || !taskId) {
|
||||||
|
return NextResponse.json({ error: 'Missing required parameters' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Fetch task
|
||||||
|
const task = await db.task.findUnique({
|
||||||
|
where: { id: taskId }
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!task || task.taskType !== 'blind-test') {
|
||||||
|
return NextResponse.json({ error: 'Task not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Parse task detail
|
||||||
|
const detail = JSON.parse(task.detail || '{}');
|
||||||
|
const modelInfo = JSON.parse(task.modelInfo || '{}');
|
||||||
|
const { questionIds = [], currentIndex = 0 } = detail;
|
||||||
|
|
||||||
|
// Check if task is completed
|
||||||
|
if (currentIndex >= questionIds.length) {
|
||||||
|
return NextResponse.json({ completed: true });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Fetch current question
|
||||||
|
const currentQuestionId = questionIds[currentIndex];
|
||||||
|
const currentQuestion = await db.evalDatasets.findUnique({
|
||||||
|
where: { id: currentQuestionId }
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!currentQuestion) {
|
||||||
|
return NextResponse.json({ error: 'Question not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Fetch model configs
|
||||||
|
const [modelConfigA, modelConfigB] = await Promise.all([
|
||||||
|
getModelConfigById(modelInfo.modelA.providerId),
|
||||||
|
getModelConfigById(modelInfo.modelB.providerId)
|
||||||
|
]);
|
||||||
|
|
||||||
|
if (!modelConfigA || !modelConfigB) {
|
||||||
|
return NextResponse.json({ error: 'Model configuration not found' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Randomly swap positions (core blind-test behavior)
|
||||||
|
const isSwapped = Math.random() > 0.5;
|
||||||
|
|
||||||
|
// Create streaming response
|
||||||
|
const encoder = new TextEncoder();
|
||||||
|
const stream = new ReadableStream({
|
||||||
|
async start(controller) {
|
||||||
|
try {
|
||||||
|
// Send init message
|
||||||
|
controller.enqueue(
|
||||||
|
encoder.encode(
|
||||||
|
JSON.stringify({
|
||||||
|
type: 'init',
|
||||||
|
question: currentQuestion.question,
|
||||||
|
questionId: currentQuestion.id,
|
||||||
|
questionIndex: currentIndex + 1,
|
||||||
|
totalQuestions: questionIds.length,
|
||||||
|
isSwapped
|
||||||
|
}) + '\n'
|
||||||
|
)
|
||||||
|
);
|
||||||
|
|
||||||
|
// Prepare messages
|
||||||
|
const messages = [
|
||||||
|
{
|
||||||
|
role: 'system',
|
||||||
|
content: "You are a helpful assistant. Provide detailed and accurate answers to the user's question."
|
||||||
|
},
|
||||||
|
{ role: 'user', content: currentQuestion.question }
|
||||||
|
];
|
||||||
|
|
||||||
|
// Create LLM clients
|
||||||
|
const clientA = new LLMClient({
|
||||||
|
projectId,
|
||||||
|
...modelConfigA
|
||||||
|
});
|
||||||
|
|
||||||
|
const clientB = new LLMClient({
|
||||||
|
projectId,
|
||||||
|
...modelConfigB
|
||||||
|
});
|
||||||
|
|
||||||
|
let answerA = '';
|
||||||
|
let answerB = '';
|
||||||
|
const startTime = Date.now();
|
||||||
|
|
||||||
|
// Call both models in parallel (streaming)
|
||||||
|
await Promise.all([
|
||||||
|
(async () => {
|
||||||
|
try {
|
||||||
|
const response = await clientA.chatStreamAPI(messages);
|
||||||
|
const reader = response.body.getReader();
|
||||||
|
const decoder = new TextDecoder();
|
||||||
|
|
||||||
|
while (true) {
|
||||||
|
const { done, value } = await reader.read();
|
||||||
|
if (done) break;
|
||||||
|
|
||||||
|
const chunk = decoder.decode(value, { stream: true });
|
||||||
|
answerA += chunk;
|
||||||
|
|
||||||
|
// Send chunk update
|
||||||
|
controller.enqueue(
|
||||||
|
encoder.encode(
|
||||||
|
JSON.stringify({
|
||||||
|
type: 'chunk',
|
||||||
|
model: isSwapped ? 'B' : 'A',
|
||||||
|
content: chunk
|
||||||
|
}) + '\n'
|
||||||
|
)
|
||||||
|
);
|
||||||
|
}
|
||||||
|
} catch (err) {
|
||||||
|
console.error('Model A call failed:', err);
|
||||||
|
controller.enqueue(
|
||||||
|
encoder.encode(
|
||||||
|
JSON.stringify({
|
||||||
|
type: 'error',
|
||||||
|
model: isSwapped ? 'B' : 'A',
|
||||||
|
error: err.message
|
||||||
|
}) + '\n'
|
||||||
|
)
|
||||||
|
);
|
||||||
|
}
|
||||||
|
})(),
|
||||||
|
(async () => {
|
||||||
|
try {
|
||||||
|
const response = await clientB.chatStreamAPI(messages);
|
||||||
|
const reader = response.body.getReader();
|
||||||
|
const decoder = new TextDecoder();
|
||||||
|
|
||||||
|
while (true) {
|
||||||
|
const { done, value } = await reader.read();
|
||||||
|
if (done) break;
|
||||||
|
|
||||||
|
const chunk = decoder.decode(value, { stream: true });
|
||||||
|
answerB += chunk;
|
||||||
|
|
||||||
|
// Send chunk update
|
||||||
|
controller.enqueue(
|
||||||
|
encoder.encode(
|
||||||
|
JSON.stringify({
|
||||||
|
type: 'chunk',
|
||||||
|
model: isSwapped ? 'A' : 'B',
|
||||||
|
content: chunk
|
||||||
|
}) + '\n'
|
||||||
|
)
|
||||||
|
);
|
||||||
|
}
|
||||||
|
} catch (err) {
|
||||||
|
console.error('Model B call failed:', err);
|
||||||
|
controller.enqueue(
|
||||||
|
encoder.encode(
|
||||||
|
JSON.stringify({
|
||||||
|
type: 'error',
|
||||||
|
model: isSwapped ? 'A' : 'B',
|
||||||
|
error: err.message
|
||||||
|
}) + '\n'
|
||||||
|
)
|
||||||
|
);
|
||||||
|
}
|
||||||
|
})()
|
||||||
|
]);
|
||||||
|
|
||||||
|
const duration = Date.now() - startTime;
|
||||||
|
|
||||||
|
// Send done message
|
||||||
|
controller.enqueue(
|
||||||
|
encoder.encode(
|
||||||
|
JSON.stringify({
|
||||||
|
type: 'done',
|
||||||
|
duration,
|
||||||
|
answerA: isSwapped ? answerB : answerA,
|
||||||
|
answerB: isSwapped ? answerA : answerB
|
||||||
|
}) + '\n'
|
||||||
|
)
|
||||||
|
);
|
||||||
|
|
||||||
|
controller.close();
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Streaming handler failed:', error);
|
||||||
|
controller.error(error);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
return new Response(stream, {
|
||||||
|
headers: {
|
||||||
|
'Content-Type': 'text/plain; charset=utf-8',
|
||||||
|
'Cache-Control': 'no-cache',
|
||||||
|
Connection: 'keep-alive'
|
||||||
|
}
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('API error:', error);
|
||||||
|
return NextResponse.json({ error: error.message }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,154 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { db } from '@/lib/db/index';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Submit vote result
|
||||||
|
* vote: 'left' | 'right' | 'both_good' | 'both_bad'
|
||||||
|
* Results are stored in EvalResults table
|
||||||
|
*/
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, taskId } = params;
|
||||||
|
const { vote, questionId, isSwapped, leftAnswer, rightAnswer } = await request.json();
|
||||||
|
|
||||||
|
// Validate vote option
|
||||||
|
const validVotes = ['left', 'right', 'both_good', 'both_bad'];
|
||||||
|
if (!validVotes.includes(vote)) {
|
||||||
|
return NextResponse.json({ code: 400, error: 'Invalid vote option' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!questionId) {
|
||||||
|
return NextResponse.json({ code: 400, error: 'Question ID is required' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
const task = await db.task.findFirst({
|
||||||
|
where: {
|
||||||
|
id: taskId,
|
||||||
|
projectId,
|
||||||
|
taskType: 'blind-test'
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!task) {
|
||||||
|
return NextResponse.json({ code: 404, error: 'Task not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (task.status !== 0) {
|
||||||
|
return NextResponse.json({ code: 400, error: 'Task has ended' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// Parse task details
|
||||||
|
let detail = {};
|
||||||
|
try {
|
||||||
|
detail = task.detail ? JSON.parse(task.detail) : {};
|
||||||
|
} catch (e) {
|
||||||
|
console.error('Failed to parse task detail:', e);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Calculate scores
|
||||||
|
// isSwapped: true means left is model B and right is model A
|
||||||
|
// isSwapped: false means left is model A and right is model B
|
||||||
|
let modelAScore = 0;
|
||||||
|
let modelBScore = 0;
|
||||||
|
|
||||||
|
if (vote === 'left') {
|
||||||
|
if (isSwapped) {
|
||||||
|
modelBScore = 1; // Left is B
|
||||||
|
} else {
|
||||||
|
modelAScore = 1; // Left is A
|
||||||
|
}
|
||||||
|
} else if (vote === 'right') {
|
||||||
|
if (isSwapped) {
|
||||||
|
modelAScore = 1; // Right is A
|
||||||
|
} else {
|
||||||
|
modelBScore = 1; // Right is B
|
||||||
|
}
|
||||||
|
} else if (vote === 'both_good') {
|
||||||
|
modelAScore = 0.5;
|
||||||
|
modelBScore = 0.5;
|
||||||
|
}
|
||||||
|
// both_bad: both scores remain 0
|
||||||
|
|
||||||
|
// Store result in EvalResults table
|
||||||
|
const evalResult = await db.evalResults.create({
|
||||||
|
data: {
|
||||||
|
projectId,
|
||||||
|
taskId,
|
||||||
|
evalDatasetId: questionId,
|
||||||
|
modelAnswer: JSON.stringify({
|
||||||
|
leftAnswer: leftAnswer || '',
|
||||||
|
rightAnswer: rightAnswer || ''
|
||||||
|
}),
|
||||||
|
score: modelAScore, // Store modelA score for sorting/aggregation
|
||||||
|
isCorrect: false, // Not applicable for blind-test
|
||||||
|
judgeResponse: JSON.stringify({
|
||||||
|
vote,
|
||||||
|
isSwapped,
|
||||||
|
modelAScore,
|
||||||
|
modelBScore
|
||||||
|
}),
|
||||||
|
duration: 0,
|
||||||
|
status: 0
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// Update task progress
|
||||||
|
const evalDatasetIds = detail.evalDatasetIds || [];
|
||||||
|
const newCurrentIndex = (detail.currentIndex || 0) + 1;
|
||||||
|
const isCompleted = newCurrentIndex >= evalDatasetIds.length;
|
||||||
|
|
||||||
|
const updatedDetail = {
|
||||||
|
...detail,
|
||||||
|
currentIndex: newCurrentIndex
|
||||||
|
};
|
||||||
|
|
||||||
|
await db.task.update({
|
||||||
|
where: { id: taskId },
|
||||||
|
data: {
|
||||||
|
detail: JSON.stringify(updatedDetail),
|
||||||
|
completedCount: newCurrentIndex,
|
||||||
|
status: isCompleted ? 1 : 0, // 1-completed, 0-running
|
||||||
|
endTime: isCompleted ? new Date() : null
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// Calculate current total scores from EvalResults
|
||||||
|
const allResults = await db.evalResults.findMany({
|
||||||
|
where: { taskId },
|
||||||
|
select: { judgeResponse: true }
|
||||||
|
});
|
||||||
|
|
||||||
|
let totalModelAScore = 0;
|
||||||
|
let totalModelBScore = 0;
|
||||||
|
for (const r of allResults) {
|
||||||
|
try {
|
||||||
|
const judge = JSON.parse(r.judgeResponse || '{}');
|
||||||
|
totalModelAScore += judge.modelAScore || 0;
|
||||||
|
totalModelBScore += judge.modelBScore || 0;
|
||||||
|
} catch (e) {
|
||||||
|
// Ignore parse errors
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
code: 0,
|
||||||
|
data: {
|
||||||
|
success: true,
|
||||||
|
isCompleted,
|
||||||
|
currentIndex: newCurrentIndex,
|
||||||
|
totalCount: evalDatasetIds.length,
|
||||||
|
scores: {
|
||||||
|
modelA: totalModelAScore,
|
||||||
|
modelB: totalModelBScore
|
||||||
|
}
|
||||||
|
},
|
||||||
|
message: isCompleted ? 'Blind-test task completed' : 'Vote recorded'
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to submit vote result:', error);
|
||||||
|
return NextResponse.json(
|
||||||
|
{ code: 500, error: 'Failed to submit vote result', message: error.message },
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,226 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { db } from '@/lib/db/index';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Get all blind-test tasks for a project
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
const page = parseInt(searchParams.get('page') || '1');
|
||||||
|
const pageSize = parseInt(searchParams.get('pageSize') || '20');
|
||||||
|
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
const skip = (page - 1) * pageSize;
|
||||||
|
|
||||||
|
// Fetch task list and total count
|
||||||
|
const [tasks, total] = await Promise.all([
|
||||||
|
db.task.findMany({
|
||||||
|
where: {
|
||||||
|
projectId,
|
||||||
|
taskType: 'blind-test'
|
||||||
|
},
|
||||||
|
orderBy: { createAt: 'desc' },
|
||||||
|
skip,
|
||||||
|
take: pageSize
|
||||||
|
}),
|
||||||
|
db.task.count({
|
||||||
|
where: {
|
||||||
|
projectId,
|
||||||
|
taskType: 'blind-test'
|
||||||
|
}
|
||||||
|
})
|
||||||
|
]);
|
||||||
|
|
||||||
|
// Fetch evaluation results for all tasks to calculate scores
|
||||||
|
const taskIds = tasks.map(t => t.id);
|
||||||
|
const allEvalResults = await db.evalResults.findMany({
|
||||||
|
where: { taskId: { in: taskIds } },
|
||||||
|
select: {
|
||||||
|
taskId: true,
|
||||||
|
judgeResponse: true
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// Group results by taskId and calculate scores
|
||||||
|
const taskScores = {};
|
||||||
|
for (const result of allEvalResults) {
|
||||||
|
if (!taskScores[result.taskId]) {
|
||||||
|
taskScores[result.taskId] = { modelAScore: 0, modelBScore: 0 };
|
||||||
|
}
|
||||||
|
try {
|
||||||
|
const judge = JSON.parse(result.judgeResponse || '{}');
|
||||||
|
taskScores[result.taskId].modelAScore += judge.modelAScore || 0;
|
||||||
|
taskScores[result.taskId].modelBScore += judge.modelBScore || 0;
|
||||||
|
} catch (e) {
|
||||||
|
// Ignore parse errors
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Parse task detail fields and attach scores
|
||||||
|
const tasksWithDetails = tasks.map(task => {
|
||||||
|
let detail = {};
|
||||||
|
let modelInfo = {};
|
||||||
|
try {
|
||||||
|
detail = task.detail ? JSON.parse(task.detail) : {};
|
||||||
|
modelInfo = task.modelInfo ? JSON.parse(task.modelInfo) : {};
|
||||||
|
} catch (e) {
|
||||||
|
console.error('Failed to parse task detail:', e);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Attach calculated scores as results array
|
||||||
|
const scores = taskScores[task.id] || { modelAScore: 0, modelBScore: 0 };
|
||||||
|
const results = [
|
||||||
|
{
|
||||||
|
modelAScore: scores.modelAScore,
|
||||||
|
modelBScore: scores.modelBScore
|
||||||
|
}
|
||||||
|
];
|
||||||
|
|
||||||
|
return {
|
||||||
|
...task,
|
||||||
|
detail: {
|
||||||
|
...detail,
|
||||||
|
results // Attach results for display in task card
|
||||||
|
},
|
||||||
|
modelInfo
|
||||||
|
};
|
||||||
|
});
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
code: 0,
|
||||||
|
data: {
|
||||||
|
items: tasksWithDetails,
|
||||||
|
total,
|
||||||
|
page,
|
||||||
|
pageSize,
|
||||||
|
totalPages: Math.ceil(total / pageSize)
|
||||||
|
}
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to fetch blind-test task list:', error);
|
||||||
|
return NextResponse.json(
|
||||||
|
{ code: 500, error: 'Failed to fetch blind-test task list', message: error.message },
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Create a blind-test task
|
||||||
|
*/
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const data = await request.json();
|
||||||
|
|
||||||
|
const { modelA, modelB, evalDatasetIds, language = 'zh-CN' } = data;
|
||||||
|
|
||||||
|
if (!modelA || !modelA.modelId || !modelA.providerId) {
|
||||||
|
return NextResponse.json({ code: 400, error: 'Please select model A' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!modelB || !modelB.modelId || !modelB.providerId) {
|
||||||
|
return NextResponse.json({ code: 400, error: 'Please select model B' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (modelA.modelId === modelB.modelId && modelA.providerId === modelB.providerId) {
|
||||||
|
return NextResponse.json({ code: 400, error: 'The two models must be different' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!evalDatasetIds || evalDatasetIds.length === 0) {
|
||||||
|
return NextResponse.json({ code: 400, error: 'Please select questions to evaluate' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
const evalDatasets = await db.evalDatasets.findMany({
|
||||||
|
where: {
|
||||||
|
id: { in: evalDatasetIds },
|
||||||
|
projectId
|
||||||
|
},
|
||||||
|
select: { id: true, questionType: true }
|
||||||
|
});
|
||||||
|
|
||||||
|
const invalidQuestions = evalDatasets.filter(
|
||||||
|
q => q.questionType !== 'short_answer' && q.questionType !== 'open_ended'
|
||||||
|
);
|
||||||
|
|
||||||
|
if (invalidQuestions.length > 0) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
code: 400,
|
||||||
|
error: 'Blind-test tasks only support short-answer and open-ended questions'
|
||||||
|
},
|
||||||
|
{ status: 400 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Fetch model config info
|
||||||
|
const [modelConfigA, modelConfigB] = await Promise.all([
|
||||||
|
db.modelConfig.findFirst({
|
||||||
|
where: { projectId, providerId: modelA.providerId, modelId: modelA.modelId }
|
||||||
|
}),
|
||||||
|
db.modelConfig.findFirst({
|
||||||
|
where: { projectId, providerId: modelB.providerId, modelId: modelB.modelId }
|
||||||
|
})
|
||||||
|
]);
|
||||||
|
|
||||||
|
// Build model info (two models)
|
||||||
|
const modelInfo = {
|
||||||
|
modelA: {
|
||||||
|
id: modelConfigA?.id,
|
||||||
|
modelId: modelA.modelId,
|
||||||
|
modelName: modelConfigA?.modelName || modelA.modelId,
|
||||||
|
providerId: modelA.providerId,
|
||||||
|
providerName: modelConfigA?.providerName || modelA.providerId
|
||||||
|
},
|
||||||
|
modelB: {
|
||||||
|
id: modelConfigB?.id,
|
||||||
|
modelId: modelB.modelId,
|
||||||
|
modelName: modelConfigB?.modelName || modelB.modelId,
|
||||||
|
providerId: modelB.providerId,
|
||||||
|
providerName: modelConfigB?.providerName || modelB.providerId
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
// Build task detail (only store evalDatasetIds and currentIndex)
|
||||||
|
const taskDetail = {
|
||||||
|
evalDatasetIds,
|
||||||
|
currentIndex: 0 // Current question index
|
||||||
|
};
|
||||||
|
|
||||||
|
// Create task
|
||||||
|
const newTask = await db.task.create({
|
||||||
|
data: {
|
||||||
|
projectId,
|
||||||
|
taskType: 'blind-test',
|
||||||
|
status: 0, // Running
|
||||||
|
modelInfo: JSON.stringify(modelInfo),
|
||||||
|
language,
|
||||||
|
detail: JSON.stringify(taskDetail),
|
||||||
|
totalCount: evalDatasetIds.length,
|
||||||
|
completedCount: 0,
|
||||||
|
note: ''
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
code: 0,
|
||||||
|
data: {
|
||||||
|
...newTask,
|
||||||
|
detail: taskDetail,
|
||||||
|
modelInfo
|
||||||
|
},
|
||||||
|
message: 'Blind-test task created'
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to create blind-test task:', error);
|
||||||
|
return NextResponse.json(
|
||||||
|
{ code: 500, error: 'Failed to create blind-test task', message: error.message },
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,40 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import logger from '@/lib/util/logger';
|
||||||
|
import cleanService from '@/lib/services/clean';
|
||||||
|
|
||||||
|
// 为指定文本块进行数据清洗
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, chunkId } = params;
|
||||||
|
|
||||||
|
// 验证项目ID和文本块ID
|
||||||
|
if (!projectId || !chunkId) {
|
||||||
|
return NextResponse.json({ error: 'Project ID or text block ID cannot be empty' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 获取请求体
|
||||||
|
const { model, language = '中文' } = await request.json();
|
||||||
|
|
||||||
|
if (!model) {
|
||||||
|
return NextResponse.json({ error: 'Model cannot be empty' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 使用数据清洗服务
|
||||||
|
const result = await cleanService.cleanDataForChunk(projectId, chunkId, {
|
||||||
|
model,
|
||||||
|
language
|
||||||
|
});
|
||||||
|
|
||||||
|
// 返回清洗结果
|
||||||
|
return NextResponse.json({
|
||||||
|
chunkId,
|
||||||
|
originalLength: result.originalLength,
|
||||||
|
cleanedLength: result.cleanedLength,
|
||||||
|
success: result.success,
|
||||||
|
message: '数据清洗完成'
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
logger.error('Error cleaning data:', error);
|
||||||
|
return NextResponse.json({ error: error.message || 'Error cleaning data' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,35 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { generateEvalQuestionsForChunk } from '@/lib/services/eval';
|
||||||
|
import logger from '@/lib/util/logger';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 为指定文本块生成测评题目
|
||||||
|
*/
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, chunkId } = params;
|
||||||
|
|
||||||
|
// 验证参数
|
||||||
|
if (!projectId || !chunkId) {
|
||||||
|
return NextResponse.json({ error: 'Project ID and Chunk ID are required' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 获取请求体
|
||||||
|
const { model, language = 'zh-CN' } = await request.json();
|
||||||
|
|
||||||
|
if (!model) {
|
||||||
|
return NextResponse.json({ error: 'Model configuration is required' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 调用服务层生成测评题目
|
||||||
|
const result = await generateEvalQuestionsForChunk(projectId, chunkId, {
|
||||||
|
model,
|
||||||
|
language
|
||||||
|
});
|
||||||
|
|
||||||
|
return NextResponse.json(result);
|
||||||
|
} catch (error) {
|
||||||
|
logger.error('Error generating eval questions:', error);
|
||||||
|
return NextResponse.json({ error: error.message || 'Failed to generate eval questions' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,73 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { getQuestionsForChunk } from '@/lib/db/questions';
|
||||||
|
import logger from '@/lib/util/logger';
|
||||||
|
import questionService from '@/lib/services/questions';
|
||||||
|
|
||||||
|
// 为指定文本块生成问题
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, chunkId } = params;
|
||||||
|
|
||||||
|
// 验证项目ID和文本块ID
|
||||||
|
if (!projectId || !chunkId) {
|
||||||
|
return NextResponse.json({ error: 'Project ID or text block ID cannot be empty' }, { status: 400 });
|
||||||
|
} // 获取请求体
|
||||||
|
const { model, language = '中文', number, enableGaExpansion = false } = await request.json();
|
||||||
|
|
||||||
|
if (!model) {
|
||||||
|
return NextResponse.json({ error: 'Model cannot be empty' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 后续会根据是否有GA对来选择是否启用GA扩展选择服务函数
|
||||||
|
const serviceFunc = questionService.generateQuestionsForChunkWithGA;
|
||||||
|
|
||||||
|
// 使用问题生成服务
|
||||||
|
const result = await serviceFunc(projectId, chunkId, {
|
||||||
|
model,
|
||||||
|
language,
|
||||||
|
number,
|
||||||
|
enableGaExpansion
|
||||||
|
});
|
||||||
|
|
||||||
|
// 统一返回格式,确保包含GA扩展信息
|
||||||
|
const response = {
|
||||||
|
chunkId,
|
||||||
|
questions: result.questions || result.labelQuestions || [],
|
||||||
|
total: result.total || (result.questions || result.labelQuestions || []).length,
|
||||||
|
gaExpansionUsed: result.gaExpansionUsed || false,
|
||||||
|
gaPairsCount: result.gaPairsCount || 0,
|
||||||
|
expectedTotal: result.expectedTotal || result.total
|
||||||
|
};
|
||||||
|
|
||||||
|
// 返回生成的问题
|
||||||
|
return NextResponse.json(response);
|
||||||
|
} catch (error) {
|
||||||
|
logger.error('Error generating questions:', error);
|
||||||
|
return NextResponse.json({ error: error.message || 'Error generating questions' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// 获取指定文本块的问题
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, chunkId } = params;
|
||||||
|
|
||||||
|
// 验证项目ID和文本块ID
|
||||||
|
if (!projectId || !chunkId) {
|
||||||
|
return NextResponse.json({ error: 'The item ID or text block ID cannot be empty' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 获取文本块的问题
|
||||||
|
const questions = await getQuestionsForChunk(projectId, chunkId);
|
||||||
|
|
||||||
|
// 返回问题列表
|
||||||
|
return NextResponse.json({
|
||||||
|
chunkId,
|
||||||
|
questions,
|
||||||
|
total: questions.length
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Error getting questions:', String(error));
|
||||||
|
return NextResponse.json({ error: error.message || 'Error getting questions' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,73 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { deleteChunkById, getChunkById, updateChunkById } from '@/lib/db/chunks';
|
||||||
|
|
||||||
|
// 获取文本块内容
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, chunkId } = params;
|
||||||
|
// 验证参数
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
|
||||||
|
}
|
||||||
|
if (!chunkId) {
|
||||||
|
return NextResponse.json({ error: 'Text block ID cannot be empty' }, { status: 400 });
|
||||||
|
}
|
||||||
|
// 获取文本块内容
|
||||||
|
const chunk = await getChunkById(chunkId);
|
||||||
|
|
||||||
|
return NextResponse.json(chunk);
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to get text block content:', String(error));
|
||||||
|
return NextResponse.json({ error: error.message || 'Failed to get text block content' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// 删除文本块
|
||||||
|
export async function DELETE(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, chunkId } = params;
|
||||||
|
// 验证参数
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
|
||||||
|
}
|
||||||
|
if (!chunkId) {
|
||||||
|
return NextResponse.json({ error: 'Text block ID cannot be empty' }, { status: 400 });
|
||||||
|
}
|
||||||
|
await deleteChunkById(chunkId);
|
||||||
|
|
||||||
|
return NextResponse.json({ message: 'Text block deleted successfully' });
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to delete text block:', String(error));
|
||||||
|
return NextResponse.json({ error: error.message || 'Failed to delete text block' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// 编辑文本块内容
|
||||||
|
export async function PATCH(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, chunkId } = params;
|
||||||
|
|
||||||
|
// 验证参数
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: '项目ID不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!chunkId) {
|
||||||
|
return NextResponse.json({ error: '文本块ID不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 解析请求体获取新内容
|
||||||
|
const requestData = await request.json();
|
||||||
|
const { content } = requestData;
|
||||||
|
|
||||||
|
if (!content) {
|
||||||
|
return NextResponse.json({ error: '内容不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
let res = await updateChunkById(chunkId, { content });
|
||||||
|
return NextResponse.json(res);
|
||||||
|
} catch (error) {
|
||||||
|
console.error('编辑文本块失败:', String(error));
|
||||||
|
return NextResponse.json({ error: error.message || '编辑文本块失败' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,20 @@
|
|||||||
|
import { getChunkContentsByNames } from '@/lib/db/chunks';
|
||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const { chunkNames } = await request.json();
|
||||||
|
|
||||||
|
if (!chunkNames || !Array.isArray(chunkNames)) {
|
||||||
|
return NextResponse.json({ error: 'chunkNames 参数必须是数组' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
const chunkContentMap = await getChunkContentsByNames(projectId, chunkNames);
|
||||||
|
|
||||||
|
return NextResponse.json(chunkContentMap);
|
||||||
|
} catch (error) {
|
||||||
|
console.error('批量获取文本块内容失败:', error);
|
||||||
|
return NextResponse.json({ error: '批量获取文本块内容失败' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,102 @@
|
|||||||
|
import { NextRequest, NextResponse } from 'next/server';
|
||||||
|
import { PrismaClient } from '@prisma/client';
|
||||||
|
|
||||||
|
const prisma = new PrismaClient();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 批量编辑文本块内容
|
||||||
|
* POST /api/projects/[projectId]/chunks/batch-edit
|
||||||
|
*/
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const body = await request.json();
|
||||||
|
const { position, content, chunkIds } = body;
|
||||||
|
|
||||||
|
// 验证参数
|
||||||
|
if (!position || !content || !chunkIds || !Array.isArray(chunkIds) || chunkIds.length === 0) {
|
||||||
|
return NextResponse.json({ error: 'Missing required parameters: position, content, chunkIds' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!['start', 'end'].includes(position)) {
|
||||||
|
return NextResponse.json({ error: 'Position must be "start" or "end"' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 验证项目权限(获取要编辑的文本块)
|
||||||
|
const chunksToUpdate = await prisma.chunks.findMany({
|
||||||
|
where: {
|
||||||
|
id: { in: chunkIds },
|
||||||
|
projectId: projectId
|
||||||
|
},
|
||||||
|
select: {
|
||||||
|
id: true,
|
||||||
|
content: true,
|
||||||
|
name: true
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
if (chunksToUpdate.length === 0) {
|
||||||
|
return NextResponse.json({ error: 'Not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (chunksToUpdate.length !== chunkIds.length) {
|
||||||
|
return NextResponse.json({ error: 'Some chunks not found' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 准备更新数据
|
||||||
|
const updates = chunksToUpdate.map(chunk => {
|
||||||
|
let newContent;
|
||||||
|
|
||||||
|
if (position === 'start') {
|
||||||
|
// 在开头添加内容
|
||||||
|
newContent = content + '\n\n' + chunk.content;
|
||||||
|
} else {
|
||||||
|
// 在结尾添加内容
|
||||||
|
newContent = chunk.content + '\n\n' + content;
|
||||||
|
}
|
||||||
|
|
||||||
|
return {
|
||||||
|
where: { id: chunk.id },
|
||||||
|
data: {
|
||||||
|
content: newContent,
|
||||||
|
size: newContent.length,
|
||||||
|
updateAt: new Date()
|
||||||
|
}
|
||||||
|
};
|
||||||
|
});
|
||||||
|
|
||||||
|
async function processBatches(items, batchSize, processFn) {
|
||||||
|
const results = [];
|
||||||
|
for (let i = 0; i < items.length; i += batchSize) {
|
||||||
|
const batch = items.slice(i, i + batchSize);
|
||||||
|
const batchResults = await Promise.all(batch.map(processFn));
|
||||||
|
results.push(...batchResults);
|
||||||
|
}
|
||||||
|
return results;
|
||||||
|
}
|
||||||
|
|
||||||
|
const BATCH_SIZE = 50; // 每批处理 50 个
|
||||||
|
await processBatches(updates, BATCH_SIZE, update => prisma.chunks.update(update));
|
||||||
|
|
||||||
|
// 记录操作日志(可选)
|
||||||
|
console.log(`Successfully updated ${chunksToUpdate.length} chunks`);
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
updatedCount: chunksToUpdate.length,
|
||||||
|
message: `Successfully updated ${chunksToUpdate.length} chunks`
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('批量编辑文本块失败:', error);
|
||||||
|
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: 'Batch edit chunks failed',
|
||||||
|
details: error.message
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
} finally {
|
||||||
|
await prisma.$disconnect();
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,35 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { getChunkByName } from '@/lib/db/chunks';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 根据文本块名称获取文本块
|
||||||
|
* @param {Request} request 请求对象
|
||||||
|
* @param {object} context 上下文,包含路径参数
|
||||||
|
* @returns {Promise<NextResponse>} 响应对象
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
|
||||||
|
// 从查询参数中获取 chunkName
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
const chunkName = searchParams.get('chunkName');
|
||||||
|
|
||||||
|
if (!chunkName) {
|
||||||
|
return NextResponse.json({ error: '文本块名称不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 根据名称和项目ID查询文本块
|
||||||
|
const chunk = await getChunkByName(projectId, chunkName);
|
||||||
|
|
||||||
|
if (!chunk) {
|
||||||
|
return NextResponse.json({ error: '未找到指定的文本块' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 返回文本块信息
|
||||||
|
return NextResponse.json(chunk);
|
||||||
|
} catch (error) {
|
||||||
|
console.error('根据名称获取文本块失败:', String(error));
|
||||||
|
return NextResponse.json({ error: '获取文本块失败: ' + error.message }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,21 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { deleteChunkById, getChunkByFileIds, getChunkById, getChunksByFileIds, updateChunkById } from '@/lib/db/chunks';
|
||||||
|
|
||||||
|
// 获取文本块内容
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
// 验证参数
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
|
||||||
|
}
|
||||||
|
const { array } = await request.json();
|
||||||
|
// 获取文本块内容
|
||||||
|
const chunk = await getChunksByFileIds(array);
|
||||||
|
|
||||||
|
return NextResponse.json(chunk);
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to get text block content:', String(error));
|
||||||
|
return NextResponse.json({ error: String(error) || 'Failed to get text block content' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,36 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { getProject, updateProject, getTaskConfig } from '@/lib/db/projects';
|
||||||
|
|
||||||
|
// 获取项目配置
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const projectId = params.projectId;
|
||||||
|
const config = await getProject(projectId);
|
||||||
|
const taskConfig = await getTaskConfig(projectId);
|
||||||
|
return NextResponse.json({ ...config, ...taskConfig });
|
||||||
|
} catch (error) {
|
||||||
|
console.error('获取项目配置失败:', String(error));
|
||||||
|
return NextResponse.json({ error: error.message }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// 更新项目配置
|
||||||
|
export async function PUT(request, { params }) {
|
||||||
|
try {
|
||||||
|
const projectId = params.projectId;
|
||||||
|
const newConfig = await request.json();
|
||||||
|
const currentConfig = await getProject(projectId);
|
||||||
|
|
||||||
|
// 只更新 prompts 部分
|
||||||
|
const updatedConfig = {
|
||||||
|
...currentConfig,
|
||||||
|
...newConfig.prompts
|
||||||
|
};
|
||||||
|
|
||||||
|
const config = await updateProject(projectId, updatedConfig);
|
||||||
|
return NextResponse.json(config);
|
||||||
|
} catch (error) {
|
||||||
|
console.error('更新项目配置失败:', String(error));
|
||||||
|
return NextResponse.json({ error: error.message }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,105 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import {
|
||||||
|
getCustomPrompts,
|
||||||
|
getCustomPrompt,
|
||||||
|
saveCustomPrompt,
|
||||||
|
deleteCustomPrompt,
|
||||||
|
batchSaveCustomPrompts,
|
||||||
|
toggleCustomPrompt,
|
||||||
|
getPromptTemplates
|
||||||
|
} from '@/lib/db/custom-prompts';
|
||||||
|
|
||||||
|
// 获取项目的自定义提示词
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
const promptType = searchParams.get('promptType');
|
||||||
|
const language = searchParams.get('language');
|
||||||
|
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
const customPrompts = await getCustomPrompts(projectId, promptType, language);
|
||||||
|
const templates = await getPromptTemplates();
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
customPrompts,
|
||||||
|
templates
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('获取自定义提示词失败:', error);
|
||||||
|
return NextResponse.json({ error: error.message }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// 保存自定义提示词
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const body = await request.json();
|
||||||
|
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 批量保存
|
||||||
|
if (body.prompts && Array.isArray(body.prompts)) {
|
||||||
|
const results = await batchSaveCustomPrompts(projectId, body.prompts);
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
results
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
// 单个保存
|
||||||
|
const { promptType, promptKey, language, content } = body;
|
||||||
|
if (!promptType || !promptKey || !language || content === undefined) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: 'promptType, promptKey, language and content are required'
|
||||||
|
},
|
||||||
|
{ status: 400 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
const result = await saveCustomPrompt(projectId, promptType, promptKey, language, content);
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
result
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('保存自定义提示词失败:', error);
|
||||||
|
return NextResponse.json({ error: error.message }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// 删除自定义提示词
|
||||||
|
export async function DELETE(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
const promptType = searchParams.get('promptType');
|
||||||
|
const promptKey = searchParams.get('promptKey');
|
||||||
|
const language = searchParams.get('language');
|
||||||
|
|
||||||
|
if (!projectId || !promptType || !promptKey || !language) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: 'projectId, promptType, promptKey and language are required'
|
||||||
|
},
|
||||||
|
{ status: 400 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
const success = await deleteCustomPrompt(projectId, promptType, promptKey, language);
|
||||||
|
return NextResponse.json({
|
||||||
|
success
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('删除自定义提示词失败:', error);
|
||||||
|
return NextResponse.json({ error: error.message }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,116 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { saveChunks, deleteChunksByFileId } from '@/lib/db/chunks';
|
||||||
|
import path from 'path';
|
||||||
|
import fs from 'fs/promises';
|
||||||
|
import { getProjectRoot } from '@/lib/db/base';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 处理自定义分块请求
|
||||||
|
* @param {Request} request - 请求对象
|
||||||
|
* @param {Object} params - 路由参数
|
||||||
|
* @returns {Promise<Response>} - 响应对象
|
||||||
|
*/
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const { fileId, fileName, content, splitPoints } = await request.json();
|
||||||
|
|
||||||
|
// 参数验证
|
||||||
|
if (!projectId || !fileId || !fileName || !content || !splitPoints) {
|
||||||
|
return NextResponse.json({ error: 'Missing required parameters' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 获取项目根目录
|
||||||
|
const projectRoot = await getProjectRoot();
|
||||||
|
const projectPath = path.join(projectRoot, projectId);
|
||||||
|
|
||||||
|
// 检查项目是否存在
|
||||||
|
try {
|
||||||
|
await fs.access(projectPath);
|
||||||
|
} catch (error) {
|
||||||
|
return NextResponse.json({ error: 'Project does not exist' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 先删除该文件已有的文本块
|
||||||
|
await deleteChunksByFileId(projectId, fileId);
|
||||||
|
|
||||||
|
// 根据分块点将文件内容分割成多个块
|
||||||
|
const customChunks = generateCustomChunks(projectId, fileId, fileName, content, splitPoints);
|
||||||
|
|
||||||
|
// 保存新的文本块
|
||||||
|
await saveChunks(customChunks);
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
message: 'Custom chunks saved successfully',
|
||||||
|
totalChunks: customChunks.length
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('自定义分块处理出错:', String(error));
|
||||||
|
return NextResponse.json({ error: error.message || 'Failed to process custom split request' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 根据分块点生成自定义文本块
|
||||||
|
* @param {string} projectId - 项目ID
|
||||||
|
* @param {string} fileId - 文件ID
|
||||||
|
* @param {string} fileName - 文件名
|
||||||
|
* @param {string} content - 文件内容
|
||||||
|
* @param {Array} splitPoints - 分块点数组
|
||||||
|
* @returns {Array} - 生成的文本块数组
|
||||||
|
*/
|
||||||
|
function generateCustomChunks(projectId, fileId, fileName, content, splitPoints) {
|
||||||
|
// 按位置排序分块点
|
||||||
|
const sortedPoints = [...splitPoints].sort((a, b) => a.position - b.position);
|
||||||
|
|
||||||
|
// 创建分块
|
||||||
|
const chunks = [];
|
||||||
|
let startPos = 0;
|
||||||
|
|
||||||
|
// 处理每个分块点
|
||||||
|
for (let i = 0; i < sortedPoints.length; i++) {
|
||||||
|
const endPos = sortedPoints[i].position;
|
||||||
|
|
||||||
|
// 提取当前分块内容
|
||||||
|
const chunkContent = content.substring(startPos, endPos);
|
||||||
|
|
||||||
|
// 跳过空白分块
|
||||||
|
if (chunkContent.trim().length === 0) {
|
||||||
|
startPos = endPos;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
// 创建分块对象
|
||||||
|
const chunk = {
|
||||||
|
projectId,
|
||||||
|
name: `${path.basename(fileName, path.extname(fileName))}-part-${i + 1}`,
|
||||||
|
fileId,
|
||||||
|
fileName,
|
||||||
|
content: chunkContent,
|
||||||
|
summary: `${fileName} 自定义分块 ${i + 1}/${sortedPoints.length + 1}`,
|
||||||
|
size: chunkContent.length
|
||||||
|
};
|
||||||
|
|
||||||
|
chunks.push(chunk);
|
||||||
|
startPos = endPos;
|
||||||
|
}
|
||||||
|
|
||||||
|
// 添加最后一个分块(如果有内容)
|
||||||
|
const lastChunkContent = content.substring(startPos);
|
||||||
|
if (lastChunkContent.trim().length > 0) {
|
||||||
|
const lastChunk = {
|
||||||
|
projectId,
|
||||||
|
name: `${path.basename(fileName, path.extname(fileName))}-part-${sortedPoints.length + 1}`,
|
||||||
|
fileId,
|
||||||
|
fileName,
|
||||||
|
content: lastChunkContent,
|
||||||
|
summary: `${fileName} 自定义分块 ${sortedPoints.length + 1}/${sortedPoints.length + 1}`,
|
||||||
|
size: lastChunkContent.length
|
||||||
|
};
|
||||||
|
|
||||||
|
chunks.push(lastChunk);
|
||||||
|
}
|
||||||
|
|
||||||
|
return chunks;
|
||||||
|
}
|
||||||
@@ -0,0 +1,183 @@
|
|||||||
|
/**
|
||||||
|
* 单个多轮对话数据集操作API
|
||||||
|
*/
|
||||||
|
|
||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import {
|
||||||
|
getDatasetConversationById,
|
||||||
|
updateDatasetConversation,
|
||||||
|
deleteDatasetConversation,
|
||||||
|
getConversationNavigationItems
|
||||||
|
} from '@/lib/db/dataset-conversations';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 获取单个多轮对话数据集详情
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, conversationId } = params;
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
const operateType = searchParams.get('operateType');
|
||||||
|
|
||||||
|
// 如果是导航操作,返回导航项
|
||||||
|
if (operateType !== null) {
|
||||||
|
const data = await getConversationNavigationItems(projectId, conversationId, operateType);
|
||||||
|
return NextResponse.json(data);
|
||||||
|
}
|
||||||
|
|
||||||
|
const conversation = await getDatasetConversationById(conversationId);
|
||||||
|
|
||||||
|
if (!conversation) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: '对话数据集不存在'
|
||||||
|
},
|
||||||
|
{ status: 404 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (conversation.projectId !== projectId) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: '对话数据集不属于指定项目'
|
||||||
|
},
|
||||||
|
{ status: 403 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
return NextResponse.json(conversation);
|
||||||
|
} catch (error) {
|
||||||
|
console.error('获取多轮对话数据集详情失败:', error);
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: error.message
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 更新多轮对话数据集
|
||||||
|
*/
|
||||||
|
export async function PUT(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, conversationId } = params;
|
||||||
|
const body = await request.json();
|
||||||
|
|
||||||
|
// 验证对话数据集是否存在且属于项目
|
||||||
|
const conversation = await getDatasetConversationById(conversationId);
|
||||||
|
|
||||||
|
if (!conversation) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: '对话数据集不存在'
|
||||||
|
},
|
||||||
|
{ status: 404 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (conversation.projectId !== projectId) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: '对话数据集不属于指定项目'
|
||||||
|
},
|
||||||
|
{ status: 403 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// 只允许更新特定字段
|
||||||
|
const allowedFields = ['score', 'tags', 'note', 'confirmed', 'aiEvaluation', 'messages'];
|
||||||
|
const updateData = {};
|
||||||
|
|
||||||
|
allowedFields.forEach(field => {
|
||||||
|
if (body.hasOwnProperty(field)) {
|
||||||
|
if (field === 'messages') {
|
||||||
|
// 将messages数组转换为rawMessages字符串存储
|
||||||
|
updateData['rawMessages'] = JSON.stringify(body[field]);
|
||||||
|
} else {
|
||||||
|
updateData[field] = body[field];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
if (Object.keys(updateData).length === 0) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: '没有有效的更新字段'
|
||||||
|
},
|
||||||
|
{ status: 400 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
const updatedConversation = await updateDatasetConversation(conversationId, updateData);
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
data: updatedConversation
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('更新多轮对话数据集失败:', error);
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: error.message
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 删除多轮对话数据集
|
||||||
|
*/
|
||||||
|
export async function DELETE(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, conversationId } = params;
|
||||||
|
|
||||||
|
// 验证对话数据集是否存在且属于项目
|
||||||
|
const conversation = await getDatasetConversationById(conversationId);
|
||||||
|
|
||||||
|
if (!conversation) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: '对话数据集不存在'
|
||||||
|
},
|
||||||
|
{ status: 404 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (conversation.projectId !== projectId) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: '对话数据集不属于指定项目'
|
||||||
|
},
|
||||||
|
{ status: 403 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
await deleteDatasetConversation(conversationId);
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
message: '删除成功'
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('删除多轮对话数据集失败:', error);
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: error.message
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,68 @@
|
|||||||
|
/**
|
||||||
|
* 多轮对话数据集导出API
|
||||||
|
* 直接导出原始的 ShareGPT 格式数据集
|
||||||
|
*/
|
||||||
|
|
||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { getAllDatasetConversations } from '@/lib/db/dataset-conversations';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 导出多轮对话数据集
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
|
||||||
|
// 筛选条件
|
||||||
|
const filters = {
|
||||||
|
confirmed: searchParams.get('confirmed')
|
||||||
|
};
|
||||||
|
|
||||||
|
// 清除空值
|
||||||
|
Object.keys(filters).forEach(key => {
|
||||||
|
if (!filters[key]) delete filters[key];
|
||||||
|
});
|
||||||
|
|
||||||
|
// 获取所有对话数据集
|
||||||
|
const conversations = await getAllDatasetConversations(projectId, filters);
|
||||||
|
|
||||||
|
if (conversations.length === 0) {
|
||||||
|
return NextResponse.json([]);
|
||||||
|
}
|
||||||
|
|
||||||
|
// 转换为 ShareGPT 格式数组
|
||||||
|
const shareGptData = [];
|
||||||
|
|
||||||
|
for (const conversation of conversations) {
|
||||||
|
try {
|
||||||
|
// 解析 rawMessages
|
||||||
|
const messages = JSON.parse(conversation.rawMessages || '[]');
|
||||||
|
|
||||||
|
if (messages.length > 0) {
|
||||||
|
// 构建 ShareGPT 格式对象
|
||||||
|
const shareGptItem = {
|
||||||
|
messages: messages
|
||||||
|
};
|
||||||
|
|
||||||
|
shareGptData.push(shareGptItem);
|
||||||
|
}
|
||||||
|
} catch (error) {
|
||||||
|
console.error(`解析对话消息失败 ${conversation.id}:`, error);
|
||||||
|
// 跳过解析失败的对话,继续处理其他对话
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return NextResponse.json(shareGptData);
|
||||||
|
} catch (error) {
|
||||||
|
console.error('导出多轮对话数据集失败:', error);
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: error.message
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,135 @@
|
|||||||
|
/**
|
||||||
|
* 多轮对话数据集管理API
|
||||||
|
*/
|
||||||
|
|
||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import {
|
||||||
|
getDatasetConversationsByPagination,
|
||||||
|
getAllDatasetConversationIds,
|
||||||
|
createDatasetConversation
|
||||||
|
} from '@/lib/db/dataset-conversations';
|
||||||
|
import { generateMultiTurnConversation } from '@/lib/services/multi-turn/index';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 获取多轮对话数据集列表(支持分页和筛选)
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
|
||||||
|
const getAllIds = searchParams.get('getAllIds') === 'true'; // 新增:获取所有对话ID的标志
|
||||||
|
|
||||||
|
// 筛选条件
|
||||||
|
const filters = {
|
||||||
|
keyword: searchParams.get('keyword'),
|
||||||
|
roleA: searchParams.get('roleA'),
|
||||||
|
roleB: searchParams.get('roleB'),
|
||||||
|
scenario: searchParams.get('scenario'),
|
||||||
|
scoreMin: searchParams.get('scoreMin'),
|
||||||
|
scoreMax: searchParams.get('scoreMax'),
|
||||||
|
confirmed: searchParams.get('confirmed')
|
||||||
|
};
|
||||||
|
|
||||||
|
// 清除空值
|
||||||
|
Object.keys(filters).forEach(key => {
|
||||||
|
if (!filters[key]) delete filters[key];
|
||||||
|
});
|
||||||
|
|
||||||
|
// 如果请求获取所有ID
|
||||||
|
if (getAllIds) {
|
||||||
|
const allConversationIds = await getAllDatasetConversationIds(projectId, filters);
|
||||||
|
return NextResponse.json({ allConversationIds });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 正常分页查询
|
||||||
|
const page = parseInt(searchParams.get('page') || '1');
|
||||||
|
const pageSize = parseInt(searchParams.get('pageSize') || '20');
|
||||||
|
|
||||||
|
const result = await getDatasetConversationsByPagination(projectId, page, pageSize, filters);
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
...result
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('获取多轮对话数据集失败:', error);
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: error.message
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 创建多轮对话数据集
|
||||||
|
*/
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const body = await request.json();
|
||||||
|
|
||||||
|
const { questionId, systemPrompt, scenario, rounds, roleA, roleB, model, language = '中文' } = body;
|
||||||
|
|
||||||
|
if (!questionId) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: '问题ID不能为空'
|
||||||
|
},
|
||||||
|
{ status: 400 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!model || !model.modelId) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: '模型配置不能为空'
|
||||||
|
},
|
||||||
|
{ status: 400 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// 构建配置
|
||||||
|
const config = {
|
||||||
|
systemPrompt: systemPrompt || '',
|
||||||
|
scenario: scenario || '',
|
||||||
|
rounds: rounds || 3,
|
||||||
|
roleA: roleA || '用户',
|
||||||
|
roleB: roleB || '助手',
|
||||||
|
model,
|
||||||
|
language
|
||||||
|
};
|
||||||
|
|
||||||
|
// 生成多轮对话
|
||||||
|
const result = await generateMultiTurnConversation(projectId, questionId, config);
|
||||||
|
|
||||||
|
if (!result.success) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: result.error
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
data: result.data
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('创建多轮对话数据集失败:', error);
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: error.message
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,42 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { getAllDatasetConversations } from '@/lib/db/dataset-conversations';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 获取项目中多轮对话数据集的所有标签
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: '项目ID不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 获取项目所有对话数据集
|
||||||
|
const conversations = await getAllDatasetConversations(projectId);
|
||||||
|
|
||||||
|
// 提取所有标签
|
||||||
|
const allTags = new Set();
|
||||||
|
|
||||||
|
conversations.forEach(conversation => {
|
||||||
|
if (conversation.tags && typeof conversation.tags === 'string') {
|
||||||
|
const tags = conversation.tags.split(/\s+/).filter(tag => tag.trim().length > 0);
|
||||||
|
tags.forEach(tag => allTags.add(tag.trim()));
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
tags: Array.from(allTags).sort()
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('获取对话标签失败:', error);
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
message: error.message
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,77 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { db } from '@/lib/db';
|
||||||
|
|
||||||
|
export async function POST(req, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, datasetId } = params;
|
||||||
|
|
||||||
|
// 1. 获取数据集详情
|
||||||
|
const dataset = await db.datasets.findUnique({
|
||||||
|
where: { id: datasetId, projectId }
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!dataset) {
|
||||||
|
return NextResponse.json({ error: 'Dataset not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 2. 尝试通过 questionId 查找关联的 chunkId
|
||||||
|
let chunkId = null;
|
||||||
|
if (dataset.questionId) {
|
||||||
|
const question = await db.questions.findUnique({
|
||||||
|
where: { id: dataset.questionId }
|
||||||
|
});
|
||||||
|
if (question) {
|
||||||
|
chunkId = question.chunkId;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// 3. 创建评估数据集记录
|
||||||
|
// 默认使用 open_ended 类型,因为通常数据集是问答对,适合作为评估
|
||||||
|
let evalTags = [];
|
||||||
|
try {
|
||||||
|
evalTags = JSON.parse(dataset.tags || '[]');
|
||||||
|
if (!Array.isArray(evalTags)) evalTags = [];
|
||||||
|
} catch (e) {
|
||||||
|
evalTags = [];
|
||||||
|
}
|
||||||
|
|
||||||
|
// 排除 'Eval' 标签,并将数组转为逗号分隔的字符串
|
||||||
|
const evalTagsString = evalTags.filter(tag => tag !== 'Eval').join(',');
|
||||||
|
|
||||||
|
const evalDataset = await db.evalDatasets.create({
|
||||||
|
data: {
|
||||||
|
projectId,
|
||||||
|
question: dataset.question,
|
||||||
|
questionType: 'open_ended',
|
||||||
|
correctAnswer: dataset.answer,
|
||||||
|
tags: evalTagsString,
|
||||||
|
note: dataset.note,
|
||||||
|
chunkId: chunkId,
|
||||||
|
options: '' // 开放题不需要选项
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// 4. 更新原数据集,添加 'Eval' 标签
|
||||||
|
let currentTags = [];
|
||||||
|
try {
|
||||||
|
currentTags = JSON.parse(dataset.tags || '[]');
|
||||||
|
} catch (e) {
|
||||||
|
// ignore error
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!currentTags.includes('Eval')) {
|
||||||
|
currentTags.push('Eval');
|
||||||
|
await db.datasets.update({
|
||||||
|
where: { id: datasetId },
|
||||||
|
data: {
|
||||||
|
tags: JSON.stringify(currentTags)
|
||||||
|
}
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
return NextResponse.json({ success: true, evalDataset });
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to copy dataset to eval:', error);
|
||||||
|
return NextResponse.json({ error: 'Internal Server Error' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,36 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { evaluateDataset } from '@/lib/services/datasets/evaluation';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 评估单个数据集的质量
|
||||||
|
*/
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, datasetId } = params;
|
||||||
|
const { model, language = 'zh-CN' } = await request.json();
|
||||||
|
|
||||||
|
if (!projectId || !datasetId) {
|
||||||
|
return NextResponse.json({ success: false, message: '项目ID和数据集ID不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!model) {
|
||||||
|
return NextResponse.json({ success: false, message: '模型配置不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 使用评估服务进行数据集评估
|
||||||
|
const result = await evaluateDataset(projectId, datasetId, model, language);
|
||||||
|
|
||||||
|
if (!result.success) {
|
||||||
|
return NextResponse.json({ success: false, message: result.error }, { status: 500 });
|
||||||
|
}
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
message: '数据集评估完成',
|
||||||
|
data: result.data
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('数据集评估失败:', error);
|
||||||
|
return NextResponse.json({ success: false, message: `评估失败: ${error.message}` }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,82 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { getDatasetsById, getDatasetsCounts, getNavigationItems, updateDatasetMetadata } from '@/lib/db/datasets';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 获取项目的所有数据集
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, datasetId } = params;
|
||||||
|
// 验证项目ID
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: '项目ID不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
if (!datasetId) {
|
||||||
|
return NextResponse.json({ error: '数据集ID不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
const operateType = searchParams.get('operateType');
|
||||||
|
if (operateType !== null) {
|
||||||
|
const data = await getNavigationItems(projectId, datasetId, operateType);
|
||||||
|
return NextResponse.json(data);
|
||||||
|
}
|
||||||
|
const datasets = await getDatasetsById(datasetId);
|
||||||
|
let counts = await getDatasetsCounts(projectId);
|
||||||
|
|
||||||
|
return NextResponse.json({ datasets, ...counts });
|
||||||
|
} catch (error) {
|
||||||
|
console.error('获取数据集详情失败:', String(error));
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: error.message || '获取数据集详情失败'
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 更新数据集元数据(评分、标签、备注)
|
||||||
|
*/
|
||||||
|
export async function PATCH(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, datasetId } = params;
|
||||||
|
|
||||||
|
// 验证参数
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: '项目ID不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
if (!datasetId) {
|
||||||
|
return NextResponse.json({ error: '数据集ID不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
const body = await request.json();
|
||||||
|
const { score, tags, note } = body;
|
||||||
|
|
||||||
|
// 验证评分范围
|
||||||
|
if (score !== undefined && (score < 0 || score > 5)) {
|
||||||
|
return NextResponse.json({ error: '评分必须在0-5之间' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 验证标签格式
|
||||||
|
if (tags !== undefined && !Array.isArray(tags)) {
|
||||||
|
return NextResponse.json({ error: '标签必须是数组格式' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 更新数据集元数据
|
||||||
|
const updatedDataset = await updateDatasetMetadata(datasetId, { score, tags, note });
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
dataset: updatedDataset
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('更新数据集元数据失败:', String(error));
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: error.message || '更新数据集元数据失败'
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,52 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { getDatasetsById } from '@/lib/db/datasets';
|
||||||
|
import { getEncoding } from '@langchain/core/utils/tiktoken';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 异步计算数据集文本的Token数量
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId, datasetId } = params;
|
||||||
|
|
||||||
|
if (!datasetId) {
|
||||||
|
return NextResponse.json({ error: '数据集ID不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
const datasets = await getDatasetsById(datasetId);
|
||||||
|
const tokenCounts = {
|
||||||
|
answerTokens: 0,
|
||||||
|
cotTokens: 0
|
||||||
|
};
|
||||||
|
|
||||||
|
try {
|
||||||
|
if (datasets.answer || datasets.cot) {
|
||||||
|
// 使用 cl100k_base 编码,适用于 gpt-3.5-turbo 和 gpt-4
|
||||||
|
const encoding = await getEncoding('cl100k_base');
|
||||||
|
|
||||||
|
if (datasets.answer) {
|
||||||
|
const tokens = encoding.encode(datasets.answer);
|
||||||
|
tokenCounts.answerTokens = tokens.length;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (datasets.cot) {
|
||||||
|
const tokens = encoding.encode(datasets.cot);
|
||||||
|
tokenCounts.cotTokens = tokens.length;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} catch (error) {
|
||||||
|
console.error('计算Token数量失败:', String(error));
|
||||||
|
return NextResponse.json({ error: '计算Token数量失败' }, { status: 500 });
|
||||||
|
}
|
||||||
|
|
||||||
|
return NextResponse.json(tokenCounts);
|
||||||
|
} catch (error) {
|
||||||
|
console.error('获取Token计数失败:', String(error));
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: error.message || '获取Token计数失败'
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,55 @@
|
|||||||
|
/**
|
||||||
|
* 批量数据集评估任务API
|
||||||
|
* 创建批量评估数据集质量的异步任务
|
||||||
|
*/
|
||||||
|
|
||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { db } from '@/lib/db/index';
|
||||||
|
import { processTask } from '@/lib/services/tasks/index';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 创建批量数据集评估任务
|
||||||
|
*/
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const { model, language = 'zh-CN' } = await request.json();
|
||||||
|
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ success: false, message: '项目ID不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!model || !model.modelId) {
|
||||||
|
return NextResponse.json({ success: false, message: '模型配置不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 创建批量评估任务
|
||||||
|
const newTask = await db.task.create({
|
||||||
|
data: {
|
||||||
|
projectId,
|
||||||
|
taskType: 'dataset-evaluation',
|
||||||
|
status: 0, // 初始状态: 处理中
|
||||||
|
modelInfo: JSON.stringify(model),
|
||||||
|
language: language || 'zh-CN',
|
||||||
|
detail: '',
|
||||||
|
totalCount: 0,
|
||||||
|
note: '准备开始批量评估数据集质量...',
|
||||||
|
completedCount: 0
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// 异步处理任务
|
||||||
|
processTask(newTask.id).catch(err => {
|
||||||
|
console.error(`批量评估任务启动失败: ${newTask.id}`, String(err));
|
||||||
|
});
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
message: '批量评估任务已创建',
|
||||||
|
data: { taskId: newTask.id }
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('创建批量评估任务失败:', error);
|
||||||
|
return NextResponse.json({ success: false, message: `创建任务失败: ${error.message}` }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,128 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import {
|
||||||
|
getDatasets,
|
||||||
|
getBalancedDatasetsByTags,
|
||||||
|
getTagsWithDatasetCounts,
|
||||||
|
getDatasetsBatch,
|
||||||
|
getBalancedDatasetsByTagsBatch,
|
||||||
|
getDatasetsByIds,
|
||||||
|
getDatasetsByIdsBatch
|
||||||
|
} from '@/lib/db/datasets';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 获取导出数据集
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
|
||||||
|
// 验证项目ID
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
const confirmedParam = searchParams.get('confirmed');
|
||||||
|
const confirmed = confirmedParam === null ? undefined : confirmedParam === 'true';
|
||||||
|
|
||||||
|
// 获取标签统计信息
|
||||||
|
const tagStats = await getTagsWithDatasetCounts(projectId, confirmed);
|
||||||
|
return NextResponse.json(tagStats);
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to get tag statistics:', String(error));
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: error.message || 'Failed to get tag statistics'
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 获取标签统计信息
|
||||||
|
*/
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const body = await request.json();
|
||||||
|
|
||||||
|
// 验证项目ID
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
let status = body.status;
|
||||||
|
let confirmed = undefined;
|
||||||
|
if (status === 'confirmed') confirmed = true;
|
||||||
|
if (status === 'unconfirmed') confirmed = false;
|
||||||
|
|
||||||
|
// 检查是否是分批导出模式
|
||||||
|
const batchMode = body.batchMode ? 'true' : 'false';
|
||||||
|
const offset = body.offset ?? 0;
|
||||||
|
const batchSize = body.batchSize ?? 1000;
|
||||||
|
|
||||||
|
// 检查是否是平衡导出
|
||||||
|
const balanceMode = body.balanceMode ? 'true' : 'false';
|
||||||
|
const balanceConfig = body.balanceConfig;
|
||||||
|
|
||||||
|
// 检查是否有选中的数据集 ID
|
||||||
|
const selectedIds = Array.isArray(body.selectedIds) ? body.selectedIds : null;
|
||||||
|
|
||||||
|
if (batchMode === 'true') {
|
||||||
|
// 分批导出模式
|
||||||
|
if (selectedIds && selectedIds.length > 0) {
|
||||||
|
// 按选中 ID 分批导出
|
||||||
|
const datasets = await getDatasetsByIdsBatch(projectId, selectedIds, offset, batchSize);
|
||||||
|
const hasMore = datasets.length === batchSize;
|
||||||
|
return NextResponse.json({
|
||||||
|
data: datasets,
|
||||||
|
hasMore,
|
||||||
|
offset: offset + datasets.length
|
||||||
|
});
|
||||||
|
} else if (balanceMode === 'true' && balanceConfig) {
|
||||||
|
// 平衡分批导出
|
||||||
|
const parsedConfig = typeof balanceConfig === 'string' ? JSON.parse(balanceConfig) : balanceConfig;
|
||||||
|
const result = await getBalancedDatasetsByTagsBatch(projectId, parsedConfig, confirmed, offset, batchSize);
|
||||||
|
return NextResponse.json({
|
||||||
|
data: result.data,
|
||||||
|
hasMore: result.hasMore,
|
||||||
|
offset: offset + result.data.length
|
||||||
|
});
|
||||||
|
} else {
|
||||||
|
// 常规分批导出
|
||||||
|
const datasets = await getDatasetsBatch(projectId, confirmed, offset, batchSize);
|
||||||
|
const hasMore = datasets.length === batchSize;
|
||||||
|
return NextResponse.json({
|
||||||
|
data: datasets,
|
||||||
|
hasMore,
|
||||||
|
offset: offset + datasets.length
|
||||||
|
});
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
// 传统一次性导出模式(保持向后兼容)
|
||||||
|
if (selectedIds && selectedIds.length > 0) {
|
||||||
|
// 按选中 ID 导出
|
||||||
|
const datasets = await getDatasetsByIds(projectId, selectedIds);
|
||||||
|
return NextResponse.json(datasets);
|
||||||
|
} else if (balanceMode === 'true' && balanceConfig) {
|
||||||
|
// 平衡导出模式
|
||||||
|
const parsedConfig = typeof balanceConfig === 'string' ? JSON.parse(balanceConfig) : balanceConfig;
|
||||||
|
const datasets = await getBalancedDatasetsByTags(projectId, parsedConfig, confirmed);
|
||||||
|
return NextResponse.json(datasets);
|
||||||
|
} else {
|
||||||
|
// 常规导出模式
|
||||||
|
const datasets = await getDatasets(projectId, confirmed);
|
||||||
|
return NextResponse.json(datasets);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to get datasets:', String(error));
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: error.message || 'Failed to get datasets'
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,44 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { getDatasetsById } from '@/lib/db/datasets';
|
||||||
|
import LLMClient from '@/lib/llm/core/index';
|
||||||
|
import { getEvalQuestionPrompt } from '@/lib/llm/prompts/evalQuestion';
|
||||||
|
import { extractJsonFromLLMOutput } from '@/lib/llm/common/util';
|
||||||
|
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const { datasetId, model, language, questionType = 'open_ended', count = 1 } = await request.json();
|
||||||
|
|
||||||
|
if (!datasetId || !model) {
|
||||||
|
return NextResponse.json({ error: 'Missing required parameters' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 1. 获取原数据集
|
||||||
|
const dataset = await getDatasetsById(datasetId);
|
||||||
|
if (!dataset) {
|
||||||
|
return NextResponse.json({ error: 'Dataset not found' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 2. 构建提示词
|
||||||
|
// 将原问题和答案合并作为上下文文本
|
||||||
|
const text = `Question: ${dataset.question}\nAnswer: ${dataset.answer}`;
|
||||||
|
|
||||||
|
const prompt = await getEvalQuestionPrompt(language || 'zh-CN', questionType, { text, number: count }, projectId);
|
||||||
|
|
||||||
|
// 3. 调用 LLM
|
||||||
|
const client = new LLMClient(model);
|
||||||
|
|
||||||
|
const response = await client.getResponse(prompt);
|
||||||
|
const result = extractJsonFromLLMOutput(response);
|
||||||
|
|
||||||
|
// 结果应该是一个数组
|
||||||
|
if (!result || !Array.isArray(result)) {
|
||||||
|
throw new Error('Failed to parse LLM output or output is not an array');
|
||||||
|
}
|
||||||
|
|
||||||
|
return NextResponse.json({ success: true, data: result });
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Generate eval variant failed:', error);
|
||||||
|
return NextResponse.json({ error: error.message || 'Internal Server Error' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,109 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { createDataset } from '@/lib/db/datasets';
|
||||||
|
import { nanoid } from 'nanoid';
|
||||||
|
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const { datasets, sourceInfo } = await request.json();
|
||||||
|
|
||||||
|
if (!datasets || !Array.isArray(datasets)) {
|
||||||
|
return NextResponse.json({ error: 'Invalid datasets data' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
const results = [];
|
||||||
|
const errors = [];
|
||||||
|
let successCount = 0;
|
||||||
|
let skippedCount = 0;
|
||||||
|
|
||||||
|
for (let i = 0; i < datasets.length; i++) {
|
||||||
|
try {
|
||||||
|
const dataset = datasets[i];
|
||||||
|
|
||||||
|
// 安全获取与清洗字段
|
||||||
|
const q = typeof dataset?.question === 'string' ? dataset.question.trim() : '';
|
||||||
|
const a = typeof dataset?.answer === 'string' ? dataset.answer.trim() : '';
|
||||||
|
|
||||||
|
// 验证必填字段:缺失则跳过
|
||||||
|
if (!q || !a) {
|
||||||
|
errors.push(`第 ${i + 1} 条记录缺少必填字段(question/answer),已跳过`);
|
||||||
|
skippedCount++;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
// 规范化可选字段
|
||||||
|
const chunkName = dataset?.chunkName || 'Imported Data';
|
||||||
|
const chunkContent = dataset?.chunkContent || 'Imported from external source';
|
||||||
|
const model = dataset?.model || 'imported';
|
||||||
|
const questionLabel = dataset?.questionLabel || '';
|
||||||
|
const cot = typeof dataset?.cot === 'string' ? dataset.cot : '';
|
||||||
|
const confirmed = typeof dataset?.confirmed === 'boolean' ? dataset.confirmed : false;
|
||||||
|
const score = typeof dataset?.score === 'number' ? dataset.score : 0;
|
||||||
|
// tags: 支持数组/字符串/对象
|
||||||
|
let tags = '[]';
|
||||||
|
if (Array.isArray(dataset?.tags)) {
|
||||||
|
try {
|
||||||
|
tags = JSON.stringify(dataset.tags);
|
||||||
|
} catch {
|
||||||
|
tags = '[]';
|
||||||
|
}
|
||||||
|
} else if (typeof dataset?.tags === 'string') {
|
||||||
|
tags = dataset.tags;
|
||||||
|
} else if (dataset?.tags && typeof dataset.tags === 'object') {
|
||||||
|
try {
|
||||||
|
tags = JSON.stringify(dataset.tags);
|
||||||
|
} catch {
|
||||||
|
tags = '[]';
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// other: 对象或字符串
|
||||||
|
let other = '{}';
|
||||||
|
if (typeof dataset?.other === 'string') {
|
||||||
|
other = dataset.other;
|
||||||
|
} else if (dataset?.other && typeof dataset.other === 'object') {
|
||||||
|
try {
|
||||||
|
other = JSON.stringify(dataset.other);
|
||||||
|
} catch {
|
||||||
|
other = '{}';
|
||||||
|
}
|
||||||
|
}
|
||||||
|
const note = typeof dataset?.note === 'string' ? dataset.note : '';
|
||||||
|
|
||||||
|
// 创建数据集记录
|
||||||
|
const newDataset = await createDataset({
|
||||||
|
projectId,
|
||||||
|
questionId: nanoid(), // 生成唯一的问题ID
|
||||||
|
question: q,
|
||||||
|
answer: a,
|
||||||
|
chunkName,
|
||||||
|
chunkContent,
|
||||||
|
model,
|
||||||
|
questionLabel,
|
||||||
|
cot,
|
||||||
|
confirmed,
|
||||||
|
score,
|
||||||
|
tags,
|
||||||
|
note,
|
||||||
|
other
|
||||||
|
});
|
||||||
|
|
||||||
|
results.push(newDataset);
|
||||||
|
successCount++;
|
||||||
|
} catch (error) {
|
||||||
|
errors.push(`第 ${i + 1} 条记录: ${error.message}`);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: successCount,
|
||||||
|
total: datasets.length,
|
||||||
|
failed: errors.length,
|
||||||
|
skipped: skippedCount,
|
||||||
|
errors,
|
||||||
|
sourceInfo
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Import datasets error:', error);
|
||||||
|
return NextResponse.json({ error: error.message }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,89 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { getDatasetsById, updateDataset } from '@/lib/db/datasets';
|
||||||
|
import { getQuestionById } from '@/lib/db/questions';
|
||||||
|
import { getChunkById } from '@/lib/db/chunks';
|
||||||
|
import LLMClient from '@/lib/llm/core/index';
|
||||||
|
import { getNewAnswerPrompt } from '@/lib/llm/prompts/newAnswer';
|
||||||
|
import { extractJsonFromLLMOutput } from '@/lib/llm/common/util';
|
||||||
|
|
||||||
|
// 优化数据集答案
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
|
||||||
|
// 验证项目ID
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 获取请求体
|
||||||
|
const { datasetId, model, advice, language } = await request.json();
|
||||||
|
|
||||||
|
if (!datasetId) {
|
||||||
|
return NextResponse.json({ error: 'Dataset ID cannot be empty' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!model) {
|
||||||
|
return NextResponse.json({ error: 'Model cannot be empty' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!advice) {
|
||||||
|
return NextResponse.json({ error: 'Please provide optimization suggestions' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 获取数据集内容
|
||||||
|
const dataset = await getDatasetsById(datasetId);
|
||||||
|
if (!dataset) {
|
||||||
|
return NextResponse.json({ error: 'Dataset does not exist' }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 创建LLM客户端
|
||||||
|
const llmClient = new LLMClient(model);
|
||||||
|
|
||||||
|
const { question, answer, cot, chunkContent: storedChunkContent, questionId } = dataset;
|
||||||
|
|
||||||
|
let chunkContent = storedChunkContent || '';
|
||||||
|
|
||||||
|
if (!chunkContent && questionId) {
|
||||||
|
try {
|
||||||
|
const questionRecord = await getQuestionById(questionId);
|
||||||
|
if (questionRecord?.chunkId) {
|
||||||
|
const chunkRecord = await getChunkById(questionRecord.chunkId);
|
||||||
|
chunkContent = chunkRecord?.content || '';
|
||||||
|
}
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to load chunk content by questionId:', error);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// 生成优化后的答案和思维链
|
||||||
|
const prompt = await getNewAnswerPrompt(language, { question, answer, cot, advice, chunkContent }, projectId);
|
||||||
|
|
||||||
|
const response = await llmClient.getResponse(prompt);
|
||||||
|
|
||||||
|
// 从LLM输出中提取JSON格式的优化结果
|
||||||
|
const optimizedResult = extractJsonFromLLMOutput(response);
|
||||||
|
|
||||||
|
if (!optimizedResult || !optimizedResult.answer) {
|
||||||
|
return NextResponse.json({ error: 'Failed to optimize answer, please try again' }, { status: 500 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 更新数据集
|
||||||
|
const updatedDataset = {
|
||||||
|
...dataset,
|
||||||
|
answer: optimizedResult.answer,
|
||||||
|
cot: cot ? optimizedResult.cot || cot : '' // 如果没有提供思考过程,则不更新
|
||||||
|
};
|
||||||
|
|
||||||
|
await updateDataset(updatedDataset);
|
||||||
|
|
||||||
|
// 返回优化后的数据集
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
dataset: updatedDataset
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to optimize answer:', String(error));
|
||||||
|
return NextResponse.json({ error: error.message || 'Failed to optimize answer' }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
193
easy-dataset-main/app/api/projects/[projectId]/datasets/route.js
Normal file
193
easy-dataset-main/app/api/projects/[projectId]/datasets/route.js
Normal file
@@ -0,0 +1,193 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import {
|
||||||
|
deleteDataset,
|
||||||
|
getDatasetsByPagination,
|
||||||
|
getDatasetsIds,
|
||||||
|
getDatasetsById,
|
||||||
|
updateDataset
|
||||||
|
} from '@/lib/db/datasets';
|
||||||
|
import datasetService from '@/lib/services/datasets';
|
||||||
|
|
||||||
|
// 优化思维链函数已移至服务层
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 生成数据集(为单个问题生成答案)
|
||||||
|
*/
|
||||||
|
export async function POST(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const { questionId, model, language } = await request.json();
|
||||||
|
|
||||||
|
// 使用数据集生成服务
|
||||||
|
const result = await datasetService.generateDatasetForQuestion(projectId, questionId, {
|
||||||
|
model,
|
||||||
|
language
|
||||||
|
});
|
||||||
|
|
||||||
|
return NextResponse.json(result);
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to generate dataset:', String(error));
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: error.message || 'Failed to generate dataset'
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 获取项目的所有数据集
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
// 验证项目ID
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: '项目ID不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
const page = parseInt(searchParams.get('page')) || 1;
|
||||||
|
const size = parseInt(searchParams.get('size')) || 10;
|
||||||
|
const input = searchParams.get('input');
|
||||||
|
const field = searchParams.get('field') || 'question';
|
||||||
|
const status = searchParams.get('status');
|
||||||
|
const hasCot = searchParams.get('hasCot');
|
||||||
|
const isDistill = searchParams.get('isDistill');
|
||||||
|
const scoreRange = searchParams.get('scoreRange');
|
||||||
|
const customTag = searchParams.get('customTag');
|
||||||
|
const noteKeyword = searchParams.get('noteKeyword');
|
||||||
|
const chunkName = searchParams.get('chunkName');
|
||||||
|
let confirmed = undefined;
|
||||||
|
if (status === 'confirmed') confirmed = true;
|
||||||
|
if (status === 'unconfirmed') confirmed = false;
|
||||||
|
|
||||||
|
let selectedAll = searchParams.get('selectedAll');
|
||||||
|
|
||||||
|
if (selectedAll) {
|
||||||
|
let data = await getDatasetsIds(
|
||||||
|
projectId,
|
||||||
|
confirmed,
|
||||||
|
input,
|
||||||
|
field,
|
||||||
|
hasCot,
|
||||||
|
isDistill,
|
||||||
|
scoreRange,
|
||||||
|
customTag,
|
||||||
|
noteKeyword,
|
||||||
|
chunkName
|
||||||
|
);
|
||||||
|
return NextResponse.json(data);
|
||||||
|
}
|
||||||
|
|
||||||
|
// 获取数据集
|
||||||
|
const datasets = await getDatasetsByPagination(
|
||||||
|
projectId,
|
||||||
|
page,
|
||||||
|
size,
|
||||||
|
confirmed,
|
||||||
|
input,
|
||||||
|
field, // 传递搜索字段参数
|
||||||
|
hasCot, // 传递思维链筛选参数
|
||||||
|
isDistill, // 传递蒸馏数据集筛选参数
|
||||||
|
scoreRange, // 传递评分范围筛选参数
|
||||||
|
customTag, // 传递自定义标签筛选参数
|
||||||
|
noteKeyword, // 传递备注关键字筛选参数
|
||||||
|
chunkName // 传递文本块名称筛选参数
|
||||||
|
);
|
||||||
|
|
||||||
|
return NextResponse.json(datasets);
|
||||||
|
} catch (error) {
|
||||||
|
console.error('获取数据集失败:', String(error));
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: error.message || '获取数据集失败'
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 删除数据集
|
||||||
|
*/
|
||||||
|
export async function DELETE(request) {
|
||||||
|
try {
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
const datasetId = searchParams.get('id');
|
||||||
|
if (!datasetId) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: 'Dataset ID cannot be empty'
|
||||||
|
},
|
||||||
|
{ status: 400 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
await deleteDataset(datasetId);
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
message: 'Dataset deleted successfully'
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to delete dataset:', error);
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: error.message || 'Failed to delete dataset'
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 编辑数据集
|
||||||
|
*/
|
||||||
|
export async function PATCH(request) {
|
||||||
|
try {
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
const datasetId = searchParams.get('id');
|
||||||
|
const { answer, cot, question, confirmed } = await request.json();
|
||||||
|
if (!datasetId) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: 'Dataset ID cannot be empty'
|
||||||
|
},
|
||||||
|
{ status: 400 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
// 获取所有数据集
|
||||||
|
let dataset = await getDatasetsById(datasetId);
|
||||||
|
if (!dataset) {
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: 'Dataset does not exist'
|
||||||
|
},
|
||||||
|
{ status: 404 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
let data = { id: datasetId };
|
||||||
|
if (confirmed !== undefined) data.confirmed = confirmed;
|
||||||
|
if (answer) data.answer = answer;
|
||||||
|
if (cot) data.cot = cot;
|
||||||
|
if (question) data.question = question;
|
||||||
|
|
||||||
|
// 保存更新后的数据集列表
|
||||||
|
await updateDataset(data);
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
message: 'Dataset updated successfully',
|
||||||
|
dataset: dataset
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('Failed to update dataset:', String(error));
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: error.message || 'Failed to update dataset'
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,28 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
import { getUsedCustomTags } from '@/lib/db/datasets';
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 获取项目中使用过的自定义标签
|
||||||
|
*/
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { projectId } = params;
|
||||||
|
|
||||||
|
// 验证项目ID
|
||||||
|
if (!projectId) {
|
||||||
|
return NextResponse.json({ error: '项目ID不能为空' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
const tags = await getUsedCustomTags(projectId);
|
||||||
|
|
||||||
|
return NextResponse.json({ tags });
|
||||||
|
} catch (error) {
|
||||||
|
console.error('获取自定义标签失败:', String(error));
|
||||||
|
return NextResponse.json(
|
||||||
|
{
|
||||||
|
error: error.message || '获取自定义标签失败'
|
||||||
|
},
|
||||||
|
{ status: 500 }
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,38 @@
|
|||||||
|
import { NextResponse } from 'next/server';
|
||||||
|
|
||||||
|
// 获取默认提示词内容
|
||||||
|
export async function GET(request, { params }) {
|
||||||
|
try {
|
||||||
|
const { searchParams } = new URL(request.url);
|
||||||
|
const promptType = searchParams.get('promptType');
|
||||||
|
const promptKey = searchParams.get('promptKey');
|
||||||
|
|
||||||
|
if (!promptType || !promptKey) {
|
||||||
|
return NextResponse.json({ error: 'promptType and promptKey are required' }, { status: 400 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 动态导入对应的提示词模块
|
||||||
|
let promptModule;
|
||||||
|
try {
|
||||||
|
promptModule = await import(`@/lib/llm/prompts/${promptType}`);
|
||||||
|
} catch (error) {
|
||||||
|
return NextResponse.json({ error: `Prompt module ${promptType} not found` }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
// 获取指定的提示词常量
|
||||||
|
const promptContent = promptModule[promptKey];
|
||||||
|
if (!promptContent) {
|
||||||
|
return NextResponse.json({ error: `Prompt key ${promptKey} not found in module ${promptType}` }, { status: 404 });
|
||||||
|
}
|
||||||
|
|
||||||
|
return NextResponse.json({
|
||||||
|
success: true,
|
||||||
|
content: promptContent,
|
||||||
|
promptType,
|
||||||
|
promptKey
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
console.error('获取默认提示词失败:', error);
|
||||||
|
return NextResponse.json({ error: error.message }, { status: 500 });
|
||||||
|
}
|
||||||
|
}
|
||||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user