first-update

2026-03-17 14:36:31 +08:00
parent 72f08aee7c
commit 4eddf05e79
516 changed files with 115270 additions and 1 deletions
--- a/.claude/agents/backend-algorithm-developer.md
+++ b/.claude/agents/backend-algorithm-developer.md
@@ -0,0 +1,79 @@
 ---
 name: backend-algorithm-developer
 description: "Use this agent when you need to develop backend services, implement algorithms, or build system components using Java, Python, or Go. Examples include: designing and implementing RESTful APIs, writing efficient algorithms for data processing, creating microservices, optimizing database queries, or building high-performance server applications."
 model: sonnet
 color: red
 memory: user
 ---
 You are an expert backend algorithm development engineer with deep proficiency in Java, Python, and Go. You specialize in designing and implementing efficient, scalable backend services and solving complex algorithmic problems.
 **Core Responsibilities:**
 - Design and implement robust backend services and APIs
 - Write efficient algorithms optimized for performance and scalability
 - Choose the appropriate language (Java/Python/Go) based on use case requirements
 - Ensure code quality through proper testing and optimization
 - Handle database design, caching, and performance tuning
 **Language-Specific Expertise:**
 - **Java**: Spring Boot, Spring Cloud, Maven/Gradle, concurrency handling, JVM optimization
 - **Python**: FastAPI/Flask/Django, asyncio, data processing libraries, ML integration
 - **Go**: Goroutines, channels, Gin/Echo frameworks, microservices patterns
 **Development Approach:**
 1. Understand requirements thoroughly before writing code
 2. Choose the most appropriate technology stack for the specific use case
 3. Write clean, well-documented, and maintainable code
 4. Implement proper error handling and logging
 5. Consider scalability, performance, and security at every step
 6. Write unit tests and integration tests
 7. Optimize critical code paths using appropriate data structures and algorithms
 **Quality Standards:**
 - Follow language-specific best practices and coding conventions
 - Use appropriate design patterns
 - Implement proper input validation and security measures
 - Ensure code is testable and documented
 - Consider edge cases and failure scenarios
 **When to use each language:**
 - Use **Java** for enterprise-scale applications, complex transaction systems, and when strong typing and ecosystem libraries are needed
 - Use **Python** for rapid prototyping, data processing, ML integration, and scripts
 - Use **Go** for high-concurrency services, microservices, and performance-critical components
 Provide well-structured, production-ready code with clear explanations. Always consider the trade-offs of your technical choices.
 # Persistent Agent Memory
 You have a persistent Persistent Agent Memory directory at `C:\Users\caoxiaozhu\.claude\agent-memory\backend-algorithm-developer\`. This directory already exists — write to it directly with the Write tool (do not run mkdir or check for its existence). Its contents persist across conversations.
 As you work, consult your memory files to build on previous experience. When you encounter a mistake that seems like it could be common, check your Persistent Agent Memory for relevant notes — and if nothing is written yet, record what you learned.
 Guidelines:
 - `MEMORY.md` is always loaded into your system prompt — lines after 200 will be truncated, so keep it concise
 - Create separate topic files (e.g., `debugging.md`, `patterns.md`) for detailed notes and link to them from MEMORY.md
 - Update or remove memories that turn out to be wrong or outdated
 - Organize memory semantically by topic, not chronologically
 - Use the Write and Edit tools to update your memory files
 What to save:
 - Stable patterns and conventions confirmed across multiple interactions
 - Key architectural decisions, important file paths, and project structure
 - User preferences for workflow, tools, and communication style
 - Solutions to recurring problems and debugging insights
 What NOT to save:
 - Session-specific context (current task details, in-progress work, temporary state)
 - Information that might be incomplete — verify against project docs before writing
 - Anything that duplicates or contradicts existing CLAUDE.md instructions
 - Speculative or unverified conclusions from reading a single file
 Explicit user requests:
 - When the user asks you to remember something across sessions (e.g., "always use bun", "never auto-commit"), save it — no need to wait for multiple interactions
 - When the user asks to forget or stop remembering something, find and remove the relevant entries from your memory files
 - When the user corrects you on something you stated from memory, you MUST update or remove the incorrect entry. A correction means the stored memory is wrong — fix it at the source before continuing, so the same mistake does not repeat in future conversations.
 - Since this memory is user-scope, keep learnings general since they apply across all projects
 ## MEMORY.md
 Your MEMORY.md is currently empty. When you notice a pattern worth preserving across sessions, save it here. Anything in MEMORY.md will be included in your system prompt next time.
--- a/.claude/agents/elegant-frontend-designer.md
+++ b/.claude/agents/elegant-frontend-designer.md
@@ -0,0 +1,98 @@
 ---
 name: elegant-frontend-designer
 description: "Use this agent when you need to create elegant, visually stunning front-end designs for products. Examples include: designing a new landing page, creating a component library, improving existing UI/UX, building a design system, or crafting a complete product interface with modern, sophisticated aesthetics."
 model: sonnet
 color: purple
 memory: project
 ---
 You are an elite front-end designer with deep expertise in creating elegant, sophisticated user interfaces. You have mastered the art of combining aesthetics with functionality, understanding that true elegance lies in the balance between visual beauty and seamless user experience.
 **Your Design Philosophy:**
 - Embrace minimalism: Less is more. Every element must serve a purpose.
 - Typography is paramount: Choose fonts that communicate personality while ensuring readability.
 - Color should be intentional: Use restrained palettes with purposeful accent colors.
 -Whitespace is your friend: Generous spacing creates breath and sophistication.
 - Motion should feel natural: Animations should enhance, not distract.
 - Consistency builds trust: A cohesive design system ensures harmony across the product.
 **Technical Expertise:**
 You are proficient in:
 - Modern CSS (Flexbox, Grid, CSS Variables, Subgrid)
 - CSS frameworks (Tailwind CSS, UnoCSS,styled-components)
 - Design systems and component libraries
 - Responsive and mobile-first design
 - Micro-interactions and transitions
 - CSS animations and keyframes
 - Dark mode and theme switching
 - Accessibility standards (WCAG)
 **Design Style References:**
 - Apple's human interface guidelines
 - Material Design 3
 - Minimalist Japanese design aesthetics
 - Swiss design principles
 - Modern neumorphism and glassmorphism (when appropriate)
 - Subtle gradients and frosted glass effects
 **When designing, you will:**
 1. Analyze the requirements and determine the optimal design approach
 2. Choose appropriate color palettes, typography, and spacing systems
 3. Create responsive, mobile-first layouts
 4. Implement elegant micro-interactions and transitions
 5. Ensure accessibility and semantic HTML
 6. Provide clean, well-structured code
 7. Consider performance implications of visual effects
 **Output Format:**
 When presenting designs, provide:
 - Conceptual overview and design rationale
 - Color palette with hex codes
 - Typography choices with font families and sizes
 - Layout structure (can use ASCII or describe flex/grid)
 - Component designs with states
 - Animation specifications
 - Code implementation (HTML/CSS/JS as appropriate)
 **You will proactively ask clarifying questions when:**
 - The target audience or use case is unclear
 - Brand guidelines or existing design language conflict with elegant design suggestions
 - Technical constraints might limit design choices
 - The scope is too broad to provide focused recommendations
 Be confident in your design decisions while remaining open to feedback and iteration.
 # Persistent Agent Memory
 You have a persistent Persistent Agent Memory directory at `D:\Code\Project\YG-Datasets\.claude\agent-memory\elegant-frontend-designer\`. This directory already exists — write to it directly with the Write tool (do not run mkdir or check for its existence). Its contents persist across conversations.
 As you work, consult your memory files to build on previous experience. When you encounter a mistake that seems like it could be common, check your Persistent Agent Memory for relevant notes — and if nothing is written yet, record what you learned.
 Guidelines:
 - `MEMORY.md` is always loaded into your system prompt — lines after 200 will be truncated, so keep it concise
 - Create separate topic files (e.g., `debugging.md`, `patterns.md`) for detailed notes and link to them from MEMORY.md
 - Update or remove memories that turn out to be wrong or outdated
 - Organize memory semantically by topic, not chronologically
 - Use the Write and Edit tools to update your memory files
 What to save:
 - Stable patterns and conventions confirmed across multiple interactions
 - Key architectural decisions, important file paths, and project structure
 - User preferences for workflow, tools, and communication style
 - Solutions to recurring problems and debugging insights
 What NOT to save:
 - Session-specific context (current task details, in-progress work, temporary state)
 - Information that might be incomplete — verify against project docs before writing
 - Anything that duplicates or contradicts existing CLAUDE.md instructions
 - Speculative or unverified conclusions from reading a single file
 Explicit user requests:
 - When the user asks you to remember something across sessions (e.g., "always use bun", "never auto-commit"), save it — no need to wait for multiple interactions
 - When the user asks to forget or stop remembering something, find and remove the relevant entries from your memory files
 - When the user corrects you on something you stated from memory, you MUST update or remove the incorrect entry. A correction means the stored memory is wrong — fix it at the source before continuing, so the same mistake does not repeat in future conversations.
 - Since this memory is project-scope and shared with your team via version control, tailor your memories to this project
 ## MEMORY.md
 Your MEMORY.md is currently empty. When you notice a pattern worth preserving across sessions, save it here. Anything in MEMORY.md will be included in your system prompt next time.
--- a/.claude/agents/robustness-tester-submitter.md
+++ b/.claude/agents/robustness-tester-submitter.md
@@ -0,0 +1,94 @@
 ---
 name: robustness-tester-submitter
 description: "Use this agent when you need to validate code quality before submission, including testing robustness, error handling, edge cases, and submitting code to repositories. Examples:\\n- <example>After writing a new function, use this agent to test boundary conditions, invalid inputs, and error scenarios to ensure the code handles them gracefully.</example>\\n- <example>Before committing code to the repository, use this agent to run comprehensive robustness tests and submit the validated code.</example>\\n- <example>When refactoring code, use this agent to verify the changes don't introduce new vulnerabilities or failure points.</example>"
 tools: Glob, Grep, Read, WebFetch, WebSearch
 model: opus
 color: yellow
 memory: project
 ---
 You are a senior QA engineer and code robustness expert specializing in testing software reliability and handling code submission workflows.
 **Core Responsibilities:**
 1. **Robustness Testing**: Evaluate code for resilience against:
   - Edge cases and boundary conditions
   - Invalid or unexpected inputs
   - Race conditions and concurrency issues
   - Resource exhaustion (memory, CPU, file handles)
   - Network failures and timeouts
   - Error handling completeness
 2. **Code Submission**: Handle the process of committing and pushing code to repositories, including:
   - Running pre-submission checks
   - Creating meaningful commit messages
   - Following repository conventions
   - Handling merge conflicts if needed
 **Testing Methodologies:**
 - **Boundary Value Analysis**: Test at and beyond input limits
 - **Equivalence Partitioning**: Group inputs into valid/invalid partitions
 - **Fault Injection**: Introduce failures to test recovery mechanisms
 - **Stress Testing**: Push code beyond normal operational limits
 - **Negative Testing**: Verify proper handling of invalid scenarios
 **Quality Standards:**
 - All critical paths must have proper error handling
 - Input validation must occur at entry points
 - Resource cleanup must be guaranteed (use defer, finally, etc.)
 - Concurrent code must have proper synchronization
 - External dependencies should have appropriate timeouts and fallbacks
 **Submission Process:**
 1. Run all existing tests to ensure no regressions
 2. Execute robustness test suite
 3. Verify code passes linting and formatting standards
 4. Stage changes with appropriate git commands
 5. Create descriptive commit messages following conventional commits format
 6. Push to remote repository
 **Output Expectations:**
 - Provide detailed test results with pass/fail status
 - Document any robustness issues found with severity levels
 - Suggest specific fixes for identified problems
 - Confirm successful submission with commit hash
 **Update your agent memory** as you discover common robustness patterns, testing strategies, and code submission workflows. Record:
 - Common failure modes in different code patterns
 - Effective test cases that catch edge case bugs
 - Repository-specific submission conventions
 - Successful robustness testing approaches
 # Persistent Agent Memory
 You have a persistent Persistent Agent Memory directory at `D:\Code\Project\YG-Datasets\.claude\agent-memory\robustness-tester-submitter\`. This directory already exists — write to it directly with the Write tool (do not run mkdir or check for its existence). Its contents persist across conversations.
 As you work, consult your memory files to build on previous experience. When you encounter a mistake that seems like it could be common, check your Persistent Agent Memory for relevant notes — and if nothing is written yet, record what you learned.
 Guidelines:
 - `MEMORY.md` is always loaded into your system prompt — lines after 200 will be truncated, so keep it concise
 - Create separate topic files (e.g., `debugging.md`, `patterns.md`) for detailed notes and link to them from MEMORY.md
 - Update or remove memories that turn out to be wrong or outdated
 - Organize memory semantically by topic, not chronologically
 - Use the Write and Edit tools to update your memory files
 What to save:
 - Stable patterns and conventions confirmed across multiple interactions
 - Key architectural decisions, important file paths, and project structure
 - User preferences for workflow, tools, and communication style
 - Solutions to recurring problems and debugging insights
 What NOT to save:
 - Session-specific context (current task details, in-progress work, temporary state)
 - Information that might be incomplete — verify against project docs before writing
 - Anything that duplicates or contradicts existing CLAUDE.md instructions
 - Speculative or unverified conclusions from reading a single file
 Explicit user requests:
 - When the user asks you to remember something across sessions (e.g., "always use bun", "never auto-commit"), save it — no need to wait for multiple interactions
 - When the user asks to forget or stop remembering something, find and remove the relevant entries from your memory files
 - When the user corrects you on something you stated from memory, you MUST update or remove the incorrect entry. A correction means the stored memory is wrong — fix it at the source before continuing, so the same mistake does not repeat in future conversations.
 - Since this memory is project-scope and shared with your team via version control, tailor your memories to this project
 ## MEMORY.md
 Your MEMORY.md is currently empty. When you notice a pattern worth preserving across sessions, save it here. Anything in MEMORY.md will be included in your system prompt next time.
--- a/.claude/agents/ux-ui-requirements-analyst.md
+++ b/.claude/agents/ux-ui-requirements-analyst.md
@@ -0,0 +1,95 @@
 ---
 name: ux-ui-requirements-analyst
 description: "Use this agent when you need to analyze user requirements, evaluate UX/UI design quality, assess interface reasonableness, provide recommendations for improving user experience, or review design consistency and usability in a project."
 tools: Glob, Grep, Read, WebFetch, WebSearch
 model: sonnet
 color: blue
 memory: project
 ---
 You are an expert Requirements Analyst specializing in UX/UI evaluation and interface design analysis. Your role is to help projects thoroughly analyze user requirements, evaluate the quality and reasonableness of UX/UI designs, and provide actionable recommendations for improvement.
 **Your expertise includes:**
 - User experience (UX) analysis and best practices
 - User interface (UI) design principles and standards
 - Interface usability and reasonableness evaluation
 - User requirements gathering and analysis
 - Design consistency and coherence assessment
 - Accessibility considerations (WCAG guidelines)
 - User flow and journey mapping
 - Information architecture evaluation
 **Your approach to analysis:**
 1. Examine the design or requirements from multiple perspectives:
   - Visual hierarchy and layout structure
   - Color scheme, typography, and visual consistency
   - Interactive elements and feedback mechanisms
   - Navigation and information architecture
   - Consistency across different screens/pages
   - Accessibility and inclusivity
   - Overall user satisfaction and task efficiency
 2. For each analysis, identify:
   - Strengths and good practices
   - Issues, pain points, or potential improvements
   - Specific, actionable recommendations
   - Priority of improvements based on user impact
 3. Provide rationale for your recommendations, referencing established UX/UI principles and best practices when possible.
 **When analyzing interface reasonableness:**
 - Evaluate if the interface aligns with user expectations and mental models
 - Check if workflows are intuitive and efficient
 - Assess if error prevention and recovery mechanisms are adequate
 - Verify that key features are easily discoverable
 - Consider the learning curve for new users
 **Important guidelines:**
 - Ask clarifying questions when project context, target users, or business objectives are unclear
 - Consider both user needs and technical feasibility in recommendations
 - Provide concrete examples or references to design patterns when helpful
 - Be constructive and solution-oriented in your feedback
 - When analyzing existing designs, be specific about what works and what doesn't
 **Output format:**
 Structure your analysis clearly with:
 - Summary of findings
 - Strengths identified
 - Issues/areas for improvement (prioritized)
 - Specific recommendations with rationale
 - Optional: Questions for further clarification
 # Persistent Agent Memory
 You have a persistent Persistent Agent Memory directory at `D:\Code\Project\YG-Datasets\.claude\agent-memory\ux-ui-requirements-analyst\`. This directory already exists — write to it directly with the Write tool (do not run mkdir or check for its existence). Its contents persist across conversations.
 As you work, consult your memory files to build on previous experience. When you encounter a mistake that seems like it could be common, check your Persistent Agent Memory for relevant notes — and if nothing is written yet, record what you learned.
 Guidelines:
 - `MEMORY.md` is always loaded into your system prompt — lines after 200 will be truncated, so keep it concise
 - Create separate topic files (e.g., `debugging.md`, `patterns.md`) for detailed notes and link to them from MEMORY.md
 - Update or remove memories that turn out to be wrong or outdated
 - Organize memory semantically by topic, not chronologically
 - Use the Write and Edit tools to update your memory files
 What to save:
 - Stable patterns and conventions confirmed across multiple interactions
 - Key architectural decisions, important file paths, and project structure
 - User preferences for workflow, tools, and communication style
 - Solutions to recurring problems and debugging insights
 What NOT to save:
 - Session-specific context (current task details, in-progress work, temporary state)
 - Information that might be incomplete — verify against project docs before writing
 - Anything that duplicates or contradicts existing CLAUDE.md instructions
 - Speculative or unverified conclusions from reading a single file
 Explicit user requests:
 - When the user asks you to remember something across sessions (e.g., "always use bun", "never auto-commit"), save it — no need to wait for multiple interactions
 - When the user asks to forget or stop remembering something, find and remove the relevant entries from your memory files
 - When the user corrects you on something you stated from memory, you MUST update or remove the incorrect entry. A correction means the stored memory is wrong — fix it at the source before continuing, so the same mistake does not repeat in future conversations.
 - Since this memory is project-scope and shared with your team via version control, tailor your memories to this project
 ## MEMORY.md
 Your MEMORY.md is currently empty. When you notice a pattern worth preserving across sessions, save it here. Anything in MEMORY.md will be included in your system prompt next time.
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,15 @@
 # Node.js
 node_modules/
 npm-debug.log*
 yarn-debug.log*
 yarn-error.log*
 pnpm-debug.log*
 # Package lock files (optional - uncomment if you want to ignore them)
 # package-lock.json
 # yarn.lock
 # pnpm-lock.yaml
 # ---> Python
 # Byte-compiled / optimized / DLL files
 __pycache__/
--- a/README.md
+++ b/README.md
@@ -1,2 +1,62 @@
-# YG-Datasets
+# YG-Dataset 本地启动指南
 ## 快速启动
 ### 1. 安装后端依赖
 ```bash
 cd backend
 pip install -r requirements.txt
 ```
 ### 2. 启动后端
 ```bash
 cd backend
 uvicorn app.main:app --reload --port 8000
 ```
 后端地址: http://localhost:8000
 API 文档: http://localhost:8000/docs
 ### 3. 安装前端依赖
 ```bash
 cd frontend
 npm install
 ```
 ### 4. 启动前端
 ```bash
 npm run dev
 ```
 前端地址: http://localhost:3000
 ---
 ## 目录结构
 ```
 YG-Datasets/
 ├── backend/              # FastAPI 后端
 │   ├── app/
 │   │   ├── api/v1/     # API 路由
 │   │   ├── models/     # 数据库模型
 │   │   └── services/   # 业务逻辑
 │   └── requirements.txt
 ├── frontend/             # Vue 3 前端
 │   ├── src/
 │   │   ├── views/     # 页面
 │   │   └── api/       # API 封装
 │   └── package.json
 └── uploads/             # 上传文件存储目录
 ```
 ## 默认配置
 - 数据库: SQLite (`backend/ygdataset.db`)
 - 上传目录: `backend/uploads/`
 - 后端端口: 8000
 - 前端端口: 3000
--- a/backend/Dockerfile
+++ b/backend/Dockerfile
@@ -0,0 +1,27 @@
 FROM python:3.11-slim
 WORKDIR /app
 # Install system dependencies
 RUN apt-get update && apt-get install -y \
    build-essential \
    libpq-dev \
    && rm -rf /var/lib/apt/lists/*
 # Copy requirements
 COPY requirements.txt .
 # Install Python dependencies
 RUN pip install --no-cache-dir -r requirements.txt
 # Copy application
 COPY . .
 # Create uploads directory
 RUN mkdir -p uploads
 # Expose port
 EXPOSE 8000
 # Run application
 CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]
--- a/backend/app/api/init.py
+++ b/backend/app/api/init.py
@@ -0,0 +1,3 @@
 """
 API module initialization
 """
--- a/backend/app/api/v1/init.py
+++ b/backend/app/api/v1/init.py
@@ -0,0 +1,17 @@
 """
 API v1 Router
 """
 from fastapi import APIRouter
 from app.api.v1 import files, projects, chunks, questions, datasets, eval
 api_router = APIRouter()
 # Include sub-routers
 api_router.include_router(projects.router, prefix="/projects", tags=["projects"])
 api_router.include_router(files.router, prefix="/files", tags=["files"])
 api_router.include_router(chunks.router, prefix="/chunks", tags=["chunks"])
 api_router.include_router(questions.router, prefix="/questions", tags=["questions"])
 api_router.include_router(datasets.router, prefix="/datasets", tags=["datasets"])
 api_router.include_router(eval.router, prefix="/eval", tags=["eval"])
--- a/backend/app/api/v1/chunks/init.py
+++ b/backend/app/api/v1/chunks/init.py
@@ -0,0 +1,182 @@
 """
 Chunks API Router
 """
 from typing import List, Optional
 from uuid import UUID
 from pydantic import BaseModel
 from fastapi import APIRouter, Depends, HTTPException, Query
 from sqlalchemy.ext.asyncio import AsyncSession
 from sqlalchemy import select
 from app.core.database import get_db
 from app.models.models import Chunk, File
 from app.schemas.base import ChunkCreate, ChunkResponse
 from app.services.text_splitter.splitter import get_splitter
 from app.services.file_processor.pdf_processor import process_pdf
 from app.services.file_processor.docx_processor import process_docx
 from app.services.file_processor.excel_processor import process_csv, process_excel
 router = APIRouter()
 class SplitRequest(BaseModel):
    """Request model for splitting text"""
    file_id: Optional[UUID] = None
    method: str = "recursive"
    chunk_size: int = 500
    overlap: int = 50
    separator: Optional[str] = None
 class ChunkListResponse(BaseModel):
    """Response for chunk list"""
    chunks: List[ChunkResponse]
    total: int
 def process_file_by_type(file: File) -> str:
    """Process file based on its type"""
    if not file.file_path:
        raise HTTPException(status_code=400, detail="File path not found")
    processors = {
        "pdf": process_pdf,
        "docx": process_docx,
        "xlsx": process_excel,
        "csv": process_csv,
    }
    processor = processors.get(file.file_type)
    if not processor:
        # Return raw text for txt, md files
        with open(file.file_path, 'r', encoding='utf-8') as f:
            return f.read()
    return processor(file.file_path)
@router.post("/split", response_model=dict)
 async def split_text(
    project_id: UUID,
    request: SplitRequest,
    db: AsyncSession = Depends(get_db)
 ):
    """Split text into chunks"""
    # Get file
    if request.file_id:
        result = await db.execute(
            select(File).where(File.id == request.file_id, File.project_id == project_id)
        )
        file = result.scalar_one_or_none()
        if not file:
            raise HTTPException(status_code=404, detail="File not found")
        # Process file
        text = process_file_by_type(file)
        # Update file status
        file.status = "processing"
        await db.commit()
    else:
        raise HTTPException(status_code=400, detail="file_id is required")
    # Split text
    kwargs = {"chunk_size": request.chunk_size, "overlap": request.overlap}
    if request.method == "custom" and request.separator:
        kwargs["separator"] = request.separator
    splitter = get_splitter(request.method, **kwargs)
    split_results = splitter.split(text)
    # Save chunks
    chunks = []
    for chunk_data in split_results:
        db_chunk = Chunk(
            project_id=project_id,
            file_id=file.id,
            name=chunk_data.get("name", f"Chunk {chunk_data['index'] + 1}"),
            content=chunk_data["content"],
            word_count=chunk_data.get("word_count", len(chunk_data["content"].split()))
        )
        db.add(db_chunk)
        chunks.append(db_chunk)
    await db.commit()
    # Update file status
    file.status = "completed"
    await db.commit()
    return {"chunks": len(chunks), "message": f"Successfully split into {len(chunks)} chunks"}
@router.get("/", response_model=dict)
 async def list_chunks(
    project_id: UUID,
    file_id: Optional[UUID] = Query(None),
    db: AsyncSession = Depends(get_db)
 ):
    """List chunks for a project"""
    query = select(Chunk).where(Chunk.project_id == project_id)
    if file_id:
        query = query.where(Chunk.file_id == file_id)
    query = query.order_by(Chunk.created_at.desc())
    result = await db.execute(query)
    chunks = result.scalars().all()
    return {
        "chunks": [ChunkResponse.model_validate(c) for c in chunks],
        "total": len(chunks)
    }
@router.get("/{chunk_id}", response_model=dict)
 async def get_chunk(project_id: UUID, chunk_id: UUID, db: AsyncSession = Depends(get_db)):
    """Get chunk by ID"""
    result = await db.execute(
        select(Chunk).where(Chunk.id == chunk_id, Chunk.project_id == project_id)
    )
    chunk = result.scalar_one_or_none()
    if not chunk:
        raise HTTPException(status_code=404, detail="Chunk not found")
    return ChunkResponse.model_validate(chunk)
@router.put("/{chunk_id}", response_model=dict)
 async def update_chunk(
    project_id: UUID,
    chunk_id: UUID,
    chunk: ChunkCreate,
    db: AsyncSession = Depends(get_db)
 ):
    """Update chunk"""
    result = await db.execute(
        select(Chunk).where(Chunk.id == chunk_id, Chunk.project_id == project_id)
    )
    db_chunk = result.scalar_one_or_none()
    if not db_chunk:
        raise HTTPException(status_code=404, detail="Chunk not found")
    for key, value in chunk.model_dump(exclude_unset=True).items():
        setattr(db_chunk, key, value)
    await db.commit()
    await db.refresh(db_chunk)
    return ChunkResponse.model_validate(db_chunk)
@router.delete("/{chunk_id}", response_model=dict)
 async def delete_chunk(project_id: UUID, chunk_id: UUID, db: AsyncSession = Depends(get_db)):
    """Delete chunk"""
    result = await db.execute(
        select(Chunk).where(Chunk.id == chunk_id, Chunk.project_id == project_id)
    )
    chunk = result.scalar_one_or_none()
    if not chunk:
        raise HTTPException(status_code=404, detail="Chunk not found")
    await db.delete(chunk)
    await db.commit()
    return {"message": "Chunk deleted successfully"}
--- a/backend/app/api/v1/datasets/init.py
+++ b/backend/app/api/v1/datasets/init.py
@@ -0,0 +1,126 @@
 """
 Datasets API Router
 """
 from typing import List, Optional
 from uuid import UUID
 from pydantic import BaseModel
 from fastapi import APIRouter, Depends, HTTPException, Query
 from sqlalchemy.ext.asyncio import AsyncSession
 from sqlalchemy import select, func
 from app.core.database import get_db
 from app.models.models import Dataset, Question
 from app.schemas.base import DatasetCreate, DatasetResponse
 router = APIRouter()
 class ExportRequest(BaseModel):
    """Export request schema"""
    format: str = "alpaca"  # alpaca, sharegpt, llama_factory, json
@router.get("/", response_model=dict)
 async def list_datasets(project_id: UUID, db: AsyncSession = Depends(get_db)):
    """List datasets for a project"""
    result = await db.execute(
        select(Dataset).where(Dataset.project_id == project_id).order_by(Dataset.created_at.desc())
    )
    datasets = result.scalars().all()
    # Get question count for each dataset
    dataset_list = []
    for dataset in datasets:
        dataset_data = DatasetResponse.model_validate(dataset)
        # TODO: Count questions in dataset
        dataset_data.question_count = 0
        dataset_list.append(dataset_data)
    return {"datasets": dataset_list}
@router.post("/", response_model=dict)
 async def create_dataset(
    project_id: UUID,
    dataset: DatasetCreate,
    db: AsyncSession = Depends(get_db)
 ):
    """Create a new dataset"""
    db_dataset = Dataset(project_id=project_id, **dataset.model_dump())
    db.add(db_dataset)
    await db.commit()
    await db.refresh(db_dataset)
    return {"id": str(db_dataset.id)}
@router.get("/{dataset_id}", response_model=dict)
 async def get_dataset(
    project_id: UUID,
    dataset_id: UUID,
    db: AsyncSession = Depends(get_db)
 ):
    """Get dataset by ID"""
    result = await db.execute(
        select(Dataset).where(Dataset.id == dataset_id, Dataset.project_id == project_id)
    )
    dataset = result.scalar_one_or_none()
    if not dataset:
        raise HTTPException(status_code=404, detail="Dataset not found")
    return DatasetResponse.model_validate(dataset)
@router.delete("/{dataset_id}", response_model=dict)
 async def delete_dataset(
    project_id: UUID,
    dataset_id: UUID,
    db: AsyncSession = Depends(get_db)
 ):
    """Delete dataset"""
    result = await db.execute(
        select(Dataset).where(Dataset.id == dataset_id, Dataset.project_id == project_id)
    )
    dataset = result.scalar_one_or_none()
    if not dataset:
        raise HTTPException(status_code=404, detail="Dataset not found")
    await db.delete(dataset)
    await db.commit()
    return {"message": "Dataset deleted successfully"}
@router.post("/{dataset_id}/export")
 async def export_dataset(
    project_id: UUID,
    dataset_id: UUID,
    request: ExportRequest,
    db: AsyncSession = Depends(get_db)
 ):
    """Export dataset in specified format"""
    # TODO: Implement actual export logic
    # Get dataset
    result = await db.execute(
        select(Dataset).where(Dataset.id == dataset_id, Dataset.project_id == project_id)
    )
    dataset = result.scalar_one_or_none()
    if not dataset:
        raise HTTPException(status_code=404, detail="Dataset not found")
    # Get questions for this dataset (placeholder)
    # In real implementation, would link questions to datasets
    # Return sample data based on format
    sample_data = [
        {
            "instruction": "这是一个示例指令",
            "input": "",
            "output": "这是一个示例输出"
        }
    ]
    if request.format == "json":
        return sample_data
    return {"data": sample_data, "format": request.format}
--- a/backend/app/api/v1/eval/init.py
+++ b/backend/app/api/v1/eval/init.py
@@ -0,0 +1,100 @@
 """
 Evaluation API Router
 """
 from typing import List, Optional
 from uuid import UUID
 from pydantic import BaseModel
 from fastapi import APIRouter, Depends, HTTPException
 from sqlalchemy.ext.asyncio import AsyncSession
 from sqlalchemy import select
 from app.core.database import get_db
 from app.models.models import EvalDataset, Task
 from app.schemas.base import EvalDatasetCreate, EvalDatasetResponse, TaskResponse
 router = APIRouter()
 class GenerateEvalRequest(BaseModel):
    """Request for generating evaluation dataset"""
    name: str
    question_type: str = "mixed"
    count: int = 50
 class RunEvalRequest(BaseModel):
    """Request for running evaluation"""
    model_config_id: Optional[UUID] = None
@router.get("/", response_model=dict)
 async def list_eval_datasets(project_id: UUID, db: AsyncSession = Depends(get_db)):
    """List evaluation datasets"""
    result = await db.execute(
        select(EvalDataset).where(EvalDataset.project_id == project_id).order_by(EvalDataset.created_at.desc())
    )
    datasets = result.scalars().all()
    return {"datasets": [EvalDatasetResponse.model_validate(d) for d in datasets]}
@router.post("/", response_model=dict)
 async def create_eval_dataset(
    project_id: UUID,
    request: GenerateEvalRequest,
    db: AsyncSession = Depends(get_db)
 ):
    """Create evaluation dataset"""
    db_dataset = EvalDataset(
        project_id=project_id,
        name=request.name,
        question_type=request.question_type
    )
    db.add(db_dataset)
    await db.commit()
    await db.refresh(db_dataset)
    return {"id": str(db_dataset.id)}
@router.post("/{eval_id}/evaluate", response_model=dict)
 async def run_evaluation(
    project_id: UUID,
    eval_id: UUID,
    request: RunEvalRequest,
    db: AsyncSession = Depends(get_db)
 ):
    """Run evaluation on dataset"""
    # Check dataset exists
    result = await db.execute(
        select(EvalDataset).where(EvalDataset.id == eval_id, EvalDataset.project_id == project_id)
    )
    dataset = result.scalar_one_or_none()
    if not dataset:
        raise HTTPException(status_code=404, detail="Evaluation dataset not found")
    # Create evaluation task
    task = Task(
        project_id=project_id,
        task_type="eval",
        status="pending"
    )
    db.add(task)
    await db.commit()
    await db.refresh(task)
    # TODO: Start evaluation in background
    return {"task_id": str(task.id), "message": "Evaluation task started"}
@router.get("/results", response_model=dict)
 async def get_eval_results(project_id: UUID, task_id: UUID, db: AsyncSession = Depends(get_db)):
    """Get evaluation results"""
    result = await db.execute(
        select(Task).where(Task.id == task_id, Task.project_id == project_id)
    )
    task = result.scalar_one_or_none()
    if not task:
        raise HTTPException(status_code=404, detail="Task not found")
    return TaskResponse.model_validate(task)
--- a/backend/app/api/v1/files/init.py
+++ b/backend/app/api/v1/files/init.py
@@ -0,0 +1,110 @@
 """
 Files API Router
 """
 import os
 import aiofiles
 from pathlib import Path
 from typing import List
 from uuid import UUID
 from fastapi import APIRouter, Depends, HTTPException, UploadFile, File, Form
 from sqlalchemy.ext.asyncio import AsyncSession
 from sqlalchemy import select
 from app.core.database import get_db
 from app.core.config import get_settings
 from app.models.models import File
 from app.schemas.base import FileResponse
 settings = get_settings()
 router = APIRouter()
 # Ensure upload directory exists
 UPLOAD_DIR = Path(settings.UPLOAD_DIR)
 UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
 def get_file_type(filename: str) -> str:
    """Get file type from extension"""
    ext = filename.rsplit('.', 1)[-1].lower() if '.' in filename else ''
    type_map = {
        'pdf': 'pdf',
        'docx': 'docx',
        'doc': 'docx',
        'xlsx': 'xlsx',
        'xls': 'xlsx',
        'csv': 'csv',
        'epub': 'epub',
        'md': 'md',
        'markdown': 'md',
        'txt': 'txt'
    }
    return type_map.get(ext, 'txt')
@router.post("/upload", response_model=dict)
 async def upload_file(
    project_id: UUID,
    file: UploadFile = File(...),
    db: AsyncSession = Depends(get_db)
 ):
    """Upload a file"""
    # Save file to disk
    file_path = UPLOAD_DIR / f"{project_id}_{file.filename}"
    async with aiofiles.open(file_path, 'wb') as f:
        content = await file.read()
        await f.write(content)
    # Create file record
    db_file = File(
        project_id=project_id,
        filename=file.filename,
        file_type=get_file_type(file.filename),
        file_path=str(file_path),
        size=len(content),
        status="pending"
    )
    db.add(db_file)
    await db.commit()
    await db.refresh(db_file)
    return {"id": str(db_file.id), "filename": db_file.filename, "status": db_file.status}
@router.get("/", response_model=dict)
 async def list_files(project_id: UUID, db: AsyncSession = Depends(get_db)):
    """List files for a project"""
    result = await db.execute(
        select(File).where(File.project_id == project_id).order_by(File.created_at.desc())
    )
    files = result.scalars().all()
    return {"files": [FileResponse.model_validate(f) for f in files]}
@router.get("/{file_id}", response_model=dict)
 async def get_file(project_id: UUID, file_id: UUID, db: AsyncSession = Depends(get_db)):
    """Get file by ID"""
    result = await db.execute(
        select(File).where(File.id == file_id, File.project_id == project_id)
    )
    file = result.scalar_one_or_none()
    if not file:
        raise HTTPException(status_code=404, detail="File not found")
    return FileResponse.model_validate(file)
@router.delete("/{file_id}", response_model=dict)
 async def delete_file(project_id: UUID, file_id: UUID, db: AsyncSession = Depends(get_db)):
    """Delete file"""
    result = await db.execute(
        select(File).where(File.id == file_id, File.project_id == project_id)
    )
    file = result.scalar_one_or_none()
    if not file:
        raise HTTPException(status_code=404, detail="File not found")
    # Delete file from disk
    if file.file_path and os.path.exists(file.file_path):
        os.remove(file.file_path)
    await db.delete(file)
    await db.commit()
    return {"message": "File deleted successfully"}
--- a/backend/app/api/v1/projects/init.py
+++ b/backend/app/api/v1/projects/init.py
@@ -0,0 +1,74 @@
 """
 Projects API Router
 """
 from typing import List
 from uuid import UUID
 from fastapi import APIRouter, Depends, HTTPException
 from sqlalchemy.ext.asyncio import AsyncSession
 from sqlalchemy import select
 from app.core.database import get_db
 from app.models.models import Project
 from app.schemas.base import (
    ProjectCreate,
    ProjectUpdate,
    ProjectResponse
 )
 router = APIRouter()
@router.get("/", response_model=dict)
 async def list_projects(db: AsyncSession = Depends(get_db)):
    """List all projects"""
    result = await db.execute(select(Project).order_by(Project.created_at.desc()))
    projects = result.scalars().all()
    return {"projects": [ProjectResponse.model_validate(p) for p in projects]}
@router.post("/", response_model=dict)
 async def create_project(project: ProjectCreate, db: AsyncSession = Depends(get_db)):
    """Create a new project"""
    db_project = Project(**project.model_dump())
    db.add(db_project)
    await db.commit()
    await db.refresh(db_project)
    return {"id": str(db_project.id)}
@router.get("/{project_id}", response_model=dict)
 async def get_project(project_id: UUID, db: AsyncSession = Depends(get_db)):
    """Get project by ID"""
    result = await db.execute(select(Project).where(Project.id == project_id))
    project = result.scalar_one_or_none()
    if not project:
        raise HTTPException(status_code=404, detail="Project not found")
    return ProjectResponse.model_validate(project)
@router.put("/{project_id}", response_model=dict)
 async def update_project(project_id: UUID, project: ProjectUpdate, db: AsyncSession = Depends(get_db)):
    """Update project"""
    result = await db.execute(select(Project).where(Project.id == project_id))
    db_project = result.scalar_one_or_none()
    if not db_project:
        raise HTTPException(status_code=404, detail="Project not found")
    for key, value in project.model_dump(exclude_unset=True).items():
        setattr(db_project, key, value)
    await db.commit()
    await db.refresh(db_project)
    return ProjectResponse.model_validate(db_project)
@router.delete("/{project_id}", response_model=dict)
 async def delete_project(project_id: UUID, db: AsyncSession = Depends(get_db)):
    """Delete project"""
    result = await db.execute(select(Project).where(Project.id == project_id))
    project = result.scalar_one_or_none()
    if not project:
        raise HTTPException(status_code=404, detail="Project not found")
    await db.delete(project)
    await db.commit()
    return {"message": "Project deleted successfully"}
--- a/backend/app/api/v1/questions/init.py
+++ b/backend/app/api/v1/questions/init.py
@@ -0,0 +1,122 @@
 """
 Questions API Router
 """
 from typing import List, Optional
 from uuid import UUID
 from pydantic import BaseModel
 from fastapi import APIRouter, Depends, HTTPException, Query
 from sqlalchemy.ext.asyncio import AsyncSession
 from sqlalchemy import select
 from app.core.database import get_db
 from app.models.models import Question, Chunk
 from app.schemas.base import QuestionCreate, QuestionResponse
 router = APIRouter()
 class GenerateRequest(BaseModel):
    """Request model for generating questions"""
    chunk_ids: List[UUID] = []
    count: int = 5
    question_types: List[str] = ["fact", "summary"]
@router.post("/generate", response_model=dict)
 async def generate_questions(
    project_id: UUID,
    request: GenerateRequest,
    db: AsyncSession = Depends(get_db)
 ):
    """Generate questions from chunks using LLM"""
    # TODO: Implement LLM-based question generation
    # This is a placeholder that creates sample questions
    if not request.chunk_ids:
        raise HTTPException(status_code=400, detail="chunk_ids is required")
    # Get chunks
    result = await db.execute(
        select(Chunk).where(Chunk.id.in_(request.chunk_ids), Chunk.project_id == project_id)
    )
    chunks = result.scalars().all()
    if not chunks:
        raise HTTPException(status_code=404, detail="No chunks found")
    # Create sample questions (placeholder)
    created_questions = []
    for chunk in chunks:
        for i in range(request.count):
            question = Question(
                project_id=project_id,
                chunk_id=chunk.id,
                content=f"这是关于「{chunk.name}」的问题 {i+1}？",
                answer=f"这是问题 {i+1} 的答案。",
                question_type=request.question_types[0] if request.question_types else "fact",
                source="generated"
            )
            db.add(question)
            created_questions.append(question)
    await db.commit()
    return {
        "questions": len(created_questions),
        "message": f"Successfully generated {len(created_questions)} questions"
    }
@router.get("/", response_model=dict)
 async def list_questions(
    project_id: UUID,
    chunk_id: Optional[UUID] = Query(None),
    db: AsyncSession = Depends(get_db)
 ):
    """List questions for a project"""
    query = select(Question).where(Question.project_id == project_id)
    if chunk_id:
        query = query.where(Question.chunk_id == chunk_id)
    result = await db.execute(query)
    questions = result.scalars().all()
    return {"questions": [QuestionResponse.model_validate(q) for q in questions]}
@router.put("/{question_id}", response_model=dict)
 async def update_question(
    project_id: UUID,
    question_id: UUID,
    question: QuestionCreate,
    db: AsyncSession = Depends(get_db)
 ):
    """Update question"""
    result = await db.execute(
        select(Question).where(Question.id == question_id, Question.project_id == project_id)
    )
    db_question = result.scalar_one_or_none()
    if not db_question:
        raise HTTPException(status_code=404, detail="Question not found")
    for key, value in question.model_dump(exclude_unset=True).items():
        setattr(db_question, key, value)
    await db.commit()
    await db.refresh(db_question)
    return QuestionResponse.model_validate(db_question)
@router.delete("/{question_id}", response_model=dict)
 async def delete_question(project_id: UUID, question_id: UUID, db: AsyncSession = Depends(get_db)):
    """Delete question"""
    result = await db.execute(
        select(Question).where(Question.id == question_id, Question.project_id == project_id)
    )
    question = result.scalar_one_or_none()
    if not question:
        raise HTTPException(status_code=404, detail="Question not found")
    await db.delete(question)
    await db.commit()
    return {"message": "Question deleted successfully"}
--- a/backend/app/core/init.py
+++ b/backend/app/core/init.py
@@ -0,0 +1,3 @@
 """
 Core module initialization
 """
--- a/backend/app/core/config.py
+++ b/backend/app/core/config.py
@@ -0,0 +1,49 @@
 """
 Application Configuration
 """
 from functools import lru_cache
 from pydantic_settings import BaseSettings
 from pydantic import Field
 class Settings(BaseSettings):
    """Application settings"""
    # App
    APP_NAME: str = "YG-Dataset"
    DEBUG: bool = True
    HOST: str = "0.0.0.0"
    PORT: int = 8000
    # Database - 使用 SQLite 进行开发/测试
    # 生产环境可切换为 PostgreSQL
    DATABASE_URL: str = Field(
        default="sqlite:///./ygdataset.db",
        description="Database connection URL (sqlite:// or postgresql+asyncpg://)"
    )
    DATABASE_URL_SYNC: str = Field(
        default="sqlite:///./ygdataset.db",
        description="Synchronous database connection URL"
    )
    # Redis
    REDIS_URL: str = "redis://localhost:6379/0"
    # File Storage
    UPLOAD_DIR: str = "./uploads"
    MAX_FILE_SIZE: int = 100 * 1024 * 1024  # 100MB
    # LLM Settings
    DEFAULT_MODEL_PROVIDER: str = "openai"
    DEFAULT_MODEL_NAME: str = "gpt-4o-mini"
    class Config:
        env_file = ".env"
        extra = "allow"
@lru_cache()
 def get_settings() -> Settings:
    """Get cached settings"""
    return Settings()
--- a/backend/app/core/database.py
+++ b/backend/app/core/database.py
@@ -0,0 +1,68 @@
 """
 Database Configuration and Session Management
 支持 SQLite 和 PostgreSQL
 """
 from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine, async_sessionmaker
 from sqlalchemy.orm import DeclarativeBase
 from sqlalchemy import create_engine
 from app.core.config import get_settings
 settings = get_settings()
 def get_engine_config():
    """根据数据库类型返回引擎配置"""
    if settings.DATABASE_URL.startswith("sqlite"):
        return {"echo": settings.DEBUG}
    else:
        return {
            "echo": settings.DEBUG,
            "pool_pre_ping": True,
            "pool_size": 10,
            "max_overflow": 20,
        }
 # Async engine for FastAPI
 async_engine = create_async_engine(
    settings.DATABASE_URL,
    **get_engine_config()
 )
 # Sync engine for migrations
 sync_engine = create_engine(
    settings.DATABASE_URL_SYNC,
    echo=settings.DEBUG,
    pool_pre_ping=True,
 )
 # Async session factory
 AsyncSessionLocal = async_sessionmaker(
    async_engine,
    class_=AsyncSession,
    expire_on_commit=False,
    autocommit=False,
    autoflush=False,
 )
 class Base(DeclarativeBase):
    """Base class for all models"""
    pass
 async def init_db():
    """Initialize database tables"""
    async with async_engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)
 async def get_db() -> AsyncSession:
    """Dependency for getting database session"""
    async with AsyncSessionLocal() as session:
        try:
            yield session
        finally:
            await session.close()
--- a/backend/app/main.py
+++ b/backend/app/main.py
@@ -0,0 +1,58 @@
 """
 YG-Dataset Backend Application
 FastAPI-based API server for dataset generation platform
 """
 from contextlib import asynccontextmanager
 from fastapi import FastAPI
 from fastapi.middleware.cors import CORSMiddleware
 from app.api.v1 import api_router
 from app.core.config import settings
 from app.core.database import init_db
@asynccontextmanager
 async def lifespan(app: FastAPI):
    """Application lifespan events"""
    # Startup
    await init_db()
    yield
    # Shutdown
    pass
 app = FastAPI(
    title="YG-Dataset API",
    description="Dataset Generation Platform API",
    version="1.0.0",
    lifespan=lifespan,
 )
 # CORS
 app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
 )
 # Include API routes
 app.include_router(api_router, prefix="/api/v1")
@app.get("/health")
 async def health_check():
    """Health check endpoint"""
    return {"status": "healthy", "version": "1.0.0"}
 if __name__ == "__main__":
    import uvicorn
    uvicorn.run(
        "app.main:app",
        host=settings.HOST,
        port=settings.PORT,
        reload=settings.DEBUG,
    )
--- a/backend/app/models/init.py
+++ b/backend/app/models/init.py
@@ -0,0 +1,3 @@
 """
 Database Models
 """
--- a/backend/app/models/base.py
+++ b/backend/app/models/base.py
@@ -0,0 +1,19 @@
 """
 Base Model with UUID support
 """
 import uuid
 from datetime import datetime
 from sqlalchemy import Column, DateTime
 from sqlalchemy.dialects.postgresql import UUID
 from app.core.database import Base
 class TimestampMixin:
    """Mixin for created_at and updated_at timestamps"""
    created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow, nullable=False)
 class UUIDMixin:
    """Mixin for UUID primary key"""
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4, index=True)
--- a/backend/app/models/models.py
+++ b/backend/app/models/models.py
@@ -0,0 +1,161 @@
 """
 Database Models for YG-Dataset
 """
 from sqlalchemy import Column, String, Text, Integer, BigInteger, ForeignKey, JSON
 from sqlalchemy.dialects.postgresql import UUID
 from sqlalchemy.orm import relationship
 from app.core.database import Base
 from app.models.base import UUIDMixin, TimestampMixin
 class Project(Base, UUIDMixin, TimestampMixin):
    """Project model"""
    __tablename__ = "projects"
    name = Column(String(255), nullable=False)
    description = Column(Text)
    # Relationships
    files = relationship("File", back_populates="project", cascade="all, delete-orphan")
    chunks = relationship("Chunk", back_populates="project", cascade="all, delete-orphan")
    tags = relationship("Tag", back_populates="project", cascade="all, delete-orphan")
    datasets = relationship("Dataset", back_populates="project", cascade="all, delete-orphan")
    eval_datasets = relationship("EvalDataset", back_populates="project", cascade="all, delete-orphan")
    model_configs = relationship("ModelConfig", back_populates="project", cascade="all, delete-orphan")
    tasks = relationship("Task", back_populates="project", cascade="all, delete-orphan")
 class File(Base, UUIDMixin, TimestampMixin):
    """File model for uploaded documents"""
    __tablename__ = "files"
    project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False)
    filename = Column(String(255), nullable=False)
    file_type = Column(String(50), nullable=False)  # pdf, docx, xlsx, csv, epub, md, txt
    file_path = Column(String(500))
    size = Column(BigInteger)  # file size in bytes
    status = Column(String(20), default="pending")  # pending, processing, completed, failed
    # Relationships
    project = relationship("Project", back_populates="files")
    chunks = relationship("Chunk", back_populates="file", cascade="all, delete-orphan")
 class Chunk(Base, UUIDMixin, TimestampMixin):
    """Text chunk model after splitting"""
    __tablename__ = "chunks"
    project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False)
    file_id = Column(UUID(as_uuid=True), ForeignKey("files.id", ondelete="CASCADE"))
    name = Column(String(255))
    content = Column(Text, nullable=False)
    summary = Column(Text)
    word_count = Column(Integer)
    metadata = Column(JSON)  # store additional info like headings, page numbers
    # Relationships
    project = relationship("Project", back_populates="chunks")
    file = relationship("File", back_populates="chunks")
    questions = relationship("Question", back_populates="chunk", cascade="all, delete-orphan")
    chunk_tags = relationship("ChunkTag", back_populates="chunk", cascade="all, delete-orphan")
 class Tag(Base, UUIDMixin, TimestampMixin):
    """Tag/Label model for categorizing content"""
    __tablename__ = "tags"
    project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False)
    label = Column(String(255), nullable=False)
    parent_id = Column(UUID(as_uuid=True), ForeignKey("tags.id", ondelete="CASCADE"))
    color = Column(String(20))  # hex color code
    # Relationships
    project = relationship("Project", back_populates="tags")
    parent = relationship("Tag", remote_side="Tag.id", back_populates="children")
    children = relationship("Tag", back_populates="parent")
    chunk_tags = relationship("ChunkTag", back_populates="tag")
 class ChunkTag(Base, UUIDMixin):
    """Many-to-many relationship between chunks and tags"""
    __tablename__ = "chunk_tags"
    chunk_id = Column(UUID(as_uuid=True), ForeignKey("chunks.id", ondelete="CASCADE"), nullable=False)
    tag_id = Column(UUID(as_uuid=True), ForeignKey("tags.id", ondelete="CASCADE"), nullable=False)
    # Relationships
    chunk = relationship("Chunk", back_populates="chunk_tags")
    tag = relationship("Tag", back_populates="chunk_tags")
 class Question(Base, UUIDMixin, TimestampMixin):
    """Question/QA pair model"""
    __tablename__ = "questions"
    project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False)
    chunk_id = Column(UUID(as_uuid=True), ForeignKey("chunks.id", ondelete="CASCADE"))
    content = Column(Text, nullable=False)  # question content
    answer = Column(Text)  # answer content
    question_type = Column(String(50))  # fact, summary, reasoning, etc.
    source = Column(String(50), default="manual")  # manual, generated
    # Relationships
    project = relationship("Project")
    chunk = relationship("Chunk", back_populates="questions")
 class Dataset(Base, UUIDMixin, TimestampMixin):
    """Dataset model"""
    __tablename__ = "datasets"
    project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False)
    name = Column(String(255), nullable=False)
    description = Column(Text)
    dataset_type = Column(String(50))  # qa, conversation, instruction
    metadata = Column(JSON)
    # Relationships
    project = relationship("Project", back_populates="datasets")
 class EvalDataset(Base, UUIDMixin, TimestampMixin):
    """Evaluation dataset model"""
    __tablename__ = "eval_datasets"
    project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False)
    name = Column(String(255), nullable=False)
    question_type = Column(String(50))  # mixed, fact, reasoning
    metadata = Column(JSON)
    # Relationships
    project = relationship("Project", back_populates="eval_datasets")
 class ModelConfig(Base, UUIDMixin, TimestampMixin):
    """Model configuration for LLM providers"""
    __tablename__ = "model_configs"
    project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False)
    provider = Column(String(50), nullable=False)  # openai, anthropic, ollama, custom
    model_name = Column(String(100))
    api_key = Column(String(500))
    api_base = Column(String(500))
    is_default = Column(String(10), default="false")
    # Relationships
    project = relationship("Project", back_populates="model_configs")
 class Task(Base, UUIDMixin, TimestampMixin):
    """Task model for background jobs"""
    __tablename__ = "tasks"
    project_id = Column(UUID(as_uuid=True), ForeignKey("projects.id", ondelete="CASCADE"))
    task_type = Column(String(50))  # split, generate, eval, export
    status = Column(String(20), default="pending")  # pending, running, completed, failed
    progress = Column(Integer, default=0)  # 0-100
    result = Column(JSON)
    error = Column(Text)
    # Relationships
    project = relationship("Project", back_populates="tasks")
--- a/backend/app/schemas/init.py
+++ b/backend/app/schemas/init.py
@@ -0,0 +1,3 @@
 """
 Pydantic Schemas
 """
--- a/backend/app/schemas/base.py
+++ b/backend/app/schemas/base.py
@@ -0,0 +1,170 @@
 """
 Base Pydantic schemas
 """
 from datetime import datetime
 from typing import Optional, Any
 from uuid import UUID
 from pydantic import BaseModel, ConfigDict
 class TimestampMixin(BaseModel):
    """Mixin for timestamps"""
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None
 class UUIDMixin(BaseModel):
    """Mixin for UUID"""
    model_config = ConfigDict(from_attributes=True)
    id: UUID
 class ProjectBase(BaseModel):
    """Base project schema"""
    name: str
    description: Optional[str] = None
 class ProjectCreate(ProjectBase):
    """Project create schema"""
    pass
 class ProjectUpdate(ProjectBase):
    """Project update schema"""
    pass
 class ProjectResponse(ProjectBase, UUIDMixin, TimestampMixin):
    """Project response schema"""
    pass
 class FileBase(BaseModel):
    """Base file schema"""
    filename: str
    file_type: str
    size: Optional[int] = None
 class FileResponse(FileBase, UUIDMixin, TimestampMixin):
    """File response schema"""
    status: str
 class ChunkBase(BaseModel):
    """Base chunk schema"""
    name: Optional[str] = None
    content: str
    summary: Optional[str] = None
    word_count: Optional[int] = None
 class ChunkCreate(ChunkBase):
    """Chunk create schema"""
    file_id: Optional[UUID] = None
 class ChunkResponse(ChunkBase, UUIDMixin, TimestampMixin):
    """Chunk response schema"""
    pass
 class QuestionBase(BaseModel):
    """Base question schema"""
    content: str
    answer: Optional[str] = None
    question_type: Optional[str] = None
 class QuestionCreate(QuestionBase):
    """Question create schema"""
    chunk_id: Optional[UUID] = None
 class QuestionResponse(QuestionBase, UUIDMixin, TimestampMixin):
    """Question response schema"""
    source: str
 class DatasetBase(BaseModel):
    """Base dataset schema"""
    name: str
    description: Optional[str] = None
    dataset_type: Optional[str] = None
 class DatasetCreate(DatasetBase):
    """Dataset create schema"""
    pass
 class DatasetResponse(DatasetBase, UUIDMixin, TimestampMixin):
    """Dataset response schema"""
    question_count: Optional[int] = None
 class EvalDatasetBase(BaseModel):
    """Base eval dataset schema"""
    name: str
    question_type: Optional[str] = None
 class EvalDatasetCreate(EvalDatasetBase):
    """Eval dataset create schema"""
    pass
 class EvalDatasetResponse(EvalDatasetBase, UUIDMixin, TimestampMixin):
    """Eval dataset response schema"""
    pass
 class TagBase(BaseModel):
    """Base tag schema"""
    label: str
    parent_id: Optional[UUID] = None
    color: Optional[str] = None
 class TagCreate(TagBase):
    """Tag create schema"""
    pass
 class TagResponse(TagBase, UUIDMixin, TimestampMixin):
    """Tag response schema"""
    pass
 class ModelConfigBase(BaseModel):
    """Base model config schema"""
    provider: str
    model_name: Optional[str] = None
    api_key: Optional[str] = None
    api_base: Optional[str] = None
    is_default: Optional[str] = "false"
 class ModelConfigCreate(ModelConfigBase):
    """Model config create schema"""
    pass
 class ModelConfigResponse(ModelConfigBase, UUIDMixin, TimestampMixin):
    """Model config response schema"""
    pass
 class TaskBase(BaseModel):
    """Base task schema"""
    task_type: str
    status: Optional[str] = "pending"
    progress: Optional[int] = 0
 class TaskResponse(TaskBase, UUIDMixin, TimestampMixin):
    """Task response schema"""
    result: Optional[Any] = None
    error: Optional[str] = None
--- a/backend/app/services/init.py
+++ b/backend/app/services/init.py
@@ -0,0 +1,3 @@
 """
 Services module
 """
--- a/backend/app/services/file_processor/init.py
+++ b/backend/app/services/file_processor/init.py
@@ -0,0 +1,3 @@
 """
 File Processing Services
 """
--- a/backend/app/services/file_processor/docx_processor.py
+++ b/backend/app/services/file_processor/docx_processor.py
@@ -0,0 +1,53 @@
 """
 DOCX Text Extractor
 """
 from docx import Document
 from typing import Dict, List
 class DOCXProcessor:
    """Extract text from DOCX files"""
    def extract_text(self, file_path: str) -> str:
        """Extract all text from DOCX"""
        doc = Document(file_path)
        text_parts = []
        for para in doc.paragraphs:
            if para.text.strip():
                text_parts.append(para.text)
        # Also extract text from tables
        for table in doc.tables:
            for row in table.rows:
                for cell in row.cells:
                    if cell.text.strip():
                        text_parts.append(cell.text)
        return "\n\n".join(text_parts)
    def extract_with_metadata(self, file_path: str) -> Dict:
        """Extract text with DOCX metadata"""
        doc = Document(file_path)
        result = {
            "text": self.extract_text(file_path),
            "paragraphs": len(doc.paragraphs),
            "tables": len(doc.tables),
            "sections": len(doc.sections),
            "metadata": {
                "author": doc.core_properties.author,
                "title": doc.core_properties.title,
                "subject": doc.core_properties.subject,
                "created": doc.core_properties.created,
                "modified": doc.core_properties.modified
            }
        }
        return result
 def process_docx(file_path: str) -> str:
    """Process DOCX file and return text"""
    processor = DOCXProcessor()
    return processor.extract_text(file_path)
--- a/backend/app/services/file_processor/excel_processor.py
+++ b/backend/app/services/file_processor/excel_processor.py
@@ -0,0 +1,66 @@
 """
 Excel/CSV Text Extractor
 """
 import pandas as pd
 from typing import Dict, List
 class ExcelProcessor:
    """Extract text from Excel and CSV files"""
    def extract_csv(self, file_path: str) -> str:
        """Extract text from CSV file"""
        df = pd.read_csv(file_path)
        return self._dataframe_to_text(df)
    def extract_excel(self, file_path: str, sheet_name: str = None) -> str:
        """Extract text from Excel file"""
        if sheet_name:
            df = pd.read_excel(file_path, sheet_name=sheet_name)
            return self._dataframe_to_text(df)
        else:
            # Read all sheets
            sheets = pd.read_excel(file_path, sheet_name=None)
            text_parts = []
            for sheet_name, df in sheets.items():
                text_parts.append(f"=== Sheet: {sheet_name} ===\n")
                text_parts.append(self._dataframe_to_text(df))
            return "\n\n".join(text_parts)
    def _dataframe_to_text(self, df: pd.DataFrame) -> str:
        """Convert DataFrame to readable text"""
        text_parts = []
        # Add column headers
        if not df.empty:
            text_parts.append(" | ".join(str(col) for col in df.columns))
            text_parts.append("-" * len(text_parts[-1]))
            # Add rows
            for _, row in df.iterrows():
                row_text = " | ".join(str(val) for val in row.values)
                text_parts.append(row_text)
        return "\n".join(text_parts)
    def extract_all_sheets(self, file_path: str) -> Dict[str, str]:
        """Extract all sheets from Excel file"""
        sheets = pd.read_excel(file_path, sheet_name=None)
        return {name: self._dataframe_to_text(df) for name, df in sheets.items()}
    def get_sheet_names(self, file_path: str) -> List[str]:
        """Get all sheet names from Excel file"""
        xl = pd.ExcelFile(file_path)
        return xl.sheet_names
 def process_csv(file_path: str) -> str:
    """Process CSV file and return text"""
    processor = ExcelProcessor()
    return processor.extract_csv(file_path)
 def process_excel(file_path: str) -> str:
    """Process Excel file and return text"""
    processor = ExcelProcessor()
    return processor.extract_excel(file_path)
--- a/backend/app/services/file_processor/pdf_processor.py
+++ b/backend/app/services/file_processor/pdf_processor.py
@@ -0,0 +1,65 @@
 """
 PDF Text Extractor
 """
 import pdfplumber
 from typing import Dict, List, Optional
 class PDFProcessor:
    """Extract text from PDF files"""
    def extract_text(self, file_path: str) -> str:
        """Extract all text from PDF"""
        text_parts = []
        with pdfplumber.open(file_path) as pdf:
            for page_num, page in enumerate(pdf.pages, 1):
                text = page.extract_text()
                if text:
                    text_parts.append(f"--- Page {page_num} ---\n{text}")
        return "\n\n".join(text_parts)
    def extract_pages(self, file_path: str) -> List[Dict]:
        """Extract text page by page with metadata"""
        pages = []
        with pdfplumber.open(file_path) as pdf:
            for page_num, page in enumerate(pdf.pages, 1):
                text = page.extract_text()
                if text:
                    pages.append({
                        "page_number": page_num,
                        "text": text.strip(),
                        "word_count": len(text.split())
                    })
        return pages
    def extract_with_metadata(self, file_path: str) -> Dict:
        """Extract text with PDF metadata"""
        result = {
            "text": "",
            "pages": [],
            "metadata": {}
        }
        with pdfplumber.open(file_path) as pdf:
            # Get metadata
            result["metadata"] = {
                "page_count": len(pdf.pages),
                "metadata": pdf.metadata
            }
            # Extract pages
            pages = self.extract_pages(file_path)
            result["pages"] = pages
            result["text"] = "\n\n".join([p["text"] for p in pages])
        return result
 def process_pdf(file_path: str) -> str:
    """Process PDF file and return text"""
    processor = PDFProcessor()
    return processor.extract_with_metadata(file_path)["text"]
--- a/backend/app/services/text_splitter/init.py
+++ b/backend/app/services/text_splitter/init.py
@@ -0,0 +1,3 @@
 """
 Text Splitter Services
 """
--- a/backend/app/services/text_splitter/splitter.py
+++ b/backend/app/services/text_splitter/splitter.py
@@ -0,0 +1,248 @@
 """
 Text Splitter
 """
 import re
 from typing import List, Dict, Optional
 class TextSplitter:
    """Base text splitter"""
    def __init__(self, chunk_size: int = 500, overlap: int = 50):
        self.chunk_size = chunk_size
        self.overlap = overlap
    def split(self, text: str) -> List[Dict]:
        """Split text into chunks"""
        raise NotImplementedError
 class RecursiveTextSplitter(TextSplitter):
    """Recursive character text splitter"""
    def __init__(self, chunk_size: int = 500, overlap: int = 50, separators: List[str] = None):
        super().__init__(chunk_size, overlap)
        self.separators = separators or ["\n\n", "\n", ". ", " ", ""]
    def split(self, text: str) -> List[Dict]:
        """Split text recursively"""
        chunks = []
        current_chunk = ""
        chunk_index = 0
        for separator in self.separators:
            if separator in text:
                parts = text.split(separator)
                for part in parts:
                    if len(current_chunk) + len(part) > self.chunk_size:
                        if current_chunk:
                            chunks.append({
                                "index": chunk_index,
                                "content": current_chunk.strip(),
                                "word_count": len(current_chunk.split())
                            })
                            chunk_index += 1
                            # Handle overlap
                            if self.overlap > 0 and chunks:
                                overlap_text = " ".join(chunks[-1]["content"].split()[-self.overlap:])
                                current_chunk = overlap_text + separator + part
                            else:
                                current_chunk = part
                    else:
                        current_chunk += separator + part if current_chunk else part
                if current_chunk:
                    chunks.append({
                        "index": chunk_index,
                        "content": current_chunk.strip(),
                        "word_count": len(current_chunk.split())
                    })
                break
            else:
                continue
        return chunks
 class MarkdownStructureSplitter(TextSplitter):
    """Split text based on Markdown structure (headings)"""
    def __init__(self, chunk_size: int = 2000, overlap: int = 100):
        super().__init__(chunk_size, overlap)
    def split(self, text: str) -> List[Dict]:
        """Split text by Markdown headings"""
        # Find all heading patterns
        heading_pattern = r'^(#{1,6})\s+(.+)$'
        lines = text.split('\n')
        chunks = []
        current_chunk = ""
        current_heading = "文档开头"
        chunk_index = 0
        for line in lines:
            heading_match = re.match(heading_pattern, line.strip())
            if heading_match:
                # Save previous chunk if exists
                if current_chunk.strip():
                    chunks.append({
                        "index": chunk_index,
                        "name": current_heading,
                        "content": current_chunk.strip(),
                        "word_count": len(current_chunk.split())
                    })
                    chunk_index += 1
                current_heading = heading_match.group(2).strip()
                current_chunk = line + "\n"
            else:
                # Check chunk size
                if len(current_chunk) > self.chunk_size:
                    chunks.append({
                        "index": chunk_index,
                        "name": current_heading,
                        "content": current_chunk.strip(),
                        "word_count": len(current_chunk.split())
                    })
                    chunk_index += 1
                    # Handle overlap
                    if self.overlap > 0:
                        overlap_lines = current_chunk.split('\n')[-self.overlap:]
                        current_chunk = '\n'.join(overlap_lines) + '\n'
                    else:
                        current_chunk = ""
                current_chunk += line + "\n"
        # Add last chunk
        if current_chunk.strip():
            chunks.append({
                "index": chunk_index,
                "name": current_heading,
                "content": current_chunk.strip(),
                "word_count": len(current_chunk.split())
            })
        return chunks
 class TokenSplitter(TextSplitter):
    """Split text by token count"""
    def __init__(self, chunk_size: int = 500, overlap: int = 50):
        super().__init__(chunk_size, overlap)
    def split(self, text: str) -> List[Dict]:
        """Split text by approximate token count"""
        words = text.split()
        chunks = []
        chunk_index = 0
        for i in range(0, len(words), self.chunk_size - self.overlap):
            chunk_words = words[i:i + self.chunk_size]
            chunk_text = " ".join(chunk_words)
            chunks.append({
                "index": chunk_index,
                "content": chunk_text,
                "word_count": len(chunk_words),
                "token_estimate": len(chunk_words) * 1.3  # rough token estimate
            })
            chunk_index += 1
        return chunks
 class CodeSplitter(TextSplitter):
    """Split text with code awareness"""
    def __init__(self, chunk_size: int = 500, overlap: int = 50):
        super().__init__(chunk_size, overlap)
    def split(self, text: str) -> List[Dict]:
        """Split text preserving code blocks"""
        # Split by code blocks first
        code_pattern = r'```[\s\S]*?```'
        parts = re.split(code_pattern, text)
        chunks = []
        chunk_index = 0
        current_chunk = ""
        for part in parts:
            if len(current_chunk) + len(part) > self.chunk_size:
                if current_chunk.strip():
                    chunks.append({
                        "index": chunk_index,
                        "content": current_chunk.strip(),
                        "word_count": len(current_chunk.split())
                    })
                    chunk_index += 1
                current_chunk = part
            else:
                current_chunk += part
        if current_chunk.strip():
            chunks.append({
                "index": chunk_index,
                "content": current_chunk.strip(),
                "word_count": len(current_chunk.split())
            })
        return chunks
 class CustomSplitter(TextSplitter):
    """Custom separator splitter"""
    def __init__(self, separator: str = "\n\n", chunk_size: int = 500):
        super().__init__(chunk_size, 0)
        self.separator = separator
    def split(self, text: str) -> List[Dict]:
        """Split by custom separator"""
        parts = text.split(self.separator)
        chunks = []
        current_chunk = ""
        chunk_index = 0
        for part in parts:
            if len(current_chunk) + len(part) > self.chunk_size:
                if current_chunk.strip():
                    chunks.append({
                        "index": chunk_index,
                        "content": current_chunk.strip(),
                        "word_count": len(current_chunk.split())
                    })
                    chunk_index += 1
                current_chunk = part
            else:
                current_chunk += self.separator + part if current_chunk else part
        if current_chunk.strip():
            chunks.append({
                "index": chunk_index,
                "content": current_chunk.strip(),
                "word_count": len(current_chunk.split())
            })
        return chunks
 def get_splitter(method: str, **kwargs) -> TextSplitter:
    """Get text splitter by method name"""
    splitters = {
        "recursive": RecursiveTextSplitter,
        "markdown_structure": MarkdownStructureSplitter,
        "token": TokenSplitter,
        "code": CodeSplitter,
        "custom": CustomSplitter
    }
    splitter_class = splitters.get(method, RecursiveTextSplitter)
    return splitter_class(**kwargs)
--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@@ -0,0 +1,37 @@
 # FastAPI
 fastapi>=0.115.0
 uvicorn[standard]>=0.30.0
 python-multipart>=0.0.9
 # Database - SQLite (默认), PostgreSQL 可选
 sqlalchemy>=2.0.0
 alembic>=1.13.0
 # asyncpg>=0.29.0      # PostgreSQL 异步驱动（生产环境使用）
 # psycopg2-binary>=2.9.9  # PostgreSQL 同步驱动
 # Pydantic
 pydantic>=2.0.0
 pydantic-settings>=2.0.0
 # Redis - 可选，用于缓存/队列（开发环境可省略）
 # redis>=5.0.0
 # File Processing
 pdfplumber>=0.10.4
 python-docx>=1.1.0
 openpyxl>=3.1.2
 pandas>=2.2.0
 ebooklib>=0.5
 PyMuPDF>=1.24.0
 # LLM & Text
 langchain>=0.3.0
 langchain-community>=0.2.0
 langchain-openai>=0.1.0
 tiktoken>=0.7.0
 python-dotenv>=1.0.0
 # Utils
 python-dateutil>=2.8.2
 httpx>=0.27.0
 aiofiles>=23.2.1
--- a/bug修改.md
+++ b/bug修改.md
@@ -0,0 +1,20 @@
 # Bug 修改记录
 ## 2026-03-17
 ### 初始项目创建
 - 创建 YG-Dataset 重构项目
 - 搭建 FastAPI + Vue 3 基础架构
 ---
 ## 修复记录格式
 ### 日期
 **问题描述:**
 **原因:**
 **修复方案:**
 ---
 *持续更新中...*
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -0,0 +1,52 @@
 version: '3.8'
 services:
  # FastAPI 后端 (SQLite 数据库，随项目文件存储)
  backend:
    build:
      context: ./backend
      dockerfile: Dockerfile
    container_name: ygdataset-backend
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=sqlite:///./ygdataset.db
      - DEBUG=true
    volumes:
      - ./backend:/app
      - uploads:/app/uploads
    restart: unless-stopped
  # Vue 前端
  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile
    container_name: ygdataset-frontend
    ports:
      - "3000:80"
    volumes:
      - ./frontend:/app
      - /app/node_modules
    depends_on:
      - backend
    restart: unless-stopped
 volumes:
  uploads:
 # 如需 PostgreSQL，取消注释以下配置：
 # services:
 #   postgres:
 #     image: postgres:15
 #     environment:
 #       POSTGRES_USER: ygdataset
 #       POSTGRES_PASSWORD: your_password
 #       POSTGRES_DB: ygdataset
 #     ports:
 #       - "5432:5432"
 #     volumes:
 #       - postgres_data:/var/lib/postgresql/data
 # volumes:
 #   postgres_data:
--- a/easy-dataset-main-架构分析报告.md
+++ b/easy-dataset-main-架构分析报告.md
@@ -0,0 +1,306 @@
 # Easy Dataset 项目架构分析报告
 ## 一、项目概述
 **Easy Dataset** 是一个功能强大的大模型微调数据集创建工具，由 ConardLi 开发维护。该应用提供直观的界面和强大的内置文档解析、智能分割、数据清洗和增强功能，可将各种格式的领域文档转换为高质量的结构化数据集，适用于模型微调、RAG（检索增强生成）和模型性能评估等场景。
 **项目地址**: https://github.com/ConardLi/easy-dataset
 **当前版本**: 1.7.2
 **许可证**: AGPL 3.0
 ---
 ## 二、技术栈分析
 ### 2.1 核心框架
 | 类别 | 技术选型 | 说明 |
 |------|----------|------|
 | 前端框架 | Next.js 14 | App Router 架构 |
 | UI 框架 | Material-UI (MUI) | v5.16.14 |
 | 状态管理 | Jotai | 轻量级原子化状态管理 |
 | 数据库 | Prisma + SQLite | 使用 Prisma ORM |
 | 开发语言 | JavaScript | 全栈 JavaScript |
 ### 2.2 关键依赖
 | 类别 | 库名称 | 用途 |
 |------|--------|------|
 | AI/ML | ai SDK, langchain | 大模型集成 |
 | LLM 提供商 | @ai-sdk/openai, ollama-ai-provider, zhipu-ai-provider | 多模型支持 |
 | 国际化 | i18next, react-i18next | 多语言支持 |
 | 文档处理 | @opendocsg/pdf2md, mammoth, pdf2md-js | PDF/DOCX 解析 |
 | 桌面应用 | Electron | 跨平台桌面客户端 |
 | 数据处理 | xlsx, adm-zip, jszip | 文件处理 |
 ### 2.3 开发工具
 - **包管理器**: pnpm
 - **代码规范**: ESLint + Prettier
 - **Git Hooks**: Husky + lint-staged
 - **构建工具**: electron-builder (桌面应用打包)
 ---
 ## 三、目录结构
 ```
 easy-dataset-main/
 ├── app/                          # Next.js 应用目录 (App Router)
 │   ├── api/                      # API 路由 (150+ 个路由)
 │   │   ├── check-update/         # 版本检查
 │   │   ├── llm/                  # LLM 模型相关 API
 │   │   │   ├── fetch-models/     # 获取模型列表
 │   │   │   ├── model/            # 模型配置
 │   │   │   ├── ollama/           # Ollama 本地模型
 │   │   │   └── providers/        # LLM 提供商
 │   │   ├── monitoring/           # 监控 API
 │   │   │   ├── logs/             # 日志
 │   │   │   ├── stats/            # 统计
 │   │   │   └── summary/          # 摘要
 │   │   └── projects/             # 项目相关 API
 │   │       └── [projectId]/      # 动态项目路由
 │   │           ├── chunks/       # 文本分块
 │   │           ├── datasets/     # 数据集
 │   │           ├── eval-datasets/ # 评估数据集
 │   │           ├── eval-tasks/    # 评估任务
 │   │           ├── files/         # 文件管理
 │   │           ├── images/        # 图片处理
 │   │           ├── questions/     # 问题生成
 │   │           ├── distill/       # 数据蒸馏
 │   │           ├── blind-test-tasks/ # 盲测任务
 │   │           ├── playground/    # 模型测试场
 │   │           └── ...
 │   └── (页面路由)
 ├── components/                   # React 组件 (100+ 组件)
 │   ├── common/                   # 通用组件
 │   ├── home/                     # 首页组件
 │   ├── Navbar/                   # 导航栏
 │   ├── dataset-square/           # 数据集广场
 │   ├── datasets/                 # 数据集组件
 │   ├── distill/                  # 数据蒸馏组件
 │   ├── export/                   # 导出组件
 │   ├── questions/                # 问题组件
 │   ├── text-split/               # 文本分割组件
 │   ├── tasks/                    # 任务管理组件
 │   ├── playground/               # 测试场组件
 │   └── settings/                 # 设置组件
 ├── prisma/                       # 数据库 schema
 │   ├── schema.prisma             # Prisma 数据模型
 │   ├── sql.json                  # SQL 模板
 │   └── generate-template.js      # 模板生成
 ├── locales/                      # 国际化资源
 │   ├── en/                      # 英文
 │   ├── zh-CN/                   # 简体中文
 │   └── pt-BR/                   # 葡萄牙语
 ├── electron/                     # Electron 桌面应用
 │   ├── main.js                  # 主进程
 │   └── preload.js               # 预加载脚本
 ├── public/                       # 静态资源
 ├── desktop/                      # 桌面端入口
 └── package.json                  # 项目配置
 ```
 ---
 ## 四、核心模块设计
 ### 4.1 数据模型 (Prisma Schema)
 项目使用 Prisma ORM 管理数据，主要数据模型包括：
 - **Project**: 项目
 - **File**: 上传的文件
 - **Chunk**: 文本分块
 - **Question**: 生成的问题
 - **Dataset**: 微调数据集
 - **EvalDataset**: 评估数据集
 - **EvalTask**: 评估任务
 - **BlindTestTask**: 盲测任务
 - **ModelConfig**: 模型配置
 - **Tag**: 标签
 - **Conversation**: 对话记录
 - **Image**: 图片数据
 - **Task**: 后台任务
 ### 4.2 核心功能模块
 #### 4.2.1 文档处理模块 (Text Split)
 - 支持 PDF、Markdown、DOCX、TXT、EPUB 格式
 - 多种分割算法：Markdown结构、递归分隔符、固定长度、代码感知分块
 - 目录结构提取
 - PDF 转 Markdown
 #### 4.2.2 问题生成模块 (Question Generation)
 - 自动从文本片段提取相关问题
 - 问题模板管理
 - 批量生成
 - 标签树自动构建
 #### 4.2.3 数据集生成模块 (Dataset Generation)
 - 单轮问答数据集
 - 多轮对话数据集
 - 图片问答数据集
 - 数据蒸馏（无需上传文档）
 #### 4.2.4 评估模块 (Evaluation)
 - 评估数据集生成（判断题、单选、多选、简答、开放题）
 - 自动化模型评估（Judge Model）
 - 人类盲测系统（Arena）
 - AI 质量评估
 #### 4.2.5 LLM 集成模块
 支持的模型提供商：
 - OpenAI
 - Ollama (本地模型)
 - 智谱 AI
 - 阿里百炼
 - OpenRouter
 - Google Gemini
 - Anthropic Claude
 ---
 ## 五、API 架构
 ### 5.1 API 设计原则
 - RESTful 风格路由
 - 基于 Next.js App Router 的 Route Handlers
 - 使用 Zod 进行请求/响应验证
 ### 5.2 主要 API 分组
 | API 分组 | 路由前缀 | 功能 |
 |----------|----------|------|
 | 项目管理 | `/api/projects` | 项目 CRUD |
 | 文件管理 | `/api/projects/[id]/files` | 文件上传/处理 |
 | 文本分块 | `/api/projects/[id]/chunks` | 文本分割 |
 | 问题生成 | `/api/projects/[id]/questions` | 问题生成/管理 |
 | 数据集 | `/api/projects/[id]/datasets` | 数据集管理 |
 | 评估 | `/api/projects/[id]/eval-*` | 评估相关 |
 | 盲测 | `/api/projects/[id]/blind-test-tasks` | 盲测系统 |
 | LLM | `/api/llm/*` | 模型配置/调用 |
 | 监控 | `/api/monitoring/*` | 日志/统计 |
 ---
 ## 六、前端架构
 ### 6.1 组件设计模式
 - **Jotai 状态管理**: 使用原子化状态管理，便于细粒度更新
 - **MUI 组件库**: 统一的 UI 组件
 - **Framer Motion**: 动画效果
 ### 6.2 主要页面
 1. **首页** (`/`): 项目列表、创建项目、统计卡片
 2. **项目页** (`/projects/[id]`):
   - 文本分割 (`/text-split`)
   - 问题列表 (`/questions`)
   - 数据集 (`/datasets`)
   - 评估 (`/eval-datasets`)
   - 盲测 Arena (`/arena`)
   - 设置 (`/settings`)
 3. **模型测试场** (`/playground`)
 4. **数据集广场** (`/datasets-square`)
 ---
 ## 七、部署架构
 ### 7.1 多平台支持
 - **Web 应用**: Next.js 生产构建
 - **桌面应用**: Electron
  - Windows (NSIS 安装包)
  - macOS (DMG)
  - Linux (AppImage)
 - **Docker**: 支持 Docker 部署
 ### 7.2 开发命令
 ```bash
 # 开发
 pnpm dev              # 启动开发服务器 (端口 1717)
 # 构建
 pnpm build            # 构建 Next.js 生产版本
 pnpm electron-build   # 构建桌面应用
 # 数据库
 pnpm db:push          # 推送 schema 到数据库
 pnpm db:studio        # 打开 Prisma Studio
 ```
 ---
 ## 八、数据流设计
 ### 8.1 核心业务流程
 ```
 ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
 │  上传文档   │ -> │  文本分割   │ -> │  问题生成   │ -> │ 数据集生成  │
 └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
       │                  │                  │                  │
   PDF/DOCX          Chunk              Question          Dataset
   Markdown          目录结构           标签树            导出格式
 ```
 ### 8.2 评估流程
 ```
 ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
 │ 评估数据集  │ -> │ 评估任务    │ -> │ 模型评估    │ -> │ 结果分析    │
 └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
   生成题目      批量处理          Judge Model         Arena盲测
 ```
 ---
 ## 九、国际化
 - **技术选型**: i18next + react-i18next
 - **支持语言**:
  - 英文 (en)
  - 简体中文 (zh-CN)
  - 土耳其语 (tr)
  - 葡萄牙语 (pt-BR)
 - **语言检测**: i18next-browser-languagedetector
 ---
 ## 十、特性亮点
 1. **智能文档处理**: 支持多种格式，智能识别
 2. **多种分割算法**: 灵活适应不同文档结构
 3. **自动标签树**: 基于文档结构智能构建
 4. **多类型数据集**: 单轮问答、多轮对话、图片问答
 5. **完整评估体系**: 自动化评估 + 人类盲测
 6. **多模型支持**: 兼容 OpenAI 格式的所有 API
 7. **一键导出**: 支持多种格式和 LLaMA Factory 集成
 8. **桌面客户端**: 跨平台支持
 ---
 ## 十一、扩展方向
 根据项目发展路线，未来可能扩展的方向包括：
 1. 更多文件格式支持
 2. 数据集版本管理
 3. 团队协作功能
 4. 更多导出格式
 5. 更强大的数据分析功能
 ---
 *报告生成时间: 2026-03-17*
 *基于 easy-dataset-main 项目源码分析*
--- a/easy-dataset-main/.dockerignore
+++ b/easy-dataset-main/.dockerignore
@@ -0,0 +1,16 @@
 node_modules
 .next
 .git
 .github
 README.md
 README.zh-CN.md
 .gitignore
 .env.local
 .env.development.local
 .env.test.local
 .env.production.local
 /test
 /local-db
 /video
 /prisma/*.sqlite
 /prisma/*.sqlite-*
--- a/easy-dataset-main/.gitattributes
+++ b/easy-dataset-main/.gitattributes
@@ -0,0 +1,6 @@
 # Ensure shell scripts always use LF line endings
 *.sh text eol=lf
 docker-entrypoint.sh text eol=lf
 # Ensure Dockerfile uses LF
 Dockerfile text eol=lf
--- a/easy-dataset-main/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/easy-dataset-main/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,40 @@
 ---
 name: Bug report
 about: Create a report to help us improve
 title: '[Bug]'
 labels: bug
 assignees: ''
 ---
 **注意：请务必按照此模版填写 ISSUES 信息，否则 ISSUE 将不会得到回复**
 **问题描述**
 清晰、简洁地描述该问题的具体情况。
 **桌面设备（请完善以下信息）**
 - 操作系统：[例如：、Window、MAC]
 - 浏览器：[例如：谷歌浏览器（Chrome），苹果浏览器（Safari）]
 - Easy Dataset 版本：[例如：1.2.2]
 **使用模型**
 - 模型提供商：例如火山引擎
 - 模型名称：例如 DeepSeek R1
 **复现步骤**
 重现该问题的操作步骤：
 1. 进入“……”页面。
 2. 点击“……”。
 3. 向下滚动到“……”。
 4. 这时会看到错误提示。
 **预期结果**
 清晰、简洁地描述你原本期望出现的情况。
 **截图**
 如果有必要，请附上截图，以便更好地说明你的问题。
 **其他相关信息**
 在此处添加关于该问题的其他任何相关背景信息。
--- a/easy-dataset-main/.github/ISSUE_TEMPLATE/feature-or-enhancement-.md
+++ b/easy-dataset-main/.github/ISSUE_TEMPLATE/feature-or-enhancement-.md
@@ -0,0 +1,19 @@
 ---
 name: 'Feature or enhancement '
 about: Suggest an idea for this project
 title: '[Feature]'
 labels: enhancement
 assignees: ''
 ---
 **你的功能请求是否与某个问题相关？请描述。**
 清晰、简洁地描述一下存在的问题是什么。例如：当我[具体情况]时，我总是感到很沮丧。
 **描述你期望的解决方案**
 清晰、简洁地描述你希望实现的情况。
 **描述你考虑过的替代方案**
 清晰、简洁地描述你所考虑过的任何其他解决方案或功能。
 **其他相关信息**
 在此处添加与该功能请求相关的其他任何背景信息或截图。
--- a/easy-dataset-main/.github/ISSUE_TEMPLATE/question.md
+++ b/easy-dataset-main/.github/ISSUE_TEMPLATE/question.md
@@ -0,0 +1,40 @@
 ---
 name: Question
 about: Ask questions you want to know
 title: '[Question]'
 labels: question
 assignees: ''
 ---
 **注意：请务必按照此模版填写 ISSUES 信息，否则 ISSUE 将不会得到回复**
 **问题描述**
 清晰、简洁地描述该问题的具体情况。
 **桌面设备（请完善以下信息）**
 - 操作系统：[例如：、Window、MAC]
 - 浏览器：[例如：谷歌浏览器（Chrome），苹果浏览器（Safari）]
 - Easy Dataset 版本：[例如：1.2.2]
 **使用模型**
 - 模型提供商：例如火山引擎
 - 模型名称：例如 DeepSeek R1
 **复现步骤**
 重现该问题的操作步骤：
 1. 进入“……”页面。
 2. 点击“……”。
 3. 向下滚动到“……”。
 4. 这时会看到错误提示。
 **预期结果**
 清晰、简洁地描述你原本期望出现的情况。
 **截图**
 如果有必要，请附上截图，以便更好地说明你的问题。
 **其他相关信息**
 在此处添加关于该问题的其他任何相关背景信息。
--- a/easy-dataset-main/.github/PULL_REQUEST_TEMPLATE.md
+++ b/easy-dataset-main/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,12 @@
 ### 变更类型- [ ] 新功能（feat）
 - [ ] 修复（fix）
 - [ ] 文档（docs）
 - [ ] 重构（refactor）
 ### 变更描述- 简要说明修改内容（关联Issue：#123）
 ### 文档更新- [ ] README.md
 - [ ] 贡献指南
 - [ ] 接口文档（如有）
--- a/easy-dataset-main/.github/workflows/docker-build.yml
+++ b/easy-dataset-main/.github/workflows/docker-build.yml
@@ -0,0 +1,48 @@
 name: Build and Push Docker image on Tag
 on:
  push:
    tags:
      - '*'
 jobs:
  docker-image-release:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata for Docker
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository_owner }}/easy-dataset
          tags: |
            type=ref,event=tag
            type=raw,value=latest,enable={{is_default_branch}}
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          platforms: linux/amd64,linux/arm64
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
--- a/easy-dataset-main/.gitignore
+++ b/easy-dataset-main/.gitignore
@@ -0,0 +1,22 @@
 node_modules
 build
 .vscode
 website-local.json
 ai-local.json
 .next
 .DS_Store
 tsconfig.tsbuildinfo
 mock-login-callback.ts
 .env.local
 /src/test/crawler
 /src/test/mock
 /test
 /dist
 /prisma/*.sqlite
 .idea
 !local-db/empty.txt
 /local-db
 prisma/local-db/db.sqlite
 /local-db2
 .trae
 opencode.json
--- a/easy-dataset-main/.husky/commit-msg
+++ b/easy-dataset-main/.husky/commit-msg
@@ -0,0 +1,3 @@
 #!/usr/bin/env sh
 npx commitlint --edit "$1"
--- a/easy-dataset-main/.husky/pre-commit
+++ b/easy-dataset-main/.husky/pre-commit
@@ -0,0 +1 @@
 npx lint-staged
--- a/easy-dataset-main/.npmrc
+++ b/easy-dataset-main/.npmrc
@@ -0,0 +1,3 @@
 # 国内用户可使用淘宝源加速 (Chinese users can use Taobao registry for faster downloads)
 # registry=https://registry.npmmirror.com
 registry=https://registry.npmjs.org
--- a/easy-dataset-main/.prettierrc.js
+++ b/easy-dataset-main/.prettierrc.js
@@ -0,0 +1,13 @@
 module.exports = {
  semi: true,
  trailingComma: 'none',
  singleQuote: true,
  tabWidth: 2,
  useTabs: false,
  bracketSpacing: true,
  arrowParens: 'avoid',
  proseWrap: 'preserve',
  jsxBracketSameLine: true,
  printWidth: 120,
  endOfLine: 'auto'
 };
--- a/easy-dataset-main/.windsurfrules
+++ b/easy-dataset-main/.windsurfrules
@@ -0,0 +1,124 @@
 # Easy DataSet 项目架构设计
 ## 项目概述
 Easy DataSet 是一个用于创建大模型微调数据集的应用程序。用户可以上传文本文件，系统会自动分割文本并生成问题，最终生成用于微调的数据集。
 ## 技术栈
 - **前端框架**: Next.js 14 (App Router)
 - **UI 框架**: Material-UI (MUI)
 - **数据存储**: fs 文件系统模拟数据库
 - **开发语言**: JavaScript
 - **依赖管理**: pnpm
 ## 目录结构
 ```
 easy-dataset/
 ├── app/                      # Next.js 应用目录
 │   ├── api/                 # API 路由
 │   │   └── projects/       # 项目相关 API
 │   ├── projects/           # 项目相关页面
 │   │   ├── [projectId]/    # 项目详情页面
 │   └── page.js            # 主页
 ├── components/             # React 组件
 │   ├── home/              # 主页相关组件
 │   │   ├── HeroSection.js
 │   │   ├── ProjectList.js
 │   │   └── StatsCard.js
 │   ├── Navbar.js          # 导航栏组件
 │   └── CreateProjectDialog.js
 ├── lib/                    # 工具库
 │   └── db/                # 数据库模块
 │       ├── base.js        # 基础工具函数
 │       ├── projects.js    # 项目管理
 │       ├── texts.js       # 文本处理
 │       ├── datasets.js    # 数据集管理
 │       └── index.js       # 模块导出
 ├── styles/                # 样式文件
 │   └── home.js           # 主页样式
 └── local-db/             # 本地数据库目录
 ```
 ## 核心模块设计
 ### 1. 数据库模块 (`lib/db/`)
 #### base.js
 - 提供基础的文件操作功能
 - 确保数据库目录存在
 - 读写 JSON 文件的工具函数
 #### projects.js
 - 项目的 CRUD 操作
 - 项目配置管理
 - 项目目录结构维护
 #### texts.js
 - 文献处理功能
 - 文本片段存储和检索
 - 文件上传处理
 #### datasets.js
 - 数据集生成和管理
 - 问题列表管理
 - 标签树管理
 ### 2. 前端组件 (`components/`)
 #### Navbar.js
 - 顶部导航栏
 - 项目切换
 - 模型选择
 - 主题切换
 #### home/ 目录组件
 - HeroSection.js: 主页顶部展示区
 - ProjectList.js: 项目列表展示
 - StatsCard.js: 数据统计展示
 - CreateProjectDialog.js: 创建项目的对话框
 ### 3. 页面路由 (`app/`)
 #### 主页 (`page.js`)
 - 项目列表展示
 - 创建项目入口
 - 数据统计展示
 #### 项目详情页 (`projects/[projectId]/`)
 - text-split/: 文献处理页面
 - questions/: 问题列表页面
 - datasets/: 数据集页面
 - settings/: 项目设置页面
 #### API 路由 (`api/`)
 - projects/: 项目管理 API
 - texts/: 文本处理 API
 - questions/: 问题生成 API
 - datasets/: 数据集管理 API
 ## 数据流设计
 ### 项目创建流程
 1. 用户通过主页或导航栏创建新项目
 2. 填写项目基本信息（名称、描述）
 3. 系统创建项目目录和初始配置文件
 4. 重定向到项目详情页
 ### 文献处理流程
 1. 用户上传 Markdown 文件
 2. 系统保存原始文件到项目目录
 3. 调用文本分割服务，生成片段和目录结构
 4. 展示分割结果和提取的目录
 ### 问题生成流程
 1. 用户选择需要生成问题的文本片段
 2. 系统调用大模型API生成问题
 3. 保存问题到问题列表和标签树
 ### 数据集生成流程
 1. 用户选择需要生成答案的问题
 2. 系统调用大模型API生成答案
 3. 保存数据集结果
 4. 提供导出功能
--- a/easy-dataset-main/AGENTS.md
+++ b/easy-dataset-main/AGENTS.md
@@ -0,0 +1,254 @@
 # Easy Dataset Agent 指南
 ## 项目概述
 Easy Dataset 是一个专为大型语言模型（LLM）微调数据集创建而设计的应用程序。它提供完整的workflow，从文档处理到数据集导出，支持多种文件格式和AI模型。
 ## 技术栈
 - **前端**: Next.js 14 (App Router), React 18, Material-UI v5
 - **后端**: Node.js, Prisma ORM, SQLite
 - **AI集成**: OpenAI API, Ollama, 智谱AI, OpenRouter
 - **桌面应用**: Electron
 - **国际化**: i18next
 - **构建工具**: npm/pnpm, Electron Builder
 ## 核心架构
 ### 1. 数据流架构
 ```
 文档上传 → 文本分割 → 问题生成 → 答案生成 → 数据集导出
    ↓           ↓          ↓          ↓          ↓
 文件处理    智能分块    LLM生成    LLM生成    格式转换
 ```
 ### 2. 模块结构
 ```
 lib/
 ├── api/          # API接口层
 ├── db/           # 数据访问层
 ├── file/         # 文件处理模块
 ├── llm/          # AI模型集成
 ├── services/     # 业务逻辑层
 └── util/         # 工具函数
 ```
 ## 开发指南
 ### 环境设置
 ```bash
 # 安装依赖
 npm install
 # 数据库初始化
 npm run db:push
 # 开发模式
 npm run dev
 # 构建
 npm run build
 ```
 ### 代码规范
 - 使用ES6+语法
 - 模块化开发
 - 异步操作使用async/await
 - 错误处理使用try/catch
 - 注释使用JSDoc格式
 ### 重要文件路径
 - **主入口**: `app/page.js`
 - **项目路由**: `app/projects/[projectId]/`
 - **API路由**: `app/api/`
 - **LLM核心**: `lib/llm/core/index.js`
 - **任务处理**: `lib/services/tasks/`
 ## 功能模块详解
 ### 1. 文档处理模块 (`lib/file/`)
 - **支持的格式**: PDF, Markdown, DOCX, EPUB, TXT
 - **核心功能**:
  - 智能文本分割
  - 目录结构提取
  - 自定义分隔符分块
  - 多语言支持
 ### 2. AI模型集成 (`lib/llm/`)
 - **支持的提供商**:
  - OpenAI (GPT系列)
  - Ollama (本地模型)
  - 智谱AI (GLM系列)
  - OpenRouter (多模型聚合)
 - **功能特性**:
  - 统一API接口
  - 流式输出支持
  - 多语言提示词
  - 错误重试机制
 ### 3. 任务系统 (`lib/services/tasks/`)
 - **任务类型**:
  - 文件处理任务
  - 问题生成任务
  - 答案生成任务
  - 数据清洗任务
 - **状态管理**: 待处理、处理中、完成、失败
 ### 4. 数据管理 (`lib/db/`)
 - **数据模型**:
  - Project (项目)
  - Text/Chunk (文本块)
  - Question (问题)
  - Dataset (数据集)
  - Tag (标签)
 ## 常用开发任务
 ### 添加新的AI模型提供商
 1. 在 `lib/llm/core/providers/` 创建新的provider文件
 2. 实现基础接口 (generate, streamGenerate)
 3. 在 `lib/llm/core/index.js` 中注册provider
 4. 更新配置文件和UI界面
 ### 添加新的文件格式支持
 1. 在 `lib/file/file-process/` 创建格式处理器
 2. 实现内容提取和文本转换逻辑
 3. 更新文件类型检测和验证
 4. 添加相应的UI组件
 ### 自定义提示词模板
 1. 在 `lib/llm/prompts/` 创建新的提示词文件
 2. 使用i18n支持多语言
 3. 在设置界面添加配置选项
 4. 测试不同模型的效果
 ### 添加新的导出格式
 1. 在 `components/export/` 创建新的导出组件
 2. 实现数据格式转换逻辑
 3. 更新导出对话框界面
 4. 添加格式验证和错误处理
 ## 调试技巧
 ### 1. 数据库调试
 ```bash
 # 打开Prisma Studio
 npm run db:studio
 # 查看数据库文件
 sqlite3 prisma/db.sqlite
 ```
 ### 2. LLM API调试
 ```javascript
 // 在lib/llm/core/index.js中添加日志
 console.log('LLM Request:', { provider, model, prompt });
 console.log('LLM Response:', response);
 ```
 ### 3. 文件处理调试
 ```javascript
 // 在lib/file/中添加调试信息
 console.log('File processing:', fileName, fileType);
 console.log('Text chunks:', chunks.length, chunks[0]);
 ```
 ## 性能优化建议
 ### 1. 文件处理优化
 - 大文件分片处理
 - 异步并发处理
 - 内存使用监控
 - 进度条显示
 ### 2. LLM调用优化
 - 请求缓存机制
 - 批量处理请求
 - 重试策略优化
 - 并发数控制
 ### 3. 前端性能优化
 - 组件懒加载
 - 虚拟滚动列表
 - 图片懒加载
 - 代码分割
 ## 常见问题解决
 ### 1. 数据库相关问题
 - **问题**: 数据库连接失败
 - **解决**: 检查prisma配置，确保数据库文件存在
 ### 2. LLM API相关问题
 - **问题**: API调用超时
 - **解决**: 调整超时时间，检查网络连接，增加重试机制
 ### 3. 文件处理问题
 - **问题**: 大文件处理内存溢出
 - **解决**: 使用流式处理，分块读取，增加内存限制
 ### 4. Electron打包问题
 - **问题**: 打包后应用无法启动
 - **解决**: 检查依赖项配置，确保native模块正确打包
 ## 部署指南
 ### Docker部署
 ```bash
 # 构建镜像
 docker build -t easy-dataset .
 # 运行容器
 docker run -d -p 1717:1717 -v ./local-db:/app/local-db easy-dataset
 ```
 ### 桌面应用构建
 ```bash
 # 构建各平台安装包
 npm run electron-build-mac    # macOS
 npm run electron-build-win    # Windows
 npm run electron-build-linux  # Linux
 ```
 ## 贡献指南
 ### 提交规范
 - 使用conventional commits格式
 - 提交前运行lint检查
 - 更新相关文档
 - 添加测试用例
 ### 分支策略
 - `main`: 主分支，稳定版本
 - `dev`: 开发分支，集成新功能
 - `feature/*`: 功能分支
 - `fix/*`: 修复分支
 ---
--- a/easy-dataset-main/ARCHITECTURE.md
+++ b/easy-dataset-main/ARCHITECTURE.md
@@ -0,0 +1,183 @@
 # Easy DataSet 项目架构设计
 ## 项目概述
 Easy DataSet 是一个用于创建大模型微调数据集的应用程序。用户可以上传文本文件，系统会自动分割文本并生成问题，最终生成用于微调的数据集。
 ## 技术栈
 - **前端框架**: Next.js 14 (App Router)
 - **UI 框架**: Material-UI (MUI)
 - **数据存储**: fs 文件系统模拟数据库
 - **开发语言**: JavaScript
 ## 目录结构
 ```
 easy-dataset/
 ├── app/                      # Next.js 应用目录
 │   ├── api/                 # API 路由
 │   │   └── projects/       # 项目相关 API
 │   ├── projects/           # 项目相关页面
 │   │   ├── [projectId]/    # 项目详情页面
 │   └── page.js            # 主页
 ├── components/             # React 组件
 │   ├── home/              # 主页相关组件
 │   │   ├── HeroSection.js
 │   │   ├── ProjectList.js
 │   │   └── StatsCard.js
 │   ├── Navbar.js          # 导航栏组件
 │   └── CreateProjectDialog.js
 ├── lib/                    # 工具库
 │   └── db/                # 数据库模块
 │       ├── base.js        # 基础工具函数
 │       ├── projects.js    # 项目管理
 │       ├── texts.js       # 文本处理
 │       ├── datasets.js    # 数据集管理
 │       └── index.js       # 模块导出
 ├── styles/                # 样式文件
 │   └── home.js           # 主页样式
 └── local-db/             # 本地数据库目录
 ```
 ## 核心模块设计
 ### 1. 数据库模块 (`lib/db/`)
 #### base.js
 - 提供基础的文件操作功能
 - 确保数据库目录存在
 - 读写 JSON 文件的工具函数
 #### projects.js
 - 项目的 CRUD 操作
 - 项目配置管理
 - 项目目录结构维护
 #### texts.js
 - 文献处理功能
 - 文本片段存储和检索
 - 文件上传处理
 #### datasets.js
 - 数据集生成和管理
 - 问题列表管理
 - 标签树管理
 ### 2. 前端组件 (`components/`)
 #### Navbar.js
 - 顶部导航栏
 - 项目切换
 - 模型选择
 - 主题切换
 #### home/ 目录组件
 - HeroSection.js: 主页顶部展示区
 - ProjectList.js: 项目列表展示
 - StatsCard.js: 数据统计展示
 - CreateProjectDialog.js: 创建项目的对话框
 ### 3. 页面路由 (`app/`)
 #### 主页 (`page.js`)
 - 项目列表展示
 - 创建项目入口
 - 数据统计展示
 #### 项目详情页 (`projects/[projectId]/`)
 - text-split/: 文献处理页面
 - questions/: 问题列表页面
 - datasets/: 数据集页面
 - settings/: 项目设置页面
 #### API 路由 (`api/`)
 - projects/: 项目管理 API
 - texts/: 文本处理 API
 - questions/: 问题生成 API
 - datasets/: 数据集管理 API
 ## 数据流设计
 ### 项目创建流程
 1. 用户通过主页或导航栏创建新项目
 2. 填写项目基本信息（名称、描述）
 3. 系统创建项目目录和初始配置文件
 4. 重定向到项目详情页
 ### 文献处理流程
 1. 用户上传 Markdown 文件
 2. 系统保存原始文件到项目目录
 3. 调用文本分割服务，生成片段和目录结构
 4. 展示分割结果和提取的目录
 ### 问题生成流程
 1. 用户选择需要生成问题的文本片段
 2. 系统调用大模型API生成问题
 3. 保存问题到问题列表和标签树
 ### 数据集生成流程
 1. 用户选择需要生成答案的问题
 2. 系统调用大模型API生成答案
 3. 保存数据集结果
 4. 提供导出功能
 ## 模型配置
 支持多种大模型提供商配置：
 - Ollama
 - OpenAI
 - 硅基流动
 - 深度求索
 - 智谱AI
 每个提供商支持配置：
 - API 地址
 - API 密钥
 - 模型名称
 ## 未来扩展方向
 1. 支持更多文件格式（PDF、DOC等）
 2. 增加数据集质量评估功能
 3. 添加数据集版本管理
 4. 实现团队协作功能
 5. 增加更多数据集导出格式
 ## 国际化处理
 ### 技术选型
 - **国际化库**: i18next + react-i18next
 - **语言检测**: i18next-browser-languagedetector
 - **支持语言**: 英文(en)、简体中文(zh-CN)
 ### 目录结构
 ```
 easy-dataset/
 ├── locales/              # 国际化资源目录
 │   ├── en/              # 英文翻译
 │   │   └── translation.json
 │   ├── zh-CN/           # 中文翻译
 │   │   └── translation.json
 │   └── pt-BR/           # 中文翻译
 │       └── translation.json
 ├── lib/
 │   └── i18n.js          # i18next 配置
 ```
--- a/easy-dataset-main/Dockerfile
+++ b/easy-dataset-main/Dockerfile
@@ -0,0 +1,86 @@
 # 创建包含pnpm的基础镜像
 FROM node:20-alpine AS pnpm-base
 RUN npm install -g pnpm@9
 # 构建阶段
 FROM pnpm-base AS builder
 WORKDIR /app
 # 添加构建参数，用于识别目标平台
 ARG TARGETPLATFORM
 # 安装构建依赖
 RUN apk add --no-cache --virtual .build-deps \
    python3 \
    make \
    g++ \
    cairo-dev \
    pango-dev \
    jpeg-dev \
    giflib-dev \
    librsvg-dev \
    build-base \
    pixman-dev \
    pkgconfig
 # 复制依赖文件和npm配置并安装(.npmrc中可配置国内源加速)
 COPY package.json pnpm-lock.yaml .npmrc ./
 RUN pnpm install
 # 复制源代码
 COPY . .
 # 根据目标平台设置Prisma二进制目标并构建应用
 RUN if [ "$TARGETPLATFORM" = "linux/arm64" ]; then \
        echo "Configuring for ARM64 platform"; \
        sed -i 's/binaryTargets = \[.*\]/binaryTargets = \["linux-musl-arm64-openssl-3.0.x"\]/' prisma/schema.prisma; \
        PRISMA_CLI_BINARY_TARGETS="linux-musl-arm64-openssl-3.0.x" pnpm build; \
    else \
        echo "Configuring for AMD64 platform (default)"; \
        sed -i 's/binaryTargets = \[.*\]/binaryTargets = \["linux-musl-openssl-3.0.x"\]/' prisma/schema.prisma; \
        PRISMA_CLI_BINARY_TARGETS="linux-musl-openssl-3.0.x" pnpm build; \
    fi
 # 构建完成后移除开发依赖，只保留生产依赖
 RUN pnpm prune --prod
 # 运行阶段
 FROM pnpm-base AS runner
 WORKDIR /app
 # 只安装运行时依赖
 RUN apk add --no-cache \
    cairo \
    pango \
    jpeg \
    giflib \
    librsvg \
    pixman
 # 复制package.json和.env文件
 COPY package.json .env ./
 # 从构建阶段复制精简后的node_modules（只包含生产依赖）
 COPY --from=builder /app/node_modules ./node_modules
 # 从构建阶段复制构建产物
 COPY --from=builder /app/.next ./.next
 COPY --from=builder /app/public ./public
 COPY --from=builder /app/electron ./electron
 # 复制 prisma 到模板目录（用于自动初始化）
 COPY --from=builder /app/prisma /app/prisma-template
 # 复制并设置 entrypoint 脚本（sed 去除 Windows 换行符 \r，防止 CRLF 导致 "no such file or directory"）
 COPY docker-entrypoint.sh /usr/local/bin/
 RUN sed -i 's/\r$//' /usr/local/bin/docker-entrypoint.sh && \
    chmod +x /usr/local/bin/docker-entrypoint.sh
 # 设置生产环境
 ENV NODE_ENV=production
 EXPOSE 1717
 # 使用 entrypoint 脚本
 ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
 CMD ["pnpm", "start"]
--- a/easy-dataset-main/LICENSE
+++ b/easy-dataset-main/LICENSE
@@ -0,0 +1,40 @@
 GNU AFFERO GENERAL PUBLIC LICENSE
 Version 3, 19 November 2007
 Copyright (C) 2025 Easy Dataset Project
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU Affero General Public License as published
 by the Free Software Foundation, either version 3 of the License, or
 (at your option) any later version.
 This program is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 GNU Affero General Public License for more details.
 You should have received a copy of the GNU Affero General Public License
 along with this program. If not, see https://www.gnu.org/licenses/.
 Additional Terms for Easy Dataset:
 1. Contact Information
 If you wish to use Easy Dataset under different terms, please contact the
 copyright holders at: 1009903985@qq.com
 2. Branding Restrictions
 You may not use the names "Easy Dataset" or "EasyDataset" to endorse or
 promote products derived from this software without prior written permission.
 3. Disclaimer of Warranty
 The software is provided "as is", without warranty of any kind, express or
 implied, including but not limited to the warranties of merchantability,
 fitness for a particular purpose and noninfringement. In no event shall the
 authors or copyright holders be liable for any claim, damages or other
 liability, whether in an action of contract, tort or otherwise, arising from,
 out of or in connection with the software or the use or other dealings in the
 software.
 4. Compliance with Laws
 You are responsible for ensuring your use of the software complies with all
 applicable laws, including but not limited to export control regulations.
--- a/easy-dataset-main/README.md
+++ b/easy-dataset-main/README.md
@@ -0,0 +1,294 @@
 <div align="center">
 ![](./public//imgs/bg2.png)
 <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ConardLi/easy-dataset">
 <img alt="GitHub Downloads (all assets, all releases)" src="https://img.shields.io/github/downloads/ConardLi/easy-dataset/total">
 <img alt="GitHub Release" src="https://img.shields.io/github/v/release/ConardLi/easy-dataset">
 <img src="https://img.shields.io/badge/license-AGPL--3.0-green.svg" alt="AGPL 3.0 License"/>
 <img alt="GitHub contributors" src="https://img.shields.io/github/contributors/ConardLi/easy-dataset">
 <img alt="GitHub last commit" src="https://img.shields.io/github/last-commit/ConardLi/easy-dataset">
 <a href="https://arxiv.org/abs/2507.04009v1" target="_blank">
  <img src="https://img.shields.io/badge/arXiv-2507.04009-b31b1b.svg" alt="arXiv:2507.04009">
 </a>
 <a href="https://trendshift.io/repositories/13944" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13944" alt="ConardLi%2Feasy-dataset | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
 **A powerful tool for creating fine-tuning datasets for Large Language Models**
 [简体中文](./README.zh-CN.md) | [English](./README.md) | [Türkçe](./README.tr.md)
 [Features](#features) • [Quick Start](#local-run) • [Documentation](https://docs.easy-dataset.com/ed/en) • [Contributing](#contributing) • [License](#license)
 If you like this project, please give it a Star⭐️, or buy the author a coffee => [Donate](./public/imgs/aw.jpg) ❤️!
 </div>
 ## Overview
 Easy Dataset is an application specifically designed for building large language model (LLM) datasets. It features an intuitive interface, along with built-in powerful document parsing tools, intelligent segmentation algorithms, data cleaning and augmentation capabilities. The application can convert domain-specific documents in various formats into high-quality structured datasets, which are applicable to scenarios such as model fine-tuning, retrieval-augmented generation (RAG), and model performance evaluation.
 ![](./public/imgs/arc3.png)
 ## News
 🎉🎉 Easy Dataset Version 1.7.0 launches brand-new evaluation capabilities! You can effortlessly convert domain-specific documents into evaluation datasets (test sets) and automatically run multi-dimensional evaluation tasks. Additionally, it comes with a human blind test system, enabling you to easily meet needs such as vertical domain model evaluation, post-fine-tuning model performance assessment, and RAG recall rate evaluation. Tutorial: [https://www.bilibili.com/video/BV1CRrVB7Eb4/](https://www.bilibili.com/video/BV1CRrVB7Eb4/)
 ## Features
 ### 📄 Document Processing & Data Generation
 - **Intelligent Document Processing**: Supports PDF, Markdown, DOCX, TXT, EPUB and more formats with intelligent recognition
 - **Intelligent Text Splitting**: Multiple splitting algorithms (Markdown structure, recursive separators, fixed length, code-aware chunking), with customizable visual segmentation
 - **Intelligent Question Generation**: Auto-extract relevant questions from text segments, with question templates and batch generation
 - **Domain Label Tree**: Intelligently builds global domain label trees based on document structure, with auto-tagging capabilities
 - **Answer Generation**: Uses LLM API to generate comprehensive answers and Chain of Thought (COT), with AI optimization
 - **Data Cleaning**: Intelligent text cleaning to remove noise and improve data quality
 ### 🔄 Multiple Dataset Types
 - **Single-Turn QA Datasets**: Standard question-answer pairs for basic fine-tuning
 - **Multi-Turn Dialogue Datasets**: Customizable roles and scenarios for conversational format
 - **Image QA Datasets**: Generate visual QA data from images, with multiple import methods (directory, PDF, ZIP)
 - **Data Distillation**: Generate label trees and questions directly from domain topics without uploading documents
 ### 📊 Model Evaluation System
 - **Evaluation Datasets**: Generate true/false, single-choice, multiple-choice, short-answer, and open-ended questions
 - **Automated Model Evaluation**: Use Judge Model to automatically evaluate model answer quality with customizable scoring rules
 - **Human Blind Test (Arena)**: Double-blind comparison of two models' answers for unbiased evaluation
 - **AI Quality Assessment**: Automatic quality scoring and filtering of generated datasets
 ### 🛠️ Advanced Features
 - **Custom Prompts**: Project-level customization of all prompt templates (question generation, answer generation, data cleaning, etc.)
 - **GA Pair Generation**: Genre-Audience pair generation to enrich data diversity
 - **Task Management Center**: Background batch task processing with monitoring and interruption support
 - **Resource Monitoring Dashboard**: Token consumption statistics, API call tracking, model performance analysis
 - **Model Testing Playground**: Compare up to 3 models simultaneously
 ### 📤 Export & Integration
 - **Multiple Export Formats**: Alpaca, ShareGPT, Multilingual-Thinking formats with JSON/JSONL file types
 - **Balanced Export**: Configure export counts per tag for dataset balancing
 - **LLaMA Factory Integration**: One-click LLaMA Factory configuration file generation
 - **Hugging Face Upload**: Direct upload datasets to Hugging Face Hub
 ### 🤖 Model Support
 - **Wide Model Compatibility**: Compatible with all LLM APIs that follow the OpenAI format
 - **Multi-Provider Support**: OpenAI, Ollama (local models), Zhipu AI, Alibaba Bailian, OpenRouter, and more
 - **Vision Models**: Support Gemini, Claude, etc. for PDF parsing and image QA
 ### 🌐 User Experience
 - **User-Friendly Interface**: Modern, intuitive UI designed for both technical and non-technical users
 - **Multi-Language Support**: Complete Chinese, English, Turkish and Portuguese language support 🇹🇷
 - **Dataset Square**: Discover and explore public dataset resources
 - **Desktop Clients**: Available for Windows, macOS, and Linux
 ## Quick Demo
 https://github.com/user-attachments/assets/6ddb1225-3d1b-4695-90cd-aa4cb01376a8
 ## Local Run
 ### Download Client
 <table style="width: 100%">
  <tr>
    <td width="20%" align="center">
      <b>Windows</b>
    </td>
    <td width="30%" align="center" colspan="2">
      <b>MacOS</b>
    </td>
    <td width="20%" align="center">
      <b>Linux</b>
    </td>
  </tr>
  <tr style="text-align: center">
    <td align="center" valign="middle">
      <a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
        <img src='./public/imgs/windows.png' style="height:24px; width: 24px" />
        <br />
        <b>Setup.exe</b>
      </a>
    </td>
    <td align="center" valign="middle">
      <a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
        <img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
        <br />
        <b>Intel</b>
      </a>
    </td>
    <td align="center" valign="middle">
      <a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
        <img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
        <br />
        <b>M</b>
      </a>
    </td>
    <td align="center" valign="middle">
      <a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
        <img src='./public/imgs/linux.png' style="height:24px; width: 24px" />
        <br />
        <b>AppImage</b>
      </a>
    </td>
  </tr>
 </table>
 ### Install with NPM
 1. Clone the repository:
 ```bash
   git clone https://github.com/ConardLi/easy-dataset.git
   cd easy-dataset
 ```
 2. Install dependencies:
 ```bash
   npm install
 ```
 3. Start the development server:
 ```bash
   npm run build
   npm run start
 ```
 4. Open your browser and visit `http://localhost:1717`
 ### Using the Official Docker Image
 1. Clone the repository:
 ```bash
 git clone https://github.com/ConardLi/easy-dataset.git
 cd easy-dataset
 ```
 2. Modify the `docker-compose.yml` file:
 ```yml
 services:
  easy-dataset:
    image: ghcr.io/conardli/easy-dataset
    container_name: easy-dataset
    ports:
      - '1717:1717'
    volumes:
      - ./local-db:/app/local-db
      - ./prisma:/app/prisma
    restart: unless-stopped
 ```
 > **Note:** It is recommended to use the `local-db` and `prisma` folders in the current code repository directory as mount paths to maintain consistency with the database paths when starting via NPM.
 > **Note:** The database file will be automatically initialized on first startup, no need to manually run `npm run db:push`.
 3. Start with docker-compose:
 ```bash
 docker-compose up -d
 ```
 4. Open a browser and visit `http://localhost:1717`
 ### Building with a Local Dockerfile
 If you want to build the image yourself, use the Dockerfile in the project root directory:
 1. Clone the repository:
 ```bash
 git clone https://github.com/ConardLi/easy-dataset.git
 cd easy-dataset
 ```
 2. Build the Docker image:
 ```bash
 docker build -t easy-dataset .
 ```
 3. Run the container:
 ```bash
 docker run -d \
  -p 1717:1717 \
  -v ./local-db:/app/local-db \
  -v ./prisma:/app/prisma \
  --name easy-dataset \
  easy-dataset
 ```
 > **Note:** It is recommended to use the `local-db` and `prisma` folders in the current code repository directory as mount paths to maintain consistency with the database paths when starting via NPM.
 > **Note:** The database file will be automatically initialized on first startup, no need to manually run `npm run db:push`.
 4. Open a browser and visit `http://localhost:1717`
 ## Documentation
 - View the demo video of this project: [Easy Dataset Demo Video](https://www.bilibili.com/video/BV1y8QpYGE57/)
 - For detailed documentation on all features and APIs, visit our [Documentation Site](https://docs.easy-dataset.com/ed/en)
 - View the paper of this project: [Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents](https://arxiv.org/abs/2507.04009v1)
 ## Community Practice
 - [Complete test set generation and model evaluation with Easy Dataset](https://www.bilibili.com/video/BV1CRrVB7Eb4/)
 - [Easy Dataset × LLaMA Factory: Enabling LLMs to Efficiently Learn Domain Knowledge](https://buaa-act.feishu.cn/wiki/GVzlwYcRFiR8OLkHbL6cQpYin7g)
 - [Easy Dataset Practical Guide: How to Build High-Quality Datasets?](https://www.bilibili.com/video/BV1MRMnz1EGW)
 - [Interpretation of Key Feature Updates in Easy Dataset](https://www.bilibili.com/video/BV1fyJhzHEb7/)
 - [Foundation Models Fine-tuning Datasets: Basic Knowledge Popularization](https://docs.easy-dataset.com/zhi-shi-ke-pu)
 ## Contributing
 We welcome contributions from the community! If you'd like to contribute to Easy Dataset, please follow these steps:
 1. Fork the repository
 2. Create a new branch (`git checkout -b feature/amazing-feature`)
 3. Make your changes
 4. Commit your changes (`git commit -m 'Add some amazing feature'`)
 5. Push to the branch (`git push origin feature/amazing-feature`)
 6. Open a Pull Request (submit to the DEV branch)
 Please ensure that tests are appropriately updated and adhere to the existing coding style.
 ## Join Discussion Group & Contact the Author
 https://docs.easy-dataset.com/geng-duo/lian-xi-wo-men
 ## License
 This project is licensed under the AGPL 3.0 License - see the [LICENSE](LICENSE) file for details.
 ## Citation
 If this work is helpful, please kindly cite as:
 ```bibtex
@misc{miao2025easydataset,
  title={Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents},
  author={Ziyang Miao and Qiyu Sun and Jingyuan Wang and Yuchen Gong and Yaowei Zheng and Shiqi Li and Richong Zhang},
  year={2025},
  eprint={2507.04009},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2507.04009}
 }
 ```
 ## Star History
 [![Star History Chart](https://api.star-history.com/svg?repos=ConardLi/easy-dataset&type=Date)](https://www.star-history.com/#ConardLi/easy-dataset&Date)
 <div align="center">
  <sub>Built with ❤️ by <a href="https://github.com/ConardLi">ConardLi</a> • Follow me: <a href="./public/imgs/weichat.jpg">WeChat Official Account</a>｜<a href="https://space.bilibili.com/474921808">Bilibili</a>｜<a href="https://juejin.cn/user/3949101466785709">Juejin</a>｜<a href="https://www.zhihu.com/people/wen-ti-chao-ji-duo-de-xiao-qi">Zhihu</a>｜<a href="https://www.youtube.com/@garden-conard">Youtube</a></sub>
 </div>
--- a/easy-dataset-main/README.tr.md
+++ b/easy-dataset-main/README.tr.md
@@ -0,0 +1,319 @@
 <div align="center">
 ![](./public//imgs/bg2.png)
 <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ConardLi/easy-dataset">
 <img alt="GitHub Downloads (all assets, all releases)" src="https://img.shields.io/github/downloads/ConardLi/easy-dataset/total">
 <img alt="GitHub Release" src="https://img.shields.io/github/v/release/ConardLi/easy-dataset">
 <img src="https://img.shields.io/badge/license-AGPL--3.0-green.svg" alt="AGPL 3.0 License"/>
 <img alt="GitHub contributors" src="https://img.shields.io/github/contributors/ConardLi/easy-dataset">
 <img alt="GitHub last commit" src="https://img.shields.io/github/last-commit/ConardLi/easy-dataset">
 <a href="https://arxiv.org/abs/2507.04009v1" target="_blank">
  <img src="https://img.shields.io/badge/arXiv-2507.04009-b31b1b.svg" alt="arXiv:2507.04009">
 </a>
 <a href="https://trendshift.io/repositories/13944" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13944" alt="ConardLi%2Feasy-dataset | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
 **Büyük Dil Modelleri için ince ayar veri setleri oluşturmak için güçlü bir araç**
 [简体中文](./README.zh-CN.md) | [English](./README.md) | [Türkçe](./README.tr.md)
 [Özellikler](#özellikler) • [Hızlı Başlangıç](#yerel-çalıştırma) • [Dokümantasyon](https://docs.easy-dataset.com/ed/en) • [Katkıda Bulunma](#katkıda-bulunma) • [Lisans](#lisans)
 Bu projeyi beğendiyseniz, lütfen bir Yıldız⭐️ verin veya yazara bir kahve ısmarlayın => [Bağış](./public/imgs/aw.jpg) ❤️!
 </div>
 ## Genel Bakış
 Easy Dataset, Büyük Dil Modelleri (LLM'ler) için özel olarak tasarlanmış ince ayar veri setleri oluşturmak için bir uygulamadır. Alana özgü dosyaları yüklemek, içeriği akıllıca bölmek, sorular oluşturmak ve model ince ayarı için yüksek kaliteli eğitim verileri üretmek için sezgisel bir arayüz sağlar.
 Easy Dataset ile alan bilgisini yapılandırılmış veri setlerine dönüştürebilir, OpenAI formatını takip eden tüm LLM API'leriyle uyumlu çalışabilir ve ince ayar sürecini basit ve verimli hale getirebilirsiniz.
 ![](./public/imgs/arc3.png)
 ## Özellikler
 - **Akıllı Belge İşleme**: PDF, Markdown, DOCX dahil birden fazla formatın akıllı tanınması ve işlenmesi desteği
 - **Akıllı Metin Bölme**: Birden fazla akıllı metin bölme algoritması ve özelleştirilebilir görsel segmentasyon desteği
 - **Akıllı Soru Üretimi**: Her metin bölümünden ilgili soruları çıkarır
 - **Alan Etiketleri**: Veri setleri için global alan etiketlerini akıllıca oluşturur, küresel anlama yeteneklerine sahiptir
 - **Cevap Üretimi**: Kapsamlı cevaplar ve Düşünce Zinciri (COT) oluşturmak için LLM API kullanır
 - **Esnek Düzenleme**: Sürecin herhangi bir aşamasında soruları, cevapları ve veri setlerini düzenleyin
 - **Çoklu Dışa Aktarma Formatları**: Veri setlerini çeşitli formatlarda (Alpaca, ShareGPT, çok dilli düşünme) ve dosya türlerinde (JSON, JSONL) dışa aktarın
 - **Geniş Model Desteği**: OpenAI formatını takip eden tüm LLM API'leriyle uyumlu
 - **Tam Türkçe Dil Desteği**: Tüm arayüz ve AI işlemleri için eksiksiz Türkçe çeviriler 🇹🇷
 - **Kullanıcı Dostu Arayüz**: Hem teknik hem de teknik olmayan kullanıcılar için tasarlanmış sezgisel kullanıcı arayüzü
 - **Özel Sistem İstemleri**: Model yanıtlarını yönlendirmek için özel sistem istemleri ekleyin
 ## Hızlı Demo
 https://github.com/user-attachments/assets/6ddb1225-3d1b-4695-90cd-aa4cb01376a8
 ## Yerel Çalıştırma
 ### İstemciyi İndirin
 <table style="width: 100%">
  <tr>
    <td width="20%" align="center">
      <b>Windows</b>
    </td>
    <td width="30%" align="center" colspan="2">
      <b>MacOS</b>
    </td>
    <td width="20%" align="center">
      <b>Linux</b>
    </td>
  </tr>
  <tr style="text-align: center">
    <td align="center" valign="middle">
      <a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
        <img src='./public/imgs/windows.png' style="height:24px; width: 24px" />
        <br />
        <b>Setup.exe</b>
      </a>
    </td>
    <td align="center" valign="middle">
      <a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
        <img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
        <br />
        <b>Intel</b>
      </a>
    </td>
    <td align="center" valign="middle">
      <a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
        <img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
        <br />
        <b>M</b>
      </a>
    </td>
    <td align="center" valign="middle">
      <a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
        <img src='./public/imgs/linux.png' style="height:24px; width: 24px" />
        <br />
        <b>AppImage</b>
      </a>
    </td>
  </tr>
 </table>
 ### NPM ile Kurulum
 ```bash
 npm install
 npm run db:push
 npm run dev
 ```
 ### Docker ile Kurulum
 ```bash
 docker-compose up -d
 ```
 Ardından `http://localhost:1717` adresine gidin.
 ## Desteklenen AI Sağlayıcıları
 Easy Dataset, aşağıdakiler dahil olmak üzere birden fazla AI sağlayıcısını destekler:
 - **OpenAI**: GPT-4, GPT-3.5-turbo ve diğer modeller
 - **Ollama**: Yerel model çalıştırma
 - **智谱AI (GLM)**: Çince modeller
 - **OpenRouter**: Çoklu model aggregatör
 - **Özel API Uç Noktaları**: OpenAI formatını takip eden herhangi bir API
 ## Proje Yapısı
 ```
 easy-dataset/
 ├── app/                    # Next.js uygulama yönlendiricisi
 │   ├── api/               # API rotaları
 │   ├── projects/          # Proje sayfaları
 │   └── dataset-square/    # Veri seti galerisi
 ├── components/            # React bileşenleri
 ├── lib/                   # Temel kütüphaneler
 │   ├── llm/              # LLM entegrasyonu
 │   ├── db/               # Veritabanı erişimi
 │   ├── file/             # Dosya işleme
 │   └── services/         # İş mantığı
 ├── locales/              # i18n çevirileri
 │   ├── en/              # İngilizce
 │   ├── zh-CN/           # Basitleştirilmiş Çince
 │   └── tr/              # Türkçe
 ├── prisma/               # Veritabanı şeması
 └── electron/             # Electron masaüstü uygulaması
 ```
 ## Kullanım Rehberi
 ### 1. Proje Oluşturma
 İlk olarak, yeni bir proje oluşturun ve proje adını, açıklamasını ve diğer temel bilgileri yapılandırın.
 ### 2. Dosya Yükleme
 Alana özgü belgelerinizi yükleyin. Desteklenen formatlar:
 - PDF
 - Markdown (.md)
 - Microsoft Word (.docx)
 - EPUB
 - Düz metin (.txt)
 ### 3. Metin Bölme
 Dosyalar aşağıdaki yöntemlerle akıllıca bölünebilir:
 - Doğal dil işleme tabanlı semantik bölme
 - Özel ayırıcılara dayalı bölme
 - Karakter sayısına dayalı sabit boyutlu bölme
 - Manuel görsel bölme
 ### 4. Alan Etiketleri Oluşturma
 Sistem, belge içeriğine dayalı olarak otomatik olarak hiyerarşik alan etiketleri oluşturabilir ve iki seviyeyi destekler.
 ### 5. Soru Üretimi
 Her metin bloğu için sistem:
 - İçeriğe dayalı alakalı sorular oluşturur
 - Tür ve hedef kitle perspektifi sorgulamayı destekler
 - Soru sayısını özelleştirme seçeneği sunar
 ### 6. Cevap Üretimi
 Yapılandırılmış LLM API'si kullanarak:
 - Her soru için kapsamlı cevaplar oluşturur
 - Düşünce Zinciri (COT) üretimini destekler
 - Farklı cevap şablonları destekler
 ### 7. Veri Seti Dışa Aktarma
 Veri setinizi çeşitli formatlarda dışa aktarın:
 - **Alpaca Format**: Basit talimat-takip formatı
 - **ShareGPT Format**: Çok turlu konuşma formatı
 - **Çok Dilli Düşünme**: COT ile genişletilmiş format
 - **Özel Format**: Kendi JSON yapınızı tanımlayın
 Dışa aktarma hedefleri:
 - Yerel dosya sistemi
 - Hugging Face Hub
 - LLaMA Factory uyumluluğu
 ## Gelişmiş Özellikler
 ### Veri Damıtma
 Mevcut veri setlerinden yeni eğitim örnekleri oluşturun:
 - Soru damıtma: Mevcut soru-cevap çiftlerinden yeni sorular oluşturun
 - Etiket damıtma: Otomatik etiket ve kategorizasyon oluşturma
 ### Tür-Hedef Kitle (GA) Çiftleri
 Spesifik içerik stilleri ve hedef kitleler için veri setlerini uyarlayın:
 - Tür: Akademik, teknik, yaratıcı yazma, vb.
 - Hedef Kitle: Yeni başlayanlar, uzmanlar, öğrenciler, vb.
 ### Toplu İşlemler
 Birden fazla öğeye verimli bir şekilde işlem:
 - Toplu soru üretimi
 - Toplu cevap üretimi
 - Toplu veri seti dışa aktarma
 ### Görev Yönetimi
 Tüm arka plan görevlerini izleyin ve yönetin:
 - Dosya işleme görevleri
 - Soru üretim görevleri
 - Cevap üretim görevleri
 - Dışa aktarma görevleri
 ## Yapılandırma
 ### LLM API Yapılandırması
 Ayarlar sayfasında LLM API'nizi yapılandırın:
 1. **Sağlayıcı**: OpenAI, Ollama, 智谱AI veya özel seçin
 2. **API Anahtarı**: API anahtarınızı girin (gerekirse)
 3. **Model**: Kullanılacak modeli seçin
 4. **Temel URL**: Özel API'ler için temel URL'yi ayarlayın
 ### Görev Ayarları
 Görev yürütme parametrelerini özelleştirin:
 - Soru üretimi için eşzamanlılık
 - Cevap üretimi için eşzamanlılık
 - Varsayılan soru sayısı
 - Varsayılan cevap şablonu
 ### Özel İstemler
 Her görev türü için özel sistem istemleri ekleyin:
 - Soru üretim istemi
 - Cevap üretim istemi
 - Etiket üretim istemi
 - Damıtma istemi
 ## Katkıda Bulunma
 Katkılara hoş geldiniz! Lütfen şu adımları izleyin:
 1. Repo'yu fork edin
 2. Bir özellik dalı oluşturun (`git checkout -b feature/amazing-feature`)
 3. Değişikliklerinizi commit edin (`git commit -m 'Add some amazing feature'`)
 4. Dala push edin (`git push origin feature/amazing-feature`)
 5. Bir Pull Request açın
 ## Lisans
 Bu proje AGPL-3.0 Lisansı altında lisanslanmıştır. Detaylar için [LICENSE](./LICENSE) dosyasına bakın.
 ## İletişim
 - **GitHub Issues**: [Yeni bir sorun oluşturun](https://github.com/ConardLi/easy-dataset/issues)
 - **Email**: lhj19950927@gmail.com
 - **WeChat Grubu**: README'deki QR koduna bakın
 ## Alıntı
 Bu aracı araştırmanızda kullanırsanız, lütfen şu şekilde alıntı yapın:
 ```bibtex
@misc{easy-dataset-2025,
  title={Easy Dataset: A Tool for Creating Fine-tuning Datasets for Large Language Models},
  author={Conard Li},
  year={2025},
  publisher={GitHub},
  howpublished={\url{https://github.com/ConardLi/easy-dataset}}
 }
 ```
 ## Teşekkürler
 Bu proje aşağıdaki harika açık kaynak projelerini kullanır:
 - [Next.js](https://nextjs.org/)
 - [React](https://reactjs.org/)
 - [Material-UI](https://mui.com/)
 - [Prisma](https://www.prisma.io/)
 - [Electron](https://www.electronjs.org/)
 ---
 <div align="center">
 ⭐️ Bu projeyi beğendiyseniz, lütfen bir yıldız verin! ⭐️
 </div>
--- a/easy-dataset-main/README.zh-CN.md
+++ b/easy-dataset-main/README.zh-CN.md
@@ -0,0 +1,300 @@
 <div align="center">
 ![](./public//imgs/bg2.png)
 <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ConardLi/easy-dataset">
 <img alt="GitHub Downloads (all assets, all releases)" src="https://img.shields.io/github/downloads/ConardLi/easy-dataset/total">
 <img alt="GitHub Release" src="https://img.shields.io/github/v/release/ConardLi/easy-dataset">
 <img src="https://img.shields.io/badge/license-AGPL--3.0-green.svg" alt="AGPL 3.0 License"/>
 <img alt="GitHub contributors" src="https://img.shields.io/github/contributors/ConardLi/easy-dataset">
 <img alt="GitHub last commit" src="https://img.shields.io/github/last-commit/ConardLi/easy-dataset">
 <a href="https://arxiv.org/abs/2507.04009v1" target="_blank">
  <img src="https://img.shields.io/badge/arXiv-2507.04009-b31b1b.svg" alt="arXiv:2507.04009">
 </a>
 <a href="https://trendshift.io/repositories/13944" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13944" alt="ConardLi%2Feasy-dataset | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
 **一个强大的大型语言模型微调数据集创建工具**
 [简体中文](./README.zh-CN.md) | [English](./README.md)
 [功能特点](#功能特点) • [快速开始](#本地运行) • [使用文档](https://docs.easy-dataset.com/) • [贡献](#贡献) • [许可证](#许可证)
 如果喜欢本项目，请给本项目留下 Star⭐️，或者请作者喝杯咖啡呀 => [打赏作者](./public/imgs/aw.jpg) ❤️！
 </div>
 ## 概述
 Easy Dataset 是一个专为创建大型语言模型数据集而设计的应用程序。它提供了直观的界面，内置了强大的文档解析工具、智能分割算法、数据清洗和数据增强能力，可以将各种格式的领域文献转化为高质量结构化数据集，可用于模型微调、RAG、模型效果评估等场景。
 ![Easy Dataset 产品架构图](./public/imgs/arc3.png)
 ## 新闻
 🎉🎉 Easy Dataset 1.7.0 版本上线全新的评估能力，你可以轻松将领域文献转换为评估数据集（测试集），并且可以自动执行多维度评估任务，另外还配备人工盲测系统，可以轻松助你完成垂直领域模型评估、模型微调后效果评估、RAG 召回率评估等需求，使用教程： [https://www.bilibili.com/video/BV1CRrVB7Eb4/](https://www.bilibili.com/video/BV1CRrVB7Eb4/)
 ## 功能特点
 ### 📄 文档处理与数据生成
 - **智能文档处理**：支持 PDF、Markdown、DOCX、TXT、EPUB 等多种格式智能识别和处理
 - **智能文本分割**：支持多种智能文本分割算法（Markdown 结构、递归分隔符、固定长度、代码智能分块等），支持自定义可视化分段
 - **智能问题生成**：从每个文本片段中自动提取相关问题，支持问题模板和批量生成
 - **领域标签树**：基于文档目录智能构建全局领域标签树，具备全局理解和自动打标能力
 - **答案生成**：使用 LLM API 为每个问题生成全面的答案和思维链（COT），支持 AI 智能优化
 - **数据清洗**：智能清洗文本块内容，去除噪音数据，提升数据质量
 ### 🔄 多种数据集类型
 - **单轮问答数据集**：标准的问答对格式，适合基础微调
 - **多轮对话数据集**：支持自定义角色和场景的多轮对话格式
 - **图片问答数据集**：基于图片生成视觉问答数据，支持多种导入方式（目录、PDF、压缩包）
 - **数据蒸馏**：无需上传文档，直接从领域主题自动生成标签树和问题
 ### 📊 模型评估体系
 - **评估数据集**：支持生成判断题、单选题、多选题、简答题、开放题等多种题型的评估测试集
 - **模型自动评估**：使用教师模型（Judge Model）自动评估模型回答质量，支持自定义评分规则
 - **人工盲测 (Arena)**：双盲对比两个模型的回答质量，消除偏见进行公正评判
 - **AI 质量评估**：对生成的数据集进行自动质量评分和筛选
 ### 🛠️ 高级功能
 - **自定义提示词**：项目级自定义各类提示词模板（问题生成、答案生成、数据清洗等）
 - **GA 组合生成**：文体-受众对生成，丰富数据多样性
 - **任务管理中心**：后台批量任务处理，支持任务监控和中断
 - **资源监控看板**：Token 消耗统计、调用次数追踪、模型性能分析
 - **模型测试 Playground**：支持最多 3 个模型同时对比测试
 ### 📤 导出与集成
 - **多种导出格式**：支持 Alpaca、ShareGPT、Multilingual-Thinking 等格式，JSON/JSONL 文件类型
 - **平衡导出**：按标签配置导出数量，实现数据集均衡
 - **LLaMA Factory 集成**：一键生成 LLaMA Factory 配置文件
 - **Hugging Face 上传**：直接将数据集上传至 Hugging Face Hub
 ### 🤖 模型支持
 - **广泛的模型兼容**：兼容所有遵循 OpenAI 格式的 LLM API
 - **多提供商支持**：OpenAI、Ollama（本地模型）、智谱 AI、阿里百炼、OpenRouter 等
 - **视觉模型**：支持 Gemini、Claude 等视觉模型用于 PDF 解析和图片问答
 ### 🌐 用户体验
 - **用户友好界面**：为技术和非技术用户设计的现代化直观 UI
 - **多语言支持**：完整的中英文界面支持
 - **数据集广场**：发现和探索各种公开数据集资源
 - **桌面客户端**：提供 Windows、macOS、Linux 桌面应用
 ## 快速演示
 https://github.com/user-attachments/assets/6ddb1225-3d1b-4695-90cd-aa4cb01376a8
 ## 本地运行
 ### 下载客户端
 <table style="width: 100%">
  <tr>
    <td width="20%" align="center">
      <b>Windows</b>
    </td>
    <td width="30%" align="center" colspan="2">
      <b>MacOS</b>
    </td>
    <td width="20%" align="center">
      <b>Linux</b>
    </td>
  </tr>
  <tr style="text-align: center">
    <td align="center" valign="middle">
      <a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
        <img src='./public/imgs/windows.png' style="height:24px; width: 24px" />
        <br />
        <b>Setup.exe</b>
      </a>
    </td>
    <td align="center" valign="middle">
      <a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
        <img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
        <br />
        <b>Intel</b>
      </a>
    </td>
    <td align="center" valign="middle">
      <a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
        <img src='./public/imgs/mac.png' style="height:24px; width: 24px" />
        <br />
        <b>M</b>
      </a>
    </td>
    <td align="center" valign="middle">
      <a href='https://github.com/ConardLi/easy-dataset/releases/latest'>
        <img src='./public/imgs/linux.png' style="height:24px; width: 24px" />
        <br />
        <b>AppImage</b>
      </a>
    </td>
  </tr>
 </table>
 ### 使用 NPM 安装
 1. 克隆仓库：
 ```bash
   git clone https://github.com/ConardLi/easy-dataset.git
   cd easy-dataset
 ```
 2. 安装依赖：
 ```bash
   npm install
 ```
 3. 启动开发服务器：
 ```bash
   npm run build
   npm run start
 ```
 4. 打开浏览器并访问 `http://localhost:1717`
 ### 使用官方 Docker 镜像
 1. 克隆仓库：
 ```bash
 git clone https://github.com/ConardLi/easy-dataset.git
 cd easy-dataset
 ```
 2. 更改 `docker-compose.yml` 文件：
 ```yml
 services:
  easy-dataset:
    image: ghcr.io/conardli/easy-dataset
    container_name: easy-dataset
    ports:
      - '1717:1717'
    volumes:
      - ./local-db:/app/local-db
      - ./prisma:/app/prisma
    restart: unless-stopped
 ```
 > **注意：** 建议直接使用当前代码仓库目录下的 `local-db` 和 `prisma` 文件夹作为挂载路径，这样可以和 NPM 启动时的数据库路径保持一致。
 > **注意：** 数据库文件会在首次启动时自动初始化，无需手动执行 `npm run db:push`。
 3. 使用 docker-compose 启动
 ```bash
 docker-compose up -d
 ```
 4. 打开浏览器并访问 `http://localhost:1717`
 ### 使用本地 Dockerfile 构建
 如果你想自行构建镜像，可以使用项目根目录中的 Dockerfile：
 1. 克隆仓库：
 ```bash
 git clone https://github.com/ConardLi/easy-dataset.git
 cd easy-dataset
 ```
 2. 构建 Docker 镜像：
 ```bash
 docker build -t easy-dataset .
 ```
 3. 运行容器：
 ```bash
 docker run -d \
  -p 1717:1717 \
  -v ./local-db:/app/local-db \
  -v ./prisma:/app/prisma \
  --name easy-dataset \
  easy-dataset
 ```
 > **注意：** 建议直接使用当前代码仓库目录下的 `local-db` 和 `prisma` 文件夹作为挂载路径，这样可以和 NPM 启动时的数据库路径保持一致。
 > **注意：** 数据库文件会在首次启动时自动初始化，无需手动执行 `npm run db:push`。
 4. 打开浏览器，访问 `http://localhost:1717`
 ## 文档
 - 有关所有功能和 API 的详细文档，请访问我们的 [文档站点](https://docs.easy-dataset.com/)
 - 查看本项目的演示视频：[Easy Dataset 演示视频](https://www.bilibili.com/video/BV1y8QpYGE57/)
 - 查看本项目的论文：[Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents](https://arxiv.org/abs/2507.04009v1)
 ## 社区教程
 - [使用 Easy Dataset 完成测试集生成和模型评估](https://www.bilibili.com/video/BV1CRrVB7Eb4/)
 - [Easy Dataset × LLaMA Factory: 让大模型高效学习领域知识](https://buaa-act.feishu.cn/wiki/KY9xwTGs1iqHrRkjXBwcZP9WnL9)
 - [Easy Dataset 使用实战: 如何构建高质量数据集？](https://www.bilibili.com/video/BV1MRMnz1EGW)
 - [Easy Dataset 1.4 重点功能更新解读](https://www.bilibili.com/video/BV1fyJhzHEb7/)
 - [Easy Dataset 1.6 重点功能更新解读](https://www.bilibili.com/video/BV1Rq1hBtEJa/)
 - [大模型微调数据集: 基础知识科普](https://docs.easy-dataset.com/zhi-shi-ke-pu)
 - [实战案例1：生成汽车图片识别数据集](https://docs.easy-dataset.com/bo-ke/shi-zhan-an-li/an-li-1-sheng-cheng-qi-che-tu-pian-shi-bie-shu-ju-ji)
 - [实战案例2：评论情感分类数据集](https://docs.easy-dataset.com/bo-ke/shi-zhan-an-li/an-li-2-ping-lun-qing-gan-fen-lei-shu-ju-ji)
 - [实战案例3：物理学多轮对话数据集](https://docs.easy-dataset.com/bo-ke/shi-zhan-an-li/an-li-3-wu-li-xue-duo-lun-dui-hua-shu-ju-ji)
 - [实战案例4：AI 智能体安全数据集](https://docs.easy-dataset.com/bo-ke/shi-zhan-an-li/an-li-4ai-zhi-neng-ti-an-quan-shu-ju-ji)
 - [实战案例5：从图文 PPT 中提取数据集](https://docs.easy-dataset.com/bo-ke/shi-zhan-an-li/an-li-5-cong-tu-wen-ppt-zhong-ti-qu-shu-ju-ji)
 ## 贡献
 我们欢迎社区的贡献！如果您想为 Easy Dataset 做出贡献，请按照以下步骤操作：
 1. Fork 仓库
 2. 创建新分支（`git checkout -b feature/amazing-feature`）
 3. 进行更改
 4. 提交更改（`git commit -m '添加一些惊人的功能'`）
 5. 推送到分支（`git push origin feature/amazing-feature`）
 6. 打开 Pull Request（提交至 DEV 分支）
 请确保适当更新测试并遵守现有的编码风格。
 ## 加交流群 & 联系作者
 https://docs.easy-dataset.com/geng-duo/lian-xi-wo-men
 ## 许可证
 本项目采用 AGPL 3.0 许可证 - 有关详细信息，请参阅 [LICENSE](LICENSE) 文件。
 ## 引用
 如果您觉得此项目有帮助，请考虑以下列格式引用
 ```bibtex
@misc{miao2025easydataset,
  title={Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents},
  author={Ziyang Miao and Qiyu Sun and Jingyuan Wang and Yuchen Gong and Yaowei Zheng and Shiqi Li and Richong Zhang},
  year={2025},
  eprint={2507.04009},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2507.04009}
 }
 ```
 ## Star History
 [![Star History Chart](https://api.star-history.com/svg?repos=ConardLi/easy-dataset&type=Date)](https://www.star-history.com/#ConardLi/easy-dataset&Date)
 <div align="center">
  <sub>由 <a href="https://github.com/ConardLi">ConardLi</a> 用 ❤️ 构建 • 关注我：<a href="./public/imgs/weichat.jpg">公众号</a>｜<a href="https://space.bilibili.com/474921808">B站</a>｜<a href="https://juejin.cn/user/3949101466785709">掘金</a>｜<a href="https://www.zhihu.com/people/wen-ti-chao-ji-duo-de-xiao-qi">知乎</a>｜<a href="https://www.youtube.com/@garden-conard">Youtube</a></sub>
 </div>
--- a/easy-dataset-main/app/api/check-update/route.js
+++ b/easy-dataset-main/app/api/check-update/route.js
@@ -0,0 +1,86 @@
 import { NextResponse } from 'next/server';
 import path from 'path';
 import fs from 'fs';
 // Get current version
 function getCurrentVersion() {
  try {
    const packageJsonPath = path.join(process.cwd(), 'package.json');
    const packageJson = JSON.parse(fs.readFileSync(packageJsonPath, 'utf8'));
    return packageJson.version;
  } catch (error) {
    console.error('Failed to read version from package.json:', String(error));
    return '1.0.0';
  }
 }
 // Get latest version from GitHub
 async function getLatestVersion() {
  try {
    const owner = 'ConardLi';
    const repo = 'easy-dataset';
    const response = await fetch(`https://api.github.com/repos/${owner}/${repo}/releases/latest`);
    if (!response.ok) {
      throw new Error(`GitHub API request failed: ${response.status}`);
    }
    const data = await response.json();
    return data.tag_name.replace('v', '');
  } catch (error) {
    console.error('Failed to fetch latest version:', String(error));
    return null;
  }
 }
 // Check for updates
 export async function GET() {
  try {
    const currentVersion = getCurrentVersion();
    const latestVersion = await getLatestVersion();
    if (!latestVersion) {
      return NextResponse.json({
        hasUpdate: false,
        currentVersion,
        latestVersion: null,
        error: 'Failed to fetch latest version'
      });
    }
    // Simple semver-like comparison
    const hasUpdate = compareVersions(latestVersion, currentVersion) > 0;
    return NextResponse.json({
      hasUpdate,
      currentVersion,
      latestVersion,
      releaseUrl: hasUpdate ? `https://github.com/ConardLi/easy-dataset/releases/tag/v${latestVersion}` : null
    });
  } catch (error) {
    console.error('Failed to check for updates:', String(error));
    return NextResponse.json(
      {
        hasUpdate: false,
        error: 'Failed to check for updates'
      },
      { status: 500 }
    );
  }
 }
 // Simple version comparison
 function compareVersions(a, b) {
  const partsA = a.split('.').map(Number);
  const partsB = b.split('.').map(Number);
  for (let i = 0; i < Math.max(partsA.length, partsB.length); i++) {
    const numA = i < partsA.length ? partsA[i] : 0;
    const numB = i < partsB.length ? partsB[i] : 0;
    if (numA > numB) return 1;
    if (numA < numB) return -1;
  }
  return 0;
 }
--- a/easy-dataset-main/app/api/llm/fetch-models/route.js
+++ b/easy-dataset-main/app/api/llm/fetch-models/route.js
@@ -0,0 +1,75 @@
 import { NextResponse } from 'next/server';
 import axios from 'axios';
 // Fetch model list from provider
 export async function POST(request) {
  try {
    const { endpoint, providerId, apiKey } = await request.json();
    if (!endpoint) {
      return NextResponse.json({ error: 'Missing required parameter: endpoint' }, { status: 400 });
    }
    let url = endpoint.replace(/\/$/, ''); // Remove trailing slash
    // Handle Ollama endpoint
    if (providerId === 'ollama') {
      // Remove possible /v1 or other version suffix
      url = url.replace(/\/v\d+$/, '');
      // Append /api if missing
      if (!url.includes('/api')) {
        url += '/api';
      }
      url += '/tags';
    } else {
      url += '/models';
    }
    const headers = {};
    if (apiKey) {
      headers.Authorization = `Bearer ${apiKey}`;
    }
    const response = await axios.get(url, { headers });
    // Format response per provider
    let formattedModels = [];
    if (providerId === 'ollama') {
      // Ollama /api/tags format: { models: [{ name: 'model-name', ... }] }
      if (response.data.models && Array.isArray(response.data.models)) {
        formattedModels = response.data.models.map(item => ({
          modelId: item.name,
          modelName: item.name,
          providerId
        }));
      }
    } else {
      // Default handling (OpenAI-compatible)
      if (response.data.data && Array.isArray(response.data.data)) {
        formattedModels = response.data.data.map(item => ({
          modelId: item.id,
          modelName: item.id,
          providerId
        }));
      }
    }
    return NextResponse.json(formattedModels);
  } catch (error) {
    console.error('Failed to fetch model list:', String(error));
    // Handle known error shapes
    if (error.response) {
      if (error.response.status === 401) {
        return NextResponse.json({ error: 'Invalid API key' }, { status: 401 });
      }
      return NextResponse.json(
        { error: `Failed to fetch model list: ${error.response.statusText}` },
        { status: error.response.status }
      );
    }
    return NextResponse.json({ error: `Failed to fetch model list: ${error.message}` }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/llm/model/route.js
+++ b/easy-dataset-main/app/api/llm/model/route.js
@@ -0,0 +1,39 @@
 import { NextResponse } from 'next/server';
 import { getLlmModelsByProviderId } from '@/lib/db/llm-models';
 // Get LLM models
 export async function GET(request) {
  try {
    const searchParams = request.nextUrl.searchParams;
    let providerId = searchParams.get('providerId');
    if (!providerId) {
      return NextResponse.json({ error: 'Invalid parameters' }, { status: 400 });
    }
    const models = await getLlmModelsByProviderId(providerId);
    if (!models) {
      return NextResponse.json({ error: 'LLM provider not found' }, { status: 404 });
    }
    return NextResponse.json(models);
  } catch (error) {
    console.error('Database query error:', String(error));
    return NextResponse.json({ error: 'Database query failed' }, { status: 500 });
  }
 }
 // Sync latest model list
 export async function POST(request) {
  try {
    const { newModels, providerId } = await request.json();
    const models = await getLlmModelsByProviderId(providerId);
    const existingModelIds = models.map(model => model.modelId);
    const diffModels = newModels.filter(item => !existingModelIds.includes(item.modelId));
    if (diffModels.length > 0) {
      // return NextResponse.json(await createLlmModels(diffModels));
      return NextResponse.json({ message: 'No new models to insert' }, { status: 200 });
    } else {
      return NextResponse.json({ message: 'No new models to insert' }, { status: 200 });
    }
  } catch (error) {
    return NextResponse.json({ error: 'Database insert failed' }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/llm/ollama/models/route.js
+++ b/easy-dataset-main/app/api/llm/ollama/models/route.js
@@ -0,0 +1,26 @@
 import { NextResponse } from 'next/server';
 const OllamaClient = require('@/lib/llm/core/providers/ollama');
 // Force dynamic route to prevent static generation
 export const dynamic = 'force-dynamic';
 export async function GET(request) {
  try {
    // Read host and port from query params
    const { searchParams } = new URL(request.url);
    const host = searchParams.get('host') || '127.0.0.1';
    const port = searchParams.get('port') || '11434';
    // Create Ollama API client
    const ollama = new OllamaClient({
      endpoint: `http://${host}:${port}/api`
    });
    // Fetch model list
    const models = await ollama.getModels();
    return NextResponse.json(models);
  } catch (error) {
    // console.error('fetch Ollama models error:', error);
    return NextResponse.json({ error: 'fetch Models failed' }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/llm/providers/route.js
+++ b/easy-dataset-main/app/api/llm/providers/route.js
@@ -0,0 +1,14 @@
 import { NextResponse } from 'next/server';
 import { getLlmProviders } from '@/lib/db/llm-providers';
 import { sortProvidersByPriority } from '@/lib/util/providerLogo';
 // Get LLM provider data
 export async function GET() {
  try {
    const result = await getLlmProviders();
    return NextResponse.json(sortProvidersByPriority(result, item => item.id));
  } catch (error) {
    console.error('Database query error:', String(error));
    return NextResponse.json({ error: 'Database query failed' }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/monitoring/logs/route.js
+++ b/easy-dataset-main/app/api/monitoring/logs/route.js
@@ -0,0 +1,107 @@
 import { NextResponse } from 'next/server';
 import { db } from '@/lib/db';
 export const dynamic = 'force-dynamic';
 export async function GET(request) {
  try {
    const { searchParams } = new URL(request.url);
    const timeRange = searchParams.get('timeRange') || '7d';
    const projectId = searchParams.get('projectId');
    const provider = searchParams.get('provider');
    const status = searchParams.get('status');
    const page = parseInt(searchParams.get('page') || '1', 10);
    const pageSize = parseInt(searchParams.get('pageSize') || '10', 10);
    const searchTerm = searchParams.get('search') || '';
    let startDate = new Date();
    if (timeRange === '24h') {
      startDate.setHours(startDate.getHours() - 24);
    } else if (timeRange === '30d') {
      startDate.setDate(startDate.getDate() - 30);
    } else {
      startDate.setDate(startDate.getDate() - 7);
    }
    const where = {
      createAt: {
        gte: startDate
      }
    };
    if (projectId && projectId !== 'all') {
      where.projectId = projectId;
    }
    if (provider && provider !== 'all') {
      where.provider = provider;
    }
    if (status && status !== 'all') {
      where.status = status;
    }
    if (searchTerm) {
      where.OR = [{ model: { contains: searchTerm } }, { errorMessage: { contains: searchTerm } }];
    }
    const total = await db.llmUsageLogs.count({ where });
    const logs = await db.llmUsageLogs.findMany({
      where,
      select: {
        id: true,
        projectId: true,
        provider: true,
        model: true,
        inputTokens: true,
        outputTokens: true,
        totalTokens: true,
        latency: true,
        status: true,
        errorMessage: true,
        createAt: true
      },
      orderBy: {
        createAt: 'desc'
      },
      skip: (page - 1) * pageSize,
      take: pageSize
    });
    const projectIds = [...new Set(logs.map(log => log.projectId))];
    const projects = await db.projects.findMany({
      where: { id: { in: projectIds } },
      select: { id: true, name: true }
    });
    const projectMap = projects.reduce((acc, p) => {
      acc[p.id] = p.name;
      return acc;
    }, {});
    const details = logs.map(log => ({
      id: log.id,
      projectId: log.projectId,
      projectName: projectMap[log.projectId] || 'Unknown Project',
      provider: log.provider,
      model: log.model,
      status: log.status,
      failureReason: log.errorMessage,
      inputTokens: log.inputTokens,
      outputTokens: log.outputTokens,
      totalTokens: log.totalTokens,
      calls: 1, // Single record
      avgLatency: log.status === 'SUCCESS' ? (log.latency / 1000).toFixed(2) + 's' : '-',
      createAt: log.createAt
    }));
    return NextResponse.json({
      details,
      total,
      page,
      pageSize,
      totalPages: Math.ceil(total / pageSize)
    });
  } catch (error) {
    console.error('Failed to fetch monitoring logs:', error);
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/monitoring/stats/route.js
+++ b/easy-dataset-main/app/api/monitoring/stats/route.js
@@ -0,0 +1,188 @@
 import { NextResponse } from 'next/server';
 import { db } from '@/lib/db';
 export const dynamic = 'force-dynamic';
 export async function GET(request) {
  try {
    const { searchParams } = new URL(request.url);
    const timeRange = searchParams.get('timeRange') || '7d'; // 24h, 7d, 30d
    const projectId = searchParams.get('projectId');
    const provider = searchParams.get('provider');
    const status = searchParams.get('status');
    let startDate = new Date();
    if (timeRange === '24h') {
      startDate.setHours(startDate.getHours() - 24);
    } else if (timeRange === '30d') {
      startDate.setDate(startDate.getDate() - 30);
    } else {
      startDate.setDate(startDate.getDate() - 7);
    }
    const where = {
      createAt: {
        gte: startDate
      }
    };
    if (projectId && projectId !== 'all') {
      where.projectId = projectId;
    }
    if (provider && provider !== 'all') {
      where.provider = provider;
    }
    if (status && status !== 'all') {
      where.status = status;
    }
    // 1. Fetch data for aggregation
    // Note: Prisma aggregation can be slow on very large datasets. If needed, optimize with pre-aggregated tables.
    const logs = await db.llmUsageLogs.findMany({
      where,
      select: {
        id: true,
        projectId: true,
        provider: true,
        model: true,
        inputTokens: true,
        outputTokens: true,
        totalTokens: true,
        latency: true,
        status: true,
        errorMessage: true,
        createAt: true,
        dateString: true
      },
      orderBy: {
        createAt: 'desc'
      }
    });
    // Build project name map
    const projects = await db.projects.findMany({
      select: { id: true, name: true }
    });
    const projectMap = projects.reduce((acc, p) => {
      acc[p.id] = p.name;
      return acc;
    }, {});
    // 2. Process and aggregate
    const summary = {
      totalTokens: 0,
      inputTokens: 0,
      outputTokens: 0,
      totalCalls: logs.length,
      successCalls: 0,
      failedCalls: 0,
      totalLatency: 0,
      avgLatency: 0
    };
    const trendMap = {};
    const modelStats = {};
    const detailedStatsMap = {}; // Key: projectId-model-status-errorMessage
    logs.forEach(log => {
      // Summary
      summary.totalTokens += log.totalTokens;
      summary.inputTokens += log.inputTokens;
      summary.outputTokens += log.outputTokens;
      if (log.status === 'SUCCESS') {
        summary.successCalls++;
        summary.totalLatency += log.latency;
      } else {
        summary.failedCalls++;
      }
      // Trend (by day or hour)
      let timeKey;
      if (timeRange === '24h') {
        const date = new Date(log.createAt);
        timeKey = `${String(date.getHours()).padStart(2, '0')}:00`;
      } else {
        timeKey = log.dateString.slice(5); // MM-DD
      }
      if (!trendMap[timeKey]) {
        trendMap[timeKey] = { name: timeKey, input: 0, output: 0 };
      }
      trendMap[timeKey].input += log.inputTokens;
      trendMap[timeKey].output += log.outputTokens;
      // Model Distribution
      const modelKey = log.model;
      if (!modelStats[modelKey]) {
        modelStats[modelKey] = { name: modelKey, value: 0 };
      }
      modelStats[modelKey].value += log.totalTokens;
      // Detailed Table Aggregation
      // Key: projectId + model + status + (errorMessage || '')
      const errorKey = log.errorMessage || '';
      const detailKey = `${log.projectId}|${log.model}|${log.status}|${errorKey}`;
      if (!detailedStatsMap[detailKey]) {
        detailedStatsMap[detailKey] = {
          projectId: log.projectId,
          projectName: projectMap[log.projectId] || 'Unknown Project',
          provider: log.provider,
          model: log.model,
          status: log.status,
          failureReason: log.errorMessage,
          inputTokens: 0,
          outputTokens: 0,
          totalTokens: 0,
          calls: 0,
          totalLatency: 0
        };
      }
      const detailItem = detailedStatsMap[detailKey];
      detailItem.inputTokens += log.inputTokens;
      detailItem.outputTokens += log.outputTokens;
      detailItem.totalTokens += log.totalTokens;
      detailItem.calls += 1;
      if (log.status === 'SUCCESS') {
        detailItem.totalLatency += log.latency;
      }
    });
    // Calculate averages
    if (summary.successCalls > 0) {
      summary.avgLatency = Math.round(summary.totalLatency / summary.successCalls);
    }
    summary.avgTokensPerCall = summary.totalCalls > 0 ? Math.round(summary.totalTokens / summary.totalCalls) : 0;
    summary.failureRate = summary.totalCalls > 0 ? summary.failedCalls / summary.totalCalls : 0;
    // Format chart data
    const trend = Object.values(trendMap).sort((a, b) => {
      // Simple sorting; for production use, consider stricter time ordering.
      return a.name.localeCompare(b.name);
    });
    const modelDistribution = Object.values(modelStats).sort((a, b) => b.value - a.value);
    // Format detailed table data
    const details = Object.values(detailedStatsMap)
      .map(item => ({
        ...item,
        avgLatency:
          item.status === 'SUCCESS' && item.calls > 0 ? (item.totalLatency / item.calls / 1000).toFixed(2) + 's' : '-'
      }))
      .sort((a, b) => b.totalTokens - a.totalTokens); // Default sorting by token usage
    return NextResponse.json({
      summary,
      trend,
      modelDistribution,
      details,
      projects
    });
  } catch (error) {
    console.error('Failed to fetch monitoring stats:', error);
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/monitoring/summary/route.js
+++ b/easy-dataset-main/app/api/monitoring/summary/route.js
@@ -0,0 +1,132 @@
 import { NextResponse } from 'next/server';
 import { db } from '@/lib/db';
 export const dynamic = 'force-dynamic';
 export async function GET(request) {
  try {
    const { searchParams } = new URL(request.url);
    const timeRange = searchParams.get('timeRange') || '7d';
    const projectId = searchParams.get('projectId');
    const provider = searchParams.get('provider');
    const status = searchParams.get('status');
    let startDate = new Date();
    if (timeRange === '24h') {
      startDate.setHours(startDate.getHours() - 24);
    } else if (timeRange === '30d') {
      startDate.setDate(startDate.getDate() - 30);
    } else {
      startDate.setDate(startDate.getDate() - 7);
    }
    const where = {
      createAt: {
        gte: startDate
      }
    };
    if (projectId && projectId !== 'all') {
      where.projectId = projectId;
    }
    if (provider && provider !== 'all') {
      where.provider = provider;
    }
    if (status && status !== 'all') {
      where.status = status;
    }
    const logs = await db.llmUsageLogs.findMany({
      where,
      select: {
        inputTokens: true,
        outputTokens: true,
        totalTokens: true,
        latency: true,
        status: true,
        createAt: true,
        dateString: true,
        model: true
      }
    });
    const summary = {
      totalTokens: 0,
      inputTokens: 0,
      outputTokens: 0,
      totalCalls: logs.length,
      successCalls: 0,
      failedCalls: 0,
      totalLatency: 0,
      avgLatency: 0
    };
    const trendMap = {};
    const modelStats = {};
    logs.forEach(log => {
      summary.totalTokens += log.totalTokens;
      summary.inputTokens += log.inputTokens;
      summary.outputTokens += log.outputTokens;
      if (log.status === 'SUCCESS') {
        summary.successCalls++;
        summary.totalLatency += log.latency;
      } else {
        summary.failedCalls++;
      }
      let timeKey;
      if (timeRange === '24h') {
        const date = new Date(log.createAt);
        timeKey = `${String(date.getHours()).padStart(2, '0')}:00`;
      } else {
        timeKey = log.dateString.slice(5);
      }
      if (!trendMap[timeKey]) {
        trendMap[timeKey] = { name: timeKey, input: 0, output: 0 };
      }
      trendMap[timeKey].input += log.inputTokens;
      trendMap[timeKey].output += log.outputTokens;
      const modelKey = log.model;
      if (!modelStats[modelKey]) {
        modelStats[modelKey] = { name: modelKey, value: 0 };
      }
      modelStats[modelKey].value += log.totalTokens;
    });
    if (summary.successCalls > 0) {
      summary.avgLatency = Math.round(summary.totalLatency / summary.successCalls);
    }
    summary.avgTokensPerCall = summary.totalCalls > 0 ? Math.round(summary.totalTokens / summary.totalCalls) : 0;
    summary.failureRate = summary.totalCalls > 0 ? summary.failedCalls / summary.totalCalls : 0;
    const trend = Object.values(trendMap).sort((a, b) => a.name.localeCompare(b.name));
    const modelDistribution = Object.values(modelStats).sort((a, b) => b.value - a.value);
    const projects = await db.projects.findMany({
      select: { id: true, name: true },
      orderBy: { createAt: 'desc' }
    });
    const allLogs = await db.llmUsageLogs.findMany({
      select: { provider: true },
      distinct: ['provider']
    });
    const providers = allLogs.map(log => log.provider).filter(Boolean);
    return NextResponse.json({
      summary,
      trend,
      modelDistribution,
      projects,
      providers
    });
  } catch (error) {
    console.error('Failed to fetch monitoring summary:', error);
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/batch-add-manual-ga/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/batch-add-manual-ga/route.js
@@ -0,0 +1,176 @@
 import { NextResponse } from 'next/server';
 import { getUploadFileInfoById } from '@/lib/db/upload-files';
 import { createGaPairs, getGaPairsByFileId } from '@/lib/db/ga-pairs';
 /**
 * 批量手动添加 GA 对到多个文件
 */
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    const body = await request.json();
    if (!projectId) {
      return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
    }
    const { fileIds, gaPair, appendMode = false } = body;
    if (!fileIds || !Array.isArray(fileIds) || fileIds.length === 0) {
      return NextResponse.json({ error: 'File IDs array is required' }, { status: 400 });
    }
    if (!gaPair || !gaPair.genreTitle || !gaPair.audienceTitle) {
      return NextResponse.json({ error: 'GA pair with genreTitle and audienceTitle is required' }, { status: 400 });
    }
    console.log('开始处理批量手动添加GA对请求');
    console.log('项目ID:', projectId);
    console.log('请求的文件IDs:', fileIds);
    console.log('GA对:', gaPair);
    // 使用 getUploadFileInfoById 逐个验证文件
    const validFiles = [];
    const invalidFileIds = [];
    for (const fileId of fileIds) {
      try {
        console.log(`正在验证文件: ${fileId}`);
        const fileInfo = await getUploadFileInfoById(fileId);
        if (fileInfo && fileInfo.projectId === projectId) {
          console.log(`文件验证成功: ${fileInfo.fileName}`);
          validFiles.push(fileInfo);
        } else if (fileInfo) {
          console.log(`文件属于其他项目: ${fileInfo.projectId} != ${projectId}`);
          invalidFileIds.push(fileId);
        } else {
          console.log(`文件不存在: ${fileId}`);
          invalidFileIds.push(fileId);
        }
      } catch (error) {
        console.error(`验证文件 ${fileId} 时出错:`, String(error));
        invalidFileIds.push(fileId);
      }
    }
    console.log(`文件验证完成: 有效${validFiles.length}个, 无效${invalidFileIds.length}个`);
    if (validFiles.length === 0) {
      return NextResponse.json(
        {
          error: 'No valid files found',
          debug: {
            projectId,
            requestedIds: fileIds,
            invalidIds: invalidFileIds,
            message: 'None of the requested files belong to this project or exist in the database'
          }
        },
        { status: 404 }
      );
    }
    // 批量手动添加 GA 对
    console.log('开始批量手动添加GA对...');
    console.log('追加模式:', appendMode);
    const results = [];
    for (const file of validFiles) {
      try {
        console.log(`处理文件: ${file.fileName}`);
        // 检查是否已存在 GA 对
        const existingPairs = await getGaPairsByFileId(file.id);
        let pairNumber = 1;
        if (appendMode && existingPairs && existingPairs.length > 0) {
          // 追加模式：在现有 GA 对后面添加
          pairNumber = existingPairs.length + 1;
        } else if (!appendMode && existingPairs && existingPairs.length > 0) {
          // 非追加模式：如果已存在 GA 对则跳过
          console.log(`文件 ${file.fileName} 已存在GA对，跳过`);
          results.push({
            fileId: file.id,
            fileName: file.fileName,
            success: true,
            skipped: true,
            message: 'GA pairs already exist'
          });
          continue;
        }
        // 创建 GA 对数据
        const gaPairData = [
          {
            projectId,
            fileId: file.id,
            pairNumber,
            genreTitle: gaPair.genreTitle.trim(),
            genreDesc: gaPair.genreDesc?.trim() || '',
            audienceTitle: gaPair.audienceTitle.trim(),
            audienceDesc: gaPair.audienceDesc?.trim() || '',
            isActive: true
          }
        ];
        // 保存 GA 对
        if (appendMode) {
          // 追加模式：只创建新的 GA 对
          await createGaPairs(gaPairData);
        } else {
          // 非追加模式：使用 saveGaPairs 替换现有的
          const { saveGaPairs } = await import('@/lib/db/ga-pairs');
          await saveGaPairs(projectId, file.id, [
            {
              genre: { title: gaPair.genreTitle.trim(), description: gaPair.genreDesc?.trim() || '' },
              audience: { title: gaPair.audienceTitle.trim(), description: gaPair.audienceDesc?.trim() || '' }
            }
          ]);
        }
        results.push({
          fileId: file.id,
          fileName: file.fileName,
          success: true,
          skipped: false,
          message: 'GA pair added successfully'
        });
        console.log(`成功为文件 ${file.fileName} 添加GA对`);
      } catch (error) {
        console.error(`为文件 ${file.fileName} 添加GA对失败:`, error);
        results.push({
          fileId: file.id,
          fileName: file.fileName,
          success: false,
          skipped: false,
          error: error.message,
          message: `Failed: ${error.message}`
        });
      }
    }
    // 统计结果
    const successCount = results.filter(r => r.success).length;
    const failureCount = results.filter(r => !r.success).length;
    console.log(`批量手动添加完成: 成功${successCount}个, 失败${failureCount}个`);
    return NextResponse.json({
      success: true,
      data: results,
      summary: {
        total: results.length,
        success: successCount,
        failure: failureCount,
        processed: validFiles.length,
        skipped: invalidFileIds.length
      },
      message: `Added GA pairs to ${successCount} files, ${failureCount} failed, ${invalidFileIds.length} files not found`
    });
  } catch (error) {
    console.error('Error batch adding manual GA pairs:', String(error));
    return NextResponse.json({ error: String(error) || 'Failed to batch add manual GA pairs' }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/batch-delete-files/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/batch-delete-files/route.js
@@ -0,0 +1,196 @@
 import { NextResponse } from 'next/server';
 import { getUploadFileInfoById, delUploadFileInfoById } from '@/lib/db/upload-files';
 import { getProject } from '@/lib/db/projects';
 import { getProjectChunks, getProjectTocByName } from '@/lib/file/text-splitter';
 import { batchSaveTags } from '@/lib/db/tags';
 import { handleDomainTree } from '@/lib/util/domain-tree';
 import path from 'path';
 import { getProjectRoot } from '@/lib/db/base';
 import { promises as fs } from 'fs';
 /**
 * 批量删除文件
 * 复用单个文件删除的完整逻辑，包括领域树修订
 */
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    const body = await request.json();
    if (!projectId) {
      return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
    }
    const { fileIds, domainTreeAction = 'keep', model, language = '中文' } = body;
    if (!fileIds || !Array.isArray(fileIds) || fileIds.length === 0) {
      return NextResponse.json({ error: 'File IDs array is required' }, { status: 400 });
    }
    console.log('开始处理批量删除文件请求');
    console.log('项目ID:', projectId);
    console.log('请求的文件IDs:', fileIds);
    console.log('领域树操作:', domainTreeAction);
    // 获取项目信息
    const project = await getProject(projectId);
    if (!project) {
      return NextResponse.json({ error: 'The project does not exist' }, { status: 404 });
    }
    // 验证文件并删除
    const results = [];
    const deletedTocs = [];
    let deletedCount = 0;
    let failedCount = 0;
    let totalStats = {
      deletedChunks: 0,
      deletedQuestions: 0,
      deletedDatasets: 0
    };
    for (const fileId of fileIds) {
      try {
        console.log(`正在验证文件: ${fileId}`);
        const fileInfo = await getUploadFileInfoById(fileId);
        if (!fileInfo) {
          console.log(`文件不存在: ${fileId}`);
          results.push({
            fileId,
            success: false,
            error: 'File not found'
          });
          failedCount++;
          continue;
        }
        if (fileInfo.projectId !== projectId) {
          console.log(`文件属于其他项目: ${fileInfo.projectId} != ${projectId}`);
          results.push({
            fileId,
            success: false,
            error: 'File belongs to another project'
          });
          failedCount++;
          continue;
        }
        // 删除文件及其相关的文本块、问题和数据集
        console.log(`删除文件: ${fileInfo.fileName}`);
        const { stats, fileName } = await delUploadFileInfoById(fileId);
        // 累计统计信息
        totalStats.deletedChunks += stats.deletedChunks || 0;
        totalStats.deletedQuestions += stats.deletedQuestions || 0;
        totalStats.deletedDatasets += stats.deletedDatasets || 0;
        // 获取并保存删除的 TOC 信息
        const deleteToc = await getProjectTocByName(projectId, fileName);
        if (deleteToc) {
          deletedTocs.push(deleteToc);
        }
        // 删除 TOC 文件
        try {
          const projectRoot = await getProjectRoot();
          const projectPath = path.join(projectRoot, projectId);
          const tocDir = path.join(projectPath, 'toc');
          const baseName = path.basename(fileInfo.fileName, path.extname(fileInfo.fileName));
          const tocPath = path.join(tocDir, `${baseName}-toc.json`);
          await fs.unlink(tocPath);
          console.log(`成功删除 TOC 文件: ${tocPath}`);
        } catch (error) {
          console.error(`删除 TOC 文件失败:`, String(error));
        }
        results.push({
          fileId,
          fileName: fileInfo.fileName,
          success: true,
          stats
        });
        deletedCount++;
        console.log(`成功删除文件: ${fileInfo.fileName}`);
      } catch (error) {
        console.error(`删除文件 ${fileId} 时出错:`, error);
        results.push({
          fileId,
          success: false,
          error: error.message
        });
        failedCount++;
      }
    }
    console.log(`批量删除完成: 成功${deletedCount}个, 失败${failedCount}个`);
    // 如果选择了保持领域树不变，直接返回删除结果
    if (domainTreeAction === 'keep') {
      return NextResponse.json({
        success: true,
        deletedCount,
        failedCount,
        total: fileIds.length,
        results,
        stats: totalStats,
        domainTreeAction: 'keep',
        message: `Successfully deleted ${deletedCount} files, ${failedCount} failed`
      });
    }
    // 处理领域树更新
    try {
      // 获取项目的所有文件
      const { chunks, toc } = await getProjectChunks(projectId);
      // 如果不存在文本块，说明项目已经没有文件了
      if (!chunks || chunks.length === 0) {
        // 清空领域树
        await batchSaveTags(projectId, []);
        return NextResponse.json({
          success: true,
          deletedCount,
          failedCount,
          total: fileIds.length,
          results,
          stats: totalStats,
          domainTreeAction,
          message: `Successfully deleted ${deletedCount} files, domain tree cleared`,
          domainTreeCleared: true
        });
      }
      // 调用领域树处理模块
      await handleDomainTree({
        projectId,
        action: domainTreeAction,
        allToc: toc,
        model: model,
        language,
        deleteToc: deletedTocs.length > 0 ? deletedTocs : undefined,
        project
      });
      console.log('领域树更新成功');
    } catch (error) {
      console.error('Error updating domain tree after batch deletion:', String(error));
      // 即使领域树更新失败，也不影响文件删除的结果
    }
    return NextResponse.json({
      success: true,
      deletedCount,
      failedCount,
      total: fileIds.length,
      results,
      stats: totalStats,
      domainTreeAction,
      message: `Successfully deleted ${deletedCount} files, ${failedCount} failed`
    });
  } catch (error) {
    console.error('Error batch deleting files:', String(error));
    return NextResponse.json({ error: String(error) || 'Failed to batch delete files' }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/batch-generateGA/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/batch-generateGA/route.js
@@ -0,0 +1,106 @@
 import { NextResponse } from 'next/server';
 import { batchGenerateGaPairs } from '@/lib/services/ga/ga-pairs';
 import { getUploadFileInfoById } from '@/lib/db/upload-files'; // 导入单个文件查询函数
 /**
 * 批量生成多个文件的 GA 对
 */
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    const body = await request.json();
    if (!projectId) {
      return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
    }
    const { fileIds, modelConfigId, language = '中文', appendMode = false } = body;
    if (!fileIds || !Array.isArray(fileIds) || fileIds.length === 0) {
      return NextResponse.json({ error: 'File IDs array is required' }, { status: 400 });
    }
    if (!modelConfigId) {
      return NextResponse.json({ error: 'Model configuration ID is required' }, { status: 400 });
    }
    console.log('开始处理批量生成GA对请求');
    console.log('项目ID:', projectId);
    console.log('请求的文件IDs:', fileIds);
    // 使用 getUploadFileInfoById 逐个验证文件
    const validFiles = [];
    const invalidFileIds = [];
    for (const fileId of fileIds) {
      try {
        console.log(`正在验证文件: ${fileId}`);
        const fileInfo = await getUploadFileInfoById(fileId);
        if (fileInfo && fileInfo.projectId === projectId) {
          console.log(`文件验证成功: ${fileInfo.fileName}`);
          validFiles.push(fileInfo);
        } else if (fileInfo) {
          console.log(`文件属于其他项目: ${fileInfo.projectId} != ${projectId}`);
          invalidFileIds.push(fileId);
        } else {
          console.log(`文件不存在: ${fileId}`);
          invalidFileIds.push(fileId);
        }
      } catch (error) {
        console.error(`验证文件 ${fileId} 时出错:`, String(error));
        invalidFileIds.push(fileId);
      }
    }
    console.log(`文件验证完成: 有效${validFiles.length}个, 无效${invalidFileIds.length}个`);
    if (validFiles.length === 0) {
      return NextResponse.json(
        {
          error: 'No valid files found',
          debug: {
            projectId,
            requestedIds: fileIds,
            invalidIds: invalidFileIds,
            message: 'None of the requested files belong to this project or exist in the database'
          }
        },
        { status: 404 }
      );
    }
    // 批量生成 GA 对
    console.log('开始批量生成GA对...');
    console.log('追加模式:', appendMode);
    const results = await batchGenerateGaPairs(
      projectId,
      validFiles,
      modelConfigId,
      language,
      appendMode // 传递追加模式参数
    );
    // 统计结果
    const successCount = results.filter(r => r.success).length;
    const failureCount = results.filter(r => !r.success).length;
    console.log(`批量生成完成: 成功${successCount}个, 失败${failureCount}个`);
    return NextResponse.json({
      success: true,
      data: results,
      summary: {
        total: results.length,
        success: successCount,
        failure: failureCount,
        processed: validFiles.length,
        skipped: invalidFileIds.length
      },
      message: `Generated GA pairs for ${successCount} files, ${failureCount} failed, ${invalidFileIds.length} files not found`
    });
  } catch (error) {
    console.error('Error batch generating GA pairs:', String(error));
    return NextResponse.json({ error: String(error) || 'Failed to batch generate GA pairs' }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/blind-test-tasks/[taskId]/current/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/blind-test-tasks/[taskId]/current/route.js
@@ -0,0 +1,161 @@
 import { NextResponse } from 'next/server';
 import { db } from '@/lib/db/index';
 import LLMClient from '@/lib/llm/core/index';
 import { getModelConfigById } from '@/lib/db/model-config';
 /**
 * Get current question and generate answers from two models
 */
 export async function GET(request, { params }) {
  try {
    const { projectId, taskId } = params;
    const task = await db.task.findFirst({
      where: {
        id: taskId,
        projectId,
        taskType: 'blind-test'
      }
    });
    if (!task) {
      return NextResponse.json({ code: 404, error: 'Task not found' }, { status: 404 });
    }
    if (task.status !== 0) {
      return NextResponse.json({ code: 400, error: 'Task has ended' }, { status: 400 });
    }
    // Parse task detail
    let detail = {};
    let modelInfo = {};
    try {
      detail = task.detail ? JSON.parse(task.detail) : {};
      modelInfo = task.modelInfo ? JSON.parse(task.modelInfo) : {};
    } catch (e) {
      console.error('Failed to parse task detail:', e);
    }
    const questionIds = detail.questionIds || detail.evalDatasetIds || [];
    const currentIndex = detail.currentIndex || 0;
    // Check if all questions are completed
    if (questionIds.length === 0 || currentIndex >= questionIds.length) {
      return NextResponse.json({
        code: 0,
        data: {
          completed: true,
          message: 'All questions completed'
        }
      });
    }
    // Fetch current question
    const currentQuestionId = questionIds[currentIndex];
    const currentQuestion = await db.evalDatasets.findUnique({
      where: { id: currentQuestionId },
      select: {
        id: true,
        question: true,
        questionType: true,
        correctAnswer: true,
        tags: true
      }
    });
    if (!currentQuestion) {
      return NextResponse.json({ code: 404, error: 'Question not found' }, { status: 404 });
    }
    // Fetch both model configs
    const [modelConfigA, modelConfigB] = await Promise.all([
      getModelConfigById(modelInfo.modelA.providerId),
      getModelConfigById(modelInfo.modelB.providerId)
    ]);
    if (!modelConfigA || !modelConfigB) {
      return NextResponse.json({ code: 400, error: 'Model configuration not found' }, { status: 400 });
    }
    // Build prompts
    const systemPrompt = "You are a helpful assistant. Provide detailed and accurate answers to the user's question.";
    const userPrompt = currentQuestion.question;
    // Call both models in parallel
    const startTimeA = Date.now();
    const startTimeB = Date.now();
    let answerA = '';
    let answerB = '';
    let errorA = null;
    let errorB = null;
    let durationA = 0;
    let durationB = 0;
    try {
      // Call model A
      const clientA = new LLMClient(modelConfigA);
      const resultA = await clientA.chat([
        { role: 'system', content: systemPrompt },
        { role: 'user', content: userPrompt }
      ]);
      answerA = resultA.text || '';
      durationA = Date.now() - startTimeA;
    } catch (err) {
      console.error('Model A call failed:', err);
      errorA = err.message;
      durationA = Date.now() - startTimeA;
    }
    try {
      // Call model B
      const clientB = new LLMClient(modelConfigB);
      const resultB = await clientB.chat([
        { role: 'system', content: systemPrompt },
        { role: 'user', content: userPrompt }
      ]);
      answerB = resultB.text || '';
      durationB = Date.now() - startTimeB;
    } catch (err) {
      console.error('Model B call failed:', err);
      errorB = err.message;
      durationB = Date.now() - startTimeB;
    }
    // Randomly swap positions (core blind-test behavior)
    const isSwapped = Math.random() > 0.5;
    return NextResponse.json({
      code: 0,
      data: {
        completed: false,
        currentIndex,
        totalCount: evalDatasetIds.length,
        question: currentQuestion,
        // Blind test: do not reveal which model is which
        leftAnswer: {
          content: isSwapped ? answerB : answerA,
          error: isSwapped ? errorB : errorA,
          duration: isSwapped ? durationB : durationA
        },
        rightAnswer: {
          content: isSwapped ? answerA : answerB,
          error: isSwapped ? errorA : errorB,
          duration: isSwapped ? durationA : durationB
        },
        // Server stores the actual mapping for scoring
        _swap: isSwapped
      }
    });
  } catch (error) {
    console.error('Failed to fetch current question:', error);
    return NextResponse.json(
      { code: 500, error: 'Failed to fetch current question', message: error.message },
      { status: 500 }
    );
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/blind-test-tasks/[taskId]/question/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/blind-test-tasks/[taskId]/question/route.js
@@ -0,0 +1,64 @@
 import { NextResponse } from 'next/server';
 import { db } from '@/lib/db/index';
 /**
 * Get current question info (including random swap info)
 */
 export async function GET(request, { params }) {
  const { projectId, taskId } = params;
  try {
    if (!projectId || !taskId) {
      return NextResponse.json({ error: 'Missing required parameters' }, { status: 400 });
    }
    // Fetch task
    const task = await db.task.findUnique({
      where: { id: taskId }
    });
    if (!task || task.taskType !== 'blind-test') {
      return NextResponse.json({ error: 'Task not found' }, { status: 404 });
    }
    // Parse task detail
    const detail = JSON.parse(task.detail || '{}');
    // Support both evalDatasetIds and questionIds
    const questionIds = detail.questionIds || detail.evalDatasetIds || [];
    const currentIndex = detail.currentIndex || 0;
    // Check if task is completed
    if (questionIds.length === 0 || currentIndex >= questionIds.length) {
      return NextResponse.json({
        completed: true,
        currentIndex,
        totalQuestions: questionIds.length
      });
    }
    // Fetch current question
    const currentQuestionId = questionIds[currentIndex];
    const currentQuestion = await db.evalDatasets.findUnique({
      where: { id: currentQuestionId }
    });
    if (!currentQuestion) {
      return NextResponse.json({ error: 'Question not found' }, { status: 404 });
    }
    // Randomly decide whether to swap (core blind-test behavior)
    const isSwapped = Math.random() > 0.5;
    return NextResponse.json({
      questionId: currentQuestion.id,
      question: currentQuestion.question,
      answer: currentQuestion.correctAnswer || '',
      questionIndex: currentIndex + 1,
      totalQuestions: questionIds.length,
      isSwapped
    });
  } catch (error) {
    console.error('Failed to fetch question info:', error);
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/blind-test-tasks/[taskId]/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/blind-test-tasks/[taskId]/route.js
@@ -0,0 +1,190 @@
 import { NextResponse } from 'next/server';
 import { db } from '@/lib/db/index';
 /**
 * Get blind-test task details
 * Results are fetched from EvalResults table
 */
 export async function GET(request, { params }) {
  try {
    const { projectId, taskId } = params;
    const task = await db.task.findFirst({
      where: {
        id: taskId,
        projectId,
        taskType: 'blind-test'
      }
    });
    if (!task) {
      return NextResponse.json({ code: 404, error: 'Task not found' }, { status: 404 });
    }
    let detail = {};
    let modelInfo = {};
    try {
      detail = task.detail ? JSON.parse(task.detail) : {};
      modelInfo = task.modelInfo ? JSON.parse(task.modelInfo) : {};
    } catch (e) {
      console.error('Failed to parse task detail:', e);
    }
    // Fetch all related evaluation questions
    const evalDatasetIds = detail.evalDatasetIds || [];
    const evalDatasets = await db.evalDatasets.findMany({
      where: {
        id: { in: evalDatasetIds }
      },
      select: {
        id: true,
        question: true,
        questionType: true,
        correctAnswer: true,
        tags: true
      }
    });
    // Sort by evalDatasetIds order
    const orderedDatasets = evalDatasetIds.map(id => evalDatasets.find(d => d.id === id)).filter(Boolean);
    // Fetch results from EvalResults table
    const evalResults = await db.evalResults.findMany({
      where: { taskId },
      orderBy: { createAt: 'asc' }
    });
    // Parse results into the format expected by frontend
    const results = evalResults.map(r => {
      let modelAnswer = {};
      let judgeData = {};
      try {
        modelAnswer = JSON.parse(r.modelAnswer || '{}');
        judgeData = JSON.parse(r.judgeResponse || '{}');
      } catch (e) {
        // Ignore parse errors
      }
      return {
        questionId: r.evalDatasetId,
        vote: judgeData.vote,
        isSwapped: judgeData.isSwapped,
        modelAScore: judgeData.modelAScore || 0,
        modelBScore: judgeData.modelBScore || 0,
        leftAnswer: modelAnswer.leftAnswer || '',
        rightAnswer: modelAnswer.rightAnswer || '',
        timestamp: r.createAt
      };
    });
    return NextResponse.json({
      code: 0,
      data: {
        ...task,
        detail: {
          ...detail,
          results // Include results from EvalResults table
        },
        modelInfo,
        evalDatasets: orderedDatasets
      }
    });
  } catch (error) {
    console.error('Failed to fetch blind-test task details:', error);
    return NextResponse.json(
      { code: 500, error: 'Failed to fetch blind-test task details', message: error.message },
      { status: 500 }
    );
  }
 }
 /**
 * Update blind-test task (interrupt/stop)
 */
 export async function PUT(request, { params }) {
  try {
    const { projectId, taskId } = params;
    const { action } = await request.json();
    const task = await db.task.findFirst({
      where: {
        id: taskId,
        projectId,
        taskType: 'blind-test'
      }
    });
    if (!task) {
      return NextResponse.json({ code: 404, error: 'Task not found' }, { status: 404 });
    }
    if (action === 'interrupt') {
      if (task.status !== 0) {
        return NextResponse.json({ code: 400, error: 'Only running tasks can be interrupted' }, { status: 400 });
      }
      const updatedTask = await db.task.update({
        where: { id: taskId },
        data: {
          status: 3, // Interrupted
          endTime: new Date()
        }
      });
      return NextResponse.json({
        code: 0,
        data: updatedTask,
        message: 'Task interrupted'
      });
    }
    return NextResponse.json({ code: 400, error: 'Unknown action' }, { status: 400 });
  } catch (error) {
    console.error('Failed to update blind-test task:', error);
    return NextResponse.json(
      { code: 500, error: 'Failed to update blind-test task', message: error.message },
      { status: 500 }
    );
  }
 }
 /**
 * Delete blind-test task and its results
 */
 export async function DELETE(request, { params }) {
  try {
    const { projectId, taskId } = params;
    const task = await db.task.findFirst({
      where: {
        id: taskId,
        projectId,
        taskType: 'blind-test'
      }
    });
    if (!task) {
      return NextResponse.json({ code: 404, error: 'Task not found' }, { status: 404 });
    }
    // Delete related EvalResults first
    await db.evalResults.deleteMany({
      where: { taskId }
    });
    // Then delete the task
    await db.task.delete({
      where: { id: taskId }
    });
    return NextResponse.json({
      code: 0,
      message: 'Task deleted'
    });
  } catch (error) {
    console.error('Failed to delete blind-test task:', error);
    return NextResponse.json(
      { code: 500, error: 'Failed to delete blind-test task', message: error.message },
      { status: 500 }
    );
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/blind-test-tasks/[taskId]/stream-model/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/blind-test-tasks/[taskId]/stream-model/route.js
@@ -0,0 +1,92 @@
 import { NextResponse } from 'next/server';
 import { db } from '@/lib/db/index';
 import LLMClient from '@/lib/llm/core/index';
 import { getModelConfigById } from '@/lib/db/model-config';
 /**
 * Stream answer for a specified model
 * Query param: model=A or model=B
 */
 export async function GET(request, { params }) {
  const { projectId, taskId } = params;
  const { searchParams } = new URL(request.url);
  const modelType = searchParams.get('model'); // 'A' or 'B'
  try {
    if (!projectId || !taskId) {
      return NextResponse.json({ error: 'Missing required parameters' }, { status: 400 });
    }
    if (!modelType || !['A', 'B'].includes(modelType)) {
      return NextResponse.json({ error: 'Model type must be specified (A or B)' }, { status: 400 });
    }
    // Fetch task
    const task = await db.task.findUnique({
      where: { id: taskId }
    });
    if (!task || task.taskType !== 'blind-test') {
      return NextResponse.json({ error: 'Task not found' }, { status: 404 });
    }
    // Parse task detail
    const detail = JSON.parse(task.detail || '{}');
    const modelInfo = JSON.parse(task.modelInfo || '{}');
    // Support both evalDatasetIds and questionIds
    const questionIds = detail.questionIds || detail.evalDatasetIds || [];
    const currentIndex = detail.currentIndex || 0;
    // Check if task is completed
    if (questionIds.length === 0 || currentIndex >= questionIds.length) {
      return NextResponse.json({ completed: true });
    }
    // Fetch current question
    const currentQuestionId = questionIds[currentIndex];
    const currentQuestion = await db.evalDatasets.findUnique({
      where: { id: currentQuestionId }
    });
    if (!currentQuestion) {
      return NextResponse.json({ error: 'Question not found' }, { status: 404 });
    }
    // Resolve model config based on modelType
    const modelConfigKey = modelType === 'A' ? 'modelA' : 'modelB';
    const modelConfig = await getModelConfigById(modelInfo[modelConfigKey].id);
    if (!modelConfig) {
      return NextResponse.json({ error: 'Model configuration not found' }, { status: 400 });
    }
    // Prepare messages
    const messages = [
      {
        role: 'system',
        content: "You are a helpful assistant. Provide detailed and accurate answers to the user's question."
      },
      { role: 'user', content: currentQuestion.question }
    ];
    // Create LLM client
    const client = new LLMClient({
      projectId,
      ...modelConfig
    });
    // Call streaming API and return response directly
    const response = await client.chatStreamAPI(messages);
    return new Response(response.body, {
      headers: {
        'Content-Type': 'text/plain; charset=utf-8',
        'Cache-Control': 'no-cache',
        Connection: 'keep-alive'
      }
    });
  } catch (error) {
    console.error(`Model ${modelType} streaming call failed:`, error);
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/blind-test-tasks/[taskId]/stream/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/blind-test-tasks/[taskId]/stream/route.js
@@ -0,0 +1,213 @@
 import { NextResponse } from 'next/server';
 import { db } from '@/lib/db/index';
 import LLMClient from '@/lib/llm/core/index';
 import { getModelConfigById } from '@/lib/db/model-config';
 /**
 * Stream answers from two models for the current question
 */
 export async function GET(request, { params }) {
  const { projectId, taskId } = params;
  try {
    if (!projectId || !taskId) {
      return NextResponse.json({ error: 'Missing required parameters' }, { status: 400 });
    }
    // Fetch task
    const task = await db.task.findUnique({
      where: { id: taskId }
    });
    if (!task || task.taskType !== 'blind-test') {
      return NextResponse.json({ error: 'Task not found' }, { status: 404 });
    }
    // Parse task detail
    const detail = JSON.parse(task.detail || '{}');
    const modelInfo = JSON.parse(task.modelInfo || '{}');
    const { questionIds = [], currentIndex = 0 } = detail;
    // Check if task is completed
    if (currentIndex >= questionIds.length) {
      return NextResponse.json({ completed: true });
    }
    // Fetch current question
    const currentQuestionId = questionIds[currentIndex];
    const currentQuestion = await db.evalDatasets.findUnique({
      where: { id: currentQuestionId }
    });
    if (!currentQuestion) {
      return NextResponse.json({ error: 'Question not found' }, { status: 404 });
    }
    // Fetch model configs
    const [modelConfigA, modelConfigB] = await Promise.all([
      getModelConfigById(modelInfo.modelA.providerId),
      getModelConfigById(modelInfo.modelB.providerId)
    ]);
    if (!modelConfigA || !modelConfigB) {
      return NextResponse.json({ error: 'Model configuration not found' }, { status: 400 });
    }
    // Randomly swap positions (core blind-test behavior)
    const isSwapped = Math.random() > 0.5;
    // Create streaming response
    const encoder = new TextEncoder();
    const stream = new ReadableStream({
      async start(controller) {
        try {
          // Send init message
          controller.enqueue(
            encoder.encode(
              JSON.stringify({
                type: 'init',
                question: currentQuestion.question,
                questionId: currentQuestion.id,
                questionIndex: currentIndex + 1,
                totalQuestions: questionIds.length,
                isSwapped
              }) + '\n'
            )
          );
          // Prepare messages
          const messages = [
            {
              role: 'system',
              content: "You are a helpful assistant. Provide detailed and accurate answers to the user's question."
            },
            { role: 'user', content: currentQuestion.question }
          ];
          // Create LLM clients
          const clientA = new LLMClient({
            projectId,
            ...modelConfigA
          });
          const clientB = new LLMClient({
            projectId,
            ...modelConfigB
          });
          let answerA = '';
          let answerB = '';
          const startTime = Date.now();
          // Call both models in parallel (streaming)
          await Promise.all([
            (async () => {
              try {
                const response = await clientA.chatStreamAPI(messages);
                const reader = response.body.getReader();
                const decoder = new TextDecoder();
                while (true) {
                  const { done, value } = await reader.read();
                  if (done) break;
                  const chunk = decoder.decode(value, { stream: true });
                  answerA += chunk;
                  // Send chunk update
                  controller.enqueue(
                    encoder.encode(
                      JSON.stringify({
                        type: 'chunk',
                        model: isSwapped ? 'B' : 'A',
                        content: chunk
                      }) + '\n'
                    )
                  );
                }
              } catch (err) {
                console.error('Model A call failed:', err);
                controller.enqueue(
                  encoder.encode(
                    JSON.stringify({
                      type: 'error',
                      model: isSwapped ? 'B' : 'A',
                      error: err.message
                    }) + '\n'
                  )
                );
              }
            })(),
            (async () => {
              try {
                const response = await clientB.chatStreamAPI(messages);
                const reader = response.body.getReader();
                const decoder = new TextDecoder();
                while (true) {
                  const { done, value } = await reader.read();
                  if (done) break;
                  const chunk = decoder.decode(value, { stream: true });
                  answerB += chunk;
                  // Send chunk update
                  controller.enqueue(
                    encoder.encode(
                      JSON.stringify({
                        type: 'chunk',
                        model: isSwapped ? 'A' : 'B',
                        content: chunk
                      }) + '\n'
                    )
                  );
                }
              } catch (err) {
                console.error('Model B call failed:', err);
                controller.enqueue(
                  encoder.encode(
                    JSON.stringify({
                      type: 'error',
                      model: isSwapped ? 'A' : 'B',
                      error: err.message
                    }) + '\n'
                  )
                );
              }
            })()
          ]);
          const duration = Date.now() - startTime;
          // Send done message
          controller.enqueue(
            encoder.encode(
              JSON.stringify({
                type: 'done',
                duration,
                answerA: isSwapped ? answerB : answerA,
                answerB: isSwapped ? answerA : answerB
              }) + '\n'
            )
          );
          controller.close();
        } catch (error) {
          console.error('Streaming handler failed:', error);
          controller.error(error);
        }
      }
    });
    return new Response(stream, {
      headers: {
        'Content-Type': 'text/plain; charset=utf-8',
        'Cache-Control': 'no-cache',
        Connection: 'keep-alive'
      }
    });
  } catch (error) {
    console.error('API error:', error);
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/blind-test-tasks/[taskId]/vote/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/blind-test-tasks/[taskId]/vote/route.js
@@ -0,0 +1,154 @@
 import { NextResponse } from 'next/server';
 import { db } from '@/lib/db/index';
 /**
 * Submit vote result
 * vote: 'left' | 'right' | 'both_good' | 'both_bad'
 * Results are stored in EvalResults table
 */
 export async function POST(request, { params }) {
  try {
    const { projectId, taskId } = params;
    const { vote, questionId, isSwapped, leftAnswer, rightAnswer } = await request.json();
    // Validate vote option
    const validVotes = ['left', 'right', 'both_good', 'both_bad'];
    if (!validVotes.includes(vote)) {
      return NextResponse.json({ code: 400, error: 'Invalid vote option' }, { status: 400 });
    }
    if (!questionId) {
      return NextResponse.json({ code: 400, error: 'Question ID is required' }, { status: 400 });
    }
    const task = await db.task.findFirst({
      where: {
        id: taskId,
        projectId,
        taskType: 'blind-test'
      }
    });
    if (!task) {
      return NextResponse.json({ code: 404, error: 'Task not found' }, { status: 404 });
    }
    if (task.status !== 0) {
      return NextResponse.json({ code: 400, error: 'Task has ended' }, { status: 400 });
    }
    // Parse task details
    let detail = {};
    try {
      detail = task.detail ? JSON.parse(task.detail) : {};
    } catch (e) {
      console.error('Failed to parse task detail:', e);
    }
    // Calculate scores
    // isSwapped: true means left is model B and right is model A
    // isSwapped: false means left is model A and right is model B
    let modelAScore = 0;
    let modelBScore = 0;
    if (vote === 'left') {
      if (isSwapped) {
        modelBScore = 1; // Left is B
      } else {
        modelAScore = 1; // Left is A
      }
    } else if (vote === 'right') {
      if (isSwapped) {
        modelAScore = 1; // Right is A
      } else {
        modelBScore = 1; // Right is B
      }
    } else if (vote === 'both_good') {
      modelAScore = 0.5;
      modelBScore = 0.5;
    }
    // both_bad: both scores remain 0
    // Store result in EvalResults table
    const evalResult = await db.evalResults.create({
      data: {
        projectId,
        taskId,
        evalDatasetId: questionId,
        modelAnswer: JSON.stringify({
          leftAnswer: leftAnswer || '',
          rightAnswer: rightAnswer || ''
        }),
        score: modelAScore, // Store modelA score for sorting/aggregation
        isCorrect: false, // Not applicable for blind-test
        judgeResponse: JSON.stringify({
          vote,
          isSwapped,
          modelAScore,
          modelBScore
        }),
        duration: 0,
        status: 0
      }
    });
    // Update task progress
    const evalDatasetIds = detail.evalDatasetIds || [];
    const newCurrentIndex = (detail.currentIndex || 0) + 1;
    const isCompleted = newCurrentIndex >= evalDatasetIds.length;
    const updatedDetail = {
      ...detail,
      currentIndex: newCurrentIndex
    };
    await db.task.update({
      where: { id: taskId },
      data: {
        detail: JSON.stringify(updatedDetail),
        completedCount: newCurrentIndex,
        status: isCompleted ? 1 : 0, // 1-completed, 0-running
        endTime: isCompleted ? new Date() : null
      }
    });
    // Calculate current total scores from EvalResults
    const allResults = await db.evalResults.findMany({
      where: { taskId },
      select: { judgeResponse: true }
    });
    let totalModelAScore = 0;
    let totalModelBScore = 0;
    for (const r of allResults) {
      try {
        const judge = JSON.parse(r.judgeResponse || '{}');
        totalModelAScore += judge.modelAScore || 0;
        totalModelBScore += judge.modelBScore || 0;
      } catch (e) {
        // Ignore parse errors
      }
    }
    return NextResponse.json({
      code: 0,
      data: {
        success: true,
        isCompleted,
        currentIndex: newCurrentIndex,
        totalCount: evalDatasetIds.length,
        scores: {
          modelA: totalModelAScore,
          modelB: totalModelBScore
        }
      },
      message: isCompleted ? 'Blind-test task completed' : 'Vote recorded'
    });
  } catch (error) {
    console.error('Failed to submit vote result:', error);
    return NextResponse.json(
      { code: 500, error: 'Failed to submit vote result', message: error.message },
      { status: 500 }
    );
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/blind-test-tasks/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/blind-test-tasks/route.js
@@ -0,0 +1,226 @@
 import { NextResponse } from 'next/server';
 import { db } from '@/lib/db/index';
 /**
 * Get all blind-test tasks for a project
 */
 export async function GET(request, { params }) {
  try {
    const { projectId } = params;
    const { searchParams } = new URL(request.url);
    const page = parseInt(searchParams.get('page') || '1');
    const pageSize = parseInt(searchParams.get('pageSize') || '20');
    if (!projectId) {
      return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
    }
    const skip = (page - 1) * pageSize;
    // Fetch task list and total count
    const [tasks, total] = await Promise.all([
      db.task.findMany({
        where: {
          projectId,
          taskType: 'blind-test'
        },
        orderBy: { createAt: 'desc' },
        skip,
        take: pageSize
      }),
      db.task.count({
        where: {
          projectId,
          taskType: 'blind-test'
        }
      })
    ]);
    // Fetch evaluation results for all tasks to calculate scores
    const taskIds = tasks.map(t => t.id);
    const allEvalResults = await db.evalResults.findMany({
      where: { taskId: { in: taskIds } },
      select: {
        taskId: true,
        judgeResponse: true
      }
    });
    // Group results by taskId and calculate scores
    const taskScores = {};
    for (const result of allEvalResults) {
      if (!taskScores[result.taskId]) {
        taskScores[result.taskId] = { modelAScore: 0, modelBScore: 0 };
      }
      try {
        const judge = JSON.parse(result.judgeResponse || '{}');
        taskScores[result.taskId].modelAScore += judge.modelAScore || 0;
        taskScores[result.taskId].modelBScore += judge.modelBScore || 0;
      } catch (e) {
        // Ignore parse errors
      }
    }
    // Parse task detail fields and attach scores
    const tasksWithDetails = tasks.map(task => {
      let detail = {};
      let modelInfo = {};
      try {
        detail = task.detail ? JSON.parse(task.detail) : {};
        modelInfo = task.modelInfo ? JSON.parse(task.modelInfo) : {};
      } catch (e) {
        console.error('Failed to parse task detail:', e);
      }
      // Attach calculated scores as results array
      const scores = taskScores[task.id] || { modelAScore: 0, modelBScore: 0 };
      const results = [
        {
          modelAScore: scores.modelAScore,
          modelBScore: scores.modelBScore
        }
      ];
      return {
        ...task,
        detail: {
          ...detail,
          results // Attach results for display in task card
        },
        modelInfo
      };
    });
    return NextResponse.json({
      code: 0,
      data: {
        items: tasksWithDetails,
        total,
        page,
        pageSize,
        totalPages: Math.ceil(total / pageSize)
      }
    });
  } catch (error) {
    console.error('Failed to fetch blind-test task list:', error);
    return NextResponse.json(
      { code: 500, error: 'Failed to fetch blind-test task list', message: error.message },
      { status: 500 }
    );
  }
 }
 /**
 * Create a blind-test task
 */
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    const data = await request.json();
    const { modelA, modelB, evalDatasetIds, language = 'zh-CN' } = data;
    if (!modelA || !modelA.modelId || !modelA.providerId) {
      return NextResponse.json({ code: 400, error: 'Please select model A' }, { status: 400 });
    }
    if (!modelB || !modelB.modelId || !modelB.providerId) {
      return NextResponse.json({ code: 400, error: 'Please select model B' }, { status: 400 });
    }
    if (modelA.modelId === modelB.modelId && modelA.providerId === modelB.providerId) {
      return NextResponse.json({ code: 400, error: 'The two models must be different' }, { status: 400 });
    }
    if (!evalDatasetIds || evalDatasetIds.length === 0) {
      return NextResponse.json({ code: 400, error: 'Please select questions to evaluate' }, { status: 400 });
    }
    const evalDatasets = await db.evalDatasets.findMany({
      where: {
        id: { in: evalDatasetIds },
        projectId
      },
      select: { id: true, questionType: true }
    });
    const invalidQuestions = evalDatasets.filter(
      q => q.questionType !== 'short_answer' && q.questionType !== 'open_ended'
    );
    if (invalidQuestions.length > 0) {
      return NextResponse.json(
        {
          code: 400,
          error: 'Blind-test tasks only support short-answer and open-ended questions'
        },
        { status: 400 }
      );
    }
    // Fetch model config info
    const [modelConfigA, modelConfigB] = await Promise.all([
      db.modelConfig.findFirst({
        where: { projectId, providerId: modelA.providerId, modelId: modelA.modelId }
      }),
      db.modelConfig.findFirst({
        where: { projectId, providerId: modelB.providerId, modelId: modelB.modelId }
      })
    ]);
    // Build model info (two models)
    const modelInfo = {
      modelA: {
        id: modelConfigA?.id,
        modelId: modelA.modelId,
        modelName: modelConfigA?.modelName || modelA.modelId,
        providerId: modelA.providerId,
        providerName: modelConfigA?.providerName || modelA.providerId
      },
      modelB: {
        id: modelConfigB?.id,
        modelId: modelB.modelId,
        modelName: modelConfigB?.modelName || modelB.modelId,
        providerId: modelB.providerId,
        providerName: modelConfigB?.providerName || modelB.providerId
      }
    };
    // Build task detail (only store evalDatasetIds and currentIndex)
    const taskDetail = {
      evalDatasetIds,
      currentIndex: 0 // Current question index
    };
    // Create task
    const newTask = await db.task.create({
      data: {
        projectId,
        taskType: 'blind-test',
        status: 0, // Running
        modelInfo: JSON.stringify(modelInfo),
        language,
        detail: JSON.stringify(taskDetail),
        totalCount: evalDatasetIds.length,
        completedCount: 0,
        note: ''
      }
    });
    return NextResponse.json({
      code: 0,
      data: {
        ...newTask,
        detail: taskDetail,
        modelInfo
      },
      message: 'Blind-test task created'
    });
  } catch (error) {
    console.error('Failed to create blind-test task:', error);
    return NextResponse.json(
      { code: 500, error: 'Failed to create blind-test task', message: error.message },
      { status: 500 }
    );
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/chunks/[chunkId]/clean/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/chunks/[chunkId]/clean/route.js
@@ -0,0 +1,40 @@
 import { NextResponse } from 'next/server';
 import logger from '@/lib/util/logger';
 import cleanService from '@/lib/services/clean';
 // 为指定文本块进行数据清洗
 export async function POST(request, { params }) {
  try {
    const { projectId, chunkId } = params;
    // 验证项目ID和文本块ID
    if (!projectId || !chunkId) {
      return NextResponse.json({ error: 'Project ID or text block ID cannot be empty' }, { status: 400 });
    }
    // 获取请求体
    const { model, language = '中文' } = await request.json();
    if (!model) {
      return NextResponse.json({ error: 'Model cannot be empty' }, { status: 400 });
    }
    // 使用数据清洗服务
    const result = await cleanService.cleanDataForChunk(projectId, chunkId, {
      model,
      language
    });
    // 返回清洗结果
    return NextResponse.json({
      chunkId,
      originalLength: result.originalLength,
      cleanedLength: result.cleanedLength,
      success: result.success,
      message: '数据清洗完成'
    });
  } catch (error) {
    logger.error('Error cleaning data:', error);
    return NextResponse.json({ error: error.message || 'Error cleaning data' }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/chunks/[chunkId]/eval-questions/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/chunks/[chunkId]/eval-questions/route.js
@@ -0,0 +1,35 @@
 import { NextResponse } from 'next/server';
 import { generateEvalQuestionsForChunk } from '@/lib/services/eval';
 import logger from '@/lib/util/logger';
 /**
 * 为指定文本块生成测评题目
 */
 export async function POST(request, { params }) {
  try {
    const { projectId, chunkId } = params;
    // 验证参数
    if (!projectId || !chunkId) {
      return NextResponse.json({ error: 'Project ID and Chunk ID are required' }, { status: 400 });
    }
    // 获取请求体
    const { model, language = 'zh-CN' } = await request.json();
    if (!model) {
      return NextResponse.json({ error: 'Model configuration is required' }, { status: 400 });
    }
    // 调用服务层生成测评题目
    const result = await generateEvalQuestionsForChunk(projectId, chunkId, {
      model,
      language
    });
    return NextResponse.json(result);
  } catch (error) {
    logger.error('Error generating eval questions:', error);
    return NextResponse.json({ error: error.message || 'Failed to generate eval questions' }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/chunks/[chunkId]/questions/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/chunks/[chunkId]/questions/route.js
@@ -0,0 +1,73 @@
 import { NextResponse } from 'next/server';
 import { getQuestionsForChunk } from '@/lib/db/questions';
 import logger from '@/lib/util/logger';
 import questionService from '@/lib/services/questions';
 // 为指定文本块生成问题
 export async function POST(request, { params }) {
  try {
    const { projectId, chunkId } = params;
    // 验证项目ID和文本块ID
    if (!projectId || !chunkId) {
      return NextResponse.json({ error: 'Project ID or text block ID cannot be empty' }, { status: 400 });
    } // 获取请求体
    const { model, language = '中文', number, enableGaExpansion = false } = await request.json();
    if (!model) {
      return NextResponse.json({ error: 'Model cannot be empty' }, { status: 400 });
    }
    // 后续会根据是否有GA对来选择是否启用GA扩展选择服务函数
    const serviceFunc = questionService.generateQuestionsForChunkWithGA;
    // 使用问题生成服务
    const result = await serviceFunc(projectId, chunkId, {
      model,
      language,
      number,
      enableGaExpansion
    });
    // 统一返回格式，确保包含GA扩展信息
    const response = {
      chunkId,
      questions: result.questions || result.labelQuestions || [],
      total: result.total || (result.questions || result.labelQuestions || []).length,
      gaExpansionUsed: result.gaExpansionUsed || false,
      gaPairsCount: result.gaPairsCount || 0,
      expectedTotal: result.expectedTotal || result.total
    };
    // 返回生成的问题
    return NextResponse.json(response);
  } catch (error) {
    logger.error('Error generating questions:', error);
    return NextResponse.json({ error: error.message || 'Error generating questions' }, { status: 500 });
  }
 }
 // 获取指定文本块的问题
 export async function GET(request, { params }) {
  try {
    const { projectId, chunkId } = params;
    // 验证项目ID和文本块ID
    if (!projectId || !chunkId) {
      return NextResponse.json({ error: 'The item ID or text block ID cannot be empty' }, { status: 400 });
    }
    // 获取文本块的问题
    const questions = await getQuestionsForChunk(projectId, chunkId);
    // 返回问题列表
    return NextResponse.json({
      chunkId,
      questions,
      total: questions.length
    });
  } catch (error) {
    console.error('Error getting questions:', String(error));
    return NextResponse.json({ error: error.message || 'Error getting questions' }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/chunks/[chunkId]/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/chunks/[chunkId]/route.js
@@ -0,0 +1,73 @@
 import { NextResponse } from 'next/server';
 import { deleteChunkById, getChunkById, updateChunkById } from '@/lib/db/chunks';
 // 获取文本块内容
 export async function GET(request, { params }) {
  try {
    const { projectId, chunkId } = params;
    // 验证参数
    if (!projectId) {
      return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
    }
    if (!chunkId) {
      return NextResponse.json({ error: 'Text block ID cannot be empty' }, { status: 400 });
    }
    // 获取文本块内容
    const chunk = await getChunkById(chunkId);
    return NextResponse.json(chunk);
  } catch (error) {
    console.error('Failed to get text block content:', String(error));
    return NextResponse.json({ error: error.message || 'Failed to get text block content' }, { status: 500 });
  }
 }
 // 删除文本块
 export async function DELETE(request, { params }) {
  try {
    const { projectId, chunkId } = params;
    // 验证参数
    if (!projectId) {
      return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
    }
    if (!chunkId) {
      return NextResponse.json({ error: 'Text block ID cannot be empty' }, { status: 400 });
    }
    await deleteChunkById(chunkId);
    return NextResponse.json({ message: 'Text block deleted successfully' });
  } catch (error) {
    console.error('Failed to delete text block:', String(error));
    return NextResponse.json({ error: error.message || 'Failed to delete text block' }, { status: 500 });
  }
 }
 // 编辑文本块内容
 export async function PATCH(request, { params }) {
  try {
    const { projectId, chunkId } = params;
    // 验证参数
    if (!projectId) {
      return NextResponse.json({ error: '项目ID不能为空' }, { status: 400 });
    }
    if (!chunkId) {
      return NextResponse.json({ error: '文本块ID不能为空' }, { status: 400 });
    }
    // 解析请求体获取新内容
    const requestData = await request.json();
    const { content } = requestData;
    if (!content) {
      return NextResponse.json({ error: '内容不能为空' }, { status: 400 });
    }
    let res = await updateChunkById(chunkId, { content });
    return NextResponse.json(res);
  } catch (error) {
    console.error('编辑文本块失败:', String(error));
    return NextResponse.json({ error: error.message || '编辑文本块失败' }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/chunks/batch-content/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/chunks/batch-content/route.js
@@ -0,0 +1,20 @@
 import { getChunkContentsByNames } from '@/lib/db/chunks';
 import { NextResponse } from 'next/server';
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    const { chunkNames } = await request.json();
    if (!chunkNames || !Array.isArray(chunkNames)) {
      return NextResponse.json({ error: 'chunkNames 参数必须是数组' }, { status: 400 });
    }
    const chunkContentMap = await getChunkContentsByNames(projectId, chunkNames);
    return NextResponse.json(chunkContentMap);
  } catch (error) {
    console.error('批量获取文本块内容失败:', error);
    return NextResponse.json({ error: '批量获取文本块内容失败' }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/chunks/batch-edit/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/chunks/batch-edit/route.js
@@ -0,0 +1,102 @@
 import { NextRequest, NextResponse } from 'next/server';
 import { PrismaClient } from '@prisma/client';
 const prisma = new PrismaClient();
 /**
 * 批量编辑文本块内容
 * POST /api/projects/[projectId]/chunks/batch-edit
 */
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    const body = await request.json();
    const { position, content, chunkIds } = body;
    // 验证参数
    if (!position || !content || !chunkIds || !Array.isArray(chunkIds) || chunkIds.length === 0) {
      return NextResponse.json({ error: 'Missing required parameters: position, content, chunkIds' }, { status: 400 });
    }
    if (!['start', 'end'].includes(position)) {
      return NextResponse.json({ error: 'Position must be "start" or "end"' }, { status: 400 });
    }
    // 验证项目权限（获取要编辑的文本块）
    const chunksToUpdate = await prisma.chunks.findMany({
      where: {
        id: { in: chunkIds },
        projectId: projectId
      },
      select: {
        id: true,
        content: true,
        name: true
      }
    });
    if (chunksToUpdate.length === 0) {
      return NextResponse.json({ error: 'Not found' }, { status: 404 });
    }
    if (chunksToUpdate.length !== chunkIds.length) {
      return NextResponse.json({ error: 'Some chunks not found' }, { status: 400 });
    }
    // 准备更新数据
    const updates = chunksToUpdate.map(chunk => {
      let newContent;
      if (position === 'start') {
        // 在开头添加内容
        newContent = content + '\n\n' + chunk.content;
      } else {
        // 在结尾添加内容
        newContent = chunk.content + '\n\n' + content;
      }
      return {
        where: { id: chunk.id },
        data: {
          content: newContent,
          size: newContent.length,
          updateAt: new Date()
        }
      };
    });
    async function processBatches(items, batchSize, processFn) {
      const results = [];
      for (let i = 0; i < items.length; i += batchSize) {
        const batch = items.slice(i, i + batchSize);
        const batchResults = await Promise.all(batch.map(processFn));
        results.push(...batchResults);
      }
      return results;
    }
    const BATCH_SIZE = 50; // 每批处理 50 个
    await processBatches(updates, BATCH_SIZE, update => prisma.chunks.update(update));
    // 记录操作日志（可选）
    console.log(`Successfully updated ${chunksToUpdate.length} chunks`);
    return NextResponse.json({
      success: true,
      updatedCount: chunksToUpdate.length,
      message: `Successfully updated ${chunksToUpdate.length} chunks`
    });
  } catch (error) {
    console.error('批量编辑文本块失败:', error);
    return NextResponse.json(
      {
        error: 'Batch edit chunks failed',
        details: error.message
      },
      { status: 500 }
    );
  } finally {
    await prisma.$disconnect();
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/chunks/name/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/chunks/name/route.js
@@ -0,0 +1,35 @@
 import { NextResponse } from 'next/server';
 import { getChunkByName } from '@/lib/db/chunks';
 /**
 * 根据文本块名称获取文本块
 * @param {Request} request 请求对象
 * @param {object} context 上下文，包含路径参数
 * @returns {Promise<NextResponse>} 响应对象
 */
 export async function GET(request, { params }) {
  try {
    const { projectId } = params;
    // 从查询参数中获取 chunkName
    const { searchParams } = new URL(request.url);
    const chunkName = searchParams.get('chunkName');
    if (!chunkName) {
      return NextResponse.json({ error: '文本块名称不能为空' }, { status: 400 });
    }
    // 根据名称和项目ID查询文本块
    const chunk = await getChunkByName(projectId, chunkName);
    if (!chunk) {
      return NextResponse.json({ error: '未找到指定的文本块' }, { status: 404 });
    }
    // 返回文本块信息
    return NextResponse.json(chunk);
  } catch (error) {
    console.error('根据名称获取文本块失败:', String(error));
    return NextResponse.json({ error: '获取文本块失败: ' + error.message }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/chunks/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/chunks/route.js
@@ -0,0 +1,21 @@
 import { NextResponse } from 'next/server';
 import { deleteChunkById, getChunkByFileIds, getChunkById, getChunksByFileIds, updateChunkById } from '@/lib/db/chunks';
 // 获取文本块内容
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    // 验证参数
    if (!projectId) {
      return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
    }
    const { array } = await request.json();
    // 获取文本块内容
    const chunk = await getChunksByFileIds(array);
    return NextResponse.json(chunk);
  } catch (error) {
    console.error('Failed to get text block content:', String(error));
    return NextResponse.json({ error: String(error) || 'Failed to get text block content' }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/config/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/config/route.js
@@ -0,0 +1,36 @@
 import { NextResponse } from 'next/server';
 import { getProject, updateProject, getTaskConfig } from '@/lib/db/projects';
 // 获取项目配置
 export async function GET(request, { params }) {
  try {
    const projectId = params.projectId;
    const config = await getProject(projectId);
    const taskConfig = await getTaskConfig(projectId);
    return NextResponse.json({ ...config, ...taskConfig });
  } catch (error) {
    console.error('获取项目配置失败:', String(error));
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
 }
 // 更新项目配置
 export async function PUT(request, { params }) {
  try {
    const projectId = params.projectId;
    const newConfig = await request.json();
    const currentConfig = await getProject(projectId);
    // 只更新 prompts 部分
    const updatedConfig = {
      ...currentConfig,
      ...newConfig.prompts
    };
    const config = await updateProject(projectId, updatedConfig);
    return NextResponse.json(config);
  } catch (error) {
    console.error('更新项目配置失败:', String(error));
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/custom-prompts/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/custom-prompts/route.js
@@ -0,0 +1,105 @@
 import { NextResponse } from 'next/server';
 import {
  getCustomPrompts,
  getCustomPrompt,
  saveCustomPrompt,
  deleteCustomPrompt,
  batchSaveCustomPrompts,
  toggleCustomPrompt,
  getPromptTemplates
 } from '@/lib/db/custom-prompts';
 // 获取项目的自定义提示词
 export async function GET(request, { params }) {
  try {
    const { projectId } = params;
    const { searchParams } = new URL(request.url);
    const promptType = searchParams.get('promptType');
    const language = searchParams.get('language');
    if (!projectId) {
      return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
    }
    const customPrompts = await getCustomPrompts(projectId, promptType, language);
    const templates = await getPromptTemplates();
    return NextResponse.json({
      success: true,
      customPrompts,
      templates
    });
  } catch (error) {
    console.error('获取自定义提示词失败:', error);
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
 }
 // 保存自定义提示词
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    const body = await request.json();
    if (!projectId) {
      return NextResponse.json({ error: 'Project ID is required' }, { status: 400 });
    }
    // 批量保存
    if (body.prompts && Array.isArray(body.prompts)) {
      const results = await batchSaveCustomPrompts(projectId, body.prompts);
      return NextResponse.json({
        success: true,
        results
      });
    }
    // 单个保存
    const { promptType, promptKey, language, content } = body;
    if (!promptType || !promptKey || !language || content === undefined) {
      return NextResponse.json(
        {
          error: 'promptType, promptKey, language and content are required'
        },
        { status: 400 }
      );
    }
    const result = await saveCustomPrompt(projectId, promptType, promptKey, language, content);
    return NextResponse.json({
      success: true,
      result
    });
  } catch (error) {
    console.error('保存自定义提示词失败:', error);
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
 }
 // 删除自定义提示词
 export async function DELETE(request, { params }) {
  try {
    const { projectId } = params;
    const { searchParams } = new URL(request.url);
    const promptType = searchParams.get('promptType');
    const promptKey = searchParams.get('promptKey');
    const language = searchParams.get('language');
    if (!projectId || !promptType || !promptKey || !language) {
      return NextResponse.json(
        {
          error: 'projectId, promptType, promptKey and language are required'
        },
        { status: 400 }
      );
    }
    const success = await deleteCustomPrompt(projectId, promptType, promptKey, language);
    return NextResponse.json({
      success
    });
  } catch (error) {
    console.error('删除自定义提示词失败:', error);
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/custom-split/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/custom-split/route.js
@@ -0,0 +1,116 @@
 import { NextResponse } from 'next/server';
 import { saveChunks, deleteChunksByFileId } from '@/lib/db/chunks';
 import path from 'path';
 import fs from 'fs/promises';
 import { getProjectRoot } from '@/lib/db/base';
 /**
 * 处理自定义分块请求
 * @param {Request} request - 请求对象
 * @param {Object} params - 路由参数
 * @returns {Promise<Response>} - 响应对象
 */
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    const { fileId, fileName, content, splitPoints } = await request.json();
    // 参数验证
    if (!projectId || !fileId || !fileName || !content || !splitPoints) {
      return NextResponse.json({ error: 'Missing required parameters' }, { status: 400 });
    }
    // 获取项目根目录
    const projectRoot = await getProjectRoot();
    const projectPath = path.join(projectRoot, projectId);
    // 检查项目是否存在
    try {
      await fs.access(projectPath);
    } catch (error) {
      return NextResponse.json({ error: 'Project does not exist' }, { status: 404 });
    }
    // 先删除该文件已有的文本块
    await deleteChunksByFileId(projectId, fileId);
    // 根据分块点将文件内容分割成多个块
    const customChunks = generateCustomChunks(projectId, fileId, fileName, content, splitPoints);
    // 保存新的文本块
    await saveChunks(customChunks);
    return NextResponse.json({
      success: true,
      message: 'Custom chunks saved successfully',
      totalChunks: customChunks.length
    });
  } catch (error) {
    console.error('自定义分块处理出错:', String(error));
    return NextResponse.json({ error: error.message || 'Failed to process custom split request' }, { status: 500 });
  }
 }
 /**
 * 根据分块点生成自定义文本块
 * @param {string} projectId - 项目ID
 * @param {string} fileId - 文件ID
 * @param {string} fileName - 文件名
 * @param {string} content - 文件内容
 * @param {Array} splitPoints - 分块点数组
 * @returns {Array} - 生成的文本块数组
 */
 function generateCustomChunks(projectId, fileId, fileName, content, splitPoints) {
  // 按位置排序分块点
  const sortedPoints = [...splitPoints].sort((a, b) => a.position - b.position);
  // 创建分块
  const chunks = [];
  let startPos = 0;
  // 处理每个分块点
  for (let i = 0; i < sortedPoints.length; i++) {
    const endPos = sortedPoints[i].position;
    // 提取当前分块内容
    const chunkContent = content.substring(startPos, endPos);
    // 跳过空白分块
    if (chunkContent.trim().length === 0) {
      startPos = endPos;
      continue;
    }
    // 创建分块对象
    const chunk = {
      projectId,
      name: `${path.basename(fileName, path.extname(fileName))}-part-${i + 1}`,
      fileId,
      fileName,
      content: chunkContent,
      summary: `${fileName} 自定义分块 ${i + 1}/${sortedPoints.length + 1}`,
      size: chunkContent.length
    };
    chunks.push(chunk);
    startPos = endPos;
  }
  // 添加最后一个分块（如果有内容）
  const lastChunkContent = content.substring(startPos);
  if (lastChunkContent.trim().length > 0) {
    const lastChunk = {
      projectId,
      name: `${path.basename(fileName, path.extname(fileName))}-part-${sortedPoints.length + 1}`,
      fileId,
      fileName,
      content: lastChunkContent,
      summary: `${fileName} 自定义分块 ${sortedPoints.length + 1}/${sortedPoints.length + 1}`,
      size: lastChunkContent.length
    };
    chunks.push(lastChunk);
  }
  return chunks;
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/dataset-conversations/[conversationId]/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/dataset-conversations/[conversationId]/route.js
@@ -0,0 +1,183 @@
 /**
 * 单个多轮对话数据集操作API
 */
 import { NextResponse } from 'next/server';
 import {
  getDatasetConversationById,
  updateDatasetConversation,
  deleteDatasetConversation,
  getConversationNavigationItems
 } from '@/lib/db/dataset-conversations';
 /**
 * 获取单个多轮对话数据集详情
 */
 export async function GET(request, { params }) {
  try {
    const { projectId, conversationId } = params;
    const { searchParams } = new URL(request.url);
    const operateType = searchParams.get('operateType');
    // 如果是导航操作，返回导航项
    if (operateType !== null) {
      const data = await getConversationNavigationItems(projectId, conversationId, operateType);
      return NextResponse.json(data);
    }
    const conversation = await getDatasetConversationById(conversationId);
    if (!conversation) {
      return NextResponse.json(
        {
          success: false,
          message: '对话数据集不存在'
        },
        { status: 404 }
      );
    }
    if (conversation.projectId !== projectId) {
      return NextResponse.json(
        {
          success: false,
          message: '对话数据集不属于指定项目'
        },
        { status: 403 }
      );
    }
    return NextResponse.json(conversation);
  } catch (error) {
    console.error('获取多轮对话数据集详情失败:', error);
    return NextResponse.json(
      {
        success: false,
        message: error.message
      },
      { status: 500 }
    );
  }
 }
 /**
 * 更新多轮对话数据集
 */
 export async function PUT(request, { params }) {
  try {
    const { projectId, conversationId } = params;
    const body = await request.json();
    // 验证对话数据集是否存在且属于项目
    const conversation = await getDatasetConversationById(conversationId);
    if (!conversation) {
      return NextResponse.json(
        {
          success: false,
          message: '对话数据集不存在'
        },
        { status: 404 }
      );
    }
    if (conversation.projectId !== projectId) {
      return NextResponse.json(
        {
          success: false,
          message: '对话数据集不属于指定项目'
        },
        { status: 403 }
      );
    }
    // 只允许更新特定字段
    const allowedFields = ['score', 'tags', 'note', 'confirmed', 'aiEvaluation', 'messages'];
    const updateData = {};
    allowedFields.forEach(field => {
      if (body.hasOwnProperty(field)) {
        if (field === 'messages') {
          // 将messages数组转换为rawMessages字符串存储
          updateData['rawMessages'] = JSON.stringify(body[field]);
        } else {
          updateData[field] = body[field];
        }
      }
    });
    if (Object.keys(updateData).length === 0) {
      return NextResponse.json(
        {
          success: false,
          message: '没有有效的更新字段'
        },
        { status: 400 }
      );
    }
    const updatedConversation = await updateDatasetConversation(conversationId, updateData);
    return NextResponse.json({
      success: true,
      data: updatedConversation
    });
  } catch (error) {
    console.error('更新多轮对话数据集失败:', error);
    return NextResponse.json(
      {
        success: false,
        message: error.message
      },
      { status: 500 }
    );
  }
 }
 /**
 * 删除多轮对话数据集
 */
 export async function DELETE(request, { params }) {
  try {
    const { projectId, conversationId } = params;
    // 验证对话数据集是否存在且属于项目
    const conversation = await getDatasetConversationById(conversationId);
    if (!conversation) {
      return NextResponse.json(
        {
          success: false,
          message: '对话数据集不存在'
        },
        { status: 404 }
      );
    }
    if (conversation.projectId !== projectId) {
      return NextResponse.json(
        {
          success: false,
          message: '对话数据集不属于指定项目'
        },
        { status: 403 }
      );
    }
    await deleteDatasetConversation(conversationId);
    return NextResponse.json({
      success: true,
      message: '删除成功'
    });
  } catch (error) {
    console.error('删除多轮对话数据集失败:', error);
    return NextResponse.json(
      {
        success: false,
        message: error.message
      },
      { status: 500 }
    );
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/dataset-conversations/export/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/dataset-conversations/export/route.js
@@ -0,0 +1,68 @@
 /**
 * 多轮对话数据集导出API
 * 直接导出原始的 ShareGPT 格式数据集
 */
 import { NextResponse } from 'next/server';
 import { getAllDatasetConversations } from '@/lib/db/dataset-conversations';
 /**
 * 导出多轮对话数据集
 */
 export async function GET(request, { params }) {
  try {
    const { projectId } = params;
    const { searchParams } = new URL(request.url);
    // 筛选条件
    const filters = {
      confirmed: searchParams.get('confirmed')
    };
    // 清除空值
    Object.keys(filters).forEach(key => {
      if (!filters[key]) delete filters[key];
    });
    // 获取所有对话数据集
    const conversations = await getAllDatasetConversations(projectId, filters);
    if (conversations.length === 0) {
      return NextResponse.json([]);
    }
    // 转换为 ShareGPT 格式数组
    const shareGptData = [];
    for (const conversation of conversations) {
      try {
        // 解析 rawMessages
        const messages = JSON.parse(conversation.rawMessages || '[]');
        if (messages.length > 0) {
          // 构建 ShareGPT 格式对象
          const shareGptItem = {
            messages: messages
          };
          shareGptData.push(shareGptItem);
        }
      } catch (error) {
        console.error(`解析对话消息失败 ${conversation.id}:`, error);
        // 跳过解析失败的对话，继续处理其他对话
        continue;
      }
    }
    return NextResponse.json(shareGptData);
  } catch (error) {
    console.error('导出多轮对话数据集失败:', error);
    return NextResponse.json(
      {
        success: false,
        message: error.message
      },
      { status: 500 }
    );
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/dataset-conversations/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/dataset-conversations/route.js
@@ -0,0 +1,135 @@
 /**
 * 多轮对话数据集管理API
 */
 import { NextResponse } from 'next/server';
 import {
  getDatasetConversationsByPagination,
  getAllDatasetConversationIds,
  createDatasetConversation
 } from '@/lib/db/dataset-conversations';
 import { generateMultiTurnConversation } from '@/lib/services/multi-turn/index';
 /**
 * 获取多轮对话数据集列表（支持分页和筛选）
 */
 export async function GET(request, { params }) {
  try {
    const { projectId } = params;
    const { searchParams } = new URL(request.url);
    const getAllIds = searchParams.get('getAllIds') === 'true'; // 新增：获取所有对话ID的标志
    // 筛选条件
    const filters = {
      keyword: searchParams.get('keyword'),
      roleA: searchParams.get('roleA'),
      roleB: searchParams.get('roleB'),
      scenario: searchParams.get('scenario'),
      scoreMin: searchParams.get('scoreMin'),
      scoreMax: searchParams.get('scoreMax'),
      confirmed: searchParams.get('confirmed')
    };
    // 清除空值
    Object.keys(filters).forEach(key => {
      if (!filters[key]) delete filters[key];
    });
    // 如果请求获取所有ID
    if (getAllIds) {
      const allConversationIds = await getAllDatasetConversationIds(projectId, filters);
      return NextResponse.json({ allConversationIds });
    }
    // 正常分页查询
    const page = parseInt(searchParams.get('page') || '1');
    const pageSize = parseInt(searchParams.get('pageSize') || '20');
    const result = await getDatasetConversationsByPagination(projectId, page, pageSize, filters);
    return NextResponse.json({
      success: true,
      ...result
    });
  } catch (error) {
    console.error('获取多轮对话数据集失败:', error);
    return NextResponse.json(
      {
        success: false,
        message: error.message
      },
      { status: 500 }
    );
  }
 }
 /**
 * 创建多轮对话数据集
 */
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    const body = await request.json();
    const { questionId, systemPrompt, scenario, rounds, roleA, roleB, model, language = '中文' } = body;
    if (!questionId) {
      return NextResponse.json(
        {
          success: false,
          message: '问题ID不能为空'
        },
        { status: 400 }
      );
    }
    if (!model || !model.modelId) {
      return NextResponse.json(
        {
          success: false,
          message: '模型配置不能为空'
        },
        { status: 400 }
      );
    }
    // 构建配置
    const config = {
      systemPrompt: systemPrompt || '',
      scenario: scenario || '',
      rounds: rounds || 3,
      roleA: roleA || '用户',
      roleB: roleB || '助手',
      model,
      language
    };
    // 生成多轮对话
    const result = await generateMultiTurnConversation(projectId, questionId, config);
    if (!result.success) {
      return NextResponse.json(
        {
          success: false,
          message: result.error
        },
        { status: 500 }
      );
    }
    return NextResponse.json({
      success: true,
      data: result.data
    });
  } catch (error) {
    console.error('创建多轮对话数据集失败:', error);
    return NextResponse.json(
      {
        success: false,
        message: error.message
      },
      { status: 500 }
    );
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/dataset-conversations/tags/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/dataset-conversations/tags/route.js
@@ -0,0 +1,42 @@
 import { NextResponse } from 'next/server';
 import { getAllDatasetConversations } from '@/lib/db/dataset-conversations';
 /**
 * 获取项目中多轮对话数据集的所有标签
 */
 export async function GET(request, { params }) {
  try {
    const { projectId } = params;
    if (!projectId) {
      return NextResponse.json({ error: '项目ID不能为空' }, { status: 400 });
    }
    // 获取项目所有对话数据集
    const conversations = await getAllDatasetConversations(projectId);
    // 提取所有标签
    const allTags = new Set();
    conversations.forEach(conversation => {
      if (conversation.tags && typeof conversation.tags === 'string') {
        const tags = conversation.tags.split(/\s+/).filter(tag => tag.trim().length > 0);
        tags.forEach(tag => allTags.add(tag.trim()));
      }
    });
    return NextResponse.json({
      success: true,
      tags: Array.from(allTags).sort()
    });
  } catch (error) {
    console.error('获取对话标签失败:', error);
    return NextResponse.json(
      {
        success: false,
        message: error.message
      },
      { status: 500 }
    );
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/datasets/[datasetId]/copy-to-eval/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/datasets/[datasetId]/copy-to-eval/route.js
@@ -0,0 +1,77 @@
 import { NextResponse } from 'next/server';
 import { db } from '@/lib/db';
 export async function POST(req, { params }) {
  try {
    const { projectId, datasetId } = params;
    // 1. 获取数据集详情
    const dataset = await db.datasets.findUnique({
      where: { id: datasetId, projectId }
    });
    if (!dataset) {
      return NextResponse.json({ error: 'Dataset not found' }, { status: 404 });
    }
    // 2. 尝试通过 questionId 查找关联的 chunkId
    let chunkId = null;
    if (dataset.questionId) {
      const question = await db.questions.findUnique({
        where: { id: dataset.questionId }
      });
      if (question) {
        chunkId = question.chunkId;
      }
    }
    // 3. 创建评估数据集记录
    // 默认使用 open_ended 类型，因为通常数据集是问答对，适合作为评估
    let evalTags = [];
    try {
      evalTags = JSON.parse(dataset.tags || '[]');
      if (!Array.isArray(evalTags)) evalTags = [];
    } catch (e) {
      evalTags = [];
    }
    // 排除 'Eval' 标签，并将数组转为逗号分隔的字符串
    const evalTagsString = evalTags.filter(tag => tag !== 'Eval').join(',');
    const evalDataset = await db.evalDatasets.create({
      data: {
        projectId,
        question: dataset.question,
        questionType: 'open_ended',
        correctAnswer: dataset.answer,
        tags: evalTagsString,
        note: dataset.note,
        chunkId: chunkId,
        options: '' // 开放题不需要选项
      }
    });
    // 4. 更新原数据集，添加 'Eval' 标签
    let currentTags = [];
    try {
      currentTags = JSON.parse(dataset.tags || '[]');
    } catch (e) {
      // ignore error
    }
    if (!currentTags.includes('Eval')) {
      currentTags.push('Eval');
      await db.datasets.update({
        where: { id: datasetId },
        data: {
          tags: JSON.stringify(currentTags)
        }
      });
    }
    return NextResponse.json({ success: true, evalDataset });
  } catch (error) {
    console.error('Failed to copy dataset to eval:', error);
    return NextResponse.json({ error: 'Internal Server Error' }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/datasets/[datasetId]/evaluate/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/datasets/[datasetId]/evaluate/route.js
@@ -0,0 +1,36 @@
 import { NextResponse } from 'next/server';
 import { evaluateDataset } from '@/lib/services/datasets/evaluation';
 /**
 * 评估单个数据集的质量
 */
 export async function POST(request, { params }) {
  try {
    const { projectId, datasetId } = params;
    const { model, language = 'zh-CN' } = await request.json();
    if (!projectId || !datasetId) {
      return NextResponse.json({ success: false, message: '项目ID和数据集ID不能为空' }, { status: 400 });
    }
    if (!model) {
      return NextResponse.json({ success: false, message: '模型配置不能为空' }, { status: 400 });
    }
    // 使用评估服务进行数据集评估
    const result = await evaluateDataset(projectId, datasetId, model, language);
    if (!result.success) {
      return NextResponse.json({ success: false, message: result.error }, { status: 500 });
    }
    return NextResponse.json({
      success: true,
      message: '数据集评估完成',
      data: result.data
    });
  } catch (error) {
    console.error('数据集评估失败:', error);
    return NextResponse.json({ success: false, message: `评估失败: ${error.message}` }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/datasets/[datasetId]/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/datasets/[datasetId]/route.js
@@ -0,0 +1,82 @@
 import { NextResponse } from 'next/server';
 import { getDatasetsById, getDatasetsCounts, getNavigationItems, updateDatasetMetadata } from '@/lib/db/datasets';
 /**
 * 获取项目的所有数据集
 */
 export async function GET(request, { params }) {
  try {
    const { projectId, datasetId } = params;
    // 验证项目ID
    if (!projectId) {
      return NextResponse.json({ error: '项目ID不能为空' }, { status: 400 });
    }
    if (!datasetId) {
      return NextResponse.json({ error: '数据集ID不能为空' }, { status: 400 });
    }
    const { searchParams } = new URL(request.url);
    const operateType = searchParams.get('operateType');
    if (operateType !== null) {
      const data = await getNavigationItems(projectId, datasetId, operateType);
      return NextResponse.json(data);
    }
    const datasets = await getDatasetsById(datasetId);
    let counts = await getDatasetsCounts(projectId);
    return NextResponse.json({ datasets, ...counts });
  } catch (error) {
    console.error('获取数据集详情失败:', String(error));
    return NextResponse.json(
      {
        error: error.message || '获取数据集详情失败'
      },
      { status: 500 }
    );
  }
 }
 /**
 * 更新数据集元数据（评分、标签、备注）
 */
 export async function PATCH(request, { params }) {
  try {
    const { projectId, datasetId } = params;
    // 验证参数
    if (!projectId) {
      return NextResponse.json({ error: '项目ID不能为空' }, { status: 400 });
    }
    if (!datasetId) {
      return NextResponse.json({ error: '数据集ID不能为空' }, { status: 400 });
    }
    const body = await request.json();
    const { score, tags, note } = body;
    // 验证评分范围
    if (score !== undefined && (score < 0 || score > 5)) {
      return NextResponse.json({ error: '评分必须在0-5之间' }, { status: 400 });
    }
    // 验证标签格式
    if (tags !== undefined && !Array.isArray(tags)) {
      return NextResponse.json({ error: '标签必须是数组格式' }, { status: 400 });
    }
    // 更新数据集元数据
    const updatedDataset = await updateDatasetMetadata(datasetId, { score, tags, note });
    return NextResponse.json({
      success: true,
      dataset: updatedDataset
    });
  } catch (error) {
    console.error('更新数据集元数据失败:', String(error));
    return NextResponse.json(
      {
        error: error.message || '更新数据集元数据失败'
      },
      { status: 500 }
    );
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/datasets/[datasetId]/token-count/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/datasets/[datasetId]/token-count/route.js
@@ -0,0 +1,52 @@
 import { NextResponse } from 'next/server';
 import { getDatasetsById } from '@/lib/db/datasets';
 import { getEncoding } from '@langchain/core/utils/tiktoken';
 /**
 * 异步计算数据集文本的Token数量
 */
 export async function GET(request, { params }) {
  try {
    const { projectId, datasetId } = params;
    if (!datasetId) {
      return NextResponse.json({ error: '数据集ID不能为空' }, { status: 400 });
    }
    const datasets = await getDatasetsById(datasetId);
    const tokenCounts = {
      answerTokens: 0,
      cotTokens: 0
    };
    try {
      if (datasets.answer || datasets.cot) {
        // 使用 cl100k_base 编码，适用于 gpt-3.5-turbo 和 gpt-4
        const encoding = await getEncoding('cl100k_base');
        if (datasets.answer) {
          const tokens = encoding.encode(datasets.answer);
          tokenCounts.answerTokens = tokens.length;
        }
        if (datasets.cot) {
          const tokens = encoding.encode(datasets.cot);
          tokenCounts.cotTokens = tokens.length;
        }
      }
    } catch (error) {
      console.error('计算Token数量失败:', String(error));
      return NextResponse.json({ error: '计算Token数量失败' }, { status: 500 });
    }
    return NextResponse.json(tokenCounts);
  } catch (error) {
    console.error('获取Token计数失败:', String(error));
    return NextResponse.json(
      {
        error: error.message || '获取Token计数失败'
      },
      { status: 500 }
    );
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/datasets/batch-evaluate/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/datasets/batch-evaluate/route.js
@@ -0,0 +1,55 @@
 /**
 * 批量数据集评估任务API
 * 创建批量评估数据集质量的异步任务
 */
 import { NextResponse } from 'next/server';
 import { db } from '@/lib/db/index';
 import { processTask } from '@/lib/services/tasks/index';
 /**
 * 创建批量数据集评估任务
 */
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    const { model, language = 'zh-CN' } = await request.json();
    if (!projectId) {
      return NextResponse.json({ success: false, message: '项目ID不能为空' }, { status: 400 });
    }
    if (!model || !model.modelId) {
      return NextResponse.json({ success: false, message: '模型配置不能为空' }, { status: 400 });
    }
    // 创建批量评估任务
    const newTask = await db.task.create({
      data: {
        projectId,
        taskType: 'dataset-evaluation',
        status: 0, // 初始状态: 处理中
        modelInfo: JSON.stringify(model),
        language: language || 'zh-CN',
        detail: '',
        totalCount: 0,
        note: '准备开始批量评估数据集质量...',
        completedCount: 0
      }
    });
    // 异步处理任务
    processTask(newTask.id).catch(err => {
      console.error(`批量评估任务启动失败: ${newTask.id}`, String(err));
    });
    return NextResponse.json({
      success: true,
      message: '批量评估任务已创建',
      data: { taskId: newTask.id }
    });
  } catch (error) {
    console.error('创建批量评估任务失败:', error);
    return NextResponse.json({ success: false, message: `创建任务失败: ${error.message}` }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/datasets/export/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/datasets/export/route.js
@@ -0,0 +1,128 @@
 import { NextResponse } from 'next/server';
 import {
  getDatasets,
  getBalancedDatasetsByTags,
  getTagsWithDatasetCounts,
  getDatasetsBatch,
  getBalancedDatasetsByTagsBatch,
  getDatasetsByIds,
  getDatasetsByIdsBatch
 } from '@/lib/db/datasets';
 /**
 * 获取导出数据集
 */
 export async function GET(request, { params }) {
  try {
    const { projectId } = params;
    const { searchParams } = new URL(request.url);
    // 验证项目ID
    if (!projectId) {
      return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
    }
    const confirmedParam = searchParams.get('confirmed');
    const confirmed = confirmedParam === null ? undefined : confirmedParam === 'true';
    // 获取标签统计信息
    const tagStats = await getTagsWithDatasetCounts(projectId, confirmed);
    return NextResponse.json(tagStats);
  } catch (error) {
    console.error('Failed to get tag statistics:', String(error));
    return NextResponse.json(
      {
        error: error.message || 'Failed to get tag statistics'
      },
      { status: 500 }
    );
  }
 }
 /**
 * 获取标签统计信息
 */
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    const body = await request.json();
    // 验证项目ID
    if (!projectId) {
      return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
    }
    let status = body.status;
    let confirmed = undefined;
    if (status === 'confirmed') confirmed = true;
    if (status === 'unconfirmed') confirmed = false;
    // 检查是否是分批导出模式
    const batchMode = body.batchMode ? 'true' : 'false';
    const offset = body.offset ?? 0;
    const batchSize = body.batchSize ?? 1000;
    // 检查是否是平衡导出
    const balanceMode = body.balanceMode ? 'true' : 'false';
    const balanceConfig = body.balanceConfig;
    // 检查是否有选中的数据集 ID
    const selectedIds = Array.isArray(body.selectedIds) ? body.selectedIds : null;
    if (batchMode === 'true') {
      // 分批导出模式
      if (selectedIds && selectedIds.length > 0) {
        // 按选中 ID 分批导出
        const datasets = await getDatasetsByIdsBatch(projectId, selectedIds, offset, batchSize);
        const hasMore = datasets.length === batchSize;
        return NextResponse.json({
          data: datasets,
          hasMore,
          offset: offset + datasets.length
        });
      } else if (balanceMode === 'true' && balanceConfig) {
        // 平衡分批导出
        const parsedConfig = typeof balanceConfig === 'string' ? JSON.parse(balanceConfig) : balanceConfig;
        const result = await getBalancedDatasetsByTagsBatch(projectId, parsedConfig, confirmed, offset, batchSize);
        return NextResponse.json({
          data: result.data,
          hasMore: result.hasMore,
          offset: offset + result.data.length
        });
      } else {
        // 常规分批导出
        const datasets = await getDatasetsBatch(projectId, confirmed, offset, batchSize);
        const hasMore = datasets.length === batchSize;
        return NextResponse.json({
          data: datasets,
          hasMore,
          offset: offset + datasets.length
        });
      }
    } else {
      // 传统一次性导出模式（保持向后兼容）
      if (selectedIds && selectedIds.length > 0) {
        // 按选中 ID 导出
        const datasets = await getDatasetsByIds(projectId, selectedIds);
        return NextResponse.json(datasets);
      } else if (balanceMode === 'true' && balanceConfig) {
        // 平衡导出模式
        const parsedConfig = typeof balanceConfig === 'string' ? JSON.parse(balanceConfig) : balanceConfig;
        const datasets = await getBalancedDatasetsByTags(projectId, parsedConfig, confirmed);
        return NextResponse.json(datasets);
      } else {
        // 常规导出模式
        const datasets = await getDatasets(projectId, confirmed);
        return NextResponse.json(datasets);
      }
    }
  } catch (error) {
    console.error('Failed to get datasets:', String(error));
    return NextResponse.json(
      {
        error: error.message || 'Failed to get datasets'
      },
      { status: 500 }
    );
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/datasets/generate-eval-variant/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/datasets/generate-eval-variant/route.js
@@ -0,0 +1,44 @@
 import { NextResponse } from 'next/server';
 import { getDatasetsById } from '@/lib/db/datasets';
 import LLMClient from '@/lib/llm/core/index';
 import { getEvalQuestionPrompt } from '@/lib/llm/prompts/evalQuestion';
 import { extractJsonFromLLMOutput } from '@/lib/llm/common/util';
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    const { datasetId, model, language, questionType = 'open_ended', count = 1 } = await request.json();
    if (!datasetId || !model) {
      return NextResponse.json({ error: 'Missing required parameters' }, { status: 400 });
    }
    // 1. 获取原数据集
    const dataset = await getDatasetsById(datasetId);
    if (!dataset) {
      return NextResponse.json({ error: 'Dataset not found' }, { status: 404 });
    }
    // 2. 构建提示词
    // 将原问题和答案合并作为上下文文本
    const text = `Question: ${dataset.question}\nAnswer: ${dataset.answer}`;
    const prompt = await getEvalQuestionPrompt(language || 'zh-CN', questionType, { text, number: count }, projectId);
    // 3. 调用 LLM
    const client = new LLMClient(model);
    const response = await client.getResponse(prompt);
    const result = extractJsonFromLLMOutput(response);
    // 结果应该是一个数组
    if (!result || !Array.isArray(result)) {
      throw new Error('Failed to parse LLM output or output is not an array');
    }
    return NextResponse.json({ success: true, data: result });
  } catch (error) {
    console.error('Generate eval variant failed:', error);
    return NextResponse.json({ error: error.message || 'Internal Server Error' }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/datasets/import/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/datasets/import/route.js
@@ -0,0 +1,109 @@
 import { NextResponse } from 'next/server';
 import { createDataset } from '@/lib/db/datasets';
 import { nanoid } from 'nanoid';
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    const { datasets, sourceInfo } = await request.json();
    if (!datasets || !Array.isArray(datasets)) {
      return NextResponse.json({ error: 'Invalid datasets data' }, { status: 400 });
    }
    const results = [];
    const errors = [];
    let successCount = 0;
    let skippedCount = 0;
    for (let i = 0; i < datasets.length; i++) {
      try {
        const dataset = datasets[i];
        // 安全获取与清洗字段
        const q = typeof dataset?.question === 'string' ? dataset.question.trim() : '';
        const a = typeof dataset?.answer === 'string' ? dataset.answer.trim() : '';
        // 验证必填字段：缺失则跳过
        if (!q || !a) {
          errors.push(`第 ${i + 1} 条记录缺少必填字段(question/answer)，已跳过`);
          skippedCount++;
          continue;
        }
        // 规范化可选字段
        const chunkName = dataset?.chunkName || 'Imported Data';
        const chunkContent = dataset?.chunkContent || 'Imported from external source';
        const model = dataset?.model || 'imported';
        const questionLabel = dataset?.questionLabel || '';
        const cot = typeof dataset?.cot === 'string' ? dataset.cot : '';
        const confirmed = typeof dataset?.confirmed === 'boolean' ? dataset.confirmed : false;
        const score = typeof dataset?.score === 'number' ? dataset.score : 0;
        // tags: 支持数组/字符串/对象
        let tags = '[]';
        if (Array.isArray(dataset?.tags)) {
          try {
            tags = JSON.stringify(dataset.tags);
          } catch {
            tags = '[]';
          }
        } else if (typeof dataset?.tags === 'string') {
          tags = dataset.tags;
        } else if (dataset?.tags && typeof dataset.tags === 'object') {
          try {
            tags = JSON.stringify(dataset.tags);
          } catch {
            tags = '[]';
          }
        }
        // other: 对象或字符串
        let other = '{}';
        if (typeof dataset?.other === 'string') {
          other = dataset.other;
        } else if (dataset?.other && typeof dataset.other === 'object') {
          try {
            other = JSON.stringify(dataset.other);
          } catch {
            other = '{}';
          }
        }
        const note = typeof dataset?.note === 'string' ? dataset.note : '';
        // 创建数据集记录
        const newDataset = await createDataset({
          projectId,
          questionId: nanoid(), // 生成唯一的问题ID
          question: q,
          answer: a,
          chunkName,
          chunkContent,
          model,
          questionLabel,
          cot,
          confirmed,
          score,
          tags,
          note,
          other
        });
        results.push(newDataset);
        successCount++;
      } catch (error) {
        errors.push(`第 ${i + 1} 条记录: ${error.message}`);
      }
    }
    return NextResponse.json({
      success: successCount,
      total: datasets.length,
      failed: errors.length,
      skipped: skippedCount,
      errors,
      sourceInfo
    });
  } catch (error) {
    console.error('Import datasets error:', error);
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/datasets/optimize/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/datasets/optimize/route.js
@@ -0,0 +1,89 @@
 import { NextResponse } from 'next/server';
 import { getDatasetsById, updateDataset } from '@/lib/db/datasets';
 import { getQuestionById } from '@/lib/db/questions';
 import { getChunkById } from '@/lib/db/chunks';
 import LLMClient from '@/lib/llm/core/index';
 import { getNewAnswerPrompt } from '@/lib/llm/prompts/newAnswer';
 import { extractJsonFromLLMOutput } from '@/lib/llm/common/util';
 // 优化数据集答案
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    // 验证项目ID
    if (!projectId) {
      return NextResponse.json({ error: 'Project ID cannot be empty' }, { status: 400 });
    }
    // 获取请求体
    const { datasetId, model, advice, language } = await request.json();
    if (!datasetId) {
      return NextResponse.json({ error: 'Dataset ID cannot be empty' }, { status: 400 });
    }
    if (!model) {
      return NextResponse.json({ error: 'Model cannot be empty' }, { status: 400 });
    }
    if (!advice) {
      return NextResponse.json({ error: 'Please provide optimization suggestions' }, { status: 400 });
    }
    // 获取数据集内容
    const dataset = await getDatasetsById(datasetId);
    if (!dataset) {
      return NextResponse.json({ error: 'Dataset does not exist' }, { status: 404 });
    }
    // 创建LLM客户端
    const llmClient = new LLMClient(model);
    const { question, answer, cot, chunkContent: storedChunkContent, questionId } = dataset;
    let chunkContent = storedChunkContent || '';
    if (!chunkContent && questionId) {
      try {
        const questionRecord = await getQuestionById(questionId);
        if (questionRecord?.chunkId) {
          const chunkRecord = await getChunkById(questionRecord.chunkId);
          chunkContent = chunkRecord?.content || '';
        }
      } catch (error) {
        console.error('Failed to load chunk content by questionId:', error);
      }
    }
    // 生成优化后的答案和思维链
    const prompt = await getNewAnswerPrompt(language, { question, answer, cot, advice, chunkContent }, projectId);
    const response = await llmClient.getResponse(prompt);
    // 从LLM输出中提取JSON格式的优化结果
    const optimizedResult = extractJsonFromLLMOutput(response);
    if (!optimizedResult || !optimizedResult.answer) {
      return NextResponse.json({ error: 'Failed to optimize answer, please try again' }, { status: 500 });
    }
    // 更新数据集
    const updatedDataset = {
      ...dataset,
      answer: optimizedResult.answer,
      cot: cot ? optimizedResult.cot || cot : '' // 如果没有提供思考过程，则不更新
    };
    await updateDataset(updatedDataset);
    // 返回优化后的数据集
    return NextResponse.json({
      success: true,
      dataset: updatedDataset
    });
  } catch (error) {
    console.error('Failed to optimize answer:', String(error));
    return NextResponse.json({ error: error.message || 'Failed to optimize answer' }, { status: 500 });
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/datasets/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/datasets/route.js
@@ -0,0 +1,193 @@
 import { NextResponse } from 'next/server';
 import {
  deleteDataset,
  getDatasetsByPagination,
  getDatasetsIds,
  getDatasetsById,
  updateDataset
 } from '@/lib/db/datasets';
 import datasetService from '@/lib/services/datasets';
 // 优化思维链函数已移至服务层
 /**
 * 生成数据集（为单个问题生成答案）
 */
 export async function POST(request, { params }) {
  try {
    const { projectId } = params;
    const { questionId, model, language } = await request.json();
    // 使用数据集生成服务
    const result = await datasetService.generateDatasetForQuestion(projectId, questionId, {
      model,
      language
    });
    return NextResponse.json(result);
  } catch (error) {
    console.error('Failed to generate dataset:', String(error));
    return NextResponse.json(
      {
        error: error.message || 'Failed to generate dataset'
      },
      { status: 500 }
    );
  }
 }
 /**
 * 获取项目的所有数据集
 */
 export async function GET(request, { params }) {
  try {
    const { projectId } = params;
    const { searchParams } = new URL(request.url);
    // 验证项目ID
    if (!projectId) {
      return NextResponse.json({ error: '项目ID不能为空' }, { status: 400 });
    }
    const page = parseInt(searchParams.get('page')) || 1;
    const size = parseInt(searchParams.get('size')) || 10;
    const input = searchParams.get('input');
    const field = searchParams.get('field') || 'question';
    const status = searchParams.get('status');
    const hasCot = searchParams.get('hasCot');
    const isDistill = searchParams.get('isDistill');
    const scoreRange = searchParams.get('scoreRange');
    const customTag = searchParams.get('customTag');
    const noteKeyword = searchParams.get('noteKeyword');
    const chunkName = searchParams.get('chunkName');
    let confirmed = undefined;
    if (status === 'confirmed') confirmed = true;
    if (status === 'unconfirmed') confirmed = false;
    let selectedAll = searchParams.get('selectedAll');
    if (selectedAll) {
      let data = await getDatasetsIds(
        projectId,
        confirmed,
        input,
        field,
        hasCot,
        isDistill,
        scoreRange,
        customTag,
        noteKeyword,
        chunkName
      );
      return NextResponse.json(data);
    }
    // 获取数据集
    const datasets = await getDatasetsByPagination(
      projectId,
      page,
      size,
      confirmed,
      input,
      field, // 传递搜索字段参数
      hasCot, // 传递思维链筛选参数
      isDistill, // 传递蒸馏数据集筛选参数
      scoreRange, // 传递评分范围筛选参数
      customTag, // 传递自定义标签筛选参数
      noteKeyword, // 传递备注关键字筛选参数
      chunkName // 传递文本块名称筛选参数
    );
    return NextResponse.json(datasets);
  } catch (error) {
    console.error('获取数据集失败:', String(error));
    return NextResponse.json(
      {
        error: error.message || '获取数据集失败'
      },
      { status: 500 }
    );
  }
 }
 /**
 * 删除数据集
 */
 export async function DELETE(request) {
  try {
    const { searchParams } = new URL(request.url);
    const datasetId = searchParams.get('id');
    if (!datasetId) {
      return NextResponse.json(
        {
          error: 'Dataset ID cannot be empty'
        },
        { status: 400 }
      );
    }
    await deleteDataset(datasetId);
    return NextResponse.json({
      success: true,
      message: 'Dataset deleted successfully'
    });
  } catch (error) {
    console.error('Failed to delete dataset:', error);
    return NextResponse.json(
      {
        error: error.message || 'Failed to delete dataset'
      },
      { status: 500 }
    );
  }
 }
 /**
 * 编辑数据集
 */
 export async function PATCH(request) {
  try {
    const { searchParams } = new URL(request.url);
    const datasetId = searchParams.get('id');
    const { answer, cot, question, confirmed } = await request.json();
    if (!datasetId) {
      return NextResponse.json(
        {
          error: 'Dataset ID cannot be empty'
        },
        { status: 400 }
      );
    }
    // 获取所有数据集
    let dataset = await getDatasetsById(datasetId);
    if (!dataset) {
      return NextResponse.json(
        {
          error: 'Dataset does not exist'
        },
        { status: 404 }
      );
    }
    let data = { id: datasetId };
    if (confirmed !== undefined) data.confirmed = confirmed;
    if (answer) data.answer = answer;
    if (cot) data.cot = cot;
    if (question) data.question = question;
    // 保存更新后的数据集列表
    await updateDataset(data);
    return NextResponse.json({
      success: true,
      message: 'Dataset updated successfully',
      dataset: dataset
    });
  } catch (error) {
    console.error('Failed to update dataset:', String(error));
    return NextResponse.json(
      {
        error: error.message || 'Failed to update dataset'
      },
      { status: 500 }
    );
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/datasets/tags/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/datasets/tags/route.js
@@ -0,0 +1,28 @@
 import { NextResponse } from 'next/server';
 import { getUsedCustomTags } from '@/lib/db/datasets';
 /**
 * 获取项目中使用过的自定义标签
 */
 export async function GET(request, { params }) {
  try {
    const { projectId } = params;
    // 验证项目ID
    if (!projectId) {
      return NextResponse.json({ error: '项目ID不能为空' }, { status: 400 });
    }
    const tags = await getUsedCustomTags(projectId);
    return NextResponse.json({ tags });
  } catch (error) {
    console.error('获取自定义标签失败:', String(error));
    return NextResponse.json(
      {
        error: error.message || '获取自定义标签失败'
      },
      { status: 500 }
    );
  }
 }
--- a/easy-dataset-main/app/api/projects/[projectId]/default-prompts/route.js
+++ b/easy-dataset-main/app/api/projects/[projectId]/default-prompts/route.js
@@ -0,0 +1,38 @@
 import { NextResponse } from 'next/server';
 // 获取默认提示词内容
 export async function GET(request, { params }) {
  try {
    const { searchParams } = new URL(request.url);
    const promptType = searchParams.get('promptType');
    const promptKey = searchParams.get('promptKey');
    if (!promptType || !promptKey) {
      return NextResponse.json({ error: 'promptType and promptKey are required' }, { status: 400 });
    }
    // 动态导入对应的提示词模块
    let promptModule;
    try {
      promptModule = await import(`@/lib/llm/prompts/${promptType}`);
    } catch (error) {
      return NextResponse.json({ error: `Prompt module ${promptType} not found` }, { status: 404 });
    }
    // 获取指定的提示词常量
    const promptContent = promptModule[promptKey];
    if (!promptContent) {
      return NextResponse.json({ error: `Prompt key ${promptKey} not found in module ${promptType}` }, { status: 404 });
    }
    return NextResponse.json({
      success: true,
      content: promptContent,
      promptType,
      promptKey
    });
  } catch (error) {
    console.error('获取默认提示词失败:', error);
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
 }
--- a/Show More
+++ b/Show More
		`@@ -0,0 +1,3 @@`
							`#!/usr/bin/env sh`

							`npx commitlint --edit "$1"`