Add dataset upload interface

2026-01-11 13:26:23 +08:00
parent 217f457f5f
commit 6e1b4b58ba
12 changed files with 1253 additions and 56 deletions
--- a/.claude/settings.local.json
+++ b/.claude/settings.local.json
@@ -21,7 +21,18 @@
      "Bash(kill:*)",
      "Bash(rm:*)",
      "Bash(./start.sh:*)",
-      "Bash(lsof:*)"
+      "Bash(lsof:*)",
      "Bash(pip3:*)",
      "Bash(python3 -m pip:*)",
      "Bash(apt update:*)",
      "Bash(python3:*)",
      "Bash(./test_api.sh:*)",
      "Bash(netstat:*)",
      "Bash(/data/code/FT_Platform/YG_FT_Platform/test_all.sh:*)",
      "Bash(find:*)",
      "Bash(./test_upload.sh:*)",
      "Bash(./test_all.sh)",
      "Bash(/data/code/FT_Platform/YG_FT_Platform/test_data_dir.sh:*)"
    ]
  }
 }
--- a/README.md
+++ b/README.md
@@ -1,3 +1,169 @@
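-# YG_FT_Platform
+# Large Model Fine-Tuning Platform
-Tianfeng fine-tuning platform build
+A complete large-model fine-tuning platform with a web front end and a FastAPI back-end service.
 ## 🚀 Quick Start
 ### Start all services with one command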
 ```bash
 ./total_start.sh
 ```
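 Choose option `1` (start all services) to launch the front end and the back end together.
 ## 📁 Project Structure
 ```
 YG_FT_Platform/
 ├── total_start.sh      # Starts all services with one command
 ├── test_all.sh         # Tests all services
 ├── README.md           # Project documentation
 ├── src/                # FastAPI back-end service
 │   ├── main.py         # FastAPI application entry point
 │   ├── requirements.txt # Python dependency list
 │   ├── run.sh          # FastAPI startup script
 │   ├── test_api.sh     # API test script
 │   └── README.md       # FastAPI documentation
 └── web/                # Web front end
    ├── pages/          # HTML pages
    │   ├── main.html    # Main page (SPA)
    │   └── login.html   # Login page
    ├── css/             # Stylesheets
    ├── assets/          # Static assets
    ├── start.sh         # Web startup script
    └── README.md        # Web front-end documentation
 ```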
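 ## 🌐 Service Addresses
 ### Front end (port 8000)
 - **Home page**: http://10.10.10.77:8000/pages/main.html
 - **Login page**: http://10.10.10.77:8000/pages/login.html
 ### Back end (port 8001)
 - **API root**: http://10.10.10.77:8001/
 - **API docs**: http://10.10.10.77:8001/docs
 - **Alternative docs**: http://10.10.10.77:8001/redoc
 ## 🎯 Features
 ### Front-end features
 - ✅ Single-page application (SPA)
 - ✅ Responsive design for phone and tablet access
 - ✅ User login validation
 - ✅ Dataset management page
 - ✅ System monitoring dashboard
 - ✅ Simulated data updated in real time
 ### Back-end features
 - ✅ RESTful API design
 - ✅ User authentication and authorization
 - ✅ Dataset management API
 - ✅ Model configuration management
 - ✅ Training status monitoring
 - ✅ System statistics
 - ✅ Unified response format
 ## 🔧 Starting the Services
 ### Option 1: start all services with one command (recommended)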
 ```bash
 ./total_start.sh
 ```
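 Choose a startup mode:
 - `1` - start all services (FastAPI + web front end)
 - `2` - start only the FastAPI service
 - `3` - start only the web front-end service
 - `4` - interactive selection
 ### Option 2: start services individually
 #### Start the back-end service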
 ```bash
 cd src
 ./run.sh
 ```
 #### Start the front-end service
 ```bash
 cd web
 ./start.sh
 ```
 ## 🧪 Testing
 ### Test all services
 ```bash
 ./test_all.sh
 ```
 ### Test the API
 ```bash
 cd src
 ./test_api.sh
 ```
 ### Test the API manually
 ```bash
 # Health check
 curl http://10.10.10.77:8001/api/health
 # List datasets
 curl http://10.10.10.77:8001/api/datasets
 # User login
 curl -X POST http://10.10.10.77:8001/api/login \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "123456"}'
 ```
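
The login call above can also be scripted. The following is a minimal sketch using only the Python standard library; `build_login_request` is a hypothetical helper name (not part of this repository), and the base URL is simply the address listed in this README. The function only builds the request; nothing is sent until the result is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

API_BASE = "http://10.10.10.77:8001"  # address from this README; adjust for your host


def build_login_request(username: str, password: str,
                        base_url: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) the POST /api/login request."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the back end running, `urllib.request.urlopen(build_login_request("admin", "123456"), timeout=5)` sends the request, and `json.load` on the returned response object decodes the body.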
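 ## 📚 API Documentation
 ### Main endpoints
 | Method | Path | Description |
 |------|------|------|
 | GET | / | Root path |
 | GET | /api/health | Health check |
 | POST | /api/login | User login |
 | GET | /api/datasets | List datasets |
 | POST | /api/datasets | Create a dataset |
 | GET | /api/models | List models |
 | POST | /api/models/config | Configure model parameters |
 | GET | /api/training/status | Get training status |
 | GET | /api/system/stats | Get system statistics |
 ## 🛠️ Tech Stack
 ### Front end
 - HTML5 + CSS3 + JavaScript
 - Tailwind CSS (styling framework)
 - Chart.js (charting library)
 - Single-page application (SPA) architecture
 ### Back end
 - Python 3.10+
 - FastAPI (web framework)
 - Uvicorn (ASGI server)
 - Pydantic (data validation)
 ## ⚙️ System Requirements
 - Python 3.8+ (the minimum supported by the pinned FastAPI 0.104.1; 3.10+ is what the tech stack above targets)
 - pip (Python package manager)
 - A modern browser (Chrome, Firefox, Safari, Edge)
 ## 📝 License
 MIT License
 ## 🤝 Contributing
 Issues and pull requests are welcome!
 ## 📧 Contact
 If you run into problems, open an issue or contact the developers.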
--- a/data/.gitignore
+++ b/data/.gitignore
@@ -0,0 +1,3 @@
 # Ignore everything in the data directory
 *
 !.gitignore
--- a/src/README.md
+++ b/src/README.md
@@ -0,0 +1,129 @@
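 # FastAPI Server
 ## Features
 This FastAPI server provides the RESTful API for the large-model fine-tuning platform.
 ## API Endpoints
 ### Basics
 - `GET /` - root path, returns a welcome message
 - `GET /api/health` - health check
 ### User authentication
 - `POST /api/login` - user login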
  ```json
  {
    "username": "admin",
    "password": "your_password"
  }
  ```
 ### Dataset management
 - `GET /api/datasets` - list datasets
 - `POST /api/datasets` - create a new dataset
  ```json
  {
    "name": "New dataset name",
    "description": "Dataset description",
    "size": "Dataset size"
  }
  ```
 - `POST /api/datasets/upload` - upload a dataset file (JSON and JSONL are supported)
  ```bash
  curl -X POST "http://10.10.10.77:8001/api/datasets/upload" \
    -F "file=@dataset.json" \
    -F "description=Dataset description"
  ```
  **Supported file formats**: .json, .jsonl
  **File size limit**: 100MB
 - `GET /api/datasets/files` - list the files saved in the data directory
 - `DELETE /api/datasets/{dataset_id}` - delete a dataset
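
The upload rules listed above (extension, 100MB limit, UTF-8, valid JSON or one JSON document per line) can be checked on the client before sending a file. The sketch below is an illustration only: `check_dataset` and its constants are hypothetical names, and it mirrors the documented rules rather than reproducing the server code.

```python
import json
import os

ALLOWED_EXTENSIONS = {".json", ".jsonl"}
MAX_SIZE_BYTES = 100 * 1024 * 1024  # the 100MB limit stated above


def check_dataset(filename: str, contents: bytes) -> None:
    """Raise ValueError if the file breaks one of the documented upload rules."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")
    if len(contents) > MAX_SIZE_BYTES:
        raise ValueError("file exceeds the 100MB limit")
    # UnicodeDecodeError and json.JSONDecodeError are both ValueError subclasses,
    # so callers only need to catch ValueError.
    text = contents.decode("utf-8")
    if extension == ".json":
        json.loads(text)
        return
    # JSONL: every non-blank line must be a JSON document on its own
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            try:
                json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(f"line {number} is not valid JSON") from error
```

Checking locally saves a round trip for large files, but the server remains the authority on what it accepts.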
 ### Model management
 - `GET /api/models` - list models
 - `POST /api/models/config` - configure model parameters
  ```json
  {
    "model_name": "GPT-4",
    "learning_rate": 0.001,
    "batch_size": 32,
    "epochs": 100
  }
  ```
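 ### Training management
 - `GET /api/training/status` - get training status
 - `POST /api/training/start` - start a training task
 - `POST /api/training/stop/{task_id}` - stop a training task
 - `GET /api/model/{model_id}/metrics` - get model metrics
 ### System monitoring
 - `GET /api/system/stats` - get system statistics
 ## Starting the Server
 ### Method 1: use the startup script (recommended)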
 ```bash
 cd src
 ./run.sh
 ```
 ### Method 2: start manually
 ```bash
 # Install dependencies
 pip3 install -r requirements.txt
 # Start the server
 uvicorn main:app --host 0.0.0.0 --port 8001 --reload
 ```
 ## Addresses
 - **Server**: http://10.10.10.77:8001
 - **API docs**: http://10.10.10.77:8001/docs
 - **Alternative docs**: http://10.10.10.77:8001/redoc
 ## Example Requests
 ### Log in
 ```bash
 curl -X POST "http://10.10.10.77:8001/api/login" \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "123456"}'
 ```
 ### List datasets
 ```bash
 curl -X GET "http://10.10.10.77:8001/api/datasets"
 ```
 ### Upload a dataset file
 ```bash
 curl -X POST "http://10.10.10.77:8001/api/datasets/upload" \
  -F "file=@dataset.json" \
  -F "description=Dataset description"
 ```
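
In `src/main.py` of this commit the stored name is built from the client-supplied file name plus a Unix timestamp in whole seconds. A client-supplied name can carry directory components, and two uploads of the same file within one second get the same name. The sketch below shows one possible hardening; `safe_saved_name` is a hypothetical helper, and the random suffix is an assumption of this example, not something the server does.

```python
import os
import re
import time
import uuid


def safe_saved_name(client_filename: str) -> str:
    """Return a storage name without directory parts and with a collision-resistant suffix."""
    # Normalise Windows separators, then keep only the final path component
    leaf = os.path.basename(client_filename.replace("\\", "/"))
    stem, extension = os.path.splitext(leaf)
    # Replace anything outside a conservative character set; fall back to a fixed stem
    stem = re.sub(r"[^\w.-]", "_", stem).strip("._") or "dataset"
    # Timestamp keeps names sortable; the random part separates same-second uploads
    suffix = f"{int(time.time())}_{uuid.uuid4().hex[:8]}"
    return f"{stem}_{suffix}{extension.lower()}"
```

The returned name can be joined to the data directory with `os.path.join` without escaping it.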
 ### List the files in the data directory
 ```bash
 curl -X GET "http://10.10.10.77:8001/api/datasets/files"
 ```
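
Each entry returned by this endpoint carries a raw byte count (`size`) and a display string (`size_human`). The server-side formatter in this commit stops at MB; the sketch below is a hypothetical variant, `human_size`, that also covers GB and could be used by a client that prefers to format `size` itself.

```python
def human_size(size_bytes: int) -> str:
    """Format a byte count with binary units (B, KB, MB, GB)."""
    if size_bytes < 1024:
        return f"{size_bytes}B"
    value = float(size_bytes)
    for unit in ("KB", "MB", "GB"):
        value /= 1024
        # Stop at the first unit that fits, or at GB as the largest unit
        if value < 1024 or unit == "GB":
            return f"{value:.2f}{unit}"
```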
 ### Get system statistics
 ```bash
 curl -X GET "http://10.10.10.77:8001/api/system/stats"
 ```
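 ## Dependencies
 - Python 3.8+ (FastAPI 0.104 no longer supports Python 3.7)
 - FastAPI 0.104.1
 - Uvicorn 0.24.0
 - Pydantic 2.5.0
 - python-multipart 0.0.6 (needed for file uploads)
 ## Notes
 - The server runs on port 8001 by default
 - Pass `--reload` to enable hot reloading
 - Successful responses share a unified `code` / `message` / `data` format; errors raised as `HTTPException` use FastAPI's default `{"detail": ...}` body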
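
A client therefore has to handle two body shapes: the `code` / `message` / `data` envelope, and FastAPI's default `{"detail": ...}` body for errors raised as `HTTPException`. The sketch below shows one way to normalise them; `Envelope` and `parse_response` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Envelope:
    code: int
    message: str
    data: Optional[dict] = None


def parse_response(http_status: int, payload: dict) -> Envelope:
    """Normalise either response body shape into an Envelope."""
    if "code" not in payload and "detail" in payload:
        # Error body produced by HTTPException; reuse the HTTP status as the code
        return Envelope(code=http_status, message=str(payload["detail"]))
    return Envelope(code=payload["code"], message=payload["message"], data=payload.get("data"))
```

Note that some endpoints report failures inside the envelope (for example `code` 401 from `/api/login`), so callers should check `code` as well as the HTTP status.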
--- a/src/main.py
+++ b/src/main.py
@@ -0,0 +1,381 @@
 from fastapi import FastAPI, File, Form, UploadFile, HTTPException
 from pydantic import BaseModel
 from typing import List, Optional
 import uvicorn
 import os
 import json
 import random
 import re
 import time
 app = FastAPI(title="Large Model Fine-Tuning Platform API", version="1.0.0")
 # Request models
 class UserModel(BaseModel):
    username: str
    password: str
 class DatasetModel(BaseModel):
    name: str
    description: Optional[str] = None
    size: str
 class ModelConfigModel(BaseModel):
    model_name: str
    learning_rate: float
    batch_size: int
    epochs: int
 # Response model
 class ResponseModel(BaseModel):
    code: int
    message: str
    data: Optional[dict] = None
 # In-memory mock data. The `status` values stay in Chinese because the front end
 # displays them verbatim and derives the badge style from them.
 datasets = [
    {"id": 1, "name": "Chinese dialogue dataset", "size": "1.2GB", "status": "已处理"},
    {"id": 2, "name": "English text classification dataset", "size": "856MB", "status": "处理中"},
    {"id": 3, "name": "Image recognition dataset", "size": "2.5GB", "status": "待处理"},
 ]
 models = [
    {"id": 1, "name": "GPT-4", "status": "训练中", "accuracy": "92%"},
    {"id": 2, "name": "BERT", "status": "已完成", "accuracy": "89%"},
    {"id": 3, "name": "LLaMA", "status": "已完成", "accuracy": "95%"},
 ]
@app.get("/")
 async def root():
    """Root path."""
    return {"message": "Large Model Fine-Tuning Platform API service"}
@app.get("/api/health")
 async def health_check():
    """Health check."""
    return ResponseModel(code=200, message="Service is running", data={"status": "healthy"})
@app.post("/api/login", response_model=ResponseModel)
 async def login(user: UserModel):
    """User login (mock: any non-empty password is accepted for 'admin')."""
    if user.username == "admin" and user.password:
        return ResponseModel(
            code=200,
            message="Login successful",
            data={"token": "mock_token_12345", "user": user.username}
        )
    else:
        return ResponseModel(code=401, message="Incorrect username or password")
@app.get("/api/datasets", response_model=ResponseModel)
 async def get_datasets():
    """List datasets."""
    return ResponseModel(code=200, message="Success", data={"datasets": datasets})
@app.post("/api/datasets", response_model=ResponseModel)
 async def create_dataset(dataset: DatasetModel):
    """Create a dataset."""
    new_dataset = {
        # max() + 1 keeps IDs unique after a deletion; len() + 1 could reuse an existing ID
        "id": max((d["id"] for d in datasets), default=0) + 1,
        "name": dataset.name,
        "description": dataset.description,
        "size": "0MB",
        "status": "待处理"
    }
    datasets.append(new_dataset)
    return ResponseModel(code=201, message="Created", data={"dataset": new_dataset})
@app.post("/api/datasets/upload", response_model=ResponseModel)
 # Form(None) reads `description` from the multipart body; a plain default would make it a query parameter
 async def upload_dataset(file: UploadFile = File(...), description: Optional[str] = Form(None)):
    """Upload a dataset file (JSON and JSONL only)."""
    # Check the file type
    allowed_extensions = ['.json', '.jsonl']
    file_extension = os.path.splitext(file.filename)[1].lower()
    if file_extension not in allowed_extensions:
        raise HTTPException(
            status_code=400,
            detail=f"Unsupported file type. Only {', '.join(allowed_extensions)} files can be uploaded"
        )
    # Check the file size (100MB limit); note that the whole upload is read into memory first
    max_size = 100 * 1024 * 1024  # 100MB
    contents = await file.read()
    file_size = len(contents)
    if file_size > max_size:
        raise HTTPException(
            status_code=400,
            detail=f"File too large. The maximum is 100MB, this file is {file_size / (1024*1024):.2f}MB"
        )
    try:
        # Decode once; a non-UTF-8 file raises UnicodeDecodeError, handled below
        text = contents.decode('utf-8')
        # Validate the file content
        if file_extension == '.json':
            # The whole file must be one valid JSON document
            json.loads(text)
        elif file_extension == '.jsonl':
            # Every non-empty line must be valid JSON
            lines = text.strip().split('\n')
            for i, line in enumerate(lines):
                if line.strip():
                    try:
                        json.loads(line)
                    except json.JSONDecodeError:
                        raise HTTPException(
                            status_code=400,
                            detail=f"Invalid JSONL file: line {i+1} is not valid JSON"
                        )
        # Build the human-readable size string
        if file_size < 1024:
            size_str = f"{file_size}B"
        elif file_size < 1024 * 1024:
            size_str = f"{file_size / 1024:.2f}KB"
        else:
            size_str = f"{file_size / (1024*1024):.2f}MB"
        # Count lines (for statistics); for a .json file this is the number of text lines, not records
        lines_count = len(text.strip().split('\n')) if contents else 0
        # Save the file to the data directory
        data_dir = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), 'data')
        os.makedirs(data_dir, exist_ok=True)
        # Build a unique file name (avoids collisions); basename() drops any directory
        # components a client may have put into the file name
        base_name = os.path.splitext(os.path.basename(file.filename))[0]
        timestamp = int(time.time())
        saved_filename = f"{base_name}_{timestamp}{file_extension}"
        saved_path = os.path.join(data_dir, saved_filename)
        # Write the file
        with open(saved_path, 'wb') as f:
            f.write(contents)
        # Create the new dataset record
        new_dataset = {
            "id": max((d["id"] for d in datasets), default=0) + 1,
            "name": file.filename,
            "description": description or f"Uploaded dataset file containing {lines_count} lines",
            "size": size_str,
            "status": "已处理",
            "upload_time": "just now",
            "file_extension": file_extension,
            "records_count": lines_count,
            "saved_path": saved_path  # where the file was stored
        }
        # Add it to the dataset list
        datasets.append(new_dataset)
        return ResponseModel(
            code=200,
            message="File uploaded successfully",
            data={
                "dataset": new_dataset,
                "file_info": {
                    "filename": file.filename,
                    "size": size_str,
                    "extension": file_extension,
                    "records": lines_count
                }
            }
        )
    except HTTPException:
        # Re-raise the 400 from the JSONL check; otherwise `except Exception` below would turn it into a 500
        raise
    except json.JSONDecodeError:
        raise HTTPException(
            status_code=400,
            detail="Invalid JSON file: the content is not valid JSON"
        )
    except UnicodeDecodeError:
        raise HTTPException(
            status_code=400,
            detail="File encoding error: make sure the file is UTF-8 encoded"
        )
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"File processing error: {str(e)}"
        )
@app.get("/api/datasets/files", response_model=ResponseModel)
 async def list_dataset_files():
    """List all dataset files saved in the data directory."""
    try:
        data_dir = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), 'data')
        if not os.path.exists(data_dir):
            return ResponseModel(
                code=200,
                message="Success",
                data={"files": [], "total": 0, "directory": data_dir}
            )
        files = []
        for filename in os.listdir(data_dir):
            file_path = os.path.join(data_dir, filename)
            if os.path.isfile(file_path):
                stat = os.stat(file_path)
                files.append({
                    "filename": filename,
                    "size": stat.st_size,
                    "size_human": format_size(stat.st_size),
                    "modified_time": time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(stat.st_mtime)),
                    "path": file_path
                })
        # Sort by modification time, newest first (the timestamp format sorts correctly as text)
        files.sort(key=lambda x: x["modified_time"], reverse=True)
        return ResponseModel(
            code=200,
            message="Success",
            data={
                "files": files,
                "total": len(files),
                "directory": data_dir
            }
        )
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Failed to list files: {str(e)}"
        )
 def format_size(size_bytes):
    """Format a file size as B, KB or MB."""
    if size_bytes < 1024:
        return f"{size_bytes}B"
    elif size_bytes < 1024 * 1024:
        return f"{size_bytes / 1024:.2f}KB"
    else:
        return f"{size_bytes / (1024*1024):.2f}MB"
@app.delete("/api/datasets/{dataset_id}", response_model=ResponseModel)
 async def delete_dataset(dataset_id: int):
    """Delete a dataset (removes the in-memory record only; the saved file is kept)."""
    global datasets
    for i, dataset in enumerate(datasets):
        if dataset["id"] == dataset_id:
            deleted_dataset = datasets.pop(i)
            return ResponseModel(
                code=200,
                message="Deleted",
                data={"deleted_dataset": deleted_dataset}
            )
    raise HTTPException(status_code=404, detail="Dataset not found")
@app.get("/api/models", response_model=ResponseModel)
 async def get_models():
    """List models."""
    return ResponseModel(code=200, message="Success", data={"models": models})
@app.post("/api/models/config", response_model=ResponseModel)
 async def config_model(config: ModelConfigModel):
    """Configure model parameters."""
    return ResponseModel(
        code=200,
        message="Configured",
        data={
            "model_name": config.model_name,
            "learning_rate": config.learning_rate,
            "batch_size": config.batch_size,
            "epochs": config.epochs,
            "status": "已配置"
        }
    )
@app.get("/api/training/status")
 async def get_training_status():
    """Get training status (mock values)."""
    return ResponseModel(
        code=200,
        message="Success",
        data={
            "current_task": "GPT-4 fine-tuning",
            "progress": 75,
            "eta": "2 hours",
            "loss": 0.23,
            "accuracy": 0.89
        }
    )
@app.get("/api/system/stats")
 async def get_system_stats():
    """Get system statistics (randomly generated mock values)."""
    import random
    return ResponseModel(
        code=200,
        message="Success",
        data={
            "cpu_usage": random.randint(30, 80),
            "memory_usage": random.randint(40, 70),
            "gpu_usage": random.randint(50, 90),
            "active_tasks": 5,
            "completed_tasks": 158
        }
    )
@app.post("/api/training/start")
 async def start_training(model_name: str, dataset_id: int):
    """Start a training task (mock; both arguments are query parameters)."""
    return ResponseModel(
        code=200,
        message="Training task started",
        data={
            "task_id": random.randint(1000, 9999),
            "model_name": model_name,
            "dataset_id": dataset_id,
            "status": "running"
        }
    )
@app.post("/api/training/stop/{task_id}")
 async def stop_training(task_id: int):
    """Stop a training task."""
    return ResponseModel(
        code=200,
        message=f"Training task {task_id} stopped",
        data={"task_id": task_id, "status": "stopped"}
    )
@app.get("/api/model/{model_id}/metrics")
 async def get_model_metrics(model_id: int):
    """Get model metrics (randomly generated mock values)."""
    return ResponseModel(
        code=200,
        message="Success",
        data={
            "model_id": model_id,
            "accuracy": round(random.uniform(0.85, 0.98), 3),
            "precision": round(random.uniform(0.80, 0.95), 3),
            "recall": round(random.uniform(0.82, 0.96), 3),
            "f1_score": round(random.uniform(0.83, 0.97), 3),
            "training_time": f"{random.randint(2, 24)} hours",
            "parameters": random.randint(1000000, 100000000)
        }
    )
 if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8001)
--- a/src/requirements.txt
+++ b/src/requirements.txt
@@ -0,0 +1,4 @@
 fastapi==0.104.1
 uvicorn[standard]==0.24.0
 pydantic==2.5.0
 python-multipart==0.0.6
--- a/src/run.sh
+++ b/src/run.sh
@@ -0,0 +1,43 @@
 #!/bin/bash
 echo "🚀 Starting the FastAPI server..."
 # Make sure we are in the right directory
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 cd "$SCRIPT_DIR"
 echo "📂 Current directory: $SCRIPT_DIR"
 # Check that Python is installed
 if ! command -v python3 &> /dev/null; then
    echo "❌ Error: Python3 is not installed"
    echo "Please install Python3 first"
    exit 1
 fi
 # Check that pip is installed
 if ! command -v pip3 &> /dev/null; then
    echo "❌ Error: pip3 is not installed"
    echo "Please install pip3 first"
    exit 1
 fi
 # Install dependencies
 echo "📦 Installing dependencies..."
 pip3 install -r requirements.txt
 if [ $? -ne 0 ]; then
    echo "❌ Dependency installation failed"
    exit 1
 fi
 echo ""
 echo "🌐 Server address: http://localhost:8001"
 echo "📚 API docs: http://localhost:8001/docs"
 echo "🔍 Alternative docs: http://localhost:8001/redoc"
 echo ""
 echo "Press Ctrl+C to stop the server"
 echo ""
 # Start the server
 python3 -m uvicorn main:app --host 0.0.0.0 --port 8001 --reload
--- a/src/test_api.sh
+++ b/src/test_api.sh
@@ -0,0 +1,47 @@
 #!/bin/bash
 echo "🧪 Testing the FastAPI server"
 echo "=================================="
 echo ""
 BASE_URL="http://localhost:8001"
 # Test 1: root path
 echo "1. Testing the root path..."
 curl -s "$BASE_URL/" | python3 -m json.tool
 echo ""
 # Test 2: health check
 echo "2. Testing the health check..."
 curl -s "$BASE_URL/api/health" | python3 -m json.tool
 echo ""
 # Test 3: user login
 echo "3. Testing user login..."
 curl -s -X POST "$BASE_URL/api/login" \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "123456"}' | python3 -m json.tool
 echo ""
 # Test 4: list datasets
 echo "4. Testing the dataset list..."
 curl -s "$BASE_URL/api/datasets" | python3 -m json.tool
 echo ""
 # Test 5: list models
 echo "5. Testing the model list..."
 curl -s "$BASE_URL/api/models" | python3 -m json.tool
 echo ""
 # Test 6: system statistics
 echo "6. Testing system statistics..."
 curl -s "$BASE_URL/api/system/stats" | python3 -m json.tool
 echo ""
 # Test 7: training status
 echo "7. Testing training status..."
 curl -s "$BASE_URL/api/training/status" | python3 -m json.tool
 echo ""
 echo "=================================="
 echo "✅ All tests finished!"
--- a/test_all.sh
+++ b/test_all.sh
@@ -0,0 +1,88 @@
 #!/bin/bash
 echo "🧪 Testing the Large Model Fine-Tuning Platform - all services"
 echo "=================================="
 echo ""
 BASE_URL="http://localhost"
 # Check whether the services are running
 echo "1. Checking service status..."
 echo ""
 # Check port 8000 (web front end)
 if curl -s "${BASE_URL}:8000" > /dev/null 2>&1; then
    echo "✅ Web front-end service is running (port 8000)"
 else
    echo "❌ Web front-end service is not running (port 8000)"
 fi
 # Check port 8001 (FastAPI)
 if curl -s "${BASE_URL}:8001" > /dev/null 2>&1; then
    echo "✅ FastAPI service is running (port 8001)"
 else
    echo "❌ FastAPI service is not running (port 8001)"
 fi
 echo ""
 echo "=================================="
 echo ""
 # Get this machine's IP address
 SERVER_IP=$(hostname -I | awk '{print $1}')
 echo "📱 Addresses:"
 echo ""
 echo "Front-end pages:"
 echo "   - Home page: http://$SERVER_IP:8000/pages/main.html"
 echo "   - Login: http://$SERVER_IP:8000/pages/login.html"
 echo ""
 echo "API service:"
 echo "   - API root: http://$SERVER_IP:8001/"
 echo "   - API health check: http://$SERVER_IP:8001/api/health"
 echo "   - API docs: http://$SERVER_IP:8001/docs"
 echo ""
 echo "=================================="
 echo "2. Testing API endpoints..."
 echo ""
 # Test the API
 echo "Testing the root path:"
 curl -s "${BASE_URL}:8001/" | python3 -m json.tool 2>/dev/null || curl -s "${BASE_URL}:8001/"
 echo ""
 echo "Testing the health check:"
 curl -s "${BASE_URL}:8001/api/health" | python3 -m json.tool 2>/dev/null || echo "Request failed"
 echo ""
 echo "Testing the dataset API:"
 curl -s "${BASE_URL}:8001/api/datasets" | python3 -m json.tool 2>/dev/null || echo "Request failed"
 echo ""
 echo "=================================="
 echo "3. Testing front-end pages..."
 echo ""
 # Test the front-end pages
 echo "Testing the home page:"
 if curl -s -I "${BASE_URL}:8000/pages/main.html" | grep -q "200 OK"; then
    echo "✅ Home page is reachable"
 else
    echo "❌ Home page is not reachable"
 fi
 echo "Testing the login page:"
 if curl -s -I "${BASE_URL}:8000/pages/login.html" | grep -q "200 OK"; then
    echo "✅ Login page is reachable"
 else
    echo "❌ Login page is not reachable"
 fi
 echo ""
 echo "=================================="
 echo "✅ Tests finished!"
 echo ""
 echo "💡 If the services are not running, start them with:"
 echo "   ./total_start.sh"
 echo ""
--- a/test_data_dir.sh
+++ b/test_data_dir.sh
@@ -0,0 +1,53 @@
 #!/bin/bash
 echo "🧪 Testing the data directory feature"
 echo "=================================="
 echo ""
 API_URL="http://10.10.10.77:8001/api"
 echo "1. Testing the file list..."
 curl -s "${API_URL}/datasets/files" | python3 -c "
 import json, sys
 data = json.load(sys.stdin)
 print(f'✅ {data[\"data\"][\"total\"]} file(s) in total')
 print('')
 for f in data['data']['files']:
    print(f'  📄 {f[\"filename\"]} - {f[\"size_human\"]}')
 "
 echo ""
 echo "2. Testing a new file upload..."
 cat > /tmp/test_upload.json << 'INNER_EOF'
 [{"text": "upload test", "label": "test"}]
 INNER_EOF
 curl -s -X POST "${API_URL}/datasets/upload" \
  -F "file=@/tmp/test_upload.json" \
  -F "description=data directory upload test" | python3 -c "
 import json, sys
 data = json.load(sys.stdin)
 if data['code'] == 200:
    print('✅ File uploaded successfully')
    print(f'  Saved path: {data[\"data\"][\"dataset\"][\"saved_path\"]}')
 else:
    print('❌ Upload failed')
 "
 echo ""
 echo "3. Fetching the file list again..."
 curl -s "${API_URL}/datasets/files" | python3 -c "
 import json, sys
 data = json.load(sys.stdin)
 print(f'✅ {data[\"data\"][\"total\"]} file(s) now')
 "
 echo ""
 echo "=================================="
 echo "✅ Tests finished!"
 echo ""
 echo "📁 data directory location:"
 echo "   /data/code/FT_Platform/YG_FT_Platform/data/"
 echo ""
 echo "🔍 View the files:"
 echo "   ls -lh /data/code/FT_Platform/YG_FT_Platform/data/"
--- a/total_start.sh
+++ b/total_start.sh
@@ -0,0 +1,200 @@
 #!/bin/bash
 echo "🚀 Large Model Fine-Tuning Platform - one-command startup"
 echo "=================================="
 echo ""
 # Make sure we are in the right directory
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 cd "$SCRIPT_DIR"
 echo "📂 Current directory: $SCRIPT_DIR"
 echo ""
 # Get this machine's IP address
 SERVER_IP=$(hostname -I | awk '{print $1}')
 echo "🌐 Local IP address: $SERVER_IP"
 echo ""
 # Check that Python is installed
 if ! command -v python3 &> /dev/null; then
    echo "❌ Error: Python3 is not installed"
    echo "Please install Python3 first"
    exit 1
 fi
 echo "Choose how to start:"
 echo "1) Start all services (FastAPI + web front end)"
 echo "2) Start only the FastAPI service (port 8001)"
 echo "3) Start only the web front-end service (port 8000)"
 echo "4) Interactive selection"
 echo ""
 read -p "Enter your choice (1-4): " choice
 case $choice in
    1)
        echo ""
        echo "✅ Starting all services..."
        echo ""
        # Check for and start the FastAPI service
        echo "🔧 Starting the FastAPI service..."
        cd "$SCRIPT_DIR/src"
        if [ ! -f "main.py" ]; then
            echo "⚠️  Warning: FastAPI source file main.py not found"
        fi
        # Start FastAPI (in the background)
        python3 -m uvicorn main:app --host 0.0.0.0 --port 8001 &
        FASTAPI_PID=$!
        echo "✅ FastAPI service started (PID: $FASTAPI_PID)"
        echo "   API address: http://localhost:8001"
        echo "   API docs: http://localhost:8001/docs"
        echo ""
        # Give FastAPI time to start
        sleep 2
        # Start the web front-end service
        echo "🌐 Starting the web front-end service..."
        cd "$SCRIPT_DIR/web"
        python3 -m http.server 8000 &
        WEB_PID=$!
        echo "✅ Web front-end service started (PID: $WEB_PID)"
        echo "   Front-end address: http://localhost:8000"
        echo ""
        echo "=================================="
        echo "🎉 All services started!"
        echo ""
        echo "📱 Addresses:"
        echo "   - Front-end page: http://$SERVER_IP:8000/pages/main.html"
        echo "   - Login page: http://$SERVER_IP:8000/pages/login.html"
        echo "   - API service: http://$SERVER_IP:8001"
        echo "   - API docs: http://$SERVER_IP:8001/docs"
        echo ""
        echo "⚠️  Press Ctrl+C to stop all services"
        echo ""
        # Stop both services when the user interrupts
        trap "echo ''; echo '🛑 Stopping services...'; kill $FASTAPI_PID $WEB_PID 2>/dev/null; echo '✅ All services stopped'; exit 0" INT
        # Keep the script running
        while true; do
            sleep 1
        done
        ;;
    2)
        echo ""
        echo "🔧 Starting the FastAPI service..."
        cd "$SCRIPT_DIR/src"
        # Check that the source file exists
        if [ ! -f "main.py" ]; then
            echo "❌ Error: main.py not found"
            echo "Make sure you are in the right directory"
            exit 1
        fi
        echo "✅ FastAPI service starting..."
        echo "   API address: http://localhost:8001"
        echo "   API docs: http://localhost:8001/docs"
        echo ""
        echo "⚠️  Press Ctrl+C to stop the service"
        echo ""
        python3 -m uvicorn main:app --host 0.0.0.0 --port 8001 --reload
        ;;
    3)
        echo ""
        echo "🌐 Starting the web front-end service..."
        cd "$SCRIPT_DIR/web"
        # Check that the page file exists
        if [ ! -f "pages/main.html" ]; then
            echo "❌ Error: pages/main.html not found"
            echo "Make sure you are in the right directory"
            exit 1
        fi
        echo "✅ Web front-end service starting..."
        echo "   Front-end address: http://localhost:8000"
        echo ""
        echo "⚠️  Press Ctrl+C to stop the service"
        echo ""
        python3 -m http.server 8000
        ;;
    4)
        echo ""
        echo "🔧 Checking service status..."
        echo ""
        # Check FastAPI
        if curl -s http://localhost:8001 > /dev/null 2>&1; then
            echo "✅ FastAPI service is running (port 8001)"
        else
            echo "❌ FastAPI service is not running (port 8001)"
        fi
        # Check the web service
        if curl -s http://localhost:8000 > /dev/null 2>&1; then
            echo "✅ Web front-end service is running (port 8000)"
        else
            echo "❌ Web front-end service is not running (port 8000)"
        fi
        echo ""
        read -p "Start the FastAPI service? (y/n): " start_fastapi
        if [[ $start_fastapi == "y" || $start_fastapi == "Y" ]]; then
            cd "$SCRIPT_DIR/src"
            python3 -m uvicorn main:app --host 0.0.0.0 --port 8001 &
            FASTAPI_PID=$!
            echo "✅ FastAPI service started (PID: $FASTAPI_PID)"
        fi
        echo ""
        read -p "Start the web front-end service? (y/n): " start_web
        if [[ $start_web == "y" || $start_web == "Y" ]]; then
            cd "$SCRIPT_DIR/web"
            python3 -m http.server 8000 &
            WEB_PID=$!
            echo "✅ Web front-end service started (PID: $WEB_PID)"
        fi
        if [[ $start_fastapi == "y" || $start_fastapi == "Y" || $start_web == "y" || $start_web == "Y" ]]; then
            echo ""
            echo "=================================="
            echo "🎉 Services started!"
            echo ""
            echo "📱 Addresses:"
            echo "   - Front-end page: http://$SERVER_IP:8000/pages/main.html"
            echo "   - Login page: http://$SERVER_IP:8000/pages/login.html"
            echo "   - API service: http://$SERVER_IP:8001"
            echo "   - API docs: http://$SERVER_IP:8001/docs"
            echo ""
            echo "⚠️  Press Ctrl+C to stop the services"
            echo ""
            # Stop the started services when the user interrupts
            trap "echo ''; echo '🛑 Stopping services...'; kill $FASTAPI_PID $WEB_PID 2>/dev/null; echo '✅ All services stopped'; exit 0" INT
            # Keep the script running
            while true; do
                sleep 1
            done
        else
            echo "No services were started"
        fi
        ;;
    *)
        echo "❌ Invalid choice, run the script again to choose"
        exit 1
        ;;
 esac
--- a/web/pages/main.html
+++ b/web/pages/main.html
@@ -367,64 +367,15 @@
                <div class="bg-white rounded-xl p-6 card-shadow">
                    <div class="flex justify-between items-center mb-6">
                        <p class="text-lg font-semibold text-dashboard-text">Dataset Management</p>
-                        <button class="px-4 py-2 gradient-blue text-white rounded-lg hover:opacity-90 transition-opacity">
+                        <button onclick="document.getElementById('fileInput').click()" class="px-4 py-2 gradient-blue text-white rounded-lg hover:opacity-90 transition-opacity">
                            Upload dataset
                        </button>
                        <input type="file" id="fileInput" style="display: none" accept=".json,.jsonl" onchange="handleFileUpload(this)">
                    </div>
                    <!-- Dataset list -->
-                    <div class="space-y-4">
+                    <div id="datasetList" class="space-y-4">
-                        <div class="border border-dashboard-bg rounded-lg p-4 hover:shadow-md transition-shadow">
+                        <!-- The dataset list is loaded dynamically by JavaScript -->
                            <div class="flex justify-between items-center">
                                <div>
                                    <h3 class="font-medium text-dashboard-text">中文对话数据集</h3>
                                    <p class="text-sm text-dashboard-textLight mt-1">1.2GB • 125,000条对话</p>
                                </div>
                                <div class="flex items-center space-x-2">
                                    <span class="px-2 py-1 bg-green-100 text-green-600 text-xs rounded">已处理</span>
                                    <button class="text-dashboard-primary hover:underline text-sm">查看详情</button>
                                </div>
                            </div>
                        </div>
                        <div class="border border-dashboard-bg rounded-lg p-4 hover:shadow-md transition-shadow">
                            <div class="flex justify-between items-center">
                                <div>
                                    <h3 class="font-medium text-dashboard-text">英文文本分类数据集</h3>
                                    <p class="text-sm text-dashboard-textLight mt-1">856MB • 89,000条文本</p>
                                </div>
                                <div class="flex items-center space-x-2">
                                    <span class="px-2 py-1 bg-blue-100 text-blue-600 text-xs rounded">处理中</span>
                                    <button class="text-dashboard-primary hover:underline text-sm">查看详情</button>
                                </div>
                            </div>
                        </div>
                        <div class="border border-dashboard-bg rounded-lg p-4 hover:shadow-md transition-shadow">
                            <div class="flex justify-between items-center">
                                <div>
                                    <h3 class="font-medium text-dashboard-text">图像识别数据集</h3>
                                    <p class="text-sm text-dashboard-textLight mt-1">2.5GB • 45,000张图片</p>
                                </div>
                                <div class="flex items-center space-x-2">
                                    <span class="px-2 py-1 bg-gray-100 text-gray-600 text-xs rounded">待处理</span>
                                    <button class="text-dashboard-primary hover:underline text-sm">查看详情</button>
                                </div>
                            </div>
                        </div>
                        <div class="border border-dashboard-bg rounded-lg p-4 hover:shadow-md transition-shadow">
                            <div class="flex justify-between items-center">
                                <div>
                                    <h3 class="font-medium text-dashboard-text">代码生成数据集</h3>
                                    <p class="text-sm text-dashboard-textLight mt-1">3.1GB • 234,000个代码片段</p>
                                </div>
                                <div class="flex items-center space-x-2">
                                    <span class="px-2 py-1 bg-green-100 text-green-600 text-xs rounded">已处理</span>
                                    <button class="text-dashboard-primary hover:underline text-sm">查看详情</button>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
@@ -793,6 +744,127 @@
            });
        });
        // 修改 switchPage 函数以支持数据集页面加载
        const originalSwitchPage = switchPage;
        switchPage = function(pageId) {
            originalSwitchPage(pageId);
            // When switching to the dataset page, load the dataset list
            if (pageId === 'dataset') {
                loadDatasets();
            }
        };
        // Dataset management
        const API_BASE = 'http://10.10.10.77:8001/api';
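        // Load the dataset list from the backend
        async function loadDatasets() {
            try {
                const response = await fetch(`${API_BASE}/datasets`);
                // fetch only rejects on network failure, so check the HTTP status explicitly
                if (!response.ok) {
                    throw new Error(`HTTP ${response.status}`);
                }
                const result = await response.json();
                if (result.code === 200) {
                    // Fall back to an empty list if the payload has no datasets field
                    renderDatasetList(result.data?.datasets ?? []);
                } else {
                    console.error('加载数据集列表失败:', result.message || result.code);
                }
            } catch (error) {
                console.error('加载数据集列表失败:', error);
            }
        }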
        // Escape text before interpolating it into innerHTML; dataset fields come from user uploads
        function escapeHtml(value) {
            const div = document.createElement('div');
            div.textContent = String(value);
            return div.innerHTML;
        }
        // Render the dataset list
        function renderDatasetList(datasets) {
            const container = document.getElementById('datasetList');
            if (!container) return;
            container.innerHTML = '';
            datasets.forEach(dataset => {
                const statusClass = getStatusClass(dataset.status);
                const statusText = escapeHtml(dataset.status ?? '');
                const name = escapeHtml(dataset.name ?? '');
                const size = escapeHtml(dataset.size || '未知大小');
                const summary = escapeHtml(dataset.records_count || dataset.description || '未知');
                const datasetDiv = document.createElement('div');
                datasetDiv.className = 'border border-dashboard-bg rounded-lg p-4 hover:shadow-md transition-shadow';
                datasetDiv.innerHTML = `
                    <div class="flex justify-between items-center">
                        <div>
                            <h3 class="font-medium text-dashboard-text">${name}</h3>
                            <p class="text-sm text-dashboard-textLight mt-1">${size} • ${summary}</p>
                        </div>
                        <div class="flex items-center space-x-2">
                            <span class="px-2 py-1 ${statusClass} text-xs rounded">${statusText}</span>
                            <button class="text-dashboard-primary hover:underline text-sm">查看详情</button>
                        </div>
                    </div>
                `;
                container.appendChild(datasetDiv);
            });
        }
        // Map a dataset status to its badge classes
        function getStatusClass(status) {
            switch(status) {
                case '已处理': // processed
                    return 'bg-green-100 text-green-600';
                case '处理中': // processing
                    return 'bg-blue-100 text-blue-600';
                case '待处理': // pending; shares the gray badge with unknown statuses
                default:
                    return 'bg-gray-100 text-gray-600';
            }
        }
        // Handle a dataset file selected for upload
        async function handleFileUpload(input) {
            const file = input.files[0];
            if (!file) return;
            // Check the file extension
            const allowedTypes = ['.json', '.jsonl'];
            const fileExtension = '.' + file.name.split('.').pop().toLowerCase();
            if (!allowedTypes.includes(fileExtension)) {
                alert('只支持 JSON 和 JSONL 格式的文件！');
                input.value = '';
                return;
            }
            // Check the file size (100 MB limit)
            const maxSize = 100 * 1024 * 1024;
            if (file.size > maxSize) {
                alert('文件大小不能超过 100MB！');
                input.value = '';
                return;
            }
            try {
                const formData = new FormData();
                formData.append('file', file);
                formData.append('description', `用户上传的数据集文件: ${file.name}`);
                const response = await fetch(`${API_BASE}/datasets/upload`, {
                    method: 'POST',
                    body: formData
                });
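                // Tolerate a non-JSON error body (e.g. from a proxy) rather than reporting it as a network error
                const result = await response.json().catch(() => ({}));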
                if (result.code === 200) {
                    alert('上传成功！');
                    // Reload the dataset list
                    loadDatasets();
                } else {
                    alert('上传失败: ' + (result.detail || result.message || '未知错误'));
                }
            } catch (error) {
                console.error('上传失败:', error);
                alert('上传失败: 网络错误');
            }
            // Reset the file input so the same file can be selected again
            input.value = '';
        }
        // Log out
        function logout() {
            // Clear the login state