# LLM Model Evaluation System
[![Python Version](https://img.shields.io/badge/python-3.7%2B-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

A comprehensive evaluation system for Large Language Models (LLMs) with concurrent processing, batch progress tracking, and automatic retry mechanisms.
## ✨ Features
### 🚀 High-Performance Concurrent Processing
- **True Concurrency**: Utilizes ThreadPoolExecutor for real concurrent execution
- **Configurable Workers**: Set concurrent thread count via configuration
- **Auto CPU Detection**: Automatically uses all CPU cores by default
- **Batch Processing**: Processes data in batches for efficient resource utilization
### 📊 Intelligent Batch Progress Tracking
- **Dynamic Progress Bars**: Creates progress bars only for current batch
- **Memory Efficient**: Constant memory usage regardless of batch count
- **Scalable**: Supports 100K+ batches without performance degradation
- **Auto Cleanup**: Automatically closes progress bars after batch completion
### 🔄 Robust API Retry Mechanism
- **Automatic Retry**: Automatically retries failed API calls
- **Exponential Backoff**: Uses a 2^n delay strategy to avoid overloading the API
- **Configurable**: Set retry count and delay via configuration file
- **Smart Error Handling**: Distinguishes retryable vs non-retryable errors
### 🌐 Flexible API Support
- **HTTP-Based**: Uses standard HTTP requests instead of vendor-specific SDKs
- **Multi-API Compatible**: Works with any OpenAI-compatible API endpoint
- **No Vendor Lock-in**: Supports custom, proxy, and self-hosted APIs
### 📈 Comprehensive Evaluation Metrics
- **Traditional Metrics**: BLEU, ROUGE-L, Exact Match, Keyword Overlap
- **LLM-Based Evaluation**: Semantic understanding via LLM scoring
- **Combined Scoring**: Weighted combination of multiple metrics
- **Detailed Reports**: Comprehensive evaluation reports with visualizations
## 📦 Installation
### Prerequisites
- Python 3.7 or higher
- pip (Python package manager)
### Install Dependencies
```bash
# Clone or download the repository
cd YG_LLM_Tester

# Install required packages
pip install -r requirements.txt
```
### Manual Installation

If you prefer to install packages individually:
```bash
pip install numpy nltk jieba pandas tqdm requests
```
**Note**: Some NLTK data will be downloaded automatically on first use.
## ⚙️ Configuration
### Basic Configuration (llm_config.py)
```python
# Concurrent Processing
MAX_CONCURRENT_WORKERS = 4     # Number of concurrent threads
SHOW_DETAILED_PROGRESS = True  # Show detailed progress bars

# API Retry Settings
MAX_API_RETRIES = 3   # Maximum retry attempts
RETRY_DELAY = 1.0     # Initial retry delay in seconds

# API Configuration
USE_REAL_LLM = False  # True for real LLM API, False for simulation
OPENAI_CONFIG = {
    "api_key": "your-api-key",
    "api_base": "https://api.openai.com/v1",
    "model": "gpt-3.5-turbo",
    "temperature": 0,
    "max_tokens": 500,
    "timeout": 60
}
```
### Environment Variables
You can also configure via environment variables:
```bash
export OPENAI_API_KEY="your-api-key"
export API_BASE_URL="https://your-api-endpoint/v1"
export USE_REAL_LLM="true"
```
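How these variables map onto the settings above lives in `llm_config.py`; the snippet below is only a sketch of one way to read them (the string-to-boolean handling of `USE_REAL_LLM` is an assumption, not necessarily the project's exact code):

```python
import os

# Sketch: pick up the environment variables shown above, with fallbacks.
USE_REAL_LLM = os.environ.get("USE_REAL_LLM", "false").lower() == "true"

OPENAI_CONFIG = {
    "api_key": os.environ.get("OPENAI_API_KEY", "your-api-key"),
    "api_base": os.environ.get("API_BASE_URL", "https://api.openai.com/v1"),
    "model": "gpt-3.5-turbo",
}
```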
## 🎯 Quick Start
### Basic Usage
```python
from model_evaluation import evaluate_dataset_parallel, ModelEvaluator

# Create evaluator
evaluator = ModelEvaluator()

# Prepare your data
data = [
    {
        'question': 'What is machine learning?',
        'output': 'Machine learning is a technology that enables computers to learn from data',
        'answer': 'Machine learning is a branch of artificial intelligence that allows computers to learn patterns from data'
    },
    # Add more data...
]

# Run evaluation (simulation mode)
results, metrics = evaluate_dataset_parallel(
    data=data,
    evaluator=evaluator,
    use_real_llm=False,  # Use simulation
    max_workers=2        # Optional: override default workers
)

# Print results
print(f"Evaluation Results: {results}")
print(f"Overall Metrics: {metrics}")
```
### Real LLM API Usage
```python
# Enable real LLM API (requires API key configuration)
results, metrics = evaluate_dataset_parallel(
    data=data,
    evaluator=evaluator,
    use_real_llm=True,  # Use real LLM API
    max_workers=4       # Recommended: 4-8 for real APIs
)

# API calls will automatically retry on failure
# using settings from llm_config.py
```
### Custom Retry Configuration
```python
# Get evaluation prompt
prompt = evaluator.get_llm_evaluation_prompt(
    reference="Reference answer",
    candidate="Model output",
    question="Question"
)

# Use custom retry settings
score, reason = evaluator.call_llm_for_evaluation(
    prompt,
    max_retries=5,    # Custom retry count
    retry_delay=2.0   # Custom retry delay
)
```
## 📊 Understanding the Output
### Progress Display
When running evaluations, you'll see progress bars (the labels are printed in Chinese: 总进度 = overall progress, 批次 = batch, 并发 = concurrent worker, 任务 = task):
```
总进度: 50%|█████ | 3/6 [00:00<00:00, 26.25it/s]
批次2-并发1: 任务3: 0%| | 0/1 [00:00<?, ?it/s]
批次2-并发2: 任务4: 0%| | 0/1 [00:00<?, ?it/s]
```
### Evaluation Results
```python
results = [
    {
        'index': 1,
        'Input': 'What is AI?',
        'Output': 'AI is artificial intelligence...',
        'Answer': 'Artificial intelligence is...',
        'bleu_score': 0.85,
        'rouge_l_score': 0.90,
        'exact_match_rate': 0.75,
        'keyword_overlap_rate': 0.80,
        'llm_score': 8,
        'llm_reason': 'The answer is accurate and well-structured...'
    }
]

metrics = {
    'bleu_score': 0.85,
    'rouge_l_score': 0.90,
    'character_overlap_rate': 0.75,
    'length_similarity': 0.80,
    'exact_match_rate': 0.75,
    'keyword_overlap_rate': 0.80,
    'llm_score': 8.0
}
```
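The per-metric values can also be folded into a single combined number. The weighting the system actually uses is defined in its code and configuration; the helper below is only an illustration of the idea, with made-up weights (an even split between the averaged traditional metrics and the normalized 0-10 LLM score):

```python
def combined_score(metrics, llm_weight=0.5):
    """Illustrative only: blend traditional metrics with the LLM score."""
    traditional_keys = [
        'bleu_score', 'rouge_l_score', 'exact_match_rate', 'keyword_overlap_rate'
    ]
    traditional = sum(metrics[k] for k in traditional_keys) / len(traditional_keys)
    llm = metrics['llm_score'] / 10.0  # normalize the 0-10 LLM score to 0-1
    return (1 - llm_weight) * traditional + llm_weight * llm

print(combined_score(metrics))  # ~0.81 for the sample values above
```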
## 🧪 Testing

Run the included test scripts:
```bash
# Test batch progress bars
python quick_batch_test.py

# Test HTTP API functionality
python test_http_api.py

# Test retry mechanism
python test_retry_simple.py

# Test retry configuration
python test_retry_config.py

# Run comprehensive tests
python final_test.py
```
## 📖 Documentation
### Core Components
- **[ModelEvaluator](model_evaluation.py)**: Main evaluation class
- **[Configuration](llm_config.py)**: All configuration parameters
- **[Batch Processing Guide](BATCH_PROGRESS_GUIDE.md)**: Detailed batch progress bar documentation
- **[Retry Mechanism Guide](RETRY_MECHANISM_GUIDE.md)**: Automatic retry mechanism documentation
- **[Retry Configuration Guide](RETRY_CONFIG_README.md)**: Configuration management guide
### Key Features Documentation
- **Concurrent Processing**: [Complete Implementation Summary](COMPLETE_IMPLEMENTATION_SUMMARY.md)
- **Batch Progress Bars**: [Batch Progress Guide](BATCH_PROGRESS_GUIDE.md)
- **HTTP API Migration**: [API Migration Report](HTTP_API_MIGRATION_REPORT.md)
- **Retry Mechanism**: [Retry Mechanism Guide](RETRY_MECHANISM_GUIDE.md)
## 🎛️ Advanced Configuration
### Concurrent Processing Settings
```python
# llm_config.py
MAX_CONCURRENT_WORKERS = None # Auto-detect CPU cores
# or
MAX_CONCURRENT_WORKERS = 8 # Manual setting
```
**Recommendations**:
- Simulation mode: Use all CPU cores
- Real API mode: 4-8 workers (avoid rate limits)
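Under the hood the worker count feeds a standard `ThreadPoolExecutor`. A minimal sketch of the pattern (function names such as `run_batch` and `evaluate_one` are placeholders, not the project's actual API):

```python
import os
from concurrent.futures import ThreadPoolExecutor

from llm_config import MAX_CONCURRENT_WORKERS

def run_batch(batch, evaluate_one):
    # None falls back to the machine's CPU count (the "auto-detect" behaviour).
    workers = MAX_CONCURRENT_WORKERS or os.cpu_count()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate_one, batch))
```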
### Progress Bar Settings
```python
# llm_config.py
SHOW_DETAILED_PROGRESS = True # Show per-batch progress bars
```
**Recommendations**:
- Small datasets (< 20 items): Enable
- Large datasets (> 100 items): Disable for cleaner output
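The per-batch behaviour boils down to creating `tqdm` bars only for the batch currently in flight and closing them as soon as it finishes, which keeps memory usage flat. A rough sketch (the loop body and bar labels are illustrative):

```python
from tqdm import tqdm

def process_in_batches(data, batch_size, show_detailed=True):
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        bar = tqdm(total=len(batch), desc=f"batch {start // batch_size + 1}") if show_detailed else None
        for item in batch:
            ...  # evaluate the item here
            if bar:
                bar.update(1)
        if bar:
            bar.close()  # bars are released per batch, so memory stays constant
```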
### Retry Mechanism Settings
```python
# llm_config.py
MAX_API_RETRIES = 3 # Number of retry attempts
RETRY_DELAY = 1.0 # Initial delay in seconds
```
**Recommendations**:
- Stable network: 1-2 retries, 0.5s delay
- Standard environment: 3 retries, 1.0s delay
- Unstable network: 5 retries, 2.0s delay
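These two values drive the retry loop: attempt n waits roughly `RETRY_DELAY * 2**n` before trying again. A minimal sketch of that pattern, not the project's exact implementation:

```python
import time
import requests

def post_with_retries(url, payload, headers, max_retries=3, retry_delay=1.0, timeout=60):
    """Retry transient failures with exponential backoff (retry_delay * 2**attempt)."""
    for attempt in range(max_retries + 1):
        try:
            response = requests.post(url, json=payload, headers=headers, timeout=timeout)
            response.raise_for_status()
            return response.json()
        except (requests.ConnectionError, requests.Timeout):
            if attempt == max_retries:
                raise  # out of retries: surface the original error
            time.sleep(retry_delay * (2 ** attempt))
```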
### API Configuration
```python
# llm_config.py
import os

OPENAI_CONFIG = {
    "api_key": os.environ.get("OPENAI_API_KEY", "your-key"),
    "api_base": os.environ.get("API_BASE_URL", "https://api.openai.com/v1"),
    "model": "gpt-3.5-turbo",
    "temperature": 0,
    "max_tokens": 500,
    "timeout": 60
}
```
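Because calls go over plain HTTP, any endpoint that speaks the OpenAI `/chat/completions` format will work. A hedged sketch of what a request looks like with this configuration (this is the standard OpenAI-compatible wire format, not necessarily the project's internal helper):

```python
import requests

from llm_config import OPENAI_CONFIG

def chat_completion(prompt):
    url = f"{OPENAI_CONFIG['api_base']}/chat/completions"
    headers = {
        "Authorization": f"Bearer {OPENAI_CONFIG['api_key']}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": OPENAI_CONFIG["model"],
        "messages": [{"role": "user", "content": prompt}],
        "temperature": OPENAI_CONFIG["temperature"],
        "max_tokens": OPENAI_CONFIG["max_tokens"],
    }
    resp = requests.post(url, json=payload, headers=headers, timeout=OPENAI_CONFIG["timeout"])
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```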
## 🔧 Troubleshooting
### Common Issues
#### 1. Import Errors
```
ImportError: No module named 'jieba'
```
**Solution**: Install missing dependencies
```bash
pip install -r requirements.txt
```
#### 2. NLTK Data Issues
```
LookupError: Resource 'tokenizers/punkt' not found
```
**Solution**: NLTK will download data automatically on first use. If issues persist:
```python
import nltk
nltk.download('punkt')
```
#### 3. API Connection Errors
```
requests.exceptions.ConnectionError
```
**Solution**:
- Check API endpoint URL
- Verify API key is correct
- Check network connectivity
- Adjust retry settings for unstable networks
#### 4. Memory Issues with Large Datasets
```
OutOfMemoryError
```
**Solution**:
- Disable detailed progress bars: `SHOW_DETAILED_PROGRESS = False`
- Reduce concurrent workers: `MAX_CONCURRENT_WORKERS = 2`
- Process data in smaller chunks
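For the last point, chunking can be as simple as slicing the dataset and evaluating each slice separately (an illustrative sketch, reusing `evaluate_dataset_parallel` from the Quick Start):

```python
def chunks(data, size=500):
    """Yield successive slices so only one chunk is held in memory at a time."""
    for start in range(0, len(data), size):
        yield data[start:start + size]

for chunk in chunks(data, size=500):
    results, metrics = evaluate_dataset_parallel(data=chunk, evaluator=evaluator, use_real_llm=False)
    # ...persist or aggregate each chunk's results before moving on
```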
#### 5. Slow Performance
**Solutions**:
- Increase `MAX_CONCURRENT_WORKERS` for simulation mode
- Disable detailed progress bars for large datasets
- Use simulation mode (`use_real_llm=False`) for testing
### Performance Tuning
#### For Large Datasets (10K+ items)
```python
MAX_CONCURRENT_WORKERS = None # Use all CPU cores
SHOW_DETAILED_PROGRESS = False # Disable detailed bars
MAX_API_RETRIES = 2 # Reduce retries
```
#### For High-Throughput Testing
```python
MAX_CONCURRENT_WORKERS = 8 # More workers
USE_REAL_LLM = False # Use simulation
SHOW_DETAILED_PROGRESS = False # Disable detailed bars
```
#### For Production with Real APIs
```python
MAX_CONCURRENT_WORKERS = 4 # Balanced for API limits
USE_REAL_LLM = True # Use real API
MAX_API_RETRIES = 3 # Ensure reliability
RETRY_DELAY = 1.0 # Standard delay
```
## 🤝 Contributing
We welcome contributions! Please feel free to submit issues and enhancement requests.

### Development Setup
1. Fork the repository
2. Create a virtual environment
3. Install development dependencies
4. Make your changes
5. Run tests to ensure everything works
6. Submit a pull request
### Code Style
- Follow PEP 8 guidelines
- Add docstrings to new functions
- Include type hints where applicable
- Write tests for new features
## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments
- NLTK team for natural language processing tools
- jieba team for Chinese text segmentation
- tqdm team for progress bar functionality
- OpenAI for LLM API inspiration
## 📞 Support
For support, please:
1. Check the [Troubleshooting](#troubleshooting) section
2. Review the documentation files
3. Run the test scripts to verify your setup
4. Open an issue with detailed error information
## 🗺️ Roadmap
- [ ] Support for more evaluation metrics (BERTScore, METEOR)
- [ ] Integration with wandb for experiment tracking
- [ ] Web-based dashboard for result visualization
- [ ] Support for multilingual evaluation
- [ ] Batch evaluation with different models
- [ ] Export results to various formats (JSON, CSV, Excel)
## 📈 Version History
- **v5.2** (Current): Added configurable retry mechanism
- **v5.1**: Implemented HTTP API with retry logic
- **v5.0**: Batch progress tracking with dynamic bars
- **v4.0**: Concurrent processing implementation
- **v3.0**: LLM evaluation integration
- **v2.0**: Traditional metric evaluation
- **v1.0**: Initial release
---
**Made with ❤️ for the AI community**