---
name: yg-rules-pipeline
description: Run or maintain an end-to-end YG-Rules local pipeline from a fixed input folder to generated Excel and Markdown outputs. Use when files are placed in a directory and Codex should collect domains, schema, and guidance files, run the existing parser/storage/analysis/rule-generation code directly without Flask, and produce output/rules-{task_id}/ artifacts.
---
# YG-Rules Pipeline
## Overview
Use this skill when the user wants a folder-driven local run: collect files from an input directory, populate the project's intermediate state (domains, schema, and guidance), analyze the guidance files, and generate Excel and Markdown outputs.
Do not call Flask routes for this workflow. Instead, run the bundled script `scripts/run_pipeline.py`, which imports the project code directly.
## Input Folder Contract
Default layout:
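```text
input/
  domains.xlsx          # or domains.csv / domains.json
  schema.xlsx           # or schema.xls
  guidance/
    过度负债/            # domain name ("excessive indebtedness")
      policy.md
      policy.docx
    无关多元/            # domain name ("unrelated diversification")
      policy.pdf
    _all/               # shared by every domain
      common-policy.md
```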
Rules:
- `domains.*` is required and must be parseable by `app.utils.parser.parse_upload_file`.
- `schema.*` is optional but recommended. If present, it must be `.xlsx` or `.xls`.
- Files in `guidance/<domain name>/` attach only to the domain whose name matches the folder name.
- Files in `guidance/_all/` attach to every domain.
- Supported guidance extensions match what the app accepts: `.txt`, `.pdf`, `.doc`, `.docx`, `.md`.
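The discovery rules above can be sketched in Python as follows. This is a minimal illustration, not the logic in `scripts/run_pipeline.py`. `discover_inputs` and `DiscoveredInputs` are hypothetical names, and the extension priority order (`.xlsx` before `.csv` before `.json`) is an assumption. The sketch also assumes a guidance folder matches a domain by exact folder name. Folders with no matching domain are left for the caller to report.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical sketch of the input-folder contract; not the real script's code.
# Extension lists come from the contract; their priority order is an assumption.
DOMAIN_EXTS = (".xlsx", ".csv", ".json")
SCHEMA_EXTS = (".xlsx", ".xls")
GUIDANCE_EXTS = {".txt", ".pdf", ".doc", ".docx", ".md"}
SHARED_DIR = "_all"  # files here attach to every domain


@dataclass
class DiscoveredInputs:
    domains_file: Path
    schema_file: Path | None
    per_domain: dict[str, list[Path]] = field(default_factory=dict)  # folder name -> files
    shared: list[Path] = field(default_factory=list)  # files from guidance/_all/

    def guidance_for(self, domain: str) -> list[Path]:
        """A domain's own guidance files followed by the shared _all/ files."""
        return self.per_domain.get(domain, []) + self.shared


def _first_existing(root: Path, stem: str, exts: tuple[str, ...]) -> Path | None:
    """Return root/<stem><ext> for the first extension that exists as a file."""
    for ext in exts:
        candidate = root / f"{stem}{ext}"
        if candidate.is_file():
            return candidate
    return None


def _guidance_files(folder: Path) -> list[Path]:
    """Files directly inside folder with a supported extension, sorted by name."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in GUIDANCE_EXTS
    )


def discover_inputs(root: Path, skip_schema: bool = False) -> DiscoveredInputs:
    """Locate domains.*, schema.* and guidance files under an input folder."""
    domains_file = _first_existing(root, "domains", DOMAIN_EXTS)
    if domains_file is None:
        raise FileNotFoundError(f"no domains.xlsx/.csv/.json in {root}")
    schema_file = None if skip_schema else _first_existing(root, "schema", SCHEMA_EXTS)

    found = DiscoveredInputs(domains_file=domains_file, schema_file=schema_file)
    guidance_root = root / "guidance"
    if guidance_root.is_dir():
        for folder in sorted(p for p in guidance_root.iterdir() if p.is_dir()):
            files = _guidance_files(folder)
            if folder.name == SHARED_DIR:
                found.shared.extend(files)
            else:
                found.per_domain[folder.name] = files
    return found
```

Take the default layout as an example. Here `discover_inputs(Path("input")).guidance_for("过度负债")` would return that domain's two `policy.*` files followed by `common-policy.md`. Unsupported files, such as images, are silently ignored in this sketch.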
## Run
From the repository root, in PowerShell:
```powershell
python skill\yg-rules-pipeline\scripts\run_pipeline.py --input input --limit 2 --create-sql
```
Useful options:
```powershell
python skill\yg-rules-pipeline\scripts\run_pipeline.py --input E:\path\to\folder --granularity high --limit 5 --timeout 900
python skill\yg-rules-pipeline\scripts\run_pipeline.py --input input --skip-schema --limit 1
```
The script prints the task ID, output directory, Excel path, Markdown path, skipped domains, skipped rules, and any Markdown error.
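Some callers do not run PowerShell, for example CI jobs or macOS/Linux shells. For them, a minimal Python sketch can assemble the same command using only the flags shown above. `build_pipeline_command` and `run_pipeline` are hypothetical helpers, not part of the repository. Flag semantics are not specified here (for example, whether `--timeout` is in seconds), so values are passed through unchanged.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Script path relative to the repository root, as in the PowerShell examples.
SCRIPT = Path("skill") / "yg-rules-pipeline" / "scripts" / "run_pipeline.py"


def build_pipeline_command(
    input_dir: str,
    *,
    limit: int | None = None,
    granularity: str | None = None,
    timeout: int | None = None,
    create_sql: bool = False,
    skip_schema: bool = False,
) -> list[str]:
    """Hypothetical helper: build argv using only the flags shown above."""
    cmd = [sys.executable, str(SCRIPT), "--input", input_dir]
    if granularity is not None:
        cmd += ["--granularity", granularity]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if create_sql:
        cmd.append("--create-sql")
    if skip_schema:
        cmd.append("--skip-schema")
    return cmd


def run_pipeline(repo_root: Path, input_dir: str, **options) -> str:
    """Run the script from repo_root and return its printed summary (stdout)."""
    result = subprocess.run(
        build_pipeline_command(input_dir, **options),
        cwd=repo_root,  # the script path above is relative to the repo root
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Using `sys.executable` runs the script with the same interpreter (and virtual environment) as the caller. This matters because the script imports the project code directly.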
## Implementation Flow
The script performs these steps directly:
1. Parse `domains.*` with `parse_upload_file`.
2. Save domains with `DomainStorage.save_domains`, replacing prior domain state.
3. Save `schema.*` with `SchemaStorage.save` unless `--skip-schema` is set.
4. Upload guidance files with `DomainStorage.save_guidance_file`.
5. Analyze guidance with `DomainStorage.analyze_guidance`.
6. Start rule generation with `RuleGenerationService(create_sql=...)`.
7. Poll `RuleGenerationService.get_status` until the status is `done` or `failed`.
8. Validate the sibling `.xlsx` and `.md` outputs when possible.
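Steps 7 and 8 can be sketched as below. `RuleGenerationService.get_status` is the project's real method, but its arguments and return shape are not documented here. The sketch therefore accepts any zero-argument callable that returns a mapping with a `"status"` key, which is an assumption. `poll_until_terminal` and `validate_siblings` are hypothetical helpers. "Sibling" is taken to mean an `.md` file with the same stem as the `.xlsx`, in the same directory.

```python
from __future__ import annotations

import time
from pathlib import Path
from typing import Any, Callable, Mapping

TERMINAL_STATES = {"done", "failed"}


def poll_until_terminal(
    get_status: Callable[[], Mapping[str, Any]],
    timeout: float,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Mapping[str, Any]:
    """Call get_status() until its "status" is terminal, else raise after timeout."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        # Check the status before the deadline so a just-finished task is reported.
        if status.get("status") in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status.get('status')!r} after {timeout}s")
        sleep(interval)


def validate_siblings(excel_path: Path) -> list[str]:
    """Return problems with the .xlsx and its same-stem .md sibling (empty if OK)."""
    problems = []
    for path in (excel_path, excel_path.with_suffix(".md")):
        if not path.is_file():
            problems.append(f"missing {path.name}")
        elif path.stat().st_size == 0:
            problems.append(f"empty {path.name}")
    return problems
```

In the real script, the callable would wrap the service call with whatever task identifier it expects. The `sleep` and `clock` parameters are injected so the loop can be smoke-tested without real waiting. This fits the maintenance rule about adding checks when status polling changes.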
## Maintenance Rules
- Keep this pipeline script direct-code based; do not rewrite it to use HTTP unless explicitly requested.
- Keep output artifact rules aligned with `skill/yg-rules-output`.
- If app APIs or storage methods change, update `references/local-pipeline.md` and `scripts/run_pipeline.py` together.
- Add script tests or smoke checks when changing input discovery or status polling.
## References
- Read `references/local-pipeline.md` before changing script behavior or input layout.
- Use `skill/yg-rules-output` for details about final task output directory and Markdown/Excel contracts.