---
name: yg-rules-pipeline
description: Run or maintain an end-to-end YG-Rules local pipeline from a fixed input folder to generated Excel and Markdown outputs. Use when files are placed in a directory and Codex should collect domains, schema, and guidance files, run the existing parser/storage/analysis/rule-generation code directly without Flask, and produce output/rules-{task_id}/ artifacts.
---

# YG-Rules Pipeline

## Overview

Use this skill when the user wants a folder-driven local run: collect files from an input directory, populate the project intermediate state, analyze guidance files, and generate Excel plus Markdown outputs.

Do not call Flask routes for this workflow. Run the bundled script, which imports the project code directly.

## Input Folder Contract

Default layout:

```text
input/
  domains.xlsx          # or domains.csv / domains.json
  schema.xlsx           # or schema.xls
  guidance/
    过度负债/
      policy.md
      policy.docx
    无关多元/
      policy.pdf
    _all/
      common-policy.md
```

Rules:

- `domains.*` is required and must parse through `app.utils.parser.parse_upload_file`.
- `schema.*` is recommended and must be `.xlsx` or `.xls`.
- `guidance/<domain>/` files attach only to the matching domain.
- `guidance/_all/` files attach to every domain.
- Supported guidance extensions follow the app: `.txt`, `.pdf`, `.doc`, `.docx`, `.md`.

## Run

From the repo root:

```powershell
python skill\yg-rules-pipeline\scripts\run_pipeline.py --input input --limit 2 --create-sql
```

Useful options:

```powershell
python skill\yg-rules-pipeline\scripts\run_pipeline.py --input E:\path\to\folder --granularity high --limit 5 --timeout 900
python skill\yg-rules-pipeline\scripts\run_pipeline.py --input input --skip-schema --limit 1
```

The script prints the task id, output directory, Excel path, Markdown path, skipped domains, skipped rules, and any Markdown error.

## Implementation Flow

The script performs these steps directly:

1. Parse `domains.*` with `parse_upload_file`.
2. Save domains with `DomainStorage.save_domains`, replacing prior domain state.
3. Save `schema.*` with `SchemaStorage.save` unless `--skip-schema` is set.
4. Upload guidance files with `DomainStorage.save_guidance_file`.
5. Analyze guidance with `DomainStorage.analyze_guidance`.
6. Start `RuleGenerationService(create_sql=...)`.
7. Poll `RuleGenerationService.get_status` until `done` or `failed`.
8. Validate sibling `.xlsx` and `.md` outputs when possible.

## Maintenance Rules

- Keep this pipeline script direct-code based; do not rewrite it to use HTTP unless explicitly requested.
- Keep output artifact rules aligned with `skill/yg-rules-output`.
- If app APIs or storage methods change, update `references/local-pipeline.md` and `scripts/run_pipeline.py` together.
- Add script tests or smoke checks when changing input discovery or status polling.

## References

- Read `references/local-pipeline.md` before changing script behavior or input layout.
- Use `skill/yg-rules-output` for details about final task output directory and Markdown/Excel contracts.
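
To illustrate the Input Folder Contract, here is a minimal sketch of input discovery. It is not the logic in `scripts/run_pipeline.py`: the helper name `discover_inputs`, the returned dictionary shape, and the assumption that `domains.*` is limited to `.xlsx`, `.csv`, and `.json` are all hypothetical. It could serve as a starting point for a smoke check on input discovery.

```python
from pathlib import Path

# Extensions listed in the Input Folder Contract. The domains set is an
# assumption: the real limit is whatever parse_upload_file accepts.
DOMAIN_EXTS = {".xlsx", ".csv", ".json"}
SCHEMA_EXTS = {".xlsx", ".xls"}
GUIDANCE_EXTS = {".txt", ".pdf", ".doc", ".docx", ".md"}


def discover_inputs(root):
    """Hypothetical helper: map the input folder layout to plain Python data."""
    root = Path(root)
    files = sorted(p for p in root.iterdir() if p.is_file())

    # domains.* is required, so fail early when it is missing.
    domains = [p for p in files
               if p.stem == "domains" and p.suffix.lower() in DOMAIN_EXTS]
    if not domains:
        raise FileNotFoundError(f"no domains.* file found in {root}")

    # schema.* is only recommended, so it may be absent.
    schemas = [p for p in files
               if p.stem == "schema" and p.suffix.lower() in SCHEMA_EXTS]

    # Folder name -> guidance files. The "_all" folder applies to every domain.
    guidance = {}
    guidance_dir = root / "guidance"
    if guidance_dir.is_dir():
        for sub in sorted(d for d in guidance_dir.iterdir() if d.is_dir()):
            found = sorted(p for p in sub.iterdir()
                           if p.is_file() and p.suffix.lower() in GUIDANCE_EXTS)
            if found:
                guidance[sub.name] = found

    return {
        "domains": domains[0],
        "schema": schemas[0] if schemas else None,
        "guidance": guidance,
    }
```

Calling `discover_inputs("input")` on the default layout would return the `domains.*` path, the `schema.*` path or `None`, and one entry per non-empty guidance folder, with unsupported extensions silently ignored.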
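
Step 7 of the Implementation Flow (poll until `done` or `failed`) can be sketched as a small loop with a deadline. This is an illustration under assumptions, not the script's actual code: `wait_for_task` is a hypothetical name, the status is assumed to be a dict with a `"status"` key, and the 900 default mirrors the `--timeout 900` example on the assumption that the value is in seconds. The real `RuleGenerationService.get_status` signature and return shape may differ.

```python
import time


def wait_for_task(get_status, task_id, timeout=900.0, interval=2.0):
    """Poll get_status(task_id) until it reports "done" or "failed".

    Assumption: get_status returns a dict with a "status" key.
    Raises TimeoutError if no terminal status arrives before the deadline.
    """
    # monotonic() is unaffected by system clock adjustments.
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(task_id)
        if status.get("status") in ("done", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task {task_id} still running after {timeout} seconds"
            )
        time.sleep(interval)
```

Passing the status function in as a parameter keeps the loop testable with a stub, which fits the maintenance rule about adding smoke checks when status polling changes.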