---
name: yg-rules-pipeline
description: Run or maintain an end-to-end YG-Rules local pipeline from a fixed input folder to generated Excel and Markdown outputs. Use when files are placed in a directory and Codex should collect domains, schema, and guidance files, run the existing parser/storage/analysis/rule-generation code directly without Flask, and produce output/rules-{task_id}/ artifacts.
---

# YG-Rules Pipeline

## Overview

Use this skill when the user wants a folder-driven local run: collect files from an input directory, populate the project intermediate state, analyze guidance files, and generate Excel plus Markdown outputs.

Do not call Flask routes for this workflow. Run the bundled script, which imports the project code directly.

## Input Folder Contract

Default layout:

```text
input/
  domains.xlsx  # or domains.csv / domains.json
  schema.xlsx  # or schema.xls
  guidance/
    过度负债/  # example domain folder ("over-indebtedness")
      policy.md
      policy.docx
    无关多元/  # example domain folder ("unrelated diversification")
      policy.pdf
    _all/
      common-policy.md
```

Rules:

- `domains.*` is required and must parse through `app.utils.parser.parse_upload_file`.
- `schema.*` is recommended and must be `.xlsx` or `.xls`.
- `guidance/<domain name>/` files attach only to the matching domain.
- `guidance/_all/` files attach to every domain.
- Supported guidance extensions follow the app: `.txt`, `.pdf`, `.doc`, `.docx`, `.md`.
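
As a rough illustration of the layout rules above, the following sketch walks an input folder and groups guidance files by subfolder. The helper names `first_existing` and `discover_inputs` and the returned dictionary shape are hypothetical, and the `domains.*` extensions are taken from the layout comment. The actual discovery logic lives in `scripts/run_pipeline.py` and may differ.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Extensions taken from the layout and rules above.
DOMAIN_EXTS = (".xlsx", ".csv", ".json")
SCHEMA_EXTS = (".xlsx", ".xls")
GUIDANCE_EXTS = {".txt", ".pdf", ".doc", ".docx", ".md"}


def first_existing(folder: Path, stem: str, exts) -> Optional[Path]:
    """Return the first `<stem><ext>` file found in `folder`, or None."""
    for ext in exts:
        candidate = folder / f"{stem}{ext}"
        if candidate.is_file():
            return candidate
    return None


def discover_inputs(input_dir: str) -> dict:
    """Hypothetical helper: collect domains, schema, and guidance paths."""
    root = Path(input_dir)
    domains = first_existing(root, "domains", DOMAIN_EXTS)
    if domains is None:
        raise FileNotFoundError("domains.* is required in the input folder")

    guidance: Dict[str, List[Path]] = {}
    guidance_root = root / "guidance"
    if guidance_root.is_dir():
        for sub in sorted(p for p in guidance_root.iterdir() if p.is_dir()):
            # Keep only files with a supported guidance extension.
            guidance[sub.name] = sorted(
                f for f in sub.iterdir()
                if f.is_file() and f.suffix.lower() in GUIDANCE_EXTS
            )

    return {
        "domains": domains,
        "schema": first_existing(root, "schema", SCHEMA_EXTS),  # may be None
        "guidance": guidance,  # the "_all" key applies to every domain
    }
```

Folder names under `guidance/` are returned verbatim so they can be matched against domain names; fanning `_all` out to every domain is left to the caller.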

## Run

From the repo root:

```powershell
python skill\yg-rules-pipeline\scripts\run_pipeline.py --input input --limit 2 --create-sql
```

Useful options:

```powershell
python skill\yg-rules-pipeline\scripts\run_pipeline.py --input E:\path\to\folder --granularity high --limit 5 --timeout 900
python skill\yg-rules-pipeline\scripts\run_pipeline.py --input input --skip-schema --limit 1
```

The script prints the task ID, output directory, Excel path, Markdown path, skipped domains, skipped rules, and any Markdown error.
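
For orientation, the flags used in the commands above could be declared with `argparse` roughly as follows. This is only a sketch: the types, defaults, and help texts are assumptions, and `build_parser` is a hypothetical name, so check `scripts/run_pipeline.py` for the real definitions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(prog="run_pipeline.py")
    parser.add_argument("--input", default="input", help="input folder")
    # Only the value "high" appears in the examples; other values are unknown.
    parser.add_argument("--granularity", default=None)
    parser.add_argument("--limit", type=int, default=None)
    # The unit is assumed to be seconds.
    parser.add_argument("--timeout", type=int, default=None)
    parser.add_argument("--skip-schema", action="store_true")
    parser.add_argument("--create-sql", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--input", "input", "--limit", "2", "--create-sql"]
)
print(args.input, args.limit, args.create_sql, args.skip_schema)
# prints: input 2 True False
```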

## Implementation Flow

The script performs these steps directly:

1. Parse `domains.*` with `parse_upload_file`.
2. Save domains with `DomainStorage.save_domains`, replacing prior domain state.
3. Save `schema.*` with `SchemaStorage.save` unless `--skip-schema` is set.
4. Upload guidance files with `DomainStorage.save_guidance_file`.
5. Analyze guidance with `DomainStorage.analyze_guidance`.
6. Start `RuleGenerationService(create_sql=...)`.
7. Poll `RuleGenerationService.get_status` until the status is `done` or `failed`.
8. Validate sibling `.xlsx` and `.md` outputs when possible.
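
The polling in step 7 can be sketched as a loop with a deadline. The function `wait_for_completion`, the `interval` parameter, and the assumption that the status is a dict with a `"status"` key are all hypothetical; the real return shape of `RuleGenerationService.get_status` may differ.

```python
import time
from typing import Callable, Dict


def wait_for_completion(
    get_status: Callable[[], Dict],
    timeout: float = 900.0,
    interval: float = 2.0,
) -> Dict:
    """Poll `get_status` until it reports `done` or `failed`, or time runs out.

    Assumption: the status is a dict with a "status" key.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("status") in ("done", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no terminal status after {timeout} seconds")
        time.sleep(interval)


# Usage with a stand-in status source instead of the real service.
fake_states = iter([{"status": "running"}, {"status": "done"}])
result = wait_for_completion(lambda: next(fake_states), timeout=5, interval=0.01)
print(result["status"])  # prints: done
```

Passing the status getter as a callable keeps the loop testable without the service, which fits the maintenance advice to add smoke checks when status polling changes.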

## Maintenance Rules

- Keep the pipeline script based on direct code imports; do not rewrite it to use HTTP unless explicitly requested.
- Keep output artifact rules aligned with `skill/yg-rules-output`.
- If app APIs or storage methods change, update `references/local-pipeline.md` and `scripts/run_pipeline.py` together.
- Add script tests or smoke checks when changing input discovery or status polling.

## References

- Read `references/local-pipeline.md` before changing script behavior or input layout.
- Use `skill/yg-rules-output` for details about the final task output directory and the Markdown/Excel contracts.