feat: enhance agent orchestration, knowledge flow and UI refinements

2026-03-29 20:31:13 +08:00
parent d85cb9cf35
commit e0fe3ca623
301 changed files with 1197804 additions and 7863 deletions
--- a/docs/superpowers/specs/2026-03-25-schedule-planner-design.md
+++ b/docs/superpowers/specs/2026-03-25-schedule-planner-design.md
@@ -0,0 +1,561 @@
+# Schedule Planner Agent Redesign
+
+## Goal
+
+Replace the current planner role with a schedule-focused planning system that analyzes conversation history, the task board, and forum signals to produce actionable scheduling recommendations for the user.
+
+## Scope
+
+This redesign covers both the main planner role and its subagents across backend orchestration, prompts, routing, scheduled execution, todo generation, frontend presentation, and related tests.
+
+## User-Approved Direction
+
+- Replace the current path-planning semantics with schedule-planning semantics.
+- Redesign both the main planner role and its subagents.
+- Inputs for planning:
+  - conversation history
+  - task board
+  - forum information
+- Output style:
+  - conclusion first
+  - executable schedule next
+- Trigger modes:
+  - when the user explicitly asks for scheduling advice
+  - at a fixed daily time
+- Daily scheduled analysis should write actionable suggestions into todo items.
+
+## Architecture
+
+### Main Role
+
+The current `planner` role will be replaced at the system level by a new role id:
+
+- `schedule_planner`
+
+Its responsibility is no longer “find the shortest execution path for a goal.” Instead, it becomes the scheduling brain that:
+
+1. understands current commitments and pressure signals
+2. evaluates urgency, importance, dependency, and timing
+3. recommends near-term scheduling actions
+4. converts useful scheduled guidance into concrete todo items when triggered by the daily scheduler
+
+### Subagents
+
+The existing planner subagent structure will be redesigned into two schedule-specific subagents:
+
+- `schedule_analysis`
+  - analyzes conversation history, task board state, and forum signals
+  - identifies priorities, pressure points, conflicts, dependencies, risks, and things that can be delayed
+
+- `schedule_planning`
+  - converts analysis into an execution-oriented schedule recommendation
+  - outputs conclusion first, then a practical schedule proposal
+  - when running from the daily scheduled workflow, produces todo-ready action items
+
+### Trigger Paths
+
+#### Interactive Trigger
+
+When the user asks questions such as:
+
+- what should I do today
+- how should I arrange this week
+- based on my recent work, what should I focus on next
+- help me schedule upcoming work
+
+For requests like these, the master agent should route to `schedule_planner`.
+
+The expected response shape:
+
+1. current conclusion
+2. today / near-term schedule recommendation
+3. next actions
+
+#### Daily Scheduled Trigger
+
+A daily scheduled job invokes the schedule planner flow automatically.
+
+The daily run should:
+
+1. collect relevant context from conversation history, tasks, and forum data
+2. run `schedule_analysis`
+3. run `schedule_planning`
+4. convert only actionable, non-duplicate recommendations into todo items
+
+The daily run should not dump raw analysis into todos. Only concise, action-worthy, user-meaningful recommendations become todos.
+
+## Data Flow
+
+### Inputs
+
+The schedule planning system should read from three sources:
+
+1. **Conversation history**
+   - recent user intent
+   - commitments implied in prior discussion
+   - stated priorities, urgency, and unresolved threads
+
+2. **Task board**
+   - open items
+   - current statuses
+   - stalled work
+   - high-priority or overdue work
+
+3. **Forum information**
+   - new items requiring attention
+   - external pressure or discussion signals
+   - updates that may change priority
+
+### Internal Processing
+
+The main flow should be:
+
+- the master agent detects scheduling intent
+- `schedule_planner` receives context
+- `schedule_analysis` identifies priority structure
+- `schedule_planning` produces human-usable output
+- scheduled mode additionally writes selected suggestions into todos
+
+### Outputs
+
+#### Interactive Output
+
+The default answer structure should be:
+
+- conclusion first
+- suggested schedule second
+- next actions last
+
+#### Scheduled Output
+
+The scheduled run should create todo entries with:
+
+- concise action phrasing
+- enough context to be actionable
+- source attribution where useful (conversation/task/forum)
+- duplicate avoidance
+
+## Migration Strategy
+
+This redesign uses a two-phase migration to avoid breaking stored state and UI rendering.
+
+### Phase 1: Compatibility Window
+
+- accept legacy `planner` values from stored traces, mock payloads, and historical records
+- normalize legacy `planner` to `schedule_planner` at read boundaries where practical
+- accept legacy `planner_scope` and `planner_steps` as read-only legacy values and normalize them to `schedule_analysis` and `schedule_planning`
+- write only the new ids going forward:
+  - `schedule_planner`
+  - `schedule_analysis`
+  - `schedule_planning`
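+
+The read-boundary normalization described above could look roughly like the following. This is a minimal sketch: the three id pairs come from this spec, while `LEGACY_ROLE_IDS` and `normalize_role_id` are hypothetical names that do not exist in the codebase.
+
```python
# Hypothetical read-boundary helper; only the id pairs come from the spec.
LEGACY_ROLE_IDS = {
    "planner": "schedule_planner",
    "planner_scope": "schedule_analysis",
    "planner_steps": "schedule_planning",
}


def normalize_role_id(role_id: str) -> str:
    """Map a legacy id to its replacement; pass any other id through unchanged."""
    return LEGACY_ROLE_IDS.get(role_id, role_id)
```
+
+Keeping the mapping in one place would make Phase 2 removal a single deletion rather than a search across orchestration and display code.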
+
+### Phase 2: Legacy Removal
+
+After the migration is complete and all active UI payloads, mock data, and tests are updated:
+
+- remove legacy id acceptance from orchestration and frontend display logic
+- remove legacy mock fixtures
+- keep migration code out of prompts and core scheduling behavior
+
+### Migration Scope
+
+The migration must cover:
+
+- backend enums and routing
+- frontend agent ids and telemetry labels
+- stored trace rendering paths
+- mock data used by agent dashboards and chat orchestration views
+- tests that still refer to `planner`, `planner_scope`, or `planner_steps`
+
+## Input Contracts
+
+The schedule planning system reads from three sources with explicit limits.
+
+### Conversation History Contract
+
+- use recent conversation history from the current user context
+- default retrieval window: last 7 days of relevant conversation turns, capped at the latest 50 turns
+- prefer turns that include commitments, priorities, deadlines, blockers, or future-oriented intent
+- if conversation history is unavailable, continue with degraded confidence
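+
+A possible sketch of the default retrieval window, assuming each turn is a `(timestamp, text)` tuple. The function name and the tuple shape are illustrative assumptions, and relevance filtering (commitments, deadlines, blockers) is left out.
+
```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

Turn = Tuple[datetime, str]  # hypothetical shape: (timestamp, text)


def select_recent_turns(
    turns: List[Turn],
    now: datetime,
    window_days: int = 7,
    max_turns: int = 50,
) -> List[Turn]:
    """Keep turns from the last `window_days`, capped at the latest `max_turns`."""
    cutoff = now - timedelta(days=window_days)
    # Sort chronologically so the slice below keeps the most recent turns.
    recent = sorted((t for t in turns if t[0] >= cutoff), key=lambda t: t[0])
    return recent[-max_turns:]
```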
+
+### Task Board Contract
+
+- include open, in-progress, blocked, overdue, and high-priority tasks
+- exclude completed and archived items by default
+- include enough task metadata to reason about urgency and dependency:
+  - title
+  - status
+  - priority
+  - due date if present
+  - last updated time if present
+- if task data is unavailable, continue with degraded confidence
+
+### Forum Information Contract
+
+- include recent forum items that may affect user priorities
+- default retrieval window: last 7 days of relevant forum signals
+- forum signals may include:
+  - new posts requiring attention
+  - replies or escalations
+  - updates that change urgency or expected follow-up
+- if forum data is unavailable, continue with degraded confidence
+
+## Output Contracts
+
+### `schedule_analysis` Output Schema
+
+The analysis stage should produce a structured summary with these fields:
+
+- `top_priorities`: list of current highest-priority focus areas
+- `risks`: list of risk or pressure signals
+- `conflicts`: list of timing or dependency conflicts
+- `deferrable_items`: list of lower-priority items that can be delayed
+- `evidence`: source references grouped by `conversation`, `task_board`, or `forum`
+- `confidence`: one of `high`, `medium`, `low`
+
+### `schedule_planning` Output Schema
+
+The planning stage should produce a structured recommendation with these fields:
+
+- `conclusion`: short decision-oriented summary
+- `today_plan`: list of suggested actions for the current day or immediate next window
+- `near_term_plan`: list of actions for the next few days or current week
+- `next_actions`: short ordered action list
+- `todo_candidates`: only present in scheduled mode; candidate todo items derived from the recommendation
+- `confidence`: one of `high`, `medium`, `low`
+
+### `todo_candidates` Schema
+
+Each `todo_candidate` must use this structure:
+
+- `title`: required short action text
+- `description`: required short rationale grounded in source context
+- `sources`: required list of provenance objects
+- `priority`: optional normalized priority such as `high`, `medium`, `low`
+- `target_window`: optional string such as `today` or `this_week`
+
+Each provenance object in `sources` must contain:
+
+- `type`: one of `conversation`, `task_board`, `forum`
+- `id`: source object id when available, otherwise a stable synthetic reference
+- `label`: short human-readable source label
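+
+One way to express the `todo_candidates` schema in code is with dataclasses. The field names come from this spec; the class names, the `validate` method, and the reading of "required list" as "non-empty list" are assumptions made for illustration.
+
```python
from dataclasses import dataclass
from typing import List, Optional

SOURCE_TYPES = ("conversation", "task_board", "forum")


@dataclass
class Provenance:
    type: str   # one of SOURCE_TYPES
    id: str     # source object id, or a stable synthetic reference
    label: str  # short human-readable source label


@dataclass
class TodoCandidate:
    title: str
    description: str
    sources: List[Provenance]
    priority: Optional[str] = None       # e.g. "high", "medium", "low"
    target_window: Optional[str] = None  # e.g. "today", "this_week"

    def validate(self) -> None:
        """Raise ValueError if a required field is missing or malformed."""
        if not self.title.strip() or not self.description.strip():
            raise ValueError("title and description are required")
        # Assumption: a "required" sources list must also be non-empty.
        if not self.sources:
            raise ValueError("at least one provenance object is required")
        for source in self.sources:
            if source.type not in SOURCE_TYPES:
                raise ValueError(f"unknown source type: {source.type}")
```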
+
+### Evidence Structure
+
+Each item in `schedule_analysis.evidence` must contain:
+
+- `type`: one of `conversation`, `task_board`, `forum`
+- `id`: source object id when available, otherwise a stable synthetic reference
+- `label`: short human-readable identifier
+- `reason`: brief explanation of why the signal matters to scheduling
+
+### Interactive Response Contract
+
+The user-facing answer should always follow this shape:
+
+1. conclusion
+2. suggested schedule
+3. next actions
+
+If confidence is low, the response must say that explicitly and avoid overconfident scheduling language.
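+
+The response shape could be rendered along these lines. This is only a sketch: the function name, the plain-text layout, and the low-confidence wording are assumptions, not part of the contract.
+
```python
from typing import List


def render_interactive_response(
    conclusion: str,
    schedule: List[str],
    next_actions: List[str],
    confidence: str,
) -> str:
    """Render conclusion, suggested schedule, then next actions, in that order."""
    header = "Conclusion"
    if confidence == "low":
        # State low confidence explicitly instead of sounding certain.
        header = "Conclusion (low confidence, limited context available)"
    lines = [f"{header}: {conclusion}", "Suggested schedule:"]
    lines.extend(f"- {item}" for item in schedule)
    lines.append("Next actions:")
    lines.extend(f"{i}. {action}" for i, action in enumerate(next_actions, 1))
    return "\n".join(lines)
```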
+
+## Daily Scheduler Contract
+
+The daily scheduled trigger must follow explicit execution semantics.
+
+### Execution Model
+
+- run once per user per local date
+- default execution time: 07:00 in the user's configured timezone
+- if the user has no configured timezone, skip the run and log the skip reason
+- do not automatically backfill missed runs
+- enforce idempotency by `(user_id, local_date, job_type)` so the same daily analysis does not complete successfully more than once
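+
+A minimal sketch of the execution model, assuming the user's timezone is available as a `tzinfo` object and successful runs are tracked as a set of keys. `JOB_TYPE`, `daily_run_key`, `should_run`, and the in-memory `completed_keys` set are hypothetical; a real implementation would presumably persist the keys.
+
```python
from datetime import datetime, time, timedelta, timezone, tzinfo
from typing import Optional, Set, Tuple

RUN_AT = time(7, 0)                   # default execution time from the spec
JOB_TYPE = "daily_schedule_analysis"  # hypothetical job type name

RunKey = Tuple[str, str, str]


def daily_run_key(user_id: str, user_tz: Optional[tzinfo], now_utc: datetime) -> Optional[RunKey]:
    """Build the (user_id, local_date, job_type) key, or None without a timezone."""
    if user_tz is None:
        return None
    local_date = now_utc.astimezone(user_tz).date()
    return (user_id, local_date.isoformat(), JOB_TYPE)


def should_run(user_id: str, user_tz: Optional[tzinfo], now_utc: datetime,
               completed_keys: Set[RunKey]) -> bool:
    """True when the run is due today and has not yet succeeded for this key."""
    key = daily_run_key(user_id, user_tz, now_utc)
    if key is None:
        return False  # no configured timezone: the caller logs the skip reason
    if now_utc.astimezone(user_tz).time() < RUN_AT:
        return False  # before 07:00 local time
    return key not in completed_keys
```
+
+Because the key uses the user-local date, a run at 23:30 UTC for a UTC+8 user already counts toward the next calendar day. A named zone from the standard `zoneinfo` module can be passed in place of a fixed offset.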
+
+### Scheduled Mode Behavior
+
+A successful scheduled run should:
+
+1. gather available context from the three input sources
+2. execute `schedule_analysis`
+3. execute `schedule_planning`
+4. create todo items from selected `todo_candidates`
+5. store run telemetry and outcome metadata
+
+If one or more sources are missing, continue when there is still enough evidence to produce a useful recommendation and mark confidence as reduced.
+
+Signal evaluation rules:
+
+- a **strong source** is a source with enough current evidence to support prioritization on its own, such as multiple open high-priority tasks or a recent forum escalation
+- a **meaningful signal** is a discrete scheduling-relevant item extracted from any source, such as an overdue task, a stated commitment in conversation history, or a forum escalation
+- the planner may still run with one strong source
+- scheduled mode may create todos only when at least two meaningful signals exist across all inputs
+
+If fewer than two meaningful signals are available across all sources, the scheduler should not create todos and should log a low-context outcome.
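+
+The todo-creation threshold reduces to a count across all sources. A sketch, assuming signals have already been extracted into a dict keyed by source type (the dict shape and the function name are illustrative):
+
```python
from typing import Dict, List

MIN_SIGNALS_FOR_TODOS = 2  # threshold stated in the rules above


def may_create_todos(signals_by_source: Dict[str, List[str]]) -> bool:
    """True when at least two meaningful signals exist across all inputs."""
    total = sum(len(signals) for signals in signals_by_source.values())
    return total >= MIN_SIGNALS_FOR_TODOS
```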
+
+Delayed execution rule:
+
+- if the 07:00 run is delayed by temporary outage or worker unavailability, the system may still execute one delayed run later on the same user-local date
+- if the entire local date passes without a successful run, do not backfill on the next day
+
+## Todo Creation Rules
+
+Todo creation is the main scheduled side effect and must be tightly constrained.
+
+### Creation Rules
+
+- create at most 3 todo items per daily run
+- only create todos for actions that are concrete, near-term, and user-actionable
+- do not create todos for vague advice, reflections, or duplicated reminders
+- store source provenance when available:
+  - `conversation`
+  - `task_board`
+  - `forum`
+
+### Duplicate Detection
+
+A candidate todo is considered a duplicate if there is already an open todo that matches all of the following:
+
+- same normalized action text
+- same source category or same source object when available
+- created within the last 7 days
+
+Normalization rules for action text:
+
+- trim surrounding whitespace
+- collapse repeated internal whitespace to a single space
+- lowercase Latin characters
+- remove trailing full stop / period punctuation only
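+
+These normalization rules might be implemented as follows. The function names are hypothetical, and two readings are assumptions: "Latin characters" is taken to mean any character whose Unicode name starts with `LATIN`, and only a single trailing `.` is removed.
+
```python
import re
import unicodedata


def _lower_if_latin(ch: str) -> str:
    # Only Latin-script letters are lowercased; other scripts pass through.
    return ch.lower() if unicodedata.name(ch, "").startswith("LATIN") else ch


def normalize_action_text(text: str) -> str:
    """Apply the four normalization rules in the order listed above."""
    text = text.strip()                   # trim surrounding whitespace
    text = re.sub(r"\s+", " ", text)      # collapse internal whitespace runs
    text = "".join(_lower_if_latin(ch) for ch in text)
    if text.endswith("."):                # drop one trailing period only
        text = text[:-1].rstrip()
    return text
```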
+
+Source comparison rules:
+
+- if a provenance object includes a stable source `id`, compare by `(type, id)`
+- if no stable source id exists, compare by `(type, normalized label)`
+- if multiple sources support one recommendation, compare against the highest-priority provenance in this order: `task_board`, `forum`, `conversation`
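+
+A sketch of the source comparison rules, assuming provenance objects are plain dicts with `type`, `id`, and `label` keys. The function names are hypothetical, and label normalization (whitespace collapse plus lowercasing) is an assumption, since the spec does not define it.
+
```python
from typing import Dict, List, Optional, Tuple

# Priority order stated in the rules above.
PROVENANCE_ORDER = ("task_board", "forum", "conversation")

Source = Dict[str, Optional[str]]


def primary_source(sources: List[Source]) -> Source:
    """Pick the highest-priority provenance object among several."""
    return min(sources, key=lambda s: PROVENANCE_ORDER.index(s["type"]))


def source_key(source: Source) -> Tuple[str, str]:
    """Compare by (type, id) when an id exists, else by (type, normalized label)."""
    if source.get("id"):
        return (source["type"], source["id"])
    # Assumed label normalization: collapse whitespace and lowercase.
    return (source["type"], " ".join(source["label"].split()).lower())
```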
+
+When a duplicate is detected:
+
+- do not create a new todo
+- record the skip reason in scheduler telemetry
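+
+Putting duplicate detection together with the per-run cap might look like this. The sketch assumes each candidate and each open todo already carries a precomputed `normalized_title` and `source_key`; those field names, the function names, and the shape of the skip list are all illustrative.
+
```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Tuple

MAX_TODOS_PER_RUN = 3  # cap from the creation rules
Item = Dict[str, Any]


def is_duplicate(candidate: Item, open_todos: List[Item], now: datetime,
                 window_days: int = 7) -> bool:
    """True if an open todo matches on text and source within the window."""
    cutoff = now - timedelta(days=window_days)
    return any(
        todo["normalized_title"] == candidate["normalized_title"]
        and todo["source_key"] == candidate["source_key"]
        and todo["created_at"] >= cutoff
        for todo in open_todos
    )


def select_todos(candidates: List[Item], open_todos: List[Item],
                 now: datetime) -> Tuple[List[Item], List[Tuple[str, str]]]:
    """Return (todos to create, skipped (title, reason) pairs for telemetry)."""
    to_create: List[Item] = []
    skipped: List[Tuple[str, str]] = []
    for candidate in candidates:
        if len(to_create) >= MAX_TODOS_PER_RUN:
            break
        if is_duplicate(candidate, open_todos, now):
            skipped.append((candidate["title"], "duplicate"))
            continue
        to_create.append(candidate)
    return to_create, skipped
```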
+
+### Todo Fields
+
+Scheduled-created todos should include at minimum these persisted fields:
+
+- `title`: required
+- `description`: required
+- `source_type`: required primary provenance type
+- `source_id`: optional stable source id
+- `source_label`: required fallback human-readable provenance label
+- `created_by`: required and set to `schedule_planner`
+- `created_at`: required timestamp
+- `priority`: optional normalized priority
+- `target_window`: optional normalized scheduling window
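+
+For illustration, a persisted todo record could be assembled from a candidate and its primary provenance object like this. The dict-based shape and the `build_todo` name are assumptions; only the field names and the `created_by` value come from this spec.
+
```python
from datetime import datetime, timezone
from typing import Any, Dict


def build_todo(candidate: Dict[str, Any], primary: Dict[str, Any],
               now: datetime) -> Dict[str, Any]:
    """Map a todo candidate plus its primary provenance onto the persisted fields."""
    return {
        "title": candidate["title"],
        "description": candidate["description"],
        "source_type": primary["type"],
        "source_id": primary.get("id"),            # optional
        "source_label": primary["label"],
        "created_by": "schedule_planner",
        "created_at": now,
        "priority": candidate.get("priority"),     # optional
        "target_window": candidate.get("target_window"),  # optional
    }
```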
+
+## Routing Boundaries
+
+The system must distinguish scheduling from adjacent planning behaviors.
+
+### Route to `schedule_planner` when the user asks for:
+
+- planning for today or this week
+- what to focus on next
+- priority ordering across ongoing work
+- time-aware sequencing of current commitments
+
+### Do not route to `schedule_planner` when the user asks for:
+
+- deep implementation planning for a feature
+- code execution or task fulfillment
+- research-only retrieval
+- pure analysis without scheduling intent
+
+In ambiguous cases such as "what should I do next?", prefer `schedule_planner` when the available context includes multiple active tasks, recent commitments, or forum pressure signals.
+
+## Backend Changes
+
+### Role and Graph Layer
+
+Update the orchestration layer so the planner role is redefined as `schedule_planner` rather than `planner`.
+
+Files likely involved:
+
+- `backend/app/agents/state.py`
+- `backend/app/agents/graph.py`
+- `backend/app/agents/prompts.py`
+- `backend/app/routers/agent.py`
+- `backend/app/services/agent_service.py`
+
+Required changes:
+
+- rename role ids where appropriate
+- update graph node registration
+- update master routing rules
+- replace planner subagent mappings
+- update telemetry and sub-commander trace labels
+
+### Prompt Layer
+
+Replace the current planner prompt family with schedule-specific instructions.
+
+Required prompt families:
+
+- `SCHEDULE_PLANNER_SYSTEM_PROMPT`
+- `SCHEDULE_ANALYSIS_PROMPT`
+- `SCHEDULE_PLANNING_PROMPT`
+
+Prompt requirements:
+
+- reason over conversation history, tasks, and forum state
+- prioritize urgency, importance, and dependency
+- avoid abstract productivity advice
+- produce concrete, immediate scheduling output
+- in scheduled mode, generate todo-worthy suggestions only
+
+### Scheduled Execution Layer
+
+Add or update the daily scheduled workflow so it can call the schedule planner flow automatically.
+
+Likely touchpoints:
+
+- scheduler service
+- existing daily planning jobs
+- todo creation services
+
+Required behavior:
+
+- fixed daily execution time
+- fetch relevant context
+- call schedule planner pipeline
+- write selected recommendations into todos
+- skip duplicate todo creation
+
+## Frontend Changes
+
+The frontend needs to reflect the new role system consistently.
+
+Files likely involved:
+
+- `frontend/src/data/agents.ts`
+- `frontend/src/pages/agents/index.vue`
+- `frontend/src/components/chat/OrchestrationPanel.vue`
+- `frontend/src/pages/chat/composables/useChatView.ts`
+- related frontend tests
+
+Required updates:
+
+- replace planner display labels with schedule planner labels
+- rename planner subagents to schedule analysis / schedule planning
+- update orchestration telemetry labels
+- update example mock state and tests
+- use these exact frontend ids:
+  - `schedule_planner`
+  - `schedule_analysis`
+  - `schedule_planning`
+- use these exact default Chinese labels:
+  - `日程规划师`
+  - `日程分析员`
+  - `日程编排员`
+- update active route visualization and commander skill labels to the new ids
+
+## Naming
+
+### Main Agent
+
+- old: `planner`
+- new: `schedule_planner`
+- display role: `日程规划师`
+
+### Subagents
+
+- old: `planner_scope`
+- new: `schedule_analysis`
+- display role: `日程分析员`
+
+- old: `planner_steps`
+- new: `schedule_planning`
+- display role: `日程编排员`
+
+## Constraints
+
+- do not keep dual role names for long-term compatibility unless a specific dependency forces it
+- do not create todos for every suggestion
+- do not turn the planner into a generic life coach
+- keep scheduling grounded in current project signals
+- preserve the existing agent architecture where possible, while fully changing planner semantics
+
+## Observability
+
+The redesign must emit enough telemetry to debug routing and scheduled execution.
+
+Required telemetry fields:
+
+- selected main route
+- selected subagent
+- available input sources
+- missing input sources
+- run mode: `interactive` or `scheduled`
+- confidence level
+- todos created count
+- todos skipped as duplicates count
+- scheduler run outcome: success, skipped, or failed
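+
+The telemetry fields could be carried in a single record per run. In this sketch the attribute names and the `PlannerRunTelemetry` class are hypothetical; only the list of things to record comes from this spec.
+
```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional

RUN_MODES = ("interactive", "scheduled")
RUN_STATUSES = ("success", "skipped", "failed")


@dataclass
class PlannerRunTelemetry:
    route: str                   # selected main route
    subagent: Optional[str]      # selected subagent, if any
    run_mode: str                # one of RUN_MODES
    confidence: str              # "high", "medium", or "low"
    available_sources: List[str] = field(default_factory=list)
    missing_sources: List[str] = field(default_factory=list)
    todos_created: int = 0
    todos_skipped_duplicate: int = 0
    run_status: str = "success"  # one of RUN_STATUSES

    def __post_init__(self) -> None:
        # Reject values outside the enumerations listed in the spec.
        if self.run_mode not in RUN_MODES:
            raise ValueError(f"unknown run mode: {self.run_mode}")
        if self.run_status not in RUN_STATUSES:
            raise ValueError(f"unknown run status: {self.run_status}")
```
+
+Calling `asdict()` on an instance yields a flat dict that can be written to whatever log or trace store the scheduler already uses.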
+
+## Acceptance Criteria
+
+### Backend Acceptance Criteria
+
+- a scheduling-intent user query routes to `schedule_planner`
+- `schedule_analysis` and `schedule_planning` are both reachable through the orchestration layer
+- legacy planner ids are normalized during the compatibility window
+- daily scheduled runs do not execute more than once per user per local date
+- low-context daily runs do not create todos
+- duplicate todo candidates are skipped instead of recreated
+
+### Frontend Acceptance Criteria
+
+- the agents page displays `日程规划师` instead of the previous planner label
+- the planner subagent chips display `日程分析员` and `日程编排员`
+- orchestration mock data and route highlights use the new ids
+- tests no longer depend on `planner_scope` or `planner_steps` after migration is complete
+
+### Failure and Fallback Criteria
+
+- if forum data is missing, the planner still runs with degraded confidence
+- if task board data is missing, the planner still runs with degraded confidence when other strong context exists
+- if fewer than two meaningful signals are available, scheduled mode creates no todos
+- if the user has no timezone configured, the daily scheduled run is skipped and logged
+
+## Testing Strategy
+
+### Backend
+
+Add or update tests for:
+
+- master routing to `schedule_planner`
+- schedule subagent selection behavior
+- prompt invariants for schedule-focused output
+- scheduled daily run creates todos from actionable suggestions
+- duplicate todo protection
+
+### Frontend
+
+Add or update tests for:
+
+- renamed main role and subagent labels
+- orchestration panel route display
+- active subagent telemetry
+- mock agent graph data using `schedule_planner`, `schedule_analysis`, and `schedule_planning`
+
+## Risks
+
+1. **Broad rename surface**
+   - `planner` is referenced across backend and frontend, so a full rename must be systematic
+
+2. **Scheduled todo spam**
+   - daily runs may create low-value or duplicate todos unless filtered carefully
+
+3. **Prompt drift**
+   - if prompts stay too abstract, the agent will carry the new name without actually behaving in a scheduling-oriented way
+
+## Recommendation
+
+Implement this as a real role-system redesign, not as a display-only rename. The role id, subagent ids, prompt family, routing logic, and frontend telemetry should all align on the new scheduling semantics so the system remains internally coherent.