JARVIS/docs/superpowers/specs/2026-03-20-knowledge-brain-blueprint-notes.md

# Notes: Jarvis Knowledge Brain Blueprint

## Current-State Findings
- Source domains already exist as separate systems: conversations, documents, todos, tasks, and forum posts.
- Long-term memory currently comes only from conversation extraction via `UserMemory`.
- The current graph build path uses only indexed document chunks.
- Scheduler infrastructure already exists and can host daily brain-learning jobs.
- The frontend already exposes a `知识大脑` ("Knowledge Brain") navigation entry, but it currently points to the graph page.

## Synthesized Findings

### What can be reused
- `memory_service` as a seed for conversation extraction and recall.
- `scheduler_service` as the base for daily learning workflows.
- `tag_service` as an early foundation for brain tags.
- Existing business tables as authoritative raw source records.

### What is missing
- Unified event layer across all source systems.
- Candidate memory layer between raw events and durable brain memory.
- Timeline-aware memory model with reinforcement / archival states.
- Retrieval path that combines long-term memory with recent relevant events.
- Brain-specific APIs and a dedicated frontend dashboard module.

### Phase 1 objective
- Build the minimum architecture needed for a real event-driven brain:
  - BrainEvent
  - BrainCandidate
  - BrainMemory
  - BrainTag and link tables
  - ingestion services
  - daily learning job
  - retrieval integration
  - brain dashboard APIs
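A hypothetical sketch of how the Phase 1 pieces could fit together, with plain dataclasses standing in for ORM models: events feed candidates, and a daily job either promotes a candidate backed by enough distinct events into a `BrainMemory` or reinforces an existing memory with the same key. The `key` dedup field, the `min_evidence=2` threshold, the status strings, and the `run_daily_learning` name are assumptions, and BrainTag plus its link tables are collapsed into a `tags` set.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class BrainEvent:
    """One normalized record of something that happened in a source system."""
    id: str
    source: str          # e.g. "conversation", "document", "todo", "task", "forum"
    source_id: str       # key of the authoritative business-table row
    occurred_at: datetime
    content: str


@dataclass
class BrainCandidate:
    """A possible memory extracted from events; not yet trusted."""
    key: str                                  # hypothetical dedup key for the fact
    content: str
    event_ids: list[str] = field(default_factory=list)
    status: str = "pending"                   # pending -> promoted | merged


@dataclass
class BrainMemory:
    """Durable memory; `tags` stands in for BrainTag plus its link tables."""
    key: str
    content: str
    evidence_count: int
    tags: set[str] = field(default_factory=set)
    updated_at: Optional[datetime] = None


def run_daily_learning(
    candidates: list[BrainCandidate],
    memories: list[BrainMemory],
    now: datetime,
    min_evidence: int = 2,
) -> list[BrainMemory]:
    """Reinforce matching memories and promote well-supported new candidates."""
    by_key = {m.key: m for m in memories}
    for cand in candidates:
        if cand.status != "pending":
            continue
        evidence = len(set(cand.event_ids))
        if cand.key in by_key:
            # Known fact: reinforce the existing memory instead of duplicating it.
            mem = by_key[cand.key]
            mem.evidence_count += evidence
            mem.updated_at = now
            cand.status = "merged"
        elif evidence >= min_evidence:
            mem = BrainMemory(cand.key, cand.content, evidence, updated_at=now)
            memories.append(mem)
            by_key[cand.key] = mem
            cand.status = "promoted"
        # Otherwise the candidate stays pending and can gather evidence later.
    return memories
```

In the real system this function would be the body of a job registered with `scheduler_service`; how that registration looks is not covered in these notes.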

## Additional Findings: Knowledge Parsing Normalization
- Current document ingestion parses each format separately and builds chunks directly from `ParsedNode` items.
- Current chunks already carry structural metadata, but there is no explicit parent-child chunk graph.
- The agreed direction is to use MinerU for PDF only, keep existing parsers for DOCX/XLSX/CSV/MD/TXT, and converge all outputs into structured markdown.
- `normalized_content` should be persisted on documents so that preview, rebuild, and future chunking all reuse the same canonical text.
- Lightweight hierarchy should be represented in chunk metadata first, not in a new relational tree schema.
- The current DOCX upload failure is caused by `python-docx` not being installed in the active backend environment.
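A minimal sketch of "hierarchy in chunk metadata first": split the canonical `normalized_content` markdown at headings and record `level`, `parent_id`, and `heading_path` in each chunk's metadata instead of in a relational tree. The `Chunk` type and the metadata keys are hypothetical, and real chunking would also cap chunk size.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    id: int
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_markdown(normalized_content: str) -> list[Chunk]:
    """Split structured markdown at headings, recording hierarchy only in metadata."""
    chunks: list[Chunk] = []
    open_headings: list[tuple[int, Chunk]] = []   # stack of (level, chunk)
    current: Optional[Chunk] = None
    for line in normalized_content.splitlines():
        match = HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2).strip()
            # Close sibling or deeper sections; the remaining top is the parent.
            while open_headings and open_headings[-1][0] >= level:
                open_headings.pop()
            parent = open_headings[-1][1] if open_headings else None
            current = Chunk(len(chunks), line, {
                "level": level,
                "heading_path": (parent.metadata["heading_path"] if parent else []) + [title],
                "parent_id": parent.id if parent else None,
            })
            chunks.append(current)
            open_headings.append((level, current))
        else:
            if current is None:
                # Text before the first heading becomes a root-level chunk.
                current = Chunk(len(chunks), "", {"level": 0, "heading_path": [], "parent_id": None})
                chunks.append(current)
            current.text = f"{current.text}\n{line}" if current.text else line
    return chunks
```

This sketch does not track fenced code blocks, so a `#` comment line inside a fence would be misread as a heading.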

## Additional Findings: L3 Merge Progress
- `backend/app/agents/state.py` has been expanded to the newer L3 runtime state shape so graph/runtime code can rely on structured continuity, tool-round, retry, routing-hop, and datetime-reference fields.
- `backend/app/agents/graph.py` no longer contains merge-conflict markers, and the phantom `EXECUTOR_ACCOUNTING` branch has been removed from graph registration and routing.
- Accounting-style prompts are currently normalized onto `AgentRole.EXECUTOR` instead of a separate executor-accounting role, which avoids dangling enum/runtime references while keeping those intents routable.
- `backend/tests/backend/app/agents/test_graph.py` has been reconciled onto the newer L3 runtime test branch and stale `EXECUTOR_ACCOUNTING` expectations were updated to `AgentRole.EXECUTOR`.
- Tool execution now uses a shared async bridge in `backend/app/agents/tools/async_bridge.py`; `search.py`, `schedule.py`, `task.py`, and `forum.py` all route their synchronous tool entrypoints through it so runtime behavior stays consistent both inside and outside an active event loop.
- Current task/schedule canonicalization remains intentionally narrow for L3: task aliases (`content`, `date`, legacy priorities) and reminder aliases (`datetime`, `at`, `remind_at`, `time`, timezone variants) are normalized; deferred domains such as weather/accounting-specific tool routing remain outside this stabilization slice.
- Targeted verification now covers async bridge behavior plus task/schedule alias persistence tests; running pytest locally is still blocked by an environment-level startup issue in which the interpreter exits before the selected test files run.
- L3 runtime/service integration now persists continuity snapshots in a single canonical envelope (`kind`, `version`, `state`) on both assistant message attachments and `Conversation.agent_state`, so streaming and sync chat entrypoints rehydrate the same shape.
- The continuity rehydration path is also tolerant of older `Conversation` rows/models that do not expose `agent_state`, falling back to assistant message attachments instead of failing before graph execution.
- The finalized L3 continuity contract persists a canonical `agent_continuity_state` snapshot: `turn_context.active_sub_commander`, `pending_action.type|owner_agent|owner_sub_commander|status`, `clarification_context.owning_agent|owning_sub_commander|target_action|question|status`, and `continuity_state.status|mode`.
- `backend/app/services/agent_service.py` normalizes legacy persisted snapshots (`active_sub_flow`, `agent`, `sub_flow`, `action_type`, `awaiting_user_input`, `awaiting_clarification`) into that canonical shape on both save and rehydration so older brain-ingestion records still resume correctly.
- Edge cases: explicit new requests may keep stale continuity in memory for override-aware routing, but only `continuity_state.status == fresh` participates in active continuation; clarification resumes use `continuity_state.mode = resume_after_clarification`.
- `memory_service.build_memory_context(...)` remains the shared retrieval join point for conversation summaries, user memory, and BrainMemory recall, while `document_service` continues emitting BrainEvent records from upload flow without changing the graph runtime contract.
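The notes do not show the contents of `async_bridge.py`, so the following is only a sketch of one common way to build such a bridge; `run_sync` is a hypothetical name. It uses `asyncio.run` when no loop is running in the current thread and otherwise runs the coroutine on a fresh loop in a worker thread, because `asyncio.run` refuses to start inside a running loop.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine from synchronous code, whether or not a loop is running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: create one for the duration of the call.
        return asyncio.run(coro)
    # A loop is already running here (e.g. inside an async request handler), so
    # asyncio.run() would raise; run the coroutine on a new loop in a worker thread.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

The thread hop blocks the caller's loop until the coroutine finishes, which is acceptable for short tool calls but not for long-running work.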
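A hypothetical sketch of the continuity contract described above: one envelope with `kind`, `version`, and `state`, normalization of legacy keys, fallback from `Conversation.agent_state` to assistant message attachments, and the rule that only `fresh` continuity drives active continuation. The envelope `kind` value, the version number, the legacy-key-to-field mapping, the `open` clarification status, and defaulting a missing status to `fresh` are all assumptions; the actual mapping lives in `agent_service.py`.

```python
from __future__ import annotations

from typing import Any, Optional

# Assumed envelope values; the notes only fix the key names kind/version/state.
ENVELOPE_KIND = "agent_continuity_state"
ENVELOPE_VERSION = 1
SECTIONS = ("turn_context", "pending_action", "clarification_context", "continuity_state")

# Hypothetical legacy-key mapping, shown for illustration only.
LEGACY_PATHS = {
    "active_sub_flow": ("turn_context", "active_sub_commander"),
    "agent": ("pending_action", "owner_agent"),
    "sub_flow": ("pending_action", "owner_sub_commander"),
    "action_type": ("pending_action", "type"),
}


def normalize_snapshot(raw: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map a legacy or already-canonical snapshot onto the canonical sections."""
    state = {section: dict(raw.get(section) or {}) for section in SECTIONS}
    for legacy_key, (section, attr) in LEGACY_PATHS.items():
        if legacy_key in raw:
            state[section].setdefault(attr, raw[legacy_key])
    if raw.get("awaiting_clarification") or raw.get("awaiting_user_input"):
        # Assumed: legacy "awaiting" flags mean an open clarification to resume.
        state["clarification_context"].setdefault("status", "open")
        state["continuity_state"].setdefault("mode", "resume_after_clarification")
    # Assumed: snapshots without a status are treated as fresh so they can resume.
    state["continuity_state"].setdefault("status", "fresh")
    return state


def wrap(snapshot: dict[str, Any]) -> dict[str, Any]:
    """Build the single envelope stored on attachments and Conversation.agent_state."""
    return {"kind": ENVELOPE_KIND, "version": ENVELOPE_VERSION, "state": normalize_snapshot(snapshot)}


def unwrap(payload: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Accept either an envelope or a bare legacy snapshot."""
    if payload.get("kind") == ENVELOPE_KIND:
        return normalize_snapshot(payload.get("state") or {})
    return normalize_snapshot(payload)


def rehydrate(conversation: Any, attachments: list[dict[str, Any]]) -> Optional[dict[str, dict[str, Any]]]:
    """Prefer Conversation.agent_state; fall back to the newest attachment envelope."""
    payload = getattr(conversation, "agent_state", None)  # older models may lack it
    if not payload:
        payload = next((a for a in reversed(attachments) if a.get("kind") == ENVELOPE_KIND), None)
    return unwrap(payload) if payload else None


def participates_in_continuation(state: dict[str, dict[str, Any]]) -> bool:
    """Only fresh continuity drives active continuation; stale state stays for routing."""
    return state["continuity_state"].get("status") == "fresh"
```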