Refine knowledge brain workflow

Align the brain prompts, graph view, and startup defaults with the latest phase 1 flow so local runs and navigation stay consistent.
2026-03-22 22:42:47 +08:00
parent 67ea3d2682
commit 6f594631e9
23 changed files with 1508 additions and 526 deletions
--- a/docs/superpowers/specs/2026-03-20-knowledge-brain-blueprint-notes.md
+++ b/docs/superpowers/specs/2026-03-20-knowledge-brain-blueprint-notes.md
@@ -0,0 +1,42 @@
+# Notes: Jarvis Knowledge Brain Blueprint
+
+## Current-State Findings
+- Source domains already exist as separate systems: conversations, documents, todos, tasks, forum posts.
+- Current long-term memory only comes from conversation extraction via `UserMemory`.
+- Current graph build path only uses indexed document chunks.
+- Scheduler infrastructure already exists and can host daily brain-learning jobs.
+- Frontend already exposes a `知识大脑` ("Knowledge Brain") navigation entry, but it currently points to the graph page.
+
+## Synthesized Findings
+
+### What can be reused
+- `memory_service` as a seed for conversation extraction and recall.
+- `scheduler_service` as the base for daily learning workflows.
+- `tag_service` as an early foundation for brain tags.
+- Existing business tables as authoritative raw source records.
+
+### What is missing
+- Unified event layer across all source systems.
+- Candidate memory layer between raw events and durable brain memory.
+- Timeline-aware memory model with reinforcement / archival states.
+- Retrieval path that combines long-term memory with recent relevant events.
+- Brain-specific APIs and a dedicated frontend dashboard module.
+
+### Phase 1 objective
+- Build the minimum architecture needed for a real event-driven brain:
+  - BrainEvent
+  - BrainCandidate
+  - BrainMemory
+  - BrainTag and link tables
+  - ingestion services
+  - daily learning job
+  - retrieval integration
+  - brain dashboard APIs
+
+## Additional Findings: Knowledge Parsing Normalization
+- Current document ingestion parses each format separately and builds chunks directly from `ParsedNode` items.
+- Current chunks already carry structural metadata, but there is no explicit parent-child chunk graph.
+- The agreed direction is to use MinerU for PDF only, keep existing parsers for DOCX/XLSX/CSV/MD/TXT, and converge all outputs into structured markdown.
+- `normalized_content` should be persisted on documents so preview, rebuild, and future chunking can reuse the same canonical text.
+- Lightweight hierarchy should be represented in chunk metadata first, not in a new relational tree schema.
+- The current DOCX upload failure is caused by `python-docx` not being installed in the active backend environment.
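
Note on the chunk hierarchy point above: one way to keep a lightweight hierarchy in chunk metadata, without a relational tree schema, is sketched below. This is an illustration under stated assumptions, not the project's implementation. `Chunk`, `chunk_markdown`, and the metadata keys `chunk_id`, `parent_id`, `level`, and `kind` are hypothetical names, and treating every non-blank line of the structured markdown as its own chunk is a simplification.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Matches ATX-style markdown headings such as "## Setup".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    """Hypothetical chunk record: text plus a free-form metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_markdown(normalized_content: str) -> list[Chunk]:
    """Split structured markdown into chunks with parent links in metadata.

    Assumption: one chunk per non-blank line. A real chunker would merge
    body lines up to a size limit.
    """
    chunks: list[Chunk] = []
    stack: list[tuple[int, int]] = []  # (heading level, chunk_id) of open headings

    for raw in normalized_content.splitlines():
        line = raw.strip()
        if not line:
            continue
        chunk_id = len(chunks)
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            # Close headings at the same or a deeper level.
            while stack and stack[-1][0] >= level:
                stack.pop()
            parent_id = stack[-1][1] if stack else None
            text, kind = match.group(2), "heading"
            stack.append((level, chunk_id))
        else:
            # Body text hangs off the nearest open heading, if any.
            parent_id = stack[-1][1] if stack else None
            level = stack[-1][0] + 1 if stack else 1
            text, kind = line, "body"
        chunks.append(Chunk(text, {
            "chunk_id": chunk_id,
            "parent_id": parent_id,
            "level": level,
            "kind": kind,
        }))
    return chunks


if __name__ == "__main__":
    sample = "# Guide\nIntro text\n## Setup\nInstall it\n# FAQ\nAnswer"
    for chunk in chunk_markdown(sample):
        print(chunk.metadata["chunk_id"], chunk.metadata["parent_id"], chunk.text)
```

Because the parent link is just a metadata key, existing chunk storage would not need a schema change, and a parent-child view can be rebuilt at query time by grouping on `parent_id`.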