docs/superpowers/specs/2026-03-20-knowledge-brain-blueprint-notes.md

# Notes: Jarvis Knowledge Brain Blueprint

## Current-State Findings
- Source domains already exist, but each is stored separately: conversations, documents, todos, tasks, forum posts.
- Long-term memory currently comes only from conversation extraction via `UserMemory`.
- The graph build path currently uses only indexed document chunks.
- Scheduler infrastructure already exists and can host daily brain-learning jobs.
- Frontend already exposes a `知识大脑` ("Knowledge Brain") navigation entry, but it currently points to the graph page.

## Synthesized Findings

### What can be reused
- `memory_service` as a seed for conversation extraction and recall.
- `scheduler_service` as the base for daily learning workflows.
- `tag_service` as an early foundation for brain tags.
- Existing business tables as authoritative raw source records.

### What is missing
- Unified event layer across all source systems.
- Candidate memory layer between raw events and durable brain memory.
- Timeline-aware memory model with reinforcement / archival states.
- Retrieval path that combines long-term memory with recent relevant events.
- Brain-specific APIs and a dedicated frontend dashboard module.
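A minimal sketch of the missing retrieval path, assuming each candidate result already carries a relevance score. The `Item` type, the `combine` function, the 7-day window, and the result limit are all hypothetical names and values chosen for illustration, not existing project code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Item:
    text: str
    relevance: float      # hypothetical similarity score in [0, 1]
    timestamp: datetime
    kind: str             # "memory" (durable) or "event" (raw, recent)


def combine(memories, events, now, recent_window=timedelta(days=7), limit=5):
    """Merge durable memories with events from the recent window, best first."""
    cutoff = now - recent_window
    # Old events are dropped; only long-term memory may reach further back.
    recent = [e for e in events if e.timestamp >= cutoff]
    ranked = sorted(memories + recent, key=lambda i: i.relevance, reverse=True)
    return ranked[:limit]
```

The point of the design is that durable memories are never filtered by age, while raw events only contribute when they are recent.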

### Phase 1 objective
- Build the minimum architecture needed for a real event-driven brain:
  - `BrainEvent`
  - `BrainCandidate`
  - `BrainMemory`
  - `BrainTag` and link tables
  - ingestion services
  - daily learning job
  - retrieval integration
  - brain dashboard APIs
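The event → candidate → memory flow listed above could be sketched roughly as follows. Only the three entity names come from these notes; every field, the `MemoryState` values, the exact-match reinforcement rule, and the thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class MemoryState(Enum):          # hypothetical lifecycle states
    ACTIVE = "active"
    REINFORCED = "reinforced"
    ARCHIVED = "archived"


@dataclass
class BrainEvent:
    source: str                   # e.g. "conversation", "document", "todo"
    source_id: str                # id of the row in the existing business table
    occurred_at: datetime
    summary: str


@dataclass
class BrainCandidate:
    event: BrainEvent
    content: str
    confidence: float             # assumed extraction confidence in [0, 1]


@dataclass
class BrainMemory:
    content: str
    first_seen: datetime
    last_seen: datetime
    hits: int = 1
    state: MemoryState = MemoryState.ACTIVE
    tags: list[str] = field(default_factory=list)


def promote(candidate, memories, threshold=0.6):
    """Promote a candidate into durable memory, or reinforce an exact match."""
    if candidate.confidence < threshold:
        return None
    when = candidate.event.occurred_at
    for memory in memories:
        if memory.content == candidate.content:
            memory.hits += 1
            memory.last_seen = max(memory.last_seen, when)
            memory.state = MemoryState.REINFORCED
            return memory
    memory = BrainMemory(candidate.content, first_seen=when, last_seen=when)
    memories.append(memory)
    return memory


def archive_stale(memories, now, max_age_days=90):
    """Mark memories that have not been seen within max_age_days as archived."""
    for memory in memories:
        if (now - memory.last_seen).days > max_age_days:
            memory.state = MemoryState.ARCHIVED
```

A daily learning job would, under these assumptions, call `promote` for each new candidate and then `archive_stale` once per run.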

## Additional Findings: Knowledge Parsing Normalization
- Current document ingestion parses each format separately and builds chunks directly from `ParsedNode` items.
- Current chunks already carry structural metadata, but there is no explicit parent-child chunk graph.
- The agreed direction is to use MinerU for PDF only, keep existing parsers for DOCX/XLSX/CSV/MD/TXT, and converge all outputs into structured markdown.
- `normalized_content` should be persisted on documents so preview, rebuild, and future chunking can reuse the same canonical text.
- Lightweight hierarchy should be represented in chunk metadata first, not in a new relational tree schema.
- The current DOCX upload failure is caused by `python-docx` not being installed in the active backend environment.
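As one way to read the "hierarchy in chunk metadata first" direction, the sketch below splits structured markdown into chunks and records each chunk's enclosing headings in its metadata, with no separate tree table. The function name `chunk_markdown` and the metadata keys `heading_path` and `depth` are hypothetical, and the sketch does not handle `#` lines inside fenced code blocks.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(normalized_content):
    """Split structured markdown into chunks carrying heading-path metadata."""
    chunks = []
    path = []      # stack of (level, title) for the enclosing headings
    buffer = []    # body lines collected since the last heading

    def flush():
        text = "\n".join(buffer).strip()
        if text:
            chunks.append({
                "text": text,
                "metadata": {
                    "heading_path": [title for _, title in path],
                    "depth": len(path),
                },
            })
        buffer.clear()

    for line in normalized_content.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Leave any sections at the same or a deeper level.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            buffer.append(line)
    flush()
    return chunks
```

Because the parent headings travel with each chunk, a parent-child view can be reconstructed later by comparing `heading_path` prefixes, which keeps a relational tree schema optional.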