JARVIS/docs/superpowers/specs/2026-03-20-knowledge-brain-blueprint-notes.md
DESKTOP-72TV0V4\caoxiaozhu 6f594631e9 Refine knowledge brain workflow
Align the brain prompts, graph view, and startup defaults with the
latest phase 1 flow so local runs and navigation stay consistent.
2026-03-22 22:42:47 +08:00


Notes: Jarvis Knowledge Brain Blueprint

Current-State Findings

  • Source domains already exist as separate systems: conversations, documents, todos, tasks, and forum posts.
  • Long-term memory currently comes only from conversation extraction via UserMemory.
  • The graph build path currently uses only indexed document chunks.
  • Scheduler infrastructure already exists and can host daily brain-learning jobs.
  • The frontend already exposes a 知识大脑 (Knowledge Brain) navigation entry, but it currently points to the graph page.

Synthesized Findings

What can be reused

  • memory_service as a seed for conversation extraction and recall.
  • scheduler_service as the base for daily learning workflows.
  • tag_service as an early foundation for brain tags.
  • Existing business tables as authoritative raw source records.

What is missing

  • Unified event layer across all source systems.
  • Candidate memory layer between raw events and durable brain memory.
  • Timeline-aware memory model with reinforcement / archival states.
  • Retrieval path that combines long-term memory with recent relevant events.
  • Brain-specific APIs and a dedicated frontend dashboard module.
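To make the timeline-aware model and the combined retrieval path more concrete, here is a minimal sketch. Everything in it is hypothetical: the `Memory`, `MemoryState`, `Event`, and `retrieve` names, the 30-day half-life, the 0.2 archival threshold, and the 7-day "recent" window are illustrative choices, not part of the existing services. Simple term overlap stands in for whatever relevance scoring (for example embeddings) the real retrieval path would use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class MemoryState(Enum):
    ACTIVE = "active"
    ARCHIVED = "archived"


@dataclass
class Memory:
    # Hypothetical durable memory with a timeline-aware strength.
    text: str
    last_reinforced: datetime
    strength: float = 1.0
    state: MemoryState = MemoryState.ACTIVE

    def effective_strength(self, now: datetime, half_life_days: float = 30.0) -> float:
        # Exponential decay since the last reinforcement (assumed 30-day half-life).
        age_days = (now - self.last_reinforced).total_seconds() / 86400
        return self.strength * 0.5 ** (age_days / half_life_days)

    def reinforce(self, now: datetime, boost: float = 1.0) -> None:
        # New supporting evidence strengthens the memory and revives it if archived.
        self.strength = self.effective_strength(now) + boost
        self.last_reinforced = now
        self.state = MemoryState.ACTIVE

    def maybe_archive(self, now: datetime, threshold: float = 0.2) -> None:
        # Weak memories are archived rather than deleted, so they can be revived later.
        if self.effective_strength(now) < threshold:
            self.state = MemoryState.ARCHIVED


@dataclass
class Event:
    # Hypothetical raw event from any source system.
    text: str
    occurred_at: datetime


def _terms(text: str) -> set[str]:
    return set(text.lower().split())


def retrieve(query, memories, events, now, window=timedelta(days=7), k=3):
    """Combine active long-term memories with recent events that share terms with the query."""
    q = _terms(query)
    hits = [m for m in memories if m.state is MemoryState.ACTIVE and q & _terms(m.text)]
    # Rank memories by term overlap first, then by decayed strength.
    hits.sort(key=lambda m: (len(q & _terms(m.text)), m.effective_strength(now)), reverse=True)
    # Only events inside the recency window are considered, newest first.
    recent = [e for e in events if now - e.occurred_at <= window and q & _terms(e.text)]
    recent.sort(key=lambda e: e.occurred_at, reverse=True)
    return {"memories": [m.text for m in hits[:k]], "recent_events": [e.text for e in recent[:k]]}


# Usage: one fresh memory, one stale memory that gets archived, two events.
now = datetime(2026, 3, 20)
memories = [
    Memory("user prefers markdown reports", now - timedelta(days=1)),
    Memory("user likes pdf exports", now - timedelta(days=200)),
]
for m in memories:
    m.maybe_archive(now)
events = [
    Event("asked for a markdown summary", now - timedelta(days=2)),
    Event("edited the markdown template", now - timedelta(days=30)),
]
result = retrieve("markdown reports", memories, events, now)
```

Archiving instead of deleting keeps the timeline intact: a later `reinforce` call flips an archived memory back to active without losing its history.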

Phase 1 objective

  • Build the minimum architecture needed for a real event-driven brain:
    • BrainEvent
    • BrainCandidate
    • BrainMemory
    • BrainTag and link tables
    • ingestion services
    • daily learning job
    • retrieval integration
    • brain dashboard APIs
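The Phase 1 pieces fit together as a pipeline: source record → BrainEvent → BrainCandidate → BrainMemory, with tags attached through a link table. The sketch below models that flow with in-memory dataclasses instead of database tables. Field names, the `ingest`, `extract_candidates`, and `daily_learning_job` helpers, `MemoryTagLink`, and the promotion rule (a candidate text must be seen in at least two distinct events) are all assumptions for illustration. The placeholder extractor just splits lines; registering the job with scheduler_service is not shown because its API is not described in these notes.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from itertools import count

_ids = count(1)  # hypothetical id generator standing in for database primary keys


@dataclass
class BrainEvent:
    # One normalized record per change in a source system (conversation, document, todo, ...).
    source: str
    source_id: str
    content: str
    occurred_at: datetime
    id: int = field(default_factory=lambda: next(_ids))


@dataclass
class BrainCandidate:
    # A possible memory extracted from one event; not yet trusted.
    text: str
    event_id: int
    status: str = "pending"  # pending -> promoted
    id: int = field(default_factory=lambda: next(_ids))


@dataclass
class BrainMemory:
    text: str
    evidence_event_ids: list[int]
    id: int = field(default_factory=lambda: next(_ids))


@dataclass
class BrainTag:
    name: str
    id: int = field(default_factory=lambda: next(_ids))


@dataclass(frozen=True)
class MemoryTagLink:
    # Link table row between BrainMemory and BrainTag.
    memory_id: int
    tag_id: int


def ingest(source, source_id, content, occurred_at):
    """Wrap a raw source record as a BrainEvent; the business table stays authoritative."""
    return BrainEvent(source, source_id, content, occurred_at)


def extract_candidates(event):
    # Placeholder extractor: one candidate per non-empty line of the event content.
    return [BrainCandidate(line.strip(), event.id) for line in event.content.splitlines() if line.strip()]


def daily_learning_job(candidates, min_evidence=2):
    """Promote pending candidate texts that appear in at least `min_evidence` distinct events."""
    groups = defaultdict(list)
    for c in candidates:
        if c.status == "pending":
            groups[c.text.lower()].append(c)
    memories = []
    for group in groups.values():
        event_ids = sorted({c.event_id for c in group})
        if len(event_ids) >= min_evidence:
            for c in group:
                c.status = "promoted"
            memories.append(BrainMemory(group[0].text, event_ids))
    return memories


# Usage: the same preference shows up in a conversation and a todo, so it is promoted.
e1 = ingest("conversation", "c-1", "Prefers weekly summaries\nUses Python", datetime(2026, 3, 18))
e2 = ingest("todo", "t-7", "prefers weekly summaries", datetime(2026, 3, 19))
candidates = extract_candidates(e1) + extract_candidates(e2)
memories = daily_learning_job(candidates)
tag = BrainTag("preferences")
links = [MemoryTagLink(m.id, tag.id) for m in memories]
```

Keeping the candidate layer separate means a single noisy event never becomes durable memory on its own; the daily job is the only place where promotion happens.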

Additional Findings: Knowledge Parsing Normalization

  • Current document ingestion parses each format separately and builds chunks directly from ParsedNode items.
  • Current chunks already carry structural metadata, but there is no explicit parent-child chunk graph.
  • The agreed direction is to use MinerU for PDF only, keep existing parsers for DOCX/XLSX/CSV/MD/TXT, and converge all outputs into structured markdown.
  • normalized_content should be persisted on documents so preview, rebuild, and future chunking can reuse the same canonical text.
  • Lightweight hierarchy should be represented in chunk metadata first, not in a new relational tree schema.
  • The current DOCX upload failure is caused by python-docx not being installed in the active backend environment.
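A minimal sketch of the agreed normalization direction, under stated assumptions: `choose_parser` only returns a label (it does not call MinerU or the existing parsers, whose APIs are not shown here), and `chunk_markdown` splits the structured markdown stored in `normalized_content` on ATX headings. The hierarchy lives entirely in each chunk's metadata (`parent_id`, `level`, `heading_path`), with no relational tree table. All function names and metadata keys are hypothetical.

```python
import re
from pathlib import Path

EXISTING_PARSER_EXTS = {".docx", ".xlsx", ".csv", ".md", ".txt"}
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")  # ATX headings such as "## Install"


def choose_parser(filename: str) -> str:
    """Route PDFs to MinerU and everything else to the existing parsers."""
    ext = Path(filename).suffix.lower()
    if ext == ".pdf":
        return "mineru"
    if ext in EXISTING_PARSER_EXTS:
        return "existing"
    raise ValueError(f"unsupported format: {ext}")


def chunk_markdown(normalized_content: str) -> list[dict]:
    """Split structured markdown into heading chunks with parent links kept in metadata."""
    chunks: list[dict] = []
    stack: list[tuple[int, int]] = []  # (heading level, chunk id) of currently open sections
    for line in normalized_content.splitlines():
        match = HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            # Close sections at the same or deeper level; what remains are ancestors.
            while stack and stack[-1][0] >= level:
                stack.pop()
            path = [chunks[cid]["metadata"]["heading"] for _, cid in stack] + [title]
            chunks.append({
                "id": len(chunks),
                "lines": [],
                "metadata": {
                    "heading": title,
                    "level": level,
                    "parent_id": stack[-1][1] if stack else None,
                    "heading_path": path,
                },
            })
            stack.append((level, len(chunks) - 1))
        elif line.strip():
            if not chunks:
                # Text before the first heading becomes a root-level preamble chunk.
                chunks.append({"id": 0, "lines": [], "metadata": {
                    "heading": None, "level": 0, "parent_id": None, "heading_path": []}})
            chunks[-1]["lines"].append(line)
    return [{"id": c["id"], "text": "\n".join(c["lines"]), "metadata": c["metadata"]} for c in chunks]


# Usage: two top-level sections, the first with two subsections.
doc = "# Guide\nIntro text\n## Install\nRun setup\n## Usage\nCall api\n# FAQ\nAsk here"
chunks = chunk_markdown(doc)
```

Because every chunk carries its `parent_id`, a retriever can expand a hit to its parent section later without any schema change, which matches the "metadata first" decision.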