JARVIS/docs/superpowers/specs/2026-03-20-knowledge-brain-phase-1-blueprint.md
DESKTOP-72TV0V4\caoxiaozhu 6f594631e9 Refine knowledge brain workflow
Align the brain prompts, graph view, and startup defaults with the
latest phase 1 flow so local runs and navigation stay consistent.
2026-03-22 22:42:47 +08:00

Jarvis Knowledge Brain Phase 1 Blueprint

1. Phase 1 Goal

Phase 1 establishes the first production-ready version of Jarvis's event-driven knowledge brain. The objective is not to finish the entire intelligence system, but to create the minimum architecture that lets Jarvis ingest key user actions from across the product, learn from them on a daily schedule, store only high-value knowledge, and retrieve that knowledge during future conversations.

Phase 1 should make the brain real in six ways:

  1. unify source events across core modules;
  2. create an intermediate candidate-learning layer;
  3. promote durable knowledge into long-term brain memory;
  4. maintain tags and time-aware traceability;
  5. expose APIs for inspection and management;
  6. allow the chat system to retrieve brain knowledge during answers.

2. Scope Boundaries

In scope

  • New persistence models for brain events, candidates, memories, tags, and relationships.
  • Ingestion of source signals from conversations, knowledge documents, todos, kanban tasks, and forum posts.
  • A daily autonomous learning pipeline that tags, scores, deduplicates, and upgrades knowledge.
  • Retrieval integration for future responses.
  • Brain dashboard APIs.
  • A new frontend brain module structure replacing the current graph-only mental model.

Out of scope for phase 1

  • Full graph-native reasoning engine.
  • Fully autonomous suggestion orchestration across all screens.
  • Complex reinforcement-learning style adaptation.
  • Fine-grained user-tunable learning policy UI.
  • Automatic deletion and archival heuristics beyond simple status transitions.

3. Target Architecture

Phase 1 should introduce a four-layer brain pipeline:

  1. Source Records. Existing domain tables remain the source of truth: messages, documents/chunks, todos, tasks, forum posts/replies.

  2. BrainEvent. A normalized event layer representing meaningful user/system actions. This is the single intake format for downstream learning.

  3. BrainCandidate. AI-generated candidate knowledge distilled from one or more events. Candidates are scored, tagged, typed, and traced back to source events.

  4. BrainMemory. Durable long-term memory that Jarvis can retrieve during future interactions. This becomes the brain's core persistence layer.

Graph visualization should be treated as a projection layer, not the primary storage model. In later phases, graph nodes and edges can be generated from BrainMemory records and their relationships.


4. Data Model Additions

4.1 BrainEvent

Purpose: normalized raw learning input.

Recommended fields:

  • id
  • user_id
  • source_type (conversation, document, todo, task, forum_post, forum_reply)
  • source_id
  • event_type (created, updated, completed, mentioned, uploaded, resolved, marked_important, etc.)
  • occurred_at
  • event_date
  • title
  • content_summary
  • raw_excerpt
  • metadata_ (JSON; source-specific facts such as conversation_id, task status, folder path)
  • importance_signal (numeric seed score)
  • is_user_pinned
  • processed_at
  • status (pending, processed, ignored)

Indexes:

  • (user_id, event_date)
  • (user_id, source_type, source_id)
  • (user_id, status, occurred_at)
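
As a rough illustration of the field list and indexes above, the table could be sketched with Python's built-in sqlite3 module. The table name `brain_events`, the index names, and all column types are assumptions made for this sketch; the project's real models would presumably be SQLAlchemy classes with their own types.

```python
import sqlite3

# Hypothetical DDL: table name, index names, and column types are assumptions;
# the column names follow the recommended field list.
BRAIN_EVENT_DDL = """
CREATE TABLE brain_events (
    id                INTEGER PRIMARY KEY,
    user_id           INTEGER NOT NULL,
    source_type       TEXT NOT NULL CHECK (source_type IN
        ('conversation', 'document', 'todo', 'task', 'forum_post', 'forum_reply')),
    source_id         TEXT NOT NULL,
    event_type        TEXT NOT NULL,
    occurred_at       TEXT NOT NULL,      -- ISO 8601 timestamp
    event_date        TEXT NOT NULL,      -- ISO date, used for daily batching
    title             TEXT,
    content_summary   TEXT,
    raw_excerpt       TEXT,
    metadata_         TEXT DEFAULT '{}',  -- JSON stored as text
    importance_signal REAL DEFAULT 0.0,
    is_user_pinned    INTEGER DEFAULT 0,
    processed_at      TEXT,
    status            TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'processed', 'ignored'))
);
CREATE INDEX ix_brain_events_user_date
    ON brain_events (user_id, event_date);
CREATE INDEX ix_brain_events_user_source
    ON brain_events (user_id, source_type, source_id);
CREATE INDEX ix_brain_events_user_status
    ON brain_events (user_id, status, occurred_at);
"""


def create_brain_event_table(conn: sqlite3.Connection) -> None:
    """Create the sketch table and its three composite indexes."""
    conn.executescript(BRAIN_EVENT_DDL)


conn = sqlite3.connect(":memory:")
create_brain_event_table(conn)
conn.execute(
    "INSERT INTO brain_events "
    "(user_id, source_type, source_id, event_type, occurred_at, event_date, title) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    (1, "todo", "42", "completed", "2026-03-20T09:30:00", "2026-03-20", "Ship blueprint"),
)
# The daily job would read one user's events for one date (first index).
pending = conn.execute(
    "SELECT title, status FROM brain_events WHERE user_id = ? AND event_date = ?",
    (1, "2026-03-20"),
).fetchall()
```

New rows default to the pending status, so the daily job can pick them up without the ingestion hook setting it explicitly.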

4.2 BrainCandidate

Purpose: intermediate learned knowledge awaiting acceptance into durable memory.

Recommended fields:

  • id
  • user_id
  • candidate_type (preference, habit, project_fact, decision, solution, topic, goal, temporary_focus)
  • title
  • summary
  • importance_score
  • confidence_score
  • time_scope (short_term, phase, long_term)
  • valid_from
  • valid_to
  • source_event_ids (JSON array)
  • reasoning_trace (short explanation of why the system extracted it)
  • status (new, promoted, rejected, merged)
  • created_at
  • reviewed_at
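
One possible in-memory shape for a candidate is sketched below using dataclasses. The enum class names, the `review` helper, and the rule that only a candidate in the new status can be reviewed are assumptions for illustration; the blueprint only lists the fields and status values.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class CandidateStatus(str, Enum):
    NEW = "new"
    PROMOTED = "promoted"
    REJECTED = "rejected"
    MERGED = "merged"


class TimeScope(str, Enum):
    SHORT_TERM = "short_term"
    PHASE = "phase"
    LONG_TERM = "long_term"


@dataclass
class BrainCandidate:
    user_id: int
    candidate_type: str            # e.g. "preference", "decision", "solution"
    title: str
    summary: str
    importance_score: float
    confidence_score: float
    time_scope: TimeScope
    source_event_ids: List[int]    # traceability back to BrainEvent rows
    reasoning_trace: str = ""
    status: CandidateStatus = CandidateStatus.NEW
    valid_from: Optional[datetime] = None
    valid_to: Optional[datetime] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    reviewed_at: Optional[datetime] = None

    def review(self, new_status: CandidateStatus, at: Optional[datetime] = None) -> None:
        """Move a new candidate to a final status (assumed one-way transition)."""
        if self.status is not CandidateStatus.NEW:
            raise ValueError(f"candidate already reviewed: {self.status.value}")
        if new_status is CandidateStatus.NEW:
            raise ValueError("review must move the candidate out of 'new'")
        self.status = new_status
        self.reviewed_at = at or datetime.now(timezone.utc)


candidate = BrainCandidate(
    user_id=1,
    candidate_type="decision",
    title="Graph is a projection layer",
    summary="Graph views are generated from BrainMemory, not stored as the source of truth.",
    importance_score=0.8,
    confidence_score=0.9,
    time_scope=TimeScope.LONG_TERM,
    source_event_ids=[101, 102],
    reasoning_trace="Stated in two separate design discussions.",
)
candidate.review(CandidateStatus.PROMOTED)
```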

4.3 BrainMemory

Purpose: durable brain knowledge used at retrieval time.

Recommended fields:

  • id
  • user_id
  • memory_type (preference, habit, goal, project_fact, decision, solution, topic_profile)
  • title
  • content
  • importance
  • confidence
  • timeline_date
  • first_learned_at
  • last_reinforced_at
  • reinforcement_count
  • status (active, archived, deleted)
  • origin_candidate_id
  • origin_source_types (JSON array)
  • metadata_ (JSON)
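
The reinforcement fields suggest an update step that runs when a later event confirms an existing memory. The sketch below shows one way it might work; the `reinforce` function, the confidence bump, and the choice to skip non-active memories are all assumptions, since the blueprint does not define reinforcement semantics.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BrainMemory:
    # Subset of the recommended fields, enough to show reinforcement.
    title: str
    content: str
    importance: float
    confidence: float
    first_learned_at: datetime
    last_reinforced_at: datetime
    reinforcement_count: int = 0
    status: str = "active"


def reinforce(memory: BrainMemory, seen_at: datetime, confidence_gain: float = 0.05) -> bool:
    """Record one more confirmation of an active memory.

    Returns False and leaves the memory untouched when it is archived or deleted.
    The confidence bump is a hypothetical policy, capped at 1.0.
    """
    if memory.status != "active":
        return False
    memory.reinforcement_count += 1
    memory.last_reinforced_at = max(memory.last_reinforced_at, seen_at)
    memory.confidence = min(1.0, memory.confidence + confidence_gain)
    return True


learned = datetime(2026, 3, 20, 1, 0)
memory = BrainMemory(
    title="Prefers concise answers",
    content="The user repeatedly asks for short summaries first.",
    importance=0.7,
    confidence=0.5,
    first_learned_at=learned,
    last_reinforced_at=learned,
)
reinforce(memory, datetime(2026, 3, 22, 1, 0), confidence_gain=0.25)
```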

4.4 BrainTag

Purpose: independent tagging layer for brain browsing, filtering, and scoring.

Recommended fields:

  • id
  • user_id
  • name
  • category (topic, value, time, source)
  • priority (important, secondary)
  • score
  • last_seen_at
  • created_at

Add many-to-many link tables:

  • brain_event_tags
  • brain_candidate_tags
  • brain_memory_tags
  • optional brain_memory_events for direct memory-to-event traceability beyond JSON arrays

These link tables are critical because phase 1 needs tag filters and timeline tracing before advanced graph projection exists.
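
To make the tag filter concrete, here is a minimal sqlite3 sketch of the memory-to-tag link table and a filter query. The reduced column sets, the `memories_for_tag` helper, and the uniqueness constraint on (user_id, name) are assumptions for illustration.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE brain_memories (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        title   TEXT NOT NULL
    );
    CREATE TABLE brain_tags (
        id       INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL,
        name     TEXT NOT NULL,
        priority TEXT NOT NULL DEFAULT 'secondary',
        UNIQUE (user_id, name)
    );
    -- Many-to-many link table between memories and tags.
    CREATE TABLE brain_memory_tags (
        memory_id INTEGER NOT NULL REFERENCES brain_memories (id),
        tag_id    INTEGER NOT NULL REFERENCES brain_tags (id),
        PRIMARY KEY (memory_id, tag_id)
    );
    """
)
conn.executemany(
    "INSERT INTO brain_memories (id, user_id, title) VALUES (?, ?, ?)",
    [(1, 1, "Graph is a projection"), (2, 1, "Daily job runs at 01:00"), (3, 1, "Prefers short answers")],
)
conn.executemany(
    "INSERT INTO brain_tags (id, user_id, name, priority) VALUES (?, ?, ?, ?)",
    [(10, 1, "architecture", "important"), (11, 1, "style", "secondary")],
)
conn.executemany(
    "INSERT INTO brain_memory_tags (memory_id, tag_id) VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 11)],
)


def memories_for_tag(db: sqlite3.Connection, user_id: int, tag_name: str) -> List[str]:
    """Return memory titles carrying the given tag, ordered by memory id."""
    rows = db.execute(
        """
        SELECT m.title
        FROM brain_memories AS m
        JOIN brain_memory_tags AS mt ON mt.memory_id = m.id
        JOIN brain_tags AS t ON t.id = mt.tag_id
        WHERE m.user_id = ? AND t.user_id = ? AND t.name = ?
        ORDER BY m.id
        """,
        (user_id, user_id, tag_name),
    ).fetchall()
    return [title for (title,) in rows]


architecture_memories = memories_for_tag(conn, 1, "architecture")
```

The same join pattern would apply to `brain_event_tags` and `brain_candidate_tags`.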


5. Ingestion Strategy

Phase 1 should not rewrite existing modules. Instead, it should add thin ingestion hooks near existing write paths.

Conversation ingestion

Trigger points:

  • after user message creation
  • after assistant completion
  • after memory extraction / summary creation

Event examples:

  • important user instruction
  • explicit “remember this” request
  • repeated topic cluster
  • conversation-derived decision or unresolved goal

Document ingestion

Trigger points:

  • after upload success
  • after indexing completes
  • after manual chunk edits

Event examples:

  • document uploaded
  • document indexed
  • high-value section discovered
  • document summary available

Todo ingestion

Trigger points:

  • todo created
  • todo completed
  • AI-generated todo created

Event examples:

  • planned work item
  • recurring operational duty
  • completion signal reflecting actual user focus

Task/Kanban ingestion

Trigger points:

  • task created
  • task status changed
  • task completed
  • priority changed

Event examples:

  • declared project goal
  • active workstream
  • resolved milestone

Forum ingestion

Trigger points:

  • post created
  • reply created
  • forum instruction executed or referenced

Event examples:

  • public project decision
  • repeated operational issue
  • reusable explanation or solution

Implementation note: source ingestion may create BrainEvent rows synchronously or via lightweight background tasks. Either way, a slow or failed event write must not block or break the original user flow.
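
A thin ingestion hook along these lines might look like the sketch below. The helper names `build_event` and `emit_brain_event`, the single-worker thread pool, and the `save` callback are assumptions; the point is only that a failed write is logged and swallowed instead of propagating into the caller's write path.

```python
import logging
from concurrent.futures import Future, ThreadPoolExecutor
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger("brain.ingest")
# Hypothetical background worker; a real app might reuse its existing task runner.
_executor = ThreadPoolExecutor(max_workers=1)


def build_event(
    user_id: int,
    source_type: str,
    source_id: str,
    event_type: str,
    title: str,
    occurred_at: Optional[datetime] = None,
    **metadata: Any,
) -> Dict[str, Any]:
    """Normalize source data into a BrainEvent-shaped dict."""
    occurred_at = occurred_at or datetime.now(timezone.utc)
    return {
        "user_id": user_id,
        "source_type": source_type,
        "source_id": source_id,
        "event_type": event_type,
        "title": title,
        "occurred_at": occurred_at.isoformat(),
        "event_date": occurred_at.date().isoformat(),
        "metadata_": metadata,
        "status": "pending",
    }


def emit_brain_event(save: Callable[[Dict[str, Any]], None], event: Dict[str, Any]) -> Future:
    """Persist the event in the background; never raise into the caller."""

    def _safe_save() -> bool:
        try:
            save(event)
            return True
        except Exception:
            logger.exception("brain event ingestion failed; user flow unaffected")
            return False

    return _executor.submit(_safe_save)


def broken_save(event: Dict[str, Any]) -> None:
    raise RuntimeError("database unavailable")


saved: list = []
event = build_event(
    1, "todo", "42", "completed", "Ship blueprint",
    occurred_at=datetime(2026, 3, 20, 9, 30, tzinfo=timezone.utc),
    priority="high",
)
ok = emit_brain_event(saved.append, event).result()
failed = emit_brain_event(broken_save, event).result()
```

Callers on a hot path would normally ignore the returned future; `.result()` is used here only to show the outcome.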


6. Learning and Promotion Pipeline

Phase 1 should add a new daily scheduler workflow dedicated to the brain.

New scheduler job: brain_daily_learning_task

Suggested run: once daily, after the bulk of user activity (for example at 01:00). Per-user scheduling can be made configurable later.

Pipeline steps:

  1. collect unprocessed BrainEvent rows for the target date;
  2. cluster by source, topic, and repeated patterns;
  3. ask the LLM to produce candidate knowledge with tags and importance explanations;
  4. deduplicate against existing BrainMemory by semantic and rule-based matching;
  5. promote high-confidence candidates into BrainMemory;
  6. mark low-value candidates as rejected, or retain them as observation-only;
  7. refresh tag scores and priority levels;
  8. mark consumed events as processed.
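
The steps above can be sketched as a single function over plain dicts. Several parts are assumptions: the `extract` callback stands in for the LLM call, events carry a hypothetical `topic` key used for clustering, deduplication is a simple normalized-title match, and tag refresh (step 7) is left out.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def run_daily_learning(
    events: List[Row],
    memories: List[Row],
    extract: Callable[[str, List[Row]], Row],
    target_date: str,
    promote_threshold: float = 0.8,
) -> Dict[str, List[str]]:
    # Step 1: collect unprocessed events for the target date.
    pending = [e for e in events if e["status"] == "pending" and e["event_date"] == target_date]

    # Step 2: cluster (by an assumed "topic" key only, for brevity).
    clusters: Dict[str, List[Row]] = defaultdict(list)
    for event in pending:
        clusters[event["topic"]].append(event)

    known = {m["title"].strip().lower() for m in memories}
    outcome: Dict[str, List[str]] = {"promoted": [], "merged": [], "kept_as_candidate": []}

    for topic, group in clusters.items():
        # Step 3: the extractor stands in for the LLM call.
        candidate = extract(topic, group)
        key = candidate["title"].strip().lower()
        if key in known:
            # Step 4: rule-based dedupe against existing memory.
            outcome["merged"].append(candidate["title"])
        elif candidate["confidence_score"] >= promote_threshold:
            # Step 5: promote high-confidence candidates.
            memories.append({"title": candidate["title"], "content": candidate["summary"], "status": "active"})
            known.add(key)
            outcome["promoted"].append(candidate["title"])
        else:
            # Step 6: keep the rest as candidates only.
            outcome["kept_as_candidate"].append(candidate["title"])

    # Step 8: mark consumed events so a re-run does not learn them twice.
    for event in pending:
        event["status"] = "processed"
    return outcome


def fake_extract(topic: str, group: List[Row]) -> Row:
    """Stub extractor: confidence grows with the number of supporting events."""
    return {
        "title": f"Focus on {topic}",
        "summary": f"{len(group)} related event(s) about {topic}",
        "confidence_score": min(1.0, 0.5 + 0.2 * len(group)),
    }


events = [
    {"topic": "brain", "event_date": "2026-03-20", "status": "pending"},
    {"topic": "brain", "event_date": "2026-03-20", "status": "pending"},
    {"topic": "billing", "event_date": "2026-03-20", "status": "pending"},
    {"topic": "brain", "event_date": "2026-03-21", "status": "pending"},
]
memories: List[Row] = []
first_run = run_daily_learning(events, memories, fake_extract, "2026-03-20")
second_run = run_daily_learning(events, memories, fake_extract, "2026-03-20")
```

Because consumed events are marked processed, running the job twice for the same date creates no duplicate memories in this sketch.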

Promotion rules for phase 1

Promote automatically when any of these are true:

  • user explicitly requested the system to remember something;
  • the same topic appears across multiple sources;
  • a solution/decision was formed and looks reusable;
  • a stable preference or habit is seen repeatedly;
  • a task/todo/forum thread confirms relevance with user action.

Keep as candidate-only when:

  • information is recent but not yet stable;
  • importance is uncertain;
  • it appears only once without reinforcement.

Reject when:

  • content is obviously transient;
  • it is too generic to help future answers;
  • it duplicates active memory without adding new value.
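
These rules could be encoded as a small decision function, as sketched below. The `CandidateSignals` fields are hypothetical booleans and counts that an upstream step would have to compute, and checking the reject conditions before the promote conditions is an assumption; the blueprint does not state which rule wins when both apply.

```python
from dataclasses import dataclass


@dataclass
class CandidateSignals:
    # Promote signals
    explicit_remember: bool = False
    distinct_source_types: int = 1
    reusable_solution_or_decision: bool = False
    repeated_preference_or_habit: bool = False
    confirmed_by_user_action: bool = False
    # Reject signals
    is_transient: bool = False
    is_too_generic: bool = False
    duplicates_active_memory: bool = False


def decide(signals: CandidateSignals) -> str:
    """Return 'rejected', 'promoted', or 'new' (candidate-only)."""
    # Assumed precedence: reject conditions are checked first.
    if signals.is_transient or signals.is_too_generic or signals.duplicates_active_memory:
        return "rejected"
    if (
        signals.explicit_remember
        or signals.distinct_source_types > 1
        or signals.reusable_solution_or_decision
        or signals.repeated_preference_or_habit
        or signals.confirmed_by_user_action
    ):
        return "promoted"
    # Seen once, not yet stable, or of uncertain importance.
    return "new"


remembered = decide(CandidateSignals(explicit_remember=True))
one_off = decide(CandidateSignals())
duplicate = decide(CandidateSignals(distinct_source_types=3, duplicates_active_memory=True))
```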

7. Retrieval Integration

Phase 1 must let chat use the brain in a controlled way.

New retrieval service

Add a dedicated brain_retrieval_service or extend memory_service with brain-aware retrieval APIs.

Responsibilities:

  • retrieve top relevant BrainMemory rows by query, tags, time context, and importance;
  • optionally retrieve recent BrainEvent summaries for recency-sensitive answers;
  • merge existing UserMemory and MemorySummary into one retrieval result shape;
  • support limits to avoid prompt bloat.

Retrieval policy

At answer time:

  • always consider long-term BrainMemory;
  • include recent event summaries only when the question appears time-sensitive or project-state-sensitive;
  • cap injected brain context to a small curated set.

Recommended first integration path:

  • extend build_memory_context() to append a new 【知识大脑】 ("Knowledge Brain") block built from BrainMemory retrieval.
  • keep existing conversation summary logic intact.

This gives immediate product value without requiring a full prompt orchestration rewrite.
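
As a sketch of the capped retrieval described in this section, the function below ranks active memories and renders a small context block. The `build_brain_context` name, the tag-overlap-then-importance ranking, and the default limit of 5 are assumptions; a real implementation would likely add semantic matching and time context.

```python
from typing import Any, Dict, List, Set

BRAIN_BLOCK_HEADER = "【知识大脑】"


def build_brain_context(memories: List[Dict[str, Any]], query_tags: Set[str], limit: int = 5) -> str:
    """Render at most `limit` active memories as a prompt block."""
    active = [m for m in memories if m["status"] == "active"]

    def score(memory: Dict[str, Any]) -> tuple:
        # Rank by tag overlap first, then by stored importance (assumed policy).
        overlap = len(query_tags & set(memory["tags"]))
        return (overlap, memory["importance"])

    ranked = sorted(active, key=score, reverse=True)[:limit]
    if not ranked:
        return ""
    lines = [BRAIN_BLOCK_HEADER] + [f"- {m['title']}: {m['content']}" for m in ranked]
    return "\n".join(lines)


memories = [
    {"title": "Graph is a projection", "content": "Not the source of truth.",
     "importance": 0.9, "tags": ["architecture"], "status": "active"},
    {"title": "Invoices go out monthly", "content": "Sent on the first working day.",
     "importance": 0.5, "tags": ["billing"], "status": "active"},
    {"title": "Old billing vendor", "content": "Replaced last year.",
     "importance": 0.95, "tags": ["billing"], "status": "archived"},
]
block = build_brain_context(memories, {"billing"}, limit=1)
```

The hard `limit` is what keeps the injected context from growing with the size of the memory store.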


8. Backend Services to Add or Refactor

New services

  1. brain_event_service.py

    • normalize incoming source data into BrainEvent rows
    • provide source-specific helper constructors
  2. brain_learning_service.py

    • run daily candidate extraction
    • score, dedupe, and promote memories
  3. brain_tag_service.py

    • manage tags, scoring, priority updates, and cleanup suggestions
  4. brain_retrieval_service.py

    • retrieve relevant memories and recent events for chat and UI

Existing services to extend

  • memory_service.py: integrate BrainMemory retrieval and possibly migrate UserMemory into the new model later
  • scheduler_service.py: register brain daily learning job
  • agent_service.py: inject retrieved brain context into chat pipeline
  • document_service.py, todo_service.py, task/forum write paths: emit BrainEvent rows

9. API Plan

Phase 1 should add a dedicated /api/brain router.

Read APIs

  • GET /api/brain/overview

    • counts: active memories, candidates, important tags, recent events
    • today's learning summary
  • GET /api/brain/memories

    • filters: tag, type, status, date range, source type
  • GET /api/brain/candidates

    • filters: status, date, score threshold
  • GET /api/brain/tags

    • segmented into important and secondary
  • GET /api/brain/timeline

    • grouped by day/week; includes events, candidate promotions, reinforced memories
  • GET /api/brain/memory/{id}

    • full traceability including linked events and tags
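
The day/week grouping behind the timeline endpoint might be implemented roughly as follows. The entry shape (a dict with an ISO `date` string), the `group_timeline` name, and the ISO-week bucket label format are assumptions for this sketch.

```python
from collections import defaultdict
from datetime import date
from typing import Any, Dict, List


def group_timeline(entries: List[Dict[str, Any]], granularity: str = "day") -> Dict[str, List[Dict[str, Any]]]:
    """Bucket timeline entries by calendar day or ISO week, in ascending order."""
    if granularity not in ("day", "week"):
        raise ValueError("granularity must be 'day' or 'week'")
    buckets: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for entry in entries:
        day = date.fromisoformat(entry["date"])
        if granularity == "week":
            iso = day.isocalendar()
            key = f"{iso[0]}-W{iso[1]:02d}"  # ISO year and ISO week number
        else:
            key = day.isoformat()
        buckets[key].append(entry)
    return dict(sorted(buckets.items()))


entries = [
    {"date": "2026-03-23", "kind": "memory_reinforced", "title": "Prefers short answers"},
    {"date": "2026-03-20", "kind": "event", "title": "Blueprint drafted"},
    {"date": "2026-03-22", "kind": "candidate_promoted", "title": "Graph is a projection"},
]
by_day = group_timeline(entries, "day")
by_week = group_timeline(entries, "week")
```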

Write/management APIs

  • POST /api/brain/memory/{id}/promote
  • POST /api/brain/memory/{id}/archive
  • DELETE /api/brain/memory/{id}
  • POST /api/brain/tag/{id}/promote
  • POST /api/brain/tag/{id}/demote
  • DELETE /api/brain/tag/{id}
  • POST /api/brain/learn/run
    • manual trigger for daily learning pipeline

Compatibility note

Do not remove /api/graph in phase 1. Keep it as a legacy projection route while the new brain module is introduced.


10. Frontend Module Structure

The current 知识大脑 ("Knowledge Brain") nav item should stop meaning “graph only” and become a real brain dashboard.

Route strategy

Preferred phase 1 structure:

  • /brain → new knowledge brain dashboard
  • /graph → graph view tab or subview under the brain module, retained for relation visualization

Brain dashboard sections

  1. Overview header

    • total active memories
    • today's learned items
    • important tags count
    • last learning run
  2. Important tags panel

    • AI-ranked important tags
    • click to filter related memories and timeline entries
  3. Secondary tags panel

    • lower-priority tags with cleanup actions
  4. Recent learned knowledge

    • newly promoted memories
    • reasons and source badges
  5. Timeline panel

    • daily grouped events and promotions
    • support time-based backtracking
  6. Graph subview

    • optional tab or secondary panel for relation projection

User actions in phase 1

  • delete memory
  • archive memory
  • promote/demote tag priority
  • manually trigger learning run
  • inspect why a memory exists

This is enough to make the brain visible and manageable even before advanced graph reasoning exists.


11. Suggested Delivery Breakdown

Step 1: Persistence foundation

  • add brain models and migrations
  • add SQLAlchemy registrations and schemas

Step 2: Event ingestion

  • emit BrainEvent rows from conversation/document/todo/task/forum flows

Step 3: Learning workflow

  • implement daily learning job and manual trigger API

Step 4: Retrieval integration

  • wire BrainMemory into chat context assembly

Step 5: Brain dashboard backend

  • add overview, memories, tags, timeline endpoints

Step 6: Brain dashboard frontend

  • add /brain page and move graph into a subview or separate tab

12. Risks and Guardrails

Main risks

  • over-collection leading to noisy memories;
  • prompt bloat from injecting too much brain context;
  • duplicate memory creation across repeated daily runs;
  • unclear distinction between candidate and durable memory;
  • UI becoming graph-centric again instead of brain-centric.

Guardrails

  • enforce candidate layer before promotion;
  • cap retrieval size strictly;
  • keep source traceability for every promoted memory;
  • make tag cleanup explicit in UI;
  • treat graph as a projection, not the source of truth.

13. Phase 1 Success Criteria

Phase 1 is successful when all of the following are true:

  • the system creates normalized BrainEvent rows from all five major source domains;
  • a scheduled daily learning job produces candidates and promotes high-value memories;
  • Jarvis can retrieve durable brain memories during future answers;
  • the frontend exposes a real brain dashboard with tags, recent knowledge, and timeline;
  • users can inspect and clean what the system learned;
  • the old graph page is no longer the only visible representation of the brain.