Jarvis Knowledge Brain Phase 1 Blueprint
1. Phase 1 Goal
Phase 1 establishes the first production-ready version of Jarvis's event-driven knowledge brain. The objective is not to finish the entire intelligence system, but to create the minimum architecture that lets Jarvis ingest key user actions from across the product, learn from them on a daily schedule, store only high-value knowledge, and retrieve that knowledge during future conversations.
Phase 1 should make the brain real in six ways:
- unify source events across core modules;
- create an intermediate candidate-learning layer;
- promote durable knowledge into long-term brain memory;
- maintain tags and time-aware traceability;
- expose APIs for inspection and management;
- allow the chat system to retrieve brain knowledge during answers.
2. Scope Boundaries
In scope
- New persistence models for brain events, candidates, memories, tags, and relationships.
- Ingestion of source signals from conversations, knowledge documents, todos, kanban tasks, and forum posts.
- A daily autonomous learning pipeline that tags, scores, deduplicates, and upgrades knowledge.
- Retrieval integration for future responses.
- Brain dashboard APIs.
- A new frontend brain module structure replacing the current graph-only mental model.
Out of scope for phase 1
- Full graph-native reasoning engine.
- Fully autonomous suggestion orchestration across all screens.
- Complex reinforcement-learning style adaptation.
- Fine-grained user-tunable learning policy UI.
- Automatic deletion and archival heuristics beyond simple status transitions.
3. Target Architecture
Phase 1 should introduce a four-layer brain pipeline:
- Source Records. Existing domain tables remain the source of truth: messages, documents/chunks, todos, tasks, forum posts/replies.
- BrainEvent. A normalized event layer representing meaningful user/system actions. This is the single intake format for downstream learning.
- BrainCandidate. AI-generated candidate knowledge distilled from one or more events. Candidates are scored, tagged, typed, and traced back to source events.
- BrainMemory. Durable long-term memory that Jarvis can retrieve during future interactions. This becomes the brain's core persistence layer.
Graph visualization should be treated as a projection layer, not the primary storage model. In later phases, graph nodes and edges can be generated from BrainMemory records and their relationships.
4. Data Model Additions
4.1 BrainEvent
Purpose: normalized raw learning input.
Recommended fields:
- id
- user_id
- source_type (conversation, document, todo, task, forum_post, forum_reply)
- source_id
- event_type (created, updated, completed, mentioned, uploaded, resolved, marked_important, etc.)
- occurred_at
- event_date
- title
- content_summary
- raw_excerpt
- metadata_ (JSON; source-specific facts such as conversation_id, task status, folder path)
- importance_signal (numeric seed score)
- is_user_pinned
- processed_at
- status (pending, processed, ignored)
Indexes:
- (user_id, event_date)
- (user_id, source_type, source_id)
- (user_id, status, occurred_at)
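The field list above can be sketched as a plain Python dataclass. This is only an illustration under stated assumptions: the enum class names, the defaults, and the choice to derive event_date from occurred_at are not specified by this blueprint, and the real persistence model would be defined in the project's ORM.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from enum import Enum
from typing import Any, Optional


class SourceType(str, Enum):
    CONVERSATION = "conversation"
    DOCUMENT = "document"
    TODO = "todo"
    TASK = "task"
    FORUM_POST = "forum_post"
    FORUM_REPLY = "forum_reply"


class EventStatus(str, Enum):
    PENDING = "pending"
    PROCESSED = "processed"
    IGNORED = "ignored"


@dataclass
class BrainEvent:
    user_id: int
    source_type: SourceType
    source_id: str
    event_type: str
    occurred_at: datetime
    title: str
    content_summary: str = ""
    raw_excerpt: str = ""
    metadata_: dict[str, Any] = field(default_factory=dict)
    importance_signal: float = 0.0
    is_user_pinned: bool = False
    processed_at: Optional[datetime] = None
    status: EventStatus = EventStatus.PENDING
    id: Optional[int] = None

    @property
    def event_date(self) -> date:
        # Assumption: event_date is derived from occurred_at so the two cannot disagree.
        return self.occurred_at.date()


# Hypothetical example event for a completed todo.
event = BrainEvent(
    user_id=1,
    source_type=SourceType.TODO,
    source_id="42",
    event_type="completed",
    occurred_at=datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc),
    title="Ship brain models",
)
```

New events start in the pending status, which is what the daily learning job is expected to pick up.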
4.2 BrainCandidate
Purpose: intermediate learned knowledge awaiting acceptance into durable memory.
Recommended fields:
- id
- user_id
- candidate_type (preference, habit, project_fact, decision, solution, topic, goal, temporary_focus)
- title
- summary
- importance_score
- confidence_score
- time_scope (short_term, phase, long_term)
- valid_from
- valid_to
- source_event_ids (JSON array)
- reasoning_trace (short explanation of why the system extracted it)
- status (new, promoted, rejected, merged)
- created_at
- reviewed_at
4.3 BrainMemory
Purpose: durable brain knowledge used at retrieval time.
Recommended fields:
- id
- user_id
- memory_type (preference, habit, goal, project_fact, decision, solution, topic_profile)
- title
- content
- importance
- confidence
- timeline_date
- first_learned_at
- last_reinforced_at
- reinforcement_count
- status (active, archived, deleted)
- origin_candidate_id
- origin_source_types (JSON array)
- metadata_ (JSON)
4.4 BrainTag
Purpose: independent tagging layer for brain browsing, filtering, and scoring.
Recommended fields:
- id
- user_id
- name
- category (topic, value, time, source)
- priority (important, secondary)
- score
- last_seen_at
- created_at
4.5 Link Tables
Add many-to-many link tables:
- brain_event_tags
- brain_candidate_tags
- brain_memory_tags
- optional brain_memory_events for direct memory-to-event traceability beyond JSON arrays
These link tables are critical because phase 1 needs tag filters and timeline tracing before advanced graph projection exists.
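As a rough illustration of why a link table makes tag filtering straightforward, the following sketch uses an in-memory SQLite database. Only brain_memory_tags is named in this blueprint; the brain_memories and brain_tags table names, the reduced column set, and the memories_with_tag helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE brain_memories (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        title TEXT NOT NULL
    );
    CREATE TABLE brain_tags (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        name TEXT NOT NULL,
        UNIQUE (user_id, name)
    );
    -- Many-to-many link table: one row per (memory, tag) pair.
    CREATE TABLE brain_memory_tags (
        memory_id INTEGER NOT NULL REFERENCES brain_memories(id),
        tag_id INTEGER NOT NULL REFERENCES brain_tags(id),
        PRIMARY KEY (memory_id, tag_id)
    );
    """
)
conn.executemany(
    "INSERT INTO brain_memories VALUES (?, ?, ?)",
    [(1, 7, "Prefers weekly summaries"), (2, 7, "Project X uses Postgres")],
)
conn.executemany(
    "INSERT INTO brain_tags VALUES (?, ?, ?)",
    [(1, 7, "preferences"), (2, 7, "project-x")],
)
conn.executemany("INSERT INTO brain_memory_tags VALUES (?, ?)", [(1, 1), (2, 2)])


def memories_with_tag(db: sqlite3.Connection, user_id: int, tag_name: str) -> list[str]:
    """Return memory titles for one user that carry the given tag."""
    rows = db.execute(
        """
        SELECT m.title
        FROM brain_memories AS m
        JOIN brain_memory_tags AS mt ON mt.memory_id = m.id
        JOIN brain_tags AS t ON t.id = mt.tag_id
        WHERE m.user_id = ? AND t.name = ?
        ORDER BY m.id
        """,
        (user_id, tag_name),
    ).fetchall()
    return [title for (title,) in rows]
```

The same join shape would apply to brain_event_tags and brain_candidate_tags, which is what lets the timeline and candidate views share one tag filter.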
5. Ingestion Strategy
Phase 1 should not rewrite existing modules. Instead, it should add thin ingestion hooks near existing write paths.
Conversation ingestion
Trigger points:
- after user message creation
- after assistant completion
- after memory extraction / summary creation
Event examples:
- important user instruction
- explicit “remember this” request
- repeated topic cluster
- conversation-derived decision or unresolved goal
Document ingestion
Trigger points:
- after upload success
- after indexing completes
- after manual chunk edits
Event examples:
- document uploaded
- document indexed
- high-value section discovered
- document summary available
Todo ingestion
Trigger points:
- todo created
- todo completed
- AI-generated todo created
Event examples:
- planned work item
- recurring operational duty
- completion signal reflecting actual user focus
Task/Kanban ingestion
Trigger points:
- task created
- task status changed
- task completed
- priority changed
Event examples:
- declared project goal
- active workstream
- resolved milestone
Forum ingestion
Trigger points:
- post created
- reply created
- forum instruction executed or referenced
Event examples:
- public project decision
- repeated operational issue
- reusable explanation or solution
Implementation note: source ingestion should create BrainEvent rows synchronously or via lightweight background tasks, but should not block the original user flow.
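One way to keep a hook thin is to wrap the event write so that an ingestion failure can never fail the original operation. The sketch below is an assumption-heavy illustration: emit_brain_event_safely, complete_todo, and the payload keys are hypothetical names, and the write_event callable stands in for whatever the real brain event service exposes.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("brain.ingestion")


def emit_brain_event_safely(
    write_event: Callable[[dict[str, Any]], None],
    payload: dict[str, Any],
) -> bool:
    """Attempt to record a brain event without ever raising into the caller."""
    try:
        write_event(payload)
        return True
    except Exception:
        # Ingestion errors are logged and dropped so the source write still succeeds.
        logger.exception("brain event ingestion failed: %s", payload.get("event_type"))
        return False


def complete_todo(
    todo: dict[str, Any],
    write_event: Callable[[dict[str, Any]], None],
) -> dict[str, Any]:
    # Original user flow: the todo is marked done first.
    todo["completed"] = True
    # Thin hook: a failure here must not undo or fail the completion.
    emit_brain_event_safely(
        write_event,
        {
            "source_type": "todo",
            "source_id": todo["id"],
            "event_type": "completed",
            "title": todo["title"],
        },
    )
    return todo
```

This version still runs the write inline; handing write_event to a lightweight background task would remove the remaining latency from the user flow.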
6. Learning and Promotion Pipeline
Phase 1 should add a new daily scheduler workflow dedicated to the brain.
New scheduler job: brain_daily_learning_task
Suggested run: once daily after the bulk of user activity, for example at 01:00; the run time can be made configurable per user later.
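A small sketch of the scheduling arithmetic, assuming a fixed 01:00 run time. Both function names are hypothetical, and treating the previous calendar day as the target date is an assumption rather than something this blueprint specifies.

```python
from datetime import date, datetime, time, timedelta


def next_daily_run(now: datetime, run_at: time = time(1, 0)) -> datetime:
    """Return the next occurrence of run_at strictly after now."""
    candidate = datetime.combine(now.date(), run_at, tzinfo=now.tzinfo)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def target_event_date(run_time: datetime) -> date:
    # Assumption: a run shortly after midnight learns from the previous day's events.
    return run_time.date() - timedelta(days=1)
```

With these defaults, a job firing at 01:00 on 2 May would process events whose event_date is 1 May.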
Pipeline steps:
- collect unprocessed BrainEvent rows for the target date;
- cluster by source, topic, and repeated patterns;
- ask the LLM to produce candidate knowledge with tags and importance explanations;
- deduplicate against existing BrainMemory by semantic and rule-based matching;
- promote high-confidence candidates into BrainMemory;
- mark low-value candidates rejected or retained as observation-only;
- refresh tag scores and priority levels;
- mark consumed events as processed.
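The steps above can be sketched as a single orchestration function. This is a simplified illustration, not the planned implementation: the Event and Candidate classes are reduced stand-ins, the extract callable replaces the LLM call, deduplication is a plain title match, the 0.8 threshold is an arbitrary assumption, and tag score refresh is omitted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    id: int
    topic: str
    status: str = "pending"


@dataclass
class Candidate:
    title: str
    confidence: float
    source_event_ids: list[int]
    status: str = "new"


def run_daily_learning(
    events: list[Event],
    existing_titles: set[str],
    extract: Callable[[str, list[Event]], Candidate],
    promote_threshold: float = 0.8,
) -> tuple[list[Candidate], list[str]]:
    # 1. Collect unprocessed events.
    pending = [e for e in events if e.status == "pending"]
    # 2. Cluster (here: by topic only).
    clusters: dict[str, list[Event]] = {}
    for e in pending:
        clusters.setdefault(e.topic, []).append(e)

    candidates: list[Candidate] = []
    promoted: list[str] = []
    for topic, group in clusters.items():
        # 3. Produce a candidate (stand-in for the LLM extraction step).
        cand = extract(topic, group)
        # 4. Rule-based dedupe against existing memory titles.
        if cand.title.lower() in existing_titles:
            cand.status = "merged"
        # 5. Promote high-confidence candidates.
        elif cand.confidence >= promote_threshold:
            cand.status = "promoted"
            promoted.append(cand.title)
        # 6. Otherwise the candidate stays "new" (observation-only).
        candidates.append(cand)

    # 8. Mark consumed events as processed so a rerun does not pick them up again.
    for e in pending:
        e.status = "processed"
    return candidates, promoted
```

Because consumed events are marked processed at the end, running the function twice on the same list produces no new candidates the second time.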
Promotion rules for phase 1
Promote automatically when any of these are true:
- user explicitly requested the system to remember something;
- the same topic appears across multiple sources;
- a solution/decision was formed and looks reusable;
- a stable preference or habit is seen repeatedly;
- a task/todo/forum thread confirms relevance with user action.
Keep as candidate-only when:
- information is recent but not yet stable;
- importance is uncertain;
- it appears only once without reinforcement.
Reject when:
- content is obviously transient;
- it is too generic to help future answers;
- it duplicates active memory without adding new value.
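The three rule groups can be expressed as one decision function. The sketch below rests on assumptions this section does not settle: the CandidateSignals field names are hypothetical, "multiple" and "repeatedly" are read as two or more, and rejection checks are evaluated before promotion checks.

```python
from dataclasses import dataclass


@dataclass
class CandidateSignals:
    user_requested_remember: bool = False
    distinct_source_types: int = 1
    is_reusable_solution_or_decision: bool = False
    repeat_count: int = 1
    confirmed_by_user_action: bool = False
    is_transient: bool = False
    is_too_generic: bool = False
    duplicates_active_memory: bool = False


def decide(signals: CandidateSignals) -> str:
    """Return 'reject', 'promote', or 'keep' for a candidate."""
    # Assumption: rejection reasons take precedence over promotion reasons.
    if signals.is_transient or signals.is_too_generic or signals.duplicates_active_memory:
        return "reject"
    if (
        signals.user_requested_remember
        or signals.distinct_source_types >= 2
        or signals.is_reusable_solution_or_decision
        or signals.repeat_count >= 2
        or signals.confirmed_by_user_action
    ):
        return "promote"
    # Seen once, not yet stable, or of uncertain importance: keep as candidate-only.
    return "keep"
```

Keeping the rules in one pure function makes the precedence order easy to test and to change once real data shows which ordering works.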
7. Retrieval Integration
Phase 1 must let chat use the brain in a controlled way.
New retrieval service
Add a dedicated brain_retrieval_service or extend memory_service with brain-aware retrieval APIs.
Responsibilities:
- retrieve top relevant BrainMemory rows by query, tags, time context, and importance;
- optionally retrieve recent BrainEvent summaries for recency-sensitive answers;
- merge existing UserMemory and MemorySummary into one retrieval result shape;
- support limits to avoid prompt bloat.
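A minimal sketch of a shared result shape plus a capped ranking step. The MemoryHit class, its origin field, and the scoring rule (tag overlap first, then importance) are assumptions for illustration; semantic matching against the query text is left out.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryHit:
    title: str
    content: str
    importance: float  # assumed to be in the range 0..1
    tags: set[str] = field(default_factory=set)
    # One shape for every origin: "brain_memory", "user_memory", or "memory_summary".
    origin: str = "brain_memory"


def rank_memories(hits: list[MemoryHit], query_tags: set[str], limit: int = 5) -> list[MemoryHit]:
    """Keep hits sharing at least one tag with the query, best first, capped at limit."""
    relevant = [h for h in hits if h.tags & query_tags]
    ranked = sorted(
        relevant,
        key=lambda h: (len(h.tags & query_tags), h.importance),
        reverse=True,
    )
    return ranked[:limit]
```

The limit parameter is the hook for the prompt-bloat guard: callers decide how many hits the chat context can afford.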
Retrieval policy
At answer time:
- always consider long-term BrainMemory;
- include recent event summaries only when the question appears time-sensitive or project-state-sensitive;
- cap injected brain context to a small curated set.
Recommended first integration path:
- extend build_memory_context() to append a new 【知识大脑】 ("Knowledge Brain") block built from BrainMemory retrieval;
- keep existing conversation summary logic intact.
This gives immediate product value without requiring a full prompt orchestration rewrite.
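The integration path above can be sketched as a helper that appends a capped block to whatever context string already exists. The real signature of build_memory_context() is not shown in this blueprint, so append_brain_block, build_brain_block, and the item and character caps are hypothetical.

```python
BRAIN_BLOCK_HEADER = "【知识大脑】"


def build_brain_block(
    memories: list[tuple[str, str]],
    max_items: int = 5,
    max_chars: int = 600,
) -> str:
    """Render (title, content) pairs as a bullet block, capped by count and size."""
    lines: list[str] = []
    used = 0
    for title, content in memories[:max_items]:
        line = f"- {title}: {content}"
        if used + len(line) > max_chars:
            break
        lines.append(line)
        used += len(line)
    if not lines:
        return ""
    return BRAIN_BLOCK_HEADER + "\n" + "\n".join(lines)


def append_brain_block(existing_context: str, memories: list[tuple[str, str]]) -> str:
    # The existing conversation summary text is passed through unchanged.
    block = build_brain_block(memories)
    if not block:
        return existing_context
    return f"{existing_context}\n\n{block}"
```

When no memory is retrieved, the context is returned untouched, so the chat prompt is identical to the pre-brain behavior.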
8. Backend Services to Add or Refactor
New services
- brain_event_service.py
  - normalize incoming source data into BrainEvent rows
  - provide source-specific helper constructors
- brain_learning_service.py
  - run daily candidate extraction
  - score, dedupe, and promote memories
- brain_tag_service.py
  - manage tags, scoring, priority updates, and cleanup suggestions
- brain_retrieval_service.py
  - retrieve relevant memories and recent events for chat and UI
Existing services to extend
- memory_service.py: integrate BrainMemory retrieval and possibly migrate UserMemory into the new model later
- scheduler_service.py: register brain daily learning job
- agent_service.py: inject retrieved brain context into chat pipeline
- document_service.py, todo_service.py, task/forum write paths: emit BrainEvent rows
9. API Plan
Phase 1 should add a dedicated /api/brain router.
Read APIs
- GET /api/brain/overview
  - counts: active memories, candidates, important tags, recent events
  - today's learning summary
- GET /api/brain/memories
  - filters: tag, type, status, date range, source type
- GET /api/brain/candidates
  - filters: status, date, score threshold
- GET /api/brain/tags
  - segmented into important and secondary
- GET /api/brain/timeline
  - grouped by day/week; includes events, candidate promotions, reinforced memories
- GET /api/brain/memory/{id}
  - full traceability including linked events and tags
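As an illustration of the day-grouped shape the timeline endpoint could return, the following sketch buckets mixed entries by calendar day. The entry keys (kind, occurred_at) and the response layout are assumptions, not a defined contract.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any


def group_timeline_by_day(entries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Group entries by ISO date, newest day first, oldest entry first within a day."""
    buckets: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for entry in entries:
        buckets[entry["occurred_at"].date().isoformat()].append(entry)
    return [
        {"date": day, "items": sorted(buckets[day], key=lambda e: e["occurred_at"])}
        for day in sorted(buckets, reverse=True)
    ]
```

Weekly grouping would follow the same pattern with a different bucket key, for example the ISO year and week number of occurred_at.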
Write/management APIs
- POST /api/brain/memory/{id}/promote
- POST /api/brain/memory/{id}/archive
- DELETE /api/brain/memory/{id}
- POST /api/brain/tag/{id}/promote
- POST /api/brain/tag/{id}/demote
- DELETE /api/brain/tag/{id}
- POST /api/brain/learn/run
  - manual trigger for daily learning pipeline
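The archive and delete endpoints amount to simple status transitions on BrainMemory. A hedged sketch follows; the transition table is an assumption (in particular, restoring an archived memory to active and treating deleted as final are not stated in this blueprint), and the transition function name is hypothetical.

```python
# Assumed transition table for BrainMemory.status.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "active": {"archived", "deleted"},
    "archived": {"active", "deleted"},
    "deleted": set(),
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move memory from {current!r} to {target!r}")
    return target
```

A route handler could call this before persisting the change and map the ValueError to a client error response.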
Compatibility note
Do not remove /api/graph in phase 1. Keep it as a legacy projection route while the new brain module is introduced.
10. Frontend Module Structure
The current 知识大脑 (Knowledge Brain) nav item should stop meaning “graph only” and instead open a real brain dashboard.
Route strategy
Preferred phase 1 structure:
- /brain → new knowledge brain dashboard
- /graph → graph view tab or subview under the brain module, retained for relation visualization
Brain dashboard sections
- Overview header
  - total active memories
  - today's learned items
  - important tags count
  - last learning run
- Important tags panel
  - AI-ranked important tags
  - click to filter related memories and timeline entries
- Secondary tags panel
  - lower-priority tags with cleanup actions
- Recent learned knowledge
  - newly promoted memories
  - reasons and source badges
- Timeline panel
  - daily grouped events and promotions
  - support time-based backtracking
- Graph subview
  - optional tab or secondary panel for relation projection
User actions in phase 1
- delete memory
- archive memory
- promote/demote tag priority
- manually trigger learning run
- inspect why a memory exists
This is enough to make the brain visible and manageable even before advanced graph reasoning exists.
11. Suggested Delivery Breakdown
Step 1: Persistence foundation
- add brain models and migrations
- add SQLAlchemy registrations and schemas
Step 2: Event ingestion
- emit BrainEvent rows from conversation/document/todo/task/forum flows
Step 3: Learning workflow
- implement daily learning job and manual trigger API
Step 4: Retrieval integration
- wire BrainMemory into chat context assembly
Step 5: Brain dashboard backend
- add overview, memories, tags, timeline endpoints
Step 6: Brain dashboard frontend
- add /brain page and move graph into a subview or separate tab
12. Risks and Guardrails
Main risks
- over-collection leading to noisy memories;
- prompt bloat from injecting too much brain context;
- duplicate memory creation across repeated daily runs;
- unclear distinction between candidate and durable memory;
- UI becoming graph-centric again instead of brain-centric.
Guardrails
- enforce candidate layer before promotion;
- cap retrieval size strictly;
- keep source traceability for every promoted memory;
- make tag cleanup explicit in UI;
- treat graph as a projection, not the source of truth.
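One way to address the duplicate-memory risk across repeated daily runs is to reinforce an existing memory instead of inserting a second row. The sketch below is illustrative only: the Memory class is a reduced stand-in for BrainMemory, the in-memory dict replaces the database, and matching on a normalized title is a deliberately simple substitute for the semantic and rule-based matching described earlier.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Memory:
    title: str
    first_learned_at: datetime
    last_reinforced_at: datetime
    reinforcement_count: int = 0


def normalize(title: str) -> str:
    # Assumption: case and whitespace differences do not make a new memory.
    return " ".join(title.lower().split())


def upsert_memory(store: dict[str, Memory], title: str, learned_at: datetime) -> Memory:
    """Create the memory once; later sightings only reinforce it."""
    key = normalize(title)
    existing = store.get(key)
    if existing is None:
        store[key] = Memory(title, learned_at, learned_at)
        return store[key]
    existing.last_reinforced_at = max(existing.last_reinforced_at, learned_at)
    existing.reinforcement_count += 1
    return existing
```

Running the same promotion twice leaves one memory with an incremented reinforcement_count, which also feeds the last_reinforced_at field used for timeline display.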
13. Phase 1 Success Criteria
Phase 1 is successful when all of the following are true:
- the system creates normalized BrainEvent rows from all five major source domains;
- a scheduled daily learning job produces candidates and promotes high-value memories;
- Jarvis can retrieve durable brain memories during future answers;
- the frontend exposes a real brain dashboard with tags, recent knowledge, and timeline;
- users can inspect and clean what the system learned;
- the old graph page is no longer the only visible representation of the brain.