DESKTOP-72TV0V4\caoxiaozhu 6f594631e9 Refine knowledge brain workflow

Align the brain prompts, graph view, and startup defaults with the
latest phase 1 flow so local runs and navigation stay consistent.

2026-03-22 22:42:47 +08:00

Jarvis Knowledge Brain Phase 1 Blueprint

1. Phase 1 Goal

Phase 1 establishes the first production-ready version of Jarvis's event-driven knowledge brain. The objective is not to finish the entire intelligence system, but to create the minimum architecture that lets Jarvis ingest key user actions from across the product, learn from them on a daily schedule, store only high-value knowledge, and retrieve that knowledge during future conversations.

Phase 1 should make the brain real in six ways:

unify source events across core modules;
create an intermediate candidate-learning layer;
promote durable knowledge into long-term brain memory;
maintain tags and time-aware traceability;
expose APIs for inspection and management;
allow the chat system to retrieve brain knowledge during answers.

2. Scope Boundaries

In scope

New persistence models for brain events, candidates, memories, tags, and relationships.
Ingestion of source signals from conversations, knowledge documents, todos, kanban tasks, and forum posts.
A daily autonomous learning pipeline that tags, scores, deduplicates, and upgrades knowledge.
Retrieval integration for future responses.
Brain dashboard APIs.
A new frontend brain module structure replacing the current graph-only mental model.

Out of scope for phase 1

Full graph-native reasoning engine.
Fully autonomous suggestion orchestration across all screens.
Complex reinforcement-learning style adaptation.
Fine-grained user-tunable learning policy UI.
Automatic deletion and archival heuristics beyond simple status transitions.

3. Target Architecture

Phase 1 should introduce a four-layer brain pipeline:

Source Records: existing domain tables (messages, documents/chunks, todos, tasks, forum posts/replies) remain the source of truth.
BrainEvent: a normalized event layer representing meaningful user/system actions. This is the single intake format for downstream learning.
BrainCandidate: AI-generated candidate knowledge distilled from one or more events. Candidates are scored, tagged, typed, and traced back to source events.
BrainMemory: durable long-term memory that Jarvis can retrieve during future interactions. This becomes the brain's core persistence layer.

Graph visualization should be treated as a projection layer, not the primary storage model. In later phases, graph nodes and edges can be generated from BrainMemory records and their relationships.
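
To make the traceability between layers concrete, here is a minimal Python sketch that assumes simplified in-memory records. The dataclass shapes and the `trace_memory` helper are illustrative assumptions, not the project's actual models; only the layer names and the `source_event_ids` and `origin_candidate_id` fields come from this blueprint.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BrainEvent:
    id: int
    source_type: str  # e.g. "todo", "conversation"
    source_id: int    # id of the row in the existing domain table (assumed integer)


@dataclass
class BrainCandidate:
    id: int
    summary: str
    source_event_ids: List[int]


@dataclass
class BrainMemory:
    id: int
    content: str
    origin_candidate_id: int


def trace_memory(
    memory: BrainMemory,
    candidates: Dict[int, BrainCandidate],
    events: Dict[int, BrainEvent],
) -> List[Tuple[str, int]]:
    """Walk memory -> candidate -> events and return the source records behind it."""
    candidate = candidates[memory.origin_candidate_id]
    return [
        (events[eid].source_type, events[eid].source_id)
        for eid in candidate.source_event_ids
    ]


# Hypothetical sample data, for illustration only.
events = {1: BrainEvent(1, "conversation", 501), 2: BrainEvent(2, "todo", 42)}
candidates = {7: BrainCandidate(7, "User is preparing the phase 1 launch", [1, 2])}
memory = BrainMemory(100, "User is preparing the phase 1 launch", origin_candidate_id=7)
sources = trace_memory(memory, candidates, events)
```

Each memory resolves back to rows in the existing domain tables, which is what lets a graph view be derived later as a projection rather than stored separately.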

4. Data Model Additions

4.1 BrainEvent

Purpose: normalized raw learning input.

Recommended fields:

id
user_id
source_type (conversation, document, todo, task, forum_post, forum_reply)
source_id
event_type (created, updated, completed, mentioned, uploaded, resolved, marked_important, etc.)
occurred_at
event_date
title
content_summary
raw_excerpt
metadata_ (JSON; source-specific facts such as conversation_id, task status, folder path)
importance_signal (numeric seed score)
is_user_pinned
processed_at
status (pending, processed, ignored)

Indexes:

(user_id, event_date)
(user_id, source_type, source_id)
(user_id, status, occurred_at)
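
As a rough illustration of the field list and indexes above, the following sketch creates the table with Python's standard-library `sqlite3` module so it runs without dependencies. The project itself appears to use SQLAlchemy, so treat the table name `brain_events`, the column types, the index names, and the CHECK constraints as assumptions rather than the actual migration.

```python
import sqlite3

# Column types and defaults are assumptions; source_id may not be an integer in practice.
BRAIN_EVENT_DDL = """
CREATE TABLE brain_events (
    id                INTEGER PRIMARY KEY,
    user_id           INTEGER NOT NULL,
    source_type       TEXT NOT NULL CHECK (source_type IN
                        ('conversation', 'document', 'todo', 'task', 'forum_post', 'forum_reply')),
    source_id         INTEGER NOT NULL,
    event_type        TEXT NOT NULL,
    occurred_at       TEXT NOT NULL,
    event_date        TEXT NOT NULL,
    title             TEXT,
    content_summary   TEXT,
    raw_excerpt       TEXT,
    metadata_         TEXT NOT NULL DEFAULT '{}',
    importance_signal REAL NOT NULL DEFAULT 0,
    is_user_pinned    INTEGER NOT NULL DEFAULT 0,
    processed_at      TEXT,
    status            TEXT NOT NULL DEFAULT 'pending'
                        CHECK (status IN ('pending', 'processed', 'ignored'))
);
CREATE INDEX ix_brain_events_user_date   ON brain_events (user_id, event_date);
CREATE INDEX ix_brain_events_user_source ON brain_events (user_id, source_type, source_id);
CREATE INDEX ix_brain_events_user_status ON brain_events (user_id, status, occurred_at);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(BRAIN_EVENT_DDL)

# Insert one hypothetical event; status falls back to 'pending'.
conn.execute(
    "INSERT INTO brain_events "
    "(user_id, source_type, source_id, event_type, occurred_at, event_date, title) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    (1, "todo", 42, "completed", "2026-03-22T10:00:00", "2026-03-22", "Ship phase 1"),
)

# The (user_id, status, occurred_at) index is shaped for this kind of lookup.
pending = conn.execute(
    "SELECT title, status FROM brain_events "
    "WHERE user_id = ? AND status = 'pending' ORDER BY occurred_at",
    (1,),
).fetchall()
```

The third index matches the access pattern of the daily learning job, which reads a user's unprocessed events in time order.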

4.2 BrainCandidate

Purpose: intermediate learned knowledge awaiting acceptance into durable memory.

Recommended fields:

id
user_id
candidate_type (preference, habit, project_fact, decision, solution, topic, goal, temporary_focus)
title
summary
importance_score
confidence_score
time_scope (short_term, phase, long_term)
valid_from
valid_to
source_event_ids (JSON array)
reasoning_trace (short explanation of why the system extracted it)
status (new, promoted, rejected, merged)
created_at
reviewed_at

4.3 BrainMemory

Purpose: durable brain knowledge used at retrieval time.

Recommended fields:

id
user_id
memory_type (preference, habit, goal, project_fact, decision, solution, topic_profile)
title
content
importance
confidence
timeline_date
first_learned_at
last_reinforced_at
reinforcement_count
status (active, archived, deleted)
origin_candidate_id
origin_source_types (JSON array)
metadata_ (JSON)

4.4 BrainTag

Purpose: independent tagging layer for brain browsing, filtering, and scoring.

Recommended fields:

id
user_id
name
category (topic, value, time, source)
priority (important, secondary)
score
last_seen_at
created_at

4.5 Link Tables

Add many-to-many link tables:

brain_event_tags
brain_candidate_tags
brain_memory_tags
optional brain_memory_events for direct memory-to-event traceability beyond JSON arrays

These link tables are critical because phase 1 needs tag filters and timeline tracing before advanced graph projection exists.
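
The tag filter that these link tables enable can be sketched as a plain join. This example uses the standard-library `sqlite3` module and heavily trimmed tables; the table names `brain_memories` and `brain_tags`, the column names `memory_id` and `tag_id`, and the helper `memories_for_tag` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE brain_memories (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, title TEXT NOT NULL);
    CREATE TABLE brain_tags     (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, name TEXT NOT NULL);
    CREATE TABLE brain_memory_tags (
        memory_id INTEGER NOT NULL REFERENCES brain_memories (id),
        tag_id    INTEGER NOT NULL REFERENCES brain_tags (id),
        PRIMARY KEY (memory_id, tag_id)  -- one link per memory/tag pair
    );
    """
)

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO brain_memories VALUES (?, ?, ?)",
    [(1, 1, "Prefers concise answers"), (2, 1, "Jarvis launch plan")],
)
conn.executemany("INSERT INTO brain_tags VALUES (?, ?, ?)", [(10, 1, "style"), (11, 1, "jarvis")])
conn.executemany("INSERT INTO brain_memory_tags VALUES (?, ?)", [(1, 10), (2, 11)])


def memories_for_tag(conn, user_id, tag_name):
    """Return memory titles for one user that carry the given tag."""
    rows = conn.execute(
        """
        SELECT m.title
        FROM brain_memories AS m
        JOIN brain_memory_tags AS mt ON mt.memory_id = m.id
        JOIN brain_tags AS t ON t.id = mt.tag_id
        WHERE m.user_id = ? AND t.name = ?
        ORDER BY m.id
        """,
        (user_id, tag_name),
    ).fetchall()
    return [row[0] for row in rows]


jarvis_memories = memories_for_tag(conn, 1, "jarvis")
```

The same join shape would apply to `brain_event_tags` and `brain_candidate_tags`, which is why the link tables can serve filtering and timeline tracing before any graph projection exists.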

5. Ingestion Strategy

Phase 1 should not rewrite existing modules. Instead, it should add thin ingestion hooks near existing write paths.

Conversation ingestion

Trigger points:

after user message creation
after assistant completion
after memory extraction / summary creation

Event examples:

important user instruction
explicit “remember this” request
repeated topic cluster
conversation-derived decision or unresolved goal

Document ingestion

Trigger points:

after upload success
after indexing completes
after manual chunk edits

Event examples:

document uploaded
document indexed
high-value section discovered
document summary available

Todo ingestion

Trigger points:

todo created
todo completed
AI-generated todo created

Event examples:

planned work item
recurring operational duty
completion signal reflecting actual user focus

Task/Kanban ingestion

Trigger points:

task created
task status changed
task completed
priority changed

Event examples:

declared project goal
active workstream
resolved milestone

Forum ingestion

Trigger points:

post created
reply created
forum instruction executed or referenced

Event examples:

public project decision
repeated operational issue
reusable explanation or solution

Implementation note: source ingestion should create BrainEvent rows either inline, as a cheap insert, or via lightweight background tasks. In both cases it must not block the original user flow.

6. Learning and Promotion Pipeline

Phase 1 should add a new daily scheduler workflow dedicated to the brain.

New scheduler job: `brain_daily_learning_task`

Suggested run: once daily after the bulk of user activity, for example at 01:00. Per-user scheduling can be made configurable later.

Pipeline steps:

collect unprocessed BrainEvent rows for the target date;
cluster by source, topic, and repeated patterns;
ask the LLM to produce candidate knowledge with tags and importance explanations;
deduplicate against existing BrainMemory by semantic and rule-based matching;
promote high-confidence candidates into BrainMemory;
mark low-value candidates rejected or retained as observation-only;
refresh tag scores and priority levels;
mark consumed events as processed.
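
The steps above can be sketched as a single function over in-memory dictionaries. This is a simplified illustration under several assumptions: events carry a precomputed `topic` key, the LLM call is replaced by an injected `extract_candidates` callable, deduplication is a case-insensitive title match only, the promotion threshold of 0.8 is arbitrary, and the tag-refresh step is omitted.

```python
from collections import defaultdict


def run_daily_learning(events, memory_titles, extract_candidates, promote_threshold=0.8):
    """Run one learning pass and return the candidates it produced."""
    pending = [e for e in events if e["status"] == "pending"]

    # Cluster by topic (clustering by source and repeated patterns is left out).
    clusters = defaultdict(list)
    for event in pending:
        clusters[event["topic"]].append(event)

    known = {title.strip().lower() for title in memory_titles}
    candidates = []
    for topic, group in clusters.items():
        for cand in extract_candidates(topic, group):
            cand["source_event_ids"] = [e["id"] for e in group]
            key = cand["title"].strip().lower()
            if key in known:
                cand["status"] = "rejected"   # duplicates an existing memory
            elif cand["confidence_score"] >= promote_threshold:
                cand["status"] = "promoted"
                known.add(key)
                memory_titles.append(cand["title"])
            else:
                cand["status"] = "new"        # retained as candidate-only
            candidates.append(cand)

    for event in pending:
        event["status"] = "processed"         # consumed events are skipped on re-runs
    return candidates


def fake_extractor(topic, group):
    # Stand-in for the LLM: confidence grows with the number of supporting events.
    return [{"title": f"Focus on {topic}", "confidence_score": min(1.0, 0.45 * len(group))}]


events = [
    {"id": 1, "topic": "launch", "status": "pending"},
    {"id": 2, "topic": "launch", "status": "pending"},
    {"id": 3, "topic": "lunch", "status": "pending"},
]
memory_titles = []
first_run = run_daily_learning(events, memory_titles, fake_extractor)
second_run = run_daily_learning(events, memory_titles, fake_extractor)
```

Because events are marked processed at the end of a pass, running the job twice on the same day produces no new candidates the second time, which addresses the duplicate-memory risk from repeated runs.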

Promotion rules for phase 1

Promote automatically when any of these are true:

user explicitly requested the system to remember something;
the same topic appears across multiple sources;
a solution/decision was formed and looks reusable;
a stable preference or habit is seen repeatedly;
a task/todo/forum thread confirms relevance with user action.

Keep as candidate-only when:

information is recent but not yet stable;
importance is uncertain;
it appears only once without reinforcement.

Reject when:

content is obviously transient;
it is too generic to help future answers;
it duplicates active memory without adding new value.
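
These rules can be read as a small decision function. The sketch below is one possible encoding, not the intended implementation: the `CandidateSignals` field names, the "two or more" thresholds, and the choice to evaluate rejection before promotion are all assumptions, since the blueprint does not define precedence between the rule groups.

```python
from dataclasses import dataclass


@dataclass
class CandidateSignals:
    explicit_remember_request: bool = False
    distinct_source_types: int = 1        # how many source domains mention the topic
    reusable_solution_or_decision: bool = False
    preference_or_habit_repeats: int = 0  # times a stable preference/habit was observed
    confirmed_by_user_action: bool = False
    is_transient: bool = False
    is_too_generic: bool = False
    duplicates_active_memory: bool = False


def decide(signals: CandidateSignals) -> str:
    """Return the candidate status: 'rejected', 'promoted', or 'new' (candidate-only)."""
    # Assumption: a reject condition wins even when a promote condition also holds.
    if signals.is_transient or signals.is_too_generic or signals.duplicates_active_memory:
        return "rejected"
    if (
        signals.explicit_remember_request
        or signals.distinct_source_types >= 2
        or signals.reusable_solution_or_decision
        or signals.preference_or_habit_repeats >= 2
        or signals.confirmed_by_user_action
    ):
        return "promoted"
    return "new"


single_mention = decide(CandidateSignals())
remembered = decide(CandidateSignals(explicit_remember_request=True))
transient = decide(CandidateSignals(explicit_remember_request=True, is_transient=True))
```

Returning values from the BrainCandidate status set keeps the decision directly storable; a `merged` outcome would need an extra signal that is not modeled here.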

7. Retrieval Integration

Phase 1 must let chat use the brain in a controlled way.

New retrieval service

Add a dedicated brain_retrieval_service or extend memory_service with brain-aware retrieval APIs.

Responsibilities:

retrieve top relevant BrainMemory rows by query, tags, time context, and importance;
optionally retrieve recent BrainEvent summaries for recency-sensitive answers;
merge existing UserMemory and MemorySummary into one retrieval result shape;
support limits to avoid prompt bloat.

Retrieval policy

At answer time:

always consider long-term BrainMemory;
include recent event summaries only when the question appears time-sensitive or project-state-sensitive;
cap injected brain context to a small curated set.

Recommended first integration path:

extend build_memory_context() to append a new 【知识大脑】 ("Knowledge Brain") block built from BrainMemory retrieval.
keep existing conversation summary logic intact.

This gives immediate product value without requiring a full prompt orchestration rewrite.
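
A minimal sketch of capped retrieval and block rendering follows. The scoring formula (keyword overlap multiplied by importance and confidence), the default limits, and the helper names `retrieve_brain_memories` and `build_brain_block` are assumptions; the actual signature of build_memory_context() is not shown in this document, so the sketch only produces the string that such a function could append.

```python
def retrieve_brain_memories(memories, query_terms, limit=5):
    """Rank active memories by a naive relevance score and keep at most `limit`."""

    def score(memory):
        text = (memory["title"] + " " + memory["content"]).lower()
        overlap = sum(1 for term in query_terms if term.lower() in text)
        return overlap * memory["importance"] * memory["confidence"]

    active = [m for m in memories if m["status"] == "active"]
    relevant = [m for m in active if score(m) > 0]
    return sorted(relevant, key=score, reverse=True)[:limit]


def build_brain_block(memories, max_chars=600):
    """Render memories as a prompt block, stopping before the character budget is exceeded."""
    header = "【知识大脑】"
    lines = [header]
    used = len(header)
    for memory in memories:
        line = f"- {memory['title']}: {memory['content']}"
        if used + 1 + len(line) > max_chars:
            break
        lines.append(line)
        used += 1 + len(line)
    return "\n".join(lines) if len(lines) > 1 else ""


# Hypothetical sample memories.
memories = [
    {"title": "Launch scope", "content": "Phase 1 launch covers the event-driven brain",
     "importance": 0.9, "confidence": 0.9, "status": "active"},
    {"title": "Old launch plan", "content": "Superseded launch plan",
     "importance": 0.9, "confidence": 0.9, "status": "archived"},
    {"title": "Coffee", "content": "Prefers espresso",
     "importance": 0.5, "confidence": 0.8, "status": "active"},
]
picked = retrieve_brain_memories(memories, ["launch"], limit=3)
block = build_brain_block(picked)
```

Two separate caps apply: `limit` bounds the number of memories and `max_chars` bounds the rendered text, so one long memory cannot crowd out the rest of the prompt. A production version would likely replace the keyword overlap with semantic search.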

8. Backend Services to Add or Refactor

New services

brain_event_service.py
- normalize incoming source data into BrainEvent rows
- provide source-specific helper constructors
brain_learning_service.py
- run daily candidate extraction
- score, dedupe, and promote memories
brain_tag_service.py
- manage tags, scoring, priority updates, and cleanup suggestions
brain_retrieval_service.py
- retrieve relevant memories and recent events for chat and UI

Existing services to extend

memory_service.py: integrate BrainMemory retrieval and possibly migrate UserMemory into the new model later
scheduler_service.py: register brain daily learning job
agent_service.py: inject retrieved brain context into chat pipeline
document_service.py, todo_service.py, task/forum write paths: emit BrainEvent rows

9. API Plan

Phase 1 should add a dedicated /api/brain router.

Read APIs

GET /api/brain/overview
- counts: active memories, candidates, important tags, recent events
- today's learning summary
GET /api/brain/memories
- filters: tag, type, status, date range, source type
GET /api/brain/candidates
- filters: status, date, score threshold
GET /api/brain/tags
- segmented into important and secondary
GET /api/brain/timeline
- grouped by day/week; includes events, candidate promotions, reinforced memories
GET /api/brain/memory/{id}
- full traceability including linked events and tags
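
The day/week grouping for the timeline endpoint could look roughly like this. The entry shape (a `date` string in ISO format plus a `kind`), the response shape, and the `group_timeline` name are assumptions for illustration; weeks are keyed by ISO week number.

```python
from collections import defaultdict
from datetime import date


def group_timeline(entries, granularity="day"):
    """Bucket timeline entries by calendar day or ISO week, oldest period first."""
    buckets = defaultdict(list)
    for entry in entries:
        day = date.fromisoformat(entry["date"])
        if granularity == "week":
            iso = day.isocalendar()              # (ISO year, ISO week, ISO weekday)
            key = f"{iso[0]}-W{iso[1]:02d}"
        else:
            key = day.isoformat()
        buckets[key].append(entry)
    return [{"period": key, "items": buckets[key]} for key in sorted(buckets)]


# Hypothetical entries mixing the three kinds the endpoint is meant to include.
entries = [
    {"date": "2026-03-21", "kind": "event"},
    {"date": "2026-03-22", "kind": "promotion"},
    {"date": "2026-03-23", "kind": "reinforcement"},
]
by_day = group_timeline(entries, "day")
by_week = group_timeline(entries, "week")
```

Sorting the string keys works because both formats are zero-padded; ranges that straddle an ISO year boundary also sort correctly since the ISO year comes first in the key.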

Write/management APIs

POST /api/brain/memory/{id}/promote
POST /api/brain/memory/{id}/archive
DELETE /api/brain/memory/{id}
POST /api/brain/tag/{id}/promote
POST /api/brain/tag/{id}/demote
DELETE /api/brain/tag/{id}
POST /api/brain/learn/run
- manual trigger for daily learning pipeline
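
Since BrainMemory already carries a status field with active, archived, and deleted values, the archive and delete endpoints could be implemented as guarded status transitions. The transition table below is an assumption: this document does not say whether DELETE is a soft delete or whether archived memories can be restored.

```python
# Assumed transition rules; adjust to the product's actual policy.
ALLOWED_TRANSITIONS = {
    "active": {"archived", "deleted"},
    "archived": {"active", "deleted"},  # restoring an archived memory is an assumption
    "deleted": set(),                   # treated as terminal in this sketch
}


def change_memory_status(memory, new_status):
    """Apply a status change in place, rejecting transitions the table does not allow."""
    current = memory["status"]
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move memory from {current!r} to {new_status!r}")
    memory["status"] = new_status
    return memory


memory = {"id": 100, "status": "active"}
change_memory_status(memory, "archived")  # what POST .../archive might do
change_memory_status(memory, "deleted")   # what DELETE might do under soft deletion
```

Keeping deletion as a status change preserves source traceability for audit purposes, at the cost of having to filter deleted rows out of every retrieval query.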

Compatibility note

Do not remove /api/graph in phase 1. Keep it as a legacy projection route while the new brain module is introduced.

10. Frontend Module Structure

The current 知识大脑 ("Knowledge Brain") nav item should stop meaning “graph only” and become a real brain dashboard.

Route strategy

Preferred phase 1 structure:

/brain → new knowledge brain dashboard
/graph → graph view tab or subview under the brain module, retained for relation visualization

Brain dashboard sections

Overview header
- total active memories
- today's learned items
- important tags count
- last learning run
Important tags panel
- AI-ranked important tags
- click to filter related memories and timeline entries
Secondary tags panel
- lower-priority tags with cleanup actions
Recent learned knowledge
- newly promoted memories
- reasons and source badges
Timeline panel
- daily grouped events and promotions
- support time-based backtracking
Graph subview
- optional tab or secondary panel for relation projection

User actions in phase 1

delete memory
archive memory
promote/demote tag priority
manually trigger learning run
inspect why a memory exists

This is enough to make the brain visible and manageable even before advanced graph reasoning exists.

11. Suggested Delivery Breakdown

Step 1: Persistence foundation

add brain models and migrations
add SQLAlchemy registrations and schemas

Step 2: Event ingestion

emit BrainEvent rows from conversation/document/todo/task/forum flows

Step 3: Learning workflow

implement daily learning job and manual trigger API

Step 4: Retrieval integration

wire BrainMemory into chat context assembly

Step 5: Brain dashboard backend

add overview, memories, tags, timeline endpoints

Step 6: Brain dashboard frontend

add /brain page and move graph into a subview or separate tab

12. Risks and Guardrails

Main risks

over-collection leading to noisy memories;
prompt bloat from injecting too much brain context;
duplicate memory creation across repeated daily runs;
unclear distinction between candidate and durable memory;
UI becoming graph-centric again instead of brain-centric.

Guardrails

enforce candidate layer before promotion;
cap retrieval size strictly;
keep source traceability for every promoted memory;
make tag cleanup explicit in UI;
treat graph as a projection, not the source of truth.

13. Phase 1 Success Criteria

Phase 1 is successful when all of the following are true:

the system creates normalized BrainEvent rows from all five major source domains;
a scheduled daily learning job produces candidates and promotes high-value memories;
Jarvis can retrieve durable brain memories during future answers;
the frontend exposes a real brain dashboard with tags, recent knowledge, and timeline;
users can inspect and clean what the system learned;
the old graph page is no longer the only visible representation of the brain.
