Memory
How Luna stores and retrieves facts, runs semantic search with ChromaDB, maintains personality state, and compacts conversations.
Overview
Luna's memory is a three-tier system that gives it persistent, searchable context across sessions. Unlike a stateless chatbot that forgets everything when you close the window, Luna builds a personal model of you over time — your preferences, habits, projects, and personality.
Everything is stored locally. No external service is involved.
| Tier | Storage | Purpose |
|---|---|---|
| Facts | SQLite (data/luna.db) | Structured facts extracted from conversations. |
| Embeddings | ChromaDB (data/chroma/) | Vector representations for semantic similarity search. |
| Personality | SQLite | Floating-point state vector for response tone adaptation. |
Facts
A fact is a discrete piece of information about you that Luna has extracted or been told. Facts are rows in the Fact table with the following structure:
| Column | Type | Description |
|---|---|---|
| id | UUID | Unique identifier. |
| content | text | The fact itself, e.g. "user prefers dark mode interfaces". |
| source_conversation | UUID | Which conversation produced this fact. |
| confidence | float 0–1 | Extraction confidence. Low-confidence facts are pruned over time. |
| created_at | datetime | When the fact was first recorded. |
| last_accessed | datetime | Updated when the fact is retrieved for context. Used for pruning. |
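
The columns above can be pictured as a small SQLite table. The following is a minimal sketch, not Luna's actual schema: the DDL, the `add_fact` and `prune_facts` helpers, and the pruning thresholds (`min_confidence`, `max_idle_days`) are all assumptions chosen for illustration.

```python
import sqlite3
import uuid
from datetime import datetime, timedelta, timezone

# Assumed DDL mirroring the documented columns; the real schema may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fact (
    id TEXT PRIMARY KEY,
    content TEXT NOT NULL,
    source_conversation TEXT,
    confidence REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1),
    created_at TEXT NOT NULL,
    last_accessed TEXT NOT NULL
)
"""


def add_fact(conn, content, confidence, source_conversation=None, now=None):
    """Insert one fact and return its generated UUID (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    stamp = now.isoformat(timespec="seconds")
    fact_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO fact VALUES (?, ?, ?, ?, ?, ?)",
        (fact_id, content, source_conversation, confidence, stamp, stamp),
    )
    return fact_id


def prune_facts(conn, min_confidence=0.3, max_idle_days=90, now=None):
    """Delete facts that are both low-confidence and long unused.

    The policy and both thresholds are assumptions, not documented values.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=max_idle_days)).isoformat(timespec="seconds")
    cur = conn.execute(
        "DELETE FROM fact WHERE confidence < ? AND last_accessed < ?",
        (min_confidence, cutoff),
    )
    return cur.rowcount
```

Storing timestamps as UTC ISO 8601 strings with a fixed precision keeps them comparable with plain string comparison in SQLite.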
How facts are extracted
The memory_maintenance background process runs every 5 minutes. It scans recent conversations that haven't been processed yet, sends them to backend/services/fact_extractor.py, and stores the results. The extractor uses a lightweight LLM prompt to identify facts.
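One pass of that process might look roughly like the sketch below. The conversation dictionaries, the `processed` flag, and the `extract_facts` and `store_fact` callables are hypothetical stand-ins; the real interfaces in backend/services/fact_extractor.py are not shown in this document.

```python
MAINTENANCE_INTERVAL_SECONDS = 5 * 60  # documented interval: every 5 minutes


def run_maintenance_pass(conversations, extract_facts, store_fact):
    """Extract and store facts from conversations not yet processed.

    conversations: list of dicts with "id", "messages", "processed" (assumed shape).
    extract_facts: callable taking messages and returning an iterable of facts.
    store_fact:    callable taking (fact, conversation_id).
    Returns the ids of the conversations handled in this pass.
    """
    handled = []
    for conversation in conversations:
        if conversation["processed"]:
            continue  # skip anything a previous pass already covered
        for fact in extract_facts(conversation["messages"]):
            store_fact(fact, conversation["id"])
        conversation["processed"] = True
        handled.append(conversation["id"])
    return handled
```

Marking a conversation as processed only after its facts are stored means an interrupted pass would retry that conversation on the next run.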
You can also tell Luna a fact directly:
"Remember that I'm lactose intolerant."
"My meeting with the design team is every Tuesday at 2pm."
Luna stores these as high-confidence facts immediately via the memory route.
Viewing and managing facts
Open the Memory view in the sidebar. You can see all stored facts, their confidence scores, and delete any you don't want. You can also add facts manually.
Embeddings and semantic search
Every fact is embedded using nomic-embed-text via Ollama and stored in a ChromaDB collection at data/chroma/. The embeddings are 768-dimensional vectors.
Context retrieval
When a new message arrives, Luna embeds the user's query and runs a cosine similarity search against the fact collection. The top-k most relevant facts are injected into the system prompt — not just the most recent facts, but the most semantically related ones.
For example, if you ask "what should I have for dinner?" and Luna stored the fact "user is vegetarian and doesn't like spicy food" six months ago, that fact surfaces even though it has not been accessed recently.
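The ranking step can be illustrated without ChromaDB. The sketch below computes cosine similarity by hand over toy 3-dimensional vectors; in Luna the vectors are 768-dimensional and the search is delegated to ChromaDB, so the function names and data layout here are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_facts(query_vec, fact_vecs, k=5):
    """Return the k fact texts whose embeddings are most similar to the query.

    fact_vecs: dict mapping fact text to its embedding (assumed layout).
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in fact_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy embeddings: the first axis loosely stands for "food-related".
facts = {
    "user is vegetarian": [0.9, 0.1, 0.0],
    "user prefers dark mode": [0.0, 0.1, 0.9],
    "user dislikes spicy food": [0.8, 0.3, 0.1],
}
dinner_query = [1.0, 0.2, 0.0]
relevant = top_k_facts(dinner_query, facts, k=2)
```

Because the score depends only on vector direction, a fact's age or last access time plays no part in the ranking.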
On first run, ChromaDB will be empty. Luna will have no memory of previous conversations. After a few sessions the memory fills in and context quality improves noticeably.
Rebuilding the index
If the ChromaDB collection becomes corrupted, delete data/chroma/ and restart Luna. The collection rebuilds from the SQLite facts on next startup.
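If you prefer to script the reset, a small helper along these lines would do it. The function name and the `data_dir` parameter are assumptions; stop Luna before running it, and note that it deliberately leaves data/luna.db untouched so the facts survive.

```python
import shutil
from pathlib import Path


def reset_chroma_index(data_dir="data"):
    """Remove the chroma/ folder under data_dir; return True if something was deleted."""
    chroma_dir = Path(data_dir) / "chroma"
    if chroma_dir.is_dir():
        shutil.rmtree(chroma_dir)
        return True
    return False
```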
Personality engine
backend/services/personality.py maintains a state vector that adapts Luna's response tone in real time. The vector has five dimensions:
| Dimension | Range | Effect on responses |
|---|---|---|
| mood | -1.0 to 1.0 | Negative mood → more empathetic, careful tone. Positive → warmer, more playful. |
| energy | 0.0 to 1.0 | Low energy → concise responses. High energy → more expansive answers. |
| formality | 0.0 to 1.0 | Low formality → casual language. High formality → professional tone. |
| humor | 0.0 to 1.0 | Controls how often Luna makes jokes or playful observations. |
| emotional_support | 0.0 to 1.0 | High → Luna prioritises empathy over information delivery. |
These values drift based on conversation sentiment analysis and voice emotion detection. They are included in every system prompt so the LLM can adapt its style accordingly.
You can view and manually adjust personality state in the Memory → Personality tab in the sidebar.
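As a rough illustration of a clamped, drifting state vector, consider the sketch below. Only the five dimension names and their ranges come from the table above; the default values, the `drift` update rule (a simple step toward a target), and the prompt format are assumptions, not the logic in backend/services/personality.py.

```python
from dataclasses import dataclass, fields

# Documented range for each dimension.
RANGES = {
    "mood": (-1.0, 1.0),
    "energy": (0.0, 1.0),
    "formality": (0.0, 1.0),
    "humor": (0.0, 1.0),
    "emotional_support": (0.0, 1.0),
}


@dataclass
class PersonalityState:
    # Default values are assumed midpoints, not documented defaults.
    mood: float = 0.0
    energy: float = 0.5
    formality: float = 0.5
    humor: float = 0.5
    emotional_support: float = 0.5

    def drift(self, dimension, target, rate=0.1):
        """Move one dimension a fraction of the way toward target, clamped to its range."""
        low, high = RANGES[dimension]
        current = getattr(self, dimension)
        updated = current + rate * (target - current)
        setattr(self, dimension, max(low, min(high, updated)))

    def to_prompt_line(self):
        """Render the state as one line suitable for a system prompt (assumed format)."""
        parts = [f"{f.name}={getattr(self, f.name):.2f}" for f in fields(self)]
        return "Personality state: " + ", ".join(parts)
```

Clamping after every update keeps the vector inside its documented ranges however strong the sentiment signal is.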
Conversation compaction
Long conversations would eventually overflow the LLM's context window. The memory_maintenance process compacts conversations older than a configurable threshold into compressed summaries.
Compaction works in two stages:
- An LLM prompt summarises the conversation into bullet points of key decisions, facts mentioned, and emotional tone.
- The summary replaces the full conversation in context retrieval. The original messages are archived in SQLite but no longer included in prompts.
By default, conversations longer than 40 messages trigger compaction. This is configurable in backend/processes/memory_maintenance/.
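The two stages can be sketched as follows. The conversation dictionary layout and the `summarise` callable (standing in for the LLM prompt) are hypothetical; only the 40-message default comes from the text above.

```python
COMPACTION_THRESHOLD = 40  # documented default message count


def needs_compaction(messages, threshold=COMPACTION_THRESHOLD):
    """True when a conversation has more messages than the threshold."""
    return len(messages) > threshold


def compact(conversation, summarise, threshold=COMPACTION_THRESHOLD):
    """Return the conversation unchanged, or a compacted copy if it is too long.

    conversation: dict with a "messages" list (assumed shape).
    summarise:    callable turning the messages into a bullet-point summary.
    """
    messages = conversation["messages"]
    if not needs_compaction(messages, threshold):
        return conversation
    return {
        **conversation,
        "summary": summarise(messages),  # stage 1: summarise with the LLM
        "archived_messages": messages,   # stage 2: keep originals out of prompts
        "messages": [],
    }
```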
Memory API
The memory system is accessible via REST for scripting or integration:
```
# List all facts
GET /api/memory/facts

# Add a fact manually
POST /api/memory/facts
{"content": "user prefers concise answers", "confidence": 0.9}

# Delete a fact
DELETE /api/memory/facts/{id}

# Semantic search
POST /api/memory/search
{"query": "dietary preferences", "limit": 5}

# Get personality state
GET /api/memory/personality

# Update a personality dimension
PUT /api/memory/personality
{"humor": 0.8, "formality": 0.2}
```
Business variant routes require Authorization: Bearer <user-jwt>.
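For scripting, the routes can be called from Python's standard library. The sketch below only builds the requests; `BASE_URL` is an assumption (this document does not state the host or port), and `build_request` is a hypothetical helper rather than part of Luna.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: change to where Luna's backend listens


def build_request(method, path, body=None, token=None):
    """Build a urllib Request for a memory route, with optional JSON body and bearer token."""
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


add_fact_req = build_request(
    "POST", "/api/memory/facts",
    {"content": "user prefers concise answers", "confidence": 0.9},
)
search_req = build_request(
    "POST", "/api/memory/search",
    {"query": "dietary preferences", "limit": 5},
)
tone_req = build_request(
    "PUT", "/api/memory/personality",
    {"humor": 0.8, "formality": 0.2},
)
```

To send one, pass it to `urllib.request.urlopen` while Luna is running, and supply `token` when calling the Business variant routes.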
Privacy
- All facts, embeddings, and personality state are stored in data/ on your machine. data/ is gitignored, so it is never committed to version control.
- During embedding, only the text of the fact is sent to Ollama; no other memory data is included.
- If you use an OpenAI-compatible embedding provider, that provider receives the fact text for embedding. Choose a local provider if this concerns you.
- You can delete all memory at any time by removing data/luna.db and data/chroma/.