Memory

How Luna stores and retrieves facts, runs semantic search with ChromaDB, maintains personality state, and compacts conversations.

Overview

Luna's memory is a three-tier system that gives it persistent, searchable context across sessions. Unlike a stateless chatbot that forgets everything when you close the window, Luna builds a personal model of you over time — your preferences, habits, projects, and personality.

Everything is stored locally. No external service is involved unless you configure a remote embedding provider (see Privacy).

Tier | Storage | Purpose
Facts | SQLite (data/luna.db) | Structured facts extracted from conversations.
Embeddings | ChromaDB (data/chroma/) | Vector representations for semantic similarity search.
Personality | SQLite | Floating-point state vector for response tone adaptation.

Facts

A fact is a discrete piece of information about you that Luna has extracted or been told. Facts are rows in the Fact table with the following structure:

Column | Type | Description
id | UUID | Unique identifier.
content | text | The fact itself, e.g. "user prefers dark mode interfaces".
source_conversation | UUID | Which conversation produced this fact.
confidence | float 0–1 | Extraction confidence. Low-confidence facts are pruned over time.
created_at | datetime | When the fact was first recorded.
last_accessed | datetime | Updated when the fact is retrieved for context. Used for pruning.
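
To make the column list concrete, here is a minimal sketch of how such a table could be declared and written to with Python's built-in sqlite3 module. The column names come from the table above; the SQLite types, the CHECK constraint, the helper names add_fact and touch_fact, and the choice to leave source_conversation empty for manually added facts are all assumptions, not Luna's actual schema.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

# Assumed DDL: column names follow the table above; the SQLite types are guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fact (
    id                  TEXT PRIMARY KEY,   -- UUID stored as text
    content             TEXT NOT NULL,
    source_conversation TEXT,               -- UUID of the originating conversation
    confidence          REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1),
    created_at          TEXT NOT NULL,      -- ISO 8601 timestamp
    last_accessed       TEXT NOT NULL
)
"""


def add_fact(conn, content, confidence, source_conversation=None):
    """Insert one fact and return its generated id (hypothetical helper)."""
    fact_id = str(uuid.uuid4())
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO fact VALUES (?, ?, ?, ?, ?, ?)",
        (fact_id, content, source_conversation, confidence, now, now),
    )
    return fact_id


def touch_fact(conn, fact_id):
    """Refresh last_accessed when a fact is retrieved for context."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute("UPDATE fact SET last_accessed = ? WHERE id = ?", (now, fact_id))


conn = sqlite3.connect(":memory:")  # in-memory database, not data/luna.db
conn.execute(SCHEMA)
fact_id = add_fact(conn, "user prefers dark mode interfaces", 0.9)
touch_fact(conn, fact_id)
```

With this constraint, a confidence outside 0 to 1 is rejected at insert time with sqlite3.IntegrityError rather than stored silently.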

How facts are extracted

The memory_maintenance background process runs every 5 minutes. It scans recent conversations that haven't been processed yet, sends them to backend/services/fact_extractor.py, and stores the results. The extractor uses a lightweight LLM prompt to identify facts.

You can also tell Luna a fact directly:

"Remember that I'm lactose intolerant."
"My meeting with the design team is every Tuesday at 2pm."

Luna will store these as high-confidence facts immediately via the memory route.

Viewing and managing facts

Open the Memory view in the sidebar. You can see all stored facts, their confidence scores, and delete any you don't want. You can also add facts manually.

Embeddings and semantic search

Every fact is embedded using nomic-embed-text via Ollama and stored in a ChromaDB collection at data/chroma/. The embeddings are 768-dimensional vectors.

Context retrieval

When a new message arrives, Luna embeds the user's query and runs a cosine similarity search against the fact collection. The top-k most similar facts are injected into the system prompt, so facts are selected by semantic relevance rather than by recency.

For example, if you ask "what should I have for dinner?" and Luna stored the fact "user is vegetarian and doesn't like spicy food" six months ago, that fact surfaces even though it has not been accessed recently.
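
The ranking step can be illustrated without ChromaDB. The sketch below is plain Python, not the ChromaDB API: it scores every fact against the query by cosine similarity and keeps the top k. The function names and the 3-dimensional toy vectors are made up for illustration; real nomic-embed-text vectors have 768 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k_facts(query_vec, fact_vecs, k=3):
    """Return the k fact texts whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in fact_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy 3-dimensional vectors standing in for real 768-dimensional embeddings.
facts = {
    "user is vegetarian and doesn't like spicy food": [0.9, 0.1, 0.0],
    "user prefers dark mode interfaces": [0.0, 0.2, 0.9],
    "user meets the design team every Tuesday at 2pm": [0.1, 0.9, 0.1],
}
dinner_query = [0.8, 0.2, 0.1]  # pretend embedding of "what should I have for dinner?"
relevant = top_k_facts(dinner_query, facts, k=1)
```

Here relevant contains only the vegetarian fact, because its vector points in nearly the same direction as the query. How long ago the fact was stored plays no part in the score.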

ℹ️
Cold start

On first run, ChromaDB will be empty. Luna will have no memory of previous conversations. After a few sessions the memory fills in and context quality improves noticeably.

Rebuilding the index

If the ChromaDB collection becomes corrupted, delete data/chroma/ and restart Luna. The collection rebuilds from the SQLite facts on next startup.

Personality engine

backend/services/personality.py maintains a state vector that adapts Luna's response tone in real time. The vector has five dimensions:

Dimension | Range | Effect on responses
mood | -1.0 to 1.0 | Negative mood → more empathetic, careful tone. Positive → warmer, more playful.
energy | 0.0 to 1.0 | Low energy → concise responses. High energy → more expansive answers.
formality | 0.0 to 1.0 | Low formality → casual language. High formality → professional tone.
humor | 0.0 to 1.0 | Controls how often Luna makes jokes or playful observations.
emotional_support | 0.0 to 1.0 | High → Luna prioritises empathy over information delivery.

These values drift based on conversation sentiment analysis and voice emotion detection. They are included in every system prompt so the LLM can adapt its style accordingly.
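
As a rough illustration of drifting values that stay within range, the sketch below models the five dimensions as a dataclass. The ranges come from the table above. The default values, the smoothing rule in drift, the rate parameter, and the prompt format are assumptions, not the behaviour of backend/services/personality.py.

```python
from dataclasses import dataclass, asdict

# Allowed range per dimension, taken from the table above.
RANGES = {
    "mood": (-1.0, 1.0),
    "energy": (0.0, 1.0),
    "formality": (0.0, 1.0),
    "humor": (0.0, 1.0),
    "emotional_support": (0.0, 1.0),
}


@dataclass
class PersonalityState:
    # Default starting values are assumptions.
    mood: float = 0.0
    energy: float = 0.5
    formality: float = 0.5
    humor: float = 0.5
    emotional_support: float = 0.5

    def drift(self, dimension, target, rate=0.1):
        """Move one dimension a fraction of the way toward a target value.

        The smoothing rule and the rate are assumptions.
        """
        low, high = RANGES[dimension]
        current = getattr(self, dimension)
        updated = current + rate * (target - current)
        setattr(self, dimension, max(low, min(high, updated)))

    def to_prompt(self):
        """Render the state as one line for the system prompt (hypothetical format)."""
        parts = [f"{name}={value:.2f}" for name, value in asdict(self).items()]
        return "Personality state: " + ", ".join(parts)


state = PersonalityState()
state.drift("mood", target=-1.0, rate=0.5)   # a sad message pulls mood down
state.drift("humor", target=5.0, rate=1.0)   # out-of-range target is clamped to 1.0
prompt_line = state.to_prompt()
```

Clamping after every update means no sequence of sentiment signals can push a dimension outside its documented range.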

You can view and manually adjust personality state in the Memory → Personality tab in the sidebar.

Conversation compaction

Long conversations would eventually overflow the LLM's context window. The memory_maintenance process compacts conversations that exceed a configurable length threshold into compressed summaries.

Compaction works in two stages:

  1. An LLM prompt summarises the conversation into bullet points of key decisions, facts mentioned, and emotional tone.
  2. The summary replaces the full conversation in context retrieval. The original messages are archived in SQLite but no longer included in prompts.
📌
Compaction threshold

By default, conversations longer than 40 messages trigger compaction. This is configurable in backend/processes/memory_maintenance/.
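
The two stages and the threshold check can be sketched as follows. This is an illustration only: the Conversation class, the function names, and the stand-in summariser are hypothetical, and the real process calls an LLM and archives messages in SQLite rather than in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

COMPACTION_THRESHOLD = 40  # default message count from the note above


@dataclass
class Conversation:
    messages: List[str]
    summary: Optional[str] = None
    archived: List[str] = field(default_factory=list)


def needs_compaction(conv, threshold=COMPACTION_THRESHOLD):
    """True when an uncompacted conversation is longer than the threshold."""
    return conv.summary is None and len(conv.messages) > threshold


def compact(conv, summarise: Callable[[List[str]], str]):
    """Stage 1: summarise. Stage 2: archive the originals and keep only the summary."""
    conv.summary = summarise(conv.messages)
    conv.archived = conv.messages   # still stored, but no longer sent in prompts
    conv.messages = []


def prompt_context(conv):
    """Text that would be included in a prompt for this conversation."""
    return [conv.summary] if conv.summary is not None else conv.messages


def fake_summarise(messages):
    # Stand-in for the LLM summarisation prompt.
    return f"- {len(messages)} messages summarised"


conv = Conversation(messages=[f"message {i}" for i in range(41)])
if needs_compaction(conv):
    compact(conv, fake_summarise)
```

A conversation of exactly 40 messages is left alone in this sketch, since the note says compaction triggers on conversations longer than 40.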

Memory API

The memory system is accessible via REST for scripting or integration:

Examples
# List all facts
GET /api/memory/facts

# Add a fact manually
POST /api/memory/facts
{"content": "user prefers concise answers", "confidence": 0.9}

# Delete a fact
DELETE /api/memory/facts/{id}

# Semantic search
POST /api/memory/search
{"query": "dietary preferences", "limit": 5}

# Get personality state
GET /api/memory/personality

# Update one or more personality dimensions
PUT /api/memory/personality
{"humor": 0.8, "formality": 0.2}

Business variant routes require Authorization: Bearer <user-jwt>.
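
For scripting, the routes above can be called with Python's standard library alone. The sketch below builds the requests without sending them. BASE_URL is an assumption, since this page does not state the host or port, and the helper names are hypothetical. The send helper also assumes the API replies with JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: the source does not state host or port


def build_request(method, path, body=None, token=None):
    """Build (but do not send) a request for one of the memory routes."""
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request):
    """Send a request and decode the JSON response (assumes the API returns JSON)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


add_fact = build_request(
    "POST", "/api/memory/facts",
    body={"content": "user prefers concise answers", "confidence": 0.9},
)
search = build_request(
    "POST", "/api/memory/search",
    body={"query": "dietary preferences", "limit": 5},
)
# results = send(search)  # uncomment with Luna running at BASE_URL
```

For Business variant routes, pass token="<user-jwt>" to build_request so the Authorization header is attached.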

Privacy

  • All facts, embeddings, and personality state are stored in data/ on your machine.
  • data/ is gitignored, so it is not committed to version control.
  • During embedding, Ollama receives only the text of the fact; no other memory data is sent to it.
  • If you use an OpenAI-compatible embedding provider, that provider receives the fact text for embedding. Choose a local provider if this concerns you.
  • You can delete all memory at any time by removing data/luna.db and data/chroma/.
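
If you prefer to script the reset, a minimal sketch follows. The function name wipe_memory is hypothetical, and stopping Luna before running it is an assumption about safe usage rather than a documented requirement.

```python
import shutil
from pathlib import Path


def wipe_memory(data_dir):
    """Delete the facts database and the vector index; return what was removed."""
    data_dir = Path(data_dir)
    removed = []
    db_file = data_dir / "luna.db"
    if db_file.exists():
        db_file.unlink()
        removed.append(db_file.name)
    chroma_dir = data_dir / "chroma"
    if chroma_dir.is_dir():
        shutil.rmtree(chroma_dir)
        removed.append(chroma_dir.name)
    return removed


# wipe_memory("data")  # run from the project root, ideally with Luna stopped
```

The call is left commented out because the deletion is irreversible.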