Architecture

How the Electron shell, React frontend, and FastAPI backend fit together — and how a message flows from input to streamed response.

Overview

Luna is built from three cooperating layers that communicate over HTTP and SSE. All three can run on a single machine; only the Electron shell requires a native OS environment.

L.U.N.A. system architecture
System layers: Electron shell, React/Vite frontend, and FastAPI backend with service modules.

Three layers

Electron shell

electron/main.js is the desktop process owner. It starts the FastAPI backend as a child process, manages the health-check loop (exponential backoff restart on crash), creates the browser window, and wires up tray and IPC. The preload script (electron/preload.js) exposes electronAPI.apiBase and electronAPI.isElectron to the renderer without leaking Node.js APIs.

React / Vite frontend

frontend/src/App.tsx is the root. The app renders in one of three view modes —dev (sidebar + content), user (voice-focused), and luna(full-screen HUD). State is managed by a single Zustand store at frontend/src/store/index.ts. Feature components live in frontend/src/components/ grouped by domain.

FastAPI backend

backend/main.py bootstraps the FastAPI app, registers CORS middleware, mounts all routers, and starts background processes. Routers in backend/routers/ are intentionally thin — they validate requests and delegate all logic to backend/services/.

Message lifecycle

Here is what happens from the moment you send a message to the moment the response is complete:

  1. User input — text typed in InputBar, voice processed by the voice route, or a scheduled proactive trigger fires.
  2. Fast-path check — the backend checks a small set of intent patterns that should not hit the LLM (e.g. explicit app-launch commands).
  3. Context assemblymemory_manager.py fetches relevant facts from ChromaDB using semantic search, then appends personality state, recent calendar tasks, active activities, vision observations, and the last N conversation turns.
  4. LLM call — the assembled prompt is sent to the configured provider (Ollama or OpenAI-compatible) with num_ctx: 8192 and num_predict: 1024. The response streams as tokens.
  5. Stream parsing — as tokens arrive, the backend scans for bracket commands ([WIDGET:...], [WEB_SEARCH:...], [MAP:...]) and JSON tool calls. Commands are stripped from the displayed text and emitted as separate SSE events.
  6. Tool execution — detected tools run concurrently where possible. Results (search snippets, Spotify state, widget data) may be appended to the stream as additional content.
  7. Memory update — after the done event, background coroutines extract new facts, update personality scores, and compact long conversations into summaries.

SSE event protocol

The chat stream endpoint is POST /api/chat/stream. It returns text/event-stream. Each event has a type field:

TypePayloadDescription
metadataconversationId, modelFirst event — identifies the conversation and model being used.
tokencontent: stringA streamed text chunk from the LLM. Append to the current message bubble.
commandaction, payloadA parsed tool call — widget open, web search result, map display, Spotify action, 3D scene, etc.
confirmationtool, description, idLuna wants to execute a tool but needs user approval first (confirm-mode tool).
doneconversationIdStream complete. Memory extraction runs after this event.
errormessageUnrecoverable stream error. The frontend shows an error state.
ℹ️
Handling the stream in the frontend

frontend/src/api/chat.ts wraps the SSE connection. The Zustand store dispatches each event type to the correct reducer — tokens go to streamMessage, commands open widgets via setDynamicWidget, and confirmation events set pendingConfirmation.

Tool execution model

Luna supports two command syntaxes that can appear in LLM output:

Bracket tags

Simple inline commands parsed from the token stream by regex:

[WEB_SEARCH:query here]
[WIDGET:{"type":"steps","data":[...]}]
[MAP:{"lat":40.7,"lon":-74.0}]
[SPOTIFY:{"action":"play","query":"artist name"}]
[SCENE:{"prompt":"rotating cube"}]

JSON tool calls

Structured tool calls in the model's native tool-use format. These go through tool_registry.pywhere each tool is registered with a name, schema, and permission mode.

Permission modes

Every tool has one of three permission modes set per user in data/permissions.json:

ModeBehaviour
allowExecutes immediately without prompting the user.
confirmEmits a confirmation SSE event. The tool waits until the user approves or rejects via the UI banner.
blockThe tool call is silently dropped and Luna is told the tool is unavailable.

Memory architecture

Luna's memory system has three tiers:

Structured facts — SQLite

Explicit facts about you ("user prefers dark mode", "user's dog is named Max") are stored as rows in the Fact table in data/luna.db. Each fact has a source (conversation ID), confidence score, and creation timestamp.

Semantic search — ChromaDB

All facts are also embedded with nomic-embed-text via Ollama and stored indata/chroma/. When assembling context for a new message, the backend runs a semantic search against the user's query to surface the most relevant facts — not just the most recent ones.

Personality engine

backend/services/personality.py maintains a floating-point state vector with dimensions for mood, energy level, formality preference, humor level, and emotional support need. These values drift based on conversation sentiment and update Luna's system-prompt tone in real time.

📌
Privacy

All memory is stored locally. Nothing is sent to external servers. The ChromaDB collection and SQLite database live in data/ and are gitignored.

Background processes

The backend registers long-running coroutines via backend/processes/registry.py. Each process runs on its own schedule:

ProcessScheduleResponsibility
memory_maintenanceEvery 5 minExtracts facts from recent conversations, compacts long threads into summaries, prunes low-confidence facts.
proactive_followupsEvery 20 sChecks whether Luna should send an unsolicited message (reminders, observations, check-ins) and emits it to the frontend.
calendar_remindersEvery 60 sScans upcoming tasks and calendar events and fires reminder notifications.
voice_runtimeContinuousRuns the wake-word detection loop and pipes audio to the STT model.

List all registered processes at runtime:

npm run luna -- processes

Contribution boundaries

Keep changes scoped to one layer when possible. Crossing layers in a single PR makes review harder:

What you're changingWhere it lives
API endpoint logicbackend/services/ — not in routers
New background jobbackend/processes/ — registered in registry.py
UI component or viewfrontend/src/components/<Feature>/
Global client statefrontend/src/store/index.ts
Desktop/native behaviourelectron/main.js or electron/preload.js
New tool or skillbackend/services/tool_registry.py + skills/