How it works

Six stages.
One machine.

From your first word to the final response, every step of how L.U.N.A. processes, acts, and learns. Entirely local.

Request lifecycle

1
Step 01 — Input

You speak, type, or trigger.

Luna accepts input through the chat UI, push-to-talk voice, wake-word detection, or scheduled proactive follow-ups. On desktop, screen and camera frames can be submitted to the vision pipeline.

Chat UIPush-to-TalkWake WordVision FramesScheduled Triggers
2
Step 02 — Context Assembly

Every relevant signal is gathered.

Before calling the LLM, Luna assembles a rich context window. Memory facts, personality state, active activities, current calendar tasks, recent vision observations, and conversation history are all injected — giving the model full situational awareness.

Memory FactsPersonalityCalendar & TasksVision SummaryConversation HistoryLive Dashboard Data
3
Step 03 — LLM Inference

Your local model processes the request.

The assembled prompt is sent to your configured LLM provider — local Ollama by default, or any OpenAI-compatible endpoint. The model streams a structured response containing answer tokens, tool calls, and bracket commands.

Ollama (local)OpenAI-compatible APInum_ctx: 8192Streaming SSE
4
Step 04 — Tool Execution

Luna acts on what it decides.

Structured tool calls and bracket commands are parsed from the stream. Luna can search the web, fetch pages, control Spotify, launch applications, manage calendar tasks, open dynamic widget overlays, generate 3D scenes, and display map overlays — all with per-tool permission controls.

Web SearchWeb FetchSpotify ControlApp LaunchCalendarDynamic Widgets3D ScenesMaps
5
Step 05 — Memory Update

Facts and context are persisted.

After each exchange, background processes extract new facts from the conversation, update personality vectors, and compact long conversations into summaries. Everything is stored locally in SQLite and ChromaDB — no external database, no cloud sync.

SQLiteChromaDB EmbeddingsFact ExtractionPersonality UpdateConversation Compaction
6
Step 06 — Privacy First

Nothing leaves your machine by default.

Inference, memory, vision, and voice processing all run locally. Cloud APIs (news, markets, Spotify) are opt-in and only contacted for the features they power. Your conversations, facts, and preferences are yours.

Local InferenceLocal StorageNo TelemetryOpt-in Cloud Only

Ready to run it?

Get Luna running locally in minutes with the setup guide, or explore the source on GitHub.