How it works
From your first word to the final response, every step of how L.U.N.A. processes, acts, and learns. Entirely local.
Request lifecycle
Luna accepts input through the chat UI, push-to-talk voice, wake-word detection, or scheduled proactive follow-ups. On desktop, screen and camera frames can be submitted to the vision pipeline.
Before calling the LLM, Luna assembles a rich context window. Memory facts, personality state, active activities, current calendar tasks, recent vision observations, and conversation history are all injected — giving the model full situational awareness.
The assembled prompt is sent to your configured LLM provider — local Ollama by default, or any OpenAI-compatible endpoint. The model streams a structured response containing answer tokens, tool calls, and bracket commands.
Structured tool calls and bracket commands are parsed from the stream. Luna can search the web, fetch pages, control Spotify, launch applications, manage calendar tasks, open dynamic widget overlays, generate 3D scenes, and display map overlays — all with per-tool permission controls.
After each exchange, background processes extract new facts from the conversation, update personality vectors, and compact long conversations into summaries. Everything is stored locally in SQLite and ChromaDB — no external database, no cloud sync.
Inference, memory, vision, and voice processing all run locally. Cloud APIs (news, markets, Spotify) are opt-in and only contacted for the features they power. Your conversations, facts, and preferences are yours.
Get Luna running locally in minutes with the setup guide, or explore the source on GitHub.