Voice
Configure speech-to-text, wake-word detection, push-to-talk, TTS voice selection, and emotion analysis.
Overview
Luna's voice pipeline is fully local. Audio is captured by your microphone, transcribed on-device with faster-whisper, processed by the LLM, and spoken back with pyttsx3. No audio leaves your machine.
The pipeline runs as the voice_runtime background process (backend/processes/voice_runtime/) and exposes its state through /api/voice/. The frontend's VoiceOrb component connects to this state and drives the animated listening indicator.
The Electron shell requests microphone access on first launch. On Windows you may also need to enable microphone access for desktop apps in Settings → Privacy → Microphone.
Speech-to-text (STT)
Luna uses faster-whisper — a CTranslate2-optimised Whisper implementation — for transcription. It runs entirely on your CPU or GPU.
Model selection
faster-whisper auto-downloads the model on first use. The default is base. Change it in .env:
# Options: tiny, base, small, medium, large-v2, large-v3
whisper_model=base
| Model | RAM | Speed | Accuracy |
|---|---|---|---|
| tiny | ~200 MB | Very fast | Low (good for clear speech) |
| base | ~400 MB | Fast | Good (recommended default) |
| small | ~500 MB | Moderate | Better (noisy environments) |
| medium | ~1.5 GB | Slow | High (multiple speakers) |
faster-whisper can use CUDA if torch with CUDA support is installed. Set whisper_device=cuda in .env to enable it. If CUDA is unavailable, it silently falls back to CPU.
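The model and device settings above could be turned into a faster-whisper model roughly as follows. This is a minimal sketch, not Luna's actual loader: the helper names resolve_whisper_settings and load_whisper_model are hypothetical, the settings are assumed to arrive as a plain dict, the cpu default for whisper_device is an assumption, and the float16/int8 compute types are illustrative choices. Only WhisperModel and its device and compute_type arguments come from faster-whisper itself.

```python
# Sketch only: helper names, the dict-based settings, and the "cpu" default
# are assumptions for illustration, not Luna's actual code.

VALID_MODELS = ("tiny", "base", "small", "medium", "large-v2", "large-v3")
VALID_DEVICES = ("cpu", "cuda")


def resolve_whisper_settings(env):
    """Return (model, device) from .env-style settings, validating both values."""
    model = env.get("whisper_model", "base").strip()
    device = env.get("whisper_device", "cpu").strip().lower()
    if model not in VALID_MODELS:
        raise ValueError(f"unknown whisper_model: {model!r}")
    if device not in VALID_DEVICES:
        raise ValueError(f"unknown whisper_device: {device!r}")
    return model, device


def load_whisper_model(env):
    """Build a WhisperModel, dropping to CPU if CUDA construction fails."""
    # Third-party import kept local so the settings helper works without it.
    from faster_whisper import WhisperModel

    model, device = resolve_whisper_settings(env)
    if device == "cuda":
        try:
            return WhisperModel(model, device="cuda", compute_type="float16")
        except Exception:
            pass  # assumed fallback: continue on CPU without raising
    return WhisperModel(model, device="cpu", compute_type="int8")
```

Calling load_whisper_model({"whisper_model": "small", "whisper_device": "cuda"}) would try the GPU first and fall back to CPU if the model cannot be constructed there. Errors that only surface later, during transcription, are not handled by this sketch.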
Wake word
Luna listens continuously in the background for a wake word. When detected, it begins capturing the following utterance and sends it to the STT model.
# Enable wake-word detection
wake_word_enabled=true
# The word or phrase Luna listens for (case-insensitive)
wake_word=hey luna
The wake-word detector runs on a lightweight energy-based heuristic and does not send audio to the LLM until the wake word is confidently detected. This keeps CPU usage near zero while idle.
In noisy environments, short wake words like "luna" may trigger unexpectedly. Use a longer phrase like "hey luna" or "okay luna" to reduce false positives.
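To make the idea of an energy gate concrete, here is a small sketch under stated assumptions. It is not Luna's detector: the function names, the threshold of 500, and the transcript-based phrase match are all hypothetical. It only illustrates how a cheap loudness check can skip silent frames, and why a longer phrase is less likely to match by accident.

```python
import math
import re


def frame_rms(samples):
    """Root-mean-square energy of one frame of 16-bit PCM samples (ints)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_loud_enough(samples, threshold=500.0):
    """Energy gate: True when the frame reaches the (hypothetical) threshold."""
    return frame_rms(samples) >= threshold


def contains_wake_word(transcript, wake_word="hey luna"):
    """Case-insensitive match of the wake phrase as a run of whole words."""
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    target = re.findall(r"[a-z0-9']+", wake_word.lower())
    n = len(target)
    if n == 0:
        return False
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))
```

With these helpers, contains_wake_word("I went to Luna Park", "luna") matches, while the same sentence does not match "hey luna", which is the false-positive effect described above.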
Push-to-talk
Push-to-talk is always available in the frontend regardless of wake-word settings. Hold the microphone button in the InputBar to record, release to transcribe and send.
The recording indicator uses the VoiceOrb component (frontend/src/components/Voice/VoiceOrb.tsx) which animates based on audio amplitude from the useVoiceRecorder hook.
While the chat input is focused, hold Space to trigger push-to-talk.
Text-to-speech (TTS)
Luna speaks responses using pyttsx3, which wraps your OS's native TTS engine — SAPI5 on Windows, NSSpeechSynthesizer on macOS, and eSpeak on Linux.
# Enable TTS
tts_enabled=true
# Speaking rate (words per minute). Default is 150.
tts_rate=150
# Voice index — 0 is your first system voice, 1 is the second, etc.
tts_voice_index=0
Listing available voices
To find available voice indices on your system, run:
import pyttsx3
engine = pyttsx3.init()
for i, voice in enumerate(engine.getProperty('voices')):
    print(i, voice.name, voice.languages)
Set tts_voice_index to the index of your preferred voice in .env.
On Windows, go to Settings → Time & language → Speech → Add voices to install additional high-quality neural TTS voices. They will appear in the pyttsx3 voice list.
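The tts_rate and tts_voice_index settings map onto pyttsx3 engine properties. The sketch below shows one way they might be applied; the helper names clamp_voice_index and speak are hypothetical, and falling back to voice 0 for an out-of-range index is an assumption rather than documented Luna behaviour.

```python
def clamp_voice_index(requested, voice_count):
    """Return a usable voice index, falling back to 0 when out of range."""
    if voice_count <= 0:
        raise ValueError("no TTS voices installed")
    return requested if 0 <= requested < voice_count else 0


def speak(text, rate=150, voice_index=0):
    """Speak text once using the given rate (words per minute) and voice index."""
    # Third-party import kept local so clamp_voice_index works without it.
    import pyttsx3

    engine = pyttsx3.init()
    voices = engine.getProperty("voices")
    index = clamp_voice_index(voice_index, len(voices))
    engine.setProperty("rate", rate)
    engine.setProperty("voice", voices[index].id)  # voices are selected by id
    engine.say(text)
    engine.runAndWait()  # blocks until speech has finished
```

For example, speak("Hello from Luna", rate=170, voice_index=1) would use the second system voice if one exists and the first voice otherwise.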
Emotion detection
Luna analyses the emotional tone of transcribed speech to update the personality engine. The analysis runs in backend/services/voice.py using keyword heuristics and sentiment scoring — no external model is required.
Detected emotions influence the emotional_support dimension of the personality state, causing Luna to respond with more or less empathy based on your current mood.
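As an illustration of keyword heuristics combined with sentiment scoring, consider the following sketch. It is not the code in backend/services/voice.py: the keyword lists, the function names, the [-1, 1] sentiment scale, and the [0, 1] range for emotional_support are all assumptions made for the example.

```python
import re

# Hypothetical keyword lists; a real implementation would likely be larger.
EMOTION_KEYWORDS = {
    "sad": {"sad", "lonely", "tired", "upset"},
    "angry": {"angry", "annoyed", "furious"},
    "happy": {"happy", "great", "excited", "glad"},
}


def detect_emotion(transcript):
    """Return (label, sentiment) where sentiment lies in [-1.0, 1.0]."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = {
        label: sum(word in keywords for word in words)
        for label, keywords in EMOTION_KEYWORDS.items()
    }
    label = max(counts, key=counts.get)
    if counts[label] == 0:
        return "neutral", 0.0
    positive = counts["happy"]
    negative = counts["sad"] + counts["angry"]
    sentiment = (positive - negative) / (positive + negative)
    return label, sentiment


def update_emotional_support(current, sentiment, step=0.1):
    """Raise the value for negative sentiment, lower it for positive, clamp to [0, 1]."""
    return min(1.0, max(0.0, current - step * sentiment))
```

Here a transcript such as "I feel so sad and tired today" yields the label sad with a fully negative sentiment, which nudges the assumed emotional_support value upward.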
Troubleshooting
Voice says "off" in the UI
- Check that tts_enabled=true is set in .env.
- Check microphone permissions in the OS and in Electron.
- Open the backend logs and look for the line Microphone opened OK. If it is absent, the audio device initialisation failed.
- Try a different audio device by setting audio_device_index in .env.
Wake word never triggers
- Confirm wake_word_enabled=true in .env.
- Check the backend log for [voice_runtime] listening for wake word.
- Speak clearly and at a normal volume; the detector needs a minimum energy threshold.
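The log checks in the two lists above can be automated with a few lines of Python. This is a convenience sketch: check_voice_log is a hypothetical helper, and it takes the log text as a string because the location of the backend log file is not specified here. The two marker strings are the ones quoted in the troubleshooting steps.

```python
# Marker strings quoted in the troubleshooting steps; the dict keys are
# hypothetical names chosen for this example.
EXPECTED_LOG_MARKERS = {
    "microphone": "Microphone opened OK",
    "wake_word": "[voice_runtime] listening for wake word",
}


def check_voice_log(log_text):
    """Map each check name to True if its marker appears in the log text."""
    return {
        name: marker in log_text
        for name, marker in EXPECTED_LOG_MARKERS.items()
    }
```

Reading your backend log into a string and passing it to check_voice_log shows at a glance which stage did not start: a False for microphone points at audio device initialisation, and a False for wake_word points at the wake-word settings.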
TTS not speaking
- Confirm tts_enabled=true.
- On Windows, check that a SAPI5 voice is installed. Run the listing snippet above to verify.
- If pyttsx3 raises an error, try pip install pyttsx3 --upgrade.