Voice

Configure speech-to-text, wake-word detection, push-to-talk, TTS voice selection, and emotion analysis.

Overview

Luna's voice pipeline is fully local. Your microphone captures audio, faster-whisper transcribes it on-device, the LLM processes the transcript, and pyttsx3 speaks the response. No audio leaves your machine.

The pipeline runs as the voice_runtime background process (backend/processes/voice_runtime/) and exposes its state through /api/voice/. The frontend's VoiceOrb component connects to this state and drives the animated listening indicator.

ℹ️
Microphone permission

The Electron shell requests microphone access on first launch. On Windows you may also need to enable microphone access for desktop apps in Settings → Privacy → Microphone.

Speech-to-text (STT)

Luna uses faster-whisper — a CTranslate2-optimised Whisper implementation — for transcription. It runs entirely on your CPU or GPU.

Model selection

faster-whisper auto-downloads the model on first use. The default is base. Change it in .env:

.env
# Options: tiny, base, small, medium, large-v2, large-v3
whisper_model=base

Model    RAM       Speed       Accuracy
tiny     ~200 MB   Very fast   Low (good for clear speech)
base     ~400 MB   Fast        Good (recommended default)
small    ~500 MB   Moderate    Better (noisy environments)
medium   ~1.5 GB   Slow        High (multiple speakers)

💡
GPU acceleration

faster-whisper can use CUDA if torch with CUDA support is installed. Set whisper_device=cuda in .env to enable it. If CUDA is unavailable, transcription silently falls back to the CPU.
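
As a rough illustration of how whisper_model and whisper_device might be consumed, the sketch below validates the model name and picks a device, dropping back to the CPU when CUDA is not usable. The helper names (resolve_whisper_settings, load_model), the cuda_available flag, the "cpu" default for whisper_device, and the compute types are assumptions made for this example, not Luna's actual code. WhisperModel(model, device=..., compute_type=...) is the faster-whisper constructor.

```python
from typing import Mapping, Tuple

# Model names listed in the .env comment above.
VALID_MODELS = ("tiny", "base", "small", "medium", "large-v2", "large-v3")


def resolve_whisper_settings(env: Mapping[str, str], cuda_available: bool) -> Tuple[str, str]:
    """Return (model_name, device) from .env-style settings (hypothetical helper)."""
    model = env.get("whisper_model", "base").strip()
    if model not in VALID_MODELS:
        raise ValueError(f"unknown whisper_model: {model!r}")
    # Assumption: whisper_device defaults to "cpu" when it is not set.
    device = env.get("whisper_device", "cpu").strip().lower()
    if device == "cuda" and not cuda_available:
        device = "cpu"  # silent CPU fallback, as described above
    return model, device


def load_model(model: str, device: str):
    """Load the model with faster-whisper (imported lazily, so the helper above works without it)."""
    from faster_whisper import WhisperModel

    # Assumption: float16 on GPU and int8 on CPU, two commonly used compute types.
    compute_type = "float16" if device == "cuda" else "int8"
    return WhisperModel(model, device=device, compute_type=compute_type)
```

Keeping the settings logic separate from the model load means it can be checked without downloading a model.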

Wake word

Luna listens continuously in the background for a wake word. When detected, it begins capturing the following utterance and sends it to the STT model.

.env
# Enable wake-word detection
wake_word_enabled=true

# The word or phrase Luna listens for (case-insensitive)
wake_word=hey luna

The wake-word detector runs on a lightweight energy-based heuristic, and it does not pass audio on to the rest of the pipeline until the wake word is confidently detected. This keeps CPU usage near zero while idle.
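
To make the idea of an energy-based gate concrete, here is a minimal sketch. It assumes 16-bit little-endian mono PCM frames, an arbitrary threshold of 500, and a simple text match against the configured wake_word. None of these details come from Luna's source, and all function names are hypothetical.

```python
import math
import re
import struct


def rms_energy(pcm16: bytes) -> float:
    """Root-mean-square amplitude of little-endian 16-bit mono PCM."""
    count = len(pcm16) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(pcm16: bytes, threshold: float = 500.0) -> bool:
    """Cheap gate: frames below the (assumed) threshold are ignored."""
    return rms_energy(pcm16) >= threshold


def _normalise(text: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9\s]", " ", text.lower()).split())


def contains_wake_word(transcript: str, wake_word: str) -> bool:
    """Case-insensitive phrase match, mirroring the wake_word setting."""
    phrase = _normalise(wake_word)
    return bool(phrase) and phrase in _normalise(transcript)
```

In a design like this, quiet frames are discarded by the cheap RMS check, so heavier work only happens once something loud enough has been heard.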

⚠️
False positives

In noisy environments, short wake words like "luna" may trigger unexpectedly. Use a longer phrase like "hey luna" or "okay luna" to reduce false positives.

Push-to-talk

Push-to-talk is always available in the frontend regardless of wake-word settings. Hold the microphone button in the InputBar to record, release to transcribe and send.

The recording indicator uses the VoiceOrb component (frontend/src/components/Voice/VoiceOrb.tsx) which animates based on audio amplitude from the useVoiceRecorder hook.

💡
Keyboard shortcut

While the chat input is focused, hold Space to trigger push-to-talk.

Text-to-speech (TTS)

Luna speaks responses using pyttsx3, which wraps your OS's native TTS engine — SAPI5 on Windows, NSSpeechSynthesizer on macOS, and eSpeak on Linux.

.env
# Enable TTS
tts_enabled=true

# Speaking rate (words per minute). Default is 150.
tts_rate=150

# Voice index — 0 is your first system voice, 1 is the second, etc.
tts_voice_index=0

Listing available voices

To find available voice indices on your system, run:

Python
import pyttsx3

# Print the index, name, and languages of every voice the OS exposes
engine = pyttsx3.init()
for i, voice in enumerate(engine.getProperty('voices')):
    print(i, voice.name, voice.languages)

Set tts_voice_index to the index of your preferred voice in .env.
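
For reference, the following sketch shows one way the three TTS settings could be applied to a pyttsx3 engine. The helpers parse_tts_settings and speak are hypothetical, and so are the "false" default for tts_enabled and the fallback to voice 0 for an out-of-range index. The pyttsx3 calls used (init, getProperty, setProperty, say, runAndWait) are part of its public API.

```python
from typing import Mapping, Tuple


def parse_tts_settings(env: Mapping[str, str]) -> Tuple[bool, int, int]:
    """Return (enabled, rate, voice_index) from .env-style values (hypothetical helper)."""
    # Assumption: TTS is treated as disabled when tts_enabled is missing.
    enabled = env.get("tts_enabled", "false").strip().lower() == "true"
    rate = int(env.get("tts_rate", "150"))
    voice_index = int(env.get("tts_voice_index", "0"))
    return enabled, rate, voice_index


def speak(text: str, env: Mapping[str, str]) -> None:
    """Speak text with pyttsx3 using the parsed settings."""
    enabled, rate, voice_index = parse_tts_settings(env)
    if not enabled:
        return
    import pyttsx3  # imported lazily so parsing works without an audio backend

    engine = pyttsx3.init()
    voices = engine.getProperty("voices")
    if voices:
        # Assumption: fall back to the first voice if the index is out of range.
        index = voice_index if 0 <= voice_index < len(voices) else 0
        engine.setProperty("voice", voices[index].id)
    engine.setProperty("rate", rate)
    engine.say(text)
    engine.runAndWait()
```

Calling speak("Hello", {"tts_enabled": "true", "tts_voice_index": "1"}) would use the second system voice at the default rate.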

ℹ️
Installing more voices on Windows

Go to Settings → Time & language → Speech → Add voices to install additional TTS voices. Voices that Windows exposes through SAPI5 will appear in the pyttsx3 voice list.

Emotion detection

Luna analyses the emotional tone of transcribed speech to update the personality engine. The analysis runs in backend/services/voice.py using keyword heuristics and sentiment scoring — no external model is required.

Detected emotions influence the emotional_support dimension of the personality state, causing Luna to respond with more or less empathy based on your current mood.
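
The sketch below illustrates what keyword-based emotion scoring and a resulting adjustment to emotional_support could look like. The keyword lists, the emotion labels, the 0 to 1 range, and the step size are all invented for this example. The real heuristics in backend/services/voice.py are not documented here and may differ.

```python
import re
from typing import Dict, Tuple

# Hypothetical keyword lists, for illustration only.
EMOTION_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "sad": ("sad", "lonely", "tired", "upset"),
    "happy": ("happy", "great", "excited", "glad"),
    "angry": ("angry", "annoyed", "furious"),
}


def detect_emotion(transcript: str) -> str:
    """Return the emotion with the most keyword hits, or "neutral" if there are none."""
    words = re.findall(r"[a-z']+", transcript.lower())
    scores = {
        emotion: sum(words.count(keyword) for keyword in keywords)
        for emotion, keywords in EMOTION_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"


def update_emotional_support(current: float, emotion: str, step: float = 0.1) -> float:
    """Raise the value for negative moods, lower it for positive ones, clamped to [0, 1]."""
    delta = {"sad": step, "angry": step, "happy": -step}.get(emotion, 0.0)
    return min(1.0, max(0.0, current + delta))
```

Under these assumptions, a transcript such as "I'm so tired and sad today" scores as sad and nudges emotional_support upward, while a neutral question leaves it unchanged.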

Troubleshooting

Voice says "off" in the UI

  • Check that tts_enabled=true is set in .env.
  • Check microphone permissions in the OS and in Electron.
  • Open the backend logs and look for Microphone opened OK. If absent, the audio device initialisation failed.
  • Try a different audio device by setting audio_device_index in .env.

Wake word never triggers

  • Confirm wake_word_enabled=true in .env.
  • Check the backend log for [voice_runtime] listening for wake word.
  • Speak clearly and at a normal volume — the detector needs a minimum energy threshold.

TTS not speaking

  • Confirm tts_enabled=true.
  • On Windows, check that a SAPI5 voice is installed. Run the listing snippet above to verify.
  • If pyttsx3 raises an error, try pip install pyttsx3 --upgrade.
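
Several of the checks above come down to confirming values in .env. A small script along these lines can list the voice flags that are not enabled. It assumes plain key=value lines with # comments, and the function names are hypothetical.

```python
from typing import Dict, List, Mapping


def parse_env(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def voice_config_warnings(settings: Mapping[str, str]) -> List[str]:
    """Report the voice flags from this page that are not set to true."""
    warnings = []
    for key in ("tts_enabled", "wake_word_enabled"):
        if settings.get(key, "").lower() != "true":
            warnings.append(f"{key} is not set to true")
    return warnings
```

To run it against a real file, pass the file's contents to parse_env, for example parse_env(open(".env").read()).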