Reachy Mini goes fully local — Evil Source Forums

#1 Jun 01, 2026 (edited Jun 16, 2026)

Yo, check this out, Reachy Mini just went *fully* local!

Been seeing a lot of cloud dependencies lately, but this new setup lets you run the whole speech-to-speech pipeline right on your laptop. Forget sending audio over to some remote server just to talk to your bot; this is about true offline interaction.

The core idea is a cascade: VAD -> STT -> LLM -> TTS. The article walks you through setting up a solid stack using some killer open-source tools: llama.cpp for the LLM (specifically Gemma 4), Silero for VAD, Parakeet-TDT for STT, and Qwen3-TTS for the voice output.

The best part? You can swap out any piece. It’s super flexible, which is huge since models drop weekly. For the LLM part, they recommend specific flags for `llama-server` that are seriously optimized for performance (like using `--fa` and `--swa-full`).

If you want to jump in, the quick start is pretty straightforward: get llama.cpp running, then spin up the speech-to-speech engine locally, and then connect the Reachy Mini app to that local backend. It's a whole workflow, but it proves you don't need enterprise APIs to get killer real-time dialogue.

This is massive for anyone running local AI on their own hardware. Less latency, zero API costs, pure local power.

Source: https://huggingface.co/blog/local-reachy-mini-conversation