Yo, check this out: Reachy Mini just went *fully* local!<br> <br> Been seeing a lot of cloud dependencies lately, but this new setup lets you run the whole speech-to-speech pipeline right on your laptop. No more sending audio to some remote server just to talk to your bot; this is true offline interaction.<br> <br> The core idea is a cascade: VAD -> STT -> LLM -> TTS. The article walks you through setting up a solid stack from open-source tools: Silero for VAD, Parakeet-TDT for STT, llama.cpp for the LLM (specifically Gemma 4), and Qwen3-TTS for the voice output.<br> <br> The best part? You can swap out any piece, which is huge since new models drop weekly. For the LLM part, they recommend specific `llama-server` flags tuned for performance (like `-fa`, short for `--flash-attn`, and `--swa-full`).<br> <br> If you want to jump in, the quick start is pretty straightforward: get llama.cpp running, spin up the speech-to-speech engine locally, then connect the Reachy Mini app to that local backend. It's a few steps, but it shows you don't need enterprise APIs to get killer real-time dialogue.<br> <br> This is massive for anyone running local AI on their own hardware. No network round trips, zero API costs, pure local power.<br> <br> Source: https://huggingface.co/blog/local-reachy-mini-conversation
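
If the cascade idea sounds abstract, here's a rough Python sketch of how the four stages chain together and why swapping one is easy. Heads up: every name in it (`Cascade`, `respond`, the `stub_*` functions) is made up for illustration. These are NOT the real Silero, Parakeet-TDT, llama.cpp, or Qwen3-TTS APIs, and this is not how the article's engine is implemented; the stubs just pass bytes and strings around so the flow is visible.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stage signatures (assumptions, not any real library's API).
VAD = Callable[[bytes], bool]  # does this audio chunk contain speech?
STT = Callable[[bytes], str]   # audio -> transcript
LLM = Callable[[str], str]     # transcript -> reply text
TTS = Callable[[str], bytes]   # reply text -> audio


@dataclass
class Cascade:
    """Chains VAD -> STT -> LLM -> TTS; each stage is a plain callable."""
    vad: VAD
    stt: STT
    llm: LLM
    tts: TTS

    def respond(self, audio: bytes) -> Optional[bytes]:
        # Skip everything downstream when no speech is detected.
        if not self.vad(audio):
            return None
        transcript = self.stt(audio)
        reply = self.llm(transcript)
        return self.tts(reply)


# Stub stages: they fake "audio" as UTF-8 bytes so the example runs anywhere.
def stub_vad(audio: bytes) -> bool:
    return len(audio) > 0


def stub_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def stub_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = Cascade(stub_vad, stub_stt, stub_llm, stub_tts)

# Swapping a stage means passing a different callable; nothing else changes.
shouty = Cascade(stub_vad, stub_stt, lambda prompt: prompt.upper(), stub_tts)
```

In a real setup each stub would wrap an actual model call (for example, the LLM stage would talk to your local `llama-server`), but the shape stays the same: as long as a replacement stage takes and returns the same types, the rest of the pipeline doesn't care which model is behind it.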