Description
This project aims to build a fully local, privacy-preserving AI voice assistant using Ollama, LLaMA 3.2, a local multimodal model (LMM), and the M5Stack Atom Echo. The system will enable offline speech interaction, running all language processing on a local device or server, without relying on cloud-based APIs. The Atom Echo will act as the voice interface (microphone + speaker), while Ollama will manage local model inference, enabling natural language understanding, reasoning, and spoken responses.
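For orientation, the assistant logic only ever talks to localhost: Ollama exposes a local HTTP API, on port 11434 by default. Below is a minimal Python sketch of that server-side call, assuming Ollama is running and a llama3.2 model tag has been pulled; the prompt and timeout are illustrative.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local API endpoint

def ask_assistant(prompt: str) -> str:
    """Send a single user turn to the local Ollama server and return the reply text."""
    payload = {
        "model": "llama3.2",  # model tag as pulled with `ollama pull llama3.2`
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,      # one complete JSON response instead of a token stream
    }
    resp = requests.post(OLLAMA_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["message"]["content"]

print(ask_assistant("Turn on the living room lights, please."))
```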
Goals
Local-only operation: Ensure all speech recognition, LLM inference, and response generation occur offline for maximum privacy.
Voice input/output pipeline: Create a robust flow using the Atom Echo for audio capture and playback (a skeleton of this loop follows the goals list).
Run LLaMA 3.2 and multimodal models locally: Use Ollama to serve LLaMA 3.2 for conversational logic and an LMM for possible multimodal inputs.
Wake word / push-to-talk support: Implement a simple and reliable mechanism for activating the assistant.
Low latency: Optimize for fast local inference and smooth user interaction.
Extendability: Provide an architecture that can be easily expanded with home-automation hooks, custom commands, or additional sensors.
Automation: Provide a way to automate the whole setup ("just run an Ansible playbook or a Terraform plan") and have everything in place.
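To make the pipeline goal concrete, here is a skeleton of the intended loop. It is a sketch, not a working implementation: every helper below is a hypothetical stub standing in for the real wake-word engine, STT, LLM, and TTS components listed under Resources.

```python
# Skeleton of the offline voice loop; every step runs on local hardware.
# All helpers are hypothetical stubs to be backed by a real wake-word
# engine, Whisper (STT), Ollama (LLM), and Piper (TTS).

def wait_for_wake_word() -> None: ...     # block until the wake word (or push-to-talk) fires
def record_until_silence() -> bytes: ...  # PCM audio captured from the Atom Echo microphone
def transcribe(audio: bytes) -> str: ...  # local speech-to-text, e.g. Whisper
def ask_llm(text: str) -> str: ...        # local inference via the Ollama HTTP API
def synthesize(text: str) -> bytes: ...   # local text-to-speech, e.g. Piper
def play(audio: bytes) -> None: ...       # playback through the Atom Echo speaker

def voice_loop() -> None:
    while True:
        wait_for_wake_word()
        request = transcribe(record_until_silence())
        reply = ask_llm(request)
        play(synthesize(reply))
```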
Resources
Hardware - M5Stack Atom Echo (microphone + speaker module)
Local machine/server for running Ollama (Linux)
A machine running Home Assistant
Software / Models - Ollama (model runner for local LLM inference)
LLaMA 3.2 (base conversational model)
LMM (local multimodal model for future image/audio extensions)
Local Speech-to-Text engine (e.g., Whisper running locally, standalone or inside Home Assistant)
Local Text-to-Speech engine (e.g., Coqui TTS, Piper, or on-device TTS); a short Whisper/Piper sketch follows this list
A working Home Assistant instance
Ansible or Terraform
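As a rough illustration of the STT/TTS pieces, the sketch below transcribes a recorded utterance with a local Whisper model and synthesizes a reply with the Piper command-line tool. It assumes the openai-whisper package (which needs ffmpeg) and the piper binary with a downloaded voice are installed; the file names and reply text are placeholders.

```python
import subprocess
import whisper  # pip install openai-whisper; runs fully offline after the model download

# --- Speech-to-text: transcribe a captured utterance with a local Whisper model ---
stt_model = whisper.load_model("base")        # small enough for CPU-only servers
result = stt_model.transcribe("command.wav")  # path to the recorded utterance (placeholder)
print("Heard:", result["text"])

# --- Text-to-speech: synthesize a reply with Piper's command-line tool ---
# The voice file name is an example; any downloaded Piper voice works.
subprocess.run(
    ["piper", "--model", "en_US-lessac-medium.onnx", "--output_file", "reply.wav"],
    input="The living room lights are now on.".encode("utf-8"),
    check=True,
)
```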
Development Tools - Python or Node.js for writing the assistant logic
M5Stack Arduino or UIFlow SDK for programming the Atom Echo
MQTT or HTTP for communication between the Atom Echo and the server (see the MQTT sketch after this list)
Git/GitHub for project versioning
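For the Atom Echo ↔ server link, here is a minimal MQTT sketch of the server side using paho-mqtt. The topic names, broker address, and echo-style reply are illustrative assumptions, not something the project fixes.

```python
import paho.mqtt.client as mqtt  # pip install paho-mqtt

# Topic names are illustrative placeholders.
TOPIC_UTTERANCE = "atom_echo/utterance"  # Atom Echo -> server: captured request
TOPIC_REPLY = "atom_echo/reply"          # server -> Atom Echo: synthesized answer

def on_message(client, userdata, message):
    """Handle an utterance published by the Atom Echo and answer on the reply topic."""
    request = message.payload.decode("utf-8")
    reply = f"You said: {request}"  # stand-in for the STT -> Ollama -> TTS processing
    client.publish(TOPIC_REPLY, reply)

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt 2.x constructor
client.on_message = on_message
client.connect("localhost", 1883)  # local broker, e.g. Mosquitto on the same LAN
client.subscribe(TOPIC_UTTERANCE)
client.loop_forever()
```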
This project is part of:
Hack Week 25
Activity
Comments
2 days ago by mmilella
Reached a milestone.
Project Results
For this Hack Week project I successfully implemented a local voice assistant setup using:
Whisper for voice-to-text
Piper for text-to-speech
Wake word support for Assist, used to activate the microphone and the AI voice; all of these components are integrated inside Home Assistant
Ollama running the LLaMA 3.2 1B model to provide the AI/ML logic through the Home Assistant Assist pipeline
Ollama was hosted on a dedicated server, and I observed several load spikes during inference. A future improvement will be upgrading to a more powerful GPU to evaluate how much performance and response time improve.
Regarding audio hardware, using the speakerphone itself was straightforward. The tricky part was finding drivers that worked properly with the main server setup (Proxmox host with a VM running Home Assistant).
Overall, the entire stack is working and responding reliably, and the project goal has been successfully achieved.
2 days ago by mmilella
What Is Still Missing / Next Steps
There are still several areas that can be improved or expanded:
I still need to implement Ansible automation, so the whole setup (Whisper, Piper, Ollama, Home Assistant integrations, drivers, etc.) can be deployed easily and reproducibly.
The current LLaMA 3.2 1B model works, but it is not very powerful: it can answer queries, but it can also produce incorrect or incomplete results. This limitation is expected. Since the system is offline, it cannot access real-time information, which is intentional for privacy and local-only operation.
For the voice pipeline, I used the Wyoming Protocol with a single speakerphone. It would be interesting to explore whether I can create a Wyoming Satellite setup: having multiple speakerphones distributed across the home and letting Home Assistant automatically use the one that detects the wake word.
These improvements will help make the system more scalable, accurate, and practical for everyday use.