Description

This project aims to build a fully local, privacy-preserving AI voice assistant using Ollama, LLaMA 3.2, a local multimodal model (LMM), and the M5Stack Atom Echo. The system will enable offline speech interaction, running all language processing on a local device or server without relying on cloud-based APIs. The Atom Echo will act as the voice interface (microphone + speaker), while Ollama will manage local model inference, enabling natural language understanding, reasoning, and spoken responses.
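
A minimal sketch of the inference side, assuming Ollama is running locally on its default port (11434) and a llama3.2 model has already been pulled with ollama pull llama3.2:

    import requests

    OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint

    def ask(prompt: str) -> str:
        """Send one user message to the local model and return its reply."""
        response = requests.post(
            OLLAMA_URL,
            json={
                "model": "llama3.2",
                "messages": [{"role": "user", "content": prompt}],
                "stream": False,  # single JSON reply instead of a token stream
            },
            timeout=120,
        )
        response.raise_for_status()
        return response.json()["message"]["content"]

    if __name__ == "__main__":
        print(ask("Say hello in one short sentence."))

The same call is what the voice pipeline will issue after speech-to-text, so it also doubles as a quick health check for the model server.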

Goals

  • Local-only operation: Ensure all speech recognition, LLM inference, and response generation occur offline for maximum privacy.

  • Voice input/output pipeline: Create a robust flow using the Atom Echo for audio capture and playback (a rough server-side sketch of this flow follows this list).

  • Run LLaMA 3.2 and multimodal models locally: Use OLLama to serve LLaMA 3.2 for conversational logic and use LMM for possible multimodal inputs.

  • Wake word / push-to-talk support: Implement a simple and reliable mechanism for activating the assistant.

  • Low latency: Optimize for fast local inference and smooth user interaction.

  • Extendability: Provide an architecture that can be easily expanded with home-automation hooks, custom commands, or additional sensors.

  • Automation: Provide a way to automate the whole setup ("just run an Ansible playbook or a Terraform plan") so that everything is in place after a single run.
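
A rough sketch of the server-side voice pipeline referenced above, covering one utterance end to end: Whisper for speech-to-text, Ollama for the LLaMA 3.2 reply, and Piper for text-to-speech. The file names, model sizes, and Piper voice are placeholders, and it assumes the openai-whisper and requests Python packages plus the piper CLI are installed:

    import subprocess
    import requests
    import whisper  # openai-whisper package

    def transcribe(wav_path: str) -> str:
        """Speech-to-text on a WAV file captured from the Atom Echo."""
        model = whisper.load_model("base")  # small model keeps latency down
        return model.transcribe(wav_path)["text"].strip()

    def ask_llm(prompt: str) -> str:
        """Query the local LLaMA 3.2 model served by Ollama."""
        r = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "llama3.2", "prompt": prompt, "stream": False},
            timeout=120,
        )
        r.raise_for_status()
        return r.json()["response"]

    def speak(text: str, out_path: str = "reply.wav") -> str:
        """Text-to-speech with the Piper CLI (the voice file is a placeholder)."""
        subprocess.run(
            ["piper", "--model", "en_US-lessac-medium.onnx", "--output_file", out_path],
            input=text.encode("utf-8"),
            check=True,
        )
        return out_path  # this WAV is then played back on the Atom Echo

    if __name__ == "__main__":
        question = transcribe("recording.wav")  # audio pushed by the Atom Echo
        answer = ask_llm(question)
        print(answer)
        speak(answer)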

Resources

Hardware - M5Stack Atom Echo (microphone + speaker module)

  • Local machine/server for running Ollama (Linux)

  • A machine running Home Assistant

Software / Models - Ollama (model runner for local LLM inference)

  • LLaMA 3.2 (base conversational model)

  • LMM (local multimodal model for future image/audio extensions)

  • Local Speech-to-Text engine (e.g., Whisper, run as a local service)

  • Local Text-to-Speech engine (e.g., Coqui TTS, Piper, or on-device TTS)

  • A working Home Assistant instance

  • Ansible or Terraform

Development Tools - Python or Node.js for writing the assistant logic

  • M5Stack Arduino or UIFlow SDK for programming the Atom Echo

  • MQTT or HTTP for communication between the Atom Echo and the server (a minimal HTTP receiver sketch follows this list)

  • Git/GitHub for project versioning

  • Ansible or Terraform
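
For the "MQTT or HTTP" link mentioned above, a minimal sketch of an HTTP receiver on the server side. Flask, the endpoint name, and the port are assumptions; the Atom Echo firmware would POST each recorded utterance here, and the handler would feed it into the speech-to-text step of the pipeline sketched earlier:

    from flask import Flask, request  # assumed dependency: flask

    app = Flask(__name__)

    @app.route("/utterance", methods=["POST"])
    def utterance():
        """Receive one recorded utterance (raw WAV bytes) from the Atom Echo."""
        with open("recording.wav", "wb") as f:
            f.write(request.get_data())
        # Here the STT -> LLM -> TTS chain would run, and the reply audio
        # would be returned for playback on the device.
        return "ok", 200

    if __name__ == "__main__":
        # Listen on the LAN so the Atom Echo can reach the server; the port is arbitrary.
        app.run(host="0.0.0.0", port=8000)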

This project is part of:

Hack Week 25

Activity

  • 9 days ago: mwilck liked this project.
  • 10 days ago: gcolangiuli liked this project.
  • 10 days ago: mmilella started this project.
  • 10 days ago: mmilella originated this project.

  • Comments

    • mmilella, 2 days ago:

      Reached a milestone.

      Project Results

      For this Hack Week project I successfully implemented a local voice assistant setup using:

      • Whisper for speech-to-text

      • Piper for text-to-speech

      • Wake words for Assist, used to set up a wake word that activates the microphone and the AI voice, both integrated inside Home Assistant

      • Ollama running the LLaMA 3.2 1B model to provide the AI/ML logic through the Home Assistant Assist pipeline

      Ollama was hosted on a dedicated server, and during data processing I observed several performance spikes. A future improvement will be upgrading to a more powerful GPU to evaluate how much the performance and response time improve.
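
      To make those spikes measurable, a small sketch that times the Ollama endpoint directly (the server hostname, model tag, and prompt are assumptions), so slow requests stand out from the baseline:

        import time
        import requests

        URL = "http://ollama-server:11434/api/generate"  # placeholder hostname
        PAYLOAD = {"model": "llama3.2:1b", "prompt": "Hello!", "stream": False}

        # Fire a few identical requests and print the per-request latency.
        for i in range(5):
            start = time.monotonic()
            requests.post(URL, json=PAYLOAD, timeout=300).raise_for_status()
            print(f"request {i}: {time.monotonic() - start:.2f}s")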

      Regarding audio hardware, using the speakerphone itself was straightforward. The tricky part was finding drivers that worked properly with the main server setup (Proxmox host with a VM running Home Assistant).

      Overall, the entire stack is working and responding reliably, and the project goal has been successfully achieved.

    • mmilella, 2 days ago:

      What Is Still Missing / Next Steps

      There are still several areas that can be improved or expanded:

      I still need to implement Ansible automation, so the whole setup (Whisper, Piper, Ollama, Home Assistant integrations, drivers, etc.) can be deployed easily and reproducibly.

      The current LLaMA 3.2 1B model works, but it is not very powerful. It can answer queries, but it can also produce incorrect or incomplete results. This limitation is expected; since the system is offline, it cannot access real-time information, which is intentional for privacy and local-only operation.

      For the voice pipeline, I used the Wyoming Protocol with a single speakerphone. It would be interesting to explore whether I can create a Wyoming Satellite setup: having multiple speakerphones distributed across the home and letting Home Assistant automatically use the one that detects the wake word.
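
      As a first step toward that satellite idea, a tiny sketch that only checks whether each candidate satellite host answers on its Wyoming port (the host list is hypothetical and 10700 is assumed as the wyoming-satellite default):

        import socket

        SATELLITES = {"kitchen": "192.168.1.50", "office": "192.168.1.51"}  # placeholders
        PORT = 10700  # assumed wyoming-satellite default

        for name, host in SATELLITES.items():
            try:
                with socket.create_connection((host, PORT), timeout=2):
                    print(f"{name}: reachable")
            except OSError:
                print(f"{name}: not reachable")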

      These improvements will help make the system more scalable, accurate, and practical for everyday use.
