Description

This project aims to build a fully local, privacy-preserving AI voice assistant using Ollama, LLaMA 3.2, a local multimodal model (LMM), and the M5Stack Atom Echo. The system will enable offline speech interaction, running all language processing on a local device or server without relying on cloud-based APIs. The Atom Echo will act as the voice interface (microphone + speaker), while Ollama will manage local model inference, enabling natural language understanding, reasoning, and spoken responses.
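
A minimal sketch of the inference side, assuming Ollama is running locally on its default port (11434) and a llama3.2 model has already been pulled with ollama pull llama3.2:

    import requests

    OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint

    def ask(prompt: str) -> str:
        """Send one user message to the local model and return its reply."""
        response = requests.post(
            OLLAMA_URL,
            json={
                "model": "llama3.2",
                "messages": [{"role": "user", "content": prompt}],
                "stream": False,  # single JSON reply instead of a token stream
            },
            timeout=120,
        )
        response.raise_for_status()
        return response.json()["message"]["content"]

    if __name__ == "__main__":
        print(ask("Say hello in one short sentence."))

The same call is what the voice pipeline will issue after speech-to-text, so it also doubles as a quick health check for the model server.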

Goals

  • Local-only operation: Ensure all speech recognition, LLM inference, and response generation occur offline for maximum privacy.

  • Voice input/output pipeline: Create a robust flow using the Atom Echo for audio capture and playback (a rough server-side sketch of this flow follows this list).

  • Run LLaMA 3.2 and multimodal models locally: Use OLLama to serve LLaMA 3.2 for conversational logic and use LMM for possible multimodal inputs.

  • Wake word / push-to-talk support: Implement a simple and reliable mechanism for activating the assistant.

  • Low latency: Optimize for fast local inference and smooth user interaction.

  • Extendability: Provide an architecture that can be easily expanded with home-automation hooks, custom commands, or additional sensors.

  • Automation: Provide a way to automate the whole setup ("just run an Ansible playbook or a Terraform plan") so that everything is in place after a single run.
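
A rough sketch of the server-side voice pipeline referenced above, covering one utterance end to end: Whisper for speech-to-text, Ollama for the LLaMA 3.2 reply, and Piper for text-to-speech. The file names, model sizes, and Piper voice are placeholders, and it assumes the openai-whisper and requests Python packages plus the piper CLI are installed:

    import subprocess
    import requests
    import whisper  # openai-whisper package

    def transcribe(wav_path: str) -> str:
        """Speech-to-text on a WAV file captured from the Atom Echo."""
        model = whisper.load_model("base")  # small model keeps latency down
        return model.transcribe(wav_path)["text"].strip()

    def ask_llm(prompt: str) -> str:
        """Query the local LLaMA 3.2 model served by Ollama."""
        r = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "llama3.2", "prompt": prompt, "stream": False},
            timeout=120,
        )
        r.raise_for_status()
        return r.json()["response"]

    def speak(text: str, out_path: str = "reply.wav") -> str:
        """Text-to-speech with the Piper CLI (the voice file is a placeholder)."""
        subprocess.run(
            ["piper", "--model", "en_US-lessac-medium.onnx", "--output_file", out_path],
            input=text.encode("utf-8"),
            check=True,
        )
        return out_path  # this WAV is then played back on the Atom Echo

    if __name__ == "__main__":
        question = transcribe("recording.wav")  # audio pushed by the Atom Echo
        answer = ask_llm(question)
        print(answer)
        speak(answer)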

Resources

Hardware - M5Stack Atom Echo (microphone + speaker module)

  • Local machine/server for running Ollama (Linux)

  • A machine running Home Assistant

Software / Models - Ollama (model runner for local LLM inference)

  • LLaMA 3.2 (base conversational model)

  • LMM (local multimodal model for future image/audio extensions)

  • Local Speech-to-Text engine (e.g., Whisper, run as a local service)

  • Local Text-to-Speech engine (e.g., Coqui TTS, Piper, or on-device TTS)

  • A working Home Assistant instance

  • Ansible or Terraform

Development Tools - Python or Node.js for writing the assistant logic

  • M5Stack Arduino or UIFlow SDK for programming the Atom Echo

  • MQTT or HTTP for communication between the Atom Echo and the server (a minimal HTTP receiver sketch follows this list)

  • Git/GitHub for project versioning

  • Ansible or Terraform
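
For the "MQTT or HTTP" link mentioned above, a minimal sketch of an HTTP receiver on the server side. Flask, the endpoint name, and the port are assumptions; the Atom Echo firmware would POST each recorded utterance here, and the handler would feed it into the speech-to-text step of the pipeline sketched earlier:

    from flask import Flask, request  # assumed dependency: flask

    app = Flask(__name__)

    @app.route("/utterance", methods=["POST"])
    def utterance():
        """Receive one recorded utterance (raw WAV bytes) from the Atom Echo."""
        with open("recording.wav", "wb") as f:
            f.write(request.get_data())
        # Here the STT -> LLM -> TTS chain would run, and the reply audio
        # would be returned for playback on the device.
        return "ok", 200

    if __name__ == "__main__":
        # Listen on the LAN so the Atom Echo can reach the server; the port is arbitrary.
        app.run(host="0.0.0.0", port=8000)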

This project is part of:

Hack Week 25

Activity

  • 9 days ago: mwilck liked this project.
  • 10 days ago: gcolangiuli liked this project.
  • 10 days ago: mmilella started this project.
  • 10 days ago: mmilella originated this project.

  • Comments

    • mmilella, 2 days ago:

      Reached a milestone.

      Project Results

      For this Hack Week project I successfully implemented a local voice assistant setup using:

      • Whisper for speech-to-text

      • Piper for text-to-speech

      • Wake words for Assist, used to set up a wake word that activates the microphone and the AI voice, both integrated inside Home Assistant

      • Ollama running the LLaMA 3.2 1B model to provide the AI/ML logic through the Home Assistant Assist pipeline

      Ollama was hosted on a dedicated server, and during data processing I observed several performance spikes. A future improvement will be upgrading to a more powerful GPU to evaluate how much the performance and response time improve.
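
      To make those spikes measurable, a small sketch that times the Ollama endpoint directly (the server hostname, model tag, and prompt are assumptions), so slow requests stand out from the baseline:

        import time
        import requests

        URL = "http://ollama-server:11434/api/generate"  # placeholder hostname
        PAYLOAD = {"model": "llama3.2:1b", "prompt": "Hello!", "stream": False}

        # Fire a few identical requests and print the per-request latency.
        for i in range(5):
            start = time.monotonic()
            requests.post(URL, json=PAYLOAD, timeout=300).raise_for_status()
            print(f"request {i}: {time.monotonic() - start:.2f}s")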

      Regarding audio hardware, using the speakerphone itself was straightforward. The tricky part was finding drivers that worked properly with the main server setup (Proxmox host with a VM running Home Assistant).

      Overall, the entire stack is working and responding reliably, and the project goal has been successfully achieved.

    • mmilella, 2 days ago:

      What Is Still Missing / Next Steps

      There are still several areas that can be improved or expanded:

      I still need to implement Ansible automation, so the whole setup (Whisper, Piper, Ollama, Home Assistant integrations, drivers, etc.) can be deployed easily and reproducibly.

      The current LLaMA 3.2 1B model works, but it is not very powerful. It can answer queries, but it can also produce incorrect or incomplete results. This limitation is expected; since the system is offline, it cannot access real-time information, which is intentional for privacy and local-only operation.

      For the voice pipeline, I used the Wyoming Protocol with a single speakerphone. It would be interesting to explore whether I can create a Wyoming Satellite setup: having multiple speakerphones distributed across the home and letting Home Assistant automatically use the one that detects the wake word.
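
      As a first step toward that satellite idea, a tiny sketch that only checks whether each candidate satellite host answers on its Wyoming port (the host list is hypothetical and 10700 is assumed as the wyoming-satellite default):

        import socket

        SATELLITES = {"kitchen": "192.168.1.50", "office": "192.168.1.51"}  # placeholders
        PORT = 10700  # assumed wyoming-satellite default

        for name, host in SATELLITES.items():
            try:
                with socket.create_connection((host, PORT), timeout=2):
                    print(f"{name}: reachable")
            except OSError:
                print(f"{name}: not reachable")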

      These improvements will help make the system more scalable, accurate, and practical for everyday use.
