Mozilla's DeepSpeech project[1] uses TensorFlow and the architecture from Baidu's Deep Speech research paper to build an open source speech-to-text system based on deep learning. The project allows training on your own local datasets, but there is also a pre-trained model that can be used during development.
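
As a first taste of the pre-trained model, transcription can be as small as the following sketch. It assumes a recent DeepSpeech release (0.7 or later, where Model takes just the model path); the file names are placeholders for the released model and a 16 kHz mono WAV recording:

```python
# Minimal sketch: transcribe a WAV file with a DeepSpeech pre-trained
# model. File names are placeholders for the release artifacts.
import wave

import numpy as np
from deepspeech import Model

ds = Model("deepspeech-models.pbmm")  # pre-trained acoustic model
# ds.enableExternalScorer("deepspeech-models.scorer")  # optional language model

with wave.open("meeting.wav", "rb") as wav:
    # DeepSpeech expects 16 kHz mono 16-bit PCM audio
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

print(ds.stt(audio))
```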

The goals of the project are:

  • Connect to Mumble or to the local audio stream
  • Connect to Etherpad
  • Map the sound to text and write it into the Etherpad (see the sketch after this list)
  • Have fun watching how funny accents break the system
  • Redo the Etherpad based on what you remember from the meeting and send it to the RESULT mailing list
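
The "map sound to text and write it into the pad" step could look roughly like this sketch, assuming a local Etherpad instance with its HTTP API enabled; the host, pad name and API key are placeholders, and capturing audio from Mumble is left out:

```python
# Rough pipeline sketch: push each transcribed chunk into a pad via
# Etherpad's HTTP API. Host, pad name and API key are placeholders.
import requests

ETHERPAD = "http://localhost:9001/api/1.2.13"
APIKEY = "contents of APIKEY.txt from the Etherpad installation"

def write_to_pad(pad_id: str, text: str) -> None:
    # appendText exists since Etherpad API 1.2.13; older instances
    # would need getText + setText instead
    r = requests.post(f"{ETHERPAD}/appendText",
                      params={"apikey": APIKEY, "padID": pad_id,
                              "text": text + "\n"})
    r.raise_for_status()

# for each chunk transcribed by DeepSpeech:
write_to_pad("meeting-notes", "hello from the speech-to-text bot")
```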

[1] https://github.com/mozilla/DeepSpeech

Looking for hackers with the skills:

speech tensorflow

This project is part of:

Hack Week 17

Activity

  • over 6 years ago: mbrugger liked this project.
  • over 6 years ago: aplanas started this project.
  • over 6 years ago: ancorgs liked this project.
  • over 6 years ago: aplanas added keyword "speech" to this project.
  • over 6 years ago: aplanas added keyword "tensorflow" to this project.
  • over 6 years ago: aplanas originated this project.

    Similar Projects

    Make more sense of openQA test results using AI by livdywan

    Description

    AI has the potential to help with something many of us spend a lot of time on: making sense of openQA logs when a job fails.

    User Story

    Allison Average has a puzzled look on their face while staring at log files that seem to make little sense. Is this a known issue, something completely new, or maybe related to infrastructure changes?

    Goals

    • Leverage a chat interface to help Allison
    • Create a model from scratch based on data from openQA
    • Proof of concept for automated analysis of openQA test results (a sketch follows after this list)
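
    A proof of concept for that last goal could be as small as the following sketch, which feeds a failed job's metadata to a locally served model. The openQA job id is a placeholder, and it assumes a local Ollama instance on its default port with a model named "llama3":

    ```python
    # Sketch: ask a local LLM to classify a failed openQA job.
    # Job id and model name are illustrative placeholders.
    import requests

    OPENQA = "https://openqa.opensuse.org/api/v1"
    job = requests.get(f"{OPENQA}/jobs/1234567").json()["job"]

    prompt = (
        "This openQA job failed. Suggest whether it looks like a product "
        "bug, a test issue or an infrastructure problem:\n"
        f"test: {job['test']}\nresult: {job['result']}\n"
        f"settings: {job['settings']}"
    )

    resp = requests.post("http://localhost:11434/api/generate",
                         json={"model": "llama3", "prompt": prompt,
                               "stream": False})
    print(resp.json()["response"])
    ```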

    Bonus

    • Use AI to suggest solutions to merge conflicts
      • This would need a merge conflict editor that can suggest solving the conflict
    • Use image recognition for needles (a rough sketch follows below)
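
    For the needle idea, classic computer vision is already a plausible baseline; os-autoinst itself matches needles with OpenCV internally. A rough sketch with template matching, where the file names and the 0.9 threshold are stand-ins:

    ```python
    # Sketch: match an openQA needle (reference image) against a
    # screenshot using OpenCV template matching.
    import cv2

    screenshot = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)
    needle = cv2.imread("needle-area.png", cv2.IMREAD_GRAYSCALE)

    result = cv2.matchTemplate(screenshot, needle, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, best_pos = cv2.minMaxLoc(result)

    # openQA needles carry their own match level; 0.9 is an arbitrary stand-in
    if best_score >= 0.9:
        print(f"needle matched at {best_pos}, score {best_score:.2f}")
    else:
        print(f"no match, best score {best_score:.2f}")
    ```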

    Resources

    Timeline

    Day 1

    • Conversing with open-webui to learn how to create a model based on openQA test results (one plausible shape is sketched below)
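
    One plausible shape for that, assuming open-webui is backed by a local Ollama instance; the base model name and system prompt below are made up for illustration:

    ```python
    # Sketch: register a custom reviewer-assistant model with Ollama,
    # which open-webui can then offer in its model list. Assumes the
    # `ollama` CLI is installed; base model and prompt are illustrative.
    import subprocess
    from pathlib import Path

    modelfile = (
        "FROM llama3\n"
        "SYSTEM You are an assistant for openQA test reviewers. Given "
        "excerpts from failed jobs, classify the failure as product bug, "
        "test issue or infrastructure problem, and explain why.\n"
    )

    Path("Modelfile").write_text(modelfile)
    subprocess.run(["ollama", "create", "openqa-helper", "-f", "Modelfile"],
                   check=True)
    ```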

    Day 2

    Highlights

    • I briefly compared models to see if they would make me more productive. Between llama, gemma and mistral there was no striking difference in the results for my case.
    • Convincing the chat interface to produce code specific to my use case required very explicit instructions.
    • Asking for advice on how to use open-webui itself better was frustratingly unfruitful, for both trivial and more advanced questions.
    • Documentation on the source materials used by LLMs, and tools for checking them, seems virtually non-existent, specifically on whether a logo can be generated based on material under particular licenses.

    Outcomes

    • Chat-interface-supported development provides good starting points, and open-webui, being open source, is more flexible than Gemini, although some fancy features such as grounding and generated podcasts are currently missing.
    • Allison still has to be very experienced with openQA to use a chat interface for test review. Publicly available system prompts would make that easier, though.