Project Description

I was talking with a friend the other day who is blind. He briefly explained how he reads regular, paper-printed books: he takes a photo of each page, passes it through an OCR program to extract the text in digital form, and then feeds that text to a text-to-speech engine which reads it out loud.

One of the problems he faces is that the OCR program he purchased doesn't handle accented Unicode letters well (e.g. the "ή" in "Δημήτρης" is not recognized most of the time). The other problem is how manual the whole process is. Nowadays there are open source OCR solutions that do a wonderful job with Unicode, and there are also hosted APIs (e.g. https://cloud.google.com/vision/docs/ocr). The whole process can be heavily automated, too. Maybe there are tools out there that already do this, but they are probably not free.

I feel there is a motivation problem in developing software for visually impaired people: the people who care most about it (blind people) are usually not the ones who can write it, and the people who can write it (sighted developers) don't need it themselves. This problem can be solved if the people who can produce the software do it for the people who can't.

Goal for this Hackweek

So here is (roughly) what I had in mind:

You have a Raspberry Pi with a camera attached to it and a hardware button connected to a GPIO pin. You put the book in front of the camera (perhaps on a permanent stand). Every time the user presses the button, a program automatically:

  • detects the book's page, aligns it, and crops away the non-paper parts (optional; alternatively the user aligns the book properly once)
  • runs some filters on the image to make it more readable
  • passes the image through OCR (open source or some remote API)
  • makes guesses and corrections on the text (optional; not sure if there are free tools for this)
  • passes the text to a text-to-speech engine (local, or API based: https://cloud.google.com/text-to-speech)

It can also support translation of the text in the future (through some online API).
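The steps above could be glued together with a small script. The sketch below is only an illustration, not a design decision: it assumes libcamera-still for the camera, tesseract for OCR, and espeak for speech, all of which are placeholders until the tools are actually evaluated.

```python
import subprocess

# Hypothetical glue for the button-triggered pipeline. The tool names
# (libcamera-still, tesseract, espeak) are assumptions, not decisions.

def pipeline_cmds(image="page.jpg", ocr_lang="ell", voice="el"):
    """Shell commands for one capture -> OCR -> speech cycle."""
    return [
        ["libcamera-still", "-o", image],              # photograph the page
        ["tesseract", image, "page", "-l", ocr_lang],  # OCR into page.txt
        ["espeak", "-v", voice, "-f", "page.txt"],     # read the text aloud
    ]

def run_once():
    # This is what a GPIO button press would trigger.
    for cmd in pipeline_cmds():
        subprocess.run(cmd, check=True)
```

Each stage could later be swapped for a cloud API without changing the overall shape of the loop.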

This is meant to be a proof of concept and should be doable in one week (Hackweek). A mobile application that does the same job would likely be easier to distribute and use, but it is also more complicated to write. If the proof of concept seems useful and there is interest, this project could expand its goals.

Resources

None yet.

Looking for hackers with the skills:

None yet.

This project is part of:

Hack Week 21

Activity

  • 3 months ago: hennevogel liked this project.
  • 3 months ago: andreas-kupries liked this project.
  • 3 months ago: rbonafiglia joined this project.
  • 3 months ago: Pi-Cla liked this project.
  • 3 months ago: crameleon liked this project.
  • 3 months ago: mmanno liked this project.
  • 3 months ago: DKarakasilis liked this project.
  • 4 months ago: gkalog joined this project.
  • 4 months ago: ph03nix liked this project.
  • 4 months ago: DKarakasilis started this project.
  • 4 months ago: DKarakasilis originated this project.

  • Comments

    • andreas-kupries
      4 months ago by andreas-kupries | Reply

    • DKarakasilis
      4 months ago by DKarakasilis | Reply

      Thanks for the links @andreas-kupries ! Hopefully we'll only need to produce some glue code to make the various existing tools work nicely together. Everything should already be there, we just lack a good user experience.

    • DKarakasilis
      4 months ago by DKarakasilis | Reply

      Another thing I want to do before we start is more thorough research into existing open source solutions that do the same thing. I did look around and didn't find one, but let's make sure we are not re-inventing the wheel before we start.

    • andreas-kupries
      4 months ago by andreas-kupries | Reply

      +1 research

    • DKarakasilis
      4 months ago by DKarakasilis | Reply

      Fallback in case we are not satisfied with offline libraries: https://cloud.google.com/vision/docs/ocr

    • DKarakasilis
      3 months ago by DKarakasilis | Reply

      I just tried tesseract (https://github.com/tesseract-ocr/tesseract) through this library: https://github.com/otiai10/gosseract

      On openSUSE Tumbleweed I first had to install:

      - leptonica-devel
      - tesseract-ocr-devel
      - tesseract-ocr-traineddata-greek (for the Greek language support)

      The test program with a sample photo of a book page (rather clumsy and low quality on purpose):

      https://gist.github.com/jimmykarily/e5bde4ac64592abd0fc6b52f56c9c20c

      Timing:

      real    0m4.616s
      user    0m4.579s
      sys 0m0.226s
      

      This is promising.

    • DKarakasilis
      3 months ago by DKarakasilis | Reply

      Also tried espeak (http://espeak.sourceforge.net/) using this docker image: https://github.com/parente/espeakbox

      with:

      docker run --name espeakbox -d -p 8080:8080 parente/espeakbox
      

      and then:

      cvlc "http://localhost:8080/speech?text=this%80is%80text&voice=en"

      and it works fine. I couldn't get it to read anything other than English, though. The supported voices should be these:

      http://espeak.sourceforge.net/languages.html

      but changing the voice to Greek makes it simply spell out the letters one by one.
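      The %80 in the URL above looks like the culprit: a space percent-encodes as %20, not %80, and Greek text has to be sent as percent-encoded UTF-8 bytes. A stdlib sketch of what the speech URL should look like (the server address is the one from the docker run above):

```python
from urllib.parse import quote

# Build a correctly percent-encoded espeakbox speech URL. A space
# becomes %20 and each Greek letter becomes its UTF-8 bytes,
# e.g. "Δ" -> "%CE%94".

def speech_url(text, voice="en", base="http://localhost:8080/speech"):
    return f"{base}?text={quote(text)}&voice={voice}"

print(speech_url("this is text"))
# http://localhost:8080/speech?text=this%20is%20text&voice=en
```

      Note that the URL also has to be quoted when passed to cvlc, otherwise the shell treats the & as "run in background" and silently drops the voice parameter.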

    • DKarakasilis
      3 months ago by DKarakasilis | Reply

      It must have something to do with the text being URL-encoded. If I get a shell in the container with:

      docker exec -it espeakbox /bin/sh
      

      and run this command (the Greek text means "I don't speak Greek very well"):

      espeak -v el "Δεν μιλάω πολύ καλά Ελληνικά" --stdout > file.wav
      

      I can then copy the file to the host and play it with cvlc:

      docker cp espeakbox:file.wav .
      cvlc file.wav
      

      and this actually sounds like Greek.
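      For what it's worth, the copy step can be skipped by letting docker exec stream the audio straight to the host. A sketch, assuming the espeakbox container from above is still running:

```python
import subprocess

# Run espeak inside the running espeakbox container and capture the
# WAV data directly on the host, avoiding the separate docker-cp step.

def speak_cmd(text, voice="el"):
    return ["docker", "exec", "espeakbox",
            "espeak", "-v", voice, text, "--stdout"]

def speak_to_file(text, path="out.wav", voice="el"):
    with open(path, "wb") as f:
        subprocess.run(speak_cmd(text, voice), stdout=f, check=True)
```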

    • DKarakasilis
      3 months ago by DKarakasilis | Reply

      I was looking for a good name for this project and thought of "readforme". I googled it to check whether the name was already taken and found this: https://www.readforme.io/

      The results from that page are outstanding! (I tried it with a screenshot of a Wikipedia page.)

    • DKarakasilis
      3 months ago by DKarakasilis | Reply

      And this architecture diagram lists various Amazon services that could act as fallbacks if no open source solution can be used: https://www.readforme.io/static/media/rfm_arch.3f287400.png

    • DKarakasilis
      3 months ago by DKarakasilis | Reply

      A list of TTS engines to try out: https://medevel.com/14-os-text-to-speech/

    • DKarakasilis
      3 months ago by DKarakasilis | Reply

      These are some useful libraries they are using: https://gitlab.gnome.org/World/OpenPaperwork/libpillowfight
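      As a toy illustration of the kind of cleanup filter such libraries provide (this is not libpillowfight's API, just a stdlib-only sketch), a global threshold turns a grayscale scan into clean black-and-white for the OCR step:

```python
# Toy binarization filter: the simplest of the cleanup passes that
# libraries like libpillowfight implement properly. Pixels are
# grayscale values 0-255; anything darker than the threshold is ink.

def binarize(pixels, threshold=128):
    return [[0 if p < threshold else 255 for p in row] for row in pixels]

page = [
    [250, 240,  30, 245],   # mostly light paper, one dark "ink" pixel
    [ 20, 235, 240,  10],
]
print(binarize(page))
# [[255, 255, 0, 255], [0, 255, 255, 0]]
```

      A real implementation would also deskew and despeckle the image, which is exactly what these libraries are for.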


    Similar Projects

    This project is one of its kind!