Project Description

I was talking the other day with a friend who is blind. He briefly explained to me how he reads books (the regular, paper printed ones). He takes a photo of each page, passes it to an OCR program to extract the text in digital form, then passes that text to a text-to-speech engine that reads it out loud.

One of the problems he is facing is that the OCR program he purchased doesn't handle Unicode letters very well when they have accents on them (e.g. "ή" in "Δημήτρης" is not recognized most of the time). The other problem is how manual the whole process is. Nowadays there are open source OCR solutions that do a wonderful job with Unicode. There are also APIs that can be used (e.g. https://cloud.google.com/vision/docs/ocr). The whole process can also be largely automated. There may be tools out there that already do this, but they are probably not free.

I feel there is a motivation problem in developing software for visually impaired people. The people who care the most about it (e.g. blind people) are the people who can't write it, and the people who can write it (people who can see) don't have a need for it. This problem can be easily solved if people who can produce software do it for the people who can't.

Goal for this Hackweek

So here is (roughly) what I had in mind:

You have a Raspberry Pi with a camera attached to it and a hardware button connected to an I/O pin. You put the book in front of the camera (maybe on a permanent stand). Every time the user presses the button, a program does the following automatically:

  • detects the book's page, aligns it and crops out the non-paper part (optional; alternatively the user aligns the book properly, once)
  • runs some filters on the image to make it more readable
  • passes the image through OCR (open source or some remote API)
  • makes guesses and corrections on the text (optional, not sure if there are free tools for this)
  • passes the text to a text-to-speech engine (local or API based, e.g. https://cloud.google.com/text-to-speech)
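
The steps above can be sketched as a small pipeline. This is only an illustration under assumptions, not the project's implementation: it assumes the tesseract and espeak command line tools are installed, it leaves out the camera capture and the button handling, and PageReader, tesseract_ocr and espeak_say are hypothetical names chosen for this example. The image filter and text correction steps are optional hooks that do nothing by default.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable


def tesseract_ocr(image_path: str, lang: str = "ell") -> str:
    """Run the tesseract CLI and return the recognized text ("ell" is Greek)."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def espeak_say(text: str, voice: str = "el") -> None:
    """Read the text out loud with the espeak CLI."""
    subprocess.run(["espeak", "-v", voice, text], check=True)


def _unchanged(value: str) -> str:
    """Default for the optional steps: pass the input through untouched."""
    return value


@dataclass
class PageReader:
    """Hypothetical glue object: one callable per step of the list above."""
    ocr: Callable[[str], str]            # image path -> raw text
    speak: Callable[[str], None]         # text -> audio output
    preprocess: Callable[[str], str] = _unchanged  # image path -> cleaned image path
    correct: Callable[[str], str] = _unchanged     # raw text -> corrected text

    def read_page(self, image_path: str) -> str:
        cleaned_path = self.preprocess(image_path)
        text = self.correct(self.ocr(cleaned_path)).strip()
        if text:  # stay silent when nothing was recognized
            self.speak(text)
        return text


# Example wiring (not executed here, it needs the CLI tools and a photo):
# reader = PageReader(ocr=tesseract_ocr, speak=espeak_say)
# reader.read_page("page.jpg")  # would be called from the button handler
```

Keeping every step behind a plain callable means an offline tool can be swapped for a remote API later without touching the rest of the program.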

It can also support translation of the text in the future (through some online API).

This is meant to be a proof of concept and should be done in one week (Hackweek). A mobile application that does the same job may be easier to distribute and use, but it is also more complicated to write. If the proof of concept seems useful and there is interest, this project could expand its goals.

Resources

None yet.

This project is part of:

Hack Week 21

Activity

  • 11 months ago: Gabriel_Alado liked this project.
  • 11 months ago: Gabriel_Alado joined this project.
  • over 2 years ago: hennevogel liked this project.
  • over 2 years ago: andreas-kupries liked this project.
  • over 2 years ago: rbonafiglia joined this project.
  • over 2 years ago: Pi-Cla liked this project.
  • over 2 years ago: crameleon liked this project.
  • over 2 years ago: mmanno liked this project.
  • over 2 years ago: DKarakasilis liked this project.
  • over 2 years ago: gkalog joined this project.
  • over 2 years ago: ph03nix liked this project.
  • over 2 years ago: DKarakasilis started this project.
  • over 2 years ago: DKarakasilis originated this project.

  • Comments

    • andreas-kupries
      over 2 years ago by andreas-kupries | Reply

    • andreas-kupries
      over 2 years ago by andreas-kupries | Reply

      • (Deskewing is the process of rotating a document upright, i.e. aligning it to the vertical)

    • DKarakasilis
      over 2 years ago by DKarakasilis | Reply

      Thanks for the links @andreas-kupries! Hopefully we'll only need to produce some glue code to make the various existing tools work nicely together. Everything should already be there, we just lack a good user experience.

    • DKarakasilis
      over 2 years ago by DKarakasilis | Reply

      Another thing I want to do before we start is more thorough research for an existing open source solution that does the same. I did look around and didn't find one, but let's make sure we are not reinventing the wheel before we start.

    • andreas-kupries
      over 2 years ago by andreas-kupries | Reply

      +1 research

    • DKarakasilis
      over 2 years ago by DKarakasilis | Reply

      Fallback in case we are not satisfied with offline libraries: https://cloud.google.com/vision/docs/ocr

    • DKarakasilis
      over 2 years ago by DKarakasilis | Reply

      I just tried tesseract (https://github.com/tesseract-ocr/tesseract) through this library: https://github.com/otiai10/gosseract

      On openSUSE Tumbleweed I first had to install:

        • leptonica-devel
        • tesseract-ocr-devel
        • tesseract-ocr-traineddata-greek (for Greek language support)

      The test program with a sample photo of a book page (rather clumsy and low quality on purpose):

      https://gist.github.com/jimmykarily/e5bde4ac64592abd0fc6b52f56c9c20c

      Timing:

      real 0m4.616s
      user 0m4.579s
      sys 0m0.226s

      This is promising.

    • DKarakasilis
      over 2 years ago by DKarakasilis | Reply

      Also tried espeak (http://espeak.sourceforge.net/) using this docker image: https://github.com/parente/espeakbox

      with:

      docker run --name espeakbox -d -p 8080:8080 parente/espeakbox

      and then:

      cvlc "http://localhost:8080/speech?text=this%20is%20text&voice=en"

      and it works fine. I couldn't get it to read anything other than English though. Supported voices should be these:

      http://espeak.sourceforge.net/languages.html

      but changing to Greek makes the thing simply list the letters one by one.

    • DKarakasilis
      over 2 years ago by DKarakasilis | Reply

      It must have something to do with the text being URL-encoded. If I get a shell to the container with:

      docker exec -it espeakbox /bin/sh

      and I run this command:

      espeak -v el "Δεν μιλάω πολύ καλά Ελληνικά" --stdout > file.mp3

      I can then copy the file to the host and play it with cvlc:

      docker cp espeakbox:file.mp3 .
      cvlc file.mp3

      and this actually sounds like Greek.
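
      One way to test the URL-encoding theory is to build the request URL with proper UTF-8 percent-encoding and compare it with what was sent by hand. A minimal Python sketch, assuming the espeakbox endpoint and parameters used in the cvlc command earlier (/speech, text, voice); build_speech_url is a hypothetical helper name, and whether the server then pronounces Greek correctly is not verified here.

```python
from urllib.parse import quote, urlencode


def build_speech_url(text: str, voice: str,
                     base: str = "http://localhost:8080/speech") -> str:
    """Build an espeakbox request URL with a percent-encoded query string."""
    # quote_via=quote encodes spaces as %20 (urlencode's default would use "+")
    # and non-ASCII characters as their percent-encoded UTF-8 bytes.
    query = urlencode({"text": text, "voice": voice}, quote_via=quote)
    return f"{base}?{query}"


print(build_speech_url("this is text", "en"))
# prints http://localhost:8080/speech?text=this%20is%20text&voice=en
print(build_speech_url("Δεν μιλάω πολύ καλά Ελληνικά", "el"))
```

      When such a URL is passed to cvlc from a shell it has to be quoted, otherwise the shell treats the & as a command separator and the voice parameter never reaches the server.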

    • DKarakasilis
      over 2 years ago by DKarakasilis | Reply

      I was looking for a good name for this project and thought of "readforme". I googled that to check if it was already used by something else and found this: https://www.readforme.io/

      The results from this page are outstanding! (I tried it with a screenshot of a Wikipedia page.)

    • DKarakasilis
      over 2 years ago by DKarakasilis | Reply

      And this page lists various Amazon services that could act as fallbacks if no open source solution can be used: https://www.readforme.io/static/media/rfm_arch.3f287400.png

    • DKarakasilis
      over 2 years ago by DKarakasilis | Reply

      A list of TTS engines to try out: https://medevel.com/14-os-text-to-speech/

    • DKarakasilis
      over 2 years ago by DKarakasilis | Reply

      and a way to detect blocks of text: https://www.geeksforgeeks.org/text-detection-and-extraction-using-opencv-and-ocr/

    • DKarakasilis
      over 2 years ago by DKarakasilis | Reply

      Useful: https://openpaper.work/en/

    • DKarakasilis
      over 2 years ago by DKarakasilis | Reply

      These are some useful libraries they are using: https://gitlab.gnome.org/World/OpenPaperwork/libpillowfight

    • DKarakasilis
      over 2 years ago by DKarakasilis | Reply

      Started a project here: https://github.com/jimmykarily/open-ocr-reader

    • mcepl
      almost 2 years ago by mcepl | Reply

      Just to say that in the TTS arena I have had excellent experience with mimic (packaged for openSUSE). The sound is much better than espeak; I'm not sure about its support for Greek. Oh, it has https://mycroftai.github.io/mimic3-voices/samples/elGR/rapunzelinalow/sample.wav (https://mycroft.ai/mimic-3/).
