Project Description
I was talking the other day with a friend of mine who is blind. He briefly explained to me how he reads books (the regular, paper-printed ones): he takes a photo of each page, passes it to an OCR program to extract the text in digital form, and then feeds the text to a text-to-speech engine that reads it out loud.
One of the problems he faces is that the OCR program he purchased doesn't handle accented Unicode letters very well (e.g. the "ή" in "Δημήτρης" is not recognized most of the time). The other problem is how manual the whole process is. Nowadays there are open source OCR solutions that do a wonderful job with Unicode, and there are also APIs that could be used (e.g. https://cloud.google.com/vision/docs/ocr), so the whole process could be automated to a large degree. Maybe there are tools out there that already do this, but they are probably not free.
I feel there is a problem of motivation in developing software for visually impaired people. The people who care most about it (e.g. blind people) are the ones who can't write it, and the people who can write it (people who can see) don't need it themselves. This problem is easily solved if the people who can produce software write it for the people who can't.
Goal for this Hackweek
So here is (roughly) what I had in mind:
You have a Raspberry Pi with a camera attached to it and a hardware button connected to a GPIO pin. You put the book in front of the camera (maybe on a permanent stand). Every time the user presses the button, a program does the following automatically (a rough code sketch follows the list):
- detects the book's page, aligns it, and crops away the non-paper part (optional; the user could instead be asked to align the book properly once)
- runs some filters on the image to make it more readable
- passes the image through OCR (open source or some remote API)
- makes guesses and corrections on the text (optional, not sure if there are free tools for this)
- passes the text to a text-to-speech engine (or an API, e.g. https://cloud.google.com/text-to-speech)
It could also support translation of the text in the future (through some online API).
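To make the flow concrete, here is a rough sketch of how the glue program could be structured, assuming Go; every function below is a stub and all names are placeholders, not a finished design:

package main

import "log"

// capturePage would grab a photo from the camera when the button is pressed.
func capturePage() (string, error) { return "page.jpg", nil }

// preprocess would crop, deskew and filter the photo to make it OCR-friendly.
func preprocess(path string) (string, error) { return path, nil }

// ocr would run the image through an OCR engine (local or a remote API).
func ocr(path string) (string, error) { return "", nil }

// speak would hand the recognized text to a text-to-speech engine.
func speak(text string) error { return nil }

func main() {
	page, err := capturePage()
	if err != nil {
		log.Fatal(err)
	}
	clean, err := preprocess(page)
	if err != nil {
		log.Fatal(err)
	}
	text, err := ocr(clean)
	if err != nil {
		log.Fatal(err)
	}
	if err := speak(text); err != nil {
		log.Fatal(err)
	}
}

Each stub maps to one of the steps above, so individual tools can be swapped in and out without changing the overall flow.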
This is meant to be a proof of concept and should be doable in one week (Hack Week). One can easily see how a mobile application doing the same job would be easier to distribute and use, but it would also be more complicated to write. If the proof of concept turns out to be useful and there is interest, the project could expand its goals.
Resources
None yet.
Looking for hackers with the skills:
None specified yet.
This project is part of:
Hack Week 21
Activity
Comments
- over 2 years ago by andreas-kupries
- https://kaerumy.medium.com/cleaning-up-scanned-documents-with-open-source-tools-9d87e15305b
- https://github.com/topics/deskew (Deskewing is the process of rotating a document upright, i.e. aligning it to the vertical)
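For a quick experiment, deskewing could also be outsourced to ImageMagick's -deskew operator instead of one of the libraries above; a minimal sketch in Go, assuming the convert binary is installed and the file names are placeholders:

package main

import (
	"log"
	"os/exec"
)

func main() {
	// Rotate the photographed page upright with ImageMagick's -deskew
	// operator; 40% is a commonly used detection threshold.
	cmd := exec.Command("convert", "page.jpg", "-deskew", "40%", "+repage", "page-deskewed.png")
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}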
- over 2 years ago by DKarakasilis
Thanks for the links @andreas-kupries! Hopefully we'll only need to produce some glue code to make the various existing tools work nicely together. Everything should already be there; we just lack a good user experience.
- over 2 years ago by DKarakasilis
Another thing I want to do before we start is a more thorough search for an existing open source solution that does the same thing. I did look around and didn't find one, but let's make sure we are not reinventing the wheel.
- over 2 years ago by DKarakasilis
Fallback in case we are not satisfied with offline libraries: https://cloud.google.com/vision/docs/ocr
- over 2 years ago by DKarakasilis
I just tried tesseract (https://github.com/tesseract-ocr/tesseract) through this library: https://github.com/otiai10/gosseract
On openSUSE Tumbleweed I first had to install:
- leptonica-devel
- tesseract-ocr-devel
- tesseract-ocr-traineddata-greek (for the Greek language support)
The test program with a sample photo of a book page (rather clumsy and low quality on purpose):
https://gist.github.com/jimmykarily/e5bde4ac64592abd0fc6b52f56c9c20c
Timing:
real 0m4.616s
user 0m4.579s
sys 0m0.226s
This is promising.
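For reference, the core of such a test can be as small as the following sketch (along the lines of the gist, not a copy of it; the image path is a placeholder and "ell" is tesseract's language code for Greek):

package main

import (
	"fmt"
	"log"

	"github.com/otiai10/gosseract/v2"
)

func main() {
	client := gosseract.NewClient()
	defer client.Close()
	// Greek plus English; needs the tesseract-ocr-traineddata-greek package.
	client.SetLanguage("ell", "eng")
	if err := client.SetImage("book-page.jpg"); err != nil {
		log.Fatal(err)
	}
	text, err := client.Text()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(text)
}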
- over 2 years ago by DKarakasilis
Also tried espeak (http://espeak.sourceforge.net/) using this docker image: https://github.com/parente/espeakbox
with:
docker run --name espeakbox -d -p 8080:8080 parente/espeakbox
and then:
cvlc "http://localhost:8080/speech?text=this%20is%20text&voice=en"
and it works fine. I couldn't get it to read anything other than English, though. The supported voices should be these:
http://espeak.sourceforge.net/languages.html
but changing the voice to Greek makes it simply spell out the letters one by one.
- over 2 years ago by DKarakasilis
It must have something to do with the way the text is URL-encoded. If I get a shell in the container with:
docker exec -it espeakbox /bin/sh
and run this command (the sample sentence means "I don't speak Greek very well"):
espeak -v el "Δεν μιλάω πολύ καλά Ελληνικά" --stdout > file.mp3
I can then copy the file to the host and play it with cvlc:
docker cp espeakbox:file.mp3 .
cvlc file.mp3
and this actually sounds like Greek.
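So the earlier HTTP requests probably failed on the Greek characters because of the hand-written percent-encoding; letting a URL library do the encoding should fix it. A small sketch in Go, assuming the /speech endpoint behaves as above:

package main

import (
	"fmt"
	"net/url"
)

func main() {
	// Build the espeakbox request URL with properly percent-encoded
	// UTF-8 Greek text instead of hand-written escapes.
	params := url.Values{}
	params.Set("text", "Δεν μιλάω πολύ καλά Ελληνικά")
	params.Set("voice", "el")
	fmt.Println("http://localhost:8080/speech?" + params.Encode())
}

The printed URL can then be passed to cvlc as before.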
- over 2 years ago by DKarakasilis
I was looking for a good name for this project and thought of "readforme". I googled that to check if it was already used by something else and found this: https://www.readforme.io/
The result from this page is outstanding! (I tried it with a screenshot of a Wikipedia page.)
- over 2 years ago by DKarakasilis
And this diagram lists various Amazon services that could act as fallbacks if no open source solution can be used: https://www.readforme.io/static/media/rfm_arch.3f287400.png
- over 2 years ago by DKarakasilis
A list of TTS engines to try out: https://medevel.com/14-os-text-to-speech/
- over 2 years ago by DKarakasilis
and a way to detect blocks of text: https://www.geeksforgeeks.org/text-detection-and-extraction-using-opencv-and-ocr/
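That article relies on OpenCV (dilation plus contour detection). As a rough illustration of the same idea using only the Go standard library, splitting a page into horizontal text bands by counting dark pixels per row could look like this (file name and thresholds are placeholders):

package main

import (
	"fmt"
	"image"
	_ "image/jpeg"
	_ "image/png"
	"log"
	"os"
)

// findTextBands splits a page image into horizontal bands that contain
// dark pixels (a very rough stand-in for the OpenCV-based detection in
// the linked article).
func findTextBands(img image.Image) [][2]int {
	b := img.Bounds()
	var bands [][2]int
	start := -1
	for y := b.Min.Y; y < b.Max.Y; y++ {
		dark := 0
		for x := b.Min.X; x < b.Max.X; x++ {
			r, g, bl, _ := img.At(x, y).RGBA()
			if (r+g+bl)/3 < 0x4000 { // pixel darker than ~25% grey
				dark++
			}
		}
		if dark > 5 && start < 0 {
			start = y
		} else if dark <= 5 && start >= 0 {
			bands = append(bands, [2]int{start, y})
			start = -1
		}
	}
	if start >= 0 {
		bands = append(bands, [2]int{start, b.Max.Y})
	}
	return bands
}

func main() {
	f, err := os.Open("page.jpg")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	img, _, err := image.Decode(f)
	if err != nil {
		log.Fatal(err)
	}
	for _, band := range findTextBands(img) {
		fmt.Printf("text band from row %d to %d\n", band[0], band[1])
	}
}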
- over 2 years ago by DKarakasilis
These are some useful libraries they are using: https://gitlab.gnome.org/World/OpenPaperwork/libpillowfight
- over 2 years ago by DKarakasilis
Started a project here: https://github.com/jimmykarily/open-ocr-reader
- almost 2 years ago by mcepl
Just to say that in the TTS arena I have excellent experience with mimic (packaged for openSUSE). The sound is much better than espeak's; I'm not sure about its support for Greek. Oh, it does have a Greek sample: https://mycroftai.github.io/mimic3-voices/samples/elGR/rapunzelinalow/sample.wav (https://mycroft.ai/mimic-3/).
Similar Projects
This project is one of a kind!