Project Description
I was talking the other day with a friend of mine who is blind. He briefly explained to me how he reads books (the regular, paper-printed ones): he takes a photo of each page, passes it to an OCR program to extract the text in digital form, and then feeds the text to a text-to-speech engine that reads it out loud.
One of the problems he faces is that the OCR program he purchased doesn't handle accented Unicode letters very well (e.g. the "ή" in "Δημήτρης" is not recognized most of the time). The other problem is how manual the whole process is. Nowadays there are open source OCR solutions that do a wonderful job with Unicode, and there are also APIs that could be used (e.g. https://cloud.google.com/vision/docs/ocr), so the whole process could be automated to a large degree. Maybe there are tools out there that already do this, but they are probably not free.
I feel there is a problem of motivation in developing software for visually impaired people. The people who care most about it (e.g. blind people) are the ones who can't write it, and the people who can write it (people who can see) don't need it themselves. This problem is easily solved if the people who can produce software write it for the people who can't.
Goal for this Hackweek
So here is (roughly) what I had in mind:
You have a Raspberry Pi with a camera attached to it and a hardware button connected to a GPIO pin. You put the book in front of the camera (maybe on a permanent stand). Every time the user presses the button, a program does the following automatically (a rough code sketch follows the list):
- detects the book's page, aligns it, and crops away the non-paper part (optional; the user could instead be asked to align the book properly once)
- runs some filters on the image to make it more readable
- passes the image through OCR (open source or some remote API)
- makes guesses and corrections on the text (optional, not sure if there are free tools for this)
- passes the text to a text-to-speech engine (or an API, e.g. https://cloud.google.com/text-to-speech)
It could also support translation of the text in the future (through some online API).
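To make the flow concrete, here is a rough sketch of how the glue program could be structured, assuming Go; every function below is a stub and all names are placeholders, not a finished design:

package main

import "log"

// capturePage would grab a photo from the camera when the button is pressed.
func capturePage() (string, error) { return "page.jpg", nil }

// preprocess would crop, deskew and filter the photo to make it OCR-friendly.
func preprocess(path string) (string, error) { return path, nil }

// ocr would run the image through an OCR engine (local or a remote API).
func ocr(path string) (string, error) { return "", nil }

// speak would hand the recognized text to a text-to-speech engine.
func speak(text string) error { return nil }

func main() {
	page, err := capturePage()
	if err != nil {
		log.Fatal(err)
	}
	clean, err := preprocess(page)
	if err != nil {
		log.Fatal(err)
	}
	text, err := ocr(clean)
	if err != nil {
		log.Fatal(err)
	}
	if err := speak(text); err != nil {
		log.Fatal(err)
	}
}

Each stub maps to one of the steps above, so individual tools can be swapped in and out without changing the overall flow.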
This is meant to be a proof of concept and should be doable in one week (Hack Week). One can easily see how a mobile application doing the same job would be easier to distribute and use, but it would also be more complicated to write. If the proof of concept turns out to be useful and there is interest, the project could expand its goals.
Resources
None yet.
Looking for hackers with the skills:
None specified yet.
This project is part of:
Hack Week 21
Activity
Comments
- over 2 years ago by andreas-kupries
- https://kaerumy.medium.com/cleaning-up-scanned-documents-with-open-source-tools-9d87e15305b
- https://github.com/topics/deskew (Deskewing is the process of rotating a document upright, i.e. aligning it to the vertical)
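For a quick experiment, deskewing could also be outsourced to ImageMagick's -deskew operator instead of one of the libraries above; a minimal sketch in Go, assuming the convert binary is installed and the file names are placeholders:

package main

import (
	"log"
	"os/exec"
)

func main() {
	// Rotate the photographed page upright with ImageMagick's -deskew
	// operator; 40% is a commonly used detection threshold.
	cmd := exec.Command("convert", "page.jpg", "-deskew", "40%", "+repage", "page-deskewed.png")
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}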
- over 2 years ago by DKarakasilis
Thanks for the links @andreas-kupries! Hopefully we'll only need to produce some glue code to make the various existing tools work nicely together. Everything should already be there; we just lack a good user experience.
- over 2 years ago by DKarakasilis
Another thing I want to do before we start is a more thorough search for an existing open source solution that does the same thing. I did look around and didn't find one, but let's make sure we are not reinventing the wheel.
- over 2 years ago by DKarakasilis
Fallback in case we are not satisfied with offline libraries: https://cloud.google.com/vision/docs/ocr
- over 2 years ago by DKarakasilis
I just tried tesseract (https://github.com/tesseract-ocr/tesseract) through this library: https://github.com/otiai10/gosseract
On openSUSE Tumbleweed I first had to install:
- leptonica-devel
- tesseract-ocr-devel
- tesseract-ocr-traineddata-greek (for the Greek language support)
The test program with a sample photo of a book page (rather clumsy and low quality on purpose):
https://gist.github.com/jimmykarily/e5bde4ac64592abd0fc6b52f56c9c20c
Timing:
real 0m4.616s
user 0m4.579s
sys 0m0.226s
This is promising.
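For reference, the core of such a test can be as small as the following sketch (along the lines of the gist, not a copy of it; the image path is a placeholder and "ell" is tesseract's language code for Greek):

package main

import (
	"fmt"
	"log"

	"github.com/otiai10/gosseract/v2"
)

func main() {
	client := gosseract.NewClient()
	defer client.Close()
	// Greek plus English; needs the tesseract-ocr-traineddata-greek package.
	client.SetLanguage("ell", "eng")
	if err := client.SetImage("book-page.jpg"); err != nil {
		log.Fatal(err)
	}
	text, err := client.Text()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(text)
}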
- over 2 years ago by DKarakasilis
Also tried espeak (http://espeak.sourceforge.net/) using this docker image: https://github.com/parente/espeakbox
with:
docker run --name espeakbox -d -p 8080:8080 parente/espeakbox
and then:
cvlc "http://localhost:8080/speech?text=this%20is%20text&voice=en"
and it works fine. I couldn't get it to read anything other than English, though. The supported voices should be these:
http://espeak.sourceforge.net/languages.html
but changing the voice to Greek makes it simply spell out the letters one by one.
- over 2 years ago by DKarakasilis
It must have something to do with the way the text is URL-encoded. If I get a shell in the container with:
docker exec -it espeakbox /bin/sh
and run this command (the sample sentence means "I don't speak Greek very well"):
espeak -v el "Δεν μιλάω πολύ καλά Ελληνικά" --stdout > file.mp3
I can then copy the file to the host and play it with cvlc:
docker cp espeakbox:file.mp3 .
cvlc file.mp3
and this actually sounds like Greek.
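So the earlier HTTP requests probably failed on the Greek characters because of the hand-written percent-encoding; letting a URL library do the encoding should fix it. A small sketch in Go, assuming the /speech endpoint behaves as above:

package main

import (
	"fmt"
	"net/url"
)

func main() {
	// Build the espeakbox request URL with properly percent-encoded
	// UTF-8 Greek text instead of hand-written escapes.
	params := url.Values{}
	params.Set("text", "Δεν μιλάω πολύ καλά Ελληνικά")
	params.Set("voice", "el")
	fmt.Println("http://localhost:8080/speech?" + params.Encode())
}

The printed URL can then be passed to cvlc as before.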
- over 2 years ago by DKarakasilis
I was looking for a good name for this project and thought of "readforme". I googled that to check if it was already used by something else and found this: https://www.readforme.io/
The result from this page is outstanding! (I tried it with a screenshot of a Wikipedia page.)
- over 2 years ago by DKarakasilis
And this diagram lists various Amazon services that could act as fallbacks if no open source solution can be used: https://www.readforme.io/static/media/rfm_arch.3f287400.png
- over 2 years ago by DKarakasilis
A list of TTS engines to try out: https://medevel.com/14-os-text-to-speech/
- over 2 years ago by DKarakasilis
and a way to detect blocks of text: https://www.geeksforgeeks.org/text-detection-and-extraction-using-opencv-and-ocr/
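That article relies on OpenCV (dilation plus contour detection). As a rough illustration of the same idea using only the Go standard library, splitting a page into horizontal text bands by counting dark pixels per row could look like this (file name and thresholds are placeholders):

package main

import (
	"fmt"
	"image"
	_ "image/jpeg"
	_ "image/png"
	"log"
	"os"
)

// findTextBands splits a page image into horizontal bands that contain
// dark pixels (a very rough stand-in for the OpenCV-based detection in
// the linked article).
func findTextBands(img image.Image) [][2]int {
	b := img.Bounds()
	var bands [][2]int
	start := -1
	for y := b.Min.Y; y < b.Max.Y; y++ {
		dark := 0
		for x := b.Min.X; x < b.Max.X; x++ {
			r, g, bl, _ := img.At(x, y).RGBA()
			if (r+g+bl)/3 < 0x4000 { // pixel darker than ~25% grey
				dark++
			}
		}
		if dark > 5 && start < 0 {
			start = y
		} else if dark <= 5 && start >= 0 {
			bands = append(bands, [2]int{start, y})
			start = -1
		}
	}
	if start >= 0 {
		bands = append(bands, [2]int{start, b.Max.Y})
	}
	return bands
}

func main() {
	f, err := os.Open("page.jpg")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	img, _, err := image.Decode(f)
	if err != nil {
		log.Fatal(err)
	}
	for _, band := range findTextBands(img) {
		fmt.Printf("text band from row %d to %d\n", band[0], band[1])
	}
}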
- over 2 years ago by DKarakasilis
These are some useful libraries they are using: https://gitlab.gnome.org/World/OpenPaperwork/libpillowfight
- over 2 years ago by DKarakasilis
Started a project here: https://github.com/jimmykarily/open-ocr-reader
- almost 2 years ago by mcepl
Just to say that in the TTS arena I have excellent experience with mimic (packaged for openSUSE). The sound is much better than espeak's; I'm not sure about its support for Greek. Oh, it does have a Greek sample: https://mycroftai.github.io/mimic3-voices/samples/elGR/rapunzelinalow/sample.wav (https://mycroft.ai/mimic-3/).
Similar Projects
This project is one of a kind!