Project Description
Currently, openQA requires a stored reference image for OCR-based comparisons. It is not possible to pass openQA a character string to be compared with the text in the screenshot. This project aims to allow storing only character strings in the needle's JSON file, so that OCR needles no longer need a reference image.
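To make the idea concrete, here is a minimal sketch of what a text-only OCR needle could look like. It is written in Python purely for illustration (os-autoinst itself is Perl). The `tags`, `properties` and `area` keys follow the usual needle JSON layout; the `refstr` field holding the expected string and the helper `make_ocr_needle` are hypothetical names chosen for this example, not a confirmed format.

```python
import json


def make_ocr_needle(tag, xpos, ypos, width, height, text):
    """Build a needle dict whose single area carries expected text, not an image."""
    return {
        "tags": [tag],
        "properties": [],
        "area": [
            {
                "xpos": xpos,
                "ypos": ypos,
                "width": width,
                "height": height,
                "type": "ocr",
                # Hypothetical field: the string the OCR output is compared against.
                "refstr": text,
            }
        ],
    }


needle = make_ocr_needle("grub-menu", 100, 200, 300, 40, "Boot from hard disk")
print(json.dumps(needle, indent=2))
```

With a layout along these lines, the JSON file alone would describe the needle, and no PNG would have to be stored next to it.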
Status
Research into possible tools has been done. The result is that the current implementation based on Tesseract appears to be too inaccurate on short character strings. GOCR seems to do more classical shape-based recognition, which appears to work reasonably accurately on well-formed characters. How closely a recognized string matches the expected one could be calculated using the library perl-Text-Levenshtein.
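The Levenshtein-based accuracy check could work roughly as sketched below. This is an illustrative Python version, not the Perl code from os-autoinst: the function names, the normalization by the longer string's length, and the 0.9 threshold are all assumptions made for this example. In the actual project the distance would come from perl-Text-Levenshtein.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(expected: str, recognized: str) -> float:
    """Map the edit distance to a score between 0.0 (no match) and 1.0 (identical)."""
    longest = max(len(expected), len(recognized))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(expected, recognized) / longest


def text_matches(expected: str, recognized: str, threshold: float = 0.9) -> bool:
    """Assumed policy: accept the OCR result if the score reaches the threshold."""
    return similarity(expected, recognized) >= threshold
```

For example, an OCR result of "Insta11" for the expected string "Install" differs by two substitutions out of seven characters, so it would fail a 0.9 threshold but pass a 0.7 one. A tunable threshold like this lets a needle tolerate occasional misread characters.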
Goal for this Hackweek
- Create a draft implementation of OCR in os-autoinst.
- Optional: Make text-based OCR needles easy to handle in the openQA web frontend (e.g. by providing a live preview of the recognized text).
Resources
- This project is tracked here: https://progress.opensuse.org/issues/121354
- openQA frontend repo: https://github.com/os-autoinst/openQA
- openQA backend repo: https://github.com/os-autoinst/os-autoinst
- GOCR: https://wasd.urz.uni-magdeburg.de/jschulen/ocr/
- Perl-Text-Levenshtein: https://github.com/neilb/Text-Levenshtein
This project is part of:
Hack Week 22
Comments
- 8 months ago by okurz
  There is very basic support for OCR in os-autoinst in https://github.com/os-autoinst/os-autoinst/blob/master/ocr.pm, which might give you some good ideas and a starting point. https://github.com/os-autoinst/os-autoinst/blob/master/t/02-test_ocr.t shows its usage.
- 7 months ago by clanig
  Created draft PR: https://github.com/os-autoinst/os-autoinst/pull/2276