Currently, openQA requires a stored reference image for OCR-based comparisons. It is not possible to pass openQA a character string that should be compared against the text in a screenshot. This project aims to store only the expected character string in the needle's JSON file, so that OCR needles no longer need a reference image.
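To make the idea concrete, here is a minimal Python sketch of what a text-only OCR needle could look like and how it might be read. The `tags`, `area`, `xpos`, `ypos`, `width`, `height` and `type` keys follow the existing needle JSON layout. The `refstr` key holding the expected text is a hypothetical name chosen for illustration, not an existing openQA field, and the helper `ocr_areas` is likewise an assumption. os-autoinst itself is written in Perl, so this only illustrates the data layout.

```python
import json

# Example needle JSON. The "refstr" key is a hypothetical name for the
# expected text; the other keys mirror the existing needle layout.
NEEDLE_JSON = """
{
  "tags": ["bootloader-menu"],
  "area": [
    {"xpos": 100, "ypos": 200, "width": 300, "height": 40,
     "type": "ocr", "refstr": "Installation"},
    {"xpos": 0, "ypos": 0, "width": 50, "height": 50, "type": "match"}
  ]
}
"""


def ocr_areas(needle_text):
    """Return (region, expected_text) pairs for every OCR area of a needle."""
    needle = json.loads(needle_text)
    result = []
    for area in needle.get("area", []):
        if area.get("type") != "ocr":
            continue  # skip areas that are not matched by OCR
        if "refstr" not in area:
            raise ValueError("OCR area without expected text")
        region = (area["xpos"], area["ypos"], area["width"], area["height"])
        result.append((region, area["refstr"]))
    return result
```

With such a layout, the screenshot region given by each OCR area would be passed to the OCR tool and the recognized text compared with the stored string, without any image file next to the JSON file.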
Possible tools have been researched. The result is that the current Tesseract-based implementation appears to be too inaccurate on short character strings. GOCR seems to use a more classical, shape-based recognition, which appears to work reasonably accurately on well-shaped characters. The accuracy of a match could be calculated as the Levenshtein (edit) distance between the expected and the recognized string, using the Perl library Text::Levenshtein (packaged as perl-Text-Levenshtein).
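The comparison step could be sketched as follows. This is a minimal Python illustration of scoring recognized text against an expected string by edit distance, not the os-autoinst implementation, which would be in Perl. The function names, the whitespace normalisation and the `threshold` default of 0.9 are assumptions made for this example.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(expected, recognized):
    """Map the edit distance to a score in [0, 1]; 1.0 means identical."""
    longest = max(len(expected), len(recognized))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(expected, recognized) / longest


def ocr_matches(expected, recognized, threshold=0.9):
    """Hypothetical acceptance rule: normalise whitespace, then compare."""
    expected = " ".join(expected.split())
    recognized = " ".join(recognized.split())
    return similarity(expected, recognized) >= threshold
```

Dividing by the length of the longer string makes the score comparable across strings of different lengths: one misread character in a 12-character string gives a score of about 0.92, while the same single error in a 4-character string gives 0.75. Short strings are therefore more sensitive to the chosen threshold.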
Goal for this Hackweek
- Create a draft implementation of text-based OCR in os-autoinst.
- Optional: Make text-based OCR needles easy to handle in the openQA web frontend (e.g. by providing a live preview of the recognized text).
Resources
- This project is tracked here: https://progress.opensuse.org/issues/121354
- openQA frontend repo: https://github.com/os-autoinst/openQA
- openQA backend repo: https://github.com/os-autoinst/os-autoinst
- GOCR: https://wasd.urz.uni-magdeburg.de/jschulen/ocr/
- Perl-Text-Levenshtein: https://github.com/neilb/Text-Levenshtein
This project is part of:
Hack Week 22