Computers cannot read
In the first blog post of this series about Interactive Visual Testing we presented an example of a broken UI: a language selection dropdown in which the localized entry for French (“Français”) was rendered incorrectly because of the special character ç. We showed how this kind of problem can be detected with Interactive Visual Testing.
However, it turns out that the same special character can also cause problems when automating the test case itself. To find the value in the dropdown list, we apply optical character recognition (OCR). Depending on the specific OCR algorithm, there is a good chance that it will not recognize the special character ç and will instead read “Francais”, “Frangais” or even “Fran5ais”. This then causes a test failure, because the correct selection cannot be found.
What is more, similar confusions in OCR happen frequently between characters such as ‘I’, ‘l’ and ‘1’, or ‘O’ and ‘0’.
Why 99% accuracy is not good enough
A good OCR algorithm claims to be about 99% accurate, so isn’t this good enough? No, for two main reasons. First, this accuracy is usually determined by scanning a whole page of printed text with good image quality. Since the image you get from your UI usually contains much less image information (96 dpi vs 300 dpi for a good scan), the accuracy you will get is probably lower.
But the second reason is even more important. If you have 1000 characters on your UI, 99 percent accuracy means that, on average, 10 of them are still read incorrectly. What if even one of those characters is in the text you are looking for? It will lead to a test failure! We can play the game of percentages here: 99% accuracy, 10 characters per search text, 2 OCR searches per test case, 20 test cases. Assuming every character is read independently of the others, the overall probability of getting all of them right is 0.99^(10*2*20) = 0.99^400 ≈ 1.8%. This is clearly not acceptable.
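The arithmetic above is easy to reproduce. The following is a minimal sketch that assumes every character is recognized independently with the same accuracy; the function name and its parameters are made up for this illustration.

```python
def all_pass_probability(char_accuracy, chars_per_search, searches_per_test, test_cases):
    """Probability that every single character in every OCR search is read correctly.

    Assumption: character errors are independent and equally likely everywhere.
    """
    total_chars = chars_per_search * searches_per_test * test_cases
    return char_accuracy ** total_chars


# Numbers from the text: 99% accuracy, 10 characters, 2 searches, 20 test cases
p = all_pass_probability(0.99, 10, 2, 20)
print(f"{p:.1%}")  # prints 1.8%
```

Under the same assumption, raising the per-character accuracy to 99.9% would still leave roughly one in three test runs with at least one misread character, which is why a different search strategy is needed rather than a slightly better algorithm.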
So how can we reliably find text on the screen with these imperfect algorithms?
The secret of Reverse OCR
The idea behind Reverse OCR is to take advantage of the fact that we already know what should be written on the screen. In the example in the introduction, we are applying OCR to find a specific value on the UI.
So instead of starting with the screen and checking the output of the OCR algorithm, we reverse the process. We start with the known output and use the OCR algorithm to find where this text appears on the screen. With this method we can achieve much more reliable and robust results when searching for a text on the screen.
How does it work technically?
In practice Reverse OCR can be implemented by comparing the search string with every detected string on the screen. We then assign a difference metric to every detected string, based on how different it is from the string we were looking for.
This difference metric is directly influenced by the probability of the different characters detected by the OCR algorithm.
Let us look at a simplified example: We are looking for the string “IoC”. For the given text (100 IoC lock ABC) that is shown on the screen, let us assume the OCR algorithm will detect the following characters with the respective probabilities:
As we can see in this example, a pure OCR approach that uses the best guess for every character would return the text “loc” where we expect “IoC”, and therefore we would not find what we were looking for. With Reverse OCR we look at the difference per word and use the text with the smallest difference. In this case we find our search string “IoC” at the correct location with a difference of 0.45. Of course, we can define up to which difference we consider something a match. If the text we are looking for is not on the screen at all, the best match would lie above this tolerance value and no match would be reported.
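To make the idea concrete, here is a minimal sketch of such a search. Everything in it is an assumption made for illustration: the probabilities in SCREEN are invented (they are not the values from the example above, so the resulting difference is not 0.45), the data layout is hypothetical, and the metric used here (the average of 1 minus the probability of each expected character) is just one simple choice, not necessarily the one used in a real implementation.

```python
import math

# Hypothetical OCR output for the text "100 IoC lock ABC": one list per word,
# one dict per character position, mapping candidate characters to probabilities.
SCREEN = [
    [{"1": 0.8, "l": 0.1, "I": 0.1}, {"0": 0.7, "O": 0.2, "o": 0.1}, {"0": 0.7, "O": 0.2, "o": 0.1}],
    [{"l": 0.5, "I": 0.4, "1": 0.1}, {"o": 0.9, "0": 0.1}, {"c": 0.55, "C": 0.45}],
    [{"l": 0.6, "I": 0.3, "1": 0.1}, {"o": 0.9, "0": 0.1}, {"c": 0.6, "C": 0.4}, {"k": 1.0}],
    [{"A": 1.0}, {"B": 0.9, "8": 0.1}, {"C": 0.6, "c": 0.4}],
]


def best_guess(word):
    """Classical OCR: take the most probable candidate at every position."""
    return "".join(max(candidates, key=candidates.get) for candidates in word)


def difference(search, word):
    """Average of (1 - probability of the expected character) over all positions."""
    if not search or len(search) != len(word):
        return math.inf  # simplification: words of a different length never match
    misses = (1.0 - candidates.get(ch, 0.0) for ch, candidates in zip(search, word))
    return sum(misses) / len(search)


def find_text(search, words, tolerance=0.5):
    """Return the index of the closest word, or None if nothing is within tolerance."""
    if not words:
        return None
    diff, index = min((difference(search, word), i) for i, word in enumerate(words))
    return index if diff <= tolerance else None


print(best_guess(SCREEN[1]))      # prints loc
print(find_text("IoC", SCREEN))   # prints 1
print(find_text("XYZ", SCREEN))   # prints None
```

With these invented numbers the best-guess reading of the second word is “loc”, yet the search for “IoC” still lands on that word, because the expected characters are present among the candidates with reasonable probability. The tolerance of 0.5 is arbitrary and would need tuning against real data.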
No need for dictionary
Classical OCR algorithms often apply a dictionary to correct characters that were read incorrectly. But in software testing we often deal with words that do not appear in a dictionary, either because they are technical terms or because we are searching for generated IDs, names and similar strings.
With Reverse OCR we do not need a dictionary. While classical OCR compares every detected word with the dictionary and returns the most likely match, in Reverse OCR we are comparing against the search text directly.
What if we do not know the search text?
But there can also be situations where we do not know the search text in advance. In test automation there are three different situations in which we use OCR:
- Finding a text on the screen so we can interact with it (as described in the language dropdown example).
- Verifying a given text appears on the screen (or does not appear); for instance, to verify that a newly created customer appears in a list.
- Reading a text from the screen that was generated by the application under test, e.g., an identification number for the created customer that will be used later in the test case.
In the first two situations we can apply Reverse OCR as described above. However, in the third situation we do not know the search text in advance. But we can still improve the output of the OCR algorithm. In most cases we still know what kind of text we are expecting to appear. Maybe we know that the generated identifier is of the form ID-abc-00xyz, where a, b, and c are letters and x, y and z are digits. This knowledge allows us to prioritize characters that match the expected pattern even though the OCR algorithm detected them with a lower probability. This in turn will lead to a much better success rate.
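A minimal sketch of this pattern-based prioritization could look as follows. The per-character candidates in OCR_RESULT are invented for illustration, the helper names are hypothetical, and the pattern is assumed to be the literal prefix “ID-”, three letters (upper or lower case), the literal “-00” and three digits.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)

# Allowed characters per position for the assumed pattern ID-abc-00xyz
ID_PATTERN = [{"I"}, {"D"}, {"-"}] + [LETTERS] * 3 + [{"-"}, {"0"}, {"0"}] + [DIGITS] * 3

# Hypothetical OCR candidates for a text that actually reads "ID-kol-00715"
OCR_RESULT = [
    {"l": 0.5, "I": 0.45, "1": 0.05}, {"D": 0.9, "O": 0.1}, {"-": 1.0},
    {"k": 1.0}, {"0": 0.6, "o": 0.4}, {"1": 0.5, "l": 0.4, "I": 0.1},
    {"-": 1.0}, {"O": 0.6, "0": 0.4}, {"0": 0.9, "o": 0.1},
    {"7": 1.0}, {"l": 0.55, "1": 0.45}, {"S": 0.6, "5": 0.4},
]


def read_naive(candidates):
    """Take the most probable candidate at every position, ignoring the pattern."""
    return "".join(max(c, key=c.get) for c in candidates)


def read_with_pattern(candidates, allowed):
    """Take the most probable candidate that the pattern allows at each position.

    Returns None if the length differs or a position has no allowed candidate.
    """
    if len(candidates) != len(allowed):
        return None
    result = []
    for position, permitted in zip(candidates, allowed):
        matching = {ch: p for ch, p in position.items() if ch in permitted}
        if not matching:
            return None
        result.append(max(matching, key=matching.get))
    return "".join(result)


print(read_naive(OCR_RESULT))                      # prints lD-k01-O07lS
print(read_with_pattern(OCR_RESULT, ID_PATTERN))   # prints ID-kol-00715
```

The sketch only filters candidates position by position. It cannot resolve ambiguities that stay within one character class, such as ‘l’ versus ‘I’ in the letter part of the identifier.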
Conclusion and outlook
With Reverse OCR you can be confident that your test case will be able to successfully select “Français” in the language dropdown and change the display language of your application. You would then expect the text on the screen to change to French and the test case to continue with its next verification step. However, changing the language might take some time, for example because the whole page needs to be re-rendered, so your test case needs to wait until this transition is completed. How to do this without hard-coded waits in your test code will be the topic of the next blog post in the series about Interactive Visual Testing.