OCR using AI
The existing OCR function can produce really strange interpretations of letters that look perfectly clear to the human eye. I would propose that the OCR use AI language support to improve accuracy when converting images to text.
-
Yoichi commented
OCR should take into account the following factors:
- The OCR accuracy for Japanese, for instance, with its complex character set, needs significant improvement. This is crucial for basic functionality and ensuring the software remains useful for users dealing with non-Latin texts.
- My current process involves manually correcting OCR errors in a separate word processor (Microsoft Word), which is highly inconvenient. An integrated, efficient correction tool within Acrobat would be a huge step forward.
- A feature to provide feedback on OCR performance directly within Acrobat is necessary. This would help Adobe's development team refine the OCR algorithm.
- If there are technical or legal hurdles to improving OCR accuracy, it would be helpful to share them with users. Unless you can indicate how far you are willing to compromise and whether there is an alternative, we will keep using the product while remaining dissatisfied with how it works.
It’s frustrating to imagine the significant time and cost users spend dealing with this incomplete feature while, at the same time, someone is profiting from the product that includes it. I firmly believe this situation needs urgent improvement.
-
Jolyon commented
Here's a great one.
I've screencapped the image that it was trying to OCR.
The result?
J. B. \VI-II'l'TOW
COME ON Adobe, you can do better than this.
-
Jolyon commented
I can give countless examples of where the OCR in Acrobat creates terrible results. It messes up common words where it should be obvious that the word says Common and not Cornrnon (for example).
I don't even know why it asks us to select the language for the OCR when it doesn't seem to do ANY sanity checking or use that language as a hint to improve OCR quality.
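To illustrate the kind of sanity check I mean, here is a minimal Python sketch of dictionary-based post-correction. It says nothing about how Acrobat's engine works internally. The `CONFUSIONS` table, the tiny `VOCABULARY`, and the function names are all made up for illustration; a real checker would load a full word list for the language the user selected.

```python
# Hypothetical table of OCR misreadings: fragment seen -> glyphs it may really be.
CONFUSIONS = {"rn": ["m"], "vv": ["w"], "cl": ["d"], "1": ["l"], "0": ["o"]}

# Tiny stand-in for a real dictionary of the selected OCR language (assumption).
VOCABULARY = {"common", "modern", "window", "old"}


def candidates(word):
    """Yield every spelling reachable by optionally undoing each confusion."""
    if not word:
        yield ""
        return
    # Option 1: keep the first character as recognised.
    for rest in candidates(word[1:]):
        yield word[0] + rest
    # Option 2: treat a leading confusable fragment as a misread glyph.
    for wrong, rights in CONFUSIONS.items():
        if word.startswith(wrong):
            for rest in candidates(word[len(wrong):]):
                for right in rights:
                    yield right + rest


def correct(word):
    """Return a dictionary word if exactly one candidate matches, else the input."""
    lowered = word.lower()
    if lowered in VOCABULARY:
        return word  # already a known word, e.g. "modern" keeps its "rn"
    matches = {c for c in candidates(lowered) if c in VOCABULARY}
    if len(matches) != 1:
        return word  # no match or ambiguous: leave the OCR output alone
    fixed = matches.pop()
    # Restore the leading capital if the OCR output had one.
    return fixed.capitalize() if word[0].isupper() else fixed


print(correct("Cornrnon"))  # Common
```

The check only changes a word when it is not in the dictionary and exactly one known word can be reached by undoing confusions, so legitimate words that contain "rn" are left untouched.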
I haven't seen any improvement in the quality of the OCR engine for a decade and, as others have said, the rivals are getting better and better.
Invest in this, please, Adobe.