Option to eliminate words with mixed numbers and letters in OCR
Provide a checkbox option that prevents the OCR from mixing numbers and symbols into words.
For example: Custodian should not be read as Cu&t0d1an. Only letters should be recognized in words.
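To make the request concrete, here is a minimal sketch of the kind of check such an option might run on OCR output. It is only an illustration, not how Acrobat works: the function names `is_mixed_token` and `flag_mixed_tokens` are hypothetical, and the rule (a token is suspect if it contains a letter plus any digit or symbol) is an assumption.

```python
from typing import List

# Hypothetical rule: a token is "mixed" when it has at least one letter
# AND at least one character that is not a letter (digit or symbol).
def is_mixed_token(token: str) -> bool:
    has_letter = any(ch.isalpha() for ch in token)
    has_other = any(not ch.isalpha() for ch in token)
    return has_letter and has_other

def flag_mixed_tokens(text: str) -> List[str]:
    # Strip surrounding punctuation first so "signed," is not flagged.
    tokens = (t.strip(".,;:?()[]\"'") for t in text.split())
    return [t for t in tokens if t and is_mixed_token(t)]

suspects = flag_mixed_tokens("The Cu&t0d1an signed, as Custodian.")
```

With this rule, legitimately mixed tokens (part numbers, "3-year-old", "I'd") would also be flagged, which is why it would need to be an option the user can switch off.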
-
mcmca commented
Considering this is a 3-year-old post, I'd have to say the feature has not advanced while other OCR tools have, even those on handheld devices. Basic machine learning should be able to discern that it's more likely that Cu&t0d1an is a dictionary word than its low-probability translation--especially if considering "known" error rates of misreading zero versus "O" and one versus "I" or "l" or even "!"--and the manual review process would allow the user to correct it if it were an outlier case of mixed character types. As a "Pro" tool from the premier document company, I'm surprised at the lack of attention this vital tool receives.
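A rough sketch of that idea, assuming a small table of known look-alike characters and a plain word list as the dictionary. The `CONFUSIONS` table and `correct_token` are hypothetical names, and the substitutions listed are illustrative guesses rather than measured error rates.

```python
from itertools import product
from typing import Optional, Set

# Assumed look-alike table: misread character -> letters it may stand for.
CONFUSIONS = {"0": "oO", "1": "ilI", "!": "ilI", "&": "s", "5": "sS", "$": "sS"}

def correct_token(token: str, dictionary: Set[str]) -> Optional[str]:
    """Return a dictionary word reachable by look-alike swaps, else None."""
    if token.isalpha():
        return token  # nothing to fix
    # For each position, list the letters it could really be.
    options = [CONFUSIONS.get(ch, ch) for ch in token]
    for candidate in product(*options):
        word = "".join(candidate)
        if word.lower() in dictionary:
            return word
    return None  # no match: leave it for manual review

fixed = correct_token("Cu&t0d1an", {"custodian"})
```

Returning `None` instead of guessing keeps the outlier cases in the manual review step. The number of candidates grows quickly for long tokens with many suspect characters, so a real implementation would need a cap or a smarter search.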
-
vg1039 commented
The OCR in this version stinks, so why not go back to the same OCR as in Adobe XI Professional? Many of our issues stem from the text boxes overwriting text or numerical values, which are then converted to erroneous numbers, so we have no clue of the error unless we compare the scan with the original. One year later...same problems.
-
Hugo Blasdel commented
Rather than replace a practice that works for Medicare numbers with one that is inappropriate, why not add a "dumb" AI that develops a rule set for each type of document? Where it met an anomaly, the original image and its OCR would be temporarily linked to a list to be resolved. Once a human resolves it, the rule could be applied to other cases of the same kind, with the first attempt and the image kept for review in case further conditions are needed. While one can never be certain, and flagging an anomaly is better than a bad choice, this method allows Adobe OCR to be tuned to organizational needs and the diversity of its document flow. The AI could be prototyped in one of the many R Programming Language packages, and the GUI then built into Adobe Reader using code parallel to that in R.
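The workflow above could be prototyped along these lines. This sketch is in Python rather than R, and everything in it is hypothetical: the `RuleSet` class, the `claim_form` document type, and the identifier pattern are made up for illustration and do not describe any real ID format.

```python
import re
from collections import defaultdict

class RuleSet:
    """Per-document-type rules for which mixed tokens are acceptable."""

    def __init__(self):
        self.allowed = defaultdict(list)  # doc_type -> compiled patterns
        self.review_queue = []            # (doc_type, token) awaiting a human

    def add_rule(self, doc_type, pattern):
        # A human resolution becomes a reusable rule for that document type.
        self.allowed[doc_type].append(re.compile(pattern))

    def check(self, doc_type, token):
        if token.isalpha() or token.isdigit():
            return True  # pure words and pure numbers pass
        if any(p.fullmatch(token) for p in self.allowed[doc_type]):
            return True  # mixed, but covered by a known rule
        self.review_queue.append((doc_type, token))  # flag, do not guess
        return False

rules = RuleSet()
# Made-up pattern for an 11-character mixed identifier.
rules.add_rule("claim_form", r"[0-9][A-Z][A-Z0-9]{9}")
id_ok = rules.check("claim_form", "1AB2CD3EF45")
word_ok = rules.check("claim_form", "Cu&t0d1an")
```

Anything that fails every rule lands in `review_queue` instead of being silently "corrected", which matches the point that flagging an anomaly is better than a bad choice.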
-
Adminrishusha (Admin, Adobe) commented
Hi Brian,
Thanks for the suggestion. Given the complexity of the OCR engine in Acrobat, we will dig deeper into the feasibility of the proposed feature.
Thanks
Rishabh
-
Adminrishusha (Admin, Adobe) commented
Hi Jon,
OCR is getting better day by day, and we are working to make it work with almost all the real-life documents we come across and obtain from various sources.
This is an ongoing process, as we can't check ALL the documents in the world. Thanks for your interest in reporting on the quality of OCR. We will try our best to make it better.
Thanks
-
Jon Salternate commented
Can we just say it? The OCR is no longer anywhere near as good as it used to be.
-
Jeffrey commented
After OCR, I would like a spell-check feature to review OCR errors, beyond the "correct recognized text" feature. OCR has errors. Also, I would like the option to tell OCR to selectively remove symbols from its results. Many documents have no "~" or "|" (Shift+backslash). I would also like to eliminate the search for the British pound symbol. The user would then selectively help the OCR eliminate some responses.
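As a sketch of what that option could do as a post-processing step (assuming the OCR text is available as a plain string; `drop_symbols` and `words_to_review` are hypothetical names, and the word list stands in for a real spell-check dictionary):

```python
from typing import List, Set

def drop_symbols(text: str, excluded: str = "~|£") -> str:
    # Delete every character the user has said never appears in the document.
    return text.translate(str.maketrans("", "", excluded))

def words_to_review(text: str, dictionary: Set[str]) -> List[str]:
    # Spell-check pass: list alphabetic words missing from the dictionary.
    words = (w.strip(".,;:?!()\"'") for w in text.split())
    return [w for w in words if w.isalpha() and w.lower() not in dictionary]

cleaned = drop_symbols("Name | John~Smith")
review = words_to_review("The custodiam signed.", {"the", "custodian", "signed"})
```

Removing the symbols after the fact leaves gaps where they were misread from real characters, so excluding them inside the recognition step itself, as requested, would be the better fix.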