Improve OCR stability
Acrobat DC, Windows 10, 16 GB RAM, relatively fast processor.
(1) I frequently must OCR documents of 2,000 pages or more. With Acrobat for Mac (which I used from 2012 to 2018), I could start the OCR process, go away for an hour or two or three, and it would be finished with the entire document when I returned. With Acrobat DC for Windows, if I try to OCR more than about 150 pages at a time, Acrobat crashes, not immediately, but after it has run for 10 minutes or so. All progress is lost, and I have to specify a smaller range of pages and restart the OCR process. OCR'ing fewer than 150 pages at a time means I must manually restart the process every few minutes, which means I cannot sustain focused attention on any other task while Acrobat processes the file.
(2) Some data errors (bad font width, etc.) will cause the OCR process to stop, or Acrobat to crash. I return to my computer expecting the entire file to have been OCR'd, and I find it stopped after a few pages and I must restart the process. Acrobat should be able to skip the page with the data error, OCR the rest of the document, and generate an error report identifying the page(s) that could not be OCR'd.
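To illustrate the behavior requested in (2), here is a rough sketch in Python. It is only an illustration of the skip-and-report idea, not how Acrobat works internally and not an Acrobat API: `ocr_with_skips`, `ocr_page`, and `fake_ocr_page` are all hypothetical names, and `ocr_page` stands in for whatever engine recognizes a single page.

```python
from typing import Callable, Dict, List, Tuple


def ocr_with_skips(
    page_count: int,
    ocr_page: Callable[[int], str],  # hypothetical per-page OCR function
) -> Tuple[Dict[int, str], List[Tuple[int, str]]]:
    """OCR pages 1..page_count, skipping any page that raises an error.

    Returns the recognized text per page and a list of (page, reason)
    entries for the pages that could not be processed.
    """
    results: Dict[int, str] = {}
    failures: List[Tuple[int, str]] = []
    for page in range(1, page_count + 1):
        try:
            results[page] = ocr_page(page)
        except Exception as exc:
            # Record the problem and move on instead of aborting the run.
            failures.append((page, str(exc)))
    return results, failures


def fake_ocr_page(page: int) -> str:
    """Stand-in engine for the demo: page 3 simulates a data error."""
    if page == 3:
        raise ValueError("bad font width")
    return f"text of page {page}"


results, failures = ocr_with_skips(5, fake_ocr_page)
for page, reason in failures:
    print(f"Page {page} could not be OCR'd: {reason}")
```

Under these assumptions, one bad page costs only that page: the rest of the document is still processed, and the failure list serves as the error report.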
-
Henrik commented
Improving the process will always be beneficial. Making the OCR process multi-threaded would be helpful, as we are moving towards more and more processing cores (24 currently)!