Share your feedback on Acrobat DC

Simple code to add more languages

2 votes

5 comments · Acrobat Reader for Android » File List and User Interface

mendy mann commented · Jan 13, 2025

Simple code, explained.
PDF to image:
The PDF page is rendered onto a hidden <canvas> element using pdfjs-dist.
The canvas is then converted into a PNG image for OCR processing.
OCR with Tesseract.js:
The image is passed to Tesseract.js, which performs OCR to extract the text.
The correct language data is loaded to ensure accurate recognition.
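
The two steps above can be sketched roughly as follows. This is only a minimal sketch, not Adobe's or anyone's actual implementation: `ocrPdf`, `RenderPage` and `Recognize` are hypothetical names made up for illustration. In a browser, `renderPage` would typically wrap pdfjs-dist (render the page onto a hidden canvas, then call `canvas.toDataURL("image/png")`), and `recognize` would wrap Tesseract.js with a language code such as `"heb"`. The two libraries are passed in as functions so the flow itself stays easy to follow and to try out.

```typescript
// Hypothetical function types (assumptions for this sketch):
// - renderPage: in a browser, wraps pdfjs-dist and resolves to a PNG data URL
// - recognize: wraps Tesseract.js and resolves to the extracted text
type RenderPage = (pageNumber: number) => Promise<string>;
type Recognize = (image: string, lang: string) => Promise<string>;

// Run OCR over every page, one page at a time, and collect the text per page.
async function ocrPdf(
  pageCount: number,
  lang: string,
  renderPage: RenderPage,
  recognize: Recognize,
): Promise<string[]> {
  const pages: string[] = [];
  for (let pageNumber = 1; pageNumber <= pageCount; pageNumber++) {
    const image = await renderPage(pageNumber); // step 1: PDF page -> PNG image
    const text = await recognize(image, lang);  // step 2: PNG image -> text
    pages.push(text.trim());
  }
  return pages;
}
```

Adding another language would then only mean passing a different language code to `recognize`, assuming the matching Tesseract language data is available.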

mendy mann commented · Jan 13, 2025

I hope Adobe can implement this for the benefit of all humanity.

mendy mann commented · Jan 13, 2025

The reason for using OCR: PDF.js extracts the raw text data, but if the font encoding is not interpreted properly, the output looks like random characters or symbols.
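
One way an app could decide when to fall back to OCR is to check whether the extracted text looks broken. The sketch below is only a rough heuristic of my own, not something PDF.js provides: `looksGarbled` and its `threshold` parameter are hypothetical, and the 20% cutoff is an arbitrary assumption.

```typescript
// Heuristic (an assumption, not part of PDF.js): treat extracted text as
// garbled when too many characters are replacement characters, private-use
// code points, or control characters other than common whitespace.
function looksGarbled(text: string, threshold = 0.2): boolean {
  const chars = Array.from(text); // split by code point, not by UTF-16 unit
  if (chars.length === 0) return false;
  let suspicious = 0;
  for (const ch of chars) {
    const cp = ch.codePointAt(0)!;
    const isControl = cp < 0x20 && ch !== "\n" && ch !== "\r" && ch !== "\t";
    const isReplacement = cp === 0xfffd; // U+FFFD, the "unknown character" mark
    const isPrivateUse = cp >= 0xe000 && cp <= 0xf8ff; // often unmapped font glyphs
    if (isControl || isReplacement || isPrivateUse) suspicious++;
  }
  return suspicious / chars.length > threshold;
}
```

This will not catch every case, since a wrongly decoded font can also produce ordinary printable letters in the wrong order or alphabet, so it should be treated as a first check only.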

mendy mann shared this idea · Jan 13, 2025