Simple code to add more languages
Please add support for selecting and reading Hebrew and all other right-to-left language text. You could simply convert the page to an image on the backend and then run a simple text extraction on it, which would make this easier to implement.
-
mendy mann commented
Simple code explained
PDF to Image:
The PDF page is rendered onto a hidden <canvas> element using pdfjs-dist.
The canvas is converted into an image (PNG format) for OCR processing.
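One detail the render step leaves open is how large the canvas should be. The following is a minimal sketch, not part of the original suggestion, of picking a render scale for a target resolution. It relies on PDF page sizes being measured in points (72 per inch by default). The helper name `canvasSizeForDpi` is hypothetical, the page is assumed to be unrotated, and the 300 DPI figure is only a commonly suggested starting point for OCR.

```typescript
// PDF user space defaults to 72 units (points) per inch,
// so a render scale of 1 corresponds to 72 DPI.
const PDF_UNITS_PER_INCH = 72;

interface CanvasSize {
  scale: number;  // scale factor to render the page at
  width: number;  // canvas width in pixels
  height: number; // canvas height in pixels
}

// Hypothetical helper: derive the scale and canvas size for a target DPI.
// Assumes an unrotated page whose size is given in points.
function canvasSizeForDpi(
  pageWidthPt: number,
  pageHeightPt: number,
  dpi: number,
): CanvasSize {
  if (dpi <= 0) {
    throw new RangeError("dpi must be positive");
  }
  return {
    scale: dpi / PDF_UNITS_PER_INCH,
    // Multiply before dividing so common page sizes stay exact.
    width: Math.ceil((pageWidthPt * dpi) / PDF_UNITS_PER_INCH),
    height: Math.ceil((pageHeightPt * dpi) / PDF_UNITS_PER_INCH),
  };
}

// US Letter (612 x 792 pt) at 300 DPI gives a 2550 x 3300 pixel canvas.
const letter = canvasSizeForDpi(612, 792, 300);
```

With pdfjs-dist, the resulting scale would typically be passed to `page.getViewport({ scale })`, and the rendered canvas turned into a PNG with `canvas.toDataURL("image/png")`.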
OCR with Tesseract.js:
The image is passed to Tesseract.js, which performs OCR to extract text.
The correct language data (for example, Hebrew) is loaded to ensure accurate recognition.
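The OCR hand-off described above could be sketched roughly as follows. This is an illustration under assumptions, not a tested integration: `OcrEngine`, `EngineFactory`, `ocrLangs` and `ocrPageImage` are hypothetical names, and the engine is injected so the sketch stays self-contained. A Tesseract.js worker is expected to fit the `OcrEngine` shape, but that wiring is an assumption and is not shown here. The language codes are the standard Tesseract traineddata names.

```typescript
// Minimal shape of an OCR engine (hypothetical). A Tesseract.js worker,
// whose recognize() resolves to { data: { text } }, is assumed to fit it.
interface OcrEngine {
  recognize(image: string): Promise<{ data: { text: string } }>;
  terminate?(): Promise<unknown>;
}

// Hypothetical factory: creates an engine with the given languages loaded.
type EngineFactory = (langs: string[]) => Promise<OcrEngine>;

// Tesseract traineddata codes for some right-to-left languages.
const RTL_LANG_CODES = {
  hebrew: "heb",
  arabic: "ara",
  persian: "fas",
  urdu: "urd",
  yiddish: "yid",
} as const;

type RtlLanguage = keyof typeof RTL_LANG_CODES;

// Pick the language data to load; English is added by default because
// right-to-left documents often contain Latin words and digits.
function ocrLangs(primary: RtlLanguage, withEnglish = true): string[] {
  const code = RTL_LANG_CODES[primary];
  return withEnglish ? [code, "eng"] : [code];
}

// Run OCR on a rendered page image (a PNG data URL) and return its text.
async function ocrPageImage(
  pngDataUrl: string,
  language: RtlLanguage,
  createEngine: EngineFactory,
): Promise<string> {
  const engine = await createEngine(ocrLangs(language));
  try {
    const { data } = await engine.recognize(pngDataUrl);
    return data.text.trim();
  } finally {
    // Release the engine if it supports cleanup.
    if (engine.terminate) {
      await engine.terminate();
    }
  }
}
```

Injecting the engine keeps the language choice separate from the OCR library, so the same function can be exercised with a stub before a real worker is plugged in.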
mendy mann commented
I hope Adobe can implement this for the benefit of all humanity.
-
mendy mann commented
This is needed because PDF.js extracts raw text data; if the font encoding is not properly interpreted, the output looks like random characters or symbols.
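One way to act on this is to check the extracted text and fall back to OCR only when it looks garbled. The sketch below is a rough heuristic under assumptions, not something PDF.js provides. The function names and the 0.2 threshold are hypothetical. It counts replacement characters, control characters and private-use code points, which often appear when a font's encoding cannot be mapped to Unicode.

```typescript
// Hypothetical heuristic: fraction of non-whitespace characters that
// suggest a failed font-encoding mapping.
function garbledRatio(text: string): number {
  const chars = Array.from(text).filter((ch) => !/\s/u.test(ch));
  if (chars.length === 0) {
    return 1; // no usable text at all: treat as fully garbled
  }
  let suspicious = 0;
  for (const ch of chars) {
    const cp = ch.codePointAt(0)!;
    const isReplacement = cp === 0xfffd;                        // U+FFFD
    const isControl = cp < 0x20 || (cp >= 0x7f && cp <= 0x9f);  // C0/C1 controls
    const isPrivateUse = cp >= 0xe000 && cp <= 0xf8ff;          // BMP private use
    if (isReplacement || isControl || isPrivateUse) {
      suspicious++;
    }
  }
  return suspicious / chars.length;
}

// Decide whether to discard the extracted text and run OCR instead.
// The default threshold is an arbitrary starting point, not a tuned value.
function shouldFallBackToOcr(text: string, threshold = 0.2): boolean {
  return garbledRatio(text) > threshold;
}
```

This check cannot catch text that was mapped to valid but wrong letters, so it only covers the most obvious failures.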