Adobe Acrobat DC encoding

Dear Ladies and Gentlemen,

I am French and fond of literature, and I work for publishers, in particular on the publication of French translations of Indian literature.

I therefore use most programs of the Adobe Creative Suite, also because I manage a website devoted to promoting Indian literature to French-speaking readers (www.chatranjali.fr).

The purpose of my request is the following:
I gather PDF documents from Indian websites, Indian writers, or their agents. They are written in one of the numerous languages and scripts used in the Indian subcontinent: mainly Devanagari, but also Bengali, Tamil, etc. Unfortunately, these PDF files are very often encoded as "Identity-H" (I don't really understand what this means, but I gather that it is not the international Unicode standard), and the font information shown in the files' metadata is not really clear: e.g. "CID-Font+F1", "CID-Font+F2", etc.

Exporting the file to MS Word does not work: instead of the original text, I get a set of meaningless symbols.

My question is thus: is it possible to retrieve the text from a PDF file encoded like that?
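
For context: in a PDF, "Identity-H" means the text is stored as two-byte character codes used directly as character IDs (CIDs) in the embedded font, not as Unicode characters. Copying or exporting generally yields readable text only when the font also carries a /ToUnicode map. As a rough first check, here is a minimal sketch that counts both markers in a file. The function name is hypothetical, and the raw scan is an assumption-laden shortcut: fonts stored inside compressed object streams will not be seen by it.

```python
import re


def summarize_font_markers(pdf_bytes: bytes) -> dict:
    """Count raw font markers in an uncompressed view of a PDF file.

    Assumption: font dictionaries are stored as plain (uncompressed)
    objects. PDFs that use compressed object streams hide these markers.
    """
    # Fonts declared with the Identity-H encoding.
    identity_h = len(re.findall(rb"/Encoding\s*/Identity-H", pdf_bytes))
    # ToUnicode maps, which translate glyph codes back to Unicode text.
    to_unicode = len(re.findall(rb"/ToUnicode\b", pdf_bytes))
    return {
        "identity_h_fonts": identity_h,
        "to_unicode_maps": to_unicode,
        # Heuristic only: every Identity-H font should have a map.
        "text_likely_recoverable": to_unicode >= identity_h,
    }


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(summarize_font_markers(handle.read()))
```

If the result shows Identity-H fonts without matching ToUnicode maps, the file itself does not say which characters the glyphs stand for, and the usual fallback is optical character recognition (OCR) of the rendered pages.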

Thanking you in advance for your answers,

Yours faithfully,
Pascal Garin

1 vote

Pascal Garin shared this idea · Feb 13, 2019
