Adobe Acrobat DC encoding
Dear Ladies and Gentlemen,
I am French, fond of literature, and I work for publishers, in particular on the publication of French translations of Indian literature.
I therefore use most of the programs in the Adobe Creative Suite, all the more so because I also manage a website devoted to promoting Indian literature to French-speaking readers (www.chatranjali.fr).
The purpose of my request is the following:
I gather PDF documents from Indian websites, from Indian writers, or from their agents. They are therefore written in one of the many languages and scripts used in the Indian subcontinent: mainly Devanagari, but also Bengali, Tamil, etc. Unfortunately, these PDF files are very often encoded as "Identity-H" (I don't really understand what this means, but I have gathered that it is not the international Unicode standard), and the font information shown in the files' metadata is not very clear: e.g. "CID-Font+F1", "CID-Font+F2", etc.
Exporting the file to MS Word does not work: instead of the original text, I get a set of meaningless characters.
My question is therefore: is it possible to retrieve the text from a PDF file encoded in this way?
Thanking you in advance for your answers,