Acrobat crashes on malformed PDF from Apple Preview macOS 10.14.2: stray /Pg keys in tagged PDF
This bug report was submitted to Apple on 2019-01-17 (Apple bug ID 47350683). As it has a massive impact on user experience in Acrobat (unable to save / save as, regardless of whether anything was edited in Acrobat; crashing when opening the Tags panel or having it open) I am sharing it here:
Summary:
When saving out a tagged PDF opened in Preview, Preview/Quartz seems to rewrite the whole PDF structure and - while doing so - messes up the data structures below the StructTreeRoot entry in the Catalog dictionary. For leaf objects inside the StructTreeRoot tree structure, a "Pg" entry is required that points to the page on which that structure element is present. On the node elements inside the StructTreeRoot tree structure, the "Pg" entry is not useful (entries below a node could be on more than one page). Preview/Quartz injects a "Pg" entry under all those nodes, with the associated indirect object ID having non-existing values (in the instances I checked very large numbers, e.g. "3918973936" where 26 would be the largest indirect object ID number in the PDF).
The big issue is that such PDFs may no longer be usable in other widely used tools, such as Acrobat Pro DC (Acrobat can neither "save as" nor apply changes to the PDF; this could affect a very large number of users). I have not yet tested with other PDF reading or editing applications. Also, there is currently no known mechanism to fix such broken PDFs.
Steps to Reproduce:
- take any tagged PDF
- open in Preview
- "Export as PDF..."
- save to disk
- possible effects:
- - open Acrobat, open Tags panel, -> Acrobat crashes immediately
- - inspect using a low level PDF inspection tool, look for entries in node elements under StructTreeRoot, find "Pg" entries - which should not be present
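For anyone without a low level inspection tool at hand, the check in the last step can be roughly approximated with a few lines of Python. This is only a sketch under assumptions: the function name find_dangling_pg_refs is made up for illustration, and it scans the raw bytes, so it only sees structure elements that are stored as plain (uncompressed) objects rather than inside compressed object streams. It reports every /Pg reference whose object number is not defined anywhere in the file, which is the symptom described above.

```python
import re

# "/Pg 12 0 R" style indirect references in the raw PDF bytes.
PG_REF = re.compile(rb"/Pg\s+(\d+)\s+\d+\s+R")
# Start of an indirect object definition, e.g. "12 0 obj".
OBJ_DEF = re.compile(rb"(?<!\d)(\d+)\s+\d+\s+obj\b")


def find_dangling_pg_refs(pdf_bytes):
    """Return the object numbers of /Pg references that point at
    objects which are not defined in the file (hypothetical helper)."""
    defined = {int(m.group(1)) for m in OBJ_DEF.finditer(pdf_bytes)}
    dangling = []
    for m in PG_REF.finditer(pdf_bytes):
        number = int(m.group(1))
        if number not in defined:
            dangling.append(number)
    return dangling
```

Calling it on the bytes of an exported file, e.g. find_dangling_pg_refs(open(path, "rb").read()), should return an empty list for a healthy tagged PDF and a list of absurdly large object numbers for an affected one.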
Expected Results:
- no "Pg" entries injected in node elements under StructTreeRoot
Actual Results:
- "Pg" entries are injected in all node elements under StructTreeRoot
Version/Build:
macOS Version 10.14.2 (Build 18C54) Quartz PDFContext
Configuration:
not applicable
Steps needed to make Acrobat viewer deal with such broken PDFs: make sure that it does not stumble over stray /Pg entries (and absurd indirect object numbers)
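Until that happens, one possible stopgap on the user side is to blank out the stray entries. What follows is an untested idea, not a confirmed fix: I have not verified that Acrobat accepts the result. The function name blank_dangling_pg_refs is hypothetical, and the sketch assumes the structure elements are stored as plain, uncompressed objects in an unencrypted and unsigned file. It overwrites each /Pg reference to a non-existing object with spaces of the same length, so that byte offsets in the cross-reference table stay valid.

```python
import re

# "/Pg 12 0 R" style indirect references in the raw PDF bytes.
PG_REF = re.compile(rb"/Pg\s+(\d+)\s+\d+\s+R")
# Start of an indirect object definition, e.g. "12 0 obj".
OBJ_DEF = re.compile(rb"(?<!\d)(\d+)\s+\d+\s+obj\b")


def blank_dangling_pg_refs(pdf_bytes):
    """Replace /Pg references to undefined objects with spaces,
    keeping the file length unchanged (hypothetical helper)."""
    defined = {int(m.group(1)) for m in OBJ_DEF.finditer(pdf_bytes)}

    def replace(match):
        if int(match.group(1)) in defined:
            return match.group(0)  # valid reference, keep as is
        return b" " * len(match.group(0))  # same length, offsets preserved

    return PG_REF.sub(replace, pdf_bytes)
```

Work on a copy of the file, and compare the Tags panel behaviour before and after.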
Please get in touch if you need more info...
Olaf