Problems with PDF/A conversion

Hello,
we are seeing some strange problems when converting PDF to PDF/A. We are using Aspose-PDF 22.1 (Java), but the same problem also occurs in Aspose-PDF 21.9 (Java).

The code we use for the conversion (written mainly with a reduced file size in mind):
// First pass: convert to PDF/A-2a without file-size optimization.
PdfFormatConversionOptions options = new PdfFormatConversionOptions(outputLogStream, PdfFormat.PDF_A_2A, ConvertErrorAction.Delete);
options.setOptimizeFileSize(false);
options.setTransparencyAction(ConvertTransparencyAction.Default);
options.setConvertSoftMaskAction(ConvertSoftMaskAction.ConvertToStencilMask);
doc.convert(options);
// Optimize the document, then convert a second time with file-size optimization enabled.
doc.optimize();
options.setOptimizeFileSize(true);
doc.convert(options);
doc.save(pdfOutPut);

So, what is the problem? Attached are the original PDF, a version converted to PDF/A-2a, and one converted to PDF/X-1a. Please compare the original and the pdfa2a file on pages 2, 4, 5, etc. You will see some strange overlays of text.
Now compare the original with the pdfx1a file. The text parts are fine there, but on page 6 you will see that some pictures have been discarded.
We have found the same behaviour in other PDFs in our production system, but I cannot share those because they contain booking information. I can only send three partial screenshots (attached).
So, my question is: did we do something wrong, or what should we do to get a reliable PDF/A that looks the same as the original PDF?

Thank you and kind regards, Gerd

Buchungsprotokoll-original.jpg (33.9 KB)
Buchungsprotokoll-pdfa2a.jpg (31.9 KB)
Buchungsprotokoll-pdfx1a.jpg (29.2 KB)

1014008_0100020000000025_OvercomingObjections_V8.pdf (2.6 MB)
1014008_0100020000000025_OvercomingObjections_V8.pdfa2a.pdf (3.0 MB)
1014008_0100020000000025_OvercomingObjections_V8.pdfx1a_opt.pdf (3.5 MB)

@GRein

We have logged this problem in our issue tracking system as PDFJAVA-41508. You will be notified via this forum thread once this issue is resolved.

We apologize for the inconvenience.

Thank you, tahir.
Please note: the text overlays are really critical for us, because they affect a lot of customers in our production systems. I would be happy if you could resolve this problem quickly. The difficulty is this: when customers convert with the free pdf24, there is no problem, and they confront me with that. You will understand that this makes it hard for me to argue. Nevertheless, Aspose is a really great product and I really like it. Today I also tried converting the PDFs with the iceblue product, and I did not see any problem. Please don't make me start wondering whether another product would fit our expectations better.
I look forward to your response.
Kind regards, Gerd

@GRein

We have logged your concerns under the same issue in our issue tracking system. We will inform you once there is an update available on your issue.

Hi there,
Two more points that may help with your analysis:

1.) I converted the PDF (see above, Overcoming Objections) via Aspose to Word DOCX. No problem. Then I converted it back to PDF. No problem. After that I ran the OCR. I got 39 images (and not 40000 as before). Then I converted to PDF/A-2a. Everything was OK.

2.) I did the same thing with the “Buchungsprotokoll” file. Converting PDF to DOCX: very good. Converting back to PDF and to PDF/A-2a: very good. And one more benefit: the file size of the PDF/A is small (and not more than 10 times bigger, as before).
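In case it helps to reproduce the file-size observation, here is a minimal sketch of how the size ratio between the original and the converted file could be checked in plain Java. The names ConversionSizeCheck, growthFactor and isBloated, as well as the threshold passed to isBloated, are my own assumptions for illustration and not part of Aspose.PDF; only java.nio is used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not an Aspose API): compares the size of a converted
// file with the size of its source file.
final class ConversionSizeCheck {

    // Returns convertedBytes / originalBytes, e.g. 10.0 when the output is ten times larger.
    static double growthFactor(long originalBytes, long convertedBytes) {
        if (originalBytes <= 0) {
            throw new IllegalArgumentException("originalBytes must be positive");
        }
        if (convertedBytes < 0) {
            throw new IllegalArgumentException("convertedBytes must not be negative");
        }
        return (double) convertedBytes / originalBytes;
    }

    // Same ratio, read from two files on disk.
    static double growthFactor(Path original, Path converted) throws IOException {
        return growthFactor(Files.size(original), Files.size(converted));
    }

    // True when the converted file is larger than maxFactor times the original.
    // The acceptable maxFactor is an assumption the caller has to choose.
    static boolean isBloated(long originalBytes, long convertedBytes, double maxFactor) {
        return growthFactor(originalBytes, convertedBytes) > maxFactor;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ConversionSizeCheck <original.pdf> <converted.pdf>");
            return;
        }
        double factor = growthFactor(Paths.get(args[0]), Paths.get(args[1]));
        System.out.printf("converted/original size ratio: %.2f%n", factor);
    }
}
```

Run against the original and the PDF/A output, this prints the ratio, so the direct conversion and the DOCX round trip can be compared with one number per file.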

For this file I can give you an anonymized version of the resulting DOCX.

Perhaps this helps with your analysis of the bug(s).

Regards, Gerd

Buchungsprotokoll-anonym.docx (5.2 MB)

@GRein

Thanks for sharing the details. We will investigate the issue and let you know once there is an update.