Word to PDF-A does not generate indexed images

Hello.

I tried Aspose.Words for Java 18.8, and I'm afraid it still doesn't produce indexed images when saving to PDF/A-1b.
Saving to plain PDF works.
Is this by design, for instance because of a PDF/A constraint? Or is further investigation needed?

Thank you.

@renato.mauro,

Please ZIP and upload the following here for testing: 1) your input Word document, 2) the source code that reproduces the problem, and 3) the Aspose.Words-generated output PDF file showing the undesired behavior. We will then investigate the issue on our end and provide you more information.

Here they are.

18.3: image object 23 0 obj is indexed; its stream length (object 31 0 obj) is 7968 bytes
18.8: image object 27 0 obj is not indexed; its stream length (object 35 0 obj) is 10175 bytes

Thank you.

AsposeTestIndexedColorspace.zip (199.6 KB)

@renato.mauro,

For the sake of correction, we have logged this problem in our issue tracking system. The ID of this issue is WORDSJAVA-1883. We will look further into the details of this problem and keep you updated on its status. We apologize for the inconvenience.

@renato.mauro,

Regarding WORDSJAVA-1883, we have completed the analysis of your issue and concluded that it is actually not a bug in the latest version of Aspose.Words.

There is a difference between the files we received. Your file (AsposeConverted_18_8.pdf) was created with a restricted license (evaluation mode), so it contains an additional background image. It is rather blurred but still visible in the middle of each page, and it contains the "Aspose" label with an icon on the left. Basically, this file weighs more because of that image. If you apply the license, everything will be as you expect.

Hello Awais.

Thank you for your analysis. I'm afraid there is still something to investigate.

We tried with a valid license and still reproduce the behavior: up to version 18.3, Aspose.Words produced an indexed image when saving to both PDF and PDF/A; from version 18.4 onward, it produces an indexed image when saving to PDF but a non-indexed image when saving to any kind of PDF/A. The difference is clear when comparing the PDF file contents, as shown in the following JPG image (please see the yellow bands), where the color space changes from Indexed DeviceRGB with its palette to plain DeviceRGB.

PDFA_18_3_VS_18_4.jpg (125.8 KB)
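The comparison in the yellow bands can be roughly automated. The sketch below is a hypothetical helper, not part of Aspose.Words: it decodes the raw PDF bytes as ISO-8859-1 and counts Indexed color space arrays (`[/Indexed ...]`) and direct `/ColorSpace /DeviceRGB` entries. It is a text-level heuristic, not a PDF parser, and it assumes the image dictionaries are stored as plain text (anything inside compressed object streams is invisible to it); a real check would use a proper PDF library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical heuristic: counts color space declarations visible as plain text in a PDF. */
public class IndexedColorSpaceScan {

    // An Indexed color space is an array whose first element is the name /Indexed.
    private static final Pattern INDEXED_ARRAY = Pattern.compile("\\[\\s*/Indexed\\b");

    // A direct "/ColorSpace /DeviceRGB" entry, i.e. plain RGB without a palette.
    private static final Pattern DIRECT_RGB = Pattern.compile("/ColorSpace\\s*/DeviceRGB\\b");

    static int count(Pattern pattern, byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so binary stream data decodes safely.
        Matcher m = pattern.matcher(new String(pdf, StandardCharsets.ISO_8859_1));
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static int countIndexed(byte[] pdf) {
        return count(INDEXED_ARRAY, pdf);
    }

    public static int countDirectRgb(byte[] pdf) {
        return count(DIRECT_RGB, pdf);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java IndexedColorSpaceScan file.pdf");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Indexed color spaces: " + countIndexed(pdf));
        System.out.println("Direct /DeviceRGB color spaces: " + countDirectRgb(pdf));
    }
}
```

Running it on the 18.3 and 18.4+ PDF/A outputs would be expected to show the image moving from the first count to the second, but only when the dictionaries are not compressed.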

Please note that the image is rendered perfectly in either case; that is good, and it is not the subject of this thread.
The point is why, even with a licensed version, the image is written as non-indexed in the PDF/A file. Moving from indexed RGB to plain RGB obviously makes the image stream larger.
If this is a proven PDF/A constraint and you can point us to the chapter of the PDF/A specification that disallows indexed images, then fine, we will have to live with it. Otherwise, we really need your help to see whether it is possible to get back the optimal behavior we had up to version 18.3.

Thank you so much for your help.

@renato.mauro,

Thanks for your inquiry. To ensure a timely and accurate response, please ZIP and attach the following resources here for testing:

  • Your input Word document
  • Aspose.Words 18.3 generated PDF file showing the correct behavior
  • Aspose.Words 18.10 generated PDF file showing the undesired behavior
  • Please also list the complete steps to verify that version 18.10 does not produce indexed images.
  • Please also create a standalone simple Console Application (source code without compilation errors) that helps us to reproduce your problem on our end and attach it here for testing. Please do not include Aspose.Words .jar files in it to reduce the file size.

As soon as you have this information ready, we will start investigating your issue and provide you more information. Thanks for your cooperation.

Hi Awais.

Here is what you requested:
ImageNotIndexedOnPdfA.zip (168.5 KB)

I kindly ask you and your team to note that this is what I have already sent you twice before, for versions 18.6 and 18.8. Now you have the same input, output and code for version 18.10.
This time I have also added a readme.txt file; I'm sure you and your team will kindly read it, along with everything else I have already provided.
Nothing has changed except the Aspose.Words version; the problem, the behavior and the code are all the same.
If something is not clear, please ask a focused question and I will be glad to answer it, so that together we can solve this issue.
Once again, I appreciate your fantastic product; I'm sure you and your team will be able to investigate this problem in depth.

Thank you for your help and patience.

@renato.mauro,

Thanks for the details. Using the latest version of Aspose.Words, i.e. 18.10, we have reproduced this issue on our end. We have logged another issue in our bug tracking system. The ID of this issue is WORDSJAVA-1926. Your thread has been linked to this issue and you will be notified as soon as it is resolved. Sorry for the inconvenience.

@renato.mauro,

Regarding WORDSJAVA-1926, we have completed our work on your issue and decided to close it as Not a Bug. Please see the following analysis details:

An image is saved as indexed when PdfCompliance.PDF_15 or the default settings are used, but with the PDF/A options the index is lost. This is caused by the standards and by our implementation of them.

  1. According to the standards, "transparency and layers" are forbidden. Please see the PDF/A FAQ from the PDF Association and the attached picture, screenshot01.png. The image in your example has an alpha channel; we have replaced all transparent pixels with red to highlight them (screenshot02.png). To save this image properly, the alpha channel has to be removed. We could say more about our implementation of this code, but in general, whenever a picture is edited, its palette can be lost and the image becomes RGB or ARGB. That is what happened in this case as well. Basically, the rule is: "only an unchanged image can keep all of its original features, such as the palette (index)".

  2. Then, according to the standards (screenshot03.png), this image is saved into the stream using ZIP compression. We wouldn't say that the zipped output weighs significantly more than the indexed image, but in our example the difference is visible: about 2 KB.
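The general effect described in point 1 can be reproduced with plain Java 2D, independently of Aspose.Words. The sketch below is not Aspose's implementation; it only illustrates the idea that removing alpha by redrawing through a generic RGB pipeline also drops the palette. The class name `NaiveFlatten` and the two-color test palette are hypothetical.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.IndexColorModel;

public class NaiveFlatten {

    /** Composites src over an opaque background into a new 24-bit RGB image. */
    public static BufferedImage flattenToRgb(BufferedImage src, Color background) {
        BufferedImage dst = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setColor(background);
            g.fillRect(0, 0, dst.getWidth(), dst.getHeight());
            g.drawImage(src, 0, 0, null); // transparency is blended away here...
        } finally {
            g.dispose();
        }
        return dst; // ...but the result uses a direct RGB color model, not a palette
    }

    public static void main(String[] args) {
        // Two-entry palette: index 0 is fully transparent, index 1 is opaque red.
        byte[] r = {0, (byte) 255};
        byte[] g = {0, 0};
        byte[] b = {0, 0};
        IndexColorModel palette = new IndexColorModel(8, 2, r, g, b, 0);
        BufferedImage indexed = new BufferedImage(2, 1, BufferedImage.TYPE_BYTE_INDEXED, palette);
        indexed.getRaster().setSample(1, 0, 0, 1); // pixel (1,0) -> red; (0,0) stays index 0

        BufferedImage flat = flattenToRgb(indexed, Color.WHITE);
        System.out.println("before: indexed=" + (indexed.getColorModel() instanceof IndexColorModel)
                + ", alpha=" + indexed.getColorModel().hasAlpha());
        System.out.println("after:  indexed=" + (flat.getColorModel() instanceof IndexColorModel)
                + ", alpha=" + flat.getColorModel().hasAlpha());
    }
}
```

The pixels look the same afterwards, but the output has three 8-bit channels per pixel instead of one palette index, which is why the resulting stream is larger.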

So, how can you keep the image indexed in the output PDF? You can use either of the following options:

  1. Use PdfCompliance.PDF_15 if possible
  2. Use PDF/A, but make sure the images have no transparency. If you remove the transparent layer from the image beforehand, the image won't be changed while saving and will be stored as is, with its index (see screenshot04.png).
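Option 2 can be done before the document is built. The following is a minimal sketch of one way to remove transparency from a palette image such as Logo.gif while keeping it indexed; it is a hypothetical preprocessing step, not an Aspose.Words API. It assumes the image is read with an `IndexColorModel` (as GIF and palette PNG files usually are) and that a white background is acceptable; each non-opaque palette entry is alpha-blended onto that background, and the original pixel indices are reused unchanged.

```java
import java.awt.image.BufferedImage;
import java.awt.image.IndexColorModel;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class StripPaletteAlpha {

    /**
     * Returns an indexed image with the same pixel indices as src but an opaque palette.
     * Each non-opaque palette entry is alpha-blended over backgroundRgb (0xRRGGBB).
     */
    public static BufferedImage stripAlpha(BufferedImage src, int backgroundRgb) {
        if (!(src.getColorModel() instanceof IndexColorModel)) {
            throw new IllegalArgumentException("expected a palette (indexed) image");
        }
        IndexColorModel icm = (IndexColorModel) src.getColorModel();
        int size = icm.getMapSize();
        byte[] r = new byte[size];
        byte[] g = new byte[size];
        byte[] b = new byte[size];
        for (int i = 0; i < size; i++) {
            int argb = icm.getRGB(i);
            int a = argb >>> 24;
            r[i] = (byte) blend((argb >> 16) & 0xFF, (backgroundRgb >> 16) & 0xFF, a);
            g[i] = (byte) blend((argb >> 8) & 0xFF, (backgroundRgb >> 8) & 0xFF, a);
            b[i] = (byte) blend(argb & 0xFF, backgroundRgb & 0xFF, a);
        }
        // Same bit depth and map size, but no alpha array: the new palette is opaque.
        IndexColorModel opaque = new IndexColorModel(icm.getPixelSize(), size, r, g, b);
        // Reuse the original raster so every pixel keeps its palette index.
        return new BufferedImage(opaque, src.getRaster(), false, null);
    }

    // Source-over compositing of one 8-bit channel onto an opaque background, rounded.
    private static int blend(int fg, int bg, int alpha) {
        return (fg * alpha + bg * (255 - alpha) + 127) / 255;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java StripPaletteAlpha in.gif out.gif");
            return;
        }
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) {
            System.out.println("unsupported image format: " + args[0]);
            return;
        }
        BufferedImage out = stripAlpha(src, 0xFFFFFF); // assumed white page background
        if (!ImageIO.write(out, "gif", new File(args[1]))) {
            System.out.println("no GIF writer accepted the image");
        }
    }
}
```

The key design choice is to replace only the palette and keep the raster: redrawing the image onto a new canvas would also remove the alpha, but it would turn the image into plain RGB, which is exactly the loss of the index discussed in this thread. If the page background is not white, pass its color instead.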

We have attached all the resulting PDF documents and the Logo.gif image without transparency; see result.zip.

We have recently started to support the PDF/A standard more strictly: PDF/A prohibits transparency → the image contains transparency → Aspose.Words converts the image to non-indexed. So this is expected behavior, and there is nothing we can fix in Aspose.Words.

Attachment: screenshots and results.zip (429.2 KB)


Thanks for your explanation, clear and complete.

The issues you found earlier (filed as WORDSJAVA-1926) have been fixed in the Aspose.Words for .NET 18.12 update and the Aspose.Words for Java 18.12 update.

Hello.

Since WORDSJAVA-1926 was marked as ‘not a bug’, what changed in Aspose.Words 18.12 regarding this?

Thank you.

@renato.mauro,

It looks like the bug-fix notification was posted by mistake.

WORDSJAVA-1883 was closed as Not a Bug. Please refer to the details in my previous post.

WORDSJAVA-1926 was also closed as Not a Bug. Please refer to the following post.
https://forum.aspose.com/t/word-to-pdf-a-does-not-generate-indexed-images/182230/10?u=awais.hafeez