Convert a document to PDF/A-compliant PDF with text compression using C#

We are trying to convert MS Word documents to PDF/A. Our output is failing validation tests, and we are being told the text is not getting compressed.
Is there a way to compress text? We tried the “flate” option when saving and it has no effect.
Is there a reason the PDF/A is failing validation? Is it something we are doing?
Attached is a PDF and the code we are using.
John

Hi John,

Please accept my apologies for the late response.

Thanks for your query. Your approach to compressing text in the PDF is correct. Aspose.Words mimics the behavior of MS Word. Please convert the document to PDF/A using MS Word; you will get the same results.

Hope this answers your query. Please let us know if you have any more queries.

That is not the case. If I save using MS Word, the file is smaller. Attached are two files: the one with wordpdf1b in the name was created by saving the file in MS Word. The other was created with Aspose (see my previous post for the code we used).

Hi John,

Please accept my apologies for the late response.

A PDF/A file might be marginally larger than the original PDF file it was created from (provided they don’t use different image resolutions or compression methods). Fonts are embedded in a PDF/A file (which is often also the case in “normal” PDF files) and more information is stored in the metadata. Some color profiles could, in certain cases, lead to a much larger file size, but this is rare and is highly dependent on the particular case.

However, I have asked our development team for the details and will update you about PdfA1b text compression as soon as possible.

We apologize for the inconvenience.

Thanks.

I understand PDF/A is larger than regular PDF. I am referring to your earlier post, where you state the Aspose PDF/A should be the same as the MS Word PDF/A. In my tests the MS Word PDF/A is much smaller. Also, the MS Word PDF/A passes validation tests using Acrobat.
We need to output PDF/A-1b.
Thanks for all of your help.

Hi John,

My apologies, you are right about the size of the MS Word PDF/A-1 file.

I have received a response from our development team and would like to share that the PDF/A-1 specification does not allow LZW text compression, so it is not available in Aspose.Words for .NET.

Flate compression is allowed by the PDF/A-1 specification, but currently we do not apply text compression when saving to PDF/A-1.

We apologize for your inconvenience.

Hi John,

I have received a response from our development team about text compression with PDF/A. I would like to share that in the next release of Aspose.Words, the TextCompression option will work correctly for PDF/A.

Hi,

Is this issue with TextCompression and PDF generation fixed? If I enable TextCompression it is still an uncompressed PDF file.

Regards,

Peter

Hi Peter,

Thanks for your query. It would be great if you could share your document for investigation purposes. I have used the following code snippet and have not found any issue.

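// Requires: using Aspose.Words; using Aspose.Words.Saving;
// MyDir is the folder that holds the input and output files.
Document doc = new Document(MyDir + "in.doc");
PdfSaveOptions option = new PdfSaveOptions();
option.SaveFormat = SaveFormat.Pdf;
option.TextCompression = PdfTextCompression.Flate; // compress text with Flate
option.JpegQuality = 60;
option.Compliance = PdfCompliance.PdfA1b;          // save as PDF/A-1b
doc.Save(MyDir + "AsposeOut.pdf", option);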

Hi Tahir,

That’s the code I use indeed. It’s not an issue for a particular document. Take any document you like.

But this doesn’t answer my question:

Is this issue with TextCompression and PDF generation fixed? If I enable TextCompression it is still an uncompressed PDF file.

Peter

Hi Peter,
Please accept my apologies for the late response.
Yes, the TextCompression issue has been fixed in the latest version of Aspose.Words for .NET. Please find the input and output documents attached.
Uncompressed PDF file:

Document doc = new Document(MyDir + "in.docx");
PdfSaveOptions option = new PdfSaveOptions();
option.SaveFormat = SaveFormat.Pdf;
option.TextCompression = PdfTextCompression.None;  // leave text uncompressed
option.Compliance = PdfCompliance.PdfA1b;          // save as PDF/A-1b
doc.Save(MyDir + "AsposeOut-None.pdf", option);

Compressed PDF file:

Document doc = new Document(MyDir + "in.docx");
PdfSaveOptions option = new PdfSaveOptions();
option.SaveFormat = SaveFormat.Pdf;
option.TextCompression = PdfTextCompression.Flate; // compress text with Flate
option.JpegQuality = 60;
option.Compliance = PdfCompliance.PdfA1b;          // save as PDF/A-1b
doc.Save(MyDir + "AsposeOut-Flate.pdf", option);
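
One rough way to compare the two outputs above is to look at their file sizes and count how often the /FlateDecode filter name appears in each file. The following is only a sketch under stated assumptions: PdfCompressionCheck, CountFlateFilters and Report are hypothetical helper names, not part of Aspose.Words, and the code uses only the .NET base class library. Images and embedded fonts may also be stored with Flate compression, so a non-zero count on its own does not prove that text is compressed; it is the difference between the two files that is informative.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not an Aspose.Words API): a rough check for Flate-compressed streams.
public static class PdfCompressionCheck
{
    // Counts how many times the /FlateDecode filter name appears in the raw PDF bytes.
    public static int CountFlateFilters(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary stream data
        // cannot corrupt the text search.
        string raw = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);
        const string marker = "/FlateDecode";
        int count = 0;
        int index = 0;
        while ((index = raw.IndexOf(marker, index, StringComparison.Ordinal)) >= 0)
        {
            count++;
            index += marker.Length;
        }
        return count;
    }

    // Prints the file size and the number of /FlateDecode occurrences for one PDF.
    public static void Report(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        Console.WriteLine("{0}: {1} bytes, {2} /FlateDecode occurrences",
            path, bytes.Length, CountFlateFilters(bytes));
    }
}
```

Calling PdfCompressionCheck.Report(MyDir + "AsposeOut-None.pdf") and PdfCompressionCheck.Report(MyDir + "AsposeOut-Flate.pdf") should then show whether the Flate output is smaller and contains more compressed streams than the uncompressed one.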

Hi Tahir,

Yes, I tested it and it indeed works.

The problem that I have seems to be related to a font used in the source file. It is a licensed font that is not available on this machine. Windows chooses the font that matches best, which is MS PGothic, but this appears to be a font that supports a Japanese character set (16-bit). The output PDF cannot be processed by another application, although I can open it in Acrobat Reader. If I convert the same document with another PDF converter (Bullzip), the other application has no problems processing the resulting PDF.

Is there a known issue with converting Japanese character fonts?

Cheers,

Peter

Hi Peter,

Thanks for sharing the information. It would be great if you could share your document for investigation purposes.

This is a solved issue. We are all set. Thanks for the help.

Hi Peter,

Thanks for your feedback. Please let us know if you have any more queries.