PDF is not Editable | DOCX to PDF Conversion using .NET

Hi,

When converting DOCX to PDF, I noticed that some pages in the PDF cannot be edited in Adobe Acrobat Pro DC. This also causes an error when trying to add headers and footers to the PDF.

Code:

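using Aspose.Words;
using Aspose.Words.Saving;
using Aspose.Words.Shaping.HarfBuzz;

// DocumentHelper is not an Aspose.Words class; it is assumed to open the file and return a Document.
var doc = DocumentHelper.OpenReadOnly(@"test.docx");
doc.LayoutOptions.TextShaperFactory = HarfBuzzTextShaperFactory.Instance;

var options = new PdfSaveOptions();
options.ExportDocumentStructure = true;
doc.Save(@"\out.pdf", options);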

When using the ‘Edit PDF’ feature in Acrobat Pro, I found that page 1 can be edited properly, but none of the content on page 2 is editable. Then, when I tried to add headers to the document, an error appeared that prevented the operation entirely:

image.png (48.2 KB)

This error usually indicates corrupted or incompatible elements in the PDF.

Our clients have complained about this: they need to add headers to the document, but the PDF cannot be manipulated properly even with Adobe's own Acrobat software.

The issue doesn’t occur when saving with Aspose's PdfA1b compliance. It also doesn’t occur when I save the DOCX to PDF manually from Word, even as PDF 1.5 or PDF 1.7.

I’ve attached an example document for your reference.

tests.zip (393.2 KB)

It would also be very helpful to know whether there is any workaround for this issue other than changing the compliance level, which we cannot do.

Could you please look into this?

Thanks,

@ServerSide527

We tested the scenario with the latest version, Aspose.Words for .NET 21.3, and could not reproduce the issue. Please use Aspose.Words for .NET 21.3. The output PDF is attached for your reference: 21.3.pdf (140.1 KB)

Hi @tahir.manzoor

I tried 21.3 and the issue still exists. The same issue also occurs with the PDF you attached. Please see the recording below:

record.gif (683.7 KB)

I’m using the latest Acrobat Pro DC:

image.png (3.6 KB)

Could you please take another look?

Thanks,

@ServerSide527

We have logged this problem in our issue tracking system as WORDSNET-21993. You will be notified via this forum thread once this issue is resolved.

We apologize for your inconvenience.

The issues you have found earlier (filed as WORDSNET-21993) have been fixed in this Aspose.Words for .NET 21.6 update and this Aspose.Words for Java 21.6 update.

Hi,

Thanks for the updates.

I have tried the new version 21.6.0 and found that the issue is fixed for some of my files but still persists in others.

It looks like the issue is triggered when there is an Office chart on the page.

examples.zip (923.6 KB)

Could you look into this further?

Thanks,

@ServerSide527

Please use the MetafileRenderingOptions.EmfPlusDualRenderingMode property, as shown below, to get the desired output.

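using Aspose.Words;
using Aspose.Words.Saving;

// MyDir is the folder that contains the input document.
Document doc = new Document(MyDir + "Issues.docx");
PdfSaveOptions so = new PdfSaveOptions();
so.MetafileRenderingOptions.EmfPlusDualRenderingMode = EmfPlusDualRenderingMode.Emf; // render only the EMF part of EMF+ Dual metafiles
doc.Save(MyDir + "21.6.pdf", so);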

Hi @tahir.manzoor

Thanks for your suggestions.

I have tested the option and found that, although the pages with charts are now editable, the quality of other EMF images in the PDF is significantly reduced compared to the original EmfPlusDual option.

Here is an example of an EMF rendering that shows a text spacing issue with the ‘EMF’ option. Please also note that this page was already editable with the previous EmfPlus rendering.
Emf.zip (103.6 KB)
image.png (38.5 KB)

Therefore, the flag looks like a workaround rather than an actual fix, and because it degrades EMFs that were already rendering correctly, we cannot use this approach.

I also noticed that Aspose's description of this option mentions that Office uses EmfPlus. Since saving as PDF from Word does not create corrupted elements or cause page editing errors, and Aspose is meant to mimic Word, is it possible to find an actual fix for this issue without compromising the quality of the whole PDF?

Thanks,

@ServerSide527

We have logged the PDF editable issue as WORDSNET-22375 and WORDSNET-22376 for your documents. You will be notified via this forum thread once these issues are resolved.

We apologize for your inconvenience.

@ServerSide527

Could you please use the PdfSaveOptions.OptimizeOutput property, as shown below, to fix this issue? Please let us know whether this solution is acceptable for you.

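using Aspose.Words;
using Aspose.Words.Saving;

// MyDir is the folder that contains the input document.
Document document = new Document(MyDir + "Issues.docx");
PdfSaveOptions options = new PdfSaveOptions();
options.OptimizeOutput = true; // simplify vector graphics, including metafile graphics, on export
document.Save(MyDir + "output.pdf", options);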

Hi @tahir.manzoor

Thanks very much for the suggestion. I have tried the flag and indeed it fixed the issue without changing the quality of EMFs.

However, I would like to understand more about this flag. Why is it turned off by default, and are there any known side effects of turning it on that we should be aware of?

Also, when the PDF was not editable (without the flag), Acrobat's warning message suggested that ‘corrupted elements were generated in the PDF’. It doesn’t seem right that generating a PDF with Aspose's default options creates ‘corrupted elements’. Is there a plan to make this work with the default options, since a PDF generated with any options shouldn’t contain anything corrupted?

Thanks,

@ServerSide527

We have logged your concerns in our issue tracking system for issue WORDSNET-22375. As soon as there is any update available on it, we will inform you via this forum thread.

@ServerSide527

The OptimizeOutput option performs some vector graphics optimizations, including simplification of metafile graphics (which is what matters in your case). Without OptimizeOutput, Aspose.Words exports vector graphics exactly as they are defined in the metafile, with all intermediate transformations. With OptimizeOutput, Aspose.Words simplifies the graphics by applying the intermediate transformations directly to the graphics where possible, so the metafile vector graphics in the PDF end up nearly in page coordinates.

In your case, the metafile defines intermediate transformations that make the graphics coordinates larger than Acrobat can handle. So, if you regularly work with metafiles that have large coordinates and also want to edit the PDF output, it is safer to use the OptimizeOutput option.
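To illustrate what "applying the intermediate transformations directly to the graphics" means in general terms, here is a minimal, self-contained sketch. It is not Aspose.Words code: the Affine type, the three nested transforms, and the coordinate values are all hypothetical. Without flattening, each transform stays a separate matrix and the path keeps its large metafile coordinates; with flattening, the matrices are multiplied into one and applied to the points, so the written values are close to page coordinates.

```csharp
using System;

// Hypothetical illustration, not Aspose.Words code: a 2D affine matrix in the
// row-vector convention of the PDF "cm" operator, where a point maps as
// x' = A*x + C*y + E and y' = B*x + D*y + F.
public readonly struct Affine
{
    public readonly double A, B, C, D, E, F;

    public Affine(double a, double b, double c, double d, double e, double f)
    {
        A = a; B = b; C = c; D = d; E = e; F = f;
    }

    public static readonly Affine Identity = new Affine(1, 0, 0, 1, 0, 0);

    public static Affine Scale(double sx, double sy) => new Affine(sx, 0, 0, sy, 0, 0);

    public static Affine Translate(double tx, double ty) => new Affine(1, 0, 0, 1, tx, ty);

    // Returns one matrix equivalent to applying this transform first, then `next`.
    public Affine Then(Affine next) => new Affine(
        A * next.A + B * next.C,
        A * next.B + B * next.D,
        C * next.A + D * next.C,
        C * next.B + D * next.D,
        E * next.A + F * next.C + next.E,
        E * next.B + F * next.D + next.F);

    public (double X, double Y) Apply(double x, double y) =>
        (A * x + C * y + E, B * x + D * y + F);
}

public static class Program
{
    public static void Main()
    {
        // Assumed values: one chart point in large metafile logical units and
        // three nested transformations that map it onto the page.
        var point = (X: 200000.0, Y: 100000.0);
        Affine[] nested =
        {
            Affine.Scale(0.5, 0.5),       // hypothetical metafile world transform
            Affine.Scale(0.002, 0.002),   // hypothetical logical-units-to-points mapping
            Affine.Translate(72, 400),    // hypothetical placement on the page
        };

        // Unflattened: every matrix stays separate, so the path itself is
        // still written with the large logical coordinates.
        Console.WriteLine($"Path coordinates without flattening: ({point.X}, {point.Y})");

        // Flattened: multiply the matrices into one and apply it to the point,
        // so the written coordinates are already (nearly) page coordinates.
        Affine combined = Affine.Identity;
        foreach (Affine t in nested)
        {
            combined = combined.Then(t);
        }
        var (x, y) = combined.Apply(point.X, point.Y);
        Console.WriteLine($"Path coordinates with flattening: ({x:F2}, {y:F2})");

        // Sanity check: applying the transforms one by one gives the same point.
        double sx = point.X, sy = point.Y;
        foreach (Affine t in nested)
        {
            (sx, sy) = t.Apply(sx, sy);
        }
        Console.WriteLine($"Step-by-step result: ({sx:F2}, {sy:F2})");
    }
}
```

With these assumed values, the flattened point lands at approximately (272, 500), while the unflattened path would still be written as (200000, 100000) under a stack of matrices. The order passed to Then matters, since matrix multiplication is not commutative.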

Hi @tahir.manzoor

Thanks very much for the explanation. That makes sense.

However, I noticed that the flag also worked when I tried it with the older version of Aspose. According to the earlier comment in this thread, a fix was made that makes some of the documents editable without the flag.

Could you share more information on what that fix does? And since it already worked for similar EMFs in my initial post, is it possible to apply the same fix to the new examples I provided here?

Thanks again for your help!

@ServerSide527

If you regularly work with metafiles that have large coordinates and also want to edit the PDF output, it is safer to use the OptimizeOutput option.

If you do not want to use this property, please let us know and we will fix the reported issues.

Hi @tahir.manzoor

Thanks for the suggestions.

We are now discussing this solution with our clients, since they also have concerns about unwanted behaviour from this additional flag.

Since the previous fix in 21.6.0 already addressed some of the examples, we hope the other examples can be fixed as well. That way, if some clients don’t want to use this property, or later report issues with the flag turned on, we can turn the flag off and still keep PDF editing working for them.

Thanks

@ServerSide527

We have logged your concerns in our issue tracking system. We will inform you once there is any update available on WORDSNET-22375 and WORDSNET-22376.

The issues you have found earlier (filed as WORDSNET-22375, WORDSNET-22376) have been fixed in this Aspose.Words for .NET 21.8 update and this Aspose.Words for Java 21.8 update.