Saving to Pdf contains nodes I removed, but saving to Doc doesn't

I have a process that creates a Document along with a DocumentBuilder and generates a document. The process inserts data into some Word fields and removes some content (bookmarks/nodes) after processing. Finally, it calls document.Save( stream, saveFormat ). If I save as Word, all my ‘processing’ works (i.e. content I’ve removed is still gone and all looks good), but if I instead save as Pdf (which is what I need), it’s as though the removed content is put back into the document.

I’ve attached a sample project that reproduces our problem. I’ve simplified the attached code as much as I could (and hard-coded certain sections). In general, this is what the ‘real code’ does:

  1. Open the document; a ‘data source’ holds the different data elements that’ll be inserted into the document

  2. Loop all sections

  3. Loop all section.HeadersFooters.Range and insert ‘data’

  4. Loop all section.Body.Range and insert ‘data’

  5. Delete all bookmarks (and their content) flagged to delete

  6. Save the document

To get around the problem, Aspose support (Awais Hafeez) suggested that I call document.UpdatePageLayout() before saving to Pdf, but they said that should only be needed if I had already ‘saved’ the document once, which I hadn’t.

In creating the sample, I’ve determined that the ‘insert data’ part of my workflow seems to be causing the problem. You’ll see it in the code, but essentially, to ‘swap’ the passed-in data into a Word field, I do this to the field:

f.Result = "Some value";
f.IsLocked = true;

I’m not sure why IsLocked needs to be set to true, but if I don’t set it, the bookmark processing works in the Pdf save (without the update layout call), but the data substitution doesn’t. If I leave IsLocked = true, data substitution works in both Doc and Pdf, but the bookmark removal only works in Doc, not in Pdf.
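
For reference, here is a minimal sketch of the substitute-then-remove flow described above, combined with the suggested UpdatePageLayout() workaround. The file names, the filter on DOCVARIABLE fields, and the assumption that the address2 bookmark sits inside a single paragraph are illustrative assumptions, not taken from the attached project.

```csharp
using Aspose.Words;
using Aspose.Words.Fields;

class FieldSubstitutionSketch
{
    static void Main()
    {
        // Hypothetical input path; the real template is in the attached project.
        Document doc = new Document("in.docx");

        // Substitute data into fields and lock them so the value is not
        // recalculated on export (assumes the fields are DOCVARIABLE fields).
        foreach (Field f in doc.Range.Fields)
        {
            if (f.Type == FieldType.FieldDocVariable)
            {
                f.Result = "Some value";
                f.IsLocked = true;
            }
        }

        // Remove the paragraph that holds the flagged bookmark
        // (assumes the whole bookmark lives in that one paragraph).
        Bookmark bm = doc.Range.Bookmarks["address2"];
        if (bm != null)
        {
            bm.BookmarkStart.ParentNode.Remove();
        }

        // Workaround suggested by support: rebuild the page layout before the PDF save.
        doc.UpdatePageLayout();
        doc.Save("out.pdf", SaveFormat.Pdf); // hypothetical output path
    }
}
```

Commenting out the UpdatePageLayout() call in a sketch like this should be enough to compare the Pdf output with and without the workaround.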

In the sample, address2 is the only bookmark I work on removing. So that you can find it easily in your document explorer, it is the paragraph found at doc.Sections[0].Body.ChildNodes[9]. If you open the document in Word, it is found right at the top as part of the ‘return address’.

So two questions:

a) Is my code that substitutes data correct? (Do I need field.IsLocked = true; and the range.UpdateFields(); call?)

b) Is it expected that setting field.IsLocked somehow triggers Pdf export to require a call to UpdatePageLayout()?

Thanks in advance.

Hi Terry,

Thanks for your inquiry. We are working over your query and will get back to you soon.

Best regards,

Any luck or did we stumble on to a bug and you are working on a fix?

Hi Terry,

Thanks for being patient. I believe that calling the Document.UpdatePageLayout method before saving to PDF fixes the issue. If you don’t want to use this method, an alternative is to save the Document instance to memory (in DOCX format), load it into another Document instance, and save the new Document to PDF.

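// Round-trip through an in-memory DOCX so the PDF is rendered from a freshly loaded document.
using (MemoryStream stream = new MemoryStream())
{
    doc.Save(stream, SaveFormat.Docx);
    stream.Position = 0; // rewind before reading the stream back
    Document newDoc = new Document(stream);
    newDoc.Save(@"D:\temp\15.6.0.pdf", SaveFormat.Pdf);
}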

Regarding the Field.IsLocked property, please note that Aspose.Words updates some fields during conversion to PDF, in line with MS Word behaviour. Setting IsLocked to true ensures that Aspose.Words will not update those fields when saving to PDF.

PS: To manipulate a docvariable field, you can use the newly introduced FieldDocVariable class.

Best regards,

I was just told by support that UpdatePageLayout was an expensive operation and it shouldn’t be needed unless I called ‘save’ already on the document then manipulated it. Since I am not manipulating after calling Save, I was hoping I wouldn’t have to call that or first save, reopen, save the document simply to get it to Pdf. Is there something else that is triggering Aspose.Words to think it needs an UpdatePageLayout?

Hi Terry,

Thanks for your inquiry. Yes, some Aspose.Words operations can internally trigger the UpdatePageLayout method. For example, if your document has PAGE or NUMPAGES fields and you call UpdateFields, the UpdateFields method will internally call UpdatePageLayout. Hence, if you modify the document after the UpdateFields call, you’ll have to call UpdatePageLayout before saving to PDF. The following example illustrates this special case:

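// MyDir is the folder that holds the attached in.docx.
Document doc = new Document(MyDir + @"in.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
builder.Writeln("before updatefields call");
// With PAGE/NUMPAGES fields present, this call builds the page layout internally.
doc.UpdateFields();
builder.Writeln("after updatefields call");
// Uncomment to rebuild the layout so the PDF picks up the text written after UpdateFields.
// doc.UpdatePageLayout();
doc.Save(MyDir + @"15.6.0.docx");
doc.Save(MyDir + @"15.6.0.pdf");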

I hope this helps.

PS: A sample in.docx is attached. Try producing the output DOCX/PDF with and without the doc.UpdatePageLayout() call to see the difference.

Best regards,

Thanks for the info. I see why your sample needs it, as you call builder.Writeln(), but I was told that simply removing nodes wouldn’t trigger it. Was that an incorrect statement? Basically, all my ‘data injection’ into the document was done, and I called section.Body.Range.UpdateFields(). Then I ‘remove nodes’ if needed (basically removing flagged ‘bookmarks’ and all content between them), and then I call Save().

As for the new FieldDocVariable class, is there a sample somewhere showing how that would be of benefit to me over the standard Field manipulation?

Thanks in advance,
Terry

Hi Terry,

Thanks for your inquiry. In this case, nodes added or removed after the UpdateFields call won’t be reflected in the PDF output unless you call UpdatePageLayout before saving, as described above. Regarding the FieldDocVariable class, we have mapped all MS Word fields to specific classes, so it is better to use those classes. Please refer to the Aspose.Words.Fields namespace below:
https://reference.aspose.com/words/net/aspose.words.fields/
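
As a rough illustration of the typed class, here is a minimal sketch that assumes a blank document and a hypothetical variable name "CustomerName". The output path is also made up for the example. It is a sketch under those assumptions, not a recommended replacement for the code in the sample project.

```csharp
using Aspose.Words;
using Aspose.Words.Fields;

class DocVariableSketch
{
    static void Main()
    {
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);

        // The value lives in the document's variable collection, not in the field itself.
        doc.Variables.Add("CustomerName", "Some value");

        // Insert a DOCVARIABLE field and work with it through the typed class.
        FieldDocVariable field =
            (FieldDocVariable)builder.InsertField(FieldType.FieldDocVariable, true);
        field.VariableName = "CustomerName";
        field.Update(); // re-evaluates the field result from doc.Variables

        doc.Save("docvariable.docx"); // hypothetical output path
    }
}
```

With this approach the field result comes from doc.Variables when the field is updated, rather than from a string assigned to Field.Result. Whether that removes the need for IsLocked in the workflow above is not confirmed in this thread.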

Best regards,