Exporting to PDF removes styling on TOC

Hi,

We are using Aspose.Words 19.3.0 to convert Word formats to PDF through the following code:

        var doc = new Document(input);
        doc.AcceptAllRevisions();
        
        doc.UpdateFields();
        doc.UpdatePageLayout();

        var saveFormat = new PdfSaveOptions
        {
            OptimizeOutput = true,
            ImageCompression = PdfImageCompression.Jpeg,
            
            PreserveFormFields = true,

            UpdateFields = true,
            SaveFormat = SaveFormat.Pdf
        };
        
        doc.Save(output, saveFormat);

The generated PDF no longer has the styling of the original Word document.

I have attached both the input.docx and the ouput.pdf here: docs.zip (58.8 KB)

While experimenting with the code, we found that removing the doc.UpdateFields() line solves the issue. Unfortunately, we need that line to make sure that the page numbers in the TOC are always up to date.

Thank you!

@gwert,

In this case, the Document.UpdateFields method is what changes the styling of the TOC. However, Aspose.Words mimics the behavior of MS Word 2019 here. To verify the correctness of Aspose.Words’ behavior, please open ‘input.docx’ with MS Word on your end and place the cursor inside the TOC field. Right-click and choose Update Field, then select Update Entire Table and press OK. Finally, use Save As to export to PDF (see msw-2019.pdf (301.6 KB)). So, this seems to be expected behavior. If we can help you with anything else, please feel free to ask.

Hi,

Thank you for getting back so quickly on this one.

Indeed, triggering Update Field > Update Entire Table removes the styling. However, Update Field > Update page numbers only does not remove the styling.

Is there any way to trigger an update on the page numbers only?

Thank you!

@gwert,

Please try using the following code:

Document doc = new Document("E:\\docs\\input.docx");

foreach (Field field in doc.Range.Fields)
{
    if (field.Type == FieldType.FieldTOC)
    {
        FieldToc toc = (FieldToc)field;
        toc.UpdatePageNumbers();
        toc.IsLocked = true;
    }
}

doc.UpdateFields();

doc.Save("E:\\docs\\19.4.pdf"); 

Hope this helps.

Hi,

Thank you for that piece of code. It will just update the page numbers in TOC.

However, to get back to the original question, is there any way we could achieve the following?

For each TOC field:

  1. store its current styling - ?
  2. execute Update() on it (the one that loses styling)
  3. apply the styling back - ?
  4. lock it

and then UpdateFields() on the rest of fields in the document.

Can you help with some code here?

Thank you!

@gwert,

I think the code from my previous post (the UpdatePageNumbers approach) should be enough to achieve what you are looking for. If you insist on using Update(), the only draft workaround we can suggest is something like this:

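if (field.Type == FieldType.FieldTOC)
{
    var separatorParent = (Paragraph)field.Separator.ParentNode;
    var endParent = (Paragraph)field.End.ParentNode;

    // Remember the styling of each paragraph in the current field result.
    var stylings = new List<Styling>();
    for (var para = separatorParent; para != null && para != endParent; para = para.NextSibling as Paragraph)
        stylings.Add(GetStyling(para));

    // Rebuild the TOC; this is the step that loses the styling.
    field.Update();

    // The field result was regenerated, so look up its boundaries again.
    separatorParent = (Paragraph)field.Separator.ParentNode;
    endParent = (Paragraph)field.End.ParentNode;

    // Re-apply the remembered styling, entry by entry.
    var i = 0;
    for (var para = separatorParent; para != null && para != endParent && i < stylings.Count; para = para.NextSibling as Paragraph, i++)
        ApplyStyling(para, stylings[i]);

    // Lock the field so that a later doc.UpdateFields() leaves it alone.
    field.IsLocked = true;
}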

private static Styling GetStyling(Paragraph para)
{
    return ...;
}

private static void ApplyStyling(Paragraph para, Styling styling)
{
    ...
}

Hi,

Thank you for going along with the idea of preserving the styling.

I gave it a try using this code:

        foreach (Field field in doc.Range.Fields)
        {
            if (field.Type != FieldType.FieldTOC)
                continue;
            
            var separatorParent = (Paragraph)field.Separator.ParentNode;
            var endParent = (Paragraph)field.End.ParentNode;

            var styles = new List<Style>();
            for (var para = separatorParent; para != endParent; para = (Paragraph) para.NextSibling)
            {
                styles.Add(para.ParagraphFormat.Style);
            }

            field.Update();

            separatorParent = (Paragraph)field.Separator.ParentNode;
            endParent = (Paragraph)field.End.ParentNode;

            var i = 0;
            for (var para = separatorParent; para != endParent; para = (Paragraph) para.NextSibling)
            {
                para.ParagraphFormat.Style = styles[i++];                        
            }

            field.IsLocked = true;
        }

However, the styling is still lost. Is there a better approach for getting/setting the style there?

Also, I experimented a bit more with the paragraphs inside the field and discovered that what I need to preserve is para.Runs[x].Font, which contains the font color of the actual text inside the TOC fields.
But para.Runs[x].Font does not have a setter. Can you please guide me toward an approach here?

Thank you!

@gwert,

You can set new values through the individual properties of the Font object, such as Font.Bold and Font.Color. Another approach is to manually clone each TOC entry and insert it into the document. Please check the following code:

Document doc = new Document("E:\\Temp\\Docs\\input.docx");

// Store each Paragraph of TOC in list
ArrayList list = new ArrayList();
foreach (Field field in doc.Range.Fields)
{
    if (field.Type.Equals(Aspose.Words.Fields.FieldType.FieldPageRef))
    {
        FieldPageRef pageRef = (FieldPageRef)field;
        if (pageRef.BookmarkName != null && pageRef.BookmarkName.StartsWith("_Toc"))
        {
            Paragraph tocItemClone = (Paragraph)(field.Start.GetAncestor(NodeType.Paragraph)).Clone(true);
            list.Add(tocItemClone);
        }
    }
}

// Get the Paragraph containing the TOC field 
Paragraph refPara = null;
foreach (Field field in doc.Range.Fields)
{
    if (field.Type == FieldType.FieldTOC)
    {
        FieldToc toc = (FieldToc)field;
        refPara = (Paragraph)toc.Start.GetAncestor(NodeType.Paragraph);
        toc.Remove();
    }
}

doc.UpdateFields();

// Manually insert all TOC paragraphs
if (refPara != null)
{
    foreach(Paragraph para in list)
    {
        refPara.ParentNode.InsertAfter(para, refPara);
        refPara = para;
    }
}

doc.Save("E:\\Temp\\Docs\\19.4.docx");

Hope this helps.

Hi,

I think this approach might get us where we want. I really love the way you go with this workaround.

As I mentioned, we need to store the font settings and re-apply them, because cloning the TOC entries also clones their outdated text, which only an Update would refresh.
See the modified input, which has wrong text in the TOC (because it was not updated), and the output, where the text was not updated although the font styling is preserved (this is a great win, I think).
20180423-TOC-update.zip (61.5 KB)

Can we go with the approach of saving just the styling and re-applying it?

Thank you!

@gwert,

Please see these simple input/output Word documents (SimpleDocs.zip (19.7 KB)) and try running the following code:

Document doc = new Document("E:\\Temp\\input.docx");

NodeCollection runs = doc.GetChildNodes(NodeType.Run, true);
Run first = (Run)runs[0];
Run second = (Run)runs[1];

CopyFormatting(first.Font, second.Font);

doc.Save("E:\\Temp\\19.4.docx");

public static void CopyFormatting(Object source, Object dest)
{
    if (source.GetType() != dest.GetType())
        throw new ArgumentException("All objects must be of the same type");

    // Iterate through each property in the source object.
    foreach (PropertyInfo prop in source.GetType().GetProperties())
    {
        // Skip indexed access items. Skip setting the internals of a style as these should not be changed.
        if (prop.Name == "Item" || prop.Name == "Style")
            continue;

        object value;

        // Wrap this call as it can throw an exception. Skip if thrown
        try
        {
            value = prop.GetValue(source, null);
        }
        catch (Exception)
        {
            continue;
        }

        // Skip properties whose value is null.
        if (value != null)
        {
            // This property returns a class that belongs to the Aspose.Words assembly.
            if (value.GetType().IsClass && prop.GetGetMethod().ReturnType.Assembly.ManifestModule.Name == "Aspose.Words.dll")
            {
                // Recurse into this class.
                CopyFormatting(prop.GetValue(source, null), prop.GetValue(dest, null));
            }
            else if (prop.CanWrite)
            {
                prop.SetValue(dest, prop.GetValue(source, null), null);
            }
        }
    }
}

It takes the formatting of the source object (e.g. a Font object) and applies it to the destination object. Existing values on the destination object will be overwritten. I think you can build your logic on this code to achieve what you are looking for.
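
CopyFormatting needs a live source object, but after Update() the original TOC runs may no longer exist. One possible variation, shown here only as a rough sketch, is to record the property values in a dictionary before the update and write them back afterwards. The names FormatSnapshot, Capture and Apply are hypothetical, and DemoFont is a stand-in class so that the sample runs without Aspose.Words. The sketch assumes the styling that matters lives in simple writable properties (value types and strings), and it has not been verified against the real Font class.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Stand-in for a formatting object such as a run's Font (assumption, for demonstration only).
public class DemoFont
{
    public string Name { get; set; } = "Calibri";
    public double Size { get; set; } = 11;
    public bool Bold { get; set; }
    public ConsoleColor Color { get; set; } = ConsoleColor.Black;
}

// Hypothetical helper; not part of Aspose.Words.
public static class FormatSnapshot
{
    // Record every readable, writable, non-indexed property that holds a value type or a string.
    public static Dictionary<string, object> Capture(object source)
    {
        var values = new Dictionary<string, object>();
        foreach (PropertyInfo prop in source.GetType().GetProperties())
        {
            if (!prop.CanRead || !prop.CanWrite || prop.GetIndexParameters().Length > 0)
                continue;
            if (!prop.PropertyType.IsValueType && prop.PropertyType != typeof(string))
                continue;

            try
            {
                values[prop.Name] = prop.GetValue(source, null);
            }
            catch (Exception)
            {
                // Some getters can throw; skip those properties.
            }
        }
        return values;
    }

    // Write the recorded values onto another object that exposes the same properties.
    public static void Apply(Dictionary<string, object> values, object dest)
    {
        foreach (KeyValuePair<string, object> pair in values)
        {
            PropertyInfo prop = dest.GetType().GetProperty(pair.Key);
            if (prop == null || !prop.CanWrite)
                continue;

            try
            {
                prop.SetValue(dest, pair.Value, null);
            }
            catch (Exception)
            {
                // A setter may reject a value; leave that property unchanged.
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Formatting as it looks before the update.
        var before = new DemoFont { Bold = true, Color = ConsoleColor.Blue, Size = 14 };
        Dictionary<string, object> saved = FormatSnapshot.Capture(before);

        // Simulates an entry that was rebuilt with default formatting by the update.
        var after = new DemoFont();
        FormatSnapshot.Apply(saved, after);

        Console.WriteLine($"{after.Name} {after.Size} {after.Bold} {after.Color}");
    }
}
```

In the TOC scenario, the idea would be to call Capture on the Font of each run inside the field result before field.Update(), and Apply on the matching runs afterwards. How old and new runs are matched (for example by entry order) is left open here, since the number of runs per entry can change during the update.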