Merging Sections Appends Suffix to Paragraph Anchor Names when a Word DOC is Saved as HTML File using VB.NET

The code below saves a DOCX document as HTML, then merges the document's sections and saves the result as a second HTML file (tested with Aspose.Words 19.8.0.0):

    Dim doc As Words.Document = New Words.Document("D:\sample\sample.docx")
    doc.Save("D:\sample\sample_html\sample.html")
    ' For some reason the first two won't be merged
    For i As Integer = doc.Sections.Count - 2 To 0 Step -1
        Dim s As Words.Section = doc.Sections.Item(i)
        doc.LastSection.PrependContent(doc.Sections.Item(i))
        s.Remove()
    Next
    doc.Save("D:\sample\sample_html_merged\sample_merged.html")

After the sections are merged, the first bookmarks get a suffix (_Toc49465255 → _Toc49465255_0):

  • Before merging:
        <h2 style="[...]">
        	<span style="[...]">1.1</span>
        	<span style="[...]">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>
        	<a name="_Toc49465255">
        		<span style="[...]">Lorem ipsum dolor sit amet, consectetur adipiscing elit</span>
        	</a>
        </h2>
  • After merging:
        <h2 style="[...]">
        	<span style="[...]">1.1</span>
        	<span style="[...]">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>
        	<a name="_Toc49465255_0">
        		<span style="[...]">Lorem ipsum dolor sit amet, consectetur adipiscing elit</span>
        	</a>
        </h2>

Why does this occur?

You’ll find the sample attached below.

Best regards.

sample.zip (44.6 KB)

@monir.aittahar,

We have logged this problem in our issue tracking system. Your ticket number is WORDSNET-21010. We will further look into the details of this problem and will keep you updated on the status of the linked issue. Sorry for the inconvenience.

@monir.aittahar We have finished analyzing the issue and decided to close it as “Not a Bug”.

Aspose.Words renames the bookmarks because, at the moment a section’s content is copied, the section itself is still part of the document. The copied bookmarks therefore duplicate the originals, and Aspose.Words has to rename them. To work around this, remove the section from the document before copying its content:

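using Aspose.Words;

Document doc = new Document("sample.docx");
// Walk backwards from the second-to-last section to the first.
for (int i = doc.Sections.Count - 2; i >= 0; i--)
{
    Section s = doc.Sections[i];
    // Detach the section first so its bookmarks are no longer in the document...
    s.Remove();
    // ...then move its content to the start of the last section.
    doc.LastSection.PrependContent(s);
}
doc.Save("sample_merged.html");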
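
Since the original question uses VB.NET, the same workaround might look like the sketch below. It is a direct translation of the C# snippet above and has not been checked against a specific Aspose.Words release; the module name `MergeSections` is only illustrative, and the paths are the ones from the original post.

```vbnet
Imports Aspose.Words

Module MergeSections ' hypothetical module name, for illustration only
    Sub Main()
        ' Paths taken from the original post; adjust as needed.
        Dim doc As New Document("D:\sample\sample.docx")

        ' Walk backwards from the second-to-last section to the first.
        For i As Integer = doc.Sections.Count - 2 To 0 Step -1
            Dim s As Section = doc.Sections(i)
            ' Detach the section first so its bookmarks no longer exist in the document...
            s.Remove()
            ' ...then move its content to the start of the last section.
            doc.LastSection.PrependContent(s)
        Next

        doc.Save("D:\sample\sample_html_merged\sample_merged.html")
    End Sub
End Module
```

The only change from the code in the question is the order of the two calls inside the loop: `s.Remove()` now runs before `PrependContent(s)`.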

Hello @alexey.noskov,

Thank you for your complete answer.

Regards,
Monir

The issue you found earlier (filed as WORDSNET-21010) has been fixed in the Aspose.Words for .NET 24.3 update, which is also available on NuGet.
