How to detect tables, rows and cells in the fixed HTML structure

Hi team,

I’m currently working on converting a Word document to Fixed HTML format, and I have encountered table content within the document. However, during the conversion process to fixed HTML, the structure is represented with div elements instead of the actual table, tr, and td elements.

If it’s not possible to directly obtain the table structure in fixed HTML, I’m seeking an alternative approach to detect and convert the div structure into a table structure. I’m considering using a tool like Html Agility Pack for this purpose.

I would greatly appreciate any solutions or suggestions you may have regarding this matter. Thank you.

Document :
Sample.docx (17.0 KB)

@AlpeshChaudhariDev I am afraid this is expected behavior. Tables in HtmlFixed format are represented as a set of DIVs, and there is no way to represent them as HTML tables in this format. HtmlFixed is designed to preserve the original document layout for viewing purposes only; unfortunately, it does not preserve the original document structure.
If your goal is to detect which DIVs represent the table, you can wrap the table in bookmarks before saving the document to HtmlFixed and then use these bookmarks as markers in the HTML.

@alexey.noskov thanks for the alternative solution. How can I wrap each table, row and cell in bookmarks before saving the document to HtmlFixed, so that I can identify the table, row and cell by bookmark?

@AlpeshChaudhariDev You can use code like the following:

using Aspose.Words;
using Aspose.Words.Saving;
using Aspose.Words.Tables;

Document doc = new Document(@"C:\Temp\in.docx");

int tableIndex = 0;
foreach (Table t in doc.GetChildNodes(NodeType.Table, true))
{
    string tableStartBookmarkName = $"_table_start_{tableIndex}";
    string tableEndBookmarkName = $"_table_end_{tableIndex}";
    int rowIndex = 0;
    foreach (Row r in t.Rows)
    {
        string rowStartBookmarkName = $"_row_start_{tableIndex}_{rowIndex}";
        string rowEndBookmarkName = $"_row_end_{tableIndex}_{rowIndex}";
        int cellIndex = 0;
        foreach (Cell c in r.Cells)
        {
            string cellStartBookmarkName = $"_cell_start_{tableIndex}_{rowIndex}_{cellIndex}";
            string cellEndBookmarkName = $"_cell_end_{tableIndex}_{rowIndex}_{cellIndex}";
            cellIndex++;
            // Wrap cell into bookmarks.
            c.FirstParagraph.PrependChild(new BookmarkEnd(doc, cellStartBookmarkName));
            c.FirstParagraph.PrependChild(new BookmarkStart(doc, cellStartBookmarkName));
            c.LastParagraph.AppendChild(new BookmarkStart(doc, cellEndBookmarkName));
            c.LastParagraph.AppendChild(new BookmarkEnd(doc, cellEndBookmarkName));
        }
        rowIndex++;
        // Wrap row into bookmarks.
        r.ParentNode.InsertBefore(new BookmarkStart(doc, rowStartBookmarkName), r);
        r.ParentNode.InsertBefore(new BookmarkEnd(doc, rowStartBookmarkName), r);
        r.ParentNode.InsertAfter(new BookmarkEnd(doc, rowEndBookmarkName), r);
        r.ParentNode.InsertAfter(new BookmarkStart(doc, rowEndBookmarkName), r);
    }
    tableIndex++;
    // Wrap table into bookmarks.
    t.ParentNode.InsertBefore(new BookmarkStart(doc, tableStartBookmarkName), t);
    t.ParentNode.InsertBefore(new BookmarkEnd(doc, tableStartBookmarkName), t);
    t.ParentNode.InsertAfter(new BookmarkEnd(doc, tableEndBookmarkName), t);
    t.ParentNode.InsertAfter(new BookmarkStart(doc, tableEndBookmarkName), t);
}

doc.Save(@"C:\Temp\out.html", new HtmlFixedSaveOptions() { PrettyFormat = true });

But please note, the code is a workaround and does not guarantee a 100% result.
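For the HTML side, the naming scheme above can be read back with a small scanner. The following is only a minimal sketch: `TableMarker` and `TableMarkerScanner` are hypothetical helper names, not Aspose.Words API, and it assumes the bookmark names appear verbatim somewhere in the saved HTML text, which should be checked against the actual output file. It uses only the .NET standard library and makes no assumption about which element carries the name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper types; none of this is Aspose.Words API.
public sealed class TableMarker
{
    public string Kind;      // "table", "row" or "cell"
    public bool IsStart;     // true for *_start_*, false for *_end_*
    public int[] Indices;    // table index, then row index, then cell index
    public int Position;     // character offset of the name in the HTML text
}

public static class TableMarkerScanner
{
    // Matches names such as _table_start_0, _row_end_0_2, _cell_start_0_2_1.
    private static readonly Regex Pattern = new Regex(
        @"_(table|row|cell)_(start|end)((?:_\d+)+)");

    public static List<TableMarker> Scan(string html)
    {
        var result = new List<TableMarker>();
        foreach (Match m in Pattern.Matches(html))
        {
            string kind = m.Groups[1].Value;
            string[] parts = m.Groups[3].Value.Split(
                new[] { '_' }, StringSplitOptions.RemoveEmptyEntries);

            // A table name carries 1 index, a row name 2, a cell name 3.
            int expected = kind == "table" ? 1 : kind == "row" ? 2 : 3;
            if (parts.Length != expected) continue;

            result.Add(new TableMarker
            {
                Kind = kind,
                IsStart = m.Groups[2].Value == "start",
                Indices = Array.ConvertAll(parts, int.Parse),
                Position = m.Index
            });
        }
        return result;
    }
}
```

Calling `TableMarkerScanner.Scan(File.ReadAllText(@"C:\Temp\out.html"))` would then give the offsets of every marker found, which could be used to decide which DIVs lie between a start marker and its matching end marker. If a name occurs more than once in the output, it is reported once per occurrence.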

@alexey.noskov thanks for the solution.

Hi,
I am inserting a bookmark at the start and end of each Structured Document Tag (SDT) so that I can identify it. However, when a content control wraps an entire table, the div for the first row's cell is not placed at the top in the fixed HTML; it appears after some other divs. As a result, my bookmark is not set at the beginning of the SDT. How can I solve this issue?

Snippet :

NodeCollection runs = null;

IEnumerable<Node> sdts = doc.GetChildNodes(NodeType.StructuredDocumentTag, true).Where(x => ((StructuredDocumentTag)x).Tag != null && ((StructuredDocumentTag)x).Tag != "");

foreach (StructuredDocumentTag sdt in sdts)
{
    int stdPageIndex = LayoutCollector.GetStartPageIndex(sdt);

    runs = sdt.GetChildNodes(NodeType.Run, true);
    InsertSdtStartBookMark(ref doc, stdPageIndex, sdt, runs.FirstOrDefault());
    InsertSdtEndBookMark(ref doc, stdPageIndex, sdt, runs.LastOrDefault());

}
doc.UpdatePageLayout();

private void InsertSdtStartBookMark(ref Document doc, int pageIndex, StructuredDocumentTag sdt, Node run)
{
    try
    {
        string sdtStartBookmarkName = $"sdt_s_{pageIndex}_{sdt.Tag}";

        run.ParentNode.InsertBefore(new BookmarkStart(doc, sdtStartBookmarkName), run);
        run.ParentNode.InsertBefore(new BookmarkEnd(doc, sdtStartBookmarkName), run);
    }
    catch (Exception ex)
    {
        Debug.WriteLine(ex.Message);
    }
}
private void InsertSdtEndBookMark(ref Document doc, int pageIndex, StructuredDocumentTag sdt, Node run)
{
    try
    {
        string sdtEndBookmarkName = $"sdt_e_{pageIndex}_{sdt.Tag}";

        run.ParentNode.InsertAfter(new BookmarkEnd(doc, sdtEndBookmarkName), run);
        run.ParentNode.InsertAfter(new BookmarkStart(doc, sdtEndBookmarkName), run);
    }
    catch (Exception ex)
    {
        Debug.WriteLine(ex.Message);
    }
}

Word Sample :
ColspanTable.docx (57.5 KB)

Bookmark start name : sdt_s_1_7f8320c7-5448-4ec3-87a1-a3cfadbc89c3
Bookmark end name : sdt_e_1_7f8320c7-5448-4ec3-87a1-a3cfadbc89c3

Fixed HTML output :
Output.zip (59.8 KB)

Screenshot for HTML fixed structure:

@AlpeshChaudhariDev I am afraid there is no way to control this. When converting a document to HtmlFixed format, Aspose.Words builds the document layout. The result of building the layout is an entirely different document model, APS (Aspose Page Specification), which represents the visual appearance of the document but does not preserve the original document structure. So even if the bookmark wraps the SDT in the DOM, there is no guarantee that the bookmark will wrap the SDT's visual representation in the APS model and, as a result, in HtmlFixed format.