Dear Team,
I am currently converting a Word document to HTML. The Word document contains several content controls within a table, and my primary objective is to identify the Structured Document Tag (SDT) elements that correspond to these content controls in the HTML output.
In the HTML output, each content control's value is enclosed in a div element with the style "-aw-sdt-tag:'5459d4b1-8fb7-4b5c-b911-f8ce5ab9c139'", where '5459d4b1-8fb7-4b5c-b911-f8ce5ab9c139' is the ID of the content control.
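A minimal sketch of how such tags could be collected from the exported HTML text, assuming the property is written as -aw-sdt-tag:'<tag>' (quotes optional) inside an inline style attribute, as described above. ExtractSdtTags is a hypothetical helper, not part of Aspose.Words; it uses a plain regular expression and does not map each tag to its div content, so a real HTML parser would be the more robust choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every -aw-sdt-tag value found in the HTML text.
// Assumption: the value follows "-aw-sdt-tag:" and may be wrapped in single quotes;
// it ends at a quote, a semicolon or whitespace.
static List<string> ExtractSdtTags(string html)
{
    var tags = new List<string>();
    var pattern = new Regex(@"-aw-sdt-tag\s*:\s*'?([^';""\s]+)'?");
    foreach (Match m in pattern.Matches(html))
    {
        tags.Add(m.Groups[1].Value);
    }
    return tags;
}

// Usage with a hand-written fragment shaped like the output described above.
string sample = "<div style=\"-aw-sdt-tag:'5459d4b1-8fb7-4b5c-b911-f8ce5ab9c139'\"><p>Value</p></div>";
foreach (string tag in ExtractSdtTags(sample))
{
    Console.WriteLine(tag); // prints 5459d4b1-8fb7-4b5c-b911-f8ce5ab9c139
}
```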
I can correctly capture block-level SDT elements, but I cannot extract cell-level SDT elements from the HTML output. I would appreciate any guidance or suggested solutions for this issue.
Word source document:
Latest.docx (47.3 KB)
HTML output:
HtmlLatest.zip (6.8 KB)
@AlpeshChaudhariDev Unfortunately, Aspose.Words currently does not support exporting row-level and cell-level SDTs when saving a document to HTML. This feature request is logged as WORDSNET-18209. We will keep you informed and let you know once this feature is supported.
Thanks, @alexey.noskov, for your quick response.
Is there an alternative way to export a document containing cell-level SDTs to HTML?
@AlpeshChaudhariDev You can try marking the cell with a bookmark. For example, see the following code:
using Aspose.Words;
using Aspose.Words.Markup;
using Aspose.Words.Saving;
using Aspose.Words.Tables;
using System.Collections.Generic;
using System.Linq;
Document doc = new Document(@"C:\Temp\in.docx");
// Get all cell-level SDTs.
List<StructuredDocumentTag> cellSDTs = doc.GetChildNodes(NodeType.StructuredDocumentTag, true)
    .Cast<StructuredDocumentTag>().Where(sdt => sdt.Level == MarkupLevel.Cell).ToList();
// Wrap the content of each cell SDT in a bookmark named after the SDT's Id.
foreach (StructuredDocumentTag sdt in cellSDTs)
{
    // A cell-level SDT wraps a table cell, so its first child is the Cell.
    Cell cell = (Cell)sdt.FirstChild;
    string bkName = $"sdt_{sdt.Id}";
    cell.PrependChild(new BookmarkStart(doc, bkName));
    cell.AppendChild(new BookmarkEnd(doc, bkName));
}
doc.Save(@"C:\Temp\out.html", new HtmlSaveOptions() { PrettyFormat = true });
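With this workaround the cells have to be located in the HTML by bookmark name rather than by -aw-sdt-tag. A minimal sketch, assuming each bookmark is exported as an element whose name or id attribute holds the bookmark name (for example <a name="sdt_42"></a>); ExtractSdtIdsFromBookmarks is a hypothetical helper that recovers the numeric Id from names built with the sdt_{sdt.Id} scheme used above. The pattern accepts a leading minus sign because SDT ids stored in Word files can be negative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the SDT ids encoded in bookmark names "sdt_<Id>".
// Assumption: the bookmark name appears in a name="..." or id="..." attribute.
static List<int> ExtractSdtIdsFromBookmarks(string html)
{
    var ids = new List<int>();
    var pattern = new Regex(@"\b(?:name|id)\s*=\s*[""']sdt_(-?\d+)[""']");
    foreach (Match m in pattern.Matches(html))
    {
        ids.Add(int.Parse(m.Groups[1].Value));
    }
    return ids;
}

// Usage with a hand-written fragment; real output may differ in markup.
string sample = "<td><p><a name=\"sdt_42\"></a>Cell text</p></td>";
foreach (int id in ExtractSdtIdsFromBookmarks(sample))
{
    Console.WriteLine(id); // prints 42
}
```

Each recovered id could then be matched against the StructuredDocumentTag.Id values collected before saving.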
Thank you, @alexey.noskov, for the alternative solution, but my entire functionality depends on SDTs. Is there any way to convert these row-level and cell-level SDTs into block-level (regular) SDTs?
@AlpeshChaudhariDev For cell-level SDTs, you can achieve this using the following code:
using Aspose.Words;
using Aspose.Words.Markup;
using Aspose.Words.Saving;
using Aspose.Words.Tables;
using System.Collections.Generic;
using System.Linq;
Document doc = new Document(@"C:\Temp\in.docx");
// Get all cell-level SDTs.
List<StructuredDocumentTag> cellSDTs = doc.GetChildNodes(NodeType.StructuredDocumentTag, true)
    .Cast<StructuredDocumentTag>().Where(sdt => sdt.Level == MarkupLevel.Cell).ToList();
// Move the content of each cell SDT into a new block-level SDT inside the same cell.
foreach (StructuredDocumentTag sdt in cellSDTs)
{
    Cell cell = (Cell)sdt.FirstChild;
    StructuredDocumentTag blockSdt = new StructuredDocumentTag(doc, SdtType.RichText, MarkupLevel.Block);
    // AppendChild detaches the node from the cell, so the loop ends once the cell is empty.
    while (cell.HasChildNodes)
        blockSdt.AppendChild(cell.FirstChild);
    cell.AppendChild(blockSdt);
}
doc.Save(@"C:\Temp\out.html", new HtmlSaveOptions() { PrettyFormat = true });
Thanks, @alexey.noskov, for the alternative solution.
Using the provided solution, I obtained the HTML output attached below, but it includes unwanted placeholder text: 'Click or tap here to enter text.' How can I get rid of it?
TestTemp.zip (8.6 KB)
@AlpeshChaudhariDev A newly created SDT contains placeholder content, so you should remove its children before moving the cell content into it. Please modify your code like this:
StructuredDocumentTag blockSdt = new StructuredDocumentTag(doc, SdtType.RichText, MarkupLevel.Block);
// Remove the default placeholder content ("Click or tap here to enter text.").
blockSdt.RemoveAllChildren();