Aspose.Words HTML conversion: persist SDT metadata

Hello,

We are trying to convert a DOCX document to HTML using HtmlFixedSaveOptions. We have a requirement that a content control (rich text in our case, with a relevant title and tag) be converted to corresponding markup in the HTML. We need some way to keep the title and tag of the content control on the respective piece of text (paragraph, inline, cell, etc.) after the conversion, for example as custom HTML attributes on the HTML nodes that surround that text.

Something like this.

<span customAttributeTag="TagContent" customAttributeTitle="TitleContent" ...other attributes...>text content here</span>

@AMELKIS,
Could you please ZIP and attach your input and, if possible, expected output documents? We will then provide you more information according to your requirement.

testdata.zip (300.2 KB)

My save options are as follows:

HtmlFixedSaveOptions options = new HtmlFixedSaveOptions
{
    //     CssClassNamesPrefix = "AmelkisPrefix1_",
    ExportFormFields = false,
    FontFormat = ExportFontFormat.Ttf,
    ExportEmbeddedCss = true,
    ExportEmbeddedFonts = true,
    ExportEmbeddedImages = true,
    ExportEmbeddedSvg = true,
    UseHighQualityRendering = true,
    //AllowEmbeddingPostScriptFonts = true,
    ExportGeneratorName = false,
    UpdateSdtContent = true,
};

Basically, the two DOCX documents are converted to HTML and then merged with Html Agility Pack. In one of the documents there are 4 content controls (rich text). After the conversion, the text contained in the content controls should be identifiable via HTML attributes that carry the SDT's metadata (tag and title; we need both).

@AMELKIS,
Unfortunately, the custom styles needed to mark content control elements are currently not supported when converting DOCX to HTML using HtmlFixedSaveOptions. We have logged this problem in our issue tracking system as WORDSNET-23356. You will be notified via this forum thread once the issue is resolved.

We apologize for the inconvenience.

Is there any approach you can recommend, maybe using the normal HtmlSaveOptions, while keeping the appearance of the resulting HTML similar to a DOCX page?

@AMELKIS You can mark your structured document tags with bookmarks. For example, see the following code:

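using System;
using Aspose.Words;
using Aspose.Words.Markup;
using Aspose.Words.Saving;
using Aspose.Words.Tables;

Document doc = new Document(@"C:\Temp\in.docx");
DocumentBuilder builder = new DocumentBuilder(doc);

// Collect every content control (SDT) in the document, at any depth.
NodeCollection sdts = doc.GetChildNodes(NodeType.StructuredDocumentTag, true);
Console.WriteLine(sdts.Count);
foreach (StructuredDocumentTag sdt in sdts)
{
    // Names of the two empty marker bookmarks; each carries the SDT id and tag.
    string bookmarkStartName = string.Format("sdt_start_{0}_{1}", sdt.Id, sdt.Tag);
    string bookmarkEndName = string.Format("sdt_end_{0}_{1}", sdt.Id, sdt.Tag);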

    switch (sdt.Level)
    {
        case MarkupLevel.Block:
            builder.MoveTo(sdt.FirstChild);
            builder.MoveTo(builder.CurrentParagraph.FirstChild);
            builder.StartBookmark(bookmarkStartName);
            builder.EndBookmark(bookmarkStartName);
            builder.MoveTo(sdt.LastChild);
            builder.StartBookmark(bookmarkEndName);
            builder.EndBookmark(bookmarkEndName);
            break;
        case MarkupLevel.Cell:
            Cell cell = (Cell)sdt.FirstChild;
            builder.MoveTo(cell.FirstChild);
            builder.MoveTo(builder.CurrentParagraph.FirstChild);
            builder.StartBookmark(bookmarkStartName);
            builder.EndBookmark(bookmarkStartName);
            builder.MoveTo(cell.LastChild);
            builder.StartBookmark(bookmarkEndName);
            builder.EndBookmark(bookmarkEndName);
            break;
        case MarkupLevel.Row:
            Row row = (Row)sdt.FirstChild;
            builder.MoveTo(row.FirstCell.FirstChild);
            builder.MoveTo(builder.CurrentParagraph.FirstChild);
            builder.StartBookmark(bookmarkStartName);
            builder.EndBookmark(bookmarkStartName);
            builder.MoveTo(row.LastCell.LastChild);
            builder.StartBookmark(bookmarkEndName);
            builder.EndBookmark(bookmarkEndName);
            break;
        case MarkupLevel.Inline:
            builder.MoveTo(sdt.FirstChild);
            builder.StartBookmark(bookmarkStartName);
            builder.EndBookmark(bookmarkStartName);
            builder.MoveTo(sdt.LastChild);
            builder.StartBookmark(bookmarkEndName);
            builder.EndBookmark(bookmarkEndName);
            break;
        default:
            break;
    }
}

HtmlFixedSaveOptions opt = new HtmlFixedSaveOptions();

doc.Save(@"C:\Temp\out.html", opt);

In this case, the SDT content will be surrounded by bookmarks, which can be located in the HTML and replaced with the required tags:

<a name="sdt_start_828018482_tag2" style="left:37.74pt; top:2.35pt;">
</a>
SDT Content is here
<a name="sdt_end_828018482_tag2" style="left:59.15pt; top:2.35pt;">
</a>
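
For the "located and replaced" step, here is a minimal sketch of how the marker names could be turned back into attributes. It uses only the .NET base class library. The class SdtBookmarkNames, its methods, the data-sdt-* attribute names, and the titlesById map are all hypothetical names made up for this illustration; none of them are part of Aspose.Words or Html Agility Pack. The sketch assumes the names follow the sdt_start_{Id}_{Tag} / sdt_end_{Id}_{Tag} pattern from the code above, and that the title is supplied separately, because the bookmark name only carries the id and the tag.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class SdtBookmarkNames
{
    // Matches "sdt_start_{Id}_{Tag}" or "sdt_end_{Id}_{Tag}".
    // The id may be negative; the tag is everything after the id
    // and may itself contain underscores.
    private static readonly Regex Pattern =
        new Regex(@"^sdt_(start|end)_(-?\d+)_(.*)$", RegexOptions.Singleline);

    // Splits a marker name into its parts. Returns false for any other name.
    public static bool TryParse(string name, out bool isStart, out int id, out string tag)
    {
        isStart = false;
        id = 0;
        tag = string.Empty;

        Match m = Pattern.Match(name ?? string.Empty);
        if (!m.Success ||
            !int.TryParse(m.Groups[2].Value, NumberStyles.Integer,
                          CultureInfo.InvariantCulture, out id))
        {
            return false;
        }

        isStart = m.Groups[1].Value == "start";
        tag = m.Groups[3].Value;
        return true;
    }

    // Builds an HTML attribute string for a marker name, or returns null
    // if the name is not an SDT marker. titlesById is an assumed
    // caller-supplied map from SDT id to SDT title.
    public static string ToDataAttributes(string name, IDictionary<int, string> titlesById)
    {
        if (!TryParse(name, out _, out int id, out string tag))
            return null;

        string title;
        if (!titlesById.TryGetValue(id, out title))
            title = string.Empty;

        return string.Format(
            CultureInfo.InvariantCulture,
            "data-sdt-id=\"{0}\" data-sdt-tag=\"{1}\" data-sdt-title=\"{2}\"",
            id, WebUtility.HtmlEncode(tag), WebUtility.HtmlEncode(title));
    }
}
```

One way to use this: fill titlesById inside the foreach loop above (titlesById[sdt.Id] = sdt.Title) before saving, then, while walking the `<a name="...">` elements with Html Agility Pack, pass each name to ToDataAttributes and apply the result to whatever element you decide should wrap the SDT content. How to wrap the content between the start and end markers depends on the fixed-layout HTML structure and is not covered by this sketch.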

Dear @AMELKIS
Did you find any solution for this?

@Coder365 The issue reported in this thread has been postponed and is not currently scheduled for development. At the moment, you can use the workaround suggested above.