Read HTML Content from StructuredDocumentTag

Hi,

Is there any way to read the HTML content from StructuredDocumentTag elements in Aspose.Words?

Regards,
Srini

Hi Srini,

Thanks for your inquiry. The actions you can perform on StructuredDocumentTags with Aspose.Words are the same as those you can perform in Microsoft Word. Could you please attach your input Word document here for testing, i.e. the one containing the SDT controls whose HTML content you want to extract? A screenshot highlighting the HTML content you want to extract from the SDT would also help. I will investigate the structure of your document and provide a code snippet.

Best Regards,

Hi,

Thanks for your reply.

Please find the attached document which contains multiple content controls.

My requirement is to read the HTML content from each content control in the document and save it to a database.

Regards,
Srini

Hi Srini,

Thanks for the additional information. I believe you can achieve this by using the following code:

using System.Collections.Generic;
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Markup;

Document doc = new Document(@"C:\test\Sample+Document.docx");
List<string> htmlStrings = new List<string>();
foreach (StructuredDocumentTag srcSdt in doc.GetChildNodes(NodeType.StructuredDocumentTag, true))
{
    // Copy each SDT into its own temporary document.
    Document tempDoc = new Document();
    Node dstSdt = tempDoc.ImportNode(srcSdt, true, ImportFormatMode.KeepSourceFormatting);
    tempDoc.FirstSection.Body.AppendChild(dstSdt);
    // Save the temporary document as HTML and read the result back as a string.
    using (MemoryStream htmlStream = new MemoryStream())
    {
        tempDoc.Save(htmlStream, SaveFormat.Html);
        string html = Encoding.UTF8.GetString(htmlStream.ToArray());
        htmlStrings.Add(html);
    }
}
// Send the HTML representation of the SDT controls to the DB

I hope this helps.

Best Regards,

Thanks hafeez.

It solved my problem.

Hi,
That solution is working fine. I have another issue: there are list styles in the HTML content, and I want to remove them.
For example, I need to remove the "1." from the line below:

  1. Content1

Thanks.
Seenu115

Hi Seenu,

Thanks for your inquiry. Please try the following code snippet to remove the list numbers:

foreach(Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
    para.ListFormat.RemoveNumbers();
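
To show how this fits with the earlier HTML export, here is a minimal sketch that strips list numbering from each temporary copy before saving it as HTML, so the source document is left unchanged. The class and method names (SdtHtmlExporter, ExportWithoutListNumbers) are hypothetical and not part of Aspose.Words. The sketch assumes the Aspose.Words assembly is referenced and, like the earlier snippet, that each content control can be appended directly to the body of the temporary document.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Markup;

public static class SdtHtmlExporter
{
    // Hypothetical helper: returns one HTML string per content control,
    // with list numbering removed from the exported copy only.
    public static List<string> ExportWithoutListNumbers(Document doc)
    {
        var htmlStrings = new List<string>();
        foreach (StructuredDocumentTag srcSdt in doc.GetChildNodes(NodeType.StructuredDocumentTag, true))
        {
            // Import the SDT into a scratch document so the source stays untouched.
            Document tempDoc = new Document();
            Node dstSdt = tempDoc.ImportNode(srcSdt, true, ImportFormatMode.KeepSourceFormatting);
            tempDoc.FirstSection.Body.AppendChild(dstSdt);

            // Remove list numbers and bullets from every paragraph of the copy.
            foreach (Paragraph para in tempDoc.GetChildNodes(NodeType.Paragraph, true))
                para.ListFormat.RemoveNumbers();

            using (MemoryStream htmlStream = new MemoryStream())
            {
                tempDoc.Save(htmlStream, SaveFormat.Html);
                htmlStrings.Add(Encoding.UTF8.GetString(htmlStream.ToArray()));
            }
        }
        return htmlStrings;
    }
}
```

Calling SdtHtmlExporter.ExportWithoutListNumbers(doc) on the loaded document should then give one HTML string per content control, ready to be stored in the database.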

Best Regards,

Hi,

Thanks for the code. It’s working fine. I have another issue. I want to make the content in a StructuredDocumentTag editable in the document. We are currently using the code below. Is there any code in Aspose to do the same? With the code below, the text inside the StructuredDocumentTag gets highlighted. We would prefer that it not be, but highlighted text is acceptable. I have also attached a sample document that shows this.

using MSW = Microsoft.Office.Interop.Word;

MSW._Application newApp = null;
MSW.Document sDoc = null;
object Unknown = Type.Missing;
object Password = Type.Missing;
object noReset = true;
object objFalse = false;
object password = "Test";

object FilePath = @"D:\01.docx";

try
{
    newApp = new MSW.Application();


    sDoc = newApp.Documents.Open(ref FilePath, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown,
        ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown,
        ref Unknown, ref Unknown);
    sDoc.Protect(MSW.WdProtectionType.wdAllowOnlyFormFields, ref noReset, ref password, ref objFalse, ref objFalse);

    foreach (MSW.ContentControl ctrl in sDoc.ContentControls)
    {
        if (!ctrl.LockContents)
        {
            object editorEveryOne = MSW.WdEditorType.wdEditorEveryone;
            ctrl.Range.Editors.Add(ref editorEveryOne);
        }
    }
    sDoc.SaveAs(ref FilePath, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown);

    sDoc.Close(ref noReset, ref Unknown, ref Unknown);
}
catch (Exception ex)
{
    if (!ReferenceEquals(sDoc, null))
        sDoc.Close(ref noReset, ref Unknown, ref Unknown);
    throw; // rethrow without resetting the stack trace
}
finally
{
    if (newApp != null)
        newApp.Quit(ref noReset, ref Unknown, ref Unknown);
}

Thanks,
Seenu115

Hi Seenu,

Thanks for your inquiry. Normally, you can set the StructuredDocumentTag.LockContents property to false to allow a user to edit the contents of an SDT. However, this property doesn’t seem to work in your case: the SDT controls become non-editable no matter what value (true/false) is assigned to it.
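
For reference, the usual approach would look roughly like the sketch below. The class and method names (SdtUnlocker, UnlockContents) are hypothetical and not part of Aspose.Words, and the sketch assumes the Aspose.Words assembly is referenced. It only clears the LockContents flag and does not reproduce the Editors.Add call from the Interop code; as noted, it had no visible effect on the attached document.

```csharp
using Aspose.Words;
using Aspose.Words.Markup;

public static class SdtUnlocker
{
    // Hypothetical helper: clears LockContents on every content control
    // and returns how many controls were changed.
    public static int UnlockContents(Document doc)
    {
        int changed = 0;
        foreach (StructuredDocumentTag sdt in doc.GetChildNodes(NodeType.StructuredDocumentTag, true))
        {
            if (sdt.LockContents)
            {
                sdt.LockContents = false;
                changed++;
            }
        }
        return changed;
    }
}
```

A possible usage, mirroring the protection type and password from the Interop code, would be to load the document with new Document(@"D:\01.docx"), call doc.Protect(ProtectionType.AllowOnlyFormFields, "Test"), then SdtUnlocker.UnlockContents(doc), and finally doc.Save(@"D:\01.docx").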

I have logged this issue in our bug tracking system. The issue ID is WORDSNET-6905. Your request has been linked to this issue and you will be notified as soon as it is resolved. Sorry for the inconvenience.

Best Regards,

The issues you have found earlier (filed as WORDSNET-6905) have been fixed in this .NET update and this Java update.
