Hello, we’re investigating using Aspose.Words as part of a robust document reuse platform. Our goal is to maximize the reuse of minimally formatted HTML content, which can be edited online, then downloaded back into Word docs, and reused possibly with the original styling, possibly with other styling. Styling that is important for us to preserve includes bold, underline, italic, bulleted lists, and possibly a few other minor things.
I put together a proof of concept to validate the following:
Aspose.Words can iterate through all content controls in a Word doc and export the HTML of each
Our staff can revise the content within a free form text editor
Aspose.Words can then be used to populate a Word doc with content controls, with each content control having that HTML added back into it
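The export step in the list above can be sketched roughly as follows. This is a minimal illustration rather than the attached proof of concept: the file name "input.docx", the class name, and the fallback key scheme are placeholders chosen for the example, and it assumes each content control's Tag is a usable key.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Markup;

class ExportContentControls
{
    static void Main()
    {
        // "input.docx" is a placeholder path, not a file from this thread.
        Document doc = new Document("input.docx");

        // Collect every content control in the document, including nested ones.
        NodeCollection sdtNodes = doc.GetChildNodes(NodeType.StructuredDocumentTag, true);

        // Map each control's Tag to the HTML exported from it.
        // Controls that share a Tag overwrite each other in this sketch.
        var htmlByTag = new Dictionary<string, string>();
        int index = 0;
        foreach (StructuredDocumentTag sdt in sdtNodes)
        {
            // Fall back to a positional key (hypothetical scheme) when no Tag is set.
            string key = string.IsNullOrEmpty(sdt.Tag) ? "sdt" + index : sdt.Tag;

            // Node.ToString(SaveFormat) exports this node and its children.
            htmlByTag[key] = sdt.ToString(SaveFormat.Html);
            index++;
        }

        foreach (KeyValuePair<string, string> entry in htmlByTag)
        {
            Console.WriteLine(entry.Key + ": " + entry.Value.Length + " characters of HTML");
        }
    }
}
```

The dictionary stands in for whatever data store holds the HTML while staff revise it; the import step would look up each Tag and insert the revised HTML back into the matching content control.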
My proof of concept has validated that this works at a high level with StructuredDocumentTag. But there are a lot of important unknowns that I haven’t validated, and that might be difficult for me to validate without your support. Can you please help by answering the following questions:
When I programmatically add a StructuredDocumentTag to a Word doc and then append HTML to it, the insertion seems to work, but the HTML within the content control is read-only. That is, I cannot edit any part of the inserted content; I can only “select all text” or “delete all text” within the content control. How can I insert HTML into a content control so that, when our users download and open the populated Word doc, they are able to edit the content of the structured document tag?
When I export a StructuredDocumentTag to HTML, the Word styles seem to be converted to HTML formatting, which makes sense, since maximum presentation fidelity is usually the goal of exporting to HTML. But if I import this HTML back into Word, and the content originally used, say, a “Header” style and the “Normal” style, is it possible to restore the original Word style names on that content? Right now the imported HTML content doesn’t have any particular style set for the header or paragraph text when viewed in Word. Do you have HTML export/import options that allow the re-imported HTML to have its original Word styles reassigned? If not, would it be possible for you to add an enhancement to support this? For example, the HTML export could optionally write the Word style names as attributes; then, on import through your API, if those attributes exist in the HTML, the matching Word styles could be applied, provided they exist in the given Word document template.
Thanks for your inquiry. I inserted HTML content into a rich text content control (StructuredDocumentTag) using the following code snippet and reproduced the same issue: the content control is read-only in the output DOCX. Please see the attached document and confirm that this is the issue you are facing.
I have logged this issue as WORDSNET-8081 in our issue tracking system and linked this forum thread to it, so you will be notified here once the issue is resolved. We apologize for the inconvenience.
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
Paragraph para = new Paragraph(doc);
Run run = new Run(doc);
run.Text = "Hello World!";
run.Font.Color = System.Drawing.Color.Red;
para.AppendChild(run);
StructuredDocumentTag sdt1 = new StructuredDocumentTag(doc, SdtType.RichText, MarkupLevel.Block);
doc.FirstSection.Body.AppendChild(sdt1);
sdt1.Tag = "sdt1";
sdt1.RemoveAllChildren();
sdt1.AppendChild(para);
builder.MoveTo(sdt1.FirstChild);
builder.Writeln("Test");
builder.InsertHtml(
"<P align='right'>Paragraph right</P>" +
"<b>Implicit paragraph left</b>" +
"<div align='center'>Div center</div>" +
"<h1 align='left'>Heading 1 left.</h1>");
builder.Writeln("Test");
builder.Writeln("Test");
doc.Save(MyDir + "out.docx");
Could you please attach your input Word document, HTML, and code here for testing? I will investigate the issue on my side and provide you with more information.
Thank you Tahir. The StructuredDocumentTag issue that you have found is the same issue that I found. Thank you for correcting this - it is critical for us to have this resolved, in order to support our planned content reuse lifecycle. I’ve attached our relevant proof of concept source code, although I don’t think you’ll need it at this point since you have reproduced the issue. Let me know if you need anything else.
Thanks also for the additional reference links. Please also let me know if you have any other guidance or best practices that we should keep in mind to accomplish our goal of content item reuse.
Thanks for your feedback. We will update you via this forum thread once this issue is resolved. Please note that the MS Word and HTML formats are quite different, so it is sometimes hard to achieve 100% fidelity.
You have shared three MS Word documents. It would be great if you could share some more detail about these documents and about exactly what you want to achieve with Aspose.Words. We will then provide you with more information, along with code.
Hi Tahir, our goal is to maximize the re-use of document content through the use of rich text, plain text, and date content controls.
A large portion of our company’s revenue comes from the thousands of reports that we write annually. These reports are sometimes hundreds of pages long. Some of the content is noteworthy and unique enough to merit storing the content, in order to reuse in future scenarios. The content that we would want to store for reuse would often be one or several paragraphs.
Some future reuse scenarios where we believe Aspose.Words will be a good fit include the following:
Create a server side process that extracts Word doc content from content controls and stores in a data model (HTML and plain text storage, possibly Office XML format as well)
Search for and view extracted document content online (HTML and plain text format)
Select several content controls online, possibly edit the content online with an HTML editor first, then download the selected, edited content into Word
All of the scenarios above should fit well with Aspose.Words.
In addition, we are researching the use of Word 2013 task panes to facilitate the use of Word as the environment to find content from our data store, and import into the given Word document. If we store the Office XML data from the server side process using the Aspose.Words API, then I assume this scenario will likely provide maximum fidelity when importing the content control with the Word JavaScript document object model.
Thanks for sharing the details. You can use rich text, plain text, and date content controls to achieve your requirement. Unfortunately, there are two open issues: 1) using HTML in SdtType.RichText (the issue you reported, WORDSNET-8081), and 2) setting the date value for SdtType.Date (WORDSNET-7055). We will update you via this forum thread once these issues are resolved.
Moreover, please note that Aspose.Words is quite different from Microsoft Word’s Object Model in that it represents the document as a tree of objects, more like an XML DOM tree. If you have worked with any XML DOM library, you will find Aspose.Words easy to understand and work with. When you load a Word document into Aspose.Words, it builds its DOM, and all document elements and formatting are loaded into memory. Please read the following articles for more information on the DOM: https://docs.aspose.com/words/net/aspose-words-document-object-model/ https://docs.aspose.com/words/net/logical-levels-of-nodes-in-a-document/
Hi Tahir, what is the issue with setting the date value for SdtType.Date? It seemed to work for me with the following source; can you provide feedback on it as well? This code successfully rendered a date in the date-type StructuredDocumentTag. I believe setting StructuredDocumentTag.FullDate also worked in Word: when I clicked the content control in Word, the date selector was properly set to the date that I had set programmatically.
The Options type is just a type I created for the proof of concept to validate that setting different date formats would display properly within Word.
StructuredDocumentTag tag = new StructuredDocumentTag(this.DestDoc, SdtType.Date, MarkupLevel.Block);
tag.Title = destSDTMetadata.Title;
tag.Tag = destSDTMetadata.Tag;
Paragraph paragraph = new Paragraph(this.DestDoc);
paragraph.AppendChild(new Run(this.DestDoc));
tag.RemoveAllChildren();
tag.AppendChild(paragraph);
this.DestDoc.LastSection.Body.AppendChild(tag);
if (!this.Options.UseWordDocumentDefaultDateFormat)
{
tag.DateDisplayFormat = this.Options.WordDocumentSpecificDateFormat;
}
tag.DateStorageFormat = SdtDateStorageFormat.Date;
DateTime? date = destSDTMetadata.DateValue;
string dateDisplayFormat = this.Options.UseWordDocumentDefaultDateFormat ? tag.DateDisplayFormat : this.Options.WordDocumentSpecificDateFormat;
((Run)paragraph.FirstChild).Text = date == null ? "" : date.Value.ToString(dateDisplayFormat);
if (date != null)
tag.FullDate = date.Value;
Thanks for your inquiry. Please run the following code snippet against the attached document and check the output document. The date content control is not updated with the new date value. The issue ID is WORDSNET-7055.
Document doc = new Document(MyDir + "date.docx");
NodeCollection sdtNodes = doc.GetChildNodes(NodeType.StructuredDocumentTag, true);
foreach (StructuredDocumentTag sdt in sdtNodes)
{
if (sdt.SdtType == SdtType.Date)
{
// FullDate does not work here: the new value is not applied (WORDSNET-7055).
sdt.FullDate = DateTime.Now.Date;
sdt.DateDisplayFormat = "MM/dd/yyyy";
sdt.DateStorageFormat = SdtDateStorageFormat.Date;
}
}
doc.Save(MyDir + "AsposeOut.docx");
Thanks for your patience. Our development team has finished working on your issue WORDSNET-8081 and has concluded that this is not a bug.
Please use the StructuredDocumentTag.IsShowingPlaceholderText property to solve your issue. This property specifies whether the content of the SDT is interpreted as placeholder text (as opposed to regular text content within the SDT). If it is set to true, the placeholder state is resumed when the document is opened. Setting it to false, as in the code below, marks the content as regular text.
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
Paragraph para = new Paragraph(doc);
Run run = new Run(doc);
run.Text = "Hello World!";
run.Font.Color = System.Drawing.Color.Red;
para.AppendChild(run);
StructuredDocumentTag sdt1 = new StructuredDocumentTag(doc, SdtType.RichText, MarkupLevel.Block);
doc.FirstSection.Body.AppendChild(sdt1);
sdt1.IsShowingPlaceholderText = false;
sdt1.Tag = "sdt1";
sdt1.RemoveAllChildren();
sdt1.AppendChild(para);
builder.MoveTo(sdt1.FirstChild);
builder.Writeln("Test");
builder.InsertHtml(
"<P align='right'>Paragraph right</P>" +
"<b>Implicit paragraph left</b>" +
"<div align='center'>Div center</div>" +
"<h1 align='left'>Heading 1 left.</h1>");
builder.Writeln("Test");
builder.Writeln("Test");
doc.Save(MyDir + "out.docx");