Hi,
I'm parsing a Word document based on its TOC and retrieving all data from certain chapters. As a result I get a list of nodes: all nodes inside a given chapter. I then loop through this list and call ToString(SaveFormat.Html) for each node. This works fine for simple paragraphs (which make up almost 95% of the document), but when the node is a table I get a null reference exception. Calling ToString(SaveFormat.Text) works fine, but I would like to retrieve HTML markup with all styles, not plain text. When I call ToString(SaveFormat.Html) on the whole document it works fine and returns HTML markup, including the markup for tables.
I would appreciate any help. Thanks.
Hi Pavel,
Document doc = new Aspose.Words.Document(@"C:\Temp\test.docx");
Table tab = doc.FirstSection.Body.Tables[0];
string html = tab.ToString(SaveFormat.Html);
Hi Awais,
Thanks for your reply. I'm using the latest version of Aspose.Words.
I believe the cause of this exception lies in the code that extracts data based on the document's TOC. It is a slightly modified version of the code taken from here.
I've attached a small app that you can use to reproduce the issue.
Thanks,
Pavel Pavlov
Hi Pavel,
var doc = new Document(MyDir + "test.docx");
var docBuilder = new DocumentBuilder(doc);
docBuilder.MoveToDocumentEnd();
var dummyEndDocNode = docBuilder.InsertParagraph();
docBuilder.Write("ENDOFDOC");
ArrayList extractedNodes = ExtractContent(doc.Range.Bookmarks["_Toc380074442"].BookmarkStart, doc.FirstSection.Body.LastParagraph, false);
Table table = (Table)extractedNodes[1];
string html = table.ToString(SaveFormat.Html);
Thanks. I will wait for the bug fix.
I also have two more questions.
1. Using the ExtractContent method we get all content between two specified nodes. It uses the paragraph.Clone() method. The problem is that after a paragraph is cloned, the ListLabel values are reset in the cloned object.
For example, if a paragraph is the 4th item of an ordered list, it has LabelString = "4." and LabelValue = 4. After cloning, the clone has "" and 0 respectively. There is no way to set those values to match the original because these properties have no setters. Is there any way to keep this information in the cloned object? Later, when I call ToString(SaveFormat.Html) for each list member, I get a list of zeros, but I want 1, 2, 3, 4, etc.
2. If a paragraph contains an image, how can I get an HTML string from it with the image embedded as base64? I tried something like
var saveOptions = new HtmlSaveOptions(SaveFormat.Html)
{
    ExportImagesAsBase64 = true
};
paragraph.ToString(saveOptions);
but got only an empty tag.
Well, it seems that I managed to solve the second issue. Before calling the ExtractContent method, I get all nodes of type DrawingML and call ToString(SaveFormat.Html) on each of them. The node type changes to Shape, and the nodes then pass through the Clone method successfully, keeping all the original image data.
But the first issue is still open. Is there any workaround to keep the original ListLabel values in a cloned paragraph?
Thanks
pavel.pavlov:
1. Using the ExtractContent method we get all content between two specified nodes. It uses the paragraph.Clone() method. The problem is that after a paragraph is cloned, the ListLabel values are reset in the cloned object. For example, if a paragraph is the 4th item of an ordered list, it has LabelString = '4.' and LabelValue = 4. After cloning, the clone has "" and 0 respectively. There is no way to set those values to match the original because these properties have no setters. Is there any way to keep this information in the cloned object? Later, when I call ToString(SaveFormat.Html) for each list member, I get a list of zeros, but I want 1, 2, 3, 4, etc.
- Please attach your input Word document.
- Please create a simple standalone, runnable application (for example, a Console Application project) that demonstrates the Aspose.Words code you used to generate your output document.
- Please attach the output Word file that shows the undesired behavior.
- Please attach your target Word document showing the desired behavior. You can use Microsoft Word to create it. I will then investigate how you expect the final document to be generated.
pavel.pavlov:
2. If a paragraph contains an image, how can I get an HTML string from it with the image embedded as base64? I tried something like
Document doc = new Document(MyDir + "in.docx");
// Extract the last paragraph in the document to convert to HTML.
Node node = doc.LastSection.Body.LastParagraph;
var saveOptions = new HtmlSaveOptions(SaveFormat.Html)
{
    ExportImagesAsBase64 = true
};
string nodeAsHtml = node.ToString(saveOptions);
I've attached the sample console app and the Word file.
In the undesired behaviour you can see that all list items have '0' as their list number.
The expected behaviour is to get the same 1, 2, 3 list as in the document.
var doc = new Document(@"C:\temp\Test.docx");
doc.UpdateListLabels();
var paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);
Console.WriteLine("Desired output:");
Console.WriteLine("");
foreach (Paragraph paragraph in paragraphs)
{
    // Only list items carry a list label.
    if (paragraph.IsListItem)
    {
        string html = paragraph.ToString(SaveFormat.Html);
        Console.WriteLine(html);
    }
}
Console.WriteLine("");
Console.WriteLine("Undesired behaviour:");
Console.WriteLine("");
ArrayList extractedNodes = ExtractContent(doc.FirstSection.Body.FirstParagraph, doc.FirstSection.Body.LastParagraph, true);
Document doc2 = GenerateDocument(doc, extractedNodes);
var paragraphs2 = doc2.GetChildNodes(NodeType.Paragraph, true);
foreach (Paragraph paragraph in paragraphs2)
{
    // Only list items carry a list label.
    if (paragraph.IsListItem)
    {
        string html = paragraph.ToString(SaveFormat.Html);
        Console.WriteLine(html);
    }
}
I suppose you are missing the implementation of GenerateDocument(doc, extractedNodes);
Hi Pavel,
public static Document GenerateDocument(Document srcDoc, ArrayList nodes)
{
    // Create a blank document.
    Document dstDoc = new Document();
    // Remove the first paragraph from the empty document.
    dstDoc.FirstSection.Body.RemoveAllChildren();
    // Import each node from the list into the new document. Keep the original formatting of the node.
    NodeImporter importer = new NodeImporter(srcDoc, dstDoc, ImportFormatMode.KeepSourceFormatting);
    foreach (Node node in nodes)
    {
        Node importNode = importer.ImportNode(node, true);
        dstDoc.FirstSection.Body.AppendChild(importNode);
    }
    // Return the generated document.
    return dstDoc;
}
Thanks, that is exactly what I need.
I have one more question, though:
Is it possible to save a Paragraph as HTML without any font settings?
For example, when I call ToString(SaveFormat.Html) on a single paragraph from the document I get
<span style="font-family:Calibri; font-size:11pt">Test text</span>
and the desired output is
<span>Test text</span>
I only need to exclude the font-related styles (font name, font size, font color).
Or is the only way to post-process the resulting HTML with a regexp and strip these styles?
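For reference, the regexp post-processing idea could be sketched roughly as below. This is only a minimal sketch using the standard System.Text.RegularExpressions classes, not an Aspose.Words feature. The class name HtmlFontStripper, the method StripFontStyles, and the choice of font-family, font-size and color as the "font-related" properties are all assumptions made for illustration. It also assumes double-quoted style attributes and declarations that contain no semicolons inside their values, so it is not a substitute for a real HTML/CSS parser.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Aspose.Words.
public static class HtmlFontStripper
{
    // Matches a double-quoted inline style attribute together with its leading whitespace.
    private static readonly Regex StyleAttribute =
        new Regex("\\s+style\\s*=\\s*\"([^\"]*)\"", RegexOptions.IgnoreCase);

    // CSS properties treated as font-related in this sketch (an assumption; adjust as needed).
    private static readonly string[] FontProperties = { "font-family", "font-size", "color" };

    public static string StripFontStyles(string html)
    {
        return StyleAttribute.Replace(html, match =>
        {
            // Split the declaration list and keep every declaration that is not font-related.
            string[] kept = match.Groups[1].Value
                .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(d => d.Trim())
                .Where(d => d.Length > 0 && !IsFontDeclaration(d))
                .ToArray();

            // Drop the whole attribute when no declarations remain.
            return kept.Length == 0
                ? string.Empty
                : " style=\"" + string.Join("; ", kept) + "\"";
        });
    }

    private static bool IsFontDeclaration(string declaration)
    {
        int colon = declaration.IndexOf(':');
        if (colon < 0) return false;
        string name = declaration.Substring(0, colon).Trim();
        return FontProperties.Contains(name, StringComparer.OrdinalIgnoreCase);
    }
}
```

With this sketch, passing the span from the example above to HtmlFontStripper.StripFontStyles should give <span>Test text</span>, while non-font declarations such as margins are left in place.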
The issues you have found earlier (filed as WORDSNET-9691) have been fixed in this .NET update and this Java update.
This message was posted using Notification2Forum from Downloads module by aspose.notifier.