Hi,
I have a Word document that is generated on the fly. I need to open this file in my program, modify some of the text, and save it back. I noticed that I lose all the font information and everything is saved with a default font. Simply opening and saving the document in Aspose.Words is enough to remove all the font information. Please let me know how I can accomplish this without losing the font information.
Thanks.
Hi,
Please see my original post above. Below is the code that is currently in production and works with MS Word. I am trying to adapt it to use Aspose.Words.
' Open Word document and reformat text in table
Set objDoc = objWord.Documents.Open(filename:=fPathCompare)
' Set in Page Layout mode
If objWord.ActiveDocument.ActiveWindow.View.SplitSpecial = wdPaneNone Then
    objWord.ActiveDocument.ActiveWindow.ActivePane.View.Type = wdPageView
Else
    objWord.ActiveDocument.ActiveWindow.View.Type = wdPageView
End If
' Set page layout to landscape
With objWord.ActiveDocument.PageSetup
    .Orientation = wdOrientLandscape
End With
' Select comparison table and reformat
objWord.ActiveDocument.Tables(1).Select
With objWord.Selection.Font
    .Name = "Arial"
    .Size = 8
End With
' Save compare file as Word document
objDoc.SaveAs filename:=fPathCompare, FileFormat:=wdFormatDocument
objDoc.Close wdDoNotSaveChanges
Set objDoc = Nothing
Please attach the document you are trying to open in Aspose.Words so we can reproduce the issue.
Hi
Thanks for your inquiry. Here is the Aspose.Words equivalent of your code.
[C#]
//Open document
Document doc = new Document(@"C:\Temp\in.doc");
//Set page layout to landscape
doc.FirstSection.PageSetup.Orientation = Aspose.Words.Orientation.Landscape;
//Get table from the document
Table tab = doc.FirstSection.Body.Tables[0];
//Get the collection of runs in this table and change their font
NodeCollection runs = tab.GetChildNodes(NodeType.Run, true);
foreach (Run run in runs)
{
    run.Font.Name = "Arial";
    run.Font.Size = 8;
}
//The paragraph break characters carry their own font, so change it as well
NodeCollection pars = tab.GetChildNodes(NodeType.Paragraph, true);
foreach (Paragraph par in pars)
{
    par.ParagraphBreakFont.Name = "Arial";
    par.ParagraphBreakFont.Size = 8;
}
//Set document view mode
doc.ViewOptions.ViewType = ViewType.PageLayout;
//Save document
doc.Save(@"C:\Temp\out.doc");
[VB]
'Open document
Dim doc As Document = New Document("C:\Temp\in.doc")
'Set page layout to landscape
doc.FirstSection.PageSetup.Orientation = Aspose.Words.Orientation.Landscape
'Get table from the document
Dim tab As Table = doc.FirstSection.Body.Tables(0)
'Get the collection of runs in this table and change their font
Dim runs As NodeCollection = tab.GetChildNodes(NodeType.Run, True)
For Each run As Run In runs
    run.Font.Name = "Arial"
    run.Font.Size = 8
Next
'The paragraph break characters carry their own font, so change it as well
Dim pars As NodeCollection = tab.GetChildNodes(NodeType.Paragraph, True)
For Each par As Paragraph In pars
    par.ParagraphBreakFont.Name = "Arial"
    par.ParagraphBreakFont.Size = 8
Next
'Set document view mode
doc.ViewOptions.ViewType = ViewType.PageLayout
'Save document
doc.Save("C:\Temp\out.doc")
I hope this helps.
Best regards.
Hi Alexey,
Thank you very much for your code. I am trying to compile it and I am getting an error on the following line:
'Get table from the document
Dim tab As **Tables** = doc.FirstSection.Body.Tables(0)
Type expected.
Could you please let me know what I need to change to make it compile?
Thanks.
Hi Alexey,
Actually, I used your code exactly as posted:
Dim tab As Table = doc.FirstSection.Body.Tables(0)
I also tried it with Tables, and it did not compile in either case.
Thanks.
Hi Alexey,
Thank you very much for your code. You have, as always, been very helpful.
I changed Table to Tables.Table and now it compiles. I ran a test and the result is almost what I need, except that the text is losing its original color. Perhaps we need to import the original formatting first (as in ImportFormatMode.KeepSourceFormatting) to preserve the color. I have attached the input document for your testing.
Hi
Thank you for the additional information. I tested your document, and the font settings are preserved after processing with Aspose.Words.
However, the table borders were changed, so I created new issue #6488. I will notify you as soon as it is fixed.
Also, please try using the latest version of Aspose.Words.
https://releases.aspose.com/words/net
Best regards.
Hi Alexey,
I installed the latest version and I still get the same results. However, I have found some interesting points:
- I repeated the test with the test file that I had sent you and could reproduce your results, i.e., the colors were preserved and the borders were double-lined. For this test, I had used MS Word to manually modify and save the file that my program had saved using Aspose.Words.
- If I do not save the file using MS Word, the output loses its color. I have attached the input and output files, which have ONLY been saved in my program using Aspose.Words. This is the output we need to fix so that it carries over the colors as well.
By the way, the size of the output file increases dramatically. Is there any reason for this?
Thanks.
Hi
Thank you for the additional information. Your original document is an HTML document. I managed to reproduce the problem and created new issue #6495 in our defect database. I will notify you as soon as it is fixed. As a workaround, use the DOC format instead of HTML.
Best regards.
Hi Alexey. Thank you for your response.
This document was generated from HTML but was saved in Aspose.Words in DOC format. How could you tell that it is an HTML document? As for the workaround, I am not sure what I need to do, since I already save the document in DOC format. Please elaborate.
Thanks.
Hi
Thanks for your inquiry. “CNkk01_compare_in.doc” was not generated by Aspose.Words; it is actually an HTML document. If you open it in Notepad, you will see what I mean.
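The Notepad check can also be done in code. The following is a minimal sketch that uses only the .NET base class library, not Aspose.Words: it tests whether a file begins with the OLE2 compound-file signature used by binary .doc files, or whether its first kilobyte contains an <html marker. The module and function names are made up for illustration, and the HTML test is a rough heuristic.

```vb
Imports System
Imports System.IO
Imports System.Text

Module FileSniffer
    ' OLE2 compound-file signature found at the start of binary .doc files.
    Private ReadOnly Ole2Signature As Byte() = {&HD0, &HCF, &H11, &HE0, &HA1, &HB1, &H1A, &HE1}

    ' Returns True when the file starts with the OLE2 signature.
    Function LooksLikeBinaryDoc(path As String) As Boolean
        Dim header(Ole2Signature.Length - 1) As Byte
        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
            ' A file shorter than the signature cannot be a binary .doc.
            If fs.Read(header, 0, header.Length) < header.Length Then Return False
        End Using
        For i As Integer = 0 To Ole2Signature.Length - 1
            If header(i) <> Ole2Signature(i) Then Return False
        Next
        Return True
    End Function

    ' Heuristic: returns True when the first kilobyte contains "<html".
    Function LooksLikeHtml(path As String) As Boolean
        Dim buffer(1023) As Byte
        Dim bytesRead As Integer
        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
            bytesRead = fs.Read(buffer, 0, buffer.Length)
        End Using
        Dim head As String = Encoding.ASCII.GetString(buffer, 0, bytesRead)
        Return head.IndexOf("<html", StringComparison.OrdinalIgnoreCase) >= 0
    End Function
End Module
```

For a file like the one discussed here, LooksLikeBinaryDoc would be expected to return False even though the extension is .doc.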
Best regards.
Hi, please also see my last post.
To explain our process further: we have documents in HTML that our users access through our intranet, and we generate Word documents from these HTML pages. We currently use MS Word for this but want to replace it with Aspose.Words. The code that processes the converted HTML is shown below. Some of the strings are links in the original HTML document, and they carry over to the Word document generated by Aspose.Words. You can see one at the top of the sample document I sent you; the link appears when you hover the cursor over the string. Could you please let me know how to remove these links from the Word document? Thanks.
' Select all run nodes in the document.
Dim runs As NodeCollection = doc.GetChildNodes(NodeType.Run, True)
' Loop through every run node.
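For Each run As Run In runs
    run.Font.Size = 7.5
    run.Font.Name = "Verdana"
    ' Compare ARGB values: Color equality also compares the color name,
    ' so a blue read from the file may not equal the named Color.Blue.
    If run.Font.Color.ToArgb() = Color.Blue.ToArgb() Then
        run.Font.Color = Color.Black
        run.Font.Underline = Underline.None
    End If
Next run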
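One possible way to remove the links, offered as a sketch and not as an answer confirmed in this thread: Aspose.Words stores hyperlinks as fields, and more recent releases expose doc.Range.Fields and Field.Unlink(), which replaces a field with its displayed text. This assumes a release newer than the one discussed here; the module and method names are made up for illustration.

```vb
Imports System.Collections.Generic
Imports Aspose.Words
Imports Aspose.Words.Fields

Module HyperlinkCleanup
    ' Replaces every HYPERLINK field in the document with its visible text.
    ' Assumption: the installed Aspose.Words version provides Range.Fields
    ' and Field.Unlink().
    Sub UnlinkHyperlinks(doc As Document)
        ' Collect the fields first, because unlinking changes the
        ' collection that is being enumerated.
        Dim links As New List(Of Field)()
        For Each f As Field In doc.Range.Fields
            If f.Type = FieldType.FieldHyperlink Then links.Add(f)
        Next
        For Each f As Field In links
            f.Unlink()
        Next
    End Sub
End Module
```

The remaining text may keep its blue, underlined link formatting, so the run loop above would still be needed to reset the color and underline.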
Hi,
Thank you for your response. You are right: it is an HTML document, created by third-party software. I did not know that it was HTML.
In my program, I opened the HTML document in Aspose.Words and saved it again with Aspose.Words (attached). I then used that file as input to the function; however, I get an error on the following line. Can you please let me know the cause of this error?
'Get collection of runs in this table and change font
Dim runs As NodeCollection = tab.GetChildNodes(NodeType.Run, True)
Error: Object reference not set to an instance of an object.
The attachment for the previous post.
Hi
Thanks for your inquiry. This occurs because your document is truncated and does not contain the table. The table was removed from the document because you are using Aspose.Words in evaluation mode. If you want to test Aspose.Words without the evaluation version limitations, you can request a 30-day Temporary License. Please refer to
https://purchase.aspose.com/temporary-license
Best regards.
Hi Alexey,
Thanks for your response. Yes, I found out later that I had forgotten to put the license code in my portion of the test code. That has been resolved.
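For anyone hitting the same truncation, the license code in question usually looks like the sketch below. The file name "Aspose.Words.lic" is a placeholder and the module name is made up; the call needs to run before any document is opened.

```vb
Imports Aspose.Words

Module LicenseSetup
    ' Applies the Aspose.Words license so documents are not truncated
    ' by evaluation mode.
    Sub ApplyLicense()
        Dim lic As New License()
        ' Placeholder file name; point this at your actual license file.
        lic.SetLicense("Aspose.Words.lic")
    End Sub
End Module
```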
However, my problem is converting the HTML file, which is the output of the DiffDoc application (a third-party application), with Aspose.Words while preserving the fonts and color formatting. As you may recall, we found that the problem was that the file was not originally a Word document but an HTML file. You then suggested converting it to Word first, but how do I do that? If I open the HTML file in Aspose.Words and save it again, I lose all the formatting. Could you please let me know how to do this? Our customers are growing restless and want results soon.
Thanks.
Hi
Thanks for your inquiry. I meant that you can convert your HTML to DOC using MS Word. In this case, all formatting will be preserved. Alternatively, you can try changing your HTML, if that is possible, for example by using ‘style’ attributes instead of ‘font’ tags.
Best regards.
Hi,
Thanks for your response. I tried replacing ‘font’ with ‘style’, but it did not make any difference. However, I found something else that preserves the colors: I manually replaced the hex representation of a color with its literal English name, e.g., “red”, and the color then appears in the saved document. Now I need to replace every hex color code with the corresponding English name. I was thinking of reading the file into a stream and then replacing the color values in memory. Do you think this is a good approach, and what would be a fast way to code it, given that some of the files are very large? Thanks.
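One way to sketch the in-memory replacement is a single regular-expression pass that only touches hex values following a color attribute or CSS property, so that the same six characters elsewhere in the file are left alone. The hex-to-name pairs below are the ones used later in this thread; the module name, function name, and pattern are assumptions for illustration and may need adjusting to the actual DiffDoc markup.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module HtmlColorNames
    ' Hex-to-name pairs taken from this thread; extend as needed.
    Private ReadOnly ColorNames As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase) From {
        {"FF0000", "red"},
        {"009000", "green"},
        {"0000FF", "blue"},
        {"C0C0C0", "gray"},
        {"000000", "black"},
        {"FFFFFF", "white"}
    }

    ' Matches color=RRGGBB, color="#RRGGBB", color='#RRGGBB' and the CSS
    ' form color: #RRGGBB (also bgcolor and background-color).
    Private ReadOnly ColorPattern As New Regex(
        "(color\s*[=:]\s*[""']?)#?([0-9a-f]{6})\b",
        RegexOptions.IgnoreCase Or RegexOptions.Compiled)

    ' Returns the HTML with known hex colors replaced by their names.
    ' Unknown hex values are left unchanged.
    Function ReplaceHexColors(html As String) As String
        Return ColorPattern.Replace(html,
            Function(m As Match) As String
                Dim name As String = Nothing
                If ColorNames.TryGetValue(m.Groups(2).Value, name) Then
                    Return m.Groups(1).Value & name
                End If
                Return m.Value
            End Function)
    End Function
End Module
```

Calling HtmlColorNames.ReplaceHexColors on the file content scans the text once instead of once per color, which should help with the larger files.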
Hi,
I got it to work, and the output now looks exactly as I want it to. However, there is one last issue: the output file is almost 3 times the size of the input file. I am using the following code to reformat the input file, and I have attached the input file and the resulting output file. Do you know the reason for this increase in file size, and is there any way to reduce it? Thanks.
' Read the file into memory to replace hex color codes with their English names.
' The Using block closes the StreamReader even if reading fails.
Dim FileContent As String
Using srRead As New IO.StreamReader(fPathCompare)
    FileContent = srRead.ReadToEnd()
End Using
FileContent = Replace(FileContent, "FF0000", "red")
FileContent = Replace(FileContent, "009000", "green")
FileContent = Replace(FileContent, "0000FF", "blue")
FileContent = Replace(FileContent, "C0C0C0", "gray")
FileContent = Replace(FileContent, "000000", "black")
FileContent = Replace(FileContent, "FFFFFF", "white")
Dim FileByte As Byte() = System.Text.Encoding.UTF8.GetBytes(FileContent)
Dim srStore As IO.MemoryStream = New IO.MemoryStream(FileByte)
'Open the document from the memory stream
Dim srcDoc As Document = New Aspose.Words.Document(srStore)
Dim dstDoc As Aspose.Words.Document = CType(srcDoc.Clone(False), Aspose.Words.Document)
'This is needed to import the formatting of the source document.
dstDoc.Sections.Add(dstDoc.ImportNode(srcDoc.FirstSection, True, ImportFormatMode.KeepSourceFormatting))
'Set page layout to landscape
dstDoc.FirstSection.PageSetup.Orientation = Aspose.Words.Orientation.Landscape
'Get table from the document
Dim tab As Tables.Table = dstDoc.FirstSection.Body.Tables(0)
'Get the collection of runs in this table and change their font
Dim runs As NodeCollection = tab.GetChildNodes(NodeType.Run, True)
For Each run As Run In runs
    run.Font.Name = "Arial"
    run.Font.Size = 8
Next
'The paragraph break characters carry their own font, so change it as well
Dim pars As NodeCollection = tab.GetChildNodes(NodeType.Paragraph, True)
For Each par As Paragraph In pars
    par.ParagraphBreakFont.Name = "Arial"
    par.ParagraphBreakFont.Size = 8
Next
'Set document view mode
dstDoc.ViewOptions.ViewType = ViewType.PageLayout
Dim fPathCompareOut As String = Replace(fPathCompareWord, "_inWordTemp", "_outWord10")
'Save document
dstDoc.Save(fPathCompareOut, SaveFormat.Doc)