Free Support Forum - aspose.com

Formatting adds _0s after combining documents

Hi, after I do a merge in my Word documents I want to combine them into one, convert that to a PDF, and have the headings convert to bookmarks.

This seems to work after combining only 2 or 3 docs, but when I combine several, like 20 documents, it somehow loses most of the headings when going to PDF.

For the formatting on my headings, my combined/merged Word document has something like "Heading 1 _0_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0 + Font..." where my template says "Heading 1 + Font...". It's like every time it calls "ImportFormatMode.KeepSourceFormatting" it saves a new format name, appending the _0s on the end. The last appended document has no _0s on it. I tried modifying part of the fully appended Word doc to remove some of the _0s from the formatting before it converts to PDF, but that still did not work. It's like there are just too many formats in the list or the names are too long.
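
For anyone comparing style names while debugging this, here is a minimal sketch of how the suffixed names could be mapped back to their base name for inspection. It assumes the synthesized suffix is always one or more trailing "_0" groups, as in the name quoted above. StyleNameTools and BaseStyleName are hypothetical names, not part of Aspose.Words, and the sketch only works on strings; it does not rename or merge any styles in the document.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Aspose.Words.
public static class StyleNameTools
{
    // Assumption: a synthesized suffix is one or more "_0" groups at the end of the name.
    private static readonly Regex SynthesizedSuffix = new Regex(@"(_0)+$");

    // Returns the style name without trailing "_0" groups and without
    // the whitespace that was left in front of them.
    public static string BaseStyleName(string styleName)
    {
        if (styleName == null)
        {
            throw new ArgumentNullException(nameof(styleName));
        }
        return SynthesizedSuffix.Replace(styleName.Trim(), string.Empty).TrimEnd();
    }
}
```

Calling StyleNameTools.BaseStyleName on "Heading 1 _0_0_0" gives "Heading 1", so names from different appended documents can be grouped to see how many variants of each heading style ended up in the combined document.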

I don't understand what's going on, but the result is that there are only a couple of bookmarks left at the bottom of the PDF. Any help is much appreciated. I do need to run through all the templates and standardize the formatting, but will this make any difference?

Thanks,

Eric

FYI we are using the latest words.dll and pdf.dll in evaluation mode because our current version does not support the bookmarking features. Once this is working the boss will renew our upgrade. Thx, Eric

Hello Eric!

Thank you for your inquiry.

To help you properly we need to reproduce the issue on our side. Please attach (in an archive) several of the documents you are combining, and post here the code you are using for concatenation and conversion. Any details would be very useful. We may also need help from the Aspose.Pdf team.

Regards,

public Aspose.Words.Document Append(params string[] aDocPaths)
{
    Aspose.Words.Document allInOneDocument = null;
    foreach (string documentPath in aDocPaths)
    {
        Aspose.Words.Document document = new Aspose.Words.Document(documentPath);
        if (allInOneDocument == null)
        {
            // Start from a copy of the first document, then empty it so that
            // every section (including the first document's) is imported below.
            allInOneDocument = document.Clone();
            allInOneDocument.Sections.Clear();
        }
        foreach (Aspose.Words.Section section in document)
        {
            // Import each section into the combined document, keeping the source formatting.
            Aspose.Words.Node node = allInOneDocument.ImportNode(section, true, Aspose.Words.ImportFormatMode.KeepSourceFormatting);
            allInOneDocument.AppendChild(node);
        }
    }
    return allInOneDocument;
}
//--------------------------------------------------------------


public void ConvertDocToPdf(string aDocPath)
{
    // Save the PDF next to the source document, with a .pdf extension.
    string pdfPath = System.IO.Path.ChangeExtension(aDocPath, ".pdf");
    Aspose.Pdf.Pdf pdf = ConvertWithStream(aDocPath);
    Aspose.Pdf.Pdf pdfWithBookmarks = IncorporateBookmarks(pdf);
    pdfWithBookmarks.Save(pdfPath);

    // Read the saved PDF back and remember where it was written.
    FileByter fileByter = new FileByter();
    _Document = fileByter.GetFileNow(pdfPath);
    _Directory = System.IO.Path.GetDirectoryName(pdfPath);
    _FileName = System.IO.Path.GetFileName(pdfPath);
}

private Aspose.Pdf.Pdf ConvertWithStream(string aDocPath)
{
    // Save the Word document as Aspose.Pdf XML into an in-memory stream.
    Aspose.Words.Document document = new Aspose.Words.Document(aDocPath);
    System.IO.MemoryStream docStream = new System.IO.MemoryStream();
    document.Save(docStream, Aspose.Words.SaveFormat.AsposePdf);
    docStream.Seek(0, System.IO.SeekOrigin.Begin);

    // Load the intermediate XML so it can be bound to the Pdf object.
    System.Xml.XmlDocument xmlDoc = new System.Xml.XmlDocument();
    xmlDoc.Load(docStream);
    docStream.Close();

    Aspose.Pdf.Pdf pdf = new Aspose.Pdf.Pdf();
    pdf.IsImagesInXmlDeleteNeeded = true;
    pdf.BindXML(xmlDoc, null);

    // Cache the TrueType font map in the configured output folder.
    pdf.IsTruetypeFontMapCached = true;
    ConfigFile configFile = new ConfigFile();
    pdf.TruetypeFontMapPath = configFile.OutputFolder;
    return pdf;
}

Aspose.Pdf.Pdf IncorporateBookmarks(Aspose.Pdf.Pdf pdf)
{
    // Ask Aspose.Pdf to create bookmarks for heading levels 1 to 3.
    pdf.BookMarkLevel = 3;
    pdf.IsBookmarked = true;
    foreach (Aspose.Pdf.Section section in pdf.Sections)
    {
        foreach (Aspose.Pdf.Paragraph paragraph in section.Paragraphs)
        {
            // Flag every heading paragraph as being in the list.
            Aspose.Pdf.Heading heading = paragraph as Aspose.Pdf.Heading;
            if (heading != null)
            {
                heading.IsInList = true;
            }
        }
    }
    return pdf;
}

Hello!

I have looked at your files and code. Thank you for the additional details. Everything you are doing is correct. These style names are synthesized when documents are concatenated. Most probably Aspose.Pdf has a restriction on how many styles can be in one document. I’ll ask them and let you know what they suggest.

Regards,

Hi,

I found this code in your C# file:

pdf.BookMarkLevel = 3;
pdf.IsBookmarked = true;

Pdf.BookMarkLevel gets or sets an int value that indicates how many levels of headings in the PDF document are bookmarked. The default value is 0, which means every heading in the PDF is tagged as a bookmark. The property only takes effect when the related property IsBookmarked is set to true. If IsBookmarked is true and this property is set to a non-negative value levelNumber, Aspose.Pdf creates bookmarks for headings of level 1 through levelNumber.
Because pdf.BookMarkLevel is set to 3, only headings of level 1-3 are shown and the others are ignored. That’s why most of the headings’ bookmarks are lost (their heading level is 10).
You can set pdf.BookMarkLevel to a non-negative number ‘n’ to show headings of level 1 through n. When pdf.BookMarkLevel is 0 (the default value), all headings in the doc are shown.
At present there is a restriction: pdf.BookMarkLevel only supports headings with a heading level less than 10. I will raise this limit to support your case (as most of the headings in the doc have heading level 10).
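
To make the rule easier to check against a document, here is a rough sketch of the behaviour described above, written as a standalone C# function. It is only a model of this description, not Aspose.Pdf code: BookmarkRules, WouldBeBookmarked and the maxSupportedLevel parameter are hypothetical names, and treating headings above the supported maximum as dropped is an assumption based on the restriction mentioned above.

```csharp
using System;

// Hypothetical model of the bookmark rule described in this post; not Aspose.Pdf code.
public static class BookmarkRules
{
    public static bool WouldBeBookmarked(
        int headingLevel, bool isBookmarked, int bookMarkLevel, int maxSupportedLevel = 9)
    {
        if (bookMarkLevel < 0)
        {
            // The description only covers non-negative values.
            throw new ArgumentOutOfRangeException(nameof(bookMarkLevel));
        }
        // BookMarkLevel has no effect unless IsBookmarked is true.
        if (!isBookmarked) return false;
        // Assumption: headings outside the supported range never become bookmarks.
        if (headingLevel < 1 || headingLevel > maxSupportedLevel) return false;
        // 0 is the default and means "all headings".
        if (bookMarkLevel == 0) return true;
        // Otherwise only levels 1 through bookMarkLevel are bookmarked.
        return headingLevel <= bookMarkLevel;
    }
}
```

Under this model, a level 10 heading is dropped with BookMarkLevel = 3 and also with the default of 0, until the supported maximum is raised (for example by passing maxSupportedLevel: 20).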

Best regards.

Hello Viktor and Hans,

Thank you for taking a look at this. I don't really understand what you said about losing lvl 10 headings, because the headings that are getting lost are set to 1, 2 or 3. Maybe the synthesized styles that Viktor mentioned are interpreted as > 10, which would make sense. Anyway, as long as you understand it enough to fix it, that's good enough for me. Am I waiting to hear back from you, or should I download the latest pdf.dll for the fix?

Or is there some way for words.dll to not synthesize a new style if it already exists in the combined doc? Maybe that would be easier and you would still be able to set lvl 10 as your max heading.

Thanks again,

Eric

Hi Eric,

I have attached the combined doc, the intermediate XML and the resulting PDF. The PDF was generated after I raised the max heading level to 20. Most of the headings are shown as bookmarks now.
You can see that most of the headings have a heading level of 10 in the XML file; that’s why they are lost. We can fix this just by raising the max heading level.
But some headings with a level less than 10 are also lost. I will consult with our developer about this and find a solution.
We will give you an update here as soon as the problem is resolved. The fix will also be included in our next hotfix.

Best regards.

Hello!

Consolidating styles from several documents is implemented this way because someone might want styles with identical names, and even identical properties, to remain distinguishable. For instance, we have seen customers concatenate completely different documents with no logical relation between them. What should we do if one or more styles in those documents match by accident? Leaving them logically different is a rational decision. In theory a new option could be provided to switch this behaviour, but we don’t see much value in it.

Happy New Year! My best wishes.

Regards,

Hello Team,

Just checking status to see if you have a new hotfix coming to fix this?

Thanks,

Eric

Hello!

I have reminded the Aspose.Pdf team about this issue. They will reply here soon.

Regards,

Hi Eric.

If you would like to know the issue status, you can ask the Aspose.Pdf team in their forum. They promised to fix this, but most probably they have some other important tasks at the moment.

Regards,

Well, in describing my problem again I tried a simple test and realized that the problem may not be the renaming of the styles when combining docs, but the fact that most of my templates contain the heading inside a table at the top of the Word document. When I take the heading text out of the table it seems to work. Not sure if this is a bug or if there is a reason for it to work like this. Anyway, it's posted on the PDF forum for now. Thanks for all the help.

Eric

http://www.aspose.com/Community/forums/thread/109991.aspx

Hello Eric!

Of course it is debatable whether Aspose.Pdf should or shouldn’t collect headings from tables into the list of bookmarks. Another customer might not want them propagated. I think that you can come to a reasonable solution with the Aspose.Pdf team.

Thank you for using Aspose products and helping make them better!

Regards,