Rendering and HTML export problems

Hi,

First of all, I want to thank you for adding rendering functions to your product. It’s a feature we have really been waiting for.

I started testing the rendering engine with a few documents, which you can find attached, and encountered the following problems:

  • When a document contains a table of figures like in “doc with table of figures.docx”, Aspose.Words throws a null pointer exception.
  • There are paragraph and font issues when exporting to PDF (doc.pdf and docx.pdf). Please note that the problems differ depending on whether the document was saved as DOC or DOCX (doc.doc and docx.docx) with MS Word.
  • There are indentation problems with paragraphs without numbering whose styles contain numbering. Please have a look at “Table des illustrations” on page 7 and “Remerciements” on page 6 of doc.pdf.
  • The position of shapes with “in front of text” wrapping is not correct, as on page 13 of doc.pdf.
  • There is too much space after shapes with “in line with text” wrapping, as on pages 14 and 16 of doc2.pdf compared with the original document doc2.doc. This problem also occurs when exporting these pages to images (doc2 p14.png and doc2 p16.png).
  • When exporting to PDF, some pictures are distorted, as in doc.pdf and docx.pdf. This doesn’t happen when converting to images.

Regarding the export to HTML problems, there are indentation and numbering issues when exporting documents like doc2.doc to doc.html.

Also, since your rendering engine is done, do you plan to rasterize shapes when exporting to HTML and add support for floating shapes?

About your API, I must say I find it confusing that some PDF export options are in SaveOptions while others are in SaveToPdfOptions. Couldn’t you merge all these options?

I hope this is clear enough.

Regards,

Hi
Thanks for your request.
1. I managed to reproduce the problem and created new issue #6686 in our defect database.
2. It seems that the renderer does not recognize NonBreakingSpace characters. I created new issue #6687 in our defect database. As a workaround, you can replace the NonBreakingSpace characters in your document with Space characters. Please see the following code:

Document doc = new Document(@"Test011\doc.docx");
// Get collection of runs
NodeCollection runs = doc.GetChildNodes(NodeType.Run, true);
// Loop through all runs and replace nonBreakingSpace with Space character
foreach (Run run in runs)
{
    run.Text = run.Text.Replace(ControlChar.NonBreakingSpaceChar, ' ');
}
// Save output in PDF format
doc.SaveToPdf(@"Test011\out.pdf");

3. I see the problem and I created issue #6689.
4. I see that the position of the shape is not correct. I created issue #6690.
5. I created one more issue in our defect database (#6691).
6. I created new issue #6692.
Regarding list numbering in HTML, it is known issue #3701.
Issue #3701 - List with multiple levels is not correctly converted to HTML.
It is impossible to set 1.2.3 as list labels natively in HTML. At least, we cannot find how this could be done. MS Word exports such lists as non-lists, using simple paragraphs and spans. This is not good because this approach loses the document structure.
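To illustrate the flattening approach mentioned above (labels such as 1.2.3 written out as plain text in paragraphs and spans instead of HTML list markup), here is a minimal sketch. It is self-contained and does not use Aspose.Words; the ListLabeler class and its NextLabel method are hypothetical names introduced only for this example, and it assumes each list item arrives with a zero-based nesting level.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Aspose.Words: keeps one counter per
// nesting level and returns the flattened label ("1", "1.1", "1.2", "2", ...).
public class ListLabeler
{
    private readonly List<int> counters = new List<int>();

    // level is zero-based: 0 for top-level items, 1 for the first nested level, etc.
    public string NextLabel(int level)
    {
        // Going deeper: create counters for levels not seen yet.
        while (counters.Count <= level)
        {
            counters.Add(0);
        }

        // Coming back up: drop deeper counters so nested numbering restarts later.
        if (counters.Count > level + 1)
        {
            counters.RemoveRange(level + 1, counters.Count - level - 1);
        }

        counters[level]++;
        return string.Join(".", counters);
    }
}
```

Each label could then be written as a span at the start of an ordinary paragraph, with the left margin chosen per level. Note that this sketch assumes levels increase by one step at a time; if a level is skipped, the intermediate counter stays at 0.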
Yes, we plan to support floating shapes and to rasterize shapes. But currently I can’t provide you any estimate of when these features will be supported.
Which options do you mean? If you mean options like SaveOptions.PdfExportBookmarkLevel, SaveOptions.PdfExportCheckBoxEmptySign, etc., these options are used when Aspose.Words and Aspose.Pdf are used together for DOC to PDF conversion.
Best regards.

Thank you for your answers.

Regarding issue 2, I think this is more than an nbsp character recognition problem. If I convert a DOCX document to PDF, like docx.docx to docx.pdf, there is no problem with nbsp characters, but most of the font and paragraph formatting is not correct.
If I convert this document to DOC with Word 2007, the generated PDF, doc.pdf, has proper paragraph formatting but problems with the nbsp characters.
In both cases, the generated PDFs have the “Blackoak Std” font missing for “Destinataires” on page 1.

About the HTML conversion, I must say I don’t really understand what you mean about MS Word losing document structure since, as far as I understand, a list element in a Word document is just a paragraph with some list formatting. So, to me, when converting list elements to HTML list items, you are actually adding information to the HTML structure. The problem is that by doing so, you lose some document content in the nested lists and have problems with the left margin of some paragraphs.
To us, what matters is having HTML that looks as close as possible to the original Word document. So, having the list items exported as spans and paragraphs is fine with us and would be great, since less document content would be lost.

Do you plan on fixing the paragraph left margin problems? This is what I incorrectly called “indentation” in my previous post. For example, in the original document, doc2.doc, bulleted lists are aligned with the paragraphs above and below them, while this is not the case in the converted HTML, doc2.html. Also, the word “Localisation” of “1.1 Localisation” is not aligned with “La société…” as it is on page 10.

About the PDF save options, I now get it, but I think you should specify in your API reference that the PdfExportSomething properties of SaveOptions are only used when saving to your Aspose.Pdf format and not when saving directly to PDF.

Kind regards.

Hi

Thanks for your request.
Sorry, I missed the problem with paragraph spacing. It seems to be known issue #6102 in our defect database.
Yes, it seems the font of “Destinataires” is changed. Maybe there is some difference between the same font in MS Word and in PDF. I will consult with our developers and provide you with more information.
We will investigate this issue further and will try to fix it in one of our future releases. But currently I can’t provide you any estimate.
Yes, of course we will improve our HTML export. But I can’t promise you that this issue will be fixed in the near future.
You are right. We will update our documentation.
Best regards.

The issues you have found earlier (filed as 6691; 6690; 6686; 6692) have been fixed in this update.

This message was posted using Notification2Forum from Downloads module by alexey.noskov.

The issues you have found earlier (filed as 3701) have been fixed in this update.


The issues you have found earlier (filed as 6687) have been fixed in this update.


The issues you have found earlier (filed as WORDSNET-1882) have been fixed in this .NET update and in this Java update.
