A Few Conversion Issues During Testing

Hi all,

Our testing team has submitted a few issues found during the conversion of a Word document to PDF and HTML. I have attached the source Word document, a supporting PDF output, and supporting images for the HTML output. We are using Aspose.Words and minimal conversion code:

word = New Aspose.Words.Document(stream, LoadFormat.Doc)
word.Save(htmlDestination, SaveFormat.Html)


word = New Aspose.Words.Document(stream, LoadFormat.Doc)

The three issues are:

  1. The text ‘— End of Procedure —’ displays on page 16 of the attached original document, but is cut off on page 15 of the PDF output. Ex: textCutOffAtEndOfDoc.pdf
  2. There is a difference in bullet indentation between the original Word document and the HTML output. An example from the source doc is page 5, second bullet under #8. This is indented to line up with the other bullets in the HTML output, but is not indented in the source doc. Ex: bulletIndentation.bmp
  3. Extra spacing is present between screen capture headings and the screen captures. Ex: spaceBetweenScreenTitleAndCapture.bmp



Thanks for your request.

  1. I managed to reproduce the problem with PDF and created new issue #8705 in our defect database.
  2. I logged the problem with bullet indentation in our defect database as issue #8706.
  3. I also reproduced the problem with extra line breaks added before the image and created new issue #8707 in our defect database.

You will be notified as soon as these issues are resolved.
Best regards.

Thank you for your patience. I have looked into issues #8706 and #8707. Here is my analysis.
#8706: In HTML, lists can be represented by the native ol/ul/li elements, and Microsoft Word outputs list items using these elements wherever possible. There are two exceptions. First, not all list labels are representable in HTML; that is the subject of #3701, which has already been fixed in the main development branch. Second, Microsoft Word outputs plain paragraphs with labels if the indentation differs. “Differs” from what is a good question. I think we can deduce the logic from how browsers and Microsoft Word work together. My first guess is that each list level has an indentation equal to one tab position (36.0pt in the sample). If a list item in a document has different indentation parameters, we should output it as a custom (non-native) list item. Other formatting options should also be considered in the future. To see what I mean, export your document to HTML with Microsoft Word and view the result as plain text.
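To make the "first guess" above concrete, here is a minimal sketch of that rule as plain VB.NET, with no Aspose.Words calls. The module name, the function CanUseNativeListItem, the 0-based level numbering, and the 0.01pt tolerance are all assumptions made for illustration; this is not how Aspose.Words or Microsoft Word actually decides.

```vbnet
Imports System

Module ListIndentSketch
    ' From the post: one tab position per list level (36.0pt in the sample).
    Const TabStopPt As Double = 36.0

    ' Hypothetical helper. Returns True when the item's left indent matches
    ' the indent expected for its level, so it could be written as a native <li>.
    ' Assumption: level is 0-based and level 0 sits at one tab stop.
    Function CanUseNativeListItem(level As Integer, leftIndentPt As Double) As Boolean
        Dim expectedPt As Double = (level + 1) * TabStopPt
        ' Small tolerance (an assumption) to absorb rounding in point values.
        Return Math.Abs(leftIndentPt - expectedPt) < 0.01
    End Function

    Sub Main()
        Console.WriteLine(CanUseNativeListItem(0, 36.0)) ' True: matches level 0
        Console.WriteLine(CanUseNativeListItem(1, 72.0)) ' True: matches level 1
        Console.WriteLine(CanUseNativeListItem(1, 36.0)) ' False: custom item
    End Sub
End Module
```

Under this reading, a second-level bullet whose indent equals the first-level indent would fail the check and be written as a labelled paragraph rather than a native list item.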
There is a known workaround applicable to this case. You can replace the lists with plain paragraphs whose label text is calculated for each corresponding item. You can refer to this thread to learn this approach:
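Separately from that thread, the label-calculation part of the workaround might look roughly like the sketch below. It is plain VB.NET with no Aspose.Words calls; the module name, BuildLabels, and the label formats ("1." for the top level, "a." for deeper levels) are assumptions for illustration only. For a bullet list, the label would simply be a fixed bullet character.

```vbnet
Imports System
Imports System.Collections.Generic

Module ListLabelSketch
    ' Hypothetical helper. Takes the 0-based list level of each item, in
    ' document order, and returns the label text to prepend to each paragraph.
    Function BuildLabels(levels As Integer()) As List(Of String)
        Dim counters(8) As Integer ' indices 0..8, one counter per level
        Dim labels As New List(Of String)()
        For Each level As Integer In levels
            counters(level) += 1
            ' Returning to a shallower level restarts numbering below it.
            For deeper As Integer = level + 1 To 8
                counters(deeper) = 0
            Next
            If level = 0 Then
                labels.Add(counters(level).ToString() & ".")
            Else
                ' Assumed format: a., b., ... (only meaningful up to z).
                labels.Add(Convert.ToChar(96 + counters(level)).ToString() & ".")
            End If
        Next
        Return labels
    End Function

    Sub Main()
        Dim labels As List(Of String) = BuildLabels(New Integer() {0, 1, 1, 0, 1})
        Console.WriteLine(String.Join(" ", labels)) ' prints: 1. a. b. 2. a.
    End Sub
End Module
```

Each calculated label would then be written as ordinary text at the start of its paragraph, with the list formatting removed, so the HTML export sees only plain paragraphs.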
#8707: The issue happens because the floating image belongs to an inline canvas. Aspose.Words doesn’t export floating content (known issue #4488), so floating images are exported as inline. This particular case is not easy to fix, but there is an easy workaround: remove the canvas completely and place the image inline right after the text paragraph. See the attachment (Remake.doc).

The issues you have found earlier (filed as 8707) have been fixed in this update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.

The issues you have found earlier (filed as WORDSNET-2302) have been fixed in this .NET update and this Java update.
