Convert Word to HTML: some text is in the wrong place

Hi there

I used Aspose.Words to convert Word files to HTML format.
I found that some text is in the wrong place in the result
(in the table on the 2nd and 3rd pages).

Please check my code and the attachment.

try
{
    Document doc = new Document("20141008 Pusheen表格格.docx");
    Document pageDoc;
    LayoutCollector layoutCollector;
    DocumentPageSplitter splitter;
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    HtmlSaveOptions saveOp = new HtmlSaveOptions();
    saveOp.setExportImagesAsBase64(true);
    saveOp.setExportTextInputFormFieldAsText(false);
    saveOp.setExportTocPageNumbers(true);
    saveOp.setExportPageSetup(true);
    saveOp.setExportDocumentProperties(true);
    saveOp.setExportRelativeFontSize(false);
    saveOp.setUpdateFields(true);
    layoutCollector = new LayoutCollector(doc);
    doc.updatePageLayout();
    splitter = new DocumentPageSplitter(layoutCollector);

    byte[] outputContent;
    String outputPath = "";
    String blockId = UUID.randomUUID().toString();

    File outputDir = new File(outputPath + "/" + blockId + "/");
    if (!outputDir.exists())
        outputDir.mkdir();
    for (int page = 1; page <= doc.getPageCount(); page++)
    {
        System.out.println("page:" + page);
        pageDoc = splitter.getDocumentOfPage(page);

        // Reuse the same buffer for each page.
        output.reset();

        pageDoc.save(output, saveOp);
        outputContent = output.toByteArray();
        // Close the file stream once the page has been written.
        try (FileOutputStream fileOut = new FileOutputStream(outputPath + "/" + blockId + "/" + page + ".html"))
        {
            IOUtils.write(outputContent, fileOut);
        }
    }

}
catch (Exception e)
{
    e.printStackTrace();
}
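
For reference, the per-page write step above can also be done with only java.nio.file, which creates the output folder and closes each file automatically. This is a minimal sketch, not part of Aspose.Words: the names PageWriter and writePages are hypothetical, and the page bytes are assumed to have been produced already (for example by saving each page document to a ByteArrayOutputStream).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper for illustration; not an Aspose.Words class.
final class PageWriter {
    // Writes each page's bytes to <baseDir>/<blockId>/<page>.html, numbering pages from 1.
    static Path writePages(Path baseDir, String blockId, List<byte[]> pages) throws IOException {
        Path dir = baseDir.resolve(blockId);
        // Creates the folder and any missing parents; does nothing if it already exists.
        Files.createDirectories(dir);
        for (int i = 0; i < pages.size(); i++) {
            // Files.write opens, writes, and closes the file in one call.
            Files.write(dir.resolve((i + 1) + ".html"), pages.get(i));
        }
        return dir;
    }
}
```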

Hi Cheng,

Thanks for your inquiry. We have made a slight change to your code: it now removes empty paragraphs from your document before converting it to HTML. The output now consists of two pages; please find the resultant HTML documents attached for your reference. We have noticed that in the fourth column of the first page the text characters are split onto separate lines, and that the numbered list in the second-to-last column restarts from 1 on the second page. Please check and confirm, so that we can log the issues in our issue tracking system.

Document doc = new Document("20141008+Pusheen表格格.docx");
// Iterate over a snapshot so that removing nodes does not disturb the live collection.
for (Node node : doc.getChildNodes(NodeType.PARAGRAPH, true).toArray())
{
    Paragraph paragraph = (Paragraph) node;
    if (paragraph.toString(com.aspose.words.SaveFormat.TEXT).trim().equals(""))
    {
        // Keep paragraphs that still contain child nodes even though their text is empty.
        if (paragraph.hasChildNodes())
            continue;
        paragraph.remove();
    }
}
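
The filtering rule used above (remove a paragraph only when its text is blank and it has no child nodes) can be tried without the library. The following is a minimal sketch in plain Java; Para and EmptyParagraphFilter are hypothetical stand-ins for illustration and are not Aspose.Words types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a document paragraph; not an Aspose.Words type.
final class Para {
    final String text;
    final boolean hasChildren;

    Para(String text, boolean hasChildren) {
        this.text = text;
        this.hasChildren = hasChildren;
    }
}

final class EmptyParagraphFilter {
    // Removes paragraphs whose text is blank and that hold no child nodes.
    static void removeEmpty(List<Para> paragraphs) {
        // Iterate over a copy so that removal does not disturb the loop.
        for (Para p : new ArrayList<>(paragraphs)) {
            if (p.text.trim().isEmpty() && !p.hasChildren) {
                paragraphs.remove(p);
            }
        }
    }
}
```

Iterating over a copy mirrors the snapshot idea: the list being modified is never the one being walked.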

Best Regards,

Hi

With the code change, the issue you mentioned exists,
along with another issue: the table is split across the 2 pages differently from the original Word file.

I would like to move the number-list restarting issue to another post so that it can be tracked separately:
The numbering after page splitting is different from origin Word file

Hi Cheng,

Thanks for your feedback. We have logged ticket WORDSJAVA-1564 in our issue tracking system for the issue of words being split across multiple lines, for further investigation and rectification. We will notify you as soon as it is resolved.

We are sorry for the inconvenience.

Best Regards,