Problem with bullet list converting from DOCX -> HTML Word.Java

callidus · January 23, 2015, 8:18am

I have a problem when I try to convert the attached Word file: I get the attached HTML file. The bullet list is not left-aligned; instead, there are a lot of spaces in the span after the bullet at the beginning of each list item. Is this a known bug, or am I missing something in my code?

package com.company.asposetest;

import com.aspose.words.Document;
import com.aspose.words.ExportHeadersFootersMode;
import com.aspose.words.HtmlSaveOptions;
import com.aspose.words.SaveFormat;
import org.jsoup.Jsoup;

import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Main
{

    public static String getBody(String html)
    {
        // get just body
        String htmlBody;
        org.jsoup.nodes.Document jdoc = Jsoup.parse(html);
        if (jdoc != null && jdoc.select("body") != null)
        {
            htmlBody = jdoc.select("body").toString();
        }
        else
        {
            htmlBody = html;
        }
        return htmlBody;
    }

    public static String convertByteArrayDocxToHTML(byte[] template)
    {
        String convertedHtml = null;
        convertedHtml = generateHtmlFromTemplate(template);

        if (convertedHtml != null)
        {
            convertedHtml = getBody(convertedHtml);
        }
        return convertedHtml;
    }

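    /**
     * Converts a DOCX document to HTML using Aspose.Words.
     *
     * @param template the DOCX file content as a byte array
     * @return the generated HTML, or null if the conversion fails
     */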
     private static String generateHtmlFromTemplate(byte[] template)
     {
         InputStream is = new ByteArrayInputStream(template);
         String convertedHtml = null;
         ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
         try
         {
             Document doc = new Document(is);
             HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.HTML);
             options.setExportHeadersFootersMode(ExportHeadersFootersMode.NONE);

             options.setExportImagesAsBase64(true);
             doc.joinRunsWithSameFormatting();
             doc.save(outputStream, options);
             convertedHtml = outputStream.toString("UTF-8");

         }
         catch (Exception e)
         {
             // Report the failure instead of silently swallowing it
             e.printStackTrace();
         }
         finally
         {
             if (is != null)
             {
                 try
                 {
                     is.close();
                 }
                 catch (IOException e)
                 {
                     e.printStackTrace();
                 }
             }
             if (outputStream != null)
             {
                 try
                 {
                     outputStream.close();
                 }
                 catch (IOException e)
                 {
                     e.printStackTrace();
                 }
             }
         }
         return convertedHtml;
     }

     public static void main(String[] args) throws IOException
     {

         Path path = Paths.get("c:\\aspose\\1.docx");
         byte[] data;
         data = Files.readAllBytes(path);

         PrintWriter writer = new PrintWriter("c:\\aspose\\1.html", "UTF-8");
         writer.print(convertByteArrayDocxToHTML(data));
         writer.close();

     }
}

tahir.manzoor · January 26, 2015, 3:38am

Hi Milan,

Thanks for your inquiry. I have tested the scenario using the latest version of Aspose.Words for Java, 14.12.0, and could not reproduce the issue. Please upgrade to Aspose.Words for Java 14.12.0 and let us know if you have any further queries. I have attached the output HTML to this post for your reference.
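
A side note on the stream handling in the posted code, unrelated to the bullet issue: the manual finally block can probably be shortened with try-with-resources. The following is only a minimal sketch under assumptions. `StreamHandlingSketch` and the `Converter` interface are hypothetical names standing in for the `Document` load and save calls; they are not part of Aspose.Words, and the sketch assumes Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class StreamHandlingSketch
{
    /** Hypothetical stand-in for the load/save step; not a library type. */
    interface Converter
    {
        void convert(InputStream in, OutputStream out) throws Exception;
    }

    /** Runs the converter over the bytes and returns the output as a UTF-8 string, or null on failure. */
    static String convertToString(byte[] template, Converter converter)
    {
        // try-with-resources closes both streams, even when the converter throws
        try (InputStream is = new ByteArrayInputStream(template);
             ByteArrayOutputStream outputStream = new ByteArrayOutputStream())
        {
            converter.convert(is, outputStream);
            return outputStream.toString(StandardCharsets.UTF_8.name());
        }
        catch (Exception e)
        {
            // Report the failure instead of swallowing it
            e.printStackTrace();
            return null;
        }
    }

    public static void main(String[] args)
    {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        // Identity converter, used here only for demonstration
        String result = convertToString(data, (in, out) -> {
            int b;
            while ((b = in.read()) != -1)
            {
                out.write(b);
            }
        });
        System.out.println(result);
    }
}
```

In the original method, the body of the lambda would be replaced by the existing `new Document(is)` and `doc.save(outputStream, options)` calls.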
