Convert PDF to HTML error

Using Aspose.Pdf to convert a PDF to HTML succeeds on Windows, but on CentOS 6.5 the conversion fails without throwing any exception.
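Because the conversion reportedly fails without an exception, one way to make the failure visible is to check the output explicitly after `doc.save(...)` returns. The following is a minimal JDK-only sketch; `OutputCheck` and `requireNonEmptyFile` are hypothetical names, and the check only confirms that a non-empty file exists at the target path, not that the HTML inside it is correct.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: turns a "silent" conversion failure into an explicit exception.
final class OutputCheck {

    private OutputCheck() {}

    /**
     * Throws an IOException if targetPath is missing, is not a regular file,
     * or is empty. Otherwise returns the file size in bytes.
     */
    static long requireNonEmptyFile(String targetPath) throws IOException {
        Path path = Paths.get(targetPath);
        if (!Files.isRegularFile(path)) {
            throw new IOException("Conversion produced no output file: " + path.toAbsolutePath());
        }
        long size = Files.size(path);
        if (size == 0) {
            throw new IOException("Conversion produced an empty output file: " + path.toAbsolutePath());
        }
        return size;
    }
}
```

It would be called right after the save, e.g. `OutputCheck.requireNonEmptyFile(targetPath);`, so that a missing or empty result surfaces as an exception on CentOS just as any other I/O error would.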



Hi Jiaokun,


Thanks for contacting support.

We are looking into the scenario in the environment you specified and will get back to you shortly. Please be patient.

Best Regards,

Do you have any test results?

Hi Jiaokun,


Thanks for your patience. We are working on configuring the specified environment, and once we have set it up, we will be able to test the scenario. We will update you as soon as we have some results. Please be patient and spare us a little time.

Best Regards,

Hi Jiaokun,

Thanks for your patience.

I tried converting your shared PDF into HTML in the specified environment (CentOS 6.5) with JRE 1.8 and Aspose.Pdf for Java 17.4.0, and I did not notice any issue. The output, which I have also attached for your reference, was generated correctly by the following code snippet:

com.aspose.pdf.Document doc = new com.aspose.pdf.Document(dataDir + "test.pdf");
HtmlSaveOptions options = new HtmlSaveOptions();
doc.save(dataDir + "PDFToHtml2_out.html", options);

We would appreciate it if you could share more information about the scenario, i.e. the JDK version and the API version, along with the code snippet you are using, so that we can test the scenario again in our environment and address it accordingly.
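The JDK and OS details requested here can be collected programmatically from standard JVM system properties. A minimal sketch follows; `EnvironmentReport` is a hypothetical name, the property list is an assumption about what is useful, and the Aspose.Pdf version is not included because it should be read from the JAR actually deployed. `file.encoding` is listed because the default encoding can differ between a Windows and a Linux installation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that gathers the environment details requested by support.
final class EnvironmentReport {

    private EnvironmentReport() {}

    // Standard JVM system properties describing the runtime and the OS.
    private static final String[] KEYS = {
        "java.version", "java.vendor", "java.vm.name",
        "os.name", "os.version", "os.arch",
        "file.encoding", "user.language"
    };

    /** Returns the selected properties in a stable order ("<unset>" if a property is missing). */
    static Map<String, String> collect() {
        Map<String, String> info = new LinkedHashMap<>();
        for (String key : KEYS) {
            info.put(key, System.getProperty(key, "<unset>"));
        }
        return info;
    }

    /** Formats the report as "key=value" lines, ready to paste into a reply. */
    static String format() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : collect().entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

Running it once on each machine, e.g. `System.out.print(EnvironmentReport.format());`, gives two reports that can be compared side by side.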

Best Regards,

@Override
public void doAnalysis() throws Exception
{
    // Simply load the source file as a PDF Document object
    Document doc = new Document(getParam(ParamConstrant.SOURCEPATH, String.class));
    addParam(ParamConstrant.DOCUMENT, doc);
}

@Override
public void doConvert() throws Exception
{
    Document doc = getParam(ParamConstrant.DOCUMENT, Document.class);
    if (null != doc)
    {
        doc.save(getParam(ParamConstrant.TARGETPATH, String.class), getHtmlSaveOptions(getParam(ParamConstrant.TARGETPATH, String.class)));
    }
}
/**
 * Gets the HtmlSaveOptions
 * @param targetPath path of the HTML file to write
 * @return the configured HtmlSaveOptions
 */
protected HtmlSaveOptions getHtmlSaveOptions(final String targetPath)
{
    HtmlSaveOptions newOptions = new HtmlSaveOptions();
    newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
    newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
    // This controls whether images are embedded into the HTML
    newOptions.PartsEmbeddingMode = AnalysisConfig.getBoolean(ParamConstrant.COMPRESSIMAGE, false) ? HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml : HtmlSaveOptions.PartsEmbeddingModes.EmbedCssOnly;
    newOptions.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
    newOptions.setSplitIntoPages(false);
    newOptions.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy()
    {
        @Override
        public void invoke(HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo)
        {
            byte[] resultHtmlAsBytes = new byte[(int)htmlSavingInfo.ContentStream.getLength()];
            htmlSavingInfo.ContentStream.read(resultHtmlAsBytes, 0, resultHtmlAsBytes.length);
            FileOutputStream fos = null;
            try
            {
                LOG.info("Started writing HTML file [" + targetPath + "]...");
                // Consider the character encoding
                fos = new FileOutputStream(targetPath);
                fos.write(resultHtmlAsBytes);
                fos.flush();
                LOG.info("Finished writing HTML file [" + targetPath + "]...");
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }
            finally
            {
                try
                {
                    if (null != fos)
                    {
                        fos.close();
                    }
                }
                catch (Exception e)
                {
                    LOG.error(e.getMessage());
                }
            }
        }
    };
    return newOptions;
}
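The snippet above depends on `getParam`/`addParam`, the `ParamConstrant` keys and `AnalysisConfig`, none of which are shown, so it cannot be compiled on its own. To reproduce the scenario outside the original project, hypothetical stand-ins with the same call shapes could be used. Every class name and key value below is an assumption, and `AnalysisConfig.getBoolean` is assumed to read JVM system properties (e.g. `-DcompressImage=true`).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the AnalysisConfig class referenced in the post.
// Assumption: configuration flags are supplied as JVM system properties.
final class AnalysisConfig {

    private AnalysisConfig() {}

    /** Returns the boolean system property named key, or defaultValue if it is unset. */
    static boolean getBoolean(String key, boolean defaultValue) {
        String value = System.getProperty(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }
}

// Hypothetical map-backed holder with the same call shapes as getParam/addParam.
// The real classes could expose these as inherited methods instead.
class ParamHolder {

    // Hypothetical key values; the real ParamConstrant constants are not shown.
    static final String SOURCEPATH = "sourcePath";
    static final String TARGETPATH = "targetPath";
    static final String DOCUMENT = "document";
    static final String COMPRESSIMAGE = "compressImage";

    private final Map<String, Object> params = new HashMap<>();

    void addParam(String key, Object value) {
        params.put(key, value);
    }

    /** Returns the value under key cast to type (null if absent); throws ClassCastException on a mismatch. */
    <T> T getParam(String key, Class<T> type) {
        return type.cast(params.get(key));
    }
}
```

With these stand-ins, `getHtmlSaveOptions` would select `EmbedAllIntoHtml` only when the property is `true`, and `EmbedCssOnly` otherwise.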

Hi Jiaokun,

Thanks for sharing code snippet.

I was unable to run your code snippet as-is because some objects/classes in it are not defined (e.g. AnalysisConfig.getBoolean(ParamConstrant.COMPRESSIMAGE, false)). I also noticed that your code snippet uses an older version of the API. Please check the following code snippet, in which I converted your PDF into HTML with the latest version of the API by modifying the highlighted lines of code.

import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.commons.io.IOUtils;
import com.aspose.pdf.*;

public void PDFToHtml()
{
    com.aspose.pdf.Document doc = new com.aspose.pdf.Document(dataDir + "test.pdf");
    doc.save(dataDir + "PDFToHtml_out.html", getHtmlSaveOptions(dataDir + "xys.html"));
}

protected HtmlSaveOptions getHtmlSaveOptions(final String targetPath)
{
    HtmlSaveOptions newOptions = new HtmlSaveOptions();
    newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
    newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
    newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
    newOptions.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
    newOptions.setSplitIntoPages(false);
    newOptions.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy()
    {
        @Override
        public void invoke(HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo)
        {
            try
            {
                // Read the complete generated markup; toByteArray reads until end-of-stream,
                // so no further read() calls on ContentStream are needed.
                byte[] resultHtmlAsBytes = IOUtils.toByteArray(htmlSavingInfo.ContentStream);
                // try-with-resources closes the file even if write() fails
                try (FileOutputStream fos = new FileOutputStream(targetPath))
                {
                    fos.write(resultHtmlAsBytes);
                    fos.flush();
                }
            }
            catch (IOException e)
            {
                e.printStackTrace();
            }
        }
    };
    return newOptions;
}
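`IOUtils.toByteArray(...)` matters here because a single `read(buffer, 0, length)` call, as in the earlier snippet, is allowed to return fewer bytes than requested, which could silently truncate the markup. If Apache Commons IO is not on the classpath, a JDK 8 compatible sketch (matching the JRE 1.8 used in the test on CentOS) could look like the following. `StreamUtil` and its methods are hypothetical names, and the sketch assumes `ContentStream` is a `java.io.InputStream`, as the `IOUtils.toByteArray(...)` call implies.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical JDK 8 compatible helpers for the custom HTML saving strategy.
final class StreamUtil {

    private StreamUtil() {}

    /** Reads in until end-of-stream; read() may return fewer bytes than requested, so loop until -1. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /**
     * Copies in to targetPath, replacing any existing file and creating missing
     * parent directories. Returns the number of bytes written. Does not close in.
     */
    static long copyToFile(InputStream in, String targetPath) throws IOException {
        Path target = Paths.get(targetPath).toAbsolutePath();
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Inside `invoke(...)`, a single `StreamUtil.copyToFile(htmlSavingInfo.ContentStream, targetPath);` wrapped in `try`/`catch (IOException e)` would replace both the read and the `FileOutputStream` handling.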

The output generated by the above code is also attached for your reference. Please try the latest version of the API, Aspose.Pdf for Java 17.4.0, and if you still face any issue, please let us know by sharing a sample application that shows the entire execution routine. This way we can try to reproduce the issue in our environment.

Best Regards,