Aspose.PDF for Java: PDF to HTML conversion output is mostly blank

xiangma · March 23, 2020, 1:36pm

pdf转换出来显示空白.zip (479.6 KB)
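When converting a PDF to HTML, the converted output is mostly blank. The attachment contains the original file and the conversion result.

Code: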
private void pdf2Html(String sourceFileName, String targetFileName) throws Exception {
    com.aspose.pdf.Document document = new com.aspose.pdf.Document(sourceFileName);

    // Output directory: the target file name without its extension
    File path = new File(targetFileName.substring(0, targetFileName.lastIndexOf('.')));
    path.mkdirs();

    Resolution resolution = new Resolution(200);
    JpegDevice device = new JpegDevice(1080, 1920, resolution);
    PageCollection pages = document.getPages();
    // Page numbers are 1-based; the image files are numbered from 0
    for (int i = 1; i <= pages.size(); i++) {
        File file = new File(path + "/" + (i - 1) + ".jpg"); // output path
        file.createNewFile();
        FileOutputStream fileOS = new FileOutputStream(file);
        device.process(pages.get_Item(i), fileOS);
        fileOS.close();
    }
    buildSlideMobileHtml(targetFileName, path, pages.size());
    document.close();
}

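Dependency version:

<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-pdf</artifactId>
    <version>20.1</version>
</dependency>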

asad.ali · March 23, 2020, 6:58pm

@xiangma

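Thank you for contacting support.

The code snippet you shared converts a PDF to JPG, whereas the issue you describe occurs during PDF to HTML conversion. Could you please share the code snippet you use to generate the HTML? We will test the scenario in our environment and address it accordingly.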

xiangma · March 24, 2020, 10:53am

You only need to run the method below.
Parameters:
sourceFileName: the source PDF file to convert, e.g. d:\test.pdf
targetFileName: the output target path, e.g. d:\test\test.html
private void pdf2Html(String sourceFileName, String targetFileName) throws Exception {
    com.aspose.pdf.Document document = new com.aspose.pdf.Document(sourceFileName);

    // Output directory: the target file name without its extension
    File path = new File(targetFileName.substring(0, targetFileName.lastIndexOf('.')));
    path.mkdirs();

    Resolution resolution = new Resolution(200);
    JpegDevice device = new JpegDevice(1080, 1920, resolution);
    PageCollection pages = document.getPages();
    // Page numbers are 1-based; the image files are numbered from 0
    for (int i = 1; i <= pages.size(); i++) {
        File file = new File(path + "/" + (i - 1) + ".jpg"); // output path
        file.createNewFile();
        FileOutputStream fileOS = new FileOutputStream(file);
        device.process(pages.get_Item(i), fileOS);
        fileOS.close();
    }
    document.close();
}
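
One detail of the method above is worth isolating: `targetFileName.substring(0, targetFileName.lastIndexOf('.'))` throws a `StringIndexOutOfBoundsException` when the target name contains no dot, and strips too much when only a parent directory contains one. The following is a minimal sketch of the same naming scheme with that case handled. The class and method names (`OutputPaths`, `stripExtension`, `pageImage`) are hypothetical and not part of the original code or of Aspose.PDF.

```java
import java.io.File;

final class OutputPaths {
    /** Returns fileName without its extension; unchanged if the last path segment has no dot. */
    static String stripExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // Accept both separators so that Windows-style paths such as d:\test\test.html work
        int sep = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        return dot > sep ? fileName.substring(0, dot) : fileName;
    }

    /** Maps a 1-based page number to the 0-based image file used by the method above. */
    static File pageImage(File dir, int pageNumber) {
        return new File(dir, (pageNumber - 1) + ".jpg");
    }

    public static void main(String[] args) {
        File dir = new File(stripExtension("d:\\test\\test.html"));
        System.out.println(dir + " -> " + pageImage(dir, 1).getName());
    }
}
```

This has no bearing on the blank output itself; it only makes the path handling more forgiving.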

asad.ali · March 24, 2020, 5:11pm

@xiangma

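Please note that the method you shared converts a PDF to JPG. We tested it with version 20.3 of the API and noticed that the output images are blank. The issue has therefore been logged in our issue tracking system as PDFJAVA-39270. We will investigate it in detail and keep you posted on the status of its correction. Please give us some time.

We are sorry for the inconvenience.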
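
For anyone who wants to confirm the symptom programmatically rather than by opening each JPG, here is a minimal sketch using only the standard `javax.imageio` and `java.awt.image` classes. It is not part of the Aspose API. The class name `BlankImageCheck` and the threshold of 250 are assumptions chosen for illustration; JPEG compression can leave near-white pixels slightly below 255, hence the tolerance.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class BlankImageCheck {
    /** Returns true when every pixel has R, G and B all at or above threshold (0-255). */
    static boolean isBlank(BufferedImage image, int threshold) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                if (r < threshold || g < threshold || b < threshold) {
                    return false; // found a pixel that is not near-white
                }
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Pass the rendered page images as arguments, e.g. 0.jpg 1.jpg
        for (String name : args) {
            BufferedImage image = ImageIO.read(new File(name));
            if (image == null) {
                System.out.println(name + ": not a readable image");
                continue;
            }
            System.out.println(name + ": " + (isBlank(image, 250) ? "blank" : "has content"));
        }
    }
}
```

Running it over the generated page images before and after a change gives a quick indication of whether the text is being rendered at all.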

asad.ali · April 27, 2020, 5:31pm

@xiangma

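The problem is related to fonts. It can be fixed by font substitution combined with converting fonts to TTF (setConvertFontsToUnicodeTTF(true)). Please see the code snippet below. The snippet was tested successfully with the following fonts: SimSun, Arial Unicode MS, Droid Sans Fallback. For better results, you can try other fonts that look closer to the original font.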

FontRepository.addLocalFontPath("...Path to substitution font..."); // Needed when the font is absent from the default font repository
FontRepository.getSubstitutions().add(new FontSubstitute());

Document pdfDocument = new Document(dataDir + "test_ing.pdf");
for (Page page : pdfDocument.getPages()) {
  long totalStart = System.currentTimeMillis();
  java.io.OutputStream imageStream = new java.io.FileOutputStream(dataDir + "Converted_Image_" + page.getNumber() + ".jpg");
  Resolution resolution = new Resolution(200);
  JpegDevice jpegDevice = new JpegDevice(1080, 1920, resolution);
  // Convert the fonts to Unicode TTF before rendering
  jpegDevice.getRenderingOptions().setConvertFontsToUnicodeTTF(true);
  jpegDevice.process(page, imageStream);
  imageStream.close();
  long totalEnd = System.currentTimeMillis();
  System.out.println("Page Number=" + page.getNumber() + " Total time taken was " + (totalEnd - totalStart) / 1000 + " seconds for conversion");
}

....

public class FontSubstitute extends CustomFontSubstitutionBase {
    public boolean trySubstitute(OriginalFontSpecification originalFontSpecification, com.aspose.pdf.Font[] substitutionFont) {
        // Log the name of the font being replaced
        System.out.println(originalFontSpecification.getOriginalFontName());
        substitutionFont[0] = FontRepository.findFont("SimSun");
        //substitutionFont[0] = FontRepository.findFont("Droid Sans Fallback");
        //substitutionFont[0] = FontRepository.findFont("Arial Unicode MS");
        return true;
    }
}
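
Because the substitution depends on the chosen font actually being present, it can help to check which of the tested candidates is installed before hard-coding one. The sketch below uses only the JDK (`java.awt.GraphicsEnvironment`); the class `FontCandidates` and its methods are hypothetical names, not Aspose API. Note the assumption: the families AWT reports are not necessarily the same set that Aspose's `FontRepository` can find, and AWT may report localized family names, so treat the result as a hint only.

```java
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

final class FontCandidates {
    /** Returns the first candidate found in installed (case-insensitive), in candidate order. */
    static Optional<String> firstInstalled(List<String> candidates, Set<String> installed) {
        for (String candidate : candidates) {
            for (String name : installed) {
                if (name.equalsIgnoreCase(candidate)) {
                    return Optional.of(name);
                }
            }
        }
        return Optional.empty();
    }

    /** Font families visible to the JVM; may differ from what FontRepository sees. */
    static Set<String> installedFamilies() {
        String[] names = GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames();
        return new TreeSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        // Candidates taken from the fonts mentioned in the reply above
        List<String> candidates = Arrays.asList("SimSun", "Arial Unicode MS", "Droid Sans Fallback");
        Optional<String> match = firstInstalled(candidates, installedFamilies());
        System.out.println(match.map(n -> "Use: " + n).orElse("None of the candidates is installed"));
    }
}
```

If none of the candidates is reported, the font file would need to be supplied through `FontRepository.addLocalFontPath` as in the snippet above.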