Free Support Forum - aspose.com

Spaces lost when converting PDF to HTML


#1

Hi,
When I convert a PDF to HTML using Aspose.PDF for Java, spaces are lost.

In my case, the text "Narrow linewidth lasers are necessary as local" is converted to "Narrowlinewidthlasersarenecessaryaslocaloscil".

Original PDF screenshot:
Image20190315113904.png (46.5 KB)
Converted HTML screenshot:
Image20190315113943.png (2.8 KB)

Here’s my test PDF:
100024.pdf (226.7 KB)

Here’s my test code:

Document pdf = new Document(pdfFile.getAbsolutePath());

HtmlSaveOptions options = new HtmlSaveOptions();
options.setFixedLayout(false);
options.setSplitIntoPages(false);
options.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsTTF;
options.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsExternalPngFilesReferencedViaSvg;
options.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedCssOnly;

pdf.save(htmlFile.getAbsolutePath(), options);


#2

@titanseason

Thank you for contacting support.

Please install the attached fonts in the default font directory, or set a path to the fonts using either FontRepository.addLocalFontPath() or the snippet below, with Aspose.PDF for Java 19.2.

String path = "path/to/my/folder";
List<String> fontPaths = FontRepository.getLocalFontPaths();
fontPaths.add(path);
FontRepository.setLocalFontPaths(fontPaths);

100024Fonts.zip

We hope this will be helpful. Please feel free to contact us if you need any further assistance.


#3

I didn’t install the fonts, but I set a path to them using FontRepository.addLocalFontPath(). The spaces are still lost.

Here’s my full test code:

public static int pdfToHtml(File pdfFile, File htmlFile) {
    try {
        addDefaultFonts(); // add fonts

        Document pdf = new Document(pdfFile.getAbsolutePath());

        HtmlSaveOptions options = new HtmlSaveOptions();
        options.setFixedLayout(false);
        options.setSplitIntoPages(false);
        options.setExtractOcrSublayerOnly(true);
        options.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsTTF;
        options.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsExternalPngFilesReferencedViaSvg;
        options.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedCssOnly;

        pdf.save(htmlFile.getAbsolutePath(), options);

    } catch (Exception e) {
        e.printStackTrace();
        return -1;
    }

    return 0;
}

private static void addDefaultFonts() {
    URL url = AsposePDF.class.getClassLoader().getResource("fonts/100024Fonts");
    if (url == null) {
        return;
    }
    String path = url.getFile();
    for (int i = 1; i <= 10; i++) {
        File file = new File(path, "100024_font" + i + ".ttf");
        if (file.exists()) { // make sure file exists
            System.out.println(file.getAbsolutePath()); // in console log: file path is correct
            FontRepository.addLocalFontPath(file.getAbsolutePath());
        }
    }
}
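
Note that the reply in #2 registers a folder ("path/to/my/folder"), while the code above passes each .ttf file to FontRepository.addLocalFontPath(). A minimal sketch of registering the folder once is given here, on the assumption (not confirmed in this thread) that addLocalFontPath() expects a directory. FontFolderRegistrar and registerFolder are hypothetical names, not part of Aspose.PDF, and the registrar argument stands in for FontRepository::addLocalFontPath so the sketch compiles without the library. It may well not resolve the space loss on its own.

```java
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Consumer;

// Hypothetical helper, not part of Aspose.PDF.
final class FontFolderRegistrar {

    /**
     * Resolves a font location to a directory and hands it to the registrar once.
     * Returns the registered directory, or null if nothing was registered.
     */
    static String registerFolder(URL fontsUrl, Consumer<String> registrar) {
        if (fontsUrl == null) {
            return null; // resource not found on the classpath
        }
        final Path path;
        try {
            // toURI() decodes escapes such as %20, which URL.getFile() leaves in place
            path = Paths.get(fontsUrl.toURI());
        } catch (URISyntaxException | IllegalArgumentException | FileSystemNotFoundException e) {
            return null; // not a plain file-system location (for example, inside a JAR)
        }
        // If a single font file was given, fall back to its parent folder
        Path folder = Files.isDirectory(path) ? path : path.getParent();
        if (folder == null || !Files.isDirectory(folder)) {
            return null;
        }
        String folderPath = folder.toAbsolutePath().toString();
        registrar.accept(folderPath);
        return folderPath;
    }
}

// Assumed usage with Aspose.PDF on the classpath (assumes a directory is accepted):
//   URL url = AsposePDF.class.getClassLoader().getResource("fonts/100024Fonts");
//   FontFolderRegistrar.registerFolder(url, FontRepository::addLocalFontPath);
```

Returning the registered path makes it easy to log what was actually passed to the font repository.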

#4

I think the key point is options.setFixedLayout(false). If I set FixedLayout to false, spaces are lost and images are missing. But when I set FixedLayout to true, the spaces and images are all present.

In my case, I need to set FixedLayout to false to keep paragraph information. So please check whether a bug is causing the spaces and images to be lost.
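
To compare the two FixedLayout settings without relying on screenshots, a rough check like the following may help. It is only a sketch: SpaceLossCheck, visibleText and spacesLost are hypothetical helpers, not part of Aspose.PDF, and the tag stripping is a regex approximation rather than a real HTML parser, so words placed in adjacent tags with no whitespace between them can produce false positives.

```java
import java.util.regex.Pattern;

// Hypothetical helper for comparing conversion outputs; not part of Aspose.PDF.
final class SpaceLossCheck {

    private static final Pattern TAGS = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Strips tags and normalizes whitespace (a rough approximation of the visible text). */
    static String visibleText(String html) {
        String noTags = TAGS.matcher(html).replaceAll("");
        // Treat non-breaking spaces like ordinary spaces
        String decoded = noTags.replace("&nbsp;", " ").replace('\u00A0', ' ');
        return WHITESPACE.matcher(decoded).replaceAll(" ").trim();
    }

    /** True if the phrase is found only in its collapsed form, with all spaces removed. */
    static boolean spacesLost(String html, String expectedPhrase) {
        String text = visibleText(html);
        if (text.contains(expectedPhrase)) {
            return false; // phrase survived with its spaces intact
        }
        String collapsed = expectedPhrase.replace(" ", "");
        return text.contains(collapsed);
    }
}

// Assumed usage: read each converted file into a String (for example with
// java.nio.file.Files.readAllBytes) and call
//   SpaceLossCheck.spacesLost(html, "Narrow linewidth lasers are necessary as local");
// once for the FixedLayout=true output and once for the FixedLayout=false output.
```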


#5

@titanseason

We have logged a ticket with ID PDFJAVA-38436 to investigate the problem. The ticket ID has been linked to this thread so that you will receive a notification as soon as the ticket is resolved.

We are sorry for the inconvenience.