Parts of text get slightly up-shifted when saving a PDF file in HTML format

craig.w.su · April 26, 2017, 1:40am

Hi There

We are working on saving PDF files in HTML format with Aspose PDF 17.2.

Here is the method we call for testing:

public static void convertPDF(String filePath, String password) throws Exception {
    License l = new License();
    l.setLicense("aspose.lic");

    // Register substitution rules for the CJK fonts referenced by the PDF.
    FontRepository.getSubstitutions()
            .add(new ReadPdfFontSubRule("DFKaiShu", FontRepository.findFont("cwTeX Q Kai Medium")));
    FontRepository.getSubstitutions()
            .add(new ReadPdfFontSubRule("標楷體", FontRepository.findFont("cwTeX Q Kai Medium")));
    FontRepository.getSubstitutions()
            .add(new ReadPdfFontSubRule("PMingLiU", FontRepository.findFont("cwTeX Q Ming Medium")));

    // Open the document, with the password if one was supplied.
    Document pdf = null;
    if (StringUtil.isNotEmpty(password)) {
        pdf = new Document(filePath, password);
    } else {
        pdf = new Document(filePath);
    }

    HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
    htmlSaveOps.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
    htmlSaveOps.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsWOFF;
    htmlSaveOps.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
    htmlSaveOps.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
    htmlSaveOps.setSplitIntoPages(false);

    String dirName = UUID.randomUUID().toString();
    File file = new File("" + dirName);
    file.mkdirs();

    // Convert each page separately so that every page gets its own HTML file.
    for (int p = 1; p <= pdf.getPages().size(); p++) {
        Document pageDoc = new Document();
        pageDoc.getPages().add(pdf.getPages().get_Item(p));
        final ByteArrayOutputStream stream = new ByteArrayOutputStream();

        // Collect the generated markup into the in-memory stream via the custom saving callback.
        htmlSaveOps.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy() {
            @Override
            public void invoke(com.aspose.pdf.HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo) {
                try {
                    byte[] resultHtmlAsBytes = new byte[(int) htmlSavingInfo.ContentStream.available()];
                    htmlSavingInfo.ContentStream.read(resultHtmlAsBytes, 0, resultHtmlAsBytes.length);
                    stream.write(resultHtmlAsBytes);
                    stream.close();
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        };

        String outHtmlFile = "SomeUnexistingFile.html";
        pageDoc.save(outHtmlFile, htmlSaveOps);
        IOUtils.write(stream.toByteArray(), new FileOutputStream("" + dirName + "/" + p + ".html"));
    }
}

static class ReadPdfFontSubRule extends CustomFontSubstitutionBase {
    private String originFontName;
    private Font replaceFont;

    public ReadPdfFontSubRule(String originFontName, Font replaceFont) {
        this.originFontName = originFontName;
        this.replaceFont = replaceFont;
    }

    @Override
    public boolean trySubstitute(CustomFontSubstitutionBase.OriginalFontSpecification originalFontSpecification,
            Font[] substitutionFonts) {
        String fontName = originalFontSpecification.getOriginalFontName();
        // Re-decode the name as BIG5 in case its bytes were read as ISO-8859-1.
        String decodedName = new String(
                originalFontSpecification.getOriginalFontName().getBytes(Charset.forName("ISO-8859-1")),
                Charset.forName("BIG5"));
        // Match on either the raw name or the re-decoded name.
        if (fontName.startsWith(this.originFontName) || decodedName.startsWith(originFontName)) {
            substitutionFonts[0] = replaceFont;
            System.out.printf("Replace font: " + originFontName + " -> " + replaceFont.getFontName());
            return true;
        } else {
            return false;
        }
    }
}
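
For readers unfamiliar with the `decodedName` step above: it relies on ISO-8859-1 mapping every byte value to exactly one character, so a name whose Big5 bytes were misread as ISO-8859-1 can be turned back into bytes and decoded again as Big5. The standalone sketch below illustrates only that round trip with the standard library. The class and method names (`FontNameDecoding`, `recoverBig5`) are hypothetical and not part of Aspose, and it assumes the JDK in use ships the Big5 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Aspose.PDF: illustrates the re-decoding trick only.
final class FontNameDecoding {
    // Assumption: the running JDK provides the Big5 charset.
    private static final Charset BIG5 = Charset.forName("Big5");

    private FontNameDecoding() {}

    /** Reinterprets a name whose Big5 bytes were decoded as ISO-8859-1. */
    static String recoverBig5(String misreadName) {
        // ISO-8859-1 maps each char U+0000..U+00FF back to the identical byte value.
        byte[] rawBytes = misreadName.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, BIG5);
    }

    public static void main(String[] args) {
        String original = "\u6A19\u6977\u9AD4"; // the font name 標楷體
        // Simulate the misread: Big5 bytes interpreted as ISO-8859-1 characters.
        String misread = new String(original.getBytes(BIG5), StandardCharsets.ISO_8859_1);
        System.out.println(recoverBig5(misread).equals(original));
        // Pure ASCII names pass through unchanged, since Big5 is ASCII-compatible.
        System.out.println(recoverBig5("DFKaiShu"));
    }
}
```

Because ASCII names survive the round trip unchanged, one rule can check both the raw and the re-decoded name, which is what the `startsWith` test in `trySubstitute` does.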

The PDF file, its HTML result, and the fonts used for substitution are included in the attachment.

Our QA team pointed out that in the HTML result, some parts of the text appear to be shifted slightly upward.

Please check this issue.

If there is another way to improve this conversion, please let us know. Thank you.
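
One possible improvement, unrelated to the text shift itself: the `invoke` callback sizes its buffer from `available()`, which `java.io.InputStream` documents as an estimate of the bytes readable without blocking rather than the total length. A minimal sketch of a loop-based copy follows, assuming `ContentStream` behaves like a standard `java.io.InputStream`. The class and helper names (`StreamCopy`, `readFully`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reads an InputStream to the end without relying on available().
final class StreamCopy {
    private StreamCopy() {}

    /** Copies everything up to end-of-stream into a byte array. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() returns -1 only at end-of-stream, so short reads are handled naturally.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[] copy = readFully(new ByteArrayInputStream(data));
        System.out.println(copy.length == data.length);
    }
}
```

Inside the callback, the two lines that allocate and fill `resultHtmlAsBytes` could then be replaced by a single call to such a helper.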

Craig

imran.rafique · April 26, 2017, 1:21pm

Hi Craig,

Thank you for contacting support. We managed to replicate the problem of slightly up-shifted text in the table, as shown in your screenshot. It has been logged under ticket ID PDFJAVA-36707 in our bug tracking system for investigation. We will let you know once significant progress has been made. We are sorry for the inconvenience.