Hi there,
We are working on converting PDF files to HTML with Aspose.PDF 17.2.
Here is the method we call in our test:
public static void convertPDF(String filePath, String password) throws Exception {
License l = new License();
l.setLicense("aspose.lic");
FontRepository.getSubstitutions()
.add(new ReadPdfFontSubRule("DFKaiShu", FontRepository.findFont("cwTeX Q Kai Medium")));
FontRepository.getSubstitutions()
.add(new ReadPdfFontSubRule("標楷體", FontRepository.findFont("cwTeX Q Kai Medium")));
FontRepository.getSubstitutions()
.add(new ReadPdfFontSubRule("PMingLiU", FontRepository.findFont("cwTeX Q Ming Medium")));
Document pdf = null;
if (StringUtil.isNotEmpty(password)) {
pdf = new Document(filePath, password);
} else {
pdf = new Document(filePath);
}
HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
htmlSaveOps.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
htmlSaveOps.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsWOFF;
htmlSaveOps.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
htmlSaveOps.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
htmlSaveOps.setSplitIntoPages(false);
String dirName = UUID.randomUUID().toString();
File file = new File("" + dirName);
file.mkdirs();
for (int p = 1; p <= pdf.getPages().size(); p++) {
Document pageDoc = new Document();
pageDoc.getPages().add(pdf.getPages().get_Item(p));
final ByteArrayOutputStream stream = new ByteArrayOutputStream();
htmlSaveOps.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy() {
@Override
public void invoke(com.aspose.pdf.HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo) {
try {
byte[] resultHtmlAsBytes = new byte[(int) htmlSavingInfo.ContentStream.available()];
htmlSavingInfo.ContentStream.read(resultHtmlAsBytes, 0, resultHtmlAsBytes.length);
stream.write(resultHtmlAsBytes);
stream.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
};
String outHtmlFile = "SomeUnexistingFile.html";
pageDoc.save(outHtmlFile, htmlSaveOps);
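// Close the output file explicitly; IOUtils.write does not close the stream it is given.
try (FileOutputStream out = new FileOutputStream(dirName + "/" + p + ".html")) {
IOUtils.write(stream.toByteArray(), out);
}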
}
}
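One thing we are unsure about in the callback above: it sizes the buffer from available() and does a single read(). In general, InputStream.available() is only an estimate and read() may return fewer bytes than requested, so the HTML could in principle be truncated. Below is a minimal sketch of a loop-based copy we are considering instead. The class name StreamCopySketch and the helper readFully are our own hypothetical names, not part of Aspose.PDF, and we have not confirmed that the content stream actually needs this.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamCopySketch {

    // Hypothetical helper (not an Aspose API): drains the stream in a loop
    // until read() reports end of stream, instead of trusting available().
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the page markup stream; the real one would come from the saving callback.
        byte[] html = "<html><body>page 1</body></html>".getBytes("UTF-8");
        byte[] copy = readFully(new ByteArrayInputStream(html));
        System.out.println("Copied " + copy.length + " of " + html.length + " bytes");
    }
}
```

Inside the callback, the idea would be to call readFully on the content stream and write the returned bytes to the ByteArrayOutputStream.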
static class ReadPdfFontSubRule extends CustomFontSubstitutionBase {
private String originFontName;
private Font replaceFont;
public ReadPdfFontSubRule(String originFontName, Font replaceFont) {
this.originFontName = originFontName;
this.replaceFont = replaceFont;
}
@Override
public boolean trySubstitute(CustomFontSubstitutionBase.OriginalFontSpecification originalFontSpecification,
Font[] substitutionFonts) {
String fontName = originalFontSpecification.getOriginalFontName();
String decodedName = new String(
originalFontSpecification.getOriginalFontName().getBytes(Charset.forName("ISO-8859-1")),
Charset.forName("BIG5"));
if (fontName.startsWith(this.originFontName) || decodedName.startsWith(originFontName)) {
substitutionFonts[0] = replaceFont;
System.out.println("Replace font: " + originFontName + " -> " + replaceFont.getFontName());
return true;
} else {
return false;
}
}
}
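To explain the decodedName step in trySubstitute: we assume that some font names in the PDF are stored as Big5 bytes but reach us as a string decoded byte-for-byte as ISO-8859-1, so we convert the string back to bytes and decode it again as Big5. The following is a minimal standalone sketch of that round trip. The class and method names are hypothetical, the misread name is simulated rather than taken from the attached PDF, and it assumes the JVM ships the Big5 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FontNameDecodeSketch {

    // Hypothetical helper: recovers the raw bytes of a name that was decoded
    // as ISO-8859-1 and decodes them again as Big5.
    public static String redecodeAsBig5(String rawName) {
        byte[] bytes = rawName.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, Charset.forName("Big5"));
    }

    public static void main(String[] args) {
        String original = "\u6a19\u6977\u9ad4"; // the font name 標楷體

        // Simulate the assumed failure mode: Big5 bytes read as ISO-8859-1.
        String misread = new String(
                original.getBytes(Charset.forName("Big5")),
                StandardCharsets.ISO_8859_1);

        String recovered = redecodeAsBig5(misread);
        System.out.println("Recovered name matches: " + recovered.equals(original));
    }
}
```

ISO-8859-1 maps every byte value to exactly one character, which is why the original bytes can be recovered without loss before decoding them as Big5.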
The PDF file, the HTML result, and the fonts used for substitution are attached.
Our QA team pointed out that in the HTML result, some parts of the text appear to be shifted slightly upward.
Please look into this issue.
If there is another way to improve the output, please let us know. Thank you.
Craig