Free Support Forum - aspose.com

Characters missing after font replacing and conversion to HTML

Hi Aspose team


Because of broken fonts in the HTML conversion result, we decided to replace the fonts that cause the problem wherever we can.
However, the HTML conversion result of one PDF file is still missing characters with Aspose PDF 11.7.0.

Here is our method to replace fonts:
import java.nio.charset.Charset;
import java.util.Iterator;
import com.aspose.pdf.*;

protected Document checkAndReplaceFont(Document doc, String fontName,
        Font replaceFont) {

    // Collect every text fragment; RemoveUnusedFonts drops fonts left unused after replacement
    TextFragmentAbsorber absorber = new TextFragmentAbsorber(
            new TextEditOptions(
                    TextEditOptions.FontReplace.RemoveUnusedFonts));
    doc.getPages().accept(absorber);
    TextFragmentCollection textFragmentCollection = absorber
            .getTextFragments();

    for (@SuppressWarnings("unchecked") Iterator<TextFragment> iterator = textFragmentCollection.iterator(); iterator.hasNext();) {
        TextFragment textFragment = iterator.next();
        String encodedFontName = textFragment.getTextState().getFont()
                .getFontName();
        // Reinterpret the name's ISO-8859-1 bytes as Big5 to recover the Chinese font name
        String decodedFontName = new String(
                encodedFontName.getBytes(Charset.forName("ISO-8859-1")),
                Charset.forName("BIG5"));

        // System.out.println(decodedFontName);

        if (decodedFontName.startsWith(fontName)) {
            // System.out.println("replace " + decodedFontName);
            textFragment.getTextState().setFont(replaceFont);
        }
    }
    return doc;

}
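The Big5 re-decoding step in this method assumes that the font name's raw bytes are Big5 and that each byte came back as one ISO-8859-1 character. The JDK-only sketch below isolates that step so it can be checked without Aspose.PDF. The class and method names (FontNameDecoding, decodeBig5FontName, matchesFont) are hypothetical. It also strips the six-uppercase-letter subset tag (e.g. "ABCDEF+") that the PDF format prepends to the names of embedded font subsets, which a plain startsWith check would otherwise miss. PMingLiU's Chinese name, 新細明體, is written with Unicode escapes in the code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class FontNameDecoding {

    private static final Charset BIG5 = Charset.forName("Big5");

    // Assumption: the name is raw Big5 bytes that were turned into a String
    // one byte per ISO-8859-1 character. Re-encoding recovers those bytes.
    static String decodeBig5FontName(String rawName) {
        byte[] originalBytes = rawName.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, BIG5);
    }

    // Removes a subset tag such as "ABCDEF+" if one is present.
    static String stripSubsetTag(String fontName) {
        return fontName.matches("[A-Z]{6}\\+.*") ? fontName.substring(7) : fontName;
    }

    static boolean matchesFont(String rawName, String wantedPrefix) {
        return stripSubsetTag(decodeBig5FontName(rawName)).startsWith(wantedPrefix);
    }

    public static void main(String[] args) {
        String pMingLiUChinese = "\u65B0\u7D30\u660E\u9AD4";
        // Simulate the garbled name: Big5 bytes read back as ISO-8859-1
        String garbled = new String(pMingLiUChinese.getBytes(BIG5), StandardCharsets.ISO_8859_1);
        System.out.println(decodeBig5FontName(garbled).equals(pMingLiUChinese)); // true
        System.out.println(matchesFont("ABCDEF+PMingLiU", "PMingLiU"));         // true
    }
}
```

The ISO-8859-1 round trip is lossless for every byte value, which is why it can safely undo a single wrong decoding step; it cannot repair a name that was already decoded with a lossy charset.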

And here is our code for testing the conversion:
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.commons.io.IOUtils;
import com.aspose.pdf.*;

Font defaultFont = FontRepository.findFont("HanWangMingLight", true);
String fontToReplace = "PMingLiU";
Document pdf = new Document("custom/input/pdf/2013032201.pdf");
this.checkAndReplaceFont(pdf, fontToReplace, defaultFont);

// Convert each page separately into its own HTML file
for (int p = 1; p <= pdf.getPages().size(); p++) {
    Document pageDoc = new Document();
    pageDoc.getPages().add(pdf.getPages().get_Item(p));
    HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
    htmlSaveOps.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
    htmlSaveOps.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsWOFF;
    htmlSaveOps.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
    htmlSaveOps.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
    htmlSaveOps.setSplitIntoPages(false);
    final ByteArrayOutputStream stream = new ByteArrayOutputStream();
    // Capture the generated HTML markup in memory
    htmlSaveOps.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy() {
        @Override
        public void invoke(
                com.aspose.pdf.HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo) {
            byte[] resultHtmlAsBytes = new byte[(int) htmlSavingInfo.ContentStream
                    .getLength()];
            htmlSavingInfo.ContentStream.read(resultHtmlAsBytes, 0,
                    resultHtmlAsBytes.length);
            try {
                stream.write(resultHtmlAsBytes);
                stream.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    };

    // Placeholder name; the page markup is handed to the custom strategy above
    String outHtmlFile = "SomeUnexistingFile.html";
    pageDoc.save(outHtmlFile, htmlSaveOps);
    try (FileOutputStream out = new FileOutputStream("custom/output/pdf/2013032201." + p + ".html")) {
        IOUtils.write(stream.toByteArray(), out);
    }
}

Is there any option we can use to fix this problem?
I've uploaded a comparison image, the conversion result, the font we used as the replacement, and the original PDF file.
Please look into this problem. Thank you for your help.

Best,
Craig


Hi Craig,


Thanks for your inquiry. We would appreciate it if you could check whether your substitute font (HanWangMingLight) contains all the characters of the original font (PMingLiU). I tested the scenario with "Arial Unicode MS" as the substitute and could not reproduce the reported issue with the following code:
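One way to run this check outside Aspose is java.awt.Font.canDisplay(int), applied to the text that renders incorrectly. The sketch below assumes the substitute font can be loaded from its .ttf file with Font.createFont; the GlyphCoverage class is hypothetical, and the IntPredicate overload lets the logic be exercised without a real font file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class GlyphCoverage {

    // Returns each distinct non-whitespace character of sampleText
    // that the predicate reports as not displayable.
    static List<String> missingCharacters(String sampleText, IntPredicate canDisplay) {
        List<String> missing = new ArrayList<>();
        sampleText.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .filter(canDisplay.negate())
                .distinct()
                .forEach(cp -> missing.add(new String(Character.toChars(cp))));
        return missing;
    }

    // Convenience overload for a java.awt.Font, for example one loaded with
    // Font.createFont(Font.TRUETYPE_FONT, new File("HanWangMingLight.ttf")).
    static List<String> missingCharacters(String sampleText, java.awt.Font font) {
        return missingCharacters(sampleText, font::canDisplay);
    }

    public static void main(String[] args) {
        // Toy predicate standing in for a font that only covers ASCII letters
        IntPredicate asciiLettersOnly = cp -> cp < 128 && Character.isLetter(cp);
        System.out.println(missingCharacters("abc \u4E2D\u6587 abc", asciiLettersOnly).size()); // 2
    }
}
```

An empty result for the document's text suggests the substitute covers those characters; a non-empty one lists exactly which glyphs the replacement font lacks.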

import java.io.FileOutputStream;
import java.io.IOException;
import com.aspose.pdf.*;

Document pdf = new Document("D:/Downloads/2013032201.pdf");

// Substitute the fonts
CustomSubst subst = new CustomSubst();
FontRepository.getSubstitutions().add(subst);

// Log every substitution made during conversion
pdf.FontSubstitution.add(new Document.FontSubstitutionHandler() {
    public void invoke(Font font, Font newFont) {
        System.out.println("Warning: Font " + font.getFontName() + " was substituted with another font -> " +
                newFont.getFontName());
    }
});

HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
htmlSaveOps.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
htmlSaveOps.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsWOFF;
htmlSaveOps.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
htmlSaveOps.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
htmlSaveOps.setSplitIntoPages(false);

htmlSaveOps.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy() {
    @Override
    public void invoke(
            com.aspose.pdf.HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo) {
        byte[] resultHtmlAsBytes = new byte[(int) htmlSavingInfo.ContentStream
                .getLength()];
        htmlSavingInfo.ContentStream.read(resultHtmlAsBytes, 0,
                resultHtmlAsBytes.length);
        // Write the generated markup to the output file; the stream is closed automatically
        try (FileOutputStream stream = new FileOutputStream("E:/data/2013032201.html")) {
            stream.write(resultHtmlAsBytes);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
};

String outHtmlFile = "SomeUnexistingFile.html";
pdf.save(outHtmlFile, htmlSaveOps);

--------

class CustomSubst extends CustomFontSubstitutionBase {
    // Replace any font whose name contains "PMingLiU" with Arial Unicode MS
    public boolean trySubstitute(CustomFontSubstitutionBase.OriginalFontSpecification originalFontSpecification, /*out*/ com.aspose.pdf.Font[] substitutionFont) {
        if (originalFontSpecification.getOriginalFontName().contains("PMingLiU")) {
            substitutionFont[0] = FontRepository.findFont("Arial Unicode MS");
            return true;
        } else {
            return false;
        }
    }
}
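When several fonts need replacing, the single contains() check in CustomSubst could be driven by a small table instead. Below is a JDK-only sketch of that lookup; SubstitutionRules and replacementFor are hypothetical names, not Aspose.PDF API. Inside trySubstitute, a present result would be passed to FontRepository.findFont as in the code above, and an empty result would mean returning false.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class SubstitutionRules {

    // Name fragment -> replacement font name.
    // LinkedHashMap keeps insertion order, so the first matching rule wins.
    private final Map<String, String> rules = new LinkedHashMap<>();

    SubstitutionRules add(String nameFragment, String replacement) {
        rules.put(nameFragment, replacement);
        return this;
    }

    // Same contains() test as CustomSubst; empty means "keep the original font".
    Optional<String> replacementFor(String originalFontName) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (originalFontName.contains(rule.getKey())) {
                return Optional.of(rule.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SubstitutionRules rules = new SubstitutionRules()
                .add("PMingLiU", "Arial Unicode MS")
                .add("MingLiU", "Arial Unicode MS");
        System.out.println(rules.replacementFor("ABCDEF+PMingLiU")); // Optional[Arial Unicode MS]
        System.out.println(rules.replacementFor("Helvetica"));       // Optional.empty
    }
}
```

Because contains() matches substrings, rule order matters when one fragment is part of another (here "MingLiU" also matches "PMingLiU"), which is why an insertion-ordered map is used.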


We are sorry for the inconvenience.

Best Regards,

@tilal.ahmad So when can I use a simple substitution?
I am converting PDF to HTML and want to do a quick font substitution.

Here are the steps:

  • I call pdf.getFontUtilities().getAllFonts() to get all fonts in the document.
  • I check whether those fonts exist in the font directory.
  • If a font does not exist, I add FontRepository.getSubstitutions().add(new SimpleFontSubstitution(fontName, "Times New Roman"));
  • Then I call pdf.save(file, options).
  • Yet it still throws a font-not-found exception.
  • Do I have to use custom font substitutions and setFont every time?
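The directory check in these steps can be sketched in plain Java. This is a hedged sketch: the MissingFontCheck class is hypothetical, its input list stands in for the getFontName() values of the fonts returned by getAllFonts(), and matching on file names is only a heuristic (Windows, for example, stores Times New Roman as times.ttf), so a robust check would read the name table inside each font file. Each reported name is where the SimpleFontSubstitution call from the list would go.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class MissingFontCheck {

    // Drops a PDF subset tag ("ABCDEF+"), then spaces, hyphens and case.
    static String normalize(String name) {
        String n = name.matches("[A-Z]{6}\\+.*") ? name.substring(7) : name;
        return n.replaceAll("[\\s-]", "").toLowerCase(Locale.ROOT);
    }

    // Normalized base names of the .ttf/.ttc/.otf files in fontDir.
    static Set<String> installedFontNames(Path fontDir) throws IOException {
        try (Stream<Path> files = Files.list(fontDir)) {
            return files
                    .map(p -> p.getFileName().toString())
                    .filter(f -> f.matches("(?i).+\\.(ttf|ttc|otf)"))
                    .map(f -> normalize(f.substring(0, f.lastIndexOf('.'))))
                    .collect(Collectors.toSet());
        }
    }

    // Heuristic: a font counts as installed if a file with the same normalized name exists.
    static List<String> missingFonts(Collection<String> usedFonts, Path fontDir) throws IOException {
        Set<String> available = installedFontNames(fontDir);
        return usedFonts.stream()
                .filter(name -> !available.contains(normalize(name)))
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("fonts");
        Files.createFile(dir.resolve("Arial.ttf"));
        List<String> used = List.of("Arial", "ABCDEF+PMingLiU");
        System.out.println(missingFonts(used, dir)); // [ABCDEF+PMingLiU]
    }
}
```

The original (tagged) name is returned rather than the normalized one, so it can be passed unchanged to whatever substitution call expects the document's own font name.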

Also, I cannot find com.aspose.pdf.Document.FontSubstitutionHandler in the code base. Is it deprecated?

@shamikjv

Thanks for contacting support.

Would you please share your sample PDF file along with a complete sample code snippet? We will test the scenario in our environment and address it accordingly.

Please also try the latest version, i.e. Aspose.PDF for Java 18.8, and let us know if you still face the missing reference issue.