Embedded font inside the HTML (PDF to HTML)

Hi,
I am using Aspose.PDF for Python via .NET (latest version) for PDF-to-HTML conversion. To make the HTML look the same as the PDF, I need to embed the fonts during the conversion.

Questions:
– How can I embed the PDF's text fonts during PDF-to-HTML conversion?
– Is there a default option for embedding fonts?
– How can I set the default font?

I have embedded the CSS by setting saveOptions.parts_embedding_mode = 1. I tried the font options in the same way, but they didn’t work. Here is the code I am using:

import aspose.pdf as ap

# Load the license
license = ap.License()
license.set_license("Aspose.PDF.Product.Family.lic")

doc = ap.Document("test.pdf")
saveOptions = ap.HtmlSaveOptions()
# saveOptions.save_full_font = True  # did not work
# saveOptions.default_font_name = "Arial"  # did not work
saveOptions.parts_embedding_mode = 1  # embed CSS only (works)
doc.save("output.html", saveOptions)

Thanks

@Md_Shaedul_Islam

Have you tried setting saveOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsWOFF;? Please try it, and if it does not work, please share your sample PDF with us so that we can test the scenario in our environment and address it accordingly.

Hi Asad,
In the Python version, the setting you suggested corresponds to font_saving_mode = 0.

As described in API.py:
if self._fontSavingMode == self.FontSavingModes.AlwaysSaveAsWOFF:
    self.__jClass.setFontSavingMode(0)

However, this option saves the fonts in a specific format to a directory; it does not embed them in the HTML file.

I want to embed the fonts inside the HTML itself (encoded).
The solution should work for any PDF, not just one specific file.

I did try what you suggested, and it saved the fonts to a directory.
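
For readers following along, the bare integers used in this thread can be given names. The sketch below is not part of the Aspose.PDF API: the class and member names are hypothetical, the values 1 ("embed css only") and 0 (AlwaysSaveAsWOFF) are taken from the posts above, and the value 0 for embedding everything is an assumption based on the C# enum name used later in this thread.

```python
from enum import IntEnum


class PartsEmbeddingMode(IntEnum):
    """Hypothetical names for the integers passed to parts_embedding_mode."""
    EMBED_ALL_INTO_HTML = 0  # assumption: matches PartsEmbeddingModes.EmbedAllIntoHtml
    EMBED_CSS_ONLY = 1       # reported in this thread as "embed css only"


class FontSavingMode(IntEnum):
    """Hypothetical name for the integer passed to font_saving_mode."""
    ALWAYS_SAVE_AS_WOFF = 0  # from the API.py excerpt quoted above


def describe(mode: IntEnum) -> str:
    """Return a readable 'NAME (value)' label, handy for logging the chosen mode."""
    return f"{mode.name} ({mode.value})"
```

With these in place, an assignment such as save_options.parts_embedding_mode = PartsEmbeddingMode.EMBED_CSS_ONLY.value reads more clearly than a bare 1; passing .value keeps the assigned object a plain int.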

@Md_Shaedul_Islam

Below is sample code that generates a single HTML file with all resources embedded in it, i.e. it does not create any additional folder alongside the output HTML:

[C#]

Document doc = new Document(dataDir + "test.pdf");
foreach (Page page in doc.Pages)
{
 Document newDoc = new Document();
 newDoc.Pages.Add(page);
 HtmlSaveOptions newOptions = new HtmlSaveOptions();
 // embed all resources into the HTML file (this is the feature in question)
 newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
 // this is just an optimization for IE and can be omitted
 newOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
 newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
 newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
 newOptions.RemoveEmptyAreasOnTopAndBottom = true;
 string outHtmlFile = dataDir + DateTime.Now.Millisecond + @".html";
 newDoc.Save(outHtmlFile, newOptions);
}

Would you kindly share your sample PDF with us if none of the suggested methods works for you? This would help us investigate the case.
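
One way to confirm that a saved file really is self-contained is to inspect its @font-face rules. The helper below is a minimal sketch using only the Python standard library; it assumes the converter writes fonts as CSS @font-face rules whose url(...) targets are either data: URIs (embedded) or file paths (external), which this thread does not confirm. The function name font_sources is hypothetical.

```python
import re

# Matches each @font-face { ... } rule (rules of this kind do not nest).
FONT_FACE_RE = re.compile(r"@font-face\s*\{[^}]*\}", re.IGNORECASE)
# Captures the target of every url(...) inside a rule, with optional quotes.
URL_RE = re.compile(r"url\(\s*['\"]?([^'\")]+)", re.IGNORECASE)


def font_sources(html: str) -> dict:
    """Count embedded (data: URI) and external font sources in HTML text."""
    counts = {"embedded": 0, "external": 0}
    for rule in FONT_FACE_RE.findall(html):
        for target in URL_RE.findall(rule):
            is_embedded = target.strip().lower().startswith("data:")
            counts["embedded" if is_embedded else "external"] += 1
    return counts
```

For example, reading output.html as text and passing it to font_sources shows at a glance whether any font is still referenced from a separate file.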

Hi Asad,
Thanks for your response. Yes, it worked.

As I mentioned, I am working in Python. I am sharing my code in case it helps others.

    from io import BytesIO

    import aspose.pdf as ap

    # pdf_recover (the PDF bytes) and html_file (the output path) are defined elsewhere
    pdf_bytes = BytesIO(pdf_recover)
    converted_pdf_load = ap.Document(pdf_bytes)
    save_options = ap.HtmlSaveOptions()

    save_options.raster_images_saving_mode = 2  # presumably AsEmbeddedPartsOfPngPageBackground, as in the C# sample
    save_options.parts_embedding_mode = 0  # embed everything into the HTML (1 is "embed css only")
    # Delete all images on all pages (pages and images are indexed from 1)
    for i in range(len(converted_pdf_load.pages)):
        while len(converted_pdf_load.pages[i + 1].resources.images) != 0:
            converted_pdf_load.pages[i + 1].resources.images.delete(1)

    converted_pdf_load.save(html_file, save_options)

Is there a suitable way to avoid producing images, i.e. to remove all images automatically?

@Md_Shaedul_Islam

It is good to know that your issue has been resolved. The code you shared will certainly help others with similar requirements. We are afraid that there is no automatic way to remove images. However, if you can share a sample source file and the expected output for our reference, we will use them to better understand and investigate the requirement further.
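
In the absence of a built-in option, the manual deletion loop from the earlier post can be wrapped in a small reusable helper. This is only a sketch: the function name delete_all_images is hypothetical, and it assumes the access pattern shown in that post, namely that document.pages is indexed from 1 and that page.resources.images supports len() and delete(index) with a 1-based index.

```python
def delete_all_images(document) -> int:
    """Delete every image resource on every page; return how many were removed.

    Assumes document.pages is 1-based and page.resources.images supports
    len() and a 1-based delete(index), as in the loop posted in this thread.
    """
    removed = 0
    for page_number in range(1, len(document.pages) + 1):
        page = document.pages[page_number]
        # Deleting index 1 repeatedly empties the collection without
        # tracking indices that shift after each removal.
        while len(page.resources.images) != 0:
            page.resources.images.delete(1)
            removed += 1
    return removed
```

Calling delete_all_images(converted_pdf_load) before save would replace the nested loop, and the returned count gives a quick check that something was actually removed.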

Hello Asad,

Thanks for your response. Sorry, for security reasons I cannot share the input file. But here is a hint: I am converting a vaccination certificate from PDF to HTML and removing the QR code image.

@Md_Shaedul_Islam

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-54697

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.