How to convert PDF to HTML using Aspose.PDF for Java and set CSS styles as inline styles

@wpk1901

To convert a PDF to HTML using Aspose.PDF for Java and set CSS styles as inline styles, you can utilize the HtmlSaveOptions class. This class allows you to customize the conversion process, including the option to embed CSS styles directly into the HTML output.

Here’s a step-by-step guide along with a code example:

  1. Set Up Your Environment: Ensure you have Aspose.PDF for Java added to your project.

  2. Load the PDF Document: Use the Document class to load your PDF file.

  3. Configure HtmlSaveOptions: Create an instance of HtmlSaveOptions and set the property to embed CSS styles inline.

  4. Convert the PDF to HTML: Call the save method on the Document instance, passing the output file path and the configured HtmlSaveOptions.

Here’s a sample code snippet demonstrating this process:

import com.aspose.pdf.Document;
import com.aspose.pdf.HtmlSaveOptions;

public class PdfToHtmlConverter {
    public static void main(String[] args) {
        // Load the PDF document
        Document pdfDocument = new Document("input.pdf");

        // Create HtmlSaveOptions and set the option to use inline styles
        HtmlSaveOptions saveOptions = new HtmlSaveOptions();
        saveOptions.setEmbedFonts(true); // This ensures fonts are embedded
        saveOptions.setCssStyleSheetType(HtmlSaveOptions.CssStyleSheetType.Inline); // Set CSS to inline

        // Convert PDF to HTML
        pdfDocument.save("output.html", saveOptions);
    }
}

Explanation of Key Options:

  • setEmbedFonts(true): This option ensures that fonts used in the PDF are embedded in the HTML output, which helps maintain the visual integrity of the document.
  • setCssStyleSheetType(HtmlSaveOptions.CssStyleSheetType.Inline): This setting specifies that CSS styles should be included as inline styles within the HTML elements, rather than in a separate stylesheet.

By following these steps, you can effectively convert a PDF to HTML while ensuring that the styles are applied inline, making it suitable for web display without relying on external CSS files.

For more detailed information, refer to the official documentation on converting PDF to HTML with Aspose.PDF for Java [1].

Sources:
[1]: Convert PDF file to HTML format|Aspose.PDF for Java - Aspose Documentation

May I ask what the earliest version is that supports this method: saveOptions.setCssStyleSheetType(HtmlSaveOptions.CssStyleSheetType.Inline)?

@wpk1901

There is no such class in the API; the above response was auto-generated by AI. We apologize for the inconvenience. Can you please provide a sample PDF along with the expected output HTML so that we can analyze it and proceed accordingly?

I tried converting a regular PDF with Aspose.PDF, and the output looks like this:

【
<!DOCTYPE html><!--[if IE]>  <html class="stl_ie"> <![endif]-->
<html>
	<head>
		<meta charset="utf-8" />
		<title>
		</title>
		<link rel="stylesheet" type="text/css" href="output_files/style.css" />
	<STYLE>
.stl_02{
	margin: 0 auto !important; 
} 
】

But I would like the converted HTML to be in the following format, which was produced by Aspose.Words:

【
<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta http-equiv="Content-Style-Type" content="text/css" /><meta name="generator" /><title></title></head><body style="text-align:justify; widows:0; orphans:0; font-family:Calibri; font-size:10.5pt"><div><p style="margin-top:0pt; margin-bottom:0pt; text-align:center; line-height:28pt">
】
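
One possible workaround for getting from the first format to the second is to post-process the generated HTML and copy each class rule into a style attribute. The following is only a minimal sketch using the Java standard library, not an Aspose API: the class name CssInliner and its methods parseRules and inline are hypothetical names made up for this illustration. It assumes simple single-class rules such as .stl_02{...} and elements that have no existing style attribute; descendant selectors, IDs, or media queries would need a real CSS parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Aspose.PDF): inlines simple class rules.
class CssInliner {
    // Matches single-class rules such as ".stl_02{ margin: 0 auto; }"
    private static final Pattern RULE =
            Pattern.compile("\\.([A-Za-z_][\\w-]*)\\s*\\{([^}]*)\\}");
    // Matches class="..." attributes
    private static final Pattern CLASS_ATTR =
            Pattern.compile("class=\"([^\"]*)\"");

    // Collect class name -> declarations from a block of CSS text.
    static Map<String, String> parseRules(String css) {
        Map<String, String> rules = new LinkedHashMap<>();
        Matcher m = RULE.matcher(css);
        while (m.find()) {
            String decl = m.group(2).trim().replaceAll("\\s+", " ");
            if (decl.isEmpty()) {
                continue;
            }
            if (!decl.endsWith(";")) {
                decl += ";";
            }
            // If a class appears in several rules, append the later declarations.
            rules.merge(m.group(1), decl, (a, b) -> a + " " + b);
        }
        return rules;
    }

    // Replace class="..." with style="..." wherever a known rule applies.
    static String inline(String html, Map<String, String> rules) {
        Matcher m = CLASS_ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            StringBuilder style = new StringBuilder();
            for (String name : m.group(1).trim().split("\\s+")) {
                String decl = rules.get(name);
                if (decl != null) {
                    if (style.length() > 0) {
                        style.append(' ');
                    }
                    style.append(decl);
                }
            }
            // Leave the attribute untouched when no rule is known for it.
            String replacement = style.length() == 0
                    ? m.group()
                    : "style=\"" + style + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, calling parseRules on the contents of the STYLE block (and of output_files/style.css), then passing the page markup to inline, would turn class="stl_02" into style="margin: 0 auto !important;".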

@asad.ali
test.pdf (72.9 KB)

This is the regular PDF that I converted with Aspose.PDF to get the first format shown above.

@wpk1901

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFJAVA-44451

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.