Aspose PDF to HTML issue

Hi,

I am converting a PDF document to HTML using the Aspose.Pdf component with the code below. I have also attached the input file sample_pdf_1.pdf and the generated output file pdfToHtml.html.

The issue with this conversion is that the generated HTML file is not a real HTML form: users can't enter any data. For example, when you open the generated pdfToHtml.html file, there is no way to select a value in the date, month, and year dropdown fields, whereas the PDF form fields do contain this data. The checkboxes and radio buttons can't be selected either.

After converting the PDF to HTML, I would like the form fields to be rendered as HTML tags such as input (type=text/checkbox/radio), select, and img. How do I convert the PDF form into proper HTML tags? Is this possible with Aspose.PDF? If not, is there a workaround for this scenario?

sample_pdf.pdf (531.7 KB)

pdfToHtml_files.zip (144.4 KB)

string pdfPath = @"C:\temp\Files\sample_pdf_1.pdf";

using (var document = new Aspose.Pdf.Document(pdfPath))
{
    // Save the output HTML
    document.Save(@"C:\temp\pdfToHtml.html", Aspose.Pdf.SaveFormat.Html);
}

@srinudhulipalla

To convert a PDF document to HTML, you can try the HtmlSaveOptions class in Aspose.PDF. It lets you control how the PDF is converted, and some of its settings may affect how form fields end up in the output (ideally as interactive HTML elements like <input>, <select>, etc.).

Here’s a basic example of how to achieve this:

using Aspose.Pdf;

string pdfPath = @"C:\temp\Files\sample_pdf_1.pdf";
string outputHtmlPath = @"C:\temp\pdfToHtml.html";

// Load the PDF document
using (Document pdfDocument = new Document(pdfPath))
{
    // Create HTML save options
    HtmlSaveOptions htmlOptions = new HtmlSaveOptions
    {
        // Write the pages and CSS into a single HTML file
        PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml,
        // Put PDF marked content (layers) into HTML layers; this does not by itself
        // turn form fields into <input> or <select> elements
        ConvertMarkedContentToLayers = true
    };

    // Save the document as HTML
    pdfDocument.Save(outputHtmlPath, htmlOptions);
}

Key Options Explained:

  • PartsEmbeddingMode: With EmbedAllIntoHtml, the pages and CSS are written into a single HTML file instead of being split across separate files.
  • ConvertMarkedContentToLayers: This option converts marked content (layers) in the PDF into layers in the HTML output. It affects how the content is grouped, not whether form fields become interactive HTML elements.

Additional Considerations:

If the generated HTML still does not meet your requirements for interactivity, you may need to manually adjust the HTML after conversion or explore additional settings in HtmlSaveOptions to fine-tune the output.

For more complex scenarios, such as ensuring that dropdowns and checkboxes are properly rendered, you might need to look into specific configurations or even post-processing the HTML to add the necessary interactivity.
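
As a possible workaround along those lines, the form fields can be read from the PDF directly and written out as HTML tags. The following is only a minimal sketch, not a feature of the HTML conversion itself: it assumes the fields are exposed through document.Form.Fields as TextBoxField, CheckboxField, and ComboBoxField, and it has not been run against the attached sample. The class and method names (FormFieldHtmlSketch, BuildHtmlForm, TextTag, CheckboxTag) are hypothetical. The PDF layout is not preserved, and radio buttons, list boxes, and nested child fields are skipped.

```csharp
using System;
using System.Net;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Forms;

// Hypothetical helper class, for illustration only.
static class FormFieldHtmlSketch
{
    // Returns an <input type="text"> tag with HTML-encoded name and value.
    public static string TextTag(string name, string value)
    {
        string n = WebUtility.HtmlEncode(name ?? string.Empty);
        string v = WebUtility.HtmlEncode(value ?? string.Empty);
        return "<input type=\"text\" name=\"" + n + "\" value=\"" + v + "\">";
    }

    // Returns an <input type="checkbox"> tag, with "checked" when isChecked is true.
    public static string CheckboxTag(string name, bool isChecked)
    {
        string n = WebUtility.HtmlEncode(name ?? string.Empty);
        string state = isChecked ? " checked" : string.Empty;
        return "<input type=\"checkbox\" name=\"" + n + "\"" + state + ">";
    }

    // Builds a bare HTML form from the top-level form fields of the PDF.
    // Assumption: only text boxes, checkboxes, and combo boxes are handled;
    // every other field type is silently skipped.
    public static string BuildHtmlForm(string pdfPath)
    {
        var html = new StringBuilder();
        html.AppendLine("<form>");

        using (var document = new Document(pdfPath))
        {
            foreach (Field field in document.Form.Fields)
            {
                if (field is CheckboxField checkbox)
                {
                    html.AppendLine(CheckboxTag(checkbox.PartialName, checkbox.Checked));
                }
                else if (field is ComboBoxField combo)
                {
                    string n = WebUtility.HtmlEncode(combo.PartialName ?? string.Empty);
                    html.AppendLine("<select name=\"" + n + "\">");
                    foreach (Option option in combo.Options)
                    {
                        string v = WebUtility.HtmlEncode(option.Value ?? string.Empty);
                        html.AppendLine("  <option>" + v + "</option>");
                    }
                    html.AppendLine("</select>");
                }
                else if (field is TextBoxField textBox)
                {
                    html.AppendLine(TextTag(textBox.PartialName, textBox.Value));
                }
            }
        }

        html.AppendLine("</form>");
        return html.ToString();
    }
}
```

Calling FormFieldHtmlSketch.BuildHtmlForm(pdfPath) with the same pdfPath as above would return the form markup as a string, which you could then merge into the converted page or serve on its own. Treat it as a starting point for post-processing rather than a replacement for the conversion.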

If you encounter any specific issues or need further customization, feel free to ask for more detailed guidance!

Hi,

I tried the given example, but still no luck. The generated HTML is the same as before; the only change I can see is that the CSS is now part of the single HTML file. My main requirement is that any PDF containing form fields should convert to HTML that uses HTML tags such as <input>, <select>, etc.

I am attaching the file generated with the given code. Can I get the correct parameters that actually produce these HTML tags? I tried a few parameters myself, but had no luck.

pdfToHtml.zip (83.4 KB)

@srinudhulipalla

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-59722

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.