Bad PDF Forms conversion quality

Greetings!

I have an issue converting PDF files that contain forms to HTML. I tried two approaches, but neither gave a usable result. Could you please help me? Am I doing something wrong, or is there a bug in the Aspose conversion?

Source file for both examples:
1003 - Unmarried Addendum.pdf (69.1 KB)

First attempt: direct conversion with Aspose.PDF
Result file:
1003 - Unmarried Addendum_pdf.7z (434.9 KB)

using (var document = new Aspose.Pdf.Document("1003 -  Unmarried Addendum.pdf"))
{
    var htmlOptions = new Aspose.Pdf.HtmlSaveOptions
    {
        SaveFullFont = true,
        DocumentType = Aspose.Pdf.HtmlDocumentType.Html5,
        UseZOrder = true,
        AntialiasingProcessing = Aspose.Pdf.HtmlSaveOptions.AntialiasingProcessingType.TryCorrectResultHtml,
        CompressSvgGraphicsIfAny = true,
        FontSavingMode = Aspose.Pdf.HtmlSaveOptions.FontSavingModes.SaveInAllFormats,
        TrySaveTextUnderliningAndStrikeoutingInCss = true,
        TryMergeAdjacentSameBackgroundImages = true,
        LettersPositioningMethod = Aspose.Pdf.HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss,
        SaveShadowedTextsAsTransparentTexts = false,
        SaveTransparentTexts = true,
        PartsEmbeddingMode = Aspose.Pdf.HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml,
        ConvertMarkedContentToLayers = true,
        FixedLayout = false,
        RasterImagesSavingMode = Aspose.Pdf.HtmlSaveOptions.RasterImagesSavingModes.AsPngImagesEmbeddedIntoSvg,
        HtmlMarkupGenerationMode = Aspose.Pdf.HtmlSaveOptions.HtmlMarkupGenerationModes.WriteAllHtml,
    };

    // Save the output HTML
    document.Save("output.html", htmlOptions);
}

This code loses all styles. The output also contains some strange artifacts that are probably meant to be parts of the selected form inputs in the PDF, but the inputs themselves are missing.

Second attempt: converting the PDF file to DOCX, then converting the DOCX to HTML
Result file:
1003 - Unmarried Addendum.7z (30.7 KB)

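using System.IO;
using System.Text;
using Aspose.Words.Saving;

// temporaryFileOut is not defined in the original snippet; this path is a hypothetical placeholder
string temporaryFileOut = Path.Combine(Path.GetTempPath(), "unmarried-addendum.docx");

using (var document = new Aspose.Pdf.Document("1003 -  Unmarried Addendum.pdf"))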
{
    // Instantiate DocSaveOptions object
    Aspose.Pdf.DocSaveOptions savePdfOptions = new Aspose.Pdf.DocSaveOptions
    {
        Format = Aspose.Pdf.DocSaveOptions.DocFormat.DocX,
        Mode = Aspose.Pdf.DocSaveOptions.RecognitionMode.Flow,
        TryMergeAdjacentSameBackgroundImages = true,
        //RelativeHorizontalProximity = 1f,
        RecognizeBullets = true,
        ReSaveFonts = true,
        ConvertType3Fonts = true,
    };
    document.Save(temporaryFileOut, savePdfOptions);
}

Aspose.Words.Document doc = new Aspose.Words.Document(temporaryFileOut);

// Set different properties of HtmlSaveOptions class
Aspose.Words.Saving.HtmlSaveOptions saveOptions = new Aspose.Words.Saving.HtmlSaveOptions();

saveOptions.AllowNegativeIndent = true;
saveOptions.DmlEffectsRenderingMode = DmlEffectsRenderingMode.Fine;
saveOptions.DmlRenderingMode = DmlRenderingMode.Fallback;
saveOptions.ImlRenderingMode = ImlRenderingMode.Fallback;
saveOptions.ExportPageSetup = true;
saveOptions.CssStyleSheetType = CssStyleSheetType.Inline;
saveOptions.ExportPageMargins = true;
saveOptions.ImageResolution = 90;
saveOptions.ExportImagesAsBase64 = true;
saveOptions.ExportFontsAsBase64 = true;
saveOptions.ExportDocumentProperties = true;
saveOptions.ExportHeadersFootersMode = ExportHeadersFootersMode.PerSection;
saveOptions.HtmlVersion = Aspose.Words.Saving.HtmlVersion.Html5;
saveOptions.AllowEmbeddingPostScriptFonts = true;
saveOptions.Encoding = Encoding.UTF8;
saveOptions.ExportFontResources = true;
saveOptions.ExportListLabels = ExportListLabels.ByHtmlTags;
saveOptions.ExportShapesAsSvg = true;
saveOptions.PrettyFormat = true;


// Save output HTML
doc.Save("result.html", saveOptions);

This result is better: formatting and styling are fine, but all form elements are moved away from the positions where they should be.

Could you please advise if there’s anything I’m missing in my code?
Thank you!

@EvgeniyMikhailov

Hello!

It seems you’re experiencing issues with the conversion quality of PDF forms to HTML using Aspose.PDF. Here are a few suggestions to improve the results:

  1. Direct PDF to HTML Conversion: The loss of styles and form elements may be caused by how the forms are built in this particular PDF, since some form elements are difficult to represent in HTML. You might want to try simplifying the PDF, or adjusting HtmlSaveOptions parameters such as FixedLayout (your code sets it to false, so the HTML is generated as a flowing rather than a fixed layout) or SplitIntoPages, to see if that helps.

  2. PDF to DOCX to HTML Conversion: While converting to DOCX can be a viable strategy, the positioning of form elements may not perfectly match the original PDF. You can try adjusting DocSaveOptions parameters such as RelativeHorizontalProximity (commented out in your code) or the recognition Mode (your code uses RecognitionMode.Flow) to improve the positioning of the form elements in the intermediate DOCX, and therefore in the final HTML.

  3. Check for Updates: Ensure you are using the latest version of Aspose.PDF, as updates often include bug fixes and improvements in conversion quality.

  4. Alternative Rendering: If the form elements are critical, consider rendering the PDF to images instead of HTML. This approach maintains the visual fidelity of the forms, although the result won’t be editable.
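Suggestions 1 to 4 above can be tried together in a small C# sketch. It assumes a recent Aspose.PDF for .NET; the output file names, the 150 DPI resolution, and the RelativeHorizontalProximity value of 2.5f are illustrative assumptions to tune, not recommended settings. Compared with your code, it switches FixedLayout to true and the recognition mode from Flow to Textbox, which is intended to keep the original appearance more closely, possibly at the cost of editability.

```csharp
using System;
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Devices;

public static class FormConversionSketch
{
    // Suggestion 1: request fixed-layout HTML instead of the flowing layout
    // produced by FixedLayout = false in the original code.
    public static HtmlSaveOptions CreateFixedLayoutHtmlOptions() => new HtmlSaveOptions
    {
        DocumentType = HtmlDocumentType.Html5,
        FixedLayout = true,
        SplitIntoPages = false, // keep the whole form in a single HTML file
        PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml,
    };

    // Suggestion 2: Textbox recognition places content in positioned text boxes
    // rather than reflowing it. 2.5f is an assumed starting value to experiment with.
    public static DocSaveOptions CreateLayoutPreservingDocxOptions() => new DocSaveOptions
    {
        Format = DocSaveOptions.DocFormat.DocX,
        Mode = DocSaveOptions.RecognitionMode.Textbox,
        RelativeHorizontalProximity = 2.5f,
    };

    // Suggestion 4: render every page to a PNG image (non-editable fallback).
    public static void RenderPagesToPng(Document document, string outputPrefix)
    {
        var device = new PngDevice(new Resolution(150));
        for (int i = 1; i <= document.Pages.Count; i++) // Aspose.Pdf page indexes are 1-based
        {
            using (var stream = new FileStream($"{outputPrefix}-{i}.png", FileMode.Create))
            {
                device.Process(document.Pages[i], stream);
            }
        }
    }

    public static void Main()
    {
        // Suggestion 3: print the Aspose.PDF assembly version in use.
        Console.WriteLine($"Aspose.PDF version: {typeof(Document).Assembly.GetName().Version}");

        using (var document = new Document("1003 -  Unmarried Addendum.pdf"))
        {
            document.Save("output-fixed.html", CreateFixedLayoutHtmlOptions());
            document.Save("intermediate-textbox.docx", CreateLayoutPreservingDocxOptions());
            RenderPagesToPng(document, "page");
        }
    }
}
```

The DOCX written by this sketch can be passed to the same Aspose.Words HTML step you already use, so the Flow and Textbox results can be compared side by side.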

If you continue to face issues, please share specific details about the PDF structure or any error messages you encounter.
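One useful detail is where each form field actually sits in the PDF, so it can be compared with where it ends up in the HTML. The sketch below lists every field with its page and rectangle, converted from PDF points (origin at the bottom-left, 1/72 inch) to CSS pixels (origin at the top-left, 1/96 inch). It assumes the Aspose.Pdf.Forms namespace of recent versions and that Field exposes PageIndex and Rect; the PdfRectToCssPx helper is hypothetical and ignores page rotation and page boxes whose origin is not (0, 0). Radio button groups may report only one widget rectangle.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Forms;

public static class FormFieldReport
{
    // Hypothetical helper: PDF user space (bottom-left origin, y up, points)
    // to CSS pixels (top-left origin, y down, 96 px per inch).
    public static (double Left, double Top, double Width, double Height) PdfRectToCssPx(
        double llx, double lly, double urx, double ury, double pageHeight)
    {
        double ToPx(double pt) => pt * 96.0 / 72.0;
        return (ToPx(llx), ToPx(pageHeight - ury), ToPx(urx - llx), ToPx(ury - lly));
    }

    public static void Main()
    {
        using (var document = new Document("1003 -  Unmarried Addendum.pdf"))
        {
            foreach (Field field in document.Form.Fields)
            {
                Page page = document.Pages[field.PageIndex]; // assumed 1-based, like Pages
                Rectangle r = field.Rect;
                var css = PdfRectToCssPx(r.LLX, r.LLY, r.URX, r.URY, page.Rect.Height);
                Console.WriteLine(
                    $"{field.FullName} ({field.GetType().Name}) page {field.PageIndex}: " +
                    $"left={css.Left:F1}px top={css.Top:F1}px w={css.Width:F1}px h={css.Height:F1}px");
            }
        }
    }
}
```

For a US Letter page (792 pt high), a field spanning (72, 700) to (216, 720) in PDF points maps to left 96px, top 96px, width 192px in CSS.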

Feel free to reach out if you need further assistance!