Convert PDF to HTML using Aspose.PDF for .NET - RemoveEmptyAreasOnTopAndBottom doesn't work

Converting a PDF document to HTML in C# produces the same output whether RemoveEmptyAreasOnTopAndBottom is set to true or to false.
Tried with Aspose.PDF versions 19.1.0, 20.2.0 and 20.4.0.
When the code runs without a license file (evaluation mode), RemoveEmptyAreasOnTopAndBottom = true does remove some of the empty space.
Please see attached example.
Aspose_RemoveEmptyAreasOnTopAndBottom_issue.png (71.6 KB)

@natalyav

Would you kindly share your sample source and output files, along with the code snippet you used, for our reference? We will test the scenario in our environment and address it accordingly.

Here is the code I am using to convert PDF to HTML (I tried it both with and without setting the license):
try
{
    Aspose.Pdf.License license = new Aspose.Pdf.License();
    // Set license
    license.SetLicense("AsposePdf.lic");
    Console.WriteLine("License set successfully.");

    // Load source PDF file
    Document doc = new Document("D:\\Aspose_input.pdf");

    // Instantiate HTML Save options object
    HtmlSaveOptions newOptions = new HtmlSaveOptions();
    newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
    newOptions.RemoveEmptyAreasOnTopAndBottom = true;
    newOptions.SplitIntoPages = false;
    newOptions.HtmlMarkupGenerationMode = HtmlSaveOptions.HtmlMarkupGenerationModes.WriteAllHtml;
    newOptions.FixedLayout = true;

    // This is just optimization for IE and can be omitted
    newOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
    newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
    newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;

    // Output file path
    string outHtmlFile = "D:\\Aspose_licensed_output_with_RemoveEmptyAreasOnTopAndBottom_5.6.20.html";

    doc.Save(outHtmlFile, newOptions);

    Console.WriteLine("PDF converted to HTML successfully.");
    Console.Read();
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
    Console.Read();
}

The output varies in terms of the amount of white space when running with and without the license.
I am going to attach the input and output files, as well as a comparison screenshot of the outputs.

Aspose_input.pdf (252.7 KB)
pdf_to_html_with_and_without_license.png (88.0 KB)

Unfortunately, HTML files cannot be uploaded here, so I could not attach the outputs. The screenshot “pdf_to_html_with_and_without_license.png” shows the difference between the HTML files produced with and without the license.

@natalyav

We were able to reproduce similar behavior in our environment using Aspose.PDF for .NET 20.5. We have therefore logged an issue as PDFNET-48156 in our issue tracking system to investigate the cause. We will post updates on the ticket's status in this forum thread. Please be patient and allow us some time.

We are sorry for the inconvenience.

Thank you for looking into the issue!
I also would like to clarify: would the white space before and after the “Evaluation…” text in red be gone after the fix?
See screenshot, please.
Thank you.
After_the_fix_question.png (318.0 KB)

@natalyav

The appearance of the Evaluation text is not a bug but a limitation of the trial version. You need to use a valid license for the API in order to get rid of this text. If you do not have one, you can consider applying for a free 30-day temporary license.

Sorry, but that wasn’t the question. The Evaluation text comes with its own white space above and below it. Will that white space above and below the text be gone when using the licensed version? Should I expect to see no extra gaps anywhere in the converted HTML once this is fixed?

@natalyav

Yes, we will also investigate the logged ticket from this perspective, namely whether the white space between the pages can be reduced, and let you know our findings as soon as the ticket is resolved. Please allow us some time.

Great, thank you!

The issues you have found earlier (filed as PDFNET-48156) have been fixed in Aspose.PDF for .NET 20.7.

Thank you very much for the fix!

@natalyav

Please keep using our API, and if you face any issue, feel free to let us know.