PDF to HTML: "A generic error occurred in GDI+"

bhgiq · August 2, 2018, 6:54pm

I am running into an issue when converting certain PDF files to HTML.
The problem can be reproduced with the example project on GitHub.

I tested the SingleHTML and OutPutToStream functions, and both fail with this error.
Calling the PDFToHTML example does not raise any error.

Our requirements call for the output to be a single HTML file with images, CSS and fonts embedded.

I have uploaded an example PDF that reproduces the issue: exampleinput.pdf (254.1 KB)
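Once a conversion does succeed, one way to check the "single file" requirement is to scan the generated HTML for resource references that are not inline `data:` URIs. The C# sketch below assumes that embedded images and fonts show up as `data:` URIs in `src="..."` attributes or CSS `url(...)` references; `EmbeddingCheck` and `FindExternalReferences` are hypothetical names, not part of Aspose.PDF. It deliberately ignores `<a href>` hyperlinks and does not inspect `<link href>` stylesheets.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Aspose.PDF: lists resource references in an HTML
// string that are NOT inline data: URIs, i.e. references that still need external files.
public static class EmbeddingCheck
{
    // Group 1: value of a src="..." or src='...' attribute
    // Group 2: target of a CSS url(...) reference, quoted or unquoted
    private static readonly Regex ResourceRef = new Regex(
        @"\bsrc\s*=\s*[""']([^""']*)[""']|url\(\s*[""']?([^""')\s]*)",
        RegexOptions.IgnoreCase);

    public static string[] FindExternalReferences(string html)
    {
        return ResourceRef.Matches(html)
            .Cast<Match>()
            .Select(m => m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value)
            // Empty values and inline data: URIs are considered embedded
            .Where(v => v.Length > 0 && !v.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
            .ToArray();
    }
}
```

An empty result from `EmbeddingCheck.FindExternalReferences(File.ReadAllText(outHtmlFile))` suggests that images and fonts are inline; any returned value points at a file the HTML still expects to find next to it.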

Farhan.Raza · August 2, 2018, 9:12pm

@bhgiq

Thank you for contacting support.

We have been able to reproduce the exception you mentioned, and a ticket with ID PDFNET-45197 has been logged in our issue management system for further investigation and resolution. Unfortunately, a single HTML file cannot be generated for this document until the ticket has been investigated. The ticket ID has been linked to this thread, so you will be notified as soon as the ticket is resolved.

We are sorry for the inconvenience.

steven.hill · May 29, 2019, 9:41am

Hi @Farhan.Raza, was this issue ever resolved?

Farhan.Raza · May 29, 2019, 5:59pm

@steven.hill

We are afraid PDFNET-45197 has not been resolved yet. We have recorded your concerns and will let you know as soon as a significant update is available.

asad.ali · June 23, 2020, 6:48am

@steven.hill, @bhgiq

We tested with Aspose.PDF for .NET 20.6 and could not reproduce the issue. Could you please make sure that the input and output paths are correct, as in the following code snippet:

using System;
using System.IO;
using Aspose.Pdf;

public static class Example45197
{
    // Relative to the process's current working directory; use an absolute path if in doubt
    public static string dataDir45197 = "./45197/";

    public static void SingleHTML()
    {
        try
        {
            // Load source PDF file
            Document doc = new Document(dataDir45197 + "input.pdf");
            // Instantiate HTML Save options object
            HtmlSaveOptions newOptions = new HtmlSaveOptions();

            // Enable option to embed all resources inside the HTML
            newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;

            // This is just an optimization for IE and can be omitted
            newOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
            newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
            newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
            // Output file path
            string outHtmlFile = dataDir45197 + "SingleHTML_out.html";
            doc.Save(outHtmlFile, newOptions);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }

    public static void OutPutToStream()
    {
        try
        {
            Document doc = new Document(dataDir45197 + "input.pdf");

            // Tune conversion parameters
            HtmlSaveOptions newOptions = new HtmlSaveOptions();
            newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
            newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
            newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
            newOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
            newOptions.SplitIntoPages = false; // Force writing the HTML of all pages into one output document

            newOptions.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy(SavingToStream);
            // A non-existing path can be used as the result file name: all real saving is done
            // in our custom method SavingToStream(), which follows this one
            doc.Save("fakePath", newOptions);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }

    private static void SavingToStream(HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo)
    {
        // Any writable stream can be used here; a file stream is just an example
        string fileName = dataDir45197 + "stream_out.html";
        // File.Create truncates an existing file (File.OpenWrite would keep stale trailing bytes),
        // and "using" flushes and releases the file handle when done
        using (Stream outStream = File.Create(fileName))
        {
            // CopyTo reads until the end of the stream, unlike a single Read() call
            htmlSavingInfo.ContentStream.CopyTo(outStream);
        }
    }
}
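If the error only appears on some machines, the paths themselves are worth checking. A relative path such as `"./45197/"` is resolved against the process's current working directory, which is not necessarily the folder containing the executable, and a frequent cause of "A generic error occurred in GDI+" in System.Drawing is writing an image into a folder that does not exist or is not writable. We have not confirmed that this is what triggers PDFNET-45197. The C# sketch below resolves both paths up front, fails early with a clear message if the input PDF is missing, and creates the output folder; `ConversionPaths` and `Prepare` are hypothetical names, not part of Aspose.PDF.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of Aspose.PDF): resolves paths before conversion so that
// Document(...) and Save(...) receive absolute, valid locations.
public static class ConversionPaths
{
    public static (string InputPdf, string OutputHtml) Prepare(
        string inputPdf, string outputDir, string outputFileName)
    {
        // Relative paths are resolved against Environment.CurrentDirectory
        string fullInput = Path.GetFullPath(inputPdf);
        if (!File.Exists(fullInput))
            throw new FileNotFoundException("Input PDF not found", fullInput);

        // Creates the folder and any missing parents; does nothing if it already exists
        string fullOutputDir = Path.GetFullPath(outputDir);
        Directory.CreateDirectory(fullOutputDir);

        return (fullInput, Path.Combine(fullOutputDir, outputFileName));
    }
}
```

In `SingleHTML`, for example, the two string concatenations could be replaced by `var (input, output) = ConversionPaths.Prepare(dataDir45197 + "input.pdf", dataDir45197, "SingleHTML_out.html");`, with `input` passed to `new Document(...)` and `output` to `doc.Save(...)`.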