Convert PDF to HTML with images embedded where they are located in the PDF

We are attempting to convert PDFs to HTML using .NET and then modify the HTML for additional formatting. Part of that formatting includes deleting certain images, along with headers and footers. We are running into an issue with images, though, because all the images on a page are rendered as a single image. In some cases, the first page has one image that needs to be deleted and another that needs to be retained. When we use the option to embed images in an SVG, we can find the image we want to retain, but we can’t position it properly because part of our text formatting removes white space and unneeded divs.

With this in mind, is there a way to render images inline so that we can remove the unwanted images from the rendered HTML while retaining the ones we want to keep? Using absolute positioning and overlaying the text on an SVG that covers an entire page is proving problematic for us.

Thank you,
Brian

@marysaf

Can you please share the code snippet you are using to convert the PDF to HTML? Also, please share a sample PDF document with us so that we can test the scenario in our environment and address it accordingly.

Hi Asad, here is my snippet and sample document. We’re trying to delete the “FBI” banner and keep the other images. We cannot use absolute positioning because we also have to remove headers and footers and modify the formatting of the text, so the images need to be inline.

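            // Load the source PDF (fileName is the path to the input file).
            var document = new Aspose.Pdf.Document(fileName);
            // Path.ChangeExtension swaps only the extension, regardless of case or of ".pdf" appearing elsewhere in the path.
            document.Save(System.IO.Path.ChangeExtension(fileName, ".html"), new Aspose.Pdf.HtmlSaveOptions()
            {
                // Produce a single HTML file with all parts embedded.
                PartsEmbeddingMode = Aspose.Pdf.HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml,
                SplitIntoPages = false,
                LettersPositioningMethod = Aspose.Pdf.HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss,
                // Raster images are written as external PNG files and referenced from SVG.
                RasterImagesSavingMode = Aspose.Pdf.HtmlSaveOptions.RasterImagesSavingModes.AsExternalPngFilesReferencedViaSvg,
                CssClassNamesPrefix = "bk"
            });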

IA-10152025.pdf (334.1 KB)
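
The image-removal step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Aspose's API: it assumes the SVG produced by the converter is well-formed XML and that each raster image appears as an `<image>` element with an `xlink:href` (or plain `href`) attribute. The names `SvgImageFilter` and `RemoveImages` and the file names `img_01.png` and `img_02.png` are hypothetical. It uses only `System.Xml.Linq` from the .NET base library, and it does not address the inline-positioning problem itself.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: drops selected <image> elements from an SVG fragment.
public static class SvgImageFilter
{
    static readonly XNamespace Svg = "http://www.w3.org/2000/svg";
    static readonly XNamespace XLink = "http://www.w3.org/1999/xlink";

    // shouldRemove receives each image's href and returns true to delete it.
    public static string RemoveImages(string svgMarkup, Func<string, bool> shouldRemove)
    {
        // Assumption: the markup is well-formed XML with a single root element.
        var root = XElement.Parse(svgMarkup);

        // Materialize the matches first so the tree is not modified while it is enumerated.
        var unwanted = root.Descendants(Svg + "image")
            .Where(img => shouldRemove(
                (string)img.Attribute(XLink + "href")
                ?? (string)img.Attribute("href")
                ?? string.Empty))
            .ToList();

        foreach (var img in unwanted)
        {
            img.Remove();
        }

        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

A call might look like `SvgImageFilter.RemoveImages(svg, href => href.EndsWith("img_01.png"))`, where the predicate identifies the banner image by its (assumed) file name and every other image is left in place.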

@marysaf

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-58441

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.