HTML generation from a PDF file with hyperlinks

Hi Team

I have a PDF with hyperlinks added to its TextFragment objects. The links are added successfully and appear in the PDF document. However, when I try to generate HTML from that PDF, the conversion takes a very long time (about 10 minutes for a 2 MB PDF file). I am using the following HtmlSaveOptions:

HtmlSaveOptions options = new HtmlSaveOptions
{
    FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsTTF,
    FontEncodingStrategy = HtmlSaveOptions.FontEncodingRules.DecreaseToUnicodePriorityLevel,
    PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedCssOnly,
    TryMergeAdjacentSameBackgroundImages = false,
    RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsPngImagesEmbeddedIntoSvg,
};

I also tried generating HTML from the same PDF without hyperlinks, and that conversion completes in a reasonable time.

Please suggest HtmlSaveOptions settings that would make the conversion run efficiently.

Note: I have raised 4 topics and am still struggling to get a proper solution from the support forum (Issue No: PDFNET-46972).
Please help us with this.

I have attached one sample PDF file:
ParaWrap3.pdf (27.6 KB)
Code for adding a link to each TextFragment:
// Load the source PDF (filePath is defined elsewhere).
Document pdf = new Document(filePath);
foreach (Page page in pdf.Pages)
{
    // Collect the paragraph markup of the current page.
    ParagraphAbsorber paragraphAbsorber = new ParagraphAbsorber();
    paragraphAbsorber.Visit(page);
    int secId = 0;

    foreach (PageMarkup pageMarkup in paragraphAbsorber.PageMarkups)
    {
        foreach (MarkupSection markupSection in pageMarkup.Sections)
        {
            int textId = 0;
            Rectangle sectionRect = markupSection.Rectangle;
            foreach (TextFragment textFragment in markupSection.Fragments)
            {
                // Attach the same web hyperlink to every text fragment.
                string marker = "www.test.com";
                textFragment.Hyperlink = new WebHyperlink(marker);
                ++textId;
            }

            ++secId;
        }
    }
}

Thank you for your support.

@uk_itprocurement_tcs_com

We apologize for the inconvenience you have been facing while using our API. We have attached your other ticket to this forum thread as well, so that you will be notified once it is resolved.

Regarding your current inquiry, we tested the scenario by adding links to the PDF file you shared, using the same code snippet, and then converting it to HTML. The API took 17 seconds to generate the HTML file. Furthermore, we noticed that the PDF file with links was not 2 MB in size but 36 KB: Hyperlinks.pdf (36.0 KB)

Would you kindly share the file that is 2 MB in size after the links are added, along with its source document? We will test the scenario again in our environment and address it accordingly.

PS: We have tested the scenario with Aspose.PDF for .NET 19.9.

Yes, the time the API takes depends on the file size. The sample PDF I provided was only a few KB, which is why the file with links came to 36 KB.

If you take a sample PDF with 80 pages and add links to it, HTML generation from that PDF will take a lot of time.

Kindly help us with this.

Thank you for your support.

@uk_itprocurement_tcs_com

We tested with a 7 MB file but could not reproduce the issue. Could you please share the problematic PDF with which you are facing performance issues? We will test the scenario with it and share our feedback with you accordingly. Please also share your environment details, i.e. OS name and version, installed memory, application type, etc.

Hi Team

Please use this file for HTML generation using the Aspose.PDF API.

@uk_itprocurement_tcs_com

The API took 35 seconds for the conversion in our environment. Would you please share the time you observed on your side for this file?

Hi Team

Thank you for your quick feedback. Let us check again on our side.

Hi Team

Can you please try this PDF?

This PDF has a hyperlink on each TextFragment. It took 10 minutes to generate HTML from it using the code above.

Ia4tWeQH51_edit.zip (3.5 MB)

@uk_itprocurement_tcs_com

We were able to observe that the API took a long time (approx. 10 minutes) to convert the shared PDF to HTML. We have logged an issue as PDFNET-47026 in our issue tracking system. We will look into the details and keep you posted on the status of its correction. Please be patient and spare us a little time.

We are sorry for the inconvenience.