PDF to HTML conversion is very slow


#1

I am trying to convert a simple 8-page PDF to HTML with the sample code below. The conversion takes over 10 seconds. Am I doing anything wrong here?

    // Load the source PDF
    Document doc = new Document("C:\\temp\\79134279.pdf");

    // Embed fonts, images, and CSS into a single HTML file
    HtmlSaveOptions newOptions = new HtmlSaveOptions();
    newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
    newOptions.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
    newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
    newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;

    // Output file path
    String outHtmlFile = "C:\\temp\\79134279.html";

    // Save the output file
    doc.Save(outHtmlFile, newOptions);

#2

@kalidoss2,

Thanks for contacting support.

The time the API takes to perform a specific operation depends on the structure and complexity of the input document. Could you please share the input PDF document so that we can test the conversion in our environment? We are sorry for the inconvenience.


#3

Hi,

I am attaching the PDF I tried. It takes quite a bit of time even after the first run (to rule out any caching overhead).

Thanks,
Kal

79134279.pdf (206.7 KB)


#4

@kalidoss2,

Thanks for sharing the sample file.

I have tested the scenario and reproduced the same performance issue. I have logged it as PDFNET-43208 in our issue tracking system so that it can be corrected. We will look further into the details of this problem and keep you updated on the status of the correction. Please be patient and allow us a little time. We are sorry for the inconvenience.


#5

@kalidoss2

Thanks for your patience.

We have tested the scenario further with the latest version, Aspose.Pdf for .NET 17.11, using the following code snippet, and have thoroughly inspected our source code. We did not find any block that could be simply and substantially optimized. Please note that conversion time naturally depends on the number of elements on each page, especially characters. As you can see, this document is quite densely composed, so this level of performance is expected and acceptable for this kind of document.

If you need any further assistance, please feel free to let us know.