20.4 PDF -> HTML: Startxref not found

liam.goss · May 5, 2020, 2:56pm

Hi,

I’m currently testing the .NET API with a large(ish) PDF document with roughly 6 images per page over 4 pages. The PDF should be converted to HTML with embedded fonts and images.

Unfortunately, I’m receiving the following exception “Startxref not found.” Is this due to a collection limitation?

Large doc.pdf (4.2 MB)

Thanks

asad.ali · May 5, 2020, 11:03pm

@liam.goss

This is not caused by a trial version limitation. We tested the scenario in our environment with Aspose.PDF for .NET 20.5, using the following code snippet, and did not notice any issue.

using Aspose.Pdf;

// dataDir is the folder that contains the input PDF
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(dataDir + "Large doc.pdf");
HtmlSaveOptions options = new HtmlSaveOptions();
// Embed fonts, images and CSS into the single HTML output file
options.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
// Save raster images as embedded parts of the PNG page background
options.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
options.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
options.BatchSize = 640;
pdfDocument.Save(dataDir + "output20.5.html", options);

Would you please make sure that you are using the latest version? If you still face any issue, please share the code snippet that you are using. We will test the scenario in our environment and address it accordingly.

liam.goss · May 6, 2020, 1:03pm

Thank you, it turns out the stream was being corrupted while it was written to disk (before any Aspose API calls).

I’ve now fixed the issue.
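
For anyone hitting the same exception: a well-formed PDF ends with a `startxref` keyword, a byte offset, and an `%%EOF` marker, so "Startxref not found" often points to a truncated or damaged file rather than to the conversion itself. The thread does not say how the stream was corrupted, so the following is only a minimal sketch of a pre-check one could run before handing the file to the API. `PdfSanityCheck`, `LooksComplete`, `FileLooksComplete` and the 1024-byte window are hypothetical names and values chosen for illustration; they are not part of Aspose.PDF.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Aspose.PDF.
public static class PdfSanityCheck
{
    // Returns true when the bytes contain a "%PDF-" header near the start and a
    // "startxref" keyword followed by "%%EOF" near the end.
    // The 1024-byte window is an assumption, not a value taken from the thread.
    public static bool LooksComplete(byte[] data, int window = 1024)
    {
        if (data == null || data.Length == 0) return false;

        // ASCII decoding yields one char per byte, so binary content is harmless here.
        int headLength = Math.Min(window, data.Length);
        string head = Encoding.ASCII.GetString(data, 0, headLength);

        int tailStart = Math.Max(0, data.Length - window);
        string tail = Encoding.ASCII.GetString(data, tailStart, data.Length - tailStart);

        int xref = tail.LastIndexOf("startxref", StringComparison.Ordinal);
        int eof = tail.LastIndexOf("%%EOF", StringComparison.Ordinal);

        return head.Contains("%PDF-") && xref >= 0 && eof > xref;
    }

    // Convenience wrapper that reads the whole file from disk.
    public static bool FileLooksComplete(string path)
    {
        return LooksComplete(File.ReadAllBytes(path));
    }
}
```

Calling `PdfSanityCheck.FileLooksComplete(dataDir + "Large doc.pdf")` before constructing the `Aspose.Pdf.Document` would flag a file whose trailer is missing. It is a cheap heuristic only: it cannot detect damage in the middle of the file.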

asad.ali · May 6, 2020, 8:30pm

@liam.goss

It is good to know that your issue has been resolved. Please keep using our API, and if you need further assistance, feel free to create a new topic.