PDF to HTML conversion: some text is split into a separate span per character instead of a single span

We observed something really strange while converting a PDF document to HTML: simple plain text is split into separate spans, one per character.

Please look into this and reply as soon as possible.

Thanks

image.png (6.5 KB)

The plain text is "Printed Name".

However, it is split into multiple spans instead of a single span.

@saleemshaikh

Could you please share your sample PDF document along with a code snippet so that we can test the scenario in our environment and address it accordingly?

2022-02-25T09-47-36_AM_Hoaic_App_HO3-6.pdf (95.3 KB)

Another example: the text “Policyholder’s Signature” is also split into multiple spans (“P o lic y h o ld e r’s S ig n a tu re”).

image.png (10.5 KB)

@saleemshaikh

Could you please specify whether you are using Aspose.PDF for .NET or for Java? Please share a sample code snippet with us as well.

We are using Aspose.PDF for .NET, and we have already shared a code snippet.

@saleemshaikh

We were able to reproduce the issue of multiple spans around every character while testing with version 22.2 of the API. However, the text in the output HTML looked fine when rendered in the browser. We used the code snippet below:

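using Aspose.Pdf;

// dataDir: folder containing the input PDF (defined elsewhere in the test project)
var original_book = new Aspose.Pdf.Document(dataDir + @"2022-02-25T09-47-36_AM_Hoaic_App_HO3-6.pdf");
HtmlSaveOptions htmlOptions = new HtmlSaveOptions();

// Generate fixed-layout HTML that mirrors the PDF page layout
htmlOptions.FixedLayout = true;
// Embed CSS, fonts and images directly into the HTML file
htmlOptions.PartsEmbeddingMode = Aspose.Pdf.HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
// Save raster images as embedded parts of a PNG page background
htmlOptions.RasterImagesSavingMode = Aspose.Pdf.HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
htmlOptions.RemoveEmptyAreasOnTopAndBottom = true;
// Write one HTML file with a single CSS block instead of one per page
htmlOptions.SplitIntoPages = false;
htmlOptions.SplitCssIntoPages = false;
// Prefix applied to the generated CSS class names
string cssprefix = "aspose_pdf";
htmlOptions.CssClassNamesPrefix = cssprefix;

original_book.Save(dataDir + "outputHTML.html", htmlOptions);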

We have logged the multiple-spans issue as PDFNET-51425 in our issue tracking system. We will look into it further and keep you posted on the status of the fix. Please be patient and give us some time.

We are sorry for the inconvenience.

If we purchase paid support, could you please confirm when you will be able to fix this issue? Thanks.

@saleemshaikh

We are investigating the ticket and will get back to you with an answer soon. Please give us a little time.