Duplicated text appears in the background image after converting PDF to HTML

Hi there


I am using Aspose.Pdf for Java 11.7.0 to convert PDF files to HTML files.
The problem is that a segment of text appears twice in the output:
once as text in the resulting HTML, and once rendered into the background image.

Here is my test code:
// Load the source PDF.
Document pdf = new Document("custom/input/pdf/研究者のみなさまへ.pdf");

// Configure the HTML output options.
HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
htmlSaveOps.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
htmlSaveOps.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsWOFF;
htmlSaveOps.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
htmlSaveOps.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
htmlSaveOps.setSplitIntoPages(false);

// Save each page of the PDF as a separate HTML file (pages are 1-based).
for (int p = 1; p <= pdf.getPages().size(); p++) {
    Document pageDoc = new Document();
    pageDoc.getPages().add(pdf.getPages().get_Item(p));
    pageDoc.save("custom/output/pdf/研究者のみなさまへ." + p + ".html", htmlSaveOps);
}


Please check this code and the file that causes the problem. Thank you.
P.S. This happens on page 14.
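
For anyone trying to confirm the same symptom, here is a rough sketch in plain Java (it does not use Aspose.Pdf) that counts how often a text segment occurs in the markup of a generated HTML file. The class name DuplicateTextCheck and the helper countOccurrences are hypothetical names made up for this illustration. It assumes the segment appears as one contiguous string in the markup; if the converter splits the text across several elements, a literal search like this will undercount. If the segment occurs only once in the markup but is visible twice on the rendered page, the second copy most likely comes from the background image.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of Aspose.Pdf.
class DuplicateTextCheck {

    // Counts non-overlapping occurrences of needle in haystack.
    static int countOccurrences(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0; // An empty search string would otherwise loop forever.
        }
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: DuplicateTextCheck <html-file> <text-segment>");
            return;
        }
        // Read the generated HTML as UTF-8 and report how often the segment occurs.
        String html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(countOccurrences(html, args[1]));
    }
}
```

Run it against the HTML file generated for page 14, passing the duplicated text segment as the second argument.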



Hi Craig,

Thanks for your inquiry. I have tested your scenario with the shared document using Aspose.Pdf for Java 11.7.0 and was able to reproduce the reported issue. For further investigation, I have logged it in our issue tracking system as PDFJAVA-36040 and linked your request to it. We will keep you updated on the issue status via this thread.

We are sorry for the inconvenience caused.

Hi Tilal.Ahmad


Is there any progress?

Hi Craig,


Thanks for your inquiry. I am afraid the reported issue is still not resolved, as our product team is currently busy resolving other issues in the queue. We will notify you as soon as we make significant progress towards resolving it.

We are sorry for the inconvenience.

Best Regards,