Hi folks,
This is linked to our previous issue, detailed in "Line breaks and spaces being ignored in PDF->HTML".
We’re using Aspose.Words 24.7.0 to convert DOC and PDF files to HTML, and we’re seeing inconsistent treatment of linebreaks. Our client, who has paid for the license, would like to know how they might resolve the issue by modifying their own document templates if there’s no possible solution on the Aspose side.
The first test PDF I have to demonstrate the issue looks like this:
But the HTML output has lost all the linebreaks (and has added spaces between the letters and numbers, which is less of an issue at this point):
The second test PDF just has one extra word added, and looks like this:
Which doesn’t suffer the same linebreak issue:
I’ve attached the PDFs here:
Test1.pdf (20.9 KB)
Test2.pdf (21.3 KB)
And as before, the code we’re using to convert is quite simple:
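Aspose.Words.Saving.HtmlSaveOptions saveOptions = new Aspose.Words.Saving.HtmlSaveOptions(Aspose.Words.SaveFormat.Html);
// Write CSS into the HTML itself rather than a separate stylesheet file.
saveOptions.CssStyleSheetType = Aspose.Words.Saving.CssStyleSheetType.Embedded;
// Inline images as Base64 so the output is a single self-contained string.
saveOptions.ExportImagesAsBase64 = true;
// Leave headers and footers out of the HTML.
saveOptions.ExportHeadersFootersMode = Aspose.Words.Saving.ExportHeadersFootersMode.None;
var doc = new Aspose.Words.Document(url); // url is the URL of the doc in the CMS
// Render the whole document to an HTML string using the options above.
var html = doc.ToString(saveOptions);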
We understand that conversion from PDF to HTML may not be exact, but this particular linebreak issue feels like something that shouldn’t be happening - perhaps there’s a simple explanation?
Any guidance you can give would be greatly appreciated.
Steve.