Aspose.Words line break issues converting PDF->HTML

Hi folks,

This is linked to our previous issue, detailed in "Line breaks and spaces being ignored in PDF->HTML".

We’re using Aspose.Words 24.7.0 to convert DOC and PDF files to HTML, and we’re seeing inconsistent treatment of linebreaks. Our client, who has paid for the license, would like to know how they might resolve the issue by modifying their own document templates if there’s no possible solution on the Aspose side.

The first test PDF I have to demonstrate the issue looks like this:

But the HTML output has lost all the linebreaks (and has added spaces between the letters and numbers which is less of an issue at this point):

The second test PDF just has an extra word added, and looks like this:

Which doesn’t suffer the same linebreak issue:

I’ve attached the PDFs here:
Test1.pdf (20.9 KB)
Test2.pdf (21.3 KB)

And as before, the code we’re using to convert is quite simple:

Aspose.Words.Saving.HtmlSaveOptions saveOptions = new Aspose.Words.Saving.HtmlSaveOptions(Aspose.Words.SaveFormat.Html);
saveOptions.CssStyleSheetType = Aspose.Words.Saving.CssStyleSheetType.Embedded;
saveOptions.ExportImagesAsBase64 = true;
saveOptions.ExportHeadersFootersMode = Aspose.Words.Saving.ExportHeadersFootersMode.None;
var doc = new Aspose.Words.Document(url); // url is the URL of the doc in the CMS
var html = doc.ToString(saveOptions);

We understand that conversion from PDF to HTML may not be exact, but this particular linebreak issue feels like something that shouldn’t be happening - perhaps there’s a simple explanation?

Any guidance you can give would be greatly appreciated.

Steve.

@SteveZesty
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-27267

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

PS: You can preserve original PDF document layout by converting it directly to HtmlFixed:

Aspose.Words.LowCode.Converter.Convert(@"C:\Temp\in.pdf", @"C:\Temp\out.html", SaveFormat.HtmlFixed);
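
For a setup like the one in the original post, where the HTML is needed as a string with embedded resources rather than as a file on disk, the same HtmlFixed route might be sketched as below. This is only a sketch under assumptions: the class name PdfToFixedHtml and its ConvertToHtml method are hypothetical helpers, not part of Aspose.Words, the output is assumed to use the default UTF-8 encoding, and it has not been run against the attached test PDFs. Note that HtmlFixed produces absolutely positioned, page-like markup, so the result is structured very differently from the flow HTML produced by HtmlSaveOptions.

```csharp
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Saving;

public static class PdfToFixedHtml
{
    // Hypothetical helper (not an Aspose API): loads a PDF and returns
    // fixed-layout HTML as one self-contained string.
    public static string ConvertToHtml(string pdfPath)
    {
        var doc = new Document(pdfPath);

        var saveOptions = new HtmlFixedSaveOptions
        {
            // Embed all resources so that saving to a stream needs no side files.
            ExportEmbeddedCss = true,
            ExportEmbeddedImages = true,
            ExportEmbeddedFonts = true
        };

        using (var stream = new MemoryStream())
        {
            doc.Save(stream, saveOptions);
            stream.Position = 0;

            // Assumes the default UTF-8 output; StreamReader drops a BOM if one is present.
            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

A call such as PdfToFixedHtml.ConvertToHtml(@"C:\Temp\in.pdf") would then return the markup directly, in place of the doc.ToString(saveOptions) call used for flow HTML.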

Hi Alexey,

Is there any update on that ticket?

Thanks,
Steve.

@SteveZesty Unfortunately, there are no updates regarding the issue yet. The issue is currently in the queue for analysis. Please accept our apologies for the inconvenience.