Conversion from RTF to HTML: unordered lists (bullet points) broken

Hi,

I ran into an issue when I tried to convert RTF documents to HTML documents: Aspose.Words for .NET cannot convert bullet points properly. This problem is reproducible in version 18.10.0.

For demonstration I built a small WPF project to test the conversion from RTF to HTML and vice versa. This is how it looks initially and that’s what I was expecting:

Expected.png (28.4 KB)

Aspose.Words converts HTML to RTF flawlessly as seen in this screenshot. The bullet points are correct in RTF:

ToRTF.png (30.7 KB)

The other direction fails, however. While the numbered list was converted fine, the bullet points are broken: they lost the leading bullet, use a different character, and have random spaces inserted for padding.

ToHTML.png (36.1 KB)

I have found multiple topics about this issue (e.g. Bullet point disappear after html conversion), but the most recent version of Aspose.Words for .NET still does not convert them correctly.

Can somebody confirm this problem? I can upload my minimal demo project if needed.

Thanks in advance.

@SemaphoreProxy,

Please ZIP and attach your Word and HTML documents here for testing. We will then investigate the issue on our end and provide you with more information.

Hi @awais.hafeez,

I’ve created a ZIP file with the following files: the source RTF, the expected HTML and the actual broken HTML result.

RtfToHtmlBulletPointBug.zip (1.4 KB)

@SemaphoreProxy,

We have logged this problem in our issue tracking system as WORDSNET-17576. We will look further into the details of this problem and keep you updated on the status of this issue. We apologize for the inconvenience.

@SemaphoreProxy,

Regarding WORDSNET-17576, we are unable to reproduce the exact same issue on our end. The HTML document that Aspose.Words 18.11 generates on our end from your “RtfToBeConverted.rtf” document differs significantly from the “HtmlResult.html” you provided (see 18.11.zip (666 Bytes)). “HtmlResult.html” contains parts that were supposedly generated by Aspose.Words, but the resulting document was obviously post-processed by you. We must make sure we are looking into issues caused by Aspose.Words itself, not by your post-processing code. We used the following code on our end for testing:

Document doc = new Document("D:\\RtfToHtmlBulletPointBug\\RtfToBeConverted.rtf");
doc.Save("D:\\RtfToHtmlBulletPointBug\\18.11.html");

So, please also provide the actual HTML file generated by Aspose.Words that shows the undesired behavior, not a post-processed one. Thanks for your cooperation.

@awais.hafeez,

there was NO post-processing done from my side. It is Aspose.Words which fails to convert RTF to HTML properly.

Here is the .NET project I used to reproduce your bug: AsposeRtfToHtmlConvertBug.zip (10.2 KB)

I have no licence for the latest 18.11 version so here is the broken result plus the generated watermark: RtfToHtmlResult.zip (31.9 KB)

If you check the HTML you will notice that

<p style="FONT-SIZE: 9pt; MARGIN-BOTTOM: 0pt; MARGIN-TOP: 0pt; MARGIN-LEFT: 72pt; ORPHANS: 0; WIDOWS: 0; TEXT-INDENT: -18pt"><span style="FONT-FAMILY: 'Segoe UI'">·</span><span style="FONT: 7pt 'Times New Roman'; -aw-import: spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span><span style="FONT-FAMILY: 'Segoe UI'">Bullet 1.1</span></p>

is NOT the preferred way to do unordered lists in HTML. Aspose.Words should generate <ul> and <li> tags for that.

I depend on you to fix this issue because we use Aspose.Words to convert between RTF and HTML all the time. If Aspose.Words creates <span> tags with 11 &nbsp; space chars, then cross-converting bullet lists becomes impossible.

Greetings
SemaphoreProxy

Edit: I just opened your 18.11.html file and noticed that it is even more broken than my result. While the bullet points are there, the sizes of the bullets differ randomly. The numbered list is an unordered list now, too. And for some reason a blank box character is used as a symbol. AsposeOfficialResponse.png (9.2 KB)

Did nobody double check this Aspose.Words output before posting it in the forum?

Edit 2: I made a graphic that gives an overview of the entire HTML conversion issue: AsposeBugOverview.png (64.7 KB)

@SemaphoreProxy,

Please also check whether specifying the HtmlSaveOptions.ExportListLabels option is acceptable for you:

Document doc = new Document("D:\\RtfToHtmlBulletPointBug\\RtfToBeConverted.rtf");

HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.Html);
opts.PrettyFormat = true;
opts.ExportListLabels = ExportListLabels.ByHtmlTags;

doc.Save("D:\\RtfToHtmlBulletPointBug\\18.11.html", opts);

Thanks for your cooperation.

@awais.hafeez,

setting the property ExportListLabels = ExportListLabels.ByHtmlTags fixed the issue, the conversion works perfectly fine now!

While I do not understand why ExportListLabels.Auto defaults to inline text, I am still thankful that I can disable this behavior now.

Thanks for your assistance!

@SemaphoreProxy,

It is great that specifying the ExportListLabels.ByHtmlTags option resolves the issue on your end. Also, please do not rely on the ExportListLabels.Auto behavior, because in this case it exports list labels in mixed mode, i.e. as both inline text and native HTML elements.

ExportListLabels.Auto: It outputs list labels in auto mode. Uses HTML native elements when possible.
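
To compare the modes side by side on your own document, a minimal sketch along the following lines may help. It saves the same RTF file once per ExportListLabels value so the resulting HTML files can be compared. The input path and the output file names are placeholders chosen for illustration, not something taken from the attached project.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

class ExportListLabelsComparison
{
    static void Main()
    {
        // Placeholder path (assumption): point this at your own RTF file.
        const string inputPath = "RtfToBeConverted.rtf";

        // The three values of the ExportListLabels enumeration.
        ExportListLabels[] modes =
        {
            ExportListLabels.Auto,
            ExportListLabels.AsInlineText,
            ExportListLabels.ByHtmlTags
        };

        foreach (ExportListLabels mode in modes)
        {
            // Reload the document for each run so every save starts from the same state.
            Document doc = new Document(inputPath);

            HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.Html)
            {
                PrettyFormat = true,
                ExportListLabels = mode
            };

            // Placeholder output names (assumption), e.g. "Result_ByHtmlTags.html".
            doc.Save($"Result_{mode}.html", opts);
        }
    }
}
```

Opening the three output files next to each other should show which modes write list labels as inline text and which use native list elements for your particular document.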
