Redundant <span> tags

I’ve seen other posts on this subject and have tried calling the Document.joinRunsWithSameFormatting() method prior to saving the document, but I still see HTML output like the following:

<li class="Achievement" style="margin-left:0pt; margin-right:0pt">
    <span>Coordinated a Pilot Program to introduce BuildNet Enabling to builders in the </span>
    <span>North Carolina</span>
    <span> and </span>
    <span>Michigan</span>
    <span> markets.</span>
</li>

We’re evaluating Aspose.Words for use in an application where searching the text of the resulting HTML document is required, but with the additional tags some searches won’t work.
Is there a fix or workaround for this behavior?
Thanks.

Hello!
Thank you for your interest in Aspose.Words.
I’d like to explain how joinRunsWithSameFormatting works and why it doesn’t help in some cases. This document optimization method finds and joins adjacent Run objects whose attributes are identical apart from editing session identifiers. If you edit a fragment of text intensively, Microsoft Word splits it into many runs even when it is formatted identically from beginning to end. You can see this in the DocumentExplorer demo application, or by exporting such a document without calling joinRunsWithSameFormatting. These runs differ only in their editing session identifiers. But there are other cases where runs are not joined:
- Runs can differ in other attributes that don’t affect HTML output.
- Runs can be separated by some other nodes that are not directly output to HTML.
In your case, “North Carolina” and “Michigan” seem to be the results of field evaluation. In the model they are separated from the neighbouring runs by field start/separator/end nodes.
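
To illustrate the second case, here is a minimal sketch of the joining rule on a toy model. The Node class, its format string, and joinAdjacentRuns are hypothetical names made up for this illustration; they are not Aspose.Words classes and this is not the library's actual implementation. A null format stands in for a non-run node such as a field start, separator, or end.

```java
import java.util.ArrayList;
import java.util.List;

public class RunJoinSketch {
    /** Hypothetical stand-in for a document node; not an Aspose.Words class. */
    public static final class Node {
        public final String text;
        public final String format; // null marks a non-run node, e.g. a field start/separator/end

        public Node(String text, String format) {
            this.text = text;
            this.format = format;
        }

        public boolean isRun() {
            return format != null;
        }
    }

    /** Joins runs only when they are direct neighbours with equal formatting. */
    public static List<Node> joinAdjacentRuns(List<Node> nodes) {
        List<Node> joined = new ArrayList<>();
        for (Node node : nodes) {
            Node last = joined.isEmpty() ? null : joined.get(joined.size() - 1);
            if (last != null && last.isRun() && node.isRun() && last.format.equals(node.format)) {
                // Same formatting and nothing in between: merge into the previous run.
                joined.set(joined.size() - 1, new Node(last.text + node.text, last.format));
            } else {
                // Either a non-run node, or a run that cannot be merged with its predecessor.
                joined.add(node);
            }
        }
        return joined;
    }
}
```

In this sketch, two equally formatted runs with a field node between them stay separate, because the field node is the direct neighbour of each run. An exporter that skips the field node would then emit two elements for what looks like one stretch of text.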
So joinRunsWithSameFormatting eliminates redundancy in the document model, but some redundancy may still remain in the exported HTML. The method has nothing to do with the export modules; it operates only on the model. If you show me the source document, I will tell you the reason and probably suggest a workaround. This behavior is a known issue and I’ll subscribe your thread to it. You’ll be notified when it is fixed.
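
In the meantime, one possible stopgap is to post-process the exported HTML string and collapse neighbouring attribute-less <span> elements. The sketch below is not an Aspose.Words feature; SpanMerger and mergeBareSpans are hypothetical names, and it assumes the spans are not nested, carry no attributes, and that whitespace between the tags is only indentation. A regular expression is fragile on general HTML, so treat this as a starting point rather than a complete solution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpanMerger {
    // Two or more attribute-less, non-nested spans separated only by whitespace.
    private static final Pattern BARE_SPAN_SEQUENCE =
            Pattern.compile("<span>[^<]*</span>(?:\\s*<span>[^<]*</span>)+");

    // The boundary between two neighbouring bare spans inside such a sequence.
    private static final Pattern SPAN_BOUNDARY = Pattern.compile("</span>\\s*<span>");

    /** Collapses each sequence of adjacent bare spans into a single span. */
    public static String mergeBareSpans(String html) {
        Matcher sequences = BARE_SPAN_SEQUENCE.matcher(html);
        StringBuffer result = new StringBuffer();
        while (sequences.find()) {
            // Drop the inner close/open tag pairs, keeping the outermost tags.
            String merged = SPAN_BOUNDARY.matcher(sequences.group()).replaceAll("");
            sequences.appendReplacement(result, Matcher.quoteReplacement(merged));
        }
        sequences.appendTail(result);
        return result.toString();
    }
}
```

Spans that carry attributes, such as a style, are left untouched because the pattern only matches the bare <span> form, so visible formatting should not change under the stated assumptions.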
Regards,

Thanks for the info. I’ve attached the source document, but I’m not sure if that will help. Our application has no control over the content/formatting/styles of the documents we’re converting to HTML. The documents are uploaded by our end-users.
Can you give me an idea of the timeframe in which you think this issue will be fixed (e.g. days, weeks, months), or is it scheduled to be fixed in a specific release?
Thanks for the help and the quick responses.

Hi

Thanks for your inquiry. Unfortunately, it is difficult to give you a reliable estimate for this issue at the moment. The issue must first be fixed in the .NET version of Aspose.Words, and then the fix will be ported to Java. The porting process usually takes around a month. We will notify you as soon as the issue is resolved.
Best regards.

The issues you have found earlier (filed as 4773) have been fixed in this update.
