Unwanted Bullet Spacing Characters

We are getting undesired output when extracting bulleted lists from Word documents and saving them to HTML. I understand that Aspose's HTML output for bullets uses the non-breaking space character, which is what we are seeing. However, there are far too many of these characters, so the gap between each bullet and its text is much too large and the output is unacceptable.
Our solution begins with a Word document that resides in Microsoft SharePoint. The document contains content controls (StructuredDocumentTag). We open the document and loop through all the content controls and identify them by tag. The content control in question is a rich text control. I import the content from the StructuredDocumentTag into a temp document using the ImportNode method. The document is then saved as HTML.
Here are the HtmlSaveOptions being used:

HtmlSaveOptions BodyOptions = new HtmlSaveOptions(SaveFormat.Html);
BodyOptions.ImageSavingCallback = new HandleImageSaving(this);
BodyOptions.ImagesFolder = _ImageSettings.BodyImagesFolder;
BodyOptions.ImagesFolderAlias = _ImageSettings.ImagesFolderAlias;
BodyOptions.CssStyleSheetType = CssStyleSheetType.Inline;

I am then saving the document into a memory stream:

string BodyContent;
using (MemoryStream msDoc = new MemoryStream())
{
    doc.Save(msDoc, BodyOptions);     // write the HTML into the memory stream
    msDoc.Seek(0, SeekOrigin.Begin);  // rewind before reading it back
    using (StreamReader srBody = new StreamReader(msDoc))
    {
        BodyContent = srBody.ReadToEnd(); // return the stream contents as a string
    }
}

By the time we get the string BodyContent above, the extra spacing is already there. I read the output as a string because I only need the body element of the HTML, which we then isolate with the HtmlAgilityPack; however, the extra spacing is present before that step.
My only remedy for this at the moment is to remove the characters via String.Replace. I’ve included a sample of the output below.
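As a stopgap until a fix is available, the String.Replace workaround can be generalized with a regular expression that collapses each run of consecutive non-breaking spaces down to a single one. This is a minimal sketch, assuming the extra spacing appears in the saved HTML either as the raw U+00A0 character or as the `&nbsp;`, `&#xa0;` or `&#160;` entities; `NbspCleaner` is a hypothetical helper, not part of Aspose.Words. Note that it collapses every such run in the string, including any that were intentional outside the bulleted lists.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical workaround helper; not part of the Aspose.Words API.
public static class NbspCleaner
{
    // One non-breaking space in any of the forms assumed to occur in the HTML:
    // the raw U+00A0 character, the named entity, or a numeric entity.
    private const string Nbsp = @"(?:\u00A0|&nbsp;|&#xa0;|&#160;)";

    // Group 1 captures the first non-breaking space of a run of two or more.
    private static readonly Regex NbspRun = new Regex(
        "(" + Nbsp + ")" + Nbsp + "+",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Replaces every run of consecutive non-breaking spaces with its first one,
    // so a bullet keeps a single space of separation from its text.
    public static string Collapse(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));
        return NbspRun.Replace(html, "$1");
    }
}
```

In the code above it would be applied right after reading the stream, e.g. `BodyContent = NbspCleaner.Collapse(BodyContent);`, before the body element is isolated.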
The curious thing is that this is happening to only a single document. I’ve attached this document for review. The workflow is that the user will enter information in the Body content control and save the file back to SharePoint, then we pull it down for processing.
Bill Siegler
HTML output containing too many space characters:
Fdsa gfdg

Hi Bill,

Thanks for your inquiry. It would be great if you could share the following details for investigation purposes.

  • Please share the RTF contents which you are using inside the content control
  • Please create a standalone/runnable simple application (for example a Console Application Project) that demonstrates the code (Aspose.Words code) you used to generate your output document

As soon as you get these pieces of information to us we’ll start our investigation into your issue.

Hello, I have created a sample project (attached) per your request. The Word document containing the RTF content is located in the bin\debug folder. The file Executive Memo Template v2.docx contains an RTF content control with the tag name “Body.” There are a few bulleted items in it. If you run the sample project, you should find that the string output into the variable named “BodyText” contains numerous non-breaking space characters that force a very large distance between the bullets and their associated text.

Hi Bill,

Thanks for sharing the detail.

I have tested the scenario and have managed to reproduce the same issue on my side. I have logged this problem in our issue tracking system as WORDSNET-10262 so that it can be corrected. I have linked this forum thread to the issue, and you will be notified via this forum thread once it is resolved.

We apologize for your inconvenience.

Thank you. Any time estimate as to when a fix might be available?

Hi Bill,

Thanks for your inquiry. I would like to share with you that issues are addressed and resolved on a first-come, first-served basis. Currently, your issue is in the queue, pending analysis. We will update you via this forum thread once there is any news on your issue.

Thank you for your patience and understanding.

The issues you have found earlier (filed as WORDSNET-10262) have been fixed in this .NET update and this Java update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.