Unwanted Bullet Spacing Characters

XinnBill · June 5, 2014, 2:34pm

Hello,
We are getting undesired output when extracting bulleted lists from Word documents and saving them to HTML. I understand that the Aspose implementation of bullets uses the non-breaking space character, which is what we are seeing. However, there are far too many of these characters, so the gap between each bullet and its text is much too wide and the output is unacceptable.
Our solution begins with a Word document that resides in Microsoft SharePoint. The document contains content controls (StructuredDocumentTag). We open the document and loop through all the content controls and identify them by tag. The content control in question is a rich text control. I import the content from the StructuredDocumentTag into a temp document using the ImportNode method. The document is then saved as HTML.
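A minimal sketch of the workflow just described, using Aspose.Words. It assumes the Body control is a block-level rich text control (so its children are paragraphs that can be appended directly to a Body) and that `ImportFormatMode.KeepSourceFormatting` is acceptable; the post does not say which import mode or loop structure is actually used. `ContentControlExporter` and `ExportTaggedControlToHtml` are hypothetical names, and the image-saving options are left out for brevity.

```csharp
using System;
using System.IO;
using Aspose.Words;
using Aspose.Words.Markup;
using Aspose.Words.Saving;

// Hypothetical helper sketching the described workflow; names are illustrative only.
public static class ContentControlExporter
{
    // Copies the children of the first block-level rich text content control whose
    // tag equals `tag` into a new document, saves it as HTML, and returns the markup.
    public static string ExportTaggedControlToHtml(Document source, string tag)
    {
        Document temp = new Document();
        temp.FirstSection.Body.RemoveAllChildren(); // start from an empty body

        foreach (StructuredDocumentTag sdt in source.GetChildNodes(NodeType.StructuredDocumentTag, true))
        {
            // Assumption: only block-level rich text controls are handled, because
            // inline controls contain runs, which cannot be appended to a Body.
            if (sdt.Tag != tag || sdt.SdtType != SdtType.RichText || sdt.Level != MarkupLevel.Block)
                continue;

            foreach (Node child in sdt.GetChildNodes(NodeType.Any, false))
            {
                // ImportNode copies the node, including list and style formatting, into `temp`.
                Node imported = temp.ImportNode(child, true, ImportFormatMode.KeepSourceFormatting);
                temp.FirstSection.Body.AppendChild(imported);
            }
            break; // only the first matching control is exported
        }

        // Appends an empty paragraph only if the body does not already end with one.
        temp.FirstSection.Body.EnsureMinimum();

        HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.Html);
        options.CssStyleSheetType = CssStyleSheetType.Inline;

        using (MemoryStream stream = new MemoryStream())
        {
            temp.Save(stream, options);
            stream.Position = 0; // rewind before reading the HTML back
            using (StreamReader reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

For the attached sample this would be called as something like `ContentControlExporter.ExportTaggedControlToHtml(new Document("Executive Memo Template v2.docx"), "Body")`.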
Here are the HtmlSaveOptions being used:

HtmlSaveOptions BodyOptions = new HtmlSaveOptions(SaveFormat.Html);
BodyOptions.ImageSavingCallback = new HandleImageSaving(this);
BodyOptions.ImagesFolder = _ImageSettings.BodyImagesFolder;
BodyOptions.ImagesFolderAlias = _ImageSettings.ImagesFolderAlias;
BodyOptions.CssStyleSheetType = CssStyleSheetType.Inline;

I am then saving the document into a memory stream:

string BodyContent;
using (MemoryStream msDoc = new MemoryStream())
{
    doc.Save(msDoc, BodyOptions);
    msDoc.Seek(0, SeekOrigin.Begin); // rewind the stream before reading it
    using (StreamReader srBody = new StreamReader(msDoc))
    {
        BodyContent = srBody.ReadToEnd(); // return the stream contents as a string
    }
}

The extra spacing is already present in BodyContent at this point. We read the output as a string only because I need to isolate the HTML body element, which we then do with the HtmlAgilityPack; the extra spacing is there before that step runs.
For now, my only workaround is to remove the characters with String.Replace. I’ve included a sample of the output below.
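A minimal sketch of that stopgap, assuming the extra spaces appear in `BodyContent` as `&#xa0;`, `&#160;` or `&nbsp;` entities, or as raw U+00A0 characters (the exact form is not shown in the post). Rather than a plain String.Replace that deletes every occurrence, it uses a regular expression to collapse each run of two or more into a single `&#xa0;`, so one space survives after each bullet. `NbspWorkaround` and `CollapseNbspRuns` are hypothetical names, and this only treats the symptom, not the underlying cause.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating a string-level workaround; not part of Aspose.Words.
public static class NbspWorkaround
{
    // Matches two or more consecutive non-breaking spaces written as &#xa0; / &#xA0;,
    // &#160;, &nbsp;, or the raw U+00A0 character (in any mix).
    private static readonly Regex NbspRun = new Regex(
        @"(?:&#x[aA]0;|&#160;|&nbsp;|\u00A0){2,}",
        RegexOptions.Compiled);

    // Replaces each run with a single &#xa0; entity; lone non-breaking spaces are left alone.
    public static string CollapseNbspRuns(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));
        return NbspRun.Replace(html, "&#xa0;");
    }
}
```

Usage would be `string cleaned = NbspWorkaround.CollapseNbspRuns(BodyContent);`. Collapsing instead of deleting keeps the bullet visually separated from its text, but it also shortens any intentional runs of non-breaking spaces elsewhere in the body.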
Curiously, this happens with only one document, which I’ve attached for review. In our workflow, the user enters information in the Body content control and saves the file back to SharePoint; we then pull it down for processing.
Thanks,
Bill Siegler
Xinnovation
HTML output containing too many space characters:
Body
Fdsa gfdg
v

tahir.manzoor · June 6, 2014, 9:52am

Hi Bill,

Thanks for your inquiry. It would be great if you could share the following details for investigation purposes:

Please share the RTF content you are using inside the content control
Please create a simple standalone, runnable application (for example, a Console Application project) that demonstrates the Aspose.Words code you used to generate your output document

As soon as you get these pieces of information to us, we’ll start investigating your issue.

XinnBill · June 9, 2014, 4:18pm

Hello,
I have created a sample project (attached) as you requested. The Word document containing the rich text (RTF) content is located in the bin\debug folder. The file Executive Memo Template v2.docx contains a rich text content control with the tag name “Body”, which holds a few bulleted items. If you run the sample project, you should find that the string stored in the variable named “BodyText” contains numerous non-breaking space characters that force a very large gap between the bullets and their text.
Thanks,
Bill

tahir.manzoor · June 10, 2014, 6:53am

Hi Bill,

Thanks for sharing the detail.

I have tested the scenario and managed to reproduce the same issue on my side. I have logged this problem in our issue tracking system as WORDSNET-10262 so that it can be corrected. I have linked this forum thread to the issue, and you will be notified via this thread once it is resolved.

We apologize for your inconvenience.

XinnBill · June 10, 2014, 3:47pm

Thank you. Any time estimate as to when a fix might be available?

tahir.manzoor · June 11, 2014, 3:37am

Hi Bill,

Thanks for your inquiry. Issues are addressed and resolved on a first-come, first-served basis. Your issue is currently pending analysis and is in the queue. We will update you via this forum thread as soon as there is any news on your issue.

Thank you for your patience and understanding.

aspose.notifier · October 17, 2014, 9:45am

The issue you found earlier (filed as WORDSNET-10262) has been fixed in this .NET update and this Java update.
