DOCX to MHTML conversion: extra &nbsp; appears

Dear Aspose Team,
I have a document that contains some text and images. When I convert this document to MHTML, some non-breaking spaces with blue borders are added.
These “blue boxes” also appear in the background of the second image.
(This can be seen if you view the source of the MHTML file in Internet Explorer and save it again as HTML source.)
I think this is caused by some kind of paragraph formatting. Unfortunately, my scan method, which prints some attributes for each type of node found in a document, does not reveal any special color formatting in paragraphs or runs. Maybe I am checking the wrong attributes.

The code to reproduce the problem:

Document badDoc = new Document(badDocumentPath);
badDoc.Save(savePath, SaveFormat.Mhtml);

Please find attached the document that causes the problem.
I appreciate any help or hints you can provide and really hope that a simple workaround or even a solution exists for this problem. I also know that you will find them, if they exist.
If you need further information, just let me know.
Thanks for your help and kind regards,
Wolfgang

Hi
Thank you for reporting this problem to us. I managed to reproduce it on my side. Your request has been linked to the appropriate issue. We will let you know once it is resolved.
The problem occurs because a blue border is defined in the rPr (run properties) of the empty paragraphs in your document. When exporting to HTML, Aspose.Words adds a non-breaking space to empty paragraphs, and these borders become visible.
Best regards,
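
For readers who want to check whether their own documents are affected, the formatting described above can be inspected roughly as follows. This is only a minimal sketch: it assumes that the installed Aspose.Words version exposes the run properties of the paragraph mark through Paragraph.ParagraphBreakFont and its Border, and the class and method names (EmptyParagraphBorderReport, ReportEmptyParagraphBorders) are hypothetical.

```csharp
using System;
using Aspose.Words;

public static class EmptyParagraphBorderReport
{
    // Hypothetical helper: prints every empty paragraph whose
    // paragraph mark carries a visible character border.
    public static void ReportEmptyParagraphBorders(Document document)
    {
        int index = 0;
        foreach (Paragraph paragraph in document.GetChildNodes(NodeType.Paragraph, true))
        {
            // Assumption: ParagraphBreakFont reflects the rPr of the paragraph mark.
            Border border = paragraph.ParagraphBreakFont.Border;
            if (!paragraph.HasChildNodes && border.LineStyle != LineStyle.None)
            {
                Console.WriteLine("Paragraph {0}: style={1}, color={2}",
                    index, border.LineStyle, border.Color);
            }
            index++;
        }
    }
}
```

If this prints nothing for a document that still shows the boxes, the border is probably stored somewhere else (for example in a style), and the sketch would need to be extended.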

Hi Alexey,
thank you for your fast analysis and quick reply.
Knowing how this issue is caused is very valuable to me.
So thanks again and have a nice day,
Wolfgang

Hi again,
I just wanted to share our workaround with you and with other users who may experience the same problem.
Since we don’t need any formatting on empty paragraphs, and since there currently seems to be no way to get this specific formatting through the API, we simply replace the empty paragraphs with new paragraphs, so no formatting is kept.

private static Document processEmptyParagraphs(Document document)
{
    List<Paragraph> parasToRemove = new List<Paragraph>();
    foreach(Paragraph paragraph in
        document.GetChildNodes(NodeType.Paragraph, true))
    {
        if (String.IsNullOrEmpty(paragraph.GetText().Trim()) &&
            (paragraph.ChildNodes.Count == 0))
        {
            try
            {
                Paragraph newParagraph = new Paragraph(document);

                parasToRemove.Add(paragraph);
                // insert before, to avoid infinite loop issues
                paragraph.ParentNode.InsertBefore(newParagraph, paragraph);
            }
            catch (Exception e)
            {
                Logger.Log(LogType.Error, e);
            }
        }
    }
    foreach(Paragraph para in parasToRemove)
    {
        para.ParentNode.RemoveChild(para);
    }
    return document;
}

This workaround may produce a small difference from the source document (a few more line breaks appear than before). But this is sufficient for our needs and those of our customers (we only use HTML conversion for creating emails).
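
A gentler variant of the same idea would keep the empty paragraphs and clear only the border on their paragraph mark. This is a sketch, not a tested fix. It assumes that Paragraph.ParagraphBreakFont.Border is where the blue border is stored and that the installed Aspose.Words version provides Border.ClearFormatting(). The names EmptyParagraphCleanup and ClearEmptyParagraphBorders are hypothetical.

```csharp
using Aspose.Words;

public static class EmptyParagraphCleanup
{
    // Hypothetical helper: leaves empty paragraphs in place and removes
    // only the character border defined on their paragraph mark.
    public static void ClearEmptyParagraphBorders(Document document)
    {
        foreach (Paragraph paragraph in document.GetChildNodes(NodeType.Paragraph, true))
        {
            if (!paragraph.HasChildNodes)
            {
                // Assumption: this resets the border stored in the paragraph mark's rPr.
                paragraph.ParagraphBreakFont.Border.ClearFormatting();
            }
        }
    }
}
```

Because no nodes are inserted or removed, the paragraph collection can be iterated directly, and other formatting on the empty paragraphs (such as spacing) stays as it was in the source document.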

Hi
Thank you for sharing this code. It is great that you managed to work around the problem.
Best regards,

The issues you have found earlier (filed as WORDSNET-5079) have been fixed in this .NET update and this Java update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.
