Aspose.Words does not Import Multi-section Content Control using .NET

Hi,

I have attached a document with some StructuredDocumentTags. The StructuredDocumentTags contain text and table data.

If I convert the nodes to HTML with a generic method, I get all the data from the document. But if I convert only the StructuredDocumentTags, I don’t get the table data.

All the tables and text are inside the StructuredDocumentTags, so logically the result should be the same.

I think this is a bug in your parsing: you treat the table as being outside the StructuredDocumentTag, but Word shows that it is inside.

Here is example code that takes this Word document and creates two .htm files: one is good (using the generic approach) and one is bad (recursively going over the StructuredDocumentTags and getting the data out of them).

using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Markup;

var html = string.Empty;

var htmlSaveOptions = new Aspose.Words.Saving.HtmlSaveOptions
{
    ExportImagesAsBase64 = true,
    ExportHeadersFootersMode = Aspose.Words.Saving.ExportHeadersFootersMode.None
};

// Generic approach: convert every top-level node of the document
using (var inputStream = File.OpenRead(@"E:\WordTest\11.docx"))
{
    var doc = new Aspose.Words.Document(inputStream);

    foreach (Aspose.Words.Node node in doc.ChildNodes)
    {
        html += node.ToString(htmlSaveOptions);
    }

    File.WriteAllText(@"E:\WordTest\11_good.htm", html, Encoding.UTF8);
}

// Exclusive approach - go over the StructuredDocumentTag only
html = string.Empty;
using (var inputStream = File.OpenRead(@"E:\WordTest\11.docx"))
{
    var doc = new Aspose.Words.Document(inputStream);

    html = ReadStructuredDocumentTagOnly(html, htmlSaveOptions, doc);

    File.WriteAllText(@"E:\WordTest\11_bad.htm", html, Encoding.UTF8);
}

Helper method:

private static string ReadStructuredDocumentTagOnly(string html, Aspose.Words.Saving.HtmlSaveOptions htmlSaveOptions, CompositeNode parent)
{
    foreach (Aspose.Words.Node node in parent.ChildNodes)
    {
        if (node.NodeType == NodeType.StructuredDocumentTag)
        {
            // Content control found: convert each of its direct children to HTML.
            var structuredDocumentTag = (StructuredDocumentTag)node;

            foreach (Aspose.Words.Node childNode in structuredDocumentTag.ChildNodes)
            {
                html += childNode.ToString(htmlSaveOptions);
            }
        }
        else if (node is CompositeNode compositeNode && compositeNode.ChildNodes.Count > 0)
        {
            // Not a content control: keep searching its descendants.
            html = ReadStructuredDocumentTagOnly(html, htmlSaveOptions, compositeNode);
        }
    }
    return html;
}

Please fix this bug or advise us on how to work around it.

Thanks

Hi Omri,

Thanks for your inquiry. We have tested the scenario and reproduced the issue on our side. We have logged this problem in our issue tracking system as WORDSNET-13601. You will be notified via this forum thread once this issue is resolved.

We apologize for the inconvenience.

Any news about this bug?

This is a real blocker for us. We just can’t work with Aspose.Words, which we purchased, because of this bug!

Hi Omri,

Thanks for your patience. Our product team has completed its analysis of your issue and has concluded that they won’t be able to implement a fix. Your issue (WORDSNET-13601) is now closed with a ‘Won’t Fix’ resolution, because the requested feature is not compatible with the Aspose.Words document model. We apologize for the inconvenience.

What do you mean “Won’t Fix”?

You do understand that this is a major bug in your product and that it is preventing us from working with StructuredDocumentTag, right?

We purchased your product in order to read and write structured MS Word documents, and now we find out that it is broken and you aren’t going to fix it?!

That sounds like a very unreasonable response!

Can you reconsider this decision?

Hi Omri,

We are very sorry for the inconvenience.

Unfortunately, the requested feature is not compatible with our document model. We are in communication with our product team about this feature. Once we have any information from our product team, we will be more than happy to share that with you.

Let me emphasize – this is NOT a feature, it is a BUG in your product. Your StructuredDocumentTag is broken in some cases. There is no way to spin it around – you are just not supporting your product. We purchased your product on a false promise that it supports StructuredDocumentTags and we find out that it is badly implemented and broken!

Besides passing this message to your product team and asking them to reconsider, can you please elaborate on what exactly the bug is? What causes it?

Maybe we can ask our users to avoid the MS Word features that break your StructuredDocumentTag implementation, as a temporary workaround.

Hi Omri,

Thanks for your inquiry. Your document contains a section break inside a structured document tag. In the current Aspose.Words document model, only sections can be inserted into the Document node. If the section break is cut from the structured document tag and pasted after it, the shared code will start working properly.

For reference, a StructuredDocumentTag can occur in a document in the following places:

  • Block-level - Among paragraphs and tables, as a child of a Body, HeaderFooter, Comment, Footnote or a Shape node.
  • Row-level - Among rows in a table, as a child of a Table node.
  • Cell-level - Among cells in a table row, as a child of a Row node.
  • Inline-level - Among inline content inside, as a child of a Paragraph.
  • Nested inside another StructuredDocumentTag.

Please check the attached Word document (11_valid.docx) and the DOM image. There, the StructuredDocumentTag is under the Body node of the Section. Your document should not contain a section break inside a StructuredDocumentTag. Please insert the section break in your document as shown in 11_valid.docx. I hope this clarifies the issue.
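A quick way to see which content controls Aspose.Words actually imported, and at which of the levels listed above, is to enumerate them after loading. The following is only a minimal sketch: the class name is made up for illustration, the file path is the one from the original post, and it assumes a version of Aspose.Words for .NET in which `StructuredDocumentTag` exposes `Title` and `Level`.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Markup;

// Hypothetical helper class, for illustration only.
class ListContentControls
{
    static void Main()
    {
        // Path taken from the original post; adjust as needed.
        var doc = new Document(@"E:\WordTest\11.docx");

        // Collect every StructuredDocumentTag present in the DOM, at any depth.
        NodeCollection tags = doc.GetChildNodes(NodeType.StructuredDocumentTag, true);
        Console.WriteLine("Imported content controls: " + tags.Count);

        foreach (StructuredDocumentTag tag in tags)
        {
            // Level is Block, Row, Cell or Inline; the parent shows where the tag sits.
            Console.WriteLine("Title='{0}', Level={1}, Parent={2}",
                tag.Title, tag.Level, tag.ParentNode.NodeType);
        }
    }
}
```

If a content control that is visible in Word does not show up in this list, it was not imported as a StructuredDocumentTag node, which matches the behavior described in this thread for a tag that spans a section break.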

Hi Omri,

Could you please share the steps you are using to create the input document? We may be able to help you find another way to fix this problem.

First of all, thank you for your support.

This is something that I can work with. What you are practically saying is that section breaks “break” your document model and therefore you return empty HTML when converting the StructuredDocumentTag, right?

I was able to reproduce it and it seems to be working like that.

So now I can instruct my users not to use section breaks (I don’t think this is such an important feature that they can’t live without it).

My questions are:

  1. Can’t you just add a preemptive check when converting a StructuredDocumentTag to HTML and remove the section break? It will be easier than me (and any other customer of yours) asking my users not to use it.

  2. If not, can you explain to me how I can do it? (add the check and remove section breaks before converting to HTML)

  3. What other features may “break” the StructuredDocumentTag conversion to HTML?

Thanks

Thanks for your inquiry.

omri-1:

This is something that I can work with. What you are practically saying is that section breaks “break” your document model and therefore you return empty HTML when converting the StructuredDocumentTag, right?

If a section break exists inside a StructuredDocumentTag, Aspose.Words does not import the StructuredDocumentTag itself. However, its contents are imported into the Aspose.Words DOM. Please see the attached DOM image.
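To check where the contents end up without relying on the DOM image, the node tree can be printed directly. This is a rough sketch under a few assumptions: the class and method names are hypothetical, the path is the one from the original post, and the walk deliberately stops at paragraphs to keep the output short.

```csharp
using System;
using Aspose.Words;

// Hypothetical helper class, for illustration only.
class DumpDomSketch
{
    static void Main()
    {
        var doc = new Document(@"E:\WordTest\11.docx");
        Dump(doc, 0);
    }

    // Prints the node type of each node, indented by its depth in the tree.
    static void Dump(Node node, int depth)
    {
        Console.WriteLine(new string(' ', depth * 2) + node.NodeType);

        // Do not descend into paragraphs; runs and other inline nodes are not needed here.
        if (node.NodeType == NodeType.Paragraph)
            return;

        if (node is CompositeNode composite)
        {
            // 'false' restricts the collection to direct children only.
            foreach (Node child in composite.GetChildNodes(NodeType.Any, false))
                Dump(child, depth + 1);
        }
    }
}
```

For the document discussed here, the expectation based on the reply above is that the tables and paragraphs appear directly under the Body nodes, with no StructuredDocumentTag node wrapping them.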

omri-1:

  1. Can’t you just add a preemptive check when converting a StructuredDocumentTag to HTML and remove the section break? It will be easier than me (and any other customer of yours) asking my users not to use it.

  2. If not, can you explain to me how I can do it? (add the check and remove section breaks before converting to HTML)

Aspose.Words does not import the StructuredDocumentTag when a section break exists inside it. See the attached image (section break inside content control.png). A StructuredDocumentTag can occur in a document only at specific places; see my earlier reply in this thread.

omri-1:

  3. What other features may “break” the StructuredDocumentTag conversion to HTML?

Aspose.Words supports conversion of StructuredDocumentTag to HTML, apart from the issue discussed in this forum thread.

Please let us know if you have any more queries.

tahir.manzoor:

Aspose.Words does not import the StructuredDocumentTag when a section break exists inside it.

I do understand that. What I’m asking is whether there is a way to identify section breaks inside a StructuredDocumentTag, remove them, and then import the StructuredDocumentTag without the section break.

Hi Omri,

Thanks for your inquiry. Unfortunately, there is no way to get the StructuredDocumentTag when there is a section break in it. This is because, in your case, Aspose.Words does not import the StructuredDocumentTag into the Aspose.Words DOM. Please read about the Aspose.Words document model at the following link.

Aspose.Words Document Object Model

Just to be 100% sure… I’m not looking to import a StructuredDocumentTag with section breaks. I’m looking for the following method:

  1. import the entire document to a DOM model
  2. identify and remove section breaks
  3. save the document
  4. import it again and this time read the StructuredDocumentTag

Is it possible?

Hi Omri,

Thanks for your inquiry. You can import the document into the Aspose.Words DOM and remove the section breaks from the document. However, in your specific case, the section break is inside the StructuredDocumentTag. Unfortunately, Aspose.Words does not import such a StructuredDocumentTag into the document object model, so you cannot get this kind of StructuredDocumentTag using Aspose.Words.

We apologize for the inconvenience.
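For readers who want to try the four-step idea anyway, it can be sketched as below. This is an illustration, not a workaround: the class name is hypothetical, the path is the one from the original post, and sections are merged by prepending each section's content to the last one. According to the reply above, the content control is already missing from the DOM after the first load, so the reloaded document is not expected to contain it.

```csharp
using System;
using System.IO;
using Aspose.Words;

// Hypothetical helper class, for illustration only.
class MergeSectionsSketch
{
    static void Main()
    {
        // Step 1: import the document into the DOM.
        var doc = new Document(@"E:\WordTest\11.docx");

        // Step 2: remove section breaks by merging every section into the last one.
        // Walk backwards so that the original content order is preserved.
        for (int i = doc.Sections.Count - 2; i >= 0; i--)
        {
            doc.LastSection.PrependContent(doc.Sections[i]);
            doc.Sections[i].Remove();
        }

        using (var stream = new MemoryStream())
        {
            // Step 3: save the merged document (in memory here).
            doc.Save(stream, SaveFormat.Docx);
            stream.Position = 0;

            // Step 4: import it again and count the content controls.
            var reloaded = new Document(stream);
            int count = reloaded.GetChildNodes(NodeType.StructuredDocumentTag, true).Count;
            Console.WriteLine("Content controls after reload: " + count);
        }
    }
}
```

Note that merging sections also discards the per-section page setup and headers/footers of the removed sections, so this should only be run on a copy of the document.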

I see… Do you have an estimate when you will support section breaks in StructuredDocumentTag?

Hi Omri,

Thanks for your inquiry. We regret to inform you that this feature has been postponed until a later date. We will inform you via this forum thread as soon as there are any further developments.

We apologize for the inconvenience.

Hi,

It’s been over 6 months since my original question, any news regarding this issue?

Hi Omri,

Thanks for your inquiry.

Unfortunately, there is no update available on this feature. We will inform you via this forum thread as soon as there are any further developments. We apologize for the inconvenience.

Hi,

It’s been over 9 months since my original question, any news regarding this issue?