Vertical & Horizontal merge in table cells

Hi, I have a problem extracting cell merge information from documents created in Word. The merge property is never set for any cell, even when I create a table in Word and merge some cells.

If this is not possible, is there another way to read this information?

Thanks for your request. By Microsoft Word design, rows in a table in a Microsoft Word document are completely independent. This means each row can have any number of cells of any width. So if you imagine a first row with one wide cell and a second row with two narrow cells, then when you look at this document the cell in the first row will appear horizontally merged. But it is not a merged cell; it is just a single wide cell. Another perfectly valid scenario is when the first row has two cells: the first cell has CellMerge.First and the second cell has CellMerge.Previous. In this case it is a merged cell. In both cases the visual appearance in MS Word is exactly the same, and both cases are valid.
So in the second case you can easily determine whether cells are merged horizontally (using the HorizontalMerge property), but in the first case it is not so easy.
Regarding vertically merged cells, you can use the VerticalMerge property. Please see the following link for more information.
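As a rough sketch of how these flags can be turned into spans, assume the merge flags of one row have already been read into a list. In Aspose.Words they would come from each cell's CellFormat.HorizontalMerge; the local CellMerge enum and the MergeSpans class below are hypothetical stand-ins so the example compiles on its own. The sketch counts how many physical cells each visible cell covers. It cannot detect the first case (a single wide cell), because that cell carries no merge flag at all. Running the same loop down a column would handle VerticalMerge.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical local stand-in mirroring the values of Aspose.Words' CellMerge enum.
// In a real project, drop this and use the library's enum instead.
public enum CellMerge { None, First, Previous }

public static class MergeSpans
{
    // For one row of merge flags, returns how many physical cells each visible cell covers.
    public static List<int> HorizontalSpans(IList<CellMerge> row)
    {
        var spans = new List<int>();
        foreach (CellMerge flag in row)
        {
            if (flag == CellMerge.Previous && spans.Count > 0)
                spans[spans.Count - 1]++;   // continuation: widen the previous visible cell
            else
                spans.Add(1);               // None, First (or a stray leading Previous) starts a new visible cell
        }
        return spans;
    }
}
```

For a row flagged First, Previous, None, this yields spans of 2 and 1.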

OK, it can be a bit hard to get the span information from the cells. But since the information is revealed as colspans and rowspans when saving to HTML, it can't be impossible to get. What I do now is save to HTML and parse it as XML to get the rowspan and colspan information for the cells. There must be a better way!


Thanks for your inquiry. I didn't say this is impossible, but it is not as easy to achieve as it sounds. We use a special algorithm that calculates the width of cells and decides whether a cell should have a colspan in the HTML output.
So parsing the HTML is the easiest way to determine whether cells have a colspan.
You can also try to create your own algorithm that calculates the colspan of each cell in the table.

Hi,
I saw the discussion you had with Hjalmar about the colspan problem with Aspose.
What I'm actually doing is parsing a Word document into a custom XML file using DocumentVisitor, and the big problem here is the colspan, which is not detectable by Aspose.
I saw that you said: "Parsing of HTML is the easiest way to determine whether cells have colspan."
Is it possible for me to use this approach with DocumentVisitor? If yes, how do I do it?
Could you please share your experience with me so I can resolve my problem?

Hi Robert,

Thanks for your inquiry. As I wrote earlier in this thread, it is very difficult to determine colspan, because rows in a Word table are completely independent and can contain any number of cells of any width.
When converting to HTML, Aspose.Words calculates this value using a complex algorithm. You can try converting your document to HTML and then parsing it. Unfortunately, there is no way to convert only a particular node to HTML, so you have to convert the whole document.
You can try to create your own algorithm to calculate colspan, but it is a very complex task. First you should calculate the width of the whole table. Then you should calculate the minimum width of each cell, determine the maximum number of cells per row in the table, and then calculate the colspan of each cell.
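The general idea can be sketched with a simpler technique than the one Aspose.Words uses internally (which is not described in this thread): collect the right edge of every cell in every row, merge edges that fall within a small tolerance into shared grid lines, and count how many grid lines each cell spans. This sketch assumes cell widths are already available as numbers (for example from each cell's CellFormat.Width, which is an assumption here) and that all rows start at the same left edge, which real Word tables do not guarantee. ColspanSketch is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColspanSketch
{
    // rows[r][c] = width of cell c in row r (e.g. in points).
    // Assumes every row starts at x = 0 (no per-row indent).
    public static List<List<int>> Compute(IList<IList<double>> rows, double tolerance = 0.5)
    {
        // 1. Collect the right-edge x-position of every cell in every row.
        var edges = new List<double>();
        foreach (var row in rows)
        {
            double x = 0;
            foreach (double w in row)
            {
                x += w;
                edges.Add(x);
            }
        }

        // 2. Sort the edges and merge those closer than `tolerance` into one grid line.
        var grid = new List<double>();
        foreach (double e in edges.OrderBy(v => v))
            if (grid.Count == 0 || e - grid[grid.Count - 1] > tolerance)
                grid.Add(e);

        // 3. A cell's colspan is the number of grid lines in (left edge, right edge].
        var result = new List<List<int>>();
        foreach (var row in rows)
        {
            var spans = new List<int>();
            double left = 0;
            foreach (double w in row)
            {
                double right = left + w;
                int span = grid.Count(g => g > left + tolerance && g <= right + tolerance);
                spans.Add(Math.Max(1, span));
                left = right;
            }
            result.Add(spans);
        }
        return result;
    }
}
```

For a first row with one 200-point cell above a second row with two 100-point cells, it returns a colspan of 2 for the wide cell and 1 for each narrow cell, which is the "looks merged but is not" case described at the start of the thread.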

There are two alternatives. The one that works best is to create a config file where you specify every table and where the colspans are; this takes time, but it's the one I had to use. The alternative, which almost always works, is to use the built-in function in Aspose that saves the Word file to HTML, and then parse that file to get the colspans. (This works as long as the tables are normal; if the cells are weird in ways only MS Word can make them, then you're in for solution 1.)

Here is some sample code (if you aren't familiar with LINQ, then you should be):

// Requires: using System.IO; using System.Linq; using System.Xml; using System.Xml.Linq; using Aspose.Words;
XElement tableDfn;
// Save the document to HTML and parse it as XML
using (MemoryStream docToHtml = new MemoryStream())
{
    document.Save(docToHtml, SaveFormat.Html);
    docToHtml.Position = 0;
    using (XmlReader xr = XmlReader.Create(docToHtml))
    {
        XElement html = XElement.Load(xr);

        // Extract the colspan and rowspan information of every table
        tableDfn = new XElement("tables",
            from tab in html.Descendants("table")
            select new XElement("table",
                from tr in tab.Descendants("tr")
                select new XElement("tr",
                    from td in tr.Descendants("td")
                    select new XElement("td",
                        from attr in td.Attributes()
                        where attr.Name.LocalName.Contains("span")
                        select attr))));
    }
}

With this XML, just keep track of which table, row and cell you are in while you navigate the Word file, so that you can query the XML to get the colspans.
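Querying that XML by position could look like the sketch below. It assumes the tables element has exactly the shape produced by the LINQ query above (tables/table/tr/td) and that tables, rows and cells are counted from zero in document order; SpanLookup is a hypothetical helper name. Missing cells or missing attributes are reported as a span of 1, the HTML default.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SpanLookup
{
    // Returns the colspan/rowspan of one cell in the <tables> element, by zero-based indexes.
    public static (int ColSpan, int RowSpan) Get(XElement tables, int tableIdx, int rowIdx, int cellIdx)
    {
        // ElementAtOrDefault returns null when an index is out of range,
        // and ?. short-circuits the rest of the chain in that case.
        XElement td = tables.Elements("table").ElementAtOrDefault(tableIdx)
            ?.Elements("tr").ElementAtOrDefault(rowIdx)
            ?.Elements("td").ElementAtOrDefault(cellIdx);
        return (ReadSpan(td, "colspan"), ReadSpan(td, "rowspan"));
    }

    // Reads an integer span attribute, defaulting to 1 when the cell or attribute is missing.
    private static int ReadSpan(XElement td, string name)
    {
        XAttribute attr = td?.Attribute(name);
        return attr == null ? 1 : int.Parse(attr.Value);
    }
}
```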

Is this the right approach: should I save the Word document in HTML format, reload it, and parse it using DocumentVisitor?

Document doc = new Document("Document.doc");
doc.Save("Out.html", SaveFormat.Html);
doc = new Document("Out.html");
MyDocToTxtWriter myConverter = new MyDocToTxtWriter();
doc.Accept(myConverter);

You do not need to save the document as HTML and then reload it. You can just determine the index of the table and of the current row, then find the corresponding table and row in the HTML, and then determine the rowspan and colspan of each cell in that row.
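One way to keep those indexes while walking the document is sketched below under stated assumptions. The hypothetical TablePositionTracker keeps a document-order table counter plus a stack, so a table nested inside a cell gets its own row and cell counters and the outer table resumes where it left off. Its OnTableStart, OnRowStart, OnCellStart and OnTableEnd methods are meant to be called from the visitor's corresponding start/end overrides; that wiring is not shown. Numbering tables in document order should line up with an XPath //table query over the HTML, as long as the HTML export produces one table element per Word table.

```csharp
using System.Collections.Generic;

public class TablePositionTracker
{
    // Global, document-order table counter (nested tables get the next number).
    private int mNextTableIdx = 0;
    // One entry per currently open table: { tableIdx, rowIdx, cellIdx }.
    private readonly Stack<int[]> mStack = new Stack<int[]>();

    public void OnTableStart() => mStack.Push(new[] { mNextTableIdx++, -1, -1 });
    public void OnTableEnd() => mStack.Pop();

    // A new row resets the cell counter of the innermost open table.
    public void OnRowStart()
    {
        int[] p = mStack.Peek();
        p[1]++;
        p[2] = -1;
    }

    public void OnCellStart() => mStack.Peek()[2]++;

    // Zero-based position of the current cell in the innermost open table.
    public (int Table, int Row, int Cell) Current
    {
        get { int[] p = mStack.Peek(); return (p[0], p[1], p[2]); }
    }
}
```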

Thanks Alexey,

Here is the XML that results from your code:

      <td />
      <td colspan="9" />
      <td rowspan="2" />
      <td rowspan="2" />
      <td rowspan="2" />
      <td colspan="2" />
      <td rowspan="2" />
      <td rowspan="2" />
      <td rowspan="2" />
      <td rowspan="2" />
      <td rowspan="2" />

How do I keep track of my position in the Word document, and of the table name? As you can see, the table has no name.
If I understood you correctly, you propose moving to the next table every time I meet a table in the Word document, and moving to the next row and cell every time I meet a new row and a new cell?
If that's what you mean, any hints on how to do it would be appreciated.


Thanks for your inquiry. I created a simple example of how you can parse the HTML and determine the colspan and rowspan of each cell. I used the XmlDocument DOM, but you can change the code to use LINQ to get the necessary information. Here is my code:

// Open document
Document doc = new Document(@"Test013\in.doc");
// Create visitor
SpanVisitor visitor = new SpanVisitor(doc);
// Accept visitor
doc.Accept(visitor);

Here is the code of the visitor:

using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using Aspose.Words;

public class SpanVisitor : DocumentVisitor
{
    /// <summary>Creates a new SpanVisitor instance.</summary>
    /// <param name="doc">The document to parse.</param>
    public SpanVisitor(Document doc)
    {
        // Get the collection of tables in the document
        mWordTables = doc.GetChildNodes(NodeType.Table, true);
        // Convert the document to HTML;
        // we will parse the HTML to determine the rowspan and colspan of each cell
        MemoryStream htmlStream = new MemoryStream();
        doc.SaveOptions.HtmlExportImagesFolder = Path.GetTempPath();
        doc.Save(htmlStream, SaveFormat.Html);
        // Load the HTML into an XML document
        XmlDocument xmlDoc = new XmlDocument();
        htmlStream.Position = 0;
        xmlDoc.Load(htmlStream);
        // Get the collection of tables in the HTML document
        XmlNodeList tables = xmlDoc.DocumentElement.SelectNodes("//table");
        foreach (XmlNode table in tables)
        {
            TableInfo tableInf = new TableInfo();
            // Get the collection of rows in the table
            XmlNodeList rows = table.SelectNodes("tr");
            foreach (XmlNode row in rows)
            {
                RowInfo rowInf = new RowInfo();
                // Get the collection of cells in the row
                XmlNodeList cells = row.SelectNodes("td");
                foreach (XmlNode cell in cells)
                {
                    // Determine the rowspan and colspan of the current cell (0 = attribute not set)
                    XmlAttribute colSpanAttr = cell.Attributes["colspan"];
                    XmlAttribute rowSpanAttr = cell.Attributes["rowspan"];
                    int colSpan = colSpanAttr == null ? 0 : Int32.Parse(colSpanAttr.Value);
                    int rowSpan = rowSpanAttr == null ? 0 : Int32.Parse(rowSpanAttr.Value);
                    CellInfo cellInf = new CellInfo(colSpan, rowSpan);
                    rowInf.Cells.Add(cellInf);
                }
                tableInf.Rows.Add(rowInf);
            }
            mTables.Add(tableInf);
        }
    }

    public override VisitorAction VisitCellStart(Aspose.Words.Tables.Cell cell)
    {
        // Determine the index of the current table
        int tabIdx = mWordTables.IndexOf(cell.ParentRow.ParentTable);
        // Determine the index of the current row
        int rowIdx = cell.ParentRow.ParentTable.IndexOf(cell.ParentRow);
        // And determine the index of the current cell
        int cellIdx = cell.ParentRow.IndexOf(cell);
        // Look up the colspan and rowspan of the current cell, if the HTML has a matching cell
        int colSpan = 0;
        int rowSpan = 0;
        if (tabIdx < mTables.Count &&
            rowIdx < mTables[tabIdx].Rows.Count &&
            cellIdx < mTables[tabIdx].Rows[rowIdx].Cells.Count)
        {
            colSpan = mTables[tabIdx].Rows[rowIdx].Cells[cellIdx].ColSpan;
            rowSpan = mTables[tabIdx].Rows[rowIdx].Cells[cellIdx].RowSpan;
        }
        Console.WriteLine("{0}.{1}.{2} colspan={3}\t rowspan={4}", tabIdx, rowIdx, cellIdx, colSpan, rowSpan);
        return VisitorAction.Continue;
    }

    private List<TableInfo> mTables = new List<TableInfo>();
    private NodeCollection mWordTables = null;
}

And here is the code of the helper classes:

using System.Collections.Generic;

/// <summary>Helper class that contains a collection of RowInfo, one per row.</summary>
public class TableInfo
{
    public List<RowInfo> Rows
    {
        get { return mRows; }
    }
    private List<RowInfo> mRows = new List<RowInfo>();
}

/// <summary>Helper class that contains a collection of CellInfo, one per cell.</summary>
public class RowInfo
{
    public List<CellInfo> Cells
    {
        get { return mCells; }
    }
    private List<CellInfo> mCells = new List<CellInfo>();
}

/// <summary>Helper class that contains info about a cell; currently only its colspan and rowspan.</summary>
public class CellInfo
{
    public CellInfo(int colSpan, int rowSpan)
    {
        mColSpan = colSpan;
        mRowSpan = rowSpan;
    }
    public int ColSpan
    {
        get { return mColSpan; }
    }
    public int RowSpan
    {
        get { return mRowSpan; }
    }
    private int mColSpan = 0;
    private int mRowSpan = 0;
}

I hope this helps you.

I got your solution working and it worked very well, until I updated my Aspose version to the latest one.
Now, when the code hits the following line:
doc.Save(htmlStream, SaveFormat.Html);
I get this error message:
Image file cannot be written to disk. When saving the document to a stream either HtmlExportImagesFolder should be specified or custom streams should be provided via HtmlExportImageSaving event handler. Please see documentation for details.
Could you please help?


Thanks for your inquiry. You just need to specify where images will be stored while the document is converted to HTML. For example, see the following code:

// Specify the folder where images will be saved during export to HTML
doc.SaveOptions.HtmlExportImagesFolder = @"C:\Temp\images";
doc.Save(htmlStream, SaveFormat.Html);

Unfortunately, it doesn't work.
If you look at your code above:

MemoryStream htmlStream = new MemoryStream();
doc.SaveOptions.HtmlExportImagesFolder = Path.GetTempPath();
doc.Save(htmlStream, SaveFormat.Html);

I even replaced Path.GetTempPath() with @"C:", but it still doesn't work; I get the same error message.
Try it yourself.


Thanks for your request. This code works fine on my side.

Document doc = new Document(@"C:\Temp\in.doc");
// Specify the folder where images will be saved during export to HTML
doc.SaveOptions.HtmlExportImagesFolder = Path.GetTempPath();
MemoryStream htmlStream = new MemoryStream();
doc.Save(htmlStream, SaveFormat.Html);

I tried again and it doesn't work with the new version; however, I had kept the previous version.
So I uninstalled the new version and reinstalled the previous one, and it works as before.
Then I went back to the new version, and it doesn't work at all with this version.
Please help.
PS: I attached my .doc test file to this message.

Hi Robert,

I still cannot reproduce the problem on my side. Since you do not need the images when saving the document to HTML, you could try code like the following:

public void Test089()
{
    Document doc = new Document(@"C:\Temp\I-C1-1.1_test_tout.doc");
    MemoryStream htmlStream = new MemoryStream();
    // Give every exported image its own in-memory stream, so nothing is written to disk
    doc.SaveOptions.HtmlExportImageSaving += new ExportImageSavingEventHandler(SaveOptions_HtmlExportImageSaving);
    doc.Save(htmlStream, SaveFormat.Html);
}

void SaveOptions_HtmlExportImageSaving(object sender, ExportImageSavingEventArgs e)
{
    e.ImageStream = new MemoryStream();
}

Best regards.

Thanks Alexey, it works now.

I am new to Aspose.
I have some tables in HTML files and would like to export them to CSV format. I tried the code snippet, but the XmlDocument and MemoryStream classes are not found.
Could you please help?

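Thanks for your request. This question is not related to Aspose.Words. In your case, you just need to import the System.IO and System.Xml namespaces (using System.IO; and using System.Xml;) in your code: MemoryStream lives in System.IO and XmlDocument in System.Xml.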
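For the CSV part of the question, here is a minimal sketch using only System.Xml.Linq (no Aspose.Words involved). It assumes the HTML is well-formed enough to parse as XML and has no XML namespace on its elements, the same assumption the HTML-parsing approach earlier in this thread relies on. HtmlTableToCsv is a hypothetical name. Fields are quoted in the RFC 4180 style; colspan and rowspan are ignored, so merged cells produce rows with fewer fields, and nested tables are not handled specially.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class HtmlTableToCsv
{
    // Converts every <table> in a well-formed (XHTML-like) document to one CSV string.
    public static List<string> Convert(string xhtml)
    {
        XElement root = XElement.Parse(xhtml);
        var result = new List<string>();
        foreach (XElement table in root.DescendantsAndSelf("table"))
        {
            var sb = new StringBuilder();
            foreach (XElement tr in table.Descendants("tr"))
            {
                // Header (th) and data (td) cells both become fields, in document order.
                IEnumerable<string> fields = tr.Elements()
                    .Where(c => c.Name.LocalName == "td" || c.Name.LocalName == "th")
                    .Select(c => Quote(c.Value.Trim()));
                sb.Append(string.Join(",", fields)).Append("\r\n");
            }
            result.Add(sb.ToString());
        }
        return result;
    }

    // RFC 4180-style quoting: wrap in quotes and double embedded quotes when needed.
    private static string Quote(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }
}
```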
