Convert Tables from DOCX document to Table XML in String or Java object

Dear,

We would like to use Aspose.Words to extract tables from a DOCX document and convert them to table XML, either as a String or as a Java object. Is there any way to do that?

For now, I only iterate through the nodes of the Aspose.Words table, like this:

public Object renderOfficeTableToST36ElementAsObject(final com.aspose.words.Table officeTable) {
    try {
        // XML text for the table; still empty because the cells are not yet mapped
        String tableMl = "";

        // Get the index of the table node as contained in the parent node of the table
        int tableIndex = officeTable.getParentNode().getChildNodes().indexOf(officeTable);

        // Iterate through all rows in the table
        for (Row row : officeTable.getRows()) {
            int rowIndex = officeTable.getRows().indexOf(row);

            // Iterate through all cells in the row
            for (Cell cell : row.getCells()) {
                int cellIndex = row.getCells().indexOf(cell);
                // Get the plain text content of this cell.
                //String cellText1 = cell.toString();
                String cellText = cell.toString(SaveFormat.TEXT).trim();

                //Add cell to docxconversion.st36.content.Table??

            }
        }

        StringReader reader = new StringReader(tableMl);
        return unmarshaller.unmarshal(reader);

    } catch (Exception e) {
        LOGGER.error(TABLE_ML_CONVERSION_EXCEPTION, e);
        return null;
    }
}

Many thanks!

@josgobo

Thanks for your inquiry. Please ZIP and attach your input Word document and expected output here for our reference. We will then provide you more information about your query.

Many thanks!

Here are the input document and the expected output XML.

Table from DOCX to XML Sample.zip (11.9 KB)

@josgobo

Thanks for sharing the details. Unfortunately, Aspose.Words cannot convert a document or a node to a custom XML format like the one you specified. Please check the save formats that Aspose.Words supports.

In your case, we suggest importing the Table node into another document and saving that document in the WordprocessingML file format.

You can also save the Table node to HTML using the following code example. Hope this helps.

Document doc = new Document(MyDir + "AppBody-Sample-Table-Input.docx");
Table table = doc.getFirstSection().getBody().getTables().get(0);
System.out.println(table.toString(SaveFormat.HTML));

Thank you very much. I am going to try it this way, but for the moment I do not know how to get from the HTML table to XML once I have it as a String.

String htmlTable = table.toString(SaveFormat.HTML);

Thanks and best regards.

@josgobo

Thanks for your inquiry. In your case, we suggest iterating over the table’s rows and cells, getting the text of the paragraphs using the Node.toString method, and writing your own custom XML.

The following code example shows how to iterate over a table’s rows and cells. Hope this helps.

Document doc = new Document(MyDir + "in.docx");
for (Table table : (Iterable<Table>) doc.getChildNodes(NodeType.TABLE, true))
{
    for(Row row : table.getRows())
    {
        for(Cell cell : row.getCells())
        {
            NodeCollection nodes = cell.getChildNodes(NodeType.PARAGRAPH, true);
            for (Paragraph para : (Iterable<Paragraph>) nodes) {
                System.out.println(para.toString(SaveFormat.TEXT));
            }            
        }
    }
}
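
The loop above only prints the paragraph text. As a rough sketch of the "write your own custom XML" step, the cell texts could first be collected into a List<List<String>> (one inner list per row) and then turned into XML with the JDK's own DOM and Transformer classes. The class name TableXmlSketch and the element names table, tbody, row and entry are assumptions made for illustration, not part of Aspose.Words or of the ST36 schema; replace them with whatever your target schema requires.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: serializes already-extracted cell texts as a simple XML table.
final class TableXmlSketch {

    // rows: one inner list per table row, one string per cell.
    static String toXml(List<List<String>> rows) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element table = dom.createElement("table");   // assumed element name
            dom.appendChild(table);
            Element tbody = dom.createElement("tbody");   // assumed element name
            table.appendChild(tbody);

            for (List<String> cells : rows) {
                Element row = dom.createElement("row");
                tbody.appendChild(row);
                for (String text : cells) {
                    Element entry = dom.createElement("entry");
                    // The DOM serializer escapes &, < and > in the text for us.
                    entry.setTextContent(text);
                    row.appendChild(entry);
                }
            }

            // Serialize the DOM tree to a String without the XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build table XML", e);
        }
    }
}
```

Inside the cell loop you would add cell.toString(SaveFormat.TEXT).trim() to the current row's list instead of printing it, then pass the collected rows to TableXmlSketch.toXml. Note that org.w3c.dom.Document and com.aspose.words.Document share a simple name, so one of them has to be fully qualified if both are used in the same class.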

Thanks for your help!

I iterate over the rows, but I don’t know how to extract some things from the Aspose table. For example, in my custom XML table I need to set Colsep, Rowsep, the list of Colspec…

@josgobo

Thanks for your inquiry. You can get the properties of the Table, Row and Cell nodes. In your case, we suggest reading the members of the Table, Row and Cell classes. Hope this helps.

Please also read the following article.
Applying Formatting to Table, Row and Cell
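
As a hedged illustration of how such properties might feed Colsep, Rowsep and Colspec: the sketch below assumes those names follow the CALS table model (colsep/rowsep set to 1 when a rule is drawn, 0 otherwise, and one colspec per column with a colwidth), which may not match the ST36 schema exactly. It also assumes the column widths in points and the border flags have already been read from the document, for example from cell.getCellFormat().getWidth() and from the line style of the borders in cell.getCellFormat().getBorders(). ColspecSketch and its methods are hypothetical names.

```java
import java.util.Locale;

// Hypothetical helper: formats CALS-style colspec elements from plain values.
final class ColspecSketch {

    // One <colspec/>; index is 1-based, widthPt is the column width in points.
    static String colspec(int index, double widthPt, boolean colsep, boolean rowsep) {
        return String.format(Locale.ROOT,
                "<colspec colname=\"c%d\" colwidth=\"%.1fpt\" colsep=\"%d\" rowsep=\"%d\"/>",
                index, widthPt, colsep ? 1 : 0, rowsep ? 1 : 0);
    }

    // One colspec per column, joined by newlines. The three arrays must have
    // the same length: widths, "has right border", "has bottom border".
    static String colspecs(double[] widthsPt, boolean[] rightBorder, boolean[] bottomBorder) {
        if (widthsPt.length != rightBorder.length || widthsPt.length != bottomBorder.length) {
            throw new IllegalArgumentException("Arrays must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < widthsPt.length; i++) {
            if (i > 0) {
                sb.append('\n');
            }
            sb.append(colspec(i + 1, widthsPt[i], rightBorder[i], bottomBorder[i]));
        }
        return sb.toString();
    }
}
```

Taking the values from the cells of the first row is usually enough for the column list, but tables with merged cells would need extra handling that this sketch does not attempt.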