I’m facing an issue while converting a Word document containing tables to HTML.
The following is my code to parse the document:
Document parsedDocument = new Document(file1);
Document document = new Document();
document.appendDocument(parsedDocument, ImportFormatMode.KEEP_SOURCE_FORMATTING);
ByteArrayOutputStream outStream1 = new ByteArrayOutputStream();
HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.HTML);
The steps to reproduce the problem are as follows:
1. Open a new Word document (.doc format).
2. Keep Track Changes on for the document.
3. Create a small table with 3 rows and 3 columns.
4. Accept all changes.
5. Now delete an entire row, say row 2. (By delete I mean deleting the cells themselves and NOT just the content inside them.)
6. Now add a new column in between, say between columns 2 and 3.
7. Now pass this document to Aspose to generate the HTML.
(Note that I have not manually accepted the track changes in the document after deleting the row and adding the column, I intend to do that programmatically.)
Expected Result :
The expected result according to my changes should have been a table with 2 rows and 4 columns.
Actual Result :
In the resulting HTML, the cell at the intersection of the deleted row and the added column (Row2-Col3 according to my steps) is still part of the result.
I’ve noticed that if I manually accept all changes after step 6, the document is parsed correctly. The intersecting cell is both deleted and added at the same time, which is probably causing the issue.
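To make the suspected conflict concrete, here is a minimal plain-Java sketch of the table from my steps. It is only a model, not Aspose.Words code: `Cell`, `buildTable`, `acceptAll` and `insertionWins` are hypothetical names made up for illustration, and the `insertionWins` rule is just my guess at how a stray cell could survive, not a statement about how the library works internally.

```java
import java.util.ArrayList;
import java.util.List;

public class TrackedTableSketch {
    // Hypothetical model of a table cell with tracked-change flags (not an Aspose type).
    static final class Cell {
        final String label;
        final boolean inserted;
        final boolean deleted;

        Cell(String label, boolean inserted, boolean deleted) {
            this.label = label;
            this.inserted = inserted;
            this.deleted = deleted;
        }
    }

    // Table from the steps: 3x3, row 2 deleted, one column inserted between columns 2 and 3.
    static List<List<Cell>> buildTable() {
        List<List<Cell>> rows = new ArrayList<>();
        for (int r = 1; r <= 3; r++) {
            boolean rowDeleted = (r == 2);
            List<Cell> row = new ArrayList<>();
            row.add(new Cell("R" + r + "C1", false, rowDeleted));
            row.add(new Cell("R" + r + "C2", false, rowDeleted));
            row.add(new Cell("R" + r + "New", true, rowDeleted)); // tracked insertion
            row.add(new Cell("R" + r + "C3", false, rowDeleted));
            rows.add(row);
        }
        return rows;
    }

    // Accepting every change: a deleted cell is removed even if it was also inserted.
    static List<List<Cell>> acceptAll(List<List<Cell>> table) {
        List<List<Cell>> result = new ArrayList<>();
        for (List<Cell> row : table) {
            List<Cell> kept = new ArrayList<>();
            for (Cell c : row) {
                if (!c.deleted) {
                    kept.add(c);
                }
            }
            if (!kept.isEmpty()) {
                result.add(kept); // rows with no remaining cells disappear
            }
        }
        return result;
    }

    // Guessed faulty rule: an inserted cell is always kept, so the deletion is ignored for it.
    static List<List<Cell>> insertionWins(List<List<Cell>> table) {
        List<List<Cell>> result = new ArrayList<>();
        for (List<Cell> row : table) {
            List<Cell> kept = new ArrayList<>();
            for (Cell c : row) {
                if (c.inserted || !c.deleted) {
                    kept.add(c);
                }
            }
            if (!kept.isEmpty()) {
                result.add(kept);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Cell>> accepted = acceptAll(buildTable());
        System.out.println("accepted: " + accepted.size() + " rows x "
                + accepted.get(0).size() + " columns");

        List<List<Cell>> faulty = insertionWins(buildTable());
        System.out.println("insertion-first: " + faulty.size() + " rows, middle row holds "
                + faulty.get(1).get(0).label);
    }
}
```

In this model, `acceptAll` gives the 2-row, 4-column table I expect, while the insertion-first rule leaves a row containing only the intersecting cell, which resembles what I see in the generated HTML.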
Kindly let me know if I’m doing something wrong, or if there are functions to accept each change individually in Java.
I have attached the sample document as well as snapshots of the expected and actual results.
Since this is a very critical issue for my application, I would really appreciate any feedback as soon as possible.
Document doc = new Document(MyDir + "Sample.doc");
doc.Save(MyDir + "out.doc");
doc.Save(MyDir + "out.html");