How to extract text from a table while preserving the layout of the cells?

How can I extract text from an HTML file while preserving the table layout, so that the text inside the table cells is not wrapped?

@Sathiya22

Could you please ZIP and attach your input and expected output documents here for our reference? We will then provide a code example that matches your requirement.

Input file: samplehtml.html.zip (2.4 KB)

Result I got: samplehtml.txt.zip (1.6 KB)

Expected output: expected_text.zip (1.6 KB)

Sample text that lost its alignment: sample_text.png (31.5 KB)

@Sathiya22

We have tested the scenario using the following code example and reproduced the same issue on our side.

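// Requires: using Aspose.Words; using Aspose.Words.Saving;
// MyDir is the folder that contains the input file.
Document doc = new Document(MyDir + @"samplehtml.html");

// Ask the TXT exporter to keep table cells laid out in columns.
TxtSaveOptions options = new TxtSaveOptions();
options.PreserveTableLayout = true;

doc.Save(MyDir + @"output.txt", options);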

We have logged this problem in our issue tracking system as WORDSNET-22293 so that it can be fixed. You will be notified via this forum thread once the issue is resolved.

We apologize for the inconvenience.

Similarly, how can I get the text of a table, with its alignment, from DOC/DOCX files?

@Sathiya22

Please note that the DOCX and TXT file formats are quite different. The TXT file format does not have text alignment the way an MS Word document does. You can get the text of a table cell using the Node.ToString method. However, the resulting string does not contain any font formatting, unlike DOCX.
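As a rough illustration of the Node.ToString approach mentioned above, the sketch below walks every table in a document and prints the plain text of each cell, one row per line. It is only a sketch under assumptions: the file name `sampledoc.doc`, the class name `TableTextDump`, and the ` | ` separator are illustrative choices and not part of the original reply, and cells that contain several paragraphs will still carry line breaks inside their text.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Tables;

class TableTextDump
{
    static void Main()
    {
        // Hypothetical path; point this at your own DOC/DOCX file.
        Document doc = new Document("sampledoc.doc");

        // Visit every table in the document, including nested tables.
        foreach (Table table in doc.GetChildNodes(NodeType.Table, true))
        {
            foreach (Row row in table.Rows)
            {
                var cells = new List<string>();
                foreach (Cell cell in row.Cells)
                {
                    // ToString(SaveFormat.Text) returns the cell content as plain text;
                    // Trim() removes the trailing paragraph break.
                    cells.Add(cell.ToString(SaveFormat.Text).Trim());
                }

                // The separator is an arbitrary choice for this illustration.
                Console.WriteLine(string.Join(" | ", cells));
            }
        }
    }
}
```

Because each cell is read as one string, nothing is wrapped to a column width here; any alignment between rows has to be added afterwards.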

Is there a way to extract text from DOC/DOCX files with the indentation preserved and without the text being wrapped?

@Sathiya22

Could you please share the complete details of your use case, along with the input Word document, the expected extracted text, and the file format you expect it in? We will then provide more information on your query.

Sample DOC file: sampledoc.doc.zip (3.8 KB)

Result I got: sampledoc.txt.zip (734 Bytes)

Expected output: expected_text_doc.txt.zip (721 Bytes)

Sample text that lost its alignment: text_out_of_alignment.png (25.0 KB)

@Sathiya22

Aspose.Words mimics the behavior of MS Word. If you convert your document to the TXT file format using MS Word, you will get the same output. Moreover, a TXT document does not keep font formatting the way an MS Word document does.
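If a different plain-text layout is needed than the one MS Word produces, one possible workaround is to build the lines yourself from the cell text. The helper below is a hypothetical sketch, not an Aspose.Words API: `PlainTextTable.Format` is a name invented for this example, and it assumes the cell text has already been collected as single-line strings (for example via Node.ToString). It pads each cell to the widest entry in its column, so columns line up and no cell is wrapped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PlainTextTable
{
    // Pads every cell to the width of the widest cell in its column and
    // joins the cells of a row with the given separator. Nothing is wrapped.
    public static string Format(IReadOnlyList<string[]> rows, string separator = "  ")
    {
        int columnCount = rows.Count == 0 ? 0 : rows.Max(r => r.Length);

        // First pass: find the widest cell in each column.
        var widths = new int[columnCount];
        foreach (var row in rows)
        {
            for (int c = 0; c < row.Length; c++)
            {
                widths[c] = Math.Max(widths[c], row[c].Length);
            }
        }

        // Second pass: pad each cell and drop trailing spaces on every line.
        var lines = rows.Select(row =>
            string.Join(separator, row.Select((text, c) => text.PadRight(widths[c]))).TrimEnd());

        return string.Join("\n", lines);
    }
}
```

Called with the rows `{ "Name", "Qty" }` and `{ "Widget", "10" }`, the second column starts at the same offset on both lines. The alignment only holds when the result is viewed in a monospaced font, and cells containing line breaks would need to be flattened first.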

The issues you found earlier (filed as WORDSNET-22293) have been fixed in Aspose.Words for .NET 21.7 and Aspose.Words for Java 21.7.