Free Support Forum - aspose.com

How to extract text from DOCX file with indentation?

Could anyone explain how to extract text from a .docx file while keeping the same indentation as in the input document?
I tried using TxtSaveOptions to preserve the table layout, but I get the following error.

Document document = new Document(fis);
TxtSaveOptions options = new TxtSaveOptions();
options.setSaveFormat(SaveFormat.TEXT);
options.setPreserveTableLayout(true);
document.save(parentFolder.getPath() + "/" + fileName + ".txt", options);

java.lang.StringIndexOutOfBoundsException: String index out of range: -7
at java.lang.String.substring(String.java:1967)
at com.aspose.words.zzZE3.zzJu(Unknown Source)
at com.aspose.words.zzZE3.appendText(Unknown Source)
at com.aspose.words.zzZE1.appendText(Unknown Source)
at com.aspose.words.zzZE1.visitRun(Unknown Source)
at com.aspose.words.Run.accept(Unknown Source)
at com.aspose.words.CompositeNode.acceptChildren(Unknown Source)
at com.aspose.words.CompositeNode.acceptCore(Unknown Source)
at com.aspose.words.Paragraph.accept(Unknown Source)
at com.aspose.words.CompositeNode.acceptChildren(Unknown Source)
at com.aspose.words.CompositeNode.acceptCore(Unknown Source)
at com.aspose.words.Cell.accept(Unknown Source)
at com.aspose.words.CompositeNode.acceptChildren(Unknown Source)
at com.aspose.words.CompositeNode.acceptCore(Unknown Source)
at com.aspose.words.Row.accept(Unknown Source)
at com.aspose.words.CompositeNode.acceptChildren(Unknown Source)
at com.aspose.words.CompositeNode.acceptCore(Unknown Source)
at com.aspose.words.Table.accept(Unknown Source)
at com.aspose.words.CompositeNode.acceptChildren(Unknown Source)
at com.aspose.words.CompositeNode.acceptCore(Unknown Source)
at com.aspose.words.Body.accept(Unknown Source)
at com.aspose.words.zzZE1.zzZ(Unknown Source)
at com.aspose.words.zzZE1.zzY(Unknown Source)
at com.aspose.words.zzZE1.zzZ(Unknown Source)
at com.aspose.words.Document.zzZ(Unknown Source)
at com.aspose.words.Document.save(Unknown Source)
at asposePackage.AsposeExtractor.listoffolders(AsposeExtractor.java:75)
at asposePackage.AsposeExtractor.main(AsposeExtractor.java:33)
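
A note on reading this trace: the top frame is String.substring, and the frames directly below it are obfuscated, so the actual cause inside Aspose.Words is not visible here. The sketch below is not Aspose code. It uses a hypothetical helper, fitToColumn, with invented parameters (width, reserved) purely to show how a negative end index passed to substring raises this type of exception, and how clamping the index avoids it.

```java
// Hypothetical illustration only: this is NOT Aspose.Words code.
final class SubstringDemo {

    // Hypothetical helper: trims `text` to fit a column of `width` characters,
    // keeping `reserved` characters free for padding. Nothing guards against
    // width < reserved, so the end index can become negative.
    static String fitToColumn(String text, int width, int reserved) {
        int end = width - reserved;
        return text.substring(0, end); // throws when end < 0 or end > text.length()
    }

    // Same helper with the end index clamped to the valid range [0, text.length()].
    static String fitToColumnSafely(String text, int width, int reserved) {
        int end = Math.max(0, Math.min(text.length(), width - reserved));
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        try {
            fitToColumn("Heading", 1, 8); // end index is 1 - 8 = -7
        } catch (StringIndexOutOfBoundsException e) {
            // The exact message wording depends on the Java version.
            System.out.println("Caught: " + e.getMessage());
        }
        System.out.println("[" + fitToColumnSafely("Heading", 1, 8) + "]");
    }
}
```

Whether the library computes a width in this way is an assumption; the point is only that a negative number in the message means a length or index calculation went below zero before substring was called.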

@Sathiya22

Could you please ZIP and attach your input Word document here for testing? We will investigate the issue and provide you more information on it.

checkFile.docx.zip (10.4 KB)

@Sathiya22

We have tested the scenario and reproduced the issue on our side. We have logged this problem in our issue tracking system as WORDSNET-21385 so that it can be fixed. You will be notified via this forum thread once the issue is resolved.

We apologize for the inconvenience.
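
Until a fix is released, one possible stopgap is to catch the failure and retry the save in a simpler mode. The sketch below is a generic wrapper in plain Java; the names SaveWithFallback and SaveAttempt are made up for illustration. The Aspose.Words calls appear only in comments, and the idea that saving with setPreserveTableLayout(false) avoids the exception is an untested assumption that this thread does not confirm.

```java
// Generic fallback wrapper; names are hypothetical, not part of any library.
final class SaveWithFallback {

    /** A save attempt that may fail with any exception. */
    interface SaveAttempt {
        void run() throws Exception;
    }

    /**
     * Runs the primary attempt; if it throws a RuntimeException
     * (such as StringIndexOutOfBoundsException), runs the fallback instead.
     * Returns true when the fallback was used.
     */
    static boolean save(SaveAttempt primary, SaveAttempt fallback) throws Exception {
        try {
            primary.run();
            return false;
        } catch (RuntimeException e) {
            fallback.run();
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // With Aspose.Words the two attempts might look like this (assumption):
        //   primary:  options.setPreserveTableLayout(true);  document.save(path, options);
        //   fallback: options.setPreserveTableLayout(false); document.save(path, options);
        boolean usedFallback = save(
                () -> { throw new StringIndexOutOfBoundsException("simulated failure"); },
                () -> System.out.println("saved without table layout"));
        System.out.println("fallback used: " + usedFallback);
    }
}
```

The fallback output would lose the table alignment, so this only trades layout for a file that can be produced at all.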

This error also occurs when extracting text from some HTML files.

@Sathiya22

Please ZIP and attach the document with which you are facing the issue, along with the code example that you are using to extract the text. We will investigate this issue and provide you with more information on it.

I assume that a table heading spanning multiple lines causes the error.
Could you please let me know whether the issue has been solved?

@Sathiya22

Yes, this issue has been resolved, and the fix will be available in the next version of Aspose.Words, i.e. 20.12. We will inform you via this forum thread once it is available. Hopefully, it will be released at the start of next month (December 2020).

Thank you for the information.