DOCX to TXT conversion throws java.lang.StringIndexOutOfBoundsException

Sathiya22 · November 9, 2020, 1:16pm

Could anyone explain how to extract text from .docx files while keeping the same indentation as in the input .docx file?
I tried using TxtSaveOptions to preserve the table layout, but I get the following error.

Document document = new Document(fis);
TxtSaveOptions options = new TxtSaveOptions();
options.setSaveFormat(SaveFormat.TEXT);
options.setPreserveTableLayout(true);

document.save(parentFolder.getPath() + "/" + fileName + ".txt", options);

java.lang.StringIndexOutOfBoundsException: String index out of range: -7
at java.lang.String.substring(String.java:1967)
at com.aspose.words.zzZE3.zzJu(Unknown Source)
at com.aspose.words.zzZE3.appendText(Unknown Source)
at com.aspose.words.zzZE1.appendText(Unknown Source)
at com.aspose.words.zzZE1.visitRun(Unknown Source)
at com.aspose.words.Run.accept(Unknown Source)
at com.aspose.words.CompositeNode.acceptChildren(Unknown Source)
at com.aspose.words.CompositeNode.acceptCore(Unknown Source)
at com.aspose.words.Paragraph.accept(Unknown Source)
at com.aspose.words.CompositeNode.acceptChildren(Unknown Source)
at com.aspose.words.CompositeNode.acceptCore(Unknown Source)
at com.aspose.words.Cell.accept(Unknown Source)
at com.aspose.words.CompositeNode.acceptChildren(Unknown Source)
at com.aspose.words.CompositeNode.acceptCore(Unknown Source)
at com.aspose.words.Row.accept(Unknown Source)
at com.aspose.words.CompositeNode.acceptChildren(Unknown Source)
at com.aspose.words.CompositeNode.acceptCore(Unknown Source)
at com.aspose.words.Table.accept(Unknown Source)
at com.aspose.words.CompositeNode.acceptChildren(Unknown Source)
at com.aspose.words.CompositeNode.acceptCore(Unknown Source)
at com.aspose.words.Body.accept(Unknown Source)
at com.aspose.words.zzZE1.zzZ(Unknown Source)
at com.aspose.words.zzZE1.zzY(Unknown Source)
at com.aspose.words.zzZE1.zzZ(Unknown Source)
at com.aspose.words.Document.zzZ(Unknown Source)
at com.aspose.words.Document.save(Unknown Source)
at asposePackage.AsposeExtractor.listoffolders(AsposeExtractor.java:75)
at asposePackage.AsposeExtractor.main(AsposeExtractor.java:33)

tahir.manzoor · November 9, 2020, 6:51pm

@Sathiya22

Could you please ZIP and attach your input Word document here for testing? We will investigate the issue and provide you with more information.

Sathiya22 · November 11, 2020, 6:55am

checkFile.docx.zip (10.4 KB)

tahir.manzoor · November 11, 2020, 2:25pm

@Sathiya22

We have tested the scenario and reproduced the issue on our side. We have logged this problem in our issue tracking system as WORDSNET-21385. You will be notified via this forum thread once the issue is resolved.

We apologize for the inconvenience.
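
Until a fix is released, one possible workaround is to attempt the layout-preserving export first and fall back to a plain export if it throws. The sketch below uses only the Java standard library so that it runs without Aspose.Words on the classpath; `SaveFallback`, `Attempt`, and `saveWithFallback` are hypothetical names, not part of the Aspose API, and it assumes a plain-text export without table layout is an acceptable substitute.

```java
import java.util.ArrayList;
import java.util.List;

public class SaveFallback {

    /** One save attempt that is allowed to throw, e.g. a call to document.save(...). */
    @FunctionalInterface
    public interface Attempt {
        void run() throws Exception;
    }

    /**
     * Runs the preferred attempt; if it throws a RuntimeException (such as the
     * StringIndexOutOfBoundsException in the stack trace above), runs the fallback.
     * Returns true if the preferred attempt succeeded, false if the fallback was used.
     */
    public static boolean saveWithFallback(Attempt preferred, Attempt fallback) throws Exception {
        try {
            preferred.run();
            return true;
        } catch (RuntimeException e) {
            // Assumption: the failure is the table-layout bug, so a plain export is acceptable.
            System.err.println("Layout-preserving save failed, retrying without it: " + e);
            fallback.run();
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated attempts: the first throws like the reported bug, the second succeeds.
        List<String> log = new ArrayList<>();
        boolean usedPreferred = saveWithFallback(
            () -> { throw new StringIndexOutOfBoundsException("String index out of range: -7"); },
            () -> log.add("plain"));
        System.out.println(usedPreferred + " " + log);
    }
}
```

With the code from the original post, the two attempts might be `() -> document.save(path, options)` with `setPreserveTableLayout(true)`, and the same call with a second `TxtSaveOptions` instance that leaves the layout flag off.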

Sathiya22 · November 16, 2020, 5:44am

This error also occurs when extracting text from some HTML files.

tahir.manzoor · November 16, 2020, 7:57am

@Sathiya22

Please ZIP and attach the document that triggers the issue, along with the code example you are using to extract the text. We will investigate and provide you with more information.

Sathiya22 · November 19, 2020, 3:34pm

I suspect that a table heading spanning multiple lines causes the error.
Could you please let me know whether the issue has been resolved?

tahir.manzoor · November 19, 2020, 3:42pm

@Sathiya22

Yes, this issue has been resolved, and the fix will be available in the next version of Aspose.Words, i.e. 20.12. We will inform you via this forum thread once it is available. Hopefully, it will be released at the start of next month (December 2020).

Sathiya22 · November 19, 2020, 3:44pm

Thank you for the information.

aspose.notifier · December 10, 2020, 8:38am

The issues you found earlier have been fixed in the Aspose.Words for .NET 20.12 update and the Aspose.Words for Java 20.12 update.
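
Since the fix ships in 20.12, an application that cannot guarantee which library version is installed might enable `setPreserveTableLayout(true)` only when the version is at least 20.12. The following is a minimal, stdlib-only sketch of such a check; `VersionGate` and `isAtLeast` are hypothetical names, and how the version string is obtained from the installed library is left as an assumption.

```java
public class VersionGate {

    /** Returns true if dotted version `actual` is >= `required`, comparing numeric parts. */
    public static boolean isAtLeast(String actual, String required) {
        String[] a = actual.split("\\.");
        String[] r = required.split("\\.");
        int n = Math.max(a.length, r.length);
        for (int i = 0; i < n; i++) {
            // Missing parts count as 0, so "20.12" is treated as equal to "20.12.0".
            int x = i < a.length ? Integer.parseInt(a[i]) : 0;
            int y = i < r.length ? Integer.parseInt(r[i]) : 0;
            if (x != y) {
                return x > y;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAtLeast("20.11", "20.12"));   // false
        System.out.println(isAtLeast("20.12.0", "20.12")); // true
        System.out.println(isAtLeast("21.1", "20.12"));    // true
    }
}
```

The comparison is numeric per component rather than lexical, so "20.9" correctly sorts below "20.12".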