Run.toTxt() throws exception

This problem occurs with every document I process, but you can reproduce it with one of the sample files from the Java package. For this post I used SalesInvoiceDemo.doc (docx documents fail as well).

Write a simple tree walk that visits each node of the document and calls toTxt() if the node is a Run, like this:

public void traverseAllNodes(CompositeNode parentNode) throws Exception
{
    for (Node childNode = parentNode.getFirstChild(); childNode != null; childNode = childNode.getNextSibling())
    {
        Run run = (Run) childNode;
        String runStr = run.toTxt();
        if (childNode.isComposite())
            traverseAllNodes((CompositeNode)childNode);
    }
}

Calling run.toTxt() will eventually (sometimes on the first iteration) throw an exception:

java.util.EmptyStackException
at java.util.Stack.peek(Stack.java:85)
at com.aspose.words.awm.visitRun(TxtWriter.java: 109)
at com.aspose.words.Run.accept(Run.java: 89)
at com.aspose.words.awm.v(TxtWriter.java: 68)
at com.aspose.words.Document.a(Document.java: 1378)
at com.aspose.words.Node.toTxt(Node.java: 588)

For the document I mentioned, the exception is thrown at the run with the text "PAGE * MERGEFORMAT".

The strange thing is that Document.toTxt() works fine.

Aspose.Words version: 11.0.0

Hi Stanislav,

Thanks for your query. Could you please share what you are trying to achieve with Aspose.Words? The code you shared will not work because it casts every childNode to Run, which is incorrect. Please see the different node types in Document Explorer.
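
To illustrate the point about the cast, here is a minimal sketch of a tree walk that checks the node type before treating a node as a run. DemoNode, DemoRun and collectRunText are hypothetical stand-ins invented for this example so that it runs without the library; they are not Aspose.Words classes. With the real API the same guard would be applied to each child node before casting it to Run.

```java
import java.util.ArrayList;
import java.util.List;

class SafeTraversal {
    // Collects text only from nodes that really are runs, then recurses
    // into every child so nested runs are found as well.
    static void collectRunText(DemoNode parent, List<String> out) {
        for (DemoNode child : parent.children) {
            if (child instanceof DemoRun) {
                // The cast is safe here because of the instanceof check.
                out.add(((DemoRun) child).text);
            }
            collectRunText(child, out);
        }
    }

    public static void main(String[] args) {
        DemoNode paragraph = new DemoNode()
                .add(new DemoRun("Hello "))
                .add(new DemoRun("world"));
        DemoNode body = new DemoNode().add(paragraph).add(new DemoNode());

        List<String> texts = new ArrayList<>();
        collectRunText(body, texts);
        System.out.println(texts); // prints [Hello , world]
    }
}

// Hypothetical stand-in for a generic document node (not an Aspose.Words class).
class DemoNode {
    final List<DemoNode> children = new ArrayList<>();

    DemoNode add(DemoNode child) {
        children.add(child);
        return this;
    }
}

// Hypothetical stand-in for a run of text (not an Aspose.Words class).
class DemoRun extends DemoNode {
    final String text;

    DemoRun(String text) {
        this.text = text;
    }
}
```

Nodes that are not runs are skipped rather than cast, so the walk cannot fail with a ClassCastException on paragraphs, tables or other containers.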

Hi
Thanks for your request. Yes, we are aware of this issue. We will let you know once this problem is resolved.
You can use Run.getText() instead of Run.toTxt() while you are waiting for a fix.
Best regards,

Tahir,

Yes, I know this exact code will not work, but it is not my actual code, just an example to show you how to reproduce the problem.

Thanks,
Stanislav

Thanks Alexey!

I’d like to use toTxt() to get the run text without control chars, but for now I have a workaround: I use getText() and strip them with a regex.
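
For anyone who needs the same workaround, a minimal sketch of the regex step might look like this. It works on a plain String, so it would be applied to whatever getText() returns. The class and method names are made up for the example, and the sample input assumes the field markers are the characters U+0013, U+0014 and U+0015, which may not match every document.

```java
import java.util.regex.Pattern;

class RunTextCleaner {
    // Assumption: "control chars" means the ASCII control range
    // (U+0000 to U+001F plus U+007F), which is what \p{Cntrl} matches.
    // Note that this also removes tabs and line breaks.
    private static final Pattern CONTROL_CHARS = Pattern.compile("\\p{Cntrl}");

    // Returns the text with every control character removed.
    static String stripControlChars(String text) {
        return CONTROL_CHARS.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical run text: a PAGE field with assumed marker characters.
        String raw = "\u0013 PAGE \u0014" + "1" + "\u0015";
        System.out.println("[" + stripControlChars(raw) + "]"); // prints [ PAGE 1]
    }
}
```

If tabs or line breaks should survive, the pattern would need to be narrowed to just the characters you want removed.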

Hi there,

Thanks for this additional information.

Normally Run nodes don’t contain any control characters. There are a few, such as line breaks, but from memory these also appear in the output of toTxt().

Thanks,

The issues you have found earlier (filed as WORDSNET-4812) have been fixed in this .NET update and this Java update.
