Free Support Forum - aspose.com

Avoid Special character while parsing

Hi Aspose Team,

I copied and pasted some text from one DOC file into a second DOC file. When I parse the second file with the getText() method, I get a special character in the output. However, when I display the same text in HTML, the character is not visible. It is also not visible in the Word document.

Can you please tell me how I can remove this special character from my text?

My special character is:

Also, can you provide a list of the special characters that are not visible but can be part of a Word document (Word 2003)?

Thanks!

Hi

Thanks for your inquiry. I think in your case you should just use the toTxt method instead of getText. Please follow the link to learn the difference between these methods:

http://www.aspose.com/documentation/java-components/aspose.words-for-java/howto-extract-text-only.html

Best regards.

The character is a VT (vertical tab); its ASCII code is 11.
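
For anyone who needs to identify an invisible character like this one, here is a minimal sketch in plain Java. It assumes the extracted text is already available as a String; the class name ControlCharFinder and the method findControlChars are made up for illustration and are not part of Aspose.Words.

```java
import java.util.ArrayList;
import java.util.List;

public class ControlCharFinder {

    // Hypothetical helper: returns one "index:code" entry for every
    // ISO control character found in the given text.
    static List<String> findControlChars(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isISOControl(c)) {
                found.add(i + ":" + (int) c);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample text with a vertical tab (code 11) between the two words.
        String sample = "Hello" + (char) 11 + "World";
        System.out.println(findControlChars(sample)); // prints [5:11]
    }
}
```

Printing the numeric code makes it possible to look the character up, even though it does not show in Word or in a browser.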

Hi

You can find the list of special characters here:

http://www.aspose.com/documentation/java-components/aspose.words-for-java/com/aspose/words/controlchar.html

Best regards.

Thanks for replying.

Is there any way to stop this character (VT, ASCII 11) from appearing in my text?

I tried the code below, but it still outputs this character. I am using Word 2003. I have attached the document I used.

    public static void main(String[] args)
    {
        try
        {
            Document doc = new Document("C:\\doc\\hello.doc");
            NodeCollection runsDoc = doc.getChildNodes(NodeType.RUN, true);
            for (Run run : runsDoc)
            {
                System.out.println(run.toTxt());
            }
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
    }

Hi

Thanks for your request. Sure, just remove this character from the final text.
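
As a rough illustration of that suggestion, the sketch below post-processes a plain Java String. It assumes the text has already been extracted from the document; the class name VerticalTabCleaner and the method replaceVerticalTabs are hypothetical and not part of Aspose.Words. Word uses the vertical tab for a manual line break (Shift+Enter), so replacing it with a space or a newline is often preferable to deleting it, otherwise the words on either side run together.

```java
public class VerticalTabCleaner {

    // Vertical tab, ASCII code 11.
    private static final char VERTICAL_TAB = (char) 11;

    // Hypothetical helper: replaces every vertical tab in the text with
    // the given replacement. Pass "" to remove the character entirely.
    static String replaceVerticalTabs(String text, String replacement) {
        return text.replace(String.valueOf(VERTICAL_TAB), replacement);
    }

    public static void main(String[] args) {
        // Stand-in for text returned by the document parser.
        String extracted = "first line" + VERTICAL_TAB + "second line";
        System.out.println(replaceVerticalTabs(extracted, " ")); // prints: first line second line
    }
}
```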

Best regards,