Free Support Forum - aspose.com

Replace junk data while placing a table in a Word document using Aspose.Words

I am placing a table in a Word document using Aspose.Words, and when I retrieve the document content with doc.Range.Text I get some unwanted data, as shown below. How can I replace that unwanted data? For example, I attached a screenshot of the table I put into the Word document; when I retrieve the text, I get it in the form below.

Using doc.Range.Text, I get this:

Testaaaaaaaaaatestsampleaa

When I try InnerXml, I get this:

Testtestsample

Hello

Thanks for your request. In this case, you can try using the PreserveTableLayout option of the TxtSaveOptions class. This option specifies whether the program should attempt to preserve the layout of tables when saving in plain text format.

http://www.aspose.com/documentation/.net-components/aspose.words-for-.net/aspose.words.saving.txtsaveoptions.preservetablelayout.html

Also, I think the information provided here will be useful for you:

http://www.aspose.com/documentation/.net-components/aspose.words-for-.net/howto-extract-text-only.html

Best regards,

I tried the method below. Is this the right way to get only the content, without any junk, special characters or codes?

Document dstDoc = new Document(@"C:\Textclean1.docx");//Paper size of this document is A4

string value = dstDoc.ToTxt();

value = value.Replace("\r", "").Replace("\n", "");
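Removing "\r" and "\n" outright, as in the snippet above, glues the last word of one line of the ToTxt() output to the first word of the next (for example "Test" and "test" become "Testtest"). A minimal alternative sketch, assuming a single space is an acceptable separator, collapses every run of whitespace instead. The class and method names are hypothetical and not part of Aspose.Words; only System.Text.RegularExpressions is used.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Aspose.Words): joins the lines of the
// string returned by ToTxt() with single spaces instead of deleting the breaks.
public static class TextNormalizer
{
    public static string CollapseWhitespace(string text)
    {
        // \s+ matches any run of spaces, tabs, \r and \n; replace each run with one space,
        // then drop the leading/trailing space left by the first and last line breaks.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Usage would then be `string value = TextNormalizer.CollapseWhitespace(dstDoc.ToTxt());`.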

Hello

Thanks for your request.

The ways to retrieve text from the document are:

· Use Document.Save with SaveFormat.Text to save as plain text into a file or stream.

· Use Node.ToTxt. Internally, this saves the node as text into a memory stream and returns the resulting string.

· Use Node.GetText to retrieve text with all Microsoft Word control characters including field codes.

· Implement a custom DocumentVisitor to perform customized extraction.

Please see the following link to learn more:

http://www.aspose.com/documentation/.net-components/aspose.words-for-.net/howto-extract-text-only.html

You can try using the following code:

Document doc = new Document("C:\\Temp\\in.docx");

Console.WriteLine(doc.GetText());
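GetText() (like Range.Text) returns Word's control characters along with the text, and the repeated "a" in the first post is most likely the cell/row end mark "\a" (char 7). A minimal sketch, assuming you want to keep using GetText() and clean its output yourself: drop field codes (the text between char 19 and char 20), keep field results, turn cell ends into tabs and paragraph/line/page breaks into newlines. The class name is hypothetical; the helper works on plain strings and does not call Aspose.Words, although the character codes match Word's (Aspose.Words exposes them as ControlChar constants).

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: cleans the string returned by doc.GetText() / doc.Range.Text.
public static class WordTextCleaner
{
    const char CellOrRowEnd = '\a';        // char 7: end of a table cell or row
    const char FieldStart = '\u0013';      // char 19: field begins, field code follows
    const char FieldSeparator = '\u0014';  // char 20: field code ends, field result follows
    const char FieldEnd = '\u0015';        // char 21: field ends

    public static string Clean(string text)
    {
        var sb = new StringBuilder(text.Length);
        // One entry per open field: true while still inside that field's code part.
        var inCode = new Stack<bool>();

        foreach (char c in text)
        {
            if (c == FieldStart)
            {
                inCode.Push(true);
            }
            else if (c == FieldSeparator)
            {
                // Switch the innermost field from its code part to its result part.
                if (inCode.Count > 0) { inCode.Pop(); inCode.Push(false); }
            }
            else if (c == FieldEnd)
            {
                if (inCode.Count > 0) inCode.Pop();
            }
            else if (inCode.Contains(true))
            {
                // Inside some field's code (e.g. "PAGE"), so the character is dropped.
            }
            else if (c == CellOrRowEnd)
            {
                sb.Append('\t');   // separate cells with a tab
            }
            else if (c == '\r' || c == '\v' || c == '\f')
            {
                sb.Append('\n');   // paragraph end, line break, page/section break
            }
            else if (c == '\t' || c == '\n' || !char.IsControl(c))
            {
                sb.Append(c);      // ordinary text, punctuation included
            }
            // Any other control character is silently dropped.
        }
        return sb.ToString();
    }
}
```

For example, `WordTextCleaner.Clean(doc.GetText())` turns "Test\aTest\a\a" into "Test\tTest\t\t". Punctuation is left alone on purpose, because it is real document content rather than a control character.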

Or save to TXT using the PreserveTableLayout option:

TxtSaveOptions options = new TxtSaveOptions();

options.PreserveTableLayout = true;

Document doc = new Document("C:\\Temp\\in.docx");

doc.Save("C:\\Temp\\out.txt", options);

Best regards,

When I put characters like these into one column of the table:

Gfhg.,*&$#1254@!)({}[]":;,./?^%_+~`

doc.ToTxt() returns all of them exactly as they are; it does not replace them. I thought ToTxt reads only the content, without special characters. Why does it behave like this?
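ToTxt() leaves out Word's internal control characters (cell marks, field codes and the like), but punctuation such as `*&$#` typed into a cell is ordinary document text, so it is kept. If the goal is to keep only letters, digits and whitespace, a minimal post-processing sketch could filter the string afterwards. This is an assumption about what is wanted, not an Aspose.Words feature, and the class name is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical post-processing step (not an Aspose.Words feature):
// keeps Unicode letters, decimal digits and whitespace, removes everything else.
public static class TextFilter
{
    public static string LettersDigitsAndSpaces(string text)
    {
        // [^\p{L}\p{Nd}\s] = any character that is not a letter, a decimal digit or whitespace.
        return Regex.Replace(text, @"[^\p{L}\p{Nd}\s]", "");
    }
}
```

Applied to the cell text above, `TextFilter.LettersDigitsAndSpaces(doc.ToTxt())` would keep only "Gfhg1254" from that column.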

Hello

Thanks for your request. Unfortunately, it is not quite clear to me what your problem is. Could you please describe the issue more specifically and attach sample input and expected output documents for testing? I will check the issue on my side and provide you with more information.

Best regards,