Replace junk data while placing a table in a Word document using Aspose.Words

deepi · September 19, 2011, 1:07am

I am placing a table in a Word document using Aspose.Words. When I retrieve the document content using doc.Range.Text, I get some unwanted data, as shown below. How can I replace that unwanted data? As an example, I attached a screenshot of the table I placed in the Word document.

Using doc.Range.Text, I get this:

Testaaaaaaaaaatestsampleaa

When I try innerxml, I get this:

Testtestsample

AndreyN · September 19, 2011, 1:53am

Hello
Thanks for your request. In this case, you can try the PreserveTableLayout option of the TxtSaveOptions class. This option specifies whether the program should attempt to preserve the layout of tables when saving in plain text format.
https://reference.aspose.com/words/net/aspose.words.saving/txtsaveoptions/preservetablelayout/
Also, I think the information provided here will be useful for you:
https://docs.aspose.com/words/java/extract-selected-content-between-nodes/
Best regards,

deepi · September 19, 2011, 2:05am

I tried the method below. Is it the right way to get only the content, without any junk, special characters, or codes?

Document dstDoc = new Document(@"C:\Textclean1.docx"); //Paper size of this document is A4
string value = dstDoc.ToTxt();
value = value.Replace("\r", "").Replace("\n", "");

AndreyN · September 19, 2011, 2:13am

Hello
Thanks for your request.
The ways to retrieve text from the document are:
· Use Document.Save with SaveFormat.Text to save as plain text into a file or stream.
· Use Node.ToTxt. Internally, this invokes save as text into a memory stream and returns the resulting string.
· Use Node.GetText to retrieve text with all Microsoft Word control characters including field codes.
· Implement a custom DocumentVisitor to perform customized extraction.
Please see the following link to learn more:
https://docs.aspose.com/words/java/extract-selected-content-between-nodes/
You can try using the following code:

Document doc = new Document("C:\\Temp\\in.docx");
Console.WriteLine(doc.GetText());

Or save to TXT using the PreserveTableLayout option:

TxtSaveOptions options = new TxtSaveOptions();
options.PreserveTableLayout = true;
Document doc = new Document("C:\\Temp\\in.docx");
doc.Save("C:\\Temp\\out.txt", options);
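
If you prefer to keep using GetText or Range.Text, the extra characters can also be filtered out of the returned string. The following is only a minimal sketch, assuming the unwanted data consists of Word control characters such as the end-of-cell mark (character code 7). TextCleaner and StripControlChars are hypothetical names, not part of Aspose.Words, and the sample string is a stand-in for real document text:

```csharp
using System;
using System.Text;

public static class TextCleaner
{
    // Hypothetical helper, not part of Aspose.Words: drops every Unicode
    // control character (cell marks, paragraph marks, page breaks, and so on).
    public static string StripControlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (!char.IsControl(c))
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Stand-in for the string returned by doc.GetText():
        // two cell values followed by cell, paragraph, and page-break marks.
        string raw = "Test\u0007sample\u0007\u0007\r\f";
        Console.WriteLine(StripControlChars(raw));
    }
}
```

Note that this also removes tabs and line breaks, and it does not remove field codes, because the code text itself consists of ordinary printable characters.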

Best regards,

deepi · September 19, 2011, 2:29am

When I place some characters in one column of the table, like this:
Gfhg.,*&$#1254@!)({}[]":;,./?^%_+~`
and then call doc.ToTxt(), it reads all the characters exactly as they are; it does not replace them. I thought ToTxt reads only the content, with no special characters. Why does it behave like that?

deepi · September 19, 2011, 2:29am

When I place some characters in one column of the table, like this:
Gfhg.,*&$#1254@!)({}[]":;,./?^%_+~`

and then call doc.ToTxt(), it reads all the characters exactly as they are; it does not replace them.

AndreyN · September 19, 2011, 5:27am

Hello
Thanks for your request. Unfortunately, it is not quite clear to me what your problem is. Could you please describe your issue more specifically and provide a sample input document and the expected output here for testing? I will check the issue on my side and provide you with more information.
Best regards,