Free Support Forum - aspose.com

Weird characters after converting from Word to text

I am evaluating the Aspose.Words for .NET component for purchase. I’m simply trying to take a Word doc and convert it to text. I have 2 questions.

  1. The original Word doc contains auto-numbers in the body. If I just read the text into memory using aw.Document.GetText, I lose those auto-numbers. If I save it to a text file, it preserves them. Is there a flag or something I need to set to keep them in memory?

  2. The Word doc has some data that looks like this:

RE: Rosa Gutierrez Amezquita

DOB: March 30, 1940

MR#: 55555

REASON FOR CONSULTATION: I was asked to see this patient by Dr. Maribel Flores for evaluation and management of endstage renal disease.

When I view it in memory or in the saved text file, I don’t get newlines between the first 3 lines above. Instead I see weird characters, as in the attached file.
How can I remove these or convert them to carriage returns so the text looks like the above?

Thanks,
Mike

Ok. So I figured out the characters were vertical tabs. Now if I can just get the auto-generated numbers/bullets to show when I do a GetText.
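
For anyone else hitting this, here is a minimal sketch of the replacement, assuming the extracted text is already in a string. The module and function names (VerticalTabDemo, NormalizeLineBreaks) are made up for illustration; vbVerticalTab and vbCrLf are the standard Visual Basic constants.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module VerticalTabDemo
    ' Hypothetical helper: swaps every vertical tab (Chr(11)), which Word
    ' uses for a manual line break, for a carriage return + line feed.
    Function NormalizeLineBreaks(text As String) As String
        Return text.Replace(vbVerticalTab, vbCrLf)
    End Function

    Sub Main()
        ' Sample text mimicking the three header lines from the document.
        Dim raw As String = "RE: Rosa Gutierrez Amezquita" & vbVerticalTab &
                            "DOB: March 30, 1940" & vbVerticalTab &
                            "MR#: 55555"
        Console.WriteLine(NormalizeLineBreaks(raw))
    End Sub
End Module
```

The same call should work on the string returned by GetText or read back from the saved text file.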

Hi there,

Thanks for your inquiry.

Regarding your first query, you can use ToTxt() instead, which should preserve list labels. GetText() is not recommended because, if your document contains a field, you will get the entire field structure in the text (i.e., the field code and the field result). With ToTxt() you only get the visible content.

If we can help you with anything else please feel free to ask.

Thanks,

Ok. I tried ToTxt. It did not preserve the list numbers, assuming I’m doing this correctly with this code:

Dim aw As New Aspose.Words.Document(f)
Dim txtIn As String = aw.ToTxt()

If I save it in text format and then open that file, the list numbers are there. Perhaps this is the only way to do this?

Rut

Hi Rut,

Thanks for your request. I think, in your case, you can save the document in TXT format to a stream and then read the string back from that stream.
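
A rough sketch of that approach is below. It assumes the default TXT save encoding (UTF-8), which StreamReader handles without extra settings. The module and function names (TextViaStream, DocumentToText) are only illustrative; Document.Save(Stream, SaveFormat) and SaveFormat.Text are the Aspose.Words calls involved.

```vbnet
Imports System.IO
Imports Aspose.Words

Module TextViaStream
    ' Hypothetical helper: saves the document as plain text into memory
    ' and returns the result as a string, so no temporary file is needed.
    Function DocumentToText(path As String) As String
        Dim doc As New Document(path)
        Using stream As New MemoryStream()
            ' Saving in TXT format is what writes the list numbers/bullets.
            doc.Save(stream, SaveFormat.Text)
            ' Rewind before reading, since Save leaves the position at the end.
            stream.Position = 0
            Using reader As New StreamReader(stream)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

You would then call it as Dim txtIn As String = DocumentToText(f) in place of the ToTxt() call.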

Best regards,