Using TabStops to parse Run.Text

Is there any way to use a TabStop position to parse Run.Text, similar to a Substring method, to pull out the text at a certain tab stop position?

I've verified that there are various TabStop collections in the Word document and have been able to see the values of the positions, but I'm not sure how a position (in points) relates to the position of the text in Run.Text.


This message was posted using Page2Forum from TabStops Property - Aspose.Words for .NET

Hi Cheryl,

Thanks for your inquiry.

Could you please attach your template here for testing? Most likely, content separated by tab stops contains tab characters (ControlChar.TabChar) inside the runs, and it is these characters that line the text up with each tab stop. This means you should be able to find the content between stops by searching for the tab characters and parsing the text between them.

Thanks,

Adam,

I'm attaching testtab.doc, and the code I'm using is below. Basically, I am able to split the run text on ControlChar.TabChar. This would work fine if all the lines had the same tabbing, but take a look at the attached document. I highlighted in yellow the line where I need to determine which tab stop the text sits at, so I know what the text relates to.

The final goal is to load this text into a database table in a meaningful way.

Any creative suggestions would be appreciated. :-)

Imports Aspose.Words

' FilePath and Log are defined elsewhere in my project.
Dim doc As Aspose.Words.Document = New Document(FilePath)
ParseRuns(doc)

Sub ParseRuns(ByVal doc As Document)
    ' All Run nodes in the document, including those inside nested nodes.
    Dim runs As NodeCollection = doc.GetChildNodes(NodeType.Run, True, False)
    Dim text() As String

    For Each run As Run In runs
        Log("****************RUN********************")
        ' Split the run's text at every tab character.
        text = run.Text.Split(ControlChar.TabChar)
        For i As Integer = 0 To text.Length - 1
            Log("tab index: " + i.ToString() + " text: " + text(i))
        Next
    Next
End Sub
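One limitation of working run by run is that a single line (paragraph) can be split across several runs, and a tab character may sit in a run of its own, so the tab index restarts partway through a line. A minimal sketch, assuming the tab stops of interest are set directly on each paragraph (stops inherited only from a style may not be listed in ParagraphFormat.TabStops), walks paragraphs instead and prints each paragraph's own tab stops (positions in points) next to its tab-separated segments. It does not decide which stop a segment lands on, because that depends on the rendered width of the text before each tab. The module name and file path are hypothetical.

```vbnet
Imports System
Imports Aspose.Words

Module TabStopInspector
    ' Walks paragraphs rather than runs, because one line of text can be
    ' split across several runs (a tab character may even be a run of its own).
    Sub ParseParagraphs(ByVal doc As Document)
        For Each para As Paragraph In doc.GetChildNodes(NodeType.Paragraph, True)
            ' GetText() includes the trailing paragraph (or cell) mark; drop it.
            Dim paraText As String = para.GetText().TrimEnd(ControlChar.ParagraphBreakChar, ControlChar.CellChar)

            ' Only lines that actually contain tab characters are of interest.
            If paraText.IndexOf(ControlChar.TabChar) < 0 Then Continue For

            ' Custom tab stops defined on this paragraph, positions in points.
            ' Assumption: stops inherited only from a style may not appear here.
            Dim stops As TabStopCollection = para.ParagraphFormat.TabStops
            For i As Integer = 0 To stops.Count - 1
                Console.WriteLine("  stop {0}: {1} pt, {2}", i, stops(i).Position, stops(i).Alignment)
            Next

            ' Text between tab characters; index 0 is the text before the first tab.
            Dim segments() As String = paraText.Split(ControlChar.TabChar)
            For i As Integer = 0 To segments.Length - 1
                Console.WriteLine("  segment {0}: '{1}'", i, segments(i))
            Next
        Next
    End Sub

    Sub Main()
        ' Hypothetical path; point this at your own document.
        ParseParagraphs(New Document("testtab.doc"))
    End Sub
End Module
```

Comparing the printed stops for the highlighted line with those of a "normal" line at least shows where the tabbing differs, even though it cannot resolve the column automatically.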

Hi Cheryl,

Thanks for this additional information.

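I have taken a look at your template and you are correct: the tabbing differs from line to line, which makes extracting particular columns difficult. A tab character only moves the text on to the next tab stop after the current position, so which stop a piece of text ends up at depends on how wide the text before it renders, not just on how many tabs precede it. At the moment I am not aware of any way to extract the text at a specific tab stop.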

Is there any reason you need to use tabs to extract the data from the document? Have you considered using a table (with no borders) instead? It would be a lot easier to extract the information from.
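As a rough illustration of the borderless-table approach, the sketch below uses Aspose.Words' DocumentBuilder (StartTable, InsertCell, Write, EndRow, EndTable) and Table.ClearBorders, which removes all table and cell borders in recent versions; check the API reference for your version if ClearBorders is missing. It turns a few hypothetical tab-separated lines into one row per line and one cell per value, then saves the result. The sample lines, module name, and output file name are placeholders.

```vbnet
Imports Aspose.Words
Imports Aspose.Words.Tables

Module BorderlessTableSketch
    Sub Main()
        Dim doc As New Document()
        Dim builder As New DocumentBuilder(doc)

        ' Hypothetical sample lines; in practice these would be the
        ' tab-separated lines currently in the template.
        Dim lines() As String = New String() { _
            "Name" & ControlChar.Tab & "Amount", _
            "Widget" & ControlChar.Tab & "10.00"}

        Dim table As Table = builder.StartTable()
        For Each line As String In lines
            ' One cell per tab-separated value, one row per line.
            For Each value As String In line.Split(ControlChar.TabChar)
                builder.InsertCell()
                builder.Write(value)
            Next
            builder.EndRow()
        Next
        builder.EndTable()

        ' Remove all table and cell borders so the layout still looks like tabbed text.
        table.ClearBorders()

        doc.Save("BorderlessTable.doc")
    End Sub
End Module
```

Because each value lands in its own cell, the column a value belongs to no longer depends on where the tab stops happen to fall.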

I will be glad to help with any reworking of your template if you can use this method.

Thanks,

Adam,

I'm new to Aspose.Words (I've used Aspose.Cells more) and wasn't aware of the borderless-table option. I'm more than happy to look at other options.

Cheryl

Hi Cheryl,

Sure, I have quickly reworked your template to use a table instead and attached the new version to this post. I have tried to keep the tabbed information consistent when moving it into the table, but you may need to change some values if they end up in the wrong place.

You can then extract the content of each cell for use in your database by using the code below:

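Imports Aspose.Words
Imports Aspose.Words.Tables

' Collect every Table node in the document (deep search).
Dim tables As NodeCollection = doc.GetChildNodes(NodeType.Table, True)

' Guard against documents that contain no tables at all.
If tables.Count > 0 Then
    Dim table As Table = CType(tables(0), Table)

    For Each row As Row In table.Rows
        For Each cell As Cell In row.Cells
            ' Cell.GetText() ends with the cell-end mark (ControlChar.Cell); strip it.
            Dim text As String = cell.GetText().TrimEnd(ControlChar.CellChar)
            Console.WriteLine("Text in cell (R:{0}, C:{1})= {2}", table.Rows.IndexOf(row), row.Cells.IndexOf(cell), text)
        Next cell
    Next row
End If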
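Since the final goal is to load the text into a database table, it may help to gather each row's cell values into an array first. A minimal sketch, assuming the reworked, table-based template: it reads every table in the document, strips the cell-end mark (ControlChar.Cell) that GetText() returns at the end of each cell's text, and returns one String array per row. The module, the GetTableRows function, and the file path are hypothetical, and the actual database INSERT is left as a comment because the schema isn't known.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Aspose.Words
Imports Aspose.Words.Tables

Module TableRowsSketch
    ' Returns the text of every row in every table, one array element per cell.
    Function GetTableRows(ByVal doc As Document) As List(Of String())
        Dim result As New List(Of String())
        For Each table As Table In doc.GetChildNodes(NodeType.Table, True)
            For Each row As Row In table.Rows
                Dim values(row.Cells.Count - 1) As String
                For i As Integer = 0 To row.Cells.Count - 1
                    ' Drop the cell-end mark, then any surrounding whitespace.
                    values(i) = row.Cells(i).GetText().TrimEnd(ControlChar.CellChar).Trim()
                Next
                result.Add(values)
            Next
        Next
        Return result
    End Function

    Sub Main()
        ' Hypothetical path to the reworked, table-based template.
        Dim doc As New Document("testtab.doc")
        For Each values As String() In GetTableRows(doc)
            ' Replace this with an INSERT into your own database table.
            Console.WriteLine(String.Join(" | ", values))
        Next
    End Sub
End Module
```

Keeping the extraction separate from the database code makes it easy to check the row arrays before anything is written.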

Thanks,

Great. Thanks Adam!

I'm getting this error:

Unable to cast object of type 'Aspose.Words.NodeCollection' to type 'Aspose.Words.Tables.TableCollection'.

I'm using this in VS 2005

I've used the TraverseAllNodes method I found in the Help, and there are no nodes of NodeType.Table.

So I'm thinking this option won't work???

OK, never mind. I just realized that to get the table option working you had modified the test document. That is not an option for me.

Hi Cheryl,

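Thanks for this additional information.

My apologies, there was a mistake in the first line of the code: it declared the result as a TableCollection rather than a NodeCollection, which is what caused the cast error. I have fixed that and reattached the correct code.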

I'm afraid you will have to change the template to get that functionality working; I'm not aware of any other way to use tab stops to separate the text. Why can't the document layout be changed? With more information there may be other suggestions for achieving what you're looking for.

Thanks,

Adam,

I think I've found a way to manipulate the Word document so that it has tables, which lets me use the table code. In your last post you mentioned that you had updated and reattached your code, but I don't see the new code attached. Could you reattach it?

Thanks for your help!!

Cheryl

Hi Cheryl,

Thanks for your inquiry. I posted the updated code in the original post. I will repost it here just in case.

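Imports Aspose.Words
Imports Aspose.Words.Tables

' Collect every Table node in the document (deep search).
Dim tables As NodeCollection = doc.GetChildNodes(NodeType.Table, True)

' Guard against documents that contain no tables at all.
If tables.Count > 0 Then
    Dim table As Table = CType(tables(0), Table)

    For Each row As Row In table.Rows
        For Each cell As Cell In row.Cells
            ' Cell.GetText() ends with the cell-end mark (ControlChar.Cell); strip it.
            Dim text As String = cell.GetText().TrimEnd(ControlChar.CellChar)
            Console.WriteLine("Text in cell (R:{0}, C:{1})= {2}", table.Rows.IndexOf(row), row.Cells.IndexOf(cell), text)
        Next cell
    Next row
End If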

Thanks,