Best Way To Parse Out Tables and Paragraphs

robertal · June 18, 2013, 10:28pm

Hi,

Can you please recommend a way to parse the information out of the attached file? I had originally thought going node by node would work, but the requirements got tougher. Basically, I need to save the main object names (style: heading 2), the item names (style: heading 3), and then each of the sub items (style: normal). The trickiest part is parsing out the rows under each category: I need a list of rows by category, per item. I decided on a Dictionary<string, row class>, where the string is the category name and the row class is a class I made to hold the value under each of the headings. I don’t need to store the headings themselves.

I have looked at the examples Aspose has online, but they don’t seem to fit my situation. Can this be parsed linearly, node by node, or would you recommend another approach? The hardest thing I am facing is keeping track of my position while I walk linearly over a NodeCollection.

Additional notes: If several main objects appear one after another, I skip to the last one. The headings in the category tables cannot be used to find the text because they are dynamic. Instead, my requirements say I must look for the row where the 4 columns are merged together.

Any suggestions you can provide will be greatly appreciated.

Thank you.

tahir.manzoor · June 20, 2013, 4:18am

Hi Rob,

Thanks for your inquiry. Unfortunately, I have not completely understood your requirements. It would be great if you could share some more detail about them.

If you want to get the style of a Paragraph, you can use the ParagraphFormat.Style property.
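As a minimal sketch of that style check, the code below walks every paragraph and reports the ones styled as Heading 2, Heading 3, or Normal. It assumes the document uses Word's built-in styles (so ParagraphFormat.StyleIdentifier can be compared directly) and that paragraphs inside table cells should be skipped. The file name "input.docx" and the class name HeadingScan are placeholders, not taken from your attachment.

```csharp
using System;
using Aspose.Words;

class HeadingScan
{
    static void Main()
    {
        // Placeholder path; replace with the real document.
        Document doc = new Document("input.docx");

        // true = search the whole node tree, not only direct children.
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            // Assumption: text inside table cells is handled separately.
            if (para.IsInCell)
                continue;

            // GetText() includes the paragraph break, so trim it off.
            string text = para.GetText().Trim();
            if (text.Length == 0)
                continue;

            switch (para.ParagraphFormat.StyleIdentifier)
            {
                case StyleIdentifier.Heading2:
                    Console.WriteLine("Main object: " + text);
                    break;
                case StyleIdentifier.Heading3:
                    Console.WriteLine("Item: " + text);
                    break;
                case StyleIdentifier.Normal:
                    Console.WriteLine("Sub item: " + text);
                    break;
            }
        }
    }
}
```

If the document uses custom styles instead of the built-in ones, comparing para.ParagraphFormat.Style.Name against the expected style name may be a better fit.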

You can get the document’s nodes by using the CompositeNode.GetChildNodes method. This method returns a live collection of child nodes that match the specified type.

// Collect every table row in the document; true also searches nested nodes.
NodeCollection rows = doc.GetChildNodes(NodeType.Row, true);
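Building on that collection, here is a rough sketch of grouping rows under a category name. It rests on assumptions I cannot verify without the file: a category row is taken to be either a single-cell row or a row whose cells after the first are flagged CellMerge.Previous, and a plain string array stands in for your own row class. The names CategoryRows, IsCategoryRow, and "input.docx" are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Tables;

class CategoryRows
{
    // Assumption: the merged "category" row is either one wide cell or a row
    // whose cells after the first are marked as merged into the previous cell.
    static bool IsCategoryRow(Row row)
    {
        if (row.Cells.Count == 1)
            return true;

        for (int i = 1; i < row.Cells.Count; i++)
        {
            if (row.Cells[i].CellFormat.HorizontalMerge != CellMerge.Previous)
                return false;
        }
        return true;
    }

    static void Main()
    {
        // Placeholder path; replace with the real document.
        Document doc = new Document("input.docx");

        // Category name -> cell texts of each row found under that category.
        var rowsByCategory = new Dictionary<string, List<string[]>>();
        string category = null;

        foreach (Row row in doc.GetChildNodes(NodeType.Row, true))
        {
            if (row.Cells.Count == 0)
                continue;

            if (IsCategoryRow(row))
            {
                // Remember the current category; later rows belong to it.
                category = row.FirstCell.ToString(SaveFormat.Text).Trim();
                if (!rowsByCategory.ContainsKey(category))
                    rowsByCategory[category] = new List<string[]>();
                continue;
            }

            // Rows that appear before any category row are ignored.
            if (category == null)
                continue;

            string[] values = new string[row.Cells.Count];
            for (int i = 0; i < row.Cells.Count; i++)
                values[i] = row.Cells[i].ToString(SaveFormat.Text).Trim();

            rowsByCategory[category].Add(values);
        }

        foreach (KeyValuePair<string, List<string[]>> pair in rowsByCategory)
            Console.WriteLine(pair.Key + ": " + pair.Value.Count + " row(s)");
    }
}
```

This sketch does not separate rows per item and stores the column-heading row like any other row, so both would still need handling. Cells merged in Word by width rather than by merge flags may also not report CellMerge.Previous, which is why the single-cell check is included.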

Once you share more detail about your requirements, we will provide more information along with code.