Hi,
I want to extract the text of a paragraph based on its indent number. My document looks like this:
- Paragraph1
This is para1 text (this is also a paragraph).
- Paragraph3
This is para3 text (paragraph4)
This is para3 text (paragraph5)
How can I extract the text of a paragraph with a particular indent number? Currently I am doing this by converting the document to HTML, because the paragraph text may contain fields (hyperlinks, …), but in HTML it is hard to tell which paragraph is an indent paragraph and which one is a normal paragraph.
Is there any way to track the paragraph text that sits under another paragraph?
Hi
Thanks for your inquiry. Could you please attach a sample document for testing? I will investigate the issue and provide you with more information.
Best regards.
Hi Alexey,
Thanks for your response. Please find attached a document that shows what our documents look like.
The attached document contains, for example, the following text:
1 Intend num1
Text of Intend num1
1.1 Heading2
Test of Heading 2 at 1.1
I need to maintain a Hashtable where the key is the heading name (Intend num1) and the value is the text under that heading (Text of Intend num1), and the same for all other headings.
Is there any way to get the heading text and the text under the heading (sub-text)?
Could you please help me with the logic to extract the heading text and its sub-text?
Regards,
Srinu Dhulipalla
Hi
Thank you for the additional information. Yes, you can do that using Aspose.Words. I think you can try code like the following.
// Requires: using System; using System.Collections.Generic; using Aspose.Words;
// Create a dictionary where the key is the heading text and the value is the heading content
Dictionary<string, string> headings = new Dictionary<string, string>();
// Open the document
Document doc = new Document(@"Test046\in.doc");
// Text of the current heading and the content collected under it
string key = string.Empty;
string value = string.Empty;
// Loop through all sections in the document
foreach (Section sect in doc.Sections)
{
    // Loop through all child nodes of the section's body
    foreach (Node child in sect.Body.ChildNodes)
    {
        Paragraph par = child as Paragraph;
        // If the current node is a heading paragraph, store the previous heading and start collecting data for this one
        if (par != null &&
            (par.ParagraphFormat.StyleIdentifier == StyleIdentifier.Heading1 ||
             par.ParagraphFormat.StyleIdentifier == StyleIdentifier.Heading2 ||
             par.ParagraphFormat.StyleIdentifier == StyleIdentifier.Heading3))
        {
            if (!string.IsNullOrEmpty(key))
                headings.Add(key, value);
            key = par.ToTxt();
            value = string.Empty;
        }
        else
        {
            // Any other node (normal paragraph, table) belongs to the current heading
            value += child.ToTxt();
        }
    }
}
// A heading is only stored when the next heading starts, so store the last one here
if (!string.IsNullOrEmpty(key))
    headings.Add(key, value);
// Write the collected data to the console
foreach (KeyValuePair<string, string> entry in headings)
{
    Console.WriteLine("Heading : {0}", entry.Key);
    Console.WriteLine("Content : {0}", entry.Value);
}
Hope this helps.
Best regards.
Hi Alexey,
Thank you for the extra information. In my attached document, the content under one of the headings contains a hyperlink, like this:
1.1.1 Heading 2
Test of Heading 2 at 2.1.1
Click here to view attached file.
1.1.2 Heading 2
Test of Heading 2 at 2.1.2
When I use the code above, it returns only the text of the hyperlink. How can I get the hyperlink itself for a particular heading?
We really need this to provide better document information to our clients.
Regards,
Srinu Dhulipalla
Hi
Thanks for your inquiry. Could you please explain how you would like to store the data under each heading? For example, you can store it as an HTML or RTF string, which keeps the hyperlinks. See the following code for instance:
// Requires: using System; using System.Collections.Generic; using System.IO; using System.Text; using Aspose.Words;
public void Test046()
{
    // Create a dictionary where the key is the heading text and the value is the heading content as HTML
    Dictionary<string, string> headings = new Dictionary<string, string>();
    // Open the document
    Document doc = new Document(@"Test046\in.doc");
    // Text of the current heading
    string key = string.Empty;
    // Temporary document that collects the content under the current heading
    Document subDoc = new Document();
    subDoc.FirstSection.Body.RemoveAllChildren();
    // Loop through all sections in the document
    foreach (Section sect in doc.Sections)
    {
        // Loop through all child nodes of the section's body
        foreach (Node child in sect.Body.ChildNodes)
        {
            Paragraph par = child as Paragraph;
            // If the current node is a heading paragraph, store the previous heading and start collecting data for this one
            if (par != null &&
                (par.ParagraphFormat.StyleIdentifier == StyleIdentifier.Heading1 ||
                 par.ParagraphFormat.StyleIdentifier == StyleIdentifier.Heading2 ||
                 par.ParagraphFormat.StyleIdentifier == StyleIdentifier.Heading3))
            {
                if (!string.IsNullOrEmpty(key))
                    headings.Add(key, ConvertDocumentToHtml(subDoc));
                key = par.ToTxt();
                subDoc = new Document();
                subDoc.FirstSection.Body.RemoveAllChildren();
            }
            else
            {
                // Import the node (including hyperlinks and formatting) into the temporary document
                Node dstNode = subDoc.ImportNode(child, true, ImportFormatMode.KeepSourceFormatting);
                subDoc.FirstSection.Body.AppendChild(dstNode);
            }
        }
    }
    // A heading is only stored when the next heading starts, so store the last one here
    if (!string.IsNullOrEmpty(key))
        headings.Add(key, ConvertDocumentToHtml(subDoc));
    // Write the collected data to the console
    foreach (KeyValuePair<string, string> entry in headings)
    {
        Console.WriteLine("Heading : {0}", entry.Key);
        Console.WriteLine("Content : {0}", entry.Value);
    }
}

private string ConvertDocumentToHtml(Document doc)
{
    string html = string.Empty;
    // Save the document to a MemoryStream in HTML format
    using (MemoryStream htmlStream = new MemoryStream())
    {
        doc.Save(htmlStream, SaveFormat.Html);
        // Get the HTML string
        html = Encoding.UTF8.GetString(htmlStream.GetBuffer(), 0, (int)htmlStream.Length);
    }
    return html;
}
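ConvertDocumentToHtml passes htmlStream.Length to Encoding.UTF8.GetString because MemoryStream.GetBuffer() returns the stream's internal buffer, which can be larger than the bytes actually written. The small self-contained check below (plain .NET, no Aspose.Words; the StreamText class is a hypothetical helper for illustration only) shows that this gives the same string as decoding MemoryStream.ToArray(), which copies exactly the written bytes.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class StreamText
{
    // Decodes only the first Length bytes of the internal buffer,
    // as ConvertDocumentToHtml does; the rest of the buffer is unused space.
    public static string FromBuffer(MemoryStream stream)
    {
        return Encoding.UTF8.GetString(stream.GetBuffer(), 0, (int)stream.Length);
    }

    // Equivalent alternative: ToArray() returns a copy of exactly Length bytes.
    public static string FromArray(MemoryStream stream)
    {
        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```

Either form works: ToArray() is simpler to read, while GetBuffer() avoids one extra copy of the data.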
Hope this helps.
Best regards.
Hi Alexey,
Thank you, this information is very useful to us. What I want to ask now is: is there any way to find the parent-child relationship between paragraphs based on headings? For example, my previously attached document contains:
1 Intend num1
Text of Intend num1
1.1 Heading2
Test of Heading 2 at 1.1
Here, "1.1 Heading2" is a child of "1 Intend num1". How can I find out that Heading2 (1.1) is a child of Heading1 (1)? In the same way, Heading5 is a child of Heading4, and so on.
Is there any method to find the parent-child relationship between heading-style paragraphs?
We need this relationship to keep track of which heading is the parent and which is the child; the information we provide to our clients is based on it.
Regards,
Srinu Dhulipalla
Hi
Thanks for your inquiry. There is no parent-child relationship between paragraphs in a Word document: heading paragraphs and the paragraphs under them are all siblings in the section body. Please see the following link to learn more about the document object model.
https://docs.aspose.com/words/net/aspose-words-document-object-model/
However, you can infer the hierarchy from the heading levels: if a Heading 2 paragraph follows a Heading 1 paragraph, that Heading 2 is a child of the Heading 1.
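A minimal sketch of that inference in plain C# (no Aspose.Words calls), assuming the heading paragraphs have already been read in document order into (level, text) pairs, for example level 1 for StyleIdentifier.Heading1, level 2 for Heading2, and so on. HeadingNode and HeadingTree.BuildTree are hypothetical names. A stack holds the chain of currently open headings, so each heading's parent is the nearest preceding heading with a lower level.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree node for one heading paragraph.
public class HeadingNode
{
    public int Level;
    public string Text;
    public HeadingNode Parent;
    public List<HeadingNode> Children = new List<HeadingNode>();

    public HeadingNode(int level, string text)
    {
        Level = level;
        Text = text;
    }
}

public static class HeadingTree
{
    // Builds the heading hierarchy from headings in document order.
    // Assumption: levels were read from the heading styles (Heading1 -> 1, Heading2 -> 2, ...).
    // Returns the top-level headings, i.e. those without a parent.
    public static List<HeadingNode> BuildTree(IEnumerable<(int Level, string Text)> headings)
    {
        var roots = new List<HeadingNode>();
        // Chain of open headings; the top of the stack is the most recent one.
        var open = new Stack<HeadingNode>();

        foreach (var (level, text) in headings)
        {
            var node = new HeadingNode(level, text);

            // A new heading closes every open heading at the same or a deeper level.
            while (open.Count > 0 && open.Peek().Level >= level)
                open.Pop();

            if (open.Count == 0)
            {
                roots.Add(node);
            }
            else
            {
                // The remaining top of the stack is the nearest heading with a lower level.
                node.Parent = open.Peek();
                node.Parent.Children.Add(node);
            }
            open.Push(node);
        }
        return roots;
    }
}
```

For the sample above, (1, "1 Intend num1") followed by (2, "1.1 Heading2") makes "1.1 Heading2" a child of "1 Intend num1". A heading that skips a level (a Heading 3 directly after a Heading 1) is simply attached to the nearest preceding heading with a lower level.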
Best regards.
Hi Alexey,
Thanks for your response. Some time ago I used this code to import the data under heading styles, but when we use this statement,
Node dstNode = subDoc.ImportNode(child, true, ImportFormatMode.KeepSourceFormatting);
where the child parameter is one of the nodes in the section, and my document contains numbered lists, every numbered item gets ‘1’ after importing the nodes.
Is there any way to treat a whole list as one node, the way a table is one node and a paragraph is one node in the document?
I have been looking for a solution to this for many days. Could you please help me?
Thanks,
Hi
Thanks for your inquiry. There is a very simple way to resolve this problem: use NodeImporter instead of the Document.ImportNode method.
https://reference.aspose.com/words/net/aspose.words/nodeimporter/
Please let me know if you need more assistance; I will be glad to help.
Best regards.
Hi Alexey,
Thanks for your response.
The attached document is part of my original document. I need to read the source document paragraph by paragraph and place the paragraphs into different target documents without changing formatting such as list numbers. I am using the ImportNode() method to write the source paragraphs into the target, like this:
foreach (Section sect in docTest.Sections)
{
    foreach (Node child in sect.Body.ChildNodes)
    {
        // child can also be a Table, so check for null before reading the paragraph text
        Paragraph para = child as Paragraph;
        string k = (para != null) ? para.ToTxt() : string.Empty;
        Node dstNode = subDoc.ImportNode(child, true, ImportFormatMode.KeepSourceFormatting);
        subDoc.FirstSection.Body.AppendChild(dstNode);
    }
}
But in this process I lose the list-number formatting. Could you please suggest how to overcome this using NodeImporter?
Thanks,
Hi
Thanks for your inquiry. Here is the same code using NodeImporter:
// Open the documents
Document dstDoc = new Document(@"Test001\dst.doc");
Document srcDoc = new Document(@"Test001\src.doc");
// Create one NodeImporter and reuse it for all nodes.
NodeImporter importer = new NodeImporter(srcDoc, dstDoc, ImportFormatMode.KeepSourceFormatting);
// Copy content from one document into another.
foreach (Section sect in srcDoc.Sections)
{
    foreach (Node child in sect.Body.ChildNodes)
    {
        Node dstNode = importer.ImportNode(child, true);
        dstDoc.FirstSection.Body.AppendChild(dstNode);
    }
}
// Save the output document
dstDoc.Save(@"Test001\out.doc");
Best regards.