Issue with the paragraph style fetched from a .doc file

Hi Aspose Team,
I am styling documents using Aspose. Below is the piece of code that does the job, but I am experiencing the issue described further down. Please find attached the sample document (sample1.docx) used as input; the output I get is shown below. I am also attaching the style mapper file for your reference, which is used to produce the XML output in the specific format shown.

private XmlDocument docToXml(Document doc, string metaData)
{
    XmlDocument xmlDoc = new XmlDocument();
    XmlNode documentNode;
    XmlNode LevelOneNode = null;
    XmlNode LevelTwoNode = null;
    XmlNode LevelThreeNode = null;
    XmlNode LevelFourNode = null;
    xmlDoc.LoadXml("<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?><document></document>");
    documentNode = xmlDoc.SelectSingleNode("document");
    documentNode.InnerXml = metaData;
    // StyleMapper and TextReplaceMapper are the application's own helpers, not Aspose.Words classes.
    StyleMapper styleMap = new StyleMapper(AppPath + "stylemapping.xml");
    TextReplaceMapper textMap = new TextReplaceMapper(AppPath + "textmapping.xml");
    xmlDoc.PreserveWhitespace = true;
    XmlElement pageNode;
    pageNode = addPage(xmlDoc);
    documentNode.AppendChild(pageNode);
    int currentPageNumber = int.Parse(pageNode.Attributes["id"].Value);

    //AppLog.trace("processing page 1");
    NodeCollection paras = doc.GetChildNodes(NodeType.Paragraph, true);
    foreach (Paragraph para in paras)
    {
        string DocText = para.Range.Text;
        // A leading form feed (0x0C) marks a page break: strip it and start a new page.
        if (DocText.Length > 0 && DocText[0] == 0xC)
        {
            DocText = DocText.Substring(1);
            pageNode = addPage(xmlDoc);
            currentPageNumber = int.Parse(pageNode.Attributes["id"].Value);
        }

        DocText = ConvertCodes(DocText);
        DocText = DocText.Trim();

        if (DocText != string.Empty)
        {
            string StyleName;
            XmlNode DocElement = null;
            DocText = textMap.DoReplacment(DocText);

            // Paragraph style of the paragraph (not the character style of its runs).
            Style ws = para.ParagraphFormat.Style;
            StyleName = (ws == null ? "normal" : ws.Name);
        }
    }

    return xmlDoc;
}

Output:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?> 
 <document>
 <meta>
  <timestamp>20040119130000</timestamp> 
  <published>20040119130000</published> 
  <expires /> 
  <plan>45000</plan> 
  <source>AAROW</source> 
  <userid>a259593</userid> 
  </meta>
 <page id="1">
  <p>PARTICIPATING IN YOUR plan 
  <PlanIntro>You can measure the amount of time that a driver spends in deferred procedure calls (DPCs) and interrupt service. routines (ISRs) by tracing these events in the Windows kernel. This information will help you to minimize the time. For additional information about your Plan visit www.example.com</PlanIntro> 
 <PlanIntro>
  When am I eligible for the Plan? 
  <p>You are eligible to participate in the Plan if: 
  <bullet>you are eligible to xyx</bullet> 
  <bullet>you are a Indian citizen</bullet> 
  <p>You can measure the amount of time that a driver spends in deferred procedure calls. 
  <p>How do I enroll in the Plan? 
  <p>You can measure the amount of time that a driver spends in deferred procedure calls (DPCs) and interrupt service. routines (ISRs) by tracing these events in the Windows kernel. enroll. 
  <p>You can measure the amount of time that a driver spends in deferred procedure calls (DPCs) and interrupt service. routines (ISRs) by tracing these events in the Windows kernel. enroll. You can measure the amount of time that a driver spends in deferred procedure calls (DPCs) and interrupt service. routines (ISRs) by tracing these events in the Windows kernel. enroll. You can measure the amount of time that a driver spends in deferred procedure calls (DPCs) and interrupt service. routines (ISRs) by tracing these events in the Windows kernel. enroll. You can measure the amount of time that a driver spends in deferred procedure calls (DPCs) and interrupt service. routines (ISRs) by tracing these events in the Windows kernel. enroll. 
  </PlanIntro>
 <Question title="When is my enrollment effective?">
  <p>You can measure the amount of time that a driver spends in deferred procedure calls (DPCs) and interrupt service. routines (ISRs) by tracing these events in the Windows kernel. enroll. 
  </Question>
  </page>
  <page id="2" /> 
  </document>

The issue:
Why do two similar question statements appear in two different ways? For example:

"When am I eligible for the Plan?" appears as plain text inside <PlanIntro> (its style name is FP_Tag Indicator),

whereas "When is my enrollment effective?" appears as

<Question title="When is my enrollment effective?"> (its style name is FP_Body Text Bold 10_Avenir).

That means Aspose reports a different paragraph style for each of the two paragraphs. Because the style names differ, applying the stylemapper.xml file produces different output for each. Please assist.

Hi Komal,

Thanks for your inquiry and sorry for the delayed response. Please use Paragraph.ToString Method instead of Paragraph.Range.Text.

I am unable to execute your code as it is not the complete code. Please create a simple application (for example a Console Application Project) that helps us reproduce the same problem on our end and attach it here for testing. I will investigate the issue on my side and provide you more information.

Hi Tahir,
The Paragraph.ToString method always returns the string "Aspose.Words.Paragraph", so the resulting XML would be the one below, which is not what we expect. That is why I am fetching the text of the paragraph instead. Correct me if I am wrong. Thanks!

<page id="1">
  <p>Aspose.Words.Paragraph</p> 
  <PlanIntro>Aspose.Words.Paragraph</PlanIntro> 
 <PlanIntro>
  Aspose.Words.Paragraph 
  <p>Aspose.Words.Paragraph</p> 
  <bullet>Aspose.Words.Paragraph</bullet> 
  <bullet>Aspose.Words.Paragraph</bullet> 
  <bullet>Aspose.Words.Paragraph</bullet> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  </PlanIntro>
  <Question title="Aspose.Words.Paragraph" /> 
 <Question title="Aspose.Words.Paragraph">
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  </Question>
 <p>
  Aspose.Words.Paragraph 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
 <p>
  Aspose.Words.Paragraph 
  <p>Aspose.Words.Paragraph</p> 
 <p>
  Aspose.Words.Paragraph 
  <bullet>Aspose.Words.Paragraph</bullet> 
  <bullet>Aspose.Words.Paragraph</bullet> 
  <bullet>Aspose.Words.Paragraph</bullet> 
  <bullet>Aspose.Words.Paragraph</bullet> 
 <Question title="Aspose.Words.Paragraph">
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <p>Aspose.Words.Paragraph</p> 
  <bullet>Aspose.Words.Paragraph</bullet> 
  <bullet>Aspose.Words.Paragraph</bullet>
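The output above is what C# produces whenever the parameterless object.ToString() is called on a type that does not override it: it returns the type's full name. Judging by that output, Aspose.Words' Paragraph does not override it, so the text has to come from the ToString(SaveFormat) overload, as in the later snippets in this thread (para.ToString(SaveFormat.Text)). The self-contained sketch below reproduces the effect with a hypothetical stand-in class; DemoParagraph and DemoSaveFormat are illustrative names, not Aspose.Words types.

```csharp
using System;

// Hypothetical stand-in for the SaveFormat argument; NOT Aspose.Words.SaveFormat.
public enum DemoSaveFormat
{
    Text
}

// Hypothetical stand-in for a document node; NOT Aspose.Words.Paragraph.
// Like the real class (per the output in this thread), it does not override object.ToString().
public class DemoParagraph
{
    private readonly string text;

    public DemoParagraph(string text)
    {
        this.text = text;
    }

    // Overload with a format argument: returns the paragraph text for DemoSaveFormat.Text.
    public string ToString(DemoSaveFormat format)
    {
        return format == DemoSaveFormat.Text ? text : base.ToString();
    }
}

public static class ToStringDemo
{
    // Returns what the parameterless call and the overload produce for the same node.
    public static Tuple<string, string> Compare(string paragraphText)
    {
        var para = new DemoParagraph(paragraphText);
        // Resolves to object.ToString(), which returns the type's full name.
        string parameterless = para.ToString();
        // Resolves to the overload and returns the paragraph text.
        string withFormat = para.ToString(DemoSaveFormat.Text);
        return Tuple.Create(parameterless, withFormat);
    }
}
```

Calling ToStringDemo.Compare("When am I eligible for the Plan?") yields the type name for the first item and the question text for the second, which mirrors the difference between para.ToString() and para.ToString(SaveFormat.Text).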

Hi Tahir,

Let me simplify my query. The sample document I attached contains two question statements.

One is:

PARTICIPATING IN YOUR plan

{PlanDetailsIntro} You can measure the amount of time that a driver spends in deferred procedure calls (DPCs) and interrupt service. routines (ISRs) by tracing these events in the Windows kernel. This information will help you to minimize the time**.** For additional information about your Plan visit www.example.com**{/PlanDetailsIntro}{PlanDetails}**

When am I eligible for the Plan?

and the other is:

When is my enrollment effective?

You can measure the amount of time that a driver spends in deferred procedure calls (DPCs) and interrupt service. routines (ISRs) by tracing these events in the Windows kernel. enroll.

Aspose reports the style of "When am I eligible for the Plan?" as FP_Tag Indicator and the style of "When is my enrollment effective?" as FP_Body Text Bold 10_Avenir.

However, the style shown in Word is actually FP_Body_Text_Bold_10_Avenir_Char. Please advise.

Hi,

Here is the code for the above issue. I have attached the sample input file New.doc.

Document doc = new Document(@"C:\Users\a532830\Desktop\situ\New.doc");
NodeCollection paras = doc.GetChildNodes(NodeType.Paragraph, true);
foreach (Paragraph para in paras)
{
    // Plain text of the paragraph.
    string DocText = para.ToString(SaveFormat.Text);
    DocText = DocText.Trim();
    if (DocText != string.Empty)
    {
        // Name of the paragraph style (from the paragraph properties, not from the runs).
        Style ws = para.ParagraphFormat.Style;
        string Stylename = ws.Name;
        Console.WriteLine(DocText + " -> " + Stylename);
    }
}

For "When am I eligible for the Plan?" the style is FP_Tag Indicator,

for "How do I enroll in the Plan?" the style is FP_Body Text 10_Berkeley,

and for "When is my enrollment effective?" the style is FP_Body Text Bold 10_Avenir.

But the actual style of all the above lines is “FP_Body Text Bold 10_Avenir Char”. Please assist with this issue.

Hi Komal,

Thanks for sharing the details. The formatting applied to text in a Microsoft Word document can come from many different sources. A useful thing to note about the Aspose.Words API is that querying the direct formatting properties (Run.Font, Paragraph.ParagraphFormat) normally returns the “calculated” formatting value, based on all direct formatting, styles, document defaults and so on. Therefore, the direct formatting properties are the best way to find the visible formatting of the content.

In your case, I suggest you use Run.Font.Style to get the character style applied to the text. Hope this helps. Please let us know if you have any more queries.

Document doc = new Document(MyDir + "sample1.docx");
foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
{
    // Plain text of the paragraph (ToString(SaveFormat.Text), not the parameterless ToString()).
    string text = para.ToString(SaveFormat.Text).Trim();
    if (para.Runs.Count == 0)
        continue;

    if (text == "When is my enrollment effective?" || text == "When am I eligible for the Plan?")
    {
        // Character style of the first run, which can differ from the paragraph style.
        Console.WriteLine(text + " " + para.Runs[0].Font.Style.Name);
    }
}

Hi,

I have created a console application for you to reproduce the issue. I am not able to attach the app in this thread, so I will try to send it to you by email. The description below should help you understand our issue and the resolution we are looking for.

Issue:

The values returned by para.ParagraphFormat.Style and para.Runs[0].Font.Style.Name are not the same.

For the first question, "When am I eligible for the Plan?":

The value returned by para.ParagraphFormat.Style is FP_Tag Indicator.

But the value returned by para.Runs[0].Font.Style.Name is FP_Body Text Bold 10_Avenir Char.

What we want is:

FP_Body Text Bold 10_Avenir Char

This is because today we use the Interop.Word API to get the style name. When we execute the Interop code below for the same question, the value returned is:

FP_Body Text Bold 10_Avenir Char.

Interop code: Word.Style ws = (Word.Style) para.Range.get_Style();

We want both para.ParagraphFormat.Style and para.Runs[0].Font.Style.Name to return the same value for the above question.

We also want one standard way of getting the style name, either para.ParagraphFormat.Style or para.Runs[0].Font.Style.Name.

With reference to the attached document (test.doc), para.ParagraphFormat.Style works fine for all paragraphs except the question paragraphs.

So, as a temporary fix, we currently use this code:

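Style ws = para.ParagraphFormat.Style;
string StyleName1;

// Workaround: for question paragraphs, take the character style of the first run;
// otherwise use the paragraph style.
if (DocText.Contains("?") && para.Runs.Count > 0)
{
    StyleName1 = para.Runs[0].Font.Style.Name;
}
else
{
    StyleName1 = ws.Name;
}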

We want to avoid this temporary code and have one standard way of getting the style name.

The expected output for all the question paragraphs is: FP_Body Text Bold 10_Avenir Char.

Let me know if you need more information.

Thanks,

Hari

Can you please assist?

Hi Hari,

Thanks for your inquiry. I have not received your Console application via email. However, I am able to get the following results.

The value returned by para.ParagraphFormat.Style is FP_Tag Indicator

The value returned by para.Runs[0].Font.Style.Name is FP_Body Text Bold 10_Avenir Char.

Aspose.Words returns the correct value in both cases. I converted your document to DOCX (Office Open XML) and found that the style of the Paragraph node differs from the style of its Run node. Please see the attached image for details.

The content of a paragraph is contained in one or more runs (<w:r>). Run formatting, which can include direct formatting, is specified within <w:rPr>; paragraph formatting is specified within <w:pPr>.
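To see why the two properties can legitimately disagree, it may help to read both places in the XML directly. The sketch below parses a hand-written WordprocessingML paragraph with System.Xml.Linq: the paragraph style is referenced from <w:pPr><w:pStyle>, while a character style is referenced separately from each run's <w:rPr><w:rStyle>. The style IDs are hypothetical, not taken from the attached document; in document.xml these values are style IDs, and the display names such as "FP_Tag Indicator" live in styles.xml.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class StyleIdDemo
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical paragraph: the style IDs are made up for illustration.
    public const string SampleParagraph =
        "<w:p xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:pPr><w:pStyle w:val=\"FPTagIndicator\"/></w:pPr>" +
        "<w:r><w:rPr><w:rStyle w:val=\"FPBodyTextBold10AvenirChar\"/></w:rPr>" +
        "<w:t>When am I eligible for the Plan?</w:t></w:r>" +
        "</w:p>";

    // Style ID from <w:pPr><w:pStyle>, or null if the paragraph has no explicit style.
    public static string ParagraphStyleId(XElement p) =>
        p.Element(W + "pPr")?.Element(W + "pStyle")?.Attribute(W + "val")?.Value;

    // Style ID from each run's <w:rPr><w:rStyle>, or null for runs without a character style.
    public static List<string> RunStyleIds(XElement p) =>
        p.Elements(W + "r")
         .Select(r => r.Element(W + "rPr")?.Element(W + "rStyle")?.Attribute(W + "val")?.Value)
         .ToList();
}
```

For the sample, ParagraphStyleId returns "FPTagIndicator" and RunStyleIds returns a single "FPBodyTextBold10AvenirChar": the same paragraph carries two different, equally correct style references, which matches the two values reported above.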

Please note that formatting applied to text in a Microsoft Word document can come from many different sources. A useful thing to note about the Aspose.Words API is that querying the direct formatting properties (Run.Font, Paragraph.ParagraphFormat) normally returns the “calculated” formatting value, based on all direct formatting, styles, document defaults and so on. Therefore, the direct formatting properties are the best way to find the visible formatting of the content.

Hope this answers your query. Please let us know if you have any more queries.

Hi,

What do you mean by direct formatting properties? Can you please provide a couple of examples?

How can we standardise this code:

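Style ws = para.ParagraphFormat.Style;
string StyleName1;

// Workaround: for question paragraphs, take the character style of the first run;
// otherwise use the paragraph style.
if (DocText.Contains("?") && para.Runs.Count > 0)
{
    StyleName1 = para.Runs[0].Font.Style.Name;
}
else
{
    StyleName1 = ws.Name;
}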
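One possible way to replace the "?" check with a content-independent rule is sketched below, under stated assumptions: if every run of a paragraph carries the same explicit character style, report that name; otherwise report the paragraph style name. The class name StyleNameResolver is hypothetical, and the assumption that runs without a character style report "Default Paragraph Font" should be verified against the documents in use. Whether this reproduces Interop's Range.get_Style for every paragraph is not established here; it only targets the FP_Body Text Bold 10_Avenir Char result reported for the question paragraphs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StyleNameResolver
{
    // Name assumed to be reported for runs that have no explicit character style.
    public const string DefaultParagraphFont = "Default Paragraph Font";

    // Returns the character style shared by all runs when there is exactly one such
    // non-default style; otherwise falls back to the paragraph style name.
    public static string Resolve(string paragraphStyleName, IEnumerable<string> runStyleNames)
    {
        List<string> names = runStyleNames.ToList();
        if (names.Count == 0)
            return paragraphStyleName;

        List<string> distinct = names.Distinct().ToList();
        bool uniformCharStyle = distinct.Count == 1
            && !string.IsNullOrEmpty(distinct[0])
            && distinct[0] != DefaultParagraphFont;

        return uniformCharStyle ? distinct[0] : paragraphStyleName;
    }
}
```

With Aspose.Words it could be fed from the same properties used in this thread, for example StyleNameResolver.Resolve(para.ParagraphFormat.Style.Name, para.Runs.Cast<Run>().Select(r => r.Font.Style.Name)). The design choice is that a paragraph whose runs all share one character style is named after that style, and any mixed or unstyled paragraph keeps its paragraph style.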

Hi Hari,

Thanks for your inquiry. By direct formatting I mean formatting that is applied directly to the paragraph or its text, on top of the formatting that comes from its style. Please see the attached document and image for details. This document has a style named ‘Style1’ with the font setting (Calibri, 26). I have applied direct formatting to the first paragraph with the font setting (Verdana, 48). Please execute the following code snippet with the attached document to check the font properties.

Hope this answers your query. Please let us know if you have any more queries.

Document doc = new Document(MyDir + "style.docx");
foreach (Paragraph para in doc.FirstSection.Body.Paragraphs)
{
    Console.WriteLine(para.ParagraphFormat.Style.Font.Size.ToString()); // 26
    Console.WriteLine(para.ParagraphFormat.Style.Font.Name); // Calibri
    foreach (Run run in para.Runs)
    {
        Console.WriteLine(run.Font.Size.ToString()); // 48
        Console.WriteLine(run.Font.Name); // Verdana
    }
}
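The precedence behind the "calculated" values in the snippet above can be modelled in a few lines. The sketch below is a deliberately simplified, hypothetical model (FontProps and FormattingResolver are illustrative names, and the Times New Roman 12 defaults are made up): a property set directly on the run wins, otherwise the style's value is used, otherwise the document default. Real Word documents have more layers, such as character and paragraph styles, style inheritance and themes.

```csharp
using System;

// Hypothetical, simplified font properties; null means "not set at this level".
public sealed class FontProps
{
    public string Name { get; set; }
    public double? Size { get; set; }
}

public static class FormattingResolver
{
    // Calculated value: the first level that actually sets a property wins,
    // checked in the order direct formatting -> style -> document defaults.
    public static FontProps Resolve(FontProps direct, FontProps style, FontProps defaults)
    {
        return new FontProps
        {
            Name = direct?.Name ?? style?.Name ?? defaults.Name,
            Size = direct?.Size ?? style?.Size ?? defaults.Size,
        };
    }
}
```

With Style1 as (Calibri, 26) and direct formatting (Verdana, 48), Resolve returns Verdana 48 for the first paragraph, which is what run.Font reports, while the style itself still reports Calibri 26. A paragraph without direct formatting resolves to Calibri 26, and a run that sets only the size resolves to Calibri 48.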