Recognizing linked styles

Good day.

We are importing sections of one document into another. As we do this, we set style and font properties to match the destination document styles.

We noticed a problem in which some text was coming over incorrectly from the source document. I think I’ve narrowed it down to a specific issue.

Please see the attached document. This is a portion of the “source” document. When viewed in MS Word 2010, all of the text has a Tahoma 8.5 font applied.

When we import, we iterate through the paragraphs, and within each paragraph, each Run.

The following is the toTxt() of the paragraph node. Below it is the ParagraphFormat and font information for this paragraph as extracted via Aspose:

<div style="margin-left:30px;color:#909090">
    <font size="2">
        With the majority of the mission critical business systems
        currently being hosted in Timbuktu, maximum business network latency has
        been defined as being from each remote site to the Timbuktu co-loc
        computer center. RFP Spreadsheet D provides the max latency by site. For
        providers proposing a global speed solution, committed business system
        latencies should be from each site to the Timbuktu co-lo compute center.
        For providers proposing a regional solution, the cross connects will be
        done at the regional co-loc computer centers so latency commitments
        should be from the site to the co-lo compute center where the cross
        connect will be performed in the other region. Engineering latency
        commitments should be defined as being from the engineering site to the
        in-region compute ranch.

    </font>
</div><font size="2">

    Style: Body Text,
    Line spacing: 12,
    Line spacing rule: 2,
    <font color="#FF0000">
        Font name: Times New Roman,
        Font size: 10
    </font>
</font>

</blockquote>We are using the following ColdFusion code to extract those values:
<blockquote>
    <font face="Courier New" size="1">
        Style: #currNode.getParagraphFormat().getStyleName()#,
        Line spacing: #currNode.getParagraphFormat().getLineSpacing()#,
        Line spacing rule: #currNode.getParagraphFormat().getLineSpacingRule()#,
        Font name: #currNode.getParagraphFormat().getStyle().getFont().getName()#,
        Font size: #currNode.getParagraphFormat().getStyle().getFont().getSize()#
    </font>
</blockquote>So it reads the correct style name ("Body Text"), but it reports an incorrect font face and size (Times New Roman 10). On its own that is not a problem, since some of the runs could override the paragraph style.

But if we iterate through the runs within this node and show the font face and size of each run, the result still seems to be incorrect:

<ul>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                With the majority of the mission critical business systems currently being hosted in
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                Timbuktu
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                , maximum business network latency has been defined as being from each remote site to the
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                Timbuktu
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">

            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                co-
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                loc
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">

            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                computer
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                center
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                . RFP
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                Spreadsheet
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                D provides the max latency by site. For providers proposing a global
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                speed
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                solution, committed business system latencies should be from each site to the
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                Timbuktu
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">

            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                co-lo compute
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                center
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                . For providers proposing a regional solution, the cross connects will be done at the regional co-
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                loc
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                computer
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                centers
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                so latency commitments should be from the site to the co-lo compute
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                center
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                where the cross connect will be performed in the other region. Engineering laten
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
    <li>
        <div style="margin-left:30px;color:#909090">
            <font size="2">
                cy commitments should be defined as being from the engineering site to the in-region compute ranch.
            </font>
        </div><font size="2">
            Font name: Times New Roman,
            Font size: 8.5
        </font>
    </li>
</ul>

That was generated with the following code:

<cfset runs=currNode.getRuns()>
<cfset ri=runs.iterator()>
Made up of: <br />
<ul>
    <cfloop condition="#ri.hasNext()#">
        <cfset run=ri.next()>
        <li>
            <div style="margin-left:30px;color:##909090">
                #run.getText()#&nbsp;
            </div>
            Font name: #run.getFont().getName()#,
            Font size: #run.getFont().getSize()#
        </li>
    </cfloop>
</ul>

So it seems no matter how I extract the font properties, Aspose properties indicate Times New Roman, while the real font is something else.

I think the reason the style properties shown in Aspose do not match those shown in Word is that the Body Text style is a linked style. At least, in my much larger document, I am not able to properly convert any text that uses a linked style in the source document.

Am I misunderstanding linked styles and how Aspose is reading the font face and size? Is there a way to extract the actual font/formatting information for the text that has a linked style? In the end, I am trying to set the fonts as well, but I think if I solve the former issue the latter may fall into place.

Thank you.

I forgot to mention: I am using Aspose.Words for Java, the latest version, which I downloaded yesterday (I believe it is 11.0, but I am not at my PC at the moment).

Correction: the version used to test is as follows:

Specification-Title: Aspose.Words for Java
Specification-Version: 11.11.0.0
Specification-Vendor: Aspose Pty Ltd
Implementation-Title: Aspose.Words for Java
Implementation-Version: 11.11.0.0
Implementation-Vendor: Aspose Pty Ltd
Release-Date: 2012.12.30

Thank you.

Hi Chris,

Thanks for your inquiry. I have tested the scenario (iterating through all Runs of each Paragraph node) with the latest version of Aspose.Words for Java (v13.1.0) and could not reproduce the issue. Please upgrade to the latest version (v13.1.0) and let us know how it goes on your side.

If you are seeing this issue in the new document into which you are importing the sections, I suggest you use ImportFormatMode.KEEP_SOURCE_FORMATTING. Please also see the following documentation links for reference.
https://docs.aspose.com/words/java/insert-and-append-documents/
https://reference.aspose.com/words/java/com.aspose.words/importformatmode

Hope this answers your query. Please let us know if you have any more queries.

Thank you for your reply. I am now running:

Manifest-Version: 1.0
Specification-Title: Aspose.Words for Java
Specification-Version: 13.1.0.0
Specification-Vendor: Aspose Pty Ltd
Implementation-Title: Aspose.Words for Java
Implementation-Version: 13.1.0.0
Implementation-Vendor: Aspose Pty Ltd
Release-Date: 2013.01.31

and see the same issue. I am able to iterate through the paragraphs and runs as before; however, Aspose still reports that all runs are Times New Roman font. When the document is displayed in Word, the font is Tahoma.

Hi Chris,

Thanks for your inquiry. Please note that a document can contain different types of styles: Paragraph, Character, List, and Table. A style allows you to define a set of formatting that can be reused on many elements in a document. A style loaded into a document is represented in the Aspose.Words DOM by the Style class.

The style type of “Body Text” is Paragraph. The following code snippet prints Tahoma 12.0, which is the correct output. Please see the attached image.

Document doc = new Document(MyDir + "mergesmall.docx");

System.out.println(doc.getStyles().get("Body Text").getFont().getName());
System.out.println(doc.getStyles().get("Body Text").getFont().getSize());

If you iterate through all Run nodes and read the font information through the parent paragraph's ParagraphFormat, you get Tahoma 12.0, which is the correct output. Please see the attached image.

Document doc = new Document(MyDir + "mergesmall.docx");
NodeCollection paras = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph para : (Iterable<Paragraph>)paras)
{
    for (Run run : (Iterable<Run>)para.getChildNodes(NodeType.RUN, true))
    {
        System.out.print(run.getParentParagraph().getParagraphFormat().getStyleName());
        System.out.print(run.getParentParagraph().getParagraphFormat().getStyle().getFont().getName());
        System.out.print(run.getParentParagraph().getParagraphFormat().getStyle().getFont().getSize());
    }
}

If you iterate through all Run nodes and read the font information from the Run itself, you get Tahoma 8.5, which is the correct output.

Document doc = new Document(MyDir + "mergesmall.docx");
NodeCollection paras = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph para : (Iterable<Paragraph>)paras)
{
    for (Run run : (Iterable<Run>)para.getChildNodes(NodeType.RUN, true))
    {
        System.out.print(run.getFont().getName());
        System.out.println(run.getFont().getSize());
    }
}

Please run the above code on your side with the same document and share your findings with us.

*backprop:

Style: #currNode.getParagraphFormat().getStyleName()#,
Line spacing: #currNode.getParagraphFormat().getLineSpacing()#,
Line spacing rule: #currNode.getParagraphFormat().getLineSpacingRule()#,
Font name: #currNode.getParagraphFormat().getStyle().getFont().getName()#,
Font size: #currNode.getParagraphFormat().getStyle().getFont().getSize()#*

Could you please tell us the NodeType of currNode in the code above?
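To illustrate why the style-level query and the run-level query above report different sizes (12.0 versus 8.5), here is a minimal sketch in plain Java. It is a toy model, not the Aspose.Words API: the class `FontLevels` and the method `effectiveSize` are hypothetical names, the document default of 10.0 is an arbitrary placeholder, and the lookup order (run formatting, then paragraph style, then document default) is a simplification that leaves out character styles.

```java
import java.util.Optional;

// Toy model, NOT the Aspose.Words API: each level may or may not set a font size.
final class FontLevels {

    // Effective size = the first value that is set, checking the run's own
    // formatting first, then the paragraph style, then the document default.
    static double effectiveSize(Optional<Double> runDirect,
                                Optional<Double> paragraphStyle,
                                double documentDefault) {
        return runDirect.orElse(paragraphStyle.orElse(documentDefault));
    }

    public static void main(String[] args) {
        // Sizes taken from this thread: the style says 12.0, the runs carry 8.5.
        Optional<Double> styleSize = Optional.of(12.0);
        Optional<Double> runSize = Optional.of(8.5);
        double assumedDefault = 10.0; // placeholder value, an assumption

        // Asking the style alone never sees the run-level override.
        System.out.println(effectiveSize(Optional.empty(), styleSize, assumedDefault));
        // Asking the run takes the override into account.
        System.out.println(effectiveSize(runSize, styleSize, assumedDefault));
    }
}
```

In this model both answers are "correct"; they simply answer different questions, which is why the size shown in Word should be compared with the run-level value.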

Thank you for your reply. When I extract the nodes as you did, I get the same results you do!

Perhaps you can help me understand why it is working differently? The differences are marked in the code in red below. I was looping through the sections, and within each section, looping through the nodes, and within each node, the runs. When I did this, the font on the runs was always Times New Roman.

When I get the Node collection by using getChildNodes() as you suggest, but otherwise leave the code the same, each run says Tahoma.

So what is different between using

doc.getChildNodes().iterator()
vs.
doc.getSections() ->section.getBody().iterator()

??

I will try to make your method work (I will need to find and move all nodes, not just paragraphs), but I’m curious why extracting runs one way shows Times, and getting to runs the other way indicates Tahoma.

Thank you for your explanation.

  • Found a section in the merge document

<cfif currNode.getNodeType() neq "HEADER_FOOTER">

Found a node: #currNode.getNodeType()#

#htmleditformat(currNode.toTxt())#

<cfif currNode.getNodeType() eq NodeTypes.PARAGRAPH>

WAS:
Style: #currStyle#,
Line spacing: #currNode.getParagraphFormat().getLineSpacing()#,
Line spacing rule: #currNode.getParagraphFormat().getLineSpacingRule()#,
Font name: #currNode.getParagraphFormat().getStyle().getFont().getName()#,
Font size: #currNode.getParagraphFormat().getStyle().getFont().getSize()#

Made up of:

  • #run.getText()#

Font name: #run.getFont().getName()#,
Font size: #run.getFont().getSize()#,
Font bold: #run.getFont().getBold()#,
Font color: #run.getFont().getColor().getRGB()#,
Font style identifier: #run.getFont().getStyleIdentifier()#

<cfset currNode.getParagraphFormat().getStyle().getFont().clearFormatting()>
<!--- <cfset currNode.getParagraphFormat().clearFormatting()> --->
<cfelseif currNode.getNodeType() eq NodeTypes.TABLE>

Hi Chris,

Thanks for your inquiry. Document.getSections() returns a SectionCollection. A Section can have one Body and at most one HeaderFooter of each HeaderFooterType. A minimal valid section needs to have a Body with one Paragraph.

Document.getChildNodes() returns a NodeCollection of all immediate child nodes of the document node, and Section.getBody() returns the Body child node of the section. Please see the following documentation links for reference.
https://docs.aspose.com/words/java/logical-levels-of-nodes-in-a-document/
http://www.aspose.com/docs/display/wordsjava/Composition+Diagrams
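As a rough illustration of the difference between visiting only immediate children (as the section and body iterators do) and collecting nodes at any depth (as the earlier `getChildNodes(NodeType.PARAGRAPH, true)` call does), here is a minimal sketch in plain Java. It uses a toy tree, not the Aspose.Words DOM: `ToyNode`, `countImmediate`, `countDeep`, and `sampleBody` are hypothetical names, and the sample layout (one top-level paragraph plus one paragraph inside a table cell) is invented for the example. It only shows which nodes each kind of loop reaches; it does not by itself explain a difference in reported fonts.

```java
import java.util.ArrayList;
import java.util.List;

// Toy node tree, NOT the Aspose.Words DOM: a node has a type label and children.
final class ToyNode {
    final String type;
    final List<ToyNode> children = new ArrayList<>();

    ToyNode(String type, ToyNode... kids) {
        this.type = type;
        for (ToyNode k : kids) {
            children.add(k);
        }
    }

    // Shallow: count only immediate children of the given type.
    int countImmediate(String wanted) {
        int n = 0;
        for (ToyNode c : children) {
            if (c.type.equals(wanted)) {
                n++;
            }
        }
        return n;
    }

    // Deep: count every descendant of the given type, at any depth.
    int countDeep(String wanted) {
        int n = 0;
        for (ToyNode c : children) {
            if (c.type.equals(wanted)) {
                n++;
            }
            n += c.countDeep(wanted);
        }
        return n;
    }

    // A body with one top-level paragraph and one paragraph nested in a table cell.
    static ToyNode sampleBody() {
        ToyNode cell = new ToyNode("CELL", new ToyNode("PARAGRAPH"));
        ToyNode table = new ToyNode("TABLE", new ToyNode("ROW", cell));
        return new ToyNode("BODY", new ToyNode("PARAGRAPH"), table);
    }

    public static void main(String[] args) {
        ToyNode body = sampleBody();
        System.out.println(body.countImmediate("PARAGRAPH")); // top-level paragraphs only
        System.out.println(body.countDeep("PARAGRAPH"));      // includes the one in the table
    }
}
```

In this toy layout, a loop over the body's immediate children has to descend into the table itself to reach the nested paragraph, whereas a deep collection finds it directly.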

backprop:
Perhaps you can help me understand why it is working differently? The differences are marked in the code in red below. I was looping through the sections, and within each section, looping through the nodes, and within each node, the runs. When I did this, the font on the runs was always Times New Roman.

I have tested the same scenario and could not reproduce the font issue; the run nodes report Tahoma 8.5. Please check the following Java code snippet.

Document doc = new Document(MyDir + "mergesmall.docx");

Iterator si = doc.getSections().iterator();
while (si.hasNext())
{
    Section s = (Section)si.next();
    Iterator iterator = s.getBody().iterator();
    while (iterator.hasNext())
    {
        Node currNode = (Node)iterator.next();
        if (currNode.getNodeType() == NodeType.PARAGRAPH)
        {
            Paragraph para = (Paragraph)currNode;
            System.out.print(para.getParagraphFormat().getStyle().getFont().getName());
            System.out.print(para.getParagraphFormat().getStyle().getFont().getSize());
            System.out.println("Runs...");
            Iterator ri = para.getRuns().iterator();
            while (ri.hasNext())
            {
                Run run = (Run)ri.next();
                System.out.print(run.getFont().getName());
                System.out.println(run.getFont().getSize());
                System.out.println(run.getText());
                System.out.println("******");
            }
        }
    }
}