Aspose.Words for Java: style problem!

Hi, I have a problem keeping the style when inserting a temporary document into a bookmark.

I made a test case for you:
CopyStyleBookmark.java
I take the content of a bookmark (source) and create a temporary document from it. At this point, the style is still correct in the temporary document.
When I reinsert it into the destination bookmark, I lose the style!

source.docx: source file that contains my two bookmarks
result.docx: result that I get after the process
wanted.docx: my desired result!
temp.docx: content of the generated temporary document

Thank you!

Hi Gilles,

Thanks for your inquiry. In your case, I suggest you use the insertDocument method shared at the following documentation link, with ImportFormatMode.USE_DESTINATION_STYLES as the import format mode.
https://docs.aspose.com/words/java/insert-and-append-documents/

I have used the insertDocument method in your shared code, and the output document is the same as your expected output. Please see the following code snippet. I have attached the output document to this post for your reference.

extractedNodesInclusive = extractContent(bookmarkStart, bookmarkEnd, false);
dstDoc = generateDocument(myDocument, extractedNodesInclusive);
dstDoc.save(dataDir + "temp.docx");
myDocumentBuilder.moveToBookmark(key, false, true);
// If I don't add this line, the program crashes; when I add it, the line break is gone. Why?
destBmark.setText("");
// insertDocumentAtBookmark(destBmark.getName(),myDocument, dstDoc);
insertDocument(destBmark.getBookmarkStart().getParentNode(), dstDoc);
sourceBmark.setText("");
sourceBmark.remove();

Hi, and thank you for your answer!

But in my case, I have to use the insertDocumentAtBookmark() function.
This is just a simple case; my real process is much more complicated, and the insertDocument function does not suit my needs! It causes problems with line breaks, etc.

As you can see in my previous posts, the Aspose team gave me the insertDocumentAtBookmark() function to fix some unwanted behaviors and bugs!
So it has to work with that function!

Thank you again and hope to hear from you soon!

Hi Gilles,

Thanks for your inquiry. I am working on your query and will update you as soon as possible.

Hi Gilles,

Thanks for your patience. You are inserting the contents of one document into another. In this case, the insertDocumentAtBookmark method does not work correctly. Please note that the NodeImporter class lets you efficiently perform repeated imports of nodes from one document into another.

Aspose.Words provides functionality for easily copying and moving fragments between Microsoft Word documents. This is known as “importing nodes”. Before you can insert a fragment from one document into another, you need to “import” it. Importing creates a deep clone of the original node, ready to be inserted into the destination document.

The simplest way to import a node is to use the importNode method provided by the DocumentBase class.

However, when you need to import nodes from one document into another multiple times, it is better to use the NodeImporter class, which minimizes the number of styles and lists created in the destination document.
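To picture why one shared importer can keep the style count down, here is a toy model in plain Java. It is not Aspose.Words code: StyleImportModel and its methods are hypothetical, and it assumes that every import clashes with an existing, differently formatted style of the same name, so an independent import adds a renamed copy each time, while a shared importer remembers the mapping it already made. The "_0", "_1" naming scheme is also an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Toy model, not Aspose.Words code: counts destination styles created when
 * the same source style is imported repeatedly, with and without a shared cache.
 */
class StyleImportModel {
    // Style names that already exist in the destination document.
    final Set<String> destinationStyles = new HashSet<>();
    // Source style -> destination style, remembered by a shared "importer".
    private final Map<String, String> cache = new HashMap<>();

    StyleImportModel(String... existingStyles) {
        for (String s : existingStyles) {
            destinationStyles.add(s);
        }
    }

    /** Independent import: every call that clashes adds another renamed copy. */
    String importWithoutCache(String sourceStyle) {
        return addRenamedCopy(sourceStyle);
    }

    /** Shared importer: a given source style is copied at most once. */
    String importWithCache(String sourceStyle) {
        return cache.computeIfAbsent(sourceStyle, this::addRenamedCopy);
    }

    // Adds the name if free, else "Name_0", "Name_1", ... (assumed naming scheme).
    private String addRenamedCopy(String name) {
        String candidate = name;
        int i = 0;
        while (destinationStyles.contains(candidate)) {
            candidate = name + "_" + i++;
        }
        destinationStyles.add(candidate);
        return candidate;
    }

    public static void main(String[] args) {
        StyleImportModel separate = new StyleImportModel("Heading 1");
        StyleImportModel shared = new StyleImportModel("Heading 1");
        for (int n = 0; n < 3; n++) {
            separate.importWithoutCache("Heading 1");
            shared.importWithCache("Heading 1");
        }
        System.out.println(separate.destinationStyles.size()); // 4
        System.out.println(shared.destinationStyles.size());   // 2
    }
}
```

The real importer also has to decide whether formatting actually differs and which ImportFormatMode applies; the model only isolates the caching idea.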

infogt2000:

But in my case, I have to use the insertDocumentAtBookmark() function.

This is just a simple case; my real process is much more complicated, and the insertDocument function does not suit my needs! It causes problems with line breaks, etc.

It would be great if you could share the following details for investigation purposes. We will then provide you more information along with code.

  • Please supply us with the input document
  • Please supply us with the output document showing the undesired behavior
  • Please supply us with the expected document showing the desired behavior (You can create this document using Microsoft Word).

Hi and thank you for your support.
Here is a more complicated case!

input.docx: input document
result.docx: output document
excepted output.docx: expected result!

For this one, the process is pretty straightforward!
Everything is explained in the DetailProcess.java class!

Thank you and hope to hear from you soon!

Hi Gilles,

Thanks for your patience. I have worked with your documents and code and found that you are extracting the contents between bookmark nodes and inserting them into a different destination bookmark, e.g. name1.

Secondly, you want the newly inserted contents to use the same font formatting as the destination bookmark, as shown in the expected document.

In this case, you first need to get the font formatting of the destination bookmark and then apply it to the newly inserted contents. Please use the following method to get the font of the first Run node inside the destination bookmark.

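public static Font getBookmarkFont(BookmarkStart bStart, Document doc) throws Exception
{
    // Walk forward in document order from the BookmarkStart and return the font
    // of the first Run found. nextPreOrder returns null at the end of the document,
    // so check for null before calling getNodeType().
    Node curNode = bStart.nextPreOrder(doc);
    while (curNode != null)
    {
        if (curNode.getNodeType() == NodeType.RUN)
        {
            return ((Run)curNode).getFont();
        }
        curNode = curNode.nextPreOrder(doc);
    }
    return null;
}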

Please use this method as shown in the following code snippet. I have attached the modified insertDocumentAtBookmark method to this post for your reference.

// We generate a temporary aspose document
dstDoc = generateDocument(myDocument, extractedNodesInclusive);
dstDoc.save(dataDir + "\\temp_doc\\" + key + ".docx");
Font font = getBookmarkFont(destBmark.getBookmarkStart(), myDocument);
// We erase the line of text in the destination bookmark ("destination")
myDocument.getRange().replace(val, "", false, false);
// Insertion of the temporary doc inside the bookmark destination
insertDocumentAtBookmark(destBmark.getName(), myDocument, dstDoc, font);

Hi, and thank you for your support.
I have tested your code.
In fact, it does work for this case!
But in my real document, a bookmark can be inside another one,
and the way I loop over the bookmarks makes me lose all the formatting.
For example, I loop over the child bookmarks first and then over the parent ones, so each time I process a parent, its style is applied to all the text of its child bookmarks.
Also, my bookmarks can contain more than one style, and your code picks the first style found in the bookmark.
So forcing the style this way will not work for me.

Maybe I could give you a more complicated test case!
Let me get back to you on this.
Thank you.

Hi Gilles,

Thanks for your inquiry. Please note that Aspose.Words mimics MS Word's behavior. For example, if you move the cursor to the bookmark ‘name1’ of your input document and write some content, MS Word and Aspose.Words use the same formatting for the newly inserted text.

You are using the insertDocument method in your code to insert the extracted contents. If you want to insert the extracted contents of a bookmark with new formatting, e.g. the formatting of the bookmark ‘name1’ in input.docx, you can insert the new contents with the DocumentBuilder.write method and the new font formatting.

As per my understanding, you want to get the text from one bookmark (e.g. bookmark1) in the same document and insert it into another bookmark (e.g. bookmark2), and the new text at bookmark2 should have the same font formatting as bookmark2. If this is the case, please use the DocumentBuilder.moveToBookmark method to move the cursor to the bookmark and insert the extracted text.

infogt2000:

But in my real doc, a bookmark can be inside another one.

And the way I loop over the bookmarks makes me lose all the formatting.

It would be great if you could share your input document and expected output document, along with more details of your complete scenario. Please create your expected Word document manually using Microsoft Word and attach it here for our reference. We will investigate how you want your final Word output to look and will then provide you more information along with code.

Hi!
As I said, I have built another test case for you.
This time, I work with bigger bookmarks that contain different styles, lists, etc.

The .rar file contains:
TextDetailedProcess.java: Java class that works with the bookmarks
input.docx: the input file for the test
result.docx: the output of the test
excepted output.docx: the desired output of the test

Process:

  1. I get the “Paragraph1” bookmark content and save it in a temporary document.
    Then I loop 3 times and add 3 paragraphs with the “Paragraph1” content.
    Finally, I remove the content of “Paragraph1”.

  2. I insert the “Paragraph1” temporary document content into the “Paragraph2” bookmark.

You will see that some undesired styling occurs.
I'll be waiting for your response.
Thank you again!

Hi Gilles,

Thanks for your inquiry. As I mentioned in my last post, Aspose.Words mimics MS Word's behavior. If you do the same thing in MS Word, you will get the same output. Please check the attached images for your reference.

Please copy the first paragraph of input.docx (“Morbi pulvinar ligula”), paste it at the end of any list item, and check MS Word's behavior.

You are inserting the contents of the bookmark Paragraph1 after the BookmarkEnd node. In this case, the text is inserted into the same paragraph (“EST IN BLANDIT VARIUSMorbi pulvinar ligula”), so the newly inserted text ends up as multiple Run nodes in one Paragraph. Please see the attached DOM.png for details. In this case, you have to format the Run nodes according to your requirements, as shown in your expected output document. Hope this answers your query. Please let us know if you have any more queries.

Hi!
Can I format Run nodes with a regex after inserting the content into the bookmark?
Example: format the sentence “Morbi pulvinar ligula” with the style xxxx.
Is there any chance that Word's behavior will override my styling?
Thank you!

Hi Gilles,

Thanks for your inquiry. Yes, you can achieve your requirement by implementing the IReplacingCallback interface. Please use the approach shared at the following documentation link.
https://docs.aspose.com/words/java/find-and-replace/

Please modify the code of the ReplaceEvaluatorFindAndHighlight class by replacing its highlighting loop as shown in the following code snippet. Hope this helps. Please let us know if you have any more queries.

// Original loop (replace it with the modified loop below):
// for (Run run : (Iterable<Run>)runs)
//     run.getFont().setHighlightColor(Color.YELLOW);

// Modified loop: highlight and restyle all runs in the sequence.
for (Run run : (Iterable<Run>)runs)
{
    run.getFont().setHighlightColor(Color.YELLOW);
    run.getFont().setName("Arial");
    run.getFont().setSize(16);
    run.getFont().setColor(Color.BLUE);
}

Thank you for the help.
Here is what I got:
I added this code right after the insertDocumentAtBookmark call:

// Every Run of the temporary document (getChildNodes with NodeType.RUN returns only runs).
NodeCollection children = dstDoc.getChildNodes(NodeType.RUN, true);
for (Run run : (Iterable<Run>)children)
{
    // Search the destination document for this run's text. Pattern.quote makes the
    // text literal, so characters such as '(' or '.' are not treated as regex syntax.
    Pattern regex = Pattern.compile(Pattern.quote(run.getText()), Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    try
    {
        myDocument.getRange().replace(regex, new FindAndStyle(run), false);
    }
    catch (Exception ex)
    {
        System.out.println(ex.getMessage());
    }
}

And this is the class I used:

class FindAndStyle implements IReplacingCallback
{
    private final Run thisRun;

    public FindAndStyle(Run run)
    {
        this.thisRun = run;
    }

    /* This method is called by the Aspose.Words find and replace engine for each match.
     * It applies the style and size of thisRun to the matched text, even if the match spans multiple runs.
     */
    public int replacing(ReplacingArgs e) throws Exception
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.getMatchNode();

        // The first (and maybe the only) run can contain text before the match;
        // in this case it is necessary to split the run.
        if (e.getMatchOffset() > 0)
            currentNode = splitRun((Run)currentNode, e.getMatchOffset());

        // This list stores all runs of the match for further formatting.
        ArrayList<Run> runs = new ArrayList<Run>();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.getMatch().group().length();
        while ((remainingLength > 0) &&
               (currentNode != null) &&
               (currentNode.getText().length() <= remainingLength))
        {
            runs.add((Run)currentNode);
            remainingLength -= currentNode.getText().length();

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.getNextSibling();
            }
            while ((currentNode != null) && (currentNode.getNodeType() != NodeType.RUN));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            splitRun((Run)currentNode, remainingLength);
            runs.add((Run)currentNode);
        }

        // Now apply the style and size of thisRun to all runs in the sequence.
        for (Run run : runs)
        {
            // if (run.getParentNode().equals(thisRun.getParentNode())) {
            run.getFont().setStyleName(thisRun.getFont().getStyleName());
            run.getFont().setSize(thisRun.getFont().getSize());
            // }
        }

        // Signal to the replace engine to do nothing because we have already done all we wanted.
        return ReplaceAction.SKIP;
    }

    /* Splits the text of the specified run into two runs.
     * Inserts the new run just after the specified run.
     */
    private static Run splitRun(Run run, int position) throws Exception
    {
        Run afterRun = (Run)run.deepClone(true);
        afterRun.setText(run.getText().substring(position));
        run.setText(run.getText().substring(0, position));
        run.getParentNode().insertAfter(afterRun, run);
        return afterRun;
    }
}
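The core of replacing() is isolating the match into whole runs: split the first run at the match offset, take whole runs while they fit inside the match, and split the last run at the remaining length. A plain-Java model of that step, in which each String stands for one Run's text; non-run siblings such as BookmarkStart (skipped by the do/while loop) are not modeled, and RunSplitModel and isolateMatch are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the run-splitting step; each String is one Run's text. */
class RunSplitModel {

    /**
     * Splits runs so that the match of the given length, starting at offset
     * inside runs.get(first), is covered by whole runs; returns their indices.
     */
    static List<Integer> isolateMatch(List<String> runs, int first, int offset, int length) {
        int index = first;
        if (offset > 0) {
            // Like splitRun(): keep the text before the match, insert the rest after it.
            splitAt(runs, index, offset);
            index++;
        }
        List<Integer> matched = new ArrayList<>();
        int remaining = length;
        // Take whole runs while they fit entirely inside the match.
        while (remaining > 0 && index < runs.size() && runs.get(index).length() <= remaining) {
            matched.add(index);
            remaining -= runs.get(index).length();
            index++;
        }
        // The last run holds the end of the match plus trailing text: split it.
        if (remaining > 0 && index < runs.size()) {
            splitAt(runs, index, remaining);
            matched.add(index);
        }
        return matched;
    }

    private static void splitAt(List<String> runs, int index, int position) {
        String text = runs.get(index);
        runs.set(index, text.substring(0, position));
        runs.add(index + 1, text.substring(position));
    }

    public static void main(String[] args) {
        List<String> runs = new ArrayList<>(Arrays.asList("EST IN Mor", "bi pulvinar", " ligula end"));
        List<Integer> hit = isolateMatch(runs, 0, 7, "Morbi pulvinar ligula".length());
        for (int i : hit) {
            System.out.print("[" + runs.get(i) + "]");
        }
        System.out.println(); // [Mor][bi pulvinar][ ligula]
    }
}
```

Because the match is located by text, this step by itself cannot tell a run that came from the temporary document apart from an identical run elsewhere in the destination.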

Unfortunately, because I'm looping over runs, every run with matching text gets formatted.
For example, if I find a run with the text “Mo” and the style “Heading1” in the temporary document, every other run in the final document with the text “Mo” will have its style changed.
Is there a way to know whether a run in the temporary document is the same as the one in the destination document?
Or maybe a function that returns an ArrayList of styled sentences?

How can I solve this problem?
Thank you!
(I have attached the output of the code: result.docx)
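Because the search pattern is built from run text, it matters whether that text is compiled as a regular expression or as a literal. Compiled raw, characters such as '.', '(' or '?' either match more than intended or make compilation fail (in the loop above, the catch block would only print the message). A minimal plain-Java sketch of the difference, using the standard java.util.regex.Pattern.quote; LiteralPatternDemo and its methods are illustrative names:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LiteralPatternDemo {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;

    /** Treats the run text as a regular expression. */
    static boolean findAsRegex(String runText, String documentText) {
        return Pattern.compile(runText, FLAGS).matcher(documentText).find();
    }

    /** Treats the run text as a literal: metacharacters lose their special meaning. */
    static boolean findAsLiteral(String runText, String documentText) {
        return Pattern.compile(Pattern.quote(runText), FLAGS).matcher(documentText).find();
    }

    public static void main(String[] args) {
        // '.' matches any character, so "10.5" is also found inside "1035".
        System.out.println(findAsRegex("10.5", "1035"));   // true
        System.out.println(findAsLiteral("10.5", "1035")); // false
        try {
            findAsRegex("Price (net", "Price (net): 10");
        } catch (PatternSyntaxException ex) {
            // An unbalanced '(' cannot even be compiled as a regex.
            System.out.println("regex error: " + ex.getDescription());
        }
        System.out.println(findAsLiteral("Price (net", "Price (net): 10")); // true
    }
}
```

Quoting fixes false matches caused by metacharacters only; identical literal text elsewhere in the document is still found.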

Hey, me again.
So I have been able to build a kind of “styled sentences” function:

public static ArrayList<Object[]> styledSentences(Document doc) throws Exception
{
    // Each entry is {styleName, color, size, bold, text} for a sequence of
    // consecutive runs that share the same formatting.
    ArrayList<Object[]> sentencesWithStyle = new ArrayList<Object[]>();
    // Get all runs from the document.
    NodeCollection runs = doc.getChildNodes(NodeType.RUN, true);
    // The group currently being built; null until the first run is seen,
    // so no blank placeholder entry is added to the result.
    Object[] data = null;

    for (Run run : (Iterable<Run>)runs)
    {
        if (data != null
                && run.getFont().getStyleName().equals(data[0])
                && run.getFont().getColor().equals(data[1])
                && run.getFont().getSize() == (Double)data[2]
                && run.getFont().getBold() == (Boolean)data[3])
        {
            // Same formatting as the current group: append the text.
            data[4] = (String)data[4] + run.getText();
        }
        else
        {
            // Formatting changed: close the current group and start a new one.
            if (data != null)
                sentencesWithStyle.add(data);
            data = new Object[5];
            data[0] = run.getFont().getStyleName();
            data[1] = run.getFont().getColor();
            data[2] = run.getFont().getSize();
            data[3] = run.getFont().getBold();
            data[4] = run.getText();
        }
    }

    // The loop never closes the last group, so add it here.
    if (data != null)
        sentencesWithStyle.add(data);

    return sentencesWithStyle;
}
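Grouping consecutive runs by formatting has two easy-to-miss edge cases: the result should not start with a placeholder entry, and the last group has to be added after the loop ends. A plain-Java model of the grouping, where RunInfo is a hypothetical stand-in for the Run/Font properties being compared (color modeled as an RGB int):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Plain-Java model of the grouping step: one RunInfo per Run, in document order. */
class StyledSentenceModel {

    static final class RunInfo {
        final String styleName;
        final int rgb;       // color modeled as an RGB int
        final double size;
        final boolean bold;
        final String text;

        RunInfo(String styleName, int rgb, double size, boolean bold, String text) {
            this.styleName = styleName;
            this.rgb = rgb;
            this.size = size;
            this.bold = bold;
            this.text = text;
        }

        /** Compares only the formatting attributes, not the text. */
        boolean sameFormatAs(RunInfo other) {
            return Objects.equals(styleName, other.styleName) && rgb == other.rgb
                    && Double.compare(size, other.size) == 0 && bold == other.bold;
        }

        RunInfo withText(String newText) {
            return new RunInfo(styleName, rgb, size, bold, newText);
        }
    }

    /** Merges consecutive runs with identical formatting into one group each. */
    static List<RunInfo> group(List<RunInfo> runs) {
        List<RunInfo> groups = new ArrayList<>();
        RunInfo current = null; // no placeholder entry before the first run
        for (RunInfo run : runs) {
            if (current != null && current.sameFormatAs(run)) {
                current = current.withText(current.text + run.text);
            } else {
                if (current != null) {
                    groups.add(current); // formatting changed: close the previous group
                }
                current = run;
            }
        }
        if (current != null) {
            groups.add(current); // the loop never closes the last group
        }
        return groups;
    }

    public static void main(String[] args) {
        List<RunInfo> runs = Arrays.asList(
                new RunInfo("Normal", 0x000000, 11, false, "Morbi "),
                new RunInfo("Normal", 0x000000, 11, false, "pulvinar"),
                new RunInfo("Heading 1", 0x1F497D, 14, true, "Title"));
        for (RunInfo g : group(runs)) {
            System.out.println(g.styleName + ": " + g.text);
        }
    }
}
```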

I’m calling it this way:

ArrayList<Object[]> styledSentences = styledSentences(dstDoc);
for (Object[] dataArr : styledSentences)
{
    if (dataArr[4] == null) continue;
    // Match the text literally so regex metacharacters in the document are not interpreted.
    Pattern regex = Pattern.compile(Pattern.quote((String)dataArr[4]), Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    try
    {
        myDocument.getRange().replace(regex, new FindAndStyle(dataArr), false);
    }
    catch (Exception ex)
    {
        System.out.println(ex.getMessage());
    }
}

The formatting code in the FindAndStyle class:

run.getFont().setStyleName((String)data[0]);
run.getFont().setColor((Color)data[1]);
run.getFont().setSize((Double)data[2]);
run.getFont().setBold((Boolean)data[3]);

It works for some sentences, but not for the bulleted list, and it hurts the performance of my program far too much! Also, the same text can appear in several places with different formatting, so I would need a range object for my bookmarks, which I don't think exists.
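The “Mo” problem comes from locating runs by their text. One alternative is to keep references to the runs as they are inserted and pair each with its source run by position, so nothing else in the document is touched and no regex search is needed. The toy model below is plain Java, not Aspose.Words code: Run, insert and restoreStyles are hypothetical, and it assumes the insertion step can return the nodes it created in source order. In a real document the merge at the bookmark end can split or join runs, so that one-to-one pairing would need to be checked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model: pair inserted runs with their source runs by position, not by text. */
class TrackedInsertModel {

    static final class Run {
        final String text;
        String style;

        Run(String text, String style) {
            this.text = text;
            this.style = style;
        }
    }

    /**
     * Inserts copies of the source runs at position. To mimic the behavior described
     * in the thread, each copy takes the destination style. Returns the copies in
     * source order, so the caller knows exactly which runs were inserted.
     */
    static List<Run> insert(List<Run> destination, int position, List<Run> source, String destinationStyle) {
        List<Run> inserted = new ArrayList<>();
        for (Run r : source) {
            inserted.add(new Run(r.text, destinationStyle));
        }
        destination.addAll(position, inserted);
        return inserted;
    }

    /** Re-applies each source run's style to its inserted copy; other runs are untouched. */
    static void restoreStyles(List<Run> inserted, List<Run> source) {
        for (int i = 0; i < inserted.size(); i++) {
            inserted.get(i).style = source.get(i).style;
        }
    }

    public static void main(String[] args) {
        List<Run> document = new ArrayList<>(Arrays.asList(new Run("Mo", "Normal"), new Run(" end", "Normal")));
        List<Run> temp = Arrays.asList(new Run("Mo", "Heading1"), new Run("rbi", "Emphasis"));
        restoreStyles(insert(document, 1, temp, "Normal"), temp);
        for (Run r : document) {
            System.out.println("[" + r.text + "] " + r.style);
        }
    }
}
```

The pre-existing “Mo” run keeps its Normal style, because only the returned copies are restyled.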

Hi Gilles,

Thanks for your inquiry. Please note that Aspose.Words mimics MS Word's behavior. Please check my earlier reply here:
https://forum.aspose.com/t/51555

I have worked with your documents and code again and would like to share another solution. You can achieve the result shown in ‘excepted output.docx’ by inserting new Run nodes (e.g. Run nodes with the text ‘Morbi pulvinar ligula’) after the BookmarkEnd node.

I have used the insertDocument method in the following code example to insert the document. You can find the insertDocument method here:
https://docs.aspose.com/words/java/insert-and-append-documents/

I have attached the input and output documents to this post for your reference. The following code example inserts the text ‘Morbi pulvinar ligula’ after the text ‘EST IN BLANDIT VARIUS’ and keeps the original font formatting of ‘Morbi pulvinar ligula’. Hope this helps.

Document myDocument = new Document(MyDir + "input.docx");
DocumentBuilder builder = new DocumentBuilder(myDocument);
String[] names = new String[] { "Paragraph1" };
for (String name : names)
{
    Bookmark sourceBmark = myDocument.getRange().getBookmarks().get(name);
    BookmarkStart bookmarkStart = sourceBmark.getBookmarkStart();
    BookmarkEnd bookmarkEnd = sourceBmark.getBookmarkEnd();
    // Copy the bookmarked content into a temporary document.
    ArrayList extractedNodesInclusive = extractContent(bookmarkStart, bookmarkEnd, false);
    Document dstDoc = generateDocument(myDocument, extractedNodesInclusive);
    builder.moveTo(bookmarkEnd);
    // The paragraph before the one that contains the BookmarkEnd.
    Paragraph currentParagraph = (Paragraph)bookmarkEnd.getAncestor(NodeType.PARAGRAPH).getPreviousSibling();
    // Insert the temporary document after currentParagraph.
    insertDocument(currentParagraph, dstDoc);
    if (currentParagraph.getNextSibling() != null)
    {
        // Move the runs of the first inserted paragraph into currentParagraph,
        // copying their font formatting, then remove that paragraph.
        Paragraph nextPara = (Paragraph)currentParagraph.getNextSibling();
        for (Run run : nextPara.getRuns().toArray())
        {
            Run newRun = new Run(myDocument, run.toString(SaveFormat.TEXT));
            newRun.getFont().setSize(run.getFont().getSize());
            newRun.getFont().setName(run.getFont().getName());
            newRun.getFont().setColor(run.getFont().getColor());
            newRun.getFont().setBold(run.getFont().getBold());
            newRun.getFont().setItalic(run.getFont().getItalic());
            currentParagraph.appendChild(newRun);
        }
        nextPara.remove();
    }
}
myDocument.save(MyDir + "Out.docx");
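The merge step at the end of that example can be modeled in plain Java to make the idea explicit: the runs of the first inserted paragraph are appended to the previous paragraph, each with its own copy of the font attributes, and the emptied paragraph is removed. Font, Run and mergeNext are hypothetical stand-ins for the Aspose.Words classes, and only the attributes copied in the example (name, size, color as an RGB int, bold, italic) are modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the merge step; Font and Run are simplified stand-ins. */
class ParagraphMergeModel {

    static final class Font {
        final String name;
        final double size;
        final int rgb;
        final boolean bold;
        final boolean italic;

        Font(String name, double size, int rgb, boolean bold, boolean italic) {
            this.name = name;
            this.size = size;
            this.rgb = rgb;
            this.bold = bold;
            this.italic = italic;
        }
    }

    static final class Run {
        final String text;
        final Font font;

        Run(String text, Font font) {
            this.text = text;
            this.font = font;
        }
    }

    /**
     * Appends the runs of paragraph current + 1 to paragraph current, giving each
     * new run a copy of the same font attributes, then removes the emptied paragraph.
     */
    static void mergeNext(List<List<Run>> paragraphs, int current) {
        if (current + 1 >= paragraphs.size()) {
            return; // no following paragraph, nothing to merge
        }
        List<Run> target = paragraphs.get(current);
        for (Run run : paragraphs.get(current + 1)) {
            Font f = run.font;
            // Copy attribute by attribute, like the setSize/setName/... calls in the example.
            target.add(new Run(run.text, new Font(f.name, f.size, f.rgb, f.bold, f.italic)));
        }
        paragraphs.remove(current + 1);
    }

    public static void main(String[] args) {
        Font plain = new Font("Calibri", 11, 0x000000, false, false);
        Font heading = new Font("Cambria", 14, 0x1F497D, true, false);
        List<List<Run>> paragraphs = new ArrayList<>();
        paragraphs.add(new ArrayList<>(Arrays.asList(new Run("EST IN BLANDIT VARIUS", plain))));
        paragraphs.add(new ArrayList<>(Arrays.asList(new Run("Morbi pulvinar ligula", heading))));
        mergeNext(paragraphs, 0);
        for (Run r : paragraphs.get(0)) {
            System.out.println(r.text + " | " + r.font.name + " " + r.font.size + (r.font.bold ? " bold" : ""));
        }
    }
}
```

Any attribute not copied explicitly (underline, style name, and so on) would not carry over, which is worth keeping in mind when the source runs use more formatting than these five properties.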
