How to avoid a linebreak when I append a paragraph node?

Jagovi · September 1, 2022, 10:34am

Hi!
I am using Aspose in my Java project. I have a document with bookmarks and I need to get the content between them. The content comes back as Paragraph nodes, so when I append those nodes I get one paragraph per node (each with its line break).

Example, I have:

1-
RESUME:
TEST
1

But I would like to get:

1-RESUME: TEST 1

My code is the following:

try
{
    document = AsposeHelper.getTextBetweenBookmarks(inicio, fin, contenido.getFiledata());
    NodeImporter importador = new NodeImporter(document, builder.getDocument(),
            ImportFormatMode.KEEP_DIFFERENT_STYLES, AsposeHelper.getDefaultImportFormatOptions());
    Node importdo = importador.importNode(document.getFirstChild(), true);
    NodeCollection nodos = ((com.aspose.words.Body)((com.aspose.words.Section)importdo)
            .getChildNodes().get(0)).getChildNodes();

    Node nodo = nodos.get(0);

    Node nodoLast = null;
    int posicion = 0;
    while (nodo != null)
    {
        String cadena = nodo.getText();
        if (("\r".equals(nodo.getText()) || "\f".equals(nodo.getText())) && posicion == 0)
        {
            posicion++;
        }
        else
        {
            builder.getCurrentParagraph().getParentNode().appendChild(nodo);

        }

        nodo = nodos.get(posicion);
        if (nodo != null)
        {
            nodoLast = nodo;
        }
    }
    if (nodoLast != null)
    {
        builder.moveTo(nodoLast);
    }
}
catch (Exception e)
{
    LOG.error("Error recuperando contenido", e);
}

Thanks for your attention!

alexey.noskov · September 1, 2022, 6:17pm

@Jagovi In this case you should append only the paragraph content, not the paragraph itself. For example, see the following simple code:

Document doc = new Document("C:\\Temp\\in.docx");

// Get paragraphs in the document.
Iterable<Paragraph> paragraphs = doc.getFirstSection().getBody().getParagraphs();

// Create paragraph we will insert the content to.
Paragraph target = new Paragraph(doc);

// Put content into the target paragraph.
for (Paragraph p : paragraphs)
{
    while (p.hasChildNodes())
        target.appendChild(p.getFirstChild());
}

doc.getFirstSection().getBody().appendChild(target);

doc.save("C:\\Temp\\out.docx");

in.docx (12.1 KB)
out.docx (9.7 KB)

Jagovi · September 2, 2022, 6:45am

Thanks @alexey.noskov for your quick response. I think I didn’t explain myself very well. I will try to be more descriptive.
I have a marker with several sub-markers and I want to go through all of them without generating a new paragraph for each one (with its line breaks).
I have:

@@Marker1([zone1][zone2][zone3])

I need to treat each bookmark as a paragraph, but each sub-bookmark as an inline element with no paragraph or line breaks.

My document (.DOC) is like the following snippet:

[START_ZONE1]
RESUME:
[END_ZONE1]

[START_ZONE2]
Test
[END_ZONE2]

[START_ZONE3]
1
[END_ZONE3]

I get the following output:

RESUME:
Test
1

I expect to get as output:

RESUME: Test 1

(With no paragraph breaks before or after.)

Thank you very much and sorry if I haven’t explained myself correctly.

alexey.noskov · September 2, 2022, 6:24pm

@Jagovi Yes, I understood your requirements. My suggestion is to insert only the content of the extracted paragraph instead of the whole paragraph, just as I demonstrated in my simple code example. In this case there will be no redundant line breaks.
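
To illustrate why moving only the content avoids the extra breaks, here is a minimal plain-Java sketch. It does not use Aspose: `SimpleNode`, `paragraph` and `mergeInto` are hypothetical stand-ins, and the sketch assumes (as the `while (p.hasChildNodes())` loop in the earlier example implies) that appending a child detaches it from its previous parent.

```java
import java.util.ArrayList;
import java.util.List;

public class MoveChildren {
    // Hypothetical stand-in for a document node; not the Aspose Node class.
    public static class SimpleNode {
        public final String text;
        public SimpleNode parent;
        public final List<SimpleNode> children = new ArrayList<>();

        public SimpleNode(String text) { this.text = text; }

        public boolean hasChildNodes() { return !children.isEmpty(); }

        public SimpleNode getFirstChild() { return children.get(0); }

        // Appending detaches the child from its old parent first,
        // so a node lives in exactly one place at a time.
        public void appendChild(SimpleNode child) {
            if (child.parent != null) {
                child.parent.children.remove(child);
            }
            child.parent = this;
            children.add(child);
        }
    }

    // Builds a "paragraph" whose children are text runs.
    public static SimpleNode paragraph(String... runs) {
        SimpleNode p = new SimpleNode("");
        for (String run : runs) {
            p.appendChild(new SimpleNode(run));
        }
        return p;
    }

    // Moves the runs of every source paragraph into one target paragraph
    // and returns the target's combined text.
    public static String mergeInto(SimpleNode target, List<SimpleNode> paragraphs) {
        for (SimpleNode p : paragraphs) {
            // The loop ends because each append empties the source a bit more.
            while (p.hasChildNodes()) {
                target.appendChild(p.getFirstChild());
            }
        }
        StringBuilder sb = new StringBuilder();
        for (SimpleNode run : target.children) {
            sb.append(run.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<SimpleNode> source = List.of(
                paragraph("RESUME:", " "), paragraph("Test", " "), paragraph("1"));
        System.out.println(mergeInto(new SimpleNode(""), source));
    }
}
```

Only the runs travel to the target. The source paragraphs, which are what carry the paragraph breaks, are left behind empty.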

Jagovi · September 5, 2022, 1:04pm

Hi!

I am not able to make it work. Following the example doc:

[START_ZONE1]
RESUME:
[END_ZONE1]

[START_ZONE2]
Test
Test2
[END_ZONE2]

[START_ZONE3]
1
2
3
4
[END_ZONE3]

If I use @@ZONES([ZONE1][ZONE2][ZONE3])

I get the following if I try to get the 3 bookmarks:

RESUME:
TestTest2
1234

But not this:

 RESUME:TestTest21234

Between the sub-bookmarks I still get a line break. How can I remove it? I am sorry for insisting, but I can’t get my desired result.

alexey.noskov · September 5, 2022, 4:04pm

@Jagovi As I can see, you are using DocumentBuilder, so you can use a method like this:

public static void insertParaContent(DocumentBuilder builder, Paragraph para)
{
    while (para.hasChildNodes())
        builder.insertNode(para.getFirstChild());
}

Document src = new Document("C:\\Temp\\in.docx");

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

Iterable<Paragraph> paragraphs  = src.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph srcPara : paragraphs) {
    Paragraph targetPara = (Paragraph)doc.importNode(srcPara, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    insertParaContent(builder, targetPara);
}

doc.save("C:\\Temp\\out.docx");

For demonstration purposes all paragraphs from the source document are inserted into the target document.

Jagovi · September 6, 2022, 6:45am

Wow!! Now it is working! Thanks for your helpful attention.

Jagovi · October 26, 2022, 3:26pm

Hi!!
I am here again
I have found a case this code does not handle, and I can’t solve it. If the content to recover spans several paragraphs, this code puts all of it into a single paragraph.

If I have the next document content:

[START_ZONE1]
RESUME:
[END_ZONE1]

[START_ZONE2]
Test
Test2
[END_ZONE2]

[START_ZONE3]
1
2
345
6
[END_ZONE3]

And if I use @@ZONES([ZONE1][ZONE2][ZONE3]) I get:

RESUME:TestTest2123456

But I need:

RESUME:Test
Test21
2
345
6

If ZONE1, ZONE2 or ZONE3 contain line breaks or other content, I would like to keep them in my document. But I don’t want line breaks between ZONE1, ZONE2 and ZONE3.

Is this possible?

Thanks for your attention and sorry for my poor explanation when I created the thread.

alexey.noskov · October 26, 2022, 7:07pm

@Jagovi In this case content between tags is represented by several paragraphs, so you should modify your code like this:

public static void insertParaContent(DocumentBuilder builder, Paragraph[] paragraphs)
{
    for(int i=0; i<paragraphs.length; i++) {
        Paragraph para = paragraphs[i];

        while (para.hasChildNodes())
            builder.insertNode(para.getFirstChild());

        if(i<(paragraphs.length-1))
            builder.writeln();
    }
}

Alternatively, instead of paragraph breaks in your content you can use soft line breaks (Shift+Enter in MS Word). In this case all content between the tags will be represented by a single paragraph with soft line breaks and will be properly handled by the code suggested in my previous answer.

Jagovi · October 26, 2022, 8:45pm

Thanks for your quick response.

The call to insertParaContent was made from this code:

Document src = new Document("C:\\Temp\\in.docx");

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

Iterable<Paragraph> paragraphs  = src.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph srcPara : paragraphs) {
    Paragraph targetPara = (Paragraph)doc.importNode(srcPara, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    insertParaContent(builder, targetPara);
}

doc.save("C:\\Temp\\out.docx");

What should I do with this part of code?

alexey.noskov · October 27, 2022, 4:44am

@Jagovi The above code was provided for demonstration purposes. In your real scenario you extract the paragraphs between the [START_ZONEN] and [END_ZONEN] tags. With the modified method I provided in my previous answer, you should pass the array of extracted paragraphs as the last parameter of the insertParaContent method.
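
If the extracted paragraphs are only available as an Iterable, one possible way to get an array is to copy the elements into a list first. The sketch below is plain Java with strings standing in for paragraphs; the `toList` helper is a hypothetical name, not part of Aspose.Words.

```java
import java.util.ArrayList;
import java.util.List;

public class IterableToArray {
    // Copies every element of an Iterable into a new list, preserving order.
    public static <T> List<T> toList(Iterable<T> items) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        // Strings stand in for the extracted paragraphs here.
        Iterable<String> extracted = List.of("RESUME:", "Test", "Test2");

        // toArray(new String[0]) yields a typed array of the right length.
        String[] asArray = toList(extracted).toArray(new String[0]);
        System.out.println(asArray.length); // prints 3
    }
}
```

Assuming the same approach carries over, the paragraph case would look like `toList(paragraphs).toArray(new Paragraph[0])`.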

Jagovi · October 27, 2022, 9:01am

I have an iterable of paragraphs with the contents between the [START_ZONEN] and [END_ZONEN] tags, but I can’t convert it to an array to call the function:

Iterable<Paragraph> parrafos =documentoTextoEntreMarcadores.getChildNodes(NodeType.PARAGRAPH, true);

EDIT:

I have just fixed it. Thank you!

alexey.noskov · October 27, 2022, 12:17pm

@Jagovi You can modify the method like this to use an Iterable instead of an array:

public static void insertParaContent(DocumentBuilder builder, Iterable<Paragraph> paragraphs)
{
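    Iterator<Paragraph> iter = paragraphs.iterator();
    while (iter.hasNext())
    {
        Paragraph para = iter.next();

        // Move the paragraph's child nodes to the builder's current position.
        while (para.hasChildNodes())
            builder.insertNode(para.getFirstChild());

        // Add a paragraph break only between paragraphs, not after the last one.
        if (iter.hasNext())
            builder.writeln();
    }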
}
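
The intent, matching the `i < paragraphs.length - 1` check in the array version, is to emit a break only between paragraphs and never after the last one. A minimal plain-Java sketch of that pattern follows. Strings stand in for paragraphs and a StringBuilder stands in for DocumentBuilder, so `join` is an illustration only, not Aspose API.

```java
import java.util.Iterator;
import java.util.List;

public class BreakBetweenItems {
    // Appends each item's text; adds a break only when another item follows.
    public static String join(Iterable<String> paragraphs) {
        StringBuilder out = new StringBuilder();
        Iterator<String> iter = paragraphs.iterator();
        while (iter.hasNext()) {
            out.append(iter.next());
            // hasNext() after next() tells us whether this was the last item.
            if (iter.hasNext()) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(List.of("1", "2", "345", "6")));
    }
}
```

Checking `hasNext()` after `next()` plays the same role as comparing the index with `length - 1`, without needing to know the size up front.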

Jagovi · October 27, 2022, 2:16pm

We are very close to the final solution.
If the text between the marks starts with a table or an image, for example, I lose that element in the final document. How can I fix it?

Thanks @alexey.noskov

alexey.noskov · October 27, 2022, 6:55pm

@Jagovi Images should be handled properly using the provided code, since images are inline nodes (are children of paragraphs). Tables are block level nodes and are on the same level as paragraphs. So to handle them, you should modify the code like this:

public static void insertParaContent(DocumentBuilder builder, Iterable<Node> paragraphs)
{
    Iterator<Node> iter = paragraphs.iterator();
    while (iter.hasNext())
    {
        Node current = iter.next();
        // If the current node is a paragraph we process it as earlier.
        if (current.getNodeType() == NodeType.PARAGRAPH) {
            Paragraph para = (Paragraph)current;

            while (para.hasChildNodes())
                builder.insertNode(para.getFirstChild());

            // Add a paragraph break only when more nodes follow.
            if (iter.hasNext())
                builder.writeln();
        }
        else
        {
            // If the node is not paragraph, insert a paragraph break and insert
            // this node before the newly created paragraph. 
            builder.writeln();
            builder.getCurrentParagraph().getParentNode().insertBefore(current, builder.getCurrentParagraph());
        }
    }
}

Jagovi · October 28, 2022, 8:14am

I am getting this error when I call the function insertParaContent:

java.lang.IllegalArgumentException: Cannot insert a node of this type at this location.

I get the nodes from the document with:

Iterable<Node> paragraphs   = textBetweenMarks.getChildNodes(NodeType.BODY, true);

What am I doing wrong?

alexey.noskov · October 28, 2022, 8:29am

@Jagovi textBetweenMarks.getChildNodes(NodeType.BODY, true) returns all Body nodes in the document. In your case you need the children of the Body, i.e. the paragraphs and tables.
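
The difference between a deep search by type and a container's direct children can be sketched in plain Java. `TreeNode` and `descendantsOfType` below are hypothetical stand-ins for illustration and are not Aspose classes or methods.

```java
import java.util.ArrayList;
import java.util.List;

public class ChildrenVsDescendants {
    // Hypothetical tree node with a type label; not an Aspose class.
    public static class TreeNode {
        public final String type;
        public final List<TreeNode> children = new ArrayList<>();

        public TreeNode(String type, TreeNode... kids) {
            this.type = type;
            for (TreeNode kid : kids) {
                children.add(kid);
            }
        }
    }

    // Collects every descendant whose type matches (deep search).
    public static List<TreeNode> descendantsOfType(TreeNode root, String type) {
        List<TreeNode> found = new ArrayList<>();
        for (TreeNode child : root.children) {
            if (child.type.equals(type)) {
                found.add(child);
            }
            found.addAll(descendantsOfType(child, type));
        }
        return found;
    }

    public static void main(String[] args) {
        TreeNode body = new TreeNode("BODY",
                new TreeNode("TABLE"), new TreeNode("PARAGRAPH"));
        TreeNode doc = new TreeNode("DOCUMENT", new TreeNode("SECTION", body));

        // A deep search by type returns the Body container itself.
        System.out.println(descendantsOfType(doc, "BODY").size()); // prints 1
        // The nodes to insert are the Body's direct children.
        System.out.println(body.children.size()); // prints 2
    }
}
```

In this model the deep search hands back the container, which cannot be placed inside another body, while the direct children are the block-level nodes that can.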

Jagovi · October 28, 2022, 10:01am

How can I get the children? Do I have to get the Body and then get all of its children?

Jagovi · October 28, 2022, 11:11am

Hi again @alexey.noskov I have tried the following code:

Document textBetweenMarks= AsposeHelper.getTextBetweenBookmarks(
	start, 
	end, content.getFiledata());

Document destDoc = builder.getDocument();

Iterable<Node> paragraphs  = textBetweenMarks.getChildNodes(NodeType.ANY, true);	
int iterableSize=0;
for (Node paragraphbetween : paragraphs) {
	iterableSize++;
}

Node[] paragraphArray = new Node[iterableSize];
int i=0;
for (Node paragraphbetween : paragraphs) {
    Node targetParagraph = (Node)destDoc.importNode(paragraphbetween, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    if(targetParagraph!=null) {
    	paragraphArray[i]=(Paragraph) targetParagraph;
    	
    }
    i++;  
}
for(int j=0;j<paragraphArray.length;j++) {
	if(paragraphArray[j].getNodeType()==NodeType.PARAGRAPH) {
		Paragraph para = (Paragraph) paragraphArray[j];
        while (para.hasChildNodes())
            builder.insertNode(para.getFirstChild());

        if(j<(paragraphArray.length-1)){
            builder.writeln();	
        }					
	}else if(paragraphArray[j].getNodeType()==NodeType.TABLE){
		builder.writeln();
		builder.getCurrentParagraph().getParentNode().insertBefore(paragraphArray[j], builder.getCurrentParagraph());
	}else {
		builder.writeln();
	}
}

But if the text between the marks starts with an image or a table, nothing is written in the output document.

alexey.noskov · October 28, 2022, 1:58pm

@Jagovi You can use code like this to get the nodes that should be inserted:

Iterable<Node> nodes = textBetweenMarks.getFirstSection().getBody().getChildNodes();

Also, I think the following article, which explains how to extract content between two nodes, might be useful for you:
https://docs.aspose.com/words/java/extract-selected-content-between-nodes/