Run with carriage return (CR) character

Hi,

I have a problem with the following. If I read runs from a document made by MS Word, I get a run that contains a carriage return character (see ControlChar.CR in the Aspose.Words API), but if I read runs from a document made by DocumentBuilder, I get a run that does not contain a carriage return character. I used the following code:

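import java.util.ArrayList;
import java.util.List;

import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.DocumentVisitor;
import com.aspose.words.Run;

public class CRTest
{
    public static void main(String[] args) throws Exception
    {
        // Document created in MS Word.
        Document docW = new Document("d:/soft/test.doc");
        List<Run> runsW = getDocumentRuns(docW);
        Run runW = runsW.get(1); // second run in the document
        System.out.println(runW.getText());

        // Document created with DocumentBuilder.
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);
        builder.write("Some text");
        builder.write("\r"); // carriage return or paragraph break character
        builder.write("Some text, again");
        doc.save("d:/soft/new.doc");

        List<Run> runs = getDocumentRuns(doc);
        Run run = runs.get(1); // second run in the document
        System.out.println(run.getText());
    }

    // Collects every Run node of the document, in document order.
    private static List<Run> getDocumentRuns(Document doc) throws Exception
    {
        final List<Run> runs = new ArrayList<Run>();
        doc.accept(new DocumentVisitor()
        {
            @Override
            public int visitRun(Run run) throws Exception
            {
                runs.add(run);
                return super.visitRun(run);
            }
        });
        return runs;
    }
}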

test.doc is attached.
Can you help me, please?

Thanks,
Zeljko

Hi

Thanks for your request. In MS Word documents, "\r" is a paragraph break. So if you use the following code, you will get two paragraphs in the output document.

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.write("Some text");
builder.write("\r");
builder.write("Some text, again");
doc.save("C:\\Temp\\out.doc");

Sometimes in MS Word documents, \r can be just a character within a Run. However, MS Word still treats such characters as paragraph breaks and displays them accordingly. You can use DocumentExplorer (an Aspose.Words demo) to check the structure of your document.
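
When checking run text with System.out.println, a \r inside the text is easy to miss because most consoles do not show it. Below is a minimal sketch in plain Java (no Aspose.Words calls; the class and method names are made up for illustration) that makes control characters visible before printing:

```java
public class ControlCharDump
{
    // Returns the text with CR, LF and tab replaced by visible escape sequences.
    // Other characters below 0x20 are shown as <0xNN>.
    public static String visible(String text)
    {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray())
        {
            switch (c)
            {
                case '\r': sb.append("\\r"); break;
                case '\n': sb.append("\\n"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20)
                        sb.append(String.format("<0x%02X>", (int) c));
                    else
                        sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args)
    {
        // Stand-in for the value returned by run.getText().
        String runText = "Some text\rSome text, again";
        System.out.println(visible(runText)); // prints: Some text\rSome text, again
    }
}
```

In the test program, the same helper could be applied to the string returned by run.getText().
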
Best regards.

Hi,

Thanks for the reply, but maybe I was not precise in my previous post. Please try to execute the following code:

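import java.util.ArrayList;
import java.util.List;

import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.DocumentVisitor;
import com.aspose.words.Run;

public class CRTest2
{
    public static void main(String[] args) throws Exception
    {
        String source = "d:/soft/test.doc";
        String target = "d:/soft/new test.doc";

        Document doc = new Document(source);
        Document result = new Document();
        DocumentBuilder builder = new DocumentBuilder(result);

        // Write the text of every source run into the new document.
        List<Run> runs = getDocumentRuns(doc);
        for (Run run : runs)
        {
            builder.write(run.getText());
        }

        result.save(target);
    }

    // Collects every Run node of the document, in document order.
    private static List<Run> getDocumentRuns(Document doc) throws Exception
    {
        final List<Run> runs = new ArrayList<Run>();
        doc.accept(new DocumentVisitor()
        {
            @Override
            public int visitRun(Run run) throws Exception
            {
                runs.add(run);
                return super.visitRun(run);
            }
        });
        return runs;
    }
}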

Then compare test.doc and new test.doc in DocumentExplorer, and you will see that these documents are not the same.

test.doc is attached.

Thanks,
Zeljko

Hi

Thank you for the additional information. Of course, the output document produced by your code will not look like the source document. Besides Runs with text, documents also contain paragraphs, which carry formatting, and other nodes. If you need to copy one document into another, you can use the approach described in the following article:
https://docs.aspose.com/words/net/insert-and-append-documents/
What is your goal, by the way? Perhaps I can help you find a proper solution to your problem.
Best regards.

Hi,

Thank you for the suggestion, but I am wondering why a run from the source document contains a CR character and why I cannot do the same. It seems that the document builder parses the text I pass in through the method parameter and creates paragraphs and runs by itself. Is this valid behavior?

By the way, I am trying to read a document, then put the obtained text into an XML document, and finally make an identical copy of the source document.

Thanks,
Zeljko

Hi

Thank you for the additional information. If you need to recreate a run with the same text as the original run, you can try using the following code:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
// Create a run that contains the \r character.
Run run = new Run(doc, "Some text\rSome text, again");
// Insert it into the document.
builder.insertNode(run);
doc.save("C:\\Temp\\out.doc");
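
Since the run text is meant to pass through an XML document on the way, one thing to watch is that XML parsers normalize a literal CR to LF, so the \r can be lost before it ever reaches the Run constructor. A minimal sketch of one way around this, assuming the text is stored as element content: write the CR as the character reference &#13;, which the parser returns unchanged. The code uses only the JDK XML classes (org.w3c.dom.Document here is not the Aspose.Words Document), and the class name, method names and the "run" element are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CrXmlRoundTrip
{
    // Escapes XML markup characters and writes CR as a character reference.
    public static String escape(String text)
    {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray())
        {
            switch (c)
            {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '\r': sb.append("&#13;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Stores the text in a one-element XML document and reads it back.
    public static String roundTrip(String runText)
    {
        String xml = "<run>" + escape(runText) + "</run>";
        try
        {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return dom.getDocumentElement().getTextContent();
        }
        catch (Exception e)
        {
            throw new IllegalStateException("XML round trip failed", e);
        }
    }

    public static void main(String[] args)
    {
        String original = "Some text\rSome text, again";
        System.out.println(roundTrip(original).equals(original)); // true
    }
}
```

The string read back this way could then be passed to new Run(doc, text) as in the example above.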

Hope this helps.

Hi,

Thank you for the suggestion. This could help me, but is there a way to achieve the same thing using DocumentBuilder and its write(String text) method? I must use that method, and I have to write text that contains a CR character piece by piece (e.g. the text before the CR character, the CR character, and the text after the CR character).

Thanks,
Zeljko

Hi

Thanks for your request. When you use the DocumentBuilder.write method, the string is parsed, and a paragraph break is created wherever \r occurs. Even so, the documents should look the same. For instance, see the following code examples and the attached output documents.

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
// Create a run that contains the \r character.
Run run = new Run(doc, "Some text\rSome text, again");
// Insert it into the document.
builder.insertNode(run);
doc.save("C:\\Temp\\out1.doc");

And the second example:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.write("Some text\rSome text, again");
doc.save("C:\\Temp\\out2.doc");

As you can see, both documents look the same if you open them in MS Word, but the first document contains only one paragraph and one run, while the second document contains two paragraphs and two runs.
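
The difference can be pictured with a small model in plain Java. This is only a sketch of the behavior described above, not Aspose.Words code: BuilderModel and its methods are hypothetical names, and the real DocumentBuilder does far more than this. In the model, write() starts a new paragraph at every \r, while insertRun() keeps the text, including the \r, in the current paragraph.

```java
import java.util.ArrayList;
import java.util.List;

public class BuilderModel
{
    private final List<StringBuilder> paragraphs = new ArrayList<>();

    public BuilderModel()
    {
        // Model assumption: a new document starts with one empty paragraph.
        paragraphs.add(new StringBuilder());
    }

    // Models write(): each '\r' ends the current paragraph and starts a new one.
    public void write(String text)
    {
        for (char c : text.toCharArray())
        {
            if (c == '\r')
                paragraphs.add(new StringBuilder());
            else
                current().append(c);
        }
    }

    // Models inserting a ready-made run: the text is kept as is, '\r' included.
    public void insertRun(String text)
    {
        current().append(text);
    }

    private StringBuilder current()
    {
        return paragraphs.get(paragraphs.size() - 1);
    }

    public List<String> paragraphTexts()
    {
        List<String> result = new ArrayList<>();
        for (StringBuilder p : paragraphs)
            result.add(p.toString());
        return result;
    }

    public static void main(String[] args)
    {
        BuilderModel pieces = new BuilderModel();
        pieces.write("Some text");
        pieces.write("\r");
        pieces.write("Some text, again");
        System.out.println(pieces.paragraphTexts().size()); // 2

        BuilderModel single = new BuilderModel();
        single.insertRun("Some text\rSome text, again");
        System.out.println(single.paragraphTexts().size()); // 1
    }
}
```

In this model, writing the text piece by piece gives the same two paragraphs as a single write() call with the whole string.
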
Best regards.

Hi,

Thank you for your help. The last post was very helpful.

Zeljko