Extract Content Controls Data from RTF & Convert to Word (DOCX DOC) Document using Java | Custom Bookmarks

awais.hafeez · May 26, 2020, 10:04am

The customized bookmarks you are referring to are actually Content Controls (represented by the StructuredDocumentTag class). You can get the text of such Content Controls in a Word document by using the following Java code:

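import com.aspose.words.*;

// Load the document; adjust the path to your environment.
Document doc = new Document("E:\\Temp\\Documents\\InputDoc.docx");

// The second argument (true) makes the search deep, so Content Controls
// nested inside tables are visited as well.
for (StructuredDocumentTag sdt :
        (Iterable<StructuredDocumentTag>) doc.getChildNodes(NodeType.STRUCTURED_DOCUMENT_TAG,
                true)) {
    // Constant-first equals() avoids a NullPointerException should a title be null.
    if ("V3:B".equals(sdt.getTitle())) {
        System.out.println(sdt.toString(SaveFormat.TEXT));
    }
}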

You can also extract the Nodes from inside such Content Controls and place them into a new Document.

Hope this helps you achieve what you are looking for.

Sudha_Mylapalli · May 29, 2020, 9:20am

@awais.hafeez
Thanks a lot, but the code did not search the whole document. Bookmarks are also present inside tables. Could you suggest how to get the bookmarks inside tables too?

Please find the ZIP file: BookmarkSampleFile.zip (141.0 KB)

I also want to read the bookmarks in an RTF file and convert it to a Word document from Java.
Please see the RTF file; the marker for a bookmark in that file is BK.

Sudha_Mylapalli · May 29, 2020, 11:35am

@awais.hafeez

Hi Hafeez,

We need a program that keeps each customized bookmark in the same place, with the same alignment, after the content is carried over from the .rtf file to the .doc file.
Sample documents (BookmarkSampleFile.zip (141.0 KB)) are available in my last comment.

awais.hafeez · May 30, 2020, 6:24am

@Sudha_Mylapalli

We are checking this scenario and will get back to you soon.

Sudha_Mylapalli · May 30, 2020, 10:19am

@awais.hafeez
Hi Hafeez,

I want a Java program that converts an OpenOffice document (ODT) to a Word document using Aspose.

awais.hafeez · May 31, 2020, 5:34am

@Sudha_Mylapalli,

You can use the following simple Java code to convert an OpenOffice document (ODT) into DOCX, DOC, RTF, WordML or other file formats (see Supported Document Formats).

Document doc = new Document("input.odt");
// Save the ODT document in DOCX format.
doc.save("output.docx");

Hope this helps.

Sudha_Mylapalli · May 31, 2020, 10:42am

@awais.hafeez
Document doc = new Document("input.odt");
// Save the ODT document in DOCX format.
doc.save("output.docx");

With this code I am able to convert the file, but the output file is not what we expect. Please find the attached ZIP file: odt_ToDoc.zip (300.9 KB)

awais.hafeez · June 1, 2020, 6:18am

@Sudha_Mylapalli,

The problem occurs because “outputFile.docx” was generated by using a very old (15.12) version of Aspose.Words for Java on your end (in evaluation mode i.e. without applying license).

After an initial test with the licensed latest (20.5) version of Aspose.Words for Java, we were unable to reproduce this issue when converting ODT to DOCX on our end. Please see the output DOCX document generated on our end by using the following simple Java code:

ODT to DOCX using Aspose.Words for Java 20.5.zip (93.8 KB)

Java Code:

Document doc = new Document("E:\\Temp\\odt_ToDoc\\inputFile.odt");
doc.save("E:\\Temp\\odt_ToDoc\\awjava-20.5.docx");

So, we suggest that you upgrade to the latest version. Hope this helps.

Sudha_Mylapalli · June 1, 2020, 11:29am

@awais.hafeez

Any updates on this?

Sudha_Mylapalli · June 1, 2020, 11:31am

Thanks @awais.hafeez,

ODT to DOCX conversion is working now. Once we apply an Aspose license, the watermark and the Aspose evaluation text will be removed, right?

Sudha_Mylapalli · June 1, 2020, 3:11pm

@awais.hafeez
Please suggest how to fetch a bar code when converting from OpenOffice to DOC.

awais.hafeez · June 2, 2020, 4:43am

@Sudha_Mylapalli,

We will share our findings today. Stay tuned.

Yes, there should not be any problems when you use the latest licensed version of Aspose.Words for Java, i.e. 20.5, to convert ODT to DOCX on your end. You will also not see the evaluation watermark string in the generated document.

Please ZIP and upload your simplified input Word document containing the bar code here for testing. We will then investigate the scenario on our end and provide you more information.

maheshpalagani · June 2, 2020, 6:37pm

@awais.hafeez

We are blocked on product issues because of this document conversion. Can we have a call and share our screen to walk through all our issues and requirements once, so that you can easily understand them and suggest fixes?

We are ready to purchase a license immediately if this document conversion works with customized bookmarks.

Please help us with this ASAP.

Thanks
Mahesh Palagani

awais.hafeez · June 3, 2020, 8:12am

@maheshpalagani,

Regarding the “input.docx” document, please try using the following Java code that prints the text of Content Controls with Titles V3:B or V3:G:

Document doc = new Document("E:\\Temp\\BookmarkSampleFile\\input.docx");

for (StructuredDocumentTag sdt :
        (Iterable<StructuredDocumentTag>) doc.getChildNodes(NodeType.STRUCTURED_DOCUMENT_TAG,
                true)) {
    if (sdt.getTitle().equals("V3:B") || sdt.getTitle().equals("V3:G")) {
        System.out.println(sdt.toString(SaveFormat.TEXT).trim());
    }
}
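
If more titles need to be matched later, the chained equals() checks can be swapped for a set lookup. The following is only a minimal sketch in plain Java, not part of the Aspose.Words API: the class name TitleFilterDemo and the helper isWanted are hypothetical, and the null guard is an assumption in case a Content Control ever reports no title.

```java
import java.util.Set;

class TitleFilterDemo {
    // Titles taken from the thread; extend the set as needed.
    static final Set<String> WANTED_TITLES = Set.of("V3:B", "V3:G");

    /** Returns true when the title is one of the wanted ones; null never matches. */
    static boolean isWanted(String title) {
        // Sets created with Set.of(...) throw on a null lookup, so guard explicitly.
        return title != null && WANTED_TITLES.contains(title);
    }

    public static void main(String[] args) {
        System.out.println(isWanted("V3:B")); // true
        System.out.println(isWanted("V3:C")); // false
        System.out.println(isWanted(null));   // false
    }
}
```

Inside the loop, the condition would then read isWanted(sdt.getTitle()). Set.of requires Java 9 or later.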

awais.hafeez · June 3, 2020, 8:21am

@maheshpalagani,

Regarding the RTFSampleInput.rtf document, please try running the following code.

import com.aspose.words.*;
import java.util.ArrayList;
import java.util.regex.Pattern;

Document doc = new Document("E:\\Temp\\BookmarkSampleFile\\RTFSampleInput.rtf");

ReplaceHandler handler = new ReplaceHandler();
FindReplaceOptions opts = new FindReplaceOptions();
opts.setDirection(FindReplaceDirection.BACKWARD);
opts.setReplacingCallback(handler);

Pattern searchPattern = Pattern.compile("\\[BK:([^\\]]*)\\]", Pattern.CASE_INSENSITIVE);
for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
    para.getRange().replace(searchPattern, "", opts);

int i = 1;
for (String str : handler.list)
    System.out.println(i++ + ". " + str);

static class ReplaceHandler implements IReplacingCallback {
    // Collects every matched [BK:...] tag in the order the callback receives it.
    public final ArrayList<String> list = new ArrayList<>();

    public int replacing(ReplacingArgs e) throws Exception {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.getMatchNode();

        // The first (and may be the only) run can contain text before the match,
        // in this case it is necessary to split the run.
        if (e.getMatchOffset() > 0)
            currentNode = splitRun((Run) currentNode, e.getMatchOffset());

        ArrayList runs = new ArrayList();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.getMatch().group().length();
        while ((remainingLength > 0) && (currentNode != null) && (currentNode.getText().length() <= remainingLength)) {
            runs.add(currentNode);
            remainingLength = remainingLength - currentNode.getText().length();

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do {
                currentNode = currentNode.getNextSibling();
            } while ((currentNode != null) && (currentNode.getNodeType() != NodeType.RUN));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0)) {
            splitRun((Run) currentNode, remainingLength);
            runs.add(currentNode);
        }

        String value = e.getMatch().group(0).trim();
        // if (!list.contains(value))
        list.add(value);

        // Signal to the replace engine to do nothing because we have already done all what we wanted.
        return ReplaceAction.SKIP;
    }

    /**
     * Splits text of the specified run into two runs. Inserts the new run just
     * after the specified run.
     */
    private Run splitRun(Run run, int position) throws Exception {
        Run afterRun = (Run) run.deepClone(true);
        afterRun.setText(run.getText().substring(position));
        run.setText(run.getText().substring(0, position));
        run.getParentNode().insertAfter(afterRun, run);
        return afterRun;
    }
}

Hope this helps you achieve what you are looking for.
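
To see what the search pattern itself matches, independent of Aspose.Words, it can be tried on a plain string. This is a minimal sketch using only java.util.regex; the class name BookmarkTagDemo, the helper extractNames and the sample sentence are hypothetical and not taken from the attached RTF file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BookmarkTagDemo {
    // Same pattern as in the reply above: "[BK:", then anything except ']', then "]".
    static final Pattern TAG =
            Pattern.compile("\\[BK:([^\\]]*)\\]", Pattern.CASE_INSENSITIVE);

    /** Returns the text captured inside each [BK:...] tag, in order of appearance. */
    static List<String> extractNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            // group(1) is the part between "BK:" and "]"; group(0) would be the whole tag.
            names.add(m.group(1).trim());
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical sample text; the real document content may differ.
        String sample = "Dear [BK:CustomerName], your balance is [bk: Amount ].";
        System.out.println(extractNames(sample)); // [CustomerName, Amount]
    }
}
```

The handler in the reply stores group(0), i.e. the complete tag including the brackets; switching to group(1) as in this sketch would keep only the name inside the tag.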

Sudha_Mylapalli · June 3, 2020, 10:17am

@awais.hafeez
I am getting an issue with Aspose while converting from OpenOffice to a Word document. The output contains the following message:
This document was truncated here because it was created using Aspose.Words in Evaluation Mode.
Please share a solution.

awais.hafeez · June 3, 2020, 1:07pm

@Sudha_Mylapalli,

This happens because you are not applying an Aspose.Words for Java license before creating the Document instance. If you want to test Aspose.Words without the evaluation version limitations, you can request a 30-day Temporary License. Please refer to How to get a Temporary License?

awais.hafeez · June 4, 2020, 5:45am

A post was split to a new topic: Unable to get bookmarks in Word document

Sudha_Mylapalli · June 3, 2020, 5:11pm

@awais.hafeez
Thanks for your response. We are still facing issues converting bookmarks from OpenOffice to a Word document.
I am adding some more detailed documents for your understanding. Please go through them and let us know if you need any clarification from my end.

2020-06-03_12-14-31 => In this picture you can see the bookmarks that appear when we hover the mouse over a bookmark in the OpenOffice document.

E018 - DEPP Delinquency Notice.odt => This is the source OpenOffice file, which has bookmarks.

Manuallyconverted_word_doc_E018 - DEPP Delinquency Notice.docx => This is the document that we converted from OpenOffice to Word manually (you can see our custom bookmarks with V3:B, V3:C).

Aspose_Converted_Test_E018 - DEPP Delinquency Notice.docx => This is the Word document that the Aspose jar converted from OpenOffice. Here the bookmarks came through as normal text.
We want to automate this kind of conversion from OpenOffice to Word using Aspose.

sample_docx.zip (750.0 KB)

awais.hafeez · June 4, 2020, 5:43am

@Sudha_Mylapalli,

Which desktop application did you use to generate “Manuallyconverted_word_doc_E018 - DEPP Delinquency Notice.docx” on your end: MS Word or OpenOffice Writer?

Please list the complete steps that you performed in MS Word (or OpenOffice Writer) to create this expected document (Manuallyconverted_word_doc_E018 - DEPP Delinquency Notice.docx). We will then provide you code that will perform the same steps programmatically by using Aspose.Words to get the desired output.