Extract Content Based on Styles for OpenOffice

Hello Aspose,

I don't know whether it is OK to ask my question here; if it is not, please forgive me.

I would like to ask whether there is any Java code, like the code on this page, that does the same thing as Extract Content Based on Styles, but for OpenOffice styles rather than Microsoft Word styles.

I'm programming an extension, and I need a function that extracts words from OpenOffice Writer documents depending on each word's formatting.

Hi there,

Thanks for your inquiry. Yes, you can extract content based on styles from both ODT and MS Word documents. First, I suggest you read about the style features supported on document (ODT/DOCX) import and export here:

https://docs.aspose.com/words/net/load-in-the-open-office-xml-2007-and-2010-formats/
https://docs.aspose.com/words/net/save-in-the-open-office-xml-docx-dotx-docm-dotm-format/

https://docs.aspose.com/words/net/load-in-the-opendocument-text-odt-ott-format/
https://docs.aspose.com/words/net/save-in-the-opendocument-text-odt-ott-format/

You can use the code shared on this page with an ODT file. You need to check the styles in your ODT file and then modify the code according to your requirements. If you face any issue, please share your ODT file along with its style information. We will then provide you with more information and code.

Thank you for your reply. I tried running the code to check it, but one line doesn't work:

System.out.print(paragraph.toString(SaveFormat.TEXT));

The compiler reports that method toString in class Object cannot be applied to the given types.

What if I want to search the text for formatting attributes (character font name, size and color) rather than styles? Are there any libraries that can help?

Hi there,

Thanks for your inquiry. If you are using an older version of Aspose.Words, I suggest you upgrade to the latest version (v13.3.0.1) from here and let us know how it goes on your side. If the problem still remains, please attach your input ODT document here for testing. I will investigate the issue on my side and provide you more information.

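You can use the Run.Font property to access the font formatting of a Run node, as shown in the following code snippet. The snippet is C# (Aspose.Words for .NET); in Java, the equivalent call is run.getFont().getName(). Hope this helps you.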

// Requires: using System.Collections; using Aspose.Words;
public static ArrayList RunsByFontName(Document doc, string FontName)
{
    // Create an array to collect runs of the specified Font Name.
    ArrayList runsNodes = new ArrayList();
    // Get all runs from the document.
    NodeCollection runs = doc.GetChildNodes(NodeType.Run, true);
    foreach (Run run in runs)
    {
        if (run.Font.Name == FontName)
            runsNodes.Add(run);
    }
    return runsNodes;
}
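In Aspose.Words for Java, the same filter would go through getters such as run.getFont().getName(), as in the Java code elsewhere in this thread, and the font name has to be compared with equals(), because Java's == compares String references rather than contents (unlike C#'s == on strings). So that the sketch compiles without the library, a hypothetical TextRun record stands in for Aspose's Run; only the loop and the comparison carry over.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an Aspose.Words Run: just its text and font name.
record TextRun(String text, String fontName) {}

class RunsByFont {
    // Collects the runs whose font name equals the requested one
    // (the same idea as RunsByFontName in the C# snippet).
    static List<TextRun> runsByFontName(List<TextRun> runs, String fontName) {
        List<TextRun> matches = new ArrayList<>();
        for (TextRun run : runs) {
            // equals(), not ==: in Java, == on Strings compares references.
            if (run.fontName().equals(fontName)) {
                matches.add(run);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<TextRun> runs = List.of(
                new TextRun("Hello ", "Tahoma"),
                new TextRun("world", "Arial"),
                new TextRun("!", "Tahoma"));
        // Prints the text of the two Tahoma runs.
        for (TextRun run : runsByFontName(runs, "Tahoma")) {
            System.out.println(run.text());
        }
    }
}
```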

First of all, thank you so much for your reply; it helped us a lot.

But there are still some problems.

We want to search for formatting properties in the text, but we want to pass these properties as parameters from another class and show the resulting array of words in a text field in a JFrame. The problem is that we could not pass parameters to main, so we changed main into a normal function that takes parameters, but this did not work for us: no results were shown in the text field, and we do not know what is wrong. The code is attached below:

/**
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */

package com.example;

/**
* @author Uzer
*/
//////////////////////////////////////////////////////////////////////////
// Copyright 2001-2013 Aspose Pty Ltd. All Rights Reserved.
//
// This file is part of Aspose.Words. The source code in this file
// is only intended as a supplement to the documentation, and is provided
// "as is", without warranty of any kind, either expressed or implied.
//////////////////////////////////////////////////////////////////////////

import com.aspose.words.*;
import java.awt.Color;
import java.io.File;
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;

/**
 * Finds runs formatted with a given font name, size and bold/underline/italic settings.
 */
@SuppressWarnings("unchecked")
class Program
{
    public static String RUN_STYLE;
    public static float RUN_STYLE1;
    public static boolean Bold ;
    public static boolean Underline ;
    public static boolean Italic ;
    public Program (String run_style , float runstyle1,boolean bold, boolean underline,boolean italic ){
        RUN_STYLE = run_style ;
        RUN_STYLE1 = runstyle1 ;
        Bold = bold;
        Underline = underline;
        Italic = italic;
    }

    public static ArrayList search() throws Exception
    {
        // Sample infrastructure.
        URI exeDir = Program.class.getResource("").toURI();
        String dataDir = new File(exeDir.resolve("…/…/Data")) + File.separator;
        //ExStart
        //ExId:ExtractContentBasedOnStyles_Main
        //ExSummary:Run queries and display results.
        // Open the document.
        Document doc = new Document(dataDir + "TestFile1.odt");
        // Collect the runs whose font matches the requested attributes.
        ArrayList runs = runsByStyleName(doc, RUN_STYLE1,RUN_STYLE,Bold,Underline,Italic);
        // System.out.println(java.text.MessageFormat.format("\nRuns with \"{0}\" styles ({1}):", RUN_STYLE1, runs.size()));
        for (Run run : (Iterable<Run>) runs) {
            System.out.println(run.getRange().getText());
        }
        return runs; //ExEnd
    }

    //ExStart
    //ExId:ExtractContentBasedOnStyles_Runs
    //ExSummary:Find all runs formatted with the specified style.
    public static ArrayList runsByStyleName(Document doc, float RUN_STYLE1,String RUN_STYLE, boolean Bold ,boolean Underline,boolean Italic ) throws Exception
    {
        // Create an array to collect runs of the specified style.
        ArrayList runsWithStyle = new ArrayList();
        // Get all runs from the document.
        NodeCollection runs = doc.getChildNodes(NodeType.RUN, true);
        // Look through all runs to find those with the specified style.
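        for (Run run : (Iterable<Run>) runs)
        {
            Font font = run.getFont();
            // getUnderline() returns an Underline constant; NONE means no underline.
            // Fully qualified because the boolean parameter is also named Underline.
            boolean isUnderlined = font.getUnderline() != com.aspose.words.Underline.NONE;
            // Add the run only if every attribute equals the requested value.
            if (font.getSize() == RUN_STYLE1
                    && font.getName().equals(RUN_STYLE)
                    && font.getBold() == Bold
                    && isUnderlined == Underline
                    && font.getItalic() == Italic)
            {
                runsWithStyle.add(run);
            }
        }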
        return runsWithStyle;
        //ExEnd
    }
}
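The size check compares the font size with the float RUN_STYLE1 using ==. Assuming the size getter returns a double, as Font.getSize() in Aspose.Words for Java does, the float is widened before the comparison: that is exact for sizes such as 12 or 10.5, but 11.3f widened to double is not equal to 11.3. A tolerance-based check avoids the problem. The sketch below uses no Aspose classes, and the tolerance value is an arbitrary assumption.

```java
// Sketch: comparing a float size criterion with a double font size.
class FontSizeCompare {
    // Assumed tolerance, far smaller than any meaningful difference in font size.
    static final double TOLERANCE = 0.01;

    // True when the two sizes differ by less than TOLERANCE.
    static boolean sameSize(double actual, float requested) {
        return Math.abs(actual - requested) < TOLERANCE;
    }

    public static void main(String[] args) {
        float requested = 11.3f;  // like RUN_STYLE1
        double actual = 11.3;     // like a double returned by the size getter
        System.out.println(actual == requested);         // false: 11.3f is not exactly 11.3
        System.out.println(sameSize(actual, requested)); // true
        System.out.println(12.0 == 12f);                 // true: 12 is exact in both types
    }
}
```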

Hi there,

Thanks for your inquiry. In your case, I suggest the following solution. The p.runsByStyleName(doc) method returns the Run nodes, so you can use the returned ArrayList (runs) in your code. Hope this helps you.

If you still face the problem, please share your ODT file along with the style details. I will investigate the issue on my side and provide you more information.

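Document doc = new Document(MyDir + "in.odt");
Program p = new Program();
// Set every criterion before searching: RUN_STYLE is the font name, RUN_STYLE1 the size in points.
p.RUN_STYLE = "Tahoma";
p.RUN_STYLE1 = 12;
p.Bold = true;
p.Italic = true;
p.Underline = false;
ArrayList runs = p.runsByStyleName(doc);

// Program.java
import com.aspose.words.*;
import java.util.ArrayList;

public class Program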
{
    public String RUN_STYLE;
    public float RUN_STYLE1;
    public boolean Bold;
    public boolean Underline;
    public boolean Italic;
    public ArrayList runsWithStyle = new ArrayList();

    public Program()
    {
    }

    public ArrayList runsByStyleName(Document doc) throws Exception
    {
        // Get all runs from the document.
        NodeCollection runs = doc.getChildNodes(NodeType.RUN, true);
        // Keep only the runs whose font matches every requested attribute.
        for (Run run : (Iterable<Run>) runs)
        {
            Font font = run.getFont();
            // getUnderline() returns an Underline constant; NONE means no underline.
            // Fully qualified because the boolean field is also named Underline.
            boolean isUnderlined = font.getUnderline() != com.aspose.words.Underline.NONE;
            if (font.getSize() == RUN_STYLE1
                    && font.getName().equals(RUN_STYLE)
                    && font.getBold() == Bold
                    && isUnderlined == Underline
                    && font.getItalic() == Italic)
            {
                runsWithStyle.add(run);
            }
        }
        return runsWithStyle;
    }
}
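Because runsWithStyle is an instance field initialised once per Program object, every call to runsByStyleName appends to the same list, so calling it a second time on the same object (for example after changing the criteria) also returns the earlier matches. Creating the result list inside the method avoids that. A minimal sketch, using a hypothetical WordFinder class in which plain strings stand in for Run nodes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical finder that, like Program, keeps its criterion in a public field.
// Plain strings stand in for Run nodes.
class WordFinder {
    public String wanted;

    // Builds a new result list on every call, so earlier matches never carry over.
    public List<String> find(List<String> words) {
        List<String> matches = new ArrayList<>();
        for (String word : words) {
            if (word.equals(wanted)) {
                matches.add(word);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        WordFinder finder = new WordFinder();
        finder.wanted = "bold";
        List<String> words = List.of("bold", "plain", "bold");
        System.out.println(finder.find(words).size()); // 2
        System.out.println(finder.find(words).size()); // 2 again, not 4
    }
}
```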

We use this code not in a main function but as part of other code, like this:

if ( aURL.Path.compareTo("procces") == 0 ) {
    try {
        URI exeDir = CorpusEX.class.getResource("").toURI();
        String MyDir = new File(exeDir.resolve("…/…/Data")) + File.separator;
        Document doc = new Document(MyDir + "TestFile1.odt");
        String RUN_STYLE = "Tahoma";
        float RUN_STYLE1 = 12;
        boolean Bold = true;
        //boolean Underline;
        boolean Italic = true;
        ArrayList runsWithStyle = new ArrayList();
        NodeCollection runs = doc.getChildNodes(NodeType.RUN, true);
        for (Run run : (Iterable<Run>) runs) {
            if (run.getFont().getSize() == (RUN_STYLE1)) {
                if (run.getFont().getName().equals(RUN_STYLE)) {
                    if (run.getFont().getBold() == Bold) {
                        if (run.getFont().getItalic() == Italic) {
                            runsWithStyle.add(run);
                        }
                    }
                }
            }
        }

        NewJFrame1 featuresFrame = new NewJFrame1();
        featuresFrame.setFieldText(runsWithStyle);
        featuresFrame.setLocationRelativeTo(null);
        featuresFrame.setTitle("Process");
        featuresFrame.setVisible(true);
    } catch (Exception ex) {
        Logger.getLogger(CorpusEX.class.getName()).log(Level.SEVERE, null, ex);
    }
}
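Here the search values are hard-coded inside the dispatch handler. If they should instead come from another class (for example a dialog), one option is to bundle them in a small immutable object, pass it to the search method, and return plain text for the text field. A sketch under that assumption: FontQuery and FormattedRun are hypothetical stand-ins for the caller's criteria and for Aspose's Run, and with Swing the returned string could be passed to JTextComponent.setText.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a text run and the font attributes being searched.
record FormattedRun(String text, String fontName, double size,
                    boolean bold, boolean underline, boolean italic) {}

// Hypothetical search criteria, built by whichever class collects them.
record FontQuery(String fontName, double size,
                 boolean bold, boolean underline, boolean italic) {

    // True when every attribute of the run equals the requested value;
    // sizes are compared with a small tolerance rather than exact ==.
    boolean matches(FormattedRun run) {
        return run.fontName().equals(fontName)
                && Math.abs(run.size() - size) < 0.01
                && run.bold() == bold
                && run.underline() == underline
                && run.italic() == italic;
    }
}

class RunSearch {
    // Returns the text of every matching run, one per line, ready for a text field.
    static String search(List<FormattedRun> runs, FontQuery query) {
        return runs.stream()
                .filter(query::matches)
                .map(FormattedRun::text)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<FormattedRun> runs = List.of(
                new FormattedRun("corpus", "Tahoma", 12, true, false, true),
                new FormattedRun("plain", "Tahoma", 12, false, false, false));
        // The caller decides the criteria and passes them in; no static state.
        FontQuery query = new FontQuery("Tahoma", 12, true, false, true);
        System.out.println(search(runs, query)); // prints: corpus
    }
}
```

Because the query is an ordinary argument, the UI class can build a new FontQuery from its input fields on every click without touching main or any shared fields.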

But this did not work, even when we did it the other way, as you did.

It works only if we set the parameter values (the first part of your code) in the main function.

What should we do? We are testing this code with a simple file containing a few words, not a special file, so the problem is not related to the file.

Hi,

Thanks for your inquiry. Perhaps you are using an older version of Aspose.Words; with Aspose.Words v13.3.0.1, I am unable to reproduce this problem on my side. I suggest you upgrade to the latest version of Aspose.Words, i.e. v13.3.0.1, and let us know how it goes on your side.

If the problem still remains, please create a separate Java application that reproduces this issue and share it here along with the input document. I will investigate the issue on my side and provide you more information.