Is there any way to determine if a paragraph contains any "real" content?

I’m analysing a paragraph node trying to determine if it contains an image, is an item in a list, is a standard paragraph of text, or is simply a “nothing” paragraph that seems to have no purpose whatsoever, and I’m finding a lot of paragraphs coming through that contain nothing more than a “form feed” character or the like.

Is there any way to determine if a paragraph is an ACTUAL paragraph of text?

Hi Alex,

Thanks for your inquiry. Yes, you can do this with Aspose.Words. Please use the code snippet below.

Images are represented by Shapes or DrawingML nodes in Aspose.Words’ DOM. A simple image is represented by a Shape of ShapeType.IMAGE. This shape has no child nodes but the image data contained within this shape can be accessed by using the Shape.getImageData() method.

Please also see the following documentation links for reference.
https://reference.aspose.com/words/java/com.aspose.words/Paragraph
https://reference.aspose.com/words/java/com.aspose.words/shape
https://reference.aspose.com/words/java/com.aspose.words/Shape

Document doc = new Document(MyDir + "Paragraph.docx");

for (Paragraph para : (Iterable<Paragraph>)doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    // 1. Does the paragraph contain an image?
    //    Collect all Shape nodes under this paragraph, including nested ones.
    NodeCollection shapes = para.getChildNodes(NodeType.SHAPE, true);
    for (Shape shape : (Iterable<Shape>)shapes)
    {
        if (shape.hasImage())
        {
            // Your code for a paragraph that contains an image
        }
    }
    // 2. Is the paragraph an item in a list?
    //    isListItem() is true for an item in a bulleted or numbered list.
    System.out.println(para.isListItem());
    // 3. Is the paragraph an actual paragraph of text?
    //    toString(SaveFormat.TEXT) exports the content of the Paragraph into a string.
    String text = para.toString(SaveFormat.TEXT);
    if (text.equals(""))
    {
        // Your code for a paragraph with no text
    }
}

Hi, thanks for your reply.

I am trying to determine if a Paragraph contains any real content (other than control codes / CRLFs etc). Your suggestion String text = para.toString(SaveFormat.TEXT); does not work, because if “para” is a paragraph containing only CRLF, text.equals("") returns false and the paragraph is processed instead of being thrown away.

Please, how do I determine whether a Paragraph contains actual content?

Also, just FYI: I don’t see any reference to Paragraph.toString(SaveFormat.TEXT) within your Documentation.

Hi Alex,

Thanks for your inquiry. Please use the String.trim() method, as shown in the following code snippet. I hope this helps.

Yes, Paragraph.toString(SaveFormat.TEXT) is not yet included in the online documentation. We will add this method to the documentation soon.

Document doc = new Document(MyDir + "in.docx");

for (Paragraph para : (Iterable<Paragraph>)doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    String text = para.toString(SaveFormat.TEXT);
    if (text.trim().equals(""))
    {
        System.out.println(".....");
    }
}

Hi,

Unfortunately, text.trim() does not trim control codes such as carriage returns or form feeds.

I need to determine if a Paragraph node contains any content other than control codes (like carriage returns or form feed characters). How can I do this, please?

Hi Alex,

Thanks for your inquiry. Could you please attach your input Word document here for testing? I will investigate the issue on my side and provide you with more information.
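
In the meantime, here is a rough sketch of a stricter check that may help. In Java, String.trim() removes leading and trailing characters at or below U+0020, which already covers carriage returns, line feeds and form feeds, so if trim() still leaves something behind, the text may contain characters above that range, such as a no-break space (U+00A0). That is only a guess until the document is available. The class and method names below (ParagraphContentCheck, hasRealContent) are hypothetical and not part of Aspose.Words; the method uses only the standard java.lang.Character API and is meant to be called on the string returned by para.toString(SaveFormat.TEXT).

```java
final class ParagraphContentCheck {

    // Hypothetical helper, not an Aspose.Words API.
    // Returns true if the text has at least one character that is not
    // whitespace, a control code, a space separator, or a format character.
    static boolean hasRealContent(String text) {
        if (text == null) {
            return false;
        }
        return text.codePoints().anyMatch(cp ->
                !Character.isWhitespace(cp)      // tab, LF, FF, CR, space, ...
                && !Character.isISOControl(cp)   // U+0000-U+001F and U+007F-U+009F
                && !Character.isSpaceChar(cp)    // includes no-break space U+00A0
                && Character.getType(cp) != Character.FORMAT); // e.g. zero-width space U+200B
    }

    public static void main(String[] args) {
        System.out.println(hasRealContent("\f\r\n"));      // prints false
        System.out.println(hasRealContent("\u00A0\r\n"));  // prints false
        System.out.println(hasRealContent("Hello\r\n"));   // prints true
    }
}
```

Note that this looks at text only. A paragraph that holds nothing but an image may produce no text at all, so this check would probably need to be combined with the shape.hasImage() test from the first code snippet.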