Remove all blank lines from a Word document

Problem.zip (462.4 KB)
Hi team,

I need a workaround to remove the blank lines between images and their captions in a Word document, and to remove blank lines from the entire document. I tried an empty-paragraph remover, but it didn't clear the lines. Due to time constraints, I need a solution as soon as possible.

Regards
Priya Dharshini J P

Hi Priya,

Thanks for your inquiry. In your expected output document, you are removing empty paragraphs and joining two paragraphs: the first paragraph contains the Shape (image) nodes and the second contains text that starts with “Figure”. Please use the following code example to get the desired output. Hope this helps.

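// Requires: import com.aspose.words.*; import java.util.ArrayList;
// MyDir is the folder that contains the input document.
Document doc = new Document(MyDir + "Problem.docx");

ArrayList<Paragraph> nodes = new ArrayList<Paragraph>();
// Iterate over a snapshot of the paragraphs so removals do not disturb the loop.
for (Node node : doc.getChildNodes(NodeType.PARAGRAPH, true).toArray())
{
    Paragraph paragraph = (Paragraph) node;
    boolean isEmpty = paragraph.toString(SaveFormat.TEXT).trim().length() == 0;
    int shapeCount = paragraph.getChildNodes(NodeType.SHAPE, true).getCount();

    if (isEmpty && shapeCount == 0)
    {
        // No text and no images: remove the empty paragraph.
        paragraph.remove();
    }
    else if (isEmpty && shapeCount > 1)
    {
        // Note: "> 1" keeps only paragraphs with two or more shapes;
        // use "> 0" if single-image paragraphs should be handled too.
        nodes.add(paragraph);
    }
}

for (Paragraph paragraph : nodes)
{
    Node next = paragraph.getNextSibling();
    // The next sibling may be missing or may not be a paragraph (for example, a table).
    if (!(next instanceof Paragraph))
        continue;

    Paragraph nextPara = (Paragraph) next;
    if (nextPara.toString(SaveFormat.TEXT).trim().startsWith("Figure"))
    {
        // Move all content from the nextPara paragraph into the first.
        while (nextPara.hasChildNodes())
            paragraph.appendChild(nextPara.getFirstChild());

        nextPara.remove();
    }
}
doc.save(MyDir + "output.docx");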
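
The two passes in the code above (drop empty paragraphs, then join a “Figure” caption onto the image paragraph before it) can be modelled without the library, which makes the logic easy to check in isolation. The sketch below is only an illustration: it treats each paragraph as a plain string and uses the hypothetical marker `[img]` for an image-only paragraph. The class and method names are assumptions and none of it is Aspose.Words API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CaptionJoinSketch {
    // Hypothetical marker standing in for a paragraph that holds only an image.
    static final String IMAGE = "[img]";

    // Pass 1 drops blank paragraphs. Pass 2 appends a caption that starts
    // with "Figure" to the image paragraph directly before it.
    static List<String> tidy(List<String> paragraphs) {
        List<String> kept = new ArrayList<>();
        for (String p : paragraphs) {
            if (!p.trim().isEmpty()) {
                kept.add(p);
            }
        }

        List<String> result = new ArrayList<>();
        for (String p : kept) {
            boolean isCaption = p.trim().startsWith("Figure");
            boolean afterImage = !result.isEmpty()
                    && result.get(result.size() - 1).equals(IMAGE);
            if (isCaption && afterImage) {
                // Join the caption onto the image paragraph.
                result.set(result.size() - 1, IMAGE + " " + p.trim());
            } else {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList(
                "Intro", "", IMAGE, "  ", "Figure 1: Setup", "Body");
        System.out.println(tidy(doc));
    }
}
```

Filtering into a new list in the first pass, rather than deleting from the list being walked, avoids skipping elements during removal.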

However, when images are consecutive, the space between them is not removed. Could you help remove the space between grouped images in that document?

We have an additional requirement: removing the blank space between the images. Due to time constraints, we need a reply as soon as possible.

Hi Priya,

Thanks for your inquiry.

Could you please share screenshots of the problematic sections of the output document? We will investigate this issue and provide you with more information.

Regarding your additional requirement to remove the blank space between the images: please share screenshots of this requirement along with the expected output document. We will then provide more information along with code.

Best Regards,
Tahir Manzoor

problem.zip (50.0 KB)
I have attached an example with consecutive images. Please help me extract all images up to the image caption, that is, the text starting with “Fig”.
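
One possible reading of this requirement is: collect each run of consecutive images, ignore blank lines inside the run, and close the run when a caption starting with “Fig” appears. The sketch below models that reading on plain strings. The `[img]` marker and all names are hypothetical, and it does not use the Aspose.Words API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FigureGroupSketch {
    // Hypothetical marker standing in for an image-only paragraph.
    static final String IMAGE = "[img]";

    // Returns one group per figure: the consecutive images followed by the
    // caption. A run of images without a "Fig" caption is discarded.
    static List<List<String>> groupFigures(List<String> paragraphs) {
        List<List<String>> groups = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (String p : paragraphs) {
            String text = p.trim();
            if (text.equals(IMAGE)) {
                pending.add(text);
            } else if (text.isEmpty()) {
                // Blank line between images: ignore it and keep the run open.
                continue;
            } else {
                if (text.startsWith("Fig") && !pending.isEmpty()) {
                    pending.add(text);
                    groups.add(pending);
                }
                // Any text paragraph closes the current run.
                pending = new ArrayList<>();
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList(
                "Intro", IMAGE, "", IMAGE, "Fig. 1 Two views", "Text", IMAGE, "Text");
        System.out.println(groupFigures(doc));
    }
}
```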

@priyadharshini

Thanks for sharing your requirement in detail. Please allow us some time to analyze your desired output. We will get back to you soon with a code example for your requirement.

Best Regards,
Tahir Manzoor

@priyadharshini

Thanks for your patience. Please use the following code example to meet your requirement. Hope this helps.

// Requires: import com.aspose.words.*; import java.util.ArrayList;
// MyDir is the folder that contains the input document.
Document doc = new Document(MyDir + "Problem.docx");

ArrayList<Paragraph> nodes = new ArrayList<Paragraph>();
// Iterate over a snapshot of the paragraphs so removals do not disturb the loop.
for (Node node : doc.getChildNodes(NodeType.PARAGRAPH, true).toArray())
{
    Paragraph paragraph = (Paragraph) node;
    boolean isEmpty = paragraph.toString(SaveFormat.TEXT).trim().length() == 0;
    int shapeCount = paragraph.getChildNodes(NodeType.SHAPE, true).getCount();

    if (isEmpty && shapeCount == 0 && !paragraph.getText().contains(ControlChar.PAGE_BREAK))
    {
        // No text, no images and no page break: remove the empty paragraph.
        paragraph.remove();
    }
    else if (isEmpty && shapeCount > 1)
    {
        // Note: "> 1" keeps only paragraphs with two or more shapes;
        // use "> 0" if single-image paragraphs should be handled too.
        nodes.add(paragraph);
    }
}

for (Paragraph paragraph : nodes)
{
    Node next = paragraph.getNextSibling();
    // The next sibling may be missing or may not be a paragraph (for example, a table).
    if (!(next instanceof Paragraph))
        continue;

    Paragraph nextPara = (Paragraph) next;
    if (nextPara.toString(SaveFormat.TEXT).trim().startsWith("Figure"))
    {
        // Move all content from the nextPara paragraph into the first.
        while (nextPara.hasChildNodes())
            paragraph.appendChild(nextPara.getFirstChild());

        nextPara.remove();

        // Shrink the paragraph break of each image-only paragraph directly above,
        // which reduces the gap between consecutive images.
        Node previous = paragraph.getPreviousSibling();
        while (previous instanceof Paragraph
                && previous.toString(SaveFormat.TEXT).trim().length() == 0
                && ((Paragraph) previous).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
        {
            ((Paragraph) previous).getParagraphBreakFont().setSize(0.5);
            previous = previous.getPreviousSibling();
        }
    }
}
doc.save(MyDir + "output.docx");
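
The backward walk in the code above, from the caption up through the image-only paragraphs that precede it, can be sketched on plain data to show where it stops. This is a library-free illustration under stated assumptions: paragraphs are strings, `[img]` is a hypothetical marker for an image-only paragraph, and the method name is made up for the example.

```java
import java.util.Arrays;
import java.util.List;

public class BackwardScanSketch {
    // Hypothetical marker standing in for an image-only paragraph.
    static final String IMAGE = "[img]";

    // Returns the index of the first image in the unbroken run of image
    // paragraphs that ends directly before captionIndex, or -1 when the
    // paragraph before the caption is not an image.
    static int firstImageBefore(List<String> paragraphs, int captionIndex) {
        int i = captionIndex - 1;
        // Step backwards while the previous paragraph is still an image.
        while (i >= 0 && paragraphs.get(i).trim().equals(IMAGE)) {
            i--;
        }
        int first = i + 1;
        return first < captionIndex ? first : -1;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList(
                "Text", IMAGE, IMAGE, IMAGE, "Figure 2: Three panels", "Text");
        System.out.println(firstImageBefore(doc, 4));
    }
}
```

The `i >= 0` check plays the same role as the null check on the previous sibling: the walk must stop at the start of the document as well as at the first non-image paragraph.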

InputDocument.zip (1.2 MB)
ExpectedOutput.zip (1014.1 KB)
Hi team,
Thanks for your reply. I am using the code above, but I am still not able to get the expected output. I have attached the input document and the expected output document. I will be waiting for your reply. Please use the latest expected output document, ExpectedOutput.zip (1.3 MB), for reference.

@priyanga

Thanks for your inquiry. In this case, we suggest the following solution. Hope this helps.

Document doc = new Document(MyDir + "test (4).DOC");
RemoveSectionBreaks(doc);

int i = 1;
DocumentBuilder builder = new DocumentBuilder(doc);

// Remove empty paragraphs (no text and no shapes).
// Iterate over a snapshot of the paragraphs so removals do not disturb the loop.
for (Node node : doc.getChildNodes(NodeType.PARAGRAPH, true).toArray()) {
    Paragraph paragraph = (Paragraph) node;
    if (paragraph.toString(SaveFormat.TEXT).trim().length() == 0
            && paragraph.getChildNodes(NodeType.SHAPE, true).getCount() == 0) {
        paragraph.remove();
    }
}
doc.updatePageLayout();

boolean hasImage = false;
// Find the paragraphs whose text contains "Fig" (the image captions).
for (Paragraph  paragraph : (Iterable<Paragraph>)doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    if(paragraph.toString(SaveFormat.TEXT).trim().contains("Fig"))
    {
        Node previousPara = paragraph.getPreviousSibling();
        while (previousPara != null
                && previousPara.getNodeType() == NodeType.PARAGRAPH
                && previousPara.toString(SaveFormat.TEXT).trim().length() == 0
                && ((Paragraph)previousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
        {
            previousPara = previousPara.getPreviousSibling();
            hasImage = true;
        }

        if(hasImage && previousPara != null)
        {
            builder.moveTo(((CompositeNode)previousPara).getFirstChild());
            builder.startBookmark("Bookmark"+i);
            builder.endBookmark("Bookmark"+i);

            builder.moveTo(paragraph.getRuns().get(0));
            builder.startBookmark("FigBookmark"+i);
            builder.endBookmark("FigBookmark"+i);
            i++;
        }
        hasImage = false;
    }
}
// Save each figure (images plus caption) as a separate document.
// ExtractContents is a helper class that is not defined in this snippet;
// it must be available in the project.
for (int b = 1; b < i; b++)
{
    Node start = doc.getRange().getBookmarks().get("Bookmark" + b).getBookmarkStart();
    Node end = doc.getRange().getBookmarks().get("FigBookmark" + b).getBookmarkEnd();
    ArrayList images = ExtractContents.extractContent(start, end, false);
    Document dstDoc = ExtractContents.generateDocument(doc, images);

    // Clear any text left in the first paragraph of the extracted document.
    Paragraph firstPara = dstDoc.getFirstSection().getBody().getFirstParagraph();
    if (firstPara.toString(SaveFormat.TEXT).trim().length() > 0)
    {
        for (Run run : (Iterable<Run>) firstPara.getChildNodes(NodeType.RUN, true))
        {
            run.setText("");
        }
    }

    // Strip page breaks from the extracted content.
    dstDoc.getRange().replace(ControlChar.PAGE_BREAK, "", new FindReplaceOptions());

    dstDoc.save(MyDir + "Fig_" + b + ".docx");
}

private static void RemoveSectionBreaks(Document doc)
{
    // Loop through all sections starting from the section that precedes the last one
    // and moving to the first section.
    for (int i = doc.getSections().getCount() - 2; i >= 0; i--)
    {
        // Copy the content of the current section to the beginning of the last section.
        doc.getLastSection().prependContent(doc.getSections().get(i));
        // Remove the copied section.
        doc.getSections().get(i).remove();
    }
}
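
The loop in RemoveSectionBreaks runs from the second-to-last section down to the first and prepends each one to the last section. A small model on plain lists shows why that direction keeps the content in its original order. This is a hedged sketch: each inner list stands in for one section's content, and the class and method names are hypothetical, not part of Aspose.Words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SectionMergeSketch {
    // Walks from the second-to-last section to the first, prepends its
    // content to the last section, then drops the now-copied section.
    static List<String> mergeIntoLast(List<List<String>> sections) {
        // Work on copies so the caller's lists are left untouched.
        List<List<String>> work = new ArrayList<>();
        for (List<String> section : sections) {
            work.add(new ArrayList<>(section));
        }
        if (work.isEmpty()) {
            return new ArrayList<>();
        }
        for (int i = work.size() - 2; i >= 0; i--) {
            List<String> last = work.get(work.size() - 1);
            last.addAll(0, work.get(i)); // prepend section i to the last section
            work.remove(i);              // remove the copied section by index
        }
        return work.get(0);
    }

    public static void main(String[] args) {
        List<List<String>> sections = Arrays.asList(
                Arrays.asList("A", "B"), Arrays.asList("C"), Arrays.asList("D", "E"));
        System.out.println(mergeIntoLast(sections));
    }
}
```

Because the nearest section is prepended first and the farthest last, the farthest section ends up at the front, which is its original position.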

Thank you, @tahir.manzoor.

Regards
Priya Dharshini J P