Get paragraph numbering using Java

Hi,

In my project I extract the content between certain marker characters from an uploaded Word document using Aspose and save it in the database. When a customer clicks the document download option, I fetch a customer template and replace a special word in it with all the extracted content. I want the downloaded document to look the same as the uploaded document.
The Normal style of the uploaded document or of the template may contain paragraph numbering. The uploaded document input.docx (21.3 KB) contains numbered paragraphs with negative indentation, and the numbering has been removed from certain paragraphs. The downloaded document output.docx (14.2 KB), however, loses the negative indentation, and the paragraphs whose numbering was removed are numbered again.
My expected document is the same as the input: input.docx (21.3 KB)
I am attaching sample code: paragraph numbering.zip (152.1 KB)

Thank you

@Gptrnt In your code you store part of the document content as HTML. It is not always possible to retain all MS Word document formatting in HTML. I have modified your code to use the FlatOpc (MS Word 2007 XML) format instead of HTML, and the result produced looks like what you need. Here are the code modifications I have made:

  1. I have replaced the htmlSaveOption method with the following:
private static OoxmlSaveOptions ooxmlSaveOption() throws Exception{
    OoxmlSaveOptions options = new OoxmlSaveOptions();
    options.setSaveFormat(SaveFormat.FLAT_OPC);
    return options;
}
  2. In the TokenService class I have modified the addAgendaItemContent method as follows:
private void addAgendaItemContent(DocumentBuilder builder, String contentString, String field, int slNo) throws Exception {
    builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.NORMAL);
    builder.getParagraphFormat().getStyle().setName("Normal");
    contentString = contentString == null ? "" : contentString.trim();
    addAgendaItemContentCore(builder, contentString, field, slNo);
}

The second overload of this method is renamed to addAgendaItemContentCore and modified as follows:

private void addAgendaItemContentCore(DocumentBuilder builder, String contentString, String field, int slNo) throws Exception {
    try
    {
        if (!contentString.equals(""))
        {
            // Use UTF-8 explicitly; getBytes() without arguments uses the platform default charset.
            ByteArrayInputStream bais = new ByteArrayInputStream(contentString.getBytes("UTF-8"));
            LoadOptions opts = new LoadOptions();
            opts.setLoadFormat(LoadFormat.FLAT_OPC);
            Document tempDoc = new Document(bais, opts);
            DocumentBuilder tempBuilder = new DocumentBuilder(tempDoc);
            if (tempDoc.getLastSection().getBody().getLastParagraph().toString(SaveFormat.TEXT).trim().length() == 0)
            {
                tempDoc.getLastSection().getBody().getLastParagraph().remove();
            }
            insertHiddenWord(tempBuilder, field + slNo, false);
            tempBuilder.moveToDocumentEnd();
            insertHiddenWord(tempBuilder, field + slNo, true);

            int importFormatting = ImportFormatMode.KEEP_SOURCE_FORMATTING;
            ImportFormatOptions importFormatOptions = new ImportFormatOptions();
 
            builder.insertDocument(tempDoc, importFormatting, importFormatOptions);
        }
    }
    catch (Exception e)
    {
        System.out.println("Error while creating document for inserting " + field);
        e.printStackTrace();
    }
}

Here is the output document produced on my side: output.docx (13.8 KB)
output.pdf (64.1 KB)
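
Since the stored string is now loaded with LoadFormat.FLAT_OPC rather than as HTML, it may help to check what a stored string actually contains before loading it. The following is only a sketch and uses no Aspose classes: the class and method names are hypothetical, and it assumes that some stored strings might still be HTML from earlier saves, which the thread does not confirm. It relies on the fact that a Flat OPC file is a single XML document whose root element is pkg:package in the Microsoft xmlPackage namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Aspose.Words or of the attached sample code.
final class FlatOpcCheck {

    // Namespace of the root <pkg:package> element of a Flat OPC document.
    static final String PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Returns true only if the content is well-formed XML with a pkg:package root.
    static boolean looksLikeFlatOpc(String content) {
        if (content == null || content.trim().isEmpty()) {
            return false;
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Flat OPC has no DOCTYPE; rejecting it avoids fetching external DTDs.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(content)))
                    .getDocumentElement();
            return PKG_NS.equals(root.getNamespaceURI()) && "package".equals(root.getLocalName());
        } catch (Exception e) {
            // Not well-formed XML (for example, typical HTML), so not Flat OPC.
            return false;
        }
    }
}
```

A caller could use such a check to decide whether to load a string with LoadFormat.FLAT_OPC or to handle it some other way; how to handle the other case is left open here.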

Hi,

I have gone through your code and your output. The negative indentation issue is fixed, but now every paragraph has paragraph numbering. As you can see in my input document, I removed the numbering from the paragraphs after the 8th and 10th paragraphs, but in the output file those paragraphs are numbered as well. Please provide a solution that also covers this scenario.

Thank you

@Gptrnt There are two reasons for the numbering problem.

  1. In your code you extract only the first item.
    Here is the modified code:
    private static List<HashMap<String,String>> SaveInItem(BookmarkCollection bookmarkCollection, Document document, SaveOptions saveOptions) {
        List<HashMap<String,String>> item = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            HashMap<String, String> details = new HashMap<>();
            details.put("Title", getTitle(bookmarkCollection, i));
            details.put("Note", getHtmlContentFromBookMark("m", bookmarkCollection, document, saveOptions, i));
            details.put("Description", getHtmlContentFromBookMark("f", bookmarkCollection, document, saveOptions, i));
            item.add(details);
        }
        return item;
    }
  2. You are using the old 21.1 version of Aspose.Words for Java. After updating to the latest version, 22.1, the problem does not occur: output.docx (16.5 KB)
    output.pdf (70.9 KB)

Hi,
I didn’t find any change in the code. It is the same code I attached. Please correct it and send me the correct code.

Thank you

@Gptrnt In your code there were hardcoded indexes, like this:

details.put("Note",getHtmlContentFromBookMark("m", bookmarkCollection, document, saveOptions, 1));
details.put("Description",getHtmlContentFromBookMark("f", bookmarkCollection, document, saveOptions, 1));

I replaced them with the loop variable i:

details.put("Note",getHtmlContentFromBookMark("m", bookmarkCollection, document, saveOptions, i));
details.put("Description",getHtmlContentFromBookMark("f", bookmarkCollection, document, saveOptions, i));
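
To illustrate why the hardcoded index matters, here is a small stand-alone sketch. It does not use Aspose: a plain Map stands in for the BookmarkCollection, and the names BookmarkIndexSketch, contentFor and collectItems are hypothetical. Each item's bookmark name is built as field + index, so passing 1 on every iteration would read m1 and f1 for every item, while passing i reads m1/f1, m2/f2 and so on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the sample project; no Aspose classes are used.
final class BookmarkIndexSketch {

    // Plays the role of getHtmlContentFromBookMark: looks up "field + slNo".
    static String contentFor(Map<String, String> bookmarks, String field, int slNo) {
        String value = bookmarks.get(field + slNo);
        return value == null ? "" : value;
    }

    // Collects one details map per item, using the loop variable as the index.
    static List<Map<String, String>> collectItems(Map<String, String> bookmarks, int itemCount) {
        List<Map<String, String>> items = new ArrayList<>();
        for (int i = 1; i <= itemCount; i++) {
            Map<String, String> details = new HashMap<>();
            details.put("Note", contentFor(bookmarks, "m", i));        // m1, m2, ...
            details.put("Description", contentFor(bookmarks, "f", i)); // f1, f2, ...
            items.add(details);
        }
        return items;
    }
}
```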

Hi,

I have tried your code, but the code quoted above does not work. It throws the following error at run time:

java.lang.IllegalStateException: Exporting fragments of a document in this format is not supported.

The error occurs when OoxmlSaveOptions is passed in dstHTML.toString(ooxmlSaveOption());

Full code:

private static String getHtmlContentFromBookMark(String field, BookmarkCollection bookmarkCollection, Document document, SaveOptions saveOptions, int slNo) {
        try {
            Bookmark bookmark = bookmarkCollection.get(field + slNo);
            if (bookmark == null) return "";
            Node startNode = bookmark.getBookmarkStart();
            Node endNode = bookmark.getBookmarkEnd();
            ArrayList<Node> extractedNodes = extractContent(startNode, endNode, false);
            Document dstHTML = generateDocument(document, extractedNodes);
            return dstHTML.toString(ooxmlSaveOption());
        }catch (Exception e){
            System.out.println("error while fetching bookmark of preamble "+ field + slNo );
        }
        return "";
    }

Please help me to fix the issue.

Thank you

@Gptrnt Sorry, I forgot to share the modified version of this method. Please see the following code:

private static String getHtmlContentFromBookMark(String field, BookmarkCollection bookmarkCollection, Document document, SaveOptions saveOptions, int slNo)
{
    try
    {
        Bookmark bookmark = bookmarkCollection.get(field + slNo);
        if (bookmark == null) return "";
        Node startNode = bookmark.getBookmarkStart();
        Node endNode = bookmark.getBookmarkEnd();
        ArrayList<Node> extractedNodes = extractContent(startNode, endNode, false);
        Document dst = generateDocument(document, extractedNodes);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        dst.save(baos, saveOptions);
        return baos.toString("UTF-8");
    }
    catch (Exception e)
    {
        System.out.println("error while fetching bookmark of preamble " + field + slNo);
    }
    return "";
}

The FlatOpc format is not supported for exporting fragments, but in your case you extract the content into a separate document, so you can simply save that document to a stream and then convert the stream to a string, as demonstrated in the code above.
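
The stream-to-string step and the later string-to-stream step should use the same charset, otherwise non-ASCII characters in the document XML can be corrupted. The following stand-alone sketch shows the round trip with plain JDK classes only. The class and method names are hypothetical, and a hand-written XML snippet stands in for what dst.save(baos, saveOptions) would write.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; no Aspose classes are used.
final class StreamStringRoundTrip {

    // Saving side: same result as baos.toString("UTF-8"), without a checked exception.
    static String toStoredString(ByteArrayOutputStream baos) {
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    // Loading side: the stream that would be passed to new Document(stream, loadOptions).
    static ByteArrayInputStream toLoadStream(String stored) {
        return new ByteArrayInputStream(stored.getBytes(StandardCharsets.UTF_8));
    }

    // Reads every remaining byte of an in-memory stream.
    static byte[] readAll(ByteArrayInputStream in) {
        byte[] buffer = new byte[in.available()];
        in.read(buffer, 0, buffer.length);
        return buffer;
    }

    public static void main(String[] args) {
        // Stand-in for the bytes written by dst.save(baos, saveOptions).
        byte[] saved = "<w:t>caf\u00e9 \u2019quoted\u2019</w:t>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(saved, 0, saved.length);

        String stored = toStoredString(baos);            // value kept in the database
        byte[] reloaded = readAll(toLoadStream(stored)); // bytes seen when loading again
        System.out.println("bytes preserved: " + Arrays.equals(saved, reloaded));
    }
}
```

If the database column or connection uses a different encoding, the same care applies there; that part depends on the project setup and is not covered by this sketch.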

Hi,

I have tried the solution. While testing it, I encountered some other issues with some HTML. Can you help me with another solution for the issue?

Thank you

@Gptrnt Could you please elaborate on what problems you encountered with the suggested approach? If possible, please attach sample input, output and expected documents here for testing. We will check the issues and provide you more information.