Numbering problem in Word to HTML conversion

Hi,
We used the Aspose.Words for Java API to convert a Word document to HTML. It did a good job overall, except that some of the list numbers are missing. Please see the attached image for details.

Most of the numbering is good, only a few of them are missing. It seems to be a small bug.

Thanks,
Raghavendra

@raghud1005 Could you please attach your input and output documents here for testing? We will check the issue and provide you more information. Unfortunately, it is difficult to say what the problem is without real documents.

Also, you can experiment with the HtmlSaveOptions.ExportListLabels property on your side.

Hi Alexey,

While preparing the files to send, I noticed that the Aspose output was perfect. The numbering is getting lost somewhere in our LuceneIndexing step.

Thanks for your quick response.

Regards,
Raghavendra

@raghud1005 It is great that you managed to find the cause of the problem. Please feel free to ask if you run into any other issues; we will be glad to help you.

Hi Alexey,
I need your help once again.
I need to replace text that starts and ends with certain characters, e.g. %TFF% (a plain text input) or %TFF12345% (a text input with 12345 as the default text). The replacement works, but the position is lost: the text input (TI) field is inserted at the beginning or at the end of the paragraph instead of at the exact location of the match.

Please take a look and let me know what I am doing wrong, or let me know if you need more information.
Thanks,

Raghavendra

TFF1_noformat.docx (12 KB)

outnew.docx (9.76 KB)

(Attachment MyReplaceEvaluator.java is missing)

(Attachment TestCls.java is missing)

public class TestCls {

    static {
        AsposeConfig.setAsposeLicense();
    }

    public static Document getDocumentFromPath(Path pathVar) throws Exception {
        return new Document(new FileInputStream(pathVar.toFile()));
    }

    public static void main(String[] args) throws Exception {

        Document doc = getDocumentFromPath(Path.of("C:\\git\\TribunaMergeFieldMigrator\\src\\test\\resources\\test_data\\src_directory\\BE\\2024\\TFF1_noformat.docx"));
        // DocumentBuilder builder = new DocumentBuilder(doc);

        FindReplaceOptions options = new FindReplaceOptions();

        options.setReplacingCallback(new MyReplaceEvaluator());

        doc.getRange().replace(Pattern.compile("%TFF(.*?)%"), "", options);
        // Now replace all tabs with sequence of whitespaces.
        //doc.getRange().replace(new Rege("\t"), new MyReplaceEvaluator("\t"),false);
        // Save the result.
        doc.save("C:\\Temp\\outnew.docx");
    }
}
public class MyReplaceEvaluator implements IReplacingCallback {

    @Override
    public int replacing(ReplacingArgs e) throws Exception {
        int maxLength = 50;
        DocumentBuilder builder = new DocumentBuilder((Document) e.getMatchNode().getDocument());
        Node matchNode = e.getMatchNode();
        builder.moveTo(matchNode);
        String text1 = builder.getCurrentParagraph().getText();
        String defaultText = getDefaultText(text1.trim());
        System.out.println(" text1 = " + text1);

        FormField textInput = builder.insertTextInput("a", TextFormFieldType.REGULAR, "", "", maxLength);
        textInput.setTextInputDefault(defaultText);

        textInput.setTextInputValue("TI");

        e.getReplacement();
        return ReplaceAction.REPLACE;
    }

    private String getDefaultText(String input) {
        String output = "";
        String patternString = "%TFF(.*?)%";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(input);

        if (matcher.find()) {
            output = matcher.group(1);
            //System.out.println("Extracted text: " + output);
        } else {
            //System.out.println("No match found.");
        }
        return output;
    }
}

@raghud1005 The problem occurs because the match node does not always contain only the matched text. To get the expected output, you should split the matched node so that the match sits in runs of its own. Please see the following code and the output it produces:

Document doc = new Document("C:\\Temp\\in.docx");
FindReplaceOptions options = new FindReplaceOptions();
options.setReplacingCallback(new ReplaceWithFormFieldCallback());
doc.getRange().replace(Pattern.compile("%TFF(.*?)%"), "", options);
doc.save("C:\\Temp\\out.docx");

// ReplaceWithFormFieldCallback.java
import com.aspose.words.*;
import java.util.ArrayList;

public class ReplaceWithFormFieldCallback implements IReplacingCallback {
    
    @Override
    public int replacing(ReplacingArgs e) throws Exception {
        
        Document doc = (Document)e.getMatchNode().getDocument();
        
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.getMatchNode();
        
        // The first (and may be the only) run can contain text before the match,
        // in this case it is necessary to split the run.
        if (e.getMatchOffset() > 0)
            currentNode = splitRun((Run)currentNode, e.getMatchOffset());
        
        // This list collects all runs of the match so they can be deleted afterwards.
        ArrayList<Run> runs = new ArrayList<Run>();
        
        // Find all runs that contain parts of the match string.
        int remainingLength = e.getMatch().group().length();
        while (
                remainingLength > 0 &&
                        currentNode != null &&
                        currentNode.getText().length() <= remainingLength)
        {
            runs.add((Run)currentNode);
            remainingLength -= currentNode.getText().length();
            
            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.getNextSibling();
            } while (currentNode != null && currentNode.getNodeType() != NodeType.RUN);
        }
        
        // Split the last run that contains the match if there is any text left.
        if (currentNode != null && remainingLength > 0)
        {
            splitRun((Run)currentNode, remainingLength);
            runs.add((Run)currentNode);
        }
        
        // Create a DocumentBuilder to insert the text input form field.
        DocumentBuilder builder = new DocumentBuilder(doc);
        // Move the builder to the first run of the match.
        builder.moveTo(runs.get(0));
        FormField textInput = builder.insertTextInput("a"+mIndex, TextFormFieldType.REGULAR, "", "", 0);
        textInput.setTextInputDefault("defaultText");
        textInput.setTextInputValue("TI");
        mIndex++;
        
        // Delete matched runs
        for (Run run : runs)
            run.remove();
        
        // Tell the replace engine to do nothing, because the callback has already made all the changes.
        return ReplaceAction.SKIP;
    }
    
    private static Run splitRun(Run run, int position)
    {
        Run afterRun = (Run)run.deepClone(true);
        run.getParentNode().insertAfter(afterRun, run);
        afterRun.setText(run.getText().substring(position));
        run.setText(run.getText().substring(0, position));
        return afterRun;
    }
    
    private int mIndex = 0;
}

out.docx (9.8 KB)
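
To illustrate the splitting idea without the Aspose API, here is a minimal sketch that models a paragraph's runs as plain strings. The class RunSplitDemo and its isolate helper are hypothetical names used only for this illustration; only the splitRun logic mirrors the callback above, and the sketch handles a match that lies inside a single run.

```java
import java.util.ArrayList;
import java.util.List;

final class RunSplitDemo {

    /**
     * Splits the run at the given index into two runs at the given position.
     * The head stays in place, the tail is inserted right after it.
     * Returns the index of the tail.
     */
    static int splitRun(List<String> runs, int index, int position) {
        String text = runs.get(index);
        runs.set(index, text.substring(0, position));
        runs.add(index + 1, text.substring(position));
        return index + 1;
    }

    /**
     * Hypothetical helper: starts with one run and splits it so that the
     * match (given by offset and length) ends up in a run of its own.
     */
    static List<String> isolate(String runText, int matchOffset, int matchLength) {
        List<String> runs = new ArrayList<>(List.of(runText));
        int current = 0;
        // Text before the match: cut it off and continue with the tail.
        if (matchOffset > 0) {
            current = splitRun(runs, current, matchOffset);
        }
        // Text after the match: cut it off so the current run holds only the match.
        if (runs.get(current).length() > matchLength) {
            splitRun(runs, current, matchLength);
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(isolate("CHF %TFF% now", 4, 5));
    }
}
```

Once the match is isolated like this, the form field can be inserted at the position of the matched run and the matched run removed, which is what the callback does with real Run nodes.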

Hi Alex,

Thank you so much. The position problem is solved now. However, the default text for the text input needs to be captured dynamically, and that no longer works. For example, for %TFFmein Standarttext für den Input% the default text should be "mein Standarttext für den Input",

but for %TFF% it should be empty, as there is no text between %TFF and %.

Thanks & Regards,
Ragavendra

@raghud1005 In this case you should simply replace the following line of code:

textInput.setTextInputDefault("defaultText");

with this:

textInput.setTextInputDefault(e.getMatch().group(1));
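
As a quick check of what group(1) returns for this pattern, here is a small sketch that uses only java.util.regex. The class name TffDefaultTextDemo and the defaultTexts helper are made up for this illustration; the pattern is the one used in the replace call above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TffDefaultTextDemo {

    private static final Pattern TFF = Pattern.compile("%TFF(.*?)%");

    /** Returns the text captured between "%TFF" and "%" for every match, in order. */
    static List<String> defaultTexts(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TFF.matcher(input);
        while (m.find()) {
            // group(1) is the lazily matched text; it is "" for a bare %TFF%.
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(defaultTexts("CHF %TFF% jusqu'au %TFF12345%"));
    }
}
```

For a bare %TFF% the captured group is an empty string rather than null, so it can be passed to setTextInputDefault directly.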

Hi Alex,

Thanks for the earlier reply. It was fine until one of our clients tried it with a document containing %TFF% next to each other or in French text.

German text (NullPointerException, no parent node found):
In der oben genannten Angelegenheit übermitteln wir Ihnen %TFFdie Verfügung%%TFFden Entscheid% vom «I_D_1».

%TFFDiese%%TFFDieser% gilt per «I_D_2» als zugestellt, nachdem die eingeschriebene Postsendung nicht abgeholt worden ist, obwohl mit einer Zustellung gerechnet werden muss-te (Art. 85 Abs. 4 lit. a StPO).

French text (IndexOutOfBoundsException):

Les parties demanderesses/requérantes fourniront une avance de frais de CHF %TFF% jusqu’au %TFF%, au moyen du bulletin de versement annexé, auprès du Tribunal régional du Jura bernois-Seeland, Section civile.

Please help me.

Thanks,
Raghavendra

@raghud1005 Could you please attach the problematic document here for testing? We will check the issue and provide you more information.

Hi Alexey,

Here are the problematic documents.
Postfach 535.docx (46.5 KB)

Briefkopf.docx (69.3 KB)

The goal is to replace every %TFF% in the text "CHF %TFF% jusqu’au %TFF%" with a text input, and to replace %TFF12345% with a text input that has 12345 as its default text.

I am looking forward to your solution.

Thanks,
Raghavendra

@raghud1005 Thank you for the additional information. Please try replacing in the backward direction:

Document doc = new Document("C:\\Temp\\in.docx");
FindReplaceOptions options = new FindReplaceOptions();
options.setReplacingCallback(new ReplaceWithFormFieldCallback());
options.setDirection(FindReplaceDirection.BACKWARD);
doc.getRange().replace(Pattern.compile("%TFF(.*?)%"), "", options);
doc.save("C:\\Temp\\out.docx");
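
A plausible reason why the backward direction helps, stated here as an assumption and not as a description of Aspose.Words internals: the callback splits and removes runs, which can invalidate the positions of matches that come later in the same paragraph. Working from the last match to the first avoids this, because each change only affects text after the matches that are still pending. The sketch below shows the same effect on a plain string; BackwardReplaceDemo and replaceBackward are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BackwardReplaceDemo {

    private static final Pattern TFF = Pattern.compile("%TFF(.*?)%");

    /** Replaces every match with "[" + captured text + "]", editing from the last match to the first. */
    static String replaceBackward(String input) {
        // Collect all match positions up front, as {start, end} pairs.
        List<int[]> spans = new ArrayList<>();
        List<String> defaults = new ArrayList<>();
        Matcher m = TFF.matcher(input);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
            defaults.add(m.group(1));
        }

        StringBuilder sb = new StringBuilder(input);
        // Each edit changes the text length, but only at or after its own span,
        // so the stored offsets of earlier matches remain valid.
        for (int i = spans.size() - 1; i >= 0; i--) {
            int[] span = spans.get(i);
            sb.replace(span[0], span[1], "[" + defaults.get(i) + "]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceBackward("a %TFFx%%TFFy% b %TFF%"));
    }
}
```

Applying the same stored offsets from first to last would shift every later span after the first edit, which is the string-level counterpart of the failures seen with adjacent %TFF...% placeholders.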

Hi Alexey,

Thank you so much. With this single line change, it works perfectly.

Regards,
Raghavendra
