Hyperlink when extracting nodes from document

hariomgupta73 · June 1, 2022, 2:28pm

Hi Team,

I am facing an issue while fetching nodes when the document contains an href (hyperlink). I have some HTML, which I insert into a document using document.insertHtml(). I then read the nodes from that document and copy them to a new/main document. The issue occurs when the document contains an href: it yields 2 nodes, and when the nodes are copied into the main document, the link does not work and duplicate text is shown:

"HYPERLINK "http://qa.dev.com:8080/ux/" \l "/update/tblclause/5153?update=5" \t "_blank" http://qa.dev.com:8080/ux/#/update/tblclause/5153?update=5"

Code that I use to fetch the nodes:

private static void insertParagraphChildNodes(DocumentBuilder builder, Paragraph paragraphNode, boolean parentNodeCell) throws Exception {
    NodeCollection nodes = paragraphNode.getChildNodes();
    Iterator iterator = nodes.iterator();
    builder.getFont();
    while (iterator.hasNext()) {
        Node childNode = (Node) iterator.next();
        if (childNode.getNodeType() == NodeType.RUN) {
            Run runNode = (Run) childNode;
            builder.getFont().setBold(runNode.getFont().getBold());
            builder.getFont().setItalic(runNode.getFont().getItalic());
            builder.getFont().setUnderline(runNode.getFont().getUnderline());
            builder.getFont().setColor(runNode.getFont().getColor());
            builder.getFont().setHighlightColor(runNode.getFont().getHighlightColor());
            if (runNode.getFont().getShading() != null) {
                builder.getFont().getShading().setForegroundPatternColor(runNode.getFont().getShading().getForegroundPatternColor());
                builder.getFont().getShading().setBackgroundPatternColor(runNode.getFont().getShading().getBackgroundPatternColor());
            }
            builder.write(runNode.getText());
        } else if (childNode.getNodeType() == NodeType.SHAPE) {
            Shape shapeNode = (Shape) childNode;
            if (shapeNode.hasImage()) {
                try {
                    builder.insertImage(shapeNode.getImageData().getImageBytes(), shapeNode.getWidth(), shapeNode.getHeight());
                } catch (Exception e) {
                }
            }
        }
    }
    if (!parentNodeCell)
        builder.writeln();
}

Attached file testDoc.docx (12.1 KB)

alexey.noskov · June 1, 2022, 2:58pm

@hariomgupta73 A hyperlink in an MS Word document, and in the Aspose.Words DOM, is represented by a HYPERLINK field. Fields are represented by special nodes: FieldStart, FieldSeparator and FieldEnd. The text between the start and the separator is the field code, and the text between the separator and the end is the field result (the displayed text). Please see our documentation to learn more about fields.
In your code you copy only Run and Shape nodes and skip other nodes such as FieldStart, FieldSeparator and FieldEnd, so both the field code and the field result are displayed as plain text. Your code can be modified like this:

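private static void insertParagraphChildNodes(DocumentBuilder builder, Paragraph paragraphNode, boolean parentNodeCell) throws Exception {
    NodeCollection nodes = paragraphNode.getChildNodes();
    Iterator iterator = nodes.iterator();
    while (iterator.hasNext()) {
        Node childNode = (Node) iterator.next();
        // Import every child node (runs, shapes and the field marker nodes) into the
        // destination document, keeping the source formatting.
        Node dstNode = builder.getDocument().importNode(childNode, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        // Insert the imported node at the builder's current position.
        builder.insertNode(dstNode);
    }
    // Outside a table cell, end the paragraph after its children have been copied.
    if (!parentNodeCell)
        builder.writeln();
}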
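
To illustrate why copying only the Run nodes shows the field code as text, here is a small toy model in plain Java. It does not use Aspose.Words: the Kind enum, the Node class, the method names and the example.com URL are all hypothetical stand-ins for the structure described above, and the sketch handles only a single, non-nested field.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldModelDemo {
    // Hypothetical stand-ins for the node kinds named above; NOT the Aspose.Words classes.
    enum Kind { FIELD_START, FIELD_SEPARATOR, FIELD_END, RUN }

    static final class Node {
        final Kind kind;
        final String text;

        Node(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Flat child sequence of a paragraph with one hyperlink field (URL is made up).
    static List<Node> hyperlinkParagraph() {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node(Kind.RUN, "See "));
        nodes.add(new Node(Kind.FIELD_START, ""));
        nodes.add(new Node(Kind.RUN, " HYPERLINK \"http://example.com/\" ")); // field code
        nodes.add(new Node(Kind.FIELD_SEPARATOR, ""));
        nodes.add(new Node(Kind.RUN, "the link"));                            // field result
        nodes.add(new Node(Kind.FIELD_END, ""));
        return nodes;
    }

    // Copies only run text and ignores the markers: code and result both become plain text.
    static String copyRunsOnly(List<Node> nodes) {
        StringBuilder sb = new StringBuilder();
        for (Node n : nodes) {
            if (n.kind == Kind.RUN) {
                sb.append(n.text);
            }
        }
        return sb.toString();
    }

    // Honors the markers: runs between start and separator are field code and stay hidden.
    static String visibleText(List<Node> nodes) {
        StringBuilder sb = new StringBuilder();
        boolean inFieldCode = false;
        for (Node n : nodes) {
            switch (n.kind) {
                case FIELD_START:
                    inFieldCode = true;
                    break;
                case FIELD_SEPARATOR:
                case FIELD_END:
                    inFieldCode = false;
                    break;
                default:
                    if (!inFieldCode) {
                        sb.append(n.text);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Node> nodes = hyperlinkParagraph();
        System.out.println(copyRunsOnly(nodes)); // field code leaks into the text
        System.out.println(visibleText(nodes));  // only the displayed text remains
    }
}
```

In this model, copyRunsOnly returns the HYPERLINK code followed by the link text, which mirrors the duplicated text in the question, while visibleText returns only "See the link". Importing the marker nodes together with the runs, as in the modified method above, is what keeps the field intact in the destination document.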

hariomgupta73 · June 1, 2022, 3:34pm

Thanks alexey.noskov.