Drop Down List

Hi,

I need to extract all text from my Word document using DocumentVisitor, but I don’t know how to extract the text from a drop-down list. Here is a document with a drop-down list. Can you help me?

Thanks

Hi Djordje,

Thanks for your inquiry. You can interact with content controls by using the StructuredDocumentTag class. Here is the code to extract the display text and values from the Drop Down List:

// Visit every content control (SDT) in the document, including nested ones
NodeCollection sdtNodes = doc.getChildNodes(NodeType.STRUCTURED_DOCUMENT_TAG, true);
for (StructuredDocumentTag sdt : (Iterable<StructuredDocumentTag>) sdtNodes) {
    // Only drop-down list SDTs carry the list items we want here
    if (sdt.getSdtType() == SdtType.DROP_DOWN_LIST) {
        for (SdtListItem item : (Iterable<SdtListItem>) sdt.getListItems()) {
            System.out.println(item.getDisplayText() + " | " + item.getValue());
        }
    }
}

I hope this helps.

Best regards,

Hi,

This doesn’t help me enough. I can extract the text from a drop-down list this way, but I also need to write the text back into the drop-down list with the DocumentBuilder.write() method: I extract the text to XML, process it, and then put it back into the document. However, I cannot move the DocumentBuilder to an item of the drop-down list. Do you know a way to do this?

To do this, I would need something like visitStructuredDocumentTagItem()…

Thanks

Hi Djordje,

Thanks for your inquiry. I am afraid that currently you cannot modify the choices (a list item’s display text and value) in drop-down list SDT controls. I have logged your requirement in our issue tracking system as WORDSNET-10365. We will look further into the details of this problem and keep you updated on the status of this issue. We apologize for your inconvenience.

Best regards,

Hi,

While working on my improvement for lists (drop-down list and combo box), I found a few problems in your framework. Is there a way you can help me with these?

The attached document contains two drop-down lists. In the first one, the problem is that the first item, “Mother – Marcela (Spanish Speaking)”, was extracted as a run but not as an item.

In the second list, the item “Family lives closest to our Main Hospital Location at Thomas.” was extracted twice: first as an item (SdtListItem) and then as a run. This is quite a problem for me.

Here is my test:

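public class DocToText extends DocumentVisitor {

    public int visitRun(Run run) throws Exception {
        System.out.println("Run text: " + run.getText());
        return VisitorAction.CONTINUE;
    }

    public int visitStructuredDocumentTagStart(StructuredDocumentTag sdt)
            throws Exception {
        // getListItems() applies only to drop-down list and combo box SDTs
        if (sdt.getSdtType() == SdtType.DROP_DOWN_LIST || sdt.getSdtType() == SdtType.COMBO_BOX) {
            // A raw (Iterable) cast yields Object, so the element type must be given
            for (SdtListItem item : (Iterable<SdtListItem>) sdt.getListItems()) {
                System.out.println("Item text: " + item.getValue() + "|" + item.getDisplayText());
            }
        }
        return super.visitStructuredDocumentTagStart(sdt);
    }
}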
public void test() throws Exception {
    Document doc = new Document("List1.docx");
    DocToText myConverter = new DocToText();
    doc.accept(myConverter);
}

Hi Djordje,

Thanks for your inquiry.

The problems occur because the text “Mother – Marcela (Spanish Speaking)” is a direct child (Run) of the first SDT, and “Family lives closest to our Main Hospital Location at Thomas.” is a direct child of the second SDT (a combo box). Moreover, “Family lives closest to our Main Hospital Location at Thomas.” is also a list item inside the second SDT. You can remove these Run nodes from the SDT to fix this issue. For example, please see the attached document, which was generated using the following code:

Document doc = new Document(getMyDir() + "List1.docx");
for (StructuredDocumentTag sdt : (Iterable<StructuredDocumentTag>)doc.getChildNodes(NodeType.STRUCTURED_DOCUMENT_TAG, true)) {
    if (sdt.getSdtType() == SdtType.COMBO_BOX) {
        sdt.removeAllChildren();
        // sdt.getListItems().setSelectedValue(sdt.getListItems().get(1));
    }
}
doc.save(getMyDir() + "out.docx");
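If you need to keep text that exists only as a run (as in the first SDT) while printing text that is both a list item and a run (as in the second SDT) only once, one option is to filter each SDT's direct-child run texts against that SDT's own list items instead of removing the runs. The sketch below models this in plain Java so it runs without Aspose.Words; SdtContent and collectTexts are hypothetical names, and in a real visitor you would fill the two lists from SdtListItem.getDisplayText() and Run.getText(). It assumes a duplicated run matches the item's display text exactly, and the sample strings are made up.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SdtTextFilter {

    // Hypothetical stand-in for one list SDT: the display texts of its list items
    // and the texts of the Run nodes that are direct children of the SDT.
    static final class SdtContent {
        final List<String> itemDisplayTexts;
        final List<String> runTexts;

        SdtContent(List<String> itemDisplayTexts, List<String> runTexts) {
            this.itemDisplayTexts = itemDisplayTexts;
            this.runTexts = runTexts;
        }
    }

    // Returns each text of the SDT once: all item display texts first, then every
    // run text that does not repeat one of this SDT's own item display texts.
    static List<String> collectTexts(SdtContent sdt) {
        List<String> result = new ArrayList<>(sdt.itemDisplayTexts);
        Set<String> itemTexts = new HashSet<>(sdt.itemDisplayTexts);
        for (String runText : sdt.runTexts) {
            if (!itemTexts.contains(runText)) {
                result.add(runText);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from List1.docx
        SdtContent duplicated = new SdtContent(List.of("Red", "Green"), List.of("Green"));
        SdtContent extraRun = new SdtContent(List.of("Red", "Green"), List.of("Please choose"));
        System.out.println(collectTexts(duplicated)); // [Red, Green]
        System.out.println(collectTexts(extraRun));   // [Red, Green, Please choose]
    }
}
```

Scoping the check to a single SDT means identical text elsewhere in the document is still extracted normally.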

I hope this helps.

Best regards,

Hi,

Thanks, but this doesn’t help me. If I do this, the texts “Mother – Marcela (Spanish Speaking)” and “Vanessa is an6year 0month old child who was referred to developmental pediatrics by the pediatric provider stated above due to concerns regarding developmental progression with language.” will not be extracted, and I need to extract all text, without repetition, using DocumentVisitor.

public void test() throws Exception {
    Document doc = new Document("/home/emisia/Desktop/List1.docx");
    DocumentBuilder builder= new DocumentBuilder(doc);
    DocToText myConverter = new DocToText(builder);
    doc.accept(myConverter);
}

public class DocToText extends DocumentVisitor {

    DocumentBuilder builder;

    public DocToText(DocumentBuilder builder) {
        this.builder = builder;
    }

    public int visitRun(Run run) throws Exception {
        System.out.println("Run text: " + run.getText());
        return VisitorAction.CONTINUE;
    }

    public int visitStructuredDocumentTagStart(StructuredDocumentTag sdt)
            throws Exception {
        sdt.removeAllChildren();
        for (SdtListItem item : (Iterable<SdtListItem>) sdt.getListItems()) {
            System.out.println("Item text: " + item.getValue() + "|"
                    + item.getDisplayText());
            SdtListItem newItem = new SdtListItem("T_" + item.getValue());
        }
        return super.visitStructuredDocumentTagStart(sdt);
    }
}

Hi Djordje,

Thanks for your inquiry. I think you should maintain a collection of unique items (for example, an ArrayList) inside the DocToText class. When a Run is encountered, add its text to the collection only if the same text is not already present. Likewise, when an SdtListItem is encountered, check whether the collection already contains the same text before adding it. I hope this helps.
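The suggestion above can be sketched as a small collector. This version swaps the ArrayList plus contains() check for a LinkedHashSet, which gives the same result (unique texts in first-seen order) without a linear search on every add. UniqueTextCollector is a hypothetical helper, not part of Aspose.Words; the idea is to keep one instance as a field of DocToText and call add(run.getText()) from visitRun and add(item.getDisplayText()) from visitStructuredDocumentTagStart.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Records extracted texts in first-seen order and ignores exact repeats.
public class UniqueTextCollector {
    private final Set<String> seen = new LinkedHashSet<>();

    // Adds the text if it has not been seen before; returns true when it was added.
    // Null and empty strings are ignored.
    public boolean add(String text) {
        if (text == null || text.isEmpty()) {
            return false;
        }
        return seen.add(text);
    }

    // Returns the unique texts in the order they were first added.
    public List<String> texts() {
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        UniqueTextCollector collector = new UniqueTextCollector();
        collector.add("Red");   // e.g. from SdtListItem.getDisplayText()
        collector.add("Green"); // e.g. from SdtListItem.getDisplayText()
        collector.add("Green"); // e.g. the same text again from Run.getText()
        System.out.println(collector.texts()); // [Red, Green]
    }
}
```

Note that a single document-wide set also drops text that legitimately appears twice in ordinary paragraphs; if that matters, keep a separate set per SDT instead.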

Best regards,

Hi Djordje,

Thanks for being patient. This is to let you know that the fix for this issue has been postponed to a later date; we cannot push it into production right now because there are many other important issues we have to work on. As a workaround, please clear the SdtListItemCollection and recreate every item as needed. Rest assured, we will inform you via this thread as soon as this issue is resolved. We apologize for any inconvenience.
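The workaround amounts to: copy out each item's display text and value, clear the collection, and add freshly created items carrying the edited strings. The sketch below models this on a java.util.List so it runs without Aspose.Words; Item and rebuild are hypothetical names. With Aspose.Words, the same steps would presumably use sdt.getListItems() with the collection's clear() and add(...) methods and a new SdtListItem(displayText, value) per entry; please check those signatures against the API reference for your version. Clearing may also reset the selection, so you may need to select an item again afterwards, for example with setSelectedValue(...).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class ListItemRebuild {

    // Hypothetical stand-in for SdtListItem, whose display text and value
    // cannot be changed once the item has been created.
    static final class Item {
        final String displayText;
        final String value;

        Item(String displayText, String value) {
            this.displayText = displayText;
            this.value = value;
        }
    }

    // Copies the current items, clears the collection, then re-adds one new item per
    // copied entry with its display text and value passed through the given edits.
    static void rebuild(List<Item> items, UnaryOperator<String> editDisplayText,
                        UnaryOperator<String> editValue) {
        List<Item> snapshot = new ArrayList<>(items); // copy before clearing
        items.clear();
        for (Item old : snapshot) {
            items.add(new Item(editDisplayText.apply(old.displayText), editValue.apply(old.value)));
        }
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>(List.of(new Item("Red", "r"), new Item("Green", "g")));
        // Prefix the display texts, as in the "T_" experiment earlier in this thread
        rebuild(items, s -> "T_" + s, s -> s);
        for (Item item : items) {
            System.out.println(item.displayText + " | " + item.value);
        }
        // prints "T_Red | r" then "T_Green | g"
    }
}
```

Taking the snapshot first matters because clearing the collection you are iterating would otherwise leave nothing to rebuild from.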

Best regards,

Hi,

I have managed to implement some support for drop-down lists, but now I have found a new problem. So far I have inserted runs into drop-down lists this way: structuredDocumentTag.appendChild(run); but for the drop-down list in the attached document this doesn’t work.

Can you tell me why this doesn’t work and how I could fix it?

Here is my test code:

public class AsposeSDT {
    StructuredDocumentTag s;

    @Test
    public void test() throws Exception {
        Document document = new Document(
                "/the_simplest_possible_combobox_test.docx");
        Iterator i = new Iterator();
        document.accept(i);
        Run r = new Run(document, "newText");
        s.appendChild(r);
    }

    private class Iterator extends DocumentVisitor {
        public int visitStructuredDocumentTagStart(StructuredDocumentTag sdt)
                throws Exception {
            s = sdt;
            return super.visitStructuredDocumentTagStart(sdt);
        }
    }
}

If you run the same test with the attached file named works.docx, you will see that it works. I don’t know where the problem is.

Thanks

Best regards

Hi Djordje,

We are working on your query and will update you soon.

Best Regards,

Hi Djordje,

Thanks for your inquiry. Please try viewing these documents with the DocumentExplorer project. After reading the following article, you can view the DOM structures of these documents and find the differences.
https://docs.aspose.com/words/java/aspose-words-document-object-model/

I hope this helps.

Secondly, we will inform you as soon as we add an ability to set values for SdtListItem.Value and SdtListItem.DisplayText properties.

Best regards,

Hi,

I cannot find DocumentExplorer at https://github.com/aspose-words/Aspose.Words-for-Java. Can you send me a link to DocumentExplorer?

A few posts ago you told me that support for this in SDTs won’t be implemented soon. If that is still true, could you investigate some other way for me to do what I need with a document like this?

Thanks

Best regards

Hi Djordje,

Thanks for your inquiry. You can find DocumentExplorer under the ‘viewersandvisualizers’ folder.
https://github.com/aspose-words/Aspose.Words-for-Java

Secondly, after visualizing the DOM structure of these documents, you may be able to push some nodes under SDT node and get the desired results.

Best regards,

@djordje,

Regarding WORDSNET-10365, we have completed the work on this issue and decided to close it with “Won’t fix” status because there is an easy workaround: just clear the SdtListItemCollection and recreate every item as needed.