Last chart image is missing while extracting content

Gptrnt · August 29, 2022, 2:09pm

Hi,
I am extracting content between two special hidden markers. After finding the markers, I convert them to bookmarks and pass them to the extract method, then convert the extracted content to HTML and store it. Later, as required, I take the stored HTML and create a document from it. The sample code is attached: SampleImport.zip (50.4 KB). The extracted document and the document created from the sample HTML should be the same. However, for this input file, input.docx (31.9 KB), the output file does not contain the added chart image (in hidden form). The output file is attached: output.docx (7.5 KB). Please help me resolve this issue.

Note: My expected output file should be the same as the input file, except without the hidden markers (|p|, |/p|).

Thank you

alexey.noskov · August 29, 2022, 2:36pm

@Gptrnt It looks like there is something wrong with your method that extracts content between nodes. I tested with the following code, and the content is extracted properly:

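import com.aspose.words.*;
import java.util.ArrayList;
import java.util.regex.Pattern;

Document document = new Document("C:\\Temp\\in.docx");

// Search from the end of the document and route every match through the callback
// from the attached sample project.
FindReplaceOptions options = new FindReplaceOptions(FindReplaceDirection.BACKWARD);
AsposeReplaceCallBack replaceCallBack = new AsposeReplaceCallBack();
options.setReplacingCallback(replaceCallBack);

// Match the hidden markers, for example |p2| and |/p2| (case-insensitive).
Pattern pattern = Pattern.compile("\\|[/]?[pnt][0-9][/]?\\|", Pattern.CASE_INSENSITIVE);
document.getRange().replace(pattern, " ", options);

// Extract the nodes between the start and end of bookmark "p2" (non-inclusive) and save them as a new document.
Bookmark bk = document.getRange().getBookmarks().get("p2");
ArrayList<Node> nodes = ExtractContentHelper.extractContent(bk.getBookmarkStart(), bk.getBookmarkEnd(), false);
Document outputDoc = ExtractContentHelper.generateDocument(document, nodes);
outputDoc.save("C:\\Temp\\out.docx", SaveFormat.DOCX);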
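
As a side check, the marker pattern from the code above can be tried on plain strings without Aspose.Words. The following is only a sketch: the class name MarkerPatternDemo, the helper findMarkers, and the sample text are made up for illustration and are not part of the attached sample. It suggests that the pattern requires a digit after the letter, so a numbered marker such as |p2| matches while a bare |p| does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkerPatternDemo {
    // Same regular expression as in the snippet above.
    static final Pattern MARKER =
            Pattern.compile("\\|[/]?[pnt][0-9][/]?\\|", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns every marker found in the text, in order.
    static List<String> findMarkers(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = MARKER.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Illustrative text only; the real document content is not reproduced here.
        String sample = "|p2|Chart paragraph|/p2| and |P|plain|/p|";
        System.out.println(findMarkers(sample)); // prints [|p2|, |/p2|]
    }
}
```

If the markers in a document are written without a digit, the pattern would need to make the digit optional (for example `[0-9]?`) before the callback could see them.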

Here is the ExtractContentHelper implementation used on my side: ExtractContentHelper.zip (3.2 KB)
And here is the output document produced on my side: out.docx (20.3 KB)