Extraction issue 14

Dear team,

We are extracting images from a document using Aspose.Words for Java, but in the case below some part images are not extracted. Please find the source code and the input and output files below.

Source code:

if ((paragraph.toString(SaveFormat.TEXT).toLowerCase().trim().startsWith("fig")
					|| paragraph.toString(SaveFormat.TEXT).startsWith("Scheme")
					|| paragraph.toString(SaveFormat.TEXT).startsWith("Plate")
					|| paragraph.toString(SaveFormat.TEXT).startsWith("Abb")
					|| paragraph.toString(SaveFormat.TEXT).startsWith("Abbildung"))
					&& !paragraph.toString(SaveFormat.TEXT).toLowerCase().startsWith("abbreviations")
					// for duplicate figure caption it-15
					&& (paragraph.getNextSibling() != null
							&& !paragraph.getNextSibling().toString(SaveFormat.TEXT).trim().matches(matches)
							|| (paragraph.getNextSibling() != null
									&& paragraph.getNextSibling().getNodeType() != NodeType.TABLE
									&& paragraph.getNextSibling().toString(SaveFormat.TEXT).trim().matches(matches)
									&& (((Paragraph) paragraph.getNextSibling()).getChildNodes(NodeType.SHAPE, true)
											.getCount() > 0
											|| (paragraph.getNextSibling().getNextSibling()) != null
													&& paragraph.getNextSibling().getNextSibling()
															.getNodeType() != NodeType.TABLE
													&& ((((Paragraph) paragraph.getNextSibling().getNextSibling())
															.getChildNodes(NodeType.SHAPE, true).getCount() == 0)
															
															// this condition was added by pavi (14-12-2021) for duplicate captions;
															// combined with the "== 0" check above, it makes this sub-clause always true
															||(((Paragraph) paragraph.getNextSibling().getNextSibling())
																	.getChildNodes(NodeType.SHAPE, true).getCount() > 0))))
							|| paragraph.getParentSection().getBody().getLastParagraph().getText().trim()
									.matches(matches))
					// for duplicate figure caption
					&& ((paragraph.getPreviousSibling() != null
							&& paragraph.getPreviousSibling().getNodeType() != NodeType.TABLE)
							|| paragraph.getParentSection().getBody().getFirstParagraph().getText().trim()
									.matches(matches))
					&& paragraph.getNodeType() != NodeType.TABLE
					&& paragraph.getParentNode().getNodeType() != NodeType.CELL
					&& !paragraph.toString(SaveFormat.TEXT).contains(AIE.docName)
					
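					// condition added by pavi (14-12-2021): skip the "Figure Captions" and "Figures" headings.
					// Both negations must hold (&&); with || the clause is always true,
					// because no text can start with both prefixes.
					&& !paragraph.toString(SaveFormat.TEXT).trim().startsWith("Figure Captions")
					&& !paragraph.toString(SaveFormat.TEXT).trim().startsWith("Figures"))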
					
			        //|| ((paragraph.getNextSibling() == null) && (builder.getCurrentParagraph().isEndOfDocument()))
			        
					
			{

Input and output: New folder.zip (182.7 KB)
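The string tests in the condition above can be checked without Aspose.Words. Below is a minimal sketch in plain Java of only the text part, with the heading clause written so that it actually excludes both headings. It assumes `text` is the value of `paragraph.toString(SaveFormat.TEXT)`; `CaptionTextCheck`, `isCaptionText` and `docName` (standing in for `AIE.docName`) are hypothetical names. Unlike the original, it trims the text once before every prefix test.

```java
import java.util.Locale;

class CaptionTextCheck {

    // Case-sensitive prefixes from the posted condition; "Abb" already covers "Abbildung".
    private static final String[] CAPTION_PREFIXES = {"Scheme", "Plate", "Abb"};

    // Hypothetical helper: text-only part of the caption condition.
    // text    = paragraph.toString(SaveFormat.TEXT) (assumption)
    // docName = stand-in for AIE.docName (assumption)
    static boolean isCaptionText(String text, String docName) {
        String trimmed = text.trim();
        String lower = trimmed.toLowerCase(Locale.ROOT);

        // "fig" is tested case-insensitively, the other prefixes case-sensitively, as in the post
        boolean hasPrefix = lower.startsWith("fig");
        for (String prefix : CAPTION_PREFIXES) {
            hasPrefix |= trimmed.startsWith(prefix);
        }

        // Heading paragraphs that start like captions but must be skipped
        boolean isHeading = trimmed.startsWith("Figure Captions") || trimmed.startsWith("Figures");

        return hasPrefix
                && !lower.startsWith("abbreviations")
                && !trimmed.contains(docName)
                && !isHeading; // same as !A && !B
    }

    public static void main(String[] args) {
        System.out.println(isCaptionText("Fig. 7. Comparison of torsional stiffness", "paper.docx"));
        System.out.println(isCaptionText("Figure Captions", "paper.docx"));
    }
}
```

Because no string starts with both "Figure Captions" and "Figures", the original `!A || !B` is true for every paragraph; `!(A || B)`, as used in the sketch, is what excludes both headings.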

@e503824 I cannot reproduce the problem. The images are extracted properly using the ImageExtractor class provided in your other thread:
https://forum.aspose.com/t/need-to-extract-double-column-layout/250823/4

Dear team,

We are extracting images from a document using Aspose.Words for Java, but in the case below some part images are not extracted. Please find the source code and the input and output files below.

Please note: we need to extract these images under the caption-below conditions (the caption is placed below the figure).

Source code:

if ((paragraph.toString(SaveFormat.TEXT).toLowerCase().trim().startsWith("fig")
					|| paragraph.toString(SaveFormat.TEXT).startsWith("Scheme")
					|| paragraph.toString(SaveFormat.TEXT).startsWith("Plate")
					|| paragraph.toString(SaveFormat.TEXT).startsWith("Abb")
					|| paragraph.toString(SaveFormat.TEXT).startsWith("Abbildung"))
					&& !paragraph.toString(SaveFormat.TEXT).toLowerCase().startsWith("abbreviations")
					// for duplicate figure caption it-15
					&& (paragraph.getNextSibling() != null
							&& !paragraph.getNextSibling().toString(SaveFormat.TEXT).trim().matches(matches)
							|| (paragraph.getNextSibling() != null
									&& paragraph.getNextSibling().getNodeType() != NodeType.TABLE
									&& paragraph.getNextSibling().toString(SaveFormat.TEXT).trim().matches(matches)
									&& (((Paragraph) paragraph.getNextSibling()).getChildNodes(NodeType.SHAPE, true)
											.getCount() > 0
											|| (paragraph.getNextSibling().getNextSibling()) != null
													&& paragraph.getNextSibling().getNextSibling()
															.getNodeType() != NodeType.TABLE
													&& ((((Paragraph) paragraph.getNextSibling().getNextSibling())
															.getChildNodes(NodeType.SHAPE, true).getCount() == 0)
															
															
															||(((Paragraph) paragraph.getNextSibling().getNextSibling())
																	.getChildNodes(NodeType.SHAPE, true).getCount() > 0))))
							|| paragraph.getParentSection().getBody().getLastParagraph().getText().trim()
									.matches(matches))
					// for duplicate figure caption
					&& ((paragraph.getPreviousSibling() != null
							&& paragraph.getPreviousSibling().getNodeType() != NodeType.TABLE)
							|| paragraph.getParentSection().getBody().getFirstParagraph().getText().trim()
									.matches(matches))
					&& paragraph.getNodeType() != NodeType.TABLE
					&& paragraph.getParentNode().getNodeType() != NodeType.CELL
					&& !paragraph.toString(SaveFormat.TEXT).contains(AIE.docName)
					
					// skip the "Figure Captions" and "Figures" headings (both negations must hold, hence &&)
					&& !paragraph.toString(SaveFormat.TEXT).trim().startsWith("Figure Captions")
					&& !paragraph.toString(SaveFormat.TEXT).trim().startsWith("Figures"))
					
			        //|| ((paragraph.getNextSibling() == null) && (builder.getCurrentParagraph().isEndOfDocument()))
			        
					
			{

Input and output: New folder.zip (182.7 KB)

We also need to extract these labels. Please help.

@e503824 This question has already been answered in your other thread:
https://forum.aspose.com/t/extraction-issue-14/250868

Dear team,

We need to extract the images under the caption-below conditions. Please help us.

@e503824 It is not clear what the problem is. The code I provided in your other thread properly extracts the images and captions from the attached document. Could you please describe your problem in more detail?

Dear team,

  1. The provided source code does not work for us.
  2. In our documents, the figure caption is below the figure.
  3. Sometimes the provided code extracts unwanted images with the wrong file name.
  4. The provided code does not extract part images.
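When a figure consists of several parts, the caption below it can be preceded by more than one image paragraph, so matching only the nearest image is not enough. Below is a minimal sketch in plain Java of a caption-below rule that collects every consecutive image paragraph above a caption. `Block`, `Figure`, `pairCaptionsBelow` and the simplified "fig" prefix test are hypothetical; in real code each Block would be built from an Aspose.Words Paragraph (for example, hasImage from `getChildNodes(NodeType.SHAPE, true).getCount() > 0`, as in the condition posted above).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CaptionBelowSketch {

    // Hypothetical, simplified stand-in for a body-level paragraph.
    record Block(boolean hasImage, String text) {}

    // A caption together with the indexes of all image blocks (parts) it belongs to.
    record Figure(String caption, List<Integer> partIndexes) {}

    // Simplified caption test; the real condition from the post is much stricter.
    static boolean isCaption(String text) {
        return text.trim().toLowerCase(Locale.ROOT).startsWith("fig");
    }

    // Caption-below rule: every image block directly above a caption belongs to it;
    // blank paragraphs in between are skipped, any other text ends the figure.
    static List<Figure> pairCaptionsBelow(List<Block> blocks) {
        List<Figure> figures = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i++) {
            Block block = blocks.get(i);
            if (block.hasImage() || !isCaption(block.text())) {
                continue;
            }
            List<Integer> parts = new ArrayList<>();
            for (int j = i - 1; j >= 0; j--) {
                Block previous = blocks.get(j);
                if (previous.hasImage()) {
                    parts.add(0, j);          // keep document order
                } else if (!previous.text().isBlank()) {
                    break;                    // non-empty text ends the figure
                }
            }
            if (!parts.isEmpty()) {
                figures.add(new Figure(block.text().trim(), parts));
            }
        }
        return figures;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
                new Block(false, "Introduction text"),
                new Block(true, ""),                 // part (a)
                new Block(true, ""),                 // part (b)
                new Block(false, ""),
                new Block(false, "Fig. 1. Two-part figure"),
                new Block(false, "Body text"),
                new Block(true, ""),
                new Block(false, "Figure 2. Single image"));
        pairCaptionsBelow(blocks).forEach(System.out::println);
    }
}
```

Walking backwards from the caption, rather than forwards from each image, lets one caption own any number of parts while a caption directly above them still stops the search.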

This is how we call the classes:

CaptionBelow.captionBelow(interimdoc);
CaptionAbove.captionAbove(interimdoc);
TableImage.imagesInTable(interimdoc);
TextFrameImage.textFrameImageWithCaption(interimdoc);

String outGAFilePath = tempFolder + "\\PDF\\GA1.pdf";
File gaFile = new File(outGAFilePath);

if (fileName.toLowerCase().replaceAll("\\s", "").contains(GRAPHICALABSTRACT)
		|| (!Kromatrix.figjArray.isEmpty()
				&& (fileName.toLowerCase().startsWith("fig")
						|| fileName.toLowerCase().startsWith("scheme")
						|| fileName.toLowerCase().startsWith("plate"))))
{
	if (!gaFile.exists())
	{
		FixedGraphic.fixedImage(interimdoc);
	}
}
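The calling code above builds the GA1.pdf path with hard-coded backslashes and repeats `fileName.toLowerCase()` inside one long condition. Below is a minimal sketch of the same decision in plain Java, assuming the behaviour should stay identical. `GraphicalAbstractCheck` and its methods are hypothetical; `graphicalAbstractKey` stands in for `GRAPHICALABSTRACT` and `hasFigures` for `!Kromatrix.figjArray.isEmpty()`. It lowercases with `Locale.ROOT`, whereas the original uses the default locale.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

class GraphicalAbstractCheck {

    // Builds <tempFolder>/PDF/GA1.pdf with the platform's separator instead of a hard-coded "\\".
    static Path gaFilePath(String tempFolder) {
        return Paths.get(tempFolder, "PDF", "GA1.pdf");
    }

    // Same decision as the if-condition in the post (hypothetical parameter names).
    // The key must be lowercase without spaces, because it is compared against the
    // lowercased, whitespace-stripped file name.
    static boolean needsFixedGraphic(String fileName, String graphicalAbstractKey, boolean hasFigures) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        boolean isGraphicalAbstract = lower.replaceAll("\\s", "").contains(graphicalAbstractKey);
        boolean isFigureFile = lower.startsWith("fig") || lower.startsWith("scheme") || lower.startsWith("plate");
        return isGraphicalAbstract || (hasFigures && isFigureFile);
    }

    // True when the decision is positive and GA1.pdf does not exist yet (the nested ifs combined).
    static boolean shouldRunFixedGraphic(String tempFolder, String fileName, String key, boolean hasFigures) {
        return needsFixedGraphic(fileName, key, hasFigures) && !Files.exists(gaFilePath(tempFolder));
    }
}
```

In the calling code, the nested ifs would then become a single `if (GraphicalAbstractCheck.shouldRunFixedGraphic(tempFolder, fileName, GRAPHICALABSTRACT, !Kromatrix.figjArray.isEmpty())) { FixedGraphic.fixedImage(interimdoc); }`.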

That is why we use these methods.

@e503824 Please note that the problem is not in Aspose.Words but in the logic you use to analyze the document content, which is outside the scope of Aspose.Words. Implementing this logic is not the responsibility of Aspose.Words support.

Also, as I mentioned, the code provided here extracts the images from the attached document properly. Here is the output produced by this code on my side:
Fig. 7. Comparison of torsional stiffnes.pdf (50.9 KB)
Fig. 8. The relationship between kc and .pdf (40.0 KB)
As you can see, both the captions (used as the file names) and the images are extracted properly, so it is not clear what does not work on your side.
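The two file names above look like the caption text cut at 40 characters: "Fig. 8. The relationship between kc and " is exactly 40 characters long, which is why that name ends with a space before ".pdf". If the wrong file names mentioned earlier in the thread come from how captions are turned into names, a helper along these lines may help. This is a minimal sketch; `CaptionFileName`, `toFileName` and the 40-character limit (inferred from the attached file names, not taken from the ImageExtractor code) are assumptions.

```java
class CaptionFileName {

    // Hypothetical helper: turns caption text into a file name (without extension).
    static String toFileName(String caption, int maxLength) {
        String cleaned = caption
                .replaceAll("[\\\\/:*?\"<>|]", "_") // characters not allowed in Windows file names
                .replaceAll("\\s+", " ")             // tabs, line breaks and repeated spaces
                .trim();
        if (cleaned.length() > maxLength) {
            // cut to the limit, then drop a trailing space left by cutting mid-caption
            cleaned = cleaned.substring(0, maxLength).trim();
        }
        return cleaned.isEmpty() ? "image" : cleaned; // fallback name for empty captions
    }

    public static void main(String[] args) {
        System.out.println(toFileName("Fig. 8. The relationship between kc and n", 40) + ".pdf");
    }
}
```

Trimming after the cut avoids names such as "… kc and .pdf"; a unique suffix (for example the figure index) could be appended if two captions share the same first 40 characters.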

Dear team,

Please note: we also need to extract the labels. Please refer to the screenshot:

Missing Items.png (2.0 KB)

@e503824 Aspose.Words does not provide document content analysis features. Aspose.Words is a tool that allows you to work with documents; the logic required to analyze the document content depends on your requirements and needs.
As far as I can see, your issues are not related to Aspose.Words itself but only to content analysis, which is outside the scope of Aspose.Words.