Landscape image extraction using Java

e503824 · April 8, 2022, 8:31am

Dear team,

We are extracting images from DOC files using Java and Aspose. Recently we have started receiving documents that contain landscape images. How can we extract landscape images? We are using the source code below to extract images:

if ((paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig")
    || paragraph.toString(SaveFormat.TEXT).startsWith("Scheme")
    || paragraph.toString(SaveFormat.TEXT).startsWith("Plate")
    || paragraph.toString(SaveFormat.TEXT).startsWith("Abb")
    || paragraph.toString(SaveFormat.TEXT).startsWith("Abbildung")
            && paragraph.getNodeType() != NodeType.TABLE)
    //						//changes by pavi -starts check sample  D:\testing\AIE\Iteration 16_4 points\Document contains Duplicate figure captions\Revised-MANUSCRIPT
    && ((paragraph.getNextSibling() != null
    && paragraph.getNextSibling().getNodeType() != NodeType.TABLE)
    || paragraph.getParentSection().getBody().getFirstParagraph().getText().trim().matches(matches))
    && (paragraph.getNextSibling().getNodeType() != NodeType.TABLE)
    //changes by pavi -end 
    && paragraph.getChildNodes(NodeType.SHAPE, true).getCount() == 0
    && !paragraph.toString(SaveFormat.TEXT).contains(AIE.docName)
        && !paragraph.getNextSibling().toString(SaveFormat.TEXT).trim().matches(matches)//duplicate caption by pavi
    && !(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Figure Captions")) ||
        !(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Figures")))
{

Input document: Revised Manuscript-Clean.docx (792.3 KB)

Our output: Revised Manuscript-Clean_Fig0004.pdf (177.9 KB)
Revised Manuscript-Clean_Fig0003.pdf (108.2 KB)

Please help.

alexey.noskov · April 8, 2022, 2:22pm

@e503824 You can get the page size and orientation of the section where the original image is placed by getting the parent Section of the Shape and accessing the section's PageSetup. For example, see the following simple code:

Section section = (Section) shape.getAncestor(NodeType.SECTION);
System.out.println(section.getPageSetup().getOrientation());
System.out.println(section.getPageSetup().getPageWidth());
System.out.println(section.getPageSetup().getPageHeight());

When you import the figure into another document, you can set the same page settings in this document to get the desired result.
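As a rough illustration of the idea of matching page settings, here is a minimal plain-Java sketch. It does not call Aspose.Words at all: `PageGeometry` is a hypothetical helper invented for this example, and the A4 point values are assumed sample numbers. In real code the source values would come from the `getPageWidth()` and `getPageHeight()` calls shown above, and the result would be applied to the destination section's PageSetup.

```java
import java.util.Locale;

/** Hypothetical helper, not part of Aspose.Words: plain-Java page geometry in points. */
final class PageGeometry {
    final double widthPt;
    final double heightPt;

    PageGeometry(double widthPt, double heightPt) {
        this.widthPt = widthPt;
        this.heightPt = heightPt;
    }

    /** A page is treated as landscape when it is wider than it is tall. */
    boolean isLandscape() {
        return widthPt > heightPt;
    }

    /** Returns this geometry, with width and height swapped if needed to match the requested orientation. */
    PageGeometry withLandscape(boolean landscape) {
        if (landscape == isLandscape()) {
            return this;
        }
        return new PageGeometry(heightPt, widthPt);
    }

    public static void main(String[] args) {
        // Assumed sample values: an A4 landscape source page and a portrait destination page.
        PageGeometry source = new PageGeometry(841.9, 595.3);
        PageGeometry target = new PageGeometry(595.3, 841.9);

        // Rotate the destination page so it matches the source orientation.
        PageGeometry adjusted = target.withLandscape(source.isLandscape());
        System.out.println(String.format(Locale.ROOT, "%.1f x %.1f", adjusted.widthPt, adjusted.heightPt));
    }
}
```

The sketch only models the width/height swap; margins and paper size would need to be carried over separately.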

e503824 · April 11, 2022, 12:43pm

Dear team,
We have already tried this, but it did not work. Please find below the source code we are using:

for (Bookmark bm : doc.getRange().getBookmarks()) {
	boolean containsImage = false;
	if (bm.getName().startsWith(BK)) {
		ArrayList nodes = extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
		int orientation = ((Paragraph) bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup()
				.getOrientation();
		dstDoc = generateDocument(doc, nodes, orientation);

		PageSetup sourcePageSetup = ((Paragraph) bm.getBookmarkStart().getParentNode()).getParentSection()
				.getPageSetup();
		dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
		dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
		dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());

		NodeCollection<Paragraph> prs = dstDoc.getChildNodes(NodeType.PARAGRAPH, true);
		for (Paragraph pr : prs) {
			boolean isLabel = false;

alexey.noskov · April 11, 2022, 12:50pm

@e503824 Could you please share the source code of the generateDocument method? Judging from your code snippet, the page orientation is set in this method. We will check it and provide you with more information.

e503824 · April 19, 2022, 8:03am

Dear team,

Please find the source code below:

DocumentBuilder builder = new DocumentBuilder(doc);
	Paragraph para = null;
	Document dstDoc = null;
	String bookmarkname = "";
	String imgCaption = "";
	String matches = "Fig.*(?:[ \\r\\n\\t].*)+|Scheme.*|Plate.*|Abbildung.*";

	for (Bookmark bm : doc.getRange().getBookmarks()) {
		boolean containsImage = false;
		if (bm.getName().startsWith(BK)) {
			ArrayList nodes = extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
			int orientation = ((Paragraph) bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup()
					.getOrientation();
			dstDoc = generateDocument(doc, nodes, orientation);

			PageSetup sourcePageSetup = ((Paragraph) bm.getBookmarkStart().getParentNode()).getParentSection()
					.getPageSetup();
			dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
			dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
			dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());
			NodeCollection<Paragraph> prs = dstDoc.getChildNodes(NodeType.PARAGRAPH, true);
			for (Paragraph pr : prs) {
				boolean isLabel = false;
				if (pr.getRange().getText().contains(ControlChar.LINE_BREAK)) {
					// for label missing

					if (findSimSun(pr.getText().substring(1).trim(), dstDoc)) {
						isLabel = true;

					} else {
						isLabel = false;
					}
				}

In this case we are removing the section breaks, after which these images automatically move to a portrait page. That is why we are getting this type of output. Is there any way to keep the images on the same page?

alexey.noskov · April 19, 2022, 8:12am

@e503824 Page orientation is set per section in an MS Word document, so without a section break you cannot have different page orientations for different pages of your document.
To keep the page orientation, you have to keep the sections in your document.

e503824 · April 19, 2022, 9:33am

Dear team,

Is it possible to identify landscape and portrait pages? Please share the Java source code for that.

alexey.noskov · April 19, 2022, 12:17pm

@e503824 As I mentioned, page orientation is set per section in an MS Word document. You can check whether a section has landscape orientation using the PageSetup.getOrientation method. For example, see the following code:

for(Section s : doc.getSections())
{
    System.out.println(s.getPageSetup().getOrientation() == Orientation.LANDSCAPE);
}

If you need to check whether a particular node is in a section with landscape orientation, you can use code like this:

Section parentSection = (Section)node.getAncestor(NodeType.SECTION);
System.out.println(parentSection.getPageSetup().getOrientation() == Orientation.LANDSCAPE);