Getting incorrect paragraph coordinates for a specific doc

Hi Aspose Support,

I am getting incorrect coordinates for a few paragraphs. This issue is occurring only for a specific doc which is attached herewith.

For example, for the text:

Subject to clause 12(c), the Contractor must indemnify the Principal and each Principal Associate from and against: any Claim or Loss brought against, suffered or incurred by the Principal Associate as a result of the Contractor or a Contractor Associate failing to comply the Work Health and Safety Requirements or Environmental Requirements; any Claim or Loss brought against, suffered or incurred by the Principal or a Principal Associate arising out of, or in connection with, a breach by the Contractor of clauses 12.1 to 12.11 (inclusive); and any Claim or Loss brought against, suffered or incurred by the Principal or a Principal Associate arising out of, or in connection with, the Contractor or a Contractor Associate failing to comply with a Legislative Requirement, including any fines or penalties to the extent permitted by law.

which runs through pages 57-58 of the document, we are getting the following sets of coordinates:

[ 
        {
            "xstart" : 113.0,
            "ystart" : 640.0,
            "height" : 116.0,
            "width" : 383.0,
            "pagewidth" : 595.3,
            "pageheight" : 841.900024414062,
            "xend" : 496.0,
            "yend" : 756.0,
            "pageno" : 58
        }, 
        {
            "xstart" : 155.0,
            "ystart" : 116.0,
            "height" : 47.0,
            "width" : 154.0,
            "pagewidth" : 595.3,
            "pageheight" : 841.900024414062,
            "xend" : 309.0,
            "yend" : 163.0,
            "pageno" : 59
        }, 
        {
            "xstart" : 155.0,
            "ystart" : 168.0,
            "height" : 12.0,
            "width" : 274.0,
            "pagewidth" : 595.3,
            "pageheight" : 841.900024414062,
            "xend" : 429.0,
            "yend" : 180.0,
            "pageno" : 59
        }
    ]

The same code logic runs fine for other documents. Could you please help identify what could cause this behaviour for this particular document?

Aspose_Doc_To_Share.docx (128.0 KB)

@manmohansirionlabs Could you share code that will allow us to reproduce the problem? If I understand correctly, you are using LayoutCollector and LayoutEnumerator to get the coordinates of content in your document. LayoutCollector and LayoutEnumerator use the same layout engine that is used for conversion to fixed-page formats such as PDF. I have checked conversion to PDF on my side using the latest 22.7 version of Aspose.Words for .NET, and the output document looks correct. Could you please check on your side whether the document is rendered properly?

Sure @alexey.noskov, the code below is used to find the coordinates and store them in a Java object of the Coordinates class:

Document document = null;
InputStream inputStream = new ByteArrayInputStream(documentData.getDocByteData()); //getting inputStream from byte array of document
document = new Document(inputStream);    
LayoutEnumerator layoutEnumerator = new LayoutEnumerator(document);
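LayoutCollector layoutCollector = new LayoutCollector(document);
DocumentBuilder documentBuilder = new DocumentBuilder(document);
Map<Integer, ParaData> paraDatas = new LinkedHashMap<>(); //paragraph index -> collected data (declaration was missing from the posted snippet)
Coordinates coordinates = null;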

int i =0;
for (Paragraph para : (Iterable<Paragraph>) document.getChildNodes(NodeType.PARAGRAPH, true)) {
    i++;
    ParaData paraData = new ParaData(); //my custom class to store paragraph data

    int paraPageStart = layoutCollector.getStartPageIndex(para);
    paraData.setParaPageStart(paraPageStart);

    documentBuilder.moveTo(para);
    BookmarkStart start = documentBuilder.startBookmark("Bookmark" + i); //will be at the start of para
    BookmarkEnd end = documentBuilder.endBookmark("Bookmark" + i); //will be at the end of para
    paraData.setStart(start);
    paraData.setEnd(end);

    paraDatas.put(i,paraData);
}

for(Map.Entry<Integer, ParaData> entry : paraDatas.entrySet()){
    Object bstart = layoutCollector.getEntity(entry.getValue().getStart());
    Object bend = layoutCollector.getEntity(entry.getValue().getEnd());
    entry.getValue().setBstart(bstart);
    entry.getValue().setBend(bend);
}

Integer j = 0;
for (Paragraph para : (Iterable<Paragraph>) document.getChildNodes(NodeType.PARAGRAPH, true)) {
    j++;
    Object bstart = paraDatas.get(j).getBstart();
    Object bend = paraDatas.get(j).getBend();

    layoutEnumerator.setCurrent(bstart);
    Rectangle2D startBookmarkFrame = layoutEnumerator.getRectangle();
    layoutEnumerator.setCurrent(bend);
    Rectangle2D endBookmarkFrame = layoutEnumerator.getRectangle();
    coordinates = new Coordinates();

    coordinates.setStart_x_coordinate(startBookmarkFrame.getBounds().getMinX());
    coordinates.setStart_y_coordinate(startBookmarkFrame.getBounds().getMinY());
    coordinates.setEnd_x_coordinate(endBookmarkFrame.getBounds().getMaxX());
    coordinates.setEnd_y_coordinate(endBookmarkFrame.getBounds().getMaxY());
    coordinates.setHeight(Math.abs(coordinates.getEnd_y_coordinate() - coordinates.getStart_y_coordinate()));
    coordinates.setWidth(Math.abs(coordinates.getEnd_x_coordinate() - coordinates.getStart_x_coordinate()));
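    //curPageStart was not defined in the posted snippet; assuming it is this paragraph's 1-based start page
    int curPageStart = paraDatas.get(j).getParaPageStart();
    coordinates.setPageHeight(document.getPageInfo(Math.max(curPageStart-1, 0)).getHeightInPoints());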
    coordinates.setPageWidth(document.getFirstSection().getPageSetup().getPageWidth());
    coordinates.setPageNo(paraDatas.get(j).getParaPageStart());
}

I also checked the rendered PDF document, it looks fine at my end. Here’s how I’m converting the docx to pdf:

Document document = null;
InputStream inputStream = new ByteArrayInputStream(documentData.getDocByteData()); //getting inputStream from byte array of document
document = new Document(inputStream);
ByteArrayOutputStream stream = new ByteArrayOutputStream();
document.save(stream, SaveFormat.PDF);
byte[] pdfBytes = stream.toByteArray();
//rest of the code uses pdfBytes to save pdf doc at specific location

Below is the rendered document given by Aspose:
Aspose_Doc_To_Share.pdf (441.2 KB)
I’m using Aspose.Words version 22.2 as opposed to the latest 22.7.
Also I’m calculating the coordinates from the .docx file and using those coordinates to highlight the text in .pdf file. Could there be any mismatch between coordinates in both the files that’s causing this issue?

@manmohansirionlabs Thank you for the additional information. As I can see in your code, you put a bookmark at the beginning of the paragraph and another at the end. But a paragraph can span several pages, and in this case your code will not calculate the paragraph rectangle correctly, since part of it is on one page and the other part, or even parts, are on other pages.
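One way to handle a paragraph that spans pages is to collect a rectangle for each laid-out line of the paragraph and merge them per page, which gives one bounding box per page instead of a single box built from the first and last positions. Below is a minimal sketch of the merging step only, in plain Java. The `PageBoxes` class, the `LineBox` type and the `mergeByPage` method are hypothetical names, the values in `main` are illustrative, and obtaining the per-line rectangles from the layout engine is not shown.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Aspose.Words: merges per-line rectangles
// into one bounding box per page.
final class PageBoxes {

    // One laid-out line of a paragraph: 1-based page number plus its rectangle in points.
    static final class LineBox {
        final int pageNo;
        final Rectangle2D rect;

        LineBox(int pageNo, Rectangle2D rect) {
            this.pageNo = pageNo;
            this.rect = rect;
        }
    }

    // Returns one bounding rectangle per page, keyed by page number in ascending order.
    static Map<Integer, Rectangle2D> mergeByPage(List<LineBox> lines) {
        Map<Integer, Rectangle2D> boxes = new TreeMap<>();
        for (LineBox line : lines) {
            // createUnion returns a new rectangle that covers both inputs.
            boxes.merge(line.pageNo, line.rect, Rectangle2D::createUnion);
        }
        return boxes;
    }

    public static void main(String[] args) {
        List<LineBox> lines = new ArrayList<>();
        // Illustrative values only, not taken from the attached document.
        lines.add(new LineBox(57, new Rectangle2D.Double(113, 640, 383, 12)));
        lines.add(new LineBox(57, new Rectangle2D.Double(113, 652, 383, 12)));
        lines.add(new LineBox(58, new Rectangle2D.Double(113, 72, 200, 12)));

        mergeByPage(lines).forEach((page, box) ->
                System.out.println("page " + page + ": " + box));
    }
}
```

Each entry of the returned map can then be stored as its own coordinates record with its own page number, so a paragraph that crosses a page break yields two records rather than one.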
Also, could you please elaborate on why you highlight content in the resulting PDF? Would it not be easier to highlight content in the source MS Word document and then convert it to PDF? You can highlight content using Aspose.Words and then save the document to PDF.

If conversion to PDF and coordinate calculation are performed in the same environment, there should not be any mismatch, because the same layout engine is used for converting the document to PDF and for getting layout information for the document.
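
One remaining thing worth ruling out when applying layout coordinates to a PDF is the coordinate origin. Layout rectangles are measured in points from the top-left corner of the page, while default PDF user space places the origin at the bottom-left, so some PDF libraries expect the y value to be flipped. Below is a minimal sketch of that conversion, assuming the PDF library used for highlighting takes bottom-left coordinates (this is an assumption to verify for your library). `PdfCoords` and `toPdfSpace` are hypothetical names.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper: converts a top-left-origin rectangle (in points) into a
// bottom-left-origin rectangle for a page of the given height.
final class PdfCoords {

    static Rectangle2D toPdfSpace(Rectangle2D topLeftRect, double pageHeight) {
        // The lower edge in top-left space becomes the y origin in bottom-left space.
        double y = pageHeight - topLeftRect.getMaxY();
        return new Rectangle2D.Double(topLeftRect.getX(), y,
                topLeftRect.getWidth(), topLeftRect.getHeight());
    }

    public static void main(String[] args) {
        // First rectangle from the output posted earlier in this thread: x=113, y=640, w=383, h=116.
        Rectangle2D layoutRect = new Rectangle2D.Double(113.0, 640.0, 383.0, 116.0);
        Rectangle2D pdfRect = toPdfSpace(layoutRect, 841.9);
        System.out.println(pdfRect);
    }
}
```

If the highlighting library already uses a top-left origin, no flip is needed and the rectangles can be passed through unchanged.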