Getting Incorrect Page Number Using layoutCollector.getStartPageIndex(para)

Hi Team,

I am using layoutCollector.getStartPageIndex(para) and layoutCollector.getEndPageIndex(para) to get the start and end page numbers of a paragraph, but I am getting wrong page numbers.

For the attached document, I get a start page of 7 and an end page of 8 for the text: “Shipper shall use commercially reasonable efforts to cause the Capture Facilities Completion to occur with respect to the Capture Facilities that correspond to a Completed Receipt Point on or before the relevant Receipt Point Commencement Date. The Parties shall coordinate closely so that the Capture Facilities Completion of the relevant Capture Facilities occurs as close in time to the corresponding Receipt Point Commencement Date of the Completed Receipt Point, as is reasonably practicable. At least thirty (30) days prior to the estimated completion of the relevant Capture Facilities that correspond to a Receipt Point, Carrier”

Furthermore, no paragraph is reported as starting on page 8; the next paragraph starts on page 9.

Please refer to the logs below:

{"@timestamp":"2024-04-03T11:12:04.821Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text: Carrier at the Receipt Point(s) for transportation to the Delivery Point its Committed Volume multiplied by the number of Days in such Contract Quarter, or (b) if a volume less than the Committed Volume is actually produced by Shipper’s Plant(s) in such Contract Quarter, to otherwise pay for a volume of Product measured in MSCF sufficient to incur Shipping Fees equal to Shipper’s Quarterly Revenue Commitment\u0002 NTD: Can discuss whether this should instead be a volume commitment.\rfor such Contract Quarter.  Shipper expressly acknowledges and affirms that Carrier is relying on Shipper’s promises in this section in order to establish the economic justification for the development of the Pipeline System to the Receipt Point(s).  \r, text count: 98, paraPageStart: 7, paraPageEnd: 7","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}
{"@timestamp":"2024-04-03T11:12:04.821Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text: \u0002 NTD: Can discuss whether this should instead be a volume commitment.\r, text count: 99, paraPageStart: 7, paraPageEnd: 7","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}
{"@timestamp":"2024-04-03T11:12:04.821Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text: Impact of Prorationing.  \r, text count: 100, paraPageStart: 7, paraPageEnd: 7","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}
{"@timestamp":"2024-04-03T11:12:04.821Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text: Subject to Laws specifically affecting the Pipeline System, the pro-rationing of capacity on the Pipeline System will be governed by the GT&C.  To the extent permitted by applicable Law, the volume of Product that Shipper nominates for delivery, up to a volume each Day for each Receipt Point equal to Shipper’s Firm Volume for each such Receipt Point, shall be entitled to Firm Service.  Shipper shall not nominate any quantities that have not been accepted by the Operator for delivery at the Delivery Point pursuant to Section [●] of the Sequestration Services Agreement.\r, text count: 101, paraPageStart: 7, paraPageEnd: 7","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}
{"@timestamp":"2024-04-03T11:12:04.822Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text: Any volume of Product that is not eligible for Firm Service pursuant to Section 2.2.1, including volume that Shipper nominates for shipment that exceeds Shipper’s Firm Volume from each Receipt Point, will be subject to the Proration Procedures in Section 4 of the GT&C and will not receive Firm Service.\u0002 NTD: There is flexibility to have Firm Volume be less than Committed Volume, but note that (1) this will likely not be appealing to emitters and (2) if we cannot take the entire Committed Volume, presumably we must vent the delta.\r To the extent Shipper does not nominate and tender up to Shipper’s Firm Volume from a Receipt Point in any Month, Carrier shall be free to utilize the capacity of the Pipeline System that Shipper has failed to use for the provision of transportation services to other shippers.\r, text count: 102, paraPageStart: 7, paraPageEnd: 7","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}
{"@timestamp":"2024-04-03T11:12:04.822Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text: \u0002 NTD: There is flexibility to have Firm Volume be less than Committed Volume, but note that (1) this will likely not be appealing to emitters and (2) if we cannot take the entire Committed Volume, presumably we must vent the delta.\r, text count: 103, paraPageStart: 7, paraPageEnd: 7","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}
{"@timestamp":"2024-04-03T11:12:04.822Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text: Shipper shall have no rights pursuant to this Agreement with respect to any expansion (including a pipeline loop), connection, or new service on the Pipeline System, or other pipeline system or facilities owned or operated by Carrier or its Affiliates other than with respect to the Pipeline Shipper Path as expressly provided herein.\r, text count: 104, paraPageStart: 7, paraPageEnd: 7","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}
{"@timestamp":"2024-04-03T11:12:04.822Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text: Carrier Obligations.\r, text count: 105, paraPageStart: 7, paraPageEnd: 7","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}
{"@timestamp":"2024-04-03T11:12:04.822Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text: Milestones.  \r, text count: 106, paraPageStart: 7, paraPageEnd: 7","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}
{"@timestamp":"2024-04-03T11:12:04.823Z","ecs.version":"1.2.0","log.level":"INFO","message":**"Start Page Number for text: Shipper shall use commercially reasonable efforts to cause the Capture Facilities Completion to occur with respect to the Capture Facilities that correspond to a Completed Receipt Point on or before the relevant Receipt Point Commencement Date. The Parties shall coordinate closely so that the Capture Facilities Completion of the relevant Capture Facilities occurs as close in time to the corresponding Receipt Point Commencement Date of the Completed Receipt Point, as is reasonably practicable.  At least thirty (30) days prior to the estimated completion of the relevant Capture Facilities that correspond to a Receipt Point, Carrier a\f, text count: 107, paraPageStart: 7, paraPageEnd: 8","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}**
**{"@timestamp":"2024-04-03T11:12:04.823Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text: CVX DRAFT 11/11/22\r, text count: 108, paraPageStart: 0, paraPageEnd: 0","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}**
**{"@timestamp":"2024-04-03T11:12:04.823Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text: \u0013PAGE\u00148\u0015\r, text count: 109, paraPageStart: 0, paraPageEnd: 0","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}**
**{"@timestamp":"2024-04-03T11:12:04.823Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text:  written good faith estimate of the Capture Facilities Completion for such Capture Facilities.  \r, text count: 110, paraPageStart: 9, paraPageEnd: 9","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}**
{"@timestamp":"2024-04-03T11:12:04.823Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text: If, due to no fault of Carrier, the Capture Facilities Completion has not occurred with respect to the Capture Facilities that correspond to any Completed Receipt Point on or before the date on which Carrier would have achieved the Receipt Point Commencement Date due to Shipper Delay, then the applicable Receipt Point Commencement Date shall be deemed to have occurred and the Pipeline Shipper Path on the Pipeline System shall be deemed to be fully operational and ready to commence commercial service from the affected Receipt Point(s) for all purposes hereunder, including for the purpose of commencing the first Contract Quarter and calculating Deficiency Amounts under Section 5.4, except that Carrier shall have no obligation to provide Services from any Receipt Point for which Capture Facilities Completion has not occurred.\r, text count: 111, paraPageStart: 9, paraPageEnd: 9","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}
{"@timestamp":"2024-04-03T11:12:04.824Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text: Subject to any extension in accordance with Section 15.2 of the GT&C on a day-for-day basis to the extent impacted by Force Majeure Events, Carrier shall endeavor to achieve the Commencement Date on or before [●] (such date, as extended for Force Majeure Events, the “Commencement Deadline”).\r, text count: 112, paraPageStart: 9, paraPageEnd: 9","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}
{"@timestamp":"2024-04-03T11:12:04.841Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text: If the Commencement Date is not achieved on or before the Commencement Deadline, then Carrier shall deliver to Shipper a notice containing Carrier’s good faith estimate of the date on which the Commencement Date will be achieved (such date, the “Commencement Date Estimate”).  Upon such notice, the Commencement Date Estimate shall become the Commencement Deadline; provided that if the Commencement Date Estimate is more than [three hundred and sixty-five Days (365)] after the Commencement Deadline, Shipper may terminate this Agreement on thirty (30) Days prior written notice to Carrier, delivered no later than the date that is ten (10) Days following Shipper’s receipt of the Commencement Date Estimate and upon such termination each Party shall be released from any and all obligations under this Agreement and have no further liabilities to the other Party. If Shipper fails to deliver notice of termination within ten (10) Days following receipt of the Commencement Date Estimate, then Shipper’s right to terminate this Agreement pursuant to this Section 3.1.4 shall be deemed waived. Other than as set forth in this Section 3.1.4, Shipper shall have no rights or remedies for the failure of Carrier to achieve the Commencement Date prior to any date.\r, text count: 113, paraPageStart: 9, paraPageEnd: 9","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}
{"@timestamp":"2024-04-03T11:12:04.841Z","ecs.version":"1.2.0","log.level":"INFO","message":"Start Page Number for text: TIME IS OF THE ESSENCE WITH RESPECT TO THIS SECTION 3.1.\r, text count: 114, paraPageStart: 9, paraPageEnd: 9","process.thread.name":"pool-2-thread-1","log.logger":"com.sirionlabs.asposeParser.service.AsposeDocumentParser"}

Code Snippet:

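import com.aspose.words.Document;
import com.aspose.words.LayoutCollector;
import com.aspose.words.NodeType;
import com.aspose.words.Paragraph;
import java.util.Objects;

// Logs the start and end page index of every paragraph in the document.
private void getPageNumber(Document document) throws Exception {
    // LayoutCollector maps document nodes to pages once the layout is built.
    LayoutCollector layoutCollector = new LayoutCollector(document);
    int i = 0;
    for (Paragraph para : (Iterable<Paragraph>) document.getChildNodes(NodeType.PARAGRAPH, true)) {
        i++;
        if (Objects.isNull(para) || Objects.isNull(para.getParentNode()) || Objects.isNull(para.getDocument())) {
            log.warn("getPageNumber - para is null or parent node is null or document is null for paraCount: {}", i);
            continue;
        }
        // 1-based page indexes; 0 is returned when a node cannot be mapped to a page.
        int paraPageStart = layoutCollector.getStartPageIndex(para);
        int paraPageEnd = layoutCollector.getEndPageIndex(para);
        log.info("Start Page Number for text: {}, text count: {}, paraPageStart: {}, paraPageEnd: {}", para.getText(), i, paraPageStart, paraPageEnd);
    }
}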
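
The logs above contain three kinds of result: a paragraph on a single page (7, 7), a paragraph reported as crossing a page boundary (7, 8), and two paragraphs reported as (0, 0), which look like header/footer content ("CVX DRAFT 11/11/22" and the PAGE field). The sketch below is a library-free way to classify such pairs before reporting them. It assumes that an index of 0 means the node could not be mapped to a page, as the Aspose.Words API reference describes. The `PageSpanDemo` class, `PageSpan` type, and `Kind` names are hypothetical and not part of Aspose.Words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Aspose.Words: classifies the
// (start, end) page indexes reported by LayoutCollector for one node.
class PageSpanDemo {

    enum Kind { UNMAPPED, SINGLE_PAGE, MULTI_PAGE }

    static final class PageSpan {
        final int start;
        final int end;

        PageSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }

        Kind kind() {
            // Assumption: 0 means "node could not be mapped to a page".
            if (start <= 0 || end <= 0) return Kind.UNMAPPED;
            return start == end ? Kind.SINGLE_PAGE : Kind.MULTI_PAGE;
        }

        @Override
        public String toString() {
            return "(" + start + ", " + end + ")";
        }
    }

    // Keeps only the spans that map to a real page.
    static List<PageSpan> mappedOnly(List<PageSpan> spans) {
        List<PageSpan> result = new ArrayList<>();
        for (PageSpan span : spans) {
            if (span.kind() != Kind.UNMAPPED) result.add(span);
        }
        return result;
    }

    public static void main(String[] args) {
        // Values copied from the log entries with text count 106 to 110.
        List<PageSpan> spans = Arrays.asList(
                new PageSpan(7, 7), new PageSpan(7, 8),
                new PageSpan(0, 0), new PageSpan(0, 0), new PageSpan(9, 9));
        for (PageSpan span : spans) {
            System.out.println(span + " -> " + span.kind());
        }
        System.out.println("Mapped paragraphs: " + mappedOnly(spans).size());
    }
}
```

Filtering out the unmapped entries first makes it easier to see which body paragraphs are actually assigned to each page.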

Attached document:
test_1.docx (44.0 KB)

@ashu_agrawal_sirionlabs_com Your document is an ODT document, and it is rendered differently in MS Word and in OpenOffice. Aspose.Words usually behaves like MS Word. However, when Aspose.Words reads your document, the content on the first page is converted to All Caps. This pushes the following content down, so the page indexes of the paragraphs might be determined incorrectly.

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-26834


Hi @alexey.noskov

I see a similar issue with an MS Word DOCX document.
I am attaching the document for reference.
Bayou Bend TSA - CVX 11-11-22 (1).docx (73.9 KB)

@ashu_agrawal_sirionlabs_com I cannot reproduce the problem on my side. Note that MS Word documents are flow documents by nature and do not have a “page” concept. Consumer applications reflow the document content into pages on the fly. So, to detect the page indexes of nodes, Aspose.Words also builds the document layout. The problem on your side might occur because the fonts used in your input document are not available on the machine where the document is processed. The fonts are required to build the document layout. If Aspose.Words cannot find a font used in the document, the font is substituted. The substitute font has different metrics, so the layout can differ and, as a result, page detection can be incorrect. You can implement IWarningCallback to get notifications when font substitution is performed.
Please see our documentation to learn where Aspose.Words looks for fonts:
https://docs.aspose.com/words/net/specifying-truetype-fonts-location/
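
As a rough first check before wiring up IWarningCallback, the fonts a document needs can be compared against the fonts the JVM can see on the processing machine. The sketch below uses only the Java standard library. It is an approximation: Aspose.Words resolves fonts through its own font sources, which may differ from what `GraphicsEnvironment` reports. The `FontAvailabilityCheck` class and the font names in the usage note are hypothetical; substitute the fonts your document actually uses.

```java
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Rough pre-check, not Aspose.Words' own font lookup: lists which font
// families a document needs but that are absent from a given font list.
class FontAvailabilityCheck {

    // Returns the required names that have no case-insensitive match in `installed`.
    static List<String> missingFonts(List<String> required, List<String> installed) {
        Set<String> known = new HashSet<>();
        for (String name : installed) {
            known.add(name.toLowerCase(Locale.ROOT));
        }
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!known.contains(name.toLowerCase(Locale.ROOT))) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Font families visible to the JVM on the current machine.
    static List<String> installedFontFamilies() {
        return Arrays.asList(
                GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames());
    }
}
```

For example, `FontAvailabilityCheck.missingFonts(Arrays.asList("Times New Roman", "Calibri"), FontAvailabilityCheck.installedFontFamilies())` returns the subset of those two hypothetical names that the JVM cannot find. A non-empty result on the server but an empty one on a desktop machine would be consistent with the font substitution explanation above.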