Identify the page breaks

Is it possible to identify the page breaks in a PDF? Please refer to the attached files.

files.zip (203.3 KB)

The PDF files were created using Word's Save as PDF functionality. I have two Word documents, Doc 1 and Doc 2. Both documents contain the same paragraphs of text. In Doc 1 the pages are split using Page Break and Section Break. In Doc 2 no page/section breaks are used.

Is it possible to identify, from the PDF files supplied, whether a page has a page/section break?

We need to add some header text while assembling the PDF, depending on how the page was split.

@Sri79

Thank you for contacting support.

We have analyzed the data you shared. A PDF document does not contain any “Page Break” mark or any other information related to it, so we are afraid that a page break cannot be identified from a PDF document. If you are able to identify it using Adobe Acrobat, please mention the steps so that we can investigate further.

Thanks for your reply.

Currently I do not have any solution.

Is it possible to add some kind of hidden flag, bookmark or identifier in the Word file and read it from the PDF? For example, if a page does not have a hard page/section break, we would insert a bookmark on that page in Word, create the PDF using Aspose.Words or Word's Save As functionality (whichever suits this requirement better), and then use Aspose.PDF to identify the pages that have the bookmark and add the required header text.

Thanks

@Sri79

We are checking the details from the Aspose.Words perspective and will get back to you with our findings soon. If we need further information about the scenario, we will request it.

@Sri79,

Thanks for your inquiry. We suggest you read the following article:
Find and Replace

The following code example shows how to find page breaks and section breaks and insert a bookmark before each break. Hope this helps you.

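using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;
using Aspose.Words.Saving;

// MyDir is the folder that contains the input document.
Document doc = new Document(MyDir + "Doc1.docx");

// Route every match to the callback, which inserts a bookmark at the match.
FindReplaceOptions options = new FindReplaceOptions();
options.ReplacingCallback = new FindPageBreaks();

// Find page breaks
doc.Range.Replace(new Regex("&m"), "", options);
// Find section breaks
doc.Range.Replace(new Regex("&b"), "", options);

doc.Save(MyDir + "18.5.docx");

// Export the Word bookmarks to the PDF outline at level 1.
PdfSaveOptions saveoptions = new PdfSaveOptions();
saveoptions.OutlineOptions.DefaultBookmarksOutlineLevel = 1;
doc.Save(MyDir + "18.5.pdf", saveoptions);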

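private class FindPageBreaks : IReplacingCallback
{
    // Called once per match; marks the match position with an empty bookmark.
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        DocumentBuilder builder = new DocumentBuilder((Document)e.MatchNode.Document);
        // Position the cursor at the node where the match starts.
        builder.MoveTo(e.MatchNode);
        builder.StartBookmark("PageBreak" + mMatchNumber);
        builder.EndBookmark("PageBreak" + mMatchNumber);
        mMatchNumber++;
        // Leave the matched break itself unchanged.
        return ReplaceAction.Skip;
    }

    // Counter that gives each bookmark a unique name.
    private int mMatchNumber;
}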
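
For the Aspose.PDF side of the scenario, the following is only a minimal sketch and has not been run against the attached files. It assumes the PDF saved above ("18.5.pdf") carries the exported bookmarks as outline entries whose titles start with "PageBreak", and it uses PdfBookmarkEditor from Aspose.Pdf.Facades to list the pages those entries point to. The class name ListBreakPages and the relative file path are placeholders for illustration.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Pdf.Facades;

class ListBreakPages
{
    static void Main()
    {
        // Assumption: "18.5.pdf" is the file produced by the Aspose.Words code above.
        PdfBookmarkEditor editor = new PdfBookmarkEditor();
        editor.BindPdf("18.5.pdf");

        // Collect the distinct pages that carry at least one "PageBreak..." outline entry.
        SortedSet<int> pagesWithBreak = new SortedSet<int>();
        foreach (Bookmark bookmark in editor.ExtractBookmarks())
        {
            if (bookmark.Title != null &&
                bookmark.Title.StartsWith("PageBreak", StringComparison.Ordinal))
            {
                pagesWithBreak.Add(bookmark.PageNumber);
            }
        }

        foreach (int page in pagesWithBreak)
            Console.WriteLine("Page {0} contains a page or section break bookmark", page);
    }
}
```

Pages that do not appear in the resulting set were split by normal text flow, so the header text can be chosen per page based on membership in that set.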