How to get all the section headers and Create Bookmarks in an existing PDF?

Hi there,

I need to create bookmarks in an existing PDF document. However, the current PdfBookmarkEditor.CreateBookmarks() method simply creates a bookmark for every page. Instead, I would like to take the section/page headers and convert them into bookmarks. Is this possible? I appreciate your help!

@sanjaybk

You can create a bookmark for a particular page or for a range of pages using Aspose.PDF. For more details, please refer to the following articles.
Create Bookmark of a Particular Page
Create Bookmarks of a Range of Pages

@tahir.manzoor,
Thank you very much for the response. The links you shared show how to create new bookmarks for a particular page or a range of pages. My requirement is different:

  1. I have a PDF document with no bookmarks set.
  2. I need to create new bookmarks in the existing PDF, but not necessarily for a particular page or range of pages.
  3. For instance, the PDF has 10 pages, and pages 2, 4 and 6 have section/paragraph headings.
  4. Can I programmatically find those section/paragraph headings and mark them as new bookmarks?

Thank you!

@sanjaybk

Could you please attach your input and expected output PDF files? We will then provide more information about your query along with a code example.

@tahir.manzoor,
Thank you. I have attached a single PDF document for your reference.
Below is my requirement.
For this document, I need to find the section/paragraph headings and mark them as bookmarks. The expected output is the same document with the following bookmarks and levels.

LITIGATION (Level 1)
Issuer (Level 2)
Concessionaire (Level 2)
City and Parking Authority (Level 2)
LEGAL MATTERS (Level 1)
TAX EXEMPTION AND TAX MATTERS (Level 1)
Federal Tax Exemption for 2016A, 2016C and 2016D Bonds (Level 2)

Thank you!

Attachment: Testing.pdf (89.1 KB)

@sanjaybk

We are working on your query and will get back to you soon.

@sanjaybk

The following code example shows how to find text and insert a bookmark for each match. Hope this helps.

using Aspose.Pdf;
using Aspose.Pdf.Text;

// Open document (dataDir is the folder that holds the input file)
Document pdfDocument = new Document(dataDir + "input.pdf");

// Create a TextFragmentAbsorber object to find all instances of the input search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("Your Text");
// Accept the absorber for all the pages
pdfDocument.Pages.Accept(textFragmentAbsorber);
// Get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

// Bind the bookmark editor to the open document
Aspose.Pdf.Facades.PdfBookmarkEditor editor = new Aspose.Pdf.Facades.PdfBookmarkEditor();
editor.BindPdf(pdfDocument);
int bm = 1;
// Loop through the fragments and create one bookmark per match
// (iterating the segments as well would add a duplicate bookmark for every segment)
foreach (TextFragment textFragment in textFragmentCollection)
{
    Aspose.Pdf.Facades.Bookmark bookmark = new Aspose.Pdf.Facades.Bookmark();
    bookmark.PageNumber = textFragment.Page.Number;
    bookmark.Title = "bookmark" + bm;
    bm++;
    // Set the view position from the fragment's location on the page
    bookmark.PageDisplay_Top = (int)(textFragment.Page.MediaBox.Height - textFragment.Position.YIndent - textFragment.Rectangle.Height);
    bookmark.PageDisplay_Left = (int)textFragment.Rectangle.LLX;

    editor.CreateBookmarks(bookmark);
}

// Save the document with the new bookmarks
pdfDocument.Save(dataDir + "output.pdf");
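
The example above gives every bookmark a generic title. To get the two levels requested earlier in this thread, one possible starting point is to guess the level from the heading text itself. The sketch below is plain C# with no Aspose.PDF calls. HeadingHeuristics and GuessLevel are hypothetical names, and the rule (all-uppercase heading = Level 1, anything else = Level 2) is only an assumption drawn from the sample list in this thread, so it may not hold for other documents.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Aspose.PDF.
public static class HeadingHeuristics
{
    // Assumption: a heading whose letters are all uppercase is Level 1,
    // any other heading is Level 2. Returns 0 when the text has no letters.
    public static int GuessLevel(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return 0;
        }

        // Ignore digits, spaces and punctuation; only letters decide the level.
        char[] letters = text.Where(c => char.IsLetter(c)).ToArray();
        if (letters.Length == 0)
        {
            return 0;
        }

        return letters.All(c => char.IsUpper(c)) ? 1 : 2;
    }
}
```

With this rule, GuessLevel("LITIGATION") gives 1 and GuessLevel("Issuer") gives 2. The text passed in would be the text of each matched fragment.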

@tahir.manzoor,
Thanks. This helps.
Follow up question:
I have a PDF document with 10 bookmarks. When I create a new bookmark for page 2, it is added at the end of the bookmarks list. Is there a way to insert a new bookmark in the middle of the existing bookmarks? Or can new bookmarks only be appended to the end of the list?

@sanjaybk

Could you please attach your input, problematic and expected output PDF files here for testing? We will investigate the issue and provide you more information on it.

@tahir.manzoor,
I’m attaching a PDF file: IDEA Public_2019.pdf (2.7 MB). In this document, there is a bookmark called “APPENDIX B - IDEA and the Charter Schools” on page 107.
My requirement is to create child bookmarks under this particular bookmark.
For example, on page 111 there is a section called “Future Expansion”, which I need to mark as a child bookmark under the “APPENDIX B - IDEA and the Charter Schools” bookmark. Can you please help?

@sanjaybk

Please allow us some time to write a code example for your case. We will get back to you soon.

@tahir.manzoor,
Thanks for looking into it. I look forward to your findings.

@sanjaybk

In your case, we suggest the following solution.

  • Find the text in the PDF and get its page number
  • Iterate over the bookmarks collection and get the desired bookmark
  • Insert the child bookmark, e.g. ‘Future Expansion’, and set its action to go to that page number.

The following code example can be used to achieve your requirement.

using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

// Open document (dataDir is the folder that holds the input file)
Document pdfDocument = new Document(dataDir + "IDEA Public_2019.pdf");

// Search for the heading text with a regular expression
TextFragmentAbsorber tfa = new TextFragmentAbsorber(new System.Text.RegularExpressions.Regex(@"... your regular expression for text e.g. Future Expansion"));
tfa.TextSearchOptions.IsRegularExpressionUsed = true;
int page = 1;
pdfDocument.Pages.Accept(tfa);
TextFragmentCollection tfc = tfa.TextFragments;
// Take the page number of the first match only
foreach (TextFragment tf in tfc)
{
    page = tf.Page.Number;
    break;
}

// Loop through all the top-level bookmarks
foreach (OutlineItemCollection outlineItem in pdfDocument.Outlines)
{
    if (outlineItem.Count > 0)
    {
        // There are child bookmarks, so loop through them as well
        foreach (OutlineItemCollection childOutline in outlineItem)
        {
            if (childOutline.Title == "APPENDIX B - IDEA and the Charter Schools")
            {
                // Create a bookmark object
                OutlineItemCollection pdfChildOutline = new OutlineItemCollection(pdfDocument.Outlines);
                pdfChildOutline.Title = "Future Expansion";
                // Set the destination page number
                pdfChildOutline.Action = new GoToAction(page);
                childOutline.Add(pdfChildOutline);
                break;
            }
        }
    }
}

// Save the document with the new child bookmark
pdfDocument.Save(dataDir + "output.pdf");
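
Note that the loop above only checks bookmarks at the second level. If the parent bookmark could sit at any depth, a recursive search may be easier to maintain. The sketch below shows the idea in plain C#. OutlineNode and OutlineSearch are hypothetical stand-ins for illustration and are not Aspose.PDF types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a bookmark node; not an Aspose.PDF type.
public sealed class OutlineNode
{
    public string Title { get; }
    public List<OutlineNode> Children { get; } = new List<OutlineNode>();

    public OutlineNode(string title)
    {
        Title = title;
    }
}

public static class OutlineSearch
{
    // Depth-first search: returns the first node whose title matches exactly,
    // or null when no node at any depth has that title.
    public static OutlineNode FindByTitle(IEnumerable<OutlineNode> nodes, string title)
    {
        foreach (OutlineNode node in nodes)
        {
            if (string.Equals(node.Title, title, StringComparison.Ordinal))
            {
                return node;
            }

            // Not this node, so look through its children before moving on.
            OutlineNode hit = FindByTitle(node.Children, title);
            if (hit != null)
            {
                return hit;
            }
        }

        return null;
    }
}
```

Since each OutlineItemCollection in the example above can be enumerated to reach its children, the same recursion should carry over to pdfDocument.Outlines, with the child bookmark added to whichever item the search returns.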

@tahir.manzoor, Thanks. I’ll try and let you know.

What is the page defined in the GoToAction?

Never mind, I got it. Thank you!

@sanjaybk

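The GoToAction class represents a go-to action that changes the view to a specified destination (page, location, and magnification factor). In the code above, the page value passed to it is the page number of the first text fragment matched by the search.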