How to retrieve data between two bookmarks

Hi Team,

Could you please share a code snippet to retrieve the data between two given bookmarks, preserving its formatting (font, bold, italic, etc.) and any images it contains?
We have an Aspose.PDF license.

Thanks,
PullaReddy
Sr. Manager
Freyr Global Regulatory Solutions and Services, Hyderabad

@gc.pullareddy

Would you please share a sample PDF document with us and specify the bookmarks between which you want to extract the data? We will test the scenario in our environment and address it accordingly.

Hi Asad Ali,

Thanks for your quick response.

Please find the attached PDF document for your reference, along with the requirements below for extracting content from it:
MyPDF.pdf (849.8 KB)

  1. Extract text under the bookmark “DOSAGE AND ADMINISTRATION” (up to, but not including, the bookmark “DOSAGE FORMS AND STRENGTHS”; page 1)
  2. Extract text and images under the bookmark “2.5 Intraventricular Infusion Procedure” (up to, but not including, the bookmark “3 DOSAGE FORMS AND STRENGTHS”; pages 4 to 9)
  3. Extract text and table data under the bookmark “12.3 Pharmacokinetics” (up to, but not including, the bookmark “3 DOSAGE FORMS AND STRENGTHS”; page 15)

Thanks,
PullaReddy

@gc.pullareddy

We are looking into your requirements and will get back to you shortly.

Hi Asad Ali,

Please let me know if you have found a solution for the use cases in my previous request.

Thanks,
PullaReddy

@gc.pullareddy

We tried to extract the information based on your requirements using the following code snippet.

// dataDir is assumed to hold the path of the folder that contains MyPDF.pdf
Aspose.Pdf.Document document = new Aspose.Pdf.Document(dataDir + "MyPDF.pdf");
// Create PdfBookmarkEditor and bind it to the document
Aspose.Pdf.Facades.PdfBookmarkEditor bookmarkEditor = new Aspose.Pdf.Facades.PdfBookmarkEditor();
bookmarkEditor.BindPdf(document);
// Extract bookmarks
Aspose.Pdf.Facades.Bookmarks bookmarks = bookmarkEditor.ExtractBookmarks();
// s* = display coordinates of the start bookmark, e* = those of the end bookmark
double sbottom = 0, stop = 0, sleft = 0, sright = 0, ebottom = 0, etop = 0, eleft = 0, eright = 0;
foreach (Aspose.Pdf.Facades.Bookmark bookmark in bookmarks)
{
    // Compare titles case-insensitively
    if (string.Equals(bookmark.Title, "dosage and administration", System.StringComparison.OrdinalIgnoreCase))
    {
        sbottom = bookmark.PageDisplay_Bottom;
        stop = bookmark.PageDisplay_Top;
        sleft = bookmark.PageDisplay_Left;
        sright = bookmark.PageDisplay_Right;
    }
    if (string.Equals(bookmark.Title, "dosage forms and strengths", System.StringComparison.OrdinalIgnoreCase))
    {
        ebottom = bookmark.PageDisplay_Bottom;
        etop = bookmark.PageDisplay_Top;
        eleft = bookmark.PageDisplay_Left;
        eright = bookmark.PageDisplay_Right;
    }
}
// The code below could not be tested because all of the values above came back as zero
Aspose.Pdf.Text.TextFragmentAbsorber absorber = new Aspose.Pdf.Text.TextFragmentAbsorber();
absorber.TextSearchOptions.LimitToPageBounds = true;
// Lower-left corner is taken from the end bookmark, upper-right corner from the start bookmark
absorber.TextSearchOptions.Rectangle = new Aspose.Pdf.Rectangle(eleft, ebottom, sright, stop);
document.Pages.Accept(absorber);
foreach (Aspose.Pdf.Text.TextFragment tf in absorber.TextFragments)
{
    // Placeholder: print each fragment found inside the rectangle
    System.Console.WriteLine(tf.Text);
}
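The search rectangle in the snippet is built from the display coordinates of the two bookmarks. As a rough sketch of that one step in isolation, the helper below derives a rectangle from two sets of coordinates and reports when either bookmark came back with all-zero values (the situation noted in the comment above). BookmarkBox, BookmarkRegion, Between and the pageWidth parameter are hypothetical names used for illustration only and are not part of Aspose.PDF. The sketch assumes that both bookmarks point to the same page, that PDF y-coordinates grow upward, and that the region of interest spans the full page width between the two Top values.

```csharp
using System;

// Hypothetical holder for the four PageDisplay_* values read from one bookmark.
public readonly struct BookmarkBox
{
    public BookmarkBox(double left, double bottom, double right, double top)
    {
        Left = left; Bottom = bottom; Right = right; Top = top;
    }

    public double Left { get; }
    public double Bottom { get; }
    public double Right { get; }
    public double Top { get; }

    // True when no coordinate was populated, i.e. every value is zero.
    public bool IsEmpty => Left == 0 && Bottom == 0 && Right == 0 && Top == 0;
}

public static class BookmarkRegion
{
    // Returns (llx, lly, urx, ury) for the area between the two bookmarks,
    // or null when either bookmark carries no usable coordinates.
    // Assumption: the region runs across the full page width, from the
    // higher of the two Top values down to the lower one.
    public static (double Llx, double Lly, double Urx, double Ury)? Between(
        BookmarkBox start, BookmarkBox end, double pageWidth)
    {
        if (start.IsEmpty || end.IsEmpty)
        {
            return null;
        }

        double upper = Math.Max(start.Top, end.Top);
        double lower = Math.Min(start.Top, end.Top);
        return (0, lower, pageWidth, upper);
    }
}
```

If the result is not null, its four values could be passed, in order, to the Aspose.Pdf.Rectangle constructor used in the snippet. A null result signals that the coordinates were not populated and the absorber call should be skipped.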

We noticed that the API did not extract the necessary information from the bookmarks in the shared PDF. We need to investigate further whether your requirements are feasible, and for that purpose we have logged an investigation ticket as PDFNET-48387 in our issue tracking system. We will look into its details and keep you posted on the status of its resolution. Please be patient and spare us some time.

We are sorry for the inconvenience.