HTML to PDF with bookmarks

learnstyle · September 13, 2022, 11:08pm

I can’t seem to find any guidance in the documentation. When exporting HTML to PDF, is there a way to set bookmarks with specific destinations automatically?
Ideally I’d like to target any H1 and H2 elements in the HTML source and have those converted to targeted bookmarks, but if that isn’t an option, how can I dig through the PDF manually to enter those?

asad.ali · September 14, 2022, 3:58am

@learnstyle

Could you please share your sample HTML for our reference? Does it have TOC links working already in it? We will check it and share our feedback with you accordingly.

learnstyle · September 14, 2022, 3:42pm

Certainly, sample content is attached here.
I put the CSS inline rather than linking to the stylesheet so you have it here, but all of that comes out fine.

And no, I don’t have a TOC in it yet.

SampleContent.zip (11.4 KB)

asad.ali · September 14, 2022, 10:03pm

@learnstyle

In your case, you can use the examples in Adding TOC in an Existing PDF or Creating Bookmarks. You first need to convert your HTML to PDF in order to do that. You will not be able to extract headings from the PDF using the API, because the PDF format has no separate definition for a heading element: all content is present as simple text. Instead, you can use a predefined list of headings to generate the TOC or bookmarks in the PDF. If you face any issues, please let us know.

learnstyle · September 14, 2022, 10:25pm

Thanks for the note. So the TOC isn’t really applicable, as I don’t want a TOC on the doc.
The bookmarks link I did go through already before submitting the ticket, but the docs are lacking a lot of info.

I need the bookmarks to be more specific. Running the examples, it’s easy to add bookmarks per page, but I might have 3 or 4 of them on a single page.
From my limited understanding, I think I need to use “Destinations” in order to do that, and tie the bookmark to the specific Destination, but I couldn’t find anything in the docs on how to do that.

Can you point me in the right direction?
I would need to parse through each text item on each page to look for the content I’m trying to tag, and then add that as a destination (how?) and then link a bookmark to that destination?

And just to verify (sanity check), there’s nothing different I can do in the markup to have H1 and H2 tags automatically mapped to bookmarks; that’s just not something the product supports, correct?

asad.ali · September 15, 2022, 6:06am

@learnstyle

Yes, you are right.

Furthermore, it would help us create a sample code example for your requirement if you could also share an expected output PDF for our reference. We will try to create the file using code in our environment and share that code with you.

learnstyle · September 15, 2022, 12:37pm

Certainly, file is attached. Bookmark placement might be slightly off as I was just doing this scrolling through Acrobat, but you should get the idea.
Basically, from the markup in that sample zip I sent, every H2 element in there (the bold items underneath the dark black lines in the PDF) has a bookmark on the side taking the reader to the target place.

IEP (Abbott, Cleo).pdf (812.8 KB)

asad.ali · September 15, 2022, 8:54pm

@learnstyle

Please try the code snippet below to achieve your expected output PDF:

using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// dataDir is the folder that holds the sample files.
// Convert the HTML to PDF first; the bookmarks are added to the resulting document.
var document = new Document(dataDir + "SampleContent.html", new HtmlLoadOptions());
var pdfBookmarkEditor = new Aspose.Pdf.Facades.PdfBookmarkEditor();
pdfBookmarkEditor.BindPdf(document);
pdfBookmarkEditor.DeleteBookmarks();

// Predefined heading texts to search for; you can add more items to the list.
var listHeadings = new List<string>
{
    "REASON FOR DEVELOPING THE IEP",
    "STUDENT PROFILE"
};

foreach (var heading in listHeadings)
{
    foreach (var page in document.Pages)
    {
        // Find every occurrence of the heading text on this page.
        var textFragmentAbsorber = new TextFragmentAbsorber(heading);
        page.Accept(textFragmentAbsorber);
        foreach (var textFragment in textFragmentAbsorber.TextFragments)
        {
            // Add a bookmark that points at the vertical position of the matched text.
            var bookmark = new Aspose.Pdf.Facades.Bookmark();
            bookmark.Title = heading;
            bookmark.PageNumber = textFragment.Page.Number;
            bookmark.PageDisplay_Top = (int)page.GetPageRect(true).Height - (int)textFragment.Rectangle.URY;
            pdfBookmarkEditor.CreateBookmarks(bookmark);
        }
    }
}

pdfBookmarkEditor.Save(dataDir + "outputPDF.pdf");

outputPDF.pdf (241.4 KB)

learnstyle · September 16, 2022, 2:45pm

Thanks for the snippets, that gives me a starting point. Unfortunately this method simply isn’t viable.
First, the text I’m looking for is going to be dynamic, so I can’t count on pre-defining the strings I’m looking for.
Second, this would match ANY occurrence of the phrase on the page, not just what was in the header element. So I might end up rendering bookmarks on similar text in the middle of a paragraph.

I was very much hoping to avoid using yet ANOTHER component to parse the html first to look for the H2 tags, but sadly that looks like it might be my only option.
I might just have to consider this a “not possible with this product”.

Since you’re already familiar with this particular form, can I ask you about one more issue?
At the bottom of the form, where the inputs are, only type="text" is rendering. Between the last two rendered input elements, there’s a type="date" and a type="number".
I would have expected these to render as inputs into the PDF, with the Format specified accordingly. But instead they’re just rendering as plain text with a drawn border, not an input field at all.
Is there something I’m doing wrong with that?
Or again do I have to parse the HTML myself first and change everything to a different format?

Thanks

asad.ali · September 16, 2022, 7:44pm

@learnstyle

Aspose.PDF is specialized to deal with only PDF format. It does not offer any capability to parse HTML files. You can however use any other library like Aspose.HTML or perform string operations to extract heading values before creating bookmarks.
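
As a rough illustration of the "string operations" route, the sketch below pulls the visible text of H1 and H2 elements out of an HTML string using only the .NET base class library. The class name HeadingExtractor and the method ExtractHeadings are hypothetical names made up for this example; they are not part of Aspose.PDF or Aspose.HTML. It assumes reasonably well-formed markup and is not a full HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not an Aspose API.
public static class HeadingExtractor
{
    // Matches <h1>...</h1> or <h2>...</h2>, with optional attributes, across line breaks.
    private static readonly Regex HeadingPattern = new Regex(
        @"<h([12])\b[^>]*>(.*?)</h\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    private static readonly Regex TagPattern = new Regex(@"<[^>]+>");
    private static readonly Regex WhitespacePattern = new Regex(@"\s+");

    public static List<string> ExtractHeadings(string html)
    {
        var headings = new List<string>();
        foreach (Match match in HeadingPattern.Matches(html))
        {
            // Replace nested tags (for example <b> or <span>) with a space.
            string text = TagPattern.Replace(match.Groups[2].Value, " ");
            // Decode entities such as &amp; and collapse runs of whitespace.
            text = WebUtility.HtmlDecode(text);
            text = WhitespacePattern.Replace(text, " ").Trim();
            if (text.Length > 0)
            {
                headings.Add(text);
            }
        }
        return headings;
    }
}
```

The returned list could stand in for a hand-written list of heading strings. Two caveats apply: if the stylesheet changes how the text is rendered (for example text-transform: uppercase), the extracted strings may not match the text in the PDF, and searching the PDF for these strings can still match the same phrase inside a paragraph.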

Regarding input fields, we checked the output and noticed the same issue. It has been logged as PDFNET-52573 in our issue tracking system. We will let you know as soon as it is resolved. Please spare us some time.

We are sorry for the inconvenience.

aspose.notifier · October 20, 2022, 3:49pm

The issues you have found earlier (filed as PDFNET-52573) have been fixed in Aspose.PDF for .NET 22.10.