Generating a Table of Contents

akozan · July 14, 2017, 1:44am

We are currently evaluating Aspose.Pdf for our company and are testing it by generating a PDF from HTML (HTML to PDF). The objective is to create a table of contents, but we don't know which page each section ends up on after conversion. Is it possible to find out which page each HTML h1, h2, and h3 tag ends up on when the HTML is converted to PDF, so that we can create the TOC afterwards?

With Document() I can see how many pages were generated, but how can I check which page each h1, h2, and h3 tag ended up on and what text it contains?

Thank you in advance

imran.rafique · July 14, 2017, 9:47am

@akozan,
Thank you for contacting support. You can import an HTML document with the Aspose.Pdf API, insert a blank page to hold the table of contents, add text stamps, create local links, and then save the document in PDF format. Please refer to this help topic: Concatenate PDF files and create Table Of Contents.

You can skip the part of that topic that concatenates PDF documents. Please let us know if you have any questions.
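Once you know which page each heading landed on, the text stamps and local links on the blank page each need a position and a destination. The sketch below is a hypothetical, library-independent helper (the `TocLayout` and `TocLine` names, the margin, and the line height are assumptions, not Aspose.Pdf API): it assigns each entry a row on the TOC page(s), computes a baseline y in PDF user space (origin at the bottom-left, y increasing upward), and shifts every destination page by the number of TOC pages inserted at the front. Check how your stamping and link calls interpret coordinates before relying on it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical TOC entry: heading text, destination page AFTER the TOC pages
// are inserted, zero-based index of the TOC page it sits on, and its baseline y.
public record TocLine(string Title, int TargetPage, int TocPageIndex, double BaselineY);

public static class TocLayout
{
    // entries: heading text plus the page it landed on in the converted PDF
    // (before any TOC page is inserted), in document order.
    // y is measured upward from the bottom edge of the page (PDF user space).
    public static List<TocLine> Layout(
        IReadOnlyList<(string Title, int Page)> entries,
        double pageHeight, double margin, double lineHeight)
    {
        int perPage = (int)Math.Floor((pageHeight - 2 * margin) / lineHeight);
        if (perPage < 1)
            throw new ArgumentException("Line height does not fit between the margins.");

        // Every TOC page inserted at the front pushes the content pages back by one.
        int tocPages = (entries.Count + perPage - 1) / perPage;

        var lines = new List<TocLine>();
        for (int i = 0; i < entries.Count; i++)
        {
            int row = i % perPage;
            double y = pageHeight - margin - (row + 1) * lineHeight;
            lines.Add(new TocLine(entries[i].Title, entries[i].Page + tocPages, i / perPage, y));
        }
        return lines;
    }
}
```

For an A4 portrait page (842 pt tall), `TocLayout.Layout(entries, 842, 72, 20)` fits 34 entries per TOC page; each resulting `TocLine` would then map to one text stamp on TOC page `TocPageIndex + 1` and one local link pointing at `TargetPage`.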

Best Regards,
Imran Rafique

akozan · July 14, 2017, 12:09pm

Thank you for replying!

The issue is that I don't know where my various <h1>Heading</h1> elements will end up. It's a dynamically generated report, so if the user enters 100 items, the report might have 10 pages, with headings on pages 1, 4, and 8. How can I find my <h1>Heading</h1> elements and their page numbers so that I can build my table of contents?

The only thing I can think of is:

Convert the HTML to PDF; this will generate a PDF with, let's say, 10 pages.
Then, for each page, convert it back to HTML, look for <h1>Heading</h1> on that page, and store it in memory.
Then create the table of contents and insert it on page 1.
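The second step may not need a round trip through HTML if you can extract the plain text of each page instead (which extraction call you use is an assumption here). Because the report generator already knows the heading texts and their order, a forward scan over the pages is enough. The sketch below uses a hypothetical helper, `FindHeadingPages`, that resumes each search where the previous heading was found, so repeated heading texts resolve in document order. It assumes each heading's text survives extraction verbatim and is not repeated in body text just before the heading itself.

```csharp
using System;
using System.Collections.Generic;

public static class HeadingLocator
{
    // pageTexts: extracted plain text of each PDF page (index 0 = page 1).
    // headings: heading texts in the order they appear in the source HTML.
    // tocPageCount: pages that will be inserted at the front for the TOC
    //               (pass 0 to get page numbers in the PDF as converted).
    // Returns the 1-based page number of each heading, in the order given.
    public static List<int> FindHeadingPages(
        IReadOnlyList<string> pageTexts,
        IReadOnlyList<string> headings,
        int tocPageCount)
    {
        var result = new List<int>();
        int page = 0;    // page currently being searched
        int offset = 0;  // search start position within that page

        foreach (string heading in headings)
        {
            int found = -1;
            while (page < pageTexts.Count)
            {
                int idx = pageTexts[page].IndexOf(heading, offset, StringComparison.Ordinal);
                if (idx >= 0)
                {
                    found = page;
                    offset = idx + heading.Length; // next heading must come after this one
                    break;
                }
                page++;      // not on this page: move to the next page
                offset = 0;
            }

            if (found < 0)
                throw new InvalidOperationException($"Heading not found: {heading}");

            result.Add(found + 1 + tocPageCount);
        }
        return result;
    }
}
```

With the example from the question, headings found on pages 1, 4, and 8 of the converted PDF would be reported as 2, 5, and 9 when one TOC page is inserted at the front.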

Is there a better way to do this?

Thanks again

imran.rafique · July 14, 2017, 8:36pm

@akozan,
There is no way to link text stamps to h1 tag values with the Aspose.Pdf API. However, you can achieve this with the Aspose.Words API, because Microsoft Word can build a table of contents from heading styles and lets you customize it with field switches. For more information about the switches, please refer to this help topic: TOC Switches

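[C#]

// Load the HTML; Aspose.Words imports <h1>...<h6> as Heading 1...Heading 6 paragraphs.
Aspose.Words.Document document = new Aspose.Words.Document(@"C:\Pdf\test153\input.html");
// A newly created DocumentBuilder is positioned at the start of the document.
Aspose.Words.DocumentBuilder builder = new Aspose.Words.DocumentBuilder(document);

// Insert a table of contents at the beginning of the document.
// \o "1-1": include heading levels 1 to 1 (h1 only; "1-3" would also include h2 and h3)
// \h: make entries hyperlinks, \z: hide page numbers in Web Layout view,
// \u: use the paragraph's applied outline level
builder.InsertTableOfContents("\\o \"1-1\" \\h \\z \\u");

// The newly inserted table of contents will be initially empty.
// It needs to be populated by updating the fields in the document.
document.UpdateFields();
document.Save(@"C:\Pdf\test153\Output.pdf", Aspose.Words.SaveFormat.Pdf);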

Please refer to the input HTML and output PDF documents: inputHTML.zip (427 Bytes) and Output.pdf (49.8 KB)
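The original question asks for h1, h2, and h3, while the `\o "1-1"` switch in the snippet only collects Heading 1 paragraphs. The hypothetical helper below (`TocSwitches.Build` is not part of Aspose.Words) only assembles the same Word field switches described in the TOC Switches topic, so a call such as `builder.InsertTableOfContents(TocSwitches.Build(1, 3))` should presumably pick up h2 and h3 headings as well.

```csharp
using System;
using System.Collections.Generic;

public static class TocSwitches
{
    // Builds a TOC field switch string such as \o "1-3" \h \z \u.
    // firstLevel/lastLevel: heading levels to include (Word supports 1 to 9).
    public static string Build(
        int firstLevel, int lastLevel,
        bool hyperlinks = true, bool hideInWebLayout = true, bool useOutlineLevels = true)
    {
        if (firstLevel < 1 || lastLevel > 9 || firstLevel > lastLevel)
            throw new ArgumentException("Levels must satisfy 1 <= first <= last <= 9.");

        var parts = new List<string> { $"\\o \"{firstLevel}-{lastLevel}\"" };
        if (hyperlinks) parts.Add("\\h");          // entries become hyperlinks
        if (hideInWebLayout) parts.Add("\\z");     // hide page numbers in Web Layout view
        if (useOutlineLevels) parts.Add("\\u");    // use applied paragraph outline levels
        return string.Join(" ", parts);
    }
}
```

`TocSwitches.Build(1, 1)` reproduces the exact string used in the snippet, which makes it easy to check that only the level range changes.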

Best Regards,
Imran Rafique

muteakiben · July 3, 2019, 5:40am

Can we access the files attached to the answer? I have the same problem in my project. Thanks!

Farhan.Raza · July 3, 2019, 4:11pm

@muteakiben

Thank you for posting.

Generally, attachments are accessible only to the thread owner and Aspose staff, to ensure data privacy and security. However, the data here is a sample we created and contains nothing confidential, so you may download it for reference. Please create a separate topic if you still need assistance.

HTMLtoPDF.zip