
How to get heading tags as part of the PDF to HTML conversion

We are currently using Aspose.PDF to convert PDF to HTML. We would like to know whether there is an option to generate heading, italic, and bold tags during the conversion. When I convert the same PDF to HTML using Microsoft Word, it generates the heading tags. Also, is there a way to remove header and footer content from the PDF before converting it to HTML?


Could you please share the source PDF and the output HTML files for our reference, along with a sample code snippet? Please also share the HTML generated by MS Word. We will test the scenario in our environment and address it accordingly.

You can search for text within a specified rectangle (such as the header or footer region) and remove it using the TextFragmentAbsorber class.

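// load the PDF and take its first page ("input.pdf" is a placeholder file name)
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document("input.pdf");
Aspose.Pdf.Page page = pdfDocument.Pages[1];
// instantiate TextFragmentAbsorber object
Aspose.Pdf.Text.TextFragmentAbsorber TextFragmentAbsorberAddress = new Aspose.Pdf.Text.TextFragmentAbsorber();
// search text within page bounds
TextFragmentAbsorberAddress.TextSearchOptions.LimitToPageBounds = true;
// specify the page region for TextSearchOptions (here, the top 72 points of the page)
TextFragmentAbsorberAddress.TextSearchOptions.Rectangle = new Aspose.Pdf.Rectangle(0, page.PageInfo.Height - 72, page.PageInfo.Width, page.PageInfo.Height);
// search text from first page of PDF file
page.Accept(TextFragmentAbsorberAddress);
// remove the matched text by replacing it with an empty string (assumed completion of the snippet)
foreach (Aspose.Pdf.Text.TextFragment fragment in TextFragmentAbsorberAddress.TextFragments)
    fragment.Text = "";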

Thanks for the tip on removing specific text using the TextFragmentAbsorber class.

I will provide the sample files and the code snippet for the other issue in a couple of days.


Sure, please take your time to gather the material to share.