Is it possible to add a custom style or remove a style when converting PDF to an HTML string using Aspose?

Hi,
I am trying to convert a PDF document that contains highlight annotations into an HTML string. During conversion, the API splits the text into two span elements and adds the style “visibility:hidden” to one of them, which makes the text invisible in the HTML string. The resulting output (HTML file) shows only the highlight color; the text is hidden.

Is there any way to solve this, either by removing that particular style or by adding a style that overrides it, so that both the text and the highlight color are visible in the HTML file?

Any help will be appreciated. Thank you.

@pooja.jayan

Could you please ZIP and attach your input PDF along with problematic output HTML and expected output HTML here for testing? We will investigate the issue and provide you more information on it.

Hi,

The PDF file is: whitepaper.pdf (335.7 KB)

Code I have used:
// Note: doc, pageNumber, path and the ProcessHtml() extension method are defined elsewhere (not shown).
int index = 0;
byte[] byteData = null;
int pageCount = doc.Pages.Count;
for (int page = 0; page < pageCount; page++)
//foreach (Page page in pdfFile.Pages)
{
    using (MemoryStream pageStream = new MemoryStream())
    {
        // Save each page as a separate document (the Pages collection is 1-based).
        //Page extractedPage = page;
        Aspose.Pdf.Document extractedPage = new Aspose.Pdf.Document();
        extractedPage.Pages.Add(doc.Pages[page + 1]);

        HtmlSaveOptions htmlOptions = new HtmlSaveOptions();
        htmlOptions.FixedLayout = true;
        htmlOptions.PartsEmbeddingMode = Aspose.Pdf.HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
        htmlOptions.RasterImagesSavingMode = Aspose.Pdf.HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
        htmlOptions.RemoveEmptyAreasOnTopAndBottom = true;
        htmlOptions.SplitIntoPages = false;
        htmlOptions.SplitCssIntoPages = false;

        // Use a distinct CSS class prefix for each page.
        string cssprefix = "aspose_pdf" + page;
        htmlOptions.CssClassNamesPrefix = cssprefix;
        //htmlOptions.HtmlMarkupGenerationMode = Aspose.Pdf.HtmlSaveOptions.HtmlMarkupGenerationModes.WriteAllHtml;

        extractedPage.Save(pageStream, htmlOptions);
        //pdfFile.Save(pageStream, htmlOptions);
        var pageBytes = pageStream.ToArray();

        // Keep the first page when pageNumber is 0, otherwise the requested page.
        if ((pageNumber == 0) && (page == 0))
        {
            byteData = pageBytes;
        }
        if (pageNumber == page + 1)
        {
            byteData = pageBytes;
        }
    }
}
string HtmlString = byteData.ProcessHtml();
File.WriteAllText(path + index + ".html", HtmlString);
index++;

The result I am getting is: output.PNG (62.8 KB)

Please have a look at this HTML page:
PageHtml.PNG (47.5 KB)

I want to remove the “visibility:hidden” style that is added when the PDF is converted to HTML.

@pooja.jayan

We have tested the scenario using the latest version, Aspose.PDF for .NET 21.11, and could not reproduce the issue you reported. Please use Aspose.PDF for .NET 21.11. We have attached the output HTML of the 2nd page to this post for your reference.
page 2 html.zip (70.0 KB)

Hi,
Could you please try converting this document, PDF_with_Highlighted_Text.pdf (299.6 KB), to HTML?

@pooja.jayan

You can remove the highlight annotations using the following code example. Hope this helps.

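// Requires: using System.Collections.Generic;
Document pdfDocument = new Document(MyDir + "PDF_with_Highlighted_Text.pdf");

HtmlSaveOptions htmlOptions = new HtmlSaveOptions();
foreach (Page page in pdfDocument.Pages)
{
    // Collect the highlight annotations first; deleting from page.Annotations
    // while enumerating it can invalidate the enumeration.
    List<Annotation> highlights = new List<Annotation>();
    foreach (Annotation annotation in page.Annotations)
    {
        if (annotation is HighlightAnnotation)
        {
            highlights.Add(annotation);
        }
    }
    foreach (Annotation highlight in highlights)
    {
        page.Annotations.Delete(highlight);
    }
}

pdfDocument.Save(MyDir + "21.11.html", htmlOptions);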

Hi,

I don't want to delete the highlight annotation. I want to see the highlighted text and its color in the converted HTML as well. Deleting removes it, right? I don't want that.

@pooja.jayan

This issue was already logged in our issue tracking system as PDFNET-50941 for you. You will be notified via your other thread once this issue is resolved.
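
Until the logged issue is resolved, one possible workaround is to post-process the generated HTML string and strip the “visibility:hidden” declaration, as asked in the original question. The following is only a minimal sketch and has not been tested against the attached PDFs: the class and method names (HtmlVisibilityFix, RemoveHiddenVisibility) are hypothetical and not part of Aspose.PDF, and the sketch assumes the style appears literally as a `visibility:hidden` declaration in the markup or embedded CSS. It only removes the declaration, so whether the revealed text lines up correctly with the highlight still has to be checked in the output.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Aspose.PDF.
public static class HtmlVisibilityFix
{
    // Matches a "visibility: hidden" declaration plus an optional trailing semicolon.
    private static readonly Regex HiddenDeclaration = new Regex(
        @"visibility\s*:\s*hidden\s*;?",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the HTML with every "visibility:hidden" declaration removed.
    // Caveat: this is a plain text replacement, so it also affects any
    // literal "visibility:hidden" that appears in the page's text content.
    public static string RemoveHiddenVisibility(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }
        return HiddenDeclaration.Replace(html, string.Empty);
    }
}
```

With the code posted earlier in this thread, it could be applied just before the file is written, for example `string HtmlString = HtmlVisibilityFix.RemoveHiddenVisibility(byteData.ProcessHtml());`.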