PDF 2 HTML Error

Hello,


I am getting an “Index Out of Bound” error when I replace an email address in a PDF file and then save it in HTML format.

I am using the code below:

using Aspose.Pdf;
using Aspose.Pdf.Text;

//open document
Document pdfDocument = new Document("input.pdf");
//create TextAbsorber object to find all instances of the input search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@"\d{4}-\d{4}"); //like 1999-2000
//set text search option to specify regular expression usage
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
//accept the absorber for all the pages
pdfDocument.Pages.Accept(textFragmentAbsorber);
//get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
//loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
    //update text and other properties
    textFragment.Text = "New Phrase";
    textFragment.TextState.Font = FontRepository.FindFont("Verdana");
    textFragment.TextState.FontSize = 22;
    textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Blue);
    textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Green);
}
//save as HTML with a fixed layout, TTF fonts, and raster images embedded in the page background
HtmlSaveOptions htmlOptions = new HtmlSaveOptions();
htmlOptions.FixedLayout = true;
htmlOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsTTF;
htmlOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;

pdfDocument.Save("output.html", htmlOptions);

Please reply as soon as possible.

Thanks,

Jignesh Chauhan

Hi Jignesh,


Thanks for using our APIs.

I have tested the scenario using one of my sample PDF files and could not reproduce the problem. Can you please share the source/input PDF file causing this problem so that we can look into the matter further?

We are sorry for this inconvenience.

Hello,


Please find attached the PDF file with which I get the error. When I save it as PDF, it works perfectly and the email address is replaced with xxxxxx.

If I save the original PDF as HTML without replacing the email address, that also works fine. Can you please check the code below and advise me where I made a mistake?

Code:
using Aspose.Pdf;
using Aspose.Pdf.Text;

Document pdfDocument = new Document(@"D:\EmailDemo.pdf");
//search for email addresses using a regular expression
TextFragmentAbsorber textFragmentAbsorberForEmail = new TextFragmentAbsorber(@"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*");
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorberForEmail.TextSearchOptions = textSearchOptions;
pdfDocument.Pages.Accept(textFragmentAbsorberForEmail);
TextFragmentCollection textFragmentCollection = textFragmentAbsorberForEmail.TextFragments;
//loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
    //mask the address: black text on a black background
    textFragment.Text = "xxxxxxxxxxxxxx";
    textFragment.TextState.Font = FontRepository.FindFont("Verdana");
    textFragment.TextState.FontSize = 12;
    textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Black);
    textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Black);
}
HtmlSaveOptions htmlOptions = new HtmlSaveOptions();
htmlOptions.FixedLayout = true;
htmlOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsTTF;
htmlOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;

pdfDocument.Save(@"D:\EmailDemo.html", htmlOptions);
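
To rule out the search pattern itself, the email expression can be checked outside the PDF pipeline first. The following is only a minimal sketch using System.Text.RegularExpressions; the class name, the Mask helper, and the sample string are made up for illustration, and the pattern is the common ASP.NET-style email expression, which is an assumption about what the pattern above is meant to be.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailPatternCheck
{
    // Assumed intended pattern (common ASP.NET-style email expression).
    public const string EmailPattern = @"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";

    // Hypothetical helper: replaces every match of the pattern with a fixed mask.
    public static string Mask(string input)
    {
        return Regex.Replace(input, EmailPattern, "xxxxxxxxxxxxxx");
    }

    public static void Main()
    {
        // Sample text invented for this sketch, not taken from the attached PDF.
        Console.WriteLine(Mask("Contact: john.doe@example.com or call us."));
    }
}
```

If the pattern masks the sample addresses as expected here, the regular expression is probably not the cause of the exception.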


Thanks,

Jignesh Chauhan

Hi Jignesh,

Thanks for sharing your code and source file. We have tested the scenario and were unable to replicate the issue with a valid license applied. If you are evaluating our product without a valid license, please request a 30-day temporary license; otherwise, double-check that your license has not expired and is applied properly, as suggested in the licensing documentation. Applying a valid license will fix your issue.

Please feel free to contact us for any further assistance.

Best Regards,

Hi Jignesh,


Adding to Tilal’s comments: the IndexOutOfRangeException (“Index was outside the bounds of the array”) occurs when Aspose.Pdf for .NET is used in trial mode. With a valid license, the conversion completes without any issue.
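
For reference, a minimal sketch of applying the license before any document is loaded. The file name "Aspose.Pdf.lic" is only a placeholder and is assumed to sit next to the application; please substitute the path of your own license file.

```csharp
using Aspose.Pdf;

// Apply the license once, before any other Aspose.Pdf call.
// "Aspose.Pdf.lic" is a placeholder path, not a file from this thread.
Aspose.Pdf.License license = new Aspose.Pdf.License();
license.SetLicense("Aspose.Pdf.lic");

// Load and convert the document only after the license has been set.
Document pdfDocument = new Document(@"D:\EmailDemo.pdf");
```

The license only needs to be set once per application, so application startup is usually a suitable place for this call.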