Hi,
I’m checking whether a document contains only images:
public static bool HasOnlyImages(Aspose.Pdf.Document document)
{
    for (int page = 1; page <= document.Pages.Count; page++)
    {
        // Create a TextAbsorber to extract the text of this page
        TextAbsorber textAbsorber = new TextAbsorber();
        document.Pages[page].Accept(textAbsorber); // Exception occurs on this line
        string extractedText = textAbsorber.Text;
        // Whitespace-only output counts as "no text"; any other text means
        // the document is not image-only
        if (!string.IsNullOrWhiteSpace(extractedText))
        {
            return false;
        }
    }
    return true;
}
When this runs on the attached PDF, it throws a NullReferenceException inside TextAbsorber:
12-Dec-2015 11:02:44 Extracting text
12-Dec-2015 11:02:44 Rendering error: System.NullReferenceException: Object reference not set to an instance of an object.
12-Dec-2015 11:02:44 at ?.?.?(Operator )
12-Dec-2015 11:02:44 at ?.?.Parse()
12-Dec-2015 11:02:44 at ?.?.(BaseOperatorCollection , Resources , Page )
12-Dec-2015 11:02:44 at ?.?.(BaseOperatorCollection , Resources )
12-Dec-2015 11:02:44 at ?.?.()
12-Dec-2015 11:02:44 at Aspose.Pdf.Text.TextAbsorber.Visit(Page page)
12-Dec-2015 11:02:44 at T1.Rendering.Renderer.RenderUtils.HasOnlyImages(Document document)
It works for many other PDFs, and I can’t see anything wrong or unusual about the PDF that fails.
Is this an Aspose bug, or is there something wrong with this PDF?
Thanks,
Martin