Hello,
My team is using the Aspose.PDF for .NET library to extract text from PDF files. We ran into the following exception while using TextAbsorber to extract text page by page:
Aspose.Pdf.PdfException: Operand value is not a name
at #=zyt4T9KO7peVjkhq3xluWgWvLUxlueatyfBhU0$bz4ekX.#=zpVFbElM=()
at #=zhwl8667iwsEz6rze47bjzpYYwNEMl$3tLQG6InPVjqRbrrW5fXY$J94=.#=z6QD6iaDT30UG(Int32 #=zu_nAOcU=, Operator #=zXwUxPQE=)
at #=zhwl8667iwsEz6rze47bjzpYYwNEMl$3tLQG6InPVjqRbrrW5fXY$J94=.#=zbQhQKFg=(Page #=ztL8V05k=)
at #=z9W8OEM$p8$g7694whr8T0tKyLlpInzoY3I4MjFJfDkJbn$j9eoDDDq8FlOns.#=zwQHe3GexITOq(BaseOperatorCollection #=zWR6Slpk=, Resources #=za3NwiOk=, Page #=ztL8V05k=, Rectangle #=zsJwR5inyT$sP)
at #=z9W8OEM$p8$g7694whr8T0tKyLlpInzoY3I4MjFJfDkJbn$j9eoDDDq8FlOns.#=zwQHe3GexITOq(BaseOperatorCollection #=zWR6Slpk=, Resources #=za3NwiOk=, Rectangle #=zsJwR5inyT$sP)
at #=z9W8OEM$p8$g7694whr8T0tKyLlpInzoY3I4MjFJfDkJbn$j9eoDDDq8FlOns.#=zGK7Mmdc=(Boolean #=zf9_O69sVgPb0)
at #=z9W8OEM$p8$g7694whr8T0tKyLlpInzoY3I4MjFJfDkJbn$j9eoDDDq8FlOns..ctor(Page #=ztL8V05k=, TextSearchOptions #=zqQYmXUFMg2zg, Boolean #=zGVp0$i07r2iN)
at #=z9W8OEM$p8$g7694whr8T0tKyLlpInzoY3I4MjFJfDkJbn$j9eoDDDq8FlOns..ctor(Page #=ztL8V05k=, TextSearchOptions #=zqQYmXUFMg2zg)
at Aspose.Pdf.Text.TextAbsorber.Visit(Page page)
at Aspose.Pdf.Page.Accept(TextAbsorber visitor)
This is a snippet of our code for extracting text from PDFs:
using System;
using System.IO;
using System.Linq;
using Aspose.Pdf.Text;

public (string textContent, int pageCount) PdfToText(byte[] sourceBytes)
{
    using (var inputStream = new MemoryStream(sourceBytes))
    using (var document = new Aspose.Pdf.Document(inputStream))
    {
        int pageLimit = Math.Min(ExtractionConfig.PdfNumberOfPages, document.Pages.Count);
        var textAbsorber = new TextAbsorber();

        if (!ExtractionConfig.ReadByPageNumber || ExtractionConfig.ReadPages == null || ExtractionConfig.ReadPages.Length == 0)
        {
            // Read pages up to the page limit, starting from the beginning of the document
            for (int i = 1; i <= pageLimit; i++)
            {
                document.Pages[i].Accept(textAbsorber);
            }
        }
        else
        {
            // Read specific pages if the user has selected them.
            // ReadPages holds 0-based indices; negative values count back from the end (-1 = last page).
            int length = document.Pages.Count;
            var pages = ExtractionConfig.ReadPages!.Where(i => i < length && i >= -length).ToArray();
            for (int i = 0; i < pages.Length; i++)
            {
                // Map negative indices from the end, then convert to Aspose's 1-based page numbers
                int pageNumber = (pages[i] >= 0 ? pages[i] : length + pages[i]) + 1;
                document.Pages[pageNumber].Accept(textAbsorber);
            }
        }

        return (textAbsorber.Text, document.Pages.Count);
    }
}
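The page-index handling in the `else` branch can be pulled out into a pure function so it can be unit-tested without loading a PDF. This is a minimal sketch, assuming (as the snippet does) that `ReadPages` holds 0-based indices and that negative values count back from the end, with `-pageCount` treated as the first page. `PageSelection` and `ResolvePageNumbers` are hypothetical names, not part of Aspose.PDF.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an Aspose API): turns user-supplied page indices
// into the 1-based page numbers that document.Pages[...] expects.
public static class PageSelection
{
    public static int[] ResolvePageNumbers(IEnumerable<int> requested, int pageCount) =>
        requested
            // Keep only indices in [-pageCount, pageCount - 1]; out-of-range values are dropped.
            .Where(i => i < pageCount && i >= -pageCount)
            // Negative indices count back from the end; then shift 0-based to 1-based.
            .Select(i => (i >= 0 ? i : pageCount + i) + 1)
            .ToArray();
}
```

In `PdfToText`, the `else` branch could then loop over `PageSelection.ResolvePageNumbers(ExtractionConfig.ReadPages!, document.Pages.Count)` and call `Accept(textAbsorber)` on each returned page number.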
Unfortunately, due to an NDA, I cannot share the file that caused the exception. However, the file opens normally in a standard PDF viewer (Microsoft Edge), which suggests that it is not corrupted.
The code extracts text from most PDFs, but it throws this PdfException on 3 out of 1,000 files.
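While the root cause is unknown, one possible stopgap (a sketch, not verified against the failing files) is to isolate each page so that a single page that throws does not abort extraction for the whole document. `ResilientExtraction` and `ExtractPages` are hypothetical names; the exception type to catch is a type parameter so the caller can narrow it to `Aspose.Pdf.PdfException` rather than swallowing every exception.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: runs a per-page extraction delegate, skips pages that
// throw TException, and records which page numbers failed.
public static class ResilientExtraction
{
    public static (string Text, List<int> FailedPages) ExtractPages<TException>(
        IEnumerable<int> pageNumbers,
        Func<int, string> extractPage)
        where TException : Exception
    {
        var text = new StringBuilder();
        var failed = new List<int>();
        foreach (int pageNumber in pageNumbers)
        {
            try
            {
                text.Append(extractPage(pageNumber));
            }
            catch (TException)
            {
                // Only the expected exception type is swallowed; anything else propagates.
                failed.Add(pageNumber);
            }
        }
        return (text.ToString(), failed);
    }
}
```

With Aspose, the call might look like `ResilientExtraction.ExtractPages<Aspose.Pdf.PdfException>(Enumerable.Range(1, pageLimit), p => { var absorber = new TextAbsorber(); document.Pages[p].Accept(absorber); return absorber.Text; })`. Using a fresh `TextAbsorber` per page is an assumption made here so that a page that fails partway through cannot leave partial state in a shared absorber; the returned `FailedPages` list shows which pages were skipped.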
Could I get some guidance on why I'm getting this exception and how I can fix it, please?
Thank you