We’re using Aspose.Pdf to extract all paragraph text from PDF files. Everything worked fine during development on a Windows 10 workstation, but when I deploy the application to our server running Ubuntu, I get an "Object reference not set to an instance of an object" error when I use the ParagraphAbsorber on the document.
using (var stream = new MemoryStream(FileBytes))
{
    using (var pdfDocument = new Document(stream))
    {
        var absorber = new ParagraphAbsorber();
        absorber.Visit(pdfDocument);
        foreach (var pageMarkup in absorber.PageMarkups)
        {
            foreach (var markupSection in pageMarkup.Sections)
            {
                foreach (var paragraph in markupSection.Paragraphs)
                {
                    // Extract paragraph text
                }
            }
        }
    }
}
Stack trace of the error:
Object reference not set to an instance of an object.
at .(Operator )
at ()
at .(BaseOperatorCollection , Resources , Page )
at .(BaseOperatorCollection , Resources )
at .()
at Aspose.Pdf.Text.TextFragmentAbsorber.Visit(Page page)
at Aspose.Pdf.Text.PageMarkup.(Page )
at Aspose.Pdf.Text.ParagraphAbsorber.Visit(Page page)
at Aspose.Pdf.Text.ParagraphAbsorber.Visit(Document doc)
at Jala.ProjectProcessor.Extractors.PdfExtractor.Extract()
Server specifications:
OS: Ubuntu 16.04.4 x64
.NET Core runtime installed: aspnetcore-runtime-2.1
Aspose.Pdf version: 18.8.0
I have also attached a sample PDF file that throws the same error.
Winged.pdf (49.7 KB)