Highlight Text from PDF


#1

Hello,
I’m writing because we are implementing functionality to highlight text in a PDF. We are using an AWS serverless Lambda architecture. Locally, running in a Windows container, the function works fine. However, when we deploy the Lambda to AWS, where it runs on Linux (which is outside our control), we get a null reference exception.
We wanted to check with you whether there is a solution for this behavior, or whether a particular DLL that is missing on Linux needs to be included.
Thanks in advance,
Javier


#2

@javcavallo

Thank you for contacting support.

Would you please share a narrowed-down sample application along with the complete stack trace of the exception? Please also mention whether it occurs on every execution or randomly. Before sharing the requested data, please make sure you are using the latest version of the API.

Moreover, Aspose.PDF is compatible with Linux as well, so no additional DLL is required to make it work.


#3

Hi,
We are working on a serverless function (AWS Lambda) in C# on .NET Core, whose purpose is to receive a list of strings and highlight the matching words in a PDF.

Below is the stack trace we get every time we try to save the document. It is worth mentioning that the exact same code works fine on a Windows machine; the issue occurs only when deploying to AWS, which runs on Linux and over which we have no control.

Object reference not set to an instance of an object.: NullReferenceException
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zzn7krZSD6h8Ttf4Q3w==(Object #=zRtTBZNQ=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zJ4DlEF1xfWjv4knxCw==()
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zI0T5f01BvpGejHLBVnmFrls=(Object #=zRtTBZNQ=, UInt32 #=zrb$csX4=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zZp0c6siqExNQpyUT83XZVc4N818nti5pPrLIMJ71vLni(Boolean #=zRtTBZNQ=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zZp0c6siqExNQpyUT83XZVc4N818nti5pPrLIMJ71vLni(Boolean #=zRtTBZNQ=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zKLMd7Ur$Y2KrT2TZ7uTBbusdf8IQ()
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zAJjZnHNAFUq_FVFlPpZyLPkwGpzm4gfoo6ehzks=(Object[] #=zRtTBZNQ=, Type[] #=zrb$csX4=, Type[] #=zAl_FPn8=, Object[] #=zZlirkcE=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zDmxaysj$oANjFhN5zDgj2AQ$scvnvVYryg==(Stream #=zRtTBZNQ=, String #=zrb$csX4=, Object[] #=zAl_FPn8=, Type[] #=zZlirkcE=, Type[] #=z0ajIK14=, Object[] #=zS2TweAE=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zr8xbGFB3C1sl88g8OJxBIcA1lRqTZzrgyg==(Stream #=zRtTBZNQ=, String #=zrb$csX4=, Object[] #=zAl_FPn8=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=ziiMXcF3Y0FFaxbWw_EiGccE=(Stream #=zRtTBZNQ=, String #=zrb$csX4=, Object[] #=zAl_FPn8=)
at Aspose.Pdf.Document.#=z34wha_tMYW5e(Stream #=z$$huN9A=, SaveOptions #=zn9OMIRURwZxW)
at Aspose.Pdf.Document.#=zB0WnsfwY8HBL(Stream #=zz9$Z3WAZIg0O, SaveOptions #=z$cDUb3k=)
at Aspose.Pdf.Document.Save(Stream outputStream, SaveFormat format)
at CvStorage.Api.GetPdfHighlighted.AsposePdfHelper.ApplyHighlighting(Document document, IEnumerable`1 toHighlight, Color yellow)
at CvStorage.Api.GetPdfHighlighted.AsposePdfHelper.GetPdfDoc(Stream cvStream, IEnumerable`1 wordsToHighlight)
at CvStorage.Api.GetPdfHighlighted.Processor.Process(GetPdfHighlightedRequest request)
at CvStorage.Api.Functions.BaseApiGatewayFunction`1.FunctionHandler(APIGatewayProxyRequest apiProxyEvent, ILambdaContext context)
at lambda_method(Closure , Stream , Stream , LambdaContextInternal )

We are also using the latest version of Aspose.PDF.


#4

@javcavallo

Thank you for further details.

Please also share a narrowed-down sample application that reproduces the behavior (working locally but failing when deployed to AWS) for our reference.


#5

Hi Farhan,
Please find below sample code showing what we are trying to do:

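using System.Collections.Generic;
using System.IO;
using System.Linq;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

private static Stream ApplyHighlighting(Document document, IEnumerable<string> toHighlight, Color yellow)
{
    // Build one regular expression from all keywords (GetRegex is a helper not shown here).
    var enumerableKeyWords = toHighlight.ToList();
    var pattern = GetRegex(enumerableKeyWords);
    var searchOptions = new TextSearchOptions(true); // true = treat the pattern as a regular expression
    var tfa = new TextFragmentAbsorber(pattern, searchOptions);
    document.Pages.Accept(tfa);

    // Add a highlight annotation over every matched text fragment.
    foreach (var textFragment in tfa.TextFragments)
    {
        var ha = new HighlightAnnotation(textFragment.Page, textFragment.Rectangle) { Color = yellow };
        textFragment.Page.Annotations.Add(ha);
    }

    // This Save call is where the NullReferenceException is thrown on Linux.
    var outputStream = new MemoryStream();
    document.Save(outputStream, SaveFormat.Pdf);
    outputStream.Position = 0; // rewind so the caller reads from the start of the stream
    return outputStream;
}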

Hope this helps.
Regards!
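
The sample above calls a GetRegex helper that is not shown in the thread. As a rough illustration only, here is one way such a helper could turn a keyword list into a single regular-expression pattern. The class name KeywordPattern and the method Build are hypothetical, and the sketch assumes the search pattern is evaluated with standard .NET regular-expression semantics; it is not the poster's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordPattern
{
    // Hypothetical stand-in for the GetRegex helper, which the thread does not show.
    // Returns a case-insensitive pattern that matches any of the given keywords
    // as whole words.
    public static string Build(IEnumerable<string> keywords)
    {
        var escaped = keywords
            .Where(k => !string.IsNullOrWhiteSpace(k))       // skip empty entries
            .Select(k => Regex.Escape(k.Trim()))             // neutralise regex metacharacters
            .Distinct(StringComparer.OrdinalIgnoreCase)      // drop case-only duplicates
            .OrderByDescending(k => k.Length)                // try longer alternatives first
            .ToList();

        if (escaped.Count == 0)
        {
            throw new ArgumentException("At least one non-empty keyword is required.", nameof(keywords));
        }

        // Lookarounds are used instead of \b so that keywords ending in a
        // non-word character (for example "C#") still match as whole words.
        return @"(?i)(?<!\w)(?:" + string.Join("|", escaped) + @")(?!\w)";
    }
}
```

Escaping each keyword matters because user-supplied words may contain characters such as "+", "." or "#" that would otherwise change the meaning of the pattern.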


#6

@javcavallo

Thank you for contacting support.

We have logged an investigation ticket with ID PDFNET-46751 in our issue management system. The ticket ID has been linked with this thread so that you will receive notification as soon as the ticket is resolved.


#7

I’m also interested in the resolution. I’m getting the same error when deployed to a Docker container running Linux. I do not get the error when running on Windows.

[Error] ProcessPdfDocument:Object reference not set to an instance of an object.
at #=zbcNZrqYsqV0xB75o_htRduS31S47.#=z6jnc$1k=(#=zL7$cu8NDyzD$_BcIDMWqN27Bdyqx #=zQZkYvoGxj19I)
at #=zymTjhRurWmk$CHAOQY5vorI=.#=z0$DLZYM=(#=zL7$cu8NDyzD$_BcIDMWqN27Bdyqx #=zQZkYvoGxj19I)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zzn7krZSD6h8Ttf4Q3w==(Object #=zRtTBZNQ=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zDj9CnFQFjWhKuN50xg==(MethodBase #=zRtTBZNQ=, Boolean #=zrb$csX4=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zQQnBITmHxY25IHSch35cnHVaidyyYXLoGdZygIfcVzUG(#=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0= #=zRtTBZNQ=, #=qxfRroPYkt4WdkW4c21LPLUBKwUxCQweFdfXIf0x8HQo= #=zrb$csX4=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zPXH0_F11LpyLFXgW_SSczOs=()
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zZp0c6siqExNQpyUT83XZVc4N818nti5pPrLIMJ71vLni(Boolean #=zRtTBZNQ=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zzn7krZSD6h8Ttf4Q3w==(Object #=zRtTBZNQ=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zJ4DlEF1xfWjv4knxCw==()
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zDLIpilk2jVAGhQdKADAZk6Kfj$fJyINDzCiuzz0=(#=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0= #=zRtTBZNQ=, #=qxfRroPYkt4WdkW4c21LPLUBKwUxCQweFdfXIf0x8HQo= #=zrb$csX4=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zPXH0_F11LpyLFXgW_SSczOs=()
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zZp0c6siqExNQpyUT83XZVc4N818nti5pPrLIMJ71vLni(Boolean #=zRtTBZNQ=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zzn7krZSD6h8Ttf4Q3w==(Object #=zRtTBZNQ=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zJ4DlEF1xfWjv4knxCw==()
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zI0T5f01BvpGejHLBVnmFrls=(Object #=zRtTBZNQ=, UInt32 #=zrb$csX4=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zZp0c6siqExNQpyUT83XZVc4N818nti5pPrLIMJ71vLni(Boolean #=zRtTBZNQ=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zZp0c6siqExNQpyUT83XZVc4N818nti5pPrLIMJ71vLni(Boolean #=zRtTBZNQ=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zKLMd7Ur$Y2KrT2TZ7uTBbusdf8IQ()
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zAJjZnHNAFUq_FVFlPpZyLPkwGpzm4gfoo6ehzks=(Object[] #=zRtTBZNQ=, Type[] #=zrb$csX4=, Type[] #=zAl_FPn8=, Object[] #=zZlirkcE=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zDmxaysj$oANjFhN5zDgj2AQ$scvnvVYryg==(Stream #=zRtTBZNQ=, String #=zrb$csX4=, Object[] #=zAl_FPn8=, Type[] #=zZlirkcE=, Type[] #=z0ajIK14=, Object[] #=zS2TweAE=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=zr8xbGFB3C1sl88g8OJxBIcA1lRqTZzrgyg==(Stream #=zRtTBZNQ=, String #=zrb$csX4=, Object[] #=zAl_FPn8=)
at #=qjPK8N0BenBUYGxJcFiD7icMfuVh16X_er4Or8iY$7H0=.#=ziiMXcF3Y0FFaxbWw_EiGccE=(Stream #=zRtTBZNQ=, String #=zrb$csX4=, Object[] #=zAl_FPn8=)
at Aspose.Pdf.Document.#=z34wha_tMYW5e(Stream #=z$$huN9A=, SaveOptions #=zn9OMIRURwZxW)
at Aspose.Pdf.Document.Save(Stream output)


#8

@shernandez068

Thank you for contacting support.

We have recorded your concerns and will let you know as soon as a significant update is available.


#9

I was able to resolve the error by copying all the Windows fonts into the Docker (Linux) container and adding the font folder path in the Aspose.PDF code:
Aspose.Pdf.Text.FontRepository.Sources.Add(new Aspose.Pdf.Text.FolderFontSource("/usr/share/fonts"));
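
For anyone applying the same workaround, here is a minimal sketch of how the font folder registration could be wrapped so that it runs once at startup and only adds folders that actually exist. The FontSetup class, its method names, and the "/var/task/fonts" path are assumptions for illustration; only the FontRepository.Sources.Add(new FolderFontSource(...)) call comes from the post above. On AWS Lambda the file system is read-only apart from /tmp, so the fonts would presumably need to ship inside the deployment package or a layer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Aspose.Pdf.Text;

public static class FontSetup
{
    // Candidate folders are assumptions; adjust them to wherever the fonts are copied.
    public static readonly string[] CandidateFolders =
    {
        "/usr/share/fonts",   // folder used in the post above
        "/var/task/fonts",    // hypothetical: fonts bundled with a Lambda deployment package
    };

    // Returns only the candidate folders that exist on this machine.
    public static IReadOnlyList<string> ExistingFolders(IEnumerable<string> candidates) =>
        candidates.Where(path => Directory.Exists(path)).ToList();

    // Registers every existing folder as an Aspose.PDF font source and
    // returns how many folders were registered.
    public static int RegisterFonts(IEnumerable<string> candidates)
    {
        var folders = ExistingFolders(candidates);
        foreach (var folder in folders)
        {
            FontRepository.Sources.Add(new FolderFontSource(folder));
        }
        return folders.Count;
    }
}
```

Calling FontSetup.RegisterFonts(FontSetup.CandidateFolders) once, before the first Document is loaded or saved, and logging a warning when it returns 0 should make a missing font folder easier to spot than the obfuscated stack trace.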


#10

@shernandez068

Thank you for your valuable feedback.

We are pleased to know that the issue has been resolved.

@javcavallo

Would you please try the solution shared above and let us know your findings?