Text extraction issue - missing space character

Using Aspose.Pdf for .NET 21.3, we extract text from a certain PDF (please refer to the attached PDF) using the following code:

        // Open the input PDF
        PdfExtractor pdfExtractor = new PdfExtractor();
        pdfExtractor.BindPdf(inputFilePath);

        // Use the parameterless ExtractText method
        pdfExtractor.ExtractText();

        // Write the extracted text to the output file
        pdfExtractor.GetText(outputFilePath);

The email address in the top right of the first page is extracted by Aspose.Pdf as pamela.merchant@doj.ca.gov without any spaces.

Other PDF libraries extract the email address as pamela.merchant @doj.ca.gov with a space before the @ character. This space is visible in the PDF.

Could you please investigate whether it is possible to make the behaviour of Aspose.Pdf consistent with other libraries in situations like this?

2403.pdf (750.3 KB)

@edtsoftware

We were able to reproduce the issue in our environment: the API extracted the text “pamela.merchant@doj.ca.gov” without any space. We have therefore logged it as PDFNET-49663 in our issue tracking system. We will look into the details and keep you posted on the status of its resolution. Please be patient and allow us some time.

We are sorry for the inconvenience.