Using Aspose.Pdf for .NET 21.3, we extract text from a certain PDF (please refer to the attached PDF) using the following code:
using Aspose.Pdf.Facades;

// Open the input PDF
PdfExtractor pdfExtractor = new PdfExtractor();
pdfExtractor.BindPdf(inputFilePath);
// Extract text with the parameterless ExtractText() overload
pdfExtractor.ExtractText();
// Save the extracted text to the output file
pdfExtractor.GetText(outputFilePath);
Aspose.Pdf extracts the email address in the top right of the first page as pamela.merchant@doj.ca.gov, without any spaces.
Other PDF libraries extract it as pamela.merchant @doj.ca.gov, with a space before the @ character. This space is visible in the PDF.
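For comparison, the same file can also be read through the Document/TextAbsorber API with an explicit text formatting mode. This is a minimal, untested sketch. It assumes Aspose.Pdf for .NET with TextAbsorber and TextExtractionOptions available, and a project that allows top-level statements (C# 9 or later). The output file name 2403_pure.txt is a hypothetical choice. Whether Pure mode keeps the space before the @ in this particular file has not been checked here.

```csharp
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Input is the attached file; the output name is hypothetical.
string inputFilePath = "2403.pdf";
string outputFilePath = "2403_pure.txt";

// Load the PDF with the DOM API instead of the PdfExtractor facade.
using (Document pdfDocument = new Document(inputFilePath))
{
    // Request the Pure formatting mode explicitly; Raw is the other mode
    // commonly compared when whitespace differs between extractors.
    TextAbsorber textAbsorber = new TextAbsorber(
        new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure));

    // Visit every page and accumulate its text in the absorber.
    pdfDocument.Pages.Accept(textAbsorber);

    // Write the combined text so it can be diffed against the PdfExtractor output.
    File.WriteAllText(outputFilePath, textAbsorber.Text);
}
```

Diffing 2403_pure.txt against the PdfExtractor output shows whether the missing space depends on the extraction route or mode.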
Could you please investigate whether it is possible to make the behaviour of Aspose.Pdf consistent with other libraries in situations like this?
2403.pdf (750.3 KB)