Extract Text from PDF in ASP.NET using Aspose.PDF - cannot read special characters

Hello,

We are using Aspose.PDF in an ASP.NET Core application and cannot extract special characters from a Croatian PDF.

Code we are using:

using Aspose.Pdf;
using Aspose.Pdf.Text;
using System.IO;

namespace AsposeTest
{
    class Program
    {
        static void Main(string[] args)
        {
            AsposeConfiguration.ConfigureLicense();
            string dataDir = @".\TestCases";
            foreach (string file in Directory.EnumerateFiles(dataDir, "*.pdf"))
            {
                Document pdfDocument = new Document(file);
                // Create a TextAbsorber object to extract text
                TextAbsorber textAbsorber = new TextAbsorber();
                // Accept the absorber for all the pages
                pdfDocument.Pages.Accept(textAbsorber);
                // Get the extracted text
                string extractedText = textAbsorber.Text;
                // Create a writer and open the output file
                string fileName = Path.ChangeExtension(file, "txt");
                TextWriter tw = new StreamWriter(dataDir + @"\PDFtoText\" + Path.GetFileName(fileName));
                // Write the extracted text to the file
                tw.WriteLine(extractedText);
                // Close the stream
                tw.Close();
            }
        }
    }
}

HR-Edited-7bf68347-7210-4305-9b5f-7992597ec3ce.pdf (65.8 KB)

I am attaching the PDF sample.

The text is exported from the Aspose TextAbsorber as follows:

HR-Edited-7bf68347-7210-4305-9b5f-7992597ec3ce.zip (258 Bytes)

Thank you

@Panagiotis_Biris_gr_ey_com

I have reviewed your comments. For further investigation, can you please share which version of .NET Core you are using?

Hi Adnan, thank you for the prompt response. We are using .NET Core 2.2.

@Panagiotis_Biris_gr_ey_com,

Thanks for sharing further details.

We have logged an investigation ticket as PDFNET-47922 in our issue tracking system. We will look into the details and keep you posted on the status of its resolution. Please be patient and give us some time.

We are sorry for the inconvenience.

Hi Adnan,

Do you have any update on this?

Thank you

@Panagiotis_Biris_gr_ey_com

We are afraid that the earlier logged ticket is still pending analysis. Please note that it was logged under the free support model and will be resolved on a first come, first served basis. However, we will inform you as soon as we have a definite update regarding its resolution. Please give us some time.

We are sorry for the inconvenience.