Extract Text from PDF in ASP.NET using Aspose.PDF - cannot read special characters

Panagiotis_Biris_gr_ey_com · March 31, 2020, 1:30pm

Hello,

We are using Aspose.PDF for ASP.NET Core and cannot extract special characters from a Croatian-language PDF.

Code we are using:

using Aspose.Pdf;
using Aspose.Pdf.Text;
using System.IO;

namespace AsposeTest
{
    class Program
    {
        static void Main(string[] args)
        {
            AsposeConfiguration.ConfigureLicense();
            string dataDir = @".\TestCases";
            foreach (string file in Directory.EnumerateFiles(dataDir, "*.pdf"))
            {
                Document pdfDocument = new Document(file);
                // Create a TextAbsorber object to extract text
                TextAbsorber textAbsorber = new TextAbsorber();
                // Accept the absorber for all the pages
                pdfDocument.Pages.Accept(textAbsorber);
                // Get the extracted text
                string extractedText = textAbsorber.Text;
                // Build the output path: same file name with a .txt extension, in the PDFtoText subfolder
                string fileName = Path.ChangeExtension(file, "txt");
                string outputPath = Path.Combine(dataDir, "PDFtoText", Path.GetFileName(fileName));
                // Write the extracted text; the using block closes the stream
                using (TextWriter tw = new StreamWriter(outputPath))
                {
                    tw.WriteLine(extractedText);
                }
            }
        }
    }
}

HR-Edited-7bf68347-7210-4305-9b5f-7992597ec3ce.pdf (65.8 KB)

I am attaching the PDF sample.

The text extracted by the Aspose TextAbsorber is as follows:

HR-Edited-7bf68347-7210-4305-9b5f-7992597ec3ce.zip (258 Bytes)

Thank you

Adnan.Ahmad · April 1, 2020, 4:21am

@Panagiotis_Biris_gr_ey_com

I have observed your comments. For further investigation, can you please share which version of .NET Core you are using?

Panagiotis_Biris_gr_ey_com · April 1, 2020, 11:07am

Hi Adnan, thank you for the prompt response. We are using .NET Core 2.2.

Adnan.Ahmad · April 1, 2020, 10:05pm

@Panagiotis_Biris_gr_ey_com,

Thanks for sharing further details.

We have logged an investigation ticket as PDFNET-47922 in our issue tracking system. We will look into the details and keep you posted on the status of its resolution. Please be patient and spare us some time.

We are sorry for the inconvenience.

Panagiotis_Biris_gr_ey_com · June 26, 2020, 12:56pm

Hi Adnan,

Do you have any update on this?

Thank you

asad.ali · June 26, 2020, 10:34pm

@Panagiotis_Biris_gr_ey_com

We are afraid that the earlier logged ticket is still pending analysis. Please note that it was logged under the free support model and will be resolved on a first come, first served basis. However, we will surely inform you as soon as we have some definite updates regarding its resolution. Please spare us some time.

We are sorry for the inconvenience.