Hello,
We are using Aspose.PDF for .NET in an ASP.NET Core project, and we cannot extract special characters (diacritics) from a Croatian PDF.
This is the code we are using:
using Aspose.Pdf;
using Aspose.Pdf.Text;
using System.IO;

namespace AsposeTest
{
    class Program
    {
        static void Main(string[] args)
        {
            AsposeConfiguration.ConfigureLicense();
            string dataDir = @".\TestCases";
            foreach (string file in Directory.EnumerateFiles(dataDir, "*.pdf"))
            {
                Document pdfDocument = new Document(file);
                // create a TextAbsorber object to extract text
                TextAbsorber textAbsorber = new TextAbsorber();
                // accept the absorber for all the pages
                pdfDocument.Pages.Accept(textAbsorber);
                // get the extracted text
                string extractedText = textAbsorber.Text;
                // build the output path: <dataDir>\PDFtoText\<name>.txt
                string fileName = Path.ChangeExtension(file, "txt");
                string outputPath = Path.Combine(dataDir, "PDFtoText", Path.GetFileName(fileName));
                // write the extracted text; the using block closes the stream
                using (TextWriter tw = new StreamWriter(outputPath))
                {
                    tw.WriteLine(extractedText);
                }
            }
        }
    }
}
I am attaching the PDF sample:
HR-Edited-7bf68347-7210-4305-9b5f-7992597ec3ce.pdf (65.8 KB)
The text extracted by the Aspose TextAbsorber is as follows:
HR-Edited-7bf68347-7210-4305-9b5f-7992597ec3ce.zip (258 Bytes)
Thank you.