Unable to replace text in other languages (Hindi, French, Spanish) with another text using Aspose

Using the code below, I tried to replace a Hindi word with another word, but there was no change in the PDF. How can I replace a word in another language with a specific word using Aspose?
I have attached the sample PDF that I used with this code.
HindiPDf.pdf (43.1 KB)

    using System.IO;
    using Aspose.Pdf;
    using Aspose.Pdf.Text;

    var inputfile = @"D:\HindiPDf.pdf";
    // Open document
    Document doc = new Document(inputfile);

    // Create TextFragmentAbsorber object to find all occurrences of the search text
    string rectString = "साहित्य";
    TextFragmentAbsorber absorber = new TextFragmentAbsorber(rectString);

    doc.Pages[1].Accept(absorber);
    // Change the text of the first occurrence
    absorber.TextFragments[1].Text = "redact";

    // Save document next to the input file with a "Redacted" suffix
    doc.Save(Path.Combine(Path.GetDirectoryName(inputfile), Path.GetFileNameWithoutExtension(inputfile) + "Redacted" + Path.GetExtension(inputfile)));

@cyginfo

It seems that the font used in your PDF document is not a Unicode font. For example, if you open the document in Adobe Reader, copy any text from it, and paste it into the search box, you will see a garbage value, e.g. साहित्य turns into lkfgR;. We have logged an investigation ticket as PDFNET-46638 in our issue tracking system to investigate the scenario further.

We will look into the details of the issue and keep you posted on the status of the ticket resolution. Please be patient and spare us a little time. Meanwhile, please try to process a PDF document where the Hindi text is written using a Unicode font, e.g. Mangal.
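To illustrate the difference described above, here is a minimal sketch that checks whether a string contains characters from the Unicode Devanagari block (U+0900 to U+097F). It assumes the text has already been obtained as a string, for example by copying it from a viewer; the `ScriptCheck` class and `ContainsDevanagari` method are hypothetical names used only for this illustration and are not part of any Aspose API.

```csharp
using System;
using System.Linq;

public static class ScriptCheck
{
    // Hypothetical helper: returns true if any character of the input
    // lies in the Unicode Devanagari block (U+0900 to U+097F).
    public static bool ContainsDevanagari(string text)
    {
        return text.Any(c => c >= '\u0900' && c <= '\u097F');
    }
}
```

With this helper, `ScriptCheck.ContainsDevanagari("साहित्य")` is true, while `ScriptCheck.ContainsDevanagari("lkfgR;")` is false. If text copied from a PDF that displays Hindi yields only ASCII characters like the second string, the document likely stores the text in a non-Unicode font encoding, so a search for the Unicode word would find nothing.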

Meanwhile, please try to process a PDF document where the Hindi text is written using a Unicode font, e.g. Mangal.

  • With this line, do you mean I should search and replace using lkfgR;? I have tried that and it works. But how do I map साहित्य to lkfgR;?
    How can I find out which encoding a PDF uses and its Unicode equivalent using Aspose? And how do I replace such text?
    Thanks for the response.

@cyginfo

No, we did not suggest that you search and replace using that value. It was only a garbage string, shared to show that Adobe Reader does not recognize the Hindi text because the document does not use a Unicode font, so it substitutes garbage text for the original.

Hindi text is usually written with a Unicode font such as Mangal, and Adobe Reader recognizes the characters correctly when a Unicode font is used. You may have other PDF documents as well; you can check whether they use the Mangal font by looking at the document properties in Adobe Reader.

Alternatively, please share more information, such as the actual source of your PDF document. Are you converting it from some other file format? If so, please share that file with us. It would help us investigate the scenario accordingly.

I downloaded the file directly from Google. However, I still cannot figure out whether Aspose can search for text in other languages, or how the search would work even when a Unicode-encoded file is used.

@cyginfo

It seems you have posted a similar inquiry in another thread, where we have shared our response. Please note that some issues are specific to a particular document, depending on its structure and complexity.

We have already logged a ticket for the document that you shared in this post. If you are having issues with other documents as well, please share them with us. We will test the scenario in our environment and address it accordingly.