TextAbsorber changing Font/Font size of entire document

I used Tesseract to create a searchable PDF.
out.pdf (49.7 KB)

I then ran TextAbsorber over the document and saved it again.

Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(path);
pdfDocument.Pages.Accept(new TextAbsorber());
pdfDocument.Save(path);

out2.pdf (50.1 KB)

Before running the code above, I could select all of the text. See picture: out.png (308.5 KB)
However, after using TextAbsorber, I can’t select the text in the same way. See picture: out2.png (308.5 KB)

I think TextAbsorber is changing the font or font size (I’m not sure). What can I do to prevent this?

@traderaboy

Thanks for contacting support.

We tested the scenario in our environment and were able to reproduce the issue with Aspose.PDF for .NET 18.11. We have therefore logged it as PDFNET-45739 in our issue tracking system. We will look into the details and keep you posted on the status of the fix. Please be patient and allow us a little time.

We are sorry for the inconvenience.

Thank you!
Can you estimate how long a fix will take?
We discovered this hidden bug in the final phase of our Aspose.PDF evaluation, so it’s important for us to know how long a fix will take.

@traderaboy

Thanks for your inquiry.

Since the issue has been logged under our free support model, it has low priority and will be resolved on a first come, first served basis. However, we have recorded your concerns and will consider them during the investigation. We will let you know as soon as we have a definite update on the resolution ETA. Please allow us a little time.

We are sorry for this inconvenience.

Hi!
Any progress on this issue?
I tested Aspose.PDF for Java and it worked perfectly there. Unfortunately, our system is written in C#.

@traderaboy

Thanks for your inquiry.

Investigation of the logged issue is planned for next week. As soon as it is completed, we will be in a position to share a definite update on the resolution ETA. Please allow us a little time.

Hi!
I am under a lot of pressure from the company I work at.
They want to continue with the project. Has the issue been resolved?
I know you do not prioritize this because we do not have a license. We have been working with your API for months. The company will buy the license as soon as this has been resolved.

@traderaboy

Thanks for writing to us.

We have investigated the issue and found that Tesseract uses ‘GlyphLessFont’, whose font specification in the document has no dictionary of glyph widths. At the same time, it uses several ‘tricks’ with text properties to align the background (invisible) text despite the incorrect glyph width information. Together, these two circumstances confuse our font and text processing procedures. As a result, any operation on the text causes it to shrink in this type of document.
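As a rough illustration of the diagnosis above, a file can be checked for the ‘GlyphLessFont’ name before any text operation is run on it. The sketch below is not part of the Aspose.PDF API: the class and method names are hypothetical, and it assumes the font name is stored uncompressed in the file, which may not hold for every PDF.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not an Aspose.PDF class.
public static class GlyphLessFontCheck
{
    // Font name mentioned in the explanation above.
    private const string Marker = "GlyphLessFont";

    // Returns true if the raw bytes contain the ASCII marker.
    // Assumption: the font dictionary is not inside a compressed stream.
    public static bool ContainsMarker(byte[] pdfBytes)
    {
        byte[] needle = Encoding.ASCII.GetBytes(Marker);
        for (int i = 0; i + needle.Length <= pdfBytes.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && pdfBytes[i + j] == needle[j])
            {
                j++;
            }
            if (j == needle.Length)
            {
                return true;
            }
        }
        return false;
    }

    // Convenience wrapper that reads the whole file into memory first.
    public static bool FileContainsMarker(string path)
    {
        return ContainsMarker(File.ReadAllBytes(path));
    }
}
```

A true result only suggests that the document carries this kind of invisible text layer. A false result does not rule it out if the font dictionary sits in a compressed object stream.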

We plan to fix this issue before the release of Aspose.PDF for .NET 19.2, but the resolution may take longer depending on the number of high priority issues in the queue. We have recorded your concerns and will consider them during ticket resolution. We will let you know when we have further updates. Please allow us a little time.

We are sorry for the delay and inconvenience.

Thank you for your explanation.
I tested this with the Java API and it worked great. Shouldn’t both APIs have similar logic? The fix should be simple if you already have a working Java API.

@traderaboy

We are in the process of investigating the reasons behind the issue and will look into it from every perspective. We will update you as soon as we have additional details.

Any update???

@traderaboy

We regret to inform you that the fix for your issue has been postponed due to other high priority issues. However, we assure you that we have recorded your concerns and escalated the issue to the next level. We will let you know as soon as we have a significant update on ticket resolution. We greatly appreciate your patience in this matter. Please allow us a little time.

We are sorry for the inconvenience.

Sorry to jump into the middle of this, but we are facing the same issue, i.e. Tesseract is generating the ‘GlyphLessFont’. Did you get a fix for it?

Hi !

After many months in which Aspose support wasted my and my company’s time with their lies, we tried another API.
The API is called Vintasoft. They have amazing support.
Here is their link: https://www.vintasoft.com/

Thanks for the suggestion, @traderaboy, but we have already committed to Aspose and are using a licensed version. @asad.ali, @aspose, can you please provide a solution for this?

@traderaboy, @mdalam

We apologize for the inconvenience and delay.

Please note that we resolve every logged issue. However, the resolution time depends on several factors, e.g. the nature and complexity of the issue and how many issues were logged before it. This particular issue is quite complex and needs more time to fix.

Later investigation revealed a serious problem with processing an embedded font that has no widths dictionary in its font specification. We have updated our font functions, but the changes caused several regressions in previously resolved issues. We will do our best to resolve it in April (for Aspose.PDF for .NET 19.5), which may be considered an ETA.

As soon as the issue is fixed, we will update you in this forum thread. We apologize again for the delay and inconvenience.