Extract PDF layout as image without text

danrub · June 1, 2017, 2:49am

Hi,

How can I generate an image of a PDF page without its text in a performant way?

For now I’m using a very slow approach:

I find all text via TextFragmentAbsorber, set every fragment to an empty string, and then send the PDF to PngDevice. Afterwards, I roll back all the changes.

* I tried setting the foreground color to transparent, but it didn’t work in all cases.
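
For illustration, the blank, render, and restore workflow described above can be sketched roughly as follows. This is a minimal Python sketch and not Aspose.Pdf code: `Fragment`, `text_blanked`, and `render_without_text` are hypothetical stand-ins for the fragments returned by TextFragmentAbsorber and for the PngDevice call, and only the control flow is meant to carry over.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Fragment:
    """Hypothetical stand-in for one text fragment found on a page."""
    text: str


@contextmanager
def text_blanked(fragments: List[Fragment]) -> Iterator[None]:
    """Temporarily replace every fragment's text with an empty string."""
    originals = [f.text for f in fragments]
    try:
        for f in fragments:
            f.text = ""  # assumed to be the slow step in the real library
        yield
    finally:
        # Roll back all changes, even if rendering raised an exception.
        for f, original in zip(fragments, originals):
            f.text = original


def render_without_text(fragments: List[Fragment],
                        render: Callable[[], object]) -> object:
    """Call the supplied render function while the text is blanked."""
    with text_blanked(fragments):
        return render()
```

Putting the rollback in a `finally` block keeps the document consistent if the render step fails, but it does not make the per-fragment text assignment any faster.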

There are two issues with that approach:

1) Each text assignment takes too long; it seems to be a very intensive operation.
2) All Aspose absorbers crash in a multi-threaded environment. Even if two unrelated PdfDocuments are running TextFragmentAbsorber simultaneously, I sometimes get very unexpected crashes at different stages of the Aspose.Pdf code; it is not consistent.
This forced me to add a locking mechanism that prevents two absorbers from running concurrently in the same process (very slow).
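
The locking workaround mentioned in point 2 amounts to a single process-wide lock around every absorber call. The following is a minimal Python sketch of that idea under stated assumptions: `_ABSORBER_LOCK` and `run_serialized` are hypothetical names, and the `work` callable stands in for whatever code runs the absorber. It is not part of the Aspose.Pdf API.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

# One lock shared by the whole process (hypothetical workaround,
# not a library feature).
_ABSORBER_LOCK = threading.Lock()


def run_serialized(work: Callable[[], T]) -> T:
    """Run `work` while holding the process-wide lock.

    Only one caller at a time gets past this point, so absorber runs on
    unrelated documents can never overlap, at the cost of parallelism.
    """
    with _ABSORBER_LOCK:
        return work()
```

Every thread would wrap its absorber call as `run_serialized(lambda: ...)`. This serializes all text extraction in the process, which is why the workaround is so slow.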

Please advise,

Thanks

asad.ali · June 1, 2017, 10:51am

Hi Daniel,

Thanks for your inquiry.

danrub:

1) Each text assignment takes too long; it seems to be a very intensive operation.

We have already observed this issue and have logged it as ticket PDFNET-37864 in our issue tracking system so that it can be corrected. I have also associated the ticket ID with this thread, so you will be notified once the issue is resolved. Please be patient and spare us a little time.

danrub:

2) All Aspose absorbers crash in a multi-threaded environment. Even if two unrelated PdfDocuments are running TextFragmentAbsorber simultaneously, I sometimes get very unexpected crashes at different stages of the Aspose.Pdf code; it is not consistent.
This forced me to add a locking mechanism that prevents two absorbers from running concurrently in the same process (very slow).

We would really appreciate it if you could share a sample PDF document along with a working code snippet, so that we can test the scenario in our environment and address it accordingly.

We are sorry for the inconvenience.

Best Regards,