I am trying to use Aspose.PDF to remove all invisible text layers from a document. These invisible text layers are the result of a previous OCR operation. I need to remove them before reprocessing the document through OCR - otherwise, I will get a “Double OCR” problem, where all searchable text appears twice.
I previously tried using the TextFragmentAbsorber to find invisible text fragments and then setting each text fragment to an empty string. Unfortunately, this was incredibly slow - sometimes over 5 minutes per page, which is triggering my process timeout.
Therefore, I am now trying to loop through the page.Contents collection, searching for BT (Begin Text) and ET (End Text) operators and capturing all operators in between. If the captured block contains a 3 Tr operator (text rendering mode 3, invisible), I call page.Contents.Delete() for that block.
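To make the scan above concrete, here is a minimal, library-neutral sketch in Python. It does not use Aspose.PDF at all: operators are modelled as plain (name, operands) tuples, and the function names `invisible_text_ranges` and `strip_invisible_text` are hypothetical, chosen only for illustration. It only checks for a 3 Tr inside each BT/ET pair, as described above, and deletes matching blocks from the last to the first so that earlier indices stay valid.

```python
from typing import List, Tuple

# An operator is modelled as (name, operands), e.g. ("Tr", (3,)) for "3 Tr".
# This is an illustrative model, not the Aspose.PDF operator classes.
Operator = Tuple[str, tuple]


def invisible_text_ranges(ops: List[Operator]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of BT..ET blocks that contain 3 Tr."""
    ranges = []
    start = None       # index of the current BT, or None when outside a text object
    invisible = False  # whether the current block contains "3 Tr"
    for i, (name, operands) in enumerate(ops):
        if name == "BT":
            start, invisible = i, False
        elif name == "Tr" and start is not None and operands == (3,):
            invisible = True
        elif name == "ET" and start is not None:
            if invisible:
                ranges.append((start, i))
            start = None
    return ranges


def strip_invisible_text(ops: List[Operator]) -> List[Operator]:
    """Return a copy of ops with the invisible BT..ET blocks removed."""
    result = list(ops)
    # Delete from the last range to the first so earlier indices stay valid.
    for start, end in reversed(invisible_text_ranges(ops)):
        del result[start:end + 1]
    return result


if __name__ == "__main__":
    page_ops = [
        ("q", ()),
        ("BT", ()), ("Tf", ("F1", 12)), ("Tr", (3,)), ("Tj", ("hidden",)), ("ET", ()),
        ("BT", ()), ("Tr", (0,)), ("Tj", ("shown",)), ("ET", ()),
        ("Q", ()),
    ]
    for op in strip_invisible_text(page_ops):
        print(op)
```

Collecting the ranges first and deleting afterwards avoids modifying the collection while it is being enumerated. Whether the equivalent pattern is what page.Contents expects is an assumption on my part.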
Unfortunately, when I call document.Save() at the end of this loop, I receive a NullReferenceException. The stack trace is not much help because it’s obfuscated. Nonetheless, here it is:
at #=zAeY$NuMBAmJjsmkfApRKezBh_F5l.#=zmk7mtyM=(#=zFmA20cmyVm5QZ$0jFMrqaP3Ohjh1 #=zokgkXC$dh0fR)
at #=zUVbXLpeH7qSc7llNlK7ncoQ=.#=zjDiJry0=(#=zFmA20cmyVm5QZ$0jFMrqaP3Ohjh1 #=zokgkXC$dh0fR)
at #=ziUwdsE_RoM4kdjkRFk_Rzj0=.#=zjDiJry0=(#=zFmA20cmyVm5QZ$0jFMrqaP3Ohjh1 #=zokgkXC$dh0fR)
at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
at System.Reflection.MethodBaseInvoker.InvokeDirectByRefWithFewArgs(Object obj, Span`1 copyOfArgs, BindingFlags invokeAttr)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=zxvr9L7oLdT_YTPOMeNkBlgpqFRN1ea9CjVFOi_LWbv5b(Object #=zTPv_1Iw=)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=zAGHy1h3L4dZimT3OgWmf$8An0AKE(Object #=zTPv_1Iw=)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=z0$FZgFgQgCzvKRYGKt8qNOQ=(MethodBase #=zTPv_1Iw=, Boolean #=zAzPicws=)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=z9HP2gw4N_YbowQMB0f3Kg5Y=(#=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U= #=zTPv_1Iw=, #=qpERxZKoT7cBo5CypqeVCEPDzmOQ5qjrD6ZryeJocp0I= #=zAzPicws=)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=zqbECXdJ_twxtStaTJrkqBjs=(Boolean #=zTPv_1Iw=)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=zxvr9L7oLdT_YTPOMeNkBlgpqFRN1ea9CjVFOi_LWbv5b(Object #=zTPv_1Iw=)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=zAGHy1h3L4dZimT3OgWmf$8An0AKE(Object #=zTPv_1Iw=)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=zRJFnc6X$WY2agkoIHOO9GJ5cFrOH00wqCw==()
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=z68Lp8NA4xKwE3uDHLNIkPlRN6HpOeehKYg==(#=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U= #=zTPv_1Iw=, #=qpERxZKoT7cBo5CypqeVCEPDzmOQ5qjrD6ZryeJocp0I= #=zAzPicws=)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=zqbECXdJ_twxtStaTJrkqBjs=(Boolean #=zTPv_1Iw=)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=zxvr9L7oLdT_YTPOMeNkBlgpqFRN1ea9CjVFOi_LWbv5b(Object #=zTPv_1Iw=)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=zAGHy1h3L4dZimT3OgWmf$8An0AKE(Object #=zTPv_1Iw=)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=zRJFnc6X$WY2agkoIHOO9GJ5cFrOH00wqCw==()
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=zKVFlxpW_AznOJx4FIuZsUjlykL6jArealxNr2H$IpDHE(Object #=zTPv_1Iw=, UInt32 #=zAzPicws=)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=zqbECXdJ_twxtStaTJrkqBjs=(Boolean #=zTPv_1Iw=)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=zqbECXdJ_twxtStaTJrkqBjs=(Boolean #=zTPv_1Iw=)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=zyYmXx0gb$vFPKAEmHxnFbpOuUq2cxWg4Qw==(Object[] #=zTPv_1Iw=, Type[] #=zAzPicws=, Type[] #=zW264f5Q=, Object[] #=z9pl$FKo=)
at #=qmh6bZuxa7yeSwR5TaJdU3Ygunr$5vEJqcbqnyBH9v5U=.#=zTWbaTmvPyC8zNf0Z3a9tY1ZuVKLLfhCgsUkW0y0OnvNe(Stream #=zTPv_1Iw=, String #=zAzPicws=, Object[] #=zW264f5Q=)
at Aspose.Pdf.Document.#=zgv0R6rzJJFzN(Stream #=z4pqNpB4=, SaveOptions #=zDrH$9qVYCr8I)
at Aspose.Pdf.Document.#=zGCQNpK7iFIXB(String #=z8FuxuEK8Mixe)
at Aspose.Pdf.Document.Save(String outputFileName)