I have a PDF with watermarks and text streams that are visible on screen, but suddenly disappear when printed out from Adobe. The PDF was created with Aspose.PDF for .NET 18.1.
The document is supposed to represent an authentic email, but rather than a direct export, it appears to have been reconstructed (possibly from Word or Aspose) to look like an email. Oddly, the most important paragraph of the email disappears.
In reviewing the code, it appears a white rectangle was added to cover the paragraph. There also appear to be three versions of the same paragraph, two of which are hidden.
For example:
BT
/TT1 1 Tf
10.5 0 0 10.5 56 634 Tm
[(K)0.5 (id)0.5 (s)0.5 (–)0.5] TJ
10.5 0 0 10.5 56 620 Tm
[(P)0.5 (le)0.5 (as)0.5 (e)0.5 ( )0.5 (c)0.5 (on)0.5 (si)0.5 (de)0.5 (r)0.5 ( )0.5 (op)0.5 (0)0.5
(on)0.5 ( 3.)] TJ
This text was covered by the white rectangle:
50 662.5 511.5 79.5 re
W
n
This hidden text is not visible. New text was then added after the white box, and that text is visible - until the PDF is printed out.
My question: is it typical for Aspose to create white rectangle backgrounds, layers, and OCGs that cause text streams to disappear when printed? Or does this (along with the hidden text) suggest the PDF was tampered with?
Also, is there anyone who knows how to review text objects one by one, apply the embedded font’s ToUnicode map, and provide a transcript of what’s hidden?
Thank you for your help!
@santanaj
Watermarks and text that disappear when a PDF is printed can have several causes, including how the PDF was created or later modified. The presence of a white rectangle covering text suggests that the PDF may have been manipulated, possibly to hide certain content.
Aspose.PDF allows for the creation of layers and the use of optional content groups (OCGs), which can lead to text being hidden or obscured. However, this behavior is not typical for standard PDF generation and may indicate that the document was altered after its initial creation.
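One point of reference when reading the content stream: in PDF syntax, a rectangle (`re`) followed by `W` and `n` sets a clipping path and paints nothing, whereas a filled white rectangle would typically be a white fill color (`1 1 1 rg`) followed by `re` and `f`. The sketch below, in plain C# with no Aspose calls, labels a path fragment accordingly and checks whether a text origin falls inside a rectangle. The names `PathUsage`, `DescribePath`, and `Contains` are hypothetical. It assumes one path per fragment, whitespace-separated operators, positive width and height, and that the rectangle and text share one coordinate space (no intervening `cm`).

```csharp
using System;

public static class PathUsage
{
    // Summarises how a single path in a content stream fragment is used.
    // Simplification: tokens are split on whitespace and the fragment is
    // assumed to hold one path; strings and inline images are not handled.
    public static string DescribePath(string fragment)
    {
        string[] tokens = fragment.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        bool clips = false;
        string paint = "unterminated";
        foreach (string token in tokens)
        {
            switch (token)
            {
                case "W": case "W*": clips = true; break;        // clipping path
                case "f": case "F": case "f*": paint = "fill"; break;
                case "B": case "B*": case "b": case "b*": paint = "fill and stroke"; break;
                case "S": case "s": paint = "stroke"; break;
                case "n": paint = "none"; break;                 // end path, no painting
            }
        }
        return (clips ? "clip" : "no clip") + ", paint: " + paint;
    }

    // True if point (px, py) lies inside the rectangle given by "x y w h re".
    // Assumption: both are in the same coordinate space (no intervening cm).
    public static bool Contains(double x, double y, double w, double h, double px, double py)
    {
        return px >= x && px <= x + w && py >= y && py <= y + h;
    }

    public static void Main()
    {
        // Rectangle quoted in the question.
        Console.WriteLine(DescribePath("50 662.5 511.5 79.5 re\nW\nn"));
        // What a filled white rectangle would typically look like.
        Console.WriteLine(DescribePath("1 1 1 rg\n50 662.5 511.5 79.5 re\nf"));
        // Text origin from "10.5 0 0 10.5 56 634 Tm" against that rectangle.
        Console.WriteLine(Contains(50, 662.5, 511.5, 79.5, 56, 634));
    }
}
```

For the rectangle quoted in the question this reports a clip with no painting, and the text origins at y = 634 and y = 620 fall below the rectangle's vertical range of 662.5 to 742. It may therefore be worth checking whether the text is being clipped away rather than covered. That reading depends on the surrounding graphics state (`q`/`Q`, `cm`), which the quoted snippet does not show.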
To review text objects and analyze the content, you can use Aspose.PDF’s capabilities to extract text. Here’s a basic example of how to extract text from a PDF:
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;
// Load the PDF document
Document pdfDocument = new Document("input.pdf");
// Create a TextAbsorber to extract text
TextAbsorber textAbsorber = new TextAbsorber();
// Visit every page so the absorber collects its text
pdfDocument.Pages.Accept(textAbsorber);
// Get the extracted text
string extractedText = textAbsorber.Text;
Console.WriteLine(extractedText);
This code extracts the text found in the page content, but it does not tell you which text is hidden or why. To determine that, you need to analyze the content streams directly, which is more complex.
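To give a sense of what direct content stream analysis involves, here is a minimal sketch in plain C# (no Aspose calls) that reassembles the text of a `TJ` array like the ones quoted in the question. The names `TjDecoder`, `Decode`, and `glyphMap` are hypothetical. `glyphMap` is only a stand-in for the embedded font's ToUnicode map, and the entry mapping `0` to `ti` is a guess made for illustration, not something read from the actual font. The sketch handles simple literal strings only; hex strings, octal escapes, and multi-byte character codes would need a proper PDF parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TjDecoder
{
    // Concatenates the literal strings inside a TJ array such as "[(K)0.5 (id)0.5] TJ".
    // Numbers between the strings are position adjustments and are skipped.
    // glyphMap is a hypothetical stand-in for the font's ToUnicode map.
    public static string Decode(string tjArray, IDictionary<char, string> glyphMap = null)
    {
        var result = new StringBuilder();
        int depth = 0; // nesting depth of parentheses; > 0 means inside a string
        for (int i = 0; i < tjArray.Length; i++)
        {
            char c = tjArray[i];
            if (depth > 0 && c == '\\' && i + 1 < tjArray.Length)
            {
                // Simplification: keep the escaped character as-is (covers \( \) \\ only).
                Append(result, tjArray[++i], glyphMap);
            }
            else if (c == '(')
            {
                if (depth > 0) Append(result, c, glyphMap);
                depth++;
            }
            else if (c == ')' && depth > 0)
            {
                depth--;
                if (depth > 0) Append(result, c, glyphMap);
            }
            else if (depth > 0)
            {
                Append(result, c, glyphMap);
            }
        }
        return result.ToString();
    }

    private static void Append(StringBuilder sb, char c, IDictionary<char, string> map)
    {
        if (map != null && map.TryGetValue(c, out string mapped)) sb.Append(mapped);
        else sb.Append(c);
    }

    public static void Main()
    {
        string line = "[(P)0.5 (le)0.5 (as)0.5 (e)0.5 ( )0.5 (c)0.5 (on)0.5 (si)0.5 "
                    + "(de)0.5 (r)0.5 ( )0.5 (op)0.5 (0)0.5 (on)0.5 ( 3.)] TJ";

        // Raw character codes, with no font mapping applied.
        Console.WriteLine(Decode(line));

        // Illustrative assumption only: code '0' stands for a "ti" ligature glyph.
        var guessedMap = new Dictionary<char, string> { { '0', "ti" } };
        Console.WriteLine(Decode(line, guessedMap));
    }
}
```

Without a mapping, the second line from the question decodes to `Please consider op0on 3.`, which shows why the font's real ToUnicode map matters for a reliable transcript: the character codes in the stream are not always the characters a reader sees.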
If you suspect tampering, it might be beneficial to compare the original document with the PDF to identify discrepancies.
For further assistance, consider consulting the Aspose documentation for more specific guidance on handling text extraction and analyzing PDF content.
Thank you. Where would I find software engineers experienced in Aspose that could compare and analyze the content streams, and make some determination? It’s beyond my ability. Can you recommend anyone, or should I just ask experts to DM me if they’re willing to help?
@santanaj
Are you also using Aspose.PDF for .NET to print the PDF? If possible, please share your sample PDF documents and the code snippet you are using to add the watermark and print them. We will test the scenario in our environment and address it accordingly.
PS: Please test the case with version 25.8 before sharing the requested information.