Get watermark text from PDF

profiler · January 20, 2025, 9:41am

Hi,

I’m trying to get the watermark text from a PDF. The document is watermarked with the following code:

var document = new Aspose.Pdf.Document(filename);

var wmFontSize = ws.FontSize;
var wmText = ws.Text;
var lines = wmText.Split(new string[] { "\r\n", "\r", "\\r", "\n", "\\n" }, StringSplitOptions.RemoveEmptyEntries);

var text = new FormattedText();
foreach (var l in lines)
{
    text.AddNewLineText(l);
}

var stamp = new TextStamp(text)
{
    Opacity = ws.Transparency < 0.01 ? 1 : 1 - ws.Transparency,
    TopMargin = 10,
    LeftMargin = 10,
    RightMargin = 10,
    BottomMargin = 10
};

var textAlignment = GetHorizontalAlignment(ws.TextAlignment);
stamp.TextAlignment = textAlignment;
if (textAlignment == HorizontalAlignment.Justify)
{
    stamp.HorizontalAlignment = HorizontalAlignment.Center;
    stamp.VerticalAlignment = VerticalAlignment.Center;
}
else
{
    stamp.HorizontalAlignment = textAlignment;
    stamp.VerticalAlignment = GetVerticalAlignment(ws);
}

if (ws.RotationAngle == RotationAngleConsts.HEADER)
    stamp.VerticalAlignment = VerticalAlignment.Top;
else if (ws.RotationAngle == RotationAngleConsts.FOOTER)
    stamp.VerticalAlignment = VerticalAlignment.Bottom;
else if (ws.RotationAngle == RotationAngleConsts.VERTICAL)
    stamp.RotateAngle = 90;
else if (ws.RotationAngle == RotationAngleConsts.DIAGONAL)
    stamp.RotateAngle = 45;


stamp.TextState.Font = FontRepository.FindFont(ws.FontType);
stamp.TextState.FontSize = wmFontSize;
stamp.TextState.FontStyle = ws.WMFontStyles;
stamp.TextState.Underline = ws.Underline;
stamp.TextState.StrikeOut = ws.Strikeout;
stamp.TextState.ForegroundColor = ws.GetColor();

foreach (var page in document.Pages)
{
    page.AddStamp(stamp);
}

var ownerPassword = Guid.NewGuid().ToString();
var userPassword = ws.PasswordForOpening ?? "";

var documentPrivilege = ws.DocumentPrivilege;
var allPrivilegesAllowed = ws.AllPrivilegesAllowed(documentPrivilege);
if (!allPrivilegesAllowed || !string.IsNullOrWhiteSpace(userPassword))
{
    documentPrivilege.AllowAssembly = false;
    document.Encrypt(userPassword, ownerPassword, documentPrivilege, CryptoAlgorithm.RC4x128, false);
}

document.Metadata.RegisterNamespaceUri("xmp", "http://ns.adobe.com/xap/1.0/");
document.Metadata["xmp:IsWatermarkedByDynamo"] = "true";

document.Save(resultFile);

Then I’m trying to get the artifact info with the following code:

var pdfDocument = new Aspose.Pdf.Document(file, password);

var pageInfo = pdfDocument.Pages[1].PageInfo;

foreach (var artifact in pdfDocument.Pages[1].Artifacts)
{
    var textState = artifact.TextState;
    Assert.AreEqual(ws.FontType, textState.Font.FontName);
    Assert.AreEqual(ws.FontSize, textState.FontSize);
    Assert.AreEqual(ws.WMFontStyles, textState.FontStyle);
    Assert.AreEqual(ws.GetColor().ToString(), textState.ForegroundColor.ToString());
    Assert.AreEqual(ws.Transparency < 0.01 ? 1 : 1 - ws.Transparency, artifact.Opacity);
    
    Assert.AreEqual(ws.Text, artifact.Text); // fails: artifact.Text is empty
}

Every assertion passes except the last one: artifact.Text is empty.

Professionalize.Discourse · January 20, 2025, 9:41am

@profiler

To retrieve watermark text from a PDF document using Aspose.PDF, you can iterate over the page’s artifacts as in the example below. It appears that you are adding the watermark correctly, but artifact.Text comes back empty when you read it.

Here’s a basic example of how to extract watermark text from a PDF document:

using System;
using Aspose.Pdf;

namespace Aspose.Pdf.Examples
{
    public class GetWatermark
    {
        public static void Run()
        {
            // Load the PDF document
            Document pdfDocument = new Document("path/to/your/watermarked.pdf");

            // Iterate through the artifacts on the first page
            foreach (Artifact artifact in pdfDocument.Pages[1].Artifacts)
            {
                // Output the subtype and text of the artifact
                Console.WriteLine($"Subtype: {artifact.Subtype}, Text: {artifact.Text}, Rectangle: {artifact.Rectangle}");
            }
        }
    }
}

Important Notes:

Artifact Text: The artifact.Text property may be empty if the watermark was added as a visual element rather than as text content. Ensure that the watermark is being added correctly as a text stamp.
Check Subtype: You can check the artifact.Subtype to confirm that it is indeed a watermark. If it is not recognized as a watermark, it may not have text associated with it.
Debugging: If artifact.Text remains empty, consider checking how the watermark is being created. Ensure that the text is being set correctly when creating the TextStamp.
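
One more thing to check once artifact.Text is populated: the stamping code splits ws.Text into separate lines before building the FormattedText, so a direct comparison against the original ws.Text may still fail if the text contains line breaks. The sketch below uses the same separators as the stamping code and joins the lines with a single space before comparing. The helper name WatermarkTextHelper is hypothetical, and how Aspose.PDF joins multiple lines in artifact.Text is an assumption here, not something confirmed by the documentation.

```csharp
using System;

// Hypothetical helper (not part of Aspose.PDF): normalizes watermark text
// so that differences in line-break style do not break a comparison.
public static class WatermarkTextHelper
{
    // Same separators as the stamping code: real line breaks plus the
    // literal two-character sequences "\r" and "\n".
    private static readonly string[] Separators = { "\r\n", "\r", "\\r", "\n", "\\n" };

    // Splits the text into non-empty lines, as the stamping code does.
    public static string[] SplitLines(string text) =>
        (text ?? string.Empty).Split(Separators, StringSplitOptions.RemoveEmptyEntries);

    // Joins the lines with a single space. Assumption: apply this to both
    // the expected and the extracted text before comparing them.
    public static string Normalize(string text) =>
        string.Join(" ", SplitLines(text));
}

public static class Program
{
    public static void Main()
    {
        // prints "CONFIDENTIAL Do not copy"
        Console.WriteLine(WatermarkTextHelper.Normalize("CONFIDENTIAL\r\nDo not copy"));
    }
}
```

In a test this could be used as Assert.AreEqual(WatermarkTextHelper.Normalize(ws.Text), WatermarkTextHelper.Normalize(artifact.Text)).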

If you continue to experience issues, you may want to review the examples provided in the Aspose documentation or check for any updates or changes in the API that might affect how watermarks are handled.

For further reference, the examples listed under Sources below can provide additional insights into working with watermarks and artifacts in PDF documents.

Sources:
[1]: Get Watermark Example
[2]: Counting Artifacts Example

profiler · January 20, 2025, 9:47am

Thanks for the answer! I’m using the same code you suggested, but it doesn’t work. Try opening the attached PDF with password=Velislav_123

I assume the following code is problematic:

var text = new FormattedText();
...
var stamp = new TextStamp(text)

test.zip (29.3 KB)

asad.ali · January 20, 2025, 2:59pm

@profiler

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-59085

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.