We are seeing cases where PdfJavaScriptStripper.Strip returns false, yet the PDF content has been visibly altered: some text that existed before calling Strip is removed from the PDF. We would expect a false return value to mean the input stream was left unmodified.
- Observed behavior: PdfJavaScriptStripper.Strip() returns false, but the output PDF is missing text that was present in the source.
- Expectation: A false return value should indicate that the input stream was left unchanged.
What we need to investigate
- Aspose.PDF for .NET version – please confirm the exact version (e.g., 23.12, 24.1, etc.).
- Minimal reproducible sample – a small PDF (≤ 2 pages) that shows the issue, together with the exact code you use to call Strip().
- How the streams are supplied – are you passing file paths, FileStream objects, or memory streams?
Suggested first step
- Upgrade to the latest Aspose.PDF for .NET release (if you are not already on it) and re-run the same test. In recent builds the Strip() method has been tightened to avoid modifying the source when it returns false (see internal change log for version 23.12).
If the problem persists
Provide the items listed above so we can reproduce the issue internally and log a defect. Once we have a reproducible case we can confirm whether this is a bug in the current version and work on a fix.
Reference: PdfJavaScriptStripper is used to extract or remove JavaScript actions from PDFs (forum discussion on extracting JavaScript, aspose.com).
Would you kindly share your sample code snippet along with the sample PDF document for our reference? We will test the scenario in our environment and address it accordingly.
I can share the code snippet, but the only PDF I have that reproduces the issue contains PII.
However, since you asked for the code, I realized it might not be the Strip call causing the issue. We also call Flatten. I just confirmed that the issue is caused by Flatten, not Strip.
Is the Flatten call correct?
Here’s the code:
// Requires: using System; using System.IO; using Aspose.Pdf; using Aspose.Pdf.Facades;
public static MemoryStream StripJavascriptFromPDF(Stream inputStream)
{
    // Not wrapped in "using": this stream is returned, so disposing it here would hand the caller a closed stream.
    var outputStream = new MemoryStream();
    using (var flattenedStream = new MemoryStream())
    {
        var pdfDoc = new Document(inputStream);
        var pdfJsStripper = new PdfJavaScriptStripper();
        // Flattens form fields and annotations. If these are left intact, JS Stripping can fail with
        // an inscrutable error like:
        // "System.InvalidOperationException: Operation is not valid due to the current state of the object."
        // Suggestion from: https://forum.aspose.com/t/strip-actions-from-pdf/229163
        // If Flatten fails, log the error but continue
        try
        {
            pdfDoc.Flatten();
            pdfDoc.Save(flattenedStream);
        }
        catch (Exception ex)
        {
            CPRLogger.LogException("PDF Flatten failed, continuing to strip JS", ex);
            inputStream.Position = 0; // rewind: loading the Document may have moved the stream position
            pdfJsStripper.Strip(inputStream, outputStream);
            outputStream.Position = 0;
            return outputStream;
        }
        flattenedStream.Position = 0; // Save() leaves the position at the end of the stream
        pdfJsStripper.Strip(flattenedStream, outputStream);
        outputStream.Position = 0; // let the caller read from the beginning
        return outputStream;
    }
}
Yes, the Document.Flatten() method may be the cause of this behavior: it disables the form features in the document, and if JavaScript is embedded in an input field, it can be disabled along with it. Please try commenting out this call and see if it resolves the issue.
As mentioned, I confirmed that the issue does not happen if we don’t call Flatten. However, the comment in our code points to a recommendation saying that the Strip call can fail if Flatten is not called first.
Is there a recommendation for a different way to handle the need to Flatten in some cases? Can we detect whether the Flatten call is needed before calling it?
The Flatten method is only needed when a PDF contains forms. It removes the form fields and places their values at the same position as static content, whereas the Strip method is specifically used to remove JavaScript from the document. Based on the difference between these two methods, you can decide when to use each.
Just to confirm, if there are form fields, the Strip call will fail, correct? Is there a way to detect form fields? I need to know how to detect when I should Flatten. Thanks.
No, it doesn’t mean that the Strip call will always fail if there are form fields. Based on our initial investigation, the issue occurs because JavaScript may be embedded in the form fields, and because you are calling the Flatten method, those form fields are erased together with the associated JS.
To check whether a Document has form fields, you can use the code sample below:
Document doc = new Document(dataDir + "input.pdf");
var count = doc.Form.Count;
If you are still unable to resolve the issue, please share a sample PDF for our reference so that we can investigate accordingly. If you cannot share your document publicly, you can share it in a private message: click the username and press the blue Message button.
Thanks. I’ve added a check to skip the Flatten call when there are no forms, and it resolves the issue I was seeing with the problematic PDF.
One final question: if there are forms, should I only call Flatten, or is it better to call Flatten and then Strip to ensure no JS remains?
It’s better to call both methods since JavaScript might not be exclusively tied to form fields.
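Putting the answers in this thread together (flatten only when Document.Form.Count is non-zero, but always run Strip), the method from the original question could be restructured roughly as follows. This is a minimal sketch under stated assumptions, not a verified fix: the class name PdfJsSanitizer and the helper ShouldFlatten are hypothetical, the input stream is assumed to be seekable, and the Flatten error handling from the original snippet is omitted for brevity. Only Document, Form.Count, Flatten, Save and PdfJavaScriptStripper.Strip come from the thread.

```csharp
using System;
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Facades;

// Hypothetical class name for this sketch.
public static class PdfJsSanitizer
{
    // Hypothetical helper: Flatten is only worth calling when the document has form fields.
    public static bool ShouldFlatten(int formFieldCount) => formFieldCount > 0;

    // Returns a stream positioned at 0; the caller owns (and must dispose) it.
    // Assumes inputStream is seekable so it can be re-read after Document has loaded it.
    public static MemoryStream StripJavascriptFromPdf(Stream inputStream)
    {
        var outputStream = new MemoryStream();
        var pdfDoc = new Document(inputStream);
        var stripper = new PdfJavaScriptStripper();

        if (ShouldFlatten(pdfDoc.Form.Count))
        {
            // Forms present: flatten first (fields are removed and their values kept as
            // static content), then strip any JavaScript that lives outside the fields.
            using (var flattenedStream = new MemoryStream())
            {
                pdfDoc.Flatten();
                pdfDoc.Save(flattenedStream);
                flattenedStream.Position = 0; // Save() leaves the position at the end
                stripper.Strip(flattenedStream, outputStream);
            }
        }
        else
        {
            // No forms: skip Flatten entirely and strip JavaScript from the original bytes.
            inputStream.Position = 0;
            stripper.Strip(inputStream, outputStream);
        }

        outputStream.Position = 0;
        return outputStream;
    }
}
```

Because the returned stream is rewound to position 0, the caller can read or save it directly, and should dispose it when done.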