Good day!
I’m using Aspose.Pdf for Java 17.7.0, and I need to remove all headers and footers from input PDFs. I’ve already tried the code snippets from this post: Removing footers. In addition, I have the following mandatory requirements:
- No file system operations during PDF processing (e.g. no temp files or other external resources);
- After the removal I need to extract plain text, so I need either to get a com.aspose.Document object once the headers and footers are removed, or to do everything within the com.aspose.pdf.Document object.
- Conversion to word document (.DOCX, .ODT or others) is prohibited.
Taking the above restrictions into account, I use the following code to remove headers and footers from the PDF and extract plain text:
@Nonnull
public String extract(@Nonnull byte[] bytes) throws Exception {
    // open file
    Document pdfDocument;
    String originalText;
    try (InputStream fileInputStream = new ByteArrayInputStream(bytes)) {
        PdfContentEditor pce = new PdfContentEditor();
        pce.bindPdf(fileInputStream);
        pce.deleteStampByIds(new int[] {100, 101}); // delete headers and footers
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            pce.save(bos);
            try (ByteArrayInputStream bis = new ByteArrayInputStream(bos.toByteArray())) {
                pdfDocument = new Document(bis);
            }
        }
        // pdfDocument = new Document(fileInputStream);
    }
    com.aspose.pdf.TextAbsorber textAbsorber = new com.aspose.pdf.TextAbsorber();
    // Accept the absorber for all the pages
    pdfDocument.getPages().accept(textAbsorber);
    // Get the extracted text
    originalText = textAbsorber.getText();
    // cleanup from BOM symbols
    StringUtilities strUtils = new StringUtilities();
    originalText = strUtils.removeAllUTF8BOM(originalText);
    originalText = new PdfTextNormalizer().normalizePdfText(originalText);
    return originalText;
}
I have run this code on some documents (attached), but no headers or footers were removed. Is it possible to correct this code?
Thanks!
P.S. I have attached some example documents below. All of them have headers and footers:
General Terms of Use-1.pdf (264.3 KB)
CQ 5.5 OnPremise (License Terms 2012v1)-1.pdf (354.2 KB)
Adobe Connect Hosted Terms of Service-1.pdf (402.4 KB)