I am using the code below to extract the PDFs attached to an original PDF (Aspose.PDF for Java, aspose-pdf-11.2.0.jar):
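PdfExtractor extractor = new PdfExtractor();
extractor.bindPdf(orginalPDFWithNonEnglishAttachmentPath);
// Extract all attachments so that their names can be listed
extractor.extractAttachment();
List<String> attachmentName = extractor.getAttachNames();
for (String aattachName : attachmentName) {
    // Extract this attachment by name, then save it into the output directory
    extractor.extractAttachment(aattachName);
    extractor.getAttachment(primaryResponse);
    // Remember the path where the attachment is expected to be written
    primaryRespAttachment.add(primaryResponse + "\\" + aattachName);
}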
Issue: When the name of an attached file inside the original PDF contains non-English characters, that attachment is not extracted to the specified path.
Sample file names that do not work:
- Šanˇák_P18-04996.pdf
- Knüppel L, et al. A Novel Antifibrotic Mechanism of Nintedanib and Pirfenidone.pdf
File names that work:
- anyenglishname.pdf
Note: Attachments with English file names are extracted to the specified path and work fine.
UTF-8 is already set at the server and JVM level. The file name also displays correctly when I debug through the code.
Please suggest a solution for extracting embedded files with non-English names from a PDF.
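To narrow this down, here is a small diagnostic sketch that does not use Aspose at all. It rests on an assumption that has not been confirmed as the cause: that the name is fine as a Java String (which matches what the debugger shows) but the charset the JVM uses for file names on the server cannot represent it. The class name AttachmentNameCheck and the helpers canEncode and toAsciiSafe are hypothetical names made up for this example. On HotSpot JVMs the file-name charset is usually reported by the sun.jnu.encoding system property, which can differ from file.encoding; that property is implementation-specific and may be absent on other JVMs.

```java
import java.nio.charset.Charset;
import java.text.Normalizer;

// Hypothetical helper class for diagnosis only; not part of Aspose.PDF.
class AttachmentNameCheck {

    // Returns true if every character of the name can be encoded in the given charset.
    static boolean canEncode(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }

    // Possible fallback: strip accents and replace any remaining non-ASCII character with '_'.
    static String toAsciiSafe(String name) {
        // NFD splits an accented letter into its base letter plus combining marks
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        // Drop the combining marks, keeping the base letters
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Replace whatever is still outside printable ASCII
        return stripped.replaceAll("[^\\x20-\\x7E]", "_");
    }

    public static void main(String[] args) {
        // \u00fc is the letter u with diaeresis, as in one of the failing names
        String name = "Kn\u00fcppel L, et al.pdf";

        // Settings the JVM reports; sun.jnu.encoding may print null on some JVMs
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("default charset  = " + Charset.defaultCharset());

        System.out.println("default charset can encode name: "
                + canEncode(name, Charset.defaultCharset()));
        System.out.println("ASCII-safe fallback name: " + toAsciiSafe(name));
    }
}
```

If this is run on the server with the same JVM options as the application and sun.jnu.encoding is not UTF-8, the operating system locale of the process (for example LANG or LC_ALL on Linux) would be the first thing to check. The toAsciiSafe fallback is only useful if renaming the extracted files is acceptable.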