Hello,
We are trying to convert PDFs to PDF/A-2a on Ubuntu with Java 8, using the latest Aspose.PDF for Java (23.4).
When we convert documents whose fonts are not embedded in the input file, the converted document is invalid.
For example, we have a PDF with text in Times New Roman. The font is not embedded in the document and is not installed on the OS, so we want to substitute it with a font that is available on the OS: Liberation Serif.
The conversion is done using the following code:
```java
import com.aspose.pdf.*;
import java.io.ByteArrayOutputStream;
import java.util.Iterator;

try (com.aspose.pdf.Document pdf = new com.aspose.pdf.Document("times-new-roman.pdf")) {
    // Convert to PDF/A-2a, capturing the conversion log
    PdfFormatConversionOptions conversionOptions = new PdfFormatConversionOptions(PdfFormat.PDF_A_2A);
    ByteArrayOutputStream conversionLog = new ByteArrayOutputStream();
    conversionOptions.setLogStream(conversionLog);
    boolean isConvertedSuccessfully = pdf.convert(conversionOptions); // Returns false

    // Replace each non-embedded Times New Roman face with the matching Liberation Serif face
    TextFragmentAbsorber absorber = new TextFragmentAbsorber(new TextEditOptions(TextEditOptions.FontReplace.RemoveUnusedFonts));
    pdf.getPages().accept(absorber);
    TextFragmentCollection textFragments = absorber.getTextFragments();
    for (Iterator<TextFragment> iterator = textFragments.iterator(); iterator.hasNext();) {
        TextFragment textFragment = iterator.next();
        Font font = textFragment.getTextState().getFont();
        String fontName = font.getFontName();
        if (!font.isEmbedded()) {
            if (fontName.equals("TimesNewRomanPSMT")) {
                textFragment.getTextState().setFont(FontRepository.findFont("Liberation Serif"));
            } else if (fontName.equals("TimesNewRomanPS-BoldMT")) {
                textFragment.getTextState().setFont(FontRepository.findFont("Liberation Serif Bold"));
            } else if (fontName.equals("TimesNewRomanPS-ItalicMT")) {
                textFragment.getTextState().setFont(FontRepository.findFont("Liberation Serif Italic"));
            } else if (fontName.equals("TimesNewRomanPS-BoldItalicMT")) {
                textFragment.getTextState().setFont(FontRepository.findFont("Liberation Serif Bold Italic"));
            }
        }
    }
    pdf.save("times-new-roman-converted.pdf", new com.aspose.pdf.PdfSaveOptions());
}
```
The result is later validated using the following code:
```java
try (com.aspose.pdf.Document convertedPdf = new com.aspose.pdf.Document("times-new-roman-converted.pdf")) {
    ByteArrayOutputStream validationLog = new ByteArrayOutputStream();
    boolean isValid = convertedPdf.validate(validationLog, PdfFormat.PDF_A_2A); // Returns false
}
```
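To see why `validate` (and `convert`) return false, the bytes collected in `validationLog` and `conversionLog` can be saved to a file and read as text. The sketch below is plain Java 8 and only handles the `ByteArrayOutputStream`; it assumes the log is UTF-8 text, and `LogDump`, `dump` and the file name are hypothetical names, not Aspose.PDF API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical debugging helper, not part of Aspose.PDF. */
public final class LogDump {

    private LogDump() {
    }

    /**
     * Writes the captured log bytes unchanged to {@code target} and returns
     * them decoded as text (UTF-8 is an assumption about the log encoding).
     */
    public static String dump(ByteArrayOutputStream log, Path target) {
        byte[] bytes = log.toByteArray();
        try {
            Files.write(target, bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for the validationLog stream filled by Document.validate(...)
        ByteArrayOutputStream validationLog = new ByteArrayOutputStream();
        byte[] sample = "sample log entry".getBytes(StandardCharsets.UTF_8);
        validationLog.write(sample, 0, sample.length);
        System.out.println(dump(validationLog, Paths.get("validation-log.txt")));
    }
}
```

In the snippets above, `LogDump.dump(validationLog, Paths.get("validation-log.txt"))` would be called right after `validate`, and likewise for `conversionLog` after `convert`.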
-
However, the converted document is not a valid PDF/A-2a. Could you please advise what we need to implement so that the converted result is a valid PDF/A-2a? Please see the Times New Roman example: times-new-roman.zip (37.9 KB)
-
The font substitution code above is too specific about whether the font is bold or italic. Is there a way to simply say “substitute Times New Roman with Liberation Serif” with a single call? Or a way to use the OS font cache for substitutions?
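Independently of whether Aspose.PDF offers a single-call substitution, the four-branch `if`/`else` chain can at least be collapsed into one name-mapping helper. This is a minimal plain-Java sketch under an assumed naming heuristic: the family name starts with "TimesNewRoman" (spaces, case and a subset prefix such as `ABCDEF+` are ignored), and the style is read from "Bold"/"Italic"/"Oblique" keywords in the rest of the name. `TimesToLiberation` and `substituteFor` are hypothetical names, not Aspose.PDF API; the Liberation Serif face names are the ones passed to `FontRepository.findFont` in the code above.

```java
import java.util.Locale;

/** Hypothetical helper (not Aspose.PDF API): maps Times New Roman variants to Liberation Serif faces. */
public final class TimesToLiberation {

    private static final String FAMILY = "timesnewroman";

    private TimesToLiberation() {
    }

    /** Returns the Liberation Serif face name, or null if the font is not Times New Roman. */
    public static String substituteFor(String fontName) {
        if (fontName == null) {
            return null;
        }
        // Drop a subset prefix such as "ABCDEF+" (no-op when there is no '+'),
        // then ignore spaces and case.
        String name = fontName.substring(fontName.indexOf('+') + 1)
                .replace(" ", "")
                .toLowerCase(Locale.ROOT);
        if (!name.startsWith(FAMILY)) {
            return null;
        }
        // Style keywords follow the family name, e.g. "PS-BoldItalicMT" or ",Bold".
        String style = name.substring(FAMILY.length());
        boolean bold = style.contains("bold");
        boolean italic = style.contains("italic") || style.contains("oblique");
        StringBuilder face = new StringBuilder("Liberation Serif");
        if (bold) {
            face.append(" Bold");
        }
        if (italic) {
            face.append(" Italic");
        }
        return face.toString();
    }

    public static void main(String[] args) {
        String[] names = {"TimesNewRomanPSMT", "TimesNewRomanPS-BoldMT",
                "TimesNewRomanPS-ItalicMT", "TimesNewRomanPS-BoldItalicMT", "Calibri"};
        for (String n : names) {
            System.out.println(n + " -> " + substituteFor(n));
        }
    }
}
```

Inside the loop, the four branches would then reduce to `String replacement = TimesToLiberation.substituteFor(fontName);` followed by `if (replacement != null) textFragment.getTextState().setFont(FontRepository.findFont(replacement));`. Since it is a string heuristic, fonts with unusual names would still need an explicit entry.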
On the other hand, if an input document has all its fonts embedded and is simply converted to PDF/A-2a without font substitutions, the converted result is valid. Please see the Calibri example: calibri.zip (50.2 KB)
Thank you!