Unsuccessful PDF to PDF/A-2a conversion with font substitution

Hello,

We are trying to convert PDFs to PDF/A-2a on Ubuntu with Java 8 using the latest Aspose PDF for Java 23.4.

When the input document uses fonts that are not embedded, the converted document is invalid.

We have a PDF with text in Times New Roman. The font is not embedded in the document and is not installed on the OS, so we want to substitute it with a font that is installed: Liberation Serif.
The conversion is done using the following code:

try (com.aspose.pdf.Document pdf = new com.aspose.pdf.Document("times-new-roman.pdf")) {
	PdfFormatConversionOptions conversionOptions = new PdfFormatConversionOptions(PdfFormat.PDF_A_2A);
	ByteArrayOutputStream conversionLog = new ByteArrayOutputStream();
	conversionOptions.setLogStream(conversionLog);

	boolean isConvertedSuccessfully = pdf.convert(conversionOptions); // Returns false

	// Substitute the non-embedded Times New Roman fonts (note: this runs after convert() above)
	TextFragmentAbsorber absorber = new TextFragmentAbsorber(new TextEditOptions(TextEditOptions.FontReplace.RemoveUnusedFonts));
	pdf.getPages().accept(absorber);

	TextFragmentCollection textFragments = absorber.getTextFragments();
	for (Iterator<TextFragment> iterator = textFragments.iterator(); iterator.hasNext();) {
		TextFragment textFragment = iterator.next();

		Font font = textFragment.getTextState().getFont();
		String fontName = font.getFontName();

		if (!font.isEmbedded()) {
			if (fontName.equals("TimesNewRomanPSMT")) {
				textFragment.getTextState().setFont(FontRepository.findFont("Liberation Serif"));
			} else if (fontName.equals("TimesNewRomanPS-BoldMT")) {
				textFragment.getTextState().setFont(FontRepository.findFont("Liberation Serif Bold"));
			} else if (fontName.equals("TimesNewRomanPS-ItalicMT")) {
				textFragment.getTextState().setFont(FontRepository.findFont("Liberation Serif Italic"));
			} else if (fontName.equals("TimesNewRomanPS-BoldItalicMT")) {
				textFragment.getTextState().setFont(FontRepository.findFont("Liberation Serif Bold Italic"));
			}
		}
	}

	pdf.save("times-new-roman-converted.pdf", new com.aspose.pdf.PdfSaveOptions());
}

And later validated using the following code:

try (com.aspose.pdf.Document convertedPdf = new com.aspose.pdf.Document("times-new-roman-converted.pdf")) {
	ByteArrayOutputStream validationLog = new ByteArrayOutputStream();
	boolean isValid = convertedPdf.validate(validationLog, PdfFormat.PDF_A_2A); // Returns false
}

  1. The converted document, however, is not a valid PDF/A-2a. Could you please advise what we need to change so that the converted result is a valid PDF/A-2a? Please see the Times New Roman example: times-new-roman.zip (37.9 KB)

  2. The font substitution code has to handle every bold/italic variant separately. Is there a way to just say “substitute Times New Roman with Liberation Serif” with a single call? Or a way to use the OS font cache for substitutions?

On the other hand, if an input document has all its fonts embedded and is converted to PDF/A-2a without any font substitution, the converted result is valid. Please see the Calibri example: calibri.zip (50.2 KB)

Thank you!

@t.dobreva

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFJAVA-42777

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

Hello,

I did the font substitutions before converting the PDF and got valid results.

Still, an easier way to substitute the fonts would be great.
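
As an interim workaround, the four per-style branches from the first post could be collapsed into one small name-mapping helper. The following is only a sketch under stated assumptions: `FontNameMapper` and `substituteName` are hypothetical names, not part of Aspose.PDF, and the helper assumes the style is encoded in the PostScript name after a family prefix (as in `TimesNewRomanPS-BoldMT`) and that the replacement fonts are named "family, then Bold, then Italic", as the Liberation Serif names used above are.

```java
import java.util.Locale;

/** Hypothetical helper (not an Aspose.PDF API): derives a substitute font name from a PostScript name. */
final class FontNameMapper {

    private FontNameMapper() {
    }

    /**
     * Returns the substitute font name for a PostScript name that starts with sourcePrefix,
     * or null when the name belongs to a different font family.
     */
    static String substituteName(String postScriptName, String sourcePrefix, String targetFamily) {
        if (postScriptName == null || !postScriptName.startsWith(sourcePrefix)) {
            return null;
        }
        // Assumption: style markers follow the family prefix, e.g. "-BoldItalicMT".
        String suffix = postScriptName.substring(sourcePrefix.length()).toLowerCase(Locale.ROOT);
        boolean bold = suffix.contains("bold");
        boolean italic = suffix.contains("italic") || suffix.contains("oblique");

        StringBuilder name = new StringBuilder(targetFamily);
        if (bold) {
            name.append(" Bold");
        }
        if (italic) {
            name.append(" Italic");
        }
        return name.toString();
    }
}
```

Inside the loop from the first post, the `if`/`else if` chain would then reduce to one call such as `FontNameMapper.substituteName(fontName, "TimesNewRomanPS", "Liberation Serif")`, followed by a null check and `textFragment.getTextState().setFont(FontRepository.findFont(substitute))`. This only shortens the calling code; it does not make use of the OS font cache.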

@t.dobreva

Thanks for sharing your feedback and concerns. We have updated the ticket information and will also investigate from this perspective.