PDF Text Replacement

Hi,

I’ve been experimenting with the PDF text replacement Java library and have encountered the following error:

“Failed with error com.aspose.pdf.internal.ms.System.l7k: Format of font “Symbol” is not supported for new composite fonts”

This only happens when replacing the text ‘first_name’ with something such as ‘JJ1234’. The characters ‘JJ’ seem to be the issue.

Is there a character set that the text is being matched against that causes this? Would you have any recommendations on how to escape it?

The code sample is below

    public static void replace( Replacement replacement, String source, String target  ){
        Document pdfDocument = new Document(source);
        TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber( "{{" + replacement.oldS() + "}}");
        pdfDocument.getPages().accept(textFragmentAbsorber);
        TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();

        for (TextFragment textFragment : (Iterable<TextFragment>) textFragmentCollection) {
            textFragment.setText( replacement.newS());
        }

        pdfDocument.save(target);
    }

@Twister99

Would you please share your sample PDF document as well? We will run the code snippet in our environment with your file and share our feedback after observing the results.

Hi Asad,

Please see below

https://canusicom-my.sharepoint.com/:b:/g/personal/benjamin_hargrave_canusi_com/Ea7f-STivz5Ki7T51ukOUAgBABG2GGFIqar31V4GWvaL-Q?e=rFv1QM

@Twister99

We tested the scenario with Aspose.PDF for Java 20.10 and could not reproduce the issue. Please check the following code snippet, which we used for testing, and the attached PDF output.

final FileInputStream fis = new FileInputStream(dataDir + "Updated_Text.pdf");
Document pdfDocument = new Document(fis);

// Search all pages for the placeholder text
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("{{first_name}}");
pdfDocument.getPages().accept(textFragmentAbsorber);

// Get the extracted text fragments into a collection
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
System.out.println("Count : " + textFragmentCollection.size());
// Loop through the fragments
for (TextFragment textFragment : textFragmentCollection) {
    // Update text and other properties
    textFragment.setText("JJ1234");
}
pdfDocument.save(dataDir + "20.10.pdf");

20.10.pdf (200.6 KB)

Please make sure you are using the latest version of the API, and let us know if you still face any issue.

@asad.ali

Curious - I attempted using the above but the issue remains.
I’ll provide some additional details then.

Here’s the code snippet based on the above

import com.aspose.pdf.*;
import java.io.FileInputStream;
import java.io.IOException;

public class Test2 {

    public static void main( String[] args ) throws IOException{

        replace("./Updated_Text.pdf", "./AsposeTestOutput.pdf" );
    }


    public static void replace( String source, String target  ) throws IOException {
        final FileInputStream fis = new FileInputStream(source);
        Document pdfDocument = new Document(fis);

        TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("{{first_name}}");
        pdfDocument.getPages().accept(textFragmentAbsorber);

        // Get the extracted text fragments into a collection
        TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
        System.out.println("Count : " + textFragmentCollection.size());
        // Loop through the fragments
        for (TextFragment textFragment : textFragmentCollection) {
            // Update text and other properties
            textFragment.setText("JJ1234");
        }
        pdfDocument.save(target);
    }

}
/usr/lib/jvm/java-11-openjdk-amd64/bin/java -javaagent:/snap/intellij-idea-ultimate/253/lib/idea_rt.jar=44111:/snap/intellij-idea-ultimate/253/bin -Dfile.encoding=UTF-8 -classpath /home/twister/dev/freel/freel-shaundavis/target/scala-2.12/classes:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/com/itextpdf/itextpdf/5.5.13.2/itextpdf-5.5.13.2.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/com/typesafe/akka/akka-actor_2.12/2.5.23/akka-actor_2.12-2.5.23.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/com/typesafe/akka/akka-http-core_2.12/10.1.7/akka-http-core_2.12-10.1.7.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/com/typesafe/akka/akka-http-spray-json_2.12/10.1.7/akka-http-spray-json_2.12-10.1.7.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/com/typesafe/akka/akka-http-testkit_2.12/10.1.7/akka-http-testkit_2.12-10.1.7.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/com/typesafe/akka/akka-http_2.12/10.1.7/akka-http_2.12-10.1.7.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/com/typesafe/akka/akka-parsing_2.12/10.1.7/akka-parsing_2.12-10.1.7.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/com/typesafe/akka/akka-protobuf_2.12/2.5.23/akka-protobuf_2.12-2.5.23.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/com/typesafe/akka/akka-stream_2.12/2.5.23/akka-stream_2.12-2.5.23.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/com/typesafe/scala-logging/scala-logging_2.12/3.9.2/scala-logging_2.12-3.9.2.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/com/typesafe/config/1.3.3/config-1.3.3.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/com/typesafe/ssl-config-core_2.12/0.3.7/ssl-config-core_2.12-0.3.7.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/commons-logging/commons-logging/1.2/commons-logging-1.2.jar:/home/twister/.cache/coursier/v1/https/repo
1.maven.org/maven2/io/spray/spray-json_2.12/1.3.5/spray-json_2.12-1.3.5.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/org/apache/pdfbox/fontbox/2.0.21/fontbox-2.0.21.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/org/apache/pdfbox/pdfbox/2.0.21/pdfbox-2.0.21.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/org/reactivestreams/reactive-streams/1.0.2/reactive-streams-1.0.2.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/org/scala-lang/modules/scala-java8-compat_2.12/0.8.0/scala-java8-compat_2.12-0.8.0.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/org/scala-lang/modules/scala-parser-combinators_2.12/1.1.2/scala-parser-combinators_2.12-1.1.2.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/org/scala-lang/scala-library/2.12.7/scala-library-2.12.7.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/org/scala-lang/scala-reflect/2.12.7/scala-reflect-2.12.7.jar:/home/twister/.cache/coursier/v1/https/repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.25/slf4j-api-1.7.25.jar:/home/twister/dev/freel/freel-shaundavis/lib/bootstrap-etl-assembly-1.0.jar:/home/twister/dev/freel/freel-shaundavis/lib/aspose-pdf-19.12.jar com.shaundavis.Test2
Count : 1
Exception in thread "main" com.aspose.pdf.internal.ms.System.l7k: Format of font "Symbol" is not supported for new composite fonts
	at com.aspose.pdf.internal.l4n.lj.lf(Unknown Source)
	at com.aspose.pdf.internal.l4n.lj.<init>(Unknown Source)
	at com.aspose.pdf.internal.l6l.lf.lI(Unknown Source)
	at com.aspose.pdf.internal.l4n.l1p.lI(Unknown Source)
	at com.aspose.pdf.internal.l4k.lu.lI(Unknown Source)
	at com.aspose.pdf.internal.l4f.lI.lI(Unknown Source)
	at com.aspose.pdf.internal.l5l.lI.lI(Unknown Source)
	at com.aspose.pdf.TextSegment.setText(Unknown Source)
	at com.aspose.pdf.TextFragment.setText(Unknown Source)
	at com.shaundavis.Test2.replace(Test2.java:38)
	at com.shaundavis.Test2.main(Test2.java:21)

Process finished with exit code 1

The environment is Ubuntu 18.05.
I attempted the test using the following libraries:

aspose-pdf-20.10-jdk17.jar
aspose-pdf-20.10.jar
aspose-pdf-19.12.jar

The tests were done using:
openjdk 1.8
openjdk 11

All had the same result, using the same file.

Are there any additional details I can provide that would help with this?

@Twister99

We were able to reproduce the issue while testing the scenario on Ubuntu 15.04. We have therefore logged it as PDFJAVA-39910 in our issue management system. We will look further into the details and keep you posted on the status of the fix. Please be patient and give us some time.

We are sorry for the inconvenience.

@asad.ali

Great, thanks.

I imagine the fix will land in the 20.11 release, probably around mid-November, which is some time away given my client’s needs.

May I ask what configuration the working setup had? I can run this within a Docker environment matching that specification.

Thanks for the support

@Twister99

This seems to be related to missing fonts on the system. We need to analyze which fonts are required in order to run the scenario without any issue. The scenario where no exception was raised was tested in a Windows 10 environment where a lot of fonts (including the Windows Essential Core Fonts) were installed. We will share further updates with you as soon as the investigation is done. Please give us some time.

We apologize for the inconvenience.

@asad.ali

Cheers, thanks for the update.

I see the issue has been marked as resolved. Does this mean it will be included in a November patch?

Thanks for the confirmation.

An update for anyone else tackling this.

The solution, as Asad mentioned, was to install a font suite on the Linux server.
A useful link can be found here (https://www.maketecheasier.com/how-to-install-fonts-ubuntu/)

The relevant command is below:

sudo apt install ttf-mscorefonts-installer fonts-cantarell lmodern ttf-aenigma ttf-georgewilliams ttf-bitstream-vera ttf-sjfonts ttf-unifont fonts-entypo fonts-isabella fonts-mplus fonts-prociono ttf-anonymous-pro ttf-engadget ttf-staypuft ttf-summersby -yqq

@asad.ali, if possible, could you confirm whether this is the full font suite needed? If any others are required, it would be good to know. Cheers

@Twister99

We have investigated the ticket and it has been resolved. Please see our findings below:

The fonts in the document are embedded as subsets, meaning each embedded font only contains information about the characters that were used in the document. The uppercase character “J” was not used before, so its glyph is absent from the embedded Calibri font and the exception is raised. The Symbol font is the last place where we look for missing glyphs, which is why it is mentioned in the exception.
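
To illustrate the subsetting point, here is a minimal sketch in plain Java with no Aspose calls. It assumes, as a simplification, that an embedded subset covers exactly the characters already present in the page text; real subsets are per font, so treat this only as a rough pre-check. The class name, method name, and sample page text are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class SubsetCheck {

    // Returns the characters of `replacement` that never occur in `existingText`.
    // Assumption: the embedded subset font covers exactly the characters in existingText.
    public static Set<Character> missingChars(String existingText, String replacement) {
        Set<Character> missing = new LinkedHashSet<>();
        for (char c : replacement.toCharArray()) {
            if (existingText.indexOf(c) < 0) {
                missing.add(c);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical page text; the real document's content will differ.
        String pageText = "Dear {{first_name}}, your order 1234 has shipped.";
        System.out.println(missingChars(pageText, "JJ1234")); // prints [J]
    }
}
```

Any character reported here is one the library would likely have to find in a full font installed on the system.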

To correct the situation, it is enough to install the standard MS fonts (for example, the msttcorefonts package on Ubuntu) so that the missing glyphs can be found in the full font.

Alternatively, you can point the API to a font folder in code:

FontRepository.addLocalFontPath("PATH TO FONTS");

Please note that adding the “Calibri” font will be enough for this issue, but other documents may need other fonts.
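
Before calling `FontRepository.addLocalFontPath`, it may help to confirm that the folder really contains the font you expect. The sketch below uses only `java.nio` and matches on file names, which is a heuristic because a font's file name does not have to match its family name. The class name, method name, and default path are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FontFolderCheck {

    // Recursively lists .ttf/.otf/.ttc files under `dir` whose file name
    // contains `nameFragment` (case-insensitive).
    public static List<Path> findFontFiles(Path dir, String nameFragment) throws IOException {
        String wanted = nameFragment.toLowerCase(Locale.ROOT);
        try (Stream<Path> files = Files.walk(dir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> {
                    String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                    boolean isFont = name.endsWith(".ttf") || name.endsWith(".otf") || name.endsWith(".ttc");
                    return isFont && name.contains(wanted);
                })
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass your own font folder as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/usr/share/fonts");
        if (!Files.isDirectory(dir)) {
            System.out.println("Not a directory: " + dir);
            return;
        }
        List<Path> matches = findFontFiles(dir, "calibri");
        System.out.println(matches.isEmpty() ? "No Calibri files found in " + dir : "Found: " + matches);
    }
}
```

If the list comes back empty, the folder passed to `addLocalFontPath` is unlikely to supply the missing glyphs.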

Thanks for sharing the link, and yes, this information is correct for installing MS fonts on a Linux system. These fonts support most language characters and glyphs, so they should work with most documents. In some special cases where non-Windows fonts are used, you may still see unwanted results. In such a situation, please share your sample document with us so that we can investigate and share our feedback with you.
