PDF Text Replacement - Incorrect Font Replacement

backandwhite-nda-merge (3).pdf (226.4 KB)

Hi,

Replacing text within a document changes the text to an incorrect font. Please see the first paragraph, “50 Sterndale Drive”.

The text seems to change size/font automatically. This behavior is reproducible when the replacement text starts with a digit instead of a letter. See the image below, where the 2nd field begins with a letter instead and no impact is found.

image.png (34.3 KB)

Replacing the font with Times New Roman fixed the issue, so my guess is that this is font-specific.

NDA-Non-Disclosure-Agreement.pdf (98.1 KB)

Steps tried to remedy this

            textFragment.getTextState().setUnderline(flagUnderline);
            textFragment.getTextState().setFontSize(fontSize);

            if( backgroundColor != null ){
                textFragment.getTextState().setBackgroundColor(backgroundColor);
            }

            if( font != null ){
                textFragment.getTextState().setFont(font);
            }

There are other tickets similar to this, correct? Is this an expected issue for various fonts? If so, could you provide a list of fonts that are stable for end users?

Thanks and best wishes,
Ben

@Twister99

As for your previously logged ticket “PDFJAVA-39910”, the issue was related to a specific font and was investigated accordingly. We have already shared our findings and solution with you in the respective forum thread.

For your recently shared scenario, we have logged an investigation ticket, PDFJAVA-40053, in our issue tracking system to analyze it further and check whether we can come up with some specifications for the fonts. Meanwhile, could you please provide the complete sample code snippet that you are using for text replacement? It would help us investigate the scenario.

Please see below



    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;

    import com.aspose.pdf.Document;
    import com.aspose.pdf.TextFragment;
    import com.aspose.pdf.TextFragmentAbsorber;
    import com.aspose.pdf.TextFragmentCollection;

    public static void replace(Replacement replacement, String source, String target) throws IOException {
        // try-with-resources closes the input stream even if processing fails
        try (FileInputStream fis = new FileInputStream(source)) {
            Document pdfDocument = new Document(fis);

            // "/{{(.*?)}}/"
            // Find every occurrence of the {{oldS}} placeholder on all pages
            TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("{{" + replacement.oldS() + "}}");
            pdfDocument.getPages().accept(textFragmentAbsorber);

            TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
            System.out.println("Count : " + textFragmentCollection.size());

            for (TextFragment textFragment : textFragmentCollection) {
                // Disabled workaround, part 1: capture the original text state
                // Font font = textFragment.getTextState().getFont();
                // float fontSize = textFragment.getTextState().getFontSize();
                // boolean flagUnderline = textFragment.getTextState().getUnderline();
                // Color backgroundColor = textFragment.getTextState().getBackgroundColor();

                textFragment.setText(replacement.newS());

                // Disabled workaround, part 2: restore the captured text state
                // textFragment.getTextState().setUnderline(flagUnderline);
                // textFragment.getTextState().setFontSize(fontSize);
                // if (backgroundColor != null) {
                //     textFragment.getTextState().setBackgroundColor(backgroundColor);
                // }
                // if (font != null) {
                //     textFragment.getTextState().setFont(font);
                // }
            }

            // Save to a temporary file first, then move it to the target path
            String stagingFile = File.createTempFile("temp", null).getAbsolutePath();
            pdfDocument.save(stagingFile);
            FileUtil.moveFile(stagingFile, target);
        }
    }
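
A side note on the placeholder syntax used above: the snippet searches for the literal token `{{name}}`, while the commented pattern suggests a regex that matches any placeholder. The following is only a minimal sketch using the JDK regex classes, not Aspose.PDF; the class name `PlaceholderPatterns`, its method names, and the sample sentence are hypothetical. It shows that the braces have to be escaped (or quoted) when such a token is used as a Java regex, because `{` is a regex metacharacter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Aspose.PDF.
final class PlaceholderPatterns {
    // Matches any {{name}} token; the braces are escaped because '{' is a regex metacharacter.
    static final Pattern ANY_PLACEHOLDER = Pattern.compile("\\{\\{(.*?)\\}\\}");

    // Builds the literal token for one placeholder name, e.g. "address" -> "{{address}}".
    static String token(String name) {
        return "{{" + name + "}}";
    }

    // Returns the names of all placeholders found in the text, in order of appearance.
    static List<String> names(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = ANY_PLACEHOLDER.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "{{address}} is the registered office of {{company}}.";
        System.out.println(names(sample));

        // Pattern.quote makes the literal token safe to use as a regex;
        // Matcher.quoteReplacement protects '$' and '\' in the replacement text.
        String replaced = sample.replaceAll(
                Pattern.quote(token("address")),
                Matcher.quoteReplacement("50 Sterndale Drive"));
        System.out.println(replaced);
    }
}
```

This only concerns how the search token is built; it does not affect which font the replaced text ends up with.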

Best wishes, Ben

@Twister99

Thanks for providing the sample code snippet.

We have updated the ticket information accordingly and will let you know as soon as there is progress on it.

@asad.ali

Any update on this? We’re aiming to roll out at the end of this month, so it would be good to know whether we can include this feature or should postpone it.

Best
Ben

@Twister99

Regretfully, the ticket is not yet resolved as its investigation is not yet completed. As soon as we complete the analysis, we will be able to share some updates with you.

Additionally, please note that we recommend installing all Microsoft essential fonts on the system, such as Arial and Times New Roman, as they are basic fonts that support a large set of characters. You can also use any custom font while processing the PDF document, but that font must be present on the system for the API to use it properly.

In addition, some custom fonts require a valid license, so they must be both installed and licensed for use. We hope this information helps you integrate font-related functionality in your application using Aspose.PDF.
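
As a rough way to check whether the recommended fonts are present on a machine, the sketch below lists the font families the JVM can see and reports which required ones are missing. It uses only the JDK (`java.awt.GraphicsEnvironment`); the class name `FontCheck` and the list of required families are assumptions made for illustration. Note also that the fonts visible to the JVM may not be exactly the set that Aspose.PDF scans, so treat the result as a hint rather than a guarantee.

```java
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only; not part of Aspose.PDF.
final class FontCheck {
    // Returns the required families that are absent from the available ones (case-insensitive).
    static Set<String> missing(Collection<String> required, Collection<String> available) {
        Set<String> installed = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        installed.addAll(available);

        Set<String> result = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String family : required) {
            if (!installed.contains(family)) {
                result.add(family);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed list of families the application depends on; adjust as needed.
        List<String> required = Arrays.asList("Arial", "Times New Roman");

        // Font families visible to this JVM on the current system.
        String[] families = GraphicsEnvironment
                .getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();

        System.out.println("Missing fonts: " + missing(required, Arrays.asList(families)));
    }
}
```

Running this on the deployment host before processing documents gives an early warning when a font such as Times New Roman is not installed.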

Thanks, Asad.

I don’t suppose you or the dev team have a Docker image with these fonts preinstalled correctly, or a Windows VM link or installation script?

This would be a lot easier: I could then just deploy that, knowing I’m using the same base that you and the team do.

Thanks and best,
Ben

@Twister99

We have different sets of environments which we use for development and testing purposes. We are afraid that we cannot share any image or installation details about them. However, we are looking into your request and will try to share some information that would help you create an environment that prevents font-related issues.

@Twister99

We have created a Docker container that you can use for testing. The following command runs a container with Ubuntu, JDK 1.8, and the Microsoft TrueType fonts pack pre-installed:

docker run -t -i --rm cerrbeer/ubuntu-java:latest bash