Replacing text in PDF/A-2b results in junk boxes in the output PDF

sarvanan.murthi.ext · November 6, 2017, 3:53pm

Dear Aspose Team,
We have a PDF/A-2b document. When we try to replace text, some weird junk boxes appear in the resulting PDF.

Can we get some help on how to use Aspose with PDF/A-2b documents?

Farhan.Raza · November 6, 2017, 6:12pm

@sarvanan.murthi.ext

Could you please share a code snippet, the source file and the output file so that we can investigate on our end? Before sharing them, please make sure your observations are based on the latest available version of the Aspose.Pdf API, i.e. Aspose.Pdf for .NET 17.11 if you are working on the .NET platform, or Aspose.Pdf for Java 17.10 if you are working on the Java platform.

sarvanan.murthi.ext · November 9, 2017, 1:36pm

Hi,
We are using Aspose.Pdf for Java 17.10 to replace keywords in a PDF file. The input and output PDF files are attached.

Can you please help us?

E1B5000475_A.pdf (15.3 KB)
E1B5000475_A_Output.pdf (87.0 KB)

Code used to replace:

public static void replace(String strKey,String strValue)
{
	INFO("Replacing key " + strKey + " with value " + strValue);

	TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(strKey);
	
	TextReplaceOptions textReplaceOptions = textFragmentAbsorber.getTextReplaceOptions(); 
	
	if(textReplaceOptions != null)
		textReplaceOptions.setReplaceAdjustmentAction(TextReplaceOptions.ReplaceAdjustment.WholeWordsHyphenation);
	
	TextSearchOptions searchOptions = new TextSearchOptions(true);

	textFragmentAbsorber.setTextSearchOptions(searchOptions);

	pdfInputDocument.getPages().accept(textFragmentAbsorber);


	TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
	if(textFragmentCollection.size()>0)
	{
		int i=1;
		for (TextFragment textFragment : textFragmentCollection) 
		{
			if(textFragment != null)
			{
				INFO("\tReplacing Occurrence "+i+" in Page "+textFragment.getPage().getNumber());
				//boolean isStrikeOut = textFragment.getTextState().getStrikeOut();
				//System.out.println("\t\t\t\t\tStrikeOut : "+isStrikeOut);
				//boolean isUnderLined = textFragment.getTextState().isUnderline();
				//System.out.println("\t\t\t\t\tUnderLine : "+isUnderLined);
				//textFragment.setWrapLinesCount(i);
				textFragment.setText(strValue);
				++i;
			}
		}
	}
	else{
		INFO("\tNo Occurrences found");
	}
}

Farhan.Raza · November 9, 2017, 7:22pm

@sarvanan.murthi.ext

I have worked with the data shared by you and have been able to reproduce the issue in our environment. A ticket with ID PDFJAVA-37245 has been logged in our issue management system for further investigation and resolution. The issue ID has been linked with this thread so that you will receive notification as soon as the issue is resolved.

We are sorry for the inconvenience.

sarvanan.murthi.ext · November 13, 2017, 3:27pm

From our experience, we suspect this is caused by the embedded font and encoding of the PDF file, which the Aspose API does not seem to take into account.

Attached are files in which the issue occurs for particular fonts.
Font_Test_PDF2A_MERGE_PDF2A.pdf (75.7 KB)
Font_Test_PDF2A_MERGE_PDF2A_ASPOSE.pdf (449.1 KB)

Could you please confirm which embedded fonts and encodings are supported by the Aspose API?

Farhan.Raza · November 13, 2017, 7:14pm

@sarvanan.murthi.ext

The issue you reported was logged recently and is pending investigation. I have updated the ticket with the latest information you shared. We will share our findings with you once the issue has been investigated in our environment.

Yes, Embedded Fonts and Encoding are supported by Aspose.Pdf for Java API.

Farhan.Raza · November 22, 2017, 10:28am

@sarvanan.murthi.ext

We have investigated the issue PDFJAVA-37245 that you reported. We found that the Calibri font is embedded as a subset and does not use a standard encoding. Please note that with subsetting, only the characters actually used in the layout are stored in the PDF, and the encoding may differ from the standard one. So if you edit text and a character you need is not included in the subset, it cannot be used for the replacement. To avoid such problems, the font can be replaced with the same font from the current OS before editing. Please try the code snippet below in your environment and share your feedback with us.

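// Needs: import java.util.Iterator; and import com.aspose.pdf.*;
// INFO(...) and pdfInputDocument are defined elsewhere in the calling class.
public static void replace(String strKey, String strValue) {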
    INFO("Replacing key " + strKey + " with value " + strValue);
    
    TextFragmentAbsorber absorberFontChanger = new TextFragmentAbsorber(
            new com.aspose.pdf.TextEditOptions(com.aspose.pdf.TextEditOptions.FontReplace.RemoveUnusedFonts));

    pdfInputDocument.getPages().accept(absorberFontChanger);

    TextFragmentCollection textFragmentCollection2 = absorberFontChanger
            .getTextFragments();
    for (Iterator<TextFragment> iterator = textFragmentCollection2.iterator(); iterator.hasNext();) {
        TextFragment textFragment = iterator.next();
        if (textFragment.getTextState().getFont().getFontName().contains("Calibri")) {
            textFragment.getTextState().setFont(FontRepository.findFont("Calibri"));
        } else if (textFragment.getTextState().getFont().getFontName().contains("CourierNew")) {
            textFragment.getTextState().setFont(FontRepository.findFont("CourierNew"));
        } else if (textFragment.getTextState().getFont().getFontName().contains("MSGothic")) {
            textFragment.getTextState().setFont(FontRepository.findFont("MSGothic"));
        } else {
            // ... and so on...
        }
    }

    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(strKey);

    TextReplaceOptions textReplaceOptions = textFragmentAbsorber.getTextReplaceOptions();

    if (textReplaceOptions != null) {
        textReplaceOptions.setReplaceAdjustmentAction(TextReplaceOptions.ReplaceAdjustment.WholeWordsHyphenation);
    }

    TextSearchOptions searchOptions = new TextSearchOptions(true);

    textFragmentAbsorber.setTextSearchOptions(searchOptions);

    pdfInputDocument.getPages().accept(textFragmentAbsorber);

    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
    if (textFragmentCollection.size() > 0) {
        int i = 1;
        for (TextFragment textFragment : textFragmentCollection) {
            if (textFragment != null) {
                INFO("\tReplacing Occurrence " + i + " in Page " + textFragment.getPage().getNumber());
                textFragment.setText(strValue);
                ++i;
            }
        }
    } else {
        INFO("\tNo Occurrences found");
    }
}
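
The if/else chain above matches font names with contains(), presumably because a subsetted font is usually stored under a name with a six-letter tag and a plus sign in front of the real name (for example ABCDEF+Calibri). As a minimal sketch, assuming that naming pattern, a small helper could strip the tag so that one lookup covers every font instead of one branch per font. SubsetFontNames and baseFontName are hypothetical names used only for illustration; they are not part of Aspose.Pdf.

```java
import java.util.regex.Pattern;

final class SubsetFontNames {
    // A subset tag is six uppercase letters followed by '+', e.g. "ABCDEF+Calibri".
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    /** Returns the font name without a leading subset tag; other names pass through unchanged. */
    static String baseFontName(String fontName) {
        if (fontName == null) {
            return null;
        }
        return SUBSET_TAG.matcher(fontName).replaceFirst("");
    }

    public static void main(String[] args) {
        // Hypothetical font names, as they might be reported for an embedded subset.
        System.out.println(baseFontName("ABCDEF+Calibri"));
        System.out.println(baseFontName("CourierNew"));
    }
}
```

Inside the loop, the cleaned name could then be passed to FontRepository.findFont(...) as in the snippet above. That still assumes the font is installed on the machine running the code.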

I hope this will be helpful. Please let us know if you need any further assistance.
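
The subsetting explanation earlier in this reply can be illustrated with a small, library-independent model. This is only a conceptual sketch and does not use the Aspose API: it treats a font subset as the set of characters that already appear in the document and reports which characters of a replacement string fall outside it. Those are the characters that would most likely show up as junk boxes. SubsetCoverage, codePointsOf and missingGlyphs are hypothetical names, and the sample strings are made up.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class SubsetCoverage {
    /** Collects the distinct code points of a string, standing in for the glyphs a subset keeps. */
    static Set<Integer> codePointsOf(String text) {
        Set<Integer> result = new LinkedHashSet<>();
        text.codePoints().forEach(result::add);
        return result;
    }

    /** Returns the code points of the replacement text that the subset has no glyph for. */
    static Set<Integer> missingGlyphs(Set<Integer> subset, String replacement) {
        Set<Integer> missing = codePointsOf(replacement);
        missing.removeAll(subset);
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical document text: the subset only keeps glyphs for these characters.
        Set<Integer> subset = codePointsOf("Invoice KEY1");
        for (int cp : missingGlyphs(subset, "Invoice 2017")) {
            System.out.println("No glyph in subset for: " + new String(Character.toChars(cp)));
        }
    }
}
```

In this model the digits 2, 0 and 7 are missing from the subset, which is why switching to the complete system font before editing avoids the problem.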

sarvanan.murthi.ext · November 22, 2017, 11:34am

Thanks… This works!!!

Farhan.Raza · November 22, 2017, 7:30pm

@sarvanan.murthi.ext

Thanks for sharing your feedback.

It is good to know that the suggested approach has resolved your issue. Please keep using our API, and feel free to ask if you have any further queries.