PDF file size grows after simple replacement

Hello, we have a utility that replaces some text with other text. We call the replacement twice on one document. When we set ExpertSans-Regular, the file size grows by 4KB. For Japanese, however, we set MS-Gothic on the matching text fragment, and in that case the file size grows by more than 100KB on every run, which is too much. Is there any way to avoid this file size increase?

Code snippet:

Document pdfDocument = new com.aspose.pdf.Document(documentStream);
for (String placeholderToReplace : textReplacementMap.keySet()) {
    String textReplacement = textReplacementMap.get(placeholderToReplace);
    // Search all pages for the placeholder text
    TextFragmentAbsorber textFragmentAbsorber = new com.aspose.pdf.TextFragmentAbsorber(placeholderToReplace);
    TextSearchOptions textSearchOptions = new com.aspose.pdf.TextSearchOptions(true);
    textFragmentAbsorber.setTextSearchOptions(textSearchOptions);
    pdfDocument.getPages().accept(textFragmentAbsorber);
    com.aspose.pdf.TextFragmentCollection textFragmentCollection =
            textFragmentAbsorber.getTextFragments();
    // Replace the text of every match and set its font
    for (com.aspose.pdf.TextFragment textFragment : (Iterable<com.aspose.pdf.TextFragment>) textFragmentCollection) {
        String foundText = textFragment.getText();
        textFragment.setText(textReplacement);
        textFragment.getTextState().setFont(FontRepository.findFont("ExpertSans-Regular"));
        //textFragment.getTextState().setFont(FontRepository.findFont("MS-Gothic"));
    }
}
pdfDocument.save(outputStream);
result.setByteResponse(outputStream.toByteArray());

Hi Marek,


Thanks for contacting support.

Could you please share the resource file so that we can test the scenario in our environment? We are sorry for the inconvenience.

The test PDF file is attached to the original post (I do not see an add-attachment action in Quick Reply). I replace the timestamp placeholder at the top right of the first page, 作成: yyyy年MM月dd日 HH:mm GMT, with the actual timestamp, and then I replaced that timestamp again a few times to see how it behaves. The file size increases every time. (This is on Windows 7 with JDK 7, but the target environment is Linux.)
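
To put numbers on the repeated-run growth described above, the same operation can be applied several times while recording the output size after each pass. The following is only a minimal, library-agnostic sketch: `SizeGrowthProbe`, `measure` and `deltas` are hypothetical names, and the `UnaryOperator<byte[]>` argument stands in for whatever load/replace/save round trip is under test (the sketch does not call Aspose.Pdf itself). It assumes Java 8 or later because of the functional interface.

```java
import java.util.function.UnaryOperator;

/** Hypothetical helper: records how the byte size changes over repeated passes. */
final class SizeGrowthProbe {

    /**
     * Applies {@code pass} to the document {@code runs} times.
     * Index 0 of the result is the input size; index i is the size after the i-th pass.
     */
    static long[] measure(byte[] original, UnaryOperator<byte[]> pass, int runs) {
        long[] sizes = new long[runs + 1];
        byte[] current = original;
        sizes[0] = current.length;
        for (int i = 1; i <= runs; i++) {
            current = pass.apply(current);   // output of one pass feeds the next
            sizes[i] = current.length;
        }
        return sizes;
    }

    /** Returns the bytes added by each pass, i.e. sizes[i] - sizes[i - 1]. */
    static long[] deltas(long[] sizes) {
        long[] result = new long[Math.max(0, sizes.length - 1)];
        for (int i = 1; i < sizes.length; i++) {
            result[i - 1] = sizes[i] - sizes[i - 1];
        }
        return result;
    }
}
```

A roughly constant delta per pass would suggest that each run adds the same data again, whereas a delta that falls to zero after the first pass would suggest the data is only added once.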

Hi Marek,

Thanks for sharing the resource file.

I have tested the scenario using Aspose.Pdf for Java 11.4.0 and, as per my observations, the file size is actually reduced to 176KB when I replace the time placeholder with a sample string. Please note that for testing purposes I used the Arial font, and the issue might be caused by the custom ExpertSans-Regular font that you are using. Could you please share the font file so that we can try reproducing the issue in our environment again? We are sorry for the inconvenience.

For your reference, I have also attached the sample output generated on my end.

[Java]

Document pdfDocument = new com.aspose.pdf.Document("c:/pdftest/JPLandscape.pdf");

TextFragmentAbsorber textFragmentAbsorber = new com.aspose.pdf.TextFragmentAbsorber("作成: yyyy年MM月dd日 HH:mm GMT");
TextSearchOptions textSearchOptions = new com.aspose.pdf.TextSearchOptions(true);
textFragmentAbsorber.setTextSearchOptions(textSearchOptions);
pdfDocument.getPages().accept(textFragmentAbsorber);

com.aspose.pdf.TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();

for (com.aspose.pdf.TextFragment textFragment : (Iterable<com.aspose.pdf.TextFragment>) textFragmentCollection) {
    String foundText = textFragment.getText();
    textFragment.setText("Update String");
    textFragment.getTextState().setFont(FontRepository.findFont("Arial"));
    // textFragment.getTextState().setFont(FontRepository.findFont("MS-Gothic"));
}

pdfDocument.save("c:/pdftest/TextReplaced.pdf");

I found that MS Gothic is a 9MB ttc file, but the attachment filter rejects it because it does not allow the ttc file suffix. I will try a zip.

I added the MS-Gothic and Expert Sans Regular font files. I found the corresponding font files in Control Panel -> Fonts. For MS Gothic it is a ttc file, and for Expert Sans Regular one ttf file is referenced. There are also files with other suffixes (pfm, pfb; PS files?) for Expert Sans Regular.

I tested the optimization options suggested in the Developer Guide to see if they make any difference. For testing I use the JP file after one update, which grows from 127KB to 243KB. When I use only:

pdfDocument.optimize();
Document.OptimizationOptions opt = new Document.OptimizationOptions();
opt.setRemoveUnusedObjects(true);
opt.setRemoveUnusedStreams(true);
opt.setLinkDuplcateStreams(true);
the file size changes only slightly, to 238KB. When I add:
opt.setUnembedFonts(true);
the file size drops to 181KB, but Acrobat Reader then displays a different result. That obviously depends on which fonts are available on my system.

Anyway, it looks like a bug: setting a font that is already used in the document on a fragment should not duplicate the font in the document, i.e. it should not increase the file size so much.
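
One rough way to check the duplication suspicion is to count font-program references in the raw PDF bytes before and after a replacement. In a PDF font descriptor, an embedded font program is referenced through the `/FontFile`, `/FontFile2` (TrueType) or `/FontFile3` key, so a rising count hints that additional embedded font descriptors were written. The sketch below is only a heuristic under stated assumptions: `PdfTokenCounter` and `count` are hypothetical names, the scan sees only uncompressed parts of the file (descriptors stored inside compressed object streams are invisible to it), and several descriptors may point at the same font stream.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: counts non-overlapping occurrences of an ASCII token in raw bytes. */
final class PdfTokenCounter {

    static int count(byte[] data, String token) {
        byte[] needle = token.getBytes(StandardCharsets.US_ASCII);
        if (needle.length == 0) {
            return 0;
        }
        int hits = 0;
        for (int i = 0; i + needle.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                hits++;
                i += needle.length - 1;   // skip past this occurrence
            }
        }
        return hits;
    }
}
```

The bytes of each file can be loaded with `java.nio.file.Files.readAllBytes` and passed to `count` with the token `"/FontFile"`, which matches all three key variants.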

Thanks

Marek

Hi Marek,


Thanks for sharing the resource files.

I have tested the scenario and managed to reproduce the size increase issue. I have logged it as PDFNEWNET-40766 in our issue tracking system. We will look further into the details of this problem and keep you posted on the status of the fix. Please be patient and allow us a little time. We are sorry for the inconvenience.