Hi Team,
We have observed that the searchable PDFs we generate are much larger than expected. We understand that a searchable PDF carries an additional text layer, but the output is about 10x the size we expect. This was acceptable while we were processing small TIFF, PNG, and PDF files, but now that we process TIFF or PDF files of 50 to 400+ pages, the searchable PDF grows out of all proportion. This is a showstopper for us.
On further analysis, we found that a searchable PDF created by Adobe Acrobat is much smaller than the one we create with Aspose.PDF/OCR. It looks like Aspose is still writing PDF version 1.4, which may be why its output is larger than the PDF version 1.7 file we create manually with Adobe.
analysis.zip (546.1 KB)
I can't send the PDF files themselves because they contain confidential data, but I have masked the relevant details in analysis.zip. It contains 3 screenshots:
- SizeDifference.png
- AsposeCreatedPDF.png (version and size… details, Aspose created)
- AdobeCreatedFilePDF.png (version and size… details, created manually with an Adobe tool)
Could you please help us fix this size issue and shrink the PDF as much as possible while keeping good readability and color balance?
Below is the code snippet we use for PDF optimization:
/// <summary>
/// Optimize the Searchable PDF
/// </summary>
/// <param name="PdfToSearchablePdfStream">Stream holding the searchable PDF; the optimized document is saved back into it.</param>
private void OptimizePdf(MemoryStream PdfToSearchablePdfStream)
{
    Document doc = new(PdfToSearchablePdfStream);
    GoToAction action = new(new XYZExplicitDestination(1, 0, 0, 1.5)); // Managing Zoom: 1 = 100%
    doc.OpenAction = action;
    OptimizationOptions optimizationOptions = new OptimizationOptions
    {
        LinkDuplcateStreams = true,
        RemoveUnusedObjects = true, // This helps
        AllowReusePageContent = true,
        CompressObjects = true,
        UnembedFonts = true
    };
    optimizationOptions.ImageCompressionOptions.ResizeImages = true;
    optimizationOptions.ImageCompressionOptions.MaxResolution = _podService.GetPdfMaxResolution(); //240
    optimizationOptions.ImageCompressionOptions.CompressImages = true;
    optimizationOptions.ImageCompressionOptions.Encoding = ImageEncoding.Unchanged;
    optimizationOptions.ImageCompressionOptions.ImageQuality = _podService.GetPdfQuality(); //20
    optimizationOptions.ImageCompressionOptions.Version = Aspose.Pdf.Optimization.ImageCompressionVersion.Fast;
    foreach (var page in doc.Pages)
    {
        foreach (var annotation in page.Annotations)
        {
            annotation.Flatten();
        }
    }
    if (doc.Form.Fields.Count() > 0)
    {
        foreach (var item in doc.Form.Fields)
        {
            item.Flatten();
        }
    }
    doc.OptimizeResources(optimizationOptions);
    //doc.Optimize();
    doc.Save(PdfToSearchablePdfStream);
}
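Since the files themselves cannot be shared, the version and size shown in the screenshots could also be read programmatically. The following is only a minimal sketch and not part of our pipeline: the class and method names (PdfHeaderInfo, ReadHeaderVersion) are hypothetical, and it uses only the .NET base library, not Aspose. It reads the version declared in the "%PDF-x.y" header at the start of the file. Note that a PDF can override the header version through a /Version entry in its catalog, which this sketch does not check.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfHeaderInfo
{
    // Returns the version declared in the "%PDF-x.y" header (for example "1.4"),
    // or null when no header is found in the first 1024 bytes.
    // Assumes the stream is seekable; its position is restored before returning.
    public static string ReadHeaderVersion(Stream pdf)
    {
        long originalPosition = pdf.Position;
        pdf.Position = 0;
        byte[] buffer = new byte[1024];
        int read = pdf.Read(buffer, 0, buffer.Length);
        pdf.Position = originalPosition;

        string head = Encoding.ASCII.GetString(buffer, 0, read);
        int marker = head.IndexOf("%PDF-", StringComparison.Ordinal);
        if (marker < 0)
        {
            return null;
        }

        // Collect the digits and dots that follow the marker.
        int start = marker + 5;
        int end = start;
        while (end < head.Length && (char.IsDigit(head[end]) || head[end] == '.'))
        {
            end++;
        }
        return end > start ? head.Substring(start, end - start) : null;
    }
}
```

Calling PdfHeaderInfo.ReadHeaderVersion(stream) together with stream.Length before and after OptimizePdf would give the same version and size figures as the screenshots. The version number alone does not determine file size, but PDF 1.5 and later allow object streams and cross-reference streams, which is one possible reason a 1.7 file can be more compact.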