Large MultiPage PDF generated using Aspose PDF 7.2

Hi,


I have been using Aspose.Pdf 7.2 for .NET Framework 4. I have a batch of images (approximately 55 pages) that I convert to PDF. When they are converted to a multipage PDF, the file size becomes significantly large.

For example, when the attached sample image is used to create a 55-page PDF, the resulting multipage PDF is very large.

Please let us know how we can achieve optimum quality along with minimum size.

Hi Dinesh,


Thanks for your inquiry. Please use the new DOM approach to convert TIFF to PDF and optimize the output PDF document as follows. It will help you accomplish the task.

// Requires: using System.IO; using Aspose.Pdf;
string outFile = myDir + "ImagetoPDFDOM.pdf";
string inFile = myDir + "img20140822_18405542.tif";

Document doc = new Document();
Page page = doc.Pages.Add();

// Set margins to zero and use an A4 page so the image will fit
page.PageInfo.Margin.Bottom = 0;
page.PageInfo.Margin.Top = 0;
page.PageInfo.Margin.Left = 0;
page.PageInfo.Margin.Right = 0;
page.PageInfo.Height = Aspose.Pdf.PageSize.A4.Height;
page.PageInfo.Width = Aspose.Pdf.PageSize.A4.Width;

// Create an image object
Aspose.Pdf.Image image1 = new Aspose.Pdf.Image();

// Add the image to the Paragraphs collection of the page
page.Paragraphs.Add(image1);

// Set the path of the source image file
image1.File = inFile;

// Save to memory and reload so the embedded images can be recompressed
MemoryStream ms = new MemoryStream();
doc.Save(ms);
doc = new Document(ms);

foreach (Page page1 in doc.Pages)
{
    int idx = 1;
    foreach (Aspose.Pdf.XImage image in page1.Resources.Images)
    {
        using (var imageStream = new MemoryStream())
        {
            // To compress images change the image type and resolution
            image.Save(imageStream, System.Drawing.Imaging.ImageFormat.Png, 72);
            imageStream.Seek(0, SeekOrigin.Begin);

            // Control image quality for better compression
            page1.Resources.Images.Replace(idx, imageStream, 80);
        }
        idx = idx + 1;
    }
}

doc.OptimizeResources();
doc.Save(outFile);

Please feel free to contact us for any further assistance.


Best Regards,

Thanks for your reply, and sorry for the late response.


I have tried your approach, but it looks like certain API properties or methods are missing in Aspose.Pdf 7.2.

For example:
1. page.PageInfo - this property is missing in Aspose.Pdf 7.2.
2. page1.Resources.Images.Replace(idx, imageStream, 80); - in Aspose.Pdf 7.2 this method takes only 2 arguments instead of 3.

Could you please tell me which newer version I need to use?

Also, let me know how I can achieve the same thing in Aspose.Pdf 7.2, because that is the version I am using.

Hi Dinesh,

I have tested the scenario with the latest release, Aspose.Pdf for .NET 9.5.0, and all the methods and properties work fine in it. I am afraid we might not be able to fix issues in older releases, and we always encourage our customers to use the latest release. Please download the latest release and try it.

You may also consider requesting a 30-day temporary license to test the API without any limitations. For more information, please see "Get a temporary license".

Hi,


With the suggested code approach, we get a good result for the output PDF size, but the memory consumption is very large. For 55 different single-page TIFFs, peak memory consumption during conversion is approximately 5 GB.

Also, color images get stretched during PDF conversion.

Below is a code snippet using the PDF Generator object, with which the attached images are not stretched. However, with the suggested code approach (as per the initial reply, using the PDF Document object), along with minor modifications to the image height and width, the color image gets stretched.

Dim m_Stream As MemoryStream = Nothing
Dim pdfDocumentGenerator = New Aspose.Pdf.Generator.Pdf()
Dim modifiedFilePath As String = destPath + "\ResultMultiPageFile" + ".PDF"
Try
    If Not System.IO.Directory.Exists(destPath) Then
        System.IO.Directory.CreateDirectory(destPath)
    End If

    If System.IO.File.Exists(modifiedFilePath) Then
        System.IO.File.Delete(modifiedFilePath)
    End If

    For Each FilePath As String In FileNames
        Dim m_BinaryData As Byte() = System.IO.File.ReadAllBytes(FilePath)
        m_Stream = New MemoryStream(m_BinaryData)
        Dim inBitmap As New Bitmap(m_Stream)

        Dim pdfGeneratorSection As New Aspose.Pdf.Generator.Section(pdfDocumentGenerator)

        'Set margins so image will fit, etc.
        pdfGeneratorSection.PageInfo.Margin.Top = 0
        pdfGeneratorSection.PageInfo.Margin.Bottom = 0
        pdfGeneratorSection.PageInfo.Margin.Left = 0
        pdfGeneratorSection.PageInfo.Margin.Right = 0

        'pdfGeneratorSection.PageInfo.PageWidth = Aspose.Pdf.PageSize.A4.Width
        'pdfGeneratorSection.PageInfo.PageHeight = Aspose.Pdf.PageSize.A4.Height
        pdfDocumentGenerator.Sections.Add(pdfGeneratorSection)

        Dim pdfImage As New Aspose.Pdf.Generator.Image(pdfGeneratorSection)

        Dim refWidth As Integer = CInt(pdfGeneratorSection.PageInfo.PageWidth)
        If inBitmap.Width > refWidth Then
            pdfImage.ImageInfo.FixWidth = refWidth
            pdfImage.ImageInfo.FixHeight = (inBitmap.Height * refWidth) / pdfImage.ImageInfo.FixWidth
        End If

        Dim nPercent As Single = 0
        nPercent = (CSng(pdfImage.ImageInfo.FixWidth) / CSng(inBitmap.Width))

        pdfImage.ImageInfo.FixWidth = CInt(inBitmap.Width * nPercent)
        pdfImage.ImageInfo.FixHeight = CInt(inBitmap.Height * nPercent)

        If pdfImage.ImageInfo.FixWidth > pdfGeneratorSection.PageInfo.PageWidth Then
            pdfImage.ImageInfo.FixWidth = pdfGeneratorSection.PageInfo.PageWidth
        End If

        'Add the image into paragraphs collection of the section
        pdfGeneratorSection.Paragraphs.Add(pdfImage)
        pdfImage.ImageInfo.ImageFileType = Aspose.Pdf.Generator.ImageFileType.Tiff
        pdfImage.ImageInfo.ImageStream = m_Stream
        pdfImage.ImageInfo.IsBlackWhite = False
    Next

    'Save the PDF Generator object to the output PDF file
    pdfDocumentGenerator.Save(modifiedFilePath)

Catch ex As Exception
    'Rethrow without resetting the stack trace
    Throw
Finally
    If m_Stream IsNot Nothing Then m_Stream.Dispose()
End Try
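
For reference, stretching of this kind typically appears when width and height end up scaled by different factors. The following is only a minimal sketch of aspect-ratio-preserving scaling, independent of any Aspose API. The `ImageFit` class and `FitWithin` method are hypothetical names introduced for illustration, the values in `Main` are sample numbers rather than measurements from the attached image, and all dimensions are assumed to be in the same units.

```csharp
using System;

public static class ImageFit
{
    // Returns { width, height } of the largest size that fits inside the page
    // while keeping the image's original aspect ratio.
    // Images that already fit are returned unchanged (never enlarged).
    public static double[] FitWithin(double imageWidth, double imageHeight,
                                     double pageWidth, double pageHeight)
    {
        if (imageWidth <= 0 || imageHeight <= 0)
            throw new ArgumentException("Image dimensions must be positive.");

        // One scale factor for both axes; using different factors causes stretching.
        double scale = Math.Min(1.0,
            Math.Min(pageWidth / imageWidth, pageHeight / imageHeight));

        return new[] { imageWidth * scale, imageHeight * scale };
    }

    public static void Main()
    {
        // Sample values only (hypothetical image and page sizes).
        double[] size = FitWithin(2480, 3508, 595, 842);
        Console.WriteLine("{0:F1} x {1:F1}", size[0], size[1]);
    }
}
```

The resulting width and height could then be assigned to whichever fixed-size properties the image object exposes, so that both axes use the same factor.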

Hi Dinesh,


Thanks for sharing the details.

I have tested the scenario using the latest release, Aspose.Pdf for .NET 9.6.0, with the img20140822_18405542.tif image file shared in your first post. During conversion using the DOM approach, I did not notice high memory/CPU utilization, nor did I notice the image stretch issue.

Besides this, I have also tried the code snippet you shared in the above post (based on the Aspose.Pdf.Generator namespace), and I was also unable to reproduce the issue with this approach. Can you please share some further details that can help us replicate this issue at our end? We are sorry for this inconvenience.

Hi,

Thanks for testing. Looking at your output PDF files, it seems you have tried with a single-page TIFF only.

Let me explain in detail.

As per my earlier post, the issue occurs when multiple single-page TIFF documents are converted to one multipage PDF document. In that case, the conversion takes a large amount of memory.

Consider the scenario below:

1. Take the single-page uncompressed image that I gave you in my first post, "img20140822_18405542.tif".
2. Copy the same document approximately 55 times so that you have 55 single-page TIFF images in a folder.
3. Now convert those 55 single-page TIFF documents to PDF so that the output is a single multipage PDF containing 55 pages.

Try this scenario with the code approach suggested in your first reply, and also with the code snippet I shared with you (based on the Aspose.Pdf.Generator namespace).
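
Step 2 of the scenario can be scripted so the test folder is easy to recreate. This is only a sketch using standard System.IO calls; the `TiffCopies` class, the `MakeCopies` method, and the `page_NN` file naming are hypothetical choices made for illustration.

```csharp
using System;
using System.IO;

public static class TiffCopies
{
    // Copies sourceFile into targetDir `count` times (page_01, page_02, ...)
    // and returns the paths of the copies.
    public static string[] MakeCopies(string sourceFile, string targetDir, int count)
    {
        Directory.CreateDirectory(targetDir);
        string extension = Path.GetExtension(sourceFile);
        var paths = new string[count];

        for (int i = 0; i < count; i++)
        {
            string name = string.Format("page_{0:D2}{1}", i + 1, extension);
            paths[i] = Path.Combine(targetDir, name);
            File.Copy(sourceFile, paths[i], true); // overwrite an existing copy
        }
        return paths;
    }

    public static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("Usage: TiffCopies <source.tif> <targetDir>");
            return;
        }
        string[] copies = MakeCopies(args[0], args[1], 55);
        Console.WriteLine("Created {0} files in {1}", copies.Length, args[1]);
    }
}
```

The returned array can be passed as the FileNames collection used in the conversion loops in this thread.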

Let me know if you need more details.

Hi Dinesh,


Thanks for sharing the details.

I have tested the scenario using Aspose.Pdf for .NET 9.6.0 with 55 copies of the attached TIFF image and, as per my observations, memory consumption rises by 700 MB when performing the conversion on an Intel Core i5, 2.5 GHz machine with 4 GB of RAM. I used a Visual Studio 2010 application targeting .NET Framework 2.0. For the sake of correction, I have logged it as PDFNEWNET-37458 in our issue tracking system. We will look further into the details of this matter and keep you posted on the status of the correction.

For testing purposes, I used Aspose.Pdf.Generator code to perform the conversion. We are sorry for this inconvenience.
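
Since the memory and timing figures reported in this thread differ between machines, it may help to measure them the same way on both sides. The sketch below uses only System.Diagnostics; the `ConversionProbe` class and `Measure` method are hypothetical names, and the workload in `Main` is a placeholder standing in for the actual TIFF-to-PDF conversion. The peak working set is a process-wide, approximate figure, so treat the result as indicative only.

```csharp
using System;
using System.Diagnostics;

public static class ConversionProbe
{
    // Runs the given action and returns
    // { elapsed milliseconds, growth of the process peak working set in bytes }.
    public static long[] Measure(Action conversion)
    {
        Process self = Process.GetCurrentProcess();
        self.Refresh();
        long peakBefore = self.PeakWorkingSet64;

        Stopwatch timer = Stopwatch.StartNew();
        conversion();
        timer.Stop();

        self.Refresh(); // reload the cached process counters
        long peakAfter = self.PeakWorkingSet64;

        return new[] { timer.ElapsedMilliseconds, peakAfter - peakBefore };
    }

    public static void Main()
    {
        // Placeholder workload; replace with the real conversion call.
        long[] result = Measure(() =>
        {
            byte[] buffer = new byte[50 * 1024 * 1024];
            for (int i = 0; i < buffer.Length; i += 4096) buffer[i] = 1; // touch pages
        });

        Console.WriteLine("Elapsed: {0} ms, peak working set growth: {1} MB",
            result[0], result[1] / (1024 * 1024));
    }
}
```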

Hi Dinesh,


I have also observed that the images in the PDF file generated with the DOM approach are blurred compared to the output generated with the Aspose.Pdf.Generator namespace. For the sake of correction, I have logged this issue as PDFNEWNET-37460.

Besides this, I have observed that with the DOM approach, the conversion of 55 TIFF images to PDF takes around 11 minutes 38 seconds. I have logged this problem separately as PDFNEWNET-37459.

We will look further into the details of these issues and keep you posted on the status of the correction. Please be patient and allow us a little time. We are sorry for this inconvenience.

Any updates?


With the suggested code approach (as per the initial reply, using the PDF Document object), the conversion consumed the largest amount of memory. Below are a few machine and OS details. We used Visual Studio 2012 targeting .NET Framework 4.0.

OS: Windows 7 Enterprise Service Pack 1
Processor: AMD Athlon™ II X2 250 Processor 3.0 GHz
RAM: 8 GB

Hi Dinesh,


When using the DOM approach, the conversion process also takes too much time and system resources. The problems have been logged in our issue tracking system and the development team is looking further into the details of these issues. As soon as we have some definite news regarding their resolution, we will update you within this forum thread.

Are there any updates regarding the issues logged as part of this thread?

Hi Dinesh,


Thanks for your inquiry. I am afraid the issues you reported above are still not resolved because other issues are under investigation/resolution. However, we have requested our development team to complete the investigation of these issues and share an ETA or initial findings at their earliest. We will notify you as soon as we get feedback.

Thanks for your patience and cooperation.

Best Regards,

Hi Dinesh,


We have further investigated the issue PDFNEWNET-37458. To resolve this problem, please try using the new Document Object Model of the Aspose.Pdf namespace, as it consumes less than 300 MB of memory and executes in 40 seconds:

[C#]

string outFile = "37458.pdf";
string inFile = "Generator 37458.tif";

Document doc = new Document();
Page page = doc.Pages.Add();

MarginInfo margin = new MarginInfo();
margin.Top = 0;
margin.Bottom = 0;
margin.Left = 0;
margin.Right = 0;
page.PageInfo.Margin = margin;

// Create an image object
Image image1 = new Image();

// Add the image object to the Paragraphs collection of the page 55 times
for (int i = 0; i < 55; i++)
    page.Paragraphs.Add(image1);

// Set the path of the image file
image1.File = inFile;

doc.Save(outFile);

The issues you have found earlier (filed as PDFNEWNET-37458) have been fixed in Aspose.Pdf for .NET 10.0.0.



Thanks for your support.


What about the problems below, which were logged as part of this thread? Are they resolved in 10.0 as well?
1. PDFNEWNET-37459
2. PDFNEWNET-37460


Hi Dinesh,

Thanks for your patience.

The above-stated issues are under investigation, but I am afraid they are not yet resolved. Nevertheless, I have requested the development team to share a possible ETA. As soon as we have further updates regarding their resolution, we will let you know.

Hi Dinesh,

Thanks for your patience.

I have further discussed the status of the above-stated issues with the development team and, as per our estimates, we plan to resolve them in April 2015; the fix will become part of Aspose.Pdf for .NET 10.4.0, which is planned for May 2015. As soon as we have further updates, we will let you know.

We have tried the suggested code approach (DOM approach) mentioned above with the latest Aspose.Pdf 10.3, but I am still facing the memory issue. Memory usage increases to over 1 GB, and at one point it throws an out-of-memory error for just the 55 sample TIFF images that I attached in my first post. Refer to the earlier posts in this thread for more detail.


Here is the code snippet I tried.

Dim pdfDocument As New Aspose.Pdf.Document()

For Each FilePath As String In FileNames
    Dim pdfPage As Aspose.Pdf.Page = pdfDocument.Pages.Add()

    'Set margins so image will fit, etc.
    pdfPage.PageInfo.Margin.Top = 0
    pdfPage.PageInfo.Margin.Bottom = 0
    pdfPage.PageInfo.Margin.Left = 0
    pdfPage.PageInfo.Margin.Right = 0

    Dim pdfImage As New Aspose.Pdf.Image()
    'Add the image into paragraphs collection of the page
    pdfPage.Paragraphs.Add(pdfImage)
    pdfImage.File = FilePath
Next
pdfDocument.Save(DestPath)

Kindly consider this as a CRITICAL issue.

Hi Dinesh,


Thanks for sharing the details.

I have tested the scenario using the img20140822_18405542.tif image file shared in the first post, with the following code snippet and Aspose.Pdf for .NET 10.3.0, and I am unable to notice any issue. As per my observations, the process completes in 1 minute 20 seconds.

[VB.NET]

Dim pdfDocument As Document = New Document()

'For Each FilePath As String In FileNames
Dim pdfPage As Aspose.Pdf.Page = pdfDocument.Pages.Add()

'Set margins so image will fit, etc.
pdfPage.PageInfo.Margin.Top = 0
pdfPage.PageInfo.Margin.Bottom = 0
pdfPage.PageInfo.Margin.Left = 0
pdfPage.PageInfo.Margin.Right = 0

Dim pdfImage As New Aspose.Pdf.Image()
pdfImage.File = "c:/pdftest/img20140822_18405542.tif"

Dim counter As Integer
For counter = 1 To 55
    'Add the image into paragraphs collection of the page
    pdfPage.Paragraphs.Add(pdfImage)
Next

pdfDocument.Save("c:/pdftest/55_TIFF_Image_test.pdf")