Performance and File Size Issues with Aspose.PDF 6.2

Hi,

Currently I am testing my code with Aspose.PDF 6.2, which we recently bought. I am having a few issues with performance and file size that I didn't see when working with Aspose.PDF.toolkit (I'm not sure which version; it was a trial and I have already removed it).

  1. It's extremely slow. I am concatenating around 1000 PDFs, each averaging 6 pages, into one huge PDF.
  2. The output file size is huge. I'm not sure why, as it wasn't like this when I tested with Aspose.PDF.Toolkit.

Please let me know if the new build will address these issues.

Hi Ujjwal,

Thanks for using our products.

We are about to release a new version, Aspose.Pdf for .NET 6.3.0, with some improvements/enhancements. Please try the new version, and if the problem still persists, please feel free to contact us. We are really sorry for this inconvenience.

Hi Ujjwal,

Thanks for your patience. Please visit the following link to download the latest release, Aspose.Pdf for .NET 6.3.0, and if you still encounter any issue or have any further query, please feel free to contact us.

Hi Nayyer,

Concatenation in Aspose.PDF 6.3 has definitely improved performance, by about a hundredfold. Thank you for such a great turnaround.

But I think there is still an open question about the file size; I'm not sure whether that issue is still pending, but here is what I am seeing.

I am using Aspose.Words build 10.1.1 and Aspose.Pdf 6.3. I use Aspose.Words to save each document as a PDF and then the PDF module to merge those PDFs into one large PDF.

Aspose.Words 10.1.1 and Aspose.Pdf 6.3, compiled against .NET 3.5
Machine: Windows XP SP3, 3.0 GHz, 2.5 GB RAM

Currently the file size multiplies. For example:

if I have a 35 KB Word document, Aspose.Words will save it as, say, an 80 KB PDF file; if you then merge that PDF 500 times, you get a single PDF of around 38 MB. If I have 2000 files to merge into one PDF, that file would be huge. Is this something you are aware of, and will it be fixed in a new/later build?
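A rough back-of-the-envelope check of the figures above (editorial arithmetic, assuming 1 MB = 1024 KB and that every file is about 80 KB) shows they are consistent with each merged copy adding roughly its full size to the output:

$$
\begin{aligned}
500 \times 80\,\text{KB} &= 40{,}000\,\text{KB} \approx 39\,\text{MB} \quad (\text{observed: about } 38\,\text{MB})\\
2000 \times 80\,\text{KB} &= 160{,}000\,\text{KB} \approx 156\,\text{MB}
\end{aligned}
$$

If that linear growth holds, merging 2000 such files would produce a PDF of roughly 156 MB.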

Let me know if there are any questions.

Any updates on this?

Hi Ujjwal,

Thanks for sharing the details.

As a workaround, you may consider first merging all the source Word files into a single document and then converting it to PDF format. I am not entirely certain about the result, but it may help you produce a smaller resultant file. For more information, please visit
Joining and Appending Documents

If you have any further query, please feel free to contact us. We apologize for the inconvenience.

Hi Nayyer,

I have already tried that process; in fact, that is the process currently used in production to merge a large number of Word documents into one PDF. But it is very unstable and time-consuming.

The current run time in production is around 15 minutes just to merge around 6000 pages. 6000 pages is nothing compared to what we want to achieve; we want to be able to merge 20,000 pages and more.

And there are always random errors when using only Aspose.Words:

  1. When too many formats are used in a Word document, it crashes.
  2. When the letter output grows beyond 8000 pages (sometimes 6000), it uses too much memory and ends up crashing.

This was the primary reason we purchased the PDF product: so that we could process this faster and more efficiently, without having to worry about when the program will crash as the file size grows. I'm very disappointed with Aspose.PDF. We tested with the Aspose.Pdf.Kit module, which ended up being discontinued, so we went with the Aspose.Pdf module thinking it would be the same. That was our mistake; we should have tested it before making this purchase. What do I say to customers or management? This is very frustrating.

Does the Aspose team understand how critical this issue is and how disappointing this is? Can I use Aspose.Pdf.Kit instead of Aspose.Pdf? I don't know how long I will have to wait for this issue to be fixed.

Thanks.

Hi Ujjwal,

Thanks for sharing the details. I have observed that performance decreases while generating the resultant file and that the size of the resultant file increases. For the sake of correction, I have logged this problem as PDFNEWNET-31533 in our issue tracking system. We will look further into this problem and will keep you updated on the status of the correction. We apologize for this inconvenience.

Can I get an update on this issue?

Thanks

Hi Ujjwal,


Thanks for your patience. Our development team is working hard to resolve this issue, but I am afraid it is not yet completely fixed. Nevertheless, I have asked the development team to share an ETA for its resolution. Please be patient and give us a little more time; you will be updated with the status of the correction soon. We apologize for this inconvenience.

Hi

Can you please share an ETA for this issue?

Thanks

Hi Ujjwal,

We have contacted our team for the latest status and the ETA. We’ll get back to you shortly.

Regards,

Hi Ujjwal,

Our development team has analyzed your issue, and the performance enhancement has been scheduled in the development plan for the coming month (January 2012). Hopefully, this enhancement will be included in our monthly release of February 2012. We will update you once the enhancement is implemented and released for download.

Thank You & Best Regards,

Thank you Nausherwan for the status update. Appreciate it.

Thanks

The issues you found earlier (filed as PDFNEWNET-31533) have been fixed in this update.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.

Hi

I just tested the latest Aspose.PDF 6.7, and file concatenation performance has improved, but the file size hasn't changed much: the 6000-page PDF is around 70 MB.

The issue is still not fixed.

Thanks

Hello Ujjwal,

Thanks for your acknowledgment and sorry for the delay in response.

Please note that in the recent release the concatenation process has been improved (PageTreeNode was reworked to avoid recreating the pages array). In my earlier attempt, I used the following code snippet to reproduce the issue; the source files Sample+Document1.pdf and Sample+Document1 - Copy.pdf are each 80 KB. After executing the code, the resultant Merged-output.pdf was 161 MB.

[C#]

// open the first document
Aspose.Pdf.Document pdfDocument1 = new Aspose.Pdf.Document("d:/pdftest/Sample+Document1.pdf");
for (int i = 0; i <= 2000; i++)
{
    // open the second document (a new instance on every iteration)
    Aspose.Pdf.Document pdfDocument2 = new Aspose.Pdf.Document("d:/pdftest/Sample+Document1 - Copy.pdf");
    // add the pages of the second document to the first
    pdfDocument1.Pages.Add(pdfDocument2.Pages);
}
// save the concatenated output file
pdfDocument1.Save("d:/pdftest/Merged-output.pdf");

However, to get better performance and a smaller file, we made a change to the code: the second document is now opened once, before the loop, instead of on every iteration. After execution, the resultant PDF is 8 MB, a remarkable reduction from 161 MB.

[C#]

// open the first document
Aspose.Pdf.Document pdfDocument1 = new Aspose.Pdf.Document("d:/pdftest/Sample+Document1.pdf");
// open the document to be added once, before entering the loop
Aspose.Pdf.Document pdfDocument2 = new Aspose.Pdf.Document("d:/pdftest/Sample+Document1 - Copy.pdf");
for (int i = 0; i <= 2000; i++)
{
    // add the pages of the second document to the first
    pdfDocument1.Pages.Add(pdfDocument2.Pages);
}
// save the concatenated output file
pdfDocument1.Save("d:/pdftest/Merged-output.pdf");
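Dividing each output size by the number of appends gives a rough per-copy cost for the two runs (editorial arithmetic, assuming 1 MB = 1024 KB and ignoring the first document's own ~80 KB; note that `i <= 2000` makes the loop run 2001 times):

$$
\begin{aligned}
\frac{161\,\text{MB}}{2001} &= \frac{164{,}864\,\text{KB}}{2001} \approx 82\,\text{KB per appended copy}\\
\frac{8\,\text{MB}}{2001} &= \frac{8{,}192\,\text{KB}}{2001} \approx 4.1\,\text{KB per appended copy}
\end{aligned}
$$

In the first run each append costs roughly the full 80 KB of the source file, while in the second it costs only about 4 KB. One plausible reading, not confirmed in this thread, is that pages taken repeatedly from a single loaded document share their fonts and images in the output, whereas each freshly opened instance brings its own copy of those resources.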

Nevertheless, if you are still getting a large resultant PDF file, please share the source files and the code snippet you are using so that we can test the scenario at our end. We are really sorry for this inconvenience.

Hi Nayyer

This is what I tried, and the file size is still big. I have 1000 PDF files in the Temp folder, each around 80 KB, so you should be able to recreate this issue. I am using Aspose.Words 10.7 and Aspose.Pdf 6.7.

[VB.NET]

' Requires: Imports System.IO
Dim i As Integer

' Step 1: convert the Word document(s) to PDF, one numbered file per iteration
For i = 0 To 1000 Step 1
    If Not New FileInfo(Environment.CurrentDirectory & "\Temp\" & i & ".pdf").Exists Then
        If New DirectoryInfo(Environment.CurrentDirectory).GetFiles("*.docx").Length > 0 Then
            For Each docxFile As FileInfo In New DirectoryInfo(Environment.CurrentDirectory).GetFiles("*.docx")
                Dim doc As New Aspose.Words.Document(docxFile.FullName)
                If Not New DirectoryInfo(Environment.CurrentDirectory & "\Temp").Exists Then
                    Directory.CreateDirectory(Environment.CurrentDirectory & "\Temp")
                End If
                doc.Save(Environment.CurrentDirectory & "\Temp\" & i & ".pdf")
            Next
        End If
    End If
Next

' #New: merge every PDF in the Temp folder into the first one
i = 0
Dim FirstDocument As Aspose.Pdf.Document = Nothing
Dim NextDocument As Aspose.Pdf.Document = Nothing

For Each pdfFile As FileInfo In New DirectoryInfo(Environment.CurrentDirectory & "\Temp").GetFiles
    If FirstDocument Is Nothing AndAlso i = 0 Then
        FirstDocument = New Aspose.Pdf.Document(pdfFile.FullName)
    ElseIf i > 0 Then
        ' a new Document instance is opened for every file
        NextDocument = New Aspose.Pdf.Document(pdfFile.FullName)
        FirstDocument.Pages.Add(NextDocument.Pages)
        NextDocument = Nothing
        GC.Collect()
    End If
    i += 1
Next

FirstDocument.Save(Environment.CurrentDirectory & "\Output.pdf")

You can find the sample Word document attached to this forum post.

Hello Ujjwal,


Thanks for sharing the code snippet.

I have tested the scenario: I first converted the .docx file into PDF format using Aspose.Words for .NET 10.8.0, which produced an 80 KB PDF. Then I created a copy of that PDF file and used the same code snippet that I shared in my earlier post. The loop iterated 1000 times, and an 8 MB resultant PDF was created in 36 seconds. I tested the scenario on an Intel Dual Core 1.8 GHz machine with 2.5 GB of RAM running Windows XP 32-bit.

Besides this, I also tried the code snippet you shared, but I am afraid I get InvalidPdfFileFormatException 'Startxref not found' when the pages of NextDocument are added to FirstDocument. My understanding is that the resultant PDF grows because you create a new instance of NextDocument for each document in the Temp folder. I think you do need a new instance of NextDocument each time, since in your scenario the PDF documents in the Temp folder may be different files rather than copies of a single PDF. We will look further into this problem and will keep you posted on the status of the correction. We are really sorry for this inconvenience.

PS: In my attempt, I made copies of a single file and loaded it only once before entering the loop; that is why my resultant file is smaller.