Merge PDF files using Aspose.PDF for .NET - API takes a long time to process

Hello,

We have an Aspose.Total license and we’re trying to merge 8 PDFs ranging in size from 12 MB to 115 MB. It is a console application. Here is the code.

Imports System.IO
Imports Aspose.Pdf
Imports Aspose.Pdf.Facades

Module Module1
    Public Const AsposeTotalLicense As String = "Aspose.Total.lic"

    Sub Main()
        Dim folderPath As String = Path.Combine(Path.GetFullPath("..\..\"), "Files")
        ' HH (24-hour clock) keeps AM and PM runs from producing the same file name.
        Dim savePath As String = Path.Combine(folderPath,
            "MergeResult_" & DateTime.UtcNow.ToString("MMddyyHHmmss") & ".pdf")

        SetAsposeLicenses(folderPath)

        ' Input files, merged in this order.
        Dim fileNames As String() = {
            "Schematics Final Design Review Doc_rev A.pdf",
            "S-SPO-0104_CDRL_19-15_Integrated_Schematic_Drawing_Package.pdf",
            "S-SPO-0284_CDRL_19-15_Integrated_Schematic_Drawing_Package_Rev_A5.pdf",
            "TC3 REV A3.pdf",
            "TC3 Rev A4.pdf",
            "TC3 Rev A5.pdf",
            "TCT3_RevA_Draft_14.pdf"
        }

        Dim filePaths As New List(Of String)
        For Each fileName As String In fileNames
            filePaths.Add(Path.Combine(folderPath, fileName))
        Next

        Try
            ' PdfFileEditor writes the merged result directly to savePath, so no
            ' File.WriteAllBytes call is needed afterwards (an empty MemoryStream
            ' written there would overwrite the merged file with 0 bytes).
            PDFConcat(filePaths, savePath)
        Catch ex As Exception
            ' Report the failure instead of silently swallowing it.
            Console.WriteLine("Concatenation failed: " & ex.ToString())
        End Try
    End Sub

    ' Stream-based overload (not used by Main above).
    Public Sub PDFConcat(ByVal streams() As System.IO.Stream, ByVal outStream As System.IO.Stream)
        Dim pdfEditor As New PdfFileEditor()
        pdfEditor.AllowConcatenateExceptions = True
        pdfEditor.Concatenate(streams, outStream)
    End Sub

    ' Path-based overload: reads the input files and writes outputFile.
    Public Sub PDFConcat(ByVal files As List(Of String), ByVal outputFile As String)
        Dim pdfEditor As New PdfFileEditor()
        pdfEditor.Concatenate(files.ToArray(), outputFile)
    End Sub

    Public Sub SetAsposeLicenses(licensePath As String)
        ' Exceptions propagate to the caller with their original stack trace.
        Dim totalFilePath As String = FindAsposeLicense(licensePath, AsposeTotalLicense)

        ' Set the PDF license
        Dim pdfLicense As New Aspose.Pdf.License()
        pdfLicense.SetLicense(totalFilePath)
    End Sub

    Private Function FindAsposeLicense(licensePath As String, fileName As String) As String
        Dim filePath As String = Path.Combine(licensePath, fileName)
        If File.Exists(filePath) Then
            Return filePath
        End If

        Throw New FileNotFoundException(String.Format("Aspose license file '{0}' was not found.", filePath))
    End Function
End Module

The problem is that it keeps running indefinitely (on an 8th-gen i7 with 8 GB of RAM and an SSD, it ran for about 21 minutes). The PDFs to be merged total about 560 MB. Please let me know if there is a workaround.

Regards,

This topic was copied from a free account, so please use this link for previous comments.

@kkurra

We have noticed that an issue has already been logged for the thread you referred to. However, if you are facing the issue with a different set of PDF documents, please share those files with us so that we can test the scenario in our environment and address it accordingly.

Before sharing the sample files, please try concatenating the PDF files with the DOM approach using the latest version of the API.
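Since the code above is VB.NET, here is a minimal sketch of what the DOM approach might look like in that language. It assumes only the Aspose.Pdf `Document` calls that support uses in the C# test code later in this thread (`New Document()`, `New Document(path)`, `Pages.Add(PageCollection)`, `Save`), plus `Dispose`. The module and method names (`DomMergeSketch`, `MergeWithDom`) are hypothetical, and keeping the source documents open until the merged file is saved is a cautious assumption, not documented Aspose guidance. It is not a fix for the slowdown reported in this thread.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports Aspose.Pdf

Module DomMergeSketch
    ' Hypothetical helper: merges every *.pdf file in inputFolder, sorted by
    ' file name (case-insensitive), into outputFile using the Aspose.Pdf
    ' Document object model (DOM) instead of PdfFileEditor.
    Public Sub MergeWithDom(inputFolder As String, outputFile As String)
        Dim inputPaths As String() = Directory.GetFiles(inputFolder, "*.pdf")
        System.Array.Sort(inputPaths, System.StringComparer.OrdinalIgnoreCase)

        ' Assumption: source documents stay open until the merged file is
        ' saved, because the appended pages may still reference them.
        Dim sources As New List(Of Document)()
        Try
            Using merged As New Document()
                For Each inputPath As String In inputPaths
                    Dim source As New Document(inputPath)
                    sources.Add(source)
                    ' Append all pages of this source to the target document.
                    merged.Pages.Add(source.Pages)
                Next
                merged.Save(outputFile)
            End Using
        Finally
            ' Release the source documents once saving is done (or has failed).
            For Each openDocument As Document In sources
                openDocument.Dispose()
            Next
        End Try
    End Sub
End Module
```

The output path should be outside the input folder so that a second run does not pick up the previous merge result as an input.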

Hi,

Here is the Google Drive link for you to look into. This is a set of PDFs that fails to merge.

https://drive.google.com/file/d/1DHdtTW56Uw6f7zPdmfwqHtea0c3oAxFx/view?usp=sharing

Regards,

@kkurra

The files at the link you shared are private and require access permission. Would you please share a link from which the files can be downloaded so that we can test the scenario in our environment and address it accordingly.

Hi,

You should be able to download it. It’s a shared link with everyone as an editor.

@kkurra

We were now able to download the archive (“Merging-Issue-PDFFiles.rar”); however, it was corrupted and could not be opened.

CorruptedArchive.png (5.3 KB)

Would you please try to upload it again in .zip format and share the link with us.

Please find the shared link below:
https://drive.google.com/file/d/1TMBjHQn8Pf8dVAhzAlIP2lW8l22rWd4v/view?usp=sharing

Thanks,

@kkurra

Thanks for re-uploading the files.

We have tested the scenario in our environment using the following recommended code with Aspose.PDF for .NET 20.6.

string[] files = Directory.GetFiles(dataDir + "File");
Document doc = new Document();
foreach (string file in files)
{
    Document mDoc = new Document(file);
    doc.Pages.Add(mDoc.Pages);
}
doc.Save(dataDir + "merged20.6.pdf");

We noticed that the program took a long time, running for more than 20 minutes with maximum memory consumption. Therefore, we have logged an issue as PDFNET-48386 in our issue tracking system for correction. We will investigate the issue further and keep you posted on the status of its rectification. Please be patient and spare us some time.

We are sorry for the inconvenience.

Hi Aspose,

While you’re at it, can you also check https://drive.google.com/file/d/1CRfTb24SXNLr_ftwr2ExTj4fwSrdPZy6/view?usp=sharing. It’s failing too. Please provide an ETA for it as well.

Regards,

@kkurra

We have tested with the file you shared and found that another ticket, PDFNET-48232, has already been logged in our issue tracking system with this file. We have updated the respective ticket with information about this single-file issue. We are investigating the files logged under both tickets and will share updates next week about their resolution.

@asad.ali

We are really interested in getting an ETA so that our tickets move forward. Please give a specific date rather than a month or a week, so that we know when to follow up.

@kkurra

We would like to share with you that the earlier logged tickets are in the investigation phase. We anticipate completing the investigation this week, and once it is complete, we will be in a position to share an ETA. We will inform you in this forum thread as soon as we have additional updates after the investigation.

@kkurra

We would like to share with you that we have completed the investigation of both tickets. We will provide a fix for these issues in the upcoming version of the API, i.e., Aspose.PDF for .NET 20.7, which will be released in the first week of July.

@asad.ali

They have been under investigation for more than a month, and it is very difficult to keep chasing a solution and answering our clients about it. We bought paid support to expedite the process, and we do not see that happening. Is there an escalation contact we can reach out to? Please advise.

@kkurra

We have already shared the ETA with you, which is Aspose.PDF for .NET 20.7. The investigation of the issue has been completed, and we intend to provide its fix in the upcoming release of the API.

Hi,

Do you have a list of tickets that will be resolved in 20.7? We have a number of tickets open.

Regards,

@kkurra

The release of the API has been scheduled for the end of this week, and the release notes will be published along with it. As soon as the release notes are ready, we will inform you.

Any update? I think the first week of July is over.

@kkurra

We would like to share with you that both of your issues (PDFNET-48386, PDFNET-48232) have been resolved in Aspose.PDF for .NET 20.7. We have published the release notes in the API documentation, which you can visit at the given link. As soon as we publish the API for download, we will notify you in this thread.