Extremely poor performance when converting PDF to PDF/A


#1

We are evaluating Aspose.Total for converting files to PDF/A.

When converting from PDF to PDF/A, the performance is really disappointing.

A PDF exported from a Word document with MS Word takes 50-100 seconds to convert.
The same PDF document converted with PDFTron PDFA Manager takes under 1 second.

The resulting PDF/A file is 3.79 MB with Aspose and 1.31 MB with PDFTron.
Both files are valid PDF/A according to Aspose.PDF.

I have attached sample files (the DOCX, the PDF created by exporting from Word, and the rendering class used).

If we can’t speed up this function, Aspose will be unusable for our project.
Any ideas on how to accelerate this?

sample.zip (1.4 MB)


#2

@BOO_Technologies

We have tested the scenario using Aspose.PDF for Java 19.5 and observed that the API took 59 seconds for the conversion in our environment. We have logged the issue as PDFJAVA-38628 in our issue tracking system for investigation. We will look further into the details and keep you posted on the status of its correction. Please be patient and give us a little time.

We are sorry for the inconvenience.


#3

@asad.ali
Thanks for your reply.
Is there any chance that there will be a solution in the next few weeks?
Otherwise we will have to proceed without Aspose.


#4

@BOO_Technologies

Please note that PDFTron converts transparent images to indexed 1-bit-per-pixel images, whereas Aspose.PDF converts similar images into 8-bits-per-pixel RGB images at 300 DPI to provide better conversion quality.
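
To give a feel for why the image format and resolution matter, here is a rough sketch of the raw (uncompressed) pixel-data size of a single rasterized image. It does not call Aspose or PDFTron. The 4 x 3 inch image is hypothetical, and reading "8 bits per pixel RGB" as 8 bits per channel (24 bits per pixel) is an assumption. Real files are compressed, so actual sizes will differ.

```java
// Rough, uncompressed pixel-data size of one rasterized image.
// Image dimensions are hypothetical; 24 bits per pixel for RGB is an assumption.
class RasterSizeSketch {

    /** Bytes of raw pixel data for an image of the given physical size. */
    static long rawBytes(double widthInches, double heightInches, int dpi, int bitsPerPixel) {
        long widthPx = Math.round(widthInches * dpi);
        long heightPx = Math.round(heightInches * dpi);
        long bitsPerRow = widthPx * bitsPerPixel;
        long bytesPerRow = (bitsPerRow + 7) / 8; // each row is padded to a whole byte
        return bytesPerRow * heightPx;
    }

    public static void main(String[] args) {
        double w = 4.0, h = 3.0; // hypothetical 4 x 3 inch image
        System.out.println("1-bit indexed, 300 DPI: " + rawBytes(w, h, 300, 1) + " bytes");
        System.out.println("24-bit RGB,    300 DPI: " + rawBytes(w, h, 300, 24) + " bytes");
        System.out.println("24-bit RGB,    150 DPI: " + rawBytes(w, h, 150, 24) + " bytes");
        System.out.println("24-bit RGB,    100 DPI: " + rawBytes(w, h, 100, 24) + " bytes");
    }
}
```

For this hypothetical image, the 24-bit data at 300 DPI is 24 times the size of the 1-bit data, and 9 times the size of the 24-bit data at 100 DPI, since the pixel count scales with the square of the resolution.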

Furthermore, we have added an option to control this DPI in the upcoming release of the API, i.e. Aspose.PDF for Java 19.6. If you set the resolution to 100 DPI, the API will convert the document 250-270% faster. In our tests, conversion time decreased from 25 seconds to 12 seconds with a DPI value of 150, and to 9 seconds with 100.
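
As a small worked example, the timings quoted above can be turned into a speedup factor and a percentage of time saved. This sketch uses only the three figures from the tests (25 s, 12 s, 9 s); the class and method names are made up for illustration. How a "percent faster" figure reads depends on the baseline chosen, so both forms are shown.

```java
import java.util.Locale;

// Arithmetic on the quoted timings only; names are illustrative.
class SpeedupSketch {

    /** How many times faster the new run is (before / after). */
    static double speedupFactor(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    /** Share of the original run time that was saved, in percent. */
    static double percentTimeSaved(double beforeSeconds, double afterSeconds) {
        return (beforeSeconds - afterSeconds) / beforeSeconds * 100.0;
    }

    public static void main(String[] args) {
        double baseline = 25.0;              // seconds at the default resolution
        double[] measured = {12.0, 9.0};     // seconds at 150 DPI and 100 DPI
        int[] dpi = {150, 100};
        for (int i = 0; i < measured.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                    "%d DPI: %.2fx speedup, %.0f%% of the time saved",
                    dpi[i],
                    speedupFactor(baseline, measured[i]),
                    percentTimeSaved(baseline, measured[i])));
        }
    }
}
```

Going from 25 s to 12 s is roughly a 2.1x speedup, and from 25 s to 9 s roughly 2.8x.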

As soon as the new version of the API is available, we will notify you. It is expected to be released at the end of this month. Please give us a little time.

Also, please consider the following code snippet with the upcoming version of the API to convert a PDF into PDF/A format:

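// dataDir (working folder) and version (label used in the output file name)
// are assumed to be defined elsewhere by the caller.
Document doc = new Document(dataDir + "test3.pdf");
long totalStart = System.currentTimeMillis();
// Convert to PDF/A-1b; the conversion log is written to log.xml
int format = PdfFormat.PDF_A_1B;
PdfFormatConversionOptions opts = new PdfFormatConversionOptions(dataDir + "log.xml",
        format, ConvertErrorAction.Delete);
// Resolution (in DPI) used for transparent images; lower values convert faster
opts.setTransparencyResolution(150);
doc.convert(opts);
long totalEnd = System.currentTimeMillis();
System.out.println("Total time taken was " + (totalEnd - totalStart) / 1000 + " seconds for conversion");
doc.save(dataDir + "Converted" + version + "_dpi150.pdf");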
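
To compare several resolution settings, a small timing helper can be wrapped around the conversion call. The sketch below is only an illustration: the helper and class names are made up, and the workload is a placeholder `Thread.sleep` so that the example runs without the Aspose library. In a real test the placeholder would be replaced by the load, `setTransparencyResolution` and `convert` calls from the snippet above. It reports milliseconds, because the integer division by 1000 in the snippet above truncates to whole seconds.

```java
import java.util.concurrent.TimeUnit;

// Generic wall-clock timing helper; the workload below is a stand-in, not Aspose code.
class ConversionTimerSketch {

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Placeholder for the real conversion; sleeps longer for higher DPI values. */
    static void simulateWork(int dpi) {
        try {
            Thread.sleep(dpi / 10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] resolutions = {300, 150, 100};
        for (int dpi : resolutions) {
            // Replace simulateWork(dpi) with the actual conversion for a real measurement.
            long ms = timeMillis(() -> simulateWork(dpi));
            System.out.println("Resolution " + dpi + " DPI: " + ms + " ms");
        }
    }
}
```

Running each setting several times and discarding the first run would reduce JVM warm-up noise in the measurements.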

#5

Hello,
I have the same problem in C# .NET. This is my code:

    public static bool ConvertPDFA(SPWeb web, string ListId, int ItemId)
    {
        SPList list = web.Lists[new Guid(ListId)];
        SPListItem listItem = list.GetItemById(ItemId);
        bool result = false;
        bool oldAllowUnsafeUpdates = web.AllowUnsafeUpdates;
        try
        {
            if (listItem != null)
            {
                Stream fs = null;
                SPSecurity.RunWithElevatedPrivileges(delegate()
                {
                    SPFolder spFolder = web.Site.WebApplication.Sites.FirstOrDefault().OpenWeb().GetFolder(TLSPropertyBag.GetValueByPropertyKey(web.Site.WebApplication, TLSConstants.WebAppKeys.TLS_Aspose_LicensePath).ToString());
                    SPFile spfile = spFolder.Files.OfType<SPFile>().SingleOrDefault(x => x.Name.Equals(TLSPropertyBag.GetValueByPropertyKey(web.Site.WebApplication, TLSConstants.WebAppKeys.TLS_Aspose_LicenseFileName).ToString()));
                    if (spfile != null)
                        fs = spfile.OpenBinaryStream(SPOpenBinaryOptions.None);                                             
                });
                if (fs != null)
                {
                    Aspose.Pdf.License lic = new Aspose.Pdf.License();
                    lic.SetLicense(fs);
                }
                Stream streamFile = listItem.File.OpenBinaryStream();
                Aspose.Pdf.Document doc = new Aspose.Pdf.Document(streamFile);

                Stream outLog = new System.IO.MemoryStream();

                string format = TLSPropertyBag.GetValueByPropertyKey(web.Site.Url, TLSConstants.WebAppKeys.PDFAFormat);
                Aspose.Pdf.PdfFormat pdfAFormat = Aspose.Pdf.PdfFormat.PDF_A_1B;

                try
                {
                    pdfAFormat = (Aspose.Pdf.PdfFormat)Enum.Parse(typeof(Aspose.Pdf.PdfFormat), format.ToUpper());
                }
                catch (Exception ex)
                {
                    pdfAFormat = Aspose.Pdf.PdfFormat.PDF_A_1B;
                    string methodLog = "Error converting property bag [" + TLSConstants.WebAppKeys.PDFAFormat + "] with value [" + format + "]; default [PDF_A_1B] applied ";
                    SPDiagnosticsService.Local.WriteTrace(0, new SPDiagnosticsCategory(TLSConstants.Log.Category, TraceSeverity.Unexpected, EventSeverity.Error), TraceSeverity.Unexpected, ex.Message + methodLog);
                    SPDiagnosticsService.Local.WriteTrace(0, new SPDiagnosticsCategory(TLSConstants.Log.Category, TraceSeverity.Unexpected, EventSeverity.Error), TraceSeverity.Unexpected, ex.StackTrace + methodLog);
                }


                bool isPDFA = doc.IsPdfaCompliant;
                if (!isPDFA)
                {
                    bool conversionResult = doc.Convert(outLog, pdfAFormat, Aspose.Pdf.ConvertErrorAction.Delete);
                    if (conversionResult)
                    {
                        Stream converted = new System.IO.MemoryStream();
                        doc.Save(converted);
                        web.AllowUnsafeUpdates = true;
                        listItem.File.SaveBinary(converted);
                        web.AllowUnsafeUpdates = oldAllowUnsafeUpdates;
                        result = true;
                    }
                }
                else
                {
                    result = true;
                }
            }
        }
        catch (Exception ex)
        {
            result = false;
            web.AllowUnsafeUpdates = oldAllowUnsafeUpdates;
            string fileUrl = listItem != null ? listItem.Web.Url + "/" + listItem.Url : "document undefined";
            string methodLog = string.Format("ConvertPDFA [{0}]", fileUrl);
            SPDiagnosticsService.Local.WriteTrace(0, new SPDiagnosticsCategory(TLSConstants.Log.Category, TraceSeverity.Unexpected, EventSeverity.Error), TraceSeverity.Unexpected, ex.Message + methodLog);
            SPDiagnosticsService.Local.WriteTrace(0, new SPDiagnosticsCategory(TLSConstants.Log.Category, TraceSeverity.Unexpected, EventSeverity.Error), TraceSeverity.Unexpected, ex.StackTrace + methodLog);
        }


        return result;
    }

At this line, the conversion takes a long time for a 1 MB PDF file:

    bool conversionResult = doc.Convert(outLog, pdfAFormat, Aspose.Pdf.ConvertErrorAction.Delete);

What can I do to speed up the conversion process?


#6

@SAPwCitaly

Can you please share your sample PDF document with us? We will test the scenario in our environment and address it accordingly.


#7

OK, I have attached the PDF file: manualeutente.pdf (1.1 MB)


#8

@SAPwCitaly

We tested the scenario in our environment and observed that the conversion took more than one minute with Aspose.PDF for .NET 19.7. Would you please share the conversion time that you observed? Also, please share your complete environment details, e.g. OS name and version and installed RAM. We will then proceed to assist you accordingly.