PDF to PDF/A conversion is too slow

We have an Aspose.Total for .NET Developer OEM license and a file converter wrapper built on Aspose. We are using Aspose.Pdf version 16.12.0.0.

When converting PDF to PDF/A-2a (PDF/A in general), each document takes about 5 seconds. We also use multi-threading, but the total completion time is still not acceptable to our customer.

We have used PDFTron for this job. It was very fast, our customer was very satisfied with it, and they expect a similar processing speed from us.

Our customer ran a test with 4100 PDF documents using both our Aspose-based product and PDFTron.

PDFTron converted these documents in 3 minutes, while Aspose took about 3 hours to convert the same documents.

The PDFs are text documents that contain no images or graphs.
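
As a rough back-of-the-envelope check of these numbers, here is a minimal sketch. It assumes that the 5-second figure is the single-threaded cost per document and that "about 3 hours" and "3 minutes" can be treated as exact, which they cannot. The class and method names are only illustrative.

```csharp
using System;
using System.Globalization;

public static class ThroughputEstimate
{
    // Time the whole batch would take if documents were converted one after another.
    public static double SequentialSeconds(int documents, double secondsPerDocument)
    {
        return documents * secondsPerDocument;
    }

    // How much faster the observed run was than the purely sequential estimate.
    public static double EffectiveSpeedup(double sequentialSeconds, double observedSeconds)
    {
        return sequentialSeconds / observedSeconds;
    }

    public static void Main()
    {
        const int documents = 4100;
        const double secondsPerDocument = 5.0;          // reported per-document time (assumed single-threaded)
        const double asposeObservedSeconds = 3 * 3600;  // "about 3 hours"
        const double pdfTronObservedSeconds = 3 * 60;   // "3 minutes"

        double sequential = SequentialSeconds(documents, secondsPerDocument);
        double speedup = EffectiveSpeedup(sequential, asposeObservedSeconds);
        double pdfTronMsPerDocument = pdfTronObservedSeconds / documents * 1000.0;

        var inv = CultureInfo.InvariantCulture;
        Console.WriteLine(string.Format(inv, "Sequential estimate: {0:F0} s ({1:F1} h)", sequential, sequential / 3600.0));
        Console.WriteLine(string.Format(inv, "Effective speedup from threading: {0:F1}x", speedup));
        Console.WriteLine(string.Format(inv, "PDFTron time per document: {0:F1} ms", pdfTronMsPerDocument));
        Console.WriteLine(string.Format(inv, "Observed ratio: {0:F0}x", asposeObservedSeconds / pdfTronObservedSeconds));
    }
}
```

With the figures from this post, the sequential estimate is 20,500 s (about 5.7 hours), so finishing in about 3 hours corresponds to an effective speedup of only about 1.9x. Under the stated assumptions, that suggests the threads may be spending much of their time waiting on each other.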

Here is my code:

protected override void InitializeLicense()
{
    lock (licenseLockObj)
    {
        if (AsposeLicenseStream == null) return;

        var license = new License();
        try
        {
            // Rewind before and after applying the license so the shared stream can be reused.
            AsposeLicenseStream.Seek(0, SeekOrigin.Begin);
            license.SetLicense(AsposeLicenseStream);
            AsposeLicenseStream.Seek(0, SeekOrigin.Begin);
            Logger.Info("Aspose License was set for pdf");
        }
        catch (Exception e)
        {
            Logger.Error("While aspose license for pdf is applied, an error occurred: " + e.ToString());
            throw;
        }
    }
}



public ConvertResult Convert(byte[] originalFileContent, IPdfSpecific settings)
{
    MemoryStream lastStream = new MemoryStream(); // holds the extracted pages
    var result = new ConvertResult();
    SetSettings(settings); // settings is an interface object; this method converts the IPdfSpecific object to an AsposePdfSettings object
    try
    {
        var documentStream = originalFileContent.ConvertToMemoryStream();
        _pdfDocument = new Document(documentStream);

        #region Page Count Settings

        if (!settings.OutputPageCountSetting.IsAllPages)
        {
            int startPage, endPage;

            if (_pdfDocument.Pages.Count < settings.OutputPageCountSetting.StartPage)
            {
                Logger.Warn(
                    "StartPage number in the Pdf Settings, is more than page count of the file");
                startPage = 1; // start from the first page
            }
            else
                startPage = settings.OutputPageCountSetting.StartPage;

            if (_pdfDocument.Pages.Count < settings.OutputPageCountSetting.EndPage)
            {
                Logger.Warn(
                    "EndPage number in the Pdf Settings, is more than page count of the file");
                endPage = _pdfDocument.Pages.Count;
            }
            else
                endPage = settings.OutputPageCountSetting.EndPage;

            PdfFileEditor pdfEditor = new PdfFileEditor();

            pdfEditor.Extract(documentStream, startPage, endPage, lastStream);
        }
        else
            lastStream = documentStream;
        #endregion

        // Load the document again from lastStream: it holds the extracted pages if
        // extraction ran, otherwise it is the original stream.
        // (The original code reloaded documentStream here, so extracted pages were never used.)
        lastStream.Seek(0, SeekOrigin.Begin);
        _pdfDocument = new Document(lastStream);

        return AddSpecificiationtoPdf();
    }
    catch (Exception ex)
    {
        result.Result = Result.Error;
        result.Message = ex.ToString();
    }
    return result;
}

private ConvertResult AddSpecificiationtoPdf()
{
    var result = new ConvertResult();
    try
    {
        if (_settings.Watermark.IsActive)
            AddWatermark();
        if (_settings.HeaderFooter.Header.IsActive)
            AddHeader();
        if (_settings.HeaderFooter.Footer.IsActive)
            AddFooter();
        if (_settings.MetaDataSettings.IsActive)
            AddMetadata();

        _pdfDocument.OptimizeResources(new Document.OptimizationOptions()
        {
            RemoveUnusedObjects = true,
            RemoveUnusedStreams = true,
            AllowReusePageContent = true,
            ImageQuality = _settings.PdfSaveSettings.JpegQuality,
            CompressImages = _settings.PdfSaveSettings.CompressImages, // helps to reduce size but lowers the quality a bit
            MaxResoultion = _settings.PdfSaveSettings.MaxResolution,
            ResizeImages = _settings.PdfSaveSettings.ResizeImages,
            LinkDuplcateStreams = _settings.PdfSaveSettings.LinkDuplicateStreams,
            RemovePrivateInfo = _settings.PdfSaveSettings.RemovePrivateInfo,
            UnembedFonts = _settings.PdfSaveSettings.UnembedFonts
        });

        var resultPdfStream = new MemoryStream();
        string filename = Path.Combine(GlobalConstraints.TempAsposePdfPath, "AsposePdfLog_" + DateTime.UtcNow.Ticks + ".log");

        try
        {
            _pdfDocument.Convert(filename, _settings.PdfSaveSettings.PdfFormat, ConvertErrorAction.None);
        }
        catch (IOException ioe)
        {
            // Ignore: only the log file was not found, nothing to worry about.
            Logger.Warn("Log file cannot be located however conversion is successful.");
        }
        catch (Exception ex)
        {
            Logger.Error("API cannot convert to PDF\r\n" + ex.ToString());
        }
        finally
        {
            try
            {
                File.Delete(filename);
            }
            catch (Exception e)
            {
                Logger.Warn("Temp aspose log cannot be deleted. Error: " + e.Message);
            }
        }

        _pdfDocument.Save(resultPdfStream);

        result.ConvertedDocument = resultPdfStream.ToArray();
        result.Result = Result.Successful;
        result.Message = "Successful";
    }
    catch (Exception ex)
    {
        Logger.Error(ex.ToString());
        result.Result = Result.Error;
        result.Message = "AsposePDF " + ex.Message;
    }

    return result; //Convert(GlobalSaveFormat.Pdf);
}

In AddSpecificiationtoPdf, the watermark, header, footer, and metadata settings are all inactive, and IsAllPages is true, so no pages are extracted. The code only converts PDF to PDF/A.

Here are my settings:

100 PDF_A_2A false 6000 false false false false

Because we are multi-threading, we had to use a lock around license initialization; otherwise we ran into problems.
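
One alternative to taking the lock on every conversion would be to run the initialization once and let every thread reuse the result. The following is a minimal sketch, assuming the license only has to be applied once per process rather than once per conversion (this should be confirmed against the Aspose licensing documentation for the version in use). `OneTimeInitializer` is a hypothetical helper, not an Aspose type. In the real wrapper the action would call `license.SetLicense(AsposeLicenseStream)`; here a counter stands in for it so the example runs on its own.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of Aspose): runs an initialization action at most once.
public sealed class OneTimeInitializer
{
    private readonly Lazy<bool> _initialized;

    public OneTimeInitializer(Action initialize)
    {
        if (initialize == null) throw new ArgumentNullException("initialize");

        // ExecutionAndPublication: exactly one thread runs the action while the
        // others wait for it. If the action throws, the exception is cached and
        // rethrown on every later access.
        _initialized = new Lazy<bool>(
            () => { initialize(); return true; },
            LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // Triggers the action on first use; later calls return the cached result.
    public bool EnsureInitialized()
    {
        return _initialized.Value;
    }
}

public static class LicenseInitDemo
{
    // Calls EnsureInitialized from many threads and reports how often the action ran.
    public static int RunDemo()
    {
        int calls = 0;
        // Stand-in for the real license setup.
        var licenseInit = new OneTimeInitializer(() => { Interlocked.Increment(ref calls); });

        Parallel.For(0, 100, i => { licenseInit.EnsureInitialized(); });
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine("Initializer ran " + RunDemo() + " time(s)"); // prints "Initializer ran 1 time(s)"
    }
}
```

After the first call, `EnsureInitialized` only reads a cached value, so worker threads no longer queue behind the license lock.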

We have to bring the total processing time down to an acceptable level; otherwise our customer will want to keep using PDFTron.

I am looking for a solution to this problem.

@hasanirmak,

Thanks for contacting support.

As far as I can tell from the code snippet above, you are also performing optimization and page extraction in addition to the PDF/A conversion, and these operations take time as well. Before commenting further, could you please share the input PDF files and a sample project so that we can test the scenario in our environment? We are sorry for the inconvenience.
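
To see how much of the roughly 5 seconds per document goes to optimization and how much to the conversion itself, each stage could be timed separately. The following is a minimal sketch: `StageTimer` is a hypothetical helper, and the `Thread.Sleep` calls are placeholders for the real `OptimizeResources`, `Convert`, and `Save` calls from the posted code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: accumulates wall-clock time per named stage, safe to share across threads.
public sealed class StageTimer
{
    private readonly Dictionary<string, TimeSpan> _totals = new Dictionary<string, TimeSpan>();
    private readonly object _sync = new object();

    // Runs the action and adds its elapsed time to the stage total, even if it throws.
    public void Measure(string stage, Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            action();
        }
        finally
        {
            stopwatch.Stop();
            lock (_sync)
            {
                TimeSpan soFar;
                _totals.TryGetValue(stage, out soFar); // stays TimeSpan.Zero for a new stage
                _totals[stage] = soFar + stopwatch.Elapsed;
            }
        }
    }

    // Returns the accumulated time for a stage, or TimeSpan.Zero if it was never measured.
    public TimeSpan TotalFor(string stage)
    {
        lock (_sync)
        {
            TimeSpan total;
            _totals.TryGetValue(stage, out total);
            return total;
        }
    }
}

public static class StageTimerDemo
{
    public static void Main()
    {
        var timer = new StageTimer();

        // Placeholders: replace each body with the corresponding call on the document.
        timer.Measure("optimize", () => Thread.Sleep(20));
        timer.Measure("convert", () => Thread.Sleep(30));
        timer.Measure("save", () => Thread.Sleep(10));

        foreach (var stage in new[] { "optimize", "convert", "save" })
        {
            Console.WriteLine(stage + ": " + timer.TotalFor(stage).TotalMilliseconds + " ms");
        }
    }
}
```

Sharing one timer across all worker threads gives per-stage totals for the whole batch, which would show whether skipping `OptimizeResources` for text-only documents is worth testing.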