Corrupt PDF perhaps? Sometimes OK, Sometimes Not

I have been working on creating PDFs to upload to Concur Invoices. Sometimes a PDF will not upload because of an “unknown” error, which is not helpful, I know. If we open the file in Acrobat and resave it, it then uploads fine through the Concur UI. Here is the code I am using:

    private async Task<ResultsReturn> PrepAttachment(string concurRequestId, string attachmentUri, string masterFileName, int attachmentNbr)
    {
        Uri uri = new Uri(attachmentUri);
        // Await the task so that download or conversion exceptions surface here
        await PdfHelper.DownloadAndConvertToPdfAsync(uri, masterFileName);
        Console.WriteLine("Saved as: " + masterFileName);

        ResultsReturn results = new ResultsReturn();
        results.Success = true;
        
        return results;
    }

   public static class PdfHelper
   {
       public static async Task<(MemoryStream Stream, string ContentType)> DownloadToStreamAsync(Uri fileUri)
       {
           if (fileUri == null)
               throw new ArgumentNullException(nameof(fileUri));

           using (var httpClient = new HttpClient())
           {
               var response = await httpClient.GetAsync(fileUri);
               response.EnsureSuccessStatusCode();

               var contentType = response.Content.Headers.ContentType?.MediaType ?? "application/octet-stream";

               var memoryStream = new MemoryStream();
               await response.Content.CopyToAsync(memoryStream);
               memoryStream.Position = 0;
               return (memoryStream, contentType);
           }
       }

       public static void ConvertStreamToPdf(Stream inputStream, string contentType, string outputPdfPath)
       {
           if (inputStream == null)
               throw new ArgumentNullException(nameof(inputStream));
           if (string.IsNullOrWhiteSpace(outputPdfPath))
               throw new ArgumentNullException(nameof(outputPdfPath));

           contentType = contentType.ToLowerInvariant();

           if (contentType.Contains("pdf"))
           {
               // Already a PDF, just save as-is
               using (var fileStream = File.Create(outputPdfPath))
               {
                   inputStream.CopyTo(fileStream);
               }
           }
           else if (contentType.Contains("html"))
           {
               using (var doc = new Document(inputStream, new HtmlLoadOptions()))
               {
                   doc.Save(outputPdfPath, SaveFormat.Pdf);
               }
           }
           else if (contentType.Contains("text") || contentType.Contains("plain"))
           {
               using (var doc = new Document())
               {
                   var page = doc.Pages.Add();
                   using (var reader = new StreamReader(inputStream))
                   {
                       var text = reader.ReadToEnd();
                       var textFragment = new Aspose.Pdf.Text.TextFragment(text);
                       page.Paragraphs.Add(textFragment);
                   }
                   doc.Save(outputPdfPath);
               }
           }
           else if (contentType.Contains("image"))
           {
               // Dispose the document and the image buffer once the PDF is written
               using (var imageDoc = new Document())
               using (var imageStream = new MemoryStream())
               {
                   var page = imageDoc.Pages.Add();

                   inputStream.CopyTo(imageStream);
                   imageStream.Position = 0;

                   var image = new Aspose.Pdf.Image { ImageStream = imageStream };
                   page.Paragraphs.Add(image);
                   imageDoc.Save(outputPdfPath);
               }
           }
           else
           {
               throw new NotSupportedException($"Unsupported content type: {contentType}");
           }
       }

       public static async Task DownloadAndConvertToPdfAsync(Uri fileUri, string outputPdfPath)
       {
           var (stream, contentType) = DownloadToStreamAsync(fileUri).GetAwaiter().GetResult();
           ConvertStreamToPdf(stream, contentType, outputPdfPath);
       }
   }

@ctenney

It sounds like you’re experiencing intermittent issues with PDF uploads to Concur Invoices, which can be frustrating. Since resaving the PDF in Acrobat resolves the issue, the original files may contain structural inconsistencies or metadata problems that Acrobat tolerates and rewrites on save, but that Concur’s system cannot handle.
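
One quick way to test that hypothesis before changing the conversion code is to check whether a rejected file is structurally plausible at all. The sketch below is only a heuristic: it assumes a well-formed PDF starts with the %PDF- header and carries an %%EOF marker near the end. The PdfSanityCheck class and its LooksLikePdf methods are hypothetical names, not part of Aspose.PDF or Concur.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfSanityCheck
{
    // Hypothetical helper: returns true when the bytes start with "%PDF-"
    // and an "%%EOF" marker appears within the last 1024 bytes.
    public static bool LooksLikePdf(byte[] data)
    {
        if (data == null || data.Length < 8)
            return false;

        // ASCII decoding is enough here: both markers are plain ASCII,
        // and any non-ASCII byte simply decodes to '?'.
        string head = Encoding.ASCII.GetString(data, 0, 5);
        int tailLength = Math.Min(1024, data.Length);
        string tail = Encoding.ASCII.GetString(data, data.Length - tailLength, tailLength);

        return head == "%PDF-" && tail.Contains("%%EOF");
    }

    // Convenience overload for a file on disk.
    public static bool LooksLikePdf(string path)
    {
        return LooksLikePdf(File.ReadAllBytes(path));
    }
}
```

Running this against a file that fails to upload and against its Acrobat-resaved copy may show whether the failing file is truncated or is not a PDF at all (for example, an HTML page served with a PDF content type). A file that passes can still be invalid in other ways, so treat a positive result as "not obviously broken" rather than "valid".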

Here are a few suggestions to help ensure the PDFs you generate are more consistently accepted:

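  1. Ensure PDF Compliance: If Concur’s parser is strict, converting your output to a PDF/A profile may help. PDF/A is an ISO-standardized subset of PDF intended for long-term archiving; it requires fonts to be embedded and forbids several constructs that commonly cause compatibility issues.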

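  2. Use Aspose.PDF for Optimization: You can load the PDF and save it again through Aspose.PDF before uploading it, which is similar in spirit to resaving it in Acrobat. Note that Optimize() only linearizes the document for fast web view; if you also want unused resources removed, call OptimizeResources() as well. Flattening and font embedding are separate steps that this example does not perform. Here’s a simple example using Aspose.PDF:

    using Aspose.Pdf;
    
    public static void OptimizePdf(string inputPdfPath, string outputPdfPath)
    {
        // Dispose the document so the input file handle is released
        using (Document pdfDocument = new Document(inputPdfPath))
        {
            pdfDocument.Optimize(); // Linearize the PDF for fast web view
            pdfDocument.Save(outputPdfPath);
        }
    }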
    
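  3. Check for Corruption: If the PDF is being generated from other formats (like HTML or images), ensure that the source files are not corrupt. Also note that when the content type contains "pdf", your code writes the downloaded bytes to disk without inspecting them, so a truncated download or a non-PDF response labeled as PDF would be saved unchanged. Validating the PDF after creation can catch such cases.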

  4. Error Handling: Implement error handling in your upload process to capture any specific error messages returned by Concur. This can provide more insight into what might be going wrong.

  5. Use OCR for Scanned PDFs: If your PDFs are scanned documents, consider using OCR (Optical Character Recognition) to convert them into searchable and editable PDFs. This can improve compatibility with various systems.
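
For the error-handling suggestion, a minimal sketch could wrap each step so that a failure is reported instead of being lost. StepResult and SafeRunner are hypothetical names introduced for illustration; they are not part of your code, Aspose.PDF, or the Concur API.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical result type for one step of the pipeline.
public sealed class StepResult
{
    public bool Success { get; set; }
    public string Message { get; set; }
}

public static class SafeRunner
{
    // Awaits the step and converts any exception into a failed StepResult.
    public static async Task<StepResult> RunAsync(string stepName, Func<Task> step)
    {
        try
        {
            await step().ConfigureAwait(false);
            return new StepResult { Success = true, Message = stepName + " succeeded" };
        }
        catch (Exception ex)
        {
            // Keep the original message so it can be logged next to Concur's error
            return new StepResult { Success = false, Message = stepName + " failed: " + ex.Message };
        }
    }
}
```

It could be called as await SafeRunner.RunAsync("convert", () => PdfHelper.DownloadAndConvertToPdfAsync(uri, masterFileName)), so that a failed download or conversion surfaces before the upload is attempted.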

If you continue to face issues, consider reaching out to Aspose support or checking their forums for similar cases.
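
If you do open a support thread, a validation log for one of the rejected files gives a concrete starting point. The sketch below assumes the Aspose.PDF for .NET package is referenced and uses its Document.Validate and Document.Convert methods with the PDF/A-1b profile. The PdfDiagnostics class, its method names, and all paths are hypothetical, and whether Concur actually prefers PDF/A is an assumption you would need to verify.

```csharp
using Aspose.Pdf;

public static class PdfDiagnostics
{
    // Hypothetical helper: checks the file against PDF/A-1b and writes
    // the findings to logPath. Returns true when the file already complies.
    public static bool IsPdfA1b(string inputPdfPath, string logPath)
    {
        using (var doc = new Document(inputPdfPath))
        {
            return doc.Validate(logPath, PdfFormat.PDF_A_1B);
        }
    }

    // Hypothetical helper: converts the file to PDF/A-1b and saves a copy.
    // Objects that cannot be converted are removed (ConvertErrorAction.Delete).
    public static bool ConvertToPdfA1b(string inputPdfPath, string outputPdfPath, string logPath)
    {
        using (var doc = new Document(inputPdfPath))
        {
            bool converted = doc.Convert(logPath, PdfFormat.PDF_A_1B, ConvertErrorAction.Delete);
            doc.Save(outputPdfPath);
            return converted;
        }
    }
}
```

Comparing the log for a rejected file with the log for its Acrobat-resaved copy may point to the specific construct that differs between the two.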

Let me know if you need further assistance!