Convert Word (and other file types) to PDF and Merge to PDF/A

I want to convert several different document types (DOCX, HTML and RTF) to PDF in memory (with streams) and then merge all those converted PDFs (plus some existing PDFs) into a single PDF/A-compliant PDF file.

I am still on the trial version, so I couldn’t evaluate all the possibilities.

But is this the correct approach?

  • use Aspose.Words and Aspose.PDF
  • convert DOCX with
    var docxDocument = new Aspose.Words.Document(new MemoryStream(byteArrayDocx));
    var docxPdfStream = new MemoryStream(); // keep a reference so the PDF can be merged later
    docxDocument.Save(docxPdfStream, Aspose.Words.SaveFormat.Pdf);
  • convert HTML with
    var htmlDocument = new Aspose.Words.Document(new MemoryStream(byteArrayHtml), new HtmlLoadOptions());
    var htmlPdfStream = new MemoryStream();
    htmlDocument.Save(htmlPdfStream, Aspose.Words.SaveFormat.Pdf);
  • convert RTF with
    var rtfDocument = new Aspose.Words.Document(new MemoryStream(byteArrayRtf), new RtfLoadOptions());
    var rtfPdfStream = new MemoryStream();
    rtfDocument.Save(rtfPdfStream, Aspose.Words.SaveFormat.Pdf);
  • load existing PDF with
    new MemoryStream(byteArrayPdf)
    => then add all those MemoryStreams to an array of MemoryStreams
    => create a new MemoryStream for the merged PDF
    => concatenate all the individual MemoryStreams with PdfFileEditor
    => and finally convert the stream holding the merged file to PDF/A, like
    new Aspose.Pdf.Document(mergedFileStream).Convert(new MemoryStream(), PdfFormat.PDF_A_1A, ConvertErrorAction.Delete);

Is this the way to go?

@Kernberger,

You can use Aspose.Words for .NET to convert Word documents such as DOCX/DOC/RTF to PDF.

You can use Aspose.HTML for .NET to convert HTML files to PDF.

Once you have converted all the formats to separate PDF files with the different Aspose APIs, you can use the Aspose.PDF for .NET API to concatenate those PDF files into a single PDF. Please refer to the following article:
Concatenate multiple PDF Files into a Single PDF

You can also convert this final PDF file to other formats, such as PDF/A, by using Aspose.PDF for .NET.
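
The concatenate-then-convert steps described above could be sketched roughly as follows. This is a minimal sketch, not an official sample: the class name PdfMergeSketch and the method MergeToPdfA are hypothetical, and it assumes every input byte array already holds a valid PDF. Note that, as far as the Aspose.PDF API is documented, the stream passed as the first argument to Convert receives the conversion log, so the converted document still has to be written out with Save.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Facades;

public static class PdfMergeSketch
{
    // Hypothetical helper: merges PDFs held in memory and returns the
    // merged file converted to PDF/A-1a as a byte array.
    public static byte[] MergeToPdfA(IEnumerable<byte[]> pdfFiles)
    {
        var inputs = new List<Stream>();
        try
        {
            foreach (byte[] pdf in pdfFiles)
            {
                inputs.Add(new MemoryStream(pdf));
            }

            using (var merged = new MemoryStream())
            using (var conversionLog = new MemoryStream())
            using (var result = new MemoryStream())
            {
                // Step 1: concatenate all input streams into one PDF stream.
                var editor = new PdfFileEditor();
                if (!editor.Concatenate(inputs.ToArray(), merged))
                {
                    throw new InvalidOperationException("Concatenation failed.");
                }
                merged.Position = 0;

                // Step 2: convert the merged document to PDF/A-1a.
                // The first argument receives the conversion log,
                // not the converted PDF.
                using (var document = new Aspose.Pdf.Document(merged))
                {
                    document.Convert(conversionLog, PdfFormat.PDF_A_1A, ConvertErrorAction.Delete);

                    // Step 3: write the converted document to the result stream.
                    document.Save(result);
                }

                return result.ToArray();
            }
        }
        finally
        {
            // Dispose the input streams created above.
            foreach (Stream stream in inputs)
            {
                stream.Dispose();
            }
        }
    }
}
```

With ConvertErrorAction.Delete, elements that cannot be made compliant are removed, so it may be worth inspecting the conversion log before relying on the output.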

OK, thanks.

  1. I used Aspose.Words for DOCX to PDF.

  2. For the HTML conversion I did the following, which also seems to work with Aspose.Words (no Aspose.HTML necessary):

    using (var ms = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(htmlString)))
    {
        // Load the HTML from the in-memory stream as an Aspose.Words document
        Aspose.Words.Document loadedFromBytes = new Aspose.Words.Document(ms,
            new HtmlLoadOptions() { Encoding = System.Text.Encoding.UTF8 });
        // Save the document to a PDF stream
        MemoryStream pdfStream = new MemoryStream();
        loadedFromBytes.Save(pdfStream, Aspose.Words.SaveFormat.Pdf);
        // memoryStream is declared outside this block
        memoryStream = pdfStream;
    }

  3. For concatenating I used PdfFileEditor => Concatenate(pdfStreamList, resultStream), which seems to work.

Or are there any known downsides with these approaches?

@Kernberger,

When you convert an HTML file to PDF with Aspose.Words, it tries to mimic the way Microsoft Word’s page layout engine works. For you, this means that if you convert an HTML document to PDF or XPS, or print it using Aspose.Words, the output will appear almost exactly as if it had been produced by Microsoft Word. Aspose.Words does not, however, use Microsoft Word itself. To learn which features are supported when you load an HTML file into Aspose.Words before saving to PDF, please refer to the following section of the documentation.

Load in the HTML (.HTML, .XHTML, .MHTML) Format