Aspose Word to PDF conversion produces a file about 100 times larger (Aspose.Words 23.2.0)

Product name & version

Aspose.Words (23.2.0) for .NET 8.0

Detailed description of the conversion issue:

When the system generates the Word document, the file is 251 KB.
When we convert that same file to PDF, the result is 17.8 MB.
We then convert that 17.8 MB PDF into bytes for storage in our database.
As a result, the system is writing a large volume of data to our database.

This is having a significant impact on our business, and users are having trouble using our app.
As a quick win, we are currently clearing some historical data.
We need immediate assistance to resolve this issue.

  • Code snippet

public async Task SaveInvoiceFileInEDMAsync(string invoiceCode, string templateCode = null)
{
    var invoiceCodes = new List<string>() { invoiceCode };
    var result = await GenerateInvoiceByPlugin(invoiceCodes, templateCode) ?? await _invoiceService.DownloadInvoiceFileAsync(new InvoiceTemplate() { InvoiceCodes = invoiceCodes });
    _ = result ?? throw new Exception($"Cannot generate invoice file for {invoiceCode}");

    result = _pdfConverter.ConvertToPDF(result);

    // Log pdf file size in KB after aspose conversion
    FileInfo fileInfo = new FileInfo(result.FilePath);
    _logger.LogInformation($"Invoice {invoiceCode} PDF size after Aspose conversion: {fileInfo.Length / 1024} KB.");

    _logger.LogInformation($"Invoice {invoiceCode} PDF file stream size after Aspose conversion: {result.FileStream.Length / 1024} KB.");

    // Convert Stream → byte[] reliably
    var bytes = await result.FileStream.ToByteArrayAsync();

    // log byte size in KB after stream to byte array conversion
    _logger.LogInformation($"Invoice {invoiceCode} PDF size after stream to byte array conversion: {bytes.Length / 1024} KB.");
    await _invoiceService.SaveInvoiceFileInEDMAsync(invoiceCode, result.FileName, bytes);
}

FileModel ConvertToPDF(FileModel fileModel)
{
    var pdffilePath = "";
    var filePath = fileModel.FilePath;
    bool isEncryption = false;

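    string extension = Path.GetExtension(filePath);
    // Only .docx input is supported; fail early instead of opening an empty path below.
    if (!string.Equals(extension, ".docx", StringComparison.OrdinalIgnoreCase))
        throw new NotSupportedException($"Unsupported file type: {extension}");

    // Path.ChangeExtension changes only the extension, not other ".docx" occurrences in the path.
    pdffilePath = Path.ChangeExtension(filePath, ".pdf");
    ConvertWordToPDF(filePath, pdffilePath, isEncryption);
    fileModel.FileName = Path.ChangeExtension(fileModel.FileName, ".pdf");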

    using var fileStream = new FileStream(pdffilePath, FileMode.Open, FileAccess.Read);

    if (fileStream.CanSeek)
        fileStream.Position = 0;

    var memoryStream = new MemoryStream();
    fileStream.CopyTo(memoryStream);
    memoryStream.Position = 0;

    fileModel.FileStream = memoryStream; // assign clean stream
    fileModel.FilePath = pdffilePath;

    return fileModel;
}

  public string ConvertWordToPDF(string wordFileName, string pdfFileName = null, bool isEncryption = false)
  {
      if (string.IsNullOrEmpty(pdfFileName))
      {
          pdfFileName = wordFileName.Replace(".docx", ".pdf", StringComparison.OrdinalIgnoreCase);
      }
      // Set PDF save options
      PdfSaveOptions option = new PdfSaveOptions
      {
          EmbedFullFonts = false, // Do not embed full fonts
          ImageCompression = PdfImageCompression.Jpeg,
          JpegQuality = 70, // Adjust quality as needed (0-100)
          SaveFormat = SaveFormat.Pdf,
          OptimizeOutput = true, // Try to optimize output (if available)
      };            

      var doc = new Document(wordFileName);

      // HarfBuzz text shaper commented out to avoid the PDF size issue
      //doc.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;
      doc.Save(pdfFileName, option);
      return pdfFileName;
  }
  • Exception/error message

  • Sample input/output files

There is no error, only the size difference. I have already attached a screenshot.

@MaheswariMurugan91 Could you please attach your DOCX document here for testing? We will check the issue and provide you with more information.

Thank you so much for the quick reply. Waiting for your response.
iub1h2mx.gsg.docx (251.9 KB)

@MaheswariMurugan91 Thank you for the additional information. I cannot reproduce the problem on my side using the following simple code:

Document doc = new Document(@"C:\Temp\in.docx");
doc.Save(@"C:\Temp\out.pdf");

The output file size is about 800 KB. Please see the attachment:
out.pdf (817.9 KB)

With the PdfSaveOptions used in your code, the output file is even smaller, about 750 KB.

Could you please attach your problematic output PDF?

Yes, it is an intermittent issue. The larger size is not the regular case. When we hit the issue, we delete the file and re-create it, and that time it comes out at the lower size. Later it starts producing larger files again.

Sorry, I can’t attach a file larger than 200 MB.
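
The delete-and-re-create workaround described above could be automated roughly as follows. This is only a sketch under assumptions, not a fix for the root cause: `PdfSizeGuard`, `ConvertWithSizeCheck`, the `convert` delegate, and the ratio threshold are all hypothetical names and values, and the delegate stands in for whatever performs the actual DOCX-to-PDF conversion.

```csharp
using System;

public static class PdfSizeGuard
{
    // Hypothetical helper: re-runs `convert` while the output is larger than
    // `maxRatio` times the source size, up to `maxAttempts` conversions in total.
    // Returns the last output produced, so the caller always receives a PDF.
    public static byte[] ConvertWithSizeCheck(
        Func<byte[]> convert, long sourceLength, double maxRatio, int maxAttempts)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        byte[] output = convert();
        for (int attempt = 1;
             attempt < maxAttempts && output.Length > sourceLength * maxRatio;
             attempt++)
        {
            // Output looks abnormally large compared to the source; convert again.
            output = convert();
        }
        return output;
    }
}
```

A reasonable threshold is a guess here; based on the sizes in this thread (a 251 KB source and roughly 750 to 800 KB for a normal PDF), a ratio of around 5 would separate the normal output from the 17.8 MB case. Logging every retry would also help show how often the large output occurs.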

@MaheswariMurugan91 Could you please try zipping the file or sharing it via Google Drive?

Kindly check the link below.

Please request access and I will approve it.

https://drive.google.com/file/d/10oXocHuIo29I90SAm_AbHDNrJI6yN7lI/view?usp=drive_link

Yes, done. Kindly check the Drive link below.

https://drive.google.com/file/d/10oXocHuIo29I90SAm_AbHDNrJI6yN7lI/view?usp=drive_link

@alexey.noskov Any luck?

@MaheswariMurugan91 I have requested access.

@MaheswariMurugan91 Thank you for the additional information. It looks like whole fonts are embedded into your PDF documents instead of font subsets. I can get a similar output PDF size if I save the document with the following code:

Document doc = new Document(@"C:\Temp\in.docx");
PdfSaveOptions opt = new PdfSaveOptions() { EmbedFullFonts = true };
doc.Save(@"C:\Temp\out.pdf", opt);

Could you please make sure PdfSaveOptions.EmbedFullFonts is not enabled in your code?

Yes, I set EmbedFullFonts = false.

@MaheswariMurugan91 false is the default value of EmbedFullFonts, and in this case Aspose.Words embeds font subsets instead of full fonts.

Previously we had it like this:

var saveOptions = new PdfSaveOptions
{
    EmbedFullFonts = true
};

At that time we also received larger files, so we set it to false a week ago.

What should I do now to fix this?

@MaheswariMurugan91 Unfortunately, I cannot reproduce the problem on my side. If possible, could you please create a simple console application that will allow us to reproduce it? We will investigate your code and provide you with more information. Unfortunately, without the ability to reproduce the problem, we cannot tell what causes it on your side.

Sure, give me some time. I will create it and update you.

Hi Alexey,

I have created a small console application. Kindly check the Drive link below.

Kindly change the Word document file path inside the AsposeConvertorService.cs file.

https://drive.google.com/file/d/1gn5cdfkSRp3eULrpKHp1hSSW9DpKxAI2/view?usp=drive_link

I have approved the request. Could you please check now?

@MaheswariMurugan91 Thank you for the additional information. The attached application generates a 763 KB PDF output file. I tried running the conversion in a loop with 100 iterations, and all output PDF documents have the same size. So the problem is still not reproducible on my side.
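
For anyone who wants to repeat this kind of loop check in their own environment, a minimal sketch is shown below. It is not the exact code used for the test above: `ConversionSizeCheck`, `DistinctOutputSizes`, and the `convert` delegate are hypothetical names, and the delegate is assumed to run one conversion and return the PDF bytes.

```csharp
using System;
using System.Collections.Generic;

public static class ConversionSizeCheck
{
    // Runs `convert` once per iteration and collects the distinct output sizes in bytes.
    // A single entry in the result means every run produced a file of the same size.
    public static SortedSet<long> DistinctOutputSizes(Func<int, byte[]> convert, int iterations)
    {
        var sizes = new SortedSet<long>();
        for (int i = 0; i < iterations; i++)
        {
            sizes.Add(convert(i).LongLength);
        }
        return sizes;
    }
}
```

With Aspose.Words, the delegate would typically load the document with `new Document(path)`, call `doc.Save(stream, options)` on a `MemoryStream`, and return `stream.ToArray()`. If the returned set contains more than one size, the iteration that produced the outlier is the one worth capturing and sharing.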