Convert DOC with Images to RTF or PDF using C# Code of Aspose.Words | Avoid Extremely Large Size of RTF Files

Hi,

We are using Aspose.Words in our product to convert DOC files to other formats such as PDF and RTF.
The issue is that if the DOC file contains images, the resulting RTF file is 20-30 times larger than the original document.

For example, a 300 KB DOC file produces an RTF file of 3-5 MB.

This is affecting the performance of the system and also causing the DB to grow.

Please assist.

Thank you

Evgeny

@Egrechin,

Have you tried the latest (20.11) version of Aspose.Words for .NET on your end? If the problem persists, please ZIP and upload here your input DOC document and the RTF file generated by Aspose.Words that shows the undesired behavior. We will then investigate the issue on our end and provide you with more information.

@awais.hafeez

Thank you for coming back to me.
Yes, we have tried the latest Aspose.Words version and the result is the same. From a 164 KB DOC file I get a ~6 MB RTF.
AsposeWithLogo.zip (480.9 KB)

Both files DOC and RTF are attached.

Thank you in advance for your assistance

Evgeny

@Egrechin,

In this case, Aspose.Words mimics the behavior of MS Word 2019. Try to open this “AsposeWithLogo.doc” with MS Word and use “Save As” command to convert this DOC to RTF format. On my end, MS Word 2019 produces a 6,826 KB RTF file.

Please let me know if I can be of any further assistance.

@awais.hafeez

Are you saying that I should open a ticket with Microsoft support?
I cannot open a ticket with Microsoft support because I don’t use their product to convert DOC to RTF.

I understand that Aspose uses the same technology as Microsoft Word. However, we convert using Aspose and not MS Word, and we pay Aspose for the license, not Microsoft.

This is why I am asking Aspose for an explanation of this behavior (even if there may be some explanation on the Microsoft side).

Thank you

Evgeny

@Egrechin
The issue is not specific to Aspose.Words or MS Word; it comes from the RTF format itself. Generally, DOC is more compact than RTF because it is a binary format, whereas RTF is a text format.
In your particular case, the document contains an image in WMF format whose size is 1665 KB. When it is written to RTF, each byte is written as two characters (a hex string), i.e. 1 byte of image data becomes 2 bytes in the output document. So the image takes 3330 KB in the output RTF. In addition, the image is written twice in the output RTF file so that it is displayed properly in old readers. As a result, your logo image takes 6660 KB in the output document.
To minimize the output file size, you can disable writing images for old readers, but the image will still take 3330 KB in the output RTF document.

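// Requires: using Aspose.Words; using Aspose.Words.Saving;
// "doc" is an already loaded Aspose.Words.Document.
RtfSaveOptions opt = new RtfSaveOptions();
// Skip writing RTL settings to reduce the output size.
opt.ExportCompactSize = true;
// Do not write the extra copy of each image for old readers.
opt.ExportImagesForOldReaders = false;

doc.Save(@"C:\Temp\out.rtf", opt);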

You can also try converting your image to a more compact format, for example PNG, but this will reduce the image quality.
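
The size arithmetic described above can be sketched in a few lines of C#. This is only an illustration of the estimate, not part of Aspose.Words: the class and method names (RtfSizeEstimate, HexEncodedKb, TotalImageKb) are made up for this example, and the figures ignore everything in the RTF file other than the image data.

```csharp
using System;

// Hypothetical helper, not an Aspose.Words API: it only reproduces the
// size estimate explained in this thread.
public static class RtfSizeEstimate
{
    // Each binary byte is written as two hexadecimal characters in RTF.
    public static long HexEncodedKb(long imageKb) => imageKb * 2;

    // When the extra copy for old readers is exported, the image is written twice.
    public static long TotalImageKb(long imageKb, bool exportForOldReaders)
        => HexEncodedKb(imageKb) * (exportForOldReaders ? 2 : 1);

    public static void Main()
    {
        long imageKb = 1665; // uncompressed image size quoted in this thread

        Console.WriteLine(TotalImageKb(imageKb, true));  // prints 6660
        Console.WriteLine(TotalImageKb(imageKb, false)); // prints 3330
    }
}
```

With the old-reader copy disabled, the estimate halves but the hex encoding cost remains, which matches the 3330 KB figure above.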


@alexey.noskov

Thank you for the detailed explanation.

Can you please give some more details around ExportImagesForOldReaders? Which readers will not be able to view the image if this property is used?
Also, what does ExportCompactSize mean from the code perspective? We are trying to understand all the possible side effects of these two changes.

Regards

Evgeny

@Egrechin
When you set the ExportCompactSize option to true, Aspose.Words skips writing RTL settings into the output RTF document. So it is safe to set this option if you are sure your documents do not contain right-to-left text in languages like Arabic or Hebrew.
ExportImagesForOldReaders: “old readers” are pre-Microsoft Word 97 applications and also WordPad. If you set this option to false, then only images in WMF, EMF and BMP formats will be displayed in “old readers”.
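
One way to decide whether ExportCompactSize is safe for a given document is to check its text for right-to-left characters first. The sketch below is an assumption-laden illustration, not an Aspose.Words feature: RtlCheck and ContainsRtlText are hypothetical names, and the check looks only at characters, so it would miss a document that carries right-to-left formatting without any such characters.

```csharp
using System;

// Hypothetical helper, not an Aspose.Words API.
public static class RtlCheck
{
    // Returns true if the text contains characters from the Hebrew, Arabic
    // and neighbouring right-to-left script blocks, or from the Hebrew and
    // Arabic presentation forms.
    public static bool ContainsRtlText(string text)
    {
        foreach (char c in text)
        {
            bool rtlScriptBlock = c >= '\u0590' && c <= '\u08FF';
            bool presentationFormsA = c >= '\uFB1D' && c <= '\uFDFF';
            bool presentationFormsB = c >= '\uFE70' && c <= '\uFEFF';
            if (rtlScriptBlock || presentationFormsA || presentationFormsB)
                return true;
        }
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(ContainsRtlText("Hello"));        // prints False
        Console.WriteLine(ContainsRtlText("\u05E9\u05DC\u05D5\u05DD")); // Hebrew text, prints True
    }
}
```

In a conversion pipeline the result could be used as `opt.ExportCompactSize = !RtlCheck.ContainsRtlText(doc.GetText());`, assuming `doc` and `opt` are the Document and RtfSaveOptions objects from the earlier snippet.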

@alexey.noskov

Can you please explain a bit more?
The DOC file with the image is ~165 KB, but you say that the image inside the DOC is 1665 KB.

If in RTF 1 byte becomes 2 bytes, I would expect the RTF to grow by 4 times at most, i.e. to <1 MB, but in our case the size is 3.3 MB even after applying the ExportCompactSize and ExportImagesForOldReaders adjustments.

Thank you

Evgeny

@Egrechin Again, this is specific to the DOC format. In DOC, EMF images are compressed, which is why the DOC file is much more compact than the RTF, where the image is not compressed.
You can observe the same thing if you save your document as DOCX: the output file size will be 160 KB, but if you unzip it (a DOCX file is a ZIP archive) and locate the image, you will see its original uncompressed size of 1665 KB.
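
The DOCX check described above can also be done without unzipping by hand. The following is a minimal sketch using the standard System.IO.Compression classes; DocxInspector and ListEntries are hypothetical names, and the "word/media/" prefix is assumed to be where the package stores its images.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: lists the stored and unpacked size of each entry
// in a DOCX (ZIP) package.
public static class DocxInspector
{
    public static List<(string Name, long Compressed, long Uncompressed)> ListEntries(Stream package)
    {
        var result = new List<(string Name, long Compressed, long Uncompressed)>();
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
                result.Add((entry.FullName, entry.CompressedLength, entry.Length));
        }
        return result;
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: DocxInspector <file.docx>");
            return;
        }

        using (FileStream fs = File.OpenRead(args[0]))
        {
            foreach (var e in ListEntries(fs))
            {
                // Assumption: images live under word/media/ in the package.
                if (e.Name.StartsWith("word/media/", StringComparison.OrdinalIgnoreCase))
                    Console.WriteLine($"{e.Name}: {e.Compressed} bytes stored, {e.Uncompressed} bytes unpacked");
            }
        }
    }
}
```

A large gap between the stored and unpacked figures for an image entry indicates how much the binary container was saving through compression, which RTF cannot do.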