DOC to PDF conversion issue with Japanese characters using C# on Ubuntu

Hi,

We’re using Aspose.Words 21.7.0.

We’re converting Word documents to PDF and HTML.

If our data contains East Asian text (Japanese, Vietnamese), those characters in the PDF document are corrupted and appear as squares (using ASP.NET Core 5 on Ubuntu).

Everything seems OK with the language settings on the machine. We’re using the fonts Arial and MS PGothic, and they work well when saving as a Word document.

I’ve attached a sample PDF with Japanese characters.

Please assist with the issue.

input: transHTML_test_KevinNguyen.docx (101.5 KB)

output: transHTML_test_KevinNguyen__1631071569100.pdf (47.9 KB)

Thanks
Kevin Nguyen

@nguyendinhchinh

Please try the latest version of Aspose.Words for .NET 21.9 and let us know how it goes on your side. Hope this helps you.

If you still face the problem, please attach the PDF file generated by Aspose.Words for .NET 21.9 here for testing. We will then investigate the issue and provide you with more information.

@tahir.manzoor

We tried with Aspose.Words for .NET 21.9.
However, the error still occurs, as with the other versions. I have attached the original .docx file and the converted PDF file below:

Original file: transHTML_test_KevinNguyen.docx (101.5 KB)
Output: transHTML_test_KevinNguyen__1631154717506.pdf (48.2 KB)

Please tell me how to fix this error as soon as possible.

Thanks
Kevin Nguyen

@nguyendinhchinh

If you open the PDF in Notepad and check the end of the file, you will see extra space added at the end of the document. We suggest you call Response.End after calling the Document.Save method.

If you still face the problem, please create a simple web application (source code without compilation errors) that helps us reproduce your problem on our end and attach it here for testing.
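
For illustration only: one common way for trailing bytes to end up in a file is writing a MemoryStream's internal buffer instead of its actual contents. Whether that is what happens in this application is an assumption and is not confirmed anywhere in this thread. A minimal sketch using only the .NET base library (the class and method names are hypothetical) that shows the difference between the two:

```csharp
using System;
using System.IO;

static class StreamPaddingDemo
{
    // Returns the length of the exact content and the length of the raw internal buffer.
    public static (int Exact, int Raw) Measure(byte[] payload)
    {
        using var ms = new MemoryStream();
        ms.Write(payload, 0, payload.Length);

        // ToArray() copies only the bytes that were actually written.
        byte[] exact = ms.ToArray();

        // GetBuffer() exposes the whole internal buffer, including unused capacity.
        byte[] raw = ms.GetBuffer();

        return (exact.Length, raw.Length);
    }

    static void Main()
    {
        byte[] payload = { 0x25, 0x50, 0x44, 0x46 }; // the ASCII bytes of "%PDF"
        var (exact, raw) = Measure(payload);

        Console.WriteLine($"ToArray length:   {exact}");
        Console.WriteLine($"GetBuffer length: {raw}"); // never smaller than the content
    }
}
```

If the response body is built from GetBuffer(), the unused capacity is sent along with the PDF, which would look like extra space at the end of the file.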

@tahir.manzoor
I have attached a simple web application (source code without compilation errors). Please run it on Ubuntu.
AsposeWordsDemo.zip (940.0 KB)

@nguyendinhchinh

Please allow us some time to investigate this issue. We will get back to you soon. Thanks for your cooperation.

@tahir.manzoor
When converting Word to HTML, I load a 41 MB document using Aspose.Words for .NET 21.9 and it throws an OutOfMemory error: “The document appears to be corrupted and cannot be loaded.”
Is there any property I should set?

@nguyendinhchinh

You can use the SaveOptions.MemoryOptimization property for memory optimization. Setting this option to true can significantly decrease memory consumption while saving large documents, at the cost of slower saving time.

Please check the following code example. Hope this helps you.

    Document doc = new Document(MyDir + "SaveOptions.MemoryOptimization.doc");
    // When set to true it will improve document memory footprint but will add extra time to processing. 
    // This optimization is only applied during save operation.
    SaveOptions saveOptions = SaveOptions.CreateSaveOptions(SaveFormat.Pdf);
    saveOptions.MemoryOptimization = true;

    doc.Save(MyDir + "SaveOptions.MemoryOptimization.pdf", saveOptions);

@tahir.manzoor

Thanks for your reply,

// Our source is as below.
using Stream newFile = new MemoryStream(fileBytes); // fileBytes is a byte array of the document file

// fileBytes is about 41 MB
Document doc = new Aspose.Words.Document(newFile); // The error occurs when initializing Document

@nguyendinhchinh

It is quite difficult to answer such questions because CPU performance and memory usage depend on the complexity and size of the documents you are loading or generating.

It also depends heavily on the local environment. It can be completely different for a server that generates thousands of documents 24/7 and for a local PC that generates a single document on demand.

We suggest you increase the available memory to avoid this issue. If you still face the problem, please share your document here for testing.
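
Because the error occurs while the document is being loaded, the SaveOptions.MemoryOptimization setting shown earlier does not come into play yet (it is applied only during save). As a sketch only, and not something tested against the 41 MB document in this thread, the LoadOptions.TempFolder property lets Aspose.Words use temporary files while reading a document, which may lower peak memory use. The class name, the command-line argument, and the temp folder location below are hypothetical:

```csharp
using System;
using System.IO;
using Aspose.Words;
using Aspose.Words.Loading; // LoadOptions is documented under this namespace in newer releases

class LargeDocumentLoad
{
    // Loads a document from a byte array, letting Aspose.Words use temp files on disk.
    static Document Load(byte[] fileBytes, string tempFolder)
    {
        // The folder must exist before loading starts.
        Directory.CreateDirectory(tempFolder);

        var loadOptions = new LoadOptions { TempFolder = tempFolder };

        using var stream = new MemoryStream(fileBytes);
        return new Document(stream, loadOptions);
    }

    static void Main(string[] args)
    {
        // args[0] is assumed to be the path of the large input document.
        byte[] fileBytes = File.ReadAllBytes(args[0]);
        string tempFolder = Path.Combine(Path.GetTempPath(), "aspose-temp"); // hypothetical location

        Document doc = Load(fileBytes, tempFolder);
        Console.WriteLine($"Loaded {doc.PageCount} page(s).");
    }
}
```

Whether this is enough to avoid the exception depends on the document and on the memory available to the process, so increasing memory may still be necessary.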

@nguyendinhchinh

We have set up the same environment and tested the scenario, and we could not reproduce the issue. The issue seems to be related to how the memory stream is sent rather than to Aspose.Words itself. To check that the PDF conversion is valid, you can save the stream generated by Aspose.Words to disk.
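
The check described above can be sketched as follows. This is a minimal example rather than the code from the attached application; the class name and both file paths are hypothetical placeholders:

```csharp
using System;
using System.IO;
using Aspose.Words;

class PdfStreamCheck
{
    static void Main()
    {
        // Hypothetical paths; replace them with real locations on the Ubuntu machine.
        string inputPath = "input.docx";
        string outputPath = "check.pdf";

        Document doc = new Document(inputPath);

        // Convert to PDF in memory, as a web application typically would.
        using var pdfStream = new MemoryStream();
        doc.Save(pdfStream, SaveFormat.Pdf);

        // ToArray() copies only the bytes written, without unused buffer space.
        File.WriteAllBytes(outputPath, pdfStream.ToArray());

        Console.WriteLine($"Wrote {pdfStream.Length} bytes to {outputPath}");
    }
}
```

If check.pdf opens correctly and shows the Japanese text, the conversion itself is fine and the problem lies in how the stream is returned to the client.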