Converting Word doc to PDF (multi-language docs) in Java

@VOCONSENT,

Please ZIP and attach the ‘Latha’ font file here for further testing. Please also provide a comparison screenshot highlighting the problematic areas in Aspose.Words generated PDF and attach it here for our reference. Please point out the exact problematic places for this issue. We will then investigate the issue on our end and provide you more information.

Hi Hafeez,
Thanks for your reply!

I am not using any font file in my code. My application supports 21 languages: English, Arabic, Chinese, Czech, Dutch, Finnish, French, German, Hungarian, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, Tamil, Thai, Turkish.

I upload a Word document for each language and convert it to HTML bytes to save in the database. I am not facing any issues with the generated HTML; it works fine for all languages.
However, when I convert the Word doc to PDF bytes to save in the database, the generated PDF has a lot of font mistakes.

I will upload the sample code that I am using to convert Word to HTML and Word to PDF, along with a sample uploaded Word doc.

Note: I am using Aspose.Words (licensed).

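<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-words</artifactId>
    <version>19.5</version>
    <classifier>jdk17</classifier>
</dependency>
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-words</artifactId>
    <version>19.5</version>
    <classifier>javadoc</classifier>
</dependency>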
PFA zip file attached

Sample word doc and code used.zip (67.6 KB)

Please do the needful.
Thanks.

@VOCONSENT,

The difference between HTML and PDF outputs may be because you are missing required fonts. Please note that Aspose.Words requires TrueType fonts when rendering documents to fixed-page formats such as PDF, XPS or Images. Please refer to the following articles for details:

How Aspose.Words Uses True Type Fonts
How to Receive Notification of Missing Fonts and Font Substitution during Rendering

Please try to install the missing font. If the problem still remains, please also provide the required Font file here for further testing.
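
As a quick way to confirm whether a font such as Latha is visible on the server at all, here is a minimal JDK-only sketch. It does not use the Aspose.Words API: it only compares required family names against what the JVM reports as installed, and that list may differ from the folders Aspose.Words actually scans. The class name `InstalledFontCheck` and the method `missingFonts` are hypothetical names chosen for illustration.

```java
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class InstalledFontCheck {

    /** Returns the required families that are not among the installed ones (case-insensitive). */
    public static List<String> missingFonts(Collection<String> required, Collection<String> installed) {
        Set<String> known = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        known.addAll(installed);
        List<String> missing = new ArrayList<>();
        for (String family : required) {
            if (!known.contains(family)) {
                missing.add(family);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Families to look for; defaults to the font named in this thread.
        List<String> required = args.length > 0 ? Arrays.asList(args) : Arrays.asList("Latha");
        // Font families the JVM can see on this machine (an assumption: not necessarily Aspose.Words' view).
        String[] installed = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();
        System.out.println("Missing font families: " + missingFonts(required, Arrays.asList(installed)));
    }
}
```

Running it with the family names as arguments prints the ones the JVM cannot find, which can narrow down what to install before re-running the conversion.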

Thanks Hafeez.

Do I need to change any logic once I install the fonts on the Windows server?
I checked those links and executed the same … the results show the respective missing fonts!
Do I need to run this for all 21 languages to find out the missing fonts? Is there a list already available from Aspose?

One more thing: I have Tamil fonts inside the Windows fonts folder on my machine. Please refer to the screenshot in this link, Adobe Community (Tamil language); it shows the same issue I am facing while converting Word to PDF using Aspose.Words for Java 19.5.

Please let me know if there are any findings.

@VOCONSENT,

There should not be any change in your code/logic even after installing the required fonts. I am afraid there is no list of fonts that we can provide to you. You do not need to install all the fonts. You only need to check which fonts are used inside the Word documents and then install them for correct rendering to PDF by Aspose.Words. In the above case, the Latha font was used inside the Word document and, when running the code from here, Aspose.Words warns about it. So, you should install this font for this document.

If the problem still remains, please also provide the required Font file (Latha in this case) here for further testing.
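
One way to check which fonts a .docx declares, without running a conversion for every language, is to read the font table stored inside the file. The following is a rough sketch using only the JDK, not the Aspose.Words API. It assumes an ordinary (transitional) .docx whose fonts are listed in `word/fontTable.xml`; fonts referenced only through the theme, and older binary .doc files, are not covered. The class name `DocxFontLister` and the method `declaredFonts` are hypothetical names for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DocxFontLister {

    // WordprocessingML namespace used by ordinary (transitional) .docx files.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the w:name of every w:font entry in word/fontTable.xml, or an empty list. */
    public static List<String> declaredFonts(byte[] docxBytes) throws Exception {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docxBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/fontTable.xml".equals(entry.getName())) {
                    continue;
                }
                // Copy this entry into memory so the XML parser cannot close the zip stream.
                ByteArrayOutputStream xml = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int read;
                while ((read = zip.read(buffer)) != -1) {
                    xml.write(buffer, 0, read);
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Refuse DOCTYPE declarations, since the document may come from an upload.
                factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                Document fontTable = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml.toByteArray()));
                NodeList fonts = fontTable.getElementsByTagNameNS(W_NS, "font");
                for (int i = 0; i < fonts.getLength(); i++) {
                    names.add(((Element) fonts.item(i)).getAttributeNS(W_NS, "name"));
                }
                break;
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to a .docx file.
        System.out.println(declaredFonts(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

Because the method takes the document as a byte array, it could be called on the same uploaded bytes that are later passed to the conversion, and the resulting names compared against the fonts installed on the server.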

Hi Hafeez,

Thanks. I followed the link and tried using the Latha font, but in the generated PDF the supplement letters still get swapped. I am attaching my standalone project source and a diff image. The zip contains the fonts, the Word doc, and the generated PDF. I used the Aspose.Words 19.5 jar.
AsposeWordToPdf.zip (6.8 MB)

@VOCONSENT,

We have logged the issue with “Tamil-latha.docx” in our issue tracking system. The ID of this issue is WORDSNET-18621. We will further look into the details of this problem and will keep you updated on the status of correction. We apologize for your inconvenience.

Regarding “Tamil-nirmala.docx” and “Tamil-vijaya.docx”, please also provide comparison screenshots highlighting the problematic areas here for further testing.

Thanks Hafeez. Please find the attached doc, which contains the diff for all three fonts. Also, may I know if there is any timeline for the above ticket, WORDSNET-18621?
Diff word-pdf for tamil fonts.zip (110.2 KB)

@VOCONSENT,

We tested the scenarios and have managed to reproduce the same problems on our end. For the sake of corrections, we have logged the following problems in our issue tracking system.

WORDSNET-18631: related to Tamil-nirmala.docx
WORDSNET-18632: related to Tamil-vijaya.docx

We will further look into the details of these problems and will keep you updated on the statuses of these issues. We apologize for your inconvenience.

Secondly, the issue (WORDSNET-18621) is currently pending analysis and is in the queue. There is no ETA available at the moment. We will inform you via this thread as soon as this issue is resolved.

Hi Hafeez,

Is there any update on this, please?

Please let me know if I need to go with paid support.
It would be great if I could get a solution as soon as possible.

Thanks.

@VOCONSENT,

Unfortunately, your issues are not resolved yet. We have completed the initial analysis of these issues but I am afraid that, because of their complexity, the implementation of all of these issues has been postponed till a later date. There are no estimates available at the moment. We will inform you via this thread as soon as these issues are resolved. We apologize for your inconvenience.

@awais.hafeez,

It seems the tickets are postponed. Could you please let me know if there is any alternative or workaround?

@VOCONSENT,

Yes, these issues are postponed. We will inform you of any available updates/workarounds for these issues. We apologize for your inconvenience.

@VOCONSENT,

Regarding WORDSNET-18631, this is to update you that we have completed the analysis of this issue but I am afraid that, because of its complexity, the implementation has been postponed till a later date. There are no estimates available at the moment. However, we have recently implemented new functionality (Advanced Typography based on HarfBuzz Shaper Supported) in the Aspose.Words for Java API. So, as a workaround, please use the OpenType Features for proper rendering of this document to PDF. Hope this helps.

A post was split to a new topic: com.aspose.words.shaping.harfbuzz.HarfBuzzTextShaperFactory not found

Hi Hafeez, thanks for the reply. I updated the dependency. I have a small doubt here:

ByteArrayOutputStream pdfStream = new ByteArrayOutputStream();
Document wdToPdf = new Document(new ByteArrayInputStream(docxBytes));
// Only the line below is new; the remaining lines are my old code.
wdToPdf.getLayoutOptions().setTextShaperFactory(com.aspose.words.shaping.harfbuzz.HarfBuzzTextShaperFactory.getInstance());
PdfSaveOptions pdfSaveOptions = new PdfSaveOptions();
pdfSaveOptions.setSaveFormat(com.aspose.words.SaveFormat.PDF);
pdfSaveOptions.setFontEmbeddingMode(PdfFontEmbeddingMode.EMBED_NONE);
wdToPdf.save(pdfStream, pdfSaveOptions);
byte[] pdfBytes = pdfStream.toByteArray();

My doubt is that I still have my old pdfSaveOptions code, which I am passing to the save method. Will it cause any side effects? Or is there any example code that passes a stream and PdfSaveOptions to the save method?

A post was split to a new topic: Converting Document to HTML stream sometimes throws EXception

The issues you have found earlier (filed as WORDSNET-18621) have been fixed in this Aspose.Words for .NET 19.12 update and this Aspose.Words for Java 19.12 update.

The issues you have found earlier (filed as WORDSNET-18632) have been fixed in this Aspose.Words for .NET 19.12 update and this Aspose.Words for Java 19.12 update.

The issues you have found earlier (filed as WORDSNET-18631) have been fixed in this Aspose.Words for .NET 20.4 update and this Aspose.Words for Java 20.4 update.