Free Support Forum - aspose.com

Converting word doc to pdf (Multi language docs) Java


#1

Hi,

I am trying to convert a Word document to PDF using Aspose.Words for Java (licensed), but the converted document contains a lot of spelling mistakes.

Please find below the code used to convert the document to PDF (tested with Tamil):

ByteArrayOutputStream baos = new ByteArrayOutputStream();
Document wdForPdf = new Document(uploadfile.getInputStream());
PdfSaveOptions saveOptions = new PdfSaveOptions();
saveOptions.setSaveFormat(com.aspose.words.SaveFormat.PDF);
saveOptions.setExportDocumentStructure(true);
saveOptions.setEmbedFullFonts(true);
saveOptions.setPrettyFormat(true);
saveOptions.setFontEmbeddingMode(PdfFontEmbeddingMode.EMBED_NONE);
wdForPdf.save(baos, saveOptions);
byte[] docBytes = baos.toByteArray();
someObject.setFile(docBytes); // to store it in db

Please do the needful asap.
Thanks in advance!
Consent System Privacy Notice_TA.zip (1.4 MB)


#2

@VOCONSENT,

Please ZIP and attach the ‘Latha’ font file here for further testing. Please also provide a comparison screenshot highlighting the problematic areas in Aspose.Words generated PDF and attach it here for our reference. Please point out the exact problematic places for this issue. We will then investigate the issue on our end and provide you more information.


#3

Hi Hafeez,
Thanks for your reply!

I am not using any font file in my code. My application supports 21 languages: English, Arabic, Chinese, Czech, Dutch, Finnish, French, German, Hungarian, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, Tamil, Thai and Turkish.

I upload a Word document for each language and convert it to HTML bytes to save in the database. I have no issues with the generated HTML; it works fine for all languages.
However, when I convert the Word document to PDF bytes to save in the database, the generated PDF has a lot of font-related mistakes.

I will upload the sample code I am using to convert Word to HTML and Word to PDF, along with a sample uploaded Word document. You can get the converted documents from the URLs below:

https://dev-voconsent-admin.cat.com/user/documents/A00A1004/lang/ta/versions/1/content?fileType=html
https://dev-voconsent-admin.cat.com/user/documents/A00A1004/lang/ta/versions/1/content?fileType=pdf

Note: I am using Aspose.Words (licensed) with the following Maven dependencies:

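<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-words</artifactId>
    <version>19.5</version>
    <classifier>jdk17</classifier>
</dependency>
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-words</artifactId>
    <version>19.5</version>
    <classifier>javadoc</classifier>
</dependency>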

Please find the zip file attached.

Sample word doc and code used.zip (67.6 KB)

Please do the needful.
Thanks.


#4

@VOCONSENT,

The difference between HTML and PDF outputs may be because you are missing required fonts. Please note that Aspose.Words requires TrueType fonts when rendering documents to fixed-page formats such as PDF, XPS or Images. Please refer to the following articles for details:

How Aspose.Words Uses True Type Fonts
How to Receive Notification of Missing Fonts and Font Substitution during Rendering

Please try to install the missing font. If the problem still remains, please also provide the required Font file here for further testing.
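
As a quick way to confirm whether a font is visible on the machine before re-running the conversion, here is a minimal Java sketch. It is not Aspose.Words code: it relies only on the JDK's `GraphicsEnvironment`, and Aspose.Words may locate fonts differently, so treat the result as a hint rather than a guarantee. The class and method names (`InstalledFontCheck`, `missingFonts`, `installedFamilies`) are hypothetical.

```java
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class InstalledFontCheck {

    /** Returns the required font names that are absent from the installed set (case-insensitive). */
    public static Set<String> missingFonts(Collection<String> required, Collection<String> installed) {
        Set<String> installedLower = new TreeSet<>();
        for (String name : installed) {
            installedLower.add(name.toLowerCase(Locale.ROOT));
        }
        Set<String> missing = new TreeSet<>();
        for (String name : required) {
            if (!installedLower.contains(name.toLowerCase(Locale.ROOT))) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Font families that the JVM reports for this host (assumption: a proxy for "installed"). */
    public static List<String> installedFamilies() {
        return Arrays.asList(
            GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames());
    }
}
```

For the document in this thread, calling `InstalledFontCheck.missingFonts(Arrays.asList("Latha"), InstalledFontCheck.installedFamilies())` on the server would return an empty set if the JVM can see the Latha font.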


#5

Thanks Hafeez.

Do I need to change any logic even if I install the fonts on the Windows server?
I checked those links and ran the sample code; it reports the respective fonts as missing.
Do I need to run it for all 21 languages to find out which fonts are missing? Is there a list already available from Aspose?

One more thing: I have Tamil fonts in the Windows Fonts folder on my machine. Please refer to the screenshot at https://forums.adobe.com/thread/1634986 (Tamil language); it shows the same issue I am facing while converting Word to PDF using Aspose.Words for Java 19.5.

Please let me know if there are any findings.


#6

@VOCONSENT,

There should not be any change in your code/logic even after installing the required fonts. I am afraid, there is no list of Fonts that we can provide to you. You do not need to install all the Fonts. You only need to check which Fonts are used inside Word documents and then install them for correct rendering to PDF by Aspose.Words. In above case, the Latha font was used inside the Word document and when running the code from here, Aspose.Words warns about it. So, you should install this font for this document.

If the problem still remains, please also provide the required Font file (Latha in this case) here for further testing.
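
To see which fonts a .docx file declares without opening it in Word, one option is to read the `word/fontTable.xml` part inside the package. The sketch below uses only the JDK, not Aspose.Words. It assumes the transitional WordprocessingML namespace, and the font table lists declared fonts, which may not match exactly what each run of text uses. The class name `DocxFontLister` and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DocxFontLister {

    // Assumption: transitional OOXML namespace (the common case for .docx files).
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the font names declared in word/fontTable.xml, or an empty set if the part is absent. */
    public static Set<String> declaredFonts(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/fontTable.xml".equals(entry.getName())) {
                    return parseFontTable(readEntry(zip));
                }
            }
        }
        return new TreeSet<>();
    }

    /** Copies the current zip entry into memory so the XML parser cannot close the zip stream. */
    private static byte[] readEntry(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Collects the w:name attribute of every w:font element. */
    private static Set<String> parseFontTable(byte[] xmlBytes) throws IOException {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations, since the document may come from an upload.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document xml = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xmlBytes));
            NodeList nodes = xml.getElementsByTagNameNS(W_NS, "font");
            Set<String> fonts = new TreeSet<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String name = ((Element) nodes.item(i)).getAttributeNS(W_NS, "name");
                if (!name.isEmpty()) {
                    fonts.add(name);
                }
            }
            return fonts;
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("Could not parse word/fontTable.xml", e);
        }
    }
}
```

Running this over one sample document per language would give a starting list of fonts to install on the server, which avoids guessing font by font.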


#7

Hi Hafeez,

Thanks. I followed the link and tried using the Latha font, but in the generated PDF the supplement letters still get swapped. I am attaching my standalone project source and a diff image. The zip contains the fonts, the Word document and the generated PDF. I used the Aspose.Words 19.5 jar.
AsposeWordToPdf.zip (6.8 MB)


#8

@VOCONSENT,

We have logged the issue with “Tamil-latha.docx” in our issue tracking system. The ID of this issue is WORDSNET-18621. We will further look into the details of this problem and will keep you updated on the status of correction. We apologize for your inconvenience.

Regarding “Tamil-nirmala.docx” and “Tamil-vijaya.docx”, please also provide comparison screenshots highlighting the problematic areas here for further testing.


#9

Thanks Hafeez. Please find attached a document containing the diffs for all three fonts. Also, may I know whether there is a timeline for the above ticket, WORDSNET-18621?
Diff word-pdf for tamil fonts.zip (110.2 KB)


#10

@VOCONSENT,

We tested the scenarios and have managed to reproduce the same problems on our end. For the sake of corrections, we have logged the following problems in our issue tracking system.

WORDSNET-18631: related to Tamil-nirmala.docx
WORDSNET-18632: related to Tamil-vijaya.docx

We will further look into the details of these problems and will keep you updated on the statuses of these issues. We apologize for your inconvenience.

Secondly, the issue WORDSNET-18621 is currently in the queue, pending analysis. There is no ETA available at the moment. We will inform you via this thread as soon as this issue is resolved.