Chinese characters issue

Frunza · September 28, 2016, 4:17am

Hello

I noticed that some characters that display correctly in one table cell do not display correctly in another. Here is the code snippet:

public class ChineseCharsInTable {
    public static void main(String[] args) {
        com.aspose.pdf.Document doc = new com.aspose.pdf.Document("Blank.pdf");
        com.aspose.pdf.Table table = new com.aspose.pdf.Table();
        table.setColumnWidths("400");
        table.getRows().add().getCells().add("竤");          // shown correctly
        table.getRows().add().getCells().add("䧜");          // not shown correctly
        table.getRows().add().getCells().add("竤\u2014");    // shown correctly (\u2014 is the em dash)
        table.getRows().add().getCells().add("竤\u2014䧜");  // neither Chinese character is shown correctly
        doc.getPages().get_Item(1).getParagraphs().add(table);
        doc.save("Result.pdf");
    }
}

(I have also attached the PDF.)
In the last cell, neither of the Chinese characters is shown. It seems that another font is used automatically, but it somehow breaks both characters. In the worst case, I would have expected one of them to be shown correctly and the other not. Is this a bug, or is this the correct behavior? If it is the correct behavior, what is the explanation for it?

Thank you,
Samuel

amjad.sahi · September 28, 2016, 4:37am

Hi,

Thanks for providing the details.

Your query looks to be related to the Aspose.Pdf APIs, so I am moving your thread to the Aspose.Pdf forum, where one of my colleagues from the Aspose.Pdf team will help you soon.

Thank you.

tilal.ahmad · September 29, 2016, 2:42am

Hi Samuel,

Thanks for your inquiry. Please note that this is not a bug: you need to use a font that contains the characters in your content. Your characters are covered by the MingLiU font, so that font must be installed on your host system and referenced as follows:

import com.aspose.pdf.Document;
import com.aspose.pdf.FontRepository;
import com.aspose.pdf.Table;
import com.aspose.pdf.TextState;

public class MingLiUTable {
    public static void main(String[] args) {
        String myDir = "path/to/your/files/"; // placeholder: folder containing the input PDF
        Document doc = new Document(myDir + "HelloWorld.pdf");

        Table table = new Table();
        table.setTop(100);
        table.setColumnWidths("400");

        // Use a font that contains the characters (MingLiU must be installed on the host)
        TextState textState = new TextState();
        textState.setFont(FontRepository.findFont("MingLiU"));

        // Every cell is rendered with the MingLiU text state
        table.getRows().add().getCells().add("竤", textState);
        table.getRows().add().getCells().add("䧜", textState);
        table.getRows().add().getCells().add("竤---", textState);
        table.getRows().add().getCells().add("竤---䧜", textState);

        doc.getPages().get_Item(1).getParagraphs().add(table);
        doc.save(myDir + "Result.pdf");
    }
}

Best Regards,

Frunza · September 29, 2016, 3:16am

Hello

Thank you for the answer. In the application I am writing, the text comes from the user, so I do not know what font to use. The code snippet was a minimal sample to demonstrate my issue.
When I opened the document I attached (I re-attached it because the first upload seems to have failed) in “PDF-Viewer”, I noticed that the first and third cells use the ArialUnicodeMs font, while the second and fourth cells use the MSGothic font.
When no font is specified, there seems to be an algorithm that does the font matching, and it seems that it could be improved in some cases. But since you wrote that using an appropriate font for the content is required, I guess it is too much to ask for the matching algorithm to be improved.

Thank you for the clarification,
Samuel

tilal.ahmad · September 30, 2016, 1:47am

Hi Samuel,

Thanks for your feedback. We will look into the issue and provide you with more information accordingly. Meanwhile, can you please confirm whether setting the font folder resolved this issue?

Best Regards,

Frunza · September 30, 2016, 2:19am

Hello

tilal.ahmad:

Meanwhile, can you please confirm whether setting the font folder resolved this issue?

The font folder setting and this issue are unrelated.

Setting the font explicitly, as you proposed, solves the problem in this minimal sample.

Thank you,
Samuel

tilal.ahmad · October 3, 2016, 3:03am

Hi Samuel,

Thanks for your feedback.

Frunza:

When I opened the document I attached (I re-attached it because the first upload seems to have failed) in “PDF-Viewer”, I noticed that the first and third cells use the ArialUnicodeMs font, while the second and fourth cells use the MSGothic font.
When no font is specified, there seems to be an algorithm that does the font matching, and it seems that it could be improved in some cases. But since you wrote that using an appropriate font for the content is required, I guess it is too much to ask for the matching algorithm to be improved.

Regarding your query above, we have logged ticket PDFJAVA-36187 for further investigation and enhancement. We will keep you updated on its progress.

Best Regards,

tilal.ahmad · November 14, 2016, 2:35am

Hi Samuel,

Thanks for your patience. Regarding the issue logged above, please note that, for reasons of performance and display quality, we do not check all fonts to find one that contains all the requested symbols. Bringing another font into the document every time a symbol is not found would not be good. Instead, we perform a fast check against a couple of system fonts.
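To make the described strategy concrete, here is a minimal sketch of a first-fit check over a short list of candidate fonts. It is a hypothetical model, not Aspose.Pdf's actual code: the class `FirstFitFontPicker` and the `covers` predicate (font name, text) are assumptions introduced for illustration. Under this model, the fourth cell of the original sample gets no match whenever one candidate contains 竤 and another contains 䧜 but no single candidate contains both.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BiPredicate;

/**
 * Hypothetical model of a "fast check" fallback: probe a short list of
 * candidate fonts and use the first one that covers the whole string.
 */
class FirstFitFontPicker {

    /**
     * Returns the first candidate for which covers.test(font, text) is true,
     * or Optional.empty() when no candidate covers every character of text.
     */
    static Optional<String> pickFont(String text, List<String> candidates,
                                     BiPredicate<String, String> covers) {
        for (String font : candidates) {
            if (covers.test(font, text)) {
                return Optional.of(font); // first font covering the whole string wins
            }
        }
        return Optional.empty(); // no single candidate covers all characters
    }
}
```

In an application that uses Aspose.Pdf, the `covers` predicate could be backed by `FontRepository.findFont(name).doesFontContainAllCharacters(text)`, the method used in the code that follows in this reply. Because the application knows which fonts it is willing to embed, its candidate list can be longer than the library's built-in fast check.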

However, you can find a suitable font yourself with the following code:

import com.aspose.pdf.FontRepository;
// Each line prints true if the named font contains every character in the string
System.out.println(FontRepository.findFont("MingLiU").doesFontContainAllCharacters("竤"));
System.out.println(FontRepository.findFont("Arial Unicode MS").doesFontContainAllCharacters("竤"));
System.out.println(FontRepository.findFont("MSGothic").doesFontContainAllCharacters("竤"));
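When user-supplied text mixes characters that no single installed font covers (as in the fourth cell of the original sample), one option is to split the string into runs and pick a font per run, using a coverage check like the one above. The sketch below does that with a hypothetical `FontRunSplitter` class and a `covers` predicate (font name, text); both are assumptions for illustration, not Aspose.Pdf APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.BiPredicate;

/** Hypothetical helper: splits text into runs that can each be rendered with one font. */
class FontRunSplitter {

    /** A run of text and the font chosen for it (null if no candidate covers it). */
    static final class Run {
        final String text;
        final String font;

        Run(String text, String font) {
            this.text = text;
            this.font = font;
        }
    }

    /** Returns the first candidate font covering the given text, or null if none does. */
    static String firstCovering(String text, List<String> candidates,
                                BiPredicate<String, String> covers) {
        for (String font : candidates) {
            if (covers.test(font, text)) {
                return font;
            }
        }
        return null;
    }

    static List<Run> split(String text, List<String> candidates,
                           BiPredicate<String, String> covers) {
        List<Run> runs = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        String runFont = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);              // handle characters outside the BMP too
            String ch = new String(Character.toChars(cp));
            i += Character.charCount(cp);
            // Keep the current run's font while it still covers this character;
            // otherwise take the first candidate that does (null if none).
            String font = (runFont != null && covers.test(runFont, ch))
                    ? runFont
                    : firstCovering(ch, candidates, covers);
            if (buf.length() > 0 && !Objects.equals(font, runFont)) {
                runs.add(new Run(buf.toString(), runFont)); // close the previous run
                buf.setLength(0);
            }
            runFont = font;
            buf.append(ch);
        }
        if (buf.length() > 0) {
            runs.add(new Run(buf.toString(), runFont));
        }
        return runs;
    }
}
```

Staying in the current font while it still covers the next character keeps the number of runs, and therefore of fonts pulled into the document, small, which matches the concern about embedding extra fonts raised earlier in this reply. Rendering each run with its own font inside a table cell is not covered in this thread; the exact Aspose.Pdf calls for that step would need to be checked against the library documentation.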

Best Regards,

Frunza · November 22, 2016, 6:54am

Hello

OK, I understand how it works now.
This approach works great for me.

Thank you,
Samuel

tilal.ahmad · November 22, 2016, 10:50pm

Hi Samuel,

Thanks for your feedback. It is good to know that the suggested code worked for you.

Please keep using Aspose.Pdf and feel free to contact us for any further assistance.

Best Regards,