PDF format issues while converting from Word

Hi team,

I’m trying to convert a Word document to PDF using Aspose for Java (latest versions of Aspose.Words and Aspose.PDF). I have come across the following problems:

   * Font size changes
   * Color shade differences
   * Label mismatch in part image
   * Image size changes in flow-chart-style images
   * Extra lines in graph images
   * Letters merged in images containing text

In some cases, the PDF is not an exact replica of the Word document.
I have attached my sample docs. Please go through them and suggest a solution.

Sample: samples.zip (6.1 MB)

Thanks in advance.

@Gomathi,

We tested the scenarios and managed to reproduce the same problems on our end. We have logged the following problems in our issue tracking system so that they can be corrected.

WORDSJAVA-2006: Colour shading Issue in PDF (related to 10N760L2HNR_10_egi10N760L2HNR.doc)

WORDSNET-18290: Image labels/captions render at incorrect places in PDF (related to 10CFJPDTVG70_10_egi10CFJPDTVG7.docx)

WORDSNET-18291: Flow chart image font issue in PDF (related to 10D1F79RFXC_10_egi10D1F79RFXC.doc)

We will look further into the details of these problems and keep you updated on the status of these issues. We apologize for the inconvenience.

Regarding “10J7SPG224T_10_egi10J7SPG224T.docx” in the “Original doc not replicated(Image lines & text)” folder, please also provide a comparison screenshot highlighting the problematic areas in the Aspose.Words-generated PDF file with respect to your input DOCX, and attach it here for our reference.

@Gomathi,

Regarding WORDSNET-18291, the “SimHei” and “DengXian” fonts are required to render this metafile. Unfortunately, it is not possible to render the metafile correctly without these fonts because glyph IDs specific to these fonts are used in some places in the metafile. You can see the same problem in the MS Word output when these fonts are not installed. With these fonts installed, Aspose.Words is able to render the metafile correctly. So, to fix this issue, please simply install the above two fonts. Hope this helps.
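
Before converting, it can help to confirm that both fonts are actually visible on the machine. The following is only a minimal sketch using the standard JDK; the class name FontCheck and the method missingFonts are illustrative names and are not part of Aspose.Words. It assumes the fonts are registered under the English family names "SimHei" and "DengXian". The JVM's font list is also only a proxy here, since Aspose.Words looks up fonts on its own.

```java
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class FontCheck {

    // Returns the required font families that are absent from the installed list.
    // The comparison is case-insensitive; the order of the required list is kept.
    public static List<String> missingFonts(Collection<String> required,
                                            Collection<String> installed) {
        Set<String> installedLower = new HashSet<>();
        for (String name : installed) {
            installedLower.add(name.toLowerCase(Locale.ROOT));
        }
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!installedLower.contains(name.toLowerCase(Locale.ROOT))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Ask for English family names so the result does not depend on the
        // default locale (assumption: the fonts expose these English names).
        String[] families = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames(Locale.ENGLISH);
        List<String> missing = missingFonts(
                Arrays.asList("SimHei", "DengXian"), Arrays.asList(families));
        if (missing.isEmpty()) {
            System.out.println("Both fonts are visible to the JVM.");
        } else {
            System.out.println("Missing fonts: " + missing);
        }
    }
}
```

If the check reports a missing font, install it and restart the JVM before running the conversion again.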

@Gomathi,

Regarding WORDSNET-18291, we have completed the work on your issue and have closed it as ‘Not a Bug’. Please see my previous post for analysis details.

Regarding the other two issues, they are currently ‘pending for analysis’ and are in the queue. We will inform you via this thread as soon as these issues are resolved or any updates are available.

@awais.hafeez

Yes, it’s working fine. Thanks for your valuable time.

While working with other documents, I ran into the font issues shown in the samples below. They occur even after installing the “SimHei” and “DengXian” fonts. Can you please go through the samples below?

Sample:
sample.zip (5.6 MB)

@Gomathi,

We tested the scenarios and managed to reproduce the same problems on our end. We have logged the following problems in our issue tracking system so that they can be corrected.

WORDSNET-18331: The last bracket of x-scale is missing in PDF (related to 10CC0HHH5KL.docx)

WORDSNET-18332: Chart x-scale has font problem in PDF (related to 10CWQ4LKF87.docx)

WORDSNET-18333: Superscript and Subscript characters in picture are not retained in PDF (related to 10FGDT8RD2G.docx)

WORDSNET-18334: The colon at the image top is not retained in PDF (related to 10FS83P20RK.docx)

We will look further into the details of these problems and keep you updated on the status of these issues. We apologize for the inconvenience.

It is not only the colon: sometimes comma, semicolon, degree, and delta symbols in image scales are not retained either.

Please also consider the first, fifth, and sixth points mentioned above, because in most cases letters are merged within words and the font type and font size change.

Please go through the below samples.
Sample: samples.zip (2.3 MB)

@Gomathi,

We are working on your queries and will get back to you soon.

@Gomathi,

We have logged two more issues, WORDSNET-18368 (related to 10G872J7MQ0.docx) and WORDSNET-18370 (related to 10TDH50Q9LZ.docx), in our issue tracking system. We will look further into the details of these problems and keep you updated on the status of these issues. We apologize for the inconvenience.

Well, thanks in advance.

@Gomathi,

Regarding WORDSNET-18370, the fix for this issue has been postponed to a later date (no estimate is available at the moment). This is a rare scenario. We will inform you via this thread as soon as the issue is resolved. We apologize for the inconvenience.

As a workaround, please use the following code:

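import com.aspose.words.Document;
import com.aspose.words.PdfSaveOptions;

// Load the affected document.
Document doc = new Document("E:\\temp\\10TDH50Q9LZ.docx");

// Do not use the EMF data embedded in WMF metafiles when rendering.
PdfSaveOptions opts = new PdfSaveOptions();
opts.getMetafileRenderingOptions().setUseEmfEmbeddedToWmf(false);

doc.save("E:\\Temp\\awjava-19.3-setUseEmfEmbeddedToWmf-false.pdf", opts);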

@Gomathi,

Regarding WORDSNET-18368 (related to 10G872J7MQ0.docx), please also convert this document to PDF format by using MS Word on your end and attach it here for our reference. Please also provide a comparison screenshot highlighting the problematic areas in Aspose.Words generated document with respect to MS Word generated document and attach it here for our reference. Please point out the exact problematic places for this issue. We do not see a big difference between Aspose.Words and MS Word generated PDF outputs on our end. Thanks for your cooperation.

Sorry for the delayed reply. Please look closely at the x-axis fonts in the rendered PDF. They are not retained as in the original document, and the graph lines are usually a little darker.

Reference output PDF: Sample.zip (1.2 MB)

For the time being, please provide a solution for the other issues raised in this thread. Thanks in advance.

Regards,
Gomathi. N.

@Gomathi,

We need the resources mentioned in my previous post to be able to investigate your issue (WORDSNET-18368) further on our end. Please also tell us which OS, MS Word, and JDK versions you are testing these scenarios on, because the output shown in MS Word on our end may differ from what is displayed on yours. Please also see how MS Word 2019 produces the PDF output on our end.
msw-2019.pdf (53.1 KB)
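
The OS and JDK details can be read from standard Java system properties. The following is a small sketch using only the JDK; the class name EnvReport and the method describe are illustrative names. The MS Word version is not available this way and has to be checked in Word itself.

```java
public class EnvReport {

    // Builds a one-line summary of the OS and JDK from standard system properties.
    // Note: os.arch reflects the architecture the JVM reports, which may differ
    // from the OS architecture when a 32-bit JVM runs on a 64-bit OS.
    public static String describe() {
        return "OS: " + System.getProperty("os.name")
                + " " + System.getProperty("os.version")
                + " (" + System.getProperty("os.arch") + ")"
                + ", JDK: " + System.getProperty("java.version")
                + " (" + System.getProperty("java.vendor") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it on the machine that performs the conversion gives a line that can be pasted into a reply.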

@Gomathi,

We are waiting for your further input on this topic (WORDSNET-18368). Please see my previous posts and share the required resources. Thanks for your cooperation.

I am currently using the following environment:

OS: Windows 10 (64-bit)
JDK: jdk1.8.0_181
MS Word: Microsoft Word 2010

Sorry for the delayed reply.

@Gomathi,

Thanks for the additional information. Please also convert (Save As) this document (10G872J7MQ0.docx) to PDF format by using MS Word 2010 on your end and attach it here for our reference.

Please find the MS Word 2010 output for the mentioned sample.
Sample: Sample.zip (225.8 KB)