Hello.
I have an .rtf document that I need to convert to .docx. After that, I use Apache Tika to extract XHTML from the .docx. I am not using Aspose for this last step because I could not get it to produce clean HTML (without formatting).
The code is as follows:
// Load the RTF source and remove any macros.
Document doc = new Document("D:\\Teste\\10932983.rtf");
doc.removeMacros();
// Configure the DOCX save options.
OoxmlSaveOptions options = new OoxmlSaveOptions(SaveFormat.DOCX);
options.setDmlEffectsRenderingMode(0);
options.setDmlRenderingMode(1);
options.setCompliance(OoxmlCompliance.ISO_29500_2008_TRANSITIONAL);
// Adjust compatibility settings, targeting Word 2013.
doc.getCompatibilityOptions().setDisableOpenTypeFontFormattingFeatures(true);
doc.getCompatibilityOptions().optimizeFor(MsWordVersion.WORD_2013);
// Write the converted document.
doc.save("D:\\Teste\\serah3.docx", options);
The problem is that the .docx generated by Aspose does not follow the DrawingML convention of storing pictures in the pic:pic element (<pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">), so Tika does not recognize the pictures when reading the .docx generated by Aspose.
Aspose writes the picture as a v:shape element instead. I guess this v:shape element is Microsoft proprietary?
When I save the document with Microsoft Word, it works fine.
I’ve tried different configurations of setDmlEffectsRenderingMode and setDmlRenderingMode without success.
I need help. Thanks in advance.
Thank you for the reply.
The files are attached.
Check line 408 of xhtml_generated_from_word_doc.html. It shows:
Localização do Pará no Brasil
This image is identified by Tika from the following element of docx:
<pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
(Tika searches for the namespace http://schemas.openxmlformats.org/drawingml/2006/picture and the pic:pic element.)
In the XHTML generated from Aspose’s .docx, there is no img element.
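In case it helps to reproduce this without Tika, here is a minimal sketch that counts pic:pic elements (in the namespace above) and v:shape elements directly in the .docx. It uses only the JDK. The class and method names (DocxPictureCheck, countInXml, countInDocx) are made up for illustration, and it assumes the main document part is stored at word/document.xml, which is the usual location but is not guaranteed for every package.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical helper: counts DrawingML and VML picture markup in a .docx. */
class DocxPictureCheck {
    // Namespace quoted in the post; pic:pic elements live here.
    static final String PIC_NS =
            "http://schemas.openxmlformats.org/drawingml/2006/picture";
    // VML namespace, which is where v:shape elements live.
    static final String VML_NS = "urn:schemas-microsoft-com:vml";

    /** Returns {number of pic:pic, number of v:shape} found in the XML stream. */
    static int[] countInXml(InputStream documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness is required for getElementsByTagNameNS.
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations; OOXML parts do not need them.
            factory.setFeature(
                    "http://apache.org/xml/features/disallow-doctype-decl", true);
            Document dom = factory.newDocumentBuilder().parse(documentXml);
            int drawingMl = dom.getElementsByTagNameNS(PIC_NS, "pic").getLength();
            int vml = dom.getElementsByTagNameNS(VML_NS, "shape").getLength();
            return new int[] {drawingMl, vml};
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the document XML", e);
        }
    }

    /** Opens the .docx as a ZIP and counts elements in word/document.xml (assumed path). */
    static int[] countInDocx(String docxPath) {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IllegalStateException("No word/document.xml in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return countInXml(in);
            }
        } catch (IOException e) {
            throw new IllegalStateException("Could not read " + docxPath, e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("usage: java DocxPictureCheck.java <file.docx>");
            return;
        }
        int[] counts = countInDocx(args[0]);
        System.out.println("pic:pic elements: " + counts[0]);
        System.out.println("v:shape elements: " + counts[1]);
    }
}
```

Running it against both the Word-saved file and the Aspose-saved file should show whether the pictures are present as pic:pic or only as v:shape. Note that the counts cover the whole part, so any v:shape inside fallback content is counted too.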
Thank you for your attention.
Hi Alessandra,
Thanks for your inquiry. We tested the scenario and have managed to reproduce the same problem on our end. We have logged this problem in our issue tracking system as WORDSNET-13719. Our product team will look further into the details of this problem, and we will keep you updated on the status of the fix. We apologize for the inconvenience.
Best regards,
What is the status of this issue? Has it been fixed?
Thanks in advance.
Hi Alessandra,
Thanks for your inquiry. Unfortunately, this issue is not resolved yet. It is currently in the queue, pending analysis. We will let you know once it is resolved. Sorry for the inconvenience.
Best regards,
The issues you found earlier (filed as WORDSNET-13719) have been fixed in the Aspose.Words for .NET 17.2.0 update and the Aspose.Words for Java 17.2.0 update.