Empty image after converting from doc to xhtml

tschmid · July 21, 2011, 8:37am

hi,
here is my problem:

I am converting a Word document to XHTML using Aspose.Words, latest update.
If the document contains an image that was pasted in from the clipboard, the exported image has the right size but is completely white.
Aspose.Words reports the image format as EMF.

The lines in the program:

 Charset CSet = Charset.forName("UTF-8");
 Document doc = new Document(fi.getInputStream());
 FontSettings.setFontsFolder("/opt/Infopark/NPS-test/apache-tomcat-.0.29/webapps/nrwe/TTFonts",false);
 com.aspose.words.SectionCollection sc = doc.getSections();
 com.aspose.words.HtmlSaveOptions ho = new com.aspose.words.HtmlSaveOptions(com.aspose.words.SaveFormat.HTML);
 ho.setPrettyFormat(true);
 ho.setEncoding(CSet);
 ho.setExportDocumentProperties(true);
 int iii = sc.getCount();
 for (int ii = 0; ii < iii; ii++) sc.get(ii).clearHeadersFooters();
 doc.save(EingabeDateiName + ".xhtml", ho);

In the ZIP there is a test document with its results.
The empty image is the one which ends with .002.png.

Another small problem: as you can see, I want UTF-8 encoding. It seems that when a Word document contains an ANSI character such as ä (0xE4), Aspose.Words exports exactly that byte and not its UTF-8 encoding C3 A4, as it should. The documents in which this occurs are not public, and I have not managed to reproduce such a document, so I am not sending an example. I also have a workaround for this problem.
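
To check which encoding an exported file really contains, it can help to compare the raw bytes. The following is a minimal sketch using only the JDK, not Aspose.Words; the class name `UmlautBytes` and the helper `hex` are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class UmlautBytes {
    // Renders a byte array as space-separated upper-case hex, e.g. "C3 A4".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // ä written as a Unicode escape, so the encoding of this source file does not matter.
        String umlaut = "\u00E4";
        System.out.println(hex(umlaut.getBytes(StandardCharsets.ISO_8859_1))); // E4
        System.out.println(hex(umlaut.getBytes(StandardCharsets.UTF_8)));      // C3 A4
    }
}
```

A lone E4 byte in a file that is declared as UTF-8 would point to the behaviour described above; the pair C3 A4 is what a correct UTF-8 export should contain.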

Sorry, but a solution for the image problem is very urgent for me.

Thanks,

Theo Schmid

alexey.noskov · July 21, 2011, 2:09pm

Hi
Thank you for reporting this problem to us. I managed to reproduce the problem on my side. As a temporary workaround, you can use the following code:

Document doc = new Document("C:\\Temp\\Testdokument5.doc");
HtmlSaveOptions opt = new HtmlSaveOptions(SaveFormat.HTML);
opt.setExportMetafileAsRaster(false);
doc.save("C:\\Temp\\out.html", opt);

In this case, metafiles will be exported to HTML as is. But some browsers cannot display metafiles.
Regarding the second problem, if it is possible, you can remove all sensitive information from your document and attach it here.
Best regards,

tschmid · July 22, 2011, 4:25am

Hi Alexey,

The problem with the ANSI characters seems to have vanished … So: sorry, forget this one.

I did as you suggested. Then I ran the EMF files through Aspose.Metafiles (with a temporary license I got yesterday), and everything is fine.

Doesn't Aspose use the same code in Aspose.Words and Aspose.Metafiles for this task?
Will there be an Aspose.Words release with the working code in the near future?
If not, is there a way to get a cheap license for Aspose.Metafiles on top of my existing Aspose.Words license?

Thanks,

Theo Schmid

alexey.noskov · July 22, 2011, 5:28am

Hi
Thanks for your request. No, Aspose.Words does not use Aspose.Metafiles internally. We will certainly fix the problem in a future version, and we will let you know once the issue is resolved.
Best regards,

tschmid · July 22, 2011, 11:44am

Hi Alexey,

Sorry, but the problem goes on:

As I said, I changed the code as you suggested.
Now there is an EMF image that is still exported as PNG, and as an empty image. It is the first image in the document I am attaching.
So I tried saving the images with shape.getImageData().save(…).

com.aspose.words.NodeCollection shapes = doc.getChildNodes(com.aspose.words.NodeType.SHAPE, true, false);
int imageIndex = 0;
for (com.aspose.words.Shape shape : (Iterable<com.aspose.words.Shape>)shapes)
{
    if (shape.hasImage())
    {
        int IT = shape.getImageData().getImageType();
        if (IT == com.aspose.words.ImageType.WMF)
        {
            shape.getImageData().save(EingabeDateiName + "." + imageIndex + ".wmf");
        }
        else if (IT == com.aspose.words.ImageType.EMF)
        {
            shape.getImageData().save(EingabeDateiName + "." + imageIndex + ".emf");
        }
        else if (IT == com.aspose.words.ImageType.PICT)
        {
            shape.getImageData().save(EingabeDateiName + "." + imageIndex + ".pict");
        }
        else if (IT == com.aspose.words.ImageType.JPEG)
        {
            shape.getImageData().save(EingabeDateiName + "." + imageIndex + ".jpeg");
        }
        else if (IT == com.aspose.words.ImageType.PNG)
        {
            shape.getImageData().save(EingabeDateiName + "." + imageIndex + ".png");
        }
        else if (IT == com.aspose.words.ImageType.BMP)
        {
            shape.getImageData().save(EingabeDateiName + "." + imageIndex + ".bmp");
        }
        imageIndex++;
    }
}

The images I get with this method are correct, and I can convert all the EMFs without a problem. But the numbering in the normal export does not match mine. The normal export seems to check whether an image is a copy of another one and exports it only once, so my routine above produces more images than the normal export. My document contains one image that appears 6 times (the second image, followed by 5 copies).
The images from the normal export are numbered from 001 to 008, and I get 0 to 12. That means 5 images more, exactly the number of copies of the second image.

My question:
Is there a way to identify which image will be exported with which number?
Then I can exchange them.
Or is there a way to identify whether an image is a copy of another one?
Then I can skip it.
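
One way to spot copies without relying on the exporter is to fingerprint the image bytes. The sketch below uses only the JDK and assumes that copies are byte-for-byte identical and that the export numbers distinct images in order of first appearance, starting at 1. Both are assumptions drawn from the behaviour observed above, not documented Aspose.Words behaviour. The names `ImageDedup`, `fingerprint` and `exportNumbers` are made up; in the loop above, the byte arrays would come from each shape's image data.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ImageDedup {
    // Hex SHA-256 digest of the image bytes; identical bytes give identical keys.
    public static String fingerprint(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) sb.append(String.format("%02x", b & 0xFF));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // For each image in document order, returns the 1-based number of the
    // distinct image it matches. A copy gets the number of its first occurrence.
    public static List<Integer> exportNumbers(List<byte[]> images) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> numbers = new ArrayList<>();
        for (byte[] image : images) {
            String key = fingerprint(image);
            Integer number = seen.get(key);
            if (number == null) {
                number = seen.size() + 1; // first time this content appears
                seen.put(key, number);
            }
            numbers.add(number);
        }
        return numbers;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {4, 5, 6};
        // Second image followed by one copy, then a copy of the first image.
        List<byte[]> images = Arrays.asList(a, b, b.clone(), a.clone());
        System.out.println(exportNumbers(images)); // [1, 2, 2, 1]
    }
}
```

With such a list, a shape whose number has already been seen can be skipped, and the remaining numbers can be compared against the 001 to 008 file names. Whether they really line up depends on the numbering assumption holding.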

Alternatively:
In a post on your forum I found a hint that in such a case I can export the images as above and mark the right places with dummy entries, so that I can fill in the correct img tag later. But I cannot find that post again, so could you please give me a pointer?

Thanks a lot,

Theo Schmid

alexey.noskov · July 22, 2011, 12:36pm

Hi
Thanks for your request. I think you can try using IImageSavingCallback to achieve what you need:
https://reference.aspose.com/words/java/com.aspose.words/iimagesavingcallback/
Also, if you can convert metafiles to PNG without problems, you can do that before converting to HTML and reset the image data of the appropriate shapes:
https://reference.aspose.com/words/java/com.aspose.words/imagedata/#getImageBytes
Best regards,

tschmid · July 27, 2011, 9:15am

Hello Alexey,

Thank you for the help.

What I have programmed:

Loop through the shapes.
Identify the EMF and WMF images.
Get the ImageData byte array.
Convert them with Aspose.Metafiles.
Put the converted bytes in place of the old ImageData bytes.
Extract as if nothing had happened.
Be happy!
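
The steps above can be sketched without any Aspose dependency. Everything in this sketch is a hypothetical stand-in: `Picture` plays the role of a shape's image data, the type tag stands for what `getImageType()` reported in the earlier loop, and `toPng` stands for the Aspose.Metafiles conversion, whose actual API is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class MetafileSwap {
    // Hypothetical stand-in for a shape's image: a type tag plus the raw bytes.
    public static final class Picture {
        public final String type; // e.g. "EMF", "WMF", "PNG"
        public byte[] bytes;

        public Picture(String type, byte[] bytes) {
            this.type = type;
            this.bytes = bytes;
        }
    }

    // Replaces the bytes of every EMF/WMF picture with the converter's output
    // and returns how many pictures were replaced. Other pictures are untouched.
    public static int replaceMetafiles(List<Picture> pictures, UnaryOperator<byte[]> toPng) {
        int replaced = 0;
        for (Picture p : pictures) {
            if ("EMF".equals(p.type) || "WMF".equals(p.type)) {
                p.bytes = toPng.apply(p.bytes);
                replaced++;
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        List<Picture> pictures = new ArrayList<>(Arrays.asList(
                new Picture("EMF", new byte[] {1}),
                new Picture("PNG", new byte[] {2}),
                new Picture("WMF", new byte[] {3})));
        // Dummy converter for the demo; a real one would rasterize the metafile.
        UnaryOperator<byte[]> fake = in -> new byte[] {(byte) 0x89, 'P', 'N', 'G'};
        System.out.println(replaceMetafiles(pictures, fake)); // 2
    }
}
```

In the real program the bytes would be read from and written back to each shape's ImageData before calling doc.save(…), so the HTML export only ever sees raster images.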

Bye,

Theo Schmid

alexey.noskov · July 27, 2011, 9:26am

Hi Theo,
It is great that you managed to work around the problem. Please feel free to ask if you run into any other issues. We are always glad to help.
Best regards,

aspose.notifier · July 30, 2011, 2:11am

The issues you have found earlier (filed as WORDSJAVA-17) have been fixed in this .NET update and in this Java update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.