Multiple issues with CHM conversion

Hello,
Kindly try to convert this sample Chm to other formats, start with Pdf:
chm.zip (207.4 KB)

  1. The resulting saved PDF’s title is taken from the last HTML page inside the CHM file. I think it would be much better to take the PDF’s title from the CHM’s own title.

The same applies to all other formats that can have a title, such as HTML itself.

  2. See the top bar in the CHM:

    When converted to other formats, it is not rendered correctly: the text is defaced and also wrapped onto 2 lines!

  3. I’ve run into the black background for transparent images during conversion again; I will post an update on it.

  4. As you can see, the style of the 1st page inside the CHM is different from that of the other pages, but the following pages’ style is applied to the 1st page too. This is not serious; if you think it is time-consuming or might break other parts, simply disregard it.

OK, this one is surely a bug. Load a CHM and get the title (I have no idea whether BuiltInDocumentProperties.Title is the only correct way to get the document title; please confirm if there are other properties too):

' Assumes "Imports Aspose", and that SourceFile and LoadOptions are defined elsewhere.
Dim MyDocument As Words.Document = New Words.Document(SourceFile, LoadOptions)
MsgBox(MyDocument.BuiltInDocumentProperties.Title)

This returns the title of the last HTML page inside the CHM file, which is wrong.
It must return the actual title of the CHM help file itself, as shown here:
title.png (64.8 KB)

Regarding issue no. 3 in the first post, simply load this CHM:

chm.zip (207.4 KB)

Save it to HTML, then load the resulting HTML and convert it to RTF. See here:

2 issues:

  1. Transparent images now have a black background! I have seen similar issues here and there in Aspose.Words before.
  2. The RTF file is insanely large: 146 MB? If you save as DOC, it is 6.9 MB. What is wrong?

@australian.dev.nerds
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-25800,WORDSNET-25801

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@australian.dev.nerds
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-25802

@australian.dev.nerds

  1. I have opened ticket WORDSNET-25800 for the issue with the title.
  2. I have opened ticket WORDSNET-25801 for the issue with the bar text outline effect.
  3. I could not reproduce this issue with CHM->HTML->RTF conversion. Could you please share the code to reproduce the issue?
  4. I have opened ticket WORDSNET-25801 for the first page style issue.
  5. I cannot reproduce the RTF size issue. On my side, both the CHM->RTF and the CHM->HTML->RTF output is 16 MB. Could you please share the code to reproduce the issue?

Hello
Yes, I have now produced a 584 MB file, and Word cannot even open it:
word.png (5.5 KB)

The file larger than 512 MB was produced with HtmlSaveOptions.ImageResolution = 600.
Setting it to 300 produces a 165 MB file. Simply run my sample:

WindowsApplication4.zip (6.8 MB)

I am still wondering, though: why produce a Word file larger than 512 MB that even my high-end rig cannot open? This seems to be a limit of the Word format; should Aspose.Words exceed that limit?

@australian.dev.nerds There is no file size limit for MS Word documents. The only limit is the resources available on your machine, and the way MS Word handles large documents can differ from machine to machine. It is recommended not to use huge documents. A normal MS Word document is about 100-200 pages. Larger documents might cause issues in MS Word, again depending on the resources available on the machine.

Thanks, but this document has fewer than 10 pages!
Anyway, kindly confirm the CHM->HTML->RTF conversion issue by running the sample. As a side note, if you find the root cause of the strangely large size, kindly let me know.

@australian.dev.nerds The issue is caused by the images in your document. RTF is not a compact document format, and images stored in RTF take up quite a lot of space. In your HTML save options you specify a high resolution for images, which increases the RTF document size.
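
As a rough illustration of the point about image resolution, the sketch below saves the intermediate HTML with a moderate resolution instead of 600. The file names are placeholders, and the value 96 is an assumption (it is, to my knowledge, the default of HtmlSaveOptions.ImageResolution); it is not taken from the attached sample project.

```vb
Imports Aspose.Words
Imports Aspose.Words.Saving

Module ChmToHtmlSketch
    Sub Main()
        ' Placeholder path: point this at the CHM file being converted.
        Dim doc As New Document("input.chm")

        Dim htmlOptions As New HtmlSaveOptions()
        ' Assumed moderate value; 600 dpi produced the very large
        ' output discussed in this thread.
        htmlOptions.ImageResolution = 96

        doc.Save("output.html", htmlOptions)
    End Sub
End Module
```

Since the pixel count grows roughly with the square of the resolution, going from 600 to 300 dpi alone should cut the image data to about a quarter.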

Well, thanks. OK, I won’t set ImageResolution, but can you please confirm the issue with transparent images appearing with a black background?

@australian.dev.nerds The problem occurs because RtfSaveOptions.SaveImagesAsWmf is set to true in your code. If you set it to false, the images look correct.

Thanks. If I permanently set RtfSaveOptions.SaveImagesAsWmf to false, the output won’t be compatible with WordPad, right?

About the large size issue: the ImageResolution property alone was not the root cause. Most of the size came from RtfSaveOptions.ExportImagesForOldReaders: 160 MB with it vs 13 MB without.

Do you recommend permanently setting ExportImagesForOldReaders to False? I ask because the default is True.

@australian.dev.nerds The RtfSaveOptions.SaveImagesAsWmf option might help to avoid WordPad warning messages, but even without this option the RTF should be compatible with WordPad.
When RtfSaveOptions.ExportImagesForOldReaders is set, each image is written into the RTF document twice, which is why the size increases dramatically. Whether to set this option depends on your needs. If you do not need to support old or simplified RTF readers, you can disable this option.
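
The two RTF options discussed above can be set together as in the following minimal sketch. The file names are placeholders, and whether to disable ExportImagesForOldReaders still depends on which RTF readers you need to support.

```vb
Imports Aspose.Words
Imports Aspose.Words.Saving

Module HtmlToRtfSketch
    Sub Main()
        ' Placeholder path: an HTML file previously saved from the CHM.
        Dim doc As New Document("output.html")

        Dim rtfOptions As New RtfSaveOptions()
        ' Do not convert images to WMF; with True, transparent images
        ' came out with a black background in this thread.
        rtfOptions.SaveImagesAsWmf = False
        ' Write each image once instead of twice. Disable only if old or
        ' simplified RTF readers do not need to be supported.
        rtfOptions.ExportImagesForOldReaders = False

        doc.Save("output.rtf", rtfOptions)
    End Sub
End Module
```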

@australian.dev.nerds The problem with the bar text occurs because the IE-specific CSS property filter: progid:DXImageTransform.Microsoft.Glow(color='Blue', Strength='2') is used in the CHM. This feature is now deprecated, and our development team is not going to work on supporting it in the near future. The ticket WORDSNET-25801 will be closed as “Won’t Fix”.

Oops, the ancient -ms-filter DXImageTransform.Microsoft.Glow.

Yes, agreed, it is not wise to support it. I thought it was HTML5.

OK, just one thing remains from the CHM conversion issues: the TITLE.

The resulting saved PDF’s title is taken from the last HTML page inside the CHM file. I think it would be much better to take the PDF’s title from the CHM’s own title.

The same applies to all other formats that can have a title, such as HTML itself.

@australian.dev.nerds This issue is already logged as WORDSNET-25800. We will keep you updated and let you know once it is resolved.
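
Until WORDSNET-25800 is resolved, one possible workaround is to overwrite the title before saving. This is only a sketch under assumptions: the file names and the title string are placeholders, and it presumes the caller already knows the correct CHM title, since nothing in this thread shows a way to read it from the CHM itself.

```vb
Imports System
Imports Aspose.Words

Module ChmTitleWorkaroundSketch
    Sub Main()
        ' Placeholder path: point this at the CHM file being converted.
        Dim doc As New Document("input.chm")

        ' As reported in this thread, this currently holds the title of
        ' the last HTML page inside the CHM.
        Console.WriteLine(doc.BuiltInDocumentProperties.Title)

        ' Hypothetical value: replace with the real title of the CHM help file.
        doc.BuiltInDocumentProperties.Title = "My Help Title"

        doc.Save("output.pdf")
    End Sub
End Module
```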

The issues you found earlier (filed as WORDSNET-25801) have been fixed in the Aspose.Words for .NET 23.9 update, which is also available on NuGet.

Hello, fixed? Do you mean that support for such an ancient feature has been added?