Words mht to pdf

australian.dev.nerds · June 17, 2023, 1:27pm

Hello,
I want to use Aspose.Words in combination with Aspose.Email to convert emails to PDF. First I convert the email to MHTML.
Then, using Aspose.Words:

Dim MyDocument As Words.Document = New Words.Document(ms, LoadMHTopt)
MyDocument.Save(SFD.FileName, SavePDFopt)

When converting MHT to PDF, which load options and save options (conversion options) do you advise setting or considering?

A suggestion: make Words.Document disposable so it can be used in a Using block.

alexey.noskov · June 17, 2023, 6:41pm

@australian.dev.nerds There is no need to specify additional load or save options for conversion from MHTML to PDF. You can use the following code to convert MHTML to PDF:

Document doc = new Document(@"in.mhtml");
doc.Save(@"out.pdf");

Aspose.Words does not allocate any unmanaged resources when loading a document, so there is no need to dispose of the Document object. It is collected by the garbage collector once the Document object is out of scope.

australian.dev.nerds · June 18, 2023, 1:02am

Hello and thanks, but I found these options useful and would like to ask a few short questions about them:

Dim LoadMHTopt As New Words.Loading.LoadOptions With {
    .LoadFormat = Words.LoadFormat.Mhtml,
    .Encoding = Encoding.UTF8
}

Dim SavePDFopt As New Words.Saving.PdfSaveOptions With {
    .SaveFormat = Words.SaveFormat.Pdf,
    .ExportGeneratorName = False,
    .UseHighQualityRendering = True,
    .UseAntiAliasing = True,
    .TempFolder = Nothing
}

ExportGeneratorName: how can I set my own custom generator name as a string while leaving ExportGeneratorName = True?
UseHighQualityRendering and UseAntiAliasing: do they have any effect for PDF when set to True? Even if they have no effect, is setting them to True harmless?
TempFolder = Nothing: does it have any effect when saving to PDF?
PrettyFormat: is leaving it False recommended for PDF?
MemoryOptimization: should it be set to True or False?
Finally, does AllowEmbeddingPostScriptFonts have any effect for PDF? Which formats does it affect?

To convert emails to Pdf, do you recommend:
Saving email as Mhtml and then use Words.Loading.LoadOptions
or
Saving email as Html and then use Words.Loading.HtmlLoadOptions
?

Thank you very much for your help

australian.dev.nerds · June 18, 2023, 1:22am

And one issue: most emails reference remotely hosted images, so when saving as PDF those images must be downloaded and embedded in the target PDF. They are not, which results in:
error.zip (1.8 KB)

alexey.noskov · June 18, 2023, 5:28am

@australian.dev.nerds

Unfortunately, there is no way to specify your own custom generator name in output PDF document.

The UseHighQualityRendering property is applicable only when saving a document to image formats such as Tiff, Png, Bmp, Jpeg and Emf.

TempFolder specifies the folder for temporary files used when saving to a DOC or DOCX file. This property has no effect when saving to PDF.

PrettyFormat is used to make HTML, MHTML, EPUB, WordML, RTF, DOCX and ODT output human readable, which is useful for testing or debugging. It is not applicable to the PDF format.

MemoryOptimization can significantly decrease memory consumption while saving large documents at the cost of slower saving time. It is recommended to enable this option if you convert large documents.

AllowEmbeddingPostScriptFonts is not applicable to the PDF format. It is applicable only to MS Word output formats such as DOCX, DOC and RTF.
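
For illustration, here is a minimal sketch that keeps only the save option described above as relevant to PDF output. The file names in.mhtml and out.pdf and the class name are placeholders, and whether MemoryOptimization is worth enabling depends on document size, as noted above.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

class MhtmlToPdfSketch
{
    static void Main()
    {
        // Load the MHTML file; the format is detected automatically.
        Document doc = new Document("in.mhtml");

        PdfSaveOptions saveOptions = new PdfSaveOptions
        {
            // Trades slower saving for lower memory use on large documents.
            MemoryOptimization = true
        };

        // UseHighQualityRendering, TempFolder, PrettyFormat and
        // AllowEmbeddingPostScriptFonts are left out because, per the
        // answers above, they do not apply to PDF output.
        doc.Save("out.pdf", saveOptions);
    }
}
```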

LoadOptions is the base class of HtmlLoadOptions, so all options available in the LoadOptions class are also available in the HtmlLoadOptions class. HtmlLoadOptions also provides properties that are specific to HTML-like formats.

Could you please attach your input MHTML document here for testing? We will check the issue and provide you with more information. The problem might occur because the image is not available or Aspose.Words does not have access to it. You can implement the IResourceLoadingCallback interface if you want to control how Aspose.Words loads external resources when importing a document.
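
As a rough sketch of the callback approach mentioned above, the following downloads linked images explicitly and hands the bytes to Aspose.Words. The class names, file names and the choice of WebClient are assumptions made for illustration; the error handling is deliberately simple and simply skips images that cannot be fetched.

```csharp
using System.Net;
using Aspose.Words;
using Aspose.Words.Loading;

// Hypothetical callback: fetches external images itself instead of
// relying on the default resource loading.
class ImageDownloadCallback : IResourceLoadingCallback
{
    public ResourceLoadingAction ResourceLoading(ResourceLoadingArgs args)
    {
        // Leave stylesheets and other resources to the default loader.
        if (args.ResourceType != ResourceType.Image)
            return ResourceLoadingAction.Default;

        try
        {
            using (WebClient client = new WebClient())
            {
                byte[] data = client.DownloadData(args.OriginalUri);
                args.SetData(data);
            }
            return ResourceLoadingAction.UserProvided;
        }
        catch (WebException)
        {
            // The image is unreachable; skip it rather than fail the load.
            return ResourceLoadingAction.Skip;
        }
    }
}

class CallbackUsageSketch
{
    static void Main()
    {
        HtmlLoadOptions loadOptions = new HtmlLoadOptions
        {
            ResourceLoadingCallback = new ImageDownloadCallback()
        };

        Document doc = new Document("in.mhtml", loadOptions);
        doc.Save("out.pdf");
    }
}
```

Setting a breakpoint or logging args.OriginalUri inside the callback is also a simple way to see which image URLs are requested and which ones fail.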

australian.dev.nerds · June 18, 2023, 10:44am

Hello, I wonder whether anyone has ever asked for this as a feature request? Words is not an end-user app but a high-end, high-priced SDK, so flexibility is expected.

One more thing: if we use Words.FileFormatUtil.DetectFileFormat just to check the file format for other purposes (not for opening the file with Aspose.Words), what should I do if the default enum value Auto is returned? How should the file type be interpreted then?

alexey.noskov · June 18, 2023, 4:46pm

@australian.dev.nerds I have logged a feature request in our defect tracking system as WORDSNET-25551. We will consider adding such a feature.

The Words.FileFormatUtil.DetectFileFormat method never returns LoadFormat.Auto. This enum value is used by the Document constructor to let Aspose.Words know that it should auto-detect the load format (the default behavior when no load options are passed).
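
A small sketch of format detection used on its own, without opening the document, might look like this. The default file name input.bin and the class name are placeholders, and the sketch assumes that unrecognized files are reported as LoadFormat.Unknown rather than LoadFormat.Auto.

```csharp
using System;
using Aspose.Words;

class FormatCheckSketch
{
    static void Main(string[] args)
    {
        // Placeholder path; pass a real file path as the first argument.
        string path = args.Length > 0 ? args[0] : "input.bin";

        FileFormatInfo info = FileFormatUtil.DetectFileFormat(path);

        if (info.LoadFormat == LoadFormat.Unknown)
        {
            // Assumption: this is the value for files Aspose.Words cannot identify.
            Console.WriteLine("Format not recognized by Aspose.Words.");
        }
        else
        {
            Console.WriteLine("Detected format: " + info.LoadFormat);
            Console.WriteLine("Encrypted: " + info.IsEncrypted);
        }
    }
}
```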

australian.dev.nerds · June 21, 2023, 1:46am

Hello,
Here is a ready-to-run VS.NET 2010 project that reproduces the problem:
WindowsApplication396.zip (6.2 MB)

alexey.noskov · June 21, 2023, 5:04am

@australian.dev.nerds Thank you for the additional information. The image in your MHTML document is not accessible. It is not displayed when viewing the document in a browser or when opening it in MS Word. The problematic image URL is the following:
https://docs.microsoft.com/answers/themes/minerva/images/qna-email-logo.png

If the document is converted to PDF using MS Word, the image is also not loaded:
aw.pdf (63.0 KB)
ms.pdf (21.5 KB)

australian.dev.nerds · June 21, 2023, 9:45am

Thanks, but I am not so sure. It opens in the browser:
Untitled.jpg (20.2 KB)

Anyway, kindly run this sample project to compare Words vs Cells conversion.
Yes, the Cells output is poor, but at least it downloads the image. Words never gets it; I tested against many emails:
WindowsApplication55.zip (19.6 KB)

alexey.noskov · June 21, 2023, 12:27pm

@australian.dev.nerds On my side, I see the image in the output PDF document:

out.zip (62.0 KB)

australian.dev.nerds · June 21, 2023, 3:06pm

Thanks, please kindly run my VS.NET sample project below.
Some images in the PCL output are rendered incorrectly, like a negative:
WindowsApplication60.zip (6.2 MB)

Also, kindly let me know whether running this same code base on your side downloads all images and embeds them in the output PDF.

alexey.noskov · June 21, 2023, 4:25pm

@australian.dev.nerds
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-25575

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

PS: Images in the output PDF look correct on my side.

australian.dev.nerds · June 22, 2023, 1:18am

Thanks. If you don’t mind, may I request one more test case:

WindowsApplication60.zip (6.2 MB)

It saves as PDF, EPUB, XPS, SVG and TIFF.

TIFF, EPUB and XPS save only the first page. Is it possible to have all pages in a single file?

Also, the image download problem I had earlier still exists. I disabled Windows Firewall entirely and still can’t find the cause. Are you running my exact project code above? If yes, any idea what might be wrong?

alexey.noskov · June 22, 2023, 5:40am

@australian.dev.nerds As far as I can see, the output documents have 3 pages as expected. For example, see the XPS output produced by your code: xps.zip (124.9 KB)

Please also note that EPUB is not a fixed-page format. It is a flow format, more like HTML, and does not have a page concept.

aspose.notifier · July 10, 2023, 4:57am

The issues you found earlier (filed as WORDSNET-25575) have been fixed in the Aspose.Words for .NET 23.7 update, which is also available on NuGet.