Words mht to pdf

Hello,
I want to use Aspose.Words in combination with Aspose.Email to convert emails to PDF. First I convert the email to MHTML.
Then using Aspose Words:

Dim MyDocument As Words.Document = New Words.Document(ms, LoadMHTopt)
MyDocument.Save(SFD.FileName, SavePDFopt)

Please advise: when converting MHT to PDF, which load options and save options (conversion options) do you recommend setting or considering?

A suggestion: make Words.Document disposable so that it can be used in a Using block.

@australian.dev.nerds There is no need to specify additional load or save options when converting from MHTML to PDF. You can use the following code to convert MHTML to PDF:

Document doc = new Document(@"in.mhtml");
doc.Save(@"out.pdf");

Aspose.Words does not allocate any unmanaged resources when loading a document, so there is no need to dispose of the Document object. It is collected by the garbage collector once the Document object goes out of scope.

Hello and thanks, but I found these options useful and would like to ask a few short questions about them:

Dim LoadMHTopt As New Words.Loading.LoadOptions With {
    .LoadFormat = Words.LoadFormat.Mhtml,
    .Encoding = Encoding.UTF8
}

Dim SavePDFopt As New Words.Saving.PdfSaveOptions With {
    .SaveFormat = Words.SaveFormat.Pdf,
    .ExportGeneratorName = False,
    .UseHighQualityRendering = True,
    .UseAntiAliasing = True,
    .TempFolder = Nothing
}

  1. ExportGeneratorName: how can I set my own custom generator name as a string while leaving ExportGeneratorName = True?

  2. Do UseHighQualityRendering and UseAntiAliasing have any effect for PDF when set to True? Even if they have no effect, is setting them to True harmless?

  3. Does TempFolder = Nothing have any effect when saving to PDF?

  4. PrettyFormat for PDF: is leaving it False recommended?

  5. Kindly advise about MemoryOptimization: should it be set to True or False?

  6. Finally, does AllowEmbeddingPostScriptFonts have any effect for PDF? Which formats does it affect?

To convert emails to Pdf, do you recommend:
Saving email as Mhtml and then use Words.Loading.LoadOptions
or
Saving email as Html and then use Words.Loading.HtmlLoadOptions
?

Thank you very much for your help :slight_smile:

And one issue: since most emails reference hosted images, those images must be downloaded and embedded in the target PDF when saving, but they are not, resulting in:
error.zip (1.8 KB)

@australian.dev.nerds

Unfortunately, there is no way to specify your own custom generator name in output PDF document.

The UseHighQualityRendering property is applicable only when saving a document to image formats, like Tiff, Png, Bmp, Jpeg, Emf.

TempFolder specifies the folder for temporary files used when saving to a DOC or DOCX file. This property does not have an effect when saving to PDF.

PrettyFormat is used to make HTML, MHTML, EPUB, WordML, RTF, DOCX and ODT output human readable. Useful for testing or debugging. It is not applicable for PDF format.

MemoryOptimization can significantly decrease memory consumption while saving large documents at the cost of slower saving time. It is recommended to enable this option if you convert large documents.

AllowEmbeddingPostScriptFonts is not applicable for PDF format. It is applicable only for MS Word output formats like DOCX, DOC and RTF.

LoadOptions is a base class for HtmlLoadOptions, so all options available in LoadOptions class are also available in HtmlLoadOptions class. HtmlLoadOptions also provides properties which are specific for HTML-like formats.
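As a rough illustration of the options discussed above, the sketch below loads MHTML from a stream through HtmlLoadOptions and saves it to PDF with MemoryOptimization enabled. The file names and the ConvertSketch class are placeholders that do not come from this thread, and the explicit encoding is only an assumption carried over from the question; treat this as a starting point rather than recommended settings.

```csharp
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Loading;
using Aspose.Words.Saving;

// Hypothetical class name, used only for this sketch.
class ConvertSketch
{
    static void Main()
    {
        // HtmlLoadOptions derives from LoadOptions, so LoadFormat and
        // Encoding are available on it as well.
        var loadOptions = new HtmlLoadOptions
        {
            LoadFormat = LoadFormat.Mhtml,
            Encoding = Encoding.UTF8 // assumption: the source is UTF-8
        };

        // Trades slower saving for lower memory use on large documents.
        var saveOptions = new PdfSaveOptions
        {
            MemoryOptimization = true
        };

        // "in.mhtml" and "out.pdf" are placeholder paths.
        using (FileStream input = File.OpenRead("in.mhtml"))
        {
            var doc = new Document(input, loadOptions);
            doc.Save("out.pdf", saveOptions);
        }
    }
}
```

For small documents the two-line version shown earlier in the thread should behave the same; the options only matter when the defaults do not fit.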

Could you please attach your input MHTML document here for testing? We will check the issue and provide you more information. The problem might occur because the image is not available or Aspose.Words does not have access to it. You can implement IResourceLoadingCallback interface if you want to control how Aspose.Words loads external resources when importing a document.
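A minimal sketch of such a callback might look like the following. It assumes the missing images are reachable over HTTP(S) from the machine running the conversion. The class name ImageDownloadCallback and the file names are hypothetical, and HttpClient requires .NET Framework 4.5 or later, so an older project would need a different downloader.

```csharp
using System;
using System.Net.Http;
using Aspose.Words;
using Aspose.Words.Loading;

// Hypothetical callback: downloads remote images itself and hands the
// bytes to Aspose.Words instead of relying on the default loader.
class ImageDownloadCallback : IResourceLoadingCallback
{
    private static readonly HttpClient Http = new HttpClient();

    public ResourceLoadingAction ResourceLoading(ResourceLoadingArgs args)
    {
        // Only intercept images; let Aspose.Words handle everything else.
        if (args.ResourceType != ResourceType.Image)
            return ResourceLoadingAction.Default;

        Uri uri;
        if (!Uri.TryCreate(args.OriginalUri, UriKind.Absolute, out uri) ||
            (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
            return ResourceLoadingAction.Default;

        try
        {
            byte[] data = Http.GetByteArrayAsync(uri).GetAwaiter().GetResult();
            args.SetData(data);
            return ResourceLoadingAction.UserProvided;
        }
        catch (Exception ex)
        {
            // Log the failure and skip the image so the conversion continues.
            Console.WriteLine("Could not load " + uri + ": " + ex.Message);
            return ResourceLoadingAction.Skip;
        }
    }
}

class CallbackSketch
{
    static void Main()
    {
        var loadOptions = new LoadOptions
        {
            LoadFormat = LoadFormat.Mhtml,
            ResourceLoadingCallback = new ImageDownloadCallback()
        };

        // Placeholder paths.
        var doc = new Document("in.mhtml", loadOptions);
        doc.Save("out.pdf");
    }
}
```

Logging each failed URL in the callback is also a quick way to find out which images are unreachable and why.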

Hello, I wonder whether no one has ever asked for this as a feature request? Words is not an end-user app but a high-end, high-priced SDK, so flexibility is expected.

One thing: if we use Words.FileFormatUtil.DetectFileFormat just to check the file format for other purposes (not for opening with Aspose.Words), what should I do if the default enum value Auto is returned? How should the file type be interpreted then?

@australian.dev.nerds I have logged a feature request in our defect tracking system as WORDSNET-25551. We will consider adding such a feature.

Words.FileFormatUtil.DetectFileFormat method never returns LoadFormat.Auto. This enum value is used by Document constructor to let Aspose.Words know that it should auto detect load format (default behavior when no load options are passed).
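A short sketch of checking a file's format without opening it as a document could look like this. The file name and the DetectSketch class are placeholders. The handling of LoadFormat.Unknown reflects the assumption that this is the value reported for files Aspose.Words does not recognize, which is worth confirming against the API reference for your version.

```csharp
using System;
using Aspose.Words;

// Hypothetical class name, used only for this sketch.
class DetectSketch
{
    static void Main()
    {
        // "input.bin" is a placeholder path.
        FileFormatInfo info = FileFormatUtil.DetectFileFormat("input.bin");

        if (info.LoadFormat == LoadFormat.Unknown)
        {
            // Assumption: Unknown means the format was not recognized.
            Console.WriteLine("Not a format Aspose.Words recognizes.");
        }
        else
        {
            Console.WriteLine("Detected: " + info.LoadFormat +
                              ", encrypted: " + info.IsEncrypted);
        }
    }
}
```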

Hello,
Ready to run VS.net 2010 project to reproduce the problem:
WindowsApplication396.zip (6.2 MB)

@australian.dev.nerds Thank you for the additional information. The image in your MHTML document is not accessible. It is not displayed when viewing the document in a browser or when opening the document in MS Word. The problematic image URL is the following:
https://docs.microsoft.com/answers/themes/minerva/images/qna-email-logo.png

If you convert the document to PDF using MS Word, the image is also not loaded:
aw.pdf (63.0 KB)
ms.pdf (21.5 KB)

Thanks, but I am not so sure. Here it is opened in a browser:
Untitled.jpg (20.2 KB)

Anyway, kindly run this sample project to compare Words vs Cells conversion.
Yep, Cells’ output is a waste, but at least it downloads the image; Words never gets it. Tested against many emails:
WindowsApplication55.zip (19.6 KB)

@australian.dev.nerds On my side I see the image in the output PDF document:

out.zip (62.0 KB)

Thanks, please kindly run my VS.NET sample project below.
Some images in the PCL output are rendered incorrectly, like a negative:
WindowsApplication60.zip (6.2 MB)

Also, kindly let me know whether running this same code on your side downloads all images and embeds them in the output PDF?

@australian.dev.nerds
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-25575

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

PS: Images in the output PDF look correct on my side.

Thanks, if you don’t mind, may I request one more test case:

WindowsApplication60.zip (6.2 MB)

Saving as PDF / EPUB / XPS / SVG / TIFF:

TIFF / EPUB / XPS output only contains the 1st page. Is it possible to have all pages in a single file?

And the image download problem I had earlier still exists. I disabled my whole Windows Firewall and still can’t find the problem. Are you running my exact project code above? If yes, any idea what might be wrong? :slight_smile:

@australian.dev.nerds As far as I can see, the output documents have 3 pages as expected. For example, see the XPS output produced by your code: xps.zip (124.9 KB)

Please also note that EPUB is not a fixed-page format. It is a flow format, more like HTML, and it does not have a page concept.

The issues you found earlier (filed as WORDSNET-25575) have been fixed in the Aspose.Words for .NET 23.7 update, which is also available on NuGet.