@australian.dev.nerds There is no need to specify additional load or save options when converting from MHTML to PDF. You can use the following code to convert MHTML to PDF:
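// The load format is detected automatically, so no LoadOptions are required.
Document doc = new Document(@"in.mhtml");
// The output format is inferred from the .pdf file extension.
doc.Save(@"out.pdf");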
Aspose.Words does not allocate any unmanaged resources when loading a document, so there is no need to dispose of the Document object; the garbage collector collects it once it goes out of scope.
ExportGeneratorName: how can I set my own custom name as a string while leaving ExportGeneratorName = True?
Do UseHighQualityRendering and UseAntiAliasing have any effect for PDF if set to True? Even if they have no effect, is setting them to True harmless?
Does TempFolder = Nothing have any effect when saving to PDF?
PrettyFormat for PDF: is leaving it False recommended?
Kindly advise about MemoryOptimization: should it be set to True or False?
Finally, does AllowEmbeddingPostScriptFonts have any effect for PDF? Which formats does it affect?
To convert emails to PDF, do you recommend:
Saving the email as MHTML and then using Words.Loading.LoadOptions
or
Saving the email as HTML and then using Words.Loading.HtmlLoadOptions
?
And one issue: most emails contain hosted images, so when saving as PDF these images must be downloaded and embedded in the target PDF. This does not happen, resulting in: error.zip (1.8 KB)
Unfortunately, there is no way to specify your own custom generator name in the output PDF document.
The UseHighQualityRendering property is applicable only when saving a document to image formats such as TIFF, PNG, BMP, JPEG and EMF.
TempFolder specifies the folder for temporary files used when saving to a DOC or DOCX file. This property has no effect when saving to PDF.
PrettyFormat is used to make HTML, MHTML, EPUB, WordML, RTF, DOCX and ODT output human-readable. It is useful for testing or debugging and is not applicable to the PDF format.
MemoryOptimization can significantly decrease memory consumption while saving large documents at the cost of slower saving time. It is recommended to enable this option if you convert large documents.
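As a minimal sketch of the recommendation above, the option can be set on a PdfSaveOptions instance and passed to Save. The file names are placeholders taken from the earlier snippet, and the class name MhtmlToPdf is only illustrative.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

class MhtmlToPdf
{
    static void Main()
    {
        // Placeholder input path; the load format is auto-detected.
        Document doc = new Document(@"in.mhtml");

        // Trade slower saving for lower memory use on large documents.
        PdfSaveOptions saveOptions = new PdfSaveOptions
        {
            MemoryOptimization = true
        };

        doc.Save(@"out.pdf", saveOptions);
    }
}
```

For small documents the default (False) is probably the better choice, since the option only pays off when memory consumption is the bottleneck.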
AllowEmbeddingPostScriptFonts is not applicable to the PDF format. It is applicable only to MS Word output formats such as DOCX, DOC and RTF.
LoadOptions is a base class for HtmlLoadOptions, so all options available in LoadOptions class are also available in HtmlLoadOptions class. HtmlLoadOptions also provides properties which are specific for HTML-like formats.
Could you please attach your input MHTML document here for testing? We will check the issue and provide you with more information. The problem might occur because the image is not available or because Aspose.Words does not have access to it. You can implement the IResourceLoadingCallback interface if you want to control how Aspose.Words loads external resources when importing a document.
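The callback approach might look roughly like the sketch below. It downloads remote images itself and logs failures, which can help show whether an image is really unreachable. The class names ImageDownloadCallback and CallbackDemo are hypothetical, WebClient is used only for brevity (it is marked obsolete in newer .NET versions), and the exception handling is an assumption about which failures are worth catching.

```csharp
using System;
using System.Net;
using Aspose.Words;
using Aspose.Words.Loading;

// Hypothetical helper: fetches http(s) images manually and reports failures.
class ImageDownloadCallback : IResourceLoadingCallback
{
    public ResourceLoadingAction ResourceLoading(ResourceLoadingArgs args)
    {
        // Leave non-image resources and non-HTTP URIs to the default loader.
        if (args.ResourceType != ResourceType.Image ||
            !args.OriginalUri.StartsWith("http", StringComparison.OrdinalIgnoreCase))
        {
            return ResourceLoadingAction.Default;
        }

        try
        {
            using (WebClient client = new WebClient())
            {
                byte[] data = client.DownloadData(args.OriginalUri);
                args.SetData(data);
                return ResourceLoadingAction.UserProvided;
            }
        }
        catch (WebException ex)
        {
            // The image is not reachable; log it and skip the resource.
            Console.WriteLine("Cannot load " + args.OriginalUri + ": " + ex.Message);
            return ResourceLoadingAction.Skip;
        }
    }
}

class CallbackDemo
{
    static void Main()
    {
        LoadOptions loadOptions = new LoadOptions
        {
            ResourceLoadingCallback = new ImageDownloadCallback()
        };
        Document doc = new Document(@"in.mhtml", loadOptions);
        doc.Save(@"out.pdf");
    }
}
```

Any URI that shows up in the log output is one that could not be downloaded from the machine running the conversion.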
Hello, I wonder whether anyone has ever requested this as a feature? Words is not an end-user app but a high-end, high-priced SDK, so flexibility is expected.
One thing: if we use Words.FileFormatUtil.DetectFileFormat just to check the file format for other purposes (not for opening the file with Aspose.Words), what should I do if the default enum value Auto is returned? How should the file type be interpreted then?
@australian.dev.nerds I have logged a feature request in our defect tracking system as WORDSNET-25551. We will consider adding such a feature.
The Words.FileFormatUtil.DetectFileFormat method never returns LoadFormat.Auto. This enum value is used by the Document constructor to tell Aspose.Words that it should auto-detect the load format (the default behavior when no load options are passed).
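A short sketch of using the detection result on its own is given below. The file name input.bin and the class name DetectDemo are placeholders. The sketch assumes that an unrecognized file is reported as LoadFormat.Unknown rather than LoadFormat.Auto, which is worth confirming against the API reference for your version.

```csharp
using System;
using Aspose.Words;

class DetectDemo
{
    static void Main()
    {
        // Placeholder path; the file is inspected, not loaded as a Document.
        FileFormatInfo info = FileFormatUtil.DetectFileFormat(@"input.bin");

        if (info.LoadFormat == LoadFormat.Unknown)
        {
            // Assumption: Unknown means the format was not recognized.
            Console.WriteLine("Format not recognized.");
        }
        else
        {
            Console.WriteLine("Detected format: " + info.LoadFormat);
            Console.WriteLine("Encrypted: " + info.IsEncrypted);
        }
    }
}
```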
@australian.dev.nerds Thank you for the additional information. The image in your MHTML document is not accessible. It is not displayed when the document is viewed in a browser or opened in MS Word. The problematic image URL is the following: https://docs.microsoft.com/answers/themes/minerva/images/qna-email-logo.png
If the document is converted to PDF using MS Word, the image is not loaded either: aw.pdf (63.0 KB) ms.pdf (21.5 KB)
Thanks, but I am not so sure. Here it is opened in a browser: Untitled.jpg (20.2 KB)
Anyway, kindly run this sample project to compare Words vs Cells conversion.
Yes, the Cells output is poor, but at least it downloads the image. Words never gets it; I tested against many emails: WindowsApplication55.zip (19.6 KB)
Thanks, please kindly run my VS.NET sample project below:
Some images in the PCL output are rendered incorrectly, appearing as negatives: WindowsApplication60.zip (6.2 MB)
Also, kindly let me know whether running this same code base will download all images and embed them in the output PDF.
@australian.dev.nerds
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): WORDSNET-25575
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
PS: Images in the output PDF look correct on my side.
TIFF / EPUB / XPS: only the first page is saved. Is it possible to have all pages in a single file?
The image download problem I had earlier still exists. I disabled Windows Firewall entirely and still cannot find the cause. Are you running my exact project code above? If yes, any idea what might be wrong?