I read the same articles and RFCs on the Internet, thank you J. We have also been testing our MHTML import implementation with Jacob Palme’s samples since we started working on it. I have now made all combinations of multipart structures from that set pass. You’ll be able to verify this with our next version. Of course, we also have many other tests, ranging from real-world cases to very marginal ones.
multipart/alternative and multipart/related are typically combined in exactly two ways: one of the two is the outer level and the other is the inner. Please read the referenced mail message carefully. multipart/mixed can occur inside multipart/alternative, but this case is uncommon. Usually the same result is achieved by placing multipart/alternative inside multipart/related. If a mailer doesn’t recognize text/html, it should take text/plain and treat the subsidiary parts of the outer multipart/related as if they were in multipart/mixed. This gives the same result without having to repeat every subsidiary part in both alternatives.
In any case, the lack of support for multipart/mixed inside multipart/alternative won’t affect you, since we always take the multipart/related alternative to find text/html inside it. Why should Aspose.Words read the other alternatives? Maybe only when the preferred alternative is absent or damaged. Do you agree? If so, this is a very minor case. I have created the corresponding issue and linked it to this thread. We’ll notify you when it’s fixed, but I should repeat that its business priority is very low.
Regarding extremely large images and tables, I’ve already answered your question: you can edit the document programmatically once it has been imported. Can I help you further with this question?
At this time it is important for us that we can read and parse the real mails we have already received from customers. Some examples of them were attached to my previous message.
In parallel, we can try to build a workaround that stores mails for manual processing whenever a mail cannot be processed automatically, and analyse these rare cases later (maybe it will always be exactly the same unusual case).
I’m waiting for the next version. If I’ve understood you correctly, the next version will parse more cases, right?
Are there any samples showing how to traverse the document programmatically and extend all pages so that all truncated elements on those pages become visible?
Yes, we have added support for some new features in the current codebase. They’ll be available with the next release. If you have any other questionable samples, please share them here in the forum.
No, there is no existing sample for extending the page size to fit the content. Maybe someone has written one, but I don’t know of it. If it’s difficult, I can try to write one myself. It would also be good to have a sample source document, to be sure that the code covers your cases.
At the moment I don’t have a proper page-sizing algorithm.
But we have received a new real HTML mail with a very uncommon internal format: it can be viewed in Internet Explorer and Outlook, but can’t be opened with MS Word.
Moreover, there is an error in the boundary name at the bottom of the mail: it seems to be truncated, which does not prevent IE and Outlook from rendering the file correctly. Even after I corrected the bottom line, the mail still can’t be opened by MS Word.
I’ve attached this mail to my post. Maybe you can take a look before publishing a new release.
I have looked at your last attached file more closely and found three issues:
A multipart boundary is not recognized if it contains spaces. This is not a problem to support. It is most probably a deviation from the standard, since Microsoft Word doesn’t support such boundaries. As a workaround, you can replace the spaces in boundary strings with underscore characters or anything else.
The closing multipart/related boundary is damaged. When I fixed these two problems manually, I managed to open the document with Microsoft Word. A new issue has been created, but I’m not sure we’ll fix it in the foreseeable future. Repairing damaged files is a complex and “endless” task, since we cannot predict any “acceptable level of corruption”.
between two tables doesn’t take effect. This is an issue with the HTML importer, which has also been logged.
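The boundary workaround from the first point could be sketched roughly as follows. This is only a minimal sketch in plain C#, independent of Aspose.Words: the class `BoundaryFixer` and the method `ReplaceSpaces` are hypothetical names, and it assumes the whole message has been read into a string with an encoding that preserves every byte (for example Latin-1). Only quoted boundary values are inspected, since an unquoted parameter value cannot contain a space.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Aspose.Words.
public static class BoundaryFixer
{
    // Matches boundary="..." inside a Content-Type header (also on a folded line).
    private static readonly Regex QuotedBoundary = new Regex(
        "(?<prefix>boundary\\s*=\\s*\")(?<name>[^\"\\r\\n]*)\"",
        RegexOptions.IgnoreCase);

    public static string ReplaceSpaces(string mime)
    {
        var renamed = new Dictionary<string, string>();

        // Pass 1: rewrite every boundary declaration that contains a space
        // and remember the old and new names.
        string result = QuotedBoundary.Replace(mime, m =>
        {
            string name = m.Groups["name"].Value;
            if (!name.Contains(" "))
                return m.Value;
            string fixedName = name.Replace(' ', '_');
            renamed[name] = fixedName;
            return m.Groups["prefix"].Value + fixedName + "\"";
        });

        // Pass 2: rewrite the matching delimiter lines ("--name" and "--name--").
        // Only whole lines are touched, so body text is left alone.
        foreach (KeyValuePair<string, string> pair in renamed)
        {
            var delimiter = new Regex(
                "^--" + Regex.Escape(pair.Key) + "(?<close>--)?[ \\t]*(?<cr>\\r?)$",
                RegexOptions.Multiline);
            result = delimiter.Replace(result, m =>
                "--" + pair.Value + m.Groups["close"].Value + m.Groups["cr"].Value);
        }

        return result;
    }
}
```

The returned string can be written back to a file or stream and then loaded as usual. A boundary declaration that merely appears inside body text would be rewritten as well, and a truncated closing boundary is not repaired, so treat this as a starting point rather than a complete solution.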
Thanks for your inquiry. Could you please attach a sample document that will allow us to reproduce the problem? We will check it and provide you with more information.
This is the same document I attached to my previous post in this thread (the one with the very long file name).
Here is the attachment once again (with the last boundary footer corrected).
But I think I have found what caused the problem: the trial string on the first page seems to push the content down, so that the last line no longer fits within the page boundary :)
What is a little curious is that this line is placed at the bottom of the page, not directly below the preceding line.
Thank you for the additional information. The problem might occur because there are two tables directly one after another in the document. I have linked your request to the appropriate issue; you will be notified as soon as it is resolved.
As a workaround, you can just add an empty paragraph between tables. For example, see the following code:
// Open the document.
Document doc = new Document(@"Test001\in.mhtml");
// Get the collection of all tables in the document.
NodeCollection tables = doc.GetChildNodes(NodeType.Table, true);
// Loop through all tables.
foreach (Table table in tables)
{
    // Check whether the node right after the table is another table.
    // If so, insert an empty paragraph between the two tables.
    if (table.NextSibling != null && table.NextSibling.NodeType == NodeType.Table)
        table.ParentNode.InsertAfter(new Paragraph(doc), table);
}
// Save the output document.
doc.SaveToPdf(@"Test001\out.pdf");
Previously I found and discussed a problem with spaces inside boundary names (example attached), which causes a FileCorrupted exception when such files are opened with Aspose.Words.
Could you please supply a fix for this problem?
It is rather difficult to analyse the file and replace the spaces in boundaries with underscores in a robust way.
Thanks for your inquiry. The problem with spaces in multipart boundaries is already resolved in the current codebase. The fix will be included in the next hotfix, which will be released in 3-4 weeks. You will be notified as soon as it is published.