How to store paginated print preview of HTML/MHT

Hello,

I’ve got the trial version in order to evaluate the HTML/Word rendering capabilities of Aspose.Words.

What I need is to store such files page by page, like a Print Preview, taking page dimensions and resolution into account. Could someone please show me a simple example of how to accomplish such a task?
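Taking page dimensions and resolution into account is essentially a unit conversion: a page size given in points (1/72 inch) or millimetres becomes a pixel size at a chosen DPI. The helper below is a minimal, library-independent sketch under that assumption; `PageMath`, `MmToPoints` and `ToPixels` are hypothetical names, not Aspose.Words API, and A4 is taken as 210 x 297 mm.

```csharp
using System;

// Hypothetical helper (not part of Aspose.Words): converts page sizes
// into pixel dimensions for a target resolution.
public static class PageMath
{
    public const double PointsPerInch = 72.0;      // 1 pt = 1/72 inch
    public const double MillimetresPerInch = 25.4;

    // Converts millimetres to points, e.g. for paper sizes given in mm.
    public static double MmToPoints(double mm) => mm / MillimetresPerInch * PointsPerInch;

    // Returns (Width, Height) in pixels, rounded to the nearest whole pixel.
    public static (int Width, int Height) ToPixels(double widthPt, double heightPt, double dpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
        int w = (int)Math.Round(widthPt / PointsPerInch * dpi);
        int h = (int)Math.Round(heightPt / PointsPerInch * dpi);
        return (w, h);
    }
}
```

For A4 this gives 794 x 1123 pixels at 96 DPI and 2480 x 3508 pixels at 300 DPI.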

I’ve already tried the supplied “Preview” example, but the rendering of HTML with tables does not look very pleasant, and alignment attributes on images are ignored. Here is a simple example:
Defining the head, body and foot of a table

Affected people

| Association 1 | Association 2 | Association 3 |
|---|---|---|
| affected: 4 million people | affected: 2 million people | affected: 1 million people |
| Berlin | Hamburg | München |
| Miljöh | Kiez | Bierdampf |
| Buletten | Frikadellen | Fleischpflanzerl |
| 2. Buletten | 2. Frikadellen | 2. Fleischpflanzerl |
| 3. Buletten | 3. Frikadellen | 3. Fleischpflanzerl |
| 4. Buletten | 4. Frikadellen | 4. Fleischpflanzerl |

| Association 1 | Association 2 | Association 3 |
|---|---|---|
| affected: 4 million people | affected: 2 million people | affected: 1 million people |
| Berlin | Hamburg | München |
| Miljöh | Kiez | Bierdampf |
| Buletten | Frikadellen | Fleischpflanzerl |
| 2. Buletten | 2. Frikadellen | 2. Fleischpflanzerl |
| 3. Buletten | 3. Frikadellen | 3. Fleischpflanzerl |
| 4. Buletten | 4. Frikadellen, Fleischpflanzerl |

Many thanks in advance!

Alexander

Hi

Thanks for your request. The problem occurs because some cells in your table use colspan. When you convert HTML containing cells with rowspan or colspan to DOC, Aspose.Words represents them as merged cells, but each of the merged cells contains the same content, which causes problems during rendering or conversion to PDF. I linked your request to the appropriate issue. You will be notified as soon as it is resolved.

As a temporary workaround, you can try using the following code:

```csharp
using Aspose.Words;
// In later Aspose.Words versions, Cell and CellMerge live in Aspose.Words.Tables.

// Open the HTML document.
Document doc = new Document(@"Test199\HTMLTables1.html");

// Get the collection of all cells in the document.
NodeCollection cells = doc.GetChildNodes(NodeType.Cell, true);

// Loop through all cells and look for merged cells.
foreach (Cell cell in cells)
{
    if (cell.CellFormat.HorizontalMerge == CellMerge.Previous ||
        cell.CellFormat.VerticalMerge == CellMerge.Previous)
    {
        // Remove the duplicated content from cells merged with the previous one.
        cell.RemoveAllChildren();
    }
}

// Save all pages, starting at page index 0, as a TIFF image.
doc.SaveToImage(0, doc.PageCount, @"Test199\preview.tif", null);
```
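To make the explanation and workaround above concrete, here is a simplified, library-free model of a single table row. It is not the Aspose.Words object model: `Merge`, `ModelCell` and `MergeModel` are hypothetical types that only mirror the idea that a cell spanning several columns becomes a run of merged cells which each carry the same text, and that the workaround clears every cell marked as merged with the previous one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical merge flags, mirroring the idea of CellMerge (not the real type).
public enum Merge { None, First, Previous }

public sealed class ModelCell
{
    public Merge HorizontalMerge;
    public string Text;
    public ModelCell(Merge merge, string text) { HorizontalMerge = merge; Text = text; }
}

public static class MergeModel
{
    // Expands (text, colspan) pairs into one cell per grid column. As described
    // in the reply, every cell of a merged run carries a copy of the same text.
    public static List<ModelCell> ExpandRow(IEnumerable<(string Text, int ColSpan)> htmlCells)
    {
        var row = new List<ModelCell>();
        foreach (var (text, span) in htmlCells)
        {
            if (span < 1) throw new ArgumentOutOfRangeException(nameof(htmlCells));
            if (span == 1) { row.Add(new ModelCell(Merge.None, text)); continue; }
            row.Add(new ModelCell(Merge.First, text));
            for (int i = 1; i < span; i++)
                row.Add(new ModelCell(Merge.Previous, text)); // duplicated content
        }
        return row;
    }

    // Same rule as the workaround: clear every cell merged with the previous one.
    public static void ClearMergedContent(List<ModelCell> row)
    {
        foreach (var cell in row)
            if (cell.HorizontalMerge == Merge.Previous) cell.Text = "";
    }
}
```

Only the first cell of each merged run keeps its text after `ClearMergedContent`, so the renderer no longer sees the same content repeated. Vertical merges (rowspan) would follow the same pattern along a column.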

Hope this helps.

Best regards.

Many thanks for your answer!

Now I have the following questions:

  1. Is it possible to get an image per page in some way, like “Image iImage=doc.RenderPage()”?

  2. Which HTML subset is fully supported? Some Table/Cell/Row attributes seem not to be supported, and align attributes are also ignored, as shown in the previous example.

  3. How closely will DOC documents be rendered compared to MS Word, and which versions of the Word formats are supported?

  4. What is the typical response time for HTML and Word issues for registered customers?

  5. I’ve read that MHTML is also supported, but for some .eml mail files (which are close to MHTML) I get the error “Unsupported Content-Type”. Is it possible to disable such errors and render all text/plain and text/html parts, or is it necessary to convert all parts to standalone HTML files?

Thanks in advance!

Hi

Thanks for your inquiry.

  1. Yes, of course. You just need to use another overload of the SaveToImage method. Please see the following link for more information:

http://www.aspose.com/documentation/file-format-components/aspose.words-for-.net-and-java/aspose.words.document.savetoimage_overload_2.html

  2. Not all HTML features are currently supported. You can find additional information here:

http://www.aspose.com/community/files/51/file-format-components/aspose.words-for-.net-and-java/entry108980.aspx

  3. It is difficult to say how close a Word document will be to the HTML, because HTML and Word formats are different, and it is extremely difficult to produce a Word document that looks exactly like the HTML, and vice versa. It depends on your documents and their complexity.

You can find list of supported formats here:

http://www.aspose.com/documentation/file-format-components/aspose.words-for-.net-and-java/file-formats-and-conversions.html

  4. Requests in the forum should be answered within 24 hours. Aspose.Words releases are published every 4-5 weeks and contain fixes for customers' issues. When you report a problem, it is put into the queue and we will fix it in one of the future releases. The time needed depends on the complexity and importance of the issue.

  5. There is no way to disable errors. Please attach your MHTML document here for testing. I will check it and provide you with more information.
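Regarding point 1: a minimal sketch of exporting each page to its own file. To keep it testable without the library, the actual save call is passed in as a delegate; with Aspose.Words it could wrap the four-argument `SaveToImage(pageIndex, pageCount, fileName, options)` call used earlier in this thread, with a page count of 1 and zero-based page indexes (as in `SaveToImage(0, doc.PageCount, ...)`). The class name and the `preview_001.tif` naming scheme are illustrative assumptions; getting an in-memory image through the linked overload is not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of Aspose.Words.
public static class PagewiseExport
{
    // Builds one output path per page, e.g. preview_001.tif, preview_002.tif, ...
    public static string PagePath(string directory, string baseName, int pageIndex, string extension)
        => Path.Combine(directory, $"{baseName}_{pageIndex + 1:D3}{extension}");

    // Calls savePage(pageIndex, pageCount, fileName) once per page, with pageCount = 1,
    // and returns the list of file names that were requested.
    public static List<string> ExportEachPage(int totalPages, string directory, string baseName,
        string extension, Action<int, int, string> savePage)
    {
        if (totalPages < 0) throw new ArgumentOutOfRangeException(nameof(totalPages));
        var written = new List<string>();
        for (int i = 0; i < totalPages; i++)
        {
            string path = PagePath(directory, baseName, i, extension);
            savePage(i, 1, path); // zero-based page index, one page per file
            written.Add(path);
        }
        return written;
    }
}
```

With a loaded document, usage could look like `PagewiseExport.ExportEachPage(doc.PageCount, @"Test199", "preview", ".tif", (i, n, f) => doc.SaveToImage(i, n, f, null));`.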

Best regards.

Thanks for your quick response!

I didn’t formulate one of my questions correctly :)

… 3. How closely will the Aspose engine render MS Word .DOC documents compared to MS Word, and which versions of the Word formats are supported?

This question was not related to HTML.

Another question: is full support for HTML 4 planned for the near future?

Thanks!

Hi

Thanks for your request.

  1. The Aspose.Words rendering engine renders Word documents very close to the original; in most cases the rendered image looks exactly like the original document. If you have any problems with rendering your document, please report them in this forum.

  2. Aspose.Words supports DOC format (starting from MS Word 97), DOCX (Word 2007 format), WordML (Word 2003 XML format), RTF, ODT, HTML, EPUB and MHTML (which are based on HTML), and of course export to TXT and PDF.

  3. It is extremely difficult to promise that HTML 4 will be fully supported, because HTML is not a document format like DOC; it is a web page format, and it is difficult to map HTML features to Word document features.

Best regards.

Thanks for your answer!

alexey.noskov:

3. It is extremely difficult to promise you that HTML 4 will be fully supported, because HTML is not actually DOC format, it is web page format, and it is difficult to map between HTML features and Word documents features.

I will try to simplify my question: how closely can you support the HTML 4 tag set, particularly for images, alignment, color, font styles, font sizes and table borders/lines/rules?

Thanks in advance.

Hi

Thanks for your request. You can find an approximate list of what is supported in HTML import/export here:

http://www.aspose.com/community/files/51/file-format-components/aspose.words-for-.net-and-java/entry108980.aspx

But you should note that this list could be out of date, since we are working on improving our HTML import/export modules.

Best regards.

Thanks!

alexey.noskov:


Thanks for your request. You can find an approximate list of what is supported in HTML import/export here:

http://www.aspose.com/community/files/51/file-format-components/aspose.words-for-.net-and-java/entry108980.aspx

But, you should note, this list could be out of date, since we are working on improving out HTML import/export modules.

Best regards.

It would be very nice if you could make some small improvements soon.

I’ve documented the results of A4 page rendering with Aspose.Words (via screenshots, because the trial version prints only the first page), IE7 and Word 2003 in the attached files. IE7 looks better than Word 2003, e.g. for table footers.

As for our requirements: at the moment we send self-created HTML mails (because of this, we can make small changes to their structure in some places, but not overall), receive the HTML mails back after users have completed them, and archive these documents. Later we must be able to produce page images from any of these HTML mails. At the same time, we must produce images from a number of other formats, such as .DOC.

It would also be very nice if we could render HTML mails with embedded images (as MHT or EML) without converting them to HTML files. Is that possible?

Thanks in advance!

Hi

Thank you for additional information.

  1. You can improve the conversion a little by using the workaround I suggested earlier. As far as I can see, you did not use it when converting your HTML to TIF.

```csharp
Document doc = new Document(@"Test001\T.html");
RemoveContentFromMergedCells(doc);
doc.SaveToImage(0, doc.PageCount, @"Test001\out.tif", null);
```

```csharp
/// <summary>
/// Removes content from merged cells.
/// </summary>
public void RemoveContentFromMergedCells(Document doc)
{
    // Get the collection of all cells in the document.
    NodeCollection cells = doc.GetChildNodes(NodeType.Cell, true);
    foreach (Cell cell in cells)
    {
        // Check whether the cell is merged with the previous one.
        if (cell.CellFormat.HorizontalMerge == CellMerge.Previous ||
            cell.CellFormat.VerticalMerge == CellMerge.Previous)
        {
            // Remove content from the cell.
            cell.RemoveAllChildren();
        }
    }
}
```

  2. Aspose.Words supports the MHTML format, so you can open MHTML files directly, without converting them to HTML.
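Regarding the .eml files that failed earlier with “Unsupported Content-Type”: since such errors cannot be disabled, one option is to read the top-level Content-Type header of each mail file before passing it to Aspose.Words, and route unexpected types to a separate conversion step. The sketch below only parses that header (including folded continuation lines). `MimeHeaders.TopLevelMediaType` is a hypothetical helper, and which media types Aspose.Words actually accepts is an assumption to verify. MHTML files normally use `multipart/related` (RFC 2557), while mail clients often send `multipart/alternative` or `multipart/mixed`.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Aspose.Words.
public static class MimeHeaders
{
    // Returns the media type (e.g. "multipart/related") from the top-level
    // Content-Type header of a MIME message, or null if there is none.
    public static string TopLevelMediaType(TextReader reader)
    {
        string current = null; // header being assembled (may span folded lines)
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Folded header: continuation lines start with a space or tab.
            bool isContinuation = line.Length > 0 && (line[0] == ' ' || line[0] == '\t');
            if (isContinuation && current != null) { current += " " + line.Trim(); continue; }

            string mediaType = MediaTypeIfContentType(current);
            if (mediaType != null) return mediaType;
            if (line.Length == 0) return null; // blank line ends the header block
            current = line;
        }
        return MediaTypeIfContentType(current);
    }

    // Extracts "type/subtype" if the header is Content-Type; parameters are dropped.
    static string MediaTypeIfContentType(string header)
    {
        if (header == null) return null;
        int colon = header.IndexOf(':');
        if (colon < 0) return null;
        string name = header.Substring(0, colon).Trim();
        if (!name.Equals("Content-Type", StringComparison.OrdinalIgnoreCase)) return null;
        string value = header.Substring(colon + 1);
        int semicolon = value.IndexOf(';');
        if (semicolon >= 0) value = value.Substring(0, semicolon);
        return value.Trim().ToLowerInvariant();
    }
}
```

Only the top-level header block is inspected; Content-Type headers of individual body parts are ignored.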

Best regards.

Hi!

Many thanks for your help! This solved the cell-merging rendering problem.

Now I’m interested in solving some other issues, e.g. with borders, rules, alignment, line breaking inside headers, table headers and footers, cell padding, etc., as shown in the screenshots in the attached archive.

alexey.noskov:

But, you should note, this list could be out of date, since we are working on improving out HTML import/export modules.

Could you please tell me what is on the HTML improvement to-do list and what the planned deadlines are?

Thanks in advance!

Hi

Thanks for your inquiry. As I told you earlier, it is extremely difficult and often impossible to produce Word documents (and document previews) that look exactly like the source HTML. This is because HTML documents are not Word documents: HTML is a single-page format designed for the Web, not for display on pages.

Unfortunately, I currently cannot provide you with a list of planned improvements and estimates.

Best regards.

Thanks for quick response!

We don’t plan to produce Word documents from HTML documents at all :slight_smile:

We only need to print them or store them as images, in other words, paginated rendering of HTML.

Your product already supports paginated preview and printing of HTML very well, with only some small restrictions and missing features. As I can see in the table, you support alignment for paragraphs, but not for images. All rules in a table are always rendered with the same width, although it should be possible to render the table without some (or all) rules, as defined in the HTML table/row options. The same applies to table headers and footers, which must always be at the top or bottom of the table. The unwanted text breaking inside the header may be just a small bug. So I think only a small subset of missing features is needed to meet most of the requirements of this task; the other missing features are not very important for rendering.

Thanks!

Hi

Thank you for the additional information. Maybe you can try some post-processing: create a Document from the HTML and then format the Document’s elements as needed. If you need help with this, I can try to help you. But you should note that a workaround may work for one document but not for others.

Best regards.

Thanks!

You are right, of course we can do some post-processing for our own documents, or for replies if not many modifications have been made.

But now we have requirements to process completely arbitrary, user-created incoming mail; in that scenario we cannot apply any post-processing.

By the way, you mentioned that it’s possible to process a single-file web page archive, but I get an exception when I try to open such a file: http://www.aspose.com/community/forums/189924/unexpected-error-occured-if-opened-mht-single-webpage-document/showthread.aspx#189924

Thanks in advance!

Hello Alexander!

Thank you for clarification.

Handling documents coming from third parties, customers, etc., in other words from sources you don’t control, is a much more complex task. There could be literally anything in those documents. Anyway, you are welcome to share problematic documents with us and we’ll take appropriate measures. I’m going to look at the document in the referenced thread.

Regards,

The issues you have found earlier (filed as 7739) have been fixed in this update.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.