Unable to retrieve page number and page count fields in header/footer after DOC->HTML->DOC

chandrasekar · April 25, 2008, 11:09am

Hi,

I converted a .doc file into HTML using Aspose. The header and footer of the .doc contain page number and page count fields.

Now I am converting the .html file I got earlier back into .doc. I am using the code below to insert the header and footer into the .doc file. The header and footer appear in the .doc file, but the page number and page count fields do not.

Document doc=new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// thead holds the header HTML taken from the .html file

String thead ="1";

builder.moveToHeaderFooter(HeaderFooterType.HEADER_PRIMARY);
builder.insertHtml(thead);
doc.save("test.doc");

How can I get my page number and page count fields back in the .doc header and footer?

Thanks,
chandrasekar.P

Klepus · April 25, 2008, 2:42pm

Hello!
Thank you for your inquiry.
When a document is converted to HTML, page number and page count fields become ordinary text. Consequently, after a DOC->HTML->DOC roundtrip you get that text, not the fields, in the resulting DOC.
Please clarify why you need such a double conversion (DOC->HTML->DOC) and how you’d like to handle these fields.
Regards,

chandrasekar · April 28, 2008, 1:13am

Hi,

Thanks for your reply.

My concern is this:
If I convert a doc to HTML, I get the PAGE and NUMPAGES fields as ordinary text. This is OK, since we can't have such fields in an HTML document.

But if I convert the same HTML back to doc, I expect Aspose to have left some unique marks for those fields and to convert them back to the corresponding fields. This is not happening.

So now I need to do the same by some other means.

Is it possible for Aspose to convert a tag with a unique class attribute such as “PAGE” or “NUMPAGES” to the corresponding doc field when I invoke the builder.insertHtml() method?

For e.g.,

String thead ="1";

builder.moveToHeaderFooter(HeaderFooterType.HEADER_PRIMARY);
builder.insertHtml(thead); //thead is header html string
doc.save("test.doc");
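The mapping being asked for can also be approximated on the caller's side. Below is a rough sketch, not an Aspose.Words feature: it assumes the header HTML marks the former fields with a span whose class is PAGE or NUMPAGES (a hypothetical convention; Aspose.Words does not emit it), and it splits the HTML into ordinary pieces and field markers. HeaderHtmlSplitter and the "FIELD:" prefix are made-up names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits header HTML into plain HTML pieces and field markers.
final class HeaderHtmlSplitter {
    // Matches <span class="PAGE">...</span> or <span class="NUMPAGES">...</span>.
    // The class-based marking is an assumed convention, not Aspose.Words output.
    private static final Pattern MARKER = Pattern.compile(
            "<span\\s+class=\"(PAGE|NUMPAGES)\"\\s*>.*?</span>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the segments in document order. A segment of the form "FIELD:<code>"
    // stands for a field; every other segment is HTML to be inserted unchanged.
    // (Literal HTML that itself starts with "FIELD:" is not handled by this sketch.)
    static List<String> split(String html) {
        List<String> parts = new ArrayList<>();
        Matcher m = MARKER.matcher(html);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                parts.add(html.substring(last, m.start()));
            }
            parts.add("FIELD:" + m.group(1).toUpperCase());
            last = m.end();
        }
        if (last < html.length()) {
            parts.add(html.substring(last));
        }
        return parts;
    }
}
```

Each plain segment could then go through builder.insertHtml(), and each "FIELD:" segment through a field-insertion call, assuming the Aspose.Words version in use provides one (for example an insertField method on DocumentBuilder). Whether insertHtml() introduces paragraph breaks between the pieces would need to be checked.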

Hope you will give me an enhancement patch if it is not already supported.

Thanks,
Chandrasekar.P

Klepus · April 28, 2008, 4:33am

Hello!
Thank you for giving more information.
Aspose.Words outputs HTML in its “classic” form, without any special marks for a potential roundtrip. There are some exceptions to this rule, but they are optional and disabled by default. HTML produced by MS Word survives roundtrips better, but many people dislike it because of the Microsoft extensions.
Currently Aspose.Words doesn’t support import of out-of-line style information; that is, it does not react to class=“NUMPAGES”. This feature is under development. In the meantime you can apply a heuristic to find what was a page number before the roundtrip. For instance, you have this fragment in HTML:
1
It’s a paragraph with one span that has some particular formatting. After you convert the HTML to DOC, you can find it in the document model and replace it with a field. In my view, good criteria are that the paragraph contains only one run (spans become runs) holding a number, and that widow control is disabled for it. Normally widows and orphans are prohibited in documents, and this behavior is overridden only in rare cases. You can check this with the ParagraphFormat.WidowControl property.
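The criteria above can be written down as a small predicate. This is only a sketch of the decision logic in plain Java, with no Aspose.Words calls: PageNumberHeuristic and looksLikePageNumber are hypothetical names, and the caller is assumed to collect the run texts and the widow control flag from the imported paragraph (in the Java API the flag would presumably be read through the paragraph format's getter for WidowControl).

```java
import java.util.List;

// Hypothetical helper encoding the heuristic; it is not part of Aspose.Words.
final class PageNumberHeuristic {
    // runTexts:     text of every run in the paragraph (spans become runs on import)
    // widowControl: the paragraph's widow control setting after import
    static boolean looksLikePageNumber(List<String> runTexts, boolean widowControl) {
        // Criterion 1: the paragraph consists of exactly one run.
        if (runTexts.size() != 1) {
            return false;
        }
        // Criterion 2: widow control is disabled, which is unusual in normal text.
        if (widowControl) {
            return false;
        }
        // Criterion 3: the single run contains nothing but digits.
        String text = runTexts.get(0).trim();
        return !text.isEmpty() && text.chars().allMatch(Character::isDigit);
    }
}
```

A paragraph that passes the check is a candidate for replacement with a field. Note that the heuristic alone cannot tell a former PAGE field from a former NUMPAGES field, so that decision needs extra context, such as the position of the number in the header text.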
Please let us know whether this workaround suits your needs.
Regards,

asposebhuvana · June 18, 2008, 7:57am

Hi,
Is there any documentation or article on Word document to HTML conversion? Which tags are supported?

alexey.noskov · June 18, 2008, 8:23am

Hi
Thanks for your inquiry. Please see the following document.
https://releases.aspose.com/words/net
Best regards.

aspose.notifier · April 2, 2012, 8:53am

The issues you have found earlier (filed as WORDSNET-5557) have been fixed in this .NET update and this Java update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.