Page split function

Hi, I have two queries:

  1. Is there any way to copy a page's content and save it either to a new document or directly to HTML? The DocumentSplitCriteria = DocumentSplitCriteria.PageBreak option in HtmlSaveOptions does not seem to split by page number; it only splits where page breaks have been inserted manually in the document.

  2. I am getting an error when saving to HTML with the PageBreak document split option.

I have attached the test document I am using. Any advice you have would be appreciated.

saveOptions.DocumentSplitCriteria = DocumentSplitCriteria.PageBreak;

Other save options:

saveOptions.ExportTextInputFormFieldAsText = true;
saveOptions.PrettyFormat = true;
saveOptions.CssStyleSheetType = CssStyleSheetType.Embedded;
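
For context, here is roughly how these options are used. This is a minimal sketch, assuming the options are set on an HtmlSaveOptions instance that is passed to Document.Save (the call shown at the bottom of the stack trace); the class name and file paths are placeholders, not the attached test document.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

class SplitByPageBreakSketch
{
    static void Main()
    {
        // Placeholder input path (an assumption for illustration).
        Document doc = new Document("input.doc");

        HtmlSaveOptions saveOptions = new HtmlSaveOptions();

        // Splits the output only at explicit page breaks in the document,
        // not at the page boundaries produced by layout.
        saveOptions.DocumentSplitCriteria = DocumentSplitCriteria.PageBreak;

        saveOptions.ExportTextInputFormFieldAsText = true;
        saveOptions.PrettyFormat = true;
        saveOptions.CssStyleSheetType = CssStyleSheetType.Embedded;

        // Placeholder output path; this is the Save(String, SaveOptions) overload.
        doc.Save("output.html", saveOptions);
    }
}
```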

Exception:

System.InvalidOperationException: There was no XML start tag open.
at System.Xml.XmlTextWriter.InternalWriteEndElement(Boolean longFormat)
at System.Xml.XmlTextWriter.WriteFullEndElement()
at xce0136f05681c5e9.x967f6fb76e7d0c72.x718c3268815fe948()
at xce0136f05681c5e9.x967f6fb76e7d0c72.x9a7ad1735553086c(Inline x31545d7c306a55e4)
at xce0136f05681c5e9.x967f6fb76e7d0c72.VisitRun(Run run)
at Aspose.Words.Run.Accept(DocumentVisitor visitor)
at Aspose.Words.CompositeNode.x464d2134480a7bf2(DocumentVisitor x672ff13faf031f3d)
at Aspose.Words.CompositeNode.xf7ae36cd24e0b11c(DocumentVisitor x672ff13faf031f3d)
at Aspose.Words.Paragraph.Accept(DocumentVisitor visitor)
at Aspose.Words.CompositeNode.x464d2134480a7bf2(DocumentVisitor x672ff13faf031f3d)
at Aspose.Words.CompositeNode.xf7ae36cd24e0b11c(DocumentVisitor x672ff13faf031f3d)
at Aspose.Words.Body.Accept(DocumentVisitor visitor)
at xce0136f05681c5e9.x967f6fb76e7d0c72.xf5a995748d89aa41(Story x93d8434f027afd5a)
at xce0136f05681c5e9.x967f6fb76e7d0c72.x51ee56decc29a9da(Section xb32f8dd719a105db)
at xce0136f05681c5e9.x967f6fb76e7d0c72.xd7560e2140c6338f()
at xce0136f05681c5e9.x967f6fb76e7d0c72.xe00a9c07c675b8ed()
at xce0136f05681c5e9.x967f6fb76e7d0c72.xa2e0b7f7da663553(Document x6beba47238e0ade6, Stream xcf18e5243f8d5fd3, String xafe2f3653ee64ebc, Boolean xddad811e564048f0, HtmlSaveOptions xc27f01f21f67608c, IDictionary x82fcc4c2a16a6294)
at xce0136f05681c5e9.x967f6fb76e7d0c72.x8cac5adfe79bc025(x8556eed81191af11 x5ac1382edb7bf2c2)
at Aspose.Words.Document.xf381a641001e6830(Stream xcf18e5243f8d5fd3, String xafe2f3653ee64ebc, SaveOptions xc27f01f21f67608c)
at Aspose.Words.Document.Save(String fileName, SaveOptions saveOptions)

Hi

Thanks for your request. A Word document is a flow document and does not contain any information about its layout into lines and pages. Therefore, technically, there is no “page” concept in a Word document.
Aspose.Words uses its own rendering engine to lay out documents into pages, and we plan to expose layout information. Your request has been linked to the appropriate issue, and you will be notified as soon as this feature is supported.
As a workaround, you can try using the PageNumberFinder class suggested by Adam in this thread:
https://forum.aspose.com/t/58199
Also, I managed to reproduce the problem with your document. Your request has been linked to the appropriate issue, and you will be notified as soon as it is fixed.
Best regards,

Thanks, Andrey. We found some other bugs you might want to look at too:

  1. When loading a file, Aspose does not place fields, text, and images on exactly the same pages as MS Word does, so if we split by page, some content may move to other pages.

  2. For some reason, the top text line (or any other object) on pages after page 1 is not included when saving to HTML. I think the page-numbering code I got from Aspose has a bug; I was looking at it, but it is quite complex and I don't fully understand it.

I have the PageFinder code from this post, https://forum.aspose.com/t/58199, added by aske012, an Aspose staff member.

The code that retrieves all nodes on a page does not return the items (paragraphs) that are on the first line of a page for pages after page 1.

So basically it gives me the correct values for the first line of page 1, but for later pages it does not return the paragraph node for the first line on the page.

List pageNodes = pageFinder.RetrieveAllNodesOnPage(pageNum, true);

  3. When copying pages, numbering is incorrect. It always starts from 1 again, so numbering is lost for sections, long lists, etc.

Best wishes.

Brian

Hi Brian,
Thanks for this additional information.
Regarding your first request, could you please attach the documents that are being rendered differently here for testing?
Regarding the second and third requests, I have posted some feedback on the original thread.
Thanks,

The issues you have found earlier (filed as WORDSNET-4502) have been fixed in this .NET update and in this Java update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.

The issues you have found earlier (filed as WORDSNET-2978) have been fixed in this .NET update and this Java update.


The issues you have found earlier (filed as ) have been fixed in this update. This message was posted using BugNotificationTool from Downloads module by MuzammilKhan