PageSplitter Malfunction Aspose.Words.Net 17.3


#1

We are using Aspose.Words.NET 17.3 in an IIS web service.

Our application uses Word templates in a range of formats (doc, docx, rtf) containing MailMerge tags, and sometimes conditional logic. We merge data with those templates, then we SPLIT the document (using PageSplitter) into individual single-page sub documents so we can find the exact location of certain objects (Shapes) AFTER the merge. It’s critical that we know the page number each object lands on.

DocumentPageSplitter splitter = new DocumentPageSplitter(layoutCollector);
for (int page = 1; page <= doc.PageCount; page++)
{
    // GetDocumentOfPage returns the document for this page.
    Aspose.Words.Document pageDoc = splitter.GetDocumentOfPage(page);
    // . . . .
}

The single page sub documents created by the PageSplitter do not split accurately. Each sub document should be a single page, but we regularly find a few extra words or characters on a SECOND page in the sub document.

I saw a related thread (Split document page issue) where Moderator Tahir Manzoor suggested installing the fonts used in the document on the server. That will NOT work for us, because we are accepting Word files from Customers. We cannot always have every font they have. We assume that if font substitution happens, the result will look good enough.

Background - I inherited this code base from a developer who left. I’m not sure how she set this all up.

Question 1) I see the source code for PageSplitter.cs is built directly into our project. Is that normal? I expected PageSplitter to be part of the object model Aspose provides.

Question 2) Where can I get the latest version of PageSplitter.cs? I don’t see it when I search Aspose.Words docs.

Question 3) Are there cases where PageSplitter is KNOWN to have problems?

Thanks for your help,

SimpleTestSignableBadSplit.zip (26.9 KB)

Peter Cleaveland


#2

@pcleaveland

Please note that you need to install the fonts used in your document on the machine where you split the document’s pages into separate documents. If the fonts are not installed on that machine, the page layout of the document will be incorrect and you will get incorrect output.

PageSplitter is not part of the Aspose.Words API. It is a separate utility that splits a document’s pages into separate documents.

Please get the latest code of this utility from the GitHub repository of Aspose.Words for .NET.

There are no known problems with this utility. If you face any issue while using it, please let us know.


#3

Thanks @tahir.manzoor,

I updated to the latest PageSplitter.cs from the Examples, and it helps for SOME of our problem documents, but not all.

Next I will upgrade to the latest Aspose.Words (19.10) and see if that helps more.

Please explain about the Font issue you mentioned.

These templates are created by customers, who may have any font in the world installed. Hopefully they don’t go font-crazy, but we have no control over that. We merge them in our web service, where a limited number of common fonts are installed, then save the result as PDF.

We know font substitution will happen sometimes. We assume font substitution will be “good enough, most of the time”. We accept that, because we can never have ALL fonts that ALL customers may use, and we have no power to “police” the fonts they have installed.

When we open the templates in Word, they look “good enough”, and they have good font substitution.

When we merge the templates using Aspose, they do work, and the results look “good enough” when viewed in Word, except the page splitting is not always accurate.

Why is it critical to have all the fonts the customer used in order to split accurately?

Since we CAN NOT actually have ALL fonts that ALL customers might use, what is our alternative?

If we saved the template as a NEW document, AFTER fonts have been substituted, would that split accurately?

IF that would work, can we use Aspose to do it? We can’t edit all templates in Word.exe.

Peter


#4

@pcleaveland

The PageSplitter utility uses the Aspose.Words.Layout API. When the fonts used in a document are not installed, the layout API cannot determine the correct position/node where each page starts and ends.
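Since the original goal in this thread is to learn which page each Shape lands on, the same layout API can be queried directly. The following is only a minimal sketch, not a replacement for PageSplitter: it assumes a hypothetical file name (merged.docx) and that LayoutCollector.GetStartPageIndex can map the shapes in question (it returns 0 when a node cannot be mapped to a page). It relies on the same layout, so the font caveat above applies to it as well.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Drawing;
using Aspose.Words.Layout;

class ShapePageReport
{
    static void Main()
    {
        // Hypothetical path: the document produced by the mail merge.
        Document doc = new Document("merged.docx");

        // The collector records node-to-page mappings while the layout is built.
        LayoutCollector collector = new LayoutCollector(doc);
        doc.UpdatePageLayout();

        foreach (Shape shape in doc.GetChildNodes(NodeType.Shape, true))
        {
            // 1-based page index; 0 means the node could not be mapped to a page.
            int page = collector.GetStartPageIndex(shape);
            Console.WriteLine("Shape '{0}' starts on page {1}", shape.Name, page);
        }
    }
}
```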

Moreover, Aspose.Words requires TrueType fonts when rendering a document to fixed-page formats (JPEG, PNG, PDF or XPS). If you face an issue while using PageSplitter, you will notice the same issue when converting the document to a fixed-page format. For example, you can convert the document to PDF or XPS to check it.

We suggest you please read the following articles.
How Aspose.Words Uses True Type Fonts
How to Receive Notification of Missing Fonts and Font Substitution during Rendering
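As a rough illustration of the second article's topic, a warning callback can report which fonts were substituted while the layout is built. This is a minimal sketch under stated assumptions: the class name FontSubstitutionLogger and the file name template.docx are hypothetical, and the warnings are simply written to the console.

```csharp
using System;
using Aspose.Words;

// Hypothetical helper: logs only the font substitution warnings.
class FontSubstitutionLogger : IWarningCallback
{
    public void Warning(WarningInfo info)
    {
        if (info.WarningType == WarningType.FontSubstitution)
        {
            Console.WriteLine("Font substitution: " + info.Description);
        }
    }
}

class Program
{
    static void Main()
    {
        // Hypothetical path: a customer template to check.
        Document doc = new Document("template.docx");

        // Attach the callback before the layout is built so the warnings are captured.
        doc.WarningCallback = new FontSubstitutionLogger();
        doc.UpdatePageLayout();
    }
}
```

Running this against a problem template shows whether substitution happened at all, which helps separate font-related split errors from other causes.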


#5

@tahir.manzoor,

I updated to the latest version of PageSplitter, and Aspose 19.10, and we are still getting page split issues. I do not know if these issues are related to the fonts, but I will repeat my questions from before:

Question 4) Since we CAN NOT actually have ALL fonts that ALL customers might use, what is our alternative? We MUST have accurate splitting.

Question 5) If we saved the template as a NEW document, AFTER fonts have been substituted, would that split accurately?

Question 6) IF saving AFTER font substitution would avoid the Font problem, how would we use Aspose to do it?

Question 7) How can we make Table Header Rows split correctly?

I see some other docs that split wrong for reasons that have nothing to do with fonts. Here is a form (PRIVACY ACT NOTICE) that uses tables with header rows. Those table header rows do not work correctly when we split.


#6

@pcleaveland

In your case, we suggest that you convert each page of the document to PDF using Aspose.Words. Please check the following code example.

Document doc = new Document(MyDir + @"Issue document.docx");
PdfSaveOptions pdfSaveOptions = new PdfSaveOptions();
pdfSaveOptions.PageSavingCallback = new PageSavingCallback();
doc.Save(MyDir + @"19.10.pdf", pdfSaveOptions);

private class PageSavingCallback : IPageSavingCallback
{
    public void PageSaving(PageSavingArgs args)
    {
        args.PageFileName = string.Format(MyDir + @"Page_{0}.pdf", args.PageIndex);
    }
}

Once you have converted the document’s pages to PDF, please use Aspose.PDF to convert each PDF to a Word document. Please check the following code example.

Aspose.Pdf.Document pdf = new Aspose.Pdf.Document(MyDir + "output.pdf");
Aspose.Pdf.DocSaveOptions saveOptions = new Aspose.Pdf.DocSaveOptions();
saveOptions.Mode = Aspose.Pdf.DocSaveOptions.RecognitionMode.Flow;
pdf.Save(MyDir + "output.doc", saveOptions);

Hope this helps you.


#7

@tahir.manzoor,

Sorry, that answer does not help. We cannot adopt a whole new PDF product into our code base at this time. We need THIS splitting problem to be fixed.

This code produces a bad split for the attached document.
How can Aspose.Words and PageSplitter handle this document?

Splits Badly.zip (52.8 KB)
Splits Badly_Page01.zip (37.8 KB)
Splits Badly_Page02.zip (14.7 KB)

Aspose.Words.Document doc = new Aspose.Words.Document(customTemplatePath);
LayoutCollector layoutCollector = new LayoutCollector(doc);
doc.UpdatePageLayout();

DocumentPageSplitter splitter = new DocumentPageSplitter(doc);
for (int page = 1; page <= doc.PageCount; page++)
{
    // GetDocumentOfPage returns the document for this page.
    Aspose.Words.Document pageDoc = splitter.GetDocumentOfPage(page);
    string pagePath = GetSavePath(page);
    pageDoc.Save(pagePath);
}

Peter


#8

@pcleaveland

We have logged this problem in our issue tracking system as WORDSNET-19376. You will be notified via this forum thread once this issue is resolved.

We apologize for the inconvenience.