Incorrect pagination while splitting document

gregarshinov · March 23, 2018, 12:39pm

Hello!
I’m trying to split a DOCX file into pages using the example provided in PageSplitter.cs.
My document has 50 pages, but this code generates 62. Some pages are split incorrectly, with their final paragraphs pushed onto a new page. I’m using .NET Core 2.0 on macOS and plan to deploy the code on Ubuntu. The Aspose.Words version is 18.3. The document I’m running the splitter on is attached to this message. Документация_1.docx.zip (210.6 KB)

awais.hafeez · March 23, 2018, 4:27pm

@gregarshinov,

Aspose.Words for .NET can render your Word document to PDF correctly. However, the PageSplitter class is unable to produce the correct number of documents. This utility’s code is quite complex. We have logged a feature request in our issue tracking system to provide a built-in method in Aspose.Words for splitting documents into pages. The ID of this issue is WORDSNET-16228. Your thread has been linked to this issue, and you will be notified via this thread as soon as it is resolved. We apologize for the inconvenience.

As a workaround, you can convert your Word document to individual PDF pages by using Aspose.Words for .NET as follows:

// MyDir is the folder that holds the input document.
Document doc = new Document(MyDir + @"input.docx");
int pageCount = doc.PageCount;

// Render exactly one page per output file.
PdfSaveOptions opts = new PdfSaveOptions();
opts.PageCount = 1;
opts.UpdateFields = false;

for (int i = 0; i < pageCount; i++)
{
    // PageIndex is zero-based.
    opts.PageIndex = i;
    doc.Save(MyDir + @"18.3-" + i + ".pdf", opts);
}

After that, you can use Aspose.PDF for .NET to convert each PDF page to DOC or DOCX format, as described in the article "Convert PDF to DOC or DOCX format".
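
For illustration only, the second step of this workaround might look roughly like the sketch below. It assumes the Aspose.PDF for .NET package is referenced and that the single-page PDF files from the previous step sit in a folder named "output"; that folder name and the class name are hypothetical.

```csharp
using System.IO;

class PdfPagesToDocx
{
    static void Main()
    {
        // Hypothetical folder holding the single-page PDF files.
        string dir = "output";

        foreach (string pdfPath in Directory.GetFiles(dir, "*.pdf"))
        {
            // Load one single-page PDF with Aspose.PDF.
            using (var pdf = new Aspose.Pdf.Document(pdfPath))
            {
                // Ask for DOCX output instead of the default DOC.
                var saveOptions = new Aspose.Pdf.DocSaveOptions
                {
                    Format = Aspose.Pdf.DocSaveOptions.DocFormat.DocX
                };

                // Write the DOCX next to the PDF, same base name.
                pdf.Save(Path.ChangeExtension(pdfPath, ".docx"), saveOptions);
            }
        }
    }
}
```

Because the DOCX is rebuilt from a rendered PDF page, the result may not match the styles and structure of the original Word document exactly.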

gregarshinov · March 26, 2018, 7:53am

Thank you for your response.
So I need to buy Aspose.PDF to do this? Aspose.Words is advertised as being able to split pages correctly without any extra libraries, so is it possible to buy Aspose.Words and Aspose.PDF for the price of one?
I tried to save my document as PDF and didn’t succeed: Aspose.Words does not seem to understand the encoding of the document and outputs sequences of squares instead of the proper characters.
Also, my whole task is a bit more complicated. As I mentioned before, I have a 50-page document. It contains a table of contents (without an SdtBlock). I need to extract page ranges that correspond to specific TOC entries. I have already worked out the page ranges I need, but Aspose.Words paginates incorrectly, so the whole extraction fails. How can I perform this task?
P.S. How can I access your bug tracker? I need an approximate time estimate for this issue to be resolved, because our project is going to rely on Aspose.Words.

awais.hafeez · March 26, 2018, 11:12am

@gregarshinov,

Aspose.Words and Aspose.PDF are two separate libraries and you need to buy both separately to be able to use the workaround that I shared in my previous post.

gregarshinov:

but Aspose.Words fails to do correct pagination and consequently fails to do the whole thing. How can I perform this task?

On our end, the latest version of Aspose.Words (18.3) generates the page-wise PDF files correctly (see the code in my previous post). It is the PageSplitter class that causes the incorrect pagination. Instead of the external PageSplitter class, we will most likely provide a built-in method in the Aspose.Words DLL to split documents into pages. There are no estimates available at the moment. We will inform you as soon as this issue (WORDSNET-16228) is resolved.

Moreover, there is no direct way for you to track issues yourself. However, you are welcome to ask about the status of your issue in the forum threads. We will check the status in our internal issue tracking system and reply to you.

aspose.notifier · October 25, 2020, 7:31am

The issues you found earlier (filed as WORDSNET-16228) have been fixed in the Aspose.Words for .NET 20.10 update and the Aspose.Words for Java 20.10 update.
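
For readers arriving at this thread later: the sketch below shows how the page-range extraction described earlier might be done with the built-in Document.ExtractPages(index, count) method available in Aspose.Words for .NET from 20.10. It assumes this method is the fix referred to above. The page ranges, file names, and the ToIndexAndCount helper are hypothetical and only illustrate converting 1-based page numbers to the zero-based index and count the method expects.

```csharp
using System;
using Aspose.Words;

class ExtractPageRanges
{
    // Hypothetical helper: converts a 1-based inclusive page range
    // into a zero-based start index and a page count.
    static (int Index, int Count) ToIndexAndCount(int firstPage, int lastPage)
    {
        if (firstPage < 1 || lastPage < firstPage)
            throw new ArgumentOutOfRangeException(nameof(firstPage), "Invalid page range.");
        return (firstPage - 1, lastPage - firstPage + 1);
    }

    static void Main()
    {
        Document doc = new Document("input.docx");

        // Hypothetical ranges, e.g. taken from TOC entries.
        var ranges = new[] { (First: 3, Last: 7), (First: 12, Last: 15) };

        foreach (var range in ranges)
        {
            // Skip ranges that run past the end of the document.
            if (range.Last > doc.PageCount)
                continue;

            var (index, count) = ToIndexAndCount(range.First, range.Last);

            // Build a new document containing only the requested pages.
            Document part = doc.ExtractPages(index, count);
            part.Save($"pages-{range.First}-{range.Last}.docx");
        }
    }
}
```

With this approach the pages stay in Word format throughout, so the detour through PDF and Aspose.PDF is not needed.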