Get Page Count & Convert Pages of Word DOCX Document to Text .txt Files (C# .NET)

Hi,

We are using the RenderedDocument class to read the content of a document, but we have an issue identifying the page breaks within the document. We expect the page breaks to fall where each page ends in the Word object model, but that is not happening. Can anyone help us get the same page breaks as the Word object model?

We are using Aspose.Words v20.4.0.0. The attached zip file contains a sample file that reproduces the issue.

Below is the sample code.

RenderedDocument layoutDoc = new RenderedDocument(AsposeDocument);

foreach (RenderedPage page in layoutDoc.Pages)
{

}
Sample doc.zip (83.1 KB)

@Gayatri_K,

Please upgrade to the latest version (21.3) of Aspose.Words for .NET and see how it goes on your end. If the problem remains, please provide a simple standalone console application (source code without compilation errors) that reproduces the problem on our end, and attach it here for testing. Please do not include the Aspose.Words DLL files, to keep the file size down. We will then investigate the issue further and provide you with more information.

Sample application.zip (6.2 MB)
Sample_doc.zip (107.2 KB)
Hi @awais.hafeez

I have tested with version 21.3 of the Aspose.Words DLL, and the issue is still reproducible. I am attaching a sample application and a sample document.

We have also found a new issue: a hidden list number (listnum) in the document is missing from the Aspose.Words document object. In the attached sample documents, the first paragraph of “Sample 2.docx” has a hidden list number. Use the Show/Hide option in Word to see the hidden paragraph. This hidden list number is not present in the rendered page of the Aspose.Words document object.

Please look into both of these issues.

Thanks,
Gayatri

@Gayatri_K,

Unfortunately, your query is not clear enough, so please elaborate on it by providing complete details of your use case. This will help us understand your scenario and address your concerns accordingly.

The project you shared uses version 20.4 of Aspose.Words; can you please upgrade to the latest version (21.4) and see how it goes on your end? You are also running the app in evaluation mode; can you please apply the license before running it? In addition, your application does not produce any output files or display anything on the console, which makes it difficult to determine the nature of the problem.

I am afraid we are unable to locate any “Sample 2.docx” in the attachments you shared. However, the “Sample.docx” you shared has 120 pages, and the first paragraph on its first page does not have any hidden list numbers.

Regarding the “Sample doc.doc” that you attached in your first post: it has 46 pages. On exactly which pages are you observing this problem? Do you see the same problem when converting this DOC to PDF using the following simple code? If yes, please create and attach a comparison screenshot that highlights the problematic areas in the Aspose.Words 21.4 generated PDF (with respect to the MS Word generated PDF). Please also convert your documents to PDF using both Aspose.Words and MS Word and attach the PDF files here for our reference.

Document doc = new Document("C:\\temp\\Sample doc\\Sample doc.doc");
doc.Save("C:\\Temp\\Sample doc\\21.4.pdf");

I have also attached the PDF file generated by version 21.4 here for your reference: (see 21.4.pdf (271.3 KB))

Hi

We have two issues in which the Aspose page breaks do not match the page breaks of the Word object model.

I have used version 21.4 of the Aspose.Words DLL, and the issues are not fixed.

Scenario 1: “Sample1.doc”: This document has footnotes, which are causing the issue with page breaks.
Scenario 2: “Sample2.docx”: This document has a hidden list number. Use the Show/Hide option in Word to see the hidden list number. Because this hidden list number is not identified, it is affecting the page breaks.

Please take the modified sample application and sample documents from the attachments to reproduce the issues.

Thanks,
Gayatri

Sample_doc.zip (40.5 KB)
SampleApp.zip (6.0 MB)

@Gayatri_K,

The problem occurs simply because you are using Aspose.Words for .NET in evaluation mode (i.e. without applying a license). You need to set the license to get the desired output. If you want to test Aspose.Words for .NET without the evaluation version limitations, you can also request a 30-day Temporary License. Please refer to How to get a Temporary License?

Another way to get the desired results is to use the Document.ExtractPages method. Please check whether the following simple code is acceptable for you.

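// Apply the license first; in evaluation mode the output is limited.
Aspose.Words.License license = new Aspose.Words.License();
license.SetLicense("Aspose.Words.lic");

Document doc = new Document("C:\\Temp\\Sample_doc1\\Sample2.docx");

int total_Pages = doc.PageCount;
for (int i = 0; i < total_Pages; i++)
{
    // ExtractPages takes a zero-based page index and a page count.
    Document one_Page_Doc = doc.ExtractPages(i, 1);
    // The .txt extension makes Save write the page as plain text.
    one_Page_Doc.Save("C:\\temp\\Sample_doc1\\page " + (i + 1) + ".txt");
}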

@awais.hafeez

We do have a license for Aspose. I tried with the updated license and version 21.4 of Aspose.Words.dll, and I can see that the issue with the document “Sample1.doc” is fixed. However, I have a similar document (Doc2.DOC) that, like Sample1.doc, has footnotes in it. With the updated license and the updated Aspose.Words.dll, the page breaks in the Aspose document object still do not match the interop Word object model. The issue I raised for “Sample2.docx” also still exists.

Could you please check it?

Thanks,
Gayatri

App & Docs.zip (6.0 MB)

@Gayatri_K,

We have logged this problem in our issue tracking system with ID WORDSNET-22343. We will look further into the details of this problem and keep you updated on the status of the correction. We apologize for the inconvenience.

We are checking this scenario and will get back to you soon.

@Gayatri_K,

Using this input Word document (Sample2.zip (24.2 KB)), I generated the following output text file with version 21.6 of Aspose.Words for .NET, using the code you supplied earlier.

I also produced this output text file (see page 1.zip (464 Bytes)) by using the following C# code:

Document doc = new Document("C:\\Temp\\App & Docs (1)\\Sample2.docx");

int total_Pages = doc.PageCount;
for (int i = 0; i < total_Pages; i++)
{
    Document one_Page_Doc = doc.ExtractPages(i, 1);
    one_Page_Doc.Save("C:\\temp\\App & Docs (1)\\page " + (i + 1) + ".txt");
}

So, in this case, we suggest that you use the Document.ExtractPages method to get the desired results.
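
For readers who want the page count and the per-page text in one place, the ExtractPages approach from this thread could be wrapped in a small console program along the following lines. This is only a sketch: the input path, the output folder, and the class name PageTextExport are hypothetical, it assumes a valid "Aspose.Words.lic" file can be found by the application, and it reads each page through ToString(SaveFormat.Text) rather than saving straight to a .txt file as the replies above do.

```csharp
using System;
using System.IO;
using Aspose.Words;

class PageTextExport
{
    static void Main()
    {
        // Hypothetical locations; replace them with your own.
        string inputPath = @"C:\Temp\Sample2.docx";
        string outputDir = @"C:\Temp\pages";
        Directory.CreateDirectory(outputDir);

        // Assumption: a valid license file is available. Without it the
        // library runs in evaluation mode and the output is limited.
        License license = new License();
        license.SetLicense("Aspose.Words.lic");

        Document doc = new Document(inputPath);

        // PageCount builds the page layout and reports the number of pages.
        int totalPages = doc.PageCount;
        Console.WriteLine("Page count: " + totalPages);

        for (int i = 0; i < totalPages; i++)
        {
            // Zero-based page index, one page per extracted document.
            Document onePageDoc = doc.ExtractPages(i, 1);

            // Convert the single-page document to plain text in memory.
            string pageText = onePageDoc.ToString(SaveFormat.Text);

            string outputPath = Path.Combine(outputDir, "page " + (i + 1) + ".txt");
            File.WriteAllText(outputPath, pageText);
        }
    }
}
```

Holding the page text in a string first makes it easy to compare it against the text that Word interop reports for the same page before writing anything to disk.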