Aspose.Words for .NET: saving as PDF loses section page numbering

Hi, I’m evaluating Aspose.Words and have a document that, when saved as DOCX using Aspose.Words for .NET, keeps its section page numbering, restarting at 1 at the start of each section. When I subsequently save the same document as PDF, numbering starts at 1 and simply increments to the end of the document.
So instead of 1 of 6, 2 of 6, 3 of 6, 1 of 6, 2 of 6, 3 of 6, it shows 1 of 6 through 6 of 6.
We’ll normally use the “Page X” format rather than “Page X of Y”, but we do need each section to start at 1.
Is this expected behavior, or can it be investigated further?
I hoped PdfSaveOptions might help, but apparently not.
Thanks,
Warrick
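To confirm that each section of the loaded document really is set to restart its numbering, a small diagnostic can list each section's page-numbering setup. This is a minimal sketch, assuming the Aspose.Words NuGet package is referenced; the class and method names are hypothetical, while Section.PageSetup, PageSetup.RestartPageNumbering and PageSetup.PageStartingNumber are existing Aspose.Words members.

```csharp
using System.Collections.Generic;
using Aspose.Words;

// Hypothetical helper: reports how each section's page numbering is configured.
static class SectionNumberingCheck
{
    // Returns one line per section. For "Page X" to start at 1 in a section,
    // that section needs RestartPageNumbering = true and PageStartingNumber = 1.
    public static List<string> Describe(Document doc)
    {
        var lines = new List<string>();
        int index = 0;
        foreach (Section section in doc.Sections)
        {
            PageSetup ps = section.PageSetup;
            lines.Add($"Section {index}: RestartPageNumbering={ps.RestartPageNumbering}, PageStartingNumber={ps.PageStartingNumber}");
            index++;
        }
        return lines;
    }
}
```

For example, `foreach (string line in SectionNumberingCheck.Describe(new Document("Test.DOCX"))) Console.WriteLine(line);` prints the setup for every section, which makes it easy to compare what the DOCX declares with what the PDF shows.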

@Warrick Could you please attach your input and output documents here for our reference? We will check the issue and provide you more information. Unfortunately, without actual documents it is difficult to tell what the problem is.

Here they are. With Test.DOCX open in Aspose, I just used Doc.Save("Test.PDF").
The DOCX has two documents inserted, each numbered 1 of 16 through 8 of 16, while Test.PDF is numbered 1 of 16 through 16 of 16.
Thanks!

Test.DOCX (37.6 KB)

Test.PDF (107.2 KB)

@Warrick Unfortunately, I cannot reproduce the problem on my side using the following simple code:

Document doc = new Document(@"C:\Temp\in.docx");
doc.Save(@"C:\Temp\out.pdf");

The output PDF produced by Aspose.Words looks the same as the PDF produced by MS Word.
Aspose.Words: out.pdf (106.6 KB)
MS Word: ms.pdf (169.2 KB)

Do you perform any preprocessing of the document before saving it as PDF?

I am just doing one save to DOCX immediately followed by the save to PDF.
However, I am calling Aspose.Words for .NET from Progress OpenEdge code rather than from C#.

lvFinalDocument:SAVE(lvFinalDocxPath).  
lvFinalDocument:SAVE(lvFinalPDFPath).

Tomorrow I’ll try running identical code in C#. I can’t see why there would be a difference, though, since Aspose.Words.dll is doing all the work.

I’ll let you know…

@Warrick Yes, it is really strange. I have also tried saving to DOCX and then to PDF as in your code:

Document doc = new Document(@"C:\Temp\in.docx");
doc.Save(@"C:\Temp\out.docx");
doc.Save(@"C:\Temp\out.pdf");

and still the output PDF is rendered properly. Waiting for your inputs after testing.

I tried in C# and it’s fine. Both DOCX and PDF are identical.

The Aspose.Words.dll I’m using with Progress is version 24.3.0, while the one used in C# is the .NET 6.0 build, version 23.2.0.
So it would appear that the older version works but the later version does not.
Can you test in both these versions?

Sorry, the C# one is the .NET 6.0 build, version 24.2.0.

@Warrick I have tested with both versions and the output PDF documents are the same:
out_24.2.pdf (106.6 KB)
out_24.3.pdf (106.6 KB)

I managed to fix it.
Using OpenEdge, I need to save the document as DOCX, delete the Document object, reopen the DOCX and then save it as PDF.

Why this makes a difference, though, I have no idea.
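For readers who want the same sequence from C#, the workaround amounts to saving the DOCX and then rendering the PDF from a freshly loaded copy of that file rather than from the original object. This is a minimal sketch, assuming the Aspose.Words NuGet package is referenced; the class and method names are hypothetical, and it mirrors the OpenEdge workaround described above rather than an official fix.

```csharp
using Aspose.Words;

// Hypothetical helper mirroring the OpenEdge workaround in C#.
static class SaveWorkaround
{
    public static void SaveDocxThenPdf(Document doc, string docxPath, string pdfPath)
    {
        // First save: write the DOCX from the original Document instance.
        doc.Save(docxPath);

        // Reload the file just written; the original instance is simply not
        // used again, which plays the role of "deleting" it in OpenEdge.
        Document reloaded = new Document(docxPath);

        // Render the PDF from the clean, freshly loaded instance.
        reloaded.Save(pdfPath);
    }
}
```

The cost is one extra load of the DOCX, which is usually small compared with PDF rendering.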

@Warrick It is really strange behavior. Have you tried changing the order of the save operations, i.e. saving as PDF first and then saving to DOCX? There actually should not be any difference, but maybe there is some peculiarity in OpenEdge.
Also, you can try calling Document.UpdatePageLayout() before saving the document to PDF. This forces Aspose.Words to rebuild the document layout in case it has been built and cached previously.
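The UpdatePageLayout() suggestion slots in between the two saves. This is a minimal sketch, assuming the Aspose.Words NuGet package is referenced and that doc is the Document being saved; the class and method names are hypothetical, while Document.Save and Document.UpdatePageLayout are existing Aspose.Words methods.

```csharp
using Aspose.Words;

// Hypothetical helper: save DOCX, rebuild the layout, then render the PDF.
static class PdfExport
{
    public static void SaveDocxThenPdfWithFreshLayout(Document doc, string docxPath, string pdfPath)
    {
        // Save the DOCX as before.
        doc.Save(docxPath);

        // Discard any previously built and cached page layout and rebuild it,
        // so the PDF is rendered from an up-to-date layout.
        doc.UpdatePageLayout();

        // Render the PDF from the rebuilt layout.
        doc.Save(pdfPath);
    }
}
```

Unlike the reload workaround, this keeps working with the same Document instance and only pays for an extra layout pass.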

Hi Alexey,
I’ve decided to move development solely to C# as OpenEdge is probably not the best environment for .NET code.
I’ve run into an intermittent issue when converting PDFs to DOCX/XML. Sometimes a header such as REPORTING PROGRESS TO PARENTS is converted to REPORTINGPROGRESSTO PARENTS, i.e. with the spaces missing.
I tried setting up a test PDF for you to test with, but that worked fine (of course!). I’m assuming the small gap between the words is occasionally being skipped, and it does appear to occur only with the bold headers in the document.
We are data scraping from externally generated PDFs. Are there any settings at all that might fix this?

@Warrick Do you use Aspose.Words or Aspose.PDF for conversion from PDF to DOCX?

Aspose.Words:

Aspose.Words.Document doc = new Aspose.Words.Document(@"C:\Temp\in.pdf");
doc.Save(@"C:\Temp\out.docx");

Aspose.PDF:

Aspose.Pdf.Document doc = new Aspose.Pdf.Document(@"C:\Temp\in.pdf");
Aspose.Pdf.DocSaveOptions docSaveOptions = new Aspose.Pdf.DocSaveOptions();
docSaveOptions.Format = Aspose.Pdf.DocSaveOptions.DocFormat.DocX;
doc.Save(@"C:\Temp\out.docx", docSaveOptions);

Unfortunately, without a problematic document it is difficult to say what causes the problem.

We are using Aspose.Words but would Aspose.PDF be a better product for converting a PDF to a CSV?

@Warrick It is difficult to say which product will better fit your needs. It would be better to test both products with your real documents and see which gives better results.

Hi Alexey,
Actually, I’m finding that using Aspose.Words and saving as HTML works well, with one exception. In a multi-page PDF, if a table spans more than one page, the entire table is rendered as HTML paragraphs (mimicking a table) rather than as an actual table. Is there any way to prevent that? Would merging all pages of the PDF into a single page help (if that’s indeed possible)?

@Warrick Could you please attach the input and output documents you encountered the problem with? We will check the issue and provide you more information.
Note that a PDF document generally has no “table” concept: content that looks like a table is represented by absolutely positioned content and lines that emulate borders.
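One way to see whether the PDF import recognized a table at all, before looking at the HTML, is to count the Table nodes in the loaded document. This is a minimal sketch, assuming the Aspose.Words NuGet package is referenced; the class and method names are hypothetical, while Document.GetChildNodes and NodeType.Table are existing Aspose.Words members. Content that only looks like a table and came through as paragraphs is, by construction, not counted.

```csharp
using Aspose.Words;

// Hypothetical diagnostic: how many real tables did the PDF import produce?
static class PdfTableCheck
{
    public static int CountTables(string pdfPath)
    {
        // Aspose.Words converts the PDF into its document model on load.
        Document doc = new Document(pdfPath);

        // Collect Table nodes at any depth (true = deep search).
        NodeCollection tables = doc.GetChildNodes(NodeType.Table, true);
        return tables.Count;
    }
}
```

For example, `Console.WriteLine(PdfTableCheck.CountTables("test.pdf"));` gives a quick indication of whether the multi-page table survived as a table before it ever reaches the HTML output.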

Test.PDF (148.8 KB)
I can’t upload the HTML, but I just ran the following:

Document doc = new Document("test.pdf");
doc.Save("test.html");

Could you please delete the PDF after viewing it? It is sanitized, but I’d rather it not be available to everyone.

@Warrick Thank you for the additional information. It is safe to attach files in the forum: only you as the topic starter and Aspose staff can access the attachments, so the files cannot be leaked to third parties.

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-26859

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.