I’m seeing extra blank pages added to the PDF and HTML output when converting a Word document with Aspose.Words. With v17.6.0, 4 blank pages were added. I have upgraded to v20.2.0, but it still adds 1 extra page. The blank pages appear at exactly the same locations in the HTML and the PDF, both in the 4-page case and in the 1-page case. Can I get some advice?
Please ZIP and upload your input Word document and the PDF and HTML files generated by Aspose.Words that show the undesired behavior, so we can test them. We will then investigate the issue on our end and provide you with more information.
Please see the attached file, which includes the Word document and the converted PDF file. I altered the Word file to remove sensitive data and could still reproduce the issue. You can see the extra blank page at page 10 of the PDF file. It was converted using Aspose.Words v20.2.0. Thank you for the help.
Word and PDF Converted.zip (291.0 KB)
Please convert your “Word To Convert.docx” document to PDF using the Save As command in MS Word and attach the resulting PDF file here for further testing. Please also ZIP and attach the following font files. Thanks for your cooperation.
- Open Sans
Please see the attached file. It includes a PDF file created with Save As PDF in Word, plus the Open Sans font files. I found out that I did not have the Open Sans fonts installed on my local machine, so I downloaded them and am sending them to you. The PDF file was created before I installed the fonts.
Thanks for your help.
Word PDF and Fonts.zip (612.1 KB)
We have installed the fonts you provided (OpenSans-Bold & OpenSans-Semibold are still missing) and then converted your Word document to PDF using both MS Word 2019 and Aspose.Words 20.2. Both PDF files are attached here for your reference.
The two PDF files look identical. So this seems to be expected behavior, as Aspose.Words mimics the behavior of MS Word in this case. Please let us know if we can be of any further assistance.
I found out that converting the original document to PDF using Word's built-in Save As feature on my local machine still adds a blank page, at a different location, even after I installed the fonts. (The document I sent you was a part of the original one, with fake data, to reproduce the issue.)
I also found out that our server, which runs Aspose.Words, already had those fonts installed. The server with Aspose.Words 20.2.0 generates a PDF with a blank page in the same location as the PDF generated on my local machine.
In this case, I wonder whether there is still any way to remove the blank pages with Aspose.Words, even though it mimics the behavior of MS Word.
I think you can first convert the Word document to PDF using Aspose.Words for .NET and then use the Aspose.PDF for .NET API to do the following:
- Detect blank pages in PDF (see Page.IsBlank Method)
- Remove such empty Pages from PDF (see PageCollection.Remove Method)
For any more details about post-processing generated PDF files, please refer to Aspose.PDF documentation.
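The detect-and-remove step described above can be sketched roughly as follows. This is only an illustration of the logic in plain Python, not Aspose.PDF code: the `Page` class, the `ink_fraction` field, the `0.01` threshold, and the sample page data are all hypothetical stand-ins. In a real implementation the blank check and the removal would go through the `Page.IsBlank` and `PageCollection.Remove` methods mentioned above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Page:
    """Hypothetical stand-in for a PDF page; not an Aspose.PDF type."""
    number: int          # 1-based page number in the converted PDF
    ink_fraction: float  # assumed share of the page area covered by content (0.0 to 1.0)


def is_blank(page: Page, threshold: float = 0.01) -> bool:
    # Assumption: a page counts as blank when its content covers
    # less than `threshold` of the page area.
    return page.ink_fraction < threshold


def remove_blank_pages(pages: List[Page], threshold: float = 0.01) -> Tuple[List[Page], List[int]]:
    """Return the non-blank pages plus the numbers of the pages that were dropped."""
    kept: List[Page] = []
    removed: List[int] = []
    for page in pages:
        if is_blank(page, threshold):
            removed.append(page.number)  # remember what was dropped, for later review
        else:
            kept.append(page)
    return kept, removed


# Made-up sample: three consecutive pages, the middle one empty.
sample = [Page(9, 0.42), Page(10, 0.0), Page(11, 0.37)]
kept, removed = remove_blank_pages(sample)
print([p.number for p in kept])
print(removed)
```

Returning the removed page numbers alongside the kept pages makes it possible to log them per document, so someone can later check whether a dropped page was an intentional blank.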
Thank you for the advice.
We actually have a lot of Word documents to convert to PDF and HTML on a daily basis, and I’m not sure it is a good idea to remove blank pages from a PDF without knowing whether they are intended.
We also found the same issue with the converted HTML: blank pages are added, and they are in the same locations as in the converted PDF for this particular Word document.
I hope there are better solutions we can use, but if that is the best option, could you suggest a way to remove blank pages from the converted HTML, as in the PDF case?
Generally speaking, one solution would be as follows:
- Convert Word to PDF using Aspose.Words
- Load the PDF above using Aspose.PDF (in memory)
- Remove blank pages from that PDF object
- Instead of saving to PDF, save the object in Word format (Aspose.PDF supports DOC, DOCX, and plain HTML formats)
- Load the resulting DOC or DOCX with Aspose.Words again
- Save to plain HTML or HTML Fixed formats
Hope this helps.
Thank you for the support. We will discuss your solutions.