I am attempting to split one PDF document with x pages into x single-page PDF documents. When the document contains images, I am finding that each resulting file contains all of the embedded resources from the original document. According to this post (https://forum.aspose.com/t/97576), OptimizeResources is supposed to strip all unused resources from a PDF; however, I am not getting these results.
// Assumed aliases: using Words = Aspose.Words; using Pdf = Aspose.Pdf;
// Convert the Word document to PDF in memory.
Words.Document wordDoc = new Words.Document("ImageAndText.docx");
MemoryStream wholePdfStream = new MemoryStream();
wordDoc.Save(wholePdfStream, Words.SaveFormat.Pdf);
wholePdfStream.Seek(0, SeekOrigin.Begin);

// Write each page to its own single-page PDF.
Pdf.Document wholePdf = new Pdf.Document(wholePdfStream);
foreach (Pdf.Page page in wholePdf.Pages)
{
    Pdf.Document singlePagePdf = new Pdf.Document();
    singlePagePdf.Pages.Add(page);
    singlePagePdf.OptimizeResources();
    singlePagePdf.Save(String.Format(@"pages\{0}.pdf", page.Number));
}
The result of this code is:
File name | File size |
---|---|
1.pdf | 1370 KB |
2.pdf | 1380 KB |
3.pdf | 1380 KB |
4.pdf | 1370 KB |
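As you can see, the pages with only text (2 and 3) are actually slightly larger than the pages with images, which suggests that every output file is storing all of the images from the original document as well as its own text.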
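For anyone who wants to reproduce the size comparison, here is a minimal sketch of how the per-file sizes can be listed. It uses only System.IO and no Aspose calls; the `PageSizeReport` class and its `Describe` method are hypothetical names introduced for illustration, and the folder name is assumed to be the `pages` folder used in the code above.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of Aspose: lists the PDFs in a folder with their sizes.
static class PageSizeReport
{
    // Returns one "name | size KB |" row per PDF file, with sizes in whole kilobytes.
    public static string[] Describe(string folder)
    {
        return new DirectoryInfo(folder)
            .GetFiles("*.pdf")
            // Ordinal order is fine for a handful of pages; "10.pdf" would sort before "2.pdf".
            .OrderBy(f => f.Name, StringComparer.Ordinal)
            .Select(f => string.Format("{0} | {1} KB |", f.Name, f.Length / 1024))
            .ToArray();
    }
}
```

Calling `foreach (string row in PageSizeReport.Describe("pages")) Console.WriteLine(row);` after the split loop prints one row per output file.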
When I instead use Microsoft Word to save the document as a PDF first and then run the following code:
Pdf.Document wholePdf = new Pdf.Document("ImageAndText.pdf");
foreach (Pdf.Page page in wholePdf.Pages)
{
    Pdf.Document singlePagePdf = new Pdf.Document();
    singlePagePdf.Pages.Add(page);
    singlePagePdf.OptimizeResources();
    singlePagePdf.Save(String.Format(@"pages\{0}.pdf", page.Number));
}
The result of this code is:
File name | File size |
---|---|
1.pdf | 172 KB |
2.pdf | 81 KB |
3.pdf | 81 KB |
4.pdf | 193 KB |
This makes me think that there is something wrong with the way Aspose.Words saves the document as a PDF.
I have attached both the original Word file and the PDF file I created using Microsoft Word.