Splitting PDF into Single pages

edmonton · August 16, 2010, 5:13pm

Is there a faster way to extract all the pages of a PDF document into single PDF documents?

The current documentation suggests the following:

System.IO.MemoryStream[] outBuffer = pdfEditor.SplitToPages(inFile1);

Then you loop through the MemoryStream array and save each page. This works great, but the initial SplitToPages call takes an extremely long time if you are dealing with huge PDF files.
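
For reference, the save loop could look roughly like the sketch below. It uses only System.IO; PageWriter, SavePages and the page_0001.pdf file naming are placeholder names for illustration, not part of the Aspose API, and pages stands for the array returned by SplitToPages.

```csharp
using System.IO;

static class PageWriter
{
    // Writes each in-memory page to <outputDir>/page_0001.pdf, page_0002.pdf, ...
    // and disposes each stream once it has been written. Returns the page count.
    public static int SavePages(MemoryStream[] pages, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        for (int i = 0; i < pages.Length; i++)
        {
            string path = Path.Combine(outputDir, string.Format("page_{0:D4}.pdf", i + 1));
            using (MemoryStream page = pages[i])
            using (FileStream file = File.Create(path))
            {
                // WriteTo copies the whole buffer regardless of the stream position.
                page.WriteTo(file);
            }
        }
        return pages.Length;
    }
}
```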


shahzadlatif · August 18, 2010, 4:13am

Hi Ashneel,

I’m afraid SplitToPages is the only method that allows you to split a PDF file into individual pages. I would also like to add that the time taken by this method depends on the size of the input PDF file and the number of pages. However, if you think it is taking too much time for a particular file, please share that problematic PDF file with us so that we can try to improve the performance.

We’re sorry for the inconvenience.
Regards,

edmonton · August 24, 2010, 5:24pm

Hello,

Thank you for the quick response. I was trying to split a 1,368-page PDF into single pages, and after 8 hours I stopped the process because it was taking too long. At that point it was still trying to load the file into memory. I'm writing a Windows Service that will process x number of PDFs. However, I did find a workaround that I would like to share. I used two of your functions: editor.Extract() and editor.SplitToPages().

I created a temp directory and used the editor.Extract() function to split the PDF into 100-page PDFs: I got the total page count of the source PDF, looped through it, and saved each run of 100 pages as its own PDF. Then I looped through the temp directory, called editor.SplitToPages() on each of the 104 PDFs, and saved the single pages in another directory. Afterwards I deleted the temp directory. This brought the time down significantly: it produced 10,368 single-page PDFs in half an hour.
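
The batching step can be sketched roughly as follows. This only shows how the 100-page ranges might be computed; PageBatches and Ranges are hypothetical helper names, and the actual editor.Extract() and editor.SplitToPages() calls are left out because their exact overloads are not shown in this thread.

```csharp
using System;
using System.Collections.Generic;

static class PageBatches
{
    // Returns inclusive, 1-based { start, end } page ranges that cover
    // totalPages in chunks of at most batchSize pages.
    public static List<int[]> Ranges(int totalPages, int batchSize)
    {
        if (totalPages < 0) throw new ArgumentOutOfRangeException("totalPages");
        if (batchSize < 1) throw new ArgumentOutOfRangeException("batchSize");

        var ranges = new List<int[]>();
        for (int start = 1; start <= totalPages; start += batchSize)
        {
            // The last chunk may be shorter than batchSize.
            int end = Math.Min(start + batchSize - 1, totalPages);
            ranges.Add(new[] { start, end });
        }
        return ranges;
    }
}
```

Each range would be extracted to its own file in the temp directory, and each of those files then split into single pages. For a 10,368-page document and a batch size of 100, Ranges returns 104 ranges, the last one covering pages 10301 to 10368.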

Thanks,

Ashneel

edmonton · August 24, 2010, 5:27pm

Hello,

I do have another problem: for certain PDFs, the text extraction doesn't seem to work properly. I've uploaded the PDF file and the extracted text file. Please let me know if you have an answer.

Thanks,

Ashneel

shahzadlatif · August 25, 2010, 4:58am

Hi Ashneel,

I’m unable to download the text file successfully. Can you please zip and upload the file again? Also, please elaborate on the problem you’re facing while extracting text.

We look forward to helping you out.
Regards,

edmonton · August 25, 2010, 10:12am

Hello,

I've attached a zipped folder containing both the PDF and the extracted text file. The problem is that the text file contains a lot of unrecognizable characters. I've tried specifying different encodings when doing the extraction, but I still have the same problem.

Thanks,

Ashneel

shahzadlatif · August 26, 2010, 4:27am

Hi Ashneel,

I have reproduced this problem at my end and logged it as PDFKITNET-19601 in our issue tracking system. Our team will look into this issue and you’ll be updated via this forum thread once it is resolved.

We’re sorry for the inconvenience.
Regards,

aspose.notifier · October 4, 2010, 3:16am

The issues you have found earlier (filed as 19601) have been fixed in this update.
