Split PDF into single PDF files

toschf71de · November 15, 2017, 11:58am

We are currently evaluating Aspose.PDF.

This is the functionality we are looking for:
One of our applications generates a large PDF file that contains a report for each user. The report for each user can be 1-3 pages long.
We want to split the large file and generate a separate PDF file for each user.
Since the report for a user can be 1-3 pages long, we would have to check whether the user ID or name appears on each page and build the user's PDF accordingly.

Is this possible with Aspose.PDF?

Cheers for your help
Thomas

asad.ali · November 15, 2017, 5:50pm

@toschf71de

Thanks for contacting support.

Please use the following code snippet to fulfill your requirement:

using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Load the input document
Document doc = new Document(dataDir + "SomeLargePDF.pdf");
// Fill this list with all user IDs or names that are present in the PDF
List<string> UserIDsOrNames = new List<string>();

foreach (string username in UserIDsOrNames)
{
    // A new output document for this user
    Document document = new Document();
    foreach (Page page in doc.Pages)
    {
        TextFragmentAbsorber tfa = new TextFragmentAbsorber(username);
        page.Accept(tfa);
        // Add the page if the username appears on it
        if (tfa.TextFragments.Count > 0)
            document.Pages.Add(page);
    }
    document.Save(dataDir + username + ".pdf");
}

If you still face any issue, or the code suggested above does not do what you actually require, please share your sample PDF document and some more details about the scenario. We will test the scenario in our environment and share our feedback accordingly.

toschf71de · November 16, 2017, 1:13pm

Thanks for that, it basically works fine, but I can only generate four single PDF files. Is that a limitation of the evaluation version?

I have a text like ‘EmpKey:[x]’ on every page.
Right now the program loops through all usernames and, for every username, loops through the whole input PDF to find the output pages.
Is there a way to find ‘EmpKey:[’ (the start of the keyword) on every page, read what follows up to the closing bracket ‘]’, and use that to generate the output PDF? This way I would only have to loop through the input PDF once.

Thomas

asad.ali · November 16, 2017, 6:21pm

@toschf71de

Thanks for your inquiry.

You can also search based on regular expressions. Please use the following code snippet to search the PDF as per your requirements:

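Document doc = new Document(dataDir + "SomeLargePDF.pdf");
// Search with a regular expression for the EmpKey:[x] marker
TextFragmentAbsorber tfa = new TextFragmentAbsorber(@"^EmpKey:\[.*\]$");
// Passing true enables regular expression search
tfa.TextSearchOptions = new TextSearchOptions(true);
// Run the search over all pages in a single pass
doc.Pages.Accept(tfa);
foreach (TextFragment tf in tfa.TextFragments)
{
    // This page has EmpKey:[x] on it
    Page page = tf.Page;
}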
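
To turn such matches into one file per user, the key between the brackets has to be extracted and the pages grouped by it. The following is only a minimal sketch of that grouping step, written in plain C# without any Aspose calls. The class EmpKeyGrouping, the method GroupPagesByKey and the pageTexts input are hypothetical names used for illustration, and the pattern assumes the marker looks exactly like EmpKey:[x].

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmpKeyGrouping
{
    // Matches "EmpKey:[x]" and captures x (everything up to the first ']').
    private static readonly Regex KeyPattern = new Regex(@"EmpKey:\[([^\]]+)\]");

    // Assumption: pageTexts[i] holds the text found on page i + 1.
    // Returns, for each key, the 1-based page numbers that carry it.
    public static Dictionary<string, List<int>> GroupPagesByKey(IList<string> pageTexts)
    {
        var pagesByKey = new Dictionary<string, List<int>>();
        for (int i = 0; i < pageTexts.Count; i++)
        {
            Match match = KeyPattern.Match(pageTexts[i]);
            if (!match.Success)
                continue; // pages without a marker are skipped

            string key = match.Groups[1].Value;
            List<int> pages;
            if (!pagesByKey.TryGetValue(key, out pages))
            {
                pages = new List<int>();
                pagesByKey[key] = pages;
            }
            pages.Add(i + 1); // PDF page numbers start at 1
        }
        return pagesByKey;
    }
}
```

With Aspose.PDF, the matched text should be available as tf.Text and the page number as tf.Page.Number, so each group could then be copied into one new Document per key and saved under that key. This part is untested here and is only meant as a direction.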

If you need any further assistance, please feel free to contact us.

toschf71de · November 17, 2017, 8:43am

Works fine, thanks for the help.

My sample app only generates 3 output PDFs. Is this a limitation of the evaluation version?

Thomas

asad.ali · November 17, 2017, 2:24pm

@toschf71de

Thanks for your kind feedback.

We are sorry for not including information about this in our previous response. Due to a limitation of the trial version, you can only process 4 elements of any collection (e.g. Pages, Annotations, Paragraphs, etc.) in the API. To get access to the complete feature set of the API, please consider applying for a 30-day temporary license. This way you can evaluate the API with full access to all the features it offers.

If you have any further queries, please feel free to ask.

toschf71de · November 17, 2017, 2:52pm

Fantastic, works like a charm and lightning fast.
I will order this tool early next week.

Thomas

asad.ali · November 17, 2017, 5:23pm

@toschf71de

Thanks for your feedback and for choosing our API.

It is good to know that the suggested approach gives you the functionality you need. Please keep using our API, and if you have any other queries, feel free to create a new topic in our forum. We will be more than happy to assist you.