How can I split a PDF at each occurrence of a phrase and name each split file after another value found on the same page, programmatically, using C# and Aspose.Pdf?
You can split a PDF document on the basis of pages. The API has no direct functionality for splitting a document on the basis of content, such as text inside pages. Also, given the structure of the PDF file format, splitting and re-arranging content would be a complex feature. Nevertheless, could you please share your sample PDF along with the expected output for our reference?
Whenever I find the specific phrase “τρέχουσα ληξιπρόθεσμη δόση” (current overdue installment), I want to split the PDF into a separate file. If the phrase occurs N times, the large PDF should be split into N PDF files. Each file should be named after a value found in it, which looks like “ΑΡΙΘΜ. ΛΟΓΑΡΙΑΣΜΟΥ ΔΑΝΕΙΟΥ: 4044080780” (loan account number). This number, which is the loan number, changes from page to page. I want each file name to be the loan number followed by the date, which appears in the name of the source PDF file. I hope I have explained this clearly. I have uploaded a sample of my PDF file.
IRKT00_Report_10062021_00001.pdf (154.7 KB)
Please check the below code snippet with the attached output PDFs and let us know if this helps:
// Required namespaces
using Aspose.Pdf;
using Aspose.Pdf.Text;

private static void SplitPDF(string dataDir)
{
    // Load the PDF document
    Document pdfDocument = new Document(dataDir + "IRKT00_Report_10062021_00001.pdf");

    // Search every page for the loan number label followed by digits.
    // Passing true to TextSearchOptions treats the search string as a regular expression.
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@"ΔΑΝΕΙΟΥ:\s+(\d+)", new TextSearchOptions(true));
    pdfDocument.Pages.Accept(textFragmentAbsorber);

    // Iterate through the found occurrences
    foreach (TextFragment textFragment in textFragmentAbsorber.TextFragments)
    {
        // Get the loan number from the matched text
        string loanNumber = GetLoanNumber(textFragment);

        // Create a new PDF document containing only the page where the match was found
        Document newPdfDocument = new Document();
        newPdfDocument.Pages.Add(pdfDocument.Pages[textFragment.Page.Number]);

        // Save the new PDF document under the loan number.
        // Note: a later match with the same loan number overwrites the earlier file.
        string outputFileName = $"{loanNumber}.pdf";
        newPdfDocument.Save(dataDir + outputFileName);
    }
}

static string GetLoanNumber(TextFragment text)
{
    // The matched text looks like "ΔΑΝΕΙΟΥ: 4044080780".
    // Removing the spaces and the label leaves only the digits.
    // Adjust this logic if your label or number format differs.
    return text.Text.Replace(" ", "").Replace("ΔΑΝΕΙΟΥ:", "");
}
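If the output name should also carry the date from the source file name, as requested earlier in the thread, the naming step could be sketched roughly as below. This uses only standard .NET calls. The class and method names (OutputNaming, ExtractDate, ExtractLoanNumber, BuildOutputName) are hypothetical, and so are the assumptions that the date is the only 8-digit run in the file name and that an underscore separates the loan number from the date.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper class; not part of Aspose.Pdf.
public static class OutputNaming
{
    // Returns the first standalone 8-digit run in the file name
    // (assumed to be the report date), or an empty string if none is found.
    public static string ExtractDate(string sourcePath)
    {
        string name = Path.GetFileNameWithoutExtension(sourcePath);
        Match m = Regex.Match(name, @"(?<!\d)\d{8}(?!\d)");
        return m.Success ? m.Value : string.Empty;
    }

    // Returns the digits that follow the "ΔΑΝΕΙΟΥ:" label,
    // or an empty string if the text does not contain the label.
    public static string ExtractLoanNumber(string matchedText)
    {
        Match m = Regex.Match(matchedText, @"ΔΑΝΕΙΟΥ:\s*(\d+)");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    // Builds "<loan>_<date>.pdf"; the underscore separator is an assumption.
    // Falls back to "<loan>.pdf" when no date was found.
    public static string BuildOutputName(string loanNumber, string date)
    {
        return string.IsNullOrEmpty(date)
            ? $"{loanNumber}.pdf"
            : $"{loanNumber}_{date}.pdf";
    }
}
```

In the snippet above, the result of BuildOutputName could replace the $"{loanNumber}.pdf" expression, with ExtractDate called once on the source file name before the loop.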
SplitPDF.zip (387.1 KB)
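The snippet above writes one output file per matching page. If a record in your document can span several pages, one possible approach is to treat every page where the phrase is found as the start of a section and let that section run up to the page before the next match. The sketch below covers only the page-range arithmetic and uses plain .NET. PageGrouping and ComputeRanges are hypothetical names, and the assumption that a section begins on the page containing the phrase has not been verified against your file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; not part of Aspose.Pdf.
public static class PageGrouping
{
    // startPages: 1-based page numbers on which the phrase was found.
    // pageCount:  total number of pages in the source document.
    // Returns inclusive 1-based (First, Last) page ranges, one per section.
    // Pages before the first match are not assigned to any section.
    public static List<(int First, int Last)> ComputeRanges(IEnumerable<int> startPages, int pageCount)
    {
        // Drop out-of-range and duplicate page numbers, then sort ascending.
        List<int> starts = startPages
            .Where(p => p >= 1 && p <= pageCount)
            .Distinct()
            .OrderBy(p => p)
            .ToList();

        var ranges = new List<(int First, int Last)>();
        for (int i = 0; i < starts.Count; i++)
        {
            // A section ends just before the next start, or at the last page.
            int last = i + 1 < starts.Count ? starts[i + 1] - 1 : pageCount;
            ranges.Add((starts[i], last));
        }
        return ranges;
    }
}
```

The start pages could be collected from textFragment.Page.Number for each match of the phrase, and each resulting range copied page by page with newPdfDocument.Pages.Add, in the same way the snippet above copies a single page.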
Thank you! It works. I extended the code so that it also searches for the other phrase. Thanks again!
It's nice to know that you were able to achieve your requirements. Please keep using our API and feel free to let us know in case you need further assistance.
Sure, I will. Thanks again!