HRemove_line_Properly_Test_doc.pdf (382.0 KB)
Hi @Farhan.Raza,
Thanks for the suggestion, but I don't think "[Search and Get Text from all pages using Regular Expression]" will work for my scenario. We have some validation rules to check before removing the line numbers.
My original requirement is to check whether the PDF has line numbers and, if it does, to remove them.
Please check the attached sample document.
Technically, I first need to validate the PDF document to make sure it has valid line numbers.
The validation rules are:
----> Line numbers should be in ascending order
----> The line number is the first element of each line
----> Header and footer sections have no line numbers
----> All pages have line numbers
----> There are no separate line numbers for each page (numbering continues across pages)
If all of the above validation rules pass, I need to remove the line numbers.
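The rules above can be sketched on plain text, independent of any PDF library. This is only a rough sketch under assumptions: the input is the already-extracted text of each page as an array of lines, any non-blank line without a leading number at the top or bottom of a page counts as header or footer, and the names `LineNumberRules` and `AreLineNumbersValid` are hypothetical. A header or footer that itself starts with digits would be misread as a numbered line.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineNumberRules
{
    // Optional leading whitespace, then digits, then whitespace or end of line.
    private static readonly Regex LeadingNumber = new Regex(@"^\s*(\d+)(?=\s|$)");

    // pages[i] holds the extracted text lines of page i, in reading order (assumed input shape).
    public static bool AreLineNumbersValid(string[][] pages)
    {
        if (pages.Length == 0) return false;

        int previous = 0; // carried across pages, so numbering must not restart per page
        foreach (var page in pages)
        {
            bool seenNumber = false; // at least one numbered line found on this page
            bool bodyEnded = false;  // an unnumbered line appeared after the numbered body

            foreach (var line in page)
            {
                if (string.IsNullOrWhiteSpace(line)) continue;

                var match = LeadingNumber.Match(line);
                if (!match.Success)
                {
                    // Before the body: header. After the body: footer.
                    if (seenNumber) bodyEnded = true;
                    continue;
                }

                // A numbered line after an unnumbered one means the number
                // was not the first element of every body line.
                if (bodyEnded) return false;

                // Numbers must increase by exactly one, continuing from the previous page.
                if (!int.TryParse(match.Groups[1].Value, out int number) || number != previous + 1)
                    return false;

                previous = number;
                seenNumber = true;
            }

            // Every page must contain line numbers.
            if (!seenNumber) return false;
        }
        return true;
    }
}
```

With this sketch, two pages numbered 1 to 2 and then 3 pass, while a second page that restarts at 1 or has no numbers at all fails.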
Here I am taking each line and checking whether its line number is valid. If it is valid, I need to remove the line number and save the PDF.
Please check the code below:
    using System.Text.RegularExpressions;
    using Aspose.Pdf;
    using Aspose.Pdf.Text;

    private static bool IsValidThenRemoveLineNumbers(Document pdfDocument)
    {
        var extractOption = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
        var textAbsorber = new TextAbsorber(extractOption);
        // Aspose.PDF page indexes start at 1, so Pages[0] would throw.
        // Only the first page is checked here.
        pdfDocument.Pages[1].Accept(textAbsorber);
        var extractedText = textAbsorber.Text;
        var pdfTextContents = extractedText.Split('\n');
        var _pattern = @"^[ ]*\d+";
        var lineNumber = 0;
        var prevLineNumber = 0;
        var isValid = false;
        foreach (var content in pdfTextContents)
        {
            // Ignore empty or whitespace-only lines
            if (string.IsNullOrWhiteSpace(content))
            {
                continue;
            }
            // Check whether the line starts with a line number
            var result = Regex.Match(content, _pattern);
            // Unnumbered lines before numbering starts are skipped as header.
            // An unnumbered line after numbering has started fails validation
            // (note that this also rejects a footer).
            if (!result.Success && lineNumber > 0)
            {
                return false;
            }
            // Remove the line number if it is valid
            if (result.Success)
            {
                // int.TryParse ignores the leading spaces matched by the pattern
                if (int.TryParse(result.Value, out lineNumber))
                {
                    // Invalid if the line numbers are not consecutive and ascending
                    if (lineNumber == prevLineNumber + 1)
                    {
                        isValid = true;
                        prevLineNumber = lineNumber;
                        // Require a solution to remove the line number
                    }
                    else
                    {
                        return false;
                    }
                }
            }
        }
        return isValid;
    }
If the document is valid, I then save the PDF.
**After validating each line,** I need to remove the line number from it, but I didn't find a way to remove the text.
Please suggest a solution for removing the line numbers.
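To illustrate the result I am after, here is a minimal sketch of the removal on an extracted text line. It only edits a string, not the PDF itself, which is the part I am missing. The class and method names are hypothetical, and the pattern is the one from my code plus one optional trailing space.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineNumberStripper
{
    // Leading spaces, the digits of the line number, and at most one following space.
    private static readonly Regex LeadingNumber = new Regex(@"^[ ]*\d+[ ]?");

    // Returns the line without its leading line number; lines without one are returned unchanged.
    public static string Strip(string line)
    {
        return LeadingNumber.Replace(line, string.Empty);
    }
}
```

For example, `LineNumberStripper.Strip("  12 Some text")` gives `"Some text"`, and a header line with no leading number comes back as it was.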
Regards