Hi Aspose Team,
We are facing a problem while indexing (capturing text from) a particular zone of a PDF file.
The zone is defined by the coordinates in the following line of code:
RedactionAnnotation annot = new RedactionAnnotation(pdfDocument.Pages[i], new Aspose.Pdf.Rectangle(92.16, 509.76, 391.68, 601.92));
Problem:
-> When we extract the text from this zone, it is not captured completely. The zone contains three lines of text, but only two are returned; the 1st line is missing.
-> The 1st line lies on the boundary of the zone, and Aspose.PDF does not read it. We want all of the content inside the coordinates as well as the content lying on the boundary.
-> We capture the text for the zone and save it to a CSV file. The CSV file contains only two lines, although the zone in the PDF file actually contains three.
-> Please find the attached zip file, which contains input.pdf and output.pdf.
input.zip (723.8 KB)
-> In output.pdf, the zone is highlighted in yellow.
Code:
using System;
using System.Configuration;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

public void getText()
{
    Aspose.Pdf.License licencepd = new Aspose.Pdf.License();
    licencepd.SetLicense(Convert.ToString(ConfigurationManager.AppSettings["AsposeLic"]));
    string directory = @"D:\";
    string filename = "input.pdf";
    using (Document pdfDocument = new Document(directory + filename))
    {
        int count = pdfDocument.Pages.Count;
        for (int i = 1; i <= count; i++)
        {
            // Create TextAbsorber object to extract text from the zone
            Aspose.Pdf.Text.TextAbsorber absorber = new Aspose.Pdf.Text.TextAbsorber();
            absorber.TextSearchOptions.LimitToPageBounds = false;
            absorber.TextSearchOptions.Rectangle = new Aspose.Pdf.Rectangle(92.16, 509.76, 391.68, 601.92);
            // Accept the absorber for the current page
            pdfDocument.Pages[i].Accept(absorber);
            // Get the extracted text
            string extractedText = absorber.Text;
            // Create RedactionAnnotation instance for the same page region
            RedactionAnnotation annot = new RedactionAnnotation(pdfDocument.Pages[i], new Aspose.Pdf.Rectangle(92.16, 509.76, 391.68, 601.92));
            annot.FillColor = Aspose.Pdf.Color.LightYellow;
            annot.BorderColor = Aspose.Pdf.Color.Green;
            annot.Color = Aspose.Pdf.Color.Blue;
            Border border = new Border(annot);
            border.Width = 5;
            border.Dash = new Dash(1, 1);
            annot.Border = border;
            annot.Rect = new Aspose.Pdf.Rectangle(92.16, 509.76, 391.68, 601.92);
            // Text to be printed on the redaction annotation
            annot.OverlayText = "REDACTED";
            annot.TextAlignment = Aspose.Pdf.HorizontalAlignment.Center;
            // Do not repeat the overlay text over the redaction annotation
            annot.Repeat = false;
            // Add the annotation to the annotations collection of the current page
            pdfDocument.Pages[i].Annotations.Add(annot);
            // Flatten the annotation and redact the page contents
            // (i.e. remove text and images under the redaction annotation)
            annot.Redact();
        }
        // directory already ends with a backslash, so append only the file name
        pdfDocument.Save(directory + "output.pdf");
    }
}
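To illustrate what we mean by including the content "on the co-ordinate": a minimal sketch of padding the zone by a small margin, so that a line sitting on the boundary would fall inside it. The helper `ExpandZone` and the 5-point margin are hypothetical, chosen only for illustration; they are not part of Aspose.PDF, and we have not confirmed that this is the recommended approach.

```csharp
using System;

public static class ZoneHelper
{
    // Hypothetical helper (not an Aspose.PDF API): returns the zone
    // coordinates grown by `margin` points on every side.
    public static (double Llx, double Lly, double Urx, double Ury) ExpandZone(
        double llx, double lly, double urx, double ury, double margin)
    {
        return (llx - margin, lly - margin, urx + margin, ury + margin);
    }

    public static void Main()
    {
        // Zone values from the code above; the margin of 5 is an assumption.
        var zone = ExpandZone(92.16, 509.76, 391.68, 601.92, 5.0);
        Console.WriteLine($"{zone.Llx}, {zone.Lly}, {zone.Urx}, {zone.Ury}");
    }
}
```

The padded values could then be passed to new Aspose.Pdf.Rectangle(...) in place of the constants assigned to absorber.TextSearchOptions.Rectangle. A larger margin may also pull in text from neighbouring zones.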
Note:
-> We are using Aspose.Total 18.5 for .NET.
Please help us with the above issue; it is critical for us.
Thanks & Regards,
Santosh Kumar Panigrahi