Get text at a position in a PDF

I’m trying to find an approach that supports two kinds of PDF redaction. In some files an exact phrase is replaced; in others I have a fixed phrase followed by a varying value.

The method outlined in “Replace Text in PDF | Aspose.PDF for .NET” works well when the string is a known value.

The other method needs to match a string and allow text in proximity to it to be modified for redaction. The example I’m working with is along the lines of the string “GPA:3.5”. I need to find “GPA” (which I can do currently) and redact the text to the right of its position. The value for GPA will vary in each of the documents being processed.

I’ve dug through the documentation and can’t find a technique to access a fragment based upon position.

I may be approaching the problem incorrectly in terms of how Aspose allows interaction with the text in a PDF. If there is an alternate approach, please let me know.

Thanks.

Hi Brain,

Thank you for your interest in our products.

You may check the following documentation links for more details and code snippets about working with text in existing PDF documents.

Working with Text

Working with Text (Facades)

If you still face any problem, kindly share the sample source code and template documents you are using, or create a sample application that shows the issue. This will help us investigate the issue and reply to you sooner.

We apologize for the inconvenience.

Thanks & Regards,

Rashid,

Thanks for the reply. I’ve looked through the two links and the various examples. I don’t see how they would address my scenario.

I’ve attached an example file with test data. In the attached file I need to be able to redact the SAT, ACT, MCAT, and DAT data blocks. The blocks will vary in content and position as an application could have none, one, or many of the elements in question. Simple text extraction, pattern matching, or fixed position replacement isn’t sufficient in this case.

I was hoping I could match a fixed string value of the headers like “Vrbl” in the SAT block. Then get the position of the MCAT header and be able to access and redact the text between those positions. This approach may not be possible so I’m open to any suggestions.

For fixed string values or even patterned data the redaction works great. I was considering using a wildcard pattern to get access to all the string fragments and then trying to figure out whether the one in question needs to be redacted, but this approach looks problematic.

Thanks for any assistance.

Hi Brain,

Thank you for sharing the template document. As I understand it, the document contains SAT, ACT, MCAT, and DAT data blocks, and the number and position of those blocks can vary from one PDF document to another. You want to replace the text “Vrbl” with, say, “Phy” when it belongs to the SAT block, and with, say, “Chm” when it belongs to the MCAT block. Please correct me if I am wrong. If my understanding is right, I am very sorry to say that Aspose.Pdf currently offers no direct means to fulfill your requirement.

Please feel free to contact support in case you need any further assistance.

Thanks & Regards,

Rashid,

You’ve almost got it. I’m wanting to find a known string like “Vrbl” and use its position to access and modify other string values in proximity. I thought it might be a long shot.

Thanks for looking into it.

Hi Brain,

Thanks for sharing the details. I have logged the request in our issue tracking system with ID PDFNEWNET-33474 for further investigation. Our development team is looking into this feature, and we will post status updates in this forum thread.

We apologize for the inconvenience.

Thanks & Regards,

Rashid,

No worries and thanks for looking into it.

Hi Brain,


Thanks for your patience. Please note that Aspose.Pdf provides no direct way to find and replace text based on position. However, each TextFragment has Position and Rectangle properties that let you locate the fragment on the page. You can also set the Rectangle property of the TextSearchOptions used by a TextFragmentAbsorber to limit the zone in which text is searched (or segments are absorbed).
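As a quick way to see these coordinates for your own file, a minimal sketch along the following lines prints where each match sits on the page. The file name "input.pdf" and the class name are placeholders, not part of the original sample, and the sketch assumes the Aspose.PDF for .NET package is referenced.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class ListFragmentPositions
{
    static void Main()
    {
        // Placeholder path: point this at your own document.
        Document pdfDocument = new Document("input.pdf");

        // Collect every occurrence of the search phrase on the first page.
        TextFragmentAbsorber absorber = new TextFragmentAbsorber("Vrbl");
        pdfDocument.Pages[1].Accept(absorber);

        foreach (TextFragment fragment in absorber.TextFragments)
        {
            // Position is the fragment's start point; Rectangle is its bounding box.
            Console.WriteLine(
                "{0}: position ({1}, {2}), rectangle ({3}, {4}) to ({5}, {6})",
                fragment.Text,
                fragment.Position.XIndent, fragment.Position.YIndent,
                fragment.Rectangle.LLX, fragment.Rectangle.LLY,
                fragment.Rectangle.URX, fragment.Rectangle.URY);
        }
    }
}
```

Printing the coordinates first can help when deciding how large a search rectangle around a fragment needs to be.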

You want to find a known string like “Vrbl” and use its position to access and modify other string values in its proximity. There are at least two ways to do this with Aspose.Pdf.

First way: find a string like “Vrbl” and determine its position. Then build a rectangle that extends from that position in the direction you need, and run a new search limited to that rectangle to find the segments near the initial one.

The following example searches for the topmost “Vrbl” string and changes the value of the first text fragment located below it:


// Requires a reference to Aspose.PDF for .NET
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Assumed: folder that holds the sample document
string myDir = "./";

// Open the document
Document pdfDocument = new Document(myDir + "John_Doe_Profile.pdf");
Page page = pdfDocument.Pages[1];

// Create a TextFragmentAbsorber to find all instances of the search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("Vrbl");

// Accept the absorber for the first page only
page.Accept(textFragmentAbsorber);

// Get the extracted text fragments (the collection is 1-based)
TextFragmentCollection initialCollection = textFragmentAbsorber.TextFragments;
if (initialCollection.Count == 0)
    return; // "Vrbl" was not found on this page

// Select the topmost 'Vrbl' fragment (largest YIndent)
TextFragment initFragment = initialCollection[1];
foreach (TextFragment textFragment in initialCollection)
{
    if (textFragment.Position.YIndent > initFragment.Position.YIndent)
        initFragment = textFragment;
}

// Create a rectangle just below the 'Vrbl' fragment, two font sizes tall
Aspose.Pdf.Rectangle rect = new Aspose.Pdf.Rectangle(
    initFragment.Rectangle.LLX,
    initFragment.Rectangle.LLY - initFragment.TextState.FontSize * 2,
    initFragment.Rectangle.URX,
    initFragment.Rectangle.LLY - 2);

// Recreate the TextFragmentAbsorber, limited to that rectangle
textFragmentAbsorber = new TextFragmentAbsorber();
TextSearchOptions options = new TextSearchOptions(rect);
textFragmentAbsorber.TextSearchOptions = options;

// Accept the absorber for the same page
page.Accept(textFragmentAbsorber);

// Get the text fragments found inside the rectangle
TextFragmentCollection newCollection = textFragmentAbsorber.TextFragments;
TextFragment fragmentBelow = null;
if (newCollection.Count > 0)
    fragmentBelow = newCollection[1];

// Select the topmost fragment inside the rectangle,
// i.e. the one closest to the 'Vrbl' fragment
foreach (TextFragment fragment in newCollection)
{
    if (fragment.Position.YIndent > fragmentBelow.Position.YIndent)
        fragmentBelow = fragment;
}

// Replace the fragment text
if (fragmentBelow != null)
    fragmentBelow.Text = "777";

pdfDocument.Save(myDir + "33474_out.pdf");
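
The same idea can be pointed to the right instead of downward, which is closer to the “GPA:3.5” case from the first post. The sketch below is only an illustration under stated assumptions: the file names, the 100-point search width, and the "XXX" replacement are made up, and it assumes the value is stored as a text fragment separate from the “GPA” label. If the label and value share one text run, the zone may not isolate the value and the result should be checked on real files.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Text;

class RedactRightOfLabel
{
    static void Main()
    {
        // Placeholder paths: not part of the original sample.
        Document pdfDocument = new Document("input.pdf");
        Page page = pdfDocument.Pages[1];

        // Assumed width, in points, of the zone to the right of each label.
        const double searchWidth = 100;

        // Find every occurrence of the fixed label on the page.
        TextFragmentAbsorber labelAbsorber = new TextFragmentAbsorber("GPA");
        page.Accept(labelAbsorber);

        foreach (TextFragment label in labelAbsorber.TextFragments)
        {
            // Zone on the same line, starting at the label's right edge.
            Aspose.Pdf.Rectangle zone = new Aspose.Pdf.Rectangle(
                label.Rectangle.URX,
                label.Rectangle.LLY - 1,
                label.Rectangle.URX + searchWidth,
                label.Rectangle.URY + 1);

            // Absorb whatever text lies inside that zone.
            TextFragmentAbsorber valueAbsorber = new TextFragmentAbsorber();
            valueAbsorber.TextSearchOptions = new TextSearchOptions(zone);
            page.Accept(valueAbsorber);

            // Overwrite each fragment found next to the label.
            foreach (TextFragment value in valueAbsorber.TextFragments)
            {
                value.Text = "XXX";
            }
        }

        pdfDocument.Save("output.pdf");
    }
}
```

Any separator such as “:” that falls inside the zone is overwritten as well, so the zone's left edge or width may need tuning per document layout.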

Second way: try our new TableAbsorber class, which finds tables and table elements in an existing PDF document.
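
A rough sketch of how the TableAbsorber route might look is given here. It assumes the score blocks are actually recognized as tables, which depends on how the PDF was produced and is not confirmed for the attached file. The file names, the "XXX" replacement, and the rule "redact every cell in the column under the “Vrbl” header" are illustrative choices only.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Text;

class RedactColumnUnderHeader
{
    static void Main()
    {
        // Placeholder paths: not part of the original sample.
        Document pdfDocument = new Document("input.pdf");

        // Detect tables on the first page.
        TableAbsorber absorber = new TableAbsorber();
        absorber.Visit(pdfDocument.Pages[1]);

        foreach (AbsorbedTable table in absorber.TableList)
        {
            int headerColumn = -1; // column index of the "Vrbl" header, once found

            foreach (AbsorbedRow row in table.RowList)
            {
                bool isHeaderRow = false;
                int column = 0;

                foreach (AbsorbedCell cell in row.CellList)
                {
                    foreach (TextFragment fragment in cell.TextFragments)
                    {
                        if (fragment.Text.Trim() == "Vrbl")
                        {
                            // Remember which column the header sits in.
                            headerColumn = column;
                            isHeaderRow = true;
                        }
                        else if (!isHeaderRow && column == headerColumn)
                        {
                            // Cell below the header: overwrite its text.
                            fragment.Text = "XXX";
                        }
                    }
                    column++;
                }
            }
        }

        pdfDocument.Save("output.pdf");
    }
}
```

If no tables are detected for a given document, the rectangle-based search from the first way remains the fallback.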

Please feel free to contact us for any further assistance.

Best Regards,

The issues you have found earlier (filed as ) have been fixed in this update. This message was posted using BugNotificationTool from Downloads module by MuzammilKhan