Free Support Forum - aspose.com

Text fragment at coordinates

Does the Aspose .NET library allow selecting a text fragment at a specified coordinate location?
We have PDFs in which text at a specified coordinate location needs to be erased and replaced with spaces.
Currently we white-box the rectangular area and overlay it with new data downstream. This adds overhead and is not an efficient solution.

Hi There,


Thanks for contacting support.

To add text at specified coordinates on a page, please check the following code snippet. You can use the TextBuilder class to append TextFragment objects at a specified location. I have also attached the sample input and output documents used in the snippet.

using Aspose.Pdf;
using Aspose.Pdf.Text;

// dataDir is the folder that holds the sample documents
Document doc = new Document(dataDir + "SampleText.pdf");

// Write some text or spaces here
TextFragment textFragment = new TextFragment(" ");
textFragment.Position = new Position(97, 605);

// Set text properties
textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.White);
textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Black);

// Append the fragment to the first page and save the result
TextBuilder textBuilder = new TextBuilder(doc.Pages[1]);
textBuilder.AppendText(textFragment);
doc.Save(dataDir + "SampleText_out.pdf");


You may also check the "Add Text in PDF" section of our API documentation. If you need any further assistance, please feel free to contact us.


Best Regards,

Dim pdf As New Aspose.Pdf.Document(PDFfilename)
Dim TextFragmentAbsorberAddress As New Aspose.Pdf.Text.TextFragmentAbsorber()
' Restrict the search to the rectangle (lx, ly) - (ux, uy)
TextFragmentAbsorberAddress.TextSearchOptions.LimitToPageBounds = True
TextFragmentAbsorberAddress.TextSearchOptions.Rectangle = New Aspose.Pdf.Rectangle(lx, ly, ux, uy)
pdf.Pages(1).Accept(TextFragmentAbsorberAddress)

' Blank out every fragment found inside the rectangle
For Each tf As Aspose.Pdf.Text.TextFragment In TextFragmentAbsorberAddress.TextFragments
    tf.Text = ""
Next

pdf.Save(PDFfilename)

Figured it out. The code above replaces all text within the rectangular area with an empty string.
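
For readers following the C# snippet earlier in the thread, a rough C# equivalent of the same approach might look like the sketch below. The class name, method name, and parameters are hypothetical and chosen only for illustration; the Aspose calls are the ones already used in the VB.NET snippet. It writes to a separate output path, which is an assumption made here so the input file stays untouched.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Hypothetical helper, not part of the Aspose API.
public static class RegionTextEraser
{
    // Clears every text fragment found inside the given rectangle on one page
    // and returns how many fragments were cleared.
    // llx/lly and urx/ury are the lower-left and upper-right corners.
    public static int ClearTextInRegion(string inputPath, string outputPath, int pageNumber,
                                        double llx, double lly, double urx, double ury)
    {
        using (Document pdf = new Document(inputPath))
        {
            TextFragmentAbsorber absorber = new TextFragmentAbsorber();
            absorber.TextSearchOptions.LimitToPageBounds = true;
            absorber.TextSearchOptions.Rectangle = new Aspose.Pdf.Rectangle(llx, lly, urx, ury);

            // Page numbers are 1-based, as in pdf.Pages(1) above
            pdf.Pages[pageNumber].Accept(absorber);

            int cleared = 0;
            foreach (TextFragment fragment in absorber.TextFragments)
            {
                fragment.Text = "";
                cleared++;
            }

            pdf.Save(outputPath);
            return cleared;
        }
    }
}
```

The returned count is only a convenience for checking that the rectangle actually matched some text.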

Hi There,


Thanks for your feedback and for sharing the code snippet. It will help others who need to implement similar functionality. Please keep using our API, and if you have any other query, please feel free to let us know. We will be more than happy to extend our support.


Best Regards,

Hello Support,
I have noticed an issue with the Aspose DLL. I have code that extracts text within a rectangular area and replaces it with an empty string.
The code works fine on smaller PDFs. It crashes on a PDF that is bigger than 20MB in size.

I do the following in code:
1) Open a CSV file with the name of the PDF document.
2) Open the PDF file specified in the CSV file.
3) Extract text at the location specified.
4) I am running the code pasted below:

Dim License As New Aspose.Pdf.License
License.SetLicense("\It1\IT\Automation\publish\Prerequisites\Aspose\Aspose.Total.lic")
Dim pdf As New Aspose.Pdf.Document(Path.Combine(PDFPath, SequenceStep.Sequence.CurrentFile.GetFileNameWithoutExtension & ".pdf"))
PDFfilename = Path.Combine(PDFPath, SequenceStep.Sequence.CurrentFile.GetFileNameWithoutExtension & ".pdf")
While Not MyReader.EndOfData
    currentRow = MyReader.ReadFields() ' row from CSV file that gives the page number for extraction
    Dim TextFragmentAbsorberAddress As New Aspose.Pdf.Text.TextFragmentAbsorber()
    TextFragmentAbsorberAddress.TextSearchOptions.LimitToPageBounds = True
    TextFragmentAbsorberAddress.TextSearchOptions.Rectangle = New Aspose.Pdf.Rectangle(Specs(0), Specs(1), Specs(2), Specs(3))
    pdf.Pages(CInt(currentRow(4))).Accept(TextFragmentAbsorberAddress)
    For Each tf As Aspose.Pdf.Text.TextFragment In TextFragmentAbsorberAddress.TextFragments
        If Not String.IsNullOrWhiteSpace(tf.Text) Then
            tf.Text = "" ' replace text with empty string
        End If
    Next
End While
pdf.Flatten()
pdf.Save(PDFfilename)

Please advise on a solution, as this is holding up a crucial release.

Thanks,
Shilpa

@sjoshi

Thanks for contacting support.

Would you please share your sample PDF document with us? We will test the scenario in our environment and address it accordingly.

Furthermore, please also share the values of the Specs variable in your code above. This would help us test the scenario. If your sample PDF document is larger than 3.0MB, you may upload it to Google Drive or Dropbox and share the link with us.

Hello,
Is there any other way we can troubleshoot this problem? We will not be able to share the PDF due to confidentiality (HIPAA) regulations.

Thanks!

@sjoshi

Thanks for writing back.

We have tested the scenario with one of our sample PDF documents (32MB) and were unable to notice any issue. Please note that an issue can sometimes be document-specific, and in order to replicate it we need that specific document. If you cannot share the document here, you may send it in a private message. That way it will only be accessible to Aspose staff.

Please also confirm whether your program crashes while processing a single 20MB document, or whether your code processes more than one file and crashes at some specific PDF document. Please narrow down your use case and share only the problematic source document with us. If the issue is related to our API, we will definitely address it accordingly.
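
To help narrow the use case down, one possible approach is to log the elapsed time and managed heap size after each page, so the last line written before a crash points at the page where processing stalls. This is only a sketch: the class name, method name, and parameters are hypothetical, and it reuses the absorber calls from the code posted earlier in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Hypothetical diagnostic helper, not part of the Aspose API.
public static class PageByPageDiagnostics
{
    public static void ClearWithLogging(string inputPath, string outputPath,
                                        IEnumerable<int> pageNumbers,
                                        Aspose.Pdf.Rectangle region)
    {
        using (Document pdf = new Document(inputPath))
        {
            foreach (int pageNumber in pageNumbers)
            {
                Stopwatch timer = Stopwatch.StartNew();
                try
                {
                    TextFragmentAbsorber absorber = new TextFragmentAbsorber();
                    absorber.TextSearchOptions.LimitToPageBounds = true;
                    absorber.TextSearchOptions.Rectangle = region;
                    pdf.Pages[pageNumber].Accept(absorber);

                    foreach (TextFragment fragment in absorber.TextFragments)
                    {
                        if (!string.IsNullOrWhiteSpace(fragment.Text))
                        {
                            fragment.Text = "";
                        }
                    }
                }
                catch (Exception ex)
                {
                    // Record the failing page, then rethrow so the failure is not hidden
                    Console.Error.WriteLine($"Page {pageNumber} failed: {ex.Message}");
                    throw;
                }

                // Managed heap only; native allocations are not included in this figure
                long managedMb = GC.GetTotalMemory(false) / (1024 * 1024);
                Console.WriteLine($"Page {pageNumber}: {timer.ElapsedMilliseconds} ms, ~{managedMb} MB managed heap");
            }

            pdf.Save(outputPath);
        }
    }
}
```

If the per-page time or heap size climbs steadily, that would suggest a resource build-up across pages rather than a single bad page, which is useful detail to include in a report.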

Hello,
I am processing only one PDF at a time. The PDF is 39MB and has 8000+ pages. The first 4000 pages process in around 20 minutes, but the next 4000 crash my computer.
I am deleting text on every other page; the text that gets deleted is inside the rectangular area defined by Specs.
I have all the page numbers to be cleaned in a list and pass the list to the method above. The Aspose method takes the page numbers (1, 3, 5, 7, 9… as I am deleting content from every other page) from the list and replaces the text on those pages with an empty string.

@sjoshi

Thanks for sharing further details.

This appears to be a document-specific issue, and we need that document along with the complete list of pages and coordinates where text needs to be replaced with an empty value. We will test the scenario in our environment and address it accordingly. We assure you that we do not disclose shared documents to anyone and that we erase/discard them soon after investigating the scenario.
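
If the original cannot be shared because of confidentiality rules, a synthetic stand-in with the same page count and text at the same coordinates might be worth trying. The sketch below is illustrative only: the class name, method name, and placeholder text are hypothetical, the position (97, 605) is simply reused from the first snippet in this thread, and a synthetic file may well fail to reproduce a document-specific issue.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Hypothetical helper for building a shareable test file, not part of the Aspose API.
public static class SyntheticSampleBuilder
{
    public static void Build(string outputPath, int pageCount)
    {
        using (Document doc = new Document())
        {
            for (int i = 1; i <= pageCount; i++)
            {
                // Add a blank page and place one non-confidential fragment on it
                Page page = doc.Pages.Add();
                TextFragment fragment = new TextFragment("Placeholder text for page " + i);
                fragment.Position = new Position(97, 605);

                TextBuilder builder = new TextBuilder(page);
                builder.AppendText(fragment);
            }

            doc.Save(outputPath);
        }
    }
}
```

For example, `SyntheticSampleBuilder.Build("synthetic.pdf", 8000)` would produce a file with roughly the page count described above. If the same clearing code also fails on that file, the file can be shared without exposing protected data.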