Aspose TextFragmentAbsorber with Regular expression - Not working in specific cell

Hi,

I am facing the issue below in a PDF file while using Aspose. We convert an Excel file to PDF and upload it into our application. We then need to find the content between { and } and between [ and ] in the uploaded PDF file. Aspose works fine for all other pages in the PDF, but it does not detect the opening "[" and closing "]" brackets in one particular cell on a specific page.
The code we are using to find the pattern is:
Find [ and ] in a page
---------------------
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("\\[[^\\[\\]]*\\]");

Could someone help me find a solution? I have attached a screenshot of the file below for your reference.

Thank you in advance.
Eswari G

Hi Eswari,


Thanks for your inquiry. I am afraid we cannot suggest anything without looking into your document. Please share your sample document here so that we can test the scenario and provide you with more information.

We are sorry for the inconvenience caused.

Best Regards,

Hi


I have attached the sample Excel source file containing the format with which I faced the issue.
We upload and use the file after converting this Excel file to PDF.

The "[" in the “[Note” section is not being found by Aspose.

Thanks
Eswari G

Hi Eswari,

Thanks for sharing the resource file.

I have tested the scenario using Aspose.Pdf for .NET 8.8.0 with the following code snippet and, as per my observations, the text is being replaced. For your reference, I have also attached the resultant PDF generated at my end.

[C#]

//open document
Document pdfDocument = new Document("c:/pdftest/outSource+File1.pdf");

//create TextAbsorber object to find all instances of the input search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("\\[[^\\[\\]]*\\]");

//set text search option to specify regular expression usage
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;

//accept the absorber for all the pages
pdfDocument.Pages.Accept(textFragmentAbsorber);

//get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

//loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
    //update text and other properties
    textFragment.Text = "!";
    textFragment.TextState.Font = FontRepository.FindFont("Verdana");
    textFragment.TextState.FontSize = 22;
    textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.Blue;
    textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.Green;
}

pdfDocument.Save("c:/pdftest/TextReplace_output.pdf");
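
As a side check that does not involve Aspose at all, the pattern itself can be run through .NET's own `Regex` class. This is only a sketch: the `BracketPatterns` class and the sample string are made up for illustration, and the curly-brace pattern is an assumed analogue of the square-bracket one, since the thread only shows the latter. If the pattern matches here but not in the PDF, the regular expression is probably not the culprit and the difference more likely lies in how the text is stored in the PDF.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BracketPatterns
{
    // Same pattern as in the thread: "[", any run of non-bracket characters, "]".
    public const string SquareBrackets = @"\[[^\[\]]*\]";

    // Assumed analogue for "{ ... }" (not taken from the thread).
    public const string CurlyBraces = @"\{[^{}]*\}";

    // Returns every non-overlapping match of the pattern in the input text.
    public static List<string> FindAll(string text, string pattern)
    {
        var results = new List<string>();
        foreach (Match match in Regex.Matches(text, pattern))
        {
            results.Add(match.Value);
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical sample text, not taken from the attached file.
        string sample = "Total {amount} due [Note: see terms] and [Ref 12]";

        foreach (string hit in FindAll(sample, SquareBrackets))
        {
            Console.WriteLine(hit);
        }
        foreach (string hit in FindAll(sample, CurlyBraces))
        {
            Console.WriteLine(hit);
        }
    }
}
```

For the sample string above, the square-bracket pattern should pick up `[Note: see terms]` and `[Ref 12]`, and the curly-brace pattern should pick up `{amount}`.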

Hi

Thanks for your quick response. I have tried your code, but the same issue still exists. You mentioned that you are using version 8.8.0, but we are using version 8.4.0.
Please let me know whether the problem is caused by the version.
Please also help me resolve the issue using version 8.4.0.

Thanks
Eswari G

Hi Eswari,


Thanks for your feedback. I am afraid we are unable to replicate the issue while testing the scenario with the Aspose.Pdf for .NET 8.4.0 3.5 Client Profile DLL on Windows 7 (64-bit). We converted your xlsx file to PDF using Aspose.Cells for .NET and then replaced [ ] using Aspose.Pdf for .NET 8.4.0. Please share your environment details so that we can try to replicate the issue at our end and provide you with more information accordingly.

We are sorry for the inconvenience caused.

Best Regards,

Hi,


We are using the same environment that you used.
We manually convert the Excel file to PDF (Save As -> PDF) and then upload it into our application. Please help me with this issue.

Thanks,

Eswari G

Hi Eswari,

Thanks for sharing additional information. We have converted the xlsx file to PDF using the Save As option and tested the scenario with the latest version of Aspose.Pdf for .NET 8.8.0. We have managed to reproduce the reported issue and logged it in our bug tracking system as PDFNEWNET-36387 for further investigation and resolution. We will notify you via this thread as soon as it is resolved.

We are sorry for the inconvenience caused.

Best Regards,

Hi,


Thanks for your response.

In our scenario, we manually convert the currently selected cells of a worksheet in an Excel workbook into a PDF and then upload it into the .NET application.

But when I try to convert a sheet using Aspose.Cells, it converts the whole workbook, not just the active sheet.

I need to select a particular set of cells in a particular worksheet of my Excel workbook and convert that selection into a PDF using Aspose.Cells.

Please help me with this issue.

Thanks,
Eswari G



Hi Eswari,

Since the requirement you stated above is related to Aspose.Cells, I am moving this thread to another forum, where my colleagues from the respective team will look further into the details of this problem. You will be updated soon with the required information.

Hi,


You may hide the unwanted worksheets in the workbook and keep only your desired sheet visible before rendering to PDF; see this article for reference: http://www.aspose.com/docs/display/cellsnet/Save+Each+Worksheet+to+a+Different+PDF+File. You may also specify the print area using the PageSetup options so that only your specified cell range of the worksheet is rendered; see the sample code below for reference:
e.g
..............
//Obtaining the reference of the PageSetup of the worksheet
PageSetup pageSetup = worksheet.PageSetup;

//Specifying the cells range (from A1 cell to T35 cell) of the print area
pageSetup.PrintArea = "A1:T35";
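
Put together, the two suggestions above might look roughly like the following. Treat it as a sketch under stated assumptions rather than a tested solution: the file paths, the sheet index, and the `RangeToPdf` class name are hypothetical, and the behaviour should be confirmed against the Aspose.Cells version you actually use.

```csharp
using Aspose.Cells;

public static class RangeToPdf
{
    public static void Main()
    {
        // Hypothetical input path and sheet index; adjust to your own workbook.
        Workbook workbook = new Workbook("c:/pdftest/Source.xlsx");
        int targetIndex = 0;

        // Make sure the wanted sheet is visible and active before hiding the rest,
        // so that the workbook always has at least one visible sheet.
        Worksheet worksheet = workbook.Worksheets[targetIndex];
        worksheet.IsVisible = true;
        workbook.Worksheets.ActiveSheetIndex = targetIndex;

        // Hide every other sheet so that only the wanted one is rendered.
        for (int i = 0; i < workbook.Worksheets.Count; i++)
        {
            if (i != targetIndex)
            {
                workbook.Worksheets[i].IsVisible = false;
            }
        }

        // Restrict rendering to the wanted cell range (A1 to T35, as in the sample above).
        worksheet.PageSetup.PrintArea = "A1:T35";

        // Hypothetical output path.
        workbook.Save("c:/pdftest/Range_output.pdf", SaveFormat.Pdf);
    }
}
```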


Thank you.

Hi


Thanks for your immediate response. It helped me a lot.

I have another issue. When I try to find content inside “[” and “]”, Aspose finds all the text content and removes it. But it does not find the “__________” (underline formatting) inside those sections, so the underline is not removed.

Please help me with this issue.

Thanks,
Eswari G

Hi,


Please provide more details and tell us which Aspose product you are using; I suspect you are using Aspose.Pdf to find the contents. Please also provide your template file and sample code that show the issue. Once we have these details, we will help you accordingly.

Thank you.

Hi


Please find attached the sample file for the issue.

I want to remove the text content between “[” and “]”. For example, the text from “Revised Men’s” to “clea” got removed, but the underline still exists.

I am using Aspose.Pdf to convert this Word document into PDF and to perform the removal operation.

Please help me with this issue.

Thanks
Eswari G

Hi Eswari,


As you shared a .doc file in your earlier post, you may consider using Aspose.Words to search and replace the desired text. I am afraid Aspose.Pdf cannot manipulate MS Word files. I am moving this thread to another forum, where my colleagues from the respective team will look further into this matter.

eswari1004:
I am using Aspose.pdf to convert this word document into Pdf and doing the removal operation.
Hi Eswari,

By the way, I am still not certain about the requirement you stated above. Are you performing the text replace operation after the DOC file has been converted into PDF format?

Hi Eswari,


Can you please share the code snippet that you are using to replace the text in the PDF file, so that we can test the scenario again at our end? Please also let us know whether you are converting the .DOC file to PDF format using Aspose.Words for .NET or manually saving the .DOC file to PDF format using MS Word.

Hi,


I am using Aspose.Pdf to find and remove the text from pdf file itself.

Hi

Please ignore my previous reply. Sorry

My application converts the Word/Excel document into PDF and then finds and replaces the text in the PDF itself.

For both of these functions I am using Aspose products only.

My Code:


//open document
Document pdfDocument = new Document("c:/pdftest/outSource+File1.pdf");
//create TextAbsorber object to find all instances of the input search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("\\[[^\\[\\]]*\\]");
//set text search option to specify regular expression usage
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
//accept the absorber for all the pages
pdfDocument.Pages.Accept(textFragmentAbsorber);
//get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
//loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
    //replace the matched text with an empty string
    textFragment.Text = "";
}
pdfDocument.Save("c:/pdftest/TextReplace_output.pdf");


Thanks,
Eswari G

Hi Eswari,


Thanks for sharing the code snippet.

I have tested the scenario and can confirm that the underline remains in the PDF file after the text is replaced. To get this corrected, I have logged the problem as PDFNEWNET-36417 in our issue tracking system. We will look further into the details of this problem and will keep you updated on the status of the correction. Please be patient and allow us a little time. We are sorry for this inconvenience.