Fetch data from PDF based on the background color

Hi,

I have a PDF file that contains different data under different background colors.
Is it possible to fetch data based on these specific background colors?
It also has lines separating each block of data. Can those separator lines be identified, and if so, how?
I have attached a sample file that explains my requirement.

I would appreciate your help with the scenario described above.

Thanks & Regards,
Jacob

Hi Jacob,


Thanks for your inquiry. You can read the background color property of each text fragment (TextFragment.TextState.BackgroundColor) to accomplish your task. Please check the following documentation link for the available text fragment properties. Hopefully it will help.


Please feel free to contact us for any further assistance.

Best Regards,

Dear Tilal,

Thanks for clarifying my questions.
However, our requirement is to fetch the data based on the table blocks or borders (as shown in the diagram) as well as on the colors.
For example, if a box or table block (as shown in the diagram) is white, we need to read the data inside it. Is there any way I can identify the table blocks or boxes?

I would appreciate your help with this scenario.

Thanks & Regards,
Jacob

Hi Jacob,


Adding to Tilal’s comments: you need to search the complete document, get all the text using a regular expression, then read the background color of each TextFragment and group the text by that color. For more information, please see the documentation article "Search and Get Text from All Pages using Regular Expression".


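To match every string in the PDF document, please try the following regular expression. The TextSearchOptions(true) argument tells the absorber to treat the phrase as a regular expression rather than as literal text:
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@"[\S]+", new TextSearchOptions(true));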
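The steps described above (search every string with a regular expression, then read each fragment's background color) can be combined into a small console program. This is a minimal sketch, assuming the Aspose.PDF for .NET package is referenced; the file name "input.pdf", the class name, and the helper GroupTextByColor are hypothetical names for illustration. Using Color.ToString() as the grouping key is also an assumption. Note that a shaded box is often drawn as a separate filled rectangle behind the text, in which case TextState.BackgroundColor may not report it.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Text;

public static class BackgroundColorGrouping
{
    // Hypothetical helper: groups fragment text by a color key. Fragments with
    // no background color share the "none" key. It uses no Aspose types.
    public static Dictionary<string, List<string>> GroupTextByColor(
        IEnumerable<(string Text, string ColorKey)> fragments)
    {
        var groups = new Dictionary<string, List<string>>();
        foreach (var (text, colorKey) in fragments)
        {
            string key = colorKey ?? "none";
            if (!groups.TryGetValue(key, out List<string> list))
            {
                list = new List<string>();
                groups[key] = list;
            }
            list.Add(text);
        }
        return groups;
    }

    public static void Main(string[] args)
    {
        // "input.pdf" is a placeholder; pass your own file as the first argument.
        string path = args.Length > 0 ? args[0] : "input.pdf";
        var document = new Document(path);

        // \S+ matches every run of non-whitespace; TextSearchOptions(true) enables regex mode.
        var absorber = new TextFragmentAbsorber(@"\S+", new TextSearchOptions(true));
        document.Pages.Accept(absorber);

        var fragments = new List<(string Text, string ColorKey)>();
        foreach (TextFragment fragment in absorber.TextFragments)
        {
            // The background color may be null when the text carries none itself.
            Color background = fragment.TextState.BackgroundColor;
            fragments.Add((fragment.Text, background?.ToString()));
        }

        // Print each color key followed by the text found under it.
        foreach (var group in GroupTextByColor(fragments))
        {
            Console.WriteLine($"{group.Key}: {string.Join(" ", group.Value)}");
        }
    }
}
```

Keeping the grouping logic separate from the Aspose calls makes it easy to swap in a different color key (for example, comparing RGB components with a tolerance) without touching the search code.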



Hi Jacob,

Thanks for your inquiry. I am afraid Aspose.Pdf does not currently support manipulating existing tables. We have already logged a feature request, PDFNEWNET-36802, for this. We have linked your request to the issue ID and will notify you as soon as it is resolved.

We are sorry for the inconvenience caused.

Best Regards,

Dear Tilal,

Is there any update on the requirement above?

Regards,
Jacob

Hi Jacob,


Thanks for your patience.

The development team has been busy resolving other high-priority issues, and I am afraid the requirement above (manipulating tables in existing PDF files) is not yet supported. Nevertheless, we have asked the development team to share an ETA if possible. As soon as we have an update on its resolution, I will be happy to let you know. Please be patient and give us a little more time.

We are sorry for this delay and inconvenience.

Hi Jacob,


Thanks for your patience.

The development team has investigated the issue further, and I am afraid this feature is difficult to implement in the general case because most documents do not contain any table markup. However, some documents do provide this information, and we may be able to add support for reading tables from such documents.

Could you please share a sample document so that we can start investigating the requirement based on it? Please note that if the document contains a marked table, we will be able to introduce support in the long term. Currently we cannot provide an ETA in either case.

Hi Jacob,


As requested earlier, please share further details and resource files that can help us better understand the requirement. Otherwise, we will mark this issue as deferred and close it for now.

The issues you have found earlier (filed as PDFNEWNET-36802) have been fixed in Aspose.Pdf for .NET 10.6.0.


This message was posted using Notification2Forum from Downloads module by Aspose Notifier.

Could you please share working code to get the background color from a PDF?

@a.sharma16lboro

The Page.Background property can be used to get or set the background color of a page in a PDF document.
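Reading and setting Page.Background could look roughly like the sketch below. It assumes the Aspose.PDF for .NET package is referenced; the file names "input.pdf" and "output.pdf", the class name, and the helper DescribeBackground are hypothetical. Keep in mind that many PDFs paint their colored areas as filled shapes instead of setting a page background, so this property may report no color even when the page looks shaded.

```csharp
using System;
using Aspose.Pdf;

public static class PageBackgroundReader
{
    // Hypothetical helper: formats one output line; a null color means "not set".
    public static string DescribeBackground(int pageNumber, string colorText) =>
        $"Page {pageNumber}: background {colorText ?? "not set"}";

    public static void Main(string[] args)
    {
        // "input.pdf" is a placeholder; pass your own file as the first argument.
        string path = args.Length > 0 ? args[0] : "input.pdf";
        var document = new Document(path);

        // Get: report the background color of every page.
        foreach (Page page in document.Pages)
        {
            Console.WriteLine(DescribeBackground(page.Number, page.Background?.ToString()));
        }

        // Set: give the first page (page numbering is 1-based) a light gray background.
        document.Pages[1].Background = Color.LightGray;
        document.Save("output.pdf");
    }
}
```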