Unable to read the data from the table

sakalasiva · October 4, 2023, 6:20am

I am trying to read data from a table in a PDF document whose columns are separated by tabs rather than drawn as boxes. Table_format.jpg (92.0 KB). However, TableAbsorber is unable to read the content of the table. Can someone help me read the data from this table?

sergei.shibanov · October 4, 2023, 3:43pm

@sakalasiva
This capability was not originally provided.
However, additional features may have been added to the library since then. If I had the file, I would try setting:

tableAbsorber.TextSearchOptions.Rectangle 
// and 
tableAbsorber.UseFlowEngine = true;

If this does not work, you can do it yourself: get the text in a given rectangle, then split it into columns by tabs and into rows by the Y values of the found fragments.
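The tab-splitting half of that suggestion can be sketched as below. This is only a minimal illustration, assuming the text extracted from the rectangle really contains tab characters between columns and one table row per line; `TabTableParser` and `ParseRows` are hypothetical names, not part of Aspose.PDF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an Aspose.PDF class): turns already-extracted
// text into rows and cells. Assumes one table row per line and a tab
// character between columns.
public static class TabTableParser
{
    public static List<string[]> ParseRows(string text)
    {
        return text
            // One row per non-empty line (handles both \r\n and \n).
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            // One cell per tab-separated value, with surrounding spaces trimmed.
            .Select(line => line.Split('\t').Select(cell => cell.Trim()).ToArray())
            // Drop lines that contain only whitespace or tabs.
            .Where(cells => cells.Any(cell => cell.Length > 0))
            .ToList();
    }
}
```

If the extracted text separates columns with runs of spaces rather than tabs, the split step would need to change accordingly.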

sakalasiva · October 30, 2023, 8:30am

Hi, I tried what you suggested, thank you for that. However, I am having trouble limiting the search area to get better results: the rectangle value does not seem to be honored when the document is scanned. My code is below. Can you tell me what I am doing wrong? Thanks in advance.

TableAbsorber tableAbsorber = new TableAbsorber();
tableAbsorber.TextSearchOptions = new TextSearchOptions(new Rectangle(20, 30, 40, 70));
tableAbsorber.TextSearchOptions.LimitToPageBounds = true;
tableAbsorber.UseFlowEngine = true;
tableAbsorber.Visit(((PDFDocument)pDFDocument).PdfReader.Pages[criteria.Page]);
ICollection collection = tableAbsorber.TableList?.Select((Func<AbsorbedTable, IPDFTable>)((x) => new AsposePDFTable
{
    PageNo = criteria.Page,
    PdfTable = x
})).ToList();

sergei.shibanov · October 30, 2023, 2:54pm

@sakalasiva
You can try this:

tableAbsorber.TextSearchOptions = new TextSearchOptions(new Rectangle(20, 30, 40, 70));

instead of

tableAbsorber.TextSearchOptions.Rectangle = new Rectangle(20, 30, 40, 70);

But it should also work in your version.
Please attach the document you used so that we can check and reproduce this error.

sakalasiva · October 31, 2023, 7:40am

SalesAndTaxes.pdf (278.4 KB)

Hi, I am attaching the sample document for testing. Can you verify it and let me know what I can do to get this right? I am interested in reading the "description of property" column.

TableAbsorber tableAbsorber = new TableAbsorber();
tableAbsorber.TextSearchOptions = new TextSearchOptions(true);
tableAbsorber.TextSearchOptions.Rectangle = new Rectangle(183.05000305175781, 39.596485137939453, 596.1500244140625, 53.190235137939453);
tableAbsorber.TextSearchOptions.LimitToPageBounds = true;
tableAbsorber.UseFlowEngine = true;
tableAbsorber.Visit(((PDFDocument)pDFDocument).PdfReader.Pages[criteria.Page]);
ICollection collection = tableAbsorber.TableList?.Select((Func<AbsorbedTable, IPDFTable>)((x) => new AsposePDFTable
{
    PageNo = criteria.Page,
    PdfTable = x
})).ToList();

sergei.shibanov · October 31, 2023, 1:48pm

@sakalasiva
Thank you for attaching the document. I studied it: it contains no text at all, and all the letters are drawn as graphics.
image.png (84.0 KB)
Therefore, the classes for working with text do not find anything. You would have to use a GraphicAbsorber object (you can get a SubpathCollection with it), although what you get back is essentially graphics, not text.

sakalasiva · October 31, 2023, 6:10pm

Sorry, I think the PDF format changed when I printed the specific page to a new file. I am uploading the correct file. Please have a look and help me.
SalesAndTaxes.pdf (110.6 KB)

sergei.shibanov · November 1, 2023, 6:04am

This document does contain text. I’ll look into it and write to you later.

sergei.shibanov · November 1, 2023, 12:45pm

@sakalasiva
Using the code

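using Aspose.Pdf;
using Aspose.Pdf.Text;

// dataDir is the folder that contains the PDF (defined elsewhere).
var doc = new Document(dataDir + "SalesAndTaxes.pdf");
var tfa = new TextFragmentAbsorber();
tfa.Visit(doc.Pages[1]); // pages are 1-based
foreach (TextFragment textFragment in tfa.TextFragments)
{
    // Skip empty fragments; print each fragment with its position on the page.
    if (!string.IsNullOrWhiteSpace(textFragment.Text))
        Console.WriteLine($"{textFragment.Text} at ({textFragment.Position.XIndent},{textFragment.Position.YIndent})");
}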

I got text data as output.
Form ID.docx (13.4 KB)

Note that the full output is only available with a license (without a license, only 4 elements are returned).
Using the (X, Y) values of the resulting fragments, you can select the lines you need and assemble them however you like.
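One way to assemble rows from those (X, Y) values is sketched below. It is only an illustration under stated assumptions: `PositionedText` is a hypothetical stand-in for the text, XIndent and YIndent values printed by the code above, `RowBuilder` and the `yTolerance` default of 2.0 are made up for the example, and Y is assumed to grow upward from the bottom of the page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one fragment: its text plus the X and Y
// values printed by the loop above.
public sealed class PositionedText
{
    public string Text { get; }
    public double X { get; }
    public double Y { get; }
    public PositionedText(string text, double x, double y) { Text = text; X = x; Y = y; }
}

public static class RowBuilder
{
    // Walks the fragments from the largest Y to the smallest and starts a new
    // row whenever a fragment's Y is more than yTolerance away from the Y of
    // the fragment that opened the current row. Cells in each row are then
    // ordered left to right by X.
    public static List<List<string>> BuildRows(
        IEnumerable<PositionedText> fragments, double yTolerance = 2.0)
    {
        var rows = new List<List<PositionedText>>();
        double rowY = 0;
        foreach (var fragment in fragments.OrderByDescending(f => f.Y))
        {
            if (rows.Count == 0 || Math.Abs(rowY - fragment.Y) > yTolerance)
            {
                rows.Add(new List<PositionedText>());
                rowY = fragment.Y;
            }
            rows[rows.Count - 1].Add(fragment);
        }
        return rows
            .Select(row => row.OrderBy(f => f.X).Select(f => f.Text).ToList())
            .ToList();
    }
}
```

To pick out a single column, you could additionally keep only the fragments whose X falls inside the range that column occupies; the right tolerance and X range depend on the document's layout.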

sakalasiva · December 9, 2023, 11:54am

Thanks for the reply