Hi,
I’m using Aspose.PDF for .NET 20.9.0.
I’m working on a POC with Aspose.PDF to extract specific information from a PDF. I have this kind of PDF:
PDF-Exemple.png (26.9 KB)
I would like to know if there is a way to retrieve the value in each row together with its corresponding column title.
I already found a way to extract each line by using TextAbsorber. But when I do that, I lose the link between each ‘X’ value and its column title. That’s my problem.
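using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using Aspose.Pdf.Text;

var pdfDocument = new Aspose.Pdf.Document(new MemoryStream(Resource1.MyPDF));
var textAbsorber = new TextAbsorber();
// Pages is 1-based, so this reads the second page.
pdfDocument.Pages[2].Accept(textAbsorber);
var text = textAbsorber.Text;
// Split into lines and collapse runs of whitespace into single spaces.
var lines = text
    .Replace("\n", "")
    .Split('\r')
    .Select(e => Regex.Replace(e, @"\s+", " "))
    .ToArray();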
I also tried TableAbsorber, but I can’t use it in my case because the table structure in the PDF is messy.
How can I keep the corresponding header title for each “X” value? Is this possible with Aspose.PDF?
Thanks in advance,