We use Aspose.PDF for .NET with TextAbsorber to extract text from specific regions of a PDF page. We have problems with a customer who produces PDFs from an AS/400 using software (Laser 400) based on Amyuni PDF Converter. We have two categories of files with errors. In the first (Gel_Bolla_Ven.pdf), TextAbsorber seems to find no text (at some points in the debugger we see only a series of \0 characters extracted). In the second (Gel_Ord_For.pdf), we find the fixed text (these are invoices generated from a template filled with specific values in the fields) but not the values.
We use the absorber with a rectangle; we also tried a rectangle that covers the whole page:
var absorber = new TextAbsorber
{
    TextSearchOptions =
    {
        LimitToPageBounds = true,
        Rectangle = new Aspose.Pdf.Rectangle(left, pdf.PageInfo.Height - top, right, pdf.PageInfo.Height - bottom)
    }
};
// Accept the absorber for page (1-based)
pdf.Pages[nPage + 1].Accept(absorber);
We also tried setting the options SearchForTextRelatedGraphics and UseFontEngineEncoding, but with no result.
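For reference, here is a minimal sketch of the coordinate conversion that the snippet above appears to perform. It assumes that left, top, right and bottom are measured from the top-left corner of the page, which the post does not state. The helper name ToPdfRect and the class RegionMath are hypothetical, and the code deliberately uses no Aspose types. It only orders the corners the way the four-argument Aspose.Pdf.Rectangle constructor expects them: lower-left x, lower-left y, upper-right x, upper-right y.

```csharp
using System;

public static class RegionMath
{
    // Hypothetical helper: converts a box measured from the top-left corner
    // of the page into PDF user-space corners (origin at bottom-left).
    // Min/Max guarantee that the lower-left corner really is below and to
    // the left of the upper-right corner, whatever order the inputs had.
    public static (double Llx, double Lly, double Urx, double Ury) ToPdfRect(
        double left, double top, double right, double bottom, double pageHeight)
    {
        double y1 = pageHeight - top;
        double y2 = pageHeight - bottom;
        return (Math.Min(left, right), Math.Min(y1, y2),
                Math.Max(left, right), Math.Max(y1, y2));
    }
}
```

The four returned values could then be passed, in that order, to new Aspose.Pdf.Rectangle(...). Whether the corner order plays any part in the problem reported here is not established in this thread.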
Gel_Bolla_Ven.pdf (544.4 KB)
Gel_Ord_For.pdf (517.6 KB)
@StefanoR,
Can you give me a code snippet that I can run, please?
Your snippet is missing some lines that I need in order to run the code and replicate the issue.
@StefanoR,
I tried TextAbsorber and TextFragmentAbsorber, but neither worked properly on this PDF document. I will create a ticket for the dev team.
This is the code I used to read it. I drew a rectangle on top of the text just to check that the coordinates were the correct ones.
// Requires: using Aspose.Pdf; using Aspose.Pdf.Text; using Aspose.Pdf.Drawing;
// PartialPath is defined elsewhere (not shown in this post).
private void Logic()
{
    Document doc = new Document($"{PartialPath}_input.pdf");
    var page = doc.Pages[1]; // Pages is 1-based

    // Aspose.Pdf.Rectangle takes (llx, lly, urx, ury), so the region with
    // left 310, bottom 630, width 210, height 55 becomes (310, 630, 520, 685).
    var ta = new TextAbsorber();
    ta.TextSearchOptions.Rectangle = new Aspose.Pdf.Rectangle(310, 630, 520, 685);
    ta.TextSearchOptions.LimitToPageBounds = true;
    ta.TextSearchOptions.SearchForTextRelatedGraphics = false;
    page.Accept(ta);
    Console.WriteLine($"Text: {ta.Text}");

    // Same region, read fragment by fragment
    var tfa = new TextFragmentAbsorber();
    tfa.TextSearchOptions.Rectangle = new Aspose.Pdf.Rectangle(310, 630, 520, 685);
    tfa.TextSearchOptions.LimitToPageBounds = true;
    tfa.TextSearchOptions.SearchForTextRelatedGraphics = false;
    page.Accept(tfa);
    int count = 0;
    foreach (var fragment in tfa.TextFragments)
    {
        count++;
        Console.WriteLine($"Frag {count}: {fragment.Text}");
    }

    // Page-sized graph, shifted by the margins so it starts at the page corner
    var pageInfo = page.PageInfo;
    var marginInfo = page.PageInfo.Margin;
    var graph = new Graph((float)pageInfo.Width, (float)pageInfo.Height);
    graph.Left = marginInfo.Left * -1;
    graph.Top = marginInfo.Top * -1;
    page.Paragraphs.Add(graph);

    // Aspose.Pdf.Drawing.Rectangle takes (left, bottom, width, height)
    var rectangle = new Aspose.Pdf.Drawing.Rectangle(310, 630, 210, 55);
    rectangle.GraphInfo.FillColor = Aspose.Pdf.Color.Red;
    rectangle.GraphInfo.Color = Aspose.Pdf.Color.Black;
    graph.Shapes.Add(rectangle);

    // Save output PDF document
    doc.Save($"{PartialPath}_output.pdf");
}
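When comparing the two failing files, it may help to distinguish output that is truly empty from output that consists only of \0 characters, as described in the first post. The following is a small sketch of such a check. The class ExtractionCheck and the method Classify are hypothetical names, not part of Aspose.PDF, and the input is simply whatever string the absorber returned.

```csharp
using System;
using System.Linq;

public static class ExtractionCheck
{
    // Hypothetical helper: labels an extracted string so that "no text",
    // "only whitespace" and "only NUL characters" can be told apart.
    public static string Classify(string extracted)
    {
        if (string.IsNullOrEmpty(extracted)) return "empty";

        // Ignore whitespace and line breaks; '\0' is not whitespace in .NET.
        var visible = extracted.Where(c => !char.IsWhiteSpace(c)).ToArray();
        if (visible.Length == 0) return "whitespace-only";
        if (visible.All(c => c == '\0')) return "nul-only";
        return "text";
    }
}
```

For example, Classify(ta.Text) could be logged next to the raw text. A "nul-only" result would suggest that characters were found but could not be mapped to readable text, though this thread does not confirm the cause.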
@StefanoR
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): PDFNET-53952
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.