We use Aspose.PDF for .NET with TextAbsorber to extract text from specific regions of a PDF page. We have problems with one customer who produces PDFs from an AS/400 using a program (Laser 400) based on Amyuni PDF Converter.

The problematic files fall into two categories. In the first (Gel_Bolla_Ven.pdf), TextAbsorber seems to find no text at all; at some points while debugging we see that it extracts only a series of `\0` characters. In the second (Gel_Ord_For.pdf), which are invoices generated from a template whose fields are filled with specific values, we get the fixed template text but not the field values.
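To tell an empty result apart from the NUL-only output seen while debugging, a check like the following could be applied to the absorber's output (for example `IsNulOnly(absorber.Text)`). This is a minimal sketch using C# top-level statements; `IsNulOnly` and `ShowNuls` are hypothetical helpers, not Aspose.PDF APIs.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not an Aspose.PDF API): true when the text contains at least one
// NUL ('\0') and nothing else except whitespace, i.e. the "series of \0" case.
static bool IsNulOnly(string text) =>
    text != null
    && text.IndexOf('\0') >= 0
    && text.All(c => c == '\0' || char.IsWhiteSpace(c));

// Hypothetical helper: makes NUL characters visible when logging extracted text.
static string ShowNuls(string text) => text.Replace("\0", "\\0");

Console.WriteLine(IsNulOnly("\0\0 \0"));     // True
Console.WriteLine(IsNulOnly("INVOICE 123")); // False
Console.WriteLine(ShowNuls("A\0B"));         // A\0B
```

An empty string or a page of plain spaces is deliberately not reported as NUL-only, so a region that genuinely contains no text stays distinguishable from one whose glyphs came back as `\0`.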
We use the absorber with a search rectangle; we also tried a rectangle that covers the whole page:
```csharp
var absorber = new TextAbsorber
{
    TextSearchOptions =
    {
        LimitToPageBounds = true,
        Rectangle = new Aspose.Pdf.Rectangle(left, pdf.PageInfo.Height - top, right, pdf.PageInfo.Height - bottom)
    }
};
// Accept the absorber for the page (Pages is 1-based, nPage is 0-based)
pdf.Pages[nPage + 1].Accept(absorber);
```
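`Aspose.Pdf.Rectangle` takes its corners as (llx, lly, urx, ury) in PDF user space, whose origin is normally the bottom-left corner of the page with y growing upward. A minimal sketch of the conversion, assuming `left`, `top`, `right` and `bottom` are measured in points from the top-left corner of the page (so `top` < `bottom`); `TopLeftToPdf` is a hypothetical helper, not an Aspose.PDF API:

```csharp
using System;

// Hypothetical helper (not an Aspose.PDF API). Assumes the region is given in points
// from the page's top-left corner (y grows downward), while PDF user space has its
// origin at the bottom-left (y grows upward).
// Returns the corners in (llx, lly, urx, ury) order.
static (double Llx, double Lly, double Urx, double Ury) TopLeftToPdf(
    double left, double top, double right, double bottom, double pageHeight)
{
    if (right < left || bottom < top)
        throw new ArgumentException("Expected left <= right and top <= bottom.");
    // The region's bottom edge becomes the lower y, its top edge the upper y.
    return (left, pageHeight - bottom, right, pageHeight - top);
}

var r = TopLeftToPdf(left: 50, top: 100, right: 300, bottom: 160, pageHeight: 842);
Console.WriteLine($"{r.Llx} {r.Lly} {r.Urx} {r.Ury}"); // 50 682 300 742
```

With these example values the region spans y = 682 to 742 in PDF coordinates. The page height is passed in explicitly here; the sketch assumes it matches the height of the page actually being processed.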
We also tried setting the `SearchForTextRelatedGraphics` and `UseFontEngineEncoding` options, but with no result.
Gel_Bolla_Ven.pdf (544.4 KB)
Gel_Ord_For.pdf (517.6 KB)