Aspose.Pdf .NET - Error message

mustafa-1 · May 21, 2014, 4:17am

Hello,

I am getting this error

(Invalid index: index should be in the range [1…n] where n equals to the text fragments count.)

when I try to search for text that includes spaces, e.g. “NO 997559762 MVA”.
I am using Aspose.PDF for .NET to search for text in a PDF and get the correct coordinates for each match.

I need help as soon as possible, please.

codewarior · May 21, 2014, 7:16am

Hi Mustafa,

Thanks for contacting support.

Can you please share the source PDF and the code snippet you are using, so that we can test the scenario at our end? We are sorry for the inconvenience.

mustafa-1 · May 22, 2014, 2:07am

Hello,

First, I would like to thank the Aspose team for the very quick reply to my message. It is really appreciated.

The PDF is attached to this message, and the highlighted text is the text I have a problem with.

Here is my code:

var pdfDocument = new Document("D:/23.pdf");
var textFragmentAbsorber = new TextFragmentAbsorber("NO 997 559 762 MVA");
pdfDocument.Pages.Accept(textFragmentAbsorber);
Rectangle textRec = textFragmentAbsorber.TextFragments[1].Rectangle;
textFragmentAbsorber.TextFragments[1].Text = string.Empty;
pdfDocument.Save("D:/23-1.pdf");

I have another question, if possible.

As you can see in the attached document, there is a label Fakturanr. followed by 40015669, and a label Kundenr. followed by 8042627.

I want to search for Fakturanr. and Kundenr. and get the numbers written beside these labels. In other words, I need to find a piece of text and get its value. Could you please help with this?

tilal.ahmad · May 22, 2014, 11:41pm

mustafa-1:

The PDF is attached to this message, and the highlighted text is the text I have a problem with.

Here is my code:

var pdfDocument = new Document("D:/23.pdf");
var textFragmentAbsorber = new TextFragmentAbsorber("NO 997 559 762 MVA");
pdfDocument.Pages.Accept(textFragmentAbsorber);
Rectangle textRec = textFragmentAbsorber.TextFragments[1].Rectangle;
textFragmentAbsorber.TextFragments[1].Text = string.Empty;
pdfDocument.Save("D:/23-1.pdf");

Hi Mustafa,

We are sorry for the inconvenience caused. We have managed to reproduce the reported issue and logged it in our bug tracking system as PDFNEWNET-36961 for further investigation and resolution. We will notify you via this thread as soon as it is resolved.

mustafa-1:

I have another question, if possible.
As you can see in the attached document, there is a label Fakturanr. followed by 40015669, and a label Kundenr. followed by 8042627.
I want to search for Fakturanr. and Kundenr. and get the numbers written beside these labels. In other words, I need to find a piece of text and get its value. Could you please help with this?

For this requirement you need to use a regular expression. Please check the documentation for details and a code snippet showing how to search text with a regular expression. You can write a custom regular expression to suit your requirements.
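As a rough illustration of the kind of expression meant here, the sketch below matches a label followed by a run of digits and captures the digits. It uses only the standard .NET Regex class on a plain string, so it runs without the PDF library. The class name LabelValueExtractor, the method ExtractNumber and the sample line are hypothetical and only serve the example; how the pattern is passed to the PDF search is described in the documentation mentioned above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any Aspose API.
public static class LabelValueExtractor
{
    // Returns the digits that follow the given label (e.g. "Fakturanr."),
    // or null when the label is not followed by a number.
    public static string ExtractNumber(string text, string label)
    {
        // Regex.Escape keeps the trailing dot in "Fakturanr." literal;
        // \s* allows optional whitespace, (\d+) captures the number.
        string pattern = Regex.Escape(label) + @"\s*(\d+)";
        Match match = Regex.Match(text, pattern);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Sample line assembled from the values quoted in this thread.
        string line = "Fakturanr. 40015669 Kundenr. 8042627";
        Console.WriteLine(ExtractNumber(line, "Fakturanr."));
        Console.WriteLine(ExtractNumber(line, "Kundenr."));
    }
}
```

Capturing the number in its own group means the value can be read directly, without trimming the label off the matched text afterwards.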

Best Regards,

mustafa-1 · May 23, 2014, 3:12am

Hello Aspose team,

Thank you for the help. My problem is solved now. :)

I have another question. I have scanned documents stored as images and would like to extract some text from them.

I have already looked at the examples Aspose provides, but I did not understand what this line means: const string resourceFileName = @"2011.07.02 v1.0 Aspose.OCR.Resources.zip";

Is this a zip file that I need to download from somewhere? Below is the example provided by Aspose:

const string resourceFileName = @"2011.07.02 v1.0 Aspose.OCR.Resources.zip";
try
{
    // Create an instance of OcrEngine and assign
    // image, language and image settings
    OcrEngine ocrEngine = new OcrEngine();
    ocrEngine.Image = ImageStream.FromFile("Sample.bmp");
    ocrEngine.Languages.AddLanguage(Language.Load("english"));
    ocrEngine.Config.NeedRotationCorrection = false;
    ocrEngine.Config.UseDefaultDictionaries = true;

    // Define the block in which to recognize text
    int startX = 0, startY = 0, width = 120, height = 100;
    IRecognitionBlock rectangleBlock = Aspose.OCR.RecognitionBlock.FromRectangle(startX, startY, width, height);
    ocrEngine.AddRecognitionBlock(rectangleBlock);

    // Set the resource file name and extract OCR text
    using (ocrEngine.Resource = new FileStream(resourceFileName, FileMode.Open))
    {
        try
        {
            if (ocrEngine.Process())
            {
                Console.WriteLine(rectangleBlock.Text.ToString());
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine("Exception: " + ex.Message);
        }
    }
    ocrEngine = null;
}
catch (Exception ex)
{
    Console.WriteLine("Exception: " + ex.Message);
}

babar.raza · May 23, 2014, 8:55am

Hi Mustafa,

Thank you for your interest in Aspose products.

The string “2011.07.02 v1.0 Aspose.OCR.Resources.zip” is used for illustration purposes in the documentation and refers to a zip archive containing the additional resources required to perform an OCR operation. You can download the resource archive(s) from here. Please note that the archive will not have the same file name as the one referenced in the documentation. Once you have downloaded the archive, you have to change the source code to reflect the downloaded file name and its location on your machine. For instance, the latest resources available for download are packaged in an archive named “Aspose.OCR.1.9.0.Resources.zip”, so you would reference it in your code as follows:

C#

string resourceFileName = "Aspose.OCR.1.9.0.Resources.zip";
// myFolder is the folder where you saved the archive (it must end with a path separator)
string resourceFilePath = myFolder + resourceFileName;
using (ocrEngine.Resource = new FileStream(resourceFilePath, FileMode.Open))
{
    // Perform OCR
}

Another key point to remember is that each release of the Aspose.OCR API uses a specific resource file, so when you download the resource archive, please choose the one that matches your Aspose.OCR for .NET assembly version. Moreover, we suggest using the latest build, Aspose.OCR for .NET 1.9.0, for your evaluation.

Hope this helps. Please feel free to write back if you run into any difficulty or if anything is unclear.