Hi,
Env: .NET 4.0; tried with Aspose.Pdf 11.5.0 and Aspose.Pdf 10.3.0.
I cannot get TextFragmentAbsorber to extract text from the attached document.
String MYTEXT = "{{t:s;r:y;o:\"Role\";}}";
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(filename);
Aspose.Pdf.Text.TextFragmentAbsorber textFragmentAbsorber = new Aspose.Pdf.Text.TextFragmentAbsorber(MYTEXT);
Aspose.Pdf.Text.TextOptions.TextSearchOptions textSearchOptions =
new Aspose.Pdf.Text.TextOptions.TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
pdfDocument.Pages.Accept(textFragmentAbsorber);
Aspose.Pdf.Text.TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
foreach (Aspose.Pdf.Text.TextFragment textFragment in textFragmentCollection)
{
    // does not enter this foreach loop
}
Is this a bug? When I copy the same text into a Word document and save it as PDF via Print on Mac, it works fine.
Thank you,
Sireesha
Hi Sireesha,
Thanks for your inquiry. I have tested the scenario with both Aspose.Pdf for .NET 11.5.0 and 10.3.0 and was unable to reproduce the reported issue. Please share some more details or a sample console application that replicates the issue, so we can look into it and guide you accordingly.
We are sorry for the inconvenience caused.
Best Regards,
Hi,
Attached are two documents: ‘working.pdf’ and ‘not_working.pdf’.
Here is the code I am using for extracting text:
String MYTEXT = "{{t:s;r:y;o:\"Role\";}}";
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(filename);
Aspose.Pdf.Text.TextFragmentAbsorber textFragmentAbsorber = new Aspose.Pdf.Text.TextFragmentAbsorber(MYTEXT);
Aspose.Pdf.Text.TextOptions.TextSearchOptions textSearchOptions =
new Aspose.Pdf.Text.TextOptions.TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
pdfDocument.Pages.Accept(textFragmentAbsorber);
Aspose.Pdf.Text.TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
Aspose.Pdf.PageCollection pageCollection = pdfDocument.Pages;
foreach (Aspose.Pdf.Text.TextFragment textFragment in textFragmentCollection)
{
}
My environment is .NET 4.0. I tried Aspose.Pdf 10.3.0 and Aspose.Pdf 11.5.0 and get the same result with both: the code extracts the text from ‘working.pdf’ but not from ‘not_working.pdf’.
What is different about ‘not_working.pdf’ that prevents the same code from absorbing the text fragment?
FYI, I created ‘not_working.pdf’ with the http://selectpdf.com/ website, which converts HTML to PDF.
Thank you,
Sireesha