Hello Team,
Hi Suprith,
Hi Team,
Note: We will know the field names ahead of time, since the templates are specified by us. So the challenge is how to extract the right content entered into each field.
Hi Suprith,
Thanks, waiting for your reply.
Hi Suprith,
Thanks for your patience. Aspose.Pdf offers various features for extracting text from a PDF file; see the “Extract Text from PDF” section in the documentation. For your requirement, since your PDF file contains only plain text, a workaround is to extract the text that follows a specific string.
Please check the following code snippet, which I used to extract the values of some fields, i.e. Name, Number and Value (USD). It extracts all text from the PDF using TextAbsorber and then performs some string operations to pull out the required values.
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// dataDir is the folder that contains the sample PDF.
Document pdfDocument = new Document(dataDir + "SamplePDF.pdf");
// Extract the text of all pages.
TextAbsorber ta = new TextAbsorber();
pdfDocument.Pages.Accept(ta);
string extractedtext = ta.Text;
// Scan line by line; for each line that holds a known label, print the rest of the line.
string[] contents = extractedtext.Split('\n');
foreach (string s in contents)
{
    if (s.Contains("Name"))
        Console.WriteLine(s.Replace("Name*", "").Trim());
    if (s.Contains("number"))
        Console.WriteLine(s.Replace("number", "").Trim());
    if (s.Contains("Value"))
        Console.WriteLine(s.Replace("Value (USD)*", "").Trim());
}
You can use the above code snippet to extract the value of a particular field. If you need any further assistance, please feel free to let us know.
Best Regards,
Thanks for the reply. This works for single-line text, but what about the field
"Additional details*", which contains multiple lines of text? Thanks in advance.
Hi Suprith,
Thanks for your inquiry. Please check the following code snippet which I have used to extract the value of Additional Details from sample PDF.
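Document pdfDocument = new Document(dataDir + "SamplePDF.pdf");
TextAbsorber ta = new TextAbsorber();
// Extract text from the first page only (page numbers start at 1).
pdfDocument.Pages[1].Accept(ta);
string extractedtext = ta.Text;
const string label = "Additional Details*";
int start = extractedtext.IndexOf(label);
// Take everything after the label. Note that Replace("1", "") removes every "1" in the text, not only a trailing one.
string additionaldetails = start < 0 ? "" : extractedtext.Substring(start + label.Length).Replace("1", "").Trim();
Console.WriteLine("Additional Details: " + additionaldetails);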
You may use the above code to extract the multi-line text, or modify it as per your requirement. If you need any further assistance, please feel free to contact us.
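Since the field names are known ahead of time, one possible refinement is to stop the multi-line value at the next known label instead of taking everything up to the end of the page. The following is only a minimal sketch of that idea in plain C# string handling; the class name FieldExtractor and the method TryExtractField are hypothetical helpers, not part of Aspose.Pdf, and the sketch assumes that label strings do not occur inside the field values themselves.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an Aspose.Pdf API): works on text already extracted with TextAbsorber.
public static class FieldExtractor
{
    // Finds `label` in `text` and returns the text between it and the nearest
    // following label from `allLabels` (or the end of the text if none follows).
    // Returns false and an empty value when `label` is not present.
    public static bool TryExtractField(string text, string label,
                                       IEnumerable<string> allLabels, out string value)
    {
        value = string.Empty;
        int labelPos = text.IndexOf(label, StringComparison.Ordinal);
        if (labelPos < 0)
            return false;

        int valueStart = labelPos + label.Length;
        int valueEnd = text.Length;

        // The value ends at the closest other label that appears after this one.
        foreach (string other in allLabels)
        {
            if (other == label)
                continue;
            int pos = text.IndexOf(other, valueStart, StringComparison.Ordinal);
            if (pos >= 0 && pos < valueEnd)
                valueEnd = pos;
        }

        value = text.Substring(valueStart, valueEnd - valueStart).Trim();
        return true;
    }
}
```

You could pass ta.Text as the text argument together with the full list of template labels, for example "Name*", "Value (USD)*" and "Additional Details*". If one label is a prefix of another, or a label also appears inside a value, the result would be cut short, so the label list may need adjusting for your templates.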
Best Regards,
Hi Team,
2. We have a PDF document of 600 pages. Does the product have any limitation on the number of pages?
Hi Suprith,
Thanks for your inquiry.
supam:
When we extract text using the TextAbsorber, will it retain the formatting of my text?
In order to preserve the formatting of the text, you need to use TextExtractionOptions and pass it to the constructor of the TextAbsorber class. Please check the following code snippet, which uses TextExtractionOptions while extracting text from a PDF.
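Document pdfDocument = new Document(dataDir + "SamplePDF.pdf");
// Pure formatting mode keeps the layout of the text during extraction.
TextExtractionOptions textExtOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
TextAbsorber ta = new TextAbsorber(textExtOptions);
pdfDocument.Pages[1].Accept(ta);
// The extracted text is available through the Text property.
string extractedtext = ta.Text;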
supam:
We have a PDF document of 600 pages. Does the product have any limitation on the number of pages?
The Aspose.Pdf for .NET API has no such limitation on document size or on the number of pages a document contains. You may extract text from a single page of the document as well as from the entire document. For more information, please visit the “Working With Text” section in our API documentation. If you need further assistance, please feel free to let us know.
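For a large document, one option is to process it page by page with a fresh TextAbsorber per page, so that each page's text can be handled separately. This is only a sketch under that assumption; the class name PageByPageExtraction and the pdfPath parameter are placeholders for illustration.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Illustrative wrapper; only Document, Page and TextAbsorber come from Aspose.Pdf.
public static class PageByPageExtraction
{
    public static void Run(string pdfPath)
    {
        using (Document pdfDocument = new Document(pdfPath))
        {
            foreach (Page page in pdfDocument.Pages)
            {
                // A new absorber per page keeps each page's text separate.
                TextAbsorber ta = new TextAbsorber();
                page.Accept(ta);
                Console.WriteLine("Page " + page.Number + ": " + ta.Text.Length + " characters");
            }
        }
    }
}
```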
Best Regards,
Brilliant, thanks for the response. We will get back to you if we face any issues.
Hi Suprith,
Hi Asad,
Hi Suprith,
We apologize for the inconvenience.
Best Regards,
Hi Team,
Could you please let us know a way to lock the PDF against manipulation by an external process?
We are looking for a way in Aspose to mark the PDF as non-editable, even from Aspose itself for that matter.
Regards,
Suprith
Hi Amsuprith,
Thanks for contacting support.
In order to accomplish your requirements, please try following the instructions in the Set Privileges, Encrypt and Decrypt PDF File article.
Besides this, you may consider using the Viewer API of our sister company, GroupDocs, which provides the feature to display PDF and other documents so that users cannot make any modifications to the file being displayed (they cannot even copy the contents of the file being displayed).
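As a rough illustration of the privilege-based approach, the sketch below encrypts a document with all privileges forbidden except screen-reader access. The class name LockPdf, the method parameters and the choice of AESx128 are assumptions for illustration only. Please also note that PDF permissions restrict what compliant applications allow, and anyone who has the owner password can change them again.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Facades;

// Illustrative wrapper; paths and passwords are supplied by the caller.
public static class LockPdf
{
    public static void Run(string inputPath, string outputPath,
                           string userPassword, string ownerPassword)
    {
        using (Document document = new Document(inputPath))
        {
            // Start from "forbid everything", then allow only screen readers.
            DocumentPrivilege privilege = DocumentPrivilege.ForbidAll;
            privilege.AllowScreenReaders = true;

            // The last argument selects PDF 2.0 encryption; false keeps the older scheme.
            document.Encrypt(userPassword, ownerPassword, privilege, CryptoAlgorithm.AESx128, false);
            document.Save(outputPath);
        }
    }
}
```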