Selecting and extracting text

Hello,

I have a requirement from a customer to display an "image" in a window to a client in a hosted ASP.NET environment and give the client the ability to highlight text that he/she likes or dislikes. When the client hits submit, the highlighted text needs to be extracted and sent back to the server. The "image" can be in any print format, including PDF, and the format will be determined by my client.

Is this something that seems doable with Aspose.Pdf.Kit? It seems to me it would be. Here are the prerequisites for my choosing Aspose for this solution:

  • Ability to display a PDF in a web-based viewer (preferably without a toolbar).
  • Ability to select text, including multi-line blocks.
  • Ability to select images within the document separately from text.
  • Ability to extract only the selected items for storage in a database.
  • Can be used in a hosted (SaaS) environment.

Let me know if you need any further clarification.

Best regards.

-rb

Hi,

Thanks for considering Aspose.

Currently, Aspose.Pdf.Kit can extract the complete text from a PDF file. Your requirement is to extract a specific portion of the text programmatically, and I am afraid that is not supported at the moment. Regarding the rest of your queries:

1) To display the PDF file, you can use the PdfViewer offered by Aspose.Pdf.Kit. For more information, please visit http://www.aspose.com/documentation/file-format-components/aspose.pdf.kit-for-.net-and-java/pdfviewer.html

2) To highlight text, you can use the CreateMarkup feature of the PdfContentEditor class, which draws markup in the PDF document, such as Highlight, Underline, Squiggly and StrikeOut. You can use the mouse coordinates to specify the region where the markup should be created. To display the highlighted area, you need to reload the PDF file after the markup has been created.
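As a rough illustration of the mouse-coordinate step, here is a minimal Java sketch that turns a mouse drag in the viewer into a rectangle in PDF coordinates. It does not call any Aspose API. The names MarkupRegion, PdfRect, toPdfRect, renderDpi and pageHeightPt are made up for this example, and it assumes the page is shown unrotated, starting at the top-left corner of the viewer, at a uniform resolution. PDF pages are measured in points (1/72 inch) from the lower-left corner, while mouse positions are normally pixels from the upper-left corner, so the y axis has to be flipped.

```java
/**
 * Sketch only: converts a mouse drag rectangle (viewer pixels, origin at
 * top-left) into PDF user space (points, origin at bottom-left).
 * All names here are hypothetical and not part of Aspose.Pdf.Kit.
 */
final class MarkupRegion {
    static final double POINTS_PER_INCH = 72.0;

    /** Rectangle in PDF points; (x, y) is the lower-left corner. */
    static final class PdfRect {
        final double x, y, width, height;

        PdfRect(double x, double y, double width, double height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /**
     * (x1, y1) and (x2, y2) are the mouse-down and mouse-up positions in pixels.
     * renderDpi is the assumed resolution at which the page is displayed.
     * pageHeightPt is the page height in points (792 for US Letter).
     */
    static PdfRect toPdfRect(double x1, double y1, double x2, double y2,
                             double renderDpi, double pageHeightPt) {
        double scale = POINTS_PER_INCH / renderDpi; // pixels -> points
        double left = Math.min(x1, x2) * scale;
        double right = Math.max(x1, x2) * scale;
        double topFromTop = Math.min(y1, y2) * scale;
        double bottomFromTop = Math.max(y1, y2) * scale;
        // Flip the y axis: PDF measures upward from the bottom of the page.
        double lowerY = pageHeightPt - bottomFromTop;
        return new PdfRect(left, lowerY, right - left, bottomFromTop - topFromTop);
    }

    public static void main(String[] args) {
        // Drag from (96, 192) to (288, 216) on a Letter page shown at 96 DPI.
        PdfRect r = toPdfRect(96, 192, 288, 216, 96.0, 792.0);
        // prints: 72.0 630.0 144.0 18.0
        System.out.println(r.x + " " + r.y + " " + r.width + " " + r.height);
    }
}
```

The resulting rectangle is the kind of region you would hand to the markup call. Please check the PdfContentEditor documentation for the exact parameters it expects, and adjust the conversion if your viewer scrolls, zooms or rotates the page.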

3) PdfExtractor is a class that offers the ExtractImage and ExtractText methods, which can be used to extract images and text from a PDF file. However, I am afraid Aspose.Pdf.Kit currently cannot extract a specific portion of text from a PDF file, so for your requirement it would be difficult to detect the highlighted text and extract only that. If you need more information on how to extract text, please visit http://www.aspose.com/documentation/file-format-components/aspose.pdf.kit-for-.net-and-java/extract-text-from-pdf-document.html and for information on how to extract images, please visit http://www.aspose.com/documentation/file-format-components/aspose.pdf.kit-for-.net-and-java/extract-image-from-pdf-document.html
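To show why the missing piece is positional information, here is a purely hypothetical Java sketch of the selection step. It assumes you already have each word together with its bounding box in PDF points, obtained from some source other than Aspose.Pdf.Kit (which, as noted, does not currently provide this), and that the words arrive in reading order. SelectionFilter, Word and textInside are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only: keeps the words whose bounding boxes overlap a selection
 * rectangle. The word positions are assumed input, not an Aspose feature.
 */
final class SelectionFilter {

    /** A word and its box in PDF points; (x, y) is the lower-left corner. */
    static final class Word {
        final String text;
        final double x, y, width, height;

        Word(String text, double x, double y, double width, double height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** True when the word box and the selection rectangle overlap. */
    static boolean intersects(Word w, double rx, double ry, double rw, double rh) {
        return w.x < rx + rw && w.x + w.width > rx
            && w.y < ry + rh && w.y + w.height > ry;
    }

    /** Joins the overlapping words with single spaces, in the order given. */
    static String textInside(List<Word> words,
                             double rx, double ry, double rw, double rh) {
        StringBuilder sb = new StringBuilder();
        for (Word w : words) {
            if (intersects(w, rx, ry, rw, rh)) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(w.text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Word> words = new ArrayList<>();
        words.add(new Word("Hello", 72, 630, 30, 12));
        words.add(new Word("world", 106, 630, 30, 12));
        words.add(new Word("footer", 72, 40, 40, 12));
        // Selection rectangle: lower-left (72, 630), 144 wide, 18 high.
        // prints: Hello world
        System.out.println(textInside(words, 72, 630, 144, 18));
    }
}
```

Without per-word positions, the complete text returned by ExtractText cannot be matched back to a highlighted region, which is the limitation described above.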

4) For information on working with a database, please visit http://www.aspose.com/documentation/file-format-components/aspose.pdf.kit-for-.net-and-java/interoperate-with-database-net.html