Problem in PDF text extraction

Hi,
I am extracting text from a PDF using TextDevice, and the extraction itself works.
My problem is that the text is extracted line by line.
For example, if the PDF contains a table, the text of a complete row is extracted as one line.
How can I get the text cell by cell? Similarly, how can I extract the text of a complete paragraph instead of line by line?

ruchi.eck:
I am extracting text from PDF using TextDevice. I am able to extract text also.
But my problem is that extracted text is line by line.
For example if I have a table in PDF, it extracts text of complete row. How can I get text cell by cell?
Hi Ruchi,

Thanks for contacting support.

I am afraid the current release of Aspose.Pdf for .NET does not support manipulating tables in existing PDF files. We have logged this requirement as PDFNEWNET-36802. As a workaround, you may consider converting the PDF file containing the table to MS Excel workbook format and then using Aspose.Cells to work with the text in individual table cells. For further details, please visit

My fellow workers from Aspose.Cells will share the required details on accessing data from individual cell inside Excel worksheet.
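
For reference, the conversion step of the workaround above could look roughly like the sketch below. It assumes Aspose.Pdf for .NET is referenced in the project; the file names input.pdf and output.xls are placeholders, and the exact spreadsheet format written by the default options may differ between releases, so please verify the output against the version you are using.

```csharp
using Aspose.Pdf;

class PdfToExcelSketch
{
    static void Main()
    {
        // Load the source PDF that contains the table (placeholder path).
        Document pdfDocument = new Document("input.pdf");

        // Use the default Excel save options; the output format
        // produced by the defaults may vary by release.
        ExcelSaveOptions saveOptions = new ExcelSaveOptions();

        // Save the PDF as an Excel workbook (placeholder path).
        pdfDocument.Save("output.xls", saveOptions);
    }
}
```

The resulting workbook can then be opened with Aspose.Cells to read the table data cell by cell.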
ruchi.eck:
And similarly how to extract complete paragraph text instead of line by line.
Do you mean extracting text on a per-paragraph basis, or extracting the complete text of the PDF file? Please share the details so that we can reply accordingly.

Hi,


I would like to help you get values cell by cell from an Excel spreadsheet via the Aspose.Cells APIs. The sample code below demonstrates how to read the values cell by cell in a given worksheet:

using Aspose.Cells;

// Load the workbook.
Workbook workbook = new Workbook("Book1.xlsx");
// Get the first worksheet.
Worksheet worksheet = workbook.Worksheets[0];
// Get the first worksheet's cells (only instantiated cells are enumerated).
Cells cells = worksheet.Cells;

// Retrieve each cell's name and value.
foreach (Cell cell in cells)
{
    string cellName = cell.Name;
    string cellVal = cell.StringValue;
}


Also, please see the documents in the section for your further reference:
Data Handling Features

Thank you.

So for this to work, do I need to have Microsoft Word on my machine?

Hi Ruchi,


Thanks for your inquiry. When using Aspose APIs in your application, you do not need to install native file format applications (e.g. Microsoft Office or Adobe) on your machine to create or manipulate documents.

Best Regards,

The issues you have found earlier (filed as PDFNEWNET-36802) have been fixed in Aspose.Pdf for .NET 10.6.0.

