Free Support Forum - aspose.com

ExtractText and tables

Hi!

I’m considering using Aspose to translate some of our PDFs online, but I’ve found a problem.
When I extract text from a PDF that contains a table, I get a txt file with spaces between the words, so I don’t know where each cell ends.
Is there any way to change the character that separates the cells?

The code I use is:

// Bind the source PDF, extract its text, and write the result to a .txt file.
PdfExtractor extractor = new PdfExtractor();
extractor.bindPdf(pdfPath);
extractor.extractText();
extractor.getText(pdfPath + ".txt");

Thank you in advance.
Carlos.

Hi Carlos,

Thank you very much for considering Aspose.

PdfExtractor extracts the text in raw format, so the text formatting can’t be retained in the extracted text. However, we have already logged a new feature request, PDFKITJAVA-10540, to extract text while retaining the text order and formatting to some extent. You’ll be updated via this forum thread once it is supported.

Moreover, if you think your requirement or issue is different, please share a sample PDF and elaborate on your requirement a little, so we’ll be able to help you out.

If you have any further questions, please do let us know.

We’re sorry for the inconvenience.
Regards,

Thanks Shahlzad,

I can explain a little more. Our application is a translation proxy for a big company. We’ve implemented it successfully for HTML. The next step is to also translate MS Office documents and PDFs. So I have to extract the text from a PDF, send it to my translation service, and then replace the text in the original PDF so that I get a PDF that maintains the original style.
Your product works quite well on PDFs without tables, where I replace the text line by line.
But when a PDF contains a table, like the one in the example, the replacement doesn’t work.
When I extract the text, I get a txt file with a single line per table row, and I don’t know where each cell ends, so I can’t put the translated text back.
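
To make the ambiguity concrete, here is a minimal plain-Java sketch. It uses no Aspose API at all; the class name, helper methods, cell values, and the tab delimiter are assumptions chosen only for illustration. It shows that a row flattened with single spaces cannot be split back into its cells, while the same row flattened with a distinct delimiter can.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class: illustrates the problem only, not Aspose behaviour.
class CellBoundaryDemo {

    // Flattens the cells of one table row into a single line of text.
    static String joinRow(List<String> cells, String separator) {
        return String.join(separator, cells);
    }

    // Splits a flattened row on a literal separator, keeping empty cells.
    static List<String> splitRow(String row, String separator) {
        return Arrays.asList(row.split(Pattern.quote(separator), -1));
    }

    public static void main(String[] args) {
        // Made-up cell contents; each cell holds more than one word.
        List<String> cells = Arrays.asList("Unit price", "Total amount", "Due date");

        // Space-separated row: words and cells look the same after extraction.
        String spaced = joinRow(cells, " ");
        System.out.println(splitRow(spaced, " ").size());          // prints 6, not 3

        // Tab-separated row: the cell boundaries survive the round trip.
        String tabbed = joinRow(cells, "\t");
        System.out.println(splitRow(tabbed, "\t").equals(cells));  // prints true
    }
}
```

This is why a configurable separator between cells would be enough for the replace step: each translated cell could then be matched back to its original position.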

Thank you in advance

Hi Carlos,

Can you please share a sample PDF file so that we can check your particular scenario?

We look forward to helping you out.
Regards,