Creating XML out of PDF page

Hi Team,


I am using the Aspose.PDF for Java tool to extract data from PDF files. The tool produces good XML output, but it is not able to extract vertically aligned text. Is there any way to do that? Also, what is the approach for extracting data from OCRed PDFs (i.e. image PDFs)?


Thanks
Himansu




Hi There,


Thanks for contacting support.


"but it is not able to extract vertically aligned text. Is there any way to do that?"

I would appreciate it if you could share your sample code along with the input PDF file. It will help us understand your requirement exactly and address it accordingly.


"what is the approach for extracting data from OCRed PDFs (i.e. image PDFs)?"


You can use our Aspose.OCR for Java API to deal with OCR.


We are sorry for the inconvenience.


Best Regards,


Hi Fahad,

Thanks for your reply. I have attached some files to illustrate the requirement, i.e. converting vertically aligned text into XML format.
In this particular instance I am looking for the text "CAPITOL VIEW" in the converted test.xml file, but I am not able to find it.
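
One way to confirm that the phrase is really absent from the converted XML, rather than just differently cased or split across lines, is a small standalone check. The sketch below is a minimal illustration using only the JDK's XML parser; the class name `XmlTextSearch`, its helper methods, and the sample element names are hypothetical and do not reflect the actual structure of the Aspose output.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlTextSearch {

    // Collapse whitespace runs and lower-case the input so that
    // "CAPITOL  VIEW" and "Capitol\nView" compare as equal.
    public static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim().toLowerCase(Locale.ROOT);
    }

    // Returns true if the phrase occurs anywhere in the concatenated
    // text content of the XML document (case- and whitespace-insensitive).
    public static boolean containsText(String xml, String phrase) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        String allText = doc.getDocumentElement().getTextContent();
        return normalize(allText).contains(normalize(phrase));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the converted file; element names are made up.
        String sample = "<Page><Text>Some heading</Text><Text>Capitol\n View</Text></Page>";
        System.out.println(containsText(sample, "CAPITOL VIEW"));
    }
}
```

To run it against the real file, the contents of test.xml would be read into a string and passed to `containsText` in place of the sample. Because the text of neighbouring elements is concatenated without separators, a match can in principle span two elements.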

Can you suggest any solution for it?

Thanks
Himansu

Hi Himansu,


Thanks for sharing further details.


I have tested the scenario and managed to reproduce the problem: the vertically aligned "Capitol View" text does not appear in the XML. To have it corrected, I have logged it as PDFJAVA-36546 in our issue tracking system. We will look further into the details of this problem and keep you posted on the status of the correction. Please be patient and spare us a little time.


We are sorry for this inconvenience.


Best Regards,

Hi Team


Can you please provide an update on the status of issue PDFJAVA-36546, regarding reading vertically aligned text?


Thanks
Himansu

Hi Himansu,



Thanks for your inquiry. I am afraid the investigation of the issue is not completed yet. We will update you as soon as our team completes the investigation and shares some updates.


Thanks for your patience and cooperation.


Best Regards,