Get paragraphs in pdf file

karine_87 · March 27, 2014, 11:31am

Hello,

I am using Aspose.Pdf 4.5.1 to get all the paragraphs of a PDF file.
Here is the code:

com.aspose.pdf.Document pdfDoc = new com.aspose.pdf.Document(new FileInputStream(file));
PageCollection pdfPages = pdfDoc.getPages();

for (int i = 1; i <= pdfPages.size(); i++) {
    try {
        Page pdfPage = pdfPages.get_Item(i);
        com.aspose.pdf.Paragraphs parags = pdfPage.getParagraphs();
        int len = parags.getCount();
        for (int j = 0; j < len; j++) {
            // do something on paragraph
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

The line pdfPage.getParagraphs(); throws an error, and my function exits immediately without going through the catch block.

Is there something missing in my code?

Thank you

tilal.ahmad · March 28, 2014, 12:36am

Hi Karine,

Thanks for your inquiry. I’m afraid the Aspose.Pdf Document class does not currently support accessing a particular paragraph. The count returns zero, so your code never enters the inner for loop and exits. We have logged a ticket, PDFNEWJAVA-34114, in our issue tracking system for accessing paragraphs in a PDF document. You will be notified via this forum thread as soon as it is resolved.

We are sorry for the inconvenience caused.

Best Regards,

andrew.hoffman-1 · April 1, 2014, 9:41am

Hi,

How do you get to the Ticket to see the status? Is this close to being resolved or is there a planned date? This is really kind of a deal breaker if it doesn’t work.

Thanks,

Andy

tilal.ahmad · April 1, 2014, 10:21am

Hi Andy,

Thanks for your inquiry. I’m afraid you can’t access or log in to the Aspose JIRA, as it is our internal issue tracking system. You can only ask us for a status update. However, we will keep you updated on the issue’s progress via this forum thread, and you will also be notified via this thread and the associated email address as soon as the issue is resolved.

Thanks for your patience and cooperation.

Best Regards,

andrew.hoffman-1 · April 1, 2014, 1:53pm

Ok. Great. Thanks. Can you let me know if the issue is scheduled for development and what the estimated timeline is, or is it still in the backlog?

Thanks,

Andy

codewarior · April 1, 2014, 11:19pm

Hi Andy,

We have only recently become aware of this issue, so the development team needs a little time to investigate and determine the cause of the problem. As requested earlier, please be patient and allow us some time.

Furthermore, please note that you have reported the issue under the normal/free support forum. As a general rule, issues there are resolved on a first come, first served basis, whereas problems reported under the Enterprise or Priority support model have higher precedence for resolution than issues under the normal/free support model.

andrew.hoffman-1 · April 2, 2014, 10:18am

Thanks I appreciate your quick responses. I realize this stuff takes time and just wanted to know where this falls in the triage process.

You are correct re: free v. enterprise - I have a client that is evaluating Aspose.PDF and may purchase a site license. However, if a core piece of the product isn’t functional, it could impact their decision, which is why I am in a hurry to understand the status and your timeline.

Thanks,

Andy

tilal.ahmad · April 3, 2014, 5:26am

Hi Andy,

Thanks for your feedback. We have recorded your concern and requested our development team to complete the issue investigation and share an ETA at their earliest. We will update you as soon as we make some progress towards issue resolution.

Thanks for your patience and cooperation.

Best Regards,

tilal.ahmad · May 5, 2016, 10:37pm

Hi Andy,

Thanks for your patience. Our product team has investigated the issue and found that it is not a bug. The method getParagraphs() returns only objects created for the generator. The customer’s code snippet just opens the document and does not generate anything, so there are no paragraphs to return.

For example, the following code returns one paragraph on the first page:

com.aspose.pdf.Document pdfDoc = new com.aspose.pdf.Document(new FileInputStream(myDir + "Document _with_some_pages.pdf"));
PageCollection pdfPages = pdfDoc.getPages();

// create a simple paragraph and add it to the first page
TextFragment tf = new TextFragment("hello");
pdfPages.get_Item(1).getParagraphs().add(tf);

for (int i = 1; i <= pdfPages.size(); i++) {
    try {
        Page pdfPage = pdfPages.get_Item(i);
        com.aspose.pdf.Paragraphs parags = pdfPage.getParagraphs();
        int len = parags.getCount();
        System.out.println("Page number: " + i + ", Paragraph count: " + len);
        // for (int j = 0; j < len; j++) {
        //     System.out.println("Page number: " + i + ", Paragraph number: " + j);
        //     // do something on paragraph
        // }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
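
On the originally reported symptom (the function exiting without reaching the catch block): one possible explanation, which is only an assumption and was not confirmed in this thread, is that the call failed with a java.lang.Error (for example a NoClassDefFoundError from a missing dependency) rather than an Exception. A catch (Exception e) block does not handle Errors, so they propagate straight out of the method. The sketch below is plain Java with no Aspose dependency. The class CatchDemo and the method failingCall() are hypothetical stand-ins and do not reproduce the library's real behaviour.

```java
class CatchDemo {

    // Hypothetical stand-in for a library call that fails with an Error.
    // This only simulates the failure; it is not what Aspose.Pdf does.
    static void failingCall() {
        throw new NoClassDefFoundError("simulated missing dependency");
    }

    // Mirrors the pattern from the original post: only Exception is caught,
    // so an Error thrown inside the try block escapes this method.
    static String withExceptionCatch() {
        try {
            failingCall();
            return "completed";
        } catch (Exception e) {
            return "caught by Exception handler";
        }
    }

    // Catches Throwable, the common supertype of Exception and Error,
    // so the failure is reported instead of escaping silently.
    static String withThrowableCatch() {
        try {
            failingCall();
            return "completed";
        } catch (Throwable t) {
            return "caught " + t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(withExceptionCatch());
        } catch (Error e) {
            // Reached because the Error skipped the catch (Exception e) block.
            System.out.println("Error skipped the Exception handler: " + e);
        }
        System.out.println(withThrowableCatch());
    }
}
```

If you see a similar symptom, temporarily catching Throwable and printing the stack trace is a quick way to find out what is actually being thrown. In production code it is usually better to fix the underlying cause than to keep a catch for Throwable.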

Please feel free to contact us for any further assistance.

Best Regards,