TextFragmentCollection breaks Iterable interface

willspecht · February 2, 2017, 10:04am

I’m guessing this is a problem that comes from being a port from .NET, but the iterator returned by TextFragmentCollection does not work like it is supposed to. hasNext consumes the current item, and next will show you the same item over and over. I’m guessing these are thin wrappers around Current and MoveNext in IEnumerator, but they break the Iterable interface. Calling next multiple times will give you the same element every time. Calling hasNext multiple times will skip over items without ever showing them.

This makes it especially difficult to use this library in Clojure with Java interop, because iterator-seq calls hasNext twice before calling next in its implementation.
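
A minimal sketch of the behavior described above, written without any Aspose classes. `CursorStyleIterator` is a hypothetical stand-in that only mimics the reported symptoms (hasNext advances the cursor, next returns the current item); it is an assumption, not the library's actual implementation. `LookaheadIterator` is likewise a hypothetical wrapper that buffers one element so callers see the usual java.util.Iterator behavior, and `drainWithDoubleCheck` imitates a caller that checks hasNext twice before each next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in: hasNext() moves the cursor (like MoveNext) and
// next() returns the current item without advancing (like Current).
class CursorStyleIterator<T> implements Iterator<T> {
    private final List<T> items;
    private int index = -1;

    CursorStyleIterator(List<T> items) { this.items = items; }

    @Override public boolean hasNext() {
        index++;                          // side effect on every call
        return index < items.size();
    }

    @Override public T next() {
        if (index < 0) index = 0;         // first call lands on the first item
        if (index >= items.size()) throw new NoSuchElementException();
        return items.get(index);          // repeated calls return the same item
    }
}

// Hypothetical wrapper: pairs exactly one source.hasNext() with one
// source.next() and buffers the result, so hasNext() becomes idempotent.
class LookaheadIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private boolean buffered = false;
    private boolean exhausted = false;
    private T nextItem;

    LookaheadIterator(Iterator<T> source) { this.source = source; }

    @Override public boolean hasNext() {
        if (!buffered && !exhausted) {
            if (source.hasNext()) {
                nextItem = source.next();
                buffered = true;
            } else {
                exhausted = true;
            }
        }
        return buffered;
    }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        buffered = false;
        T item = nextItem;
        nextItem = null;
        return item;
    }
}

class IteratorContractDemo {
    // Checks hasNext() twice before each next(), as a defensive caller might.
    static <T> List<T> drainWithDoubleCheck(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext() && it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c");
        // Cursor-style iterator loses elements: prints [b]
        System.out.println(drainWithDoubleCheck(new CursorStyleIterator<>(items)));
        // Wrapped iterator yields everything: prints [a, b, c]
        System.out.println(drainWithDoubleCheck(
                new LookaheadIterator<>(new CursorStyleIterator<>(items))));
    }
}
```

The wrapper also works around a well-behaved iterator, so it could be applied unconditionally until the underlying iterator is fixed.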

fahadadeel · February 2, 2017, 11:57pm

Hello Will,

Thanks for using our APIs.

I would appreciate it if you could share more details along with sample code. It will help us understand your problem and resolve it accordingly.

We are sorry for the inconvenience.

Best Regards,

willspecht · February 3, 2017, 10:11am

public static void replaceTextOnAllPages(String src, String dest) {
    // Open document
    Document pdfDocument = new Document(src);
    // Create TextFragmentAbsorber object to find all instances of the input search phrase
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("Will");
    // Accept the absorber for all pages of the document
    pdfDocument.getPages().accept(textFragmentAbsorber);
    // Get the extracted text fragments into collection
    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();

    Iterator<TextFragment> iter = textFragmentCollection.iterator(); // assume textFragmentCollection has 2 results

    iter.next(); // gives you the first TextFragment
    iter.next(); // gives you the first TextFragment again (expected: the second)

    iter.hasNext(); // true
    iter.hasNext(); // false, although no further element was consumed in between

    // Save the updated PDF file
    pdfDocument.save(dest);
}

codewarior · February 6, 2017, 4:32pm

Hello Will,

Thanks for sharing the details.

We are further looking into this matter and will keep you updated with our findings.

fahadadeel · February 6, 2017, 11:08pm

Hello Will,

I have tested the scenario and, as far as I can tell, it is working fine. Please use the following sample code for your reference. I have also attached the input PDF document.

JAVA

String dataDir = "/Users/fahadadeel/Downloads/resources/";
Document pdfDocument = new Document(dataDir + "pdf-sample.pdf");
// Create TextFragmentAbsorber object to find all instances of the input search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("PDF");
// Accept the absorber for all pages of the document
pdfDocument.getPages().accept(textFragmentAbsorber);
// Get the extracted text fragments into collection
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
// Typed iterator so that next() returns a TextFragment without a cast
Iterator<TextFragment> iter = textFragmentCollection.iterator(); // assume textFragmentCollection has 2 results
while (iter.hasNext()) {
    System.out.println(iter.next().getText());
}
// Save the updated PDF file
pdfDocument.save("/Users/fahadadeel/Downloads/resources/outputfile.pdf");

If you still face any issue, please feel free to contact us for further assistance.

Best Regards,

willspecht · February 7, 2017, 1:25pm

I agree that that code works, but what if I want to check whether the iterator is empty before looping through it? Something like:

if (iterator.hasNext()) {
    while (iterator.hasNext()) {
        …
    }
} else {
    return null;
}

The code above would skip the first element. It’s just very confusing to implement this interface in this way. If you are going to return an iterator, you should stick to the contract of the iterator.
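
One possible way to do the emptiness check without calling hasNext twice is to drain the iterator into a list first and test the list instead. This is only a sketch: `IteratorSnapshot`, `toList`, and `joinOrNull` are hypothetical names, and it relies on the assumption, taken from the working sample earlier in this thread, that a loop with exactly one hasNext per next visits every element.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class IteratorSnapshot {
    // Copies the remaining elements, calling hasNext() once per next().
    static <T> List<T> toList(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // Hypothetical caller: returns null for an empty iterator, otherwise
    // processes every element (here, by joining them with commas).
    static String joinOrNull(Iterator<String> it) {
        List<String> items = toList(it);
        if (items.isEmpty()) {
            return null;              // emptiness check happens on the list
        }
        return String.join(",", items);
    }
}
```

The list can then be checked and iterated as often as needed, at the cost of holding all elements in memory at once.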

fahadadeel · February 8, 2017, 12:56am

Hello Will,

Thanks for sharing more details.

I have tested the scenario and have managed to reproduce the problem that TextFragmentCollection breaks the Iterable interface. So that it can be corrected, I have logged it as PDFJAVA-36468 in our issue tracking system. We will look further into the details of this problem and will keep you posted on the status of the fix.

Please be patient and allow us a little time. We are sorry for the inconvenience.

Best Regards,

willspecht · February 8, 2017, 10:56am

Great, am I able to track that ticket somewhere?

fahadadeel · February 8, 2017, 11:25pm

Hello Will,

I am afraid you will not have access to track the ticket. Please note that issues are resolved on a first come, first served basis, and there are other reported issues in the queue as well. We will update you on the status through this thread once your issue is resolved.

We are sorry for the inconvenience.

Best Regards,

aspose.notifier · April 7, 2017, 1:35am

The issues you have found earlier (filed as PDFJAVA-36468) have been fixed in Aspose.Pdf for Java 17.3.0; see the Release Notes for that version.

This message was posted using Notification2Forum from Downloads module by Aspose Notifier.