Is it possible to get the annotated element (text or paragraph) with Aspose.Pdf?

Hi,

I have a PDF document with annotated text and paragraphs (note annotations created from selected text or paragraphs).
From an Annotation Java object, is it possible to retrieve the annotated element (text or paragraph) in order to change it according to the note annotation?

@herve.ngounou

Thanks for contacting support.

In order to get and modify the content of the annotations, you can retrieve them with the getAnnotations() method and set their text with the setContents("") method. Please check the following code snippet, where I have modified the text of a TextAnnotation inside a PDF document.

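import com.aspose.pdf.Document;
import com.aspose.pdf.TextAnnotation;

String dataDir = "path/to/data/"; // adjust to your working folder
Document pdfDocument = new Document(dataDir + "input.pdf");

// Get a particular annotation (here: the first annotation on page 1, which must be a TextAnnotation for the cast to succeed)
TextAnnotation textAnnotation = (TextAnnotation) pdfDocument.getPages().get_Item(1).getAnnotations().get_Item(1);
textAnnotation.setContents("Modified");

pdfDocument.save(dataDir + "output.pdf");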

If you face any issues, please share your sample PDF document so that we can test the scenario in our environment and address it accordingly.

This is not exactly what I want.

For example, my PDF document contains the following text:

This is the annotated text to update and here the remaining not annotated paragraph text.

I select the text "This is the annotated text to update" and create a HighlightAnnotation with the contents "CODE INSTRUCTION".

With your code sample, Aspose.Pdf can retrieve this annotation and its contents (CODE INSTRUCTION). In addition, I want to retrieve the document text this annotation is placed on (This is the annotated text to update) and modify that text.
Is this possible with Aspose.Pdf?

@herve.ngounou

Thanks for adding more details to the scenario.

To achieve this, you can loop through all highlight annotations, get the rectangle of each annotation, search for and extract the text inside that rectangle, and replace it with the desired value. Please check the following code snippet, where I have implemented this with a sample PDF.

import com.aspose.pdf.*;

String dataDir = "path/to/data/"; // adjust to your working folder

// 1. Create a sample PDF containing one paragraph of text
Document doc = new Document();
doc.getPages().add().getParagraphs().add(new TextFragment("This is the annotated text to update and here the remaining not annotated paragraph text."));
doc.save(dataDir + "SamplePDF_Highlight.pdf");

// 2. Highlight the first part of the paragraph
Document doc2 = new Document(dataDir + "SamplePDF_Highlight.pdf");
TextFragmentAbsorber tfa = new TextFragmentAbsorber("This is the annotated text to update ");
doc2.getPages().get_Item(1).accept(tfa);
HighlightAnnotation ha = new HighlightAnnotation(doc2.getPages().get_Item(1), tfa.getTextFragments().get_Item(1).getRectangle());
ha.setColor(Color.getYellow());
doc2.getPages().get_Item(1).getAnnotations().add(ha);
doc2.save(dataDir + "SamplePDF_Highlight.pdf");

// 3. For every highlight annotation, find the text under its rectangle and replace it
doc = new Document(dataDir + "SamplePDF_Highlight.pdf");
for (int i = 1; i <= doc.getPages().size(); i++)
{
  Page page = doc.getPages().get_Item(i);
  for (int j = 1; j <= page.getAnnotations().size(); j++)
  {
    Annotation annotation = page.getAnnotations().get_Item(j);
    if (annotation.getAnnotationType() == AnnotationType.Highlight)
    {
      // Restrict the text search to the annotation's rectangle
      Rectangle searchRectangle = annotation.getRect();
      TextFragmentAbsorber ta = new TextFragmentAbsorber();
      TextSearchOptions tso = new TextSearchOptions(searchRectangle);
      tso.setLimitToPageBounds(true);
      ta.setTextSearchOptions(tso);
      page.accept(ta);
      // Skip annotations that do not cover any text
      if (ta.getTextFragments().size() > 0)
      {
        ta.getTextFragments().get_Item(1).setText("This is the replaced annotated text, ");
      }
    }
  }
}
doc.save(dataDir + "SamplePDF_Highlight_Replaced.pdf");

For your reference, I have also attached the PDF documents generated by the above code snippet. If you need any further assistance, please feel free to let us know.

SamplePDF_Highlight_Replaced.pdf (2.6 KB)
SamplePDF_Highlight.pdf (2.6 KB)
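The core idea of the snippet above is geometric: a piece of text counts as "annotated" when its bounding box falls under the annotation's rectangle. The plain-Java sketch below illustrates only that selection step and does not call Aspose.PDF; the Box and Fragment types are hypothetical stand-ins for a rectangle and a text fragment, and the rule "the fragment box must lie entirely inside the annotation rectangle" is a simplifying assumption (the matching rule Aspose.PDF applies with TextSearchOptions may differ, e.g. for partially covered text).

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: pick the text fragments that lie under an annotation rectangle. */
public class AnnotatedTextSketch {

    /** Hypothetical axis-aligned box in PDF user space: lower-left (llx, lly), upper-right (urx, ury). */
    record Box(double llx, double lly, double urx, double ury) {
        /** True if 'other' lies entirely inside this box (edges inclusive). */
        boolean contains(Box other) {
            return other.llx() >= llx && other.lly() >= lly
                && other.urx() <= urx && other.ury() <= ury;
        }
    }

    /** Hypothetical stand-in for a text fragment and its bounding box. */
    record Fragment(String text, Box box) {}

    /** Returns the text of every fragment fully contained in the annotation rectangle, in input order. */
    static List<String> textUnder(Box annotationRect, List<Fragment> fragments) {
        List<String> result = new ArrayList<>();
        for (Fragment f : fragments) {
            if (annotationRect.contains(f.box())) {
                result.add(f.text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up coordinates for the sample paragraph, split into two fragments
        List<Fragment> page = List.of(
            new Fragment("This is the annotated text to update ", new Box(90, 700, 270, 712)),
            new Fragment("and here the remaining not annotated paragraph text.", new Box(270, 700, 520, 712)));
        Box highlight = new Box(89, 699, 271, 713);
        System.out.println(textUnder(highlight, page));
    }
}
```

Only the first fragment fits inside the highlight rectangle, so it is the only text selected for replacement.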

Thanks for this example.
I use annotations as processing instructions (the annotation content is an instruction describing how to modify the text the annotation is placed on).
After updating the text, I want to remove the annotation.
It seems that Aspose.Pdf can do what we need. One last point: does Aspose.Pdf support the PDF/X format?
If not, is it a planned feature?

@herve.ngounou

Thanks for your inquiry.

You can remove all annotations or a particular annotation from a PDF document. Please add the following lines of code inside the if statement of the code snippet shared above, and the highlight annotation will be deleted from the resulting PDF file.

// Delete all annotations from the current page
//doc.getPages().get_Item(i).getAnnotations().delete();
// To delete a particular annotation
doc.getPages().get_Item(i).getAnnotations().delete(doc.getPages().get_Item(i).getAnnotations().get_Item(j));
// The following annotations move down one index, so check position j again
j--;
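Deleting annotations inside a forward loop over 1-based indexes can skip the annotation that slides into the deleted slot. Two common remedies are decrementing the index after a deletion or iterating from the last index down. The plain-Java sketch below demonstrates the effect and the reverse-loop remedy on an ArrayList of hypothetical annotation-type strings, used as a stand-in for a page's annotation collection; it does not call the Aspose.PDF API.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: why deleting while looping forward skips items, and the reverse-loop fix. */
public class DeleteWhileIteratingSketch {

    /** Buggy: forward 1-based loop; after a delete, the next item moves into slot j and is skipped. */
    static List<String> deleteForward(List<String> annotations, String type) {
        List<String> list = new ArrayList<>(annotations);
        for (int j = 1; j <= list.size(); j++) {
            if (list.get(j - 1).equals(type)) {
                list.remove(j - 1); // remove by index
            }
        }
        return list;
    }

    /** Fixed: loop from the last index down, so a deletion never shifts items not yet visited. */
    static List<String> deleteReverse(List<String> annotations, String type) {
        List<String> list = new ArrayList<>(annotations);
        for (int j = list.size(); j >= 1; j--) {
            if (list.get(j - 1).equals(type)) {
                list.remove(j - 1); // remove by index
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> page = List.of("Highlight", "Highlight", "Text");
        System.out.println(deleteForward(page, "Highlight"));
        System.out.println(deleteReverse(page, "Highlight"));
    }
}
```

With two adjacent highlights, the forward loop leaves one of them behind, while the reverse loop removes both.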

Aspose.Pdf supports the PDF/X-3 and PDF/X-1a formats (PdfFormat.PDF_X_3 and PdfFormat.PDF_X_1A). To convert a PDF document to one of these formats, please use the following code snippet.

Document document = new Document(dataDir + "input.pdf");
document.convert(dataDir + "Log_Conv.log", PdfFormat.PDF_X_1A, ConvertErrorAction.Delete);
document.save(dataDir + "output.pdf");

If you need any further assistance, please feel free to let us know.

I tried to convert a PDF/A file to a PDF/X file.
original file:
original pdf/a
I got the following PDF/X file:
pdf/x_1A file
with the following errors:
err file

image.png (10.8 KB)

Validation of the generated PDF/X file with Adobe tools shows the following errors:
image.png (7.8 KB)

We absolutely need a tool to convert PDF/A to PDF/X. Will these errors be corrected?

@herve.ngounou

Thanks for writing back.

Could you please upload the relevant PDF file(s) again? It seems they were not uploaded correctly, although the image files are accessible and I was able to download them. Once the PDF documents are available, we will test the scenario in our environment and share our feedback.

testpdfX.zip (1.2 MB)
Here is my test.
test.pdf is a PDF file generated with Aspose.Words.
I have attached it in the zip file, along with the result of the conversion with Aspose.Pdf (and the log file).
The check with an Adobe tool (Adobe Acrobat DC) indicates that the converted file is not fully PDF/X compliant, because colors remain encoded in RGB (intended for screen) and are not converted to CMYK (the color model used for printing).
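To illustrate what "converting RGB to CMYK" means at the level of individual color values, the sketch below uses the simple textbook device formula (black K = 1 - max(R, G, B), then C, M, Y scaled by 1 - K). This is an assumption-labeled illustration only: it applies no ICC color management, whereas real prepress conversion (and whatever Aspose.Pdf or Acrobat do internally for a PDF/X output intent) relies on color profiles, so actual values will differ.

```java
/** Sketch: naive device RGB -> CMYK conversion (no ICC color management). */
public class NaiveRgbToCmyk {

    /** Converts 8-bit RGB (0..255) to CMYK components in [0, 1], returned as {c, m, y, k}. */
    static double[] toCmyk(int r, int g, int b) {
        double rf = r / 255.0, gf = g / 255.0, bf = b / 255.0;
        double k = 1.0 - Math.max(rf, Math.max(gf, bf));
        if (k >= 1.0) {
            // Pure black: C, M, Y are undefined by the formula (division by zero), so use 0
            return new double[] {0.0, 0.0, 0.0, 1.0};
        }
        double c = (1.0 - rf - k) / (1.0 - k);
        double m = (1.0 - gf - k) / (1.0 - k);
        double y = (1.0 - bf - k) / (1.0 - k);
        return new double[] {c, m, y, k};
    }

    public static void main(String[] args) {
        // Pure red in RGB becomes full magenta + full yellow in this naive model
        System.out.println(java.util.Arrays.toString(toCmyk(255, 0, 0)));
    }
}
```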

@herve.ngounou

Thanks for sharing the requested PDF files.

We have tested the scenario again in our environment using Aspose.Pdf for .NET 17.10, and the generated PDFs (PDF/X-3 and PDF/X-1a) passed the compliance test in Adobe Preflight. For your reference, we have attached the generated outputs along with screenshots of the compliance tests.

X_1ATest.png (35.5 KB)
X_3Test.png (38.2 KB)
testforX_PDF_X_3.pdf (540.2 KB)
testforX_PDF_X_1A.pdf (540.2 KB)

Please try with a valid license of Aspose.Pdf for .NET, and if you still face any issue validating compliance, please let us know. You may apply for a temporary license on our website to evaluate the complete features of the API without any limitations.

We work in a Java environment, so for my test I used Aspose.Pdf for Java 17.10.
My source file is:
Certificat_travail.pdf (64.9 KB)
The result of converting this file to PDF/X format with Aspose.Pdf for Java 17.10 is the following file:
Certificat_travail_X_1A.pdf (577.7 KB)

When we check this resulting file with Adobe Acrobat Pro DC, we get the following errors:
image.png (10.8 KB)

It seems that images remain encoded in RGB and are not converted to CMYK.
What can be done about this?

@herve.ngounou

Thanks for contacting support.

We have managed to replicate the issue using Aspose.Pdf for Java 17.10 in our environment. We have logged it as PDFJAVA-37251 in our issue tracking system. We will investigate the cause of this issue and keep you informed of its status. Please be patient and give us a little time.

We are sorry for the inconvenience.

Hi,
I am checking back to find out whether Aspose.PDF can now generate the PDF/X format.
We need to buy such an API.

@herve.ngounou

Thanks for your inquiry.

We would like to inform you that the investigation of the logged issue is pending due to a long queue of earlier issues. However, we have noted your comments and will consider them while investigating and resolving the issue. We will inform you as soon as we have further updates on the resolution progress. Please give us a little time.

We are sorry for the inconvenience.

@herve.ngounou

We would like to inform you that the earlier reported issue is expected to be fixed in version 18.11 of the API, which will be released at the end of November. We will keep you updated on the resolution progress.

We are still waiting for a PDF/X file generation solution. If needed, we can test your next version, which we want to buy if it meets our expectations.

@herve.ngounou

Thanks for your inquiry.

I regret to inform you that the earlier logged issue is not yet resolved due to the high number of pending issues in the queue. However, as shared earlier, the fix is expected to be available in the upcoming release of the API, i.e., Aspose.PDF for Java 18.11. We will let you know as soon as that version is available for download. We greatly appreciate your patience and cooperation in this matter.

The issues you have found earlier (filed as ) have been fixed in this update. This message was posted using BugNotificationTool from Downloads module by MuzammilKhan

The issues you have found earlier (filed as PDFJAVA-37251) have been fixed in Aspose.PDF for Java 20.9.