I haven't yet familiarized myself with Aspose.PDF. My impression is that it mainly handles creating new PDF documents.
Does it support manipulating existing documents? For example, can I do a word count on the entire document (or on each page)? Another example: can I extract all the text content of a PDF, and extract comments/notes from an existing PDF?
Thank you for considering Aspose products. Note/comment extraction will be supported in the next hotfix version. Could you also tell us which comment information you need, for example the rectangle, contents, creation date, popup flag, etc.?
Hi, you can download the new DLL of Aspose.Pdf.Kit 2.4.1. In PdfContentEditor, ExtractAnnotations() supports extracting the contents of annotations of the specified types from an existing PDF document. The currently supported annotation types are “Text”, “Highlight”, “Squiggly”, “Strikeout” and “Underline”. Please try it, and if you have any other questions, don't hesitate to let me know.
One other question: we need the ability to do a word count per page, or to retrieve text elements per page. Is this going to be implemented in the near future?
Word count is difficult to compute for some languages (such as Chinese), so we have no short-term plans to support this feature. We only support extracting text from the whole PDF file. We don't recommend it, but if you want to use this feature, please refer to:
If you need a workaround, split the PDF into multiple PDFs with a single page each, then extract the text from each PDF file, so you can count the text extracted from each page.
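The page-by-page workaround still leaves the counting itself, which is where the Chinese-text problem shows up: there are no spaces between words. A minimal sketch, assuming each page's extracted text is already available as a string (obtaining it with Aspose.Pdf.Kit is not shown). `PageWordCounter`, `CountWords` and `CountWordsPerPage` are hypothetical names, and counting each CJK ideograph as one word is a simplifying assumption, not real word segmentation.

```csharp
using System;
using System.Collections.Generic;

static class PageWordCounter
{
    // Assumption: characters in the CJK Unified Ideographs block (U+4E00..U+9FFF)
    // each count as one "word". This is not real word segmentation.
    static bool IsCjk(char c)
    {
        return c >= '\u4E00' && c <= '\u9FFF';
    }

    // Counts whitespace-delimited runs of non-CJK characters, plus each CJK character.
    public static int CountWords(string text)
    {
        int count = 0;
        bool inWord = false; // true while inside a run of non-space, non-CJK characters
        foreach (char c in text)
        {
            if (char.IsWhiteSpace(c))
            {
                inWord = false;
            }
            else if (char.IsPunctuation(c))
            {
                // Punctuation neither starts nor ends a word: "don't" stays one word,
                // and a standalone "，" or "-" is not counted.
                continue;
            }
            else if (IsCjk(c))
            {
                count++;        // each ideograph counts on its own
                inWord = false; // and ends any Latin-script run before it
            }
            else if (!inWord)
            {
                count++;        // start of a new whitespace-delimited word
                inWord = true;
            }
        }
        return count;
    }

    // One count per page, given the text extracted from each single-page PDF.
    public static int[] CountWordsPerPage(IList<string> pageTexts)
    {
        int[] counts = new int[pageTexts.Count];
        for (int i = 0; i < pageTexts.Count; i++)
        {
            counts[i] = CountWords(pageTexts[i] ?? string.Empty);
        }
        return counts;
    }
}
```

Japanese, Thai, or CJK characters outside the U+4E00..U+9FFF block would need a proper segmenter; this rule only illustrates why a single "word count" definition is hard to offer across languages.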
Thanks.
Adeel Ahmad
Support Developer
Aspose Changsha Team http://www.aspose.com/Wiki/default.aspx/Aspose.Corporate/ContactChangsha.html
We don't actually need a word count per page; we understand the problem with counting Asian characters. However, it is important that you provide the extracted text per page, or at least a boolean indicating whether a text element exists on a page. Can this feature be implemented?
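With the split-into-single-pages workaround, the requested per-page boolean can be derived from each page's extracted text. A minimal sketch, assuming the text of each page is already available as a string; `PageTextCheck.PageHasText` is a hypothetical helper, and treating empty or whitespace-only output as "no text" is an assumption about what the extractor returns for pages without text.

```csharp
using System;
using System.Collections.Generic;

static class PageTextCheck
{
    // Returns one flag per page: true if that page's extracted text contains
    // at least one non-whitespace character.
    public static bool[] PageHasText(IList<string> pageTexts)
    {
        bool[] flags = new bool[pageTexts.Count];
        for (int i = 0; i < pageTexts.Count; i++)
        {
            // IsNullOrWhiteSpace treats null, "" and whitespace-only text as "no text".
            flags[i] = !string.IsNullOrWhiteSpace(pageTexts[i]);
        }
        return flags;
    }
}
```

Note that this only detects extractable text; a page whose text exists only as an image would still report `false`.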
I tried the annotation extraction function, but I can't extract popup balloon and free text notes. Please see the two attached PDF files. These two types of annotations are what we want to extract.
I checked the file named “File4_TextNotes.pdf”, and its annotations are extracted with the following code:
using System;
using System.Collections;
using Aspose.Pdf.Kit;

class Program
{
    static void Main()
    {
        PdfContentEditor editor = new PdfContentEditor();
        string testPath = @"D:\AsposeTest\TestData\";
        editor.BindPdf(testPath + "File4_TextNotes.pdf");

        // Annotation types to extract, from pages 1 through 2.
        string[] annotType = { "Text", "Highlight" };
        ArrayList annotList = editor.ExtractAnnotations(1, 2, annotType);

        for (int i = 0; i < annotList.Count; i++)
        {
            // Each annotation is returned as a Hashtable of attribute name -> value.
            Hashtable currentNode = (Hashtable)annotList[i];

            // First pass: print the simple (string) attributes.
            foreach (string partName in currentNode.Keys)
            {
                object partValue = currentNode[partName];
                if (partValue is string)
                    Console.WriteLine(partName + ":" + partValue);
            }

            // Second pass: print the nested (Hashtable) attributes.
            foreach (string partName in currentNode.Keys)
            {
                Hashtable hashTable = currentNode[partName] as Hashtable;
                if (hashTable == null)
                    continue;
                Console.WriteLine(partName);
                if (partName.Equals("contents-richtext"))
                {
                    // Print the rich-text "Rc" value without its first 21 characters
                    // (presumably a fixed-length prefix).
                    Console.WriteLine(hashTable["Rc"].ToString().Substring(21));
                }
                else
                {
                    foreach (string name in hashTable.Keys)
                        Console.WriteLine(name + ":" + hashTable[name].ToString());
                }
            }
        }
        Console.ReadKey(false);
    }
}
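The two passes above print string-valued attributes first and then one level of nested Hashtables. If nested values can themselves contain Hashtables (an assumption; the thread only shows one level), a small recursive helper avoids repeating the loop. `AnnotationPrinter.FormatAnnotation` is a hypothetical name; because it works on any Hashtable, it can be tried without Aspose.Pdf.Kit.

```csharp
using System;
using System.Collections;
using System.Text;

static class AnnotationPrinter
{
    // Formats a Hashtable as "key:value" lines; nested Hashtables are printed
    // under their key and indented by two spaces per nesting level.
    public static string FormatAnnotation(Hashtable node)
    {
        StringBuilder sb = new StringBuilder();
        Append(node, 0, sb);
        return sb.ToString();
    }

    static void Append(Hashtable node, int depth, StringBuilder sb)
    {
        string indent = new string(' ', depth * 2);
        foreach (DictionaryEntry entry in node)
        {
            Hashtable child = entry.Value as Hashtable;
            if (child != null)
            {
                // Nested attribute group: print its name, then recurse one level deeper.
                sb.Append(indent).Append(entry.Key).Append('\n');
                Append(child, depth + 1, sb);
            }
            else
            {
                sb.Append(indent).Append(entry.Key).Append(':').Append(entry.Value).Append('\n');
            }
        }
    }
}
```

Unlike the loop above, this prints the raw "Rc" value instead of trimming its first 21 characters, and it lists attributes in the Hashtable's own (unspecified) order rather than strings first.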
I checked the file named “File4_FreeTextNotes.pdf” and found that its annotation is not one of the note types we support; it is a Text Box. I will discuss this issue with the developers and let you know as soon as a solution is found.
I am using Aspose.Pdf.Kit 2.5.0.0; Aspose.Pdf shouldn't be needed for extracting annotations, right? It is odd that the ArrayList returned by ExtractAnnotations is still empty.
Would you mind trying another file for me? It is attached. Thank you!
Yes, sorry: Aspose.Pdf.Kit alone is used to extract annotations. I have reproduced the error. It worked in version 2.4.2.0 but has problems in the latest version. I will discuss this with the developers, and we will try to fix it as soon as possible. Sorry for the inconvenience.