Free Support Forum - aspose.com

Cannot extract text from Form XObject

I have a PDF file in which the real content of each page is stored in a Form XObject in that page's resources, like this:

v0.png (15.1 KB)
v2.png (70.0 KB)

With the following code, the SDK can get the Form XObject but cannot get its content:

Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(filePath);
Stopwatch sw = new Stopwatch();
var pageCount = pdfDocument.Pages.Count;
result.DocumentPageNumber = pageCount;

var p5 = pdfDocument.Pages[5];
var forms = pdfDocument.Pages[5].Resources.Forms;
var form = forms[1];
var ab = new TextAbsorber
{
    TextSearchOptions = new Aspose.Pdf.Text.TextSearchOptions(false)
    {
        SearchForTextRelatedGraphics = true,
        Rectangle = form.BBox
    },
    ExtractionOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure)
};

var bc = new TableAbsorber();

var c = ab.Text;

The Content property of the text absorber is correct, but the Text property is empty:

v4.png (31.1 KB)
v3.png (56.7 KB)

How can I extract text from the Form XObject? Is there anything wrong with the code?

Furthermore, TextAbsorber supports visiting an XForm:

public class TextAbsorber
{
    // Summary:
    //     Extracts text on the specified XForm.
    // Parameters:
    //   form:
    //     Pdf form object.
    public virtual void Visit(XForm form);
}

But TableAbsorber doesn't have such a Visit overload. It only supports Visit(Page). If there is a table in a Form XObject, how can I extract it?



Could you please attach your input PDF file here for testing? We will investigate the issue and provide you with more information.