Free Support Forum - aspose.com

Extract struct elements and the content of struct element

Does Aspose have the capability to extract struct elements and the content of struct elements?

@nehaani

It looks like you want to extract content from tagged PDF documents. Please check the documentation article(s) below on extracting tagged content from PDF documents, and feel free to let us know if you notice a missing feature:

I have checked the documentation and found a method to set the text of struct elements. Is there any method to extract the text of struct elements? I have seen a method to extract ActualText, but my objective is to extract the text of struct elements along with their tags.
Example: If there is a tag H1 with the text "Heading 1" in a tagged PDF, can you let me know how both can be retrieved using Aspose.PDF?

@nehaani

Can you please share a sample PDF document for our reference? We will test the scenario in our environment and address it accordingly.

sample-29.pdf (6.1 KB)

@nehaani

You can get the tag as well as the actual text using the properties below:

string tag = structureElement.StructureType.Tag;
string actualText = structureElement.ActualText;

Complete Code:

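// Namespaces used by this snippet
using Aspose.Pdf;
using Aspose.Pdf.LogicalStructure;
using Aspose.Pdf.Tagged;

// Open the PDF document (dataDir is the folder that contains your PDF)
Document document = new Document(dataDir + "StructureElementsTree.pdf");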

// Get Content for work with TaggedPdf
ITaggedContent taggedContent = document.TaggedContent;

// Access to root element(s)
ElementList elementList = taggedContent.StructTreeRootElement.ChildElements;
foreach (Element element in elementList)
{
    if (element is StructureElement structureElement)
    {

        // Get properties
        string title = structureElement.Title;
        string language = structureElement.Language;
        string actualText = structureElement.ActualText;
        string expansionText = structureElement.ExpansionText;
        string alternativeText = structureElement.AlternativeText;
    }
}

// Access to child elements of first element in root element
elementList = taggedContent.RootElement.ChildElements[1].ChildElements;
foreach (Element element in elementList)
{
    if (element is StructureElement structureElement)
    {

        // Set properties
        structureElement.Title = "title";
        structureElement.Language = "fr-FR";
        structureElement.ActualText = "actual text";
        structureElement.ExpansionText = "exp";
        structureElement.AlternativeText = "alt";
        // Get Properties
        string tag = structureElement.StructureType.Tag;
        string actualText = structureElement.ActualText;
    }
}
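
The loops above only look at the first one or two levels of the structure tree. If a tag such as H1 is nested deeper, a recursive walk over ChildElements may be easier. The following is only a minimal sketch, not an official sample: PrintTagTree is a hypothetical helper name, the file path is a placeholder, and the null checks are defensive assumptions. It relies only on the members already used above (ChildElements, StructureType.Tag, ActualText). Note that ActualText reflects the element's ActualText attribute, so it may come back empty when the PDF does not define that attribute for the element.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.LogicalStructure;
using Aspose.Pdf.Tagged;

class TagTreeDump
{
    // Hypothetical helper: prints the tag and ActualText of every
    // structure element under 'element', indented by nesting depth.
    static void PrintTagTree(Element element, int depth)
    {
        if (element is StructureElement structureElement)
        {
            string indent = new string(' ', depth * 2);

            // Assumption: StructureType or ActualText may be null for some
            // elements, so both are guarded here.
            string tag = structureElement.StructureType?.Tag ?? "(no tag)";
            string actualText = structureElement.ActualText ?? string.Empty;

            Console.WriteLine($"{indent}{tag}: {actualText}");
        }

        // Recurse into the children, enumerating them as in the loops above.
        foreach (Element child in element.ChildElements)
        {
            PrintTagTree(child, depth + 1);
        }
    }

    static void Main(string[] args)
    {
        // Placeholder path: pass your own tagged PDF as the first argument.
        string path = args.Length > 0 ? args[0] : "sample-29.pdf";

        using (Document document = new Document(path))
        {
            ITaggedContent taggedContent = document.TaggedContent;
            PrintTagTree(taggedContent.RootElement, 0);
        }
    }
}
```

Each output line has the form "tag: actual text", so an H1 element would appear as "H1: ..." with whatever ActualText the document stores for it.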