Hi Diego,
Thanks for contacting support.
The TextAbsorber object is not extracting any text because the source XML is not in the correct format, so its contents are not loaded. Please note that the source XML should be in an Aspose.Pdf compatible format. If you need to use your existing XML file, please try using an XSLT to transform it into that format. Please try the following code snippet and XML to accomplish the desired results.
I would also suggest visiting the following links for further details:
- Introduction to XML Technologies (Generator)
- Create a Hello World PDF document through XML
- Create a Hello World PDF document through XML and XSLT
[C#]
// Load options for an XML file in Aspose.Pdf compatible format.
Aspose.Pdf.LoadOptions options = new Aspose.Pdf.XmlLoadOptions();
string extractedText = string.Empty;
using (Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document("c:/pdftest/source.xml", options))
{
    // Create a TextAbsorber object to extract text.
    Aspose.Pdf.Text.TextAbsorber textAbsorber = new Aspose.Pdf.Text.TextAbsorber();
    // Apply the TextAbsorber to all pages of the document.
    pdfDocument.Pages.Accept(textAbsorber);
    extractedText = textAbsorber.Text;
}
Console.WriteLine(extractedText);
[XML]
<?xml version="1.0" encoding="utf-8" ?>
<Pdf xmlns="Aspose.Pdf">
  <Section>
    <Text>
      <Segment>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum</Segment>
    </Text>
  </Section>
</Pdf>
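
If your existing XML has a different structure, the XSLT approach mentioned above could look roughly like the sketch below. It uses the standard .NET XslCompiledTransform class rather than any Aspose API. The input element names (notes, note), the class name XsltSketch and the method name ToAsposeXml are hypothetical and only serve as an illustration, so please adjust the stylesheet to match your actual XML.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltSketch
{
    // Hypothetical stylesheet: maps <notes><note>...</note></notes>
    // to the Pdf/Section/Text/Segment structure shown above.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0""
    xmlns:xsl=""http://www.w3.org/1999/XSL/Transform""
    xmlns=""Aspose.Pdf"">
  <xsl:output method=""xml"" indent=""yes"" omit-xml-declaration=""yes""/>
  <xsl:template match=""/notes"">
    <Pdf>
      <Section>
        <xsl:for-each select=""note"">
          <Text>
            <Segment><xsl:value-of select="".""/></Segment>
          </Text>
        </xsl:for-each>
      </Section>
    </Pdf>
  </xsl:template>
</xsl:stylesheet>";

    // Transforms the (assumed) source XML into Aspose.Pdf compatible XML.
    public static string ToAsposeXml(string sourceXml)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(xsltReader);
        }

        var result = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(sourceXml)))
        using (XmlWriter output = XmlWriter.Create(result, transform.OutputSettings))
        {
            transform.Transform(input, output);
        }
        return result.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical input document in a non-Aspose format.
        string converted = XsltSketch.ToAsposeXml("<notes><note>Hello World</note></notes>");
        Console.WriteLine(converted);
    }
}
```

Saving the returned string to a file, for example with File.WriteAllText, should give you an XML file that can be passed to the Document constructor together with XmlLoadOptions, as in the C# snippet above.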