Extract visible/rendered Text from HTML

Hello,

we currently extract the visible (rendered) text from the attached HTML file, i.e. the text you see when the HTML is loaded in a browser, with the following code:

var htmlLoadOptions = new HtmlLoadOptions
{
    PageLayoutOption = HtmlPageLayoutOption.ScaleToPageWidth
};
using (var pdfDocument = new Document(HtmlFile, htmlLoadOptions))
{
    var textAbsorber = new TextAbsorber();
    pdfDocument.Pages.Accept(textAbsorber);
    return new StringBuilder(textAbsorber.Text);
}

With the attached file, an exception (see also the attached screenshot) is thrown after a delay and cannot be caught:

System.ObjectDisposedException
   at System.Threading.CancellationTokenSource.ThrowObjectDisposedException()
   at #=z9x_fmliPZS6WYVDrqbMcKbg=+#=zeZvdCM9$OZO$R26MQg==.#=zD3$O$CJn_lkvKVtqxg==(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.TimerQueueTimer.CallCallback()
   at System.Threading.TimerQueueTimer.Fire()
   at System.Threading.TimerQueue.FireNextTimers()
HtmlExtraction.zip (35,7 KB)

Screenshot 2025-05-12 163802.png (26,2 KB)

Kind regards,
Andy

@AStelzner

We are checking it and will get back to you shortly.

@AStelzner

Could you please share which version of the API you are using and which OS you are working on?

For the versions, see the attachment:
2025-05-14 07_55_40-C__dev_komXwork_Dev_LivingData.KomX_src_Runtime – Datei-Explorer.png (3,0 KB)

We work with Windows 11 Enterprise, 24H2, Build 26100.3775

Regards,
Andy

@AStelzner

While testing the scenario with version 25.4 of the API, we did not notice any issues in our environment. Could you please try the latest version and let us know whether you still get the exception?

With version 25.4 it works fine, thanks :)

@AStelzner

It's good to know that things are working for you now. Please keep using the API and feel free to create a new topic if you face any issues.