Extract visible/rendered Text from HTML

AStelzner · May 12, 2025, 2:38pm

Hello,

Currently we extract the visible (rendered) text from the attached HTML file, i.e. the text you see when you load the HTML into a browser, with the following code:

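// Requires: using System.Text; using Aspose.Pdf; using Aspose.Pdf.Text;
var htmlLoadOptions = new HtmlLoadOptions
{
    PageLayoutOption = HtmlPageLayoutOption.ScaleToPageWidth
};
// Load the HTML file as a PDF document, then collect the text of all pages.
using (var pdfDocument = new Document(HtmlFile, htmlLoadOptions))
{
    var textAbsorber = new TextAbsorber();
    pdfDocument.Pages.Accept(textAbsorber);
    return new StringBuilder(textAbsorber.Text);
}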

With the attached file, an exception is thrown (see also the attached screenshot), but only after a delay, and it cannot be caught:

System.ObjectDisposedException
   at System.Threading.CancellationTokenSource.ThrowObjectDisposedException()
   at #=z9x_fmliPZS6WYVDrqbMcKbg=+#=zeZvdCM9$OZO$R26MQg==.#=zD3$O$CJn_lkvKVtqxg==(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.TimerQueueTimer.CallCallback()
   at System.Threading.TimerQueueTimer.Fire()
   at System.Threading.TimerQueue.FireNextTimers()

HtmlExtraction.zip (35,7 KB)

Screenshot 2025-05-12 163802.png (26,2 KB)

Kind regards,
Andy

asad.ali · May 12, 2025, 10:13pm

@AStelzner

We are checking it and will get back to you shortly.

asad.ali · May 13, 2025, 8:08pm

@AStelzner

Can you please share which version of the API you used and which OS you are working on?

AStelzner · May 14, 2025, 5:58am

For the versions, see the attachment:
2025-05-14 07_55_40-C__dev_komXwork_Dev_LivingData.KomX_src_Runtime – Datei-Explorer.png (3,0 KB)

We work with Windows 11 Enterprise, 24H2, Build 26100.3775

Regards,
Andy

asad.ali · May 14, 2025, 6:13pm

@AStelzner

While testing the scenario with version 25.4 of the API, we did not notice any issues in our environment. Can you please try the latest version and let us know if you still face any exceptions?

AStelzner · May 15, 2025, 9:12am

With version 25.4 it works fine, thanks.

asad.ali · May 15, 2025, 7:14pm

@AStelzner

It's nice to know that things are working for you now. Please keep using the API and feel free to create a new topic in case you face any issues.