Possible memory leak in HTMLDocument?

Private Function ReadDatarowCollection(Code As String, Year As UShort, Quarter As Byte) As HTMLCollection
	Dim a As New HTMLDocument("http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_FuQuanMarketHistory/stockid/" & Code & ".phtml?year=" & Year & "&jidu=" & Quarter)
	' Read the table rows before disposing the document
	Dim b As HTMLCollection = DirectCast(a.GetElementById("FundHoldSharesTable"), HTMLTableElement).Rows
	a.Dispose()
	Return b
End Function

I ran the function repeatedly while monitoring memory usage. Usage rose at Dim a As New HTMLDocument but did not drop at a.Dispose(). When the function was called again, usage stayed level until the New HTMLDocument line, where it rose again, and it still did not drop at Dispose().

I think this is very likely a memory leak: Dispose() does not release the memory. In my application the function will be called thousands of times, so a leak like this is very serious and would waste gigabytes of memory. What should I do?

@Silver_Fang

Thank you for contacting support.

Would you please share a sample application that reproduces the issue, including this method and a call to it with the values of the Code, Year and Quarter parameters, so that we can reproduce and investigate it in our environment?

With the following simple demo, you should be able to reproduce it under Windows 10 x64:

Imports Aspose.Html
Module Demo
	Sub Main()
		Dim a As Collections.HTMLCollection
		While True
			a = DirectCast(New HTMLDocument("http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_FuQuanMarketHistory/stockid/000722.phtml").GetElementById("FundHoldSharesTable"), HTMLTableElement).Rows
		End While
	End Sub
End Module

In my test, memory usage grew steadily to 1 GB with no sign of leveling off. Moreover, calling Dispose() did not help:

Imports Aspose.Html
Module Demo
	Sub Main()
		Dim a As Collections.HTMLCollection, b As HTMLDocument
		While True
			b = New HTMLDocument("http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_FuQuanMarketHistory/stockid/000722.phtml")
			a = DirectCast(b.GetElementById("FundHoldSharesTable"), HTMLTableElement).Rows
			b.Dispose()
		End While
	End Sub
End Module

In this version each HTMLDocument is disposed after the data is extracted, but disposal does not slow the memory growth at all.

@Silver_Fang

Thank you for sharing requested data.

We have been able to observe the huge memory consumption in our environment. A ticket with ID HTMLNET-1205 has been logged in our issue management system for further investigation. The ticket ID has been linked to this thread so that you will receive a notification as soon as the ticket is resolved.

We are sorry for the inconvenience.

Is there any workaround? Finding a substitute as functionally handy as Aspose would be excruciating.

@Silver_Fang

The issue you reported has only just been logged in our issue management system. We will share our findings once it has been investigated in our environment. Please be patient and allow us a little time.

@Silver_Fang

Thank you for being patient.

We would like to share with you that the issue you reported, HTMLNET-1205, has been resolved in Aspose.HTML for .NET 18.6. However, instead of using:

b.Dispose()

please use the code snippet below, where b is an instance of the HTMLDocument class:

b.Context.Dispose(); //this code frees maps with JS timers
b.Dispose();

We hope this will be helpful. Please feel free to contact us if you need any further assistance.
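
For readers on the Java port, the cleanup order above (context first, then document) can be placed in a finally block so that it also runs when processing throws. The following is only a minimal sketch of that ordering: StubDocument and StubContext are hypothetical stand-ins written for this illustration, not Aspose classes, and the only thing taken from this thread is the order getContext().dispose() followed by dispose().

```java
import java.util.ArrayList;
import java.util.List;

public class DisposeOrderSketch {
    // Records the order of calls so the cleanup sequence can be inspected.
    static final List<String> log = new ArrayList<>();

    // Hypothetical stand-in for the document's context (NOT the Aspose class).
    static class StubContext {
        void dispose() { log.add("context"); }
    }

    // Hypothetical stand-in for HTMLDocument (NOT the Aspose class).
    static class StubDocument {
        private final StubContext context = new StubContext();
        private final boolean fail;

        StubDocument(boolean fail) { this.fail = fail; }

        StubContext getContext() { return context; }

        // Simulates the parsing work; throws when asked to fail.
        void process() {
            if (fail) {
                throw new IllegalStateException("simulated processing failure");
            }
            log.add("work");
        }

        void dispose() { log.add("document"); }
    }

    // Runs one processing cycle and returns the recorded call order.
    static List<String> run(boolean fail) {
        log.clear();
        StubDocument doc = new StubDocument(fail);
        try {
            doc.process();
        } catch (IllegalStateException e) {
            log.add("error");
        } finally {
            // Same order as suggested in this thread: context first, then document.
            doc.getContext().dispose();
            doc.dispose();
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [work, context, document]
        System.out.println(run(true));  // prints [error, context, document]
    }
}
```

With the real library, the stand-ins would be replaced by the actual document object; whether the extra context disposal is still required depends on the library version, as discussed in this thread.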

The issues you have found earlier (filed as HTMLNET-1205) have been fixed in this update.

Hi

We are seeing a memory leak with HTMLDocument in the Java version. We are using Aspose.HTML for Java 19.9. Is this issue resolved in the Java version as well? Do you still suggest using the two lines of code below to free the memory, or is version 19.9 supposed to free the memory without them? Please confirm.

b.Context.Dispose(); //this code frees maps with JS timers
b.Dispose();

@paypaluser

Every Java release is ported from the equivalent version of the .NET API, which means the same fixes and improvements are included in both versions.

You can use these lines as a workaround to free the memory; however, this is not recommended, as the API should free the resources it used once processing completes. If you still notice a memory leak with the latest version of the API, please share a sample code snippet along with screenshots and input files. We will test the scenario in our environment and address it accordingly.
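
When preparing such a report, it can help to show heap usage sampled over many iterations rather than a single reading. Below is a rough sketch of one way to do that in plain Java; HeapGrowthProbe and the placeholder workload are hypothetical names for this illustration, and the workload would need to be replaced with the actual parsing call. System.gc() is only a hint to the JVM, so the numbers are approximate, and a steadily rising series is a sign worth investigating rather than proof of a leak.

```java
import java.util.ArrayList;
import java.util.List;

public class HeapGrowthProbe {
    // Bytes currently in use on the Java heap (total minus free).
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs the workload `iterations` times and records used heap
    // after every `sampleEvery` runs.
    static List<Long> sample(Runnable workload, int iterations, int sampleEvery) {
        List<Long> samples = new ArrayList<>();
        for (int i = 1; i <= iterations; i++) {
            workload.run();
            if (i % sampleEvery == 0) {
                System.gc(); // only a request; readings remain approximate
                samples.add(usedHeapBytes());
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption for this sketch);
        // replace it with the document-parsing call under suspicion.
        Runnable workload = () -> {
            byte[] scratch = new byte[64 * 1024];
            scratch[0] = 1;
        };
        List<Long> samples = sample(workload, 1000, 100);
        for (int i = 0; i < samples.size(); i++) {
            System.out.printf("after %d runs: %d KB%n", (i + 1) * 100, samples.get(i) / 1024);
        }
    }
}
```

A heap dump or profiler output, like the analysis attached later in this thread, gives more detail, but a table of samples like this is quick to produce and easy to compare between library versions.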

Yes, we do face a memory leak. Here is the code we use to parse the HTML file; the input HTML file is attached.

public void convertToMarkupTemplate(byte[] content, String destTemplateFile, boolean conditonalSection) {
    HTMLDocument dct = new HTMLDocument(new ByteArrayInputStream(content), ".");
    try {
        // Add the default (ISO) date format in which date values arrive.
        addDateFormat(dct);
        // Add a style that wraps column data instead of letting it overlap in the Russian locale.
        addStyle(dct);
        NodeList nodelist = dct.getDocumentElement().querySelectorAll("td");
        for (Node node : nodelist) {
            HTMLTableCellElement td = (HTMLTableCellElement) node;
            String text = td.getTextContent();
            Node textNode = dct.createTextNode(text);
            Node tr = td.getParentNode();
            if (text.contains("#list") || (text.contains("#if") && conditonalSection)) {
                // Replace the whole row with the directive text.
                Node parent = tr.getParentNode();
                parent.insertBefore(textNode, tr);
                parent.removeChild(tr);
            } else if (text.contains("#if") && !conditonalSection) {
                // Replace only the cell with the directive text.
                tr.insertBefore(textNode, td);
                tr.removeChild(td);
            }
        }
        dct.save(destTemplateFile);
    } finally {
        // Release the document even if processing throws.
        dct.getContext().dispose(); // added now
        dct.dispose(); // added now
    }
}

MainReport_Template.html.zip (13.0 KB)

Attaching the memory analysis we did, with pictures showing the big objects: Gmail - Re_ Compgireportmgmtserv profiling for memory leak.pdf (486.6 KB)

We hope you are working on it. Please keep us posted, as this is a live issue for us. We are a paid customer; if you want us to post this in paid support to give it priority, we can do that. Let us know how you would like to proceed.

@paypaluser

Thanks for providing the requested information.

We have logged an issue as HTMLJAVA-566 in our issue tracking system for this scenario. We will further investigate it in order to improve the memory consumption and inform you as soon as it is resolved. Please be patient and spare us some time.

We are sorry for the inconvenience.

PS: Under the free support model, issues are analysed and resolved on a first-come, first-served basis, and we will be able to share an update or ETA once the investigation is done. If the issue is a blocker for you, we recommend posting it in the paid support forum so that it can be escalated on an urgent basis.

Thanks. We have already posted in paid support, as this is a priority issue for us. We hope you can treat it as a priority.

@paypaluser

Have you opened a new ticket in paid support, or created a ticket there referencing the issue ID logged in this thread?