Jason Sheedy's blog

An attempt to record some of my thoughts on the alphabet soup of software engineering and web development technology .... Java, Flex, ColdFusion, Linux, etc, etc

Cold Fusion Memory Leak

The Problem

For the past year or so one of our ColdFusion server instances has been consuming more and more memory. I thought it was because the application load had increased, but the main problem seems to be a memory leak in ColdFusion.

I increased the max heap size and tweaked the JVM config until I was blue in the face, but this only had the effect of delaying the inevitable. To compound the issue, one of the servers in our cluster died, forcing the entire load through one server. It was a kind of blessing in disguise, in that it forced the memory leak to show its ugly little face more rapidly. The server would not make it through a 24 hour period before it would eat up all its allocated memory and eventually crash horribly.

The Analysis of the Problem

I started by enabling metrics logging in the server's jrun.xml file. By doing that I could see the state of the server at hang time and noticed that the memory was steadily increasing up until it hung. It seemed like the garbage collector was on holidays.

Ok .. so now that I knew it was a memory usage problem, my initial reaction was to think that there must be something wrong with my app. I started thinking about all the places where I'm caching data and looking for areas where I could improve the memory management of the app. I thought that soft references might be the answer to my memory woes and read through these interesting articles with a view to using them for caching.
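For anyone wanting to watch the same trend from inside the JVM, here is a minimal sketch using the standard java.lang.Runtime calls. It is not JRun's metrics logger, and the class names HeapSnapshot and HeapSnapshotDemo are made up for this example. It only shows how "used" heap can be derived from the committed and free figures.

```java
/** Hypothetical helper: one point-in-time reading of the JVM heap figures. */
final class HeapSnapshot {
    final long usedBytes;      // heap occupied by objects, live or not yet collected
    final long committedBytes; // heap the JVM has reserved from the OS so far
    final long maxBytes;       // ceiling the heap may grow to

    private HeapSnapshot(long usedBytes, long committedBytes, long maxBytes) {
        this.usedBytes = usedBytes;
        this.committedBytes = committedBytes;
        this.maxBytes = maxBytes;
    }

    /** Reads the current figures from java.lang.Runtime. */
    static HeapSnapshot take() {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory();
        long used = committed - rt.freeMemory();
        return new HeapSnapshot(used, committed, rt.maxMemory());
    }

    @Override
    public String toString() {
        long mb = 1024L * 1024L;
        return "used=" + (usedBytes / mb) + "MB"
            + " committed=" + (committedBytes / mb) + "MB"
            + " max=" + (maxBytes / mb) + "MB";
    }
}

final class HeapSnapshotDemo {
    public static void main(String[] args) throws InterruptedException {
        // Print three readings, one second apart. On a leaking server the
        // "used" figure keeps climbing even after garbage collections.
        for (int i = 0; i < 3; i++) {
            System.out.println(HeapSnapshot.take());
            Thread.sleep(1000);
        }
    }
}
```

A single reading says little; it is the trend of the used figure across many readings that points to a leak.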

http://www-128.ibm.com/developerworks/java/library/j-jtp11225/index.html

http://www-128.ibm.com/developerworks/java/library/j-jtp01246.html
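To give an idea of what a soft-reference cache looks like, here is a minimal sketch in plain Java. It is an illustration only, not code from my app or from the articles above, and the names SoftCache and SoftCacheDemo are invented for the example. Values are held through java.lang.ref.SoftReference, so the garbage collector is free to discard them when memory runs short, and the cache simply reloads them on the next request.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache whose values may be reclaimed by the GC under memory pressure. */
final class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, reloading it if it was never cached or has been collected. */
    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }

    /** Number of keys tracked, including any whose values have been collected. */
    int size() {
        return entries.size();
    }
}

final class SoftCacheDemo {
    public static void main(String[] args) {
        // Each entry is a 1 MB buffer standing in for expensive cached data.
        SoftCache<String, byte[]> cache = new SoftCache<>(key -> new byte[1024 * 1024]);
        byte[] first = cache.get("report");
        byte[] second = cache.get("report");
        System.out.println("same instance reused: " + (first == second));
    }
}
```

Note that this sketch never removes keys whose values have been collected; a fuller version would use a ReferenceQueue to prune them.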

The next question in my mind was: what is using all the memory? I started looking for Java memory profiling tools. HPROF looked promising, but is really only useful for profiling J2SE applications, not J2EE apps. HAT was another option, but would only be useful for analysing static heap dumps. What I needed was a real-time memory profiler, and Borland's Optimizeit has done the job nicely. There's a very good article here on integrating Optimizeit with JRun 4.

http://info.borland.com/techpubs/optimizeit/optimizeit6/integ_guide/JRun4.html

A ten day trial version can be downloaded here.

http://www.borland.com/downloads/download_optimizeit.html




Mike Schierberl's Blog Post on ColdFusion Memory Leaks

Building on Mike's three posts on the ColdFusion memory leak he discovered, I created some more test cases and can confidently agree with most of his conclusions. There is definitely a memory leak in CF.

http://www.schierberl.com/cfblog/index.cfm/2006/10/12/ColdFusion_memoryLeak_profiler

http://www.schierberl.com/cfblog/index.cfm/2006/10/16/memoryLeak_variablesScope

http://www.schierberl.com/cfblog/index.cfm/2006/10/25/memoryLeaks_session

He suggests using the BEA JRockit JVM in place of the built-in Sun JVM in JRun. The only problem with that is that it raised a few questions about whether it was the JVM that was causing the issue or actually CF's problem. As I mentioned previously, I got Borland's Optimizeit profiler working with the built-in JVM in JRun. I ran the same tests as in Mike's posts and have replicated his results.

The test case I created actually shows that the problem is more severe than what Mike is suggesting. Under the particular conditions of the test case, it appears as though any CFCs that are instantiated in the variables scope of the calling cfm page are actually kept in memory for the life of the app. Not only that, but the temporary classes created by CF are also retained in memory for the life of the app. The problem seems to be in the garbage collection of the temporary CFM classes.

 

Test Case

I tried to simplify Mike's examples a little to remove any unnecessary complexity. The source code for my test case is listed here:

testCase.cfm
------------------------------------------------------------------------------
<cfapplication name="memory_test"
    sessionmanagement="yes"
    sessiontimeout="#createTimeSpan(0,0,0,10)#"
    setclientcookies="no" clientmanagement="no">

 
<cfif not structKeyExists(application,"test") OR isDefined("URL.reset")>
    <cfset application.test = createObject("component","test").init() />
    <cflocation url="#cgi.SCRIPT_NAME#" addtoken="false">
</cfif>

<cfset foo = application.test />
<cfset fooB = createObject("component","testBBB") >
<cfset session[createUUID()] = fooB />

<cfoutput>
    <a href="#cgi.SCRIPT_NAME#?reset=true">reset application.test</a>
</cfoutput>

<!--- <cfset structClear(variables) /> --->
------------------------------------------------------------------------------


The CFCs I'm using are as simple as can be, but you could use any CFC to demonstrate the issue, e.g.

<cfcomponent name="test">
    <cffunction name="init">
        <cfreturn this>
    </cffunction>
</cfcomponent>

The first time you run this script you get the following output from the profiler.



Note: I'm filtering the class list using *cftest* on the classname

You can see that there are two instances of testCase.cfm, 1 instance of test.cfc and 1 instance of testBBB.cfc. After the session times out (after 10 seconds), the garbage collector should take testCase.cfm and testBBB.cfc back to 0 instances. This doesn't happen. It appears that the number of cfm instances retained is directly proportional to the number of cfc instances retained.

After running it again you get the following output. You can see that the cfm instance count has increased by one and the testBBB.cfc count has increased by one. test.cfc stays the same since it's being referenced from the existing instance in the application scope.



After the session times out again, the garbage collector does collect the new instances.



Clicking the link to reset application.test produces the following.



Session times out, garbage collection is done again and it returns to the base state.



Now for the tricky part: refresh the page 5 times and then click the link to reset the application.test instance.



After the session times out and the garbage is collected, the leak becomes more evident. The five instances of testCase.cfm and testBBB.cfc we created before we reset application.test are retained in memory.



Refresh the page again..



Garbage is collected ..



The leaked objects are maintained on the heap.

The Solution

Now what we have done, as per Mike's recommendation, is to clear the variables scope on request end. This is more of a workaround than a solution, but it has stopped our servers from hanging. Before this our server was maxing out at 500 MB and crashing, and now the same server is humming along at around 250 MB heap usage. If this isn't a leak I don't know what is.

One minor hiccup that I ran into with this workaround is that I had to get rid of application.cfc and replace it with application.cfm, because onRequestEnd.cfm won't work in conjunction with application.cfc. I did try using structclear("variables") in the onRequestEnd method of application.cfc, but it was only clearing the local scope of the CFC and not the variables scope of the request.

Conclusions

Diagnosing this issue has been a very time-consuming exercise and I would much rather have spent the time working on my actual development projects. However, the upside is that I've learned a hell of a lot about Java memory management.

I am planning on passing this report on to Adobe in the hope that they acknowledge the issue and release a fix. For now we will continue to operate with the workaround.

Resources

Java Performance Tuning Tools

http://www.javaperformancetuning.com/tools/index.shtml

HeapRoots

http://www.alphaworks.ibm.com/tech/heaproots

HProf

http://java.sun.com/developer/technicalArticles/Programming/HPROF.html

Java Performance Analysis

http://java.sun.com/developer/onlineTraining/Programming/JDCBook/perf3.html

HAT, Java Heap Analysis Tool

https://hat.dev.java.net/doc/README.html

Comments
Kudos for performing an exhaustive and well documented test. I am sure the CF Team has seen this by now. It will be interesting to see how they respond.


Well done.
# Posted By Dan Wilson | 4/14/07 7:41 AM
Thanks for the hard work on investigating and documenting this issue. We've been looking into a very similar JVM/memory issue for the past 6 months that's only arisen since moving to using CFCs for key functionality. We've implemented one of your recommended workarounds so will monitor and feed back the result.
# Posted By Ed | 4/14/07 9:21 AM
Thanks, we're investigating.
# Posted By Damon Cooper | 4/14/07 9:57 AM
Cheers Guys. I'm looking forward to seeing what comes out of your investigations Damon.
# Posted By Jason | 4/14/07 4:25 PM
btw Jason, are you deploying to your servers using CF's J2EE functionality? we are...
# Posted By Ed | 4/19/07 9:52 AM
There is a hotfix now available for this issue. Please see http://www.adobe.com/cfusion/knowledgebase/index.c...

This should fix the problem.
# Posted By Hemant | 4/24/07 11:58 AM
Wow. That was fast!
# Posted By Dan Wilson | 4/24/07 1:04 PM
I came to say the same thing Dan... that /was/ fast.
# Posted By Sammy Larbi | 4/24/07 1:19 PM
Cheers Hemant, I'll remove my work around and let you know if it fixes the problem. Much appreciated. :)
# Posted By Jason Sheedy | 4/24/07 6:56 PM
Jason, I'm curious - did any of the profiling tools you used also give you the size in memory of the classes (I realise 1 class != instances) ?
# Posted By kola | 4/25/07 6:58 AM
Hi Kola, Optimizeit only showed total instances, not memory usage per instance. There were some features in there that I didn't really try, so it may still have that functionality available. Also, just a note, I rang Borland to get pricing on Optimizeit and was told that it has been deprecated in favour of some other enterprise suite they're developing.. can't remember what it was called though, but it may have what you're looking for.
# Posted By Jason Sheedy | 4/26/07 6:18 PM
FWIW, this hotfix has corrected some chronic memory problems that surfaced once we started widely employing CFCs. My opinion of the scalability of CF OOP is now on the rise :0)

I think we owe Mike and Jason a huge vote of thanks for this - I don't even want to think about the number of hours they must have burned. Adobe, how about coughing up a couple of trips to MAX or something?

BTW, did anyone notice the date on that KB article? This fix was released last month along with 7.0.2 cumulative hot fix 2.

Jaime Metcher
# Posted By Jaime Metcher | 4/27/07 3:58 PM
Cheers Jaime, I don't know if the trip to MAX would make up for my frustrations at this point. Maybe a trip to the moon or free lifetime upgrades to CF server enterprise. Sadly, the saga continues.... After running the hotfix for close to a week, it seems the memory is still not being cleaned up properly. I'm going to re-introduce the onRequestEnd.cfm structClear(variables) workaround and see how it goes. Alternatively, I may become a fisherman .. how hard can it be ???
# Posted By Jason Sheedy | 4/27/07 7:00 PM
Jason, that's a real bummer to hear you say the problem remains. Have you mentioned this to Damon? He posted a link to this thread on his blog when he announced the fix (http://www.dcooper.org/blog/client/index.cfm?mode=... You may want to leave a comment there if your problem really does remain. I'm sure many of us would like to hear.

BTW, are you saying you applied the hotfix? Also, it appears from the "Additional info" section of the technote (http://www.adobe.com/cfusion/knowledgebase/index.c... that you need to apply this fix as a manual extra step. Just confirming that you caught and did that.
# Posted By Charlie Arehart | 4/30/07 10:21 PM
Hi Charlie, previously I added the hotfix manually to our servers as suggested and the memory usage problem re-appeared. I have re-introduced the workaround to see if it fixes the problem, but don't have any solid results at the moment. I'm going to monitor it for a couple more days before I draw any conclusions.
# Posted By Jason Sheedy | 5/1/07 12:12 AM
I am having a similar issue with CF 6x. Is this a problem in 6x also? Will moving to 7x resolve it?
# Posted By CHRIS ASHE | 5/1/07 10:49 AM
Chris, not sure about 6 I haven't tested it.
# Posted By Jason Sheedy | 5/1/07 5:51 PM
I am 200% sure there is memory leakage.

The solution I found is to store all complex objects in the server scope rather than in the application scope or session scope.

My suggestion is to store all complex objects in the server scope, and you will see there is no memory leakage problem.

I am happy to help anybody who is struggling with issues like these.

Thanks
Sana
# Posted By Sana | 5/21/07 12:37 PM
I had also installed the hotfix, without any luck. Even if it is related, it is poorly described as being related to file upload issues. I'm trying clearing the variables scope in onRequestEnd.cfm. I've also tried every possible JVM config known to man, and the heap always grows despite GC.
# Posted By Ryan Brueske | 5/25/07 2:56 PM
@Ryan Brueske
Are you using the application scope to store CFCs, or storing query results in the application scope?
If so, use the server scope rather than the application scope. You don't need to call GC yourself; calling GC does not have any effect. If the problem still persists then you can catch me on sanaullah73@gmail.com

Thanks
Sana
# Posted By Sana | 5/25/07 4:38 PM
Ok, let me dissent for a second.

I have also spent a lot of time trying to resolve a memory issue with our CF server. It was about 6 months ago, on a CFMX6.1 server (W2K server, now W23K server). I turned on some extensive logging, and found that with my jvm tweaked settings, that the garbage collection was running about every 3-5 seconds or so. The data in the young generation would be promoted to the permanent generation when a request spanned that 5 second garbage collection gap. Meaning that if you had a long running page, the variables you instantiate would end up in the permanent generation.

Now, that's actually Ok, because a full garbage collection would run when the memory usage finally made it up to the max memory setting in your config file, and at that time the jvm would release that memory from the permanent generation for re-use. We noted here that the task manager still showed our max memory setting (~1.3Gb), even though the garbage collection logging showed that memory usage was released down to under 100Mb. I now no longer am concerned by seeing the 1.3Gb in the task manager like I used to be. In fact, we run the whole day just fine at our max, because in actual fact that only represents what's been allocated to the jrun.exe, and not what is actually in use by the jrun.exe. We do restart every night.

Ok, having said all that, our server, which had been running fine with my tweaked config for over 12 months, just started dying on us again. 4 hangs now in the last 3 weeks, and my client is getting worried, as am I. So, I start looking for the problem again, and find this blog post.

I just thought I'd mention that my previous issue looked like a CF memory leak, and smelled like a CF memory leak, but in actual fact was not technically a CF memory leak. My testing showed that the memory would be eaten up (i.e. transferred to the permanent generation) when the page request spanned the garbage collection interval - which of course is extremely easy to cause by hitting F5 5 times fast. Even a simple page would cause the young generation variables to make it into the permanent generation when run multiple times, which is quite a common occurrence with web programming in general (think multiple users - and especially if you use some CFLocks to protect inserts and the like). In this situation though, the full garbage collection should release that memory again when it runs, as long as those variables are not used again. Therefore, perhaps the issues you guys have been having are not in fact a memory leak inside CF, but incorrectly tuned jvm settings? Perhaps you've been too concerned about the instances not being immediately released, when instead you should be concerned about why they do not get released from a full garbage collection? ... I guess you could argue that what I have described IS what a memory leak is... I'm just saying that I don't think it's caused by a programmatical bug in CF more than it simply is just an incorrectly tuned jvm.

Also note that the posted hotfix does not actually apply to the problem as you've described it. It stated it was for a CFFile Action="Upload" problem that they had discovered, and not a generic variable scope instantiation not being release from active memory.

Back to my current problem, I am almost certain it is because we are now storing some configuration data in the application scope, and I didn't realize how much data it was going to be until I just saw how big it was today. A struct with 4,000 odd structures of arrays... Although I don't think I can tell exactly how many bytes it is using, I'm fairly confident that it is the reason for my hangs now - that struct has (clearly) made it into the permanent generation, and is in constant use (it's there all the day long), so it will never be cleared even by the full garbage collection process. I either have to re-tweak my config, or more likely rethink that idea of using the application scope for this large struct.

I may give that StructClear(Variables) a try, but I actually don't think that will help my problem (mine's in the application scope, and I can't very well clear that every page request, can I?)! :)

I hope I didn't muddy things too much for anyone out there. I have been very happy with our (Macromedia-assisted) jvm settings, and in case you are interested, here they are:
java.args=-server -DJINTEGRA_NATIVE_MODE -DJINTEGRA_PREFETCH_ENUMS -Xms1284m -Xmx1284m -XX:NewSize=64m -XX:MaxNewSize=64m -XX:SurvivorRatio=3 -XX:PermSize=64m -XX:MaxPermSize=128m -XX:+UseParNewGC -XX:+DisableExplicitGC -Dsun.rmi.dgc.client.gcInterval=600000 -Dsun.rmi.dgc.server.gcInterval=600000 -Dsun.io.useCanonCaches=false -Xbootclasspath/a:"{application.home}/../lib/webchartsJava2D.jar" -Djavax.xml.parsers.SAXParserFactory=com.macromedia.crimson.jaxp.SAXParserFactoryImpl -Djavax.xml.parsers.DocumentBuilderFactory=com.macromedia.crimson.jaxp.DocumentBuilderFactoryImpl

And yes, I did mean Macromedia and not Adobe. Sarge helped us.

Feel free to disagree with me. I am just posting the fruits of my similar days-long trudge through the jvm memory settings swamp. I had thought I had left it behind me... :)

Good luck!
Steven Van Gemert
# Posted By Steven Van Gemert | 5/26/07 2:01 PM
Check that, that first line - 6 months should read 12 months. We'd been going just fine for a full 12 months now.

Steven Van Gemert
# Posted By Steven Van Gemert | 5/26/07 2:14 PM
Hi Steven, thanks for the long and detailed explanation. I feel your pain. There are a couple of things I'd like to clear up about what you said. Are you sure the full GC is only executed when the heap reaches the max heap size? I don't think this is the case. If you enable metrics logging in your jrun.xml file you should see a full GC more often than that. The tests I ran were done with a full GC every time, so this is clearly not the problem in my case. You can see in my follow up post (http://www.jmpj.net/jason/index.cfm/2007/5/11/Cold...) that it's a requirement in CF7 to have -XX:+UseParallelGC enabled. Without this enabled my servers were demonstrating the symptoms you describe. I'm also wondering why you have the -XX:MaxPermSize set so low. If as you say, "if you had a long running page, the variables you instantiate would end up in the permanent generation.", I think you're actually referring to the tenured generation. There's a really good article on it here at sun.com, http://java.sun.com/docs/hotspot/gc1.4.2/, and a really good summary here by Peter Freitag: http://www.petefreitag.com/articles/gctuning/
# Posted By jason | 5/26/07 3:18 PM
Well, it's been a while since my head was deep into this. I could easily have mistaken something.

>Are you sure the full GC is only executed when the heap reaches the max heap size? I don't think this is the case.
I think you are right. I found that there was a gc happening often, and that it would clear the used memory back down to what it was before the page execution. However, when you run multiple pages, and cause objects to make it to the tenured generation (man, I may still be wrong with where it goes - I mean when it leaves the young generation because it is still in use), the overall memory usage rises. When it reaches the limit, at that point there is a more complete garbage collection performed because there is less than x% of the room left in the tenured generation (again now not sure of which generation). I think that x% was configurable, but I thought that at that point a more complete gc takes place, which would then release a huge chunk of memory back for availability to the jvm. This more complete one was only triggered because that threshold was reached. Thus I called it a "full gc".

At least, that's what I saw. My memory crept up over 5 or 6 bouts of holding down F5 for 30 seconds at a time, with many (read 100) gc's occurring but not clearing out the memory. When it reached the top, a final gc actually released a huge chunk (over a gig) of memory in one collection, which took quite a long time (5 seconds or so) compared to the others, which took from a few milliseconds to the low teens each.

Again, that's what I remember, and now that I've thrown those log files away (oh, and it's a different server now too), I don't think I can verify it. Take it for what it's worth I guess...

>...it's a requirement in CF7 to have -XX:+UseParallelGC enabled.
I recall finding some documentation that the new parallel gc was better than that. In fact, at one time I knew the exact reason why, and determined that I'd better use the new one or I'd crash and burn, hence the "XX:+UseParNewGC" in my jvm config.

>I'm also wondering why you have the -XX:MaxPermSize set so low. If as you say, "if you had a long running page, the variables you instantiate would end up in the permanent generation.", I think you're actually referring to the tenured generation.
Yeah I was referring to the tenured generation. My current problem might be that the tenured is making it to the permanent, and I don't have enough room for that (thanks for correcting me by the way).

My permanent so low? I thought it was quite large actually... I recall testing larger perm generation sizes and noting no performance difference in my testing. Of course, my testing might not simulate real-life usage, but it was what I went with. I think 128MB is quite a large permanent generation... you disagree?

Steven Van Gemert
# Posted By Steven Van Gemert | 5/26/07 3:49 PM
Steven, there is a lot to remember and every time I come back to it, I seem to have to re-learn it all again. I just meant that your perm size seemed low in relation to your max heap size. I tend to work on the principle of having about 1/3 set aside for perm gen. It really depends on the architecture of your app though. If you have a LOT of concurrent users and not much stored in application and session, I would understand the ratio you're using and from what you said about the state of the heap after the full GC being 100mb, this is probably true.

I'd really like to see how your server handled things if you didn't restart it every day. Also, if your heap is in reality only using around 100mb, I think you could reduce your max heap size considerably and rely on the GC to clean things up more regularly.

-XX:+UseParNewGC is only going to clean up the young generation, so I think you'd find using -XX:+UseParallelGC will make your full GC start working properly.
# Posted By jason | 5/26/07 7:00 PM
We have gotten slightly off track on this thread and I don't want to forget to mention that there still seems to be a leak when referencing CFCs in the variables scope. I'm still employing the structClear(variables) hack to work around the issue.
# Posted By jason | 5/26/07 7:04 PM
Hi,

I think we have nailed down a similar problem and fixed it as part of our
efforts on performance and scalability.

We will see if we can get this available on CF7.0.2.

Thanks,
Hemant
# Posted By Hemant | 5/27/07 8:24 AM
Hi Hemant, are you on the CF engineering team? Are you thinking of releasing a hotfix for the issue?
# Posted By jason | 5/28/07 12:41 AM
I am very interested in this issue and have seen the same results.
There is also an issue with large numbers of DSN's on the same box.
# Posted By Bill | 6/1/07 9:37 AM
Would setting up multiple instances allow you to use 2gb per instance?
# Posted By Bill | 6/1/07 9:40 AM
Hi Bill, only if you had 2Gb of physical RAM to spare per instance. You'd need to allocate a portion to the OS and other system tasks, and the rest would be divided between your separate J2EE servers. You'd need to allocate memory to each instance based on its usage.
# Posted By Jason Sheedy | 6/1/07 5:46 PM
Just an update:

I mentioned earlier that the hotfix cured our memory problems. Well, it cured the fast leak. A slow leak remained in both the tenured and permanent generations. Clearing the variables scope in onRequestEnd has cured the slow leak in the tenured generation. I haven't yet been able to reproduce the perm gen leak on my test environment.

Jaime Metcher
# Posted By Jaime Metcher | 6/22/07 8:14 PM
Cheers Jaime, thanks for the confirmation. It's good to know that I'm not barking mad. Based on the symptoms from our production environment, I would agree that there is still some kind of slow leak remaining even with the onRequestEnd workaround in place.
# Posted By Jason | 6/22/07 8:51 PM
Having what looks like the same memory leak problem...
causes me to restart ColdFusion 2-3 times per day when site is under load.
This site uses lots of CFCs. Other sites that do not use CFC's have no problem.

Have installed 702 and the bundled hotfixes available as of today and not seen any resolution.
Have setup the workaround to clear out variables structure as well (which may have slowed the leak some).

Mostly just documenting this to be updated on any further developments, but if anyone has
any other ideas, please feel free to reply.

Anyone tested this on CF8 RC?
# Posted By patiman | 7/18/07 5:46 AM
Hi patiman, still no joy here. I've setup a cf probe to restart my servers automatically. I've not heard anything more about this from the engineering team at Adobe. I guess they're all busy with CF8. I haven't tested this with CF8. Having too much fun with AIR/Flex at the moment. :)
# Posted By Jason | 7/18/07 2:15 PM
is everyone running cf on jrun here? one potentially related thing I've noticed is that if one instance is having a memory meltdown, other instances go at the same time. theoretically this shouldn't happen, so i'm wondering if this is more of a jrun problem than a cf problem.
# Posted By Ryan Brueske | 7/18/07 3:16 PM
Yep, on JRun but only the single instance setup. Total memory is now stable up to 500,000 hits (haven't tested beyond that). Not sure what's happening with the perm gen because the GC logging keeps failing.
# Posted By Jaime Metcher | 7/18/07 4:35 PM
I'm running multi instances of CF under Jrun and only the CFC heavy server is affected. My old fusebox 3 site takes a lickin and keeps on kickin ;)
# Posted By Jason | 7/18/07 7:05 PM
Hi,
this article is very interesting! I'm investigating memory leak problems in my application, so I installed Optimizeit. Can someone tell me why I don't see cfc and cfm in the class name list?
Thanks
# Posted By Gualtiero Sappa | 7/24/07 4:15 AM
Just wanted to chime in that I'm having the same memory related problems with CF7 fully patched. Going to add in the onRequestEnd workaround to see if that helps, but I'm not holding out too much hope.

I'm currently running java version 1.4.2_15. Has anyone tried to upgrade to 1.5? Grasping at straws here...
# Posted By joe | 9/11/07 12:42 PM
Hi Joe, I've still had no response from Adobe on this. The performance in CF8 is much improved, so I'm guessing the problem is fixed there. It doesn't help much for those people still using CF7.

Regarding running CF7 on Java 1.5, it does work to a degree, but some things break. I don't remember exactly what, but when I tested it I ran into problems.
# Posted By Jason | 9/11/07 5:25 PM
Well, I'm having mixed results. I installed the patch from Adobe, which didn't do a thing. Memory kept climbing to infinity.

So I added the structClear(variables) statement to the bottom of our onRequestEnd.cfm. Immediately memory usage plateaued, and after a few hours it began to gradually decrease and level off at around 50%. I was stunned that this worked. I was ready to dance on my desk.

Problem is that after a 10 hour period, the memory started climbing again and the server needed a restart. Once we restart it, memory hangs around 50% usage for about 8-10 hours before it begins the climb to 100% again. The leak is about 1 meg every 6 mins for us.

I am so close to having this problem mitigated (resolved isn't the right word here). Anyone have a clue why this structClear technique is causing this new behavior? At least the problem has gotten a little better. Things have definitely improved after adding that structClear, but we're still not quite stable.
# Posted By joe | 9/14/07 9:54 AM
Joe, we're in the same boat. The structClear(variables) workaround has reduced the leak, but not removed it completely. My guess is that there are still some objects leaking out through the other scopes ... session, request, etc. We still have issues, but are working around them by using a ColdFusion probe to restart the server after it becomes unresponsive. At this stage our servers are still crashing around once a week. Please let me know if you find a solution. I just don't have the time to look into this any further at the moment.
# Posted By Jason | 9/14/07 10:00 PM
That's precisely the problem. I don't have time to be dealing with this!

I think I'll bring this up at the expert desk at MAX. Maybe they have a resolution.

Regardless, thank you for all the time you sunk into figuring out what the problem was. You've saved us countless hours.
# Posted By joe | 9/15/07 11:52 AM
no probs Joe.. Please let me know if you find out anything.
# Posted By Jason | 9/15/07 6:02 PM
Folks, I've been watching this thread for months and I know that many want to assert there's a memory leak. but with all due respect (and I know some of you have really investigated this thoroughly and are convinced), I'm one of the folks who is inclined to think more of these problems are not a memory leak but something else, and I'm prepared to put my money (time) where my mouth is to help you diagnose it.

I see a couple of you saying you don't have time to investigate--do you have time to let someone else investigate for you or with you? Sure, I realize you may not want to just give remote access to someone, myself included. But would you be willing to spend even a half hour working together over a remote shared desktop (like Adobe Connect) and phone call, both on my dime? I'm willing to offer that to help dig into this more. I'd love to help see it resolved (though it may be different things for different shops).

I have some ideas of things I'd like to explore, and rather than outline them all here, I'd prefer to work with someone experiencing this problem.

If you're interested, or if someone finds it in the future, drop me a note at charlie@carehart.org. If I get too many requests, I may not be able to take them all up, but certainly we'll report here if we find anything meaningful.
# Posted By Charlie Arehart | 9/15/07 8:03 PM
Hi Charlie, thanks for offering to help out. I'd be happy to let you have a look at our setup and run a few tests. Can you please email me directly to talk about it? I'm wondering what tests you're thinking of running that may shed some light on the issue.
# Posted By Jason | 9/17/07 7:08 PM
I wanted to share what we found out over the last couple weeks with regards to this memory leak problem.

First, thanks Charlie for spending some time taking a look at our setup. After talking with him, we backed off our session variable timeout to 5 hours. This helped a little, but we were still slowly climbing to 100% memory usage.

We also installed the latest CF updater, which had no effect.

Then, at the suggestion of a colleague, we disabled "Maintain database connections" in the admin and restarted the server. Ever since that change, memory usage has been hanging around 20% for about a week.

We do have the latest macromedia_drivers.jar file installed (the 3.5 drivers), so I don't think they were the issue. It seems as if CF had a problem clearing out those connections from memory, resulting in the problem we were seeing.
# Posted By joe | 9/28/07 9:01 PM
Joe, thanks for sharing.. I'll try switching off maintain db connections and see if it fixes things.
# Posted By Jason | 10/5/07 5:14 PM
As for Joe's comment, and how after talking with me they "backed off our session variable timeout to 5 hours", I'll certainly say on everyone's behalf that of course that's still a very unusually high value and I urged him to lower it more for a variety of reasons. As is often the case, he felt the business benefits outweighed the performance and security concerns.

Interesting to hear that it was the maintain connections setting. Hey Joe, can you report what DBMS it is? (I can't remember.) It could be useful as people consider this.
# Posted By Charlie Arehart | 10/7/07 1:49 PM
In follow up to a few of the previous posts ......

Joe, I've been running with persistent DB connections switched off for the past few weeks and it doesn't seem to have made any difference. It was worth a try, but it looks like the memory leak saga continues.

Charlie, sorry I haven't had a chance to catch up with you yet. I'll try contacting you later this week to see if we can have a look at the issue together.
# Posted By Jason | 11/13/07 4:25 PM
Hey All - I believe I have just isolated what many of you have been experiencing. At least for Joe, it sounds exactly like your problem.

Working around the issue here has leveled our JVM memory consumption and we have been running smoothly for days now.

The bottom line: NEVER set a component to the session scope from within another CFC.

I have a test case/code that will reproduce the problem which I can send to anyone who might be interested in analyzing it. Email me - brett@webflint.com

.brett
# Posted By Brett S | 12/16/07 2:01 PM
Brett,
that IS only part of the problem .. it also applies to CFCs set in any other scope... i.e. variables, application, request ... but not var. In terms of good design practice, it's not a good idea to reference the shared scopes throughout your CFCs as it breaks encapsulation and creates bad dependencies. I use a session facade component to reference any variables/CFCs stored in the session scope.
# Posted By Jason | 12/16/07 4:30 PM
Jason,

The particular problem I found exists _even_ if you pass the SESSION through to the component as an argument, sorry I wasn't clear on that point before.

This would still lead me to believe your session facade is the problem, as it was in mine.

I have about 10 lines of code that will reproduce it if you want to look closer and see if it relates to what you are doing.

.brett
# Posted By Brett S | 12/16/07 4:45 PM
Brett/Jason, what version of CF are y'all using? Brett, have you tested this under CF8? Is the problem there also?

Patrick
# Posted By patiman | 12/16/07 8:58 PM
Patrick, no, I haven't tested this in CF8.

Brett, I agree with you that the session facade is part of the problem, but IT SHOULDN'T BE !! The problem also exists in plain old cfm files in the view layer where CFCs are referenced in the variables scope. This is clearly a memory leak and a problem with CF.
# Posted By Jason | 12/16/07 9:24 PM
patiman - The issue I have run into is on a fully patched version of CF8. Confirmed using the JRockit and Sun JVMs. I have not tested it on CF 7.

Jason - Yes, my issue is a clear bug where stranded component classes can live for the life of a session and are not released for garbage collection after the request or cfc function call is completed, even though they are never assigned to the Session scope. The buildup of these stranded components leads to the JVM memory being consumed and CF hanging (10s of thousands of component classes in my case).

I have not found a workaround other than not assigning components to the Session scope inside of other CFCs. It does not matter if you pass in the session scope as an argument either.

Again, if someone wants to look at my code to reproduce this let me know, I'd like a better understanding of it myself.
# Posted By Brett S | 12/18/07 11:17 AM
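For readers less familiar with the JVM side of this, here is a rough plain-Java analogy of the behaviour Brett describes above: an object created during one request stays reachable because something with session lifetime still points at it. This is only a sketch under assumptions, not ColdFusion's actual internals. The `session` map, the `Component` class and `handleRequest` are all hypothetical stand-ins.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

public class StrandedReferenceDemo {
    // Hypothetical stand-in for a session: it lives across many requests.
    static final Map<String, Object> session = new HashMap<>();

    // Hypothetical stand-in for a component instance created during one request.
    static final class Component {
        final byte[] payload = new byte[1024];
    }

    // Simulates a request that creates a component and, as a side effect,
    // leaves a reference to it behind in the long-lived session map.
    static WeakReference<Component> handleRequest(String key) {
        Component c = new Component();
        session.put(key, c);           // this reference outlives the request
        return new WeakReference<>(c); // lets the caller observe reachability
    }

    public static void main(String[] args) {
        WeakReference<Component> ref = handleRequest("req-1");
        System.gc(); // only a hint; a strongly reachable object is never cleared
        System.out.println("reachable after request: " + (ref.get() != null));

        session.clear(); // analogous to the session ending
        System.gc();     // the object is now eligible, but collection timing is up to the JVM
        System.out.println("reachable after session cleared: " + (ref.get() != null));
    }
}
```

The point of the analogy is that the garbage collector is doing its job in both cases. As long as one session-lifetime reference exists, thousands of such objects can pile up, which matches the "stranded components" pattern reported in this thread.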
I have run across a similar issue recently, although my issue has to do with discarded objects persisting for the life of a request. My challenge is to write an indexing routine that takes data from the database and pushes it to the Apache Solr search engine for indexing. In a given environment, batch indexing could involve pushing tens of thousands of records from the database into the search engine.

Using beans to hold each record from the database, I blew out the heap within a few thousand records. I spent several days writing and re-writing routines in procedural and OO code styles to see what worked best. I found that a fully procedural indexing routine was able to index 10,000 documents without ever exceeding 60 MB in the heap. Even the most carefully written use of components resulted in a heap size of 137 MB, while earlier iterations blew the heap out at 500 MB.

I just wrote a series of blog posts at blog.emergentpath.com on the subject if anyone wants to take a look at what I did. My general feeling is that CF needs a way to set objects to NULL and mark them for GC. Otherwise, as developers we will always be at the mercy of the underlying engine to correctly figure out which objects to retain and which to destroy.
# Posted By Rob Munn | 1/13/08 6:19 PM
We have used JRun and ColdFusion since 2004.
All versions appear to have leaks and do not scale well out of the box.

To make the leaks go away and get good scalability before we start JVM tuning we do this:
- Implement your database connection pool with Apache DBCP. The Macromedia/Adobe pool is a horrid pig.
- Use type 4 JDBC drivers which do not come from Data Direct, Microsoft, or Macromedia/Adobe. SQLServer is particularly bad, and you have to use jTDS.

We do this for all our new configurations and it's like night and day.
# Posted By alg | 3/12/08 10:01 AM
alg .. that sounds interesting. I've never heard anything about this before. I'll have a look into it. However, I expect they are separate issues. The memory leak I've described here is definitely caused by improper garbage collection of redundant cfc instances.
# Posted By Jason | 3/12/08 3:57 PM
A couple of you mentioned that you were using a cfprobe routine to restart jrun. Can someone provide me with a sample? We're running our app on 5 cf servers, a ton of CFCs, and we're having to manually bounce servers.. at least 5-7 times (as a whole) per day. We've been down the long road of jvm.config optimization, and are actually chucking our CF 7.0.2 app in a few months for Java. I just need an automated process to help me get by for the time being.
# Posted By Steve | 3/26/08 12:25 PM
Hi Steve,
this one slipped through the cracks. We're hosting on Windows so we just run a batch script using the 'sc' command to restart the CF service when the probe fails.
# Posted By Jason | 4/7/08 12:04 AM
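For anyone after a starting point, here is a rough sketch of the probe-and-restart idea Jason describes above. His setup uses a CF probe and a batch script; this version swaps in a small Java program instead, so treat it as an illustration only. The probe URL, the service name and the `--execute` flag are all hypothetical placeholders, and by default the program only prints what it would run.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class ProbeRestart {
    // Hypothetical values: substitute your own probe page and Windows service name.
    static final String PROBE_URL = "http://localhost/probe.cfm";
    static final String SERVICE_NAME = "YourColdFusionServiceName";

    // Returns true when the page answers with HTTP 200 within the timeout.
    static boolean isResponsive(String url, int timeoutMillis) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            int status = conn.getResponseCode();
            conn.disconnect();
            return status == 200;
        } catch (Exception e) {
            return false; // timeouts and connection errors count as a failed probe
        }
    }

    // Builds the two 'sc' invocations that stop and then start the service.
    static List<List<String>> restartCommands(String serviceName) {
        return Arrays.asList(
            Arrays.asList("sc", "stop", serviceName),
            Arrays.asList("sc", "start", serviceName));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical flag: without it, this is a dry run that only prints.
        boolean execute = args.length > 0 && args[0].equals("--execute");
        if (isResponsive(PROBE_URL, 10_000)) {
            System.out.println("probe ok, nothing to do");
            return;
        }
        for (List<String> cmd : restartCommands(SERVICE_NAME)) {
            System.out.println("restart step: " + String.join(" ", cmd));
            if (execute) {
                new ProcessBuilder(cmd).inheritIO().start().waitFor();
            }
        }
    }
}
```

Note that 'sc stop' returns before the service has fully stopped, so a real script would probably poll 'sc query' (or simply pause) before issuing 'sc start'. Run it from the Windows Task Scheduler every few minutes, or wrap the check in a loop.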
Rather than trying to do a structclear on the VARIABLES scope inside app.cfc, have you tried doing a structclear on the CALLER scope? That's probably not the best solution, especially for CFCs that are called from other CFCs. But it might help some.
# Posted By Andy Matthews | 5/1/08 10:15 AM