Why is LinkedList.clear() linear?
Developer's Cave May 29th. 2009, 11:47amI was just perusing the Java source code (because I don’t read the morning newspaper), when I discovered that java.util.LinkedList.clear() is implemented to operate in linear time. The documentation does not specify its order of growth, and I intuitively assumed that a linked list clear implementation would be a constant time operation.
Could this be an artifact of early Java development, leaving a legacy of poor performance? The remainder of this article explores the potential to significantly improve application performance when using the LinkedList.clear() operation.
First consider java.util.ArrayList.clear(). ArrayList allocates an array in memory and tracks how many elements have actually been added to the list. The fastest way to clear the list would be to simply set the size as 0. However, this would leave leftover object references at abandoned indexes, preventing the garbage collector from freeing them. So, ArrayList sweeps through the list and sets each index to null, but retains the allocated array and sets the element count (list size) to 0.
Inspecting java.util.LinkedList.clear() reveals that it also performs a linear sweep, nulling the links between all nodes as well as setting each node’s element to a null reference. I would argue that the implementation should simply cut the chain of nodes loose, with one or two snips. The chain of nodes are now subject to garbage collection since they are no longer reachable. By extension, any former referenced elements of the list may also be unreachable and available for garbage collection. The list is cleared in constant time and no stale references are held.
The existing implementation implies that the developer did not trust the garbage collector to operate efficiently. If the garbage collector takes an inordinate amount of resources to clean up a large tract of linked objects, it could impact runtime performance. However, the garbage collector has the ability to schedule cleanup at opportune moments. Furthermore, different cleanup strategies can be employed based on application needs. A server with plenty of excess memory can allow garbage to accumulate until it can all be deallocated in one fell swoop. In any case, burdening the application code by presupposing the garbage collector’s duties is wrong. The only purpose to this linear operation is to help the garbage collector, which is never an appropriate excuse.
In practice, it could be possible that nulling out all the elements of a LinkedList improves runtime performance. To find out, performance measurements would have to be taken on a variety of platforms under each available garbage collection scheme. Suppose massive garbage collection is actually found to be less efficient at runtime than linearly stalling the application (which I really doubt). This empirical data would only be valid for one version of the garbage collector. Future improvements in the garbage collection routines could overcome such deficiencies. Then, developers are left with legacy cleanup code in their applications, which is now working against them. This may be the very case of the LinkedList.clear() implementation.
Since the LinkedList.clear() implementation is vaguely documented, there is room for improvement. The linear approach could be swapped out for a constant-time snip procedure with no compatibility impact. Until then, consider constructing a new LinkedList() instead of clearing an old one for reuse. This is not as pretty in the code, prevents the use of final member variables, and may not be appropriate if a list reference is exposed. When appropriate, and in cases of large lists, creating a new LinkedList instance would alleviate a linear operation and allow the garbage collector to free up the old list as efficiently as possible.













May 29th, 2009 at 1:29 pm
From here:
In other words, there are ways to get references to the nodes in the list. Without breaking all of the links, an external reference to a node in the LinkedList would prevent GC of all of the nodes.
May 29th, 2009 at 1:54 pm
Good explanation, but yipes. I incorrectly speculated that, in any sane universe, a linked list’s node structure would be kept private. Now I see why they’re stuck with this design.
It’s also sketchy that alternate views on the list (iterators and such) would rely on node unlinking to fail-fast. Instead of relying on this side-effect behavior, sub-views on the list should be checking the modification count.
Just another case of my speculations being entirely correct, and the universe utterly failing to follow through.
May 29th, 2009 at 2:55 pm
You know, if you got an Amazon Kindle, you could have a newspaper delivered wirelessly and ready for you in the morning.
May 29th, 2009 at 3:09 pm
Very well, I accept. Or were you not actually offering me one? Perhaps trying to justify yours? If you really think it will help…