The Case of the BlockingDeque Misstep
Developer's Cave, Potential RPG April 17th. 2009, 9:35pmFirst: Unless you’re a Java developer who enjoys weekend threading and locking studies, don’t read this, unless you want your brain to melt. I’m primarily writing this for my own sake (because I am a Java concurrency junkie).
Second: There is nothing wrong with LinkedBlockingDeque. It follows all the proper behavior of a concurrent collections class (of which I’m a big fan). The situation documented here boils down to a classic concurrency scenario I simply didn’t take into account.
The Exploit: The impact to my MMORPG was that a player could force his/her character to translocate over stretches of non-navigable terrain, such as bodies of water. This would constitute a major exploit. To add to the mystery, it only happened on my development platform, not on the alpha testing network (even over the same LAN).
Scene of the Crime: Character pathing (how to get from point A to point B in the world) is computed and stored in the client. One thread dumps a path into a queue: LinkedBlockingDeque.addAll(steps). Meanwhile, another thread consumes each pending step, using the handy blocking nature of BlockingDeque.takeFirst(), transacting each step with the server. If the server rejects a step (game rules enforcement), any pending steps are cleared from the LinkedBlockingDeque.
The Caper: The player could plot a path across, say, a body of water. Even though the server was correctly denying the character from walking on water, the character would end up on the far side of the lake.
The Culprit: The culprit is a classic thread timing issue, coupled with a (rather rookie) concurrency collections contract misinterpretation. LinkedBlockingDeque is a BlockingDeque, which is a BlockingQueue. The BlockingQueue contract (which I have subsequently read) clearly states:
… bulk Collection operations … are not necessarily performed atomically unless specified otherwise in an implementation.
Therefore, LinkedBlockingDeque.addAll() is not guaranteed to dump all steps atomically. Checking the implementation, LinkedBlockingDeque falls back on AbstractQueue.addAll(), which adds each element independently (taking a separate lock for each).
The Explanation: The development platform experiences almost no client-server lag. The path producing thread would call LinkedBlockingDeque.addAll() passing a list of steps. As soon as one was added, the consumer thread was able to remove the first step, get denied by the server, then call clear() on the LinkedBlockingDeque before the producer thread had actually inserted all the pending steps. After calling clear(), the producer thread continued along with its addAll() operation, giving the consumer more steps to transact.
The Bogus Alibi: With any lag at all (even LAN traffic), the consumer thread would block after reading the first step, waiting long enough for the server to respond, giving the producer thread time to finish adding all the pending steps before the transaction completed. Thus, when the server denied the first step of the navigation, the rest of the path was properly cleared.
The Solution: Instead of using a fancy LinkedBlockingDeque, I now have a plain old LinkedList to hold the pathing. Happily, I still get to use Java’s (really Doug Lea’s) Lock and Condition elements to coordinate access to the list. Specifically, the producer thread locks, only signaling the consumer thread after all (and this time, really adding all) steps have been added to the path.
The Plot Thickens: There is one final gap in this explanation, which is not timing or concurrency related at all. As a hint, the translocation exploit is not fully fixed, and the reason could be divined from this article. Rest assured that it will be addressed before beta, but I’m curious if anyone can figure out what resulting behavior shouldn’t have happened based on the case file above.












