Feb 27
Java Sockets: Detecting Lost Clients
Developer's Cave, Island Forge Tags: Java, networking, RPG Alpha
While running autonomous test clients against my game server, I kept ending up with several client TCP connections that remained open on the server side. As a result, those client sessions stayed active long after the clients had disconnected. A lingering session can retain game state in the server, preventing memory from being reclaimed (effectively a memory leak). It can also lock the player out: they might not be able to log in later, because they still have an active (albeit bogus) session.
No matter how many safety checks I put into the networking logic, I could not detect these broken, lingering connections. As it turns out, when a connection is not shut down cleanly, a Java socket cannot report that it has closed until something actively attempts to read or write data on it. My game server protocols are rather conservative, and do not chatter unnecessarily with clients. As a result, an abruptly disconnected client socket could sit idle indefinitely, so long as no data was destined to be sent its way.
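To make the point concrete, here is a small loopback sketch. The class and method names (StaleSocketDemo, observe) are made up for illustration and are not part of my server. It connects a client to a local listener, closes the client, and then inspects the server-side socket. The isConnected() and isClosed() flags only track what the local side has done, so they do not change when the peer goes away; only the read notices. Note that this demo closes the client cleanly, so the read sees end-of-stream. With an abrupt disconnect, even the read may simply block, which is the harder case described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Illustrative only: shows that socket state flags ignore the remote side. */
final class StaleSocketDemo {

    /** Returns "isConnected isClosed readResult" as seen by the server after the client hangs up. */
    static String observe() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port.
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            Socket client = new Socket(loopback, listener.getLocalPort());
            try (Socket serverSide = listener.accept()) {
                client.close(); // the remote end goes away

                // Both flags describe local state only, so they are unchanged.
                boolean connected = serverSide.isConnected();
                boolean closed = serverSide.isClosed();

                // Only actual I/O reveals the closure; -1 means end of stream.
                serverSide.setSoTimeout(2000); // do not hang forever if something is off
                int read = serverSide.getInputStream().read();

                return connected + " " + closed + " " + read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling StaleSocketDemo.observe() reports the socket as connected and not closed, even though the client is already gone by the time the flags are checked.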
To alleviate this problem, the server networking logic now takes note of client communication times. After a configurable timeout with no activity, the server sends a ping, to which the client must reply with a pong. There are three possible outcomes. If the pong is not returned within another probationary timeout, the client is considered lost, and the session is forcibly closed. If the client is, in fact, alive and well, the pong refreshes its most recent activity time. The third outcome is the one I've most commonly found: the attempt to send the ping over a broken connection triggers a network error (IOException), which is caught and handled more or less cleanly.
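The ping/pong bookkeeping described above can be sketched roughly as follows. This is not the game server's actual code: the class name LivenessTracker, its methods, the millisecond timeouts, and the ping frame bytes are all assumptions made for illustration. Time is passed in explicitly so the logic can be exercised without real clocks or sockets.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical per-client liveness tracker; names and timeouts are illustrative. */
final class LivenessTracker {
    enum Action { NONE, SEND_PING, CLOSE }

    private final long idleTimeoutMs; // silence allowed before a ping is sent
    private final long pongTimeoutMs; // probation period after the ping
    private long lastActivityMs;
    private long pingSentMs = -1;     // -1 means no ping is outstanding

    LivenessTracker(long idleTimeoutMs, long pongTimeoutMs, long nowMs) {
        this.idleTimeoutMs = idleTimeoutMs;
        this.pongTimeoutMs = pongTimeoutMs;
        this.lastActivityMs = nowMs;
    }

    /** Call whenever any data (including a pong) arrives from the client. */
    void onActivity(long nowMs) {
        lastActivityMs = nowMs;
        pingSentMs = -1;
    }

    /** Call periodically; decides what the server should do for this client. */
    Action check(long nowMs) {
        if (pingSentMs >= 0) {
            // A ping is outstanding: close once the probation period has elapsed.
            return (nowMs - pingSentMs >= pongTimeoutMs) ? Action.CLOSE : Action.NONE;
        }
        if (nowMs - lastActivityMs >= idleTimeoutMs) {
            pingSentMs = nowMs;
            return Action.SEND_PING;
        }
        return Action.NONE;
    }

    /**
     * Runs one check and, if needed, writes the ping frame.
     * Returns true when the session should be closed, either because the
     * pong never arrived or because writing the ping failed outright.
     */
    boolean tick(long nowMs, OutputStream out, byte[] pingFrame) {
        switch (check(nowMs)) {
            case CLOSE:
                return true;
            case SEND_PING:
                try {
                    out.write(pingFrame);
                    out.flush();
                    return false;
                } catch (IOException e) {
                    return true; // broken connection detected by the write itself
                }
            default:
                return false;
        }
    }
}
```

In a server loop, something like tick(System.currentTimeMillis(), socket.getOutputStream(), pingFrame) would run on a timer for each session, with onActivity called from the read path. Keeping the decision logic separate from the socket I/O is a design choice of this sketch, made so the timeouts can be tested in isolation.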
In a perfect world, client software would cleanly tear down all TCP connections. In the real world, several factors can prevent this ideal behavior (network failure, software crash, killed application). In any case, server systems cannot rely on clients to behave ideally. While I could spend the rest of my days improving network communication logic (resume available upon request), the above technique is simple enough and appears to be working well in ongoing tests.


