Archive for the 'Developer's Cave' Category

Java Graphic Performance

Developer's Cave, Potential RPG

I’m often amazed how much processing can be performed in a few milliseconds, even in the face of sloppy algorithms. Graphic processing, on the other hand, seems to be another story. Any excess painting can be significantly detrimental.

For the record, I’m developing this game in Linux (Debian) with no accelerated graphics drivers, and I am determined for it to run well on my platform of preference. Sun promises a vastly improved graphic pipeline with its upcoming Java update release, but this only helps with DirectX in Windows.

It’s easy to unjustly blame sluggish graphics on Java or Linux. So far in my burgeoning graphics experience, I’ve found that performance problems invariably indicate that I’ve done something horribly wrong. Reading an article on Swing animation last week inspired me to vastly improve my graphic processing.

Without getting into the technical details (unless there’s an interest), the conclusion is that Java on Linux can certainly drive graphic content, as long as you’re meticulous in your implementation. While Java’s general software performance is exceptional, only so much graphic excess can be swept under the rug.

Design Process

Developer's Cave, Potential RPG

I don’t yet have as defined a game design process as my software process, but I’ve evolved a simple methodology for recording and refining my game design ideas. The technology needed is a pen, a journal, and a wiki. The process involves three activities:

For one, I write down (with the pen, in the journal) every random gameplay thought, with little regard for feasibility or conformity to the rest of the design. From my technical background, I tend to auto-cull things that would be impossible to implement. From my board game background, I tend to prefer strategic mechanics, rather than fuzzy “wouldn’t it be cool if” concepts.

The second activity also involves a pen and a journal (I use the same one from above). The goal is to refine the random thoughts into coherent game mechanics. Terms like “attributes” and “weapons,” and the interactions between them, must be defined so that the game systems can be drafted and refined.

The final activity, which I had underestimated, is concisely documenting the final game design. This phase forces decisions, exposing gaps and conflicts. If you can’t play the game in your head from these designs, and design the software, then there is something missing. I’ve found that a wiki provides the best balance between formality and flexibility.

All of these activities happen more or less at the same time. This is not a sophisticated requirements tracking system, and document versioning would be needed to coordinate design and software teams, but this works for my team of one.

Overall, this methodology seems rather obvious, but it’s important to be aware of the process, and write it down. Unless something is well documented, it cannot be well understood.

Transaction Processing, Part 3: Distributed Server Processing

Developer's Cave

In Part 1 of this article, I defined non-blocking transaction processing in the game client. In Part 3, the cheating exploits discussed in Part 2 are addressed. First, I adapt the common database technique of resource locking to the Potential Engine (the MMORPG game server). Then, I introduce the concept of resource isolation, which potentially allows for server-side concurrency and distributed load balancing.

To recap the previous discussion, it may be desirable to process server-side transactions concurrently. However, this may result in various exploits and cheats, if the server follows a faulty parallel processing paradigm.

When many threads may be processing a store of data, it is imperative that transactions are performed atomically. That is, each logical action (even if executing concurrently with other actions) must access and manipulate required pieces of data without interference from other transactions. This is the goal of the concurrency model: to allow actions to execute in parallel without manipulating data in a conflicting way.

The common database approach is to use resource locking. In a relational database, locking occurs on tables, rows, or cells. In the Potential Engine, content objects are the resources. In this context, each action knows which content objects it needs in order to carry out its transaction. The engine obtains an exclusive lock on each of them before any data is read or manipulated. This ensures no two actions will interfere with each other’s data.

This approach to resource locking, as defined here, is lacking in many details, such as deadlock avoidance. Notice that this technique is very low-level (involving individual content objects). This model also implies a system-wide, centralized database.
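To make the locking idea concrete, here is a minimal sketch in Java. It is not the Potential Engine's code: ContentObject, LockingTransactor, and the quantity field are names made up for illustration. The sketch also assumes one common answer to the deadlock question, which this post does not specify: every action acquires its locks in the same global order (sorted by a unique object id), so two actions can never each hold a lock the other is waiting for.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical content object: the lockable resource in this sketch. */
class ContentObject {
    final long id;                                  // assumed unique per object
    final ReentrantLock lock = new ReentrantLock(); // exclusive lock for this object
    int quantity;                                   // example state guarded by the lock

    ContentObject(long id, int quantity) {
        this.id = id;
        this.quantity = quantity;
    }
}

/** Runs an action while holding an exclusive lock on every object it needs. */
class LockingTransactor {
    static void transact(List<ContentObject> needed, Runnable action) {
        // Sort by id so that every thread acquires locks in the same global order.
        // This ordering is an assumption of the sketch, used to avoid deadlock.
        List<ContentObject> ordered = new ArrayList<>(needed);
        ordered.sort(Comparator.comparingLong((ContentObject o) -> o.id));

        int acquired = 0;
        try {
            for (ContentObject o : ordered) {
                o.lock.lock();
                acquired++;
            }
            action.run(); // all required data is now safe to read and modify
        } finally {
            // Release in reverse order, and only the locks actually taken.
            for (int i = acquired - 1; i >= 0; i--) {
                ordered.get(i).lock.unlock();
            }
        }
    }
}
```

An action that moves one unit between two objects would call LockingTransactor.transact(Arrays.asList(a, b), ...) and touch a.quantity and b.quantity only inside the action. The order in which the caller lists the objects does not matter, because the sort decides the locking order.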

Next, I submit for your consideration the concept of resource isolation. This approach to concurrent transaction processing operates at a higher level than resource locking. In the context of a game server, resource isolation involves defining which content is associated, according to game rules. This forms clusters of content that are mutually isolated. Each cluster can operate in parallel with all other clusters.

For example, an MMORPG may define each cave as a geographic region, in which all gameplay is isolated from the rest of the online world. Then, each cave can be considered a cluster. Minimally, this model allows a game server to process transactions sequentially within each cluster, but process all clusters concurrently. Of course, resource locking can be utilized within each cluster to improve internal throughput.

Furthermore, since the content for a cluster has been isolated, that content can reside in a completely separate processing context. That is, resource isolation also defines a model of distributed data processing. This allows, for example, a game’s data to be managed by a server farm rather than one centralized server/database.
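The minimal form of this model (sequential within a cluster, concurrent across clusters) can be sketched in a few lines of Java. This is an illustration under my own assumptions, not engine code: ClusterDispatcher and the string cluster ids are hypothetical, and each cluster is simply given its own single-threaded executor.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical dispatcher: one sequential queue per isolated cluster. */
class ClusterDispatcher {
    // Cluster id (e.g. a cave name) -> that cluster's own single-thread worker.
    private final Map<String, ExecutorService> workers = new ConcurrentHashMap<>();

    /** Queues a transaction on its cluster; different clusters run in parallel. */
    Future<?> submit(String clusterId, Runnable transaction) {
        ExecutorService worker =
                workers.computeIfAbsent(clusterId, id -> Executors.newSingleThreadExecutor());
        return worker.submit(transaction);
    }

    /** Lets queued transactions finish, then stops all cluster workers. */
    void shutdown() {
        for (ExecutorService worker : workers.values()) {
            worker.shutdown();
        }
    }
}
```

Because only one thread ever touches a given cluster's content, the transactions themselves need no locks. In this sketch the worker is a local thread, but since nothing outside the cluster shares its data, the same dispatch could in principle route to another process or machine.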

The resource isolation technique is more theoretical, whereas resource locking is rather well defined. These resource isolation concepts are not unique, but I believe further formalization and definition of the concept is important to understanding the nature of distributed software design. Is that an unusual hobby?

Transaction Processing, Part 2: Cheating and Exploits

Developer's Cave

In Part 1 of this article, I defined non-blocking transaction processing in the game client. Part 2 discusses security concerns of concurrent transactions in the game server, which could expose cheating and exploits. A simple server-side sequential transaction approach is described that can fix the security issues. Part 3 will discuss how to safely perform server-side parallel processing, using two techniques.

Security in this context is defined in terms of game rules. The security concern is that players may be able to exploit the transaction model to cheat the rules. For example, suppose two players’ characters attempt to pick up the same item from the world surface. The failure case is that the server processes both transactions at the same time, placing the item into each character’s inventory. Similarly, suppose one player finds a way to send two concurrent requests to pick up an item. This could result in the item showing up twice in inventory. This represents a common item duplication exploit in online games.

To avoid the item-dupe exploit, the server must be more careful how it processes transactions. There are several potential approaches. The simplest technique is to serialize all transaction processing in the server. That is, each transaction is executed sequentially (one-at-a-time). No matter the timing between players picking up the item, the server will accept the first and deny the second.
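As a rough illustration of the serialized approach, the sketch below funnels every pick-up request through a single worker thread. The class and method names (SerialTransactionServer, requestPickUp, requestOwner) are invented for this example and are not taken from the actual server.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical server that runs every transaction on one thread, one at a time. */
class SerialTransactionServer {
    // A single worker thread means two transactions can never interleave.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // Items lying on the world surface; only touched by the worker thread.
    private final Set<String> itemsOnGround = new HashSet<>();
    // Item id -> owning character; only touched by the worker thread.
    private final Map<String, String> inventoryOwner = new HashMap<>();

    SerialTransactionServer(Set<String> initialItems) {
        itemsOnGround.addAll(initialItems);
    }

    /** Queues a pick-up request; the Future yields true if it was accepted. */
    Future<Boolean> requestPickUp(String characterId, String itemId) {
        return worker.submit(() -> {
            // Check-and-move is atomic because nothing else runs concurrently.
            if (!itemsOnGround.remove(itemId)) {
                return false; // already taken: deny the request
            }
            inventoryOwner.put(itemId, characterId);
            return true;
        });
    }

    /** Queues a lookup of the item's owner (null if nobody holds it). */
    Future<String> requestOwner(String itemId) {
        return worker.submit(() -> inventoryOwner.get(itemId));
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

If two characters request the same item, whichever request is queued first removes it from the ground, and the second finds nothing to remove and is denied. The same holds for one player firing two requests at once.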

Notice that the order in which requests (e.g., to pick up an item) are processed depends on client and network timing. This is a fairness concern rather than a transaction security issue, and it is not addressed here. (Solving the fairness problem involves lots of interesting techniques, such as distributed coordinated game time. This would make a fine future discussion.)

The simplest solution is the most secure, and should be used until proven insufficient. If performance testing proves that serial transaction processing is the bottleneck preventing the server from supporting the required number of online players, parallel transaction processing techniques can be employed, under the presumption that concurrency will provide greater transaction throughput. Concurrency always results in a multitude of really interesting (read: tricky) issues to deal with.

In Part 3, I discuss safe server-side concurrent transaction processing, using resource locking and resource isolation.

Transaction Processing, Part 1: Concurrency

Developer's Cave

Looking for exciting new gameplay? Sorry, this month is a busy time. I have managed to implement some transaction processing improvements. Consequently, watch out for new client behavioral anomalies!

Specifically, I’m testing the client with non-blocking transactions. Previously, client processing would block (stall) to complete a transaction with the server. A transaction (round-trip client-server request-response message) is needed for almost all game interaction (walking, inventory control, combat). Transactions are designed to be quick, but interaction could become choppy under high network latency (lag).

Non-blocking mode allows the client to perform transactions concurrently and without waiting for the server. This prevents stalling the client, but has the side effect of delayed updates. For example, when dropping an item, blocking-mode would fully complete the transaction before the client became responsive again (the item would be removed from inventory and show on the world surface). In non-blocking mode, you can drop an item and keep right on playing, but the item may linger in your backpack a bit before appearing on the ground.

User interface smarts could hide some of this out-of-sync behavior. For example, when you drop an item to the ground, the client’s user interface could shade (and block use of) the item in your backpack until the transaction actually completes. In the event the transaction fails (e.g., something blocks the drop location), the user interface would relinquish control over the item.
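The shading idea might look something like the following sketch. It assumes a Java 8 style CompletableFuture for the server round trip, which is my choice for the illustration rather than what the client actually uses. NonBlockingInventory and the sendDropRequest function are hypothetical stand-ins for the real inventory and network layer.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical client-side inventory that does not stall on server round trips. */
class NonBlockingInventory {
    private final Set<String> backpack = ConcurrentHashMap.newKeySet();
    // Items with a drop in flight: the UI would shade these and block their use.
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    // Stand-in for the network layer: sends a drop request and later
    // completes the future with the server's verdict.
    private final Function<String, CompletableFuture<Boolean>> sendDropRequest;

    NonBlockingInventory(Set<String> items,
                         Function<String, CompletableFuture<Boolean>> sendDropRequest) {
        backpack.addAll(items);
        this.sendDropRequest = sendDropRequest;
    }

    /** Starts a drop and returns immediately; the result arrives later. */
    CompletableFuture<Boolean> drop(String itemId) {
        if (!backpack.contains(itemId) || !pending.add(itemId)) {
            return CompletableFuture.completedFuture(false); // missing or already in flight
        }
        return sendDropRequest.apply(itemId).handle((accepted, error) -> {
            boolean ok = error == null && Boolean.TRUE.equals(accepted);
            if (ok) {
                backpack.remove(itemId); // server confirmed: item leaves the backpack
            }
            pending.remove(itemId);      // either way, the UI stops shading the item
            return ok;
        });
    }

    boolean holds(String itemId)     { return backpack.contains(itemId); }
    boolean isPending(String itemId) { return pending.contains(itemId); }
    boolean isUsable(String itemId)  { return holds(itemId) && !isPending(itemId); }
}
```

While the request is in flight the item is still held but not usable, which is exactly the lingering-in-the-backpack state described above. In a Swing client, the completion callback would also need to hand any UI update over to the AWT thread.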

These timing issues bring up stability and security concerns. In Part 2 of this article, I’ll discuss how the server deals with transactions to ensure stability and prevent cheating exploits.

Java Web Start: Antithesis of Usability

Developer's Cave

Java Web Start should be a major benefit for Java application developers and, more importantly, end users. The JWS concept is simple: A configuration file (JNLP) sitting on a Web server defines how to install, update, and launch a Java application. At each launch, application resources are fetched, installed, and updated for the user as automatically as possible.

Great concept, but the reality of Sun’s JWS implementation is abysmal. End users are forced through a gauntlet of obstacles, in which any error tends to be catastrophic.

Assume JWS is installed and browser integration is configured correctly for all browsers (a MIME type must be set up to launch JWS when a JNLP file is clicked). When a JNLP is first launched, shortcuts are installed on the user’s desktop and in menus, as directed by the JNLP (barring some bugs). These shortcuts conveniently launch the application. However, there are several bugs against JWS in which these shortcuts become corrupted. The result is a cryptic error that implies the application is broken, which is enough to prevent most end users from using your application. The situation is not easily repaired (as discussed below).

One such case prompted this article (rant): Upgrading from Java 5 to Java 6 caused all JWS application shortcuts to become broken. To correct the situation, users must find the JWS Viewer application. I couldn’t find the shortcut in Windows, so I had to run “javaws -viewer” on the command line. This could be a friendly viewer of installed Java applications, but is actually considered a sub-dialog of the hideous Java Control Panel, showing the JWS “application cache.” The applications had to be removed, then the original JNLP link clicked to initiate a new installation. If the user cannot perform these steps, and does not know the JNLP URL, the error is unrecoverable.

The same problem can occur in several other ways, such as renaming shortcuts, or making several JNLP changes. The end result is an error shown to the user, implying the application is broken. In all cases, JWS should be smart enough to detect the corrupted shortcut and reinstall the application. Instead, an exception and stack trace are shown to the user. This is a completely unacceptable scenario for the end user.

Another problem occurs if the user saves the actual JNLP file locally, then launches from that file. Even though the JNLP has the full URL information of the application resources, JWS does not go to the source when the JNLP is clicked locally. Thus, no updates are ever discovered, which breaks the basic contract of a so-called “online” JWS application: the user will always have the latest version.

Overall, Java Web Start should be redesigned to provide a robust user experience. Otherwise, I don’t anticipate moving MMORPG out of alpha with JWS. An alternative is to simply create a launcher script (or Windows executable) that checks resources before launching the application. However, Java developers shouldn’t have to reinvent this wheel.

Resources:
This bug should never have been closed: Sun Bug 6446676
This bug appears to replace the previous: Sun Bug 6563938
Similar broken shortcut bug: Sun Bug 6549428

Deadlock Avoidance

Developer's Cave, Potential RPG

Here’s a hacky technique that is preventing a particular deadlock in the MMORPG client.

Due to Java’s AWT painting rules, the AWT/Swing thread must be used when painting components. Unfortunately, I’ve introduced thread contention between the AWT thread and the world Rendering thread. In the right circumstance, these threads deadlock on each other (each holds a lock the other needs).

The real solution is to resolve the contention. Putting this off for a rainy day, I’ve implemented a quick-and-dirty deadlock avoidance approach. First, note that I’m using the elegant java.util.concurrent.locks API (rather than the synchronized keyword). I must acquire a lock within my JPanel’s paintComponent(Graphics g) method. Instead of calling Lock.lock(), I call Lock.tryLock(timeout, unit) with some arbitrary (short) timeout. If the Rendering and AWT threads would otherwise deadlock, the AWT thread gives up when the timeout expires and releases its resources, so the Rendering thread can continue.

Leveraging the passive nature of AWT painting, the failed attempt is re-dispatched by calling repaint(clip) (passing the clipping region of the original paint request). Hopefully, the thread contention is cleared up on the subsequent paintComponent. If not, this technique risks forever requesting repaints (more logic could catch this and trigger client failure, or maybe try to restart the Rendering engine).
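For the curious, the pattern reads roughly like the sketch below. It is a simplified reconstruction, not the client's actual panel: WorldPanel, worldLock, paintWorld, and the 20 ms timeout are placeholders chosen for the example, and the retry has no limit, just as described above.

```java
import java.awt.Graphics;
import java.awt.Rectangle;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import javax.swing.JPanel;

/** Hypothetical world panel that gives up on a paint rather than deadlocking. */
class WorldPanel extends JPanel {
    private static final long TIMEOUT_MS = 20; // arbitrary short timeout
    private final Lock worldLock;              // assumed shared with the Rendering thread

    WorldPanel(Lock worldLock) {
        this.worldLock = worldLock;
    }

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        Rectangle clip = g.getClipBounds(); // region of this paint request (may be null)
        boolean locked = false;
        try {
            locked = worldLock.tryLock(TIMEOUT_MS, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        if (!locked) {
            // Could not get the lock in time: back off and ask AWT to paint again later.
            if (clip != null) {
                repaint(clip);
            } else {
                repaint();
            }
            return;
        }
        try {
            paintWorld(g);
        } finally {
            worldLock.unlock();
        }
    }

    /** Placeholder for the real world drawing code. */
    protected void paintWorld(Graphics g) {
        g.fillRect(0, 0, 8, 8);
    }
}
```

The early return is what breaks the cycle: the AWT thread leaves paintComponent without the lock, and the queued repaint gives it another chance once the Rendering thread is done.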

Debugging shows this happens rarely, but cleans up nicely. There is the potential to leave some graphic artifacts when the AWT paint fails, but this is tolerable for now. In any case, this hacky fix is better than the client locking up.

Correct: No. Elegant: Almost. Clever: Barely. Recommended: Absolutely not.

ACSPLAN (and the Puppet-Master)

Developer's Cave, Potential RPG

ACSPLAN (autonomous client-side progressive latency-avoidance navigation) - \ACKS-plan\ - n - A temporal skew optimization for distributed simulation environments (and online games) providing client-perceived performance improvement in the face of network latency and server processing delay (lag).

MMORPG had a puppet-master model of character navigation. The client would directly manipulate and display the state of a puppet character. In the background, the puppet’s movements were requested of the server. The server maintains and updates the master, which is observed by the client. Thus, in the face of lag, the client has some slack to autonomously move the puppet without direct approval of the server.

In the common case, since both systems use the same character logic, the master would eventually be updated to the state of the puppet. However, circumstances exist (especially during pathfinding) in which the puppet would become out-of-sync with the master copy. Then, it is up to the client to reconcile the difference.

MMORPG’s initial reconciliation logic was very poor, causing wild navigational problems. So, as Knight to Taylor’s sport jacket, I threw it out. In fact, the whole puppet-master model caused several discrepancies. Now, how to get everyone to play on a gigabit switched LAN…

ProxyIcon

Developer's Cave, Potential RPG

For those who enjoy design patterns, I’ve written a ProxyIcon for the MMORPG client. The reason is to decrease memory allocation in the client, which is due in large part to graphic images, under the theory that most are not needed most of the time. This article describes the ProxyIcon (implementation details are left as an exercise for the reader). First, there is already a central factory of Icons. So, integrating a new proxy pattern only involved changing the factory. ProxyIcon implements the javax.swing.Icon interface, so users are none the wiser.

ProxyIcon knows the source of the image to load. When any Icon methods are accessed, it makes sure the internal image is loaded. If not accessed for some timeout, the internal icon can be unloaded. The simplest implementation would have set the internal Icon reference to null, allowing the garbage collector to clean it up. Why the conditional perfect tense? Because there’s an even better approach:

ProxyIcon holds two references to the internal Icon: A hard reference and a soft reference. After the timeout, the hard reference is set null, but the soft reference is retained. Since the embedded icon is only referenced (never exposed) from the ProxyIcon object, this leaves only the soft reference. Java cleans up soft references (only) when memory is needed. If, however, the ProxyIcon is accessed before being cleaned up, the hard reference is reinstated. This provides the best of both worlds: Application logic can set policy on when to release resources but is backed up by the Java Virtual Machine’s garbage collection logic to avoid unhelpful cleanup.

The primary benefit, of course, is that unused icon resources can be deallocated. The software design benefit is that image users do not need to concern themselves with whether the icon is currently loaded or not. Notice that all ProxyIcon references stay in memory forever (in the factory). They could also be deallocated by using soft or weak references in the factory, but the objects themselves are (theoretically) minuscule compared to image memory consumption.

Another minor benefit is lazy image loading. Instead of loading upon construction, the internal icon is loaded lazily upon first access. There are a number of icons that are requested in the code, but not always used. This feature has the benefit that the code can greedily ask for icons from the factory (e.g., as static variables) without impacting memory if not needed.
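Although the real implementation stays an exercise for the reader, a bare-bones sketch of the hard-plus-soft reference idea could look like this. Everything beyond the javax.swing.Icon interface is assumed for the example: the Supplier used as the image source, the releaseIfIdle method (which some timer would have to call), and the isHardReferenced accessor.

```java
import java.awt.Component;
import java.awt.Graphics;
import java.lang.ref.SoftReference;
import java.util.function.Supplier;
import javax.swing.Icon;

/** Sketch of a proxy that loads its real Icon lazily and can release it when idle. */
class ProxyIcon implements Icon {
    private final Supplier<Icon> loader; // hypothetical: knows how to load the image
    private final long timeoutMillis;    // idle time before the hard reference is dropped
    private Icon hard;                   // keeps the icon alive while in active use
    private SoftReference<Icon> soft = new SoftReference<>(null);
    private long lastAccessMillis;

    ProxyIcon(Supplier<Icon> loader, long timeoutMillis) {
        this.loader = loader;
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns the real icon, reinstating or reloading it if necessary. */
    private synchronized Icon delegate() {
        lastAccessMillis = System.currentTimeMillis();
        if (hard == null) {
            hard = soft.get();           // still softly reachable? then no reload is needed
            if (hard == null) {
                hard = loader.get();     // first access, or the GC reclaimed the icon
                soft = new SoftReference<>(hard);
            }
        }
        return hard;
    }

    /** Meant to be called periodically; drops the hard reference once idle. */
    synchronized void releaseIfIdle(long nowMillis) {
        if (hard != null && nowMillis - lastAccessMillis >= timeoutMillis) {
            hard = null;                 // only the soft reference remains
        }
    }

    synchronized boolean isHardReferenced() {
        return hard != null;
    }

    @Override public int getIconWidth()  { return delegate().getIconWidth(); }
    @Override public int getIconHeight() { return delegate().getIconHeight(); }
    @Override public void paintIcon(Component c, Graphics g, int x, int y) {
        delegate().paintIcon(c, g, x, y);
    }
}
```

Nothing is loaded in the constructor, so asking for a ProxyIcon costs almost nothing until one of the Icon methods is first called. After a release, the next access either reuses the softly referenced icon or, if the garbage collector has cleared it, loads the image again.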

One major gotcha is that Java itself may be holding cached references to the image data, which prevents garbage collection. I have not investigated this fully, but do not use Toolkit.getImage(), as its API warns of just this problem. Also read the code for javax.swing.ImageIcon, which makes (undocumented) use of Toolkit.getImage(). Instead, use Toolkit.createImage() or ImageIO.read() with ImageIO.setUseCache(false). Further investigation is needed to determine exactly who has access to underlying image data depending on how it is loaded.
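A small loader along those lines might look like the sketch below. UncachedImageLoader is a made-up helper name, and as noted above I have not verified who else might hold on to the image data. The sketch only shows the ImageIO calls in question. Note that ImageIO.setUseCache is a global, JVM-wide setting that controls whether ImageIO uses a disk-based cache file when it creates image streams.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

/** Hypothetical helper that decodes images through ImageIO instead of Toolkit.getImage(). */
class UncachedImageLoader {
    static {
        // Global setting: do not use a disk-based cache file for ImageIO streams.
        ImageIO.setUseCache(false);
    }

    /** Decodes one image from the stream; the caller owns the only reference to it. */
    static BufferedImage load(InputStream in) throws IOException {
        BufferedImage image = ImageIO.read(in);
        if (image == null) {
            // ImageIO.read returns null when no registered reader understands the data.
            throw new IOException("unsupported or corrupt image data");
        }
        return image;
    }
}
```

The returned BufferedImage can then be wrapped by whatever Icon implementation the factory uses.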

For Alpha testers, look at $HOME/.potentialrpg/stat/IconFactory.stat, which shows:

timestamp, total, loaded, hard_referenced, average_ms_timeout, ms_stat_time

Flyspray Bug Tracking

Developer's Cave, Potential RPG

I’ve installed a new bug tracking tool: Flyspray. This supplants the previous BUGS (The Bug Genie) system, which I could never get into.

Rant: BUGS was supposed to be easy-to-use, but that seemed to translate to “pretty icons” rather than clean functionality. In fact, I found the UI interaction to be particularly clunky and confusing. There were too many redundant and conflicting ways to manage bug reports. I suspect the developers were more familiar with CSS than bug tracking. It did have pretty icons, though.

So far, I like Flyspray much better. In fact, the only reason I didn’t install it originally (last year) was that their latest version did not have an installer. No big deal, except that my previous Web host only had FTP access (yech), so I couldn’t do a manual install. That, and a test install of the previous version failed.

So it took them several months to finish the installer… who am I to complain about lengthy software development?