Adept Software Development Blog for Architects & Developers

This is not a training session on multi-tasking in Java. I'll leave that hot potato for the more qualified technical writers. This is a starting set of guidelines for things to watch in code that can be run by more than one thread at any one time. In a J2SE environment this can only happen if you create and start threads. In any application server environment, from Tomcat to WebLogic, each client has at least one thread - and they can all be running in common code. So:

Do look closely at each and every piece of static data in the system. Threads can only clash when they share data, and in practice that nearly always means static data, or data with a static element somewhere in the reference chain.
Don't put static data in library routines, as you can't be sure when they'll be used by threaded code. A classic problem is a cache, which is usually a synchronised static map. Because any data retrieved from such a cache has a static element in the reference chain, there is a risk of it being accessed by more than one thread. In my code this happened with the session cache. A single browser inside a single session can post two almost simultaneous requests serviced by two separate server threads - both accessing the same session data.
Do use synchronized to wrap any code that can be run by more than one thread and accesses common data. It makes the second and subsequent threads wait until the first thread to enter a block synchronized on the same object has left that block.
Don't synchronise everything. It's a common beginner's mistake to synchronise everything in a class to forestall any problems. For one thing it won't stop problems, as deadlocks are still possible; for another it can make a program impossibly slow by creating massive bottlenecks.
Do keep synchronised code sections to a minimum. I realise that technically this is the same point as the one above - but it's so important that it bears repeating. I can't believe how often I have seen synchronised methods that call logging functions. Logging, like disk or network I/O, can cause delays - not to mention the CPU time required to turn data into human readable form. If it's a popular piece of code, system response drops off while many threads wait for one to complete. Telltale symptoms of this mistake, then, are low CPU usage with long response times.
Don't use synchronised maps and lists unless they are absolutely necessary. There is no need to synchronise a map that is only ever touched inside synchronised code in your application. Re-entering a lock the thread already holds is cheap, but a synchronised map has a lock of its own, so every call on it incurs extra overhead to no advantage. The older Java containers (Vector, Hashtable) were synchronised by default, whereas the newer ones (ArrayList, HashMap) aren't.
Do think carefully about any synchronised code - because it's virtually impossible to unit test. Problems usually only manifest themselves in production under heavy, varied load. These problems aren't easy to reproduce on request.
Don't assume that because code always works on a single-CPU desktop or test machine it will also work on a multi-CPU server. Even desktops will be vulnerable once the new multi-core CPUs hit the market in quantity. A single-CPU system is still linear; even when it looks like it's multi-tasking it's really just swapping between tasks very quickly. A multi-CPU system can have completely separate processors running the same code and accessing the same data. Certainly synchronized should work the same way in both cases, but the completely different mechanisms employed mean that your code will be exercised differently. The end result is that code that works perfectly well on a single-CPU system may fail randomly on a 2 or 4 CPU server.
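
A rough sketch of the last few points, under some loudly stated assumptions: SessionCache, recordHit and hits are made-up names for illustration, not anything from my library. It shows a plain HashMap guarded by one lock, with the slow logging call made only after the lock has been released.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical session cache, for illustration only.
class SessionCache {
    private static final Logger LOG = Logger.getLogger("SessionCache");

    // A plain HashMap is enough here: every access goes through the same lock.
    private final Map<String, Integer> hitsBySession = new HashMap<>();
    private final Object lock = new Object();

    /** Records one request for the session and returns its running total. */
    int recordHit(String sessionId) {
        int total;
        synchronized (lock) {
            // Only the update of the shared map happens while the lock is held.
            total = hitsBySession.merge(sessionId, 1, Integer::sum);
        }
        // Building and writing the log message can be slow, so it is done
        // after the lock is released and other threads are not kept waiting.
        LOG.fine("session " + sessionId + " has " + total + " hits");
        return total;
    }

    /** Reads the total for a session under the same lock as the writers. */
    int hits(String sessionId) {
        synchronized (lock) {
            return hitsBySession.getOrDefault(sessionId, 0);
        }
    }
}

class SessionCacheDemo {
    public static void main(String[] args) throws InterruptedException {
        SessionCache cache = new SessionCache();
        // Two threads stand in for two near-simultaneous requests in one session.
        Runnable request = () -> {
            for (int i = 0; i < 1000; i++) {
                cache.recordHit("abc");
            }
        };
        Thread a = new Thread(request);
        Thread b = new Thread(request);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(cache.hits("abc")); // prints 2000
    }
}
```

Wrapping the map in Collections.synchronizedMap as well would add a second lock on every call without making this class any safer.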

Enough, enough already. I think you get the gist: multi-tasking is a specialist field and any demonstration should come with a 'don't try this at home' tag. Application servers attempt to make the multi-tasking invisible at the expense of performance.

I always like to give an example. Unfortunately, thread awareness is such a complex issue that it's impossible to give a simple valid example. The best I could find in my code was in a double-hash database index. Don't try and understand the code out of context - download it at The Adept Library if you want to do that. This is an extreme case of minimising synchronised sections. If the whole method were synchronised it would lock out other threads for a long time. Instead, only the bucket change is synchronised.

Note that the test is made twice - once to see if we need to do it and once inside the synchronised block to check that it has not changed. The odds of two threads deleting the same index at exactly the same time are probably billions to one, but it is not hard to protect against if you think on it. Because only the change is synchronised, care is taken in the rest of the code that reads are consistent even if the data under them changes. This takes thought, but is a lot more efficient than synchronising everything.

boolean delete( int hash, int record) throws IOException
  {
    int bucket = getBucket( hash);
    IntegerStack possibles = findInBucket( bucket, hash);
    int deleted = 0;
    int possible;
    while (! possibles.isEmpty())
      {
        possible = possibles.pop();
        if (index.file.getInt( possible) == record)
          {
            index.file.putInt( possible, -1);
            int secondaryBucketLocation
              = (hash >> primaryShift) & primaryMask;
            
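            // First test, made without the lock, so the common case stays cheap.
            if (secondaryBucketCounts[secondaryBucketLocation] >= 0)
              synchronized(this)
                {  // only if sharing secondary bucket hash
                 // Second test, made with the lock held, in case another
                 // thread changed the count in between.
                 if (secondaryBucketCounts[secondaryBucketLocation] >= 0)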
                  {
                      secondaryBucketCounts[secondaryBucketLocation]--;
                      secondaryBucketLocation
                        = primaryBuckets[secondaryBucketLocation];
                      
                      if (secondaryBucketLocation == -1)
                        // in the middle of a split
                          secondaryBucketLocation = afterSplitLocation;
                      
                      index.file.putInt( secondaryBucketLocation,
                        index.file.getInt( secondaryBucketLocation) - 1);
                    }
                }
            deleted++;
          }
      }
    return deleted > 0;
  }
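
For anyone who wants the test-twice idea without the index around it, here is a minimal sketch. SharedSlots and release are hypothetical names, and the rule is simplified to "decrement while the count is above zero". One assumption matters: counts in this sketch only ever go down, so a stale unsynchronised read can only be too high, and the second test under the lock catches that.

```java
// Hypothetical stand-in for the index's bucket counts; names are illustrative.
class SharedSlots {
    private final int[] remaining;

    SharedSlots(int[] initial) {
        this.remaining = initial.clone();
    }

    /** Gives back one unit of the slot; returns false if nothing is left. */
    boolean release(int slot) {
        // First test, without the lock: most calls with nothing to do stop here.
        if (remaining[slot] > 0) {
            synchronized (this) {
                // Second test, with the lock held: another thread may have
                // taken the last unit since the first test was made.
                if (remaining[slot] > 0) {
                    remaining[slot]--;
                    return true;
                }
            }
        }
        return false;
    }

    /** Synchronised read, so the caller sees the latest value. */
    synchronized int remaining(int slot) {
        return remaining[slot];
    }
}

class SharedSlotsDemo {
    public static void main(String[] args) throws InterruptedException {
        SharedSlots slots = new SharedSlots(new int[] {3});
        Runnable worker = () -> {
            for (int i = 0; i < 1000; i++) {
                slots.release(0);
            }
        };
        Thread a = new Thread(worker);
        Thread b = new Thread(worker);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(slots.remaining(0)); // prints 0, never a negative number
    }
}
```

If the count could also go up, the unsynchronised first test would no longer be safe as written and the field would need more care.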

Adept Software Development

Wednesday, June 08, 2005

Synchronized - Dos and Don'ts
