Adept Software Development

Adept: (A)pplication (D)evelopment (E)nterprise to (P)ersonal (T)ransition. It is a system I am developing to leverage Enterprise developer skills to produce stand-alone software for other market segments. This is a general software development blog discussing issues about project, architecture, design and development. The emphasis will be in Java, but many of the issues will be more general. Almost all will be technical.

http://marringtons.com

Friday, July 22, 2005

Coding Standards and Breaking the Rules - Part 1, Layout

Code Layout

I have always been a bit of a heretic when it comes to coding standards. I like, for example, to have the braces on their own lines with a double-indent before and after.
    if (instanceData == null)
      {
        instanceData = new InstanceData();
        /* more code goes here */
      }

    /* Code outside the if */

 

I have never had any trouble reading code using different layout styles - except that some styles are intrinsically a little harder to scan.
    if (instanceData == null) {
        instanceData = new InstanceData();
        /* more code goes here */
    }

    /* Code outside the if */

 

I do concede, however, that a single style should be used within a file, or at least within a method body. Otherwise it can be very confusing.

Having said that, I recognise that many people are militant about enforcing layout styles, and it is really not so important to me that I will make an issue for them.

Of course if you want a particular style, all the IDEs will reformat a file to your specifications. Don't do it to others. Many people get annoyed when their file is changed. More importantly, don't do it during the release phase of an iteration. Tracking changes is much harder when the whole file has changed.

Tuesday, July 12, 2005

What and where to document

Software developers believe in the holy grail of self-documenting code. Sorry, Sir Gawain - but it doesn't exist. Any developer who has gone back to maintain code written a month ago will often have as much difficulty understanding it as if it had been written by another. And yet still the developer will go to any length to write code but not comments. I know, I was one of the worst 'self-documenters' - and most developers who have worked for me still are.

My journey to code documentation nirvana started when I decided to port a 10 year old C++ library to Java. The names I had thought so descriptive then just baffled me now. They told me 'what'. Clear code structure told me 'how' - but nothing told me 'why'.

Every now and then over the years I find myself fixing code written by my team. It's a good productive form of code review. I used to just put a comment against the end of changed lines with my initials, date and a comment on what was changed in one short sentence. I felt that this way I was not confusing the flow of the code. Silly. Why a problem occurred and how it was fixed provide important information that must be placed in the code. Otherwise the next person in line will reverse your changes to 'fix' another problem.

So I love JavaDoc. I have always believed that good documentation should be in the code. It's the only place of any use to maintenance staff - and later users of the class. Have a look at my Adept library at http://library.marringtons.com/doc/javadoc. I hope you can see the difference between my documentation and the standard comments you see out there for open source projects. And internal commercial code documentation is not as good! Every good IDE uses the generated JavaDoc. When you pass the mouse over a method, you can get a description of why and how it's used - assuming it doesn't simply pop up with 'TODO'.

Good JavaDoc is great for 'how', but it's still not enough for 'why'. 'Why' is the result multiplied by 1.1512? 'Why' are we looping through a list to get a single value rather than asking the database for it directly? For almost every active line of code in an application there is a story. And next time you need to maintain that piece of code, the story will be of immense value.

Don't try and go back to document all that old code. Whenever you make a change, however, give us the gossip. You won't regret it.

I'll leave you with an example of how I write code now.

/**
 * Collect common instanceData for all Panel commands.
 * Called before any command is executed.
 * @see com.marringtons.adept.action.Action#setup()
 */
protected void setup()
  {
    /*
     * First we retrieve the session scope panel data -
     * including a list of panels (opened and closed).
     */
    sessionData
        = (SessionData) request.session.get( panelCacheKey);
    /*
     * The panel ID can be from target or id HTML tag
     * parameters.
     */
    String id = parameters.get( "target", parameters.get( "id"));
    /*
     * If the command is not related to a particular
     * window then we cannot continue.
     */
    if (id == null)
      return;
    /*
     * We cannot be sure whether the id starts with
     * 'panel.' or not. It depends on where it was
     * generated in the JavaScript.
     */
    if (id.startsWith( "panel."))
      id = id.substring( 6);
    /*
     * We need to remove the ID from the parameters
     * so it does not add 'panel.' back.
     */
    parameters.remove( "id");
    /*
     * Given the id of a panel, retrieve it from a
     * panel instanceData structure in the session.
     * This instanceData can be updated and persists
     * between program runs.
     */
    instanceData = (InstanceData) sessionData.portals.get( id);
    if (instanceData == null)
      {
        instanceData = new InstanceData();
        instanceData.id = instanceData.title = id;
        instanceData.url = "";
        instanceData.showTab =
            instanceData.showBorder =
            instanceData.showShadows =
            instanceData.allowResize = true;
        instanceData.x = (nextX += 50);
        if (nextX > 600) nextX = 0;
        instanceData.y = (nextY += 50);
        if (nextY > 400) nextY = 0;
        instanceData.width = instanceData.height = 200;

        sessionData.portals.put( id, instanceData);
        instanceData.focusOrder = instanceData.openOrder
            = ++sessionData.focusCounter;
      }

    /*
     * Move parameters from the command line into
     * the instance data.
     */
    ObjectScraper.fromProperties( instanceData, parameters, null);
  }

 

Saturday, July 02, 2005

Application Servers vs CGI - why there is no clear winner

With CGI becoming redundant as time passes and technology moves on, it may seem a little dated to be comparing it with application servers. Bear with me - because now is the perfect time to look at the differences and to clearly understand what we are leaving behind. A retrospective will help us make best use of current and future technologies. For the sake of completeness I'll describe the technologies - from my point of view, of course.

What is CGI?

When the web was new and we still printed out on stone tablets, the first HTML web browsers and servers were created (and Al Gore looked onto all that he thought he had created, and it was good). The first servers just sent files directly to the browser for interpretation. This worked fine while the web was full of static content such as documents and images. Information dissemination is still the primary purpose of the internet. However, the original web was created and used by technophiles who soon wanted their web servers to do something dynamic - like show the time of day, or list information from a database.

What would be the simplest and most flexible way to do this? Ah yes! To treat the browser as just another terminal client to the Unix server. If the command that the browser gave the web server was of a registered 'CGI' type, it ran as a standard operating system command, just as if it were typed in at a command prompt. The output that would normally be sent to the terminal was fired right back to the browser. The benefits were immediately apparent:

  1. The creator could use whatever computer language they were comfortable with: shell scripts, C, C++, Basic, Perl or any other program that can be run from a command line.
  2. Debugging was easy - one could simply run the script in the command line and look at the output.
  3. It worked the same way as other programs and scripts these early implementors were used to.
The only down-side was that the CGI program had to be bright enough to format its responses in a way that the browser would understand. This meant it needed to send a correct HTTP header as well as the HTML content. This was (and still is) simple stuff, easily solved with trivial library routines.
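As a minimal sketch of that contract, here is a CGI-style program in Java. The class name `TimeOfDayCgi` and the helper `render` are made up for illustration. The only part taken from the CGI convention is the shape of the output: header lines, a blank line, then the HTML, all written to standard output.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical CGI-style program: the web server would run it once per request. */
class TimeOfDayCgi {

    /** Builds the complete response: header line, a blank line, then the HTML body. */
    static String render(LocalTime now) {
        String time = now.format(DateTimeFormatter.ofPattern("HH:mm:ss"));
        return "Content-Type: text/html\r\n"
             + "\r\n"
             + "<html><body><p>The time is " + time + "</p></body></html>\n";
    }

    public static void main(String[] args) {
        // Whatever is printed to standard output is what the browser receives.
        System.out.print(render(LocalTime.now()));
    }
}
```

Running the class from a command prompt prints exactly the text the browser would be sent, which is the debugging convenience described in point 2.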

What is an Application Server?

Well, technically even the first web server was an application server. It knew how to interpret a browser command, read files from disk, add an appropriate HTTP header and send it back to the browser. The one thing missing was programmability - the ability to give it instructions that will change the data to be sent to the browser.

The first attempt at programmability was with CGI as described above. There were real and perceived problems with CGI that led to the invention of the first true application servers.

An application server is a program that runs continually, in parallel with the web server. The web server knows how to interpret requests for itself and for any application servers attached to it. The application server translates the string commands sent to it into calls to specific methods, functions or code, with attached parameters.
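To make the idea of turning a command string into a method call concrete, here is a rough Java sketch. `CommandDispatcher`, `register` and `dispatch` are hypothetical names, and the `name?key=value` command format is an assumption chosen for brevity. A real server would also decode URL escapes, which this sketch does not.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical dispatcher: maps a command string to code already loaded in memory. */
class CommandDispatcher {

    private final Map<String, Function<Map<String, String>, String>> handlers = new HashMap<>();

    /** Registers the code to run for a command name. */
    void register(String command, Function<Map<String, String>, String> handler) {
        handlers.put(command, handler);
    }

    /** Parses "name?key=value&key=value" and calls the matching handler. */
    String dispatch(String request) {
        String[] parts = request.split("\\?", 2);
        Map<String, String> parameters = new HashMap<>();
        if (parts.length == 2 && !parts[1].isEmpty()) {
            for (String pair : parts[1].split("&")) {
                String[] keyValue = pair.split("=", 2);
                parameters.put(keyValue[0], keyValue.length == 2 ? keyValue[1] : "");
            }
        }
        Function<Map<String, String>, String> handler = handlers.get(parts[0]);
        return handler == null ? "404 unknown command: " + parts[0] : handler.apply(parameters);
    }

    public static void main(String[] args) {
        CommandDispatcher server = new CommandDispatcher();
        server.register("greet", p -> "Hello, " + p.getOrDefault("name", "world"));
        System.out.println(server.dispatch("greet?name=Ada"));
    }
}
```

The point of the sketch is that no new program is started per request: the handler table stays in memory for the life of the server process.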

The term application server was coined by the corporate world. Microsoft sport an ASP (read embedded VB) front end with C# for computational work in the .NET framework. IBM, Sun and others sport J2EE: JSP (embedded Java) for the front end, and Java on a J2EE-compliant application server for the heavy stuff.

The smaller Internet sites prefer a simpler approach. PHP is an embedded-only system that is still, by definition, an application server. While Perl is normally used for CGI, there is a module for the Apache web server called mod_perl that turns it into a basic application server.

CGI Benefits

  1. Easier to develop in. Since each browser/server exchange runs a unique program, it can be implemented and tested in almost complete isolation. This makes it simple to implement a web site page by page.
  2. No garbage to collect. Each browser/server exchange runs a separate program that does its job and exits in a fraction of a second. There are no opportunities for memory leaks or memory hogging code to clutter up a system and bring it to a grinding halt.

CGI Disadvantages

  1. Resource Intensive. Each exchange runs a new program! For an operating system to run a program, it must execute some expensive operations. Every type of operating system has a latency when a program starts, so for CGI this latency adds to each exchange. However, it's important to note that the latency is minimised because the CGI interpreter will be cached when loaded so regularly.
  2. No Pooling. Because CGI runs an independent program for each exchange, there is no opportunity to pool resources that are expensive to get. The classic example is database connections. On large commercial relational databases getting a connection can take seconds. Most application servers pool connections and reuse them when needed. CGI provides no such facilities.
  3. Limited Session Data. The only way to store data to be used between exchanges is in cookies held by the browser and sent as part of the HTTP header in each exchange. The amount and format of information stored this way is limited both by practicality and by the browser. A typical restriction is 20 cookies of no more than 4 KB each.
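For contrast with item 2, the kind of pooling a long-running server can offer might look roughly like the Java sketch below. `SimplePool` and its methods are hypothetical names, not taken from any particular application server, and a production pool would also validate and replace broken connections.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool for resources that are expensive to create. */
class SimplePool<T> {

    private final BlockingQueue<T> idle;

    /** Pays the creation cost once, up front, for every pooled resource. */
    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Hands out an idle resource, waiting if all of them are in use. */
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a resource", e);
        }
    }

    /** Returns a resource so that the next exchange can reuse it. */
    void giveBack(T resource) {
        idle.offer(resource);
    }

    /** Number of resources currently idle. */
    int available() {
        return idle.size();
    }
}
```

A pool like this only works because the process outlives a single exchange. A CGI program exits after each request, so it has nowhere to keep the idle connections.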

Application Server Benefits

  1. Session Data Available. Most application servers use a single cookie or URL parameter to hold a unique session key. This key can be used to return a session data structure on the server. Size is limited only by the system. If you expect 100 simultaneous sessions to be running on a 2 GB server with 1 GB allocated to the application server, then you will not want sessions to exceed 10 MB. In practice unique session data is far smaller than that. More importantly, because the data is held in server classes (whether Java, C# or whatever), it is not subject to the restrictions placed on text-only cookies.
  2. Pooling is possible. Outside services such as databases and workflow engines are expensive to connect to: a single connection can take seconds to establish. Most application servers implement a pool for this situation. Once a session or conversation is finished with a connection, it's returned to the pool for someone else to use.
  3. Fast Exchange Startup. A CGI exchange requires that a program be run every time the browser requests a conversational exchange. For an application server it is merely the interpretation of a command to call an internal method or function. In theory, it should be quite a bit faster.
  4. Scalable. The additional control provided by an application server allows it to be designed to work across one to many physical servers.
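The session mechanism in item 1 can be sketched in a few lines of Java. `SessionStore`, `create`, `find` and `budgetBytes` are hypothetical names. The sketch assumes the key travels in a cookie or URL parameter, as described above, and it leaves out session expiry altogether.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical server-side session store keyed by an opaque session key. */
class SessionStore {

    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    /** Creates an empty session and returns the key to send in a cookie or URL parameter. */
    String create() {
        String key = UUID.randomUUID().toString();
        sessions.put(key, new ConcurrentHashMap<>());
        return key;
    }

    /** Looks up the session data for a key, or returns null if the key is unknown. */
    Map<String, Object> find(String key) {
        return sessions.get(key);
    }

    /** Rough per-session budget: memory given to the server divided by expected sessions. */
    static long budgetBytes(long serverBytes, int expectedSessions) {
        return serverBytes / expectedSessions;
    }
}
```

Only the short key crosses the wire. The values are ordinary Java objects held on the server, so they are not bound by cookie size or text-only limits, and `budgetBytes(1_000_000_000L, 100)` reproduces the 10 MB figure from item 1.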

Application Server Disadvantages

  1. Memory and Resource Leaks. Typically an application server is a program that runs for days or even months between restarts. If a program has memory leaks (and yes, this is possible with garbage collection), it can cause a system to run slower as time goes by. Resource leaks can happen occasionally in rarely occurring logic conditions, causing problems that only occur in production.
  2. RAM Fragmentation. Garbage collectors cause RAM fragmentation. Like memory leaks it causes the program to run slower over time. A good garbage collector will clean itself up, but it's next to impossible to avoid increasing fragmentation over time as long-lived objects break up the physical RAM.
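For anyone who doubts that a garbage-collected program can leak, here is a deliberately broken Java sketch. `LeakyHandler` is a hypothetical class. The bug is that every request buffer stays reachable from a static list, so the collector is never allowed to free it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical request handler that leaks memory despite garbage collection. */
class LeakyHandler {

    // Lives as long as the server process; nothing ever removes entries.
    private static final List<byte[]> history = new ArrayList<>();

    /** Handles one exchange and (mistakenly) keeps its buffer reachable forever. */
    static int handle(int requestSize) {
        byte[] buffer = new byte[requestSize];
        history.add(buffer); // still referenced, so the collector cannot free it
        return history.size();
    }

    /** Bytes pinned in memory by past requests. */
    static long retainedBytes() {
        long total = 0;
        for (byte[] buffer : history) {
            total += buffer.length;
        }
        return total;
    }
}
```

In a server that runs for months, the retained total only ever grows. The same mistake in a CGI program would be harmless, because the process and its list disappear at the end of each exchange.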

Which is Faster?

Common sense tells us that an application server should be much faster than CGI. Imagine running a program every time we have an exchange between browser and server. Empirical evidence does not support the theory. In practice CGI systems doing similar jobs provide similar levels of performance. Why? The overhead of program start-up in CGI is offset by the speed of running 'clean' programs. CGI writers also tend to write for performance when dealing with data. Simple hash databases such as Berkeley DB are much faster than large relational systems. A CGI writer is also more likely to create static pages offline if they change once a day or less, rather than generating them from data on the fly.

Which is Better?

All my architectural training tells me that application servers are the way to go. So, why can I produce a CGI system more quickly that is easier to release and maintain?

And the Winner is...

Application servers do come out on top, but for the most part because they're politically correct. The availability of session data is a convenience, but resource pooling can sometimes be essential. There are valid work-arounds for both problems in CGI, but these cannot overcome the pressures of 'correct' architecture.