Adept Software Development

Adept: (A)pplication (D)evelopment (E)nterprise to (P)ersonal (T)ransition. It is a system I am developing to leverage Enterprise developer skills to produce stand-alone software for other market segments. This is a general software development blog discussing project, architecture, design and development issues. The emphasis will be on Java, but many of the issues are more general. Almost all will be technical.

http://marringtons.com

Wednesday, March 23, 2005

Data Lifetimes

Our first experience with data lifetime usually comes from a beginner's handbook written around a particular language. Out of necessity these books tend to focus on a single-user, single-threaded application. In a nutshell:

Code Block Local: Early C only allowed local data to be declared at the start of a block, usually the top of the function. C++ allows data to be defined anywhere in the function, but typically allocates it when the function is called. Unlike other forms of data, block local data doesn't have a default value, so it must be set before it is accessed. This is because in the "old days" such data was kept on the processor's call stack: it was quick to allocate and free, and most processors had specific instructions to read and write such data quickly. The data goes out of scope when its block ends, at the latest when the method/function returns.
Instance Local: Fields that aren't static are created when an instance of a class/struct is created. They exist until the enclosing structure is freed or garbage collected.
File Local: Data or classes that aren't inside another structure are file local. In Java, file local is seldom used, and even then it can only be a non-public (package-private) class; Java does not allow a top-level class to be private. A class counts as data here because its static fields are instantiated when the class is loaded and initialised. The data lives for the life of the program unless explicitly cleared.
Class Local: Fields declared static are instantiated only once, when the class is loaded. This is the way to produce singletons, which are very useful for caches and constant data. The data lives for the life of the program unless explicitly cleared.
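To make the four lifetimes concrete, here is a small Java sketch. The class names Lifetimes and Helper are invented for illustration; the comments map each variable to the categories above.

```java
import java.util.ArrayList;
import java.util.List;

class Lifetimes {
    // Class local: one copy, created when the class is initialised and kept
    // for the life of the program unless explicitly cleared.
    static final List<String> CACHE = new ArrayList<>();

    // Instance local: one copy per object, kept until the object is garbage collected.
    private int callCount;

    int countCall() {
        // Code block local: has no default value (the compiler rejects a read
        // before assignment) and goes out of scope when the method returns.
        int next = callCount + 1;
        callCount = next;
        return next;
    }

    public static void main(String[] args) {
        Lifetimes a = new Lifetimes();
        Lifetimes b = new Lifetimes();
        a.countCall();
        a.countCall();
        b.countCall();
        CACHE.add(Helper.describe(a.callCount, b.callCount));
        System.out.println(CACHE); // prints [a=2, b=1]
    }
}

// "File local": a second top-level class in the same source file.
// Java allows it to be package-private (no modifier) but never private.
class Helper {
    static String describe(int a, int b) {
        return "a=" + a + ", b=" + b;
    }
}
```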

Having said that, this article is really about data lifetimes in more complex environments. Any type of services architecture, including web services, requires that information be kept for conversations and sessions.

Every interaction with the program can be considered a conversation. This could be a call from client to server, or a POST or GET request from a web browser. The conversation is complete once the program has responded; for the browser, this is when the next page is displayed. A session lasts for as long as a single client interacts with the system, and typically encompasses many varied conversations.

So think of your program as a chocolate layer cake (the flavour isn't significant; I just happen to like chocolate, so there!). The cake layers are the application tiers: the GUI on top (with plenty of icing, I hope), then business logic, then persistence. The cream between the layers represents the interfaces. Objects are parts of the cake contained completely within one layer; for the sake of the analogy, imagine them as almonds. A conversation is like sticking a skewer through the cake to see if it's cooked. If any object wants to tell the holder of the skewer that the cake is uncooked, it leaves raw cake sticking to the skewer. The session is the person holding the skewer. The session/person notes how many objects respond to each conversation by leaving cake on the skewer, and compares the results of successive conversations to decide how long the cake has left to bake.

Conversation Data

A conversation starts with a client making a request on the program/server. Because this is usually a sequential operation, conversation-specific data is rarely required, although I personally find it excellent for messages. As each object gets involved in the conversation, there is an opportunity for a problem to arise. Some problems, such as validation failures and informational messages, should be displayed to the user at the end of the conversation. I actually use the same system for exceptions and errors. I believe it's better to say to the user "Oops, you've encountered a problem" than to display an unsightly and uninformative stack dump.

Traditionally, conversation data is passed to the service methods as a parameter and passed down through the tiers as necessary. This sours the cake because it reduces the independence of said modules and is of no use for methods that don't have the parameter passed.

A conversation is almost always a sequential set of steps from initiation to reply. In a typical server this means it runs in a single thread, and that thread does nothing else but service the conversation to completion. So data keyed on the thread, and cleared at the start of every conversation, will work as conversation data. The Adept Java Library has a class com.marringtons.util.ThreadData that provides the code necessary to retrieve and update conversation-specific data.
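A minimal sketch of thread-keyed conversation data, assuming a server that hands each request to one pooled thread. It uses the standard java.lang.ThreadLocal; the classes ConversationData and AccountValidator and their methods are hypothetical and are not the API of com.marringtons.util.ThreadData.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical conversation-data holder (not the Adept library's API).
final class ConversationData {
    // ThreadLocal gives each thread its own message list: data "keyed on the thread".
    private static final ThreadLocal<List<String>> MESSAGES =
            ThreadLocal.withInitial(ArrayList::new);

    private ConversationData() {}

    // Call at the start of every conversation: pooled threads are reused,
    // so anything left over from the previous request is discarded.
    static void start() {
        MESSAGES.remove();
    }

    // Any tier can add a message without a parameter being passed down to it.
    static void addMessage(String message) {
        MESSAGES.get().add(message);
    }

    // Read by the top tier when the conversation completes.
    static List<String> messages() {
        return Collections.unmodifiableList(MESSAGES.get());
    }
}

// A hypothetical lower-tier class reporting a validation problem.
class AccountValidator {
    boolean validate(String accountId) {
        if (accountId.isEmpty()) {
            ConversationData.addMessage("Please enter an account number.");
            return false;
        }
        return true;
    }
}
```

In this sketch the request entry point (a servlet filter, say) would call ConversationData.start() before dispatching and read messages() when building the reply; calling start() again in a finally block also stops idle pooled threads from holding stale data.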

Session Data

A session starts when a client first accesses the server (anonymously) or logs in. Most web application servers, such as Tomcat, maintain a session with the client browser. The class/method called to process the request from the browser has easy access to the session, but problems arise when code in the lower tiers needs to keep session-related information. Such code often needs access to authorisation or environmental information, and sometimes needs a history of its own usage.

In both cases a class in a lower tier needs access to a session dictionary (map). Session and environment data need well-known keys, while a class's local 'memory' can use the class name as a key. Examples? A service may use the client's name to customise a message, or to get user-specific data such as a history of transactions. Local data for a class that is to live for the life of a session can cache information that does not change often for the user. You might, for example, wish to generate a menu tree specific to the current user. By caching it as a private session variable, it only needs to be created once per login. Another equally valid question is "How?". The answer depends on your application container.
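The two key conventions might look like this. SessionKeys, MenuService and the key string are hypothetical; how the session map is located is deliberately left to the infrastructure and modelled as a java.util.function.Supplier injected once, rather than as a parameter on every call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical well-known keys for data shared by all tiers.
final class SessionKeys {
    static final String USER_NAME = "session.userName";
    private SessionKeys() {}
}

class MenuService {
    // Private session "memory": keyed on this class's own name.
    private static final String CACHE_KEY = MenuService.class.getName();
    private final Supplier<Map<String, Object>> currentSession;
    int builds; // counts expensive builds, to show the cache working

    MenuService(Supplier<Map<String, Object>> currentSession) {
        this.currentSession = currentSession;
    }

    @SuppressWarnings("unchecked")
    List<String> menuTree() {
        Map<String, Object> session = currentSession.get();
        Object cached = session.get(CACHE_KEY);
        if (cached == null) { // built once per login, then served from the session
            cached = buildMenu(session);
            session.put(CACHE_KEY, cached);
        }
        return (List<String>) cached;
    }

    private List<String> buildMenu(Map<String, Object> session) {
        builds++;
        List<String> menu = new ArrayList<>();
        menu.add("Welcome, " + session.get(SessionKeys.USER_NAME)); // shared data
        menu.add("Transaction history");
        return menu;
    }

    public static void main(String[] args) {
        Map<String, Object> session = new HashMap<>();
        session.put(SessionKeys.USER_NAME, "Alice");
        MenuService service = new MenuService(() -> session);
        service.menuTree();
        service.menuTree();
        System.out.println(service.builds); // prints 1
    }
}
```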

It can be achieved without any infrastructure by passing a reference to the session data down through the tiers as a parameter to the method calls. This always works, and I have worked on both client/server and J2EE projects where it was used, but I have always thought this method a bit tacky. It always seems that the methods that need it most are the ones that do not have access to it. My preferred method is to provide infrastructure.

As long as you have access to some connection to the original request you can gain access to session data from a dictionary of dictionaries.

For Tomcat or similar 'simple' servers, when a conversation starts, record the session information in a dictionary keyed on the thread ID. Use an ageing cache for the sessions themselves: HTTP is a stateless protocol, so there is no signal when a client goes away. Remove session data after a timeout.
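A sketch of the dictionary-of-dictionaries approach for a single-JVM server, under these assumptions: a ThreadLocal stands in for the dictionary keyed on thread ID, session IDs come from the container, and the caller passes in the current time (System.currentTimeMillis() in practice) so the ageing is easy to test. SessionRegistry and its methods are hypothetical, not com.marringtons.util.SessionData.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: session ID -> session dictionary, plus a thread binding.
final class SessionRegistry {
    private static final class Entry {
        final Map<String, Object> data = new ConcurrentHashMap<>();
        volatile long lastAccessMillis;
    }

    private final Map<String, Entry> sessions = new ConcurrentHashMap<>();
    private final ThreadLocal<Entry> current = new ThreadLocal<>();
    private final long timeoutMillis;

    SessionRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called by the entry point (e.g. a servlet filter) when a conversation starts.
    void bind(String sessionId, long nowMillis) {
        Entry entry = sessions.computeIfAbsent(sessionId, id -> new Entry());
        entry.lastAccessMillis = nowMillis;
        current.set(entry);
    }

    // Called when the conversation ends, so a pooled thread keeps no stale binding.
    void unbind() {
        current.remove();
    }

    // Any tier can reach the current session dictionary without a parameter.
    Map<String, Object> session() {
        Entry entry = current.get();
        if (entry == null) {
            throw new IllegalStateException("No conversation is bound to this thread");
        }
        return entry.data;
    }

    // Ageing: drops sessions idle for longer than the timeout; returns how many.
    int expire(long nowMillis) {
        int before = sessions.size();
        sessions.values().removeIf(e -> nowMillis - e.lastAccessMillis > timeoutMillis);
        return before - sessions.size();
    }
}
```

A background timer calling expire() every minute or so would be one way to drive the ageing; the sketch leaves scheduling to the container.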

For J2EE servers life can become even more complicated. If you can be sure that all your EJBs will run in a single container, then a method similar to the one above will work. If not, we have a problem. Don't revert to passing all session data as a parameter: RMI calls between servers are expensive, and passing additional unused data is inefficient.

One solution is to have a consolidation layer between your GUI support layer and the EJBs. All EJB calls are fine-grained and are passed only the information they need to do a specific task. The other way is to use stateful session beans: the container manages a connection to a single client, allowing you to keep data between calls. While stateful session beans are not as evil as many developers make out, turning your whole application into a box of stateful session beans may not be the best way to go. Besides, they can only hold session data private to the bean, not common session data. So, as my contribution, the Adept Java Library has a class com.marringtons.util.SessionData that provides the code necessary to retrieve and update session-specific data for single-CPU servers.
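A rough sketch of the consolidation-layer idea, with a plain Java interface standing in for a remote EJB interface. AccountService, AccountConsolidator and the session keys are all hypothetical; the point is that only the two values the service needs cross the (potentially remote) call.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a fine-grained remote interface: two strings, not the whole session.
interface AccountService {
    String closeAccount(String accountId, String requestedBy);
}

// Consolidation layer: holds the session and extracts only what each call needs.
class AccountConsolidator {
    private final Map<String, Object> session;
    private final AccountService service;

    AccountConsolidator(Map<String, Object> session, AccountService service) {
        this.session = session;
        this.service = service;
    }

    String closeCurrentAccount() {
        String accountId = (String) session.get("currentAccountId");
        String userName = (String) session.get("userName");
        return service.closeAccount(accountId, userName);
    }

    public static void main(String[] args) {
        Map<String, Object> session = new HashMap<>();
        session.put("currentAccountId", "A-17");
        session.put("userName", "Alice");
        AccountService stub = (accountId, requestedBy) -> accountId + " closed by " + requestedBy;
        System.out.println(new AccountConsolidator(session, stub).closeCurrentAccount());
        // prints "A-17 closed by Alice"
    }
}
```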

Thursday, March 17, 2005

Code Generalisation - The Do's and Don'ts

Ever since we started developing software there have been calls for code reusability. First there was (and still is) the library. In the '80s we talked about the 'black box', meaning component objects where only the interfaces were published. Later COM extended this principle. Then came object-oriented design and we talked objects. Now we have beans, ActiveX components, EJBs, applets, scriptlets and a myriad of other ways to provide code for reuse.

Even when technologies work together, their views of generalisation differ. For example, an EJB uses objects. Conventionally, objects can have instance and common (static) data. Objects used by EJBs, however, can have separate 'common' data: uncommon data. I digress. This article is about when to write specific code and when to generalise.

Why generalise code? There are two valid reasons:

Code reuse
Code clarity

Generalisation for Code Clarity

Let's take clarity first because it is the easiest. Clarity is paramount. Self-documenting code is far easier to maintain than a long stream of unrelated groups of statements.

public Account getAccount()
  {
    User user = getUser();
    Account account = readAccount( user);
    updateTransactions( account);
    return account;
  }

private Account readAccount( User user)
  {
    // Connect to the account system, then retrieve and translate the account details.
    // ... lots of technical code ...
  }

private void updateTransactions( Account account)
  {
    // Retrieve recent transactions and update the account details accordingly.
    // ... lots of technical code ...
  }

The public getAccount() method clearly tells us, functionally, what is involved in retrieving account details, and the method names match the business requirement. The private methods readAccount() and updateTransactions() are never used elsewhere, but they remove implementation details from the functional code. It also makes sense to keep functional code and implementation code in separate objects, quite possibly in different application tiers.

In short, code separation for clarity is the main use for generalisation techniques and should be practiced constantly.

Generalisation for Code Reuse

Everyone leaves university believing that every line of code they write is sacred and will be used over and over again (or was I really that pig-headed?). Unfortunately no-one is taught how the rest of the world will learn to use these new pearls of the developer's art. In fact, there are substantial benefits in writing code specific to the task:

It's clearer because the internals are not generalised (accountKey instead of key).
It's more concise, because good generalised code must take into account conditions that would never occur in a specific instance. Why check for a null parameter in a specific method when its one caller cannot, under any conditions, pass a null? A general method must cover outcomes that are not obvious to any one caller.
For the same reason, it's faster to write: since we can design the internals to match the known user, we don't have to wrap our heads around all the possible uses our new code could be put to.
It's easier to maintain because there is no fear that changing the code will cause other callers to behave differently. How often have we seen code that relies on quirks of a known interface rather than just its published uses? How often does this happen by accident?
It's easier on system testing, since changes to generalised code are more likely to require broad regression testing.

For the sake of impartiality, here's the argument for code reuse:

Changes are made in one place and affect all callers.
Smaller code base.
Behaviour is consistent across callers.

Hmm, do we see a trend here? Personally I follow this checklist:

If I do not know of another use for the code I will write it in a way totally specific to the requirement.
If I suspect that other parts of the application are likely to use the same or similar code, I will take care that the code involved is fairly self-contained. I will also take care that this does not take extra time. There will be no general interface or other non-specialised code.
When a second caller requires nearly or completely identical code, I will review the common code and refactor it as required. It should go no higher up the object tree than the common need.
If I identify the need for a low-level common object I will be tempted to take the time to create it. I do not, however, add a more general interface than I need. Why account for float and double parameters when you only ever use the int ones? Only when the additional functionality is needed will I update the library class.
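A hypothetical illustration of the last two checklist points: two statement methods needed the same formatting, so it was refactored into a single private helper at the lowest level both share. The helper takes only the int the callers actually pass; StatementPrinter and the eight-digit format are invented for the example.

```java
// Hypothetical class: the shared code lives no higher than the common need.
class StatementPrinter {
    String header(int accountNumber) {
        return "Statement for account " + formatAccountNumber(accountNumber);
    }

    String overdueNotice(int accountNumber) {
        return "Account " + formatAccountNumber(accountNumber) + " is overdue";
    }

    // Shared by the two callers above. No long or double overloads, and no
    // check for negative numbers, because neither caller needs them (yet).
    private static String formatAccountNumber(int accountNumber) {
        return String.format("%08d", accountNumber); // zero-padded to 8 digits
    }
}
```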

Pitfalls of Early Generalisation

You'll spend excessive time adding tests and interfaces that will not be used in case they are needed later.
You'll end up with code that has an excessive number of if() statements or similar branches to cater for different clients.
You'll have obscure object inheritances making it difficult to find who is doing what.

Do you want to see a beauty?

public static boolean isSet(Object o) {
    if (o == null) {
        return false;
    } else if (o instanceof Boolean) {
        return isBooleanSet((Boolean) o);
    } else if (o instanceof String) {
        return isStringSet((String) o);
    } else if (o instanceof Long) {
        return isLongSet((Long) o);
 ...

This one might be useful if the calling code did not know the type of the object, but every caller in the project that uses this method does!
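Since every caller knows its types, a simpler alternative is for each caller to use a small type-specific check directly. The definitions of "set" below are assumptions, because the bodies of the original isStringSet() and isBooleanSet() are not shown; SetChecks and CustomerForm are made-up names.

```java
// Hypothetical type-specific checks: no instanceof chain, no Object parameter.
final class SetChecks {
    private SetChecks() {}

    // Assumption: a String counts as "set" when it has non-blank content.
    static boolean isStringSet(String s) {
        return s != null && !s.trim().isEmpty();
    }

    // Assumption: a Boolean counts as "set" only when it is TRUE.
    static boolean isBooleanSet(Boolean b) {
        return Boolean.TRUE.equals(b);
    }
}

// A made-up caller: it knows its field types, so it calls the matching check.
class CustomerForm {
    String name;
    Boolean acceptedTerms;

    boolean isComplete() {
        return SetChecks.isStringSet(name) && SetChecks.isBooleanSet(acceptedTerms);
    }
}
```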

Code Generalisation Methods

The simplest and most common is at the method level, internal to an object. While creating a class we see a use for some code elsewhere and refactor it into a private method so that both places can call it. This usually also makes the calling method easier to read.

Subclassing can be used to place generalised code in a parent class, to be used by children when needed. While the code is not as visible as when it is in the working class, it is clearly associated with the object hierarchy. The same method can be used to separate functional from implementation code, with the restriction that Java only allows single inheritance.
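One common way to hold generalised code in a parent class is the Template Method pattern: the parent owns the shared steps and each child supplies only what differs. ReportGenerator and OverdueAccountsReport are hypothetical names, and List.of needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Parent class: the general code every report shares.
abstract class ReportGenerator {
    final List<String> generate() {
        List<String> rows = rows(); // fetched once, used twice
        List<String> lines = new ArrayList<>();
        lines.add("== " + title() + " ==");
        lines.addAll(rows);
        lines.add("(" + rows.size() + " rows)");
        return lines;
    }

    // The specific parts each child must supply.
    protected abstract String title();
    protected abstract List<String> rows();
}

// Child class: only the report-specific details.
class OverdueAccountsReport extends ReportGenerator {
    @Override
    protected String title() {
        return "Overdue accounts";
    }

    @Override
    protected List<String> rows() {
        return List.of("A-17", "B-03");
    }
}
```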

Helpers are separate objects, or static methods in a separate class, that provide common code. A modern code library is a collection of helpers. Care must be taken with helpers to ensure that all developers know of their existence. Because they are not physically connected to a class (as in inheritance), they are often lost, leading to inconsistencies and code duplication.
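A typical helper is a final class of static methods with a private constructor, so it can be neither instantiated nor subclassed. The Javadoc comment is there because discoverability is the weak point of helpers. AccountNumberHelper is hypothetical, and String.repeat needs Java 11 or later.

```java
/**
 * Hypothetical helper for account numbers. Search here before writing
 * another masking or formatting routine.
 */
final class AccountNumberHelper {
    private AccountNumberHelper() {} // static helper: no instances

    /** Hides all but the last four characters, e.g. "12345678" becomes "****5678". */
    static String mask(String accountNumber) {
        int visible = Math.min(4, accountNumber.length());
        int hidden = accountNumber.length() - visible;
        return "*".repeat(hidden) + accountNumber.substring(hidden);
    }
}
```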

A bean is an independent item with a clear interface that can be used to ask it questions or have it perform actions. A bean is, in truth, the implementation of the software black box.
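In Java the most familiar concrete form of this idea is a JavaBean: a public no-argument constructor, get/set property methods, and property-change events as its published interface. ThermostatBean is a made-up example; PropertyChangeSupport is the standard java.beans helper class.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Hypothetical bean: clients see only properties and change events.
class ThermostatBean {
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private int targetCelsius = 20;

    public ThermostatBean() {}

    // "Ask it questions": a read-only view of the property.
    public int getTargetCelsius() {
        return targetCelsius;
    }

    // "Have it perform actions": set the property and notify listeners.
    // PropertyChangeSupport fires only when the value actually changes.
    public void setTargetCelsius(int value) {
        int old = targetCelsius;
        targetCelsius = value;
        changes.firePropertyChange("targetCelsius", old, value);
    }

    public void addPropertyChangeListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener(listener);
    }

    public void removePropertyChangeListener(PropertyChangeListener listener) {
        changes.removePropertyChangeListener(listener);
    }
}
```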

How to Find General Code - The Unanswered Question

Code generalisation is a wonderful thing. It attracts designers and developers like moths to a flame. But, to carry on with the metaphors, there is a fly in the ointment. No-one has found an even marginally successful method of documenting common code so that potential users know it exists. Sure, we all familiarise ourselves with the core libraries of the packages we use (do we?). We'll also look for libraries that fill our needs. The problem arises within a project. Most developers will build a component for a complex system by finding a similar component and duplicating its functionality. Common code may be pushed up the inheritance tree or refactored into helpers, but unless the team is small and tightly knit or communication is very good, only a small percentage of the developers will make use of the new tools provided. Enforcing clear javadoc helps, if it is read. What other techniques are useful?

Monday, March 07, 2005

Unit Testing - Good for All

I am an avid unit test supporter. For my own projects I write unit tests at all levels for all classes; I don't consider a class complete without a matching unit test. But I get the bigger picture: because it's my code I see it from the perspective of developer, architect, designer, tester and stake holder. In the corporate world things are a little different. Unit testing is starting to see wide acceptance, but at best as a necessary evil.

The Developer sees it as a waste of development time. A working, reproducible unit test can easily double the testing time.
The Architect ignores them as not his problem.
The Development Manager has to continually balance schedules and decide whether there is time to write 'correct' tests.
The Designers don't want to know about them.
The Project Manager does not want to have to justify the push-out to the schedule they cause.
The Test Manager is interested only in how many tests there are and whether they have all passed, and generally only in adding to that tally.
The Stake Holders see no value in them and resent the potential effects to time and budget.

For unit testing to reach its full potential, each and every group needs to see the value to themselves and the project as a whole.

The Developer has the most involvement in creating and maintaining unit tests. Surprisingly, developers often feel that it gives them the least gain. This could not be further from the truth. Changing their development style to encompass unit testing provides important gains:
1. Perspective: The developer gets to exercise all the functionality of any given class in the way that they see its clients using it. By writing a unit test for each class before it is ever called from elsewhere, you get the all-important second perspective on the code design phase, often revealing aspects that could be changed or improved.
2. Javadoc: Unit test code works well for examples of use in the Javadoc.
3. Self Documentation: Developers commonly use classes and packages by example. The best and least likely to be abused examples will be in the unit tests attached to the class. If you need to use the target class in a different way, update the unit test first - both to test the usage and to provide a valid example for the future.
4. Level of Confidence: The developer can release the class for use knowing it meets a clear set of tests. Since these tests run regularly, they can also be confident that changes to the code or its dependencies do not affect the expected uses of the class.
5. Issue Resolution: When problems are reported in testing, reproducing them can be quite convoluted. By updating the unit test to reproduce the error, it is easier to debug and prove fixed.
6. No Error Deja Vu: In projects without unit tests it is common for a problem to be fixed and then return in the next release. If you add unit tests that isolate problems before resolving them, you can be confident the problems will not quietly return unnoticed until it is too late.
The Architect should consider unit tests as part of the overall technical design. Current practice is to have the developer totally in control of unit tests. This works fine for smaller systems and stand-alone classes. It is less than practical in real-world corporate projects where the information required to run a particular test is much larger than the test itself. To test a close account service, the unit test needs to open the account and set conditions to both stop closure (outstanding invoices) and to allow closure again (cancelled invoices). If left to the developer this can be a daunting task. Don't leave your developers out in the cold! If the unit test structure is part of the architectural design, test frameworks can be built up that make individual tests easy to create and read.
The Designer also has an important role to play. By providing valid real-world examples as part of the design document, the designer can ensure that the unit tests use real information and as such are much more likely to work in the final product. The designer should also be on the lookout for variations so that they can be documented for the testing process. Note that all this is already a by-product of the design operation; one simply has to remember to record it all.
The Project Manager is trained to look at the big picture. Competent unit testing means fewer errors, more than making up for the additional development time. In addition, the increased confidence in product stability means targets are met more accurately. Finally, questions about how functionality is implemented can be answered more quickly by developers viewing the unit test rather than reviewing the code. If the unit test does not exercise a given piece of functionality, then its implementation is incidental and not to be trusted.
The Test Manager can use code coverage tools to measure the level of coverage unit tests provide. The test manager should also have input into the review of test data to ensure that the unit tests cover real-world situations. Lastly, by making a unit test pass a prerequisite for test releases, the test manager can ensure a stable system before more advanced testing phases start.
The Stake Holders simply need to be convinced that over the life of the project unit testing improves their bottom line. They will appreciate the higher level of confidence that unit tests provide for releases. Perhaps we need to research or instigate studies weighing the time saved on bug fixes during testing and the product's release life against the additional up-front development time.
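A sketch of the Architect's close-account example: a small fixture builder hides the set-up, so each test reads as a statement of intent. Account, AccountFixture and CloseAccountTest are hypothetical, and the tests are written framework-free to stay self-contained; in a real project they would normally be JUnit test methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain class standing in for the close-account service.
class Account {
    private final List<Integer> outstandingInvoices = new ArrayList<>();
    private boolean closed;

    void addInvoice(int amount) { outstandingInvoices.add(amount); }
    void cancelInvoices() { outstandingInvoices.clear(); }
    boolean isClosed() { return closed; }

    // Closure is refused while any invoice is outstanding.
    boolean close() {
        if (!outstandingInvoices.isEmpty()) {
            return false;
        }
        closed = true;
        return true;
    }
}

// Architect-supplied fixture: the set-up lives here, not in every test.
final class AccountFixture {
    private final Account account = new Account();

    static AccountFixture openAccount() { return new AccountFixture(); }

    AccountFixture withOutstandingInvoice(int amount) {
        account.addInvoice(amount);
        return this;
    }

    Account build() { return account; }
}

// Framework-free tests; each method could become a JUnit @Test.
class CloseAccountTest {
    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    static void outstandingInvoiceBlocksClosure() {
        Account account = AccountFixture.openAccount().withOutstandingInvoice(100).build();
        check(!account.close(), "closure should be refused");
        check(!account.isClosed(), "account should still be open");
    }

    static void cancelledInvoicesAllowClosure() {
        Account account = AccountFixture.openAccount().withOutstandingInvoice(100).build();
        account.cancelInvoices();
        check(account.close(), "closure should succeed");
    }

    public static void main(String[] args) {
        outstandingInvoiceBlocksClosure();
        cancelledInvoicesAllowClosure();
        System.out.println("All tests passed");
    }
}
```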

Wednesday, March 02, 2005

Data Retrieval Patterns: The Data Transfer Object (DTO) Pattern

One of the foremost drivers for developers and architects over the last three decades has been the divide-and-conquer concept of isolation: 'black-box development', as we called it in the '80s.

These days, we use objects and messages. A DTO or Data Transfer Object is pure data used to pass cohesive information between functionally separate parts of the system. (It is not the same as a DAO or Data Access Object, which encapsulates access to a data store.)

Unlike the rest of the software development world where we strive to reuse code, it is considered bad form to use a DTO in more than one transfer. For example, a request for some database data may involve a DTO between the persistence and service tiers and a second between service and GUI. As you can imagine, this methodology causes more transfer of data between objects, but it goes a long way to providing a level of isolation that would otherwise be impossible.

Since the DTO used by the persistence layer is never passed to the GUI, it can't accidentally be changed and written back by the GUI. The downside is a lot of deep copying of data. This is not as annoying as it sounds (touch wood!), since each black box usually needs a different view. The persistence layer can have different DTOs for different tables. The service layer may combine, extract and apply business rules to that data before sending a result back to the GUI as a different type of DTO.
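A sketch of two DTOs for the same data, assuming a product table: the persistence DTO mirrors the column upper_case_description, the GUI DTO names the same value descriptionToSearchOn, and the service applies a business rule as it copies. All class names and the discontinued-product rule are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Persistence -> service DTO: named for what the data is in the table.
class ProductRow {
    String productCode;
    String upperCaseDescription; // column upper_case_description
    int priceInCents;
    boolean discontinued;
}

// Service -> GUI DTO: immutable, named for why the GUI uses it.
class ProductSummary {
    final String code;
    final String descriptionToSearchOn;
    final String displayPrice;

    ProductSummary(String code, String descriptionToSearchOn, String displayPrice) {
        this.code = code;
        this.descriptionToSearchOn = descriptionToSearchOn;
        this.displayPrice = displayPrice;
    }
}

class ProductService {
    // Copies rows into GUI DTOs, hiding discontinued products along the way.
    // The ProductRow objects themselves never reach the GUI.
    List<ProductSummary> forGui(List<ProductRow> rows) {
        List<ProductSummary> result = new ArrayList<>();
        for (ProductRow row : rows) {
            if (row.discontinued) {
                continue; // business rule applied during the copy
            }
            String price = String.format(Locale.ROOT, "%d.%02d",
                    row.priceInCents / 100, row.priceInCents % 100);
            result.add(new ProductSummary(row.productCode, row.upperCaseDescription, price));
        }
        return result;
    }
}
```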

Because a DTO has a limited life as a message between two well-known components, continued consistency of its contents can be the responsibility of the receiver. For this reason any of the earlier data access patterns can be used for the DTO, including the Public Data Access Pattern.

class MyFirstDTO
  {
    public int integer;
    public String string;
  }

Strengths

Provides true isolation between modules/tiers/objects/systems.
Used correctly it will present the requester with only the information they require.
Information can be provided using names clearly readable in the context in which they are used. Often an item named for what it is in one location is better known for why elsewhere. Thus, a persistence tier may name a column "upper_case_description", while the service tier could name it by preference, for example "descriptionToSearchOn".
Minimises the need for empty fields: if the DTO has an entry, you should be able to assume you have the data. Use the class container technique below to minimise source files.

Weaknesses

Source File Explosion: Many DTOs means many objects means a full source tree, but use the class containers technique below to reduce the source file count.
Your data gets copied like a bag full of rabbits. However, all this copying is not usually time-consuming, as DTOs hold mostly immutable data, such as Strings, where the copy can be just a reference.
Code bloat: the server side of a client-server object pair can end up with a lot of code just loading DTO messages to be sent to the client, often a transfer from another DTO after communicating with another tier or object. This is unfortunately a necessary part of a DTO architecture. So don't clutter business logic with DTO transfer code; place it in a separate transfer object or method. It's even acceptable to give a consolidator DTO the DTOs from another tier in its constructor, so that it can populate itself.
```
class ConsolidatorDTO
  {
    // Populates itself from the DTOs supplied by another tier.
    public ConsolidatorDTO(
        FirstDTO firstDTO,
        SecondDTO secondDTO)
      {
        integer = firstDTO.integer;
        string = secondDTO.string;
      }

    public int integer;
    public String string;
  }
```

Uses

SOA (Service Oriented Architecture): Where the GUI layer calls a service layer for all business logic. The service layer in its turn calls a persistence layer for database and other data storage or retrieval mechanisms.

Hints - Class Containers

One of the problems of using DTOs to transfer data between components is the proliferation of files in the source tree. The temptation is to use a common DTO with only some fields populated, which is bad practice at the best of times. Where possible, variations should have their own DTO classes, fully populated and ready for use, but to reduce the source tree and keep related DTO types together, use the class container pattern.

public class CarDTO
  {
    int engineCapacity;
    String colour;
    String manufacturer;
    String model;
    int year;

    // Variations extend the container class itself, so they share its fields.
    public static class Sports extends CarDTO
      {
        String roofType;
        String suspensionType;
      }

    public static class OffRoad extends CarDTO
      {
        boolean constant4WD;
        int clearanceInCM;
        boolean snorkel;
      }
  }
  //...
  CarDTO.Sports sportsCar = new CarDTO.Sports();
  CarDTO.OffRoad fourWheelDrive = new CarDTO.OffRoad();
  CarDTO oldCar = new CarDTO();

Here we have a transfer object for Car information. If we are dealing with a sports car we can create and read with CarDTO.Sports - and all the common Data Transfer Objects are in a single file.

Name: Paul Marrington; Location: Brisbane, Queensland, Australia

I have 28 years' experience in the software development industry, working alone, in teams, as team leader, as technical project lead, development manager, technical project manager and as architect. I have been involved in design, development, implementation and problem solving for commercial, technical and systems software. I am a first-class problem solver, whether the problems are environmental, design, code or implementation related, for software in any language. I do not see barriers, only challenges. This is the only way to develop good and effective software. I have managed to successfully use these skills in all areas of my professional life.
