Showing posts with label architecture.

Tuesday, May 06, 2008

How building a bridge is the same as building software

This is not the simplistic analogy you may be expecting.

Where I live in Minneapolis, MN, we have a high-profile bridge project going on right now, due to the tragic collapse of the previous structure. If you look at the project page on the state's website, you can see the following "features of the new bridge":
  • 100-year life span
  • 10 lanes of traffic, five in each direction—two lanes wider than the former bridge
  • 189 feet wide—the previous bridge was 113 feet wide
  • 13-foot-wide right shoulders and 14-foot-wide left shoulders; the previous bridge had no shoulders
  • Light Rail Transport-ready which may help accommodate future transportation needs
  • Design-build project complete in 437 days.
  • Designed to be aesthetically pleasing and fit in with its environment
These are the highest-level (public) stakeholder values for the new bridge. From an engineer's perspective, they are the constraints under which the bridge must be delivered. In addition, we see an architect's rendering (picture) of what the new bridge will look like. This is also a constraint - an engineer cannot add things that will substantially change the appearance of the bridge.

Now, we can be sure that the design of the bridge was a collaborative effort by a team of people: engineers, marketing, architects and the client. Is the design ongoing as they build the structure? Yes - they are using an increasingly popular technique known as Design-Build, in which construction purposely starts before the design is complete.

At first glance, design-build might sound like a simple case of parallel work - one team designs just in time, and another works on the construction. In practice, though, there is a collaborative environment that reportedly results in fewer disputes, faster project delivery, and less need for project management oversight.

The Analogy
So how is this similar to building software?

Firstly, the process of programming is like the design of a bridge - it brings together people in different roles to creatively find ways to build the end result. Ideally, development involves a lot of Thinking, Talking, and Tweaking, just like a bridge design. In design, we often find that two heads are better than one. Pair-programming has been suggested as one way to do this in software development. Of course, we have many other collaborative techniques to communicate and discuss design.

Like a bridge design, the output of building software can be represented by piles of paper. The bridge has drawings, engineering specifications and requirements. A program has something better though - its code. (No, I'm not arguing that "the code is the design". I'm just saying that the code "is a representation of the design"). The code accurately describes the parts of the design that it touches.

This "programming code = bridge design" point is key to what I'm trying to convey - the process of programming produces a design output, not a product. The final product is the result of implementing that design (just as the bridge itself is the result of implementing its design).

Specifically, the "building" is the deployment of software in its final environment. Deployments are where the rubber meets the road - they are the intersection of the design with reality (just like construction). Mostly, the design holds up and does not need tweaking after deployment. Sometimes, though, the harsh light of reality exposes hidden flaws in the design. (In light of that, it is best to deploy an application for the first time as early as possible.)

Some software groups have QA (quality assurance) departments. Historically, these departments have taken the role of performing trial deployments - they take the software and expose it to a simulation of the real environment. Large construction projects also have this role - an independent group audits the designs, hoping to spot flaws that would cause trouble during construction.

Finally, we find that the best way of constructing a large bridge is to design and build simultaneously. The software analogy is small, frequent releases. Research and experience have shown this to be a good way to deliver quality software that meets the requirements.

Conclusion
If we accept that building a bridge and building software are similar (they contain the same basic steps), then we can use that information to produce some interesting insights:
  • That thing we need to do before developing is "architecture" - There is a fine distinction between architecture and design. The way I like to define it is that architecture describes the parts that are visible from the outside, and design describes the inside. A bridge architect is able to construct a working model and rendering of the outside of a bridge without the full engineering specs. To do this, he needs to take into account all of the stakeholder values. Similarly, we need to be able to draw the edges of a software application before we start - we need to understand how the software will interact with the outside world, and how the outside world will interact with the software.
  • QA is a misnomer - the primary purpose of a separate QA department should not be to assure quality. We can get quality in better ways than that. The purpose of the QA department should be to validate the design of the software, by simulating real environments. Many QA professionals already know this, of course.
This blog post is inspired by a set of three essays by Jack W. Reeves.

Monday, March 17, 2008

Transaction Semantics

I have been lurking in a recent discussion about using [Transaction]-like attributes in C# to indicate that certain methods can participate in, or require, a transaction. Castle ActiveRecord has another technique, allowing the user to specify a TransactionContext, like:
using (new TransactionContext())
{
    // ...do stuff that will automatically be in the transaction
}


The problem with all of these techniques is that they are essentially procedural. Specifically, anything that you want to participate in the transaction has to be called manually as part of the call stack. Put another way, they fail to separate the concerns of atomic persistence and the identification of what needs to be persisted (the unit of work). The result is that some aspects of persistence become difficult to implement, leading to an increase in artificial complexity.

For example, an aspect of saving a deposit into an account is that there should be a "dual" entry in another account, and balances must be updated. A single aspect like this is somewhat manageable using the proposed semantics, but if you have just a few more, then they quickly lead to very wordy, procedural and possibly complex "Save" methods. You will also end up adding additional state variables to classes that contain "Save" methods, in order to support the logic of the save.

Another way of looking at this is as the problem of the typical "business entity" class that simply does too much. There are cross-cutting concerns that do not belong in one "business entity" or another. That is the major weakness of the ActiveRecord-style of data access - when you take the world-view that every business entity is a table, then you encumber your ability to clearly work with the aspects that are orthogonal or cross-cutting to the entities.

My own solution (there may be better ones) is to explicitly expose the unit of work, and have a technique that allows class instances to intelligently enlist into it. It's worth describing in a little more detail. First, you need an interface that a class can implement to enlist in the work:
Public Interface IWorkEnlistee
    ' Called just before the database save, after user data has been validated
    Sub Participate(ByVal work As UnitOfWork)
    ' Identifies the participant so that duplicate instances are not enlisted twice
    ReadOnly Property UniqueKey() As String
End Interface


The Participate method is called just before the database Save, but after validation of user-data. The UniqueKey property is necessary to prevent two identical instances from participating (I usually just return the hash-code of some entity instance). You could add methods to the interface to get greater functionality, such as in-memory rollback.
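To make the moving parts concrete, here is a minimal sketch of the enlistment idea in Python rather than VB.NET. Only the roles of `Participate` and `UniqueKey` come from the interface above; `UnitOfWork`, `enlist`, `commit`, `AuditEnlistee` and the list standing in for the database are hypothetical, and a real implementation would wrap the final save in one database transaction.

```python
from typing import Protocol


class WorkEnlistee(Protocol):
    """Mirrors IWorkEnlistee: a unique key plus a hook to participate in the work."""

    @property
    def unique_key(self) -> str: ...

    def participate(self, work: "UnitOfWork") -> None: ...


class UnitOfWork:
    """Hypothetical unit of work: collects writes, lets enlistees add theirs, then saves."""

    def __init__(self) -> None:
        self.pending: list[tuple[str, dict]] = []   # (table, row) pairs waiting to be saved
        self.saved: list[tuple[str, dict]] = []     # stand-in for the database
        self._enlistees: dict[str, WorkEnlistee] = {}

    def add(self, table: str, row: dict) -> None:
        self.pending.append((table, row))

    def enlist(self, enlistee: WorkEnlistee) -> None:
        # The unique key prevents two identical instances from participating.
        self._enlistees.setdefault(enlistee.unique_key, enlistee)

    def commit(self) -> None:
        # Enlistees run after user-data validation but just before the save.
        for enlistee in self._enlistees.values():
            enlistee.participate(self)
        # A real implementation would write all pending rows atomically here.
        self.saved.extend(self.pending)
        self.pending.clear()


class AuditEnlistee:
    """Example participant (hypothetical): adds an audit row for an entity."""

    def __init__(self, entity_id: int) -> None:
        self.entity_id = entity_id

    @property
    def unique_key(self) -> str:
        return f"audit:{self.entity_id}"

    def participate(self, work: UnitOfWork) -> None:
        work.add("audit", {"entity_id": self.entity_id})


work = UnitOfWork()
work.add("customer", {"id": 7, "name": "Ada"})
work.enlist(AuditEnlistee(7))
work.enlist(AuditEnlistee(7))  # same key, so this one is ignored
work.commit()
print(work.saved)
```

The calling code never has to remember to write the audit row; it only enlists the participant, and the unit of work decides when participation happens.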

For the dual account entry example, I would have an instance of the above interface that participates by adding the reverse entry and updating the balances. Any state data it needs will be passed in the constructor, which is called before the in-memory data is changed (so that it can get a clear before-picture). It will probably have several related state variables that otherwise would have found themselves complicating some other piece of code.
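A sketch of such a dual-entry participant, again in Python and under stated assumptions: the `Account` class, the account names, the use of `id()` as the unique key, and the minimal `UnitOfWork` stand-in are all invented for illustration, and the accounting is deliberately simplified to a negated mirror entry. The before-picture is taken in the constructor, before the deposit is applied in memory.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Account:
    name: str
    balance: int = 0                                  # amounts in cents
    entries: list[int] = field(default_factory=list)


class UnitOfWork:
    """Minimal stand-in: each enlistee participates once, keyed by its unique key."""

    def __init__(self) -> None:
        self.enlistees: dict[str, Any] = {}

    def enlist(self, enlistee: Any) -> None:
        self.enlistees.setdefault(enlistee.unique_key, enlistee)

    def commit(self) -> None:
        for enlistee in self.enlistees.values():
            enlistee.participate(self)
        # A real unit of work would now persist everything atomically.


class DualEntryEnlistee:
    """Adds the reverse entry to the contra account and updates both balances."""

    def __init__(self, account: Account, contra: Account) -> None:
        # Constructed before the in-memory change, so this is the "before picture".
        self.account = account
        self.contra = contra
        self.entries_before = len(account.entries)

    @property
    def unique_key(self) -> str:
        return f"dual:{id(self.account)}"

    def participate(self, work: UnitOfWork) -> None:
        for amount in self.account.entries[self.entries_before:]:
            self.contra.entries.append(-amount)      # the "dual" entry
        self.account.balance = sum(self.account.entries)
        self.contra.balance = sum(self.contra.entries)


cash, deposits = Account("cash"), Account("customer deposits")
work = UnitOfWork()
work.enlist(DualEntryEnlistee(cash, deposits))  # before-picture taken here
cash.entries.append(10_000)                     # the user's deposit
work.commit()
print(cash.balance, deposits.balance)  # 10000 -10000
```

The state the rule needs (`entries_before`, the contra account) lives in the enlistee, not in the entity or in a growing "Save" method.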

Using this technique, the entity, presentation and flow logic of the application remains clean, and the cross-cutting aspects that participate in transactions are nicely separated and encapsulated.

Tuesday, February 26, 2008

List of Software Architecture Laws

There are several universally accepted software architecture laws. These have the characteristic that if you heed their principles, your software will be of better quality and will last longer. These are the ones I am aware of:

Law of Demeter
aka principle of least knowledge, aka only talk to your immediate friends, aka low coupling and high cohesion.
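A small Python illustration, using hypothetical `Customer` and `Wallet` classes. The first charging function reaches through the customer into its wallet (talking to a stranger); the second asks its immediate friend to do the work, so the caller no longer depends on how a customer holds money.

```python
class Wallet:
    def __init__(self, cash: int) -> None:
        self.cash = cash


class Customer:
    def __init__(self, wallet: Wallet) -> None:
        self.wallet = wallet

    def pay(self, amount: int) -> int:
        """The customer decides how to pay; callers need not know about wallets."""
        if self.wallet.cash < amount:
            raise ValueError("insufficient funds")
        self.wallet.cash -= amount
        return amount


def charge_violating(customer: Customer, amount: int) -> int:
    # Violates the law: the caller knows the customer's internals (and skips the check).
    customer.wallet.cash -= amount
    return amount


def charge(customer: Customer, amount: int) -> int:
    # Follows the law: only talk to your immediate friend.
    return customer.pay(amount)


c = Customer(Wallet(50))
charge(c, 20)
print(c.wallet.cash)  # 30
```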

Separation of Concerns
The notion that it is better to allow the code (and the developer) to concentrate on one concern at a time. This is the mother of many other principles, for example layering, or splitting software along logical lines.

Conway's Law
Any piece of software reflects the organizational structure that produced it. The cause is more sociological than technical. The antidote is better communication, or smaller teams.

Shalloway's Law
aka DRY. "When N things need to change and N>1, Shalloway will find at most N-1 of these things". I like Shalloway's version, because it manages to capture the essence of DRY, with the added subtlety that it is ok to duplicate stuff, as long as you don't have to manually change it.
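A sketch of what "not having to manually change it" can look like: a field's maximum length is declared once, and both the validation rule and the column definition are derived from it. The `FIELDS` table and the function names are hypothetical. When a length changes, N is 1, so there is nothing for Shalloway to miss.

```python
# Single source of truth: each field's maximum length is declared exactly once.
FIELDS = {"customer_name": 50, "phone": 20}


def validate(field: str, value: str) -> bool:
    """Validation derives its limit from FIELDS instead of repeating the number."""
    return len(value) <= FIELDS[field]


def column_ddl(field: str) -> str:
    """The database column definition is generated from the same declaration."""
    return f"{field} VARCHAR({FIELDS[field]})"


print(column_ddl("phone"))               # phone VARCHAR(20)
print(validate("customer_name", "Ada"))  # True
```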

Friday, February 22, 2008

Steve's 2nd Law of Good Software Architecture

My first law of good software architecture dealt with ways of making a particular code-base last a long time. The focus of the 2nd law is different - it assumes that a problem domain will be solved multiple times by different software, or multiple versions of the same software. It suggests ways that we can make each new re-solving of the problem easier than the last.

To review, my 2nd law of good software architecture is:

Keep as much information as possible in an accessible, declarative form. This will eliminate duplication, and enable your software to be discarded and re-written without losing quite as much.

This law is all about re-use of information, and describing how a particular problem domain can become better understood, even to the extreme where the "software" is just data.

We've all seen the tool-sets for generating entire applications - enter your requirements (mostly just your data structure) using vendor X's WonderMaker(tm) and lo and behold, out springs an application with handy generated forms for doing wonderful things. As it turns out, those wonderful things are pretty much Create, Read, Update and Delete. Not so useful after all.

Those generic tools do solve particular problems well - but it is usually not the problem we want to solve. Understandably, users demand more than just create, read, update and delete - they want to use the software to perform some task that meets their goals.

We can achieve the goal of a tool that generates most of an application, but only once we understand the problem domain well enough. We need to understand the domain, because we need to know what we can generate, and what we need to leave open to extension.

It is always a mistake to design a v1.0 system where logic is executed based on models of application logic. There are plenty of horror stories about the architect who thought he could model the business logic using XML. Don't be the next one. This post is about evolving your understanding of a particular domain to the point where you can create models with confidence they will work.

That said, even in version 1.0, there are some things we can recognize. The first is that there are at least two easily identified models of the system. From the user's perspective, there is the model that they understand and interact with. At the other end, there is the database. The important logic of the application sits between the user's model and the database. This is the origin of the old 3-tiered concept - UI + Application + Database.

What we have to realize is that we can model each of these things in a way that is declarative. In version 1.0, we may not understand the way the user wants to use the UI well enough to do much work in this regard. However, we can certainly model the database in declarative form, and have that model persist after version 1.0.

Modeling the Application layer is the last evolutionary step. You will not reach that point until the core application requirements are stable and well-understood.

For existing products, the process of modeling requires a re-write of some portion of the system. This is unavoidable, because you have to extract information from where it is hidden in the code, and represent it outside of the code. The code will no longer work. The good news is that once you have correctly modeled a part of the system, the model can be extended to capture new types of information, and need not be re-written again.


A re-usable database Model
So what does a re-usable database model look like?
  • It treats relationships as a first class concept - they have names, and they have attributes (one-to-many, cascade-delete behavior).
  • It describes fields in a rich, descriptive manner. Strings have maximum lengths, phone numbers are represented by a phone-number data type, etc.
  • It describes lookups (sets of values that are acceptable for a field)
  • It describes roles - how field values come together to represent a particular flavor of record that has meaning to the user. A particular flavor may be extended with additional fields and properties.
  • It should be directly and easily accessible to the rest of the code (re-usable).
  • It should be able to be transformed into something that the data access layer (or ORM tool) can use directly.
  • It should be able to be transformed into an empty database (it is complete).
The most obvious storage form of the model is as XML, because it is very accessible, and because it can represent hierarchical data. Other forms are ok, as long as they meet the above criteria.
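A minimal sketch of such a model stored as XML and transformed into an empty database schema. The XML vocabulary (`table`, `field`, `lookup`, `relationship` and their attributes) is made up for this example, not a standard; the parsing uses Python's standard `xml.etree.ElementTree`. Note how rich types, lookups and named relationships are only flattened into plain SQL at the edge.

```python
import xml.etree.ElementTree as ET

# Hypothetical model vocabulary: rich field types, lookups and named relationships.
MODEL = """
<model>
  <lookup name="Status"><value>Open</value><value>Closed</value></lookup>
  <table name="Customer">
    <field name="Id" type="id" primaryKey="true"/>
    <field name="Name" type="string" maxLength="50"/>
    <field name="Phone" type="phone"/>
  </table>
  <table name="SalesOrder">
    <field name="Id" type="id" primaryKey="true"/>
    <field name="CustomerId" type="id"/>
    <field name="Status" type="lookup" lookup="Status"/>
  </table>
  <relationship name="CustomerOrders" parent="Customer" child="SalesOrder"
                foreignKey="CustomerId" cascadeDelete="true"/>
</model>
"""

# Rich model types map onto plain SQL types only when generating the database.
SQL_TYPES = {"id": "INTEGER", "phone": "VARCHAR(20)"}


def column_sql(field: ET.Element, lookups: dict[str, list[str]]) -> str:
    name, kind = field.get("name"), field.get("type")
    if kind == "string":
        sql = f"{name} VARCHAR({field.get('maxLength')})"
    elif kind == "lookup":
        allowed = ", ".join(f"'{v}'" for v in lookups[field.get("lookup")])
        sql = f"{name} VARCHAR(30) CHECK ({name} IN ({allowed}))"
    else:
        sql = f"{name} {SQL_TYPES[kind]}"
    if field.get("primaryKey") == "true":
        sql += " PRIMARY KEY"
    return sql


def to_ddl(model_xml: str) -> list[str]:
    """Transform the declarative model into statements that create an empty database."""
    root = ET.fromstring(model_xml.strip())
    lookups = {lk.get("name"): [v.text for v in lk.findall("value")]
               for lk in root.findall("lookup")}
    statements = []
    for table in root.findall("table"):
        columns = ", ".join(column_sql(f, lookups) for f in table.findall("field"))
        statements.append(f"CREATE TABLE {table.get('name')} ({columns})")
    for rel in root.findall("relationship"):
        cascade = " ON DELETE CASCADE" if rel.get("cascadeDelete") == "true" else ""
        statements.append(
            f"ALTER TABLE {rel.get('child')} ADD CONSTRAINT {rel.get('name')} "
            f"FOREIGN KEY ({rel.get('foreignKey')}) REFERENCES {rel.get('parent')} (Id){cascade}"
        )
    return statements


for statement in to_ddl(MODEL):
    print(statement)
```

The same parsed model could just as well feed an ORM mapping or the UI layer, which is the point of keeping it in one accessible place.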

Why all of the richness? We want to capture as much information as possible in a single place. This allows us to make use of that information at higher layers of the application, in ways that enhance the user and the developer experience. DRY (Don't Repeat Yourself) is a powerful architectural technique.

Not all database structures represent the model we wish we had. We may have inherited a database, and it may be a horrible thing to behold. My first law of good software architecture applies - since we are exposing the model directly to the developer, we want it to be the one we wish we had. If the real database is too far from what we want, then we need to take steps to address that inconsistency. To do otherwise is to invite artificial complexity in the application code.


A re-usable UI Model
As mentioned previously, I do not expect that many version 1.0 products have a very good UI model. Still, if we can understand what a UI model looks like, then we can work towards it.

Firstly, a UI model has a relationship with the database model. The relationship is mapped - i.e. there is some automated transformation that can be used to relate a field on the UI back to one or more fields in the database. This is important, because the relationship is what allows us to re-use information defined at the database model (such as rich data types, lookups, maximum field lengths etc). If we're re-using information, then we are not duplicating it.
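A sketch of that mapping with invented structures: the database model is a plain dictionary, and each UI field names the database field it maps back to. The label belongs to the UI model, but the maximum length and allowed values are looked up through the mapping rather than declared a second time.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical database model: the one place where type details are declared.
DB_MODEL = {
    "Customer.Name": {"type": "string", "max_length": 50},
    "Order.Status": {"type": "lookup", "values": ["Open", "Closed"]},
}


@dataclass(frozen=True)
class UIField:
    label: str    # how the user sees the field
    maps_to: str  # the database field it is mapped back to

    @property
    def max_length(self) -> Optional[int]:
        return DB_MODEL[self.maps_to].get("max_length")

    @property
    def choices(self) -> list[str]:
        return DB_MODEL[self.maps_to].get("values", [])


# A screen in the UI model: fields in the order the user expects, no coordinates.
ORDER_SCREEN = [UIField("Customer", "Customer.Name"), UIField("Status", "Order.Status")]

for f in ORDER_SCREEN:
    print(f.label, f.max_length, f.choices)
```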

A UI model can grow in pieces. First, you can model screens, then larger pieces that describe how various screens fit together. Screen models are the easiest. (Even today, many applications make use of screen models).

Again, XML is a good choice for representing the UI model.

Beware of including layout information in the UI model. That is a different aspect that belongs in a different model. The primary purpose of the UI model is to bring together fields and screens in a way that represents how the user sees them. This may include their likely order on the screen, but should not include their actual co-ordinates.

UI models are re-usable in several ways. Security, Form Design, and Ad-hoc user queries are a few.


Layout of Forms (Views)
Form layout can be defined declaratively, but it is seldom worth the trouble to do that manually. We cannot predict the next evolution of UI well enough to design a representation that is good enough. The best you can probably do is favor form-design tools that save themselves declaratively (for example, XAML).

Your form layout should make use of the UI Model directly (via data binding and control-binding). Otherwise, you are just duplicating yourself. (Control-binding is the technique of having the final appearance of a particular control determined based on metadata. See the screen shots in my Egg UI post for an example).
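Control-binding can be sketched as a function that chooses a control's final appearance from field metadata. The control names and metadata keys below are placeholders, not a real UI toolkit's API; in a rich-forms environment the returned description would drive the creation or configuration of an actual control.

```python
def bind_control(label: str, metadata: dict) -> dict:
    """Decide how a field is presented from its metadata instead of hand-coding it."""
    kind = metadata.get("type")
    if "values" in metadata:
        control = {"control": "dropdown", "items": list(metadata["values"])}
    elif kind == "date":
        control = {"control": "date_picker"}
    elif kind == "bool":
        control = {"control": "checkbox"}
    else:
        control = {"control": "textbox", "max_length": metadata.get("max_length")}
    control["label"] = label
    control["read_only"] = metadata.get("read_only", False)
    return control


print(bind_control("Status", {"type": "lookup", "values": ["Open", "Closed"]}))
```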

Form layouts can be generated. This is a dangerous path, because it can limit your ability to satisfy the needs of the end-user.


Security
It is particularly useful to relate security to the UI model. One reason is that security is highly contextual - whether a user has rights to touch particular data elements can be driven by many factors, including the time of day. Another reason is that users need to understand security in order to effectively define it. The UI model's shared understanding of the user's perspective allows a good point of interaction for security.
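A sketch of security rules expressed against the UI model, assuming invented field paths and role names. Because each rule names a field the user recognizes, the people defining security can read it, and each rule can consult context such as the time of day.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable


@dataclass(frozen=True)
class Context:
    role: str
    now: time


# Hypothetical rules keyed by UI model field, in terms the user understands.
RULES: dict[str, Callable[[Context], bool]] = {
    "Order Screen/Discount": lambda ctx: ctx.role == "manager",
    "Order Screen/Status": lambda ctx: time(8, 0) <= ctx.now <= time(18, 0),
}


def can_edit(ui_field: str, ctx: Context) -> bool:
    """Fields without a rule are editable by default in this sketch."""
    rule = RULES.get(ui_field)
    return rule(ctx) if rule else True


clerk_evening = Context(role="clerk", now=time(20, 30))
print(can_edit("Order Screen/Discount", clerk_evening))  # False
print(can_edit("Order Screen/Status", clerk_evening))    # False
print(can_edit("Order Screen/Notes", clerk_evening))     # True
```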


Ad-hoc user queries
Often, we may want to give users the ability to query a database in a fairly dynamic way. A UI model that is mapped back to the database model provides a simple way to offer that feature.
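A sketch of that idea using Python's built-in `sqlite3`, with a hypothetical schema and mapping. Users filter on the names they see in the UI; the mapping translates those names into columns, and the values are passed as bound parameters rather than pasted into the SQL text.

```python
import sqlite3

# Hypothetical UI model mapping: user-facing names back to database columns.
UI_TO_DB = {"Customer": "c.name", "City": "c.city", "Order Total": "o.total"}
BASE_QUERY = "SELECT c.name, o.total FROM customer c JOIN orders o ON o.customer_id = c.id"


def build_query(filters: dict[str, object]) -> tuple[str, list[object]]:
    """Turn user filters on UI fields into parameterized SQL via the mapping."""
    clauses, params = [], []
    for ui_field, value in filters.items():
        column = UI_TO_DB[ui_field]     # unknown UI fields raise KeyError
        clauses.append(f"{column} = ?")  # values are bound, never concatenated
        params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return BASE_QUERY + where, params


db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customer VALUES (1, 'Ada', 'Minneapolis'), (2, 'Bob', 'St. Paul');
    INSERT INTO orders VALUES (1, 1, 120), (2, 2, 80);
""")
sql, params = build_query({"City": "Minneapolis"})
print(db.execute(sql, params).fetchall())  # [('Ada', 120)]
```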

Thursday, February 21, 2008

Steve's First Law of Good Software Architecture

First, I should touch on the intent of good architecture. The intent is to build something of quality that will last a long time. The "something" we build will not be static - it will be changed, and should be amenable to those changes without loss of quality. Small applications are easier to replace than to change, so good architecture is most relevant to medium to large applications.

To review, my first law of good architecture is:

Identify all core services to the application. Code against the interface of the service you wish you had, not to the implementation of the one you actually have.

Now when I talk of services here, I am specifically *not* talking about SOA. I am talking about all the pieces of your application that are not business logic. In this context, a "core service" is pretty much everything that is not the business logic itself. This includes the entire user interface, the database, the file system, and the application settings.


User Interface
Let's talk about the user interface as a service. This is a little-known technique, so I'll take the time to motivate it as best I can.

Consider this statement: Logic naturally wants to be at the points of control. By default, the main point of control of an application is its user interface. This is why it is so hard for developers to keep it out of there! For non-visual applications, this is still true - the logic wants to be on the edges. From an architectural perspective, this tendency is very dangerous - the user interface is the most likely part of the application to be discarded, and the hardest (practically impossible) to re-use.

One well-known technique for limiting the damage is layering of the user-interface on top of the application logic. With discipline, this can work well. However, good architecture does not assume discipline. It assumes team members of average talent at best, and structures the application so that they are as effective as possible. Layering is not the best answer.

Given that logic wants to be at the point of control, we can make a conscious decision to put the application logic in control. This will make the other parts of the application subservient to the application logic. As it turns out, that is the definition of a service - a part of the application that is subservient to another.

Another important characteristic of a service is that it has a well-defined API. So well defined in fact, that we can define its interface, and code against that interface rather than the actual service. So score 1 for the user interface as a service - it makes automated testing of business logic easy.

Of course, the user will still interact with the application - clicking on menus, entering data, and generally driving the flow of the application. However, they will be doing that within the context that the application logic has defined and supplied to the user interface.

In practice, this is a lot easier than it sounds. You can evolve the interface as you develop the application. Modern inversion of control techniques make it easy to inject the actual user interface at the time of execution, and to supply an appropriate context that the user interface can operate within.
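A minimal sketch of the user interface as a service, in Python, with every name (`UserInterface`, `ask_quantity`, `show_message`, `reorder_stock`) invented for illustration. The application logic is in control and talks only to an interface; here the implementation is injected as a plain parameter rather than through an IoC container, and a scripted fake makes the business logic testable without any screens.

```python
from typing import Protocol


class UserInterface(Protocol):
    """The UI service we wish we had, defined by what the application logic needs."""

    def ask_quantity(self, prompt: str) -> int: ...

    def show_message(self, text: str) -> None: ...


def reorder_stock(ui: UserInterface, on_hand: int, minimum: int) -> int:
    """Application logic: it drives the flow and asks the UI only for input and output."""
    if on_hand >= minimum:
        ui.show_message("Stock level is fine.")
        return 0
    quantity = ui.ask_quantity(f"On hand {on_hand}, minimum {minimum}. Reorder how many?")
    if quantity < minimum - on_hand:
        ui.show_message("Quantity is too small; ordering the shortfall instead.")
        quantity = minimum - on_hand
    return quantity


class ScriptedUI:
    """Test double: answers from a script and records messages."""

    def __init__(self, answers: list[int]) -> None:
        self.answers = answers
        self.messages: list[str] = []

    def ask_quantity(self, prompt: str) -> int:
        return self.answers.pop(0)

    def show_message(self, text: str) -> None:
        self.messages.append(text)


class ConsoleUI:
    """One possible real implementation, injected at run time."""

    def ask_quantity(self, prompt: str) -> int:
        return int(input(prompt + " "))

    def show_message(self, text: str) -> None:
        print(text)


fake = ScriptedUI(answers=[2])
print(reorder_stock(fake, on_hand=3, minimum=10))  # 7
```

Swapping `ConsoleUI` for a forms or web implementation changes nothing in `reorder_stock`, which is what makes the UI subservient to the logic.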


The File System
Some pre-built services, such as the file system, are very broad. Do we create an interface over that entire surface? No. We code against the interface we wish we had. The file system may provide the implementation, but we would be introducing unnecessary complexity if we dealt with the file system directly.
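A sketch of coding against the file-system interface we wish we had. This hypothetical application only needs to save and load named documents, so that is the whole interface; `DocumentStore`, `FolderStore`, `MemoryStore` and `archive_note` are invented names, and only `FolderStore` touches the real file system (through `pathlib`).

```python
import tempfile
from pathlib import Path
from typing import Protocol


class DocumentStore(Protocol):
    """The narrow service the application wishes it had."""

    def save(self, name: str, text: str) -> None: ...

    def load(self, name: str) -> str: ...


class FolderStore:
    """Implementation over the real file system; paths and encodings stay in here."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, name: str, text: str) -> None:
        (self.root / f"{name}.txt").write_text(text, encoding="utf-8")

    def load(self, name: str) -> str:
        return (self.root / f"{name}.txt").read_text(encoding="utf-8")


class MemoryStore:
    """In-memory implementation, handy for tests."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def save(self, name: str, text: str) -> None:
        self.docs[name] = text

    def load(self, name: str) -> str:
        return self.docs[name]


def archive_note(store: DocumentStore, title: str, body: str) -> str:
    """Application code sees only DocumentStore, never paths or file handles."""
    store.save(title, body.strip())
    return store.load(title)


with tempfile.TemporaryDirectory() as tmp:
    print(archive_note(FolderStore(Path(tmp)), "minutes", "  Ship v1.0 on Friday.  "))
print(archive_note(MemoryStore(), "minutes", "  Ship v1.0 on Friday.  "))
```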


Settings
My own view is that you can combine settings with the user interface service. This is because settings can often be user-choices in one implementation, and settings in another, and hard-coded in yet another. There is no perceivable downside to having the user interface implementation control the settings.


The Database
It turns out, the most difficult aspect of the application to make into a service is the database. A database is like a pool of data. Most times, the interface we wish for is to be able to scoop up the data with a bucket, play with it, then throw the data back into the pool. A simple data access layer can be good enough for this.
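For the common case, the "bucket" interface might look like the sketch below: fetch a set of rows as plain objects, change them in memory, and hand them back to be saved. `CustomerData`, `fetch_by_city` and `save_all` are invented for illustration, and SQLite (via Python's `sqlite3`) stands in for whatever database the application really uses.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    id: int
    name: str
    city: str


class CustomerData:
    """A simple data access layer: scoop rows out, let the caller play, pour them back."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def fetch_by_city(self, city: str) -> list[Customer]:
        rows = self.conn.execute(
            "SELECT id, name, city FROM customer WHERE city = ?", (city,)
        ).fetchall()
        return [Customer(*row) for row in rows]

    def save_all(self, customers: list[Customer]) -> None:
        with self.conn:  # commits on success, rolls back on an exception
            self.conn.executemany(
                "UPDATE customer SET name = ?, city = ? WHERE id = ?",
                [(c.name, c.city, c.id) for c in customers],
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Ada", "Minneapolis"), (2, "Bob", "St. Paul")])
data = CustomerData(conn)

bucket = data.fetch_by_city("Minneapolis")  # scoop up the data
for customer in bucket:
    customer.city = "Saint Paul"             # play with it in memory
data.save_all(bucket)                        # throw it back into the pool
print(conn.execute("SELECT city FROM customer WHERE id = 1").fetchone())  # ('Saint Paul',)
```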

Sometimes we want more - for example, we may want to have data access run on a different application server, or be scaled across multiple servers. We may want to provide the ability to have the application run disconnected from a server. Or we may want to totally insulate the application from the data structure or vendor. These are all up-front choices we must make. All come with a cost. In the more expensive cases, the interface we wish for will be more service-like than a simple data access layer would.

If we do have to make data access into a service, we should still be sure to make it the service we wish we had. This implies that the design of the service should be driven by the needs of the application.


Conclusion
Good architecture puts the important logic at the center and treats the less important logic as subservient (services). We code against the services we wish we had, because to do otherwise introduces artificial complexity. (The implementation of the services can take care of translating back to the reality of the underlying provider).

"User Interface as-a-service" is a new concept. I have implemented it with success, although I didn't understand it then as well as I do now. Others have too - Cockburn's hexagonal architecture is a similar concept to what I have described. I think I will be writing more about it in later posts, because I have treated it too high-level here. People will want to know how to actually do it before they believe it is a good idea.

Wednesday, February 20, 2008

Introducing the Egg UI Pattern

For the longest time, I have been trying to characterize the UI design I have been using for the last few years. I think I finally have a handle on explaining it. I hope you find it as interesting as I do.

To summarize up front...the Egg UI pattern is a technique for creating a rich user interface that is decoupled from specific application logic, but coupled to a large piece of infrastructure code. Some business value is invested in a common infrastructure, enabling the business to quickly add more modules that behave in similar ways. For a particular module, business value is invested in application logic, where it can be re-used independent of the infrastructure or UI. Almost zero business value is invested in the actual UI for a particular module.

If it sounds like I have frameworkitus, hold off on the judgment for a minute. A framework can be bad, because it risks coupling of your application logic to the framework. The Egg UI pattern does not do that. The UI is an egg, and the application logic is an egg. I'm calling them eggs, because they are self-contained (as opposed to layers, which have one-way dependencies).

In the diagram above, direct dependencies are shown as solid lines, and indirect dependencies are shown with dashed lines. The defining characteristic is that the UI is provided as a stateless service to the application. Everything else flows from that. The target platform is a rich-forms environment, for very large, modular applications that display and edit lots of data. (It may work for other environments too, but I have only used it in the one).

Some components:
  • Presenter - responsible for applying form-level logic, providing field metadata, and validation.
  • Menus - represent possible user actions. These may be rendered on the UI as buttons, or menus. They have captions, and metadata describing their required context.
  • Commands - represent the details of the actions that menus execute.
  • UI Service Interface - defines all of the activities that can be requested of the user interface. Also defines all settings that the application may need, and methods for sending messages to the user.
  • UI Service Implementation - an implementation of the user interface. Uses data binding and infrastructure code to interact with the context (mostly the Presenter and the Menus)
  • Views - Simple data forms, or pieces of more complex forms.
  • UI Model - metadata, representing a shared understanding of the structure of the data in the user's view of the system. Provides a means for the Views to be bound, and a mapping of UI fields to the database.
  • Context - A holder for any context that the UI may need. Includes a minimum of the Menus, the Presenter, and other supporting methods. The Application Egg owns the context, but the UI Egg can see it and add to it.
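A compact sketch of how these components might relate, written in Python rather than the .NET used for the real thing, with every class and method name invented for illustration. The application egg owns the `Context` (presenter and menus); the application code only knows the `UIService` interface, and the deliberately dumb `TextUI` renders whatever context it is handed, so replacing it does not touch the presenter or the commands.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class Presenter:
    """Form-level logic: field metadata and validation (application egg)."""
    values: dict[str, str] = field(default_factory=dict)
    metadata: dict[str, dict] = field(default_factory=lambda: {
        "Name": {"label": "Customer name", "max_length": 10},
    })

    def validate(self) -> list[str]:
        errors = []
        for name, meta in self.metadata.items():
            if len(self.values.get(name, "")) > meta["max_length"]:
                errors.append(f"{meta['label']} is too long")
        return errors


@dataclass
class Menu:
    """A possible user action; the command holds what actually happens."""
    caption: str
    command: Callable[[Presenter], str]


@dataclass
class Context:
    """Owned by the application egg; the UI egg can read it."""
    presenter: Presenter
    menus: list[Menu]


class UIService(Protocol):
    """What the application may request of the user interface."""

    def show_form(self, context: Context) -> None: ...


class TextUI:
    """A dumb UI egg: renders from metadata and holds no business logic."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def show_form(self, context: Context) -> None:
        for name, meta in context.presenter.metadata.items():
            self.lines.append(f"{meta['label']}: {context.presenter.values.get(name, '')}")
        self.lines.extend(f"[{menu.caption}]" for menu in context.menus)


def save_command(presenter: Presenter) -> str:
    errors = presenter.validate()
    return "; ".join(errors) if errors else "saved"


def open_form(ui: UIService, context: Context) -> None:
    """The application decides when a form is shown; it knows only the interface."""
    ui.show_form(context)


ctx = Context(Presenter(values={"Name": "Ada"}), [Menu("Save", save_command)])
text_ui = TextUI()
open_form(text_ui, ctx)
print(text_ui.lines)                        # ['Customer name: Ada', '[Save]']
print(ctx.menus[0].command(ctx.presenter))  # saved
```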
So what is this pattern good for? I'm glad you asked.

Most importantly, User Interface and Application logic are decoupled from each other as much as is feasible. This pattern is almost at the extreme end of user interface decoupling. Any further and the forms would be drawing themselves (not a good thing, in my experience).

This decoupling provides an environment where it is very, very obvious to the developers where their code should go (hint - a presenter or a command). We can partially or completely re-work the UI infrastructure (e.g. Winforms => WPF) without concern for the application logic. We can extend the application logic (e.g. add additional user choices or change the types of fields) without touching the user interface code. We can test the Application without being concerned with the UI.

We can also repeat the pattern over and over in many modules that together comprise the application as a whole. In other words, it is amenable to vertical layering of the system, a factor which increases the workable size of the application by at least an order of magnitude.

There are many benefits, but there is also a big one-time cost - a significant amount of infrastructure (framework) code. This is necessary for any pattern where you want the user interface to be dumb. (And this user interface is particularly stupid). The UI needs to be able to act as a reflection of the application logic. This requires an investment in components that can read metadata and use it to extend the "drawn" user interface.

For example, this is a screen shot of a form in design mode:
Here is the same form at runtime:
And this is the user-code behind the form (the presenter contains all the meaningful code):
And here is the grid from which the form was accessed (no user-code):
And this is an intentionally blurred image of the context in which the grid was accessed (to demonstrate that this works at multiple levels, not just a simple master-detail example):
The infrastructure code has taken metadata, and used it to show labels, buttons, menus, images, treeviews, icons, dates, times, and dropdown controls. It also applies security, handles validation errors and generally gives a very rich user interaction experience. Unfortunately, this sort of power requires an investment. To me, that investment represents direct business advantage - in the ability to provide a unique, consistent experience with the richness and stability the users demand, while still leaving the door open to future possibilities.

Friday, February 15, 2008

Steve's Laws of Good Software Architecture


Meditate on these, and you may achieve some enlightenment :)


Steve's first law of good architecture:
Identify all core services to the application. Code against the interface of the service you wish you had, not to the implementation of the one you actually have.

Steve's second law of good architecture:
Keep as much information as possible in an accessible, declarative form. This will eliminate duplication, and enable your software to be discarded and re-written without losing quite as much.

Corollary to Steve's second law of good architecture:
A particular application domain is effectively solved (and no longer requires custom code) once all information about the application can be represented in declarative form.

Another corollary to Steve's second law of good architecture:
When using a tool to generate some or all of an application, ensure that the declarative data of the tool is stored in an accessible form.

Steve's third law of good architecture:
Usable components may evolve, but practical, re-usable components must be designed.

Corollary to Steve's third law of good architecture:
Component re-use is only practical once you have designed an approachable, stable interface to the component.


Enlightenment Image by Sakka, licensed under Creative Commons ShareAlike version 2.5

Monday, February 11, 2008

Matt Blodgett's First Law of Software Development

See Matt Blodgett's First Law of Software Development

A development process that involves any amount of tedium will eventually be done poorly or not at all.

I like that. To me, it is yet another argument for DRY (Don't Repeat Yourself), which I consider to be the most important aspect of long term software quality.

If you are doing DRY, then you are not repeating yourself. Therefore, you are doing the least amount that you can in order to solve the problem. Any tedium is thus inherent in the problem, and could not be avoided.

(Of course, if you find or invent the right tool, you can also mitigate the remaining tedium. For example, using a diagramming tool to draw your database relationships rather than typing them in XML or SQL).

Wednesday, January 23, 2008

The fundamental abstraction that most programmers never "get"

Is...

The separation of user interface (screens, forms, web pages) from application logic.


My current estimate is that 1% of programmers understand the abstraction and apply it successfully.

If 100% of programmers could make this leap, then I predict software quality would improve 1000x.

Enough said.

Friday, January 11, 2008

From framework to component - the road less travelled

So you develop a framework for a project, and it is great. All is good. It does what it needed to, and it does it well. Maybe a few unexpected requests come in, and you manage to incorporate them into the framework. You are very satisfied. Other people are impressed too. Co-workers on other projects start to notice, and they want to use part of your framework to make their own project easier.

Except...that part that they want to re-use has some dependencies on other parts of the framework, that they do not want. "No thanks", they say.

Sound familiar? Perhaps you have been the co-worker, asking after the framework? Was the framework wasted? Is re-use unachievable in your organization?

The fact is, the only way that re-use has been shown to work is to "componentize". A framework is just *too big* to do this with.

The best you can do is ensure that your framework has many individual parts, each of which has potential for re-use. Even then, coupling is a big challenge. The parts become dependent on each other (for good reason).

The missing piece of the puzzle is packaging (componentizing). Only when you do this can you truly achieve re-usability across a broad range of projects. This is not as trivial as the word "packaging" implies. There is documentation, removal (or internalizing) of dependencies, retrofitting it into the existing system, versioning and deployment. There is also the challenge of letting others know it is available.

When your co-worker came and asked you for part of your framework, the sad truth is that it was already too late. Packaging takes time, and the other project needed its answer today.

Your organization needed to be more pro-active. It needed *someone* to notice the potential of the situation, and have the time and the resources to make something of it.

Wednesday, January 09, 2008

Agile needs Architecture

Consider TDD (test-driven development). TDD is a great design technique. It creates systems that are wonderfully decoupled. It lets you build something very quickly and effectively. It allows developers to transcend their own limitations, and results in a system that is more than the sum of its parts. Beautiful.

TDD software is an evolved work of art, beautiful like an organically grown crystal.


Compare that with a system that is designed by a software architect. Architecture is about drawing lines, and encapsulation. It is about understanding current and future needs and using that understanding to define the edges of a system, how they should interact, and which pieces should be interchangeable.

Software designed by an architect is like a piece of machinery. It has lots of moving parts, which interact in well-defined ways.

A TDD system that is under the guidance of an architect will be better than one which is not. Similarly, an architected system that is implemented using TDD will be better than one which is not. TDD and architecture are complementary techniques.

At a small scale, the distinction is not as important. For a small system, TDD can evolve something very nice, with limited architectural input. At larger scales however, someone needs to be looking at the big picture. You simply cannot evolve the design of a machine, or a house.

Getting to my point...

It bothers me that none of the Agile techniques stress the need for the role of architect. They assume that you can put together a group of equally skilled programmers and the design will evolve. This is true to an extent, but TDD and similar techniques can only take you so far. At some point, you need someone who can see the "big picture".

Thursday, December 20, 2007

The importance of Relational Simplicity

I don't know if others have fallen into this trap - I only know that I have. As such, this post is mostly a reminder to myself for future projects....

When building software that uses a database, it is important not to create what I will call "implicit, difficult-to-resolve relationships". An unrealistic, but illustrative example...

Suppose you have a database-lookup of zip-codes, such that each zip-code looks up a US state. Suppose then, that you decide to add a time-dimension to those zip-codes, such that each zip-code has a period during which it is valid. In our address table, we store the zip-code. When needed, we can lookup the US state, but each time we do it, we have to factor in some date (because the zip codes are only valid for some period).

It is not always a date (dates are the worst, though - avoid them like the plague). It could be a customer id (for example, customizing lookup tables for each client within the database while still maintaining a common set of values for all clients). If the relationship can no longer be resolved simply, then you have a problem.

It is tempting to add flexibility in this way, but ultimately a mistake. Traversing relationships is done very often in a software system - anything artificial you do to make a relationship more complex than "key = foreign key" has a direct negative impact on system complexity/quality. A good guideline - if your O/R Mapper cannot represent the relationship, then it is too complex (you do use some form of O/R Mapper, don't you!?).

So, keep your relationships unambiguous (by linking on a key field instead, for example). Even duplicating the data is better - it is far easier to keep multiple copies of a data field than it is to consistently and correctly traverse a difficult-to-resolve relationship. Keep the zip-code field if you must, but add an unambiguous foreign key to the related data.
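To make the zip-code example concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table and column names are hypothetical and the rows are made up, in the same "unrealistic, but illustrative" spirit. The first query needs a date to resolve the state; the second follows a plain key = foreign key join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Time-dimensioned lookup: the zip code alone no longer identifies a row.
    CREATE TABLE zip_code (
        id INTEGER PRIMARY KEY,
        zip TEXT NOT NULL,
        state TEXT NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to TEXT NOT NULL
    );
    -- The address keeps the zip text, but also an unambiguous foreign key.
    CREATE TABLE address (
        id INTEGER PRIMARY KEY,
        zip TEXT NOT NULL,
        zip_code_id INTEGER NOT NULL REFERENCES zip_code(id)
    );
    -- Made-up rows: the same zip maps to different states over time.
    INSERT INTO zip_code VALUES (1, '99999', 'MN', '2000-01-01', '2009-12-31');
    INSERT INTO zip_code VALUES (2, '99999', 'WI', '2010-01-01', '9999-12-31');
    INSERT INTO address VALUES (1, '99999', 1);
""")


def state_by_zip_and_date(address_id: int, as_of: str) -> str:
    """Difficult to resolve: every traversal must supply (and agree on) a date."""
    row = conn.execute(
        """SELECT z.state FROM address a
           JOIN zip_code z ON z.zip = a.zip
            AND ? BETWEEN z.valid_from AND z.valid_to
           WHERE a.id = ?""",
        (as_of, address_id),
    ).fetchone()
    return row[0]


def state_by_key(address_id: int) -> str:
    """Simple: key = foreign key, no extra context needed."""
    row = conn.execute(
        """SELECT z.state FROM address a
           JOIN zip_code z ON z.id = a.zip_code_id
           WHERE a.id = ?""",
        (address_id,),
    ).fetchone()
    return row[0]
```

The same address yields different answers from `state_by_zip_and_date` depending on which date each caller remembers to pass, while `state_by_key` has only one way to be called, and it is also the kind of relationship an O/R Mapper represents without help.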

Wednesday, December 19, 2007

Why fixing bugs is more risky than adding new features

In any well-designed software application, one of the fundamental principles we try to abide by is the open-closed principle. Basically, this means that we try to structure the code in such a way that adding new functionality does not require us to change existing code. That is, we strive to add code instead of changing code.

Following this principle dramatically improves quality. The reason is that changing code risks altering the meaning (semantics) of some part of that code, and whenever we alter that meaning, we risk breaking other code that depends on it.

So how does that relate to fixing bugs? Fixing a bug almost always involves altering the semantics. The old meaning was wrong (buggy), so we need to fix it. This leads to some interesting paradoxes. First...

Fixing bugs is one of the most risky things you can do to a software application.

The better you have followed good design principles, the more this is true. Conversely, in less well designed systems, it is less true...

If you have a poorly designed system, then bug fixes are not particularly risky.

This is because in a poorly designed system, all changes are equally risky.

You can mitigate the impact of bug-fixes through the following techniques:
  • reduce dependencies - the fewer components that are linked to the code being changed, the less risky a change is.
  • regression testing - A regression test will improve the likelihood of discovering breaking changes. Regression tests can take many forms - anything you run daily (or with every build) is a regression test. This includes NUnit-style unit tests and FIT tests.
  • Code reviews - part of a bug-fix code review can be to review the impact of the change. Any tool that shows coupling (such as NDepend) can probably help with this.
  • Impact analysis - Pre-identify parts of the system that others depend on heavily. Changes in these parts of the system are particularly risky and need careful attention.
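As a sketch of the regression-testing point, here it is in Python's built-in `unittest` rather than NUnit. The function `parse_price` is hypothetical: suppose it had a bug where it rejected a leading currency symbol. The fix changes its semantics, so alongside a test for the fix we keep tests that pin down the behavior existing callers already depend on.

```python
import unittest
from decimal import Decimal, InvalidOperation


def parse_price(text: str) -> Decimal:
    """Parse a price string. Bug fix: a leading '$' is now accepted."""
    cleaned = text.strip()
    if cleaned.startswith("$"):  # the fix: previously this input raised an error
        cleaned = cleaned[1:]
    return Decimal(cleaned)


class ParsePriceRegressionTests(unittest.TestCase):
    # Covers the new behavior introduced by the fix.
    def test_accepts_dollar_sign(self):
        self.assertEqual(parse_price("$4.50"), Decimal("4.50"))

    # Pin the behavior that existing callers already relied on.
    def test_plain_number_unchanged(self):
        self.assertEqual(parse_price(" 4.50 "), Decimal("4.50"))

    def test_garbage_still_rejected(self):
        with self.assertRaises(InvalidOperation):
            parse_price("abc")
```

Run it with `python -m unittest test_prices`, assuming the file is saved as `test_prices.py`. The last two tests are the ones that catch a fix which accidentally changes behavior somebody else depends on.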

Tuesday, November 27, 2007

What's the most important aspect of long-term-quality software?

Just doodling...What's the most important aspect of long-term-quality software? I'll define long-term quality software as some piece of software with a lifespan of many years, over which that software can be extended and changed to suit new needs without compromising quality.

Some potential answers, with my best spur-of-the-moment arguments:
  • Strong typing - not just variable types, but any sort of type, like a database table. If something changes in the contract (field name changes, or a 1-1 relationship becomes 1-many), then I should be able to make the change and, within minutes, know each of the places that are impacted in the code. The justified fear of making changes to a system is driven by the unknown impacts. If I know all of the impacts, then I am in a very strong place.
  • DRY (Don't Repeat Yourself) - If logic is represented in multiple places, then someone will only change it in one, which will automatically create some inconsistency. If you are lucky, then the inconsistency will be noticed quickly. If you are not, then you will only find out later when the damage is done.
  • YAGNI (You Ain't Gonna Need It) - Software is complex. At some point, the complexity becomes too much for us to fit in our minds at one time. The longer we can defer that point in time, the more maintainable (and learnable) the software will be. There are two distinct types of complexity though - inherent complexity (because the problem is complex) and artificial (unnecessary) complexity. By introducing functionality before we know for sure that we need it, we are creating artificial complexity. Thus, we will reach the point of too much complexity before we should have.
  • Minimized Coupling - The complexity of software is directly related to how big it is. When we couple things together, we are making something more monolithic, and thus harder to understand. We also cross a line that is difficult to un-cross. (One coupling-point is just the first of many). Minimized coupling is an antidote to complexity.
I think minimized coupling is perhaps the most important, because it has such a direct impact on complexity. I looove DRY though - it is addictive once you try it in earnest. Strong-typing is of limited use without DRY. YAGNI is good advice, although some take it too far.

Are there other candidates?

Wednesday, November 14, 2007

Domain Specific Languages - the real ones

People talk (and write) about Domain Specific Languages as if it is the way of the future. It is. But they do not understand why, and nor can they until they discard the term.

The idea is what is important. And this is the idea:
A componentized (re-usable) abstraction of some problem domain.

Nothing to do with languages, or scripting, or any such programmer-oriented way of thinking. The term "Domain Specific Language" (DSL for short) is soo 90's CASE tool thinking. Still, I'm gonna continue to use it in this blog entry, because I do not have a better term.

As with all advances in software development techniques, it is about increasing the level of abstraction. To my way of thinking, this blog is published using a DSL. I have never looked at any HTML or even CSS on this site. Someone else discovered a neat way to abstract the problem domain of Blog writing. They encapsulated their idea in a piece of software (Blogger.com if you're interested), and now I am 100-times more productive than I would have been if I used HTML.

(I made up that number, of course. It's probably closer to infinity, because I would not write a blog at all if I had to use low-level tools.)

So to my way of thinking, a true DSL increases the level of abstraction of a problem to the point where it is orders of magnitude easier to solve the instances of that problem.

While the buzz on DSLs is relatively high, the buzz on "frameworks" is low, to the point of being a dirty word for some people. I disagree. A framework is just a premature DSL. Someone's attempt to abstract some aspects of a problem domain.

This is what I am doing with my new website, PerfectAPI.com. I am building abstractions that I think will increase developer productivity by orders of magnitude. I'm betting that the abstractions I am creating are sufficiently mature that they will stand the test of time. Wish me luck.

Thursday, October 25, 2007

The myth of code re-use

It is disturbing for me when I review new code, and I notice that it is almost identical to similar code elsewhere. Usually, copy-and-paste programming is the cause.

Sometimes it is ok, because what is being copied is essentially configuration metadata. But mostly it is a bad thing.

I want to talk about a scenario that I see often, and I think is very common in software development.

We work on some problem domain where we will need multiple implementations. We gain enough understanding to output a version 1.0 implementation, wherein we develop some re-usable parts and some not-so reusable parts.

This is ok. We have learned, and when we come to doing a second implementation we will apply what we learned to make it even better. Or that is how it should be! But it does not happen.

What happens instead is that programmers love re-use (and why wouldn't we - it makes our jobs seem easier). We love it so much, that we will use copy-paste to achieve it. That is, we will copy implementation 1.0 and then try to shove implementation 2.0 into that box.

Never mind that we do not understand the problem well enough to know whether implementation 2.0 is similar enough to implementation 1.0 to use the same box. We will copy the box, and then try and mold it to our needs.

This is a recipe for untidy, silly code that cannot handle the little edge cases that come up, because implementation 2.0 is never the same as 1.0.

So what is the solution?

Young grasshopper...forget re-use. It is a red herring. A diversion, an evil distraction. It is not achievable in the way you think.

Forget re-using implementation 1.0. Implementation 2.0 is a chance to start over with a clean slate. An empty page, a new design. The only re-use is in your head - refining and learning. Implementation 2.0 is your chance to apply what you have learned to the problem.

The secret you have to accept is this:

Increasing your understanding of the problem domain is the only way you will achieve sustainable re-use.

When someone (or a group of someones) increases their knowledge of the problem domain to the point at which they achieve a form of enlightenment, then sustainable re-use is not only possible - it is inevitable.

It may take 3 implementations before you achieve enlightenment, or it may take more. The simple lesson is this:

Don't try to re-use existing code. What you want to re-use is an API - a way of thinking of the problem domain. Keep trying new things to improve the way you solve the problem. Re-use will find you when you are ready.

Tuesday, October 16, 2007

API Design vs. OO Design

Traditional OO lore teaches us that objects are things that have both data and behavior. Blindly following this rule can lead us to make poor design choices, especially around what many refer to as "business objects".

The pattern is that these objects already have data, so we seek to add behavior as well. In this way we can feel happy and content that we have a true "object", and we are successful OO programmers.

The problem is that adding behavior as a sort of "suffix" to an object is ignoring a more important aspect of objects, which is that they should do one thing, and do it well. Add too many "suffix" behaviors, and pretty soon you can have a tightly coupled bowl of spaghetti.

This is not just theoretical - I have seen it happen, more than once. I've even been guilty of it.

So what is the solution? When we have classes that are primarily data, should we resist adding behavior?

My answer is "it depends". To understand why, we need to take a small detour into API design...

Sometimes, programmers expect things to be a certain, simple way. They do not want to ask a FactoryLocator for an IObjectPersistorFactory, use that to get an IObjectPersistor, and finally tell the IObjectPersistor to Save their object to the database. They just want to write:

myObject.Save()
or
myObject.Load(id)

This ActiveRecord implementation is easy to write and easy to read. In short, it is good because it is a nice API for the client of the object. It has drawbacks (no transaction support, high risk of coupling to the database). But in many systems, this API will be sufficient.
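A minimal ActiveRecord sketch in Python with the built-in `sqlite3` module, to show why the API is pleasant for the caller. The `Customer` class and its table are hypothetical. Note that the data, the SQL and the connection are all bundled into one class, which is exactly the coupling that starts to hurt as more behavior gets added.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


class Customer:
    """Data plus its own persistence: the ActiveRecord pattern."""

    def __init__(self, name: str = "", customer_id: Optional[int] = None):
        self.id = customer_id
        self.name = name

    def save(self) -> None:
        # Insert new records, update existing ones. Each save commits on its
        # own; there is no transaction spanning several objects.
        if self.id is None:
            cur = conn.execute("INSERT INTO customer (name) VALUES (?)", (self.name,))
            self.id = cur.lastrowid
        else:
            conn.execute("UPDATE customer SET name = ? WHERE id = ?", (self.name, self.id))
        conn.commit()

    def load(self, customer_id: int) -> "Customer":
        row = conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            raise KeyError(f"no customer with id {customer_id}")
        self.id, self.name = row
        return self
```

The caller writes `customer.save()` and `Customer().load(customer_id)`, and never sees a factory or a persistor.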

So the ActiveRecord "suffix" is mostly ok. What other behaviors can we add? How about validation? The save method should probably validate before it saves, so as to ensure we have good data in the database. How about some initial field values for new objects? And some event driven behavior - let field A be defaulted when field B changes? And we need properties for other objects. MyCustomer.Address.ZipCode works really nicely. We can even lazy-load the Address property. Not too hard.

Hmm. Question. If we save the Customer object, should the Address save too? Probably. So we need to add some more code to the Save method for that.

etc. etc.

You get the picture (I hope). You can create a perfectly functional system in this way, but the coupling of all functionality to a single class will make it difficult to change in any substantial way. It will also have poor quality, because we are ignoring several key principles, such as DRY and Open-Closed.

There is only one way in which you can mitigate the problem. Use code-generation to generate your "business object" implementations. This mitigates quality problems substantially (DRY does not apply to generated code). It also forces you to either state some things declaratively (such as required fields), or else move them into their own dedicated area.
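A toy sketch of generating business-object classes from declarative metadata, in Python. The `ENTITIES` dictionary and `generate_source` are hypothetical. A real generator would normally write the emitted source to files as a build step; this sketch runs it with `exec` only to stay short. The point is that required fields are stated once, as data, and every generated class enforces them in exactly the same way.

```python
# Hypothetical declarative metadata: entity -> field -> options.
ENTITIES = {
    "Customer": {"name": {"required": True}, "email": {"required": True}, "phone": {}},
    "Address": {"street": {"required": True}, "zip_code": {}},
}


def generate_source(entity: str, fields: dict) -> str:
    """Emit Python source text for one business-object class."""
    required = [f for f, opts in fields.items() if opts.get("required")]
    params = ", ".join(f"{f}=None" for f in fields)
    lines = [
        f"class {entity}:",
        f"    REQUIRED = {required!r}",
        f"    def __init__(self, {params}):",
    ]
    lines += [f"        self.{f} = {f}" for f in fields]
    lines += [
        "    def validate(self):",
        "        return [f + ' is required' for f in self.REQUIRED if not getattr(self, f)]",
    ]
    return "\n".join(lines) + "\n"


# A build step would write these strings to .py files; exec keeps the sketch short.
namespace: dict = {}
for entity_name, entity_fields in ENTITIES.items():
    exec(generate_source(entity_name, entity_fields), namespace)

Customer = namespace["Customer"]
Address = namespace["Address"]
```

Repetition in the emitted classes is harmless, because nobody edits them by hand: a change to the metadata regenerates every class consistently.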

Tuesday, October 09, 2007

Presenter-Model View with Controllers

At my current (soon to be gone) workplace, we have a unique style of doing our UI....

I think I'll call what we have Presenter-Model View with Controllers. (There is a View and there is a very rich Presenter Model. There are Controllers too).

We mostly drop generic container controls onto forms with zero or minimal code. We have extended properties to be able to bind those controls at design-time. (The appearance is determined at run-time). We have bi-directional deep (multiple dots) data binding, which allows the view to be completely driven by the Presenter Model.

The Presenter Model is more than simply a device for binding a form. It is a first class object in the system, used by security. It also supplies Validation.
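A rough Python sketch of a presentation model along these lines, not the implementation described above. `CustomerPresentationModel` and its methods are hypothetical. The model owns the state and the validation; a view binds to it by subscribing to change notifications (model to view) and by pushing user input back through `set` (view to model).

```python
from __future__ import annotations

from typing import Callable


class CustomerPresentationModel:
    """Holds what the screen shows; knows nothing about actual controls."""

    def __init__(self) -> None:
        self._values = {"name": "", "zip_code": ""}
        self._listeners: list[Callable[[str, str], None]] = []

    def bind(self, listener: Callable[[str, str], None]) -> None:
        """Model -> view: listener(property, value) is called on every change."""
        self._listeners.append(listener)

    def get(self, prop: str) -> str:
        return self._values[prop]

    def set(self, prop: str, value: str) -> None:
        """View -> model: a control pushes user input back here."""
        if self._values[prop] != value:
            self._values[prop] = value
            for listener in self._listeners:
                listener(prop, value)

    def errors(self) -> list[str]:
        """Validation lives in the model, so every view bound to it gets it."""
        return [] if self._values["name"] else ["name is required"]
```

Because the view is just a listener, the model (including its validation) can be exercised in tests without creating a single form.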

Underlying that, we have a custom O/R Mapper with integrated support for database structure evolution.

It took a long time to set that all up, and it saddens me that the product will die soon :(

Tuesday, September 25, 2007

Wikipedia - Software Architect

I decided a few weeks ago that the Wikipedia entry for Software Architect was awful. In the spirit of the Wiki, I rewrote the article.

So...now I can say that although I did not write the book on being a Software Architect, I did write the Wikipedia article :)

Of course, the article has already evolved through the contributions of others. So I was only the "sole author" for a day or so.

Tuesday, September 04, 2007

Software Architect Definition

Today I came up with a short definition of a software architect that I think is broadly applicable (to myself and others that perform the role).
"The role of a Software Architect is to recognize the edges of systems, communicate them, and define their APIs."

"recognize the edges" - By edges, I mean both vertical edges (more conventionally called layers) and horizontal edges (separate applications). The edges are not always easy to recognize. Often there is some overlap in system functionality, and the role of the architect is to recognize those overlaps and envision the possibilities of resolving them. Sometimes this means inventing new systems to fulfill more specialized roles, and then retrofitting existing systems to use the new specialized systems. Sometimes it means nothing other than putting procedures in place to limit the impact of the duplication.

"communicate them" - The edges are where the politics and the business of architecture reside. A good architect can help an organization discover more efficient ways of doing business, but they will have to communicate to make it happen. And that is as it should be - the architect is the technical expert, but the business people know the business, and change is only worthwhile if it has business value.

"Define their APIs". Good APIs are always designed. They never evolve by themselves. I know this because I have created some of my own, and I have listened to presentations from the people that designed the APIs for Java and .NET (probably the 2 biggest APIs around). Characteristics of a good API are:
  • Easy to learn and use, even without documentation (discoverable)
  • Hard to misuse
  • Sufficiently powerful to satisfy requirements (but not much more)
  • Easy to extend
  • Appropriate to audience
The APIs within a system are low-level design. They are important to the maintainability of the system, but they can evolve by themselves, using techniques such as Test-Driven Design (TDD). Their audience is small and specialized. The architect has little business dictating this level of design, although they should certainly dictate which external APIs the developers will be making use of.

The APIs on the edges of a system are at a higher level. TDD (and use of design patterns) makes for bad "system" APIs, because they are not discoverable or appropriate to their audience. To use such an API, you typically need to create a few classes (maybe Strategies), or even implement an interface, and only then can you call a method which will do what you want. This is where the architect needs to step in and (writing code if necessary) ensure that the system can expose itself to the outside world in a proper way. There is clear business value in API design at this level.
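To illustrate the difference, here is a Python sketch with hypothetical names. Internally, shipping cost uses Strategy objects plus a calculator, which suits the developers who maintain it. The system-level API wraps all of that in one discoverable function with a sensible default and a helpful error, so an outside caller never has to assemble the pieces.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from decimal import Decimal


# Internal, TDD-friendly design: small pieces that the caller must assemble.
class ShippingRule(ABC):
    @abstractmethod
    def cost(self, weight_kg: Decimal) -> Decimal: ...


class FlatRate(ShippingRule):
    def __init__(self, amount: Decimal) -> None:
        self.amount = amount

    def cost(self, weight_kg: Decimal) -> Decimal:
        return self.amount


class PerKilogram(ShippingRule):
    def __init__(self, rate: Decimal) -> None:
        self.rate = rate

    def cost(self, weight_kg: Decimal) -> Decimal:
        return (self.rate * weight_kg).quantize(Decimal("0.01"))


class ShippingCalculator:
    def __init__(self, rule: ShippingRule) -> None:
        self.rule = rule

    def calculate(self, weight_kg: Decimal) -> Decimal:
        return self.rule.cost(weight_kg)


# System-level API: one obvious call, hard to misuse, internals hidden.
_RULES = {"standard": PerKilogram(Decimal("2.50")), "letter": FlatRate(Decimal("1.00"))}


def shipping_cost(weight_kg: float | str, service: str = "standard") -> Decimal:
    """Return the shipping cost; raises ValueError for an unknown service."""
    if service not in _RULES:
        raise ValueError(f"unknown service {service!r}; choose from {sorted(_RULES)}")
    return ShippingCalculator(_RULES[service]).calculate(Decimal(str(weight_kg)))
```

An outside caller only needs `shipping_cost("3")`; the strategies stay free to evolve under TDD without changing that edge.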