I don't know if others have fallen into this trap - I only know that I have. As such, this post is mostly a reminder to myself for future projects....
When building software that uses a database, it is important not to create what I will call "implicit, difficult-to-resolve relationships". An unrealistic, but illustrative example...
Suppose you have a database-lookup of zip-codes, such that each zip-code looks up a US state. Suppose then, that you decide to add a time-dimension to those zip-codes, such that each zip-code has a period during which it is valid. In our address table, we store the zip-code. When needed, we can lookup the US state, but each time we do it, we have to factor in some date (because the zip codes are only valid for some period).
It is not always a date (they are the worst though - avoid like the plague). It could be a customer id (customize your lookup tables for each client within the database, but still maintain some common set of values for all clients). If the relationship can no longer be simply resolved, then you have a problem.
It is tempting to add flexibility in this way, but ultimately a mistake. Traversing relationships is done very often in a software system - anything artificial you do to make a relationship more complex than "key = foreign key" has a direct negative impact on system complexity/quality. A good guideline - if your O/R Mapper cannot represent the relationship, then it is too complex (you do use some form of O/R Mapper, don't you!?).
So, keep your relationships unambiguous (by linking on a key field instead, for example). Even duplicating the data is better - it is far easier to keep multiple copies of a data field than it is to consistently correctly traverse a difficult-to-resolve relationship. Keep the zip-code field if you must, but add an unambiguous foreign key to the related data.
Thursday, December 20, 2007
Subscribe to:
Post Comments (Atom)

0 comments:
Post a Comment