Implementing Data Integrity
- 0 Comments
In this blog post, we’ll look at a few of the issues involved in creating the physical model of the problem space: the database schema. Moving from one level to another is primarily a change in terminology—relations become tables and attributes become fields—except for issues of data integrity. These never map quite as cleanly as one wants.
Unknown and Nonexistent Values (Again)
I somewhat blithely declared that domains and attributes should be examined to determine whether they are permitted to be empty or unknown without considering how these constraints might be implemented. The implementation issue (and it is an issue) can’t be avoided once we turn to the database schema.
The soi-disant “missing information problem” has been acknowledged since the relational model was first proposed. How does one indicate that any given bit of information is either missing (the customer does have a surname, we just don’t know what it is) or nonexistent (the customer doesn’t have a middle name)? Most relational databases, including Microsoft Jet databases and SQL Server databases, have incorporated the null as a way of handling missing and nonexistent values.
To call the null a solution to the issue is probably excessive, as it has numerous problems. Some database experts reject nulls entirely. C. J. Date declares that they “wreck the model,”1 and I’ve lost track of how many times I’ve heard them declared “evil.” Any remarks about the complexity of handling nulls or rueful admissions to having been caught out by them will result in remarks along the lines of “Good. You shouldn’t be using them. They should hurt.”
As an alternative, the “nulls are evil” school recommends the use of specific values of the appropriate domain to indicate unknown or nonexistent values, or both. I think of this as the conventional value approach. The conventional value approach has several problems. First, in many instances the chosen value is only conventional. A date of 9/9/1900 doesn’t really mean the date is unknown, we just agree that’s what we’ll interpret it to mean. I fail to see that this approach is an improvement over the null. A null is a conventional value as well, of course, but it can’t be confused with anything else, and it has the advantage of being supported by the relational model and most relational database engines.
The second, and to my mind disqualifying, problem with the conventional value approach is its impact on referential integrity. Take, for example, an optional relationship between a Customer and a Customer Service Representative (CSR), such that the CSR, if one is assigned, must be listed in the CSR table. The conventional value approach requires that a record be added to the CSR table to match the conventional value chosen to indicate that no CSR is assigned