Fun with MySQL date and time types

Just stumbled across some details of MySQL that might be the beginning of a neat discussion between DB and O-O guys. Handling date and time correctly at application-level can be hard enough, but it is more fun when the DBMS applies its own special quirks :-). Cited from the MySQL 5.1 reference documentation, Ch. 11.3. (Date and Time Types):

You can get MySQL to accept certain dates, such as '1999-11-31', by using the ALLOW_INVALID_DATES SQL mode. This is useful when you want to store a “possibly wrong” value which the user has specified (for example, in a web form) in the database for future processing. Under this mode, MySQL verifies only that the month is in the range from 0 to 12 and that the day is in the range from 0 to 31.

I think this special mode would probably outsmart most developers extending or maintaing an application. Just imagine the real-world case of two applications (legacy vs. pretty-future-hope) writing to the the same shared DB with either one not validating (or calculating) dates correctly (java.util.Date stores months 0-11).

If I were a DB Engine designer and would have the task to provide customers with a vendor-specific way to store invalid dates – why not provide a type PossiblyInvalidDate type? Remember – different things should have different names. Next.

These ranges are defined to include zero because MySQL allows you to store dates where the day or month and day are zero in a DATE or DATETIME column. This is extremely useful for applications that need to store a birthdate for which you do not know the exact date. In this case, you simply store the date as '1999-00-00' or '1999-01-00'. If you store dates such as these, you should not expect to get correct results for functions such as DATE_SUB() or DATE_ADD that require complete dates. (If you do not want to allow zero in dates, you can use the NO_ZERO_IN_DATE SQL mode).

Another feature that might very well wreak havoc in migration scenarios. Forget intention-revealing interfaces – just learn about all the custom modes your DBMS provides,apply them where possible and you got a job for life! Obviously storing fuzzy birth-dates is a case for the application programmer. Next.

MySQL also allows you to store '0000-00-00' as a “dummy date” (if you are not using the NO_ZERO_DATE SQL mode). This is in some cases is more convenient (and uses less space in data and index) than using NULL values.

I sense premature optimization. Now, as good computer scientists, we would have to have a look at the implementation and make experiments to see which is best, as the reference guide does not provide this data. Next.

MySQL retrieves values for a given date or time type in a standard output format, but it attempts to interpret a variety of formats for input values that you supply (for example, when you specify a value to be assigned to or compared to a date or time type). Only the formats described in the following sections are supported. It is expected that you supply legal values. Unpredictable results may occur if you use values in other formats.

So why let the DBMS check values at all if results are unpredictable if preconditions are not met? Just an idea: Introduce a one-to-one JavaUtilDate (and probably RubyDate, PythonDate, …) type and 99% of all DB-related date/time issues are history. Next.

Year values in the range 70-99 are converted to 1970-1999.

Year values in the range 00-69 are converted to 2000-2069

A lot of applications (i.e. Excel) do something like this, but at a much higher level. Wouldn’t it be simpler to leave such logic to the applications that know nobody can be born in the year 2050 in 2013?

By default, when MySQL encounters a value for a date or time type that is out of range or otherwise illegal for the type (as described at the beginning of this section), it converts the value to the “zero” value for that type. The exception is that out-of-range TIME values are clipped to the appropriate endpoint of the TIME range.

It gets better:

The following table shows the format of the “zero” value for each type. Note that the use of these values produces warnings if the NO_ZERO_DATE SQL mode is enabled. […] DATE [zero value]  '0000-00-00'.

In my understanding this means, that even though I assume zero dates are rejected, MySQL feels free to ignore that (but maybe the JDBC driver checks – who knows?). Next.

The “zero” values are special, but you can store or refer to them explicitly using the values shown in the table. You can also do this using the values '0' or 0, which are easier to write.

Being a context-sensitive reader – that does not really come as a surprise. Next.

“Zero” date or time values used through MyODBC are converted automatically to NULL in MyODBC 2.50.12 and above, because ODBC cannot handle such values.

Just for the sake of completeness – if you feel you just filled up your mind with useless knowledge this won’t hurt anymore.

Conclusion

Database vendors obviously make a lot of assumptions about the use-cases of its users and this is what scares me off on a regular basis as it usually hurts so much more than it helps. As most DBMSs are mere storage providers for object-oriented application programs today, it is hard to understand why one has to basically completely understand the designated DBMS to make it store what comes in and retrieve what went in. (The oracle “empty varchar => NULL” thing is another example of semantically flawed, vendor-specific internal optimizations.) Relational DBMS are quite off track when it comes to usability, which might explain why commerical support is so expensive.

What do you think? Are these options/quirks, that are specific to MySQL, really necessary? Would you use such specialities in your own applications? What are your favorite “options” in other DBMSs? Feel free to comment.

Advertisements