Sie sind auf Seite 1von 7

21/3/2014 Print preview: On Handling Dates in SQL

https://www.simple-talk.com/content/print.aspx?article=1947 1/7
T
On Handling Dates in SQL
06 March 2014
by Joe Celko
The calendar is inherently complex by the very nature of the astronomy that underlies the year, and the
conflicting historical conventions. The handling of dates in TSQL is even more complex because, when
SQL Server was Sybase, it was forced by the lack of prevailing standards in SQL to create its own ways of
processing and formatting dates and times. Joe Celko looks forward to a future when it is possible to
write standard SQL date-processing code with SQL Server.
"I've been on a calendar, but I've never been on time."
--Marilyn Monroe
he most international of all units of measurement on earth is the Common Era Calendar. This used to be known as
the Gregorian Calendar, Western Calendar or Christian Calendar when it was issued by Pope Gregory in 1582.
And the original company is still in business. There have been no upgrades since then, unless you want to count
adding local names for the months. For example, Czech months are of Slavic origin (jen), whereas the Slovak months
are of Latin origin (Oktber), but most languages use the Latin roots. But I am in database, so I only care about the
subset of ISO-8601 display formats used in my language. The only use of the term ' field ' in SQL is for the parts of a
timestamp (YEAR, MONTH, DAY, HOUR, MINUTE, SECOND ); for this article, all I care about is the DATE set of fields
(YEAR, MONTH, DAY ). Do not panic, this subset is good enough for most commercial work. Those of you who deal with
museums, talk to me off line.
Today, every SQL implementation has a DATE data type, but the functions available for them vary quite a bit. The most
common ones are a constructor that builds a date from integers or strings; extractors to pull out the month, day, or year;
and some display options to format output. Do not mimic 1970's COBOL and format data for display, using a proprietary
vendor feature.
In Standard SQL, the constructor is CAST (<string expression> AS DATE) for a string expression. Hey, we have
this in T-SQL now! In Standard SQL, the only ISO-8601 format "yyyy-mm-dd" allowed for date values. Hey, we have this
is how T-SQL displays by default in the new DATE, TIME and DATE TIME2(n)! Do you think that Microsoft is trying to
catch up? The ANSI Standard also has DATE (<string expression>) option lacking in T-SQL, but that is simple
syntactic sugar.
The ISO-8601 Standard has other date and time formats which are not ambiguous and could be added to the strings
recognized by SQL. But we had to make a decision for the SQL Standard, so that we have one and only one way to work
with dates. This avoids problems like confusing the British (dd/mm/yy), American (mm/dd/yy) and other national
traditional shorthands. This format is also embedded in lots of other ISO Standards, hardware counters and anything
that has to be sorted in temporal order. We picked the most readable, most universal one we had.
SQL Server still has the 1970's Sybase/UNIX convert() string function. This is because when it was written, we did not
know how to use tiered architectures. Client/Server was brand new! We had decades of monolithic COBOL as the only
programming model (yes, I am that old). This means that all the display formatting is mixed in the the other code for data
retrieval, computations, etc. To accommodate this, Sybase added convert() with local, non-standard displays for dates.
There was no concept of independent presentation layers, separated from the database layer, doing local translations,
computations and all the display work.
Sybase also added MONEY as a data type to put punctuation marks (currency symbol, comma, decimal point) in currency
display. MONEY is another issue; it does not do correct math, does not port and has illegal precision for many GAAP and
EU regulations. This was done with a PICTURE spec in COBOL (http://en.wikipedia.org/wiki/Picture_clause), if you want
to see where Sybase got this.
The ANSI/ISO extractor for is the function EXTRACT(<extract field> FROM <extract source>)
The extract field options for dates are a little easier to see this in a table
Meaning of <primary date field> Keyword
Year in Common Era Calendar (returns 0001-9999) YEAR
21/3/2014 Print preview: On Handling Dates in SQL
https://www.simple-talk.com/content/print.aspx?article=1947 2/7
Month within year (returns 01-12) MONTH
Day within month (returns 01-31) DAY
The obvious ordering is from most significant to least significant: YEAR,MONTH and DAY .
The date fields contain non-negative integer values, constrained by the natural rules for dates using the Common Era
calendar. Please note a few things you might not think about if you are not a data junkie who stayed up to watch the
leap-second one night.
There is no year zero. There are no negative years, either. The correct postfix is CE, and BCE for Common Era and
Before Common Era dates, replacing AD (Anno Domini ') and BC (Before Christ).
Standard SQL also has an INTERVAL data type. This is a mixed measurement, made up of temporal fields, just as we
Americans mix feet and inches. An INTERVAL is a period of time, such as "3 years 2 months" or "90 days"; in T-SQL we
have to use a single unit of temporal measurement in a parameter in an old Sybase function call to DATEADD() ,
DATEDIFF() and so forth.
The reason for the Sybase model is historical. The temporal data types were not in the Standards and the Sybase clock
was based on a floating point number in the hardware. Following the UNIX model, we used clock ticks on the old 16-bit
hardware. It works well with C programs, too.
There are effectively two kinds of INTERVAL sub-types for using with dates (there are more for time fields , but I am not
talking about it here).
1. YEAR -MONTH contains one or both YEAR and MONTH fields.
2. "DAY " contains a positive or negative number of days. This looks like what we do in T-SQL, but without the mixed
units and different syntax. .
The reason for the two interval types is to avoid the ambiguity that would arise if a MONTH value was combined with a
field of lower significance, e.g. DAY, given that different months contain differing numbers of days. For example, the
interval "2 months and 10 days" could vary between 69 and 72 days in length, depending on the months involved.
Therefore, to avoid unwanted variations in the downstream arithmetic etc., the variable length MONTH component may
only exist at the lowest significance level in an INTERVAL .
The value of a MONTH field, which is not in the leading field position, is constrained between 0 and 11, in an INTERVAL
(and not between 1 and 12 as in a DATE). You have to round up a dozen months to a year, just you do not describe
yourself as five foot, 1 fourteen inches on an application.
You can assume that your SQL implementation has simple date arithmetic functions, although with different syntax from
product to product. The basic functions you need are just those that work with dates:
A date plus or minus an interval of days yields a new date.
A date minus a second date yields an interval of days.
Here is a table of the valid combinations of <date> and <interval> data types in Standard SQL:
ANSI/ISO Standard SQL Math
<date> - <date> = <interval>
<date> + <interval> = <date>
<interval> (* or /) <numeric> = <interval>
<interval> + <date> = <date>
<interval> + <interval> = <interval>
<numeric> * <interval> = <interval>
There are other rules, which deal with time zones and the relative precision of the two operands that are intuitively
obvious. You can also CAST (<string expression> AS INTERVAL <interval type>) if you need it.
There should also be a function that returns the current date from the system clock. This function has a different name
21/3/2014 Print preview: On Handling Dates in SQL
https://www.simple-talk.com/content/print.aspx?article=1947 3/7
with each vendor: TODAY, SYSDATE, NOW(), CURRENT DATE and getdate() are some examples. The correct
keyword is CURRENT_TIMESTAMP for the system timestamp and CURRENT_DATE for the system date. T-SQL now has
CURRENT_TIMESTAMP to replace the 1970's getdate() from Sybase. Microsoft has been bad about getting up to specs,
so T-SQL will still generate the old getdate() function call. Try to avoid it so you will look like one of cool kids and hope
Microsoft keeps making progress. You can also use CAST (CURRENT_TIMESTAMP AS DATE) for CURRENT_DATE .
Leap Year
You ought to know better, but I am posting a warning anyway. You might remember being told in grade school that there
are 365 days per year and that the accumulation of the fractional day creates a leap year every four years. Once more,
your teachers lied to you; there are really 365.2422 days per year. Every four years, the extra 0.2400 days accumulate and
create an additional day; this gives us a leap year. Every 400 years the extra 0.0022 days accumulate enough to create
an additional day and give us this special leap year. Since most of us are not over 400 years old, we did not have to worry
about this until the year 2000. However, every 100 years the missing 0.01 days (i.e. 365.25 - 365.2422 rounded up)
balances out and we do not have a leap year.
The correct test for leap years is:
CASE WHEN (my_year %400) = 0 THEN 'Yes'
WHEN MOD(my_year % 100) = 0 THEN 'No'
ELSE CASE WHEN MOD(my_year % 4) = 0
THEN 'Yes' ELSE 'No 'END ;
END;
People who did not know this algorithm wrote lots of programs. I do not mean COBOL legacy programs in your
organization; I mean packaged programs for which you paid good money. The date functions in the first releases of
Lotus, Excel and Quattro Pro did not handle the day 2000-02-29 correctly. Lotus simply made an error and the others
followed suit to maintain "Lotus compatibility" in their products. Microsoft Excel for Windows Version 4 shows correctly
that the next day after 2000-02-28 is 2000-03-01. However, it thought that the next day after 1900-02-28 is also February
29 instead of March 01. Microsoft Excel for Macintosh did not handle the years 1900-1903.
Do not worry about working with other products and Standard SQL. I just want to help you read Standard SQL, but we are
not done yet. The ISO-8601 Standards include an ordinal date and a week date format.
Daily Exercise
If you do not like the way that Pope Gregory arranged the months, do not blame him. He inherited that stuff from the
Romans. The simplest way to label the days of the year is to number the days from 1 to 365 (or 366 in a leap year)
without any further aggregations. This is called an Ordinal date and it is formatted as yyyy-ddd' where ddd is an
integer string with leading zeros between '001' to '366' when displayed. It is not called the Julian date, contrary to
popular belief. The Julian date is a special count of days used by astronomers based on the Julian Day Number (JDN)
(http://en.wikipedia.org/wiki/Julian_date). The Julian day number zero begins with 4714-11-24 BCE, it and gets big in
modern times. For example, the Julian day number for 2000-01-01 was 2,451,545.
You probably want to use a 'Julianized' date. This sequence starts with some date that makes sense for your enterprise,
say a century or two. Obviously this numbering will make math easier. But good trick is the 'Julianized' business date.
Instead of using a monotonic increasing sequence, use an increasing sequence (i.e. it has duplicated values). This is
easier to explain with an example. Assume it is 2007 and our company takes Easter as a long holiday (this is a year with
Catholic and Orthodox calendars agree).
CREATE TABLE Business_Calendar
(cal_date DATE NOT NULL PRIMARY KEY,
julian_business_nbr INTEGER NOT NULL,
...);
INSERT INTO Business_Calendar
VALUES ('2007-04-05', 242),
('2007-04-06', 243), -- Good Friday
21/3/2014 Print preview: On Handling Dates in SQL
https://www.simple-talk.com/content/print.aspx?article=1947 4/7
('2007-04-07', 243),
('2007-04-08', 243), -- Easter Sunday
('2007-04-09', 244),
('2007-04-10', 245),; --Tuesday
.. ;
To compute the business days from Thursday of this week to next Tuesdays:
SELECT (C2.julian_business_nbr - C1.julian_business_nbr)
FROM Business_Calendar AS C1, Business_Calendar AS C2
WHERE C1.cal_date = '2007-04-05',
AND C2.cal_date = '2007-04-10';
Do not try to compute holidays with temporal math functions. Look at the formula for Easter; it is mess. And which Easter
did you mean? If you are an American, do you remember all the holidays for the fifty states? When did we create
President's Day from Washington's and Lincoln's Birthday?
The DATE data type is only three bytes, and a century worth of dates is only 36,525 rows. This is cheap and small for
today's computers. The table is read-only forever and is indexed on the cal_date with other display formats as
INCLUDE columns on the index. But more than that, the common Business Calendar prevents bad data. I worked with a
Greek and a Russian Orthodox programmer who had gotten their Easter list from their priest. I had two matching
answers, so I accepted the it. But other calendars in the company were done by Catholics. You can see the problems.
A Week Defense
In most SQL products there is a vendor function to return the day of the week from a date, which is sometimes called
DOW() or WEEKDAY (). The underlying formula is called Zeller's congruence
(http://en.wikipedia.org/wiki/Zeller%27s_congruence) which has versions for both the old Julian and the Common Era
calendars. This was what we used in the old days because the math was hidden in a procedure call in C. We all had a
copy of Standard C Date/Time Library by Lance Latham, (ISBN : 087930-496-0 or ASIN: B00336ESWE now on Kindle).
The problem is that Zeller's formula returns a number between zero and six it has a MOD() function in it. Too bad this
wrong. The ISO Standard Day-of-Week is defined as 1 = Monday to 7 = Sunday. The Microsoft DATE PART() also returns
a number from 1 to 7, but with the default of 1 = Sunday to 7 = Saturday. As a generalization, ISO prefers to use zero as a
value for a missing or unknown value in an encoding scheme. I will explain why that is useful a little later in this article.
The ISO-8601 Standard includes a week date format based on the day-within-week within year that is popular in Nordic
countries. This format is 'yyyyWww-d' where yyyy is the year, W is a separator token, ww is (01-53) week number and d
is (1-7) day of the week, as we just discussed. To make this easy to validate, use:
CHECK (week_date LIKE '[12][0-9][0-9][0-9]W[[0-5][0-9]-[1-7]'
Weeks start with Monday. The first week of a year is the week that contains the first Thursday (and, hence, January 4th) of
their year. There are several websites with calendars you can cut and paste, but you can start your search with: Week
Numbers for 2014. Now look at this year and notice that 2014W52 ends in 2014, but 2015W01 overlaps into 2015.
Week Start Date End Date
1 2014-12-30 2014-01-05
2 2014-01-06 2014-01-12
3 2014-01-13 2014-01-19
4 2014-01-20 2014-01-26
5 2014-01-27 2014-02-02
6 2014-02-03 2014-02-09
7 2014-02-10 2014-02-16
21/3/2014 Print preview: On Handling Dates in SQL
https://www.simple-talk.com/content/print.aspx?article=1947 5/7
8 2014-02-17 2014-02-23
9 2014-02-24 2014-03-02
10 2014-03-03 2014-03-09
11 2014-03-10 2014-03-16
12 2014-03-17 2014-03-23
13 2014-03-24 2014-03-30
14 2014-03-31 2014-04-06
15 2014-04-07 2014-04-13
16 2014-04-14 2014-04-20
17 2014-04-21 2014-04-27
18 2014-04-28 2014-05-04
19 2014-05-05 2014-05-11
20 2014-05-12 2014-05-18
21 2014-05-19 2014-05-25
22 2014-05-26 2014-06-01
23 2014-06-02 2014-06-08
24 2014-06-09 2014-06-15
25 2014-06-16 2014-06-22
26 2014-06-23 2014-06-29
27 2014-06-30 2014-07-06
28 2014-07-07 2014-07-13
29 2014-07-14 2014-07-20
30 2014-07-21 2014-07-27
31 2014-07-28 2014-08-03
32 2014-08-04 2014-08-10
33 2014-08-11 2014-08-17
34 2014-08-18 2014-08-24
35 2014-08-25 2014-08-31
36 2014-09-01 2014-09-07
37 2014-09-08 2014-09-14
38 2014-09-15 2014-09-21
39 2014-09-22 2014-09-28
40 2014-09-29 2014-10-05
41 2014-10-06 2014-10-12
42 2014-10-13 2014-10-19
43 2014-10-20 2014-10-26
44 2014-10-27 2014-11-02
45 2014-11-03 2014-11-09
46 2014-11-10 2014-11-16
47 2014-11-17 2014-11-23
48 2014-11-24 2014-11-30
49 2014-12-01 2014-12-07
50 2014-12-08 2014-12-14
51 2014-12-15 2014-12-21
21/3/2014 Print preview: On Handling Dates in SQL
https://www.simple-talk.com/content/print.aspx?article=1947 6/7
52 2014-12-22 2014-29-28
1 2014-12-29 2015-01-04
Report Period Tables
Since SQL is a database language, we prefer to do look-ups and not calculations. They can be optimized while temporal
math messes up optimization. A useful idiom is a report period calendar that everyone uses so there is no way to get
disagreements in the DML. The report period table gives a name to a range of dates that is common to the entire
enterprise and which applies to a particular report's reporting periods.

CREATE TABLE Something_Report_Periods
(something_report_name CHAR(??) NOT NULL PRIMARY KEY
CHECK (something_report_name LIKE <sortable pattern>),
something_report_start_date DATE NOT NULL,
something_report_end_date DATE NOT NULL,
CONSTRAINT Date_Ordering
CHECK (something_report_start_date <= something_report_end_date),
etc);
These report periods can overlap, have gaps or no particular pattern at all. I like the MySQL convention of using double
zeroes for whole months and years. That uses 'yyyy-mm-00' for a month within a year and 'yyyy-00-00' for the whole
year. The patterns for using LIKE are '[12][0-9][0-9][0-9]-00-00' and '[12][0-9][0-9][0-9]-[01][0-9]-00'
in a validation.
The advantages are that it will sort with the ISO-8601 data format required by Standard SQL and it is language
independent. Using the Date_Ordering CHECK() constraint will help the optimizer as well as preventing unintended
time travel.
Another useful trick is the ISO half-open interval model of time. Time moves in one direction, forward., into an unknown
future. This is why an event will start before it ends and why we might not have a known ending date.
Unlike the IEEE 754 Floating Point Standards, we do not have a plus and minus eternity to correspond to their numeric
plus and minus infinities. But we can use a CAST (NULL AS DATE) in the end_date column and a non-NULL
start_date. The trick is to remember to coalesce that NULL in queries:
SELECT COUNT(room_nbr) AS occupancy_tot
FROM Hotel_Rooms
WHERE room_status = 'occupied'
AND @my_report_date
BETWEEN checkin_date
AND COALESCE (checkout_date, CURRENT_TIMESTAMP);
Reading Assignment:
Here are a few books for a quick reading list.
Humanity's Epic Struggle to Determine a True and Accurate Calendar: by David Ewing Duncan; ISBN-13: 978-
0380975280.
The Oxford Companion to the Year: An Exploration of Calendar Customs and Time-Reckoning by Bonnie
Blackburn; ISBN-13: 978-0192142313.
Mapping Time: The Calendar and Its History by E. G. Richards; ISBN-13: 978-0192862051.
The Seven Day Circle by E. Zerubavel; ISBN-13: 978-0029346808. This is the detailed history of the week and
other units less than a month in many calendars.
21/3/2014 Print preview: On Handling Dates in SQL
https://www.simple-talk.com/content/print.aspx?article=1947 7/7
Simple-Talk.com

Das könnte Ihnen auch gefallen