This is the second of two entries on the subject of my role at Kronos working on the so-called "Y2K problem". The initial entry on this subject was called "Y2K".

Throughout 1999, I kept a humorous "Y2K emergency kit" in my office at Kronos Incorporated. The picture to the left shows one of the five items in the kit: a copy of the "Weekly World News" with a sensational front-page story on Y2K. The other items in the kit are shown in the next four pictures in this entry (which also show their display labels). In each case, you can click on the image to see a larger version, and to read a bit more information.

When I first arrived at Kronos in April 1979, their only product (at the time, yet to be introduced) was a microprocessor-controlled timeclock (meaning a device used by workers in the factory to punch in and out, recording their time worked). It turned out that the software controlling the clock was very badly written, including an extremely difficult-to-trace bug which caused it to lock up, and burn out one of its components. While we were rewriting the majority of the software in order to eliminate that problem, I took a look at the way the software internally managed its representation of time. In keeping with the software's general lack of good design, this was done extremely poorly.

For our timeclocks, 1 minute was the smallest unit of time we needed to track. Time in the original program was represented with separate variables that contained the year (the final two digits only), a code representing the month, the date within the month, an a.m./p.m. indicator, the hour (using a 12-hour clock), and the minutes. In keeping with the general insanity of the code, these values were actually stored in B.C.D. format ("Binary-Coded Decimal"). The above is an absolutely terrible way to keep track of time, for various reasons that computer programmers will recognize, and that I don't need to go into for everyone else.
When we rewrote the software, I changed the internal time representation to a single unsigned 32-bit number representing the number of minutes elapsed since midnight, January 1, 1900 (although for purposes of the representation, I assumed 1900 to be a leap year, which it wasn't). When a date was entered into the software, it needed to be converted from external form (year, month, date, hour, minute) to this internal form. Similarly, when the software needed to output a date/time, it needed to convert the internal form into the external representation. But these conversions are not at all difficult, and inside the software, dealing with timestamps was extremely simple.

Inside the software, there was absolutely no concern about month, year, century, or even millennium. If you subtracted one of these 32-bit timestamps from another, the result would be the elapsed number of minutes between them. It didn't matter if one of the timestamps was 3:57 AM on November 4, 1999, the other was 12:45 PM on January 3, 2000, and the start of the new millennium had occurred in between. The software doing the subtraction didn't need to know the number of days in November and the number of days in December. It didn't need to know anything; it just needed to subtract two 32-bit unsigned numbers.

In actual fact, since timestamps within the clock could cover only a limited range, we generally truncated the 32-bit numbers to only 16 bits for storage purposes, reconstructing the high-order 16 bits from the current date and time. But that's a fine point.

My purpose in explaining all this is to boast a bit about my good software design in this case (not too much, because I've made plenty of design blunders in my time). The software I wrote in 1979 was perfectly capable of operating into the year 2000 and beyond, with no changes required.
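For programmers, here is a minimal sketch of a minutes-since-1900 representation of this kind, written in Python purely for illustration. The names `to_minutes` and `expand_16` are hypothetical, and the rule used to rebuild the high-order 16 bits is an assumption (it supposes a stored timestamp is never later than the current time and is less than 65,536 minutes, about 45 days, old). It is not a description of the original Kronos code.

```python
# Cumulative days before each month in a non-leap year.
DAYS_BEFORE_MONTH = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334)


def to_minutes(year, month, day, hour, minute):
    """Minutes elapsed since midnight, January 1, 1900 (hypothetical helper)."""
    years = year - 1900
    # Treat every fourth year as a leap year, 1900 included. That is wrong
    # for 1900 itself but keeps the rule trivial, and it holds through 2099.
    leap_days = (years + 3) // 4          # leap days in the years before this one
    days = years * 365 + leap_days + DAYS_BEFORE_MONTH[month - 1] + (day - 1)
    if year % 4 == 0 and month > 2:
        days += 1                         # this year's own February 29
    return (days * 24 + hour) * 60 + minute


def expand_16(low16, now):
    """Rebuild a full timestamp from its low 16 bits (assumed scheme).

    Assumes the stored time is not later than `now` and is less than
    65,536 minutes (about 45 days) older than `now`.
    """
    candidate = (now & ~0xFFFF) | low16
    if candidate > now:
        candidate -= 0x10000              # stored time fell in the previous 16-bit window
    return candidate


punch_in = to_minutes(1999, 11, 4, 3, 57)
punch_out = to_minutes(2000, 1, 3, 12, 45)
elapsed = punch_out - punch_in            # plain integer subtraction

stored = punch_in & 0xFFFF                # what a 16-bit record would keep
restored = expand_16(stored, now=to_minutes(1999, 11, 20, 8, 0))
print(elapsed, restored == punch_in)      # prints: 86928 True
```

The example uses the two timestamps mentioned above: the subtraction yields 86,928 minutes (60 days, 8 hours, 48 minutes) without any knowledge of month lengths or the change of century.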
This explains the sentence at the beginning of Kronos's "Year 2000 Readiness Disclosure": "Since 1979, Kronos Incorporated has designed its terminals, and TKC DOS, TKC Unix, TKC/S, and TKCWin software to operate into the year 2000 and beyond."

Our original timeclock ultimately came to be called the "System 40", and it spawned a whole series of timeclocks with different numbers, which all shared the basic code for processing time. One of these systems was a very small, inexpensive unit called the "System 10". Rather late in 1999, I got a panicky call from a customer who owned a System 10, asking if it would work in the year 2000.

Now, as we started addressing the Y2K problem, one of the first things the Kronos "Y2K steering committee" did was to determine which of our older systems no longer warranted support, and consequently did not need to be tested. The System 10 was one of the systems which had been declared obsolete. So I told my caller that since it was no longer being supported, we were not testing it to see if it would work properly in the year 2000. Obviously, by late 1999, I had been thoroughly indoctrinated by the lawyers as to what I could say and what I should not say. The System 10 had not been tested, would not be tested, and we would make no official statements about it.

My caller was rather upset by this. Since he had foolishly waited until he had only a couple of months left, he really had no time to replace his unit, and thus was in a rather difficult position. He said, "Oh come on, you must know something about it. Didn't anyone give it a try?" Of course, he couldn't easily test his own System 10 by setting the clock forward, since it was in 24-hour-a-day use as an active time and attendance terminal.

Well, engineers must really make lawyers tear their hair out, because I confess that I ultimately relented. "Okay," I said, "if you turn out to have any problems, I'm going to deny that this conversation ever took place.
But yes, out of curiosity, we did set a System 10 forward into the year 2000, and it appeared to work perfectly."

There were particular problems with older systems, because if a problem turned up, it could be hard to fix. In the worst case, the so-called "source code" of the computer program might have actually been lost, making a very old program impossible to alter. But actual loss of the source code rarely happens. Computer programmers in particular are very careful to maintain their original programs. In fact, they make use of elaborate computerized tools called "source control programs" to make sure that nothing is lost. These tools not only keep track of the source code of your program, they keep track of preceding versions, to allow you to go backwards if you introduce bugs as you're working. They also generally have features that allow multiple programmers to work on a single program, keeping track of who has done what, and attempting to resolve any cases of interference between the various programmers working in the same area of the code.

Yet even if you make sure you don't lose your source code, you can still lose the ability to change the program. These days, your source code is probably written in what is called a "higher-level language" which is translated into machine language ("object code") by a tool called a "compiler". But as compilers are changed and improved, they sometimes lose the ability to properly compile older code. You may have kept your source code, but lost the compiler capable of translating it. One reason the compiler might change could be a change in the underlying operating system. Any owner of a PC knows that Microsoft comes out with new operating systems every few years. And a new operating system may become incapable of running an old compiler. In its turn, the operating system is dependent on the hardware.
An old program can become impossible to change unless you actually kept the old computer that it was originally compiled on (and kept it operational, and retained detailed operating instructions and manuals for how to use that compiler on that old computer). These are some of the problems that faced programmers who found a date-related bug in an old ("legacy") system.

Dealing with the Y2K problem was also interesting from a management point of view. I was responsible for testing all of the Kronos software, and for repairing all the bugs that were found, without having any direct authority over the people who actually had to do the work. None of the people who had to write the test scripts, carry out the testing, or fix any bugs that came up reported to me. I was Vice President, Research and Development. The people who had to carry out the work reported either to Software Quality Assurance, or to the various engineering managers responsible for our assorted product lines.

Our sometimes conflicting priorities thus occasionally brought me into conflict with the engineering managers in particular, who had to fit the Y2K work into their schedules, which was not easy. After all, they had product release schedules to meet, and customers waiting for new features. Writing test scripts and carrying out the testing distracted people from their primary, revenue-producing work. The managers had to work very hard to shoehorn the Y2K work into their schedules.

I came into conflict at one point with Dan Doherty, in charge of a large number of older, so-called "legacy" systems. Our difference of opinion on how to proceed eventually had to be mediated by the Vice President - Engineering to whom we both reported, Laura Woodburn. In a meeting with the three of us, she skillfully caused us each to appreciate the other's point of view, and smoothed over the conflict.
Peter George, the "Vice President, Client/Server Systems", who was in charge of the other major set of Kronos products, proposed postponing almost all his testing until only a few months before the end of 1999. I objected, pointing out that no time would be left to fix any problems that arose. Remember, if a bug turned up, we couldn't just fix it in the software we were shipping at the time. We would also have to go out into the field and upgrade every previous installation already in use. While there were procedures for doing this sort of upgrading, they took time. It seemed to me that the Client/Server group was not allowing enough time for this possibility.

From Peter's point of view, however, there was almost no chance that any bugs would be found. There were several reasons for his confidence, principal among them being that these were among our newest products, and had been coded at a time when the year 2000 was already coming into view. There was another, technical reason he thought there would be no problem: these products were coded in a computer language called "Smalltalk". A somewhat unusual language, Smalltalk was capable of doing integer arithmetic of arbitrary precision. That is, if its numbers got extremely large, it would automatically (at run-time) switch to numbers with a different coding, as required by the size of the value to be represented. In Smalltalk you could multiply 5482456790120547 by 5574200159542011, and without blinking an eye (figuratively speaking) it would return 30560311514172134603837000800017. So it seemed a bit less likely that our Smalltalk code would have any problem dealing with whatever dates we fed it.

Until someone tried it. I have to confess that I wasn't the one who thought of the idea of simply doing a quickie test, without waiting for the formal script. But someone in Software Quality Assurance took the Client/Server group's latest and greatest product, and simply set the time into January of 2000. And it didn't work.
Faced with that information, Peter was convinced to do his testing earlier than he had originally intended (with some harm to his product shipment schedule). In all of our testing, pretty much the only major problem with any Kronos-created products was with our Smalltalk code, among the newest programs we had at the time! But in the final analysis, having responsibility without authority on the Y2K project didn't cause all that much difficulty. After all, we were all on the same team, and wanted to be sure no disaster would happen on January 1, 2000. And it's actually not all that uncommon to have responsibility without direct authority, given that in industry, multiple departments of a company often cooperate on projects. In addition, in 1999, I had been with Kronos for 20 years, and was very well-known to almost everyone. Our employee badges had sequence numbers, and mine bore the number "3". It's not that I was the third employee hired, as the badges had been re-numbered at some point. But at that point in time, only two other employees had been there longer than I had, and employee number 1 (Mark Ain) was one of the founders. Most people were happy to cooperate and to comply with my requirements. So, what happened at midnight, January 1, 2000? Nothing much. Some people think that meant that the whole problem had been over-hyped. Others think it indicates the success of all the mitigation efforts made during 1998 and 1999. My own opinion falls between these two points of view. Yes, the hysteria was overblown, but there indeed were real problems in many computer programs that got repaired before they caused trouble. What has mankind learned from this experience? Well, as George Bernard Shaw put it, "We learn from history that we learn nothing from history." Apart from our Smalltalk code, most of our programs were written in a high-level language called "C". 
High-level languages are provided with extensive libraries of "subroutines", so that programmers can use proven code to carry out frequently used operations. There is no need to "reinvent the wheel", as it's usually phrased. In C, many of the most commonly used pre-written subroutines are found in the ANSI C standard library (ANSI is the American National Standards Institute). This library includes a particular standard type for variables holding time, with the name "time_t". The standard leaves the exact format up to each implementation, but on most systems of that era it was a signed 32-bit integer representing the number of seconds elapsed since 00:00 hours, Jan 1, 1970 UTC ("Coordinated Universal Time"). And that means that it will overflow at 03:14:07 UTC on Tuesday, January 19, 2038 (you can work it out yourself; see Note 1).

While working on the Y2K problem, I suggested that we make sure all our programs would work long into the future, and that a mere 38 or so years past the arrival of the year 2000 was not enough. Nobody wanted to go to the rather substantial trouble involved in writing code to replace the standard library functions. And after all, when the time came, the ANSI standards committee would no doubt simply change time_t into a signed 64-bit quantity, fixing the problem. The program would just have to be re-compiled. Yeah, right. See above in this blog entry for a discussion of the difficulties one can encounter running an old compiler.

I pointed out that we had, after all, gotten into the Y2K problem in the first place by cutting corners (by doing things like storing years with only two digits). It was all to no avail - I lost that battle. More than one person actually said to me, "What do I care - I'll be retired by then." So Kronos, and most other companies, are still shipping software which, if left unchanged, will crash in the year 2038.

Do a web search on the phrase "2038 problem", including the quotes so as to see only pages with that exact phrase. As of this blog entry, you'll get over 90,000 hits.
The breathless prose looks rather familiar to me. Some people are even calling it the Y2038 problem, or the Unix Millennium Bug. Plus ça change, plus c'est la même chose. I'll close with a few more photos of Y2K artifacts, hardly an exhaustive display of the items I received. On the left below, a tongue-in-cheek Y2K Bug Jar - watch out, or I'll stand near your computer and pull the cork! On the right, a Y2K countdown wristwatch, which prior to January 1, 2000 continuously displayed how much time we had left, down to the second. If you want a closer look at the watch, you can click on the image. Then come back here with your browser's "Back" button. Finally, when it was all over, I was of course awarded a plaque to commemorate our success. Click on the image if you want to see it in its full glory, then come back with your "Back" button: This entry, and the previous one called Y2K, have been rather technical. My next two blog entries are also on the Y2K problem, but from a lighter point of view. They contain the text of a humorous speech on the subject of Y2K that I gave in 1999 to about 800 listeners at the "Kronos Sales and Service Convention" in St. Louis, Missouri. In those entries, called Y2K speech, part 1 and Y2K speech, part 2, footnotes have been added for non-Kronos readers, to clarify some of the references.
Note 1: You really can work it out yourself - it's not that hard. 03:14:07 UTC on Tuesday, January 19, 2038: 2038 is 68 years past the starting point in 1970, which without leap years is 24,820 days. We need to add 68/4 = 17 leap days, for a total of 24,837 days to get to January 1, 2038. Then 18 more days take us to the start of January 19, 2038, for a total of 24,855 days. At 86,400 seconds/day (24 × 60 × 60), that's 2,147,472,000 seconds. Three hours is 10,800 seconds, and 14 minutes is 840 seconds. Adding those numbers plus 7 seconds gives 2,147,483,647. And that number is 2^31 - 1, the highest positive number that can be held in a 32-bit signed integer.
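The arithmetic in Note 1 can also be checked mechanically. The sketch below, in Python, redoes the day count and then asks the standard `datetime` module what moment lies 2^31 - 1 seconds after the 1970 starting point. The `wrap_int32` helper is hypothetical: it imitates the two's-complement wraparound that a signed 32-bit time_t typically shows on overflow, which the C language itself does not guarantee.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1

# The hand calculation from Note 1: 68 years, 17 leap days, 18 more days,
# then 3 hours, 14 minutes and 7 seconds.
days = 68 * 365 + 68 // 4 + 18
seconds = days * 86_400 + 3 * 3_600 + 14 * 60 + 7

# The same answer from the other direction: count forward from the epoch.
last_moment = EPOCH + timedelta(seconds=INT32_MAX)


def wrap_int32(value):
    """Imitate storing `value` in a signed 32-bit integer (assumed wraparound)."""
    return (value + 2**31) % 2**32 - 2**31


# One second later, a wrapped counter jumps to the most negative value.
after_overflow = EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))

print(seconds == INT32_MAX)   # True
print(last_moment)            # 2038-01-19 03:14:07+00:00
print(after_overflow)         # 1901-12-13 20:45:52+00:00
```

One second past the limit, a wrapped 32-bit counter lands at 20:45:52 UTC on December 13, 1901, which is how an unpatched program could suddenly believe it is 1901.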