It was an ordinary day of consulting. We were at the customer’s site, installing an update to a program that we made. I was in the process of handing-over my project to my protégé. As I walked him through the processes and systems, I was thinking about how glad I was that the second week of his OJT is nearly over. Next week, I would start a new project. It was slated to take 2 person-years to complete. I was really looking forward to leading a team for this new challenge.
Before we installed our database updates for the week, we asked the Admin a usual question “has the database and server been backed up recently”? The response from the admin was an eye-roll, and a sigh, “yes, of course we backed it up, like we always do. You know that.”
My colleague was at the keyboard entering SQL commands. He had a slightly different approach from me. He loaded all of the commands and highlighted each of them to run each one. My approach was much less efficient. I always have a text file with all of the commands, off to the side. I copy/paste each one between a “begin tran” and “rollback tran” block. After I see that the rowcount is correct, I change the “rollback tran” to a “commit tran”. After each one is committed, I undo (x2) and paste the next command between the tran/rollback block.
My colleague was proud of his efficiency. After running commands for a few minutes, he somehow made a small slip with the mouse and accidentally missed the “WHERE clause” at the end of an UPDATE command. It was not highlighted (which means that it would not be included as part of the command). He had a lot of momentum and was moving pretty quickly. He hit the run command before I can say “hold on”. As I said it, he already knew. The rowcount came back with 375,000 rows affected. It should have been 3. He looked over his shoulder at me and the color washed out of his face. You could see that he was genuinely scared. “OMG, what just happened?!”
“No problem” I reassured. “We know that we have a good backup. We will roll it forward and start over with our updates”. My colleague took a deep breath and felt somewhat relieved but still a little embarrassed.
We went to the admin and ask him to roll-back the tape from last night. “What do you mean, roll-back”? the admin said. “That would over-write the data from today. That would be a half-day of data, lost. We can’t just roll-back.”
We explained the mistake: It was a simple mistake. It could have happened to anyone. We just need to restore the 375,000 rows that were damaged. We could restore that into a temp table and simply copy those records into the current database.
Then the admin dropped the other shoe: “We don’t back-up that database every day. That database is backed up weekly. Our backup for THAT database is two and a half days old. Plus, that table gets partial changes all day long. Restoring that table would be disastrous. The company would lose 2.5 whole days of business with no record of what went on during those days.”
All three of our hearts stopped. We were silent for several minutes.
My colleague asked to step outside. I think he was about to yack. I stayed inside to ask the admin about logs or other secondary records of what had gone into the database for the past 2.5 days.
Soon, the office staff started paging the sys-admin. People were noticing that the database was having a problem. All of the records looked the same. That can’t be right. What was going-on?
Now we were all starting to get scared. The business was grinding to a halt. People were calling, but could not place orders or check the status of their accounts, unless they knew their customer IDs (of course, nobody ever knows their customer IDs. It is silly to even ask).
We decided that we needed to restore that table to its backed-up state from Sunday. It was better than having all of the data be wrong.
Before doing anything else, we made another backup of the corrupted data (just-in-case any of it was still useful).
As we were resolving that data, it occurred to us that we had paper receipts and packing-slips that had been printed-out for the entire week. None of it had been shipped yet. We could re-enter all of the week’s orders, BY HAND, based on those receipts and packing slips.
We ran to the shipping area and told them to start opening all boxes and make copies of each packing slip. They were not happy. We did the same for the mail room.
After we got all of the packing slips and invoices, the total count was in. The orders that needed to be re-entered came to, around 2000 orders. TWO THOUSAND.
I spent the rest of the week with the sys-admin and two others, re-entering all of those orders, by hand. Meanwhile, my colleague said that he could not show his face in that customer’s office any more. I could sympathise.
My boss got an earful from the customer. “If this guy won’t set foot inside of the customer’s building then what should we do with him? Should he be fired?” my boss inquired. I thought about the fact that this guy was equally qualified as me. He was sharp. We didn’t really want to lose him because of one mistake, even if it was a big one. So instead of firing him, I suggested that he take-over the dream-project that I was scheduled to lead. It hurt more than words can say. I would mentor him on that project and be his wing-man. Basically, his mistake would get him promoted and I would stay behind to do what he was supposed to be doing: paying dues.
My boss reluctantly agreed.
So, I stayed-back to repair the relationship with the ruffled customer and I helped introduce a few new company policies. My first new policy: All consultants must always make your own backups. No exceptions!