Deployment day is always gut-check time. When you put a revision into prod, there is always that breath-holding period until the deployment completes and everything comes up. Shortly thereafter, there is the second period of shallow breathing as you wait for the phone to ring because someone has found a bug already. If you’ve never experienced this, then you probably aren’t responsible for very much.
Back in the day, web development was not so easy. Computers were expensive. Most places weren’t dripping with computers and servers. I saw several companies that were so tight with the hardware that the developers actually did their work right on the server. “Deployment” simply meant that you renamed a file or two and then the menus would point to your newest files. * shudder *. It still gives me nightmares sometimes.
In the mid-90s, things improved. I mostly worked in a two-stage environment: Dev and Prod. I did the work on my machine, and when I felt like I had tested it, I flung it up onto prod and watched for smoke. If I had good error-logging, I could detect errors before people were done dialing my number. Yippee.
Then things got better. We got a QA environment. This was nice, because we stopped being so baffled by the old “huh, it worked on my machine”. With our new QA environment, we could put the stuff on QA and if it didn’t run, we could take a few minutes to figure it out, without people freaking out and running around like they were on fire. My nerves were much better after we got our QA server(s).
Of course, when I say QA, don’t get too excited. Sometimes, QA was simply a tester (or two) with his own PC (acting as a makeshift server), or maybe it was a copy of the prod database (or dev database). Either way, it required much less intestinal fortitude to write a program and let a tester bang away on it for a while. Afterwards, I felt better about flinging it up to prod.
VMs (virtual machines) have been around since 2002. Companies don’t have to drop big coin to get another server any more. Anybody with admin rights can spin up a server in an hour or two. If you have templates, it takes even less time. So, hardware and cost are not really limiting factors any more. It is a little surprising that things haven’t changed much since then. For most places, these are still the only environments they have: Dev, QA, and Prod.
Of course some places will have another environment called “training”, which is really just another Prod. It doesn’t really add any safety measures onto the current mix.
In 1999, I was working on a big project with serious up-time requirements. My server admin (and mentor) introduced me to a new server environment: the staging server.
This fella saved our butts from some pretty serious stuff. Let me give an example of the problem:
Manager: The developers have been jamming away on their dev environment. They released an early preview and have been cranking out updates every couple of weeks. We have been testing and things are getting close to a release.
Lead Developer: We’ve come a long way since our early preview. So much has changed. It has been a bit of a blur. Oh golly. As we approach our release date, I hope we have kept good notes on what went onto QA. Several of our updates broke QA, and we managed to fix it within a day or two. I hope things go better when we deploy to prod. If not, I’m sure we can fix it.
Server admin: Those guys scare the crap out of me. They say crazy crap like “don’t worry” and “we can fix it”. I hate that talk. I should demand some kind of guarantee from them, but I know it would just be smoke. I want it done right, but I can’t do it myself.
The “staging server” concept is meant to be a QA-like environment reserved exclusively for the sys-admin. The developers can test their work on the QA server, but the sys-admin needs a place of his own to do his testing.
The staging process
Once the dev team believes they are ready to release a new version, the sys-admin backs up prod and restores it to the staging environment. The sys-admin confirms that the staging environment is good and then proceeds. The developers hand over a deployment set (the files), instructions for deploying it, and a test plan for confirming a successful deployment. Then they are locked out of the room. No touchie, no lookie.
The sys-admin follows those instructions and deploys to the staging environment. Then the test plan is run to confirm (or not) that the deployment was successful. If it succeeds, then onward to prod. If it fails, the developers are allowed in to figure out what went wrong. Then everything gets wiped and the staging process restarts from the beginning. Repeat until everything succeeds in staging.
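The steps above can be sketched as a small shell script. This is just an illustration: plain directories stand in for the prod and staging servers, and the paths, the deployment set, and the confirm.sh test script are all hypothetical placeholders, not part of any real deployment tool.

```shell
#!/bin/sh
# Sketch of the staging flow. Directories stand in for servers; all names
# here (demo_prod, demo_set, confirm.sh) are made-up placeholders.
set -e

stage_and_confirm() {
    prod="$1"; staging="$2"; deploy_set="$3"

    # 1. Snapshot prod and restore it onto staging (here: a directory copy).
    rm -rf "$staging"
    cp -R "$prod" "$staging"

    # 2. Apply the developers' deployment set to staging only.
    cp -R "$deploy_set"/. "$staging"/

    # 3. Run the confirmation test that shipped with the deployment set.
    if sh "$staging/confirm.sh"; then
        echo "staging OK: safe to repeat the same steps against prod"
    else
        echo "staging FAILED: wipe staging and restart the process" >&2
        return 1
    fi
}

# Tiny demo with throwaway directories:
mkdir -p demo_prod demo_set
echo 'exit 0' > demo_set/confirm.sh   # stand-in confirmation test
stage_and_confirm demo_prod demo_staging demo_set
```

The important property is that the sys-admin runs this alone, from the written instructions, with the developers locked out; if it takes tribal knowledge to get through step 2, the deployment set is not ready for prod.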
Passing the test
The key to making this successful is obviously the confirmation test. You are mistaken if you believe that a simple “monkey test” will do (monkey test = clicking around arbitrarily, in no organized manner, until you get bored). No, that kind of testing provides very little value. So, if you are just going to use a monkey test, then there is little point in all of this staging hoopla. On the other end of the spectrum, your entire QA test suite/plan would be excessive. You don’t need that level of logical testing. What you need is a special test designed to quickly hit every corner of the app to make sure the deployment hasn’t broken something or missed something.
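A confirmation test of this sort often ends up as a fast, scripted checklist. The sketch below is hypothetical: the file names, the version string, and the fixture setup are invented for illustration, and a real one would hit every page and feature of your actual app (for a web app, think curl against each key URL).

```shell
#!/bin/sh
# Sketch of a post-deployment confirmation test: quick checks over every
# corner of the app, not an exhaustive QA suite. All names are made up.

# Throwaway fixture so the sketch runs standalone:
mkdir -p demo_staging
echo "2.3.1" > demo_staging/VERSION
touch demo_staging/index.html

fail=0
check() {
    desc="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $desc"
    else
        echo "FAIL: $desc"
        fail=1
    fi
}

# Did every file in the deployment set actually land?
check "app entry point deployed" test -f demo_staging/index.html
# Is the version marker what the release notes say it should be?
check "version marker updated"   grep -q "2.3.1" demo_staging/VERSION
# A real test would also hit each major area of the running app here.

if [ "$fail" -eq 0 ]; then
    echo "confirmation passed: safe to deploy to prod"
else
    echo "confirmation FAILED: wipe staging and start over" >&2
fi
```

The point is speed with coverage: every check is cheap, but together they touch each corner of the deployment, which is exactly the gap between a monkey test and the full QA suite.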
This staging process has caught some very serious problems for my team, just before deployment. When we follow a good process and use the staging server properly, everyone sleeps a little better at night. Especially the server admin (and his boss).