Friday, February 26, 2010

Unmitigated disaster

Work required me to go to Gloucester on Tuesday.  I've been there several times now, but only driven once -- in fact, that was my first long drive in England, and although Google Maps said it was a 2-hour trip, it took me 3 hours.  The train gets there faster, costs about the same, and lets me get in a couple of hours of work (or sleep) so I've taken it ever since.

On my last visit, however, I missed the returning train by seconds -- I was running down the platform when it pulled away -- leaving me stuck in the station for an hour, and I didn't get home until almost 9pm.  Plus my boss indicated I might have to stay overnight, so I decided to drive this time, confident that with more driving under my belt, I could make it in 2 hours.  So confident, in fact, that I took a different route.

Needless to say, I arrived an hour late, but that was fine because all I needed to do was fix a problem my boss had encountered the previous Thursday, finish the migration, make sure the client was happy, and go home.  A couple of hours, tops.

By lunchtime, I still could not re-create the problem my boss was having.  Ironically, had the problem been manifest, I could have fixed it and moved on, but instead it took me longer to show there was no problem.  I still have no idea what he was doing, but I convinced him it didn't matter what was going on before, it was running fine now.  In fact, the job he complained took an hour and a half (and the reason he suggested I might be there overnight) was running successfully in 15 minutes.

While trying to figure out what had happened, I discovered another team had updated some of the test data and, thinking that might be the cause, I refreshed the data with production data.  That's when I found out that production data was wrong, and the test data (that I had just overwritten) was the fix. And no, I didn't make a backup (it was test!) but they could re-send the new data...on Friday.

So now there was no point in migrating the system into production, because production had bad data, so the new plan was to refresh all of the test data, wait for the other team to update the data again, and then run the new reports from test.  Since this particular data was only updated monthly, that was good enough for 4 weeks, and I would come back on March 16.  However, refreshing the entire test system would wipe out all of my new reports, so I had to wait two hours for the system admin to copy the data and then migrate my changes again.

At 4pm, the system admin announced the restore wasn't finished yet, but he had to leave to pick up his kids, so he'd finish it in the morning.  That meant I was stuck in Gloucester overnight so I checked into the nearby Holiday Grim, who put me in a handicap-accessible room that was bigger than my flat.  (After a rather embarrassing incident last year, I've learned what the red pull-cords are for and I no longer pull them...)

The next morning, the database was up and I started to migrate my changes, only to find out the company had implemented a new migration process without telling me.  (This is not uncommon, and a large part of my frustration with this company, especially considering there are only 9 employees.)  My boss had written down some terse instructions which turned out to be incomplete and, in several places, wrong, and it took me just over two hours to figure it out.  (The old process took 15 minutes.)  When it was all finished, the web server crashed.

At first we couldn't figure out what had happened, and by this time the client was getting quite upset, and I didn't blame her, but all we could do was ask the system admin to bounce the web server.  He was in a meeting so she left a large Post-it note on his keyboard to ensure he did it before going to lunch.  He apparently went to lunch without ever looking at his desk.  And apparently on Wednesdays he plays footb... soccer, so he was gone for two hours.  When he finally returned, it took him all of 2 minutes to get the system working again.

I had not been sitting on my hands, of course, and by this time I had made quite a few updates that I wanted to get in so the user could test them, so I did another migration -- with the proper instructions, it only took 15 minutes -- and the web server crashed again.  We got this resolved quickly and at 2:30pm the user ran the new reports and got ... nothing.

Not wrong data, not bad data, but nothing at all.  I was horrified.  I had no idea what the problem could be, no idea how to fix it, no idea even where to start.  I finally started from the beginning, manually running each step, which is not easy because the tool I use -- and the reason I hate it so much -- hides everything, like a car with its hood sealed shut.  Eventually I realized the problem was the process -- the same process that was breaking last Thursday, that was working fine yesterday, the entire reason for my trip out there -- which I'd forgotten to run again since the database had been refreshed!

We kicked it off at 3:30pm but at this point I was quite worried because the rental car had to be back by 6pm, and I was taking Jess to a concert in London that evening, so I convinced the client to let me go by promising to return on Thursday if there were any problems.  (Fortunately, there weren't.)  I got in the car, looked at the map to determine my route, and promptly headed off in the wrong direction.

I don't know what it is about roundabouts, but they throw me off my bearings every time.  Couple that with a complete ignorance of the local geography -- they never post directions, only places, such that I have to choose between A417 to Ledbury or A417 to Cirencester -- and my general aptitude for going the wrong way, and it gets me every time.  By the time I was able to turn around and come back to where I'd started, it was after 4:30pm and I was hitting rush-hour traffic.  I called and extended the rental car for an extra day.

It took another 3 hours to get home, with the last 45 minutes spent on just the final 3 miles of the journey.  (It kills me to know Jess does this every day.)  I was tired, sore, and the last thing I wanted to do was go out for the evening, but we did nonetheless.  Jess offered to drive and I gratefully accepted.  However, we could not find any parking near the theater, and ended up parking at a mall about a mile away.  We walked in the near-freezing temperatures, occasionally pelted by rain, only to find that 'unreserved seating' at Shepherd's Bush Empire actually means 'unreserved seating or standing, when we run out of chairs.'  And they had run out of chairs.

But it actually worked out well: We were on the second level, leaning on the railing, with plenty of space and an excellent view of both the stage and the people crowded shoulder-to-shoulder on the ground floor.  Standing for two hours -- after driving for 3 -- was actually a pleasant change, and the music was fantastic.  (It was Corinne Bailey Rae's first concert in two years, it was swamped by the press, and I was lucky to get tickets.) 

P.S. Just to add insult to injury, the next day I had to return the rental car, which is all of 3 miles from my house.  I got lost, drove for over an hour, and screamed myself hoarse.  I hate driving in London.
