August 15, 2008 outage

What happened on Aug 15?
We messed up in a big way by introducing buggy Monitus Tools code that essentially turned pages blank on merchants’ live sites, thereby making them unusable to website visitors.

How long did it take to fix the bug?
Yahoo! Store support first reported the problem at 11:48 AM ET. We completed the bug fix at approximately 1:25 PM ET, but Yahoo! Store support did not report the issue as resolved until 2:22 PM ET. The gap is most likely due to the bad code having been cached. My best estimate is therefore that the issue lasted between two and three hours.

What was the nature of the bug?
A little background first. When you look at the Monitus code on your site, you will see a reference to monitus_tools.js. This file loads the Google Analytics files, static Monitus JavaScript files and a dynamic Monitus JavaScript file. The Google Analytics files have to be fully loaded in order for our dynamic file to work. These files are loaded synchronously in the browser, meaning that all of them have to load before the page load is reported as complete. Furthermore, the Google Analytics tracking code, like many other JavaScript scripts, uses the document.write method. Calls to document.write are easy to use, but they also have drawbacks: you can only use document.write safely before the page has finished loading, and each call has to run before the page can continue loading. If an unresponsive script uses this method, it will hold up the loading of the page. You can read more about document.write.
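To make the document.write restriction concrete, here is a toy model of how a page reacts to writes made during and after loading. It is only a simplified sketch and not a real browser document: the ToyDocument class and its finishLoading method are hypothetical names invented for this illustration. It relies on the standard browser behavior that a document.write call on a fully loaded page implicitly reopens the document, which discards what was there.

```javascript
// Toy model of document.write semantics. NOT a real browser document;
// ToyDocument and finishLoading are hypothetical names for illustration.
class ToyDocument {
  constructor() {
    this.content = "";
    this.parsing = true; // true while the page is still loading
  }

  // Mimics document.write: while the page is loading, markup is appended.
  // Once loading has finished, a write implicitly reopens the document,
  // which throws away the existing content first.
  write(markup) {
    if (!this.parsing) {
      this.content = "";   // the implicit reopen wipes the page
      this.parsing = true; // the document is open for writing again
    }
    this.content += markup;
  }

  finishLoading() {
    this.parsing = false;
  }
}

const page = new ToyDocument();
page.write("<h1>Store</h1>");   // written during page load: appended
page.write("<p>tracking</p>");  // written during page load: appended
page.finishLoading();
const beforeLateWrite = page.content;

page.write("<p>late</p>");      // written after load: replaces everything
const afterLateWrite = page.content;
```

In this model, the store heading is still present in beforeLateWrite but is gone from afterLateWrite, which holds only the markup from the late write.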

We found in our log files that on certain occasions our dynamic code would run before the Google Analytics files had fully loaded, causing an error in the log files. Since we had recently upgraded our servers, we thought that our new server setup was perhaps “too fast”. What we tried next was to load our files asynchronously, similar to Ajax-based sites like Google Maps. With this approach, code is loaded independently of the page load. We wanted to try it to make sure that our dynamic code would wait for the static files to load fully. We also felt that it would result in a better end-user experience because the tracking files would no longer affect page loads.

The error we made was that we continued using document.write. Running document.write on a page that has already loaded causes the page to be redrawn, overwriting its existing content.
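For readers curious how asynchronous loading can keep files in order without document.write, here is a minimal sketch. It is not the actual Monitus Tools code: the file names are placeholders, and loadInOrder and loadScriptViaDom are hypothetical helpers written for this post. The idea is to start each file only after the previous one has finished, and to add script elements to the page through the DOM rather than writing them into it.

```javascript
// Sketch only: placeholder file names, not the real Monitus file list.
const FILES_IN_ORDER = ["ga.js", "monitus_static.js", "monitus_dynamic.js"];

// Loads each file only after the previous one has finished loading.
// `loadScript(url, onLoaded)` is passed in so the ordering logic can be
// tried outside a browser with a stand-in loader.
function loadInOrder(urls, loadScript, done) {
  const loaded = [];
  function next(index) {
    if (index === urls.length) {
      done(loaded); // every file has reported that it finished loading
      return;
    }
    loadScript(urls[index], function () {
      loaded.push(urls[index]);
      next(index + 1); // start the next file only now
    });
  }
  next(0);
}

// Browser-side loader that appends a <script> element to the DOM instead of
// calling document.write, so it does not overwrite a page that has loaded.
function loadScriptViaDom(url, onLoaded) {
  const script = document.createElement("script");
  script.src = url;
  script.onload = onLoaded;
  document.getElementsByTagName("head")[0].appendChild(script);
}
```

In a browser this would be started with something like loadInOrder(FILES_IN_ORDER, loadScriptViaDom, function (loaded) { ... }). Because the dynamic file is requested only after the earlier files have called back, it cannot run before they are ready, and because nothing is written with document.write, the existing page content is left alone.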

Why did you not respond earlier?
I pretty much handle all the communications here at Monitus, but I was traveling on Aug 15. When I finally got back online, I saw all the emails and voicemails and started responding one on one as best I could. Jean did his best to respond to the email sent to him. Although the bug itself was fixed by the time I got back, I realize that many people were left in the dark, not knowing what was going on. I am looking at ways I can do better in this regard, such as having a way to contact us directly in case of an emergency. Your suggestions would be welcome. I hate the fact that I allowed myself to be put in a position where I was unable to help right away.

What’s next for you and Monitus Tools?
Building up trust takes a long time, and it can be lost very quickly. I consider it a privilege for our code to be on your site, and unfortunately I failed you on August 15. I am very upset that our mistake had such a big effect on you, and I will do everything I can to do better in the future.

Please let me know what I can do to regain your trust.

Yours sincerely,
Michael Whitaker