Friday, November 21, 2014
Microsoft owns up to its latest Azure fumble
The evening of Nov. 18 saw Microsoft's Azure cloud experience a costly, embarrassing outage, this one lasting about five hours. Users across the majority of the northern hemisphere were affected by the system failure, according to Visual Studio Magazine. Since the failure, Microsoft has said it is committed to preventing future outages and to exploring new ways to be more transparent about outages as they occur. This is Microsoft's second major outage this year (another systemic failure occurred in August), so it's easy to pin the blame on the company's negligence. However, considering how many resources Microsoft pours into its Azure cloud, it would be fairer to say the outage reflects how difficult it is to maintain a flawless data center.
Another update, another outage
Microsoft attributed its previous outage to a configuration update, and the company's executives have offered a similar explanation for this one. According to the company's blog, the crash occurred while Microsoft was performing flighted (staged) configuration updates to its U.S., Europe and Asia facilities. Despite routine testing, full-scale implementation of the configuration update caused an infinite loop in the Blob storage front ends, resulting in the cloud outages subsequently felt worldwide.
Once the outage was recognized, Microsoft worked around the clock to correct the roll-out. Azure users were reconnected to the cloud within hours of the original service loss. Despite suffering multiple major outages within a few months of one another, Microsoft seems determined to take the flub in stride. Going forward, the company has promised customers that it will fix the Blob storage front-end problem immediately. It has also promised to post to its system health information board more often.
Recovery tips for IT
Data Center Knowledge offers advice for IT teams looking to avoid repeating Microsoft's costly pattern of shutdowns. First, IT teams can benefit from establishing an implementation strategy that makes system changes in increments instead of all at once. Staff can also minimize the impact of a data center outage by employing a diverse array of recovery strategies. Using a remote console server to facilitate off-site and cloud storage, for example, is a cost-effective way to add resiliency. Most importantly, it's vital for data center staff to have an established protocol to follow when a data center failure inevitably happens. This pre-planning will help companies recover their data as promptly as possible.
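The incremental approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of Microsoft's deployment tooling: the function name staged_rollout and the apply_change, is_healthy and roll_back callbacks are hypothetical placeholders for whatever tools a team already uses. The change goes to a small group of hosts first, each group is checked before the next is touched, and everything updated so far is reverted on the first failed check.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    stages: Iterable[List[str]],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    roll_back: Callable[[str], None],
) -> bool:
    """Apply a change one stage at a time; revert everything on the first failure.

    All four arguments are hypothetical hooks supplied by the caller.
    Returns True if every stage passed its health check, False otherwise.
    """
    updated: List[str] = []
    for stage in stages:
        for host in stage:
            apply_change(host)
            updated.append(host)
        # Verify the whole stage before moving on to the next one.
        if not all(is_healthy(host) for host in stage):
            # Undo in reverse order so the most recent change is reverted first.
            for host in reversed(updated):
                roll_back(host)
            return False
    return True
```

Ordering the stages from smallest to largest (a single test host, then one facility, then the rest) keeps a bad update from reaching every site at once.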
Perle's wide range of 1- to 48-port Console Servers provides data center managers and network administrators with secure remote management of any device with a serial console port. Plus, they are the only truly fault-tolerant Console Servers on the market with the advanced security functionality needed to easily perform secure remote data center management and out-of-band management of IT assets from anywhere in the world.