Guild Launch News
6/24/2009 - Extended Downtime
Greetings Everyone,
What a morning! As you have no doubt noticed, we have been down since roughly 3:30am Eastern and are just now, at 11:30am Eastern, coming back up. It seems like just yesterday I was discussing with a user how we have had no more than 30 minutes of unscheduled downtime in more than 9 months. I need more wood to knock on when I say these things.
First off, I apologize for the downtime. We are working with the data center to improve our processes so that an event like this, if it ever occurs again, does not cause anywhere near this much downtime.
Now, on to why we were down:
Yesterday around 4:30pm there were 2 tornadoes in the general vicinity of our data center. This isn't normally much cause for alarm, since the DC is a hardened facility with fully redundant power and backup generators. Unfortunately, the tornadoes knocked out power in the area, so the data center ran, without incident, on backup generators and UPSs all night.
This morning power was returned to the area and the DC cut over to main power again. This is when something went wrong. They have been focusing on restoring service and are as yet unsure exactly what happened, but during the cutover (which should be seamless) they lost power and a few devices in the data center were damaged.
One of those devices was our load balancer. Load balancers typically have a very low rate of failure. The data center immediately moved to diagnose the issue. They initially replaced the power supply, which didn't fix it. They then got a new load balancer and put the hard drive from the old one in it; that didn't fix the issue either, because the drive was bad. They then put a new drive in a new load balancer and began configuring it from our backed-up records of the configuration.
So, here we are now, with a brand new load balancer and we are good to go. We've looked over all of our other machines and there are no signs of issues. Siglaunch should be returning shortly as well as our ad server and the other secondary services.
I sincerely appreciate your patience during this downtime and can assure you we will review our processes to determine if we can lessen the effect of a similar event in the future. I very much appreciate the feedback on Twitter where I have been posting updates on our progress throughout the morning. You can find us at http://twitter.com/guildlaunch
Enjoy!
Stephen
Last edited by GL_Support on Wed Jun 24, 2009 10:19 am; edited 2 times in total
I did wonder what was going on. Good you're back up, and with people's comments on Twitter I think I'll check it out.
Wightnight wrote:
My "ventrilo status" does not work anymore
The server for Thumper's Ventrilo widget is a separate server. Its web farm is being recreated as we speak.
-Stephen