Guild Launch News
To All of Our Customers - Backup Process Improvements
Greetings Everyone,
On Thursday, April 15th, we had an extended downtime that lasted approximately 31 hours. This downtime was the result of a database corruption event during a backup process. Events of this type that require database recovery are exceedingly rare; this is the only one we have ever experienced. We are already in the process of addressing the extended recovery time. The length of the recovery was as unacceptable to us as I'm sure it was to all of you. Our top priority right now is to find a backup solution that recovers data as reliably as the one we have, while also being faster to recover. We are evaluating solutions and will implement them as quickly as possible with minimal disruption. Below, I am going to try to directly address questions that I am sure you have.
---------------------------------------------------------------------------
Did anything go right here?
Yes! We accomplished the most important goal which was to bring your site back up with no data loss. The backup method we had was reliable and accurate. In addition, we used Twitter to keep you all informed as best we could. Keeping communication open is important to us.
What about the long recovery time?
Early on in the event we had a few false starts which ate up 3 or 4 hours. Initially we thought the issue was much less severe than it was. We spent a number of hours early on with recovery methods that ended up being dead ends. I'm not sure this was avoidable, but we are reviewing options for a faster recovery. If we had a faster recovery option available then we could have possibly chosen to go straight to full recovery sooner than we did. We will address that in our backup process review.
In the end, the recovery time was primarily due to a slow recovery method. This is partly due to the MySQL recovery process that we had chosen, which is inherently slow. Our backup verifications had led us to expect a faster recovery than we got. We were disappointed at the speed and will be addressing this.
How will you keep this from happening again?
We have identified the backend processes that led to the corruption during the backup and have identified what changes need to be made to keep it from happening again. We will be taking the database server down for 30 minutes later this week to apply hardware and software changes.
What are the next steps?
We have engaged a team of consultants who are very experienced with MySQL. Some of them are former MySQL employees. They are evaluating our current backup system and recommending a system which accomplishes the following:
1. Maintains the data reliability and accuracy of the current backup process
2. Provides a faster recovery time
3. Allows us to reduce or remove the weekly downtime for backups if possible
Unfortunately, it is impossible to fully predict exactly how long a database recovery will take. Recovering a full database of our size will never be instantaneous. However, I believe we can reduce the time significantly with modifications and additions to our backup process.
What about the time we were down?
We want to make up for the downtime as best we can. To do this we are offering two things:
1. We will be adding 2 days to the subscription time of all subscription hosting accounts. While we cannot change the rebill date of PayPal subscriptions, your subscription will be extended. The current system doesn't have a credit system because we have never needed one, but we will add it next week and roll out the credit.
2. We are now offering everyone a 20% off coupon for Ventrilo service which is available until April 24th. This coupon will provide you a flat 20% discount on any Ventrilo server that you order with any billing period. This is available to all customers both new and existing. If your Ventrilo server is in the previous Ventrilo ordering system please contact support@guildlaunch.com and we will work out the discount.
The coupon code is: GIVE_ME_20 and you can order service at http://billing.guildlaunch.net or you can apply the coupon to your next payment.
---------------------------------------------------------------------------
I want you all to know that we take these sorts of things very seriously and we will take appropriate steps to solve these issues. We want to thank everyone for their support and patience during this extended downtime. We know that downtime of this type is very frustrating. We will keep you all informed as we review the backup process and implement changes to improve this aspect of our service. We look forward to continuing to provide, and improving, the best Guild Site Hosting available to our truly Epic customers.
-Stephen & The Guild Launch Team: Jun, Mike & Vicki
Guys,
You did the best you could in a bad situation. As a customer, I was encouraged by your professionalism and efforts to keep us informed. I plan to continue my service, and greatly look forward to the new things to come.
Again, $h!t happens - and it is how we deal with those situations that define us. You guys proved to me that you are a top notch group of professionals.
As an IT professional that has had experience maintaining clustered, complex server environments, a ./salute to you and your team. Your communications were timely and professional. You had a backup plan and executed it with no data loss. You re-implemented the sites in a nice smooth manner to ensure you didn't get crushed coming back. You're reviewing and making process improvements.
Couldn't ask for anything more and fully respect what you guys have been through in the last week.
Keep up the good work.
Stephen and team, no worries. We know you're doing all you can for us and encountered an extremely rare situation. After all, this was a learning situation that helps you find a better solution for backups in the future, which equates to better service for us.
Thank you for the credit time and 20% off Ventrilo hosting!
For the 20%, how do I apply that to my account with the coupon code? I can't find a way to add that in there. Thanks!