FS#1586 — FS#5570 — pcc-000159

Attached to Project: VPS
Incident
Backend / Core
CLOSED
100%
The temporary filer used for the VPS beta tests, which was never removed from production (internal error), is down.
We are currently migrating the customers who renewed their VPS after the beta.
258 VPS are impacted.
Data is not lost.
Customers should get back their service in about 1 hour.
Date:  Thursday, 30 June 2011, 23:44PM
Reason for closing:  Done
Comment by OVH - Thursday, 30 June 2011, 00:08AM

There was a hardware problem on the filer.
We are moving the data to a new filer.


Comment by OVH - Thursday, 30 June 2011, 09:17AM

206 of the impacted VPS have been put back into production.


Comment by OVH - Thursday, 30 June 2011, 16:58PM

Communication has been lacking. We are sorry that
information has not been provided in a steady flow, even though the
team has been working on the problem the whole time.
Here is some information that was posted on the mailing list
vps@ml.ovh.net:

Date: Thu, 30 Jun 2011 00:39:15 +0200
From: Oles <oles@ovh.net>
To: "<vps@ml.ovh.net>" <vps@ml.ovh.net>
Cc: "vps@ml.ovh.net" <vps@ml.ovh.net>
Subject: Re: [vps] the filer of the beta

Some explanations.

The maintenance task is progressing more slowly than expected.
Put simply, we have lost one of the first-generation
filers that we used for the beta.
We should have migrated these customers a long time ago,
but not all of them had the 99.99% offer,
so the switch would have meant downtime.
We will therefore switch everybody to
99.99% and then switch filers live.
The commercial offer changed yesterday and we were preparing
all the migrations and modifications.
Unfortunately, one of the disks blocked half of the
filer, and since this is the first generation, there is no second
half. The result is a crash. In the latest version, the NAS
is HA with 2 disk shelves
instead of 1. The disk crashed the NAS so badly that
the ZFS filesystem
can no longer be written to. We managed to mount the
ZFS read-only and we copied the data from one
filer to another. The data is there, so nothing is lost,
but we need to switch everything to another, new filer.
If there is a problem, we have the backups, but since the data is there,
we prefer to recover the most recent data, i.e. the data on the filer.

We hope to finish during the night. In any case,
we are working on it at 100%. We are as sad and angry as you
are about this crash, because this problem has damaged all the work
we have done around the VPS.
This proves once again that we should not look at price first
but at reliability and availability. With 99.99% by default, the migrations
would already have been done and this problem would never
have existed. But it does exist, and we will have to live with it for the 3 years to come.

In brief :(

Well.

As soon as it is fixed, we will continue to work on the
migrations. We will move everybody to 99.99%,
then we will do the migrations to the new filers.
We were at 30% of the preparation. This week, the live
migrations should start.


Comment by OVH - Thursday, 30 June 2011, 23:44PM

All impacted VPS are back to normal.