Hosted Private Cloud Status - FS#11706

OVHcloud Private Cloud Status


FS#11706 — pcc-27-n5

Incident Report for Hosted Private Cloud

Resolved

Several FEXs are down on this switch. Because pcc-26 is still being reconfigured, some hosts are down.

Update(s):

Date: 2014-09-29 09:10:20 UTC
More details on this afternoon's downtime (approx. 18:30 Paris time):

Following hardware issues (fans) on pcc-26 this morning, we replaced it with a spare, and in the meantime the service was carried by pcc-27 alone. Synchronising the configuration took a few hours, which is normal. However, one of the resync scripts appears to have caused a CPU load peak on pcc-27 (process ethpm), and as a result pcc-27 lost its connection to the FEXs.

At that point, around 18:15, pcc-26 was isolated and still reconfiguring, while pcc-27 was cut off from the FEXs. The two hosts connected to this pair were therefore cut off, and the downtime lasted until pcc-27 came back after a forced reboot around 19:00. The hosts only began to remount after that.

We are now finishing bringing pcc-26 back up so that this pair is fully redundant again.

Date: 2014-09-29 09:02:01 UTC
The switch is no longer experiencing issues, and its configuration has been restored to normal.

Date: 2014-09-29 09:01:31 UTC
4 of the 13 FEXs on pcc-27-n5 are down following a load peak caused by a process.

As the situation cannot be remedied at this level, we forced pcc-27 to reload in order to remount the FEXs. All FEXs are now up, and the switch is running the configuration from 16:36. We will reapply the subsequent changes on top of it.

The network is stable again. The team will work on remounting the hosts.

Posted Sep 29, 2014 - 08:58 UTC