Several FEX are down on this switch. As the pcc-26 is still being configured, some hosts are down.
Update(s):
Date: 2014-09-29 09:10:20 UTC More details on this afternoon's downtime (approx. 18:30 Paris time):
Following hardware issues (fans) on the pcc-26 this morning, we replaced it with the spare, and service was maintained by the pcc-27 alone. Synchronisation of the configuration took a few hours, which is normal. However, one of the resync scripts seems to have caused a CPU load peak on the pcc-27 (process ethpm). As a result, the pcc-27 lost its connection to the FEX. At that point, around 18:15, we had an isolated, reconfiguring pcc-26 and a pcc-27 cut off from the FEX. The two hosts connected to this pair were cut off, which caused downtime until the pcc-27 came back after a forced reboot around 19:00. Only from then did the hosts begin to come back up.
We are currently finishing bringing the pcc-26 back up so that this pair is fully redundant again.
Date: 2014-09-29 09:02:01 UTC There's no longer an issue with the switch. The configuration is now normalised.
Date: 2014-09-29 09:01:31 UTC 4 FEX out of 13 are down on the pcc-27-n5 following a load peak from one process.
As the situation cannot be remedied at this level, we forced the pcc-27 to reload in order to bring the FEX back up. All the FEX are now up and the switch is running the configuration from 16:36. We will reapply the later changes on top of it.
The network is stable again. The team will now work on bringing the hosts back up.