OVHcloud Private Cloud Status

Current status
Legend
  • Operational
  • Degraded performance
  • Partial Outage
  • Major Outage
  • Under maintenance
FS#9663 — pcc-1a/b-n7
Incident Report for Hosted Private Cloud
Resolved
We have an incident when applying the configuration on the switches.

"ERROR: Configuration Failed with Error: Failure Returned from Policy Server"
"CEST: %VLAN_MGR-2-CRITICAL_MSG: Switchport Configuration Failed for msgid 0x37f0c9 rrtoken 0x37f0c9"

We are contacting the manufacturer.

Update(s):

Date: 2013-11-12 12:24:09 UTC
Everything is up.

Date: 2013-11-12 07:09:16 UTC
The pcc-1b rebooted correctly and the infrastructure is working properly. We will replace a few optics following the reboot.

Date: 2013-11-12 07:07:43 UTC
We rebooted pcc-1b.

Date: 2013-11-12 01:22:47 UTC
The pcc-1a-n7 is up. The reload has correctly rebuilt the data structure of the VLAN configuration.
We are gradually bringing the ports back up, then we will retry the operation on pcc-1b-n7.

Date: 2013-11-12 01:20:48 UTC
We decided to try to isolate the other switch of the pair, pcc-1a-n7, whose role is "primary" in the vPC pair. This time, we did not have any problems.
We are currently rebooting the chassis to fix the data-structure problem in the VLAN configuration.

Date: 2013-11-12 01:19:00 UTC
We continued the switchover with Cisco.
The first line cards are done and all switchovers went well.

On the latest cards, following losses of connectivity to pcc-106-n5, pcc-107-n5, pcc-108-n5, pcc-109-n5, pcc-116-n5 and pcc-117-n5, we re-enabled all the ports that had been shut down in order to restore traffic as soon as possible.

We are working with Cisco to understand the cause of the malfunction affecting the N5 access switches (25, 28 and 29).

Date: 2013-11-12 01:15:55 UTC
We started to shut down ports gradually on pcc-1b-n7 in order to isolate it from the network. This should not have any impact, because the traffic is switched in parallel by pcc-1a-n7.
However, we still lost connectivity to 3 N5 access switches (25, 28 and 29). We then re-enabled all the ports that had been shut down in order to restore traffic as soon as possible.
We are working with Cisco to understand the cause of this problem.
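
The gradual port isolation described above could be sketched as the following NX-OS sequence (the interface range and checkpoint name are illustrative, not the actual ones used in the intervention):

```
! Save a rollback point before touching anything
checkpoint pre-isolation
! Shut down one group of ports at a time on the vPC secondary
configure terminal
  interface ethernet 1/1-8
    shutdown
! Verify that the peer (pcc-1a-n7) is carrying the traffic
show vpc
! If connectivity to downstream N5 switches is lost, roll back immediately
rollback running-config checkpoint pre-isolation
```

A checkpoint/rollback pair like this is what allows all ports to be re-enabled quickly when the N5 access switches become unreachable.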

Date: 2013-11-12 01:11:52 UTC
The intervention has started; we rebooted pcc-1b-n7.

Date: 2013-11-08 23:21:41 UTC
We deferred this operation to the night of Monday 11th to Tuesday 12th November, at 00:00 CET.

Date: 2013-11-08 23:20:22 UTC
Cisco has identified the source of the errors: a limit reached with the latest NX-OS version.
Following the update at http://status.ovh.co.uk/?do=details&id=5713 , the Nexus 7000 switches have not fully applied the new configurations.

A reload of the chassis is necessary to apply the new configurations.
The reload will be done together with the Cisco teams.
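
A chassis reload to apply a pending configuration typically looks like the sequence below (a sketch of standard NX-OS commands, not a record of the exact intervention):

```
! Persist the running configuration so it survives the reload
copy running-config startup-config
! Check which peer is primary in the vPC pair before reloading
show vpc role
! Reload the chassis; the vPC peer keeps forwarding traffic meanwhile
reload
```

Because the two Nexus 7000 form a vPC pair, one chassis can be reloaded while its peer continues to switch the traffic, which is why the operation is done one switch at a time.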

This operation is planned for midnight on the night of Friday, November 8th to Saturday 9th of November.

Date: 2013-11-08 18:33:09 UTC
We are working with Cisco to fix this problem. The case is now P1, which is high priority. It is currently not possible to modify the VLAN configuration on the 2 core switches of the RBX Dedicated Cloud. We still don't know whether this is related to the NX-OS upgrade (the OS that runs on this equipment), to the new routing configurations, or to something else.
There is no impact on traffic, but it is currently not possible to add new resources.
Posted Nov 08, 2013 - 15:23 UTC