A switch supporting the vCenter service has crashed.
We are working on it.
Update(s):
Date: 2012-05-31 11:53:12 UTC All vShield Manager instances are operational.
Date: 2012-05-29 19:59:52 UTC All infrastructures are operational.
Some vShield Manager instances are still unavailable; they will be fixed during the night.
Date: 2012-05-29 17:46:18 UTC Most of the infrastructures are now operational.
Maintenance is continuing.
Date: 2012-05-29 17:45:20 UTC We isolated this switch with only FEX 105, the one that was having a problem, connected.
The switch does not crash when only this FEX is connected.
We are trying to reconnect the 4 other FEXes that were initially plugged in.
Date: 2012-05-29 17:43:49 UTC We connected the FEX on one side only. This caused the switch on that side to crash.
Switching continues on the other side.
The core dumps were retrieved and escalated to the developers at Cisco.
------------------
2012 May 29 16:33:05 pcc-30a-n5 %SYSMGR-2-SERVICE_CRASHED: Service "fwm" (PID 3166) hasn't caught signal 6 (core will be saved).
Broadcast message from root (console) (Tue May 29 16:33:18 2012):
The system is going down for reboot NOW!
------------------
Date: 2012-05-29 17:42:25 UTC We are connecting the new FEX.
Date: 2012-05-29 13:43:38 UTC The 2 switches crashed again. We identified the damaged FEX and are replacing it.
Date: 2012-05-29 13:02:20 UTC 95% of the vCenter services are up. We are restarting the remaining services that are still having problems.
Date: 2012-05-29 13:01:26 UTC We are investigating the cause of the crash with the manufacturer.
Date: 2012-05-29 12:29:43 UTC Reason: Reset triggered due to HA policy of Reset
Date: 2012-05-29 12:29:34 UTC We checked the connectivity of each of the hosts which are running the vCenter services.
Date: 2012-05-29 12:28:57 UTC One of the 2 switches also crashed. The other one provided redundancy.
--------------
2012 May 29 13:31:36 pcc-30b-n5 %SYSMGR-2-SERVICE_CRASHED: Service "fwm" (PID 3163) hasn't caught signal 6 (core will be saved).
Broadcast message from root (console) (Tue May 29 13:31:50 2012):
The system is going down for reboot NOW!
--------------
Date: 2012-05-29 12:24:40 UTC The 2 dual-homed switches supporting the vCenter service crashed one after the other:
pcc-30b-n5:
-------------
2012 May 29 13:04:12 pcc-30b-n5 %SYSMGR-2-SERVICE_CRASHED: Service "fwm" (PID 3277) hasn't caught signal 6 (core will be saved).
Broadcast message from root (console) (Tue May 29 13:04:25 2012):
The system is going down for reboot NOW!
--------------
pcc-30a-n5:
-------------
2012 May 29 13:04:30 pcc-30a-n5 %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 30, VPC peer keep-alive receive has failed
2012 May 29 13:05:01 pcc-30a-n5 %SYSMGR-2-SERVICE_CRASHED: Service "fwm" (PID 3284) hasn't caught signal 6 (core will be saved).
Broadcast message from root (console) (Tue May 29 13:05:13 2012):
The system is going down for reboot NOW!
-------------
The switches are back up.
We are launching a check of the vCenter service infrastructure.
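
The updates above are listed newest first, so the quoted switch logs appear out of chronological order. As an aid for reading them, here is a minimal Python sketch that extracts the SYSMGR-2-SERVICE_CRASHED events and sorts them by time. The function name parse_crashes, the regular expression, and the SAMPLE list are illustrative assumptions built only from the log lines quoted in this report; they are not part of any Cisco tooling. Timestamps are parsed as printed by the switch, without time zone handling, and month names are assumed to be in English.

```python
import re
from datetime import datetime

# Pattern modelled on the crash lines quoted in this report (an assumption,
# not an official NX-OS log grammar). Escaped quotes (\") are tolerated.
CRASH_RE = re.compile(
    r'^(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) '
    r'(?P<host>\S+) %SYSMGR-2-SERVICE_CRASHED: '
    r'Service \\?"(?P<service>[^"\\]+)\\?" \(PID (?P<pid>\d+)\)'
)


def parse_crashes(lines):
    """Return (timestamp, host, service, pid) tuples, oldest first."""
    events = []
    for line in lines:
        m = CRASH_RE.match(line.strip())
        if m is None:
            continue  # not a service-crash line (e.g. vPC keep-alive, broadcast)
        ts = datetime.strptime(m["ts"], "%Y %b %d %H:%M:%S")
        events.append((ts, m["host"], m["service"], int(m["pid"])))
    return sorted(events)


# Log lines copied from this report, in the order they appear in it.
SAMPLE = """\
2012 May 29 16:33:05 pcc-30a-n5 %SYSMGR-2-SERVICE_CRASHED: Service "fwm" (PID 3166) hasn't caught signal 6 (core will be saved).
2012 May 29 13:31:36 pcc-30b-n5 %SYSMGR-2-SERVICE_CRASHED: Service "fwm" (PID 3163) hasn't caught signal 6 (core will be saved).
2012 May 29 13:04:12 pcc-30b-n5 %SYSMGR-2-SERVICE_CRASHED: Service "fwm" (PID 3277) hasn't caught signal 6 (core will be saved).
2012 May 29 13:04:30 pcc-30a-n5 %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 30, VPC peer keep-alive receive has failed
2012 May 29 13:05:01 pcc-30a-n5 %SYSMGR-2-SERVICE_CRASHED: Service "fwm" (PID 3284) hasn't caught signal 6 (core will be saved).
""".splitlines()

if __name__ == "__main__":
    for ts, host, service, pid in parse_crashes(SAMPLE):
        print(f"{ts:%H:%M:%S}  {host}  {service}  PID {pid}")
```

Run against the sample, this lists four fwm crashes alternating between pcc-30b-n5 and pcc-30a-n5, starting with pcc-30b-n5 at 13:04:12 and ending with pcc-30a-n5 at 16:33:05.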