
FS#2827 — FS#6788 — Service vCenter

Attached to Project— Dedicated Cloud
Incident
Backend / Core
CLOSED
100%
We had a crash on a switch supporting the vCenter service.
We are intervening.
Date:  Thursday, 31 May 2012, 13:51PM
Reason for closing:  Done
Additional comments about closing:  All the vShield Manager are operational.
Comment by OVH - Tuesday, 29 May 2012, 14:24PM

The 2 dual-homed switches supporting the vCenter service crashed one after the other:

pcc-30b-n5:
-------------
2012 May 29 13:04:12 pcc-30b-n5 %SYSMGR-2-SERVICE_CRASHED: Service "fwm" (PID 3277) hasn't caught signal 6 (core will be saved).

Broadcast message from root (console) (Tue May 29 13:04:25 2012):

The system is going down for reboot NOW!
--------------

pcc-30a-n5:
-------------
2012 May 29 13:04:30 pcc-30a-n5 %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 30, VPC peer keep-alive receive has failed
2012 May 29 13:05:01 pcc-30a-n5 %SYSMGR-2-SERVICE_CRASHED: Service "fwm" (PID 3284) hasn't caught signal 6 (core will be saved).

Broadcast message from root (console) (Tue May 29 13:05:13 2012):

The system is going down for reboot NOW!
-------------



The switches are back.
We are launching a check of the vCenter service infrastructure.


Comment by OVH - Tuesday, 29 May 2012, 14:28PM

We also had a crash of one of the 2 switches. The other one ensured redundancy.

--------------
2012 May 29 13:31:36 pcc-30b-n5 %SYSMGR-2-SERVICE_CRASHED: Service "fwm" (PID 3163) hasn't caught signal 6 (core will be saved).

Broadcast message from root (console) (Tue May 29 13:31:50 2012):

The system is going down for reboot NOW!
--------------


Comment by OVH - Tuesday, 29 May 2012, 14:29PM

We checked the connectivity of each of the hosts which are running the vCenter services.


Comment by OVH - Tuesday, 29 May 2012, 14:29PM

Reason: Reset triggered due to HA policy of Reset


Comment by OVH - Tuesday, 29 May 2012, 15:01PM

We are searching for the origin of the crash with the manufacturer.


Comment by OVH - Tuesday, 29 May 2012, 15:02PM

95% of the vCenter services are up. We are restarting the remaining services that still have problems.


Comment by OVH - Tuesday, 29 May 2012, 15:43PM

The 2 switches crashed again. We identified the damaged FEX. We are replacing it.


Comment by OVH - Tuesday, 29 May 2012, 19:42PM

We are connecting the new FEX.


Comment by OVH - Tuesday, 29 May 2012, 19:43PM

We connected the FEX on one side only. This caused the switch concerned to crash.
Switching continues on the other side.

The core dumps were retrieved and escalated to the developers at Cisco.

------------------
2012 May 29 16:33:05 pcc-30a-n5 %SYSMGR-2-SERVICE_CRASHED: Service "fwm" (PID 3166) hasn't caught signal 6 (core will be saved).

Broadcast message from root (console) (Tue May 29 16:33:18 2012):

The system is going down for reboot NOW!
------------------


Comment by OVH - Tuesday, 29 May 2012, 19:45PM

We isolated this switch with FEX 105, which was having a problem.
The switch does not crash when only this FEX is connected.
We are trying to reconnect the 4 other FEX that were initially plugged in.


Comment by OVH - Tuesday, 29 May 2012, 19:46PM

The majority of the infrastructures are now operational.

We are continuing the maintenance.


Comment by OVH - Tuesday, 29 May 2012, 21:59PM

All the infrastructures are operational.

Some vShield Manager instances are still unavailable; they will be fixed during the night.