
FS#2637 — FS#6598 — vss-6a-6k

Attached to Project: Network
Incident
Whole Network
CLOSED
100%
We have an incident on this router.
Date: Monday, 16 April 2012, 18:36
Reason for closing: Done
Comment by OVH - Monday, 16 April 2012, 13:19

The router rebooted; we are looking for the cause of the problem.


Comment by OVH - Monday, 16 April 2012, 13:19

Last reload reason: bus error at PC 0x42DB1ED8, address 0x0


Comment by OVH - Monday, 16 April 2012, 13:19

Apr 16 11:24:40 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x4CE] JA_AG_PM_RAM_FULLNESS_3 = 0x0
Apr 16 11:24:40 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x4F1] JA_AG_RM_RAM_FULLNESS_0 = 0x0
Apr 16 11:24:40 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x4F2] JA_AG_RM_RAM_FULLNESS_1 = 0x0
Apr 16 11:24:40 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x4F3] JA_AG_RM_RAM_FULLNESS_2 = 0x0
Apr 16 11:24:40 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x4F4] JA_AG_RM_RAM_FULLNESS_3 = 0x0
Apr 16 11:24:40 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x5F6] JA_TM_HI_FULLNESS = 0x0
Apr 16 11:24:40 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x5F7] JA_TM_LO_FULLNESS = 0x0
Apr 16 11:24:41 GMT: %C6KPWR-SP-4-DISABLED: power to module in slot 9 set off (Fabric channel errors)
Apr 16 11:24:42 GMT: %EARL-DFC1-2-SWITCH_BUS_IDLE: Switching bus is idle for 5 seconds. The card grant is 0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [0:0x46] JA_DR_RI_0_STA_FCO = 0x3
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [0:0x85] JA_DR_RI_1_STA_FCO = 0x3
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [0:0x28A] JA_FI_FT_RCV_RATE_SEL = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [0:0x289] JA_FI_FT_XMIT_SHAPE = 0xFFF
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [0:0x4CB] JA_AG_PM_RAM_FULLNESS_0 = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [0:0x4CC] JA_AG_PM_RAM_FULLNESS_1 = 0xE
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [0:0x4CD] JA_AG_PM_RAM_FULLNESS_2 = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [0:0x4CE] JA_AG_PM_RAM_FULLNESS_3 = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [0:0x4F1] JA_AG_RM_RAM_FULLNESS_0 = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [0:0x4F2] JA_AG_RM_RAM_FULLNESS_1 = 0x4
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [0:0x4F3] JA_AG_RM_RAM_FULLNESS_2 = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [0:0x4F4] JA_AG_RM_RAM_FULLNESS_3 = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [0:0x5F6] JA_TM_HI_FULLNESS = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [0:0x5F7] JA_TM_LO_FULLNESS = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [1:0x46] JA_DR_RI_0_STA_FCO = 0x3
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [1:0x85] JA_DR_RI_1_STA_FCO = 0x3
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [1:0x28A] JA_FI_FT_RCV_RATE_SEL = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [1:0x289] JA_FI_FT_XMIT_SHAPE = 0xFFF
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [1:0x4CB] JA_AG_PM_RAM_FULLNESS_0 = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [1:0x4CC] JA_AG_PM_RAM_FULLNESS_1 = 0x5
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [1:0x4CD] JA_AG_PM_RAM_FULLNESS_2 = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [1:0x4CE] JA_AG_PM_RAM_FULLNESS_3 = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [1:0x4F1] JA_AG_RM_RAM_FULLNESS_0 = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [1:0x4F2] JA_AG_RM_RAM_FULLNESS_1 = 0x1
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [1:0x4F3] JA_AG_RM_RAM_FULLNESS_2 = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [1:0x4F4] JA_AG_RM_RAM_FULLNESS_3 = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [1:0x5F6] JA_TM_HI_FULLNESS = 0x0
Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [1:0x5F7] JA_TM_LO_FULLNESS = 0x0
Apr 16 11:24:46 GMT: %DIAG-SP-3-TEST_FAIL: Module 2: TestFabricCh1Health{ID=2} has failed. Error code = 0x2B (DIAG_CHECK_ETHER_PAK_ERROR)
Apr 16 11:24:50 GMT: %EARL-DFC2-2-SWITCH_BUS_IDLE: Switching bus is idle for 5 seconds. The card grant is 0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [0:0x46] JA_DR_RI_0_STA_FCO = 0x3
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [0:0x85] JA_DR_RI_1_STA_FCO = 0x3
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [0:0x28A] JA_FI_FT_RCV_RATE_SEL = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [0:0x289] JA_FI_FT_XMIT_SHAPE = 0xFFF
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [0:0x4CB] JA_AG_PM_RAM_FULLNESS_0 = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [0:0x4CC] JA_AG_PM_RAM_FULLNESS_1 = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [0:0x4CD] JA_AG_PM_RAM_FULLNESS_2 = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [0:0x4CE] JA_AG_PM_RAM_FULLNESS_3 = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [0:0x4F1] JA_AG_RM_RAM_FULLNESS_0 = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [0:0x4F2] JA_AG_RM_RAM_FULLNESS_1 = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [0:0x4F3] JA_AG_RM_RAM_FULLNESS_2 = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [0:0x4F4] JA_AG_RM_RAM_FULLNESS_3 = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [0:0x5F6] JA_TM_HI_FULLNESS = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [0:0x5F7] JA_TM_LO_FULLNESS = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x46] JA_DR_RI_0_STA_FCO = 0x3
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x85] JA_DR_RI_1_STA_FCO = 0x3
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x28A] JA_FI_FT_RCV_RATE_SEL = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x289] JA_FI_FT_XMIT_SHAPE = 0xFFF
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x4CB] JA_AG_PM_RAM_FULLNESS_0 = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x4CC] JA_AG_PM_RAM_FULLNESS_1 = 0x6
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x4CD] JA_AG_PM_RAM_FULLNESS_2 = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x4CE] JA_AG_PM_RAM_FULLNESS_3 = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x4F1] JA_AG_RM_RAM_FULLNESS_0 = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x4F2] JA_AG_RM_RAM_FULLNESS_1 = 0x2
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x4F3] JA_AG_RM_RAM_FULLNESS_2 = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x4F4] JA_AG_RM_RAM_FULLNESS_3 = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x5F6] JA_TM_HI_FULLNESS = 0x0
Apr 16 11:24:50 GMT: %PF_ASIC-DFC2-3-ASIC_DUMP: [1:0x5F7] JA_TM_LO_FULLNESS = 0x0
Apr 16 11:24:51 GMT: %DIAG-SP-3-TEST_FAIL: Module 2: TestFabricCh0Health{ID=1} has failed. Error code = 0x2B (DIAG_CHECK_ETHER_PAK_ERROR)
Apr 16 11:24:51 GMT: %DIAG-SP-3-TEST_FAIL: Module 2: TestSynchedFabChannel{ID=6} has failed. Error code = 0x73 (DIAG_INVALID_CHANNEL_STATUS)
Apr 16 11:24:52 GMT: %C6KPWR-SP-4-DISABLED: power to module in slot 1 set off (Fabric channel errors)
Apr 16 11:24:55 GMT: %DIAG-SP-3-TEST_FAIL: Module 2: TestMacNotification{ID=14} has failed. Error code = 0x3B (DIAG_L2_INDEX_MISMATCH_ERROR)
Apr 16 11:25:00 GMT: %DIAG-SP-3-TEST_FAIL: Module 2: TestFabricCh1Health{ID=2} has failed. Error code = 0x28 (DIAG_DEST_INDEX_CFG_ERROR)
Apr 16 11:25:05 GMT: %C6K_PLATFORM-2-PEER_RESET: RP is being reset by the SP
%Software-forced reload
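The dump above mixes a few significant events (module power-offs for fabric channel errors, switching-bus stalls, the final RP reset by the SP) with dozens of per-ASIC register lines. Cisco IOS messages follow the %FACILITY-SEVERITY-MNEMONIC format, with severity 0 (emergency) to 7 (debugging), so a short script can separate the two. The following is a minimal Python sketch. The function names and the choice to hide ASIC_DUMP lines by default are illustrative assumptions and do not describe OVH's own tooling.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# IOS messages look like "%FACILITY-SEVERITY-MNEMONIC: text". On this platform the
# facility can carry a suffix such as "-SP" or "-DFC1", so it is matched lazily up to
# the first "-<severity digit>-" field.
MSG_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d+ [\d:]+ \w+): "
    r"%(?P<facility>[A-Z0-9_]+(?:-[A-Z0-9_]+)*?)-(?P<sev>[0-7])-"
    r"(?P<mnemonic>[A-Z0-9_]+): (?P<text>.*)$"
)

Event = Tuple[str, str, int, str, str]  # (timestamp, facility, severity, mnemonic, text)


def parse(lines: Iterable[str]) -> List[Event]:
    """Keep only timestamped IOS messages; other lines (e.g. '%Software-forced reload') are skipped."""
    events = []
    for line in lines:
        m = MSG_RE.match(line.strip())
        if m:
            events.append((m["ts"], m["facility"], int(m["sev"]), m["mnemonic"], m["text"]))
    return events


def summarize(lines: Iterable[str]) -> Counter:
    """Count messages per (facility, severity, mnemonic)."""
    return Counter((fac, sev, mn) for _, fac, sev, mn, _ in parse(lines))


def urgent(lines: Iterable[str], max_sev: int = 4, skip=("ASIC_DUMP",)) -> List[Event]:
    """Events at severity <= max_sev (4 = warning), minus the noisy register dumps."""
    return [e for e in parse(lines) if e[2] <= max_sev and e[3] not in skip]


# A few lines copied from the log above.
SAMPLE = [
    "Apr 16 11:24:41 GMT: %C6KPWR-SP-4-DISABLED: power to module in slot 9 set off (Fabric channel errors)",
    "Apr 16 11:24:42 GMT: %EARL-DFC1-2-SWITCH_BUS_IDLE: Switching bus is idle for 5 seconds. The card grant is 0",
    "Apr 16 11:24:42 GMT: %PF_ASIC-DFC1-3-ASIC_DUMP: [0:0x46] JA_DR_RI_0_STA_FCO = 0x3",
    "Apr 16 11:24:52 GMT: %C6KPWR-SP-4-DISABLED: power to module in slot 1 set off (Fabric channel errors)",
    "Apr 16 11:25:05 GMT: %C6K_PLATFORM-2-PEER_RESET: RP is being reset by the SP",
    "%Software-forced reload",
]

if __name__ == "__main__":
    for ts, fac, sev, mn, text in urgent(SAMPLE):
        print(ts, f"{fac}-{sev}-{mn}", text)
```

Run against the full dump, urgent(..., max_sev=2) keeps only the critical entries: the SWITCH_BUS_IDLE stalls on both DFCs and the PEER_RESET.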


Comment by OVH - Monday, 16 April 2012, 13:20

vss-6a is up but vss-6b has just crashed.


Comment by OVH - Monday, 16 April 2012, 13:21

vss-6b will not restart:
Apr 16 11:46:10 GMT: %FABRIC-SP-5-CLEAR_BLOCK: Clear block option is off for the fabric in slot 5.
Apr 16 11:46:10 GMT: %FABRIC-SP-5-FABRIC_MODULE_ACTIVE: The Switch Fabric Module in slot 5 became active.
Apr 16 11:46:11 GMT: %CPU_MONITOR-3-PEER_EXCEPTION: CPU_MONITOR peer has failed due to exception , reset by [5/0]
*** System received a Software forced crash ***
signal= 0x17, code= 0x24, context= 0x46644dd4
PC = 0x42da4ebc, SP = 0x44954918, RA = 0x413ea2bc
Cause Reg = 0x00003820, Status Reg = 0x34008002

vss-6a has taken over routing.


Comment by OVH - Monday, 16 April 2012, 13:22

vss-6b crashed at the same time. The 2 routers were down
at the same time.


Comment by OVH - Monday, 16 April 2012, 13:22

We have a temperature problem in the room. Our team is working on it.


Comment by OVH - Monday, 16 April 2012, 13:44

The 2 routers are down again at the same time.

This is an air-conditioning problem in the routing
rooms of RBX4. Apparently we have a SPOF due to
poor internal design.

We are trying to stabilize the situation, and then we will
review it!


Comment by OVH - Monday, 16 April 2012, 13:45

vss-6a has just crashed again. The networks behind vss-6a/b are down.

We have set up emergency ventilation in the room to dissipate the heat. In parallel, we managed to re-route the air conditioning system. The temperature in the room is gradually decreasing.


Comment by OVH - Monday, 16 April 2012, 13:50

Temperatures are back to normal.
vss-6a is up again. The routing is restored.
vss-6b is completing its boot sequence.


Comment by OVH - Monday, 16 April 2012, 18:35

vss-6 A and B are back up. The temperature is normal.


The compressors of the 2 AC systems had stopped,
but their breakers had not tripped. We did not receive any alarms.


We received no alarm about the sudden
temperature increase in the routing rooms. We have a system
that measures the temperature every 60 seconds and reports
it within the datacenter through MARCEL.
It did not work either.
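The failure described here is that a fast temperature rise went unnoticed. The following is a minimal Python sketch of the check that a 60-second poller could run on each reading. It raises an alarm on an absolute threshold, on the rate of rise between two polls, and on a missing reading. The threshold values, alarm names and functions are hypothetical, and nothing here describes how MARCEL actually works.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple

POLL_INTERVAL_S = 60  # the report states one reading every 60 seconds


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values chosen for illustration only.
    max_temp_c: float = 30.0         # absolute limit for the room
    max_rise_c_per_min: float = 1.0  # rate-of-rise limit between two polls


def check(prev_c: Optional[float], cur_c: Optional[float], th: Thresholds) -> List[str]:
    """Return the alarms raised by one poll, given the previous reading."""
    if cur_c is None:
        # A silent sensor or collector must not look like "all clear".
        return ["NO_READING"]
    alarms = []
    if cur_c >= th.max_temp_c:
        alarms.append("TEMP_HIGH")
    if prev_c is not None:
        rise_per_min = (cur_c - prev_c) * 60 / POLL_INTERVAL_S
        if rise_per_min >= th.max_rise_c_per_min:
            alarms.append("TEMP_RISING")
    return alarms


def scan(readings: Iterable[Optional[float]],
         th: Thresholds = Thresholds()) -> Iterator[Tuple[int, List[str]]]:
    """Yield (poll index, alarms) for every poll that raises at least one alarm."""
    prev = None
    for i, cur in enumerate(readings):
        alarms = check(prev, cur, th)
        if alarms:
            yield i, alarms
        prev = cur  # after a missing reading, the next poll skips the rate check
```

Consider the readings [24.0, 24.2, 25.5, 27.0, None, 31.0]. They raise TEMP_RISING at polls 2 and 3, NO_READING at poll 4 and TEMP_HIGH at poll 5, so the rate-of-rise check fires well before the absolute limit is reached. Because the check works on measured temperature, it does not depend on whether the compressors' breakers tripped.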


To restart them, we had to switch them off for a few seconds and then
back on. The temperature then dropped. We are investigating
why the 2 systems stopped without tripping.


We are also investigating why this affected both
vss-6 A and B. At worst, a single room should be affected,
and therefore only one of the 2 routers.

The other routers were hot but had no problem.

In short, a mega SPOF, which we are going to fix!


Comment by OVH - Monday, 16 April 2012, 18:36

The situation is stable. We are working
to fix the problem over the next few days.