
FS#8390 — FS#12261 — bhs2-15b-n6

Attached to Project— Network
Incident
Beauharnois → BHS-2
CLOSED
100%
The n6 rebooted due to a bug related to port-security.

Kernel uptime is 0 day(s), 0 hour(s), 27 minute(s), 33 second(s)

Last reset at 423002 usecs after Mon Dec 22 04:35:09 2014

Reason: Reset triggered due to HA policy of Reset
System version: 6.0(2)N2(4)
Service: eth_port_sec hap reset

During the reboot, traffic was forwarded by the 15a, so there was no downtime.
All the FEX are UP and present.


Date:  Monday, 22 December 2014, 10:28AM
Reason for closing:  Done
Comment by OVH - Monday, 22 December 2014, 09:58AM

I spoke too soon: the 15a is going to reboot at any moment (forwarding will then be handled by the 15b).

I have prepared the ISSU upgrade for the pair.


Comment by OVH - Monday, 22 December 2014, 10:00AM

Okay

The image is being downloaded on the n6.

The pair is stable; I will perform the ISSU upgrade around 4 or 5 am.


Comment by OVH - Monday, 22 December 2014, 10:01AM

The images are downloaded.

The B has been rebooted.



Comment by OVH - Monday, 22 December 2014, 10:14AM

After the reboot the FEX are not UP.

As soon as both sides are up, I will turn on port-secu and perform the ISSU.


Comment by OVH - Monday, 22 December 2014, 10:15AM

Ready to go for the upgrade.
notifying services about system upgrade.
[####################] 100% -- SUCCESS



Compatibility check is done:
Module  bootable  Impact          Install-type  Reason
------  --------  --------------  ------------  ------
1       yes       non-disruptive  reset
2       yes       non-disruptive  rolling
100     yes       non-disruptive  rolling
101     yes       non-disruptive  rolling
102     yes       non-disruptive  rolling
103     yes       non-disruptive  rolling
104     yes       non-disruptive  rolling
105     yes       non-disruptive  rolling
106     yes       non-disruptive  rolling
107     yes       non-disruptive  rolling
108     yes       non-disruptive  rolling
109     yes       non-disruptive  rolling
110     yes       non-disruptive  rolling
111     yes       non-disruptive  rolling
112     yes       non-disruptive  rolling
113     yes       non-disruptive  rolling
114     yes       non-disruptive  rolling
115     yes       non-disruptive  rolling
116     yes       non-disruptive  rolling
117     yes       non-disruptive  rolling
118     yes       non-disruptive  rolling
119     yes       non-disruptive  rolling
120     yes       non-disruptive  rolling


Comment by OVH - Monday, 22 December 2014, 10:17AM

The ISSU is not working!

FEX 100 is stuck:
FEX     FEX             FEX             FEX                 Fex
Number  Description     State           Model               Serial
------------------------------------------------------------------------
100     FEX100|T02A40   Check Upg Seq   N2K-C2248TP-E-1GE   SSI16410495

bhs2-15a-n6# sh fex
FEX     FEX             FEX             FEX                 Fex
Number  Description     State           Model               Serial
------------------------------------------------------------------------
100     FEX100|T02A40   Image Download  N2K-C2248TP-E-1GE   SSI16410495


Comment by OVH - Monday, 22 December 2014, 10:19AM

We can't do it without an outage.
I am stopping the ISSU on B.


Remaining action:
"Module(s) 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120 still need to be upgraded".

Install has been aborted.

The upgrade failed while updating FEX 100; the servers behind it are down.
The B is stuck in the "Check Upg Seq" state on FEX 100.


Plan of action:
- I shut down the FEX on B
- I reload B
- I update NX-OS on A, then transfer the FEX to B.

There will be downtime while the FEX are being updated.


Comment by OVH - Monday, 22 December 2014, 10:22AM

The B crashed during the operation.

bb [local7.err] === : 2014 Dec 22 07:58:33 CET: %SYSMGR-3-HEARTBEAT_FAILURE: Service "afm" sent SIGABRT for not setting heartbeat for last 4 periods. Last heartbeat 175.15 secs ago.
ba [local7.crit] === : 2014 Dec 22 07:58:33 CET: %SYSMGR-2-SERVICE_CRASHED: Service "afm" (PID 3986) hasn't caught signal 6 (core will be saved).
ba [local7.crit] === : 2014 Dec 22 07:58:33 CET: %SYSMGR-2-HAP_FAILURE_SUP_RESET: System reset due to service "afm" in vdc 1 has had a hap failure
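For readers who want to pick the crash chain out of a larger log, here is a minimal Python sketch. It assumes the tag shape `%FACILITY-SEVERITY-MNEMONIC:` seen in the three lines above; the names `TAG`, `SERVICE` and `parse_event` are illustrative and not part of any OVH or Cisco tooling.

```python
import re

# Assumed tag shape, taken from the lines above: %FACILITY-SEVERITY-MNEMONIC: message
TAG = re.compile(r"%([A-Z0-9_]+)-(\d)-([A-Z0-9_]+): (.*)")
# The service name appears in double quotes after "Service" or "service".
SERVICE = re.compile(r'[Ss]ervice "([^"]+)"')

LOG = """\
bb [local7.err] === : 2014 Dec 22 07:58:33 CET: %SYSMGR-3-HEARTBEAT_FAILURE: Service "afm" sent SIGABRT for not setting heartbeat for last 4 periods. Last heartbeat 175.15 secs ago.
ba [local7.crit] === : 2014 Dec 22 07:58:33 CET: %SYSMGR-2-SERVICE_CRASHED: Service "afm" (PID 3986) hasn't caught signal 6 (core will be saved).
ba [local7.crit] === : 2014 Dec 22 07:58:33 CET: %SYSMGR-2-HAP_FAILURE_SUP_RESET: System reset due to service "afm" in vdc 1 has had a hap failure
"""


def parse_event(line):
    """Return facility, severity, mnemonic and service for one log line, or None."""
    match = TAG.search(line)
    if match is None:
        return None
    facility, severity, mnemonic, message = match.groups()
    service = SERVICE.search(message)
    return {
        "facility": facility,
        "severity": int(severity),
        "mnemonic": mnemonic,
        "service": service.group(1) if service else None,
    }


events = [e for e in map(parse_event, LOG.splitlines()) if e is not None]
# Events that ended in a supervisor reset, with the service that caused them.
resets = [e for e in events if e["mnemonic"] == "HAP_FAILURE_SUP_RESET"]
```

On the excerpt above, `resets` holds a single event whose service is `afm`, matching the sequence heartbeat failure, service crash, system reset.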


Comment by OVH - Monday, 22 December 2014, 10:23AM

Nothing is going as planned...
The A has crashed too.

I am bringing the FEX back up on B, which runs the latest version; the FEX are updating.
FEX     FEX             FEX             FEX                 Fex
Number  Description     State           Model               Serial
------------------------------------------------------------------------
100     FEX100|T02A40   Image Download  N2K-C2248TP-E-1GE   SSI16410495
101     FEX101|T02A41   Connected       N2K-C2248TP-E-1GE   FOX1724G9CL
102     FEX102|T02A42   Connected       N2K-C2248TP-E-1GE   SSI17160DEA
103     FEX103|T02A43   Connected       N2K-C2248TP-E-1GE   FOX1724GZ4S
104     FEX104|T02A44   Connected       N2K-C2248TP-E-1GE   FOX1724GZ5S
105     FEX105|T02A45   Online          N2K-C2248TP-E-1GE   SSI17160D7R
106     FEX106|T02A46   Online          N2K-C2248TP-E-1GE   FOX1720GEK6
107     FEX107|T02A47   Connected       N2K-C2248TP-1GE     SSI1601073V
108     FEX108|T02A48   Online          N2K-C2248TP-E-1GE   FOX1720GE3G
109     FEX109|T02A49   Connected       N2K-C2248TP-E-1GE   FOX1720GEMP
110     FEX110|T02D05   Connected       N2K-C2248TP-E-1GE   SSI173608P6
111     FEX111|T02A61   Connected       N2K-C2248TP-E-1GE   SSI1641048V
112     FEX112|T02D06   Connected       N2K-C2248TP-E-1GE   FOX1750GJ2J
113     FEX113|T02D07   Connected       N2K-C2248TP-E-1GE   SSI173608RT
114     FEX114|T02D08   Connected       N2K-C2248TP-E-1GE   SSI173606JB
115     FEX115|T02D09   Connected       N2K-C2248TP-E-1GE   FOX1749GBF5
116     FEX116|T02D10   Online          N2K-C2248TP-E-1GE   SSI1736062S
117     FEX117|T02D11   Online          N2K-C2248TP-E-1GE   FOX1748G4U1
118     FEX118|T02D12   Online          N2K-C2248TP-E-1GE   SSI173606JS
119     FEX119|T02D13   Connected       N2K-C2248TP-E-1GE   FOX1748G4T6
120     FEX120|T02D14   Connected       N2K-C2248TP-E-1GE   FOX1750GNV3
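To follow the recovery at a glance, the FEX listing above can be tallied by state. The Python sketch below assumes whitespace-separated rows in the order shown (number, description, state, model, serial) and that only the state may contain spaces; `parse_fex_rows` and `not_online` are hypothetical helper names, and the sample is a three-row excerpt of the table.

```python
from collections import Counter

# Three rows copied from the listing above (excerpt, for illustration only).
SAMPLE = """\
FEX FEX FEX FEX Fex
Number Description State Model Serial
------------------------------------------------------------------------
100 FEX100|T02A40 Image Download N2K-C2248TP-E-1GE SSI16410495
101 FEX101|T02A41 Connected N2K-C2248TP-E-1GE FOX1724G9CL
105 FEX105|T02A45 Online N2K-C2248TP-E-1GE SSI17160D7R
"""


def parse_fex_rows(text):
    """Parse FEX rows; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row starts with the numeric FEX id and has at least 5 fields.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        rows.append({
            "number": int(fields[0]),
            "description": fields[1],
            # Assumption: the state is everything between description and model.
            "state": " ".join(fields[2:-2]),
            "model": fields[-2],
            "serial": fields[-1],
        })
    return rows


def not_online(rows):
    """Return the numbers of the FEX that have not reached the Online state."""
    return [row["number"] for row in rows if row["state"] != "Online"]


rows = parse_fex_rows(SAMPLE)
states = Counter(row["state"] for row in rows)
pending = not_online(rows)
```

Run against the full listing, the same functions would report which FEX are still in `Connected` or `Image Download` rather than `Online`.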


Comment by OVH - Monday, 22 December 2014, 10:24AM

The A has now been updated.
After its reboot the B was in a strange state: it kept the uplink ports to bhs-3a / b-a9 suspended even though the VPC was UP.

We are reloading it to start again from a clean base.

16 servers remain down on this pair.


Comment by OVH - Monday, 22 December 2014, 10:25AM

The pair is UP, the FEX are all UP.
4 servers remain down.


Comment by OVH - Monday, 22 December 2014, 10:26AM

We still have ports in err-disab state on the n6 A. On the B, everything is stable again.

bhs2-15a-n6# sh inter status | i err
Eth102/1/42   server-EG        err-disab  trunk  full  auto  --
Eth108/1/36   server-EG        err-disab  trunk  full  auto  --
Eth109/1/14   server-EG        err-disab  trunk  auto  auto  --
Eth109/1/43   server-EG        err-disab  trunk  auto  auto  --
Eth110/1/22   server-SP-HOST   err-disab  trunk  auto  auto  --
Eth113/1/44   server-SP-HOST   err-disab  trunk  auto  auto  --
Eth115/1/2    server-SP-HOST   err-disab  trunk  auto  auto  --
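To see which FEX the affected servers sit behind, the err-disab ports above can be grouped by the first number of the interface name. This is a rough Python sketch that assumes the `Eth<fex>/<slot>/<port>` naming shown here and a port description without spaces; `err_disabled_by_fex` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Interface, description and status columns, as in the output above.
ROW = re.compile(r"^(Eth(\d+)/\d+/\d+)\s+(\S+)\s+(\S+)")

SAMPLE = """\
Eth102/1/42 server-EG err-disab trunk full auto --
Eth108/1/36 server-EG err-disab trunk full auto --
Eth109/1/14 server-EG err-disab trunk auto auto --
Eth109/1/43 server-EG err-disab trunk auto auto --
Eth110/1/22 server-SP-HOST err-disab trunk auto auto --
Eth113/1/44 server-SP-HOST err-disab trunk auto auto --
Eth115/1/2 server-SP-HOST err-disab trunk auto auto --
"""


def err_disabled_by_fex(text):
    """Map FEX number to the list of its ports whose status is err-disab."""
    ports = defaultdict(list)
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match and match.group(4) == "err-disab":
            ports[int(match.group(2))].append(match.group(1))
    return dict(ports)


by_fex = err_disabled_by_fex(SAMPLE)
total = sum(len(ports) for ports in by_fex.values())
```

Here `by_fex` has one entry per affected FEX (102, 108, 109, 110, 113 and 115), with two ports listed under 109.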

We are going to do one last reload of the n6 A. All FEX are up on the B, which will forward the traffic during the reboot.


Comment by OVH - Monday, 22 December 2014, 10:28AM

The pair of Nexus is stable again! We have not seen any problems for 10 minutes.