To manage the traffic between our backbone routers in Roubaix (rbx-1-6k<>rbx-2-6k<>vss-1-6k<>vss-2-6k<>rbx-99-6k), we are establishing a new routing architecture. The switch to the new architecture will take place tonight, starting at midnight.
This maintenance concerns the Roubaix <> Brussels (bru-1-6k) links.
We are switching the links over one by one, which should not cause any impact on traffic.
Update(s):
Date: 2010-07-31 00:25:03 UTC The MTU problem is resolved by replacing the Nexus 5000 with a Nexus 7000.
Date: 2010-07-31 00:23:38 UTC The switchover is complete. One faulty link remains (rbx-1<>sw.int-1), moved onto an interim path tonight; it will be fixed tomorrow.
Date: 2010-07-31 00:21:10 UTC We are switching the traffic onto the new links sw.int-1 <> vss-1/2 and rbx-99.
Date: 2010-07-31 00:20:31 UTC We are starting the tasks.
Date: 2010-07-31 00:20:02 UTC We are continuing the work tonight, hoping that sorting out the MTU fixes the problem once and for all and lets us switch entirely onto the new infrastructure.
Date: 2010-07-31 00:11:39 UTC It is an MTU problem and a bug.
There is no problem between a Nexus 5000 and a 6509 running standard and/or SXF code.
We set the MTU to 9216 and it works properly.
Nexus 5000:
policy-map type network-qos jumbo
  class type network-qos class-default
    mtu 9216
system qos
  service-policy type network-qos jumbo
On the 6509 side (BOOTLDR: s72033_rp Software (s72033_rp-IPSERVICESK9-M), Version 12.2(18)SXF16, RELEASE SOFTWARE (fc2)):
interface Port-channelXXX
  mtu 9216
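Both sides above size jumbo frames at 9216 bytes. As a rough sketch of the arithmetic involved (standard Ethernet overhead figures; whether a given platform's "MTU" knob counts the L2 header is platform-specific, which is exactly the kind of discrepancy that bites here):

```python
# Rough jumbo-frame size arithmetic (illustrative figures only; the exact
# accounting -- what a platform's configured "MTU" includes -- varies).
ETH_HEADER = 14   # dst MAC + src MAC + EtherType
DOT1Q_TAG = 4     # optional 802.1Q VLAN tag
FCS = 4           # frame check sequence (CRC)

def wire_frame_size(ip_mtu: int, tagged: bool = False) -> int:
    """On-wire Ethernet frame size carrying an IP packet of ip_mtu bytes."""
    return ip_mtu + ETH_HEADER + (DOT1Q_TAG if tagged else 0) + FCS

print(wire_frame_size(9216))        # 9234
print(wire_frame_size(9216, True))  # 9238
```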
The bug exists between the Nexus 5000 and a VSS running SXI.
Cisco IOS Software, s72033_rp Software (s72033_rp-ADVIPSERVICESK9-M), Version 12.2(33)SXI3, RELEASE SOFTWARE (fc2)
2 bytes are missing.
With
interface Port-channelXXX
  mtu 9216
there are CRC errors on the interfaces.
With
interface Port-channelXXX
  mtu 9214
there are no more problems.
We noticed it in the frame sizes of the BGP sessions.
Datagrams (max data segment is 9214 bytes):
# ping ip XXXX size 9216 df-bit
Type escape sequence to abort.
Sending 5, 9216-byte ICMP Echos to XXXX, timeout is 2 seconds:
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)
-> but it is OK at 9214:
# ping ip XXXX size 9214 df-bit
Type escape sequence to abort.
Sending 5, 9214-byte ICMP Echos to XXXX, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 8/52/204 ms
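The df-bit ping sweep above (9216 fails, 9214 passes) can be automated as a binary search over probe sizes. A minimal sketch, with the probe function injected so it could wrap a real ping (`find_path_mtu` and `probe` are our own hypothetical names, not an existing tool):

```python
def find_path_mtu(probe, lo=68, hi=9216):
    """Binary-search the largest datagram size for which probe(size) succeeds.

    `probe(size)` should send a don't-fragment ping of `size` bytes and
    return True on success -- e.g. a wrapper around `ping ip <dst> size <n>
    df-bit` on IOS. Assumes the path is monotone: if size N passes, so does
    every smaller size.
    """
    if not probe(lo):
        return None          # even the minimum size fails
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if probe(mid):
            lo = mid         # mid fits; search upward
        else:
            hi = mid - 1     # mid is too big; search downward
    return lo

# Simulating the incident: the path passes 9214 but not 9216.
print(find_path_mtu(lambda size: size <= 9214))  # 9214
```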
We are going to finalise the internal routing infrastructure with this "workaround", then report the bug to Cisco ...
Date: 2010-07-31 00:09:39 UTC The traffic is switched.
Date: 2010-07-31 00:09:21 UTC We are starting the switching operation.
Date: 2010-07-31 00:08:54 UTC Tonight there will be work on the Roubaix2 network. We are switching the ss-1 <> vss-2 traffic onto the new Nexus infrastructure. In case of problems, we will roll back immediately.
Date: 2010-07-31 00:06:48 UTC We reattempted switching the 10G links onto the new infrastructure but are still running into difficulties. We are switching back to the old configuration, except for rbx-1 <> rbx-2, which is the only link running correctly over the new infrastructure.
Date: 2010-07-31 00:04:30 UTC The faulty links are now repaired. We are taking the opportunity to repair other faulty links.
Date: 2010-07-31 00:01:41 UTC We are starting the maintenance.
Date: 2010-07-31 00:01:19 UTC Repair of the faulty links will take place tonight from 23:00. Depending on how that goes, we will move on to switching the routing links onto the new internal routing switches.
Date: 2010-07-30 23:57:36 UTC We found problems on the rbx-1<>vss-2 link even before the switchover started. We set up a temporary fibre and are expecting a maintenance intervention to repair it once and for all.
We are also measuring abnormally high attenuation on the vss-2 <> rbx-99 links, which we will fix.
Date: 2010-07-30 23:53:20 UTC We are switching the links rbx-1<>vss-2 and rbx-2 <> vss-1.
Date: 2010-07-30 23:52:48 UTC We modified the MTU configuration of the N5 switches and switched the rbx-1<>rbx-2 link over. The BGP session is now stable. We are going to switch the other links progressively.
Date: 2010-07-30 23:50:38 UTC the problem is probably due to MTU which is XXXXX managed on N5
the XXXX to replace by \"bad\", \"differently\", etc
Date: 2010-07-30 23:49:03 UTC No luck.
We will revert the links to their previous state and forward the bugs to Cisco ...
Date: 2010-07-30 23:47:48 UTC We believe the CRC problems are caused by incompatible optics (!?) between the Cisco N5 and the Cisco 6509 ...
We are retesting.
Date: 2010-07-30 23:45:52 UTC The maintenance is not going well. We are seeing CRC errors between the routers. We returned to the initial configuration, with extra pain caused by bugs:
rbx-99-6k#sh inter ten 9/1
[...]
30 second output rate 90000 bits/sec, 98 packets/sec
[...]
Traffic would not pass.
rbx-99-6k#conf t
Enter configuration commands, one per line. End with CNTL/Z.
rbx-99-6k(config)#inter ten 9/1
rbx-99-6k(config-if)#shutdown
rbx-99-6k(config-if)#no shutdown
rbx-99-6k#sh inter ten 9/1
[...]
30 second output rate 2345596000 bits/sec, 384765 packets/sec
[...]
This is what we call a nice bug, the kind that wastes two hours in the middle of the night.
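The symptom above (output rate collapsed to ~90 kbit/s until a shut/no shut) can be spotted by parsing `show interface` output. A small illustrative sketch (the regex, function names, and 1 Mbit/s threshold are our own choices, not anything standard):

```python
import re

# Matches the rate counter line in Cisco `show interface` output.
RATE_RE = re.compile(r"30 second output rate (\d+) bits/sec")

def output_rate_bps(show_interface_text):
    """Extract the 30-second output rate (bits/sec), or None if absent."""
    m = RATE_RE.search(show_interface_text)
    return int(m.group(1)) if m else None

def looks_stuck(show_interface_text, floor_bps=1_000_000):
    """Heuristic: a backbone link pushing under floor_bps is suspicious."""
    rate = output_rate_bps(show_interface_text)
    return rate is not None and rate < floor_bps

before = "30 second output rate 90000 bits/sec, 98 packets/sec"
after = "30 second output rate 2345596000 bits/sec, 384765 packets/sec"
print(looks_stuck(before), looks_stuck(after))  # True False
```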