FS#812 — FS#4847 — HG, under windows
Attached to Project: Dedicated servers
Incident
RBX2
CLOSED
We have some HG servers, apparently running Windows, that have not
responded to ping since 6:36. We are still looking for the origin of the problem.
Date: Thursday, 18 November 2010, 16:41
Reason for closing: Done
We tried reconfiguring the port in a different way. It does not
work. We recovered one server by changing its switch port.
It seems to be a bug in the switch system.
We will see whether we can recover the servers by restarting the
switch.
Same thing.
We will therefore change the ports of the 7 HG servers running Windows
that no longer work.
It does not work.
We will update the switch to see whether that fixes the problem.
We will restart the switch.
Meanwhile, we looked internally for similar problems,
and apparently we had problems with Linux on 10G. Because of
incompatibilities, we had to introduce specific procedures,
with a particular choice of SFP+ cables and network cards,
to get Linux running. We did not have this problem
under Windows.
So we will check at the same time whether this is the same problem as
under Linux. However, it is happening long after the introduction of
Windows, and on a single network. Very weird.
The reboot of the switch has started.
sw-n5-14.242# install all kickstart bootflash:n5000-uk9-kickstart.4.2.1.N1.1.bin system bootflash:n5000-uk9.4.2.1.N1.1.bin
Verifying image bootflash:/n5000-uk9-kickstart.4.2.1.N1.1.bin for boot variable "kickstart".
[####################] 100% -- SUCCESS
Verifying image bootflash:/n5000-uk9.4.2.1.N1.1.bin for boot variable "system".
[####################] 100% -- SUCCESS
Verifying image type.
[####################] 100% -- SUCCESS
Extracting "system" version from image bootflash:/n5000-uk9.4.2.1.N1.1.bin.
[####################] 100% -- SUCCESS
Extracting "kickstart" version from image bootflash:/n5000-uk9-kickstart.4.2.1.N1.1.bin.
[####################] 100% -- SUCCESS
Extracting "bios" version from image bootflash:/n5000-uk9.4.2.1.N1.1.bin.
[####################] 100% -- SUCCESS
Notifying services about system upgrade.
[####################] 100% -- SUCCESS
Compatibility check is done:
Module bootable Impact Install-type Reason
------ -------- -------------- ------------ ------
1 yes disruptive reset Reset due to single supervisor
Images will be upgraded according to following table:
Module Image Running-Version New-Version Upg-Required
------ ---------- ---------------------- ---------------------- ------------
1 system 4.1(3)N2(1) 4.2(1)N1(1) yes
1 kickstart 4.1(3)N2(1) 4.2(1)N1(1) yes
1 bios v1.3.0(09/08/09) v1.3.0(09/08/09) no
1 power-seq v1.2 v1.2 no
Switch will be reloaded for disruptive upgrade.
Do you want to continue with the installation (y/n)? [n] y
Install is in progress, please wait.
Setting boot variables.
[####################] 100% -- SUCCESS
Performing configuration copy.
[####################] 100% -- SUCCESS
Module 1: Refreshing compact flash and upgrading bios/loader/bootrom/power-seq.
Warning: please do not remove or power off the module at this time.
Note: Power-seq upgrade needs a power-cycle to take into effect.
On success of power-seq upgrade, SWITCH OFF THE POWER to the system and then, power it up.
[####################] 100% -- SUCCESS
Finishing the upgrade, switch will reboot in 10 seconds.
sw-n5-14.242#
Broadcast message from root (Thu Nov 18 10:26:57 2010):
The system is going down for reboot NOW!
2010 Nov 18 10:26:57 sw-n5-14.242 %KERN-0-SYSTEM_MSG: writing reset reason 31, - kernel
The switch is up to date. It does not work.
That still leaves possible hardware problems. We will intervene to change
the hardware.
The servers do announce their MAC addresses on the network, but it still does not work.
Of the 53 Windows servers in the 27XXX racks on the network in question,
only 18 do not work. They use DHCP
to boot.
We will change the network card of one of the servers to see whether that
fixes the problem.
The origin of the problem has been found. Last night, the teams that
take care of bringing new servers into service put the new HG servers
in place. By mistake, they took the IP
of the DHCP servers. This caused the crash of all the HG servers
that use DHCP.
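
As an illustration only, the kind of mistake described above (a new server taking an address already used by the DHCP servers) could be caught by a pre-deployment check along the following lines. This is a minimal sketch, not the tooling actually used: the function name `find_reserved_conflicts`, the host names and the addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical.

```python
import ipaddress


def find_reserved_conflicts(new_assignments, reserved):
    """Return the (hostname, ip) pairs whose IP collides with a reserved address.

    new_assignments: dict mapping hypothetical host names to the IPs planned for them.
    reserved: iterable of IPs already used by infrastructure (e.g. DHCP servers).
    """
    # Parse addresses so that equivalent spellings compare equal
    # and malformed input raises ValueError early.
    reserved_set = {ipaddress.ip_address(ip) for ip in reserved}
    return [
        (host, ip)
        for host, ip in new_assignments.items()
        if ipaddress.ip_address(ip) in reserved_set
    ]


if __name__ == "__main__":
    # Hypothetical data: one new server was given the DHCP server's address.
    dhcp_servers = ["192.0.2.53"]
    planned = {"hg-new-1": "192.0.2.10", "hg-new-2": "192.0.2.53"}
    for host, ip in find_reserved_conflicts(planned, dhcp_servers):
        print(f"conflict: {host} would take reserved address {ip}")
```

Such a check only helps if the list of reserved addresses is shared between the teams, which is the communication issue this incident comes down to.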
The lack of communication between internal teams in the same
data centre is at the origin of this problem. We will fix
this communication problem. We will introduce a DHCP server
external to the network. We will then refund the customers affected by
the crash.