The monitoring system has detected a large number of faulty hosts.
We will investigate.
Update(s):
Date: 2014-07-18 08:19:22 UTC The bug has affected some of the remaining servers in the infrastructure.
Tomorrow, all remaining servers running this driver version will be rebooted to update the network driver and ensure they are no longer affected.
Date: 2014-07-02 17:08:15 UTC We are checking the entire infrastructure to identify any other hosts that need this update.
Date: 2014-07-02 17:07:26 UTC All host servers have been updated, and tickets have been opened for the affected machines.
The host servers must now be rebooted to apply the driver update.
Date: 2014-07-02 13:09:11 UTC The first host servers have been updated.
A reboot is required to apply the update.
We will open a ticket for the relevant host servers.
Date: 2014-07-02 13:07:44 UTC The ESXi version is not the deciding factor: a few host servers still run the buggy version of the driver.
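This update changes the selection criterion from the ESXi release to the driver version. A minimal sketch of that selection, assuming a hypothetical inventory of (host, ESXi release, igb driver version) records. The host names and version strings are placeholders, since this report does not state which igb version is affected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    esxi_version: str
    igb_version: str


# Hypothetical placeholder: the real affected igb driver version is not
# given in this report.
AFFECTED_IGB_VERSIONS = {"igb-AFFECTED"}


def hosts_to_update(inventory):
    """Select hosts by igb driver version, ignoring the ESXi release."""
    return sorted(h.name for h in inventory if h.igb_version in AFFECTED_IGB_VERSIONS)


# Hypothetical inventory for illustration only.
inventory = [
    Host("host-a", "5.0u1", "igb-AFFECTED"),
    Host("host-b", "5.0u1", "igb-FIXED"),
    Host("host-c", "5.1", "igb-AFFECTED"),
]

print(hosts_to_update(inventory))  # ['host-a', 'host-c']
```

Selecting on ESXi 5.0 Update 1 alone would have flagged host-b, which already has a fixed driver, and missed host-c, which runs a different ESXi release with the affected driver.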
Date: 2014-07-02 10:31:49 UTC The same issue has just recurred.
We are checking all hosts and verifying their drivers.
Date: 2014-06-27 08:28:37 UTC VMware engineering found corrupted data in network frame headers.
The exact cause of the corruption is unknown, but it originates from the Intel igb driver.
The firmware and driver currently installed are not the latest versions, so we will update the drivers.
Log analysis (Bug ID 1272069):
The PSOD occurs because the head pointer of (&(container->slabInfo[2].pktList))->csList is corrupted.
[esx-host3922.ovh.net-2014-06-18--09.04]
(gdb) f 4
#4 PktContainerGetPkt (slabType=PKT_SLAB_HIGH_MEM, container=0x410004c49f00, index=2) at bora/vmkernel/net/pkt.c:3733
3733 entry = PktList_PopHead(&(container->slabInfo[index].pktList));
(gdb) p container
$11 = (PktContainer *) 0x410004c49f00
(gdb) p &(container->slabInfo[index].pktList)
$12 = (PktList *) 0x410004c49fa8
(gdb) p ((PktList *) 0x410004c49fa8)->csList
$13 = {
  slist = {
    head = 0x61646e656974656c, tail = 0x4100085e4980
  },
  numElements = 11
}
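The corrupted head value can be inspected quickly. ESXi runs on x86-64, which is little-endian, so the eight bytes of 0x61646e656974656c as stored in memory are all printable ASCII. By contrast, the intact tail pointer 0x4100085e4980 shares the 0x4100... prefix of the container address. A text-like value is consistent with, but does not prove, frame data having been written over the list structure. The helper below is a hypothetical illustration, not part of VMware's tooling.

```python
from typing import Optional


def decode_pointer_as_ascii(value: int, width: int = 8) -> Optional[str]:
    """Return the pointer's in-memory bytes as text if all are printable ASCII.

    x86-64 is little-endian, so the byte stored at the lowest address is the
    least significant byte of the value gdb prints.
    """
    raw = value.to_bytes(width, "little")
    if all(0x20 <= b <= 0x7E for b in raw):
        return raw.decode("ascii")
    return None  # at least one non-printable byte: not plain text


# Values copied from the gdb session above.
corrupted_head = 0x61646e656974656c
intact_tail = 0x4100085e4980

print(decode_pointer_as_ascii(corrupted_head))  # letienda
print(decode_pointer_as_ascii(intact_tail))     # None
```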
Date: 2014-06-26 15:56:57 UTC The source of the problem has been identified.
"Engineering have analyzed the dumps and found that the PSOD's were due to corruption which originated from the igb network driver."
We will escalate the SR to determine the root cause of the corruption.
Date: 2014-06-18 11:32:51 UTC We have opened a support request (SR) with VMware for root cause analysis.
Diagnosis is in progress.
Date: 2014-06-18 11:32:18 UTC All servers have been checked and rebooted.
We are confirming on the monitoring system that everything is back to normal.
Date: 2014-06-18 11:32:07 UTC More than half of the affected hosts have been checked and rebooted.
Work is still in progress.
Date: 2014-06-18 11:29:44 UTC The affected hosts all appear to be running version 5.0 Update 1.
They are showing a purple screen (PSOD).
They are being rebooted.