FS#11589 — FS#15680 Microsoft-Exchange 2013/2016

Attached to Project— E-mail
Incident
exchange
In progress
100%
Microsoft Exchange 2013 and 2016 have been affected by a network problem. You may have trouble accessing your account. We are investigating this issue.
Comment by OVH - Tuesday, 08 December 2015, 17:30PM

Smartphones and OWA are still working as normal. Outlook customers may still experience some problems.


Comment by OVH - Wednesday, 09 December 2015, 11:42AM

Outlook access problems persist and we are investigating. OWA and ActiveSync access are still working.


Comment by OVH - Wednesday, 09 December 2015, 17:17PM

The network infrastructure has been patched. Access will gradually return.


Comment by OVH - Thursday, 10 December 2015, 11:06AM

Mailbox access problems remain on the Hosted offer. We have identified the cause.

We are putting a technical solution in place to restore optimal quality of service.

Sorry for the inconvenience.


Comment by OVH - Thursday, 10 December 2015, 16:21PM

We did a lot of work on the load balancer and front-end servers. The network configuration has been optimized on all components.

For now, we are seeing strong improvements on the MAPI and RPC protocols. ActiveSync is working correctly.

OWA is not working; we are looking for the cause.

The overall load has been distributed and your Outlook connections are expected to recover gradually.


Comment by OVH - Thursday, 10 December 2015, 16:36PM

Retrieving messages in Outlook has slowed down. We are looking for the point or points of congestion.


Comment by OVH - Thursday, 10 December 2015, 18:23PM

The service is stable and running again, except for webmail (OWA). To fix OWA, we have to disable MAPI, which we activated yesterday to try to fix the problems linked to the load balancing saturation.

We'll have to restart the Exchange service, with a service outage of 3-5 minutes and a mass reconnection of all customers. We currently have over 150K simultaneous open connections and we do not wish to do this during working hours. We will do it at 10 p.m. tonight (FR time), when the load will be lower.


Comment by OVH - Friday, 11 December 2015, 10:36AM

Dear Customer,

For 48 hours, we have had stability issues with our Exchange 2010/2013 infrastructure. A series of problems meant that some of our customers experienced a partial or full outage of Exchange for a few hours today and yesterday.

Yesterday (the 9th of December) the Network team changed the network configuration in order to prepare to start work on the Paris backbone. This alteration led to poor load distribution between our Paris and Roubaix load balancing systems. All connections went to Roubaix instead of going directly to our Paris datacentre. As a consequence, the load balancers became overloaded and all our services which rely on the Roubaix load balancers, including Exchange 2010/2013, experienced problems.

It took us a while to diagnose this problem because of poor internal communication. The different teams were not aware of the change in configuration and each team tried to resolve the problem on its own. The Exchange team decided to activate MAPI, which is less sensitive to network failure. After doing this, the service worked better but not perfectly. Finally, the Network team fixed the network configuration, which restored Exchange, and by the end of the day everything was working well again.

This morning we started to experience problems again but this time it was down to CPU usage. Because MAPI consumes 2X more CPU, the Exchange team decided to add 2X more resource to the cluster in order to support MAPI. After adding this resource, the service stabilised and everything worked apart from webmail. It turns out that the combination of OWA + MAPI + Exchange 2010/2013 does not work for some of our customers.

To fix this problem we are going to roll back and remove MAPI, which we plan to do at 10pm tonight. There will be a service interruption for 3-5 minutes and then everything should work as it did before the incident...

We are very sorry for the Exchange outage, which was caused by human error and a series of poor decisions. The devil is in the detail, as they say, and better coordination between our teams could have prevented this problem.

Regards
Octave