FS#15680 Microsoft-Exchange 2013/2016

OVHcloud Web Hosting Status

Incident Report for Web Cloud

Resolved

Microsoft Exchange 2013 and 2016 have been affected by a network problem. You may have trouble accessing your account. We are investigating this issue.

Update(s):

Date: 2015-12-11 09:36:48 UTC
Dear Customer,

For 48 hours, we have had stability issues with our Exchange 2010/2013 infrastructure. A series of problems meant that some of our customers experienced a partial or full outage of Exchange for a few hours today and yesterday.

Yesterday (the 9th of December) the Network team changed the network configuration in order to prepare to start work on the Paris backbone. This alteration led to poor load distribution between our Paris and Roubaix load balancing systems. All connections went to Roubaix instead of going directly to our Paris datacentre. As a consequence, the load balancers became overloaded and all our services which rely on the Roubaix load balancers, including Exchange 2010/2013, experienced problems.

It took us a while to diagnose this problem because of poor internal communication. The different teams were not aware of the change in configuration and each team tried to resolve the problem on its own. The Exchange team decided to activate MAPI, which is less sensitive to network failure. After doing this the service worked better but not perfectly. Finally, the Network team fixed the network configuration, which restored Exchange, and by the end of the day everything was working well again.

This morning we started to experience problems again but this time it was down to CPU usage. Because MAPI consumes 2X more CPU, the Exchange team decided to add 2X more resource to the cluster in order to support MAPI. After adding this resource, the service stabilised and everything worked apart from webmail. It turns out that the combination of OWA + MAPI + Exchange 2010/2013 does not work for some of our customers.

To fix this problem we are going to roll back and remove MAPI, which we plan to do at 10pm tonight. There will be a service interruption for 3-5 minutes and then everything should work as it did before the incident...

We are very sorry for the Exchange outage, which was caused by human error and a series of poor decisions. The devil is in the detail, as they say, and better coordination between our teams could have prevented this problem.

Regards
Octave

Date: 2015-12-10 17:23:53 UTC
The service is stable and running again EXCEPT Webmail (OWA). To fix it, we have to disable MAPI (which we activated yesterday to try to fix the problems linked to the saturation of the load balancers).

We will have to restart the Exchange service, which means a service outage of 3-5 minutes and a mass reconnection of all customers. We currently have over 150K simultaneous open connections and we do not wish to do this during working hours. We will do it at 10 p.m. tonight (FR time), when the load will be lower.

Date: 2015-12-10 15:36:39 UTC
Message retrieval in Outlook has slowed down. We are looking for the point or points of congestion.

Date: 2015-12-10 15:21:12 UTC
We did a lot of work on the load balancer and front-end servers. The network configuration has been optimized on all of these components.

For now, we are seeing strong improvements on MAPI and RPC protocols. ActiveSync is working correctly.

OWA is not working; we are looking for the cause.

The overall load has been distributed and your Outlook connections are expected to recover gradually.

Date: 2015-12-10 10:06:42 UTC
Mailbox access problems remain on the Hosted offer. We have identified the cause.

We set up a technical solution to restore an optimal quality of service.

Sorry for the inconvenience.

Date: 2015-12-09 16:17:45 UTC
The network infrastructure has been patched. Access will gradually return.

Date: 2015-12-09 10:42:48 UTC
Outlook access problems persist and we are investigating. OWA and ActiveSync access are still functional.

Date: 2015-12-08 16:30:25 UTC
Smartphones and OWA are still working as normal. Outlook customers may still experience some problems.

Posted Dec 08, 2015 - 16:28 UTC