FS#282 — FS#4366 — backbone

Attached to Project: Network
Maintenance
Whole Network
CLOSED
100%
We had a BGP incident on the backbone that affected many of OVH's main backbone routers between 5:30 and 6:00. Everything is now back to normal. We are looking for the origin of the problem.
Date:  Friday, 16 July 2010, 18:06PM
Reason for closing:  Done
Comment by OVH - Monday, 12 July 2010, 08:40AM

The antiscan system is the origin of the problem.


Comment by OVH - Monday, 12 July 2010, 12:40PM

The server that aggregates the scan alerts
ran out of disk space on one of its partitions:
/dev/md0 71679728 71679728 0 100% /home.2
We are checking why so much data was suddenly
logged.
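
As an illustration only, a script that writes files to such a partition could refuse to start when too little space is left. This is a minimal sketch, not the actual antiscan code: the function name `has_free_space` and the threshold are assumptions made for the example.

```python
import shutil


def has_free_space(path, min_free_bytes):
    """Return True if the filesystem holding `path` has at least
    `min_free_bytes` bytes available (hypothetical helper)."""
    usage = shutil.disk_usage(path)
    return usage.free >= min_free_bytes
```

A caller would check `has_free_space("/home.2", threshold)` before generating any file, and skip the run (and raise an alert) when it returns False.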

The scripts that push the access-lists
to the routers were expected to handle this kind of
situation.
7380 + Jul 12 05:02:11 root ( 1) antiscan /home/antiscan/check2router.pl
7381 N + Jul 12 05:02:18 root ( 1) antiscan /home/antiscan/check2router.pl
7382 N + Jul 12 05:02:25 root ( 1) antiscan /home/antiscan/check2router.pl
7383 N + Jul 12 05:02:32 root ( 1) antiscan /home/antiscan/check2router.pl
7384 N + Jul 12 05:02:39 root ( 1) antiscan /home/antiscan/check2router.pl

writing problem /home/antiscan//access-list/access-list-ovh.1278903731
writing problem /home/antiscan//access-list/access-list-route.1278903738
writing problem /home/antiscan//access-list/access-list-route.1278903745

The problem is that another script picked up the partially written
information, ran the "diff" and modified the access-lists
on the routers. We also have a safeguard, a final "permit ip any any"
rule, which apparently was not added automatically to the output
pushed to the routers.
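
The failure mode described above (a reader consuming a half-written file, and a missing final rule) can be guarded against in a generic way. The following is a minimal sketch under stated assumptions, not OVH's scripts: the function names, the temporary-file prefix and the one-rule-per-line file layout are all hypothetical. It writes the file under a temporary name and renames it into place, and it refuses any file that does not end with the "permit ip any any" safeguard.

```python
import os
import tempfile

# Final catch-all rule named in the report; the file layout is an assumption.
SAFETY_RULE = "permit ip any any"


def write_atomically(path, lines):
    """Write `lines` to `path` so that readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-acl-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("\n".join(lines) + "\n")
            handle.flush()
            os.fsync(handle.fileno())
        # Rename is atomic when source and target are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure (for example a full disk), drop the partial file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def is_safe_to_apply(path):
    """Return True only if the file is readable, non-empty and ends
    with the catch-all safeguard rule."""
    try:
        with open(path) as handle:
            rules = [line.strip() for line in handle if line.strip()]
    except OSError:
        return False
    return bool(rules) and rules[-1] == SAFETY_RULE
```

With this approach, a write that fails on a full partition leaves the previous file untouched, and a truncated file is rejected before any "diff" is computed against the routers.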

As a consequence, OVH was isolated from the Internet on
Jul 12 05:31:54

The system corrected the access-lists on
Jul 12 06:01:31
so that OVH was reachable from the Internet again.

At that point we started looking into the origin of the problem, but we
did not have much time to fix it, because

on
Jul 12 06:45:13
the system isolated OVH from the Internet again.

We had to come to the office to connect to the
internal network and remove the access-lists from 4 main
routers in Paris.

On
Jul 12 07:10:40
the situation was fixed.

Then, on
Jul 12 07:15:43
the access-lists were completed on the other routers
so that everything works again across the backbone.

The situation has stabilised. We are going through
the logs to understand the sequence of events, and will then correct
the scripts so that they handle this type of problem.


Comment by OVH - Monday, 12 July 2010, 12:41PM

All in all, OVH was isolated twice this morning, for about
30 minutes (Jul 12 05:31:54 to Jul 12 06:01:31)
and
25 minutes (Jul 12 06:45:13 to Jul 12 07:10:40)
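
For reference, the exact durations can be recomputed from the timestamps above. This is a small sketch; the helper name `outage_seconds` is an assumption, and the year 2010 is taken from the ticket date because the log timestamps omit it.

```python
from datetime import datetime

# Log timestamps carry no year; 2010 comes from the ticket date.
FORMAT = "%Y %b %d %H:%M:%S"


def outage_seconds(start, end, year=2010):
    """Return the number of seconds between two 'Jul 12 05:31:54' style stamps."""
    begin = datetime.strptime(f"{year} {start}", FORMAT)
    finish = datetime.strptime(f"{year} {end}", FORMAT)
    return int((finish - begin).total_seconds())


first = outage_seconds("Jul 12 05:31:54", "Jul 12 06:01:31")   # 1777 s = 29 min 37 s
second = outage_seconds("Jul 12 06:45:13", "Jul 12 07:10:40")  # 1527 s = 25 min 27 s
total = first + second                                         # 3304 s = 55 min 4 s
```

The two interruptions therefore add up to a little over 55 minutes.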