[freebsd] apache stops serving requests between 1:45 and 2:10 AM

patpro ~ Patrick Proniewski

2012-08-23 07:21:30 UTC

(xpost comp.unix.bsd.freebsd.misc alt.apache.configuration
(fu2 comp.unix.bsd.freebsd.misc

Hello,

I have a very weird problem on a FreeBSD 8.3 server (a VM on ESXi v5).
This server is running Apache 2.2 and PHP 5.4 on top of a compressed
ZFS zpool. It hosts around 260 web sites, and the zpool is divided into
almost as many ZFS volumes.
For a few weeks now, Apache has stopped serving requests between 1:45 AM
and 2:10 AM (give or take a few minutes). My monitoring (nagios/centreon,
munin) clearly shows that HTTP connections fail, but nothing else breaks.

Bandwidth consumption falls to almost 0, and the httpd process count
peaks. IO on the ZFS pool falls to 0, but the monitoring of the VM and
the underlying SAN storage is all green.

I don't have any script, crontab entry, or periodic job running in this
particular time window. The other tier servers keep working fine too
(one MySQL server for PHP and one NFS server for /usr/ports).

I monitored the output of sockstat last night, sampling every two
minutes, and the parsed output is here:

time ? www
01:40 57 8
01:42 61 2
01:44 36 4 -- problem starts
01:46 74 0
01:48 151 0
01:50 201 0
01:52 198 0
01:54 197 0
01:56 197 0
01:58 193 0
02:00 193 0
02:02 193 0
02:04 195 0
02:06 193 0
02:08 194 0
02:10 193 0
02:12 193 0 -- problem ends
02:14 114 7
02:16 93 8
02:18 96 12
02:20 63 9

time: time of the measurement
?: open sockets on port 80 that are not associated with any file
descriptor. They correspond to FIN_WAIT_*, LAST_ACK, and TIME_WAIT
netstat entries.
www: open sockets owned by www. They correspond to ESTABLISHED, CLOSED,
and SYN_SENT netstat entries.
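
For reference, the two columns above could be produced by something like
the following. This is only a minimal sketch, not the exact script used
for the table: it assumes the usual FreeBSD sockstat column layout (USER
COMMAND PID FD PROTO LOCAL FOREIGN), and the function name and sample
addresses are made up for illustration.

```python
from collections import Counter

def count_port80_sockets(sockstat_text, port="80"):
    """Count TCP sockets whose local port matches, grouped by owning user."""
    counts = Counter()
    for line in sockstat_text.splitlines():
        fields = line.split()
        # Assumed columns: USER COMMAND PID FD PROTO LOCAL FOREIGN
        if len(fields) < 7 or fields[0] == "USER":
            continue  # skip the header and malformed lines
        user, proto, local = fields[0], fields[4], fields[5]
        if proto.startswith("tcp") and local.rsplit(":", 1)[-1] == port:
            counts[user] += 1
    return counts

# Hypothetical sample, using documentation-only addresses.
SAMPLE = """\
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
www      httpd      1234  3  tcp4   192.0.2.10:80         198.51.100.7:51234
www      httpd      1235  3  tcp4   192.0.2.10:80         198.51.100.8:40000
?        ?          ?     ?  tcp4   192.0.2.10:80         198.51.100.9:40001
root     sshd       800   3  tcp4   *:22                  *:*
"""

if __name__ == "__main__":
    c = count_port80_sockets(SAMPLE)
    print("?:", c["?"], "www:", c["www"])
```

Run from cron every two minutes against the real sockstat output, this
would give one row of the table per sample.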

So basically, between 1:45 and 2:10 every night, my Apache is
saturated with FIN_WAIT_*, LAST_ACK, and TIME_WAIT sockets. I can't
find out why, nor can I find a proper way to investigate this issue.
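
One possible way to break those sockets down by TCP state is to tally
the last column of netstat output. The following is a rough sketch under
assumptions: it expects FreeBSD-style "netstat -an -p tcp" lines, where
the port follows the last dot of the address, and the function name and
sample lines are hypothetical.

```python
from collections import Counter

def tally_states(netstat_text, port="80"):
    """Count TCP connections on the given local port, grouped by state."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue  # skip headers and non-TCP lines
        local, state = fields[3], fields[5]
        # FreeBSD netstat separates address and port with a dot.
        if local.rsplit(".", 1)[-1] == port:
            states[state] += 1
    return states

# Hypothetical sample, using documentation-only addresses.
NETSTAT_SAMPLE = """\
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4       0      0 192.0.2.10.80          198.51.100.7.51234     TIME_WAIT
tcp4       0      0 192.0.2.10.80          198.51.100.8.40000     FIN_WAIT_2
tcp4       0      0 192.0.2.10.80          198.51.100.9.40001     TIME_WAIT
tcp4       0      0 192.0.2.10.22          198.51.100.9.40002     ESTABLISHED
tcp4       0      0 *.80                   *.*                    LISTEN
"""

if __name__ == "__main__":
    for state, n in sorted(tally_states(NETSTAT_SAMPLE).items()):
        print(state, n)
```

Logging this tally alongside the sockstat counts during the problem
window would show which of the closing states dominates.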

Apache logs show nothing but a drop in activity (0 served requests).
System logs and remote monitoring show nothing special (no timeout, no
network problem).

Any idea?

patpro