ippl has been running for a few weeks, and it has stopped logging
packets. It appears to have hanged or gone to sleep.
Jun 18 23:50:37 captain-flack ippl: ICMP message type echo request from troop.granta.internal [10.0.0.15]
Jun 18 23:50:52 captain-flack ippl: ICMP message type echo request from troop.granta.internal [10.0.0.15]
Jun 18 23:51:00 captain-flack ippl: netbios-ssn connection attempt from troop.granta.internal [10.0.0.15]
Jun 18 23:51:03 captain-flack ippl: imap2 connection attempt from troop.granta.internal [10.0.0.15]
Jun 18 23:51:05 captain-flack ippl: smtp connection attempt from troop.granta.internal [10.0.0.15]
Jun 18 23:51:07 captain-flack ippl: ICMP message type echo request from troop.granta.internal [10.0.0.15]
Jun 18 23:51:22 captain-flack ippl: ICMP message type echo request from troop.granta.internal [10.0.0.15]
Jun 18 23:51:37 captain-flack ippl: ICMP message type echo request from troop.granta.internal [10.0.0.15]
Jun 18 23:51:52 captain-flack ippl: ICMP message type echo request from troop.granta.internal [10.0.0.15]
Jun 18 23:51:58 captain-flack ippl: ssh connection attempt from (--me at home--)
After this, silence for days until I noticed. The regular pings and
probes from the 'mon' on troop continue as normal but are not logged.
Last time this happened, a few weeks ago, the last ippl event was an
ftp probe from somewhere, but there is no ftp server here.
All the network interfaces attached and in use (eth0, eth1, ippp0, lo,
sl0) appear to be functioning normally.
Until recently, ippl had run reliably for months. I had changed the
config file (details below), and I believe I did this before the first
hang .. but the logs have expired so I don't know for sure. I believe
my new config is sane.
`pstree -p` reports
init(1)-+-apache(25841)-+-apache(20278)
[...]
|-ippl(18629)---ippl(18630)-+-ippl(18631)
| |-ippl(18632)
| `-ippl(18633)
[...]
This looks normal. `ps lxa` reports
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
[...]
040 0 18629 1 9 0 7820 688 rt_sig S ? 0:00 /usr/sbin/ippl
040 0 18630 18629 9 0 7820 688 poll S ? 0:00 /usr/sbin/ippl
140 65534 18631 18630 9 0 7820 688 rt_sig S ? 0:19 /usr/sbin/ippl
140 65534 18632 18630 9 0 7820 688 rt_sig S ? 2:04 /usr/sbin/ippl
140 65534 18633 18630 9 0 7820 688 rt_sig S ? 0:30 /usr/sbin/ippl
[...]
I don't know what 'rt_sig' is, and Google isn't being very helpful. I
can investigate further but I'm hoping the cause will be more obvious
to someone who knows the code.
Running strace on the lot (I think I can do this, I'm not a great
strace user),
matthew@captain-flack:~$ sudo strace -p 18629 -p 18630 -p 18631 -p 18632 -p 18633
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
[pid 18633] --- SIGSTOP (Stopped (signal)) ---
[pid 18630] --- SIGSTOP (Stopped (signal)) ---
(I believe these stop states were caused by strace sessions from
moments before, which I then killed)
[pid 18630] getppid() = 18629
[pid 18630] poll([{fd=5, events=POLLIN}], 1, 2000) = 0
[pid 18630] getppid() = 18629
[pid 18630] poll([{fd=5, events=POLLIN}], 1, 2000) = 0
[pid 18630] getppid() = 18629
[pid 18630] poll([{fd=5, events=POLLIN}], 1, 2000) = 0
[pid 18630] getppid() = 18629
[pid 18630] poll([{fd=5, events=POLLIN}], 1, 2000) = 0
[pid 18630] getppid() = 18629
[pid 18630] poll([{fd=5, events=POLLIN}], 1, 2000) = 0
[pid 18630] getppid() = 18629
[pid 18630] poll([{fd=5, events=POLLIN}], 1, 2000) = 0
[pid 18630] getppid() = 18629
The other pids are quiet, even when I strace them directly by
themselves.
fd 5 is the first end of a pipe, according to /proc/18630/fd .
In the interest of making something happen, I started throwing signals
at all the processes.
`sudo kill -HUP 18630 18631 18632 18633 18629` at first caused an
earlier `strace -p 18629` to report "--- SIGHUP (Hangup) ---" and
exit. Oops, I thought they'd all gone away already.
Further -HUPs did nothing, nor did -0 or -CONT. -ALRM got a response:
[...]
[pid 18630] poll([{fd=5, events=POLLIN}], 1, 2000) = 0
[pid 18630] getppid() = 18629
[pid 18630] poll( <unfinished ...>
[pid 18632] --- SIGALRM (Alarm clock) ---
[pid 18632] +++ killed by SIGALRM +++
[pid 18630] <... poll resumed> [{fd=5, events=POLLIN}], 1, 2000) = 0
[pid 18630] getppid() = 1
[pid 18630] kill(18633, SIGKILL <unfinished ...>
[pid 18633] +++ killed by SIGKILL +++
[pid 18630] <... kill resumed> ) = 0
[pid 18630] kill(18632, SIGKILL) = 0
[pid 18630] kill(18631, SIGKILL <unfinished ...>
[pid 18631] +++ killed by SIGKILL +++
<... kill resumed> ) = 0
_exit(0) = ?
and the ippl processes have all gone away.
I've restarted it and it now works:
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
[...]
040 0 31369 1 9 0 7816 888 nanosl S ? 0:00 /usr/sbin/ippl
040 0 31370 31369 9 0 7816 888 poll S ? 0:00 /usr/sbin/ippl
140 65534 31371 31370 9 0 7816 888 wait_f S ? 0:00 /usr/sbin/ippl
140 65534 31372 31370 9 0 7816 888 wait_f S ? 0:00 /usr/sbin/ippl
140 65534 31373 31370 9 0 7816 888 wait_f S ? 0:00 /usr/sbin/ippl
Reportbug didn't include the config file, not sure why.
matthew@captain-flack:~$ grep -vE '^#' /etc/ippl.conf
resolve all
run all
ignore icmp type echo_reply
log options ident,resolve tcp port telnet
ignore udp port domain
ignore udp srcport domain
ignore udp to 10.0.0.255
ignore udp port 9163
ignore tcp port pop3 from 10.0.0.0/24 to 10.0.0.2
ignore tcp port 3128 from 10.0.0.0/24 to 10.0.0.2
(blank lines removed)
Comparing against the backup config,
matthew@captain-flack:~$ grep -vE '^#' /etc/ippl.conf | perl -ne 'print if /\S/' > /tmp/conf
matthew@captain-flack:~$ grep -vE '^#' /etc/ippl.conf~ | perl -ne 'print if /\S/' > /tmp/conf~
matthew@captain-flack:~$ touch -r /etc/ippl.conf~ /tmp/conf~
matthew@captain-flack:~$ touch -r /etc/ippl.conf /tmp/conf
matthew@captain-flack:~$ diff -u /tmp/conf{~,}
--- /tmp/conf~ Tue Jan 1 10:51:08 2002
+++ /tmp/conf Wed May 15 14:50:30 2002
@@ -1,3 +1,10 @@
-run icmp tcp
+resolve all
+run all
ignore icmp type echo_reply
log options ident,resolve tcp port telnet
+ignore udp port domain
+ignore udp srcport domain
+ignore udp to 10.0.0.255
+ignore udp port 9163
+ignore tcp port pop3 from 10.0.0.0/24 to 10.0.0.2
+ignore tcp port 3128 from 10.0.0.0/24 to 10.0.0.2