Hi, we are using net-acct for our ISP accounting system to collect the data. We have been using net-acct for quite some time on a variety of systems. Recently, we have been experiencing problems that net-acct stops writing out data and takes all available CPU. With debugging options set, we get logs that look like this: 18/09 21:06:24 [MISC ] Total of 24464 entries 18/09 21:06:24 [STATE] dump child: synchronized with parent 18/09 21:06:24 [STATE] * dump process 10525 forked 18/09 21:06:24 [STATE] parent: synchronized with dump child 18/09 21:06:24 [STATE] dumppid is 10525 18/09 21:06:24 [STATE] lck = 0 18/09 21:06:24 [MISC ] Split into 8849 [hold] and 15615 [write] = 24464 [total] entries 18/09 21:06:24 [STATE] write child: synchronized with parent 18/09 21:06:24 [STATE] * write process 10526 forked 18/09 21:06:24 [STATE] parent: synchronized with write child 18/09 21:06:24 [STATE] writepid is 10526 18/09 21:06:24 [STATE] done freeing 18/09 21:06:25 [STATE] * dump finished, count = 8849 18/09 21:06:25 [STATE] -> got signal 17, handling 18/09 21:06:25 [SIG ] waitpid returned 10525, status = 0, errno = 4 18/09 21:06:25 [STATE] set dumping to 0 18/09 21:06:25 [STATE] child 10525 signaled return 18/09 21:06:25 [STATE] <- got signal 17, done handling 18/09 21:06:25 [STATE] * write finished, count = 15615 18/09 21:06:25 [STATE] -> got signal 17, handling 18/09 21:06:25 [SIG ] waitpid returned 10526, status = 0, errno = 4 18/09 21:06:25 [STATE] set writing to 0 18/09 21:06:25 [STATE] child 10526 signaled return 18/09 21:06:25 [SIG ] waitpid returned -1, status = 0, errno = 10 18/09 21:06:25 [STATE] <- got signal 17, done handling 18/09 21:07:24 [STATE] write_log called 18/09 21:07:24 [STATE] flushing delayed (writing == 0, lck == 1, may_write == 1) 18/09 21:08:25 [STATE] write_log called 18/09 21:08:25 [STATE] flushing delayed (writing == 0, lck == 1, may_write == 1) 18/09 21:09:24 [STATE] write_log called 18/09 21:09:24 [STATE] flushing delayed (writing == 0, lck == 1, may_write == 1) 18/09 21:10:24 [STATE] write_log called 18/09 21:10:24 [STATE] flushing delayed (writing == 0, lck == 1, may_write == 1) 18/09 21:11:24 [STATE] write_log called 18/09 21:11:24 [STATE] flushing delayed (writing == 0, lck == 1, may_write == 1) 18/09 21:12:24 [STATE] write_log called 18/09 21:12:24 [STATE] flushing delayed (writing == 0, lck == 1, may_write == 1) 18/09 21:13:24 [STATE] write_log called 18/09 21:13:24 [STATE] flushing delayed (writing == 0, lck == 1, may_write == 1) 18/09 21:14:24 [STATE] write_log called 18/09 21:14:24 [STATE] flushing delayed (writing == 0, lck == 1, may_write == 1) 18/09 21:15:24 [STATE] write_log called 18/09 21:15:24 [STATE] flushing delayed (writing == 0, lck == 1, may_write == 1) 18/09 21:16:24 [STATE] write_log called 18/09 21:16:24 [STATE] flushing delayed (writing == 0, lck == 1, may_write == 1) 18/09 21:17:25 [STATE] write_log called 18/09 21:17:25 [STATE] flushing delayed (writing == 0, lck == 1, may_write == 1) 18/09 21:18:24 [STATS] ignored: 2889478 netignored: 0 local:0 ip:1340990 unenc:0 notdev:0 dropped:0 18/09 21:18:24 [STATS] udp: 236757 tcp:1075249 icmp:24883 other:4101 This usually repeats ad infinitum, but in this rare case, the daemon got back on track, eventually writing out the data. We have inspected the code and didn't find any occurences where lck is written to without debug output, but obviously, lck was set to 1 somewhen between 21:06:24 and 21:07:24, and reset between 21:17:25 and 21:18:24. This behavior can be reproduced by setting "large" timeout values for flush (300) and fdelay (120). We successfully backported sid's net-acct 0.71-2 to potato, but the bug shows here as well. There does not seem to be a public mailing list about net-acct, so the BTS probably is the only way to get in touch with knowledgeable people. Any ideas what might be going wrong here? Greetings Marc
Please find attached a patch that lets write_log return a value
indicating if it was successfully able to write. Only after a
successful write, the next write cycle is scheduled. This way, with
lck == 1, you get one attempt to acquire the lock per second instead
per flush seconds.
With this patch, we are now back on track, and it rarely happens that
it takes longer than ten seconds (==tries) to acquire the lock.
Maybe, it would be a good idea to have write_log called from a place
where lck is definetely zero, for example from the end of
register_packet. But I refrained from doing so because I was afraid of
overlooking some aspects of the asynchronous function call and the
forks happening during the write. It is up to people knowing the code
better to comment about this. The proposed patch to register_packet is
in the patch, but commented out.
Greetings
Marc
patch follows
diff -ur net-acct-0.71-glibc2/src/daemon.c net-acct-0.71/src/daemon.c
--- net-acct-0.71-glibc2/src/daemon.c Thu May 7 16:10:38 1998
+++ net-acct-0.71/src/daemon.c Thu Sep 20 17:08:57 2001
@@ -42,10 +42,15 @@
{
DEBUG(DBG_STATE, sprintf(dbg, "entering daemon_stop\n"));
- write_log(1);
-
- DEBUG(DBG_STATE, sprintf(dbg, "wrote final log\n"));
-
+ if( write_log(1) == 0 )
+ {
+ DEBUG(DBG_STATE, sprintf(dbg, "wrote final log\n"));
+ }
+ else
+ {
+ DEBUG(DBG_STATE, sprintf(dbg, "writing final log unsuccessful\n"));
+ };
+
unlink(PID_FILE);
DEBUG(DBG_STATE, sprintf(dbg, "unlinked PID_FILE\n"));
diff -ur net-acct-0.71-glibc2/src/netacct.h net-acct-0.71/src/netacct.h
--- net-acct-0.71-glibc2/src/netacct.h Sun Sep 30 19:46:26 2001
+++ net-acct-0.71/src/netacct.h Thu Sep 20 16:40:00 2001
@@ -239,7 +239,7 @@
/* process.c */
void register_packet(unsigned long int src,unsigned long int dst, unsigned char proto, unsigned short srcport, unsigned short dstport, int size, char *devname, char *user);
-void write_log(int force);
+int write_log(int force);
void alarm_handler(int sig);
void child_finished(int sig);
void signal_debug(int sig);
diff -ur net-acct-0.71-glibc2/src/process.c net-acct-0.71/src/process.c
--- net-acct-0.71-glibc2/src/process.c Sun Sep 30 19:46:26 2001
+++ net-acct-0.71/src/process.c Thu Sep 20 18:23:10 2001
@@ -47,6 +47,7 @@
int err_delay, max_err_delay;
volatile time_t now;
+static time_t next_write_log = 0;
/* statistics */
unsigned int list_compares, list_lookups;
@@ -97,12 +98,17 @@
may_write = 0;
save2 = lck;
lck = 1;
- /* end */
+ /* end */
+
+ DEBUG(DBG_STATE, sprintf(dbg, "reopen_socket [before]: lck = 1\n"));
exit_capture();
init_capture();
lck = save2;
+
+ DEBUG(DBG_STATE, sprintf(dbg, "reopen_socket [after]: lck = %d\n",save2));
+
may_write = save1;
}
@@ -230,6 +236,15 @@
{
packets->dropped++;
}
+/*
+ if(now >= next_write_log)
+ {
+ if( write_log(0) == 0)
+ {
+ next_write_log = now + cfg -> flush;
+ }
+ }
+ */
}
int do_write_list(FILE *f, struct ipdata *list[])
@@ -585,7 +600,6 @@
void alarm_handler(int sig)
{
static time_t last_check = 0;
- static time_t next_write_log = 0;
DEBUG( ((sig == SIGALRM) ? DBG_ANNOYING : (DBG_SIGNAL | DBG_STATE)), sprintf(dbg, "got signal %d, handling\n", sig));
@@ -609,18 +623,21 @@
if(now >= next_write_log)
{
- write_log(0);
- next_write_log = now + cfg -> flush;
+ if( write_log(0) == 0)
+ {
+ next_write_log = now + cfg -> flush;
+ }
}
alarm(1);
}
Hi Marc, I noticed your patch beeing published on exorsus.net. Do you get any feedback back? Is it ok to include it in the next net-acct version, or do you have a refined version for me? Greetings Bernd
No, never heard about it again. But the patched version seems to work fine on our accounting box. Please use it if you want to, and there have been no changes against the patch submitted to the bug report yet. Greetings Marc
Hi,
I think we hit this bug and I think I figured out why it was
happening tonight but I am yet to test my theory. I think
the problem is when the alarm() goes off, SIGALRM is received and the alarm
handler is called while lck = 1. The alarm handler calls write_log
if(now >= next_write_log)
{
write_log(0);
next_write_log = now + cfg -> flush;
}
alarm(1);
}
which cannot do much because lck still equals 1 ie fails this check:
process.c: 616: else if((writing == 0) && (lck == 0) && (may_write == 1)) /* delay if another write cycle is still in progress */
and ends up:
process.c: 722 else
process.c: 723 {
process.c: 724 DEBUG(DBG_STATE, sprintf(dbg, "flushing delayed (writing == %d, lck == %d, may_write == %d)\n",writing,lck,may_write));
process.c: 725 }
The thing is, because lck is 1, it can never reach the bit of the code where lck is set to 0 and so nothing
is written to the net-acct.log file until you restart nacctd.
This is just a theory at the moment. I intend on trying the patch in this bug and my own patch
over the next week and can provide feedback to this bug. Additionally, if people can provide
feedback about my analysis, feel free to email me.
Best regards,
Ben Marsh
Sehr geehrte/r Arbeitsuchender, folgender Jobvorschlag ist für alle geeignet, da diese Arbeit ohne Vorwissen auch von zu Hause zu bewerkstelligen ist. Der Arbeitnehmer hat keine Ausgaben und muss keine besonderen Kenntnisse mitbringen. Die benötigte technische Ausrüstung wird von uns frei zur Verfügung gestellt Zu Ihren Aufgaben gehört die Datendigitalisierung, das Erstellen von Mediatheken, die Koordinierung, die Betreuung von Onlinebibliotheken und das Digitalisieren von Buchwerken. Wir bieten eine attraktive Vergütung in Höhe von 16€ pro Stunde. Unser Betrieb verfügt über internationale Firmensitze in ganz Europa und wir arbeiten in der Digitalisierung. Im Moment suchen wir nach neuen Mitarbeitern. Worauf wir bei unseren Mitarbeiter Wert legen wäre: Eine Tätigkeit von zu Hause entspricht Ihrer Vorstellung, Sie arbeiten gern von zu Hause, eine Tätigkeit von zu Hause entspricht Ihrer Vorstellung, Sie besitzen Flexibilität, Grundkenntnisse von MS-Office sind von Vorteil und Sie besitzen eine teamorientierte Arbeitsweise. Konnten wir Ihr Interesse wecken? Dann freuen wir uns über Ihre Bewerbung! Senden Sie Ihre kompletten Bewerbungen an: ElleRichardsone@gmx.com Wir freuen uns auf Ihre Bewerbung. Mit freundlichen Grüßen Dopp GmbH
Dear Customer, Your parcel was successfully delivered January 20 to USPS Station, but our courier cound not contact you. Please check delivery label attached! With thanks and appreciation, Antonio Best, USPS Station Manager.
Good Day, Johnson & Johnson Belgium (NV) hereby invites your company to submit formal quotation for the item listed below in support of an ongoing project: Product: AIS-P-1009 Pump Product Code: AIS-P-1009 Quantity: 38 units If your company does not currently have this product available, we kindly ask that you source it and supply to us. Please submit your quotation including pricing, lead time and warranty information, to orders.be@jnj-supplies.com at your earliest convenience to facilitate timely review and approval. We appreciate your support and prompt response in assisting Johnson & Johnson Belgium with this procurement. Regards, Dennis J. Smith Global Procurement Manager Johnson & Johnson Medical NV Address: Leonardo Da Vincilaan 15, 1831 Diegem, Belgium VAT Number: BE 0425.967.580 Tel: + 32 480 20 5319 E-mail: orders.be@jnj-supplies.com
Good Day, Johnson & Johnson Belgium (NV) hereby invites your company to submit formal quotation for the item listed below in support of an ongoing project: Product: AIS-P-1009 Pump Product Code: AIS-P-1009 Quantity: 38 units If your company does not currently have this product available, we kindly ask that you source it and supply to us. Please submit your quotation including pricing, lead time and warranty information, to orders.be@jnj-supplies.com at your earliest convenience to facilitate timely review and approval. We appreciate your support and prompt response in assisting Johnson & Johnson Belgium with this procurement. Regards, Dennis J. Smith Global Procurement Manager Johnson & Johnson Medical NV Address: Leonardo Da Vincilaan 15, 1831 Diegem, Belgium VAT Number: BE 0425.967.580 Tel: + 32 480 20 5319 E-mail: orders.be@jnj-supplies.com
Good Day, Johnson & Johnson Belgium (NV) hereby invites your company to submit formal quotation for the item listed below in support of an ongoing project: Product: AIS-P-1009 Pump Product Code: AIS-P-1009 Quantity: 38 units If your company does not currently have this product available, we kindly ask that you source it and supply to us. Please submit your quotation including pricing, lead time and warranty information, to orders.be@jnj-supplies.com at your earliest convenience to facilitate timely review and approval. We appreciate your support and prompt response in assisting Johnson & Johnson Belgium with this procurement. Regards, Dennis J. Smith Global Procurement Manager Johnson & Johnson Medical NV Address: Leonardo Da Vincilaan 15, 1831 Diegem, Belgium VAT Number: BE 0425.967.580 Tel: + 32 480 20 5319 E-mail: orders.be@jnj-supplies.com