#590074 awstats: DO NOT use cron scripts to update stats database

#590074#5
Date:
2010-07-23 12:31:57 UTC
Currently this package installs a cron job that runs every ten minutes. This
is a VERY bad idea:

- if logrotate(8) runs during those 10 minutes, some log entries will fail to
  be accounted for by awstats

- it wastes resources parsing the same log files every 10 minutes, especially
  if they get big

- it makes logcheck(8) spam my inbox every hour due to the cron job failing
  every 10 minutes

A better solution is to hook the update script onto the logrotate(8) entries
for any installed webservers (e.g. /etc/logrotate.d/lighttpd and
/etc/logrotate.d/apache2). This solves all three of the problems I just
mentioned.

#590074#10
Date:
2010-07-23 13:31:27 UTC
Hi Ximin,

Good points!

Frequent updates from the logfiles have their uses too, however.  But not
always - and the downsides you raise here are valid.

I suggest that we a) split the current cron job into infrequent and frequent
jobs, b) make the frequent one optional (ideally through debconf), and
c) invoke the infrequent job also (or instead?) as a logrotate hook.


How does that sound?


  - Jonas

#590074#15
Date:
2010-07-23 13:57:43 UTC
Yeah, that works. Though looking at the current script, there's not really much
need to split it up.

The main problem is to do the logrotate hook itself - ideally you'd add it
directly to the webserver entry rather than a new awstats entry. Is that going
to be a pain - editing another package's configuration files? I dunno what
infrastructure / policy Debian has for this sort of thing.

#590074#20
Date:
2010-07-23 16:03:23 UTC
Or more accurately: it needs to be implemented in webserver packages that
this awstats package can hook into - not doable in awstats alone.


  - Jonas

#590074#25
Date:
2010-07-23 16:46:27 UTC
Logrotate running every 10 minutes could be the source of trouble, not awstats.

Do you mean it parses the _same_ log entries?  No, awstats doesn't do
such a stupid thing.  Actually, it seeks (lseek) in the file to the last
known entry and then begins parsing.
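To make the "seek to the last known entry" idea concrete, here is a minimal sketch, assuming a hypothetical state file that holds a byte offset. AWStats records its position in its own history files, so this is neither its code nor its format. It also assumes a `head` that supports `-c` (GNU coreutils and BusyBox do).

```shell
#!/bin/sh
# Sketch only: incremental log reading from a saved byte offset.
# The state-file format (a single integer) is a hypothetical choice,
# not how AWStats stores its position.

# read_new_entries LOGFILE STATEFILE
# Prints the bytes appended to LOGFILE since the offset recorded in
# STATEFILE, then records the new offset.
read_new_entries() {
    log=$1
    state=$2
    offset=0
    if [ -s "$state" ]; then
        offset=$(cat "$state")
    fi
    # Arithmetic expansion strips the padding some wc implementations add.
    size=$(( $(wc -c < "$log") ))
    # A file smaller than the saved offset was truncated or replaced
    # (for example by logrotate): start again from its beginning.
    if [ "$size" -lt "$offset" ]; then
        offset=0
    fi
    # tail -c +N starts at byte N (1-based); head -c caps the read at the
    # size measured above, so lines written meanwhile are left for next time.
    tail -c +"$((offset + 1))" "$log" | head -c "$((size - offset))"
    echo "$size" > "$state"
}
```

Lines written between the last run and a rotation end up in the rotated file, which this function never reads; that is the gap discussed in the following messages.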

Why exactly does it fail?  Did you first try commenting out the crontab
entry and fixing the source of the failure?

I disagree with the severity.  This looks like a very
site-specific/workload-specific issue.  Your logrotate-based solution
could be suggested as an option in README.Debian for specific setups.

True.

How would we do a) or c)?  It's easy only from the local admin side.

We could make the cron job frequency configurable through debconf.  Is that an option?

#590074#30
Date:
2010-07-23 16:46:44 UTC
clone 590074 -1 -2 -3
reassign -1 apache2
reassign -2 lighttpd
reassign -3 nginx
retitle -1 add pre-rotate hook to logrotate script
retitle -2 add pre-rotate hook to logrotate script
retitle -3 add pre-rotate hook to logrotate script
severity -1 wishlist
severity -2 wishlist
severity -3 wishlist
thanks

Please add something like the following snippet to the logrotate script for
your package:

prerotate
	if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
		run-parts /etc/logrotate.d/httpd-prerotate; \
	fi; \
endscript

(or some suitable directory other than the one suggested; I'm not sure what
Debian naming conventions are.)

This would be very helpful to log-parsing packages such as awstats, which
can then set up hooks to process these logs before they get rotated (see
#590074 for an example).

X

#590074#37
Date:
2010-07-23 16:55:14 UTC
No, logrotate isn't running every 10 minutes. I think you misunderstood my
point. If logrotate runs between the 10-minute cron runs of awstats, it will
rotate the log entries since the last 10-minute run, so the next 10-minute run
won't be able to see them any more.

What if the file has been truncated or removed by logrotate?

I guess because I haven't written a proper config file yet? Anyway, it's still
spamming my syslog *every 10 minutes*. This should at least be an option that's
off by default.

Logrotate is part of the standard install for Debian webservers (at least
apache2 and lighttpd). This is not "site specific".

#590074#42
Date:
2010-07-23 17:39:19 UTC
Looks like possible language problem:

  during != every

  during ~= in between

If this didn't help, please follow-up on Ximin's response instead of
mine :-)

I experienced cron spam too when trying to install awstats recently (and
was too busy at the time to investigate further - just cursed a bit and
uninstalled awstats again).

Possibly not a helpful comment - just want to hint that there might
actually be an issue of cron spam in virgin installs of awstats
currently.

I must admit that I have lost track of most of your recent improvements,
but I seem to recall that in the past it made sense for my local scripts
to distinguish between heavier monthly/weekly log analysis routines and
smaller hourly ones.  But perhaps that was because I (for other reasons)
analyzed the files from scratch again each month...

Let's first figure out whether the current frequent cron job really is heavy
on system resources; only if it is will I try to elaborate more on my ideas
here.

I would prefer to keep the cron file as a conffile and instead have the
invoked script check a flag in /etc/default/awstats to decide whether it
should really run or just quit immediately.
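A minimal sketch of that arrangement, assuming a hypothetical variable name AWSTATS_CRON_ENABLED in /etc/default/awstats (not an existing awstats setting). A missing or unreadable defaults file counts as "disabled".

```shell
#!/bin/sh
# Sketch: the cron.d file stays a conffile and always runs the script;
# the script itself decides whether to do any work.
# AWSTATS_CRON_ENABLED is a hypothetical variable name.

# cron_enabled [DEFAULTS_FILE]
# Succeeds only if DEFAULTS_FILE (default /etc/default/awstats) sets
# AWSTATS_CRON_ENABLED=yes; a missing file means "disabled".
cron_enabled() {
    defaults=${1:-/etc/default/awstats}
    AWSTATS_CRON_ENABLED=no    # off unless the admin opts in
    if [ -r "$defaults" ]; then
        . "$defaults"
    fi
    [ "$AWSTATS_CRON_ENABLED" = yes ]
}
```

The script started from the cron.d entry would then begin with `cron_enabled || exit 0`, so a fresh install stays silent until the admin opts in, and the cron file itself never has to change.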

But again, let's first resolve if it really is necessary.


  - Jonas

#590074#47
Date:
2010-07-23 17:54:23 UTC
That's true.  But it's a known issue and your logrotate hint is
already documented in README.Debian for this purpose.

Probably ;)

Please consider enabling the EnableLockForUpdate feature.  From the README.Debian:
---->8--------
Also consider enabling lock files in /etc/awstats/awstats.conf with
EnableLockForUpdate=1 so that only one AWStats update process is
running at a time.  This will reduce system resources especially if
the AWStats update process takes longer than 10 minutes to complete.
This solution has some security drawbacks: lockfile with well-known
name and writable by www-data user.
----------------->8--------
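As a sketch of one alternative (not AWStats' own mechanism), the same one-update-at-a-time guarantee can be had from flock(1) in util-linux, assumed to be installed. The kernel releases the lock when the holding process exits, so a crashed update leaves no stale lock behind; a well-known lock file writable by www-data is still needed, so the drawback quoted above remains.

```shell
#!/bin/sh
# Sketch: single-instance updates with flock(1) from util-linux, as an
# alternative to AWStats' own EnableLockForUpdate lock file.

# run_locked LOCKFILE COMMAND [ARG...]
# Runs COMMAND while holding an exclusive lock on LOCKFILE. With -n,
# flock exits with status 1 instead of waiting if another update
# already holds the lock.
run_locked() {
    lockfile=$1
    shift
    flock -n "$lockfile" "$@"
}
```

For example, a cron line could call `run_locked /var/lock/awstats-update.lock /usr/share/awstats/tools/update.sh`; that lock path is a hypothetical choice and must be writable by www-data.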

It's a fresh install, right?

I guess we can disable cron jobs by default on a fresh install, as
/etc/awstats/awstats.conf is not configured by default.

Yes, but your logrotate settings are very "site specific" and far, far
away from the defaults...

#590074#52
Date:
2010-07-23 18:35:51 UTC
yeah.

Where did I suggest that I edited my logrotate scripts? They are unchanged
since being installed...

Or do you mean the solution I proposed? AFAICS Debian utility packages normally
assume they're going to be used on/by other Debian packages, so it's fine to
assume that awstats is being installed for the logs on some local Debian
webserver package.

In fact, the solution I described in the cloned bug reports above won't put
any extra burden on awstats maintenance:

- awstats adds some update scripts into <DIR>. Job done on the awstats side.
- default logrotate scripts of various webservers call <DIR> when rotating
  logs (this is what I filed the cloned reports for).
- if a site admin wants to use non-default log settings, then they'll need to
  edit their logrotate scripts anyhow.
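A sketch of what the awstats side of this could look like: a hook file dropped into the directory that the prerotate snippet above hands to run-parts. The file name, the directory (the one suggested in the cloned reports) and the AWSTATS_UPDATE override are assumptions. Debian's run-parts skips names containing a dot, hence no `.sh` suffix.

```shell
#!/bin/sh
# Sketch of a hypothetical hook file:
#   /etc/logrotate.d/httpd-prerotate/awstats
# run by the webserver's prerotate stanza via run-parts.
# AWSTATS_UPDATE only exists to make the path overridable here.

awstats_prerotate_hook() {
    update=${AWSTATS_UPDATE:-/usr/share/awstats/tools/update.sh}
    # A conffile hook can outlive the package (removed but not purged);
    # do nothing rather than make logrotate fail.
    [ -x "$update" ] || return 0
    # -s /bin/sh: don't depend on www-data's login shell.
    su -s /bin/sh -c "$update" www-data
}

awstats_prerotate_hook
```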

X

#590074#57
Date:
2010-07-23 18:44:18 UTC
A virgin install must not cause cron spam.  If you implicitly
acknowledge above that awstats currently does, then yes, we should
disable it by default (or figure out something more clever).
provided the specific logrotate config of that host?

If you are simply guessing, I suggest you state that more clearly, and
be kinder about alternative viewpoints here. :-)


  - Jonas

#590074#62
Date:
2010-09-10 09:31:56 UTC
I'm also experiencing this spamming, and it gets even worse because the job runs as www-data, and by default there seems to be no /etc/aliases entry in exim redirecting its mail to a real user.

So there's an ever growing mailbox in /var/spool/mail/www-data :-(

See #496029, which seems to relate to the aliases problem.

Still, this needs to be addressed on the awstats side too, I guess.

Hope this helps.

Best regards,

#590074#67
Date:
2010-11-01 20:57:53 UTC
Coming back to the fact that your mail is spammed by the failing cron
job: even when awstats is set up correctly I get the same
trouble.

I did a fresh install of awstats on a box running Debian Lenny, set up
awstats.conf files for all my vhosts and fixed the permissions on the Apache
log files, but I still get `CRON...permission denied` messages in syslog
every 10 minutes.

It seems that using a file in `/etc/cron.d` with user `www-data` is
not allowed. I do not understand why, or where the problem comes
from.

#590074#70
Date:
2011-12-21 14:34:03 UTC
severity 590074 +wishlist
thanks

Hello,

I'll lower the severity of this bug report.  It's actually
a wishlist bug.  The points of the original bug report were
addressed:

> - if logrotate(8) runs during those 10 minutes, some log entries will fail to
>   be accounted for by awstats

True (it was noted in README.Debian), but a longer period can introduce
bigger holes in the statistics, due to log rotation.

Probably it's a good idea to introduce the right infrastructure
in the logrotate package first.  Not all maintainers agree with
your suggestion; see http://bugs.debian.org/590097

Remember, it still does not solve the "lost entries" problem
completely, and it can introduce side effects if you start
update.sh in the background, like
  su -l -c /usr/share/awstats/tools/update.sh www-data &
in the prerotate hook.

> - it wastes resources parsing the same log files every 10 minutes, especially
>   if they get big

Just wrong.
> - it makes logcheck(8) spam my inbox every hour due to the cron job failing
>   every 10 minutes

Why not fix the cause of these failures instead?

Olivier Berger wrote:

See #652665.  I'm still not sure whether we should abuse the
root account with MAILTO=root.  AWStats mail does not
go into a black hole by default.  You can read www-data's mailbox,
or just add an alias for www-data pointing to root.

Bruno BEAUFILS <bruno@boulgour.com> wrote:

No, it's allowed.

Then you should provide more information (awstats configuration, detailed
error messages, etc.) and file another bug report.

#590074#77
Date:
2012-05-21 15:54:47 UTC
I notice some of the issues in this bug relate to the way awstats is run
at log rotation time.

The README.Debian recommends:

"Make sure to run AWStats right _before_ web logs are rotated.  For
example, insert the following lines in /etc/logrotate.d/apache2:

    prerotate
      if [ -x /usr/share/awstats/tools/update.sh ]; then
        su - -c /usr/share/awstats/tools/update.sh www-data
      fi
    endscript"
-----------------------------

This means that

a) sharedscripts must be set in logrotate
b) data is likely to be missed if it is logged between the time update.sh
   finishes and the rotation of a file for any particular vhost

Why does the README insist on prerotate and not use postrotate?

I've discovered that after rotation, logrotate can give the rotated
filename to the postrotate script, and using nosharedscripts, logrotate
can call awstats multiple times, once for each vhost, just as it
finishes the rotation of that host:

    nosharedscripts
    postrotate
      /path-to-cgi/awstats.pl -LogFile=$1
    endscript

Where it says `$1' in the postrotate script, logrotate actually puts the
rotated filename, e.g. /var/log/apache2/vhost1/access.log.1, so it will
override the normal filename defined in
/etc/awstats/awstats.vhost1.com.conf.  The normal conf file will still
work normally from cron.

All that is needed is some wrapper script around awstats to select the
correct domain based on the path in $1 and pass the -config option to
awstats too.

Does this address all the issues raised by contributors to this bug report?
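A sketch of such a wrapper, under these assumptions: each /etc/awstats/awstats.NAME.conf has a single plain LogFile="..." line (no pipes or date tags), rotated files end in `.1` (as with delaycompress), and awstats.pl lives at /usr/lib/cgi-bin/awstats.pl, overridable through the hypothetical AWSTATS_PL variable. Which positional argument carries the rotated name is worth checking in logrotate(8) for the installed version; recent manuals appear to describe `$1` as the path of the log file itself and `$2` as the rotated file, so the helper accepts either form.

```shell
#!/bin/sh
# Sketch: map a (rotated) log back to the AWStats config whose LogFile
# names it, then update from that file with -LogFile overriding the config.

# config_for_log LOG [CONFDIR]
# LOG may be the live log or its ".1" copy. Prints NAME for the first
# CONFDIR/awstats.NAME.conf whose LogFile matches; fails if none does.
config_for_log() {
    live=${1%.1}
    confdir=${2:-/etc/awstats}
    for conf in "$confdir"/awstats.*.conf; do
        [ -f "$conf" ] || continue
        # Accepts LogFile="/path" or LogFile=/path; commented lines never match.
        value=$(sed -n 's/^LogFile="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$conf" | head -n 1)
        if [ "$value" = "$live" ]; then
            name=${conf##*/awstats.}
            echo "${name%.conf}"
            return 0
        fi
    done
    return 1
}

# update_rotated ROTATED_LOG
update_rotated() {
    name=$(config_for_log "$1") || return 1
    "${AWSTATS_PL:-/usr/lib/cgi-bin/awstats.pl}" -config="$name" -update -LogFile="$1"
}
```

A wrapper script containing these functions would end with `update_rotated "$1"` (or `"$2"`, depending on what the local logrotate passes).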
#590074#82
Date:
2012-05-21 20:52:16 UTC
I've contributed a script to fix this; it is commit c0482b4109176e05
on master.

It is based on Sergey's update.sh, but it processes each log file
separately, just after rotation.

#590074#87
Date:
2012-06-13 18:05:38 UTC
Probably, because it's easy.

I'm not sure that reloading Apache for every vhost is a good idea.  Why
can't this be done once?
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=590074#70

For the next release I've reverted the above commits.  I don't like the idea
of code duplication.  Could you consider adding the needed code to update.sh?
