#55940 htdig: default install daily error emails

Package:
htdig
Source:
htdig
Description:
web search and indexing system - binaries
Submitter:
Jeff Breidenbach
Date:
2005-07-18 03:31:31 UTC
Severity:
normal
#55940#5
Date:
2000-01-22 22:21:07 UTC
On a fresh potato:
  apt-get install emacs20
  apt-get install apache
  apt-get install htdig

The install asked whether I wanted to calculate the endings database;
I said yes. No additional configuration was done. Now cron mails
me the following every day:

/etc/cron.daily/htdig:
Can't determine type of file /var/spool/htdig/htdext.7383; content-type: application/msword; URL: http://localhost/doc/emacs20/etc/edt-user.doc
Can't determine type of file /var/spool/htdig/htdext.7383; content-type: application/msword; URL: http://localhost/doc/emacs20/etc/enriched.doc

A default install should not produce daily error messages emailed
to root.

Jeff

PS. I think I see what is happening. Htdig is set to index localhost
by default. The default web page links to the doc directory, which
contains some files that confuse htdig. I don't know whether the
answer is suppressing the errors, tweaking the htdig configuration,
or tweaking the apache configuration.

#55940#10
Date:
2000-01-23 03:22:42 UTC
A possible solution, which has the side benefit of being very simple,
is to have htdig ignore anything with the .doc extension.  This is
consistent with Debian's apache, which is apparently configured to
serve that extension as content-type: application/msword. Htdig can't
handle application/msword (I think).
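
To illustrate the idea, here is a rough Python sketch of what skipping by extension amounts to. It is not htdig's code: the helper is_skipped is hypothetical, and it assumes bad_extensions is applied as a plain, case-insensitive suffix match on the URL path. The extension list is the one from the patch, with .doc included. The mimetypes lookup is only there to show that a name-based mapping labels these files application/msword, much as the web server appears to.

```python
import mimetypes
from urllib.parse import urlparse

# Extensions from the bad_extensions line in the patch, with .doc included.
BAD_EXTENSIONS = (
    ".wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif "
    ".doc .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi"
).split()


def is_skipped(url, bad_extensions=BAD_EXTENSIONS):
    """Return True if the URL path ends with one of the listed extensions.

    Hypothetical helper: the suffix match is an assumption about how
    htdig applies bad_extensions, not a copy of its logic.
    """
    path = urlparse(url).path.lower()
    return path.endswith(tuple(bad_extensions))


urls = [
    "http://localhost/doc/emacs20/etc/edt-user.doc",
    "http://localhost/doc/emacs20/etc/enriched.doc",
    "http://localhost/index.html",
]

for url in urls:
    # guess_type looks only at the file name, not at the file contents.
    guessed_type, _ = mimetypes.guess_type(url)
    print(url, guessed_type, "skip" if is_skipped(url) else "index")
```

Under that assumption, the two URLs from the cron mail would be skipped before htdig ever tries to parse them, while ordinary HTML pages would still be indexed.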

Sigh...
Jeff


+++ htdig.conf.orig	Sat Jan 22 19:14:50 2000
@@ -58,7 +58,7 @@
 # exclude_url patterns are matched anywhere.
 #
 bad_extensions:		.wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
-	.doc .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi
+		.jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi

 #
 # The string htdig will send in every request to identify the robot.  Change