Hyperlinks with only a single character between the <a href...> and </a> tags are not followed, and the linked page is not included when building the search information. Paginated catalog pages, for example, are not searched. I have to generate a special index page with more characters between the tags so that the pages can be spidered.

--- Begin /etc/cron.daily/htdig (modified conffile)
#!/bin/sh
nice /etc/htdig/vnwgindex.pl
nice /usr/bin/htdig -c /etc/htdig/vnwg.conf
nice /usr/bin/htmerge -c /etc/htdig/vnwg.conf
--- End /etc/cron.daily/htdig
The bug is probably a documentation error. Setting minimum_word_length to 1 solved the problem. This attribute is supposed to affect what goes into the index, but it also appears to be used to determine the minimum length of link text that will be followed. You can downgrade the severity of the bug since there is a workaround.
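For anyone hitting the same problem, the workaround would look roughly like the fragment below. This is only a sketch: it assumes the setting goes in the configuration file named in the cron script (/etc/htdig/vnwg.conf), and the comment lines are illustrative rather than taken from the actual file.

```
# /etc/htdig/vnwg.conf (assumed location, taken from the cron script)
# Workaround: index words of any length so that links with
# single-character text (e.g. page numbers) are still followed.
minimum_word_length: 1
```

Lowering this value will also put very short words into the index, so the database may grow somewhat. The index has to be rebuilt with htdig and htmerge for the change to take effect.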
Hi William, I agree with you that this is not a very proper way to handle the configuration option of minimum_word_length. I'll downgrade the bug to wishlist and have a look at it. Thank you anyway! Stijn.
OK, but what I'd really like is cookie support, since I'm spidering a .jsp site and each request starts a new session. <sigh> I know it's on the todo list.

Sincerely,
William Mussatto, Senior Systems Engineer
CyberStrategies, Inc
ph. 909-920-9154 ext. 27