#102981 htdig: htdig hangs when parsing PDF

Package:
htdig
Source:
htdig
Description:
web search and indexing system - binaries
Submitter:
Martin Godisch
Date:
2005-07-18 03:56:43 UTC
Severity:
normal
#102981#5
Date:
2001-07-01 13:01:44 UTC
From:
To:
--- /usr/share/htdig/parse_doc.pl.orig  Sun Jul  1 14:37:39 2001
+++ /usr/share/htdig/parse_doc.pl       Sun Jul  1 14:37:56 2001
@@ -115,7 +115,7 @@
         }
 } elsif ($magic =~ /%PDF-/) {           # it's PDF (Acrobat)
         $parser = $CATPDF;
-        $parsecmd = "$parser $ARGV[0] - |";
+        $parsecmd = "$parser $ARGV[0] |";
 # kludge to handle multi-column PDFs...  (needs patched pdftotext)
 #       $parsecmd = "$parser -rawdump $ARGV[0] - |";
         $type = "PDF";

I do not know why this "-" argument is needed. I am using pstotext for
PDF parsing, and it hangs every time, waiting for standard input. This
patch may break interaction with other PDF parsers; in that case there
should be a case distinction on $CATPDF.
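
A minimal sketch of what a case distinction on the converter could look like.
It is not taken from parse_doc.pl: the helper name pdf_parse_command is
hypothetical, and the sketch assumes that pdftotext needs the trailing "-" to
write to standard output, while other converters such as pstotext write to
standard output by default and treat "-" as a request to read standard input.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper, not part of parse_doc.pl: build the pipe-open
# command string for a PDF converter.
# Assumption: only pdftotext needs an explicit "-" output argument;
# every other converter is assumed to write to stdout on its own.
sub pdf_parse_command {
    my ($parser, $file) = @_;

    # $parser may carry options, so look only at the program name.
    my $name = basename((split ' ', $parser)[0]);

    return $name eq 'pdftotext'
        ? "$parser $file - |"
        : "$parser $file |";
}

print pdf_parse_command('/usr/bin/pdftotext', 'doc.pdf'), "\n";
print pdf_parse_command('/usr/bin/pstotext',  'doc.pdf'), "\n";
```

In parse_doc.pl the result would take the place of the hard-coded string, for
example $parsecmd = pdf_parse_command($CATPDF, $ARGV[0]); so that the pdftotext
behaviour stays as before and pstotext no longer waits for input.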

Martin Godisch