--- Please enter the report below this line. ---
I am running recoll -t from a krunner plugin, i.e. forking it in the
background. This worked fine a while back. However, now (last few versions),
using the -f (filename search) option returns no hits at all. Users of this
plugin have also reported far too few hits on default searches.
Debugging code showed the correct command string, and then XFNONE and "0 results"
strings being returned. Simply running the same command in a console works fine,
albeit sometimes with results that differ from those in the GUI program.
Note that users had reported problems with permissions on multi-user systems
so this may be a place to look.
Debian Release: wheezy/sid
990 unstable www.debian-multimedia.org
990 unstable liquorix.net
990 unstable http.us.debian.org
990 unstable debian.tagancha.org
990 unstable debian.scribus.net
990 unstable debian.pengutronix.de
650 testing security.debian.org
650 testing http.us.debian.org
650 testing dl.google.com
500 stable security.debian.org
500 stable http.us.debian.org
500 stable deb.opera.com
500 karmic ppa.launchpad.net
500 intrepid ppa.launchpad.net
101 experimental-snapshots qt-kde.debian.net
1 experimental debian.co.il
--- Package information. ---
Depends (Version) | Installed
==================================-+-===================
libc6 (>= 2.3.6-6~) | 2.13-21
libgcc1 (>= 1:4.1.1) | 1:4.6.1-12
libqtcore4 (>= 4:4.7.0~beta1) | 4:4.7.3-8
libqtgui4 (>= 4:4.5.3) | 4:4.7.3-8
libstdc++6 (>= 4.6) | 4.6.1-12
libx11-6 | 2:1.4.4-2
libxapian22 | 1.2.7-1
zlib1g (>= 1:1.2.0) | 1:1.2.3.4.dfsg-3
Recommends (Version) | Installed
=========================-+-===========
aspell | 0.60.7~20110707-1
python | 2.6.7-3
xdg-utils | 1.1.0~rc1-2
xsltproc | 1.1.26-8
Suggests (Version) | Installed
=====================================-+-===========
antiword | 0.37-6
catdoc |
flac | 1.2.1-5
ghostscript | 9.02~dfsg-3
libid3-tools | 3.8.3-14
libimage-exiftool-perl | 8.60-2
lyx |
poppler-utils | 0.16.7-2+b1
pstotext |
python-chm |
python-mutagen |
unrtf |
untex |
vorbis-tools | 1.4.0-1
David Baron writes:
> Package: recoll
> Version: 1.16.0-1
> Severity: important
>
> --- Please enter the report below this line. ---
> I am running recoll -t from a krunner plugin, i.e. forking it in the
> background. This worked fine a while back. However, now (last few versions),
> using the -f (filename search) option returns no hits at all. Users of this
> plugin have also reported far too few hits on default searches.
>
> Debugging code showed correct command string and then XFNONE, 0 results
> strings returned. Simply running the same command in a console works fine,
> albeit sometimes with differing results than in the GUI program.
>
> Note that users had reported problems with permissions on multi-user systems
> so this may be a place to look.

Hello,

It would be very helpful to have the log files for running the command in a
console and through krunner.

Please either set up recoll to log to a file, or arrange to retrieve stderr
output, and set the debug level to 6, either through the config GUI or by
editing ~/.recoll/recoll.conf:

    logfilename=/some/file/name
    loglevel = 6

- Run the search in a console.
- Save the log file (it will be erased at the next step)
- Run the same search from krunner.
- Save the log file.

Then please send both logs to me (jfd at recoll dot org).

More than permissions (which are there to be observed), one possible area of
concern might be wildcard character expansion by the shell: use proper
quoting when running in the console (i.e. recoll -t -f 'there *re *ildcards'),
and we'll probably have to check how krunner executes the command too, but
this kind of issue should be visible in the log file anyway.

Regards,
jf
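The quoting advice for the console can be illustrated with a small sketch. It uses Python's shlex module, which follows POSIX shell quoting rules, to show the argument list that a program would receive from the quoted command line given above. The helper name is hypothetical. Note that shlex only models quote handling, not wildcard expansion; in a real shell it is the single quotes that keep the pattern from being expanded.

```python
import shlex


def argv_after_shell_quoting(command_line):
    """Split a command line the way a POSIX shell handles quoting.

    The quotes are consumed by the splitting step, and the quoted
    pattern stays together as a single argument.
    """
    return shlex.split(command_line)


argv = argv_after_shell_quoting("recoll -t -f 'there *re *ildcards'")
print(argv)
```

The last element of the list is the pattern as recoll would see it: one argument, with the wildcards intact and without the surrounding quotes.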
Log files enclosed:
  console-log from console
  krunner-log from krunner

Note that the krunner one has a query *'downloads'* !!

I do not do this, obviously.

I have asked a correspondent to do this same test with a non -f query, which
was also not succeeding but returning 3 / 150 hits.
David Baron writes:
> <Log files enclosed:
> <console-log from console
> <krunner-log from krunner
>
> Note that the krunner one has a query *'downloads'* !!
>
> I do not do this, obviously.
>
> I have asked a correspondent to do this same test with a non -f test
> which was also not succeeding but returning 3 / 150 hits.

Ok, thanks for the logs, they make it clearer what is happening here.

From krunner:
:4:../rcldb/rclquery.cpp:174:Query::setQuery:
:4:../rcldb/rcldb.cpp:1525:Rcl::Db::filenameWildExp: pattern:[*'Downloads'*]

Command line:
:4:../rcldb/rclquery.cpp:174:Query::setQuery:
:4:../rcldb/rcldb.cpp:1525:Rcl::Db::filenameWildExp: pattern: [Downloads]

I will be using [] for quoting in the rest of the message (the [] are not
part of the strings).

First a bit of explanation on the handling of file name searches: recoll
will prepend and append a [*] to a file name search if it does not already
contain wildcards and is not capitalized. Trying to do the right thing here,
but maybe being slightly too clever.

So the krunner search is expanded from ['Downloads'] to [*'Downloads'*]
because ['] is not a capital (not punctuation either because of searches
like o'donnell etc.)

The second search is not expanded because [D] is a capital. Alternatively,
searching for [download] would yield a [*download*] search.

This is all particularly annoying because it does not show in the end
search, which only has the XNONENoMatchingTerms thing, because expansion
actually occurs (or not) before the search is passed to Xapian.

I'll easily admit that the Recoll choices are dubious here (I'm open to
suggestions), and I was going to write that I'd at least document this
disconcerting behaviour of the file name search, but in fact, it is,
already:

http://www.recoll.org/usermanual/rcl.search.html#RCL.SEARCH.SIMPLE

The actual problem here seems to be too much quoting in the data sent by
krunner. The parameter incoming to recoll is really ['Downloads'] when it
should just be [Downloads]. This might also cause the other query issues
that you mention.

What's strange is that such a krunner issue should also show with other
commands ? Or was the search actually entered with single quotes in the
krunner window ? I can't really guess what happens or should be done here
because I don't know how krunner executes commands (sh -c or exec(2) or
whatever...)

Getting close...

Cheers,
jf
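The expansion rule described in this message can be sketched as follows. This is not Recoll's code, only an illustration of the rule as stated: the function name is hypothetical, "capitalized" is assumed to mean that the first character is an uppercase letter, and the set of wildcard characters is an assumption as well.

```python
# Characters treated as wildcards in this sketch (an assumption).
WILDCARD_CHARS = "*?["


def expand_filename_pattern(pattern):
    """Illustrate the file name search rule described in the thread.

    A pattern is wrapped in '*' unless it already contains a wildcard
    or starts with an uppercase letter.
    """
    has_wildcards = any(ch in WILDCARD_CHARS for ch in pattern)
    is_capitalized = pattern[:1].isupper()
    if has_wildcards or is_capitalized:
        return pattern
    return "*" + pattern + "*"


for query in ["Downloads", "download", "'Downloads'", "*manual*"]:
    print(query, "->", expand_filename_pattern(query))
```

Under these assumptions the over-quoted krunner argument starts with a quote character, which is not an uppercase letter, so it gets wrapped in wildcards with the quotes still inside the pattern.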
The run is being done by a start( QString cmd, QStringList args ) type of
fork. I, as recommended, place the query string argument in single quotes in
the program, not in krunner's text line window. I assume the internal start()
function is an exec but I could be wrong.

Question: since the query string is a single QString, the last entry in the
QStringList, should the quotes not be there? Within this list of arguments,
there is no ambiguity. The question would be whether, after it is expanded in
the run shell, the non-quoted string would be problematic.

Easy enough to try out, but not knowing recoll's internals, I cannot really
touch all the bases.

The * problem does not explain a non-filename problem--I hope the
correspondent did the same tests and logfiles and sent them as I suggested to
him.

Ultimately, I should probably take snippets from recoll's sources and do it
directly to xapian rather than the fork, but this runner is meant to be
simple and small. Performance in such an interactive environment is not an
issue.

A big question, however: The GUI implies and my results seem to indicate that
the and/or/query-language options do not work with filenames. Is this true?
Or would they work WITH the quotes (seems not to)?

This is a GUI design issue since if filename is an exclusive option, then it
would radio-button with the others or gray them if checkboxed.
David Baron writes:
> On Tuesday 28 Elul 5771 17:26:33 David Baron wrote:
> > On Tuesday 28 Elul 5771 17:04:29 Jean-Francois Dockes wrote:
> > > What's strange is that such a krunner issue should also show with other
> > > commands ? Or was the search actually entered with single quotes in the
> > > krunner window ? I can't really guess what happens or should be done here
> > > because I don't know how krunner executes commands (sh -c or exec(2) or
> > > whatever...)
> >
> > The run is being done by a start( QString cmd, QStringList args ) type of
> > fork. I, as recommended, place the query string argument in single quotes
> > in the program, not in krunner's text line window. I assume the internal
> > start() function is an exec but I could be wrong.
> >
> > Question, since the query string is a single QString, last entry in the
> > QStringList, should the quotes not be there? Within this list of arguments,
> > there is no ambiguity. Question would be after it is expanded in the run
> > shell, would the non-quoted string be problematic?

You'd have to check what the "start" function actually does. If it starts a
shell to execute the command, in a way which will make the wildcards expand
(and the quoting be removed), you need quoting. Given the look of the call,
I'd guess that it's closer to a simple fork/exec operation, meaning that no
wildcard expansion will take place before recoll receives the arguments, and
that you must not quote.

> > Easy enough to try out but not knowing recoll's internals, cannot really
> > touch all the bases.

Recoll internals are not at issue here: you'd have the same problems
executing "vi" or "ls".

> I tried it and lo and behold, I get filename search results.

Ok, that confirms the fork/exec kind of spawn.

> A big question, however: The GUI implies and my results seem to indicate
> that the and/or/query-language options do not work with filenames. Is
> this true? Or would they work WITH the quotes (seems not to)?
>
> This is a GUI design issue since if filename is an exclusive option, then it
> would radio-button with the others or gray them if checkboxed.

Hmm, sorry, I'm a bit lost here: what radio-buttons ? I'm not sure what
dialog we're talking about here ?

Using the query language, filename queries can normally be combined with
other clauses, e.g. [wildcard filename:*manual*] (this would return, among
others, usermanual.sgml, which has the term [wildcard] in it).

Using the simple search "File name" option, there is nothing to combine it
with, this is a pure file name search.

I think that this is more or less correctly described in the "search"
section of the manual:
http://www.lesbonscomptes.com/recoll/usermanual/rcl.search.html

There are many possible combinations though, and I'm not sure I've tested
them all. I'll be glad to try and fix problems that I did not see.

Cheers,
jf
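The difference between a shell and a plain fork/exec spawn can be checked with a small experiment. The sketch below uses Python's subprocess module in place of the Qt start() call (an assumption made only to keep the example self-contained), and a tiny child process that echoes back the argument it received. The helper name is hypothetical.

```python
import subprocess
import sys


def argument_seen_by_child(arg):
    """Spawn a child without a shell and return the argument it received.

    Passing an argument list (and no shell) is the fork/exec style of
    spawn: the child gets each list element exactly as given.
    """
    child_code = "import sys; sys.stdout.write(sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", child_code, arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Quotes placed inside the argument are not removed by anyone:
print(argument_seen_by_child("'Downloads'"))
# Wildcards are not expanded either:
print(argument_seen_by_child("*.txt"))
```

With this kind of spawn, any quote characters added by the caller become part of the query itself, which matches what the krunner log showed.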
Sorry, I forgot to answer these in the previous email:

David Baron writes:
> The * problem does not explain a non-filename problem--I hope the
> correspondent did the same tests and logfiles and sent them as I
> suggested to him.

Excessive quoting may also affect non-filename searches, and there is also
the capitalization issue: if the user is not careful about it, search
results will be different (because of stemming/no stemming).

> Ultimately, I should probably take snippets from recoll's sources and do
> it directly to xapian rather than the fork, but this runner is meant to
> be simple and small. Performance in such an interactive environment is
> not an issue.

Going directly to Xapian would be quite complex. Recoll does quite a lot of
processing before asking anything of Xapian, and I would really not recommend
this (unless you want to rewrite recoll :) ).

Actually I think that your approach is quite reasonable. Another possibility
would be to use either the Python API or the C++ interface which is just
below it: as it's used to implement the Python and PHP APIs and also the
recollq program, it's quite stable, and simple (take a look at recollq). The
main problem going this way would be build issues, as Recoll is not
currently structured to export a library (which is why the Python approach
would really be the most natural, except that your program is C++ I guess).

Cheers,
JF
Attached are 4 log files:
* one from "recoll -t -q gazette" (155 results)
* one from recollrunner with the same query (only "default query
language" checked in recollrunner config) (3 results: only the ones
among the 155 which do not contain spaces in their paths)
* one from "recoll -t -f -q gazette" (46 results)
* one from recollrunner with the same query ("default query language"
and "match filenames" checked in recollrunner config) (0 results)
I hope it will help solve this issue.
Regards
Denis
Denis Prost writes:
> Attached are 4 log files:
> * one from "recoll -t -q gazette" (155 results)
> * one from recollrunner with the same query (only "default query
> language" checked in recollrunner config) (3 results: only the
> ones among the 155 which do not contain spaces in their paths)
> * one from "recoll -t -f -q gazette" (46 results)
> * one from recollrunner with the same query ("default query language"
> and "match filenames" checked in recollrunner config) (0 results)
>
> I hope it will help solve this issue.
> Regards
> Denis
Thanks a lot for the log files, my comments below:
First:
> :4:../rcldb/rcldb.cpp:1525:Rcl::Db::filenameWildExp: pattern: [*gazette*]
My guess is that this is from the 3rd query (recoll -t -f -q gazette). The
"-q" which would specify a "query language" query is ignored (because of how
the options are parsed), and this is a filename query where gazette is
transformed to *gazette* because it is neither capitalized nor contains
wildcards. It is supposed to return all documents with [gazette] as part of
their file name.
Second:
> :4:../rcldb/searchdata.cpp:782:StringToXapianQ:: query string: [gazette]
This is from [recoll -t -q gazette], which is a regular text search query,
returning all documents with gazette or a derivative ([gazettes]) in the
contents, or possibly in the file name field processed as text.
Third:
> :4:../rcldb/searchdata.cpp:782:StringToXapianQ:: query string: ['gazette']
This is probably from recollrunner with only 'default query language'
checked: there is excessive quoting, but it doesn't hurt much because this
is a full text search and the quotes get eliminated. I don't know why
recollrunner returns few results, but as you mention that these are only
the ones without spaces in the file name, I'd suspect a problem parsing the
output from recoll.
Fourth:
> :4:../rcldb/rcldb.cpp:1525:Rcl::Db::filenameWildExp: pattern: [*'gazette'*]
This is with recollrunner, "match filenames" and "default query language"
checked. "Match filename" takes precedence and the query fails because of the
excessive quoting.
The only thing that I find strange in the logs is that the 3rd one seems to
indicate that the query actually returns more results than the 1st one,
when I would have thought that they are identical. But the quoting may have
affected the query, the actual Xapian query is truncated in the log for
some reason, so we can't be sure:
:4:../rcldb/rclquery.cpp:237:Query::SetQuery: Q: ((gazette:(wqf=11) OR gazettes OR gazet:4:../rcldb/rclquery.cpp:344:Fetching for first 50, count 50
So I think that the first fixes should be for recollrunner to:
- Avoid excessive single quote quoting
- Indicate somehow that "query language" and "file name search" are
different and exclusive modes.
- Try to better parse the query output when there are spaces in the file
names.
And then we may get into possible Recoll issues. I'd be quite interested,
though, in the logs from the following 2 commands:
recoll -t -q gazette
recoll -t -q "'gazette'"
Cheers,
Jf
Here are the two logs:
* recoll -t -q gazette.log (same as already sent)
* recoll -t -q "gazette".log
Regards,
Denis
I am no longer quoting filename searches.

I have changed the stdout line parsing to
.....[ --> mimetype after trimming
[......] --> URL/path
[----] --> name, title, etc ...

Spaces are not used for anything (except removed from the mimetype). I can
see filenames with spaces.

krunner seems to be not including every match I feed to it. In other words, I
know I am getting three filename results into the program but only one of
them (first one?) actually gets displayed. This may be why Denis only still
sees three of his gazettes (unless this is still the space problem). In any
event, I may post next week a new version on kde-apps.
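The bracket-based parsing described above might look roughly like the following sketch. The sample line is made up for illustration and is only assumed to resemble the real recoll -t output (a mimetype, then a bracketed URL, then a bracketed title); the function name and the exact field layout are assumptions, not something taken from the recollrunner sources.

```python
import re

# mimetype up to the first '[', then two bracketed fields.
RESULT_LINE = re.compile(r"^([^\[]*)\[([^\]]*)\]\s*\[([^\]]*)\]")


def parse_result_line(line):
    """Split one result line into (mimetype, url, title).

    Returns None when the line does not have the assumed layout.
    Spaces inside the brackets are kept; only the mimetype is trimmed.
    """
    match = RESULT_LINE.match(line)
    if match is None:
        return None
    mimetype, url, title = match.groups()
    return mimetype.strip(), url, title


# Hypothetical sample line with spaces in the path and the title.
sample = "text/plain\t[file:///home/user/My Documents/gazette 2011.txt] [gazette 2011.txt]\t1234\tbytes"
print(parse_result_line(sample))
```

Because the fields are delimited by brackets rather than whitespace, a path containing spaces comes through intact in this sketch.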
David Baron writes:
> On Wednesday 29 Elul 5771 09:35:41 Jean-Francois Dockes wrote:
> > This is probably from recollrunner with only 'default query language'
> > checked: there is excessive quoting, but it doesn't hurt much because this
> > is a full text search and the quotes get eliminated. I don't know why
> > recollrunner returns few results, but as you mention that these are only
> > the ones without spaces in the file name, I'd suspect a problem parsing the
> > output from recoll.
>
> I am no longer quoting filename searches.
>
> I have changed the stdout line parsing to
> .....[ --> mimetype after trimming
> [......] --> URL/path
> [----] --> name, title, etc ...
>
> Spaces are not used for anything (except removed from the mimetype). I can see
> filenames with spaces.
>
> krunner seems to be not including every match I feed to it. In other words, I
> know I am getting three filename results into the program but only one of them
> (first one?) actually gets displayed. This may be why Denis only still sees
> three of his gazettes (unless this is still the space problem). In any event,
> I may post next week a new version on kde-apps.

Ok, I don't know enough about krunner to be really useful here.

We should be aware that the recollq/recoll -t output is not fully parseable
at this point (a file name with ']' in it would break it).

If you can get the krunner part to behave, and if you decide that the
current approach is the sensible one (as compared to using an API), I could
easily be convinced to provide a fully and easily parsable output format
(for example by encoding the data parts in base64), we can talk about this.

Cheers,
jf
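To make the base64 idea concrete, here is a minimal sketch of what such an encoding could look like. It is purely hypothetical: the field order, the space separator and the function names are assumptions for illustration, not an existing recoll output format.

```python
import base64


def encode_fields(fields):
    """Encode each field in base64 and join the results with spaces.

    The base64 alphabet contains neither spaces nor brackets, so the
    separator can never appear inside an encoded field.
    """
    return " ".join(
        base64.b64encode(field.encode("utf-8")).decode("ascii")
        for field in fields
    )


def decode_fields(line):
    """Reverse encode_fields: split on spaces and decode each token."""
    return [
        base64.b64decode(token).decode("utf-8")
        for token in line.split(" ")
    ]


# A file name containing ']' and a space, which breaks bracket parsing.
record = ["text/plain", "file:///tmp/odd ]name[.txt", "odd ]name[.txt"]
encoded = encode_fields(record)
print(encoded)
print(decode_fields(encoded) == record)
```

The trade-off is that the output stops being human-readable, so a format like this would presumably be an option next to the existing one rather than a replacement.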
I do think that a fully, consistently parsable output is desirable. This
would enable various scripting options, not just for my little krunner.

Also, using %20 instead of spaces and appropriate codings for other illegal
characters would make the [URL] canonical/legal. "File://...." implies URL.
Was it done this way previously (which would explain why the space problem
did not show up before)?

On filename searches, simple text used as *name* makes sense. Capitals are
sometimes automatic, many times ignored, and are language specific. Name
queried as Name* might make sense. "name" would imply no wildcards. This
stuff is easier than regex, but regex might be desirable as a query
alternative for text and filenames.

Using an API? Is there one in the works?
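The %20 suggestion corresponds to standard URL percent-encoding. As a rough illustration, Python's urllib.parse can build such a URL from a local path; the helper name and the sample path are made up for the example, and nothing here reflects what recoll actually prints.

```python
from urllib.parse import quote, unquote


def to_file_url(path):
    """Build a file:// URL with spaces and other unsafe characters escaped.

    quote() leaves '/' alone by default and percent-encodes the rest,
    so a space becomes %20 and ']' becomes %5D.
    """
    return "file://" + quote(path)


url = to_file_url("/home/user/My Documents/gazette [2011].txt")
print(url)
# The original path can be recovered from the encoded form.
print(unquote(url[len("file://"):]))
```

An encoded URL of this kind contains neither spaces nor brackets, so it would also sidestep the ']' parsing problem for the URL field.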
Hello, good evening. Please call me now, or reply to the email I sent you yesterday.