How do I troubleshoot full-text search issues? When I search in Nautilus, it usually shows results from file names, metadata, and plain-text files, but not, for example, from the text inside PDF files.
For example, when I download SICP from MIT and move it to the “Documents” directory, I can find it by any term that appears in the output of tracker3 info for the file, e.g. “Huffman”, but not by “Horner”. I also tried this on a clean install of Fedora 37, with the same result.
How do I find out what is missing? Is full-text search of PDF files supported at all? An older answer implies that it is.
Full-text search has not worked for me in Nautilus since about version 40.
I’m not sure whether this is a problem with my setup or whether the feature has been abandoned…
Thank you for pointing me to that bug report. After reading it, I consulted man tracker3-search and man tracker3-info and confirmed that my term is missing from the full-text search results:
$ LC_ALL=C tracker3 search "Horner"
Results:
But it should be present, because the extracted text contains it:
$ tracker3 info sicp.pdf --plain-text-content | grep Horner
using a well-known algorithm called Horner’s rule, which
evaluates a polynomial using Horner’s rule. Assume that
16 According to Knuth 1981, this rule was formulated by W. G. Horner early in the
earlier. Horner’s rule evaluates the polynomial using fewer additions and multipli-
Horner’s rule, and thus Horner’s rule is an optimal algorithm for polynomial evaluation.
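The two checks above can be combined into a small shell sketch that reports at which stage a term gets lost. It assumes the tracker3 CLI is installed and that tracker3 search lists matching files by their file:// URI, so it only compares the file name; contains_term and diagnose are hypothetical helper names, not part of Tracker.

```shell
#!/bin/sh
# Sketch: find out at which stage a full-text search term gets lost.
# Assumes the tracker3 CLI (Tracker 3 / tracker-miners) is installed.

# contains_term TERM: succeed if TERM occurs (as a fixed string) on stdin.
contains_term() {
  grep -qF -- "$1"
}

# diagnose FILE TERM: report whether TERM was extracted from FILE and
# whether tracker3 search returns FILE for it (hypothetical helper).
diagnose() {
  file=$1
  term=$2
  if ! tracker3 info "$file" --plain-text-content | contains_term "$term"; then
    echo "not extracted"
  # tracker3 search lists matching files as file:// URIs; match on the
  # file name only (names with spaces would appear percent-encoded).
  elif ! LC_ALL=C tracker3 search "$term" | contains_term "$(basename "$file")"; then
    echo "extracted, but not returned by search"
  else
    echo "extracted and returned by search"
  fi
}

# Example: diagnose ~/Documents/sicp.pdf Horner
```

For the case described here, the expected verdict would be “extracted, but not returned by search”, which points at the index or the search front end rather than the extractor.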
“Horner” also appears in the output of tracker3 extract, which is similar but printed on a single line:
I also noticed that in both cases the output is about 1 MB and stops at around page 717. This is probably due to the maximum size limit, which the linked bug report gives as 10 MB by default but which is in fact 1 MB:
$ gsettings get org.freedesktop.Tracker3.Extract max-bytes
1048576
I am considering setting it to 10 MB, the default at the time of the linked bug report, so that the whole book is indexed. It should be as simple as:
$ gsettings set org.freedesktop.Tracker3.Extract max-bytes 10485760
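As a quick arithmetic check, 1048576 is 1 MiB (1024 * 1024 bytes) and 10485760 is 10 MiB. The sketch below wraps that conversion and a rough truncation test; mib_to_bytes and reached_limit are hypothetical helpers, and comparing the size of the tracker3 info output against max-bytes is only approximate, since that output also contains other properties besides the extracted text.

```shell
#!/bin/sh
# Sketch: relate the size of the extracted text to Tracker's max-bytes limit.

# mib_to_bytes N: print N mebibytes in bytes (1 MiB = 1024 * 1024 bytes).
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# reached_limit SIZE LIMIT: succeed when SIZE bytes is at or above LIMIT,
# i.e. the extracted text was probably cut off (rough heuristic only).
reached_limit() {
  [ "$1" -ge "$2" ]
}

# Example use (requires Tracker 3; not run here):
#   limit=$(gsettings get org.freedesktop.Tracker3.Extract max-bytes)
#   size=$(tracker3 info sicp.pdf --plain-text-content | wc -c)
#   reached_limit "$size" "$limit" && echo "text probably truncated"
#   gsettings set org.freedesktop.Tracker3.Extract max-bytes "$(mib_to_bytes 10)"
```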
However, my issue seems to be different: Tracker does extract the text I am trying to find, and it is the search that fails to return it.
I am still looking into what might be causing this.
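Since tracker3 search is only a front end, one way to separate the index from the search command is to query the full-text index directly with Tracker’s fts:match SPARQL extension. This is a sketch under assumptions: the file index is served by org.freedesktop.Tracker3.Miner.Files, and tracker3 sparql accepts --dbus-service and --query (check man tracker3-sparql on your version). fts_query and sparql_quote are hypothetical helpers that only build the query string.

```shell
#!/bin/sh
# Sketch: build a SPARQL query that asks the full-text index for a term.

# sparql_quote TEXT: escape backslashes and single quotes so TEXT can sit
# inside a single-quoted SPARQL string literal.
sparql_quote() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g"
}

# fts_query TERM: print a query returning every resource whose indexed
# text matches TERM (hypothetical helper).
fts_query() {
  printf "SELECT ?u WHERE { ?u fts:match '%s' }\n" "$(sparql_quote "$1")"
}

# Example use (assumed flags and service name; not run here):
#   tracker3 sparql --dbus-service org.freedesktop.Tracker3.Miner.Files \
#     --query "$(fts_query Horner)"
```

An empty result would suggest the term never made it into the full-text index; a non-empty one would point back at tracker3 search or Nautilus.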
The text is still extracted the same way as before, and the search still returns nothing, even after a reboot:
$ LC_ALL=C tracker3 search "Horner"
Results:
I guess there is no need to modify text-allowlist, because the desired text already appears in the output of tracker3 extract and tracker3 info --plain-text-content. If it were not extracted, it would not show up there, as I learned from another post. This is odd, because it implies that text-allowlist might be ignored, perhaps for some well-known file types such as PDF.
I also noticed that there is a rule for PDF files, but for the same reason it does not seem to need modifying either:
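If I understand Tracker correctly, text-allowlist only decides which plain-text files have their contents indexed, while PDFs go through a separate extractor chosen by MIME type, which would explain why it seems to be ignored for them. The hypothetical helper below checks a file name against the glob patterns as gsettings prints them; the schema that holds text-allowlist is not assumed, so the example looks it up with gsettings list-recursively.

```shell
#!/bin/sh
# Sketch: does a file name match any pattern in a gsettings string array?

# matches_allowlist NAME LIST: succeed if NAME matches one of the glob
# patterns in LIST, e.g. "['*.txt', '*.md']" (hypothetical helper).
matches_allowlist() {
  name=$1
  # Turn "['*.txt', '*.md']" (or "@as []") into "*.txt *.md".
  patterns=$(printf '%s\n' "$2" | sed -e 's/^@as //' -e "s/[][',]//g")
  set -f  # stop the patterns from expanding against the current directory
  for p in $patterns; do
    case $name in
      $p) set +f; return 0 ;;
    esac
  done
  set +f
  return 1
}

# Example use (not run here):
#   gsettings list-recursively | grep text-allowlist
#   matches_allowlist sicp.pdf "<value printed by the previous command>"
```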