How do I troubleshoot full-text search issues? When I search in Nautilus, it usually shows results from filenames, metadata, and plain text files, but not from PDF text, for example.
For example, when I download SICP from MIT and move it to the “Documents” directory, I can find it by anything I can get from running tracker3 info on the file, e.g. “Huffman”. But I cannot find it by “Horner”, for example. I checked it on a clean install of Fedora 37 and still no luck.
How do I check what is missing? Is full-text search of PDF files supported? An older answer implies that it should be.
Full-text search has not worked for me in Nautilus since about version 40.
I’m not sure if this is a problem with my setup or if this feature has been abandoned…
Thank you for directing me to that bug report. After looking through it, I read man tracker3-search and man tracker3-info and confirmed that my query is missing from the full-text search results:
$ LC_ALL=C tracker3 search "Horner"
But it should be present:
$ tracker3 info sicp.pdf --plain-text-content | grep Horner
using a well-known algorithm called Horner?s rule, which
evaluates a polynomial using Horner?s rule. Assume that
16 According to Knuth 1981, this rule was formulated by W. G. Horner early in the
earlier. Horner?s rule evaluates the polynomial using fewer additions and multipli-
Horner?s rule, and thus Horner?s rule is an optimal algorithm for polynomial evaluation.
It is also present in the output of the extract command, which provides similar output but on one line:
$ tracker3 extract sicp.pdf | grep nie:plainTextContent | grep Horner
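The two checks above can be combined into one small helper. This is only a rough sketch: compare_extract_and_search is a hypothetical name, it relies on the same tracker3 info --plain-text-content and tracker3 search calls used above, and it assumes that a search hit prints the file name somewhere in its result line, which may not hold for every Tracker version.

```shell
# Hypothetical helper (not part of Tracker): reports whether a term shows up
# in the extracted text of a file and in the full-text search results.
compare_extract_and_search() {
    term=$1
    file=$2

    # Step 1: is the term in the text that the extractor produced?
    if tracker3 info "$file" --plain-text-content | grep -qF -- "$term"; then
        echo "extracted: yes"
    else
        echo "extracted: no"
    fi

    # Step 2: does the search list the file for that term?
    # Assumption: a matching result line contains the file name.
    if LC_ALL=C tracker3 search "$term" | grep -qF -- "$(basename "$file")"; then
        echo "indexed: yes"
    else
        echo "indexed: no"
    fi
}

# Usage: compare_extract_and_search Horner sicp.pdf
```

If the first line says "yes" and the second says "no", the text was extracted but is not reachable through search, which is the situation described here.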
I also noticed that the output in both cases is around 1 MB and ends at around page 717. This is probably due to the maximum size, which the linked bug report gives as 10 MB by default, but which is in fact 1 MB:
$ gsettings get org.freedesktop.Tracker3.Extract max-bytes
I am considering setting it to 10 MB, which used to be the default at the time of the linked bug report, so that the whole PDF book is indexed. It should be as easy as:
$ gsettings set org.freedesktop.Tracker3.Extract max-bytes 10485760
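As a sanity check on the number above, 10 MiB is 10 × 1024 × 1024 bytes. The sketch below also shows one way to test whether a term first appears before a given byte limit in a plain-text dump. Here first_match_within_limit and sicp.txt are hypothetical names, and the sketch assumes GNU grep (for -b and -o) and that the limit counts bytes of extracted text rather than bytes of the PDF.

```shell
# 10 MiB expressed in bytes, the value passed to gsettings above.
limit=$((10 * 1024 * 1024))
echo "$limit"    # prints 10485760

# Hypothetical helper: succeeds if the first occurrence of TERM in a
# plain-text FILE starts before byte offset MAX.
first_match_within_limit() {
    file=$1
    term=$2
    max=$3
    # GNU grep: -b prints the byte offset, -o restricts it to the match itself.
    offset=$(grep -b -o -F -m 1 -- "$term" "$file" | head -n 1 | cut -d: -f1)
    [ -n "$offset" ] && [ "$offset" -lt "$max" ]
}

# Usage (sicp.txt is a hypothetical text dump of the PDF):
# first_match_within_limit sicp.txt Horner "$limit" && echo "within limit"
```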
However, my issue seems to be different, since Tracker does extract the text that I am trying to find. It is the search that omits it.
Still looking into what might be causing the issue.
Try this way:
gsettings set org.freedesktop.Tracker3.Extract \
text-allowlist "['*.txt', '*.md', '*.mdwn', '*.pdf']"
That ends up with the same issue on my machine.
I added the PDF to the list as you proposed:
$ gsettings set org.freedesktop.Tracker3.Extract text-allowlist "['*.txt', '*.md', '*.mdwn', '*.pdf']"
$ gsettings get org.freedesktop.Tracker3.Extract text-allowlist
['*.txt', '*.md', '*.mdwn', '*.pdf']
$ tracker3 extract sicp.pdf
The data is extracted the same way as before, and there are still no search results, even after a reboot:
$ LC_ALL=C tracker3 search "Horner"
I guess that it is unnecessary to modify it because I see the desired text in the info --plain-text-content output anyway. If it were not extracted, I would not see it in the output, as I found in another post. It is weird because it implies that text-allowlist might be ignored. Maybe it is ignored for some well-known file types, like PDF.
I noticed that there is a rule for PDF files, but it does not seem to need modification, for the same reason as above:
$ cat /usr/share/tracker3-miners/extract-rules/10-pdf.rule
$ locate libextract-pdf.so
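To see which extractor rule claims a given MIME type, one option is to search the rule files for it. A minimal sketch: rules_for_mime is a hypothetical helper, the directory is the one from the cat command above, and the only assumption is that a rule file mentions the MIME types it handles as plain text.

```shell
# Hypothetical helper: list the rule files in DIR that mention MIME type MIME.
rules_for_mime() {
    dir=$1
    mime=$2
    # -l prints only the names of matching files, -F treats MIME literally.
    grep -l -F -- "$mime" "$dir"/*.rule 2>/dev/null
}

# Usage:
# rules_for_mime /usr/share/tracker3-miners/extract-rules application/pdf
```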
By the way, no issues are reported by the tracker about the file:
$ tracker3 status | grep sicp
Regarding my earlier remark that text-allowlist might be ignored: OK, I found that the value is intended for text files only.
For the record, it still doesn’t work by itself. The /usr/share/tracker3-miners/extract-rules/15-text.rule file needs to be modified for it to work. There is an open issue about that.