I’m trying to write a script that uses tesseract to read the text in a screenshot. However, in this version of Fedora none of the screenshot tools I tried work: flameshot (flatpak), gnome-screenshot (flatpak), ksnip (both flatpak and rpm), and maim (toolbox).
The only tool that works so far is the screenshot software shipped by default, but it doesn’t have custom output options or several other things I need for the script to work properly.
So far I’m using this mess of a script:
# Remember the newest existing screenshot before taking a new one
before=$(ls -t ~/Pictures/Screenshots/*.png 2>/dev/null | head -1)
# Trigger GNOME screenshot UI
gdbus call --session \
--dest org.freedesktop.portal.Desktop \
--object-path /org/freedesktop/portal/desktop \
--method org.freedesktop.portal.Screenshot.Screenshot \
"" '{"interactive": <true>}' >/dev/null 2>&1
# Wait for new screenshot (polls every 0.2 s, for up to 6 seconds)
for i in {1..30}; do
    sleep 0.2
    latest=$(ls -t ~/Pictures/Screenshots/*.png 2>/dev/null | head -1)
    if [ "$latest" != "$before" ] && [ -f "$latest" ]; then
        break
    fi
done
# Get latest screenshot
latest=$(ls -t ~/Pictures/Screenshots/*.png 2>/dev/null | head -1)
# OCR and copy to clipboard >:)
# Only continue if the file is new; otherwise an older screenshot would be OCR'd and deleted
if [ -f "$latest" ] && [ "$latest" != "$before" ]; then
    result=$(toolbox run -c python tesseract "$latest" stdout)
    echo "$result" | wl-copy
    echo "$result"
    rm -f "$latest"
else
    echo "No new screenshot found"
    exit 1
fi
Sometimes it works, but it’s tedious and in F41 I had everything working flawlessly. Please help me!
Not an actual solution to the problem you wrote about, but it might help:
Some of these tools don’t work because they rely on X11, while you seem to be using a Wayland compositor. Only Wayland-aware tools will work there, and the compositor must support the screenshot protocol the tool uses. This is especially true for `gnome-screenshot`.
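A quick way to check which case applies is to look at the session's environment variables. The following is only a minimal sketch: it assumes the desktop sets `XDG_SESSION_TYPE` or `WAYLAND_DISPLAY`, as most login managers do, and `session_kind` is a made-up helper name. Wayland is tested first because `DISPLAY` is usually also set under Wayland (for Xwayland).

```bash
# Hypothetical helper: guess the session type from environment variables.
session_kind() {
    if [ "${XDG_SESSION_TYPE:-}" = wayland ] || [ -n "${WAYLAND_DISPLAY:-}" ]; then
        echo wayland
    elif [ "${XDG_SESSION_TYPE:-}" = x11 ] || [ -n "${DISPLAY:-}" ]; then
        echo x11
    else
        echo unknown
    fi
}
```

If `session_kind` prints `wayland`, X11-only tools such as maim should not be expected to capture native Wayland windows.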
You mention that you installed `maim` in a toolbox. It is possible that maim doesn’t have permission to communicate with the session outside the container. You might want to try installing it on the host with `rpm-ostree install` followed by `rpm-ostree apply-live` instead.
Depending on what you want to do with this, you might want to get the text directly from the application using accessibility software such as AT-SPI, so you can skip both taking a screenshot and running OCR on it to convert it back to text.
That is exactly what I do as well. Since GNOME 49, I have also cobbled something together using the built-in GNOME Shell screenshot utility. To trigger the screenshot tool, I used dotool, a keyboard simulation tool. Today, however, I replaced that hack with your gdbus call command, which is clearly a much better solution.
So I hope I can help you in turn with the second part: retrieving the screenshot, for which I think I have a better method.
I use inotifywait, from the inotify-tools package. It watches the Screenshots directory for changes, and when a file is created it prints the file name:
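The call in question is presumably the `inotifywait` line that appears in the full script further down. A minimal sketch of that same call wrapped in a helper, assuming inotify-tools is installed; the name `wait_for_new_file` and its arguments are made up for illustration:

```bash
# Hypothetical helper: print the path of the next file created in a
# directory, or return non-zero if nothing appears before the timeout.
wait_for_new_file() {
    local dir="$1"
    local timeout="${2:-30}"   # seconds; 30 mirrors the value used in this thread
    # -q: no status messages, -t: timeout in seconds,
    # %w%f: watched directory followed by the new file's name
    inotifywait -q -t "$timeout" -e create --format '%w%f' "$dir"
}

# Usage: latest=$(wait_for_new_file ~/Pictures/Screenshots 30) || exit 1
```

Because `inotifywait` exits with a non-zero status on timeout, the `|| exit 1` in the usage line stops the script instead of continuing with an empty file name.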
You could use this line in place of your for…done loop that waits for the new screenshot, and in place of the “# Get latest screenshot” command that follows it.
My script (with your input) currently looks like this:
#!/bin/bash
# Dependencies: sudo dnf install tesseract tesseract-langpack-nld tesseract-langpack-fra inotify-tools
WATCH_DIR=~/Pictures/Screenshots
# Check if dependencies are installed
if ! command -v inotifywait >/dev/null 2>&1; then
    echo "Error: inotifywait command not found." >&2
    exit 1
fi
if ! command -v tesseract >/dev/null 2>&1; then
    echo "Error: tesseract command not found." >&2
    exit 1
fi
# Launch screenshot
gdbus call --session \
--dest org.freedesktop.portal.Desktop \
--object-path /org/freedesktop/portal/desktop \
--method org.freedesktop.portal.Screenshot.Screenshot \
"" '{"interactive": <true>}' >/dev/null 2>&1
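# Wait for creation of new file; stop if none appears within 30 seconds
NEW_FILE=$(inotifywait -t 30 -q -e create --format "%w%f" "$WATCH_DIR") || {
    echo "Error: no new screenshot appeared in $WATCH_DIR." >&2
    exit 1
}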
# Do OCR on created screenshot
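OCR_LANGS=eng+nld # - not named LANG, so the locale variable LANG is not overwritten
export LC_ALL=en_US.UTF-8 # - tesseract won't work if LC_ALL is unset so we set it here
tesseract -l "$OCR_LANGS" "$NEW_FILE" "$NEW_FILE" # OCR in the given languages; writes "$NEW_FILE".txt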
head -c -1 "$NEW_FILE".txt | wl-copy # copy the text without its final byte
# notify-send -i dialog-information "Copied to clipboard" "The OCR-ed text has been copied to the clipboard"
# Cleanup
rm "$NEW_FILE" "$NEW_FILE".txt
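The temporary `.txt` file can probably be avoided by letting tesseract write to stdout, as the first script in this thread does. A rough sketch, under the assumption that the byte removed by `head -c -1` is the form feed that recent tesseract versions append as a page separator; `clean_ocr_text` is a made-up helper name:

```bash
# Hypothetical helper: read OCR text on stdin, drop form-feed page
# separators and any trailing newlines, and write the rest to stdout.
clean_ocr_text() {
    local text
    # tr deletes form feeds; command substitution strips trailing newlines
    text=$(tr -d '\f')
    printf '%s' "$text"
}

# Usage: tesseract -l eng+nld "$NEW_FILE" stdout | clean_ocr_text | wl-copy
```

With this approach only the screenshot itself needs to be removed during cleanup.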
Previously, I simulated hitting the PrtScr button with
This is what I love the most about forums. Thank you so much. I just modified the tesseract line to run inside the container I installed it in, by prefixing the command with `toolbox run -c python`.
Everything works faster now. Still, this is a very hacky solution to a decades-old problem that had already been solved.