F42 Change Proposal: ibus-speech-to-text (self-contained)

ibus-speech-to-text

This is a proposed Change for Fedora Linux.
This document represents a proposed Change. As part of the Changes process, proposals are publicly announced in order to receive community feedback. This proposal will only be implemented if approved by the Fedora Engineering Steering Committee.

Summary

ibus-speech-to-text will provide voice dictation capabilities to any application supporting IBus input methods in Fedora Linux 42, using VOSK for local voice recognition.

Owner

Detailed Description

  • ibus-speech-to-text provides a new input method that enables voice dictation in any application supporting IBus
  • Uses VOSK for local voice recognition, not requiring internet connectivity
  • Supports multiple languages through downloadable voice recognition models
  • Includes a setup tool built with GTK 4 and libadwaita for model management and configuration
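
For readers unfamiliar with VOSK, the following is a minimal Python sketch of what local, offline recognition looks like with the vosk Python bindings. It is only an illustration and not how ibus-speech-to-text itself is implemented (the Scope section lists gst-vosk and vosk-api as its dependencies). The helper names extract_text and transcribe_wav are hypothetical, and the model directory and WAV file passed in are placeholders that must already exist on disk.

```python
import json
import wave


def extract_text(result_json: str) -> str:
    """Return the recognized text from a VOSK result string.

    VOSK recognizers return JSON such as '{"text": "hello world"}'.
    """
    return json.loads(result_json).get("text", "")


def transcribe_wav(wav_path: str, model_dir: str) -> str:
    """Transcribe a 16-bit mono PCM WAV file using a local VOSK model."""
    # Imported lazily so extract_text() works without vosk installed.
    from vosk import KaldiRecognizer, Model

    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        recognizer = KaldiRecognizer(Model(model_dir), wf.getframerate())
        pieces = []
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform() returns true once an utterance is complete.
            if recognizer.AcceptWaveform(data):
                pieces.append(extract_text(recognizer.Result()))
        # Flush whatever audio is still buffered in the recognizer.
        pieces.append(extract_text(recognizer.FinalResult()))
    return " ".join(piece for piece in pieces if piece)
```

No network call appears anywhere in this path: the only inputs are the audio file and a model directory that was downloaded beforehand.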

Feedback

Benefit to Fedora

This package will bring several benefits to Fedora:

  • Provides accessibility improvements through voice input capabilities
  • Offers offline voice recognition, preserving user privacy
  • Integrates seamlessly with existing IBus infrastructure
  • Supports multiple languages through downloadable models
  • Enhances user productivity through voice commands

Scope

  • Proposal owners:

    • Package ibus-speech-to-text (review) [done]
    • Package dependencies: gst-vosk (bz) and vosk-api (bz) [done]
  • Other developers: Parag Nemade

  • Release engineering: #Releng issue number

  • Policies and guidelines: N/A (not needed for this Change)

  • Trademark approval: N/A (not needed for this Change)

  • Alignment with the Fedora Strategy:

Upgrade/compatibility impact

Early Testing (Optional)

Do you require ‘QA Blueprint’ support? N

How To Test

Functionality Test

  1. Install the required package: sudo dnf install ibus-speech-to-text

  2. Restart IBus with the ibus restart command

  3. Add Speech To Text to your input sources

  4. Launch the IBus STT Setup tool from the preferences to configure the input method and download a language model

  5. Open a text editor, switch to the Speech To Text input source, and dictate some text

  6. The input method can also be enabled and disabled with the default shortcut (“Win + Space”) used to switch between IBus input methods
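
After steps 1 and 2, one way to check that IBus can see the new engine is to search the output of ibus list-engine. The sketch below assumes that output consists of "language:" header lines followed by indented "name - description" lines; the helper names are hypothetical, and the keyword "speech" is a guess at what the engine's description contains, so adjust it if the engine is listed differently on your system.

```python
import shutil
import subprocess


def find_engines(listing: str, keyword: str) -> list[str]:
    """Return engine names from `ibus list-engine` output that mention keyword."""
    matches = []
    for line in listing.splitlines():
        # Skip "language: ..." group headers and anything without a description.
        if " - " not in line or line.startswith("language:"):
            continue
        name, _, description = line.strip().partition(" - ")
        if keyword.lower() in f"{name} {description}".lower():
            matches.append(name)
    return matches


def installed_engines(keyword: str = "speech") -> list[str]:
    """Query the local IBus installation; returns [] if ibus is not on PATH."""
    if shutil.which("ibus") is None:
        return []
    result = subprocess.run(
        ["ibus", "list-engine"], capture_output=True, text=True, check=False
    )
    return find_engines(result.stdout, keyword)
```

Calling installed_engines() on a machine where the package is installed should return a non-empty list; an empty list suggests IBus has not been restarted yet or the keyword does not match.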

User Experience

Users will be able to:

  • Dictate text in any application supporting IBus
  • Switch between typing and voice input easily
  • Manage language models through a modern IBus STT Setup tool

Dependencies

Contingency Plan

  • Contingency mechanism: Remove the package
  • Contingency deadline: N/A
  • Blocks release? N/A

Documentation

Release Notes

ibus-speech-to-text has been added to Fedora

Last edited by @amoloney 2025-01-23T20:13:03Z


How do you feel about the proposal as written?

  • Strongly in favor
  • In favor, with reservations
  • Neutral
  • Opposed, but could be convinced
  • Strongly opposed

If you are in favor but have reservations, or are opposed but something could change your mind, please explain in a reply.

We want everyone to be heard, but many posts repeating the same thing actually make that harder. If you have something new to say, please say it. If, instead, you find someone has already covered what you’d like to express, please simply give that post a :heart: instead of reiterating. You can even do this by email, by replying with the heart emoji or just “+1”. This will make long topics easier to follow.

Please note that this is an advisory “straw poll” meant to gauge sentiment. It isn’t a vote or a scientific survey. See About the Change Proposals category for more about the Change Process and moderation policy.

I see that the documentation for the current version states that this runs entirely offline. Is there any danger that a future update might (accidentally or otherwise) start using online resources? Could the “offline” policy be enforced somehow (e.g. SELinux rules)? Maybe “offline” should even be part of the package name to make that explicit and, if anyone later wants to use some sort of online system, they would have to build a different package?

This change proposal has now been submitted to FESCo with ticket #3363 for voting.

To find out more, please visit our Changes Policy documentation.

This change has been accepted by FESCo for Fedora Linux 42. A full list of approved changes to date can be found on the Change Set Page.

To find out more about how our changes policy works, please visit our docs site.

Is there any intent here to install this by default? Or just provide it as an option?

I am not sure if the Change owner Manish is aware of this discussion topic. Let me ping him personally.

Also, I see that the Change wiki page does not contain a link to this discussion thread; I have fixed this now.

Hi @kevin, currently ibus-speech-to-text is provided as an option. It is not installed by default, and there is no plan to include it in the default Fedora installation.

I think we can consider this more of a “Tech Preview” - it is probably neither ready nor mature enough for general use, but it is still an interesting open project, which people can try out and which will hopefully improve over time.

I am not sure how to do that - I think it would really require upstream work, but any pointers to similar handling are welcome.

Note it does require downloading voice model data when setting up. We will also be testing it as part of our I18N Test Week.

I don’t know how to write such SELinux rules, but I’ve seen them work, so I’m pretty confident it can be done. For example, someone once tried to add to the smartd service a custom Python script that would send them notifications via the online Pushbullet service, but they kept getting the SELinux denial below.

type=AVC msg=audit(1659121294.93:468): avc:  denied  { name_connect } for  pid=1770 comm="python" dest=443 scontext=system_u:system_r:fsdaemon_t:s0 tcontext=system_u:object_r:http_port_t:s0 tclass=tcp_socket permissive=0

SELinux was preventing the service that was running as fsdaemon_t from opening a TCP socket. That’s the sort of thing that I would like to happen if this new speech-to-text service were to try to route any data to any online service.
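
To make the fields in that denial easier to read, here is a small Python sketch that splits an AVC audit line into its key=value pairs and flags outbound TCP connection denials. It is an illustration for reading logs only, not a way to write or enforce policy; the function names are hypothetical, and the parsing assumes the single-line layout shown above.

```python
import re


def parse_avc(line: str) -> dict[str, str]:
    """Extract the decision, permissions and key=value fields from an AVC line."""
    # Values are either double-quoted strings or runs of non-space characters.
    pairs = re.findall(r'(\w+)=("[^"]*"|\S+)', line)
    fields = {key: value.strip('"') for key, value in pairs}
    match = re.search(r"avc:\s+(denied|granted)\s+\{\s*([^}]*?)\s*\}", line)
    if match:
        fields["result"] = match.group(1)
        fields["permissions"] = match.group(2)
    return fields


def is_outbound_tcp_denial(fields: dict[str, str]) -> bool:
    """True if the record is a denied name_connect on a TCP socket."""
    return (
        fields.get("result") == "denied"
        and fields.get("tclass") == "tcp_socket"
        and "name_connect" in fields.get("permissions", "").split()
    )
```

Applied to the denial above, the parsed fields show the source domain (fsdaemon_t inside scontext), the target port type (http_port_t inside tcontext) and the destination port (443), which together describe the kind of outbound connection I would want blocked for a speech-to-text service.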

The full context of the above example can be found here: SELinux is preventing python from name_connect access on the tcp_socket port 443

P.S. I personally think locking this service down is quite important and I wouldn’t want to see it installed without such security measures. IMO, the potential of having an “open mic” plugged into the world-wide-web is a pretty serious concern.

OK, thanks for the info! This looks like a pretty interesting thing.

Could you open a bug so we can discuss and track it?
Sorry for the late response

Hi Jens.

I made an attempt at filing a bug report for this issue:

Thanks.

Thank you for putting this CP together. I followed the instructions in OP and am currently voice typing into this reply textarea.

A few notes:

  • Even after I turn voice typing off (i.e. switch to another input method), the microphone recording icon remains in the GNOME upper-right panel, which is a little sus tbh
  • The settings UI is pretty straightforward; however, it wasn’t clear to me whether I had successfully downloaded and activated any of the models, so it seems like there’s some state management that needs to be worked out there

tl;dr

I heard of that blog post by “a blind Linux user” saying that “the software which allows him to use Linux is being abandoned and works less and less, and not at all on Wayland”.
I believe that this single proposal is actually too small, and therefore, over time, all the time and effort spent developing it may “become wasted” because of many unforeseen consequences (tl;dr “develop this alone, it dies alone”).


I’m not feeling great health-wise, so in the last part I wandered off, unable to “properly explain what I mean”.
If you don’t feel like it, just skip the ending. There’s nothing really essential in there, and it boils down to this:

tl;dr of the ending:

“Distros like Fedora KDE and Mint should aim to reduce the use of the Konsole to -1 and should aim to become WAY more End-User friendly”.




Main body:

I believe that any change like this would feel incomplete (and shouldn’t be implemented) if “the entirety of the core of the Fedora experience doesn’t get covered under the blind people can’t see issue”.

“We must begin somewhere” isn’t really a good rebuttal tho, because:

  1. Not implementing everything together will most probably “make stuff incompatible” (I’m not gonna pretend I am a Dev of any kind) as time passes and it gets slowly developed.
  2. Not making it a “necessary foundation” will, once more, make other software work less and less with it.
  3. “Eh, we already did 1/12th of the work” will be used as an excuse to “focus on other things”.

To expand upon what I mean:

It’s a tiring issue that this stuff should be implemented “as upstream as possible”, because there are SOME things which just must be done.
“Linux gives more freedom” is a meaningless sentence, because “in the context of creation” freedoms & limitations are equally important in creating something.

Some “necessary foundations” ARE always needed for compatibility AND “platform unity” (not sure if it’s the best way to express the idea).

Even if Fedora implements and manages to perfectly maintain (even just simply) a “global speech to text software”, it’s not guaranteed that other software will “be engineered to work with it”, and even IF it all were, other Distros may make their own S.2.T. software, which isn’t compatible with it.

.

This thing, this very important thing, should be developed as a “necessary foundation” as upstream as possible so that every single Developer is forced to follow it, to avoid incompatibilities among Distros and through time.




Here’s the closure, where I can’t come up with a better way to express myself (my head hurts):

“Linux is for everybody” is true, and that is the reason WHY there are Distros.

Other than the fact that “if I don’t need it, I won’t install it” will remain forever true,
those “min/maxxers, ultra ricing Megatron-Distro creators” (not offending, just describing) are a fraction of a fraction of a fraction of a fraction of anyone and everyone who uses a computer.
Linux is already “their home” and this will never be taken from them.

.

I chose Fedora KDE because it’s the most End-User friendly Distro out there with the latest stable software maintained by a large team, but it’s still far from ideal.

Many Distros are not being developed with the direct goal to grow their userbase because most people like their (computer) “not person, not friend, not pet, but TOOL” to not be another problem to take care of.

People have NO IDEA that Android is a Linux version BECAUSE it doesn’t bother the user, it just works, like Windows.
Baseline Fedora KDE can compete with a phone, but why would anyone get a computer or laptop to install Fedora KDE if a tablet already does everything they need to?

If Linux is to become larger, if Linux is to become for everyone, if Linux is to become Better Than Windows, then to aim for the Goal to become more End-User friendly both in a “now also blind people can easily use this” sense and in a “just and only use the UI, you won’t see the Konsole from install to when your PC burns down” sense is a priority of absolute and eternal importance.

The “greatest Linux leap” which has happened in the last decade is Valve expanding on WINE to make Proton, and since they are also developing SteamOS some other good changes are coming to Linux, not to talk about all the exposure.

If I had less time to take care of my computers I’d sadly have to use W11 on every computer in my house, because even “easy to use” Fedora KDE and “simple and hands-off” Linux Mint are not as friendly as Windows is, even as bad as 11 got (without counting the spying and forced ads). I am not using a Distro developed by “a dude” as a daily driver; it can be “as open source as you want”, but I don’t understand Developerese and I don’t care, and it’s more difficult to “corrupt and/or compromise” an entire large group than a single “dude”.