F40 Change Request: Privacy-preserving Telemetry for Fedora Workstation (System-Wide)

This is a product-customer mindset that has crept into many open source projects. In a business setting, customers don’t often participate in a community, so it makes sense there to worry about the experience of silent users, and telemetry is an easy answer. In community projects like Fedora, however, community participation is meant to be the primary means of feedback. That hasn’t been the case with Fedora for some years, and reaching for telemetry is the inevitable culmination.

4 Likes

I am cautiously optimistic regarding this change and I think it is being suggested in good faith.

A couple of thoughts, should the change request be approved:

  • It’s extremely important that the opt in/out toggle is part of the installation or first-run process for the DE, so there’s no way users can claim not to have been aware of their implicit or explicit choice.
  • It’s also important to immediately tell users where to find the toggle should they change their mind later on.
  • Please put a “more information” button next to the opt in/out toggle, where you detail exactly what information is collected, who gets access to it and how it is anonymized - and preferably make sure you store so little information that it’s impossible to de-anonymize it after the fact.
  • Opting out should be doable via scripts
4 Likes

For the record, yes, I am a Red Hat employee!

I have in no way whatsoever had any sort of initiative passed down to me. I work on a layered product (Satellite), and whatever Fedora decides bears no weight on me, work wise. I am also not engaging in this interaction based on some sense of fraternal obligation to a coworker or some allegiance to my employer.

I AM a Fedora user. As are many Red Hatters. I do not believe that one’s affiliation with a sponsoring company inherently minimizes or disqualifies their opinion in a community. Especially when said company has explicit clauses in its employment agreement stating that employees are free to make decisions against the employer’s interests when it comes to upstream communities. And I for one would hate to force something on the Fedora community that could even remotely be considered short sighted or excessively self-serving.

My viewpoint on the subject is influenced by many aspects, as is everyone’s here. Being a developer has exposed me to various problems within the development sphere, as well as data quality and collection. And yes, this influences my opinion on data gathering to improve something we work on. And I believe experienced voices are beneficial to the debate.

On the flip side, it can also give me blinders to the reality of the concerns around what data is collected, and what concerns there are out there about this data. Which is where the aspects of community discussion come in, and one of the reasons I love open source. We all have different backgrounds that give us different perspectives on the situation. Hearing everyone’s voice and making informed decisions is always beneficial.

(Sorry to soap box a little, this is a little more broadly targeted at the general sentiment of “RH Bad” that crops up, and felt it needed to be concisely stated)


I’ve also seen a few references to Fedora’s “Mission Statement” and “Vision”, and I thought it might be cool to organize and elaborate my thoughts around this.

Snipped segments are just to help with readability (I didn’t want to reproduce the whole page here), and are not indicative of any intention to ignore other parts.

Foundations

I’m not completely aware of the state of open source telemetry and usage data gathering, but I can’t fathom a reason why having a vested interest in a quality solution that is open source and transparent wouldn’t be beneficial in the long run. If it doesn’t help revolutionize the situation as a whole, it could at least heavily influence it.

There is a friction to getting involved and helping in open source communities. Skills, knowing where to start, technical understanding, social anxiety, etc… I believe providing a quality telemetry solution would actually help empower users who are unsure how to get involved, letting them do something helpful with minimal exertion. As simple as clicking a button during setup, and bam you are providing valuable data to the project. (not getting into the opt-in vs opt-out debate here, there is another, MAJOR thread for that)

Utilizing a quality telemetry system would assist in creating excellent software, and will create a novel data-driven channel for cooperating with various upstream communities to ensure all users benefit from our data.

Personally, I’m hard-pressed to think of a reason the telemetry space isn’t ripe for open-source innovation. Telemetry isn’t something that’s going away (in a general, global industry sense). All the more reason to at least think on the problem and see if we can work on a better solution than the ones that have given just about everyone a bad taste in their mouth.

Our Vision

Telemetry is at least intended as a tool to help address the problems of accessibility and usability. And has had a profound effect so far on this space (at least to those developers who are big proponents of telemetry).

As far as the community approach, that’s why I only highlighted that segment of the vision block. THIS discussion is an excellent example of our community approach.

Our Mission

One could easily see this as an opportunity to take a rather innovative approach to telemetry and telemetric data. Something built collaboratively, with as much transparency as possible.


Well, that was a fun exercise at least, to help me organize my thoughts. I hope others find value in it too. I think we could all agree that the proposal as it stands could use some revisions and updates. But I for one am excited at the potential here. One of my long term hopes ever since I found Linux is for the “Year of the Linux Desktop” to finally arrive, after watching it seem just on the horizon for all these years. This, to me, is a possible investment approach to truly help make this happen. Hence my personal excitement and enjoyment of it. To me, those are the stakes.

But, I am just one person. Who has a potentially short-sighted or ill-informed view. I could be massively wrong, and completely ignorant to some very serious items. Which is also why I’ve been excited to be involved in this conversation. I’ve already gained massive perspective and understanding from alternate viewpoints, and more respect for the Fedora community as a whole. I’m excited to watch this unfold, no matter what path it may take.

7 Likes

I wanted to address a few recurring elements that keep coming up in these conversations. These quotes are all paraphrased since the same general sentiments have been expressed in multiple ways.

The data is being collected anonymously

Client-side data collection can’t happen anonymously. It is identifiable at the point of collection and at the point it is transmitted/received. It is only after the data has been received that it can be de-identified and stored anonymously.

This is important because it means that Fedora is, at some point, in possession of identifiable data. Making sure that the data stays anonymous requires a lot of things to “go right”.
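To make the sequence concrete, here is a minimal Python sketch of de-identification at the point of receipt. Everything in it is hypothetical: the `ReceivedReport` type, the `deidentify` function, the field names, and the choice to coarsen the timestamp to an ISO week are assumptions for illustration and do not come from the proposal. The only point is that the receiver necessarily holds the IP address and exact arrival time before the stored record exists.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReceivedReport:
    """Hypothetical view of a report as the receiving server sees it."""
    source_ip: str         # visible to the receiver on any network connection
    received_at: datetime  # exact arrival time
    payload: dict          # the metrics the client chose to send


def deidentify(report: ReceivedReport) -> dict:
    """Build the record to store: drop the IP, coarsen the time to an ISO week."""
    iso = report.received_at.isocalendar()
    return {
        "week": f"{iso[0]}-W{iso[1]:02d}",
        "metrics": dict(report.payload),
    }


# Made-up example values; 203.0.113.0/24 is a documentation-only address range.
report = ReceivedReport(
    source_ip="203.0.113.7",
    received_at=datetime(2023, 7, 10, 14, 32, 5, tzinfo=timezone.utc),
    payload={"desktop": "GNOME", "cpu_cores": 8},
)
stored = deidentify(report)
print(stored)
```

Between constructing `report` and calling `deidentify`, the identifiable fields exist in memory (and possibly in web server logs, load balancers, and so on). That window is where things have to “go right”.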

If you don’t trust Fedora, you shouldn’t use it

Trust is not a single yes or no answer.

  • Do I trust that Fedora will not take malicious actions? - Yes
  • Do I trust that Fedora is infallible? - No
  • Do I blindly trust Fedora? - No

For me, the last two answers would be the same for any organization. It isn’t personal.

Further, implementing telemetry as discussed in the proposal would definitely lower my trust in Fedora.

Telemetry isn’t bad!

Opinions will vary on this but it doesn’t matter if telemetry is good or bad. The vast majority of the people criticizing the proposal are supportive of the ability to collect telemetry.

However, there are many other sticking points related to:

  • The process for being included in the collection - This seems to be the biggest issue overall
  • The timing of the data collection as it relates to the request to be included
  • The transparency around what is being collected
  • The methodology used for the collection and de-identification
  • The governance and review for metrics to be added
  • The ongoing review of the process and program by both individual users and the program as a whole
  • Probably more things I missed
5 Likes

3 posts were merged into an existing topic: Decision-Making, Governance, Council, Red Hat — a breakout topic for the F40 Change Request on Privacy-preserving telemetry for Fedora Workstation

Seeing the hundreds of replies, it’s unclear whether any changes have been adopted or are planned. I know that @catanzaro said he would start catching up today (and that’s going to take some time), but some edits might go a long way in helping push the conversations forward.

Wow! I don’t see how introducing telemetry metrics capabilities into Fedora is in any way consistent with this Mission Statement.

In fact, the goal of competing with Ubuntu in any context, as was stated as one motivation for the proposal, would have to begin with modifying it. Ubuntu’s mission is completely inconsistent with that of the Fedora Project.

Am I the only one who’s not seeing that, or am I simply misunderstanding what I’m reading?

1 Like

As far as my interpretation, specifically: “…that enables software developers and community members to build tailored solutions for their users…”

I could be way off base, but to further the goal to tailor solutions for users, you need feedback mechanisms for such solutions. Telemetry is a powerful, albeit controversial, feedback mechanism to… enable software developers and community members to build tailored solutions. Which helps cement telemetry as one possible tool to further this goal.

A good or bad tool, right or wrong for Fedora, that’s the debate. But it IS a tool that could do just this.

I’m interpreting it completely differently: You’re providing a platform on which developers and companies who are interested in doing telemetry can implement it for their users. In fact, your primary downstream partner, RHEL, already does this.

What telemetry methods that can be implemented in Fedora cannot be implemented in RHEL?

Typing on a phone, so sorry for any errors. I work for RH (I’m the Fedora QA team lead).

I think this is a perfectly reasonable and non-evil idea. I also think we probably shouldn’t do it.

For a lot of the reasons other folks have given, essentially. One, it just looks bad. That sounds like a dumb reason but it isn’t. We exist in an ecosystem with certain values, and one of those values is being extremely suspicious of data collection. If we want to keep existing in the ecosystem we have to respect that.

Two, the issues of competence that others have highlighted. It’s very difficult to be sure you’re doing this stuff “right”, even with good intent. I don’t want us to be eternally waiting for the shoe to drop on a privacy kerfuffle.

Three, issues of trust are kinda insoluble. The Fedora setup with fesco and the council and voting and yadda yadda is a fine setup. It is also, legally speaking, a polite fiction. AIUI - I am not a lawyer - Red Hat effectively “owns” Fedora for all practical purposes. Effectively, Red Hat would own the collection system and the collected data, “Fedora” would not. If RH had a heel turn and went full-on villain and the CEO of RH decided to take the system and use it for the most nefarious purposes they could think of, they could do that. All the elected bodies would be powerless in that scenario; Evil RH’s Evil CEO could just declare them defunct or overridden.

Similarly, IBM owns RH. Regardless of what anyone believes the current relationship between them is like, data collection is for keeps. At any time, since RH owns the data and IBM owns RH, an Evil IBM can take the data and use it for evil. Neither RH nor Fedora could stop Evil IBM in this case.

Given the above, I think it’s unreasonable to ask folks to “trust” Fedora with this. Effectively a request to trust Fedora is also a request to trust RH and IBM, forever, and that’s not reasonable.

At a minimum, we’d need to somehow be sure that all collected data was so innocuous and irreversibly anonymized that an evil owner with complete control over the system and access to all data ever collected by it could do nothing too evil with it. And I think that’s a very difficult bar to clear.

12 Likes

A post was merged into an existing topic: Decision-Making, Governance, Council, Red Hat — a breakout topic for the F40 Change Request on Privacy-preserving telemetry for Fedora Workstation

Happy to. I think it’s confusing / not known by people who aren’t already involved for a while. I am not sure how better to communicate this sort of thing. :frowning:

Hi again all. Apologies that I have fallen behind on this conversation. I will try to get back to this soon.

2 Likes

This is a fairly reasonable response to the proposal. I generally trust Fedora to do what’s right for the distro while balancing the needs of Red Hat and the community, though Red Hat will be a higher priority since a lot of the funding/infrastructure and technology is provided by them. This won’t however do much to build good will with those who are currently sceptical, but could be swayed to trust Fedora in the future. Is there a net gain?

Another issue that could be problematic is the consent of minors, specifically anyone under 16.

8.2 The controller shall make reasonable efforts to verify in such cases that consent is given or authorised by the holder of parental responsibility over the child, taking into consideration available technology.

The most expensive thing in the world is trust. It can take years to earn and just a matter of seconds to lose.

1 Like

If some modifications to the original proposal are made, where can we get updated on those modifications?

Can there be a sort of update log on what is being reconsidered, etc.? Most of us are going off of the original proposal only, which makes it difficult to have an up-to-date discussion.

1 Like

The Fedora Change Process is described here: Changes policy :: Fedora Docs. Changes are often updated to address concerns raised before FESCo approves them — sometimes in anticipation of the FESCo decision meeting based on feedback, and sometimes based on explicit conditions from FESCo.

I’ll let @catanzaro speak to his own plans, but I expect we’ll see a revised proposal at some point. Given the general amount of interest here (let alone controversy!) I think we’ll probably post that as a new topic (with a new devel-announce email), when it’s ready.

5 Likes

Thank you Fedora leadership, devs, community, and those who keep the servers/forum/etc. running smoothly. I am a newbie to the community. I thought Fedora was a separate entity from Red Hat. However, Fedora and Red Hat are one entity based on a Wikipedia article that says:

The project was founded in 2003 as a result of a merger between the Red Hat Linux (RHL) and Fedora Linux projects. It is sponsored by Red Hat (an IBM subsidiary) primarily, but its employees make up only 35% of project contributors, and most of the over 2,000 contributors are unaffiliated members of the community.

Coming from ==> Fedora Project - Wikipedia

To me this explains why one would even consider bringing any sort of data collection to a distro. It is part of a corporate entity. If Fedora were truly separate then there would be no reason to bring any sort of data collection or user install counter to the distro.

My response to the proposal:
Please do not do this. Please do not add any data collection, monitoring/surveillance, or install counter components to the distro. I don’t even want it to lie dormant on my drives. Last year Manjaro’s leadership was planning to integrate OpenTelemetry into their distro to collect data for the same or very similar reasons as those described in the proposal. I and other Manjaro users asked them not to bring telemetry into the distro. After the pushback, it seems so far that they have backed off and aborted those plans. In my mind, even just considering/planning the inclusion of data collection schemes puts leadership’s decision making in doubt. Some, myself included, would consider this a strike against the distro/leadership. Unfortunately, Manjaro has made a number of other fumbles, which brought me to Fedora.

My PoV:
I have reached my limit of tolerance of Win10, its telemetry/spyware, its forced updates, its crappy performance/design, and Microsoft’s (M$) grand disregard for user privacy. I have no love for Apple and their “walled garden” either. I’ve abandoned Win10 for the most part, and only use it for a few specific reasons. I have an Android smart phone, but:

  • I do not use the Chrome browser that comes with it
  • I do not install or use social media apps on it
  • I do not use the other Google apps on it such as the youtube app.
  • the gmail account I use with it, sees very little use beyond the obligatory login, every 2-3 months, to keep the account active
  • I don’t do online banking on it
  • I don’t use facial or finger print unlocking
  • the Google assistant is disabled

I am not a fan of social media. My FB account has been dormant for more than 10 years, with no pictures or other real personal data.

I do not want any telemetry, data collection, or other surveillance/tracking components on my desktop. This is not negotiable. Even if I allowed some data collection by another entity, that allowance is not automatically transferred to Fedora, and no, it is not a justification for Fedora to monitor my activities and collect my data. If I allow a Linux distro to run data collection components on my desktop then I may as well just run Windows, which would defeat the purpose of me abandoning Windows in the first place.

User Hardware Info.:
If Fedora/Red Hat (FRH) needs user hardware info., let them ask the users to submit inxi output, with specific flags, to a website. inxi provides plenty of hardware/system info. and the output can be parsed. The parsing can easily be done by a dev with only a small amount of effort. FRH does not need:

  • user activity data
  • software install data
  • install count data

Error/Crash Reporting:
I refuse to engage error and crash report applications (executables). Although, I will gladly submit error/crash report info. via a website. No one is going to convince me to click collect data and then click submit crash report in an executable. There are other users who do not wish to submit any error/crash reporting data for a variety of reasons. Let’s respect the decision of the users. Let the devs handle the tasks of software debugging. Users generally do not want to be viewed/treated as a developer’s lab rat/guinea pig.

Privacy-preserving Metrics System:
Cyclical data collection is a form of monitoring/surveilling, irrespective of user consent. There is no such thing as “privacy-preserving” with a data collection system, irrespective of the system’s initial design. Privacy and data collection are at odds, and one cannot ride those 2 horses galloping in opposite directions. Designs can and will change over time allowing for the expansion of the number of columns in a data table (gathering more data). Over time, gross approximations, estimations, and inferences can be made from the collected data. This comes from the Law of Large Numbers.

The law of large numbers, in probability and statistics, states that as a sample size grows, its mean gets closer to the average of the whole population. This is due to the sample being more representative of the population as the sample become larger.

Coming from ==> Law of Large Numbers: What It Is, How It's Used, Examples
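For anyone who wants to see the quoted statement in action, here is a small simulation sketch in Python. The population is synthetic and made up (a list of RAM sizes drawn at random), and `sample_mean_error` is a hypothetical helper name. It only illustrates the quoted definition, i.e. that the sample mean approaches the population mean as the sample grows; it does not by itself show what can or cannot be inferred about any individual.

```python
import random
import statistics


def sample_mean_error(population, sample_size, rng):
    """Absolute gap between a random sample's mean and the population mean."""
    sample = rng.sample(population, sample_size)  # sampling without replacement
    return abs(statistics.fmean(sample) - statistics.fmean(population))


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers

# Synthetic population: 100,000 made-up machines with a RAM size in GB.
population = [rng.choice([4, 8, 16, 32]) for _ in range(100_000)]

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"sample size {n:>7}: error {sample_mean_error(population, n, rng):.3f}")
```

The last line always prints an error of zero, because a sample the size of the whole population is the population.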

No thank you.

Not Collecting Personally-Identifiable data:
The date, time, and IP address make for a unique identifier, so the idea of not collecting personally-identifiable data is easily shot down. As we continue to tack-on additional columns the identifier becomes more and more unique. For example, start with date, time, IP address and then add:

  • CPU make/model
  • RAM amount
  • BIOS manufacturer/version
  • Motherboard make/model/manufacturer/version
  • drive count
  • kernel and kernel version
  • OS version
  • DE and DE version
  • MAC address (unique identifier already… big red flag)
  • installed package count

Almost all, if not all, of the above can be obtained from inxi output. IP address can be used as a proxy for country, region, and location. The tacking-on of additional columns to build a unique identifier is the same technique law enforcement, in the US, uses to uniquely identify a device visiting a website.
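The effect of tacking on columns can be sketched with synthetic data. The Python below is an illustration under stated assumptions: the value pools, the column names, and the `unique_fraction` helper are all made up, the values are drawn uniformly at random, and real hardware distributions are far more skewed. It deliberately leaves out date, time, IP address and MAC address, which would make records unique on their own.

```python
import random
from collections import Counter


def unique_fraction(records, columns):
    """Fraction of records whose values in `columns` match no other record."""
    keys = [tuple(r[c] for c in columns) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)


rng = random.Random(42)  # fixed seed for repeatable output

# Hypothetical value pools; none of these figures come from real Fedora data.
POOLS = {
    "cpu_model": [f"cpu-{i}" for i in range(40)],
    "ram_gb": [4, 8, 16, 32, 64],
    "drive_count": [1, 2, 3, 4],
    "os_version": ["38", "39", "40"],
    "desktop": ["GNOME", "KDE", "Xfce", "Sway"],
    "package_count": list(range(1500, 3500)),
}
records = [
    {col: rng.choice(pool) for col, pool in POOLS.items()} for _ in range(5_000)
]

columns = []
for col in POOLS:
    columns.append(col)
    share = unique_fraction(records, columns)
    print(f"{len(columns)} column(s), last added {col}: {share:.1%} unique")
```

A record that is already unique stays unique when another column is added, so the share can only go up. No single column here identifies anyone; the combination does.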

Opt-in/Opt-out:
When it comes to opt-in/out schemes, for me it is a firm “No Thank you”. Opt-in/out is nothing more than a switch that can be flipped at any point in the future. It usually means we’ll integrate and push the component to the user’s installation, and when they are ready or we (the distro. maintainer) are ready, we’ll turn it on. No thank you. I don’t want it on my drive. This is why I have an issue with KDE. If the distro. doesn’t need user activity data then DE creator/maintainer doesn’t need user activity data. Unfortunately, KUserFeedback is tightly integrated into KDE. Removing it would require a complete fork of the project, a thorough combing of the source code, and the surgical removal of the KUserFeedback code and all references to it. As soon as a better fork of KDE comes along, that does not have any data collection components, and is not tied to questionable/corporate entities (Google, M$, etc), I will abandon KDE. They’ve put a toe tag on their project, with the sunset date waiting to be filled in. I am not alone in wanting to steer clear of any data collection, and opt-in/out schemes. No thank you fellas.

What data collection does not do:
There is no guarantee that user data will be used to make improvements that one likes and/or agrees with. There is no guarantee that it won’t be used to justify removing features/applications that one likes/wants/needs. Any policy/approach that is initially employed is not guaranteed to remain the same in the future. Policies change. Laws change. People’s attitudes and levels of greed change.

How should Fedora/Red Hat go about improving the distro:
My ideas go back to TQM principles. Focus on delighting the customer and improving product quality. Automated user data collection systems do not guarantee product quality improvements; they only guarantee that data will be collected. The customer is the end user. Engage the user community honestly and take their ideas/suggestions seriously. Use polls, surveys, suggestion boxes, proposals (like what we are engaging in now), focus groups, etc. What is truly valuable is user feedback. So create a feedback loop. I’m a software developer and have been a DBA, so I’m not new to analysis of structured data. I get it. Structured data creates efficiency for the devs.

I see there is a “Fedora Annual Contributor Survey 2023”. I’ll be sure to do the survey. As for suggestions/feedback, let’s start with don’t bring data collections schemes into the distro.

The following is a real example of user suggestion/feedback:
As a user I need the ability to manage kernels, nvidia proprietary drivers, and update the boot manager’s menu when those components change. Building a kernel is a separate process. I’m expecting at a bare minimum for there to be a simple set of steps via the command line. An easy to use GUI tool would be great and would add to Fedora’s polish. There is no shame in having and using a GUI tool, and having a command line method as a secondary option.

The current process, based on the wiki, is unnecessarily complex. Even finding the info. on how to install a new kernel is unnecessarily complex. If I feed the following search phrase to google, at a minimum I’m expecting to find a single page in the wiki, with a simple set of steps:
“how to install new kernel from command line fedora linux”

However, I’m presented with a long complex wiki article on manual kernel installation. The article suggests that I use “DNF” or “PackageKit” to install a new kernel and provides a link to a wiki article page. However, the link is to a page on “Updating Packages”. This is confusing to a user that is new to Fedora. Package Management and DNF are huge topics that, while related to managing kernels, should be treated separately.

If I feed the following search phrase to google, at a minimum I’m expecting to find a single page in the wiki, with a simple set of steps:
“how to install new kernel with packagekit fedora linux”

This leads one down a proverbial mine shaft in hopes of striking gold. After some digging (no pun intended) I realize that PackageKit is a distro. agnostic package management tool. Great! Problem… There is a Fedora wiki article for the “Package Management System”, that has a link for packagekit, that sends the user to Freedesktop.org. Even that site does not have a simple page on how to use the GUI. Freedesktop.org does go into an explanation of how to use packagekit from the command line, but as a user that is new to Fedora, I’m no closer to understanding how to install a new kernel.

My frustration as a new Fedora user should be obvious at this point. I’m starting to become concerned that there may be different methods based on the individual desktop environments, even though PackageKit is supposedly distro. agnostic. This is not a problem of being new to Linux (a novice). This is encountered as a user that is new to Fedora, a distro. that has the corporate backing/resources of Red Hat. This should have been fixed 10+ years ago.

Possible Solution/Suggestion(s):
Maybe copy the example/ideas from Manjaro Linux. Manjaro has a very good GUI and CLI set of tools for kernel management (install, removal, selection, listing) and Nvidia proprietary driver management (install, removal, selection, listing). The tools are called “Manjaro Settings Manager” (GUI) and “mhwd” (CLI). Kernel and Nvidia driver installation and switching are handled separately from system update/upgrade. This means newer versions of these components are not forced on the user. However, as newer versions of these components become available, the user is alerted by a notification on the desktop. When installing newer versions of these components, the tools handle the process, which includes updating the Grub menu. Here is a short youtube video demonstrating the new kernel installation process via the GUI ==> https://www.youtube.com/watch?v=AYCbxSATSfA

Even if Fedora doesn’t copy Manjaro’s GUI application example, the existing process can be simplified and streamlined.

The Fedora wiki is unnecessarily complex. The info. on how to change the kernel is buried deep in the documentation on DNF package management. This needs to be broken up and simplified. Here is how:

  • Package management and DNF syntax usage are big topics. Follow the Arch Wiki example for the “pacman” package management tool ==> pacman - ArchWiki.
  • I suggest copying Manjaro’s wiki example for kernel management, which can be viewed here ==> Manjaro Kernels - Manjaro
8 Likes

4 posts were merged into an existing topic: Approaches to data handling, safety, and avoiding individual identification — a breakout topic for the F40 Change Request on Privacy-preserving telemetry for Fedora Workstation

Wanted to follow up on the argument I made about the damage being done to the Fedora brand by highlighting some examples of notable influencers and what they’ve had to say about this proposal.

Chris Titus Tech, a Linux and tech youtuber with 506k subscribers, came out with a Linux tier list. He listed Fedora in the “devil” tier specifically over this proposal about opt-out telemetry. After having recommended Fedora in the past, he said:

Fedora is in the devil tier; nothing I’d ever recommend using anymore.

Jeff Geerling, tech youtuber who uses Linux often with 523k subscribers and 6.3k followers on Mastodon posted in response to this proposal:

Announcing intent to add “privacy respecting” data collection to #Fedora to, for example “know how many of our users are using particular IDEs” is not a great way to engender trust after pulling a stunt like #RedHat did a few weeks ago…

Brodie Robertson, Linux youtuber with 54k subscribers and 4.2k followers on Mastodon, who just a couple of weeks ago had Matthew Miller on his podcast, had this to say:

I implore any of the #Fedora #Linux contributors out there that might see this, please do not back the new telemetry proposal. I know telemetry can be incredibly valuable for improving the product offered by Fedora and I’m sure some great improvements can be made with it but I see this as your Amazon Lens moment.

If #FedoraLinux goes ahead with this it doesn’t matter that the user can opt-out, it doesn’t matter that the data is generic, it doesn’t matter that it’s anonymized, for now until the end of time Fedora will be called a spyware distro. You will hurt community trust in the project far more than anything you will gain from improving the software and I really don’t want to see that happen.

Aral Balkan, with 38k followers on Mastodon, also posted a thread critical of the move.

This is not to mention the simplified way that the proposal has been talked about, which has also led to misunderstandings. Things like ‘Fedora adds telemetry’ or ‘Fedora is planning on adding telemetry’ fail to express that individual contributors are the ones who make proposals and that these are not at all announcements from the Fedora Project. Search the #fedora hashtag or go to any space that talks about Fedora and you will find similar opinions, if you haven’t already.

We can argue that this is not something we can control, but it doesn’t change the fact that the waters have been severely muddied. Like I said before, perception is reality and right now we look extremely bad in the Linux community. Will telemetry data be worth the trust we’re losing as a distro? Do we want to be in a similar hole as CentOS, which is still dealing with passionate complaints, misconceptions, and loss of trust two years after Stream was introduced? It’s critical that we protect the reputation of Fedora Linux, or the complaints and badgering we will get will discourage users and contributors for years to come.

10 Likes

Even as one who has discontinued all personal use of Fedora Linux based more on some of the responses here than the actual proposal itself, I wouldn’t put Fedora or even Red Hat in any kind of “devil tier” - that’s a bit over the top…

1 Like