F42 Change Proposal: Opt-In Metrics for Fedora Workstation (system-wide)

mattdm · July 18, 2024, 9:47pm

This doesn’t give useful information, because “package download” and “package use” are very different. Package downloads tend to be from container building, CI systems, etc. to a point that overwhelms anything else.

bob131 · July 19, 2024, 12:40am

The commitment to transparency, community oversight and user control is laudable and strikes me as a significant improvement over the original proposal – thank you! That said, looking at the existing metrics-to-collect.md (bfb78a4) has not inspired confidence in me: the level of detail collected from users appears incommensurate with the level of detail provided about why this information will be collected and how it will be used. Moreover, I’ve come to think that a single switch may be too coarse both for the proposed SIG[1] and on an individual-user level.

Giving this some thought, I think there are elements of the Firefox Studies model that might be worth adopting. Study proposals in the form of a Product Hypothesis Document (PHD) specify not just the set of data to collect, but a purpose for that data and limits on its use. Because Firefox Studies are just add-ons that users can install or uninstall at will, PHDs become the basis on which both oversight bodies and users approve or accept data collection, not just in the abstract but for a defined purpose and use.[2]

If similar ideas could be adopted into a framework for telemetry in Fedora, I think it could not only substantially enhance transparency, community oversight and user choice, but also provide flexibility for study proposals that might be too controversial to be acceptable to everyone yet could still yield useful results from a subset of users who decide to opt in (I’m thinking somewhat along the lines of Mozilla Rally, but maybe that’s not comparable).

I’m aware that this is rather pie-in-the-sky, particularly given the current FESCo vote and the substantial additional engineering and governance work. I hope it’s at least food for thought.

[1] The current metrics-privacy-transparency.md seems to imply that the SIG’s approval of the proposed initial set of metrics as-is is a non-negotiable part of this change.
[2] Putting aside for the moment that Firefox Studies has a mechanism for pushing opt-out studies to users.

catanzaro · July 19, 2024, 5:03am

We already have crash counts for Fedora packages, but I fear we mostly ignore them rather than using them to improve Fedora as we should be. That should be a warning sign to ourselves, I suppose. As for why we would want to collect data on third party apps, I think we usually wouldn’t, but occasionally it might be useful to answer some particular question.

Hi, I actually agree. That’s a huge list and we shouldn’t collect nearly that much IMO. And what data we do collect needs to have a clear, descriptive justification.

The goal with that list was to be as transparent as possible with our brainstorming. I envision having a separate discussion thread each time somebody wants to collect a metric, to justify the collection and make sure we agree it won’t be possible to collect personal data by mistake.

bob131:

Moreover, I’ve come to think that a single switch may be too coarse both for the proposed SIG[1] and on an individual-user level.

Giving this some thought, I think there are elements of the Firefox Studies model that might be worth adopting. Study proposals in the form of a Product Hypothesis Document (PHD) specify not just the set of data to collect, but a purpose for that data and limits on its use. Because Firefox Studies are just add-ons that users can install or uninstall at will, PHDs become the basis on which both oversight bodies and users approve or accept data collection, not just in the abstract but for a defined purpose and use.[2]

If similar ideas could be adopted into a framework for telemetry in Fedora, I think it could not only substantially enhance transparency, community oversight and user choice, but also provide flexibility for study proposals that might be too controversial to be acceptable to everyone yet could still yield useful results from a subset of users who decide to opt in (I’m thinking somewhat along the lines of Mozilla Rally, but maybe that’s not comparable).

I’m aware that this is rather pie-in-the-sky, particularly given the current FESCo vote and the substantial additional engineering and governance work. I hope it’s at least food for thought.

I’m skeptical of this strategy, though. For one thing, it sounds complicated, and frankly, I personally don’t want to work on it. But even if we had a volunteer to work on this, it sounds kind of like a sophisticated survey, where I expect participation would be very low. Sometimes it’s good to get detailed data from a few willing users, but what we actually want for Fedora is non-detailed data from as many users as possible.

Not my intent. What I promised in the first version of the change proposal was that approval of the change proposal would not imply approval to collect any particular metrics. I would say that no metrics are approved yet and we still need to discuss and debate each one. (Although the new change proposal doesn’t mention this, I don’t see any reason to change my original plan.)

johnandmegh · July 23, 2024, 1:15am

As someone who was pretty passionate about the nitty-gritty of how the opt-in/opt-out mechanism was implemented…I think it should be an enormous confidence-booster for the community as a whole that @catanzaro (and others, I’m sure)…

Was willing to passionately defend his original point of view, and also make compromises based on community feedback
Prioritized good-and-doable over perfect (at least as far as what he would want to see/use) in those revisions
Is still engaging with folks even after the proposal has been moved along to FESCo for voting

I’m no inherent fan of any particular corporation, but I would hope that folks who run around spreading FUD about the Fedora Project and the Red Hat relationship take the time to read the back-and-forth here and see how effective the Fedora governance model really has been here.

someguy · July 23, 2024, 5:23am

In that case, I suggest giving a separate option to handle usage metrics from third-party apps.

I for one am OK with Fedora’s preinstalled packages sending out basic things, like an anonymized count of installed extensions, whether user themes are enabled, or how many workspaces are active on average.

However, I would strongly oppose metrics that include realtime info like which exact extensions are used when, which third-party applications are used when, and even which applications are grouped together… I don’t think that kind of granular data collection is needed for improvement; it borders on Windows-like telemetry collection, and I personally will not opt in to such a system.

jaidenriordan · July 25, 2024, 5:08pm

Hey, saw that we were mentioned so I thought I’d provide some thoughts.

I’m not shocked by the idea of punting this to downstream, but I think Fedora can benefit from metrics as well. I don’t see any glaring issues with this proposal as it is opt-in, although I do believe granular control of what’s shared is in the best interest of users. A way to see a sample of what’s sent before opting in may also help with comfort around this. I think this proposal is overall better than the previous one.

We have been wanting to add opt-neutral (the user has to make a choice) analytics for Ultramarine for quite a while.

We wanted to offer something kinda like KDE where the user selects exactly how much they are willing to share.

We have not yet discussed how we will handle Fedora’s collection in Ultramarine. We want to continue the status quo (of sorts) of sharing with upstream (we do this with DNF counting right now). We’re not sure if this proposal will line up with our implementation’s timeline, but it might be more feasible to simply share the data we collect directly with upstream (and potentially the public).

qvest · July 26, 2024, 6:47pm

A much better proposal than the one before it. The people in this thread have already voiced their concerns, and I’m no longer against the proposal.

My only concern would be performance. I know it’s impossible to know how it would affect system performance with only words being thrown around and no testing ground, but is there an idea of how much it would impact the system? Particularly one that uses an HDD?

I’m asking this more for myself than for others, considering that my hardware is old. Would these privacy-preserving metrics impact performance significantly? Would it do constant writes to an HDD or significantly impact the CPU (most likely not, but just making sure)? Or even speed?

Used to be against the first version of the proposal, but seeing the amount of effort the people behind it put in to address the concerns of the community at large goes to show that the Fedora developers and leaders really care for it and the community around it! Awesome work, distribution and community!

catanzaro · July 26, 2024, 9:39pm

There should be no significant impact unless we seriously mess up. No, it won’t constantly write to your disk.

I think it’s time for a storage upgrade.

marshall99 · September 28, 2024, 10:21pm

I started using Red Hat way back in late 1996, at about the same time that I found FreeBSD. I preferred Linux over BSD and have stayed with Linux since then, on a personal level. I’ve come to prefer OpenBSD on a professional, server/database level.

Over the years, I’ve left Red Hat and come back to it at different points in the OS’s evolution. Recently, I’ve wanted to try the Fedora Silverblue immutable OS. However, as I now always do, I checked whether there is unwanted telemetry in any of my software and, if so, whether I can disable or remove it completely. I prefer having no telemetry at all; beyond that, using Wireshark and other tools available to me, I test whether any communication is happening outside of my control.

I’ve read the old thread of 500+ posts from top to bottom and have spent the last hour or more reading this one. There are points that I feel have not been addressed and need to be when considering such a proposal.

Before I list them, let me give a little background from my perspective. First and foremost, I’m an IT professional with a background in military, oil, medical and private-sector experience. Outside of my technology background, I worked in non-profits as an executive director providing international government-to-government relations guidance to organizations and NGOs. I have worked closely with Fortune 500 companies in developing security strategies to protect classified and proprietary information; to do so, I had to get extremely familiar with statistical analysis and mitigation techniques.

Here are some issues that I believe need to be considered in no particular order (this is from my experience and perspective only, I cannot speak about anyone else’s):

The data the proposal would collect can never be truly anonymized. My system is unique in terms of hardware and software installation, and a profile of what I do on it can certainly be built from the proposed data collection. Such a profile may not tell recipients of the data who I am (name, address, etc.); however, it can certainly tell them why I am using the system. For example, I have a nice CPU and GPU to meet my workload; if I install DaVinci Resolve, GIMP, Inkscape, OBS, et al. and have no Steam, Lutris, POL or other gaming software installed, it can be deduced that I am using the system for media content. Tie that to the hardware I currently have and you can create a unique profile that can be used for whatever purposes Fedora/Red Hat/IBM deem necessary.
Third-party telemetry breaks the Zero Trust Model. That’s why Microsoft offers Windows Enterprise. I had a six-month email conversation with, at first, Microsoft Support and then Microsoft Legal about unauthorized access to US HIPAA electronic Protected Health Information through Windows 10 Home/Pro/Education telemetry. Their recommendation was to use Windows Enterprise. For Fedora, perhaps the organization will push people towards Red Hat; that’s for them to decide. Yet many of my clients have installed Fedora in their businesses, on my recommendation and after their own evaluation process, with the understanding that access to proprietary and covered medical information is limited. Here’s the problem I see with the proposed changes as they stand: I have to trust that Fedora/Red Hat/IBM will never change what is collected, when, and how often; I would have to put systems in place to validate that any communication from a desktop to Fedora’s servers contains precisely, and only, what is being proposed. “Trust” is the key word; in a Zero Trust model, I would not recommend installing Fedora.
And here is where I have experience that directly relates to Zero Trust, telemetry and government entities. Fedora, Red Hat and IBM all exist under a government’s jurisdiction, as all entities do, both corporate and organic. Focusing only on Fedora and the proposed changes, it can be stated that, regardless of jurisdiction, an organization can be forced to clandestinely do certain things or include certain changes without notifying users. A somewhat recent example is the Lavabit debacle, where the owners of Lavabit closed the business rather than comply with US government demands. What if the company had chosen to remain in operation? The impact would have been devastating to people like me who needed secure communication (specifically, protecting ePHI against unauthorized access) to comply with the law. I spoke with legal counsel, and this could have exposed the company I was then working for to significant legal and civil penalties. Fedora is proposing to implement significant data collection that some governments might not be able to resist obtaining, and they might make a classified request to keep IP addresses, hardware MACs or whatever other data can be scraped from what is already being proposed. This is a non-issue if the proposed changes are not implemented.
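The re-identification argument above can be made concrete with a k-anonymity-style check: for each report, count how many reports share the same combination of attributes, since a count of 1 means that combination singles out one system. This is a purely illustrative sketch; the records and attribute names are hypothetical, and it assumes all attributes arrive linked together in one report, which is the scenario the concern describes.

```python
from collections import Counter

def anonymity_sets(records, keys):
    """For each record, count how many records share its values for `keys`.

    A count of 1 means the attribute combination is unique, so that record
    can be singled out even though it contains no name or address.
    """
    def combo(record):
        return tuple(record[k] for k in keys)

    counts = Counter(combo(r) for r in records)
    return [counts[combo(r)] for r in records]

# Hypothetical reports, assuming all attributes are linked in one report.
reports = [
    {"gpu": "A", "cpu": "X", "has_resolve": True,  "has_steam": False},
    {"gpu": "A", "cpu": "X", "has_resolve": False, "has_steam": True},
    {"gpu": "A", "cpu": "Y", "has_resolve": False, "has_steam": True},
    {"gpu": "B", "cpu": "Y", "has_resolve": False, "has_steam": True},
]

print(anonymity_sets(reports, ["gpu"]))  # → [3, 3, 3, 1]
print(anonymity_sets(reports, ["gpu", "cpu", "has_resolve", "has_steam"]))  # → [1, 1, 1, 1]
```

Each additional linked attribute shrinks the anonymity sets; recording metrics separately and unlinked, as described later in the thread, is meant to prevent exactly this kind of join.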

To those who are proposing to create this and to those considering approving the proposal, please consider that there are legitimate reasons to have both privacy and anonymity. Fedora would, by necessity of the communications medium, get my IP address when the telemetry is submitted. If this collection happens only once, it may not reveal patterns; however, if telemetry is collected again, say at a conference in a different city, a pattern can start to form about who I am and what I am doing. And if it happens a third time, it becomes almost trivial to match travel information, hardware specifics and installed software packages with usage statistics. All this without considering the benefits that AI-based analysis could provide (again, if a person thinks that top-secret programs are limited to public-facing AI technologies, then I would argue that such a person has absolutely no understanding of access clearances and technology development).

To be clear, this makes it easier to identify individuals, such as journalists covering stories that governments, organizations and/or entities do not want made public; to follow activists around and identify those with whom they associate and what they are doing; and/or to target a specific group of people that may not be in social or legal favor at any given time. And if you think for one second that government agencies do not already do this, and that they will not use a valuable tool such as telemetry collected through proposals like yours, which did not exist before, then I know for a fact that you are sadly mistaken.

Please consider all use-case scenarios and how your tools can be used for less than ethical and moral purposes. Will not implementing this proposal cease the ability to track and possibly persecute individuals? Absolutely not; however, it is my belief that it will certainly make interested 3rd party organizations’ jobs easier, especially if they have state-level authority to force compliance.

jandemus · September 29, 2024, 5:17am

When I wrote the first version of this post, I hadn’t really read the story in full. I saw something and immediately feared the worst, something I do too often. I know that. My apologies for that.
I have now read the story and there is one change I would like to propose and it’s the following.
Would it be possible to include only the program that asks whether you want to participate in the default setup? When somebody answers “Yes, I do want to participate”, the rest of the software is installed and collection starts.
That way, people like me who don’t want to join don’t have this software on their disk. Yes, it can be uninstalled, but I feel safer when it never existed on my disk. The question program can be uninstalled by the user if (s)he wants to.

catanzaro · September 30, 2024, 2:30pm

This is only possible if Fedora’s infrastructure is malicious, though. We’ve already promised that metrics are recorded separately, so it’s not possible to build user profiles unless we’re maliciously running different server software than the open source metrics server.

If you don’t trust Fedora to be non-malicious, then I suggest not enabling the metrics collection in the first place. That said, we are looking into a couple of different ways to remove the need for even this much trust in the future. E.g. Firefox uses Divvi Up, in which multiple independently operated servers would have to maliciously collude instead of just one.
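To illustrate why multiple non-colluding servers reduce the trust required, here is a toy sketch of additive secret sharing, the basic idea behind Prio-style aggregation systems such as Divvi Up. It is a simplification under stated assumptions, not Divvi Up’s protocol or API: real deployments add validity proofs, protocol-defined field parameters and network transport, and the modulus and metric below are hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical field size; real protocols define their own

def split(value, n_servers=2):
    """Split `value` into n random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def aggregate(share_lists):
    """Server i sums the i-th share from every client; each share alone looks random."""
    return [sum(column) % MODULUS for column in zip(*share_lists)]

# Each client reports 1 if, say, some feature is enabled, else 0 (hypothetical metric).
client_values = [1, 0, 1, 1, 0]
client_shares = [split(v) for v in client_values]

# Server i only ever sees client_shares[*][i], so neither server alone learns any value.
server_totals = aggregate(client_shares)

# Only combining both servers' totals reveals the aggregate count, never individual reports.
print(sum(server_totals) % MODULUS)  # → 3
```

Learning an individual user’s value would require both servers to pool their shares, which is the collusion requirement the reply describes.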

You should probably just not enable the metrics collection? Possibly uninstall eos-event-recorder-daemon, so it’s not even possible for users to enable it?

We will be regularly changing what data gets collected (with community approval, of course!); there’s no promise to not do that.

OK, but there’s a lot that seems weird here:

Presumably no government is going to be interested in issuing a secret warrant for the little data we collect, so for such a warrant to be scary, it would have to compel us to secretly collect additional data. The secret warrant would have to compel software engineers to build software that doesn’t already exist. I’m not aware of precedent for this in western countries. Usually warrants can only compel you to reveal data that you already possess. (Except telecommunications companies do have to ensure wiretaps are possible.)
But if the secret warrant really does compel us to collect extra data, well, Fedora is open source. Presumably somebody is going to notice if we start adding a bunch of sketchy data collection? If I was personally compelled to add additional metrics collection, I would do so as loudly and incompetently as possible to maximize the chances that users will notice the problem. To actually succeed at secret data collection, I think we would have to distribute software that doesn’t correspond to the open source code.

We do have room for improvement:

On the client side: there’s been little effort to achieve reproducible builds for Fedora. When a build is reproducible, users can audit that the build shipped by Fedora is actually built from the correct sources and has not been clandestinely modified. Most of Debian is reproducible already, and the remaining exceptions are treated as bugs. But Fedora has not been working on reproducibility at all.
On the server side: yes, the eos-metrics server operator is vulnerable to a secret warrant. But if we were to switch to a protocol like Divvi Up, then multiple secret warrants would be required for each server operator. Presumably we would want to locate the servers in different legal jurisdictions.
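Reproducibility is what makes the client-side audit possible: an independent party rebuilds the package from the published sources and checks that the result is bit-for-bit identical to what was shipped. A minimal sketch of that final comparison step, assuming two local files stand in for the shipped and rebuilt artifacts (the file names are hypothetical):

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def builds_match(shipped, rebuilt):
    """True if the independently rebuilt artifact is bit-for-bit identical to the shipped one."""
    return sha256_of(shipped) == sha256_of(rebuilt)

# Demo: temporary files stand in for the shipped and locally rebuilt packages.
with tempfile.TemporaryDirectory() as d:
    shipped, rebuilt = Path(d, "shipped.rpm"), Path(d, "rebuilt.rpm")
    shipped.write_bytes(b"same build output")
    rebuilt.write_bytes(b"same build output")
    identical = builds_match(shipped, rebuilt)
    rebuilt.write_bytes(b"clandestinely modified output")
    tampered = builds_match(shipped, rebuilt)

print(identical, tampered)  # → True False
```

Real packages may carry signatures or other metadata that has to be accounted for before comparing, so this only shows the principle.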

We aren’t going to be collecting error messages via this system, at least not in an uncontrolled manner, since there’s a high risk that error messages could contain personal data. We already have separate error reporting via ABRT, which is frankly quite neglected.

Metrics collection won’t fix neglect of particular components. We’re short-staffed, and some projects don’t have enough maintainers, leaving issue trackers neglected. Unfortunately not many contributors are interested in maintaining core components like gdm. Same answer applies to problems with Weather. (The good news is Fedora is much less short-staffed than any competing Linux distribution.)

Metrics collection can help improve components that are not short-staffed, though.

Theoretically, yes it’s possible, except for immutable distros like Silverblue/Kinoite.

In practice, it’s too much effort (it just took us over 5 years to autoinstall openh264, which was critical priority!) and there’s no value in doing so, since it’s harmless when not enabled. But you can uninstall eos-event-recorder-daemon (the component which submits metrics to the server) if desired.

marshall99 · September 30, 2024, 9:16pm

Thank you for the response.

You won’t have to add any extra metrics, as what has been proposed is already sufficient; also, you will get IP addresses because the communication has to come from somewhere and go somewhere else (from my system to Fedora’s servers). I have to trust that this information will be scrubbed instead of processed on another server behind Fedora’s firewalls.

My experience echoes history in that I know governments build profiles on people that they consider either people of interest or enemies, for whatever reason. Will these profiles continue to be built outside of Fedora’s initiative? Absolutely. Will Fedora’s information proposed to be collected be unique information that was not previously captured? Possibly, and probably.

Many activists I’ve met around the world use Linux precisely to avoid intrusive metrics. As you have said, one can just uninstall it. Even then, though, I would have to keep checking whether it gets reinstalled every single time an update or upgrade happens. I have to trust Fedora.

I have to trust that Fedora will not keep identifiable data that is already part of the telemetry being collected, or that comes with the natural communication of that telemetry. I have to trust that Fedora will not make it a hard requirement. And I have to trust that you will not implement the GPS portion of Endless OS, which can narrow location down to 1° of geographic certainty. Can it be changed to be more specific? Yes. Will it be changed? That remains to be seen.

This is just food for thought as you decide what telemetry to include. If you truly believe that the world is becoming freer and less restrictive, then that worldview will drive your development. However, I would point out all the arrests being made just for liking or re-posting memes (e.g., Maine trooper says he was retaliated against for reporting illegal police surveillance of citizens).

Again, I’m not saying that Fedora’s telemetry will be the impetus for mass surveillance, which is already happening. What I am suggesting, however, is that it will provide another source of data collection that could be used immorally and/or unethically. If the telemetry did not exist, at all, then it would be a moot point. It would be a non-issue. Since it is going to exist, in some form, a lot of trust is being placed on Fedora’s shoulders to do what they say they will do.

johnandmegh · September 30, 2024, 11:51pm

If you mistrust the integrity of Fedora Project products to this degree, then I would suggest you should have much different and much bigger concerns than this openly discussed, openly developed usage data collection.

Have you inspected and built from source the code that is used to encrypt private information on your desktop, to make sure that the Fedora Project hasn’t built in, or been compromised by someone who has installed, a backdoor for government agents to use in collecting your information? Have you done the same for all other elements of the operating system to make sure that you’re not exposed to nefarious actions?

Sorry, I’m being a bit absurd there just to illustrate the point that if you don’t trust what’s being done in the open that you can inspect, then you should already distrust what is being done off of your device that you can’t inspect, and if your personal threat model requires that level of micro-inspection and micro-control over data flow, then you might find other projects more fitting to those specific needs.

There are tools available to exclude packages from future installation, like DNF’s excludepkgs option.
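For reference, excludepkgs is a list option in the [main] section of /etc/dnf/dnf.conf. The hypothetical helper below sketches how a script might append a package such as eos-event-recorder-daemon to that option; it works on the file’s text rather than touching the system, assumes the file parses as INI, and note that configparser drops comments, so editing the file by hand may be preferable.

```python
import configparser
import io

def add_exclude(conf_text, package):
    """Return dnf.conf text with `package` appended to [main] excludepkgs (hypothetical helper)."""
    cp = configparser.ConfigParser(interpolation=None)
    cp.read_string(conf_text)
    if not cp.has_section("main"):
        cp.add_section("main")
    current = cp.get("main", "excludepkgs", fallback="")
    # dnf.conf list options accept comma- or space-separated values.
    pkgs = [p for p in current.replace(",", " ").split() if p]
    if package not in pkgs:
        pkgs.append(package)
    cp.set("main", "excludepkgs", ",".join(pkgs))
    out = io.StringIO()
    cp.write(out)  # note: comments in the original text are not preserved
    return out.getvalue()

example = "[main]\ngpgcheck=True\ninstallonly_limit=3\n"
print(add_exclude(example, "eos-event-recorder-daemon"))
```

The helper is idempotent: calling it again with the same package leaves the text unchanged.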

marshall99 · October 1, 2024, 12:21am

It’s okay to be absurd to make a point. As a Unix admin, I absolutely vet every single package I install; however, on Linux, ever since about 2005, I have gone on trust.

The scenario you’re describing is different. One is package development and installation, where, luckily, open source advocates catch these problems… in time. The Bash Shellshock bug was in the system for years; the latest fiasco with XZ caught a lot of distros by surprise.

But, telemetry is a different beast altogether. Again, Fedora will have my IP address and I will have to take their word that it is not kept; they will have my hardware information (MAC included? Who knows?) - CPU, GPU, number of hard drives, how many times I use a program, how long an app is in use, et al. This is personal information being sent from my machine to a server where I have to trust that the organization receiving it will handle it as they say.

As a solution, perhaps Fedora could engage an established third-party organization to independently verify that information is handled as stated, encrypted, and not kept indefinitely.

mark180 · October 1, 2024, 12:44am

I don’t see any downside to this, especially if it helps make a better product. I would be willing to contribute given the ability to opt in or out at any given point.

tqcharm · October 1, 2024, 6:04am

I wonder if an implementation could be considered where the user’s action of opting in to telemetry would trigger installation of the telemetry packages with the next system update (dnf upgrade or via GNOME Software). Not having the necessary packages on the installation media and on newly installed systems by default might raise confidence in the mechanism to a level that is acceptable even to those against the proposal.

marshall99 · October 1, 2024, 11:19pm

@catanzaro I just want to be clear that I appreciate not only your approach to this, but Fedora’s willingness to engage the community. There are too many instances of different organizations in the FOSS world where that is not the case. It instills more trust in the system and process, at least on my end. Thank you.

dwaris · October 2, 2024, 3:50pm

There are still multiple issues I have with this proposal:

The list of metrics you want to collect is already far too extensive in the first place (List). I mean, you want to know things like my partitioning scheme, the number of times each app is launched and how many world clocks I added, like wtf? If you plan to change them regularly, this MUST require users to give consent AGAIN. You can’t simply implement telemetry, obtain initial consent, and then change what users have agreed to without explicitly asking for consent again.
The eos-event-recorder-daemon should not be installed by default if possible. Instead, it should be installed only after the user has provided their consent. Keeping this daemon installed by default poses a security risk, as it gives the OS itself the ability to silently collect and send data, which could be exploited in a malicious scenario.
Users should be able to adjust the level of telemetry data being sent, similar to how it’s done in KDE. KDE gives users a slider for how much telemetry they want to share, from minimal data to more detailed usage statistics, which is still way less than this crazy list you shared.
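The re-consent point above can be sketched by fingerprinting the exact set of metrics a user agreed to and prompting again whenever the current set differs. This is a hypothetical design, not how eos-event-recorder-daemon works, and the metric IDs are made up.

```python
import hashlib
import json

def metric_set_digest(metrics):
    """Stable fingerprint of the set of metric IDs the user was shown (order-independent)."""
    canonical = json.dumps(sorted(set(metrics)))
    return hashlib.sha256(canonical.encode()).hexdigest()

def needs_reconsent(consented_digest, current_metrics):
    """True if the metrics now being collected differ from what the user agreed to."""
    return consented_digest != metric_set_digest(current_metrics)

shown = ["app-launch-count", "partition-scheme"]  # hypothetical metric IDs
consent = metric_set_digest(shown)                # stored when the user opts in

print(needs_reconsent(consent, ["partition-scheme", "app-launch-count"]))  # → False
print(needs_reconsent(consent, shown + ["world-clock-count"]))             # → True
```

This sketch asks again on any change, including removals; a policy might reasonably reprompt only when metrics are added.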

beattb · October 2, 2024, 6:26pm

So that they can focus resources on better supporting those non-default setups that actual users actually use (this needs data to evaluate), and on the opposite side of the spectrum, to abandon those nobody actually does use…

Keep in mind, FOSS often suffers from limited resources, so insight into how to better allocate those can be extremely useful and beneficial.

I’m not agreeing to the collection of such data with this post, just trying to give you perspective on how it could indeed be useful for the Fedora project, which is what you asked.

moralcode · October 5, 2024, 7:54pm

Summary of my position: Generally in support. As a maintainer of an Android and iOS mobile application, I find having metrics that I can get from the app store on how many people installed the app, crashes over time per release, and associated stack traces for those crashes INCREDIBLY useful in helping me debug things.

I love how much care and attention has been put into preserving user privacy and choice here.

Maybe this was mentioned earlier (the thread is kinda long), but regarding the reliability of opt-in metrics (I would support the suggestion to call it choice-first metrics if that’s better branding for it): I think from a lay computer user’s perspective, it’s hard to really grasp the value of these metrics. Many people seem to be either in the apathetic/don’t care/accept the default camp, or in the “enthusiast who has seen the many examples of data abuse and wants no part in any of it and will opt out of everything possible” camp.

Providing opt-in-by-default would normally be the way most other orgs would capture this data from the default-accepting crowd, but Fedora wants to hold itself to a higher moral standard than that (praise for that btw).

So instead, in order for a “data only collected with explicit opt-in” model to provide sufficient statistical data for informed and useful developer decisions, I think Fedora needs to go above and beyond at communicating not just the facts (what’s collected, why, etc.), but maybe venture further into communicating the feelings and/or impact that these metrics have. I realize that’s hard without metrics to begin with, but I see two ways to potentially do this:

A slider like KDE’s (I know this was suggested before). I think this could be done in a way that lets users get quite granular, but regardless of how it’s implemented, I think having VERY easy/simple levels with explanations helps a lot. Examples:
1. “Just send a one-time ping to let us know you installed Fedora; this helps our developers and leaders see the growth and feel good about how many people use the project so we can make better decisions for you.”
2. “Click a button to send us a one-time ping to tell us you like Fedora; this sends us even more good vibes.”
3. <from here, slowly add more and more of the metrics from this proposal, least invasive first, giving each a reason that’s both technical and slightly emotional, without going too far (how far that is may be up for debate, per Fedora’s open discussion processes)>
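The tiered idea above can be modeled as ordered levels, where each metric is tagged with the lowest level that permits it and a chosen level enables everything at or below it. A minimal sketch; the level names, metric IDs and mapping are hypothetical, not KDE’s or Fedora’s actual categories.

```python
from enum import IntEnum

class Level(IntEnum):
    """Hypothetical consent levels, ordered from least to most data shared."""
    OFF = 0
    INSTALL_PING = 1   # one-time "Fedora was installed" ping
    BASIC = 2          # coarse, low-risk counts
    DETAILED = 3       # more detailed usage statistics

# Hypothetical mapping of metric IDs to the lowest level that allows them.
METRIC_LEVELS = {
    "install-ping": Level.INSTALL_PING,
    "workspace-count": Level.BASIC,
    "app-launch-count": Level.DETAILED,
}

def allowed_metrics(chosen):
    """Metrics the user's chosen level permits; levels are cumulative."""
    return sorted(m for m, lvl in METRIC_LEVELS.items() if lvl <= chosen)

print(allowed_metrics(Level.OFF))    # → []
print(allowed_metrics(Level.BASIC))  # → ['install-ping', 'workspace-count']
```

Because IntEnum values compare as integers, adding a new metric only requires choosing its level, and every user below that level is unaffected.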

Maybe looking at how Apple does it could provide an example. The apps I maintain are nowhere near as popular as Fedora, and I believe their data collection (at least for the data I see) is opt-in, so clearly whatever they are doing yields statistically significant enough data to be useful to me, and it should scale well to Fedora’s use case.

Edit: I also like the Plasma crash reporter that pops up and says “hey, something just crashed, do you want to collect and submit automatic crash report data?” I think something like that could complement this metrics proposal well, since it would have the same opt-in structure (it only sends data if positive consent is given), but the presence of a report could also serve as a way to count crashes, while also giving devs a way to fix the issues. Of course, for those not comfortable with this reduction in protection against data aggregation/correlation, I imagine there would be a setting to just say “I never want to send crash data”.
