This doesn’t give useful information, because “package download” and “package use” are very different. Package downloads tend to be from container building, CI systems, etc. to a point that overwhelms anything else.
The commitment to transparency, community oversight and user control is laudable and strikes me as a significant improvement over the original proposal – thank you! That said, looking at the existing metrics-to-collect.md (bfb78a4) has not inspired confidence in me: the level of detail collected from users appears incommensurate with the level of detail provided about why this information will be collected and how it will be used. Moreover, I’ve come to think that a single switch may be too coarse both for the proposed SIG[1] and on an individual-user level.
Giving this some thought, I think there are elements of the Firefox Studies model that might be worth adopting. Study proposals in the form of a Product Hypothesis Document specify not just the set of data to collect, but a purpose for that data and limits on the use of that data. That Firefox Studies are just addons that can be installed or uninstalled at will by users makes PHDs the basis for both oversight bodies and users to approve/accept data collection not just in the abstract but for a defined purpose and use.[2]
If similar ideas could be adopted into a framework for telemetry in Fedora, I think not only could it substantially enhance transparency, community oversight and user choice, but also provide flexibility for study proposals that might be too controversial to be acceptable to everyone but may nonetheless yield useful results from a subset of users who decide to opt-in (I’m thinking somewhat along the lines of Mozilla Rally, but maybe that’s not comparable).
I’m aware that this is rather pie-in-the-sky, particularly given the current FESCo vote and the substantial additional engineering and governance work. I hope it’s at least food for thought.
We already have crash counts for Fedora packages, but I fear we mostly ignore them rather than using them to improve Fedora as we should be. That should be a warning sign to ourselves, I suppose. As for why we would want to collect data on third party apps, I think we usually wouldn’t, but occasionally it might be useful to answer some particular question.
Hi, I actually agree. That’s a huge list and we shouldn’t collect nearly that much IMO. And what data we do collect needs to have a clear, descriptive justification.
The goal with that list was to be as transparent as possible with our brainstorming. I envision having a separate discussion thread each time somebody wants to collect a metric, to justify the collection and make sure we agree it won’t be possible to collect personal data by mistake.
I’m skeptical of this strategy, though. For one thing, it sounds complicated, and I frankly personally don’t want to work on it. But even if we had a volunteer to work on this, it sounds kind of like a sophisticated survey, where I expect participation would be very low. Sometimes it’s good to get detailed data from a few willing users, but what we actually want for Fedora is non-detailed data from as many users as possible.
Not my intent. What I promised in the first version of the change proposal was that approval of the change proposal would not imply approval to collect any particular metrics. I would say that no metrics are approved yet and we still need to discuss and debate each one. (Although the new change proposal doesn’t mention this, I don’t see any reason to change my original plan.)
As someone who was pretty passionate about the nitty-gritty of how the opt-in/opt-out mechanism was implemented…I think it should be an enormous confidence-booster for the community as a whole that @catanzaro (and others, I’m sure)…
- Was willing to passionately defend his original point of view, and also make compromises based on community feedback
- Prioritized good-and-doable over perfect (at least as far as what he would want to see/use) in those revisions
- Is still engaging with folks even after the proposal has been moved along to FESCo for voting
I’m no inherent fan of any particular corporation, but I would hope that folks who run around spreading FUD about the Fedora Project and the Red Hat relationship take the time to read the back-and-forth here and see how effective the Fedora governance model really has been here.
In that case I suggest giving a separate option to handle usage metrics from 3rd party apps.
I for one am OK with Fedora’s preinstalled packages reporting basic things, like an anonymized count of installed extensions, whether user themes are enabled, or how many workspaces are active on average.
However, I will highly oppose if the metrics included realtime info like what exact extensions are used when, what 3rd party applications are used when and even what applications are grouped together… I don’t think that kind of granular data collection is needed for improvement and it borders on Windows like telemetry collection, and I personally will not opt in to such a system.
Hey, saw that we were mentioned so I thought I’d provide some thoughts.
I’m not shocked by the idea of punting this to downstream, but I think Fedora can benefit from metrics as well. I don’t see any glaring issues with this proposal as it is opt-in, although I do believe granular control of what’s shared is in the best interest of users. A way to see a sample of what’s sent before opting in may also help with comfort around this. I think this proposal is overall better than the previous one.
We have been wanting to add opt-neutral (the user has to make a choice) analytics for Ultramarine for quite a while.
We wanted to offer something kinda like KDE where the user selects exactly how much they are willing to share.
We have not yet discussed how we will handle Fedora’s collection in Ultramarine. We want to continue the status quo (of sorts) of sharing with upstream (we do this with DNF counting right now). We’re not sure if this proposal will line up with our implementation’s timeline, but it might be more feasible to simply share the data we collect directly with upstream (and potentially the public).
A much better proposal than the one before it. The people in this thread already told their concerns, and I’m not against it (the proposal) anymore.
My only concern would be performance. I know it’s impossible to know how it would affect system performance with only words being thrown around and no testing ground, but is there an idea of how much it would impact the system? Particularly one that uses an HDD?
I’m asking this more for myself than others, considering that the hardware I have is old. Would these privacy-preserving metrics impact performance significantly? Would it do constant writes to an HDD or significantly impact CPU (most likely not, but just making sure)? Or even speed?
Used to be against the first version of the proposal, but seeing the amount of effort the people behind it put in to address the concerns of the community at large goes to show that the Fedora developers and leaders really care for it and the community around it! Awesome work, distribution and community!
There should be no significant impact unless we seriously mess up. No, it won’t constantly write to your disk.
I think it’s time for a storage upgrade.
I started using Red Hat way back in late 1996, at about the same time that I found FreeBSD. I preferred Linux over BSD and have stayed with Linux since then, on a personal level. I’ve come to prefer OpenBSD on a professional, server/database level.
Over the years, I’ve left Red Hat and have come back to it at different points of the OS iteration. Recently, I’ve wanted to try the Fedora Silverblue immutable OS. However, as I now do, I check to see if there is unwanted telemetry in any of my software and, if so, can I disable it or remove it completely. I prefer having no telemetry at all; outside of that, using Wireshark and other tools available to me, I test to see if any communication is happening outside of my control.
I’ve read the old 500+ thread post from top to bottom and have been reading this post the last 1+ hours. There are points that need to be addressed that I feel have not been when considering such a proposal.
Before I list them, let me give a little background from my perspective. First and foremost, I’m an IT professional with a background in military, oil, medical and private-sector experience. Outside of my technology background, I worked in non-profits as an executive director providing international government-to-government relations guidance to organizations and NGOs. I have worked closely with Fortune 500 companies in developing security strategies to protect classified and proprietary information; to do so, I had to get extremely familiar with statistical analysis and mitigation techniques.
Here are some issues that I believe need to be considered in no particular order (this is from my experience and perspective only, I cannot speak about anyone else’s):
- The data the proposal would collect can never be truly anonymized. My system is unique, in terms of hardware and software installation. A profile of what I do on my system can certainly be built based on the proposed data collection. Such a profile may not specifically inform any recipients of such data about who I am (such as name, address, etc.); however, the data can certainly inform people about why I am using the system. For example, I have a nice CPU and GPU to meet my workload; I install DaVinci Resolve, GIMP, Inkscape, OBS, et al. and do not have Steam, Lutris, POL or other gaming software installed, so it can be deduced that I am using this system for media content. Tie that to the hardware I currently have and you can create a unique profile that can be used for whatever purposes Fedora/Red Hat/IBM deem necessary.
- Third party telemetry breaks the Zero Trust Model. That’s why Microsoft offers Windows Enterprise. I had a 6 month email conversation with, at first, Microsoft Support and then Microsoft Legal about unauthorized access to US HIPAA electronic Protected Health Information through Windows 10 Home/Pro/Education telemetry. Their recommendation was to use Windows Enterprise. For Fedora, perhaps the organization will push people towards Red Hat; that’s for them to decide. Yet, many of my clients have installed Fedora in their businesses, on my recommendation and their evaluation process, with the understanding that access to proprietary and covered medical information is limited. Here’s the problem I see, currently, with the proposed changes: I have to trust that Fedora/Red Hat/IBM will never change what is collected, when and how often; I would have to put systems in place to validate any communication from a desktop to Fedora’s server contained precisely, and only, what is being proposed. “Trust” being the key word; in a Zero Trust model, I would not recommend Fedora installation.
- And here is where I have experience that directly relates to Zero Trust, telemetry and government entities. Fedora, Red Hat and IBM all exist under a government’s jurisdiction; all entities do, both corporate and organic; however, focusing only on Fedora and the proposed changes, it can be stated that regardless of jurisdiction, an organization can be forced to clandestinely do certain things or include certain changes without notifying users. A somewhat recent example would be the Lavabit debacle where the owners of Lavabit closed the business rather than comply with the US government dictates. What if the company chose to remain in operation? The impact would have been devastating to people like me who needed secured communication (specifically, ePHI, e.g., unauthorized access) to ensure that I comply with the law. I spoke with legal counsel and this could have opened up the company for whom I was then working to significant legal and civil penalties. Fedora is proposing to implement significant data collection that some governments might not be able to resist getting and who might make a classified request to keep IP addresses, hardware MACs or what ever else data can be scraped with what is already being proposed. This is a non-issue if the proposed changes are not implemented.
To those who are proposing to create this and to those considering approving the proposal, please consider that there are legitimate reasons to have both privacy and anonymity. Fedora would, by necessity of the communications medium, get my IP address when the telemetry is submitted. If this collection happens only once, then that may not be able to show patterns; however, if the telemetry is collected again, say at a conference in a different city, then a pattern can start to be formed about who I am and what I am doing. And if it happens a third time, it becomes almost trivial to match travel information, hardware specifics and installed software packages with usage statistics. All this without considering the benefits that AI-based analysis could provide (again, if a person thinks that top secret programs are using public-facing AI technologies, then I would argue that such a person has absolutely no understanding of access clearances and technology development).
To be clear, this makes it easier to identify individuals, such as journalists covering stories that governments, organizations and/or entities do not want made public; follow activists around and identify those with whom they associate and what they are doing; and/or target a specific group of people that may not be in social or legal favor at any given time. And, if you think for one second that government agencies do not do this already, and that they will not use such a valuable tool as telemetry collected through proposals such as yours, which did not exist before, then I know, for a fact, that you are sadly mistaken.
Please consider all use-case scenarios and how your tools can be used for less than ethical and moral purposes. Will not implementing this proposal cease the ability to track and possibly persecute individuals? Absolutely not; however, it is my belief that it will certainly make interested 3rd party organizations’ jobs easier, especially if they have state-level authority to force compliance.
I’m wondering about specific scenarios where this could fix something?
This (GNOME log-in failure) is still a thing into F41 beta, and according to a unique error message, has been happening across various distros for over a year. I’m certain it’s known, and I plain-texted my experience with it on Fedora months ago, but I still see it, implying it’s either not a priority or unfixable (potentially without reverting other “improvements” that presumably have higher priority to keep). Problem Reporter doesn’t catch that specific issue to report conveniently, but I trust someone has written a proper bug report about it by now. At the very least I did a lot of guesswork (but only tamed it recently with auto-login, which I normally never use).
If metrics got that error message, will that get it prioritized to be fixed over everyone who’s mentioned it? And if so, why would that override everyone who mentioned it prior to the metrics?
If I choose to opt-into metrics, I want it to improve my experience. However most of the relatively-minor issues I ran into on Linux could be seen from just using the distro casually or probably found with openQA tests; implying that known-issues from devs/QA/higher-ups get pushed in regardless. So if metrics are repeating the same known-issues, what good is that?
I like the idea of metrics for helping out smaller projects. Fedora and RH aren’t exactly small, and people are paid to provide a good OS to end-users. RH (as I understand) largely funds GNOME development.
I still don’t understand why removing a bunch of Weather locations to force-push an updated library is beneficial; I can’t select my city any more, nor anything within about 30 miles and several towns over. GNOME devs have their (imo less-than-satisfactory) reasons. Fedora’s still shipping Weather, and it leaves a permanent notification in the shade to choose a location out of the box. Ubuntu and openSUSE don’t ship Weather with GNOME. I have to imagine someone in that devs/QA/higher-ups group has to have also seen this (since 2022), and in my case it’s bad QA (shipping a known-broken app with a notification). I don’t believe metrics can help with this.
I don’t know if I’ll opt-into metrics, but I’m also not confident if it would really help improve my experience on Linux. The points others mentioned about privacy and identifiable metadata lean me more towards not opting-in.
In lieu of that, though, I tolerate minor issues and report more-annoying ones publicly where convenient. Recently trying out Problem Reporter for filling out Bugzilla reports with details, logs, dumps, and all that was pretty awesome! (I very likely wouldn’t be willing to do all that manually without motivation and would probably just report it randomly in a forum.)
I also suspect that eventually, opt-in will turn to opt-out. If by then I still see things like the above, I wouldn’t be too happy about that.
Real-people behind QA/beta testing should be catching obvious stuff before it reaches end-users to be later reported in metrics, and leaving easy-bugs to automated metrics feels like a disconnect from what’s supposed to be a user-orientated distro. Real-people not quality-assuring a real-person desktop experience sounds odd, and implies a lack-of care (if real-people don’t care enough to fix stuff before pushing it, I’d question why I’m trusting that distro to start with).
It’s like the claims about Microsoft firing their QA dept with W11 and leaving all reporting to end-users to do. In this case:
- I don’t like the idea of a lack-of paid-QA
- I kind-of like the idea of telemetry being not public and only in MS’s hands with their employment policies
- I kind-of like the idea of forcing users to report issues/suggestions on the OS they’re using
To me that implies: Without employed-QA, MS forces people to have Windows telemetry in order to get real-user metrics, locks the data down to encourage user-trust, and because it’s a paid OS, that comes with expectations of it working good and MS needs to maintain that. MS has to act on user metric data, and has the user-base to get a lot of metrics.
My two main issues with W11 were a lack-of taskbar app un-grouping and not being able to hide the clock in taskbar. I mentioned the clock thing to MS via their official app, and I’m sure a tidal-wave of people made the taskbar un-group dissatisfaction loud-and-clear. At some point (up to 23H2?), both issues got resolved! (can un-group taskbar items and hide the clock).
With this, whatever telemetry MS collects on Windows (that I mostly opt out of, mainly just because), other people, and an official suggestion report got more tangible results than I got with the above on Linux.
When I wrote the first version of this post I hadn’t really read the story to its full extent. I saw something and immediately feared the worst, something I do too often. I know that. My apologies for that.
I have now read the story and there is one change I would like to propose and it’s the following.
Is it possible to have only the program that asks whether you want to participate in the default setup? When somebody answers with Yes, I do want to participate, the rest of the software is installed and things start.
That way people, like me, who don’t want to join don’t have this software on their disk. Yes, it can be uninstalled but I feel safer when it never existed on my disk. The question program can be uninstalled by the user should they want to.
This is only possible if Fedora’s infrastructure is malicious, though. We’ve already promised that metrics are recorded separately, so it’s not possible to build user profiles unless we’re maliciously running different server software than the open source metrics server.
If you don’t trust us at Fedora to be non-malicious, then I suggest not enabling the metrics collection in the first place. That said, we are looking into a couple of different ways to remove the need for even this much trust in the future. E.g. Firefox uses Divvi Up, in which multiple independently-operated servers would have to maliciously collude instead of just one.
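To make the "multiple servers would have to collude" point concrete, here is a minimal sketch of additive secret sharing, the general idea behind split-trust aggregation. It is not the actual Divvi Up protocol or the eos-metrics code; the function names, the two-server setup and the sample report values are all assumptions made for illustration.

```python
import secrets

# Arithmetic is done modulo a prime larger than any total we expect to see.
PRIME = 2**61 - 1


def split_into_shares(value: int, num_servers: int = 2) -> list[int]:
    """Split value into random-looking shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(num_servers - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def aggregate(shares_seen_by_one_server: list[int]) -> int:
    """One server adds up the shares it received; each share alone reveals nothing."""
    return sum(shares_seen_by_one_server) % PRIME


def combine(server_totals: list[int]) -> int:
    """Only the per-server totals are combined, yielding the overall sum."""
    return sum(server_totals) % PRIME


# Hypothetical per-user counts reported by four clients.
reports = [3, 0, 7, 1]
shared = [split_into_shares(r) for r in reports]
server_a_total = aggregate([s[0] for s in shared])
server_b_total = aggregate([s[1] for s in shared])
total = combine([server_a_total, server_b_total])
print(total)  # 11
```

Neither server can recover an individual report from the shares it holds; only if both operators pooled their raw shares could they reconstruct per-user values.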
You should probably just not enable the metrics collection? Possibly uninstall eos-event-recorder-daemon, so it’s not even possible for users to enable it?
We will be regularly changing what data gets collected (with community approval, of course!); there’s no promise to not do that.
OK, but there’s a lot that seems weird here:
- Presumably no government is going to be interested in issuing a secret warrant for the little data we collect, so for such a warrant to be scary, it would have to compel us to secretly collect additional data. The secret warrant would have to compel software engineers to build software that doesn’t already exist. I’m not aware of precedent for this in western countries. Usually warrants can only compel you to reveal data that you already possess. (Except telecommunications companies do have to ensure wiretaps are possible.)
- But if the secret warrant really does compel us to collect extra data, well, Fedora is open source. Presumably somebody is going to notice if we start adding a bunch of sketchy data collection? If I was personally compelled to add additional metrics collection, I would do so as loudly and incompetently as possible to maximize the chances that users will notice the problem. To actually succeed at secret data collection, I think we would have to distribute software that doesn’t correspond to the open source code.
We do have room for improvement:
- On the client side: there’s been little effort to achieve reproducible builds for Fedora. When a build is reproducible, users can audit that the build shipped by Fedora is actually built from the correct sources and has not been clandestinely modified. Most of Debian is reproducible already, and the remaining exceptions are treated as bugs. But Fedora has not been working on reproducibility at all.
- On the server side: yes, the eos-metrics server operator is vulnerable to a secret warrant. But if we were to switch to a protocol like Divvi Up, then multiple secret warrants would be required for each server operator. Presumably we would want to locate the servers in different legal jurisdictions.
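For the client-side point, the audit a reproducible build allows boils down to comparing the artifact you were shipped against one you rebuilt yourself from the published sources. A minimal sketch follows, assuming bit-for-bit reproducibility; the file names are hypothetical, and real packages carry signatures and build metadata that would likely need to be accounted for before a raw comparison like this is meaningful.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(distributed: Path, rebuilt: Path) -> bool:
    """True if the shipped artifact is byte-identical to the local rebuild."""
    return sha256_of_file(distributed) == sha256_of_file(rebuilt)


# Demonstration with stand-in files (hypothetical names, not real packages).
with tempfile.TemporaryDirectory() as tmp:
    shipped = Path(tmp) / "shipped-artifact.bin"
    rebuilt = Path(tmp) / "rebuilt-artifact.bin"
    shipped.write_bytes(b"identical build output")
    rebuilt.write_bytes(b"identical build output")
    same = builds_match(shipped, rebuilt)

    rebuilt.write_bytes(b"output with something extra")
    differs = not builds_match(shipped, rebuilt)

print(same, differs)  # True True
```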
We aren’t going to be collecting error messages via this system, at least not in an uncontrolled manner, since there’s a high risk that error messages could contain personal data. We already have separate error reporting via ABRT, which is frankly quite neglected.
Metrics collection won’t fix neglect of particular components. We’re short-staffed, and some projects don’t have enough maintainers, leaving issue trackers neglected. Unfortunately not many contributors are interested in maintaining core components like gdm. Same answer applies to problems with Weather. (The good news is Fedora is much less short-staffed than any competing Linux distribution.)
Metrics collection can help improve components that are not short-staffed, though.
Theoretically, yes it’s possible, except for immutable distros like Silverblue/Kinoite.
In practice, it’s too much effort (it just took us over 5 years to autoinstall openh264, which was critical priority!) and there’s no value in doing so, since it’s harmless when not enabled. But you can uninstall eos-event-recorder-daemon (the component which submits metrics to the server) if desired.
Thank you for the response.
You won’t have to add any extra metrics, as what has been proposed to be included is sufficient; also, you will get IP addresses because the communication has to come from somewhere going to another place (my system to Fedora’s servers). I have to trust that this information will be scrubbed instead of processed on another server behind Fedora’s firewalls.
My experience echoes history in that I know governments build profiles on people that they consider either people of interest or enemies, for whatever reason. Will these profiles continue to be built outside of Fedora’s initiative? Absolutely. Will Fedora’s information proposed to be collected be unique information that was not previously captured? Possibly, and probably.
Many activists I’ve met around the world use Linux precisely to avoid intrusive metrics. As you have said, one can just uninstall it. Even then, though, I would have to continually check whether it gets reinstalled every single time an update or upgrade happens. I have to trust Fedora.
I have to trust Fedora that it will not keep identifiable data that’s already part of the telemetry being collected and part of the natural communication of that telemetry. I have to trust Fedora will not make it a hard requirement. And, I have to trust Fedora that you will not implement the GPS portion of the EndlessOS which can narrow down to 1° of geographic certainty. Can it be changed to be more specific? Yes. Will it be changed? That remains to be seen.
This was just food for thought as you decide what telemetry to include. If you truly believe that the world is becoming more free and less restrictive, then that worldview will drive your development. However, I would point out all the arrests being made just for liking or re-posting memes. (e.g., Maine trooper says he was retaliated against for reporting illegal police surveillance of citizens).
Again, I’m not saying that Fedora’s telemetry will be the impetus for mass surveillance, which is already happening. What I am suggesting, however, is that it will provide another source of data collection that could be used immorally and/or unethically. If the telemetry did not exist, at all, then it would be a moot point. It would be a non-issue. Since it is going to exist, in some form, a lot of trust is being placed on Fedora’s shoulders to do what they say they will do.
If you mistrust the integrity of Fedora Project products to this degree, then I would suggest you should have much different and much bigger concerns than this openly discussed, openly developed usage data collection.
Have you inspected and built from source the code that is used to encrypt private information on your desktop, to make sure that the Fedora Project hasn’t built in, or been compromised by someone who has installed, a backdoor for government agents to use in collecting your information? Have you done the same for all other elements of the operating system to make sure that you’re not exposed to nefarious actions?
Sorry, I’m being a bit absurd there just to illustrate the point that if you don’t trust what’s being done in the open that you can inspect, then you should already distrust what is being done off of your device that you can’t inspect, and if your personal threat model requires that level of micro-inspection and micro-control over data flow, then you might find other projects more fitting to those specific needs.
There are tools available to exclude packages from future installation, like the excludepkgs option.
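As a sketch of what that could look like, the `excludepkgs` option goes in the `[main]` section of `/etc/dnf/dnf.conf`. The package name below is the one mentioned earlier in this thread; whether that is the exact name Fedora would ship it under is an assumption.

```ini
# /etc/dnf/dnf.conf
[main]
# Never install or upgrade to this package, even if something pulls it in.
# Package name taken from this thread; adjust if Fedora packages it differently.
excludepkgs=eos-event-recorder-daemon
```

This only affects DNF transactions, so it would not apply in the same way to image-based variants like Silverblue.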
It’s okay to be absurd to make a point. As a Unix admin, I absolutely make sure about every single package I install; however, on Linux, ever since about 2005, I go on trust.
There’s a difference between the scenarios you’re talking about. One is package development and installation; luckily, open source advocates catch these… in time. The Bash Shellshock bug is one that was in the system for years; the latest fiasco with XZ caught a lot of distros by surprise.
But, telemetry is a different beast altogether. Again, Fedora will have my IP address and I will have to take their word that it is not kept; they will have my hardware information (MAC included? Who knows?) - CPU, GPU, number of hard drives, how many times I use a program, how long an app is in use, et al. This is personal information being sent from my machine to a server where I have to trust that the organization receiving it will handle it as they say.
To provide a solution for this, perhaps engaging an established 3rd party organization to independently verify that information is handled as stated, encrypted and not kept indefinitely.
I don’t see any downside to this, especially if it helps make a better product. I would be willing to contribute given the ability to opt in or out at any given point.
I wonder if an implementation can be considered where the user action of opting in to telemetry would trigger the installation of the telemetry packages with the following system update (dnf upgrade or via GNOME Software). Not having the necessary packages on the installation media and on the newly installed systems by default might raise the confidence in the mechanism to a level that is acceptable even by those against the proposal.
@catanzaro I just want to be clear that I appreciate not only your approach to this, but Fedora’s willingness to engage the community. There are too many instances of different organizations in the FOSS world where that is not the case. It instills more trust in the system and process, at least on my end. Thank you.
There are still multiple issues I have with this proposal:
- The list of metrics you want to collect is already far too extensive in the first place (List). I mean you want to know things like my partitioning scheme, the number of times each app launched and how many world clocks I added, like wtf? If you plan to regularly change them, this MUST require users to give consent AGAIN. You can’t simply implement telemetry, obtain initial consent, and then change what users have agreed to without explicitly asking for consent again.
- The eos-event-recorder-daemon should not be installed by default if possible. Instead, it should be installed only after the user has provided their consent. Keeping this daemon installed by default poses a security risk, as it gives the OS itself the ability to silently collect and send data, which could be exploited in a malicious scenario.
- Users should be able to adjust the level of telemetry data being sent, similar to how it’s done in KDE. KDE gives users a slider for how much telemetry they want to share, from minimal data to more detailed usage statistics, which is still way less than this crazy list you shared.
Which adds onto my other post as to what this information is supposed to aid?
Fedora knowingly defaults to Btrfs non-Server (at least with Workstation), and I’m hoping does all the QA to make sure it doesn’t break when defaulted to end-users.
I’ve never used that default and always do Custom Partitioning/no LVM, standard traditional layouts ext4/XFS/F2FS. I highly suspect no defaults are being changed based on that, so why does someone want to know about it?