F40 Change Request: Privacy-preserving Telemetry for Fedora Workstation (System-Wide)

2 posts were merged into an existing topic: Opt-in / Opt-Out? A breakout topic for the F40 Change Request on Privacy-preserving telemetry for Fedora Workstation

You’re being too strict. You’re complaining about the case where the setting is in the initial state (user has neither consented nor rejected data collection) and data is collected locally but never uploaded to Fedora. (I know you understand this, but I want to be crystal clear for everybody else who may be skimming this discussion rather than reading your post in detail.) You want us to prominently explain this level of complexity to users and instruct them how to remove all components from the system, even though they are not uploading any data to Fedora, and even though they can be deactivated by flipping a simple switch. It doesn’t seem like a serious request to me.

What I am more willing to do is present the simple switch to users when upgrading from previous versions of Fedora, using gnome-tour. (I had been hoping we could save this for later, though, or even not do it at all and only collect data from fresh installs.)

I mean, you know where the disable switch will be. View it in gnome-control-center and the local collection will be disabled. You won’t even have to flip the switch, since in gnome-control-center the switch will be off by default and just viewing the page will be enough to disable local collection. Switching to another distro to avoid flipping a toggle switch seems pretty extreme.
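For anyone skimming, the behavior described in the posts above can be sketched as a small three-state model. This is only an illustration of my reading of the thread, not the actual gnome-control-center or telemetry code; the names `Consent` and `TelemetrySetting` are made up for the example.

```python
from enum import Enum


class Consent(Enum):
    UNSET = "unset"        # initial state: user has neither consented nor rejected
    ACCEPTED = "accepted"  # user switched telemetry on
    REJECTED = "rejected"  # switch is off after the user has seen it


class TelemetrySetting:
    """Hypothetical model of the switch described in the thread."""

    def __init__(self):
        self.consent = Consent.UNSET

    def view_settings_page(self):
        # Per the post: the switch is off by default on the page, so just
        # viewing it is enough to leave the initial state.
        if self.consent is Consent.UNSET:
            self.consent = Consent.REJECTED

    def set_switch(self, enabled: bool):
        self.consent = Consent.ACCEPTED if enabled else Consent.REJECTED

    def collects_locally(self) -> bool:
        # Local collection runs in the initial state and after consent.
        return self.consent in (Consent.UNSET, Consent.ACCEPTED)

    def uploads(self) -> bool:
        # Nothing is uploaded without explicit consent.
        return self.consent is Consent.ACCEPTED
```

In this sketch a fresh `TelemetrySetting()` collects locally but does not upload, and calling `view_settings_page()` once stops local collection as well.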

Since it seems like we are already collecting that data, what is the distribution of users between workstation and all other desktop alternatives?

I think the question that has been asked is whether including only Workstation gets you a truly representative sample, and whether that is actually interesting or not. If it is interesting, how would you use it in a way that only impacts Workstation?

1 Like

On the scope - in the EU you have to be specific about the purposes for which my private data is used (data which could directly or indirectly identify me personally; that’s not only “IDs” like IP or machine ID, but also behavioral data). GDPR requires data controllers to provide specific, legitimate business reasons why they or their business partners need specific data. I guess “in order to make Fedora Workstation better” is not enough and will not work. In addition, the framework must implement a means “to forget” users in Red Hat’s (IBM’s) and their partners’ systems and databases upon a user’s request.

On data protection - SSH and HTTPS are not enough to protect users’ data “at rest” and/or “in transit” between legal entities.

On corporations doing business - I have no problem with that; it’s Red Hat’s and IBM’s choice “to milk” users’ data for profit. Sorry, I do not believe vague statements like “to make OS better”. It has to be more specific for me to trust and share some stats from my machines.

And for this part I need working transparency on how my data is used, by whom, and for what purposes. This new trust must be earned, and with such a change I think it is not enough to trust just because everything was OK in the past.

On proxies and audits - proxies will not help less technically savvy users to protect themselves. This is not a realistic solution.
A solution for the audit could be, let’s say, a yearly public report by an independent 3rd party that discloses the purposes of, and the parties benefiting from, data collection and analysis, and that also checks whether actual use of the collected data stayed within the declared purposes and users of those stats.

5 Likes

Maybe they are being a bit paranoid (although it is more a lack of trust) but it would be more reassuring if no data was collected at all until they have seen the consent screen. That way, if someone toggles the telemetry switch, they know they are sending information from now on, rather than older information retroactively as well.

Hi everyone! Thank you all for the passion you’re bringing to this controversial topic — that’s good, but a lot to keep up with. As you may know, this is the first time we’re using Fedora Discussion as the main place for a Change Proposal discussion[1]. I’m glad to see a lot of new and familiar names engaging on the topic.

Discourse[2] has internal threading for replies, but intentionally presents a flat structure. When a conversation gets big like this, that’s quickly overwhelming.

Therefore, we have identified several important themes and created separate break-out conversations for those. I’ve linked them in an edit to the first post way up above, but that’s easy to miss, so I’m repeating here. These are:

If you’d like to read what people have said about those topics, and/or add new related thoughts, please see those topics. As much as possible, please keep this topic for high-level responses or for posts that don’t fit neatly into those categories. (They’re not meant to be comprehensive, just some of the biggest themes.)


  1. previously, this was all on the Fedora Devel mailing list

  2. the software that this forum runs on

4 Likes

In retrospect, perhaps we should have started the experiment of using the forum for these conversations with something a little… less intense. But hey, no better way to learn than jumping in, right? As we all know, gigantic threads (forum or mailing list!) are hard to navigate.

In the future, I think we need to be faster about identifying these as they emerge and setting up such topics quickly. That would have made things a little less chaotic. Additionally, although it is not obvious, anyone can create a linked post (see Site tip: create linked topics for deep dives or tangents). You can’t create those in this category (so use Project Discussion, or The Water Cooler or Ask Fedora if appropriate), but a moderator can move such “branches” if they should be moved.

Also, speaking of lessons learned: I think it will be helpful if, instead of including diverse topics in one long reply, we create new, shorter posts for each discrete subject (even if they’re not currently in the “break-out” list). I know that’s counter to how many people are taught to use forums or lists, but I think it will work better here. Not everything must fit that rule, but I think it’ll make the conversation easier for everyone.

Thanks, all!

6 Likes

I went through the frustrating process of making an account (2 accounts?) just to put my voice in here and say this is a terrible idea, and the fact it’s being not only considered seriously but pushed so hard has very much damaged my image of Fedora and Red Hat. I’m already reconsidering my F38 install after years of calling Fedora home, and I will absolutely walk away should this go through. If people, given a yes/no choice, will choose no, then you cannot say you’re doing right by the community to enable it anyway. This rubs me the wrong way perhaps the most. The change feels like it also runs counter to what I thought the spirit of Fedora was too. It feels like such a weird, frustrating 180 against the community, especially as so many have voiced their displeasure at this on a variety of sites. I sincerely hope this change is heavily reconsidered. No clue under which topic my rant should go as I disagree with many facets of it, but I wanted to say something against this.

Edit: To be clear, the only way I’m okay with this (short of opt-in) is if there’s a clear Yes/No option at install with no default selected, so that a choice must be made to allow the data collection. If that isn’t acceptable, which from what I’ve read it sounds like the change owner does not think it is, then neither is the change itself in my eyes.

12 Likes

I didn’t know about this. @mattdm can you field this question please?

I think the question that has been asked is whether including only Workstation gets you a truly representative sample, and whether that is actually interesting or not. If it is interesting, how would you use it in a way that only impacts Workstation?

The primary goal is to make GNOME desktop design changes. I think impact on non-GNOME users would be limited.

1 Like

To be fair, if starting this experiment had waited for something else, and this discussion had happened over a mailing list…can you imagine how much more impossible to track it would have been? Take this as one data point but I think “pressure-testing” the idea of Discourse here was the right move since you learn so much through the process.

And on the overall topic…I think one area I’m having trouble with is covering what, to me, are very conceptually different topics under a broad brush of “telemetry, yes or no?”. As an example, I would feel very, very differently about an automatically collected, periodic hardware survey than I would about granular software usage capture, and I think they are differently useful.

Finding out what hardware is attached to devices running Fedora Linux, and what drivers are installed to leverage that hardware (e.g. what % of users with Nvidia GPUs are using the proprietary drivers), feels like a relatively low risk activity if the data becomes public - it’s a pretty common-place activity in the Ask Fedora section to voluntarily post a hardware profile of your system for the whole world to see.

Finding out how I am using my system feels very different. Let’s hypothetically say that I have accumulated a large DVD collection over the years, and am digitizing those onto a Plex server using MakeMKV and Handbrake. Hypothetical me would be somewhat concerned that someone who looks at that data would associate that activity with bootlegging/piracy if it became broadly public.

This also goes along with my previously-stated belief that granular, detailed telemetry on software usage is not broadly useful - although I would amend that to say, I understand its utility if there is software with a singular purpose (like making a sale for Amazon, or successfully converting an image for an image converter app). But if we’re collecting data on how many people have changed the switch from dynamic to static workspaces in GNOME…what action can we take based on that? We would have to actually find out from people whether they know about the switch at all, if they understood what it meant, why they use or don’t use multiple workspaces in the first place, etc…

So one general concern I would have here is that the proposal would ask users to make one overarching sign off on the transmission of datasets with what I see as multiple levels of sensitivity (and utility).

5 Likes

To be clear, again, it is not a lack of trust in Fedora. It would make no sense to use Fedora in the first place otherwise.

I didn’t think of that in terms of retroactivity (mainly because I assume I wouldn’t ever switch the toggle on), but now you mention it, yeah, that’s a big issue. The “from now on” part is essential.

Thanks for the “paranoid” anyways :grinning: But at least you added “a bit”, which I appreciate.

3 Likes

So, I can’t speak about users, just systems, and there are a number of big caveats. This comes from DNF Countme, as @siosm mentioned earlier. In particular, this uses the VARIANT_ID value from /etc/os-release. If you install from a Fedora Spin, you should have that set accordingly. If you install Fedora Workstation, that’s counted as Workstation even if you later (or immediately!) switch to a different release. If you install from your own kickstart or the minimal network install, you get “unspecified”. (This may also include some systems which were installed over a decade ago and have been upgraded since.) That said, here’s the latest graph:

This shows the distribution for the last six releases (when DNF countme started), with each measured at the peak for that release (which, for F38, is “last week”). This is “persistent” systems — although individual systems aren’t tracked, I use a heuristic to guess the number of systems that are temporary — tests, or CI, or something else. (That reduces the amount of Cloud and CoreOS significantly.) Note that containers are also filtered out.

Here is the same with just desktop variants:

Anything not showing up is under 1%, and there is no “other” because the total of under-1% responses does not reach 1% either.
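If you’re curious which bucket your own system would land in, here is a rough sketch of reading VARIANT_ID out of os-release style text. This is not the Countme code; the function names are made up for illustration, and falling back to “unspecified” when the key is missing is my assumption based on the description above.

```python
import shlex


def parse_os_release(text: str) -> dict:
    """Parse os-release style KEY=value lines into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # Values may be shell-quoted; shlex.split strips the quotes.
        parts = shlex.split(raw)
        values[key] = parts[0] if parts else ""
    return values


def variant_id(text: str) -> str:
    # Assumption: a missing VARIANT_ID maps to the "unspecified" label
    # mentioned in the post.
    return parse_os_release(text).get("VARIANT_ID", "unspecified")
```

To try it on a real system, pass in the contents of /etc/os-release, for example `variant_id(open("/etc/os-release").read())`. On Python 3.10 or later, `platform.freedesktop_os_release()` should give you a similar dict without hand-rolled parsing.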

As is traditional, I’ll present more on all of this at Flock in a few weeks. :classic_smiley:

4 Likes

A post was merged into an existing topic: Opt-in / Opt-Out? A breakout topic for the F40 Change Request on Privacy-preserving telemetry for Fedora Workstation

I’m mostly a user and usually don’t take part in online discussions. But I care about Fedora, so I guess this is a good time to start being an active community member.

When I first read the proposal I was shocked. There are way less intrusive ways to gather most information mentioned in the proposal. Need to know what hardware people are running? Host an annual survey with cool prizes. Need to know what IDE developers are using? Do a poll. I don’t understand why this has to be done through the most intrusive way possible - running a data gathering service on every install.

Telemetry has a really bad reputation and I would imagine lack of it is one of the main reasons people start using Linux. I know it was for me and many of my tech-savvy friends. I’ve actually used it in my sales pitch to potential Linux converts. “Use Fedora, there’s no telemetry, none, period.” Guess I would have to change that to “Yeah, there’s telemetry, but it’s the good kind… you know?”

Personally I will not be comfortable with any kind of telemetry service running on my systems. Doesn’t matter if it’s opt-in, opt-out, anonymized or if it only pulls noise from /dev/random.

Fedora / GNOME is already awesome and has been getting better with every release. Just trust your vision!

10 Likes

This is a really good example.

I think collecting stuff like file names would very clearly violate user trust. That’s too invasive. I want you to be comfortable with enabling the telemetry system even if you are the world’s biggest media pirate and the telemetry server were to be hosted by the Recording Industry Association of America rather than Fedora. :slight_smile:

But in your example, even the use of specific packaged applications (MakeMKV and Handbrake) could be considered derogatory. That’s not something I had considered. Now in theory, we don’t know that it’s you using these applications, because there are no user profiles and the application use count is just some metric telling us some user somewhere on the planet used Handbrake today, not that you in particular used Handbrake. And it should not be possible to correlate that data point with other data points, so deanonymizing you shouldn’t be possible.

But it’s still reasonable to be concerned, because if the azafea server gets hacked or if Red Hat decides to be evil, then the non-correlation promise goes out the window. Putting a proxy server in the middle could still hide your IP address, though.

Another good point. What would we actually do with random data like static vs. dynamic workspaces? I’m not sure that we would seriously consider switching back to static workspaces even if data tells us that people are using them, because dynamic is better. :smiley: So maybe we shouldn’t be collecting that data at all.

2 Likes

What is not easy is deciding to un-send data taken by a service you didn’t consent to. The system should be designed so this doesn’t happen. Safe by default. Safest being no automated telemetry at all.

I would like to keep my information on my machine. I don’t think this is an unreasonable goal but I have to fight for it to an unreasonable degree day-to-day. I’ve switched from Windows, I’ve tweaked browser settings, installed addons, I’ve set global environment variables DOTNET_CLI_TELEMETRY_OPTOUT=1 and DO_NOT_TRACK=1 (https://do-not-track.dev/), and I opt out of all data collection I can. A single mistake - one overlooked “very very easy to turn off” switch - and my information is sent to some remote server, to some strangers I have to hope don’t abuse it. You are proposing to add to this. At the OS level, no less.
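For what it’s worth, honoring a variable like DO_NOT_TRACK costs a tool very little. A minimal sketch, assuming a tool treats the value `1` as an opt-out (the function name is made up, and this says nothing about how any particular program actually reads the variable):

```python
import os


def telemetry_opted_out(env=None) -> bool:
    """Return True if the DO_NOT_TRACK opt-out variable is set to 1."""
    # Allow a dict to be passed in for testing; default to the real environment.
    env = os.environ if env is None else env
    return env.get("DO_NOT_TRACK", "").strip() == "1"
```

A telemetry client written this way would check `telemetry_opted_out()` before collecting anything and simply do nothing when it returns True.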

I am concerned the community response was unanticipated. That this thought even made it to the proposed Change level. Given this, on top of recent decisions by Red Hat, I do not think I can trust Fedora as my distribution any more, regardless of whether this proposal is implemented.

8 Likes

I’m going to refer you to my discussion of the GNOME user survey for an explanation of why polls don’t work. Just looking at the results, we can be confident that’s not a representative sample of users, and accordingly it’s really hard to make design decisions based on the results. We don’t want to accidentally build an OS that’s catered to the needs of people who respond to surveys.

1 Like

Well, I anticipated a big negative response.

I didn’t anticipate that it would take place on Discourse rather than the devel mailing list. We are getting a lot more feedback here, from a more diverse group of users, than we would have on the mailing list. The number of comments is, accordingly, larger than I had expected. It takes longer to get threads up to several hundreds of comments on the mailing list. Still, allowing more people to comment more easily is a positive change.

I also didn’t expect the direction of the feedback would be so strongly centered on the opt-in vs. opt-out topic that is being discussed in the breakout thread.

I crafted the change proposal with the intention of heading off as many complaints about creepy tracking as possible, and it does look like I was at least somewhat successful at that. And the proposal does say that it might require significant changes based on community feedback, so it’s not like I failed to anticipate unanticipated feedback. :slight_smile: I intentionally proposed this early enough that we have plenty of time to modify the system and change how things work in response to feedback.

2 Likes

Hi everyone,

First, thanks to everyone who has provided feedback so far. I am trying to read every comment and respond to a good amount of them. I became rather concerned earlier today when comments were coming in faster than I could read and respond to the ones that had already been posted, but I was finally able to get caught up on this main thread after Matthew split out most of the conversation into the four breakout topics. So I’m now caught up on this main discussion thread, but I have not yet read through the breakout topics. I spent 3 hours last night and 10 hours today reading and responding to your feedback here, and I don’t want to spend all weekend on this, so I’m going to take the weekend off and will return to reviewing your feedback on Monday. (I’ll use this post here as a checkpoint for that purpose.)

We’ve received a large amount of feedback on the opt-in vs. opt-out topic. I’ve just left a comment on that here in the breakout topic, which you can respond to there (please not here).

I’ve also been collecting a list of links to some of the more specific/concrete improvement suggestions we’ve received so far. My short-term plan is to edit these into the Feedback section of the change proposal, but that takes time and I really need a break now. :slight_smile: So I will do that early next week.

5 Likes

Oh sorry, I forgot to be outraged about that. Hold on.

Ewww! Telemetry! Creepy!

Sorry, I thought it was funny as it seemed you wanted to give people more ammo XD All kidding aside, as long as you are open to discussion I have a feeling that things can be solved. Some people will jump ship, but I do believe some level of telemetry is acceptable as long as the user has a choice.

2 Likes