Hey all, I’m joining this discussion as someone who wears a few hats, albeit none of them are red: elementary OS co-founder and UX architect for a decade, System76 UX architect on Pop!_OS in its early days, Endless OS Foundation employee working with non-profit partner orgs (though I joined long after the metrics system was designed and deployed), GNOME Foundation member and design contributor, indie app developer, and user of Fedora and GNOME.
I’ve personally been involved in technical discussions, design discussions, user studies, surveys, and remote user testing across all of those projects. Time and time again we get some sort of data from those efforts that helps us make a decision, and time and time again there’s a feeling that the data massively self-selects for the specific audiences involved; after shipping a change, I very often hear complaints that certain types of users weren’t considered.
I’ve also been having in-person discussions about the need for radically transparent, open source, privacy-preserving metrics for the last five years as a result of my and my teammates’ frustrations with designing and developing free software; if we’ve talked at GUADEC or LAS or SCALE, we have probably had this chat!
I want to start by noting: I strongly agree with this direction for Fedora—and the open source desktop space as a whole—since I believe we have the opportunity to raise the bar when it comes to privacy-preserving metrics that still enable us to make the tools and projects we use every day better for all of us.
That said, I believe there are some ways that would cement this as the clear, objective net good for Fedora and the broader ecosystem:
Better communication/definition of “opt-out”
This is always a sticking point, and it’s nuanced, which makes it even harder to discuss on the Internet.
However, I think people hear “opt out” and assume it means “collects and sends data unless and until you turn it off,” which I know is not at all what anyone is proposing. But again: nuance is hard.
Maybe someone smarter than me has coined terms, but off the top of my head, there are:
- Buried opt-in: this would be, say, requiring digging into settings or installing a package to enable a feature. This is the least useful type of metrics data, as it comes with extreme self-selection bias.
- Buried opt-out: this would be an on-by-default feature that requires digging into settings or uninstalling a package to disable it. This is the evil-bad version a lot of people think about when hearing “opt out.”
- Explicit opt-in: this would be similar to how location services work in GNOME today, where the option is presented clearly upon first run, is unchecked, and requires performing an explicit action to opt in. This is what people often think about when hearing “opt in,” and it carries the risk of self-selecting away from users who don’t understand the option or don’t know what to do, and so leave it unchecked.
- Explicit opt-out: I believe this is what is being proposed. It is just like the explicit opt-in, but the checkbox is checked by default. Notably, the feature is not enabled until a user explicitly sees this choice and continues without opting out.
I think there is also a fifth option that could be considered:
- Explicit choice: this is probably technically an explicit opt-in, but I think it is a more useful presentation. The feature is clearly presented with two options, enable or disable, and the risks/benefits are clearly laid out as part of the choice. The user must actively make a decision, and neither option is considered the “default.” It has risks of choice paralysis (“I don’t know which one to choose! What if I make the wrong decision? Can I change my mind later? Maybe I install Ubuntu instead because it just has telemetry checked by default so I don’t have to think about it.”). If the community does not want an explicit opt-out, then this is the direction I would pursue; I mention this more below when talking about design, and there’s a rough sketch of the consent states just after this list.
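To make the difference between these variants concrete (especially buried opt-out versus explicit opt-out or explicit choice), here is a minimal, purely illustrative sketch in Python. None of these names come from the actual proposal; the point is only that consent is tri-state, and nothing is collected until the user has actually seen the screen:

```python
from enum import Enum

class MetricsConsent(Enum):
    UNSET = "unset"        # the user has never been shown the consent screen
    ENABLED = "enabled"    # the user saw the screen and left metrics on (or turned them on)
    DISABLED = "disabled"  # the user saw the screen and opted out

def may_collect(consent: MetricsConsent) -> bool:
    # Crucially, UNSET never allows collection: "explicit opt-out" and
    # "explicit choice" only differ in whether the checkbox starts checked,
    # not in whether data flows before the user has seen the screen.
    return consent is MetricsConsent.ENABLED

def complete_consent_screen(checkbox_checked: bool) -> MetricsConsent:
    # Only called when the user actively continues past the screen.
    return MetricsConsent.ENABLED if checkbox_checked else MetricsConsent.DISABLED
```

A buried opt-out, by contrast, would be the equivalent of defaulting to ENABLED before the screen is ever shown, which is exactly the behavior people fear when they hear “opt out.”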
Radical transparency of the data itself
If the concern is that “someone smart could de-anonymize the data,” then a red flag is that someone privy to the data, like the server admins or theoretical evil Maroon Tophat employees, could de-anonymize it as well. The best solution would be to be so confident in the system that nobody could de-anonymize the data even with raw access to it, and to make that provable by publishing the data itself.
If there are still constraints there (like the practicality of making that sort of dataset trivially available), then an open source public dashboard would still go a long way toward providing more transparency than nearly any other piece of software in existence. An example of this approach is the privacy-preserving Plausible Analytics (used by elementary and lots of privacy-minded folks), which publishes the very same dashboard that site owners use publicly to the world. For example, here is my personal website’s dashboard: Plausible · cassidyjames.com
Collaboration with a trusted third party
While I would say the Fedora Council and community are a trusted party, it could help a lot to have someone like the EFF or another trusted third party in the privacy/digital rights space audit and somehow sign off on the proposed system. This would help demonstrate that no, you don’t just have to take our word for it; it has been audited.
I’d be open to hearing suggestions of what sort of third party orgs would instill confidence in this.
Consider additional active privacy protections
I am not an expert in this space, so I will not make a fool of myself, but there are anonymizing mechanisms like differential privacy and randomized response that have been used for decades to provide mathematical guarantees that individuals cannot be re-identified from the data. If these leading privacy protections were built in and well communicated, it would go a long way toward reassuring folks.
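As one concrete (and very simplified) illustration of the kind of mechanism I mean, here is a sketch of classic randomized response in Python. This is not from the Fedora proposal and the parameters are made up; the idea is just that each individual report is plausibly deniable, while the aggregate can still be estimated accurately:

```python
import random

def randomized_response(true_value: bool, p_truth: float = 0.75) -> bool:
    """With probability p_truth report the truth; otherwise report a coin flip."""
    if random.random() < p_truth:
        return true_value
    return random.random() < 0.5

def estimate_true_rate(reports: list[bool], p_truth: float = 0.75) -> float:
    """Undo the noise in aggregate: observed = p_truth * true + (1 - p_truth) * 0.5."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth

# Simulate 10,000 users, 30% of whom actually have some feature enabled.
reports = [randomized_response(random.random() < 0.30) for _ in range(10_000)]
print(estimate_true_rate(reports))  # ~0.30, even though no single report can be trusted
```

Real deployments use more sophisticated, formally analyzed mechanisms (this toy version ignores privacy budgets, repeated reports over time, and so on), but even something this simple shows how the individual reports themselves can stop being a de-anonymization risk for the question being asked.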
Be open to/iterate on design improvements
I regret that I was unable to propose a specific design for this ahead of time, but I would be interested in working on my own or with the GNOME design community on a design spec (or even multiple proposals) that helps assuage fears of this being some secretly buried opt-out thing. Personally, I find the above-mentioned explicit choice approach the most interesting, so I will try to mock something up in that direction and share it.