F42 Change Proposal: Opt-In Metrics for Fedora Workstation (system-wide)

Opt-In Metrics for Fedora Workstation

This is a proposed Change for Fedora Linux.
This document represents a proposed Change. As part of the Changes process, proposals are publicly announced in order to receive community feedback. This proposal will only be implemented if approved by the Fedora Engineering Steering Committee.

Slow-Mode Feature Activated for this topic
This proposal generated a lot of comments the last time it was announced, which made moderation really hard. You told us it wasn’t the best experience because of the lack of moderation, and we listened. To keep this conversation useful for everyone, we have enabled a slow mode feature for this change to help with moderation, as we want you to have a good experience when engaging in our change proposal process.

That being said, while your opinion matters greatly, I will ask and remind everyone to please share yours in a polite and constructive way. Our code of conduct is designed to keep our community a safe place for our members to share their ideas in, and sometimes have disagreements about things, without the fear of people resorting to taking personal shots or making side remarks at each other and at the project overall.

If you see a point being made in a comment already posted, there are some great reaction buttons you can use to agree with it. I often see comments that people have shared that capture my thoughts and feelings already, so I add a reaction rather than accidentally spamming the thread :slight_smile:

Please remember that feedback on changes should be about technical merit and not personal preference, and trust our project’s governing bodies like Council and FESCo to always act with the best interest of the Fedora Project in mind when deciding to accept or reject any changes people propose, when they progress to the voting stage of the process. Fedora is built on a foundation of Friends, so please keep this at the forefront of your mind when you are engaging with each other.

Wiki
Announced

:link: Summary

The goal of this change proposal is to provide the Fedora community with accurate, representative data about the real world use of Fedora Workstation. By doing this, we believe that we can accelerate the development of Fedora Workstation, and ensure that it improves in line with our users’ needs and requirements.

Protecting user privacy is of utmost importance for this initiative. To this end, the service will only collect generic, standardized data, and will never collect anything that is personally identifying. It will also, of course, be fully open source. On the server side, the data will be stored in a way that prevents user identification.

Another important aspect of the initiative is that it will be run in a transparent manner, and will be governed as part of the Fedora project. A new SIG will be responsible for the service, and will be open to community participation. It will publish analyses of the collected data, provide documentation about how the service operates, share samples of the database data, and respond to requests from the community.

Finally, we intend to ensure that metrics reporting is fully under the control of end users. Metrics collection will default to off, and will only be enabled through a clear on/off prompt in initial setup. Users will be able to view the data that has been collected locally, and will be able to remove the client software from their systems, should they choose to do so.

To address concerns that the community might have, the change owners have created a privacy and transparency checklist, which will be updated as the initiative progresses.

:link: Owners

:link: Current status

The proposal is to deploy a pre-existing data collection system - called Azafea - for Fedora Workstation. Azafea has both client and server components. Significant work is required to make a wide scale deployment of Azafea possible (see scope section below).

This updated proposal obsoletes the original proposal.

:link: Detailed Description

This section includes a detailed description of each aspect of the metrics proposal.

:link: Data that will be collected

All collected data will be anonymous:

  • We will not collect identifying information, such as email addresses, online account details, and IP addresses.
  • We will only collect generic, standardized information. For example, we want to collect data on which apps are used, but we will never collect data on which websites are viewed or which files are opened.
  • Server side, each metric will be stored separately and will not be linked to other metrics from the same system. This will prevent user fingerprinting through the cross-referencing of anonymous information.
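
As a rough illustration of the last point (not the Azafea schema), the sketch below splits one uploaded batch into independent rows that share no identifier. The function name `unlink_batch`, the batch layout, and the `source_ip` field are all hypothetical and exist only for this example.

```python
import random

def unlink_batch(batch, rng=random):
    """Split one uploaded batch into independent rows that share no identifier."""
    # Only the metric name and value survive; transport details are never copied.
    rows = [{"metric": name, "value": value} for name, value in batch["events"]]
    # Shuffle so rows cannot be regrouped by their position within the batch.
    rng.shuffle(rows)
    return rows

# Hypothetical batch layout, for illustration only.
batch = {
    "source_ip": "203.0.113.7",  # documentation-range address; dropped by unlink_batch
    "events": [("display_language", "en_GB"), ("cpu_vendor", "AuthenticAMD")],
}
rows = unlink_batch(batch)
```

Because no row carries a batch or system identifier, two rows from the same machine look the same as two rows from different machines.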

All of the code in the data collection system will be open source and available for public inspection.

The data we plan on collecting will fall into the following categories:

Category | Examples
--- | ---
Hardware details | CPU, graphics, cameras, which peripherals are present.
System settings | The display language, which input methods are used, which accessibility features are enabled.
Desktop usage patterns | Which apps are used, how many open workspaces there are, how often each system settings panel is opened.
Performance reports | Disk and memory usage.
Evidence of problems | Counts of system crashes, OOM events, app crashes.

For more detailed information, see the preliminary list of metrics that we want to collect. This list indicates the purpose of each metric that we hope to collect.

:link: Steps to ensure anonymity

The metrics that we hope to collect are all generic in nature, and do not contain personal or identifying information.

To prevent accidental collection of identifying information, the data we collect will be filtered on the client side, so that only known, standardized variables are included. For example, when recording which apps are used, we will only record known package names, in order to prevent custom apps with identifying metadata from being recorded.
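
A minimal sketch of this kind of client-side allowlist filtering is shown below. It is not taken from the eos-metrics code; the names `KNOWN_PACKAGES` and `filter_app_events` and the package list itself are assumptions made for illustration.

```python
# Hypothetical allowlist; a real one would come from a reviewed, published list.
KNOWN_PACKAGES = frozenset({"firefox", "gnome-terminal", "nautilus"})

def filter_app_events(app_names, known=KNOWN_PACKAGES):
    """Keep only app names on the allowlist; anything else is dropped unrecorded."""
    return [name for name in app_names if name in known]

launched = ["firefox", "alice-tax-returns-2024", "nautilus"]
reportable = filter_app_events(launched)
# The custom app name, which could be identifying, never leaves the filter.
```

The design choice here is to default to dropping: an unknown name is discarded rather than sanitized, so a new kind of identifying value cannot slip through.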

Wherever possible, the system will aggregate data locally prior to upload. For example, it can report the number of times that a feature was used in a week, instead of the exact time whenever it is used. This method further increases anonymity by reducing the precision of the data that is reported.
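
The weekly aggregation described above could look roughly like the following sketch. The function name `weekly_counts` is hypothetical, and ISO calendar weeks are used only as one possible definition of "a week".

```python
from collections import Counter
from datetime import datetime

def weekly_counts(timestamps):
    """Reduce exact event times to a count per (ISO year, ISO week)."""
    counts = Counter()
    for ts in timestamps:
        iso = ts.isocalendar()          # (year, week, weekday)
        counts[(iso[0], iso[1])] += 1   # the weekday and time of day are discarded
    return dict(counts)

events = [
    datetime(2024, 7, 1, 9, 0),
    datetime(2024, 7, 3, 14, 30),
    datetime(2024, 7, 8, 8, 15),
]
summary = weekly_counts(events)
# summary == {(2024, 27): 2, (2024, 28): 1}
```

Only the per-week totals would be uploaded; the individual timestamps stay on the client.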

We will only deploy the service once it has undergone a thorough period of testing, during which we will verify that the database is only being populated with anonymous data. (Data from the testing phase of the system will be permanently deleted.)

:link: How metrics data will be used

We anticipate that the data we collect will drive myriad improvements within Fedora as well as the wider ecosystem. These improvements include:

Resource prioritization - knowing which hardware, features and apps are used most will allow developers and partners to focus their efforts where they will have the most impact.

Software improvements - data about usage and performance patterns can drive optimisations in existing software, in terms of both technical and UX design.

Configuration enhancements - decisions about default settings and the default composition of Fedora Workstation can be based on observed usage patterns.

Better development practices - we aim to promote and encourage user and data driven development practices through this work.

To achieve these impacts, analysis of the collected data will be published and circulated to the relevant developers and projects.

:link: Who will have access to metrics data

In the interests of transparency, we will put the following mechanisms in place for viewing the data that is collected:

  1. Raw data from the database will be published during the testing phase, prior to wide scale deployment
  2. Members of the community will be able to join the metrics SIG, in order to get full ongoing access to the data
  3. After deployment, a randomly selected sample of the database will be published (once it has been manually checked)
  4. Members of the community will be able to request copies of the database from the SIG, which will be shared privately

This proposal is an attempt to balance the need to protect privacy with the need to provide transparency. We have a high degree of confidence that the database will only contain anonymous data (see “Steps to ensure anonymity”). However, there is always some risk that something could go wrong with data collection. Out of an abundance of caution, we therefore only want to share data once it has been manually checked.

:link: Approval for changes to the metrics system

Any changes to the metrics system and its governance arrangements will require approval by FESCo. This will include any changes to:

  • the metrics data that is collected
  • the metrics SIG (its rules, role, composition, membership terms)
  • the technology used
  • the UI for user opt in/opt out
  • the hosting of the infrastructure or the involvement of 3rd parties

:link: User control

The proposed system aims to ensure that users are always in control of metrics collection on their systems. This will be achieved through the following:

  • The setting for metrics collection will enable/disable both local metrics collection and data upload
  • Metrics collection will be off by default
  • Metrics collection will only be enabled through an explicit opt in from the user, which will be presented as part of initial setup
  • It will always be possible for users to disable metrics collection from the system settings
  • It will be possible for users to view the metrics that have been collected locally on their systems
  • It will be possible for users to remove the metrics collection components from their systems, using dnf

:link: Metrics system components

The metrics system would be composed of server and client Azafea components.

An Azafea server deployment consists of five components:

  1. an nginx proxy server
  2. azafea-metrics-proxy
  3. redis
  4. azafea itself
  5. a Postgres database

nginx proxies HTTP requests to azafea-metrics-proxy, which is itself a simple HTTP server that adds batches of metrics into the redis database, where they will be fetched by Azafea and stored into Postgres.
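
To make the data flow concrete, here is a small in-memory simulation of the two middle stages. It does not use Redis, Postgres, or any Azafea code: a `deque` stands in for the Redis queue, a plain list stands in for the Postgres table, and the function names `proxy_receive` and `worker_drain` are hypothetical.

```python
from collections import deque

def proxy_receive(queue, batch):
    """Role played by azafea-metrics-proxy: accept a batch and enqueue it as-is."""
    queue.append(batch)

def worker_drain(queue, table):
    """Role played by Azafea: pop queued batches and store each event as its own row."""
    stored = 0
    while queue:
        batch = queue.popleft()
        for event in batch:
            table.append(event)
            stored += 1
    return stored

queue = deque()   # stand-in for the Redis queue
table = []        # stand-in for the Postgres table

proxy_receive(queue, ["crash_count=1", "workspaces=4"])
proxy_receive(queue, ["display_language=de_DE"])
stored = worker_drain(queue, table)
```

The queue decouples the two sides: the proxy only needs to accept and enqueue, while the slower validation and database work happens later in the worker.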

The client side consists of the following components:

  • eos-metrics - a D-Bus interface that applications and services may use to record events, plus a GObject library that provides a simple API around the D-Bus interface
  • eos-event-recorder-daemon - the service that actually implements the D-Bus interface: it collects metrics recorded via D-Bus, batches them together, and sends them to the metrics server at predefined intervals
  • eos-metrics-instrumentation - the component that calls D-Bus methods on eos-event-recorder
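
The batching behaviour described for eos-event-recorder-daemon can be sketched as follows. This is not the daemon's code; the class name `EventBatcher`, its parameters, and the injected `send` callable are assumptions chosen to keep the example self-contained, and the real daemon communicates over D-Bus and HTTPS.

```python
class EventBatcher:
    """Collect events while enabled and hand them to `send` once per interval."""

    def __init__(self, send, interval_seconds, enabled=False):
        self._send = send                  # hypothetical upload callable
        self._interval = interval_seconds
        self._enabled = enabled            # off unless the user has opted in
        self._pending = []
        self._last_flush = 0.0

    def record(self, event):
        # When metrics are disabled, nothing is stored locally either.
        if self._enabled:
            self._pending.append(event)

    def tick(self, now):
        """Flush if there is pending data and the interval has elapsed."""
        if self._pending and now - self._last_flush >= self._interval:
            self._send(list(self._pending))
            self._pending.clear()
            self._last_flush = now
            return True
        return False

sent = []
batcher = EventBatcher(sent.append, interval_seconds=60, enabled=True)
batcher.record("settings_panel_opened")
batcher.record("app_launched")
batcher.tick(now=30)   # too early, nothing is sent
batcher.tick(now=60)   # interval elapsed, one batch of two events is sent
```

Passing the clock value into `tick` keeps the sketch deterministic; a real service would use a timer instead.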

:link: Feedback

The initial version of this proposal generated a huge amount of feedback and debate. We have put a lot of time and effort into engaging with this feedback, and the proposal has been substantially changed in response to it. We are grateful to the Fedora community for enabling us to improve the proposal in this way.

We know that there were issues with the original proposal, and that these led to serious concerns amongst the community. We hope that the updated proposal addresses these concerns, and look forward to receiving further feedback.

The following is a summary of the key points from the discussion so far, along with details of the steps that have been taken in response to them. Additional information is also included in the FAQ. If we have missed something from that discussion, please let us know.

:link: Opt in or opt out?

The original proposal specified that metrics upload would be disabled by default, and that the setup UI would include an on-by-default switch to allow users to opt out. This aspect of the proposal attracted by far the most negative feedback.

As a result of this feedback, we have changed the proposal: we now propose that initial setup will show an explicit yes/no prompt which has no default value.

We recognise that feedback about the opt-out UI reflected wider concerns about the privacy and transparency of the metrics system, which we have addressed through other changes.

:link: Proposal omissions

We received feedback that the original proposal omitted key details, including:

  • The benefit to Fedora
  • Which metrics will be collected
  • That each metric will be stored separately and will not be correlated
  • How members of the community will be able to access the database
  • Whether users will be able to view the local data that has been collected on their systems
  • That the metrics packages can be removed using DNF
  • The policy through which the collection of specific metrics will be approved

This information has now been added to the proposal.

:link: Ability to view the entire data set

This was a frequent request in the feedback we received. We understand the motivation to have transparency and to verify what data is being collected.

“Who will have access to metrics data” contains an updated proposal which we hope will satisfy this desire while also preventing potential privacy issues.

:link: Risks to anonymity if the metrics server is hacked

This was another major subject of discussion, with various concerns being raised.

We are confident that it will not be possible for the administrators of the metrics system to identify or fingerprint users under normal operation of the metrics server. We also want to emphasize the generic nature of the metrics we want to collect.

We have also committed to:

  • Take steps to minimize risks, such as having short retention of server logs
  • Manage the server through the metrics SIG, so that members of the community can contribute their expertise
  • Document the infrastructure setup for the metrics server once it has been set up, in order to solicit further feedback

These points have been added to our privacy and transparency checklist.

The metrics server will not store IP addresses or entire batches of metrics data. However, we acknowledge that, if Fedora infrastructure is compromised, an attacker could begin recording this information. We acknowledge this as a risk of the system.

:link: Local data collection

The original proposal specified that local data collection would default to on, while upload of that data would default to off. Some pointed out that this would be a privacy risk.

In the new version of the proposal, local data collection will only be enabled after the user has consented to metrics collection.

:link: Other suggestions

We received various other suggestions during the debate about the original change proposal. These included:

:link: Provide fine-grained user control over which data is uploaded

This would add complexity to the system and to data analysis. We are also unsure how much these fine-grained controls would be used in practice. This is not something that we are rejecting outright, but it is unlikely to be something that we ourselves would be able to add to the initial version of the system.

:link: Only collect some metrics for a fixed time period

We agree that this makes sense for some metrics and we have added this to our privacy and transparency checklist, as a future work item.

:link: Restrict metrics collection to a small sample of users

The main issues with this approach would be ensuring that the sample is representative, and our ability to detect issues experienced by subsets of the user base.

:link: Collaborate with a trusted third party

The idea behind this suggestion was for us to get additional oversight and input from an organization that has expertise in data privacy issues. We’d be very happy to do this, but are unsure who that third party would be. We are open to suggestions!

:link: Adopt differential privacy techniques

Differential privacy would potentially allow Fedora systems to submit deliberately randomized (noisy) data to the metrics server, so that no individual report can be trusted on its own, while ensuring the overall data set is still representative and useful. We would welcome collaboration from Fedora community members interested in improving the metrics collection system to adopt such techniques.
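
One classic technique of this kind is randomized response, sketched below for a single yes/no metric. This is an illustration of the general idea only, not something the proposal commits to; the function names and the choice of `p_truth = 0.5` are assumptions.

```python
import random

def randomized_response(truth, p_truth, rng):
    """Report the true answer with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5

def estimate_true_rate(reports, p_truth):
    """Invert the noise: observed = p_truth * true + (1 - p_truth) * 0.5."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) / 2) / p_truth

rng = random.Random(0)
# Simulated population: 30% of 20,000 systems have the feature enabled.
truths = [i < 6000 for i in range(20000)]
reports = [randomized_response(t, 0.5, rng) for t in truths]
estimate = estimate_true_rate(reports, 0.5)
# `estimate` should land close to 0.3 even though each report is unreliable.
```

Any single report could be a coin flip, so it says little about the system that sent it, yet the aggregate rate can still be recovered. The trade-off is that more reports are needed for the same precision.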

:link: Benefit to Fedora

See “How metrics data will be used”.

:link: Scope

  • Proposal owners: this change requires substantial technical and nontechnical work from the change owners. This will include:
    • Properly packaging eos-metrics, eos-event-recorder-daemon, and eos-metrics-instrumentation for Fedora
    • Modifying eos-metrics-instrumentation so that it does not send events that are not approved for use in Fedora
    • Creation of the metrics SIG and its various policies and procedures
    • Documentation for end users and members of the community
  • Other developers: Community Platform Engineering (CPE) will need to host the metrics server infrastructure.
  • Release engineering: #11514
  • Policies and guidelines: see “Approval for changes to the metrics system”
  • Trademark approval: N/A (not needed for this change)
  • Alignment with objectives: there are currently no Fedora Initiatives. However, the generated data will be broadly applicable to Fedora community activities.

:link: Upgrade/Compatibility Impact

There are no special technical challenges in this regard.

Metrics collection will only be enabled in response to an explicit opt-in by the user, through a UI in either gnome-initial-setup or gnome-control-center. gnome-initial-setup is only shown for new installs, meaning that the only way to enable metrics on an upgraded system would be through gnome-control-center.

:link: How to Test

Testing is not currently possible. Instructions will be provided when this changes.

:link: User Experience

The user experience for the system will consist of:

  1. In initial setup, a UI to choose between metrics collection being on or off. There will be no default in the UI and users will have to explicitly choose one of the two options.
  2. In the privacy Settings, a switch to turn metrics collection on or off
  3. User documentation about the service
  4. A method to view locally collected metrics data

:link: Dependencies

Packages wanting to collect metrics data will need to depend on eos-metrics. For example, to collect statistics about Settings usage, the gnome-control-center package would need to depend on eos-metrics in order to send a metric to eos-event-recorder-daemon.

:link: Contingency Plan

  • Contingency mechanism: remove the eos-metrics, eos-event-recorder-daemon, and eos-metrics-instrumentation packages from the workstation-product comps group, and rebuild any packages that gained a dependency on eos-metrics.
  • Contingency deadline: beta freeze
  • Blocks release? If the change is incomplete, it will need to be reverted before release.

:link: Documentation

This feature depends on several different upstream projects, each of which has its own documentation.

Client side components:

  • eos-metrics has online docs at D-Bus interface XML. API documentation is also built and installed in a docs subpackage.
  • eos-event-recorder-daemon and eos-metrics-instrumentation components do not have online documentation at this time.

Server-side documentation:

:link: Release Notes

These will be provided if the proposal is approved and successfully implemented.

Last edited by @amoloney 2024-07-02T09:30:05Z

14 Likes

How do you feel about the proposal as written?

  • Strongly in favor
  • In favor, with reservations
  • Neutral
  • Opposed, but could be convinced
  • Strongly opposed

If you are in favor but have reservations, or are opposed but something could change your mind, please explain in a reply.

We want everyone to be heard, but many posts repeating the same thing actually makes that harder. If you have something new to say, please say it. If, instead, you find someone has already covered what you’d like to express, please simply give that post a :heart: instead of reiterating. You can even do this by email, by replying with the heart emoji or just “+1”. This will make long topics easier to follow.

Please note that this is an advisory “straw poll” meant to gauge sentiment. It isn’t a vote or a scientific survey. See About the Change Proposals category for more about the Change Process and moderation policy.

4 Likes

Obviously some of this still needs fleshing out (“How to Test”), but it makes no sense to complete until the premise of the proposal is accepted. An explicit yes/no prompt with no default is acceptable to me. The details on what data is to be collected seem reasonable to me at a glance and should not require information that I would mind sharing. Based on the initial reading, I would be inclined to opt-in, in fact. I am not familiar with the metrics components, so there may be some concerns there, but that’s for folks to decide on as you get closer to implementation. Thanks for taking the time to consider all the feedback and coming back with a proposal that shows careful thought.

13 Likes

Is it fair to assume that one would not be able to continue setup without answering yes or no via the UI or another method (e.g. kickstart, cloud-init, ignition, etc.)?

2 Likes

Overall, I reckon this proposal will be beneficial to Fedora.

The information collected will be useful for making important development decisions and the opt-in nature is perfectly aligned with a general pro-privacy stance.

My only specific concern with this proposed system is potential data poisoning. How will we safeguard the database from people submitting bogus information?

As the system will be open-source and privacy preserving, it will be easy for people to submit bad data, and hard for us to filter it. I don’t anticipate it to be a major problem, as a potential attacker has little to gain from this other than causing disruption.

Are there any mitigations/protections being considered to keep collected data clean?

2 Likes

Hey everyone, welcome to round two.

If you’re interested in joining the proposed metrics SIG, please contact me. I hope to be wrong, but I expect it may be difficult to find people who are interested in participating. If you’ve contributed anything to Fedora before, you’re probably a good fit for this SIG. (It’s not limited to software developers.)

Probably, yes. (It would also be permissible under this proposal to allow the user to continue without answering, if this results in metrics being disabled.)

Metrics will always be off after installation and only enabled during gnome-initial-setup. I suspect other Fedora editions will want to add their own metrics in the future, but this proposal is only for Fedora Workstation, so gnome-initial-setup is all we care about. Other editions would need to figure this out in future change proposals.

We should be able to block simple attacks, but honestly a sophisticated attacker is likely to win. There shouldn’t be much incentive to launch such an attack, but… well, this is the internet we’re talking about…

6 Likes

Shame to see it stepped down to opt-in instead of opt-out, but hopefully the information gathered will still be useful.

2 Likes

I think this is a great update for the proposal. It’s clear that a lot of effort was put into answering previous complaints in a very comprehensive way.

I have doubts about one part:

Packages wanting to collect metrics data will need to depend on eos-metrics. For example, to collect statistics about Settings usage, the gnome-control-center package would need to depend on eos-metrics in order to send a metric to eos-event-recorder-daemon.

In general, the “client” will need to gracefully handle the state where the eos-metrics endpoint is not running. For example, the user may disable it, or it may crash, and other applications should still start and work without interruptions. Even if the eos-metrics package is installed, there is no guarantee that the endpoint is running. So I don’t think the dependencies on the eos-metrics package are needed.

I think it’d be preferable to pull in eos-metrics package via comps, and let the stats collection happen if the endpoint is available, without any hard requirements.
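
To illustrate what I mean by graceful handling, here is a minimal sketch. The `send` callable is a hypothetical stand-in for whatever call reaches the metrics endpoint, and `record_event_safely` is a name I made up; this is not the eos-metrics API.

```python
def record_event_safely(send, event):
    """Try to record a metric; never let a missing endpoint affect the caller."""
    try:
        send(event)
        return True
    except Exception:
        # Deliberately broad: metrics are best-effort, so any failure here
        # (service stopped, disabled, crashed) must not disturb the application.
        return False

def endpoint_not_running(event):
    # Simulates the case where the recorder service is unavailable.
    raise ConnectionError("metrics endpoint is not running")

delivered = []
ok_when_running = record_event_safely(delivered.append, "settings_panel_opened")
ok_when_missing = record_event_safely(endpoint_not_running, "settings_panel_opened")
```

The application carries on in both cases; the only difference is whether the event was delivered.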

4 Likes

I use Linux Operating Systems because they respect privacy.
This gives me the Microsoft Windows chills.
I am strongly against it, because at some point identifiable information will start to be collected.

4 Likes

This is a reasonable concern; however, I would argue this proposal mitigates it.

Firstly, this system is entirely opt-in, so if you do not want to provide metrics, simply do not opt in and no data will be collected.

Secondly, the system is open-source, meaning it is easy for anyone to determine what data is being collected.

It is worth noting that collected data will be visible locally, so you can see exactly what is being shared with Fedora. (And if you’re not happy with any of it, there will be a toggle in settings to disable this.)

The purpose of the metrics system would be to generate actionable insights to help steer Fedora towards supporting the community better, not to collect information like a big tech company.

If you have specific concerns outside of this, then please do share them as amendments can be made to this proposal.

4 Likes

eos-metrics is going to be a hard dependency because it provides a small API for applications to use without having to interact directly with D-Bus. However, eos-event-recorder-daemon – the component that actually submits metrics – is not a hard dependency, so it will be pulled in only via comps and users can uninstall it if desired, like you suggest. The flow is: application uses eos-metrics → sends metrics over D-Bus to eos-event-recorder-daemon (or not, if it’s not enabled) → eos-event-recorder-daemon submits metrics to Fedora server via HTTPS

Then eos-metrics-instrumentation (the service that submits generic metrics that are not provided by particular applications) will also be optional and pulled in via comps, so you can uninstall that too.

Uninstalling is not necessary to disable the system, but is good to do if you’re paranoid.

2 Likes

The proposal goes to great lengths to guard against that. So, if that were the intention, I don’t think this is a good start towards it!

2 Likes

Since this topic has been calm so far, I’m going to reduce the “slow mode” setting to 15 minutes. If it becomes a problem, moderators will put it back. However, please remember that all comments will be read by change proposal owners, FESCo, and many others. Repetition isn’t necessary. Thanks!

7 Likes

I agree that having opt-out metrics is not perfect. Having 2 options with no default is better: an interactive choice, not just “annoyingly clicking ‘skip’”.

My question, of course, is:

This introduces a big intertwining between GNOME and Fedora. Currently, Fedora just ships vanilla GNOME. A small custom setup dialog page, a small extension for the logo, that’s it.

If this proposal goes through, there will only be data about the GNOME users, and not the other Desktops, correct?

Or what parts of this are cross-desktop? Listing installed packages etc. could be done everywhere, but there would be no GUI page to accept the collection.

As a KDE user, and being very much in favor of “Fedora KDE Workstation”, I would never contribute any data.

1 Like

This proposal is specific to GNOME on Fedora Workstation.

However, most of the systems developed will be easy to integrate with other desktop environments and I imagine other editions/spins would integrate opt-in metrics in some capacity going forward. This is out of scope of this proposal however.

4 Likes

While I agree with others that this proposal is far better than the previous one, I’m still voting “Opposed, but could be convinced.”

I do understand that usage metrics could be useful, but at what cost? A compromise of the entire Fedora Project mission, as my limited reading comprehension allows me to understand it? Until now, nothing in Fedora’s software has had a built-in metrics acquisition mechanism; that would change should this proposal be accepted. And, as acknowledged, this would be woven into the entire fabric of the project.

If downstream distro packagers want to add that capability for their distributions, fine. Let them have at it.

To me, all of this reeks of a “First, do no evil.” vibe that Google gave us all those eons ago. Then the massive network of data storage centers began popping up by fractions. I mean, after all, they had the means to acquire the data…

But I’m not a developer, I’m just a user, a taker if you will, of all the hard work you folks put into making the entire Project work so well. I sincerely hope that you who make up its management look at this really closely before you take a step down a path that won’t ever be reversed.

Not true. I know that dnf counts downloads, the servers count various metrics to get an idea of who uses what variants, etc.

KDE has a built-in metrics system (manually opt-in); I don’t know about GNOME.

I think a strong community keeps its users. Fedora does not have many downstreams.

Regular:

  • Ultramarine Linux
  • Nobara

Atomic:

  • uBlue, bluebuild

These mostly do things Fedora can’t, like codecs, drivers etc.

They also do opinionated things that improve certain workflows, while maybe annoying “powerusers” that prefer a clean system.

To make sure Fedora can improve in the things users actually do day to day, there has to be data.

Google is an Ad company, among many things. They make operating systems, that are platforms, where income is mostly driven by ads. Their services are either driven by ads, or serve their ad system with data.

Fedora is a community project. Its tests aren’t even directly tied to Red Hat. The data about Fedora is not useful for either Red Hat or IBM, as they sell support, licenses or hardware.

@boredsquirrel makes some good points. I’d just like to chip in with ABRT - the crash reporting tool that is included by default. ABRT sends certain information about crashes to help maintainers identify issues.

Telemetry systems are not unheard of in open-source projects, and there are stringent measures being proposed to make this metrics system:

  • optional (opt-in)
  • transparent (you can see what data is being transmitted)
  • safe (open-source nature allows you to fully audit code)
5 Likes

As someone who was very concerned with the original proposal, I will admit, as others have done already, that this revision appears to be way more considerate towards the userbase and its rights to privacy. It is clear that the owners have taken the feedback seriously and worked hard to get the proposal to adhere to the users’ suggestions and resolve their concerns.

That said, I particularly remember Red Hat’s role in the original proposal posted by @catanzaro :

One of the main goals of metrics collection is to analyze whether Red Hat is achieving its goal to make Fedora Workstation the premier developer platform for cloud software development.

The only reference to Red Hat I was able to find in this new Change Proposal thread, is this:

which is (obviously) directly contradicted by the original Change Proposal, specifically the above quote.

Could we please either add this information (who is driving this change, Red Hat) in the new proposal, or otherwise get a clarification of how this might have changed in the meantime and what exactly Red Hat’s direct involvement might have been in this specific proposal for Telemetry?

1 Like