I agree that focusing on reducing complexity and distributing docs expertise is the right path to boost participation. However, to ensure this Initiative is truly worthwhile and scalable, we must define and track key variables related to contributor experience. Specifically, we should focus on reducing Time-to-First-Contribution (TTFC) and increasing Active Contributor Diversity outside the core Docs Team, for example. We need to measure ‘friction’ and ‘decentralization’ to validate that our efforts are successfully lowering the barrier to entry and creating sustainable, community-wide ownership of the documentation.
Let’s ensure the new Docs Initiative is a collaborative effort from the start, building bridges, not walls.
Sure, I understand your concern. It’s definitely better to have some data or quantified hypotheses before pitching a project or a major proposal. Basing predictions solely on anecdotal evidence or gut feelings can make the project feel unstable.
Let’s reframe the discussion and stick to a data-driven perspective for now.
If you measure “Number of Edits” and “Number of Pages Created,” you might combine them into a single variable, like “Total Contribution Volume.” This is a pragmatic way to carry fewer correlated variables into the analysis and save statistical ‘degrees of freedom’.
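As a minimal sketch of what I mean (purely illustrative, with made-up numbers and placeholder column names, assuming pandas is available):

```python
import pandas as pd

# Hypothetical per-contributor counts; the numbers and column names are
# placeholders, not real data from any repo.
df = pd.DataFrame({
    "contributor": ["alice", "bob", "carol"],
    "edits": [12, 3, 27],
    "pages_created": [1, 0, 4],
})

# Collapse the two raw counts into one composite variable so the later
# analysis spends fewer degrees of freedom on correlated inputs.
df["total_contribution_volume"] = df["edits"] + df["pages_created"]
print(df.sort_values("total_contribution_volume", ascending=False))
```

A weighted sum or a z-score average would work just as well; the point is simply fewer correlated variables.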
We can identify all the individual metrics that represent effort or output from a contributor by looking at the Quick Docs repo in Pagure, for example; it has hundreds of PRs and issues logged by many contributors.
This approach ensures our brainstorming is focused not on whether a problem exists, but on which friction point provides the greatest return on investment to fix. We can then discuss the actual proposals armed with numbers. I’m ready to discuss this further whenever the team is ready.
(Adding back context from the original thread before it was split)
We should also think about how we can assess the usefulness / value of the output - i.e. is the work producing documentation which is useful to users?
A couple of quick ideas:
- Page hits (which I guess need some adjustment for scrapers and so on)
- Some kind of in-page feedback mechanism? For example, Mozilla has put a thumbs up/down quick feedback widget on most of the Firefox documentation pages. If you give it a thumbs down, it asks for feedback on how to improve it.
As far as I can tell, they’ve built a Django app (“Kitsune”) that manages their support forum and associated docs. The “helpful?” form is built into the wiki page template and has some client-side JS to submit responses to the backend.
Nothing incredibly sophisticated, but I assume that to do something similar we’d need a backend outside of Antora, which is basically a static site generator not really set up for this kind of thing?
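Purely as an illustration of how small that backend could be, here is a rough sketch of a feedback endpoint that the static pages could POST to from a few lines of client-side JS. The route, payload fields, and storage are all made up for the example; nothing like this exists in our infrastructure today.

```python
# Hypothetical "was this page helpful?" endpoint; a sketch only, assuming
# Flask. In practice votes would go to a real datastore, not a list.
from flask import Flask, request, jsonify

app = Flask(__name__)
votes = []

@app.route("/api/feedback", methods=["POST"])
def feedback():
    payload = request.get_json(force=True)
    votes.append({
        "page": payload.get("page"),              # which docs page was rated
        "helpful": bool(payload.get("helpful")),  # thumbs up / thumbs down
        "comment": payload.get("comment", ""),    # optional free-text feedback
    })
    return jsonify({"status": "ok"}), 201

if __name__ == "__main__":
    app.run()
```

The widget itself would then just be a small fetch() call in the Antora page template posting the page URL and the vote to that route.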
As you noted, we don’t have public metrics for the variables we discussed (TTFC and contributors outside the Docs group). This is a data gap we need to fill. I believe I can perform an Exploratory Data Analysis (EDA) on data sources like the Pagure repositories (specifically, the pull request/issue history for documentation).
I’m ready to move forward with the data side. I’ll reach out to the Infra team myself to find the best way to access the necessary data, whether it’s through Datagrepper/Datanommer or directly from Pagure.
Before I begin the EDA, I wanted to ask: Do you have any other specific derived metrics or proxy variables in mind that, from your perspective, would be crucial for measuring the impact of this Docs Initiative?
My EDA will focus on variable selection across several potential metrics to determine their statistical significance. Given that early sample sizes might be small, I will also assess which statistical methods will provide the most explanatory power for our data.
This approach will ensure we define a solid, meaningful baseline for our proposal.
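To make the TTFC piece concrete, here is the kind of rough first pass I have in mind. The Pagure endpoint, repo slug, and JSON field names below are assumptions I still need to verify with Infra (or swap for Datagrepper queries), so treat this as a sketch rather than working tooling:

```python
# Sketch: estimate each author's first docs pull request from Pagure's PR
# history. Endpoint shape and field names are assumptions to be verified.
from datetime import datetime, timezone
import requests

REPO = "fedora-docs/quick-docs"  # example repo slug
URL = f"https://pagure.io/api/0/{REPO}/pull-requests"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()
pull_requests = resp.json().get("requests", [])

# Earliest PR per author approximates their first contribution date.
first_pr = {}
for pr in pull_requests:
    author = pr["user"]["name"]
    created = datetime.fromtimestamp(int(pr["date_created"]), tz=timezone.utc)
    if author not in first_pr or created < first_pr[author]:
        first_pr[author] = created

for author, when in sorted(first_pr.items(), key=lambda kv: kv[1]):
    print(f"{author}: first PR on {when.date()}")
```

TTFC itself would then be the gap between an author’s first interaction (account creation, first issue or comment) and that first merged PR, which is where Datanommer’s message history may be more useful than the forge API alone.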
I would suggest that a published metrics framework is itself an initiative-worthy thing that we need people to drive to an outcome. My interest goes well beyond this particular initiative, to the initiative process itself, and even further into decision making much more generally.
I have a lot of questions that metrics would be useful for, and I get a lot of questions from others that metrics would help answer, too.
This project needs a published metrics framework that we can deliberately add metrics to, with the intent of using them for specific impact and health assessments, so we can see the effect of changes that directly affect new contributor engagement.
The biggest problem I have to make an impact on during my tenure is contributor stagnation. We’re gonna end up throwing a lot of darts at that problem, and we’re gonna have to have a way to see which experiments have positive impact. I want to encourage people to experiment and try to move the needle, but I need a needle to measure so I can evaluate which experiments we continue as sustained work and which ones we celebrate for the attempt and let end (or pivot) so we can try another approach.
And to be clear, this isn’t a knock on the Docs initiative in particular. I feel we don’t have a framework of metrics yet, and therefore I feel uncomfortable making demands on any initiative to select metrics for impact assessment when I don’t have a framework for them to plug into.
Comprehensive metrics have been a long-standing desire in Fedora.
It has been attempted multiple times, including by Red Hat teams, and failed for various reasons.
@jflory7 and @rwright have been working in this area for a bit now and have some numbers re: new member retention, but we don’t have a comprehensive framework yet.
I just started working on creating a comprehensive data platform (“data lakehouse”) where we can collect and analyze all Fedora data to achieve the insights we want. I’m getting close to a POC and will tag you in that thread when I announce it.
There is upstream/outside work from CHAOSS, aka “Project Aspen”, which A) is a practitioner community centered around measuring community health and sharing best practices, and B) has some associated tools which focus on analyzing the health of software projects and communities through the data available on code forges (commits, code review comments, etc.). That is a subset of the data that we need to measure, but we are going to lean on them to help us gather that from e.g. Pagure, GitHub, etc., and will probably eventually incorporate that into our lakehouse with other data sources. @moralcode is a Red Hatter assigned to Project Aspen and has been collaborating closely with us.
The ongoing discussion and increasing momentum around community metrics prompted the creation of #data on Matrix this past week. We are working on collecting our various notes/TODOs/roadmaps for community metrics into something coherent, and we will definitely announce that ASAP.
Hi everyone! I forked this conversation from the Fedora Docs 2025 Initiative topic since the discussion was going a bit off-topic from the Initiative and into a bigger conversation about data and metrics. I tagged this post as docs-team and commops-team since members of both teams are engaged in the discussion.
We can keep the other topic more focused on the specifics of @pboy and @pbokoc’s Initiative, and have a bigger conversation around data and metrics for the Docs Team in this topic.
I’m glad you forked the content. Before I knew about the Data WG, I posted something related to the Docs initiative. Now I’m looking forward to focusing on exchanging opinions and collaborating on the data framework. Thank you!