Blog by Sumana Harihareswara, Changeset founder
Contribution Metrics Are Messy: An Example
I frequently notice folks asking or answering questions like "how many contributors does this open source project have?" or "how much contribution is this project getting?" Here's an example of why those aren't simple questions.
ffmpeg is a mindbendingly powerful command-line tool to play and transform audio and video files. There are a zillion commands and flags you can use and it's hard to memorize them. So, several years ago, Ashley Blewer started a great website called ffmprovisr. It's a cookbook of useful ffmpeg recipes, like "join 2 files of the same type" and "compare two video files for content similarity using perceptual hashing". It's grown into an open source project with many users and multiple committers, gotten redesigned, and even inspired imitators.
So. Would you call ffmprovisr a contribution to ffmpeg? It's useful documentation for ffmpeg, but doesn't live in the ffmpeg repository/repositories. It helps more people use ffmpeg (and probably reduces the number of support queries its maintainers get). By creating ffmprovisr, has Blewer become a contributor to ffmpeg? Should we have a category like "indirect contribution", and, if so, how would we delineate that?
Let's go older. In the mid-2000s, the biggest ad for the Ruby programming language was the Rails web framework. Rails was the gateway through which a ton of programmers started to learn and love Ruby. So every Rails committer, documenter, trainer, and bug reporter also ended up doing a favor for Ruby. Can we say that Rails is a contribution to Ruby? Is it useful to say that?
It depends on what further question you're trying to answer. We ask "what is a contribution?" as part of asking "how much contribution are we getting?", "how many contributors do they have?", or "who are this project's contributors?". And the right answer depends on what you want to do with it, because these questions have different answers:
And so on. (These questions range through most of the five major ways projects get stuck.) For some of those questions, answers for a project like ffmpeg change depending on whether you ignore ffmprovisr, or catalog it as something like a plugin or extension to ffmpeg, a contribution to the ffmpeg ecology. And some answers for a language, framework, or operating system -- something like Ruby, where usage depends on people making useful tools built on top of the foundation you provide -- only make sense if you incorporate data about ecology inhabitants like Rails.
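To make that concrete, here's a toy Python sketch of how a "number of contributors" figure moves depending on which sources you decide to count. Everything in it is an assumption for illustration: the `Activity` record, the source labels, and the sample people are made up, and none of it reflects real data about ffmpeg, ffmprovisr, or any other project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    person: str  # who did something
    source: str  # where it happened (hypothetical labels)
    kind: str    # what it was, e.g. "commit", "docs", "answer"


# Entirely made-up sample records; names and sources are placeholders.
ACTIVITIES = [
    Activity("alice", "core-repo", "commit"),
    Activity("bob", "core-repo", "commit"),
    Activity("bob", "mailing-list", "answer"),
    Activity("carol", "cookbook-site", "docs"),
    Activity("dave", "qa-forum", "answer"),
]


def count_contributors(activities, sources):
    """Count distinct people with at least one activity in the given sources."""
    return len({a.person for a in activities if a.source in sources})


# Definition 1: only people who touched the main repository.
core_only = count_contributors(ACTIVITIES, {"core-repo"})

# Definition 2: anyone active anywhere in the wider ecology.
ecosystem = count_contributors(
    ACTIVITIES,
    {"core-repo", "mailing-list", "cookbook-site", "qa-forum"},
)

print(core_only, ecosystem)  # prints "2 4"
```

Same project, same people, and the count doubles once the cookbook site and the Q&A forum are in scope. Neither number is wrong; they answer different questions.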
Sometimes you can answer those questions just by checking some pre-compiled stats in a GitHub repository. Sometimes you can't, because the answers aren't there; they're in a different repository altogether, or on StackOverflow, on mailing lists, or in a mix of places including individuals' private conversations.
If you want to dive deeper, I don't know who's thought more about these questions than CHAOSS (Community Health Analytics Open Source Software). Its GrimoireLab tool tries to actually gather information from lots of sources instead of just the GitHub API. Even so, if you want to make a significant choice based on a quantitative measure, you will probably need to supplement it with a qualitative assessment that can catch ecology-level factors, as in the case of ffmprovisr.
If you want to work in the general area of quantitative contributor metrics, CHAOSS will welcome you. But if you don't, then instead of looking for the One Grand Definition of "contributor" or "contribution", get more concrete about the questions you want to answer so you can go from there.