Some time ago I was asked to do a study of our most popular open source projects to assess 1) what governance models are out there and 2) whether the governance model has any effect on the project's success (such as the size of its developer community) on the one hand, and on the business of the related vendor(s) on the other. Some of the results are quite remarkable and have general applicability, so I wanted to share them here:
(Small updates done on 2011-07-14. OpenJDK size clarified on 2012-05-21.)
How to grow your open source project 10x and revenues 5x
A study into the most popular open source projects, comparing governance models vs size of the (developer) community and estimating the business value of having a large community.
henrik.ingo [at] openlife.cc
Published as a Creative Commons Attribution licensed work,
for details on copying and sharing,
By studying a selection - believed to be more or less complete - of the most popular open source projects and correlating their size with governance model, we found a strong and, to some, possibly surprising result:
- There are 9 projects (Linux, KDE, Apache, Eclipse, Perl+CPAN, Mozilla+Addons, Gnome, Drupal and GNU) that stand out as significantly larger - roughly 10 times - than any others.
- All of these projects, categorized as "XtraLarge", are developed as collaborative community projects governed by non-profit foundations. No single vendor project has so far been even close to reaching their magnitude.
- There appears to be a glass ceiling limiting the growth of the Large single-vendor projects (MySQL, Qt, OpenOffice, Mono, JBoss).
- While the unfathomable magnitude and velocity of the Linux project is well studied and commonly known, and the preference for collaborative foundation governed projects has started to become generally accepted, it was surprising to find as many as 9 projects that all clearly stand out from the rest. A 9 to 0 is statistically a very strong result in favor of the foundation governance model.
- The other factor strengthening this result is the clear gap between the "XtraLarge" group and the other projects. This gives further confidence in the validity of the result. Even if the underlying data was deemed to be of poor quality, it is clear it does not have errors that could explain a difference of this magnitude.
- Another common trait all 9 "XtraLarge" projects share is the software architecture being either modular (Linux, Eclipse, Perl+CPAN, Mozilla+Addons, Drupal) or formed as a collection of software (KDE, Apache, Perl+CPAN, Gnome, GNU) - in fact many, like Eclipse and Apache are both of these.
The Oracle controlled OpenJDK may prove to become an exception to the rule, since IBM announced in 2010 that it will contribute to OpenJDK development and abandon Apache Harmony. Red Hat was already a contributor, and Apple joined OpenJDK in 2011. While building an XtraLarge developer community around OpenJDK, if successful, is an impressive achievement of Sun and Oracle, the route taken there is perhaps not applicable to the general open source project: Java grew to its current level of importance as a proprietary product, and IBM's abandonment of Apache Harmony seems to have been the result of some kind of non-public but strong coercion from Oracle's side. This "brute force" strategy is simply not available to most open source projects aspiring to reach the XtraLarge category.
By comparing the relative investment of Red Hat and Novell into the development of Linux, it was shown that both the largest and second largest Linux vendor clearly benefit from sharing development cost through collaborative development. Red Hat is the biggest Linux developer, providing 12% of the development effort, yet it holds a proportionally much larger 62% market share. This indicates that the leverage Red Hat gets from the collaborative development of Linux is roughly 5x. For Novell, the leverage is almost as significant: 7.6% of development effort and 29% market share yield a roughly 4x leverage factor.
It was further observed that if Linux had been developed as a single vendor effort by Red Hat alone, the engineering effort would probably be about 10 times smaller than the total effort of the whole Linux community is today. This is in harmony with the previous observation that the largest single vendor projects are roughly 10 times smaller than the "XtraLarge" foundation governed projects.
From these results it follows as an obvious recommendation that vendors participating in open source development and business should look into participating in collaborative community developed projects - where the standard and familiar governance form is a non-profit foundation. If a vendor is currently in control of an open source project, it should explore the option of transferring the project to an existing foundation, or alternatively creating its own foundation for it. Since the original vendor is always the strongest candidate to become the leading vendor also in a collaboratively developed project, the vendor could, as a rule of thumb, expect this strategy - if properly executed - to result in a 10x growth in the project and product, but also 10x larger addressable market, of which the vendor can expect to capture 50% or more as its own market share.
A few lists of open source projects were used as sources to create a (near) complete list of the most popular open source projects in the world. In particular, the included projects had to be:
- Upstream projects, where code is developed (e.g. not a Linux distribution or something like XAMPP)
- Well established, large, and leading (or "tied" like Gnome vs KDE) in their category
The sources used were the lists generated by the popularity-contest package in Debian and Ubuntu and the SourceForge all time top downloads list. These were complemented with a short list of projects known to be large and important but not found by the above sampling method.
From the Debian and Ubuntu "popcon" lists, any projects appearing in the top 1000 were collected. The reason for looking this deep is that more than 98% of the top 1000 installed packages are common system libraries and utilities that can be grouped under "gnu system tools" or just discarded as "other" if not coming from GNU. But among these 1000 top packages one will also find code from familiar software like Perl, Python, PHP, Gnome and the Linux kernel itself.
The SourceForge list turned out to be a disappointment: it is topped by filesharing applications and even a lot of Windows-only software. Only a few projects judged to be relevant to the general area of "LAMP server or Linux desktop" were selected from here.
Finally, a few obviously important but still missing open source projects were added by the author himself.
This resulted in the following sample of open source projects.
gnu system tools
gnu system tools
Table 1: Sample of popular open source projects used in the study
The size of the developer community of the selected projects was then estimated so that they could be ordered roughly by size. For the largest projects (Linux, Apache, KDE, Gnome and Eclipse), separate studies have been made into the volume and structure of the development effort (see Sources at the end), or some number of active participants is advertised by the project host. But even for these projects it is not trivial to compare the size of the communities relative to each other, since each such study still produces different measurements.[1]
For the other projects OHLOH.net was used to get the commits/day, active devs/month and devs all time. While the OHLOH service was a convenient way to quickly gather statistics for such a large number of projects, the quality of the data seems to be rather unreliable. For instance, OHLOH claims that MySQL has only 25 active developers in a given month, yet the author is personally familiar with more full time MySQL developers than that. On the other hand, among the more than 1000 all time MySQL developers there are many duplicates, triplicates and quadruplicates due to people using different email addresses. It is also very unlikely that Thunderbird would have 3 times more developers than Firefox (both without plugins, only core). And there are caveats to be aware of: for instance, searching for "CPAN" on OHLOH gives statistics about the Perl CPAN module, not the entire CPAN archive (which may be impossible to get from anywhere).
Even so, the OHLOH numbers were used for most projects to get an ordered list and the eventual grouping into different sizes of communities, but this was balanced with a subjective check against the author's own observations of the projects. The main results of this study are statistically very strong, with a 10x difference in size between different types of projects, and however large the margin of error due to OHLOH inaccuracies, it is certainly smaller than that.
This resulted in the following table with various statistics on project size:
|project||devs/day||commits/day||loc/day||devs/mo||devs all time||companies all time|
|Unknown but relevant:|
|gnu system tools|
Table 2: Projects ordered by relative size, with various metrics
Numbers in bold are based on published studies from the projects themselves. The other numbers are based on OHLOH.net.
954 people contributed at least 1 patch to the core of Drupal 7 over a 3 year development period. In addition to that, as of this moment, Drupal has some 8291 addon modules, a similar number to the Perl CPAN archive. These facts don't fit nicely into any column in the above table, but they underscore the size of the Drupal community - in fact, Drupal may be the largest open source project out there? The table shows the OHLOH numbers from 2010.
For Perl the only statistic available is the number of modules on CPAN. Most probably the number of developers all time is smaller than 8500, but it gives a good order of magnitude. Same logic is used for Mozilla (where both Firefox and Thunderbird are combined as they share code and plugins).
In 2011 an attempt was made to get more accurate numbers about the engineering investment into OpenJDK - this was motivated by the thinking that it could potentially now be of XtraLarge size due to the investment of Oracle, IBM and Red Hat. On the other hand the OHLOH.net numbers for OpenJDK only report some 50+ monthly developers. By discussions with people familiar with Sun Java development and OpenJDK, it seems this might be roughly correct after all and in any case OpenJDK does not have many hundreds of developers.
Due to the inaccuracy of the data, making a plot or other graph is not useful. Instead, projects can be grouped by size into XtraLarge, Large and Medium categories (small projects were omitted). These were then correlated against the known governance model:
| ||Foundation||Vendor||"Just a project"|
|XtraLarge||Linux, KDE, Apache, Eclipse, Perl+CPAN, Mozilla+Addons, Gnome, Drupal|| || |
|Large||GCC, Python, Samba||MySQL, Qt, OpenOffice, Mono, JBoss||PHP+PEAR|
|Medium||GIMP||Subversion, GhostScript, Wordpress||phpMyAdmin|
|Missing data||Xorg, GNU system tools||OpenJDK|| |
Table 3: Correlating project governance model with size of development community
Categories are observed, not pre-determined, i.e. they follow as observations from the sample. For instance "Multiple vendor consortium" is not observed in the sample. (E.g. Eclipse 2001-2003.)
While the Linux project produces essentially one deliverable, the kernel, others like KDE, Apache and Gnome are entire foundations hosting many sub-projects, but each is considered here as one community. The justification is that these collections of software projects still fall under some common theme, such as Apache mostly producing web software and supporting developer utilities. With the donation of OpenOffice to Apache this interpretation may perhaps have reached its limit: other than the Apache license, OpenOffice seems to have nothing at all in common with any of the other Apache projects.
Similarly separate "contributor modules" archives - found in Perl, PHP, Drupal etc... - are considered part of the main project, as a modular architecture is a key enabler of growing a large community. To compare, if MySQL had been similarly community driven, phpMyAdmin could have been part of it, not a separate project as it is now.
GIMP predates Gnome but is now part of it.
GCC is part of GNU, but listed separately as data was available. The author estimates that "the GNU project" would also be an XtraLarge project if data had been found, as GCC alone tops the Large category already.
Similarly OpenJDK may be the first vendor controlled project to break into the XtraLarge category, see discussion in conclusions.
Python changed to Foundation in 2000. Subversion was previously led by CollabNet but has since 2009 been an Apache (Foundation) project, and Wordpress is transferring from Automattic to its own foundation in 2010. Both are here categorized as vendor projects, since this is the model that existed for most of their lifetime.
Qt, MySQL and GhostScript are the stars of the 1990s dual-licensing era.
OpenOffice was forked in 2010. The Document Foundation producing the LibreOffice fork already got 77 new contributors in a few months. In 2011 Oracle donated the OpenOffice code to the Apache Foundation, where IBM is investing developers and rallying for a community. This study classifies the historical OpenOffice by Sun, as it would be too early to say anything about the Apache and LibreOffice descendants.
Mozilla Foundation has ~100MUSD revenues (Update: In December 2011, Mozilla announced that Google now will pay 300MUSD per year to Mozilla Corp.) and employs many engineers; in the other foundation projects, engineers typically work at participating companies. It is notable - and relevant for the following discussion on financial motivations - that this makes Mozilla Corp larger on a revenue basis than any of the for-profit vendors in the vendor column! (Update: With over 300MUSD annual revenue, Mozilla Corp becomes an undisputed leader.)
Wordpress only has data for core; plugins and themes are added here as a guesstimate for it to even reach Medium.
Note that PHP was in 2011 moved to the "Just a project" column, after the author became aware that "The PHP Group" has never formally incorporated in any jurisdiction. Despite this fact, PHP does have a well defined process of membership and decision making similar to what more formal organizations tend to have. The lack of a legally incorporated organization seems to mainly be a problem in scenarios where PHP would need to assert trademark or copyright or otherwise enter into legal proceedings against some threat - scenarios which have never materialized. PHP may very well be the largest unincorporated project in the world.
All large projects have some form of formal governance, either single vendor or non-profit foundation.
There are 9 projects (if including GNU, for which data was not available) in the top "XtraLarge" category. These projects clearly stand out from the following "Large" category. On average they are a factor of 10 times larger. (Gnome is a little smaller than the other projects in this top category, but it too clearly stands out from the projects in the next category with roughly a hundred or less monthly developers.)
All of the XtraLarge projects are non-profit foundation governed and none of the single vendor projects have managed to grow even close to this size.
The above result is statistically very strong. While the unfathomable size and velocity of the Linux project is well studied (18 000 lines of code changed per day!), this is not a single case: altogether 9 collaborative and foundation governed projects have reached this size.
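To give a rough feel for what "9 to 0" means, here is a minimal sketch of a sign-test style calculation. The null hypothesis used here, that each XtraLarge project is equally likely to be foundation governed or vendor governed, is an assumption made purely for illustration; the study itself does not state one, and the projects are neither a random sample nor independent of each other. The function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more 'foundation' outcomes in n independent trials,
    each with probability p (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# ASSUMPTION: a 50/50 coin flip per project; this is not a figure from the study.
p_value = prob_at_least(9, 9)
print(f"P(9 of 9 by chance) = {p_value:.4f}")
```

Under that illustrative assumption the chance of seeing all 9 on the foundation side is about 1 in 500. A different baseline, for example one reflecting how common foundations are among all projects, would give a different number.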
There appears to be a glass ceiling for single vendor projects prohibiting their growth from the Large category upwards. To truly reach their fullest potential, open source projects are recommended to consider the proven governance model of a non-profit foundation around which participants collaborate.
Oracle's OpenJDK may become the first vendor controlled project to reach the XtraLarge category. Red Hat already contributes to it and recently IBM announced it will abandon its competing Apache Harmony effort and contribute to OpenJDK instead. OpenJDK is however a special case: 1) Java grew to its current magnitude as a purely proprietary product and 2) while not publicly known, the move by IBM from Apache to OpenJDK seems to have been influenced by some kind of coercion from Oracle, such as related to the Oracle vs Google patent and copyright suit against Dalvik/Harmony, or holding some aspect of the JCP process hostage, etc. So while it is a remarkable achievement to build a large developer community around a previously closed source product, the path taken to achieve that is perhaps not applicable in a general case.
Large Vendor governed projects tend to be controversial:
- MySQL: Financial star, but now forked many times over. A lot of work to just keep it alive now.
- OpenOffice: Typical Sun: Stagnated and mismanaged since 2000. Now successfully forked: all Linuxes immediately backed it, 77 new contributors within 2 months.
- Mono: FOSS fundamentalists boycott it anyway because of .NET origin, the rest don't care that it is vendor managed.
- Qt: Technically superior, but lost total dominance to being 50-50 with GTK (part of Gnome) due to Trolltech over-controlling it. (Financially ok: Nokia acquired in 2008.)
- JBoss has been uncontroversial to the community, but was attacked by IBM backed Apache Geronimo for market share (but survived this).
- OpenJDK is likely to break into the XtraLarge Vendor spot, after Oracle bullied IBM into contributing to it.
Large Vendor projects are also known to have poor community contributions. (TODO: Find out about JBoss?)
XtraLarge foundations "acquire" Medium projects. (Subversion, GIMP)
It is a common claim that for an open source project to flourish, a modular architecture is imperative. This is of course recommended for any software project, but in open source it is seen as a pre-requisite to enable the distributed development typical of open source projects. It is easy to observe that all of the XtraLarge projects have either a modular architecture (Linux, Eclipse, Perl+CPAN, Mozilla+Addons, Drupal) or are collections of software (KDE, Apache, Perl+CPAN, Gnome and GNU). In fact, many like Apache, Eclipse and KDE are both of these.
Business value of having a (Xtra)Large community
Intuitively it seems obvious that it can only be positive - also financially - if a project can grow 10 times larger with one model than another. Yet, it is an appropriate question to ask, from a vendor point of view, whether it would still be financially preferable to keep control of the project at the vendor in order to monetize it better, even with the risk of the project then remaining significantly smaller. Note that this translates directly into the product receiving less investment into its development.
To answer this question, we will look into the Linux project and the financial performance of its dominant vendors. Out of all projects and markets this is the one with most studies available. Also, the Linux vendor market can be seen as somewhat of a pioneer in open source businesses, so it is reasonable to expect - or wish - that dynamics first seen in this market can later be seen also in the other projects.
The task is to estimate the benefit Linux vendors get from sharing development costs by collaborating, and on the other hand what they might lose in revenues and market share by being open. Looking at reports on Linux development on the one hand, and market shares of the different Linux vendors on the other hand, we can see that:
- Red Hat is the largest contributor to Linux development at 12% of commits.
- Red Hat has the most control of Linux development, employing 36% of the lead developers that review commits. (...and this used to be 50% a few years ago.) So we make an additional observation that while sharing development cost with others, Red Hat is quite firmly in control of this project.
- Red Hat has 62% market share of Linux operating system sales.
The above means that Red Hat, as the leading Linux vendor, has a disproportionately large market share compared to its development investment.[2] Thus, the leverage that Red Hat receives from participating in the collaborative development of Linux is:
Leverage = 62/12 ~= 5x
To justify the logic behind the above calculation we should make some additional remarks: The revenues generated by a product will depend on
- The total addressable market (which for operating systems is some tens of billions USD)
- The product's market share
- One factor limiting the product's market share is how well its functionality and features serve the needs of the total addressable market. For instance, early on one may have claimed that Linux wasn't suitable for all kinds of server workloads, and it is perhaps still not suitable for many users as a desktop operating system. How much of the total market opportunity a product is able to serve is to a large extent a direct result of the engineering investment received by the product.
From the above treatment of Red Hat's role in the collaborative development of Linux on the one hand, and its market share on the other, we can conclude that if Red Hat was solely developing Linux as a single-vendor effort, then Linux would receive only 1/10 of the current total engineering investment. This in turn means the total market share of Linux of the total operating systems market might be about 1/10 of the current market share, since the product would be weaker and not serve well as many customers as it does today. (Of course, more likely, it might then be totally irrelevant and have a very small market share, if at all.)
On the other hand, by developing Linux collaboratively, the total Linux market is 10 times greater, and Red Hat has been successful in capturing 62% of that larger market. This seems like a good strategy to follow, resulting in 5x more revenues than the alternative strategy.
It is also possible to do the same calculation for the second largest Linux vendor Novell:
- 7.6% of all commits
- 29% market share
- Leverage = 29/7.6 ~= 3.8x
It seems also Novell is benefiting from the collaborative development model of Linux, having a market share almost 4 times larger than its engineering investment. This means that both Red Hat and Novell gain significantly from collaborative development - this is commonly called a win-win situation. (Of course, if Linux was a single vendor project, then Novell's share as the second largest vendor would tend to be zero or insignificant, so the benefit of participating in a collaborative project is in that light quite dramatic for the second vendor.)
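The two leverage figures can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation described above, using the percentages quoted in the text; the names `VENDORS` and `leverage` are illustrative.

```python
# Percentages quoted in the text: share of Linux commits and share of
# Linux operating system sales for the two largest vendors.
VENDORS = {
    "Red Hat": {"commits_pct": 12.0, "market_pct": 62.0},
    "Novell": {"commits_pct": 7.6, "market_pct": 29.0},
}


def leverage(market_pct: float, commits_pct: float) -> float:
    """Market share divided by share of the development effort."""
    return market_pct / commits_pct


results = {
    name: leverage(v["market_pct"], v["commits_pct"]) for name, v in VENDORS.items()
}
for name, value in results.items():
    print(f"{name}: {value:.1f}x")
```

This gives about 5.2x for Red Hat and 3.8x for Novell, which the text rounds to 5x and almost 4x.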
Finally, we should point out the harmony between these observations of vendors' revenues and investment into Linux, and the above treatment of governance models of open source projects: We observe here that if Red Hat was developing Linux alone as a single vendor effort, it would be approximately 10x smaller. On the other hand, this is precisely the difference observed between the 9 foundation governed and collaborative projects in the "XtraLarge" category compared to the average size of the largest single-vendor projects.
From these results it follows as an obvious recommendation that vendors participating in open source development and business should look into participating in collaborative community developed projects - where the standard and familiar governance form is a non-profit foundation. If a vendor is currently in control of an open source project, it should explore the option of transferring the project to an existing foundation, or alternatively creating its own foundation for it. Since the original vendor is always the strongest candidate to become the leading vendor also in a collaboratively developed project, the vendor could, as a rule of thumb, expect this strategy - if properly executed - to result in:
- The project growing 10 times larger.
- The product thus receiving 10 times more investment into its development.
- This larger development community therefore leading also to a 10 times larger addressable market.
- The vendor being able to capture 50% or more of that larger market.
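The rule of thumb above can be written down as a small model. This is a sketch under stated assumptions, not a forecast: it assumes, as the Linux discussion does, that the addressable market scales with the size of the project, and that in the single-vendor baseline the vendor owns the whole of its (smaller) market. The function name `revenue_multiplier` is hypothetical.

```python
def revenue_multiplier(market_growth: float, captured_share: float) -> float:
    """Vendor revenue under the collaborative model, relative to a
    single-vendor baseline.

    ASSUMPTION: in the baseline the vendor holds 100% of a market of size 1,
    so the multiplier is simply market growth times captured share.
    """
    if market_growth <= 0:
        raise ValueError("market_growth must be positive")
    if not 0.0 <= captured_share <= 1.0:
        raise ValueError("captured_share must be between 0 and 1")
    return market_growth * captured_share


# The rule of thumb (10x market, 50% share), plus two other shares for comparison.
for share in (0.3, 0.5, 0.62):
    print(f"share {share:.0%}: {revenue_multiplier(10, share):.1f}x")
```

With a 10x larger market and a 50% share the multiplier is 5x, matching the headline figure. The 30% and 62% rows only show how sensitive the result is to the share the vendor manages to keep.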
Linux Kernel Development - who writes it
Red Hat Market share
Popular FOSS projects:
popcon.debian.org (top 1000), popcon.ubuntu.com (top 1000), sourceforge.net/top (a few picks),
https://drupalmodules.com/forum/post/5541 / https://drupal.org/project/Modules (Drupal modules)
Updated 2011-12-18: Wordpress obviously is not an Acquia project, Acquia is a Drupal services company, whereas Automattic is the Wordpress company.
- [1] The author suggests that future studies into open source projects take note of the statistics presented by the Linux Foundation whitepaper "Who writes Linux" and aim to produce at least the same metrics in their studies.
- [2] This conclusion disregards the fact that Red Hat Enterprise Linux contains more software than just the Linux kernel. Even so, Red Hat is also a contributor to the other main projects included in its product. Therefore this observation of Linux as the largest and defining piece of its product is still considered a useful "rule of thumb", if not perfect.
These are all infrastructure projects
With the GIMP as a major exception and possibly 2 others as minor ones, all the projects listed here are infrastructure projects - they deal with system services, provision of taken-for-granted functionality (email & web) and/or the running of servers. The conclusions that can be drawn from this set of software are not obviously transferable to niche or domain specific tools that operate with vastly smaller user-bases, much smaller development communities and no clear business model.
re: infrastructure projects
This is a correct observation. The right way to interpret these results is that to grow really big, there are several things that need to happen, but using the collaborative foundation model is one of them. If the technology at hand only serves a niche (x-ray imaging, whatever), then of course that will limit its size anyway.
The study was originally done for an infrastructure technology, so it was very relevant.
Additionally, I have the theory that the more "low level" infrastructure you talk about, the bigger the addressable market of the technology, and this contributes to having a bigger community. So everyone from mobile phones to super computers need an operating system, and that's why Linux is the biggest of these in all respects. Not everyone needs a database, but those that do need one tend to cluster around a few common ones (MySQL for open source, Oracle, Microsoft on the proprietary side) regardless of the application or programming language. Then many need a web server, but this group is again smaller than everyone who needs a database. And for some reason the programming language space is much more fractured than the previous technologies, so we have Perl, PHP, Python, Java, .NET... many more medium-to-large alternatives than we had further down the stack.
Then the same is repeated for applications, so Gnome and KDE obviously are bigger than any one particular application, etc..
Mozilla Corp versus the Mozilla Foundation
For accuracy's sake, I think you will discover that Mozilla Corp is the actual employer of the developers and holds the revenue stream. It doesn't invalidate your observations and recommendations as the relationship between Mozilla Corp and the Mozilla Foundation is rather unique.
re: Mozilla Corp versus the Mozilla Foundation
Yes, I'm aware of that. Just tried not to drift too far from the main point.
Anyway, what is really funny is that to the best of our knowledge, Mozilla Corp/Foundation generates more revenue than any of the vendors in the other column. So even in the thing they are supposed to be good at they lose!
Modular architecture / collections of software
"It is a common claim that for an open source project to flourish, a modular architecture is imperative. This is of course recommended for any software project, but in open source it is seen as a pre-requisite to enable the distributed development typical of open source projects. It is easy to observe that all of the XtraLarge projects have either a modular architecture (Linux, Eclipse, Perl+CPAN, Mozilla+Addons, Drupal) or are collections of software (KDE, Apache, Perl+CPAN, Gnome and GNU). In fact, many like Apache, Eclipse and KDE are both of these."
What is the % of projects which fit that (modular architecture / collections of software) description? If 95% of the projects out there fit a description, and the top-x also fit that description, can we really infer that it's a criterion for success? One could argue that it's just the most common way of building software.
For FOSS Web applications (which is a huge category), almost all projects have an emphasis on modules/plugins/extensions. There is one notable exception: Tiki Wiki CMS Groupware, which is the:
But I wonder if anyone could come up with other projects which are exceptions?
That is a valid question - maybe it's just a fact that everyone is modular these days. From the projects categorized here, I wouldn't say that MySQL is modular, though it is moving increasingly in that direction. I don't know enough about the Samba, Subversion or GhostScript code bases to have an opinion, but it seems all others are indeed quite modular. (For some like Qt and OpenJDK this just follows from being object oriented.)