Explaining Open Source's Exponential Growth
Published 10:26, 17 March 08
One of the problems with open source is that much of it happens invisibly. Whereas proprietary software, which is sold, has to be publicised at some point, open source can simply be written: whether or not it gets used is a question of the author's personal inclinations.
Even the big-name open source projects – Linux, Apache, Firefox – have the problem that contributions are made in all sorts of ways, and that there is nobody really tracking who is doing what where.
That makes a paper from SAP Research's Amit Deshpande and Dirk Riehle particularly welcome, since they do the hard work of tracking down just how much coding is going on these days. They start from a hard core of open source activity, ignoring projects that are dormant:
For our analysis, we use the database of the open source analytics firm Ohloh.net, which has been crawling open source software code repositories since 2005. Our database snapshot contains 5122 active and popular open source projects written in 30 different programming languages covering 103 open source licenses. All data is updated on at least a weekly basis.
The database provides fine granular data of developer actions over the last 17 years from 1990 to 2006. We analyze the average amount of source code added per month for the time frame of January 1995 to December 2006 as well as the number of projects added over time.
We find that both the growth rate as well as the absolute amount of source code is best explained using an exponential model. Given that previous research showed that most open source projects grow at a polynomial rate, we suggest and then verify that the number of open source projects is growing at an exponential rate.
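To make the idea of one model "best explaining" the data more concrete, here is a minimal sketch of how an exponential model and a polynomial (quadratic) model might be compared. It is not the authors' method: the paper's actual fitting procedure is not described here. The data are synthetic, generated from an assumed exponential curve with 2% random noise over 144 monthly points (the length of the January 1995 to December 2006 window). All function names and parameters (`fit_exponential`, `fit_polynomial`, `compare_models`, `growth`, `noise`) are hypothetical. The exponential fit uses ordinary least squares on the logarithm of the data, which is one common choice among several.

```python
import math
import random


def fit_exponential(ts, ys):
    """Fit y = A * exp(r * t) by ordinary least squares on ln(y); returns (A, r).

    Assumes every y is positive.
    """
    n = len(ts)
    logs = [math.log(y) for y in ys]
    mean_t = sum(ts) / n
    mean_l = sum(logs) / n
    cov = sum((t - mean_t) * (l - mean_l) for t, l in zip(ts, logs))
    var = sum((t - mean_t) ** 2 for t in ts)
    r = cov / var
    return math.exp(mean_l - r * mean_t), r


def fit_polynomial(ts, ys, degree):
    """Least-squares polynomial fit; returns [c0, c1, ...] for c0 + c1*t + ..."""
    m = degree + 1
    # Normal equations (X^T X) c = X^T y, where X[i][j] = t_i ** j.
    ata = [[sum(t ** (i + j) for t in ts) for j in range(m)] for i in range(m)]
    aty = [sum((t ** i) * y for t, y in zip(ts, ys)) for i in range(m)]
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, m):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[row][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        acc = aug[i][m] - sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = acc / aug[i][i]
    return coeffs


def rss(ys, preds):
    """Residual sum of squares between observations and model predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))


def compare_models(months=144, growth=6.0, noise=0.02, seed=42):
    """Fit both models to SYNTHETIC exponential data; returns (rate, exp_rss, quad_rss)."""
    rng = random.Random(seed)
    # Time is scaled to [0, 1] so the polynomial normal equations stay well conditioned.
    ts = [i / (months - 1) for i in range(months)]
    ys = [100.0 * math.exp(growth * t) * (1 + rng.uniform(-noise, noise)) for t in ts]
    a, r = fit_exponential(ts, ys)
    exp_rss = rss(ys, [a * math.exp(r * t) for t in ts])
    coeffs = fit_polynomial(ts, ys, degree=2)
    quad_rss = rss(ys, [sum(c * t ** k for k, c in enumerate(coeffs)) for t in ts])
    return r, exp_rss, quad_rss


if __name__ == "__main__":
    rate, exp_rss, quad_rss = compare_models()
    print(f"fitted growth rate: {rate:.3f}")
    print(f"exponential RSS: {exp_rss:.1f}, quadratic RSS: {quad_rss:.1f}")
```

On data that really are exponential, the quadratic leaves a much larger residual error than the exponential fit, which is the kind of comparison the "best explained" claim rests on. With real data, a fair comparison would also account for the number of parameters in each model.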
Here they are using “exponential” in its strict mathematical sense, not the loose, hyperbolic sense of “very fast”. That is, it describes something that grows at a rate proportional to its current size, which means that the absolute rate of increase keeps going up as time goes on. Obviously, this is not sustainable in the long term, but what it does mean is that open source creation is accelerating. It's not clear what the equivalent formula for proprietary software is, but almost certainly it is not exponential. It's not hard to see why there should be such a contrast between the two modes of software production.
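In symbols, as a worked statement of the textbook definition rather than anything taken from the paper: write N(t) for the amount of code at time t and k > 0 for a constant growth rate (both symbols are introduced here purely for illustration). The second line shows, for contrast, what happens to the relative growth rate under polynomial growth with exponent p.

```latex
\frac{dN}{dt} = kN \quad\Longrightarrow\quad N(t) = N_0\, e^{kt},
\qquad T_{\text{double}} = \frac{\ln 2}{k}

N(t) = c\, t^{p} \quad\Longrightarrow\quad \frac{1}{N}\frac{dN}{dt} = \frac{p}{t} \to 0 \quad \text{as } t \to \infty
```

Under exponential growth the relative rate k stays fixed, so the absolute increase kN rises along with N and the total doubles every ln 2 / k units of time. Under polynomial growth the relative rate p/t falls towards zero, even though the absolute amount still climbs.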
Proprietary code has a very limited fecundity: it begets nothing outside its own immediate descendants, since its licence generally forbids use by others. Open source, by contrast, explicitly encourages its use by different projects. So the more code that is created, the greater the pool of code that is available for others to use and build on, which tends to drive even faster creation of new projects. As in so many other areas, this is something that proprietary code just cannot match – and cannot fake: the only way to get the benefits of free software is to become it.