Open Source is about the differences
Lessons in Apache's million code changes
Published 15:49, 23 September 10
I read an amazing essay by James Bridle titled "On Wikipedia, Cultural Patrimony, and Historiography", in which he argues that "a more nuanced comprehension of historical process would enable us to better weigh truth, whether it concerns the evidence for going to war, the proliferation of damaging conspiracy theories, the polarisation of debate on climate change, or so many other issues."
When I told a sociologist friend about this essay, she wondered how much research it would take to achieve this... until I pointed out that, like version control tools, Wikipedia stores every single revision of an article and can show us its whole history, along with a discussion page that records the conversations about the different changes and how they relate to the site's policies.
There is still a deep gap between those of us who have been involved in the computer revolution and those who are only starting to understand deep changes like this culture of the differences, which they would call historiography, as Bridle does. Actually, I'm not sure how many readers of this entry know about esoteric tools like diff (part of diffutils) or patch, the history of a Wikipedia entry, or tools like Git or Subversion.
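For readers who have never seen one, here is a minimal sketch of what a diff looks like, using Python's standard difflib module rather than the diff command itself. The two revisions, the helper name unified_diff_text and the labels revision_1 and revision_2 are invented for illustration; they are not taken from any real article or repository.

```python
import difflib


def unified_diff_text(old_text, new_text,
                      old_name="revision_1", new_name="revision_2"):
    """Return the differences between two revisions as a unified diff string.

    The function name and the default labels are hypothetical, chosen only
    for this example.
    """
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines,
                             fromfile=old_name, tofile=new_name)
    )


# Two made-up revisions of a short text, differing by a single word.
old = "The war began in 2003.\nThe region saw heavily fighting.\n"
new = "The war began in 2003.\nThe region saw heavy fighting.\n"

diff_text = unified_diff_text(old, new)
print(diff_text)
```

In the printed result, lines starting with "-" exist only in the older revision and lines starting with "+" exist only in the newer one; unchanged lines are shown for context. A tool like patch can take such a description and apply it to the older text to obtain the newer one.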
People close to me know that I am very concerned because few, if any, curricula for Computer Science, Software Engineering or Telecommunications substantially cover the typical tooling used by Open Source projects. In particular, I think that students need a much more basic and detailed exposure to what are sometimes called SCM (Source Code Management, or Software Configuration Management) tools, or Version Control tools.
Those tools store and help us manage the whole history of a project: every single change that has happened to it since it started. Repeated exposure to the list of changes, especially in projects that have sizable teams, helps us to understand the intentions of the programmers who developed the code, the parallel processes going on, the programmers' fashions of different seasons... In other words, the culture of the community.
James Bridle built a big metaphor for the historiography involved by producing and binding a set of twelve books called "The Iraq War". This work, about "the size of a single old-style encyclopaedia", documents on paper the whole revision history of the Iraq War article in Wikipedia between December 2004 and November 2009: seven thousand pages that illustrate 12,000 changes to the text.
More than illustrating the Iraq War, those changes illustrate the culture, prejudices, aims, fears and desires of the different writers, much like the carved stones of old cathedrals, mosques or pyramids tell us a lot about their builders. The last change, as I write this article, is a minor typo fix, changing "heavily" to "heavy". But everyone knows that even the longest trip starts with a small step.
After some time contributing to Open Source communities and projects just by using them, reporting bugs and giving feedback about wanted features or suggested improvements, or even sending small patches, I got more involved in the Apache Software Foundation.
One thing that was quite shocking is that Open Source communities are typically not about co-ordinating people to do some work, but rather about collecting lots of small contributions "in the right direction". Defining that right direction, in long email discussions and face-to-face meetings, is also paramount.
Bridle cites Phil Graham, who once said "So let us today drudge on about our inescapably impossible task of providing every week a first rough draft of history that will never be completed about a world we can never really understand."
This vision of journalism as literally "writing logs" is radically different from our vision of our work as developers. We tend to think of ourselves as the daily providers of the definitive (current) version of the tools that make the information society, not as the ones who cultivate and grow the substrate that makes it possible.
By pure coincidence, the same day that I read Bridle's paper, the Apache Software Foundation celebrated a somewhat symbolic number: one million revisions in the source code of the whole set of Apache projects.
Not by chance, as it is one of the most active projects, the millionth revision belonged to Apache Solr, the search engine that is part of the Apache Lucene search library project. Here is the change set, and a description of the change as a diff.
One million changes, nearly three thousand developers... At the end of the day, we just sail and log our collective journey through the Sea of Changes to the software commons.
Blog post by Santiago Gala, a Java developer, software architect, R&D manager and professor specialising in Open Source and Linux opportunities between academia, corporations, and public administration.
Santiago is a member of the Apache Shindig Project Management Committee, and has been interested in Portal technologies for many years, having been Vice President of Apache Portals and a committer to the Jetspeed and Pluto projects. He is also a Project Management Committee member of the Apache Labs initiative.