So You Want to Open Source Your Code?
I’m happy to say that here at Kitware we are seeing a strong recent uptick of interest in “open sourcing” existing closed code. This interest is emerging across the board: from research organizations, academics, national laboratories, government agencies, funding agencies and commercial enterprises. It’s consistent with the “open wave” embracing all of us, as mainstream organizations recognize the many benefits of open source: cost savings, agile innovation, avoidance of vendor lock-in, increased software quality, distributed maintenance burden, leveraged community expertise, and so on. Despite this happy turn of events, there is a need to proceed cautiously: it’s not enough to dump some code into a repository, write a README document, and open the Internet floodgates. In this blog post I’ll outline some of the steps that I think are necessary to successfully open source existing software. My motives are simple: I don’t want to see open source tainted by bad practices, or see contributing organizations turned off by a bad experience.
- Get your IP ducks in order. This is pretty obvious, but make sure that the ownership of the code is clear, and that you or your organization has the right to license software under new (open source compatible) terms.
- Use the right license. There are many OSI-approved licenses available; choose the one consistent with the goals of the project. For example, if you expect to collaborate with businesses and want minimal barriers to sharing, use something like an Apache or BSD license (this allows you to more easily work in environments with mixed open and proprietary code). The GPL (and other reciprocal licenses) has its place too, but it can often scare collaborators away.
- Establish a software process. From day one the system should have a formal software process in place including a repository; build, test, deploy and documentation infrastructure; a bug tracker; and organizational documents (e.g., coding style guide, requirements, and so on). There are many tools for implementing such a process. We prefer Git (repository), Gerrit (code review), CMake (build), CTest and CDash (testing), and CPack (packaging). We often use Doxygen (documentation) and Mantis (bug tracking); other great choices are available too.
- Define methods for community interaction. Along with the software process, particular attention should be given to building and supporting the community. Communication channels should be defined (mailing lists, wikis, web pages, periodic phone calls, periodic face-to-face code-a-thons). Governance is important too, although there is a strong dependence on the size of the community; smaller communities may be fairly ad-hoc, larger ones may require formal governing boards and a hierarchy of lieutenants to govern and manage them. Also, make sure that the transition from an autocratic closed-source situation to an open source community leaves behind any politics and inefficient organizational hierarchies.
- Nurture the community. Community does not happen spontaneously; it requires a lot of daily work and persistence. Newcomers to open source tend to be too focused on software technology and miss the fundamental importance of community. If one has to choose between investing effort in the community or in the technology, community should always come first: a healthy community will take care of the technology, while technology alone will not spur the emergence of a community. We have seen large-scale, well-funded projects shut down because they were unable to build a community around the software. Pay attention to educating the next generation, welcoming strangers, building culture, improving skills, and creating space for professional and personal growth. A community should be a place where members feel at home, and where they grow and feel appreciated.
- Insist on high emotional IQ. Many open source projects fail because their communities are toxic. Oldsters berate newcomers for asking “stupid” questions; flame wars erupt; antagonistic and divisive behavior turns what should be a pleasurable activity into an ordeal, and so on. By fostering positive relationship practices (recognizing contributions, providing encouragement, carefully listening to and considering others’ ideas), it’s amazing how fulfilling participating in a community can be, and how much fun!
- Provide experts with the code. Software is a codification of human knowledge: the knowledge of experts who understand an application area in a very deep way. Releasing code into the wild without experts to nurture it is a bad idea. After all, we teach geese to migrate south; similarly, captive closed software needs help to fly again. So find a way to keep the code experts involved, even if only on a consulting basis. And to belabor the point, experts are most effective when they nurture and mentor the community. For example, in the VTK community we are fortunate to have two retired experts (Bill Lorensen and Andrew Maclean) who are essential, if not critical, parts of the community.
- Include open data. Don’t forget to contribute challenging and representative data to the project. Without the proper data it’s almost impossible to properly test and advance the code base. It is essential that data is gathered and curated just as the code base is.
- Build a sustainable business model. While many communities do survive on volunteer effort, it is important to consider how future development is to be paid for. To some extent this is a matter of scale and complexity; smaller communities requiring deep expertise and significant time investments may require funding from external agencies or support from donors. Larger communities may self-organize into efficient volunteer organizations, but even then finding funding for advocacy, infrastructure and collaboration (like conferences) can be very important. Business models built around services and technology integration are a natural fit to the open source enterprise.
- Work with open source experts. Work with companies like Kitware and open source developers who have had the experience of creating, growing and maintaining open source communities. These experts breathe collaboration and know how to foster communities.
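Several of the items above (license, process, data) leave visible traces in a repository, so a quick audit before release can be scripted. The following is only a minimal sketch in Python: the checklist entries and the file and directory names it looks for are assumptions based on common conventions, not requirements, and the function names are hypothetical. It cannot say anything about the community side, which matters most.

```python
from pathlib import Path

# Hypothetical checklist: these file and directory names are common
# conventions chosen for illustration, not rules. Adjust to your project.
CHECKLIST = {
    "license": ["LICENSE", "LICENSE.txt", "COPYING"],
    "readme": ["README", "README.md", "README.txt"],
    "contributing guide": ["CONTRIBUTING", "CONTRIBUTING.md"],
    "build configuration": ["CMakeLists.txt"],
    "tests": ["tests", "Testing"],
    "sample data": ["data", "Data"],
}


def audit_repository(root):
    """Map each checklist item to True if any candidate path exists under root."""
    root = Path(root)
    return {
        item: any((root / name).exists() for name in candidates)
        for item, candidates in CHECKLIST.items()
    }


def missing_items(root):
    """Return the checklist items with no matching file or directory."""
    return [item for item, found in audit_repository(root).items() if not found]


if __name__ == "__main__":
    import sys

    # Audit the directory given on the command line, or the current one.
    repo_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for item in missing_items(repo_root):
        print(f"missing: {item}")
```

Running the script against a checkout prints one line per missing item. An empty result only means the basic scaffolding is there; it is not a sign that the project is ready.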
Notice that in the list above I did not suggest technical changes to the software like “re-architect the code” or “rework the API” or “document more” or “rewrite for common style”. Obviously these are all great things to do prior to open sourcing a project, but they often serve as excuses and delays to getting the code out. In practice, these technical details tend to be taken care of once you build a healthy community around your project. There is a balance between perfection and getting stuff done. To use a crude analogy that makes our HR department cringe: you can’t be afraid to show your underwear. Open source means revealing more than you may want to; get over it, it’s the nature of the beast and essential to getting the community involved. As counter-intuitive as it may seem, we have found that by embracing imperfection open source communities address deficiencies at a faster rate than most closed systems, resulting in high-quality software. So get the code out; after all, if the code is never released, it’s never open sourced.
Open sourcing an existing code base is a wonderful step to take if done with community, quality and sustainability in mind. To my way of thinking it’s a great way to help lay the foundation for accelerating computational innovation. And remember, open source is all about collaboration, so if you are uncertain about how to go about it, ask for help and contact Kitware; we are the open source collaboration experts.
I think Gerrit may be worth mentioning in the “Establish a software process” section.
Excellent point Yumin! I’ll add it now.
Someone wrote an article about Gerrit and the software process in VTK for the new Source, due out any day now.