The Need for Indoor Plumbing
I’ve been in the business of creating and deploying scientific software for more years than I’ll admit to. Over these many years, there have been several constants: the work is challenging and fun, and it makes a positive impact on our society. Another constant is the way work like this is funded, typically by large research organizations such as Federal agencies, research foundations, National Labs, and commercial R&D labs. However, there is a dirty little rumor which unfortunately has some truth behind it: the process of developing software through large-scale research programs is relatively ineffective and wasteful. Rather than usable tools for scientists and engineers, what gets created is often a collection of shiny toys with little practical use. As one of our collaborators, Russ Taylor at UNC, so aptly put it, we could use a lot more basic “indoor plumbing” to complement our bleeding-edge zero-G toilets with the latest bells and whistles.
Here’s a typical example. I’ll travel to a university or lab where absolutely amazing scientific work is being done. Imagine that the lead scientists, research staff, and graduate students have developed a new imaging modality with higher resolution, faster acquisition, and greater accuracy. They have created new algorithms to segment, analyze, and/or image the resulting data. But suddenly it all comes crashing down: their software environment is a complete mess.
“What do you do with the data?” I ask. “Well, we put it on these portable USB drives and put them in this cabinet,” they say. (This after the generous funding sponsor spent millions of dollars on data acquisition.)
“Have you validated these algorithms?” Answer: “We wrote a paper and it was accepted in journal XYZ, but as of yet no one has reproduced the results.” (It turns out they don’t release the software, so reproducing the results would mean reimplementing everything from the paper, and no one has time for that.)
“Do you have a software process, for example a CVS, SVN or git repository?” “What’s that?” they answer. (I don’t even bother to ask about software testing.)
“How was the code designed and implemented?” “Well, we have a core FORTRAN module that reads files and outputs new files; it was written by a (biology) grad student 15 years ago. We also have some C++ analysis programs, rewritten from the original C implementation. George, our new crackerjack programmer (an ex-physics major), is now trying to integrate all this using Python, but he is struggling to adapt program XYZ (an obsolete simple image viewer) to the large data we are generating. Our graduate students all have their own implementations of their algorithms, and now that you mention it, several have graduated and we have no idea how to run their software, assuming we can find it.”
I have two primary reactions to this mess. As a businessperson, I hear the klaxon go off and realize there is an opportunity here. I know that if the Kitware guys get a chance to work their magic, the efficiency and impact of the research program will increase dramatically. This is an easy sell, and I am happy because I know the Company will continue to grow. On the other hand, I feel despair: how is it that we’ve completely overlooked the importance of effective software and data practices in our expensive research programs? Just think how much more we could have done if these brilliant scientists and engineers had the right tools!
There are a lot of reasons for this mess. First, in the old days useful software was reasonably simple to create; many scientists and engineers taught themselves to code and produced useful tools. Since then, software has become enormously complex, and more expertise is required to develop useful, scalable, and maintainable systems. Second, funding agencies and researchers have often discounted the importance of software and do not budget enough for it. Third, there has at times been a throw-away mentality: software is not reused but reinvented, so we don’t leverage each other’s work (an approach that obviously fails once software gets “big” enough). Finally, to add insult to injury, the funding sponsors that do value software will only fund “novel” software and algorithms, and they discount practical steps to bring modern software practices and systems into organizations. Just try writing a proposal that advocates a framework, system, architecture, integration platform, etc.; chances are you’ll hear yawns and read reviews containing the words “no novelty.”
The good news is that there are effective solutions out there. Directly incorporating software engineering into a research program works quite well if the PI can make a strong enough argument for it. For example, Kitware is fortunate to be part of an NIH National Center for Biomedical Computing, NA-MIC (the National Alliance for Medical Image Computing). Its Engineering Core, which works with the Algorithms Core and the Driving Biological Problem Cores, is all about good software engineering and data management. However, even in this case the program was sold by advocating new software engineering tools, so we had to demonstrate novelty. This is not always easy to do, and frankly some research programs don’t need novelty in their software process; they need basic indoor plumbing. Funding agencies need to become receptive to integrating software processes and infrastructure, rather than continually pushing for novelty in every aspect of a program.
Another approach is to work with services companies like Kitware, which can do wonders for an organization’s software infrastructure. This works particularly well if the organization hires dedicated software staff who work with the external software experts to lay down the scientific computing systems and then stay on to maintain the infrastructure over the long term.
Finally – and you knew this was coming – open-source systems and communities can play an enormously important role in building a productive software infrastructure. Not only is useful technology available quickly, with no IP barriers or other commercial stickiness, but open-source communities also take up some of the maintenance burden, provide a means for reusing good work, and promote good software practices. Plus, if done right, the software remains well-tended and available for future work when students graduate or staff move on.
Don’t get me wrong, I love creating and polishing shiny new technology objects, and they definitely have an important role to play in the R&D enterprise. However, as technology leaders we have a responsibility to ensure that our efforts and expenditures are as effective as possible, making the largest impact on our collaborators and ultimately our society. And sometimes this means more indoor plumbing, and a lot fewer zero-G toilets.
This is very well written and touches on many truths.
We have a scientific computing group at the University of Wisconsin that teaches graduate students, staff scientists, faculty and undergraduates how to use open source tools to improve their research.
Over Christmas break (January), we will be holding a 3-day bootcamp on software carpentry based on Dr. Greg Wilson’s program: http://software-carpentry.org/
We will have sessions on:
– Version control
– The shell
– Testing
– Build systems
– Documentation systems
– Web services
We are beta testing the sessions over the semester, and a prototype of the CMake-based build systems session can be found here:
svn co http://hacker-within.googlecode.com/svn/trunk/sc/build_systems/
If you would be interested in taking part in the event in any way (reviewing the tutorials, teaching the build systems session, or having Kitware financially sponsor the event), please let me know.
We are currently revamping our website,
http://hacker-within.org
but our mailing list is active:
http://groups.google.com/group/hacker-within/
Regards,
Matt