Packaging VistA for Debian
This article describes the process of packaging VistA for the Debian Linux distribution. It begins with our rationale and motivation and continues with a discussion of the collaboration that followed with many members of the open-source community.
What is VistA?
The Veterans Health Information Systems and Technology Architecture (VistA) is the most comprehensive Electronic Health Records (EHR) system in the world. VistA was internally developed by the Department of Veterans Affairs (VA) and is freely available as open-source software. VistA is currently deployed universally across the VA at more than 1,500 care sites including 153 hospitals, 765 outpatient clinics, and nearly 300 VA Vet Centers. In total, it helps provide healthcare for over 7.84 million patients.
What is MUMPS?
VistA is written in MUMPS. MUMPS is a programming language with a built-in NoSQL hierarchical database. Sometimes, the MUMPS language is referred to as M. It is also common to see the dual mention of M/MUMPS.
The hierarchical model was one of the first data models. The World Wide Web, wikis, and the file systems of most modern operating systems can all be viewed as hierarchical databases: they store data in tree-like structures.
MYLABEL ; This is a comment
 WRITE !,"Hello World"
 QUIT
(Hello world in MUMPS)
SET (A,B)=1
FOR I=1:1 SET S=A+B WRITE !,S SET A=B SET B=S QUIT:S>100
WRITE !,"Result= ",S
(Fibonacci Series computation in MUMPS)
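For readers coming from mainstream languages, the Fibonacci loop above can be mirrored in Python. This is only an illustrative translation; the function name fibonacci_until is an assumption made for this sketch and does not come from any M or VistA code.

```python
def fibonacci_until(limit):
    """Mirror the M loop: start from A=B=1 and stop once a sum exceeds limit."""
    a, b = 1, 1                  # SET (A,B)=1
    sums = []
    while True:                  # FOR I=1:1 counts upward with no upper bound
        s = a + b                # SET S=A+B
        sums.append(s)           # WRITE !,S
        a, b = b, s              # SET A=B SET B=S
        if s > limit:            # QUIT:S>100 (postconditional QUIT)
            break
    return sums


if __name__ == "__main__":
    values = fibonacci_until(100)
    for value in values:
        print(value)
    print("Result=", values[-1])
```

The postconditional QUIT at the end of the M line plays the role of the if/break pair here: the loop body always runs once before the exit condition is checked.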
set ^beatle("John","birth","date","1940-10-09")=""
set ^beatle("John","birth","place","Liverpool")=""
set ^beatle("John","sons","Julian")=""
set ^beatle("John","sons","Julian","birth","date","1963-04-08")=""
set ^beatle("John","sons","Julian","birth","place","Liverpool")=""
set ^beatle("John","wifes","Cynthia","birth","date","1939-09-10")=""
set ^beatle("John","wifes","Cynthia","birth","place","Liverpool")=""
set ^beatle("John","wifes","Yoko","birth","date","1933-02-18")=""
set ^beatle("John","wifes","Yoko","birth","place","Tokyo")=""
(Using the Global “^beatle” to store John Lennon’s information in a hierarchical structure)
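To make the tree shape of a global more concrete, here is a rough Python analogue that models part of the ^beatle global as nested dictionaries. It is a sketch under stated assumptions, not how an M database is implemented: the helper names set_node and children are hypothetical, node values are ignored because every value in the listing is the empty string, and Python's string sorting only approximates M's subscript ordering.

```python
def set_node(tree, *subscripts):
    """Create the path of subscripts in a nested dict, like SET ^g(a,b,c)=""."""
    node = tree
    for key in subscripts:
        node = node.setdefault(key, {})


def children(tree, *subscripts):
    """Return the sorted child subscripts found under a given path."""
    node = tree
    for key in subscripts:
        node = node.get(key, {})
    return sorted(node)


beatle = {}
set_node(beatle, "John", "birth", "date", "1940-10-09")
set_node(beatle, "John", "birth", "place", "Liverpool")
set_node(beatle, "John", "sons", "Julian")
set_node(beatle, "John", "sons", "Julian", "birth", "date", "1963-04-08")
set_node(beatle, "John", "wifes", "Cynthia", "birth", "place", "Liverpool")
set_node(beatle, "John", "wifes", "Yoko", "birth", "place", "Tokyo")

print(children(beatle, "John"))           # ['birth', 'sons', 'wifes']
print(children(beatle, "John", "wifes"))  # ['Cynthia', 'Yoko']
```

Related facts sit next to each other under a shared prefix of subscripts, which is the property the listing above illustrates.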
MUMPS is a key component of nearly all of the large EHR systems, including proprietary ones such as Epic, Allscripts, Cerner, GE Centricity, and McKesson, as well as open-source ones such as VistA, CHCS (the EHR used by the Department of Defense), and RPMS (the EHR used by the Indian Health Service).
One of the main challenges facing the M/MUMPS community is the shift in demographics that has occurred over time. Currently, the M/MUMPS community is composed of an aging population, harboring decades of invaluable expertise that is not being transmitted to the next generation. This has resulted in an under-representation of young developers in the ranks of beginners who are gaining experience with both the language and the underlying NoSQL database. Consequently, the number of professionals available to maintain and improve EHR systems is very limited.
Educating The Next Generation
Although M/MUMPS is the platform that runs the large majority of healthcare installations in the US (over 500 hospitals), the language and its database are not well known in mainstream programming, in either industry or academia.
However, from an economic and social point of view, the relevance of healthcare is so high that the field of Healthcare IT should be receiving a great deal of attention from the academic community. For example, the US spends 18% of its GDP (close to $2.8 trillion) every year on healthcare. Out of this sum, close to $778 billion is spent in hospitals. As a reference, the entire worldwide software-for-sale industry amounts to only $220 billion a year, and the entire hardware industry amounts to close to $480 billion a year. Despite the heavy investment in healthcare, the US ranks 28th in healthcare quality worldwide. In addition, fewer than 50% of hospitals in the US are currently using EHR systems, while airlines have been using computerized systems since 1946.
EHR systems are extremely complex, mainly because they reflect the natural complexity of a medical institution. Peter Drucker, the organizational guru whose writings contributed to the philosophical and practical foundations of the modern business corporation, once said that “the modern hospital is the most complex form of human organization ever created.”
Since much of the operational logic of the medical field is built into EHR software, it is unrealistic to consider migrating these existing systems to other software technologies with the hope that a larger pool of professionals will be available to maintain them. Therefore, under the leadership of the Open Source Electronic Health Records Alliance (OSEHRA), we have undertaken several educational initiatives to train a new generation of young developers on essential EHR technologies, such as MUMPS.
For example, we have engaged in several activities, including online tutorials, intended to lower the barrier to entry into the M/MUMPS community. We have also introduced M as a regular topic in database classes at the college level. The classes cover both the M language and the M database. We introduce MUMPS in our databases class at the State University of New York at Albany as a follow-up to our study of document databases. In this way, students are already in the proper state of mind to assimilate the principles and advantages of storing data in denormalized form and in a structure that keeps related data close together.
Our effort to lower the barrier to entry into the M/MUMPS community will continue throughout the spring of 2014. Ironically, our effort is turning out to be quite easy thanks to two coincidental developments: (1) the emergent popularity of the NoSQL movement and (2) the current popularity of the JavaScript language.
It also turns out that the MUMPS language shares a number of characteristics with JavaScript. Therefore, by teaching it along with Node.js (server-side JavaScript), we can prepare the students’ mindsets to easily assimilate the MUMPS language.
After two years of running this educational initiative, more than 200 students have been exposed to the MUMPS language and database. We are progressively increasing the depth of these topics in the curriculum and complementing the student training with internships with health IT companies and clinical facilities. In doing so, we are preparing students to master open-source technologies for “Stuff that Matters.”
Lowering the Barriers
An additional way to lower the barrier to entry is to make it very easy for new developers to find and install an open-source M/MUMPS implementation: fis-gtm. Also known as GT.M, fis-gtm is the M implementation developed and maintained by FIS Global.
To that end, back on January 18, 2012, we initiated an effort to package fis-gtm for the Debian Linux distribution. The choice of Debian as a target was reinforced by the trickle-down effect through which Debian packages become available for other popular Linux distributions such as Ubuntu and Mint.
Success
This effort to package GT.M recently came to fruition with the publication of the package:
http://packages.debian.org/sid/fis-gtm
It is now possible for anyone with a Debian (or Debian-based) installation to install GT.M by setting up the package sources and then using the familiar command:
sudo apt-get install fis-gtm
The Debian community uses a multi-tier distribution system through which new packages are first included in the unstable distribution. They then move into the testing distribution before they graduate into the stable distribution. This mechanism provides Debian users with a balance between stability and agility of adoption.
The Story
The initiative for packaging fis-gtm was embraced and supported by the maintainers of the Debian-med community, a Debian Pure Blend that aggregates packages of interest for the medical and scientific communities. The response was particularly enthusiastic from Andreas Tille, who is the initiator of Debian-med and its most active developer, and Yaroslav Halchenko, who is very active in the NeuroDebian community. The effort was also enthusiastically supported by FIS Global, in particular by K.S. Bhaskar and Amul Shah.
The effort actually started back in 2009 with an “Intent to Package” (ITP) report filed by Bhaskar. Brad King, Joe Snyder, Jason Li, and Luis Ibáñez (members of the OSEHRA team) also joined the effort in 2012.
GT.M’s Many Languages
The creation of a Debian package requires building the project from source code. Most of the source code of GT.M is written in C, with some sections in assembly and some in M. Therefore, during the early attempts, the main challenge in building fis-gtm from source was that it required an existing M compiler in order to build itself. To circumvent this requirement, a bootstrapping process was devised and implemented.
The infrastructure for building fis-gtm was supplemented with CMake, which helped to restructure the build process so as to carefully craft the bootstrapping mechanism. The new build process uses temporary files to get a first instance of the compiler to build. It then uses this initial M compiler to complete a normal build of a second compiler. Accordingly, it is now possible to build fis-gtm from source using only a C compiler.
The Main Hackathon
The large majority of the work was done through remote collaboration between a team at Kitware, the Debian-med maintainers, and the fis-gtm upstream developers at FIS Global. In the summer of 2012, however, the team hit a major challenge in the process of converting the build infrastructure to CMake and decided to convene a hackathon to tackle the problem with an all-hands-on-deck meeting at Kitware’s offices in Clifton Park, NY. Yaroslav Halchenko drove 300 miles from Dartmouth in order to help the team with his Debian packaging expertise. Meanwhile, K.S. Bhaskar and Amul Shah made the train ride from Philadelphia to New York.
From left: Jason Li (Kitware/OSEHRA), K.S. Bhaskar (FIS Global), and Yaroslav Halchenko (Dartmouth College)
From left: Joe Snyder (Kitware/OSEHRA), Brad King (Kitware/OSEHRA), Amul Shah (FIS Global)
During these two very productive days, the team crafted a strategy to use a CMake scaffolding to bootstrap the build process of fis-gtm. The strategy was implemented in the following months and included many remote conference sessions (we really like tmux!), as well as off-line work.
After almost two years of persistence and resolve, it is great to see the fruit of this effort. The fis-gtm package is now available in the “Testing” Debian distribution, code name “Jessie,” which will in turn become the Stable distribution.
The trickle-down effect has already taken place in Ubuntu, and the fis-gtm package will be available in the upcoming release of Ubuntu 14.04, which is scheduled for April 17, 2014.
An effort has also been initiated to package fis-gtm for the Fedora Linux distribution, as it is possible for fis-gtm to flow into the Red Hat distribution from the Fedora Linux distribution.
Packaging VistA
Now that fis-gtm is packaged for Debian, our next target is to craft a VistA package for Debian, initially as an educational resource, to advance our effort to educate a new generation of open-source EHR developers.
The packaging of VistA closely follows the preparation of a VistA instance through the process of combining source code in the form of M routines with data and database schema in the form of M globals.
Currently, these two elements are stored in a Git repository hosted by OSEHRA. Following a process driven by CMake and Python scripts, the elements are integrated into a pristine VistA instance that is then bundled into a Debian package.
When a future Linux user installs this Debian package, a copy of the pristine VistA instance will be installed in the /usr/share directory, along with accompanying scripts, which will facilitate the creation of new VistA instances associated with specific Linux users.
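The idea of deriving a per-user instance from a shared pristine copy can be sketched in a few lines of Python. This is a minimal illustration, not the accompanying scripts themselves: the function name create_user_instance and the directory layout are hypothetical, and a real script would also have to configure the M environment for the new instance, which is not shown.

```python
import shutil
import tempfile
from pathlib import Path


def create_user_instance(pristine_dir, instance_dir):
    """Copy a shared pristine instance into a new, user-owned directory."""
    pristine = Path(pristine_dir)
    target = Path(instance_dir)
    if not pristine.is_dir():
        raise FileNotFoundError(f"pristine instance not found: {pristine}")
    if target.exists():
        # Never overwrite an instance that may already hold user data.
        raise FileExistsError(f"refusing to overwrite: {target}")
    shutil.copytree(pristine, target)
    return target


if __name__ == "__main__":
    # Demonstrate with throwaway directories instead of real system paths.
    with tempfile.TemporaryDirectory() as scratch:
        pristine = Path(scratch) / "pristine"  # stands in for the copy under /usr/share
        (pristine / "routines").mkdir(parents=True)
        (pristine / "routines" / "EXAMPLE.m").write_text("EXAMPLE ; placeholder routine\n")
        instance = create_user_instance(pristine, Path(scratch) / "home" / "vista")
        print(sorted(p.name for p in instance.rglob("*")))
```

Keeping the installed copy untouched and working only on per-user copies means a user can always discard an instance and start again from the pristine state.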
These last stages of packaging are a work in progress. Because the intermediate files of the VistA package are unusually large (over 2.9 GB), the effort has run into limitations in certain elements of the packaging infrastructure, which could not handle files larger than 2 GB. We continue working closely with the Debian-med packaging community to address these hurdles and to make VistA available to the larger user community of the Debian, Ubuntu, and Mint distributions.
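One simple way to spot which files in a build tree would cross such a size limit is to walk the tree and compare file sizes against a threshold. The Python sketch below is only an illustration: the function name files_over_limit is hypothetical, and treating the limit as exactly 2 × 1024³ bytes is an assumption, since the exact threshold of the affected tools is not stated here.

```python
import os
import tempfile

TWO_GIB = 2 * 1024**3  # assumed threshold; the actual tool limit may differ


def files_over_limit(root, limit=TWO_GIB):
    """Return (path, size) pairs for regular files larger than limit bytes."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so broken links do not raise errors
            size = os.path.getsize(path)
            if size > limit:
                oversized.append((path, size))
    return sorted(oversized)


if __name__ == "__main__":
    # Use a tiny limit on throwaway files so the demo runs quickly.
    with tempfile.TemporaryDirectory() as scratch:
        with open(os.path.join(scratch, "small.dat"), "wb") as handle:
            handle.write(b"x" * 10)
        with open(os.path.join(scratch, "large.dat"), "wb") as handle:
            handle.write(b"x" * 100)
        for path, size in files_over_limit(scratch, limit=50):
            print(os.path.basename(path), size)
```

Running such a check before building a package gives early warning about which intermediate files need to be split or handled differently.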
Acknowledgments
Thank you to all of the developers who worked hard to make this possible. Special thanks to Amul Shah, Brad King, Andreas Tille, and Yaroslav Halchenko.
Luis Ibáñez is a Technical Leader at Kitware, Inc. He is one of the main developers of the Insight Toolkit (ITK). Luis is a strong supporter of Open Access publishing and the verification of reproducibility in scientific publications.