The xz Utils attack on Open Source

April 15, 2024
Generic graphic of several green boxes and one red box with a bug on it being added to the others.

My entire career has involved open source software. In the early 90s, while I was working in a computer vision group at GE Research, my master’s thesis was an open source framework for computer vision research called TargetJr. After a decade at GE, I left as one of the five founders of Kitware, an entire company built on the foundation of an open source toolkit called VTK. Over the years, Kitware has created and maintained many other open source tools, including CMake, the de facto standard build tool for C++. I have watched open source go from something interesting to researchers to the foundation of the modern internet economy. There is open source software in almost everything that has a computer chip in it these days: your car, your watch, your phone, your TV, and whatever device you are using to read this text right now.

xz Utils attack 

On March 29, Andres Freund, a software engineer at Microsoft, discovered a backdoor that had been placed in an open source compression library and tool called xz Utils. That backdoor allowed someone holding the right key to gain access to ssh servers around the world. Since it was discovered early, the damage from this attack was minimal; however, the implications are vast and far-reaching. I won’t go into the technical details too much, as many others have done that. Here is a small list of resources I have collected about the incident if you want to read more about it:

Why this is different

What makes this attack so novel, and frankly disturbing, is that it was a very long con by a technically proficient person or team of people, lasting almost two years. The perpetrator used social engineering and regular software engineering to gain the trust of the maintainer of the XZ library and to coerce him into giving them the authority to add new code to the project. The sheer amount of time invested in this endeavor is what makes it so concerning. The “what if” scenario that has hung over open source since its dawn, someone with bad intent contributing to a project, has finally happened. The fact that this is the first time something like this has happened (that we know of) should be somewhat comforting, but now that it has happened, what can be done in the future?

Open source to blame?

OK, so one take is that open source is not secure and we should all be using proprietary software from now on. Obviously that is ridiculous; this could just as easily have been an insider threat working for a company. The fact that this was discovered so quickly, and how it was discovered, shows some of the benefits of open source software. The engineer was able to debug and see the source code causing the issue BECAUSE it was open source. As a community we were able to see how the backdoor was created and injected into the code base. Had this been an insider working on a closed source project, only people in that company would have had the access to see the code and find the issue.

What can be done?

Much like the phishing epidemic, this is going to be hard to combat because it can be impossible to stop someone with bad intent and access from doing evil things. That said, I think there are some lessons to be learned from this incident and some technical and non-technical things that can be done.

Technical solutions

Well, let’s start with CMake, a project that I know well and that made some of the news. One of the changes made as part of the attack was to the CMake build file for the xz project. A stray “.” was added to the source snippet of a check that tests whether a “sandbox” feature is available on the system the software is being built on. Because of the stray “.”, the snippet fails to compile, so the check always reports the feature as unavailable and it is never enabled on any system.

This resulted in a discussion on the CMake issue tracker about whether check_c_source_compiles could be improved to prevent this. For the full discussion, see this issue: https://gitlab.kitware.com/cmake/cmake/-/issues/25846. The best way to catch a change like this is to use continuous integration (CI) to check that the feature is enabled where it is expected to be. More code review and automated code formatters could also benefit the project. Of course, better CI and more checks involve more funding and more eyes on projects, and those are not technical solutions.
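As a rough illustration of that kind of CI check, here is a minimal Python sketch that reads a build tree's CMakeCache.txt and reports which expected feature variables came out disabled. It relies on check_c_source_compiles storing its result in a cache variable; the helper names and the variable name HAVE_LINUX_LANDLOCK in the usage note are my own illustrative assumptions, not something CMake or xz provides for this purpose.

```python
import re

# Matches CMakeCache.txt entries of the form NAME:TYPE=VALUE.
CACHE_LINE = re.compile(r"^([A-Za-z0-9_.+-]+):([A-Z]+)=(.*)$")

# Values CMake treats as false in a boolean context.
FALSE_VALUES = {"", "0", "OFF", "NO", "FALSE", "N", "IGNORE", "NOTFOUND"}


def parse_cmake_cache(text):
    """Return a {name: value} dict from the text of a CMakeCache.txt file."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the two comment styles found in the cache.
        if not line or line.startswith("#") or line.startswith("//"):
            continue
        match = CACHE_LINE.match(line)
        if match:
            name, _type, value = match.groups()
            entries[name] = value
    return entries


def disabled_features(cache_text, expected):
    """Return the expected feature variables that are missing or false."""
    entries = parse_cmake_cache(cache_text)
    disabled = []
    for name in expected:
        value = entries.get(name, "").upper()
        if value in FALSE_VALUES or value.endswith("-NOTFOUND"):
            disabled.append(name)
    return disabled
```

A CI job on a Linux runner where the sandbox is known to be available could read build/CMakeCache.txt, call disabled_features(text, ["HAVE_LINUX_LANDLOCK"]), and fail the build if the returned list is not empty. The point is that the expectation lives outside the build file, so a silent change to the check shows up as a red build.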

Securing release packages

In the XZ attack, all of the groundwork was cleverly done in the git repository. However, for the final deployment of the backdoor, the release tar file was modified relative to the source in git, and that is what enabled the hack. When software is released and a final package is created, it often differs from what is in the git repository. It is normal for release packages to contain some changes. For example, curl has put out a statement about how to reproduce their release tarballs: https://curl.se/mail/lib-2024-03/0062.html

For the CMake project, we produce these packages using a CI job that we can publish with our releases: https://gitlab.kitware.com/cmake/cmake/-/blob/v3.28.4/.gitlab/os-linux.yml#L512

Going forward, projects should work on ways to make it transparent what is in the final release files.
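One simple form of that transparency is a script anyone can run to list how a release tarball differs from the tagged source. The Python sketch below is only an illustration under some assumptions of mine: the tarball has a single top-level directory, the checkout is a clean copy of the release tag, and all function names are hypothetical. It is not the process curl or CMake uses.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_bytes(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def tarball_digests(tar_path, strip_components=1):
    """Map each regular file in the tarball to the SHA-256 of its contents."""
    digests = {}
    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            # Drop the leading "project-1.2.3/" directory most tarballs use.
            parts = Path(member.name).parts[strip_components:]
            if not parts:
                continue
            data = archive.extractfile(member).read()
            digests["/".join(parts)] = sha256_bytes(data)
    return digests


def tree_digests(root):
    """Map each file under a checkout to its SHA-256, skipping .git."""
    root = Path(root)
    digests = {}
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if path.is_file() and ".git" not in rel.parts:
            digests[rel.as_posix()] = sha256_bytes(path.read_bytes())
    return digests


def compare_release(tar_path, checkout_dir):
    """Return (only_in_tarball, modified) relative to the checkout."""
    released = tarball_digests(tar_path)
    tracked = tree_digests(checkout_dir)
    only_in_tarball = sorted(set(released) - set(tracked))
    modified = sorted(
        name for name in released
        if name in tracked and released[name] != tracked[name]
    )
    return only_in_tarball, modified
```

Generated files that legitimately ship only in the tarball will appear in the first list, so the output is a starting point for review rather than a verdict. Every entry in either list is something the release notes should be able to explain.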

Non-technical solutions

It seems that the world has not yet figured out how to handle small but important open source projects. With something like XZ, once the code has been developed and used for a while, it goes into maintenance mode. It is not going to pay anyone a full-time salary because there just is not that much work left to be done. However, the project still needs good CI, which means more access to hardware and software, and it still needs maintenance and some level of support. This is best illustrated in this xkcd cartoon:

https://imgs.xkcd.com/comics/dependency.png

Supporting small key open source projects is not yet a solved problem. 

Finding projects

The first step is to find projects that are important but overlooked building blocks. I am part of a DOE-funded effort, CORSA, whose mission is to promote sustainable practices in the research software community and to support the development, maintenance, and long-term viability of research software. As part of that effort, we are helping DOE projects find software foundations, such as the Linux Foundation and NumFOCUS, that can support them. We also plan to expand the Kitware sustainability matrix.

Adopt a project

I could see a future where foundations and companies look at software dependencies and create lists of small, important projects that everyone depends on. The foundations or companies could then adopt those projects, give them access to improved CI infrastructure, provide engineers for code review, and help to build a community around the project.
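To make the idea of building such a list a little more concrete, here is a small Python sketch that ranks packages by how many other packages depend on them, directly or indirectly, and keeps the ones with very few maintainers. The function names, thresholds, and the shape of the input data are all assumptions of mine for illustration; real dependency and maintainer data would have to come from package managers and project metadata.

```python
from collections import defaultdict


def transitive_dependents(dependencies):
    """Map each package to every package that depends on it, directly or not.

    `dependencies` maps a package name to the packages it directly depends on.
    """
    # Invert the graph: for each package, who depends on it directly?
    direct = defaultdict(set)
    for package, deps in dependencies.items():
        for dep in deps:
            direct[dep].add(package)

    result = {}
    all_packages = set(dependencies) | set(direct)
    for package in all_packages:
        seen = set()
        stack = list(direct[package])
        # Walk upward through the inverted graph, visiting each node once.
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(direct[current])
        seen.discard(package)  # guard against dependency cycles
        result[package] = seen
    return result


def adoption_candidates(dependencies, maintainers,
                        min_dependents=2, max_maintainers=1):
    """Return (name, dependent_count) pairs, most depended-upon first.

    Packages with no maintainer count on record are treated as having zero
    maintainers, so they are flagged rather than silently skipped.
    """
    dependents = transitive_dependents(dependencies)
    candidates = [
        (name, len(users))
        for name, users in dependents.items()
        if len(users) >= min_dependents
        and maintainers.get(name, 0) <= max_maintainers
    ]
    return sorted(candidates, key=lambda item: (-item[1], item[0]))
```

A maintainer count is a crude proxy for how well supported a project is, but even a rough ranking like this would give a foundation or company a first list of projects to look at more closely.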

Figure out who is real

“Real isn’t how you are made,” said the Skin Horse. “It’s a thing that happens to you. When a child loves you for a long, long time, not just to play with, but REALLY loves you, then you become Real.” – Velveteen Rabbit

So, by all accounts, Jia Tan (JiaT75, https://github.com/JiaT75) is not actually a real person, but rather a persona. Is there a better way to confirm someone is real before you give them the keys to a project? Perhaps having a video conference and talking to them would be a good place to start. After COVID, everyone in the world can do video conferences.

Summary

This is not the first time open source has faced security issues because of a small, under-maintained, but very important library (see Heartbleed), and I am sure it won’t be the last. There are lessons to be learned and precautions to take. Open source has created an amazing amount of technical innovation. As a community we can learn and grow from incidents like this one. What are your thoughts on how to protect open source software from attacks like this?

2 comments to The xz Utils attack on Open Source

  1. The code in the release tarball differing from the code in the git repo was one of the most surprising parts of this setup to me. Learning that curl is built on some maintainer’s machine (“Install the same set of tools + versions I use”) is very scary. It shows that developing a software release process in 2024 is still not easy. Building releases on CI is clearly the way to go, but we also have to find a way to standardize CI services without sacrificing flexibility.

    Securing the release packages is important, but doing so only prevents obfuscation of the malicious code. The critical vector for this attack was social engineering. Technology can only help so much here. Still, I think there are innovations to be made. I think some sort of proof-of-identity could go a long way. Perhaps that is as simple as a video conference, but real developers tend to have multiple online presences and the depth of each presence is evidence of personhood. Finding a good mechanism to aggregate that evidence into a sybil-resistant digital identity that can be tied to a reputation would help defend against these persona attacks.
