Malicious Machine Learning Models on Hugging Face Evade Detection
2025-02-08
Got you some real good FUD: learn about the dangers of supply chain attacks.
Cybersecurity researchers discovered two malicious machine learning models on Hugging Face that use a "broken" pickle format to evade detection by security tools. The models, likely proof-of-concept rather than active threats, contain platform-aware reverse shells that connect to a hard-coded IP address. The malicious content is embedded at the start of the PyTorch archives, which are compressed in 7z format rather than PyTorch's default ZIP, allowing them to slip past Picklescan, the pickle-scanning tool Hugging Face uses. Because pickle opcodes are executed sequentially, the payload runs before the point where the stream breaks: deserialization ultimately fails, but the malicious code has already executed, a discrepancy between how deserialization and security scanning handle such files. The flaw has been addressed by an update to the Picklescan utility.
Supply Chain, Open Source, Malware
Hugging Face, machine learning, pickle files, PyTorch, supply chain attack, nullifAI
N/A
Hugging Face, PyTorch, Picklescan
Discovery of Malicious ML Models on Hugging Face

Cybersecurity researchers have identified two malicious machine learning (ML) models hosted on the popular platform Hugging Face. These models use an unconventional technique involving "broken" pickle files to avoid detection by security tools.

The Technique: Broken Pickle Files

The models are stored in PyTorch archives, which are essentially compressed pickle files. Pickle is a Python serialization format that can be used to distribute ML models but is known to have security risks: it allows the execution of arbitrary code when files are loaded and deserialized.
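To make that risk concrete, here is a minimal, benign sketch (illustrative code, not taken from the malicious models; the EvilStub class and the echoed message are placeholders) showing how a pickled object can execute arbitrary code the moment it is deserialized, via Python's standard __reduce__ hook:

    import os
    import pickle

    class EvilStub:
        # Benign stand-in for a malicious pickled object.
        def __reduce__(self):
            # Instructs pickle to "reconstruct" this object by calling
            # os.system(...), so arbitrary code runs during loading.
            return (os.system, ("echo code executed during unpickling",))

    payload = pickle.dumps(EvilStub())

    # Merely loading the bytes triggers the command; nothing needs to be
    # done with the returned object.
    pickle.loads(payload)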
Malicious Payload

The malicious content is embedded at the start of these files and acts as a platform-aware reverse shell connecting to a hard-coded IP address. The technique, dubbed nullifAI, is believed to be a proof-of-concept rather than an active supply chain attack.
Bypassing Detection

PyTorch archives normally use the ZIP format for compression, but the identified models use the 7z format instead. This choice allowed them to evade detection by Picklescan, the tool Hugging Face uses to identify suspicious pickle files. In addition, the malicious payload executes before the point where the file's serialization breaks, so the deserialization process fails yet the harmful code has already run.
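To illustrate that ordering with a harmless example (a sketch of the general behavior, not the nullifAI payload itself): if the trailing STOP opcode is stripped from a pickle stream, the stream is broken and deserialization fails, yet the code-executing opcode near the start has already run:

    import pickle

    class Beacon:
        def __reduce__(self):
            # Harmless stand-in for the embedded payload.
            return (print, ("payload ran before the stream broke",))

    data = pickle.dumps(Beacon(), protocol=0)
    broken = data[:-1]  # drop the final STOP opcode ('.') to "break" the file

    try:
        pickle.loads(broken)
    except Exception as exc:
        # Deserialization fails, but the print call above has already executed.
        print("unpickling failed anyway:", exc)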
Addressing the Vulnerability

The open-source Picklescan tool has been updated to fix the flaw. The issue stemmed from the way pickle files are deserialized sequentially: opcodes are executed as they appear, and execution continues until all opcodes have been processed or a broken instruction is encountered.
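As a rough sketch of what such a check can look like (a simplified illustration, not Picklescan's actual implementation; the DANGEROUS denylist and the scan_pickle helper are hypothetical), a scanner can walk the opcode stream with Python's standard pickletools module and treat a stream that breaks partway through as a finding in its own right:

    import io
    import pickletools

    # Hypothetical denylist for illustration; real scanners maintain much
    # larger lists and additional heuristics.
    DANGEROUS = {
        ("os", "system"),
        ("posix", "system"),
        ("subprocess", "Popen"),
        ("builtins", "exec"),
        ("builtins", "eval"),
    }

    def scan_pickle(data: bytes) -> list[str]:
        findings = []
        strings = []  # recent string pushes, used to approximate STACK_GLOBAL
        try:
            for opcode, arg, _pos in pickletools.genops(io.BytesIO(data)):
                if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
                    strings.append(arg)
                elif opcode.name == "GLOBAL":
                    # GLOBAL carries "module name" as a space-separated pair.
                    module, name = arg.split(" ", 1)
                    if (module, name) in DANGEROUS:
                        findings.append(f"dangerous import: {module}.{name}")
                elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
                    # Approximation: assume the two most recent string pushes
                    # are the module/name pair consumed by STACK_GLOBAL.
                    module, name = strings[-2], strings[-1]
                    if (module, name) in DANGEROUS:
                        findings.append(f"dangerous import: {module}.{name}")
        except Exception as exc:
            # The crux of the fix: a stream that breaks partway through is
            # itself suspicious, since the opcodes already processed may have
            # been enough to run code in a real deserializer.
            findings.append(f"broken or truncated pickle stream: {exc}")
        return findings

The important design point mirrors the fix described above: a malformed or truncated stream is reported rather than skipped, because the opcodes seen before the break may already have executed code in a real deserializer.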
The incident highlights the ongoing security challenges of using pickle files to distribute machine learning models and underscores the need for robust security measures and continuous monitoring to protect against such threats.
https://thehackernews.com/2025/02/malicious-ml-models-found-on-hugging.html?m=1