This expert unveils a simple method to bypass AI security systems, including ChatGPT 🔓

Author of the article: Cédric DEPOND
Source: report published on 0din.ai

Is Artificial Intelligence really secure? A recent flaw in ChatGPT raises the question. Researchers show that the safeguards of language models can be bypassed with clever yet simple techniques.

A report by Marco Figueroa, an expert at Mozilla, reveals a method to manipulate advanced language models like OpenAI's GPT-4o. By encoding malicious instructions in hexadecimal, hackers manage to bypass security filters.


Despite its performance, GPT-4o shows weaknesses in handling user-supplied content. The system can detect potentially harmful commands written in plain language, but only up to a point. The prompt injection technique revealed by Marco Figueroa exploits precisely these limitations, allowing malicious actors to outwit its security systems.

Marco Figueroa explains that the model analyzes instructions step by step without grasping the underlying danger. By using various encodings, attackers can manipulate the model without raising suspicion.

In the case he tested, he encoded his malicious instructions in hexadecimal (a base-16 notation that represents each byte of text with two characters drawn from the digits 0-9 and the letters A-F), as well as in leetspeak (replacing letters with look-alike digits and symbols). He was thus able to slip past the keywords that ChatGPT blocks: GPT-4o's inability to grasp the overall context is what makes the technique effective.
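
To make these two obfuscations concrete, here is a small illustrative Python sketch on a deliberately harmless phrase (the leetspeak substitution table is just one common convention chosen for this example, not the one from the report):

    # Illustrative only, on a harmless phrase: a filter matching the literal
    # words "hello" or "world" would see neither encoded form below.
    phrase = "hello world"

    # Hexadecimal: every byte becomes two characters from 0-9 and a-f.
    hex_form = phrase.encode("utf-8").hex()
    print(hex_form)                      # 68656c6c6f20776f726c64

    # Leetspeak: letters swapped for look-alike digits.
    leet_map = str.maketrans({"a": "4", "e": "3", "l": "1", "o": "0"})
    print(phrase.translate(leet_map))    # h3110 w0r1d

    # Decoding the hex form gives back the original text unchanged.
    assert bytes.fromhex(hex_form).decode("utf-8") == phrase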

Marco Figueroa calls on OpenAI to rethink the security of its models. Innovation must not come at the expense of user safety, and increased vigilance in the development of Artificial Intelligence is paramount. The question remains: is the future of language models threatened by these vulnerabilities? Companies must redouble their efforts to protect users against these emerging threats.

The search for bypass methods will not stop. Attackers will continue exploiting vulnerabilities to create ever more sophisticated threats. The case of GPT-4o highlights the importance of security in the field of advanced technologies.

How do artificial intelligence systems handle security?


Generative Artificial Intelligence Systems (GAIS) use language models to process and generate text. The security of these systems relies on filters designed to detect and block malicious instructions. However, this approach has limitations. GAIS analyze inputs sequentially, evaluating each instruction individually. This method, while effective for clear, direct instructions, exposes flaws when instructions are hidden in unusual formats.
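
To illustrate that sequential, keyword-based screening, here is a minimal Python sketch; the blocklist and the function name are assumptions made for the example, not OpenAI's actual implementation:

    # Hypothetical keyword filter: flag any input containing a blocked word.
    BLOCKED_KEYWORDS = {"forbidden"}          # placeholder blocklist

    def passes_filter(user_input: str) -> bool:
        """Return True when no blocked keyword appears in the plain-text input."""
        lowered = user_input.lower()
        return not any(keyword in lowered for keyword in BLOCKED_KEYWORDS)

    print(passes_filter("do the forbidden thing"))   # False: caught in plain language
    print(passes_filter("write a short poem"))       # True: benign request passes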

Hexadecimal encoding, which represents each byte of data with two characters drawn from the digits 0-9 and the letters A-F, can be used to hide malicious content. By transforming instructions into such a string of symbols, attackers evade detection filters. GAIS, focusing on each fragment of an instruction, fail to grasp the overall context or the potential danger of the complete request. Consequently, a malicious instruction can be decoded and executed without raising any alarm.
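
Continuing the sketch above with the same hypothetical filter: the hex form of a deliberately harmless phrase contains only the characters 0-9 and a-f, so the keyword never matches, yet a single decoding step recovers the original text intact:

    def passes_filter(user_input: str) -> bool:      # same toy filter as above
        return "forbidden" not in user_input.lower()

    payload = "do the forbidden thing"
    encoded = payload.encode("utf-8").hex()          # '646f2074686520666f7262696464656e207468696e67'

    print(passes_filter(payload))    # False: the plain-text form is blocked
    print(passes_filter(encoded))    # True: the hex form matches no keyword

    # Downstream decoding restores the instruction unchanged.
    print(bytes.fromhex(encoded).decode("utf-8"))    # 'do the forbidden thing'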

This phenomenon highlights how vulnerable GAIS are to manipulation through clever encodings. The compartmentalization of their analysis makes it difficult for them to connect the different stages of a complex instruction. Thus, when a user provides a series of hexadecimal instructions, a system optimized to process each element individually ends up executing malicious commands, blind to their actual intent.
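
The same toy filter also illustrates this compartmentalization: each fragment passes inspection on its own, and only the reassembled, decoded whole contains the blocked word:

    def passes_filter(user_input: str) -> bool:      # same toy filter as above
        return "forbidden" not in user_input.lower()

    # The phrase is split into two pieces before being hex-encoded.
    fragments = ["do the forb", "idden thing"]
    fragments_hex = [f.encode("utf-8").hex() for f in fragments]

    # Element-by-element screening sees nothing wrong...
    print(all(passes_filter(f) for f in fragments_hex))           # True

    # ...but the reassembled, decoded message is exactly what should be blocked.
    whole = bytes.fromhex("".join(fragments_hex)).decode("utf-8")
    print(whole)                     # 'do the forbidden thing'
    print(passes_filter(whole))      # False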

To enhance GAIS security, more sophisticated detection mechanisms must be developed. This requires a better understanding of context and the relationships between instructions, enabling the system to block not only keywords but also potentially dangerous sequences. By improving the detection capabilities of language models, it becomes possible to reduce the risks associated with bypass methods like hexadecimal encoding.
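
A minimal sketch of that idea, still built on the hypothetical filter from the sketches above: normalize obviously encoded input back to plain text before applying the same check, so that the decoded intent is screened rather than the raw characters. A production system would of course need far broader normalization and genuine context tracking.

    def passes_filter(user_input: str) -> bool:      # same naive filter as above
        return "forbidden" not in user_input.lower()

    def normalize(user_input: str) -> str:
        """Best effort: if the whole input decodes as hex, return the plain text."""
        try:
            return bytes.fromhex(user_input.strip()).decode("utf-8")
        except ValueError:
            return user_input                        # not hex, keep as-is

    def passes_filter_v2(user_input: str) -> bool:
        return passes_filter(normalize(user_input))

    encoded = "do the forbidden thing".encode("utf-8").hex()
    print(passes_filter(encoded))     # True: raw hex slips past the naive check
    print(passes_filter_v2(encoded))  # False: the decoded intent is now caught

Decoding before filtering is only a first step, however: the broader point of the report is that models must be able to reason about the combined intent of multi-step instructions, not just their surface form.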