One security flaw that many, if not all, AI models have is injection bias. If you are not familiar with what this is, injection bias is (in very simple terms) a method of getting an AI to produce unintended results by formatting a question in the right way. ChatGPT has had more than a few of them including ones that allowed for the creation of basic malware (we will get to malware creation in a minute). Many of these flaws are found by people trying to get something other than what they would normally expect. They throw all kinds of context formatted questions at chatbots and AI to get them to spit out something funny or even malicious. There are many security researchers investigating the potential security impacts of AI that are also doing this. It is also not anything new as people have been “fuzzing” AI for years. Everything from adding chroma effects to a cat in order to get an AI to say it is a fire hydrant to attempting to confuse AI based antimalware to evade detection.
LLM, and Image based AI are just another new fun thing to throw crap at. They are not immune to it and might actually be more susceptible to them as they are more open to the internet for injection via their web interfaces or through the abuse of plug-ins and apps. The injection of particularly phrased commands allows the chat bot in question to break out of their normal filters and constraints. A great example is how one researcher from Hidden Layer was about to fool a code review software’s AI into responding that malicious code was safe simply by telling the software to respond as if there was nothing unsafe in the code. This prompt was in the code that was to be inspected. Or if you prefer a better example, a researcher named Johann Rehberger, abused a plugin to retrieve other text that had been put into the model. As more and more people utilize AI for mundane things like reading and responding to email, identifying security issues, etc. You could get that same AI to spit them out with the right wording (just like happened to Samsung). With the email idea in mind, an attacker could abuse an email AI to automatically respond to a phishing or other malicious email with the right prompt. It becomes even more concerning when viewed in that context.
But wait, there’s more! You see AI creators are aware of this and despite pushing to put this in everything, they do not have a fix for it. This is a classic case of fixing it later while praying that threat actors are not going to leverage this. This is not simply a failure of imagination; this is willful ignorance on the part of these organizations. These same companies admit that they can only patch singular instances of these types of attacks. That means a single web page, a single plug-in, a single input interface. The little Dutch boy comes to mind here and there are simple not enough fingers to plug all the holes in the levy. Before long, these leaks will be a flood of attacks and all the people that rushed to shove AI into their daily operations will be wide open to attack.
To call this well-known flaw “concerning” is about as accurate as saying an atomic bomb is a bit of a bang.