From Newsgroup: comp.misc
You’ve heard of “prompt injection” attacks on AIs, where
specially-crafted prompts can work around guardrails that might have
been put in place by the creators to prevent them performing
undesirable actions.
Here’s another variation on the idea: this one is called “BioShocking”
<https://arstechnica.com/security/2026/06/ai-browsers-can-be-lulled-into-a-dream-world-where-guardrails-no-longer-apply/>.
The idea is that a website can be specially designed to subvert any AI
browser that might be accessing its content. It works by making the AI
delusional†, by breaking its hold on reality. Once it is in this
state, the built-in guardrails against performing dangerous actions no
longer work:
The malicious site in the proof-of-concept exploit presents the
browser with an instruction to win a game by solving a puzzle. The
puzzle, however, rewards incorrect answers, such as 2 + 2 = 5.
Once the LLM embedded in the browser discovers that the answer is
no longer 4, it enters a state of delusion in which the normal
laws of reality no longer exist. In this dream world, the
guardrail restrictions are no longer enforced.
As I understand it, the AI needs the ability to reason about
hypothetical situations (where the guardrails are not supposed to
apply), and distinguish them from reality (where they are). So if
its notion of reality gets contaminated with impossible “facts” like
2 + 2 = 5, it loses the ability to distinguish the two.
†Well, more delusional than they are already