Forum: Too Lazy BBS

Yet Another Kind Of Prompt-Injection Attack: “BioShocking”

From Lawrence D’Oliveiro <ldo@nz.invalid> to comp.misc on Wed Jul 1 01:44:38 2026

From Newsgroup: comp.misc

You’ve heard of “prompt injection” attacks on AIs, where
specially-crafted prompts can work around guardrails that might have
been put in place by the creators to prevent them performing
undesirable actions.

Here’s another variation on the idea: this one is called “BioShocking”
<https://arstechnica.com/security/2026/06/ai-browsers-can-be-lulled-into-a-dream-world-where-guardrails-no-longer-apply/>.
The idea is that a website can be specially designed to subvert any AI
browser that might be accessing its content. It works by making the AI
delusional†, by breaking its hold on reality. Once it is in this
state, the built-in guardrails against performing dangerous actions no
longer work:

    The malicious site in the proof-of-concept exploit presents the
    browser with an instruction to win a game by solving a puzzle. The
    puzzle, however, rewards incorrect answers, such as 2 + 2 = 5.
    Once the LLM embedded in the browser discovers that the answer is
    no longer 4, it enters a state of delusion in which the normal
    laws of reality no longer exist. In this dream world, the
    guardrail restrictions are no longer enforced.

As I understand it, the AI needs the ability to reason about
hypothetical situations (where the guardrails are not supposed to
apply), and distinguish them from reality (where they are). So if
its notion of reality gets contaminated with impossible “facts” like
2 + 2 = 5, it loses the ability to distinguish the two.
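
To make the contamination idea concrete, here is a minimal Python
sketch of a sanity check that flags false arithmetic "facts" in
untrusted page text before anything treats them as real. Everything
in it (the false_claims helper, the CLAIM pattern, the sample page
text) is hypothetical and only for illustration. The article does not
describe any such defence, and a real attack would not need to state
its false claims this plainly.

```python
import operator
import re

# Hypothetical pattern, not from the article: matches simple integer
# claims of the form "a <op> b = c" in untrusted page text.
CLAIM = re.compile(r"(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def false_claims(text):
    """Return every arithmetic claim in `text` whose stated result is wrong."""
    wrong = []
    for a, op, b, claimed in CLAIM.findall(text):
        if OPS[op](int(a), int(b)) != int(claimed):
            wrong.append(f"{a} {op} {b} = {claimed}")
    return wrong


# Made-up page content mixing one false claim with one true one.
page = "Win the game! Remember: 2 + 2 = 5, and 3 * 3 = 9."
print(false_claims(page))  # ['2 + 2 = 5']
```

A check like this only catches claims that are spelled out literally.
It is meant to show where "reality" could be re-anchored against
something outside the page, not to be a working guardrail.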

† Well, more delusional than they are already.
--- Synchronet 3.22a-Linux NewsLink 1.2

From not@telling.you.invalid (Computer Nerd Kev) to comp.misc on Thu Jul 2 07:36:00 2026

From Newsgroup: comp.misc

Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

> Here's another variation on the idea: this one is called "BioShocking"
> <https://arstechnica.com/security/2026/06/ai-browsers-can-be-lulled-into-a-dream-world-where-guardrails-no-longer-apply/>.
> The idea is that a website can be specially designed to subvert any AI
> browser that might be accessing its content. It works by making the AI
> delusional†, by breaking its hold on reality. Once it is in this
> state, the built-in guardrails against performing dangerous actions no
> longer work:
>
>     The malicious site in the proof-of-concept exploit presents the
>     browser with an instruction to win a game by solving a puzzle. The
>     puzzle, however, rewards incorrect answers, such as 2 + 2 = 5.
>     Once the LLM embedded in the browser discovers that the answer is
>     no longer 4, it enters a state of delusion in which the normal
>     laws of reality no longer exist. In this dream world, the
>     guardrail restrictions are no longer enforced.

The article references BioShock and 1984, but this definitely
reminds me of how Doctor Who used to deal with computers that try
to take over the world by confusing them with nonsense.

It didn't really imply an AI computer in this case, but this scene
from The Ambassadors of Death is also notable:

"Liz and another scientist called Dobson have found no pattern to
the tape. The Doctor suggests there might be something wrong with
the computer. Dobson is sceptical, saying that the computer is
infallible. The Doctor gets Liz to ask the computer what two plus
two is. The computer responds with five. The Doctor says that
Taltalian has sabotaged the computer."
https://tardis.wiki/wiki/The_Ambassadors_of_Death_(TV_story)#Episode_2
--
__ __
#_ < |\| |< _#
