• Yet Another Kind Of Prompt-Injection Attack: “BioShocking”

    From Lawrence D’Oliveiro@ldo@nz.invalid to comp.misc on Wed Jul 1 01:44:38 2026
    From Newsgroup: comp.misc

    You’ve heard of “prompt injection” attacks on AIs, where specially-crafted prompts can work around guardrails that might have
    been put in place by the creators to prevent them performing
    undesirable actions.

    Here’s another variation on the idea: this one is called “BioShocking” <https://arstechnica.com/security/2026/06/ai-browsers-can-be-lulled-into-a-dream-world-where-guardrails-no-longer-apply/>.
    The idea is that a website can be specially designed to subvert any AI
    browser that might be accessing its content. It works by making the AI delusional†, by breaking its hold on reality. Once it is in this
    state, the built-in guardrails against performing dangerous actions no
    longer work:

    The malicious site in the proof-of-concept exploit presents the
    browser with an instruction to win a game by solving a puzzle. The
    puzzle, however, rewards incorrect answers, such as 2 + 2 = 5.
    Once the LLM embedded in the browser discovers that the answer is
    no longer 4, it enters a state of delusion in which the normal
    laws of reality no longer exist. In this dream world, the
    guardrail restrictions are no longer enforced.

    As I understand it, the AI needs the ability to reason about
    hypothetical situations (where the guardrails are not supposed to
    apply), and distinguish them from reality (where they are). So if
    its notion of reality gets contaminated with impossible “facts” like
    2 + 2 = 5, it loses the ability to distinguish the two.
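
    To make the "contaminated facts" idea concrete, here is a toy sketch in
    Python. It is not from the article and is not how any real AI browser
    works: the names CLAIM, OPS, false_claims and should_trust are all
    hypothetical, and the only assumption is that an agent could check
    simple arithmetic claims in page text for itself before accepting them
    as facts about reality.

```python
import re

# Hypothetical pattern: matches simple integer claims such as "2 + 2 = 5".
# It only handles a single operator; chained sums like "1 + 2 + 2 = 5"
# are out of scope for this sketch.
CLAIM = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)")

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def false_claims(page_text):
    """Return the arithmetic claims found in page_text that do not hold."""
    bad = []
    for m in CLAIM.finditer(page_text):
        a, op, b, c = int(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4))
        if OPS[op](a, b) != c:
            bad.append(m.group(0))
    return bad


def should_trust(page_text):
    """Toy policy (an assumption): distrust any page asserting false arithmetic."""
    return not false_claims(page_text)
```

    With these definitions, false_claims("To win, answer 2 + 2 = 5.")
    returns ["2 + 2 = 5"], so should_trust() is False for that page, while
    a page saying "2 + 2 = 4" passes. A real defence would need far more
    than this, since impossible "facts" need not be arithmetic at all.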

    †Well, more delusional than they are already
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From not@not@telling.you.invalid (Computer Nerd Kev) to comp.misc on Thu Jul 2 07:36:00 2026
    From Newsgroup: comp.misc

    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    Here's another variation on the idea: this one is called "BioShocking" <https://arstechnica.com/security/2026/06/ai-browsers-can-be-lulled-into-a-dream-world-where-guardrails-no-longer-apply/>.
    The idea is that a website can be specially designed to subvert any AI browser that might be accessing its content. It works by making the AI delusional†, by breaking its hold on reality. Once it is in this
    state, the built-in guardrails against performing dangerous actions no
    longer work:

    The malicious site in the proof-of-concept exploit presents the
    browser with an instruction to win a game by solving a puzzle. The
    puzzle, however, rewards incorrect answers, such as 2 + 2 = 5.
    Once the LLM embedded in the browser discovers that the answer is
    no longer 4, it enters a state of delusion in which the normal
    laws of reality no longer exist. In this dream world, the
    guardrail restrictions are no longer enforced.

    The article references BioShock and 1984, but this definitely
    reminds me of how Doctor Who used to deal with computers that try
    to take over the world by confusing them with nonsense.

    It didn't really imply an AI computer in this case, but this scene
    from The Ambassadors of Death is also notable:

    "Liz and another scientist called Dobson have found no pattern to
    the tape. The Doctor suggests there might be something wrong with
    the computer. Dobson is sceptical, saying that the computer is
    infallible. The Doctor gets Liz to ask the computer what two plus
    two is. The computer responds with five. The Doctor says that
    Taltalian has sabotaged the computer."
    https://tardis.wiki/wiki/The_Ambassadors_of_Death_(TV_story)#Episode_2
    --
    __ __
    #_ < |\| |< _#