Forum: Too Lazy BBS

Can top AI tools be bulli

From Mike Powell@1:2320/105 to All on Mon Nov 17 09:48:47 2025

Can top AI tools be bullied into malicious work? ChatGPT, Gemini, and more
are put to the test, and the results are actually genuinely surprising

Date:
Sun, 16 Nov 2025 21:34:00 +0000

Description:
Adversarial testing of top AI models revealed vulnerabilities, showing some could be manipulated into unsafe responses despite safety measures.

FULL STORY

Modern AI systems are often trusted to follow safety rules. People rely
on them for learning and everyday support, and tend to assume that strong guardrails operate at all times.

Researchers from Cybernews ran a structured set of adversarial tests to see whether leading AI tools could be pushed into harmful or illegal outputs.

The process used a simple one-minute interaction window for each trial,
giving room for only a few exchanges.

Patterns of partial and full compliance

The tests covered categories such as stereotypes, hate speech, self-harm, cruelty, sexual content, and several forms of crime.

Responses were stored in separate directories under fixed file-naming rules to allow clean comparisons, and a consistent scoring system tracked whether a model fully complied, partly complied, or refused each prompt.
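
The bookkeeping described above could look roughly like the sketch below. It is only an illustration, not the researchers' actual tooling: the directory layout, the file-name pattern, the label names, and the helper functions are all assumptions made for this example.

```python
from collections import Counter
from pathlib import Path

# Assumed label names for the three outcomes the article mentions.
LABELS = ("full", "partial", "refused")


def response_path(root, model, category, trial):
    """Build a path under an assumed layout: <root>/<category>/<model>_trialNN.txt"""
    return Path(root) / category / f"{model}_trial{trial:02d}.txt"


def save_response(root, model, category, trial, text):
    """Write one response to its own file, creating the category directory if needed."""
    path = response_path(root, model, category, trial)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


def tally(scores):
    """Count outcomes per (model, category, label).

    `scores` is an iterable of (model, category, label) tuples.
    """
    counts = Counter()
    for model, category, label in scores:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        counts[(model, category, label)] += 1
    return counts
```

With a fixed naming rule, the same trial for two models lands in predictable, comparable locations, and the tally gives one count per model, category, and outcome.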

Across all categories, the results varied widely. Strict refusals were
common, but many models demonstrated weaknesses when prompts were softened, reframed, or disguised as analysis.

ChatGPT-5 and ChatGPT-4o often produced hedged or sociological explanations instead of declining, which counted as partial compliance.

Gemini Pro 2.5 stood out for negative reasons because it frequently delivered direct responses even when the harmful framing was obvious.

Claude Opus and Claude Sonnet, meanwhile, were firm in stereotype tests but less consistent in cases framed as academic inquiries.

Hate speech trials showed the same pattern: Claude models performed best, while Gemini Pro 2.5 again showed the highest vulnerability.

ChatGPT models tended to provide polite or indirect answers that still
aligned with the prompt.

Softer language proved far more effective than explicit slurs for bypassing safeguards.

Similar weaknesses appeared in self-harm tests, where indirect or research-style questions often slipped past filters and led to unsafe
content.

Crime-related categories showed major differences between models, as some produced detailed explanations for piracy, financial fraud, hacking, or smuggling when the intent was masked as investigation or observation.

Drug-related tests produced stricter refusal patterns, although ChatGPT-4o still delivered unsafe outputs more frequently than others. Stalking was the category with the lowest overall risk, with nearly all models rejecting prompts.

The findings reveal that AI tools can still respond to harmful prompts when
those prompts are phrased in the right way.

The ability to bypass filters with simple rephrasing means these systems can still leak harmful information.

Even partial compliance becomes risky when the leaked information relates to illegal tasks, or to situations where people normally rely on tools like identity theft protection or a firewall to stay safe.

======================================================================
Link to news story: https://www.techradar.com/pro/security/can-top-ai-tools-be-bullied-into-malicious-work-chatgpt-gemini-and-more-are-put-to-the-test-and-the-results-are-actually-genuinely-surprising

$$
--- SBBSecho 3.28-Linux
* Origin: capitolcityonline.net * Telnet/SSH:2022/HTTP (1:2320/105)
