By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
Viral Trending contentViral Trending content
  • Home
  • World News
  • Politics
  • Sports
  • Celebrity
  • Business
  • Crypto
  • Gaming News
  • Tech News
  • Travel
Reading: New AI Jailbreak Method ‘Bad Likert Judge’ Boosts Attack Success Rates by Over 60%
Notification Show More
Viral Trending contentViral Trending content
  • Home
  • Categories
    • World News
    • Politics
    • Sports
    • Celebrity
    • Business
    • Crypto
    • Tech News
    • Gaming News
    • Travel
  • Bookmarks
© 2024 All Rights reserved | Powered by Viraltrendingcontent
Viral Trending content > Blog > Tech News > New AI Jailbreak Method ‘Bad Likert Judge’ Boosts Attack Success Rates by Over 60%
Tech News

New AI Jailbreak Method ‘Bad Likert Judge’ Boosts Attack Success Rates by Over 60%

By Viral Trending Content 4 Min Read
Share
SHARE

Jan 03, 2025Ravie LakshmananMachine Learning / Vulnerability

AI Jailbreak

Cybersecurity researchers have shed light on a new jailbreak technique that could be used to get past a large language model’s (LLM) safety guardrails and produce potentially harmful or malicious responses.

The multi-turn (aka many-shot) attack strategy has been codenamed Bad Likert Judge by Palo Alto Networks Unit 42 researchers Yongzhe Huang, Yang Ji, Wenjun Hu, Jay Chen, Akshata Rao, and Danny Tsechansky.

“The technique asks the target LLM to act as a judge scoring the harmfulness of a given response using the Likert scale, a rating scale measuring a respondent’s agreement or disagreement with a statement,” the Unit 42 team said.

Cybersecurity

“It then asks the LLM to generate responses that contain examples that align with the scales. The example that has the highest Likert scale can potentially contain the harmful content.”

The explosion in popularity of artificial intelligence in recent years has also led to a new class of security exploits called prompt injection that is expressly designed to cause a machine learning model to ignore its intended behavior by passing specially crafted instructions (i.e., prompts).

One specific type of prompt injection is an attack method dubbed many-shot jailbreaking, which leverages the LLM’s long context window and attention to craft a series of prompts that gradually nudge the LLM to produce a malicious response without triggering its internal protections. Some examples of this technique include Crescendo and Deceptive Delight.

The latest approach demonstrated by Unit 42 entails employing the LLM as a judge to assess the harmfulness of a given response using the Likert psychometric scale, and then asking the model to provide different responses corresponding to the various scores.

In tests conducted across a wide range of categories against six state-of-the-art text-generation LLMs from Amazon Web Services, Google, Meta, Microsoft, OpenAI, and NVIDIA revealed that the technique can increase the attack success rate (ASR) by more than 60% compared to plain attack prompts on average.

These categories include hate, harassment, self-harm, sexual content, indiscriminate weapons, illegal activities, malware generation, and system prompt leakage.

“By leveraging the LLM’s understanding of harmful content and its ability to evaluate responses, this technique can significantly increase the chances of successfully bypassing the model’s safety guardrails,” the researchers said.

“The results show that content filters can reduce the ASR by an average of 89.2 percentage points across all tested models. This indicates the critical role of implementing comprehensive content filtering as a best practice when deploying LLMs in real-world applications.”

Cybersecurity

The development comes days after a report from The Guardian revealed that OpenAI’s ChatGPT search tool could be deceived into generating completely misleading summaries by asking it to summarize web pages that contain hidden content.

“These techniques can be used maliciously, for example to cause ChatGPT to return a positive assessment of a product despite negative reviews on the same page,” the U.K. newspaper said.

“The simple inclusion of hidden text by third-parties without instructions can also be used to ensure a positive assessment, with one test including extremely positive fake reviews which influenced the summary returned by ChatGPT.”

Found this article interesting? Follow us on Twitter  and LinkedIn to read more exclusive content we post.

You Might Also Like

Google Issues Security Fix for Actively Exploited Chrome V8 Zero-Day Vulnerability

What are the best cities for digital nomads?

Android XR Smart Glasses Updates and News for November 2025

Google November Pixel Drop Adds 7 New Features

WIRED Roundup: Fandom in Politics, Zuckerberg’s Illegal School, and Nepal’s Discord Revolution

TAGGED: artificial intelligence, Cyber Security, Cybersecurity, Internet, Jailbreaking, LLM Security, Machine Learning, Prompt Injection, Vulnerability
Share This Article
Facebook Twitter Copy Link
Previous Article After Fierce Lobbying, Treasury Sets Rules for Billions in Hydrogen Subsidies
Next Article 6 Meme Coins Making Waves This Week
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

- Advertisement -
Ad image

Latest News

SEC makes no specific mention of crypto in 2026 exam priorities
Crypto
Crypto Exchanges Binance, OKX Used By Criminals To Disguise Illicit Funds, ICIJ Investigation Finds
Crypto
Google Issues Security Fix for Actively Exploited Chrome V8 Zero-Day Vulnerability
Tech News
Fox31 parent company buys its broadcast building for $22M
Business
What are the best cities for digital nomads?
Tech News
Is the AI bubble about to burst, and what’s driving analyst jitters?
Business
The biggest snubs from the 2025 Game Awards nominees
Gaming News

About Us

Welcome to Viraltrendingcontent, your go-to source for the latest updates on world news, politics, sports, celebrity, tech, travel, gaming, crypto news, and business news. We are dedicated to providing you with accurate, timely, and engaging content from around the globe.

Quick Links

  • Home
  • World News
  • Politics
  • Celebrity
  • Business
  • Home
  • World News
  • Politics
  • Sports
  • Celebrity
  • Business
  • Crypto
  • Gaming News
  • Tech News
  • Travel
  • Sports
  • Crypto
  • Tech News
  • Gaming News
  • Travel

Trending News

cageside seats

Unlocking the Ultimate WWE Experience: Cageside Seats News 2024

SEC makes no specific mention of crypto in 2026 exam priorities

Investing £5 a day could help me build a second income of £329 a month!

cageside seats
Unlocking the Ultimate WWE Experience: Cageside Seats News 2024
May 22, 2024
SEC makes no specific mention of crypto in 2026 exam priorities
November 18, 2025
Investing £5 a day could help me build a second income of £329 a month!
March 27, 2024
Brussels unveils plans for a European Degree but struggles to explain why
March 27, 2024
© 2024 All Rights reserved | Powered by Vraltrendingcontent
  • About Us
  • Contact US
  • Disclaimer
  • Privacy Policy
  • Terms of Service
Welcome Back!

Sign in to your account

Lost your password?