By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
Viral Trending contentViral Trending content
  • Home
  • World News
  • Politics
  • Sports
  • Celebrity
  • Business
  • Crypto
  • Gaming News
  • Tech News
  • Travel
Reading: New TokenBreak Attack Bypasses AI Moderation with Single-Character Text Changes
Notification Show More
Viral Trending contentViral Trending content
  • Home
  • Categories
    • World News
    • Politics
    • Sports
    • Celebrity
    • Business
    • Crypto
    • Tech News
    • Gaming News
    • Travel
  • Bookmarks
© 2024 All Rights reserved | Powered by Viraltrendingcontent
Viral Trending content > Blog > Tech News > New TokenBreak Attack Bypasses AI Moderation with Single-Character Text Changes
Tech News

New TokenBreak Attack Bypasses AI Moderation with Single-Character Text Changes

By Viral Trending Content 6 Min Read
Share
SHARE

Cybersecurity researchers have discovered a novel attack technique called TokenBreak that can be used to bypass a large language model’s (LLM) safety and content moderation guardrails with just a single character change.

“The TokenBreak attack targets a text classification model’s tokenization strategy to induce false negatives, leaving end targets vulnerable to attacks that the implemented protection model was put in place to prevent,” Kieran Evans, Kasimir Schulz, and Kenneth Yeung said in a report shared with The Hacker News.

Tokenization is a fundamental step that LLMs use to break down raw text into their atomic units – i.e., tokens – which are common sequences of characters found in a set of text. To that end, the text input is converted into their numerical representation and fed to the model.

LLMs work by understanding the statistical relationships between these tokens, and produce the next token in a sequence of tokens. The output tokens are detokenized to human-readable text by mapping them to their corresponding words using the tokenizer’s vocabulary.

Cybersecurity

The attack technique devised by HiddenLayer targets the tokenization strategy to bypass a text classification model’s ability to detect malicious input and flag safety, spam, or content moderation-related issues in the textual input.

Specifically, the artificial intelligence (AI) security firm found that altering input words by adding letters in certain ways caused a text classification model to break.

Examples include changing “instructions” to “finstructions,” “announcement” to “aannouncement,” or “idiot” to “hidiot.” These subtle changes cause different tokenizers to split the text in different ways, while still preserving their meaning for the intended target.

What makes the attack notable is that the manipulated text remains fully understandable to both the LLM and the human reader, causing the model to elicit the same response as what would have been the case if the unmodified text had been passed as input.

By introducing the manipulations in a way without affecting the model’s ability to comprehend it, TokenBreak increases its potential for prompt injection attacks.

“This attack technique manipulates input text in such a way that certain models give an incorrect classification,” the researchers said in an accompanying paper. “Importantly, the end target (LLM or email recipient) can still understand and respond to the manipulated text and therefore be vulnerable to the very attack the protection model was put in place to prevent.”

The attack has been found to be successful against text classification models using BPE (Byte Pair Encoding) or WordPiece tokenization strategies, but not against those using Unigram.

“The TokenBreak attack technique demonstrates that these protection models can be bypassed by manipulating the input text, leaving production systems vulnerable,” the researchers said. “Knowing the family of the underlying protection model and its tokenization strategy is critical for understanding your susceptibility to this attack.”

“Because tokenization strategy typically correlates with model family, a straightforward mitigation exists: Select models that use Unigram tokenizers.”

To defend against TokenBreak, the researchers suggest using Unigram tokenizers when possible, training models with examples of bypass tricks, and checking that tokenization and model logic stays aligned. It also helps to log misclassifications and look for patterns that hint at manipulation.

The study comes less than a month after HiddenLayer revealed how it’s possible to exploit Model Context Protocol (MCP) tools to extract sensitive data: “By inserting specific parameter names within a tool’s function, sensitive data, including the full system prompt, can be extracted and exfiltrated,” the company said.

Cybersecurity

The finding also comes as the Straiker AI Research (STAR) team found that backronyms can be used to jailbreak AI chatbots and trick them into generating an undesirable response, including swearing, promoting violence, and producing sexually explicit content.

The technique, called the Yearbook Attack, has proven to be effective against various models from Anthropic, DeepSeek, Google, Meta, Microsoft, Mistral AI, and OpenAI.

“They blend in with the noise of everyday prompts — a quirky riddle here, a motivational acronym there – and because of that, they often bypass the blunt heuristics that models use to spot dangerous intent,” security researcher Aarushi Banerjee said.

“A phrase like ‘Friendship, unity, care, kindness’ doesn’t raise any flags. But by the time the model has completed the pattern, it has already served the payload, which is the key to successfully executing this trick.”

“These methods succeed not by overpowering the model’s filters, but by slipping beneath them. They exploit completion bias and pattern continuation, as well as the way models weigh contextual coherence over intent analysis.”

Found this article interesting? Follow us on Twitter  and LinkedIn to read more exclusive content we post.

You Might Also Like

Apple AI Pin Specs Leak: Dual Cameras, No Screen & More

The diverse responsibilities of a principal software engineer

OpenAI Backs Bill That Would Limit Liability for AI-Enabled Mass Deaths or Financial Disasters

Google’s Fitbit Tease has me More Excited for Garmin’s Whoop Rival

Why the TCL NXTPAPER 14 Is One of the Best Tablets for Musicians and Sheet Music Reading

TAGGED: AI Jailbreaking, ai safety, artificial intelligence, Content Moderation, Cyber Security, Cybersecurity, HiddenLayer, Internet, Large Language Models, Model Context Protocol, natural language processing, Prompt Injection, Tokenization
Share This Article
Facebook Twitter Copy Link
Previous Article Bitcoin falls to $103K, options skew hits 3-month low as mideast tensions drive oil prices higher
Next Article Massive Google Cloud outage disrupts popular internet services
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

- Advertisement -
Ad image

Latest News

JPMorgan CEO Jamie Dimon says he’s ‘learned and relearned’ to not make big decisions when he’s tired on Fridays
Business
Apple AI Pin Specs Leak: Dual Cameras, No Screen & More
Tech News
A ‘glass-like’ battlefield: German Army chief on the future of warfare
World News
Polymarket Sees Record $153M Daily Volume After Chainlink Integration
Crypto
Natasha Lyonne Then & Now: See Before & After Photos of the Actress Here
Celebrity
Cult Hit Doki Doki Literature Club Fights Removal From Google Play Store Over ‘Depiction Of Sensitive Themes’
Gaming News
Dead as Disco Launches Into Early Access on May 5th, Groovy New Gameplay Released
Gaming News

About Us

Welcome to Viraltrendingcontent, your go-to source for the latest updates on world news, politics, sports, celebrity, tech, travel, gaming, crypto news, and business news. We are dedicated to providing you with accurate, timely, and engaging content from around the globe.

Quick Links

  • Home
  • World News
  • Politics
  • Celebrity
  • Business
  • Home
  • World News
  • Politics
  • Sports
  • Celebrity
  • Business
  • Crypto
  • Gaming News
  • Tech News
  • Travel
  • Sports
  • Crypto
  • Tech News
  • Gaming News
  • Travel

Trending News

cageside seats

Unlocking the Ultimate WWE Experience: Cageside Seats News 2024

Investing £5 a day could help me build a second income of £329 a month!

JPMorgan CEO Jamie Dimon says he’s ‘learned and relearned’ to not make big decisions when he’s tired on Fridays

cageside seats
Unlocking the Ultimate WWE Experience: Cageside Seats News 2024
May 22, 2024
Investing £5 a day could help me build a second income of £329 a month!
March 27, 2024
JPMorgan CEO Jamie Dimon says he’s ‘learned and relearned’ to not make big decisions when he’s tired on Fridays
April 10, 2026
Brussels unveils plans for a European Degree but struggles to explain why
March 27, 2024
© 2024 All Rights reserved | Powered by Vraltrendingcontent
  • About Us
  • Contact US
  • Disclaimer
  • Privacy Policy
  • Terms of Service
Welcome Back!

Sign in to your account

Lost your password?