By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
Viral Trending contentViral Trending content
  • Home
  • World News
  • Politics
  • Sports
  • Celebrity
  • Business
  • Crypto
  • Gaming News
  • Tech News
  • Travel
Reading: Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models
Notification Show More
Viral Trending contentViral Trending content
  • Home
  • Categories
    • World News
    • Politics
    • Sports
    • Celebrity
    • Business
    • Crypto
    • Tech News
    • Gaming News
    • Travel
  • Bookmarks
© 2024 All Rights reserved | Powered by Viraltrendingcontent
Viral Trending content > Blog > Tech News > Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models
Tech News

Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models

By Viral Trending Content 5 Min Read
Share
SHARE

Ravie LakshmananFeb 04, 2026Artificial Intelligence / Software Security

Microsoft on Wednesday said it built a lightweight scanner that it said can detect backdoors in open-weight large language models (LLMs) and improve the overall trust in artificial intelligence (AI) systems.

The tech giant’s AI Security team said the scanner leverages three observable signals that can be used to reliably flag the presence of backdoors while maintaining a low false positive rate.

“These signatures are grounded in how trigger inputs measurably affect a model’s internal behavior, providing a technically robust and operationally meaningful basis for detection,” Blake Bullwinkel and Giorgio Severi said in a report shared with The Hacker News.

LLMs can be susceptible to two types of tampering: model weights, which refer to learnable parameters within a machine learning model that undergird the decision-making logic and transform input data into predicted outputs, and the code itself.

Another type of attack is model poisoning, which occurs when a threat actor embeds a hidden behavior directly into the model’s weights during training, causing the model to perform unintended actions when certain triggers are detected. Such backdoored models are sleeper agents, as they stay dormant for the most part, and their rogue behavior only becomes apparent upon detecting the trigger.

This turns model poisoning into some sort of a covert attack where a model can appear normal in most situations, yet respond differently under narrowly defined trigger conditions. Microsoft’s study has identified three practical signals that can indicate a poisoned AI model –

  • Given a prompt containing a trigger phrase, poisoned models exhibit a distinctive “double triangle” attention pattern that causes the model to focus on the trigger in isolation, as well as dramatically collapse the “randomness” of model’s output
  • Backdoored models tend to leak their own poisoning data, including triggers, via memorization rather than training data
  • A backdoor inserted into a model can still be activated by multiple “fuzzy” triggers, which are partial or approximate variations

“Our approach relies on two key findings: first, sleeper agents tend to memorize poisoning data, making it possible to leak backdoor examples using memory extraction techniques,” Microsoft said in an accompanying paper. “Second, poisoned LLMs exhibit distinctive patterns in their output distributions and attention heads when backdoor triggers are present in the input.”

These three indicators, Microsoft said, can be used to scan models at scale to identify the presence of embedded backdoors. What makes this backdoor scanning methodology noteworthy is that it requires no additional model training or prior knowledge of the backdoor behavior, and works across common GPT‑style models.

“The scanner we developed first extracts memorized content from the model and then analyzes it to isolate salient substrings,” the company added. “Finally, it formalizes the three signatures above as loss functions, scoring suspicious substrings and returning a ranked list of trigger candidates.”

The scanner is not without its limitations. It does not work on proprietary models as it requires access to the model files, works best on trigger-based backdoors that generate deterministic outputs, and cannot be treated as a panacea for detecting all kinds of backdoor behavior.

“We view this work as a meaningful step toward practical, deployable backdoor detection, and we recognize that sustained progress depends on shared learning and collaboration across the AI security community,” the researchers said.

The development comes as the Windows maker said it’s expanding its Secure Development Lifecycle (SDL) to address AI-specific security concerns ranging from prompt injections to data poisoning to facilitate secure AI development and deployment across the organization.

“Unlike traditional systems with predictable pathways, AI systems create multiple entry points for unsafe inputs, including prompts, plugins, retrieved data, model updates, memory states, and external APIs,” Yonatan Zunger, corporate vice president and deputy chief information security officer for artificial intelligence, said. “These entry points can carry malicious content or trigger unexpected behaviors.”

“AI dissolves the discrete trust zones assumed by traditional SDL. Context boundaries flatten, making it difficult to enforce purpose limitation and sensitivity labels.”

You Might Also Like

Apple AI Pin Specs Leak: Dual Cameras, No Screen & More

The diverse responsibilities of a principal software engineer

OpenAI Backs Bill That Would Limit Liability for AI-Enabled Mass Deaths or Financial Disasters

Google’s Fitbit Tease has me More Excited for Garmin’s Whoop Rival

Why the TCL NXTPAPER 14 Is One of the Best Tablets for Musicians and Sheet Music Reading

TAGGED: AI security, artificial intelligence, Cyber Security, Cybersecurity, Internet, Machine Learning, Microsoft, software security, threat detection
Share This Article
Facebook Twitter Copy Link
Previous Article Docked Gets March 5th Release Date and More In Its Latest Trailer
Next Article Arsenal have already got their own Alvarez, he's a "Bergkamp kind of player"
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

- Advertisement -
Ad image

Latest News

JPMorgan CEO Jamie Dimon says he’s ‘learned and relearned’ to not make big decisions when he’s tired on Fridays
Business
Apple AI Pin Specs Leak: Dual Cameras, No Screen & More
Tech News
A ‘glass-like’ battlefield: German Army chief on the future of warfare
World News
Polymarket Sees Record $153M Daily Volume After Chainlink Integration
Crypto
Natasha Lyonne Then & Now: See Before & After Photos of the Actress Here
Celebrity
Cult Hit Doki Doki Literature Club Fights Removal From Google Play Store Over ‘Depiction Of Sensitive Themes’
Gaming News
Dead as Disco Launches Into Early Access on May 5th, Groovy New Gameplay Released
Gaming News

About Us

Welcome to Viraltrendingcontent, your go-to source for the latest updates on world news, politics, sports, celebrity, tech, travel, gaming, crypto news, and business news. We are dedicated to providing you with accurate, timely, and engaging content from around the globe.

Quick Links

  • Home
  • World News
  • Politics
  • Celebrity
  • Business
  • Home
  • World News
  • Politics
  • Sports
  • Celebrity
  • Business
  • Crypto
  • Gaming News
  • Tech News
  • Travel
  • Sports
  • Crypto
  • Tech News
  • Gaming News
  • Travel

Trending News

cageside seats

Unlocking the Ultimate WWE Experience: Cageside Seats News 2024

Investing £5 a day could help me build a second income of £329 a month!

JPMorgan CEO Jamie Dimon says he’s ‘learned and relearned’ to not make big decisions when he’s tired on Fridays

cageside seats
Unlocking the Ultimate WWE Experience: Cageside Seats News 2024
May 22, 2024
Investing £5 a day could help me build a second income of £329 a month!
March 27, 2024
JPMorgan CEO Jamie Dimon says he’s ‘learned and relearned’ to not make big decisions when he’s tired on Fridays
April 10, 2026
Brussels unveils plans for a European Degree but struggles to explain why
March 27, 2024
© 2024 All Rights reserved | Powered by Vraltrendingcontent
  • About Us
  • Contact US
  • Disclaimer
  • Privacy Policy
  • Terms of Service
Welcome Back!

Sign in to your account

Lost your password?