AI image generation is moving fast, and with it comes a growing need for tools that can tell what’s machine-made. From profile pictures to product shots, AI visuals are becoming so realistic that people struggle to tell them apart from real photos.
That’s where AI detection comes in. Undetectable AI—best known for rewriting AI-generated text to sound human—is testing the waters with a visual detection tool. The question is: can it actually catch AI-generated images? More specifically, can it reliably flag content made by one of the most widely used image models today: DALL·E?
We decided to put it to the test. This article walks through exactly how Undetectable AI’s image detector performed on a batch of visuals created using OpenAI’s DALL·E, and whether its results were reliable or just randomly lucky.
What is Undetectable AI?
Undetectable AI is best known for helping AI-written content pass as human, but recently it’s stepped into new territory: image detection. The company now offers an AI image detection tool designed to identify whether an image was generated by artificial intelligence, based on signals like texture, lighting patterns, and model behavior.

The pitch is simple: if your image was made by AI, this tool should be able to tell. And not just tell, but give you a percentage-based confidence score while it’s at it.
So, how does it hold up when we actually throw visual content at it? That’s where DALL·E comes in.
What is DALL·E?
DALL·E is OpenAI’s image generation model that turns text prompts into fully rendered visuals. Its latest version, DALL·E 3, generates clean, cohesive images based on complex language input and is now integrated into tools like Bing Image Creator, Microsoft Designer, and ChatGPT Pro.

Undetectable AI vs. DALL·E
We ran 14 images generated by DALL·E (from Bing Image Creator) through Undetectable AI’s image detector. Each detection returned a result that not only classified the image as either human- or AI-generated, but also included a specific confidence score.
Test #1
Undetectable AI: Correctly identified DALL-E image as machine-generated.
AI Likelihood Score: 81%


Test #2
Undetectable AI: Correctly identified DALL-E image as machine-generated.
AI Likelihood Score: 91%


Test #3
Undetectable AI: Correctly identified DALL-E image as machine-generated.
AI Likelihood Score: 89%


Test #4
Undetectable AI: Correctly identified DALL-E image as machine-generated.
AI Likelihood Score: 57%


Test #5
Undetectable AI: Incorrectly identified DALL-E image as human.
AI Likelihood Score: 1%


Test #6
Undetectable AI: Incorrectly identified DALL-E image as human.
AI Likelihood Score: 1%


Test #7
Undetectable AI: Correctly identified DALL-E image as machine-generated.
AI Likelihood Score: 54%


Test #8
Undetectable AI: Correctly identified DALL-E image as machine-generated.
AI Likelihood Score: 76%


Test #9
Undetectable AI: Incorrectly identified DALL-E image as human.
AI Likelihood Score: 1%


Test #10
Undetectable AI: Correctly identified DALL-E image as machine-generated.
AI Likelihood Score: 75%


Test #11
Undetectable AI: Correctly identified DALL-E image as machine-generated.
AI Likelihood Score: 68%


Test #12
Undetectable AI: Incorrectly identified DALL-E image as human.
AI Likelihood Score: 1%


Test #13
Undetectable AI: Correctly identified DALL-E image as machine-generated.
AI Likelihood Score: 90%


Test #14
Undetectable AI: Correctly identified DALL-E image as machine-generated.
AI Likelihood Score: 63%


Average Score
- Total tests: 14
- Correct identifications: 10
- Incorrect identifications: 4
- Average AI Likelihood Score across all tests: 53.43%
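For anyone who wants to check the math, here is a minimal Python sketch that recomputes the summary from the 14 scores listed above. One caveat: the 50% cutoff used to split "AI" from "human" verdicts is our assumption, not a documented setting of the tool. It simply happens to match every verdict in this test. The variable names are ours as well.

```python
from statistics import mean

# AI likelihood scores (percent) for Tests #1 to #14, copied from the results above.
scores = [81, 91, 89, 57, 1, 1, 54, 76, 1, 75, 68, 1, 90, 63]

# Assumption: a score of 50% or more counts as an "AI-generated" verdict.
# The detector's real cutoff is not documented in this article.
ASSUMED_THRESHOLD = 50

# Every test image came from DALL·E, so a flagged image is a correct call
# and an unflagged image is a miss.
flagged = [s for s in scores if s >= ASSUMED_THRESHOLD]
missed = [s for s in scores if s < ASSUMED_THRESHOLD]

total = len(scores)
hit_rate = len(flagged) / total * 100   # share of images correctly flagged
average_score = mean(scores)            # mean score across all 14 tests
average_flagged = mean(flagged)         # mean score among correct calls only

print(f"Total tests: {total}")
print(f"Correct: {len(flagged)}, incorrect: {len(missed)}")
print(f"Hit rate: {hit_rate:.1f}%")
print(f"Average score, all tests: {average_score:.2f}%")
print(f"Average score, flagged images only: {average_flagged:.1f}%")
```

The overall average is dragged down by the four 1% misses. Looking only at the ten images the detector got right, the average score is noticeably higher, which is worth keeping in mind when reading the single headline number.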
The Bottom Line
Undetectable AI’s image detector isn’t flawless, but it’s also not a gimmick. It correctly flagged most of the images, especially when the visual content leaned more obviously into generative quirks (smooth textures, repetitive elements, strange lighting). But at times the detector struggled.
The fact that four images returned a 1% AI likelihood and were flagged as human shows the tool still has limitations. But still, a 10 out of 14 hit rate (about 71%) against a major generator like DALL·E? That’s not nothing.
And when you compare this to our previous Firefly test, where Undetectable AI flagged every image with 99% confidence, the contrast is pretty striking. Firefly might be easier for the system to catch, or maybe DALL·E’s visuals are just more nuanced.
Either way, the results here feel more grounded. Not suspiciously perfect. Not wildly off. Just solid, with room to improve.
If you’re using Undetectable AI to verify image content, expect decent performance, but also know that it’s not bulletproof yet, especially when it comes to AI images that are already designed to look very, very real.