Ever feel like your content is missing that special something? Like your words are just… falling flat?
In the age of TikTok and podcasts, audio is king. But let’s face it — not all of us are blessed with a voice that could melt butter. And hiring voice actors? That’s a whole other headache (and expense) most of us would rather avoid.
And that’s why AI text-to-speech services are becoming more popular. ElevenLabs seems to be the next big name in AI audio generation. As someone who’s tested more AI tools than I care to admit, I was skeptical. But I gave it a try, and let me tell you, I liked it. A lot.
So, what made me change my mind? Let’s talk about it.
What is ElevenLabs?
Ever wished you could have Morgan Freeman narrate your grocery list? Well, ElevenLabs might not get you that far (not yet, though they do have Deepak Chopra), but they’re certainly pushing the boundaries of what’s possible in AI-generated speech. We’re talking about an AI company that’s making content accessible across dozens of languages, in just about any voice you can imagine.
ElevenLabs is about more than just playing around with robotic voices. Their research team has built AI models that can create (or, more accurately, voice out) realistic, context-aware speech across 32 languages.
Their flagship feature is text-to-speech (which we’ll cover in detail later), but they also have models for voice changing, sound effects, and audio isolation. They also offer API access to their models if you want to integrate them into other apps or your own content pipeline.
Who is ElevenLabs For?
If you’re a content creator who’s tired of hearing your own voice, or you make faceless content, ElevenLabs should be a great alternative to hiring voice-over actors. Whether you’re on YouTube or TikTok, ElevenLabs won’t copyright-strike you for using one of their voices.
But it’s not just for content creators. Game developers and indie filmmakers could also benefit from using ElevenLabs. Imagine being able to prototype character voices without hiring a single actor, or localizing your game into 32 languages without breaking the bank.
And if you’re an author or journalist, ElevenLabs could also turn your books into professional-sounding audiobooks and your articles into narrated audio. Students can also use ElevenLabs to make their presentations and videos more interesting.
How Does ElevenLabs Work?
I’ll be honest: I hadn’t used any AI audio models before ElevenLabs, but I didn’t struggle with their platform at all. That speaks volumes (no pun intended) about how user-friendly it is.
Here’s how their text-to-speech software works:
You just need an input script: whatever you want the robot to say. Literally anything under the sun. But since I don’t have a script on hand, I’m going to use ChatGPT to write a short excerpt from a true crime story.
Now, I’m just going to copy that and paste it into ElevenLabs’ text-to-speech field.
All you need to do now is pick a voice from their library (there are a lot, and each comes with a short note on what it’s best for) and press “Generate Speech.”
Here’s a quick sample of what it sounds like.
In the settings, you can also adjust Stability (more stable sounds more robotic; less stable is more emotive but can sometimes sound glitchy), Similarity (how closely the output sticks to the original voice), and Style Exaggeration. After dialing in a bit more exaggeration and emotion, I think I’ve hit the sweet spot with this version.
Oh, and when I say that you can do all sorts of things, I mean it. I’ve spent all day trying out their voices in different scenarios, and I’ve had a lot of fun doing it. Here’s an ASMR sample.
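If you’d rather script this than click through the web app, those same sliders can be set through ElevenLabs’ API. Below is a minimal Python sketch that only builds the HTTP request without sending it. It assumes the public v1 text-to-speech endpoint, the `xi-api-key` header, and the `stability`, `similarity_boost`, and `style` field names as I understand them from ElevenLabs’ API reference. `build_tts_request` and `save_speech` are hypothetical helper names, and the model ID, voice ID, and slider values are placeholders, so double-check everything against the current docs before relying on it.

```python
import json
import urllib.request

# Assumption: ElevenLabs' v1 text-to-speech endpoint; verify against the current API reference.
TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(
    api_key: str,
    voice_id: str,
    text: str,
    stability: float = 0.5,    # placeholder value
    similarity: float = 0.75,  # placeholder value
    style: float = 0.0,        # placeholder value
    model_id: str = "eleven_multilingual_v2",  # assumed model ID; swap in the one you use
) -> urllib.request.Request:
    """Hypothetical helper: build (but don't send) a POST request mirroring the web app's sliders."""
    body = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,          # "Stability": higher is steadier but more robotic
            "similarity_boost": similarity,  # "Similarity": adherence to the original voice
            "style": style,                  # "Style Exaggeration"
        },
    }
    return urllib.request.Request(
        TTS_URL.format(voice_id=voice_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def save_speech(request: urllib.request.Request, path: str) -> None:
    """Hypothetical helper: send the request and write the returned audio to disk (needs a real key)."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as audio_file:
        audio_file.write(response.read())
```

Building and sending are kept separate on purpose, so you can inspect the request without spending credits. Something like `save_speech(build_tts_request("YOUR_API_KEY", "YOUR_VOICE_ID", script), "sample.mp3")` would then do the actual call.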
Here’s a dungeon master introducing his new steampunk-themed world.
Or maybe I could interest you in an audiobook narration?
And like I said, it’s multilingual, so you can feed their model scripts in different languages (French, Italian, German, Filipino, Spanish) and you’ll still get a high-quality recording.
The more I use ElevenLabs, the more I like it. Don’t get me wrong: to more discerning ears, their outputs still have that “uncanny valley” feeling. But I don’t think most people will be able to tell them apart from human speakers, especially with ambient sounds and background music playing.
Features of ElevenLabs
Text-to-Speech
ElevenLabs’ Text-to-Speech feature isn’t your everyday TTS. With 32 languages and more than 40 voices to choose from, it creates eerily human-like performances. It doesn’t just read text; it brings it to life with tone and cadence. Perfect for turning blogs into podcasts or giving voice to your latest story.
One thing I noticed, though, is that it sometimes cuts off the first word of the prompt. It seems to be a glitch, since in my experience it only happens about half the time.
Voice Changer
Want to sound like literally anyone else? ElevenLabs’ Voice Changer can do that for you. It’s like having a vocal shapeshifter at your fingertips. Content creators can voice multiple characters without hiring a cast. Amateur filmmakers could create an entire animated series using nothing but this. There’s a lot of potential in this feature.
Or so they say. So, I tested it. Here’s my own voice:
And here’s the output using one of ElevenLabs’ voices:
One thing I like about it is that it doesn’t just change your voice. It captures the context of what you’re saying and uses it to shape the output, without straying from how you actually said it.
Sound Effects Creator
Like I said, there’s a little bit of something for everyone with ElevenLabs. For sound designers, the Sound Effects feature creates custom effects in seconds. No more Wilhelm Scream and no more searching for hours on end for the right audio. You can now let ElevenLabs create it for you instead.
For each prompt, ElevenLabs generates four different effects for you to choose from. This is my favorite from the prompt “creepy footsteps from afar.”
I will say, though, that out of everything ElevenLabs has to offer, this one impressed me the least. It works well for short prompts, but when I tried a prompt with lots of context or several layered sounds, it ignored parts of my request. Here’s one I made for “the sound of waves on a crowded beach.”
Voice Isolator
If you don’t have the money for a professional mic setup, this one’s for you. ElevenLabs’ Voice Isolator removes background noise from an audio input. As someone who’s been using Adobe Premiere Pro’s audio clean-up features for video editing, I can honestly say that ElevenLabs’ tool is not only easier to work with but also produces much cleaner results.
Once again, and I apologize for this, here’s my voice:
And here’s its isolated audio:
ElevenLabs’ Pricing
I’m going to answer the question that I’m sure is already on your mind: yes, there’s a free tier. It’s pretty limited, but it does give you 10,000 credits to work with. So, how do ElevenLabs’ credits work? I’m not too sure about their voice isolator and changer (for reference, a 6-second clip cost me 93 credits), but for the other features:
- Text-to-speech: one credit per character.
- Sound effects: 320 credits per prompt.
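To put those rates in perspective, here’s a rough budgeting sketch in Python. It assumes only the two rates above (one credit per character for text-to-speech, 320 credits per sound-effect prompt) and the 10,000-credit free tier, and it counts every character, spaces and punctuation included. ElevenLabs’ actual billing rules may differ, so treat `estimate_credits` and `fits_free_tier` (both hypothetical helpers) as a back-of-the-envelope estimate, not their official calculator.

```python
# Rates as observed in this review; these are assumptions and may change.
FREE_TIER_CREDITS = 10_000
TTS_CREDITS_PER_CHARACTER = 1
SFX_CREDITS_PER_PROMPT = 320


def estimate_credits(script: str, sound_effect_prompts: int = 0) -> int:
    """Estimate credits for one text-to-speech script plus some sound-effect prompts.

    Assumes every character (spaces and punctuation included) is billed.
    """
    tts_credits = len(script) * TTS_CREDITS_PER_CHARACTER
    sfx_credits = sound_effect_prompts * SFX_CREDITS_PER_PROMPT
    return tts_credits + sfx_credits


def fits_free_tier(credits: int) -> bool:
    """True if a job fits within the free tier's 10,000 credits."""
    return credits <= FREE_TIER_CREDITS


if __name__ == "__main__":
    script = "The house on Elm Street had been empty for years."
    credits = estimate_credits(script, sound_effect_prompts=2)
    print(f"{credits} credits, fits free tier: {fits_free_tier(credits)}")
```

Under those assumptions, the free credits cover roughly 10,000 characters of text-to-speech or 31 sound-effect prompts (10,000 ÷ 320 ≈ 31.25), but not both.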
For the more serious users, here’s an overview of what they offer per tier:
- All basic features, audio dubbing, 3 custom cloned voices, license for commercial use
- Everything in the previous tier + Audio Native for website content TTS, higher-quality audio, additional credits
- Everything in the previous tier + higher-quality audio in the API, usage analytics
- Everything in the previous tier + priority support
- Everything in the previous tier + more voice clones
The Pros and Cons of ElevenLabs
So, What’s The Verdict?
As someone who’s new to the AI audio generation world, I had a lot of fun using ElevenLabs. It’s effective, quick, user-friendly, and affordable — what more can you ask for?
Turns out, the answer to that question is perfection.
Here’s the thing: if this had been released four years ago, I’d have no issues with it. But we’re now living in an era where AI is an everyday thing, so much so that we’ve grown accustomed to it. We can tell what’s written by AI and what isn’t, what’s drawn by AI and what isn’t, and (thanks to thousands of hours of faceless TikTok content) what’s spoken by AI and what isn’t.
ElevenLabs’ audio quality is near human, but it’s just not there yet. And we, as humans, can tell.
So, should you use ElevenLabs? By all means. After all, like I said, it’s pretty amazing. But if you have the resources to hire a person instead, that will always be the better option. No amount of AI advancement can replace human emotion and talent.