· By Will Harken
How To Make an AI Voice Model (That’s Actually Good)
Looking to give your content a fresh voice? I can help bring your ideas to life through vocal models for hit songs or fun social media snippets.
Believe it or not, you can clone any voice for just $10 with some free tools. It’s more doable than you think!
General Tips
1) Start with the Basics. Every AI vocal begins with a human performance: record or source a vocal take, then convert it into your new AI voice.
2) Prioritize Quality. Cut out instrument noise and unclear sounds from your target vocals. Clearer is always better.
3) Go for Specialized Models. Craft models for specific styles rather than generic ones: one model for soft falsettos, another for strong belting. Just think about how differently you’d handle Freddie Mercury’s softer notes versus his powerhouse screams.
Whether a one-size-fits-all model works depends on the input singer: a breathy section needs a breathy performance going in, while a model trained on belting will sound belty no matter what you feed it.
For general-purpose models, keep the vocal mixing consistent—think about EQ and compression.
Capturing Your Target Voice
Setup & Vocal Isolation
First up, choose the tracks you’ll use. A cappella vocals can be hard to find, so if they’re not around, opt for tracks with less instrumentation.
If you’re changing a song’s lyrics, isolating the vocals gives the AI a cleaner target to work from, and the extra effort usually pays off.
Download MP3s using a YouTube MP3 downloader—there are plenty of options online.
Quick tip: Automate your MP3 downloads with a Python script. Interested? Join my AI Vocal Engineering Series for my insights.
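If you want a head start on that, here’s a minimal sketch using the yt-dlp Python package. The URLs and output folder are placeholders, and yt-dlp expects ffmpeg to be installed for the MP3 conversion step.

```python
# pip install yt-dlp   (also requires ffmpeg on your PATH)
from yt_dlp import YoutubeDL

# Placeholder URLs -- swap in the tracks you actually want
urls = [
    "https://www.youtube.com/watch?v=VIDEO_ID_1",
    "https://www.youtube.com/watch?v=VIDEO_ID_2",
]

ydl_opts = {
    "format": "bestaudio/best",            # grab the highest-quality audio stream
    "outtmpl": "downloads/%(title)s.%(ext)s",
    "postprocessors": [{
        "key": "FFmpegExtractAudio",       # convert to MP3 with ffmpeg
        "preferredcodec": "mp3",
        "preferredquality": "320",
    }],
}

with YoutubeDL(ydl_opts) as ydl:
    ydl.download(urls)
```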
I use Ultimate Vocal Remover (UVR) with its VR and MDX-Net models to isolate vocals. Following the tutorials and running two isolation passes typically improves model accuracy.
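If you’d rather script the isolation step, here’s a rough sketch assuming the python-audio-separator package, which wraps many of the same MDX-Net/VR models UVR uses. The model filenames, the stem index, and the exact return values are assumptions on my part, so check the package’s docs before relying on this.

```python
# pip install audio-separator
from audio_separator.separator import Separator

separator = Separator(output_dir="isolated")

# Pass 1: split the full mix into vocal and instrumental stems
# (model filename is an assumption; check the package's model list)
separator.load_model(model_filename="UVR-MDX-NET-Inst_HQ_3.onnx")
first_pass = separator.separate("downloads/song.mp3")   # returns output file paths

# Pass 2: run the vocal stem through a second model to strip leftover bleed
separator.load_model(model_filename="UVR_MDXNET_KARA_2.onnx")
second_pass = separator.separate(first_pass[0])          # vocal-stem index / path may differ

print("Final stems:", second_pass)
```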
Clean Up & Export
Once you’ve isolated the audio, import it into your DAW. Remove sections without vocals and any fuzzy sounds.
Need more data? Mix isolations from similar-sounding singers to create a hybrid model—it's a neat trick.
Bonus suggestion: Use Clear by Supertone ($70) to get rid of reverb and delay—just dial the Ambience and Reverb knobs down.
Balance your vocal volumes with gain staging or a vocal rider, and keep an eye on your DAW’s meter: peaks should sit between roughly -10 dB and 0 dB, or the audio may end up too quiet for training.
Apply a limiter to your vocals to avoid clipping.
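If you’d rather sanity-check levels outside the DAW, here’s a small sketch using pydub (which also needs ffmpeg). It measures the peak of each exported vocal and nudges quiet files up toward that -10 dB to 0 dB window; the folder name and target level are placeholders, and simple gain is not a substitute for a limiter.

```python
# pip install pydub   (requires ffmpeg)
from pathlib import Path
from pydub import AudioSegment

TARGET_PEAK_DBFS = -6.0   # comfortably inside the -10 dB to 0 dB window
MIN_PEAK_DBFS = -10.0

for path in Path("isolated").glob("*.wav"):
    vocal = AudioSegment.from_file(path)
    peak = vocal.max_dBFS                    # loudest sample, in dBFS
    if peak < MIN_PEAK_DBFS:
        # Simple static gain boost -- not a limiter, so watch for clipping
        boosted = vocal.apply_gain(TARGET_PEAK_DBFS - peak)
        boosted.export(path.with_name(path.stem + "_leveled.wav"), format="wav")
        print(f"{path.name}: peak {peak:.1f} dBFS -> boosted to ~{TARGET_PEAK_DBFS} dBFS")
    else:
        print(f"{path.name}: peak {peak:.1f} dBFS, OK")
```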
If you’re using Weights.gg, you might need to export multiple audio files because of size limits.
I typically export in both MP3 and WAV formats, just in case some tools only work with one.
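To stay under upload size limits, here’s a sketch (again with pydub) that splits a long vocal into roughly five-minute chunks and writes each one as both MP3 and WAV. The chunk length, file names, and folders are placeholders to adjust for whichever tool you use.

```python
from pathlib import Path
from pydub import AudioSegment

CHUNK_MS = 5 * 60 * 1000   # ~5-minute chunks; adjust to match the upload limit

Path("export").mkdir(exist_ok=True)
vocal = AudioSegment.from_file("isolated/full_dataset.wav")   # placeholder path

for i, start in enumerate(range(0, len(vocal), CHUNK_MS)):
    chunk = vocal[start:start + CHUNK_MS]
    chunk.export(f"export/vocals_{i:02d}.mp3", format="mp3", bitrate="320k")
    chunk.export(f"export/vocals_{i:02d}.wav", format="wav")
```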
Training Your Model
Train your model using Weights.gg or Jammable. With Weights, you can download the model and do conversions on your local setup.
Make sure you upload at least 2-3 minutes of MP3 data. WAV files tend to upload more slowly, and since you pulled the vocal from an MP3 anyway, uploading WAV won’t gain you any quality.
This duration is usually enough for lyric swaps, but getting that voice to sing entirely different songs can get tricky with so little data.
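A quick way to confirm you’ve hit that 2-3 minute mark is to total up your exported files. Here’s a tiny pydub sketch (the folder name is a placeholder):

```python
from pathlib import Path
from pydub import AudioSegment

# Sum the length of every exported MP3 chunk, in milliseconds
total_ms = sum(len(AudioSegment.from_file(p)) for p in Path("export").glob("*.mp3"))
print(f"Dataset length: {total_ms / 60000:.1f} minutes")
if total_ms < 2 * 60 * 1000:
    print("Under 2 minutes -- consider adding more vocal data before training.")
```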
Weights.gg runs training on fast cloud GPUs, so models typically finish quickly, often in under an hour.
Then run inference to generate the new AI vocal, ready to edit into your track in your DAW.
With a little setup, you can train or run inference locally, which could speed things up and allow for batch processing. Training on your machine can boost quality with longer training times. Want to dig deeper? Join my AI Vocal Engineering Series for a detailed breakdown.
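Local setups vary (RVC is a common choice), so the sketch below only shows the batch-processing idea, with a hypothetical run_inference() helper standing in for whatever conversion command or function your local tool actually exposes. The model path, script name, flags, and folders are all placeholders.

```python
import subprocess
from pathlib import Path

MODEL_PATH = "models/my_voice.pth"      # placeholder path to your trained model

def run_inference(input_wav: Path, output_wav: Path) -> None:
    """Hypothetical wrapper -- replace with the real CLI or Python call
    your local voice-conversion tool (e.g. an RVC install) provides."""
    subprocess.run(
        ["python", "infer.py", "--model", MODEL_PATH,
         "--input", str(input_wav), "--output", str(output_wav)],
        check=True,
    )

# Batch-convert every prepared vocal take in one go
out_dir = Path("converted")
out_dir.mkdir(exist_ok=True)
for take in sorted(Path("takes").glob("*.wav")):
    run_inference(take, out_dir / take.name)
    print(f"Converted {take.name}")
```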
Related AI Vocal Insights
Don’t miss these posts for a closer look at AI vocals:
👉 Anyone Can 'Sing' Now: AI Voice Cloning
👉 The AI Vocal Mixing Technique No One's Talking About
👉 How AI is Revolutionizing Music
Level Up Your AI Vocals
If you have questions, check out our FAQ Page. You can also browse our reviews from happy customers. And don’t forget to grab your 10% discount when you place your order today!