By Will Harken
How To Make an AI Voice Model (That’s Actually Good)
Looking to add any voice to your content? AI's got you covered. Let's explore how you can create awesome vocal models for hit songs or just something fun for your social media.
The best part? You can clone any voice for just $10 and some free tools. It’s wild how accessible this power is!
General Tips
1) Plan for Voice-to-Voice. Start with a human vocal, then transform it into your new AI voice.
2) Quality over Quantity. Cut out sections with instrument noise or muddled sounds from your target vocals.
And don’t forget...
3) Go for Specialized Models. Create models for specific styles instead of generic ones. For example, you might need one for soft falsettos and another for strong belting. You’d approach Freddie Mercury’s subtle moments differently than his high-energy shouts.
If you try to make a one-size-fits-all model, its output hinges on what the singer it was trained on could do. A breathy part needs a model trained on breathy singing, while a belty model will always lean belty, no matter what you input.
For general-purpose models, try to keep the vocal mixing (think EQ and compression) similar across all your training clips.
Getting Your Target Voice
Setup & Vocal Isolation
First, select the tracks you'll use. While a cappella vocals are fantastic, they can be hard to find. If they’re unavailable, choose tracks that have fewer instruments in the mix.
When you plan to change a song’s lyrics, training on solo vocals helps the AI pick up the sounds accurately, and this approach usually works well.
Grab MP3s using a YouTube MP3 downloader—just Google it for some choices.
Tip: Automate your MP3 downloads with a Python script. Want details? Join my AI Vocal Engineering Series for my insights.
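Here's a rough idea of what such a script could look like. This is not the script from my series, just a minimal sketch that assumes you have the third-party `yt-dlp` package and `ffmpeg` installed. The function names, the `target_vocals` folder, and the 192 kbps bitrate are all placeholders I made up for illustration.

```python
from pathlib import Path


def build_ydl_options(output_dir: str, bitrate_kbps: int = 192) -> dict:
    """Build yt-dlp options that extract the audio track and convert it to MP3."""
    return {
        "format": "bestaudio/best",  # prefer the best audio-only stream
        "outtmpl": str(Path(output_dir) / "%(title)s.%(ext)s"),
        "postprocessors": [{
            "key": "FFmpegExtractAudio",  # needs ffmpeg available on the PATH
            "preferredcodec": "mp3",
            "preferredquality": str(bitrate_kbps),
        }],
    }


def download_mp3s(urls: list[str], output_dir: str = "target_vocals") -> None:
    """Download every URL in the list as an MP3 into output_dir."""
    import yt_dlp  # third-party: pip install yt-dlp

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    with yt_dlp.YoutubeDL(build_ydl_options(output_dir)) as ydl:
        ydl.download(urls)
```

Call `download_mp3s([...])` with the list of track URLs you picked, and the MP3s should land in one folder ready for vocal isolation.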
I lean on Ultimate Vocal Remover’s UVR and MDX models for isolating vocals. I follow a tutorial, and running two isolation processes often boosts model accuracy.
Clean Up & Export
Bring the isolated audio into your DAW. Remove parts without vocals, as well as any fuzzy or unclear bits.
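If you'd rather get a head start before trimming by hand, a script can flag where the vocal actually is. The sketch below is one possible approach, not a replacement for listening: it assumes the audio is already loaded as float samples between -1.0 and 1.0, and the 50 ms window and -45 dB threshold are guesses you would tune by ear. It only finds silent gaps; fuzzy or unclear bits still need your judgment.

```python
import math


def window_rms_db(window):
    """RMS level of a window of float samples (-1.0..1.0), in dBFS."""
    if not window:
        return float("-inf")
    mean_square = sum(s * s for s in window) / len(window)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def find_vocal_regions(samples, sample_rate, window_s=0.05, threshold_db=-45.0):
    """Return (start_s, end_s) spans whose windowed RMS is above threshold_db."""
    window_len = max(1, round(sample_rate * window_s))
    regions = []
    region_start = None  # sample index where the current loud span began
    for i in range(0, len(samples), window_len):
        loud = window_rms_db(samples[i:i + window_len]) > threshold_db
        if loud and region_start is None:
            region_start = i
        elif not loud and region_start is not None:
            regions.append((region_start / sample_rate, i / sample_rate))
            region_start = None
    if region_start is not None:  # audio ended while still loud
        regions.append((region_start / sample_rate, len(samples) / sample_rate))
    return regions
```

The returned time spans are the parts worth keeping; everything between them is a candidate for deletion in your DAW.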
Need more data to work with? You can mix isolations from similar-sounding singers to create a hybrid model, which is a nice little hack.
Optional but handy: Use Clear by Supertone ($70) to wipe out reverb and delay. Just turn the Ambience and Reverb knobs all the way down.
Make sure your vocals are balanced in volume, using gain staging or a vocal rider. Keep an eye on your DAW’s meter: ideally, it should stay between -10 dB and 0 dB. Much below that and the audio might be too quiet for training.
Put a limiter on your vocals to prevent clipping.
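You can also sanity-check the level of an exported file outside the DAW. This is a minimal sketch using only Python's standard library. It assumes a 16-bit PCM WAV export and treats the meter reading as a peak level in dBFS, which is my assumption; your DAW's meter may show something else (RMS or LUFS, for example). The function names are made up for illustration.

```python
import math
import struct
import wave


def read_wav_samples(path):
    """Read a 16-bit PCM WAV and return its samples (channels interleaved)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wav.readframes(wav.getnframes())
    return struct.unpack(f"<{len(raw) // 2}h", raw)


def peak_dbfs(samples, full_scale=32768):
    """Peak level of 16-bit integer samples in dBFS (0 dBFS = full scale)."""
    peak = max((abs(s) for s in samples), default=0)
    return 20 * math.log10(peak / full_scale) if peak else float("-inf")


def check_level(samples, floor_db=-10.0):
    """Label the peak 'too quiet' or 'ok' against a -10 dB floor."""
    peak = peak_dbfs(samples)
    return ("too quiet" if peak < floor_db else "ok"), peak
```

Running `check_level(read_wav_samples("vocal.wav"))` on an export (the filename is a placeholder) tells you whether the loudest moment clears the -10 dB floor. A peak sitting right at 0 dB is a hint that the limiter is doing real work.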
If you’re using Weights.gg, you may need to export multiple audio files because of size constraints.
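If you've already exported one long file, you don't have to go back to the DAW to chop it up. Here's a minimal sketch that splits a WAV into consecutive parts using Python's standard library. I don't know the exact size limit, so the 60-second default is purely a placeholder; pick a part length that keeps each file under whatever limit you run into.

```python
import wave
from pathlib import Path


def split_wav(path, max_seconds=60.0, out_dir=None):
    """Split a WAV file into consecutive parts of at most max_seconds each."""
    src = Path(path)
    out = Path(out_dir) if out_dir else src.parent
    out.mkdir(parents=True, exist_ok=True)
    parts = []
    with wave.open(str(src), "rb") as wav:
        params = wav.getparams()
        frames_per_part = max(1, int(params.framerate * max_seconds))
        index = 1
        while True:
            frames = wav.readframes(frames_per_part)
            if not frames:  # reached the end of the source file
                break
            part_path = out / f"{src.stem}_part{index:02d}.wav"
            with wave.open(str(part_path), "wb") as part:
                part.setparams(params)    # same channels, bit depth, sample rate
                part.writeframes(frames)  # frame count in the header is fixed on close
            parts.append(part_path)
            index += 1
    return parts
```

The cuts land at fixed time intervals, so a part may start mid-phrase. That is usually fine for training data, but you can always move the split points by hand if a cut sounds odd.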
I usually export in both MP3 and WAV formats since certain tools may only accept one.
Training Your Model
Train your model using Weights.gg or Jammable. With Weights, you can download the model to do conversions locally.
Be sure to upload at least 2 to 3 minutes of MP3 data. WAV uploads more slowly, and because you stripped the vocal from an MP3 in the first place, it doesn’t offer a quality bump.
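To check you've cleared that minimum before uploading, you can add up the lengths of your clips. The sketch below reads durations from the WAV exports, since Python's standard library can't read MP3 lengths and the MP3 versions of the same clips should run just as long. The function names and the 2-minute default are my own placeholders.

```python
import wave


def wav_duration_seconds(path):
    """Length of a WAV file in seconds (frame count divided by sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def enough_training_audio(durations_s, minimum_minutes=2.0):
    """True if the clips add up to at least minimum_minutes of audio."""
    return sum(durations_s) >= minimum_minutes * 60
```

For example, two clips of 70 and 60 seconds total 130 seconds, which clears a 2-minute minimum, while clips of 50 and 60 seconds fall 10 seconds short.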
That much audio is usually enough for lyric swaps, but it may fall short if you want the voice to sing entirely different tunes.
It seems like Weights.gg employs impressive cloud GPUs because training models is comparatively speedy—often under an hour.
Then, you'll run inference to create the new AI vocal, ready to edit into your track with your DAW.
With a little setup, you can also train or run inference locally. This can be quicker and allows batch processing. Training on your machine can enhance quality thanks to longer training times. Want to explore more? Join my AI Vocal Engineering Series for the breakdown.
Related AI Vocal Insights
For more in-depth looks at AI vocals, check out these posts:
👉 Anyone Can 'Sing' Now: AI Voice Cloning
👉 The AI Vocal Mixing Technique No One's Talking About
👉 How AI is Revolutionizing Music