How To Make an AI Voice Model (That’s Actually Good)

· By Will Harken

Are you looking to incorporate any voice you want into your content? With AI, you can do that now. I'm going to show you how to create really good voice models, which you can use in your next chart-topping hit, or just to make silly memes.

The beauty? You can create AI vocals of someone else or yourself for just $10 by leveraging free tools and subscribing to an AI cloning service. Crazy affordable for unlocking such immense power! 

General Tips

1) Plan to do voice-to-voice. This means you need to have a human sing, then you convert that vocal into your new AI vocal. 

2) Quality matters more than quantity when it comes to creating exceptional AI vocal models. Cut out any parts of your target vocals that have instrument bleed or sound garbled.

And...

3) It's usually better to make specialized models tailored for specific purposes rather than trying to create general-purpose models.

For example: It's more effective to have a separate model for breathy falsetto vocals and a separate model for powerful belting takes. You'd want a different model for Freddie Mercury's emotional, quiet sections versus his bold, near-shouting moments.

If you DO create a general-purpose model, keep in mind that the model will rely on the singer's ability. So if a part of a song is breathy, the singer will need to be breathy.

On the other hand, if you have a specialized model trained only on belty, screamy vocals, it will infer belty, screamy vocals regardless of how you sing the part.

For general purpose models, aim for vocals that sound similar in terms of mixing (EQ/compression).

Getting Audio Of The Target Voice

Setup & Vocal Isolation

Start by selecting the tracks you'll use. While acapella vocals are ideal for training, they're often not available. In those cases, pick tracks with sparse instrumentals.

If you're only changing the lyrics of one song, training on just the vocals from that one track can help the AI capture the intended sound accurately. This USUALLY works.

Use a YouTube MP3 downloader to download the MP3s. Just Google "Youtube Mp3 downloader" and find one that works.

Pro tip: Use a Python script to create project folders, subfolders, and automatically download MP3s. Want to learn how to do this? Join my AI Vocal Engineering Series.

I use Ultimate Vocal Remover's VR and MDX-Net models to isolate vocals, following the tutorial below. Running two different isolations can slightly increase model fidelity by providing more training data.

 

Clean Up & Export

Import the isolations into your DAW and remove any non-vocal sections, group vocals, and garbled parts.

If you need more training data, you can cheat a little bit like I do... You can incorporate vocal isolations of similar-sounding singers to build a hybrid model.

Optional, but helpful: Add Clear by Supertone ($70 at the time of this writing) to strip away lingering reverb and delay for a crisp, clean vocal. I often turn the Ambience and Reverb knobs all the way down. 

I recommend gain staging and/or using a vocal rider to keep the vocals hovering around the same volume; you can watch the volume meter in your DAW to confirm the level is staying consistent. You also want the vocal peaking between -10 dB and 0 dB so that it's not too quiet for the training.

Make sure you have a Limiter on your vocals to prevent clipping.
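If you'd rather check levels with a script than by eyeballing a meter, here's a small sketch that measures the peak level of an exported WAV in dBFS. It assumes 16-bit PCM files; the -10 dB target is just the guideline from above:

```python
# Sketch: measure the peak level of an exported vocal WAV in dBFS,
# assuming a 16-bit PCM file (sample width of 2 bytes).
import math
import wave
from array import array


def peak_dbfs(path: str) -> float:
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "expects 16-bit PCM"
        samples = array("h", wf.readframes(wf.getnframes()))
    peak = max(abs(s) for s in samples)
    # Full scale for 16-bit audio is 32768; clamp to avoid log10(0).
    return 20 * math.log10(max(peak, 1) / 32768)


# Usage: if peak_dbfs("exports/vocal.wav") is below -10, raise the gain
# before training.
```

A full-scale file reads 0 dBFS; a half-scale one reads about -6 dBFS.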

If you plan to use Weights.gg, you will likely need to export multiple audio files because of their file size limit.
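One way to split a long export into chunks is ffmpeg's segment muxer. This is a sketch, not Weights.gg's official workflow: the 10-minute chunk length and the file names are assumptions, and `-c copy` avoids re-encoding so the split is fast and lossless. Check the service's current size limit yourself:

```python
# Sketch: build an ffmpeg command that splits one long MP3 into
# fixed-length chunks (part_000.mp3, part_001.mp3, ...).
import shutil
import subprocess
from pathlib import Path


def split_command(src: str, chunk_seconds: int = 600) -> list[str]:
    return ["ffmpeg", "-i", src, "-f", "segment",
            "-segment_time", str(chunk_seconds),
            "-c", "copy", "part_%03d.mp3"]


cmd = split_command("vocals_full.mp3")
# Run only if ffmpeg is installed and the source file actually exists.
if shutil.which("ffmpeg") and Path("vocals_full.mp3").exists():
    subprocess.run(cmd, check=True)
```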

I generally export my audio files as both MP3 and WAV, because some tools only support one or the other.

Training Your Model

Use Weights.gg or Jammable (formerly Voicify) for training. With Weights you can download the model for local conversions.

Aim to upload at least 2-3 minutes of data as an MP3. WAV will take longer to upload and isn't actually higher quality because you stripped the vocal from an MP3 anyway.

Typically this amount will be enough for lyric swaps, but can be problematic if you are trying to sing a completely different song with the voice.
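If your clips are scattered across files, a quick script can confirm you've hit the 2-3 minute mark. This sketch only counts WAVs, since Python's standard-library wave module can't read MP3s, and the folder name is a placeholder:

```python
# Sketch: total up the duration (in seconds) of all WAV clips in a folder,
# to check against the 2-3 minute training-data guideline above.
import wave
from pathlib import Path


def total_seconds(folder: str) -> float:
    total = 0.0
    for path in Path(folder).glob("*.wav"):
        with wave.open(str(path), "rb") as wf:
            total += wf.getnframes() / wf.getframerate()
    return total


# Usage: total_seconds("cleaned") / 60 gives you the minutes of data.
```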

Weights.gg must be using some nice cloud GPUs because they are able to train pretty good models within an hour.

From here, you can run inference - creating the new AI vocal - which you can edit into your song with your DAW.

With a little setup, you can run training or inference locally, which is often much faster and can support batch processing. If you train locally, you can also potentially train for longer to get higher quality. Want to learn how to do this? Join my AI Vocal Engineering Series.

Related AI Vocal Insights

For deeper dives into the world of AI vocals, explore these posts:

👉 Anyone Can 'Sing' Now: AI Voice Cloning

👉 The AI Vocal Mixing Technique No One's Talking About

👉 How AI is Revolutionizing Music

Level Up Your AI Vocals
