How To Make an AI Voice Model (That’s Actually Good)

By Will Harken


Want any voice in your content? AI’s got your back. Let me show you how to make awesome models. Use them for hit songs or just for fun memes.

The best part? For only $10 and some free tools, you can clone any voice. Seriously, it's that cheap to unlock such power!

General Tips

1) Plan for Voice-to-Voice. Start with a human singing, then convert it to your new AI vocal.

2) Quality over Quantity. Trim any parts with instrument bleed or garbled sound from your target vocals.


3) Go for Specialized Models. Make models for specific purposes, not general ones.

For instance, have a model for breathy falsettos and another for powerful belting. You’d want a different model for Freddie Mercury’s quiet moments vs. his near-shouts.

If you do make a general-purpose model, the result depends on the input performance. So, if a part should sound breathy, the singer you record needs to sing it breathy, too.

Conversely, a belty-screamy model will always do belty-screamy, no matter the input.

For general models, aim for training vocals with similar mixing (EQ/compression).

Getting Audio of the Target Voice

Setup & Vocal Isolation

Pick the tracks you’ll use. Acapella vocals are ideal but rare. In their absence, choose tracks with few instruments.

If you're only changing one song's lyrics, solo vocals help the AI capture the sound accurately. This usually works.

Download MP3s using a YouTube MP3 downloader. Just Google it and pick one that works.

Tip: Use a Python script to create project folders and automate MP3 downloads. Want to learn how? Join my AI Vocal Engineering Series.
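Here's a minimal sketch of the project-folder half of that tip. The folder names are my own assumptions for illustration, not a required layout, and the download step is left out on purpose since it depends on whichever downloader you pick.

```python
from pathlib import Path

# Hypothetical stage folders; rename them to match your own workflow.
SUBFOLDERS = ("source_mp3", "isolated_vocals", "cleaned_exports", "models")


def create_project(root: str, voice_name: str) -> Path:
    """Create one project folder per target voice, with a subfolder per stage."""
    project = Path(root) / voice_name
    for name in SUBFOLDERS:
        # exist_ok=True makes the script safe to re-run on an existing project.
        (project / name).mkdir(parents=True, exist_ok=True)
    return project
```

Calling create_project("voice_projects", "example_singer") would give you one tidy folder per voice, so downloads, isolations, and exports never get mixed up between models.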

I use Ultimate Vocal Remover’s UVR and MDX models for isolating vocals. I follow the tutorial below. Running two isolations can improve model fidelity by providing extra training data.

Clean Up & Export

Import the isolations into your DAW. Cut out non-vocal sections, group (backing) vocals, and garbled parts.

Need more training data? You can cheat a bit like I do. Mix vocal isolations from similar-sounding singers to make a hybrid model.

Optional but helpful: Add Clear by Supertone ($70) to remove reverb and delay for clean vocals. Turn Ambience and Reverb knobs all the way down.

Make sure all vocals hover around the same volume. Use gain staging or a vocal rider. Check your DAW’s volume meter to confirm the level stays between roughly -10 dB and 0 dB, so the vocals aren't too quiet for training.

Put a limiter on your vocals to prevent clipping.
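If you want to double-check an exported file outside your DAW, here's a rough sketch that reads the peak level of a WAV export. It assumes 16-bit PCM WAV files and only looks at the single loudest sample, so treat it as a sanity check rather than a replacement for your DAW's meter. The function names are made up for this example.

```python
import math
import struct
import wave


def peak_dbfs(wav_path: str) -> float:
    """Return the peak level of a 16-bit PCM WAV file in dBFS (0 = full scale)."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("This sketch only handles 16-bit PCM WAV files")
        frames = wav.readframes(wav.getnframes())
    # Each 16-bit sample is a little-endian signed short.
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(peak / 32768)


def in_target_range(level_db: float, low: float = -10.0, high: float = 0.0) -> bool:
    """Check a level against the rough -10 dB to 0 dB training range."""
    return low <= level_db <= high
```

A file that peaks well below -10 dB is probably too quiet, and one that sits right at 0 dB is worth a listen for clipping.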

Depending on the training tool you use, you might need to export multiple audio files due to file size limits.
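If you'd rather not re-export by hand, a single WAV can be split into parts with a short script. This is a sketch under assumptions: it handles uncompressed PCM WAV only, the max_bytes value is whatever limit your tool enforces (I'm not assuming a specific number), and it cuts at fixed sizes, so a cut may land mid-phrase.

```python
import wave
from pathlib import Path


def split_wav(wav_path: str, out_dir: str, max_bytes: int) -> list[Path]:
    """Split a PCM WAV into numbered parts that each stay within max_bytes."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    parts = []
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        bytes_per_frame = params.sampwidth * params.nchannels
        # Leave room for the 44-byte header the wave module writes for PCM.
        frames_per_part = (max_bytes - 44) // bytes_per_frame
        if frames_per_part <= 0:
            raise ValueError("max_bytes is too small to hold any audio")
        index = 1
        while True:
            chunk = src.readframes(frames_per_part)
            if not chunk:
                break
            part_path = out / f"{Path(wav_path).stem}_part{index:02d}.wav"
            with wave.open(str(part_path), "wb") as dst:
                dst.setparams(params)  # frame count is corrected on close
                dst.writeframes(chunk)
            parts.append(part_path)
            index += 1
    return parts
```

For example, split_wav("vocals.wav", "parts", max_bytes=50_000_000) would write vocals_part01.wav, vocals_part02.wav, and so on, where 50 MB is just a placeholder limit.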

I usually export audio files as both MP3 and WAV since some tools support only one format.


Training Your Model

Use Weights or Jammable for training. With Weights, you can download the model for local conversions.

Upload at least 2-3 minutes of data as an MP3. WAV takes longer to upload and isn’t higher quality since you stripped the vocal from an MP3 anyway.
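To check whether you've hit that 2-3 minute mark before uploading, you can total up your exports. This sketch measures the WAV versions of your exports, since Python's standard library can't read MP3 durations. The folder layout and function names are assumptions for illustration.

```python
import wave
from pathlib import Path

MIN_SECONDS = 120  # the 2-minute lower bound from the guideline above


def wav_seconds(wav_path) -> float:
    """Return the length of one WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_seconds(folder: str) -> float:
    """Add up the length of every .wav file directly inside the folder."""
    return sum(wav_seconds(p) for p in sorted(Path(folder).glob("*.wav")))


def has_enough_data(folder: str, minimum: float = MIN_SECONDS) -> bool:
    """True when the folder holds at least `minimum` seconds of audio."""
    return total_seconds(folder) >= minimum
```

If has_enough_data("cleaned_exports") comes back False, that's a sign to go find more source tracks before you train.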

This amount usually works for lyric swaps but might not be enough for singing completely different songs with that voice. These services must be using some impressive cloud GPUs, since they train models pretty fast, usually within an hour.

Then, you can run inference, creating the new AI vocal to edit into your song with your DAW.

With some setup, you can run training or inference locally. This is often faster and supports batch processing. Training locally can lead to higher quality due to extended training times. Want to learn how? Join my AI Vocal Engineering Series.

Related AI Vocal Insights

For deeper dives into AI vocals, check out these posts:

👉 Anyone Can 'Sing' Now: AI Voice Cloning

👉 The AI Vocal Mixing Technique No One's Talking About

👉 How AI is Revolutionizing Music
