Fl4m3Ph03n1x

What are the best text-to-speech ai generation tools that you can run locally?

Background

Lately I have been on a quest to find a good-quality TTS AI generation tool that runs locally, so I can create audio for some videos I am making.

I have limited knowledge of neural/Bayesian networks, and the field has moved a lot since I last studied it in detail, almost a decade ago.

So I am admittedly a newcomer to everything TTS-AI related.

What I tried

At first I tried using online SaaS tools, like ElevenLabs, but the restrictions are massive and I simply cannot pay.

So I moved to local tools. I tried:

The first 3 failed because they are either no longer maintained, require an NVIDIA GPU (which I don’t have), or simply lack enough guides/information online on how to train models with them.

I am currently trying out piper, but I am having trouble finding voice datasets in the format they require for training (I only know of a German one, and I need it to be English).
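
For anyone checking a candidate dataset before training, here is a minimal sketch of a sanity check. It assumes an LJSpeech-style layout (a `metadata.csv` with `id|text` rows, or `id|speaker|text` for multi-speaker sets, plus a `wav/` folder containing `<id>.wav`), which is my understanding of what Piper's training docs describe. The function name `check_dataset` and the exact file naming are assumptions for illustration, not part of Piper itself, so confirm the layout against the docs for your version.

```python
import csv
from pathlib import Path


def check_dataset(root):
    """Return a list of problems found in an LJSpeech-style dataset folder.

    Assumed layout (verify against your tool's docs):
      root/metadata.csv   rows of "id|text" or "id|speaker|text"
      root/wav/<id>.wav   one audio file per row
    """
    root = Path(root)
    metadata = root / "metadata.csv"
    if not metadata.is_file():
        return [f"missing {metadata.name}"]

    problems = []
    with metadata.open(newline="", encoding="utf-8") as handle:
        # QUOTE_NONE so quotation marks inside transcripts are kept as-is.
        reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
        for line_no, row in enumerate(reader, start=1):
            # The transcript is taken to be the last column.
            if len(row) < 2 or not row[-1].strip():
                problems.append(f"line {line_no}: expected 'id|text'")
                continue
            wav = root / "wav" / f"{row[0]}.wav"
            if not wav.is_file():
                problems.append(f"line {line_no}: missing {wav.name}")
    return problems
```

An empty list means every row has a transcript and a matching audio file. It says nothing about audio quality or sample rate.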

What I need

I am looking for a tool that can generate high-quality male-voiced audio to read lectures. It doesn’t need to be super efficient, but it does need to work without an NVIDIA GPU. Given my novice status, I would also really appreciate a community that can help with my questions when setting up or using the tool.

What are the tts-ai tools you would recommend that can fit these requirements?

Marked As Solved

Fl4m3Ph03n1x

I ended up using kokoro-tts (hexgrad/Kokoro-82M on Hugging Face) and was fairly impressed.

I can’t run it locally (no NVIDIA GPU), but Google Colab works perfectly fine for my needs.
If you have a strong enough NVIDIA GPU, I would recommend kokoro.

Also Liked

Fl4m3Ph03n1x

Mozilla TTS has not been updated in at least 4 years. The quality of the generated audio is rather poor, or at least I was not able to generate human-passable speech with it. This is the command I tried:

tts --text "If you like to use TTS to try a new idea and like to share your experiments with the community, we urge you to use the following guideline for a better collaboration. (If you have an idea for better collaboration, let us know)" \
  --model_name tts_models/en/ljspeech/neural_hmm \
  --vocoder_name vocoder_models/en/ek1/wavegrad \
  --out_path test.wav

OpenVoice, as you rightly mentioned, needs to mature before it can be used for the purpose I have in mind, as it has the same poor quality as Mozilla TTS.

I am currently playing with F5, which even has online tutorials: https://www.youtube.com/watch?v=ASFoTNpkM8o

It seems quite decent. I was able to run it on my local setup as well, which is a big plus. The problem I now face is twofold:

  • I need to find a voice database with male voices in English (have no idea where to find one)
  • I need to then train F5 or whatever tool I use with that voice

As a final step, I will also need to learn how to get said tool to read whole paragraphs instead of one-liners.
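
One common way to handle the paragraph problem, regardless of which tool ends up being used, is to split the text into sentence-sized chunks, synthesize each chunk separately, and join the audio afterwards. Below is a minimal sketch of the splitting step only. The function name `split_into_chunks` and the 200-character default are assumptions chosen for illustration, not a limit taken from F5 or any other tool, and the sentence detection is a naive regex that will trip over abbreviations such as "e.g.".

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to the TTS tool one at a time, for example in a loop that writes `chunk_000.wav`, `chunk_001.wav`, and so on, before joining the files in an audio editor.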

Scarlet

You might want to check out VITS or XTTS from Coqui TTS (the community successor to Mozilla TTS), which can run on CPUs and support training with custom datasets. Another option is OpenVoice (if it matures further) or Voxygen (which has some offline options). For datasets, LibriTTS and Common Voice (filtered for quality) are good English sources. You can also try RHVoice: not the best quality, but flexible for CPU usage.

For community support, the TTS subreddit, GitHub discussions for Mozilla/TTS, and OpenAI TTS communities might be helpful!
