Fl4m3Ph03n1x

What are the best text-to-speech ai generation tools that you can run locally?

Background

Lately I have been on a quest to find a good-quality TTS AI generation tool that runs locally, so I can create audio for some videos I am making.

I have limited knowledge of neural/Bayesian networks, and the field has moved a lot since I last studied it in detail almost a decade ago.

So I am admittedly a newcomer to everything TTS-AI related.

What I tried

At first I tried using online SaaS tools, like ElevenLabs, but the restrictions are massive and I simply cannot pay.

So I moved to local tools. I tried:

The first 3 failed because they are no longer maintained, require an NVIDIA GPU (which I don’t have), or simply lack enough guides/information online on how to train models with them.

I am currently trying out Piper, but I am having trouble finding voice datasets in the format it requires for training (I only know of a German one, and I need an English one).

What I need

I am looking for a tool that can generate high-quality, male-voiced audio for reading lectures. I don’t need it to be super efficient, but I do need it to work without an NVIDIA GPU. Given my novice status, I would also really appreciate a community that can help with my questions when setting up or using the tool.

What are the tts-ai tools you would recommend that can fit these requirements?

Marked As Solved

Fl4m3Ph03n1x

I was fairly impressed and ended up using kokoro-tts: hexgrad/Kokoro-82M · Hugging Face

I can’t run it locally (no NVIDIA GPU), but Google Colab works perfectly fine for my needs.
If you have a strong enough NVIDIA GPU, I would recommend kokoro.

Also Liked

Fl4m3Ph03n1x

Mozilla TTS has not been updated in at least 4 years. The quality of the generated audio is rather poor, or at least I was not able to generate human-passable speech with it. This is the command I used:

tts --text "If you like to use TTS to try a new idea and like to share your experiments with the community, we urge you to use the following guideline for a better collaboration. (If you have an idea for better collaboration, let us know)" \
  --model_name tts_models/en/ljspeech/neural_hmm \
  --vocoder_name vocoder_models/en/ek1/wavegrad \
  --out_path test.wav

OpenVoice, as you very well mentioned, needs to become more mature before it can be used for the purpose I have in mind, as it shares the same poor quality that Mozilla TTS does.

I am currently playing with F5, which even has online tutorials: https://www.youtube.com/watch?v=ASFoTNpkM8o

It seems quite decent. I was able to run it on my local setup as well, which is a big plus. The problem I now face is twofold:

  • I need to find a voice database with male voices in English (have no idea where to find one)
  • I need to then train F5 or whatever tool I use with that voice

As a final step, I will also need to learn how to get the tool to read whole paragraphs instead of one-liners.
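
One common way to handle the paragraph step is to split the text into sentence-sized chunks, synthesize each chunk separately, and join the resulting audio afterwards. The sketch below covers only the splitting part and is not tied to F5 or any other tool; `split_into_chunks` and the `max_chars` limit are hypothetical names and values chosen for illustration, and the right chunk size for a given model is something to find by experiment.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Split text into sentence-aligned chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut
    mid-sentence, so a chunk can exceed the limit in that case.
    """
    # Collapse newlines and repeated spaces so paragraphs become one line.
    normalized = " ".join(text.split())
    if not normalized:
        return []

    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", normalized)

    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    lecture = (
        "Welcome to the lecture. Today we cover speech synthesis! "
        "Any questions before we start?"
    )
    for index, chunk in enumerate(split_into_chunks(lecture, max_chars=40)):
        # Each chunk would be passed to the TTS tool as one synthesis request.
        print(index, chunk)
```

The regular expression is deliberately simple and will mis-split abbreviations such as "e.g."; a dedicated sentence tokenizer would be more robust for real lecture text.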

Scarlet

You might want to check out VITS or XTTS from Coqui TTS (the continuation of Mozilla TTS), which can run on CPUs and support training with custom datasets. Another option is OpenVoice (if it matures further) or Voxygen (which has some offline options). For datasets, LibriTTS and Common Voice (filtered for quality) are good English sources. You can also try RHVoice: not the best quality, but flexible for CPU usage.

For community support, the TTS subreddit, GitHub discussions for Mozilla/TTS, and OpenAI TTS communities might be helpful!
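
To make the "filtered for quality" suggestion concrete, here is a rough sketch of picking male-voiced clips out of a Common Voice style metadata file. It assumes a tab-separated file with `path`, `sentence`, `up_votes`, `down_votes` and `gender` columns, and that male speakers are labeled with a value starting with `male`; check these against the release you actually download. The function name `filter_clips` and the vote thresholds are hypothetical.

```python
import csv
import io


def filter_clips(tsv_file, gender_prefix="male", min_up_votes=2, max_down_votes=0):
    """Return (path, sentence) pairs for clips matching gender and vote limits.

    tsv_file is any open text file object with a header row. Column names are
    an assumption about the dataset layout, not a guaranteed schema.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    selected = []
    for row in reader:
        gender = (row.get("gender") or "").strip().lower()
        # "female" does not start with "male", so it is excluded here.
        if not gender.startswith(gender_prefix):
            continue
        up_votes = int(row.get("up_votes") or 0)
        down_votes = int(row.get("down_votes") or 0)
        if up_votes < min_up_votes or down_votes > max_down_votes:
            continue
        selected.append((row["path"].strip(), row["sentence"].strip()))
    return selected


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real metadata file.
    sample = io.StringIO(
        "path\tsentence\tup_votes\tdown_votes\tgender\n"
        "a.mp3\tHello there.\t3\t0\tmale\n"
        "b.mp3\tHi.\t5\t0\tfemale\n"
    )
    for path, sentence in filter_clips(sample):
        print(path, sentence)
```

Vote counts are only a crude quality signal; listening to a sample of the selected clips before training is still worthwhile.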
