Fl4m3Ph03n1x

What are the best text-to-speech AI generation tools that you can run locally?

Background

Lately I have been on a quest to find a good-quality TTS AI generation tool that I can run locally, in order to create audio for some videos I am making.

I have limited knowledge of neural/Bayesian networks, and the field has moved a lot since I last studied it in detail, almost a decade ago.

So I am admittedly a newcomer to everything related to TTS AI.

What I tried

At first I tried using online SaaS tools, like ElevenLabs, but the restrictions are massive and I simply cannot pay.

So I moved to local tools. I tried:

The first 3 failed because they are either no longer maintained, require an NVIDIA GPU (which I don’t have), or simply don’t have enough guides/information online on how to train models with them.

I am currently trying out piper, but I am having trouble finding voice datasets in the format they require for training (I only know of a German one, and I need it to be English).

What I need

I am looking for a tool that can create high-quality, male-voiced audio to read lectures. I don’t need it to be super efficient, but I do need it to work without NVIDIA GPUs. Given my novice status here, I would also really appreciate a community that can help me with my questions when setting up or using the tool.

Which TTS AI tools would you recommend that fit these requirements?

Marked As Solved

Fl4m3Ph03n1x

I was fairly impressed by kokoro-tts (hexgrad/Kokoro-82M on Hugging Face) and ended up using it.

I can’t run it locally (no NVIDIA GPU), but Google Colab works perfectly fine for my needs.
If you have a strong enough NVIDIA GPU, I would recommend kokoro.

Also Liked

Fl4m3Ph03n1x

Mozilla TTS has not been updated in at least 4 years. The quality of the generated audio is rather poor, or at least I was not able to generate human-passable audio with that tool.

tts --text "If you like to use TTS to try a new idea and like to share your experiments with the community, we urge you to use the following guideline for a better collaboration. (If you have an idea for better collaboration, let us know)" \
  --model_name tts_models/en/ljspeech/neural_hmm \
  --vocoder_name vocoder_models/en/ek1/wavegrad \
  --out_path test.wav

OpenVoice, as you rightly mentioned, needs to mature before it can be used for the purpose I have in mind, as it shares the same poor quality as Mozilla TTS.

I am currently playing with F5 which even has online tutorials: https://www.youtube.com/watch?v=ASFoTNpkM8o

It seems quite decent. I was able to run it on my local setup as well, which is a big plus. The problem I now face is twofold:

  • I need to find a voice database with male voices in English (have no idea where to find one)
  • I need to then train F5 or whatever tool I use with that voice

As a final step, I will then also need to learn how to get said tool to read whole paragraphs instead of one-liners.
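
One possible way to approach the paragraph-reading step is to split the text into sentence-sized chunks and synthesize each chunk separately. The sketch below is only an illustration using the Python standard library: `split_into_chunks` and its `max_chars` limit are hypothetical names, not part of F5 or any other tool mentioned here, and the sentence splitting is deliberately naive (it will mis-split abbreviations such as "e.g.").

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of roughly max_chars characters.

    Hypothetical helper for illustration only. A single sentence longer
    than max_chars is kept intact as its own (oversized) chunk.
    """
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", text.strip())
        if s.strip()
    ]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    lecture = (
        "Welcome to the lecture. Today we will cover three topics! "
        "Are you ready? Let us begin with the basics."
    )
    for index, chunk in enumerate(split_into_chunks(lecture, max_chars=60)):
        # Each chunk would be handed to the TTS tool of choice here.
        print(index, chunk)
```

Each chunk could then be passed to whichever TTS tool is in use, one call per chunk, with the resulting audio files joined afterwards.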

Scarlet

You might want to check out VITS or XTTS from Coqui TTS (the successor to Mozilla TTS), which can run on CPUs and support training with custom datasets. Another option is OpenVoice (if it matures further) or Voxygen (which has some offline options). For datasets, LibriTTS and Common Voice (filtered for quality) are good English sources. You can also try RHVoice: not the best quality, but flexible for CPU usage.

For community support, the TTS subreddit, GitHub discussions for Mozilla/TTS, and OpenAI TTS communities might be helpful!
