Fl4m3Ph03n1x


What are the best text-to-speech AI generation tools that you can run locally?

Background

Lately I have been on a quest to find a good-quality TTS AI generation tool that I can run locally, to create audio for some videos I am making.

I have limited knowledge of neural and Bayesian networks, and the field has moved a lot since I last studied it in detail, almost a decade ago.

So I am admittedly a newcomer to everything TTS-AI related.

What I tried

At first I tried online SaaS tools like ElevenLabs, but their restrictions are massive and I simply cannot pay for them.

So I moved to local tools. I tried:

The first three failed because they are no longer maintained, require an NVIDIA GPU (which I don't have), or simply lack enough guides and information online on how to train models with them.

I am currently trying out Piper, but I am having trouble finding voice datasets in the format it requires for training (I only know of a German one, and I need English).
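As far as I know, Piper's training guide accepts datasets in the LJSpeech layout: a `wavs/` folder plus a pipe-separated `metadata.csv` with one `id|transcript` line per clip. If you collect or record your own English clips, a small helper can build that file. This is a minimal sketch assuming that layout; the function names are hypothetical and not part of Piper, so double-check the exact format against the Piper training docs.

```python
from pathlib import Path


def ljspeech_metadata(pairs, wav_dir=None):
    """Build the text of an LJSpeech-style metadata.csv (assumed layout: id|text).

    pairs: iterable of (utterance_id, transcript). If wav_dir is given,
    each id must match an existing file wav_dir/<utterance_id>.wav.
    """
    lines = []
    for utt_id, text in pairs:
        # Collapse newlines and runs of whitespace into single spaces.
        text = " ".join(text.split())
        if "|" in utt_id or "|" in text:
            raise ValueError(f"'|' is the field separator; found it in {utt_id!r}")
        if not text:
            raise ValueError(f"empty transcript for {utt_id!r}")
        wav_path = Path(wav_dir) / f"{utt_id}.wav" if wav_dir is not None else None
        if wav_path is not None and not wav_path.is_file():
            raise FileNotFoundError(wav_path)
        lines.append(f"{utt_id}|{text}")
    if not lines:
        raise ValueError("no utterances given")
    return "\n".join(lines) + "\n"


def write_ljspeech_metadata(pairs, dataset_dir):
    """Write dataset_dir/metadata.csv, checking clips against dataset_dir/wavs/."""
    dataset_dir = Path(dataset_dir)
    content = ljspeech_metadata(pairs, wav_dir=dataset_dir / "wavs")
    out_path = dataset_dir / "metadata.csv"
    out_path.write_text(content, encoding="utf-8")
    return out_path
```

Keeping the text-building step separate from the file write makes it easy to inspect the transcript lines before committing them to disk.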

What I need

I am looking for a tool that can generate high-quality male-voiced audio to read lectures. I don't need it to be super efficient, but it does need to work without an NVIDIA GPU. Given my novice status, I would also really appreciate a tool with a community that can help me with questions while setting it up or using it.

What TTS AI tools would you recommend that fit these requirements?

Marked As Solved

Fl4m3Ph03n1x


I was fairly impressed and ended up using kokoro-tts: hexgrad/Kokoro-82M · Hugging Face

I can't run it locally (no NVIDIA GPU), but Google Colab works perfectly fine for my needs.
If you have a strong enough NVIDIA GPU, I would recommend Kokoro.

Also Liked

Fl4m3Ph03n1x


Mozilla TTS has not been updated in (at least) four years. The quality of the generated sound is rather poor, or at least I was not able to generate human-passable speech with it:

tts --text "If you like to use TTS to try a new idea and like to share your experiments with the community, we urge you to use the following guideline for a better collaboration. (If you have an idea for better collaboration, let us know)" \
  --model_name tts_models/en/ljspeech/neural_hmm \
  --vocoder_name vocoder_models/en/ek1/wavegrad \
  --out_path test.wav
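The same CLI call can be repeated from Python for several pieces of text, for example to voice a lecture chunk by chunk. A minimal sketch, assuming the `tts` command from the example above is installed and on your PATH; it only reuses the four flags shown there, and the helper names and `part_NNN.wav` file naming are illustrative choices, not part of the tool.

```python
import subprocess
from pathlib import Path

# Model and vocoder names copied from the command above.
MODEL_NAME = "tts_models/en/ljspeech/neural_hmm"
VOCODER_NAME = "vocoder_models/en/ek1/wavegrad"


def build_tts_commands(chunks, out_dir, model=MODEL_NAME, vocoder=VOCODER_NAME):
    """Return one argv list per text chunk, each writing out_dir/part_NNN.wav."""
    out_dir = Path(out_dir)
    commands = []
    for i, chunk in enumerate(chunks):
        out_path = out_dir / f"part_{i:03d}.wav"
        commands.append([
            "tts",
            "--text", chunk,
            "--model_name", model,
            "--vocoder_name", vocoder,
            "--out_path", str(out_path),
        ])
    return commands


def run_tts_commands(commands):
    """Run each command in order; check=True stops at the first failure."""
    for argv in commands:
        out_path = Path(argv[argv.index("--out_path") + 1])
        out_path.parent.mkdir(parents=True, exist_ok=True)
        # Passing a list (no shell=True) means quotes in the text need no escaping.
        subprocess.run(argv, check=True)
```

Calling `run_tts_commands(build_tts_commands(chunks, "lecture_audio"))` would then produce one WAV file per chunk. Building the argument lists separately lets you print and inspect them before anything is synthesized.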

OpenVoice, as you rightly mentioned, needs to mature before it can be used for what I have in mind, as it suffers from the same poor quality as Mozilla TTS.

I am currently playing with F5, which even has online tutorials: https://www.youtube.com/watch?v=ASFoTNpkM8o

It seems quite decent, and I was able to run it on my local setup as well, which is a big plus. The problem I now face is twofold:

  • I need to find a voice database with male voices in English (I have no idea where to find one)
  • I need to then train F5 or whatever tool I use with that voice
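Whatever dataset ends up being used, it usually helps to check that every clip has a consistent format before training. A minimal sketch using only Python's standard `wave` module (so it reads PCM WAV files only); the 22050 Hz mono target and the 1 to 15 second range are assumptions for illustration, not requirements documented by F5, Piper, or any other tool in this thread.

```python
import wave
from pathlib import Path


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as w:
        rate = w.getframerate()
        return rate, w.getnchannels(), w.getnframes() / rate


def audit_clips(paths, expected_rate=22050, min_s=1.0, max_s=15.0):
    """Map each problematic clip path to a list of issues; clean clips are omitted.

    expected_rate, min_s and max_s are illustrative defaults (assumptions);
    check the documentation of whichever trainer you use.
    """
    problems = {}
    for p in paths:
        rate, channels, duration = wav_info(p)
        issues = []
        if rate != expected_rate:
            issues.append(f"sample rate {rate} != {expected_rate}")
        if channels != 1:
            issues.append(f"{channels} channels (mono expected)")
        if not (min_s <= duration <= max_s):
            issues.append(f"duration {duration:.2f}s outside [{min_s}, {max_s}]")
        if issues:
            problems[str(Path(p))] = issues
    return problems
```

For example, `audit_clips(sorted(Path("wavs").glob("*.wav")))` would list only the clips that need resampling, downmixing, or trimming.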

As a final step, I will also need to learn how to make said tool read whole paragraphs instead of one-liners.
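A common way to get paragraph-length output from a tool that handles short inputs well is to split the text into sentence-sized chunks, synthesize each chunk separately, and join the resulting audio files. A minimal sketch of the splitting and joining steps with the Python standard library; the synthesis step in between depends on the tool, and the 250-character default is an arbitrary assumption, not a limit documented by F5 or any other tool here.

```python
import re
import wave


def chunk_text(paragraph, max_chars=250):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own chunk rather
    than being cut mid-sentence. The regex is a rough splitter: it also
    splits after abbreviations such as "e.g.".
    """
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def concat_wavs(in_paths, out_path):
    """Join WAV files end to end; all inputs must share channels, width and rate."""
    in_paths = list(in_paths)
    if not in_paths:
        raise ValueError("need at least one input file")
    params = None
    with wave.open(str(out_path), "wb") as out:
        for path in in_paths:
            with wave.open(str(path), "rb") as w:
                p = w.getparams()
                if params is None:
                    params = p
                    out.setparams(p)
                elif p[:3] != params[:3]:  # nchannels, sampwidth, framerate
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(w.readframes(w.getnframes()))
```

Splitting on sentence boundaries keeps each chunk prosodically complete, which usually sounds more natural than cutting at a fixed character count.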

Scarlet


You might want to check out VITS or XTTS from Coqui TTS (the fork of Mozilla TTS), which can run on CPUs and support training with custom datasets. Another option is OpenVoice (if it matures further) or Voxygen (which has some offline options). For datasets, LibriTTS and Common Voice (filtered for quality) are good English sources. You can also try RHVoice: not the best quality, but flexible for CPU usage.
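To turn Common Voice into a single-voice training set, one approach is to keep only clips from one male speaker that received more up-votes than down-votes. A minimal sketch, assuming a Common Voice `validated.tsv`-style file with `client_id`, `gender`, `up_votes` and `down_votes` columns and the value `"male"`; those names are assumptions that may differ between releases, so check the header of the file you download (they are parameters here for that reason).

```python
import csv
from collections import Counter


def pick_single_speaker(tsv_path, gender="male", min_margin=1,
                        speaker_col="client_id", gender_col="gender",
                        up_col="up_votes", down_col="down_votes"):
    """Return (speaker_id, rows) for the speaker with the most well-rated clips.

    A clip is kept if its gender column equals `gender` and
    up-votes minus down-votes is at least `min_margin`.
    Column names and values are assumptions; adjust to your release.
    """
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE: transcripts may contain literal quote characters.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = [
            r for r in reader
            if r.get(gender_col) == gender
            and int(r.get(up_col) or 0) - int(r.get(down_col) or 0) >= min_margin
        ]
    if not rows:
        raise ValueError("no clips matched the filter")
    # Training one voice needs one speaker: pick the one with the most clips.
    speaker, _ = Counter(r[speaker_col] for r in rows).most_common(1)[0]
    return speaker, [r for r in rows if r[speaker_col] == speaker]
```

Common Voice contributors often record only a handful of clips each, so it is worth checking how many rows the chosen speaker actually has before committing to training.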

For community support, the TTS subreddit, GitHub discussions for Mozilla/TTS, and OpenAI TTS communities might be helpful!
