CommunityNews

CommunityNews

AudioX: Diffusion Transformer for Anything-to-Audio Generation

Audio and music generation have emerged as crucial tasks in many applications, yet existing approaches face significant limitations: they operate in isolation without unified capabilities across modalities, suffer from scarce high-quality, multi-modal training data, and struggle to effectively integrate diverse inputs. In this work, we propose AudioX, a unified Diffusion Transformer model for Anything-to-Audio and Music Generation. Unlike previous domain-specific models, AudioX can generate both general audio and music with high quality, while offering flexible natural language control and seamless processing of various modalities including text, video, image, music, and audio. Its key innovation is a multi-modal masked training strategy that masks inputs across modalities and forces the model to learn from masked inputs, yielding robust and unified cross-modal representations. To address data scarcity, we curate two comprehensive datasets: vggsound-caps with 190K audio captions based on the VGGSound dataset, and V2M-caps with 6 million music captions derived from the V2M dataset. Extensive experiments demonstrate that AudioX not only matches or outperforms state-of-the-art specialized models, but also offers remarkable versatility in handling diverse input modalities and generation tasks within a unified architecture.

Read in full here:

Where Next?

Popular General Dev topics Top

Exadra37
As part of our continued goal of helping developers provide safer products for businesses and consumers, we here at McAfee Advanced Threa...
New
First poster: bot
Site Fingerprinting google.com Yes youtube.com Yes Amazon.com Yes Yahoo.com Yes Zoom.us No Facebook.com Yes Reddit.com Ye...
New
First poster: dwaynebradley
Maybe it’s just my experience, but Object-Oriented Programming seems like a default, most common paradigm of software engineering. The on...
New
First poster: iPaul
TOKYO (Kyodo) – Japan’s government plans to encourage firms to let their employees choose to work four days a week instead of five, aimin...
New
First poster: AstonJ
We engineered a wearable microphone jammer that is capable of disabling microphones in its user’s surroundings, including hidden micropho...
New
First poster: AstonJ
:tada: Launching Fig I am excited to announce that, as of today, Fig is generally available to the public for download. With our public ...
New
First poster: bot
openai-python/chatml.md at main · openai/openai-python. The OpenAI Python library provides convenient access to the OpenAI API from appl...
New
First poster: AstonJ
On the benefits of learning in public. Learning in public helps me grow as an engineer and seems to benefit others too. Here’s why I sho...
New
First poster: dyowee
olmOCR is an open-source tool for converting PDFs to text with high accuracy, preserving reading order and supporting tables, equations, ...
New
First poster: braycarla
In beginning the NVIDIA Blackwell Linux testing with the GeForce RTX 5090 compute performance, besides all the CUDA/OpenCL/OptiX benchmar...
New

Other popular topics Top

Devtalk
Reading something? Working on something? Planning something? Changing jobs even!? If you’re up for sharing, please let us know what you’...
1052 21915 398
New
New
New
rustkas
Intensively researching Erlang books and additional resources on it, I have found that the topic of using Regular Expressions is either c...
New
AstonJ
We’ve talked about his book briefly here but it is quickly becoming obsolete - so he’s decided to create a series of 7 podcasts, the firs...
New
PragmaticBookshelf
Author Spotlight Mike Riley @mriley This month, we turn the spotlight on Mike Riley, author of Portable Python Projects. Mike’s book ...
New
husaindevelop
Inside our android webview app, we are trying to paste the copied content from another app eg (notes) using navigator.clipboard.readtext ...
New
hilfordjames
There appears to have been an update that has changed the terminology for what has previously been known as the Taskbar Overflow - this h...
New
First poster: bot
zig/http.zig at 7cf2cbb33ef34c1d211135f56d30fe23b6cacd42 · ziglang/zig. General-purpose programming language and toolchain for maintaini...
New
AstonJ
This is a very quick guide, you just need to: Download LM Studio: https://lmstudio.ai/ Click on search Type DeepSeek, then select the o...
New