CommunityNews

CommunityNews

AudioX: Diffusion Transformer for Anything-to-Audio Generation

Audio and music generation have emerged as crucial tasks in many applications, yet existing approaches face significant limitations: they operate in isolation without unified capabilities across modalities, suffer from scarce high-quality, multi-modal training data, and struggle to effectively integrate diverse inputs. In this work, we propose AudioX, a unified Diffusion Transformer model for Anything-to-Audio and Music Generation. Unlike previous domain-specific models, AudioX can generate both general audio and music with high quality, while offering flexible natural language control and seamless processing of various modalities including text, video, image, music, and audio. Its key innovation is a multi-modal masked training strategy that masks inputs across modalities and forces the model to learn from masked inputs, yielding robust and unified cross-modal representations. To address data scarcity, we curate two comprehensive datasets: vggsound-caps with 190K audio captions based on the VGGSound dataset, and V2M-caps with 6 million music captions derived from the V2M dataset. Extensive experiments demonstrate that AudioX not only matches or outperforms state-of-the-art specialized models, but also offers remarkable versatility in handling diverse input modalities and generation tasks within a unified architecture.

Read in full here:

Where Next?

Popular General Dev topics Top

First poster: bot
Site Fingerprinting google.com Yes youtube.com Yes Amazon.com Yes Yahoo.com Yes Zoom.us No Facebook.com Yes Reddit.com Ye...
New
First poster: dpritchett
It’s not what programming languages do, it’s what they shepherd you to. How many of you have listened, read or taken part in a discussio...
New
First poster: dyowee
Everyone seems to be striving for ‘clean’ code at the moment. You can’t read a blog post without the author telling you how clean their a...
New
First poster: mindriot
LG 28-inch 16:18 DualUp Monitor with Ergo Stand and USB Type-C™ (28MQ780-B) | LG USA. Shop LG 28MQ780-B on the official LG.com website ...
New
First poster: bot
Apple’s Tim Cook to take 50% pay hit after shareholder feedback. ‘Target compensation’ for CEO down from $99.4m in 2022 to an expected $...
New
CommunityNews
A Brief Review of the Minisforum V3 AMD Tablet. Update: I have created an awesome-minisforum-v3 GitHub repository to list information fo...
New
CommunityNews
After switching from Firefox to LibreWolf, I became interested in the idea of self-hosting my own Firefox Sync server. Although I had see...
New
First poster: joeb
The new frameworks will continue until morale improves.
/js
New
First poster: alvinkatojr
About accelerationism, NRx, and the intersection of technology, religion, and philosophy: an analysis of the essential ideas in the new A...
New
CommunityNews
Rendering Action Mailer emails with Phlex components and layouts: Clean, Composable, and Completely Ruby - Blog post by Camillo Visini
New

Other popular topics Top

malloryerik
Any thoughts on Svelte? Svelte is a radical new approach to building user interfaces. Whereas traditional frameworks like React and Vue...
New
wolf4earth
@AstonJ prompted me to open this topic after I mentioned in the lockdown thread how I started to do a lot more for my fitness. https://f...
New
AstonJ
I’ve been hearing quite a lot of comments relating to the sound of a keyboard, with one of the most desirable of these called ‘thock’, he...
New
AstonJ
Just done a fresh install of macOS Big Sur and on installing Erlang I am getting: asdf install erlang 23.1.2 Configure failed. checking ...
New
dimitarvp
Small essay with thoughts on macOS vs. Linux: I know @Exadra37 is just waiting around the corner to scream at me “I TOLD YOU SO!!!” but I...
New
AstonJ
In case anyone else is wondering why Ruby 3 doesn’t show when you do asdf list-all ruby :man_facepalming: do this first: asdf plugin-upd...
New
AstonJ
If you are experiencing Rails console using 100% CPU on your dev machine, then updating your development and test gems might fix the issu...
New
AstonJ
Continuing the discussion from Thinking about learning Crystal, let’s discuss - I was wondering which languages don’t GC - maybe we can c...
New
New
First poster: joeb
The File System Access API with Origin Private File System. WebKit supports new API that makes it possible for web apps to create, open,...
New