CommunityNews

CommunityNews

AudioX: Diffusion Transformer for Anything-to-Audio Generation

Audio and music generation have emerged as crucial tasks in many applications, yet existing approaches face significant limitations: they operate in isolation without unified capabilities across modalities, suffer from scarce high-quality, multi-modal training data, and struggle to effectively integrate diverse inputs. In this work, we propose AudioX, a unified Diffusion Transformer model for Anything-to-Audio and Music Generation. Unlike previous domain-specific models, AudioX can generate both general audio and music with high quality, while offering flexible natural language control and seamless processing of various modalities including text, video, image, music, and audio. Its key innovation is a multi-modal masked training strategy that masks inputs across modalities and forces the model to learn from masked inputs, yielding robust and unified cross-modal representations. To address data scarcity, we curate two comprehensive datasets: vggsound-caps with 190K audio captions based on the VGGSound dataset, and V2M-caps with 6 million music captions derived from the V2M dataset. Extensive experiments demonstrate that AudioX not only matches or outperforms state-of-the-art specialized models, but also offers remarkable versatility in handling diverse input modalities and generation tasks within a unified architecture.

Read in full here:

Popular General Dev topics Top

First poster: bot
Typing is Hard Type Checking and Type Inference Common terms Completeness Soundness Decidability Hindley-Milner Type System Dependent t...
New
First poster: AstonJ
Pocketlang is a small (~3000 semicolons) and fast functional language written in C. It’s syntactically similar to Ruby and it can be lear...
New
First poster: bot
In recent months I use Go for the implementation of Proof of Concept in my leisure time, partly to study of Go programming language itsel...
/go
New
First poster: gflashner
Why Flutter is the most popular cross-platform mobile SDK. Running a development team for each mobile platform sucks up resources from o...
New
OvermindDL1
Yet another rust-made text editor, though I’m really liking the looks of how this one works!
New
First poster: bot
GitHub - nim-works/nimskull: An in development statically typed systems programming language; with sustainability at its core. We, the co...
New
First poster: bot
Large Language Models like ChatGPT say The Darnedest Things. The Errors They MakeWhy We Need to Document Them, and What We Have Decided ...
New
First poster: bot
Rewrite it in Rust by ridiculousfish · Pull Request #9512 · fish-shell/fish-shell. (Sorry for the meme; also this is obligatory.) I thi...
New
CommunityNews
Apple Patents Suggest Future AirPods Could Monitor Biosignals & Brain Activity - AppleMagazine. The US Patent & Trademark Office...
New
New

Other popular topics Top

Devtalk
Hello Devtalk World! Please let us know a little about who you are and where you’re from :nerd_face:
New
PragmaticBookshelf
A PragProg Hero’s Journey with Brian P. Hogan @bphogan Have you ever worried that your only legacy will be in the form of legacy...
New
brentjanderson
Bought the Moonlander mechanical keyboard. Cherry Brown MX switches. Arms and wrists have been hurting enough that it’s time I did someth...
New
AstonJ
This looks like a stunning keycap set :orange_heart: A LEGENDARY KEYBOARD LIVES ON When you bought an Apple Macintosh computer in the e...
New
PragmaticBookshelf
Build highly interactive applications without ever leaving Elixir, the way the experts do. Let LiveView take care of performance, scalabi...
New
PragmaticBookshelf
Rails 7 completely redefines what it means to produce fantastic user experiences and provides a way to achieve all the benefits of single...
New
AstonJ
Was just curious to see if any were around, found this one: I got 51/100: Not sure if it was meant to buy I am sure at times the b...
New
New
husaindevelop
Inside our android webview app, we are trying to paste the copied content from another app eg (notes) using navigator.clipboard.readtext ...
New
First poster: bot
zig/http.zig at 7cf2cbb33ef34c1d211135f56d30fe23b6cacd42 · ziglang/zig. General-purpose programming language and toolchain for maintaini...
New