CommunityNews

AudioX: Diffusion Transformer for Anything-to-Audio Generation

Audio and music generation have emerged as crucial tasks in many applications, yet existing approaches face significant limitations: they operate in isolation without unified capabilities across modalities, suffer from scarce high-quality, multi-modal training data, and struggle to effectively integrate diverse inputs. In this work, we propose AudioX, a unified Diffusion Transformer model for Anything-to-Audio and Music Generation. Unlike previous domain-specific models, AudioX can generate both general audio and music with high quality, while offering flexible natural language control and seamless processing of various modalities including text, video, image, music, and audio. Its key innovation is a multi-modal masked training strategy that masks inputs across modalities and forces the model to learn from masked inputs, yielding robust and unified cross-modal representations. To address data scarcity, we curate two comprehensive datasets: vggsound-caps with 190K audio captions based on the VGGSound dataset, and V2M-caps with 6 million music captions derived from the V2M dataset. Extensive experiments demonstrate that AudioX not only matches or outperforms state-of-the-art specialized models, but also offers remarkable versatility in handling diverse input modalities and generation tasks within a unified architecture.
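For readers wondering what "masking inputs across modalities" could look like in practice, here is a minimal sketch. It is not the AudioX implementation: the function name `mask_modalities`, the `mask_ratio` value, the zero-fill, and the toy feature lists are all assumptions made for illustration. The abstract only says that inputs are masked across modalities during training.

```python
import random


def mask_modalities(inputs, mask_ratio=0.5, seed=None):
    """Return a copy of `inputs` with random positions zeroed in every modality.

    inputs: dict mapping a modality name to a list of feature values.
    mask_ratio: fraction of positions to mask per modality (assumed, not from the paper).
    seed: optional seed so the masking is reproducible.
    """
    rng = random.Random(seed)
    masked = {}
    for name, features in inputs.items():
        # Pick which positions to hide for this modality.
        n_mask = int(len(features) * mask_ratio)
        positions = set(rng.sample(range(len(features)), n_mask))
        # Zero-fill is a stand-in for whatever mask token a real model would use.
        masked[name] = [0.0 if i in positions else x for i, x in enumerate(features)]
    return masked


# Hypothetical toy features standing in for encoded text and video inputs.
example = {
    "text": [0.2, 0.7, 0.1, 0.9],
    "video": [1.0, 0.5, 0.3, 0.8, 0.6, 0.4],
}
masked_example = mask_modalities(example, mask_ratio=0.5, seed=0)
```

In a training loop, the model would receive `masked_example` as conditioning and still be asked to produce the target audio, so it cannot lean on any single fully visible input.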

#audio

2025-04-15 03:05:18 UTC
