ManningBooks

CUDA for Deep Learning (Manning)

CUDA for Deep Learning shows you how to work within the CUDA ecosystem, from your first kernel to implementing advanced LLM features like Flash Attention. You’ll learn to profile with Nsight Compute, identify bottlenecks, and understand why each optimization works.

Elliot Arledge

CUDA for Deep Learning focuses on using CUDA directly to get more out of NVIDIA GPUs than framework-level tweaks allow. The book starts with the fundamentals, writing your first kernels, and works up to performance-critical building blocks used in modern models, including the techniques behind Flash Attention.

What sets this book apart is the emphasis on why an optimization works, not just how to apply it. You’ll learn how to profile with Nsight Compute, spot memory and compute bottlenecks, and reason about performance across multiple layers of abstraction. The goal is to build an intuition for CUDA that holds up even as hardware evolves.

This isn’t about replacing PyTorch or TensorFlow. It’s for cases where you need lower-level control, want to understand GPU behavior deeply, or are working on custom kernels, research code, or performance-sensitive production systems.


Don’t forget you can get 45% off with your Devtalk discount! Just use the coupon code “devtalk.com” at checkout :+1:
