ManningBooks

Devtalk Sponsor

Quantization and Fast Inference (Manning)

Today’s AI models demand a lot of memory, compute, and server horsepower, which quickly translates into cost. Quantization and Fast Inference shows you how to optimize AI models without architectural redesigns or task-specific compression.

Vivek Kalyanarangan

If you’ve worked with modern AI models in production, you’ve probably run into the same wall: great results in development, followed by uncomfortable conversations about memory, latency, and cost. Quantization and Fast Inference is built for that exact point in the workflow.

This book walks through how to shrink and speed up models without redesigning them from scratch. It starts with the fundamentals—what quantization actually does to numbers and why it works—then moves into techniques you can apply right away. You’ll get hands-on with post-training quantization (PTQ), quantization-aware training (QAT), and the details that tend to cause trouble in practice, like activation outliers in LLMs or pressure on the KV cache.
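As a rough illustration of "what quantization actually does to numbers", here is a minimal sketch of symmetric int8 quantization in plain Python. It is not taken from the book; the function names and the sample weights are made up for the example, and real toolchains add per-channel scales, calibration, and more.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole list; fall back to 1.0 if every value is zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.02, -0.51, 0.33, 1.27, -0.08]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The worst-case rounding error is about half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Because the scale is set by the largest absolute value, a single outlier stretches the step size for every other number, which is one reason activation outliers cause trouble in practice.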

What stands out is the full pipeline view. It doesn’t stop at “here’s how to quantize a model.” It covers how those choices affect deployment, runtime behavior, and tradeoffs you have to make along the way. There’s also coverage of newer low-precision formats like NF4 and FP4, which are starting to show up more often in real systems.

If you’re trying to run larger models on tighter budgets—or just want to understand what’s happening under the hood when you compress them—this is a solid place to dig in while the book is still in Early Access.


Don’t forget you can get 45% off with your Devtalk discount! Just use the coupon code “devtalk.com” at checkout :+1:
