ManningBooks

Devtalk Sponsor

Quantization and Fast Inference (Manning)

Today’s AI models demand a lot of memory, compute, and server horsepower, which quickly translates into cost. Quantization and Fast Inference shows you how to optimize AI models without architectural redesigns or task-specific compression.

Vivek Kalyanarangan

If you’ve worked with modern AI models in production, you’ve probably run into the same wall: great results in development, followed by uncomfortable conversations about memory, latency, and cost. Quantization and Fast Inference is built for that exact point in the workflow.

This book walks through how to shrink and speed up models without redesigning them from scratch. It starts with the fundamentals—what quantization actually does to numbers and why it works—then moves into techniques you can apply right away. You’ll get hands-on with post-training quantization (PTQ), quantization-aware training (QAT), and the details that tend to cause trouble in practice, like activation outliers in LLMs or pressure on the KV cache.
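To give a feel for "what quantization actually does to numbers," here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is not taken from the book. The function names `quantize_int8` and `dequantize` and the sample weights are made up for illustration, and real toolchains typically work per-channel or per-group on tensors rather than on Python lists.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # The largest magnitude maps to code 127; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.02, -0.51, 0.33, 1.27, -0.08]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code keeps each value within half a step of the original.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Because the scale is set by the largest magnitude, a single outlier stretches the step size for every other value. That is one reason outliers tend to cause trouble in practice.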

What stands out is the full pipeline view. It doesn’t stop at “here’s how to quantize a model.” It covers how those choices affect deployment and runtime behavior, and the tradeoffs you have to make along the way. There’s also coverage of newer low-precision formats like NF4 and FP4, which are starting to show up more often in real systems.

If you’re trying to run larger models on tighter budgets—or just want to understand what’s happening under the hood when you compress them—this is a solid place to dig in while the book is still in Early Access.


Don’t forget you can get 45% off with your Devtalk discount! Just use the coupon code “devtalk.com” at checkout :+1:
