ManningBooks

Devtalk Sponsor

Quantization and Fast Inference (Manning)

Today’s AI models demand a lot of memory, compute, and server horsepower, which quickly translates into cost. Quantization and Fast Inference shows you how to optimize AI models without architectural redesigns or task-specific compression.

Vivek Kalyanarangan

If you’ve worked with modern AI models in production, you’ve probably run into the same wall: great results in development, followed by uncomfortable conversations about memory, latency, and cost. Quantization and Fast Inference is built for that exact point in the workflow.

This book walks through how to shrink and speed up models without redesigning them from scratch. It starts with the fundamentals—what quantization actually does to numbers and why it works—then moves into techniques you can apply right away. You’ll get hands-on with post-training quantization (PTQ), quantization-aware training (QAT), and the details that tend to cause trouble in practice, like activation outliers in LLMs or pressure on the KV cache.
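To give a rough feel for "what quantization actually does to numbers", here is a minimal sketch of affine int8 quantization in plain Python. It is not taken from the book: the function names (`quantize_int8`, `dequantize`) and the sample weights are made up for illustration, and real toolkits add per-channel scales, calibration, and outlier handling on top of this idea.

```python
# Illustrative sketch only: hypothetical helper names, not the book's code.

def quantize_int8(values):
    """Map a list of floats onto int8 codes with one scale and zero point."""
    qmin, qmax = -128, 127
    # Widen the range to include 0.0 so that zero maps to an exact code.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Size of one integer step in float units; fall back to 1.0 for all-zero input.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    # Integer code that represents the float value 0.0.
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    # Round each value to the nearest step and clamp into the int8 range.
    codes = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate floats from the int8 codes."""
    return [(c - zero_point) * scale for c in codes]


weights = [-1.2, -0.4, 0.0, 0.3, 0.9, 2.5]  # made-up sample values
q, scale, zp = quantize_int8(weights)
restored = dequantize(q, scale, zp)

# The round trip is lossy: each value lands on the nearest representable step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("codes:", q)
print("scale:", scale, "zero point:", zp)
print("max round-trip error:", max_error)
```

Each weight now fits in one byte instead of four, at the cost of a rounding error of up to about one step (`scale`). A single large outlier stretches the range and makes every step coarser, which is one way to see why activation outliers cause trouble.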

What stands out is the full pipeline view. It doesn’t stop at “here’s how to quantize a model.” It covers how those choices affect deployment, runtime behavior, and tradeoffs you have to make along the way. There’s also coverage of newer low-precision formats like NF4 and FP4, which are starting to show up more often in real systems.

If you’re trying to run larger models on tighter budgets—or just want to understand what’s happening under the hood when you compress them—this is a solid place to dig in while the book is still in Early Access.

Full details: Quantization and Fast Inference - Vivek Kalyanarangan

Don’t forget you can get 45% off with your Devtalk discount! Just use the coupon code “devtalk.com” at checkout.


#manning #published-book #mlops #pytorch #efficient-ai #quantization #edge-ai #model-compression


2026-05-08 13:54:51 UTC





