ManningBooks

Devtalk Sponsor

Quantization and Fast Inference (Manning)

Today’s AI models demand a lot of memory, compute, and server horsepower, which quickly translates into cost. Quantization and Fast Inference shows you how to optimize AI models without architectural redesigns or task-specific compression.

Vivek Kalyanarangan

If you’ve worked with modern AI models in production, you’ve probably run into the same wall: great results in development, followed by uncomfortable conversations about memory, latency, and cost. Quantization and Fast Inference is built for that exact point in the workflow.

This book walks through how to shrink and speed up models without redesigning them from scratch. It starts with the fundamentals—what quantization actually does to numbers and why it works—then moves into techniques you can apply right away. You’ll get hands-on with post-training quantization (PTQ), quantization-aware training (QAT), and the details that tend to cause trouble in practice, like activation outliers in LLMs or pressure on the KV cache.
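To make "what quantization actually does to numbers" concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is not taken from the book; the function names `quantize_int8` and `dequantize_int8` and the sample weights are made up for illustration, and production PTQ pipelines typically add calibration data and per-channel scales on top of this idea.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.02, -0.51, 0.37, 1.27, -0.9]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# The round-trip error per value is bounded by about half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored as one byte instead of four, at the price of a small rounding error. A single large outlier inflates `scale` and coarsens every other value in the tensor, which is one way to see why activation outliers cause trouble.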

What stands out is the full pipeline view. It doesn’t stop at “here’s how to quantize a model.” It covers how those choices affect deployment, runtime behavior, and tradeoffs you have to make along the way. There’s also coverage of newer low-precision formats like NF4 and FP4, which are increasingly common in real systems.
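As a rough illustration of how a 4-bit floating-point format differs from integer quantization, the sketch below rounds values to the nearest point in a small codebook. It assumes the E2M1 layout for FP4 (2 exponent bits, 1 mantissa bit), whose representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4, and 6. The function name `quantize_fp4` and the single per-block scale are illustrative choices, not the book's method. NF4 follows the same nearest-codebook idea but uses levels derived from quantiles of a normal distribution, which this sketch does not reproduce.

```python
# Magnitudes representable in FP4 under the assumed E2M1 layout.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_fp4(values):
    """Scale a block so its largest magnitude maps to 6.0, then snap to the codebook."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 6.0 if max_abs > 0 else 1.0
    quantized = []
    for v in values:
        target = abs(v) / scale
        # Nearest representable magnitude; ties go to the smaller one.
        nearest = min(FP4_MAGNITUDES, key=lambda m: abs(m - target))
        quantized.append((-nearest if v < 0 else nearest) * scale)
    return quantized, scale


# Hypothetical block of values, chosen only for illustration.
block = [0.2, -0.8, 1.4, -3.0]
quantized, scale = quantize_fp4(block)
print(quantized, scale)
```

Unlike int8's evenly spaced levels, the codebook is denser near zero and sparser near the maximum, so small values keep more relative precision than large ones.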

If you’re trying to run larger models on tighter budgets—or want to understand what’s happening under the hood when you compress them—this is a solid place to dig in while the book is still in Early Access.

Full details: https://www.manning.com/books/erlang-and-otp-in-action

Don’t forget you can get 45% off with your Devtalk discount! Just use the coupon code “devtalk.com” at checkout.

#manning #published-book #mlops #transformers #pytorch #modelcompression #efficient-ai #quantization #int8 #int4 #gpt0 #awq #llm-quantization #edge-ai #locak-llama

2026-05-05 10:02:27 UTC
