ManningBooks

Devtalk Sponsor

Quantization and Fast Inference (Manning)

Today’s AI models demand a lot of memory, compute, and server horsepower, which quickly translates into cost. Quantization and Fast Inference shows you how to optimize AI models without architectural redesigns or task-specific compression.

Vivek Kalyanarangan

If you’ve worked with modern AI models in production, you’ve probably run into the same wall: great results in development, followed by uncomfortable conversations about memory, latency, and cost. Quantization and Fast Inference is built for that exact point in the workflow.

This book walks through how to shrink and speed up models without redesigning them from scratch. It starts with the fundamentals—what quantization actually does to numbers and why it works—then moves into techniques you can apply right away. You’ll get hands-on with post-training quantization (PTQ), quantization-aware training (QAT), and the details that tend to cause trouble in practice, like activation outliers in LLMs or pressure on the KV cache.
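
To make “what quantization actually does to numbers” concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is not code from the book. It assumes absmax scaling to the range [-127, 127], and the function names and sample values are hypothetical. The second half shows why activation outliers cause trouble: a single large value stretches the scale, so every small value loses precision.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization using absmax scaling (illustrative sketch)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # one float scale for the whole tensor
    q = [max(-127, min(127, round(v / scale))) for v in values]  # round, then clamp to int8 range
    return q, scale


def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float values."""
    return [x * scale for x in q]


def max_abs_error(original, q, scale):
    """Largest absolute round-trip error over the original values."""
    recovered = dequantize_int8(q, scale)
    return max(abs(o - r) for o, r in zip(original, recovered))


weights = [0.12, -0.5, 0.33, 0.01, -0.27]  # hypothetical sample values
q, scale = quantize_int8(weights)
err = max_abs_error(weights, q, scale)
print(q, f"scale={scale:.6f}", f"max_err={err:.6f}")

# Add one outlier: the scale grows with it, so the original small values get coarser codes.
with_outlier = weights + [8.0]
q_out, scale_out = quantize_int8(with_outlier)
err_out = max_abs_error(weights, q_out[: len(weights)], scale_out)
print(q_out, f"scale={scale_out:.6f}", f"max_err={err_out:.6f}")
```

With absmax scaling, no value is clamped, so round-to-nearest keeps each error within half a scale step. As a result, anything that inflates the scale, such as one outlier, also inflates the error for every other value in the tensor. Per-channel scales and outlier-aware methods are common responses to this problem, and the sketch deliberately leaves them out.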

What stands out is the full-pipeline view. It doesn’t stop at “here’s how to quantize a model.” It covers how those choices affect deployment and runtime behavior, along with the tradeoffs you have to make along the way. There’s also coverage of newer low-precision formats like NF4 and FP4, which are increasingly common in real systems.
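
For a rough sense of why 4-bit formats matter, note that weight storage scales linearly with bits per weight. The sketch below is simple arithmetic for a hypothetical 7-billion-parameter model. It ignores the per-block or per-channel scale factors that real INT4, NF4, and FP4 schemes also store, and it leaves out activations and the KV cache, so real footprints run somewhat higher.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate weight storage in GiB; ignores scales, zero-points, and other metadata."""
    return num_params * bits_per_weight / 8 / 2**30


num_params = 7_000_000_000  # hypothetical 7B-parameter model
for label, bits in [("FP16", 16), ("INT8", 8), ("4-bit (INT4/NF4/FP4)", 4)]:
    print(f"{label:>22}: {weight_memory_gib(num_params, bits):6.2f} GiB")
```

In this example, moving from 16-bit to 4-bit weights shrinks weight storage by 4x, from about 13 GiB to about 3.3 GiB.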

If you’re trying to run larger models on tighter budgets—or want to understand what’s happening under the hood when you compress them—this is a solid place to dig in while the book is still in Early Access.

Full details: https://www.manning.com/books/erlang-and-otp-in-action

Don’t forget you can get 45% off with your Devtalk discount! Just use the coupon code “devtalk.com” at checkout.

#manning #published-book #mlops #transformers #pytorch #modelcompression #efficient-ai #quantization #int8 #int4 #gptq #awq #llm-quantization #edge-ai #local-llama

2026-05-05 10:02:27 UTC
