ManningBooks

Devtalk Sponsor

AI Model Evaluation (Manning)

Before deploying an AI model into production, you need to know more than just its accuracy. Will it be fast enough for your users? Will it scale under real-world traffic? Can you trust its decisions in critical scenarios? AI Model Evaluation (Manning Publications) gives you the practical tools and strategies to answer these questions—and more—so you can ship AI systems that actually work in the real world.

Leemay Nassery

What you’ll learn in AI Model Evaluation:

Build diagnostic offline evaluations to uncover hidden model behaviors
Use shadow traffic to simulate production conditions safely
Design A/B tests to measure real business and product impact
Spot nuanced failures with human-in-the-loop feedback
Scale evaluations with LLMs as automated judges

Author Leemay Nassery (Spotify, Comcast, Dropbox, Etsy) draws on real-world experience to show what it takes to prepare models for production. You’ll go beyond standard accuracy metrics to evaluate latency, user experience, and long-term impact on product goals.

Inside the book:
Each chapter explores a different evaluation method, from offline testing and A/B experiments to shadow deployments and qualitative analysis. Hands-on examples, including a movie recommendation engine, make it easy to apply these techniques to your own AI projects.
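As a rough illustration of the A/B experiments mentioned above, here is a minimal sketch of a two-proportion z-test comparing conversion rates between a control group and a treatment group. It is not taken from the book; the function name, the argument names, and the sample numbers are all hypothetical, and a real experiment would also need a pre-registered sample size and guardrail metrics.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and treatment (b).

    conv_* are conversion counts, n_* are users exposed (hypothetical names).
    Returns (z, two_sided_p_value) under the pooled-variance normal approximation.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via erf, then a two-sided tail probability.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


if __name__ == "__main__":
    # Made-up numbers: 10.0% vs 15.0% conversion on 1,000 users each.
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z={z:.2f}, p={p:.4f}")
```

A small p-value only says the difference is unlikely to be noise; whether the lift matters for the product is a separate judgment.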

Full details: AI Model Evaluation - Leemay Nassery

Don’t forget you can get 45% off with your Devtalk discount! Just use the coupon code “devtalk.com” at checkout.

3 comments

#manning #published-book #machine-learning #ai-model-evaluation #mlops #ai-evaluations #ml-observability #llm-evaluation #human-in-the-loop #responsible-ai


2025-08-29 03:38:08 UTC


peterchancc

We started exploring AI apps with LLMs, so this book should be a good reference for evaluating the open-source LLMs that we plan to use.

Post #2

ManningBooks

Devtalk Sponsor

Definitely. Here are some questions the book addresses that may help your team:

What happens if your model is “accurate” offline but tanks your engagement metrics in production — how would you know why?
(Follow-up: Do you have evaluation strategies beyond just accuracy or F1?)
When was the last time your team measured the system latency impact of a new AI model before launching it?
(And what if the model slowed down page load time by 200ms — would you catch it before it hits users?)
If a model makes worse predictions for a specific user segment, do you catch that in your current evaluation process? Or are those failures only visible after a launch?
Before you ship a model, do you know how it affects:

Feature latency?
Cold start performance?
Infrastructure cost at scale?
(Or are you finding out during the fire drill after launch?)

Are you still using the same evaluation metrics your team used 3 years ago?
(What if the nature of your product or user behavior has changed — and your evaluations are now stale?)
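Two of the questions above (segment-level failures and latency impact) can be turned into simple pre-launch checks. The sketch below is only an assumption about how such checks might look, not the book's method: the record layout, the function names, and the `budget_ms` threshold are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def accuracy_by_segment(records):
    """records: iterable of (segment, y_true, y_pred) tuples (hypothetical layout)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for segment, y_true, y_pred in records:
        totals[segment] += 1
        hits[segment] += int(y_true == y_pred)
    return {segment: hits[segment] / totals[segment] for segment in totals}


def p95(latencies_ms):
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(latencies_ms, n=20)[-1]


def latency_regression(baseline_ms, candidate_ms, budget_ms):
    """True if the candidate's p95 latency exceeds the baseline's by more than budget_ms."""
    return p95(candidate_ms) - p95(baseline_ms) > budget_ms


if __name__ == "__main__":
    # Made-up data: overall accuracy is 75%, but new users see only 50%.
    records = [
        ("new", 1, 1), ("new", 1, 0),
        ("returning", 1, 1), ("returning", 0, 0),
    ]
    print(accuracy_by_segment(records))
    print(latency_regression([100.0] * 20, [350.0] * 20, budget_ms=200.0))
```

Breaking accuracy down by segment surfaces gaps that an aggregate metric hides, and comparing tail latency (rather than the mean) is closer to what the slowest users actually experience.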

Hope this helps.

Cheers

Post #3

peterchancc

Thanks!

Post #4





