ManningBooks

Devtalk Sponsor

AI Model Evaluation (Manning)

Before deploying an AI model into production, you need to know more than just its accuracy. Will it be fast enough for your users? Will it scale under real-world traffic? Can you trust its decisions in critical scenarios? AI Model Evaluation (Manning Publications) gives you the practical tools and strategies to answer these questions—and more—so you can ship AI systems that actually work in the real world.

Leemay Nassery

What you’ll learn in AI Model Evaluation:

  • Build diagnostic offline evaluations to uncover hidden model behaviors
  • Use shadow traffic to simulate production conditions safely
  • Design A/B tests to measure real business and product impact
  • Spot nuanced failures with human-in-the-loop feedback
  • Scale evaluations with LLMs as automated judges

Author Leemay Nassery (Spotify, Comcast, Dropbox, Etsy) shares real-world insights on what it really takes to prepare models for production. You’ll go beyond standard accuracy metrics to evaluate latency, user experience, and long-term impact on product goals.

Inside the book:
Each chapter explores a different evaluation method, from offline testing and A/B experiments to shadow deployments and qualitative analysis. Hands-on examples, including a movie recommendation engine, make it easy to apply these techniques to your own AI projects.


Don’t forget you can get 45% off with your Devtalk discount! Just use the coupon code “devtalk.com” at checkout :+1:

Most Liked

peterchancc

We started exploring AI apps with LLMs, so this book should be a good reference for evaluating the open-source LLMs that we plan to use.

ManningBooks

Devtalk Sponsor

Definitely. Here are some questions the book addresses that may help your team:

  1. What happens if your model is “accurate” offline but tanks your engagement metrics in production — how would you know why?
    (Follow-up: Do you have evaluation strategies beyond just accuracy or F1?)

  2. When was the last time your team measured the system latency impact of a new AI model before launching it?
    (And what if the model slowed down page load time by 200ms — would you catch it before it hits users?)

  3. If a model makes worse predictions for a specific user segment, do you catch that in your current evaluation process? Or are those failures only visible after a launch?

  4. Before you ship a model, do you know how it affects:

      • Feature latency?
      • Cold start performance?
      • Infrastructure cost at scale?
    (Or are you finding out during the fire drill after launch?)

  5. Are you still using the same evaluation metrics your team used 3 years ago?
    (What if the nature of your product or user behavior has changed, and your evaluations are now stale?)
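
As a rough illustration of question 3 (this is not taken from the book), a per-segment check can be as simple as computing the same metric for each slice of users and comparing it with the overall number. The sketch below assumes evaluation records are available as `(segment, label, prediction)` tuples; the segment names and data are made up for the example.

```python
from collections import defaultdict


def accuracy_by_segment(records):
    """Return {segment: accuracy} for an iterable of (segment, label, prediction)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for segment, label, prediction in records:
        total[segment] += 1
        if label == prediction:
            correct[segment] += 1
    return {segment: correct[segment] / total[segment] for segment in total}


# Hypothetical evaluation records: (user segment, true label, model prediction)
records = [
    ("new_users", 1, 0),
    ("new_users", 0, 0),
    ("returning_users", 1, 1),
    ("returning_users", 0, 0),
]

# Overall accuracy can look acceptable while one segment lags behind.
overall = sum(label == pred for _, label, pred in records) / len(records)
per_segment = accuracy_by_segment(records)

print("overall:", overall)
for segment, acc in per_segment.items():
    print(segment, acc)
```

A gap between the overall figure and any one segment is the kind of failure that otherwise only shows up after launch.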

Hope this helps.

Cheers

peterchancc

Thanks!
