ManningBooks

Devtalk Sponsor

Evaluation and Alignment, The Seminal Papers (Manning)

Evaluation and Alignment: The Seminal Papers teaches you to think of evaluation as a design constraint. You’ll employ a “working backwards” methodology that begins with what your system must get right, which directs you to the appropriate evaluation approach. As you internalize the define > evaluate > analyze > align cycle, you’ll start making more informed tradeoffs and expertly balancing helpfulness, safety, and brand voice in your models.

Hanchung Lee

Evaluation and Alignment: The Seminal Papers brings together a set of influential research papers and connects them to day-to-day engineering work. The focus isn’t just on metrics in isolation, but on how evaluation shapes the system you end up building.

The book traces how evaluation has evolved. It starts with straightforward approaches like text matching, moves through semantic similarity, and reaches more recent methods where models are used to judge other models. Seeing that progression helps explain why certain techniques break down and where newer ones fit.
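To make the first step of that progression concrete, here is a minimal sketch of two text-matching metrics. It is not taken from the book: the function names `exact_match` and `token_f1` and the example sentences are assumptions chosen for illustration. It shows how a reasonable paraphrase can score poorly when the metric only compares surface tokens.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> bool:
    """Illustrative helper: True only if the lowercased token sequences are identical."""
    return prediction.lower().split() == reference.lower().split()


def token_f1(prediction: str, reference: str) -> float:
    """Illustrative helper: F1 over whitespace tokens shared by both strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example pair: same meaning, almost no shared words.
reference = "the meeting was moved to friday"
paraphrase = "they rescheduled it for friday"

print(exact_match(paraphrase, reference))         # False
print(round(token_f1(paraphrase, reference), 2))  # 0.18
```

The two sentences say the same thing, yet only one token overlaps. That is the kind of gap semantic similarity and model-based judging try to close.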

One idea that runs throughout the book is treating evaluation as a design constraint. Instead of measuring quality after the fact, you begin by defining what the system must get right. That choice influences everything else—what data you collect, which metrics you use, and how you interpret results.

There’s also a strong emphasis on closing the loop. Evaluation feeds analysis, which leads to changes in prompts, data, or architecture. Those changes get tested again. Over time, this cycle becomes part of how you build and maintain AI systems, not something you bolt on at the end.

Some of the topics covered along the way:

  • choosing evaluation methods that match the behavior you care about

  • spotting failure modes that simple metrics tend to miss

  • working with subjective qualities like helpfulness, safety, and tone

  • using evaluation results to guide alignment decisions

If you’ve worked on LLM-based systems, you’ve probably run into the gap between a model that “looks good in a demo” and one that holds up in production. This book is aimed squarely at that gap.


Don’t forget you can get 45% off with your Devtalk discount! Just use the coupon code “devtalk.com” at checkout :+1:
